Eukaryotic Cell. 2006 Sep 22;5(12):2079–2091. doi: 10.1128/EC.00222-06

Analysis of Euglena gracilis Plastid-Targeted Proteins Reveals Different Classes of Transit Sequences

Dion G Durnford 1,*, Michael W Gray 2
PMCID: PMC1694827  PMID: 16998072

Abstract

The plastid of Euglena gracilis was acquired secondarily through an endosymbiotic event with a eukaryotic green alga, and as a result, it is surrounded by a third membrane. This membrane complexity raises the question of how the plastid proteins are targeted to and imported into the organelle. To further explore plastid protein targeting in Euglena, we screened a total of 9,461 expressed sequence tag (EST) clusters (derived from 19,013 individual ESTs) for full-length proteins that are plastid localized to characterize their targeting sequences and to infer potential modes of translocation. Of the 117 proteins identified as being potentially plastid localized whose N-terminal targeting sequences could be inferred, 83 were unique and could be classified into two major groups. Class I proteins have tripartite targeting sequences, comprising (in order) an N-terminal signal sequence, a plastid transit peptide domain, and a predicted stop-transfer sequence. Within this class of proteins are the lumen-targeted proteins (class IB), which have an additional hydrophobic domain similar to a signal sequence and required for further targeting across the thylakoid membrane. Class II proteins lack the putative stop-transfer sequence and possess only a signal sequence at the N terminus, followed by what, in amino acid composition, resembles a plastid transit peptide. Unexpectedly, a few unrelated plastid-targeted proteins exhibit highly similar transit sequences, implying either a recent swapping of these domains or a conserved function. This work represents the most comprehensive description to date of transit peptides in Euglena and hints at the complex routes of plastid targeting that must exist in this organism.


A fundamental problem in cell biology is the precise and efficient targeting of proteins synthesized by cytoplasmic ribosomes to their appropriate intracellular locations. Proteins destined for the endomembrane system, mitochondria, or the chloroplast usually have specific N-terminal targeting domains that are required for proper subcellular localization. These leader sequences are often removed by specific proteases at the protein's destination prior to it assuming its active conformation. For chloroplast-targeted proteins in plants and algae, an N-terminal transit peptide (TP) is both necessary and sufficient for correct plastid targeting (11). Transit peptides are not conserved in sequence but exhibit characteristic biochemical properties, such as an elevated content of the hydroxylated amino acids serine and threonine as well as a deficiency of acidic (aspartate and glutamate) amino acids (76). Within a typical chloroplast, there are six distinct locations to which the constituent proteins must be sorted, and some proteins have to cross up to three membranes (33). This complexity requires additional targeting information within the transit peptide, such as the signal sequence-like domain found in proteins targeted to the thylakoid membrane, or information contained within the mature portion of the protein itself (62).

Plants, green algae, and red algae have plastids derived from an endosymbiotic cyanobacterium, with two membranes enveloping the chloroplast (34). Protein targeting to these plastids is fairly well understood; generally, the transit peptides direct newly synthesized proteins to the outer envelope membrane, where they interact with receptors and other components of the translocation apparatus so that protein import and subsequent sorting can take place (33). Many of the translocation components present in the outer and inner envelopes have been identified in plants (31).

Many protists, however, possess secondary plastids that are believed to have arisen from endosymbiosis with a eukaryotic alga. These organisms have complex plastids with either three membranes around the chloroplast, as occurs in the dinoflagellates and Euglena spp., or four membranes, as in the stramenopiles and haptophytes (34). The presence of additional membranes surrounding the plastid would seem to necessitate additional targeting information, complicating the process of translocation. We know, for example, that during the evolution of secondary plastids, genes from the endosymbiont were functionally transferred to the host's nuclear genome. These genes must then be expressed and their protein products targeted back to the organelle, and this process is undoubtedly more complicated than that in the case of primary plastids. A significant hurdle in this pathway is the necessity to acquire appropriate targeting information that allows nucleus-encoded proteins to be directed to the plastid and to traverse additional membranes in the process. Understanding the mechanism of targeting and translocation in organisms with complex plastids has been key to understanding how the transition from algal symbiont to plastid occurred (12, 35, 47, 50, 63, 74).

In protists with four membranes around the plastid, the outermost membrane often has ribosomes attached and is typically continuous with the endoplasmic reticulum (ER) (23). Proteins directed to these plastids possess bipartite targeting sequences, with an N-terminal signal sequence (24) that directs them to the chloroplast ER, where they are cotranslationally imported across the first membrane (4, 7, 30). The domain after the signal sequence is the predicted transit peptide for transport across the inner two membranes, in a process likely to resemble translocation across plant chloroplast envelopes (43).

The euglenophytes and dinoflagellates have plastids with three membranes, the outermost of which lacks bound ribosomes. In both cases, plastid proteins are targeted through the endomembrane system (49, 53, 67, 70, 71). From studies of several complete, publicly available Euglena gracilis plastid protein sequences (13, 25, 27, 28, 38, 44, 52, 56, 61, 64, 66, 73), it was predicted that the plastid proteins have an N-terminal signal sequence, an inference that was confirmed by both in vitro (38) and in vivo (70, 71) experimental approaches. Following the signal sequence is the predicted transit peptide, which is sufficient for translocation across plant chloroplast membranes (29), and a hydrophobic region that acts as a “stop-transfer” sequence to prevent complete transport into the ER, such that the mature protein remains in the cytoplasm (69). The protein is then targeted to the plastid, likely via a vesicular transport system (67). Also described for Euglena are tripartite transit sequences that possess an additional hydrophobic domain predicted to target proteins to the thylakoid lumen (73).

Because relatively few Euglena plastid protein sequences are publicly available, the study we report here more comprehensively examines the characteristics of plastid-targeting sequences. Since many of the known Euglena proteins, including all of those for which biochemical analyses of targeting have been conducted, are encoded as polyproteins, we sought to determine whether all plastid proteins are likely to proceed to the plastid via a similar pathway in this organism. By examining the targeting sequences of a large number of plastid proteins, the majority of which are not organized as polyproteins, we have been able to define the characteristics that can be used to identify Euglena plastid-targeted proteins with high confidence and to infer modes of transport to the plastid.

MATERIALS AND METHODS

E. gracilis strain Z was cultured under several different conditions, and cDNA libraries were produced commercially in the pcDNA3.1(+) vector (DNA Technologies Inc.). Expressed sequence tag (EST) sequencing was performed at the Atlantic Genome Centre (Halifax, Nova Scotia, Canada) and the B.C. Cancer Agency (Vancouver, British Columbia, Canada). A total of 19,013 ESTs were retained following quality and vector trimming via the taxonomically broad EST database (TBestDB [http://tbestdb.bcm.umontreal.ca/searches/login.php]), under the auspices of the Protist EST Program. The ESTs were clustered to form a total of 9,461 unique groups.

To search for plastid-targeted proteins, the 9,461 clusters were translated in the three forward reading frames (plus orientation), and the longest open reading frame (ORF) of >19 amino acids starting with a methionine was retained for further analyses (http://maven.smith.edu/∼vvouille/sumCGI/translator.html). Screening for plastid-targeted proteins was carried out in several rounds. First, all ORFs were screened for the presence of a signal sequence using the program SignalP3 (6, 51; http://www.cbs.dtu.dk/services/SignalP/). Any ORFs with a signal sequence predicted with the hidden Markov model (HMM) or the artificial neural network (NN) were retained. All selected ORFs were then rescreened, and those having a clear role in plastid function and/or those whose top BLASTnr hit was plant, algal, or cyanobacterial in origin were segregated for further consideration. Finally, the putative plastid-targeted proteins were screened further according to the following criteria: (i) the top BLAST hit (NCBI nonredundant database) was plant/algal or cyanobacterial and/or the protein has a clear role in plastid function, and (ii) the BLASTp E value was ≤1e−05. The ORF was considered to possess a complete transit sequence when (i) there was evidence for a spliced leader sequence (TTTTTTTCG) at the 5′ end of the cDNA that would indicate that the cDNA was full length (72), (ii) there was an extension of the ORF toward the N terminus upstream of the first region of evident amino acid sequence similarity following a BLASTp search, and (iii) the beginning of the mature protein was identified by comparison with orthologous proteins.
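The ORF selection step described above can be illustrated with a minimal Python sketch. It is not the translator tool used in the study: it assumes the standard genetic code, scans only the three forward frames, and treats an ORF as the stretch from the first methionine of a stop-free segment to the next stop codon or the end of the sequence. The function names and the handling of ambiguous codons (translated as "X") are illustrative assumptions.

```python
from typing import Optional

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard code applies to these nuclear transcripts).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int) -> str:
    """Translate one forward frame (0, 1, or 2); unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_met_orf(dna: str, min_len: int = 20) -> Optional[str]:
    """Return the longest Met-initiated ORF of at least min_len residues
    (that is, >19 amino acids by default) across the three forward frames."""
    best = None
    for frame in range(3):
        for segment in translate(dna, frame).split("*"):
            start = segment.find("M")
            if start == -1:
                continue  # no methionine in this stop-free segment
            orf = segment[start:]
            if len(orf) >= min_len and (best is None or len(orf) > len(best)):
                best = orf
    return best
```

For example, `longest_met_orf("ATG" + "GCT" * 25 + "TAA")` returns a 26-residue peptide, whereas a cluster whose longest Met-initiated ORF is shorter than 20 residues yields `None`.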

Potential membrane-spanning regions were identified using the hidden Markov model-based program TMHMM (39; http://www.cbs.dtu.dk/services/TMHMM/). Hydrophobicity plots were generated using the ProtScale program at the ExPASy site (http://www.expasy.org/tools/protscale.html), using a Kyte-Doolittle scale with a sliding window length of 7 or 19 amino acids, as indicated. The amino acid content of peptides was calculated using the PEPSTATS program in the EMBOSS package, available at AnaBench (http://anabench.bcm.umontreal.ca/anabench/Anabench-Jsp/Welcome.jsp). Sequence logo displays were generated using the online program WebLogo (weblogo.berkeley.edu/logo.cgi).
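As a rough illustration of the hydropathy analysis, the sketch below computes a Kyte-Doolittle profile as an unweighted sliding-window mean. It is not the ProtScale implementation; the uniform window weighting and the function name are assumptions, and only the 20 standard residues are handled.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Return the mean hydropathy of every full window along the protein.

    The result has len(protein) - window + 1 values; an empty list is
    returned when the protein is shorter than the window.
    """
    protein = protein.upper()
    if window < 1 or window > len(protein):
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Positive stretches in the profile correspond to hydrophobic regions such as the signal sequence core; a window of 19 smooths the profile toward membrane-spanning segments, whereas a window of 7 resolves shorter hydrophobic patches.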

Nucleotide sequence accession numbers.

All individual EST sequences have been deposited in the NCBI dbEST database under accession numbers EG565093 to EG565263.

RESULTS

From 9,461 individual Euglena EST clusters, a total of 117 full-length plastid proteins were identified. Eliminating nearly identical isoforms from the data set left a total of 83 unique proteins for further analysis (Table 1). In addition to functioning in basic photosynthetic reactions, the proteins identified had predicted roles in the biosynthesis of proteins, lipids, carotenoids, and chlorophyll. Proteins involved in signal transduction and plastid metabolism were also found. Through determination of the N-terminal-most regions of sequence similarity in BLASTp searches, targeting domains were delineated and found to be very long, with an average size (± standard deviation) of 152 ± 25 residues (Table 1). The shortest estimated presequence was 95 residues, for Rubisco activase, and the longest was 211 (Albino3) (Table 1).

TABLE 1. EST clusters

Cluster ID(a); Annotation; Organism with top BLASTp hit; BLASTp score; Presequence length (aa); TP length (aa)(b); TMH1 position(c); TMH2 position(c); I(d)
Class IA proteins
    0726 Ferredoxin Arabidopsis thaliana 3e−06 148 60 7-29 89-111 2
    0899 50S ribosomal protein L3 Cyanophora paradoxa 3e−12 168 52 13-35 87-109 2
    1043 Putative ferredoxin Arabidopsis thaliana 4e−09 144 65 7-29 94-113 1
    1116 30S ribosomal protein S20 Synechococcus elongatus 4e−08 170 64 13-35 99-121 2
    1127 Zeta-carotene desaturase Oryza sativa 1e−22 182 55 21-43 98-121 2
    1204 Uroporphyrinogen decarboxylase Anopheles gambiae 2e−31 149 60 21-38 98-120 2
    1312 Putative ferredoxin Arabidopsis thaliana 3e−09 143 58 11-33 91-110 1
    1428 RubisCO small subunit Euglena gracilis e−118 120 56 4-26 82-104 2
    1495 Glutaredoxin 2 Actinobacillus actinomycetemcomitans 6e−10 130 55 7-24 79-101 2
    1503 Membrane-associated 30-kDa protein Pisum sativum 2e−10 168 75 7-29 104-126 1
    1573 Putative ferredoxin Arabidopsis thaliana 6e−09 147 61 13-35 96-118 1
    1674 Sugar nucleotide phosphorylase Arabidopsis thaliana 7e−12 168 66 21-43 109-131 1
    1706 50S ribosomal protein L34 Arabidopsis thaliana 7e−06 190 63 13-35 98-117 1
    2042 Peptidyl-prolyl cis-trans isomerase Oryza sativa 6e−07 178 67 7-26 93-117 1
    2448 Ferredoxin-like protein Rhizobium loti 8e−05 120 58 15-37 95-117 1
    2566 Ribose-5-phosphate isomerase Spinacia oleracea 2e−24 144 61 20-42 103-125 1
    2596 Ycf53 (tetrapyrrole-binding protein) Synechococcus elongatus 2e−12 182 58 12-34 92-114 1
    2669 50S ribosomal protein L11 Odontella sinensis 3e−12 186 55 19-41 96-118 2
    2795 d-Ribulose-5-phosphate 3-epimerase Arabidopsis thaliana 1e−24 150 58 21-40 98-120 2
    2990 Albino 3 Bigelowiella natans 5e−37 211 77 29-51 128-150 1
    3121 Chaperonin PSII quinone-binding protein Arabidopsis thaliana 8e−30 189 94 3-25 119-136 1
    3164 Rhodanese domain-containing protein Oryza sativa 6e−08 133 58 5-24 82-104 1
    3171 Photosystem II 22-kDa protein Arabidopsis thaliana 3e−17 152 67 21-40 107-126 1
    3330 Coproporphyrinogen III oxidase Chlamydomonas reinhardtii e−108 156 53 22-44 97-119 2
    3362 ATP synthase delta chain Nicotiana tabacum 7e−22 147 57 15-37 94-116 2
    3372 50S ribosomal protein L15 Bigelowiella natans 1e−14 191 54 24-46 100-119 1
    3375 Light-regulated Chlp-localized protein Solanum tuberosum 4e−20 120 60 12-31 91-110 1
    3383 ATP synthase gamma chain Odontella sinensis 7e−78 137 60 13-35 95-113 3
    3449 Cytochrome f Euglena gracilis 4e−91 147 60 7-26 86-108 2
    3469 Porphobilinogen deaminase Euglena gracilis 0 151 56 17-39 95-112 2
    3474 Probable membrane-associated 30-kDa protein Synechocystis sp. 7e−49 151 63 7-26 89-111 1
    3482 Fructose-1,6-bisphosphatase Bigelowiella natans 2e−71 188 56 20-37 93-115 1
    3500 Glu 1-semialdehyde 2,1-aminomutase Chlorarachnion sp. e−148 138 53 7-29 82-99 2
    3504 Carbonic anhydrase Deinococcus radiodurans 7e−28 102 52 5-24 76-98 3
    3558 Carbonic anhydrase Deinococcus radiodurans 1e−10 140 62 13-35 97-119 1
    3594 50S ribosomal protein L28 Toxoplasma gondii 4e−18 160 55 13-35 90-112 2
    3603 Peroxiredoxin precursor Chlamydomonas reinhardtii 8e−85 134 57 5-27 84-106 2
    3619 50S ribosomal protein L21 Thermoanaerobacter tengcongensis 3e−11 168 52 13-35 87-109 1
    3635 Coproporphyrinogen III oxidase Chlamydomonas reinhardtii 1e−78 169 60 29-51 111-133 3
    3653 Delta 12 fatty acid desaturase Phaeodactylum tricornutum 2e−98 162 77 13-30 107-129 2
    3673 Carbonic anhydrase Deinococcus radiodurans 8e−30 179 68 13-32 100-122 4
    3676 30S ribosomal protein S1 Chlamydomonas reinhardtii 7e−32 233 66 4-26 92-114 1
    3817 Acyl carrier protein Synechocystis sp. 6e−13 122 49 15-34 83-105 1
    3830 Ferredoxin Euglena viridis 1e−41 138 56 17-39 95-117 3
    3881 ATP/ADP transporter Galdieria sulphuraria 0 148 52 12-34 86-105 1
    3900 PsbM Zea mays 0.017 154 63 13-35 98-120 3
    3911 LHCI Euglena gracilis 5e−86 179 50 13-35 85-107 1
    3934 Ferredoxin-NADP+ reductase Chlamydomonas reinhardtii e−144 114 51 5-27 78-100 1
    3943 NADPH protochlorophyllide reductase Chlorarachnion sp. 5e−69 155 53 12-34 87-109 1
    3946 RuBisCO activase Chlorococcum littorale e−145 95 59 13-35 72-101 2
    3996 LHCI Euglena gracilis e−116 158 55 13-35 90-109 2
    4008 LHCI Euglena gracilis 0 141 51 13-35 86-108 1
    4056 CP29 Oryza sativa 7e−57 136 50 12-34 84-106 1
    7084 Chl. synthase 33-kDa subunit Anabaena sp. 9e−10 141 45 15-34 79-101 1
    7147 SOUL-heme-binding protein Arabidopsis thaliana 7e−20 136 68 5-22 90-112 1
    7392 ATP-dependent Clp protease Vibrio cholerae 1e−44 143 58 15-37 95-117 1
    7739 Ycf3 (PSI assembly) Physcomitrella patens 3e−27 162 64 20-42 106-128 1
    7766 RuBisCO 60-kDa chaperonin Arabidopsis thaliana 5e−37 122 67 7-29 96-118 1
    8108 YebC-related protein Arabidopsis thaliana 7e−17 108 57 7-29 86-108 1
    8254 Chlorophyll b synthase Dunaliella salina 8e−18 114 70 6-22 92-111 1
    8643 Uroporphyrinogen decarboxylase Ashbya gossypii 1e−17 164 63 17-39 102-124 1
    8888 3-Isopropylmalate dehydrogenase Bifidobacterium longum 7e−13 150 71 13-35 106-128 1
    9366 Photosystem II family protein Arabidopsis thaliana 4e−11 137 61 13-35 96-118 1
Class IB proteins
    3955 Oxygen evolving enhancer (OEE1) Euglena gracilis e−116 142 53 5-27 80-99 3
    4026 Oxygen evolving enhancer (OEE2) Lycopersicon esculentum 6e−17 153 49 20-42 91-113 2
    3381 HCF136 (PSII stability factor) Arabidopsis thaliana 5e−07 142 52 7-29 81-103 1
    3249 Putative ascorbate peroxidase Lycopersicon esculentum 2e−04 184 70 13-35 105-127 1
    3902 Cytochrome c6 Euglena gracilis 4e−69 123 60 29-51 111-133 2
    3752 PSI subunit III (PsaF) Chlamydomonas reinhardtii 6e−53 144 60 13-35 95-114 2
    2674 Thylakoid luminal 17.4-kDa protein Arabidopsis thaliana 5e−22 171 71 15-37 108-127 1
Class II proteins
    3630 Photosystem II (PsbW) Chlorarachnion sp. 4e−15 82 52 20-37 3
    3294 ABC transporter (cytochrome c biogenesis) Nostoc punctiforme 5e−33 175 135 34-53 1
    0923 PEP/phosphate translocator Phaeodactylum tricornutum 4e−10 166 132 13-35 1
    4012 Oxygen evolving enhancer (OEE3) Chlamydomonas reinhardtii 3e−22 61 36 13-35 2
    2060 Mg-protoporphyrin IX methyltransferase Synechococcus elongatus 4e−17 66 40 5-27 1
    2416 Peptide chain release factor (RF) 2 Synechocystis sp. 3e−42 99 70 13-35 1
    3797 PSI subunit IV (PsaE) Chlamydomonas reinhardtii 6e−17 95 61 15-37 3
    4932 50S ribosomal protein L9 Bigelowiella natans 6e−05 62 39 15-33 1
    8550 Short-chain (SC) dehydrogenase Prochlorococcus marinus 8e−07 120 82 29-51 2
    3784 Phosphoribulokinase Vaucheria litorea 1e−76 100 75 20-42 1
    9282 MECP synthase Arabidopsis thaliana 2e−36 121 80 28-50 1
    6808 Squalene and phytoene synthases Prochlorococcus marinus 1e−27 98 47 35-52 1
    2660 ClpB Phaseolus lunatus 8e−48 123 76 37-52 1
(a) Original cluster IDs had “EEL0000” preceding the 4-digit numbers shown.
(b) For class I proteins, this is the region between the signal sequence and the stop-transfer region.
(c) TMH1 and TMH2 are the hydrophobic domains (range of amino acids is given from the start Met) of the signal sequence and stop-transfer sequence, respectively, as predicted by the TMHMM program. Underlined regions indicate that the TMHMM program did not predict a TMH (TMHMM value, 0.1 < P < 0.9) but that a hydrophobic patch is apparent from a Kyte-Doolittle analysis.
(d) Number of nearly identical isoforms detected.

Of the few Euglena plastid proteins examined to date, all possess an N-terminal region similar to a eukaryotic signal sequence. Thus, the first strategy for identifying plastid-targeted proteins was to search for the presence of such a sequence, using SignalP3. Of the final group of 83 plastid-targeted proteins examined using the SignalP hidden Markov model, 68% were predicted to possess a signal sequence. This value dropped to 56% when the artificial NN was employed. In cases where SignalP did not predict a signal peptide but other screens indicated a potential plastid-targeted protein, there was nevertheless a clear hydrophobic region characteristic of a signal peptide. Based on the NN predictions for the signal sequence cleavage sites, the Euglena signal sequence was estimated to be 33 ± 9 residues long (range, 18 to 59 residues). The predicted cleavage site was consistent with that in other eukaryotic signal sequences (Fig. 1C).

FIG. 1.


Characteristics of class I targeting sequences of Euglena. (A) Averaged TMHMM probabilities for 70 class I proteins identified in this study. Because the region upstream of the first TMH is of variable length (range, 2 to 32 amino acids; mean, 12.7 ± 6.7 amino acids), the data were normalized to a starting TMHMM probability of ≥0.1, which corresponds to the beginning of a predicted membrane-spanning region, and then averaged. The error bars show 2 standard errors. Key features of a Euglena class I targeting sequence are depicted above the graph. (B) Overview (MacClade) of amino acid categories of the targeting sequences of selected plastid-targeted proteins. Colors represent different amino acids, as follows: gray, hydrophobic and nonpolar (A, C, F, G, I, L, M, P, V, W, and Y); red, acidic (D and E); purple, basic (H, K, and R); yellow, hydroxylated (S and T); and blue, polar (Q and N). (C) Sequence logo plot showing occurrence of amino acids around the signal sequence cleavage site (arrow) predicted by SignalP (neural net). The y axis is displayed as bits, as described at weblogo.berkeley.edu/logo.cgi.
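The normalization described for panel A can be sketched as follows. This is only one plausible reading of the caption: each per-residue TMHMM probability profile is trimmed so that it starts at the first position with a probability of at least 0.1, and the trimmed profiles are then averaged position by position. The function names, and the choice to average each position over only those profiles long enough to reach it, are assumptions.

```python
from statistics import mean


def align_to_threshold(profile, threshold=0.1):
    """Drop leading positions until the probability first reaches threshold."""
    for i, probability in enumerate(profile):
        if probability >= threshold:
            return list(profile[i:])
    return []  # the profile never reaches the threshold


def average_aligned(profiles, threshold=0.1):
    """Average threshold-aligned profiles position by position.

    Positions beyond the end of a shorter profile are averaged over the
    remaining profiles only (an assumption, not stated in the caption).
    """
    aligned = [align_to_threshold(p, threshold) for p in profiles]
    aligned = [p for p in aligned if p]
    if not aligned:
        return []
    longest = max(len(p) for p in aligned)
    return [
        mean(p[i] for p in aligned if i < len(p))
        for i in range(longest)
    ]
```

Aligning on the onset of the first hydrophobic domain removes the effect of the variable-length N-terminal region, so that the second probability peak can be compared across proteins.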

E. gracilis plastid-targeting sequences can be divided into two classes.

Class I plastid-targeting sequences are designated by analogy to a similar type of targeting domain identified in dinoflagellates (55). This class encompassed 84% (70 of 83) of the Euglena proteins examined, which were characterized by the presence of two hydrophobic regions that are predicted by the TMHMM program to be transmembrane helices (TMH) (Fig. 1A). Figure 1A shows the average TMHMM probability for the class I plastid-targeting regions, with the first predicted transmembrane helix (TMH1) corresponding to the hydrophobic domain of a classic signal sequence (75). A basic amino acid precedes the first TMH in all but six proteins, with the average charge of this N-terminal region being +1.6. In only one case is the N-terminal region negatively charged (Table 1, cluster 3881 [ATP/ADP transporter]).
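The charge of the region preceding TMH1 can be approximated with a simple residue count. The sketch below is an assumption about how such a value might be obtained, since the calculation is not described in the text: Lys and Arg count as +1, Asp and Glu as −1, and His and the free amino terminus are ignored. The TMH1 start is taken as a 1-based position from the start Met, as in Table 1.

```python
def n_terminal_net_charge(protein: str, tmh1_start: int) -> int:
    """Net charge of the residues preceding TMH1.

    tmh1_start is the 1-based position of the first TMH residue, so the
    region analyzed is protein[0 : tmh1_start - 1].
    """
    region = protein[:tmh1_start - 1].upper()
    positive = sum(region.count(aa) for aa in "KR")  # Lys, Arg
    negative = sum(region.count(aa) for aa in "DE")  # Asp, Glu
    return positive - negative
```

With a hypothetical presequence beginning `MKRS` followed by a TMH starting at position 5, the function returns 2.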

The location of the second TMH is remarkably consistent, at 60 ± 8 amino acids following the end of the first predicted TMH, with a range of 45 to 94 amino acids. We designate this localization the “60 ± 8 rule” (Fig. 1A). The properties of the amino acids within the targeting regions of selected plastid-localized proteins are shown in Fig. 1B. In this figure, the hydrophobic regions (gray) are obvious. The presence of the two TMH motifs separated by 60 ± 8 amino acids had excellent discriminating power for identifying potential plastid-targeted proteins. For class I targeting sequences, the TMHMM program was able to predict upwards of 95% of the plastid proteins simply by searching for N-terminal regions with TMHs according to the 60 ± 8 rule. If we combined the entire set of predicted plastid proteins (all classes), the TMHMM program would have an overall success rate of 82%. In cases where the TMHMM probability did not meet the threshold for formal TMH prediction (Table 1, underlined values), the probability of a TMH was usually between 0.3 and 0.9, and the success rate would be very high if the threshold was reduced in subsequent rounds of screening. Rescreening the entire population of ORFs using the 60 ± 8 rule detected all of the class I proteins listed in Table 1, including isoforms, plus an additional 25 proteins classified as unknowns (data not shown). The TP domains of dinoflagellates, whose plastid leader sequences have a similar structure (49), are about half the size (25 ± 8 residues) (data not shown) of those of Euglena proteins.
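The spacing criterion can be expressed as a small screening function. In this sketch the spacing is computed as the TMH2 start minus the TMH1 end, which reproduces the TP lengths listed in Table 1 (for cluster 0726, 89 − 29 = 60). The acceptance bounds default to the observed range of 45 to 94 residues reported above; the tolerance actually applied during rescreening is not stated, so these bounds and the returned labels are assumptions.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # (start, end) positions counted from the start Met


def tmh_spacing(tmh1: Span, tmh2: Span) -> int:
    """Residues between the end of TMH1 and the start of TMH2."""
    return tmh2[0] - tmh1[1]


def classify(
    tmh1: Optional[Span],
    tmh2: Optional[Span],
    min_gap: int = 45,
    max_gap: int = 94,
) -> str:
    """Assign a provisional class from predicted TMH positions."""
    if tmh1 is None:
        return "no signal-sequence TMH"
    if tmh2 is None:
        return "class II candidate"
    gap = tmh_spacing(tmh1, tmh2)
    if min_gap <= gap <= max_gap:
        return "class I candidate"
    return "unclassified"
```

A hit from such a screen is only a candidate: the analysis described here additionally relied on signal sequence prediction and BLASTp similarity to plastid proteins.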

Class IB proteins (Table 1) also possess two predicted TMHs separated by 60 ± 8 amino acids, but they have a third hydrophobic domain with a mean distance of 17 residues (range, 7 to 25 residues) downstream of the end of TMH2 (Fig. 2). This region resembles a prokaryotic signal sequence and is postulated to function in the targeting of proteins to the thylakoid lumen (73). We identified five proteins that are homologous to thylakoid lumen-localized proteins and for which biochemical evidence for this location exists (four of these class IB proteins are shown in Fig. 2, along with a lumen-targeted class II protein [see below]). Two additional proteins are predicted to function in the lumen, based on their annotation as well as their possession of a putative lumen-targeting domain (LTD). Three of the seven class IB proteins (ascorbate peroxidase, HCF136, and OEE2) contain a double Arg immediately preceding the third hydrophobic domain (data not shown); another two class IB proteins (PSI-III and cytochrome c6) have the same motif within six amino acids of the start of the hydrophobic LTD, suggesting that the twin-arginine translocation (Tat) pathway (58) is functional in Euglena.

FIG. 2.


Kyte-Doolittle hydropathy plots for class IB plastid-targeting sequences of Euglena. Hydrophobicity plots for five confirmed lumen-targeted proteins are shown. The analyses were conducted with a window size of 19, and the hydrophobic regions (positive scores) corresponding to the TMHs of the signal sequence (SS) and the stop-transfer sequence (ST) are indicated with black bars. The hydrophobic region corresponding to the LTD is indicated with gray bars. Oxygen-evolving enhancer 3 (OEE3) has a class II targeting sequence and thus lacks the typical ST region. TP, transit peptide; MP, mature protein.

Class II targeting sequences in Euglena represent a departure from the class I type in that they lack the second TMH region upstream of the region specifying the mature protein, and hence do not conform to the 60 ± 8 rule (Fig. 3). The TMHMM probability scatter plot shows the presence of the hydrophobic region associated with the signal sequence in all class II proteins. This class represents 16% (13 of 83) of the identified population of plastid-targeted proteins. Of the 13 class II proteins delineated so far, 6 have unambiguous functions in the plastid, while the others conceivably could be targeted elsewhere. However, they all possess signal sequence-like N termini, and their predicted functions are expected to occur within the plastid. Each of these proteins is also related to homologs from photosynthetic taxa, as gauged by the top BLASTp hit (Table 1), supporting a putative plastid localization. The OEE3 protein, which is located within the thylakoid lumen, exhibits a second hydrophobic region (Fig. 3, arrow) that represents an LTD analogous to that found in class IB targeting domains. The estimated presequence length is 100 amino acids. All of the class II sequences have a spliced leader sequence, indicating that they are not class I sequences that were artifactually truncated upstream of the stop-transfer domain.

FIG. 3.


Characteristics of class II targeting sequences of Euglena plastid proteins. (A) Scatter plot showing TMHMM probability for the first 100 amino acids. Because the region before the first TMH is of variable length, the data were normalized to a starting TMHMM probability of ≥0.1. In all cases, a second TMH 60 ± 8 amino acids downstream from the first was absent. The hydrophobic region centered at position 45 is the LTD of OEE3. (B) Overview (MacClade) of amino acid categories of the targeting sequences of class II plastid-targeted proteins. Colors represent defined categories of amino acids, as indicated in the legend to Fig. 1. The black arrowhead indicates the predicted signal sequence cleavage site.

Plastid transit peptides of class I and II proteins.

In plants, targeting of proteins to the chloroplast is mediated by a transit peptide (for a review, see reference 11). Although sequence conservation per se is lacking, there is a general maintenance of certain chemical properties, including enrichment for the hydroxylated amino acids serine and threonine and a deficiency in acidic residues (76). In Euglena class I proteins, the intervening region between the two TMHs likely functions as a plastid TP (29). For class II proteins, we predicted that the region immediately following the signal sequence must have a role in targeting to the plastid. The exact length of the putative TP was difficult to assess, as we had little confidence in the ability of ChloroP to correctly predict the cleavage site, and thus the values in Table 1 are only estimates. However, from the predicted signal sequence site to the first region of clear sequence similarity to known proteins, the length ranged from 36 to 135 amino acids.

To test whether the class II TP region was similar to that of class I targeting sequences and to determine the chemical properties of both TP domains compared to the TPs of green algae and plants, we examined their amino acid compositions. We also compared these compositions to those of the mature region of proteins with class I targeting domains as well as selected Chlamydomonas proteins. The amino acid composition was calculated from the entire intervening region between the TMH regions of class I proteins (the predicted transit peptide), the estimated transit peptide from class II proteins that was located after the signal sequence and before the predicted start of the mature protein, and the entire coding region from all proteins having class I targeting sequences.
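The per-category composition used in this comparison can be reproduced in outline with the sketch below. It is not the PEPSTATS program; the category definitions follow those used in this study (hydroxylated, S and T; acidic, D and E; basic, H, K, and R), while the function name and the dictionary layout are illustrative assumptions.

```python
# Residue categories as defined in this study.
CATEGORIES = {
    "hydroxylated": "ST",
    "acidic": "DE",
    "basic": "HKR",
}


def category_percentages(peptide: str) -> dict:
    """Percentage of residues in each category for one peptide region."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must contain at least one residue")
    return {
        name: 100.0 * sum(peptide.count(aa) for aa in members) / len(peptide)
        for name, members in CATEGORIES.items()
    }
```

Applying the function separately to the predicted TP and to the mature region of each protein gives the paired values that are then compared across proteins.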

The data for selected amino acids and amino acid categories are shown in the form of box-and-whisker plots (Fig. 4). Since plastid transit peptides are reportedly enriched in hydroxylated amino acids and deficient in acidic amino acids (76), we analyzed a priori these amino acid categories in the putative TPs of class I and class II targeting sequences of Euglena in addition to a selection of 25 predicted TPs from Chlamydomonas proteins (Fig. 4). The region immediately downstream of the signal sequence in class I and II targeting sequences was significantly enriched in Ser and Thr (22% and 17%, respectively) compared to the mature regions of proteins with class I targeting sequences (11%) (one-way analysis of variance [ANOVA] and Tukey's test [α ≤ 0.05]). The TPs of Chlamydomonas proteins were similarly enriched in Ser/Thr (17%) compared to the mature portions of the proteins (11%) (Fig. 4). The putative transit peptide regions of class I and II targeting sequences were also significantly depleted in acidic amino acids (Asp and Glu) compared to the mature regions of the same proteins (Fig. 4) (one-way ANOVA and Tukey's test [α ≤ 0.05]). The predicted transit peptide regions were also found to have a higher Ala and Pro content than the mature portions of proteins (Fig. 4) (one-way ANOVA and Tukey's test [α ≤ 0.05]). However, given that 20 tests were conducted and that the amino acid composition is not truly independent, there is a possibility that some of these differences could be by chance. Although the Chlamydomonas TP exhibited a clear elevation in Ala content, there was no difference in the amount of Pro compared to that in the Euglena TPs. In terms of charged amino acids, the TP region is deficient in acidic amino acids, yet there is little significant change in the content of basic (His, Lys, and Arg) residues compared to the mature regions of the same proteins. 
However, examination of Lys and Arg separately reveals discrimination against Lys in the TP regions of class I and II targeting sequences (mean, 1.6% and 2.1%, respectively) compared to the mature proteins (mean, 5.8%; P < 0.001 [Kruskal-Wallis]) (Fig. 4). There were no significant differences in Arg content between the predicted transit peptides and the mature portions of the same proteins. Chlamydomonas TPs discriminate strongly against acidic amino acids (mean, 0.2%) and have an elevated content of Arg compared to the mature regions of the same proteins. Unlike Euglena, Chlamydomonas shows no bias against Lys in the TP. Without exception, the amino acid compositions of the Euglena class I and II transit peptides were the same, and both were significantly different from the composition of the mature protein (Fig. 4).

FIG. 4.

Amino acid composition analyses of the predicted TPs of class I and II targeting sequences compared to the mature proteins (MP). The amino acid compositions of the intervening region between TMH1 and TMH2 of class I targeting sequences (TP, I; n = 70), the predicted transit peptide region for class II proteins (TP, II; n = 13), and the mature protein regions from class I proteins (MP, I; n = 70) were determined. Also shown are the amino acid compositions of Chlamydomonas reinhardtii TPs (TP, Cr; n = 25) and mature proteins (MP, Cr; n = 25). Box-and-whisker plots were used to represent the data and are based on quartiles around the median value. The box encloses 50% of the data, with 25% above and below the median (solid line). Each whisker represents the data range of an additional 25% of the data. The existence of outliers beyond the 5% and 95% confidence ranges is indicated with a solid dot where applicable. Categories indicated with different letters on the plot are significantly different (one-way ANOVA and Tukey's test [α ≤ 0.05]). All data were normal except for the Lys content in class II peptides, in which case nonparametric statistics were used to assess differences.

To examine the distribution of the acidic amino acids further, the class I transit peptide region was divided into thirds, and the acidic amino acid content of each third was calculated (Fig. 5). From this analysis, an asymmetric distribution of acidic amino acids was apparent, such that the first third (TP1) was nearly devoid of acidic residues (1%) while the last third (TP3) had the same acidic content as the mature protein (11%). The Ser/Thr compositions of the putative TPs were not different among the three regions (TP1-3) (Fig. 5). The basic amino acid composition was the same within the three TP regions and the mature protein (Fig. 5).
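The thirds analysis can be sketched as follows. This is an illustration under stated assumptions: the text does not say how lengths not divisible by three were handled, so the integer-division boundaries used here are an assumption, and the helper names and example sequence are hypothetical.

```python
def split_thirds(seq):
    """Split a sequence into three consecutive segments (TP1, TP2, TP3).

    Boundaries use integer division, so segment lengths differ by at most
    one residue when len(seq) is not a multiple of three (an assumption).
    """
    n = len(seq)
    a, b = n // 3, (2 * n) // 3
    return seq[:a], seq[a:b], seq[b:]


def percent_acidic(seq):
    """Percentage of Asp (D) and Glu (E) residues in a segment."""
    if not seq:
        return 0.0
    return 100.0 * sum(seq.count(aa) for aa in "DE") / len(seq)


# Hypothetical 24-residue transit peptide region (illustrative only).
tp = "MSSTARAPLSATRPSVAAEDLKEG"
thirds = split_thirds(tp)
acidic_by_third = [percent_acidic(part) for part in thirds]
for label, part, pct in zip(("TP1", "TP2", "TP3"), thirds, acidic_by_third):
    print(f"{label} {part}: {pct:.1f}% acidic")
```

In this made-up sequence the acidic residues fall entirely in the final third, which mimics the asymmetric pattern described for the class I transit peptides.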

FIG. 5.

Amino acid composition analysis of the plastid TP domain of class I targeting sequences. Each TP region was divided into three equal segments (TP1-3), and the basic (H, K, and R), acidic (D and E), and serine/threonine (Ser/Thr) contents were calculated. These values were compared to the averaged amino acid composition of the mature protein (MP).

Overall, the putative TP domains of the two classes of Euglena targeting sequences have the same amino acid composition, and this composition resembles that of plant chloroplast transit peptides (11, 20, 76) in terms of an elevated content of Ser/Thr. These putative TP domains were also predicted to be transit peptides by using ChloroP (18), with apparent success rates of 83% and 67% for class I and II targeting sequences, respectively, when the signal sequence domain was removed. Surprisingly, the success rates were still respectable when the signal sequence was retained during the analysis (71% and 50%) but not when the entire targeting sequence was removed. One notable exception is the lumen-targeted protein OEE3, with a class II targeting sequence that has a mere seven amino acids between the end of the TMH (the signal sequence) and the putative hydrophobic LTD. Two of the seven residues are basic amino acids (no acidic residues), but the region immediately after the LTD is strongly acidic (data not shown). Euglena TPs, like others in the green alga lineage, lack a requirement for a Phe at the N terminus that is commonly observed in chromalveolates (15, 55, 57) and glaucophytes (68) and that is essential for plastid import in vivo in diatoms (37).

Stop-transfer sequences are a predicted feature of class I proteins.

Stop-transfer sequences function to halt the cotranslational import of proteins into the ER and serve an important role in determining the orientation of a protein in the membrane (8). For Euglena, it has been proposed that the second TMH acts as a stop-transfer sequence (69). From analysis of a large number of proteins with class I targeting sequences, it is clear that a stop-transfer sequence is a common motif in Euglena plastid-targeted proteins. In a few cases (Table 1), the second TMH region was not predicted by the TMHMM program, and the probability of having a TMH ranged from 0.1 to 0.9. Nevertheless, in these cases, subsequent hydropathy plots confirmed that these targeting domains are still strongly hydrophobic (data not shown) and therefore likely to have the same stop-transfer function. Immediately following the second TMH and within six residues of its end, ca. 80% of proteins of this class have two or more basic amino acids, and 97% of proteins have at least one. Only 2 of the 70 class I proteins lack a positively charged residue immediately after the TMH. The sharp change in polarity immediately after the second hydrophobic region, particularly towards positively charged residues, is apparent in the hydropathy plots encompassing this region (Fig. 6A). Class I polypeptides display a sharp decline in hydrophobicity immediately following the second TMH, a feature that presumably acts to block further insertion into the membrane. In class IB proteins, an additional hydrophobic region, the lumen-targeting domain, is located 25 to 30 amino acids further downstream. In contrast, class II polypeptides do not exhibit this sharp increase in polarity immediately after the hydrophobic section of the predicted signal sequence. The sequence logo illustrates the common occurrence of basic amino acids immediately after the hydrophobic core of the stop-transfer domain (Fig. 6B), which is not observed after the signal sequences of class II proteins. 
These differences provide additional evidence that the TMH of a class II protein is not simply the second TMH of a 5′-truncated cDNA encoding a class I protein.
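The count of basic residues within six residues of the end of the second TMH can be illustrated with a short sketch. It assumes the TMH end position is already known (for example, from a topology prediction) and treats Lys, Arg, and His as basic; whether His was counted as positively charged in this particular tally is an assumption. The function name and the example sequence are hypothetical.

```python
def basic_after_tmh(seq, tmh_end, window=6, basic="KRH"):
    """Count basic residues in the `window` residues that follow a TMH.

    `tmh_end` is the 0-based index of the first residue after the helix.
    """
    flank = seq[tmh_end:tmh_end + window]
    return sum(1 for aa in flank if aa in basic)


# Hypothetical presequence fragment built from three labeled parts.
upstream = "SSAT"                  # residues preceding the helix
tmh = "LLAVLLGLAVAALLLGAVL"        # stand-in for the second hydrophobic region
downstream = "RKSTRQASS"           # residues following the helix
fragment = upstream + tmh + downstream

n_basic = basic_after_tmh(fragment, tmh_end=len(upstream) + len(tmh))
print(f"basic residues within 6 positions of the TMH end: {n_basic}")
```

Applying such a count across all class I sequences would yield the kind of tallies quoted above (the proportion with two or more, or at least one, basic residue).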

FIG. 6.

(A) Kyte-Doolittle hydrophobicity profiles for the stop-transfer region of class I targeting sequences and the region immediately following the signal sequence of class II targeting domains. Plots begin 10 amino acid residues upstream of the start of the second TMH (for class I proteins) or the first TMH (for class II proteins), and the hydrophobicity profiles were calculated with a window size of 7 residues. The thick lines are the mean scores, and the thin lines on either side represent the 95% confidence intervals. The black bars above the hydrophobic regions indicate the location of the predicted TMH. (B) Sequence logo plot of class IA sequences when the second transmembrane helices (TMH2) were aligned. Only the regions immediately before and after TMH2 are shown.
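A sliding-window hydropathy profile of the kind plotted in panel A can be sketched as below, using the published Kyte-Doolittle scale and the 7-residue window stated in the legend. The function name and the example sequence are hypothetical, and the sketch computes a profile for a single sequence only; the averaging across proteins and the confidence intervals shown in the figure are not reproduced.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Mean Kyte-Doolittle score for each full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    if len(scores) < window:
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical hydrophobic stretch followed by a run of basic residues.
profile = hydropathy_profile("LLLLLLLKKKKKKK")
for position, score in enumerate(profile):
    print(position, round(score, 2))
```

The profile falls steadily from strongly hydrophobic to strongly hydrophilic as the window slides from the Leu run into the Lys run, which is the qualitative signature described for the region after the second TMH of class I proteins.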

Some plastid transit sequences are conserved.

When the plastid-targeting sequences of Euglena class I proteins were used individually as tBLASTn queries against the Euglena database, unexpected similarities in certain groups of unrelated proteins were revealed. For instance, FNR and CP29 possess nearly identical targeting sequences despite having no functional relationship (Fig. 7A). There is also a group of targeting sequences that show various degrees of similarity, particularly within the signal sequence and plastid-targeting domains of the transit peptide. Within this group, the targeting sequences of rpL21 and rpL3 (tBLASTn E value = 5e−59) are nearly identical, and these sequences share a high degree of similarity with the targeting sequences of an acyl carrier protein (E = 2e−29) and two different light-harvesting complex (LHC) subunits (E = 2e−39 and 4e−21). Interestingly, the targeting domain of the first LHCI-like sequence shares a greater degree of similarity with those of the acyl carrier protein and ribosomal proteins than with the targeting domain of the other LHCI sequence (or of any other LHC sequence in the database). This similarity even extends into the putative stop-transfer domain, a region not expected to be conserved. In other cases, a tBLASTn search with a specific targeting sequence allowed the detection, as expected, of isoforms and members of a multigene family, a result that is attributable to gene duplication events. This search approach was able to recognize a variety of different plastid-targeting proteins, although the E values were generally >1e−20; thus, some of this similarity could simply be due to the constraints placed upon these regions by amino acid composition. In marked contrast, many other targeting sequences produced no significant hits at all. 
With the exception of rpL3, which was represented by a single EST, the remainder of the EST clusters analyzed here comprised multiple overlapping reads, with clear evidence of a spliced leader sequence, eliminating clustering artifacts as an explanation for the observed similarity.

FIG. 7.

Alignment of targeting sequences from selected Euglena plastid-targeted proteins. (A) Comparison of FNR and CP29 targeting sequences. Identical amino acids are white on a black background. (B) Second group of proteins possessing similar targeting sequences. Identical amino acids compared to the top sequence are indicated by white letters on a black background. The hydrophobic regions of the signal sequence and stop-transfer domains are indicated by lines above the appropriate amino acids. The mature portions of the proteins, if shown, are indicated with double underlining.

DISCUSSION

The discovery of LHC precursors in Golgi dictyosomes of Euglena (53) and subsequent in vitro experiments demonstrated that the Euglena LHCII presequence does indeed possess a functional signal motif (38), an inference that is strongly supported by the study reported here. Although the presence of a signal sequence-like region was part of our selection criteria, we found no evidence in the entire database of a plastid-targeted protein that lacked a signal sequence, suggesting that in Euglena, all plastid proteins proceed to the organelle via the endomembrane system. Although some in vitro studies have suggested the potential direct import of proteins into Euglena plastids, thereby bypassing the ER (65), the bulk of relevant biochemical work indicates that transport via the endomembrane system is required for plastid targeting (38, 67, 69-71). The endomembrane system is also important for plastid targeting in all protists with complex secondary plastids, including those with three (49, 55) and four (4, 7, 16, 19, 37, 59, 78, 79) plastid membranes.

In Euglena, proteins targeted to the plastid do not fully insert into the ER lumen or the membrane during translation due to the presence of a stop-transfer domain, so the majority of the protein remains exposed in the cytoplasm (69). Indeed, in class I proteins, the presequence has a second hydrophobic region followed by positively charged amino acids, both of which are characteristics typical of stop-transfer sequences (14, 41). Although 2 of the 70 class I proteins lack positively charged amino acids immediately after the second TMH, such residues are not an absolute requirement for a stop-transfer function, with the effectiveness of targeting depending on a combination of hydrophobicity, length, and charge (14, 41, 60). The presence of a functioning stop-transfer motif in a plastid presequence is unique to Euglena and dinoflagellates. Both groups have three plastid membranes, leading Nassoury et al. (49) to suggest that the stop-transfer sequence arose from a mechanistic requirement driven by the number of plastid membranes. It is generally agreed that Euglena and dinoflagellates are phylogenetically distant; thus, the similarities between their targeting sequences, and presumably the underlying transport mechanisms, would appear to be convergent as part of a necessary step in protein targeting.

Although targeting in organisms with complex plastids first requires import of the protein into the ER, little is known about subsequent mechanisms of targeting to the plastid. In organisms with three plastid membranes, such as euglenophytes and dinoflagellates, targeting from the ER to the outer plastid membrane involves vesicular transport via the Golgi system (49, 53). The segregation of plastid-bound proteins into the proper vesicles may involve receptors located in the endomembrane system that recognize the transit peptide and direct the protein to its appropriate destination. This pathway is analogous to that in animal and fungal systems, where receptors within the endomembrane system, such as the classic mannose-6-phosphate receptor system for targeting to the lysosome (22), are able to recognize features of the protein and ensure proper localization. Ultimately, cytoplasmic sorting factors, such as adaptins (9), may play a role in the accumulation of plastid-targeted proteins and their segregation to vesicles destined for the plastid. Such cytosolic factors could participate in the recognition of receptors that bind to plastid-targeted proteins and/or specific motifs just beyond the stop-transfer domain of the targeted protein itself to facilitate targeting. One potential series of residues includes the cluster of basic amino acids that immediately follows the stop-transfer domain. The importance of short, cytoplasm-exposed targeting motifs for intracellular sorting is well known (9). For Euglena, Sláviková et al. (67) determined that this cytoplasm-exposed portion of the presequence is not required for plastid import in vitro, but they suggested that it may function in vesicle routing.

Of particular interest here is our discovery of plastid-targeted proteins lacking the putative stop-transfer sequence (class II), implying that these proteins are inserted entirely into the ER, leaving a soluble portion within the ER lumen and a membrane portion integrated within the ER membrane, once the signal sequence is removed. Given that the Euglena class II proteins comprise both soluble and membrane proteins, it is unlikely that other domains within the mature protein could impart a similar stop-transfer effect to compensate for the lack of such a region in the presequence. The targeting route for class II proteins is conceptually similar to the targeting of proteins to the remnant plastid (apicoplast) in apicomplexans; apicoplast proteins lack the stop-transfer sequence and are targeted to the plastid via the ER (19), presumably by vesicular transport. Thus, for correct targeting, the putative transit peptide, and possibly the mature protein, must contain features that would be recognized by specific cofactors or receptors that are localized to the ER lumen, not the cytoplasm. Since class II transit peptides are predicted to lack the stop-transfer sequence and thus the cytoplasm-exposed region just beyond, redirection to the plastid must be facilitated solely by interaction with targeting factors that bind to the TP and allow these precursors to “hitchhike” in vesicles with the class I proteins. An alternative, albeit unlikely, mechanism is that the class II signal sequence acts as a signal anchor, with the N terminus facing the ER lumen. However, in this orientation the transit peptide would be facing the cytoplasm and presumably would be inaccessible to the targeting machinery.

Even more surprising is the resemblance of this class of targeting sequence to those of dinoflagellates, whose plastid-targeted proteins also exhibit a similar proportion of presequences lacking stop-transfer domains (55), with the remainder resembling class I proteins. As possible explanations for the dinoflagellates, Patron et al. (55) ruled out the evolutionary history of the gene transfer or final destination of the protein, suggesting instead that the “physical characteristics” of the plastid-targeted protein may determine the nature of its presequence. In support of the latter hypothesis, they found that the class I and II distinction was conserved between proteins in two dinoflagellates examined. If “physical characteristics” were the main factor determining the mode of transport, then we would predict that Euglena would exhibit a similar distribution of proteins having class I and II presequences. Some similarities are clearly evident, such as with phosphoribulose kinase and oxygen-evolving enhancer 3 (PsbQ), which lack a stop-transfer sequence in both dinoflagellates and Euglena. However, other dinoflagellate proteins with class II (and III) targeting sequences are class I proteins in Euglena (acyl carrier protein, carbonic anhydrase, cytochrome c6, and the PSII 11-kDa protein). Although the sample size for comparison is small, there do not appear to be any obvious inherent functional or physical properties that would require a class I versus class II targeting sequence. In vitro import assays should help to define the functional requirements of the different classes of presequence and determine whether either is essential for the import of specific proteins.

With the exception of apicomplexans, complex plastids with four plastid membranes often have ribosomes attached to the outer membrane (chloroplast ER [CER]). However, the primary plastid-sorting mechanism must still occur after cotranslational import across/into the ER membrane, since in diatoms the signal sequences of ER and plastid-resident proteins are functionally equivalent (37), and in a raphidophyte, few ribosomes are bound to the CER (30). Thus, once inserted into the endomembrane system, the plastid-bound proteins still have to be targeted to and traverse at least three membranes, similar to the situation in Euglena and dinoflagellates. Though not involving the Golgi dictyosomes, a vesicular transfer between the CER and the third membrane has been proposed (23), and a recent report supports such a mechanism (37). Apicomplexans, in particular, provide a valuable model system for dissecting the targeting process in complex plastids with four membranes, with several studies indicating not only that there is partially redundant targeting information in the presequences of apicoplast-targeted proteins (26, 81, 82) but also that there is a distinction between the information for targeting and that for import into the apicoplast (26). Recent work has even identified proteins that interact with the TP and that may be involved in sorting from the ER to the apicoplast (82).

In Euglena, the region between the signal sequence and the stop-transfer sequence in class I proteins functions as a TP (67). This region and the TPs of class II proteins possess characteristics typical of most TPs. These similarities include enrichment in Ser/Thr (S/T bias) and Ala. S/T bias is a common feature of most transit peptides of plants and algae (2, 4, 11, 15, 20, 49, 54, 55, 59, 76). Some notable exceptions to this rule include apicomplexans (77, 78) and nucleomorph-encoded plastid proteins from the cryptomonad alga Guillardia theta (57). Replacement of all Ser/Thr residues in the TP of Plasmodium had no effect on plastid targeting, demonstrating a lack of a requirement for such residues (78). Although an elevated Ser/Thr content is evidently dispensable in apicomplexans, it remains one of the more consistent features of most TPs, which may reflect a requirement for phosphorylation-dependent binding of 14-3-3 proteins as part of a preinsertion guidance complex (46).

Euglena TPs also have an overall positive charge, an apparently universal feature of TPs (57), that is primarily due to a reduction in the content of acidic amino acids. Of particular interest is the asymmetric distribution of acidic amino acids in the TP, with the first two-thirds being deficient in such residues, whereas the remaining third has a composition resembling that of the mature protein. This asymmetry may reflect a distinction between functional TPs (with a bias against acidic residues) and regions having a different function. The importance of a TP depleted in acidic residues was demonstrated in Plasmodium, where the replacement of basic with acidic amino acids eliminated apicoplast targeting (19). Interestingly, Euglena TPs are also deficient in Lys (but not Arg) and have biases in favor of Ala and Pro compared to mature proteins, which are also features of the TPs of the chlorarachniophyte Bigelowiella natans (59). Some of the shared features of TPs, such as a bias against acidic amino acids and a bias in favor of some hydrophobic residues, may be due to a requirement for binding of import factors, such as molecular chaperones (Hsp70) (83). Although the biological significance of the biased amino acid composition in TPs is not entirely understood, and despite any differences in primary structure, TPs from diverse plastid types are functionally sufficient in heterologous import assays (3, 32, 42, 49, 67, 79).

The striking amino acid similarity between certain plastid-targeting sequences is surprising. In general, transit peptides lack evident sequence similarity, even among paralogs of the same gene family, so the detection of clusters of related targeting sequences may shed light on how targeting sequences were acquired following transfer of the endosymbiont's genes to the host nucleus during plastid evolution. Reports of highly similar plastid and mitochondrial TPs are relatively rare, but the examples can be separated into two categories. In the first case, homologs from different species exhibit a greater-than-expected similarity within the TP region compared to that of the mature proteins, which is attributed to a conserved functional role (80). The second category includes unrelated proteins that possess highly similar TPs (1, 5, 40, 45), which is what we observe in Euglena. This similarity is often attributed to exon shuffling, as introns commonly separate the transit peptide from the mature protein (36, 45). There are also reports of transit peptide acquisition through insertion into preexisting genes for plastid (5)- and mitochondrion-targeted (1) proteins. Thus, the newly transferred genes would acquire not only the targeting mechanism but also the regulatory sequences required for expression, in the so-called “lucky insertion scenario” (21). Although we lack the appropriate genomic information from Euglena to be able to completely assess the mechanism of TP acquisition, a genomic sequence for an LHCII gene of this organism does have an intron that roughly separates the predicted targeting domain from the mature protein (48), suggesting exon shuffling as a potential mode of TP acquisition. However, the similarity of the rpL3 and rpL21 presequences to a small portion of the LHCI mature protein (GFDPLGL) (Fig. 7) suggests that TP acquisition by insertion into a preexisting copy of the LHCI gene is also a strong possibility. 
The maintenance of a continued high degree of conservation between rpL21-rpL3 and CP29-FNR could also imply recent recombination, or perhaps alternative splicing, as described for rice mitochondrion-targeted rpS14 and SDHB proteins (40). The pronounced sequence conservation within these regions also raises the possibility that these targeting sequences have an additional function(s) in the cell, either before or after cleavage, as proposed for some mammalian signal sequences (10, 17).

In summary, we have characterized two distinct classes of Euglena plastid presequences, i.e., classes I and II, that differ by the presence and absence of a predicted stop-transfer sequence, respectively, revealing an additional level of complexity in the protein transport mechanism. In addition to enhancing our ability to predict Euglena presequences, we expect that the characteristics of these TPs will stimulate further import studies, both in vitro and in vivo, seeking to dissect the processes of targeting and import into the complex plastids of Euglena.

Acknowledgments

This work was carried out under the auspices of a Genome Canada large-scale genomics project, the Protist EST Program, with funding provided through Genome Atlantic and the Atlantic Innovation Fund. M.W.G. gratefully acknowledges salary support from the Canada Research Chairs Program and the Canadian Institute for Advanced Research (Program in Evolutionary Biology). D.G.D. also thanks the Natural Sciences and Engineering Research Council (NSERC) for ongoing support.

We are grateful to Patrick Keeling for sharing a paper on dinoflagellate targeting sequences prior to publication. We also thank Steve Heard and Penny Humby for helpful discussions on statistics. The technical assistance of H. Rissler, who isolated RNAs from Euglena for the construction of two of the five cDNA libraries sequenced for this study, is acknowledged.

Footnotes

Published ahead of print on 22 September 2006.

REFERENCES

  • 1. Adams, K. L., M. Rosenblueth, Y. L. Qiu, and J. D. Palmer. 2001. Multiple losses and transfers to the nucleus of two mitochondrial succinate dehydrogenase genes during angiosperm evolution. Genetics 158:1289-1300.
  • 2. Apt, K. E., D. Bhaya, and A. R. Grossman. 1994. Characterization of genes encoding the light-harvesting proteins in diatoms: biogenesis of the fucoxanthin chlorophyll a/c protein complex. J. Appl. Phycol. 6:225-230.
  • 3. Apt, K. E., N. E. Hoffman, and A. R. Grossman. 1993. The γ-subunit of R-phycoerythrin and its possible mode of transport into the plastid of red algae. J. Biol. Chem. 268:16208-16215.
  • 4. Apt, K. E., L. Zaslavkaia, J. C. Lippmeier, M. Lang, O. Kilian, R. Wetherbee, A. R. Grossman, and P. G. Kroth. 2002. In vivo characterization of diatom multipartite plastid targeting signals. J. Cell Sci. 115:4061-4069.
  • 5. Arimura, S.-I., S. Takusagawa, S. Hatano, M. Nakazono, A. Hirai, and N. Tsutsumi. 1999. A novel plant nuclear gene encoding chloroplast ribosomal protein S9 has a transit peptide related to that of rice chloroplast ribosomal protein L12. FEBS Lett. 450:231-234.
  • 6. Bendtsen, J. D., H. Nielsen, G. von Heijne, and S. Brunak. 2004. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340:783-795.
  • 7. Bhaya, D., and A. Grossman. 1991. Targeting proteins to diatom plastids involves transport through an endoplasmic reticulum. Mol. Gen. Genet. 229:400-404.
  • 8. Blobel, G. 1980. Intracellular protein topogenesis. Proc. Natl. Acad. Sci. USA 77:1496-1500.
  • 9. Bonifacino, J. S., and L. M. Traub. 2003. Signals for sorting of transmembrane proteins to endosomes and lysosomes. Annu. Rev. Biochem. 72:395-447.
  • 10. Braud, V. M., D. S. Allan, C. A. O'Callaghan, K. Soderstrom, A. D'Andrea, G. S. Ogg, S. Lazetic, N. T. Young, J. I. Bell, J. H. Phillips, L. L. Lanier, and A. J. McMichael. 1998. HLA-E binds to natural killer cell receptors CD94/NKG2A, B and C. Nature 391:795-799.
  • 11. Bruce, B. D. 2000. Chloroplast transit peptides: structure, function and evolution. Trends Cell Biol. 10:440-447.
  • 12. Cavalier-Smith, T. 2002. Chloroplast evolution: secondary symbiogenesis and multiple losses. Curr. Biol. 12:R62-R64.
  • 13. Chan, R. L., M. Keller, J. Canaday, J. H. Weil, and P. Imbault. 1990. Eight small subunits of Euglena ribulose-1,5-bisphosphate carboxylase-oxygenase are translated from a large messenger RNA as a polyprotein. EMBO J. 9:333-338.
  • 14. Chen, H., and D. A. Kendall. 1995. Artificial transmembrane segments. Requirements for stop transfer and polypeptide orientation. J. Biol. Chem. 270:14115-14122.
  • 15. Deane, J. A., M. Fraunholz, V. Su, U.-G. Maier, W. Martin, D. G. Durnford, and G. I. McFadden. 2000. Evidence for nucleomorph to host nucleus gene transfer: light-harvesting complex proteins from cryptomonads and chlorarachniophytes. Protist 151:239-252.
  • 16. DeRocher, A., C. B. Hagen, J. E. Froehlich, J. E. Feagin, and M. Parsons. 2000. Analysis of targeting sequences demonstrates that trafficking to the Toxoplasma gondii plastid branches off the secretory system. J. Cell Sci. 113:3969-3977.
  • 17. Eichler, R., O. Lenz, T. Strecker, M. Eickmann, H. D. Klenk, and W. Garten. 2003. Identification of Lassa virus glycoprotein signal peptide as a trans-acting maturation factor. EMBO Rep. 4:1084-1088.
  • 18. Emanuelsson, O., H. Nielsen, and G. von Heijne. 1999. ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci. 8:978-984.
  • 19. Foth, B. J., S. A. Ralph, C. J. Tonkin, N. S. Struck, M. Fraunholz, D. S. Roos, A. F. Cowman, and G. I. McFadden. 2003. Dissecting apicoplast targeting in the malaria parasite Plasmodium falciparum. Science 299:705-708.
  • 20. Franzen, L. G., J.-D. Rochaix, and G. von Heijne. 1990. Chloroplast transit peptides from the green alga Chlamydomonas reinhardtii share features with both mitochondrial and higher plant chloroplast presequences. FEBS Lett. 260:165-168.
  • 21. Gantt, J. S., S. L. Baldauf, P. J. Calie, N. F. Weeden, and J. D. Palmer. 1991. Transfer of rpl22 to the nucleus greatly preceded its loss from the chloroplast and involved the gain of an intron. EMBO J. 10:3073-3078.
  • 22. Ghosh, P., N. M. Dahms, and S. Kornfeld. 2003. Mannose 6-phosphate receptors: new twists in the tale. Nat. Rev. Mol. Cell Biol. 4:202-212.
  • 23. Gibbs, S. P. 1979. Route of entry of cytoplasmically synthesized proteins into chloroplasts of algae possessing chloroplast-ER. J. Cell Sci. 35:253-266.
  • 24. Grossman, A., A. Manodori, and D. Snyder. 1990. Light-harvesting proteins of diatoms: their relationship to the chlorophyll a/b binding proteins of higher plants and their mode of transport into plastids. Mol. Gen. Genet. 224:91-100.
  • 25. Hannaert, V., H. Brinkmann, U. Nowitzki, J. A. Lee, M. A. Albert, C. W. Sensen, T. Gaasterland, M. Muller, P. Michels, and W. Martin. 2000. Enolase from Trypanosoma brucei, from the amitochondriate protist Mastigamoeba balamuthi, and from the chloroplast and cytosol of Euglena gracilis: pieces in the evolutionary puzzle of the eukaryotic glycolytic pathway. Mol. Biol. Evol. 17:989-1000.
  • 26. Harb, O. S., B. Chatterjee, M. J. Fraunholz, M. J. Crawford, M. Nishi, and D. S. Roos. 2004. Multiple functionally redundant signals mediate targeting to the apicoplast in the apicomplexan parasite Toxoplasma gondii. Eukaryot. Cell 3:663-674.
  • 27. Henze, K., A. Badr, M. Wettern, R. Cerff, and W. Martin. 1995. A nuclear gene of eubacterial origin in Euglena gracilis reflects cryptic endosymbioses during protist evolution. Proc. Natl. Acad. Sci. USA 92:9122-9126.
  • 28. Houlne, G., and R. Schantz. 1987. Molecular analysis of the transcripts encoding the light-harvesting chlorophyll a/b protein in Euglena gracilis: unusual size of the mRNA. Curr. Genet. 12:611-616.
  • 29. Inagaki, J., Y. Fujita, T. Hase, and Y. Yamamoto. 2000. Protein translocation within chloroplast is similar in Euglena and higher plants. Biochem. Biophys. Res. Commun. 277:436-442.
  • 30. Ishida, K., T. Cavalier-Smith, and B. R. Green. 2000. Endomembrane structure and the chloroplast protein targeting pathway in Heterosigma akashiwo (Raphidophyceae, Chromista). J. Phycol. 36:1135-1144.
  • 31. Jackson-Constan, D., and K. Keegstra. 2001. Arabidopsis genes encoding components of the chloroplastic protein import apparatus. Plant Physiol. 125:1567-1576.
  • 32. Jakowitsch, J., C. Neumann-Spallart, Y. Ma, J. Steiner, H. E. Schenk, H. J. Bohnert, and W. Löffelhardt. 1996. In vitro import of pre-ferredoxin-NADP+-oxidoreductase from Cyanophora paradoxa into cyanelles and into pea chloroplasts. FEBS Lett. 381:153-155.
  • 33. Keegstra, K., and K. Cline. 1999. Protein import and routing systems of chloroplasts. Plant Cell 11:557-570.
  • 34. Keeling, P. J. 2004. Diversity and evolutionary history of plastids and their hosts. Am. J. Bot. 91:1481-1493.
  • 35. Kilian, O., and P. G. Kroth. 2003. Evolution of protein targeting into “complex” plastids: the “secretory transport hypothesis.” Plant Biol. (Stuttgart) 5:350-358.
  • 36. Kilian, O., and P. G. Kroth. 2004. Presequence acquisition during secondary endocytobiosis and the possible role of introns. J. Mol. Evol. 58:712-721.
  • 37. Kilian, O., and P. G. Kroth. 2005. Identification and characterization of a new conserved motif within the presequence of proteins targeted into complex diatom plastids. Plant J. 41:175-183.
  • 38. Kishore, R., U. S. Muchhal, and S. D. Schwartzbach. 1993. The presequence of Euglena LHCPII, a cytoplasmically synthesized chloroplast protein, contains a functional endoplasmic reticulum-targeting domain. Proc. Natl. Acad. Sci. USA 90:11845-11849.
  • 39. Krogh, A., B. Larsson, G. von Heijne, and E. L. Sonnhammer. 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305:567-580.
  • 40. Kubo, N., K. Harada, A. Hirai, and K.-I. Kadowaki. 1999. A single nuclear transcript encoding mitochondrial RPS14 and SDHB of rice is processed by alternative splicing: common use of the same mitochondrial targeting signal for different proteins. Proc. Natl. Acad. Sci. USA 96:9207-9211.
  • 41.Kuroiwa, T., M. Sakaguchi, K. Mihara, and T. Omura. 1991. Systematic analysis of stop-transfer sequence for microsomal membrane. J. Biol. Chem. 266:9251-9255. [PubMed] [Google Scholar]
  • 42.Lang, M., K. E. Apt, and P. G. Kroth. 1998. Protein transport into “complex” diatom plastids utilizes two different targeting signals. J. Biol. Chem. 273:30973-30978. [DOI] [PubMed] [Google Scholar]
  • 43.Lang, M., and P. G. Kroth. 2001. Diatom fucoxanthin chlorophyll a/c-binding protein (FCP) and land plant light-harvesting proteins use a similar pathway for thylakoid membrane insertion. J. Biol. Chem. 276:7985-7991. [DOI] [PubMed] [Google Scholar]
  • 44.Lin, Q., L. Ma, W. Burkhart, and L. L. Spremulli. 1994. Isolation and characterization of cDNA clones for chloroplast translational initiation factor-3 from Euglena gracilis. J. Biol. Chem. 269:9436-9444. [PubMed] [Google Scholar]
  • 45.Long, M., S. J. de Souza, C. Rosenberg, and W. Gilbert. 1996. Exon shuffling and the origin of the mitochondrial targeting function in plant cytochrome c1 precursor. Proc. Natl. Acad. Sci. USA 93:7727-7731. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.May, T., and J. Soll. 2000. 14-3-3 proteins form a guidance complex with chloroplast precursor proteins in plants. Plant Cell 12:53-64. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.McFadden, G. I. 1999. Plastids and protein targeting. J. Eukaryot. Microbiol. 46:339-346. [DOI] [PubMed] [Google Scholar]
  • 48.Muchhal, U. S., and S. D. Schwartzbach. 1994. Characterization of the unique intron-exon junctions of Euglena gene(s) encoding the polyprotein precursor to the light-harvesting chlorophyll a/b binding protein of photosystem II. Nucleic Acids Res. 22:5737-5744. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Nassoury, N., M. Cappadocia, and D. Morse. 2003. Plastid ultrastructure defines the protein import pathway in dinoflagellates. J. Cell Sci. 116:2867-2874. [DOI] [PubMed] [Google Scholar]
  • 50.Nassoury, N., and D. Morse. 2005. Protein targeting to the chloroplasts of photosynthetic eukaryotes: getting there is half the fun. Biochim. Biophys. Acta 1743:5-19. [DOI] [PubMed] [Google Scholar]
  • 51.Nielsen, H., and A. Krogh. 1998. Prediction of signal peptides and signal anchors by a hidden Markov model. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6:122-130. [PubMed] [Google Scholar]
  • 52.Nowitzki, U., G. Gelius-Dietrich, M. Schwieger, K. Henze, and W. Martin. 2004. Chloroplast phosphoglycerate kinase from Euglena gracilis: endosymbiotic gene replacement going against the tide. Eur. J. Biochem. 271:4123-4131. [DOI] [PubMed] [Google Scholar]
  • 53.Osafune, T., S. Sumida, J. A. Schiff, and E. Hase. 1991. Immunolocalization of LHCP II apoprotein in the Golgi during light-induced chloroplast development in non-dividing Euglena cells. J. Electron Microsc. 40:41-47. [Google Scholar]
  • 54.Pancic, P. G., and H. Strotmann. 1993. Structure of the nuclear encoded γ subunit of CFoCF1 of the diatom Odontella sinensis including its presequence. FEBS Lett. 320:61-66. [DOI] [PubMed] [Google Scholar]
  • 55.Patron, N. J., R. F. Waller, J. M. Archibald, and P. J. Keeling. 2005. Complex protein targeting to dinoflagellate plastids. J. Mol. Biol. 348:1015-1024. [DOI] [PubMed] [Google Scholar]
  • 56.Plaumann, M., B. Pelzer-Reith, W. F. Martin, and C. Schnarrenberger. 1997. Multiple recruitment of class-I aldolase to chloroplasts and eubacterial origin of eukaryotic class-II aldolases revealed by cDNAs from Euglena gracilis. Curr. Genet. 31:430-438. [DOI] [PubMed] [Google Scholar]
  • 57.Ralph, S. A., B. J. Foth, N. Hall, and G. I. McFadden. 2004. Evolutionary pressures on apicoplast transit peptides. Mol. Biol. Evol. 21:2183-2194. [DOI] [PubMed] [Google Scholar]
  • 58.Robinson, C. 2000. The twin-arginine translocation system: a novel means of transporting folded proteins in chloroplasts and bacteria. Biol. Chem. 381:89-93. [DOI] [PubMed] [Google Scholar]
  • 59.Rogers, M. B., J. M. Archibald, M. A. Field, C. Li, B. Striepen, and P. J. Keeling. 2004. Plastid-targeting peptides from the chlorarachniophyte Bigelowiella natans. J. Eukaryot. Microbiol. 51:529-535. [DOI] [PubMed] [Google Scholar]
  • 60.Saaf, A., E. Wallin, and G. von Heijne. 1998. Stop-transfer function of pseudo-random amino acid segments during translocation across prokaryotic and eukaryotic membranes. Eur. J. Biochem. 251:821-829. [DOI] [PubMed] [Google Scholar]
  • 61.Santillán-Torres, J. L., A. Atteia, M. G. Claros, and D. González-Halphen. 2003. Cytochrome f and subunit IV, two essential components of the photosynthetic bf complex typically encoded in the chloroplast genome, are nucleus-encoded in Euglena gracilis. Biochim. Biophys. Acta 1604:180-189. [DOI] [PubMed] [Google Scholar]
  • 62.Schnell, D. J. 1998. Protein targeting to the thylakoid membrane. Annu. Rev. Plant Physiol. Plant Mol. Biol. 49:97-126. [DOI] [PubMed] [Google Scholar]
  • 63.Schwartzbach, S. D., T. Osafune, and W. Löffelhardt. 1998. Protein import into cyanelles and complex chloroplasts. Plant Mol. Biol. 38:247-263. [PubMed] [Google Scholar]
  • 64.Sharif, A. L., A. G. Smith, and C. Abell. 1989. Isolation and characterisation of a cDNA clone for a chlorophyll synthesis enzyme from Euglena gracilis. The chloroplast enzyme hydroxymethylbilane synthase (porphobilinogen deaminase) is synthesised with a very long transit peptide in Euglena. Eur. J. Biochem. 184:353-359. [DOI] [PubMed] [Google Scholar]
  • 65.Shashidhara, L. S., S. H. Lim, J. B. Shackleton, C. Robinson, and A. G. Smith. 1992. Protein targeting across the three membranes of the Euglena chloroplast envelope. J. Biol. Chem. 267:12885-12891. [PubMed] [Google Scholar]
  • 66.Shigemori, Y., J. Inagaki, H. Mori, M. Nishimura, S. Takahashi, and Y. Yamamoto. 1994. The presequence of the precursor to the nucleus-encoded 30 kDa protein of photosystem II in Euglena gracilis Z includes two hydrophobic domains. Plant Mol. Biol. 24:209-215. [DOI] [PubMed] [Google Scholar]
  • 67.Sláviková, S., R. Vacula, Z. Fang, T. Ehara, T. Osafune, and S. D. Schwartzbach. 2005. Homologous and heterologous reconstitution of Golgi to chloroplast transport and protein import into the complex chloroplasts of Euglena. J. Cell Sci. 118:1651-1661. [DOI] [PubMed] [Google Scholar]
  • 68.Steiner, J. M., and W. Löffelhardt. 2005. Protein translocation into and within cyanelles. Mol. Membr. Biol. 22:123-132. [DOI] [PubMed] [Google Scholar]
  • 69.Sulli, C., Z. Fang, U. Muchhal, and S. D. Schwartzbach. 1999. Topology of Euglena chloroplast protein precursors within endoplasmic reticulum to Golgi to chloroplast transport vesicles. J. Biol. Chem. 274:457-463. [DOI] [PubMed] [Google Scholar]
  • 70.Sulli, C., and S. D. Schwartzbach. 1995. The polyprotein precursor to the Euglena light-harvesting chlorophyll a/b-binding protein is transported to the Golgi apparatus prior to chloroplast import and polyprotein processing. J. Biol. Chem. 270:13084-13090. [DOI] [PubMed] [Google Scholar]
  • 71.Sulli, C., and S. D. Schwartzbach. 1996. A soluble protein is imported into Euglena chloroplasts as a membrane-bound precursor. Plant Cell 8:43-53. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Tessier, L. H., M. Keller, R. L. Chan, R. Fournier, J. H. Weil, and P. Imbault. 1991. Short leader sequences may be transferred from small RNAs to pre-mature mRNAs by trans-splicing in Euglena. EMBO J. 10:2621-2625. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Vacula, R., J. M. Steiner, J. Krajcovic, L. Ebringer, and W. Löffelhardt. 1999. Nucleus-encoded precursors to thylakoid lumen proteins of Euglena gracilis possess tripartite presequences. DNA Res. 6:45-49. [DOI] [PubMed] [Google Scholar]
  • 74.van Dooren, G. G., S. D. Schwartzbach, T. Osafune, and G. I. McFadden. 2001. Translocation of proteins across the multiple membranes of complex plastids. Biochim. Biophys. Acta 1541:34-53. [DOI] [PubMed] [Google Scholar]
  • 75.von Heijne, G. 1990. The signal peptide. J. Membr. Biol. 115:195-201. [DOI] [PubMed] [Google Scholar]
  • 76.von Heijne, G., J. Steppuhn, and R. G. Herrmann. 1989. Domain structure of mitochondrial and chloroplast targeting peptides. Eur. J. Biochem. 180:535-545. [DOI] [PubMed] [Google Scholar]
  • 77.Waller, R. F., P. J. Keeling, R. G. Donald, B. Striepen, E. Handman, N. Lang-Unnasch, A. F. Cowman, G. S. Besra, D. S. Roos, and G. I. McFadden. 1998. Nuclear-encoded proteins target to the plastid in Toxoplasma gondii and Plasmodium falciparum. Proc. Natl. Acad. Sci. USA 95:12352-12357. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Waller, R. F., M. B. Reed, A. F. Cowman, and G. I. McFadden. 2000. Protein trafficking to the plastid of Plasmodium falciparum is via the secretory pathway. EMBO J. 19:1794-1802. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Wastl, J., and U.-G. Maier. 2000. Transport of proteins into cryptomonads complex plastids. J. Biol. Chem. 275:23194-23198. [DOI] [PubMed] [Google Scholar]
  • 80.Wolter, F. P., C. C. Fritz, L. Willmitzer, J. Schell, and P. H. Schreier. 1988. rbcS genes in Solanum tuberosum: conservation of transit peptide and exon shuffling during evolution. Proc. Natl. Acad. Sci. USA 85:846-850. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Yung, S., T. R. Unnasch, and N. Lang-Unnasch. 2001. Analysis of apicoplast targeting and transit peptide processing in Toxoplasma gondii by deletional and insertional mutagenesis. Mol. Biochem. Parasitol. 118:11-21. [DOI] [PubMed] [Google Scholar]
  • 82.Yung, S. C., T. R. Unnasch, and N. Lang-Unnasch. 2003. Cis and trans factors involved in apicoplast targeting in Toxoplasma gondii. J. Parasitol. 89:767-776. [DOI] [PubMed] [Google Scholar]
  • 83.Zhang, X. P., and E. Glaser. 2002. Interaction of plant mitochondrial and chloroplast signal peptides with the Hsp70 molecular chaperone. Trends Plant Sci. 7:14-21. [DOI] [PubMed] [Google Scholar]

Articles from Eukaryotic Cell are provided here courtesy of American Society for Microbiology (ASM)
