Abstract
Programmed translational frameshifts have been identified in genes from a broad range of organisms, but typically only a very few genes in a given organism require a frameshift for expression. In contrast, a recent analysis of gene sequences available in GenBank from ciliates in the genus Euplotes indicated that >5% required one or more +1 translational frameshifts to produce their predicted protein products. However, this sample of genes was nonrandom, biased, and derived from multiple Euplotes species. To test whether there truly is an abundance of frameshift genes in Euplotes, and to more accurately assess their frequency, we sequenced a random sample of 25 cloned genes/macronuclear DNA molecules from Euplotes crassus. Three new candidate +1 frameshift genes were identified in the sample that encode a membrane occupation and recognition nexus (MORN) repeat protein, a C2H2-type zinc finger protein, and a Ser/Thr protein kinase. Reverse transcription-PCR analyses indicate that all three genes are expressed in vegetatively proliferating cells and that the mRNAs retain the requirement of a frameshift. Although the sample of sequenced genes is relatively small, the results indicate that the frequency of genes requiring frameshifts in E. crassus is between 3.7% and 31.7% (at a 95% confidence interval). The current and past data also indicate that frameshift sites are found predominantly in genes that likely encode nonabundant proteins in the cell.
During the translation of an mRNA, a shift in reading frame is usually a catastrophic event, often resulting in a truncated and/or nonfunctional protein. Translational frameshifting is infrequent during the translation of most mRNAs (<5 × 10⁻⁵ frameshifts per codon translated [25]), but a number of mRNAs require a frameshift to express a functional protein and appear to have evolved to stimulate frameshifting. Such programmed translational frameshifting (reviewed in references 13 and 32) is relatively rare but is phylogenetically widespread, as examples are known in prokaryotes, eukaryotes, and a number of mobile genetic elements. Sequence elements within the mRNA facilitate programmed frameshifting, and these typically reside at the site of the frameshift, but more distally located sequences can also be involved.
A number of genes in ciliated protozoa of the genus Euplotes (class Spirotrichea) have been identified that appear to require a +1 translational frameshift to produce their protein products. The putative +1-frameshift genes encode the regulatory subunit of cyclic AMP-dependent protein kinase and a nuclear protein kinase of Euplotes octocarinatus (39, 40), a La motif protein (p43) in Euplotes aediculatus (1), the Euplotes crassus Tec2 transposon ORF2 protein (11, 18), and the reverse transcriptase subunits of telomerase (TERT) in three euplotid species (20, 29, 42). Since the complete sequences of less than 100 Euplotes genes have been determined, it appears that frameshifting is unusually common in euplotids, with perhaps >5% of genes requiring a frameshift for expression (reviewed in reference 22).
Neither the mechanism nor the sequence requirements for the +1 frameshift in Euplotes genes are known. However, all of the genes have a lysine codon (AAA) followed by a termination codon (TAA, except for one instance in which it is TAG) at the end of the initial open reading frame (0 frame). This “Euplotes frameshift motif” (5′-AAA-TA(A/g)-3′; three-base groupings denote codons in the 0 frame) bears similarities to the “shifty stop” type of sequence elements required for +1 frameshifting in genes of other organisms (reviewed in references 6 and 36). A shifty stop typically contains a codon that would allow its cognate tRNA positioned in the ribosome P-site to undergo a +1 shift in reading frame and still maintain pairing with two bases in the mRNA; the AAA lysine codon of the Euplotes frameshift motif fulfills this criterion. The second feature is a poorly recognized termination tetranucleotide (i.e., the termination codon plus the following nucleotide), which is thought to slow or stall the ribosome, allowing an opportunity for a shift in reading frame. Here the Euplotes frameshift motif does not appear to conform to a shifty stop site, as TAA-A is most frequently found at frameshift sites, and this is the most frequent tetranucleotide at true sites of translation termination (22). It is possible that another undefined feature of the Euplotes frameshift mRNAs is responsible for slowing translation. Alternatively, it has been suggested that a second unusual genetic feature of Euplotes, stop-codon reassignment, is involved in slowing translation and promoting the frameshifts (22). Euplotids have reassigned the UGA stop codon of the universal code so that it now encodes cysteine (16, 28). This has occurred, in part, as a result of changes to the single eukaryotic translation release factor 1 protein (eRF1) so that it no longer recognizes the UGA stop codon (9, 21, 34a). 
It is possible that these alterations to Euplotes eRF1 have also impaired its ability to recognize the remaining UAA and UAG stop codons. If so, translation termination would be a generally slow process in Euplotes, and encountering the stop codon within the frameshift motif would provide a pause that facilitates the +1 frameshift.
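The re-pairing criterion for the P-site codon described above can be illustrated with a small Python sketch. It is a deliberate simplification: it compares the original codon with the codon read after a +1 slip and counts identical positions as a stand-in for retained codon-anticodon pairing, ignoring wobble and tRNA modifications. The function name is hypothetical.

```python
def repairing_after_plus1(p_site_codon: str, next_base: str) -> tuple[str, int]:
    """Return the codon read after a +1 slip and the number of positions
    at which it is identical to the original P-site codon.

    Simplifying assumption: identical bases stand in for positions where the
    P-site tRNA could still pair after slipping one nucleotide downstream.
    """
    codon = p_site_codon.upper()
    # After a +1 slip, the tRNA sits over bases 2-3 of the codon plus the next base.
    shifted = codon[1:] + next_base.upper()
    matches = sum(a == b for a, b in zip(codon, shifted))
    return shifted, matches


# Euplotes frameshift motif: AAA followed by the T of the TAA/TAG stop codon.
print(repairing_after_plus1("AAA", "T"))   # ('AAT', 2)
# An arbitrary codon for contrast (not taken from the study).
print(repairing_after_plus1("GCA", "T"))   # ('CAT', 0)
```

Under this simplified model, AAA keeps two matching positions after the slip, which is the property attributed to the lysine codon of the motif.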
Whatever the mechanism, the apparent high frequency of euplotid genes requiring frameshifts is unprecedented. While the current data suggest that ∼5% of euplotid genes require one or more frameshifts for expression, there are a number of reasons to view this number with suspicion. First, it is based on a relatively small sample of 67 genes (22). Second, the gene sequences derive from seven different Euplotes species and, in some cases, orthologous genes were included from the different species. Third, the gene sample was not random, and in all likelihood was biased towards highly expressed genes, as many of the genes encode tubulins, histones, and proteins involved in translation. To more accurately assess the frequency of frameshift-requiring genes in a single Euplotes species, 25 randomly selected Euplotes crassus macronuclear chromosomes, which typically contain single genes, were completely sequenced. Three novel genes requiring +1 translational frameshifts have been identified, all of which are shown to be expressed in vegetatively growing cells. The results support a high frequency of +1 translational frameshifting in euplotids and, indeed, suggest that the frequency of such genes may exceed 10%. The functions of the encoded frameshift proteins are also discussed in regard to the possible role of frameshifting in the coordinate regulation of gene expression.
MATERIALS AND METHODS
Cells and nucleic acid purification.
Euplotes crassus cells were grown using the alga Dunaliella salina as the food source in artificial seawater as described previously (34), except that Reef Crystals (Aquarium Systems, Mentor, OH) served as the base for the artificial seawater and vitamin B12 was omitted. Total cellular DNA used for the construction of macronuclear clones was isolated from E. crassus strain X1 (14) as described previously (23). Total cellular RNA was isolated from late-log-phase E. crassus strain CT5 using Tri reagent (Molecular Research Center, Inc., Cincinnati, OH) and subsequently treated with TURBO DNA-free DNase (Ambion, Inc., Austin, TX), according to the manufacturers' protocols.
Cloning of macronuclear DNA molecules.
To construct small recombinant libraries of macronuclear DNA molecules, the single-stranded regions of telomeres were removed by treatment of 6 μg of E. crassus strain X1 DNA with 10 units of T4 DNA polymerase for 15 min at 37°C in the presence of all four deoxynucleotide triphosphates at a final concentration of 200 μM each. The DNA was then ligated into either the SmaI or HincII site of the pBluescript SK(+) phagemid (Stratagene, La Jolla, CA), transformed into Escherichia coli TOP10 chemically competent cells (Invitrogen, Carlsbad, CA), and the cells were spread on plates containing 50 μg/ml ampicillin and 40 μl of 20 μg/ml X-gal (5-bromo-4-chloro-3-indolyl-β-d-galactopyranoside). White colonies were randomly selected and expanded, and DNA was prepared using either the Wizard Plus Miniprep kit (Promega, Madison, WI) or the QIAprep Spin Miniprep kit (QIAGEN, Valencia, CA). Sizes of macronuclear DNA inserts were determined by digestion with either EcoRI + BamHI or KpnI + BamHI restriction enzymes and electrophoresis on 0.8% agarose gels prepared and run in 1× TBE (89 mM Tris, 89 mM H3BO3, and 2 mM disodium EDTA; pH 8.3).
DNA sequencing.
All DNA sequencing was performed by the University of Connecticut Health Center Molecular Core facility using the Taq Dyedeoxy Termination cycle sequencing kit (Perkin Elmer Cetus, Norwalk, CT). For sequencing of the cloned macronuclear DNA molecules, the initial reactions employed the T3 (5′-ATTAACCCTCACTAAAGGGA-3′) and T7 (5′-TAATACGACTCACTATAGGG-3′) sequencing primers. As necessary, additional sequencing reactions were performed to extend the sequences, using oligonucleotide primers that were designed based on the initial sequence reads, until the complete sequences of the macronuclear DNA molecules were obtained. Five clones (pEC4, pEC5, pEC8, pEC9, and pEC10) were lost prior to the completion of sequencing. In these instances, the missing segments of DNA were obtained by PCR from total cellular DNA, and the PCR products were directly sequenced to complete the sequences of the macronuclear DNA molecules. All primers for sequencing and PCR were purchased from Invitrogen, and their sequences are available on request. The sequences of the macronuclear clones have been deposited in GenBank under accession numbers DQ114948 to DQ114975.
PCR and reverse transcription-PCR.
PCR was carried out using 100 ng of E. crassus strain X1 genomic DNA as the substrate and KlenTaq DNA polymerase (Sigma, St. Louis, MO) under conditions specified by the manufacturer. Twenty-five cycles of PCR were carried out, with a cycle consisting of a 95°C denaturation step for 1 min, a 1-min annealing step, and a 72°C elongation step for 30 seconds to 1 min, depending on the length of the expected product. The temperature for the annealing step was adjusted based on the G+C content of the primers. For sequencing, PCRs were typically run on a low-melting-point agarose (Invitrogen) gel, and the PCR product was excised and purified as described by Qian and Wilkinson (33).
Reverse transcription (RT)-PCR was performed using the SuperScript One-Step RT-PCR with Platinum Taq kit (Invitrogen). The reactions were performed according to the manufacturer's protocol, using 200 ng of E. crassus strain CT5 (mating type III) total RNA as the substrate and 30 cycles of PCR following the reverse transcription step. To assess any possible DNA contamination of the RNA preparation, control reactions lacking the reverse transcription step were performed by adding the substrate RNA to reactions after the 94°C step that inactivates the reverse transcriptase enzyme, but prior to PCR amplification.
The following pairs of oligonucleotides were used for genomic PCR and RT-PCR analyses of the pEC2, pEC14, and pEC26 putative frameshift genes, respectively: pEC2F (5′-AGGAGGCATTCCCACTTTTG-3′) and pEC2R (5′-TGATGAAGCAGAAGCTGGTG-3′), pEC14F (5′-ACTCATCCATGCAGACGGTG-3′) and pEC14R (5′-TTTTTCCAAATTCCCTCTCG-3′), and R3EC26 (5′-TATCCCTGGGAATGCACAAA-3′) and 3EC26 (5′-TGGTAGTCCTGTTCCTTTCC-3′).
Bioinformatic and statistical analyses.
A confidence interval for the frequency of E. crassus frameshift genes was calculated using the Blyth-Still-Casella method (7, 8) and StatXact-4 for Windows software (Cytel Software Corp., Cambridge, MA).
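As a rough illustration of how an exact binomial confidence interval can be computed, the Python sketch below implements the Clopper-Pearson interval using only the standard library. This is not the Blyth-Still-Casella method used in the study, which generally gives somewhat narrower intervals; the function names are hypothetical, and the example input (3 frameshift genes among 23 protein-coding molecules) is the count reported in the Discussion.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_p(k: int, n: int, target: float) -> float:
    """Find p such that binom_cdf(k, n, p) == target by bisection.

    The binomial CDF decreases monotonically as p increases,
    so a simple bisection on [0, 1] converges.
    """
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid  # CDF still too high, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Two-sided Clopper-Pearson interval for k successes in n trials."""
    lower = 0.0 if k == 0 else solve_p(k - 1, n, 1 - alpha / 2)
    upper = 1.0 if k == n else solve_p(k, n, alpha / 2)
    return lower, upper


lower, upper = clopper_pearson(3, 23)
print(f"95% CI: {lower:.1%} to {upper:.1%}")
```

For 3 of 23 this gives an interval of about 3% to 34%, slightly wider than the Blyth-Still-Casella interval reported in the study, as expected for the more conservative Clopper-Pearson method.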
Sequences of the macronuclear DNA molecules, with telomeric repeats removed, were used in BLASTx and BLASTn analyses (see reference 27) of the nonredundant protein and nucleotide GenBank databases at the NCBI website (http://www.ncbi.nlm.nih.gov/blast/). Default parameters were employed, except that the euplotid nuclear genetic code was employed in BLASTx searches. For genes that failed to produce strong matches, selected long open reading frames were also used in BLASTp searches of the GenBank nonredundant protein database, BLASTp searches of the Tetrahymena thermophila preliminary gene predictions generated by The Institute for Genome Research (August 2004; http://tigrblast.tigr.org/er-blast/index.cgi?project=ttg), and tBLASTn searches of the Paramecium tetraurelia macronuclear genome sequences (http://paramecium.cgm.cnrs-gif.fr/blast/) and the Tetrahymena thermophila macronuclear genome (Assembly 2, November 2003; http://tigrblast.tigr.org/er-blast/index.cgi?project=ttg). In all BLAST analyses, only matches with expect (E) values of <10⁻⁶ were considered significant. In cases where a frameshift was suspected, predicted proteins were generated assuming that the +1 frameshift occurs following the incorporation of the lysine residue (AAA codon) of the 5′-AAA-TA(A/g)-3′ frameshift motif. The macronuclear sequences were also searched for tRNA genes using tRNAscan-SE 1.21 (26) at http://lowelab.ucsc.edu/tRNAscan-SE/, with a score of >40 considered significant, and some putative open reading frames (ORFs) were analyzed for conserved PROSITE domains and motifs (http://au.expasy.org/prosite/).
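The conceptual translation rule described above (euplotid nuclear code, with the +1 shift taken immediately after the AAA lysine codon of the motif) can be sketched in Python as follows. The codon table follows the euplotid nuclear code (TGA read as cysteine). The function name, the choice to shift at every in-frame AAA-TAA or AAA-TAG, and the toy sequence are illustrative assumptions; a real analysis would shift only at sites supported by homology evidence.

```python
BASES = "TCAG"
# Euplotid nuclear code: identical to the standard code except TGA = Cys.
EUPLOTID_AA = "FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: EUPLOTID_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, allow_frameshift: bool = True) -> str:
    """Translate from position 0 using the euplotid code.

    If allow_frameshift is True, a TAA/TAG stop directly preceded by an
    in-frame AAA codon is treated as a +1 frameshift site: the lysine is
    incorporated, one nucleotide is skipped, and translation continues.
    (Assumption for illustration: every such site is treated as a frameshift.)
    """
    seq = seq.upper()
    protein = []
    i = 0
    prev_codon = None
    while i + 3 <= len(seq):
        codon = seq[i : i + 3]
        aa = CODON_TABLE[codon]
        if aa == "*":
            if allow_frameshift and prev_codon == "AAA":
                i += 1            # slip one nucleotide into the +1 frame
                prev_codon = None  # do not shift twice at the same site
                continue
            break                 # ordinary termination
        protein.append(aa)
        prev_codon = codon
        i += 3
    return "".join(protein)


# Toy sequence (made up): ATG GCA AAA TAA ... with a +1 ORF continuing past the stop.
toy = "ATGGCAAAATAAGTGAGGTTAG"
print(translate(toy, allow_frameshift=False))  # MAK
print(translate(toy))                          # MAKKCG
```

In the toy example the 0 frame ends at AAA-TAA, and the +1 frame continues through a TGA codon that is read as cysteine before terminating at TAG.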
WebLogo (10, 35) (http://weblogo.berkeley.edu/logo.cgi) was employed to search for conserved sequences at defined distances from frameshift sites. To search for conserved sequence elements present at variable distances from frameshift sites, ClustalW (41) was used to align the frameshift sequences using both default parameters and reduced gap creation/extension penalties. Multiple Em for Motif Elicitation, version 3 (MEME; http://meme.sdsc.edu/meme/website/meme.html) (5) was also used to search for conserved elements using a variety of parameters.
RESULTS
Generation and sequencing of a random sample of cloned macronuclear DNA molecules.
To more accurately assess the frequency of Euplotes crassus genes requiring a translational frameshift for expression, small libraries of complete macronuclear DNA molecules from strain X1 (14) were constructed using the pBluescript SK(+) phagemid as a cloning vector (pEC clones). Twenty-six clones were randomly selected from the libraries, and the sizes of their macronuclear inserts were determined by restriction digestion followed by agarose gel electrophoresis. The inserts ranged from 431 bp to 5,390 bp in size (Table 1), with the average insert size being 1,690 bp. The average size for macronuclear DNA molecules in E. crassus has not been determined, but a previous electron microscopic analysis of macronuclear DNA from the related species E. aediculatus indicated an average size of 1.84 kbp (37). Since the macronuclear DNAs of these two Euplotes species show a very similar size distribution in agarose gel electrophoresis, the inserts of the 26 selected clones appear to be a reasonable sample of the macronuclear genome.
TABLE 1.
Characteristics of sequenced macronuclear DNA molecules
Clone/GenBank no. | Insert size (bp) | Long ORF size (bp)ᵃ | Function/motif | BLAST expect value (E)
---|---|---|---|---
pEC1/DQ114948 | 545 | | U2 snRNA | 4e−28
pEC2/DQ114952 | 1,101 | 915ᵇ | Probable zinc finger motif; Tetrahymena/Paramecium homologs | 2e−9
pEC3/DQ114953 | 507 | 294ᶜ | |
pEC4/DQ114951 | 1,445 | 1,266 | Elongation factor Tu | e−128
pEC5/DQ114950 | 2,387 | 1,086 | Tetrahymena/Paramecium homologs | 1.3e−9
pEC7/DQ114954 | 912 | 804 | |
pEC8/DQ114955 | 2,016 | 960ᶜ | |
pEC9/DQ114957 | 1,485 | 531ᶜ | |
pEC10/DQ114956 | 2,520 | 861ᶜ | Zinc finger protein | 4e−16
pEC11 | ∼4,200 | | Sequence not determined |
pEC12/DQ114961 | 431 | | tRNA-Ile (AAU anticodon) | 73.53ᵈ
pEC13/DQ114958 | 1,213 | 1,062 | DHHC zinc finger, palmitoyltransferase family | e−17
pEC14/DQ114962 | 1,242 | 1,065ᵇ | Phosphatidylinositol-4-phosphate 5-kinase/MORN repeat | 3e−65
pEC15/DQ114960 | 1,186 | 513 | |
pEC16/DQ114963 | 1,408 | 1,197 | |
pEC17/DQ114966 | 3,025 | 1,497 | |
pEC18/DQ114968 | 1,682 | 609 | |
pEC19/DQ114964 | 939 | 696 | Ubiquinone biosynthesis protein COQ4 homolog | 4e−39
pEC20/DQ114965 | 664 | 462 | Nucleotide diphosphate kinase | 2e−45
pEC21/DQ114967 | 1,368 | 903 | |
pEC22/DQ114970 | 1,775 | 1,611 | Adenosine deaminase/Cat eye syndrome protein | 4e−53
pEC23/DQ114974 | 1,718 | 1,533 | Calcium-dependent protein kinase | 9e−64
pEC24/DQ114972 | 956 | 447 | |
pEC25/DQ114971 | 2,562 | 1,248ᶜ | Tetrahymena micronuclear linker histone polyprotein | 9e−7
pEC26/DQ114969 | 5,390 | 2,793ᵇ | Serine/threonine protein kinase, SNF-1-like | 2e−49
pEC27/DQ114975 | 1,265 | 720 | C2H2 zinc finger protein | 4e−7

ᵃ Longest ORF, defined as starting with an ATG and ending in a termination codon, unless otherwise stated.
ᵇ Putative frameshift gene; size listed represented the total ORF length assuming frameshifting to join separate reading frames.
ᶜ ORF does not begin with ATG.
ᵈ tRNAscan-SE Cove score (>40 is significant).
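As a quick arithmetic check, the reported insert size range and average can be recomputed from the insert sizes listed in Table 1. The short Python sketch below does this; the variable names are illustrative, and the approximate size of the unsequenced pEC11 insert (about 4,200 bp) is entered as 4200.

```python
from statistics import mean

# Insert sizes (bp) for the 26 selected clones, in the order listed in Table 1.
INSERT_SIZES_BP = [
    545, 1101, 507, 1445, 2387, 912, 2016, 1485, 2520, 4200,
    431, 1213, 1242, 1186, 1408, 3025, 1682, 939, 664, 1368,
    1775, 1718, 956, 2562, 5390, 1265,
]

average = mean(INSERT_SIZES_BP)
print(f"{len(INSERT_SIZES_BP)} inserts, "
      f"range {min(INSERT_SIZES_BP)}-{max(INSERT_SIZES_BP)} bp, "
      f"mean {average:.0f} bp")
```

The result agrees with the stated range of 431 bp to 5,390 bp and the average of 1,690 bp.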
Using a primer-walking strategy (see Materials and Methods), the complete sequences of the macronuclear inserts of 25 of the clones were determined. All of the clones appear to contain inserts representing complete macronuclear DNA molecules, based on the presence of telomeric repeat sequences (5′-CCCCAAAA-3′) at both ends of each insert (one possible exception, pEC26, is discussed below).
Functions of the macronuclear DNA molecules.
To determine the possible functions of any genes within the macronuclear molecules, BLASTn and BLASTx searches (3) of the GenBank nonredundant nucleotide and protein databases, respectively, were carried out. In addition, the program tRNAscan-SE (26) was used to search for tRNA genes. In cases where these searches failed to identify possible gene functions, tBLASTn searches of the DNA sequence databases of the ciliates Tetrahymena thermophila and Paramecium tetraurelia were conducted in an attempt to determine whether ciliate homologs of the genes existed, and a conceptual translation of the longest open reading frame of the macronuclear insert was used in a BLASTp search of the protein database in a further attempt to identify any possible related proteins or sequence motifs that would be indicative of function.
For the BLAST searches, only database matches with expect (E) values of ≤1 × 10⁻⁶ were considered significant. Based on this criterion, 11 of the 25 macronuclear DNA molecules encode proteins of known function or have homologs in other nonciliate organisms (Table 1). In addition, the pEC1 macronuclear DNA molecule was found to encode a U2 small nuclear RNA (snRNA), and pEC12 is predicted to encode an isoleucine tRNA with an AAU anticodon. Of the 12 remaining macronuclear DNA molecules, two (pEC2 and pEC5) gave significant hits in the tBLASTn searches of the Tetrahymena and Paramecium genomic sequences (Table 1), indicating that these macronuclear DNA molecules likely encode proteins that are at least conserved among ciliates. Overall, evidence for a possible gene function, or at least evidence for the presence of a functional gene, was obtained for 15 of the 25 cloned macronuclear DNA molecules.
Identification of candidate frameshift genes.
The above bioinformatic analyses also provided indications that some of the newly sequenced macronuclear genes require a translational frameshift for expression. Such genes are expected to generate two separate hits to different regions of the same protein in BLASTx searches, with the two matching regions of the macronuclear DNA molecule encoding polypeptides in different reading frames. Such results can also be due to the presence of introns, which are rare in spirotrich genes (17), but an intron is also expected to result in a gap in the alignment of the conceptually translated DNA sequence with the protein homolog, while genes requiring a frameshift should not display such an alignment gap. In addition, based on the typical arrangement of +1 frameshift genes in Euplotes, the initial open reading frame (reading frame 0) is expected to terminate with the sequence 5′-AAA-TA(A/g)-3′ (22).
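The sequence-based part of these criteria (a 0 frame ending in 5′-AAA-TA(A/g)-3′, followed by an open +1 frame) can be sketched as a simple scan. The Python example below is only a first-pass filter under stated assumptions: it does not use the BLASTx evidence that the study relied on, the function name and the `min_downstream_codons` threshold are hypothetical, and the toy sequence is made up.

```python
import re

STOP_CODONS = {"TAA", "TAG"}  # TGA encodes cysteine in Euplotes


def candidate_frameshift_sites(seq: str, min_downstream_codons: int = 2):
    """List (position, open_codons) for each AAA-TA(A/G) occurrence.

    position    : 0-based index of the first A of the AAA codon
    open_codons : number of sense codons in the +1 frame (starting one
                  nucleotide past the AAA codon) before the next stop codon
                  or the end of the sequence
    """
    seq = seq.upper()
    hits = []
    # Lookahead so that overlapping occurrences are all reported.
    for match in re.finditer(r"(?=AAATA[AG])", seq):
        pos = match.start()
        open_codons = 0
        # The +1 frame starts at pos + 4: three bases of AAA plus one skipped base.
        for i in range(pos + 4, len(seq) - 2, 3):
            if seq[i : i + 3] in STOP_CODONS:
                break
            open_codons += 1
        if open_codons >= min_downstream_codons:
            hits.append((pos, open_codons))
    return hits


toy = "ATGGCAAAATAAGTGAGGTTAG"  # made-up sequence with one AAA-TAA site
print(candidate_frameshift_sites(toy))  # [(6, 3)]
```

A hit from such a scan is only a candidate; as described above, support for a frameshift comes from two BLASTx matches to the same protein in different reading frames without an alignment gap.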
Based on these criteria, three of the E. crassus macronuclear genes are strong candidates for requiring +1 frameshifts for expression. The pEC14 macronuclear DNA molecule generated strong matches in BLASTx searches to proteins containing MORN repeats (Table 1). The MORN repeat is a 14-amino-acid (aa) motif that is found in multiple copies in a number of functionally distinct proteins (for examples, see references 19, 24, and 38). In the case of the protein junctophilin, the MORN repeats have been implicated in this protein's association with the plasma membrane, giving rise to the MORN acronym (“membrane occupation and recognition nexus”). In the pEC14 macronuclear DNA molecule, an initial open reading frame (0 frame ORF) encodes six complete MORN repeats and terminates with part of a seventh, while the second +1 open reading frame (+1 frame ORF) contains the remainder of the seventh MORN repeat plus two additional complete repeats (Fig. 1a). The 0 frame ORF terminates with the frameshift motif sequence 5′-AAA-TAA-3′, making it likely that a +1 translational frameshift joins the 0 and +1 reading frames to generate a protein of 354 aa that contains a total of nine MORN repeats.
FIG. 1.
Maps of macronuclear chromosomes containing putative frameshift genes and organization of their coding regions. Horizontal black bars denote the pEC14 (a), pEC2 (b), and pEC26 (c) macronuclear DNA molecules, with the positions of selected initiation codons (ATG), termination codons (TAA or TAG), and frameshift motifs indicated. Rectangles typically denote the 0 and +1 ORFs that can be joined by a +1 frameshift to produce a single protein. Lines terminating with black balls indicate the segments amplified by PCR of genomic DNA and RT-PCR of mRNA to confirm the presence of putative frameshift sites. In the case of pEC26 (c), not all termination codons are shown, for simplicity, and the position of the internal block of telomeric repeats is indicated (G4T4). In addition, the three ORFs (0, +1, and +2 frame) encoding the 12 conserved protein kinase domains (I-XI) are indicated, and two long ORFs that precede the protein kinase coding region are indicated by rectangles with question marks.
The pEC2 macronuclear DNA molecule (1,101 bp) (Fig. 1b) contains a short ORF at one end that is predicted to encode a 148-aa peptide containing a C2H2-type zinc finger motif (PROSITE PS00028). This small ORF (+1 frame) could be joined to a larger upstream ORF (0 frame) that terminates in 5′-AAA-TAA-3′ by a +1 frameshift to generate a protein of 304 aa (Fig. 1b). While this predicted protein had no significant hits in the GenBank database, it did produce strong matches (E values of <10⁻⁸) in tBLASTn searches of both the Tetrahymena thermophila and Paramecium tetraurelia genome sequences. Moreover, in searching the database of proteins predicted from the Tetrahymena genome sequence with the putative pEC2 protein sequence, two putative proteins (nos. 200.m00067 and 150.m00064; 351 aa and 394 aa, respectively) were identified, with expect values of <10⁻¹³. Each of these putative Tetrahymena proteins shared similarity with the amino acid sequences that would be encoded by both the 0 and +1 frame ORFs of pEC2, supporting the notion that the two ORFs are joined by a frameshift to produce a single protein.
Macronuclear clone pEC26 is also a strong candidate for a +1 translational frameshift, albeit a more complex one (Fig. 1c). In this case, BLASTx searches identified three overlapping ORFs (183 bp, 441 bp, and 1,077 bp), each shifted +1 relative to the upstream ORF, that conceptually encode parts of a serine/threonine protein kinase (Table 1; the top database hits were to members of the SNF-1-like subfamily). Twelve conserved domains have been identified in protein kinases (15). For pEC26, the 0 frame ORF would encode domains I, II, and part of III, the contiguous +1 ORF would encode the remainder of domain III through part of domain VII, and the +2 ORF would encode the remainder of domain VII through domain XI (Fig. 1c). Therefore, two +1 shifts in reading frame would be required to produce a complete protein kinase domain. It should also be noted that the protein kinase region occupies only a small portion of this 5.39-kbp macronuclear DNA molecule, and it is preceded by two large ORFs of 1,095 bp and 1,293 bp (Fig. 1c). While conceptual translations of these upstream ORFs failed to identify homologs in database searches, the second of the upstream ORFs terminates with a 5′-AAA-TAA-3′ frameshift motif, and it is oriented such that a +1 frameshift would translationally link it to the protein kinase region. Thus, it is possible that this gene requires three +1 frameshifts for expression.
One final unusual feature of the pEC26 clone is that it contains a 28-bp block of the Euplotes telomeric repeat sequence 5′-GGGGTTTT-3′ (G4T4 repeat) beginning 1,167 bp from the left end of the cloned insert (Fig. 1c). Internal blocks of telomeric repeats of this length, which happens to correspond to the length of the double-stranded region of Euplotes macronuclear telomeres (23), have not previously been seen in internal regions of spirotrich macronuclear DNA molecules. This suggested that the pEC26 clone insert might represent a composite clone derived from all or part of two macronuclear DNA molecules artificially joined during the cloning process. A series of PCR analyses using total genomic DNA as a substrate supported this hypothesis (data not shown). Six combinations of oligonucleotide primers whose binding sites were all located to the right of the internal G4T4 block (as oriented in Fig. 1c) generated PCR products of the expected sizes, consistent with this region constituting a single macronuclear chromosome. In contrast, three combinations of primers whose binding sites bracketed the G4T4 block failed to produce the PCR products predicted from the pEC26 clone. While it is possible that the internal telomeric repeat block or some other feature of this region of pEC26 interferes with successful amplification, the results are consistent with the notion that the pEC26 clone is an artifact in the sense that the regions to the left and right of the G4T4 block are derived from all or parts of two different macronuclear DNA molecules.
To exclude the possibility that cloning or sequencing errors resulted in the appearance of the frameshift sites in the three macronuclear genes, PCRs were carried out on genomic DNA using primers that flanked putative frameshift sites (Fig. 1), and the resulting PCR products were directly sequenced. The sequence of the pEC14 genomic PCR product completely matched that of the clone, confirming the presence of the frameshift site. The genomic PCR sequences for pEC2 and pEC26 essentially matched those of their respective clones, with the exception that four polymorphic positions were seen in the pEC2 genomic PCR product, and three polymorphic positions were found in the pEC26 genomic sequence. In each case, one of the two alternative bases at each polymorphic position matched the sequence of the clone, and the alternative base represented a synonymous change in the coding region. These polymorphisms likely represent allelic variation, as the E. crassus X1 strain is not inbred. Overall, the results indicate that the frameshift sites are indeed present in the three genes, including two alternative forms of the gene in the cases of pEC2 and pEC26.
Expression of the frameshift genes.
RT-PCR analyses were carried out to determine if the three newly identified frameshift genes are expressed in vegetative cells and to confirm that the frameshift sites were present in the mRNAs. These analyses utilized total RNA isolated from E. crassus strain CT5 as the substrate, as strain X1 was no longer viable at the time of the analysis, and the same primers used in the previous analysis of genomic DNA (Fig. 1). For each of the genes, PCR products of the expected size were obtained, and these were dependent on the inclusion of the reverse transcription step, indicating that they are not the result of contaminating DNA (Fig. 2). Two additional smaller RT-PCR products were observed in the pEC14 RT-PCR analysis (Fig. 2), but these proved to be nonspecific, as they were shown to be generated by only one of the two pEC14 oligonucleotide primers (data not shown).
FIG. 2.
Agarose gel displaying RT-PCR products from the pEC2, pEC14, and pEC26 genes. Reactions were carried out both in the presence (+) and absence (−) of a reverse transcription step. Sizes of selected marker DNA fragments are shown to the left in kilobase pairs, and the sizes of RT-PCR products are indicated in kilobase pairs to the right of the gels. Note that for pEC14, the two bands smaller than 0.53 kbp were shown to be nonspecific PCR products.
Bulk sequencing of the RT-PCR products confirmed the presence of frameshift sites in all three genes. For pEC14, the sequence of the RT-PCR product was identical to that of the clone, while for pEC26, the sequence differed only in regard to the same three polymorphic positions observed in the genomic PCR product. In the case of pEC2, the RT-PCR product displayed four single-base differences compared to the clone, all of which represent synonymous changes: two of the single-base changes correspond to polymorphic positions observed in the genomic PCR product, while the other two were at unique positions. This low level of sequence diversity is again likely due to allelic variation, in this case partly attributable to the use of the E. crassus CT5 strain in the analyses. Nonetheless, the results confirm the presence of single frameshift sites in the pEC2 and pEC14 mRNAs, as well as the two frameshift sites within the protein kinase-encoding region of pEC26. This rules out the possibility that intron removal or RNA editing modifies the mRNAs so that a frameshift is no longer required.
Conserved sequences associated with frameshift sites.
With the expanded sample of Euplotes frameshift genes, a number of analyses were carried out to look for conserved sequence elements that might facilitate frameshifting. For these analyses, the 50 bp upstream and downstream of the conserved 5′-AAA-TA(A/g)-3′ motif from the 12 known frameshift sites were considered (only single examples of a frameshift site were considered in cases where more than one homolog with a frameshift site has been identified). To search for conserved sequence elements that might exist at a defined distance from frameshift sites, the sequences were aligned at the 5′-AAA-TA(A/g)-3′ frameshift motif, and individual positions in the aligned sequences were evaluated for information content/sequence conservation using WebLogo (10, 35). The only well-conserved sequence element identified was the 5′-AAA-TA(A/g)-3′ frameshift motif itself (Fig. 3). It was previously reported that there was an additional conserved A residue following this motif (22, 40), but that conclusion was based on a small sample of frameshift sites, and many of the sites subsequently identified do not have an A residue at this position. Conserved sequence elements might also exist at a variable distance from the site of the frameshift. Tan et al. (40) noted that the hexanucleotide 5′-CAAGAA-3′ was often found upstream of the six then-known frameshift sites. However, exact matches to this hexanucleotide are present in only 4 of the 12 currently known frameshift sites, making it unlikely that this sequence element is important for frameshifting. Additional searches for conserved sequence elements at variable distances from the frameshift motif (see Materials and Methods) failed to identify any highly conserved sequence elements that were shared by all of the frameshift sites.
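The per-position information content that a sequence logo displays can be approximated with a short calculation. The Python sketch below computes 2 − H bits per aligned DNA column, where H is the Shannon entropy of the observed base frequencies. It omits the small-sample correction that logo software typically applies, so it is not a reimplementation of WebLogo; the function name is hypothetical and the four aligned toy sequences are made up rather than taken from the 12 frameshift sites.

```python
from collections import Counter
from math import log2


def information_content(aligned_seqs: list[str]) -> list[float]:
    """Information content (bits) at each column of equal-length DNA sequences.

    Uses R = 2 - H, where H is the Shannon entropy of the base frequencies
    in the column. No small-sample correction is applied (simplification).
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    bits = []
    for column in zip(*(s.upper() for s in aligned_seqs)):
        total = len(column)
        entropy = -sum(
            (n / total) * log2(n / total) for n in Counter(column).values()
        )
        bits.append(2.0 - entropy)
    return bits


# Made-up sequences aligned at an AAA-TA(A/G) motif plus one following base.
toy_alignment = ["AAATAAA", "AAATAAC", "AAATAGG", "AAATAAT"]
for position, value in enumerate(information_content(toy_alignment), start=1):
    print(position, round(value, 2))
```

In this toy alignment the first five columns are invariant (2 bits each), the A/G position carries about 1.19 bits, and the final column, where all four bases occur equally, carries none. This mirrors the qualitative pattern described above, in which only the motif itself stands out.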
FIG. 3.
WebLogo displaying sequence conservation in the vicinities of frameshift sites. Sizes of letters denote information content, or sequence conservation, at each position. The analysis is based on the alignment of the 50 bp preceding and following the 5′-AAA-TA(A/g)-3′ frameshift motif from the following frameshift sites/genes: the single frameshifts of pEC2 (GenBank accession no. DQ114952), pEC14 (DQ114962), and the three frameshift sites of pEC26 (DQ114969) identified in this study; E. octocarinatus cyclic AMP-dependent protein kinase (AJ238280) (39); E. aediculatus p43/La motif protein (AF307939) (1); E. octocarinatus npk2/Eondr2 (AJ249684) (40); E. crassus orf2 of transposon Tec2-1 (L03360) (18); frameshift sites 1 and 2 of E. crassus TERT-1 (AF528527) (42); and frameshift site 3 of Euplotes minuta TERT (AY303934) (29).
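The quantity plotted in a sequence logo can be sketched in a few lines of Python. The example below follows the information-content definition of Schneider and Stephens (35), with the small-sample correction made optional; it is an illustrative reimplementation under stated assumptions, not the WebLogo code itself, and the function name and toy alignment are hypothetical.

```python
import math
from collections import Counter


def information_content(aligned, small_sample_correction=False):
    """Per-column information content (bits) for equal-length DNA sequences.

    For each column: 2 bits (log2 of 4 bases) minus the Shannon entropy of the
    observed base frequencies, optionally minus the approximate small-sample
    correction e(n) = 3 / (2 * ln(2) * n).
    """
    n = len(aligned)
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    e_n = 3 / (2 * math.log(2) * n) if small_sample_correction else 0.0
    bits = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in aligned)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(max(0.0, 2.0 - entropy - e_n))
    return bits


# Invented toy alignment centered on AAA-TA(A/G); NOT real Euplotes data.
toy_alignment = ["CAAATAAG", "GAAATAGT", "TAAATAAC", "AAAATAAA"]
for position, value in enumerate(information_content(toy_alignment)):
    print(position, round(value, 2))
```

In the toy alignment the invariant motif positions reach the 2-bit maximum, the A/G position scores lower, and the flanking positions score zero, which is the same qualitative pattern described for Fig. 3.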
DISCUSSION
In the current study, the sequences of 25 randomly selected E. crassus macronuclear chromosomes have been determined. Thirteen of the 25 macronuclear chromosomes appear to encode proteins with homologs in other organisms, 2 encode RNA products, and the function of the remaining 10 macronuclear chromosomes is uncertain. Three of the protein-encoding genes likely require +1 frameshifts for the proper production of their protein products (Fig. 1). One of these macronuclear chromosomes, pEC26, encodes a protein kinase, and at least two +1 frameshifts are required to produce this protein. This represents the third example of a Euplotes gene requiring multiple frameshifts for expression; the other two cases involve two of the TERT genes in E. crassus, where two frameshift sites have been noted (20). All three putative frameshift genes are transcribed in vegetatively growing cells (Fig. 2), and the mRNAs have been shown to match the gene sequence, indicating that RNA editing does not remove the requirement of a frameshift.
Frequency of frameshift genes in E. crassus.
Considering only the macronuclear chromosomes that do not encode untranslated RNA products, 3 of 23 (13%) were found to require frameshifts. This value is likely an underestimate of the percentage of genes requiring a frameshift, as the strategy for defining a frameshift site depended on the identification of a homologous gene in another organism. That is, some of the macronuclear chromosomes whose functions are unknown may also require a frameshift for expression, and, indeed, there are cases in this subpopulation where ORFs of unknown function could be joined to each other by a +1 frameshift at a 5′-AAA-TAA-3′ sequence.
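The kind of scan that would flag such cases can be sketched as follows. This is a hypothetical illustration rather than the approach used here: the function names, the minimum downstream length, and the toy sequence are assumptions. Stop codons are taken to be TAA and TAG, since UGA is translated as cysteine in Euplotes (28).

```python
EUPLOTES_STOPS = {"TAA", "TAG"}  # TGA is read as cysteine in Euplotes


def open_codons(seq, start):
    """Count consecutive non-stop codons from `start` (complete codons only)."""
    n = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in EUPLOTES_STOPS:
            break
        n += 1
    return n


def candidate_plus1_sites(seq, frame=0, min_downstream_codons=10):
    """Return 0-based positions of in-frame AAA codons followed by a stop codon
    where the +1 frame remains open for at least `min_downstream_codons`."""
    seq = seq.upper()
    sites = []
    for i in range(frame, len(seq) - 5, 3):
        if seq[i:i + 3] == "AAA" and seq[i + 3:i + 6] in EUPLOTES_STOPS:
            # After a +1 shift, decoding resumes one base past the first
            # base of the stop codon.
            if open_codons(seq, i + 4) >= min_downstream_codons:
                sites.append(i)
    return sites


# Invented toy sequence (NOT real data): ATG GCT AAA TAA, then an open +1 frame.
toy = "ATGGCTAAATAAGCTGCTGCTGCTTAA"
print(candidate_plus1_sites(toy, frame=0, min_downstream_codons=3))
```

A candidate found this way is only suggestive: without a homolog spanning the junction, nothing in the scan distinguishes a true frameshift site from a chance arrangement of two ORFs, which is why the 13% figure was restricted to genes with identifiable homologs.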
The observed 13% frequency of frameshift genes somewhat exceeds the value of ∼7.5% (5 of 67 genes) obtained from a previous survey of genes available in GenBank (22) and provides support for the notion that euplotids possess an unusually high number of genes requiring +1 frameshifts for expression. While there is still considerable uncertainty as to the true percentage of frameshift genes, as a result of the small sample size, the current data provide a 95% confidence interval of 3.7 to 31.7% for the percentage of frameshift genes. Even the 3.7% value at the lower end of this range is >100-fold higher than the reported frequency of frameshift genes in other organisms, such as yeast, where only 2 nontransposon genes (4, 31) of the ∼6,000 total protein-coding genes in the genome (∼0.03%) have been reported to require frameshifts for expression.
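The arithmetic behind these comparisons can be reproduced with a short Python sketch. The interval computed below is a Clopper-Pearson exact interval, used here only as a stand-in: the reference list includes Blyth and Still (7) and Casella (8), so the reported 3.7 to 31.7% interval was presumably obtained with a refined exact method, and the Clopper-Pearson interval is somewhat wider (roughly 3% to 34% for 3 of 23). Function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for k successes in n."""
    def bisect(is_below, lo=0.0, hi=1.0, iters=60):
        # is_below(p) is True while p lies below the bound being sought.
        for _ in range(iters):
            mid = (lo + hi) / 2
            if is_below(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha/2; this tail probability rises with p.
    lower = 0.0 if k == 0 else bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p) < alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2; this tail probability falls with p.
    upper = 1.0 if k == n else bisect(
        lambda p: binom_cdf(k, n, p) > alpha / 2)
    return lower, upper


observed = 3 / 23          # this study
previous = 5 / 67          # earlier GenBank survey (22)
yeast = 2 / 6000           # approximate yeast frequency
fold_at_lower_bound = 0.037 / yeast  # 3.7% lower bound relative to yeast

low, high = clopper_pearson(3, 23)
print(f"observed {observed:.1%}, previous {previous:.1%}, yeast {yeast:.3%}")
print(f"Clopper-Pearson 95% CI: {low:.1%} to {high:.1%}")
print(f"fold difference at the 3.7% bound: {fold_at_lower_bound:.0f}")
```

Whichever exact method is used, the lower bound remains about two orders of magnitude above the yeast value, so the qualitative conclusion does not depend on the choice of interval.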
Efficiency of frameshifting.
A previous evolutionary analysis of the TERT genes in a number of Euplotes species (29) indicated that frameshift sites have arisen during the diversification of euplotids. Coupled with the observed high frequency of frameshift genes, this led to the suggestion that euplotids may possess an efficient mechanism of +1 frameshifting, such that mutations resulting in appropriately oriented reading frames joined by the 5′-AAA-TA(A/g)-3′ frameshift motif would be selectively neutral. In genes from other organisms that require frameshifts, the frequency at which the ribosome shifts reading frames varies considerably, but can be as high as 80% (reviewed in references 12 and 13). While information of this type is not available for Euplotes, the current results and past studies suggest that there are some constraints on the types of genes containing frameshift sites, and, thus, that not all ribosomes undergo a frameshift. Specifically, frameshift sites occur predominantly in genes encoding proteins with enzymatic functions, as opposed to genes encoding abundant proteins in the cell. Eight different types of Euplotes genes have been identified to date with frameshift sites. Five encode enzymes (three protein kinases, TERT, and the Tec2 tyrosine recombinase), and the p43 La motif protein is associated with the RNA component of the telomerase enzyme (2) and appears to anchor it in the nucleus (30). The functions of the remaining two proteins (the pEC14/MORN repeat protein and the pEC2 protein) are unknown, but there is no reason to suspect that they might be abundant in the cell. In contrast, the complete coding sequences for 27 genes encoding tubulins, histones, and ribosomal proteins in seven different Euplotes species are currently listed in GenBank (as of July 2005), and none have been reported to require a frameshift for expression. 
Tubulins, histones, and ribosomal proteins are almost certainly among the most abundant proteins in the cell, and the absence of any genes requiring a frameshift among this reasonably sized sample suggests that frameshift sites are not tolerated within genes encoding highly expressed proteins. The apparent avoidance of frameshift sites in genes encoding abundant proteins suggests that frameshifting may reduce the level of translated protein, so that there may be selection against alleles with frameshift sites either for genes encoding abundant proteins or for genes whose protein products are at or near critical levels in the cell.
Is Euplotes +1 translational frameshifting involved in regulating gene expression?
Programmed translational frameshifting is known or thought to play a role in regulating the expression of a number of genes in other organisms (reviewed in references 13 and 32), and a number of reports have proposed that it also may be involved in regulating gene expression in euplotids (for examples, see references 1, 11, and 20). At the present time, there is no direct experimental evidence for frameshifting playing a role in euplotid gene regulation. However, if the 5′-AAA-TA(A/g)-3′ frameshift motif is the only sequence element required for a +1 frameshift, a regulatory function for frameshifting would appear unlikely, as it would presumably influence the expression of a significant fraction of the genes in the genome. It is possible that different classes of accessory regulatory elements exist, with particular elements shared by subsets of genes involved in a common cellular process, which would enable their coordinate regulation. There are some indications of genes with related functions being overrepresented among the currently small number of known euplotid frameshift genes, but the significance in each case is still unclear. First, three of the known frameshift genes encode putative protein kinases, but there is as yet no evidence that these three enzymes are involved in the same cellular process or pathway. Second, two of the known frameshift genes, TERT (20, 29, 42) and the p43 La motif protein gene (1), are involved in telomere-related functions. However, telomeres have been intensely studied in euplotids, so the identification of these two genes may simply be a matter of overrepresentation of telomere-related genes in the overall small sample from this organism. Thus, it is still difficult to differentiate between +1 frameshifting serving a regulatory function in euplotids, as opposed to these organisms having evolved a relatively efficient frameshift mechanism that tolerates the existence of frameshift sites within genes.
More detailed studies of the expression of individual genes under different conditions, as well as expansion of the list of genes requiring frameshifts for expression, will be needed to resolve this issue.
Acknowledgments
This work was supported by grants from the National Science Foundation (MCB-9816765 and MCB-0343813).
I thank Stephen Walsh for his help with statistical analysis and Donna Cortezzo and Sara Avatapalli for technical assistance.
REFERENCES
1. Aigner, S., J. Lingner, K. J. Goodrich, C. A. Grosshans, A. Shevchenko, M. Mann, and T. R. Cech. 2000. Euplotes telomerase contains an La motif protein produced by apparent translational frameshifting. EMBO J. 19:6230-6239.
2. Aigner, S., J. Postberg, H. J. Lipps, and T. R. Cech. 2003. The Euplotes La motif protein p43 has properties of a telomerase-specific subunit. Biochemistry 42:5736-5747.
3. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
4. Asakura, T., T. Sasaki, F. Nagano, A. Satoh, H. Obaishi, H. Nishioka, H. Imamura, K. Hotta, K. Tanaka, H. Nakanishi, and Y. Takai. 1998. Isolation and characterization of a novel actin filament-binding protein from Saccharomyces cerevisiae. Oncogene 16:121-130.
5. Bailey, T. L., and M. Gribskov. 1998. Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14:48-54.
6. Baranov, P. V., R. F. Gesteland, and J. F. Atkins. 2002. Recoding: translational bifurcations in gene expression. Gene 286:187-201.
7. Blyth, C., and H. Still. 1983. Binomial confidence intervals. J. Am. Stat. Assoc. 78:108-116.
8. Casella, G. 1986. Refining binomial confidence intervals. Can. J. Stat. 14:113-129.
9. Chavatte, L., S. Kervestin, A. Favre, and O. Jean-Jean. 2003. Stop codon selection in eukaryotic translation termination: comparison of the discriminating potential between human and ciliate eRF1s. EMBO J. 22:1644-1653.
10. Crooks, G. E., G. Hon, J. M. Chandonia, and S. E. Brenner. 2004. WebLogo: a sequence logo generator. Genome Res. 14:1188-1190.
11. Doak, T. G., D. J. Witherspoon, C. L. Jahn, and G. Herrick. 2003. Selection on the genes of Euplotes crassus Tec1 and Tec2 transposons: evolutionary appearance of a programmed frameshift in a Tec2 gene encoding a tyrosine family site-specific recombinase. Eukaryot. Cell 2:95-102.
12. Farabaugh, P. J. 1996. Programmed translational frameshifting. Microbiol. Rev. 60:103-134.
13. Farabaugh, P. J. 2000. Translational frameshifting: implications for the mechanism of translational frame maintenance. Prog. Nucleic Acid Res. 64:131-170.
14. Frels, J. S., and C. L. Jahn. 1995. DNA rearrangements in Euplotes crassus coincide with discrete periods of DNA replication during the polytene chromosome stage of macronuclear development. Mol. Cell. Biol. 15:6488-6495.
15. Hanks, S. K., and A. M. Quinn. 1991. Protein kinase catalytic domain sequence database: identification of conserved features of primary structure and classification of family members. Methods Enzymol. 200:38-62.
16. Harper, D. S., and C. L. Jahn. 1989. Differential use of termination codons in ciliated protozoa. Proc. Natl. Acad. Sci. USA 86:3252-3256.
17. Hoffman, D. C., R. C. Anderson, M. L. DuBois, and D. M. Prescott. 1995. Macronuclear gene-sized molecules of hypotrichs. Nucleic Acids Res. 23:1279-1283.
18. Jahn, C. L., S. Z. Doktor, J. S. Frels, J. W. Jaraczewski, and M. F. Krikau. 1993. Structures of the Euplotes crassus Tec1 and Tec2 elements: identification of putative transposase coding regions. Gene 133:71-78.
19. Ju, T. K., and F. L. Huang. 2004. MSAP, the meichroacidin homolog of carp (Cyprinus carpio), differs from the rodent counterpart in germline expression and involves flagellar differentiation. Biol. Reprod. 71:1419-1429.
20. Karamysheva, Z., L. Wang, T. Shrode, L. A. Hurley, J. Welch, and D. E. Shippen. 2003. Expression of the Euplotes telomerase catalytic subunit is controlled by differential activation of three TERT genes, programmed ribosomal frameshifting, and RNP assembly. Cell 113:565-576.
21. Kervestin, S., L. Frolova, L. Kisselev, and O. Jean-Jean. 2001. Stop codon recognition in ciliates: Euplotes release factor does not respond to reassigned UGA codon. EMBO Rep. 2:680-684.
22. Klobutcher, L. A., and P. J. Farabaugh. 2002. Shifty ciliates: frequent programmed translational frameshifting in euplotids. Cell 111:763-766.
23. Klobutcher, L. A., M. T. Swanton, P. Donini, and D. M. Prescott. 1981. All gene-sized DNA molecules in four species of hypotrichs have the same terminal sequence and an unusual 3′ terminus. Proc. Natl. Acad. Sci. USA 78:3015-3019.
24. Kunita, R., A. Otomo, H. Mizumura, K. Suzuki, J. Showguchi-Miyata, Y. Yanagisawa, S. Hadano, and J. E. Ikeda. 2004. Homo-oligomerization of ALS2 through its unique carboxy-terminal regions is essential for the ALS2-associated Rab5 guanine nucleotide exchange activity and its regulatory function on endosome trafficking. J. Biol. Chem. 279:38626-38635.
25. Kurland, C. G. 1992. Translational accuracy and the fitness of bacteria. Annu. Rev. Genet. 26:29-50.
26. Lowe, T. M., and S. R. Eddy. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955-964.
27. McGinnis, S., and T. L. Madden. 2004. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 32:W20-W25.
28. Meyer, F., H. J. Schmidt, E. Plumper, A. Hasilik, G. Mersmann, H. E. Meyer, A. Engstrom, and K. Heckmann. 1991. UGA is translated as cysteine in pheromone 3 of Euplotes octocarinatus. Proc. Natl. Acad. Sci. USA 88:3758-3761.
29. Mollenbeck, M., M. C. Gavin, and L. A. Klobutcher. 2004. Evolution of programmed ribosomal frameshifting in the TERT gene of Euplotes. J. Mol. Evol. 58:701-711.
30. Mollenbeck, M., J. Postberg, K. Paeschke, M. Rossbach, F. Jonsson, and H. J. Lipps. 2003. The telomerase-associated protein p43 is involved in anchoring telomerase in the nucleus. J. Cell Sci. 116:1757-1761.
31. Morris, D. K., and V. Lundblad. 1997. Programmed translational frameshifting in a gene required for yeast telomere replication. Curr. Biol. 11:65-74.
32. Namy, O., J. P. Rousset, S. Napthine, and I. Brierley. 2004. Reprogrammed genetic decoding in cellular gene expression. Mol. Cell 13:157-168.
33. Qian, L., and M. Wilkinson. 1991. DNA fragment purification: removal of agarose 10 minutes after electrophoresis. BioTechniques 10:736-737.
34. Roth, M., M. Lin, and D. M. Prescott. 1985. Large scale synchronous mating and the study of macronuclear development in Euplotes crassus. J. Cell Biol. 101:79-84.
34a. Salas-Marco, J., H. Fan-Minogue, A. K. Kallmeyer, L. A. Klobutcher, P. J. Farabaugh, and D. M. Bedwell. Distinct paths to stop codon reassignment by the variant code organisms Tetrahymena and Euplotes. Mol. Cell. Biol., in press.
35. Schneider, T. D., and R. M. Stephens. 1990. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18:6097-6100.
36. Stahl, G., G. P. McCarty, and P. J. Farabaugh. 2002. Ribosome structure: revisiting the connection between translational accuracy and unconventional decoding. Trends Biochem. Sci. 27:178-183.
37. Swanton, M. T., J. M. Heumann, and D. M. Prescott. 1980. Gene-sized DNA molecules of the macronuclei in three species of hypotrichs: size distributions and absence of nicks. Chromosoma 77:217-227.
38. Takeshima, H., S. Komazaki, M. Nishi, M. Iino, and K. Kangawa. 2000. Junctophilins: a novel family of junctional membrane complex proteins. Mol. Cell 6:11-22.
39. Tan, M., K. Heckmann, and C. Brunen-Nieweler. 2001. Analysis of micronuclear, macronuclear and cDNA sequences encoding the regulatory subunit of cAMP-dependent protein kinase of Euplotes octocarinatus: evidence for a ribosomal frameshift. J. Eukaryot. Microbiol. 48:80-87.
40. Tan, M., A. Liang, C. Brunen-Nieweler, and K. Heckmann. 2001. Programmed translational frameshifting is likely required for expressions of genes encoding putative nuclear protein kinases of the ciliate Euplotes octocarinatus. J. Eukaryot. Microbiol. 48:575-582.
41. Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
42. Wang, L., S. R. Dean, and D. E. Shippen. 2002. Oligomerization of the telomerase reverse transcriptase from Euplotes crassus. Nucleic Acids Res. 30:4032-4039.