Since the advent of methodologies to analyze the content of whole genomes (e.g., renaturation kinetics and Cot analysis), it has been known that a large fraction of eukaryotic genomes is highly repetitive (1, 2). Recent computer-assisted analysis of several sequenced eukaryotic genomes, including Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, and humans, has demonstrated that most repetitive DNA is composed of or derived from transposable elements (TEs). In the human genome, for example, TEs are the single most abundant component, accounting for over 40% of the total DNA (3). Although this abundance of TEs is viewed as a hindrance by those engaged in the determination and assembly of DNA sequence, the availability of both complete and partial eukaryotic genome sequences is providing TE biologists with a bonanza of raw material that is being used to understand how genomes evolve.
Before the report in PNAS by Kapitonov and Jurka (4), all eukaryotic TEs were thought to use one of two mechanisms for transposition. Class 1 elements, or retrotransposons, transpose via an RNA intermediate in reactions catalyzed by element-encoded proteins, including reverse transcriptase. In contrast, the transposon itself is the intermediate for class 2 elements, where an element-encoded transposase catalyzes reactions resulting in TE excision from one site and reinsertion elsewhere in the genome (the so-called cut-and-paste mechanism). In addition to these two mechanisms, some prokaryotic TEs (called IS or insertion sequences) move by another mechanism called rolling circle (RC) transposition (5, 6). This process is similar to the RC replication of some plasmids, single-stranded (ss) bacteriophage, and plant geminiviruses. In a recent issue of PNAS, Kapitonov and Jurka (4) report that RC transposons also occur in eukaryotes where, surprisingly, they comprise about 2% of the genomes of A. thaliana and C. elegans.
How could a group of TEs that account for such a large fraction of the genomes of these well-studied organisms remain until now essentially unknown? One answer to this question is that RC transposons have distinct structural features that are not easily detected by computer-assisted searches of DNA sequence databases. Unlike all other eukaryotic TEs, Helitron families of elements (as the eukaryotic RC transposons are called) do not generate target site duplications on insertion. In other TEs, these short duplications are derived from staggered endonucleolytic cleavage of the target DNA by element-encoded transposase or integrase. Instead, Helitrons target the dinucleotide AT, and insertion does not lead to the duplication of this sequence. Similarly, unlike all other class 2 elements, RC transposons do not have terminal inverted repeats. Rather, Helitrons begin with a 5′ TC and end with a 3′ CTRR (Fig. 1a). Although there is a 16- to 20-nt palindrome just upstream of the 3′ CTRR, conservation of palindrome structure but not sequence would apparently preclude the use of a consensus sequence in the identification of Helitrons by computer-assisted searches. By analogy to RC mechanisms in prokaryotes, the distinct structural hallmarks of Helitrons are hypothesized to be essential for RC-mediated transposition (Fig. 1).
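Because the palindrome is conserved in structure rather than sequence, a search has to test for base pairing instead of matching a consensus. The following Python sketch illustrates that idea only; it is not the method used by Kapitonov and Jurka (4). The function names, the 30-nt search window, the minimum stem length of 6, and the maximum loop of 4 nt are all assumptions chosen for illustration, and only perfect stems are recognized. R is read as a purine (A or G).

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_helitron_like_termini(seq):
    """True if seq begins with 5' TC and ends with 3' CTRR (R = A or G)."""
    seq = seq.upper()
    return seq.startswith("TC") and re.fullmatch(r"CT[AG][AG]", seq[-4:]) is not None


def find_hairpin(region, min_stem=6, max_loop=4):
    """Return (start, end) of the first perfect stem-loop in region, or None.

    min_stem and max_loop are illustrative assumptions, not published values.
    The test is structural: the right arm must be the reverse complement of
    the left arm, whatever the actual bases are.
    """
    n = len(region)
    for start in range(n - 2 * min_stem + 1):
        left = region[start:start + min_stem]
        target = reverse_complement(left)
        for loop in range(max_loop + 1):
            right_start = start + min_stem + loop
            if region[right_start:right_start + min_stem] == target:
                return (start, right_start + min_stem)
    return None


def looks_like_helitron(seq, window=30):
    """Combine the terminal check with a hairpin search upstream of CTRR.

    window (assumed value) is the number of nucleotides examined
    immediately upstream of the 3' CTRR.
    """
    seq = seq.upper()
    if not has_helitron_like_termini(seq):
        return False
    upstream = seq[-(window + 4):-4]
    return find_hairpin(upstream) is not None


if __name__ == "__main__":
    stem = "CCGGATGC"
    hairpin = stem + "TTT" + reverse_complement(stem)  # 19-nt palindrome
    candidate = "TC" + "A" * 20 + hairpin + "A" * 5 + "CTAG"
    print(looks_like_helitron(candidate))
    print(looks_like_helitron("TC" + "A" * 40 + "CTAG"))
```

A real search would also need to tolerate mismatches in the stem and to confirm that the candidate sits at an AT dinucleotide without a flanking target site duplication, neither of which this sketch attempts.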
Helitrons may also have escaped classification for so long because the vast majority of family members are nonautonomous, defective elements that resemble internal deletion derivatives of their cognate autonomous element. It is important to note that up to 10 homogeneous subfamilies of nonautonomous Helitrons, with members ranging from 0.5 to 3 kb, were previously identified in the Arabidopsis genome as abundant repeats. These elements were first designated AthE1 (7) and AtREP (8) and, later, Basho (9). However, in the absence of any obvious structural features of either class 1 or class 2 elements, these repeat families remained mysterious and unclassified. It was only when the complete genome sequence of Arabidopsis became available that Kapitonov and Jurka (4) were able to identify the much less abundant but very large (5.5 to 15 kb) Helitrons that have coding capacity for products related to RC replication proteins.
Although rare in prokaryotes, nonautonomous elements are common and abundant members of most eukaryotic transposon families. They are usually internally deleted derivatives of autonomous members and lack coding capacity for the transposase. Because most DNA transposon families contain distinct groups of nonautonomous elements that are conserved in both sequence and length, it is likely that most subfamilies arose from a single or a few deleted copies that were subsequently amplified with enzymes encoded in trans by an autonomous element. This seems to be the case for the RC-transposing Helitrons, because homogeneous groups of defective elements sharing their termini with autonomous copies are abundant in the A. thaliana, Oryza sativa (rice), and C. elegans genomes. Although nonautonomous RC transposons have not been reported in prokaryotes, engineered nonautonomous copies of the Escherichia coli RC element IS91 transposed at high frequency when supplied with transposase in trans (5, 6).
What is still mysterious is how the RC mechanism generates nonautonomous elements. For other eukaryotic class 2 elements, it has been shown that such defective copies can arise by incomplete double-strand gap repair after excision of an autonomous element (10–12). It is unlikely that a similar mechanism can account for the origin of nonautonomous Helitrons because they presumably do not excise as double-stranded molecules and thus do not create a double-strand gap at the donor site. Nevertheless, recombination and slippage during the copying of the transposed single strand at the donor site may account for the origin of internally deleted Helitrons (see Fig. 1b). Alternatively, nonautonomous Helitrons may form de novo from host sequences given the minimal cis requirements that appear to be necessary for RC-mediated transposition.
Other open questions concern the function and origin of the putative genes encoded by the larger Helitrons. The preliminary analysis of Kapitonov and Jurka (4) suggests that Helitrons from A. thaliana, O. sativa, and C. elegans have coding capacity for a large product of ≈1500 aa that contains an ≈500-aa domain similar to eukaryotic, prokaryotic, and viral 5′ to 3′ DNA helicases. These putative Helitron products also share motifs with the replicator initiator proteins of RC plasmids and certain ssDNA viruses. More surprisingly, the plant Helitrons harbor additional genes that are related to RPA70, the largest subunit of replication protein A. RPA70 is a cellular ssDNA-binding protein that is conserved in plants, animals, and fungi. The gene richness of plant Helitrons is in sharp contrast with other class 2 transposons, including bacterial RC insertion sequences, which usually encode only one protein, a transposase. Whereas it has been shown in vitro that the transposase alone is sufficient to mediate the cut-and-paste mechanism (13–15), it is known that host-encoded factors are also required in vivo for most transposition reactions (16–18). Similarly, prokaryotic RC transposition has been shown to require host-encoded helicases and ssDNA-binding proteins (18). The identification of motifs for some of these functions among the Helitron-encoded products suggests a scenario whereby prokaryotic and eukaryotic RC elements arose from a common ancestral element, but eukaryotic Helitrons have evolved further through the capture of additional functions from their host. A hypothetical mechanism for the acquisition of host genes by RC elements is depicted in Fig. 1b and is based on results showing that transposition of bacterial IS91 and presumably of Helitrons has minimal cis requirements. That is, only the 5′ end of IS91 is required to initiate transposition (5), whereas a cryptic downstream palindrome could furnish a new terminator if the normal terminator were bypassed. Whatever the mechanism, transduction events must occur with sufficient frequency to permit the eventual capture of useful genes or exons. In this regard, it is tempting to view Helitrons as “exon shuffling machines.”
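The read-through idea can be made concrete with a toy model in Python. This is a deliberately simplified sketch of the hypothetical mechanism, not a description of any demonstrated reaction: the function `copied_segment`, the index-based representation of terminators, and the example sequences are all invented for illustration. Copying starts at the element's 5′ end and stops at the first terminator that is not bypassed.

```python
def copied_segment(donor, start, terminator_ends, bypassed=()):
    """Toy model: return the stretch of donor copied in one RC event.

    donor           -- donor-site sequence (string)
    start           -- index of the element's 5' end, where copying initiates
    terminator_ends -- indices just past each candidate terminator
    bypassed        -- terminator indices that fail to stop copying
    All parameters are hypothetical constructs for illustration only.
    """
    for end in sorted(terminator_ends):
        if end > start and end not in bypassed:
            return donor[start:end]
    return None  # no usable terminator downstream of the start


if __name__ == "__main__":
    upstream = "aaaa"
    element = "TCggggggCTAG"   # element with its normal 3' terminus
    host_exon = "EXON"         # stand-in for flanking host sequence
    cryptic = "ccCTAA"         # stand-in for a cryptic downstream terminator
    donor = upstream + element + host_exon + cryptic

    start = len(upstream)
    normal_end = start + len(element)
    cryptic_end = len(donor)

    # Normal event: only the element is copied.
    print(copied_segment(donor, start, [normal_end, cryptic_end]))
    # Terminator bypassed: host sequence is carried along.
    print(copied_segment(donor, start, [normal_end, cryptic_end],
                         bypassed={normal_end}))
```

In the second call the copied segment extends through the stand-in host sequence to the cryptic terminator, which is the sense in which a bypass event would transduce flanking genes or exons.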
Although Helitrons are the first RC transposons identified in eukaryotic genomes, an RC mechanism is known to be responsible for the replication of geminiviruses, a group of ssDNA viruses that infect many plant species (19). Some of these viruses encode a Rep protein with both helicase and ssDNA-binding activities that can interact with the cellular machinery of DNA replication (20, 21). As suggested by Kapitonov and Jurka (4), it is possible that Helitrons represent the missing evolutionary link between prokaryotic RC elements and geminiviruses. Alternatively, Helitrons may have arisen from geminiviruses that were integrated into the genome of an early eukaryotic ancestor. On the surface, this scenario seems unlikely because integration into the host genome is not part of the geminivirus life cycle; that is, replication occurs extrachromosomally. However, it is noteworthy that multiple copies of geminivirus DNA have been found integrated into the chromosomes of tobacco (22). In the context of this commentary, this is probably not a surprising finding. Like integrated geminiviruses, RC transposons can now be added to a growing list of entities known to reside in eukaryotic genomes. More and more, genomes are beginning to resemble the family attic where the relics and mementos of several lifetimes are stored and await discovery.
Footnotes
See companion article on page 8714 in issue 15 of volume 98.
References
- 1. Britten R J, Graham D E, Neufeld B R. Methods Enzymol. 1974;29:323–405. doi: 10.1016/0076-6879(74)29033-5.
- 2. Goldberg R B. Biochem Genet. 1978;16:45–68. doi: 10.1007/BF00484384.
- 3. International Human Genome Sequencing Consortium. Nature (London) 2001;409:860–921.
- 4. Kapitonov V V, Jurka J. Proc Natl Acad Sci USA. 2001;98:8714–8719. doi: 10.1073/pnas.151269298. (First Published July 10, 2001; 10.1073/pnas.151269298)
- 5. Mendiola M V, Bernales I, de la Cruz F. Proc Natl Acad Sci USA. 1994;91:1922–1926. doi: 10.1073/pnas.91.5.1922.
- 6. del Pilar Garcillan-Barcia M, Bernales I, Mendiola M V, de la Cruz F. Mol Microbiol. 2001;39:494–501. doi: 10.1046/j.1365-2958.2001.02261.x.
- 7. Surzycki S A, Belknap W R. J Mol Evol. 1999;48:684–691. doi: 10.1007/pl00006512.
- 8. Kapitonov V V, Jurka J. Genetica. 1999;107:27–37.
- 9. Le Q H, Wright S, Yu Z, Bureau T. Proc Natl Acad Sci USA. 2000;97:7376–7381. doi: 10.1073/pnas.97.13.7376.
- 10. Engels W R, Johnson-Schlitz D M, Eggleston W B, Sved J. Cell. 1990;62:515–525. doi: 10.1016/0092-8674(90)90016-8.
- 11. Plasterk R H. EMBO J. 1991;10:1919–1925. doi: 10.1002/j.1460-2075.1991.tb07718.x.
- 12. Rubin E, Levy A A. Mol Cell Biol. 1997;17:6294–6302. doi: 10.1128/mcb.17.11.6294.
- 13. Kaufman P D, Rio D C. Cell. 1992;69:27–39. doi: 10.1016/0092-8674(92)90116-t.
- 14. Vos J C, De Baere I, Plasterk R H. Genes Dev. 1996;10:755–761. doi: 10.1101/gad.10.6.755.
- 15. Lampe D J, Churchill M E, Robertson H M. EMBO J. 1996;15:5470–5479.
- 16. Mizuuchi K. Annu Rev Biochem. 1992;61:1011–1051. doi: 10.1146/annurev.bi.61.070192.005051.
- 17. Beall E L, Rio D C. Genes Dev. 1996;10:921–933. doi: 10.1101/gad.10.8.921.
- 18. Mahillon J, Chandler M. Microbiol Mol Biol Rev. 1998;62:725–774. doi: 10.1128/mmbr.62.3.725-774.1998.
- 19. Stenger D C, Revington G N, Stevenson M C, Bisaro D M. Proc Natl Acad Sci USA. 1991;88:8029–8033. doi: 10.1073/pnas.88.18.8029.
- 20. Koonin E V, Ilyina T V. J Gen Virol. 1992;73:2763–2766. doi: 10.1099/0022-1317-73-10-2763.
- 21. Gutierrez C. EMBO J. 2000;19:792–799. doi: 10.1093/emboj/19.5.792.
- 22. Bejarano E R, Khashoggi A, Witty M, Lichtenstein C. Proc Natl Acad Sci USA. 1996;93:759–764. doi: 10.1073/pnas.93.2.759.