Proceedings of the National Academy of Sciences of the United States of America. 2013 Nov 25;110(50):20140–20145. doi: 10.1073/pnas.1310958110

Acquisition of an Archaea-like ribonuclease H domain by plant L1 retrotransposons supports modular evolution

Georgy Smyshlyaev a,b,1, Franka Voigt c, Alexander Blinov a, Orsolya Barabas c, Olga Novikova d
PMCID: PMC3864347  PMID: 24277848

Significance

Transposons are jumping genes that constitute a sizeable fraction of eukaryotic genomes. They drive genome evolution and can cause genetic diseases and cancer. Although transposons were first discovered in plants and much of our knowledge about them stems from plants, the most abundant human transposon, L1, has barely been investigated in plants. In this study, we identify plant L1 retrotransposons from a variety of plant genomes and show that, similar to viruses, they evolved in a modular fashion by gaining and losing various protein-coding domains. Moreover, we find that plant L1s carry an active Archaea-like ribonuclease H (RNH) domain, suggesting that they shuttle RNH between plants, bacteria, and Archaea.

Abstract

Although a variety of non-LTR retrotransposons of the L1 superfamily have been found in plant genomes over recent decades, their diversity, distribution, and evolution have yet to be analyzed in depth. Here, we perform comprehensive comparative and evolutionary analyses of L1 retrotransposons from 29 genomes of land plants covering a wide range of taxa. We identify numerous L1 elements in these genomes and detect a striking diversity of their domain composition. We show that all known land plant L1 retrotransposons can be grouped into five major families based on their phylogenetic relationships and domain composition. Moreover, we trace the putative evolution timeline that created the current variants and reveal that evolutionary events included losses and acquisitions of diverse putative RNA-binding domains and the acquisition of an Archaea-like ribonuclease H (RNH) domain. We also show that the latter RNH domain is autonomously active in vitro and speculate that retrotransposons may play a role in the horizontal transfer of RNH between plants, Archaea, and bacteria. The acquisition of an Archaea-like RNH domain by plant L1 retrotransposons negates the hypothesis that RNH domains in non-LTR retrotransposons have a single origin and provides evidence that acquisition happened at least twice. Together, our data indicate that the evolution of the investigated retrotransposons can be mainly characterized by repeated events of domain rearrangements and identify modular evolution as a major trend in the evolution of plant L1 retrotransposons.


Ever since the discovery of the first transposon by Barbara McClintock in 1948 in maize, plants have provided a prime model system to study transposition. Both DNA transposons and retrotransposons are abundant in plants (1) and play a major role in driving genetic diversity and evolution of these organisms. Interestingly, non-LTR retrotransposons, the class of retrotransposons that is most represented in mammals, are found less abundantly in plants (2–5). More recently, however, a variety of non-LTR retrotransposons has been discovered in plants, and they were classified into three non-LTR retrotransposon superfamilies: retrotransposable element (RTE), Dualen, and L1 (6–8).

The L1 superfamily is perhaps the best-studied plant non-LTR retrotransposon group, members of which have been identified in most plants. Among other species, they occur in corn [Zea mays (Cin4)], thale cress [Arabidopsis thaliana (Ta11-1)], and the alga Chlorella vulgaris (Zepp) (9–11).

In general, plant L1 retrotransposons carry two ORFs. The protein encoded by ORF1 (ORF1p) usually contains an RNA recognition motif (RRM) (12–14). Similarly, human L1 ORF1p, which presumably acts as an RNA chaperone (15), also contains an RRM that was shown to bind single-stranded nucleic acids in cooperation with the ORF1p C-terminal domain (14). The second ORF (ORF2) of plant L1 retrotransposons encodes a polyprotein (ORF2p) that exhibits apurinic/apyrimidinic endonuclease (APE) and reverse transcriptase (RT) activities. In some plant L1 retrotransposons, an additional domain can also be found within ORF2p that is homologous to ribonuclease H (RNH) (13). RNH proteins usually function as housekeeping enzymes to degrade RNA primers during replication. They are also ubiquitously present in LTR retrotransposons but less common in non-LTR elements. Despite the accumulating knowledge on plant L1 elements, only a few studies have so far attempted to analyze their diversity via a comprehensive genomic approach (13), leaving many questions regarding their mechanisms and evolution unanswered.

Previously, we identified three distinct groups of plant L1-like retrotransposons in flowering plants: Ta11, Beta vulgaris non-LTR retroelement (BNR), and Cin4 (13). Here, we extend this in silico analysis and perform comprehensive mining of L1-like retrotransposons in diverse plant genomes. Through phylogenetic and structural analysis, we find that these elements exhibit a great diversity of domain composition and identify two additional evolutionary groups of plant L1-like retrotransposons: the purine-rich domain-containing (PUR) and the nonseed land plant-specific (NSLP) groups. We also detect an unconventional RNH domain in ORF2p of some plant L1-like retrotransposons and show that this domain (i) is closely related to archaeal RNHs, (ii) has been acquired by a family of plant L1s, and (iii) is autonomously active in vitro. Based on these data, we hypothesize that L1 retrotransposons might function as vectors for the horizontal transfer of RNHs between plants, bacteria, and Archaea. Moreover, we reconstitute the process of plant L1-like retrotransposon evolution and argue that it mainly occurs in a modular fashion through a series of acquisition, loss, or exchange of independent functional domains.

Results and Discussion

Identification of L1 Retrotransposons from Plants.

To perform a comprehensive phylogenetic and evolutionary analysis of L1 retrotransposons across a wide range of plant taxa, we first aimed to expand the list of available elements and identified L1 elements from the fully sequenced genomes of moss Physcomitrella patens (Bryophyta), spikemoss Selaginella moellendorffii (Lycophyta), date palm Phoenix dactylifera (Monocots), tomato Solanum lycopersicum, and clementine Citrus clementina (Eudicots). Newly identified elements were then added to the elements reported in our previous study (13). This resulted in a comprehensive set of L1 retrotransposons from a total of 16 genomes (the list of all studied genomes is presented in Table S1), 14 of which were derived from seed plants and two species, P. patens and S. moellendorffii, represented nonseed land plants. In these 16 genomes, we found a total of 99 L1-like retrotransposons, 91 of which have not been reported previously in Repbase to our knowledge (Table S2). Thirty-three of the identified elements were represented by at least one complete and putatively intact copy within the respective genome.

These 99 elements were complemented with 50 retrotransposon sequences from 13 plant genomes that were previously reported in Repbase or GenBank, resulting in a total set of 149 elements from 29 genomes (Table S2).
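The dataset sizes quoted above can be cross-checked with a few lines of arithmetic. This is only a bookkeeping sketch: the numbers are taken from the text, and the variable names are illustrative rather than part of the original analysis.

```python
# Counts quoted in the text; names are illustrative, not from the study.
newly_mined = {"genomes": 16, "elements": 99}
from_databases = {"genomes": 13, "elements": 50}  # Repbase or GenBank

total_genomes = newly_mined["genomes"] + from_databases["genomes"]
total_elements = newly_mined["elements"] + from_databases["elements"]

# The 16 mined genomes split into seed and nonseed land plants.
seed_plant_genomes, nonseed_genomes = 14, 2
genome_split_consistent = (
    seed_plant_genomes + nonseed_genomes == newly_mined["genomes"]
)

# 91 of the 99 mined elements were not previously reported in Repbase.
already_in_repbase = newly_mined["elements"] - 91

print(total_genomes, total_elements, already_in_repbase)
```

The totals agree with the 149 elements from 29 genomes stated in the text, and imply that 8 of the 99 mined elements already had Repbase entries.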

Plant L1 Retrotransposon Families Have Diverse Domain Architecture.

For the above 149 elements, we performed phylogenetic analysis of their RT domain (Fig. S1) and analyzed their domain composition in general. Together, these analyses revealed that all investigated land plant L1-like retrotransposons form a distinct lineage within the L1 superfamily and can be classified into five distinct evolutionary families (Fig. 1 and Fig. S1). In addition to the three previously described families, Cin4, BNR, and Ta11, we identified two additional families: NSLP and PUR. Consistent with previous studies, we found that plant L1 retrotransposons contain APE, RT, and CCHC domains in their ORF2p. Ta11 family L1s also contain an unexpected but unambiguous RNH domain at the C terminus of ORF2p. Moreover, we found a striking diversity of domain architecture in ORF1p of the analyzed elements. In fact, each of the five plant L1 families is characterized by unique ORF1p domain architecture specific to the individual family. Identified ORF1p domains include RRM, CCHC, and PUR domains.

Fig. 1.

Diversity of plant L1 retrotransposons. The number and distribution of the elements identified in the study are shown in pie charts, with numbers displaying the number of elements in the highlighted group of plants (dark gray, nonseed; gray, dicots; white, monocots) from the corresponding plant L1 family. Structure schemes of plant L1 elements are also shown. CCHC, Zn-finger CCHC motif; N-RRM, N-terminal RRM of BNR retrotransposons.

The seven elements found in P. patens and S. moellendorffii clustered together in the phylogenetic tree and were grouped into a NSLP family. In ORF1p of these retrotransposons, we only detected a single CCHC motif (Fig. 1) and found no evidence for an RRM domain.

Cin4, the second plant L1-like retrotransposon family, was named after the Cin4 element from Z. mays (9). This family is likely specific to monocots. Its ORF1 encodes a protein with a double CCHC Zn-finger motif, followed by a C-terminal RRM domain (Fig. 1). Interestingly, HHpred analysis demonstrated that whereas several families of non-LTR retrotransposons contain RRM in their ORF1p (12, 14, 16), the RRM of Cin4 retrotransposons is related to mammalian nucleolin and U2AF(65) splicing factor RRMs but not to the RRM domain encoded by mammalian L1 retrotransposons, such as human Long Interspersed Nuclear Element (Fig. S2).

The third family of plant L1 retrotransposons, BNR, was previously described by Heitkam and Schmidt (12). Representatives of this family contain a specific type of RRM [N-terminal RRM (N-RRM) in Fig. 1] as part of ORF1p. Our analysis revealed that a conserved region downstream of the RRM described in previous research is actually a second RRM (termed simply RRM in Fig. 1), homologous to the RRM of Cin4 retrotransposons (Fig. S2).

The unique PUR family (13 elements; Fig. 1) is a sister group to the BNR family (Fig. S1) but is structurally distinct. ORF1p of PUR retrotransposons contains an RRM similar to that of Cin4 and to the second RRM of the BNR retrotransposons. However, in addition to RRM, the PUR family shows a domain that is remotely related to the so-called PUR (purine-rich) domain (Fig. 1 and Fig. S2). PUR domains are highly conserved nucleic acid-binding motifs known to encode sequence-specific DNA- and RNA-binding domains (17–19).

The Ta11 family of plant L1 retrotransposons was named after one of the earliest L1 elements described in A. thaliana (10). All members of the Ta11 family carry an RRM, followed by a Zn-finger CCHC motif in ORF1p. A comparative analysis revealed that RRM of Ta11 is related to RRMs of Cin4 and PUR and to the second RRM of the BNR retrotransposons (Fig. S2). The ORF2 polyprotein of Ta11 retrotransposons contains a C-terminal RNH domain (Fig. 1). The presence of an RNH domain is unique to the Ta11 family among plant L1 retrotransposons, which may indicate that this domain was acquired by a Ta11 ancestor following diversification of L1 families.

Together, our phylogenetic and structural analysis of a comprehensive set of 149 L1 elements from plants revealed that these elements have strikingly diverse domain architecture and identified an unexpected RNH domain in one L1 family (Ta11). These findings prompted further investigations into the evolutionary history of RNHs and plant L1s in general.
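The family-specific architectures described in this section can be summarized as a small lookup table. The sketch below encodes only what the text states; the dictionary layout is illustrative, and the order of RRM and PUR within ORF1p of the PUR family is an assumption, because the text does not specify it.

```python
# Domain architectures as described in the text (N- to C-terminal order
# where the text gives one). Layout and names are illustrative.
ORF2_CORE = ("APE", "RT", "CCHC")

FAMILIES = {
    "NSLP": {"orf1": ("CCHC",), "orf2": ORF2_CORE},
    "Cin4": {"orf1": ("CCHC", "CCHC", "RRM"), "orf2": ORF2_CORE},
    "BNR": {"orf1": ("N-RRM", "RRM"), "orf2": ORF2_CORE},
    # Relative order of RRM and PUR is an assumption.
    "PUR": {"orf1": ("RRM", "PUR"), "orf2": ORF2_CORE},
    # Ta11 carries the RNH domain at the very C terminus of ORF2p.
    "Ta11": {"orf1": ("RRM", "CCHC"), "orf2": ORF2_CORE + ("RNH",)},
}

# Each family should have its own ORF1p architecture.
orf1_architectures = [family["orf1"] for family in FAMILIES.values()]
all_orf1_unique = len(set(orf1_architectures)) == len(FAMILIES)

# Only one family should carry an RNH domain in ORF2p.
families_with_rnh = [
    name for name, family in FAMILIES.items() if "RNH" in family["orf2"]
]

# Families whose ORF1p lacks any RRM-type domain.
families_without_rrm = [
    name
    for name, family in FAMILIES.items()
    if not any(domain.endswith("RRM") for domain in family["orf1"])
]

print(all_orf1_unique, families_with_rnh, families_without_rrm)
```

Encoded this way, the table reproduces three statements from the text: every family has a distinct ORF1p architecture, RNH is restricted to Ta11, and NSLP is the only family without an RRM.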

RNH Domain of Ta11 L1-Like Retrotransposons Is Homologous to Archaeal RNHs.

RNH homologs can be found in virtually all organisms, from viruses to higher eukaryotes (6, 20, 21). They specifically degrade the RNA strand of RNA/DNA hybrid molecules, and thereby help to replicate and repair genomes (21–23). Based on their amino acid sequence and the presence of distinct functional sites, RNHs are classified as bacterial, eukaryotic, and archaeal (24) (Figs. 2, 3, and S3). However, this designation of individual RNH classes does not correlate with their actual distribution among taxa; for instance, archaeal RNH was also found in bacterial and plant genomes (25).

Fig. 2.

Amino acid sequence alignment of different types of RNHs. The different types of plant RNHs are marked in the left part of the alignment. The names of the three types of RNHs that are not associated with RT are shown in bold italic font. The name of the family and the names of Ta11 family representatives are shown in bold font. The basic protrusion that has been proposed to contribute to establishing RNA/DNA hybrid specificity is indicated with a dotted box. The semiconservative residue varying between the archaeal and bacterial RNHs is denoted by the bigger font at position 164 of the alignment. The residues believed to be important for the catalytic mechanism of RNHs are indicated with stars at the bottom of the alignment. The conserved residues are highlighted in black and gray. The secondary structure of E. coli RNH (Protein Data Bank ID code 1G15_A) is shown at the bottom of the alignment. The α-helices are depicted as helices, and the β-sheets are shown as arrows.

Fig. 3.

Maximum-likelihood tree based on the amino acid sequences of different types of RNHs. Statistical support was evaluated by the approximate likelihood-ratio test (aLRT). The different types of RNHs are denoted with brackets to the right of the tree. The names of Ta11 retrotransposons are shown in bold font. The names of the three types of RNHs that are not associated with RT are shown in bold italic font. Accession numbers of the sequences are provided in Fig. S3.

In retroviruses and retrotransposons, RNHs are usually encoded in association with RT as part of a multifunctional polyprotein. Here, their function is to eliminate the viral or transposon RNA template after reverse transcription (6). Depending on their origin, these enzymes are phylogenetically divided into LTR retrotransposon, non-LTR retrotransposon, and retroviral RNHs. Although LTR retrotransposons and retroviruses always contain an RNH domain, non-LTR retrotransposons can afford to lack RNH domains despite the fact that they largely depend on robust RNH activity. This is because their reverse transcription occurs in the nucleus, where they can hijack host RNH activity (6). Nevertheless, many non-LTR retrotransposons do contain their own RNH domain (26). This non-LTR RNH domain was proposed to have emerged in a common ancestor of non-LTR superfamilies, such as R1, I, Tad1, and LOA, after the divergence of L1 elements (27). In this context, our identification of an RNH domain in the L1 superfamily Ta11 retrotransposons is remarkable in itself, but its location within the ORF2p polyprotein is perhaps even more surprising. Namely, in Ta11 ORF2p, the RNH domain does not occur in direct association with RT as in other retrotransposons (27, 28); rather, it occurs at the very C terminus of ORF2p, separated from RT by a Zn-finger CCHC motif (Fig. 1).

Thus, to investigate further the origin of the unorthodox Ta11 RNH domain, we performed multiple alignments with RNH domains from diverse sources. This revealed that the Ta11 RNH domains lack a semiconservative His residue (bold H in Fig. 2) that is present in many RNHs, including cellular proteins from prokaryotes, eukaryotes, retroviruses, and non-LTR retrotransposons. Instead, the Ta11 RNH domain contains an Arg residue (bold R in Fig. 2), as was previously shown for an RNH from the archaeon Sulfolobus tokodaii (29).
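The residue comparison described above amounts to reading one column of a multiple alignment. The following is a minimal sketch of that idea only: the sequences are invented toy strings, not the sequences of Fig. 2, and the column index is hypothetical.

```python
def residue_class(aligned_seq: str, column: int) -> str:
    """Classify an aligned sequence by the residue at one alignment column."""
    residue = aligned_seq[column].upper()
    if residue == "H":
        return "His"
    if residue == "R":
        return "Arg"
    return "other"


# Invented toy alignment; NOT real RNH sequences. Column 3 stands in for
# the semiconservative position discussed in the text.
toy_alignment = {
    "bacterial_like": "GDSHLV",
    "Ta11_like": "GDSRLV",
    "archaeal_like": "GESRIV",
    "LTR_like": "GD-ALV",
}
COLUMN = 3

classes = {
    name: residue_class(seq, COLUMN) for name, seq in toy_alignment.items()
}
for name, label in classes.items():
    print(f"{name:15s} {label}")
```

In a real analysis the column would be located in the PROMALS3D alignment, and gap characters at that column would need to be handled explicitly.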

Interestingly, LTR retrotransposon RNHs contain neither a conserved His nor Arg in that position, and are therefore assumed to exert lower catalytic activity than all other RNHs. This is believed to be advantageous to preserve the polypurine tract primers that prime the synthesis of the second retrotransposon DNA strand (26). Non-LTR retrotransposons, however, depend on high RNH activity (either host-derived or self-encoded), because their RNA template must be completely degraded before second-strand DNA synthesis can occur during transposon integration (27). These mechanistic differences might explain the stronger selection for a highly active RNH containing all conserved catalytic residues in non-LTR retrotransposons as opposed to LTR retrotransposons. Even though retroviruses also rely on polypurine tract primers in the same way as LTR retrotransposons, their RNHs contain a well-conserved His residue. It appears that retroviruses apply a distinct mechanism to modulate their RNH activity via a retrovirus-specific connection domain that is absent in LTR retrotransposons (27).

The above analysis suggests that the Ta11 RNH domain is more similar to archaeal RNHs than to non-LTR RNHs. To investigate this further, we created a phylogenetic tree of RNHs (Figs. 3 and S3) using eukaryotic, bacterial, and retrotransposon-encoded RNHs, as well as archaeal RNHs. We also included archaeal RNH homologs that were previously reported on plant chromosomes (25), chloroplast, and mitochondrial genomes (30). This demonstrated that the Ta11 RNH and chromosomal Archaea-like RNH domains belong to a monophyletic group. The RNHs of LTR retrotransposons form an adjoining clade, with clear subbranching into the major LTR retrotransposon groups. On the other hand, RNH domains of non-LTR retrotransposons from non-L1 superfamilies do not seem to be related to L1 RNH domains and are rather more closely related to RNHs of class 2 retroviruses (Fig. 3). Interestingly, our phylogenetic analysis also revealed that RNHs of class 1 and class 3 retroviruses are clearly separated from RNHs of class 2 retroviruses (Fig. 3). They rather resemble eukaryotic and bacterial RNHs and possess a basic protrusion (Fig. 2), which was shown to be important for RNA/DNA hybrid binding in bacterial RNHs (20). None of the other RT-associated RNHs, including RNH of class 2 retroviruses, contains this basic protrusion (Fig. 2). In accordance with the previously proposed reacquisition hypothesis (26, 27), our data suggest that different retroviruses have acquired RNH domains on two different occasions from two independent sources. We propose that class 2 retroviruses reacquired RNH from non-LTR retrotransposons, whereas class 1 and 3 retroviruses have more likely acquired eukaryotic- or bacterial-like RNHs with basic protrusion.
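The notion of a monophyletic group used above can be made concrete with a small check on a rooted tree: a set of sequences is monophyletic if some clade contains exactly those sequences and no others. The sketch below uses a toy topology written as nested tuples; it is not the tree of Fig. 3, and the leaf names are illustrative.

```python
def leaves(node):
    """Return the set of leaf names under a node (leaf = str, clade = tuple)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Yield the leaf set of every clade in the tree, including the root."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly `group`."""
    return frozenset(group) in set(clades(tree))


# Toy rooted topology for illustration only.
toy_tree = (
    (("Ta11_RNH", "plant_archaea_like_RNH"), "archaeal_RNH"),
    ("bacterial_RNH", "eukaryotic_RNH"),
)

print(is_monophyletic(toy_tree, {"Ta11_RNH", "plant_archaea_like_RNH"}))
print(is_monophyletic(toy_tree, {"Ta11_RNH", "bacterial_RNH"}))
```

On this toy tree the first group forms a clade and the second does not. The same test applied to an inferred tree would also need to take branch support (here, aLRT values) into account.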

The unusual occurrence of an Archaea-like RNH domain in the Ta11 L1 retrotransposons, the lack of the domain in other L1 families, and its higher similarity to Archaea-like host RNHs in plant genomes than to RNHs from other non-LTR superfamilies suggest that plant Ta11 retrotransposons may have acquired this RNH domain from their host genome. Moreover, these findings also imply that acquisition of RNH by Ta11 retrotransposons occurred independently from all other RNH-containing non-LTR retrotransposons. On the other hand, our phylogenetic analysis also revealed an intriguing homology between several plant, bacterial, and archaeal RNHs, the source of which remains ambiguous. It was previously hypothesized that the RNH homologs in bacteria and plants might be derived via retroviral horizontal gene transfer from Archaea (25). However, retrotransposons have also been previously reported to be transferred horizontally (31–35) and could provide a vehicle for the interspecies transfer of RNH genes between Archaea, plants, and bacteria.

Ta11 RNH Domain Is Biochemically Active in Isolation.

Genes that are moved via horizontal transfer between species can only benefit the host if they retain their functionality during the transfer process. Therefore, if retrotransposons play a role in horizontal transfer of RNH, one can expect that their cargo RNH domain will remain active as a stand-alone protein. To test the independent biochemical activity of the Ta11 RNH domain, we performed in vitro activity assays using recombinantly expressed RNH from LIb, an L1 retrotransposon found in Ipomoea batatas. LIb was chosen because this Ta11 element has been demonstrated to transpose actively within its host genome (36). WT LIbRNH and its catalytic mutant (LIbRNH D1326N) were overexpressed in Escherichia coli, purified, and combined with a 5′-P32–labeled-poly(A)/poly(dT) 20-mer duplex substrate to test their activity. Upon incubation of WT LIbRNH with excess (100 nM) substrate, the RNA strand of the RNA/DNA hybrid was cleaved in the presence of either Mg2+ or Mn2+, whereas the catalytic mutant LIbRNH D1326N did not show any activity (Fig. 4). Furthermore, the detected activity was specific for the RNA/DNA hybrid, because the incubation of WT LIbRNH with a single-stranded RNA (ssRNA) substrate did not lead to substrate cleavage.

Fig. 4.

Enzymatic activity of the RNH domain of the LIb retrotransposon; wt denotes the WT LIbRNH protein, and mut denotes the mutant LIbRNH protein containing a conserved aspartate residue to asparagine residue substitution. Samples, including ssRNA and ssRNA together with WT LIbRNH in the presence of Mg2+, are shown in lanes 2 and 3, respectively. No cleavage of the ssRNA substrate is observed upon addition of the WT LIb protein. Lanes 5 and 13 represent the RNA/DNA hybrid without protein. The cleavage of the RNA/DNA hybrid substrate with addition of the WT LIbRNH protein is presented in lanes 6–8 (in the presence of Mg2+) and lanes 14–16 (in the presence of Mn2+). No cleavage is observed with the addition of the mutant protein (lanes 9–11 and lanes 17–19 in the presence of Mg2+ and Mn2+, respectively).

Plant L1 Retrotransposons Evolve in a Modular Fashion.

Botstein’s theory of modular evolution for viruses (37) states that the product of evolution is not a specific virus but a family of interchangeable modules, each with a particular biological function. Thus, modularity, or the clustering of epistatic interactions, is considered to be a determining feature of virus-like evolutionary systems and has also been proposed to be important for transposable element evolution (38). Recent work has shown that bacterial mobile DNA elements can acquire discrete functional modules (39), and numerous examples of the acquisition of novel modules by retrotransposons are also known (14, 40, 41). The exchange of modules between mobile elements produces structural variants of transposons with a potential evolutionary advantage and may yield novel transposon families (38, 39).

Above, we have described five families of L1 retrotransposons. Each of these families is characterized by a distinct module architecture assembled from a total of seven structural domains: two different RRM domains; a PUR domain; a Zn-finger CCHC motif; and APE, RT, and RNH domains. This structural diversity suggests that plant L1 retrotransposons may evolve in a modular fashion. To investigate this further, we sought to trace the timeline of evolution of these elements reconstructed based on their RT phylogeny.

As mentioned above, ORF1p of NSLP from nonseed plants does not contain a recognizable RNA-binding domain, except for the CCHC motif. In contrast, all seed plant L1 retrotransposons contain at least one RRM domain. For this reason, we speculate that NSLP elements represent precursor elements of seed plant L1 retrotransposons, which acquired an RRM domain later in evolution (Fig. 5). Nucleolin and U2AF(65) splicing factor homologs, which are present in plant genomes (42, 43), are the most likely sources of this domain. This module acquisition seems to represent the first major step in plant L1 retrotransposon evolution.

Fig. 5.

Modular evolution of plant L1 retrotransposons. The proposed evolutionary timeline leading to the emergence of modern plant L1 families is based on RT phylogeny and is indicated by arrows in the figure. Schemes of the structures of the modern plant L1 elements are shown inside the dotted box. The hypothetical precursor elements are shown in parentheses.

The next evolutionary step toward the establishment of modern L1 variants is more ambiguous. The most parsimonious model suggests a diversification through the preservation or loss of the C-terminal CCHC motif in ORF1p (Fig. 5). The first clade, which retained the CCHC motif, gave rise to the Ta11 retrotransposons. The fact that only the elements of the Ta11 family are present in both monocots and dicots is in good agreement with this model and supports the hypothesis that Ta11 is the most ancient family of seed plant L1 retrotransposons. Because we identified the RNH-containing Ta11 retrotransposons to be part of a monophyletic clade, we conclude that their next evolutionary step (Fig. 5) must have been the acquisition of an RNH domain at the C terminus of ORF2p of an ancestral Ta11 element. Moreover, the finding that archaeal RNH proteins homologous to the Ta11 RNH domain are present in plant genomes suggests that the most likely source of this domain is the host genome itself. Independent acquisition of this RNH domain, together with its autonomous biochemical activity, supports the view that it can be considered an “individual functional unit” or “module” in Botstein’s terminology (37).

The process of acquiring new domains to substitute for the lost Zn-finger CCHC motif in all other plant L1 retrotransposon families appears to have been the next event in evolution. According to our evolutionary reconstruction, the CCHC motif was replaced with RRM or PUR domains or with a double N-terminal CCHC motif; these structural variants gave rise to the BNR, PUR, and Cin4 retrotransposons, respectively (Fig. 5). Although the presence of an obligatory RRM domain in all seed plant L1 retrotransposons probably reflects its importance for ORF1p function, the presence of additional RNA-binding structures (e.g., Zn-finger CCHC motif, PUR domain, additional RRM domain) in all these elements implies that a single RRM might not be sufficient to execute ORF1p’s multiple and diverse functions.

Taken together, these data demonstrate that the evolution of plant L1 retrotransposons is best described as a series of acquisition and elimination events of various functional domains, and they suggest that modularity is a major trend in plant L1 retrotransposon evolution.
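The proposed timeline can be written out as a sequence of domain gain and loss operations on ORF1p and ORF2p. This is a minimal sketch of the model as described in the text, not a reconstruction method: the helper functions are illustrative, and the placement of the PUR domain relative to the RRM is an assumption.

```python
# Architectures are tuples in N- to C-terminal order.

def gain(domains, domain, n_terminal=False):
    """Return a new architecture with `domain` added at one end."""
    return (domain,) + domains if n_terminal else domains + (domain,)


def lose(domains, domain):
    """Return a new architecture with the first `domain` removed."""
    i = domains.index(domain)
    return domains[:i] + domains[i + 1:]


# Step 1: an NSLP-like precursor (CCHC only) acquires an RRM.
nslp_like = ("CCHC",)
seed_ancestor = gain(nslp_like, "RRM", n_terminal=True)

# Step 2a: the lineage that keeps the C-terminal CCHC leads to Ta11,
# which later gains RNH at the C terminus of ORF2p.
ta11 = seed_ancestor
ta11_orf2 = gain(("APE", "RT", "CCHC"), "RNH")

# Step 2b: the other lineage loses CCHC, then gains a replacement module.
rrm_only = lose(seed_ancestor, "CCHC")
bnr = gain(rrm_only, "N-RRM", n_terminal=True)
pur = gain(rrm_only, "PUR")  # position relative to RRM is assumed
cin4 = gain(gain(rrm_only, "CCHC", n_terminal=True), "CCHC", n_terminal=True)

orf1_by_family = {
    "NSLP": nslp_like, "Ta11": ta11, "BNR": bnr, "PUR": pur, "Cin4": cin4,
}
for family, arch in orf1_by_family.items():
    print(f"{family:5s} ORF1p: {'-'.join(arch)}")
print("Ta11  ORF2p:", "-".join(ta11_orf2))

# Distinct modules used across all five families.
modules = set(ta11_orf2).union(*orf1_by_family.values())
```

Starting from a single CCHC-only precursor, a handful of gain and loss steps is enough to produce all five modern ORF1p architectures, and the resulting module set contains the seven structural domains listed in the text.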

In summary, we present a detailed analysis of the L1 superfamily of non-LTR retrotransposons in plants and classify them into five distinct families. These data provide the community with an extensive resource on plant L1 elements and a reference system with which to classify novel elements identified in the future. By further analyzing these plant L1 families, we also show that, similar to viruses, modular evolution is a key characteristic of plant L1 retrotransposon evolution. Moreover, we find that one family, Ta11, has acquired a functional RNH domain. Dissimilarity of this Archaea-like RNH domain in plant L1s to RNH domains of other non-LTR retrotransposons negates the previously proposed hypothesis (27) that RNH domains in non-LTR retrotransposons have a single origin and provides evidence that such acquisition has happened at least twice.

Experimental Procedures

Whole-Genome Sequence Analysis and Comparative Analysis.

The genomic sequences used in this study were retrieved from databases as listed in Table S2. Table S2 also contains a list of all retrotransposons found. We used the HMMER2 algorithm embedded in the UGENE software (44) to identify L1-like retrotransposons from whole-genome sequences and retrieved whole, intact copies of elements as described previously (28, 45).

RT alignment was performed using the MUSCLE algorithm (46). The phylogenetic analysis was performed using the maximum-likelihood method implemented in the phylogenetic estimation using the maximum likelihood (PhyML) program package (47). The approximate likelihood-ratio test of the branches was used for statistical support (48). The profile multiple alignment with predicted local structures and 3D constraints (PROMALS3D) server (49) was used to align the amino acid sequences of RRM, PUR, and RNH (Fig. S2). We used the HHpred online server for homology detection and structure prediction when searching for structural domains in putatively intact ORF proteins (50).

Plasmid Design and Protein Purification.

The nucleotide sequence of the LIb retrotransposon RNH domain corresponding to amino acids 1,205–1,366 of the ORF2 polyprotein (GenBank accession no. BAE79382) was codon-optimized and synthesized using GeneArt (www.lifetechnologies.com). This construct was cloned into the pETM-22 expression vector. The resulting plasmid (p22LIbRNHwt) encoded the WT LIbRNH domain fused to an N-terminal thioredoxin-6×His tag. The expression vector p22LIbRNHmut contained a mutant RNH variant carrying a mutation at the conserved aspartate 1,326 (D1326N), which rendered the RNH inactive (51). E. coli strain BL21(DE3) cells were transformed with the above-described expression vectors. Cells were lysed in lysis buffer [100 mM Tris (pH 7.5), 1 M LiCl, 500 mM NaCl, 0.1 mM Tris(2-carboxyethyl)phosphine (TCEP), 5 mM imidazole, and 5% (vol/vol) glycerol], and the supernatant was applied to a Ni2+-Sepharose column (GE Healthcare). We used 100 mM Tris (pH 7.5), 500 mM NaCl, 0.1 mM TCEP, 5% (vol/vol) glycerol, and 100 mM imidazole as an eluent, according to the manufacturer’s instructions (GE Healthcare). The tags were removed through PreScission protease (PepCore; the European Molecular Biology Laboratory Heidelberg) cleavage, and the protein was further purified on an S200 (16/60) gel-filtration column (GE Healthcare). Protein purity was >90% as evaluated by SDS/PAGE.

RNH Activity Assay.

A 20-mer polyA RNA strand and a complementary polyT DNA strand were chemically synthesized by Integrated DNA Technologies (www.idtdna.com). The ssRNA substrate was 5′-P32–labeled, and the RNA/DNA duplexes were prepared by hybridizing the labeled RNA strands with a twofold excess of cDNA oligomers. For in vitro cleavage assays, purified RNH was incubated with the 20-mer duplex substrate for 1 h at room temperature in a buffer containing 50 mM Tris⋅HCl (pH 7.0), 200 mM NaCl, 0.1 mM TCEP, 5% (vol/vol) glycerol, 20 μg/mL BSA, and 5 mM MgCl2 or MnCl2. Increasing protein concentrations (1–10 μM) were tested while the substrate concentration was constant (100 nM). To demonstrate specificity for the RNA/DNA duplex, activity was also tested against an ssRNA substrate. All reactions were terminated with proteinase K incubation, and products were purified by ethanol precipitation. The results were analyzed on 12% (vol/vol) Tris-Borate-EDTA-urea gel electrophoresis and imaged with a Fuji FLA 7000 phosphorimager.
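The concentrations given for the cleavage assay fix the molar ratio of enzyme to substrate, which can be worked out directly. The short calculation below uses only the values stated in this section; the function and variable names are illustrative.

```python
# Concentrations stated in the assay description.
SUBSTRATE_NM = 100.0          # RNA/DNA duplex, held constant
PROTEIN_RANGE_UM = (1.0, 10.0)  # lowest and highest RNH concentration tested


def molar_ratio(protein_um: float, substrate_nm: float) -> float:
    """Return the protein:substrate molar ratio (both converted to nM)."""
    return (protein_um * 1000.0) / substrate_nm


ratios = [molar_ratio(p, SUBSTRATE_NM) for p in PROTEIN_RANGE_UM]
for protein_um, ratio in zip(PROTEIN_RANGE_UM, ratios):
    print(f"{protein_um:g} uM RNH : {SUBSTRATE_NM:g} nM duplex = {ratio:g}:1")
```

With these values the protein is present at a 10- to 100-fold molar excess over the duplex across the tested range.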

Acknowledgments

We thank the Protein Expression and Purification Core Facility and Proteomics Core Facility at the European Molecular Biology Laboratory for technical support. This work was supported by the Ministry of Education and Science of the Russian Federation (State Contract no. 14.740.11.1191), by the Russian Academy of Sciences (Program nos. 6.6, 28, and VI.61.1.2), and by intramural funding at the European Molecular Biology Laboratory. G.S. was supported by a German Academic Exchange Service scholarship.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1310958110/-/DCSupplemental.

