Abstract
Autonomous non-long terminal repeat (non-LTR) retrotransposons and their repetitive remnants are ubiquitous components of mammalian genomes. Recently, we identified two non-LTR retrotransposon families, Ingi-1_AAl and Ingi-1_EE, in two hedgehog genomes. Here we rename them Vingi-1_AAl and Vingi-1_EE and report a new clade, “Vingi,” which is a sister clade of Ingi and lacks the ribonuclease H domain. In the European hedgehog genome, there are 11 nonautonomous families of elements derived from Vingi-1_EE by internal deletions. No retrotransposons related to Vingi elements were found in any of the 33 other mammalian genomes that have been nearly completely sequenced to date, but we identified several new families of Vingi and Ingi retrotransposons outside mammals. Our data suggest horizontal transfer of Vingi elements to hedgehogs, although vertical transfer cannot be ruled out. The compact structure and trans-mobilization of nonautonomous derivatives of Vingi may make them useful for an in vivo retrotransposition assay system.
Keywords: non-LTR retrotransposon, Ingi, Vingi, hedgehog, mammal, nonautonomous, horizontal transfer, reverse transcriptase
Abundant proliferation of long interspersed elements 1 (LINE1), also known as L1 elements, is an outstanding feature of eutherian and marsupial genomes (Lander et al. 2001; Waterston et al. 2002; Lindblad-Toh et al. 2005; Gentles et al. 2007). Their ongoing activity in the human genome has been the subject of detailed studies (Moran et al. 1996; Sassaman et al. 1997; Brouha et al. 2003). Other types of non-LTR retrotransposons have also left their footprints in mammalian genomes. Non-LTR retrotransposons have been divided into over 28 clades (Malik et al. 1999; Kapitonov et al. 2009), and at least 6 clades (L1, L2, CR1, RTE, RTEX, and R4) are present as fossils in the human and other mammalian genomes (Lander et al. 2001; Kapitonov and Jurka 2003; Jurka et al. 2005; Gentles et al. 2007; Jurka et al. 2007).
Recently, we identified a new kind of mammalian non-LTR retrotransposon present in two species of hedgehogs (Jurka and Kojima 2010; Kapitonov and Jurka 2010). These elements are related to the Ingi clade, first identified in trypanosomes (Kimmel et al. 1987) and afterward in metazoans (Kapitonov and Jurka 2009; Kapitonov et al. 2009).
We began with an analysis of Ingi-1_EE from the European hedgehog Erinaceus europaeus (Jurka and Kojima 2010) and Ingi-1_AAl from the middle-African hedgehog Atelerix albiventris (Kapitonov and Jurka 2010). We can exclude the possibility of contamination (supplementary material S1 and table S1, Supplementary Material online). Their consensus nucleotide sequences are ∼97% identical to each other. They encode a single 993-aa protein that does not contain any ribonuclease H (RNH) domain, unlike Ingi elements from trypanosomes and sea slug (fig. 1A). Ingi elements from Branchiostoma floridae and from Strongylocentrotus purpuratus also lack the RNH domain. Phylogenetic analysis shows that these non-LTR retrotransposons belong to a separate new clade, which we call Vingi (fig. 1 and supplementary table S2, Supplementary Material online). The hedgehog Vingi elements have a very compact structure. The 5′ untranslated regions (UTRs) of their consensus sequences are only ∼90 bp long, and the stop codons terminating the single open reading frame overlap with the 3′ terminal AAG repeats, leaving no significant 3′ UTR.
Approximately 240 copies of Vingi-1_EE (formerly Ingi-1_EE) in the available sequence data are of full length, and all are >90% identical to their consensus sequence. However, in the available genomic sequences, only one Vingi-1_EE copy (scaffold_379327: 57840-54759) encodes a full-length protein uninterrupted by stop codons (supplementary material S2, Supplementary Material online). This copy is ∼98% identical to the consensus sequence at the DNA level and codes for a 993-aa protein that is 96% identical to the protein encoded by the consensus sequence.
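As an illustration of the criterion used here, the following Python sketch checks whether a copy carries an open reading frame that is uninterrupted by stop codons. The function names and the ORF coordinates passed to them are hypothetical, the standard genetic code is assumed, and a copy annotated on the minus strand (coordinates running from high to low) would first be reverse-complemented. It is not the procedure used in this study.

```python
# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_uninterrupted_orf(seq, orf_start, orf_end):
    """Check an ORF given by 1-based inclusive coordinates.

    orf_end is expected to include the terminal stop codon. Returns True
    only if the span is a whole number of codons, ends in a stop codon,
    and contains no stop codon before the last one.
    """
    orf = seq[orf_start - 1:orf_end].upper()
    if len(orf) < 6 or len(orf) % 3 != 0:
        return False
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    internal_stop = any(codon in STOP_CODONS for codon in codons[:-1])
    return codons[-1] in STOP_CODONS and not internal_stop
```

A frameshifted or truncated copy fails the length check, and a copy with a premature stop fails the internal scan, so both kinds of damage are reported the same way.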
The autonomous Vingi-1_EE family is associated with 11 nonautonomous families that were derived from Vingi-1_EE via different internal deletions and amplified to relatively high copy numbers (table 1, supplementary figure S1 and material S2, Supplementary Material online). The total copy number of all autonomous and nonautonomous Vingi elements in E. europaeus is estimated at ∼50,000, and they occupy 0.8% of the available genomic sequences. We did not find any nonautonomous Vingi elements in A. albiventris.
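An internal deletion of this kind can be described by the segments of the autonomous consensus that are retained. The Python sketch below, with hypothetical function names, parses a coordinate string such as "1-80, 3062-3085" (1-based, inclusive) and joins the retained segments of a consensus sequence. The length obtained this way is only an approximation, because the consensus of a nonautonomous family can differ from the joined segments by a few bases.

```python
def parse_segments(spec):
    """Parse '1-80, 3062-3085' into [(1, 80), (3062, 3085)]."""
    segments = []
    for part in spec.split(","):
        start, end = part.strip().split("-")
        segments.append((int(start), int(end)))
    return segments


def retained_length(segments):
    """Total number of consensus positions kept by the segments."""
    return sum(end - start + 1 for start, end in segments)


def apply_internal_deletion(consensus, segments):
    """Join the retained 1-based inclusive segments of a consensus."""
    return "".join(consensus[start - 1:end] for start, end in segments)
```

For example, `retained_length(parse_segments("1-80, 3062-3085"))` counts the positions retained from the two termini of the autonomous element.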
Table 1.
Name | Length (bp) | Copy number of full-length copies | Corresponding positions in Vingi-1_EE | Average identity ± standard deviation (a) |
Vingi-1_EE | 3085 | 242 | 1-3085 | 0.9782 ± 0.0058 |
Vingi-1N1_EE | 102 | 92 | 1-80, 3062-3085 | 0.9758 ± 0.0154 |
Vingi-1N2_EE | 403 | 20 | 1-129, 2820-3082 | 0.9882 ± 0.0715 |
Vingi-1N3_EE | 195 | 6273 | 1-149, 3036-3081 | 0.9751 ± 0.0003 |
Vingi-1N4_EE | 574 | 101 | 1-170, 2679-3085 | 0.9723 ± 0.0141 |
Vingi-1N5_EE | 628 | 106 | 1-211, 2669-3085 | 0.9731 ± 0.0131 |
Vingi-1N6_EE | 339 | 12 | 1-209, 2953-3082 | 0.9758 ± 0.1151 |
Vingi-1N7_EE | 874 | 8 | 1-212, 2324-2405, 2502-3081 | 0.9777 ± 0.1729 |
Vingi-1N8_EE | 680 | 12 | 1-222, 2625-3082 | 0.9743 ± 0.1149 |
Vingi-1N9_EE | 619 | 5 | 1-243, 2709-3084 | 0.9814 ± 0.2776 |
Vingi-1N10_EE | 730 | 12 | 1-271, 2623-3081 | 0.9838 ± 0.1159 |
Vingi-1N11_EE | 675 | 8 | 3-321, 2726-3080 | 0.9849 ± 0.1741 |
(a) CpG dinucleotides in the consensus sequences, which are prone to rapid decay to TpG and CpA, were excluded from the identity calculations.
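The exclusion of CpG sites from the identity calculation can be sketched in Python as follows. This is a minimal illustration rather than the code used for table 1: the function names are hypothetical, each copy is assumed to be aligned to the consensus column by column with "-" marking gaps, and columns with a gap in either sequence are skipped, which is an assumption of this sketch. The sample standard deviation is used because the source does not state which estimator was applied.

```python
import statistics


def cpg_mask(consensus):
    """Mark alignment columns that belong to a CpG in the consensus."""
    masked = [False] * len(consensus)
    # Gap columns are ignored so that C-G pairs split by a gap still count.
    cols = [i for i, base in enumerate(consensus) if base != "-"]
    for a, b in zip(cols, cols[1:]):
        if consensus[a].upper() == "C" and consensus[b].upper() == "G":
            masked[a] = masked[b] = True
    return masked


def identity_excluding_cpg(consensus, copy):
    """Fraction of matching columns outside consensus CpG sites."""
    if len(consensus) != len(copy):
        raise ValueError("sequences must be aligned to the same length")
    masked = cpg_mask(consensus)
    compared = matches = 0
    for i, (c, x) in enumerate(zip(consensus, copy)):
        if masked[i] or c == "-" or x == "-":
            continue
        compared += 1
        matches += c.upper() == x.upper()
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared


def summarize(identities):
    """Mean and sample standard deviation of per-copy identities."""
    return statistics.mean(identities), statistics.stdev(identities)
```

Masking both positions of each consensus CpG means that a C-to-T or G-to-A transition at such a site is not counted as a mismatch, so rapidly decaying sites do not inflate the apparent divergence of a family.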
We found no Ingi or Vingi elements in any sequenced mammalian genomes other than hedgehogs. Outside of mammals, Ingi and Vingi elements were found in vertebrate, annelid, mollusk, nematode, and insect species (supplementary table S2, Supplementary Material online).
Based on the reverse transcriptase (RT) phylogeny, Ingi and Vingi elements cluster together inside the I group with high statistical support (fig. 1 and supplementary fig. S2, Supplementary Material online). Based on the phylogeny and domain structure (fig. 1), Ingi and Vingi elements can be classified into five clusters (Ingi-A, Ingi-B, Ingi-C, Vingi-A, and Vingi-B). The monophyly of the Vingi-A cluster is also supported by the substitutions at the highly conserved SDH triplet in the apurinic-like endonuclease domain (supplementary fig. S3, Supplementary Material online).
Vertebrate Vingi elements cluster into four lineages (fig. 1B, thick lines). Among the vertebrate hosts of Vingi, Anolis carolinensis is the species most closely related to hedgehogs, yet Vingi-1_Acar and Vingi-2_Acar are not closely related to the Vingi elements from hedgehogs (Vingi-1_EE and Vingi-1_AAl). The sister lineage of the hedgehog Vingi elements (Vingi-1_PMa, Vingi-1_Lme, Vingi-1_Lch, Vingi-1_Tcas, and Vingi-1_BM) shows a phylogenetic relationship similar to that of the host species. This relationship is consistent with vertical transmission of Vingi elements in several bilaterian lineages for several hundred million years. In contrast, the hedgehog Vingi lineage does not include any retrotransposons from other species.
The new clade of non-LTR retrotransposons called “Vingi” includes two clusters: Vingi-A and Vingi-B (fig. 1). No Vingi elements encode RNH. The phylogeny of Vingi-1_PMa, Vingi-1_Lme, Vingi-1_Lch, Vingi-1_Tcas, and Vingi-1_BM indicates that the origin of Vingi can be traced back to the last common ancestor of all bilaterians. Although the clustering of Vingi-A and Vingi-B is not well supported statistically in the maximum likelihood phylogeny, these two clusters are generally positioned together independently of different methods of multiple alignment and evolutionary models. The Vingi clade possibly represents an internal branch inside of the Ingi clade, and Vingi-A and Vingi-B are likely to become separate clades after more elements are characterized.
Aside from the two hedgehog species, no Vingi or other elements from the I group could be identified in any of the remaining 33 mammalian genomes sequenced to date. Many fragments of non-LTR retrotransposons that likely predate the eutherian radiation (e.g., L2 and L3) are still present in the human genome (Lander et al. 2001; Kapitonov and Jurka 2003). To date, all non-LTR retrotransposons presumed to have been horizontally transferred into mammals are members of the RTE clade, represented by Bov-B (Kordis and Gubensek 1998; Zupunski et al. 2001; Gentles et al. 2007; Gogolevsky et al. 2008). The compact and simple structure of Vingi is comparable to that of RTE elements. Although it remains possible that Vingi elements exist in other mammals not yet sequenced, we consider horizontal transfer the most likely scenario for the evolution of the hedgehog Vingi non-LTR retrotransposons.
Our knowledge of the retrotransposition of non-LTR retrotransposons is largely based on the study of mammalian L1 (Moran et al. 1996; Sassaman et al. 1997; Babushok and Kazazian 2007). However, L1 has some characteristics that distinguish it from other non-LTR retrotransposons. For example, it shows only weak sequence dependence on its own 3′ tail, which leads to processed pseudogene formation and retrotransposition of 3′ flanking sequences (Moran et al. 1999; Esnault et al. 2000). To understand the general features of non-LTR retrotransposon mobilization, we need another non-LTR retrotransposon family that works in the same experimental system. The existence of Vingi-1N1_EE indicates that the 5′ 80-bp and 3′ 25-bp termini alone are sufficient for mobilization in trans by the Vingi-1_EE protein. This property may be useful for inserting reporter genes in future experimental studies of Vingi.
Materials and Methods
Genomic sequences of various species were obtained mostly from NCBI GenBank, and prototypic sequences of known non-LTR retrotransposons were obtained from Repbase (http://www.girinst.org/repbase). Unpublished genomic sequences of the European hedgehog and of other mammals used in this study were deposited in GenBank by the Broad Institute (see Acknowledgments).
New Ingi and Vingi non-LTR retrotransposons were identified by repeated BLAST (Altschul et al. 1997) and CENSOR (Kohany et al. 2006) searches of genomic sequences of various species, using Ingi and Vingi elements as queries. The consensus sequences were derived using the majority rule applied to the corresponding set of multiply aligned copies of retrotransposons. We considered a particular copy of Vingi-1_EE or of its nonautonomous derivatives to be of full length when it started within the first 10 nucleotides of the consensus sequence and ended inside the terminal AAG repeats.
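The majority rule and the full-length criterion can be illustrated with a short Python sketch. It is a simplified stand-in, not the software used in this study: the function names are hypothetical, columns in which a gap is the most frequent character are dropped from the consensus, ties are resolved in favor of the character seen first, and `tail_start` (the consensus position at which the terminal AAG repeats begin) is a parameter the caller must supply.

```python
from collections import Counter


def majority_consensus(aligned_copies):
    """Majority-rule consensus of equally long, gap-aligned copies."""
    if not aligned_copies:
        raise ValueError("no sequences given")
    length = len(aligned_copies[0])
    if any(len(seq) != length for seq in aligned_copies):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_copies):
        base, _count = Counter(b.upper() for b in column).most_common(1)[0]
        if base != "-":  # drop columns where a gap is the majority
            consensus.append(base)
    return "".join(consensus)


def is_full_length(start, end, tail_start, max_start=10):
    """Apply the full-length criterion to consensus coordinates.

    start and end are the 1-based consensus positions covered by a copy.
    The copy qualifies if it starts within the first max_start
    nucleotides and ends at or beyond the start of the AAG repeats.
    """
    return start <= max_start and end >= tail_start
```

Working in consensus coordinates keeps the criterion independent of small insertions or deletions inside an individual copy.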
We aligned the RT domain sequences of non-LTR retrotransposons with the aid of either MAFFT (Katoh et al. 2005) or MUSCLE (Edgar 2004). We constructed maximum likelihood trees with PhyML (Guindon and Gascuel 2003; Guindon et al. 2005), using either bootstrap values (100 replicates) or approximate likelihood ratio test nonparametric branch support based on a Shimodaira–Hasegawa–like procedure (Anisimova and Gascuel 2006), under three different amino acid substitution models: RtREV, LG, and WAG. The phylogenetic trees were drawn with the aid of FigTree 1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/).
Supplementary Materials
Supplementary figures S1–S3, tables S1 and S2, and materials S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
We thank the Broad Institute Genome Sequencing Platform and Genome Sequencing and Analysis Program, Federica Di Palma, and Kerstin Lindblad-Toh for making the data for E. europaeus and other mammalian genomes available. This work was supported by the National Institutes of Health grant 5 P41 LM006252. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine or the National Institutes of Health.
References
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
- Anisimova M, Gascuel O. Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Syst Biol. 2006;55:539–552. doi: 10.1080/10635150600755453.
- Babushok DV, Kazazian HH Jr. Progress in understanding the biology of the human mutagen LINE-1. Hum Mutat. 2007;28:527–539. doi: 10.1002/humu.20486.
- Brouha B, Schustak J, Badge RM, Lutz-Prigge S, Farley AH, Moran JV, Kazazian HH Jr. Hot L1s account for the bulk of retrotransposition in the human population. Proc Natl Acad Sci U S A. 2003;100:5280–5285. doi: 10.1073/pnas.0831042100.
- Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
- Esnault C, Maestre J, Heidmann T. Human LINE retrotransposons generate processed pseudogenes. Nat Genet. 2000;24:363–367. doi: 10.1038/74184.
- Gentles AJ, Wakefield MJ, Kohany O, Gu W, Batzer MA, Pollock DD, Jurka J. Evolutionary dynamics of transposable elements in the short-tailed opossum Monodelphis domestica. Genome Res. 2007;17:992–1004. doi: 10.1101/gr.6070707.
- Gogolevsky KP, Vassetzky NS, Kramerov DA. Bov-B-mobilized SINEs in vertebrate genomes. Gene. 2008;407:75–85. doi: 10.1016/j.gene.2007.09.021.
- Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003;52:696–704. doi: 10.1080/10635150390235520.
- Guindon S, Lethiec F, Duroux P, Gascuel O. PHYML Online–a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 2005;33:W557–W559. doi: 10.1093/nar/gki352.
- Jurka J, Kapitonov VV, Kohany O, Jurka MV. Repetitive sequences in complex genomes: structure and evolution. Annu Rev Genomics Hum Genet. 2007;8:241–259. doi: 10.1146/annurev.genom.8.080706.092416.
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–467. doi: 10.1159/000084979.
- Jurka J, Kojima K. Ingi non-LTR retrotransposons from mammals. Repbase Rep. 2010;10:154.
- Kapitonov VV, Jurka J. The esterase and PHD domains in CR1-like non-LTR retrotransposons. Mol Biol Evol. 2003;20:38–46. doi: 10.1093/molbev/msg011.
- Kapitonov VV, Jurka J. New families of I non-LTR retrotransposons from animals. Repbase Rep. 2009;9:1529–1534.
- Kapitonov VV, Jurka J. Ingi-1_AAl, a family of Ingi non-LTR retrotransposons from middle-African hedgehogs. Repbase Rep. 2010;10:156.
- Kapitonov VV, Tempel S, Jurka J. Simple and fast classification of non-LTR retrotransposons based on phylogeny of their RT domain protein sequences. Gene. 2009;448:207–213. doi: 10.1016/j.gene.2009.07.019.
- Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33:511–518. doi: 10.1093/nar/gki198.
- Kimmel BE, ole-MoiYoi OK, Young JR. Ingi, a 5.2-kb dispersed sequence element from Trypanosoma brucei that carries half of a smaller mobile element at either end and has homology with mammalian LINEs. Mol Cell Biol. 1987;7:1465–1475. doi: 10.1128/mcb.7.4.1465.
- Kohany O, Gentles AJ, Hankus L, Jurka J. Annotation, submission and screening of repetitive elements in Repbase: repbaseSubmitter and Censor. BMC Bioinformatics. 2006;7:474. doi: 10.1186/1471-2105-7-474.
- Kordis D, Gubensek F. Unusual horizontal transfer of a long interspersed nuclear element between distant vertebrate classes. Proc Natl Acad Sci U S A. 1998;95:10704–10709. doi: 10.1073/pnas.95.18.10704.
- Lander ES, Linton LM, Birren B, et al. (100 co-authors) Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
- Lindblad-Toh K, Wade CM, Mikkelsen TS, et al. (46 co-authors) Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005;438:803–819. doi: 10.1038/nature04338.
- Malik HS, Burke WD, Eickbush TH. The age and evolution of non-LTR retrotransposable elements. Mol Biol Evol. 1999;16:793–805. doi: 10.1093/oxfordjournals.molbev.a026164.
- Moran JV, DeBerardinis RJ, Kazazian HH Jr. Exon shuffling by L1 retrotransposition. Science. 1999;283:1530–1534. doi: 10.1126/science.283.5407.1530.
- Moran JV, Holmes SE, Naas TP, DeBerardinis RJ, Boeke JD, Kazazian HH Jr. High frequency retrotransposition in cultured mammalian cells. Cell. 1996;87:917–927. doi: 10.1016/s0092-8674(00)81998-4.
- Sassaman DM, Dombroski BA, Moran JV, Kimberland ML, Naas TP, DeBerardinis RJ, Gabriel A, Swergold GD, Kazazian HH Jr. Many human L1 elements are capable of retrotransposition. Nat Genet. 1997;16:37–43. doi: 10.1038/ng0597-37.
- Waterston RH, Lindblad-Toh K, Birney E, et al. (222 co-authors) Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562. doi: 10.1038/nature01262.
- Zupunski V, Gubensek F, Kordis D. Evolutionary dynamics and evolutionary history in the RTE clade of non-LTR retrotransposons. Mol Biol Evol. 2001;18:1849–1863. doi: 10.1093/oxfordjournals.molbev.a003727.