Abstract
A consensus sequence for the human long interspersed repeated DNA element, L1Hs (LINE or KpnI sequence), is presented. The sequence contains two open reading frames (ORFs) which are homologous to ORFs in corresponding regions of L1 elements in other species. The L1Hs ORFs are separated by a small evolutionarily nonconserved region. The 5′ end of the consensus contains frequent terminators in all three reading frames and has a relatively high GC content with numerous stretches of weak homology with AluI repeats. The 5′ ORF extends for a minimum of 723 bp (241 codons). The 3′ ORF is 3843 bp (1281 codons) and predicts a protein of 149 kD which has regions of weak homology to the polymerase domain of various reverse transcriptases. The 3′ end of the consensus has a 208-bp nonconserved region followed by an adenine-rich end. The organization of the L1Hs consensus sequence resembles the structure of eukaryotic mRNAs except for the noncoding region between ORFs. However, due to base substitutions or truncation most elements appear incapable of producing mRNA that can be translated. Our observation that individual elements cluster into subfamilies on the basis of the presence or absence of blocks of sequence, or by the linkage of alternative bases at multiple positions, suggests that most L1 sequences were derived from a small number of structural genes. An estimate of the mammalian L1 substitution rate was derived and used to predict the age of individual human elements. From this it follows that the majority of human L1 sequences have been generated within the last 30 million years. The human elements studied here differ from each other, yet overall the L1Hs sequences demonstrate a pattern of species-specificity when compared to the L1 families of other mammals. Possible mechanisms that may account for the origin and evolution of the L1 family are discussed. These include pseudogene formation (retroposition), transposition, gene conversion, and RNA recombination.
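The arithmetic behind two statements in the abstract can be sketched briefly. The first function checks the reported ORF lengths against their codon counts. The second illustrates, under a simple assumption that is not taken from the article, how a substitution rate could convert an element's divergence from the consensus into an age: divergence is treated as accumulating linearly, with no correction for multiple substitutions at a site. The function names and the numerical inputs to the age example are hypothetical and are not the values or the method used by the authors.

```python
def codons_in_orf(length_bp: int) -> int:
    """Return the number of codons in an ORF of the given length in bp."""
    if length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return length_bp // 3


def element_age_myr(divergence: float, rate_per_site_per_myr: float) -> float:
    """Estimate an element's age in million years (Myr).

    Assumption (illustrative only): divergence from the consensus
    accumulates linearly at a constant rate, so age = divergence / rate.
    """
    if rate_per_site_per_myr <= 0:
        raise ValueError("substitution rate must be positive")
    return divergence / rate_per_site_per_myr


# ORF lengths reported in the abstract
print(codons_in_orf(723))   # 241
print(codons_in_orf(3843))  # 1281

# Hypothetical inputs, chosen only to show the calculation:
# 5% divergence from the consensus at 0.5% substitutions per site per Myr
print(element_age_myr(0.05, 0.005))
```

Under the linear assumption, an element's estimated age scales directly with its divergence, so a rate estimate of this kind is what allows the distribution of element ages to be read from sequence differences.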
Footnotes
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. J03034.