Abstract
We describe a program, tRNAscan-SE, that identifies 99–100% of transfer RNA genes in DNA sequence while producing fewer than one false positive per 15 gigabases. Two previously described tRNA detection programs serve as fast first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work is a practical application of RNA covariance models, which are general, probabilistic secondary-structure profiles based on stochastic context-free grammars. tRNAscan-SE searches at approximately 30 000 bp/s. Additional extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes.
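The two-stage design described above works like this: cheap prefilters nominate candidates, and a slower but highly selective structural model makes the final call. The toy Python sketch below illustrates that pattern, but every component is a hypothetical stand-in. `stem_prefilter` is not either of the two earlier tRNA detectors. `nussinov_max_pairs` is Nussinov base-pair maximisation, a far simpler dynamic program than a covariance model. The 76-nt window and both thresholds are illustrative values, not parameters from the paper. The sketch only shows the cost structure: the cubic-time scorer runs solely on windows that survive a constant-cost check.

```python
from typing import List, Tuple

# Toy pairing rule: Watson-Crick pairs plus G-U wobble (RNA alphabet).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    return (a, b) in PAIRS


def stem_prefilter(window: str, stem: int = 7) -> int:
    """Hypothetical cheap first pass: count pairs an acceptor-stem-like helix
    could form between the 5' end and the reversed 3' end of the window."""
    five, three = window[:stem], window[-stem:][::-1]
    return sum(can_pair(x, y) for x, y in zip(five, three))


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Expensive second pass: O(n^3) Nussinov base-pair maximisation.
    A stand-in for covariance-model scoring, NOT a tRNA covariance model."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]  # dp[i][j] = max pairs within seq[i..j]
    for span in range(min_loop + 1, n):  # pairs need > min_loop loop bases
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if can_pair(seq[i], seq[j]):
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # bifurcation into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


def scan(seq: str, window: int = 76, prefilter_min: int = 6,
         final_min: int = 20) -> List[Tuple[int, int]]:
    """Slide a fixed window; run the costly scorer only on prefilter survivors.
    Window size and thresholds are hypothetical illustration values."""
    hits = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        if stem_prefilter(w) < prefilter_min:
            continue  # most windows should be discarded here cheaply
        score = nussinov_max_pairs(w)
        if score >= final_min:
            hits.append((start, score))
    return hits
```

On a sequence that cannot pair at all, such as a run of A, every window is rejected by the prefilter and the cubic scorer never runs. That division of labour is what makes it practical to apply a selective but slow model genome-wide. A real covariance model would instead score each candidate with log-odds probabilities derived from a stochastic context-free grammar, rather than simply counting base pairs.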
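The throughput and specificity figures quoted above translate into simple back-of-the-envelope estimates. The sketch below makes three assumptions. It treats the quoted rate as uniform across inputs. It reads the false-positive figure as an upper bound. It uses a 3 Gb genome size purely as a hypothetical input. The abstract does not say whether the rate counts one strand or both, so the estimate inherits that ambiguity.

```python
# Figures quoted in the abstract; treating them as uniform is an assumption.
SEARCH_RATE_BP_PER_S = 30_000   # approximately 30 000 bp/s
FALSE_POSITIVE_BASES = 15e9     # fewer than 1 false positive per 15 Gb


def search_time_seconds(n_bases: float, rate: float = SEARCH_RATE_BP_PER_S) -> float:
    """Estimated wall-clock search time at a constant rate."""
    return n_bases / rate


def max_expected_false_positives(n_bases: float) -> float:
    """Upper bound on expected false positives for n_bases of searched sequence."""
    return n_bases / FALSE_POSITIVE_BASES


if __name__ == "__main__":
    genome = 3e9  # hypothetical ~3 Gb input
    secs = search_time_seconds(genome)
    print(f"{secs:.0f} s (~{secs / 3600:.1f} h); "
          f"< {max_expected_false_positives(genome):.2f} false positives")
```

For a 3 Gb input this gives about 100,000 s (roughly 28 hours) of search time and at most about 0.2 expected false positives.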