Nucleic Acids Research. 1994 Jun 11;22(11):2079–2088. doi: 10.1093/nar/22.11.2079

RNA sequence analysis using covariance models.

S R Eddy, R Durbin
PMCID: PMC308124  PMID: 8029015

Abstract

We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences.
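The abstract does not spell out how covariation between positions is measured. As a minimal sketch, assuming mutual information between alignment columns is used as the covariation signal (a common choice in comparative RNA analysis), the following computes it for a toy alignment. The function name `column_mutual_information`, the gap character `-`, and the toy sequences are illustrative assumptions, not taken from the article.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Return the mutual information (in bits) between columns i and j.

    `alignment` is a list of equal-length aligned sequences. Rows with a
    gap ("-") in either column are skipped (an assumption of this sketch).
    """
    pairs = [(seq[i], seq[j]) for seq in alignment
             if seq[i] != "-" and seq[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0

    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n            # observed joint frequency
        p_a = left_counts[a] / n    # marginal frequency in column i
        p_b = right_counts[b] / n   # marginal frequency in column j
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (hypothetical): columns 0 and 4 always form a
# Watson-Crick pair, while columns 1 to 3 are invariant.
toy_alignment = [
    "GAAAC",
    "CAAAG",
    "AAAAU",
    "UAAAA",
]

paired = column_mutual_information(toy_alignment, 0, 4)
unpaired = column_mutual_information(toy_alignment, 0, 1)
print(f"columns 0 and 4: {paired:.2f} bits")
print(f"columns 0 and 1: {unpaired:.2f} bits")
```

In this toy case the two covarying columns carry high mutual information even though neither is conserved in sequence, whereas an invariant column carries none. That is the kind of signal a model of secondary structure consensus can exploit and a purely primary-sequence profile cannot.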

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
