Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):215–226. doi: 10.1093/nar/12.1part1.215

On the statistical significance of nucleic acid similarities.

D J Lipman, W J Wilbur, T F Smith, M S Waterman
PMCID: PMC320998  PMID: 6694902

Abstract

When evaluating sequence similarities among nucleic acids by the usual methods, statistical significance is often found when the biological significance of the similarity is dubious. We demonstrate that the known statistical properties of nucleic acid sequences strongly affect the statistical distribution of similarity values when calculated by standard procedures. We propose a series of models which account for some of these known statistical properties. The utility of the method is demonstrated in evaluating high relative similarity scores in four specific cases in which there is little biological context by which to judge the similarities. In two of the cases we identify the statistical properties which are responsible for the apparent similarity. In the other two cases the statistical significance of the similarity persists even when the known statistical properties of sequences are modelled. For one of these cases biological significance is likely while the other case remains an enigma.
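The general idea of judging a similarity score against randomized sequences that retain known statistical properties can be sketched as follows. This is only an illustration under stated assumptions, not the models of the paper, which the abstract does not specify. It assumes a Smith-Waterman style local score with arbitrary illustrative weights, a first null model that preserves base composition only, and a second null model that draws from a first-order Markov chain fitted to each sequence's neighbouring-base counts. All function names and parameter values are hypothetical.

```python
import random
from statistics import mean, stdev

def local_score(a, b, match=1.0, mismatch=-0.9, gap=-2.0):
    """Best local alignment score (Smith-Waterman recursion, linear gaps).
    The weights are illustrative assumptions, not values from the paper."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for ca in a:
        cur = [0.0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0.0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best

def shuffle_composition(seq, rng):
    """Null model 1 (assumed): permute the bases, keeping composition only."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

def markov_sample(seq, rng):
    """Null model 2 (assumed): sample a first-order Markov chain whose
    transitions follow the neighbouring-base counts observed in seq."""
    followers = {}
    for x, y in zip(seq, seq[1:]):
        followers.setdefault(x, []).append(y)
    out = [seq[0]]
    while len(out) < len(seq):
        nxt = followers.get(out[-1])
        # Fall back to the overall composition if a base has no observed successor.
        out.append(rng.choice(nxt) if nxt else rng.choice(seq))
    return "".join(out)

def z_score(a, b, randomize, trials=200, seed=0):
    """Standardize the observed score against scores of randomized pairs."""
    rng = random.Random(seed)
    observed = local_score(a, b)
    null = [local_score(randomize(a, rng), randomize(b, rng))
            for _ in range(trials)]
    sd = stdev(null)
    return (observed - mean(null)) / sd if sd > 0 else float("nan")
```

Calling `z_score(a, b, shuffle_composition)` and `z_score(a, b, markov_sample)` on the same pair gives two standardized scores. A similarity that looks significant under the first null but not under the second would, in this sketch, be attributable to neighbouring-base statistics rather than to a specific shared subsequence.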



Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
