Nucleic Acids Research. 1985 Jan 25;13(2):645–656. doi: 10.1093/nar/13.2.645

The statistical distribution of nucleic acid similarities.

T F Smith, M S Waterman, C Burks
PMCID: PMC341021  PMID: 3871073

Abstract

All pairs of a large set of known vertebrate DNA sequences were searched by computer for most similar segments. Analysis of these data shows that the computed similarity scores are distributed proportionally to the logarithm of the product of the lengths of the sequences involved. This distribution is closely related to recent results of Erdős and others on the longest run of heads in coin tossing. A simple rule is derived for determining the statistical significance of the similarity scores and for relating statistical to biological significance.
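The coin-tossing result mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure or their significance rule; it only shows the well-known behaviour that the longest run of heads in n tosses grows roughly like log n to base 1/p. The function names, the trial counts, and the choice p = 0.25 (the chance that two bases match if all four bases are equally likely) are assumptions made for illustration.

```python
import math
import random


def longest_head_run(n, p, rng):
    """Return the longest run of consecutive heads in n tosses, P(head) = p."""
    best = current = 0
    for _ in range(n):
        if rng.random() < p:
            current += 1
            if current > best:
                best = current
        else:
            current = 0
    return best


def mean_longest_run(n, p, trials, rng):
    """Average the longest head run over a number of independent trials."""
    total = sum(longest_head_run(n, p, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    p = 0.25  # assumed match probability for equally frequent bases
    for n in (1_000, 10_000, 100_000):
        observed = mean_longest_run(n, p, trials=20, rng=rng)
        predicted = math.log(n) / math.log(1 / p)  # log base 1/p of n
        print(f"n={n:>7}  mean longest run={observed:.2f}  log_(1/p) n={predicted:.2f}")
```

Multiplying n by ten adds a roughly constant amount to the mean longest run, which is the logarithmic growth that the abstract relates to similarity scores and sequence lengths.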


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
