Abstract
A central question in sequence comparison is the statistical significance of an observed similarity. For local alignments in which gaps are allowed in order to optimize sequence similarity, this problem has not yet been solved mathematically. Building on the Chen-Stein theory of Poisson approximation, we present a practical method to approximate the probability that a local alignment score is the result of chance alone. For a given set of similarity scores and gap penalties, only one simulation of random alignments needs to be calculated to derive the key information that allows us to estimate the significance of any alignment calculated under this setting. We present applications to database searching and to the analysis of pairwise and self-comparisons of proteins.
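The abstract does not state the approximation itself. As a minimal sketch, assume it takes the Poisson-type form P(S >= t) ≈ 1 - exp(-gamma * m * n * p^t), where m and n are the sequence lengths and gamma and p are the two constants estimated once from the simulation for a fixed scoring scheme and gap penalties. The functional form, the function name, and the numeric values of `GAMMA` and `P` below are illustrative assumptions, not values taken from the paper.

```python
import math


def poisson_pvalue(score, m, n, gamma, p):
    """Approximate P(S >= score) for two random sequences of lengths m and n.

    Assumes the Poisson-type form 1 - exp(-gamma * m * n * p**score);
    gamma and p are constants estimated once per scoring scheme.
    """
    expected_hits = gamma * m * n * p ** score
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-expected_hits)


# Hypothetical constants for illustration only (not from the paper).
GAMMA = 0.05
P = 0.8

if __name__ == "__main__":
    for t in (40, 60, 80):
        print(t, poisson_pvalue(t, 300, 300, GAMMA, P))
```

Under this assumed form, once gamma and p are known for a scoring scheme, the significance of any score at any pair of sequence lengths follows from a single function call, which is what makes one simulation per setting sufficient.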
