Abstract
A simple method of sequence comparison, based on a correlation analysis of oligonucleotide frequency distributions, is here shown to be a reliable test of overall sequence similarity. The method does not involve sequence alignment procedures and permits the rapid screening of large amounts of sequence data. It identifies those sequences which deserve more careful analysis of sequence similarity at the level of resolution of the single nucleotide. It uses observed quantities only and does not involve the adoption of any theoretical model.
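The abstract does not give the oligonucleotide length or the correlation statistic, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes trinucleotides (`k=3`) counted over overlapping windows and a Pearson correlation coefficient between the two frequency vectors. The function names `kmer_frequencies`, `pearson`, and `oligo_similarity` are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=3):
    """Relative frequencies of all 4**k oligonucleotides, in a fixed order.

    Assumption: overlapping windows; windows containing symbols other
    than A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[w] / total for w in kmers]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def oligo_similarity(seq_a, seq_b, k=3):
    """Alignment-free similarity: correlate the two k-mer frequency vectors."""
    return pearson(kmer_frequencies(seq_a, k), kmer_frequencies(seq_b, k))
```

Because each sequence is reduced to a fixed-length vector of 4^k frequencies, a score of this kind can be computed without any alignment step. Pairs that score highly would then be candidates for a nucleotide-level alignment.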