Nucleic Acids Research. 1986 Jan 10;14(1):597–610. doi: 10.1093/nar/14.1.597

Improving the efficiency of dot-matrix similarity searches through use of an oligomer table.

B Fristensky
PMCID: PMC339447  PMID: 3753792

Abstract

Dot-matrix sequence similarity searches can be greatly sped up by using a table that lists all locations of short oligomers in one of the sequences to find potential similarities with a second sequence. The algorithm described finds similarities between two sequences of lengths M and N, comparing L residues at a time, with an efficiency of $L \times M \times N / S^k$, where S is the alphabet size and k is the length of the oligomer. For nucleic acids, in which S = 4, use of a tetranucleotide table (k = 4) results in an efficiency of $L \times M \times N / 256$. The simplicity of the approach allows for a straightforward calculation of the level of similarities expected to be found for given search parameters. Furthermore, the storage required is minimal, allowing even large sequences to be compared on small microcomputers. Theoretical considerations regarding the use of this search are discussed.
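
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the choice to seed each window comparison with an exact k-mer match at the window start, and the `min_match` threshold are all assumptions made for illustration. The sketch builds a table of k-mer positions for one sequence, then scans the second sequence and compares L-residue windows only where the table reports a seed match, which is where the division by S^k in the cost estimate comes from.

```python
from collections import defaultdict


def build_oligomer_table(seq, k):
    """Map each k-mer in seq to the list of 0-based positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def dot_matrix_hits(seq_a, seq_b, k=4, window=10, min_match=8):
    """Return sorted (i, j) pairs whose windows share at least min_match residues.

    Assumption for this sketch: a window pair is examined only when the
    first k residues of both windows are identical (the table lookup).
    """
    if window < k:
        raise ValueError("window must be at least as long as the oligomer")
    table = build_oligomer_table(seq_a, k)
    hits = set()
    for j in range(len(seq_b) - window + 1):
        # Look up every position in seq_a that starts with the same k-mer.
        for i in table.get(seq_b[j:j + k], ()):
            if i + window > len(seq_a):
                continue  # window would run past the end of seq_a
            matches = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= min_match:
                hits.add((i, j))
    return sorted(hits)


def expected_work(m, n, window, alphabet_size, k):
    """Cost estimate from the abstract: L * M * N / S**k residue comparisons."""
    return window * m * n / alphabet_size ** k


if __name__ == "__main__":
    a = "ACGTACGTTTGACCA"
    b = "GGACGTACGTTTCC"
    print(dot_matrix_hits(a, b, k=4, window=8, min_match=8))
    print(expected_work(1000, 1000, 10, 4, 4))
```

With S = 4 and k = 4, only about 1 in 256 position pairs is expected to pass the table lookup for random sequences, so the full window comparison runs far less often than in an exhaustive M × N scan. The table itself holds one position entry per residue of the indexed sequence, which is consistent with the modest storage requirement the abstract mentions.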



Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press