Abstract
We propose a new method for homology searches of nucleic acid or protein sequences in databanks. Every possible subsequence of a given length in a sequence is converted into a code and stored in an indexed file (hash coding). Encoding an entire bank in this preliminary step is time-consuming, but it gives immediate access to all sequence fragments of a given type. With our method, a strict homology pattern of twenty nucleotides can be found, for example, in the Los Alamos bank (GENBANK) in less than 2 seconds. The same data storage can also be used to speed up non-strict homology search programs considerably and to build a program that helps select nucleic acid hybridization probes.
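The indexing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the word length `k`, the base-4 encoding, the in-memory dictionary standing in for the indexed file, and the function names `encode`, `build_index`, and `find_exact` are all assumptions made for illustration. A strict match for a pattern longer than `k` is found by looking up its first `k` letters and then verifying the remainder directly against the sequence.

```python
from collections import defaultdict

# Assumed 2-bit code per nucleotide (not specified in the abstract).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(word):
    """Pack a nucleotide word into a base-4 integer; None if it has other symbols."""
    value = 0
    for ch in word:
        digit = CODE.get(ch)
        if digit is None:
            return None
        value = value * 4 + digit
    return value


def build_index(sequences, k):
    """Map the code of every length-k subsequence to its (name, position) occurrences.

    A dict stands in here for the on-disk indexed file mentioned in the abstract.
    """
    index = defaultdict(list)
    for name, seq in sequences.items():
        for pos in range(len(seq) - k + 1):
            code = encode(seq[pos:pos + k])
            if code is not None:
                index[code].append((name, pos))
    return index


def find_exact(pattern, sequences, index, k):
    """Find strict matches of a pattern of length >= k using the index."""
    if len(pattern) < k:
        raise ValueError("pattern must be at least k letters long")
    code = encode(pattern[:k])
    if code is None:
        return []
    hits = []
    # Only candidates sharing the first k letters are checked in full.
    for name, pos in index.get(code, []):
        if sequences[name].startswith(pattern, pos):
            hits.append((name, pos))
    return hits


# Toy data, invented for this example.
sequences = {"seq1": "ACGTACGTGA", "seq2": "TTACGTGACC"}
index = build_index(sequences, k=4)
print(find_exact("ACGTGA", sequences, index, k=4))  # [('seq1', 4), ('seq2', 2)]
```

Building the index costs one pass over the whole bank, but each query then touches only the positions that share the pattern's first `k` letters, which is the trade-off the abstract describes.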