Comparative and Functional Genomics. 2001 Feb;2(1):4–9. doi: 10.1002/cfg.61

Sequence Search Algorithms for Single Pass Sequence Identification: Does One Size Fit All?

K Cara Woodwark 1, Simon J Hubbard 1, Stephen G Oliver 2
PMCID: PMC2447189  PMID: 18628895

Abstract

Bioinformatic tools have become essential to biologists in their quest to understand the vast quantities of sequence data, and now whole genomes, which are being produced at an ever-increasing rate. Many of these sequence data are single-pass sequences, such as sample sequences from organisms closely related to other organisms of interest that have already been sequenced, or cDNAs or expressed sequence tags (ESTs). These single-pass sequences often contain errors, including frameshifts, which complicate the identification of homologues, especially at the protein level. Therefore, sequence searches with this type of data are often performed at the nucleotide level. The most commonly used sequence search algorithms for the identification of homologues are Washington University’s and the National Center for Biotechnology Information’s (NCBI) versions of the BLAST suite of tools, which are available on websites all over the world. The work reported here examines the use of these tools for comparing sample sequence datasets to a known genome. It shows that care must be taken when choosing the parameters to use with the BLAST algorithms. When both are used with their default parameters, NCBI’s version of gapped BLASTn gives top alignments that are much shorter than, and sometimes different from, those found using Washington University’s version of BLASTn (which also allows for gaps). Most of the differences in performance were found to be due to the choices of default parameters rather than underlying differences between the two algorithms. The results from Washington University’s version, used with defaults, compare very favourably with those obtained using the accurate but computationally intensive Smith–Waterman algorithm.
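The dependence of the top alignment on scoring parameters can be illustrated with a minimal Smith–Waterman sketch in Python. This is not the code used in the study and it is not BLAST: the function name `smith_waterman`, the linear gap penalty, the toy sequences and the scoring values are all assumptions chosen for illustration, and they do not correspond to the defaults of either BLAST implementation.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b under a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not the defaults of any BLAST program.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy query and subject differing by a single inserted base (hypothetical data).
query, subject = "ACGTACGT", "ACGTTACGT"
print(smith_waterman(query, subject, gap=-2))   # cheap gaps: score 14, one gap
print(smith_waterman(query, subject, gap=-10))  # costly gaps: score 10, no gap
```

With the cheaper gap penalty the top alignment spans both sequences in full and contains one gap. With the costlier penalty the same algorithm reports only the ungapped five-base match `TACGT`. The algorithm is unchanged between the two calls, which mirrors on a small scale the observation that parameter choices alone can shorten or alter the top alignment.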



Articles from Comparative and Functional Genomics are provided here courtesy of Wiley
