Nucleic Acids Research. 1996 Jul 15;24(14):2730–2739. doi: 10.1093/nar/24.14.2730

PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames.

E Birney, J D Thompson, T J Gibson
PMCID: PMC145991  PMID: 8759004

Abstract

DNA translation frames can be disrupted for several reasons, including: (i) errors in sequence determination; (ii) RNA processing, such as intron removal and guide RNA editing; (iii) less commonly, polymerase frameshifting during transcription or ribosomal frameshifting during translation. Frameshifts frequently confound computational activities involving homologous sequences, such as database searches and inferences on structure, function or phylogeny made from multiple alignments. A dynamic alignment algorithm is reported here which compares a protein profile (a residue scoring matrix for one or more aligned sequences) against the three translation frames of a DNA strand, allowing frameshifting. The algorithm has been incorporated into a new package, WiseTools, for comparison of biological sequences. A protein profile can be compared against either a DNA sequence or a protein sequence. The program PairWise may be used interactively for alignment of any two sequence inputs. SearchWise can perform combinations of searches through DNA or protein databases by a protein profile or DNA sequence. Routine application of the programs has revealed a set of database entries with frameshifts caused by errors in sequence determination.
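The core idea of the abstract, aligning a protein profile against all three translation frames of one DNA strand while permitting frameshifts, can be illustrated with a small dynamic-programming sketch. This is not the WiseTools implementation: the recurrence, the function names (`profile_from_sequence`, `best_frameshift_score`), and the penalty values are all assumptions chosen for illustration. The profile is reduced to a per-position dictionary of residue scores, gaps are scored linearly, and a frameshift is modelled as skipping one or two nucleotides at a fixed cost.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def profile_from_sequence(protein, match=5):
    """Hypothetical one-sequence profile: each position rewards only its own residue."""
    return [{aa: match} for aa in protein]


def best_frameshift_score(profile, dna, gap=-8, frameshift=-12, default=-4):
    """Best local score of a profile against a DNA strand in any frame.

    best[i][j] is the best score of a local alignment that ends after
    profile position i and after nucleotide j. Because j runs over every
    nucleotide, all three frames are considered in a single pass.
    """
    n, m = len(profile), len(dna)
    best = [[0] * (m + 1) for _ in range(n + 1)]
    top = 0
    for i in range(n + 1):
        for j in range(m + 1):
            options = [0]  # local alignment: may start anywhere
            if i >= 1 and j >= 3:
                # Profile position i aligned to the codon ending at nucleotide j.
                aa = CODON_TABLE.get(dna[j - 3:j], "X")
                options.append(best[i - 1][j - 3] + profile[i - 1].get(aa, default))
            if i >= 1:
                # Profile position with no codon opposite it.
                options.append(best[i - 1][j] + gap)
            if j >= 3:
                # Extra codon in the DNA with no profile position opposite it.
                options.append(best[i][j - 3] + gap)
            if j >= 1:
                # Frameshift: skip one nucleotide.
                options.append(best[i][j - 1] + frameshift)
            if j >= 2:
                # Frameshift: skip two nucleotides.
                options.append(best[i][j - 2] + frameshift)
            best[i][j] = max(options)
            top = max(top, best[i][j])
    return top


if __name__ == "__main__":
    profile = profile_from_sequence("MKTAYIAK")
    clean = "ATGAAAACCGCTTATATTGCGAAA"             # encodes MKTAYIAK in frame
    shifted = "ATGAAAACCGCT" + "G" + "TATATTGCGAAA"  # one inserted nucleotide
    print(best_frameshift_score(profile, clean))
    print(best_frameshift_score(profile, shifted))
    # Making frameshifts prohibitively expensive mimics a frame-by-frame search.
    print(best_frameshift_score(profile, shifted, frameshift=-1000))
```

In this toy case the inserted nucleotide splits the coding region across two frames. With the frameshift move available, the alignment can continue through the insertion at a fixed cost; with that move effectively disabled, only one half of the match can be recovered in any single frame.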



Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
