Proceedings of the National Academy of Sciences of the United States of America. 1992 May 15;89(10):4698–4702. doi: 10.1073/pnas.89.10.4698

Finding errors in DNA sequences.

J Posfai, R J Roberts
PMCID: PMC49150  PMID: 1316617

Abstract

An algorithm is described that can detect certain errors within coding regions of DNA sequences. The algorithm is based on the idea that an insertion or deletion error within a coding sequence would interrupt the reading frame, so that the correct translation of the DNA sequence would require one or more frameshifts. If the coding sequence shows similarity to a known protein sequence, such errors can be detected by comparing the conceptual translations of the DNA sequence in all six reading frames with every sequence in a protein sequence data base. We have incorporated these ideas into a computer program, called DETECT, that can serve as an aid to experimentalists determining new DNA sequences, so that obvious errors may be located and corrected. The program has been tested on raw experimental data and on sequences from the European Molecular Biology Laboratory data base that are annotated as containing frameshifts. We have also tested it on unidentified open reading frames that flank known, annotated genes in the GenBank data base. Many potential errors are apparent, and in some cases functions can be suggested for the "corrected" versions of these reading frames, leading to the identification of new genes. As more sequences are determined, the power of this method will increase substantially.
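The abstract states the idea but not DETECT's implementation, so the following is only a minimal sketch of the underlying signal, under stated assumptions. It translates a DNA sequence in all six reading frames with the standard genetic code. In place of the database similarity search the paper describes, it uses a deliberate simplification: exact peptide matches of length `k` (a hypothetical parameter, default 6) between each frame and a single known protein. A frameshift error then appears as a point where consecutive matches, taken in order along the protein, move from one frame to another. The function names `six_frames`, `frame_hits` and `frame_switches`, as well as the example protein and gene, are hypothetical and not part of DETECT.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated in TCAG order,
# first base varying slowest. '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from the first base; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Conceptual translations in frames +1..+3 and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rc = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def frame_hits(dna: str, protein: str, k: int = 6) -> list[tuple[int, str]]:
    """Sorted (protein position, frame) pairs for exact k-residue matches.

    Hypothetical simplification: exact peptide k-mers stand in for a scored
    similarity search against a protein database.
    """
    index: dict[str, list[int]] = {}
    for i in range(len(protein) - k + 1):
        index.setdefault(protein[i:i + k], []).append(i)
    hits = []
    for frame, peptide in six_frames(dna).items():
        for j in range(len(peptide) - k + 1):
            for i in index.get(peptide[j:j + k], []):
                hits.append((i, frame))
    return sorted(hits)


def frame_switches(hits: list[tuple[int, str]]) -> list[tuple[str, str, int, int]]:
    """Consecutive hits (in protein order) that lie in different frames:
    (frame before, frame after, last position before, first position after)."""
    return [(f1, f2, i1, i2)
            for (i1, f1), (i2, f2) in zip(hits, hits[1:]) if f1 != f2]


# Hypothetical example: a 16-residue protein and a coding sequence for it.
protein = "MADKLWEGHSTRPQFV"
gene = "ATGGCTGACAAACTGTGGGAAGGCCATTCCACCCGTCCGCAGTTCGTT"
read = gene[:22] + gene[23:]  # simulate a deletion inside the Gly (GGC) codon
print(frame_switches(frame_hits(read, protein)))  # [('+1', '+3', 1, 8)]
```

In this example the upstream residues match frame +1 and the downstream residues match frame +3. The switch between k-mers starting at protein positions 1 and 8 (zero-based) therefore brackets the damaged glycine codon at position 7. A +1 to +3 switch is what a single-base deletion (or a two-base insertion) produces, while a single-base insertion would give +1 to +2. Exact k-mer matching is brittle on real data, because homologs differ by substitutions and short chance matches occur in long sequences. For that reason, a scored similarity search is the more realistic choice.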
