Nucleic Acids Research. 1998 Jan 1;26(1):21–26. doi: 10.1093/nar/26.1.21

The Genome Sequence DataBase (GSDB): improving data quality and data access.

C Harger, M Skupski, J Bingham, A Farmer, S Hoisie, P Hraber, D Kiphart, L Krakowski, M McLeod, J Schwertfeger, G Seluja, A Siepel, G Singh, D Stamper, P Steadman, N Thayer, R Thompson, P Wargo, M Waugh, J J Zhuang, P A Schad
PMCID: PMC147232  PMID: 9399793

Abstract

In 1997 the primary focus of the Genome Sequence DataBase (GSDB; www.ncgr.org/gsdb), located at the National Center for Genome Resources, was to improve data quality and accessibility. Efforts to increase the quality of data within the database included two major projects: one to identify and remove all vector contamination from sequences in the database, and one to create premier sequence sets (including both alignments and discontiguous sequences). Data accessibility was improved during the past year in several ways. First, a graphical database sequence viewer was made available to researchers. Second, an update process was implemented for the web-based query tool, Maestro. Third, a web-based tool, Excerpt, was developed to retrieve selected regions of any sequence in the database. Finally, a GSDB flatfile that contains annotation unique to GSDB (e.g., sequence analysis and alignment data) was developed. Additionally, the GSDB web site provides a tool for the detection of matrix attachment regions (MARs), which can be used to identify regions of high coding potential. The ultimate goal of this work is to make GSDB a more useful resource for genomic comparison studies and gene-level studies by improving data quality and by providing data access capabilities consistent with the needs of both types of studies.



Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
