Abstract
The DBcat (http://www.infobiogen.fr/services/dbcat) is a comprehensive catalog of biological databases, maintained and curated at Infobiogen. It contains 500 databases classified by application domain. The DBcat is a structured flat-file library that can be searched by means of an SRS server or a dedicated Web interface. The files are available for download from the Infobiogen anonymous FTP server.
INTRODUCTION
Identifying sources of information is crucial in today’s fast-moving world of Biology. It allows researchers to find information rapidly for their experiments; it gives biologists a place (sometimes the only one) where their results can be published; and it gives the science policy makers who decide whether or not to fund databases a means of analyzing and influencing the directions of research (1). In Biology, as in other sciences, information is scattered across various heterogeneous databases, and this situation is unlikely to change with the announced deluge of data from the genomic and proteomic projects (2).
We developed the DBcat, a comprehensive catalog of 500 biological databases, to help all kinds of biological database users identify the information they are seeking.
ORGANIZATION
DBcat has a simple data model, implemented as database entries in a flat file. All entries share the same structure: a list of fields and their associated values. The main fields are the database name (NAME), its description (DESCRIPTION), its domain (DNA, RNA, Protein, Literature...), authors and contacts (AUTHOR, CONTACT) and the WWW address (URL-FTP) where the database can be browsed. As an example, Figure 1 shows the entry corresponding to the DBcat database itself.
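As a rough illustration of this field/value model, the sketch below parses entries from a flat file into dictionaries. The on-disk layout is an assumption made only for the example (one `FIELD: value` pair per line, entries terminated by a `//` line); the actual DBcat file syntax, the `DOMAIN` field tag, the function name `parse_entries` and the sample entry `ExampleDB` are all hypothetical and not taken from the DBcat distribution.

```python
from typing import Dict, List


def parse_entries(text: str) -> List[Dict[str, str]]:
    """Parse 'FIELD: value' lines into one dict per entry.

    Assumed layout (hypothetical, not the documented DBcat syntax):
    one field per line, each entry terminated by a line holding only '//'.
    """
    entries: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "//":
            # End of an entry: store it and start a fresh one.
            if current:
                entries.append(current)
            current = {}
        elif line:
            # Split on the first colon only, so URLs in the value survive.
            field, _, value = line.partition(":")
            current[field.strip()] = value.strip()
    if current:
        # Tolerate a final entry without a trailing terminator.
        entries.append(current)
    return entries


# Hypothetical sample entry using the field names mentioned in the text.
SAMPLE = """\
NAME: ExampleDB
DESCRIPTION: An illustrative database entry
DOMAIN: Protein
URL-FTP: http://example.org/exampledb
//
"""

entries = parse_entries(SAMPLE)
print(entries[0]["NAME"], entries[0]["URL-FTP"])
```

Keeping each entry as a plain dictionary mirrors the flat list of fields described above and makes it easy to filter entries by any field.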
DATA ACQUISITION
The DBcat was started at Généthon in 1994 as part of a technological survey that also gave birth to a catalog of software for molecular biology and genetics: the BioCatalog (http://www.ebi.ac.uk/biocat/) (3). The DBcat is now produced at Infobiogen. New databases are searched for on the Web, by means of either general-purpose Web search engines or biology-oriented Web sites. Journals, such as the Nucleic Acids Research Database Issue, are also consulted. The producers of each database are asked, via Email, to complete a form and to check their entry in DBcat. They can also use a Web form for spontaneous submissions (http://www.infobiogen.fr/services/dbcat/file/dbcat_form.html). If the author has validated the entry corresponding to their database, it is marked as CHECKED.
ACCESS
The DBcat contains 500 database entries, available in one flat file. To reflect the areas of interest of the users, the database entries are also grouped into eight application domains: DNA, RNA, Protein, Genomics, Mapping, Protein structure, Literature, Miscellaneous. The number of databases listed in each domain is given in Table 1. Note that a database may belong to several domains; only the first one listed is counted in the statistics.
Table 1. Statistics of the DBcat entries per application domain.
Domain | No. of records |
---|---|
DNA | 82 |
RNA | 29 |
Protein | 93 |
Genomics | 58 |
Mapping | 30 |
Protein structure | 18 |
Literature | 39 |
Miscellaneous | 151 |
Total | 500 |
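The counting rule behind these statistics (a database listed under several domains contributes only to its first one) can be sketched as follows. This is a minimal illustration, not the procedure actually used to build Table 1; the function name `count_by_first_domain` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_by_first_domain(domain_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Count each database once, under the first domain it lists."""
    counts: Counter = Counter()
    for domains in domain_lists:
        if domains:  # skip entries with no domain at all
            counts[domains[0]] += 1
    return dict(counts)


# Hypothetical domain lists for four databases (not real DBcat data).
sample = [["DNA", "Genomics"], ["Protein"], ["DNA"], ["RNA", "DNA"]]
counts = count_by_first_domain(sample)
print(counts)
```

Because every database is counted exactly once, the per-domain counts always add up to the number of entries, which is why the rows of Table 1 sum to the total of 500.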
The DBcat provides the users with a variety of modes of access:
• Download the flat-files: ftp://ftp.infobiogen.fr/pub/db/dbcat
• Web interface homepage with a simple query-by-name interface: http://www.infobiogen.fr/services/dbcat/
• SRS server: http://www.infobiogen.fr/srs/
CONCLUSIONS
Recently, a survey conducted by Ellis and Kalumbi (1) based on the DBcat concluded that two-thirds of biological databases were ‘facing uncertain funding for a very near future’ and many interesting and free biological databases were ‘on the verge of financial collapse’. This survey shows the utility of the DBcat in identifying the existing sources of information.
We invite database managers to submit any new database to DBcat. Updates and corrections are also encouraged.
ACKNOWLEDGEMENTS
The authors wish to thank Patricia Rodriguez-Tomé and the GREG for supporting the first version of DBcat. We also wish to thank the authors/curators of the 500 databases listed in this catalog for providing their work and expertise to the community.
REFERENCES
1. Ellis,L. and Kalumbi,D. (1998) Nature Biotechnol., 16, 1323–1324.
2. Reichhardt,T. (1999) Nature, 399, 517–520.
3. Rodriguez-Tomé,P. (1998) Bioinformatics, 14, 469–470.