Nucleic Acids Research. 2001 Jan 1;29(1):1–10. doi: 10.1093/nar/29.1.1

The Molecular Biology Database Collection: an updated compilation of biological database resources

Andreas D Baxevanis
PMCID: PMC29860  PMID: 11125037

Abstract

The Molecular Biology Database Collection is an online resource listing key databases of value to the biological community. This Collection is intended to bring fellow scientists’ attention to high-quality databases that are available throughout the world, rather than just be a lengthy listing of all available databases. As such, this up-to-date listing is intended to serve as the initial point from which to find specialized databases that may be of use in biological research. The databases included in this Collection provide new value to the underlying data by virtue of curation, new data connections or other innovative approaches. Short, searchable summaries of each of the databases included in the Collection are available through the Nucleic Acids Research Web site, at http://www.nar.oupjournals.org.

With the advent of the new millennium, the scientific community marked a significant milestone in the study of biology: the completion of the ‘working draft’ of the human genome (1). Amongst much fanfare, the completion of the working draft was announced by President Clinton at a White House ceremony on June 26, 2000 (http://www.whitehouse.gov/WH/New/html/20000626.html). This announcement signaled that the majority of biological and biomedical research would now be conducted in a ‘sequence-based’ fashion. This new approach, long-awaited and much-debated, promises to lead quickly to advances not just in the understanding of basic biological processes, but in the prevention, diagnosis and treatment of many genetic and genomic disorders. While the fruits of sequencing the human genome may not be known or appreciated for another hundred years, the implications for the basic way in which medicine will be practised in the future are staggering.

At the time of writing of this paper, the International Human Genome Sequencing Consortium had fully finished 24.7% of the human sequence, with another 66.2% of the sequence being available in draft form. In the course of this sequencing, two of the human chromosomes have been finished, namely chromosomes 21 and 22 (2,3). Even with most of the chromosomes incomplete, some interesting insights have already been gained into the structure of the human genome, such as a marked downward revision in the estimated number of genes in the human genome. While most of the attention of the scientific community and the public at large has focused on the human sequence, the genomes of a number of model organisms have also been sequenced, including that of the fruit fly (Drosophila melanogaster) in 2000 (4); the complete genomes of organisms such as the rat and the mouse will quickly follow over the next several years. Efforts are also focused on sequence variation, with the SNP Consortium anticipating the identification of a million single nucleotide polymorphisms (SNPs) by the end of 2000, far ahead of the initial goal of discovering 100 000 SNPs by 2003 (1).

Database efforts have kept pace with the furious rate at which this sequence data is being generated, providing investigators access to all public data in a practically instantaneous fashion (5). While most biologists are familiar with the databases comprising the International Nucleotide Sequence Database Collaboration (DDBJ, EMBL and GenBank), numerous other specialized databases have emerged. These specialized databases often arise out of a particular need, whether it be to address a particular biological question of interest or to better serve a particular segment of the biological community. This journal has devoted its first issue over the last several years to documenting the availability and features of these specialized databases in order to better serve its readership and to promote the use of these resources in the design and analysis of experiments. These reviewed databases are collectively listed in the Molecular Biology Database Collection.

The databases included in the current version of the Collection are shown in Table 1. This year, 55 new entries have been added, bringing the total number of databases listed to 281. While this number may seem large for a ‘curated collection’, these databases distinguish themselves by their approach to presenting the underlying data, for example by adding new value to the underlying data by virtue of curation, by providing new types of data connections or by implementing other innovative approaches facilitating biological discovery. The individual entries are classified by type, but the reader should recognize that the distinctions between these classes are often arbitrary, and that many of these databases provide more than one type of information to the user.

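Table 1. Molecular Biology Database Collection. Each row lists the database name, its URL and a brief description; rows are grouped under category headings.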

Major Sequence Repositories    
DNA Data Bank of Japan (DDBJ) http://www.ddbj.nig.ac.jp All known nucleotide and protein sequences; International Nucleotide Sequence Database Collaboration
EMBL Nucleotide Sequence Database http://www.ebi.ac.uk/embl.html All known nucleotide and protein sequences; International Nucleotide Sequence Database Collaboration
GenBank http://www.ncbi.nlm.nih.gov/ All known nucleotide and protein sequences; International Nucleotide Sequence Database Collaboration
Genome Sequence Database (GSDB) http://www.ncgr.org/research/sequence/ All known nucleotide and protein sequences
STACK http://www.sanbi.ac.za/Dbases.html Non-redundant, gene-oriented clusters
TIGR Gene Indices http://www.tigr.org/tdb/index.html Non-redundant, gene-oriented clusters
UniGene http://www.ncbi.nlm.nih.gov/UniGene/ Non-redundant, gene-oriented clusters
     
Comparative Genomics    
Clusters of Orthologous Groups (COG) http://www.ncbi.nlm.nih.gov/COG/ Phylogenetic classification of proteins from 21 complete genomes
XREFdb http://www.ncbi.nlm.nih.gov/XREFdb/ Cross-referencing of model organism genetics with mammalian phenotypes
     
Gene Expression    
ASDB http://cbcg.nersc.gov/asdb Protein products and expression patterns of alternatively-spliced genes
Axeldb http://www.dkfz-heidelberg.de/abt0135/axeldb.htm Gene expression in Xenopus
BodyMap http://bodymap.ims.u-tokyo.ac.jp/ Human and mouse gene expression data
EpoDB http://www.cbil.upenn.edu/epodb/ Genes expressed in vertebrate RBC
FlyView http://pbio07.uni-muenster.de/ Drosophila development and genetics
Gene Expression Database (GXD) http://www.informatics.jax.org/searches/gxdindex_form.shtml Mouse gene expression and genomics
Interferon Stimulated Gene Database http://www.lerner.ccf.org/labs/williams/xchip-html.cgi Genes induced by treatment with interferons
Kidney Development Database http://www.ana.ed.ac.uk/anatomy/database/kidbase/kidhome.html Kidney development and gene expression
MAGEST http://star.scl.kyoto-u.ac.jp/magest/ Ascidian (Halocynthia roretzi) gene expression patterns
MethDB http://www.methdb.de DNA methylation data, patterns and profiles
Mouse Atlas and Gene Expression Database http://genex.hgu.mrc.ac.uk Spatially-mapped gene expression data
PEDB http://chroma.mbt.washington.edu/PEDB/ Normal and aberrant prostate gene expression
RECODE http://recode.genetics.utah.edu Genes using programmed translational recoding in their expression
Stanford Microarray Database http://genome-www.stanford.edu/microarray Raw and normalized data from microarray experiments
TRIPLES http://ygac.med.yale.edu/triples/triples.htm TRansposon-Insertion Phenotypes, Localization, and Expression in Saccharomyces
Tooth Development Database http://bite-it.helsinki.fi/ Gene expression in dental tissue
     
Gene Identification and Structure    
AllGenes http://www.allgenes.org Human and mouse gene index integrating gene, transcript and protein annotation
Ares Lab Intron Site http://www.cse.ucsc.edu/research/compbio/yeast_introns.html Yeast spliceosomal introns
AsMamDB http://166.111.30.65/ASMAMDB.html Alternatively-spliced mammalian genes
COMPEL http://compel.bionet.nsc.ru/ Composite regulatory elements
CUTG http://www.kazusa.or.jp/codon/ Codon usage tables
DBTBS http://elmo.ims.u-tokyo.ac.jp/dbtbs/ Bacillus subtilis binding factors and promoters
EID http://mcb.harvard.edu/gilbert/EID/ Protein-coding, intron-containing genes
EPD http://www.epd.isb-sib.ch/ Eukaryotic POL II promoters with experimentally-determined transcription start sites
ExInt http://intron.bic.nus.edu.sg/exint/exint.html Exon-intron structure of eukaryotic genes
HUNT http://www.hri.co.jp/HUNT Annotated human full-length cDNA sequences
IDB/IEDB http://nutmeg.bio.indiana.edu/intron/index.html Intron sequence and evolution
PLACE http://www.dna.affrc.go.jp/htdocs/PLACE Plant cis-acting regulatory elements
PlantCARE http://sphinx.rug.ac.be:8080/PlantCARE/index.htm Plant cis-acting regulatory elements
PromEC http://bioinfo.md.huji.ac.il/marg/promec Escherichia coli mRNA promoters with experimentally identified transcriptional start sites
RRNDB http://rrndb.cme.msu.edu Variation in prokaryotic ribosomal RNA operons
STRBase http://www.cstl.nist.gov/div831/strbase/ Short tandem DNA repeats
SpliceDB http://genomic.sanger.ac.uk/spldb/SpliceDB.html Canonical and non-canonical mammalian splice sites
TRRD http://wwwmgs.bionet.nsc.ru/mgs/dbases/trrd4 Transcription regulatory regions of eukaryotic genes
TransTerm http://uther.otago.ac.nz/Transterm.html Codon usage, start and stop signals
VIDA http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html Virus genome open reading frames
WormBase http://www.wormbase.org Guide to Caenorhabditis elegans biology
YIDB http://www.EMBL-Heidelberg.DE/ExternalInfo/seraphin/yidb.html Yeast nuclear and mitochondrial intron sequences
rSNP Guide http://wwwmgs.bionet.nsc.ru/mgs/systems/rsnp/ Single nucleotide polymorphisms in regulatory gene regions
     
Genetic and Physical Maps    
DRESH http://www.tigem.it/LOCAL/drosophila/dros.html Human cDNA clones homologous to Drosophila mutant genes
G3-RH http://www-shgc.stanford.edu/RH/ Stanford G3 and TNG radiation hybrid maps
GB4-RH http://www.sanger.ac.uk/Software/RHserver//Rhserver.shtml Genebridge4 (GB4) human radiation hybrid maps
GDB http://www.gdb.org Human genes and genomic maps
GenAtlas http://www.citi2.fr/GENATLAS/ Human genes, markers and phenotypes
GenMapDB http://genomics.med.upenn.edu/genmapdb Mapped human BAC clones
GeneMap ‘99 http://www.ncbi.nlm.nih.gov/genemap/ International Radiation Mapping Consortium human gene map
HuGeMap http://www.infobiogen.fr/services/Hugemap Human genome genetic and physical map data
IXDB http://ixdb.mpimg-berlin-dahlem.mpg.de Physical maps of human chromosome X
RHdb http://www.ebi.ac.uk/RHdb Radiation hybrid map data
Radiation Hybrid Database http://www.ebi.ac.uk/RHdb Radiation hybrid map data
     
Genomic Databases    
ACeDB http://www.sanger.ac.uk/Software/Acedb/ C.elegans, Schizosaccharomyces pombe, and human sequences and genomic information
AMmtDB http://bio-www.ba.cnr.it:8000/BioWWW/#AMMTDB Metazoan mitochondrial DNA sequences
ArkDB http://www.thearkdb.org/genome_mapping.html Genome databases for farm and other animals
Comprehensive Microbial Resource http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl Completed microbial genomes
CropNet http://ukcrop.net/ Genome mapping in crop plants
CyanoBase http://www.kazusa.or.jp/cyano/ Synechocystis sp. genome
EMGlib http://pbil.univ-lyon1.fr/emglib/emglib.html Completely sequenced microbial genomes from bacteria, archaea, yeast
EcoGene http://bmb.med.miami.edu/EcoGene/EcoWeb/ E.coli K-12 sequences
FlyBase http://www.fruitfly.org Drosophila sequences and genomic information
Full-Malaria http://133.11.149.55 Full-length cDNA library from erythrocytic-stage Plasmodium falciparum
GOBASE http://megasun.bch.umontreal.ca/gobase/gobase.html Organelle genome database
GOLD http://igweb.integratedgenomics.com/GOLD/ Information regarding complete and ongoing genome projects
HIV Sequence Database http://hiv-web.lanl.gov/ HIV RNA sequences
Human BAC Ends Database http://www.tigr.org/tdb/humgen/bac_end_search/bac_end_intro.html Non-redundant human BAC end sequences
ICB http://www.mbio.co.jp/icb Identification and classification of bacteria using protein-coding
INE http://rgp.dna.affrc.go.jp/giot/INE.html Rice genetic and physical maps and sequence data
MITOMAP http://www.gen.emory.edu/mitomap.html Human mitochondrial genome
MITOP http://websvr.mips.biochem.mpg.de/proj/medgen/mitop Mitochondrial proteins, genes, and diseases
Medicago Genome Initiative http://www.noble.org/medicago/ Model legume Medicago truncatula ESTs, gene expression and proteomic data
Mendel Database http://jiio6.jic.bbsrc.ac.uk/ Database of plant EST and STS sequences annotated with gene family information
MitBASE http://www3.ebi.ac.uk/Research/Mitbase/mitbase.pl Mitochondrial genomes, intra-species variants, and mutants
MitoDat http://www-lecb.ncifcrf.gov/mitoDat/ Mitochondrial proteins (predominantly human)
MitoNuc/MitoAln http://bio-www.ba.cnr.it:8000/srs6/ Nuclear genes coding for mitochondrial proteins
Mouse Genome Database (MGD) http://www.informatics.jax.org Mouse genetics and genomics
Munich Information Center for Protein Sequences (MIPS) http://www.mips.biochem.mpg.de/ Protein and genomic sequences
NRSub http://pbil.univ-lyon1.fr/nrsub/nrsub.html B.subtilis genome
PlasmoDB http://PlasmoDB.org Plasmodium genome
RsGDB http://www-mmg.med.uth.tmc.edu/sphaeroides Rhodobacter sphaeroides genome
Saccharomyces Genome Database (SGD) http://genome-www.stanford.edu/Saccharomyces S.cerevisiae genome
TIGR Microbial Database http://www.tigr.org/tdb/mdb/mdbcomplete.html Microbial genomes and chromosomes
The Arabidopsis Information Resource (TAIR) http://www.arabidopsis.org/ Arabidopsis thaliana genome
ZFIN http://www.zfin.org Genetic, genomic and developmental data from zebrafish
ZmDB http://zmdb.iastate.edu/ Maize genome database
     
Intermolecular Interactions    
Biomolecular Interaction Network Database (BIND) http://binddb.org Molecular interactions, complexes and pathways
DIP http://dip.doe-mbi.ucla.edu/ Catalog of protein–protein interactions
DPInteract http://arep.med.harvard.edu/dpinteract/ Binding sites for E.coli DNA-binding proteins
Database of Ribosomal Crosslinks (DRC) http://www.mpimg-berlin-dahlem.mpg.de/~ag_ribo/ag_brimacombe/drc/ Ribosomal crosslinking data
     
Metabolic Pathways and Cellular Regulation    
ENZYME http://www.expasy.ch/enzyme/ Enzyme nomenclature
EcoCyc http://ecocyc.pangeasystems.com/ecocyc/ E.coli K-12 genome, gene products, and metabolic pathways
EpoDB http://www.cbil.upenn.edu/EpoDB/ Genes expressed during human erythropoiesis
FlyNets http://gifts.univ-mrs.fr/FlyNets/FlyNets_home_page.html Drosophila melanogaster molecular interactions
Klotho http://www.ibc.wustl.edu/klotho/ Collection and categorization of biological compounds
Kyoto Encyclopedia of Genes and Genomes (KEGG) http://www.genome.ad.jp/kegg Metabolic and regulatory pathways
LIGAND http://www.genome.ad.jp/dbget/ligand.html Enzymatic ligands, substrates and reactions
RegulonDB http://www.cifn.unam.mx/Computational_Biology/regulondb/ E.coli transcriptional regulation and operon organization
UM-BBD http://www.labmed.umn.edu/umbbd/ Microbial biocatalytic reactions and biodegradation pathways
WIT2 http://wit.mcs.anl.gov/WIT2/ Integrated system for functional curation and development of metabolic models
     
Mutation Databases    
16S and 23S Ribosomal RNA Mutation Databases http://ribosome.fandm.edu 16S and 23S ribosomal RNA mutation database
ALFRED http://alfred.med.yale.edu/alfred/index.asp Allele frequencies and DNA polymorphisms
Androgen Receptor Gene Mutations Database http://www.mcgill.ca/androgendb/ Mutations in the androgen receptor gene
Asthma Gene Database http://cooke.gsf.de/asthmagen/main.cfm Linkage and mutation studies on the genetics of asthma and allergy
Asthma and Allergy Database http://cooke.gsf.de/asthmagen/main.cfm  
Atlas of Genetics and Cytogenetics in Oncology and Haematology http://www.infobiogen.fr/services/chromcancer/ Chromosomal abnormalities in cancer
BTKbase http://www.uta.fi/laitokset/imt/bioinfo/BTKbase/ Mutation registry for X-linked agammaglobulinemia
CASRDB http://data.mch.mcgill.ca/casrdb/ CASR mutations causing FHH, NSHPT and ADH
Cytokine Gene Polymorphism Database http://www.pam.bris.ac.uk/services/GAI/cytokine4.htm Cytokine gene polymorphisms, in vitro expression and disease-association studies
Database of Germline p53 Mutations http://www.lf2.cuni.cz/win/projects/germline_mut_p53.htm Mutations in human tumor and cell line p53 gene
GRAP Mutant Databases http://tinyGRAP.uit.no/GRAP/ Mutants of family A G-Protein Coupled Receptors (GRAP)
HGBASE http://hgbase.cgr.ki.se Intragenic sequence polymorphisms
HIV-RT http://hivdb.stanford.edu/hiv/ HIV reverse transcriptase and protease sequence variation
Haemophilia B Mutation Database http://www.umds.ac.uk/molgen/haemBdatabase.htm Point mutations, short additions and deletions in the Factor IX gene
Human Gene Mutation Database (HGMD) http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html Known (published) gene lesions underlying human inherited disease
Human PAX2 Allelic Variant Database http://www.hgu.mrc.ac.uk/Softdata/PAX2/ Mutations in human PAX2 gene
Human PAX6 Allelic Variant Database http://www.hgu.mrc.ac.uk/Softdata/PAX6/ Mutations in human PAX6 gene
Human Type I and Type III Collagen Mutation Database http://www.le.ac.uk/genetics/collagen/ Human type I and type III collagen gene mutations
HvrBase http://db.eva.mpg.de/Hvrbase/ Primate mtDNA control region sequences
KMDB http://mutview.dmb.med.keio.ac.jp/mutview3/kmeyedb/index.html Mutations in human eye disease genes
KinMutBase http://www.uta.fi/imt/bioinfo/KinMutBase/ Disease-causing protein kinase mutations
MmtDB http://www.ba.cnr.it/areamt08/MmtDBWWW.htm Mutations and polymorphisms in metazoan mitochondrial DNA sequences
Mutation Spectra Database http://info.med.yale.edu/mutbase/ Mutations in viral, bacterial, yeast and mammalian genes
NCL Mutations http://www.ucl.ac.uk/ncl/ Mutations and polymorphisms in neuronal ceroid lipofuscinoses (NCL) genes
Online Mendelian Inheritance in Man http://www.ncbi.nlm.nih.gov/Omim/ Catalog of human genetic and genomic disorders
PAHdb http://www.mcgill.ca/pahdb/ Mutations at the phenylalanine hydroxylase locus
PHEXdb http://data.mch.mcgill.ca/phexdb Mutations in PHEX gene causing X-linked hypophosphatemia
PMD http://pmd.ddbj.nig.ac.jp/ Compilation of protein mutant data
PTCH1 Mutation Database http://www.cybergene.se/PTCH/ptchbase.html Mutations and SNPs found in PTCH1
RB1 Gene Mutation Database http://www.d-lohmann.de/Rb/ Mutations in the human retinoblastoma (RB1) gene
Ribosomal RNA Mutational Database http://ribosome.fandm.edu/ 16S and 23S ribosomal RNA mutation database
SV40 Large T-Antigen Mutant Database http://bigdaddy.bio.pitt.edu/SV40/ Mutations in SV40 large tumor antigen gene
dbSNP http://www.ncbi.nlm.nih.gov/SNP/ Single nucleotide polymorphisms
IARC p53 Database http://www.iarc.fr/p53/ Missense mutations and small deletions in human p53 reported in peer-reviewed literature
p53 Databases http://metalab.unc.edu/dnam/mainpage.html Mutations at the human p53 and hprt genes; rodent transgenic lacI and lacZ mutations
     
Pathology    
FIMM http://sdmc.krdl.org.sg:8080/fimm/ Functional molecular immunology data
HCForum http://hcforum.imag.fr/welcome_eng.html Human cytogenetics database
Mouse Tumor Biology Database (MTB) http://tumor.informatics.jax.org Mouse tumor names, classification, incidence, pathology, genetic factors
Oral Cancer Gene Database http://www.tumor-gene.org/Oral/oral.html Cellular, molecular and biological data for genes involved in oral cancer
PEDB http://chroma.mbt.washington.edu/PEDB/ Sequences from prostate tissue and cell type-specific cDNA libraries
Tumor Gene Family Databases (TGDBs) http://www.tumor-gene.org/tgdf.html Cellular, molecular, and biological data about genes involved in various cancers
     
Protein Databases    
AARSDB http://rose.man.poznan.pl/aars/index.html Aminoacyl-tRNA synthetase sequences
ABCdb http://ir2lcb.cnrs-mrs.fr/ABCdb/ ABC transporters
DAtA http://luggagefast.Stanford.EDU/group/arabprotein/ Annotated coding sequences from Arabidopsis
DExH/D Family Database http://www.columbia.edu/~ej67/dbhome.htm DEAD-box, DEAH-box and DExH-box proteins
ESTHER http://www.ensam.inra.fr/cholinesterase/ Esterases and alpha/beta hydrolase enzymes and relatives
Endogenous GPCR List http://www.biomedcomp.com/GPCR.html G protein-coupled receptors; expression in cell lines
FUNPEP http://www.gpcr.org/FUNPEP/db Low-complexity or compositionally-biased protein sequences
GPCRDB http://swift.embl-heidelberg.de/7tm/ G protein-coupled receptors
GenProtEC http://genprotec.mbl.edu Escherichia coli K-12 genome, gene products and homologs
HIV Molecular Immunology Database http://hiv-web.lanl.gov/immunology/ HIV epitopes
HUGE http://www.kazusa.or.jp/huge/ Large (>50 kDa) human proteins and cDNA sequences
Histone Database http://genome.nhgri.nih.gov/histones/ Histone and histone fold sequences and structures
Homeobox Page http://copan.bioz.unibas.ch/homeo.html Information relevant to homeobox proteins, classification and evolution
Homeodomain Resource http://genome.nhgri.nih.gov/homeodomain Homeodomain sequences, structures, and related genetic and genomic information
IMGT http://imgt.cines.fr:8104/ Immunoglobulin, T cell receptor and MHC sequences from human and other vertebrates
IMGT/HLA http://www.ebi.ac.uk/imgt/hla/ Human major histocompatibility complexes
InBase http://www.neb.com/neb/inteins.html Intervening protein sequences (inteins) and motifs
Kabat Database http://immuno.bme.nwu.edu/ Sequences of proteins of immunological interest
LGICdb http://www.pasteur.fr/recherche/banques/LGIC/LGIC.html Ligand-gated ion channel subunit sequences
MEROPS http://www.merops.co.uk Proteolytic enzymes (proteases/peptidases)
MHCPEP http://wehih.wehi.edu.au/mhcpep/ MHC-binding peptides
Membrane Protein Database http://biophys.bio.tuat.ac.jp/ohshima/database/ Membrane protein sequences, transmembrane regions and structures
MetaFam http://metafam.ahc.umn.edu/ Integrated protein family information
Nuclear Receptor Resource http://nrr.georgetown.edu/nrr/nrr.html Nuclear receptor superfamily
Olfactory Receptor Database http://ycmi.med.yale.edu/senselab/ordb/ Sequences for olfactory receptor-like molecules
PKR http://pkr.sdsc.edu Protein kinase sequences, enzymology, genetics, and molecular and structural properties
PPMdb http://sphinx.rug.ac.be:8080/ppmdb/index.html Arabidopsis plasma membrane protein sequence and expression data
PROMISE http://bioinf.leeds.ac.uk/promise/ Prosthetic centers and metal ions in protein active sites
Peptaibol http://www.cryst.bbk.ac.uk/peptaibol/welcome.html Peptaibol (antibiotic peptide) sequences
PhosphoBase http://www.cbs.dtu.dk/databases/PhosphoBase/ Protein phosphorylation sites
PlantsP http://PlantsP.sdsc.edu Plant protein kinases and protein phosphatases
Prolysis http://delphi.phys.univ-tours.fr/Prolysis/ Proteases and natural and synthetic protease inhibitors
Protein Information Resource (PIR) http://pir.georgetown.edu Comprehensive, annotated, non-redundant protein sequence database
Ribonuclease P Database http://www.mbio.ncsu.edu/RNaseP/home.html RNase P sequences, alignments and structures
SENTRA http://wit.mcs.anl.gov/WIT2/Sentra/HTML/sentra.html Sensory signal transduction proteins
SWISS-PROT/TrEMBL http://www.expasy.ch/sprot Curated protein sequences
TIGRFAMs http://www.tigr.org/TIGRFAMs Protein family resource for the functional identification of proteins
TRANSFAC http://transfac.gbf.de/TRANSFAC/index.html Transcription factors and binding sites
Wnt Database http://www.stanford.edu/~rnusse/wntwindow.html Wnt proteins and phenotypes
ooTFD http://www.ifti.org/ Transcription factors and gene expression
trEST, trGEN and Hits http://hits.isb-sib.ch Predicted protein sequences
     
Protein Sequence Motifs    
BLOCKS http://blocks.fhcrc.org/ Conserved sequence regions of protein families
CluSTr http://www.ebi.ac.uk/clustr/ Automatic classification of SWISS-PROT+TrEMBL proteins into related groups
InterPro http://www.ebi.ac.uk/interpro/ Integrated documentation resource for protein families, domains and sites
O-GLYCBASE http://www.cbs.dtu.dk/databases/OGLYCBASE/ Glycoproteins and O-linked glycosylation sites
PIR-ALN http://www-nbrf.georgetown.edu/pirwww/dbinfo/piraln.html Protein sequence alignments
PRINTS http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/ Hierarchical gene family fingerprints
PROSITE http://www.expasy.ch/prosite/ Biologically-significant protein patterns and profiles
Pfam http://www.sanger.ac.uk/Software/Pfam/ Multiple sequence alignments and hidden Markov models of common protein domains
ProClass http://pir.georgetown.edu/gfserver/proclass.html Protein families defined by PIR superfamilies and PROSITE patterns
ProDom http://www.toulouse.inra.fr/prodom.html Protein domain families
ProtoMap http://www.protomap.cs.huji.ac.il/ Automated hierarchical classification of SWISS-PROT proteins
SBASE http://www3.icgeb.trieste.it/~sbasesrv/ Annotated protein domain sequences
SMART http://smart.embl-heidelberg.de/ Signaling domain sequences
SYSTERS http://www.dkfz-heidelberg.de/tbi/services/cluster/systersform Classification of protein sequences into disjoint clusters with annotations from various other resources
eMOTIF http://motif.stanford.edu/emotif Protein sequence motif determination and searches
iPROCLASS http://pir.georgetown.edu/iproclass/ Annotated protein classification database
     
Proteome Resources    
AAindex http://www.genome.ad.jp/dbget/ Physicochemical properties of peptides
Proteome Analysis Database http://www.ebi.ac.uk/proteome/ Online application of InterPro and CluSTr for the functional classification of proteins in whole genomes
REBASE http://rebase.neb.com/rebase/rebase.html Restriction enzymes and associated methylases
SWISS-2DPAGE http://www.expasy.ch/ch2d/ Annotated two-dimensional polyacrylamide gel electrophoresis database
Yeast Proteome Database (YPD) http://www.proteome.com/databases/index.html S.cerevisiae proteome
     
RNA Sequences    
5S Ribosomal RNA Database http://biobases.ibch.poznan.pl/5SData/ 5S rRNA sequences
ACTIVITY http://wwwmgs.bionet.nsc.ru/mgs/systems/activity/ Functional DNA/RNA site activity
ARED http://rc.kfshrc.edu.sa AU-rich element-containing mRNAs
Collection of mRNA-like Noncoding RNAs http://biobases.ibch.poznan.pl/ncRNA/ Non-protein-coding RNA transcripts
European Large Subunit Ribosomal RNA Database http://rrna.uia.ac.be/lsu/index.html Alignment of large subunit ribosomal RNA sequences with secondary structure information
European Small Subunit Ribosomal RNA Database http://rrna.uia.ac.be/ssu/index.html Alignment of small subunit ribosomal RNA sequences with secondary structure information
Guide RNA Database http://www.biochem.mpg.de/~goeringe/ Guide RNA sequences
HyPaLib http://bibiserv.techfak.uni-bielefeld.de/HyPa/ Structural elements characteristic for classes of RNA
Intronerator http://www.cse.ucsc.edu/~kent/intronerator/ RNA splicing and gene structure in C.elegans; alignments of Caenorhabditis briggsae and C.elegans genomic sequences
Non-Canonical Interactions in RNA http://prion.bchs.uh.edu/bp_type/ Non-standard base-base interactions in known RNA structures
PLMItRNA http://bigarea.area.ba.cnr.it:8000/PLMItRNA/ Mitochondrial tRNA genes and molecules in photosynthetic eukaryotes
Pseudobase http://wwwbio.leidenuniv.nl/~Batenburg/PKB.html Information on RNA pseudoknots
RISSC http://ulises.umh.es/RISSC Ribosomal 16S-23S RNA gene spacer regions
RNA Modification Database http://medlib.med.utah.edu/RNAmods/ Naturally modified nucleosides in RNA
Ribosomal Database Project (RDP) http://rdp.cme.msu.edu/ rRNA sequences, alignments and phylogenies
SELEXdb http://wwwmgs.bionet.nsc.ru/mgs/systems/selex/ Selected DNA/RNA functional site sequences
SRPDB http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html Signal recognition particle RNA, protein and receptor sequences
Small RNA Database http://mbcr.bcm.tmc.edu/smallRNA Direct sequencing of small RNA sequences from prokaryotes and eukaryotes
The tmRNA Website http://www.indiana.edu/~tmrna tmRNA sequences, foldings and alignments
UTRdb/UTRsite http://bigarea.area.ba.cnr.it:8000/EmbIT/UTRHome/ 5′ and 3′ UTRs of eukaryotic mRNAs and relevant functional patterns
Viroids and viroid-like RNAs http://nt.ars-grin.gov/subviral/ Viroids and viroid-like RNAs
Yeast snoRNA Database http://www.bio.umass.edu/biochem/rna-sequence/Yeast_snoRNA_Database/snoRNA_DataBase.html Yeast small nucleolar RNA
tRNA Sequences http://www.uni-bayreuth.de/departments/biochemie/trna/ tRNA and tRNA gene sequences
tmRDB http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html tmRNA (10Sa RNA) sequences
     
Retrieval Systems and Database Structure    
KEYnet http://www.ba.cnr.it/keynet.html Hierarchical list of gene and protein names for data retrieval
TESS http://www.cbil.upenn.edu/tess Transcription element search system
Virgil http://www.infobiogen.fr/services/virgil Database interconnectivity
     
Structure    
ASTRAL http://astral.stanford.edu/ Sequences of domains of known structure, selected subsets and sequence-structure correspondences
BioImage http://www-embl.bioimage.org/ Searchable database of multidimensional biological images
BioMagResBank http://www.bmrb.wisc.edu/ NMR spectroscopic data from proteins, peptides and nucleic acids
CATH http://www.biochem.ucl.ac.uk/bsm/cath/ Hierarchical classification of protein domain structures
CE http://cl.sdsc.edu/ce.html CE: A Resource to Compute and Review 3-D Protein Structure Alignments
CKAAPs DB http://cl.sdsc.edu/ckaap Structurally-similar proteins with dissimilar sequences
CSD http://www.ccdc.cam.ac.uk/prods/csd/csd.html Crystal structure information for organic and metal organic compounds
Database of Macromolecular Movements http://bioinfo.mbb.yale.edu/MolMovDB/ Descriptions of protein and macromolecular motions, including movies
Decoys ‘R’ Us http://dd.stanford.edu/ Computer-generated protein conformations based on sequence data
HIC-Up http://alpha2.bmc.uu.se/hicup/ Structures of small molecules (hetero-compounds)
HSSP http://www.sander.ebi.ac.uk/hssp/ Structural families and alignments; structurally-conserved regions and domain architecture
IMB Jena Image Library of Biological Macromolecules http://www.imb-jena.de/IMAGE.html Visualization and analysis of three-dimensional biopolymer structures
ISSD http://www.protein.bio.msu.su/issd/ Integrated sequence and structural information
LPFC http://www-smi.stanford.edu/projects/helix/LPFC/ Library of protein family core structures
MMDB http://www.ncbi.nlm.nih.gov/Structure/ All experimentally-determined three-dimensional structures, linked to NCBI Entrez
ModBase http://pipe.rockefeller.edu/modbase Annotated comparative protein structure models
NDB http://ndbserver.rutgers.edu/NDB/ndb.html Nucleic acid-containing structures
NTDB http://ntdb.chem.cuhk.edu.hk Thermodynamic data for nucleic acids
PALI http://pauling.mbu.iisc.ernet.in/~pali Phylogeny and alignment of homologous protein structures
PDB http://www.rcsb.org/pdb/ Structure data determined by X-ray crystallography and NMR
PDB-REPRDB http://www.rwcp.or.jp/papia/ Representative protein chains, based on PDB entries
PDBsum http://www.biochem.ucl.ac.uk/bsm/pdbsum Summaries and analyses of PDB structures
PRESAGE http://presage.berkeley.edu/ Protein structures with experimental and predictive annotations
ProTherm http://www.rtc.riken.go.jp/jouhou/protherm/protherm.html Thermodynamic data for wild-type and mutant proteins
RESID http://www-nbrf.georgetown.edu/pirwww/dbinfo/resid.html Protein structure modifications
SCOP http://scop.mrc-lmb.cam.ac.uk/scop/ Familial and structural protein relationships
SLoop http://www-cryst.bioc.cam.ac.uk/~sloop/ Classification of protein loops
     
Transgenics    
Cre Transgenic Database http://www.mshri.on.ca/nagy/cre.htm Cre transgenic mouse lines
Transgenic/Targeted Mutation Database http://tbase.jax.org/ Information on transgenic animals and targeted mutations
     
Varied Biomedical Content    
BAliBASE http://www-igbmc.u-strasbg.fr/BioInfo/BaliBASE2/index.html Benchmark database for comparison of multiple sequence alignments
DBcat http://www.infobiogen.fr/services/dbcat/ Catalog of databases
DrugDB http://pharminfo.com/drugdb/db_mnu.html Pharmacologically-active compounds; generic and trade names
END http://www.ibc.wustl.edu/biognosis/agora_interface/html/agora_entrance.html Enzyme nomenclature
Global Image Database http://www.gwer.ch/qv/gid/gid.htm Annotated biological images
GlycoSuiteDB http://www.glycosuite.com N- and O-linked glycan structures and biological source information
HOX-PRO http://www.mssm.edu/molbio/hoxpro/new/hox-pro00.html Clustering of homeobox genes
Imprinted Genes and Parent-of-Origin Effects http://www.otago.ac.nz/IGC Imprinted genes and parent-of-origin effects in animals
LocusLink/RefSeq http://www.ncbi.nlm.nih.gov/LocusLink/ Curated sequence and descriptive information about genetic loci
MPDB http://www.biotech.ist.unige.it/interlab/mpdb.html Information on synthetic oligonucleotides proven useful as primers or probes
Molecular Probe Database http://srs.ebi.ac.uk/ Synthetic oligonucleotides, probes and PCR primers
NCBI Taxonomy Browser http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html Names of all organisms that are represented in the genetic databases with at least one nucleotide or protein sequence
PubMed http://www.ncbi.nlm.nih.gov/PubMed/ MEDLINE and Pre-MEDLINE citations
Tree of Life http://phylogeny.arizona.edu/tree/phylogeny.html Information on phylogeny and biodiversity
Vectordb http://vectordb.atcg.com/ Characterization and classification of nucleic acid vectors

In addition to the list presented in this paper, an electronic version of the Database Issue and Collection is freely available online to everyone, regardless of subscription status, at http://www.nar.oupjournals.org. Although the list includes the databases described in the papers comprising the current issue, this journal simply does not have enough pages to accommodate full-length, printed descriptions of all 281 databases featured here. To address this, the online version of the Collection now includes short summaries of many of the databases, provided directly by the investigators responsible for them. It is hoped that these summaries will give the reader an additional source of information for finding and selecting the data sources of most value in addressing a specific biological problem. Contributors will be encouraged to keep their entries up to date, as the online descriptions will be updated on a regular basis.

Suggestions for the inclusion of additional database resources in this Collection are encouraged and may be directed to the author (andy@nhgri.nih.gov).

Acknowledgments

I wish to thank Yi-Chi Barash for designing the Web-based submission tool for this Collection as well as for her technical support.

References

  1. Collins,F.S., Patrinos,A., Jordan,E., Chakravarti,A., Gesteland,R., Walters,L. and members of the DOE and NIH Planning Groups (1998) New goals for the U.S. Human Genome Project: 1998–2003. Science, 282, 682–689.
  2. Hattori,M., Fujiyama,A., Taylor,T.D., Watanabe,H., Yada,T., Park,H.S., Toyoda,A., Ishii,K., Totoki,Y., Choi,D.K. et al. (2000) The DNA sequence of human chromosome 21. The chromosome 21 mapping and sequencing consortium. Nature, 405, 311–319.
  3. Dunham,I., Shimizu,N., Roe,B.A., Chissoe,S., Hunt,A.R., Collins,J.E., Bruskiewich,R., Beare,D.M., Clamp,M., Smink,L.J. et al. (1999) The DNA sequence of human chromosome 22. Nature, 402, 489–495.
  4. Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D., Amanatides,P.G., Scherer,S.E., Li,P.W., Hoskins,R.A., Galle,R.F. et al. (2000) The genome sequence of Drosophila melanogaster. Science, 287, 2185–2195.
  5. Guyer,M.S. (1998) Statement on the rapid release of genomic DNA sequence. Genome Res., 8, 413.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
