Abstract
The Molecular Biology Database Collection is an online resource listing key databases of value to the biological community. This Collection is intended to draw fellow scientists’ attention to high-quality databases available throughout the world, rather than merely to list every available database. As such, this up-to-date listing is intended to serve as a starting point for finding specialized databases that may be of use in biological research. The databases included in this Collection add value to the underlying data through curation, new data connections or other innovative approaches. Short, searchable summaries of each database in the Collection are available through the Nucleic Acids Research Web site, at http://www.nar.oupjournals.org.
With the advent of the new millennium, the scientific community marked a significant milestone in the study of biology: the completion of the ‘working draft’ of the human genome (1). Amid much fanfare, President Clinton announced the completion of the working draft at a White House ceremony on June 26, 2000 (http://www.whitehouse.gov/WH/New/html/20000626.html). This announcement signaled that the majority of biological and biomedical research would now be conducted in a ‘sequence-based’ fashion. This new approach, long awaited and much debated, promises to lead quickly to advances not just in the understanding of basic biological processes but also in the prevention, diagnosis and treatment of many genetic and genomic disorders. While the fruits of sequencing the human genome may not be fully known or appreciated for another hundred years, the implications for the way medicine will be practised in the future are staggering.
At the time of writing, the International Human Genome Sequencing Consortium had finished 24.7% of the human sequence, with another 66.2% available in draft form. In the course of this sequencing, two human chromosomes, 21 and 22, have been finished (2,3). Even with most of the chromosomes incomplete, interesting insights into the structure of the human genome have already emerged, such as a marked downward revision in the estimated number of genes it contains. While most of the attention of the scientific community and the public at large has focused on the human sequence, a number of model organisms have also been sequenced, including the fruit fly (Drosophila melanogaster) in 2000 (4); the complete genomes of organisms such as the rat and the mouse will quickly follow over the next several years. Efforts are also focused on sequence variation, with the SNP Consortium anticipating the identification of one million single nucleotide polymorphisms (SNPs) by the end of 2000, far ahead of the initial goal of discovering 100 000 SNPs by 2003 (1).
Database efforts have kept pace with the furious rate at which these sequence data are being generated, providing investigators with practically instantaneous access to all public data (5). While most biologists are familiar with the databases that make up the International Nucleotide Sequence Database Collaboration (DDBJ, EMBL and GenBank), numerous other specialized databases have emerged. These specialized databases often arise out of a particular need, whether to address a specific biological question or to better serve a particular segment of the biological community. For the last several years, this journal has devoted the first issue of each year to documenting the availability and features of these specialized databases, both to better serve its readership and to promote the use of these resources in the design and analysis of experiments. These reviewed databases are collectively listed in the Molecular Biology Database Collection.
The databases included in the current version of the Collection are shown in Table 1. This year, 55 new entries have been added, bringing the total number of databases listed to 281. While this number may seem large for a ‘curated collection’, these databases distinguish themselves by how they present the underlying data: for example, by adding value through curation, by providing new types of data connections or by implementing other innovative approaches that facilitate biological discovery. The individual entries are classified by type, but the reader should recognize that the distinctions between these classes are often arbitrary and that many of these databases provide more than one type of information.
Table 1. Molecular Biology Database Collection.
Major Sequence Repositories | ||
DNA Data Bank of Japan (DDBJ) | http://www.ddbj.nig.ac.jp | All known nucleotide and protein sequences; International Nucleotide Sequence Database Collaboration |
EMBL Nucleotide Sequence Database | http://www.ebi.ac.uk/embl.html | All known nucleotide and protein sequences; International Nucleotide Sequence Database Collaboration |
GenBank | http://www.ncbi.nlm.nih.gov/ | All known nucleotide and protein sequences; International Nucleotide Sequence Database Collaboration |
Genome Sequence Database (GSDB) | http://www.ncgr.org/research/sequence/ | All known nucleotide and protein sequences |
STACK | http://www.sanbi.ac.za/Dbases.html | Non-redundant, gene-oriented clusters |
TIGR Gene Indices | http://www.tigr.org/tdb/index.html | Non-redundant, gene-oriented clusters |
UniGene | http://www.ncbi.nlm.nih.gov/UniGene/ | Non-redundant, gene-oriented clusters |
Comparative Genomics | ||
Clusters of Orthologous Groups (COG) | http://www.ncbi.nlm.nih.gov/COG/ | Phylogenetic classification of proteins from 21 complete genomes |
XREFdb | http://www.ncbi.nlm.nih.gov/XREFdb/ | Cross-referencing of model organism genetics with mammalian phenotypes |
Gene Expression | ||
ASDB | http://cbcg.nersc.gov/asdb | Protein products and expression patterns of alternatively-spliced genes |
Axeldb | http://www.dkfz-heidelberg.de/abt0135/axeldb.htm | Gene expression in Xenopus |
BodyMap | http://bodymap.ims.u-tokyo.ac.jp/ | Human and mouse gene expression data |
EpoDB | http://www.cbil.upenn.edu/epodb/ | Genes expressed in vertebrate RBC |
FlyView | http://pbio07.uni-muenster.de/ | Drosophila development and genetics |
Gene Expression Database (GXD) | http://www.informatics.jax.org/searches/gxdindex_form.shtml | Mouse gene expression and genomics |
Interferon Stimulated Gene Database | http://www.lerner.ccf.org/labs/williams/xchip-html.cgi | Genes induced by treatment with interferons |
Kidney Development Database | http://www.ana.ed.ac.uk/anatomy/database/kidbase/kidhome.html | Kidney development and gene expression |
MAGEST | http://star.scl.kyoto-u.ac.jp/magest/ | Ascidian (Halocynthia roretzi) gene expression patterns |
MethDB | http://www.methdb.de | DNA methylation data, patterns and profiles |
Mouse Atlas and Gene Expression Database | http://genex.hgu.mrc.ac.uk | Spatially-mapped gene expression data |
PEDB | http://chroma.mbt.washington.edu/PEDB/ | Normal and aberrant prostate gene expression |
RECODE | http://recode.genetics.utah.edu | Genes using programmed translational recoding in their expression |
Stanford Microarray Database | http://genome-www.stanford.edu/microarray | Raw and normalized data from microarray experiments |
TRIPLES | http://ygac.med.yale.edu/triples/triples.htm | TRansposon-Insertion Phenotypes, Localization, and Expression in Saccharomyces |
Tooth Development Database | http://bite-it.helsinki.fi/ | Gene expression in dental tissue |
Gene Identification and Structure | ||
AllGenes | http://www.allgenes.org | Human and mouse gene index integrating gene, transcript and protein annotation |
Ares Lab Intron Site | http://www.cse.ucsc.edu/research/compbio/yeast_introns.html | Yeast spliceosomal introns |
AsMamDB | http://166.111.30.65/ASMAMDB.html | Alternatively-spliced mammalian genes |
COMPEL | http://compel.bionet.nsc.ru/ | Composite regulatory elements |
CUTG | http://www.kazusa.or.jp/codon/ | Codon usage tables |
DBTBS | http://elmo.ims.u-tokyo.ac.jp/dbtbs/ | Bacillus subtilis binding factors and promoters |
EID | http://mcb.harvard.edu/gilbert/EID/ | Protein-coding, intron-containing genes |
EPD | http://www.epd.isb-sib.ch/ | Eukaryotic POL II promoters with experimentally-determined transcription start sites |
ExInt | http://intron.bic.nus.edu.sg/exint/exint.html | Exon-intron structure of eukaryotic genes |
HUNT | http://www.hri.co.jp/HUNT | Annotated human full-length cDNA sequences |
IDB/IEDB | http://nutmeg.bio.indiana.edu/intron/index.html | Intron sequence and evolution |
PLACE | http://www.dna.affrc.go.jp/htdocs/PLACE | Plant cis-acting regulatory elements |
PlantCARE | http://sphinx.rug.ac.be:8080/PlantCARE/index.htm | Plant cis-acting regulatory elements |
PromEC | http://bioinfo.md.huji.ac.il/marg/promec | Escherichia coli mRNA promoters with experimentally identified transcriptional start sites |
RRNDB | http://rrndb.cme.msu.edu | Variation in prokaryotic ribosomal RNA operons |
STRBase | http://www.cstl.nist.gov/div831/strbase/ | Short tandem DNA repeats |
SpliceDB | http://genomic.sanger.ac.uk/spldb/SpliceDB.html | Canonical and non-canonical mammalian splice sites |
TRRD | http://wwwmgs.bionet.nsc.ru/mgs/dbases/trrd4 | Transcription regulatory regions of eukaryotic genes |
TransTerm | http://uther.otago.ac.nz/Transterm.html | Codon usage, start and stop signals |
VIDA | http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html | Virus genome open reading frames |
WormBase | http://www.wormbase.org | Guide to Caenorhabditis elegans biology |
YIDB | http://www.EMBL-Heidelberg.DE/ExternalInfo/seraphin/yidb.html | Yeast nuclear and mitochondrial intron sequences |
rSNP Guide | http://wwwmgs.bionet.nsc.ru/mgs/systems/rsnp/ | Single nucleotide polymorphisms in regulatory gene regions |
Genetic and Physical Maps | ||
DRESH | http://www.tigem.it/LOCAL/drosophila/dros.html | Human cDNA clones homologous to Drosophila mutant genes |
G3-RH | http://www-shgc.stanford.edu/RH/ | Stanford G3 and TNG radiation hybrid maps |
GB4-RH | http://www.sanger.ac.uk/Software/RHserver//Rhserver.shtml | Genebridge4 (GB4) human radiation hybrid maps |
GDB | http://www.gdb.org | Human genes and genomic maps |
GenAtlas | http://www.citi2.fr/GENATLAS/ | Human genes, markers and phenotypes |
GenMapDB | http://genomics.med.upenn.edu/genmapdb | Mapped human BAC clones |
GeneMap ’99 | http://www.ncbi.nlm.nih.gov/genemap/ | International Radiation Mapping Consortium human gene map |
HuGeMap | http://www.infobiogen.fr/services/Hugemap | Human genome genetic and physical map data |
IXDB | http://ixdb.mpimg-berlin-dahlem.mpg.de | Physical maps of human chromosome X |
RHdb | http://www.ebi.ac.uk/RHdb | Radiation hybrid map data |
Radiation Hybrid Database | http://www.ebi.ac.uk/RHdb | Radiation hybrid map data |
Genomic Databases | ||
ACeDB | http://www.sanger.ac.uk/Software/Acedb/ | C.elegans, Schizosaccharomyces pombe, and human sequences and genomic information |
AMmtDB | http://bio-www.ba.cnr.it:8000/BioWWW/#AMMTDB | Metazoan mitochondrial DNA sequences |
ArkDB | http://www.thearkdb.org/genome_mapping.html | Genome databases for farm and other animals |
Comprehensive Microbial Resource | http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl | Completed microbial genomes |
CropNet | http://ukcrop.net/ | Genome mapping in crop plants |
CyanoBase | http://www.kazusa.or.jp/cyano/ | Synechocystis sp. genome |
EMGlib | http://pbil.univ-lyon1.fr/emglib/emglib.html | Completely sequenced microbial genomes from bacteria, archaea, yeast |
EcoGene | http://bmb.med.miami.edu/EcoGene/EcoWeb/ | E.coli K-12 sequences |
FlyBase | http://www.fruitfly.org | Drosophila sequences and genomic information |
Full-Malaria | http://133.11.149.55 | Full-length cDNA library from erythrocytic-stage Plasmodium falciparum |
GOBASE | http://megasun.bch.umontreal.ca/gobase/gobase.html | Organelle genome database |
GOLD | http://igweb.integratedgenomics.com/GOLD/ | Information regarding complete and ongoing genome projects |
HIV Sequence Database | http://hiv-web.lanl.gov/ | HIV RNA sequences |
Human BAC Ends Database | http://www.tigr.org/tdb/humgen/bac_end_search/bac_end_intro.html | Non-redundant human BAC end sequences |
ICB | http://www.mbio.co.jp/icb | Identification and classification of bacteria using protein-coding genes |
INE | http://rgp.dna.affrc.go.jp/giot/INE.html | Rice genetic and physical maps and sequence data |
MITOMAP | http://www.gen.emory.edu/mitomap.html | Human mitochondrial genome |
MITOP | http://websvr.mips.biochem.mpg.de/proj/medgen/mitop | Mitochondrial proteins, genes, and diseases |
Medicago Genome Initiative | http://www.noble.org/medicago/ | Model legume Medicago truncatula ESTs, gene expression and proteomic data |
Mendel Database | http://jiio6.jic.bbsrc.ac.uk/ | Database of plant EST and STS sequences annotated with gene family information |
MitBASE | http://www3.ebi.ac.uk/Research/Mitbase/mitbase.pl | Mitochondrial genomes, intra-species variants, and mutants |
MitoDat | http://www-lecb.ncifcrf.gov/mitoDat/ | Mitochondrial proteins (predominantly human) |
MitoNuc/MitoAln | http://bio-www.ba.cnr.it:8000/srs6/ | Nuclear genes coding for mitochondrial proteins |
Mouse Genome Database (MGD) | http://www.informatics.jax.org | Mouse genetics and genomics |
Munich Information Center for Protein Sequences (MIPS) | http://www.mips.biochem.mpg.de/ | Protein and genomic sequences |
NRSub | http://pbil.univ-lyon1.fr/nrsub/nrsub.html | B.subtilis genome |
PlasmoDB | http://PlasmoDB.org | Plasmodium genome |
RsGDB | http://www-mmg.med.uth.tmc.edu/sphaeroides | Rhodobacter sphaeroides genome |
Saccharomyces Genome Database (SGD) | http://genome-www.stanford.edu/Saccharomyces | S.cerevisiae genome |
TIGR Microbial Database | http://www.tigr.org/tdb/mdb/mdbcomplete.html | Microbial genomes and chromosomes |
The Arabidopsis Information Resource (TAIR) | http://www.arabidopsis.org/ | Arabidopsis thaliana genome |
ZFIN | http://www.zfin.org | Genetic, genomic and developmental data from zebrafish |
ZmDB | http://zmdb.iastate.edu/ | Maize genome database |
Intermolecular Interactions | ||
Biomolecular Interaction Network Database (BIND) | http://binddb.org | Molecular interactions, complexes and pathways |
DIP | http://dip.doe-mbi.ucla.edu/ | Catalog of protein–protein interactions |
DPInteract | http://arep.med.harvard.edu/dpinteract/ | Binding sites for E.coli DNA-binding proteins |
Database of Ribosomal Crosslinks (DRC) | http://www.mpimg-berlin-dahlem.mpg.de/~ag_ribo/ag_brimacombe/drc/ | Ribosomal crosslinking data |
Metabolic Pathways and Cellular Regulation | ||
ENZYME | http://www.expasy.ch/enzyme/ | Enzyme nomenclature |
EcoCyc | http://ecocyc.pangeasystems.com/ecocyc/ | E.coli K-12 genome, gene products, and metabolic pathways |
EpoDB | http://www.cbil.upenn.edu/EpoDB/ | Genes expressed during human erythropoiesis |
FlyNets | http://gifts.univ-mrs.fr/FlyNets/FlyNets_home_page.html | Drosophila melanogaster molecular interactions |
Klotho | http://www.ibc.wustl.edu/klotho/ | Collection and categorization of biological compounds |
Kyoto Encyclopedia of Genes and Genomes (KEGG) | http://www.genome.ad.jp/kegg | Metabolic and regulatory pathways |
LIGAND | http://www.genome.ad.jp/dbget/ligand.html | Enzymatic ligands, substrates and reactions |
RegulonDB | http://www.cifn.unam.mx/Computational_Biology/regulondb/ | E.coli transcriptional regulation and operon organization |
UM-BBD | http://www.labmed.umn.edu/umbbd/ | Microbial biocatalytic reactions and biodegradation pathways |
WIT2 | http://wit.mcs.anl.gov/WIT2/ | Integrated system for functional curation and development of metabolic models |
Mutation Databases | ||
16S and 23S Ribosomal RNA Mutation Databases | http://ribosome.fandm.edu | 16S and 23S ribosomal RNA mutation database |
ALFRED | http://alfred.med.yale.edu/alfred/index.asp | Allele frequencies and DNA polymorphisms |
Androgen Receptor Gene Mutations Database | http://www.mcgill.ca/androgendb/ | Mutations in the androgen receptor gene |
Asthma Gene Database | http://cooke.gsf.de/asthmagen/main.cfm | Linkage and mutation studies on the genetics of asthma and allergy |
Asthma and Allergy Database | http://cooke.gsf.de/asthmagen/main.cfm | |
Atlas of Genetics and Cytogenetics in Oncology and Haematology | http://www.infobiogen.fr/services/chromcancer/ | Chromosomal abnormalities in cancer |
BTKbase | http://www.uta.fi/laitokset/imt/bioinfo/BTKbase/ | Mutation registry for X-linked agammaglobulinemia |
CASRDB | http://data.mch.mcgill.ca/casrdb/ | CASR mutations causing FHH, NSHPT and ADH |
Cytokine Gene Polymorphism Database | http://www.pam.bris.ac.uk/services/GAI/cytokine4.htm | Cytokine gene polymorphisms, in vitro expression and disease-association studies |
Database of Germline p53 Mutations | http://www.lf2.cuni.cz/win/projects/germline_mut_p53.htm | Mutations in human tumor and cell line p53 gene |
GRAP Mutant Databases | http://tinyGRAP.uit.no/GRAP/ | Mutants of family A G-Protein Coupled Receptors (GRAP) |
HGBASE | http://hgbase.cgr.ki.se | Intragenic sequence polymorphisms |
HIV-RT | http://hivdb.stanford.edu/hiv/ | HIV reverse transcriptase and protease sequence variation |
Haemophilia B Mutation Database | http://www.umds.ac.uk/molgen/haemBdatabase.htm | Point mutations, short additions and deletions in the Factor IX gene |
Human Gene Mutation Database (HGMD) | http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html | Known (published) gene lesions underlying human inherited disease |
Human PAX2 Allelic Variant Database | http://www.hgu.mrc.ac.uk/Softdata/PAX2/ | Mutations in human PAX2 gene |
Human PAX6 Allelic Variant Database | http://www.hgu.mrc.ac.uk/Softdata/PAX6/ | Mutations in human PAX6 gene |
Human Type I and Type III Collagen Mutation Database | http://www.le.ac.uk/genetics/collagen/ | Human type I and type III collagen gene mutations |
HvrBase | http://db.eva.mpg.de/Hvrbase/ | Primate mtDNA control region sequences |
KMDB | http://mutview.dmb.med.keio.ac.jp/mutview3/kmeyedb/index.html | Mutations in human eye disease genes |
KinMutBase | http://www.uta.fi/imt/bioinfo/KinMutBase/ | Disease-causing protein kinase mutations |
MmtDB | http://www.ba.cnr.it/~areamt08/MmtDBWWW.htm | Mutations and polymorphisms in metazoan mitochondrial DNA sequences |
Mutation Spectra Database | http://info.med.yale.edu/mutbase/ | Mutations in viral, bacterial, yeast and mammalian genes |
NCL Mutations | http://www.ucl.ac.uk/ncl/ | Mutations and polymorphisms in neuronal ceroid lipofuscinoses (NCL) genes |
Online Mendelian Inheritance in Man | http://www.ncbi.nlm.nih.gov/Omim/ | Catalog of human genetic and genomic disorders |
PAHdb | http://www.mcgill.ca/pahdb/ | Mutations at the phenylalanine hydroxylase locus |
PHEXdb | http://data.mch.mcgill.ca/phexdb | Mutations in PHEX gene causing X-linked hypophosphatemia |
PMD | http://pmd.ddbj.nig.ac.jp/ | Compilation of protein mutant data |
PTCH1 Mutation Database | http://www.cybergene.se/PTCH/ptchbase.html | Mutations and SNPs found in PTCH1 |
RB1 Gene Mutation Database | http://www.d-lohmann.de/Rb/ | Mutations in the human retinoblastoma (RB1) gene |
Ribosomal RNA Mutational Database | http://ribosome.fandm.edu/ | 16S and 23S ribosomal RNA mutation database |
SV40 Large T-Antigen Mutant Database | http://bigdaddy.bio.pitt.edu/SV40/ | Mutations in SV40 large tumor antigen gene |
dbSNP | http://www.ncbi.nlm.nih.gov/SNP/ | Single nucleotide polymorphisms |
iARC p53 Database | http://www.iarc.fr/p53/ | Missense mutations and small deletions in human p53 reported in peer-reviewed literature |
p53 Databases | http://metalab.unc.edu/dnam/mainpage.html | Mutations at the human p53 and hprt genes; rodent transgenic lacI and lacZ mutations |
Pathology | ||
FIMM | http://sdmc.krdl.org.sg:8080/fimm/ | Functional molecular immunology data |
HCForum | http://hcforum.imag.fr/welcome_eng.html | Human cytogenetics database |
Mouse Tumor Biology Database (MTB) | http://tumor.informatics.jax.org | Mouse tumor names, classification, incidence, pathology, genetic factors |
Oral Cancer Gene Database | http://www.tumor-gene.org/Oral/oral.html | Cellular, molecular and biological data for genes involved in oral cancer |
PEDB | http://chroma.mbt.washington.edu/PEDB/ | Sequences from prostate tissue and cell type-specific cDNA libraries |
Tumor Gene Family Databases (TGDBs) | http://www.tumor-gene.org/tgdf.html | Cellular, molecular, and biological data about genes involved in various cancers |
Protein Databases | ||
AARSDB | http://rose.man.poznan.pl/aars/index.html | Aminoacyl-tRNA synthetase sequences |
ABCdb | http://ir2lcb.cnrs-mrs.fr/ABCdb/ | ABC transporters |
DAtA | http://luggagefast.Stanford.EDU/group/arabprotein/ | Annotated coding sequences from Arabidopsis |
DExH/D Family Database | http://www.columbia.edu/~ej67/dbhome.htm | DEAD-box, DEAH-box and DExH-box proteins |
ESTHER | http://www.ensam.inra.fr/cholinesterase/ | Esterases and alpha/beta hydrolase enzymes and relatives |
Endogenous GPCR List | http://www.biomedcomp.com/GPCR.html | G protein-coupled receptors; expression in cell lines |
FUNPEP | http://www.gpcr.org/FUNPEP/db | Low-complexity or compositionally-biased protein sequences |
GPCRDB | http://swift.embl-heidelberg.de/7tm/ | G protein-coupled receptors |
GenProtEC | http://genprotec.mbl.edu | Escherichia coli K-12 genome, gene products and homologs |
HIV Molecular Immunology Database | http://hiv-web.lanl.gov/immunology/ | HIV epitopes |
HUGE | http://www.kazusa.or.jp/huge/ | Large (>50 kDa) human proteins and cDNA sequences |
Histone Database | http://genome.nhgri.nih.gov/histones/ | Histone and histone fold sequences and structures |
Homeobox Page | http://copan.bioz.unibas.ch/homeo.html | Information relevant to homeobox proteins, classification and evolution |
Homeodomain Resource | http://genome.nhgri.nih.gov/homeodomain | Homeodomain sequences, structures, and related genetic and genomic information |
IMGT | http://imgt.cines.fr:8104/ | Immunoglobulin, T cell receptor and MHC sequences from human and other vertebrates |
IMGT/HLA | http://www.ebi.ac.uk/imgt/hla/ | Human major histocompatibility complexes |
InBase | http://www.neb.com/neb/inteins.html | Intervening protein sequences (inteins) and motifs |
Kabat Database | http://immuno.bme.nwu.edu/ | Sequences of proteins of immunological interest |
LGICdb | http://www.pasteur.fr/recherche/banques/LGIC/LGIC.html | Ligand-gated ion channel subunit sequences |
MEROPS | http://www.merops.co.uk | Proteolytic enzymes (proteases/peptidases) |
MHCPEP | http://wehih.wehi.edu.au/mhcpep/ | MHC-binding peptides |
Membrane Protein Database | http://biophys.bio.tuat.ac.jp/ohshima/database/ | Membrane protein sequences, transmembrane regions and structures |
MetaFam | http://metafam.ahc.umn.edu/ | Integrated protein family information |
Nuclear Receptor Resource | http://nrr.georgetown.edu/nrr/nrr.html | Nuclear receptor superfamily |
Olfactory Receptor Database | http://ycmi.med.yale.edu/senselab/ordb/ | Sequences for olfactory receptor-like molecules |
PKR | http://pkr.sdsc.edu | Protein kinase sequences, enzymology, genetics, and molecular and structural properties |
PPMdb | http://sphinx.rug.ac.be:8080/ppmdb/index.html | Arabidopsis plasma membrane protein sequence and expression data |
PROMISE | http://bioinf.leeds.ac.uk/promise/ | Prosthetic centers and metal ions in protein active sites |
Peptaibol | http://www.cryst.bbk.ac.uk/peptaibol/welcome.html | Peptaibol (antibiotic peptide) sequences |
PhosphoBase | http://www.cbs.dtu.dk/databases/PhosphoBase/ | Protein phosphorylation sites |
PlantsP | http://PlantsP.sdsc.edu | Plant protein kinases and protein phosphatases |
Prolysis | http://delphi.phys.univ-tours.fr/Prolysis/ | Proteases and natural and synthetic protease inhibitors |
Protein Information Resource (PIR) | http://pir.georgetown.edu | Comprehensive, annotated, non-redundant protein sequence database |
Ribonuclease P Database | http://www.mbio.ncsu.edu/RNaseP/home.html | RNase P sequences, alignments and structures |
SENTRA | http://wit.mcs.anl.gov/WIT2/Sentra/HTML/sentra.html | Sensory signal transduction proteins |
SWISS-PROT/TrEMBL | http://www.expasy.ch/sprot | Curated protein sequences |
TIGRFAMs | http://www.tigr.org/TIGRFAMs | Protein family resource for the functional identification of proteins |
TRANSFAC | http://transfac.gbf.de/TRANSFAC/index.html | Transcription factors and binding sites |
Wnt Database | http://www.stanford.edu/~rnusse/wntwindow.html | Wnt proteins and phenotypes |
ooTFD | http://www.ifti.org/ | Transcription factors and gene expression |
trEST, trGEN and Hits | http://hits.isb-sib.ch | Predicted protein sequences |
Protein Sequence Motifs | ||
BLOCKS | http://blocks.fhcrc.org/ | Conserved sequence regions of protein families |
CluSTr | http://www.ebi.ac.uk/clustr/ | Automatic classification of SWISS-PROT+TrEMBL proteins into related groups |
InterPro | http://www.ebi.ac.uk/interpro/ | Integrated documentation resource for protein families, domains and sites |
O-GLYCBASE | http://www.cbs.dtu.dk/databases/OGLYCBASE/ | Glycoproteins and O-linked glycosylation sites |
PIR-ALN | http://www-nbrf.georgetown.edu/pirwww/dbinfo/piraln.html | Protein sequence alignments |
PRINTS | http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/ | Hierarchical gene family fingerprints |
PROSITE | http://www.expasy.ch/prosite/ | Biologically-significant protein patterns and profiles |
Pfam | http://www.sanger.ac.uk/Software/Pfam/ | Multiple sequence alignments and hidden Markov models of common protein domains |
ProClass | http://pir.georgetown.edu/gfserver/proclass.html | Protein families defined by PIR superfamilies and PROSITE patterns |
ProDom | http://www.toulouse.inra.fr/prodom.html | Protein domain families |
ProtoMap | http://www.protomap.cs.huji.ac.il/ | Automated hierarchical classification of SWISS-PROT proteins |
SBASE | http://www3.icgeb.trieste.it/~sbasesrv/ | Annotated protein domain sequences |
SMART | http://smart.embl-heidelberg.de/ | Signaling domain sequences |
SYSTERS | http://www.dkfz-heidelberg.de/tbi/services/cluster/systersform | Classification of protein sequences into disjoint clusters with annotations from various other resources |
eMOTIF | http://motif.stanford.edu/emotif | Protein sequence motif determination and searches |
iPROCLASS | http://pir.georgetown.edu/iproclass/ | Annotated protein classification database |
Proteome Resources | ||
AAindex | http://www.genome.ad.jp/dbget/ | Physicochemical properties of peptides |
Proteome Analysis Database | http://www.ebi.ac.uk/proteome/ | Online application of InterPro and CluSTr for the functional classification of proteins in whole genomes |
REBASE | http://rebase.neb.com/rebase/rebase.html | Restriction enzymes and associated methylases |
SWISS-2DPAGE | http://www.expasy.ch/ch2d/ | Annotated two-dimensional polyacrylamide gel electrophoresis database |
Yeast Proteome Database (YPD) | http://www.proteome.com/databases/index.html | S.cerevisiae proteome |
RNA Sequences | ||
5S Ribosomal RNA Database | http://biobases.ibch.poznan.pl/5SData/ | 5S rRNA sequences |
ACTIVITY | http://wwwmgs.bionet.nsc.ru/mgs/systems/activity/ | Functional DNA/RNA site activity |
ARED | http://rc.kfshrc.edu.sa | AU-rich element-containing mRNAs |
Collection of mRNA-like Noncoding RNAs | http://biobases.ibch.poznan.pl/ncRNA/ | Non-protein-coding RNA transcripts |
European Large Subunit Ribosomal RNA Database | http://rrna.uia.ac.be/lsu/index.html | Alignment of large subunit ribosomal RNA sequences with secondary structure information |
European Small Subunit Ribosomal RNA Database | http://rrna.uia.ac.be/ssu/index.html | Alignment of small subunit ribosomal RNA sequences with secondary structure information |
Guide RNA Database | http://www.biochem.mpg.de/~goeringe/ | Guide RNA sequences |
HyPaLib | http://bibiserv.techfak.uni-bielefeld.de/HyPa/ | Structural elements characteristic for classes of RNA |
Intronerator | http://www.cse.ucsc.edu/~kent/intronerator/ | RNA splicing and gene structure in C.elegans; alignments of Caenorhabditis briggsae and C.elegans genomic sequences |
Non-Canonical Interactions in RNA | http://prion.bchs.uh.edu/bp_type/ | Non-standard base-base interactions in known RNA structures |
PLMItRNA | http://bigarea.area.ba.cnr.it:8000/PLMItRNA/ | Mitochondrial tRNA genes and molecules in photosynthetic eukaryotes |
Pseudobase | http://wwwbio.leidenuniv.nl/~Batenburg/PKB.html | Information on RNA pseudoknots |
RISCC | http://ulises.umh.es/RISSC | Ribosomal 16S-23S RNA gene spacer regions |
RNA Modification Database | http://medlib.med.utah.edu/RNAmods/ | Naturally modified nucleosides in RNA |
Ribosomal Database Project (RDP) | http://rdp.cme.msu.edu/ | rRNA sequences, alignments and phylogenies |
SELEXdb | http://wwwmgs.bionet.nsc.ru/mgs/systems/selex/ | Selected DNA/RNA functional site sequences |
SRPDB | http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html | Signal recognition particle RNA, protein and receptor sequences |
Small RNA Database | http://mbcr.bcm.tmc.edu/smallRNA | Direct sequencing of small RNA sequences from prokaryotes and eukaryotes |
The tmRNA Website | http://www.indiana.edu/~tmrna | tmRNA sequences, foldings and alignments |
UTRdb/UTRsite | http://bigarea.area.ba.cnr.it:8000/EmbIT/UTRHome/ | 5′ and 3′ UTRs of eukaryotic mRNAs and relevant functional patterns |
Viroids and viroid-like RNAs | http://nt.ars-grin.gov/subviral/ | Viroids and viroid-like RNAs |
Yeast snoRNA Database | http://www.bio.umass.edu/biochem/rna-sequence/Yeast_snoRNA_Database/snoRNA_DataBase.html | Yeast small nucleolar RNA |
tRNA Sequences | http://www.uni-bayreuth.de/departments/biochemie/trna/ | tRNA and tRNA gene sequences |
tmRDB | http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html | tmRNA (10Sa RNA) sequences |
Retrieval Systems and Database Structure | ||
KEYnet | http://www.ba.cnr.it/keynet.html | Hierarchical list of gene and protein names for data retrieval |
TESS | http://www.cbil.upenn.edu/tess | Transcription element search system |
Virgil | http://www.infobiogen.fr/services/virgil | Database interconnectivity |
Structure | ||
ASTRAL | http://astral.stanford.edu/ | Sequences of domains of known structure, selected subsets and sequence-structure correspondences |
BioImage | http://www-embl.bioimage.org/ | Searchable database of multidimensional biological images |
BioMagResBank | http://www.bmrb.wisc.edu/ | NMR spectroscopic data from proteins, peptides and nucleic acids |
CATH | http://www.biochem.ucl.ac.uk/bsm/cath/ | Hierarchical classification of protein domain structures |
CE | http://cl.sdsc.edu/ce.html | CE: A Resource to Compute and Review 3-D Protein Structure Alignments |
CKAAPs DB | http://cl.sdsc.edu/ckaap | Structurally-similar proteins with dissimilar sequences |
CSD | http://www.ccdc.cam.ac.uk/prods/csd/csd.html | Crystal structure information for organic and metal organic compounds |
Database of Macromolecular Movements | http://bioinfo.mbb.yale.edu/MolMovDB/ | Descriptions of protein and macromolecular motions, including movies |
Decoys ‘R’ Us | http://dd.stanford.edu/ | Computer-generated protein conformations based on sequence data |
HIC-Up | http://alpha2.bmc.uu.se/hicup/ | Structures of small molecules (hetero-compounds) |
HSSP | http://www.sander.ebi.ac.uk/hssp/ | Structural families and alignments; structurally-conserved regions and domain architecture |
IMB Jena Image Library of Biological Macromolecules | http://www.imb-jena.de/IMAGE.html | Visualization and analysis of three-dimensional biopolymer structures |
ISSD | http://www.protein.bio.msu.su/issd/ | Integrated sequence and structural information |
LPFC | http://www-smi.stanford.edu/projects/helix/LPFC/ | Library of protein family core structures |
MMDB | http://www.ncbi.nlm.nih.gov/Structure/ | All experimentally-determined three-dimensional structures, linked to NCBI Entrez |
ModBase | http://pipe.rockefeller.edu/modbase | Annotated comparative protein structure models |
NDB | http://ndbserver.rutgers.edu/NDB/ndb.html | Nucleic acid-containing structures |
NTDB | http://ntdb.chem.cuhk.edu.hk | Thermodynamic data for nucleic acids |
PALI | http://pauling.mbu.iisc.ernet.in/~pali | Phylogeny and alignment of homologous protein structures |
PDB | http://www.rcsb.org/pdb/ | Structure data determined by X-ray crystallography and NMR |
PDB-REPRDB | http://www.rwcp.or.jp/papia/ | Representative protein chains, based on PDB entries |
PDBsum | http://www.biochem.ucl.ac.uk/bsm/pdbsum | Summaries and analyses of PDB structures |
PRESAGE | http://presage.berkeley.edu/ | Protein structures with experimental and predictive annotations |
ProTherm | http://www.rtc.riken.go.jp/jouhou/protherm/protherm.html | Thermodynamic data for wild-type and mutant proteins |
RESID | http://www-nbrf.georgetown.edu/pirwww/dbinfo/resid.html | Protein structure modifications |
SCOP | http://scop.mrc-lmb.cam.ac.uk/scop/ | Familial and structural protein relationships |
SLoop | http://www-cryst.bioc.cam.ac.uk/~sloop/ | Classification of protein loops |
Transgenics | ||
Cre Transgenic Database | http://www.mshri.on.ca/nagy/cre.htm | Cre transgenic mouse lines |
Transgenic/Targeted Mutation Database | http://tbase.jax.org/ | Information on transgenic animals and targeted mutations |
Varied Biomedical Content | ||
BAliBASE | http://www-igbmc.u-strasbg.fr/BioInfo/BaliBASE2/index.html | Benchmark database for comparison of multiple sequence alignments |
DBcat | http://www.infobiogen.fr/services/dbcat/ | Catalog of databases |
DrugDB | http://pharminfo.com/drugdb/db_mnu.html | Pharmacologically active compounds; generic and trade names |
END | http://www.ibc.wustl.edu/biognosis/agora_interface/html/agora_entrance.html | Enzyme nomenclature |
Global Image Database | http://www.gwer.ch/qv/gid/gid.htm | Annotated biological images |
GlycoSuiteDB | http://www.glycosuite.com | N- and O-linked glycan structures and biological source information |
HOX-PRO | http://www.mssm.edu/molbio/hoxpro/new/hox-pro00.html | Clustering of homeobox genes |
Imprinted Genes and Parent-of-Origin Effects | http://www.otago.ac.nz/IGC | Imprinted genes and parent-of-origin effects in animals |
LocusLink/RefSeq | http://www.ncbi.nlm.nih.gov/LocusLink/ | Curated sequence and descriptive information about genetic loci |
MPDB | http://www.biotech.ist.unige.it/interlab/mpdb.html | Information on synthetic oligonucleotides proven useful as primers or probes |
Molecular Probe Database | http://srs.ebi.ac.uk/ | Synthetic oligonucleotides, probes and PCR primers |
NCBI Taxonomy Browser | http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html | Names of all organisms that are represented in the genetic databases with at least one nucleotide or protein sequence |
PubMed | http://www.ncbi.nlm.nih.gov/PubMed/ | MEDLINE and Pre-MEDLINE citations |
Tree of Life | http://phylogeny.arizona.edu/tree/phylogeny.html | Information on phylogeny and biodiversity |
Vectordb | http://vectordb.atcg.com/ | Characterization and classification of nucleic acid vectors |
In addition to the list presented in this paper, an electronic version of the Database Issue and Collection can be accessed online and is freely available to everyone, regardless of subscription status, at http://www.nar.oupjournals.org. While the list includes the databases described in the papers comprising the current issue, it should be immediately apparent to the reader that there are simply not enough pages in this journal to accommodate full-length, printed descriptions of all 281 of the databases featured here. To address this, the online version of the Collection now includes short summaries of many of the databases, provided directly by the investigators responsible for the individual databases. It is hoped that these summaries will give the reader an additional source of information for finding and selecting the data sources of most value in addressing a specific biological problem. Because the online descriptions will be updated on a regular basis, contributors will be encouraged to keep their entries up to date.
Suggestions for the inclusion of additional database resources in this Collection are encouraged and may be directed to the author (andy@nhgri.nih.gov).
Acknowledgments
I wish to thank Yi-Chi Barash for designing the Web-based submission tool for this Collection as well as for her technical support.
References
1. Collins,F.S., Patrinos,A., Jordan,E., Chakravarti,A., Gesteland,R., Walters,L. and members of the DOE and NIH Planning Groups (1998) New goals for the U.S. Human Genome Project: 1998–2003. Science, 282, 682–689.
2. Hattori,M., Fujiyama,A., Taylor,T.D., Watanabe,H., Yada,T., Park,H.S., Toyoda,A., Ishii,K., Totoki,Y., Choi,D.K. et al. (2000) The DNA sequence of human chromosome 21. The chromosome 21 mapping and sequencing consortium. Nature, 405, 311–319.
3. Dunham,I., Shimizu,N., Roe,B.A., Chissoe,S., Hunt,A.R., Collins,J.E., Bruskiewich,R., Beare,D.M., Clamp,M., Smink,L.J. et al. (1999) The DNA sequence of human chromosome 22. Nature, 402, 489–495.
4. Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D., Amanatides,P.G., Scherer,S.E., Li,P.W., Hoskins,R.A., Galle,R.F. et al. (2000) The genome sequence of Drosophila melanogaster. Science, 287, 2185–2195.
5. Guyer,M.S. (1998) Statement on the rapid release of genomic DNA sequence. Genome Res., 8, 413.