2008 Nov 11;6(31):129–147. doi: 10.1098/rsif.2008.0341

Table 1.

A selection of data resources (annotation tools, databases and repositories). (The resources are divided into those that serve genomic data, proteomic sequence data and proteomic structure data. Each resource is labelled ‘automatic’ or ‘manual’ to describe how its data are generated; those that simply provide a central point for a particular set of data are labelled ‘repository’. The list covers a wide variety of resources that the authors consider useful; more specialized reviews offer broader coverage of tools and databases for specific problems (Casadio et al. 2008; Meinnel & Giglione 2008).)

data resource description URL type (M, manual; A, automatic; R, repository)
genome
ASAP II database of splicing variants including tissue and cancer analysis (Kim et al. 2007) http://bioinformatics.ucla.edu/ASAP2/ A
ASPicDB database of splicing pattern of human genes (Castrignano et al. 2008) http://t.caspur.it/ASPicDB/ A
ASTD database containing alternative transcripts generated by either alternative splicing or alternative start or end points (Stamm et al. 2006) http://www.ebi.ac.uk/astd/ A
dbSNP a catalogue of variation from the National Center for Biotechnology Information (Smigielski et al. 2000) http://www.ncbi.nlm.nih.gov/projects/snp R
Ensembl pipeline which includes prediction of genes, transcripts and peptides (Flicek et al. 2008) http://www.ensembl.org A
FlyBase database of Drosophila genomes (Grumbling & Strelets 2006) http://flybase.bio.indiana.edu/ M
GenBank database containing all publicly available DNA sequences (Benson et al. 2008) http://www.ncbi.nlm.nih.gov/Genbank/ R
GOLD resource monitoring the worldwide genome projects (Liolios et al. 2008) http://www.genomesonline.org/ A
NCBI tools repository of tools for analysing several types of data: genes, proteins and genomes (Wheeler et al. 2007) http://www.ncbi.nlm.nih.gov/Tools/ A
OMIM database of inherited human diseases and the genes causing them (Hamosh et al. 2002) http://www.ncbi.nlm.nih.gov/omim/ M
RefSeq non-redundant database of annotated sequences (genomic DNA, transcripts and proteins; Pruitt et al. 2007) http://www.ncbi.nlm.nih.gov/RefSeq/ M/A
SNPeffect database for the annotation of the effect of SNPs (Reumers et al. 2005) http://snpeffect.vib.be/index.php A
TAIR database containing genetic and molecular biology data for Arabidopsis thaliana (Swarbreck et al. 2008) http://www.arabidopsis.org/ M/A
UCSC genome browser browser for displaying genomic data (Karolchik et al. 2008) http://genome.ucsc.edu/ A
Vega repository of manually curated data for finished vertebrate genomes (Wilming et al. 2008) http://vega.sanger.ac.uk M
WormBase database containing genomic information for Caenorhabditis elegans and other nematodes (Rogers et al. 2008) http://www.wormbase.org/ M
proteomic/sequence
a suite of tools from the CBS to analyse post-translational modifications by predicting the attachment of chemical groups: phosphorylation (NetPhos; Blom et al. 1999; NetPhosK; Blom et al. 2004; NetPhosYeast; Ingrell et al. 2007); O-linked glycosylation (NetOGlyc; Julenius et al. 2005; YinOYang; Gupta & Brunak 2002; DictyOGlyc; Gupta et al. 1999); N-linked glycosylation (NetNGlyc); C-linked glycosylation (NetCGlyc; Julenius 2007); glycation (NetGlycate; Johansen et al. 2006); acetylation (NetAcet; Kiemer et al. 2005); sulphation; and lipid attachment (LipoP; Juncker et al. 2003); tools for the indication of peptide cleavage: signal peptides (SignalP; Bendtsen et al. 2004; LipoP; Juncker et al. 2003; TatP; Bendtsen et al. 2005a,b); propeptides (ProP; Duckert et al. 2004); transit peptides (TargetP; Emanuelsson et al. 2007; ChloroP; Emanuelsson et al. 1999); viral polyprotein processing (NetCorona; Kiemer et al. 2004; NetPicoRNA; Blom et al. 1996); caspase cleavage and also protein sorting and subcellular localization; secretion (SecretomeP; Bendtsen et al. 2005a,b); import into mitochondria and chloroplasts (ChloroP); and nuclear export (NetNES; La Cour et al. 2004) http://www.cbs.dtu.dk/services/ A
CSA database containing information about catalytic residues, part manually curated, part by homology (Porter et al. 2004) http://www.ebi.ac.uk/thornton-srv/databases/CSA/ M/A
FireDB/Firestar database containing residues with functional annotation (Lopez et al. 2007a) and a tool for predicting functional residues in unannotated sequences (Lopez et al. 2007b) http://firedb.bioinfo.cnio.es/ A
Gene3D functional annotation database that searches for similarities between unannotated proteins of any origin and CATH domains (Yeats et al. 2008) http://gene3d.biochem.ucl.ac.uk/Gene3D/ A
Interpro consortium database which includes annotation from different database members (Mulder et al. 2007) http://www.ebi.ac.uk/interpro/ M/A
iProClass integrative database for protein functional features (Wu et al. 2004) http://pir.georgetown.edu/iproclass/ A
KEGG resource containing information about genes, functions, hierarchies, pathways and ligands (Kanehisa et al. 2008) http://www.genome.jp/kegg/ M/A
MEMSAT predicts the structure of all-helical transmembrane proteins and the location of their constituent helical elements within a membrane (Jones 2007) http://bioinf.cs.ucl.ac.uk/memsat/ A
Panther database of functional assignments for genes and proteins (Thomas et al. 2003) http://www.pantherdb.org/ M/A
Pfam database containing multiple alignments of protein domains and conserved regions (Finn et al. 2008) http://www.sanger.ac.uk/Software/Pfam/ M/A
PIR databases and tools for genomic and proteomic studies (Wu et al. 2007) http://pir.georgetown.edu/ A
PMut server aimed at the prediction of pathological mutations using neural networks (Ferrer-Costa et al. 2005a,b) http://mmb.pcb.ub.es/PMut/ A
PRIDE repository for proteomics data, which allows users to submit, retrieve and compare experimental data (Jones & Côté 2008) http://www.ebi.ac.uk/pride/ R
Prints database of fingerprints characterizing protein families (Attwood 2002) http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/ A
ProDom database of protein domain families generated using SwissProt and TrEMBL sequences (Bru et al. 2005) http://prodom.prabi.fr/ A
Prosite database of functional domains containing protein signatures (Hulo et al. 2006) http://www.expasy.ch/prosite/ A
ProtoNet server which clusters proteins in order to predict structure and function (Kaplan N. et al. 2005) http://www.protonet.cs.huji.ac.il/ A
PupaSuite Web tool focused on the analysis of SNPs (Conde et al. 2006) http://pupasuite.bioinfo.cipf.es/ A
SMART database of functional domains based on profiles obtained through hidden Markov models from homologous sequences (Schultz et al. 1998) http://smart.embl-heidelberg.de/ A
Superfamily database of functional domain assignments (at the SCOP superfamily level) for completely sequenced organisms (Gough et al. 2001) http://supfam.cs.bris.ac.uk/SUPERFAMILY/ A
TIGRFAMs database of protein families collated and annotated using HMMs (Haft et al. 2003) http://www.tigr.org/TIGRFAMs/index.shtml A
TMHMM prediction of transmembrane helices in proteins (Krogh et al. 2001) http://www.cbs.dtu.dk/services/TMHMM/ A
UniprotKB/SwissProt database containing protein information features (The UniProt Consortium 2008) http://www.ebi.ac.uk/swissprot/ M
UniprotKB/TrEMBL translated version of the EMBL database (The UniProt Consortium 2008) http://www.ebi.ac.uk/TrEMBL/ A
proteomic/structure
CATH classification of protein domain structures mainly based on structural features (secondary structure, architecture and topology) and homology clustering (Greene et al. 2007) http://www.cathdb.info/ M/A
Genomic Threading Database proteome annotation from structure folding recognition (McGuffin et al. 2004) http://bioinf.cs.ucl.ac.uk/GTD/ A
ModBase database of three-dimensional models built by homology modelling (Pieper et al. 2006) http://modbase.compbio.ucsf.edu/ A
MoDEL database containing molecular dynamics trajectories and their analysis (Rueda et al. 2007) http://mmb.pcb.ub.es/MODEL/ A
MSD collection, management and distribution of data about macromolecular structures (Tagari et al. 2006) http://www.ebi.ac.uk/msd/ A
PDBsum structural annotation of each three-dimensional structure deposited in the Protein Data Bank (Laskowski et al. 2005a) http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/ A
PISA tool to analyse PDB structures in order to predict the macromolecular interfaces and the quaternary state (Krissinel & Henrick 2007) http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html A
Procognate database of cognate ligands for enzyme structures (Bashton et al. 2008) http://www.ebi.ac.uk/thornton-srv/databases/procognate/ A
ProFunc identifies the likely biochemical function of a protein from its three-dimensional structure (Laskowski et al. 2005b) http://www.ebi.ac.uk/thornton-srv/databases/ProFunc/ A
RCSB PDB atlas of three-dimensional protein structures deposited in the PDB (Berman et al. 2002) http://www.rcsb.org R
SwissModel server aimed at the construction of homology models (Schwede et al. 2003) http://swissmodel.expasy.org/SWISS-MODEL.html A
SCOP structural classification of proteins based on evolutionary information and topology (Andreeva et al. 2004) http://scop.mrc-lmb.cam.ac.uk/scop/ M
wwPDB repository aimed at maintaining a single Protein Data Bank archive of macromolecular structural data (Berman et al. 2003) http://www.wwpdb.org/index.html R
other
ArrayExpress database containing curated expression profiles (Parkinson et al. 2007) http://www.ebi.ac.uk/microarray-as/aew/ R
Babelomics integrated system for performing different analyses on gene function (Al-Shahrour et al. 2006) http://babelomics.bioinfo.cipf.es/ A
Brenda database containing enzyme functional information such as Km or substrates (Barthelmes et al. 2007) http://www.brenda-enzymes.info/ M/A
ChEBI dictionary of small chemical compounds (Degtyarenko et al. 2008) http://www.ebi.ac.uk/chebi/ M/A
GEPAS integrated system for performing different analyses on gene expression (Montaner et al. 2006) http://gepas.bioinfo.cipf.es/ A
GSCAN server for the scanning of SNPs and QTLs in the genome (Valdar et al. 2006) http://gscan.well.ox.ac.uk/ A
IntAct database containing molecular interaction data (Kerrien et al. 2007a) http://www.ebi.ac.uk/intact/ M
MACiE database of enzymatic reactions (Holliday et al. 2007) http://www.ebi.ac.uk/thornton-srv/databases/MACiE/ M