Table 1. List of protein databases available for GOblet analysis.
Database | Description | Number of proteins | Number of GO-ids | Non-IEA |
---|---|---|---|---|
swiss-tr | GO annotations for proteins (all species) from SwissProt and TrEMBL (EBI) | 810 379 | 7297 | 4999 |
swiss-tr-human | GO annotations for human proteins from SwissProt and TrEMBL (EBI) | 27 924 | 4206 | 3106 |
ensembl-human | GO-ids for human proteins (ENSEMBL) | 16 290 | 4970 | 4334 |
murines | The murine subset of SP-TrEMBL, mainly Mus musculus and Rattus norvegicus (EBI) | 32 814 | 3559 | 2284 |
drosophila | GO annotations for Drosophila melanogaster proteins (FlyBase) | 10 977 | 3489 | 3376 |
wormpep | GO annotated Caenorhabditis elegans transcripts (WormBase) | 7406 | 917 | 25 |
yeast | GO annotated yeast transcripts (SGD) | 5882 | 2372 | 2372 |
viridiplantae | The Viridiplantae (green plants) set is a subset of SP-TrEMBL | 86 595 | 1676 | 78 |
a.thaliana | Arabidopsis thaliana GOA and protein sequences (TAIR) | 29 013 | 2659 | 2098 |
o.sativa | Rice (Oryza sativa) protein set (GRAMENE) | 18 879 | 955 | 420 |

In all cases except human (ENSEMBL), the GO annotations were downloaded from the Gene Ontology website. Sequences corresponding to the GOA tables were extracted either from the SwissProt/TrEMBL database maintained at the European Bioinformatics Institute, or from species-specific repositories (TAIR, GRAMENE, SGD, WormBase, FlyBase). The last column gives the total number of non-IEA evidence codes, where IEA stands for "inferred from electronic annotation". Links to the source databases can be found at http://goblet.molgen.mpg.de.
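
As an illustration of how counts like those in the last two columns might be derived, the following minimal sketch tallies distinct GO-ids and the subset carrying at least one non-IEA annotation. It assumes tab-delimited gene association (GAF-style) records with the GO-id in column 5 and the evidence code in column 7. The function name `count_go_ids` and the toy records are hypothetical, and reading the "Non-IEA" column as "distinct GO-ids with non-IEA evidence" is an assumption, not something the table states.

```python
from typing import Iterable, Tuple


def count_go_ids(gaf_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (distinct GO-ids, distinct GO-ids with >= 1 non-IEA annotation).

    Assumes GAF-style tab-delimited lines: GO-id in column 5 and
    evidence code in column 7 (1-based). Hypothetical helper.
    """
    all_ids = set()
    non_iea_ids = set()
    for line in gaf_lines:
        # Skip blank lines and comment/header lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # too few columns to hold a GO-id and evidence code
        go_id, evidence = fields[4], fields[6]
        all_ids.add(go_id)
        if evidence != "IEA":
            non_iea_ids.add(go_id)
    return len(all_ids), len(non_iea_ids)


# Made-up toy records for illustration only; not taken from any real database.
toy = [
    "!gaf-version: 2.0",
    "DB\tP00001\tSYM1\t\tGO:0000001\tREF:1\tIEA\t\tP",
    "DB\tP00001\tSYM1\t\tGO:0000002\tREF:2\tIDA\t\tF",
    "DB\tP00002\tSYM2\t\tGO:0000002\tREF:3\tIEA\t\tF",
    "DB\tP00003\tSYM3\t\tGO:0000003\tREF:4\tTAS\t\tC",
]

total, non_iea = count_go_ids(toy)
print(total, non_iea)  # prints: 3 2
```

In the toy input, GO:0000001 is supported only by an IEA annotation, so it counts toward the total but not toward the non-IEA subset.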