Table 1. Summary of some datasets of models in ModBase.
Dataset | Number of sequences attempted to be modeled | Number of sequences with reliable fold assignments | Number of models | Access |
---|---|---|---|---|
TrEMBL (2000) | 415 801 | 197 999 | 371 816 | Academic |
TrEMBL (2001) | 539 171 | 304 517 | 625 739 | Academic |
Homo sapiens | 33 093 | 19 437 | 53 965 | |
Mus musculus | 20 792 | 11 772 | 32 138 | |
Drosophila melanogaster | 16 567 | 8692 | 27 240 | |
Caenorhabditis elegans | 19 326 | 9538 | 26 083 | |
Arabidopsis thaliana | 29 213 | 16 052 | 41 164 | |
Saccharomyces cerevisiae | 6714 | 2972 | 7218 | |
Escherichia coli | 13 787 | 6336 | 11 572 | |
Mycoplasma genitalium | 564 | 285 | 533 | |
MODWEB | ∼4500 | 3994 | 5140 | Private |
NYSGRC | 13 451 | 7956 | 27 886 | NYSGRC |
Drosophila melanogaster* | 21 225 | 7112 | 200 153 | Academic |
Saccharomyces cerevisiae ribosome | 109 | 80 | 221 | Academic |
The number of sequences attempted to be modeled is the number of original sequences submitted to MODPIPE. For the definition of a reliable fold assignment, see ‘Contents’. The number of models can exceed the number of sequences because different segments of a sequence may be modeled independently and because the same segment may be modeled on different template structures. The two TrEMBL datasets correspond to the June 2000 and March 23, 2001 versions of the complete TrEMBL database, respectively. For the 2001 TrEMBL dataset, the numbers for several organisms are shown separately (the rows from Homo sapiens to Mycoplasma genitalium). These numbers cover all entries in the TrEMBL database, including multiple submissions, mutants and partial sequences. The MODWEB datasets are created by the MODWEB comparative modeling web server (http://guitar.rockefeller.edu/modweb) (N. Eswar and A. Sali, manuscript in preparation). The NYSGRC datasets are used in target selection and structure-based annotation by NYSGRC (37). The D. melanogaster* dataset contains models for the over-predicted putative genes in the D. melanogaster genome (38). The S. cerevisiae ribosome dataset contains comparative models for proteins in the yeast ribosome (40).
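The relationship between the three count columns can be made concrete with a little arithmetic. The Python sketch below is illustrative only and is not part of MODPIPE or ModBase. It copies a few rows from Table 1 and derives two ratios: the fraction of submitted sequences that received a reliable fold assignment, and the number of models per sequence with a reliable fold assignment. The names `TABLE_1` and `summarize`, and the choice of these two ratios, are assumptions made for this example. MODWEB is left out because its sequence count is approximate.

```python
# Rows copied from Table 1:
# (dataset, sequences attempted, sequences with reliable fold assignments, models)
# MODWEB is omitted because its sequence count is only approximate.
TABLE_1 = [
    ("TrEMBL (2000)", 415_801, 197_999, 371_816),
    ("TrEMBL (2001)", 539_171, 304_517, 625_739),
    ("Homo sapiens", 33_093, 19_437, 53_965),
    ("Escherichia coli", 13_787, 6_336, 11_572),
    ("Mycoplasma genitalium", 564, 285, 533),
    ("Drosophila melanogaster*", 21_225, 7_112, 200_153),
]


def summarize(rows):
    """Return (dataset, fold_fraction, models_per_assigned_sequence) tuples.

    fold_fraction: share of submitted sequences with a reliable fold assignment.
    models_per_assigned_sequence: models divided by sequences with a reliable
    fold assignment (a hypothetical summary ratio, not one defined in the source).
    """
    summary = []
    for name, attempted, assigned, models in rows:
        fold_fraction = assigned / attempted
        models_per_assigned = models / assigned
        summary.append((name, fold_fraction, models_per_assigned))
    return summary


if __name__ == "__main__":
    for name, fraction, ratio in summarize(TABLE_1):
        print(f"{name:26s} {fraction:6.1%} {ratio:7.2f}")
```

A models-per-sequence ratio above 1 reflects the point made in the legend: one sequence can yield several models, either for different segments or for the same segment built on different templates.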