Nucleic Acids Res. 2002 Jan 1;30(1):255–259. doi: 10.1093/nar/30.1.255

Table 1. Summary of some datasets of models in ModBase.

| Dataset | Number of sequences attempted to be modeled | Number of sequences with reliable fold assignments | Number of models | Access |
|---|---|---|---|---|
| TrEMBL (2000) | 415 801 | 197 999 | 371 816 | Academic |
| TrEMBL (2001) | 539 171 | 304 517 | 625 739 | Academic |
| Homo sapiens | 33 093 | 19 437 | 53 965 | |
| Mus musculus | 20 792 | 11 772 | 32 138 | |
| Drosophila melanogaster | 16 567 | 8692 | 27 240 | |
| Caenorhabditis elegans | 19 326 | 9538 | 26 083 | |
| Arabidopsis thaliana | 29 213 | 16 052 | 41 164 | |
| Saccharomyces cerevisiae | 6714 | 2972 | 7218 | |
| Escherichia coli | 13 787 | 6336 | 11 572 | |
| Mycoplasma genitalium | 564 | 285 | 533 | |
| MODWEB | ∼4500 | 3994 | 5140 | Private |
| NYSGRC | 13 451 | 7956 | 27 886 | NYSGRC |
| Drosophila melanogaster* | 21 225 | 7112 | 200 153 | Academic |
| Saccharomyces cerevisiae ribosome | 109 | 80 | 221 | Academic |

The number of sequences attempted to be modeled indicates the number of original sequences submitted to MODPIPE. For a definition of a reliable fold assignment see ‘Contents’. The number of models can be larger than the number of sequences because different segments of a sequence may be modeled independently and because the same segment may be modeled based on different template structures. The two TrEMBL datasets correspond to the June 2000 and March 23, 2001 versions of the complete TrEMBL database, respectively. For the 2001 TrEMBL dataset, the numbers for several organisms are shown separately. These numbers correspond to all the entries in the TrEMBL database, including multiple submissions, mutants and partial sequences. The MODWEB datasets are created by the MODWEB comparative modeling web server (http://guitar.rockefeller.edu/modweb) (N.Eswar and A.Sali, manuscript in preparation). The NYSGRC datasets are used in target selection and structure-based annotation by NYSGRC (37). The D.melanogaster* dataset contains models for the over-predicted putative genes in the D.melanogaster genome (38). The S.cerevisiae ribosome dataset contains comparative models for proteins in the yeast ribosome (40).
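
The relationship between the three count columns can be checked with a little arithmetic. The following is a minimal sketch, not part of MODPIPE or ModBase: the names `TABLE_1`, `coverage`, `models_per_sequence` and `summarize` are hypothetical, and the two ratios are illustrative quantities, not statistics reported by the authors. The figures are copied from Table 1; the MODWEB row is left out because its number of attempted sequences is only approximate (∼4500).

```python
# Rows copied from Table 1:
# (dataset, sequences attempted, sequences with reliable fold assignments, models)
# The MODWEB row is omitted because its "attempted" count is approximate.
TABLE_1 = [
    ("TrEMBL (2000)", 415801, 197999, 371816),
    ("TrEMBL (2001)", 539171, 304517, 625739),
    ("Homo sapiens", 33093, 19437, 53965),
    ("Mus musculus", 20792, 11772, 32138),
    ("Drosophila melanogaster", 16567, 8692, 27240),
    ("Caenorhabditis elegans", 19326, 9538, 26083),
    ("Arabidopsis thaliana", 29213, 16052, 41164),
    ("Saccharomyces cerevisiae", 6714, 2972, 7218),
    ("Escherichia coli", 13787, 6336, 11572),
    ("Mycoplasma genitalium", 564, 285, 533),
    ("NYSGRC", 13451, 7956, 27886),
    ("Drosophila melanogaster*", 21225, 7112, 200153),
    ("Saccharomyces cerevisiae ribosome", 109, 80, 221),
]


def coverage(attempted, reliable):
    """Fraction of submitted sequences that received a reliable fold assignment."""
    return reliable / attempted


def models_per_sequence(reliable, models):
    """Average number of models per sequence with a reliable fold assignment."""
    return models / reliable


def summarize(rows):
    """Return one dict per dataset with both illustrative ratios."""
    return [
        {
            "dataset": name,
            "coverage": coverage(attempted, reliable),
            "models_per_sequence": models_per_sequence(reliable, models),
        }
        for name, attempted, reliable, models in rows
    ]


if __name__ == "__main__":
    for row in summarize(TABLE_1):
        print(f"{row['dataset']:<36}"
              f"{row['coverage']:>8.1%}"
              f"{row['models_per_sequence']:>8.2f}")
```

For the 2001 TrEMBL dataset this gives a reliable fold assignment for roughly 56% of the submitted sequences and about two models per reliably assigned sequence, consistent with the note that segments of one sequence may be modeled independently or on different templates. In every listed dataset the number of models exceeds the number of sequences with reliable fold assignments, although it does not always exceed the number of sequences attempted (for example Escherichia coli and Mycoplasma genitalium).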