2012 Aug;19(8):957–967. doi: 10.1089/cmb.2011.0044

Table 1.

Data Sets for Performance Evaluation

Data set          Family  Protein  avg_pro  avg_len
Pfam                9318  2286710    245.4    151.1
A. thaliana         1962    36387     18.6    114.7
C. elegans          1164    15971     13.7    137.4
D. melanogaster     1294    13664     10.6    128.6
E. coli              404     2177      5.4    159.7
H. sapiens          1596    23051     14.4    102.2
M. musculus         1516    21891     14.4    110.8
S. cerevisiae        814     4526      5.6    157.2
SCOP                2234    15045      6.7    169.8
ProtClustDB         6521   356615     54.7    348.1
UniProt             8213   448469     54.6    343.7

For each data set, Family is the number of families, Protein is the total number of proteins across all families, avg_pro is the average number of proteins per family, and avg_len is the average protein length in amino acids.
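
As the note defines it, the average number of proteins per family is the total protein count divided by the number of families. The following minimal sketch recomputes that ratio for a few rows copied from Table 1. The function name `average_family_size` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
# Rows copied from Table 1: (data set, families, proteins, reported avg_pro).
ROWS = [
    ("Pfam", 9318, 2286710, 245.4),
    ("E. coli", 404, 2177, 5.4),
    ("SCOP", 2234, 15045, 6.7),
    ("UniProt", 8213, 448469, 54.6),
]


def average_family_size(families: int, proteins: int) -> float:
    """Average number of proteins per family (hypothetical helper)."""
    return proteins / families


for name, families, proteins, reported in ROWS:
    computed = average_family_size(families, proteins)
    # Print the recomputed ratio, rounded to one decimal, next to the table value.
    print(f"{name}: computed {computed:.1f}, reported {reported}")
```

The avg_len column cannot be recomputed the same way, because the table does not list the total sequence length for each data set.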