Table 1.
Data set | Family | Protein | avg_pro | avg_len |
---|---|---|---|---|
Pfam | 9318 | 2286710 | 245.4 | 151.1 |
A. thaliana | 1962 | 36387 | 18.6 | 114.7 |
C. elegans | 1164 | 15971 | 13.7 | 137.4 |
D. melanogaster | 1294 | 13664 | 10.6 | 128.6 |
E. coli | 404 | 2177 | 5.4 | 159.7 |
H. sapiens | 1596 | 23051 | 14.4 | 102.2 |
M. musculus | 1516 | 21891 | 14.4 | 110.8 |
S. cerevisiae | 814 | 4526 | 5.6 | 157.2 |
SCOP | 2234 | 15045 | 6.7 | 169.8 |
ProtClustDB | 6521 | 356615 | 54.7 | 348.1 |
UniProt | 8213 | 448469 | 54.6 | 343.7 |
For each data set, Family is the number of families, Protein is the total number of proteins across all families, avg_pro is the average number of proteins per family (Protein divided by Family), and avg_len is the average protein length in amino acids.
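
As a quick consistency check, the avg_pro column can be recomputed from the Family and Protein columns. The sketch below is illustrative only: the dictionary `counts` and the helper `avg_pro` are hypothetical names, the three rows are copied from the table, and rounding to one decimal place is an assumption based on how the values are printed.

```python
# Family and Protein counts copied from three rows of Table 1.
counts = {
    "Pfam": (9318, 2286710),
    "E. coli": (404, 2177),
    "SCOP": (2234, 15045),
}


def avg_pro(families: int, proteins: int) -> float:
    """Average number of proteins per family, rounded to one decimal (assumed)."""
    return round(proteins / families, 1)


# Recompute the avg_pro column for the selected data sets.
table = {name: avg_pro(f, p) for name, (f, p) in counts.items()}

for name, value in table.items():
    print(f"{name}: {value}")
# Pfam: 245.4
# E. coli: 5.4
# SCOP: 6.7
```

The recomputed values agree with the avg_pro entries listed for these data sets. The avg_len column cannot be checked in the same way, because the table does not report total sequence length.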