Table 1. Statistics for each of the core species’ canonical sequence sets.
Species | # of sequences in ref. proteome | # of sequences with ≥1 model | ≥80% | ≥60% | ≥40% | ≥20% | <20% | No template |
---|---|---|---|---|---|---|---|---|
H. sapiens | 21 006 | 15 595 | 5010 | 3299 | 2673 | 2648 | 1965 | 5411 |
M. musculus | 22 274 | 16 860 | 6331 | 3337 | 2789 | 2765 | 1638 | 5414 |
C. elegans | 20 071 | 10 566 | 3489 | 1917 | 1852 | 2026 | 1282 | 9505 |
E. coli K12 | 4306 | 3306 | 2620 | 274 | 211 | 132 | 69 | 1000 |
A. thaliana | 27 252 | 17 132 | 6544 | 3335 | 2772 | 3004 | 1477 | 10 120 |
D. melanogaster | 13 704 | 8502 | 2956 | 1488 | 1309 | 1525 | 1224 | 5202 |
S. cerevisiae | 6721 | 4101 | 1665 | 590 | 550 | 751 | 545 | 2620 |
C. crescentus | 3715 | 2633 | 1914 | 303 | 191 | 154 | 71 | 1082 |
M. tuberculosis | 3987 | 2921 | 2000 | 338 | 275 | 205 | 103 | 1066 |
P. aeruginosa | 5550 | 4270 | 3271 | 405 | 311 | 199 | 84 | 1280 |
S. aureus | 2881 | 1925 | 1500 | 170 | 117 | 94 | 44 | 956 |
P. falciparum | 5340 | 2781 | 722 | 352 | 382 | 535 | 790 | 2559 |
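For each species we show the total number of canonical sequences in the reference proteome (according to UniProtKB), the number of sequences for which we have at least one model, and the number of sequences whose models cover at least 80%, 60%, 40% or 20%, or less than 20%, of the respective reference sequence. The coverage columns are mutually exclusive bins: each sequence is counted once, in the highest coverage bin it reaches. The last column gives the number of sequences without a template.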
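The column arithmetic can be checked with a short script. The following is a minimal sketch, not part of the original analysis: it copies three rows from Table 1 and assumes that the coverage columns are disjoint bins, so that they should sum to the number of modelled sequences, and that modelled plus template-less sequences should equal the proteome size. The names `ROWS`, `check_row` and `fraction_high_coverage` are hypothetical helpers introduced only for illustration.

```python
# Three rows copied from Table 1:
# species -> (total, modelled, [>=80%, >=60%, >=40%, >=20%, <20%], no_template)
ROWS = {
    "E. coli K12": (4306, 3306, [2620, 274, 211, 132, 69], 1000),
    "S. cerevisiae": (6721, 4101, [1665, 590, 550, 751, 545], 2620),
    "P. falciparum": (5340, 2781, [722, 352, 382, 535, 790], 2559),
}


def check_row(total, modelled, bins, no_template):
    """Return True if the row is internally consistent.

    Assumes the coverage bins are mutually exclusive, so they add up to the
    modelled count, and that modelled + no_template equals the proteome size.
    """
    return sum(bins) == modelled and modelled + no_template == total


def fraction_high_coverage(total, bins):
    """Fraction of the reference proteome with a model covering >=80%."""
    return bins[0] / total


for species, (total, modelled, bins, no_template) in ROWS.items():
    consistent = check_row(total, modelled, bins, no_template)
    share = fraction_high_coverage(total, bins)
    print(f"{species}: consistent={consistent}, >=80% coverage for {share:.1%}")
```

Run on these rows, both identities hold, which supports reading the coverage columns as non-overlapping bins rather than cumulative counts. The same check can be extended to the remaining species by adding their rows to `ROWS`.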