2016 Nov 28;45(Database issue):D313–D319. doi: 10.1093/nar/gkw1132

Table 1. Statistics for each of the core species’ canonical sequence sets.

| Species | Sequences in ref. proteome | Sequences with ≥1 model | ≥80% | ≥60% | ≥40% | ≥20% | <20% | No template |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| H. sapiens | 21 006 | 15 595 | 5010 | 3299 | 2673 | 2648 | 1965 | 5411 |
| M. musculus | 22 274 | 16 860 | 6331 | 3337 | 2789 | 2765 | 1638 | 5414 |
| C. elegans | 20 071 | 10 566 | 3489 | 1917 | 1852 | 2026 | 1282 | 9505 |
| E. coli K12 | 4306 | 3306 | 2620 | 274 | 211 | 132 | 69 | 1000 |
| A. thaliana | 27 252 | 17 132 | 6544 | 3335 | 2772 | 3004 | 1477 | 10 120 |
| D. melanogaster | 13 704 | 8502 | 2956 | 1488 | 1309 | 1525 | 1224 | 5202 |
| S. cerevisiae | 6721 | 4101 | 1665 | 590 | 550 | 751 | 545 | 2620 |
| C. crescentus | 3715 | 2633 | 1914 | 303 | 191 | 154 | 71 | 1082 |
| M. tuberculosis | 3987 | 2921 | 2000 | 338 | 275 | 205 | 103 | 1066 |
| P. aeruginosa | 5550 | 4270 | 3271 | 405 | 311 | 199 | 84 | 1280 |
| S. aureus | 2881 | 1925 | 1500 | 170 | 117 | 94 | 44 | 956 |
| P. falciparum | 5340 | 2781 | 722 | 352 | 382 | 535 | 790 | 2559 |

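For each species we show the total number of canonical sequences in the reference proteome (according to UniProtKB) and the number of sequences for which we have at least one model. The next five columns split the modelled sequences by how much of the respective reference sequence the models cover: at least 80%, at least 60%, at least 40%, at least 20%, and less than 20%. These coverage columns are disjoint, so a sequence counted under ≥80% is not counted again under ≥60%. The last column gives the number of sequences without a template.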
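The relationship between the columns can be checked with a few lines of arithmetic. The sketch below is illustrative only: it re-enters three rows of Table 1 by hand, and the names `Row`, `is_consistent` and `modelled_fraction` are hypothetical helpers, not part of the original work. It assumes the coverage columns are mutually exclusive bins, which is what the row sums suggest.

```python
from typing import NamedTuple, Tuple


class Row(NamedTuple):
    """One row of Table 1 (hypothetical container, values copied by hand)."""
    species: str
    total: int                 # sequences in the reference proteome
    modelled: int              # sequences with at least one model
    bins: Tuple[int, ...]      # counts for >=80%, >=60%, >=40%, >=20%, <20%
    no_template: int           # sequences without a template


ROWS = [
    Row("M. musculus", 22274, 16860, (6331, 3337, 2789, 2765, 1638), 5414),
    Row("E. coli K12", 4306, 3306, (2620, 274, 211, 132, 69), 1000),
    Row("P. falciparum", 5340, 2781, (722, 352, 382, 535, 790), 2559),
]


def is_consistent(row: Row) -> bool:
    """Check that the coverage bins add up to the modelled count and that
    modelled plus no-template sequences add up to the proteome size."""
    bins_match = sum(row.bins) == row.modelled
    total_match = row.modelled + row.no_template == row.total
    return bins_match and total_match


def modelled_fraction(row: Row) -> float:
    """Fraction of the reference proteome with at least one model."""
    return row.modelled / row.total


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.species}: consistent={is_consistent(row)}, "
              f"modelled={modelled_fraction(row):.1%}")
```

Run on these rows, the check passes for all three, and the modelled fraction makes the spread between species easy to compare, for example between E. coli K12 and P. falciparum.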