Table 4.
Column groups: Total data set (columns 2-3); Groups with ≥1 protein for which complete EC annotation is available (columns 4-6); Groups with ≥2 proteins for which complete EC annotations are available (columns 7-9); Consistent EC assignments^c (columns 10-12).

Inflation (cluster tightness) | Total: Groups | Total: Proteins (% of proteome)^a | ≥1 EC: Groups | ≥1 EC: Proteins | ≥1 EC: EC-annotated (% of total)^b | ≥2 EC: Groups | ≥2 EC: Proteins | ≥2 EC: EC-annotated (% of total)^b | Consistent: Groups (% possible) | Consistent: Proteins | Consistent: EC-annotated (% possible) |
---|---|---|---|---|---|---|---|---|---|---|---|
1.1 | 6,249 | 50,771 (50) | 999 | 12,032 | 2,921 (82) | 664 | 9,561 | 2,586 (73) | 528 (80) | 5,476 | 1,958 (76) |
1.5 | 7,265 | 47,668 (47) | 1,117 | 8,730 | 2,877 (81) | 696 | 6,318 | 2,456 (69) | 596 (86) | 4,768 | 2,067 (84) |
2.0 | 7,569 | 46,245 (46) | 1,148 | 8,343 | 2,849 (80) | 701 | 5,916 | 2,402 (67) | 611 (87) | 4,610 | 2,073 (86) |
2.5 | 7,681 | 45,473 (45) | 1,160 | 8,171 | 2,840 (80) | 705 | 5,789 | 2,385 (67) | 617 (88) | 4,556 | 2,062 (86) |
3.0 | 7,786 | 44,729 (44) | 1,172 | 7,975 | 2,831 (79) | 706 | 5,553 | 2,365 (66) | 621 (88) | 4,450 | 2,059 (87) |
3.5 | 7,857 | 44,263 (44) | 1,180 | 7,889 | 2,821 (79) | 707 | 5,506 | 2,348 (66) | 624 (88) | 4,444 | 2,057 (88) |
4.0 | 7,896 | 43,900 (43) | 1,186 | 7,784 | 2,811 (79) | 704 | 5,414 | 2,329 (65) | 623 (88) | 4,372 | 2,048 (88) |
^a Total proteome size = 101,047 (Arabidopsis thaliana, 25,009 sequences; Caenorhabditis elegans, 19,774; Drosophila melanogaster, 13,288; Homo sapiens, 27,049; Plasmodium falciparum, 5279; Saccharomyces cerevisiae, 6358; Escherichia coli, 4290).
^b A total of 3562 EC-annotated proteins were obtained from the ENZYME database (A. thaliana, 370; C. elegans, 269; D. melanogaster, 210; H. sapiens, 1160; S. cerevisiae, 778; E. coli, 775).
^c All EC-annotated sequences in the group were assigned the same EC number. Percentages indicate the fraction of ortholog groups containing at least two complete EC assignments (the only data set for which consistency can be assessed), or the percentage of EC-annotated sequences properly identified.
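
The percentages in the table can be re-derived from the raw counts and the totals given in the footnotes. The sketch below does this for the inflation = 1.5 row. The choice of denominators is an assumption inferred from the column headers ("% of proteome", "% of total", "% possible"), and the function and key names are hypothetical, introduced only for illustration.

```python
# Totals copied from the table footnotes.
TOTAL_PROTEOME = 101_047      # sequences across all seven genomes
TOTAL_EC_ANNOTATED = 3_562    # EC-annotated proteins from the ENZYME database


def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


def row_percentages(proteins, ec_one, groups_two, ec_two,
                    groups_consistent, ec_consistent):
    """Recompute the parenthesized percentages for one table row.

    Assumed denominators (inferred from the column headers):
      - clustered proteins        / total proteome
      - EC-annotated (>=1, >=2)   / all EC-annotated proteins
      - consistent groups         / groups with >=2 EC annotations
      - consistent EC-annotated   / EC-annotated proteins in those groups
    """
    return {
        "proteins_pct_of_proteome": pct(proteins, TOTAL_PROTEOME),
        "ec_one_pct_of_total": pct(ec_one, TOTAL_EC_ANNOTATED),
        "ec_two_pct_of_total": pct(ec_two, TOTAL_EC_ANNOTATED),
        "consistent_groups_pct_possible": pct(groups_consistent, groups_two),
        "consistent_ec_pct_possible": pct(ec_consistent, ec_two),
    }


# Counts copied from the inflation = 1.5 row.
row_1_5 = row_percentages(
    proteins=47_668,
    ec_one=2_877,
    groups_two=696,
    ec_two=2_456,
    groups_consistent=596,
    ec_consistent=2_067,
)
print(row_1_5)
```

Under these assumptions the recomputed values for this row are 47, 81, 69, 86, and 84, matching the percentages printed in the table.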