Table 2. Sequence Redundancy in Top-50 Annotating Articles.
Species | num. articles | num. prot | Clusters at 100% | % redundancy | Mean genes/cluster |
C. elegans | 12 | 8416 | 3338 | 60 | 3.74 |
A. thaliana | 16 | 8879 | 4694 | 47 | 3.92 |
M. musculus | 3 | 4220 | 2273 | 46 | 2.75 |
M. tuberculosis | 2 | 2351 | 1702 | 28 | 2.22 |
S. cerevisiae | 5 | 3542 | 2550 | 28 | 2.33 |
H. sapiens | 4 | 5593 | 4509 | 19 | 2.36 |
D. melanogaster | 3 | 1217 | 1003 | 18 | 2.17 |
S. pombe | 2 | 4502 | 4281 | 5 | 2.00 |
Species: annotated species; num. articles: number of annotating articles; num. prot: number of proteins annotated by top-50 articles for that species; Clusters at 100%: number of clusters of 100% identical proteins; % redundancy: 100 × (1 − column 4 / column 3), i.e. the percentage of proteins annotated more than once for a given species in the top 50 articles; Mean genes/cluster: the mean number of genes per cluster, for clusters having more than a single gene.
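
As a quick consistency check on the % redundancy column, the sketch below recomputes it from the num. prot and Clusters at 100% columns, assuming the relation 100 × (1 − clusters / proteins) with rounding to the nearest integer. This relation is inferred from the tabulated values rather than taken from the authors' code, and the names `TABLE_2` and `percent_redundancy` are hypothetical, introduced only for illustration.

```python
# Rows copied from Table 2: (species, num. prot, clusters at 100% identity).
TABLE_2 = [
    ("C. elegans", 8416, 3338),
    ("A. thaliana", 8879, 4694),
    ("M. musculus", 4220, 2273),
    ("M. tuberculosis", 2351, 1702),
    ("S. cerevisiae", 3542, 2550),
    ("H. sapiens", 5593, 4509),
    ("D. melanogaster", 1217, 1003),
    ("S. pombe", 4502, 4281),
]


def percent_redundancy(num_prot: int, num_clusters: int) -> float:
    """Assumed relation: share of proteins beyond one representative per cluster."""
    return 100.0 * (1.0 - num_clusters / num_prot)


# Round to the nearest integer, as the table appears to do.
computed = {
    species: round(percent_redundancy(num_prot, num_clusters))
    for species, num_prot, num_clusters in TABLE_2
}

for species, value in computed.items():
    print(f"{species}: {value}%")
```

Under this assumption the recomputed values agree with every row of the % redundancy column, which would not be the case for a literal product of the two columns.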