letter
2003 Nov;13(11):2444–2449. doi: 10.1101/gr.1190803

Table 2.

Data Sets Used for Cross-Species Evaluation

| Organism | Protein sequences | Assigned by EUCLID |
|---|---|---|
| Homo sapiens | 23,740 | 13,419 |
| Drosophila melanogaster | 14,334 | 7235 |
| Caenorhabditis elegans | 20,263 | 7840 |
| Arabidopsis thaliana | 25,617 | 11,771 |
| Schizosaccharomyces pombe | 4952 | 2786 |
| Saccharomyces cerevisiae | 6329 | 3302 |
| Aeropyrum pernix | 2694 | 684 |
| Pyrobaculum aerophilum | 2605 | 867 |
| Sulfolobus solfataricus | 2977 | 1186 |
| Sulfolobus tokodaii | 2826 | 1045 |
| Archaeoglobus fulgidus | 2407 | 1074 |
| Methanobacterium thermoautotrophicum | 1869 | 867 |
| Methanococcus jannaschii | 1715 | 781 |
| Methanosarcina mazei | 3371 | 1420 |
| Methanopyrus kandleri | 1691 | 653 |
| Methanosarcina acetivorans | 4540 | 1850 |
| Pyrococcus abyssi | 1765 | 855 |
| Pyrococcus horikoshii | 2064 | 786 |
| Thermoplasma acidophilum | 1031 | 783 |
| Thermoplasma volcanium | 1499 | 792 |
| Anabaena sp. | 5366 | 2444 |
| Aquifex aeolicus | 1522 | 926 |
| Borrelia burgdorferi | 850 | 461 |
| Bacillus halodurans | 4066 | 2223 |
| Bacillus subtilis | 4100 | 2240 |
| Buchnera sp. | 564 | 469 |
| Campylobacter jejuni | 1654 | 975 |
| Chlamydia pneumoniae | 1052 | 530 |
| Chlamydia trachomatis | 894 | 498 |
| Deinococcus radiodurans | 2937 | 1332 |
| Escherichia coli | 4289 | 2883 |
| Fusobacterium nucleatum | 2068 | 1083 |
| Haemophilus influenzae | 1709 | 1183 |
| Helicobacter pylori | 1566 | 815 |
| Lactococcus lactis | 2266 | 1229 |
| Mycoplasma genitalium | 480 | 332 |
| Mycoplasma pneumoniae | 677 | 441 |
| Mycobacterium tuberculosis | 3918 | 1973 |
| Neisseria meningitidis ser. A | 2121 | 1132 |
| Neisseria meningitidis ser. B | 2025 | 1088 |
| Rickettsia prowazekii | 834 | 548 |
| Streptomyces coelicolor | 7848 | 3625 |
| Synechocystis sp. | 3169 | 1598 |
| Thermotoga maritima | 1846 | 1064 |
| Treponema pallidum | 1031 | 507 |
| Vibrio cholerae | 3828 | 2054 |
| Xylella fastidiosa | 2766 | 1184 |
| Yersinia pestis | 4008 | 2566 |

The column “Protein sequences” lists the number of protein-coding regions annotated in each genome, except for H. sapiens, D. melanogaster, C. elegans, and A. thaliana (see text for details on these data sets). The last column gives the number of protein sequences that the EUCLID method could assign to a cellular role, which is the amount of data available for validating the ProtFun method in each organism.
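To make the two columns easier to compare across organisms, the share of sequences with a EUCLID assignment can be computed per genome. The following is a minimal sketch, not part of the original analysis: it copies four rows from Table 2, and the names `TABLE_2_SUBSET`, `assigned_fraction`, and `coverage_by_organism` are hypothetical helpers introduced only for illustration.

```python
# Counts copied from four rows of Table 2:
# organism -> (protein sequences, assigned by EUCLID).
TABLE_2_SUBSET = {
    "Homo sapiens": (23740, 13419),
    "Escherichia coli": (4289, 2883),
    "Buchnera sp.": (564, 469),
    "Aeropyrum pernix": (2694, 684),
}


def assigned_fraction(total, assigned):
    """Fraction of listed protein sequences that have a EUCLID assignment."""
    return assigned / total


def coverage_by_organism(table):
    """Return (organism, fraction) pairs, highest coverage first."""
    pairs = [
        (name, assigned_fraction(total, assigned))
        for name, (total, assigned) in table.items()
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, fraction in coverage_by_organism(TABLE_2_SUBSET):
        print(f"{name}: {fraction:.1%}")
```

The fraction is only a descriptive summary of how much validation data each organism contributes; it says nothing about the accuracy of either method.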