2001 Jan 1;29(1):22–28. doi: 10.1093/nar/29.1.22

Table 1. Representation of genomes in the COGs (a)

Species                                 Total no. of encoded proteins   No. of proteins assigned to COGs   Proteins in COGs (%)

Archaea
Archaeoglobus fulgidus                  2420                            1817                               75
Methanococcus jannaschii                1786                            1301                               74
Methanobacterium thermoautotrophicum    1873                            1365                               73
P.abyssi                                1767                            1430                               81
Pyrococcus horikoshii (b)               2080                            1353                               66
A.pernix (b)                            2722                            1157                               43

Bacteria
Aquifex aeolicus                        1560                            1312                               84
Bacillus subtilis                       4118                            2767                               67
Borrelia burgdorferi (c)                1637                            693                                43
Campylobacter jejuni                    1634                            1282                               78
Chlamydia trachomatis                   895                             630                                71
Chlamydia pneumoniae                    1053                            646                                62
D.radiodurans                           3194                            2133                               67
Escherichia coli                        4285                            3308                               77
Haemophilus influenzae                  1695                            1497                               88
Helicobacter pylori                     1578                            1070                               68
Mycobacterium tuberculosis              3924                            2456                               63
Mycoplasma genitalium                   471                             374                                79
Mycoplasma pneumoniae                   680                             419                                62
Neisseria meningitidis                  2081                            1446                               70
Pseudomonas aeruginosa                  5567                            4166                               75
Rickettsia prowazekii                   836                             673                                81
Synechocystis sp.                       3168                            2048                               65
Thermotoga maritima                     1858                            1497                               81
Treponema pallidum                      1036                            705                                68
Vibrio cholerae                         3828                            2715                               71
Ureaplasma urealyticum                  613                             398                                64
Xylella fastidiosa                      2766                            1481                               54

Eukaryotes
S.cerevisiae                            5964                            2158                               36

Total                                   68571                           45350                              66

(a) Newly added genomes are underlined.

(b) The low fraction of proteins assigned to COGs is probably due to over-prediction of protein-coding genes in the original genome annotation (see text and Table 2).

(c) The low fraction of proteins assigned to COGs is due to the fact that part of the genome consists of multiple plasmids that code for poorly conserved proteins.
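
The last column of Table 1 appears to be the ratio of the two count columns, expressed as a percentage. The following is a minimal sketch of that arithmetic, not the authors' code: the function name `cog_coverage_percent` is hypothetical, and the three rows are copied from the table for illustration only. The rounding convention used in the published table is not stated, so a recomputed value may differ slightly from the printed one for some rows.

```python
def cog_coverage_percent(total_proteins: int, proteins_in_cogs: int) -> float:
    """Return the percentage of encoded proteins that are assigned to COGs.

    Hypothetical helper for illustration; it only divides the two
    count columns of Table 1 and scales the result to a percentage.
    """
    if total_proteins <= 0:
        raise ValueError("total_proteins must be positive")
    return 100.0 * proteins_in_cogs / total_proteins


# Three rows copied from Table 1: (species, total proteins, proteins in COGs)
rows = [
    ("Escherichia coli", 4285, 3308),
    ("Haemophilus influenzae", 1695, 1497),
    ("S.cerevisiae", 5964, 2158),
]

for species, total, in_cogs in rows:
    # Print with one decimal place; the table itself lists whole percentages.
    print(f"{species}: {cog_coverage_percent(total, in_cogs):.1f}%")
```

For these three rows the recomputed percentages round to the whole-number values printed in the table (77, 88 and 36).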