2014 May 17;15:146. doi: 10.1186/1471-2105-15-146

Table 1.

Sizes of gene set collections built from the NCBI gene2go table¹

 
 
 
| Taxon ID | Organism | Number of genes with GO annotation | Number of gene sets, all evidence codes (average number of genes in set) | Number of gene sets, high quality evidence codes (average number of genes in set) |
|---|---|---|---|---|
| 234826 | Anaplasma marginale str. St. Maries | 196 | 48 (40) | |
| 212042 | Anaplasma phagocytophilum str. HZ | 1288 | 218 (55) | 221 (60) |
| 3702 | Arabidopsis thaliana | 27942 | 2032 (129) | 1951 (85) |
| 227321 | Aspergillus nidulans FGSC A4 | 7326 | 1152 (69) | 35 (31) |
| 198094 | Bacillus anthracis str. Ames | 5097 | 465 (81) | 466 (81) |
| 9913 | Bos taurus | 5567 | 2634 (67) | 1285 (58) |
| 6239 | Caenorhabditis elegans | 12642 | 1505 (84) | 1098 (81) |
| 195099 | Campylobacter jejuni RM1221 | 1826 | 315 (62) | 316 (63) |
| 246194 | Carboxydothermus hydrogenoformans Z-2901 | 2609 | 363 (64) | 362 (65) |
| 227377 | Coxiella burnetii RSA 493 | 1798 | 271 (67) | 272 (67) |
| 214684 | Cryptococcus neoformans var. neoformans JEC21 | 3427 | 969 (68) | |
| 7955 | Danio rerio | 16957 | 2201 (83) | 1342 (68) |
| 243164 | Dehalococcoides ethenogenes 195 | 1583 | 265 (72) | 265 (71) |
| 352472 | Dictyostelium discoideum AX4 | 7694 | 1184 (86) | 801 (72) |
| 7227 | Drosophila melanogaster | 12560 | 2750 (83) | 2459 (78) |
| 205920 | Ehrlichia chaffeensis str. Arkansas | 1090 | 221 (56) | 223 (59) |
| 511145 | Escherichia coli str. K-12 substr. MG1655 | 2518 | 198 (112) | |
| 9031 | Gallus gallus | 2104 | 1460 (64) | 643 (52) |
| 243231 | Geobacter sulfurreducens PCA | 3269 | 347 (82) | 348 (82) |
| 9606 | Homo sapiens | 18106 | 5808 (82) | 4403 (81) |
| 265669 | Listeria monocytogenes serotype 4b str. F2365 | 2811 | 384 (79) | 385 (79) |
| 243233 | Methylococcus capsulatus str. Bath | 2902 | 377 (72) | 378 (72) |
| 10090 | Mus musculus | 24667 | 5615 (79) | 3643 (74) |
| 222891 | Neorickettsia sennetsu str. Miyayama | 928 | 204 (54) | 206 (56) |
| 39947 | Oryza sativa Japonica Group | 4266 | 30 (18) | 2 (14) |
| 36329 | Plasmodium falciparum 3D7 | 1770 | 212 (65) | 219 (67) |
| 223283 | Pseudomonas syringae pv. tomato str. DC3000 | 3950 | 436 (73) | 439 (77) |
| 10116 | Rattus norvegicus | 18599 | 5746 (79) | 3081 (75) |
| 246200 | Ruegeria pomeroyi DSS-3 | 4250 | 497 (85) | 496 (86) |
| 559292 | Saccharomyces cerevisiae S288c | 6244 | 2005 (75) | 1849 (74) |
| 284812 | Schizosaccharomyces pombe 972 h- | 5276 | 1627 (82) | 1118 (67) |
| 211586 | Shewanella oneidensis MR-1 | 4272 | 418 (79) | 419 (79) |
| 999953 | Trypanosoma brucei brucei strain 927/4 GUTat10.1 | 1073 | 157 (74) | 147 (80) |
| 9606 | Homo sapiens (MSigDB collection) | 18106 | | 1422 (69)² |
| 9606 | Homo sapiens (From Affymetrix annotation file) | 18106 | 5383 (80) | |

Gene sets were built from the NCBI gene2go annotation table and the GO ontology downloaded on 13 September 2013. Default settings were used, which filter out gene sets containing fewer than 10 or more than 700 genes. Organisms were omitted when their largest collection contained fewer than 30 sets. Where using all evidence codes yields fewer gene sets than using high quality codes only, this is due to maximum set size filtering.

¹ For comparison, the currently available MSigDB GO-based human collection and a human set built from the annotation file for the Affymetrix HG-U133 Plus 2.0 array are also shown.

² Set number and sizes were calculated for the MSigDB collection with filtering as above (the full collection contains 1454 gene sets).
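The size filtering described in the table note can be illustrated with a minimal Python sketch. It is not the tool used to build the table: the function names, the tuple layout of the input rows, and the toy data are all assumptions made for illustration, and the sketch groups only direct annotations without propagating genes up the GO hierarchy. Which evidence codes count as "high quality" is not stated here, so the code takes them as a parameter; the toy example simply treats `EXP` as high quality and `IEA` as not.

```python
from collections import defaultdict


def build_gene_sets(rows, tax_id, evidence_codes=None):
    """Group gene IDs by GO term for one taxon.

    rows: iterable of (tax_id, gene_id, go_id, evidence) tuples (assumed layout).
    evidence_codes: optional set of accepted codes; None accepts every code.
    """
    sets = defaultdict(set)
    for row_tax, gene_id, go_id, evidence in rows:
        if row_tax != tax_id:
            continue
        if evidence_codes is not None and evidence not in evidence_codes:
            continue
        sets[go_id].add(gene_id)
    return dict(sets)


def filter_by_size(gene_sets, min_size=10, max_size=700):
    """Drop sets with fewer than min_size or more than max_size genes."""
    return {
        go_id: genes
        for go_id, genes in gene_sets.items()
        if min_size <= len(genes) <= max_size
    }


def keep_organism(collections, min_sets=30):
    """Keep an organism only if its largest collection has at least min_sets sets."""
    return max(len(c) for c in collections) >= min_sets


# Toy data, invented for illustration: five genes annotated to one term.
rows = [
    ("9606", "g1", "GO:TOY1", "EXP"),
    ("9606", "g2", "GO:TOY1", "EXP"),
    ("9606", "g3", "GO:TOY1", "EXP"),
    ("9606", "g4", "GO:TOY1", "IEA"),
    ("9606", "g5", "GO:TOY1", "IEA"),
]

# Small thresholds so the effect is visible on toy data.
hq_sets = filter_by_size(build_gene_sets(rows, "9606", {"EXP"}), min_size=2, max_size=4)
all_sets = filter_by_size(build_gene_sets(rows, "9606"), min_size=2, max_size=4)
print(len(hq_sets), len(all_sets))  # prints "1 0"
```

With all evidence codes the toy term gains two genes and exceeds the maximum set size, so it is dropped. This is the mechanism by which an all-evidence collection can end up with fewer sets than the high quality one.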