Table 2. Fraction of cohesive modules for different datasets and different scoring schemes.
Dataset | Average Cooccurrence | Average Deviation from Modular | Homogeneous Columns | Species Absent | Species Present | Species Absent, Species Present (raw score) |
SGD | 0.14 | 0.15 | 0.09 | 0.06 | 0.03 | 0.44 |
KEGG | 0.24 | 0.24 | 0.17 | 0.08 | 0.16 | 0.38 |
MIPS | 0.17 | 0.17 | 0.15 | 0.05 | 0.1 | 0.33 |
Aloy | 0.21 | 0.23 | 0.16 | 0.02 | 0.1 | 0.31 |
PE | 0.08 | 0.08 | 0.06 | 0.03 | 0.05 | 0.21 |
Socio-affinity | 0.27 | 0.3 | 0.2 | 0.01 | 0.19 | 0.24 |
All | 0.18 | 0.2 | 0.14 | 0.03 | 0.12 | 0.27 |
All curated | 0.19 | 0.19 | 0.15 | 0.06 | 0.1 | 0.37 |
Average Cooccurrence: for each pair of module subunits, we calculate the fraction of species in which the two subunits are either both present or both absent. We average over all subunit pairs to obtain one score per module. Average Deviation from Modular: the sum, over all genomes, of the deviation of the number of module components present in each genome from the average number of module components per genome, adopted from Snel et al. [9]. Homogeneous Columns: the number of species in which a module is either completely present or completely absent, adopted from Gavin et al. [14]. Species Absent, Species Present: the number of species in which a module is completely absent and the number of species in which it is completely present. These two values together make up the raw score that is used throughout the article.
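The per-module scores defined in this legend can be illustrated with a minimal sketch. It assumes a module is given as a binary presence/absence profile (one list of 0/1 values per subunit, one entry per species); the function names and the toy profile are hypothetical and not taken from the article. The deviation score is ambiguous as described, so the sketch assumes the absolute deviation from the mean component count, which may differ from the definition in Snel et al. [9]. The sketch does not cover the step that decides whether a module counts as cohesive, which is what the fractions in the table report.

```python
from itertools import combinations


def average_cooccurrence(profile):
    """Mean, over all subunit pairs, of the fraction of species in which
    both subunits are present or both are absent."""
    pairs = list(combinations(profile.values(), 2))
    if not pairs:
        raise ValueError("a module needs at least two subunits")
    per_pair = [
        sum(a == b for a, b in zip(p, q)) / len(p)
        for p, q in pairs
    ]
    return sum(per_pair) / len(per_pair)


def species_absent_present(profile):
    """Raw score: (species with no subunit, species with every subunit)."""
    columns = list(zip(*profile.values()))  # one tuple per species
    absent = sum(not any(col) for col in columns)
    present = sum(all(col) for col in columns)
    return absent, present


def homogeneous_columns(profile):
    """Number of species in which the module is completely present
    or completely absent."""
    absent, present = species_absent_present(profile)
    return absent + present


def deviation_from_modular(profile):
    """ASSUMPTION: sum over species of the absolute deviation of the
    per-species component count from the mean component count."""
    counts = [sum(col) for col in zip(*profile.values())]
    mean = sum(counts) / len(counts)
    return sum(abs(c - mean) for c in counts)


# Hypothetical module: three subunits scored across four species.
toy_module = {
    "subunit_A": [1, 1, 0, 0],
    "subunit_B": [1, 0, 0, 1],
    "subunit_C": [1, 1, 0, 0],
}

print(average_cooccurrence(toy_module))
print(species_absent_present(toy_module))
print(homogeneous_columns(toy_module))
print(deviation_from_modular(toy_module))
```

In this toy profile the module is complete in one species and entirely missing in one, so the raw score is (1, 1) and the Homogeneous Columns score is their sum, 2.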