Table 1.
Species | G | M | Fitting error ΔA | ΔB | ΔC | ΔD | Fluidity φobs | φpred (A) | φpred (B) | φpred (C) | φpred (D) |
---|---|---|---|---|---|---|---|---|---|---|---|
B. anthracis | 13 | 5523 | 80 | 21 | 78 | 13 | 0.08 | 0.09 | 0.08 | 0.09 | 0.08 |
E. coli | 15 | 4576 | 98 | 58 | 47 | 2.6 | 0.25 | 0.30 | 0.25 | 0.29 | 0.25 |
Staph. aureus | 19 | 2651 | 29 | 16 | 21 | 4.3 | 0.16 | 0.19 | 0.16 | 0.19 | 0.16 |
Strep. pneumoniae | 26 | 2095 | 42 | 21 | 30 | 4.3 | 0.23 | 0.32 | 0.24 | 0.30 | 0.23 |
Strep. pyogenes | 14 | 1786 | 26 | 10 | 25 | 7.5 | 0.20 | 0.24 | 0.20 | 0.24 | 0.21 |
N. meningitidis | 12 | 2080 | 53 | 26 | 31 | 2.4 | 0.28 | 0.33 | 0.28 | 0.32 | 0.28 |
Model A assumes a constant population size and the same gene transfer process for all genes. Model B assumes an exponentially growing population. Model C assumes that part of each genome is shared by all genomes (a rigid core), while the remaining part is subject to the same gene transfer process as in model A. Model D assumes that each genome consists of two parts governed by different gene transfer rates. For each of the four models, we determined the parameters that minimize the distance Δ between the empirical and the theoretical gene frequency distributions (see Materials and Methods for the definition of Δ). For each of the six bacterial species analyzed, we report the number of analyzed genomes G, the genome size M (average number of genes per genome), the distance Δ for each model fit, the genomic fluidity φobs estimated from the data, and the fluidity φpred predicted by each fitted model. Recall that model A has one parameter, models B and C have two parameters, and model D has three parameters.
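
As a reading aid, the fitting errors in Table 1 can be compared programmatically. The following is a minimal sketch, not part of the original analysis: the Δ values and parameter counts are copied from the table and caption, while the names `FIT_ERRORS`, `N_PARAMS`, and `best_model` are hypothetical helpers introduced here for illustration.

```python
# Fitting errors (Δ) for models A-D, copied from Table 1.
FIT_ERRORS = {
    "B. anthracis": {"A": 80, "B": 21, "C": 78, "D": 13},
    "E. coli": {"A": 98, "B": 58, "C": 47, "D": 2.6},
    "Staph. aureus": {"A": 29, "B": 16, "C": 21, "D": 4.3},
    "Strep. pneumoniae": {"A": 42, "B": 21, "C": 30, "D": 4.3},
    "Strep. pyogenes": {"A": 26, "B": 10, "C": 25, "D": 7.5},
    "N. meningitidis": {"A": 53, "B": 26, "C": 31, "D": 2.4},
}

# Number of free parameters per model, as stated in the caption.
N_PARAMS = {"A": 1, "B": 2, "C": 2, "D": 3}


def best_model(errors, n_params=None):
    """Return the model label with the smallest Δ.

    If n_params is given, only models with exactly that many
    parameters (according to N_PARAMS) are considered.
    """
    candidates = [
        m for m in errors
        if n_params is None or N_PARAMS[m] == n_params
    ]
    return min(candidates, key=lambda m: errors[m])


if __name__ == "__main__":
    for species, errors in FIT_ERRORS.items():
        overall = best_model(errors)
        two_param = best_model(errors, n_params=2)
        print(f"{species}: best overall = {overall}, "
              f"best two-parameter model = {two_param}")
```

With the values in Table 1, model D has the smallest Δ for all six species. Restricting the comparison to the two-parameter models (B and C) is one simple way to compare fits at equal model complexity; it is a design choice of this sketch and not a substitute for a formal model-selection criterion.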