Table 1. (A) The estimated completeness of the 10 Accumulibacter genomes in this study. (B) The expected probability of observing a given pattern of presence and absence across the 10 Accumulibacter genomes.
(A)

Genome | AW09 | AW06 | CAPSK01 | AW08 | AW07 | AW12 | CAP2UW1 | CAP1UW1 | AW11 | AW10 |
---|---|---|---|---|---|---|---|---|---|---|
Completeness | 0.92 | 0.92 | 0.87 | 0.91 | 0.89 | 0.88 | 1 | 0.85 | 0.89 | 0.88 |

(B)

Pattern | Calculation | Expected probability | Sum (patterns with the same number of absences) |
---|---|---|---|
PPPPPPPPPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.3494 | 0.349 |
APPPPPPPPP | 0.08 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0304 | |
PAPPPPPPPP | 0.92 × 0.08 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0304 | |
PPAPPPPPPP | 0.92 × 0.92 × 0.13 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0522 | |
PPPAPPPPPP | 0.92 × 0.92 × 0.87 × 0.09 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0346 | |
PPPPAPPPPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.11 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0432 | 0.391 |
PPPPPAPPPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.12 × 1 × 0.85 × 0.89 × 0.88 | 0.0476 | |
PPPPPPAPPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 0 × 0.85 × 0.89 × 0.88 | 0.0000 | |
PPPPPPPAPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.15 × 0.89 × 0.88 | 0.0617 | |
PPPPPPPPAP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.11 × 0.88 | 0.0432 | |
PPPPPPPPPA | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.12 | 0.0476 | |

Given the completeness estimates, it is possible to calculate the expected probability of observing any pattern of presence and absence across the 10 Accumulibacter genomes. As an example, we present 11 patterns and show how the probability of each was calculated. The first pattern represents a gene family that is present in all genomes. The 10 patterns below it represent the possibilities for a single absence. Presence is indicated by a 'P', and absence by an 'A' (or in bold in the calculation). For each pattern, the completeness estimates of the genomes in which the gene family was present were multiplied together. This product was then multiplied by the product of 1 minus the completeness estimate for each genome in which the gene family was absent. The probabilities of all patterns with the same number of genomes containing the gene family may then be summed, as in the Sum column. Because each genome is scored as either present or absent, there are 2^10 (1024) possible patterns.
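
The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the names `completeness`, `pattern_probability`, and `sums_by_absences` are hypothetical, and it assumes, as the table's arithmetic implies, that each genome contributes an independent factor (its completeness for 'P', 1 minus its completeness for 'A').

```python
from itertools import product
from math import prod

# Completeness estimates from Table 1(A), in the table's column order.
completeness = {
    "AW09": 0.92, "AW06": 0.92, "CAPSK01": 0.87, "AW08": 0.91, "AW07": 0.89,
    "AW12": 0.88, "CAP2UW1": 1.0, "CAP1UW1": 0.85, "AW11": 0.89, "AW10": 0.88,
}
values = list(completeness.values())


def pattern_probability(pattern, values):
    """Multiply c for each 'P' and (1 - c) for each 'A' (hypothetical helper)."""
    return prod(c if state == "P" else 1.0 - c
                for state, c in zip(pattern, values))


# Probability of the all-present pattern (first row of Table 1(B)).
p_all_present = pattern_probability("P" * len(values), values)

# Enumerate all 2^10 patterns and sum the probabilities by number of absences.
sums_by_absences = {}
n_patterns = 0
for pattern in product("PA", repeat=len(values)):
    n_patterns += 1
    k = pattern.count("A")
    sums_by_absences[k] = sums_by_absences.get(k, 0.0) + pattern_probability(pattern, values)

print(f"all present: {p_all_present:.4f}")
print(f"exactly one absence: {sums_by_absences[1]:.3f}")
```

Under these assumptions, `sums_by_absences[0]` and `sums_by_absences[1]` correspond to the two entries in the Sum column, and the probabilities over all 1024 patterns add up to 1.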