. 2016 Apr 29;10(12):2931–2945. doi: 10.1038/ismej.2016.67

Table 1. (A) The estimated completeness of the 10 Accumulibacter genomes in this study. (B) The expected probability of observing a given pattern of presence and absence across the set of 10 Accumulibacter genomes.

(A)

| Genome | AW09 | AW06 | CAPSK01 | AW08 | AW07 | AW12 | CAP2UW1 | CAP1UW1 | AW11 | AW10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Completeness | 0.92 | 0.92 | 0.87 | 0.91 | 0.89 | 0.88 | 1 | 0.85 | 0.89 | 0.88 |

(B)

| Pattern | Calculation | Expected probability | Sum |
| --- | --- | --- | --- |
| PPPPPPPPPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.3494 | 0.349 |
| APPPPPPPPP | 0.08 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0304 | |
| PAPPPPPPPP | 0.92 × 0.08 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0304 | |
| PPAPPPPPPP | 0.92 × 0.92 × 0.13 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0522 | |
| PPPAPPPPPP | 0.92 × 0.92 × 0.87 × 0.09 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0346 | |
| PPPPAPPPPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.11 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0432 | |
| PPPPPAPPPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.12 × 1 × 0.85 × 0.89 × 0.88 | 0.0476 | |
| PPPPPPAPPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 0 × 0.85 × 0.89 × 0.88 | 0.0000 | |
| PPPPPPPAPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.15 × 0.89 × 0.88 | 0.0617 | |
| PPPPPPPPAP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.11 × 0.88 | 0.0432 | |
| PPPPPPPPPA | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.12 | 0.0476 | 0.391 (all 10 single-absence patterns) |

Given the completeness estimates, it is possible to calculate the expected probability of observing a given pattern of presence and absence across the set of 10 Accumulibacter genomes. As an example, we present 11 patterns of presence and absence and show how the probability of each was calculated. The first pattern represents a gene family that is present in all genomes. The 10 patterns below it represent the possible single absences. Presence is indicated by 'P' and absence by 'A'; in the calculation, the factor for a genome with an absence is 1 minus its completeness estimate. For each pattern, the completeness estimates of the genomes in which the gene family was present were multiplied together. This product was then multiplied by the product of 1 minus the completeness estimate for each genome in which the gene family was absent. The probabilities of all patterns with the same number of absences may then be summed. Because presence or absence is a binary outcome for each genome, there are 2^10 (1024) possible patterns.
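The calculation described above can be sketched in a few lines of Python. This is only an illustration built from the completeness values in Table 1(A), not the code used in the study; the names `COMPLETENESS`, `pattern_probability` and `probability_by_absence_count` are hypothetical and chosen for this example.

```python
from itertools import product
from math import prod

# Completeness estimates from Table 1(A), in the table's column order.
COMPLETENESS = {
    "AW09": 0.92, "AW06": 0.92, "CAPSK01": 0.87, "AW08": 0.91, "AW07": 0.89,
    "AW12": 0.88, "CAP2UW1": 1.00, "CAP1UW1": 0.85, "AW11": 0.89, "AW10": 0.88,
}


def pattern_probability(pattern, completeness):
    """Expected probability of a presence/absence pattern such as 'PAPPPPPPPP'.

    Each 'P' contributes the genome's completeness; each 'A' contributes
    1 minus the completeness. The factors are multiplied together.
    """
    if len(pattern) != len(completeness):
        raise ValueError("pattern length must match the number of genomes")
    factors = []
    for state, c in zip(pattern, completeness):
        if state == "P":
            factors.append(c)
        elif state == "A":
            factors.append(1.0 - c)
        else:
            raise ValueError(f"unexpected state {state!r}; use 'P' or 'A'")
    return prod(factors)


def probability_by_absence_count(completeness):
    """Sum pattern probabilities over all 2^n patterns, grouped by number of absences."""
    n = len(completeness)
    totals = [0.0] * (n + 1)
    for states in product("PA", repeat=n):
        pattern = "".join(states)
        totals[pattern.count("A")] += pattern_probability(pattern, completeness)
    return totals


values = list(COMPLETENESS.values())
totals = probability_by_absence_count(values)

print(round(pattern_probability("PPPPPPPPPP", values), 4))  # 0.3494, first row of Table 1(B)
print(round(totals[1], 3))                                  # 0.391, sum over single absences
```

Because the 1024 patterns cover every possible outcome, the entries of `totals` add up to 1, which is a convenient check on the enumeration.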