2019 Mar 13;568(7753):505–510. doi: 10.1038/s41586-019-1058-x

Extended Data Fig. 5. Effect of completeness and contamination on the identification of OTUs from whole genomes.

a–c, OTUs were identified for 296 genomes from the Bacteroides genus on the basis of average-linkage clustering of whole-genome ANI, using the ANIcalculator (v.1.0). The ANI cut-offs used for forming OTUs are indicated in the panel titles (94–97% ANI). The alignment fraction cut-offs, defined as the required percentage of genome length aligned between genome pairs (20–60%), are indicated by line colour. In each panel, the vertical axis indicates the number of OTUs identified from genomes on the basis of the ANI cut-off, the alignment fraction cut-off and the degree of incompleteness and/or amount of contamination present in the 296 genomes. a, OTUs were identified for the 296 Bacteroides genomes with up to 80% of genes randomly removed. The number of OTUs is inflated when genomes are incomplete and the alignment fraction cut-off is >20%. b, OTUs were identified for the 296 Bacteroides genomes with up to 20% of genes from a different one of the 296 genomes. The number of OTUs is not affected by contamination when genomes are complete. c, OTUs were identified for the 296 Bacteroides genomes with 50% of genes randomly removed and up to 20% of genes from a different one of the 296 genomes, representing a worst-case scenario. The number of OTUs is inflated by contamination when genomes are 50% complete. Using a lower ANI threshold (for example, 94 or 95% versus 96 or 97%) reduces the negative effect of contamination. On the basis of these experiments, we chose an alignment fraction cut-off of 20% and an ANI cut-off of 95% for identifying OTUs from MAGs and reference genomes in the current study.
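
The clustering step described in this legend can be illustrated with a minimal sketch. It is not the authors' pipeline and does not call or parse ANIcalculator. The function name `cluster_otus`, the toy genomes and the pairwise values are hypothetical. The sketch also assumes that a genome pair falling below the alignment fraction cut-off is treated as unrelated (distance 100), which the legend does not specify.

```python
from itertools import combinations


def cluster_otus(genomes, pairs, ani_cutoff=95.0, af_cutoff=20.0):
    """Group genomes into OTUs by average-linkage clustering of pairwise ANI.

    pairs maps frozenset({g1, g2}) -> (ANI %, alignment fraction %).
    Missing pairs are treated as having no detectable similarity.
    """

    def dist(a, b):
        ani, af = pairs.get(frozenset((a, b)), (0.0, 0.0))
        # Assumption: pairs below the alignment fraction cut-off count as
        # unrelated, i.e. they get the maximum distance of 100.
        return 100.0 - ani if af >= af_cutoff else 100.0

    def avg_dist(c1, c2):
        # Average linkage: mean distance over all cross-cluster genome pairs.
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[g] for g in genomes]
    max_dist = 100.0 - ani_cutoff

    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j by construction).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        if avg_dist(clusters[i], clusters[j]) > max_dist:
            break  # no remaining pair satisfies the ANI cut-off
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    return clusters


# Toy input (hypothetical values): C aligns to A over only 15% of its length,
# as might happen when one of the two genomes is highly incomplete.
genomes = ["A", "B", "C", "D"]
pairs = {
    frozenset(("A", "B")): (98.0, 80.0),
    frozenset(("A", "C")): (96.0, 15.0),
    frozenset(("B", "C")): (96.5, 70.0),
    frozenset(("C", "D")): (80.0, 50.0),
}

otus_strict = cluster_otus(genomes, pairs, ani_cutoff=95.0, af_cutoff=20.0)
otus_lenient = cluster_otus(genomes, pairs, ani_cutoff=95.0, af_cutoff=10.0)
print(len(otus_strict), len(otus_lenient))
```

In this toy case the stricter alignment fraction cut-off splits C from the A and B cluster, giving one more OTU than the lenient cut-off. This mirrors in miniature the inflation of OTU counts that the legend reports for incomplete genomes at higher alignment fraction cut-offs.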