2016 Apr 20;7:459. doi: 10.3389/fmicb.2016.00459

Figure 2.


Precision of taxonomy assignments is affected by highly similar sequences in different taxa. (A) For the 16S libraries described in Figure 1, sequences were clustered into operational taxonomic units (OTUs) at a 97% similarity threshold, and taxonomy was assigned with the RDP classifier. Sequences from OTUs classified as Bifidobacterium (n = 3), Agrobacterium (n = 3), Streptococcus (n = 3), Lactobacillus (n = 3), Bacteroides (n = 3), Peptostreptococcaceae (n = 4), or Enterobacteriaceae (n = 9) were randomly selected and aligned to the Greengenes database to identify their closest relative (best hit). In addition, we included Greengenes 16S rRNA gene sequences (in green) from Clostridium difficile and C. botulinum as references for Peptostreptococcaceae, and from Citrobacter freundii and Enterobacter cloacae as references for Enterobacteriaceae. The V4 region of the 16S rRNA gene was cropped from the Greengenes sequences, and a phylogenetic tree was constructed in MEGA-6 by UPGMA hierarchical clustering with 10,000 bootstrap replicates. (B) Sequences from our bacterial populations in Figure 1 were aligned against the NCBI nt and Human Microbiome Project (HMP) databases to identify the most similar reference genome. For each bacterium, a simulated library was created by segmenting the reference genome sequence into 500-nt stretches (250-nt paired ends in a head-to-tail orientation) and iterating the process to generate ~1.5 million sequences. This simulated library was aligned back to the reference genome, and taxonomy was resolved with MEGAN5. As examples, we show the read classifications for Bifidobacterium breve, Bacteroides thetaiotaomicron, and Escherichia coli, for which a large proportion of reads could be resolved at the species, genus, or family level, respectively. Color-matched bars on the right show the proportion of reads assigned at each level for these examples. S, species; G, genus; F, family; O, order; C, class; P, phylum.
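The simulated library of panel B can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline. The function name `simulate_paired_library`, the use of uniformly random fragment start positions, and the reading of "head-to-tail" as two same-strand reads taken from the two ends of each fragment are all assumptions. With the defaults (500-nt fragments, 250-nt reads), the two reads abut and together cover the whole fragment.

```python
import random
from typing import List, Tuple


def simulate_paired_library(
    genome: str,
    n_pairs: int,
    fragment_len: int = 500,
    read_len: int = 250,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Hypothetical helper: cut random fragments from a reference genome and
    return each as a pair of reads taken from its two ends, both on the
    forward strand (an assumed reading of "head-to-tail")."""
    if read_len > fragment_len:
        raise ValueError("read_len cannot exceed fragment_len")
    if len(genome) < fragment_len:
        raise ValueError("genome is shorter than one fragment")
    rng = random.Random(seed)  # seeded so the library is reproducible
    pairs = []
    for _ in range(n_pairs):
        # Assumption: fragment start positions are drawn uniformly at random.
        start = rng.randrange(len(genome) - fragment_len + 1)
        fragment = genome[start:start + fragment_len]
        # Read 1 from the fragment's start, read 2 from its end; with
        # fragment_len == 2 * read_len the two reads abut exactly.
        pairs.append((fragment[:read_len], fragment[-read_len:]))
    return pairs
```

The caption does not say whether "~1.5 million sequences" counts reads or pairs, so `n_pairs` would be roughly 750,000 under the first reading and 1,500,000 under the second. Writing the pairs out as FASTQ for an aligner is left out of this sketch.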
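The UPGMA tree in panel A was built in MEGA-6. The sketch below only illustrates what UPGMA computes, under stated assumptions. It takes aligned, equal-length sequences (for example, cropped V4 regions), scores them with an uncorrected p-distance (MEGA offers several distance models, and the caption does not state which was used), and repeatedly merges the closest pair of clusters, placing each new node at half their distance. The function names are hypothetical, and the 10,000 bootstrap replicates (resampling alignment columns and rebuilding the tree each time) are omitted for brevity.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Build an ultrametric UPGMA tree and return it in Newick format."""
    # cluster id -> (Newick subtree, number of leaves, node height)
    clusters: Dict[int, Tuple[str, int, float]] = {
        i: (name, 1, 0.0) for i, name in enumerate(names)
    }
    # Pairwise distances keyed by (smaller id, larger id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), d_ij = min(d.items(), key=lambda kv: kv[1])
        tree_i, size_i, height_i = clusters.pop(i)
        tree_j, size_j, height_j = clusters.pop(j)
        height = d_ij / 2  # new node sits halfway between the two clusters
        tree = f"({tree_i}:{height - height_i:.4f},{tree_j}:{height - height_j:.4f})"
        del d[(i, j)]
        for k in clusters:
            # Size-weighted average distance from the merged cluster to k.
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (tree, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

For three sequences where A and B differ at 10% of sites and both differ from C at 40%, `upgma(["A", "B", "C"], [[0, 0.1, 0.4], [0.1, 0, 0.4], [0.4, 0.4, 0]])` joins A and B first and returns `(C:0.2000,(A:0.0500,B:0.0500):0.1500);`.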
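MEGAN assigns each read to the lowest common ancestor (LCA) of the taxa it hits, so reads from a genome with close relatives in the database stop at genus or family rather than species. A simplified sketch of that idea, and of the per-level tally shown by the bars in panel B, follows. It assumes each hit is given as a six-level lineage tuple (phylum to species); MEGAN's actual LCA also applies score and support filters that are omitted here, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Rank abbreviations as in the caption, ordered phylum -> species.
RANKS = ["P", "C", "O", "F", "G", "S"]
Lineage = Tuple[str, str, str, str, str, str]


def lca_rank(lineages: Sequence[Lineage]) -> Optional[str]:
    """Deepest rank at which all hit lineages agree (None if not even phylum)."""
    depth = 0
    for names_at_rank in zip(*lineages):
        if len(set(names_at_rank)) != 1:
            break
        depth += 1
    return RANKS[depth - 1] if depth else None


def rank_proportions(read_hits: Dict[str, List[Lineage]]) -> Dict[str, float]:
    """Fraction of reads (with at least one hit) resolved at each rank."""
    counts: Counter = Counter()
    for hits in read_hits.values():
        if not hits:
            continue  # reads without hits are left unassigned
        counts[lca_rank(hits) or "unresolved"] += 1
    total = sum(counts.values())
    return {rank: n / total for rank, n in counts.items()}
```

With illustrative lineages, one read hitting only an Escherichia coli lineage and two reads hitting both E. coli and a Shigella lineage give one third of reads at S and two thirds at F.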