2023 Jun 20;8(4):e00961-22. doi: 10.1128/msystems.00961-22

Copyright © 2023 Kishore et al.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.

Fig 3. Taxonomic reference databases vary in their taxonomy assignments below the order level. (A) The taxonomic assignments of the top 50 representative sequences using the three different reference databases. The same sequences are assigned to different genera under different databases, and a significant portion of the representative sequences is assigned to an “unknown” genus in two of the three databases (GreenGenes and NCBI). The number of assigned genera for each database is displayed at the top of each column. (B) The number of representative sequences assigned to the same taxonomic label when using different reference databases (for the top 100 representative sequences). Mismatches are fewer at higher taxonomic levels, but even at the order level more than 51% of the assignments are mismatched, demonstrating the poor agreement in the taxonomic labels assigned by the different databases. The data used for the analysis in (A and B) were samples (healthy and ASD) from the FMT data set. (C) The Bray-Curtis dissimilarity between the predicted taxonomy profile and the expected taxonomy profile in the mock data sets shows that there is no single best choice of database for every data set, as all the databases perform similarly. The GreenGenes database and the Naive Bayes classifier are chosen as the defaults for the TA step of MiCoNE due to their popularity. The data sets used for the analysis in (C) were the mock data sets from mockrobiota.
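
As an illustration of the metric in panel (C), the following minimal sketch computes the Bray-Curtis dissimilarity between a predicted and an expected taxonomy profile. It is not taken from MiCoNE. The function name `bray_curtis`, the dictionary representation of a profile, and the abundance values are all assumptions made for this example. A value of 0 means the two profiles are identical, and 1 means they share no taxa.

```python
def bray_curtis(predicted, expected):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles.

    Taxa missing from one profile are treated as having abundance 0.
    """
    taxa = set(predicted) | set(expected)
    # Sum of absolute abundance differences over all taxa.
    numerator = sum(
        abs(predicted.get(t, 0.0) - expected.get(t, 0.0)) for t in taxa
    )
    # Total abundance across both profiles.
    denominator = sum(
        predicted.get(t, 0.0) + expected.get(t, 0.0) for t in taxa
    )
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


# Hypothetical genus-level relative abundances (illustrative values only).
expected = {"Bacteroides": 0.5, "Prevotella": 0.3, "Faecalibacterium": 0.2}
predicted = {"Bacteroides": 0.5, "Prevotella": 0.1, "unknown": 0.4}

score = bray_curtis(predicted, expected)
print(f"Bray-Curtis dissimilarity: {score:.2f}")
```

In this toy example, sequences that a database labels as an “unknown” genus contribute to the dissimilarity in the same way as outright misassignments, because their abundance is missing from the expected genera.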