2020 Aug 5;20(10):e251–e260. doi: 10.1016/S1473-3099(20)30199-7

© 2020 Elsevier Ltd. All rights reserved.

Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.

The importance of reference database choice, design, and versioning in taxonomic profiling of clinical metagenomics samples

(A) Schematic representation of a typical clinical metagenomics sample. Coloured DNA denotes sequences assigned to a species; grey denotes DNA deriving from the host, contaminants, unidentified taxa, or taxa sequenced at low depth. The pie chart shows the full metagenomic composition, and the bar shows the species composition excluding host DNA and contaminants.

(B) Taxonomic profiling based on database 1. Confidently assigned species are shown in colour, unassigned species are shown in grey, and misassigned species are outlined with a circle. Using database 1, species A, B, and D are correctly assigned. Sequences from species C are assigned to the closely related species C' because the reference database lacks a representative of species C. Additionally, the reference database contains a partially contaminated sequence for species E, so contaminant sequences in the test clinical metagenomics sample are misassigned to species E. This affects the inferred species composition shown in the bar.

(C) Taxonomic profiling based on database 2. The addition of species F to the database allows a greater proportion of the species present in the original clinical metagenomics sample to be assigned. Quality control and improvement of the species E reference, now species E (QC), removes the spurious assignment of contaminant sequences. Species C is still misassigned to species C', its closest representative in the database.

(D) Taxonomic profiling based on database 3. Updating the reference database to include species C results in the correct assignment of sequences to species C rather than species C'. Species F is taxonomically reassigned to species X, which changes the assigned species name even though neither the reference sequence data nor the query data have changed.

In all cases, the pink sequences present in the original clinical metagenomics sample are not assigned because this species is not present in any of the three reference databases.
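The effect of the three database versions can be mimicked with a toy model. The sketch below is only illustrative and is not a real taxonomic classifier: each read is reduced to a label for the organism it came from, each reference entry lists the organisms whose reads it would attract along with a made-up similarity score, and a read goes to the best-scoring reference. The read counts, the scores, and the names `classify`, `profile`, `db1`, `db2`, and `db3` are all assumptions introduced here and do not come from the figure.

```python
from collections import Counter

# Hypothetical sample: each read is labelled by its true origin (counts are invented).
sample = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"] * 2 + ["F"] * 2 \
    + ["pink"] * 1 + ["contaminant"] * 2

# Database 1: no reference for C or F; the E reference is partly contaminated,
# so it also attracts contaminant reads. Scores are invented similarities.
db1 = {
    "A": {"A": 1.0},
    "B": {"B": 1.0},
    "C'": {"C'": 1.0, "C": 0.9},  # closest relative of C
    "D": {"D": 1.0},
    "E": {"E": 1.0, "contaminant": 1.0},
}

# Database 2: E is cleaned up (QC) and F is added.
db2 = {name: hits for name, hits in db1.items() if name != "E"}
db2["E (QC)"] = {"E": 1.0}
db2["F"] = {"F": 1.0}

# Database 3: C is added, and the F reference is renamed X (same sequence data).
db3 = {name: hits for name, hits in db2.items() if name != "F"}
db3["C"] = {"C": 1.0}
db3["X"] = {"F": 1.0}


def classify(origin, database):
    """Return the best-scoring reference name for a read, or 'unassigned'."""
    best_name, best_score = "unassigned", 0.0
    for name, hits in database.items():
        score = hits.get(origin, 0.0)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def profile(reads, database):
    """Count reads per assigned reference name."""
    return Counter(classify(origin, database) for origin in reads)


for label, database in [("database 1", db1), ("database 2", db2), ("database 3", db3)]:
    print(label, dict(profile(sample, database)))
```

Under these assumptions the same reads yield three different profiles: C reads are reported as C' until a C reference exists, contaminant reads are reported as E only while the E reference is contaminated, F reads change name to X with no change in the underlying data, and the pink reads stay unassigned throughout.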