Figure - PMC

PeerJ. 2021 May 5;9:e11348. doi: 10.7717/peerj.11348

©2021 Léonard et al.

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

(A) The preparation phase consists of downloading newly released prokaryotic genomes from NCBI RefSeq and pre-computing all per-genome information required to run the second phase: k-mer composition, assembly quality, annotation richness, and contamination and completeness levels. This pre-computed per-genome information is stored in a relational database associated with TQMD. (B) The dereplication phase then retrieves this information from the database for all genomes to be dereplicated and clusters the genomes based on pairwise distances computed on the fly. One representative is chosen for each cluster based on the single-genome metrics computed during the preparation phase. Dereplication is iterative: the process repeats until the representative genomes cannot be dereplicated any further, which yields the final list of representatives. Parallelized steps are shown as overlaid boxes.
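
The iterative loop of phase (B) can be illustrated with a minimal sketch. This is not TQMD's code, and the details are assumptions made for illustration only: the distance is a Jaccard distance on k-mer sets, the clustering is a simple greedy pass, and the per-genome metrics are collapsed into one hypothetical `quality` score. The names `Genome`, `cluster_once` and `dereplicate` are likewise invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    name: str
    kmers: frozenset  # k-mer set, assumed pre-computed in the preparation phase
    quality: float    # hypothetical single score summarising per-genome metrics


def jaccard_distance(a, b):
    """Jaccard distance between two k-mer sets (assumed metric)."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def cluster_once(genomes, threshold):
    """Greedy pass: a genome joins the first cluster whose first member
    lies within `threshold`; otherwise it starts a new cluster."""
    clusters = []
    for g in genomes:
        for cluster in clusters:
            if jaccard_distance(g.kmers, cluster[0].kmers) <= threshold:
                cluster.append(g)
                break
        else:
            clusters.append([g])
    return clusters


def dereplicate(genomes, threshold):
    """Repeat clustering on the representatives until a round no longer
    reduces their number, then return the final representatives."""
    current = list(genomes)
    while True:
        clusters = cluster_once(current, threshold)
        # One representative per cluster: the member with the best score.
        representatives = [max(c, key=lambda g: g.quality) for c in clusters]
        if len(representatives) == len(current):
            return representatives
        current = representatives


# Toy data: "a" and "b" share 3 of 5 distinct k-mers (distance 0.4).
demo = [
    Genome("a", frozenset({1, 2, 3, 4}), 0.9),
    Genome("b", frozenset({1, 2, 3, 5}), 0.5),
    Genome("c", frozenset({10, 11, 12}), 0.7),
]
reps = dereplicate(demo, threshold=0.5)
print([g.name for g in reps])  # ['a', 'c']
```

With a threshold of 0.5, genomes "a" and "b" fall into one cluster and "a" is kept for its higher score, while "c" remains on its own. The loop stops as soon as a round leaves the number of representatives unchanged, which mirrors the stopping condition described in the caption.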