Figure - PMC

2017 Dec 22;8:2260. doi: 10.1038/s41467-017-02209-5

© The Author(s) 2017

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 1. StrainEst overview. (a) Given the complete and draft genomes of the species of interest (G1, G2, …) and the species representative (SR), pairwise Mash distances are computed. Genomes with a Mash distance >0.1 from the SR are discarded, and the remaining ones are clustered to remove redundant sequences. For each cluster, the genome with the lowest average distance from the other members is chosen as the representative (R1, R2, …). (b) The representative sequences are mapped against the SR using nucmer, and ambiguous mappings are removed. (c) For each representative, the positions of the variant sites (P1, P2, …) are identified and the SNV profiles are extracted. The profiles are clustered at 99% identity to guarantee their representativeness. (d) To create a reference set for metagenomic read alignment that takes into account the variability of the species, representative genomes are selected for the metagenome alignment step (A1, A2, …) and (e) mapped against the SR. (f) For each metagenome (MG), the reads are aligned to the chosen genomes using Bowtie 2. (g) The frequencies of the allelic variants at the variant positions defined in step (c) are extracted from the BAM file; sites with low coverage are filtered according to user-defined parameters; finally, the relative abundance profile is inferred by Lasso regression.
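
The selection logic in panel (a) can be illustrated with a minimal Python sketch. It is not StrainEst's implementation: the function names, the dictionary-based distance tables, and the toy distance values are all hypothetical, and the clustering step itself is assumed to have been done already. The sketch only shows the two rules the caption states, discarding genomes with a distance >0.1 from the SR and choosing, within a cluster, the genome with the lowest average distance from the other members.

```python
from typing import Dict, FrozenSet, List

# Hypothetical container: pairwise distances keyed by an unordered genome pair.
PairwiseDistances = Dict[FrozenSet[str], float]


def pair_distance(distances: PairwiseDistances, a: str, b: str) -> float:
    """Look up the distance between two genomes (0.0 for a genome and itself)."""
    if a == b:
        return 0.0
    return distances[frozenset((a, b))]


def filter_by_reference(
    genomes: List[str], distance_to_sr: Dict[str, float], max_distance: float = 0.1
) -> List[str]:
    """Discard genomes whose distance from the SR is greater than max_distance."""
    return [g for g in genomes if distance_to_sr[g] <= max_distance]


def pick_representative(cluster: List[str], distances: PairwiseDistances) -> str:
    """Return the member with the lowest average distance from the other members."""
    if len(cluster) == 1:
        return cluster[0]

    def average_distance(genome: str) -> float:
        others = [o for o in cluster if o != genome]
        return sum(pair_distance(distances, genome, o) for o in others) / len(others)

    return min(cluster, key=average_distance)


# Toy values, invented for illustration only.
distance_to_sr = {"G1": 0.02, "G2": 0.03, "G3": 0.25, "G4": 0.04}
pairwise = {
    frozenset(("G1", "G2")): 0.010,
    frozenset(("G1", "G4")): 0.030,
    frozenset(("G2", "G4")): 0.015,
}

kept = filter_by_reference(["G1", "G2", "G3", "G4"], distance_to_sr)
# Assumption: the surviving genomes all fall into a single cluster.
representative = pick_representative(kept, pairwise)
print(kept, representative)
```

In practice the distances would come from a tool such as Mash rather than from hand-written tables, and each cluster produced by the clustering step would be passed to `pick_representative` in turn.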