FIGURE 1.
Pipeline overview. First, the complete genome data set was downloaded from NCBI and filtered based on 95% ANI to obtain 126 species and 597 strains. Then, for each species, the one-to-one pairwise coverage between all of its strains was calculated, and the strain with the highest minimum coverage value was selected as the representative strain for that species. For example, if a species has four strains, each strain is treated in turn as the reference strain, and coverage is calculated by aligning the sequence data of the remaining three strains to that reference using bowtie2. The minimum coverage values of the strains are then compared (orange: strain_1: 0.79, strain_2: 0.86, strain_3: 0.87, strain_4: 0.83), and the strain with the highest value is selected (red: strain_3: 0.87). Thus, the strain with the highest minimum coverage is the representative strain of that species. A reference database was constructed from the representative strains, and whole-metagenome shotgun sequencing data of probiotic products were aligned to it. Only species exceeding 0.7137 coverage were judged to be present in the probiotic product.
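
The selection rule described in the caption (pick the strain whose worst-case pairwise coverage is highest, then call a species present only above the 0.7137 cutoff) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the nested-dictionary layout, and all pairwise coverage values other than the four minima quoted in the caption are assumptions made for the example, and the alignment step with bowtie2 is assumed to have been run beforehand.

```python
from typing import Dict

# Cutoff quoted in the caption; "exceeding" is read here as strictly greater.
PRESENCE_THRESHOLD = 0.7137


def min_coverages(coverage: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """For each reference strain, return its lowest coverage over all other strains.

    coverage[ref][query] is assumed to hold the coverage obtained when the
    sequence data of `query` are aligned to `ref`.
    """
    return {ref: min(by_query.values()) for ref, by_query in coverage.items()}


def select_representative(coverage: Dict[str, Dict[str, float]]) -> str:
    """Return the strain with the highest minimum coverage (max-min rule)."""
    minima = min_coverages(coverage)
    return max(minima, key=minima.get)


def is_present(species_coverage: float, threshold: float = PRESENCE_THRESHOLD) -> bool:
    """Judge a species present only if its coverage exceeds the threshold."""
    return species_coverage > threshold


# Hypothetical pairwise values; only the per-strain minima
# (0.79, 0.86, 0.87, 0.83) are taken from the caption.
example = {
    "strain_1": {"strain_2": 0.79, "strain_3": 0.91, "strain_4": 0.85},
    "strain_2": {"strain_1": 0.88, "strain_3": 0.86, "strain_4": 0.90},
    "strain_3": {"strain_1": 0.87, "strain_2": 0.93, "strain_4": 0.89},
    "strain_4": {"strain_1": 0.83, "strain_2": 0.92, "strain_3": 0.95},
}

representative = select_representative(example)
print(representative)
```

Taking the minimum before the maximum favors a strain that every other strain of the species covers reasonably well, rather than one that is covered very well by some strains and poorly by others.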
