Table 2.
Method | k-mer Size * | Same Species | Genomic Distance Threshold ** | Different Species |
---|---|---|---|---|
Mash | 14 | less than | 0.004129 | greater than |
Mash | 16 | less than | 0.04317 | greater than |
Mash | 18 | less than | 0.05781 | greater than |
Mash | 20 | less than | 0.05915 | greater than |
Mash | 22 | less than | 0.05833 | greater than |
Dashing | 14 | greater than | 0.9076 | less than |
Dashing | 16 | greater than | 0.3736 | less than |
Dashing | 18 | greater than | 0.2437 | less than |
Dashing | 20 | greater than | 0.2084 | less than |
Dashing | 22 | greater than | 0.1863 | less than |

* Classification accuracy by threshold was similar across the k-mer sizes in the table: the difference between the minimum and maximum accuracy was 0.17% for Mash distances and 0.60% for Dashing distances.

** Distances were calculated from a reduced dataset of 1319 genomes, obtained by excluding incompletely identified species, species hybrids, and outlier genomes and species (the dataset is fully described in the Methods section). The thresholds in this table therefore differ from those determined on the full dataset (Figure 2, Figure 3 and Figure 4).
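
The thresholds in Table 2 amount to a simple decision rule, which the following minimal Python sketch illustrates. The names `THRESHOLDS` and `same_species` are hypothetical and do not come from Mash or Dashing. The sketch assumes the pairwise value has already been computed with the corresponding tool and k-mer size. The table does not say how to treat a value exactly equal to the threshold, so the sketch assumes it counts as "different species".

```python
# Thresholds copied from Table 2 (reduced dataset of 1319 genomes).
THRESHOLDS = {
    "Mash": {14: 0.004129, 16: 0.04317, 18: 0.05781, 20: 0.05915, 22: 0.05833},
    "Dashing": {14: 0.9076, 16: 0.3736, 18: 0.2437, 20: 0.2084, 22: 0.1863},
}


def same_species(method: str, k: int, value: float) -> bool:
    """Return True if a genome pair falls on the 'same species' side of Table 2.

    method: "Mash" or "Dashing"; k: k-mer size listed in the table;
    value: the pairwise value reported by that tool for the genome pair.
    """
    threshold = THRESHOLDS[method][k]  # KeyError for a method or k not in the table
    if method == "Mash":
        # Mash: same species when the value is less than the threshold.
        return value < threshold
    # Dashing: same species when the value is greater than the threshold.
    # Assumption: a value equal to the threshold is treated as different species.
    return value > threshold


if __name__ == "__main__":
    print(same_species("Mash", 20, 0.03))     # below 0.05915
    print(same_species("Dashing", 18, 0.10))  # below 0.2437
```

The comparison direction is kept per method because, as the table shows, the two tools use opposite conventions: "same species" lies below the threshold for Mash and above it for Dashing.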