Table 1.
k-mer size | # Canonical k-mer combinations | % of k-mers found per sample (median) | % of k-mers found per sample (MAD) | % of k-mers found per sample, shared by at least two samples (median) | % of k-mers found per sample, shared by at least two samples (MAD) |
---|---|---|---|---|---|
11-mer | 2.10 × 10⁶ | 100.00 % | 1.58 % | 100.00 % | 0.00 % |
15-mer | 5.40 × 10⁸ | 53.59 % | 17.07 % | 100.00 % | 0.00 % |
17-mer | 8.60 × 10⁹ | 8.90 % | 4.03 % | 98.37 % | 0.99 % |
21-mer | 2.20 × 10¹² | 0.05 % | 0.03 % | 81.45 % | 20.55 % |
31-mer | 2.30 × 10¹⁸ | 0.000000061 % | 0.000000032 % | 67.05 % | 24.14 % |
The second column gives the total number of possible k-mers, calculated as 4^k / 2, where k is the k-mer size and the division by two accounts for canonicalization (a k-mer and its reverse complement are counted as a single canonical k-mer). The third column gives the median and the Median Absolute Deviation (MAD) of the number of k-mers found in each sample (Additional file 3: Table S3) divided by the number of possible k-mers. It therefore shows the percentage of combinations actually observed and, consequently, the saturation of the search space. The fourth column gives the median and MAD of the percentage of valid k-mers, i.e., k-mers shared by at least two samples (Additional file 3: Table S3).
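The quantities in the caption can be reproduced with a few lines of code. The sketch below is illustrative only; all function names are hypothetical and not taken from the authors' pipeline. It computes the size of the canonical k-mer space (4^k / 2, which is exact for the odd k used here because an odd-length k-mer can never equal its own reverse complement), canonicalizes a k-mer as the lexicographically smaller of itself and its reverse complement (a common convention, assumed here), expresses one sample's distinct k-mer count as a percentage of that space, and summarizes per-sample values by median and MAD. One assumption to note: the MAD here is unscaled, whereas R's `mad()` multiplies by 1.4826 by default, so the paper's MAD values may follow a different convention.

```python
from statistics import median

# Complement table for DNA bases (assumes uppercase A/C/G/T input).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmer_space(k: int) -> float:
    """Number of canonical k-mers, 4**k / 2.

    Exact for odd k, where no k-mer is its own reverse complement.
    """
    return 4 ** k / 2


def canonical(kmer: str) -> str:
    """Hypothetical helper: the lexicographically smaller of a k-mer
    and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def saturation_percent(distinct_kmers: int, k: int) -> float:
    """Percentage of the canonical k-mer space observed in one sample."""
    return 100 * distinct_kmers / canonical_kmer_space(k)


def median_and_mad(values):
    """Median and unscaled median absolute deviation (assumption:
    no 1.4826 consistency constant is applied)."""
    m = median(values)
    return m, median([abs(v - m) for v in values])


if __name__ == "__main__":
    # Search-space sizes for the k values in Table 1, shown to two
    # significant figures.
    for k in (11, 15, 17, 21, 31):
        print(f"{k}-mer: {canonical_kmer_space(k):.1e}")
```

Printed to two significant figures, the sizes come out as 2.1e+06, 5.4e+08, 8.6e+09, 2.2e+12 and 2.3e+18. These match the second column of Table 1, which appears to round to two significant figures (4^15 / 2 is exactly 536,870,912).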