2017 Oct 2;13(10):e1005777. doi: 10.1371/journal.pcbi.1005777

Table 2. The number of 10-mers needed to hit all 30-long sequences in four genomes: two bacterial genomes (A. tropicalis and C. crescentus), the worm C. elegans, and a mammalian genome (H. sapiens).

Genome sizes are quoted after removing all Ns and ambiguous codes. We tested three algorithms: minimizers picking the lexicographically smallest 10-mer, minimizers picking the first 10-mer in a random k-mer ordering, and selection using the set produced by DOCKS. When a 30-long window contained multiple DOCKS-selected 10-mers, the lexicographically smallest was chosen. "# mers" is the number of distinct 10-mers selected, and "avg. dist." is the average distance between two selected 10-mers.
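The selection schemes described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce the table: the function names (`select_positions`, `random_rank`, `summarize`) are hypothetical, the `allowed` argument stands in for a precomputed hitting set such as the one DOCKS produces (computing that set is not shown), and "avg. dist." is assumed to mean the mean gap between consecutive selected start positions.

```python
import random


def select_positions(seq, k=10, L=30, rank=None, allowed=None):
    """Pick one k-mer start position per L-long window of seq.

    rank:    function mapping a k-mer to a sortable key; defaults to the
             k-mer itself, i.e. lexicographic order.
    allowed: optional set of k-mers (hypothetical stand-in for a
             DOCKS-style set); only these may be selected.
    """
    rank = rank or (lambda mer: mer)
    chosen = set()
    for start in range(len(seq) - L + 1):
        # Every k-mer that lies fully inside the window [start, start + L).
        cands = [(seq[i:i + k], i) for i in range(start, start + L - k + 1)]
        if allowed is not None:
            cands = [c for c in cands if c[0] in allowed]
            if not cands:
                raise ValueError(f"window at {start} contains no allowed k-mer")
        # Smallest k-mer under the ordering; leftmost position breaks ties.
        _, pos = min(cands, key=lambda c: (rank(c[0]), c[1]))
        chosen.add(pos)
    return sorted(chosen)


def random_rank(seed=0):
    """Return a rank function that gives each k-mer a fixed random key."""
    rng = random.Random(seed)
    table = {}

    def rank(mer):
        if mer not in table:
            table[mer] = rng.random()
        return table[mer]

    return rank


def summarize(seq, positions, k=10):
    """Return (number of distinct selected k-mers, mean gap between picks)."""
    mers = {seq[p:p + k] for p in positions}
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    avg = sum(gaps) / len(gaps) if gaps else float("nan")
    return len(mers), avg
```

With the defaults, `select_positions(genome)` corresponds to the lexicographic scheme, `select_positions(genome, rank=random_rank())` to the randomized one, and `select_positions(genome, allowed=some_set)` to selection from a given set with lexicographic tie-breaking. The per-window scan is quadratic in the window length and is written for clarity rather than genome-scale speed.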

Species        Genome size (Mbp)  Method         # mers (thousands)  avg. dist.
A. tropicalis  0.393              lexicographic  32.9                9.48
A. tropicalis  0.393              randomized     28.0                11.0
A. tropicalis  0.393              DOCKS          23.7                12.4
C. crescentus  4                  lexicographic  114.0               10.2
C. crescentus  4                  randomized     89.6                11.0
C. crescentus  4                  DOCKS          66.0                12.4
C. elegans     100                lexicographic  286.0               8.83
C. elegans     100                randomized     277.0               11.0
C. elegans     100                DOCKS          145.0               12.4
H. sapiens     2900               lexicographic  543.0               9.13
H. sapiens     2900               randomized     389.0               10.9
H. sapiens     2900               DOCKS          154.0               12.1