. 2002 Apr;12(4):656–664. doi: 10.1101/gr.229202

Table 12.

Alignment Times in Seconds of 10,000 ESTs (Average Size 380 Bases) Against Human Genomic Sequence Using Various K Sizes and N Sizes

K    N    2 × 10^6 bases    2 × 10^7 bases    2 × 10^8 bases
10   2    3.9               35.6              680.1
10   3    3.2               21.4              348.7
11   2    2.4               8.1               92.4
11   3    2.3               6.5               61.8
12   2    3.9               7.0               39.9
12   3    3.7               6.4               33.8

The 2 × 10^6 genomic sequence is ctg12414, which is 2,034,363 bases long and was taken from the December 2000 UCSC human genome assembly (http://genome.ucsc.edu). The 2 × 10^7 genomic sequence is ctg15424, which is 20,341,418 bases long, and the 2 × 10^8 sequence is chromosome 4, which is 200,175,155 bases long. The run time has two major components: binning and sorting the K-mer hits (clumping is almost instantaneous once the hits are sorted), and extending the clumps into alignments. The bin/sort time depends on the number of hits, which is proportional to 4^−K, and grows somewhere between O(n) and O(n log n) in the number of hits n. The extend time is linear in the number of clumps.
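To make the 4^−K dependence concrete, the sketch below computes the expected number of chance K-mer matches under a deliberately idealized model that is not taken from the paper: bases are assumed independent and uniformly random, and every overlapping K-mer position on both the query and the genome is counted. The function name `expected_random_hits` is hypothetical. Real genomic sequence is repeat-rich and non-uniform, so this is only a rough guide to how hit counts scale, not a prediction of actual counts.

```python
def expected_random_hits(query_len: int, genome_len: int, k: int,
                         num_queries: int = 1) -> float:
    """Expected chance K-mer matches between random sequences (hypothetical model).

    Assumes i.i.d. uniform bases, so any query K-mer matches any genome
    K-mer with probability 4**-k. Repeats and base composition are ignored,
    and every overlapping K-mer position is counted on both sides.
    """
    if k > query_len or k > genome_len:
        return 0.0
    query_kmers = query_len - k + 1
    genome_kmers = genome_len - k + 1
    return num_queries * query_kmers * genome_kmers * 4.0 ** -k


if __name__ == "__main__":
    # Sizes from Table 12: 10,000 ESTs of ~380 bases against three genomic sequences.
    genomes = {"ctg12414": 2_034_363, "ctg15424": 20_341_418, "chr4": 200_175_155}
    for k in (10, 11, 12):
        row = [expected_random_hits(380, g, k, num_queries=10_000)
               for g in genomes.values()]
        print(k, " ".join(f"{h:.3g}" for h in row))
```

In this model each increase of K by one cuts the expected hit count roughly fourfold, while each tenfold increase in genome size raises it roughly tenfold. The measured times in the table do not follow these ratios exactly, since they also include the extension step and other costs.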
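The bin, sort and clump steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Hits are taken to be `(query_pos, target_pos)` pairs. They are first binned by diagonal band (`target_pos - query_pos`), each bin is sorted by target position, and a single linear pass then groups nearby hits into clumps. The parameters `band_width` and `max_gap` are hypothetical. `min_hits` is assumed to play the role of N, the minimum number of hits a clump needs to be kept. Binning is O(n), and sorting many small bins is cheaper than one global sort, which is consistent with a cost between O(n) and O(n log n).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[int, int]  # (query_pos, target_pos) of one K-mer match


def clump_hits(hits: List[Hit], band_width: int = 64, max_gap: int = 1000,
               min_hits: int = 2) -> List[List[Hit]]:
    """Group K-mer hits into clumps: bin by diagonal band, sort, then scan.

    Hypothetical parameters: band_width (diagonal bin size), max_gap (largest
    target-coordinate gap allowed inside a clump) and min_hits (assumed to act
    like N: clumps with fewer hits are discarded).
    """
    # Bin: hits on nearby diagonals (target_pos - query_pos) share a bucket.
    bins: Dict[int, List[Hit]] = defaultdict(list)
    for q, t in hits:
        bins[(t - q) // band_width].append((q, t))

    clumps: List[List[Hit]] = []
    for band in sorted(bins):
        # Sort within the bin by target position (the n log n part, on small bins).
        band_hits = sorted(bins[band], key=lambda h: h[1])
        current = [band_hits[0]]
        # Clump: one linear pass over the sorted hits.
        for hit in band_hits[1:]:
            if hit[1] - current[-1][1] <= max_gap:
                current.append(hit)
            else:
                if len(current) >= min_hits:
                    clumps.append(current)
                current = [hit]
        if len(current) >= min_hits:
            clumps.append(current)
    return clumps
```

The extension step is not sketched here. It would visit each surviving clump once, which matches the statement that extend time is linear in the number of clumps. One simplification to note: hits that straddle a band boundary land in different bins, so a fuller implementation might also check neighbouring bands.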