Table 12.
Alignment Times in Seconds of 10,000 ESTs (Average Size 380 Bases) Against Human Genomic Sequence Using Various K Sizes and N Sizes
K | N | 2 × 10⁶ bases | 2 × 10⁷ bases | 2 × 10⁸ bases |
---|---|---|---|---|
10 | 2 | 3.9 | 35.6 | 680.1 |
10 | 3 | 3.2 | 21.4 | 348.7 |
11 | 2 | 2.4 | 8.1 | 92.4 |
11 | 3 | 2.3 | 6.5 | 61.8 |
12 | 2 | 3.9 | 7.0 | 39.9 |
12 | 3 | 3.7 | 6.4 | 33.8 |
The 2 × 10⁶ genomic sequence is ctg12414, which is 2,034,363 bases long and was taken from the December 2000 UCSC human genome assembly (http://genome.ucsc.edu). The 2 × 10⁷ genomic sequence is ctg15424, which is 20,341,418 bases long. The 2 × 10⁸ sequence is chromosome 4, which is 200,175,155 bases long.

The run time has two major components: the time to bin and sort the K-mer hits (clumping is almost instantaneous once the hits are sorted), and the time to extend the clumps into alignments. The bin/sort time depends on the number of hits, which, for a fixed query and genome size, is proportional to $4^{-K}$. The bin/sort time grows somewhere between O(n) and O(n log n), where n is the number of hits. The extend time is linear in the number of clumps.
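To make the $4^{-K}$ scaling of the hit count concrete, the following is a minimal Python sketch that estimates how many random K-mer hits the bin/sort stage would face for each genomic sequence in the table. It assumes that both the ESTs and the genome behave like uniform random sequence and that every query position is compared against every genome position; real genomic sequence is repetitive and an actual index may step through the genome differently, so the absolute numbers are illustrative only. The names `expected_hits`, `EST_COUNT`, and `GENOME_SIZES` are hypothetical and are not taken from any published implementation.

```python
# Rough model of how many K-mer hits the bin/sort stage must handle.
# ASSUMPTION: sequences behave like uniform random DNA, so a given query
# K-mer matches a given genome position with probability 4**-K.
# ASSUMPTION: every query position is compared with every genome position
# (ignores indexing details); only the 4**-K scaling is the point here.

EST_COUNT = 10_000   # number of ESTs, from the table caption
AVG_EST_LEN = 380    # average EST length in bases, from the table caption

# Genomic sequence lengths in bases, from the table notes.
GENOME_SIZES = {
    "ctg12414": 2_034_363,
    "ctg15424": 20_341_418,
    "chr4": 200_175_155,
}


def expected_hits(query_bases: int, genome_bases: int, k: int) -> float:
    """Expected number of random K-mer matches between query and genome."""
    return query_bases * genome_bases / 4 ** k


def main() -> None:
    query_bases = EST_COUNT * AVG_EST_LEN
    for name, size in GENOME_SIZES.items():
        for k in (10, 11, 12):
            hits = expected_hits(query_bases, size, k)
            print(f"{name:9s} K={k}: ~{hits:.3g} random hits")


if __name__ == "__main__":
    main()
```

Under this model each increase of K by one divides the expected hit count by four. In the table, the largest (chromosome 4) column shows by far the steepest drop in run time as K grows, which fits a run time dominated by hit processing; the smaller sequences change much less.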