PLoS One. 2014 Oct 7;9(10):e109384. doi: 10.1371/journal.pone.0109384

Table 7. Comparison of MuGI and JST on simulated and real data, both over 1092 individual sequences of chr1.

| Algorithm | Max. allowed mismatches: 0 | 1 | 2 | 3 | 4 | 5 | RAM usage [GB] |
|---|---|---|---|---|---|---|---|
| Simulated data | | | | | | | |
| JST-Horspool | 8.0 s | | | | | | 2.58 |
| JST-Myers | 22.5 s | 24.0 s | 23.9 s | 24.3 s | 24.5 s | 24.9 s | 2.58 |
| MuGI, k = 30, sparsity = 1 | 8.2 µs | 14.8 µs | 22.7 µs | 33.6 µs | 69.6 µs | 176.0 µs | 1.84 |
| MuGI, k = 30, sparsity = 3 | 10.7 µs | 21.7 µs | 32.4 µs | 47.9 µs | 90.7 µs | 239.0 µs | 0.98 |
| Real data | | | | | | | |
| JST-Horspool | 6.9 s | | | | | | 2.58 |
| JST-Myers | 18.4 s | 19.1 s | 19.2 s | 20.0 s | 20.3 s | 20.3 s | 2.58 |
| MuGI, k = 30, sparsity = 1 | 7.6 µs | 14.3 µs | 22.1 µs | 53.4 µs | 172.3 µs | 476.5 µs | 1.84 |
| MuGI, k = 30, sparsity = 3 | 12.3 µs | 23.1 µs | 35.3 µs | 72.0 µs | 238.2 µs | 617.9 µs | 0.98 |

MuGI was executed with k = 30 and sparsities 1 (index size 2.0 GB) and 3 (index size 1.0 GB). Because of its very long running times, JST results are averages over only 100 queries. JST times include block generation (blocks of 100K SNPs were used), but in our experiments block generation took at least an order of magnitude less time than pattern searching. JST-Horspool uses the Boyer–Moore–Horspool exact matching algorithm, while JST-Myers uses Myers' bit-parallel approximate matching algorithm, which handles the Levenshtein distance (k-differences). The JST index size was 468 MB, in addition to the 253 MB of the reference sequence. Note that JST's memory use during the search is significantly higher than its index size and depends on the block size (e.g., it grows to about 13 GB with blocks of 1 M SNPs).
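For readers unfamiliar with the two matching algorithms named in the note, the following is a minimal Python sketch of each. It is illustrative only: both functions scan a single plain string, not the JST structure or a multi-genome index, and the function names `horspool_search` and `myers_search` are hypothetical, not taken from either tool. The Myers variant shown is the common search formulation that reports every end position where some substring of the text is within Levenshtein distance k of the pattern.

```python
def horspool_search(pattern, text):
    """Return start indices of all exact occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character table: for each symbol (final pattern position excluded),
    # the distance from its last occurrence to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the text symbol aligned with the last pattern position.
        pos += shift.get(text[pos + m - 1], m)
    return hits


def myers_search(pattern, text, k):
    """Return end indices j such that some substring of text ending at j
    is within Levenshtein distance k of pattern (bit-parallel scan)."""
    m = len(pattern)
    if m == 0:
        return []
    mask = (1 << m) - 1   # keeps Python's unbounded ints to m bits
    high = 1 << (m - 1)   # bit of the last pattern row
    peq = {}              # per-symbol bitmask of matching pattern positions
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)
    pv, mv, score = mask, 0, m   # vertical +1/-1 deltas and last-row distance
    ends = []
    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = (ph << 1) & mask   # no low bit set: a match may start anywhere
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            ends.append(j)
    return ends
```

With k = 0 the Myers scan reports exactly the end positions of the exact occurrences that the Horspool scan finds by start position. Both sketches take time at least proportional to the text length per query for patterns that fit in a machine word, which is one plausible reading of why the scanning-based JST timings in the table are in seconds while the index-based MuGI timings are in microseconds.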