Table 3.
Index structure | Bits/char | Yeast (MB) | Fruit fly (MB) | Human (MB) | Reference |
---|---|---|---|---|---|
2-bit encoded string | 2 | 3 | 35 | 775 | NCBI^a |
CSA Grossi et al. | 2.4 | 4 | 42 | 931 | (59,60) |
FM-index | 3.36 | 5 | 59 | 1302 | (45,39) |
SSA (best) | 4 | 6 | 70 | 1551 | (47,57) |
CST Russo et al.^b | 5 | 8 | 87 | 1939 | (61,62) |
CSA Sadakane (best) | 5.6 | 8 | 98 | 2171 | (63,64) |
LZ-index (best) | 6.64 | 10 | 116 | 2574 | (57) |
byte encoded string | 8 | 12 | 139 | 3102 | NCBI^a |
CST Navarro^b | 12 | 18 | 209 | 4653 | (62) |
SSA (worst) | 20 | 30 | 349 | 7754 | (47,57) |
CST Sadakane^b | 30 | 45 | 523 | 11 632 | (44,62) |
LZ-index (worst) | 35.2 | 53 | 614 | 13 648 | (65,39) |
Suffix array | 40 | 60 | 697 | 15 509 | (35) |
Enhanced SA | 72 | 109 | 1255 | 27 916 | (19) |
WOTD suffix tree | 76 | 115 | 1325 | 29 467 | (33) |
ST McCreight | 232 | 350 | 4045 | 89 952 | (34,33) |
Column 6 lists references to the original theoretical proposals, along with the articles from which these practical estimates originate. For ease of comparison, the index structures are sorted by increasing memory requirements. As a reference point, the original (non-indexed) sequence is also included, stored using both 2-bit encoding and byte encoding.
^a Genome sizes were taken from the NCBI genome information pages (http://www.ncbi.nlm.nih.gov/genome) for Saccharomyces cerevisiae (yeast), Drosophila melanogaster (fruit fly) and Homo sapiens (human).
^b Mean of the interval of possible memory requirements given in (62).
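
The size columns appear to follow directly from the bits-per-character figure multiplied by the genome length. The following is a minimal sketch of that arithmetic, not the authors' own calculation. It assumes 1 MB = 10^6 bytes (the table does not state which convention was used), and the function name `index_size_mb` and the constant `HUMAN_LENGTH` are hypothetical. The genome length is back-derived from the byte-encoded human row, so it is only accurate to about 1 MB and the results will agree with the table only approximately.

```python
def index_size_mb(bits_per_char: float, genome_length: int,
                  bytes_per_mb: int = 10**6) -> float:
    """Estimate the index size in MB for a sequence of genome_length characters.

    bytes_per_mb is an assumption: the table does not say whether
    MB means 10**6 or 2**20 bytes.
    """
    total_bits = bits_per_char * genome_length
    return total_bits / 8 / bytes_per_mb


# Hypothetical genome length, back-derived from the byte-encoded human row
# (8 bits/char, 3102 MB). It is a rounded value, not an official figure.
HUMAN_LENGTH = 3102 * 10**6

# A few bits/char values taken from the table.
for name, bits in [("2-bit encoded string", 2),
                   ("FM-index", 3.36),
                   ("Suffix array", 40)]:
    print(f"{name}: {index_size_mb(bits, HUMAN_LENGTH):.1f} MB")
```

Because the input length is itself rounded, structures with a high bits/char value amplify the rounding error, which is one reason the printed values can differ from the table by a few MB.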