2012 May 12;40(15):6993–7015. doi: 10.1093/nar/gks408

Table 3.

Representative memory requirements for different index structure implementations, expressed both as bits per indexed character (column 2) and estimated size in megabytes for several known genomes (columns 3–5)

Index structure           Bits/char  Size for genome in MB     Reference
                                     Yeast  Fruit fly  Human
2-bit encoded string      2          3      35         775     NCBI^a
    CSA Grossi et al.     2.4        4      42         931     (59,60)
    FM-index              3.36       5      59         1302    (45,39)
    SSA (best)            4          6      70         1551    (47,57)
    CST Russo et al.^b    5          8      87         1939    (61,62)
    CSA Sadakane (best)   5.6        8      98         2171    (63,64)
    LZ-index (best)       6.64       10     116        2574    (57)
byte encoded string       8          12     139        3102    NCBI^a
    CST Navarro^b         12         18     209        4653    (62)
    SSA (worst)           20         30     349        7754    (47,57)
    CST Sadakane^b        30         45     523        11632   (44,62)
    LZ-index (worst)      35.2       53     614        13648   (65,39)
    Suffix array          40         60     697        15509   (35)
    Enhanced SA           72         109    1255       27916   (19)
    WOTD suffix tree      76         115    1325       29467   (33)
    ST McCreight          232        350    4045       89952   (34,33)

Column 6 contains references to the original theoretical proposals and an additional reference to the articles from which these practical estimates originate. For ease of comparison, the index structures are sorted by increasing memory requirements. As a reference, the original (non-indexed) sequence is also included (bold in the original table), stored using both 2-bit encoding and byte encoding.

^a Genome sizes were taken from the NCBI genome information pages (http://www.ncbi.nlm.nih.gov/genome) for Saccharomyces cerevisiae (yeast), Drosophila melanogaster (fruit fly) and Homo sapiens (human).

^b Mean of the interval of possible memory requirements given in (62).
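
The relation between the bits-per-character column and the size columns can be sketched as a simple conversion: size = bits/char × genome length / 8. The snippet below is only an illustration under stated assumptions. The genome lengths are rounded values read off the byte encoded row of the table (at 8 bits/char, the size in MB roughly equals the length in millions of characters), and 1 MB is assumed to be 10^6 bytes. The names `GENOME_LENGTHS` and `index_size_mb` are hypothetical and do not come from the article. Because the inputs are rounded, the results should match the table only approximately.

```python
# Approximate genome lengths in characters, inferred from the byte encoded
# row of the table (8 bits/char). These are assumptions, not exact NCBI
# figures.
GENOME_LENGTHS = {
    "yeast": 12_000_000,
    "fruit fly": 139_000_000,
    "human": 3_102_000_000,
}


def index_size_mb(bits_per_char: float, n_chars: int) -> float:
    """Estimated size in MB, assuming 1 MB = 10**6 bytes."""
    return bits_per_char * n_chars / 8 / 1_000_000


if __name__ == "__main__":
    # A few bits/char values taken from the table.
    examples = [
        ("2-bit encoded string", 2),
        ("Suffix array", 40),
        ("ST McCreight", 232),
    ]
    for name, bits in examples:
        sizes = {
            genome: round(index_size_mb(bits, length))
            for genome, length in GENOME_LENGTHS.items()
        }
        print(name, sizes)
```

Small deviations from the tabulated sizes are expected, since both the genome lengths used here and the bits-per-character values in the table are rounded.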