Table 1:
Characteristics of the obtained matching statistics. The column # is the number of chr19 sequences stored in T, ∣T∣ the size of T, ∣SLP∣ the size of PHONIstd’s SLP grammar, and r the number of runs in the BWT (in millions). The column LF% is the percentage of cases in which the condition in Line 5 of Algo. 2 is true. This percentage increases with the number of samples, since matches become likelier the longer the indexed text becomes. The average and maximum values of len in MS are roughly 81,558 and 3,100,685 for all instances.
# | ∣T∣ [GB] | ∣SLP∣ [MB] | r [M] | LF% |
---|---|---|---|---|
16 | 0.96 | 36.11 | 32.40 | 78.88 |
32 | 1.92 | 37.86 | 32.83 | 79.11 |
64 | 3.85 | 39.49 | 33.34 | 79.36 |
100 | 6.01 | 41.02 | 33.78 | 79.56 |
256 | 15.39 | 47.38 | 35.62 | 80.34 |
512 | 30.78 | 57.98 | 39.24 | 81.96 |
1000 | 60.11 | 80.64 | 45.93 | 84.61 |
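
The trend stated in the caption can be checked directly against the table rows. The following is a minimal sketch, not part of the original evaluation: the names `ROWS`, `is_strictly_increasing`, and `growth_factor` are illustrative assumptions, and the numbers are copied from Table 1.

```python
# Rows copied from Table 1:
# (number of sequences, |T| in GB, |SLP| in MB, r in millions, LF%)
ROWS = [
    (16, 0.96, 36.11, 32.40, 78.88),
    (32, 1.92, 37.86, 32.83, 79.11),
    (64, 3.85, 39.49, 33.34, 79.36),
    (100, 6.01, 41.02, 33.78, 79.56),
    (256, 15.39, 47.38, 35.62, 80.34),
    (512, 30.78, 57.98, 39.24, 81.96),
    (1000, 60.11, 80.64, 45.93, 84.61),
]


def is_strictly_increasing(values):
    """Return True if every value is larger than its predecessor."""
    return all(a < b for a, b in zip(values, values[1:]))


def growth_factor(values):
    """Ratio between the last and the first value of a column."""
    return values[-1] / values[0]


# Split the rows into columns (hypothetical helper names).
text_gb = [row[1] for row in ROWS]
slp_mb = [row[2] for row in ROWS]
runs_m = [row[3] for row in ROWS]
lf_pct = [row[4] for row in ROWS]

if __name__ == "__main__":
    print("LF% strictly increasing:", is_strictly_increasing(lf_pct))
    print("growth of |T|:   %.1fx" % growth_factor(text_gb))
    print("growth of |SLP|: %.2fx" % growth_factor(slp_mb))
    print("growth of r:     %.2fx" % growth_factor(runs_m))
```

On these rows, LF% rises with every added batch of sequences. From 16 to 1000 sequences the text grows by a factor of about 62.6, while ∣SLP∣ grows by about 2.23 and r by about 1.42.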