Author manuscript; available in PMC: 2021 Nov 11.

Published in final edited form as: Proc Data Compress Conf. 2021 May 10;2021:193–202. doi: 10.1109/dcc50243.2021.00027

Table 1:

Characteristics of the obtained matching statistics. The column # is the number of chr19 sequences stored in T, ∣T∣ is the size of T, ∣SLP∣ is the size of PHONI_std’s SLP grammar, and r is the number of runs in the BWT (in millions). The column LF% is the percentage of cases in which the condition in Line 5 of Algo. 2 is true. This percentage increases with the number of samples because matches become more likely as the indexed text grows. For all instances, the average and maximum values of len in MS are roughly 81,558 and 3,100,685, respectively.

#	∣T∣ [GB]	∣SLP∣ [MB]	r [M]	LF%
16	0.96	36.11	32.40	78.88
32	1.92	37.86	32.83	79.11
64	3.85	39.49	33.34	79.36
100	6.01	41.02	33.78	79.56
256	15.39	47.38	35.62	80.34
512	30.78	57.98	39.24	81.96
1000	60.11	80.64	45.93	84.61
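
The scaling behaviour in Table 1 can be made explicit with a little arithmetic. The sketch below is not part of the paper's code. It only copies the rows of the table and compares the first row (16 sequences) with the last (1000 sequences); the names `ROWS` and `growth_factors` are hypothetical and chosen for illustration.

```python
# Rows copied from Table 1:
# (number of sequences, |T| in GB, |SLP| in MB, r in millions, LF%)
ROWS = [
    (16, 0.96, 36.11, 32.40, 78.88),
    (32, 1.92, 37.86, 32.83, 79.11),
    (64, 3.85, 39.49, 33.34, 79.36),
    (100, 6.01, 41.02, 33.78, 79.56),
    (256, 15.39, 47.38, 35.62, 80.34),
    (512, 30.78, 57.98, 39.24, 81.96),
    (1000, 60.11, 80.64, 45.93, 84.61),
]


def growth_factors(rows):
    """Compare the last row with the first row of the table.

    Returns multiplicative growth for |T|, |SLP|, and r, and the
    absolute change of LF% in percentage points.
    """
    first, last = rows[0], rows[-1]
    return {
        "text": last[1] / first[1],
        "slp": last[2] / first[2],
        "runs": last[3] / first[3],
        "lf_points": last[4] - first[4],
    }


if __name__ == "__main__":
    for name, value in growth_factors(ROWS).items():
        print(f"{name}: {value:.2f}")
```

From 16 to 1000 sequences the text grows roughly 63-fold, while the SLP grammar a little more than doubles and r grows by less than half, which is the kind of sublinear growth one would expect for a highly repetitive collection.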