Table 2.
Proposed methods: WTr, WTr (CI), BF 95% and BF 99% (the four rightmost columns).
Data set | Colors (m) | gzip | bzip2 | VARI | RBF | WTr | WTr (CI) | BF 95% | BF 99% |
---|---|---|---|---|---|---|---|---|---|
Virus100 | 100 | 11.4 | 4.8 | 9.8 | 5.8 | 2.2 | 1.3 (52) | 0.36 | 0.44 |
Virus1000 | 1000 | 26.5 | 7.5 | 14.7 | 9.7 | 18.2 | 5.28 (272) | 0.49 | 0.82 |
Virus50000 | 53,412 | 135.3 | 37.7 | 56.0<sup>a</sup> | <sup>a,b</sup> | 662.1 | 64.8 (1693) | 2.58 | 7.41 |
Lactobacillus | 135 | 15.6 | 5.7 | 19.3 | 7.8 | 3.3 | 1.6 (20) | 0.95 | 1.40 |
chr22+gnomAD | 9 | 4.6 | 2.7 | 17.3<sup>a</sup> | 3.3<sup>a</sup> | N/A | 1.2 (1)<sup>c</sup> | 0.45 | 2.41 |
hg19+gnomAD | 30 | 10.9 | 5.4 | 14.5<sup>a</sup> | 5.6<sup>a</sup> | N/A | 5.4 (22)<sup>c</sup> | 0.68 | 1.82 |
Note: Each dataset is encoded with eight different compression schemes: general-purpose compression with gzip and bzip2; the existing methods specific to colored de Bruijn graphs, VARI (Muggli et al., 2017) and Rainbowfish (RBF; Almodaresi et al., 2017); the wavelet trie encoding (WTr) without and with the class indicator bits set (CI; the value in parentheses is the number of leading columns of the annotation matrix used as indicator columns); and the corrected Bloom filters at 95% (BF 95%) and 99% (BF 99%) accuracy. All compression ratios are reported as the average number of bits per edge. VARI was compiled with 1024-bit support.
<sup>a</sup> On these datasets, VARI and RBF results were generated by exporting the annotation data in compatible formats.
<sup>b</sup> Exceeded the 400 GB memory limit.
<sup>c</sup> The class indicators were the columns representing the reference chromosomes; hence, no extra columns were added.
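The note above reports every result as an average number of bits per edge. As a minimal sketch of that metric, the Python snippet below divides the compressed size in bits by the number of annotated edges, using the standard-library gzip and bzip2 compressors as stand-ins for the general-purpose baselines. The function names, the toy annotation matrix, its one-character-per-bit serialization and the 10% fill rate are all assumptions made for illustration; they are not the format or the data used to produce the table.

```python
import bz2
import gzip
import random


def bits_per_edge(compressed_bytes: int, num_edges: int) -> float:
    """Average number of bits spent per annotated edge."""
    if num_edges <= 0:
        raise ValueError("num_edges must be positive")
    return 8 * compressed_bytes / num_edges


def toy_annotation(num_edges: int, num_colors: int, seed: int = 0) -> bytes:
    """Hypothetical annotation matrix: one text row of '0'/'1' per edge.

    This serialization is an assumption for the example only.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(num_edges):
        # Each color is set with probability 0.1 (arbitrary choice).
        row = "".join("1" if rng.random() < 0.1 else "0" for _ in range(num_colors))
        rows.append(row)
    return "\n".join(rows).encode("ascii")


if __name__ == "__main__":
    n_edges, n_colors = 10_000, 100
    raw = toy_annotation(n_edges, n_colors)
    for name, compress in (("gzip", gzip.compress), ("bzip2", bz2.compress)):
        size = len(compress(raw))
        print(f"{name}: {bits_per_edge(size, n_edges):.2f} bits per edge")
```

Normalizing by the number of edges rather than by the raw input size is what lets data sets with very different numbers of colors share one table, since an uncompressed row grows with the color count m.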