2016 Feb 19;17:94. doi: 10.1186/s12859-016-0932-x

Table 1.

Analysis of the influence of different threshold values on reference genome selection after taxonomy identification and compression ratios

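| Data | Original (MB) | Comp. (MB), threshold 75 | Processing time, threshold 75 | Align. %, threshold 75 | No. files, threshold 75 | Comp. (MB), threshold 10 | Processing time, threshold 10 | Align. %, threshold 10 | No. files, threshold 10 |
|---|---|---|---|---|---|---|---|---|---|
| ERR321482 | 1429 | 191 | 299 m 20 s | 26.99 | 211 | 193 | 239 m 28 s | 24.22 | 29 |
| | | | 422 m 21 s | 3.57 | 1480 | | 398 m 3 s | 6.5 | 1567 |
| | | | 12 m 24 s | | | | 8 m 13 s | | |
| SRR359032 | 3981 | 319 | 127 m 34 s | 57.72 | 26 | 320 | 93 m 60 s | 57.71 | 7 |
| | | | 245 m 53 s | 9.7 | 30 | | 206 m 18 s | 9.71 | 32 |
| | | | 8 m 37 s | | | | 7 m 27 s | | |
| ERR532393 | 8230 | 948 | 639 m 55 s | 45.78 | 267 | 963 | 522 m | 42.45 | 39 |
| | | | 1061 m 50 s | 1.98 | 1456 | | 1067 m 49 s | 7.16 | 1639 |
| | | | 73 m 59 s | | | | 28 m 13 s | | |
| SRR1450398 | 5399 | 703 | 440 m 4 s | 7.14 | 190 | 703 | 364 m 34 s | 6.82 | 26 |
| | | | 866 m 56 s | 0.6 | 793 | | 790 m 52 s | 0.91 | 818 |
| | | | 21 m 2 s | | | | 17 m 38 s | | |
| SRR062462 | 6478 | 137 | 217 m 21 s | 2.55 | 278 | 139 | 197 m 15 s | 2.13 | 50 |
| | | | 254 m 26 s | 0.13 | 570 | | 241 m 2 s | 0.51 | 656 |
| | | | 15 m 45 s | | | | 19 m 31 s | | |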

The first group of result columns (bold in the original table) corresponds to a threshold of 75 species, and the second group to a cutoff of 10 species. The results are shown for MetaCRAM-Huffman. Each data set spans three rows. Processing times are recorded row by row as real, user, and system time, in that order. “Align. %” gives the alignment rate for the first round (first row) and the second round (second row), and “No. files” gives the number of reference genome files selected in the first and second iteration, in the same row order.
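The table title mentions compression ratios, but the table itself lists only file sizes. As a minimal sketch, the ratios can be derived from the sizes above. This assumes the ratio is defined as original size divided by compressed size, a definition the table does not state, and that the first "Comp. (MB)" column belongs to the threshold of 75 species. The names `sizes_mb` and `compression_ratio` are illustrative and do not come from the source.

```python
# Sizes in MB copied from Table 1:
# (original, compressed at threshold 75, compressed at threshold 10).
# Assumption: the first "Comp. (MB)" column is the threshold-75 result.
sizes_mb = {
    "ERR321482": (1429, 191, 193),
    "SRR359032": (3981, 319, 320),
    "ERR532393": (8230, 948, 963),
    "SRR1450398": (5399, 703, 703),
    "SRR062462": (6478, 137, 139),
}


def compression_ratio(original_mb, compressed_mb):
    """Assumed definition: original size divided by compressed size."""
    return original_mb / compressed_mb


# Map each data set to its (threshold 75, threshold 10) ratios.
ratios = {
    name: (compression_ratio(orig, c75), compression_ratio(orig, c10))
    for name, (orig, c75, c10) in sizes_mb.items()
}

for name, (r75, r10) in ratios.items():
    print(f"{name}: {r75:.2f}x (threshold 75), {r10:.2f}x (threshold 10)")
```

Under this definition the two thresholds give nearly the same ratio for every data set, because the compressed sizes differ by at most 15 MB, while the real processing time is lower in the second column group for every data set.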