Table 3.
Resource | Merqury Count | Merqury Union-sum | Merqury QV | Mash QV |
---|---|---|---|---|
CPUs × nodes | 32 × 24 | 48 × 1 | 24 × 1 | 24 × 1 |
Wall clock time | 6 m 52 s/node | 7 m 43 s | 14 m 13 s | 3 h 36 m 17 s |
CPU time | 9.1 h | 4.7 h | 1.1 h | 19.0 h |
Memory | 21.2 G | 7.0 G | 10.56 G | 2.6 G |
Storage | 90 G (fastq.gz) | N/A | 48 G | 90 G (fastq.gz) |
Intermediates | 1.8 G × 24 | 48 G | 25.5 G | 23.1 M |
All statistics are for the diploid (maternal, paternal, and combined) assembly of the human genome NA12878. Merqury's Count and Union-sum steps count all k-mers in the reads. Its QV step then counts the k-mers in the assembly and compares them with the read counts, using the full k-mer databases and exact k-mer counting. Mash instead builds a MinHash sketch of the assembly and streams all reads against that sketch with Mash Screen to estimate QV. Results are totaled over three QV operations (maternal, paternal, and combined). Runtimes were measured on an Intel Xeon Gold 6140 CPU @ 2.30 GHz. Storage requirements are the gzipped FASTQ files for counting (Merqury) and QV (Mash), and the binary Meryl database for QV (Merqury).
h hours, m minutes, s seconds, G gigabytes, M megabytes
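The difference between exact counting and sketch-based streaming described above can be illustrated with a toy Python sketch. It assumes a simplified model: forward-strand k-mer sets instead of canonical k-mers with multiplicities, and the conversion E = 1 - f^(1/k), QV = -10·log10(E), where f is the fraction of assembly k-mers found in the reads. The function names (`exact_qv`, `sketch_qv`, `kmer_hash`) and the blake2b-based hash are hypothetical stand-ins, not code from Merqury, Meryl, or Mash.

```python
import hashlib
import math
import random


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq (forward strand only; a simplification)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def qv_from_fraction(f: float, k: int) -> float:
    """Phred QV from the fraction f of assembly k-mers supported by the reads.

    A k-mer is error-free only if all k bases are correct, so the per-base
    error rate is approximated as E = 1 - f**(1/k); QV = -10 * log10(E).
    """
    if f >= 1.0:
        return math.inf  # no unsupported k-mers, so no measurable error rate
    return -10.0 * math.log10(1.0 - f ** (1.0 / k))


def exact_qv(assembly: str, reads: list[str], k: int) -> float:
    """Exact counting: build the full read k-mer set, then compare."""
    read_kmers: set[str] = set()
    for r in reads:
        read_kmers |= kmers(r, k)
    asm = kmers(assembly, k)
    return qv_from_fraction(len(asm & read_kmers) / len(asm), k)


def kmer_hash(kmer: str) -> int:
    """Hypothetical 64-bit hash (an assumption; Mash uses its own hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch_qv(assembly: str, reads: list[str], k: int, sketch_size: int) -> float:
    """Sketch-based estimate: keep only the sketch_size smallest assembly
    k-mer hashes (a bottom-k MinHash sketch), then stream the reads and
    record which sketch hashes they contain."""
    sketch = set(sorted(kmer_hash(km) for km in kmers(assembly, k))[:sketch_size])
    seen: set[int] = set()
    for r in reads:  # one read at a time; retained state is bounded by the sketch
        for km in kmers(r, k):
            hv = kmer_hash(km)
            if hv in sketch:
                seen.add(hv)
    # Containment of the sketch in the reads stands in for f.
    return qv_from_fraction(len(seen) / len(sketch), k)


if __name__ == "__main__":
    rng = random.Random(0)
    truth = "".join(rng.choice("ACGT") for _ in range(5000))
    # Toy assembly with one substitution every 1,000 bases (roughly QV 30).
    asm = list(truth)
    for pos in range(500, len(truth), 1000):
        asm[pos] = "A" if truth[pos] != "A" else "C"
    assembly = "".join(asm)
    # Error-free 150 bp reads tiling the true sequence with a 50 bp step.
    reads = [truth[i:i + 150] for i in range(0, len(truth) - 150 + 1, 50)]
    print("exact QV: ", round(exact_qv(assembly, reads, 21), 2))
    print("sketch QV:", round(sketch_qv(assembly, reads, 21, 1000), 2))
```

In this toy model, `exact_qv` must hold every read k-mer at once, which mirrors the large databases and intermediates in the Merqury columns, while `sketch_qv` retains at most `sketch_size` hashes and trades exactness for a small, fixed memory footprint. When `sketch_size` is at least the number of assembly k-mers, the two estimates coincide.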