2020 Sep 14;21:245. doi: 10.1186/s13059-020-02134-9

Table 3.

Merqury runtime, memory, and disk requirements for QV estimation in a human genome

| | Merqury: Count | Merqury: Union-sum | Merqury: QV | Mash: QV |
|---|---|---|---|---|
| CPUs × nodes | 32 × 24 | 48 × 1 | 24 × 1 | 24 × 1 |
| Wall clock time | 6 m 52 s/node | 7 m 43 s | 14 m 13 s | 3 h 36 m 17 s |
| CPU time | 9.1 h | 4.7 h | 1.1 h | 19.0 h |
| Memory | 21.2 G | 7.0 G | 10.56 G | 2.6 G |
| Storage | 90 G (fastq.gz) | N/A | 48 G | 90 G (fastq.gz) |
| Intermediates | 1.8 G × 24 | 48 G | 25.5 G | 23.1 M |

All statistics are for the diploid (maternal, paternal, and combined) assembly of the human genome NA12878. Merqury QV estimates are generated from the full k-mer databases and use exact k-mer counting, whereas Mash QV estimates are generated by streaming all reads against a MinHash sketch of the assembly using Mash Screen. Merqury’s Count and Union-sum steps count all k-mers in the reads, while the QV estimation step counts k-mers in the assembly and compares these to the read counts. Mash’s QV estimation creates a k-mer sketch of the assembly and streams all reads against that sketch. Results are totaled over three QV operations (maternal, paternal, and combined). Runtimes were measured on an Intel(R) Xeon(R) Gold 6140 CPU at 2.30 GHz. Storage requirements represent gzipped FASTQ files for counting and QV (Mash), and a binary database for QV (Meryl).

h hours, m minutes, s seconds, G gigabytes, M megabytes
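
The footnote describes Merqury's QV step as counting k-mers in the assembly and comparing them to the read counts. A minimal Python sketch of that comparison follows. It is an illustration only, not Merqury's or Meryl's implementation: the function names (`count_kmers`, `estimate_qv`) are hypothetical, the error model (per-base correctness P = (1 - K_asm_only / K_total)^(1/k), QV = -10 log10(1 - P)) is assumed from the Merqury publication rather than from this table, and the choices to count total k-mer occurrences and to skip canonical (strand-collapsed) k-mers are simplifying assumptions.

```python
import math
from collections import Counter


def count_kmers(sequences, k):
    """Exactly count every k-mer occurrence across an iterable of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def estimate_qv(assembly, reads, k):
    """Toy consensus QV from k-mers present in the assembly but absent from the reads.

    Assumed model (not taken from the table above):
      P(base correct) = (1 - K_asm_only / K_total) ** (1 / k)
      QV = -10 * log10(1 - P)
    Canonical k-mers (strand collapsing) are deliberately ignored here.
    """
    read_counts = count_kmers(reads, k)
    asm_counts = count_kmers([assembly], k)

    k_total = sum(asm_counts.values())
    if k_total == 0:
        raise ValueError("assembly is shorter than k")

    # Assembly k-mers never observed in the reads are treated as errors.
    k_asm_only = sum(c for kmer, c in asm_counts.items() if kmer not in read_counts)
    if k_asm_only == 0:
        return math.inf  # no unsupported k-mers: error rate estimated as zero

    p_correct = (1 - k_asm_only / k_total) ** (1 / k)
    return -10 * math.log10(1 - p_correct)


if __name__ == "__main__":
    reads = ["ACGTTGCAAC"]
    print(estimate_qv("ACGTTGCAAC", reads, k=3))  # assembly fully supported by reads
    print(estimate_qv("ACGTTGCTAC", reads, k=3))  # one substituted base
```

A real run keeps the read k-mers in an on-disk database rather than in memory, which is what the Storage and Intermediates rows of the table account for.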