2018 Sep 8;34(17):i748–i756. doi: 10.1093/bioinformatics/bty597

Table 1.

List of datasets used for evaluation

Id  Query source (≥10 Kbp)                # Sequences  N50      Reference genome
D1  E. coli O157 genome                             2  5.5 Mbp  E. coli K12 MG1655
D2  Human genome assembly (ONT+Illumina)         2269  7.7 Mbp  Human (hg38)
D3  Human genome assembly (ONT)                  2263  7.4 Mbp  Human (hg38)
D4  Human (hg38) genome                           365  145 Mbp  Gorilla (gorGor5)
D5  Chimp (panTro5) genome                       3086  137 Mbp  Gorilla (gorGor5)
D6  Ultra-long human ONT reads                   7656  129 Kbp  Human (hg38)

Note: Datasets D1–D5 evaluate Mashmap2 on genome-to-genome mapping, and D6 evaluates it on long-read mapping. A small fraction of contigs and reads shorter than 10 Kbp were discarded.
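The N50 column and the 10 Kbp length cutoff can be made concrete with a short sketch. It assumes only a list of sequence lengths in base pairs; the helper names `filter_short` and `n50` are hypothetical and not part of Mashmap2, and this is a generic N50 computation rather than a statement of exactly how the table's N50 values were produced (for example, whether they were computed before or after filtering).

```python
from typing import Iterable, List

# Hypothetical constant: the 10 Kbp cutoff mentioned in the table note.
MIN_LENGTH = 10_000


def filter_short(lengths: Iterable[int], min_length: int = MIN_LENGTH) -> List[int]:
    """Hypothetical helper: keep only sequences of length >= min_length,
    mirroring the removal of contigs and reads shorter than 10 Kbp."""
    return [n for n in lengths if n >= min_length]


def n50(lengths: Iterable[int]) -> int:
    """Hypothetical helper: return the N50, i.e. the largest length L such that
    sequences of length >= L together cover at least half of the total length."""
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("N50 is undefined for an empty set of sequences")
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves on odd totals.
        if 2 * running >= total:
            return length
    return sorted_lengths[-1]  # not reached for non-empty input


if __name__ == "__main__":
    # Made-up lengths for illustration only; not taken from any dataset in Table 1.
    kept = filter_short([2_000, 15_000, 30_000, 50_000, 80_000])
    print(kept)
    print(n50(kept))
```

With these illustrative lengths, the 2,000 bp sequence is dropped, the remaining 175,000 bp sum to a half-length of 87,500 bp, and the N50 is 50,000 bp, because 80,000 + 50,000 is the first cumulative sum (in descending order) to reach that half.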