2018 Sep 8;34(17):i748–i756. doi: 10.1093/bioinformatics/bty597

Table 1.

List of datasets used for evaluation

Id	Query source	# Query sequences (≥10 Kbp)	Query N50 (bp)	Reference genome
D1	E. coli O157 genome	2	5.5 M	E. coli K12 MG1655
D2	Human genome assembly (ONT+Illumina)	2269	7.7 M	Human (hg38)
D3	Human genome assembly (ONT)	2263	7.4 M	Human (hg38)
D4	Human (hg38) genome	365	145 M	Gorilla (gorGor5)
D5	Chimp (panTro5) genome	3086	137 M	Gorilla (gorGor5)
D6	Ultra-long human ONT reads	7656	129 K	Human (hg38)

Note: Datasets D1–D5 are included to evaluate Mashmap2 for the genome-to-genome mapping application, and D6 for the long-read mapping application. We discarded the small fraction of contigs and reads shorter than 10 Kbp.
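
For readers unfamiliar with the N50 column, the following is a minimal sketch of how such a value can be computed from a list of sequence lengths. It uses the standard definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total bases). The function name `n50`, the parameter `min_len`, and the choice to apply the 10 Kbp filter before computing the statistic are assumptions made for illustration; the source does not state how the reported values were calculated.

```python
def n50(lengths, min_len=10_000):
    """Return the N50 of the sequences that are at least min_len long.

    Assumption: sequences shorter than min_len are discarded before the
    statistic is computed, mirroring the 10 Kbp cutoff in the table note.
    """
    # Keep only sequences passing the length filter, longest first.
    kept = sorted((x for x in lengths if x >= min_len), reverse=True)
    if not kept:
        return 0

    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        # The first length at which the cumulative sum reaches half of
        # the total bases is the N50.
        if running >= half_total:
            return length


# Hypothetical lengths, not taken from any dataset in the table.
example_lengths = [5_000, 10_000, 20_000, 30_000, 40_000]
print(n50(example_lengths))
```

In this toy input the 5 Kbp sequence is filtered out, so the statistic is computed over the remaining four sequences only.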