Table 1.
Id | Query source | # Query sequences (≥10 Kbp) | Query N50 (bp) | Reference genome |
---|---|---|---|---|
D1 | E. coli O157 genome | 2 | 5.5 M | E. coli K12 MG1655 |
D2 | Human genome assembly (ONT+Illumina) | 2269 | 7.7 M | Human (hg38) |
D3 | Human genome assembly (ONT) | 2263 | 7.4 M | Human (hg38) |
D4 | Human (hg38) genome | 365 | 145 M | Gorilla (gorGor5) |
D5 | Chimp (panTro5) genome | 3086 | 137 M | Gorilla (gorGor5) |
D6 | Ultra-long human ONT reads | 7656 | 129 K | Human (hg38) |
Note: Datasets D1–D5 are included to evaluate Mashmap2 for the genome-to-genome mapping application, and D6 for the long-read mapping application. We discarded a small fraction of contigs and reads shorter than 10 Kbp.
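
As an illustration of how the sequence-count and N50 columns relate to the length filter in the note, the following is a minimal sketch and not the script used to build the table. It assumes the standard definition of N50 (the largest length L such that sequences of length at least L cover at least half of the total bases). The function names `n50` and `summarize` and the `min_len` parameter are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the total bases is the N50.
        if running >= half:
            return length
    return 0


def summarize(lengths, min_len=10_000):
    """Drop sequences shorter than min_len, then report (count, N50)."""
    kept = [n for n in lengths if n >= min_len]
    return len(kept), n50(kept)


if __name__ == "__main__":
    # Toy lengths in bp; the 5,000 bp sequence is discarded by the filter.
    print(summarize([5_000, 10_000, 20_000, 30_000, 40_000]))  # (4, 30000)
```

In this toy input the four retained sequences total 100,000 bp, and the cumulative sum of the sorted lengths first reaches 50,000 bp at the 30,000 bp sequence, which is therefore the N50.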