Table 3.
Assessment of assembly qualities for LazyB, Canu, Wtdbg2, HASLR, Wengan and short-read only assemblies for two model organisms
| Org. | X | Tool | Compl. [%] | #ctg | #MA | MM | InDels | NA50 |
|---|---|---|---|---|---|---|---|---|
| Yeast | | LazyB | 90.466 | 127 | 9 | 192.56 | 274.62 | 118843 |
| | | LazyB+QM | 94.378 | 64 | 12 | 174.77 | 245.05 | 311094 |
| | | Canu | 14.245 | 115 | 5 | 361.47 | 2039.15 | – |
| | | Wtdbg2 | 22.237 | 177 | 0 | 849.07 | 805.31 | – |
| | | HASLR | 64.158 | 111 | 1 | 14.87 | 34.86 | 60316 |
| | | DBG2OLC | 45.645 | 53 | 20 | 2066.64 | 1655.92 | – |
| | | Wengan | 95.718 | 41 | 11 | 49.14 | 68.47 | 438928 |
| | | LazyB | 97.632 | 33 | 15 | 193.73 | 300.20 | 505126 |
| | | LazyB+QM | 94.211 | 34 | 14 | 234.59 | 329.4 | 453273 |
| | | Canu | 92.615 | 66 | 15 | 107.00 | 1343.37 | 247477 |
| | | Wtdbg2 | 94.444 | 42 | 8 | 420.96 | 1895.28 | 389196 |
| | | HASLR | 92.480 | 57 | 1 | 7.89 | 33.91 | 251119 |
| | | DBG2OLC | 97.689 | 38 | 25 | 55.06 | 1020.48 | 506907 |
| | | Wengan | 96.036 | 37 | 4 | 32.35 | 53.04 | 496058 |
| | | Abyss | 95.247 | 283 | 0 | 9.13 | 1.90 | 90927 |
| Fruit fly | | LazyB | 71.624 | 1879 | 68 | 446.19 | 492.43 | 64415 |
| | | LazyB+QM | 75.768 | 1164 | 79 | 322.49 | 349.29 | 167975 |
| | | Canu | – | – | – | – | – | – |
| | | Wtdbg2 | 6.351 | 2293 | 2 | 916.77 | 588.19 | – |
| | | HASLR | 24.484 | 1407 | 10 | 31.07 | 58.96 | – |
| | | DBG2OLC | 25.262 | 974 | 141 | 1862.85 | 969.26 | – |
| | | Wengan | 81.02 | 2129 | 192 | 105.35 | 123.33 | 77215 |
| | | LazyB | 80.111 | 596 | 99 | 433.37 | 486.28 | 454664 |
| | | LazyB+QM | 80.036 | 547 | 100 | 416.34 | 467.14 | 485509 |
| | | Canu | 49.262 | 1411 | 275 | 494.66 | 1691.11 | – |
| | | Wtdbg2 | 41.82 | 1277 | 155 | 2225.12 | 1874.01 | – |
| | | HASLR | 67.059 | 2463 | 45 | 43.83 | 84.89 | 36979 |
| | | DBG2OLC | 82.52 | 487 | 468 | 739.47 | 1536.32 | 498732 |
| | | Wengan | 84.129 | 926 | 237 | 114.96 | 154.03 | 221730 |
| | | Abyss | 83.628 | 5811 | 123 | 6.20 | 8.31 | 67970 |
| Human | | LazyB | 67.108 | 13210 | 2915 | 1177.59 | 1112.84 | 168170 |
| | | Unitig | 69.422 | 4146090 | 252 | 93.07 | 13.65 | 338 |
| | | Abyss | 84.180 | 510315 | 2669 | 98.53 | 25.03 | 7963 |
LazyB outperforms Canu and Wtdbg2 in all categories, while significantly reducing contig counts compared to short-read only assemblies. While HASLR is more accurate, it covers significantly lower fractions of the genomes at a higher contig count and a drastically lower NA50. DBG2OLC produces few contigs at a high NA50 for the higher-coverage cases, but calls significantly more mis-assemblies. Wengan performs well for yeast, but produces more mis-assemblies at a higher contig count on fruit fly. Merging LazyB assemblies with the set of short-read contigs (+QM) has a positive effect at 5× long-read coverage but negligible influence at higher coverage. Mismatches and InDels are given per 100 kb. Accordingly, errors in LazyB's unpolished output constitute less than 1% except for human. Wtdbg2 assemblies were not polished.

Column descriptions: X: coverage of sequencing data; Compl.: completeness of the assembly; #ctg: number of contigs; #MA: number of mis-assemblies (breakpoints relative to the reference assembly); MM and InDels: mismatches and InDels relative to the reference genomes; NA50: NA50 of correctly assembled contigs. We follow the definition of QUAST: given a set of fragments, i.e. the sub-regions of the original contigs that were correctly aligned to the reference, the NA50 (also named NGA50) is defined as the minimal length of a fragment needed to cover 50% of the genome. This value is omitted (–) when less than 50% of the genome is correctly recalled.
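The NA50 definition used in the caption can be illustrated with a minimal sketch. It assumes the correctly aligned fragments are already available as a list of lengths and that the reference genome length is known; the function name `na50` and its parameters are hypothetical and are not part of QUAST or LazyB. The value is computed relative to the genome length, as stated in the caption, and `None` is returned when the fragments cannot cover half of the genome.

```python
from typing import Iterable, Optional


def na50(fragment_lengths: Iterable[int], genome_length: int) -> Optional[int]:
    """Illustrative NA50 following the caption's definition (hypothetical helper).

    fragment_lengths: lengths of contig sub-regions correctly aligned to the reference.
    genome_length: length of the reference genome.
    Returns the minimal fragment length needed to cover 50% of the genome,
    or None if the fragments together cover less than 50%.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")

    target = genome_length / 2  # half of the reference genome
    covered = 0
    # Walk through fragments from longest to shortest, accumulating coverage.
    for length in sorted(fragment_lengths, reverse=True):
        covered += length
        if covered >= target:
            # This fragment is the shortest one required to reach 50%.
            return length
    # Less than 50% of the genome is covered: the value is omitted.
    return None
```

For example, with fragments of length 40, 30 and 20 on a genome of length 100, the two longest fragments are needed to reach half of the genome, so the sketch returns 30.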