2021 Jun 1;16:8. doi: 10.1186/s13015-021-00186-5

Table 3.

Assessment of assembly qualities for LazyB, Canu, Wtdbg2, HASLR, DBG2OLC, Wengan and short-read only assemblies for two model organisms and human

Org.	X	Tool	Compl. [%]	#ctg	#MA	MM	InDels	NA50
Yeast	$\sim 5 \times$	LazyB	90.466	127	9	192.56	274.62	118843
		LazyB+QM	94.378	64	12	174.77	245.05	311094
		Canu	14.245	115	5	361.47	2039.15	–
		Wtdbg2	22.237	177	0	849.07	805.31	–
		HASLR	64.158	111	1	14.87	34.86	60316
		DBG2OLC	45.645	53	20	2066.64	1655.92	–
		Wengan	95.718	41	11	49.14	68.47	438928
	$\sim 11 \times$	LazyB	97.632	33	15	193.73	300.20	505126
		LazyB+QM	94.211	34	14	234.59	329.4	453273
		Canu	92.615	66	15	107.00	1343.37	247477
		Wtdbg2	94.444	42	8	420.96	1895.28	389196
		HASLR	92.480	57	1	7.89	33.91	251119
		DBG2OLC	97.689	38	25	55.06	1020.48	506907
		Wengan	96.036	37	4	32.35	53.04	496058
	$\sim 80 \times$	Abyss	95.247	283	0	9.13	1.90	90927
Fruit fly	$\sim 5 \times$	LazyB	71.624	1879	68	446.19	492.43	64415
		LazyB+QM	75.768	1164	79	322.49	349.29	167975
		Canu	–	–	–	–	–	–
		Wtdbg2	6.351	2293	2	916.77	588.19	–
		HASLR	24.484	1407	10	31.07	58.96	–
		DBG2OLC	25.262	974	141	1862.85	969.26	–
		Wengan	81.02	2129	192	105.35	123.33	77215
	$\sim 10 \times$	LazyB	80.111	596	99	433.37	486.28	454664
		LazyB+QM	80.036	547	100	416.34	467.14	485509
		Canu	49.262	1411	275	494.66	1691.11	–
		Wtdbg2	41.82	1277	155	2225.12	1874.01	–
		HASLR	67.059	2463	45	43.83	84.89	36979
		DBG2OLC	82.52	487	468	739.47	1536.32	498732
		Wengan	84.129	926	237	114.96	154.03	221730
	$\sim 45 \times$	Abyss	83.628	5811	123	6.20	8.31	67970
Human	$\sim 10 \times$	LazyB	67.108	13210	2915	1177.59	1112.84	168170
	$\sim 43 \times$	Unitig	69.422	4146090	252	93.07	13.65	338
	$\sim 43 \times$	Abyss	84.180	510315	2669	98.53	25.03	7963

LazyB outperforms Canu and Wtdbg2 in all categories, while significantly reducing contig counts compared to short-read only assemblies. While HASLR is more accurate, it covers significantly lower fractions of the genomes at a higher contig count and drastically lower NA50. DBG2OLC produces few contigs at a high NA50 for the higher-coverage cases, but calls significantly more mis-assemblies. Wengan performs well for yeast, but produces more mis-assemblies at a higher contig count on fruit fly. Merging LazyB assemblies with the set of short-read contigs (+QM) has a positive effect at 5 $\times$ long-read coverage but negligible influence at higher coverage. Mismatches and InDels are given per 100 kb. Accordingly, errors in LazyB's unpolished output constitute $< 1\%$ except for human. Wtdbg2 assemblies were not polished.

Column descriptions: X: coverage of the sequencing data. Compl.: completeness of the assembly. #ctg: number of contigs. #MA: number of mis-assemblies (breakpoints relative to the reference assembly). MM and InDels: mismatches and InDels relative to the reference genomes. NA50: NA50 of correctly assembled contigs. We follow the definition of QUAST: given a set of fragments, i.e. the sub-regions of the original contigs that were correctly aligned to the reference, the NA50 (also named NGA50) is defined as the minimal length of a fragment needed to cover 50% of the genome. This value is omitted when $< 50\%$ of the genome is correctly recalled.
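The NA50 definition and the per-100-kb error figures can be illustrated with a minimal sketch. It is not QUAST's implementation: the function names `na50` and `error_percent` are hypothetical, the fragment lengths in the usage example are made up, and treating every mismatch and every InDel as one erroneous position is a simplifying assumption.

```python
from typing import Iterable, Optional


def na50(fragment_lengths: Iterable[int], genome_length: int) -> Optional[int]:
    """Length of the shortest fragment needed to cover 50% of the genome.

    Fragments are the correctly aligned sub-regions of the contigs.
    Returns None when the fragments cover less than 50% of the genome,
    which corresponds to the omitted entries in the table.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    target = genome_length / 2
    covered = 0
    # Take the longest fragments first until half the genome is covered.
    for length in sorted(fragment_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


def error_percent(mismatches_per_100kb: float, indels_per_100kb: float) -> float:
    """Combined error rate in percent (assumes one position per event)."""
    return (mismatches_per_100kb + indels_per_100kb) / 100_000 * 100


# Hypothetical fragment lengths on a genome of length 100.
print(na50([40, 30, 20, 10], 100))   # 40 + 30 >= 50, so the NA50 is 30
print(na50([20, 10], 100))           # only 30% covered, so None

# Table values for LazyB: yeast at ~5x, and human at ~10x.
print(error_percent(192.56, 274.62))    # below 1%
print(error_percent(1177.59, 1112.84))  # above 1%
```

Under these assumptions, the yeast row at roughly 5 $\times$ coverage gives about 0.47% and the human row about 2.3%, in line with the statement that LazyB's unpolished errors stay below 1% except for human.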