[Preprint]. 2024 Mar 19:2024.03.15.585294. [Version 2] doi: 10.1101/2024.03.15.585294

Table 2:

Duplex + ultra-long curated assembly statistics for S. lycopersicum and Z. mays compared to existing reference genomes.

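| Assembly | Total BP (Mbp) | Contigs | Contig NG50 (Mbp) | LAI | Gaps | QV | Errors | T2T ctgs |
|---|---|---|---|---|---|---|---|---|
| Solanum lycopersicum Heinz 1706 | | | | | | | | |
| Reference SL5.0 | 801.78 | 73 | 41.70 | 15.80 | 60 | 60.77 | 14 | 0/12 |
| Verkko + curation | 814.61 | 20 | 68.51 | 15.89 | 2 | 51.81 | 7 | 11/12 |
| Zea mays B73 | | | | | | | | |
| Reference Zm5.0 | 2,178.29 | 1,393 | 47.04 | 29.12 | 708 | 52.18 | 93 | 0/10 |
| Verkko + curation | 2,192.15 | 26 | 209.62 | 30.35 | 9 | 60.55 | 26 | 6/10 |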

Total BP: the total length of assembly bases, in megabases. Contigs: the number of sequences in the assembly after splitting at gaps of at least 3 Ns. Contig NG50: the length of the shortest contig such that half of the genome is in contigs of this length or greater. LAI: the LTR assembly index (Ou et al. 2018) for each assembly; higher is better. Gaps: the total number of gaps (runs of at least 3 Ns) in the assembly; lower is better. QV: the Phred (Ewing and Green 1998) log-scaled quality score calculated using Merqury (Rhie et al. 2020); higher is better. Errors: an estimate of assembly errors based on VerityMap alignments and discordant k-mers (Mikheenko et al. 2020); lower is better. T2T ctgs: the count of telomere-to-telomere (T2T) contigs for each assembly; higher is better. A contig is defined as T2T if it has the canonical telomere sequence (TTTAGGG) within 50 kbp of both its start and its end and contains no gaps.
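Several of the definitions above are simple enough to sketch in code. The following Python is a minimal illustration under stated assumptions, not the pipeline used to produce the table: the function names are hypothetical, sequences are assumed to be plain strings, the genome size passed to `ng50` is whatever estimate the reader chooses, and the `min_copies` threshold and the decision to accept the motif on either strand are assumptions that the footnote does not specify. QV in the table was computed with Merqury; `phred_qv` only shows the standard Phred conversion from an error rate.

```python
import math
import re

# A gap is a run of at least 3 Ns (definition from the table footnote).
GAP = re.compile(r"N{3,}", re.IGNORECASE)


def split_contigs(scaffolds):
    """Split scaffolds at gaps; return (list of contigs, number of gaps)."""
    contigs, gaps = [], 0
    for seq in scaffolds:
        gaps += len(GAP.findall(seq))
        contigs.extend(piece for piece in GAP.split(seq) if piece)
    return contigs, gaps


def ng50(lengths, genome_size):
    """Length of the shortest contig such that contigs of this length or
    greater cover at least half of genome_size (0 if never reached)."""
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0


def phred_qv(error_rate):
    """Standard Phred scaling: QV = -10 * log10(per-base error rate)."""
    return -10 * math.log10(error_rate)


def is_t2t(contig, motif="TTTAGGG", window=50_000, min_copies=2):
    """Gap-free contig with telomere repeats near both ends.

    min_copies (tandem copies required) is an illustrative assumption,
    as is accepting the motif on either strand.
    """
    if GAP.search(contig):
        return False
    complement = str.maketrans("ACGT", "TGCA")
    revcomp = motif.translate(complement)[::-1]  # CCCTAAA for TTTAGGG
    targets = (motif * min_copies, revcomp * min_copies)
    head = contig[:window].upper()
    tail = contig[-window:].upper()
    return any(t in head for t in targets) and any(t in tail for t in targets)
```

For example, `split_contigs(["ACGTNNNACGTAC", "ACG"])` yields three contigs and one gap, and passing their lengths to `ng50` with a genome size of 12 selects the 6 bp contig. Real telomere detection would usually tolerate imperfect repeats, which this exact-match sketch does not.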