Author manuscript; available in PMC: 2024 Apr 1.
Published in final edited form as: Curr Protoc. 2023 Apr;3(4):e733. doi: 10.1002/cpz1.733

Table 2.

Common statistics used in the analysis of genome assemblies. Each statistic is classified as de novo (no reference required) or reference-based (reference genome required). All de novo statistics can be calculated using abyss-fac or QUAST, while the reference-based statistics are assessed using QUAST.

| Statistic | de novo or reference-based | Description |
|---|---|---|
| n | de novo | The number of sequences in the assembly. |
| NG50 length | de novo | At least half of the genome size is assembled in pieces of the NG50 length and larger. In other words, if you add up the lengths of the contigs that are the NG50 length and larger, it will sum to at least half of the expected genome size. |
| NGA50 length | reference-based | Analogous to the NG50 length, but uses alignment blocks instead of contig lengths for the calculation. Therefore, it summarizes both the contiguity and correctness of the assemblies. |
| Misassemblies | reference-based | Number of large-scale errors in the assembly as compared to the supplied reference. These QUAST extensive misassemblies can be classified into 3 categories: relocations, inversions and translocations. |
| Scaffold NG50/NGA50 length | de novo (NG50) and reference-based (NGA50) | The “Scaffold NG50” and “Scaffold NGA50” lengths (as described above) are computed directly on the full scaffold lengths. |
| Contig NG50/NGA50 length | de novo (NG50) and reference-based (NGA50) | Prior to calculating the NG50 or NGA50 lengths, the assembly sequences are broken at ambiguous codes (“N”s). By default, QUAST will break the sequences at regions of >= 10 “N”s when calculating these contig statistics. |
| # N’s per 100 kbp | de novo | Number of ambiguous bases (“N”s) in the assembly per 100 kbp of sequence. |
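
The de novo statistics in the table can be illustrated with a minimal Python sketch. It is not the abyss-fac or QUAST implementation. The function names (`ng50`, `break_at_n_runs`, `n_per_100kbp`), the toy sequences, and the genome size of 200 bp are assumptions made for illustration. Returning `None` when the assembly covers less than half of the genome size is also an assumption.

```python
import re


def ng50(lengths, genome_size):
    """Return the NG50 length for a list of sequence lengths.

    Sequences are visited from longest to shortest; the NG50 length is the
    length of the sequence at which the running total first reaches half of
    the expected genome size. Returns None if that point is never reached
    (an assumption of this sketch).
    """
    target = genome_size / 2
    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return None


def break_at_n_runs(seq, min_run=10):
    """Split a scaffold into contigs at runs of at least `min_run` "N"s.

    The default of 10 mirrors the threshold described in the table; shorter
    runs of "N"s are left inside the contig.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    return [piece for piece in pattern.split(seq) if piece]


def n_per_100kbp(seqs):
    """Number of ambiguous bases ("N"s) per 100 kbp of assembled sequence."""
    total_length = sum(len(s) for s in seqs)
    n_count = sum(s.upper().count("N") for s in seqs)
    return 100_000 * n_count / total_length if total_length else 0.0


# Toy assembly (hypothetical data): three scaffolds of 90, 40 and 45 bp.
scaffolds = [
    "A" * 50 + "N" * 10 + "C" * 30,  # gap of 10 Ns: broken into 50 + 30
    "G" * 40,                        # no gap
    "T" * 20 + "N" * 5 + "A" * 20,   # gap of 5 Ns: too short to break
]
genome_size = 200  # assumed expected genome size

scaffold_lengths = [len(s) for s in scaffolds]
contigs = [c for s in scaffolds for c in break_at_n_runs(s)]
contig_lengths = [len(c) for c in contigs]

print("n:", len(scaffolds))                                   # 3
print("Scaffold NG50:", ng50(scaffold_lengths, genome_size))  # 45
print("Contig NG50:", ng50(contig_lengths, genome_size))      # 40
print("N's per 100 kbp:", round(n_per_100kbp(scaffolds), 1))  # 8571.4
```

In this toy example the scaffold NG50 (45) is larger than the contig NG50 (40) because the first scaffold is split at its 10-base gap before the contig statistic is computed, while the 5-base gap in the third scaffold falls below the threshold and is left intact.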