Table 1. Size and correctness metrics for de novo assembly.
Metric | Value |
Number of contigs | 5598 |
Total size of contigs | 147445959 |
Longest contig | 567504 |
Shortest contig | 1506 |
Number of contigs >10 Kbp | 2805 |
Number of contigs >100 Kbp | 331 |
Mean contig size | 26339 |
Median contig size | 10079 |
N50 contig length | 69692 |
L50 contig count | 554 |
NG50 contig length | 48552 |
LG50 contig count | 833 |
Contig GC content | 42.26% |
Genome fraction | 96.86% (92.24%) |
Duplication ratio | 1.15 (1.14) |
NA50 | 60103 (63010) |
LA50 | 623 (618) |
Mismatches per 100 Kbp | 7.77 (21.9) |
Short indels (≤5 bp) per 100 Kbp | 5.10 (7.93) |
Long indels (>5 bp) per 100 Kbp | 0.46 (1.05) |
Fully unaligned contigs | 377 (179) |
Partially unaligned contigs | 1214 (70) |
N50 is the length of the contig at which 50% of the total assembly length is contained in contigs of that size or larger, and L50 is the rank of that contig when all contigs are ordered from longest to shortest. NG50 and LG50 are defined analogously, but relative to the expected genome size of 180 Mbp rather than the assembly length. QUAST [39] metrics are based on alignment of contigs to the euchromatic reference chromosome arms (which also contain most of the centric heterochromatin). NA50 and LA50 are analogous to N50 and L50, respectively, but are computed from the lengths of aligned blocks rather than whole contigs.
Values in parentheses represent metrics calculated upon inclusion of the heterochromatic reference scaffolds (XHet, 2LHet, 2RHet, 3LHet, 3RHet, YHet, and U), which contain gaps of arbitrary size and are in some cases not oriented with respect to one another [72]. Values outside of parentheses represent comparison of the assembly only to high-quality reference scaffolds X, 2L, 2R, 3L, 3R, and 4.
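The N50/L50 and NG50/LG50 definitions given above can be illustrated with a minimal Python sketch. It is not the tool used to produce Table 1; the function name `nx_lx` and its parameters are hypothetical, and the contig lengths in the usage lines are toy values rather than data from this assembly.

```python
from typing import Iterable, Optional, Tuple


def nx_lx(
    lengths: Iterable[int],
    fraction: float = 0.5,
    genome_size: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Return (N, L) for the given fraction, e.g. (N50, L50) for 0.5.

    If genome_size is given, the target is a fraction of that expected
    genome size (NG50/LG50); otherwise it is a fraction of the summed
    contig lengths (N50/L50). Returns None when the contigs never reach
    the target, which can happen for NG50 if the assembly is too small.
    """
    ordered = sorted(lengths, reverse=True)  # longest contig first
    reference = genome_size if genome_size is not None else sum(ordered)
    target = fraction * reference

    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            # 'length' is the N value, 'rank' is the L value
            return length, rank
    return None


# Toy contig lengths (hypothetical, for illustration only)
toy = [2, 2, 2, 3, 3, 4, 8, 8]
print(nx_lx(toy))                  # (8, 2): N50 = 8, L50 = 2
print(nx_lx(toy, genome_size=40))  # (4, 3): NG50 = 4, LG50 = 3
```

Passing the aligned block lengths instead of contig lengths to the same function would give NA50 and LA50 under the same assumptions.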