2014 Sep 4;9(9):e106689. doi: 10.1371/journal.pone.0106689

Table 1. Size and correctness metrics for de novo assembly.

| Metric | Value |
|---|---|
| Number of contigs | 5598 |
| Total size of contigs (bp) | 147445959 |
| Longest contig (bp) | 567504 |
| Shortest contig (bp) | 1506 |
| Number of contigs ≥10 Kbp | 2805 |
| Number of contigs ≥100 Kbp | 331 |
| Mean contig size (bp) | 26339 |
| Median contig size (bp) | 10079 |
| N50 contig length (bp) | 69692 |
| L50 contig count | 554 |
| NG50 contig length (bp) | 48552 |
| LG50 contig count | 833 |
| Contig GC content | 42.26% |
| Genome fraction | 96.86% (92.24%) |
| Duplication ratio | 1.15 (1.14) |
| NA50 (bp) | 60103 (63010) |
| LA50 | 623 (618) |
| Mismatches per 100 Kbp | 7.77 (21.9) |
| Short indels (≤5 bp) per 100 Kbp | 5.10 (7.93) |
| Long indels (>5 bp) per 100 Kbp | 0.46 (1.05) |
| Fully-unaligned contigs | 377 (179) |
| Partially unaligned contigs | 1214 (70) |

N50 is the length of the contig at which contigs of that size or larger together contain 50% of the total assembly length; L50 is the rank of that contig when all contigs are ordered from longest to shortest. NG50 and LG50 are defined the same way, but relative to the expected genome size of 180 Mbp rather than the assembly length. The reference-based QUAST [39] metrics (genome fraction through partially unaligned contigs) are computed from alignment of contigs to the euchromatic reference chromosome arms (which also contain most of the centric heterochromatin). NA50 and LA50 are analogous to N50 and L50, respectively, but consider the lengths of aligned blocks rather than whole contigs.

Values in parentheses represent metrics calculated upon inclusion of the heterochromatic reference scaffolds (XHet, 2LHet, 2RHet, 3LHet, 3RHet, YHet, and U), which contain gaps of arbitrary size and are in some cases not oriented with respect to one another [72]. Values outside of parentheses represent comparison of the assembly only to high-quality reference scaffolds X, 2L, 2R, 3L, 3R, and 4.
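
The N50/L50 and NG50/LG50 definitions in the note above can be illustrated with a minimal Python sketch. The function name `nx_lx`, its parameters, and the toy contig lengths are assumptions made for illustration; this is not the QUAST implementation and does not use data from the assembly.

```python
from typing import Iterable, Optional, Tuple


def nx_lx(lengths: Iterable[int], target_total: Optional[int] = None,
          fraction: float = 0.5) -> Optional[Tuple[int, int]]:
    """Return (Nx length, Lx rank), or None if the threshold is never reached.

    With target_total=None the threshold is a fraction of the summed input
    lengths (N50/L50). Passing an expected genome size as target_total gives
    NG50/LG50. Passing aligned-block lengths instead of contig lengths gives
    the analogue of NA50/LA50.
    """
    ordered = sorted(lengths, reverse=True)          # longest first
    total = sum(ordered) if target_total is None else target_total
    threshold = fraction * total
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, rank
    return None                                      # assembly too small for target


# Toy contig lengths (hypothetical, sum = 300).
contigs = [80, 70, 50, 40, 30, 20, 10]

print(nx_lx(contigs))                    # N50/L50   -> (70, 2)
print(nx_lx(contigs, target_total=400))  # NG50/LG50 -> (50, 3), hypothetical genome size 400
```

Because the expected genome size in this toy case exceeds the summed contig length, the NG50 is smaller than the N50 and the LG50 rank is larger than the L50 rank, the same pattern seen in the table (NG50 of 48552 versus N50 of 69692, with a 180 Mbp expected genome and a roughly 147 Mbp assembly).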