Table 1. Performance of de novo genome assembly with MIRA and Celera.
Metric | MIRA | Celera |
---|---|---|
Contig statistics | | |
Number of contigs | 1,972 | 3,094 |
Total size of contigs (bp) | 109,184,716 | 107,097,920 |
Largest contig (bp) | 717,688 | 650,163 |
N50 contig length (bp) | 109,277 | 86,600 |
L50 count | 274 | 316 |
Contig GC content (%) | 35.51 | 35.48 |
Contig mapping statistics | | |
Genome coverage (%) | 96.48 | 97.18 |
Duplication ratio | 1.134 | 1.103 |
NA50 contig length (bp) | 82,984 | 78,179 |
LA50 count | 349 | 372 |
Relocations | 443 | 225 |
Translocations | 245 | 131 |
Inversions | 40 | 28 |
SNVs per 100 kbp | 24.6 | 19.52 |
Short indels (<9 bp) | 0.01195% | 0.00647% |
Long indels (>=9 bp) | 0.000143% | 0.000049% |
Fully unaligned contigs | 0 | 8 |
Partially unaligned contigs | 6 | 29 |
N50 is the contig length such that contigs of that length or longer together contain at least 50% of the total assembly length; L50 is the rank of that contig when all contigs are sorted from longest to shortest, i.e., the number of contigs needed to reach 50% of the assembly length. NA50 and LA50 are defined analogously to N50 and L50, respectively, but are computed from the alignment of the contigs against the reference genome rather than from the raw contigs. A relocation is a mis-assembly in which a single contig is broken into parts that map to different regions of the same chromosome, separated by at least 1 kbp; a translocation is a mis-assembly in which parts of a single contig map to different chromosomes; and an inversion is a mis-assembly in which parts of a single contig align to opposite strands of the same chromosome. The duplication ratio is the ratio of the total aligned contig length to the length of the reference genome covered by those alignments.
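The N50/L50 definition above amounts to a running sum over contig lengths sorted from longest to shortest. The following is a minimal Python sketch of that computation, not the procedure used to produce Table 1. The function name `n50_l50` and the toy lengths are hypothetical, and the sketch assumes the common convention that the threshold is met once the running sum is greater than or equal to half the total length; individual tools may differ slightly in tie handling.

```python
from typing import Sequence, Tuple


def n50_l50(contig_lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths.

    Hypothetical helper for illustration. N50 is the length of the contig at
    which the running sum of lengths, taken from longest to shortest, first
    reaches half of the total assembly length; L50 is that contig's 1-based
    rank in the sorted order.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")

    # Sort from longest to shortest, as in the definition.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2

    running = 0
    for rank, length in enumerate(lengths, start=1):
        running += length
        # Assumed convention: stop at the first contig where the
        # cumulative length reaches (>=) half of the total.
        if running >= half_total:
            return length, rank

    # The running sum always reaches the total, so this is never hit.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    print(n50_l50([80, 70, 50, 40, 30, 20, 10]))
```

For the toy lengths [80, 70, 50, 40, 30, 20, 10] (total 300), the running sum reaches 150 at the second contig, so N50 = 70 and L50 = 2.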