BMC Genomics. 2019 Mar 4;20:165. doi: 10.1186/s12864-019-5542-3

Table 1.

Evaluation of the genome assemblies. Values are medians across the samples in each dataset; lengths are in base pairs (bp).

Metric                WHO reference strains  Saskatchewan (NML samples)†  New Zealand  EuroGASP 2013¥
No. of samples        11                     27                           398          1,048
Scaffolds             1                      1                            1            1
Longest length        2,167,463              2,210,644                    2,212,822    2,212,219
Reference length      2,172,826              2,232,367                    2,232,025    2,153,922
GC (%)                52.64                  52.49                        52.53        52.52
Misassemblies         0                      23                           11           11
Unaligned length      91,387                 131,041                      153,005      223,989
Genome fraction (%)   95.95                  93.88                        92.33        92.62
Duplication ratio     1.01                   1.01                         1.01         1.01
N’s per 100 kb        4195.81                3873.08                      5676.20      5874.67
Indels per 100 kb     1.78                   30.72                        25.14        30.73
Total aligned length  2,085,686              2,080,404                    2,050,100    1,988,450
N50α                  2,167,463              2,210,644                    2,212,822    2,212,219
NA50α                 2,050,950              225,330                      301,699      239,517
NG50α                 2,167,463              2,210,644                    2,212,822    2,212,219
NGA50α                2,050,950              223,320                      293,413      240,269

†In one Saskatchewan sample (32657), a large number of reads were filtered out during trimming, which is most likely the reason for its poorer values.

¥Six samples were excluded from the analysis because they lacked SRA accession numbers.

αN50 is the length of the shortest contig such that contigs of that length or longer together cover at least 50% of the total assembly length. NG50 is defined in the same way as N50, but the 50% threshold is based on the genome size rather than the assembly size. NA50 and NGA50 are the analogues of N50 and NG50 computed from the contigs' alignment against a reference genome, using the lengths of aligned blocks instead of whole contigs.
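The N50-family definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the code of the evaluation tool used for this table; the function names (`nx`, `n50`, `ng50`, `na50`, `nga50`) are hypothetical. For NA50 and NGA50 it assumes the caller already has the aligned-block lengths (contigs broken at misassembly points after alignment to the reference, as assembly evaluators such as QUAST do); here they are simply supplied as a list.

```python
from typing import Optional, Sequence


def nx(lengths: Sequence[int], total: int, fraction: float = 0.5) -> Optional[int]:
    """Length L of the shortest piece such that all pieces of length >= L
    together cover at least `fraction` of `total` bases (None if never reached)."""
    threshold = fraction * total
    covered = 0
    # Walk from the longest piece down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= threshold:
            return length
    return None


def n50(contigs: Sequence[int]) -> Optional[int]:
    # Threshold: half of the total assembly length.
    return nx(contigs, sum(contigs))


def ng50(contigs: Sequence[int], genome_size: int) -> Optional[int]:
    # Threshold: half of the (reference) genome size instead.
    return nx(contigs, genome_size)


def na50(aligned_blocks: Sequence[int], assembly_length: int) -> Optional[int]:
    # Assumption: aligned_blocks were computed elsewhere by aligning contigs
    # to the reference and splitting them at misassemblies.
    return nx(aligned_blocks, assembly_length)


def nga50(aligned_blocks: Sequence[int], genome_size: int) -> Optional[int]:
    return nx(aligned_blocks, genome_size)


contigs = [100, 80, 50, 30, 20]      # toy assembly, total length 280 bp
blocks = [80, 60, 50, 40, 30, 20]    # hypothetical: the 100-bp contig split into 60 + 40
print(n50(contigs), ng50(contigs, 400))                 # 80 50
print(na50(blocks, sum(contigs)), nga50(blocks, 400))   # 60 40
```

With a single complete scaffold, `n50([length])` returns that scaffold's length, and so does `ng50` whenever the scaffold covers at least half the genome. This is consistent with the N50, NG50 and Longest length rows coinciding in the table, while NA50 and NGA50 drop once misassemblies split the alignment into shorter blocks.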