Table 1.
Evaluation of the genome assemblies. Measures are median values.
Measure | WHO reference strains | Saskatchewan (NML Samples)† | New Zealand | EuroGASP 2013¥ |
---|---|---|---|---|
# samples | 11 | 27 | 398 | 1048 |
Scaffolds | 1 | 1 | 1 | 1 |
Longest Length | 2,167,463 | 2,210,644 | 2,212,822 | 2,212,219 |
Reference Length | 2,172,826 | 2,232,367 | 2,232,025 | 2,153,922 |
GC (%) | 52.64 | 52.49 | 52.53 | 52.52 |
Misassemblies | 0 | 23 | 11 | 11 |
Unaligned Length | 91,387 | 131,041 | 153,005 | 223,989 |
Genome Fraction (%) | 95.95 | 93.88 | 92.33 | 92.62 |
Duplication ratio | 1.01 | 1.01 | 1.01 | 1.01 |
N’s per 100 kb | 4195.81 | 3873.08 | 5676.20 | 5874.67 |
Indels per 100 kb | 1.78 | 30.72 | 25.14 | 30.73 |
Total aligned length | 2,085,686 | 2,080,404 | 2,050,100 | 1,988,450 |
N50α | 2,167,463 | 2,210,644 | 2,212,822 | 2,212,219 |
NA50α | 2,050,950 | 225,330 | 301,699 | 239,517 |
NG50α | 2,167,463 | 2,210,644 | 2,212,822 | 2,212,219 |
NGA50α | 2,050,950 | 223,320 | 293,413 | 240,269 |
†In one Saskatchewan sample (32657), a large number of reads were filtered out during the trimming process. This is most likely the reason for the poorer values for this sample.
¥Six samples were excluded from the analysis because they lacked SRA numbers.
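αN50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least 50% of the total assembly length. NG50 is defined in the same way as N50, except that it is based on the genome size rather than the assembly size. NA50 and NGA50 are defined in the same way as N50 and NG50, except that they are based on the aligned blocks obtained by aligning the contigs against a reference genome rather than on the contigs themselves.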
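
The N50 and NG50 definitions in the footnote above can be illustrated with a minimal Python sketch. The function names (`nx`, `n50`, `ng50`) and the contig lengths are hypothetical and chosen only for illustration; this is not the tool used to produce the table, and NA50/NGA50 would additionally require alignment blocks, which are not modelled here.

```python
def nx(lengths, total, fraction=0.5):
    """Length of the shortest contig in the smallest set of longest contigs
    whose cumulative length reaches `fraction` of `total`.
    Returns None if the contigs never reach that threshold."""
    threshold = fraction * total
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return None


def n50(lengths):
    # N50: the threshold is half of the total assembly length.
    return nx(lengths, sum(lengths))


def ng50(lengths, genome_size):
    # NG50: the threshold is half of the (reference) genome size instead.
    return nx(lengths, genome_size)


# Hypothetical contig lengths, for illustration only.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))        # threshold is 150 of 300
print(ng50(contigs, 400))  # threshold is 200 of 400
```

When an assembly consists of a single scaffold, as in every column of the table, both functions return the length of that scaffold, provided it covers at least half of the genome size. This is why the N50 and NG50 rows repeat the Longest Length row.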