Nat Methods. Author manuscript; available in PMC: 2021 Aug 1.
Published in final edited form as: Nat Methods. 2021 Feb 1;18(2):170–175. doi: 10.1038/s41592-020-01056-5

Table 2.

Statistics of human primary assemblies

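| Dataset | Assembler | Size (Gb) | NG50 (Mb) | NGA50 (Mb) | QV | Multi-copy genes retained (%) | Resolved BACs (%) | Gene completeness (asmgene): complete (%) | Gene completeness (asmgene): duplicated (%) |
|---|---|---|---|---|---|---|---|---|---|
| CHM13 (HiFi 32×) | hifiasm | 3.052 | 88.9 | 86.7 | 54.2 | 99.7 | 98.8 | 99.97 | 0.05 |
| CHM13 (HiFi 32×) | HiCanu | 3.037 | 69.7 | 67.9 | 54.1 | 98.9 | 97.6 | 99.97 | 0.04 |
| CHM13 (HiFi 32×) | Peregrine | 2.990 | 37.8 | 33.4 | 43.8 | 51.1 | 39.7 | 99.64 | 0.16 |
| CHM13 (HiFi 32×) | Falcon | 2.862 | 27.1 | 21.8 | 50.1 | 30.2 | 34.2 | 99.47 | 0.03 |
| CHM13 (ONT 120×) | Canu | 2.936 | 80.0 | 47.3 | 32.7 | 76.9 | 86.7 | 99.30 | 0.10 |
| CHM13 (ONT 120×) | Flye | 2.900 | 37.5 | 34.0 | 33.5 | 54.7 | 60.6 | 99.22 | 0.11 |
| CHM13 (ONT 120×) | Shasta | 2.820 | 41.3 | 33.4 | 30.4 | 26.7 | 27.9 | 98.05 | 0.01 |
| HG00733 (HiFi 33×) | hifiasm (purge) | 3.043 | 68.3 | 55.3 | 49.9 | 74.6 | 80.4 | 99.07 | 0.39 |
| HG00733 (HiFi 33×) | HiCanu (purge) | 2.921 | 40.5 | 34.2 | 50.5 | 55.2 | 65.9 | 98.47 | 0.32 |
| HG00733 (HiFi 33×) | Peregrine | 3.035 | 30.1 | 30.1 | 40.5 | 37.2 | 38.5 | 98.70 | 0.31 |
| HG00733 (HiFi 33×) | Falcon | 2.861 | 24.4 | 23.2 | 46.3 | 33.6 | 38.0 | 96.51 | 0.15 |
| HG00733 (ONT 50×) | Canu | 2.923 | 41.1 | 36.6 | 29.5 | 54.6 | 69.3 | 98.32 | 0.66 |
| HG00733 (ONT 50×) | Flye | 2.890 | 26.7 | 25.4 | 29.9 | 34.2 | 44.7 | 97.88 | 0.20 |
| HG00733 (ONT 50×) | Shasta | 2.805 | 21.2 | 20.8 | 30.0 | 17.0 | 22.9 | 97.19 | 0.05 |
| HG002 (HiFi 36×) | hifiasm (purge) | 3.067 | 98.2 | 64.1 | 51.5 | 75.8 | n/a | 99.26 | 0.32 |
| HG002 (HiFi 36×) | HiCanu (purge) | 2.953 | 48.3 | 39.4 | 52.1 | 59.7 | n/a | 98.71 | 0.18 |
| HG002 (HiFi 36×) | Peregrine | 3.081 | 33.4 | 32.5 | 41.3 | 42.5 | n/a | 99.14 | 0.36 |
| HG002 (HiFi 36×) | Falcon | 2.955 | 30.4 | 29.0 | 46.7 | 36.6 | n/a | 99.00 | 0.20 |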

Polished ONT assemblies were generated by the Shasta developers [8]. HiCanu and hifiasm were run without duplication purging for the homozygous CHM13 cell line and with purging for the heterozygous HG00733 and HG002 cell lines; the purged runs are marked "(purge)".

The NGA50 of an assembly is the length of the correctly aligned block at which the cumulative length of such blocks, sorted from longest to shortest, first reaches 50% of the reference genome size, assumed to be 3.1 Gb. NGA50 was calculated from the minigraph [26] contig-to-reference alignments. QV (quality value) is the Phred-scaled contig base error rate, estimated by comparing 31-mers in the contigs with 31-mers in short reads from the same sample. The percentage of multi-copy genes retained is reported by asmgene (Online Methods). It is the percentage of genes that are multi-copy in the reference genome (multiple mapping positions at ≥99% sequence identity) and remain multi-copy in the assembly. A BAC is considered resolved if at least 99.5% of its bases can be mapped to the assembly.

After excluding BACs not resolved by the telomere-to-telomere (T2T) assembly, 330 CHM13-specific BACs remain. There are 179 HG00733-specific BACs. No BAC data are available for HG002. Throughout the table, GRCh38 is the reference genome for HG00733 and HG002, and the T2T CHM13 assembly v0.9 is the reference for CHM13.
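NG50 and NGA50 use the same calculation. Lengths are sorted from longest to shortest, and the statistic is the length at which the running total first reaches half of the assumed 3.1 Gb genome size. NG50 applies this to contig lengths, and NGA50 applies it to the lengths of correctly aligned blocks. The Python sketch below makes two assumptions. First, it assumes the same 3.1 Gb size for NG50. Second, it assumes the aligned-block lengths have already been extracted from the contig-to-reference alignments, a step that is not shown. The function name `ng50` is hypothetical.

```python
from collections.abc import Iterable

# Assumed reference genome size, taken from the table note (3.1 Gb).
GENOME_SIZE = 3_100_000_000


def ng50(lengths: Iterable[int], genome_size: int = GENOME_SIZE) -> int:
    """Return the NG50-style statistic of `lengths` (hypothetical helper).

    Lengths are sorted from longest to shortest and summed. The length at
    which the running total first reaches half of `genome_size` is returned,
    or 0 if all lengths together cover less than half of the genome.
    Pass contig lengths for NG50 or aligned-block lengths for NGA50.
    """
    half = genome_size / 2
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return 0


if __name__ == "__main__":
    # Toy numbers, not taken from Table 2.
    contigs = [1_000_000_000, 800_000_000, 500_000_000]
    # Suppose alignment breaks the 1 Gb contig into a 600 Mb and a 400 Mb block.
    blocks = [600_000_000, 400_000_000, 800_000_000, 500_000_000]
    print("NG50 :", ng50(contigs))
    print("NGA50:", ng50(blocks))
```

Every aligned block lies within a contig, so NGA50 should not exceed NG50. This holds for every row of Table 2.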
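Phred scaling means QV = -10 log10(E) for a per-base error rate E, so QV 50 corresponds to about one error per 100 kb. The note does not say exactly how E is estimated from the 31-mer comparison. As a stand-in, the sketch below uses Merqury's k-mer formulation. An assembly k-mer that is absent from the reads is treated as containing an error. Per-base accuracy is then the k-th root of the fraction of assembly k-mers found in the reads. This may differ from the estimator behind the table. The function names are hypothetical, and k-mer counting itself is not shown.

```python
import math


def qv_from_error_rate(error_rate: float) -> float:
    """Phred-scale a per-base error rate: QV = -10 * log10(error_rate)."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


def kmer_qv(asm_kmers: int, asm_kmers_in_reads: int, k: int = 31) -> float:
    """Merqury-style QV from k-mer counts (a stand-in, not the paper's code).

    asm_kmers: total number of k-mers in the assembly.
    asm_kmers_in_reads: how many of those k-mers also occur in the reads.
    An assembly k-mer missing from the reads is assumed to carry an error,
    so per-base accuracy is (asm_kmers_in_reads / asm_kmers) ** (1 / k).
    """
    if asm_kmers <= 0 or not 0 <= asm_kmers_in_reads <= asm_kmers:
        raise ValueError("need asm_kmers > 0 and 0 <= asm_kmers_in_reads <= asm_kmers")
    accuracy = (asm_kmers_in_reads / asm_kmers) ** (1.0 / k)
    error_rate = 1.0 - accuracy
    if error_rate <= 0.0:
        return math.inf  # every assembly k-mer is read-supported: QV is unbounded
    return qv_from_error_rate(error_rate)
```

Under this model, a 31-mer spans 31 bases, so one base error removes up to 31 read-supported k-mers. Taking the 31st root converts the k-mer-level survival rate back to a per-base rate.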
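The "multi-copy genes retained" and "resolved BACs" columns reduce to simple counting once two inputs are known: per-gene copy numbers and per-BAC mapped-base fractions. The sketch below assumes both inputs were produced upstream. Gene copies are taken as mapping positions at ≥99% identity, and BAC fractions come from aligning each BAC to the assembly. Neither step is shown. The function names are hypothetical, and this is not asmgene's implementation.

```python
from collections.abc import Mapping, Sequence


def multicopy_retained_pct(ref_copies: Mapping[str, int],
                           asm_copies: Mapping[str, int]) -> float:
    """Percent of reference multi-copy genes (>= 2 copies) that still have
    >= 2 copies in the assembly. Genes absent from asm_copies count as 0."""
    ref_multi = [gene for gene, n in ref_copies.items() if n >= 2]
    if not ref_multi:
        raise ValueError("reference has no multi-copy genes")
    kept = sum(1 for gene in ref_multi if asm_copies.get(gene, 0) >= 2)
    return 100.0 * kept / len(ref_multi)


def resolved_bac_pct(mapped_fractions: Sequence[float],
                     threshold: float = 0.995) -> float:
    """Percent of BACs with at least `threshold` of their bases mapped
    (0.995 mirrors the 99.5% rule in the table note)."""
    if not mapped_fractions:
        raise ValueError("no BACs given")
    resolved = sum(1 for frac in mapped_fractions if frac >= threshold)
    return 100.0 * resolved / len(mapped_fractions)


if __name__ == "__main__":
    # Toy inputs, not taken from Table 2.
    ref = {"geneA": 2, "geneB": 3, "geneC": 1, "geneD": 2}
    asm = {"geneA": 2, "geneB": 1, "geneD": 4}
    print(multicopy_retained_pct(ref, asm))
    print(resolved_bac_pct([1.0, 0.996, 0.99, 0.5]))
```

In the toy gene input, geneB collapses from three copies to one, so only two of the three reference multi-copy genes are retained. A collapsed copy like this is exactly what this column is designed to detect.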