Author manuscript; available in PMC: 2021 Aug 1.
Published in final edited form as: Nat Methods. 2021 Feb 1;18(2):170–175. doi: 10.1038/s41592-020-01056-5

Table 3.

Statistics of haplotype-resolved human assemblies

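| Dataset | Assembly | Size (Gb) | QV | NG50 (Mb) | Multi-copy genes retained (%) | Resolved BACs (%) | Switch error (%) | Hamming error (%) | FNR (%) | FDR (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| HG00733 | hifiasm (trio) | 6.071 | 49.9 | 34.9 | 84.0 | 95.5 | 0.08 | 0.22 | 2.43 | - |
| HG00733 | HiCanu (trio) | 6.079 | 49.2 | 10.6 | 84.3 | 90.5 | 0.04 | 0.04 | 4.78 | - |
| HG00733 | Peregrine (trio) | 5.938 | 42.2 | 19.1 | 37.6 | 39.7 | 0.10 | 0.23 | 12.34 | - |
| HG00733 | Peregrine (Hi-C) | 5.867 | 41.6 | 26.1 | 33.2 | 35.2 | 0.12 | 0.67 | 3.31 | - |
| HG00733 | Peregrine (Strand-seq) | 5.805 | 45.8 | 26.6 | 33.0 | 46.9 | 0.18 | 0.72 | 3.99 | - |
| HG002 | hifiasm (trio) | 5.967 | 51.6 | 43.0 | 80.6 | - | 0.79 | 0.34 | 0.88 | 0.26 |
| HG002 | HiCanu (trio) | 6.003 | 50.4 | 12.1 | 80.4 | - | 0.75 | 0.19 | 1.57 | 0.32 |
| HG002 | Peregrine (trio) | 5.888 | 42.7 | 25.8 | 38.7 | - | 0.70 | 0.18 | 4.42 | 4.18 |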

Parental assemblies are merged for computing QV, NG50 and the percentage of resolved BACs. NG50 is calculated assuming a diploid human genome size of 6.2 Gb. Phased variants are called with dipcall [28] for each pair of parental assemblies and compared to HG002 truth variants from GIAB [29] or HG00733 phased SNPs from HGSVC [30]. Phasing switch error rate: percentage of adjacent SNP pairs that are wrongly phased. Phasing Hamming error rate: percentage of SNP sites that are wrongly phased. False negative rate (FNR): percentage of true variants that are missed in the assembly. False discovery rate (FDR): percentage of assembly-based variant calls that are not called in the truth data. RTG’s vcfeval [31] is used to estimate variant FNR and FDR for HG002. For HG00733, FNR is estimated at heterozygous SNP sites only; FDR is not available because HGSVC does not provide confident regions. “Multi-copy genes retained” is the percentage of multi-copy genes in GRCh38 (genes with multiple mapping positions at ≥99% sequence identity) that remain multi-copy in the assembly, averaged over the two parental haplotypes. Gene completeness (asmgene) is reported in Supplementary Table 2.
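
The metric definitions in this footnote can be illustrated with a minimal Python sketch. It is not the evaluation pipeline used for the table (dipcall and vcfeval were used there); the function names, the 0/1 encoding of phased heterozygous SNPs within one phase block, and the choice to score the Hamming error against the better of the two global haplotype orientations are all assumptions made for illustration.

```python
from typing import Sequence


def ng50(contig_lengths: Sequence[int], genome_size: int) -> int:
    """Largest length L such that contigs of length >= L cover at least
    half of the ASSUMED genome size (not half of the assembly size)."""
    half = genome_size / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return 0  # assembly covers less than half of the assumed genome size


def switch_error_rate(truth: Sequence[int], called: Sequence[int]) -> float:
    """Percentage of adjacent SNP pairs whose relative phase differs from
    the truth. Each site is 0 or 1 (hypothetical encoding: which haplotype
    carries the alternate allele)."""
    pairs = len(truth) - 1
    if pairs < 1:
        return 0.0
    switches = sum(
        (truth[i] == truth[i + 1]) != (called[i] == called[i + 1])
        for i in range(pairs)
    )
    return 100.0 * switches / pairs


def hamming_error_rate(truth: Sequence[int], called: Sequence[int]) -> float:
    """Percentage of SNP sites that are wrongly phased. Assumption: the
    haplotype labels may be globally swapped, so take the better orientation."""
    n = len(truth)
    if n == 0:
        return 0.0
    mismatches = sum(t != c for t, c in zip(truth, called))
    return 100.0 * min(mismatches, n - mismatches) / n


def fnr(true_positives: int, false_negatives: int) -> float:
    """Percentage of true variants that are missed in the assembly."""
    return 100.0 * false_negatives / (true_positives + false_negatives)


def fdr(true_positives: int, false_positives: int) -> float:
    """Percentage of assembly-based calls that are absent from the truth data."""
    return 100.0 * false_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Toy values only, not data from the table.
    print(ng50([40, 30, 20, 10], genome_size=120))
    truth = [0, 0, 0, 0, 0, 0]
    called = [0, 0, 0, 1, 1, 1]  # a single switch in the middle of the block
    print(switch_error_rate(truth, called), hamming_error_rate(truth, called))
```

The toy phasing example shows why the two phasing metrics are reported together: a single switch in the middle of a block counts as one wrong adjacent pair, yet it leaves half of the sites on the wrong haplotype, so the Hamming error captures long-range phasing damage that the switch error alone does not.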