Table 3.
Dataset | Assembly | Size (Gb) | QV | NG50 (Mb) | Multi-copy genes retained (%) | Resolved BACs (%) | Switch error (%) | Hamming error (%) | FNR (%) | FDR (%) |
---|---|---|---|---|---|---|---|---|---|---|
HG00733 | hifiasm (trio) | 6.071 | 49.9 | 34.9 | 84.0 | 95.5 | 0.08 | 0.22 | 2.43 | |
HG00733 | HiCanu (trio) | 6.079 | 49.2 | 10.6 | 84.3 | 90.5 | 0.04 | 0.04 | 4.78 | |
HG00733 | Peregrine (trio) | 5.938 | 42.2 | 19.1 | 37.6 | 39.7 | 0.10 | 0.23 | 12.34 | |
HG00733 | Peregrine (Hi-C) | 5.867 | 41.6 | 26.1 | 33.2 | 35.2 | 0.12 | 0.67 | 3.31 | |
HG00733 | Peregrine (Strand-seq) | 5.805 | 45.8 | 26.6 | 33.0 | 46.9 | 0.18 | 0.72 | 3.99 | |
HG002 | hifiasm (trio) | 5.967 | 51.6 | 43.0 | 80.6 | | 0.79 | 0.34 | 0.88 | 0.26 |
HG002 | HiCanu (trio) | 6.003 | 50.4 | 12.1 | 80.4 | | 0.75 | 0.19 | 1.57 | 0.32 |
HG002 | Peregrine (trio) | 5.888 | 42.7 | 25.8 | 38.7 | | 0.70 | 0.18 | 4.42 | 4.18 |
Parental assemblies are merged for computing QV, NG50 and resolved BACs. NG50 is calculated assuming a diploid human genome size of 6.2 Gb. Phased variants are called with dipcall [28] for each pair of parental assemblies and compared to HG002 truth variants from GIAB [29] or HG00733 phased SNPs from HGSVC [30]. Phasing switch error rate: percentage of adjacent SNP pairs that are wrongly phased. Phasing Hamming error rate: percentage of SNP sites that are wrongly phased. False negative rate (FNR): percentage of true variants that are missed in the assembly. False discovery rate (FDR): percentage of assembly-based variant calls that are not called in the truth data. RTG’s vcfeval [31] is used to estimate variant FNR and FDR for HG002. For HG00733, FNR is estimated at heterozygous SNP sites only; FDR is not available because HGSVC does not provide confident regions. “Multi-copy genes retained” is the percentage of multi-copy genes in GRCh38 (multiple mapping positions at ≥99% sequence identity) that remain multi-copy in the assembly, averaged between the two parental haplotypes. Gene completeness (asmgene) can be found in Supplementary Table 2.
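
The metric definitions in the footnote can be illustrated with a minimal Python sketch. It is not the pipeline used to produce the table: the function names (`ng50`, `phasing_errors`, `fnr_fdr`) are hypothetical, phasing is represented as simple 0/1 strings over heterozygous SNPs within one phased block, and variants are matched by exact set membership, whereas tools such as vcfeval perform haplotype-aware matching. Taking the smaller of the two Hamming distances assumes haplotype labels are arbitrary, which is also an assumption of this sketch.

```python
from typing import Sequence, Set, Tuple


def ng50(contig_lengths: Sequence[int], genome_size: int) -> int:
    """Length of the contig at which the cumulative length of contigs,
    sorted longest first, reaches half of the assumed genome size.
    Returns 0 if the assembly never covers half of the genome size."""
    half = genome_size / 2
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= half:
            return length
    return 0


def phasing_errors(truth: str, called: str) -> Tuple[float, float]:
    """Switch and Hamming error rates (percent) for one phased block.

    Each string holds '0' or '1' per heterozygous SNP, giving the
    haplotype that carries the alternate allele."""
    n = len(truth)
    if n != len(called) or n < 2:
        raise ValueError("need two equal-length phasings with >= 2 SNPs")
    # agree[i] is True when SNP i is assigned to the same haplotype as in truth.
    agree = [t == c for t, c in zip(truth, called)]
    # A switch occurs when agreement flips between adjacent SNPs.
    switches = sum(agree[i] != agree[i + 1] for i in range(n - 1))
    mismatches = n - sum(agree)
    # Assumption: haplotype labels are arbitrary, so take the better orientation.
    hamming = min(mismatches, n - mismatches)
    return 100 * switches / (n - 1), 100 * hamming / n


def fnr_fdr(truth: Set, calls: Set) -> Tuple[float, float]:
    """FNR and FDR (percent) with exact-match comparison of variant keys.
    Both sets are assumed to be non-empty."""
    true_positives = len(truth & calls)
    fnr = 100 * (len(truth) - true_positives) / len(truth)
    fdr = 100 * (len(calls) - true_positives) / len(calls)
    return fnr, fdr
```

For the table above, `genome_size` would be the assumed diploid size of 6.2 Gb (6,200,000,000 bp). Unlike N50, NG50 uses this fixed size rather than the assembly's own total length, so assemblies of different total size remain comparable.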