Author manuscript; available in PMC: 2023 Mar 1.
Published in final edited form as: Nat Biotechnol. 2022 Mar 24:10.1038/s41587-022-01261-x. doi: 10.1038/s41587-022-01261-x

Table 1.

Statistics of different assemblies

| Dataset | Assembler | Size (Gb) | N50 (Mb) | Hamming error (%) | Multicopy genes missed (%) | Gene completeness: complete (%) | Gene completeness: duplicated (%) |
|---|---|---|---|---|---|---|---|
| HG002 (HiFi + trio/Hi-C) | hifiasm (Hi-C) | 3.075/2.909 | 50.0/55.1 | 1.42/0.82 | 19.82/19.98 | 99.28/99.08 | 0.32/0.32 |
| HG002 (HiFi + trio/Hi-C) | FALCON-Phase (Hi-C) | 3.027/3.027 | 32.1/32.1 | 18.66/19.15 | 40.13/39.25 | 99.29/99.26 | 3.14/3.13 |
| HG002 (HiFi + trio/Hi-C) | hifiasm (trio) | 2.936/3.033 | 57.9/57.8 | 0.75/0.74 | 21.18/16.72 | 99.17/99.24 | 0.29/0.33 |
| HG002 (HiFi only) | hifiasm (dual) | 3.033/3.015 | 57.8/44.7 | 28.25/21.59 | 18.47/20.30 | 99.11/99.04 | 0.35/0.31 |
| HG002 (HiFi only) | hifiasm (primary/alt) | 3.112/2.910 | 89.9/0.4 | 22.30/1.99 | 13.14/32.25 | 99.44/88.10 | 0.34/2.67 |
| HG002 (HiFi only) | HiCanu (primary/alt) | 2.960/3.143 | 48.4/0.3 | 27.76/0.68 | 34.95/20.62 | 98.88/85.63 | 0.19/5.15 |
| HG00733 (HiFi + trio/Hi-C/Strand-seq) | hifiasm (Hi-C) | 3.024/3.062 | 44.5/40.6 | 1.79/1.48 | 14.97/18.31 | 99.44/99.51 | 0.31/0.35 |
| HG00733 (HiFi + trio/Hi-C/Strand-seq) | DipAsm (Hi-C) | 2.934/2.933 | 26.3/28.2 | 2.81/2.57 | 66.08/67.44 | 99.03/99.04 | 0.39/0.40 |
| HG00733 (HiFi + trio/Hi-C/Strand-seq) | PGAS (Strand-seq) | 2.905/2.900 | 30.1/25.9 | 3.25/2.60 | 66.48/68.31 | 99.15/99.18 | 0.16/0.15 |
| HG00733 (HiFi + trio/Hi-C/Strand-seq) | hifiasm (trio) | 3.047/3.026 | 52.3/45.6 | 0.78/0.99 | 14.57/18.87 | 99.50/99.28 | 0.42/0.32 |
| HG00733 (HiFi only) | hifiasm (dual) | 3.027/3.049 | 48.3/36.4 | 38.08/36.40 | 19.82/17.52 | 99.36/99.18 | 0.34/0.42 |
| HG00733 (HiFi only) | hifiasm (primary/alt) | 3.077/3.018 | 68.3/0.3 | 39.63/2.23 | 12.18/28.98 | 99.58/84.95 | 0.51/2.89 |
| HG00733 (HiFi only) | HiCanu (primary/alt) | 2.918/3.312 | 44.5/0.2 | 38.79/1.00 | 42.75/14.81 | 98.89/82.78 | 0.14/6.29 |
| European badger (HiFi + trio/Hi-C) | hifiasm (Hi-C) | 2.731/2.536 | 84.5/73.6 | 1.51/2.09 | n/a | 96.77/94.33 | 1.68/1.63 |
| European badger (HiFi + trio/Hi-C) | hifiasm (trio) | 2.633/2.560 | 91.5/57.2 | 0.65/3.28 | n/a | 94.44/95.11 | 1.70/1.68 |
| European badger (HiFi only) | hifiasm (dual) | 2.628/2.643 | 80.6/70.9 | 16.56/16.13 | n/a | 95.32/96.14 | 1.65/1.65 |
| European badger (HiFi only) | hifiasm (primary/alt) | 2.724/1.711 | 85.0/0.2 | 12.88/1.83 | n/a | 96.82/51.59 | 1.67/1.35 |
| European badger (HiFi only) | HiCanu (primary/alt) | 2.690/1.371 | 67.1/0.1 | 11.36/1.12 | n/a | 96.75/38.30 | 1.96/2.57 |
| Sterlet (HiFi + trio/Hi-C) | hifiasm (Hi-C) | 1.869/1.879 | 10.4/9.3 | 3.48/2.52 | n/a | 93.05/93.16 | 57.83/58.35 |
| Sterlet (HiFi + trio/Hi-C) | hifiasm (trio) | 1.865/1.853 | 11.3/11.4 | 0.75/0.44 | n/a | 93.30/93.27 | 59.15/57.91 |
| Sterlet (HiFi only) | hifiasm (dual) | 1.873/1.869 | 10.6/9.2 | 11.32/11.34 | n/a | 93.41/92.80 | 56.92/58.79 |
| Sterlet (HiFi only) | hifiasm (primary/alt) | 1.927/1.885 | 27.7/1.5 | 24.94/0.87 | n/a | 93.43/92.64 | 59.01/55.66 |
| Sterlet (HiFi only) | HiCanu (primary/alt) | 1.724/2.114 | 7.3/2.2 | 12.31/1.99 | n/a | 91.48/90.25 | 42.47/59.97 |
| South Island takahe (HiFi + trio/Hi-C) | hifiasm (Hi-C) | 1.315/1.154 | 12.5/13.2 | 0.70/0.64 | n/a | 97.01/90.27 | 0.54/0.46 |
| South Island takahe (HiFi + trio/Hi-C) | hifiasm (trio) | 1.237/1.236 | 12.9/12.6 | 1.87/0.19 | n/a | 91.22/92.29 | 0.64/0.61 |
| South Island takahe (HiFi only) | hifiasm (dual) | 1.237/1.257 | 13.8/10.7 | 6.03/5.06 | n/a | 92.56/94.45 | 0.49/0.52 |
| South Island takahe (HiFi only) | hifiasm (primary/alt) | 1.320/0.644 | 16.3/0.3 | 5.12/1.01 | n/a | 97.11/45.33 | 0.59/0.73 |
| Black Rhinoceros (HiFi + trio/Hi-C) | hifiasm (Hi-C) | 2.992/3.056 | 31.6/28.9 | 1.16/1.44 | n/a | 96.49/96.82 | 0.82/0.78 |
| Black Rhinoceros (HiFi + trio/Hi-C) | hifiasm (trio) | 3.014/3.050 | 30.1/31.3 | 0.93/0.33 | n/a | 96.13/96.81 | 0.89/0.90 |
| Black Rhinoceros (HiFi only) | hifiasm (dual) | 2.929/3.047 | 26.8/27.3 | 35.05/34.13 | n/a | 94.49/95.99 | 0.80/0.87 |
| Black Rhinoceros (HiFi only) | hifiasm (primary/alt) | 3.055/2.846 | 38.9/0.7 | 36.44/3.42 | n/a | 96.79/84.76 | 0.80/1.01 |
| Black Rhinoceros (HiFi only) | HiCanu (primary/alt) | 3.058/2.560 | 22.2/0.3 | 31.55/0.61 | n/a | 96.79/70.11 | 1.53/1.38 |

All assemblies of the same sample use the same HiFi and Hi-C reads, except PGAS, which relies on Strand-seq data for phasing. Each assembly consists of two sets of contigs. The two sets may represent paternal/maternal contigs with trio binning, haplotype 1/haplotype 2 with haplotype-resolved assembly or hifiasm dual assembly, or primary/alternate contigs. The two numbers in each cell give the metrics for the two sets of contigs, respectively. The FALCON-Phase HG002 assembly and the DipAsm and PGAS HG00733 assemblies were acquired from their associated publications. For South Island takahe, HiCanu could not produce an assembly within 3 weeks, so it is excluded. The N50 of an assembly is defined as the length of the shortest contig among the longest contigs that together cover 50% of the total assembly size. The completeness scores of all human assemblies were calculated by the asmgene method (ref. 14) with GRCh38 as the reference genome. The completeness of non-human assemblies was evaluated by BUSCO (ref. 15). All samples have parental short reads, which were used to calculate the phasing switch error rates (Supplementary Table 1) and phasing hamming error rates with yak (ref. 5). The hamming error rate equals $\sum_i \min\{p_i, m_i\} / \sum_i (p_i + m_i)$, where $p_i$ and $m_i$ are the numbers of paternal- and maternal-specific 31-mers on contig $i$, respectively. 'Multicopy genes missed' is the percentage of multi-copy genes in GRCh38 (multiple mapping positions at ≥99% sequence identity) that are not multi-copy in the assembly. This metric is only reported for human samples, as other species lack high-quality reference genomes and good gene annotations.
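
The N50 and hamming error rate definitions given in the table note can be sketched in Python as follows. This is an illustrative sketch only, not the code used by yak or by the authors: the function names `n50` and `hamming_error_rate` are hypothetical, and the per-contig counts of paternal- and maternal-specific 31-mers are assumed to have been computed beforehand.

```python
from typing import Iterable, Sequence, Tuple


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the length of the shortest contig among the longest contigs
    that together cover at least 50% of the total assembly size."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly size
            return length
    raise ValueError("contig_lengths must not be empty")


def hamming_error_rate(kmer_counts: Iterable[Tuple[int, int]]) -> float:
    """Return sum_i min(p_i, m_i) / sum_i (p_i + m_i), as a fraction.

    kmer_counts holds one (p_i, m_i) pair per contig: the assumed,
    precomputed numbers of paternal- and maternal-specific 31-mers.
    Multiply the result by 100 to express it as a percentage.
    """
    minority = 0  # k-mers from the less frequent parent on each contig
    total = 0     # all parent-specific k-mers
    for p, m in kmer_counts:
        minority += min(p, m)
        total += p + m
    if total == 0:
        raise ValueError("no parent-specific k-mers found")
    return minority / total
```

With toy values chosen purely for illustration, `n50([10, 8, 5, 4, 3])` walks 10 and then 8 before passing half of the total length of 30, so it returns 8. Likewise, `hamming_error_rate([(90, 10), (5, 95)])` gives (10 + 5) / 200 = 0.075, that is, 7.5%.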