Table 1.
Assembly statistics for CHM13 and the human reference sorted by continuity
Primary technology | Assembly | Size (Gb) | No. of contigs | NG50 (Mb) | BACs resolved (%) | BACs %idy all | BACs %idy uni |
---|---|---|---|---|---|---|---|
56× Illumina linked reads | Supernova (this paper) | 2.95 | 42,828 | 0.21 | 17.3 | 99.975 | 99.985 |
76× PacBio CLR | FALCON (ref. 50) | 2.88 | 1,916 | 28.2 | 36.37 | 99.981 | 99.995 |
24× PacBio HiFi | Canu (ref. 22) | 3.03 | 5,206 | 29.1 | 45.46 | 99.979 | 99.997 |
Sanger BACs | GRCh38p13 (ref. 2) | 3.27 | 1,590 | 56.4 | 85.63 | 99.731ᵃ | 99.768ᵃ |
39× Nanopore ultra-long | Canu (this paper) | 2.94 | 448 | 70.1 | 82.11 | 99.980 | 99.994 |
ᵃ GRCh38 is expected to have a lower identity to BACs derived from CHM13 as it represents a different human genome.
Primary technology: sequencing technology used for contig assembly. The PacBio CLR assembly was additionally polished using Illumina linked reads. The Nanopore ultra-long assembly was polished with the PacBio CLR and Illumina linked reads. GRCh38 is primarily based on Sanger-sequenced BACs, but has been continually curated and patched since the completion of the Human Genome Project.

Assembly: assembler used and reference to the published assembly.

Size: sum of bases in the assembly in Gb, including N-bases. The GRCh38 assembly size includes 110 Mb of alternative (ALT) sequences.

No. of contigs: total number of contigs in the assembly; scaffolds were split at three consecutive N-bases to obtain contigs.

NG50: contig length, in Mb, such that contigs of this length or greater contain half of the 3.09-Gb human genome size. Supernova NG50 statistics were identical between the two reported pseudo-haplotypes.

BACs resolved (%): percentage of 341 ‘challenging’ CHM13 BACs found intact in the assembly. BACs unresolved by the best CHM13 assembly either map across multiple contigs or map to a single contig with large structural variation, indicating an error in either the BAC or the whole-genome assembly.

BACs %idy all: median alignment accuracy versus all validation BACs.

BACs %idy uni: median alignment accuracy versus the 31 validation BACs that occur outside of segmental duplications (Supplementary Note 4).
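The contig-splitting rule and the NG50 definition in the legend can be illustrated with a minimal Python sketch. This is not the code used to produce the table: the function names `split_scaffold` and `ng50` are hypothetical, the toy sequences are invented for illustration, and the sketch assumes that "three consecutive N-bases" means any run of three or more Ns.

```python
import re


def split_scaffold(seq, min_n_run=3):
    """Split a scaffold into contigs at runs of at least min_n_run N bases.

    Assumption: runs shorter than min_n_run stay inside a contig.
    """
    pattern = "[Nn]{%d,}" % min_n_run
    # re.split can yield empty strings at the scaffold ends; drop them.
    return [contig for contig in re.split(pattern, seq) if contig]


def ng50(contig_lengths, genome_size):
    """Return the length L such that contigs of length >= L together
    contain at least half of genome_size, or None if the assembly
    covers less than half of the genome size."""
    half = genome_size / 2
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= half:
            return length
    return None


# Toy scaffolds (hypothetical data, not from the paper).
scaffolds = ["ACGTNNNACGTAC", "ACNGT", "ACGTACGTAC"]
contigs = [c for s in scaffolds for c in split_scaffold(s)]
lengths = [len(c) for c in contigs]

print(len(contigs))        # number of contigs after splitting
print(ng50(lengths, 30))   # NG50 against an assumed genome size of 30 bases
```

Unlike N50, which uses the total assembly size, NG50 uses a fixed genome size (3.09 Gb in the table), so the values remain comparable between assemblies of different total length.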