Table 1.
Summary | GRCh38 | T2T-CHM13 | ±% |
---|---|---|---|
Assembled bases (Gbp) | 2.92 | 3.05 | +4.5% |
Unplaced bases (Mbp) | 11.42 | 0 | −100.0% |
Gap bases (Mbp) | 120.31 | 0 | −100.0% |
# Contigs | 949 | 24 | −97.5% |
Ctg NG50 (Mbp) | 56.41 | 154.26 | +173.5% |
# Issues | 230 | 46 | −80.0% |
Issues (Mbp) | 230.43 | 8.18 | −96.5% |
Gene Annotation | | | |
# Genes | 60,090 | 63,494 | +5.7% |
protein coding | 19,890 | 19,969 | +0.4% |
# Exclusive genes | 263 | 3,604 | |
protein coding | 63 | 140 | |
# Transcripts | 228,597 | 233,615 | +2.2% |
protein coding | 84,277 | 86,245 | +2.3% |
# Exclusive transcripts | 1,708 | 6,693 | |
protein coding | 829 | 2,780 | |
Segmental duplications (SDs) | | | |
% SDs | 5.00% | 6.61% | |
SD bases (Mbp) | 151.71 | 201.93 | +33.1% |
# SDs | 24,097 | 41,528 | +72.3% |
RepeatMasker | | | |
% Repeats | 51.89% | 53.94% | |
Repeat bases (Mbp) | 1,516.37 | 1,647.81 | +8.7% |
LINE | 626.33 | 631.64 | +0.8% |
SINE | 386.48 | 390.27 | +1.0% |
LTR | 267.52 | 269.91 | +0.9% |
Satellite | 76.51 | 150.42 | +96.6% |
DNA | 108.53 | 109.35 | +0.8% |
Simple repeat | 36.5 | 77.69 | +112.9% |
Low complexity | 6.16 | 6.44 | +4.6% |
Retroposon | 4.51 | 4.65 | +3.3% |
rRNA | 0.21 | 1.71 | +730.4% |
GRCh38 summary statistics exclude “alts” (110 Mbp), patches (63 Mbp), and Chromosome Y (58 Mbp). ±%: percent change of T2T-CHM13 relative to GRCh38. Assembled bases: all non-N bases. Unplaced bases: bases not assigned to or positioned within a chromosome. # Contigs: GRCh38 scaffolds were split at three consecutive Ns to obtain contigs. Ctg NG50: the contig length such that contigs of this length or greater contain half of the 3.05 Gbp human genome size. # Exclusive genes/transcripts: for GRCh38, GENCODE genes/transcripts not found in CHM13; for CHM13, extra putative paralogs that are not in GENCODE. Segmental duplication analysis is from (42). RepeatMasker analysis is from (49).
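
The contig, NG50, and ±% definitions in the footnote can be illustrated with a minimal Python sketch. It is not the pipeline used to produce the table. The function names (`split_into_contigs`, `ng50`, `percent_change`) are hypothetical, and the sketch assumes that "three consecutive Ns" means a run of three or more Ns and that shorter N runs stay inside a contig.

```python
import re


def split_into_contigs(scaffold: str, min_gap: int = 3) -> list[str]:
    """Split a scaffold at runs of at least `min_gap` consecutive Ns.

    Assumption: runs shorter than `min_gap` are kept inside the contig.
    """
    pieces = re.split(rf"[Nn]{{{min_gap},}}", scaffold)
    # Drop empty strings produced by leading or trailing gaps.
    return [p for p in pieces if p]


def ng50(contig_lengths, genome_size):
    """Return the largest length L such that contigs of length >= L
    together contain at least half of `genome_size`.

    Returns 0 if all contigs together cover less than half the genome.
    """
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= genome_size:
            return length
    return 0


def percent_change(old, new):
    """Percent change from `old` to `new`, as in the ±% column."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    # Toy scaffold: one 3-N gap (split) and one 2-N run (kept).
    contigs = split_into_contigs("ACGTNNNACNNGT")
    print(contigs)

    # Toy contig lengths against a toy genome size of 120.
    print(ng50([50, 30, 20, 10], genome_size=120))

    # Values taken from the "# Contigs" and "Ctg NG50" rows of the table.
    print(round(percent_change(949, 24), 1))
    print(round(percent_change(56.41, 154.26), 1))
```

Unlike N50, which normalizes by the total assembly length, NG50 normalizes by a fixed genome size (3.05 Gbp in the table), so the two assemblies are compared against the same denominator.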