2021 Apr 7;593(7857):101–107. doi: 10.1038/s41586-021-03420-7

Extended Data Fig. 3. Genes with improved alignment to the CHM13 chromosome 8 assembly relative to GRCh38.

a, Ideogram of chromosome 8 showing protein-coding genes with improved transcript alignments to the CHM13 chromosome 8 assembly relative to GRCh38 (hg38). Each gene is labelled with its name, count of improved transcripts from the CHM13 cell line, count of improved transcripts from other tissues, the average percent improvement of non-CHM13 cell line alignments, and the number of tissue sources with improved transcript mappings. b, c, Differential percentage sequence identity of transcripts aligning to CHM13 or GRCh38 for CHM13 cell line transcripts (b) and non-CHM13 cell line transcripts (c). d–f, Multiple-sequence alignments for WDYHV1 (d), MCPH1 (e) and PCMTD1 (f), all of which have at least 0.1% greater sequence identity of >20 full-length Iso-Seq transcripts to the CHM13 chromosome 8 assembly than to GRCh38 (Methods). For each gene, the GRCh38 annotation is compared to the same annotation lifted over to the CHM13 chromosome 8 assembly, and the substitutions are confirmed by translated predicted open reading frames from Iso-Seq transcripts. Matching amino acids are shaded in grey, those matching only the Iso-Seq data are in red, and those different from the Iso-Seq data are in blue. Each substitution in CHM13 relative to GRCh38 has an allele frequency of 0.36 in gnomAD (v3). g, Location of DEFA and DEFB genes in the CHM13 chromosome 8 β-defensin locus. Segmental duplication regions were identified by SEDEF (ref. 85), and new paralogues are shown in red. Duplication cassettes are marked with arrows indicating orientation for each copy.