[Preprint]. 2023 Apr 19:arXiv:2304.09729v1. [Version 1]

Table 2.

Human chromosome-specific high-order alpha repeats (αHORs)

| chr | HORmon^a (centromere) | HiCAT^b (centromere) | SRF (k=171)^c, chromosome | SRF/171^d, assembly | SRF/171^d, HiFi reads | SRF/101^d, Illumina | SRF (k=171)^e, HPRC assembly |
|---|---|---|---|---|---|---|---|
| 1 | 6 | 2 | 6 (4.2); 11 (0.5) | 6 (2.0) | 6 (3.5) | 2 (2.2) | 6 [89] |
| 2 | 4 | 4 | 4 (2.3) | 4 (2.3) | 4 (2.2) | 4 (2.2) | 4 [94] |
| 3 | 17 | 17 | 17 (1.4) | 17 (1.4) | 17 (1.4) | 17 (1.4) | 17 [94] |
| 4 | 19 | 19 | 19 (3.5) | 19 (3.5) | 19 (2.9) | 19 (3.4) | 19 [94] |
| 5 | 6 | 12 | 8 (2.5) | 8 (1.8) | 4 (1.9) | missing | 8 [43]; 4 [37] |
| 6 | 18 | 18 | 18 (2.0) | 18 (2.0) | 18 (2.0) | 18 (2.0) | 18 [93] |
| 7 | 6 | 6 | 6 (3.3) | 6 (3.2) | 6 (3.2) | 6 (3.2) | 6 [92]; 12 [2] |
| 8 | 11 | 15 | 7 (1.1) | 7 (1.1) | 7 (1.0) | 11 (1.0) | 7 [61]; 8 [33] |
| 9 | 7 | 11 | 4 (1.8) | 4 (1.4) | 11 (2.0) | 4 (1.7) | 4 [77]; 11 [17] |
| 10 | 8 | 6 | 8 (2.1) | 8 (2.1) | 8 (1.7) | 8 (2.1) | 6 [66]; 8 [28] |
| 11 | 5 | 5 | 5 (3.4) | 5 (3.3) | 5 (3.4) | 5 (3.4) | 5 [94] |
| 12 | 8 | 8 | 8 (2.6) | 8 (2.6) | 8 (2.6) | 8 (2.6) | 8 [94] |
| 13 | 11 | 7 | 4 (0.4) | 4 (0.4) | 7 (1.5) | 7 (1.5) | 4 [55]; 11 [23]; 7 [16] |
| 14 | 8 | 8 | 8 (2.6) | missing | missing | missing | missing |
| 15 | 11 | 15 | 11 (0.8); 20 (0.5) | 11 (0.8) | 11 (0.8) | 11 (0.8) | 11 [94] |
| 16 | 10 | 10 | 10 (2.0) | 10 (1.9) | 10 (1.9) | missing | 10 [94] |
| 17 | 16 | 14 | 16 (3.3) | 16 (3.3) | 16 (3.5) | 16 (3.5) | 16 [56]; 13 [38] |
| 18 | 12 | 12 | 8 (3.6) | 8 (3.8) | 12 (4.9) | missing | 12 [66]; 8 [19] |
| 19 | 2 | 2 | 4 (0.4); 2 (0.4) | missing | 13 (0.5) | missing | 13 [29]; 32 [4] |
| 20 | 16 | 16 | 16 (2.1) | 16 (2.1) | 16 (2.1) | 8 (0.5) | 16 [94] |
| 21 | 11 | 11 | 11 (0.3) | missing | missing | missing | missing |
| 22 | 8 | 8 | 8 (2.9); 20 (0.5) | 8 (2.8) | 8 (2.6) | 8 (2.9) | 8 [94] |
| X | 12 | 12 | 12 (3.1) | 12 (3.1) | 12 (3.1) | 12 (3.1) | 12 [76] |
| Y | 34 | No Y | 34 (0.3) | 34 (0.3) | No Y | No Y | 34 [18] |

^a αHOR lengths, in monomer units, in the CHM13 v2.0 genome, retrieved from Kunyavskaya et al. (2022).

^b Length of the “top 1” αHOR from each chromosome, retrieved from Gao et al. (2022). Both HORmon and HiCAT could be applied only to extracted centromeric sequences.

^c SRF applied to each CHM13 chromosome separately. In the format “m (L)”, m denotes the length of an HOR in monomer units and L is its span on the CHM13 assembly in megabases.

^d SRF applied to the CHM13 assembly, PacBio High-Fidelity (HiFi) reads, and Illumina short reads, respectively; k=101 was used for the Illumina reads. CHM13 reads do not contain chrY.

^e SRF applied to 94 phased haploid assemblies produced by the Human Pangenome Reference Consortium (HPRC). In the format “m [n]”, m is the HOR length in monomer units and n is the number of samples with the HOR according to manual inspection.
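
The two cell notations defined in the footnotes, “m (L)” and “m [n]”, can be read programmatically when comparing columns. The following is a minimal Python sketch, not part of SRF or of the original analysis; the function name `parse_cell` and the labels `span_mb` and `samples` are hypothetical and chosen here only for illustration. It assumes that entries within a cell are separated by semicolons and that cells such as “missing” or “No Y” carry no entries.

```python
import re

# One entry is either "m (L)" (span in megabases) or "m [n]" (sample count).
ENTRY = re.compile(r"(\d+)\s*(?:\(([\d.]+)\)|\[(\d+)\])")


def parse_cell(cell):
    """Return a list of (monomers, value, kind) tuples for one table cell.

    kind is "span_mb" for the "m (L)" form and "samples" for the "m [n]" form.
    Cells without entries, such as "missing" or "No Y", give an empty list.
    """
    entries = []
    for part in cell.split(";"):
        match = ENTRY.fullmatch(part.strip())
        if match is None:
            continue  # not an "m (L)" or "m [n]" entry
        monomers = int(match.group(1))
        if match.group(2) is not None:
            entries.append((monomers, float(match.group(2)), "span_mb"))
        else:
            entries.append((monomers, int(match.group(3)), "samples"))
    return entries


# Cells copied from the chr1 and chr13 rows of the table.
print(parse_cell("6 (4.2); 11 (0.5)"))
print(parse_cell("4 [55]; 11 [23]; 7 [16]"))
print(parse_cell("missing"))
```

Returning a list for every cell keeps the multi-HOR cells (for example chr13 in the HPRC column) and the empty cells on the same code path, so downstream comparisons do not need special cases.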