Table 1.
# | Chrom. | New HOR name | Old HOR name | HOR length (monomers) | RM or sample contig<sup>a</sup> | Genomic size (kb)<sup>b</sup> | Homogeneous/Divergent | Age | Status<sup>c</sup> | CENP-A reads<sup>d</sup> | Ref. |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1,5,19 | S1C1/5/19H1L<sup>e</sup> | D1Z7/D5Z2/D19Z3 | 6 | GJ212201.1 | 2282 | homogeneous | modern | live | 43,226 | [6] |
2 | 3 | S1C3H1L | D3Z1 | 17 | GJ211871.1 | 2102 | homogeneous | modern/archaic | live | 31,967 | [7] |
3 | 3 | S1C3H2<sup>e</sup> | D3-2 | 10 | GJ211866.1 | 461 | homogeneous | archaic | pseudo | 484 | [1] |
4 | 3 | S1C3H3d | – | 5 | ABBA01004655.1 | 217 | divergent | archaic | relic | – | this paper |
5 | 3,6,7,8,10,12,20 | S1CMH1d | – | 4 | ABBA01004652.1 | 251 | divergent | archaic | relic | – | this paper |
6 | 5p | S1C5pH2 | – | 16 | GJ211887.1 | 143 | homogeneous | modern | pseudo | 451 | RM |
7 | 6 | S1C6H1L | D6Z1 | 18 | GJ211907.1 | 1276 | homogeneous | archaic | live | 32,711 | [8] |
8 | 7 | S1C7H1L | D7Z1 | 6 | GJ211908.1 | 2659 | homogeneous | modern | live | 32,278 | [9] |
9 | 10 | S1C10H1L | D10Z1 | 8 | GJ211932.1 | 1561 | homogeneous | modern | live | 30,214 | [10] |
10 | 10 | S1C10H1-B | – | 14 | GJ211933.1 | 48 | homogeneous | modern | pseudo | 1028 | RM |
11 | 10 | S1C10H1-C | – | 8 | GJ211936.1 | 48 | homogeneous | modern | pseudo | 116 | RM |
12 | 10 | S1C10H2 | – | 18 | GJ211930.1 | 249 | homogeneous | modern | pseudo | 340 | RM |
13 | 12 | S1C12H1L | D12Z3 | 8 | GJ211954.1 | 2350 | homogeneous | modern | live | 35,979 | [10] |
14 | 12 | S1C12H2 | – | 18 | GJ211949.1 | 47 | homogeneous | modern | pseudo | 221 | RM |
15 | 12 | S1C12H3d | – | 8<sup>f</sup> | AEKP01211346.1 | 23 | divergent | archaic | relic | – | this paper |
16 | 10,12 | S1C10/12H1d | – | 2 | ABBA01049496.1 | 93 | divergent | modern | relic | – | this paper |
17 | 16 | S1C16H1L | D16Z2 | 10 | GJ212051.1 | 1928 | homogeneous | modern | live | 38,243 | [11] |
<sup>a</sup> An RM is intended as a model of a HOR cluster constructed under the assumption that all copies form a single array on one chromosome. For known special cases such as double or triple HOR domains (the same live HOR on two or three chromosomes; see Section SF1 in morpho-functional classification of AS used in this work (definitions and terminology)), the size of the RM is adjusted to represent only one chromosome. For divergent HORs not represented by RMs, only the names of sample contigs are provided. These contigs do not contain all copies of the respective HORs, and their size does not reflect the genomic copy number.
<sup>b</sup> Genomic size is the estimated length that the HOR occupies in the haploid genome. For homogeneous HORs represented by RMs, it is simply the RM length. For the S1C1/5/19 RM, the size reflects the length of the HOR array on one chromosome under the assumption that the arrays on all three chromosomes are of equal size. For divergent HORs, it is calculated from the data in Table 2 (the sum of the corrected copy numbers for all monomers of a HOR, multiplied by the monomer length).
<sup>c</sup> Pseudo and relic are the two kinds of dead HORs distinguished here. As a rule, live and dead HORs can be distinguished by CENP-A binding and by the large size of the live arrays (see the other columns of this table). Dead pseudo HORs are homogeneous (divergence 1–3%), and dead relic HORs are divergent (9–15%).
<sup>d</sup> The figures in this column are the numbers of 99 bp sequence reads corresponding to a given HOR in a 1-million-read sample of the CENP-A ChIP-seq dataset (SRR1561921) obtained from a HuRef lymphoblastoid cell line [12]. The whole dataset (about 6 million reads, split into portions of 1 million) was annotated with HumAS-HMMER in the same way as described in this paper. As the portions did not differ significantly, the data for only one of them are shown in the table. For S1C1/5/19, the number is adjusted to represent the length of this HOR array on one chromosome under the assumption that the arrays on all three chromosomes are of equal size. The live HORs are clearly the major CENP-A binding sites. The data are shown only for homogeneous HORs. The numbers for divergent HORs were slightly higher than those for dead homogeneous HORs and much lower than those for live homogeneous HORs. However, these numbers were not deemed reliable because the annotation of divergent HORs is admittedly less specific and false coverage has a greater effect on their quantification; both problems could be exacerbated by the short length of the monomer fragments in deep sequencing reads. CENP-A binding to these HORs is therefore likely to be somewhat overestimated.
<sup>e</sup> HOR S1C3H2 is also represented in the assembly by GJ211867.1 (length 14 kb), which upon thorough analysis was disqualified as a valid RM for this HOR (see Section Non-redundant list of SF1 RMs and Supplementary note 1). HOR S1C1/5/19H1L is also represented in the assembly by GJ212205.1 (length 340 bp) and GJ212206.1 (length 340 bp), which were disqualified as valid RMs for this HOR because of their short length (see Section Non-redundant list of SF1 RMs and Supplementary note 1).
<sup>f</sup> Strictly speaking, S1C12H3 is not an 8-mer but a variable-size HOR, 1-2-3-(4–5)n-6-7-8, built from 8 types of monomers (see Supplementary note 1).
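
Two of the notes above describe small calculations that may be easier to follow as code: the genomic size of a divergent HOR (the sum of corrected copy numbers over its monomers, multiplied by the monomer length) and the variable-size structure 1-2-3-(4–5)n-6-7-8 of S1C12H3. The following Python sketch is illustrative only. The function names, the copy numbers in the example, the 171 bp monomer length (a typical value for alpha satellite, not taken from Table 2), and the requirement that n be at least 1 are all assumptions made for the example.

```python
from typing import Iterable, List

# Assumption: a typical alpha-satellite monomer length in bp.
# The value actually used with the Table 2 data may differ.
MONOMER_LENGTH_BP = 171


def genomic_size_kb(corrected_copy_numbers: Iterable[float],
                    monomer_length_bp: int = MONOMER_LENGTH_BP) -> float:
    """Sum the corrected copy numbers of all monomers of a HOR,
    multiply by the monomer length, and convert bp to kb."""
    return sum(corrected_copy_numbers) * monomer_length_bp / 1000.0


def expand_variable_hor(n: int) -> List[int]:
    """Return the monomer order of a 1-2-3-(4-5)n-6-7-8 HOR unit
    in which the 4-5 pair is repeated n times."""
    if n < 1:
        # Assumption: at least one 4-5 pair is present in every unit.
        raise ValueError("n must be at least 1")
    return [1, 2, 3] + [4, 5] * n + [6, 7, 8]


if __name__ == "__main__":
    # Hypothetical copy numbers for an 8-monomer HOR (not real data).
    hypothetical_counts = [10] * 8
    print(f"genomic size: {genomic_size_kb(hypothetical_counts):.2f} kb")

    for n in (1, 2, 3):
        unit = expand_variable_hor(n)
        print(n, len(unit), "-".join(str(m) for m in unit))
```

With n = 1 the unit is the plain 8-mer listed in the table; each additional 4-5 pair lengthens the unit by two monomers, so the unit length is 6 + 2n monomers.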