Summary
Great apes have maintained a stable karyotype with few large-scale rearrangements; in contrast, gibbons have undergone a high rate of chromosomal rearrangements coincident with rapid centromere turnover. Here, we characterize fully assembled centromeres in the eastern hoolock gibbon, Hoolock leuconedys (HLE), finding a diverse group of transposable elements (TEs) that differ from the canonical alpha-satellites found across centromeres of other apes. We find that HLE centromeres contain a CpG methylation centromere dip region, providing evidence that this epigenetic feature is conserved in the absence of satellite arrays. We uncovered a variety of atypical centromeric features, including protein-coding genes and mismatched replication timing. Further, we identify duplications and deletions in HLE centromeres that distinguish them from other gibbons. Finally, we observed differentially methylated TEs, topologically associated domain boundaries, and segmental duplications at chromosomal breakpoints, and thus propose that a combination of multiple genomic attributes with propensities for chromosome instability shaped gibbon centromere evolution.
Keywords: chromosome evolution, centromeres, gibbons, genome assembly, primate genomics, transposable elements, methylation, replication timing
Highlights
• HLE centromeres are composed of diverse repeat classes yet maintain a CDR
• HLE centromeres contain atypical features like genes and differing replication timing
• Chromosomal rearrangements disrupting TADs are negatively selected
• A combination of (epi)genomic features may contribute to HLE karyotypic diversity
Hartley et al. investigate the non-alpha-satellite-based centromeres of the eastern hoolock gibbon (HLE), revealing that while they carry a centromere dip region, other genomic and epigenetic attributes distinguish them from great ape centromeres. Their findings highlight features of centromeres and breakpoints that may contribute to karyotypic evolution in HLE.
Introduction
Gibbons (family Hylobatidae) are a group of ∼20 species of small apes that last shared a common ancestor with great apes ∼17 million years ago. As a consequence of high rates of inter- and intra-chromosomal rearrangements, the karyotypes of the four extant gibbon genera are diverse, with variable chromosome numbers (Figures 1A and 1B)1,2,3: Hoolock (2n = 38), Nomascus (2n = 52), Symphalangus (2n = 50), and Hylobates (2n = 44).4,5 In gibbons, rapid chromosomal rearrangements have led to a high rate of centromere turnover; for example, the eastern hoolock gibbon (Hoolock leuconedys [HLE]) retains at least 22 genus-specific chromosome rearrangements, 13 inactivated centromeres, and six evolutionary new centromeres (ENCs).2 Moreover, unlike great ape centromeres, which are composed of arrays of 171 bp AT-rich alpha-satellite monomers organized into higher-order repeats (HORs),6,7,8 the centromeres of many gibbon species lack alpha-satellite arrays arranged in an HOR structure.9,10
Figure 1.
Gibbon genera display high rates of karyotype variation since their radiation ∼8 million years ago
(A) The phylogeny of lesser apes with estimated divergence times is depicted based on Veeramah et al.11 The four gibbon genera (Hylobates, Hoolock, Symphalangus, and Nomascus) descended from a shared common ancestor ∼8 million years ago and now present highly derived karyotypes (2n = 38–52). The number below each branch represents the number of known species within each genus.
(B) Synteny between human and gibbon chromosomes is shown with a representative species from each gibbon genus, based on Capozzi et al.2 Each color represents homology to a different human autosome, with a key depicted along the bottom.
(C) Synteny between our assembled HLE chromosomes (top) and human T2T-CHM13 chromosomes (bottom) agrees with that shown in (B), supporting a lack of large-scale structural misassemblies, and highlights the genome-wide chromosome rearrangements present in the HLE genome.
Recent studies of a complete human genome have identified a variety of epigenetic features that define centromere function, including a dip of decreased CpG methylation at the active centromere (centromere dip region [CDR]),12,13 late replication timing,14 and variable chromatin compaction into dichromatin.15 However, whether these features are conserved beyond humans and within satellite-free centromeres remains unresolved. In addition, although gibbons exhibit a rate of chromosomal rearrangements up to 20 times higher than other primates,2 the factors driving this high rate are still not fully understood. One potential contributor to karyotype variation in gibbons is the propagation of LAVA, an active ∼2 kb gibbon-specific composite retrotransposon composed of long interspersed nuclear element (LINE), AluSz, variable number tandem repeat (VNTR), and Alu-like segments.9,16 Although present in all gibbons, LAVA has propagated at variable rates among the genera and is most abundant in the centromeres and pericentromeres of Hoolock species.9 LAVA elements are hypothesized to contribute to karyotype evolution through their co-option as regulatory elements within genome repair pathways.17 Although it is unclear whether LAVA arrays and centromeric variation are causes or consequences of gibbon speciation, these highly variable centromeric units are a distinguishing feature of the Hoolock genus.9,10 Thus, the eastern hoolock gibbon serves as a compelling model to interrogate the potential relationship between rearrangement breakpoints and centromere turnover during rapid karyotypic evolution.
To survey centromeres and chromosomal rearrangements of the eastern hoolock gibbon, we developed a long-read-based genome assembly for HLE, cmHooLeu1. By analyzing occupancy of the centromere-specific histone H3 variant centromere protein A (CENP-A), together with transposable elements (TEs), CpG methylation, chromatin accessibility, replication timing, transcription, RNA polymerase occupancy, and genome spatial conformation, we identified features that define centromere identity and breakpoints in the HLE genome. We find that, despite variability in TE content and sequence identity, HLE centromeres have a CDR that overlaps with CENP-A enrichment, indicating that this epigenetic feature is not restricted to alpha-satellite-containing native centromeres in apes. We further identify genomic and epigenomic features within functional centromeres associated with replication stress and chromosome instability, such as the presence of protein-coding genes, regulatory elements, pericentromeric segmental duplications (SDs) of LAVA and SST1 repeats, and variable replication timing within the centromere. We therefore hypothesize that a combination of genomic and epigenetic features increases the propensity for chromosomal rearrangements.
Results
Generating and annotating a genome assembly for Betty, a female eastern hoolock gibbon
To build a genome assembly for a female eastern hoolock gibbon (HLE), we generated ∼59× coverage of Oxford Nanopore Technologies (ONT) long-read sequencing data (including ∼19.5× coverage of ultra-long reads), 62× coverage of Illumina PCR-free sequencing, and 29× coverage of Dovetail Omni-C sequencing. Following assembly, scaffolding, and polishing, the final reference, cmHooLeu1, consists of 19 scaffolds corresponding to the haploid chromosome number of HLE, with a total length of 2.761 Gb (N50 = 159.997 Mb), close to the short-read k-mer-based genome size estimate (2.779–2.782 Gb; Tables 1 and S1). Alignment to the human reference T2T-CHM13 recapitulates the known organization of HLE chromosomes into human syntenic blocks,2,12 supporting the absence of large structural misassemblies in our HLE assembly (Figure 1C). BUSCO analysis18 with the primate dataset indicated that the expected gene set is 95.5% complete, and the consensus quality value (QV) estimated from Illumina PCR-free reads was 46.71, roughly equivalent to an inferred nucleotide accuracy of 99.997% (Table 1).19
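Merqury reports consensus quality on the Phred scale, QV = −10·log10(E), where E is the estimated per-base error rate. The short sketch below converts between QV and per-base accuracy; the function names are illustrative and are not part of Merqury.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Phred scale: QV = -10 * log10(E), so E = 10 ** (-QV / 10)."""
    return 10 ** (-qv / 10)


def qv_to_accuracy(qv: float) -> float:
    """Per-base accuracy implied by a Phred-scaled consensus QV."""
    return 1.0 - qv_to_error_rate(qv)


def error_rate_to_qv(error_rate: float) -> float:
    """Inverse conversion; error_rate must lie in (0, 1]."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    qv = 46.71  # Merqury QV reported for cmHooLeu1 (Table 1)
    print(f"error rate ~ {qv_to_error_rate(qv):.2e} per base")
    print(f"accuracy   ~ {qv_to_accuracy(qv) * 100:.4f}%")
```

For QV 46.71 this corresponds to an error rate of about 2.1 × 10^-5, or roughly one error every ∼47 kb.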
Table 1.
HLE assembly quality statistics
| Metric | Value |
| --- | --- |
| Number of scaffolds | 19 |
| Total length (bp) | 2,760,609,521 |
| Largest scaffold (bp) | 226,217,378 |
| Scaffold N50 | 159.996 Mb |
| Scaffold L50 | 8 |
| Percentage of assembly in scaffolds | 100.00% |
| Number of contigs | 77 |
| Contig N50 | 79.153 Mb |
| Contig L50 | 13 |
| GC (%) | 41.25 |
| Ns per 100 kbp | 0.22 |
| BUSCO complete | 95.5% |
| BUSCO fragmented | 1.2% |
| BUSCO missing | 3.3% |
| Merqury QV | 46.71 |
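The N50 and L50 values in Table 1 follow the standard definitions: sort sequences from longest to shortest and accumulate lengths until at least half of the total assembly length is reached; N50 is the length of the sequence that crosses that threshold, and L50 is the number of sequences needed. A minimal sketch (the function name `nx_lx` is hypothetical):

```python
from typing import Sequence, Tuple


def nx_lx(lengths: Sequence[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) for the given sequence lengths.

    Nx is the length of the sequence at which the cumulative length of
    sequences sorted longest-first first reaches `fraction` of the total;
    Lx is how many sequences were needed to get there.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(lengths)
    cumulative = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        cumulative += length
        if cumulative >= target:
            return length, count
    raise AssertionError("unreachable: cumulative length reaches the total")


# Toy example: 300 bp in total; 150 bp is first reached by the 2nd-longest sequence.
print(nx_lx([20, 100, 60, 80, 40]))
```

Applied to the 19 scaffold lengths this yields scaffold N50/L50; applied to the 77 contig lengths it yields contig N50/L50.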
We identified an overall repeat content23 of 50.38%, for a total of 1,390,689,114 repetitive bases composed primarily of LINEs (21.54%), short interspersed nuclear elements (SINEs) (14.09%), and long terminal repeats (LTRs) (8.57%) (Table S2). LAVA comprises 0.7% of the assembled genome, higher than in the genomes of other gibbon genera (0.2% in Symphalangus syndactylus [SSY], GCF_028878055.2; 0.12% in Nomascus leucogenys [NLE], GCF_006542625.1; and 0.06% in Hylobates moloch [HMO], GCF_009828535.3). A total of 20,113 protein-coding genes were predicted (BUSCO: 99.5% complete; 0.2% fragmented; 0.3% missing; n = 9,226). In addition, to assess the nascent transcriptome, we performed precision run-on sequencing (PRO-seq).24 The three most transcriptionally active repeat classes are SINEs, LINEs, and LTRs, whereas satellite transcription was low, mirroring the pattern seen in humans (Figures S1 and S2).25
To identify the centromeric regions of the HLE genome, where deposition of CENP-A initiates kinetochore assembly, we performed CENP-A CUT&RUN sequencing and identified one centromere region per chromosome. Five centromeres (Cen1, Cen3, Cen9, Cen11, and CenX) displayed even ONT and PacBio HiFi read coverage across the region, indicating that these centromeric sequences were assembled without collapses (Figures S3 and S4; Table S3); independent assemblies using HiFi data confirm their structure (Figure S6; Table S4). Thus, unless noted otherwise, only these five centromeres were used in subsequent analyses.
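The collapse screen above relies on read depth being flat across a correctly assembled centromere. As a rough illustration, not the authors' pipeline, the sketch below flags windows whose depth departs from the regional median. It assumes per-window depths have already been computed (for example, by binning `samtools depth` output), and the 1.5× and 0.5× cutoffs are arbitrary assumptions.

```python
from statistics import median
from typing import List, Sequence, Tuple


def flag_uneven_windows(starts: Sequence[int], depths: Sequence[float],
                        high: float = 1.5, low: float = 0.5
                        ) -> List[Tuple[int, float, str]]:
    """Return (window_start, depth / median, label) for outlier windows.

    Depth well above the regional median suggests that several genomic
    copies were collapsed into one assembled copy; depth well below it
    suggests a misassembly or a falsely duplicated haplotype.
    Thresholds are illustrative, not calibrated.
    """
    if len(starts) != len(depths) or not depths:
        raise ValueError("need equal-length, non-empty inputs")
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    flagged = []
    for start, depth in zip(starts, depths):
        ratio = depth / baseline
        if ratio >= high:
            flagged.append((start, ratio, "possible collapse"))
        elif ratio <= low:
            flagged.append((start, ratio, "possible misassembly"))
    return flagged


# Toy 10 kb windows: one window at ~2x depth and one at ~0.4x depth.
starts = [0, 10_000, 20_000, 30_000, 40_000, 50_000]
depths = [30, 31, 29, 65, 30, 12]
for start, ratio, label in flag_uneven_windows(starts, depths):
    print(start, round(ratio, 2), label)
```

Using the median rather than the mean keeps a single collapsed window from inflating the baseline it is compared against.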
HLE centromeres vary in repeat organization yet maintain a CDR and a dichromatin conformation
Among assembled centromeres, the span of CENP-A enrichment averaged 127 kb (range 75–162 kb) (Table S3), less than half the average span of CENP-A enrichment in CHM13 centromeres (317 kb).8 Unlike the alpha-satellite-rich centromeres of great apes, HLE centromeres displayed highly variable, complex repeat content dominated by TEs, both in the surrounding pericentromere (Figure 2A) and at the site of CENP-A enrichment (Figure 2B). The most prevalent TEs across assembled centromeres were LINEs, found in three of the five uncollapsed centromeres (Cen1, Cen3, and Cen9), followed by a mixture of SINEs and LTRs (Figures 2, S7, and Table S5). Notably, only two of the assembled centromeres, Cen3 and CenX, contained the primate centromeric alpha-satellite (Figures 2A, 3B, S8B, and S8C; Table S5), and only in CenX was it the major constituent of the CENP-A-enriched region (Figures 2B and 3A). Other satellites were also identified in pericentromeric regions: Cen3 contained CER, a 48 bp repeat found on the centromeric q arms of human chr22, chr14, and chr18,26 and CenX contained gamma satellites (GSATs), a GC-rich satellite that forms large arrays in humans and is often associated with alpha-satellites (Figure 2A; Table S5).27 While LAVA was not the main constituent of the CENP-A-associated DNA, it was present in two centromeres (Cen9 and Cen11). Although not fully assembled, the collapsed centromeres also showed a high density of LINEs and novel satellite/TE structures (Figure S7).
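Figures 2A and 2B summarize, for each centromere, the share of repeat-annotated bases contributed by each repeat class within a window (the CENP-A domain, or that domain ±500 kb). A sketch of that tally follows. It assumes non-overlapping annotations given as half-open (start, end, class) tuples on one chromosome, for example parsed from a RepeatMasker table (the parsing step is omitted); this is an illustration, not the study's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Annotation = Tuple[int, int, str]  # (start, end, repeat_class), half-open


def repeat_class_composition(annotations: Iterable[Annotation],
                             region_start: int, region_end: int
                             ) -> Dict[str, float]:
    """Percentage of repeat-annotated bp in [region_start, region_end)
    contributed by each repeat class.

    Annotations are clipped to the region. Overlapping annotations are
    assumed not to occur; if they do, shared bases are counted twice.
    """
    bp_by_class: Dict[str, int] = defaultdict(int)
    for start, end, repeat_class in annotations:
        overlap = min(end, region_end) - max(start, region_start)
        if overlap > 0:
            bp_by_class[repeat_class] += overlap
    total = sum(bp_by_class.values())
    if total == 0:
        return {}
    return {cls: 100 * bp / total for cls, bp in bp_by_class.items()}


# Toy CENP-A domain at 0-1,000; for the "centromere region" view, pass
# max(0, domain_start - 500_000) and domain_end + 500_000 instead.
toy = [(0, 600, "LINE"), (600, 900, "SINE"), (900, 1100, "LTR"),
       (5000, 6000, "Satellite")]
print(repeat_class_composition(toy, 0, 1000))
```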
Figure 2.
Centromeres of the eastern hoolock gibbon are enriched with diverse transposable elements and vary in repeat organization yet maintain a CDR
(A–B) The percentage of total repeat content (bp) classified as each repeat class is shown for (A) the centromere region (defined as the CENP-A domain and 500 kb upstream and downstream) and (B) for the CENP-A enrichment domain of the five assembled HLE centromeres, highlighting the highly variable repeat composition of centromeres.
(C–D) Below, chromosome ideograms show the position of HLE (C) Cen9 and (D) Cen11, with chromosomes colored by their synteny to human chromosomes as per Figure 1. From top to bottom, genome tracks denote CENP-A CUT&RUN enrichment (blue), repeat annotations colored according to the key below, synteny to T2T-CHM13, and predicted HLE-CHM13 synteny breakpoints. Fiber-seq inferred regulatory element (FIRE) tracks show FIRE density binned per 1 kb on a heatmap scale from white to black (i.e., low to high density), showing increased density of FIREs correlating with CENP-A enrichment corresponding to dichromatin organization. Gene tracks (blue and tan bars indicating true and falsely predicted exons, respectively) show gene predictions from FLAG, showing the presence of several genes nearby and overlapping with CENP-A enrichment. Replication timing from E/L Repli-seq is shown as black points indicating the log ratio of early-to-late coverage over 5 kb windows from 4 (early replication) to −4 (late replication), with a red line indicating the 10 point moving average. CpG methylation is shown via line plot (black line) and on a heatmap scale from low CpG methylation (black) to high CpG methylation (red). In HLE, CENP-A enrichment is associated with a dip in CpG methylation (CDRs) even in the absence of alpha-satellite-containing centromeres and despite significant changes in CG density (purple). Finally, sequence identity plots are shown for each assembled centromere, with a scale from blue (low identity) to red (high identity). Overall, regions of CENP-A enrichment share little sequence identity compared to canonical and pericentromeric primate alpha-satellite arrays.
Figure 3.
HLE CenX displays canonical centromeric alpha-satellite signatures, but a latent alpha-satellite centromere on HLE chromosome 17 has lost epigenetic signatures of centromere function
(A and B) Chromosome ideograms show the position of HLE (A) CenX flanked by dense LINE-rich regions and a latent centromere on (B) chr17, with chromosomes colored by their synteny to human chromosomes as per Figure 1. Genome tracks denote CENP-A CUT&RUN enrichment (blue), repeat annotations with each repeat class represented by a different color, synteny to T2T-CHM13, predicted breakpoints, FIRE elements, genes, replication timing, CpG methylation, CG percentage, and sequence identity, per the key in Figure 2. Colorless regions on the T2T-CHM13 synteny track indicate regions without shared alignment.
(C and D) Zoomed images for (C) CenX and (D) chr17 highlight the repeat organization of the two alpha-satellite arrays, which contain LINE (CenX and chr17), Alu (CenX), and LAVA (chr17) insertions. Tracks show a CDR over the functional CenX, which is absent from the highly methylated latent alpha-satellite centromere on chr17. Bottom, red boxes within FIRE tracks show disorganized FIRE elements/open chromatin (C, blue inset) in the active CenX corresponding to dichromatin, which is absent from the surrounding heterochromatin (C, purple inset) and the latent centromere on chr17 (D, purple inset). The LAVA element on chr17 (yellow) is more accessible than the surrounding alpha-satellites (D, blue inset), suggesting that it may be functional.
(E) The image shows a dot plot (Gepard, word length 50)28 comparing the HLE CenX (HLE_chr_X:56,370,779–56,477,597) sequence assembled herein to the previously described SSY gibbon CenX (mSymSyn1_v2.0 chrX_hap1:70,696,431–71,324,358).29 Corresponding alpha-satellite annotation tracks are shown for both centromeres, indicating alpha-satellite suprachromosomal families (SFs) and strand orientation (blue and red). A deletion breakpoint coincides with the only alpha-satellite strand switch point in the HLE CenX and is shown in the HLE_chr_X:56,442,082–56,442,094 window. Breaks in the diagonals on both sides represent small deletions in SSY relative to the HLE-SSY common ancestor. UCSC Browser annotation tracks are described in Makova et al.29 and represent alpha-satellite suprachromosomal family annotation (top/left) and alpha-satellite strand annotation (bottom/right).
All assembled centromeres lack the region of high sequence identity encompassing the alpha-satellite HOR expansions identified in humans and other primates (Figures 2C, 2D, 3A, and S8).8,29,30,31 Even within CenX, the only hoolock centromere composed predominantly of alpha-satellites, sequence identity across the core is only 75%–80%. None of the assembled centromeres contain a highly homogenized repeat structure, as evidenced in the StainedGlass sequence similarity plots (Figures 2C, 2D, 3A, and S7), which show little to no similarity among repeats within a centromere. Comparison to the published siamang (SSY) CenX29 revealed that the SSY CenX is much larger than the HLE CenX (∼602 kb vs. ∼90 kb) and that the two centromeres share only the extreme flanks of the alpha-satellite array; the middle of the array, which includes the homogeneous active HOR array in SSY, is deleted in HLE CenX (Figure 3E). The HLE CenX array contains only divergent monomeric alpha-satellite belonging to the older SFs 9–12, which have not been shown to form functional centromeres in other apes.6,7,8 HLE CenX lacks SF4, which forms active centromeres in SSY.29 Nevertheless, this reduced centromere is capable of supporting kinetochore formation.
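The CenX comparison in Figure 3E uses Gepard, which builds dot plots from words shared between two sequences (word length 50 here). The naive sketch below illustrates a word-match dot plot on both strands; it is not Gepard's implementation and is practical only for short sequences.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every length-k word in seq to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def word_dot_plot(x: str, y: str, k: int = 50) -> List[Tuple[int, int, str]]:
    """Return (x_pos, y_pos, strand) for every exact length-k word shared by
    x and y on either strand of y. Runs of hits along a diagonal mark
    colinear homology; breaks in a diagonal mark insertions/deletions."""
    x, y = x.upper(), y.upper()
    index = kmer_index(x, k)
    hits = []
    for j in range(len(y) - k + 1):
        word = y[j:j + k]
        for i in index.get(word, ()):
            hits.append((i, j, "+"))
        for i in index.get(reverse_complement(word), ()):
            hits.append((i, j, "-"))
    return hits


seq = "ACGTTGCAAGGCT"
print(word_dot_plot(seq, reverse_complement(seq), k=4)[:3])
```

Long exact words suppress background noise, which is why a 50 bp word length suits comparisons of divergent centromeric arrays.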
Despite the lack of homogenized repeat arrays at the centromere core in HLE, high-identity repeat arrays often flank the site of CENP-A enrichment, including a CER satellite array on Cen3, a LAVA expansion on Cen9, and large tracts of LINE/L1s on CenX (Figures 2C, 2D, and 3A). The variation in centromere composition suggests that gibbon centromeres underwent an evolutionary history independent of that of great apes, potentially in response to drastic chromosomal rearrangements affecting centromeric regions. As in human alpha-satellite centromeres,25 these flanking arrays exhibit a low PRO-seq signal, whereas the PRO-seq signal increases over the TEs that make up the centromere and at the edges of CENP-A enrichment (Figure S9). However, unlike the highly homogenized human centromere cores, the HLE CENP-A binding domain is not a transcriptional dead zone, even at the alpha-satellite-enriched CenX (Figure S9), mirroring the pattern seen at the diverged, outer alpha-satellite layers of human centromeres.25
Within the active centromeric alpha-satellite HORs in humans, a small region shows a distinctive decrease in CpG methylation (the CDR) coincident with CENP-A nucleosomes, indicating the importance of CpG methylation for proper kinetochore positioning and centromere function.13 However, it is not clear whether this epigenetic feature is conserved widely across centromeres, including in small apes that lack the canonical ape centromeric alpha-satellite organization. Using single-molecule DNA modification detection from ONT sequencing reads, we found that each assembled centromere contained a CDR concomitant with CENP-A occupancy, set against high CpG methylation in the surrounding heterochromatin (Figures 2C, 2D, 3A, and S8). The conservation of CDRs in HLE centromeres supports a model in which epigenetic regulation of kinetochore positioning is independent of alpha-satellite-dense centromeres or HOR organization. To examine chromatin compaction in these regions, we intersected Fiber-seq m6A methylation calls with the five assembled HLE centromeres and found a higher density of m6A-methylated regions and Fiber-seq inferred regulatory elements (FIREs) within the CENP-A-enriched region than outside it, indicative of accessible chromatin (Figures 2C, 2D, 3A, and S8). Thus, in addition to CDRs, HLE centromeres contain dichromatin, a distinctive form of chromatin compaction recently reported to define centromeric chromatin.15
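Operationally, a CDR is a local trough in CpG methylation within otherwise hypermethylated centromeric sequence. The sketch below is one simple way to call such troughs, not the method used in this study. It assumes sorted per-CpG positions with methylation fractions as input, and the window size, step, minimum CpG count, and 0.6 relative-drop threshold are illustrative assumptions.

```python
from bisect import bisect_left
from statistics import median
from typing import List, Sequence, Tuple


def call_methylation_dips(positions: Sequence[int], meth: Sequence[float],
                          window: int = 5_000, step: int = 1_000,
                          drop: float = 0.6, min_cpgs: int = 10
                          ) -> List[Tuple[int, int]]:
    """Candidate CDRs: merged windows whose mean CpG methylation is below
    `drop` times the median window mean across the region.

    positions: sorted CpG coordinates; meth: matching methylation
    fractions (0-1). All parameter defaults are illustrative.
    """
    if len(positions) != len(meth):
        raise ValueError("positions and meth must have equal length")
    windows = []  # (start, end, mean methylation)
    start = positions[0] if positions else 0
    while positions and start <= positions[-1]:
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, start + window)
        if hi - lo >= min_cpgs:  # skip CpG-poor windows
            values = meth[lo:hi]
            windows.append((start, start + window, sum(values) / len(values)))
        start += step
    if not windows:
        return []
    threshold = drop * median([w[2] for w in windows])
    dips: List[Tuple[int, int]] = []
    for w_start, w_end, mean_meth in windows:
        if mean_meth < threshold:
            if dips and w_start <= dips[-1][1]:  # overlaps previous dip: extend
                dips[-1] = (dips[-1][0], max(dips[-1][1], w_end))
            else:
                dips.append((w_start, w_end))
    return dips


# Synthetic region: CpGs every 100 bp, 90% methylated, with a hypomethylated
# stretch (10%) from 40 to 50 kb.
cpgs = list(range(0, 100_000, 100))
levels = [0.1 if 40_000 <= p < 50_000 else 0.9 for p in cpgs]
print(call_methylation_dips(cpgs, levels))
```

On this synthetic input the call spans 38–52 kb, wider than the simulated 40–50 kb trough, because a 5 kb window qualifies once most of its CpGs are hypomethylated; smaller windows trade boundary precision for noise.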
Rearrangement of protein-coding genes is present throughout HLE centromeres and pericentromeres
Centromeres are typically established on highly condensed, gene-poor repetitive sequences, providing little opportunity for complex kinetochore machinery to interfere with the expression of protein-coding genes.32 However, we identified a total of 21 predicted genes within 200 kb of the span of CENP-A enrichment in the five assembled centromeres (Table S6; Figures 2C, 2D, and 3A). In one of the assembled centromeres, Cen9, the predicted genes directly overlapped with CENP-A enrichment.
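Counting genes within 200 kb of the CENP-A span is an interval test against the CENP-A domain padded by 200 kb on each side. A sketch follows, assuming half-open gene coordinates on the same chromosome as the domain; the gene names and coordinates are made up for illustration.

```python
from typing import Iterable, List, Tuple

Gene = Tuple[str, int, int]  # (name, start, end), half-open, same chromosome


def genes_near_domain(genes: Iterable[Gene], domain_start: int,
                      domain_end: int, flank: int = 200_000
                      ) -> List[Tuple[str, bool]]:
    """Return (gene_name, overlaps_domain) for every gene that overlaps the
    domain extended by `flank` bp on each side."""
    lo, hi = max(0, domain_start - flank), domain_end + flank
    hits = []
    for name, start, end in genes:
        if start < hi and end > lo:  # overlaps the padded window
            overlaps = start < domain_end and end > domain_start
            hits.append((name, overlaps))
    return hits


# Hypothetical genes around a 127 kb CENP-A domain at 1.000-1.127 Mb.
toy_genes = [("gA", 700_000, 790_000), ("gB", 850_000, 900_000),
             ("gC", 1_100_000, 1_150_000), ("gD", 1_400_000, 1_500_000)]
print(genes_near_domain(toy_genes, 1_000_000, 1_127_000))
```

The boolean distinguishes genes that fall in the flanks from those that, as at Cen9, directly overlap CENP-A enrichment.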
Gene annotation of Cen9 revealed that CENP-A enrichment overlapped part of a phospholipid scramblase 2 (PLSCR2) gene prediction, which in humans lies on a syntenic portion of chr3 (Figure 2C). CENP-A enrichment at Cen9 overlaps an HLE breakpoint between segments syntenic to human chr3 and chr12, which disrupts the phospholipid scramblase gene cluster on human chr3 (PLSCR1, PLSCR2, PLSCR4, and PLSCR5) and separates these genes onto HLE chr6 (PLSCR5 and PLSCR1) and chr9 (PLSCR4) (Figures S10A and S10B). Although the HLE Cen9 annotation is analogous to a potential PLSCR2 transcript in humans, we found no evidence of PLSCR2 exons at the HLE Cen9 region or elsewhere in the assembly, suggesting that the gene may have been lost during the rearrangement. In humans, PLSCR2 is a lowly expressed protein-coding gene in a family of calcium-binding proteins suggested to be involved in the blood coagulation cascade as well as macrophage clearance of apoptotic cells.33 Mapping of total RNA-seq and PRO-seq reads to the assembly detected no transcription from the rearranged region in HLE (Figure S10A). Despite this disrupted organization in HLE, PLSCR1 remains highly transcribed in HLE and NLE gibbons (Figure S10A).
SDs and divergent replication timing, hallmarks of chromosome instability, are predicted among HLE centromeres
Because they are highly identical and repetitive by definition, SDs are known to mediate chromosomal rearrangements via nonallelic homologous recombination, resulting in deletions, duplications, translocations, and inversions.34,35 Many studies have identified an association between SDs and chromosomal instability and recurrent genomic disorders,35,36,37,38 and SDs have been linked to numerous evolutionary chromosome rearrangements, including in mice,39 chimpanzees,40 and other apes.35,41,42 We therefore surveyed SDs, broadly defined as genomic duplications >1 kb long with >90% sequence identity, in the HLE assembly using BISER.38,43 In total, 2,842 SDs were predicted, covering 19,696,320 bp, or 0.71% of the assembly (Figure S11A). Within all centromere predictions ±500 kb, 10.6% of bases are covered by SDs, ∼15× higher than the genome-wide proportion. While the total predicted SD content is low compared to human T2T-CHM13 (estimated at 218 Mb, ∼7.0% of the genome44), we suspect that collapsed sequences result in an underestimation of SDs in our assembly, both genome-wide and among centromeres.
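The ∼15-fold figure is a ratio of two coverage fractions: SD-covered bases inside the centromere ±500 kb windows divided by total window length, over SD-covered bases genome-wide divided by assembly length (10.6% / 0.71% ≈ 15). The sketch below computes that ratio after applying the >1 kb, >90% identity definition. It assumes SDs are supplied as (chrom, start, end, identity) tuples, a hypothetical simplification rather than BISER's output format, and it merges overlapping intervals so that no base is counted twice.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def filter_sds(records: Iterable[Tuple[str, int, int, float]],
               min_len: int = 1_000, min_identity: float = 0.90) -> List[Interval]:
    """Keep duplications longer than min_len bp with identity above min_identity."""
    return [(c, s, e) for c, s, e, ident in records
            if e - s > min_len and ident > min_identity]


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Union of intervals per chromosome, so overlapping bases count once."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_bp(intervals: Iterable[Interval]) -> int:
    return sum(e - s for _, s, e in merge(intervals))


def covered_bp(a: Iterable[Interval], b: Iterable[Interval]) -> int:
    """Bases in the union of `a` that also lie in the union of `b`
    (quadratic loop; fine for a sketch, use an interval tree at scale)."""
    b_merged = merge(b)
    return sum(max(0, min(e1, e2) - max(s1, s2))
               for c1, s1, e1 in merge(a)
               for c2, s2, e2 in b_merged if c1 == c2)


def fold_enrichment(sds: List[Interval], windows: List[Interval],
                    genome_size: int) -> float:
    """(SD fraction inside windows) / (SD fraction genome-wide)."""
    in_windows = covered_bp(sds, windows) / total_bp(windows)
    genome_wide = total_bp(sds) / genome_size
    return in_windows / genome_wide


toy = [("chr1", 0, 2000, 0.95), ("chr1", 1500, 3000, 0.92),
       ("chr1", 5000, 5500, 0.99), ("chr1", 8000, 10000, 0.85)]
kept = filter_sds(toy)  # drops the 500 bp and the 85%-identity records
print(kept, fold_enrichment(kept, [("chr1", 1000, 4000)], 10_000))
```

The same `covered_bp` helper can be reused to intersect SDs with LAVA or SST1 annotations.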
Previous studies have reported associations between SST1 repeats, LAVA elements, and heterochromatic enrichment in HLE.9,10 In our assembly, we found that CENP-A is not directly associated with LAVA or SST1 enrichment (except for Cen6 and Cen18) (Figures S4 and S5). Instead, SST1 and LAVA form large, high-identity arrays, sometimes exceeding 1 Mb, in the pericentromere of many chromosomes (Figures 2C, 2D, and S3–S5). Therefore, we extended our analysis of SDs to pericentromeric LAVA and SST1 repeats by intersecting SDs with LAVA and SST1 annotations. Of the 2,842 SDs, 154 (362,610 bp) contained LAVA (Figure S11B), while 487 SDs (91,899 bp) overlapped SST1s (Figure S11B). The majority of these high-identity SDs overlapped with pericentromeric locations (including both collapsed and uncollapsed centromeres), suggesting that pericentromeric loci are enriched in inter-chromosomal duplications (Figure S11B).
In addition to SDs, dysregulated replication timing has been linked extensively to genome instability, with shifts to earlier replication associated with cancerous phenotypes,45,46,47,48 chromosomal breakage,49,50 and a higher probability of translocations in mice and humans.51 In humans, centromeres replicate during mid-to-late S phase, and their precise replication timing is highly consistent across all chromosomes within an individual, suggesting functional coordination of centromeric replication timing among chromosomes.14 To investigate whether this applies to HLE centromeres, which have variable TE and gene composition, we performed E/L Repli-seq52 on HLE to map DNA replication timing genome wide.
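E/L Repli-seq summarizes replication timing as the log ratio of early- to late-S-phase read coverage in fixed windows (5 kb in Figure 2), smoothed with a 10-point moving average. A minimal sketch, assuming per-window read counts from the two sorted fractions; the base-2 logarithm, counts-per-million scaling, pseudocount, and trailing (rather than centered) average are illustrative assumptions.

```python
import math
from typing import List, Sequence


def log2_early_late(early: Sequence[int], late: Sequence[int],
                    pseudocount: float = 1.0) -> List[float]:
    """Per-window log2(E/L) after scaling each library to counts per million.

    Positive values indicate early replication, negative values late
    replication. CPM scaling and the pseudocount are illustrative choices,
    not the study's normalization.
    """
    if len(early) != len(late):
        raise ValueError("early and late must cover the same windows")
    e_total, l_total = sum(early), sum(late)
    if e_total == 0 or l_total == 0:
        raise ValueError("each library needs at least one read")
    return [math.log2((e / e_total * 1e6 + pseudocount) /
                      (l / l_total * 1e6 + pseudocount))
            for e, l in zip(early, late)]


def moving_average(values: Sequence[float], n: int = 10) -> List[float]:
    """Trailing n-point moving average; output has len(values) - n + 1 points."""
    if not 0 < n <= len(values):
        raise ValueError("n must be between 1 and len(values)")
    running = sum(values[:n])
    smoothed = [running / n]
    for i in range(n, len(values)):
        running += values[i] - values[i - n]  # slide the window by one point
        smoothed.append(running / n)
    return smoothed


ratios = log2_early_late([300, 100, 220, 180], [100, 300, 180, 220])
print([round(r, 2) for r in ratios], moving_average(ratios, 2))
```

A centromere that replicates in mid-to-late S phase would show values near or below zero across its windows, whereas early-replicating regions such as Cen9 and Cen11 would trend positive.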
Among the five assembled HLE centromeres, three (Cen1, Cen3, and CenX) showed the expected mid-to-late S-phase replication timing across the centromeric and pericentromeric regions (Figures 3A, S8A, and S8B). In contrast, Cen9 and Cen11 showed evidence of mid-to-early S-phase replication (Figures 2C and 2D). Moreover, while Cen11 shows consistently early S-phase replication across the centromere region, Cen9 coincides with shifts in replication timing from early to late S phase, particularly near expansive LAVA arrays in the surrounding pericentromere (Figure 2C). Of note, the two assembled HLE centromeres with early S-phase replication also share genes within or directly upstream of the CENP-A-enriched region, extensive upstream hypomethylation accompanied by increased chromatin accessibility, and pericentromeric LAVA expansions (Figures 2C and 2D), which may contribute to replication timing dysregulation.
HLE Cen17 is defined by a unique composite repeat duplication not found in other apes
HLE Cen17 (Figure 4) and Cen11 (Figure 2D) are the only Hoolock-specific ENCs, and they exhibit significantly lower heterochromatic LAVA amplification than all other centromeres,2,9 prompting further investigation. Previous studies identified a latent, non-functional centromere on the q arm of HLE chr17 that accompanied the formation of an ENC.2 We therefore searched for alpha-satellite sequences and identified a ∼28 kb alpha-satellite array, presumed to be the latent centromere on chr17, that corresponds to a portion of the syntenic centromere of SSY chr24 (Figure S12). This array is smaller than most active alpha-satellite arrays in primates; however, NucFreq53 analysis supports its correct assembly (Figure S13). No CENP-A CUT&RUN reads aligned to this array, which lies ∼550 kb downstream of the CENP-A region (Figure 3B). Unlike active alpha-satellite arrays in other primates,30 this array lacks sequence homogenization and contains multiple TE insertions, including three LINE/L1Hylobs and two LAVA insertions, LAVA_B and LAVA_E (Figures 3B and 3D). Moreover, these satellites lack a detectable CDR (Figures 3B and 3D), indicating that, at least at this locus, an alpha-satellite array without CENP-A also lacks a CDR. Fiber-seq analysis showed an absence of accessible chromatin compared with the active alpha-satellite array on the X chromosome, except for a single highly accessible and potentially transposition-competent LAVA insertion (Figures 3A and 3B). Further analyses are required to determine whether alpha-satellite sequence degeneration, reduction in total array size, and/or increased TE insertion into this region caused the centromere to relocate or followed its inactivation.
Figure 4.
HLE Cen17 is defined by a unique composite repeat duplication not found in other apes
(A) Left, HLE chr17 is depicted with colors indicating synteny to T2T-CHM13. CENP-A CUT&RUN enrichment is shown vertically along the entire chromosome (blue). A zoomed image shows CENP-A CUT&RUN mapping filtered by reads overlapping with unique 21-mers in the HLE assembly and total unfiltered CENP-A peaks. Bottom, a repeat track shows the L5A5 composite repeat assembled in tandem 24 times. FIRE element density, genes, CpG methylation, and GC percentage are shown according to the key in Figure 2.
(B) The 3,319 bp consensus sequence of the L5A5 repeat is shown.
(C) DNA FISH on HLE metaphase spreads using a Dig-labeled oligo specific to the L5A5 repeat shows centromeric hybridization on one chromosome pair (green). Human chr20 whole chromosome paint (red) hybridizes to the same chromosome as L5A5, confirming the location of L5A5 to HLE chr17, which shares synteny to human chr20.2
(D) Top, distribution of 21-mer counts from PCR-free Illumina data, shown as 21-mer multiplicity (the number of times a 21-mer was found in the PCR-free Illumina reads) vs. the number of 21-mers found at that multiplicity. The chart peaks at 46×, representing the estimated PCR-free Illumina sequencing depth. Bottom, estimated L5A5 copy number. The x axis spans the L5A5 consensus sequence, and the y axis represents the estimated number of L5A5 repeats in the HLE diploid genome. A horizontal line marks the median of ∼1,154 copies in the HLE diploid genome (∼577 per haplotype).
(E) A phylogeny of the L5A5 repeat across 14 primates is shown. The L5A5 repeat was found in all great apes, gibbons, and the golden snub-nosed monkey but not in the marmoset, tarsier, or lemur genomes. While the L5A5 repeat subunit structure is relatively conserved among gibbons, a SINE/AluSx and LINE/L1ME1 deletion shortened the consensus in great apes by ∼700 bp. HLE is the only species with an arrayed L5A5 centromeric structure; in all other species, only one L5A5 copy was identified.
While Cen11 was successfully resolved and found to be composed of diverse TEs (Figure 2D), the CENP-A-enriched span of Cen17 contained sequence collapses (Figures S4 and S5). Nevertheless, we confirmed CENP-A enrichment at the expected 84 kb region by restricting the analysis to mapped reads that overlap 21-mers unique within the HLE assembly (Figure 4A). Our analyses revealed that HLE Cen17 is composed of a composite repeat25 exclusive to this locus (Figure 4A). Each composite repeat averaged 3,319 bp in length and contained 10 subunits, namely five LINEs (including one L2a, three L1Ms, and one split by an Alu insertion) and five SINE/Alus (three AluSx elements, one AluJb, and one AluY), plus a short (T)n simple repeat (Figure 4B; Table S7); this unit is arrayed 24 times in the linear assembled sequence (Figure 4A). Fluorescence in situ hybridization (FISH) confirmed the centromeric localization of this sequence, hereafter called “L5A5,” to a single chromosome pair, identified as HLE chr17 (Table S8; Figure 4C). Although 24 copies of the L5A5 repeat were assembled in Cen17 (Figure 4D), the region is collapsed, so we used a k-mer-based approach to estimate the uncollapsed copy number of the repeat.12 Using this approach, we estimate 1,154 total copies of L5A5 (∼577 copies per haplotype, totaling ∼1.9 Mb; Figure 4D) and confirmed its arrayed structure in HLE by PCR (Table S9; Figure S14).
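Two k-mer operations underpin this paragraph: keeping only reads that contain a 21-mer occurring once in the assembly, so that read placement inside a collapsed array can be trusted, and estimating copy number as the k-mer count in Illumina reads divided by the depth expected for one copy, taken as the main peak of the k-mer spectrum (Figure 4D). The sketch below illustrates both under simplifying assumptions (canonical k-mers, in-memory counting, a median across consensus k-mers) and is not the published method. Whether the depth returned by the hypothetical `single_copy_depth` corresponds to one haplotype or to both determines whether the estimate is per haplotype or per diploid genome; that choice must be set explicitly.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Strand-independent form: the smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(seqs: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers, skipping any that contain N."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts


def unique_kmers(assembly: Iterable[str], k: int = 21) -> Set[str]:
    """k-mers present exactly once in the assembly (single-copy markers)."""
    return {kmer for kmer, n in count_kmers(assembly, k).items() if n == 1}


def has_unique_kmer(read: str, unique: Set[str], k: int = 21) -> bool:
    """True if the read carries at least one assembly-unique k-mer."""
    read = read.upper()
    return any(canonical(read[i:i + k]) in unique
               for i in range(len(read) - k + 1))


def single_copy_depth(histogram: Dict[int, int], min_multiplicity: int = 5) -> int:
    """Multiplicity of the main k-mer spectrum peak, ignoring the
    low-multiplicity error peak (the cutoff is an assumption)."""
    candidates = {m: n for m, n in histogram.items() if m >= min_multiplicity}
    return max(candidates, key=candidates.__getitem__)


def estimate_copy_number(consensus: str, read_counts: Counter,
                         depth_per_copy: float, k: int = 21) -> float:
    """Median over the consensus k-mers of read count / depth_per_copy;
    the median damps k-mers that also occur elsewhere or carry variants."""
    consensus = consensus.upper()
    ratios = [read_counts.get(canonical(consensus[i:i + k]), 0) / depth_per_copy
              for i in range(len(consensus) - k + 1)]
    return median(ratios)
```

As a consistency check on the reported figures, ∼577 copies × 3,319 bp ≈ 1.9 Mb per haplotype.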
As the L5A5 repeat has not previously been described as a centromeric repeat, we searched for copies in 13 other primates with available high-quality genome assemblies representing each major lineage (Table S10). The L5A5 composite repeat was detected across great apes, gibbons, and the golden snub-nosed monkey (Figure 4E) but not in the marmoset, tarsier, or lemur genomes, suggesting that the L5A5 unit arose after the split between Old and New World monkeys, likely through active TE insertions (Figure 4E). As expected, L5A5 repeats in the three other gibbon genera were most similar to the HLE L5A5, with one short L1M duplication at the end of the consensus distinguishing them (Figure 4E). The L5A5 repeat in the golden snub-nosed monkey assembly was the most divergent from the Hoolock consensus, with additional LINE/L1ME2 and LTR/ERVL insertions and sequence deletions (Figure 4E), likely reflecting the >25 million year divergence between gibbons and Old World monkeys.54 In addition, all great apes shared a SINE/AluSx and LINE/L1ME1 sequence deletion, shortening the consensus sequence to ∼2,619 bp (Figure 4E).
In addition to lineage-specific sequence evolution across primates, the most notable difference between the centromeric L5A5 repeats in HLE and the non-centromeric L5A5 across other primates was its copy number. While L5A5 was arrayed ∼577 times per haplotype in HLE, BLAST analysis (Figure 4E) and PCR in HLE (Figure S14; Table S9) confirmed that only one copy per haplotype was present in each of the other primate assemblies, including non-HLE gibbons. Combined with the observation that Cen17 is a Hoolock-specific ENC, these findings suggest a link between the L5A5 composite repeat amplification and the formation of the lineage-specific centromere.
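Counting L5A5 copies in another assembly from BLAST output could look like the sketch below. It assumes tabular output (`-outfmt 6` with the default 12 columns, where columns 3, 7, and 8 are percent identity, query start, and query end) from searching the L5A5 consensus against each assembly. The coverage and identity cutoffs are illustrative assumptions, not the thresholds used in the study, and a diverged copy split into several HSPs would need to be chained before counting.

```python
from collections import Counter


def count_copies(blast_tab: str, query_len: int,
                 min_query_cov: float = 0.8, min_pident: float = 80.0) -> Counter:
    """Count near-full-length hits of a repeat consensus per subject sequence."""
    per_subject: Counter = Counter()
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers (as written by -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        subject = fields[1]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        # Fraction of the consensus spanned by this HSP.
        query_cov = (abs(qend - qstart) + 1) / query_len
        if query_cov >= min_query_cov and pident >= min_pident:
            per_subject[subject] += 1
    return per_subject
```

Summing the returned counter gives the number of near-full-length copies per assembly; short partial hits (for example, isolated L1M or Alu fragments matching one subunit) are excluded by the coverage cutoff.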
HLE breakpoints exhibit distinct genetic and epigenetic features
The availability of a high-quality, contiguous assembly for HLE and the suite of omics data we generated provided a unique opportunity to investigate the genetic and epigenetic mechanisms underlying the karyotypic diversity in gibbons and its possible relationship with rapid centromere evolution. Building on previously reported chromosome mapping2 in gibbons, we expanded our analysis to HLE evolutionary breakpoints (EBRs), which we identified by aligning our cmHooLeu1 assembly against reference assemblies for human (T2T-CHM13), NLE (Asia_NLE_v1), HMO (HMol_V3), and SSY (NHGRI_mSymSyn1-v1.1-hic.freeze_pri).55 Overall, a total of 364 EBRs were identified (123, 92, 74, and 75 EBRs comparing HLE to CHM13, NLE, HMO, and SSY, respectively), with a median size of 40.8 kb, accounting for 202 unique loci (Tables S11 and S12; Figure 5A). Among the 19 centromeres in HLE, 13 (∼68%) overlapped with predicted breakpoints (Figure 5B).
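The bookkeeping behind these counts (merging overlapping EBRs from the four pairwise comparisons into unique loci, and recording which comparisons hit each centromere, as summarized in the UpSet plot in Figure 5B) can be sketched with plain interval logic. This is an illustrative assumption about the procedure, not the authors' pipeline; coordinates are assumed to be 0-based half-open BED intervals on the HLE assembly, and all names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals (like the `bedtools merge` default)."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def centromere_breakpoint_sets(centromeres: Dict[str, Interval],
                               ebrs_by_comparison: Dict[str, List[Interval]]) -> Dict[str, Set[str]]:
    """For each centromere, the set of comparisons with an overlapping EBR (UpSet-style input)."""
    return {
        name: {comp for comp, ebrs in ebrs_by_comparison.items()
               if any(overlaps(cen, ebr) for ebr in ebrs)}
        for name, cen in centromeres.items()
    }
```

The number of unique loci is then `len(merge_intervals(all_ebrs))`, and a centromere counts as overlapping a breakpoint when its set is non-empty.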
Figure 5.
HLE BOSs exhibit distinct genetic and epigenetic features
(A) An ideogram of the assembled HLE chromosomes is shown, with colors corresponding to synteny with human chromosomes (T2T-CHM13) according to the key. To the right of each chromosome, circle markers in differing colors indicate the locations of HLE breaks of synteny (BOSs) relative to the T2T-CHM13, NLE, HMO, and SSY genome assemblies.
(B) An UpSet plot shows BOSs found at each HLE centromere.
(C) The percentage of total repeats in the overall HLE assembly and at BOSs relative to T2T-CHM13, NLE, HMO, and SSY is shown, with each repeat class represented in a different color. SINEs, LAVAs, SST1s, and simple/low-complexity repeats are enriched in BOS regions, while LINEs appear depleted.
(D) Aggregated CpG methylation across LINE/L1Hylobs and LAVAs is depicted as ridge plots for repeats annotated within BOSs relative to CHM13, HMO, NLE, and SSY, as well as for repeats outside BOSs. On average, both L1Hylobs and LAVAs are less methylated in BOSs (highlighted in yellow), with a few exceptions. Specifically, LAVAs in HLE-NLE and HLE-SSY BOSs and L1Hylobs in HLE-HMO BOSs show significant shifts toward lower CpG methylation (p < 0.0001).
(E) Percentage of BOSs covered by segmental duplications is shown, with one dot corresponding to each BOS and black lines indicating the average percentage of bases covered in each category (including BOSs with no coverage). The vertical red line at 0.0071 indicates the coverage of SDs genome wide.
(F) Dot plots (black) and LOESS smoothed curves (blue) show dips in median insulation scores at BOSs. Heatmaps show a reduction in the frequency of genomic interactions around BOSs on a scale from low (blue) to high (red).
(G) A dot plot of minimum insulation score (left) shows that older BOSs (HyA) are more insulated (lower insulation score) than younger (HLE) BOSs (p < 0.0001). On the right, the marginal effect of BOS age on nucleotide diversity is plotted after controlling for other genomic features associated with nucleotide diversity. Older BOSs were found to have significantly lower nucleotide diversity than younger BOSs (p < 0.0001). Error bars show the standard deviation.
Compared to genome-wide repeats in the HLE assembly, breakpoints contain a higher percentage of LAVA elements (averaging 4.62% of EBR repeats vs. 1.39% genome wide), SST1s (averaging 3.18% vs. 0.05%), and SINEs (averaging 33.19% vs. 28.03%) (Figure 5C; Table S13). Similarly, satellites, low-complexity repeats, and simple repeats made up a higher percentage of repeats in EBRs than genome wide (Figure 5C; Table S13). Although still prevalent in breakpoint regions, LINEs were depleted relative to the overall assembly, making up an average of 31.47% of EBR repeats compared to 42.93% of repeats genome wide (Figure 5C; Table S13).
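To put these percentages on a common scale, the ratio of each class's share in EBRs to its genome-wide share can be computed directly from the reported averages. This is simple arithmetic on the numbers quoted above, not a statistical test: LAVAs come out roughly 3.3-fold and SST1s roughly 64-fold over-represented, SINEs about 1.2-fold, while LINEs fall to about 0.73 of their genome-wide share.

```python
import math
from typing import Dict, Tuple

# Average share of repeat bases (%) within EBRs vs. genome wide, as quoted in the text (Table S13).
REPEAT_SHARE: Dict[str, Tuple[float, float]] = {
    "LAVA": (4.62, 1.39),
    "SST1": (3.18, 0.05),
    "SINE": (33.19, 28.03),
    "LINE": (31.47, 42.93),
}


def fold_enrichment(ebr_pct: float, genome_pct: float) -> float:
    """Ratio > 1 means the class is over-represented in EBRs; < 1 means depleted."""
    return ebr_pct / genome_pct


for repeat_class, (ebr_pct, genome_pct) in REPEAT_SHARE.items():
    fold = fold_enrichment(ebr_pct, genome_pct)
    print(f"{repeat_class}: {fold:.2f}-fold (log2 {math.log2(fold):+.2f})")
```

Reporting log2 ratios keeps enrichment and depletion symmetric around zero, which makes the LINE depletion directly comparable to the SINE enrichment.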
Previous work in NLE identified an enrichment of hypomethylated Alus in EBRs,56 suggesting a correlation between epigenetic state and genome stability. CpG methylation has been shown to repress the retrotransposition of TEs in mammals57; therefore, hypomethylation of Alus at breakpoints may lead to higher TE activity and genome recombination in gibbons. Accordingly, we assessed CpG methylation of LINEs, SINEs, and LAVAs within and outside of EBRs using Methylartist58 (Figures 5D and S15; Table S14). We did not detect hypomethylation of Alus at breakpoints; in fact, on average, Alus were more methylated within EBRs than outside (Figure S15; Table S14). However, L1Hylob LINEs were, on average, less methylated at HLE-HMO EBRs than outside of EBRs (p < 0.0001), and LAVA elements in HLE-NLE and HLE-SSY EBRs were less methylated than LAVAs found elsewhere (p < 0.0001) (Figure 5D; Table S14). These results suggest that hypomethylation, and the consequent activity of TEs, may be correlated with genome instability in gibbons, although this does not appear to be restricted to a specific repeat class.59
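The significance test behind these comparisons is not described in this section. As a stand-in (an assumption, not the study's method), the sketch below summarizes each element as the fraction of methylated CpG calls over it and compares elements inside versus outside EBRs with a two-sided permutation test on the difference in means. Function names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def element_methylation(methylated_calls: int, total_calls: int) -> float:
    """Fraction of CpG calls reported as methylated across one repeat element."""
    return methylated_calls / total_calls


def permutation_test(inside: Sequence[float], outside: Sequence[float],
                     n_perm: int = 10000, seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation test on the difference in mean methylation (inside - outside)."""
    rng = random.Random(seed)
    observed = _mean(inside) - _mean(outside)
    pooled = list(inside) + list(outside)
    n_inside = len(inside)
    as_extreme = 0
    for _ in range(n_perm):
        # Randomly reassign elements to "inside" and "outside" and recompute the difference.
        rng.shuffle(pooled)
        diff = _mean(pooled[:n_inside]) - _mean(pooled[n_inside:])
        if abs(diff) >= abs(observed):
            as_extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)
```

With the add-one correction, the smallest attainable p-value is 1/(n_perm + 1), so n_perm must be at least 10,000 before a result of p < 0.0001 can be reported.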
NLE EBRs have been found to be enriched in SDs, suggesting a link between duplicated regions and chromosome reshuffling in gibbons.2,56,60 Of the 364 total EBRs identified in the HLE assembly, 179 (49.18%) overlap with at least one SD (Table S15; Figure 5E). Of these, 95.5% (n = 171) had a higher SD coverage than the HLE assembly as a whole, with an average coverage of 25.85% (compared to 0.71% SD coverage across the total assembly) (Table S15; Figure 5E). These observations provide additional support for a correlation between SDs at breakpoints and karyotype evolution in gibbons.
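Per-EBR SD coverage, as summarized in Figure 5E, can be sketched as the fraction of each breakpoint interval covered by the union of SD intervals, so that overlapping SDs are not double counted. This is an assumed reimplementation for illustration only (BISER, a segmental duplication detector, is listed in the key resources table); the genome-wide baseline of 0.0071 is the value quoted in the Figure 5E legend, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def covered_fraction(region: Interval, sds: Sequence[Interval]) -> float:
    """Fraction of `region` covered by the union of SD intervals (overlaps counted once)."""
    chrom, start, end = region
    # Keep SDs on the same chromosome that overlap the region, clipped to its bounds.
    clipped = sorted((max(s, start), min(e, end))
                     for c, s, e in sds if c == chrom and s < end and e > start)
    covered = 0
    run_start, run_end = None, None
    for s, e in clipped:
        if run_end is None or s > run_end:
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = s, e
        else:
            run_end = max(run_end, e)
    if run_end is not None:
        covered += run_end - run_start
    return covered / (end - start)


def summarize_sd_coverage(ebrs: Sequence[Interval], sds: Sequence[Interval],
                          genome_wide: float = 0.0071) -> Dict[str, float]:
    """Counts of EBRs with any SD and with above-baseline SD coverage, plus mean coverage."""
    fractions: List[float] = [covered_fraction(r, sds) for r in ebrs]
    with_sd = [f for f in fractions if f > 0]
    return {
        "n_with_sd": len(with_sd),
        "n_above_genome_wide": sum(f > genome_wide for f in with_sd),
        "mean_coverage": sum(fractions) / len(fractions),  # includes EBRs with no SD
    }
```

Including zero-coverage EBRs in the mean mirrors the Figure 5E legend; averaging only over SD-overlapping EBRs would give a larger value.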
Finally, several studies report an association between EBRs and spatial chromatin conformation, particularly the boundaries of self-interacting topologically associated domains (TADs).61,62,63 Breakpoints are often colocalized with TAD boundaries and show reduced chromatin interaction across the two sides of the breakpoint. Consistent with these reports, our Omni-C data showed a reduction in chromatin interaction frequency at HLE breakpoints, as well as a decrease in insulation score (a measure of the frequency of interactions passing through any given region of the genome) around EBRs (Figure 5F). HLE EBRs were significantly closer to TAD boundaries and had significantly lower median and minimum insulation scores than random background (p < 0.05, Table S16).
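The insulation score can be illustrated on a toy contact matrix. The sketch below assumes a common sliding-window definition (mean contacts between the `window` bins upstream and the `window` bins downstream of each bin, log2-normalized to the chromosome mean); it is not necessarily how HiCExplorer, listed in the key resources table, computes its boundary scores, and the window size is a free parameter. Lower values mean fewer interactions cross that bin, as expected at a TAD boundary.

```python
import numpy as np


def insulation_scores(contacts: np.ndarray, window: int) -> np.ndarray:
    """log2-normalized insulation score per bin; NaN where the window runs off the matrix."""
    m = np.asarray(contacts, dtype=float)
    n = m.shape[0]
    raw = np.full(n, np.nan)
    for i in range(window, n - window):
        # Mean contacts between the `window` bins upstream and the `window` bins downstream of bin i.
        raw[i] = m[i - window:i, i + 1:i + 1 + window].mean()
    # Normalize to the mean over scorable bins; bins with zero contacts would give -inf.
    return np.log2(raw / np.nanmean(raw))


# Toy matrix: two 10-bin domains with dense internal contacts and sparse contacts between them.
toy = np.ones((20, 20))
toy[:10, :10] = 10
toy[10:, 10:] = 10
scores = insulation_scores(toy, window=3)
```

In this toy matrix the minimum score falls at the junction between the two domains (bins 9 and 10 tie), and bins within `window` of either matrix edge are left as NaN.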
To investigate the temporal dynamics of chromosomal rearrangements and TAD boundary establishment, we used FISH-based chromosome mapping2 to stratify breakpoints into two groups based on their evolutionary context: those shared in the ancestral karyotype (HyA) and those specific to the HLE lineage (Table S12). We find that shared HyA breakpoints, which likely correspond to older evolutionary events, provide stronger insulation for chromatin interactions (p < 0.0001) and are located closer to TAD boundaries (p < 0.05) than younger, Hoolock-specific breakpoints. HyA breakpoints were also associated with significantly lower estimates of nucleotide diversity than HLE breakpoints (Figures 5G, S16, and S17; Table S17). Despite the strong insulation and low diversity of ancestral breakpoints, no other genetic or epigenetic features differed significantly between these breakpoint groups, including breakpoint size, SD coverage, CpG methylation, replication timing, chromatin accessibility, predicted CTCF binding sites, gene content, or repeat composition, with the exception of LAVA, which was slightly enriched in older breakpoints (Figure S18). Combined, these results suggest that older breakpoints are more constrained regardless of DNA content or other epigenetic features and thus represent stronger TAD boundaries.
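Nucleotide diversity (π) is the average number of pairwise differences per site. A minimal sketch, assuming per-site allele counts have already been extracted from population variant calls (bcftools and vcftools appear in the key resources table, but their exact use is not shown here), applies the standard unbiased per-site estimator n/(n − 1)·(1 − Σp²) and divides by the number of callable bases in a window; the callable-base denominator (e.g., from a mappability mask) is an assumption. The regression that controls for other genomic features is not reproduced.

```python
from typing import Sequence


def site_pi(allele_counts: Sequence[int]) -> float:
    """Expected heterozygosity at one site with the n/(n-1) sample-size correction."""
    n = sum(allele_counts)
    if n < 2:
        return 0.0
    homozygosity = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1.0 - homozygosity)


def window_pi(sites: Sequence[Sequence[int]], callable_bp: int) -> float:
    """Mean pairwise differences per callable base; invariant sites contribute zero."""
    return sum(site_pi(counts) for counts in sites) / callable_bp
```

For example, with four sampled chromosomes carrying two copies of each allele, `site_pi([2, 2])` equals 2/3, which matches direct counting: 4 of the 6 possible chromosome pairs differ at that site.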
Discussion
Most DNA sequences involved in essential cellular functions are highly conserved across species, yet centromeric DNA and associated proteins evolve rapidly, indicating that centromeres are specified and maintained through epigenetic mechanisms.64 While the factors driving centromere localization and function have been difficult to elucidate using short-read-based methods, advancements in centromere assembly have now afforded the opportunity to map key epigenetic features that distinguish functional centromeres. In this study, we used long-read sequencing to assemble the eastern hoolock gibbon (HLE) genome and generate five gapless gibbon centromere assemblies, enabling genomic analysis of gibbon centromere content and organization at the level of single chromosomes. We report that HLE centromeres vary in their TEs and satellite composition across chromosomes, differing from the canonical alpha-satellite organization of other ape centromeres.8 Despite this difference, functional HLE centromeres retain a CpG methylation dip (CDR) concomitant with CENP-A nucleosome enrichment, indicating that this epigenetic feature is conserved across ape lineages and independent of alpha-satellites. Additionally, we find that dichotomous compacted and accessible chromatin (dichromatin) is conserved among both human alpha-satellite arrays and the TE-rich centromeres. Thus, the conserved epigenetic structure of primate centromeres includes centromere-specific histones (CENP-A), dichotomous chromatin, and a CDR independent of sequence type (satellite vs. TE vs. composite repeat) or organization (complex vs. HOR vs. array).
The findings described herein are consistent with numerous observations in species across other phyla with TE-rich centromeres and centromere specification that is not dependent on satellite DNA. Such observations have been made in Drosophila,65 several macropodid species66,67,68 and the koala,69 horses and zebras,70 the Atlantic horseshoe crab,71 and many plant species (e.g., Luo et al.,72 Gent et al.,73 and Nagaki et al.74) whose centromeres are made up of centromeric retroelements and satellite sequences. However, our identification of alpha-satellite-free centromeres in HLE is the first example of native TE-rich centromeres in any ape described to date, indicating that alpha-satellites are not a requirement for maintaining native ape centromeres. Of note, SST1 elements were recently found to have the highest enrichment of accessibility peaks compared to other satellites in the human genome,13 and retroelements such as LINEs were found to be embedded within some human alpha-satellite arrays and to act largely as the principal loci of centromeric transcription, defining centromere TE boundaries.25 The enrichment of these repeats, with their high accessibility and transcriptional potential, within and surrounding HLE centromeres suggests a potential role for these elements in centromere specification via transcriptional or epigenetic regulation.
Despite the conservation of the epigenetic structure, we find several features that differentiate gibbon centromeres from their human counterparts. We identified an overlap between an EBR in the PLSCR protein-coding gene cluster and CENP-A enrichment in HLE Cen9, a notable departure from the gene-poor heterochromatic regions that underlie CENP-A-bound regions in humans and many other species. While PLSCR2 is expressed at low levels in humans,75 the lack of detected expression in gibbons, together with its recombined, potentially inactive structure resulting from chromosomal rearrangement and its localization at an HLE centromere, suggests a loss of PLSCR2 function.
DNA replication during the S phase is a highly tuned, dynamically regulated temporal process.76,77 Replication timing domains, functional units of replication origins that initiate synchronously, provide a characteristic and coordinated temporal organization for DNA replication that depends on cellular and developmental status.76,77 Centromeres have been reported to replicate at different times across eukaryotes: in the early S phase across yeast species,78,79,80,81 the mid S phase in maize,82 and the mid-late S phase in humans14,83,84,85,86; however, within a species, the replication timing of centromeres is conserved among chromosomes. Replication stress can be caused by several factors, including chromatin compaction, collisions between transcriptional and replicative machinery, repeats, secondary structures, and histone modifications (reviewed in Zeman and Cimprich87). In fact, most active genes and SINE/Alus are early replicating, while LINEs and centromeres have been reported to be late replicating.87,88 SINE-R, VNTR, and Alu-like (SVA) elements, which share the VNTR and Alu-like components with LAVA, have been shown to cause replication fork stalling, leading to later replication.89 Such stress has been extensively linked to genome instability in mice and humans.50 Here, we found that the replication timing of HLE centromeres differs from that of human centromeres, which replicate in the mid-late S phase.84 Instead, HLE chromosomes contain centromeres with both early- and late-replicating DNA. We postulate that one source of this asynchrony lies in the highly variable repeat composition of HLE centromeres.
Moreover, the two HLE centromeres with early S-phase replication (Cen9 and Cen11) both have genes overlapping or directly upstream of CENP-A enrichment regions, significant upstream hypomethylation concomitant with increased chromatin accessibility, and pericentromeric LAVA expansions composed of TEs with likely opposing replication timing expectations. Such conflicting features may cause replication stress and variation in replication timing, which may, in turn, lead to genome instability. However, whether replication stress is a cause or a consequence of the chromosome instability observed in HLE (and possibly in all gibbons) remains to be determined.
In addition to replicative stress at the centromeres, our analysis of EBRs in the HLE genome suggests an additional mechanism that may contribute to genome instability. CpG methylation is an important regulator of TE activity, suppressing TE expression and subsequent propagation in the genome, which can drive chromosomal rearrangements.57,90 Previous studies in NLE showed that Alu elements at breakpoints were less methylated than those outside of breakpoints, suggesting that these epigenetic differences contribute to TE activity leading to repeat-mediated chromosomal rearrangements.56 While we did not detect similar patterns of Alu hypomethylation at breakpoints in HLE, we find hypomethylation of other TEs, particularly L1Hylobs and LAVAs, at breakpoint regions. Given the high rates of LAVA proliferation in HLE compared to other gibbon genera, it is clear that TEs have undergone unique evolutionary histories since the radiation of gibbons from a common ancestor. Therefore, it is plausible that, similar to Alus in NLE, hypomethylation of L1Hylob and LAVA elements contributed to karyotype diversity in the Hoolock genus. Exploration of other genera is required to determine if this phenomenon is specific to the Hoolock genus or has played a more widespread role in the karyotype evolution of other gibbons.
Finally, we extended our analysis of breakpoints to TADs, chromatin domains that serve as fundamental units of three-dimensional genome organization and are hypothesized to act as gene regulatory units by controlling long-range interactions.91 The conservation of such domains across species has been associated with the conservation of syntenic regions across evolution, and studies have found that chromosomal breakpoints are more common at TAD boundaries than inside TADs.61,63,92,93 These observations have reinforced the hypothesis that TADs are evolutionarily constrained and that chromosomal rearrangements disrupting TADs are negatively selected. We found a reduction in chromatin interaction frequency and insulation scores at breakpoints, concordant with previous findings that breakpoints coincide with TAD boundaries.55,63 Additionally, we find that ancestral breakpoints among gibbons generate stronger interaction insulation and have lower nucleotide diversity than younger breakpoints, despite no other significant differences in genetic and epigenetic features (with the exception of LAVA elements). Collectively, these data suggest a relationship between genome rearrangements and the maintenance of genome topology. Combined with the observations in HLE centromeres described above, repeat-mediated chromosomal rearrangements paired with centromeric replicative stress emerge as potential drivers of the karyotypic variation seen in Hoolock gibbons. Therefore, we propose that an array of interconnected epigenetic and genetic features, rather than a single isolated element, contributes to the genome remodeling observed in gibbons. All features identified within and around HLE centromeres, including hypomethylation, chromatin compaction, TEs, SDs, and satellite arrays, have been linked to genomic instability in other model systems and other gibbon species.
We speculate that these various features each impose distinct replicative stress on the HLE genome, as the finely tuned replication timing program must accommodate centromeric coding genes and their associated transcriptional machinery, extreme variation in methylation and chromatin compaction, and large arrays of tandemly repeated SST1 and LAVA SDs. Continued efforts to produce high-quality genome resources for gibbons promise to unravel the mechanisms dictating their unique chromosome evolution and to provide much-needed genomic information for the conservation management of these endangered apes.
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Rachel J. O’Neill (rachel.oneill@uconn.edu).
Materials availability
This study did not generate new unique reagents.
Data and code availability
The datasets generated during this study are available at the NCBI BioProject: PRJNA1153068 and BioSample: SAMN43386187. The code used during this study is available at Github: https://github.com/gabriellehartley/HLE_Assembly and Zenodo: https://doi.org/10.5281/ZENODO.13324332.
Acknowledgments
R.J.O. and G.A.H. were supported by NIH R01GM123312 and NSF 2217100. L. Carbone is supported by NIH R01HG010333 and P51OD011092. D.M.G. is supported by NIH R01HG007352. The authors would like to acknowledge the Center for Genome Innovation for sequencing resources and the UConn Computational Biology Core for computational resources. The authors would also like to acknowledge Gabriel Mirasol for the gibbon illustrations depicted in Figure 1. Finally, the authors would like to acknowledge the Gibbon Conservation Center in Santa Clarita, California, for their sample support and continued efforts in gibbon conservation.
Author contributions
Conceptualization, G.A.H., L. Carbone, and R.J.O.; data curation, G.A.H. and R.J.O.; formal analysis – genome assembly, G.A.H. and R.J.O.; formal analysis – repeat and segmental duplication annotation, G.A.H., J.M.S., and R.J.O.; formal analysis – fluorescence in situ hybridization, G.A.H. and R.J.O.; formal analysis – alpha-satellite annotation, I.A. and F.R.; formal analysis – substitution rates, N.A.; formal analysis – gene predictions, E.F. and W.T.; formal analysis – breakpoint and TAD analysis, G.A.H., M.O., J.W., and L. Carbone; formal analysis – FIRE, G.A.H., S.N., D.D., and A.S.; sequencing data generation – genome sequencing (ONT), N.P., G.A.H., and R.J.O.; sequencing data generation – genome sequencing (PacBio/Fiber-seq), G.A.H., Y.M., A.S., and R.J.O.; sequencing data generation – genome sequencing (Illumina PCR Free), N.P., G.A.H., and R.J.O.; sequencing data generation – Omni-C, C.M., N.P., and R.J.O.; sequencing data generation – PRO-seq, S.J.H., R.D., D.S., and L. Core; sequencing data generation – Repli-seq, G.A.H., T.S., D.M.G., and R.J.O.; funding acquisition, L. Carbone and R.J.O.; sample acquisition, L. Carbone and R.J.O.; writing – primary, G.A.H. and R.J.O.; writing – review & editing, G.A.H., M.O., S.J.H., E.F., and R.J.O.
Declaration of interests
R.J.O. serves on the scientific advisory board (SAB) of Colossal Biosciences and has been supported to present at ONT events.
STAR★Methods
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | ||
| CENP-A monoclonal antibody (3–19) | Enzo | Cat# ADI-KAM-CC006; RRID: AB_10621614 |
| Purified Mouse Anti- BrdU | BD Pharmingen | Cat# 555627; RRID: AB_10015222 |
| Anti-Mouse IgG (whole molecule) antibody produced in rabbit | Sigma-Aldrich | Cat# M7023; RRID: AB_260634 |
| Chemicals, peptides, and recombinant proteins | ||
| Hia5 MTase | Laboratory of Andrew Stergachis | N/A |
| S-adenosylmethionine | New England Biolabs | Cat# B9003S |
| BrdU | Sigma-Aldrich | Cat# B5002 |
| Propidium iodide | Sigma-Aldrich | Cat# P4170 |
| Critical commercial assays | ||
| PromethION R9.4 flow cells | Oxford Nanopore Technologies | Cat# FLOPRO002 |
| Circulomics Nanobind UHMW DNA Extraction for cultured cells | Circulomics | Cat# EXT-CLU-001 |
| Circulomics Nanobind CBB Big DNA Kit | Circulomics | Cat# NB-900-001-01 |
| Circulomics Nanobind UL Library Prep Kit | Circulomics | Cat# NB-900-601-01 |
| Ultra-Long DNA Sequencing Kit | Oxford Nanopore Technologies | Cat# SQK-ULK001 |
| Illumina DNA PCR-Free Library Prep Kit | Illumina | Cat# 20041794 |
| Dovetail™ Omni-C™ Proximity Ligation Assay | Cantata | N/A |
| Qubit 1X dsDNA HS Assay Kit | ThermoFisher | Cat# Q33230 |
| High Sensitivity D5000 ScreenTapes | Agilent Technologies | Cat# 5067-5592 |
| SMRTbell® prep kit 3.0 | Pacific Biosciences | Cat# 102-141-700 |
| Sage Science PippinHT | Sage Science | Cat# ELF0001 |
| Monarch HMW DNA Extraction Kit for Cells & Blood | New England Biolabs | Cat# T3050L |
| Short Read Eliminator (SRE) XS kit | Pacific Biosciences | Cat# 102-208-200 |
| CUT&RUN Assay Kit | Cell Signaling Technology | Cat# 86652 |
| DNA purification with spin columns kit | Cell Signaling Technology | Cat# 14209 |
| High Sensitivity D1000 ScreenTapes | Agilent Technologies | Cat# 5067-5584 |
| NEBNext Ultra II DNA Library Prep Kit for Illumina | New England Biolabs | Cat# E7645S |
| Norgen RNA purification columns | Norgen Biotek | Cat# 37500 |
| MyOne Streptavidin C1 DynaBeads | Invitrogen | Cat# 65001 |
| NEB Monarch PCR & DNA cleanup kit | New England Biolabs | Cat# T1030L |
| Zymo Quick-DNA Microprep kit | Zymo | Cat# D3021 |
| NEBNext Ultra II FS kit | New England Biolabs | Cat# E7805 |
| NEB Multiplex Oligos for Illumina | New England Biolabs | Cat# E7600S |
| AMPure XP beads | Beckman Coulter | Cat# A63880 |
| Aquarius Whole Chromosome Painting probe | CytoCell | Cat# LPP 20G |
| Deposited data | ||
| HLE Betty standard ONT sequencing | This paper | SRA: SRR30454115 |
| HLE Betty Fiber-sequencing | This paper | SRA: SRR30454114 |
| HLE Betty Omni-C sequencing | This paper | SRA: SRR30454113 |
| HLE Betty CENP-A CUT&RUN sequencing | This paper | SRA: SRR30454112 |
| HLE Betty E/L Repli-seq | This paper | SRA: SRR30454111 |
| HLE Betty PRO-seq | This paper | SRA: SRR30454110 |
| HLE Betty Illumina PCR Free sequencing | This paper | SRA: SRR30454109 |
| HLE Betty ultra-long ONT sequencing | This paper | SRA: SRR30454108 |
| HLE-Betty-RNA Sequencing | Hartley et al.10 | GEO: GSM4891323 |
| Drosophila genome assembly | The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics | RefSeq: GCF_000001215.4 |
| Hoolock leuconedys whole genome sequencing | Okhovat et al.55 | SRA: SRR10075429 |
| Hoolock leuconedys whole genome sequencing | Okhovat et al.55 | SRA: SRR10075430 |
| Hoolock leuconedys whole genome sequencing | Okhovat et al.55 | SRA: SRR10075432 |
| Hoolock leuconedys whole genome sequencing | Okhovat et al.55 | SRA: SRR10075433 |
| Homo sapiens genome assembly | T2T Consortium | GenBank: GCA_009914755.4 |
| Pan paniscus genome assembly | National Human Genome Research Institute, National Institutes of Health | GenBank: GCA_029289425.1 |
| Pan troglodytes genome assembly | National Human Genome Research Institute, National Institutes of Health | GenBank: GCA_028858775.1 |
| Gorilla gorilla genome assembly | National Human Genome Research Institute, National Institutes of Health | GenBank: GCA_029281585.1 |
| Pongo pygmaeus genome assembly | National Human Genome Research Institute, National Institutes of Health | GenBank:GCA_028885625.1 |
| Pongo abelii genome assembly | National Human Genome Research Institute, National Institutes of Health | GenBank: GCA_028885655.1 |
| Hylobates moloch genome assembly | University of California, Santa Cruz | GenBank:GCA_009828535.3 |
| Symphalangus syndactylus genome assembly | National Human Genome Research Institute, National Institutes of Health | GenBank: GCA_028878055.1 |
| Nomascus leucogenys genome assembly | University of Washington | GenBank: GCA_006542625.1 |
| Rhinopithecus roxellana genome assembly | Northwest University | GenBank:GCA_007565055.1 |
| Callithrix jacchus genome assembly | McDonnell Genome Institute at Washington University | GenBank:GCA_009663435.1 |
| Cephalopachus bancanus genome assembly | BGI | GenBank:GCA_027257055.1 |
| Microcebus murinus genome assembly | Baylor College of Medicine and Broad Institute | GenBank: GCA_000165445.2 |
| Experimental models: Cell lines | ||
| HLE lymphoblastoid cell line | Laboratory of Lucia Carbone | Betty |
| Oligonucleotides | ||
| Probe: L5A5, TGGAGATAGCAGATCCATCCCCAAGTCCTCAAT | This paper | N/A |
| Primer: L5A5 Internal, Forward ATGGCTAGGAGGCCAGCATA | This paper | N/A |
| Primer: L5A5 Internal, Reverse TCCCGAGCACTGATGAACTC | This paper | N/A |
| Primer: L5A5 Junction, Forward ACCCCTTCCTATGGCTATATTCT | This paper | N/A |
| Primer: L5A5 Junction, Reverse TCCACGTCACTGAAAACTTCTCT | This paper | N/A |
| Software and algorithms | ||
| Guppy (v5.0.16) | Oxford Nanopore Technologies | https://github.com/nanoporetech/pyguppyclient |
| FastQC (v0.11.7) | Andrews, S.94 | https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |
| cutadapt (v3.5) | Martin, M.95 | https://github.com/marcelm/cutadapt/ |
| Integrated Genomics Viewer | Robinson et al.96 | https://igv.org/ |
| Geneious Prime software (build 2021-07-19 12:20) | Geneious | http://www.geneious.com/ |
| RepeatMasker (v4.1.2-p1) | Smit et al.23 | https://www.repeatmasker.org/ |
| BLAST (v2.7.1) | Altschul et al.97 | https://blast.ncbi.nlm.nih.gov/ |
| Flye (v2.9) | Kolmogorov et al.98 | https://github.com/mikolmogorov/Flye |
| Medaka (v1.4.3) | Oxford Nanopore Technologies | https://github.com/nanoporetech/medaka |
| Burrows-Wheeler Aligner (v0.7.17) | Li et al.99 | https://bio-bwa.sourceforge.net/ |
| Samtools (v1.7) | Li et al.100 | https://www.htslib.org/ |
| Pilon (v1.22) | Walker et al.101 | https://github.com/broadinstitute/pilon |
| minimap2 (v2.15) | Li, H.102 | https://github.com/lh3/minimap2 |
| PURGEhaplotigs (v1.0) | Roach et al.103 | https://bitbucket.org/mroachawri/purge_haplotigs/src/master/ |
| BBMap (v39.08) | Bushnell, B.21 | https://github.com/BioInfoTools/BBMap |
| Pairtools (v0.3.0) | Open2C et al.104 | https://pairtools.readthedocs.io/en/latest/index.html |
| Preseq (v0.1.24) | Smith Lab105 | https://github.com/smithlabcode/preseq |
| Juicer (v1.6) | Durand et al.106 | https://github.com/aidenlab/juicer |
| 3D-DNA (v180922) | Dudchenko et al.107 | https://github.com/aidenlab/3d-dna |
| Juicebox with Assembly Tools (v1.11.08) | Dudchenko et al.108 | https://github.com/aidenlab/Juicebox |
| LASTZ (v1.04.15) | Harris, R.109 | https://github.com/lastz/lastz |
| UCSC GenomeBrowser Kent Tools (v369) | Kent et al.110 | https://github.com/ENCODE-DCC/kentUtils |
| Custom alignment scripts | This paper | https://github.com/carbonelab/lastz-pipeline |
| Emboss (v6.6.0) | Rice et al.111 | http://emboss.open-bio.org/ |
| RagTag (v2.1.0) | Alonge et al.112 | https://github.com/malonge/RagTag |
| TGS-GapCloser (v1.0.1) | Xu et al.113 | https://github.com/BGI-Qingdao/TGS-GapCloser |
| MitoHiFi (v3.2) | Uliano-Silva et al.114 | https://github.com/marcelauliano/MitoHiFi |
| QUAST (v5.0.2) | Gurevich et al.20 | https://quast.sourceforge.net/ |
| BUSCO (v5.0.0) | Simão et al.22 | https://busco.ezlab.org/ |
| Meryl (v1.3) | Rhie et al.19 | https://meryl.readthedocs.io/en/latest/ |
| Merqury (v1.3) | Rhie et al.19 | https://github.com/marbl/merqury |
| GenomeScope2.0 | Vurture et al.115 | https://github.com/tbenavi1/genomescope2.0 |
| Hifiasm (v0.20.0-r639) | Cheng et al.116 | https://github.com/chhylp123/hifiasm |
| Crossmatch search engine (v1.090518) | Gordon et al.117 | http://www.phrap.org/phredphrapconsed.html |
| RMComp.pl | Hoyt et al.25 | https://doi.org/10.5281/zenodo.5895031 |
| Bonito (v2.1.2) | Oxford Nanopore Technologies | https://github.com/nanoporetech/bonito |
| Winnowmap (v2.03) | Jain et al.118 | https://github.com/marbl/Winnowmap |
| Modbam2bed (v0.9.5) | Oxford Nanopore Technologies | https://github.com/epi2me-labs/modbam2bed |
| FLAG | Troy et al.119 | https://www.biorxiv.org/content/10.1101/2023.07.14.548907v3 |
| Jasmine | Pacific Biosciences | https://github.com/PacificBiosciences/jasmine |
| Fibertools (v0.6.2) | Jha et al.120 | https://github.com/fiberseq/fibertools-rs |
| FIRE (v0.1.0) | Vollger et al.121 | https://github.com/fiberseq/FIRE |
| BISER (v1.4) | Išerić et al.43 | https://github.com/0xTCG/biser |
| Bedtools (v2.29.0) | Quinlan et al.122 | https://bedtools.readthedocs.io/en/latest/ |
| Circos (v0.69-9) | Krzywinski et al.123 | https://circos.ca/ |
| NucFreq (v0.1) | Vollger et al.53 | https://github.com/mrvollger/NucFreq |
| StainedGlass (v0.5) | Vollger et al.124 | https://github.com/mrvollger/StainedGlass |
| Methylartist (v1.2.7) | Cheetham et al.58 | https://github.com/adamewing/methylartist |
| Hisat2 (v2.2.1) | Kim et al.125 | https://daehwankimlab.github.io/hisat2/ |
| Bowtie2 (v2.5.0) | Langmead et al.126 | https://github.com/BenLangmead/bowtie2 |
| deepTools (v3.5.0) | Ramírez et al.127 | https://deeptools.readthedocs.io/en/develop/ |
| seqkit (v2.2.0) | Shen et al.128 | https://bioinf.shenwei.me/seqkit/ |
| AS-HMMER-SF | Altemose et al.8 | https://github.com/fedorrik/HumAS-HMMER_for_AnVIL |
| Gepard (v2.1.0) | Krumsiek et al.28 | https://github.com/univieCUBE/gepard |
| GraphPad Prism (v9) | GraphPad | https://www.graphpad.com/ |
| HiCRes | Marchal et al.129 | https://github.com/ClaireMarchal/HiCRes |
| HiCExplorer (v3.7.2) | Ramírez et al.130 | https://hicexplorer.readthedocs.io/en/latest/ |
| Hicpileup | Okhovat et al.55 | https://github.com/carbonelab/hicpileup |
| Homer (v4.11) | Heinz et al.131 | http://homer.ucsd.edu/homer/ |
| sratoolkit (v3.0.0) | NCBI132 | https://github.com/ncbi/sra-tools |
| fastp (v0.23.4) | Chen et al.133 | https://github.com/OpenGene/fastp |
| GenMap (v1.3.0) | Pockrandt et al.134 | https://github.com/cpockrandt/genmap |
| bcftools (v1.20) | Danecek et al.135 | https://github.com/samtools/bcftools |
| vcftools (v0.1.16) | Danecek et al.136 | https://vcftools.sourceforge.net/ |
| GFFUtils (v0.13) | Dale, R.137 | https://daler.github.io/gffutils/ |
| bestNormalize package | Peterson, R.A.138 | https://doi.org/10.32614/CRAN.package.bestNormalize |
| sjPlot package | Lüdecke, D.139 | https://doi.org/10.32614/CRAN.package.sjPlot |
| Other | ||
| Assembly code and documentation | This paper | Zenodo: https://doi.org/10.5281/ZENODO.13324332 |
| Assembly code and documentation | This paper | Github: https://github.com/gabriellehartley/HLE_Assembly |
Experimental model and study participant details
Cells used in this study were derived from a transformed Hoolock leuconedys (HLE) lymphoblastoid cell line (LCL) established from Betty, a female eastern hoolock gibbon residing at the Gibbon Conservation Center in Santa Clarita, CA. Cells were cultured in RPMI-1640 supplemented with 10% fetal bovine serum (FBS), 1% MEM non-essential amino acids, 1% L-glutamine, 1% antibiotic-antimycotic, and 1% sodium pyruvate at 37°C and 5% CO2.
Method details
DNA sequencing
Oxford Nanopore Technologies sequencing
High molecular weight (HMW) genomic DNA was extracted from the cells described above following a modified protocol.140 In brief, cells were lysed in 300 μL lysis buffer (400 mM Tris pH 8.0; 60 mM EDTA pH 8.0; 150 mM NaCl; 1% SDS with 15 μL Puregene Proteinase K (Qiagen)) at 55°C for 2 h with inversion every 30 min. After incubation, an additional 185 μL of lysis buffer was added and the lysate was incubated overnight at 50°C. The lysate was treated with 500 μg/mL RNase A and incubated at 37°C for 20 min. DNA was extracted using an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1), followed by two rounds of chloroform:isoamyl alcohol (24:1). The aqueous layer containing the DNA was collected, and the DNA was precipitated with ethanol, spooled, and washed with 70% ethanol. The DNA was resuspended in nuclease-free water. Library preparation was performed using the Ligation Sequencing Kit (LSK109). HMW DNA was sequenced on the Oxford Nanopore Technologies PromethION platform using R9.4 FLOPRO002 flow cells and basecalled using Guppy (v2.2.3).141 Across five flow cells, a total of 89 Gb passed quality filtering (∼31x coverage). Before assembly, the reads were re-basecalled using Guppy (v5.0.16)141 to improve basecall accuracy.
To isolate ultra-high molecular weight (UHMW) DNA, the Circulomics Nanobind UHMW DNA Extraction for cultured cells (EXT-CLU-001) protocol was followed according to manufacturer’s instructions with the Circulomics Nanobind CBB Big DNA Kit (NB-900-001-01). Library preparation was performed using the Circulomics Nanobind Library Prep protocol for ultra long sequencing (LBP-ULN-001) using the Circulomics Nanobind UL Library Prep Kit (NB-900-601-01) and the Oxford Nanopore Technologies Ultra-Long DNA Sequencing Kit (SQK-ULK001). UHMW DNA was sequenced on the PromethION platform using an R9.4 FLOPRO002 flow cell and basecalled using Guppy (v5.0.16).141 A total of 45 Gb passed quality filtering (∼16x coverage).
Illumina PCR-free sequencing
To generate highly accurate PCR-free Illumina reads for polishing and QV estimation, libraries were prepared with the Illumina DNA PCR-Free Library Prep Tagmentation protocol (#20041794) according to the manufacturer's instructions. The library was sequenced on the Illumina NovaSeq 6000 platform to a depth of ∼50X. Read quality was assessed using FastQC (v0.11.7),94 and reads were trimmed using cutadapt (v3.5)95 with a quality cutoff of 20 (-q 20) and a minimum read length of 50 bp (-m 50) prior to genome polishing.
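The quality trimming step can be illustrated with a minimal Python sketch. It assumes Phred+33 quality strings and mirrors the running-sum 3′ trimming rule that cutadapt documents for `-q` (a BWA-style algorithm). A length filter analogous to `-m 50` follows the trimming. The function names are hypothetical, and this is not cutadapt's own code.

```python
# Sketch of cutadapt-style 3' quality trimming (-q) plus a length filter (-m).
# Assumes Phred+33 encoded quality strings; function names are hypothetical.


def quality_trim_index(qualities, cutoff=20, base=33):
    """Return the position at which to cut off the low-quality 3' tail."""
    running = 0      # running sum of (cutoff - quality) from the 3' end
    best = 0         # largest running sum seen so far
    cut = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - (ord(qualities[i]) - base)
        if running < 0:      # tail is good enough again: stop scanning
            break
        if running > best:   # cutting here removes more low-quality mass
            best = running
            cut = i
    return cut


def trim_read(seq, qual, cutoff=20, min_length=50):
    """Trim the 3' tail; return None if the read falls below min_length."""
    cut = quality_trim_index(qual, cutoff)
    seq, qual = seq[:cut], qual[:cut]
    if len(seq) < min_length:
        return None
    return seq, qual


# "I" encodes Q40 and "#" encodes Q2 in Phred+33.
print(trim_read("ACGTAGG", "IIIII##", min_length=3))  # ('ACGTA', 'IIIII')
```

In this scheme, a run of high-quality bases ends the scan. Only a contiguous low-quality tail is removed, not isolated low-quality bases in the middle of the read.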
Dovetail Omni-C sequencing
To generate Dovetail Omni-C reads, roughly 1.5 million cells were collected from the HLE LCL described above and processed according to the Dovetail Omni-C Proximity Ligation Assay protocol for mammalian samples (v1.4). Lysate quantification was performed using the Qubit dsDNA HS kit for the Qubit Fluorometer and the High Sensitivity D5000 ScreenTape kit (5067–5592) for the Agilent TapeStation 2200. Paired-end sequencing (150 bp) was performed on the Illumina NextSeq 550 V2 platform to a depth of ∼274M reads.
PacBio HiFi sequencing and fiber-seq
To perform Fiber-seq, cells were collected and processed as described previously.15 Briefly, 1-2 million HLE cells were pelleted at 1000 rpm for 5 min, washed once with PBS, and pelleted again at 1000 rpm for 5 min. The cell pellet was resuspended in 60 μL Buffer A (15 mM Tris, pH 8.0; 15 mM NaCl; 60 mM KCl; 1 mM EDTA, pH 8.0; 0.5 mM EGTA, pH 8.0; 0.5 mM spermidine) and 60 μL of cold lysis buffer (0.1% Igepal in Buffer A). The pellet was gently resuspended, then placed on ice for 10 min. The sample was pelleted at 4°C for 5 min at 350 x g, and the supernatant was removed. The nuclei pellet was resuspended in 57.5 μL Buffer A, 1 μL of Hia5 MTase (200 U)19 and 1.5 μL of 32 mM S-adenosylmethionine (NEB B9003S) (0.8 mM final concentration). The sample was incubated at 25°C for 10 min, and 3 μL of 20% SDS was then added to stop the reaction. To generate PacBio libraries, DNA was sheared using the Diagenode Megaruptor 3 system targeting 15-20 kb fragment lengths. A library was created per the manufacturer's instructions using the SMRTbell prep kit 3.0 (102-141-700). A high-pass size selection of the SMRTbell library was performed on the Sage Science PippinHT platform (Sage Science; ELF0001) according to the manufacturer's protocol, using a high-pass cutoff of 10-15 kb to target an average library size of 17-20 kb. The library was run on the PacBio Sequel II system.
To obtain HiFi reads post-assembly for validation, DNA was extracted using the Monarch HMW DNA Extraction Kit for Cells & Blood (T3050L) from New England Biolabs and sheared using the Diagenode Megaruptor 3 system, targeting 18kb fragment lengths. The DNA was size selected using the Pacific Biosciences Short Read Eliminator (SRE) XS kit (102-208-200). A library was created per manufacturer’s instructions using the SMRTbell prep kit 3.0 (102-141-700). One SMRT Cell was run on the PacBio Revio system, obtaining ∼10X coverage.
CENP-A CUT&RUN preparation
The CUT&RUN Assay Kit (#86652) from Cell Signaling Technology was used to assess CENP-A protein-DNA interactions following the manufacturer's instructions. For each condition, 250,000 cells were pelleted and washed in 1X wash buffer (10X wash buffer [#31415], 100X spermidine [#27287], 200X protease inhibitor cocktail [#7012]). Cells were bound to Concanavalin A beads for 5 min at room temperature, then resuspended in 1X binding buffer (100X spermidine, 200X protease inhibitor cocktail, 40X digitonin solution [#16359], and antibody binding buffer [#15338]). To assess CENP-A-DNA interactions, the CENP-A monoclonal antibody (Enzo, ADI-KAM-CC006-E) was used at a 1:50 dilution. Antibodies were bound at 4°C for 2 h, then the beads were washed in digitonin buffer (10X wash buffer, 100X spermidine, 200X protease inhibitor cocktail, and 40X digitonin solution) on a magnetic rack. The beads were resuspended in digitonin buffer with pAG-MNase enzyme (#40366) and incubated at 4°C for 1 h. The beads were then washed in digitonin buffer on a magnetic rack, resuspended in digitonin buffer with calcium chloride, and incubated at 4°C for 30 min. Digestion was stopped with 1X stop buffer (4X stop buffer [#48105], digitonin solution, and 200X RNase A [#7013]). For normalization, 10 pg/μL spike-in DNA [#40366] was added at a 1:100 dilution. The samples were incubated at 37°C for 10 min, and the supernatants were transferred to new microfuge tubes. Samples were incubated at 65°C for 2 h before proceeding to DNA purification. Input chromatin samples were sheared to ∼100–700 bases using a Covaris S2 sonicator prior to purification.
DNA purification was performed using the Cell Signaling Technology DNA Purification with Spin Columns kit (#14209). DNA concentration was assessed using the Qubit dsDNA HS kit for the Qubit Fluorometer and the High Sensitivity D1000 kit (5067–5584) for the Agilent TapeStation 2200. CENP-A and input libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina (#E7645S) and sequenced on the Illumina NovaSeq 6000 (150 bp paired-end reads) to a depth of ∼15M reads.
Precision run-on sequencing (PRO-seq)
At the time of harvest, 8 × 10⁶ cells were collected and pelleted in a swinging-bucket centrifuge at 800 x g for 5 min at 4°C. The medium was aspirated without disturbing the cell pellet, and the cells were washed with 1x PBS, pipetting gently to break up the pellet. After another spin, the PBS was aspirated and 4 mL of cold Chromatin Lysis Buffer (20 mM HEPES pH 7.5, 300 mM NaCl, 0.2 mM EDTA, 1 M urea, and 1% NP-40; final concentrations of 1 mM DTT, 0.01% (w/v) dextran sulfate MW 500,000, and RNase inhibitor were added prior to use) was added directly to the cell pellet and mixed by pipetting. Samples were incubated on ice for 5 min before being transferred to a 5 mL tube and spun in a swinging-bucket rotor at 2,500 x g for 8 min at 4°C. The Chromatin Lysis Buffer was aspirated, and the chromatin pellet was washed twice with 4 mL of cold Chromatin Wash Buffer (150 mM KCl, 10 mM Tris-Cl pH 8, 10% glycerol, 250 mM sucrose, 500 mM betaine; 0.5 mM DTT and RNase inhibitor were added prior to use), mixing with a gentle vortex before spinning at 7,500 x g for 8 min at 4°C. The Chromatin Wash Buffer was then aspirated, 0.5 mL of Buffer F (50 mM Tris-Cl pH 8, 40% glycerol, 1 mM EDTA, 5 mM MgCl2; 1 mM DTT and RNase inhibitor were added prior to use) was added, and the chromatin pellet was transferred to a 1.5 mL tube before being spun at 7,500 x g for 8 min at 4°C. After pelleting, 0.4 mL of Buffer F was removed, and the chromatin pellet was snap-frozen and stored at −80°C.
PRO-seq libraries were generated from biological replicates of chromatin following the protocols described in Mahat et al.24 and Judd et al.,142 with some modifications. Four biotin run-on reactions were carried out in a final volume of 200 μL, and 25,000 permeabilized Drosophila S2 nuclei were added as a spike-in control during the reaction. During the run-on reaction, samples were vigorously pipetted for 60 s after adding the run-on master mix and then incubated at 37°C for 10 min. After the run-on, RNA was extracted using Norgen RNA purification columns (Cat #37500) according to the manufacturer's protocol. Following purification, RNA was base-hydrolyzed for 20 min on ice prior to enrichment of biotinylated nascent RNA using 25 μL MyOne Streptavidin C1 Dynabeads (Invitrogen, Cat #65001). 3′ RNA adapter ligation was carried out off-bead using 15 pmol of adapter, followed by 5′ decapping, 5′ hydroxyl repair, and 5′ RNA adapter ligation performed on beads. After reverse transcription, libraries were pre-amplified for 5 cycles using the following cycling parameters: (1) 95°C for 2 min, (2) 95°C for 30 s, (3) 56°C for 30 s, (4) 72°C for 30 s, (5) repeat steps 2-4 four more times, (6) 72°C for 5 min.24 Test amplifications using serial dilutions of the pre-amplified libraries were then performed to determine the ideal number of cycles for full-scale amplification, and 11 cycles were chosen for both samples. Fully amplified libraries were purified using the NEB Monarch PCR & DNA Cleanup Kit (Cat #T1030L), quantified by Qubit, pooled in an equimolar fashion, and submitted for paired-end sequencing on an Illumina NextSeq 500 at the Center for Genome Innovation (UConn, Storrs, CT). Sequencing reads were partitioned such that read 1 was 44 base pairs (bp) long and read 2 was 40 bp. A total of three sequencing runs were performed to reach the desired sequencing depth of 70 million and 81 million paired-end reads for replicate 1 and replicate 2, respectively.
E/L repli-seq
To assess replication timing in HLE, 5 million HLE suspension cells were labeled with BrdU (Sigma-Aldrich #B5002) to a final concentration of 100 μM. Cells were incubated for 2 h at 37°C to allow for BrdU incorporation, then cells were pelleted at 1000 rpm for 5 min. The supernatant was discarded, and 2.5 mL of cold PBS/1% FBS was added. Cells were gently mixed and transferred to a round bottom tube. 7.5 mL of cold 100% ethanol was added with gentle shaking and stored at −20°C before cell sorting.
Following BrdU labeling, cells were processed according to the previously described E/L Repli-seq protocol,52 with some modifications. In short, nuclei were obtained by pepsin treatment as described in the supplementary methods of that protocol,52 stained with propidium iodide (Sigma-Aldrich, #P4170) to assess cell-cycle phase, and sorted by fluorescence-activated cell sorting (FACS) into early-S and late-S fractions according to DNA content. DNA was prepared from each pool independently using the Zymo Quick-DNA Microprep kit (Zymo, #D3021), and the purified DNA was fragmented and adaptor-ligated with the NEBNext Ultra II FS kit (NEB #E7805) according to the manufacturer's instructions. Finally, adaptor-ligated DNA fragments were immunoprecipitated with a BD Pharmingen Purified Mouse Anti-BrdU antibody (BD, #555627) and an anti-mouse secondary antibody (Sigma-Aldrich #M7023) to capture BrdU-labeled DNA. DNA was indexed with NEB Multiplex Oligos for Illumina (NEB, #E7600S) and purified using AMPure XP beads (Beckman Coulter #A63880). Libraries were sequenced on the Illumina NovaSeq 6000 platform to generate 150 bp paired-end reads.
L5A5 oligo probe design and FISH
To create a consensus sequence for the L5A5 repeat, each of the 24 assembled L5A5 units was manually extracted using IGV96 and imported into Geneious (build 2021-07-19 12:20).143 The 24 sequences were aligned using a Geneious global alignment with free end gaps, a 65% similarity cost matrix, a gap open penalty of 12, a gap extension penalty of 3, and 2 refinement iterations. A consensus sequence was generated from the resulting alignment using a 0% majority threshold (the most common base at each position). The resulting 3,319 bp consensus sequence was annotated using RepeatMasker (v4.1.2-p1).23 BLAST (v2.7.1)97 was used to search for L5A5 repeats elsewhere in the HLE assembly.
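A 0% majority-threshold consensus amounts to taking the most common character in each alignment column. The sketch below makes three assumptions: the aligned sequences are equal-length strings with `-` for gaps, columns where a gap is most common are dropped, and ties go to the first-encountered base. Geneious's own tie and gap handling may differ, and `majority_consensus` is a hypothetical helper.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Most common character per column (a 0% majority threshold)."""
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        # most_common keeps first-encountered order among tied counts
        base, _ = Counter(c.upper() for c in column).most_common(1)[0]
        if base != gap:  # assumption: gap-majority columns are omitted
            consensus.append(base)
    return "".join(consensus)


print(majority_consensus(["ACGT-A", "ACGTTA", "AGGT-A"]))  # ACGTA
```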
To localize the L5A5 repeat on HLE metaphase chromosome spreads, a 33 bp oligo specific to the L5A5 repeat was designed. The oligo was diluted to a concentration of 100 μM, and a 3′ end labeling reaction was set up as follows: 4 μL 5x TdT reaction buffer, 4 μL 25 mM CoCl2, 1 μL 1 mM dig-dUTPs, 1 μL TdT transferase enzyme, 6 μL ddATPs, and 3 μL water (20 μL total reaction). The reaction was incubated at 37°C for 30 min and then stopped using 2 μL 0.2 mM EDTA. DNA FISH was performed on HLE metaphase chromosome spreads as published previously.10 Briefly, prior to slide preparation for hybridization, slides were dehydrated in 100% EtOH and air dried. Slides were treated with 200 μL 0.1 mg/mL RNase A in 2X SSC and incubated at 37°C for 15 min in a humid chamber with a parafilm coverslip, then rinsed 4 times with 2X SSC pH 7.0 for 2 min each. Next, slides were treated with HCl (49.2 mL water + 0.8 mL 12N HCl) for 10 min and rinsed in 2X SSC pH 7.0. Lastly, slides were dehydrated in a 70%, 90%, and 100% EtOH series for 2 min each before air drying. To denature the slides, they were treated with 70% formamide in 2X SSC at 72°C for 2 min, then transferred to a −20°C 70%, 90%, and 100% EtOH series for 2 min each before air drying. Meanwhile, 5 μL of the L5A5 oligo probe was precipitated with 1 μL of 10 μg/μL salmon sperm carrier DNA and 2.5X volume of cold 100% EtOH at −80°C for 40 min. After precipitation, the probe was centrifuged at 13,000 rpm for 25 min. The supernatant was removed, and the probe was resuspended in 12 μL Hybrisol VII (MP Biomedicals). The probe was rehydrated for 1 h and then denatured at 80°C for 5 min. The probe was applied to the slide and incubated overnight under a sealed coverslip at 37°C in a humid chamber. The following day, the coverslip was removed, and slides were washed once for 2 min at 72°C in 0.4X SSC/0.3% NP-40, then once for 1 min at room temperature in 2X SSC/0.1% NP-40.
Subsequently, the slides were rinsed in 0.2% Tween 20/4X SSC and blocked for 30 min at 37°C in 0.2% Tween 20/4X SSC/5% BSA. Slides were rinsed in 0.2% Tween 20/4X SSC, and an anti-Dig fluorophore was applied at a 1:400 dilution at 37°C for 30 min. Slides were rinsed three times in 0.2% Tween 20/4X SSC for 5 min each at 45°C, rinsed in H2O, and serially dehydrated in a 70%, 90%, and 100% EtOH series for 2 min each. After air drying, a DAPI counterstain (diluted 1:5 in Vectashield) was applied to the slides and covered with a cover glass. Slides were imaged on an Olympus AX70 microscope using CytoVision software (Leica Biosystems Richmond, Inc.).
Chromosome painting was carried out on HLE metaphase chromosome spreads using Aquarius Whole Chromosome Painting probes (Cytocell Ltd, LPP 20G) for human chromosome 20. Slides and probes were co-denatured at 72°C for 5 min on a Hybaid in-situ block, and the probes were then hybridized to slides prepared as above in a humid chamber overnight at 37°C. After hybridization, slides were washed in 2X SSC at room temperature to remove the coverslip, then washed in 0.4X SSC at 60°C for 2 min and in 2X SSC/0.5% Tween 20 for 1 min at room temperature. Slides were then rinsed in distilled water, dehydrated in a 70%, 90%, and 100% EtOH series, and counterstained as above with a 1:5 dilution of DAPI in Vectashield (Vector Laboratories, Inc.). As above, images were captured using an Olympus AX70 microscope and CytoVision software (Leica Biosystems Richmond, Inc.).
L5A5 copy number validation with PCR
To validate the L5A5 repeat array identified in HLE and not in other species, primers were designed to target a 774bp internal portion of L5A5 repeats (expected to amplify in both HLE and NLE) and a 774bp portion of the junction between two L5A5 repeats (expected to amplify in only HLE). A PCR reaction was set up according to the following: 1 μL 10 μM F primer, 1 μL 10 μM R primer, 1 μL Taq polymerase, 2.5 μL 2.5 mM dNTPs, 5 μL 10X PCR buffer, 120 ng of HLE or NLE DNA, and H2O to 50 μL. Reactions were cycled in a thermal cycler according to the following: 94°C for 3 min, then 94°C for 30 s, 58°C for 30 s, 72°C for 1 min (repeated for 35 cycles), then 72°C for 5 min. Amplification was visualized on a 1% agarose/1X TBE gel with EtBr, run at 90V for 50 min. The gel was imaged on a BIO RAD GelDoc EZ Imager using the Image Lab (v6.0.1) software.
Quantification and statistical analysis
Genome assembly
Full assembly code can be found on Zenodo144 and on GitHub (https://github.com/gabriellehartley/HLE_Assembly). Flye (v2.9)98 was used to assemble the raw Oxford Nanopore reads using an estimated genome size of 2.9 Gb, the size of the previously assembled Nomascus leucogenys genome. Medaka (v1.4.3)145 was used for long-read polishing with default settings and the r941_prom_sup_g507 model. Publicly available Illumina WGS reads for the same individual (SAMN12702557) were used to polish the assembly. The reads were mapped with the bwa mem algorithm of the Burrows-Wheeler Aligner (v0.7.17)99 and processed using Samtools (v1.7).100 Short-read error correction was performed using Pilon (v1.22)101 with default parameters. Haplotype redundancies and assembly artifacts based on read coverage were removed using minimap2 (v2.15)102 and PURGEhaplotigs (v1.0).103 The reformat.sh module of BBMap (v39.08)21 was used to remove sequences shorter than 3 kb from the assembly.
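The final length filter can be sketched as follows. The sketch assumes that the 3 kb limit means sequences shorter than 3,000 bp are discarded, as with a minimum-length setting in reformat.sh. `read_fasta` and `drop_short_contigs` are hypothetical helpers, not part of BBMap.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []  # keep the ID only
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def drop_short_contigs(records, min_length=3000):
    """Keep only sequences of at least min_length bp (assumed 3 kb cutoff)."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


fasta = [">ctg1 len=3000", "A" * 3000, ">ctg2", "C" * 2999]
print([name for name, _ in drop_short_contigs(read_fasta(fasta))])  # ['ctg1']
```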
Omni-C sequencing quality was assessed using FastQC (v0.11.7),94 and the data were analyzed according to the Dovetail documentation using Dovetail's pre-built environment. Briefly, Burrows-Wheeler Aligner (v0.7.17-r1188)99 and Samtools (v1.9)100 were used to align and process the Omni-C reads against the assembly, using the -5SP and -T0 flags to accommodate independently mapping mate pairs. Pairtools (v0.3.0)104 was used for four steps. First, pairtools parse identified valid ligation events (--min-mapq 40, --walks-policy 5unique, and --max-inter-align-gap 30 flags). Next, pairtools sort sorted the file, and pairtools dedup removed PCR duplicates (--mark-dups and --output-stats flags). Finally, pairtools split split the file (--output-pairs and --output-sam flags). Dovetail's get_qc.py script was used to retrieve key library statistics, and Preseq (v0.1.24)105 was used to estimate library complexity.
Juicer (v1.6)106 and 3D-DNA (v180922)107 were used to scaffold the assembly using the protocols outlined by developers. Juicebox with Assembly Tools (v1.11.08)108 was used for manual review of the produced scaffolds. LASTZ (v1.04.15)109 and UCSC GenomeBrowser Kent Tools (v369)110 were used to align the assembly to the CHM13 v1.1 genome using a custom pipeline (https://github.com/carbonelab/lastz-pipeline). Resulting alignments were validated according to predicted syntenic regions and large-scale chromosome misassemblies and misorientations were manually corrected using Emboss (v6.6.0).111
To reduce any misassemblies associated with manual curation and scaffolding, pre-scaffolded contigs (the “query”) were aligned to the curated assembly chromosome scaffolds (the “reference”) and scaffolded using RagTag (v2.1.0).112 The resulting scaffolded assembly (built from “query” contigs) was gap filled using TGS-GapCloser (v1.0.1).113 The gap filled, final assembly was polished with Illumina reads using Pilon (v.1.22)101 with default parameters.
To assemble the mitochondrial genome, HiFi reads (described above) were assembled using MitoHiFi (v3.2).114
Genome QC
Quality metrics of the assembly were analyzed using QUAST (v5.0.2).20 BUSCO (Benchmarking Universal Single-Copy Orthologs) (v5.0.0; MetaEuk v4.0)22 was used to assess assembly gene completeness using the lineage dataset primates_odb10. To assess k-mer completeness, Meryl (v1.3)19 was used to count 21-mers found in the Illumina PCR-free library, and Merqury (v1.3)19 was used to determine the QV score and estimate completeness of identified 21-mers within the assembly. GenomeScope2.0115 was used to estimate the HLE genome size.
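Merqury's consensus quality value is derived from k-mers that are present in the assembly but absent from the reads. The sketch below follows the formulation in the Merqury paper. With K_asm_only unsupported k-mers out of K_total assembly k-mers, the per-base error rate is E = 1 - (1 - K_asm_only / K_total)^(1/k), and QV = -10 log10(E). The function name is hypothetical; Merqury computes this internally from its Meryl databases.

```python
import math


def merqury_qv(asm_only_kmers, total_asm_kmers, k=21):
    """Consensus QV from k-mers found in the assembly but not in the reads.

    error = 1 - (1 - asm_only / total) ** (1 / k);  QV = -10 * log10(error)
    """
    if total_asm_kmers <= 0:
        raise ValueError("total_asm_kmers must be positive")
    if asm_only_kmers <= 0:
        return math.inf  # no unsupported k-mers: QV is unbounded
    error = 1.0 - (1.0 - asm_only_kmers / total_asm_kmers) ** (1.0 / k)
    return -10.0 * math.log10(error)


# Toy input: 0.1% of assembly 21-mers unsupported by the reads.
print(round(merqury_qv(1_000, 1_000_000), 1))  # 43.2
```

The k-th root reflects the assumption that a single base error corrupts every k-mer that spans it. A given fraction of unsupported 21-mers therefore implies a per-base error rate roughly 21 times smaller.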
Hifiasm validation
To validate our centromere assemblies with an orthogonal approach, we combined standard PacBio HiFi and Fiber-seq reads (both generated on the PacBio platform) and created a HiFi-only assembly using Hifiasm (v0.20.0-r639)116 with default parameters. We aligned both haplotypes produced by Hifiasm to our ONT-based assembly using minimap2 (v2.15)102 and inspected the alignments for errors in our assembly using Integrative Genomics Viewer (IGV).96
Repeat and gene annotations
Repeats in the genome were annotated with RepeatMasker (v4.1.2-p1)23 using the Crossmatch search engine (v1.090518)117 and a combined gibbon (Hylobates sp.) Dfam (v3.6) and Repbase (20181026) repeat library with the “-lib gibbon” flag. Repeats identified in the CHM13 genome not yet included in the gibbon Dfam repeat lineage25 were annotated using a custom repeat library against the draft masked genome. The two repeat annotations were merged using the RMComp.pl pipeline.25 The RepeatMasker summary script buildSummary.pl was used to summarize the percent count and percent base pairs of each repeat. Genes were predicted and annotated using FLAG.119
CpG methylation
CpG methylation was called using Bonito (v2.1.2)146 in raw ONT reads. Raw reads were converted to Fastq format using Samtools (v1.9),100 and mapped to the HLE assembly using Winnowmap (v2.03).118 Modbam2bed (v0.9.5)147 was used to generate viewable aggregated CpG methylation tracks.
FIRE annotations
To call Fiber-seq inferred regulatory elements (FIREs), reads were processed with jasmine using the --keep-kinetics flag, and m6A modifications were predicted with fibertools (v0.6.2).120 The output was passed to the FIRE (v0.1.0) pipeline121 to call FIREs, which were binned into 1 kb windows for visualization.
Segmental duplication annotations
Segmental duplications in the HLE assembly were predicted using BISER (v1.4)43 using default parameters. Next, segmental duplications were filtered using awk to include only predictions that both 1) were >1kb in length and 2) shared >90% ungapped sequence identity. Segmental duplication predictions were overlapped with RepeatMasker LAVA and SST1 annotations using Bedtools (v2.29.0) intersect.122 Segmental duplications were visualized using Circos (v0.69-9).123
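The length and identity filter, plus the interval test behind an intersect, can be sketched as follows. The sketch assumes that each predicted duplication is a pair of copies with a single ungapped identity value and that both copies must exceed 1 kb. The `SegDupPair` field names are hypothetical; BISER's actual output columns may differ, and the original filtering was done with awk.

```python
from dataclasses import dataclass


@dataclass
class SegDupPair:
    """One predicted duplication: two copies plus their ungapped identity.

    Field names are hypothetical; BISER's actual output columns may differ.
    """
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int
    identity: float  # ungapped identity as a fraction between 0 and 1


def passes_filter(sd, min_length=1000, min_identity=0.90):
    """Both copies longer than min_length and identity above min_identity."""
    shorter = min(sd.end_a - sd.start_a, sd.end_b - sd.start_b)
    return shorter > min_length and sd.identity > min_identity


def overlaps(start, end, other_start, other_end):
    """Half-open (BED-style) interval overlap of at least 1 bp."""
    return start < other_end and other_start < end


sd = SegDupPair("chr1", 0, 1500, "chr2", 100, 1300, 0.95)
print(passes_filter(sd), overlaps(sd.start_a, sd.end_a, 1400, 2000))  # True True
```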
CENP-A CUT&RUN analysis and validation
Quality of the CENP-A-enriched and input sequencing reads was assessed using FastQC (v0.11.7).94 Reads were trimmed using Cutadapt (v3.5)95 with a quality cutoff of 20 (-q 20) and a minimum read length of 50 bp (-m 50). Trimmed reads were aligned to the HLE assembly using the Burrows-Wheeler Aligner (BWA) (v0.7.5a-r405)99 with a minimum seed length of 50 bp (-k 50), skipping seeds with more than 1 million occurrences (-c 1000000). Alignments were filtered to remove multi-mappers (-F 2308) using Samtools (v1.9)100 and converted to BED format using Bedtools (v2.29.0).122 Because centromeres are highly repetitive, a marker-assisted filtering approach was implemented to retain only aligned reads that overlapped a unique k-mer in the assembly. Meryl (v1.3)19 was used for three steps: generating a database of 21-mers from the HLE assembly (meryl k=21 count), filtering the database for unique k-mers (meryl equal-to 1), and converting the result to a BED file (meryl-lookup -bed). The overlapSelect module of GenomeBrowser tools (v20180626)110 was used to intersect CENP-A and input alignments with unique 21-mers. The resulting alignments, now containing only reads that overlap a unique 21-mer in the assembly, were converted to bedgraphs using Bedtools (v2.29.0)122 for viewing in IGV.96
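The marker-assisted filter can be sketched in Python on a toy scale. The sketch makes three assumptions. K-mers are counted in canonical form, taking the smaller of the k-mer and its reverse complement as Meryl does. A k-mer is unique if it occurs once across all sequences. A read is retained if it overlaps any unique k-mer by at least 1 bp. The helper names are hypothetical, and a real assembly needs Meryl rather than in-memory counting.

```python
import bisect
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def unique_kmer_starts(assembly, k=21):
    """Map each sequence name to start positions of k-mers seen once genome-wide."""
    counts = Counter()
    per_seq = {}
    for name, seq in assembly.items():
        seq = seq.upper()
        kmers = [canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)]
        per_seq[name] = kmers
        counts.update(kmers)
    return {
        name: [i for i, km in enumerate(kmers) if counts[km] == 1 and "N" not in km]
        for name, kmers in per_seq.items()
    }


def reads_overlapping_markers(reads, marker_starts, k=21):
    """Keep half-open (start, end) reads that overlap at least one unique k-mer."""
    starts = sorted(marker_starts)
    kept = []
    for start, end in reads:
        # first marker [m, m + k) that ends after the read starts (m + k > start)
        i = bisect.bisect_right(starts, start - k)
        if i < len(starts) and starts[i] < end:
            kept.append((start, end))
    return kept


markers = unique_kmer_starts({"chr": "AAAAACGT"}, k=3)
print(markers)  # {'chr': [3]}
print(reads_overlapping_markers([(0, 3), (0, 4), (6, 8)], markers["chr"], k=3))  # [(0, 4)]
```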
To validate the assembled centromeres, read coverage over regions with CENP-A enrichment was analyzed using NucFreq (v0.1).53 Of the 19 assembled chromosomes, 14 had anomalies in read coverage; therefore, the five well-assembled centromere regions were selected for further analysis.
Repeats, genes and methylation in centromeres
To assess the repetitive content across each of the five uncollapsed HLE centromeres, the RepeatMasker annotations were converted to BED format and intersected with regions of CENP-A enrichment using Bedtools (v2.29.0) intersect.122 Overall repeat composition across CENP-A regions was summarized using the buildSummary.pl utility script within the RepeatMasker package.23 Repeat tracks were visualized using Integrative Genomics Viewer (IGV). Sequence similarity was assessed using StainedGlass (v0.5).124 CpG methylation tracks were generated using Modbam2bed (v0.9.5).147 Detailed visualization of CpG methylation across centromeres was performed using the methylartist locus command available in Methylartist (v1.2.7).58 Gene predictions were overlapped with CENP-A regions using Bedtools intersect (v2.29.0),122 and gene ontology was examined using the GO database.148,149
To validate the gene predictions that overlap with CENP-A enrichment on HLE chromosome 9, these regions were aligned to homologous genes from human (T2T CHM13v2.0/hs1), Rhesus macaque (Mmul_10/rheMac10), marmoset (Callithrix_jacchus_cj1700_1.1/calJac4), and NLE (GGSC Nleu3.0/nomLeu3) using Clone Manager Professional 9 software. For the PLSCR1, PLSCR2, and PLSCR2-like genes predicted on HLE chromosome 6 and at HLE Cen9, exons were manually annotated based on homologous exons identified in human, macaque, marmoset, and NLE.
Alignment of RNA-sequencing
Publicly available RNA-sequencing for Betty was mapped to the HLE assembly using Hisat2 (v2.2.1).125 Reads were processed to bam format using Samtools (v1.9)100 and converted to bedgraph format for visualization using Bedtools (v2.29.0) bamtobed and Bedtools (v2.29.0) genomecov.122
PRO-seq analysis
Raw FASTQ files were first trimmed for quality (--nextseq-trim=20) and adapter sequences and then trimmed to a length of 44 bp (--length 44) using cutadapt (v3.5),95 and any remaining reads shorter than 15 bp (-m 15) were discarded. For this study, a three-way quantification of transcription was applied to capture the range of possible transcriptional activity across the genome, following a previously described approach.25
For alignment with Bowtie2 (v2.5.0),126 paired-end reads were mapped to a combined HLE-Drosophila assembly (NCBI RefSeq assembly: GCF_000001215.4) using the default "best match" parameters. Additional parameters prevented the reporting of discordant mate (--no-discordant) and individual mate (--no-mixed) alignments, so only confident alignments were retained. SAM files containing reads mapped to HLE were processed into BED files for the plus and minus strands using Bedtools (v2.29.0).122 These BED files were then used either 1) to count read abundance across repeats with BEDtools or 2) to generate single-nucleotide, 3′-end-only BigWig files110,122 indicating RNA polymerase occupancy. The BigWig files were used for visualization in the UCSC genome browser and for genome-wide heatmaps of genic transcriptional activity with deepTools (v3.5.0).127
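The single-nucleotide 3′-end signal can be illustrated with a short sketch. It assumes that alignments are already oriented in RNA sense as BED-style half-open intervals (chrom, start, end, strand). Under that assumption, the 3′ end is the last base on the plus strand and the first base on the minus strand. The helpers are hypothetical and show only the counting logic, not the exact commands used here.

```python
from collections import Counter


def three_prime_counts(alignments):
    """Count RNA 3' ends per strand from (chrom, start, end, strand) intervals."""
    counts = {"+": Counter(), "-": Counter()}
    for chrom, start, end, strand in alignments:
        # BED is half-open: the last base on "+" is end - 1; on "-" the 3' end is start
        pos = end - 1 if strand == "+" else start
        counts[strand][(chrom, pos)] += 1
    return counts


def to_bedgraph(counter):
    """Emit one single-base bedGraph line per position, sorted by coordinate."""
    return [f"{chrom}\t{pos}\t{pos + 1}\t{n}" for (chrom, pos), n in sorted(counter.items())]


counts = three_prime_counts([("chr1", 10, 20, "+"), ("chr1", 15, 20, "+"), ("chr1", 30, 40, "-")])
print(to_bedgraph(counts["+"]))  # ['chr1\t19\t20\t2']
```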
For alignment with Bowtie (v1.3.1),126 read 1 was reverse complemented using seqkit (v2.2.0)128,150 before being mapped to a combined HLE-Drosophila assembly (NCBI RefSeq assembly: GCF_000001215.4). Mapping used -k 100, which reports up to 100 valid alignments per read, and zero mismatches (i.e., perfect alignments only). Specifying zero mismatches for Bowtie2 (above) was not required, as this is already the default behavior. Following mapping, SAM files containing reads mapped to HLE were processed into BED files for the plus and minus strands using Bedtools (v2.29.0).122 These BED files were then used for 1) unique 21-mer filtering, 2) counting read abundance across repeats with BEDtools, or 3) generating 3′-end BigWig files (Bedtools (v2.29.0),122 GenomeBrowser/20180626) for visualization in the UCSC genome browser.
Single-copy 21-mers were generated from the HLE assembly using Meryl (v1.3).19 BED files of the Bowtie -k 100 mapped paired-end reads were filtered against the Meryl single-copy 21-mers using overlapSelect with the option -overlapBases=21, i.e., the length of the single-copy k-mers.110 This location-based filter retains a read only if the entire length of at least one single-copy k-mer (21 bp in this study) overlaps the read.
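The full-length overlap requirement can be sketched as a containment check: a read is kept only if some single-copy k-mer [m, m + k) lies entirely inside the read interval. The sketch assumes half-open coordinates, reads and k-mer starts on the same sequence, and a sorted list of k-mer start positions. `reads_containing_marker` is a hypothetical helper, not overlapSelect itself.

```python
import bisect


def reads_containing_marker(reads, marker_starts, k=21):
    """Keep half-open (start, end) reads that fully contain a single-copy k-mer."""
    starts = sorted(marker_starts)
    kept = []
    for start, end in reads:
        # The leftmost marker at or after the read start is the only candidate
        # worth testing: if it does not fit, markers further right cannot either.
        i = bisect.bisect_left(starts, start)
        if i < len(starts) and starts[i] + k <= end:
            kept.append((start, end))
    return kept


print(reads_containing_marker([(90, 121), (90, 120), (100, 121)], [100]))
# [(90, 121), (100, 121)]
```

Compared with an any-overlap rule, this stricter criterion discards reads that only graze the edge of a unique k-mer. This matters when most of the read falls in a repeat with many near-identical copies.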
To assess the similarity and variation between the two PRO-seq replicates, a principal component analysis (PCA) plot was generated from position-sorted Bowtie2 (default, "best match") BAM files using deepTools (v3.5.0).127 The Spearman correlation was visualized based on the output of multiBamSummary. After confirming the strong correlation between the two replicates, they were merged for all other plots and visualizations. Heatmaps representing PRO-seq transcriptional profiles of genes were generated with deepTools computeMatrix and plotHeatmap,127 using the parameters --averageTypeBins max, --averageTypeSummaryPlot mean, and --zMax 6. PRO-seq read counts for each repeat class were obtained with BEDtools coverage -counts, which required at least half of the read pair to overlap a given repeat for it to be reported.122
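The replicate comparison rests on a rank correlation of per-bin read counts. The minimal sketch below computes Spearman's rho as the Pearson correlation of average ranks, with tied values sharing the mean of their ranks. It assumes the per-bin counts for both replicates are already available as equal-length lists, for example from multiBamSummary. The helper names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


print(spearman([5, 12, 30, 7], [6, 15, 28, 9]))  # 1.0 (identical bin ordering)
```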
E/L repli-seq processing
Early (E) and late (L) S-phase Repli-seq data were processed according to previously described protocols.52 First, Bedtools (v2.29.0) makewindows122 was used to generate 5 kb windows across the HLE assembly. Reads were aligned with Bowtie2 (v2.5.0)126 using the --no-mixed --no-discordant --reorder -X 1000 flags. Alignments were then processed, sorted, and filtered using Samtools (v1.9)100 to retain those with a MAPQ score greater than 20. Samtools (v1.9) rmdup100 was used to remove duplicate reads, and Bedtools (v2.29.0) bamToBed and Bedtools (v2.29.0) intersect122 were used to intersect the alignments with the 5 kb windows. Coverage was computed using a custom script from the same protocol,52 and the log2 ratio of early to late S-phase coverage was calculated for each 5 kb window. Finally, samples were post-processed using quantile normalization and Loess smoothing with the preprocessCore package. Final bedgraphs were visualized in IGV.
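The windowing, ratio, and normalization steps can be sketched in Python. The sketch makes three assumptions: read counts per 5 kb window are already computed, each library is scaled to reads per million before taking log2(E/L) with a pseudocount of 1, and quantile normalization maps every sample onto the mean of the sorted values (ties are not averaged). Loess smoothing is omitted. These choices may differ from the published custom script, and all names are hypothetical.

```python
import math


def make_windows(chrom_length, size=5000):
    """Half-open windows tiling a chromosome; the last one may be shorter."""
    return [(s, min(s + size, chrom_length)) for s in range(0, chrom_length, size)]


def log2_ratio(early, late, pseudocount=1.0):
    """Per-window log2(E/L) after scaling each library to reads per million."""
    total_e, total_l = sum(early), sum(late)
    return [
        math.log2((e / total_e * 1e6 + pseudocount) / (l / total_l * 1e6 + pseudocount))
        for e, l in zip(early, late)
    ]


def quantile_normalize(samples):
    """Replace each value by the mean of the values at the same rank across samples."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(col[i] for col in sorted_samples) / len(samples) for i in range(n)]
    normalized_all = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        normalized_all.append(normalized)
    return normalized_all


print(make_windows(12000))  # [(0, 5000), (5000, 10000), (10000, 12000)]
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```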
Alpha-satellite subfamily (SF) annotation
Alpha-satellite SFs were annotated using the AS-HMMER-SF tool,8,29 which uses hidden Markov models (HMMs) to assign each alpha-satellite monomer to one of the 18 suprachromosomal families discovered in ape genomes. To compare HLE alpha-satellites with SSY alpha-satellites, a dot matrix was built using Gepard (v2.1.0)28 with a word length of 50 and a window size of 0. The sequences and annotations of SSY centromeres were published previously.29,151
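A word-based dot matrix marks every pair of positions at which two sequences share an identical word of a fixed length. The sketch indexes the words of one sequence and looks up each word of the other. It assumes exact forward-strand matches only and no windowed smoothing. It is a simplified stand-in to illustrate the word-length parameter, not Gepard's suffix-array implementation, and `word_matches` is a hypothetical helper.

```python
from collections import defaultdict


def word_matches(seq_a, seq_b, word=50):
    """(i, j) pairs where seq_a[i:i+word] == seq_b[j:j+word] (forward strand only)."""
    index = defaultdict(list)
    for j in range(len(seq_b) - word + 1):
        index[seq_b[j:j + word]].append(j)
    hits = []
    for i in range(len(seq_a) - word + 1):
        # .get avoids inserting empty lists into the defaultdict
        for j in index.get(seq_a[i:i + word], ()):
            hits.append((i, j))
    return hits


print(word_matches("ACGTAC", "GTACGT", word=4))  # [(0, 2), (2, 0)]
```

Plotting the hits as points gives the dot plot. Long diagonal runs indicate shared sequence, and a 50 bp word length suppresses short chance matches between the two alpha-satellite sets.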
L5A5 copy number estimation across primates
To estimate the copy number of L5A5 repeats in the HLE genome, we used the k-mer-based approach developed to estimate rDNA copy number in the T2T-CHM13 genome assembly.12 First, 21-mers in the HLE PCR-free Illumina reads were counted using Meryl (v1.3),19 and 21-mer multiplicity (the number of times a 21-mer was found in the read set) was plotted against the number of distinct 21-mers found at that multiplicity. The average 21-mer multiplicity was 46, corresponding to an average sequencing depth of 46X. Next, the coverage of 21-mers from the Illumina PCR-free read set was determined across the L5A5 consensus sequence using Meryl (v1.3).19 These counts were divided by 23X, the estimated read coverage per haplotype, to obtain the diploid copy number of L5A5 repeats. The median was ∼1,154 total copies in the genome, or ∼577 copies per haplotype.
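The arithmetic can be sketched as follows. We assume the average multiplicity is read off as the peak of the genome-wide k-mer histogram (ignoring low-multiplicity error k-mers, with a cutoff of 5 chosen arbitrarily), halve it to get per-haplotype depth, and divide each consensus k-mer's count by that depth before taking the median. The histogram and counts are hypothetical, chosen so the median reproduces the reported figure (1,154 × 23 = 26,542).

```python
from statistics import median
from typing import Dict, Sequence, Tuple


def peak_multiplicity(histogram: Dict[int, int], min_multiplicity: int = 5) -> int:
    """Multiplicity with the most distinct k-mers, ignoring low-count (error) k-mers."""
    candidates = [m for m in histogram if m >= min_multiplicity]
    return max(candidates, key=lambda m: histogram[m])


def copy_number(kmer_counts: Sequence[int], haploid_depth: float) -> Tuple[float, float]:
    """Median copy number over the consensus: (diploid total, per haplotype)."""
    diploid = median(c / haploid_depth for c in kmer_counts)
    return diploid, diploid / 2


# Hypothetical histogram: multiplicity -> number of distinct 21-mers.
hist = {1: 1000, 2: 300, 45: 80, 46: 95, 47: 70}
depth = peak_multiplicity(hist)  # 46 (diploid depth)
print(copy_number([26000, 26542, 27000], depth / 2))  # → (1154.0, 577.0)
```

Taking the median over all k-mers of the consensus makes the estimate robust to k-mers that are shared with other repeats or depleted by sequence divergence.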
To assess the presence of L5A5 repeats in other primates, BLASTN (v2.7.1)97 was used to search for L5A5 composite repeats in 13 additional primates. The start and stop coordinates of the identified repeats were curated using Geneious (build 2021-07-19 12:20),143 and the sequences were aligned to the HLE consensus using a Geneious alignment with default high-sensitivity settings.
Evaluation of breakpoints
Breakpoints relative to T2T-CHM13, NLE, HMO, and SSY were identified in the HLE assembly from pairwise alignments between the HLE assembly and each of the other genomes, generated with LASTZ (v1.04.15).109 Filtering and chaining were performed with the UCSC Genome Browser Kent tools (v369).110 A custom Python script, axtToSyn,152 was used to detect synteny blocks and breakpoints, using a minimum alignment length of 1,000 bp and a minimum alignment score of 100,000. The breakpoints reported by axtToSyn, each given as the 1,000 bp flanking it on the left and on the right, were collapsed into a single interval per breakpoint. Breakpoints and syntenic blocks were plotted using RIdeogram.153 Identified breakpoints were intersected with RepeatMasker annotations using Bedtools (v2.29.0) intersect and summarized using the RepeatMasker utility script buildSummary.pl.23
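axtToSyn's internal logic is not described here, so the following is a hypothetical, simplified illustration of the general idea rather than a reimplementation. Alignments passing the stated length (1,000 bp) and score (100,000) thresholds are merged into blocks while consecutive alignments along an HLE chromosome hit the same target chromosome and strand; each junction between adjacent blocks becomes a breakpoint whose left and right 1,000 bp flanks are collapsed into one interval. Gap limits and target-order checks that a real tool would need are omitted, and the alignments are invented.

```python
from typing import List, NamedTuple, Tuple


class Alignment(NamedTuple):
    q_chrom: str  # HLE chromosome
    q_start: int
    q_end: int
    t_chrom: str  # chromosome in the other genome
    strand: str   # "+" or "-"
    score: int


class Block(NamedTuple):
    chrom: str
    start: int
    end: int
    target: Tuple[str, str]  # (t_chrom, strand) shared by all merged alignments


def synteny_blocks(alns: List[Alignment], min_len: int = 1000,
                   min_score: int = 100_000) -> List[Block]:
    """Filter alignments, then merge neighbours hitting the same target chromosome and strand."""
    kept = [a for a in alns if a.q_end - a.q_start >= min_len and a.score >= min_score]
    kept.sort(key=lambda a: (a.q_chrom, a.q_start))
    blocks: List[Block] = []
    for a in kept:
        target = (a.t_chrom, a.strand)
        if blocks and blocks[-1].chrom == a.q_chrom and blocks[-1].target == target:
            blocks[-1] = blocks[-1]._replace(end=max(blocks[-1].end, a.q_end))
        else:
            blocks.append(Block(a.q_chrom, a.q_start, a.q_end, target))
    return blocks


def breakpoints(blocks: List[Block], flank: int = 1000) -> List[Tuple[str, int, int]]:
    """One interval per junction between adjacent blocks, padded by `flank` bp on each side."""
    out = []
    for prev, nxt in zip(blocks, blocks[1:]):
        if prev.chrom == nxt.chrom:
            out.append((prev.chrom, max(0, prev.end - flank), nxt.start + flank))
    return out


alns = [
    Alignment("chr1", 0, 50_000, "chr7", "+", 200_000),
    Alignment("chr1", 60_000, 90_000, "chr7", "+", 150_000),
    Alignment("chr1", 100_000, 150_000, "chr2", "-", 300_000),
    Alignment("chr1", 155_000, 155_500, "chr9", "+", 500_000),  # too short: filtered
]
print(breakpoints(synteny_blocks(alns)))  # → [('chr1', 89000, 101000)]
```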
To analyze methylation of repeats at breakpoints, intersected repeats were filtered to retain L1Hylob, L1M, and L1P elements (LINEs), AluJ, AluS, and AluY elements (SINEs), and LAVA elements. Repeats found within each breakpoint were concatenated, and Bedtools (v2.29.0) subtract122 was used to remove breakpoint-associated TEs from the genome-wide TE set, yielding five categories: breakpoints to T2T-CHM13, breakpoints to HMO, breakpoints to NLE, breakpoints to SSY, and TEs found outside of breakpoints. Methylartist (v1.2.7) segmeth58 was used to output aggregated CpG methylation calls over the intervals of each repeat, and Methylartist (v1.2.7) segplot58 was then used to create ridge plots for each repeat type. The aggregated methylation calls from segmeth were averaged to obtain the mean methylation per repeat type per breakpoint category. Descriptive statistics were calculated using GraphPad Prism (v9).
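The final averaging step can be sketched as follows, assuming each repeat interval has already been reduced to counts of methylated and unmethylated CpG calls. The record layout is hypothetical and is not the segmeth output format. Each element's methylated fraction is computed first, and fractions are then averaged within each repeat type and breakpoint category; elements without any calls are skipped.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (repeat_type, category, methylated_calls, unmethylated_calls)
Record = Tuple[str, str, int, int]


def mean_methylation(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Average per-element CpG methylation fraction per (repeat type, category)."""
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for repeat_type, category, meth, unmeth in records:
        total = meth + unmeth
        if total == 0:
            continue  # no informative CpG calls for this element
        key = (repeat_type, category)
        sums[key] += meth / total
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


records = [
    ("AluY", "breakpoints to SSY", 8, 2),
    ("AluY", "breakpoints to SSY", 6, 4),
    ("AluY", "outside breakpoints", 9, 1),
    ("LAVA", "breakpoints to NLE", 0, 0),
]
print(mean_methylation(records))
```

Averaging per-element fractions weights every element equally; pooling raw calls instead would let long, CpG-rich elements dominate each category.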
To analyze the segmental duplication (SD) content of breakpoint regions, SD calls from BISER (v1.4)43 were intersected with breakpoints using Bedtools (v2.29.0) intersect.122 SD intersection statistics were calculated using GraphPad Prism (v9). To distinguish older (HyA) from younger (HLE) breakpoints, we retained the HLE-CHM13 breakpoints corresponding to boundaries between the syntenic blocks identified by Capozzi et al.;2 other breakpoints were excluded from the analysis for lack of age evidence. The retained breakpoints were classified as present in the gibbon ancestor (HyA) or absent from it (HLE). The two age classes were compared by intersecting various data types with breakpoints using Bedtools (v2.29.0) intersect,122 and statistical comparisons were performed using the Mann-Whitney U test.
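For reference, a minimal two-sided Mann-Whitney U test using the large-sample normal approximation (tie-corrected variance, 0.5 continuity correction) is sketched below. It is not necessarily the implementation used in the study, an exact test is preferable for very small groups, and the input values are hypothetical rather than measured HyA/HLE data.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value) via the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # every value identical
    z = max(abs(u1 - n1 * n2 / 2) - 0.5, 0.0) / math.sqrt(variance)
    return u1, math.erfc(z / math.sqrt(2))  # erfc(z/sqrt2) = 2 * (1 - Phi(z))


# Hypothetical SD coverage fractions for older (HyA) vs. younger (HLE) breakpoints.
print(mann_whitney_u([0.12, 0.40, 0.33, 0.05], [0.01, 0.05, 0.00, 0.02]))
```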
Primary processing of Omni-C data
Raw Omni-C data were aligned to the HLE reference genome using bwa (v0.7.17)99 with the settings mem -5SP -T0 -t16. After alignment, the parse module of pairtools (v1.0.2)104 was used (--min-mapq 40 --walks-policy 5unique --max-inter-align-gap 30) to identify ligation junctions, and the outputs were sorted with the pairtools sort module using default settings. Next, the dedup module was used to detect and remove PCR duplicates. To generate BAM files for downstream analysis, we used pairtools to split, sort, and index the .pairsam file with default settings. We estimated the resolution of the Omni-C dataset using HiCRes,129 and used cooler (v0.9.3)154 to convert the indexed output file into a single-resolution .cool matrix (10 kb bins) and a multi-resolution .mcool matrix (resolutions from 10 kb to 10.24 Mb). Output files were compressed using bgzip from the samtools suite.135 TAD boundary annotations and genome-wide insulation scores were generated with HiCExplorer (v3.7.2)130 at 10 kb resolution (--minDepth 100000 --maxDepth 600000).
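To illustrate what an insulation score captures, the sketch below computes a simple diamond-window score on a toy contact matrix. For each bin it averages contacts between the w bins upstream and the w bins downstream, then takes log2 of that value relative to the mean over all scored bins, so TAD boundaries appear as local minima. To our understanding, HiCExplorer's hicFindTADs instead computes a z-score-based TAD-separation score averaged over several window sizes between --minDepth and --maxDepth, so this is a conceptual stand-in, not a reimplementation.

```python
import math
from typing import List, Optional, Sequence


def insulation_scores(matrix: Sequence[Sequence[float]], w: int) -> List[Optional[float]]:
    """log2(diamond mean / mean of all diamonds) per bin; None near the matrix edges."""
    n = len(matrix)
    raw: List[Optional[float]] = []
    for i in range(n):
        if i - w + 1 < 0 or i + w >= n:
            raw.append(None)  # diamond would run off the matrix edge
            continue
        # Contacts between bins i-w+1..i (upstream) and i+1..i+w (downstream).
        total = sum(matrix[a][b]
                    for a in range(i - w + 1, i + 1)
                    for b in range(i + 1, i + w + 1))
        raw.append(total / (w * w))
    valid = [r for r in raw if r is not None]
    mean = sum(valid) / len(valid)
    return [math.log2(r / mean) if r is not None and r > 0 else None for r in raw]


# Toy 6-bin matrix with two domains (bins 0-2 and 3-5): strong contacts within, weak between.
toy = [[10.0 if (a < 3) == (b < 3) else 1.0 for b in range(6)] for a in range(6)]
print(insulation_scores(toy, w=2))  # minimum (-2.0) at bin 2, just before the boundary
```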
Genome conformation at EBRs
We used previously published55 custom scripts (GitHub: https://github.com/carbonelab/hicpileup) to visualize the median insulation score in 600Mb windows centered on the synteny breakpoints identified against CHM13, NLE, HMO, and SSY. The same scripts were used to visualize aggregate Hi-C contact frequencies in 2Mb windows centered on the synteny breakpoints. To further explore the relationship between chromatin interactions and breakpoint age, we determined for each breakpoint the minimum and median insulation scores and the distance to the closest TAD boundary, using bedtools and custom shell and R scripts. Bedtools shuffle (v2.31.1) (-chrom -noOverlapping) was used to generate a shuffled version of each set of breakpoints, and two-tailed Wilcoxon rank-sum tests were used to compare the distance to the nearest boundary and the minimum and median insulation scores between observed and shuffled breakpoints.
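The per-breakpoint boundary distance and the shuffled null set can be sketched as below. The shuffle mimics the intent of bedtools shuffle -chrom -noOverlapping (each interval stays on its own chromosome, and shuffled intervals do not overlap one another) but is not its implementation. Coordinates and chromosome sizes are hypothetical; observed and shuffled distance distributions would then be compared with a two-sided rank-sum test.

```python
import bisect
import random
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[str, int, int]


def nearest_boundary_distance(pos: int, boundaries: Sequence[int]) -> int:
    """Distance from pos to the closest boundary; `boundaries` must be sorted and non-empty."""
    i = bisect.bisect_left(boundaries, pos)
    candidates = []
    if i < len(boundaries):
        candidates.append(boundaries[i] - pos)  # first boundary at or after pos
    if i > 0:
        candidates.append(pos - boundaries[i - 1])  # last boundary before pos
    return min(candidates)


def shuffle_within_chromosome(intervals: Sequence[Interval], chrom_sizes: Dict[str, int],
                              rng: random.Random, max_tries: int = 10_000) -> List[Interval]:
    """Move each interval to a random spot on its own chromosome, keeping its length
    and avoiding overlap with intervals already placed."""
    placed: List[Interval] = []
    for chrom, start, end in intervals:
        length = end - start
        for _ in range(max_tries):
            s = rng.randrange(0, chrom_sizes[chrom] - length + 1)
            e = s + length
            if all(c != chrom or e <= ps or s >= pe for c, ps, pe in placed):
                placed.append((chrom, s, e))
                break
        else:
            raise RuntimeError(f"could not place interval on {chrom}")
    return placed


rng = random.Random(0)
bps = [("chr1", 1000, 3000), ("chr1", 5000, 6000), ("chr2", 200, 700)]
sizes = {"chr1": 20000, "chr2": 5000}
print(shuffle_within_chromosome(bps, sizes, rng))
```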
CTCF binding motif annotation at EBRs
We used the findMotifsGenome.pl script (-size given) from HOMER (v4.11)131 to predict all CTCF binding motifs within the sequences corresponding to all breaks of synteny (BOS).
Nucleotide diversity analysis
All paired-end Illumina data for Hoolock leuconedys were retrieved from NCBI using fastq-dump from the SRA Toolkit (v3.0.0)132 (SRA: SRR10075429, SRR10075430, SRR10075432, and SRR10075433), and adapters were then trimmed from the paired-end reads using fastp (v0.23.4).133
The genome assembly FASTA was indexed using bwa index,155 and each sample was then aligned with bwa mem and sorted within the Sentieon workflow under default parameters. We then applied Sentieon’s LocusCollector algorithm to the BAM alignments to collect read information, followed by Sentieon’s Dedup algorithm to mark and remove PCR duplicates. Next, we used Sentieon’s Haplotyper algorithm, with emit_mode gvcf and default filters, to produce a gVCF file for each sample. This was followed by joint genotyping with the GVCFTyper algorithm under default filtering settings.
We then generated a mappability filter using GenMap (v1.3.0) and retained only sites that map uniquely to the genome with a k-mer size of 150.134 In addition, we removed sites with mapping quality <40 or quality scores <20 using bcftools (v1.20).135 We then assessed sample depth, missingness, and heterozygosity using vcftools (v0.1.16) and computed the average nucleotide diversity (π) in 100 kb windows.136 Using GFFUtils (v0.13),137 we converted the GTF annotation file into BED format, and used a series of custom scripts to calculate the distance from the midpoint of each window to the nearest breakpoint and to the centromere. We also calculated the percentage of each window covered by annotated genes, segmental duplications, and repeats overall, as well as by SST1, LAVA, L1P, L1M, L1Hb, AluY, AluS, AluJ, and AluJb elements specifically.
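Per-window nucleotide diversity can be illustrated from biallelic allele counts. At a site with j alternate alleles among n sampled chromosomes, the probability that two chromosomes drawn without replacement differ is 2j(n − j)/(n(n − 1)); window π sums this over sites and divides by the window length. Dividing by the full window length is a simplifying assumption here (to our understanding it matches the vcftools --window-pi default), and the site records are hypothetical. The window midpoint used for the breakpoint and centromere distances is taken as the window start plus half its width.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (chrom, 1-based position, alt allele count, sampled chromosomes)
Site = Tuple[str, int, int, int]


def site_pi(alt_count: int, n_chrom: int) -> float:
    """Probability that two chromosomes drawn without replacement differ at this site."""
    if n_chrom < 2:
        return 0.0
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def windowed_pi(sites: Iterable[Site], window: int = 100_000) -> Dict[Tuple[str, int], float]:
    """Sum site diversity into fixed windows keyed by (chrom, 0-based window start)."""
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    for chrom, pos, alt, n in sites:
        start = (pos - 1) // window * window
        sums[(chrom, start)] += site_pi(alt, n)
    return {key: total / window for key, total in sums.items()}


def window_midpoint(start: int, window: int = 100_000) -> float:
    return start + window / 2


# Toy example with 100 bp windows and 8 sampled chromosomes per site.
print(windowed_pi([("chr1", 10, 1, 8), ("chr1", 20, 4, 8), ("chr1", 150, 1, 8)], window=100))
```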
In R,156 we fit a linear model with the variables described above as predictors and nucleotide diversity as the response variable, including the following interaction effects: distance to the nearest breakpoint ∗ breakpoint age, and minimum insulation score of the nearest TAD boundary ∗ distance to the closest TAD boundary ∗ distance to the nearest breakpoint. Because the residuals were not normally distributed, we applied an ordered quantile normalization transformation to the nucleotide diversity values with the bestNormalize package.138 We then used the backward step function from the car package157 to select the best-fitting model and evaluated interaction effects using the plot_model function (type = “eff”) from the sjPlot package.139
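Ordered quantile normalization replaces each value by the standard-normal quantile of its rank, which removes skew while preserving order. A minimal sketch, assuming the (rank − 0.5)/n plotting position that, to our understanding, bestNormalize's orderNorm applies to the fitted data; its interpolation for new values is omitted, and the π values are hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def order_norm(values: Sequence[float]) -> List[float]:
    """Map each value to the standard-normal quantile of (rank - 0.5) / n."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


# Hypothetical right-skewed per-window pi values become symmetric normal scores.
pi_values = [0.0021, 0.0004, 0.0011, 0.0150]
print([round(z, 3) for z in order_norm(pi_values)])
```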
Additional resources
Full assembly code is available on Zenodo144 and on GitHub: https://github.com/gabriellehartley/HLE_Assembly.
Published: March 14, 2025
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.xgen.2025.100808.
Supplemental information
References
- 1. Carbone L., Harris R.A., Gnerre S., Veeramah K.R., Lorente-Galdos B., Huddleston J., Meyer T.J., Herrero J., Roos C., Aken B., et al. Gibbon genome and the fast karyotype evolution of small apes. Nature. 2014;513:195–201. doi: 10.1038/nature13679.
- 2. Capozzi O., Carbone L., Stanyon R.R., Marra A., Yang F., Whelan C.W., de Jong P.J., Rocchi M., Archidiacono N. A comprehensive molecular cytogenetic analysis of chromosome rearrangements in gibbons. Genome Res. 2012;22:2520–2528. doi: 10.1101/gr.138651.112.
- 3. Misceo D., Capozzi O., Roberto R., Dell’oglio M.P., Rocchi M., Stanyon R., Archidiacono N. Tracking the complex flow of chromosome rearrangements from the Hominoidea Ancestor to extant Hylobates and Nomascus Gibbons by high-resolution synteny mapping. Genome Res. 2008;18:1530–1537. doi: 10.1101/gr.078295.108.
- 4. Mittermeier R.A., Wilson D.E., Rylands A.B. Handbook of the Mammals of the World: Primates. Lynx Edicions; 2013.
- 5. Thinh V.N., Mootnick A.R., Geissmann T., Li M., Ziegler T., Agil M., Moisson P., Nadler T., Walter L., Roos C. Mitochondrial evidence for multiple radiations in the evolutionary history of small apes. BMC Evol. Biol. 2010;10:74. doi: 10.1186/1471-2148-10-74.
- 6. Willard H.F. Chromosome-specific organization of human alpha satellite DNA. Am. J. Hum. Genet. 1985;37:524–532.
- 7. Alexandrov I.A., Medvedev L.I., Mashkova T.D., Kisselev L.L., Romanova L.Y., Yurov Y.B. Definition of a new alpha satellite suprachromosomal family characterized by monomeric organization. Nucleic Acids Res. 1993;21:2209–2215. doi: 10.1093/nar/21.9.2209.
- 8. Altemose N., Logsdon G.A., Bzikadze A.V., Sidhwani P., Langley S.A., Caldas G.V., Hoyt S.J., Uralsky L., Ryabov F.D., Shew C.J., et al. Complete genomic and epigenetic maps of human centromeres. Science. 2022;376. doi: 10.1126/science.abl4178.
- 9. Carbone L., Harris R.A., Mootnick A.R., Milosavljevic A., Martin D.I.K., Rocchi M., Capozzi O., Archidiacono N., Konkel M.K., Walker J.A., et al. Centromere remodeling in Hoolock leuconedys (Hylobatidae) by a new transposable element unique to the gibbons. Genome Biol. Evol. 2012;4:648–658. doi: 10.1093/gbe/evs048.
- 10. Hartley G.A., Okhovat M., O’Neill R.J., Carbone L. Comparative Analyses of Gibbon Centromeres Reveal Dynamic Genus-Specific Shifts in Repeat Composition. Mol. Biol. Evol. 2021;38:3972–3992. doi: 10.1093/molbev/msab148.
- 11. Veeramah K.R., Woerner A.E., Johnstone L., Gut I., Gut M., Marques-Bonet T., Carbone L., Wall J.D., Hammer M.F. Examining phylogenetic relationships among gibbon genera using whole genome sequence data using an approximate bayesian computation approach. Genetics. 2015;200:295–308. doi: 10.1534/genetics.115.174425.
- 12. Nurk S., Koren S., Rhie A., Rautiainen M., Bzikadze A.V., Mikheenko A., Vollger M.R., Altemose N., Uralsky L., Gershman A., et al. The complete sequence of a human genome. Science. 2022;376:44–53. doi: 10.1126/science.abj6987.
- 13. Gershman A., Sauria M.E.G., Guitart X., Vollger M.R., Hook P.W., Hoyt S.J., Jain M., Shumate A., Razaghi R., Koren S., et al. Epigenetic patterns in a complete human genome. Science. 2022;376. doi: 10.1126/science.abj5089.
- 14. Massey D.J., Koren A. Telomere-to-telomere human DNA replication timing profiles. Sci. Rep. 2022;12:9560. doi: 10.1038/s41598-022-13638-8.
- 15. Dubocanin D., Hartley G.A., Cortes A.E.S., Mao Y., Hedouin S., Ranchalis J., Agarwal A., Logsdon G.A., Munson K.M., Real T., et al. Conservation of dichromatin organization along regional centromeres. Preprint at bioRxiv. 2024. doi: 10.1101/2023.04.20.537689.
- 16. Meyer T.J., Held U., Nevonen K.A., Klawitter S., Pirzer T., Carbone L., Schumann G.G. The Flow of the Gibbon LAVA Element Is Facilitated by the LINE-1 Retrotransposition Machinery. Genome Biol. Evol. 2016;8:3209–3225. doi: 10.1093/gbe/evw224.
- 17. Okhovat M., Nevonen K.A., Davis B.A., Michener P., Ward S., Milhaven M., Harshman L., Sohota A., Fernandes J.D., Salama S.R., et al. Co-option of the lineage-specific LAVA retrotransposon in the gibbon genome. Proc. Natl. Acad. Sci. USA. 2020;117:19328–19338. doi: 10.1073/pnas.2006038117.
- 18. Seppey M., Manni M., Zdobnov E.M. BUSCO: Assessing Genome Assembly and Annotation Completeness. Methods Mol. Biol. 2019;1962:227–245. doi: 10.1007/978-1-4939-9173-0_14.
- 19. Rhie A., Walenz B.P., Koren S., Phillippy A.M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21:245. doi: 10.1186/s13059-020-02134-9.
- 20. Gurevich A., Saveliev V., Vyahhi N., Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–1075. doi: 10.1093/bioinformatics/btt086.
- 21. Bushnell B. BBMap: A Fast, Accurate, Splice-Aware Aligner. 2014.
- 22. Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
- 23. Smit A.F.A., Hubley R., Green P. RepeatMasker: RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences (RepeatMasker Open-4.0). 2021.
- 24. Mahat D.B., Kwak H., Booth G.T., Jonkers I.H., Danko C.G., Patel R.K., Waters C.T., Munson K., Core L.J., Lis J.T. Base-pair-resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq). Nat. Protoc. 2016;11:1455–1476. doi: 10.1038/nprot.2016.086.
- 25. Hoyt S.J., Storer J.M., Hartley G.A., Grady P.G.S., Gershman A., de Lima L.G., Limouse C., Halabian R., Wojenski L., Rodriguez M., et al. From telomere to telomere: The transcriptional and epigenetic state of human repeat elements. Science. 2022;376. doi: 10.1126/science.abk3112.
- 26. Warburton P.E., Hasson D., Guillem F., Lescale C., Jin X., Abrusan G. Analysis of the largest tandemly repeated DNA families in the human genome. BMC Genom. 2008;9:533. doi: 10.1186/1471-2164-9-533.
- 27. Lin C.C., Sasi R., Lee C., Fan Y.S., Court D. Isolation and identification of a novel tandemly repeated DNA sequence in the centromeric region of human chromosome 8. Chromosoma. 1993;102:333–339. doi: 10.1007/BF00661276.
- 28. Krumsiek J., Arnold R., Rattei T. Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics. 2007;23:1026–1028. doi: 10.1093/bioinformatics/btm039.
- 29. Makova K.D., Pickett B.D., Harris R.S., Hartley G.A., Cechova M., Pal K., Nurk S., Yoo D., Li Q., Hebbar P., et al. The complete sequence and comparative analysis of ape sex chromosomes. Nature. 2024;630:401–411. doi: 10.1038/s41586-024-07473-2.
- 30. Logsdon G.A., Rozanski A.N., Ryabov F., Potapova T., Shepelev V.A., Catacchio C.R., Porubsky D., Mao Y., Yoo D., Rautiainen M., et al. The variation and evolution of complete human centromeres. Nature. 2024;629:136–145. doi: 10.1038/s41586-024-07278-3.
- 31. Miga K.H., Koren S., Rhie A., Vollger M.R., Gershman A., Bzikadze A., Brooks S., Howe E., Porubsky D., Logsdon G.A., et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature. 2020;585:79–84. doi: 10.1038/s41586-020-2547-7.
- 32. Barra V., Fachinetti D. The dark side of centromeres: types, causes and consequences of structural abnormalities implicating centromeric DNA. Nat. Commun. 2018;9:4340. doi: 10.1038/s41467-018-06545-y.
- 33. Sahu S.K., Aradhyam G.K., Gummadi S.N. Calcium binding studies of peptides of human phospholipid scramblases 1 to 4 suggest that scramblases are new class of calcium binding proteins in the cell. Biochim. Biophys. Acta. 2009;1790:1274–1281. doi: 10.1016/j.bbagen.2009.06.008.
- 34. Shaw C.J., Bi W., Lupski J.R. Genetic Proof of Unequal Meiotic Crossovers in Reciprocal Deletion and Duplication of 17p11.2. Am. J. Hum. Genet. 2002;71:1072–1081. doi: 10.1086/344346.
- 35. Sharp A.J., Locke D.P., McGrath S.D., Cheng Z., Bailey J.A., Vallente R.U., Pertz L.M., Clark R.A., Schwartz S., Segraves R., et al. Segmental Duplications and Copy-Number Variation in the Human Genome. Am. J. Hum. Genet. 2005;77:78–88. doi: 10.1086/431652.
- 36. Cooper G.M., Coe B.P., Girirajan S., Rosenfeld J.A., Vu T.H., Baker C., Williams C., Stalker H., Hamid R., Hannig V., et al. A copy number variation morbidity map of developmental delay. Nat. Genet. 2011;43:838–846. doi: 10.1038/ng.909.
- 37. Ji Y., Eichler E.E., Schwartz S., Nicholls R.D. Structure of Chromosomal Duplicons and their Role in Mediating Human Genomic Disorders. Genome Res. 2000;10:597–610. doi: 10.1101/gr.10.5.597.
- 38. Bailey J.A., Yavor A.M., Massa H.F., Trask B.J., Eichler E.E. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 2001;11:1005–1017. doi: 10.1101/gr.187101.
- 39. Armengol L., Pujana M.A., Cheung J., Scherer S.W., Estivill X. Enrichment of segmental duplications in regions of breaks of synteny between the human and mouse genomes suggest their involvement in evolutionary rearrangements. Hum. Mol. Genet. 2003;12:2201–2208. doi: 10.1093/hmg/ddg223.
- 40. Locke D.P., Archidiacono N., Misceo D., Cardone M.F., Deschamps S., Roe B., Rocchi M., Eichler E.E. Refinement of a chimpanzee pericentric inversion breakpoint to a segmental duplication cluster. Genome Biol. 2003;4:R50. doi: 10.1186/gb-2003-4-8-r50.
- 41. Locke D.P., Segraves R., Carbone L., Archidiacono N., Albertson D.G., Pinkel D., Eichler E.E. Large-Scale Variation Among Human and Great Ape Genomes Determined by Array Comparative Genomic Hybridization. Genome Res. 2003;13:347–357. doi: 10.1101/gr.1003303.
- 42. Samonte R.V., Eichler E.E. Segmental duplications and the evolution of the primate genome. Nat. Rev. Genet. 2002;3:65–72. doi: 10.1038/nrg705.
- 43. Išerić H., Alkan C., Hach F., Numanagić I. Fast characterization of segmental duplication structure in multiple genome assemblies. Algorithm Mol. Biol. 2022;17:1–15. doi: 10.1186/s13015-022-00210-2.
- 44. Vollger M.R., Guitart X., Dishuck P.C., Mercuri L., Harvey W.T., Gershman A., Diekhans M., Sulovari A., Munson K.M., Lewis A.P., et al. Segmental duplications and their variation in a complete human genome. Science. 2022;376. doi: 10.1126/science.abj6965.
- 45. Chang D.J., Cimprich K.A. DNA damage tolerance: when it’s OK to make mistakes. Nat. Chem. Biol. 2009;5:82–90. doi: 10.1038/nchembio.139.
- 46. De S., Michor F. DNA replication timing and long-range DNA interactions predict mutational landscapes of cancer genomes. Nat. Biotechnol. 2011;29:1103–1108. doi: 10.1038/nbt.2030.
- 47. Fritz A., Sinha S., Marella N., Berezney R. Alterations in replication timing of cancer-related genes in malignant human breast cancer cells. J. Cell. Biochem. 2013;114:1074–1083. doi: 10.1002/jcb.24447.
- 48. Du Q., Bert S.A., Armstrong N.J., Caldon C.E., Song J.Z., Nair S.S., Gould C.M., Luu P.-L., Peters T., Khoury A., et al. Replication timing and epigenome remodelling are associated with the nature of chromosomal rearrangements in cancer. Nat. Commun. 2019;10. doi: 10.1038/s41467-019-08302-1.
- 49. Gan W., Guan Z., Liu J., Gui T., Shen K., Manley J.L., Li X. R-loop-mediated genomic instability is caused by impairment of replication fork progression. Genes Dev. 2011;25:2041–2056. doi: 10.1101/gad.17010011.
- 50. Smith L., Plug A., Thayer M. Delayed replication timing leads to delayed mitotic chromosome condensation and chromosomal instability of chromosome translocations. Proc. Natl. Acad. Sci. USA. 2001;98:13300–13305. doi: 10.1073/pnas.241355098.
- 51. Yaffe E., Farkash-Amar S., Polten A., Yakhini Z., Tanay A., Simon I. Comparative Analysis of DNA Replication Timing Reveals Conserved Large-Scale Chromosomal Architecture. PLoS Genet. 2010;6. doi: 10.1371/journal.pgen.1001011.
- 52. Marchal C., Sasaki T., Vera D., Wilson K., Sima J., Rivera-Mulia J.C., Trevilla-García C., Nogues C., Nafie E., Gilbert D.M. Genome-wide analysis of replication timing by next-generation sequencing with E/L Repli-seq. Nat. Protoc. 2018;13:819–839. doi: 10.1038/nprot.2017.148.
- 53. Vollger M.R., Dishuck P.C., Sorensen M., Welch A.E., Dang V., Dougherty M.L., Graves-Lindsay T.A., Wilson R.K., Chaisson M.J.P., Eichler E.E. Long-read sequence and assembly of segmental duplications. Nat. Methods. 2019;16:88–94. doi: 10.1038/s41592-018-0236-3.
- 54. Kumar S., Suleski M., Craig J.M., Kasprowicz A.E. TimeTree 5: An Expanded Resource for Species Divergence Times. Mol. Biol. Evol. 2022;39. doi: 10.1093/molbev/msac174.
- 55. Okhovat M., VanCampen J., Nevonen K.A., Harshman L., Li W., Layman C.E., Ward S., Herrera J., Wells J., Sheng R.R., et al. TAD evolutionary and functional characterization reveals diversity in mammalian TAD boundary properties and function. Nat. Commun. 2023;14:8111. doi: 10.1038/s41467-023-43841-8.
- 56. Carbone L., Harris R.A., Vessere G.M., Mootnick A.R., Humphray S., Rogers J., Kim S.K., Wall J.D., Martin D., Jurka J., et al. Evolutionary breakpoints in the gibbon suggest association between cytosine methylation and karyotype evolution. PLoS Genet. 2009;5. doi: 10.1371/journal.pgen.1000538.
- 57. Yoder J.A., Walsh C.P., Bestor T.H. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 1997;13:335–340. doi: 10.1016/s0168-9525(97)01181-5.
- 58. Cheetham S.W., Kindlova M., Ewing A.D. Methylartist: tools for visualizing modified bases from nanopore sequence data. Bioinformatics. 2022;38:3109–3112. doi: 10.1093/bioinformatics/btac292.
- 59. Brown J.D., O’Neill R.J. The mysteries of chromosome evolution in gibbons: methylation is a prime suspect. PLoS Genet. 2009;5. doi: 10.1371/journal.pgen.1000501.
- 60. Carbone L., Vessere G.M., ten Hallers B.F.H., Zhu B., Osoegawa K., Mootnick A., Kofler A., Wienberg J., Rogers J., Humphray S., et al. A high-resolution map of synteny disruptions in gibbon and human genomes. PLoS Genet. 2006;2. doi: 10.1371/journal.pgen.0020223.
- 61. Krefting J., Andrade-Navarro M.A., Ibn-Salem J. Evolutionary stability of topologically associating domains is associated with conserved gene regulation. BMC Biol. 2018;16:87. doi: 10.1186/s12915-018-0556-x.
- 62. Li X., Wang J., Yu Y., Li G., Wang J., Li C., Zeng Z., Li N., Zhang Z., Dong Q., et al. Genomic rearrangements and evolutionary changes in 3D chromatin topologies in the cotton tribe (Gossypieae). BMC Biol. 2023;21:56. doi: 10.1186/s12915-023-01560-y.
- 63. Lazar N.H., Nevonen K.A., O’Connell B., McCann C., O’Neill R.J., Green R.E., Meyer T.J., Okhovat M., Carbone L. Epigenetic maintenance of topological domains in the highly rearranged gibbon genome. Genome Res. 2018;28:983–997. doi: 10.1101/gr.233874.117.
- 64. Henikoff S., Ahmad K., Malik H.S. The centromere paradox: stable inheritance with rapidly evolving DNA. Science. 2001;293:1098–1102. doi: 10.1126/science.1062939.
- 65. Chang C.-H., Chavan A., Palladino J., Wei X., Martins N.M.C., Santinello B., Chen C.-C., Erceg J., Beliveau B.J., Wu C.-T., et al. Islands of retroelements are major components of Drosophila centromeres. PLoS Biol. 2019;17. doi: 10.1371/journal.pbio.3000241.
- 66. Ferreri G.C., Marzelli M., Rens W., O’Neill R.J. A centromere-specific retroviral element associated with breaks of synteny in macropodine marsupials. Cytogenet. Genome Res. 2004;107:115–118. doi: 10.1159/000079580.
- 67. Longo M.S., Carone D.M., NISC Comparative Sequencing Program, Green E.D., O’Neill M.J., O’Neill R.J. Distinct retroelement classes define evolutionary breakpoints demarcating sites of evolutionary novelty. BMC Genom. 2009;10:334. doi: 10.1186/1471-2164-10-334.
- 68. Renfree M.B., Papenfuss A.T., Deakin J.E., Lindsay J., Heider T., Belov K., Rens W., Waters P.D., Pharo E.A., Shaw G., et al. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development. Genome Biol. 2011;12:R81. doi: 10.1186/gb-2011-12-8-r81.
- 69. Johnson R.N., O’Meally D., Chen Z., Etherington G.J., Ho S.Y.W., Nash W.J., Grueber C.E., Cheng Y., Whittington C.M., Dennison S., et al. Adaptation and conservation insights from the koala genome. Nat. Genet. 2018;50:1102–1111. doi: 10.1038/s41588-018-0153-5.
- 70. Piras F.M., Nergadze S.G., Magnani E., Bertoni L., Attolini C., Khoriauli L., Raimondi E., Giulotto E. Uncoupling of satellite DNA and centromeric function in the genus Equus. PLoS Genet. 2010;6. doi: 10.1371/journal.pgen.1000845.
- 71. Castellano K.R., Neitzey M.L., Starovoitov A., Barrett G.A., Reid N.M., Vuruputoor V.S., Webster C.N., Storer J.M., Pauloski N.R., Ameral N.J., et al. Genome Assembly of a Living Fossil, the Atlantic Horseshoe Crab Limulus Polyphemus, Reveals Lineage-Specific Whole Genome Duplications, Transposable Element-Based Centromeres and a ZW Sex Chromosome System. Mol. Biol. Evol. 2025;42. doi: 10.1093/molbev/msaf021.
- 72. Luo S., Mach J., Abramson B., Ramirez R., Schurr R., Barone P., Copenhaver G., Folkerts O. The cotton centromere contains a Ty3-gypsy-like LTR retroelement. PLoS One. 2012;7. doi: 10.1371/journal.pone.0035261.
- 73. Gent J.I., Wang N., Dawe R.K. Stable centromere positioning in diverse sequence contexts of complex and satellite centromeres of maize and wild relatives. Genome Biol. 2017;18:121. doi: 10.1186/s13059-017-1249-4.
- 74. Nagaki K., Cheng Z., Ouyang S., Talbert P.B., Kim M., Jones K.M., Henikoff S., Buell C.R., Jiang J. Sequencing of a rice centromere uncovers active genes. Nat. Genet. 2004;36:138–145. doi: 10.1038/ng1289.
- 75. Sayers E.W., Bolton E.E., Brister J.R., Canese K., Chan J., Comeau D.C., Connor R., Funk K., Kelly C., Kim S., et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2021;50:D20–D26. doi: 10.1093/nar/gkab1112.
- 76. Fragkos M., Ganier O., Coulombe P., Méchali M. DNA replication origin activation in space and time. Nat. Rev. Mol. Cell Biol. 2015;16:360–374. doi: 10.1038/nrm4002.
- 77. Fu H., Baris A., Aladjem M.I. Replication timing and nuclear structure. Curr. Opin. Cell Biol. 2018;52:43–50. doi: 10.1016/j.ceb.2018.01.004.
- 78. Raghuraman M.K., Winzeler E.A., Collingwood D., Hunt S., Wodicka L., Conway A., Lockhart D.J., Davis R.W., Brewer B.J., Fangman W.L. Replication dynamics of the yeast genome. Science. 2001;294:115–121. doi: 10.1126/science.294.5540.115.
- 79. Kim S.-M., Dubey D.D., Huberman J.A. Early-replicating heterochromatin. Genes Dev. 2003;17:330–335. doi: 10.1101/gad.1046203.
- 80. Kim S.M., Huberman J.A. Regulation of replication timing in fission yeast. EMBO J. 2001;20:6115–6126. doi: 10.1093/emboj/20.21.6115.
- 81. Kaplan D.L. The Initiation of DNA Replication in Eukaryotes. Springer; 2016.
- 82. Wear E.E., Song J., Zynda G.J., LeBlanc C., Lee T.-J., Mickelson-Young L., Concia L., Mulvaney P., Szymanski E.S., Allen G.C., et al. Genomic analysis of the DNA replication timing program during mitotic S phase in maize (Zea mays) root tips. Plant Cell. 2017;29:2126–2149. doi: 10.1105/tpc.17.00037.
- 83. Massey D.J., Kim D., Brooks K.E., Smolka M.B., Koren A. Next-Generation Sequencing Enables Spatiotemporal Resolution of Human Centromere Replication Timing. Genes. 2019;10. doi: 10.3390/genes10040269.
- 84. Ahmad K., Henikoff S. Centromeres are specialized replication domains in heterochromatin. J. Cell Biol. 2001;153:101–110. doi: 10.1083/jcb.153.1.101.
- 85. Ten Hagen K.G., Gilbert D.M., Willard H.F., Cohen S.N. Replication timing of DNA sequences associated with human centromeres and telomeres. Mol. Cell Biol. 1990;10:6348–6355. doi: 10.1128/mcb.10.12.6348.
- 86. Erliandri I., Fu H., Nakano M., Kim J.-H., Miga K.H., Liskovykh M., Earnshaw W.C., Masumoto H., Kouprina N., Aladjem M.I., Larionov V. Replication of alpha-satellite DNA arrays in endogenous human centromeric regions and in human artificial chromosome. Nucleic Acids Res. 2014;42:11502–11516. doi: 10.1093/nar/gku835.
- 87. Zeman M.K., Cimprich K.A. Causes and consequences of replication stress. Nat. Cell Biol. 2014;16:2–9. doi: 10.1038/ncb2897.
- 88. Natale F., Scholl A., Rapp A., Yu W., Rausch C., Cardoso M.C. DNA replication and repair kinetics of Alu, LINE-1 and satellite III genomic repetitive elements. Epigenetics Chromatin. 2018;11:61. doi: 10.1186/s13072-018-0226-9.
- 89. Bacolla A., Ye Z., Ahmed Z., Tainer J.A. Cancer mutational burden is shaped by G4 DNA, replication stress and mitochondrial dysfunction. Prog. Biophys. Mol. Biol. 2019;147:47–61. doi: 10.1016/j.pbiomolbio.2019.03.004.
- 90. Bourc’his D., Bestor T.H. Meiotic catastrophe and retrotransposon reactivation in male germ cells lacking Dnmt3L. Nature. 2004;431:96–99. doi: 10.1038/nature02886.
- 91. Dixon J.R., Selvaraj S., Yue F., Kim A., Li Y., Shen Y., Hu M., Liu J.S., Ren B. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
- 92. Gilbertson S.E., Walter H.C., Gardner K., Wren S.N., Vahedi G., Weinmann A.S. Topologically associating domains are disrupted by evolutionary genome rearrangements forming species-specific enhancer connections in mice and humans. Cell Rep. 2022;39. doi: 10.1016/j.celrep.2022.110769.
- 93.Liao Y., Zhang X., Chakraborty M., Emerson J.J. opologically associating domains and their role in the evolution of genome structure and function in Drosophila. Genome Res. 2021;31:397–410. doi: 10.1101/gr.266130.120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94. Andrews S. FastQC: A Quality Control Tool for High Throughput Sequence Data. Babraham Bioinformatics; 2010.
- 95. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17:10–12.
- 96. Robinson J.T., Thorvaldsdóttir H., Winckler W., Guttman M., Lander E.S., Getz G., Mesirov J.P. Integrative genomics viewer. Nat. Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754.
- 97. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 98. Kolmogorov M., Yuan J., Lin Y., Pevzner P.A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 2019;37:540–546. doi: 10.1038/s41587-019-0072-8.
- 99. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 100. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 101. Walker B.J., Abeel T., Shea T., Priest M., Abouelliel A., Sakthikumar S., Cuomo C.A., Zeng Q., Wortman J., Young S.K., Earl A.M. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9. doi: 10.1371/journal.pone.0112963.
- 102. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.
- 103. Roach M.J., Schmidt S.A., Borneman A.R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinf. 2018;19:460. doi: 10.1186/s12859-018-2485-7.
- 104. Open2C, Abdennur N., Fudenberg G., Flyamer I.M., Galitsyna A.A., Goloborodko A., Imakaev M., Venev S.V. Pairtools: from sequencing data to chromosome contacts. bioRxiv. 2023. Preprint. doi: 10.1101/2023.02.13.528389.
- 105. preseq: Software for predicting library complexity and genome coverage in high-throughput sequencing. GitHub. https://github.com/smithlabcode/preseq.
- 106. Durand N.C., Shamim M.S., Machol I., Rao S.S.P., Huntley M.H., Lander E.S., Aiden E.L. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.
- 107. Dudchenko O., Batra S.S., Omer A.D., Nyquist S.K., Hoeger M., Durand N.C., Shamim M.S., Machol I., Lander E.S., Aiden A.P., Aiden E.L. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.
- 108. Dudchenko O., Shamim M.S., Batra S.S., Durand N.C., Musial N.T., Mostofa R., Pham M., Glenn St Hilaire B., Yao W., Stamenova E., et al. The Juicebox Assembly Tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000. bioRxiv. 2018. Preprint. doi: 10.1101/254797.
- 109. Harris R.S. Improved Pairwise Alignment of Genomic DNA. Thesis; 2007.
- 110. Kent W.J., Sugnet C.W., Furey T.S., Roskin K.M., Pringle T.H., Zahler A.M., Haussler D. The human genome browser at UCSC. Genome Res. 2002;12:996–1006. doi: 10.1101/gr.229102.
- 111. Rice P., Longden I., Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000;16:276–277. doi: 10.1016/s0168-9525(00)02024-2.
- 112. Alonge M., Lebeigle L., Kirsche M., Jenike K., Ou S., Aganezov S., Wang X., Lippman Z.B., Schatz M.C., Soyk S. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 2022;23:258. doi: 10.1186/s13059-022-02823-7.
- 113. Xu M., Guo L., Gu S., Wang O., Zhang R., Peters B.A., Fan G., Liu X., Xu X., Deng L., Zhang Y. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. GigaScience. 2020;9. doi: 10.1093/gigascience/giaa094.
- 114. Uliano-Silva M., Ferreira J.G.R.N., Krasheninnikova K., Darwin Tree of Life Consortium, Formenti G., Abueg L., Torrance J., Myers E.W., Durbin R., Blaxter M., McCarthy S.A. MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads. BMC Bioinf. 2023;24:288. doi: 10.1186/s12859-023-05385-y.
- 115. Vurture G.W., Sedlazeck F.J., Nattestad M., Underwood C.J., Fang H., Gurtowski J., Schatz M.C. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33:2202–2204. doi: 10.1093/bioinformatics/btx153.
- 116. Cheng H., Asri M., Lucas J., Koren S., Li H. Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Nat. Methods. 2024;21:967–970. doi: 10.1038/s41592-024-02269-8.
- 117. Gordon D., Green P. Consed: a graphical editor for next-generation sequencing. Bioinformatics. 2013;29:2936–2937. doi: 10.1093/bioinformatics/btt515.
- 118. Jain C., Rhie A., Zhang H., Chu C., Walenz B.P., Koren S., Phillippy A.M. Weighted minimizer sampling improves long read mapping. Bioinformatics. 2020;36:i111–i118. doi: 10.1093/bioinformatics/btaa435.
- 119. Troy W., Damas J., Titus A.J., Cantarel B.L. Find, Label, Annotate Genomes: FLAG, a fully automated tool for structural and functional gene annotation. bioRxiv. 2024. Preprint. doi: 10.1101/2023.07.14.548907.
- 120. Jha A., Bohaczuk S.C., Mao Y., Ranchalis J., Mallory B.J., Min A.T., Hamm M.O., Swanson E., Dubocanin D., Finkbeiner C., et al. DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools. Genome Res. 2024;34:1976–1986. doi: 10.1101/gr.279095.124.
- 121. Vollger M.R., Swanson E.G., Neph S.J., Ranchalis J., Munson K.M., Ho C.-H., Sedeño-Cortés A.E., Fondrie W.E., Bohaczuk S.C., Mao Y., et al. A haplotype-resolved view of human gene regulation. bioRxiv. 2024. Preprint. doi: 10.1101/2024.06.14.599122.
- 122. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 123. Krzywinski M., Schein J., Birol I., Connors J., Gascoyne R., Horsman D., Jones S.J., Marra M.A. Circos: An information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645. doi: 10.1101/gr.092759.109.
- 124. Vollger M.R., Kerpedjiev P., Phillippy A.M., Eichler E.E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics. 2022;38:2049–2051. doi: 10.1093/bioinformatics/btac018.
- 125. Kim D., Paggi J.M., Park C., Bennett C., Salzberg S.L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 2019;37:907–915. doi: 10.1038/s41587-019-0201-4.
- 126. Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 127. Ramírez F., Ryan D.P., Grüning B., Bhardwaj V., Kilpert F., Richter A.S., Heyne S., Dündar F., Manke T. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–W165. doi: 10.1093/nar/gkw257.
- 128. Shen W., Sipos B., Zhao L. SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta. 2024;3. doi: 10.1002/imt2.191.
- 129. Marchal C., Singh N., Corso-Díaz X., Swaroop A. HiCRes: a computational method to estimate and predict the genomic resolution of Hi-C libraries. Nucleic Acids Res. 2022;50. doi: 10.1093/nar/gkab1235.
- 130. Ramírez F., Bhardwaj V., Arrigoni L., Lam K.C., Grüning B.A., Villaveces J., Habermann B., Akhtar A., Manke T. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 2018;9:189. doi: 10.1038/s41467-017-02525-w.
- 131. Heinz S., Benner C., Spann N., Bertolino E., Lin Y.C., Laslo P., Cheng J.X., Murre C., Singh H., Glass C.K. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004.
- 132. sra-tools: SRA Tools. GitHub. https://github.com/ncbi/sra-tools.
- 133. Chen S., Zhou Y., Chen Y., Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560.
- 134. Pockrandt C., Alzamel M., Iliopoulos C.S., Reinert K. GenMap: ultra-fast computation of genome mappability. Bioinformatics. 2020;36:3687–3692. doi: 10.1093/bioinformatics/btaa222.
- 135. Danecek P., Bonfield J.K., Liddle J., Marshall J., Ohan V., Pollard M.O., Whitwham A., Keane T., McCarthy S.A., Davies R.M., Li H. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10. doi: 10.1093/gigascience/giab008.
- 136. Danecek P., Auton A., Abecasis G., Albers C.A., Banks E., DePristo M.A., Handsaker R.E., Lunter G., Marth G.T., Sherry S.T., et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
- 137. Dale R. gffutils: GFF and GTF file manipulation and interconversion. GitHub.
- 138. Peterson R.A. bestNormalize: Normalizing transformation functions. R package version. 2018;1:573.
- 139. Lüdecke D. sjPlot: Data Visualization for Statistics in Social Science. R package version. 2018;10. doi: 10.5281/ZENODO.1462253.
- 140. Green M.R., Sambrook J. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press; 2012.
- 141. Guppy software overview. Oxford Nanopore Technologies. https://community.nanoporetech.com/docs/prepare/library_prep_protocols/Guppy-protocol/v/gpb_2003_v1_revax_14dec2018/guppy-software-overview.
- 142. Judd J., Wojenski L.A., Wainman L.M., Tippens N.D., Rice E.J., Dziubek A., Villafano G.J., Wissink E.M., Versluis P., Bagepalli L., et al. A rapid, sensitive, scalable method for Precision Run-On sequencing (PRO-seq). bioRxiv. 2020. Preprint. doi: 10.1101/2020.05.18.102277.
- 143. Geneious Prime. Geneious. https://www.geneious.com/features/prime.
- 144. Hartley G. Genome Assembly Code for Centromeric Transposable Elements and Epigenetic Status Drive Karyotypic Variation in the Eastern Hoolock Gibbon. Preprint at Zenodo; 2024.
- 145. medaka: Sequence correction provided by ONT Research. GitHub.
- 146. bonito: PyTorch Basecaller for Oxford Nanopore Reads. GitHub. https://github.com/nanoporetech/bonito.
- 147. modbam2bed. GitHub. https://github.com/epi2me-labs/modbam2bed.
- 148. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556.
- 149. Gene Ontology Consortium, Aleksander S.A., Balhoff J., Carbon S., Cherry J.M., Drabkin H.J., Ebert D., Feuermann M., Gaudet P., Harris N.L. The Gene Ontology knowledgebase in 2023. Genetics. 2023;224. doi: 10.1093/genetics/iyad031.
- 150. Shen W., Le S., Li Y., Hu F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS One. 2016;11. doi: 10.1371/journal.pone.0163962.
- 151. Yoo D., Rhie A., Hebbar P., Antonacci F., Logsdon G.A., Solar S.J., Antipov D., Pickett B.D., Safonova Y., Montinaro F., et al. Complete sequencing of ape genomes. bioRxiv. 2024. Preprint. doi: 10.1101/2024.07.31.605654.
- 152. axtToSyn: Detect synteny blocks and synteny breakpoints from pairwise genome alignment file. GitHub. https://github.com/carbonelab/axtToSyn.
- 153. Hao Z., Lv D., Ge Y., Shi J., Weijers D., Yu G., Chen J. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 2020;6:e251. doi: 10.7717/peerj-cs.251.
- 154. Abdennur N., Mirny L.A. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics. 2020;36:311–316. doi: 10.1093/bioinformatics/btz540.
- 155. Li H., Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 156. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2019.
- 157. Fox J., Friendly M., Weisberg S. Hypothesis tests for multivariate linear models using the car package. R J. 2013;5:39.
Associated Data
Data Availability Statement
The datasets generated during this study are available at NCBI under BioProject PRJNA1153068 and BioSample SAMN43386187. The code used during this study is available on GitHub (https://github.com/gabriellehartley/HLE_Assembly) and Zenodo (https://doi.org/10.5281/ZENODO.13324332).





