Summary
Enhancer-gene communication is dependent on topologically-associating domains (TADs) and boundaries enforced by the CTCF insulator, but the underlying structures and mechanisms remain controversial. Here, we investigated a boundary that typically insulates FGF oncogenes but is disrupted by DNA hypermethylation in gastrointestinal stromal tumors (GIST). The boundary contains an array of CTCF sites that enforce adjacent TADs, one containing FGF genes and the other containing ANO1 and its putative enhancers, which are specifically active in GIST, likely indicating the cell of origin. We show that coordinate disruption of four CTCF motifs in the boundary fuses the adjacent TADs, allows the ANO1 enhancer to contact FGF3, and causes its robust induction. High-resolution micro-C maps reveal specific contact between transcription initiation sites in the ANO1 enhancer and FGF3 promoter that quantitatively scales with FGF3 induction such that modest changes in contact frequency result in strong changes in expression, consistent with a causal relationship.
Graphical Abstract

eTOC blurb:
Enhancer-gene communication is dependent on chromosome topology, which helps enhancers find their correct gene targets. Kim et al. characterize topological boundaries that insulate the oncogenes FGF3 and FGF4 from a potent enhancer, and show that combinatorial disruption of CTCF sites in the boundary results in enhancer-promoter contact and gene induction.
Introduction
Gene regulation is a major driver of cell identity, allowing cells that share the same DNA to differentiate into the diverse lineages that underlie tissue and organismal biology1,2. By recent estimates, the human genome contains more than a million enhancer elements thought to dictate the precise spatiotemporal patterns of gene expression crucial for this specification and tissue function3–5. Their locations and cell type-specificities can be predicted from genomic maps of chromatin accessibility or histone modifications6. However, a fundamental unanswered question remains how these enhancers, which are distributed across vast swaths of chromosomal DNA, are guided to their appropriate gene targets. The ability of some enhancers to activate genes across distances of tens or hundreds of kilobases (kb) suggests that genome conformation plays an important role, while also alluding to the complexity of enhancer targeting mechanisms.
The role of chromosome conformation (‘topology’) and physical contacts between enhancers and promoters in gene regulation and transcriptional control is an area of active investigation. Topology can be mapped by chromosome conformation capture-based methods that measure contact frequency between genomic loci by restricting and ligating DNA in its native chromatin context7. The Hi-C method measures such interactions in an all-by-all manner by deep sequencing of ligation junctions8. Although Hi-C produces genome-wide maps, its effective resolution is limited by restriction site availability and its high sequencing requirement. This has prompted the development of alternative approaches that increase resolution by digesting with micrococcal nuclease (micro-C)9, that enrich loci of interest by hybrid capture or amplification (e.g., capture-C)10, or that measure one-to-all interactions from a prespecified viewpoint (e.g., 4C)11.
Application of these conformational assays has revealed that the genome is partitioned into discrete regions of self-interacting chromatin, termed topologically-associating domains (TADs). TADs correspond to loops of chromatin extruded through cohesin rings12,13, and are frequently bounded by binding sites for the CCCTC-binding factor (CTCF) insulator, which halts extrusion5,14. A compelling interpretation is that enhancers and promoters are constrained by these topological structures, such that regulatory interactions primarily occur between elements in the same TAD5,14–16. However, this straightforward interpretation of TADs as ‘insulated neighborhoods’ has been challenged by microscopy studies that find TADs to represent relatively short-lived structures17,18.
Furthermore, enhancer-promoter interactions are not clearly evident in Hi-C maps14 and are difficult to appreciate by microscopy19–22, leaving open the question of whether and how proximity translates to transcriptional activation. Although mainstream models assume that physical enhancer-promoter contacts direct transcription, alternate models invoke nonlinear23–25 or linear26,27 relationships to quantitatively relate physical contact to transcriptional activity. Clarifying these relationships is critical for understanding enhancer functions and the role of TADs in regulating their activities. The lack of clarity has also prompted alternate proposals in which contact-free effects are mediated through condensates or diffusion of transcription factors (TFs)24,28.
Human genetics and disease studies have provided important insights into the functions of TADs and boundaries. Germline mutations that alter chromosome topology in the EphA4, Shh and Sox9 loci disrupt enhancer-promoter interactions crucial for limb development, leading to polydactyly29–31. Somatic mutations that disrupt TAD boundaries and/or CTCF binding sites have been associated with aberrant gene activation in several cancer types32–35. The DNA hypermethylation associated with various cancer types can also disrupt CTCF binding and thereby weaken TAD boundaries36–38. We previously documented widespread loss of CTCF insulators in hypermethylated isocitrate dehydrogenase (IDH)-mutant gliomas and identified a specific CTCF insulator whose disruption caused an enhancer to aberrantly contact and activate the canonical brain tumor oncogene PDGFRA36,39. We identified a similar phenomenon in gastrointestinal stromal tumors (GIST) wherein insulator disruption activates FGF oncogenes37. In contrast to these striking individual examples, perturbation studies that used genome editing to disrupt multiple CTCF insulators suggest that a relatively small proportion of boundaries functionally impact gene expression in any given cellular context40–43. A critical goal is now to gain a mechanistic understanding of how these specific insulator aberrations can have such profound effects on gene activity.
GISTs are a relatively common form of sarcoma typically driven by gain-of-function mutations affecting the KIT or PDGFRA oncogenes44. However, ~10% of GISTs are driven by loss of the succinate dehydrogenase (SDH) complex. This subtype does not respond to standard-of-care therapy, highlighting an urgent need for new understanding and therapeutic strategies. Succinate overload in SDH-deficient tumors inhibits TET family DNA demethylases and results in profound DNA hypermethylation45. Hundreds of CTCF binding sites are disrupted by the hypermethylation, including sites within an insulator that protects the developmental regulators and proto-oncogenes FGF3 and FGF4. Disruption of this insulator by methylation is a recurrent event in SDH-deficient GIST associated with strong induction of oncogenic FGF ligands37. Moreover, the insulator may also be disrupted by genetic rearrangement, as documented in a very small cohort of GISTs46. Beyond its clinical implications, this profoundly responsive locus has potential to provide key insights into how TADs and their boundaries impact gene regulation. However, we do not yet understand which of the many CTCF sites in the locus direct boundary formation, nor how their disruption impacts locus topology, enhancer-promoter communication or transcriptional activity.
Here, we characterized the structure and function of the FGF3/4 locus and the boundary element that insulates the region from an enhancer in the adjacent TAD. By integrating chromosome conformation mapping with combinatorial CRISPR/Cas9 perturbations in a GIST cell line, we identify specific CTCF binding sites within the insulator that enforce the boundaries of two adjacent TADs, one containing FGF3 and FGF4 and the other containing a potent enhancer active in GIST and its presumed cell-of-origin. We used region-capture micro-C (RCMC)9 to quantify locus topology across GIST cell line derivatives with different CTCF disruptions, and to relate enhancer-promoter contact frequency to transcription. We find that a minimum of 4 CTCF sites must be disrupted to disable the insulator function, thus fusing the adjacent TADs to induce FGF3. Remarkably, micro-C measurements indicated that contact frequency between the enhancer and FGF3 promoter scales linearly with FGF3 transcriptional induction. In contrast, FGF4 did not contact the enhancer and remained inactive. In conclusion, we deeply characterized a topological boundary that insulates potent oncogenes from a nearby enhancer, documented an array of redundant CTCF insulators as critical for its function, and provided direct evidence that enhancer-promoter contacts govern transcriptional activity of a key developmental regulator.
Results
Topological organization and regulatory landscape of the FGF3-FGF4 locus
To investigate how locus topology impacts transcription of the FGF3 and FGF4 proto-oncogenes, we used RCMC to generate a one kilobase (kb)-resolution contact map of the locus in GIST-T1 cells (Figure 1A). We acquired 12.2 million reads to map a 1.0 Mb region that contains genes encoding FGF ligands, as well as ANO1 which encodes a chloride channel expressed in all GIST subtypes. For perspective, the resolution of our RCMC map is equivalent to a genome-wide contact map with ~40 billion reads. Visual inspection of the contact map revealed that the FGF genes are contained within a ~200 kb TAD marked by high levels of self-interacting chromatin (Figure 1A, red heat in the contact map). An adjacent TAD contains ANO1 and two enhancers that harbor histone 3 lysine 27 acetylation (H3K27ac) and transcribe enhancer-associated RNAs (eRNAs) in GIST tumors and the GIST-T1 cell line. The respective TADs are separated by a robust boundary (TB2) that is evident as a break between the regions of high self-interaction and confirmed quantitatively by insulation scores47 derived from the contact map (Figure 1A). The boundary and overall topology of the locus are highly conserved across Hi-C and Micro-C maps for diverse tissue and cell types (Figure S1A–B).
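The read-depth comparison above can be checked with a short back-of-envelope calculation. This is a minimal sketch that assumes the required read count scales linearly with the fraction of the genome covered and that the human genome is roughly 3.1 Gb; neither assumption is stated in the text, and the function name is illustrative.

```python
# Back-of-envelope estimate of the genome-wide read depth that would match
# the per-locus coverage of a region-capture experiment.
# ASSUMPTION: reads scale linearly with the fraction of the genome covered.
# ASSUMPTION: human genome size of ~3.1 Gb (not stated in the text).

GENOME_SIZE_BP = 3.1e9       # assumed genome size
CAPTURED_REGION_BP = 1.0e6   # 1.0 Mb captured region (from the text)
CAPTURED_READS = 12.2e6      # 12.2 million reads (from the text)


def equivalent_genome_wide_reads(reads, region_bp, genome_bp):
    """Scale the captured read count by the genome-to-region size ratio."""
    return reads * (genome_bp / region_bp)


estimate = equivalent_genome_wide_reads(
    CAPTURED_READS, CAPTURED_REGION_BP, GENOME_SIZE_BP
)
print(f"~{estimate / 1e9:.0f} billion reads")
```

Under these assumptions the estimate falls just under 40 billion reads, in line with the order of magnitude quoted above.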
Figure 1. FGF locus topology and regulatory landscape.

(A) RCMC heatmap depicts contact frequency between genomic positions across a 575 kb interval that includes genes encoding FGF ligands and the GIST biomarker ANO1. The region contains two large TADs (red triangles) flanked by boundaries (TB1-3) that are evident as dips in the insulation score metric. (B) CTCF (black) and H3K27ac (blue) profiles are shown for GIST-T1 cells and for clinical samples corresponding to different GIST subtypes (ChIP-seq data are RPM normalized). Multiple CTCF sites in the TB2 boundary (black dashed box) separate the FGF genes from the H3K27ac-marked enhancers that encode eRNAs (blue dashed box). (C) Box plots depict eRNA and ANO1 expression (top, green), CTCF binding (top, gray), promoter acetylation (middle) and FGF expression (bottom) in the respective GIST subtypes (data from GSE10744737). *p < 0.05, **p < 0.01, ***p < 0.001. (D) Schematic indicates correlations between eRNA expression, ANO1 expression, and averaged FGF3 and FGF4 expression across multiple GISTs (13 SDH-proficient, 6 SDH-deficient). The strong correlation between eRNA-1 and the FGF genes in SDH-deficient GISTs implicates the enhancer in the activation of these oncogenes. (E) UMAP visualization of single-cell RNA-seq data for gastrointestinal tissue highlights a cluster of Interstitial Cells of Cajal (ICC) that highly express ANO1 and eRNA-1. See also Figure S1.
The central TB2 boundary contains four strongly bound CTCF binding sites whose motif orientations are informative48 (Figure 1A). CTCF is understood to establish TAD boundaries by halting the extrusion of chromatin through cohesin. The extruded chromatin joins the region of high self-interaction, while the remainder is excluded or ‘insulated’ from the TAD. Importantly, this effect is orientation-dependent in that a CTCF site must be oriented towards the TAD to halt extrusion and create a boundary49–51. Of the four main CTCF sites in TB2, one is oriented towards the FGF TAD, while the other three are oriented towards the TAD containing the enhancers and ANO1. This suggests that the single TB2 element actually encodes two boundaries - one for the FGF TAD and one for the ANO1 TAD - through multiple, potentially redundant CTCF sites.
Comparison of 6 SDH-deficient and 13 SDH-proficient GIST samples indicates that CTCF binding is reduced in SDH-deficient GISTs (Figure 1B–C), consistent with our prior study37. Overall CTCF occupancy across TB2 is ~34% lower in the hypermethylated SDH-deficient tumors. We also examined the two enhancers in the ANO1 TAD. We found that both elements transcribe eRNAs in all three GIST subtypes, a feature that is suggestive of strong activity52. Using these eRNAs as surrogates for enhancer activity, we were able to correlate enhancer activity with FGF3 and FGF4 transcription across GIST samples using RNA-seq data. Neither enhancer correlated with these genes in KIT-mutant or PDGFRA-mutant GIST, which retain TB2. However, the more proximal enhancer (eRNA-1) strongly correlated with FGF3/4 expression in the SDH-deficient tumors (Figure 1D and S1C). By contrast, transcription from the distal enhancer (eRNA-2) was lower in tumors with high FGF3/4 expression. We speculate that this reflects a switch from a baseline state in which eRNA-1 interacts with eRNA-2 and ANO1 to an alternate state in which eRNA-1 preferentially interacts with FGF3/4, with the latter state primarily occurring in cells with a disrupted insulator. This suggests that the proximal enhancer may communicate with and potentially direct the robust induction of FGF3 and FGF4 in hypermethylated GIST tumors that have lost TB2.
The nomination of an enhancer element and eRNA-1 as correlates and potential drivers of oncogene activation during tumorigenesis could provide insight into the cell of origin in SDH-deficient GIST. We therefore evaluated the expression of this eRNA systematically across gastrointestinal tissues using published single-cell RNA-sequencing data53. Clustering of these expression data distinguished a spectrum of gastrointestinal cell types, a small subset of which expressed eRNA-1 transcript at levels ranging from 2 to 2.5 normalized gene counts in the single cells (Figure 1E). The most prominent cluster of eRNA-1 expressing cells corresponded to the Interstitial Cell of Cajal (ICC), a specialized cell type with pacemaker functions in healthy gastrointestinal tissue that is a proposed cell-of-origin for GIST44. eRNA-1 and the nearby ANO1 gene were both selectively expressed in the cluster of cells annotated as ICC. Hence, disruption of TB2 could represent a truncal driver event that causes the proximal enhancer to activate FGF oncogenes in ICC or a closely related pre-malignant cell type.
Experimental modeling of combinatorial CTCF site disruptions
We next sought to validate the insulatory function of TB2 and pinpoint the specific CTCF sites whose disruption underlies the induction of FGF genes in SDH-deficient GIST. We used CRISPR/Cas9 editing to disrupt different combinations of CTCF sites in the locus, prioritizing the TB2 sites most frequently lost in tumors (Figure 2A and S2A). We engineered eight derivatives of GIST-T1, a human cell line derived from a KIT-mutant GIST with intact SDH expression. We verified that the parental GIST-T1 cells retain a GIST-like enhancer landscape (Figure 1B) and that the FGF boundary remains intact and hypomethylated in TB2, with robust CTCF binding and physiologic locus topology (Figure S2A–B). We then confirmed efficient and selective editing of the targeted CTCF motifs in the respective GIST-T1 derivatives. Quantitation of transcript levels by real-time PCR revealed robust induction of FGF3 in five of the eight derivatives (ΔInsD/E/F/G/H) relative to non-targeting (NT) control (Figure 2B). A common thread across these five derivatives was the disruption of all four CTCF binding sites in TB2. FGF4 was not strongly induced in any of the cell line derivatives, which suggests that the transcription of this neighboring gene is affected by additional conformational or epigenetic factors (Figure S2C).
Figure 2. Redundant CTCF sites in the TAD boundary mediate robust insulation.

(A) Schematic depicts combinatorial disruption of CTCF sites in the TAD boundary (TB2) region by CRISPR/Cas9 editing. Table lists engineered GIST-T1 derivatives in which different combinations of CTCF sites are disrupted (targeted sites indicated by ‘×’) and the non-targeting control (NT). (B) Relative FGF3 mRNA expression (qRT-PCR) in the derivative lines. Two biologically independent replicates (p-values (one-way ANOVA) < 0.001 for ΔInsD/E/F/G/H relative to control). (C) Scatter plot depicts log2 normalized RNA-seq expression versus log2 fold change in ΔInsE compared to NT control. It reveals specific upregulation of FGF3 in the insulator-disrupted derivative. (D) Barplots show FGF3, eRNA-1 and ANO1 RNA-seq expression in NT and InsKO lines. Data represent two independent biological replicates (one-way ANOVA for ΔInsD/E/F/G/H relative to control: *p < 0.05, **p < 0.01, ***p < 0.001). (E) CTCF (black), H3K27ac (blue), and H3K27me3 (magenta) profiles are shown for NT and InsKO lines over the genomic interval containing the FGF genes and eRNAs. Triangles indicate the CTCF motif orientation (red:sense; blue:antisense). (F) Correlation of FGF3 expression (RNA-seq) and H3K27ac at the FGF3 promoter. RNA expressions in C,D,F panels are TPM normalized. Error bars in B,D panels represent standard deviations. See also Figure S2.
To evaluate further the specificity of induction, we used RNA-seq to query gene expression changes in four responsive derivatives ΔInsE/F/G/H relative to control. Strikingly, despite the robust induction of FGF3, we did not identify any other genes that were significantly regulated by TB2 disruption (Figure 2C and S2D–E). Within the locus, ANO1 and eRNA-2 were weakly downregulated (20 to 30%) in these derivatives, while eRNA-1 and other transcripts remained stable (Figure 2D).
Further support for the specificity of FGF3 induction emerged from an analysis of histone modifications across the locus by ChIP-seq. In unedited GIST-T1 cells and KIT-mutant GIST tumors, the FGF TAD is devoid of activating H3K27ac chromatin (Figure 1B). Rather, it is diffusely enriched for histone 3 lysine 27 trimethylation (H3K27me3), a Polycomb-associated modification that silences key developmental loci in lineages where they are not required (Figure S2F). The ANO1 TAD showed the opposite pattern, with strong H3K27ac signals over the two eRNA-producing enhancers and essentially no H3K27me3. These chromatin patterns are entirely consistent with the baseline transcriptional activity of the respective TADs in the GIST subtypes and the cell line, which retain TB2.
However, these patterns were markedly changed in the GIST-T1 derivatives with combinatorial CTCF deletions and FGF3 expression. We specifically observed a striking increase in H3K27ac over the FGF3 promoter, consistent with the transcriptional activation and potentially indicative of enhancer-promoter contact (Figure 2E). H3K27me3 levels were decreased over the induced FGF3 promoter. Comparing across the different GIST-T1 derivatives, we found that the amount of H3K27ac increase over the FGF3 promoter closely correlated with degree of FGF3 induction (r ~ 0.95; Figure 2F). Notably, the acetylation patterns in the GIST-T1 derivatives mirror those in SDH-deficient GIST specimens in which the boundary is disrupted by methylation (Figure 1B–C). These experimental models and analyses pinpoint specific combinations of CTCF sites as critical for the insulatory function of TB2, and document transcriptional and acetylation changes suggestive of aberrant long-range enhancer-promoter interactions in insulator-disrupted cells.
High-resolution contact maps reveal interconnected, CTCF-dependent topological loops
To understand the mechanisms by which these combinatorial CTCF disruptions activate FGF3, while sparing FGF4, we turned to the high-resolution RCMC contact map9 for the GIST-T1 lines (Figure 3A). In addition to delineating the neighboring FGF and ANO1 TADs, the contact map reveals long-range contacts between CTCF sites in the respective boundaries (TB1, TB2, TB3). These long-range contacts reflect loop domain structures that organize the corresponding TADs. In particular, the contact map highlights multiple points of contact between redundant CTCF sites in TB1 and TB2, which bound the FGF TAD, and between redundant CTCF sites in TB2 and TB3, which bound the ANO1 TAD. The contacts reflect states in which the respective boundaries are brought into close proximity by paused cohesin loops17. Consistent with the dynamic topology, long-range interactions spanning the entire FGF-ANO1 locus can also be appreciated in the heatmap (TB1-TB3). Yet with the exception of this distant contact, physical interactions between the FGF and ANO1 TADs were conspicuously absent in unedited GIST-T1 cells (parental or NT control).
Figure 3. High-resolution contact maps reveal complex topological changes associated with combinatorial CTCF disruptions.

(A) RCMC contact maps shown for the FGF locus in control GIST-T1 (NT; as in Figure 1A) and derivatives (InsKO). Loop domain contacts between the boundaries flanking the two main TADs are indicated by dashed boxes. (B) Heatmap depicts Pearson correlations between replicate RCMC profiles for the control and derivative lines. (C) Insulation scores are plotted for the control and derivative lines across a portion of the FGF locus including the TB2 boundary (gray bar), which is expanded below. (D) CTCF profiles are shown for the FGF locus in control GIST-T1 and the ΔInsE derivative. Arc plots show differential contacts (p < 0.05) between pairs of CTCF sites that are increased (red) or decreased (blue) in the ΔInsE derivative. See also Figure S3.
We next used RCMC to generate high-resolution contact maps for GIST-T1 derivatives in which combinatorial disruption of CTCF sites in TB2 induced FGF3 (Figure 3A). Although the overall topology of the locus remained largely intact (r > 0.9), hierarchical clustering of the contact maps revealed consistent differences between the respective derivatives (Figure 3B). We evaluated the nature of these changes in several ways. First, we computed insulation scores across the locus by summing contacts spanning each position (Figure 3C and S3A). This showed that the insulation strength of TB2 was specifically reduced in all four derivatives. Second, we compared pairwise contact frequencies across the locus to identify interactions that increased or decreased in the GIST-T1 derivative ‘ΔInsE’, which showed the strongest induction of FGF3 (Figure 3D). This analysis revealed a loss of interactions involving sites in the TB2 element, consistent with the loss of CTCF binding (Figure 3D and S3B–C). Conversely, we observed an increase in ‘cross-TAD’ interactions between sites in the FGF TAD and sites in the ANO1 TAD, consistent with the loss of TB2 boundary function (Figure 3D and S3B–C). Notably, whereas cross-TAD interactions were increased in all four of the GIST-T1 derivatives, the magnitudes and patterns of change were highly variable (Figure S3D). This raised the possibility that variations in locus topology underlie the differences in FGF3 transcription levels across these GIST-T1 derivatives.
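The insulation analysis described above, in which contacts spanning each position are summed, can be illustrated with a minimal sketch. It uses a simplified sliding-window variant on a binned contact matrix; the function name, the window size, and the toy matrix are assumptions for illustration and do not reproduce the exact procedure or parameters used here.

```python
import numpy as np


def insulation_score(contacts, window):
    """Sum contacts that span each bin within a sliding square window.

    For bin i, contacts between the `window` bins upstream and the `window`
    bins downstream are summed. Low values mark positions that few contacts
    cross, i.e. candidate boundaries. Edge bins without a full window are NaN.
    """
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    scores = np.full(n, np.nan)
    for i in range(window, n - window):
        scores[i] = contacts[i - window:i, i + 1:i + 1 + window].sum()
    return scores


# Toy map (made-up values): two 10-bin domains with strong internal contact
# (1.0) and weak contact across the boundary between bins 9 and 10 (0.1).
toy = np.full((20, 20), 0.1)
toy[:10, :10] = 1.0
toy[10:, 10:] = 1.0

scores = insulation_score(toy, window=3)
print(int(np.nanargmin(scores)))  # bin index of the strongest dip
```

In this toy map the score is lowest at the bins that flank the domain boundary, and a weakened boundary would appear as a shallower dip at the same position.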
Contact frequency between FGF3 and eRNA-1 initiation sites predicts FGF3 induction
To investigate specific topological changes that could mediate FGF3 induction, we computed differential contact maps between control GIST-T1 cells and the derivatives (Figure 4A and S4A). This highlighted key changes in interactions involving the proximal enhancer E1 in the ANO1 TAD. The enhancer gained interactions with multiple sites across the FGF TAD in the ΔInsE derivative that could be visualized as a stripe of increased contact frequency in the differential map (red). Importantly, this stripe extended only as far as the FGF3 promoter and did not involve FGF4 (Figure S4B). Conversely, the enhancer lost interactions with multiple sites across the ANO1 TAD in the ΔInsE derivative that could be visualized as a stripe of reduced contact frequency in the differential map (blue). Importantly, the interactions with the FGF TAD were highly specific to the derivative and essentially undetectable in control GIST-T1 cells. Hence, disruption of TB2 redirects the enhancer encoding eRNA-1 towards the FGF TAD and the FGF3 promoter.
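The idea of a differential contact map can be illustrated with a minimal sketch. It assumes two binned contact matrices of equal shape, scales each to the same total contact count, and subtracts control from derivative; the function name, the scaling step, and the toy values are assumptions for illustration rather than the pipeline used in this study.

```python
import numpy as np


def differential_map(derivative, control):
    """Return derivative minus control after scaling both maps to sum to 1.

    ASSUMPTION: simple total-count scaling; the study's normalization may differ.
    """
    derivative = np.asarray(derivative, dtype=float)
    control = np.asarray(control, dtype=float)
    if derivative.shape != control.shape:
        raise ValueError("contact maps must have the same shape")
    return derivative / derivative.sum() - control / control.sum()


# Toy 4-bin maps (made-up values): bins 0-1 form one domain, bins 2-3 another.
control = np.array([[8, 4, 0, 0],
                    [4, 8, 0, 0],
                    [0, 0, 8, 4],
                    [0, 0, 4, 8]])
# In the toy derivative, bin 1 gains contact with bin 2 across the boundary.
derivative = control.copy()
derivative[1, 2] = derivative[2, 1] = 2

diff = differential_map(derivative, control)
print(diff[1, 2] > 0)  # gained cross-domain contact is positive
print(diff[0, 0] < 0)  # within-domain share drops slightly after rescaling
```

Positive entries correspond to the red signal and negative entries to the blue signal in a differential heatmap of this kind.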
Figure 4. Enhancer-promoter contact frequency scales linearly with FGF3 transcriptional activation.

(A) Heatmap depicts differential RCMC contact frequency between ΔInsE and control GIST-T1 cells. Dashed circles indicate interactions between the eRNA-1 transcriptional start site and the FGF3 promoter (left) or the ANO1 promoter (right). Gene tracks, promoter and enhancer regions, and CTCF (triangles) are indicated below. Anchors indicate the transcription start sites of FGF3 and eRNA-1. Upon boundary disruption, eRNA-1 gains interaction with FGF3 (red heat in E1-FGF3 dashed circle), but loses interaction with sites across the ANO1 TAD (blue stripe extending to the E1-ANO1 dashed circle). (B) Virtual 4C tracks (inferred from RCMC) depict contact frequency between eRNA-1 (top, black) or the FGF3 promoter (bottom, gray) and all genomic positions across the FGF locus in control GIST-T1 (NT) and InsKO lines. Arc plots below show differential contacts (p < 0.05) between H3K27ac sites that are increased (red) or decreased (blue) in the ΔInsE derivative. (C) Heatmaps depict RCMC contact frequency between 10 kb windows centered over the transcription start sites of FGF3 and eRNA-1 in control GIST-T1 and the respective derivatives. Horizontal and vertical tracks flanking the heatmaps depict H3K27ac signal over the FGF3 promoter and eRNA-1, respectively. (D) FGF3 expression (RNA-seq) is plotted against contact frequency between the transcription start sites of FGF3 and eRNA-1 for control GIST-T1 and derivative lines. Contact frequencies and correlations were computed for different window sizes (symbols and lines). Transcription scales linearly with enhancer-promoter contact when calculated between start sites at 1 kb resolution. (E) Schematic summarizes the impact of TB2 disruption on TAD organization and re-targeting or ‘hijacking’ of the key enhancer to activate FGF oncogenes. See also Figure S4.
To evaluate the interactions between enhancer and FGF3 promoter more precisely, we imputed ‘4C’ profiles from the RCMC data (Figure 4B). For each derivative and control GIST-T1 line, we imputed one profile for interactions made by the enhancer across the locus, and a second profile for interactions made by the FGF3 promoter. In the control cells, the E1 enhancer contacted the ANO1 gene and the second enhancer (E2) in the ANO1 TAD. In the derivatives, the E1 enhancer also contacted sites in the FGF TAD, with a sharp peak directly over the FGF3 promoter. Consistent with this result, the 4C profiles imputed for FGF3 revealed a contact peak over the E1 enhancer, but did not reveal interactions with other sites in the ANO1 TAD (Figure S4C). Closer examination of the 4C profiles suggested two additional important points. First, the profiles indicated that the peak interaction occurs between the transcription start sites of eRNA-1 and FGF3. Second, they indicated that contact frequency between these sites varied across the GIST-T1 derivatives.
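Imputing a 4C-style profile from an all-by-all contact map amounts to reading out the row of the matrix that contains the viewpoint. The following is a minimal sketch of that idea; the function name, the coordinates, and the toy matrix are hypothetical and are not taken from the analysis code used here.

```python
import numpy as np


def virtual_4c(contacts, region_start, bin_size, viewpoint):
    """Return the contact profile of one viewpoint against every bin in the region.

    contacts     : square, symmetric binned contact matrix
    region_start : genomic coordinate of the first bin (bp)
    bin_size     : bin width in bp
    viewpoint    : genomic coordinate of the viewpoint (bp)
    """
    contacts = np.asarray(contacts, dtype=float)
    idx = (viewpoint - region_start) // bin_size
    if not 0 <= idx < contacts.shape[0]:
        raise ValueError("viewpoint lies outside the mapped region")
    return contacts[idx].copy()


# Toy example with hypothetical coordinates: 6 bins of 1 kb starting at position 0.
toy = np.ones((6, 6))
toy[1, 4] = toy[4, 1] = 9.0  # a focal contact between bins 1 and 4
profile = virtual_4c(toy, region_start=0, bin_size=1000, viewpoint=1500)
print(int(np.argmax(profile)))  # bin with the strongest contact to the viewpoint
```

Because the matrix is symmetric, taking the viewpoint at the other anchor returns the reciprocal profile, which mirrors the pairing of enhancer-centered and promoter-centered tracks described above.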
Based on these findings, we hypothesized that physical interaction between these specific positions in the E1 enhancer and the FGF3 promoter drives transcriptional induction of FGF3. As such, their contact frequency should predict differential FGF3 levels across the derivatives. To test this, we computed contact frequency between positions centered over the eRNA-1 transcription start site in the E1 enhancer and the FGF3 transcription start site (Figure 4C). We initially quantified contact over 5 kb windows, based on prior convention23, and correlated the values against FGF3 RNA expression across control and derivative lines (Figure 4D and S4D). Although this yielded a positive correlation, the relationship was nonlinear as contact frequency appeared to reach a plateau in the highest FGF3 expressors. We therefore leveraged the high-resolution RCMC data to quantify contacts for 1 kb windows directly centered on the eRNA-1 and FGF3 transcription start sites. Remarkably, these more precise interaction measurements correlated linearly with FGF3 RNA expression (r = 0.97, p-value < 0.01; Figure 4D and S4E–F). This striking linear correlation suggests that FGF3 expression is precisely controlled by interactions with the unleashed E1 enhancer or, under more physiologic conditions, may scale with cumulative contact with the lineage-specific enhancers that regulate its expression during development (Figure 4E).
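The window-based quantification and correlation described above can be sketched as follows. This is a minimal illustration that assumes a 1 kb-binned contact matrix per cell line; the function names, bin indices, and all numerical values are made up for demonstration and are not measurements from this study.

```python
import numpy as np


def window_contact(contacts, bin_a, bin_b, n_bins=1):
    """Sum contacts between two n_bins-wide windows centered on bin_a and bin_b.

    n_bins should be odd so each window is centered on its bin
    (e.g. 1 bin = 1 kb and 5 bins = 5 kb for a 1 kb-resolution map).
    Both windows must lie fully inside the matrix.
    """
    half = n_bins // 2
    rows = slice(bin_a - half, bin_a + half + 1)
    cols = slice(bin_b - half, bin_b + half + 1)
    return float(np.asarray(contacts, dtype=float)[rows, cols].sum())


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


# Toy data (made-up values, not measurements from this study).
expression = [1.0, 5.0, 20.0, 40.0]   # hypothetical expression per cell line
enhancer_bin, promoter_bin = 10, 20   # hypothetical bins of the two start sites

contact_1kb = []
for level in expression:
    toy_map = np.full((30, 30), 0.5)  # uniform background contact
    toy_map[enhancer_bin, promoter_bin] = 2.0 + 0.1 * level  # focal contact tracks expression
    contact_1kb.append(window_contact(toy_map, enhancer_bin, promoter_bin, n_bins=1))

print(round(pearson_r(contact_1kb, expression), 3))
```

Passing `n_bins=5` to the same function gives the coarser 5 kb measurement, so the two window sizes can be compared on identical maps.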
Discussion
Understanding the molecular mechanisms by which chromatin structure and genome topology regulate gene expression is a critical goal relevant to human biology and disease. Here we drilled down on a recurrent epigenetic lesion that disrupts a boundary element to drive oncogene expression in GIST. We find that redundant CTCF sites in the boundary enforce adjacent TADs that robustly insulate the FGF proto-oncogenes, and we chart topological alterations incurred upon boundary disruption that allow aberrant enhancer-promoter interactions and transcriptional induction. Our study exemplifies how disease-associated epigenetic or genetic alterations can provide fundamental insights into genome organization, boundary elements and their impact on gene regulation.
First, we find that the FGF boundary element achieves robustness through redundancy. Multiple CTCF sites within the element enforce the boundaries of two adjacent TADs, which respectively ensconce the FGF genes and the ANO1 enhancer. Combinatorial editing of the cognate binding motifs revealed that any one of the four central CTCF sites is sufficient to maintain insulation of the proto-oncogenes. The sufficiency of even a single CTCF site to prevent enhancer hijacking is consistent with a model in which the coordinate extrusion of enhancer and promoter through cohesin incurs the long-range physical interactions required for transcription12,13. The presence of even a single intervening CTCF site would then suffice to exclude either enhancer or promoter from the extruding loop, and thereby prevent contact and transcriptional induction. In the absence of extrusion-driven co-localization, diffusion-limited enhancer-promoter contacts are presumably too rare to activate transcription due to the significant distance between FGF3 and the enhancer. Our results are consistent with the linearity of the activity-by-contact (ABC) model26,27 and help explain the remarkable robustness of the boundary.
Second, we gained key insights into how boundary destabilization translates to transcriptional activation by analyzing high-resolution contact maps for derivative GIST lines lacking different combinations of CTCF sites. Although the FGF boundary is clearly compromised in all of the derivatives with transcriptional induction of FGF3, the respective TADs remain partially intact. Importantly, we found that the different combinatorial CTCF disruptions incur subtly different conformational changes. We leveraged this variability to evaluate the relationship between conformational change and transcriptional activity. We found that contact frequency between enhancer and FGF3 promoter correlated with FGF3 induction. Importantly, the correlation was strongest and remarkably linear when we focused precisely on contact between 1 kb intervals corresponding to the eRNA-1 transcriptional start site and the FGF3 transcriptional start site. This analysis was only made possible by the high-resolution nature of the RCMC contact maps9. The linear association was lost when we measured interactions between 5 kb windows, which has been the standard in prior analyses23. This supports a model in which contact between transcriptional complexes in enhancer and promoter, which may be stabilized by biophysical interactions54,55, translates directly to activation of this oncogene. It also indicates that a relatively small change in TAD organization can lead to remarkably strong gene induction, provided it increases physical enhancer-promoter communication.
Third, our study provides insights into disease mechanisms. Having pinpointed a specific enhancer as a driver of aberrant FGF expression in SDH-deficient GIST, we used the associated eRNA as a surrogate to track its activity across gastrointestinal cell types using single cell data. The eRNA was strongly and specifically expressed in the Interstitial Cell of Cajal, a candidate cell-of-origin for GIST44. Yet the enhancer should not impact the FGF genes under physiologic conditions as the boundary is constitutive across tissues and lineages. The boundary should also be resilient to point mutations or short deletions given its multiple redundant CTCF sites. However, it is vulnerable to DNA hypermethylation which spreads across the element and displaces multiple CTCF sites in SDH-deficient GIST. Genetic rearrangements that effectively displace the entire boundary have also been described29, but are extremely rare. Hence, epigenetic or genetic disruption of this otherwise constitutive boundary in a specific cellular context causes FGF induction and a proliferative advantage that drives tumorigenesis. We acknowledge that this boundary element and the impact of its loss may not be representative of TAD boundaries. It was identified because of its potent effect on FGF expression and tumorigenesis. Moreover, its regulatory impact is highly context-dependent and might only be appreciated in studies of GIST, the rarified ICC, or potentially specific neural cell types that also express ANO1. Indeed, we note that studies of other TAD boundaries have tended to observe more subtle effects on neighboring gene expression.
In conclusion, we have deeply characterized the determinants and impact of a uniquely robust boundary element that insulates FGF ligand genes, which represent key developmental regulators and proto-oncogenes. Our results explain why DNA hypermethylation is uniquely suited for the coordinate disruption of the redundant CTCF sites that enforce the boundary, and document activity of the relevant enhancer in a likely cell-of-origin for GIST. Finally, our high-resolution contact maps of derivative lines with compromised insulation provide critical support for a model in which long-range transcriptional activation is driven by direct physical interaction between initiation sites in enhancer and promoter.
Limitations of the study
We note several limitations of our study. Although SDH-deficient GISTs can express FGF3 and FGF4, our CTCF perturbations specifically activate FGF3. The pattern of activation is consistent with the observation that FGF4 does not physically interact with the enhancer or any other position in the ANO1 TAD. Nonetheless, we cannot explain why the aberrant contacts unleashed upon FGF boundary disruption fail to extend to FGF4, even when an additional CTCF site positioned between FGF3 and FGF4 was disrupted. One possibility is that partial methylation of the FGF3 promoter, which is evident in SDH-deficient GISTs (Figure S4G), may redirect the enhancer towards FGF4 on some alleles in these tumors. A further limitation is that while RCMC enables contact mapping with remarkable resolution and precision, it nonetheless provides static, averaged measurements of a highly dynamic process. We cannot track loop extrusion or stabilization over time, nor can we evaluate the temporal relationship between enhancer-promoter contacts and transcription. We speculate that a cascade of conformational changes incurred upon boundary disruption alters boundary structure and TAD organization, and thereby effects enhancer-promoter contact and FGF3 transcription. Finally, while our perturbations and high-resolution contact maps are supportive of a causal role for physical enhancer-promoter contact, we cannot rule out the possibility that contact frequency is affected by transcription or does not linearly control transcription. Future studies and new technologies are needed to address critical issues regarding the kinetics of loop extrusion, TAD function, and the causality of long-range enhancer contact.
STAR Methods
RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Bradley E. Bernstein (Bradley_Bernstein@dfci.harvard.edu).
Materials availability
Plasmids generated in this study are available upon request.
Data and code availability
Accession information for published data sets analyzed in the manuscript is included in the key resources table along with information for original datasets. Original data generated in this study have been deposited at GEO (GSE241927) and are publicly available as of the date of publication.
This paper does not report any original code.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS
Cell cultures
The GIST-T1 cell line, which was derived from a KIT-mutant GIST and maintains a GIST-like enhancer landscape and the FGF boundary56, was obtained from Cosmo Biosciences (no. PMC-GIST01C). This cell line was cultured in DMEM with 10% fetal bovine serum, 1× penicillin-streptomycin and 1× Glutamax (Life Technologies) at 37°C in a humidified CO2-controlled (5%) incubator.
METHOD DETAILS
Insulator knockout cell line
For genome editing, we designed a U6-sgRNA-CMV-Cas9-T2A-eGFP piggyBac plasmid with the following sgRNAs: human non-targeting: 5′-ACGGAGGCTAAGCGTCGCAA-3′; CTCF site 1: 5′-GTCCCACTGCCACCACAAGA-3′; CTCF site 2: 5′-ATGACATGATGGCCAGCAGA-3′; CTCF site 3: 5′-GAAAAGCAACCGCCTCTAGG-3′; CTCF site 4: 5′-GGGCCAGGCCCGCCGCCAGG-3′; CTCF site 5: 5′-ATTTGGGAATCCTGGCTGCA-3′; CTCF site 6: 5′-AGAGAAGGGAAGGCCACTAG-3′; CTCF site 7: 5′-TTGTTTCTGTTGCCACCACA-3′; CTCF site 8: 5′-CTGCCCGACTCTCCAGCAGA-3′; CTCF site 9: 5′-TACTGAGGCCTACCAGCAGA-3′. sgRNAs were designed using the Benchling software to disrupt the NGG portions of the CTCF motif with high specificity. Constructs harboring two sequential sgRNA cassettes were synthesized by Twist Bioscience and cloned using standard methods. We verified insulator disruptions at target loci by CTCF ChIP-seq.
GIST-T1 WT cells were transfected with a 1:1 ratio of the genome editing construct and the piggyBac transposase construct using LipoD293 per the manufacturer’s guidelines. Cells were sorted twice for GFP positivity on a Sony SH800 flow cytometer: once 48 hours after transfection, and again after the cells from the initial sort reached confluence. Crosslinked cells were harvested for ChIP analysis to verify loss of CTCF binding.
Quantitative real-time PCR
Total RNA was isolated from cell samples using the RNeasy Plus Kit (Qiagen no. 74134) and used to synthesize cDNA with the ProtoScript II First Strand cDNA Synthesis Kit (NEB no. E6560). cDNA was analyzed using the SYBR mastermix (Applied Biosystems) on a 7500 Fast Real Time system (Applied Biosystems). Gene expression primers were as follows: FGF4 set 1 forward 5′-CCAACAACTACAACGCCTACGA-3′; FGF4 set 1 reverse 5′-CCCTTCTTGGTCTTCCCATTCT-3′; FGF4 set 2 forward 5′-GCAGCAAGGGCAAGCTCTAT-3′; FGF4 set 2 reverse 5′-CGGTTCCCCTTCTTGGTCTT-3′; FGF3 forward 5′-ATGCTTCGGAGCACTACAGC-3′; FGF3 reverse 5′-CACGTACCACAGTCTCTCGG-3′. All gene expression results were normalized to ribosomal protein, large, P0 (RPLP0), amplified with the following primers: forward 5′-TCCCACTTGCTGAAAAGGTCA-3′; reverse 5′-CCGACTCTTCCTTGGCTTCA-3′.
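As an illustration of reference-gene normalization, the sketch below computes a fold change with the common 2^-ΔΔCt calculation. The text states only that results were normalized to RPLP0, so the use of this particular formula, the function name, and the Ct values are assumptions made for the example.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target gene versus a control sample (2^-ddCt).

    ct_reference / ct_reference_ctrl are the Ct values of the reference
    gene (here, RPLP0) in the test and control samples, respectively.
    """
    delta_ct_sample = ct_target - ct_reference              # normalize test sample
    delta_ct_control = ct_target_ctrl - ct_reference_ctrl   # normalize control sample
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Made-up Ct values, for illustration only (not data from this study).
fold = relative_expression(ct_target=26.0, ct_reference=18.0,
                           ct_target_ctrl=31.0, ct_reference_ctrl=18.0)
print(fold)
```

A target that crosses threshold five cycles earlier than in the control, at equal reference-gene Ct, corresponds to a 2^5-fold increase under the assumption of perfect amplification efficiency.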
RNA-seq
Total RNA was isolated from cell samples using the RNeasy Plus Kit (Qiagen) and quality was determined via TapeStation (Agilent). Libraries were prepared using the NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB no. E7490) and the NEBNext Ultra II RNA Library Prep Kit (NEB no. E7770), and equimolar multiplexed libraries were sequenced as paired-end 38-base reads on an Illumina NextSeq 500.
Chromatin immunoprecipitation (ChIP-seq)
ChIP-seq was performed as described previously37,39. Briefly, cultured cells were crosslinked in 1% formaldehyde for 12 minutes, flash frozen in liquid nitrogen, and then stored at −80 °C. Chromatin was fragmented using a Branson Sonifier calibrated to shear DNA to a fragment length of 200–600 bp. CTCF was precipitated using a monoclonal rabbit antibody (Cell Signaling clone D31H2, no. 3418). H3K27ac was precipitated using a polyclonal rabbit antibody (Active Motif no. 39133). H3K27me3 was precipitated using a polyclonal rabbit antibody (Cell Signaling clone C36B11, no. 9733). Eluted ChIP DNA was used to generate sequencing libraries with the NEBNext Ultra II DNA Library Prep Kit (NEB no. E7645). Barcoded fragments were amplified by PCR using NEBNext Ultra II Q5 Master Mix for 16 cycles and double-size selected using AMPure XP beads (Beckman Coulter no. A63880) for fragments of 300–500 bp.
Locus bisulfite sequencing
Briefly, genomic DNA was extracted (Monarch Genomic DNA Purification Kit, NEB no. T3010) and subjected to bisulfite conversion using the EZ DNA Methylation-Lightning Kit (Zymo Research no. D5030). The converted DNA was split into four independent PCR reactions, and CTCF sites were then PCR amplified using the KAPA HiFi HotStart Uracil+ReadyMix PCR Kit (KAPA no. KK2801) and the following primers: CTCF 1 forward 5′-AAATTTAAGAATATTAAAGGTGGGAAAG-3′, reverse 5′-ACCTAAAAATTACAATTAACTCAACCC-3′; CTCF 2 forward 5′-TGGTGGTTTAGTTTGTTTTGAATTAAGA-3′, reverse 5′-ACCCACCTTAAAATAAAAATTAAAACCAA-3′; CTCF 3 forward 5′-ATTGTGATTGGTTGTGTTTTATATGGTGTA-3′, reverse 5′-AATCCTAAATCCAAACCCAAATCC-3′; CTCF 4 forward 5′-ATGAAATGTAGTAATGTTTTTTGTATATGG-3′, reverse 5′-ACTCTATCCTTTAAAAAACAACCCC-3′. Sequencing reads (Illumina) were aligned to the bisulfite-converted locus and the frequency of methylated to unmethylated cytosines was calculated.
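The methylation calculation can be illustrated with a minimal sketch. After bisulfite conversion, an unmethylated cytosine reads as T while a methylated cytosine remains C, so the methylated fraction at each CpG is the share of reads retaining C. The sketch assumes ungapped, forward-strand reads already aligned to the start of a short reference; the function name and toy sequences are hypothetical, and expressing the result as methylated / (methylated + unmethylated) is an assumption about how the frequency was summarized.

```python
def cpg_methylation(reference, reads):
    """Return {position: methylated fraction} for each CpG cytosine.

    reference: unconverted reference sequence (forward strand).
    reads: bisulfite reads aligned without gaps to position 0.
    """
    fractions = {}
    for pos in range(len(reference) - 1):
        if reference[pos:pos + 2] != "CG":
            continue  # only score cytosines in a CpG context
        methylated = sum(1 for r in reads if pos < len(r) and r[pos] == "C")
        unmethylated = sum(1 for r in reads if pos < len(r) and r[pos] == "T")
        covered = methylated + unmethylated
        if covered:
            fractions[pos] = methylated / covered
    return fractions


# Toy reference with CpGs at positions 2 and 6, and four toy reads.
reference = "ATCGGACGTA"
reads = ["ATCGGATGTA", "ATTGGATGTA", "ATCGGACGTA", "ATCGGATGTA"]
fractions = cpg_methylation(reference, reads)
print(fractions)
```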
Region Capture Micro-C (RCMC)
Crosslinking
RCMC was performed as described previously9. Briefly, trypsinized cells were doubly crosslinked to fix protein-protein and protein-DNA interactions using 3 mM disuccinimidyl glutarate (ThermoFisher no. 20593) and 1% methanol-free formaldehyde (ThermoFisher no. 28906) in 1× PBS (ThermoFisher no. 10010023), respectively. The crosslinking reaction was gently mixed at room temperature for 35 min, after which formaldehyde was added to a final concentration of 1%. The double crosslinking reaction was mixed at room temperature for an additional 10 min before quenching with Tris buffer pH 7.5 (ThermoFisher no. 15567027) at a final concentration of 0.375 M. Crosslinked cells were washed twice with 1× PBS containing 100 μg/mL BSA (NEB no. B9000), recounted to quantify any sample loss during fixation and then partitioned into 5 M cell aliquots that were pelleted and snap-frozen in liquid nitrogen for storage at −80 °C.
MNase digestion
Cell membranes were solubilized to extract intact nuclei by resuspending crosslinked 5 M cell pellets in Micro-C Buffer #1 (MB#1; 50 mM NaCl, 10 mM Tris-HCl pH 7.5, 5 mM MgCl2, 1 mM CaCl2, 0.2% NP-40 Substitute (Abcam no. ab142227), 1× Protease Inhibitor Cocktail (Sigma-Aldrich no. 5056489001)) at 1 M cells per 100 μL for 20 min on ice. Following an MB#1 wash, samples were resuspended in 500 μL MB#1 and 1-2 μL MNase (NEB no. M0247) was added. This digestion reaction was mixed at 37 °C for 20 min on a thermomixer before being quenched with 20 mM EGTA (bioWORLD no. 40520008) and heat inactivated at 65 °C for 10 min. Digested nuclei were washed twice with ice-cold Micro-C Buffer #2 (50 mM NaCl, 10 mM Tris-HCl pH 7.5, 10 mM MgCl2 and 100 μg/mL BSA).
End repair and labeling
To generate blunt ends on digested DNA fragments before proximity ligation and add biotinylated nucleotides, a series of enzymatic processing steps was performed. First, to catalyze the addition of 5′-phosphate groups and the removal of 3′-phosphate groups, digested samples generated from 5 M cell inputs were incubated in end-repair reactions (50 U T4 Polynucleotide Kinase (NEB no. M0201), 50 mM NaCl, 10 mM Tris-HCl pH 7.5, 10 mM MgCl2, 100 μg/mL BSA, 2 mM ATP (NEB no. P0756) and 5 mM DTT (Sigma-Aldrich no. 10197777001), in water) at 37 °C for 15 min while mixing. To create 5′ fragment overhangs for end-blunting and labeling, 50 U DNA Polymerase I Klenow Fragment (NEB no. M0210) was added to the reaction and incubated at 37 °C for 15 min while mixing. Next, a mixture of dNTPs in end-labeling buffer (66 μM each of dTTP (Invitrogen no. 10297018), dGTP (Invitrogen no. 10297018), biotin-dATP (Jena Bioscience no. NU-835-BIO14), biotin-dCTP (Jena Bioscience no. NU-809-BIOX) and 100 μg/mL BSA in 1× T4 DNA Ligase Buffer) was added to the reaction. This reaction was incubated at room temperature for 45 min with interval mixing before being quenched by 30 mM EDTA (Invitrogen no. 15575020) and heat inactivated at 65 °C for 20 min. Finally, end-blunted and biotin-labeled nuclei were washed once with Micro-C Buffer #3 (50 mM Tris-HCl pH 7.5, 10 mM MgCl2 and 100 μg/mL BSA).
Proximity ligation and removal of unligated biotin
Proximity ligation was performed by incubating labeled chromatin in a ligation reaction (10,000 U T4 DNA Ligase (NEB no. M0202), 1× T4 DNA Ligase Buffer, 100 μg/mL BSA) at 25 °C overnight with gentle mixing. To remove biotinylated dNTPs from all unligated fragment ends, samples were digested with 1,000 U Exonuclease III (NEB no. M0206) in reaction buffer (1× NEBuffer #1 and 100 μg/mL BSA in water) at 37 °C for 15 min with interval mixing.
DNA purification and size-selection
To prepare ligated DNA for library generation, DNA was reverse crosslinked and proteins and RNA were digested by adding 200 μL ProK Buffer (20 mM Tris-HCl pH7.5, 100 mM NaCl, 1 mM EDTA, 1 mM EGTA, 0.5 % Tx-100, 0.2 % SDS (Invitrogen no. AM9822), 24 U Proteinase K (NEB no. P8107), and 100 μg/mL RNaseA (Invitrogen no. EN0531), in water) to the samples and incubating at 65 °C overnight. DNA was extracted using the Zymo DNA Clean & Concentrator kit (Zymo Research no. D4034) according to the kit manual.
Dinucleosome-sized DNA fragments (250–350 bp) were isolated by extraction from a 1% agarose gel (VWR no. 97062). Gel extracts were purified using the Zymo Gel Purification kit (Zymo Research no. D4008), and samples were quantified by Qubit 1× dsDNA High Sensitivity Assay (Invitrogen no. Q33231). Sample ends were polished and blunted again using the Quick Blunting Kit (NEB no. E1201) at 25 °C for 45 min, followed by reaction inactivation at 65 °C for 10 min.
Ligated DNA contact fragments were isolated by pulling down biotin-bound fragments using Dynabeads MyOne Streptavidin T1 (Invitrogen no. 65601). DNA samples were bound to beads in a Binding and Wash Buffer (1 M NaCl, 5 mM Tris-HCl pH 7.5, 500 μM EDTA, 0.1% Tween-20 (Sigma-Aldrich no. P8074) in water) at room temperature for at least 30 min with mixing. After two washes with the Binding and Wash Buffer, the bead-bound samples were washed once with 10 mM Tris-HCl pH 7.5 before library preparation.
Library preparation
Illumina library preparation was performed using the NEBNext Ultra II kit (NEB no. E7645) and NEB Multiplex Oligos for Illumina Primer Set 1 (NEB no. E7335) to end-repair, A-tail, and adaptor ligate the bead-bound samples. All steps were performed as directed by the manual, except that incubations included interval shaking (1 min on, 3 min off) at 1,000 rpm. Sample washes were performed using Binding and Wash Buffer and Tris Buffer (10 mM Tris-HCl pH 7.5 in water). After adaptor ligation, samples were amplified by PCR for 7-8 cycles and then purified using AMPure XP beads.
Capture of target loci and sequencing
80-mer probes were designed through Twist Bioscience to tile end-to-end without overlap across the capture loci. Probes with high predicted likelihoods of off-target pull-down (for example, those in high-repeat regions) were masked and removed from the probe tiling, and probe coverage was double-checked to ensure the inclusion of key genomic features (for example, all promoters and CTCF/K27ac peaks in the locus) before finalization. Probe panels were synthesized and purchased as Custom Target Enrichment Panels from Twist Bioscience.
Capture enrichment was performed in accordance with Twist Bioscience’s Target Enrichment Standard Hybridization v1 Protocol. Briefly, pooled sample libraries were dried and mixed with Hybridization Mix (Twist Bioscience no. 104178), Custom Panels (Twist Bioscience no. 101001) and Universal Blockers (Twist Bioscience no. 100578). The library pool was hybridized to the biotinylated probe panel overnight, after which streptavidin beads (Twist Bioscience no. 100983) were used to pull down probes with hybridized ligated fragments and then washed (Twist Bioscience no. 104178) to remove unbound fragments. Bead-bound libraries were amplified by PCR using the Equinox Library Amplification Mix (Twist Bioscience no. 104178) for 10 cycles and purified using AMPure XP beads. Pooled libraries were sequenced using paired-end 2 × 50 cycle sequencing kits with Illumina NovaSeq S1 flow cells on a NovaSeq 6000 system by the Broad Institute of MIT and Harvard’s Walk-Up Sequencing services.
QUANTIFICATION AND STATISTICAL ANALYSIS
CTCF motif Analysis
CTCF peaks in GIST-T1 cells were called with MACS3 (version 3.0.0) and merged with bedtools (version 2.31.0). Peaks were then centered on the CTCF motif, where one was found by FIMO (MEME 4.7) within a 100 bp window around the peak center, based on the “JASPAR_CORE_2021_vertebrates” database (MA0139.1). If multiple motifs were detected, we kept the one with the highest score.
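The re-centering rule can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: motif hits are represented as hypothetical (start, end, score) tuples rather than FIMO output, and the 100 bp window is assumed to mean 50 bp on either side of the peak center.

```python
def recenter_peak(peak_start, peak_end, motifs, window=100):
    """Return the new peak center.

    motifs: list of (start, end, score) motif hits on the same chromosome.
    If no motif midpoint falls within the window around the original peak
    center, the original center is kept; otherwise the midpoint of the
    highest-scoring motif in the window is used.
    """
    center = (peak_start + peak_end) // 2
    half = window // 2
    candidates = [m for m in motifs
                  if center - half <= (m[0] + m[1]) // 2 <= center + half]
    if not candidates:
        return center
    best = max(candidates, key=lambda m: m[2])  # keep the highest score
    return (best[0] + best[1]) // 2


# Toy peak with two motifs inside the window and one far outside it.
toy_motifs = [(1080, 1099, 12.5), (1110, 1129, 15.0), (1300, 1319, 30.0)]
new_center = recenter_peak(1000, 1200, toy_motifs)
print(new_center)
```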
RNA-seq Analysis
Libraries were sequenced as paired-end 38-base reads on an Illumina NextSeq 500. Reads were aligned using STAR 2.5.345 to the human reference (hg38). RNA-seq data for clinical GISTs were previously published (GEO: GSE107447)37. Gene expression was estimated by RSEM (version 1.3.2). Data were visualized using R version 4.3.1.
ChIP-seq Analysis
Libraries were sequenced as paired-end 38-base reads on an Illumina NextSeq 500. Reads were then aligned to the hg38 reference genome using BWA aln version 0.7.4, and reads with a MAPQ score lower than 10 were removed. PCR duplicates were removed with Picard toolkit 2.9.2. Peaks were called with MACS3, correcting against input controls. Differential analysis of CTCF peaks and quantification of reads per peak were performed previously. CTCF/H3K27ac ChIP-seq data from GIST patients were downloaded from GEO (GSE107447)37, normalized by DESeq2 and analyzed. BigWig files normalized for reads per million (RPM) were visualized using the plotgardener package (version 1.6.1)57 on R version 4.3.1. Processed (RPM-normalized BigWig) and raw data have been deposited in GEO (accession number: GSE241924).
Mapping and normalizing RCMC
RCMC paired-end reads generated by the Illumina NovaSeq sequencers were downloaded as .fastq files for each sample, pair mate, and flow cell lane. Read quality was verified using FastQC (v0.11.9). Paired-end reads were aligned to the UCSC hg38 genome using Juicer (default settings, version 1.6). Deduplicated paired reads were filtered by mapping quality score (MAPQ ≥ 30) and filtered to retain those reads where both read mates lay within the capture region of interest (chr11:69.4-70.4 Mb). These filtered reads were converted to contact matrices (.hic format file) using the Juicer Tools pre command at base pair-delimited resolutions of 1 Mb, 500 kb, 100 kb, 25 kb, 10 kb, 5 kb, 1 kb and 500 bp. All contact matrices used for further analysis were KR-normalized with Juicer. RCMC contact maps were visualized alongside genomic annotations and ChIP-seq datasets using the ‘plotgardener’ package (version 1.6.1) on R version 4.3.1.
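The two read-pair filters described above can be expressed as a short sketch. The record layout (a dictionary with chrom1, pos1, mapq1, chrom2, pos2, mapq2) and the helper names are hypothetical and do not reflect Juicer's internal formats; treating the capture region as a half-open interval and requiring both mates to meet the MAPQ threshold are likewise assumptions.

```python
CAPTURE = ("chr11", 69_400_000, 70_400_000)  # capture region from the text
MIN_MAPQ = 30
RESOLUTION = 1_000  # 1 kb bins


def in_capture(chrom, pos):
    """True if a position lies inside the capture region (half-open)."""
    return chrom == CAPTURE[0] and CAPTURE[1] <= pos < CAPTURE[2]


def keep_pair(pair):
    """Keep a read pair only if both mates pass MAPQ and lie in the region."""
    return (pair["mapq1"] >= MIN_MAPQ and pair["mapq2"] >= MIN_MAPQ
            and in_capture(pair["chrom1"], pair["pos1"])
            and in_capture(pair["chrom2"], pair["pos2"]))


def to_bins(pair, resolution=RESOLUTION):
    """Map both mates to the start coordinate of their bin."""
    return (pair["pos1"] // resolution * resolution,
            pair["pos2"] // resolution * resolution)


# Toy read pairs, for illustration only.
pairs = [
    {"chrom1": "chr11", "pos1": 69_819_500, "mapq1": 42,
     "chrom2": "chr11", "pos2": 69_926_300, "mapq2": 37},   # kept
    {"chrom1": "chr11", "pos1": 69_819_500, "mapq1": 12,
     "chrom2": "chr11", "pos2": 69_926_300, "mapq2": 37},   # low MAPQ
    {"chrom1": "chr11", "pos1": 69_819_500, "mapq1": 60,
     "chrom2": "chr11", "pos2": 71_000_000, "mapq2": 60},   # mate outside region
    {"chrom1": "chr11", "pos1": 69_900_000, "mapq1": 30,
     "chrom2": "chr1", "pos2": 69_900_000, "mapq2": 30},    # other chromosome
]
kept = [p for p in pairs if keep_pair(p)]
print(len(kept), [to_bins(p) for p in kept])
```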
Chromatin contact analysis
Insulation score/Boundary
Insulation scores and TAD boundaries were calculated using the ‘GENOVA’ package (version 1.0.1)58 on R version 4.3.1. For the TAD calling performed on RCMC maps (.hic files), we used 1 kb contact matrices, with a window size of 100 kb and a min_strength of −0.2 to find TAD boundaries. Insulation scores and TAD boundaries were plotted using R.
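For readers unfamiliar with insulation scores, the sketch below shows a generic sliding-square calculation: for each bin, contacts between the upstream and downstream windows are averaged and expressed as log2 of the ratio to the mean across bins, so that local minima mark candidate boundaries. This is not GENOVA's implementation, and the function name and toy matrix are hypothetical. With 1 kb bins, a 100 kb window would correspond to 100 bins; the toy example uses 2 bins.

```python
import math


def insulation_scores(matrix, window):
    """Generic insulation score for each bin of a square contact matrix.

    Returns None for bins too close to the matrix edge, or with no contacts
    in the square, to be scored.
    """
    n = len(matrix)
    raw = []
    for i in range(n):
        if i < window or i + window > n:
            raw.append(None)
            continue
        # Sum contacts between the `window` bins upstream and downstream of i.
        total = sum(matrix[a][b]
                    for a in range(i - window, i)
                    for b in range(i, i + window))
        raw.append(total / (window * window))
    valid = [v for v in raw if v is not None]
    if not valid:
        return raw
    mean = sum(valid) / len(valid)
    return [None if not v else math.log2(v / mean) for v in raw]


# Toy matrix: two 3-bin domains with strong intra-domain contacts (8)
# and weak inter-domain contacts (1).
toy = [[8 if (a < 3) == (b < 3) else 1 for b in range(6)] for a in range(6)]
scores = insulation_scores(toy, window=2)
print(scores)
```

In the toy matrix the score is lowest at bin 3, the first bin of the second domain, which is where the two domains meet.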
Correlation Matrix
Contact map data KR-normalized at 1 kb resolution were extracted from .hic files (MAPQ ≥ 30) using the ‘strawr’ package (version 0.0.91) on R version 4.3.1. The extracted data were filtered to bins containing CTCF/H3K27ac peaks that were called from CTCF/H3K27ac ChIP-seq data by MACS3, with a minimum distance of 20 kb. From these filtered matrices, the pairwise correlation matrix between RCMC contact maps was visualized using the ‘pheatmap’ package (version 1.0.12) on R.
Differential Contacts
Contact map data KR-normalized at 5 kb resolution were extracted from .hic files (MAPQ ≥ 30) using the ‘strawr’ package in R and filtered with a minimum distance of 20 kb. From these data, all differential contacts with adjusted p-value < 0.05 and absolute log2 fold change > 0.5 were identified using the ‘DESeq2’ package (version 1.40.2) in R version 4.3.1. To extract CTCF-CTCF or H3K27ac-H3K27ac differential contacts, all differential contacts were filtered to contacts between bins containing CTCF or H3K27ac peaks that were called from CTCF or H3K27ac ChIP-seq data by MACS3, respectively. Differential contacts were visualized using the ‘plotgardener’ package (version 1.6.1) on R.
Virtual 4C
Virtual 4C tracks were generated using Juicebox. Horizontal and vertical 1D tracks of the pixel at 1 kb resolution for the interaction between the FGF3 promoter (chr11: 69,819,000-69,820,000) and either eRNA-1 (chr11: 69,926,000-69,927,000), eRNA-2 (chr11: 69,986,000-69,987,000) or the ANO1 promoter (chr11: 70,078,000-70,079,000) were generated with the Juicebox ‘generate 1D track’ function. Wig files were visualized using the ‘plotgardener’ package (version 1.6.1) on R version 4.3.1.
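Conceptually, a virtual 4C track is the row (or column) of the contact matrix that passes through a chosen viewpoint bin. The sketch below illustrates this with a sparse list of (bin1, bin2, value) contacts at 1 kb resolution. It is not the Juicebox function used in the study; the function name and contact values are hypothetical, while the bin coordinates are taken from the text.

```python
FGF3_PROMOTER_BIN = 69_819_000  # chr11:69,819,000-69,820,000
ERNA1_BIN = 69_926_000          # chr11:69,926,000-69,927,000


def virtual_4c(contacts, viewpoint_bin):
    """Return {partner bin start: contact value} for one viewpoint bin.

    contacts: iterable of (bin1_start, bin2_start, value) from a
    symmetric contact map stored in sparse form.
    """
    track = {}
    for bin1, bin2, value in contacts:
        if bin1 == viewpoint_bin:
            track[bin2] = track.get(bin2, 0.0) + value
        elif bin2 == viewpoint_bin:
            track[bin1] = track.get(bin1, 0.0) + value
    return track


# Toy contacts, for illustration only (not values from this study).
toy_contacts = [
    (69_819_000, 69_926_000, 5.0),
    (69_819_000, 69_986_000, 2.0),
    (69_800_000, 69_819_000, 7.0),
    (69_900_000, 69_926_000, 9.0),  # does not involve the viewpoint
]
track = virtual_4c(toy_contacts, FGF3_PROMOTER_BIN)
print(track[ERNA1_BIN])
```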
IntraTAD and CrossTAD
Contact frequencies at 5 kb resolution were extracted from .hic files (MAPQ ≥ 30) using the ‘strawr’ package in R and filtered with a maximum distance of 10 kb. The normalization factors were calculated from the filtered frequencies using the ‘DESeq2’ package (version 1.40.2) in R version 4.3.1. Contact map data KR-normalized at 5 kb resolution were extracted from .hic files (MAPQ ≥ 30) using the ‘strawr’ package in R, filtered with a minimum distance of 20 kb, and then scaled by the normalization factors. Contacts in chr11:69.675-70.195 Mb were counted as total contacts in the FGF-ANO1 locus. Contacts in chr11:69.675-69.905 Mb and chr11:69.910-70.195 Mb were counted as IntraTAD contacts in the FGF and ANO1 TADs, respectively. Contacts between chr11:69.675-69.905 Mb and chr11:69.910-70.195 Mb were counted as CrossTAD contacts between the FGF and ANO1 TADs. The ratio of CrossTAD/IntraTAD and the number of total contacts were plotted using R version 4.3.1.
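The CrossTAD/IntraTAD ratio can be sketched as below, using the TAD coordinates and the 20 kb minimum distance given above. The function names and toy contacts are hypothetical. The sketch assumes that the IntraTAD denominator is the sum of contacts within the FGF TAD and within the ANO1 TAD, and that the input values have already been normalized.

```python
FGF_TAD = (69_675_000, 69_905_000)    # chr11, from the text
ANO1_TAD = (69_910_000, 70_195_000)   # chr11, from the text
MIN_DISTANCE = 20_000


def tad_of(pos):
    """Return the TAD label for a chr11 position, or None if outside both."""
    if FGF_TAD[0] <= pos < FGF_TAD[1]:
        return "FGF"
    if ANO1_TAD[0] <= pos < ANO1_TAD[1]:
        return "ANO1"
    return None


def cross_intra_ratio(contacts):
    """contacts: iterable of (pos1, pos2, normalized count) on chr11."""
    intra = 0.0
    cross = 0.0
    for pos1, pos2, count in contacts:
        if abs(pos2 - pos1) < MIN_DISTANCE:
            continue  # drop short-range contacts
        tad1, tad2 = tad_of(pos1), tad_of(pos2)
        if tad1 is None or tad2 is None:
            continue  # outside the FGF-ANO1 locus
        if tad1 == tad2:
            intra += count
        else:
            cross += count
    return cross / intra if intra else float("nan")


# Toy contacts, for illustration only.
toy_contacts = [
    (69_700_000, 69_800_000, 10.0),   # within the FGF TAD
    (69_950_000, 70_100_000, 30.0),   # within the ANO1 TAD
    (69_820_000, 69_925_000, 4.0),    # across the boundary
    (69_700_000, 69_705_000, 100.0),  # closer than 20 kb, ignored
    (69_600_000, 69_800_000, 50.0),   # one end outside the locus, ignored
]
ratio = cross_intra_ratio(toy_contacts)
print(ratio)
```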
Analysis of ENCODE, 4DN, Gut Cell Atlas data
Normalized ChIP-seq data and insulation scores with boundary calls from various human tissues and cell lines were downloaded from ENCODE and 4DN, respectively. Genome locus figures were plotted using the ‘plotgardener’ package (version 1.6.1) on R version 4.3.1. Expression of ANO1, eRNA-1 and eRNA-2 in scRNA-seq data from human gut cells was plotted from the Gut Cell Atlas datasets E-MTAB-9543, E-MTAB-9536, E-MTAB-9532 and E-MTAB-9533.
Analysis of scRNA-seq data
The published 10X 5’ libraries53 were mapped to the Human Gencode 41 assembly59 using STARsolo60. We set the barcode and UMI parameters in STARsolo as follows: “--soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 10 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts”. We adjusted the 5p clipping to 39: “--clip5pNbases 39 0”. We also set the following parameters to ensure that the gene expression matrices generated are compatible with the downstream analysis steps: “--soloUMIdedup 1MM_CR --outFilterScoreMin 30 --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB --soloUMIfiltering MultiGeneUMI_CR --soloBarcodeMate 1”.
The raw matrix outputs of STARsolo were then processed using SCANPY61. The matrices were normalized, log1p transformed and then scaled. Technical variabilities due to total counts and mitochondrial content were regressed out using the “sc.pp.regress_out” function. PCA, clustering and UMAP generation were performed using SCANPY’s inbuilt functions.
Group comparisons
Pearson r values were computed for all correlations. Two-sided t-test P values were computed when comparing two groups. One-way ANOVA P values were computed when comparing more than 2 groups.
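For reference, the Pearson correlation coefficient can be computed as in the following minimal sketch. The function name and the input values are placeholders chosen for illustration; they are not measurements from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Placeholder values: y is an exact linear function of x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(x, y)
print(r)
```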
Supplementary Material
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | ||
| Rabbit anti CTCF (clone D31H2) | Cell Signaling Technologies | 3418 |
| Rabbit anti H3K27ac | Active motif | 39133 |
| Rabbit anti H3K27me3 (clone C36B11) | Cell Signaling Technologies | 9733 |
| Bacterial and Virus Strains | ||
| NEB stable competent E. coli | New England Biolabs | C3040I |
| Chemicals, Peptides, and Recombinant Proteins | ||
| Agencourt AMPure XP | Beckman Coulter | A63882 |
| LipoD293 | Signagen | SL100668 |
| Fetal Bovine Serum | GIBCO | 26140079 |
| DMEM, high glucose, GlutaMAX Supplement, pyruvate | GIBCO | 10569010 |
| Penicillin-Streptomycin | GIBCO | 15140122 |
| Formaldehyde | ThermoFisher | 28906 |
| DSG (disuccinimidyl glutarate) | ThermoFisher | 20593 |
| 1 M Tris-HCl Buffer, pH 7.5 | Invitrogen | 15567027 |
| NP-40 Substitute | Abcam | ab142227 |
| Protease Inhibitor Cocktail (EDTA-free) | Sigma-Aldrich | 5056489001 |
| MNase (micrococcal nuclease) | New England BioLabs | M0247S |
| EGTA | BioWorld | 40520008 |
| BSA (bovine serum albumin) | Sigma-Aldrich | B8667 |
| BSA, Molecular Biology Grade | New England BioLabs | B9000S |
| T4 PNK (polynucleotide kinase) | New England BioLabs | M0201 |
| ATP | New England BioLabs | P0756 |
| DTT (dithiothreitol) | Sigma-Aldrich | 10197777001 |
| DNA Polymerase I Klenow Fragment | New England BioLabs | M0210 |
| dNTP set | Invitrogen | 10297018 |
| Biotin-14-dATP | Jena Bioscience | NU-835-BIO14 |
| Biotin-11-dCTP | Jena Bioscience | NU-809-BIOX |
| EDTA | Invitrogen | 15575020 |
| T4 DNA Ligase | New England BioLabs | M0202 |
| Exonuclease III | New England BioLabs | M0206 |
| Proteinase K | New England BioLabs | P8107S |
| RNase A, DNase- and protease-free | ThermoFisher | EN0531 |
| SDS (sodium dodecyl sulfate) | Invitrogen | AM9822 |
| Agarose | VWR | 97062 |
| Dynabeads MyOne Streptavidin T1 | Invitrogen | 65601 |
| Tween-20 | Sigma-Aldrich | P8074 |
| Multiplex Oligos for Illumina Primer Set 1 | New England BioLabs | E7335 |
| Q5 High-Fidelity DNA Polymerase | New England BioLabs | M0491 |
| Capture Custom Panel | Twist Bioscience | 101001 |
| Standard Hybridization Mix | Twist Bioscience | 104178 |
| Universal Blockers | Twist Bioscience | 100578 |
| Streptavidin Binding Beads | Twist Bioscience | 100983 |
| Equinox Library Amplification Mix | Twist Bioscience | 104178 |
| DNA Purification Beads | Twist Bioscience | 100983 |
| SYBR™ Green PCR Master Mix | Applied Biosystems | 4309155 |
| KAPA HiFi Uracil+ HotStart ReadyMix | KAPA | KK2800 |
| Critical Commercial Assays | ||
| NextSeq 500/550 High Output Kit v2.5 (75 Cycles) | Illumina | 20024906 |
| Qubit dsDNA HS Assay kit | Invitrogen | Q32854 |
| Qiagen Maxiprep plasmid kit | Qiagen | 12163 |
| Bioanalyzer D1000 screentape | Agilent | 5067-5582 |
| DNA Clean & Concentrator kit | Zymo Research | D4034 |
| Gel Purification kit | Zymo Research | D4008 |
| Quick Blunting Kit | New England BioLabs | E1201 |
| NEBNext Ultra II DNA Library Prep Kit for Illumina | New England BioLabs | E7645 |
| NEBNext Ultra II RNA Library Prep Kit for Illumina | New England BioLabs | E7770 |
| NEBNext Poly(A) mRNA Magnetic Isolation Module | New England BioLabs | E7490 |
| RNeasy Plus Kit | Qiagen | 74134 |
| ProtoScript II First Strand cDNA Synthesis Kit | New England BioLabs | E6560 |
| Monarch Genomic DNA Purification Kit | New England BioLabs | T3010 |
| EZ DNA Methylation-Lightning Kit | Zymo Research | D5030 |
| Deposited Data | ||
| Processed data | GEO | GSE241927 |
| Raw data (ChIP-seq) | GEO | GSE241924 |
| Raw data (Micro-C) | GEO | GSE241925 |
| Raw data (RNA-seq) | GEO | GSE241926 |
| GIST Patient37 (ChIP-seq, RNA-seq, DNA methylation) | GEO | GSE107447 |
| scRNA-seq53 (Gut Cell Atlas) | ArrayExpress | E-MTAB-9543, E-MTAB-9536, E-MTAB-9532, E-MTAB-9533 |
| Experimental Models: Cell Lines | ||
| GIST-T1 cell line | Cosmo Biosciences | PMC-GIST01C |
| Oligonucleotides | ||
| FGF4 real time PCR primers: CCAACAACTACAACGCCTACGA (Forward) CCCTTCTTGGTCTTCCCATTCT (Reverse) | IDT | N/A |
| FGF3 real time PCR primers: ATGCTTCGGAGCACTACAGC (Forward) CACGTACCACAGTCTCTCGG (Reverse) | IDT | N/A |
| Ribosomal protein, large, P0 (RPLP0) real time PCR primers: TCCCACTTGCTGAAAAGGTCA (Forward) CCGACTCTTCCTTGGCTTCA (Reverse) | IDT | N/A |
| Locus-specific Bisulfite-sequencing (CTCF#1): AAATTTAAGAATATTAAAGGTGGGAAAG (Forward) ACCTAAAAATTACAATTAACTCAACCC (Reverse) | IDT | N/A |
| Locus-specific Bisulfite-sequencing (CTCF#2): TGGTGGTTTAGTTTGTTTTGAATTAAGA (Forward) ACCCACCTTAAAATAAAAATTAAAACCAA (Reverse) | IDT | N/A |
| Locus-specific Bisulfite-sequencing (CTCF#3): ATTGTGATTGGTTGTGTTTTATATGGTGTA (Forward) AATCCTAAATCCAAACCCAAATCC (Reverse) | IDT | N/A |
| Locus-specific Bisulfite-sequencing (CTCF#4): ATGAAATGTAGTAATGTTTTTTGTATATGG (Forward) ACTCTATCCTTTAAAAAACAACCCC (Reverse) | IDT | N/A |
| Recombinant DNA | ||
| U6-sgRNA-CMV-Cas9-T2A-eGFP (piggybac construct) | This study | N/A |
| Software and Algorithms | ||
| R version 4.3.1 | R Core Team | https://www.r-project.org |
| Benchling (for sgRNA design) | Benchling | https://www.benchling.com |
| Juicer tools | Github | https://github.com/aidenlab/juicer |
| BWA aln version 0.7.4 | Github | https://github.com/lh3/bwa |
| Picard toolkit 2.9.2 | Broad Institute | https://broadinstitute.github.io/picard/ |
| MEME suite (version 4.11.3-1) | MEME suite | https://meme-suite.org/meme/doc/download.html |
| RSEM version 1.3.2 | Github | https://github.com/deweylab/RSEM |
| DESeq2 version 1.40.2 | Bioconductor | https://bioconductor.org/packages/release/bioc/html/DESeq2.html |
| plotgardener version 1.6.1 | Bioconductor | https://bioconductor.org/packages/release/bioc/html/plotgardener.html |
| strawr version 0.0.91 | CRAN | https://cran.r-project.org/web/packages/strawr/index.html |
| GENOVA version 1.0.1 | Github | https://github.com/robinweide/GENOVA |
| Pheatmap version 1.0.12 | CRAN | https://cran.r-project.org/web/packages/pheatmap/index.html |
| SCANPY | Github | https://github.com/scverse/scanpy |
Highlights:
FGF oncogenes insulated from a potent enhancer by multiple redundant CTCF sites
RCMC maps delineate CTCF contacts and boundaries that separate genes from enhancer
Combinatorial disruption of CTCF sites results in aberrant enhancer-promoter contact
Contact between enhancer and promoter initiation sites correlates with FGF3 induction
Acknowledgments
We thank members of the Bernstein and Hansen labs and the Gene Regulation Observatory for thoughtful discussions and feedback. This work was supported by funds from the NCI/NIH Director’s Fund (DP1CA216873 to B.E.B.), the NIH New Innovator Award (DP2GM140938 to A.S.H.), the Gene Regulation Observatory at the Broad Institute and the Bertarelli Foundation at Harvard Medical School.
Footnotes
Declaration of Interests
B.E.B. declares outside interests in Fulcrum Therapeutics, HiFiBio, Arsenal Biosciences, Chroma Medicine, Cell Signaling Technologies, and Design Pharmaceuticals. V.G. and A.S.H. have filed a provisional patent on RCMC.
References
- 1. Furlong EEM, and Levine M (2018). Developmental enhancers and chromosome topology. Science 361, 1341–1345. 10.1126/science.aau0320.
- 2. Cramer P (2019). Organization and regulation of gene transcription. Nature 573, 45–54. 10.1038/s41586-019-1517-4.
- 3. Heinz S, Romanoski CE, Benner C, and Glass CK (2015). The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol 16, 144–154. 10.1038/nrm3949.
- 4. Rozowsky J, Gao J, Borsari B, Yang YT, Galeev T, Gürsoy G, Epstein CB, Xiong K, Xu J, Li T, et al. (2023). The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Cell 186, 1493–1511.e40. 10.1016/j.cell.2023.02.018.
- 5. Schoenfelder S, and Fraser P (2019). Long-range enhancer-promoter contacts in gene expression control. Nat. Rev. Genet 20, 437–455. 10.1038/s41576-019-0128-0.
- 6. Preissl S, Gaulton KJ, and Ren B (2023). Characterizing cis-regulatory elements using single-cell epigenomics. Nat. Rev. Genet 24, 21–43. 10.1038/s41576-022-00509-1.
- 7. Hildebrand EM, and Dekker J (2020). Mechanisms and Functions of Chromosome Compartmentalization. Trends Biochem. Sci 45, 385–396. 10.1016/j.tibs.2020.01.002.
- 8. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, et al. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293. 10.1126/science.1181369.
- 9. Goel VY, Huseyin MK, and Hansen AS (2023). Region Capture Micro-C reveals coalescence of enhancers and promoters into nested microcompartments. Nat. Genet 55, 1048–1056. 10.1038/s41588-023-01391-1.
- 10. Hughes JR, Roberts N, McGowan S, Hay D, Giannoulatou E, Lynch M, De Gobbi M, Taylor S, Gibbons R, and Higgs DR (2014). Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat. Genet 46, 205–212. 10.1038/ng.2871.
- 11. Kempfer R, and Pombo A (2020). Methods for mapping 3D chromosome architecture. Nat. Rev. Genet 21, 207–226. 10.1038/s41576-019-0195-2.
- 12. Ganji M, Shaltiel IA, Bisht S, Kim E, Kalichava A, Haering CH, and Dekker C (2018). Real-time imaging of DNA loop extrusion by condensin. Science 360, 102–105. 10.1126/science.aar7831.
- 13. Davidson IF, and Peters J-M (2021). Genome folding through loop extrusion by SMC complexes. Nat. Rev. Mol. Cell Biol 22, 445–464. 10.1038/s41580-021-00349-7.
- 14. Hafner A, and Boettiger A (2023). The spatial organization of transcriptional control. Nat. Rev. Genet 24, 53–68. 10.1038/s41576-022-00526-0.
- 15. Chakraborty S, Kopitchinski N, Zuo Z, Eraso A, Awasthi P, Chari R, Mitra A, Tobias IC, Moorthy SD, Dale RK, et al. (2023). Enhancer-promoter interactions can bypass CTCF-mediated boundaries and contribute to phenotypic robustness. Nat. Genet 55, 280–290. 10.1038/s41588-022-01295-6.
- 16. Ren G, Jin W, Cui K, Rodrigez J, Hu G, Zhang Z, Larson DR, and Zhao K (2017). CTCF-Mediated Enhancer-Promoter Interaction Is a Critical Regulator of Cell-to-Cell Variation of Gene Expression. Mol. Cell 67, 1049–1058.e6. 10.1016/j.molcel.2017.08.026.
- 17. Gabriele M, Brandão HB, Grosse-Holz S, Jha A, Dailey GM, Cattoglio C, Hsieh T-HS, Mirny L, Zechner C, and Hansen AS (2022). Dynamics of CTCF- and cohesin-mediated chromatin looping revealed by live-cell imaging. Science 376, 496–501. 10.1126/science.abn6583.
- 18. Mach P, Kos PI, Zhan Y, Cramard J, Gaudin S, Tünnermann J, Marchi E, Eglinger J, Zuin J, Kryzhanovska M, et al. (2022). Cohesin and CTCF control the dynamics of chromosome folding. Nat. Genet 54, 1907–1918. 10.1038/s41588-022-01232.
- 19. Alexander JM, Guan J, Li B, Maliskova L, Song M, Shen Y, Huang B, Lomvardas S, and Weiner OD (2019). Live-cell imaging reveals enhancer-dependent transcription in the absence of enhancer proximity. Elife 8. 10.7554/eLife.41769.
- 20. Heist T, Fukaya T, and Levine M (2019). Large distances separate coregulated genes in living embryos. Proc. Natl. Acad. Sci. U. S. A 116, 15062–15067. 10.1073/pnas.1908962116.
- 21. Mateo LJ, Murphy SE, Hafner A, Cinquini IS, Walker CA, and Boettiger AN (2019). Visualizing DNA folding and RNA in embryos at single-cell resolution. Nature 568, 49–54. 10.1038/s41586-019-1035-4.
- 22. Benabdallah NS, Williamson I, Illingworth RS, Kane L, Boyle S, Sengupta D, Grimes GR, Therizols P, and Bickmore WA (2019). Decreased Enhancer-Promoter Proximity Accompanying Enhancer Activation. Mol. Cell 76, 473–484.e7. 10.1016/j.molcel.2019.07.038.
- 23. Zuin J, Roth G, Zhan Y, Cramard J, Redolfi J, Piskadlo E, Mach P, Kryzhanovska M, Tihanyi G, Kohler H, et al. (2022). Nonlinear control of transcription through enhancer-promoter interactions. Nature 604, 571–577. 10.1038/s41586-022-04570-y.
- 24. Xiao JY, Hafner A, and Boettiger AN (2021). How subtle changes in 3D structure can create large changes in transcription. Elife 10. 10.7554/eLife.64320.
- 25. Tsujimura T, Takase O, Yoshikawa M, Sano E, Hayashi M, Hoshi K, Takato T, Toyoda A, Okano H, and Hishikawa K (2020). Controlling gene activation by enhancers through a drug-inducible topological insulator. Elife 9. 10.7554/eLife.47980.
- 26. Fulco CP, Nasser J, Jones TR, Munson G, Bergman DT, Subramanian V, Grossman SR, Anyoha R, Doughty BR, Patwardhan TA, et al. (2019). Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet 51, 1664–1669. 10.1038/s41588-019-0538-0.
- 27. Nasser J, Bergman DT, Fulco CP, Guckelberger P, Doughty BR, Patwardhan TA, Jones TR, Nguyen TH, Ulirsch JC, Lekschas F, et al. (2021). Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243. 10.1038/s41586-021-03446-x.
- 28. Karr JP, Ferrie JJ, Tjian R, and Darzacq X (2022). The transcription factor activity gradient (TAG) model: contemplating a contact-independent mechanism for enhancer-promoter communication. Genes Dev. 36, 7–16. 10.1101/gad.349160.121.
- 29. Lupiáñez DG, Kraft K, Heinrich V, Krawitz P, Brancati F, Klopocki E, Horn D, Kayserili H, Opitz JM, Laxova R, et al. (2015). Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell 161, 1012–1025. 10.1016/j.cell.2015.04.004.
- 30. Despang A, Schöpflin R, Franke M, Ali S, Jerković I, Paliou C, Chan W-L, Timmermann B, Wittler L, Vingron M, et al. (2019). Functional dissection of the Sox9-Kcnj2 locus identifies nonessential and instructive roles of TAD architecture. Nat. Genet 51, 1263–1271. 10.1038/s41588-019-0466-z.
- 31. Symmons O, Pan L, Remeseiro S, Aktas T, Klein F, Huber W, and Spitz F (2016). The Shh Topological Domain Facilitates the Action of Remote Enhancers by Reducing the Effects of Genomic Distances. Dev. Cell 39, 529–543. 10.1016/j.devcel.2016.10.015.
- 32. Hnisz D, Weintraub AS, Day DS, Valton A-L, Bak RO, Li CH, Goldmann J, Lajoie BR, Fan ZP, Sigova AA, et al. (2016). Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science 351, 1454–1458. 10.1126/science.aad9024.
- 33. Guo YA, Chang MM, Huang W, Ooi WF, Xing M, Tan P, and Skanderup AJ (2018). Mutation hotspots at CTCF binding sites coupled to chromosomal instability in gastrointestinal cancers. Nat. Commun 9, 1520. 10.1038/s41467-018-03828-2.
- 34. Liu EM, Martinez-Fundichely A, Diaz BJ, Aronson B, Cuykendall T, MacKay M, Dhingra P, Wong EWP, Chi P, Apostolou E, et al. (2019). Identification of Cancer Drivers at CTCF Insulators in 1,962 Whole Genomes. Cell Syst 8, 446–455.e8. 10.1016/j.cels.2019.04.001.
- 35. Debaugny RE, and Skok JA (2020). CTCF and CTCFL in cancer. Curr. Opin. Genet. Dev 61, 44–52. 10.1016/j.gde.2020.02.021.
- 36. Flavahan WA, Drier Y, Liau BB, Gillespie SM, Venteicher AS, Stemmer-Rachamimov AO, Suvà ML, and Bernstein BE (2016). Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature 529, 110–114. 10.1038/nature16490.
- 37. Flavahan WA, Drier Y, Johnstone SE, Hemming ML, Tarjan DR, Hegazi E, Shareef SJ, Javed NM, Raut CP, Eschle BK, et al. (2019). Altered chromosomal topology drives oncogenic programs in SDH-deficient GISTs. Nature 575, 229–233. 10.1038/s41586-019-1668-3.
- 38. Johnstone SE, Reyes A, Qi Y, Adriaens C, Hegazi E, Pelka K, Chen JH, Zou LS, Drier Y, Hecht V, et al. (2020). Large-Scale Topological Changes Restrain Malignant Progression in Colorectal Cancer. Cell 182, 1474–1489.e23. 10.1016/j.cell.2020.07.030.
- 39. Rahme GJ, Javed NM, Puorro KL, Xin S, Hovestadt V, Johnstone SE, and Bernstein BE (2023). Modeling epigenetic lesions that cause gliomas. Cell 186, 3674–3685.e14. 10.1016/j.cell.2023.06.022.
- 40. Tarjan DR, Flavahan WA, and Bernstein BE (2019). Epigenome editing strategies for the functional annotation of CTCF insulators. Nat. Commun 10, 4258. 10.1038/s41467-019-12166-w.
- 41. Williamson I, Kane L, Devenney PS, Flyamer IM, Anderson E, Kilanowski F, Hill RE, Bickmore WA, and Lettice LA (2019). Developmentally regulated expression is robust to TAD perturbations. Development 146. 10.1242/dev.179523.
- 42. Amândio AR, Beccari L, Lopez-Delisle L, Mascrez B, Zakany J, Gitto S, and Duboule D (2021). Sequential in cis mutagenesis in vivo reveals various functions for CTCF sites at the mouse HoxD cluster. Genes Dev. 35, 1490–1509. 10.1101/gad.348934.121.
- 43. Ghavi-Helm Y, Jankowski A, Meiers S, Viales RR, Korbel JO, and Furlong EEM (2019). Highly rearranged chromosomes reveal uncoupling between genome topology and gene expression. Nat. Genet 51, 1272–1282. 10.1038/s41588-019-0462-3.
- 44. Blay J-Y, Kang Y-K, Nishida T, and von Mehren M (2021). Gastrointestinal stromal tumours. Nat Rev Dis Primers 7, 22. 10.1038/s41572-021-00254-5.
- 45. Killian JK, Kim SY, Miettinen M, Smith C, Merino M, Tsokos M, Quezado M, Smith WI Jr, Jahromi MS, Xekouki P, et al. (2013). Succinate dehydrogenase mutation underlies global epigenomic divergence in gastrointestinal stromal tumor. Cancer Discov. 3, 648–657. 10.1158/2159-8290.CD-13-0092.
- 46. Urbini M, Astolfi A, Indio V, Nannini M, Schipani A, Bacalini MG, Angelini S, Ravegnini G, Calice G, Del Gaudio M, et al. (2020). Gene duplication, rather than epigenetic changes, drives FGF4 overexpression in KIT/PDGFRA/SDH/RAS-P WT GIST. Sci. Rep 10, 19829. 10.1038/s41598-020-76519-y.
- 47. Crane E, Bian Q, McCord RP, Lajoie BR, Wheeler BS, Ralston EJ, Uzawa S, Dekker J, and Meyer BJ (2015). Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244. 10.1038/nature14450.
- 48. Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES, et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680. 10.1016/j.cell.2014.11.021.
- 49. Pugacheva EM, Kubo N, Loukinov D, Tajmul M, Kang S, Kovalchuk AL, Strunnikov AV, Zentner GE, Ren B, and Lobanenkov VV (2020). CTCF mediates chromatin looping via N-terminal domain-dependent cohesin retention. Proc. Natl. Acad. Sci. U. S. A 117, 2020–2031. 10.1073/pnas.1911708117.
- 50. Li Y, Haarhuis JHI, Sedeño Cacciatore Á, Oldenkamp R, van Ruiten MS, Willems L, Teunissen H, Muir KW, de Wit E, Rowland BD, et al. (2020). The structural basis for cohesin-CTCF-anchored loops. Nature 578, 472–476. 10.1038/s41586-019-1910-z.
- 51. Nora EP, Caccianini L, Fudenberg G, So K, Kameswaran V, Nagle A, Uebersohn A, Hajj B, Saux AL, Coulon A, et al. (2020). Molecular basis of CTCF binding polarity in genome folding. Nat. Commun 11, 5612. 10.1038/s41467-020-19283-x.
- 52. Sartorelli V, and Lauberth SM (2020). Enhancer RNAs are an important regulatory layer of the epigenome. Nat. Struct. Mol. Biol 27, 521–528. 10.1038/s41594-020-0446-0.
- 53. Elmentaite R, Kumasaka N, Roberts K, Fleming A, Dann E, King HW, Kleshchevnikov V, Dabrowska M, Pritchard S, Bolt L, et al. (2021). Cells of the human intestinal tract mapped across space and time. Nature 597, 250–255. 10.1038/s41586-021-03852-1.
- 54. Ohishi H, Shinkai S, Owada H, Fujii T, Hosoda K, Onami S, Yamamoto T, Ohkawa Y, and Ochiai H (2023). Transcription-coupled changes in higher-order genomic structure and transcription hub viscosity prolong enhancer-promoter connectivity. bioRxiv. 10.1101/2023.11.27.568629.
- 55. Barshad G, Lewis JJ, Chivu AG, Abuhashem A, Krietenstein N, Rice EJ, Ma Y, Wang Z, Rando OJ, Hadjantonakis A-K, et al. (2023). RNA polymerase II dynamics shape enhancer-promoter interactions. Nat. Genet 55, 1370–1380. 10.1038/s41588-023-01442-7.
- 56. Taguchi T, Sonobe H, Toyonaga S-I, Yamasaki I, Shuin T, Takano A, Araki K, Akimaru K, and Yuri K (2002). Conventional and molecular cytogenetic characterization of a new human cell line, GIST-T1, established from gastrointestinal stromal tumor. Lab. Invest 82, 663–665. 10.1038/labinvest.3780461.
- 57. Kramer NE, Davis ES, Wenger CD, Deoudes EM, Parker SM, Love MI, and Phanstiel DH (2022). Plotgardener: cultivating precise multi-panel figures in R. Bioinformatics 38, 2042–2045. 10.1093/bioinformatics/btac057.
- 58. van der Weide RH, van den Brand T, Haarhuis JHI, Teunissen H, Rowland BD, and de Wit E (2021). Hi-C analyses with GENOVA: a case study with cohesin variants. NAR Genom Bioinform 3, lqab040. 10.1093/nargab/lqab040.
- 59. Frankish A, Diekhans M, Jungreis I, Lagarde J, Loveland JE, Mudge JM, Sisu C, Wright JC, Armstrong J, Barnes I, et al. (2021). GENCODE 2021. Nucleic Acids Res. 49, D916–D923. 10.1093/nar/gkaa1087.
- 60. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, and Gingeras TR (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21. 10.1093/bioinformatics/bts635.
- 61. Wolf FA, Angerer P, and Theis FJ (2018). SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15. 10.1186/s13059-017-1382-0.
Associated Data
Data Availability Statement
Accession information for the published datasets analyzed in the manuscript is included in the key resources table, along with information for the original datasets. Original data generated in this study have been deposited at GEO (GSE241927) and are publicly available as of the date of publication.
This paper does not report any original code.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
