SUMMARY
Mammalian genomes are folded into topologically associating domains (TADs), consisting of chromatin loops anchored by CTCF and cohesin. Some loops are cell-type specific. Here we asked whether CTCF loops are established by a universal or locus-specific mechanism. Investigating the molecular determinants of CTCF clustering, we found that CTCF self-association in vitro is RNase sensitive and that an internal RNA-binding region (RBRi) mediates CTCF clustering and RNA interaction in vivo. Strikingly, deleting the RBRi impairs about half of all chromatin loops in mESCs and causes deregulation of gene expression. Disrupted loop formation correlates with diminished clustering and chromatin binding of RBRi mutant CTCF, which in turn results in a failure to halt cohesin-mediated extrusion. Thus, CTCF loops fall into at least two classes: RBRi-independent and RBRi-dependent loops. We speculate that RBRi-dependent loops may provide a molecular mechanism for establishing cell-specific CTCF loops, potentially regulated by RNA(s) or other RBRi-interacting partners.
In Brief
CTCF is an architectural protein that mediates chromatin looping. Here, Hansen et al. demonstrate that an internal RNA-binding region (RBRi) in CTCF mediates CTCF clustering and that deletion of the RBRi causes disruption of about half of all chromatin loops in mouse embryonic stem cells.
INTRODUCTION
Mammalian genomes are organized at multiple scales ranging from nucleosomes (hundreds of base pairs) to chromosome territories (hundreds of megabases) (Hansen et al., 2018a). At the intermediate scale of kilobases to megabases, mammalian interphase chromosomes are organized into local units known as topologically associating domains (TADs) (Dixon et al., 2012; Nora et al., 2012). TADs are characterized by the feature that two loci within the same TAD contact each other more frequently, whereas two equidistant loci in adjacent TADs contact each other less frequently. Thus, TADs are thought to regulate contact probability between enhancers and promoters and therefore influence gene expression (Dekker and Mirny, 2016; Merkenschlager and Nora, 2016; Rowley and Corces, 2018; Symmons et al., 2014).
Mechanistically, CCCTC-binding factor (CTCF) and the cohesin complex are hypothesized to form TADs through a loop extrusion mechanism: the cohesin ring complex entraps chromatin and extrudes intra-chromosomal chromatin loops until encountering convergently oriented chromatin-bound CTCF molecules on both arms of the loop, halting cohesin-mediated extrusion (Alipour and Marko, 2012; Fudenberg et al., 2016, 2017; Ganji et al., 2018; Sanborn et al., 2015). CTCF and cohesin then hold together a TAD as a chromatin loop until these loop anchor proteins dissociate from chromatin. Thus, both loop extrusion and chromatin loop maintenance are likely dynamic processes (Fudenberg et al., 2016; Hansen et al., 2017, 2018a). Consistent with a key role for CTCF and cohesin, TADs and chromatin loops largely disappear after acute depletion of CTCF and cohesin (Gassler et al., 2017; Nora et al., 2017; Rao et al., 2017; Schwarzer et al., 2017; Wutz et al., 2017). Moreover, CTCF and several cohesin subunits are among the most frequently mutated proteins in cancer (Hnisz et al., 2017; Lawrence et al., 2014), while disruption of TAD boundaries can cause developmental defects (Lupiáñez et al., 2015).
However, despite their critical role in shaping the three-dimensional (3D) genome organization, we know surprisingly little mechanistically about CTCF and cohesin. Although it is clear that CTCF binds DNA through its 11-ZF domain, the function of CTCF’s largely unstructured N- and C-terminal domains remains mostly unknown (Martinez and Miranda, 2010; Merkenschlager and Nora, 2016). For example, it is not clear which domain(s) in CTCF are required for its interaction with cohesin and for loop formation. These observations motivated us to investigate whether a universal molecular mechanism controls CTCF- and cohesin-anchored loops, or whether distinct classes of CTCF loops exist.
Along these lines, we and others have recently shown that CTCF forms clusters and foci in cells (Hansen et al., 2017; Zirkel et al., 2018), and TADs are often demarcated by multiple CTCF binding sites (Kentepozidou et al., 2019). Beyond CTCF, recent work has clearly shown that many proteins are non-homogeneously distributed in the nucleus and dynamically exchanging between regions of local high concentration, termed clusters, condensates, or hubs (Boehning et al., 2018; Cho et al., 2018; Chong et al., 2018). Although in some cases weak and transient protein-protein interactions are sufficient to form and maintain clusters, several examples exist in which nucleic acids can nucleate and/or stabilize protein clusters or hubs (Banani et al., 2017; Chong et al., 2018; McSwiggen et al., 2019; Shin and Brangwynne, 2017). However, the functional role of clustering is poorly understood. We have previously shown that both CTCF and cohesin are clustered in mammalian nuclei (Hansen et al., 2017) and recently that protein-protein interactions play a dominant role in cohesin self-association (Cattoglio et al., 2019). We therefore chose to investigate the molecular determinants of CTCF clustering in cells and their role in regulating 3D genome organization and chromatin looping.
Here, through an integrated approach combining genome editing, single-molecule and super-resolution imaging, in vitro biochemistry, PAR-CLIP, ChIP-seq, RNA sequencing (RNA-seq), and Micro-C, we identify critical functions of an RNA-interaction domain C-terminal to CTCF’s ZF 11 (RBRi). Specifically, we show that the RBRi mediates CTCF clustering and that loss of the RBRi disrupts only a subset of CTCF-mediated chromatin loops and affects the expression of ~500 genes. Our genome-wide analyses suggest that CTCF boundaries can be classified into at least two sub-classes: RBRi dependent and RBRi independent. More generally, our work reveals a potential mechanism for establishing and maintaining specific CTCF loops, which may direct the establishment of cell type-specific chromatin topology during development.
RESULTS
CTCF Self-Associates in an RNA-Dependent Manner
We have previously shown that CTCF forms clusters in mouse embryonic stem cells (mESCs) and human U2OS cells (Hansen et al., 2017), and others have reported that CTCF forms larger foci in senescent cells (Zirkel et al., 2018). But what are the mechanisms underlying CTCF cluster formation? Because clusters necessarily arise through direct or indirect self-association, we took a biochemical approach to probe if and how CTCF self-associates. Because CTCF overexpression causes artifacts and alters cell physiology (Hansen et al., 2017; Rasko et al., 2001), we used CRISPR/Cas9-mediated genome editing to generate a mESC line in which one CTCF allele was 3xFLAG-Halo tagged and the other allele was V5-SNAPf tagged (C62; Figures 1A and 1B). Consistent with CTCF clustering, when we immunoprecipitated V5-tagged CTCF, FLAG-tagged CTCF was pulled down along with it (co-immunoprecipitation [coIP]; Figure 1C; additional replicate and quantifications in Figures S1A and S1B). Conversely, immunoprecipitation of FLAG-tagged CTCF also co-precipitated significant amounts of V5-tagged CTCF (Figure S1C). This observation using endogenously tagged CTCF confirms and extends earlier studies that observed CTCF self-association using exogenously expressed CTCF (Pant et al., 2004; Saldaña-Meyer et al., 2014; Yusufzai et al., 2004). But what is the mechanism of CTCF self-interaction? Benzonase treatment, which degrades both DNA and RNA (Figure S1D), strongly reduced the coIP efficiency (Figures 1C, 1D, and S1A–S1C), whereas treatment with DNaseI had a significantly weaker effect on the CTCF self-coIP efficiency (Figure S1E). By contrast, treatment with RNase A alone severely impaired CTCF self-interaction (Figures 1C, 1D, and S1A–S1C). We conclude that CTCF self-associates in a biochemically stable manner in vitro that is largely RNA dependent and largely DNA independent.
An RNA-Binding Region (RBRi) in CTCF Mediates RNA Binding and Clustering
Our finding that CTCF self-association is predominantly RNA mediated is perhaps surprising, as CTCF is generally thought of as a DNA-binding protein. However, it confirms studies by Saldaña-Meyer et al. (2014), who also showed that CTCF self-association depends on RNA but not DNA. Importantly, Saldaña-Meyer et al. (2014) described an RNA-binding region (RBR) spanning ZFs 10 and 11 and the entire C terminus, and within this region identified 38 amino acids C-terminal to CTCF’s ZF 11 that are necessary for RNA binding and for CTCF multimerization in vitro (Figure 2A). We refer henceforth to this required internal region in the RBR as the RBRi. We therefore asked whether CTCF clustering in cells is also RBRi dependent. The RBRi largely corresponds to mouse CTCF exon 10, which we endogenously and homozygously replaced with a 3xHA tag in C59 Halo-CTCF mESCs (Hansen et al., 2017) to generate clone C59D2 ΔRBRi (Halo-ΔRBRi-CTCF = Halo-CTCFΔ576–611; Figures 2A, 2B, and S1F). ΔRBRi-CTCF mESCs express a full-length CTCF in which most of the RBRi (36 amino acids: N576–D611) has been substituted with a short linker (GDGAGLINS) followed by a 3xHA tag, preserving the original exon 10 structure and length. Interestingly, while Halo-ΔRBRi-CTCF protein levels are only mildly reduced compared with Halo-WT-CTCF, as measured by flow cytometry in live cells (Figures 2C and S1G), ΔRBRi-CTCF mESCs showed a ~2-fold growth defect, suggesting that the RBRi plays an important physiological role (Figure 2D).
First, we sought to confirm whether the RBRi is required for RNA binding. Because CTCF was previously shown to bind the anti-sense transcript of human p53, hWRAP53 RNA (Saldaña-Meyer et al., 2014), we purified recombinant WT-CTCF (r-WT-CTCF) or ΔRBRi-CTCF (r-ΔRBRi-CTCF) from insect cells (Figure S1I) and tested binding to hWRAP53 RNA in vitro. We observed a ~3-fold reduction in hWRAP53 RNA binding for ΔRBRi-CTCF compared with WT-CTCF in vitro (Figure 2E; additional replicates in Figure S1J). Thus, the RBRi mediates RNA binding but is not absolutely required for it. Next, we tested if the RBRi also mediates RNA binding in cells using photo-activatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP) (Hafner et al., 2010). ΔRBRi-CTCF mESCs showed substantially lower RNA binding, using 32P-radiolabeled RNA as the readout, compared with WT-CTCF mESCs (Figure 2F). Consistent with our in vitro experiments (Figure 2E), ΔRBRi-CTCF mESCs showed reduced, but not abolished, RNA binding. Taken together, we conclude that CTCF directly interacts with RNA and that the RBRi significantly contributes to RNA binding by CTCF but that some RNA binding remains after RBRi loss. This is consistent with CTCF bearing multiple, perhaps partially redundant, RNA-binding regions (Saldaña-Meyer et al., 2019, in this issue of Molecular Cell).
To test if the RBRi also mediates CTCF clustering, we performed super-resolution photo-activated localization microscopy (PALM) imaging in fixed mESCs. We labeled Halo-CTCF with the PA-JF549 dye (Grimm et al., 2016), localized individual CTCF molecules inside the nucleus with a precision of ~13 nm (Figure S1H), and reconstructed CTCF nuclear organization. Indeed, WT-CTCF (Figure 2G) showed noticeably higher clustering than ΔRBRi-CTCF (Figure 2H), which we further verified and quantified using Ripley’s L function (Besag, 1977; Boehning et al., 2018; Ripley, 1976) (Figure 2I; L[r]−r values above 0 indicate clustering). We note that Ripley’s L function is normalized by abundance such that lower clustering for ΔRBRi-CTCF is not due simply to lower protein levels. These results suggest that CTCF largely self-associates in an RBRi-dependent manner and that CTCF clustering is significantly reduced, though not entirely abolished, in ΔRBRi-CTCF mESCs.
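As a minimal sketch of how Ripley's L function summarizes clustering independently of abundance, the following Python computes L(r) − r for 2D localizations. It is an illustration under stated assumptions, not the analysis pipeline used here: the function name `ripley_l_minus_r`, the toy coordinates, the 1000 × 1000 nm region, and the omission of edge correction are all hypothetical simplifications.

```python
import math

def ripley_l_minus_r(points, r, area):
    """Return L(r) - r for 2D points in a region of the given area.

    Naive estimator without edge correction (an illustrative simplification).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Count ordered pairs (i != j) separated by at most r.
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                pairs += 1
    # Ripley's K divides the pair count by n * (n - 1) / area, so the
    # overall number of localizations is normalized out.
    k = area * pairs / (n * (n - 1))
    return math.sqrt(k / math.pi) - r

# Toy localizations (nm) in a hypothetical 1000 x 1000 nm region.
offsets = [(0, 0), (10, 0), (-10, 0), (0, 10), (0, -10)]
centers = [(200, 200), (800, 200), (200, 800), (800, 800)]
clustered = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
dispersed = [(x, y) for x in range(100, 1000, 200) for y in range(125, 1000, 250)]

area = 1000 * 1000
l_clustered = ripley_l_minus_r(clustered, 50, area)  # above 0: clustered
l_dispersed = ripley_l_minus_r(dispersed, 50, area)  # below 0: dispersed
```

Both toy sets contain 20 points, so the difference in sign reflects spatial arrangement rather than abundance.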
Because our RNA-binding experiments suggest that CTCF directly interacts with at least some RNA(s) (Figures 2E and 2F), it is tempting to speculate that RNA(s) directly bind CTCF and hold together CTCF clusters in vivo. However, our PALM and coIP experiments cannot distinguish a mechanism in which several CTCF proteins directly bind RNA from a model in which CTCF indirectly interacts with an unknown factor, which then mediates CTCF self-association in an RNase-sensitive manner. We also note that the RBRi region has been reported to be regulated by CK2-mediated phosphorylation (El-Kady and Klenova, 2005; Klenova et al., 2001). Although the RBRi contains a putative nuclear localization signal (NLS), ΔRBRi-CTCF is still nuclear (Figures 2G, 2H, and 2J), consistent with prior work showing that nuclear localization and DNA binding are unaffected upon mutating (Klenova et al., 2001) or deleting (Saldaña-Meyer et al., 2014) the RBRi. Finally, although CTCF is clearly not generally misfolded in our ΔRBRi-CTCF mESCs, we cannot exclude slight effects on adjacent protein regions (e.g., ZF10–11 and the C-terminal regions), which could also contribute to the effects observed here.
The CTCF RBRi Regulates 3D Genome Organization, but Not Compartments
CTCF plays a major role in regulating 3D genome organization. We therefore next investigated whether impaired CTCF clustering (Figures 2G–2I), self-association (Figures 1C and 1D), RNA interaction (Figures 2E and 2F), and target searching of ΔRBRi-CTCF (Hansen et al., 2018b) might also affect 3D genome organization, using a high-resolution genome-wide chromosomal conformation capture (3C) assay, Micro-C. Unlike Hi-C, which uses restriction enzymes, Micro-C fragments chromatin to single nucleosomes using micrococcal nuclease and generates 3D contact maps of the genome at all biologically relevant resolutions (Hsieh et al., 2015, 2016). Micro-C was originally developed for analyzing the small yeast genome; here we have adapted the protocol for large-genome organisms. Micro-C successfully recapitulates all the 3D genome features previously identified by Hi-C (Figures S2 and S3; see Data S1 for the protocol). We applied this Micro-C protocol to C59 (WT-CTCF) and C59D2 (ΔRBRi-CTCF) mESCs (Figure 2A) over three replicates and generated ~668 million and ~694 million unique contacts, respectively. To test Micro-C, we assayed both reproducibility and consistency. Our Micro-C contact maps were highly reproducible between replicates (Figure S2), and the contact maps in WT-CTCF mESCs were consistent with Hi-C maps in mESCs, though notably, Micro-C reached “loop resolution” at substantially lower sequencing depth (Figure S3A). We also performed CTCF and cohesin (Smc1a) chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) in two replicates for WT-CTCF and ΔRBRi-CTCF mESCs (see below). We then surveyed 3D genome organization and analyzed features across several scales (Figure 3A) including compartments, TADs, loops, and stripes (Fudenberg et al., 2017), and began our analysis at the large end of the scale: compartments.
Mammalian chromosomes can be divided into two major compartments (Lieberman-Aiden et al., 2009): A compartments, composed mainly of active euchromatin, and B compartments, composed mainly of inactive and gene-poor heterochromatin and lamina-associated domains (van Steensel and Belmont, 2017). We observed no significant change in compartmentalization when comparing WT-CTCF and ΔRBRi-CTCF mESCs (Figure 3B), nor did we observe significant changes in A-A, A-B, B-A, or B-B contact frequency (Figure 3C). Moreover, averaged over the whole genome, we observed the same contact probability scaling with genomic distance for WT-CTCF and ΔRBRi-CTCF mESCs (Figure 3D). We conclude that the CTCF RBRi does not affect the global distribution of active and inactive chromatin, consistent with compartments being largely unaltered after near-complete CTCF degradation (Nora et al., 2017; Wutz et al., 2017).
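To illustrate what "contact probability scaling with genomic distance" means operationally, the sketch below averages a binned contact matrix along each diagonal. The function name `contact_probability_by_distance` and the toy matrix are hypothetical; real analyses work on normalized, genome-wide maps rather than a 5 × 5 example.

```python
def contact_probability_by_distance(matrix):
    """Mean contact frequency at each bin separation s (the s-th diagonal)."""
    n = len(matrix)
    return [
        sum(matrix[i][i + s] for i in range(n - s)) / (n - s)
        for s in range(n)
    ]

# Toy symmetric map in which contacts decay as 1 / (1 + separation).
toy_map = [[1.0 / (1 + abs(i - j)) for j in range(5)] for i in range(5)]
p_of_s = contact_probability_by_distance(toy_map)
```

Comparing two such curves, one per genotype, is the kind of comparison summarized by a scaling plot.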
Loss of CTCF RBRi Disrupts a Subset of TADs
Having analyzed compartments, we next zoomed in and analyzed TADs. TADs are demarcated by a pair of strong boundaries, or insulators, which are frequently bound by the architectural proteins CTCF and cohesin and typically span lengths of ~100 kb to ~1 Mb in mouse and human genomes (Merkenschlager and Nora, 2016; Rowley and Corces, 2018). TADs are characterized by the feature that two loci inside the same TAD contact each other more frequently than two equidistant loci in different TADs (Dixon et al., 2012; Nora et al., 2012). We defined TADs using either arrowhead or insulation score (Crane et al., 2015; Rao et al., 2014) and arbitrarily chose a cut-off value to obtain ~3,500 TADs in WT-CTCF mESCs, corresponding to the previously reported TAD size and number (Forcato et al., 2017). Although the inferred number and size of TADs depends on the algorithm and the resolution of the maps (Forcato et al., 2017), we generally observed fewer and larger TADs in ΔRBRi-CTCF mESCs (Figures 4A and S4A). In brief, our insulation analysis called 3,666 and 2,793 TADs with average TAD sizes of ~715 kb and ~936 kb in WT-CTCF and ΔRBRi-CTCF mESCs, respectively (Figure 4B). We next aggregated over all TADs genome-wide (Figure 4C) and found TADs to be somewhat weaker in ΔRBRi-CTCF mESCs and characterized by weaker insulation strength (Figures 4D and S4B–S4E).
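The idea behind insulation-based TAD calling can be sketched as follows: for each bin boundary, average the contacts that cross it within a sliding square, then treat local minima as candidate boundaries. This is a simplified illustration in the spirit of the insulation score, not the procedure or parameters used for the analysis above; the function names, window size, and toy matrix are assumptions.

```python
def insulation_scores(matrix, window):
    """Mean contact frequency in the square spanning each bin boundary.

    For index i, average matrix[a][b] over a in [i - window, i) and
    b in [i, i + window). Positions too close to the edge stay None.
    """
    n = len(matrix)
    scores = [None] * n
    for i in range(window, n - window + 1):
        total = 0.0
        for a in range(i - window, i):
            for b in range(i, i + window):
                total += matrix[a][b]
        scores[i] = total / (window * window)
    return scores

def call_boundaries(scores):
    """Return indices that are strict local minima of the insulation score."""
    out = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if None not in (left, mid, right) and mid < left and mid < right:
            out.append(i)
    return out

# Toy map: two 6-bin domains with strong contacts inside, weak across.
toy = [[10.0 if (a < 6) == (b < 6) else 1.0 for b in range(12)] for a in range(12)]
scores = insulation_scores(toy, window=2)
boundaries = call_boundaries(scores)
```

In this toy map the only minimum falls between bins 5 and 6, where the two domains meet. Weaker boundaries would appear as shallower minima, which is why the number of called TADs depends on the chosen cut-off.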
We next inspected local regions that were altered in ΔRBRi-CTCF mESCs, superimposing Micro-C and ChIP-seq results. Of note, when using spike-in normalization for ChIP-seq analysis, the ΔRBRi-CTCF signal appeared globally reduced compared to WT-CTCF, while Smc1a binding was largely unaltered at preserved sites (~60% of WT Smc1a binding sites; Figures S5B and S5C). Because biochemical experiments showed reduced stability of the ΔRBRi-CTCF protein after cell lysis (Figure 6A), we could not determine whether the dampened ChIP-seq signal resulted from reduced ChIP efficiency, diminished genomic occupancy of ΔRBRi-CTCF, or both. We thus decided to normalize data by sequencing depth instead and avoid direct comparisons between WT-CTCF and ΔRBRi-CTCF ChIP-seq signals to draw conclusions. When inspecting local genomic regions, we noticed that CTCF and cohesin (Smc1a) binding was strongly depleted at some specific loci at ΔRBRi-affected boundaries (Figure 4E, blue arrows in a and b). Conversely, CTCF and cohesin binding was largely retained at unaffected boundaries (Figure 4E, browser track c). We conclude that the RBRi contributes to CTCF’s role in forming TADs. This is unlikely to be an indirect effect, because (1) the cell cycle phase distribution was identical between WT-CTCF and ΔRBRi-CTCF mESCs, despite the growth defect of the latter (Figures S4F and S4G); (2) although the ΔRBRi-CTCF expression level was somewhat lower (reduced by 28%) compared with WT-CTCF (Figures 2C and S1G), Nora et al. (2017) previously demonstrated that TAD organization in mESCs is preserved for the most part even after 85% reduction of CTCF levels; and (3) fluorescence recovery after photobleaching (FRAP) experiments show that the residence time for binding to cognate sites is approximately the same for WT-CTCF and ΔRBRi-CTCF (Hansen et al., 2018b).
CTCF Loops Fall into RBRi-Dependent and -Independent Sub-classes, and Loss of the CTCF RBRi Causes Longer Stripes
Many TADs show corner peaks of Micro-C signal at their summit, suggesting that they are held together as loop structures (Fudenberg et al., 2017; Rao et al., 2014) (see also Figures 3A and 4C). These loops are thought to be formed when pairs of chromatin-bound CTCF proteins block a loop-extruding cohesin (Fudenberg et al., 2017), yet the protein domain(s) in CTCF required for this are unknown. To test whether the RBRi plays any role in loop formation and/or maintenance, we analyzed the contact maps at high resolution (~1–5 kb) and identified 14,372 loops in WT-CTCF mESCs using the method described by Rao et al. (2014). Overall, out of 14,372 called loops, 57% (8,189 loops) were weakened by at least 1.5-fold in ΔRBRi-CTCF mESCs and 39% (5,490 loops) by at least 2-fold relative to wild type (WT; Figures 5A and 5B), and loop strength was reproducible between replicates (Figure S4H). We next performed genome-wide loop aggregation analysis. The loop strength in C59 WT-Halo-CTCF mESCs is about as strong as in mESCs with untagged CTCF (Bonev et al., 2017) (Figure S4I), confirming that our endogenously tagged Halo-CTCF mESCs behave as WT mESCs (Hansen et al., 2017). However, the loop strength was greatly reduced in ΔRBRi-CTCF mESCs (Figures 5C and S4I). As a comparison, we reanalyzed Hi-C data at loops in mESCs with a CTCF degradation tag from Nora et al. (2017) and found that the loss in loop strength upon near complete CTCF degradation is actually comparable with the defect in loop strength we observe for ΔRBRi-CTCF mESCs (Figures S4I and S4K). Although technical differences between Micro-C and Hi-C make a direct comparison difficult, these results nevertheless emphasize the loop strength defect in ΔRBRi-CTCF mESCs.
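The fold-change thresholds used above can be made concrete with a small sketch that counts the loops whose WT-to-mutant strength ratio meets a cut-off. The function name `fraction_weakened` and the toy strengths are hypothetical, and how loop strength itself is scored is outside the scope of this example.

```python
def fraction_weakened(wt_strengths, mutant_strengths, fold):
    """Fraction of loops whose WT / mutant strength ratio is at least `fold`.

    Loops with zero mutant signal are counted as weakened (fully lost).
    """
    if len(wt_strengths) != len(mutant_strengths):
        raise ValueError("inputs must be paired per loop")
    if not wt_strengths:
        raise ValueError("need at least one loop")
    weakened = sum(
        1
        for wt, mut in zip(wt_strengths, mutant_strengths)
        if wt > 0 and (mut == 0 or wt / mut >= fold)
    )
    return weakened / len(wt_strengths)

# Toy loop strengths (arbitrary units) for four loops.
wt = [4.0, 4.0, 4.0, 4.0]
mutant = [4.0, 2.5, 2.0, 1.0]  # ratios: 1.0, 1.6, 2.0, 4.0
frac_1_5 = fraction_weakened(wt, mutant, 1.5)
frac_2 = fraction_weakened(wt, mutant, 2.0)
```

Because the 2-fold set is nested inside the 1.5-fold set, the second fraction can never exceed the first.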
Surprisingly, the effect of deleting the RBRi was highly heterogeneous: some CTCF loops were unaffected or even strengthened, whereas others were significantly weakened or completely disrupted in ΔRBRi-CTCF mESCs (Figure 5D). Qualitatively, we could distinguish two general categories of loops: an RBRi-independent class (Figure 5D, left) and an RBRi-dependent class (Figure 5D, right). When we overlaid the ChIP-seq tracks on the Micro-C contact maps, we noticed that CTCF and cohesin (Smc1a) binding was largely preserved at the anchors of RBRi-independent loops, as expected. However, we could distinguish at least two sub-types of loops that were lost in ΔRBRi-CTCF mESCs: (1) partial or complete loss of ΔRBRi-CTCF and/or cohesin binding at least at one loop anchor (Figure 5D, type 1 loops) and (2) no significant change in either ΔRBRi-CTCF or cohesin binding (Figure 5D, type 2 loops). Thus, whereas loop loss for type 1 loops can be explained through loss of CTCF and/or cohesin binding, differential changes in CTCF and cohesin binding cannot readily explain loss of type 2 loops. We discuss the mechanistic implications of these findings in greater detail below.
Finally, we analyzed stripes or flames (Fudenberg et al., 2017). We compiled contact matrices using the top 10,000 WT-CTCF ChIP signals at the center of the plot and found that stripes in ΔRBRi-CTCF mESCs are less intense at shorter distances (<200 kb from the CTCF peaks) but continue for ~200 kb longer than in WT-CTCF cells (Figures 5E and 5F; red arrow; examples in Figure 5G). Although the mechanistic basis of stripes remains unclear, the loop extrusion model posits that they are formed by cohesin-mediated extrusion (Fudenberg et al., 2017; Figure 5H). We speculate that longer stripes in ΔRBRi-CTCF mESCs could be due to ~200 kb larger TADs in ΔRBRi-CTCF mESCs (Figures 4B and S4D). If cohesin has to extrude longer, on average, to reach a functional CTCF site in ΔRBRi-CTCF mESCs, this might result in longer stripes, as outlined in Figure 5H. In summary, our Micro-C analysis reveals that the CTCF RBRi domain regulates genome organization at the level of TADs, loops, and stripes in mESCs, without affecting A and B compartments.
Loss of the CTCF RBRi Reveals Distinct Sub-classes of TADs and Loops
We next asked why some CTCF boundaries depend on the RBRi but others do not (Figure 5D). First, we tested whether the RBRi is required for CTCF interaction with cohesin using coIPs. Both WT-CTCF and ΔRBRi-CTCF immunoprecipitation pulled down cohesin (subunits Rad21 and Smc1a in Figures 6A and S5A). This is especially notable because the protein stability of ΔRBRi-CTCF during the IP procedure was significantly reduced (compare CTCF inputs in Figures 6A and S5A). Thus, CTCF interacts with cohesin in an RBRi-independent manner, implying that loop loss is not due simply to a failure of ΔRBRi-CTCF to interact with cohesin.
Next, we analyzed our CTCF and cohesin (Smc1a) ChIP-seq data for WT-CTCF and ΔRBRi-CTCF mESCs in more detail (Figures 6B, S5B, and S5C). Our ChIP-seq data were both reproducible between replicates and consistent with other studies in mESCs (Figure S6). Consistent with FRAP experiments, which showed no detectable change in residence time at cognate binding sites for ΔRBRi-CTCF (Hansen et al., 2018b), ΔRBRi-CTCF still binds the majority of CTCF sites, although the number and occupancy levels were generally reduced (63% of 81,785 WT-CTCF ChIP-seq peaks maintained in ΔRBRi-CTCF mESCs, Figure S5C; spike-in normalized ChIP-seq in Figure S5B). Similarly, about 60% of the cohesin binding sites detected in WT-CTCF mESCs were also occupied in ΔRBRi-CTCF mESCs (Figure S5C).
To further dissect the site-specific features from the genome-wide average, we divided loops into four quartiles (Figure 6C), such that Q1 contains loops that are largely lost in ΔRBRi-CTCF mESCs and Q4 contains loops that are largely unaffected or even strengthened in ΔRBRi-CTCF mESCs. We then characterized the CTCF and Smc1a binding profiles at both anchors of loops and only analyzed loops that satisfy three prerequisites: (1) CTCF shows ChIP-seq signal at both anchors in WT cells, (2) cohesin (Smc1a) shows ChIP-seq signal at both anchors in WT cells, and (3) a pair of convergent CTCF cognate sites is present at the two anchors. We then analyzed CTCF and cohesin (Smc1a) ChIP enrichment at the filtered loop anchors for each quartile (Figure 6D). Consistent with a key role for CTCF and cohesin, Q1 loops that were disrupted the most in ΔRBRi-CTCF mESCs also had the lowest CTCF and cohesin occupancy in ΔRBRi-CTCF mESCs (see also histograms in Figure S5D), while they were just as strongly, if not more, occupied as Q2–Q4 loops in WT-CTCF mESCs.
Our qualitative analysis in Figure 5D suggested that RBRi-dependent loops can be subdivided into two types depending on their CTCF and cohesin dependence. If this interpretation is correct and robust, we should be able to recover these types naturally after applying an unsupervised clustering algorithm. To test this, we applied k-means clustering (using k = 3) on the most affected loops (Q1) and recovered three loop clusters, similar to Figure 5D (Figures 6E and S5E). Cluster 1 and 2 loops (76%) are lost because of partial and near complete loss of CTCF and cohesin binding, respectively (type 1 in Figure 5D); cluster 3 loops (24%) are affected loops without strong CTCF and cohesin loss (type 2 in Figure 5D). Thus, this analysis confirms our qualitative assessment in Figure 5D.
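As a hedged illustration of the unsupervised step, the sketch below runs a plain Lloyd's k-means with k = 3 on toy per-loop features. The choice of features (log2 changes in CTCF and Smc1a anchor signal), the farthest-point initialization, and the function name `kmeans` are assumptions made for this example; the text above does not specify the feature matrix or the implementation used.

```python
import math

def kmeans(points, k, iterations=50):
    """Lloyd's k-means with farthest-point initialization; returns labels."""
    # Deterministic init: start from the first point, then repeatedly add
    # the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Hypothetical (log2 CTCF change, log2 Smc1a change) per loop anchor pair.
loops = [
    (-1.0, -1.0), (-1.2, -0.9), (-0.9, -1.1),  # partial loss
    (-3.0, -3.1), (-3.2, -2.9), (-2.9, -3.0),  # near-complete loss
    (0.0, 0.1), (-0.1, 0.0), (0.1, -0.1),      # no strong loss
]
labels = kmeans(loops, k=3)
```

The cluster numbering returned by k-means is arbitrary, so clusters have to be interpreted afterwards from their centroid values.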
Could the CTCF loop type be encoded in the DNA-binding sequence motif? We performed de novo motif discovery on the four loop quartiles and observed distinct CTCF binding sequence preferences and potential co-regulators (Figures S7A and S7B). We conclude that loops can be classified into two classes, RBRi dependent and RBRi independent, and that the RBRi-dependent class can be further sub-classified into two types with distinct CTCF and cohesin binding profiles, and that each class correlates with a distinct CTCF DNA-binding motif preference.
Finally, we asked which other genomic features correlate with RBRi-dependent versus RBRi-independent loops. We performed an extensive bioinformatics comparison using 70 previously published datasets in mESCs (Figure S7C). Notably, Q4 loops that were not disrupted in ΔRBRi-CTCF mESCs correlated with transcriptionally active genomic regions (enhancers, promoters; Figure 6F) and were more frequently found in the A compartment (Figure S7D), which is generally associated with active genes. In contrast, Q1 loops were relatively larger and more enriched in the B compartment, which is generally associated with transcriptional repression. These results, albeit inherently correlative, argue against a “cis-model” in which nascent RNA transcripts stabilize CTCF boundaries in an RBRi-dependent manner. Instead, because active sites of transcription are enriched at TAD boundaries (Dixon et al., 2012; Merkenschlager and Nora, 2016), it seems plausible that active transcription may compensate for CTCF boundary weakening in Q4 loops through a CTCF-independent mechanism.
Loss of CTCF RBRi Affects Gene Expression
To evaluate the functional impact of CTCF RBRi deletion on gene expression, we compared RNA-seq of total, ribo-depleted RNA extracted from ΔRBRi-CTCF mESCs with that obtained from WT-CTCF mESCs (two replicates each). A stringent differential expression analysis between the two cell lines (edgeR false discovery rate ≤ 0.05 and DESeq2 adjusted p value ≤ 0.05; see STAR Methods) revealed 496 deregulated genes upon loss of CTCF RBRi, 275 being upregulated and 221 being downregulated, with a mean fold change of ~2.7 (Figures 7A and S7; complete gene list in Table S2; Gene Ontology analysis in Table S3).
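The stringency of requiring both tools to agree can be sketched as a simple intersection of two result tables. This example does not run edgeR or DESeq2 (both are R packages); it assumes their per-gene adjusted values have already been exported, and the function name `consensus_hits` and the gene identifiers are hypothetical.

```python
def consensus_hits(edger_fdr, deseq2_padj, cutoff=0.05):
    """Genes at or below the cutoff in both tables (dicts of gene -> value)."""
    return sorted(
        gene
        for gene, fdr in edger_fdr.items()
        # A gene missing from the second table is treated as not significant.
        if fdr <= cutoff and deseq2_padj.get(gene, 1.0) <= cutoff
    )

# Hypothetical adjusted values for four placeholder genes.
edger = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.20, "geneD": 0.03}
deseq2 = {"geneA": 0.02, "geneB": 0.30, "geneC": 0.01}
hits = consensus_hits(edger, deseq2)
```

Only genes supported by both methods survive, which trades sensitivity for a lower risk of tool-specific false positives.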
Do gene expression changes correlate with the partial loss of CTCF and cohesin binding and altered chromatin loops described above in ΔRBRi-CTCF mESCs? Indeed, genes that were downregulated in ΔRBRi-CTCF mESCs compared with WT-CTCF mESCs had a higher probability to lie near a disrupted Smc1a binding site than any random set of unaltered genes (Figures 7B and S7E). In contrast, upregulated genes were not detectably closer to disrupted Smc1a peaks (Figure 7B). The transcription start site (TSS) of downregulated genes was also significantly closer than that of upregulated genes to CTCF peaks disrupted in ΔRBRi-CTCF mESCs (Figure S7G). Consistent with these observations, acute depletion of most CTCF protein revealed that early downregulated genes, but not upregulated genes, tended to be close to an affected CTCF site (Nora et al., 2017). Nevertheless, both downregulated and upregulated genes were located closer than the control unchanged gene sets to Q1 loop anchors, the most severely disrupted in ΔRBRi-CTCF mESCs (Figures 7C and S7F; Q2–4 in Figures S7H–S7J). Inspecting single genomic loci, we found several examples of both upregulated and downregulated genes proximal to the anchors of loops disrupted in ΔRBRi-CTCF mESCs (Figures 7D and S7L). Notably, several of the deregulated genes in ΔRBRi-CTCF mESCs, certainly more than expected by chance, changed in the same direction as seen after acute CTCF depletion in mESCs (Nora et al., 2017) (Figure S7K; full overlap analysis in Table S2). Taken together, these results show that the CTCF RBRi regulates both chromatin looping and gene expression.
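The proximity comparisons above rest on one basic operation: finding the distance from a TSS to the nearest disrupted peak. A minimal sketch, assuming a single chromosome and peak positions reduced to sorted midpoints, is shown below; the function name `distance_to_nearest` and all coordinates are hypothetical.

```python
from bisect import bisect_left

def distance_to_nearest(position, sorted_peaks):
    """Distance from a position to the closest coordinate in a sorted list."""
    if not sorted_peaks:
        raise ValueError("no peaks on this chromosome")
    # Index of the first peak at or to the right of the position.
    i = bisect_left(sorted_peaks, position)
    # Only the flanking peaks (left and right neighbors) can be nearest.
    candidates = sorted_peaks[max(i - 1, 0):i + 1]
    return min(abs(position - p) for p in candidates)

# Hypothetical disrupted peak midpoints (bp) on one chromosome.
disrupted_peaks = [1000, 5000, 20000]
tss_distances = [distance_to_nearest(t, disrupted_peaks) for t in (100, 5200, 30000)]
```

Distributions of such distances for downregulated, upregulated, and unchanged gene sets can then be compared against each other.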
DISCUSSION
In this study, we have identified unexpected roles for an internal RNA-binding region (RBRi) in CTCF. We confirmed that CTCF self-associates in a largely RNA-mediated manner (Saldaña-Meyer et al., 2014) (Figure 1C) and now demonstrate that the CTCF RBRi contributes to RNA binding, CTCF self-association, and clustering in vivo (Figure 7E). Moreover, we surprisingly find that almost half of all CTCF loops are lost in ΔRBRi-CTCF mESCs, suggesting that CTCF-mediated loops can be classified into at least two major classes (Figure 7F): RBRi-independent and RBRi-dependent CTCF loops. Intriguingly, this may provide a means for differentially engaging or disrupting specific CTCF loops during development and cellular differentiation (Bonev et al., 2017; Pękowska et al., 2018). We discuss some of the implications below.
How Do CTCF and Cohesin Interact?
Despite their critical role in 3D genome organization, we know surprisingly little mechanistically about CTCF and cohesin. Although the related SMC-complex condensin has been observed to extrude loops in vitro (Ganji et al., 2018), in vitro single-molecule studies of cohesin failed to detect extrusion (Davidson et al., 2016; Kanke et al., 2016; Stigler et al., 2016). Moreover, whether a hypothetical cohesin-based extrusion complex would exist as a single ring or perhaps as a pair of rings remains unclear and a matter of active debate (Cattoglio et al., 2019; Kim et al., 2019; Nasmyth, 2011; Skibbens, 2016). Finally, how CTCF and cohesin interact in vivo remains to be elucidated. Xiao et al. (2011) reported that the 575–611 region in human CTCF interacts directly with the SA2 subunit of cohesin and that interaction with the other cohesin subunits is indirect. This region largely corresponds to the RBRi and is entirely deleted in our ΔRBRi-CTCF mESCs. Nevertheless, we observed robust coIP of the cohesin subunits Rad21 and Smc1a with ΔRBRi-CTCF (Figure 6A; Figure S5A). Similarly, coIP of human ΔRBRi-CTCF with the cohesin subunit SA1 was observed (Saldaña-Meyer et al., 2014). Therefore, both our new studies and that of Saldaña-Meyer et al. (2014) show that ΔRBRi-CTCF can still interact with cohesin, which contradicts the findings of Xiao et al. (2011). We suggest that fully elucidating how CTCF and cohesin interact should be an important direction for future research.
What Does the CTCF RBRi Bind?
We find that CTCF self-association is strongly reduced upon treatment with RNase A in vitro (Figure 1C) and that ΔRBRi-CTCF shows substantially less clustering in cells (Figures 2G–2I). Consistently, the CTCF RBRi was reported on the basis of fractionation studies to be necessary for CTCF multimerization in vitro (Saldaña-Meyer et al., 2014). Saldaña-Meyer et al. (2014) also reported that CTCF directly binds the hWRAP53 RNA and that ZF10–11 contributes to RNA binding. Here, we show that ΔRBRi-CTCF shows substantially reduced, but not abolished, RNA binding in vitro (Figure 2E) and in cells (Figure 2F). After the present work appeared on bioRxiv, Saldaña-Meyer et al. (2019) further identified two additional RNA-binding regions in CTCF ZF1 and ZF10. Loss of ZF1 or ZF10 impairs RNA binding by CTCF as assayed using PAR-CLIP and causes deregulation of gene expression in mESCs (Saldaña-Meyer et al., 2019). Taken together with the results reported here, this suggests that CTCF interacts with RNA(s) through several protein regions, including ZF1, ZF10, and the RBRi. However, although our results clearly show that the CTCF RBRi is required for about half of all chromatin loops and mediates CTCF clustering, we do not know the mechanism at this stage. Specifically, our results cannot distinguish a model in which RNA(s) directly bound by the CTCF RBRi regulates looping and clustering, from indirect models in which the CTCF RBRi binds another factor, which then indirectly contributes to CTCF self-association and clustering in an RNase-sensitive manner and to loop formation (Figure 7E). Moreover, we note that serine residues in the RBRi are differentially phosphorylated during stem cell differentiation (El-Kady and Klenova, 2005; Rigbolt et al., 2011).
Nevertheless, it is worth considering other CTCF-RNA interactions that have been reported beyond Wrap53. CTCF has been reported to directly bind the lincRNAs HOTTIP (Wang et al., 2018), CCAT1-L (Xiang et al., 2014), and Firre (Yang et al., 2015); the RNA Jpx has been reported to evict CTCF from the X chromosome (Sun et al., 2013); CTCF has been shown to bind RNAs specifically and with high affinity in vitro (Kung et al., 2015); and CTCF was also reported to bind the RNA helicase p68/DDX5 together with the noncoding RNA, SRA (Yao et al., 2010). Finally, CTCF was identified as an RNA-binding protein in three recent independent screens for RNA-binding proteins (Brannan et al., 2016; Caudron-Herger et al., 2019; He et al., 2016), and transcription elongation by RNA Pol II can displace both CTCF and cohesin from chromatin (Heinz et al., 2018). However, there are likely many more CTCF RBRi interaction partners, and identifying these will be an important but challenging future endeavor.
There Are at Least Two Classes of CTCF Binding Sites and Chromatin Loops
The loop extrusion model can elegantly explain most experimental observations through a parsimonious mechanism (Fudenberg et al., 2017). In the model’s simplest form, any correctly oriented chromatin-bound CTCF should block cohesin-mediated loop extrusion. Accordingly, all CTCF binding sites should form loops. However, only a minority of CTCF binding sites form loops visible in Hi-C contact maps (Merkenschlager and Nora, 2016; Rao et al., 2014). Why is that? At a minimum, this suggests that not all CTCF sites are equivalent and that only a subset of CTCF sites can stabilize loops. Accordingly, we show here that CTCF sites fall into at least two distinct classes: RBRi-dependent and RBRi-independent sites.
How is the RBRi dependence of a CTCF binding site determined? CTCF binds DNA through 11 ZFs, and which ZFs contribute to DNA binding is somewhat idiosyncratic and binding site dependent (Hashimoto et al., 2017; Nakahashi et al., 2013; Yin et al., 2017). Although the core CTCF DNA motif is bound by the central ZFs, only the upstream motif is bound by ZF9–11 (Nakahashi et al., 2013). Because the RBRi is just downstream of ZF9–11 (Figures 1A and 2A), it is tempting to speculate that depending on whether ZF9–11 are engaged in DNA binding, there could be allosteric control over which potential RBRi interaction partners would be engaged. Consistent with this interpretation, we observed distinct DNA motifs bound by RBRi-dependent and RBRi-independent CTCF loops (Figures S7A and S7B).
Does CTCF Clustering Contribute to Halting Cohesin-Mediated Loop Extrusion?
Within the context of the loop extrusion model, it is unclear how a small ~3- to ~5-nm-sized protein, CTCF, would efficiently block a large and rapidly extruding cohesin complex with a lumen of 40–50 nm—and do so in an orientation-specific manner (Guo et al., 2015; Rao et al., 2014; Vietri Rudan et al., 2015; de Wit et al., 2015). We previously showed that CTCF forms clusters in mESCs and U2OS cells (Hansen et al., 2017), and Zirkel et al. (2018) reported that CTCF forms large foci in senescent cells. Here, we now show that CTCF clustering is partly mediated by the RBRi and, simultaneously, that the RBRi is required for a large subset of loops. It is thus tempting to speculate that cluster and loop formation are related: in particular, RBRi-mediated CTCF clustering could make CTCF a more efficient boundary to cohesin-mediated extrusion in at least two ways (Figure 7G): (1) a cluster containing several CTCF proteins, aided by binding to polymers such as RNA, should be much larger and thus more efficient at arresting cohesin than a single chromatin-bound CTCF protein, and (2) if CTCF binds cohesin through a specific protein region, having more CTCFs present would increase the probability of a correct encounter between this target interaction surface and cohesin.
Loss of the CTCF RBRi Causes Deregulation of Gene Expression
Here we demonstrate that loss of the CTCF RBRi causes deregulation of ~500 genes (Figure 7A) as well as loss of about half of all chromatin loops (Figure 5B). Similarly, disruption of two other RNA-binding regions in CTCF also causes deregulation of ~400–500 genes (Saldaña-Meyer et al., 2019), whereas CTCF depletion for 4 days causes deregulation of 4,996 genes (Nora et al., 2017). Compared with such auxin-induced depletion studies (Nora et al., 2017; Saldaña-Meyer et al., 2019; Wutz et al., 2017), one advantage of the endogenous deletion approach that we use here is that no residual WT-CTCF protein remains to confound interpretation. However, a disadvantage of our approach is that we cannot readily distinguish acute and direct effects of CTCF on transcription from indirect effects (e.g., deregulation of a gene by CTCF, which then causes indirect deregulation of other genes). Nevertheless, we do observe that deregulated genes tend to be closer to a disrupted loop compared with genes whose expression did not change (Figure 7C). This is consistent with chromatin looping directly contributing to the regulation of gene expression, although only for a subset of genes and only modestly (average fold change ~2.7). Taken together with previous studies (Nora et al., 2017; Saldaña-Meyer et al., 2019), our work emphasizes that CTCF is a significant regulator of transcription, although the fraction of genes whose expression is directly affected by CTCF and chromatin looping in a given cell type remains unclear.
Regulation of CTCF Loops during Differentiation and Development
An enduring paradox has been the fact that CTCF and cohesin are present in all cell types. Thus, if they were the only factors forming loops and TADs, how can we explain the observation that some TADs and loops change during differentiation (Bonev et al., 2017; Pękowska et al., 2018)? Here we report that CTCF loops can be divided into at least two classes: RBRi dependent and RBRi independent. Moreover, within the RBRi dependent CTCF loop class, we identify at least two types (Figures 5D and 6E). Having multiple types of CTCF boundaries provides potential mechanisms through which individual boundaries can be regulated. For example, if CTCF RBRi-dependent boundaries function in part by binding other proteins or RNAs, then regulating the abundance or function of these yet to be identified factors would provide a potential mechanism for distinct cell types to regulate specific boundaries and CTCF loops during development and differentiation (Figure 7H). Ultimately, this may enable cells to dissolve and form new CTCF-mediated chromatin loops during development and differentiation to regulate enhancer-promoter contacts and establish proper cell type-specific gene expression programs.
STAR★METHODS
LEAD CONTACT AND MATERIALS AVAILABILITY
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Robert Tjian (jmlim@berkeley.edu).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Cell culture
JM8.N4 mouse embryonic stem cells (Pettitt et al., 2009) (male mESCs; Research Resource Identifier: RRID:CVCL_J962; obtained from the KOMP Repository at UC Davis) were grown and handled as described previously (Hansen et al., 2017). Briefly, mES cells were grown on plates pre-coated with a 0.1% autoclaved gelatin solution (Sigma-Aldrich, St. Louis, MO, G9391) under feeder-free conditions in knock-out DMEM with 15% FBS and LIF (full recipe: 500 mL knockout DMEM (ThermoFisher, Waltham, MA, #10829018), 6 mL MEM NEAA (ThermoFisher #11140050), 6 mL GlutaMax (ThermoFisher #35050061), 5 mL Penicillin-streptomycin (ThermoFisher #15140122), 4.6 μL 2-mercaptoethanol (Sigma-Aldrich M3148), 90 mL fetal bovine serum (HyClone Logan, UT, FBS SH30910.03 lot #AXJ47554)) and LIF. mES cells were fed by replacing half the medium with fresh medium daily and passaged every two days by trypsinization. Cell lines were pathogen tested (IMPACT II test for mESC C59) as described previously (Hansen et al., 2017). All cell lines will be provided upon request.
METHOD DETAILS
CRISPR/Cas9-mediated genome editing
Genome-editing was performed as previously described (Hansen et al., 2017). Briefly, we co-transfected cells with a repair plasmid and a plasmid encoding Cas9 and the sgRNA (using 2 μg and 1 μg, respectively, per well in a 6-well plate). The Cas9 plasmid was slightly modified from that distributed from the Zhang lab (Ran et al., 2013): 3xFLAG-SV40NLS-pSpCas9 was expressed from a CBh promoter; the sgRNA was expressed from a U6 promoter; and mVenus was expressed from a PGK promoter. We generally designed 2–4 sgRNAs per knock-in and transfected each (or two of them when necessary) in a separate well. The day of transfection, we pooled all transfected cells and FACS-sorted for transfected cells using the mVenus encoded by the Cas9 plasmid. For edits where there was no tag added (e.g., to replace the RBRi with 3xHA), we immediately plated single clones after the FACS. For knock-ins with tags, e.g., 3xFLAG-Halo-CTCF or V5-SNAPf-CTCF, we first grew up cells, then labeled them with dye (Halo-TMR for 3xFLAG-Halo-CTCF; SNAP-JF646 for V5-SNAPf-CTCF), and then did a second round of FACS-sorting to increase the efficiency. Selected cells were plated at very low density (~0.1 cells per mm²), and single colonies were then picked, expanded and genotyped by PCR. Successfully edited clones were further verified by PCR followed by Sanger sequencing and western blotting.
For knock-ins with 2 different tags, we generated them using the above protocol in 2 steps. We first isolated a heterozygous knock-in clone for one tag and we then re-edited that clone to introduce the second tag. This was the case for C62, where one of the diploid CTCF alleles is V5-SNAPf-tagged and the other 3xFLAG-Halo-tagged. We first isolated a clone with a correct V5-SNAPf-tagged CTCF allele and a null CTCF allele, where a non-homologous end joining event following Cas9 cleavage introduced a 4-nucleotide deletion (81_84delACGC), leading to a premature stop codon. We then designed sgRNAs specific for the null CTCF allele and retargeted this clone with a 3xFLAG-Halo-CTCF repair vector. To build the repair vectors, we modified a pUC57 plasmid to contain the tag of interest flanked by 500 bp of genomic homology sequence on either side (IDT gBlocks). To prevent the Cas9-sgRNA complex from cutting the repair vector, we introduced synonymous mutations in the first nine codons after the ATG. To link the SNAP and Halo proteins to CTCF, we used the Sheff and Thorn linker (GDGAGLIN) (Sheff and Thorn, 2004) and a TEV linker sequence (EDLYFQS), respectively. mESC clones were screened using a three-primer PCR (two genomic primers external to the left and right homology sequences, and one internal to the tag).
To endogenously and homozygously delete the RBRi region in the previously published C59 mESC line (Hansen et al., 2017), we generated by Gibson Assembly a repair vector modifying a pBlueScript II SK (+) plasmid to contain the Sheff and Thorn linker followed by a 3xHA tag (Figure S1F), and flanked by ~500 bp of genomic homology sequence on either side. mESC clones were screened using a three-primer PCR (one genomic primer external to the left homology sequence, one internal to the right homology region, and an internal HA primer). Notably, we failed to generate clones with a simple deletion of the RBRi, possibly because shortening of the already small exon 10 (only 135 bp-long, 27 bp upon RBRi deletion) causes exon skipping and aberrant splicing.
All plasmids used in the editing are available upon request as are any of the cell lines. See Table S1 for sgRNA and primer sequences.
Cell Cycle phase analysis
Cell cycle phase analysis was performed using the Click-iT EdU Alexa Fluor 488 Flow Cytometry Assay Kit (ThermoFisher Scientific Cat. # C10425) according to manufacturer’s instructions, but with minor modifications. C59 mESCs (Halo-CTCF; Rad21-SNAPf) and C59D2 mESCs (ΔRBRi-Halo-CTCF; Rad21-SNAPf) were grown overnight in a 6-well plate and labeled with 10 μM EdU for 30 min at 37°C/5.5% CO2 in a TC incubator (one well was unlabeled, as a negative control). Cells were harvested, washed with 1% BSA in PBS, permeabilized using 100 μL 1x Click-iT saponin-based permeabilization and wash reagent (Component D; see kit manual), mixed well and then incubated for 15 min. 0.5 mL Click-iT reaction was added to each tube and incubated for 30 min in the dark. Cells were washed with 1x Click-iT saponin-based permeabilization and wash reagent and resuspended in 1x Click-iT saponin-based permeabilization and wash reagent with DAPI (5 ng/mL) and incubated for 10 min. Cells were then spun down and re-suspended in 1% BSA in PBS and FACS performed on a LSR Fortessa Cytometer. DAPI fluorescence was excited using a 405 nm laser and collected using a 450/50 bandpass emission filter. Alexa Fluor 488 fluorescence was excited using a 488 nm laser and collected using a 525/50 bandpass emission filter. Cells were gated based on forward and side scattering using identical settings for C59 and C59D2 mESCs. Cell cycle analysis was performed using custom-written MATLAB code using identical settings for C59 and C59D2 mESCs as illustrated in Figures S4F and S4G. Three independent biological replicates were performed.
CTCF FACS abundance quantification
FACS was performed as previously described (Hansen et al., 2017). We grew C59 mESCs (Halo-CTCF; Rad21-SNAPf) and C59D2 mESCs (ΔRBRi-Halo-CTCF; Rad21-SNAPf) overnight in a 6-well plate and labeled 1 well with 500 nM Halo-TMR (Promega Cat. # G8521) and left 1 well unlabeled (negative control for baseline fluorescence). Cells were labeled for 30 min at 37°C/5.5% CO2 in a TC incubator, washed with PBS and incubated with medium for 5 min in a TC incubator. Cells were then washed again with PBS, harvested, filtered and fluorescence quantified in live cells on a LSR Fortessa Cytometer, exciting fluorescence with a 561 nm laser and collecting fluorescence through a 610/20 bandpass emission filter. Live cells were gated based on forward and side scattering (using identical settings for C59 and C59D2 mESCs) using custom-written MATLAB code and the relative abundance quantified as the relative background-subtracted mean fluorescence as illustrated in Figures 2C and S1G.
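The relative abundance described above can be illustrated with a minimal sketch. It assumes that per-cell fluorescence values for the labeled and unlabeled wells have already been gated; the function name `relative_abundance` and its arguments are hypothetical and are not taken from the authors' MATLAB code.

```python
from statistics import mean

def relative_abundance(mutant_labeled, mutant_unlabeled, wt_labeled, wt_unlabeled):
    """Background-subtracted mean fluorescence of the mutant relative to wild type.

    Each argument is a sequence of per-cell fluorescence values from gated live
    cells (hypothetical inputs; the unlabeled wells provide the baseline).
    """
    mutant_signal = mean(mutant_labeled) - mean(mutant_unlabeled)
    wt_signal = mean(wt_labeled) - mean(wt_unlabeled)
    return mutant_signal / wt_signal
```

A returned value of 0.5 would mean that the mutant protein is present at half the wild-type level after subtracting baseline fluorescence.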
Growth Assay
When passaging cells, two processes contribute to the apparent growth rate: 1) the fraction of cells that survive passaging and 2) the growth rate. To compare exclusively the growth rate of mESC C59 Halo-CTCF and mESC C59D2 ΔRBRi-Halo-CTCF, we therefore took the following approach. On day 0, we plated 250,000 cells in 2 wells in a 6-well plate. On day 1, we collected and counted the number of cells from 1 well. This gave us the number of cells that survived plating. Let this number be N1. On day 2, we then collected and counted the number of cells from the second well. Let this number be N2 and the time between the measurements be Δτ. Assuming exponential growth between the two measurements, the doubling time is then given by:
$$t_{\mathrm{double}} = \frac{\Delta\tau \, \ln 2}{\ln\left(N_2 / N_1\right)}$$
We performed 4 biological replicates and grew C59 and C59D2 side-by-side at the same time and handled them identically. The bargraph in Figure 2D shows the mean and standard error of the mean from the 4 replicates.
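As a worked illustration of the growth assay, the sketch below computes a doubling time from the two cell counts. It assumes exponential growth between the day 1 and day 2 counts, N2 = N1 · 2^(Δτ/T), which gives T = Δτ · ln 2 / ln(N2/N1); the function name and the example numbers are hypothetical.

```python
import math

def doubling_time(n1, n2, delta_tau):
    """Doubling time from two counts taken delta_tau apart.

    Assumes exponential growth, n2 = n1 * 2 ** (delta_tau / T), so that
    T = delta_tau * ln(2) / ln(n2 / n1). The result is in the units of delta_tau.
    """
    if n1 <= 0 or n2 <= n1:
        raise ValueError("expected 0 < n1 < n2 for a growing population")
    return delta_tau * math.log(2) / math.log(n2 / n1)

# Hypothetical example: 200,000 cells on day 1 and 800,000 cells 24 h later
# correspond to two doublings, i.e. a 12 h doubling time.
example = doubling_time(200_000, 800_000, 24.0)
```

Because N1 is counted after plating, the fraction of cells lost during passaging does not enter the estimate.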
PALM
PALM was performed as previously described (Hansen et al., 2017) but with minor modifications. C59 mESCs (Halo-CTCF; Rad21-SNAPf) and C59D2 mESCs (ΔRBRi-Halo-CTCF; Rad21-SNAPf) were grown overnight on MatriGel coated plasma-cleaned 25 mm circular No. 1.5H cover glasses (Marienfeld, Germany, High-Precision 0117650), labeled with 500 nM PA-JF549 (Grimm et al., 2016) for 30 min at 37°C/5.5% CO2 in a TC incubator, washed twice (medium removed; PBS wash; fresh medium for 5 min), and then fixed in 4% Formaldehyde / 0.2% Glutaraldehyde in PBS for 20 min at 37°C, washed with PBS and then imaged in PBS with 0.01% (w/v) NaN3 on the same day. All PALM movies were acquired at room temperature using continuous HiLo illumination on the same microscope as previously described (Hansen et al., 2017). We used the following laser lines: main excitation laser (561 nm for PA-JF549) and photo-activation laser (405 nm). The intensity of the 405 nm laser was gradually increased over the course of the illumination sequence to image all molecules and at the same time avoid too many molecules being activated at any given frame. The following camera settings were used: 25 ms exposure time; frame transfer mode; vertical shift speed: 0.9 μs; ROI: variable. In total, 40,000 frames were recorded for each cell (~20 min), which was sufficient to image and bleach all labeled molecules at an effective pixel size of 106.67 nm, which resulted in a mean localization error (defined as the standard deviation) of ~13–14 nm (Figure S1H). We recorded 6–10 movies per cell line per day (and always imaged both C59 and C59D2 on the same day) and performed 3 biological replicates. Each movie contained several nuclei (generally 3–6), which improved the robustness of the algorithmic drift-correction (Elmokadem and Yu, 2015). We obtained and analyzed a total of 52 cells for C59 and 46 cells for C59D2.
Molecules in PALM data were localized using a custom-written MATLAB implementation of the MTT-algorithm (Sergé et al., 2008) (code is available on GitLab: https://gitlab.com/tjian-darzacq-lab/SPT_LocAndTrack) and the following settings: Localization error: 10⁻⁶; deflation loops: 0. After localization, the data was analyzed as described below using code available on GitLab: https://gitlab.com/anders.sejr.hansen/palm_pipeline
PALM analysis
Full details on PALM analysis as well as code to reproduce our results are available on GitLab: https://gitlab.com/anders.sejr.hansen/palm_pipeline. Here we summarize the major steps. First, drift-correction and merging of blinks is achieved through the main script “DriftCorrectMergeBlinks.m,” which calls a number of functions and runs in parallel as default, so the parallel processing toolbox in MATLAB is necessary. Drift-correction is first performed using a custom-modified implementation of BaSDI (Elmokadem and Yu, 2015) (“BaSDI_ASH”). This is achieved through the function “IterativeBaSDI_DriftCorrect.m” using FramesBin = 2000; PixelBin = 10; Iterations = 5. Compared with BaSDI, the main difference is that we found multiple iterations to be necessary to reach convergence and we have therefore custom-written the wrapper “IterativeBaSDI_DriftCorrect.m” to achieve this. Since the inferred drift is binned according to “FramesBin,” we use linear interpolation to drift-correct each frame. Once drift-correction has been achieved, we merge photo-blinking using a custom implementation of SimpleTracker (https://www.mathworks.com/matlabcentral/fileexchange/34040-simple-tracker), which was modified to be substantially more memory-efficient for large PALM movies (SimpleTracker_ASH). An important aspect of PALM, especially with very photo-stable dyes such as PA-JF549 (Grimm et al., 2016), is that the same molecule can appear in multiple adjacent frames and also blink such that there are gaps. It is therefore essential to link these appearances, which we accomplish using SimpleTracker’s implementation of nearest neighbor tracking and we allow a maximal linking distance of 75 nm and maximally 2 gaps. We note that 75 nm is quite lenient since the localization error is less than 15 nm, but we chose it so to ensure we fully correct for multiple appearances. 
For each molecule with multiple appearances, we collapse all the localizations to a single localization and take the x,y coordinates to be the means.
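The blink-merging step can be illustrated with a minimal sketch. This is not the SimpleTracker_ASH code: it is a simplified greedy nearest-neighbor linker written for illustration, and it assumes that "maximally 2 gaps" means up to two consecutive frames in which the molecule is not detected. The function name `merge_blinks` and the input layout are hypothetical.

```python
import math

def merge_blinks(localizations, max_dist=75.0, max_gap=2):
    """Link repeated appearances of a molecule and collapse each to one position.

    localizations: iterable of (frame, x_nm, y_nm) tuples.
    A localization joins the nearest existing track whose last position is
    within max_dist (nm) and whose last frame is at most max_gap missing
    frames earlier; otherwise it starts a new track.
    Returns one (mean_x, mean_y) tuple per track, in order of first appearance.
    """
    tracks = []
    for frame, x, y in sorted(localizations, key=lambda loc: loc[0]):
        best, best_d = None, max_dist
        for track in tracks:
            missing = frame - track["last_frame"] - 1  # frames without detection
            if 0 <= missing <= max_gap:
                d = math.dist((x, y), track["last_xy"])
                if d <= best_d:
                    best, best_d = track, d
        if best is None:
            tracks.append({"last_frame": frame, "last_xy": (x, y), "points": [(x, y)]})
        else:
            best["last_frame"] = frame
            best["last_xy"] = (x, y)
            best["points"].append((x, y))
    return [
        (sum(p[0] for p in t["points"]) / len(t["points"]),
         sum(p[1] for p in t["points"]) / len(t["points"]))
        for t in tracks
    ]
```

The loop over all tracks scales poorly with movie length and is only meant to make the linking rule explicit; a real PALM movie would need a spatial index or a dedicated tracking implementation.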
After drift-correcting and merging, individual nuclei are segmented after Gaussian smoothing of reconstructed images using a 60 nm pixel size. Since the movies contain several nuclei, each nucleus is manually segmented using polygon-segmentation. For each nucleus, a series of summary statistics are then displayed and saved (e.g., localization error, number of localization per frame, nuclear reconstructions) and each nucleus is saved to a separate directory together with code for running K-Ripley analysis (Besag, 1977; Boehning et al., 2018; Ripley, 1976) using the ads package in R (Pélissier and Goreaud, 2015) as well as code for running a Bayesian cluster identification algorithm (Rubin-Delanchy et al., 2015).
The R-code for running K-Ripley analysis was written by Herve Marie-Nelly and is described elsewhere (Boehning et al., 2018). The version included here is a slightly modified version and we refer the reader to the tutorial on GitLab for how to run it (requires both Python and R). Finally, the results of the K-Ripley analysis were plotted with “PLOT_K_L_g_Ripley.m” and Figure 2I shows the mean and standard error of the mean across the population. More generally, Ripley’s K function analyzes pointillist data such as those generated by PALM. Specifically, we have in 2 dimensions the X,Y-coordinates for each CTCF protein inside the nucleus. Ripley’s K function is defined as:
$$K(r) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j \neq i} \frac{I\left(d_{ij} < r\right)}{\lambda}$$
where $d_{ij}$ is the Euclidean distance between the ith and jth points, λ is the average density of points, r is the search radius, and n is the total number of data points (i.e., CTCF protein X,Y-coordinates). I is the indicator function (equal to 1 only if the distance $d_{ij}$ is smaller than r; otherwise, 0). K(r) scales as πr² in 2 dimensions if CTCF is randomly distributed. For this reason, typically, Ripley’s L function is used instead (this formulation was introduced by Besag in 1977):
$$L(r) = \sqrt{\frac{K(r)}{\pi}}$$
Interpreting plots of L(r) − r such as shown in Figure 2I is straightforward: If the data are randomly distributed, L(r) − r = 0. If L(r) − r is below 0, there is dispersion (“repulsion”). And if it is above 0, there is clustering (“attraction” between the CTCF proteins).
Ripley’s K and L function are normalized for the abundance. In other words, clustering does not depend on protein abundance and the 27.7% lower expression level of ΔRBRi-CTCF cannot explain the lower clustering that we observe. For full details, we refer to the original papers by Ripley and Besag (Besag, 1977; Ripley, 1976).
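To make the definitions concrete, the following sketch computes Ripley's K and the quantity L(r) − r for a small 2D point set. It is a naive illustration, not the `ads`-based R pipeline used in the study: it applies no edge correction, takes the region area as an explicit input to obtain the density λ = n/area, and the function names are hypothetical.

```python
import math

def ripley_k(points, r, area):
    """Naive Ripley's K: ordered pairs closer than r, normalized by n and density.

    points: list of (x, y) tuples; area: area of the observation region, used to
    compute the average density lambda = n / area. No edge correction is applied.
    """
    n = len(points)
    density = n / area
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) < r
    )
    return close_pairs / (n * density)

def ripley_l_minus_r(points, r, area):
    """L(r) - r, where L(r) = sqrt(K(r) / pi); values above 0 indicate clustering."""
    return math.sqrt(ripley_k(points, r, area) / math.pi) - r

# Hypothetical example: two tight pairs of points in a 10 x 10 region.
pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
k_value = ripley_k(pts, r=2.0, area=100.0)
l_value = ripley_l_minus_r(pts, r=2.0, area=100.0)
```

Dividing by the density is what makes the statistic insensitive to the overall number of localizations, which is the normalization property discussed above.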
The example reconstructions of CTCF nuclear localization in Figures 2G and 2H were plotted using ViSP (El Beheiry and Dahan, 2013). Each molecule was plotted using 25 nm (FWHM) and colored according to the neighbor density (0–200 Neighbors (min/max); Neighborhood Radius: 100 nm; Jet colormap (cMin-cMax: 0–0.35)) with identical settings for C59 Halo-CTCF and C59D2 ΔRBRi-Halo-CTCF.
Western Blotting
Cells were grown in 6-well plates to confluency, washed twice with ice-cold PBS with protease inhibitors and scraped in 300 μL of high salt lysis buffer (0.5 M NaCl, 25 mM HEPES, 1 mM MgCl2, 0.2 mM EDTA, 0.5% NP-40 and protease inhibitors). Lysates were immediately transferred to 1.5 mL tubes containing 100 μL of 4X protein loading buffer (16% 2-Mercaptoethanol, 200 mM Tris-HCl pH 6.8, 8% SDS, 40% glycerol, 400 mM DTT, 0.4% bromophenol blue), boiled for 20’ and loaded onto 8% Bis-Tris protein gels (10 μL per lane). Proteins were transferred onto nitrocellulose membranes (Amersham Protran 0.45 μm NC, GE Healthcare) for 2 hr at 100V. Membranes were blocked in TBS-Tween with 10% milk for at least 1 hr at room temperature and blotted with the specified antibodies in TBS-T with 5% milk at 4°C overnight. HRP-conjugated secondary antibodies were diluted 1:5000 in TBS-T with 5% milk and incubated at room temperature for an hour prior to the chemiluminescence reaction. Band intensities were measured with the ImageJ “Analyze Gels” function (Schindelin et al., 2012) and used to calculate IP and CoIP efficiencies.
Co-immunoprecipitation (CoIP) assays
For CoIP experiments, cells were scraped from plates in ice-cold phosphate-buffered saline (PBS) with PMSF and aprotinin, pelleted, and flash-frozen in liquid nitrogen. Cell pellets were thawed on ice, resuspended to 1 ml/10 cm plate of cell lysis buffer (5 mM PIPES pH 8.0, 85 mM KCl, 0.5% NP-40 and protease inhibitors), and incubated on ice for 10’. Nuclei were pelleted in a tabletop centrifuge at 4°C, at 4000 rpm for 10’, and resuspended to 0.5 ml/10 cm plate of low salt lysis buffer either with or without benzonase (600U/ml) and rocked 4 hr at 4°C. After the incubation the salt concentration was adjusted to 0.2 M NaCl final and the lysates were incubated for another 30’ at 4°C. 50 μL of each lysate were used for DNA and RNA extraction (see below), while the rest was cleared by centrifugation at maximum speed at 4°C and the supernatants quantified by Bradford. In a typical CoIP experiment, 1 mg of proteins was diluted in 1 mL CoIP buffer (0.2 M NaCl, 25 mM HEPES, 1 mM MgCl2, 0.2 mM EDTA, 0.5% NP-40 and protease inhibitors) and precleared for 2 hr at 4°C with protein-A/G Sepharose beads (GE Healthcare Life Sciences) before overnight immunoprecipitation with 4 mg of either normal serum IgGs or specific antibodies. Some pre-cleared lysate was kept at 4°C overnight as input. Protein-A/G-Sepharose beads precleared overnight in CoIP buffer with 0.5% BSA were then added to the samples and incubated at 4°C for 2 hr. Beads were then washed extensively with CoIP buffer, and proteins were eluted by boiling the beads for 5′ in 2X SDS-loading buffer. The immunoprecipitated material was split between two SDS-PAGE gels followed by Western Blotting: 90% of the IP was loaded to probe CoIP efficiencies, while 10% of the IP was loaded to probe IP efficiencies.
CoIP DNA and RNA extraction and quantification
For DNA extraction, 50 μL of lysates were added to 150 μL of CoIP buffer and extracted twice with 200 μL of phenol-chloroform (UltraPure Phenol:Chloroform:Isoamyl Alcohol (25:24:1, v/v)). After centrifugation at room temperature and maximum speed for 5′, the aqueous phase containing DNA was mixed with 2 volumes of 100% ethanol and precipitated for 30’ at −80°C. After centrifugation at 4°C for 20’ at maximum speed, DNA was re-dissolved in 25 μL water and quantified by nanodrop. About 100 ng of the untreated sample DNA, or an equal volume from the nuclease-treated samples, was used for relative quantification by quantitative PCR (qPCR) with SYBR Select Master Mix for CFX (Applied Biosystems, ThermoFisher) on a BIO-RAD CFX Real-time PCR system (primer sequences in Table S1).
RNA was extracted from 50 μL of lysates with 500 μL of TRIzol reagent, following manufacturer’s instructions. The RNA pellet was re-dissolved in 25 μL of water and quantified by nanodrop. About 1 μg of the untreated sample RNA, or an equal volume from the nuclease-treated samples, was reverse-transcribed with SuperScript III Reverse Transcriptase and random hexamers. cDNA was diluted 1:20 and 2 μL quantified by qPCR as above.
Chromatin immunoprecipitation (ChIP)
Smc1a, CTCF and control IgG ChIP assays were performed in the parental C59 ES cell line (wt-CTCF) and in its derivative clone C59D2 (ΔRBRi-CTCF). Cells were cross-linked for 5′ at room temperature with 1% formaldehyde-containing Knockout D-MEM; cross-linking was stopped by PBS-glycine (0.125 M final). Cells were washed twice with ice-cold PBS, scraped, centrifuged for 10’ at 4000 rpm and flash-frozen in liquid nitrogen. Cell pellets were thawed in ice, resuspended in cell lysis buffer (5 mM PIPES, pH 8.0, 85 mM KCl, and 0.5% NP-40, 1 ml/15 cm plate) and incubated for 10’ on ice. During the incubation, the lysates were repeatedly pipetted up and down every 5 minutes. Lysates were then centrifuged for 10’ at 4000 rpm. Nuclear pellets were measured and resuspended in 6 volumes of sonication buffer (50 mM Tris-HCl, pH 8.1, 10 mM EDTA pH 8.0, 0.1% SDS), incubated on ice for 10’, and sonicated to obtain DNA fragments below 2000 bp in length (Covaris S220 sonicator, 20% Duty factor, 200 cycles/burst, 150 peak incident power, 30–40 cycles of 20” on and 40” off). Sonicated lysates were cleared by centrifugation (20’ at 13200 rpm) and 625–800 μg of chromatin were diluted in RIPA buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA pH 8.0, 0.5 mM EGTA, 1% Triton X-100, 0.1% SDS, 0.1% Na-deoxycholate, 140 mM NaCl) to a final concentration of 0.8 μg/μL, precleared with Protein A Sepharose (GE Healthcare) for 2 hr at 4°C and immunoprecipitated overnight with 6.25–8 μg of normal mouse IgGs (ChromPure rabbit normal IgG; Jackson ImmunoResearch), anti-Smc1a (Abcam ab154769) or anti-CTCF antibodies (Abcam ab128873), which we have extensively validated for ChIP in a previous paper (Hansen et al., 2017). 4% of the precleared chromatin was saved as input. After the overnight incubation, samples were added to 20 μL of Protein A Sepharose beads precleared overnight in RIPA buffer with 0.5% (w/v) BSA and incubated for 2 hr at 4°C. 
Immunoprecipitated samples were washed 5 times with RIPA buffer, once with LiCl buffer (0.5% NP-40, 0.5% Na-deoxycholate, 250 mM LiCl, 1 mM EDTA pH 8.0), and once with TE. After the last wash, immunoprecipitated complexes were eluted from the beads twice with 150 μL of TE with 1% SDS, each time incubating 30’ in a thermomixer set at 37°C and 900 rpm. The 300 μL of eluted material was supplemented with 1 μL of RNase A (10 mg/ml) and 18 μL of 5 M NaCl and incubated at 67°C for 4–5 hr to reverse formaldehyde cross-linking. Inputs were brought to 300 μL total volume with elution buffer and subjected to the same treatment. Reverse cross-linked samples were mixed with 2.5 volumes of ice-cold ethanol and precipitated overnight at −20°C. DNA was pelleted by centrifugation (20’ at 13,200 rpm and 4°C), and pellets resuspended in 100 μL TE, 25 μL 5X PK buffer (50 mM Tris-HCl, pH 7.5, 25 mM EDTA pH 8.0, 1.25% SDS), and 1.5 μL of proteinase K (20 mg/ml), and incubated 2 hr at 45°C. After proteinase K digestion, DNA was purified with the QIAGEN QIAquick PCR Purification Kit, eluted in 60 μL of water and used for ChIP-Seq library preparation as described below.
Expression and purification of recombinant wt-CTCF and ΔRBRi-CTCF proteins
Recombinant Bacmid DNAs for the fusion mouse proteins 3xFLAG-Halo-wt-CTCF-6xHis (1086 amino acids; 123.5 kDa) and 3xFLAG-Halo-ΔRBRi-CTCF-6xHis (1086 amino acids; 123.7 kDa) were generated from pFastBAC constructs according to manufacturer’s instructions (Invitrogen). Recombinant baculovirus for the infection of Sf9 cells was generated using the Bac-to-Bac Baculovirus Expression System (Invitrogen). Sf9 cells (~2×10⁶/ml) were infected with amplified baculoviruses expressing recombinant wt- or ΔRBRi-CTCF. Infected Sf9 suspension cultures were collected at 48 hr post infection, washed extensively with cold PBS, lysed in 5 packed cell volumes of high salt lysis buffer (HSLB; 1.0 M NaCl, 50 mM HEPES pH 7.9, 0.05% NP-40, 10% glycerol, 10 mM 2-mercaptoethanol, and protease inhibitors), and sonicated. Lysates were cleared by ultracentrifugation, supplemented with 10 mM imidazole, and incubated at 4°C with Ni-NTA resin (QIAGEN) for 90 minutes. Bound proteins were washed extensively with HSLB with 20 mM imidazole, equilibrated with 0.5 M NaCl HGN (50 mM HEPES pH 7.9, 10% glycerol, 0.01% NP-40) with 20 mM imidazole, and eluted with 0.5 M NaCl HGN supplemented with 0.25 M imidazole. Eluted fractions were analyzed by SDS-PAGE followed by staining with PageBlue Protein Staining Solution. Peak fractions were pooled and incubated with anti-FLAG M2 Affinity Gel (Sigma) for 3 hr at 4°C. Bound proteins were washed extensively with HSLB, equilibrated to 0.2 M NaCl HGN, and eluted with 3xFLAG peptide (Sigma) at 0.4 mg/ml. Protein concentrations were determined by PageBlue staining compared to a BSA standard.
In vitro RNA binding assay
Binding of CTCF recombinant proteins to RNA was assessed in vitro as described by (Saldaña-Meyer et al., 2014) with some modifications. The first exon of human WRAP53 (nucleotides 1–167) was PCR amplified from HEK293T genomic DNA using PrimeSTAR HS DNA Polymerase (Takara R010B) and a forward primer that included a T7 promoter sequence (see Table S1). The gel-purified PCR product (200 ng) served as a template for T7 in vitro transcription (NEB), carried out in a total volume of 30 μL and incubated at 37°C for 4 hours and 30 minutes. The transcribed RNA was added to 30 μL of water and 2 μL RNase-free DNase I and incubated at 37°C for 15 minutes. The total volume was then adjusted to 360 μL with water. 40 μL of 3.3 M sodium acetate pH 5.2 was then added before two sequential extractions with 1 volume of phenol/chloroform followed by ethanol precipitation overnight. The RNA pellet was washed with 500 μL of 70% ethanol, resuspended in water and quantified by nanodrop. 4 pmol of RNA were incubated with 18.5 pmol of wt- or ΔRBRi-CTCF recombinant proteins in a 40 μL reaction containing 20 μL of 2X low-salt RNA binding buffer (100 mM Tris-HCl pH 7.9, 200 mM KCl, 0.2% NP-40, 1.5 mM MgSO4) at 4°C for 15 minutes. Reactions were added to 20 μL anti-FLAG M2 Affinity Gel (Sigma) and rocked at 4°C for at least 1 hour and 30 minutes. FLAG beads were washed twice with 1X high-salt RNA binding buffer (50 mM Tris-HCl pH 7.9, 500 mM KCl, 0.1% NP-40, 0.75 mM MgSO4), resuspended in 500 μL of 1X low-salt RNA binding buffer, and split in half for either RNA extraction or protein analysis. “No protein” reactions containing RNA only were run in parallel to control for pulldown specificity. RNA was extracted with 500 μL of TRIzol reagent, following manufacturer’s recommendation but performing an additional extraction with chloroform prior to the isopropanol precipitation.
The RNA pellet was added to 20 μL of 2X RNA loading dye (95% formamide, 0.02% SDS, 0.00625% bromophenol blue, 0.00625% xylene cyanol, 1mM EDTA), dissolved at 55°C for 10 minutes, denatured at 95°C for 5 minutes and placed on ice immediately prior to loading 10 μL to a 5% polyacrylamide urea gel in 1X TBE. RNA was stained with SYBR Gold Nucleic Acid Gel Stain (Invitrogen) in 1X TBE for 10–40 minutes and visualized on a Bio-Rad ChemiDoc imaging system. Proteins were extracted from the FLAG resin by adding 10 μL of 2X protein loading buffer (8% 2-Mercaptoethanol, 100 mM Tris-HCl pH 6.8, 4% SDS, 20% glycerol, 200 mM DTT, 0.2% bromophenol blue), boiling for 5 minutes and adjusting the final volume to 20 μL with low-salt 1X RNA binding buffer. Half of the recovered proteins (10 μl) were loaded to 8% Bis-Tris protein gels and stained with PageBlue. Band intensities were measured with the ImageJ “Analyze Gels” function (Schindelin et al., 2012) and used to calculate RNA pulldown efficiencies, normalizing each RNA sample by total recovered protein.
PAR-CLIP
PAR-CLIP was performed as in (Saldaña-Meyer et al., 2014) with some modifications. Briefly, mESC C59 Halo-wt-CTCF and mESC C59D2 Halo-ΔRBRi-CTCF cells were grown under standard conditions and pulsed with 400 μM 4-SU (Sigma) for 2 h. After washing the plates with PBS, cells were cross-linked with 400 mJ/cm2 UVA (312 nm) using a Stratalinker UV cross-linker (Stratagene). Whole nuclear lysates (WNLs) were obtained by fractionation and nuclei were then incubated for 10 min at 37°C in an appropriate volume of CLIP buffer (20 mM HEPES at pH 7.4, 5 mM EDTA, 150 mM NaCl, 2% EMPIGEN) supplemented with protease inhibitors, 20 U/mL Turbo DNase (Life Technologies), and 200 U/mL murine RNase inhibitor (New England Biolabs). After clearing the lysate by centrifugation, immunoprecipitations were carried out using 200 μg of WNLs in the same CLIP buffer for 4 h at 4°C; protein G-coupled Dynabeads (Life Technologies) were then added for an additional hour. Contaminating DNA was removed by treating the beads with Turbo DNase (2 U in 20 μL). Cross-linked RNA was labeled by successive incubation with 5 U of Antarctic phosphatase (New England Biolabs) and 5 U of T4 PNK (New England Biolabs) in the presence of 10 μCi [γ-32P] ATP (PerkinElmer). Labeled material was resolved on 8% Bis-Tris gels, transferred to nitrocellulose membranes, and visualized by autoradiography.
ChIP-Seq library preparation
ChIP-Seq libraries were prepared independently from two ChIP biological replicates using the Solexa rapid library protocol. Briefly, immunoprecipitated DNA or 50 ng of input DNA was end-repaired, phosphorylated and adenylated in a single 50 μL reaction containing 31.5 μL of DNA, 5 μL of spike-in yeast DNA from MNase treated nucleosomes (10 ng/ml) (Skene and Henikoff, 2017) and 13.5 μL of end-repair/3′ A mix. Reactions were incubated in a thermal cycler for 15’ at 12°C, 15’ at 37°C, 20’ at 72°C, and held at 4°C.
To each reaction, 4 μL of water, 1 μL of Illumina TruSeq adapters, 55 μL of 2x Rapid DNA ligase buffer (Enzymatics #B101L) and 5 μL of DNA ligase (Enzymatics #L6030-HC-L) were added, and ligations were incubated for 15’ at 20°C. Ligations were cleaned up twice with AMPure XP beads (Agencourt #A63880) diluted 1:2 with 20% PEG, 1.25 M NaCl (first cleanup: 38 μL of beads; beads were eluted with 53 μL of 10 mM Tris-HCl pH 8.0, 50 μL of which were transferred to a new tube and mixed with 55 μL of beads:PEG solution for the second cleanup). The final elution was in 22 μL of 10 mM Tris-HCl pH 8.0, 20 μL of which were transferred to a new tube and amplified by PCR (45” at 98°C; 14 cycles of 15” at 98°C and 10” at 60°C; 1’ at 72°C; hold at 4°C).
PCR reactions were cleaned up once with 38 μL of AMPure XP beads diluted 1:2 with 20% PEG, 1.25 M NaCl and eluted with 33 μL of 10 mM Tris-HCl pH 8.0, 30 μL of which were transferred to a new tube. We assessed library quality and fragment size by qPCR and Fragment analyzer, and sequenced 8–12 multiplexed libraries per lane on the Illumina HiSeq4000 sequencing platform (single-end reads, 50 bp long) at the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley (supported by NIH S10 OD018174 Instrumentation Grant).
ChIP-Seq analysis
Input, IgG, Smc1a and CTCF ChIP-Seq raw reads from wt-CTCF (C59) and ΔRBRi-CTCF (C59D2) ESCs (16 libraries total) were quality-checked with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) and aligned onto the mouse and the yeast genome (mm10 and sacCer3 assembly, respectively) using Bowtie (Langmead et al., 2009), allowing for two mismatches (-n 2) and no multiple alignments (-m 1). We used Samtools (version 1.9; Li et al., 2009) to sort and index Bowtie output .bam files, remove duplicates from mapped reads (rmdup -s) and merge ChIP-Seq replicates, after confirming good reproducibility between them (Figure S6). Peaks were called with MACS2 (--nomodel --extsize 300) (Zhang et al., 2008) using input DNA as a control. Overlaps between ChIP-Seq peaks across samples were computed through Galaxy (Blankenberg et al., 2010; Giardine et al., 2005; Goecks et al., 2010), requiring a minimum 1-bp overlap between peak intervals.
For spike-in control normalization, we performed pairwise comparisons (e.g., C59 input versus C59D2 input) and selected the sample with the lowest number of unique yeast alignments (C59D2 input: 94,642 reads versus C59 input: 119,846 reads). We then used this value to compute a scale factor (sf) for the other sample (C59 sf: 94,642 / 119,846 = 0.79), to be used in the downstream analyses (see below).
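The scale-factor arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation: the function name `spike_in_scale_factors` and the sample labels are assumptions, while the read counts are the ones reported in the text.

```python
def spike_in_scale_factors(yeast_reads):
    """Scale each sample to the sample with the fewest unique spike-in reads.

    yeast_reads: dict mapping a sample label to its number of unique
    yeast (spike-in) alignments. Labels here are hypothetical.
    """
    reference = min(yeast_reads.values())
    return {sample: reference / n for sample, n in yeast_reads.items()}


# Unique yeast alignments reported for the two input samples.
factors = spike_in_scale_factors({"C59_input": 119_846, "C59D2_input": 94_642})
for sample, sf in factors.items():
    print(sample, round(sf, 2))
```

The sample with the fewest spike-in reads keeps a factor of 1, and every other sample is scaled down; the resulting value is what would be passed to bamCoverage as the scale factor.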
To create heatmaps we used deepTools (version 2.4.1) (Ramírez et al., 2016). We first ran bamCoverage (--binSize 50 --extendReads 300 -of bigwig) and normalized read numbers to either 1x sequencing depth (--normalizeTo1x 2150570000) or to the spike-in yeast DNA (--scaleFactor sf), obtaining read coverage per 50-bp bins across the whole genome (bigWig files). We then used the bigWig files to compute read numbers across 6 kb centered on C59 CTCF or Smc1a peak summits as called by MACS2 (computeMatrix reference-point --referencePoint TSS --upstream 3000 --downstream 3000 --missingDataAsZero --sortRegions no). We sorted the output matrices by decreasing C59D2 enrichment, calculated as the total number of reads within a MACS2 called ChIP-Seq peak. Finally, heatmaps were created with the plotHeatmap tool (--averageTypeSummaryPlot mean --colorMap ‘Blues’ --sortRegions no).
To generate the scatterplots in Figure S6A we used deepTools multiBigwigSummary (BED-file mode) on the bigWig output files generated by deepTools bamCoverage, and computed the average scores for each of the files in every CTCF or Smc1a peak called by MACS2 in wt-CTCF mESCs on the merged replicates.
Enriched regions were visualized on the mm10 genome with the Integrative Genomics Viewer (IGV) (Robinson et al., 2011; Thorvaldsdóttir et al., 2013) using the bigWig output files from deepTools bamCoverage.
Previously published data describing CTCF (Chen et al., 2008) and Smc1a (Kagey et al., 2010) binding profiles in mESCs were downloaded from GEO, analyzed with the very same pipeline described above and compared to the data generated in this study (Figures S6C and S6D).
RNA-Seq library preparation and analysis
RNA-Seq was performed in the parental C59 ES cell line (wt-CTCF) and in its derivative clone C59D2 (ΔRBRi-CTCF), with two biological replicates each. RNA was extracted with the QIAGEN RNeasy Mini Kit according to manufacturer’s instructions, lysing cells directly in 6-well plates with buffer RLT Plus. 5 μg of the eluted RNA were treated with DNase I in a 25-μl reaction at 37°C for 30’ (Invitrogen DNA-free DNA Removal Kit). DNA-free RNA was quantified by nanodrop and quality-checked by Bioanalyzer, and 2.5 μg were subjected to ribosomal RNA depletion following Illumina Ribo-zero rRNA Removal Kit’s instructions. Precipitated RNA was resuspended in 17 μl End Repair Mix (ERP) from the TruSeq RNA Sample Preparation v2 Kit (Illumina RS-122–2001) and stored overnight at −80°C until library preparation. RNA fragmentation and first and second strand cDNA synthesis were performed according to the TruSeq RNA Sample Preparation v2 Kit but using Superscript III for reverse transcription instead of Superscript II (50°C for 50’ incubation time). cDNA was purified with AMPure XP beads diluted 1:2 with 20% PEG, 1.25 M NaCl, and eluted in 38.5 μL 10 mM Tris-HCl pH 8.0, 36.5 μL of which were transferred to a new tube and subjected to the ChIP-Seq Solexa rapid library protocol described above.
RNA-Seq raw reads from wt-CTCF (C59) and ΔRBRi-CTCF (C59D2) ESCs (4 libraries total) were quality-checked with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) and aligned onto the mouse genome (mm10) using the STAR RNA-Seq aligner (Dobin et al., 2013) with the following options: --outSJfilterReads Unique --outFilterMultimapNmax 1 --outFilterIntronMotifs RemoveNoncanonical --outSAMstrandField intronMotif. We used Samtools (version 1.9; Li et al., 2009) to convert STAR output .sam files into .bam files, and to sort and index them. We then counted how many reads overlapped an annotated gene (GENCODE vM19 annotations) using HTSeq (Anders et al., 2015) (htseq-count --stranded no -f bam --additional-attr gene_name -m union), and used the output counts files to find differentially expressed genes with edgeR (Liu et al., 2015; Robinson et al., 2010) and DESeq2 (Love et al., 2014), both run within the Galaxy platform. We used the following edgeR parameters: genes with ≤ 0.5 counts per million (CPM) in at least 3 samples were filtered out (38956 out of 54445); TMM was the method used to normalize library sizes; the edgeR quasi-likelihood test was used with robust settings (robust = TRUE with estimateDisp and glmQLFit). DESeq2 was run with Galaxy default parameters. 496 genes were called differentially expressed in ΔRBRi-CTCF ESCs compared to wt-CTCF cells by both edgeR (false discovery rate ≤ 0.05) and DESeq2 (adjusted P value ≤ 0.05), of which 275 were upregulated and 221 were downregulated. 5 groups of ~500 genes each were randomly sampled from the unchanged genes with > 0.5 CPM in at least 3 samples as controls for downstream analyses. Gene ontology analysis was performed with the DAVID 6.8 Functional Annotation Tool (Huang et al., 2009a, 2009b).
Gene transcript levels were visualized on the mm10 genome with the Integrative Genomics Viewer (IGV) (Robinson et al., 2011; Thorvaldsdóttir et al., 2013) using the bigWig output files from deepTools bamCoverage (--binSize 50 --extendReads 250 --normalizeTo1x 2150570000 -of bigwig).
Previously published RNA-Seq data measuring transcription changes in mESCs 1, 2 and 4 days after CTCF degradation (Nora et al., 2017) were downloaded from GEO, analyzed with the very same pipeline described above and compared to the data generated in this study (Figure S7K; Table S2). Of note, when using our stringent differential expression analysis we found 76, 262 and 3039 deregulated genes at day 1, 2 and 4 after CTCF degradation, respectively, which are substantially fewer than those reported by Nora and coworkers (370 differentially expressed genes 1 day after CTCF depletion, 1353 after 2 days and 4996 after 4 days). We might thus be underestimating the overlap between the RNA-Seq data generated in this study and those reported by Nora et al.
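The low-expression filter and the requirement that a gene be called by both tools can be sketched as follows. This is a simplified illustration in NumPy, not the edgeR or DESeq2 code run in Galaxy: the function names, the toy count matrix and the toy adjusted P values are all assumptions, and the filter follows the literal wording above (a gene is dropped when its CPM is ≤ 0.5 in at least 3 samples).

```python
import numpy as np


def cpm(counts):
    """Counts per million for a genes x samples array of raw counts."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0, keepdims=True) * 1e6


def keep_expressed(counts, min_cpm=0.5, min_low_samples=3):
    """Boolean mask of retained genes.

    A gene is dropped when its CPM is <= min_cpm in at least
    min_low_samples samples.
    """
    n_low = (cpm(counts) <= min_cpm).sum(axis=1)
    return n_low < min_low_samples


def consensus_de(edger_fdr, deseq2_padj, alpha=0.05):
    """Genes passing the threshold in both result tables (dicts gene -> value)."""
    return {
        gene
        for gene, fdr in edger_fdr.items()
        if fdr <= alpha and deseq2_padj.get(gene, 1.0) <= alpha
    }


# Hypothetical toy data: 3 genes x 4 samples.
toy_counts = np.array([[0, 0, 0, 1], [500_000] * 4, [500_000] * 4])
mask = keep_expressed(toy_counts)

# Hypothetical adjusted P values from the two tools.
both = consensus_de(
    {"geneA": 0.01, "geneB": 0.20, "geneC": 0.04},
    {"geneA": 0.03, "geneB": 0.01, "geneC": 0.50},
)
print(mask, both)
```

Taking the intersection of the two call sets is the conservative choice: a gene flagged by only one method is treated as unchanged.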
Micro-C
The mammalian Micro-C protocol and analysis were modified from (Hsieh et al., 2016). Here, we briefly summarize the key concepts of the Micro-C experiment and data analysis. The detailed step-by-step protocol can be found in the Supplemental Protocol.
I. Prepare crosslinked chromatin from cell culture
One to five million trypsinized mouse embryonic stem cells were directly resuspended and crosslinked with freshly made 1% formaldehyde at room temperature for 10 minutes. The crosslinking reaction was quenched by adding Tris buffer (pH = 7.5) to a final concentration of 0.75 M at room temperature. Crosslinked cells were washed twice with 1x PBS and subjected to a second crosslinking with 3 mM DSG crosslinking solution for 45 minutes at room temperature. Cells were snap-frozen and can be stored at −80°C for up to a year.
II. Digest crosslinked chromatin by micrococcal nuclease
Crosslinked cells were permeabilized in ice-cold Micro-C Buffer #1 (50 mM NaCl, 10 mM Tris-HCl pH = 7.5, 5 mM MgCl2, 1 mM CaCl2, 0.2% NP-40, 1x Protease Inhibitor Cocktail) for 20 minutes. Chromatin from permeabilized cells was digested with a pre-titrated concentration of micrococcal nuclease to about 90% mononucleosomes and 10% dinucleosomes at 37°C for 10 minutes. The digestion reaction was stopped by adding EGTA to a final concentration of 4 mM and incubating at 65°C for 10 minutes to completely inactivate the enzyme. MNase-digested chromatin was washed twice with ice-cold Micro-C Buffer #2 (50 mM NaCl, 10 mM Tris-HCl pH = 7.5, 10 mM MgCl2).
III. Repair fragment ends
Digested chromatin fragments were then subjected to dephosphorylation, phosphorylation, and end-chewing reactions by T4 Polynucleotide Kinase and DNA Polymerase I Klenow Fragment in Micro-C end-repair buffer (50 mM NaCl, 10 mM Tris-HCl pH = 7.5, 10 mM MgCl2, 100 μg/mL BSA, 2 mM ATP, 5 mM DTT, no dNTPs) at 37°C for 30 minutes. The blunt-end reaction was triggered by adding biotin-dATP, biotin-dCTP, dGTP, and dTTP to a final concentration of 66 μM and incubating at 25°C for 45 minutes. The fill-in reaction was stopped with 30 mM EDTA at 65°C for 20 minutes. Chromatin was washed once with ice-cold Micro-C Buffer #3 (50 mM Tris-HCl pH = 7.5, 10 mM MgCl2).
IV. Proximity ligation and purge biotin-dNTP from unligated ends
Chromatin fragments with biotin-dNTPs were then ligated by T4 DNA ligase at room temperature for at least 2 hours. Unligated ends containing biotin-dNTPs were then removed by the 3′ to 5′ exonuclease activity of exonuclease III at 37°C for at least 15 minutes. Chromatin was subjected to reverse crosslinking and protein digestion in proteinase K buffer (2 mg/mL Proteinase K, 1% SDS, 1x TE buffer) at 65°C overnight.
V. Purify dinucleosomal DNA
DNA from Micro-C samples was extracted with Phenol:Chloroform:Isoamyl Alcohol (25:24:1) solution followed by ethanol precipitation. DNA was cleaned up on a Zymo DNA clean and concentration column prior to gel size-selection for dinucleosomal DNA at 200 to 400 bp. DNA was then extracted from the agarose gel with a Zymo Gel purification column.
VI. Micro-C library preparation for deep sequencing
Purified DNA with biotin-dNTPs was captured by Dynabeads MyOne Streptavidin C1. A standard library preparation protocol including end-repair, A-tailing, and adaptor ligation was performed on beads with individual enzymes purchased from Lucigen or with the NEBNext Ultra II kit. The sequencing library was amplified with Kapa HiFi PCR enzyme using the lowest possible number of cycles to reduce PCR duplicates. Libraries were sequenced paired-end 50×50 or 100×100 on an Illumina HiSeq 4000 sequencer.
VII. Micro-C data analysis
a. Mapping and pairing Micro-C contacts.
Micro-C data were preprocessed with the streamlined pipeline HiC-Pro (Servant et al., 2015). Briefly, sequencing reads were mapped to the mouse mm10 genome by Bowtie2 in --very-sensitive-local mode. Mapped reads were paired, and pairs with multiple hits, low MAPQ, self-circles, and PCR duplicates were removed. The output file containing all valid pairs was used for the following analyses.
b. Visualize Micro-C data.
Micro-C data were converted to standard 4DN formats (e.g., .cool or .hic files) at multiple resolutions, typically ranging from 500bp to 10Mb. .cool files can be visualized in the HiGlass browser (http://higlass.io/) (Kerpedjiev et al., 2018) and .hic files can be visualized in Juicebox (https://github.com/aidenlab/Juicebox). All files can be found at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE123636. In this study, all snapshots of contact matrices were generated with the HiGlass browser.
c. Binning and balancing of Micro-C data.
Valid pairs were binned to 500bp resolution using the cooler package (https://github.com/mirnylab/cooler) for .cool files or the juicer package (https://github.com/aidenlab/juicer) for .hic files (Durand et al., 2016). Low-mappability or noisy regions were excluded prior to matrix balancing. Matrices were balanced using iterative correction (IC) for .cool files or Knight-Ruiz (KR) for .hic files. Visually, both normalization methods generate contact maps of equal quality. Contact maps at multiple resolutions can be generated with the coarsen or zoomify functions in the cooler package.
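To make the balancing step concrete, the following is a minimal NumPy sketch of iterative correction on a dense symmetric matrix. It is not the cooler or juicer implementation: the function name `iterative_correction`, the tolerance, the iteration cap and the toy matrix are assumptions, and masking of low-mappability bins is reduced to leaving empty rows untouched.

```python
import numpy as np


def iterative_correction(matrix, max_iter=200, tol=1e-5):
    """Balance a symmetric contact matrix so that non-empty rows have equal sums.

    Returns the balanced matrix and the accumulated per-bin bias vector.
    """
    m = np.array(matrix, dtype=float)
    bias = np.ones(m.shape[0])
    for _ in range(max_iter):
        marg = m.sum(axis=1)
        nonzero = marg > 0
        # Express each row sum relative to the mean of the non-empty rows.
        marg = marg / marg[nonzero].mean()
        marg[~nonzero] = 1.0  # leave empty bins untouched
        # Divide every entry by the relative coverage of its row and column.
        m /= np.outer(marg, marg)
        bias *= marg
        if np.abs(marg - 1.0).max() < tol:
            break
    return m, bias


# Toy example: a uniform contact map distorted by per-bin biases 1, 2, 3, 4.
true_bias = np.array([1.0, 2.0, 3.0, 4.0])
raw = np.outer(true_bias, true_bias)
balanced, est_bias = iterative_correction(raw)
print(balanced.sum(axis=1))
print(est_bias / est_bias[0])
```

In this toy case the estimated bias vector is proportional to the biases used to distort the map, which is the property that makes balanced maps comparable across bins with different coverage.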
d. Contact probability analysis
Only intra-chromosomal contacts were used to calculate contact density in bins with exponentially increasing widths from 100bp to 10Mb. Contacts shorter than 100bp were removed to minimize noise introduced by self-ligation or unligated products. The number of intra-chromosomal contacts within each distance range was calculated and normalized by the total number of contacts across all ranges. The plot shown in Figure 3D only included pairs in the “UNI” direction to minimize noise from unligated products.
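The binning described above can be illustrated with a short NumPy sketch. The function name `contact_probability`, the number of bins per decade and the toy distances are assumptions; the 100bp and 10Mb limits come from the text. The function returns the fraction of retained contacts per bin, which can be divided by the bin widths (`np.diff(edges)`) when a per-base-pair density is wanted.

```python
import numpy as np


def contact_probability(distances, d_min=100, d_max=10_000_000, bins_per_decade=10):
    """Fraction of intra-chromosomal contacts in logarithmically spaced distance bins.

    distances: genomic separations (bp) of intra-chromosomal contact pairs.
    Contacts outside [d_min, d_max] are discarded before normalization.
    """
    n_bins = int(round(np.log10(d_max / d_min) * bins_per_decade))
    edges = np.logspace(np.log10(d_min), np.log10(d_max), n_bins + 1)
    d = np.asarray(distances, dtype=float)
    d = d[(d >= edges[0]) & (d <= edges[-1])]
    counts, _ = np.histogram(d, bins=edges)
    fraction = counts / counts.sum()
    return edges, fraction


# Hypothetical separations; the 50 bp and 20 Mb contacts fall outside the range.
edges, fraction = contact_probability(
    [50, 150, 150, 2000, 5_000_000, 20_000_000], bins_per_decade=1
)
print(fraction)
```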
e. Compartment analysis
Chromosomal compartments were identified using principal component analysis (PCA) on contact maps at 100kb resolution. The first component typically represents the compartment profile in mammalian genomes: positive eigenvector values correspond to the A compartment (gene-rich regions) and negative eigenvector values correspond to the B compartment (gene-poor regions). The ranking of compartment strength shown in the saddle plot was obtained by rearranging and aggregating the genome-wide distance-normalized contact matrix in order of increasing eigenvector values.
f. Domain/boundary analysis
We used two approaches to identify TADs and TAD boundaries/insulators. The detailed methods are described in (Crane et al., 2015) for the insulation score analysis and in (Rao et al., 2014) for the arrowhead transformation analysis. Briefly, the optimal condition for calling insulation scores was determined by testing multiple sliding-window sizes on contact maps at 10kb to 100kb resolution. A 200kb sliding window on 20kb-binned contact maps was used to analyze insulation scores in this study. The signal within the sliding window was assigned to the corresponding bin across the entire genome. The insulation score was normalized by calculating the log2 ratio of each individual score to the genome-wide mean insulation score. TAD boundaries/insulators can be identified by calling the local minima along the normalized insulation score. The arrowhead transformation, defined as $A_{i,i+d} = (M^*_{i,i-d} - M^*_{i,i+d}) / (M^*_{i,i-d} + M^*_{i,i+d})$, where $M^*$ is the normalized contact matrix (Rao et al., 2014), can be thought of as a measurement of the directionality preference of locus i, restricted to contacts at a linear distance of d. $A_{i,i+d}$ will be strongly positive or negative if exactly one of the pixels (i, i−d) and (i, i+d) lies inside a domain and the other does not, but will be close to zero if both lie inside or both lie outside a domain. Applying this transformation across the genome sharpens the edges of domains so that TADs can be detected. For aggregate domain analysis, the individual TADs were rescaled to the same size, with $C_{i,j}$ denoting a pair of contact loci within a TAD. The rescaled matrices were then aggregated at the center of the plot with either ICE or distance normalization.
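Both domain-calling ideas can be illustrated on a toy matrix with a minimal NumPy sketch. This is an assumption-laden simplification, not the code of Crane et al. (2015) or Rao et al. (2014): the function names are hypothetical, the insulation window is a plain sum over the square that spans the left edge of each bin, and the arrowhead function follows the Rao et al. (2014) definition $A_{i,i+d} = (M_{i,i-d} - M_{i,i+d}) / (M_{i,i-d} + M_{i,i+d})$ without any of the downstream scoring. With 20kb bins, the 200kb window used in this study would correspond to `window_bins=10`.

```python
import numpy as np


def insulation_score(matrix, window_bins=10):
    """Log2 ratio of each bin's window signal to the genome-wide mean signal.

    The value at bin i sums contacts between the window_bins bins upstream
    of bin i and the window_bins bins starting at bin i. Bins where the
    window does not fit are NaN.
    """
    m = np.asarray(matrix, dtype=float)
    n = m.shape[0]
    raw = np.full(n, np.nan)
    for i in range(window_bins, n - window_bins + 1):
        raw[i] = m[i - window_bins:i, i:i + window_bins].sum()
    return np.log2(raw / np.nanmean(raw))


def local_minima(score):
    """Indices that are strictly lower than both neighbours (candidate boundaries)."""
    s = np.asarray(score, dtype=float)
    return [i for i in range(1, len(s) - 1) if s[i] < s[i - 1] and s[i] < s[i + 1]]


def arrowhead(matrix):
    """Arrowhead transform: compare contacts of locus i with i-d and with i+d."""
    m = np.asarray(matrix, dtype=float)
    n = m.shape[0]
    a = np.zeros_like(m)
    for i in range(n):
        for d in range(1, min(i, n - 1 - i) + 1):
            up, down = m[i, i - d], m[i, i + d]
            if up + down > 0:
                a[i, i + d] = (up - down) / (up + down)
    return a


# Toy map: two 6-bin domains, 10 contacts within a domain and 1 across.
labels = np.repeat([0, 1], 6)
toy = np.where(labels[:, None] == labels[None, :], 10.0, 1.0)

score = insulation_score(toy, window_bins=2)
boundaries = local_minima(score)
arrow = arrowhead(toy)
print(boundaries, score[6], arrow[5, 6], arrow[6, 7])
```

On the toy map the only local minimum falls at bin 6, the first bin of the second domain, and the arrowhead values flip sign across that boundary while staying at zero inside each domain.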
g. Loop analysis
Loops in mouse ES cells were discovered by HiCCUPS as described in (Rao et al., 2014). HiCCUPS uses a modified Benjamini-Hochberg FDR control procedure to reduce the rate of false positives and identify highly reliable loop annotations on contact maps. Loops were called at multiple resolutions (1, 5, and 10kb) of KR-normalized Micro-C contact matrices with a false discovery rate smaller than 0.1. Peak widths and the windows for peak merging were set to 5kb for 1kb contact maps and 20kb for 5kb and 10kb contact maps. Genome-wide loop comparison/quantification was assessed using aggregate peak analysis. All called loops were aggregated at the center of a 25kb x 25kb matrix using KR-normalized data at 1kb resolution. Loops within 55kb of the diagonal were excluded to avoid distance decay effects. The loop enrichment ratio was calculated by dividing the observed contacts in a search window by the expected contacts in the bottom-left submatrix.
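The aggregate peak analysis described above can be sketched as follows. This is an illustrative simplification, not the Juicer APA tool: the function name, the use of the single central pixel as the search window, the corner size and the toy matrix are assumptions. With 1kb bins, the 25kb x 25kb window corresponds to `flank=12`.

```python
import numpy as np


def aggregate_peak_analysis(matrix, loops, flank=12, corner=5):
    """Average contact-map snippets centered on loops and score the central pixel.

    matrix: balanced contact matrix (rows = upstream anchor bin,
            columns = downstream anchor bin).
    loops:  iterable of (i, j) bin pairs with i < j.
    Returns the averaged snippet and the ratio of the central pixel to the
    mean of the bottom-left corner (the corner closest to the diagonal).
    """
    m = np.asarray(matrix, dtype=float)
    size = 2 * flank + 1
    pile = np.zeros((size, size))
    used = 0
    for i, j in loops:
        # Skip loops whose window would run off the matrix.
        if min(i, j) - flank < 0 or i + flank >= m.shape[0] or j + flank >= m.shape[1]:
            continue
        pile += m[i - flank:i + flank + 1, j - flank:j + flank + 1]
        used += 1
    if used == 0:
        raise ValueError("no loop window fits inside the matrix")
    pile /= used
    ratio = pile[flank, flank] / pile[-corner:, :corner].mean()
    return pile, ratio


# Toy map: flat background of 1 with two 5-fold enriched loop pixels.
toy = np.ones((60, 60))
toy_loops = [(10, 40), (15, 45)]
for i, j in toy_loops:
    toy[i, j] = 5.0

pile, ratio = aggregate_peak_analysis(toy, toy_loops, flank=3, corner=2)
print(pile.shape, ratio)
```

Comparing against the bottom-left corner is a conservative choice because that corner lies closest to the diagonal, where background contact frequency is highest.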
h. Pile-up analysis
The concept of pile-up analysis is similar to the aggregate domain analysis described above. Briefly, we used a set of ChIP-Seq peaks of interest (e.g., CTCF ChIP-Seq in this study) as baits to extract 600kb x 600kb snippets of the contact map from Micro-C data at 5kb resolution, in which the coordinate of the ChIP peak was placed at the center point of each snippet. The snippets were then piled up at the center of the plot and normalized by the expected matrix.
i. Motif analysis
Sequences of loop anchors were extracted and scanned for CTCF cognate binding motifs with the MEME Suite (http://meme-suite.org/). We also investigated the sequence enrichment for 20 bp upstream and downstream of the CTCF motif.
j. Reproducibility analysis
Reproducibility of Micro-C data was measured by four algorithms based on different principles (packages are available at https://github.com/kundajelab/3DChromatin_ReplicateQC). 1) QuASAR (https://github.com/bxlab/hifive): the contact matrix was transformed based on the correlation matrix of distance-based enrichment. The reproducibility score was calculated as the correlation of values in the two transformed matrices. 2) GenomeDISCO (https://github.com/kundajelab/genomedisco): the contact matrix was smoothed by graph diffusion. The matrix smoothing treats the input matrix as a network, in which nodes correspond to genomic bins and edges are weighted by the contact map. The reproducibility score was calculated from the difference between the two smoothed matrices. 3) HiC-Rep (https://github.com/qunhualilab/hicrep): the contact matrix was transformed by a 2D mean filter. The reproducibility score was measured as the weighted sum of correlation coefficients. 4) HiC-Spector (https://github.com/gersteinlab/HiC-spector): the contact map was transformed by the eigenvalues of the Laplace operator. The reproducibility score was derived from the difference of weighted eigenvectors. The detailed principles of each algorithm can be found at the provided links.
QUANTIFICATION AND STATISTICAL ANALYSIS
For information about the number of replicates, the meaning of error bars (e.g., standard error of the mean) and other relevant statistical considerations, please see the figure legend associated with the figure showing the data. For information about how data was analyzed and/or quantified, please see the relevant section in METHOD DETAILS and/or the figure legend. And for the code and algorithms used in the analysis, please see the “Software and Algorithms” section of the Key Resources Table.
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Antibodies | ||
Anti-ACTB (WB) | Sigma-Aldrich # A2228 | RRID:AB_476697 |
Anti-CTCF (IP, ChIP) | Abcam # ab128873 | RRID:AB_11144295 |
Anti-CTCF (WB) | Millipore # 07–729 | RRID:AB_441965 |
Anti-FLAG (IP) | Sigma-Aldrich # F7425 | RRID:AB_439687 |
Anti-FLAG (WB, IP) | Sigma-Aldrich # F3165 | RRID:AB_259529 |
Anti-H3 (WB) | Abcam # ab1791 | RRID:AB_302613 |
Anti-HA tag (WB) | Abcam # ab9110 | RRID:AB_307019 |
Anti-HaloTag (WB) | Promega # G9211 | RRID:AB_2688011 |
Anti-Smc1a (ChIP) | Bethyl # A300–055A | RRID:AB_2192467 |
Anti-V5 (IP) | Abcam # ab9166 | RRID:AB_307024 |
Anti-V5 (WB) | Thermo Fisher Scientific # R960–25 | RRID:AB_2556564 |
Mouse IgG (IP, ChIP) | Jackson ImmunoResearch Labs # 015–000-003 | RRID:AB_2337188 |
Rabbit IgG (IP, ChIP) | Jackson ImmunoResearch Labs # 011–000-003 | RRID:AB_2337118 |
ANTI-FLAG M2 Affinity Gel (in vitro RNA binding assay) | Sigma-Aldrich #A2220 | RRID:AB_10063035 |
Chemicals, Peptides, and Recombinant Proteins | ||
DAPI 4′,6-Diamidine-2′-phenylindole dihydrochloride | Sigma-Aldrich | Cat. # 10236276001 |
HaloTag TMR ligand | Promega | Cat. # G8251 |
HaloTag PA-JF549 ligand | Grimm et al., 2016 | N/A |
Benzonase | Millipore | Cat. # 71205 |
RNase A | Thermo Scientific | Cat. # EN0531 |
DNase I | Ambion | Cat. # AM2222 |
Formaldehyde | Polysciences | Cat. # 1881420 |
10 mM dNTPs | KAPA Biosystems | Cat. # KK1017 |
10 mM ATP | New England Biolabs | Cat. # P0756S |
5 U/μl T4 DNA polymerase | Invitrogen | Cat. # 18005025 |
5 U/μl Taq DNA polymerase | ThermoFisher Scientific | Cat. # EP0401 |
KAPA HS HIFI polymerase | KAPA Biosystems | Cat. # KK2502 |
Rapid DNA ligase | Enzymatics | Cat. # L6030-HC-L |
AMPure XP beads | Agencourt | Cat. # A63880 |
DSG (disuccinimidyl glutarate) | ThermoFisher Scientific | Cat. # 20593 |
Micrococcal Nuclease | Worthington Biochem | Cat. # LS004798 |
100mM ATP | ThermoFisher Scientific | Cat. # R1441 |
DNA Polymerase I, Large (Klenow) Fragment | New England Biolabs | Cat. # M0210 |
T4 Polynucleotide Kinase | New England Biolabs | Cat. # M0201 |
T4 DNA Ligase | New England Biolabs | Cat. # M0202 |
Exonuclease III (E. coli) | New England Biolabs | Cat. # M0206 |
Biotin-14-dATP | Jena Bioscience | Cat. # NU-835-BIO14 |
Biotin-11-dCTP | Jena Bioscience | Cat. # NU-809-BIOX |
20X Proteinase K solution | Sigma Aldrich | Cat. # 3115879001 |
Dynabeads MyOne Streptavidin C1 | ThermoFisher Scientific | Cat. # 65001 |
SuperScript III Reverse Transcriptase | Invitrogen | Cat. # 18080044 |
SYBR Gold Nucleic Acid Gel Stain | Invitrogen | Cat. # S11494 |
PageBlue Protein Staining Solution | ThermoFisher Scientific | Cat. # 24620 |
TRIzol Reagent | ThermoFisher Scientific | Cat. # 15596026 |
Ni-NTA agarose | QIAGEN | Cat. # 30210 |
3X FLAG Peptide | Sigma Aldrich | Cat. # F4799 |
recombinant 3xFLAG-Halo-wt-CTCF-6xHis protein | This manuscript | r-wt-CTCF |
recombinant 3xFLAG-Halo-ΔRBRi-CTCF-6xHis protein | This manuscript | r-ΔRBRi-CTCF |
EMPIGEN BB detergent | Sigma-Aldrich | Cat. # 30326 |
Critical Commercial Assays | ||
Click-iT EdU Alexa Fluor 488 Flow Cytometry Assay Kit | ThermoFisher Scientific | Cat. # C10425 |
NEBNext Ultra II | New England Biolabs | Cat. # E7645 |
End-It DNA End-Repair | Lucigen | Cat. # ER81050 |
RNeasy Mini Kit | QIAGEN | Cat. # 74104 |
DNA-free DNA Removal Kit | Invitrogen | Cat. # AM1906 |
Ribo-Zero rRNA Removal Kit | Illumina | Cat. # MRZH116 |
TruSeq RNA Sample Preparation v2 Kit | Illumina | Cat. # RS-122–2001 |
HiScribe T7 Quick High Yield RNA Synthesis Kit | New England Biolabs | Cat. # E2050S |
Bac-to-Bac Baculovirus Expression System | ThermoFisher Scientific | Cat. # 10359016 |
Deposited Data | ||
Raw imaging data (all raw data for This manuscript) | This manuscript and Hansen et al., 2018b | https://zenodo.org/record/2208323 |
Micro-C, ChIP-Seq and RNA-Seq data | This manuscript | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE123636 |
Raw images and uncropped gels and blots deposited in Mendeley Data | This manuscript | DOI: https://doi.org/10.17632/5zdrpcsbt9.2 |
CTCF ChIP-Seq in mESCs | Chen et al., 2008 | GSE11431 |
Smc1a ChIP-Seq in mESCs | Kagey et al., 2010 | GSE22562 |
RNA-seq in mESCs after CTCF degradation | Nora et al., 2017 | GSE98671 |
Experimental Models: Cell Lines | ||
Mouse: JM8.N4 mouse embryonic stem cells | Pettitt et al., 2009 and UC Davis KOMP Repository | https://www.komp.org/pdf.php?cloneID=8669 |
mESC C59 FLAG-Halo-CTCF; Rad21-SNAPf-V5 (knock-in) | Hansen et al., 2017 | C59 |
mESC C59D2 ΔRBRi-FLAG-Halo-CTCF (e10::3xHA); Rad21-SNAPf-V5 (knock-in) | This manuscript | C59D2 or ΔRBRi |
mESC C62 3xFLAG-Halo-CTCF (allele 1); V5-SNAPf-Halo-CTCF (allele 2) (both knock-in) | This manuscript | C62 |
Oligonucleotides | ||
See Table S1 | N/A | |
Recombinant DNA | ||
pBlueScript SK II (+) | Addgene # 212205 | GenBank: X52328.1 |
pBSII HR mCtcf. delRBRi(link-3XHA) Repair Vector | This manuscript | N/A |
pUC57 V5 Snap(f) mCTCF Repair Vector | This manuscript | N/A |
pUC57 3xFLAG Halo TEV mCTCF Repair Vector | This manuscript | N/A |
pFastBac Dual Expression Vector | ThermoFisher Scientific | Cat. # 10712024 |
Software and Algorithms | ||
MATLAB 2014b | The Mathworks | 2014b |
PALM analysis pipeline (MATLAB) | This manuscript | https://gitlab.com/anders.sejr.hansen/palm_pipeline |
FCSREAD (MATLAB) | Mathworks File Exchange | https://www.mathworks.com/matlabcentral/fileexchange/8430-flow-cytometry-data-reader-and-visualization |
MTT-Algorithm (MATLAB implementation) | Sergé et al., 2008 | https://gitlab.com/tjian-darzacq-lab/SPT_LocAndTrack |
SimpleTracker (MATLAB) | Jean-Yves Tinevez | https://www.mathworks.com/matlabcentral/fileexchange/34040-simple-tracker |
Spatial Point Patterns Analysis (ads package on CRAN) | Pélissier and Goreaud, 2015 | https://cran.r-project.org/web/packages/ads/index.html |
RStudio | RStudio | https://www.rstudio.com |
BaSDI (MATLAB) | Elmokadem and Yu, 2015 | https://github.com/jiyuuchc/BaSDI |
ImageJ (https://imagej.net/) | Schindelin et al., 2012 | RRID:SCR_003070 |
Samtools | Li et al., 2009 | RRID:SCR_002105 |
Integrative Genomics Viewer | Robinson et al., 2011; Thorvaldsdóttir et al., 2013 | RRID:SCR_011793 |
Bowtie | Langmead et al., 2009 | RRID:SCR_005476 |
deepTools | Ramírez et al., 2016 | RRID:SCR_016366 |
FastQC | http://www.bioinformatics.babraham.ac.uk/projects/fastqc | RRID:SCR_014583 |
MACS2 | Zhang et al., 2008 | https://github.com/taoliu/MACS/ |
Python 3.7 | Python | https://www.python.org/ |
Anaconda 3.7 | Anaconda | https://www.anaconda.com/ |
HiC-Pro | Servant et al., 2015 | https://github.com/nservant/HiC-Pro |
Cooler | Abdennur and Mirny, 2019 | https://github.com/mirnylab/cooler |
Juicer tools | Durand et al., 2016 | https://github.com/aidenlab/juicer |
Higlass | Kerpedjiev et al., 2018 | http://higlass.io/ |
STAR | Dobin et al., 2013 | https://github.com/alexdobin/STAR |
HTSeq | Anders et al., 2015 | https://pypi.org/project/HTSeq/ |
edgeR | Robinson et al., 2010; Liu et al., 2015 | https://bioconductor.org/packages/release/bioc/html/edgeR.html |
Galaxy | Blankenberg et al., 2010; Giardine et al., 2005; Goecks et al., 2010 | https://usegalaxy.org/ |
DESeq2 | Love et al., 2014 | https://bioconductor.org/packages/release/bioc/html/DESeq2.html |
DAVID Bioinformatics Resources 6.8 | Huang et al., 2009a, 2009b | https://david.ncifcrf.gov/home.jsp |
ViSP | El Beheiry and Dahan, 2013 | https://www.nature.com/articles/nmeth.2566 |
Supplementary Material
End-repair/3′ A mix component | Final concentration | Cat # |
---|---|---|
10X T4 DNA ligase buffer | 1X | NEB #B0202S |
10 mM dNTPs | 0.5 mM each | KAPA #KK1017 |
10 mM ATP | 0.25 mM | NEB #P0756S |
40% PEG 4000 | 2.5% | |
10 U/μL T4 PNK | 0.0025 U/μL | NEB #M0201S |
5 U/μL T4 DNA polymerase* | 0.0025 U/μL | Invitrogen #18005025 |
5 U/μL Taq DNA polymerase** | 0.0025 U/μL | Thermo #EP0401 |
*Diluted 1:20 in 1X T4 DNA ligase buffer
**Diluted 1:20 in 1X standard Taq buffer (NEB #B9014S)
PCR mix component | Final concentration | Cat # |
---|---|---|
5X KAPA buffer | 1X | KAPA #KK2502 |
10 mM dNTPs | 0.3 mM each | KAPA #KK1017 |
5 μM TruSeq PCR primers | 0.5 μM | Primer sequence in Table S1 |
KAPA HS HIFI polymerase | 1 U | KAPA #KK2502 |
Nuclease-free water to 30 μL |
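The final concentrations in the PCR mix table can be converted to pipetting volumes with the standard dilution relation C1·V1 = C2·V2. The following is a minimal sketch, assuming a 30 μL reaction as stated in the table; the function and variable names are illustrative and are not taken from the authors' code. The polymerase (listed as 1 U, with no stock concentration given) and the template are left out, and water makes up the remaining volume.

```python
# Stock and final concentrations copied from the PCR mix table.
# Units must match within each row (X, mM, or uM).
PCR_MIX = {
    "5X KAPA buffer": (5.0, 1.0),            # X
    "10 mM dNTPs": (10.0, 0.3),              # mM each
    "5 uM TruSeq PCR primers": (5.0, 0.5),   # uM
}
REACTION_VOLUME_UL = 30.0  # "Nuclease-free water to 30 uL"


def stock_volume_ul(stock, final, total_ul):
    """Volume of stock (uL) giving concentration `final` in `total_ul` (C1*V1 = C2*V2)."""
    if stock <= 0 or final < 0 or final > stock:
        raise ValueError("expected stock > 0 and 0 <= final <= stock")
    return final * total_ul / stock


def pcr_mix_volumes(mix=PCR_MIX, total_ul=REACTION_VOLUME_UL):
    """Return a dict mapping each component to its stock volume in uL (3 decimals)."""
    return {
        name: round(stock_volume_ul(stock, final, total_ul), 3)
        for name, (stock, final) in mix.items()
    }


if __name__ == "__main__":
    for name, volume in pcr_mix_volumes().items():
        print(f"{name}: {volume} uL")
```

For a 30 μL reaction this gives 6 μL of 5X buffer, 0.9 μL of 10 mM dNTPs, and 3 μL of 5 μM primers.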
Highlights.
An RNA-binding region (RBRi) in CTCF mediates self-association and clustering
Reorganization of TADs, loops, and stripes in ΔRBRi mutant cells
About half of all CTCF loops are disrupted in ΔRBRi mutant cells
CTCF loops fall into two classes: RBRi dependent and RBRi independent
ACKNOWLEDGMENTS
We thank Luke Lavis for generously providing JF dyes, Hervé Marie-Nelly for help with R code, Gina M. Dailey for assistance with cloning, Carla J. Inouye for help with the biochemical assays, Assaf Amitai for insightful discussion, Astou Tangara and Ana Robles for microscope assembly and maintenance, Ji Yu and Jean-Yves Tinevez for coding discussions, Daniel J. Lee for help with genotyping, and Dr. Kartoosh Heydari at the Li Ka Shing Facility for flow cytometry assistance. We thank Elphege Nora, Thomas Graham, and other members of the Tjian and Darzacq labs for comments on the manuscript. A preprint describing this work first appeared on bioRxiv on December 13, 2018 under the title “An RNA-Binding Region Regulates CTCF Clustering and Chromatin Looping.” This work was performed in part at the CRL Molecular Imaging Center, supported by the Gordon and Betty Moore Foundation. This work used the Vincent J. Coates Genomics Sequencing Laboratory at the University of California (UC), Berkeley, supported by NIH S10 OD018174 Instrumentation Grant. A.S.H. was a postdoctoral fellow of the Siebel Stem Cell Institute and is supported by an NIH National Institute of General Medical Sciences (NIGMS) K99 Pathway to Independence Award (K99GM130896). This work was supported by NIH 4D Nucleome Common Fund grants UO1-EB021236 and U54-DK107980 (X.D.), California Institute of Regenerative Medicine grant LA1-08013 (X.D.), and the Howard Hughes Medical Institute (grant CC34430 to R.T.).
Footnotes
DATA AND CODE AVAILABILITY
Data and Code Availability Statement
The datasets and code generated during this study are available as detailed in the Key Resources Table.
Data availability
All Micro-C, ChIP-Seq, and RNA-Seq data are available at GEO under accession number GSE123636: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE123636. The imaging data supporting this manuscript and Hansen et al. (2018b) are available at https://zenodo.org/record/2208323. The raw uncropped gels, blots, and confocal micrographs can be found at Mendeley Data: DOI: https://doi.org/10.17632/5zdrpcsbt9.2
Computer code
The code used for analyzing and processing the PALM data can be found at https://gitlab.com/anders.sejr.hansen/palm_pipeline. Please see the Key Resources Table for a full list of all code and software and where to find them.
SUPPLEMENTAL INFORMATION
Supplemental Information can be found online at https://doi.org/10.1016/j.molcel.2019.07.039.
DECLARATION OF INTERESTS
D.R. is a co-founder of Constellation Pharmaceuticals and Fulcrum Therapeutics. All other authors declare no competing interests.
REFERENCES
- Abdennur N, and Mirny L. (2019). Cooler: scalable storage for Hi-C data and other genomically-labeled arrays. Bioinformatics. Published online July 10, 2019. 10.1093/bioinformatics/btz540. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Alipour E, and Marko JF (2012). Self-organization of domain structures by DNA-loop-extruding enzymes. Nucleic Acids Res. 40, 11202–11212. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Anders S, Pyl PT, and Huber W. (2015). HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Banani SF, Lee HO, Hyman AA, and Rosen MK (2017). Biomolecular condensates: organizers of cellular biochemistry. Nat. Rev. Mol. Cell Biol 18, 285–298. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Besag JE (1977). Contribution to the discussion of Dr. Ripley’s paper. J. R. Stat. Soc. B 39, 193–195. [Google Scholar]
- Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, and Taylor J. (2010). Galaxy: a web-based genome analysis tool for experimentalists. Curr. Protoc. Mol. Biol. Chapter 19, 1–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Boehning M, Dugast-Darzacq C, Rankovic M, Hansen AS, Yu T, Marie-Nelly H, McSwiggen DT, Kokic G, Dailey GM, Cramer P, et al. (2018). RNA polymerase II clustering through carboxy-terminal domain phase separation. Nat. Struct. Mol. Biol 25, 833–840. [DOI] [PubMed] [Google Scholar]
- Bogu GK, Vizán P, Stanton LW, Beato M, Di Croce L, and Marti-Renom MA (2015). Chromatin and RNA Maps Reveal Regulatory Long Noncoding RNAs in Mouse. Mol. Cell. Biol 36, 809–819. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bonev B, Mendelson Cohen N, Szabo Q, Fritsch L, Papadopoulos GL, Lubling Y, Xu X, Lv X, Hugnot J-P, Tanay A, and Cavalli G. (2017). Multiscale 3D genome rewiring during mouse neural development. Cell 171, 557–572.e24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brannan KW, Jin W, Huelga SC, Banks CAS, Gilmore JM, Florens L, Washburn MP, Van Nostrand EL, Pratt GA, Schwinn MK, et al. (2016). SONAR discovers RNA-binding proteins from analysis of large-scale protein-protein interactomes. Mol. Cell 64, 282–293. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cattoglio C, Pustova I, Walther N, Ho JJ, Hantsche-Grininger M, Inouye CJ, Hossain MJ, Dailey GM, Ellenberg J, Darzacq X, et al. (2019). Determining cellular CTCF and cohesin abundances to constrain 3D genome models. eLife 8, e40164. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Caudron-Herger M, Rusin SF, Adamo ME, Seiler J, Schmid VK, Barreau E, Kettenbach AN, and Diederichs S. (2019). R-DeeP: proteome-wide and quantitative identification of RNA-dependent proteins by density gradient ultracentrifugation. Mol. Cell 75, 184–199.e10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang W, Jiang J, et al. (2008). Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117. [DOI] [PubMed] [Google Scholar]
- Cho W-K, Spille J-H, Hecht M, Lee C, Li C, Grube V, and Cisse II (2018). Mediator and RNA polymerase II clusters associate in transcription-dependent condensates. Science 361, 412–415. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chong S, Dugast-Darzacq C, Liu Z, Dong P, Dailey GM, Cattoglio C, Heckert A, Banala S, Lavis L, Darzacq X, and Tjian R. (2018). Imaging dynamic and selective low-complexity domain interactions that control gene transcription. Science 361, eaar2555. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Crane E, Bian Q, McCord RP, Lajoie BR, Wheeler BS, Ralston EJ, Uzawa S, Dekker J, and Meyer BJ (2015). Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Davidson IF, Goetz D, Zaczek MP, Molodtsov MI, Huis In’t Veld PJ, Weissmann F, Litos G, Cisneros DA, Ocampo-Hafalla M, Ladurner R, et al. (2016). Rapid movement and transcriptional re-localization of human cohesin on DNA. EMBO J. 35, 2671–2685. [DOI] [PMC free article] [PubMed] [Google Scholar]
- de Wit E, Vos ESM, Holwerda SJB, Valdes-Quezada C, Verstegen MJAM, Teunissen H, Splinter E, Wijchers PJ, Krijger PHL, and de Laat W. (2015). CTCF binding polarity determines chromatin looping. Mol. Cell 60, 676–684. [DOI] [PubMed] [Google Scholar]
- Dekker J, and Mirny L. (2016). The 3D genome as moderator of chromosomal communication. Cell 164, 1110–1121. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, and Ren B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, and Gingeras TR (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Durand NC, Shamim MS, Machol I, Rao SSP, Huntley MH, Lander ES, and Aiden EL (2016). Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98. [DOI] [PMC free article] [PubMed] [Google Scholar]
- El Beheiry M, and Dahan M. (2013). ViSP: representing single-particle localizations in three dimensions. Nat. Methods 10, 689–690. [DOI] [PubMed] [Google Scholar]
- El-Kady A, and Klenova E. (2005). Regulation of the transcription factor, CTCF, by phosphorylation with protein kinase CK2. FEBS Lett. 579, 1424–1434. [DOI] [PubMed] [Google Scholar]
- Elmokadem A, and Yu J. (2015). Optimal drift correction for superresolution localization microscopy with Bayesian inference. Biophys. J 109, 1772–1780. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Forcato M, Nicoletti C, Pal K, Livi CM, Ferrari F, and Bicciato S. (2017). Comparison of computational methods for Hi-C data analysis. Nat. Methods 14, 679–685. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fudenberg G, Imakaev M, Lu C, Goloborodko A, Abdennur N, and Mirny LA (2016). Formation of chromosomal domains by loop extrusion. Cell Rep. 15, 2038–2049. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fudenberg G, Abdennur N, Imakaev M, Goloborodko A, and Mirny LA (2017). Emerging evidence of chromosome folding by loop extrusion. Cold Spring Harb. Symp. Quant. Biol 82, 45–55. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ganji M, Shaltiel IA, Bisht S, Kim E, Kalichava A, Haering CH, and Dekker C. (2018). Real-time imaging of DNA loop extrusion by condensin. Science 360, 102–105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gassler J, Brandão HB, Imakaev M, Flyamer IM, Ladstätter S, Bickmore WA, Peters J-M, Mirny LA, and Tachibana K. (2017). A mechanism of cohesin-dependent loop extrusion organizes zygotic genome architecture. EMBO J. 36, 3600–3618. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, et al. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 15, 1451–1455. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Goecks J, Nekrutenko A, and Taylor J; Galaxy Team (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Grimm JB, English BP, Choi H, Muthusamy AK, Mehl BP, Dong P, Brown TA, Lippincott-Schwartz J, Liu Z, Lionnet T, and Lavis LD (2016). Bright photoactivatable fluorophores for single-molecule imaging. Nat. Methods 13, 985–988. [DOI] [PubMed] [Google Scholar]
- Guo Y, Xu Q, Canzio D, Shou J, Li J, Gorkin DU, Jung I, Wu H, Zhai Y, Tang Y, et al. (2015). CRISPR inversion of CTCF sites alters genome topology and enhancer/promoter function. Cell 162, 900–910. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M Jr., Jungkamp A-C, Munschauer M, et al. (2010). Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129–141. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hansen AS, Pustova I, Cattoglio C, Tjian R, and Darzacq X. (2017). CTCF and cohesin regulate chromatin loop stability with distinct dynamics. eLife 6, e25776. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hansen AS, Cattoglio C, Darzacq X, and Tjian R. (2018a). Recent evidence that TADs and chromatin loops are dynamic structures. Nucleus 9, 20–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hansen AS, Amitai A, Cattoglio C, Tjian R, and Darzacq X. (2018b). Guided nuclear exploration increases CTCF target search efficiency. bioRxiv. 10.1101/495457. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hashimoto H, Wang D, Horton JR, Zhang X, Corces VG, and Cheng X. (2017). Structural basis for the versatile and methylation-dependent binding of CTCF to DNA. Mol. Cell 66, 711–720.e3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- He C, Sidoli S, Warneford-Thomson R, Tatomer DC, Wilusz JE, Garcia BA, and Bonasio R. (2016). High-resolution mapping of RNA-binding regions in the nuclear proteome of embryonic stem cells. Mol. Cell 64, 416–430. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Heinz S, Texari L, Hayes MGB, Urbanowski M, Chang MW, Givarkes N, Rialdi A, White KM, Albrecht RA, Pache L, et al. (2018). Transcription elongation can affect genome 3D structure. Cell 174, 1522–1536.e22. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hnisz D, Schuijers J, Li CH, and Young RA (2017). Regulation and dysregulation of chromosome structure in cancer. Annu. Rev. Cancer Biol 2, 21–40. [Google Scholar]
- Hsieh THS, Weiner A, Lajoie B, Dekker J, Friedman N, and Rando OJ (2015). Mapping nucleosome resolution chromosome folding in yeast by micro-C. Cell 162, 108–119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hsieh TS, Fudenberg G, Goloborodko A, and Rando OJ (2016). Micro-C XL: assaying chromosome conformation from the nucleosome to the entire genome. Nat. Methods 13, 1009–1011. [DOI] [PubMed] [Google Scholar]
- Huang W, Sherman BT, and Lempicki RA (2009a). Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huang W, Sherman BT, and Lempicki RA (2009b). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc 4, 44–57. [DOI] [PubMed] [Google Scholar]
- Kagey MH, Newman JJ, Bilodeau S, Zhan Y, Orlando DA, van Berkum NL, Ebmeier CC, Goossens J, Rahl PB, Levine SS, et al. (2010). Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kanke M, Tahara E, Huis In’t Veld PJ, and Nishiyama T. (2016). Cohesin acetylation and Wapl-Pds5 oppositely regulate translocation of cohesin along DNA. EMBO J. 35, 2686–2698. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kentepozidou E, Aitken SJ, Feig C, Stefflova K, Ibarra-Soria X, Odom DT, Roller M, and Flicek P. (2019). Clustered CTCF binding is an evolutionary mechanism to maintain topologically associating domains. bioRxiv. 10.1101/668855. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kerpedjiev P, Abdennur N, Lekschas F, McCallum C, Dinkla K, Strobelt H, Luber JM, Ouellette SB, Azhir A, Kumar N, et al. (2018). HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol. 19, 125. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kim E, Kerssemakers J, Shaltiel IA, Haering CH, and Dekker C. (2019). DNA-loop extruding condensin complexes can traverse one another. bioRxiv. 10.1101/682864. [DOI] [PubMed] [Google Scholar]
- Klenova EM, Chernukhin IV, El-Kady A, Lee RE, Pugacheva EM, Loukinov DI, Goodwin GH, Delgado D, Filippova GN, León J, et al. (2001). Functional phosphorylation sites in the C-terminal region of the multivalent multifunctional transcriptional factor CTCF. Mol. Cell. Biol 21, 2221–2234. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kung JT, Kesner B, An JY, Ahn JY, Cifuentes-Rojas C, Colognori D, Jeon Y, Szanto A, del Rosario BC, Pinter SF, et al. (2015). Locus-specific targeting to the X chromosome revealed by the RNA interactome of CTCF. Mol. Cell 57, 361–375. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Langmead B, Trapnell C, Pop M, and Salzberg SL (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lawrence MS, Stojanov P, Mermel CH, Robinson JT, Garraway LA, Golub TR, Meyerson M, Gabriel SB, Lander ES, and Getz G. (2014). Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, and Durbin R; 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lieberman-Aiden E, Van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, et al. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu R, Holik AZ, Su S, Jansz N, Chen K, Leong HS, Blewitt ME, Asselin-Labat ML, Smyth GK, and Ritchie ME (2015). Why weight? Modelling sample and observational level variability improves power in RNAseq analyses. Nucleic Acids Res. 43, e97–e97. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Love MI, Huber W, and Anders S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lupiáñez DG, Kraft K, Heinrich V, Krawitz P, Brancati F, Klopocki E, Horn D, Kayserili H, Opitz JM, Laxova R, et al. (2015). Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell 161, 1012–1025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Martinez SR, and Miranda JL (2010). CTCF terminal segments are unstructured. Protein Sci. 19, 1110–1116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McSwiggen DT, Hansen AS, Teves SS, Marie-Nelly H, Hao Y, Heckert AB, Umemoto KK, Dugast-Darzacq C, Tjian R, and Darzacq X. (2019). Evidence for DNA-mediated nuclear compartmentalization distinct from phase separation. eLife 8, e47098. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Merkenschlager M, and Nora EP (2016). CTCF and cohesin in genome folding and transcriptional gene regulation. Annu. Rev. Genomics Hum. Genet 17, 17–43. [DOI] [PubMed] [Google Scholar]
- Nakahashi H, Kieffer Kwon KR, Resch W, Vian L, Dose M, Stavreva D, Hakim O, Pruett N, Nelson S, Yamane A, et al. (2013). A genome-wide map of CTCF multivalency redefines the CTCF code. Cell Rep. 3, 1678–1689. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nasmyth K. (2011). Cohesin: a catenase with separate entry and exit gates? Nat. Cell Biol 13, 1170–1177. [DOI] [PubMed] [Google Scholar]
- Nora EP, Lajoie BR, Schulz EG, Giorgetti L, Okamoto I, Servant N, Piolot T, van Berkum NL, Meisig J, Sedat J, et al. (2012). Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nora EP, Goloborodko A, Valton A-L, Gibcus JH, Uebersohn A, Abdennur N, Dekker J, Mirny LA, and Bruneau BG (2017). Targeted degradation of CTCF decouples local insulation of chromosome domains from higher-order genomic compartmentalization. Cell 169, 930–944.e22. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pant V, Kurukuti S, Pugacheva E, Shamsuddin S, Mariano P, Renkawitz R, Klenova E, Lobanenkov V, and Ohlsson R. (2004). Mutation of a single CTCF target site within the H19 imprinting control region leads to loss of Igf2 imprinting and complex patterns of de novo methylation upon maternal inheritance. Mol. Cell. Biol 24, 3497–3504. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pękowska A, Klaus B, Xiang W, Severino J, Daigle N, Klein FA, Oleś M, Casellas R, Ellenberg J, Steinmetz LM, et al. (2018). Gain of CTCF-anchored chromatin loops marks the exit from naive pluripotency. Cell Syst. 7, 482–495.e10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pélissier R, and Goreaud F. (2015). ads package for R: a fast unbiased implementation of the K-function family for studying spatial point patterns in irregular-shaped sampling windows. J. Stat. Softw 63(6). [Google Scholar]
- Pettitt SJ, Liang Q, Rairdan XY, Moran JL, Prosser HM, Beier DR, Lloyd KC, Bradley A, and Skarnes WC (2009). Agouti C57BL/6N embryonic stem cells for mouse genetic resources. Nat. Methods 6, 493–495. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ramírez F, Ryan DP, Grüning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dündar F, and Manke T. (2016). deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44 (W1), W160–W165. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, and Zhang F. (2013). Genome engineering using the CRISPR-Cas9 system. Nat. Protoc 8, 2281–2308. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES, and Aiden EL (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rao SSP, Huang S-C, Glenn St Hilaire B, Engreitz JM, Perez EM, Kieffer-Kwon K-R, Sanborn AL, Johnstone SE, Bascom GD, Bochkov ID, et al. (2017). Cohesin loss eliminates all loop domains. Cell 171, 305–320.e24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rasko JEJ, Klenova EM, Leon J, Filippova GN, Loukinov DI, Vatolin S, Robinson AF, Hu YJ, Ulmer J, Ward MD, et al. (2001). Cell growth inhibition by the multifunctional multivalent zinc-finger factor CTCF. Cancer Res. 61, 6002–6007. [PubMed] [Google Scholar]
- Rigbolt KTG, Prokhorova TA, Akimov V, Henningsen J, Johansen PT, Kratchmarova I, Kassem M, Mann M, Olsen JV, and Blagoev B. (2011). System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation. Sci. Signal 4, rs3. [DOI] [PubMed] [Google Scholar]
- Ripley BD (1976). The second-order analysis of stationary point processes. J. Appl. Probab 13, 255–266. [Google Scholar]
- Robinson MD, McCarthy DJ, and Smyth GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, and Mesirov JP (2011). Integrative genomics viewer. Nat. Biotechnol 29, 24–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rowley MJ, and Corces VG (2018). Organizational principles of 3D genome architecture. Nat. Rev. Genet 19, 789–800. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rubin-Delanchy P, Burn GL, Griffié J, Williamson DJ, Heard NA, Cope AP, and Owen DM (2015). Bayesian cluster identification in single-molecule localization microscopy data. Nat. Methods 12, 1072–1076. [DOI] [PubMed] [Google Scholar]
- Saldaña-Meyer R, González-Buendía E, Guerrero G, Narendra V, Bonasio R, Recillas-Targa F, and Reinberg D. (2014). CTCF regulates the human p53 gene through direct interaction with its natural antisense transcript, Wrap53. Genes Dev. 28, 723–734. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Saldaña-Meyer R, Rodriguez-Hernaez J, Escobar T, Nishana M, Jácome-López K, Nora EP, Bruneau BG, Tsirigos A, Furlan-Magaril M, Skok J, et al. (2019). RNA Interactions Are Essential for CTCF-Mediated Genome Organization. Mol. Cell 76. Published online September 12, 2019. 10.1016/j.molcel.2019.08.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sanborn AL, Rao SSP, Huang S-C, Durand NC, Huntley MH, Jewett AI, Bochkov ID, Chinnappan D, Cutkosky A, Li J, et al. (2015). Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl. Acad. Sci. U S A 112, E6456–E6465. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B, et al. (2012). Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schwarzer W, Abdennur N, Goloborodko A, Pekowska A, Fudenberg G, Loe-Mie Y, Fonseca NA, Huber W, Haering CH, Mirny L, and Spitz F. (2017). Two independent modes of chromatin organization revealed by cohesin removal. Nature 551, 51–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sergé A, Bertaux N, Rigneault H, and Marguet D. (2008). Dynamic multiple-target tracing to probe spatiotemporal cartography of cell membranes. Nat. Methods 5, 687–694. [DOI] [PubMed] [Google Scholar]
- Servant N, Varoquaux N, Lajoie BR, Viara E, Chen C-J, Vert J-P, Heard E, Dekker J, and Barillot E. (2015). HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sheff MA, and Thorn KS (2004). Optimized cassettes for fluorescent protein tagging in Saccharomyces cerevisiae. Yeast 21, 661–670. [DOI] [PubMed] [Google Scholar]
- Shin Y, and Brangwynne CP (2017). Liquid phase condensation in cell physiology and disease. Science 357, eaaf4382. [DOI] [PubMed] [Google Scholar]
- Skene PJ, and Henikoff S. (2017). An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. eLife 6, e21856. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Skibbens RV (2016). Of rings and rods: regulating cohesin entrapment of DNA to generate intra- and intermolecular tethers. PLoS Genet. 12, e1006337. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stigler J, Camdere GÖ, Koshland DE, and Greene EC (2016). Single-molecule imaging reveals a collapsed conformational state for DNA-bound cohesin. Cell Rep. 15, 988–998. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sun S, Del Rosario BC, Szanto A, Ogawa Y, Jeon Y, and Lee JT (2013). Jpx RNA activates Xist by evicting CTCF. Cell 153, 1537–1551. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Symmons O, Uslu VV, Tsujimura T, Ruf S, Nassari S, Schwarzer W, Ettwiller L, and Spitz F. (2014). Functional and topological characteristics of mammalian regulatory domains. Genome Res. 24, 390–400. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Thorvaldsdóttir H, Robinson JT, and Mesirov JP (2013). Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform 14, 178–192. [DOI] [PMC free article] [PubMed] [Google Scholar]
- van Steensel B, and Belmont AS (2017). Lamina-associated domains: links with chromosome architecture, heterochromatin, and gene repression. Cell 169, 780–791. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vietri Rudan M, Barrington C, Henderson S, Ernst C, Odom DT, Tanay A, and Hadjur S. (2015). Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Rep. 10, 1297–1309. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang F, Tang Z, Shao H, Guo J, Tan T, Dong Y, and Lin L. (2018). Long noncoding RNA HOTTIP cooperates with CCCTC-binding factor to coordinate HOXA gene expression. Biochem. Biophys. Res. Commun 500, 852–859. [DOI] [PubMed] [Google Scholar]
- Wutz G, Várnai C, Nagasaka K, Cisneros DA, Stocsits RR, Tang W, Schoenfelder S, Jessberger G, Muhar M, Hossain MJ, et al. (2017). Topologically associating domains and chromatin loops depend on cohesin and are regulated by CTCF, WAPL, and PDS5 proteins. EMBO J. 36, 3573–3599. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xiang JF, Yin QF, Chen T, Zhang Y, Zhang XO, Wu Z, Zhang S, Wang HB, Ge J, Lu X, et al. (2014). Human colorectal cancer-specific CCAT1-L lncRNA regulates long-range chromatin interactions at the MYC locus. Cell Res. 24, 513–531. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xiao T, Wallace J, and Felsenfeld G. (2011). Specific sites in the C terminus of CTCF interact with the SA2 subunit of the cohesin complex and are required for cohesin-dependent insulation activity. Mol. Cell. Biol 31, 2174–2183. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang F, Deng X, Ma W, Berletch JB, Rabaia N, Wei G, Moore JM, Filippova GN, Xu J, Liu Y, et al. (2015). The lncRNA Firre anchors the inactive X chromosome to the nucleolus by binding CTCF and maintains H3K27me3 methylation. Genome Biol. 16, 52. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yao H, Brick K, Evrard Y, Xiao T, Camerini-Otero RD, and Felsenfeld G. (2010). Mediation of CTCF transcriptional insulation by DEAD-box RNA-binding protein p68 and steroid receptor RNA activator SRA. Genes Dev. 24, 2543–2555. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yin M, Wang J, Wang M, Li X, Zhang M, Wu Q, and Wang Y. (2017). Molecular mechanism of directional CTCF recognition of a diverse range of genomic sites. Cell Res. 27, 1365–1377. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yusufzai TM, Tagami H, Nakatani Y, and Felsenfeld G. (2004). CTCF tethers an insulator to subnuclear sites, suggesting shared insulator mechanisms across species. Mol. Cell 13, 291–298. [DOI] [PubMed] [Google Scholar]
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, and Liu XS (2008). Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zirkel A, Nikolic M, Sofiadis K, Mallm J-P, Brackley CA, Gothe H, Drechsel O, Becker C, Altmüller J, Josipovic N, et al. (2018). HMGB2 loss upon senescence entry disrupts genomic organization and induces CTCF clustering across cell types. Mol. Cell 70, 730–744.e6. [DOI] [PubMed] [Google Scholar]