Abstract
The CCCTC-binding factor (CTCF) works together with the cohesin complex to drive the formation of chromatin loops and topologically associating domains, but its role in gene regulation has not been fully defined. Here, we investigated the effects of acute CTCF loss on chromatin architecture and transcriptional programs in mouse embryonic stem cells undergoing differentiation to neural precursor cells. We identified CTCF-dependent enhancer-promoter contacts genome-wide and found that they disproportionately affect genes that are bound by CTCF at their promoters and depend on long-distance enhancers. Disruption of promoter-proximal CTCF binding reduced both long-range enhancer-promoter contacts and transcription, and both were restored by artificial tethering of CTCF to the promoter. Promoter-proximal CTCF binding is correlated with transcription of over 2,000 genes across a diverse set of adult tissues. Taken together, our study shows that CTCF binding to promoters may promote long-distance enhancer-dependent transcription at specific genes in diverse cell types.
Introduction:
Transcriptional regulation in mammalian cells is orchestrated by cis-regulatory elements that include promoters, enhancers, insulators and other less well characterized sequences1,2. Large-scale projects such as ENCODE have annotated millions of candidate cis-regulatory elements in the human genome and genomes of other mammalian species3–5. A majority of these candidate regulatory elements are located far from transcription start sites (i.e. promoters), display tissue and cell-type specific chromatin accessibility, and likely act as enhancers to regulate cell-type specific gene expression. Enhancers can activate genes at great genomic distances, making it difficult to predict their target genes from sequence information alone. Increasingly, maps of the chromatin topology are used to infer target genes of enhancers, based on the observations that enhancers are frequently positioned close to their target gene promoters in 3D space at the time of gene activation6. However, the exact role of chromatin topology in enhancer-dependent gene regulation remains to be clearly defined7,8.
The CCCTC-binding factor (CTCF) plays a critical role in chromatin architecture9–13. It works together with the cohesin complex to establish chromatin domains genome-wide, and forms long-range chromatin loops between CTCF binding sites (CBSs) via a mechanism involving loop extrusion14,15. CTCF has also been shown to be necessary for enhancer-promoter (E-P) contacts at specific loci, such as the protocadherin gene clusters, and for class switch recombination in B lymphocytes16–18. On the other hand, acute depletion of CTCF has been shown to result in only modest changes in gene expression profiles, despite the global loss of chromatin loops anchored at CBSs and the weakening of chromatin domain boundaries13,19. In addition, although CTCF is essential for embryonic development in multiple tissues20, a recent study reported a dispensable role for CTCF in immune cell differentiation21. To resolve this apparent discrepancy in the functional role of CTCF in dynamic gene regulation across cell types, a comprehensive analysis of CTCF-dependent E-P contacts during cell differentiation, and an exploration of the role of CTCF binding in establishing specific E-P contacts, are needed.
Here we use two different approaches to perturb chromatin topology at CBSs in mouse embryonic stem cells (mESCs), in order to define the role of CTCF-driven chromatin organization in gene regulation and cellular differentiation. First, we used an auxin-inducible degron22,23 to acutely deplete CTCF protein in a genetically engineered mouse ES cell line, and studied the changes in chromatin topology genome-wide in both undifferentiated ES cells and neural precursor cells (NPCs) derived from the CTCF-depleted ES cells. To identify promoter-anchored contacts at a resolution that conventional in situ Hi-C cannot achieve, and that may approach that of Micro-C24, we also performed a promoter-centric chromatin conformation capture assay, PLAC-seq (also known as HiChIP)25,26. We observed hundreds of lost and newly formed E-P and promoter-promoter (P-P) contacts at dysregulated genes, and found that removal of CTCF binding at the promoter reduces E-P and P-P contacts and gene expression, suggesting that CTCF binding at promoters plays an active role in the establishment of promoter-anchored contacts. In the second approach, we used CRISPR technology to artificially tether CTCF to a promoter. We demonstrated that targeted recruitment of CTCF to a promoter can re-establish long-range chromatin contacts between the promoter and distal elements and reactivate gene expression. Furthermore, we characterized the features of CTCF-dependent genes and found that the impact of CTCF loss on gene regulation is determined not only by CTCF binding at promoters but also by the distribution of nearby enhancers. The role of promoter-proximal CTCF binding in transcriptional regulation is further supported by the observation that over 2,300 mouse genes display a significant correlation between CTCF occupancy at the promoter and tissue-specific gene expression patterns.
Our findings therefore shed light on the mechanisms of CTCF dependency in gene regulation and provide direct evidence for a role of CTCF binding at promoters in the activation of genes located in regions sparse in active enhancers, a role distinct from its function at insulator sequences.
Results:
CTCF depletion impedes differentiation of mESCs towards neural precursor cells
To investigate the functional role of CTCF-driven chromatin loops in gene regulation, we used an auxin-inducible degron system to acutely deplete CTCF protein in mESCs and examined the impact of CTCF loss on gene expression and chromatin architecture during mESC differentiation to NPCs (Fig. 1a, Supplementary Table 1). CTCF depletion was verified by Western blotting and by ChIP-seq analysis, which showed that chromatin occupancy of CTCF was nearly completely lost in both ESCs and NPCs, along with loss of cohesin accumulation (Extended Data Fig. 1 and Supplementary Table 2). During neural differentiation, CTCF-depleted cells showed delayed formation of neuronal axons, with colonies retaining an ESC-like round shape; the cells reverted to normal NPC morphology after auxin was washed out (Fig. 1b).
We next investigated the impact of CTCF loss on gene expression using RNA-seq. Consistent with previous reports13, transcription of most genes was unaffected in CTCF-depleted ESCs, and gene expression profiles remained largely unaltered during cell differentiation (Fig. 1c). However, transcription of a few hundred genes (186 genes, 1.4%, in ESCs; 353 genes, 2.7%, in NPCs) decreased significantly upon CTCF loss (FDR < 0.05, fold change > 2) (Fig. 1d, Extended Data Fig. 2a, b, and Supplementary Table 3). Genes related to neural differentiation (e.g. Pcdh cluster genes, Pax6, Tubb3) were highly enriched among these CTCF-dependent genes (Extended Data Fig. 2c), consistent with the abnormal neural differentiation of CTCF-depleted ESCs described above.
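The thresholds used above to call CTCF-dependent genes (FDR < 0.05, fold change > 2) can be illustrated with a minimal Python sketch. It assumes that per-gene log2 fold changes (CTCF-depleted vs control) and raw p-values have already been produced by a differential-expression tool, which is not specified in this section, and it applies Benjamini-Hochberg adjustment followed by a fold-change cut. The function names and example gene values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_downregulated(genes, fdr=0.05, min_fold=2.0):
    """genes: dict name -> (log2_fold_change, p_value), depleted vs control.

    Returns sorted names with adjusted p < fdr and a decrease > min_fold.
    """
    names = list(genes)
    qvals = benjamini_hochberg([genes[n][1] for n in names])
    log2_cut = math.log2(min_fold)
    return sorted(n for n, q in zip(names, qvals)
                  if q < fdr and genes[n][0] < -log2_cut)
```

For example, with hypothetical inputs `{"Pax6": (-2.3, 1e-6), "Tubb3": (-1.4, 2e-4), "Actb": (-0.1, 0.6), "GeneX": (-3.0, 0.04)}`, only Pax6 and Tubb3 pass both filters: GeneX has a large fold change, but its adjusted p-value (about 0.053) exceeds the FDR cut-off.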
CTCF loss affects hundreds of E-P and P-P contacts at dysregulated genes in mESCs and NPCs
Previous studies have shown that CTCF plays an important role in the establishment of chromatin loops and domains in the mammalian genome9–13,27. To better understand how CTCF-dependent chromatin topology contributes to gene regulation, we first defined the changes in chromatin architecture resulting from CTCF depletion. To this end, we performed in situ Hi-C experiments in ESCs and NPCs, each before and after auxin-induced depletion of CTCF. Topologically associating domains (TADs) are DNA segments characterized by strong intra-domain interactions and relatively weak inter-domain interactions in Hi-C, and the strength of TAD boundaries can be quantified by the insulation score, the ratio between the number of cross-boundary interactions and the sum of intra-domain interactions within the two adjacent TADs28; a lower score therefore indicates stronger insulation. As shown in Extended Data Fig. 3, CTCF depletion resulted in substantial loss of chromatin contacts between convergent CBSs (genomic distance > 100 kb), supporting an essential role for CTCF in the formation of these chromatin organizational features (Extended Data Fig. 3a). We also observed significant weakening of TAD boundaries in both ESCs and NPCs (Extended Data Fig. 3b–e and Supplementary Table 4). These results are consistent with previous findings indicating a role for CTCF in the formation of most, if not all, TADs in mammalian cells13 (Extended Data Fig. 3g–i). On the other hand, some TAD boundaries remained relatively well preserved in CTCF-depleted cells. Transcription has been suggested to play a role in boundary formation9,29, and, consistent with this hypothesis, TAD boundaries overlapping housekeeping genes were generally more strongly insulated, with insulation scores that remained low, indicative of strong insulation, even after CTCF depletion. This observation suggests that CTCF is not the only factor that modulates domain insulation at TAD boundaries with highly active transcription (Extended Data Fig. 3d).
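As a concrete illustration of the insulation score as defined above (cross-boundary contacts divided by the summed intra-domain contacts of the two flanking TADs), the sketch below computes it from a small symmetric contact matrix. It is a simplified reading of the definition, assuming bin-level TAD coordinates and counting each intra-domain bin pair once; the implementation in ref. 28 may use sliding windows, normalization or a log transform instead.

```python
def insulation_score(matrix, left, right):
    """Ratio of cross-boundary contacts to intra-domain contacts.

    matrix: symmetric contact matrix (list of lists) at a fixed bin size.
    left, right: half-open (start, end) bin ranges of two adjacent TADs;
    left[1] == right[0] marks the boundary. Lower values = stronger insulation.
    """
    (a, b), (b2, c) = left, right
    if b != b2:
        raise ValueError("TADs must be adjacent")
    # Contacts between a bin in the left TAD and a bin in the right TAD.
    cross = sum(matrix[i][j] for i in range(a, b) for j in range(b, c))

    def intra(start, end):
        # Upper triangle including the diagonal, so each bin pair counts once.
        return sum(matrix[i][j] for i in range(start, end)
                   for j in range(i, end))

    return cross / (intra(a, b) + intra(b, c))
```

With two 2-bin TADs whose cross-boundary entries rise from 1 to 4 (for example after losing a boundary factor), the score rises from 4/56 to 16/56, i.e. insulation weakens.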
To more precisely define the changes in chromatin topology due to CTCF loss, we performed PLAC-seq (also known as HiChIP)25,26, which combines Hi-C with chromatin immunoprecipitation to interrogate chromatin contacts anchored at active or poised gene promoters at high resolution. We used antibodies against the histone modification H3K4me3, which marks active or poised promoters, to detect chromatin contacts centered on these genomic regions. We obtained between 300 and 400 million paired-end reads for each replicate (Supplementary Table 1). To determine differential chromatin contacts in ESCs and NPCs, we analyzed 11,900 gene promoters with similar levels of H3K4me3 ChIP-seq signal using a negative binomial model for each distance-stratified 10-kb interval (Supplementary Figure 1a, b, Methods). In total, we found 5,913 chromatin contacts between the promoters of 4,573 genes and distal elements (active enhancers or promoters) that were significantly induced during neural differentiation (FDR < 0.05), and 1,594 contacts centered on 1,294 genes that significantly decreased; most of these could not be identified in deeply sequenced Hi-C data29 (Fig. 2a, Supplementary Figure 1c, d). NPCs displayed a larger number of E-P and P-P contacts, spanning longer genomic distances, than mESCs (Fig. 2a, Supplementary Figure 1d, e, and Extended Data Fig. 4a). As expected, these dynamic changes in E-P and P-P contacts were positively correlated with changes in active histone modifications such as H3K27ac and H3K4me1 (Extended Data Fig. 4b). Our datasets also confirmed previously reported dynamic E-P contacts during ESC differentiation (e.g. Sox2, Dnmt3b)10,30 (Extended Data Fig. 4c).
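The distance-stratified modelling described above can be sketched as follows. This is a deliberately simplified stand-in, not the differential test used in this study (see Methods): contacts are grouped into 10-kb distance strata, a negative binomial is matched to each stratum's counts by the method of moments, and each contact is scored by the upper-tail probability of its count within its own stratum. The function names, the moment-based fit and the Poisson fallback are assumptions made for illustration.

```python
import math
from collections import defaultdict


def nb_upper_tail(k, mean, var):
    """P(X >= k) for a negative binomial matched to (mean, var) by moments.

    Falls back to a Poisson distribution when counts are not overdispersed.
    """
    if k <= 0:
        return 1.0
    if mean <= 0:
        return 0.0  # all-zero stratum: any positive count is off-distribution
    if var <= mean:
        def log_pmf(j):
            return j * math.log(mean) - mean - math.lgamma(j + 1)
    else:
        r = mean * mean / (var - mean)  # NB size (dispersion) parameter
        p = r / (r + mean)              # NB success probability
        def log_pmf(j):
            return (math.lgamma(j + r) - math.lgamma(r) - math.lgamma(j + 1)
                    + r * math.log(p) + j * math.log1p(-p))
    below = sum(math.exp(log_pmf(j)) for j in range(k))
    return max(0.0, 1.0 - below)


def stratified_pvalues(contacts, stratum_bp=10_000):
    """contacts: list of (distance_bp, count) for promoter-anchored pairs.

    Fits one distribution per distance stratum and returns, per contact,
    the upper-tail probability of its count within that stratum.
    """
    strata = defaultdict(list)
    for dist, count in contacts:
        strata[dist // stratum_bp].append(count)
    moments = {}
    for key, counts in strata.items():
        n = len(counts)
        mean = sum(counts) / n
        var = sum((c - mean) ** 2 for c in counts) / max(n - 1, 1)
        moments[key] = (mean, var)
    return [nb_upper_tail(count, *moments[dist // stratum_bp])
            for dist, count in contacts]
```

Stratifying by distance matters because contact frequency decays steeply with genomic separation; comparing a 5-kb and a 300-kb contact against one shared background would mostly recover that decay rather than specific contacts.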
Analysis of the relationship between dynamic promoter-anchored contacts and transcription is complicated by the fact that many genes had multiple E-P contacts, and changes in individual chromatin contacts were not always positively correlated with differential gene expression (Extended Data Fig. 4c–e). We therefore devised an Active-Inactive-Contact (AIC) value, which uses contact counts alone to combine a gene's multiple E-P and P-P contacts into a single value (Extended Data Fig. 4f, Methods). This value was positively correlated with gene expression changes (Extended Data Fig. 4g, h). Interestingly, we also observed a large number of promoter-anchored contacts even at inactive genes (Extended Data Fig. 4i) and identified over a thousand chromatin contacts with distal elements marked by the repressive histone modification H3K27me3 (Extended Data Fig. 4j, k). Taken together, these results support the utility of our datasets for analyzing individual E-P and P-P contacts29.
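Collapsing several contacts per gene into one value, as the AIC does, can be illustrated with a hypothetical aggregation. The exact AIC formula is defined in Methods and is not reproduced here; this sketch simply sums contact counts over all E-P and P-P contacts of a gene in each condition and takes a pseudocount-stabilized log2 ratio, which could then be compared with expression changes. Function and gene names are hypothetical.

```python
import math
from collections import defaultdict


def gene_contact_change(contacts, pseudocount=1.0):
    """Collapse all promoter-anchored contacts of a gene into one log2 change.

    Hypothetical stand-in, not the AIC formula itself.
    contacts: iterable of (gene, count_condition_a, count_condition_b),
    one entry per E-P or P-P contact anchored at that gene's promoter.
    Returns gene -> log2((sum_b + pseudocount) / (sum_a + pseudocount)).
    """
    total_a = defaultdict(float)
    total_b = defaultdict(float)
    for gene, count_a, count_b in contacts:
        total_a[gene] += count_a
        total_b[gene] += count_b
    return {gene: math.log2((total_b[gene] + pseudocount)
                            / (total_a[gene] + pseudocount))
            for gene in total_a}
```

A gene with one strongly lost and one weakly gained contact then gets a single net value, which avoids the ambiguity of correlating each contact separately with the same expression change.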
Using the same approach, we determined the chromatin contacts dependent on CTCF in ESCs and NPCs by comparing PLAC-seq data from cells before and after CTCF depletion. Chromatin contacts between convergent CBSs were severely reduced upon CTCF loss (Extended Data Fig. 5a, b), consistent with the Hi-C results. Surprisingly, the majority of chromatin contacts between enhancers and promoters remained unchanged despite the global weakening of TADs and disruption of chromatin loops. Contacts for only 394 and 806 E-P or P-P pairs in ESCs and NPCs, respectively, decreased significantly upon CTCF loss (FDR < 0.05), while contacts for 44 and 109 pairs in ESCs and NPCs, respectively, increased (Fig. 2b, Supplementary Table 5, Supplementary Figure 1c). Consistent with the potential of some promoters to act as enhancers of other genes31,32 (Extended Data Fig. 4h), disruption of P-P contacts upon CTCF loss was accompanied by down-regulation of gene expression (Extended Data Fig. 5d, e), although the expression of the two genes in each pair did not always change in the same direction upon loss of contact (Extended Data Fig. 5e). In terms of genomic distance, CTCF-dependent E-P and P-P contacts in differentiated NPCs generally spanned longer distances than those in undifferentiated ESCs (Extended Data Fig. 5c), consistent with the increase in long-range promoter-anchored contacts in differentiated NPCs (Fig. 2a). Most importantly, only 283 of the 5,913 E-P and P-P contacts normally induced during differentiation (about 5%) failed to be induced in the absence of CTCF (Fig. 2c). This observation provides an explanation for the mild changes in gene expression profiles upon CTCF depletion, and suggests CTCF/cohesin-independent mechanisms of enhancer-promoter contact formation, as recently reported33.
CTCF binding at gene promoters drives CTCF-dependent E-P and P-P contacts
We next investigated the features of CTCF-dependent E-P and P-P contacts and genes. Because ChIP-seq levels of histone modifications (H3K27ac, H3K4me1, and H3K4me3) were virtually unaffected by CTCF depletion, the observed changes in chromatin contacts in ESCs and NPCs were likely a direct consequence of CTCF loss (Extended Data Fig. 6). We first categorized the CTCF-dependent reduced chromatin contacts based on the presence of CBSs at their anchors. The majority of CTCF-dependent E-P and P-P contacts were anchored by CBSs at promoter regions (81% in ESCs, 64% in NPCs), although they were not always anchored by convergent CTCF sites at both anchors (Fig. 2d, Extended Data Fig. 7a). Importantly, most of these CTCF-dependent E-P and P-P contacts were located within CTCF-CTCF loops or at loop anchors, suggesting that they might be physically supported by the surrounding CTCF-CTCF loops. In line with these results, CTCF-dependent down-regulated genes were strongly enriched for CBSs at their promoters, but not at their distal elements (Fig. 2e, Extended Data Fig. 7c–f). Furthermore, the enrichment of CTCF-dependent E-P and P-P contacts increased with the number of CBSs around the anchors (Extended Data Fig. 7b), and a larger number of CBSs was also observed around the promoters of CTCF-dependent genes (Fig. 2f). By contrast, the anchors of E-P and P-P contacts induced upon CTCF loss were not enriched for CBSs (Extended Data Fig. 7a, b), and a higher fraction of induced chromatin contacts crossed multiple CTCF sites, such as TAD boundary regions, suggesting that CTCF depletion leads to newly formed contacts between enhancers and promoters that were formerly insulated by CTCF binding (Extended Data Fig. 7g–i).
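The categorization of contacts by CBS presence at their anchors amounts to an interval-overlap check. The sketch below assumes 10-kb anchor bins, CBSs given as ChIP-seq summit positions on a single chromosome, and category labels chosen for illustration; the exact anchor definitions used for Fig. 2d are given in Methods and may differ.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical category labels keyed by (CBS at promoter anchor, CBS at distal anchor).
LABELS = {(True, True): "both", (True, False): "promoter only",
          (False, True): "distal only", (False, False): "none"}


def has_cbs(cbs_sorted, start, end):
    """True if any CBS summit in the sorted list lies in [start, end)."""
    i = bisect_left(cbs_sorted, start)
    return i < len(cbs_sorted) and cbs_sorted[i] < end


def classify_contacts(contacts, cbs_positions, anchor_bin=10_000):
    """contacts: list of (promoter_pos, distal_pos) in bp on one chromosome.

    Each anchor is the fixed-size bin containing the given position.
    Returns a Counter of contacts per CBS-anchoring category.
    """
    cbs = sorted(cbs_positions)
    tally = Counter()
    for promoter, distal in contacts:
        p_start = promoter // anchor_bin * anchor_bin
        d_start = distal // anchor_bin * anchor_bin
        key = (has_cbs(cbs, p_start, p_start + anchor_bin),
               has_cbs(cbs, d_start, d_start + anchor_bin))
        tally[LABELS[key]] += 1
    return tally
```

Using a sorted list with `bisect` keeps each overlap query logarithmic in the number of CBSs, which matters when tens of thousands of CBSs are tested against thousands of contacts.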
Taken together, CTCF dependency in gene regulation was strongly associated with the enrichment of CBSs around promoters, suggesting that CTCF binds directly to promoters to promote long-range E-P and P-P contacts and thereby controls transcription of a select set of genes.
Artificial tethering of CTCF to a gene promoter facilitates distal element dependent transcription
To delineate the causal relationship between CTCF-mediated long-range chromatin contacts anchored at a promoter and distal element-dependent transcription, we next focused on the Vcan gene, which encodes a protein that plays an important role in axonal outgrowth34 and neural differentiation35. Vcan is induced during NPC differentiation, and this induction is lost upon CTCF depletion together with loss of a long-range (~350 kb) P-P contact anchored by a CBS only on the Vcan promoter side, suggesting that Vcan may be regulated by a CTCF-dependent P-P contact. We used CRISPR-mediated genome editing to delete a 118-bp sequence containing the CTCF binding motif at the Vcan promoter. Upon removal of the CTCF binding sequence, Vcan expression was significantly reduced. This reduction could be partially rescued by tethering CTCF to the mutated Vcan promoter using a dCas9-CTCF fusion and a guide RNA (gRNA) targeting a sequence adjacent to the deleted CTCF binding sequence, in two independent experiments using distinct gRNAs (Fig. 3a, b, Extended Data Fig. 8a–c). Rescue of Vcan expression by the tethered CTCF also depended on the presence of the Xrcc4/Tmem167 promoter located 350 kb downstream of Vcan (Fig. 3a, b, the cell line depicted second from the bottom). PLAC-seq experiments showed that Vcan promoter-anchored contacts within the TAD were significantly reduced upon removal of the promoter-proximal CTCF binding site (WT vs CTCF motif del in Fig. 3c, d), although the change was less severe than that observed after global loss of CTCF (auxin - vs auxin + in Fig. 3c, d). Similarly, artificial tethering of CTCF to the Vcan promoter restored intra-TAD contacts from the Vcan promoter (dCas9-CTCF vs dCas9 alone in Fig. 3c, d) and, importantly, also re-established the long-range chromatin contact between the Vcan promoter and the distal Xrcc4/Tmem167 promoter 350 kb downstream (Fig. 3e, f, Extended Data Fig. 8d).
Taken together, our results demonstrate that promoter-proximal CTCF binding can promote long-range promoter-anchored chromatin contacts and facilitate gene activation driven by distal enhancers. Polymer modelling based on the strings and binders switch (SBS) model also supports such changes in chromatin contacts upon CTCF depletion36,37, while providing information on the underlying 3D spatial organization of chromatin around the Vcan gene at the single-molecule level (Extended Data Fig. 8e–g). Finally, it is interesting to note that the distal element driving Vcan transcription was itself a gene promoter; deleting this distal promoter confirmed that it is required for activation of Vcan transcription (Fig. 3a, b, the cell line depicted at the bottom).
Promoter-proximal CTCF-regulated genes tend to reside in enhancer sparse regions
Having found that genes affected by CTCF loss tend to be occupied by CTCF at their promoters, we next examined the distribution of active enhancers around CTCF-dependent and CTCF-independent genes in ESCs and NPCs. CTCF-independent genes (Fig. 1d) were generally close to active enhancers, especially in NPCs (Extended Data Fig. 9a, b), and appeared to be regulated by short-range interactions (≤ 50 kb, PLAC-seq peak signal p-value < 0.01) (Fig. 4a), implying that they depend on short-range E-P and P-P contacts formed independently of CTCF (Extended Data Fig. 9c, d). By contrast, CTCF-dependent genes were generally regulated by long-range E-P and P-P contacts (≥ 100 kb, PLAC-seq peak signal p-value < 0.01), especially in NPCs (Fig. 4b, Extended Data Fig. 9e). Similarly, genes up-regulated upon CTCF depletion differed from down-regulated genes in the number of active enhancers around them. Down-regulated genes tended to be located in enhancer-sparse regions, defined as having two or fewer enhancers within 200 kb of the transcription start site (TSS) (Fig. 4b), whereas up-regulated genes were close to multiple enhancers (Fig. 4b, Extended Data Fig. 9f, and Supplementary Figure 2 for examples). These results suggest that CTCF-dependent genes are generally dependent on distal elements. To test this further, we examined the overlap with genes whose regulation depends on Mll3 and Mll4, the major H3K4 monomethyltransferases at distal enhancers38,39. As expected, the overlap between Mll3/4-dependent genes (those differentially expressed in Mll3/4 double-knockout cells40) and CTCF-dependent genes in NPCs was highly significant (p-value = 3.6e-42, odds ratio = 3.8) (Extended Data Fig. 9g). Lastly, we addressed the variable impact of CTCF loss on genes bound by CTCF at their promoters, because hundreds of genes were not affected by CTCF loss despite CTCF binding at their promoters in ESCs and NPCs (Fig. 2e).
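The significance of an overlap between two gene sets, such as the Mll3/4-dependent and CTCF-dependent genes above, is commonly assessed with a one-sided Fisher's exact (hypergeometric) test on a 2×2 table. The sketch below shows that calculation using only the standard library; the test actually used for Extended Data Fig. 9g is not stated in this section, and this code does not reproduce the reported values (p = 3.6e-42, odds ratio = 3.8).

```python
from math import comb


def overlap_test(set_a, set_b, universe):
    """One-sided Fisher's exact test for over-representation of set overlap.

    Returns (sample odds ratio, P(overlap >= observed)) under a
    hypergeometric null with both sets drawn from the same gene universe.
    """
    a, b = set(set_a) & universe, set(set_b) & universe
    n_total, n_a, n_b = len(universe), len(a), len(b)
    both = len(a & b)
    # 2x2 table: [[both, a only], [b only, neither]]
    a_only, b_only = n_a - both, n_b - both
    neither = n_total - both - a_only - b_only
    if a_only * b_only:
        odds = (both * neither) / (a_only * b_only)
    else:
        odds = float("inf")
    # Upper tail of Hypergeometric(population=n_total, successes=n_a, draws=n_b).
    tail = sum(comb(n_a, x) * comb(n_total - n_a, n_b - x)
               for x in range(both, min(n_a, n_b) + 1))
    return odds, tail / comb(n_total, n_b)
```

Restricting both sets to a shared universe (for example, all expressed genes) is important: an inflated universe makes almost any overlap look significant.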
We classified the genes occupied by CTCF at their promoters (TSS ±1 kb) into two groups based on the number of enhancers around the gene and the distance of their E-P contacts (Fig. 4c, left panel). When genes were located in enhancer-dense regions (7 or more enhancers within TSS ± 200 kb) and displayed short-range chromatin contacts with distal elements (< 50 kb, PLAC-seq peak signal p-value < 0.01), the reduction in gene expression upon CTCF loss was moderate despite CTCF binding at their promoters. Taken together, CTCF dependency in gene regulation was determined not only by the distribution of CBSs but also by the distribution of active elements around promoters; importantly, these results suggest that multiple active enhancers near a gene might compensate for the loss of CTCF at its promoter.
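The enhancer-density thresholds used in this section (two or fewer enhancers counted as sparse, seven or more as dense) can be applied with a simple window count around each TSS. This sketch assumes that both thresholds refer to the same TSS ± 200 kb window, that enhancer positions are single coordinates on the TSS's chromosome, and that genes in between are labelled "intermediate"; these choices are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def enhancer_context(tss, enhancer_positions, window=200_000):
    """Count active enhancers within TSS +/- window and label the gene.

    enhancer_positions: sorted list of enhancer coordinates (bp) on the
    same chromosome as tss. Returns (count, label).
    """
    lo = bisect_left(enhancer_positions, tss - window)
    hi = bisect_right(enhancer_positions, tss + window)
    n = hi - lo
    if n <= 2:
        label = "sparse"
    elif n >= 7:
        label = "dense"
    else:
        label = "intermediate"  # hypothetical label for the middle range
    return n, label
```

Combined with the range of each gene's PLAC-seq contacts, such labels would allow grouping promoter-bound genes into the enhancer-dense, short-range group and the enhancer-sparse, long-range group discussed above.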
Promoter occupancy by CTCF correlates with tissue-restricted expression of over 2,000 genes
The above findings suggest a previously under-appreciated mechanism for CTCF in gene regulation. In addition to its well-established role in forming chromatin loops, TAD boundaries and insulators, we demonstrated that CTCF also binds directly to gene promoters to promote long-range E-P (and P-P) contacts and enable enhancer-dependent gene expression in mammalian cells. In mouse ESCs and NPCs, several hundred genes are subject to regulation by this mechanism. These include the protocadherin gene clusters, which were previously reported to be regulated by CTCF binding at their promoters and at a distal enhancer16 (Supplementary Figure 2a). To further explore the extent of genes subject to this CTCF-dependent mechanism, we examined public CTCF ChIP-seq and RNA-seq datasets across multiple mouse tissues (9 tissue samples from ENCODE4,41, Supplementary Table 6). Consistent with this postulated mechanism, multiple CBSs with relatively conserved binding motif sequences are enriched around promoters (Fig. 5a, Extended Data Fig. 10a–c), and CTCF ChIP-seq signals around promoters (TSS ±10 kb) show a positive correlation with gene expression for over 2,300 mouse genes across these tissues (r > 0.6, 2,332 genes), many of which could not be explained by DNA methylation levels at the promoter-proximal CBSs (Fig. 5b, Extended Data Fig. 10d). Interestingly, genes with high lineage specificity of expression, as measured by the Shannon entropy of each gene's expression distribution across tissues, an information-theoretic measure of transcriptome diversity and specificity42, were predominantly forebrain-specific, and the most enriched gene ontology (GO) term in this group was “synapse assembly”. By contrast, GO terms related to “signaling pathway” were enriched among the other tissue-specific gene groups (Fig. 5c, Extended Data Fig. 10e, f). Many forebrain-specific and heart-specific genes were down-regulated in CTCF-depleted NPCs and CTCF knockout heart tissue43, respectively (Extended Data Fig. 10g, h), supporting the idea that many of these genes are indeed regulated by CTCF binding to their promoters in a lineage-specific manner.
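The two cross-tissue statistics used above, the correlation between promoter CTCF signal and expression (r > 0.6) and the entropy-based tissue specificity, can be sketched as follows. The sketch assumes that per-tissue promoter CTCF signal (TSS ±10 kb) and expression values are already summarized per gene in a fixed tissue order; the entropy shown is plain Shannon entropy in bits, and ref. 42 may use a related but differently normalized specificity index. All names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def shannon_entropy(expression):
    """Entropy (bits) of a gene's expression distribution across tissues.

    Low entropy means expression is concentrated in few tissues.
    """
    total = sum(expression)
    probs = [e / total for e in expression if e > 0]
    return -sum(p * math.log2(p) for p in probs)


def ctcf_correlated_genes(ctcf_signal, expression, r_min=0.6):
    """Both arguments: gene -> list of per-tissue values (same tissue order)."""
    return sorted(g for g in ctcf_signal
                  if g in expression
                  and pearson(ctcf_signal[g], expression[g]) > r_min)
```

With nine tissues, entropy ranges from 0 bits (expressed in a single tissue) to log2(9), about 3.17 bits (uniform expression), so forebrain-specific genes would sit near the low end.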
Discussion:
CTCF- and cohesin-mediated chromatin structures such as TADs and CTCF loops have been postulated to play a role in constraining enhancer-promoter communication9–12,27. However, the vast majority of genes are expressed normally in the absence of CTCF or cohesin13,19, raising questions about the role of chromatin architecture, especially E-P contacts, in gene regulation. Here, we investigated the mechanisms of CTCF-dependent and -independent gene regulation in different cell types, and provided multiple lines of evidence that CTCF not only actively forms TADs and CTCF loops, but also directly promotes E-P and P-P contacts and potentially contributes to the activation of thousands of lineage-specific genes. CTCF binding to the promoters of such genes is necessary for establishing their E-P and P-P contacts, and we demonstrated that artificial tethering of CTCF to a gene promoter could promote P-P contacts and gene activation. The active function of promoter-proximal CTCF in promoting long-range E-P (and P-P) contacts may affect different subsets of genes in different cell types, owing to the variable distribution of enhancers and CBSs. Consistent with this, CTCF loss leads to variable phenotypes in different cell types. For example, CTCF knockout leads to severe developmental defects in many tissues20, and our study shows abnormal differentiation of ESCs into NPCs in the absence of CTCF, whereas CTCF is dispensable for immune cell transdifferentiation, with minor effects on transcription comparable to our findings21. It is also reasonable to assume that some specific long-range E-P or P-P contacts that potentially determine lineage specificity require the robust structure of long-range CTCF loops, including TADs44.
In other words, CTCF-mediated structures might help promoter-proximal CTCF to reel distal enhancers toward the target gene promoter, possibly via loop extrusion predominantly from one side, as we observed significant changes in contact frequency inside the TAD in the rescue experiments at the Vcan gene. Therefore, the orientation of the CTCF binding motif at promoters may be important in determining the predominant direction in which distal enhancers are searched (Extended Data Fig. 10c). In our dCas9-CTCF tethering experiments, we designed gRNAs targeting the top and bottom strands separately, and both gRNAs restored the chromatin loops and gene expression. This may reflect a flexible orientation of the dCas9-CTCF fusion protein, which is recruited to DNA through dCas9 rather than binding its motif directly. Detailed analysis of the CTCF protein structure and its binding orientation should help clarify this mechanism.
Our study also revealed that the majority of individual E-P and P-P contacts are not affected by CTCF loss, which could explain the modest changes in gene expression profiles upon CTCF loss. Although CTCF occupancy might not contribute to the formation of the majority of E-P contacts45, their dynamics during cell differentiation were associated with enhancer activity29. In addition to enhancer activity itself, it can be assumed that these CTCF-independent E-P contacts are mediated by other factors. In NPCs, neuronal transcription factors such as Pax6 contribute to chromatin folding, which is compatible with the highly variable changes in E-P contacts upon neural differentiation29. Besides these lineage-specific factors, several chromatin folding proteins are more broadly expressed across tissues. For example, Yin Yang 1 (YY1) has been shown to regulate E-P contacts in mouse embryonic stem cells46,47. Another genomic interaction mediator, LIM-domain-binding protein 1 (LDB1), is also known to control long-range and trans interactions48,49 that regulate specific gene sets such as olfactory receptor genes49 and genes involved in cardiogenesis48. Besides E-P contacts, P-P contacts are also considered a key factor in gene regulatory networks, because many promoters may have enhancer-like activity31,32 and promote cooperative regulation between genes31,50. However, the factors that mediate chromatin folding between promoters, and how CTCF binding affects such P-P contacts and gene regulation, have not been elucidated. Our study identified hundreds of CTCF-dependent P-P contacts in ESCs and NPCs, and demonstrated that CTCF binding at a promoter can promote E-P and P-P contacts and activate gene expression. The synchronous activity between interacting promoters is also broadly consistent with the phase separation model51, and further work should elucidate the relationship between CTCF-mediated genome structure and the phase-separated transcriptional machinery.
It should be noted that the PLAC-seq datasets in this study are subject to an H3K4me3 antibody bias between E-P and P-P contacts, because both anchors of a P-P contact carry H3K4me3; therefore, they cannot determine which type of contact is predominantly affected by CTCF loss.
Lastly, we showed a link between CTCF binding and lineage-specific gene expression, yet the factor(s) regulating tissue-specific CTCF binding at promoters remain unclear. DNA methylation is the best-studied determinant of CTCF binding52,53. CTCF occupancy is inhibited by DNA methylation, and tissue-specific DNA methylation dynamics might affect tissue-specific CTCF binding at promoters. For example, as reported in a recent study47 and observed in our datasets, global loss of CTCF binding occurred during differentiation from ESCs to NPCs (Extended Data Fig. 1d), consistent with reports that pluripotency in ESCs is associated with global DNA demethylation54,55. On the other hand, a recent study showed that the vast majority of CTCF binding was not affected by a drastic reduction in DNA methylation following double knockout of the major DNA methyltransferases DNMT1 and DNMT3B56. Our analysis also showed a lack of correlation between DNA methylation and the majority of tissue-specific promoter-proximal CTCF binding. It is possible that 5-carboxylcytosine57 or cooperative binding with as-yet-unidentified factors plays a role in tissue-specific CTCF binding.
Our study only surveys the impact of CTCF loss after a fixed period of CTCF depletion, and some genes might be affected by secondary effects of the abnormal differentiation. Nevertheless, this study reveals genome-wide alterations of E-P and P-P contacts upon CTCF loss during neural differentiation and provides new insight into the functional role of CTCF in gene regulation. CTCF has been implicated in a variety of human diseases. CBSs have been reported to be highly mutated in several cancer types58,59, and somatic CTCF mutations occur in about one-quarter of endometrial carcinomas60. Thus, further study of the mechanisms of CTCF-mediated gene regulation in various cell types could help to elucidate the causes of these diseases.
Methods:
Cell lines
The F1 Mus musculus castaneus × S129/SvJae mouse ES cells (XY; F123 cells)61 (a gift from Rudolf Jaenisch) were cultured in mouse ES cell medium containing KnockOut Serum Replacement: 85% DMEM, 15% KnockOut Serum Replacement (Gibco), penicillin/streptomycin (Gibco), 1× non-essential amino acids (Gibco), 1× GlutaMax (Gibco), 1000 U/ml LIF (Millipore) and 0.4 mM β-mercaptoethanol. The cells were typically grown on 0.2% gelatin-coated plates with irradiated mouse embryonic fibroblasts (MEFs) (GlobalStem). Cells were passaged using Accutase (Innovative Cell Technologies) onto 0.2% gelatin-coated dishes (GENTAUR) and maintained at 37°C and 5% CO2. Medium was changed daily when cells were not passaged. Cells were tested for mycoplasma infection and found to be negative.
Construction of the plasmids
The CRISPR/Cas9 plasmid (CTCF-mouse-3sgRNA-CRISPRexp-AID) was assembled using the Multiplex CRISPR/Cas9 Assembly System kit (a gift from Takashi Yamamoto, Addgene kit #1000000055). Oligonucleotides for three gRNA templates were synthesized, annealed and introduced into the corresponding intermediate vectors. The first gRNA matches the genome sequence 23 bp upstream of the stop codon of mouse CTCF; for this gRNA, the oligonucleotides (5’-CACCGTGATCCTCAGCATGATGGAC-3’) and (5’-AAACGTCCATCATGCTGAGGATCAC-3’) were annealed. The other two gRNAs direct in vivo linearization of the donor vector: the first pair of oligonucleotides is (5’-CACCGCTGAGGATCATCTCAGGGGC-3’) and (5’-AAACGCCCCTGAGATGATCCTCAGC-3’); the second pair is (5’-CACCGATGCTGGGGCCTTGCTGGC-3’) and (5’-AAACGCCAGCAAGGCCCCAGCATC-3’). The three gRNA-expressing cassettes were combined into a single plasmid using Golden Gate assembly. The donor vector (mCTCF24-AID-donor-Neo) was constructed using PCR and the Gibson Assembly Cloning kit (New England Biolabs). The insert cassette includes sequences that code for a 5GA linker, the auxin-induced degron (AID), a T2A peptide and the neomycin resistance marker, and is flanked by 24-bp homology arms for integration into the CTCF locus. The left and right arms have the sequences CCTGAGATGATCCTCAGCATGATG and GACCGGTGATGCTGGGGCCTTGCT, respectively. The AID coding sequence was amplified from pcDNA5-H2B-AID-EYFP (a gift from Don Cleveland, Addgene plasmid #47329) and T2A-NeoR was amplified from pAC95-pmax-dCas9VP160–2A-neo (a gift from Rudolf Jaenisch, Addgene plasmid #48227). The sequence for the 5GA linker was included in one of the primers. The original donor backbone was a gift from Dr. Ken-ichi T. Suzuki from Hiroshima University, Hiroshima, Japan.
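The three oligonucleotide pairs listed above are consistent with a common convention for cloning a 20-nt protospacer into a gRNA expression vector: the top strand carries a CACC overhang and the bottom strand an AAAC overhang, with an extra G (commonly added for U6-driven transcription initiation) and a matching 3' C on the bottom strand when the protospacer does not already begin with G. The sketch below reconstructs the listed oligos from their protospacers; the protospacers are inferred from the oligos, and whether the kit protocol prescribes exactly this rule is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def grna_oligos(protospacer):
    """Return (top, bottom) annealing oligos for a 20-nt protospacer.

    Adds CACC/AAAC overhangs; prepends G (and appends a complementary C to
    the bottom oligo) only when the protospacer does not start with G.
    """
    if len(protospacer) != 20:
        raise ValueError("expected a 20-nt protospacer")
    extra_g = "" if protospacer.startswith("G") else "G"
    top = "CACC" + extra_g + protospacer
    bottom = "AAAC" + revcomp(protospacer) + ("C" if extra_g else "")
    return top, bottom
```

Applied to the protospacers TGATCCTCAGCATGATGGAC, CTGAGGATCATCTCAGGGGC and GATGCTGGGGCCTTGCTGGC, this yields exactly the three oligo pairs given in the text, which is a quick way to catch transcription errors when ordering oligos.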
The donor vector encodes the following amino acid sequence that corresponds to the 24-bp left homology arm of CTCF, a 5GA linker, AID, T2A, and NeoR: PEMILSMMGAGAGAGAGAGSVELNLRETELCLGLPGGDTVAPVTGNKRGFSETVDLKLNLNNEPANKEGSTTHDVVTFDSKEKSACPKDPAKPPAKAQVVGWPPVRSYRKNVMVSCQKSSGGPEAAAFVKVSMDGAPYLRKIDLRMYKSYDELSNALSNMFSSFTMGKHGGEEGMIDFMNERKLMDLVNSWDYVPSYEDKDGDWMLVGDVPWPMFVDTCKRLRLMKGSDAIGLAPRAMEKCKSRAGSGEGRGSLLTCGDVEENPGPRLETRMGSAIEQDGLHAGSPAAWVERLFGYDWAQQTIGCSDAAVFRLSAQGRPVLFVKTDLSGALNELQDEAARLSWLATTGVPCAAVLDVVTEAGRDWLLLGEVPGQDLLSSHLAPAEKVSIMADAMRRLHTLDPATCPFDHQAKHRIERARTRMEAGLVDQDDLDEEHQGLAPAELFARLKARMPDGEDLVVTHGDACLPNIMVENGRFSGFIDCGRLGVADRYQDIALATRDIAEELGGEWADRFLVLYGIAAPDSQRIAFYRLLDEFF*.
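The first eight residues of this sequence can be checked against the 24-bp left homology arm given above by in silico translation. The sketch below builds the standard genetic code table and translates the arm; it is an illustrative check under the assumption that the arm is in frame with the CTCF coding sequence, not part of the original cloning workflow.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons of an in-frame DNA sequence ('*' marks a stop codon)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, usable, 3))


left_arm = "CCTGAGATGATCCTCAGCATGATG"
protein = "PEMILSMMGAGAGAGAGAGSVELN"  # first residues of the encoded sequence above
print(translate(left_arm))  # PEMILSMM
print(protein.startswith(translate(left_arm)))  # True
```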
The lentiviral vector for expressing TIR1 (Lentiv4-EFsp-Puro-2A-TIR1–9Myc) was constructed using PCR and the Gibson Assembly Cloning Kit (New England Biolabs). The backbone was modified from lentiCRISPR v2 (a gift from Feng Zhang, Addgene plasmid #52961), and the TIR1–9myc fragment was amplified from pBabe TIR1–9myc (a gift from Don Cleveland, Addgene plasmid #47328). The expression cassette encodes a puromycin resistance marker followed by a P2A peptide and the TIR1–9myc protein, and its expression is driven by the EFS promoter of the original lentiCRISPR v2. Maps and sequences of the plasmids are available at the following URLs: CTCF-mouse-3sgRNA-CRISPRexp-AID (https://benchling.com/s/seq-1R4nJ8quYptUqerRWSdX), mCTCF24-AID-donor-Neo (https://benchling.com/s/seq-LtJu9OTscKJNCEMOk8ok), Lentiv4-EFsp-Puro-2A-TIR1–9Myc (https://benchling.com/s/seq-6wSCsW3Kr9S1igXZ8H9K).
Transfection and establishment of CTCF-AID knock-in mESC clones
The cells were passaged once onto 0.2% gelatin-coated, feeder-free plates before transfection. The cells were transfected with 10 μg of the CRISPR plasmid and 5 μg of the donor plasmid using the Mouse ES Cell Nucleofector Kit (Lonza) and an Amaxa Nucleofector (Lonza) according to the manufacturer’s instructions. After transfection, the cells were plated on drug-resistant MEFs (GlobalStem). Two days after transfection, drug selection was started by adding 160 μg/ml G418 (Geneticin, Gibco) to the medium. Drug-resistant colonies were isolated, and clones carrying the AID knock-in on both alleles were identified by PCR of genomic DNA using primers flanking the 3’ end of the CTCF coding sequence (AAATGTTAAAGTGGAGGCCTGTGAG and AAGATTTGGGCCGTTTAAACACAGC). The sequence at the CTCF-AID junction on both alleles was verified by sequencing allele-specific PCR products, which were generated using either a CTCF-129-specific (CTGACTTGGGCATCACTGCTG) or a CTCF-Cast-specific (GTTTTGTTTCTGTTGACTTAGGCATCACTGTTA) forward primer and a reverse primer in the AID coding sequence (GAGGTTTGGCTGGATCTTTAGGACA). Expression of the CTCF-AID fusion protein was confirmed by Western blot with an anti-CTCF antibody (Millipore, 07–729), which showed the expected shift in molecular weight relative to control cells.
Lentivirus production and infection
We produced lentivirus for expressing TIR1–9myc using the Lenti-X Packaging Single Shots system (Clontech) and infected the CTCF-AID knock-in mESCs according to the manufacturer’s instructions. After infection, cells were selected with 1 μg/ml puromycin. Drug-resistant colonies were isolated, and expression of TIR1–9myc was confirmed by Western blot using an anti-Myc antibody (Santa Cruz, sc-40). Clones expressing high levels of TIR1–9myc were used for subsequent experiments.
Preparation of CTCF-depleted cells and neural progenitor cell differentiation
The CTCF-AID knock-in mESCs expressing TIR1–9myc were passaged onto 0.2% gelatin-coated plates without MEFs. To deplete CTCF, we added 1 μl of 500 mM auxin (Abcam, ab146403) per 1 ml of medium and replaced the auxin-containing medium every 24 hours. Cells were harvested 24, 48 or 96 hours after the start of auxin treatment. For NPC differentiation62,63, the CTCF-AID knock-in mESCs were grown on MEFs and passaged onto 0.2% gelatin-coated plates without MEFs one day before the start of differentiation. The cells were plated sparsely so that they would not need to be passaged during neural differentiation, because most cells failed to attach to new plates after auxin treatment. On day 0, auxin was added to the CTCF-depleted samples, and LIF was withdrawn from the culture medium 6 hours later. From day 1, 5 μM retinoic acid (Sigma, R2625) was added to the LIF-free medium, and auxin was maintained continuously in the CTCF-depleted samples. Cells were harvested on days 2, 4 and 6. For auxin-washout samples, auxin treatment was stopped on day 4 or day 6 and differentiation was continued for another 2 days. Alkaline phosphatase staining was performed at each time point using the AP Staining Kit II (Stemgent, 00–0055).
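The auxin working concentration follows from a simple dilution (C1V1 = C2V2). The sketch below assumes the 1 μl of stock is added on top of 1 ml of medium, giving a final concentration of roughly 500 μM (an approximately 1:1,000 dilution).

```python
def final_concentration(stock_conc: float, stock_vol: float, medium_vol: float) -> float:
    """Final concentration after adding stock_vol of stock to medium_vol of medium (C1*V1 = C2*V2)."""
    return stock_conc * stock_vol / (stock_vol + medium_vol)


# 1 ul of 500 mM (= 500,000 uM) auxin added to 1 ml (1,000 ul) of medium
auxin_uM = final_concentration(500_000.0, 1.0, 1_000.0)
print(f"{auxin_uM:.1f} uM")  # 499.5 uM
```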
Antibodies
Antibodies used in this study were rabbit anti-CTCF (Millipore, 07–729, for western blotting), rabbit anti-Histone H3 (Abcam, ab1791, for western blotting), rabbit anti-CTCF (Active Motif, 61311, for microChIP-seq), rabbit anti-Rad21 (Santa Cruz, sc-98784, for microChIP-seq), rabbit anti-H3K4me1 (Abcam, ab8895, for ChIP-seq), rabbit anti-H3K4me3 (Millipore, 04–745, for ChIP-seq), rabbit anti-H3K27ac (Active Motif, 39133, for ChIP-seq), mouse anti-H3K27me3 (Active Motif, 61017, for ChIP-seq), mouse anti-Myc (Santa Cruz, sc-40, for western blotting) and mouse anti-Cas9 (Cell Signaling, 14697, for western blotting). Goat anti-Rabbit IgG (H+L)-HRP (Bio-Rad, 1706515) and Goat anti-Mouse IgG (H+L)-HRP (Invitrogen, 31430) were used as secondary antibodies for western blotting.
Western blotting
Cells were washed with PBS, scraped in cold PBS, pelleted and stored at −80°C. Two million cells were resuspended in 100 μl lysis buffer (20 mM Tris-HCl, 150 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton X-100, 1× complete protease inhibitor (Roche)) and sonicated on a QSONICA 800R (Qsonica) at 40% amplitude with 15-s on/15-s off pulses for a total on time of 10 minutes. Protein concentration was measured using the Pierce BCA Protein Assay Kit (Thermo Fisher). Fifteen micrograms of each sample were mixed with Laemmli Sample Buffer (Bio-Rad) containing 355 mM 2-mercaptoethanol and incubated for 5 minutes at 95°C. The samples were run on 4–15% Mini-PROTEAN® TGX™ Precast Gels (Bio-Rad) and transferred onto nitrocellulose membranes at 100 V for 1 hour. The membranes were rinsed with 1× TBST and blocked with 5% dry milk at room temperature for 45 minutes. After washing with TBST, the membranes were incubated overnight at 4°C with primary antibody diluted in blocking buffer. The membranes were then washed four times for 5 minutes each in 1× TBST at room temperature and incubated with secondary antibody in blocking buffer at room temperature for 45 minutes. After four washes with TBST, signals were detected using Pierce ECL Western Blotting Substrate (Thermo Fisher).
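Loading a fixed 15 μg of protein per lane requires converting each BCA reading into a lysate volume. The sketch below shows this arithmetic; the sample names and concentrations are hypothetical placeholders, not measured values.

```python
def loading_volume_ul(protein_ug: float, conc_ug_per_ul: float) -> float:
    """Volume of lysate (ul) that contains `protein_ug` of total protein."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return protein_ug / conc_ug_per_ul


# Hypothetical BCA readings in ug/ul, for illustration only
bca_readings = {"control": 2.5, "auxin_48h": 3.0}
for sample, conc in bca_readings.items():
    print(sample, f"{loading_volume_ul(15.0, conc):.1f} ul")
```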
Cell cycle analysis
Cells were grown in 6-well plates. After dissociation with Accutase (Innovative Cell Technologies), 2–5 million cells were washed with PBS and resuspended in 300 μl ice-cold PBS. Cells were fixed for at least 24 h at 4°C after drop-wise addition of 800 μl ice-cold ethanol. After fixation, cells were pelleted and resuspended in PBS containing 0.1% Triton X-100, 20 μg/ml propidium iodide and 50 μg/ml RNase A. Cells were incubated for 30 min at 37°C before flow cytometry analysis.
MicroChIP-seq library preparation
MicroChIP-seq experiments for CTCF and Rad21 were performed as described in the ENCODE experiment protocols (“Ren Lab ENCODE Chromatin Immunoprecipitation Protocol for MicroChIP” in https://www.encodeproject.org/documents/) with minor modifications. Libraries were sequenced as 50-bp single-end reads on a HiSeq 2500 or HiSeq 4000. Two biological replicates were prepared for each sample. See Supplementary Note for more detailed information.
ChIP-seq library preparation
ChIP-seq experiments for each histone mark were performed as described in the ENCODE experiment protocols (“Ren Lab ENCODE Chromatin Immunoprecipitation Protocol” in https://www.encodeproject.org/documents/) with minor modifications. Libraries were sequenced as 50-bp single-end reads on a HiSeq 4000. Two biological replicates were prepared for each sample. See Supplementary Note for more detailed information.
RNA-seq library preparation
Total RNA was extracted from 1–2 million cells using the AllPrep Mini kit (QIAGEN) according to the manufacturer’s instructions, and 1 μg of total RNA was used to prepare each RNA-seq library. The libraries were prepared using the TruSeq Stranded mRNA Library Prep Kit (Illumina). Library quality and quantity were assessed with TapeStation (Agilent Technologies) and Qubit (Thermo Fisher Scientific) assays. Libraries were sequenced as 50-bp paired-end reads on a HiSeq 4000. Two biological replicates were prepared for each sample.
Hi-C library preparation
In situ Hi-C experiments were performed as previously described using the MboI restriction enzyme4. Libraries were sequenced on Illumina HiSeq 4000. Two biological replicates were prepared for each sample. See Supplementary Note for more detailed information.
PLAC-seq library preparation
PLAC-seq experiments were performed as previously described25. Libraries were sequenced on Illumina HiSeq 4000. Two biological replicates were prepared for each sample. See Supplementary Note for more detailed information.
ChIP-seq data analysis
Each fastq file was mapped to the mouse genome (mm10) with BWA64 aln using the “-q 5 -l 32 -k 2” options. PCR duplicates were removed using Picard MarkDuplicates (https://github.com/broadinstitute/picard), and bigWig files were created using deepTools65 with the following parameters: bamCompare --binSize 10 --normalizeUsing RPKM --ratio subtract (or ratio). deepTools was also used to generate heatmaps. Peaks were called against input controls using MACS266, with regular peak calling for narrow marks (e.g. CTCF) and broad peak calling for broad marks (e.g. H3K27me3, H3K4me1). Enhancers were defined as regions with both H3K4me1 and H3K27ac peaks but no H3K4me3 peak. Promoter regions that may activate other genes in an enhancer-like manner were defined by the presence of H3K4me1, H3K27ac and H3K4me3 peaks. Repressive distal elements analyzed in Extended Data Fig. 4j were defined by the presence of H3K4me1 and H3K27me3 peaks but no H3K27ac peak. We used DESeq267 to quantify differences in peak levels between samples.
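The peak-overlap rules used to define enhancers, enhancer-like promoters and repressive elements can be written as a small decision function. The Python sketch below assumes peaks are given as half-open (chrom, start, end) intervals and that a region carries a mark if it overlaps any peak of that mark; it omits the distance-to-TSS criterion implied by "distal", and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps_any(region: Interval, peaks: List[Interval]) -> bool:
    """True if the region overlaps at least one peak on the same chromosome."""
    chrom, start, end = region
    return any(c == chrom and s < end and start < e for c, s, e in peaks)


def classify_element(region: Interval, peaks: Dict[str, List[Interval]]) -> str:
    """Apply the histone-peak combination rules to one candidate region."""
    has = {mark: overlaps_any(region, peaks.get(mark, []))
           for mark in ("H3K4me1", "H3K27ac", "H3K4me3", "H3K27me3")}
    if has["H3K4me1"] and has["H3K27ac"]:
        # H3K4me3 distinguishes enhancer-like promoters from enhancers
        return "enhancer-like promoter" if has["H3K4me3"] else "enhancer"
    if has["H3K4me1"] and has["H3K27me3"] and not has["H3K27ac"]:
        return "repressive element"
    return "unclassified"


peaks = {
    "H3K4me1": [("chr13", 100, 500)],
    "H3K27ac": [("chr13", 200, 400)],
}
print(classify_element(("chr13", 250, 300), peaks))  # enhancer
```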
RNA-seq data analysis
RNA-seq reads (paired-end, 100 bases) were aligned to the mouse mm10 genome assembly using STAR68. Mapped reads were counted using HTSeq69, and the counts from two replicates were analyzed with edgeR70 to estimate transcript abundance and detect differentially expressed genes. Only genes with H3K4me3 ChIP-seq peaks at their TSS were used for downstream analysis (Fig. 1d, e). Differentially expressed genes were called using thresholds of FDR < 0.05 and fold change > 2. RPKM was calculated using an in-house pipeline.
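For reference, the sketch below uses the standard RPKM definition and applies the stated FDR and fold-change thresholds. It assumes the fold-change cutoff applies in both directions (|log2 FC| > 1) and that gene length is the summed exon length; the in-house pipeline may differ in details such as how paired-end fragments are counted.

```python
import math


def rpkm(count: float, gene_length_bp: float, total_mapped: float) -> float:
    """Reads per kilobase of gene model per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped)


def is_differential(fdr: float, log2_fc: float,
                    fdr_cutoff: float = 0.05, min_fold: float = 2.0) -> bool:
    """True if FDR < cutoff and the fold change exceeds min_fold in either direction."""
    return fdr < fdr_cutoff and abs(log2_fc) > math.log2(min_fold)


print(rpkm(500, 2_000, 10_000_000))  # 25.0
print(is_differential(0.01, -1.5))   # True
```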
Hi-C data analysis
Hi-C reads (paired-end, 50 or 100 bases) were aligned to the mm10 genome using BWA64 mem. Reads mapped to the same fragment and PCR duplicate reads (identified with Picard MarkDuplicates) were removed. Raw contact matrices were constructed at 10- or 40-kb resolution using in-house scripts and normalized using HiCNorm71. We used Juicebox pre72 with the -q 30 -f options to create .hic files. Hi-C data were visualized using Juicebox72 and the 3D Genome Browser (http://www.3dgenome.org). Topological domain boundaries were identified at 40-kb or 10-kb resolution based on the directionality index (DI) score and a hidden Markov model as previously described1, and were also identified from insulation scores using peakdet (Billauer E, 2012. http://billauer.co.il/peakdet.html). The insulation score analysis was performed as previously described73, and the insulation score of each TAD boundary was calculated as the average of the scores overlapping that boundary. Stripes were called using an in-house pipeline (shared by the Feng Yue lab, Penn State University) based on the algorithm proposed in a previous study74. We used HiCCUPS72 with the options “-r 10000 -k KR -f 0.001 -p 2 -i 5 -d 50000” to identify Hi-C peaks as chromatin loops, and then selected as CTCF-associated loops those that overlapped convergently oriented CTCF ChIP-seq peaks in control cells. Aggregate analysis of CTCF-associated loops was performed using APA72 with default parameters.
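A simplified insulation score in the spirit of the approach cited above (ref. 73) slides a square window along the diagonal and, for each bin, averages the contacts between the w bins upstream and the w bins downstream, then log2-normalizes to the mean over all bins; boundaries appear as local minima. The NumPy sketch below implements that variant on a dense matrix and averages scores over boundary bins as described above. The window size, the absence of a pseudocount and the function names are assumptions, and the sketch does not reproduce peakdet.

```python
import numpy as np


def insulation_scores(matrix: np.ndarray, window: int) -> np.ndarray:
    """log2-normalised insulation score per bin; NaN where the square window does not fit."""
    n = matrix.shape[0]
    raw = np.full(n, np.nan)
    for i in range(window, n - window):
        # mean contact between the `window` bins upstream and downstream of bin i
        raw[i] = matrix[i - window:i, i + 1:i + window + 1].mean()
    scores = np.full(n, np.nan)
    valid = ~np.isnan(raw)
    scores[valid] = np.log2(raw[valid] / raw[valid].mean())
    return scores


def boundary_insulation(scores: np.ndarray, boundary_bins) -> float:
    """Average insulation score over the bins overlapping one TAD boundary."""
    return float(np.nanmean(scores[list(boundary_bins)]))


# Toy matrix with two 5-bin domains and weak contacts between them
toy = np.ones((10, 10))
toy[:5, :5] = 10.0
toy[5:, 5:] = 10.0
scores = insulation_scores(toy, window=2)
print(int(np.nanargmin(scores)))  # 4
```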
To assess global changes in TAD boundary strength between samples, we compared the aggregated boundary contact profile of each sample (Extended Data Fig. 2c). First, to generate a consensus set of TAD boundaries, we merged the boundaries from clone 1 before auxin treatment (Clone 1, 0 hr) with those from clone 2 before auxin treatment (Clone 2, 0 hr). Two filtering steps were applied to obtain the final set of consensus boundaries: 1) we discarded boundaries within 3.04 Mb of a chromosome start or end, because a sub-matrix of the correct size could not be extracted for the aggregate analysis; 2) we discarded boundaries longer than 200 kb, because these often represent regions of disorganized chromatin between TADs rather than true TAD boundaries. Next, we extracted a Hi-C sub-matrix for each boundary in each sample, consisting of a 3.04-Mb window centered on the midpoint of the boundary. These sub-matrices were averaged to generate a single 3.04-Mb matrix representing the average boundary contact profile of that sample. To facilitate comparison, the average boundary contact profiles were normalized across samples using standard quantile normalization. We then made pairwise comparisons by subtracting the average boundary contact profile of sample 1 from that of sample 2. The list of consensus TAD boundaries used here is the same as that described for the aggregate boundary analysis above.
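The aggregate boundary analysis can be sketched in NumPy as: extract a fixed-size window around each boundary midpoint, average the windows, quantile-normalize the averaged profiles across samples and subtract. In the sketch below, the window half-width in bins, the handling of ties in quantile normalization (by sort order rather than averaging), the random input matrices and all function names are assumptions.

```python
import numpy as np


def boundary_windows(matrix, boundary_mids, half_width):
    """Square windows of (2*half_width + 1) bins centred on each boundary midpoint bin."""
    n = matrix.shape[0]
    for mid in boundary_mids:
        if mid - half_width < 0 or mid + half_width >= n:
            continue  # too close to a chromosome end to extract a full window
        sl = slice(mid - half_width, mid + half_width + 1)
        yield matrix[sl, sl]


def aggregate_profile(matrix, boundary_mids, half_width):
    """Average boundary contact profile of one sample."""
    windows = list(boundary_windows(matrix, boundary_mids, half_width))
    if not windows:
        raise ValueError("no boundary far enough from the chromosome ends")
    return np.mean(windows, axis=0)


def quantile_normalize(profiles):
    """Quantile-normalise equally shaped profiles against their mean sorted distribution."""
    data = np.column_stack([p.ravel() for p in profiles])
    ranks = np.argsort(np.argsort(data, axis=0, kind="stable"), axis=0)
    reference = np.sort(data, axis=0).mean(axis=1)
    normed = reference[ranks]
    return [normed[:, j].reshape(profiles[0].shape) for j in range(len(profiles))]


rng = np.random.default_rng(0)
hic_a = rng.random((50, 50))
hic_a = hic_a + hic_a.T      # hypothetical symmetric contact matrix, sample 1
hic_b = hic_a * 1.5          # sample 2 differs only by a global scale factor
mids = [10, 25, 40]
prof_a = aggregate_profile(hic_a, mids, half_width=5)
prof_b = aggregate_profile(hic_b, mids, half_width=5)
norm_a, norm_b = quantile_normalize([prof_a, prof_b])
difference = norm_b - norm_a  # sample 2 minus sample 1
print(float(np.abs(difference).max()))
```

Because the second sample differs only by a global scale factor, quantile normalization removes the difference entirely, which illustrates why the step is applied before subtraction.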
PLAC-seq data analysis
PLAC-seq reads (paired-end, 50 bases) were aligned to the mm10 genome using BWA64 mem. Reads mapped to the same fragment and PCR duplicate reads (identified with Picard MarkDuplicates) were removed. Filtered reads were binned at 10-kb resolution to generate the contact matrix. Bins overlapping H3K4me3 peaks at TSSs were used for downstream differential contact analysis. Peaks were called at 10-kb resolution using MAPS75 with default settings and FitHiChIP76 with coverage bias correction and otherwise default settings.
For differential contact analysis, raw contact counts of 10-kb bin pairs at the same genomic distance were compared. To minimize distance-dependent bias, we stratified the inputs into 10-kb genomic-distance strata from 10 kb to 150 kb; bin pairs at longer distances were grouped into strata containing the same number of bin pairs as the 140–150 kb stratum. Because the counts in each stratum followed a negative binomial distribution, we used edgeR70 to obtain an initial set of differential interactions, using only bin pairs with more than 20 contacts in each sample of both replicates. A significant difference could arise either from a difference in H3K4me3 ChIP coverage or from a difference in 3D contact frequency. Therefore, contacts overlapping differential ChIP-seq peaks (FDR < 0.05, logFC < 0.5) were removed, and only contacts with equivalent H3K4me3 ChIP-seq peak levels were retained. The inputs for this differential analysis included all bin pairs, including interactions not called as significant by MAPS or FitHiChIP, because most short-range interactions are not called as significant peaks owing to their high background, yet changes in short-range interactions may also be critical for gene regulation. We identified many differentially changed short-range interactions that were not called as significant peaks, and we observed a clear correlation between these changes and changes in active enhancer levels at their anchors during neural differentiation, suggesting that these interaction changes may reflect biological changes.
We used significance with the direction of change (−/+ log10(p-value)) rather than fold change to represent changes in interactions, because fold changes tend to be small, especially for short-range interactions, even when the change is biologically meaningful.
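The distance stratification and the signed significance score can be sketched as follows. The sketch assumes each bin pair is described by an identifier and its genomic distance in bp; strata beyond 150 kb are formed by sorting the longer-distance pairs by distance and chunking them to the size of the 140–150 kb stratum, which is one reading of the procedure above. The edgeR test itself is not reproduced, and the names are hypothetical.

```python
import math
from collections import defaultdict


def stratify_by_distance(pairs, bin_size=10_000, max_fixed=150_000):
    """Group (pair_id, distance_bp) tuples into distance strata.

    Distances from bin_size up to max_fixed get one stratum per bin_size step;
    longer distances are sorted and chunked to the size of the last fixed stratum.
    """
    fixed = defaultdict(list)
    longer = []
    for pair_id, dist in pairs:
        if dist < bin_size:
            continue  # same-bin pairs are not analysed
        if dist < max_fixed:
            fixed[dist // bin_size].append(pair_id)
        else:
            longer.append((dist, pair_id))
    strata = [fixed[k] for k in sorted(fixed)]
    chunk = len(fixed.get(max_fixed // bin_size - 1, [])) or 1
    longer.sort()
    strata += [[pid for _, pid in longer[i:i + chunk]] for i in range(0, len(longer), chunk)]
    return strata


def signed_significance(log_fc: float, p_value: float) -> float:
    """-log10(p) carrying the sign of the fold change."""
    return math.copysign(-math.log10(p_value), log_fc)


pairs = [("a", 10_000), ("b", 140_000), ("c", 140_000),
         ("d", 200_000), ("e", 300_000), ("f", 400_000)]
print(stratify_by_distance(pairs))  # [['a'], ['b', 'c'], ['d', 'e'], ['f']]
print(signed_significance(-1.2, 1e-3))
```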
Active/inactive contact (AIC) ratio/value
Changes in chromatin contacts at enhancers and promoters are affected by changes in enhancer activity, such as H3K27ac and H3K4me1 levels (Extended Data Fig. 4b), and gene expression levels are well known to correlate positively with these active marks around the TSS. These observations suggest that contact counts themselves carry information about enhancer activity. Furthermore, most genes have multiple E-P contacts whose frequencies change to varying degrees. We therefore designed the AIC value to represent the combined quantitative activity of multiple E-P contacts, with the aim of relating changes in gene regulation to changes in E-P contacts without using any quantitative histone mark values. First, for each gene we summed the contact counts with active elements and promoters. Promoter-promoter (P-P) contacts can function similarly to E-P contacts32; however, it is still unclear whether a P-P contact has the same effect as an E-P contact of equal frequency. Moreover, in H3K4me3 PLAC-seq datasets, P-P contacts correspond to peak-to-peak interactions, which generally have higher contact counts than peak-to-non-peak interactions. Therefore, before summing the active contact counts, we divided the P-P contact counts by the integer that gave the highest correlation coefficient between gene expression changes and AIC value changes. We tested integers from 0 to 10, and dividing by 3 and 8 gave the highest correlation coefficients in ESCs and NPCs, respectively. Simply summing active contact counts is still not appropriate for comparisons between samples, because the sums are affected by differences in H3K4me3 peak levels at the TSS between samples.
Therefore, to cancel the bias arising from differences in H3K4me3 peak levels between samples, we also calculated the total contact counts with inactive (non-active) regions and computed the active/inactive contact (AIC) ratio for each gene using the following formula.
AIC ratio on gene A = Sum of Active Contact counts (SAC) on gene A / Sum of Inactive Contact counts (SIC) on gene A
Next, for each gene we calculated the average SIC of the two samples being compared and multiplied each sample's AIC ratio by this average to obtain AIC values. AIC values serve as pseudo contact counts and were rounded to the nearest integer for differential analysis with edgeR70. Because both samples are multiplied by the same average SIC, differences in H3K4me3 levels at the TSS between samples do not bias the comparison.
AIC value on gene A = AIC ratio on gene A × Average of SICs of compared two samples on gene A
We also computed changes in AIC values from the Hi-C datasets in the same way and observed comparable correlations with gene expression changes. In Extended Data Fig. 4h, we calculated AIC values of E-P contacts and P-P contacts separately; in this case, SAC was calculated using only E-P or only P-P contact counts, and SIC was calculated as described above.
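The AIC calculation for a single gene compared between two samples can be sketched as below. The P-P divisor (3 for ESCs and 8 for NPCs in the text) is passed as a parameter; the input layout, sample names, counts and function names are hypothetical. Note that Python's round() sends exact halves to the nearest even integer.

```python
def sum_active_contacts(ep_counts, pp_counts, pp_divisor):
    """SAC: E-P contact counts plus P-P contact counts divided by pp_divisor."""
    return sum(ep_counts) + sum(pp_counts) / pp_divisor


def aic_ratio(sample, pp_divisor):
    """AIC ratio = SAC / SIC for one gene in one sample."""
    sic = sum(sample["inactive"])
    return sum_active_contacts(sample["ep"], sample["pp"], pp_divisor) / sic


def aic_values(sample_1, sample_2, pp_divisor):
    """AIC values for one gene in two compared samples, rounded to pseudo-counts."""
    mean_sic = (sum(sample_1["inactive"]) + sum(sample_2["inactive"])) / 2
    return (round(aic_ratio(sample_1, pp_divisor) * mean_sic),
            round(aic_ratio(sample_2, pp_divisor) * mean_sic))


# Hypothetical contact counts for one gene, using the ESC divisor of 3
ctrl = {"ep": [30, 10], "pp": [24], "inactive": [50, 30]}
auxin = {"ep": [20, 10], "pp": [12], "inactive": [40, 20]}
print(aic_values(ctrl, auxin, pp_divisor=3))  # (42, 40)
```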
Odds ratio calculation for CTCF-dependent E-P contacts enrichment
For Fig. 3a, b, all PLAC-seq contacts (10-kb resolution) between promoters and enhancers were classified by the distance from each anchor (enhancer side or promoter side) to the nearest CTCF binding site (Fig. 3a, 4×4 bins) and by the number of CTCF motif sites around each anchor (10-kb bin ±5 kb) (Fig. 3b, 5×5 bins). We then generated a 2×2 table for each bin based on whether contacts were CTCF-dependent (FDR < 0.05) and whether they fell into that bin. Odds ratios and p-values (Fisher’s exact test) were calculated for each 2×2 table.
For Fig. 3c, d, all PLAC-seq contacts (10-kb resolution) between promoters and enhancers were classified by the distance from each anchor (enhancer side or promoter side) to the nearest CTCF binding site (Fig. 3c, 4×4 bins) and by the number of CTCF motif sites around each anchor (10-kb bin ±5 kb) (Fig. 3d, 5×5 bins). We then generated a 2×2 table for each bin based on whether contacts were on significantly down-regulated genes (FDR < 0.05) and whether they fell into that bin. For E-P contacts on down-regulated genes, only contacts identified as significant by peak calling (p-value < 0.01) were counted. Odds ratios and p-values (Fisher’s exact test) were calculated for each 2×2 table.
For Fig. 4a, all genes were classified by the distance to the nearest interacting enhancer and the number of enhancers around the TSS (< 200 kb) (3×3 bins). The distance to the nearest interacting enhancer was defined as the shortest genomic distance spanned by significant PLAC-seq peaks (p-value < 0.01) between the promoter and an enhancer. We then generated 2×2 tables based on whether genes were CTCF-independent stably regulated genes (FDR < 0.05) and whether they fell into each bin. Odds ratios and p-values (Fisher’s exact test) were calculated for each 2×2 table. In Fig. 4b and Extended Data Fig. 9e, the same analysis as in Fig. 4a was performed for CTCF-dependent down-regulated genes and CTCF-dependent up-regulated genes.
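Each 2×2 enrichment test above can be reproduced with a self-contained two-sided Fisher's exact test based on the hypergeometric distribution. The sketch below returns the sample odds ratio (a·d)/(b·c), which may differ from the conditional maximum-likelihood estimate reported by some software (e.g. R's fisher.test); the table layout (rows: in bin / not in bin; columns: in category / not in category) and the helper names are assumptions.

```python
from math import comb


def build_table(in_bin, in_category):
    """2x2 counts (a, b, c, d): rows = in bin / not in bin, columns = in category / not."""
    a = b = c = d = 0
    for inside, cat in zip(in_bin, in_category):
        if inside and cat:
            a += 1
        elif inside:
            b += 1
        elif cat:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_2x2(a, b, c, d):
    """Sample odds ratio and two-sided Fisher's exact p-value for [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # hypergeometric probability that the top-left cell equals x given fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # sum over all tables at least as extreme (no more probable) than the observed one
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Classic "lady tasting tea" table as a check
print(fisher_exact_2x2(3, 1, 1, 3))
```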
3D modelling
We used the Strings & Binders Switch (SBS) polymer model36 to dissect the 3D spatial organization of the Vcan gene region in wild-type and CTCF-depleted NPCs. In the SBS model, a chromatin filament is modelled as a self-avoiding walk (SAW) chain of beads that includes distinct types of binding sites for diffusing cognate molecular factors, called binders. Different types of binding sites are represented by distinct colors. Beads and binders of the same color interact via an attractive potential, thereby driving the folding of the chain. All binders also interact non-specifically with all beads of the polymer with a weaker affinity (see below). We estimated the optimal number of distinct binding site types describing the locus, and their arrangement along the polymer chain, using PRISMR, a previously described machine-learning-based procedure37. In brief, PRISMR takes as input a pairwise experimental contact map (e.g. Hi-C) of the genomic region of interest and, via a standard simulated annealing Monte Carlo optimization, returns the minimal number of binding site types and their arrangement along the SBS polymer chain that best reproduce the input contact map. Next, we ran Molecular Dynamics (MD) simulations of the inferred SBS polymers to produce a thermodynamic ensemble of single-molecule 3D conformations.
We focused on the genomic region chr13:89,200,000–92,000,000 (mm10) encompassing the mouse Vcan gene, in wild-type and CTCF-depleted NPCs. Applied to our Hi-C contact data for the region at 10-kb resolution, the PRISMR algorithm37 returned polymer models with 6 different types of binding sites in both conditions. In our simulations, beads and binders interact via standard potentials from classical polymer physics studies77, and the Brownian dynamics of the system is governed by the Langevin equation. Using the LAMMPS software78, we ran massively parallel MD simulations to produce an ensemble of at least 10² independent conformations. MD simulations were started from initial SAW configurations, and the polymers were allowed to evolve for up to 10⁸ MD time steps, by which point the equilibrium globule phase was reached. We explored specific and non-specific binding energies in the weak biochemical range, from 3.0 kBT to 8.0 kBT and from 0 kBT to 2.7 kBT, respectively, where kB is the Boltzmann constant and T is the system temperature. For simplicity, these affinities were set equal for all binding site types. Full details of the model and MD simulations are described in refs. 36,37. To better highlight the locations of the Vcan gene and its regulatory elements in the two conditions, we produced a coarse-grained version of the polymers by interpolating the bead coordinates with a smooth third-order polynomial curve, and used POV-Ray (Persistence of Vision Raytracer Pty. Ltd) to render the 3D images.
For the model-derived contact maps, we computed the average contact frequencies from the MD-derived ensemble of 3D polymer conformations for each cell type, following a standard approach in which a pair of polymer sites is considered in contact if their physical distance is below a threshold distance37. To compare model contact maps with the corresponding Hi-C data in each cell type, we used the HiCRep stratum-adjusted correlation coefficient (SCC)79, a bias-corrected correlation designed for Hi-C comparison, with a smoothing parameter h = 5 and an upper bound of interaction distance of 1.5 Mb. Model frequencies of multiple contacts were computed similarly: for a fixed viewpoint site k, we counted a triple contact (i, j, k) between k and any pair of sites i, j along the locus if all their pairwise physical distances were below the threshold distance.
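The distance-threshold definition of pairwise and triple contacts can be sketched with NumPy as below, assuming conformations are stored as an array of shape (n_conformations, n_beads, 3). The threshold value is left as a parameter, the toy ensemble is hypothetical, and each bead is counted as in contact with itself.

```python
import numpy as np


def pairwise_distances(conformations):
    """Euclidean distances between all bead pairs, shape (n_conf, n_beads, n_beads)."""
    conf = np.asarray(conformations, dtype=float)
    return np.linalg.norm(conf[:, :, None, :] - conf[:, None, :, :], axis=-1)


def contact_frequency(conformations, threshold):
    """Fraction of conformations in which each bead pair is closer than `threshold`."""
    return (pairwise_distances(conformations) < threshold).mean(axis=0)


def triple_contact_frequency(conformations, threshold, k):
    """Fraction of conformations in which beads i, j and viewpoint k are all mutually in contact."""
    close = pairwise_distances(conformations) < threshold
    near_k = close[:, :, k]  # (n_conf, n_beads): is each bead in contact with k?
    triple = close & near_k[:, :, None] & near_k[:, None, :]
    return triple.mean(axis=0)


# Two hypothetical 3-bead conformations (model length units)
ensemble = [
    [[0, 0, 0], [1, 0, 0], [5, 0, 0]],
    [[0, 0, 0], [1, 0, 0], [1.5, 0, 0]],
]
print(contact_frequency(ensemble, threshold=2.0))
print(triple_contact_frequency(ensemble, threshold=2.0, k=0))
```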
CTCF motif deletion and tethering dCas9-CTCF
The CRISPR/Cas9 system was used to delete the CTCF motif near the Vcan promoter. The targeted DNA sequences are listed below (the protospacer adjacent motif is the 3′-terminal NGG of each sequence). The guide RNAs were generated using the GeneArt Precision gRNA Synthesis Kit (Invitrogen).
5’-TTCAGCACAAGCGGAAAATAGGG-3’,
5’-CTGCTTGCAGTTGGGTGTTTCGG-3’
Transfection of gRNA and Cas9 ribonucleoprotein (EnGen SpyCas9, New England Biolabs) into mESCs was performed using the Neon Transfection System 10 μl tip kit (Life Technologies). The cells were grown for approximately one week, and individual colonies were picked into a 96-well plate. After expansion, genotyping by PCR and Sanger sequencing were performed to confirm the motif deletion.
To generate dCas9-CTCF-tethered cell lines, plasmids encoding dCas9 fused to CTCF were generated by modifying lenti-dCas-VP64-Blast (a gift from Feng Zhang, Addgene #61425). The VP64 cassette was replaced with CTCF sequences to generate dCas9-CTCF, and a neomycin resistance marker taken from pAC95-pmax-dCas9VP160–2A-neo (a gift from Rudolf Jaenisch, Addgene #48227) was inserted. To generate gRNA plasmids for recruiting dCas9, gRNA oligos were inserted into the backbone vector (pSPgRNA, a gift from Charles Gersbach, Addgene Plasmid #47108). The gRNAs were designed to target the top and bottom strands of the Vcan promoter-proximal region close to the deleted CTCF motif. The targeted DNA sequences are listed below (the protospacer adjacent motif is the 3′-terminal NGG of each sequence).
5’-CCTGCCTCCTTGGACAGAGACGG-3’ (for top strand)
5’-GTCCCTTCCGTCTCTGTCCAAGG-3’ (for bottom strand)
The dCas9-CTCF and gRNA plasmids were purified using the PureLink HiPure Plasmid Midiprep Kit (Invitrogen). For electroporation, 350 ng of dCas9-CTCF plasmid and 600 ng of gRNA plasmid (1 μl) were added to 0.1–0.2 million mESCs resuspended in 10 μl Buffer R (Invitrogen), and electric pulses were delivered with settings of 1200 V, 20 ms and 2 pulses. After approximately 10 days of culture, individual colonies were picked, and genotyping and western blotting were performed to confirm the presence of the transfected plasmid sequences and expression of the encoded proteins.
To delete the enhancer region that interacts with the Vcan promoter, the following DNA sequences were targeted (the protospacer adjacent motif is the 3′-terminal NGG of each sequence).
5’-AGGAACGGCCCATTCCCGAGGGG-3’,
5’-CAATCAATAATAACACGCATAGG-3’
gRNA synthesis and transfection of gRNA and Cas9 ribonucleoprotein into mESCs were performed as described above for the CTCF motif deletion. Genotyping by PCR and Sanger sequencing were performed to confirm the deletion.
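Because each target listed in this section is 23 nt long and ends in NGG, the protospacer and PAM can be separated programmatically, for example before ordering gRNA templates. The sketch below assumes the SpCas9 NGG PAM and a 20-nt protospacer; the dictionary labels are illustrative.

```python
import re

TARGETS = {
    "CTCF motif deletion, gRNA 1": "TTCAGCACAAGCGGAAAATAGGG",
    "CTCF motif deletion, gRNA 2": "CTGCTTGCAGTTGGGTGTTTCGG",
    "dCas9-CTCF, top strand": "CCTGCCTCCTTGGACAGAGACGG",
    "dCas9-CTCF, bottom strand": "GTCCCTTCCGTCTCTGTCCAAGG",
    "enhancer deletion, gRNA 1": "AGGAACGGCCCATTCCCGAGGGG",
    "enhancer deletion, gRNA 2": "CAATCAATAATAACACGCATAGG",
}


def split_target(target: str):
    """Split a 23-nt SpCas9 target into its 20-nt protospacer and 3-nt NGG PAM."""
    seq = target.replace(" ", "").upper()
    if not re.fullmatch(r"[ACGT]{21}GG", seq):
        raise ValueError(f"not a 20-nt protospacer followed by NGG: {target}")
    return seq[:20], seq[20:]


for name, target in TARGETS.items():
    protospacer, pam = split_target(target)
    print(f"{name}: {protospacer} | {pam}")
```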
Analysis of CTCF-occupied promoter genes in multiple mouse tissues
To analyze CTCF ChIP-seq signals around promoters, we calculated the fold change of sample RPKM over input RPKM in each 50-bp bin and summed these values within each promoter region (TSS ±10 kb), counting only bins located within optimal IDR-thresholded ChIP-seq peaks. We then computed, for each gene, the correlation coefficient between the summed CTCF ChIP-seq signal and RNA-seq RPKM values across 9 mouse tissues. The random datasets used as a control in Fig. 5b were generated by randomly assigning the CTCF ChIP-seq signal levels at each promoter to gene expression levels (RNA-seq). The heatmap was generated for genes with a high correlation coefficient (> 0.6). Heatmap values were calculated as log2(summed ChIP-seq signal / average over all tissues) for promoter-proximal CTCF signal and log2(RPKM / average RPKM over all tissues) for gene expression. Lineage specificity of transcription was measured by Shannon entropy42. For DNA methylation levels around promoters, DNA methylation rates at CBSs (motif sequences ±100 bp) were calculated and averaged within each promoter region (TSS ±10 kb). PhastCons scores80 were used for conservation analysis, and the highest phastCons score at each CTCF motif locus was taken as the conservation score of that CBS.
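The per-gene correlation, heatmap values and Shannon entropy described above can be sketched in plain Python as follows. It assumes a Pearson correlation, base-2 logarithms and strictly positive values for the log2 ratios (zero RPKM would need a pseudocount); the function names and the nine-tissue values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log2_relative_to_mean(values):
    """log2(value / mean over tissues), as used for the heatmap rows."""
    mean = sum(values) / len(values)
    return [math.log2(v / mean) for v in values]


def shannon_entropy(expression):
    """Entropy (bits) of a gene's expression across tissues; low values = tissue-specific."""
    total = sum(expression)
    return -sum((e / total) * math.log2(e / total) for e in expression if e > 0)


ctcf = [5.0, 1.0, 0.5, 4.0, 0.2, 0.1, 3.0, 0.3, 0.2]     # hypothetical promoter CTCF signal
rpkm = [40.0, 8.0, 2.0, 30.0, 1.0, 1.0, 25.0, 3.0, 2.0]  # hypothetical RPKM, same tissue order
r = pearson(ctcf, rpkm)
print(round(r, 2), r > 0.6, round(shannon_entropy(rpkm), 2))
```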
Acknowledgments:
We thank Drs. Elphrege Nora and Benoit Bruneau for exchanging datasets and reagents. We would like to give special thanks to Samantha Kuan for operating the sequencing instruments and Tristin Liu and Zhen Ye for helping with experiments. We would like to acknowledge Drs. Victor Lobanenkov and Arshad Desai for helpful advice and Drs. Feng Yue, Xiaotao Wang, Ivan Juric, and Armen Abnousi for sharing computational pipelines. We would also like to give special thanks to Drs. Ramya Raviram, Rongxin Fang, Yanxiao Zhang, Anthony Schmitt, and Sora Chee for sharing helpful information and protocols, as well as all the other members of the Ren laboratory. This work was supported by the Ludwig Institute for Cancer Research (B.R.), NIH (1U54DK107977–01) (B.R.), NIH (1U54DK107965) (H.Z.), a Ruth L. Kirschstein Institutional National Research Service Award from the National Institute of General Medical Sciences (T32 GM008666) (J.D.H.), and a postdoctoral fellowship from the TOYOBO Biotechnology Foundation (N.K.).
Footnotes
Competing interests:
B.R. is a co-founder of Arima Genomics, Inc. and Epigenome Technologies, Inc.
Reporting Summary statement:
Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.
Code Availability Statement:
PLAC-seq and other analyses in this study were performed by combining publicly available software, as described in Methods.
Data Availability statement:
All datasets generated in this study have been deposited in Gene Expression Omnibus (GEO) under accession number GSE94452. The Hi-C dataset analyzed in Extended Data Fig. 3g–i was provided by Dr. Benoit Bruneau (GSE98671). Accession codes for the mouse tissue datasets used in Fig. 5 and Extended Data Fig. 10 are listed in Supplementary Table 6; these datasets are available from the ENCODE portal (https://www.encodeproject.org/).
Source data are available with the paper online.
References:
- 1. Heintzman ND et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108–12 (2009).
- 2. Long HK, Prescott SL & Wysocka J. Ever-Changing Landscapes: Transcriptional Enhancers in Development and Evolution. Cell 167, 1170–1187 (2016).
- 3. Shen Y et al. A map of the cis-regulatory sequences in the mouse genome. Nature 488, 116–20 (2012).
- 4. Andersson R et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
- 5. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
- 6. Yu M & Ren B. The Three-Dimensional Organization of Mammalian Genomes. Annu Rev Cell Dev Biol 33, 265–289 (2017).
- 7. Benabdallah NS et al. Decreased Enhancer-Promoter Proximity Accompanying Enhancer Activation. Mol Cell 76, 473–484.e7 (2019).
- 8. Alexander JM et al. Live-cell imaging reveals enhancer-dependent Sox2 transcription in the absence of enhancer proximity. eLife 8 (2019).
- 9. Dixon JR et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–80 (2012).
- 10. Phillips-Cremins JE et al. Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153, 1281–95 (2013).
- 11. Nora EP et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–5 (2012).
- 12. Rao SS et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–80 (2014).
- 13. Nora EP et al. Targeted Degradation of CTCF Decouples Local Insulation of Chromosome Domains from Genomic Compartmentalization. Cell 169, 930–944.e22 (2017).
- 14. Sanborn AL et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc Natl Acad Sci U S A 112, E6456–65 (2015).
- 15. Fudenberg G et al. Formation of Chromosomal Domains by Loop Extrusion. Cell Rep 15, 2038–49 (2016).
- 16. Monahan K et al. Role of CCCTC binding factor (CTCF) and cohesin in the generation of single-cell diversity of protocadherin-alpha gene expression. Proc Natl Acad Sci U S A 109, 9125–30 (2012).
- 17. Zhang X et al. Fundamental roles of chromatin loop extrusion in antibody class switching. Nature 575, 385–389 (2019).
- 18. Lee J, Krivega I, Dale RK & Dean A. The LDB1 Complex Co-opts CTCF for Erythroid Lineage-Specific Long-Range Enhancer Interactions. Cell Rep 19, 2490–2502 (2017).
- 19. Rao SSP et al. Cohesin Loss Eliminates All Loop Domains. Cell 171, 305–320.e24 (2017).
- 20. Arzate-Mejia RG, Recillas-Targa F & Corces VG. Developing in 3D: the role of CTCF in cell differentiation. Development 145 (2018).
- 21. Stik G et al. CTCF is dispensable for immune cell transdifferentiation but facilitates an acute inflammatory response. Nat Genet (2020).
- 22. Nishimura K, Fukagawa T, Takisawa H, Kakimoto T & Kanemaki M. An auxin-based degron system for the rapid depletion of proteins in nonplant cells. Nat Methods 6, 917–22 (2009).
- 23. Holland AJ, Fachinetti D, Han JS & Cleveland DW. Inducible, reversible system for the rapid and complete degradation of proteins in mammalian cells. Proc Natl Acad Sci U S A 109, E3350–7 (2012).
- 24. Krietenstein N et al. Ultrastructural Details of Mammalian Chromosome Architecture. Mol Cell 78, 554–565.e7 (2020).
- 25. Fang R et al. Mapping of long-range chromatin interactions by proximity ligation-assisted ChIP-seq. Cell Res 26, 1345–1348 (2016).
- 26. Mumbach MR et al. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat Methods 13, 919–922 (2016).
- 27. Dowen JM et al. Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 159, 374–387 (2014).
- 28. Krijger PH et al. Cell-of-Origin-Specific 3D Genome Structure Acquired during Somatic Cell Reprogramming. Cell Stem Cell 18, 597–610 (2016).
- 29. Bonev B et al. Multiscale 3D Genome Rewiring during Mouse Neural Development. Cell 171, 557–572.e24 (2017).
- 30. Li Y et al. CRISPR reveals a distal super-enhancer required for Sox2 expression in mouse embryonic stem cells. PLoS One 9, e114485 (2014).
- 31. Li G et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148, 84–98 (2012).
- 32. Diao Y et al. A tiling-deletion-based genetic screen for cis-regulatory element identification in mammalian cells. Nat Methods 14, 629–635 (2017).
- 33. Thiecke MJ et al. Cohesin-Dependent and -Independent Mechanisms Mediate Chromosomal Contacts between Promoters and Enhancers. Cell Rep 32, 107929 (2020).
- 34. Landolt RM, Vaughan L, Winterhalter KH & Zimmermann DR. Versican is selectively expressed in embryonic tissues that act as barriers to neural crest cell migration and axon outgrowth. Development 121, 2303–12 (1995).
- 35. Wu Y et al. Versican V1 isoform induces neuronal differentiation and promotes neurite outgrowth. Mol Biol Cell 15, 2093–104 (2004).
- 36. Chiariello AM, Annunziatella C, Bianco S, Esposito A & Nicodemi M. Polymer physics of chromosome large-scale 3D organisation. Sci Rep 6, 29775 (2016).
- 37. Bianco S et al. Polymer physics predicts the effects of structural variants on chromatin architecture. Nat Genet 50, 662–667 (2018).
- 38. Herz HM et al. Enhancer-associated H3K4 monomethylation by Trithorax-related, the Drosophila homolog of mammalian Mll3/Mll4. Genes Dev 26, 2604–20 (2012).
- 39. Hu D et al. The MLL3/MLL4 branches of the COMPASS family function as major histone H3K4 monomethylases at enhancers. Mol Cell Biol 33, 4745–54 (2013).
- 40. Yan J et al. Histone H3 lysine 4 monomethylation modulates long-range chromatin interactions at enhancers. Cell Res 28, 387 (2018).
- 41. He Y et al. Spatiotemporal DNA methylome dynamics of the developing mouse fetus. Nature 583, 752–759 (2020).
- 42. Martinez O & Reyes-Valdes MH. Defining diversity, specialization, and gene specificity in transcriptomes through information theory. Proc Natl Acad Sci U S A 105, 9709–14 (2008).
- 43. Lee DP et al. Robust CTCF-Based Chromatin Architecture Underpins Epigenetic Changes in the Heart Failure Stress-Gene Response. Circulation 139, 1937–1956 (2019).
- 44. Wutz G et al. ESCO1 and CTCF enable formation of long chromatin loops by protecting cohesin(STAG1) from WAPL. eLife 9 (2020).
- 45. Hsieh TS et al. Resolving the 3D Landscape of Transcription-Linked Mammalian Chromatin Folding. Mol Cell 78, 539–553.e8 (2020).
- 46. Weintraub AS et al. YY1 Is a Structural Regulator of Enhancer-Promoter Loops. Cell 171, 1573–1588.e28 (2017).
- 47. Beagan JA et al. YY1 and CTCF orchestrate a 3D chromatin looping switch during early neural lineage commitment. Genome Res 27, 1139–1152 (2017).
- 48. Caputo L et al. The Isl1/Ldb1 Complex Orchestrates Genome-wide Chromatin Organization to Instruct Differentiation of Multipotent Cardiac Progenitors. Cell Stem Cell 17, 287–99 (2015).
- 49. Monahan K, Horta A & Lomvardas S. LHX2- and LDB1-mediated trans interactions regulate olfactory receptor choice. Nature 565, 448–453 (2019).
- 50. Schoenfelder S et al. The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements. Genome Res 25, 582–97 (2015).
- 51. Hnisz D, Shrinivas K, Young RA, Chakraborty AK & Sharp PA. A Phase Separation Model for Transcriptional Control. Cell 169, 13–23 (2017).
- 52. Wang H et al. Widespread plasticity in CTCF occupancy linked to DNA methylation. Genome Res 22, 1680–8 (2012).
- 53. Renda M et al. Critical DNA binding interactions of the insulator protein CTCF: a small number of zinc fingers mediate strong binding, and a single finger-DNA interaction controls binding at imprinted loci. J Biol Chem 282, 33336–45 (2007).
- 54. Leitch HG et al. Naive pluripotency is associated with global DNA hypomethylation. Nat Struct Mol Biol 20, 311–6 (2013).
- 55. Ficz G et al. FGF signaling inhibition in ESCs drives rapid genome-wide demethylation to the epigenetic ground state of pluripotency. Cell Stem Cell 13, 351–9 (2013).
- 56. Maurano MT et al. Role of DNA Methylation in Modulating Transcription Factor Occupancy. Cell Rep 12, 1184–95 (2015).
- 57. Nanan KK et al. TET-Catalyzed 5-Carboxylcytosine Promotes CTCF Binding to Suboptimal Sequences Genome-wide. iScience 19, 326–339 (2019).
- 58. Katainen R et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nat Genet 47, 818–21 (2015).
- 59. Kaiser VB, Taylor MS & Semple CA. Mutational Biases Drive Elevated Rates of Substitution at Regulatory Sites across Cancer Types. PLoS Genet 12, e1006207 (2016).
- 60. Cancer Genome Atlas Research Network et al. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).
- 61. Gribnau J, Hochedlinger K, Hata K, Li E & Jaenisch R. Asynchronous replication timing of imprinted loci is independent of DNA methylation, but consistent with differential subnuclear localization. Genes Dev 17, 759–73 (2003).
- 62. Strubing C et al. Differentiation of pluripotent embryonic stem cells into the neuronal lineage in vitro gives rise to mature inhibitory and excitatory neurons. Mech Dev 53, 275–87 (1995).
- 63. Bain G, Kitchens D, Yao M, Huettner JE & Gottlieb DI. Embryonic stem cells express neuronal properties in vitro. Dev Biol 168, 342–57 (1995).
- 64. Li H & Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–60 (2009).
- 65. Ramirez F et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 44, W160–5 (2016).
- 66. Zhang Y et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137 (2008).
- 67. Love MI, Huber W & Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014).
- 68. Dobin A et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
- 69. Anders S, Pyl PT & Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–9 (2015).
- 70. Robinson MD, McCarthy DJ & Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–40 (2010).
- 71. Hu M et al. HiCNorm: removing biases in Hi-C data via Poisson regression. Bioinformatics 28, 3131–3 (2012).
- 72. Durand NC et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst 3, 99–101 (2016).
- 73. Crane E et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–4 (2015).
- 74. Vian L et al. The Energetics and Physiological Impact of Cohesin Extrusion. Cell 175, 292–294 (2018).
- 75. Juric I et al. MAPS: Model-based analysis of long-range chromatin interactions from PLAC-seq and HiChIP experiments. PLoS Comput Biol 15, e1006982 (2019).
- 76. Bhattacharyya S, Chandra V, Vijayanand P & Ay F. Identification of significant chromatin contacts from HiChIP data by FitHiChIP. Nat Commun 10, 4221 (2019).
- 77. Kremer K & Grest GS. Dynamics of entangled linear polymer melts: A molecular-dynamics simulation. J Chem Phys 92, 5057–5086 (1990).
- 78. Plimpton S. Fast parallel algorithms for short-range molecular dynamics. J Comput Phys 117, 1–19 (1995).
- 79. Yang T et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res 27, 1939–1949 (2017).
- 80. Siepel A et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15, 1034–50 (2005).