Abstract
Vertebrate genomes are partitioned into contact domains defined by enhanced internal contact frequency and formed by two principal mechanisms: compartmentalization of transcriptionally active and inactive domains, and stalling of chromosomal loop-extruding cohesin by CTCF bound at domain boundaries. While Drosophila has widespread contact domains and CTCF, it is currently unclear whether CTCF-dependent domains exist in flies. We genetically ablate CTCF in Drosophila and examine impacts on genome folding and transcriptional regulation in the central nervous system. We find that CTCF is required to form a small fraction of all domain boundaries, while critically controlling expression patterns of certain genes and supporting nervous system function. We also find that CTCF recruits the pervasive boundary-associated factor Cp190 to CTCF-occupied boundaries and co-regulates a subset of genes near boundaries together with Cp190. These results highlight a profound difference in the requirement for CTCF in genome folding between flies and vertebrates, in which a large fraction of boundaries are CTCF-dependent, and suggest that CTCF has played mutable roles in genome architecture and direct gene expression control during metazoan evolution.
Subject terms: Developmental biology, Epigenetics, Gene regulation, Chromatin, Transcription
Although the Drosophila genome has widespread contact domains and CTCF, it remains unclear whether CTCF-dependent domains exist in flies. Here, the authors ablate CTCF in Drosophila and find that CTCF is required to form a small fraction of all domain boundaries, suggesting differences in the role of CTCF for genome folding in flies and vertebrates.
Introduction
A wide range of animal genomes are partitioned into a series of contact domains (CDs) that exhibit increased physical proximity among loci within them. An evolutionarily conserved mechanism of such genome folding is thought to be compartmentalization, reflecting the segregation of chromosomal domains based on their transcriptional and epigenetic states1–3. In vertebrates, chromosomal loops are additionally extruded on underlying compartmental domains through a process involving DNA-bound CTCF molecules which stall loop-extruding cohesin complexes at domain boundaries1,4–10. CTCF-dependent extrusion-based boundaries either reinforce or counteract compartmental domain boundaries, depending on the locus. Overall, a large fraction of boundaries in the vertebrate genome are CTCF-dependent9,11.
Intriguingly, although Drosophila has widespread CDs and CTCF, it is currently unclear whether CTCF-dependent domains exist in Drosophila. High-resolution genome-wide Hi-C maps of formaldehyde-crosslinking frequencies between pairs of DNA fragments (as a measurement of their proximity in 3D-space) were recently generated in Drosophila tissue culture cells2,12–15. These studies highlighted the lack of hallmarks of CTCF-mediated domains observed in vertebrate cells. Rather, evidence suggests that CDs in flies are formed by CTCF-independent compartmentalization and other transcription-related processes, as most boundaries lie between domains with different histone modifications or at promoters of highly transcribed genes2,12,16–18.
Crucially, the functional importance of genome folding into CTCF-dependent domains is not fully understood in any organism. CTCF is essential for the viability of mammalian cells11,19,20, whereas it is dispensable for early development in Drosophila21. Assessing whether or not CTCF-mediated domains exist in Drosophila is important for understanding their relevance for genome function. Recent studies have perturbed specific CDs in flies to address their biological roles without knowing whether they are CTCF-mediated or compartmental22–24, yet different types of CDs may have different functions.
CTCF-dependent domains in mammals generally comprise regulatory elements and their target promoters25–27. This suggested that CTCF somehow limits regulatory crosstalk between CDs, and fosters regulatory interactions within them. This model is, however, difficult to test in mammals because global perturbation of CTCF leads to cell death. Acute depletion of CTCF protein in mouse embryonic stem cells followed by transcriptional profiling did not reveal widespread transcriptional changes11. Alternatively, deletion of CTCF binding sites near developmental genes in cultured cells and mice identified some sites where CTCF appears to critically prevent developmental defects and disease28–30, and many CTCF sites that did not appear functional31–33. These diverse results paint an opaque picture of how CTCF impacts gene expression. Previous studies that partially knocked-down CTCF in Drosophila cell lines also did not reveal clear effects on transcription34–36. Analysis of the homeotic phenotype of CTCF0 mutants completely lacking both maternal and zygotic CTCF suggested that CTCF blocks regulatory crosstalk between elements on either side of some CTCF binding sites21. A fundamental question arising from comparative studies in flies and humans is how CTCF impacts transcription, and how this relates to its uncertain architectural function in flies. Whether CTCF stably associates with partner proteins to effect its functions also remains unclear.
Here, we show using CTCF0 mutant Drosophila that CTCF is critically required in neurons for fly viability. We examine the effects of CTCF loss on genome folding and transcriptional regulation in the central nervous system (CNS) and investigate the molecular basis of CTCF function.
Results
CTCF expression in neural stem cells (NSCs) or neurons is essential for fly viability
To identify a biologically relevant tissue in which to study CTCF function in Drosophila, we used previously described CTCF knock-out (CTCFKO) mutants and CTCF0 mutants that additionally lack maternally inherited CTCF21. Some CTCFKO mutants (60%) hatch into adults with spasmodic movements, suggesting a neurological phenotype that might be the cause of their short lifespan (Figs. 1a, b, Supplementary Movie 1). We tested the relevance of CTCF expression in the nervous system by performing tissue-specific knock-out and rescue experiments. Specifically, we used Gal4 drivers active in NSCs, mature neurons or muscles to drive conditional excision of a CTCF rescue transgene (knock-out) or UAS-CTCF expression (rescue) in CTCF mutant genetic backgrounds. Loss of CTCF expression in NSCs or neurons compromised the ability of flies to hatch to an extent comparable to loss of all zygotic CTCF expression (Fig. 1a) and severely shortened the lifespan of flies that did hatch (Fig. 1b, Supplementary Movie 2). On the other hand, loss of CTCF in muscle only slightly impaired adult hatching and lifespan (Figs. 1a, b).
In contrast to CTCFKO, CTCF0 mutants never hatch from the pupal case (Fig. 1c). Conditional expression of CTCF in NSCs or neurons of CTCF0 mutants strongly rescued hatching (Fig. 1c) and adults were capable of coordinated movements and survived for several days (Fig. 1d, Supplementary Movie 3). On the other hand, expressing CTCF in muscles of CTCF0 mutants barely rescued hatching (Fig. 1c, d).
Together, these results show that CTCF expression is critically required in neurons for pupal hatching and adult viability. Consistently, CTCF is more highly expressed in the nervous system than in other tissues37,38. Analyses of molecular phenotypes of CTCF0 mutants described hereafter were therefore performed in dissected CNSs of third instar larvae, a developmental stage at which CTCF0 mutants are fully viable.
Physical insulation defects in CTCF0 mutants
To address whether CTCF is required to form CD boundaries in flies, Hi-C was performed on CNSs dissected from wildtype (WT) and CTCF0 larvae in biological triplicate using two 4-cutter restriction enzymes for enhanced resolution. Hi-C maps consisting of 200 million reads per genotype were obtained by combining the correlated biological replicates (see Methods, Supplementary Table 1). Hi-C maps from whole bodies of single flies of the same genotypes were also generated. In parallel, CTCF binding sites were mapped in larval CNSs by chromatin immunoprecipitation sequencing (ChIP-seq) with a polyclonal antibody specifically recognizing CTCF (Supplementary Fig. 2a) in WT and in CTCF0 animals as control. Only 740 CTCF peaks were defined as enriched in WT relative to CTCF0 CNSs, of which 77% overlapped a CTCF consensus motif (Supplementary Fig. 2b, Supplementary Data 1).
To assess the relation between CTCF peaks and CD boundaries genome-wide in WT CNS Hi-C maps, boundaries were identified at 2 kb resolution with TopDom (see “Methods”, Supplementary Table 2, Supplementary Data 2 and 3). Very few (<1%) boundaries defined in this study potentially correspond to small CDs defined in even higher resolution Hi-C studies (see “Methods”). Domain boundaries were enriched within ±1 kb of several (36%) CTCF peaks (Fig. 2a). Conversely, a CTCF peak was located within ±1 kb of only 8% of all boundaries (Fig. 2b). This indicates that while CTCF peaks are frequently at domain boundaries, CTCF is only present at a small fraction of all boundaries in flies.
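The two reciprocal overlap fractions described above (CTCF peaks with a boundary within ±1 kb, and boundaries with a CTCF peak within ±1 kb) can be illustrated with a minimal sketch. This is not the analysis code used in this study: the function name `fraction_near` and all coordinates are hypothetical, and positions are assumed to lie on a single chromosome.

```python
from bisect import bisect_left

def fraction_near(query, targets, window=1000):
    """Fraction of query positions with at least one target within +/- window bp."""
    targets = sorted(targets)
    hits = 0
    for pos in query:
        # Index of the first target at or beyond the left edge of the window.
        i = bisect_left(targets, pos - window)
        if i < len(targets) and targets[i] <= pos + window:
            hits += 1
    return hits / len(query) if query else 0.0

# Hypothetical coordinates (bp) on one chromosome, for illustration only.
ctcf_peaks = [10_000, 52_300, 120_000, 300_500]
boundaries = [10_400, 40_000, 80_000, 119_200, 200_000, 250_000, 400_000]

peaks_at_boundaries = fraction_near(ctcf_peaks, boundaries)   # 2 of 4 peaks
boundaries_with_peak = fraction_near(boundaries, ctcf_peaks)  # 2 of 7 boundaries
```

The two fractions share a numerator (the number of peak-boundary pairs within the window is similar) but not a denominator, which is why a sizeable fraction of peaks can lie at boundaries while only a small fraction of boundaries carry a peak.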
WT and CTCF0 Hi-C maps were globally similar, and most (84%) domain boundaries were detected in both WT and CTCF0 mutants. Nevertheless, specific CDs were visibly less physically insulated from the neighboring domain in CTCF0 mutants (Fig. 2c, Supplementary Fig. 2c, Supplementary Table 3). Clearly disrupted domain boundaries in CTCF0 mutants frequently occurred at former CTCF peaks (Fig. 2d). Of 135 strongly affected domain boundaries that were lost in CTCF0 mutants, 89 (66%) were at former CTCF peaks (Supplementary Table 2). To determine how generally physical insulation defects are observed at former CTCF peaks in the absence of CTCF (irrespective of their localization at CD boundaries identified by TopDom), physical insulation score differences between WT and CTCF0 mutants were measured across all 740 CTCF peaks. Boundary defects in CTCF0 mutants were observed at most former CTCF peaks, with more prominent defects visible at CTCF peaks that are highly occupied in WT (Fig. 2e, Supplementary Fig. 2d). CTCF-dependent boundaries were variably positioned relative to neighboring genes (see examples in Fig. 2c: CTCF peaks 2, 3, 5 and 6 are respectively in an intron, at the end of a gene, within 1 kb of a gene promoter or intergenic). Many CTCF-dependent boundaries were similarly affected in Hi-C maps from whole-body flies of the same genotypes, indicating that CTCF is required to form physical boundaries in most cell types (Supplementary Fig. 2e). Together, these results strongly suggest that CTCF mediates the formation of physical boundaries.
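As an illustration of how a loss of physical insulation at a given position can be quantified from a contact matrix, the sketch below computes the mean contact frequency across a position, between the bins immediately upstream and downstream of it. This is one common way to define insulation and is an assumption here: the score used in this study (see Methods) may differ in definition, normalization and sign. The helper names and toy matrices are hypothetical.

```python
def block_matrix(n_bins, boundary, intra, inter):
    """Toy symmetric contact matrix with two domains separated at `boundary`."""
    return [
        [intra if (a < boundary) == (b < boundary) else inter for b in range(n_bins)]
        for a in range(n_bins)
    ]

def cross_contact(matrix, i, w):
    """Mean contact between the w bins upstream of bin i and the w bins from i onward.

    Lower values mean fewer contacts across position i, i.e. stronger insulation.
    """
    lo = max(0, i - w)
    hi = min(len(matrix), i + w)
    vals = [matrix[a][b] for a in range(lo, i) for b in range(i, hi)]
    return sum(vals) / len(vals) if vals else float("nan")

# Hypothetical maps: a sharp boundary in WT that is weakened in the mutant.
wt = block_matrix(6, boundary=3, intra=10, inter=1)
mutant = block_matrix(6, boundary=3, intra=10, inter=5)

wt_score = cross_contact(wt, 3, 2)          # contacts across the boundary in WT
mutant_score = cross_contact(mutant, 3, 2)  # more cross-boundary contacts
```

Comparing such a score at the same position in two genotypes does not require the position to have been called as a boundary, which is the point of measuring differences at all CTCF peaks.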
Whereas domain boundaries were abolished at several former CTCF peaks in CTCF0 mutants, they were partially retained at other peaks that are similarly occupied by CTCF in WT (Supplementary Fig. 2c, compare boundary defects at CTCF peaks 5 and 6). Of 343 WT CD boundaries bound by CTCF, only 125 (36%) were fully lost in CTCF0 mutants (Supplementary Table 2). This resulted in a lower average physical insulation score at former CTCF peaks in CTCF0 mutant CNS Hi-C maps (Fig. 2f). These observations are not due to the presence of contaminating CTCF, as CTCF RNA and protein are undetectable by RNA-seq and ChIP-seq (Fig. 2c and next section). As CTCF0 mutants lack CTCF from the beginning of development, residual boundaries can also not be explained by a role of CTCF in the establishment but not maintenance of boundaries. Rather, this observation suggests that at some sites, CTCF reinforces boundaries redundantly established by other mechanisms, a scenario also observed in mammalian cells1,2. We define CTCF-occupied CD boundaries present only in WT as strictly CTCF-dependent, and those that are present in CTCF0 (generally weaker than in WT) as partially CTCF-dependent. These two types of CTCF-dependent boundaries are contrasted later in the “Results” section.
A region in the N-terminus of human CTCF directly interacts with cohesin and stabilizes cohesin on DNA10,39, partly explaining how human CTCF forms CD boundaries. Vertebrate and fly CTCF N-termini are highly diverged, yet a 10 amino acid residue stretch in CTCF’s N-terminus that binds to cohesin in human cells is present at a similar distance from the zinc finger domain in fly CTCF10 (boxed in Supplementary Fig. 2f). We therefore tested whether two residues critical for cohesin interaction in human CTCF (Y226 F228, homologous to Y248 F250 in fly CTCF) mediate direct interaction of fly CTCF with the SA-Vtd (homologous to human SA2-SCC1) complex. For this, GFP-tagged recombinant WT and Y248A F250A point mutant CTCF N-termini were mixed with an untagged SA-Vtd subcomplex and purified on GFP binder beads. WT but not mutant CTCF versions retained SA-Vtd (Fig. 2g). Therefore, despite profound divergence, the fly CTCF N-terminus interacts directly with cohesin in vitro. This interaction was suggested to impart directionality to CTCF-dependent boundaries in mammalian cells10,39, but we find that CTCF has at best a very weak preference to establish directional boundaries (Supplementary Fig. 2g) consistent with a previous study2.
We conclude that Drosophila CTCF is required to form physical boundaries with strengths generally proportional to its occupancy on DNA. Other mechanisms reinforce CTCF-dependent boundaries at some sites and explain the formation of most boundaries in flies.
CTCF impacts expression patterns of genes near CTCF peaks
To understand how CTCF impacts transcription, we performed RNA sequencing (RNA-seq) on cDNA libraries from mRNA purified from WT and CTCF0 larval CNSs in triplicate. This confirmed the absence of CTCF mRNA in CTCF0 samples (Supplementary Fig. 3a). 392 (~3% of all) genes were significantly differentially expressed (DE) in CTCF0 mutants (with adjusted p-value<0.05 and |fold-change| > 1.5) (Fig. 3a, Supplementary Data 4). CTCF0 mutants therefore do not show widespread transcriptional defects, though changes occurring in subsets of cells in the CNS such as CTCF’s previously validated target gene Abdominal-B elude our analysis21.
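The differential expression thresholds stated above can be written as a simple filter. The sketch assumes that |fold-change| > 1.5 is applied symmetrically on the log2 scale (that is, |log2 fold-change| > log2(1.5)); the gene names and values are hypothetical and the function name `is_de` is illustrative.

```python
import math

def is_de(log2_fc, padj, fc_cutoff=1.5, alpha=0.05):
    """True if a gene passes both the adjusted p-value and fold-change cutoffs."""
    return padj < alpha and abs(log2_fc) > math.log2(fc_cutoff)

# Hypothetical rows: (gene, log2 fold-change mutant vs WT, adjusted p-value).
results = [
    ("geneA", 2.0, 0.001),   # up, significant
    ("geneB", -0.9, 0.01),   # down, significant
    ("geneC", 0.3, 0.0001),  # significant but below the fold-change cutoff
    ("geneD", 1.5, 0.2),     # large change but not significant
]

de_up = [g for g, lfc, p in results if is_de(lfc, p) and lfc > 0]
de_down = [g for g, lfc, p in results if is_de(lfc, p) and lfc < 0]
```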
Some DE genes had decreased expression in CTCF0 mutant CNSs compared to WT (Fig. 3b). Several DE genes with increased expression in CTCF0 CNSs are normally not expressed in the CNS but rather restricted to other specialized tissues like testes (Intraflagellar transport 52), tendons (Thrombospondin), and the peripheral nervous system (Odorant receptor 67d) (Figs. 3c, 3d, Supplementary Fig. 3b). Some ectopic transcripts lacked annotated start and termination sites suggesting that they are cryptic (Supplementary Fig. 3b). RNA fluorescent in situ hybridization (RNA-FISH) analysis showed that genes with increased expression in CTCF0 CNSs were misexpressed in various patterns, possibly driven by locus-specific enhancers (Fig. 3e).
Indirect transcriptional changes are expected in CTCF0 mutants, which lack CTCF since the beginning of development, and we asked whether CTCF regulates genes in the vicinity of its binding sites. 10% of DE genes had a CTCF peak within ±1 kb of their transcriptional start site (TSS) (9-fold enrichment over randomly sampled matched non-DE genes) (Fig. 3f), a result that was similar for genes with increased versus decreased expression in CTCF0 mutants (Supplementary Fig. 3c). Conversely, 5% of CTCF peaks were located within ±1 kb of a DE gene TSS (9-fold enrichment over randomly sampled matched non-DE genes) (Fig. 3g). These results suggest that, depending on the locus, CTCF may directly repress or activate the transcription of nearby genes, or alternatively CTCF may shield promoters from inappropriate enhancers or silencers as observed at Hox gene loci21,40.
Could the structural defects observed in CTCF0 Hi-C maps be secondary consequences of gene misregulation in the vicinity of former CTCF peaks? Some CTCF-dependent domain boundaries were located far from genes (Fig. 2c, CTCF peak 6 is 9 kb away from the closest gene) and are thus unlikely to be impacted by transcription. Others were located near genes whose expression increased (Supplementary Fig. 2c, peak 3), decreased (Supplementary Fig. 2c, peak 6) or, in most cases, remained unchanged (Supplementary Fig. 2c, peak 7). Few (8%) DE genes were located in different A/B compartments in CTCF0 mutants relative to WT, indicating that differential gene expression mostly occurred without large changes in higher-order spatial chromatin configuration (Supplementary Fig. 3d, Supplementary Data 5). Together, these results indicate that the pervasive weakening of physical boundaries observed at former CTCF peaks in CTCF0 mutants (Fig. 2e–f) is not a mere consequence of altered transcription.
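A fold enrichment of this kind can be sketched as the observed fraction of DE genes with a nearby CTCF peak divided by the average fraction in random gene sets of the same size. The sketch below is a simplification under stated assumptions: it samples non-DE genes uniformly and omits the matching step, whose criteria are not specified here; the function name, flags and counts are hypothetical.

```python
import random

def fold_enrichment(de_flags, background_flags, n_draws=1000, seed=0):
    """Observed fraction of flagged DE genes over the mean fraction in random draws.

    Each flag states whether a gene has a CTCF peak within the chosen window of its TSS.
    """
    rng = random.Random(seed)
    k = len(de_flags)
    observed = sum(de_flags) / k
    # Mean fraction of flagged genes across random same-size background samples.
    expected = sum(
        sum(rng.sample(background_flags, k)) / k for _ in range(n_draws)
    ) / n_draws
    return observed / expected

# Hypothetical toy data: 10% of DE genes versus 1% of background genes are flagged.
de_flags = [True] * 2 + [False] * 18
background_flags = [True] * 10 + [False] * 990

enrichment = fold_enrichment(de_flags, background_flags)
```

With these toy inputs the result is close to tenfold; the exact value depends on the random draws.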
CTCF occupancy scales with enhancer-blocker activity in a reporter assay
Previous studies of the functionality of CTCF binding sites stably integrated into the fly genome suggested that most of them lack insulator activity (i.e., the ability to block regulatory crosstalk)36, at least in single copies40. Here, we tested CTCF peaks in a quantitative reporter assay. The reporter comprises an enhancer positioned between two fluorescent reporter genes (EGFP and mCherry) driven by minimal Heat-shock-protein-70 (Hsp70) promoters (Fig. 4a). Test fragments were cloned in between EGFP and the enhancer, maintaining the enhancer at a similar distance from both reporter genes. Reporter plasmids were then transiently transfected into Drosophila S2 cells, and relative EGFP and mCherry intensities were measured in thousands of single cells with a cell analyzer (Supplementary Fig. 4a). An insulator should reduce EGFP expression while mCherry expression should remain high. Control experiments with a neutral spacer or the well-characterized gypsy insulator41 validated the assay (Fig. 4b, lanes 1 and 2). Two CTCF peaks near genes whose expression decreased (peak G from Fig. 3b) or increased (peak N from Fig. 3e) in CTCF0 mutants had similar effects as gypsy (Fig. 4b, lanes 3 and 4). EGFP levels in the presence of gypsy or CTCF peaks were not strongly reduced below basal levels measured in enhancer-less control reporters (Supplementary Fig. 4b), suggesting that these tested sequences mostly impaired enhancer-mediated EGFP expression. Additional CTCF peak regions (Supplementary Fig. 4c, average size 360 bp) were tested and their relative insulator strengths were estimated from the median ratio of mCherry-over-EGFP fluorescence measured in single cells. Eleven out of 14 tested CTCF peaks selectively reduced EGFP intensities to various degrees that globally scaled with CTCF ChIP-seq occupancy measured in S2 cells42 (Fig. 4c) and that appeared independent of the endogenous locations of CTCF peaks relative to their nearest genes (Supplementary Fig. 4c) and of combinatorial co-binding with other fly insulator-binding proteins on the cloned fragments (Supplementary Fig. 4d). Mutating two base pairs of a CTCF motif in one of these fragments abolished its activity (Fig. 4c, fragment N mut); thus, the reporter specifically reveals the activity of a single CTCF binding site. Taken together, these observations indicate that CTCF sites in the reporter do not strongly directly repress or activate transcription but rather insulate a promoter from an enhancer.
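The readout described above, the median per-cell ratio of mCherry over EGFP fluorescence, can be sketched as follows. Expressing the result relative to the neutral spacer control is an assumption made for illustration, as are the function name and all intensity values.

```python
from statistics import median

def insulator_strength(cells):
    """Median per-cell mCherry/EGFP ratio from (egfp, mcherry) intensity pairs."""
    ratios = [mcherry / egfp for egfp, mcherry in cells if egfp > 0]
    return median(ratios)

# Hypothetical single-cell intensities (arbitrary units), for illustration only.
spacer = [(100, 100), (80, 88), (120, 108), (95, 95)]
test_fragment = [(25, 100), (20, 90), (30, 105), (22, 99)]

# An insulating fragment lowers EGFP while mCherry stays high, raising the ratio.
relative_strength = insulator_strength(test_fragment) / insulator_strength(spacer)
```

Taking the median of per-cell ratios, rather than the ratio of population means, limits the influence of the few very bright cells that are typical of transient transfections.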
CTCF recruits Cp190 to a subset of Cp190-bound domain boundaries
To further understand how CTCF functions, we asked whether it stably associates with partner proteins that contribute to its activity. Unbiased identification of CTCF partners from Drosophila embryonic nuclear extracts in biological duplicates by mass spectrometry reproducibly identified known insulator-binding proteins Centrosomal protein 190 kDa (Cp190) and Insulator binding factors 1 and 2 (Ibf1 and Ibf2) as enriched CTCF interactors relative to negative control (Supplementary Fig. 5a). Reciprocal Cp190 purifications published by others also identified Ibf1, Ibf2 and CTCF among other proteins43. Traces of the cohesin complex also co-purified with CTCF (Supplementary Fig. 5a) reminiscent of transient interactions between cohesin and CTCF seen in mammalian cells44.
CTCF was previously shown to directly interact with Cp19045, yet the relevance of this interaction remained unclear. No common target genes are known46 and a mutant version of CTCF reported to no longer interact with Cp190 was largely functional in vivo45. We performed pull-downs of GFP-tagged CTCF fragments co-expressed in bacteria with Cp190’s BTB (Broad-Complex, Tramtrack and Bric-a-brac) domain and found that amino acids 698-771 in CTCF C-terminus directly interact with Cp190 BTB (Supplementary Fig. 5b). Importantly, this stretch in CTCF does not overlap the previously deleted region (amino acid residues 774–818) that was used to conclude that CTCF’s interaction with Cp190 was unimportant in vivo.
To assess the genome-wide overlap between CTCF and Cp190 binding sites in larval CNSs, specific Cp190 peaks were identified by ChIP-seq with a polyclonal anti-Cp190 antibody in WT and in Cp190KO animals with a CRISPR-Cas9 mediated deletion of the Cp190 open reading frame as control (Supplementary Fig. 5c). 6,473 Cp190 peaks were enriched in WT relative to Cp190KO CNSs (Fig. 5a, Supplementary Data 6). Cp190 colocalized with CTCF at most (79%) CTCF peaks and was additionally present at many other sites (Fig. 5a), consistent with other studies35,36,47. We profiled Cp190 binding sites in WT and CTCF0 larval CNSs and found that Cp190 was normally recruited to most Cp190 peaks in CTCF0 mutants with the exception of former CTCF peaks, at which Cp190 was globally reduced (Figs. 5a, 5b, Supplementary Data 7 and 8). In CTCF0 mutants, Cp190 was lost from former higher-occupancy CTCF peaks but only reduced at former lower-occupancy CTCF peaks (Fig. 5b). We therefore distinguish between strictly CTCF-dependent Cp190 peaks (lacking a detectable Cp190 peak when comparing CTCF0 and Cp190KO mutants) and partially CTCF-dependent Cp190 peaks (with a detectable Cp190 peak in CTCF0 relative to Cp190KO mutants, generally weaker in CTCF0 than in WT).
Unlike CTCF, Cp190 binding was enriched at CD boundaries genome-wide (Fig. 5c lane 3, Supplementary Figs. 5d, e)2,15,17. Outside of CTCF peaks, Cp190-occupied domain boundaries were often proximal to transcribed TSSs (Fig. 5c, lane 6). In CTCF0 mutants, residual Cp190 binding at former CTCF-occupied boundaries was significantly associated with boundary retention (Figs. 5d–f). Seventy-five percent of strictly CTCF-dependent boundaries lacked a residual Cp190 peak, and 80% of residual Cp190 peaks were associated with a residual boundary in CTCF0 mutants (Fig. 5e). CD boundary defects in CTCF0 mutants were also less severe at former TSS-proximal CTCF peaks (within 200 bp of a gene TSS) than at former TSS-distal CTCF peaks (Fig. 5f). This suggests that either Cp190 itself, its associated factors, or transcription at Cp190-bound TSSs may redundantly contribute to the formation of physical boundaries independently of CTCF and may synergize with CTCF at partially CTCF-dependent Cp190 peaks (see examples in Fig. 5g).
CTCF and Cp190 co-regulate a subset of target genes
To assess whether loss of Cp190 results in transcriptional changes shared with CTCF0 mutants, RNA-seq was performed on Cp190KO larval CNSs in biological triplicate. Overall, 440 DE genes were observed in Cp190KO mutant CNSs compared to WT, of which 192 went up and 248 went down relative to WT (with adjusted p-value < 0.05 and |fold-change| > 1.5) (Supplementary Fig. 6a, Supplementary Data 9). Since Cp190 is bound to many more sites than CTCF (Fig. 5a), we did not expect that many transcriptional changes in Cp190KO mutants would be shared in CTCF0 mutants. Surprisingly, however, a considerable fraction of DE genes in CTCF0 and Cp190KO mutants were common (31% of all DE genes in CTCF0 and 26% of all DE genes in Cp190KO) and concordantly changed in similar directions and to similar degrees relative to WT (Fig. 6a). This is exemplified at the SP1029 (Fig. 6b–c) and CG15478 (Fig. 6d–e) genes that are proximal to a CTCF and Cp190 co-bound peak (peak 1/N in Fig. 6b, peak 2 in Fig. 6d). In the absence of CTCF, Cp190 is additionally lost from these peaks (Figs. 6b and d, middle), a CD boundary is disrupted (Supplementary Figs. 6b and c), and the gene is expressed at increased (SP1029 in Fig. 6b, middle) or decreased (CG15478 in Fig. 6d, middle) levels relative to WT. In the absence of Cp190, CTCF remains bound at SP1029 (Fig. 6b, bottom) and CG15478 (Fig. 6d, bottom) which are nevertheless also similarly misexpressed relative to WT (Figs. 6b and d, bottom). This suggests that Cp190 is required for CTCF function independently of CTCF binding to DNA. To more stringently compare SP1029 and CG15478 misexpression in the absence of CTCF or Cp190, we visualized their mRNAs in embryos completely lacking maternal and zygotic CTCF (CTCF0) or Cp190 (Cp1900). Already at 11 h of development, CTCF0 and Cp1900 embryos ectopically expressed SP1029 in the same cells (in the nervous system and additional cell types) (Fig. 6c) and failed to express WT levels of CG15478 in the nervous system (Fig. 6e). We conclude that Cp190 is a critical partner of CTCF for regulating a subset of common genes (see summary model in Fig. 6f).
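The comparison of DE gene sets between the two mutants comes down to counting shared genes, expressing that count as a fraction of each set, and checking whether shared genes change in the same direction. A minimal sketch, with a hypothetical function name, gene names and fold-changes:

```python
def shared_de(de_a, de_b):
    """Summarize overlap of two DE sets given as dicts gene -> log2 fold-change."""
    shared = set(de_a) & set(de_b)
    # Concordant genes change in the same direction in both mutants.
    concordant = {g for g in shared if (de_a[g] > 0) == (de_b[g] > 0)}
    return {
        "shared": len(shared),
        "frac_of_a": len(shared) / len(de_a),
        "frac_of_b": len(shared) / len(de_b),
        "concordant": len(concordant),
    }

# Hypothetical DE genes in two mutants, for illustration only.
ctcf_de = {"g1": 2.0, "g2": -1.0, "g3": 1.2, "g4": -0.8}
cp190_de = {"g1": 1.8, "g2": -1.1, "g4": 0.9, "g5": 1.0, "g6": -2.0}

summary = shared_de(ctcf_de, cp190_de)
```

Because the same number of shared genes is divided by two different set sizes, the two reported percentages differ whenever the DE sets differ in size.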
Discussion
CTCF-dependent CDs have been proposed to regulate the communication between genes and their regulatory elements. Here, we analyzed Drosophila that developed in the complete absence of CTCF and reached the following conclusions: (1) CTCF is most critically required in neuronal cells for adult viability (Fig. 1). (2) Domain boundary defects in CTCF0 mutants are frequently associated with CTCF-bound sites, consistent with a mechanism in which CTCF can form boundaries (Fig. 2). At the same time, the vast majority of boundaries are CTCF-independent. (3) CTCF prevents ectopic activation and silencing of certain genes in its vicinity (Fig. 3). (4) Sites bound by CTCF do not directly repress or activate transcription, but rather functionally insulate promoters and enhancers in a reporter assay in S2 cells (Fig. 4). (5) Cp190 directly binds to the C-terminus of CTCF and is recruited to CTCF peaks in a strictly or partially CTCF-dependent manner (Fig. 5). Residual Cp190 binding at former CTCF peaks coincides with residual boundary retention in CTCF0 mutants (Fig. 5). (6) CTCF binding to DNA alone is not sufficient for correct expression patterns of a subset of genes that also rely on Cp190. Below we discuss how this work furthers our understanding of genome folding in Drosophila, CTCF’s role in transcriptional regulation and the molecular basis thereof.
Relaxed requirement of CTCF for Drosophila genome architecture
In comparison to vertebrates, the principles of genome folding into CDs in Drosophila are less clear. On the one hand, the majority of fly CDs were proposed to form by compartmentalization of domains with different transcriptional states or because actively transcribed genes cluster, with little contribution from architectural proteins acting independently of transcription2,48. On the other hand, analyses of enriched transcription factor motifs at domain boundaries defined at high-resolution revealed that 77% were enriched in core promoter motifs (and called promoter boundaries) and the remaining 23% were enriched in motifs of insulator-binding proteins like CTCF, su(Hw) and Ibf1 (and called non-promoter boundaries)17. This suggested that architectural proteins may form some domain boundaries. By completely ablating CTCF in vivo, we show that CTCF contributes to the formation of a small fraction (below 10%) of domain boundaries in Drosophila (Fig. 2). This strongly contrasts with the mammalian genome where extrusion-based mechanisms are responsible for the formation of a large fraction of boundaries. This demonstrates that although domain formation is ubiquitous in different species, the contributions of different mechanisms can vary widely. The limited role that CTCF plays in global genome architecture in flies is nevertheless consistent with our finding that CTCF binding sites are several-fold less frequent per unit of genome length in flies (~800 peaks in 130 Mb genome) than in humans (~80,000 peaks in 3 billion bp genome)49, and the fact that alternative boundary-forming mechanisms exist in flies.
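The peak densities implied by the approximate counts and genome sizes quoted above can be checked with simple arithmetic. The variable names are illustrative, and the inputs are the rounded figures from the text rather than exact peak counts.

```python
# Approximate figures quoted in the text.
fly_peaks, fly_genome_mb = 800, 130
human_peaks, human_genome_mb = 80_000, 3_000

fly_density = fly_peaks / fly_genome_mb        # about 6 peaks per Mb
human_density = human_peaks / human_genome_mb  # about 27 peaks per Mb

# How much sparser CTCF peaks are in flies, per unit of genome length.
density_ratio = human_density / fly_density
```

On these rounded figures, CTCF peaks are roughly fourfold sparser per megabase in flies, even though the absolute number of peaks differs by about a hundredfold.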
At strictly CTCF-dependent boundaries, CTCF can form boundaries independently of the presence/absence of a nearby TSS and of detectable transcriptional changes in nearby genes (Figs. 2c and 5d). At partially CTCF-dependent boundaries, defects in CTCF0 mutants are limited by redundant boundary-forming mechanisms often associated with CTCF-independent recruitment of Cp190, Cp190-associated factors or the presence of Cp190-bound transcribed gene TSSs (Figs. 5c–g and 6f). Cp190 marks both promoter and non-promoter boundaries (Fig. 5c)15,17, and it remains to be clarified whether Cp190 or its associated factors directly contribute to domain boundary formation (through similar or unrelated mechanisms as CTCF) or whether boundary formation is governed by transcription of Cp190-bound TSSs. Pervasive transcriptional perturbation globally affects Hi-C contact maps2,16,48, indicating that transcription itself or the transcription machinery at least reinforces CDs. Finally, we note that apart from CTCF, the transcription factor Zelda has also been shown to affect CD boundaries in flies: Zelda depletion in early Drosophila embryos led to partial disruption of former Zelda-occupied domain boundaries, and to concurrent loss of RNA polymerase II recruitment which may account for the observed boundary defects16.
Whether Drosophila CTCF, like its mammalian counterpart, forms CD boundaries in concert with loop-extruding cohesin remains unclear because of discrepancies between flies and mammals. (1) In mammalian Hi-C maps, CTCF sites at both anchors of an extruded loop often engage in high-frequency contacts4 not seen in Drosophila2 (Fig. 2c, Supplementary Fig. 2c). (2) CTCF and cohesin colocalize genome-wide in mammals49,50, but cohesin does not colocalize specifically with CTCF in Drosophila13,17. Fly CTCF may therefore not have a robust or unique ability to stall or stabilize loop-extruding cohesin complexes, despite their ability to interact in vitro (Fig. 2g). (3) CTCF-dependent boundaries are directional in mammals4,5,51 but lack clear directionality in flies (Supplementary Fig. 2g)2. All these discrepancies could nevertheless be expected given the probable differences in how fly CTCF interacts with extruding cohesin (Supplementary Fig. 2f). Indeed, previous in silico simulations6 and experiments affecting loop-extrusion processivity across CTCF-dependent boundaries in human cells7,9,10 described CDs with weaker corner interactions more similar to domains observed in flies. The N-terminus of DNA-bound mammalian CTCF may stall or stabilize cohesin by directly interacting with cohesin subunits and regulators10,39,52,53 via binding interfaces that are not all conserved in fly orthologs (Supplementary Fig. 2f). Our results suggest that direct interaction of fly CTCF N-terminus with cohesin is insufficient to form directional chromosomal loops.
Impact of CTCF on transcriptional regulation
Functional studies of how CTCF impacts expression are challenging in mammalian cells. Recent studies that manipulated CTCF binding sites at specific loci have moderated our view of how critical CTCF is for patterned gene expression, but a limitation is that effects can be masked by unperturbed CTCF sites nearby that function redundantly31–33.
Our transcriptional analyses of Drosophila CTCF0 CNSs showed that CTCF is required for patterned expression of selected genes in the CNS while at the same time being dispensable for orchestrating other complex gene expression programs. Gene misexpression may result from defective gene insulation from local regulatory elements, as supported by the binding of CTCF between certain neuronal and non-neuronal genes in vivo (Figs. 3c, d), the increased expression of these genes in CTCF0 larval CNSs (Figs. 3c–e) and the enhancer-blocking activity of CTCF peaks in S2 cells (Fig. 4b–c). Our reporter assay is independent of chromatin environment, allowing quantitative measurements of insulator activity that reveal a direct relation to the efficiency of CTCF recruitment. These findings are consistent with our previous characterization of Hox gene misexpression in CTCF0 mutants, which phenocopies deletions of insulator boundaries that maintain the independence of some Hox regulatory domains21. Our ability to detect gene misregulation in CTCF0 larval CNSs likely depends on genomic context, notably the presence of regulatory elements active in this organ in a sufficiently large number of cells to detectably alter transcription.
Why aren’t gene misexpression defects in CTCF0 mutants more widespread? Recent studies have emphasized that specific communication between regulatory elements and gene promoters is controlled at many levels, of which CTCF provides one. In particular, enhancer-promoter compatibility54 and regulation of the chromatin properties of regulatory elements themselves55 also determine whether or not regulatory elements and promoters functionally communicate. CTCF may also function redundantly with other insulator-binding proteins in Drosophila to limit regulatory crosstalk in this compact genome. Unlike what is known in mammals, flies have a family of insulator-binding proteins, many of which have DNA binding domains with which they target specific loci56.
Molecular basis of how CTCF impacts gene regulation
Whether CTCF’s ability to form physical boundaries explains its conserved genetic insulator activity remains an open question1,57. An ideal scenario to address this would be to separate boundary formation from gene insulator function. Human CTCF with mutated critical cohesin-interacting residues was largely functional, but CD boundaries were only partially disrupted10. We observed that some DE genes in CTCF0 mutants are close to partially CTCF-dependent boundaries (Fig. 5d, lane 6). Gene misregulation in the absence of CTCF may therefore occur despite significant retention of a physical boundary, but we did not definitively confirm that these DE genes are direct CTCF targets.
We found that CTCF functionally cooperates with a stably bound regulatory cofactor, expanding the view of how CTCF may impact gene regulation. The relevance of the CTCF-Cp190 interaction has been debated. On the one hand, Cp190 was assumed to be required for CTCF’s insulator function based on the observations (1) that the enhancer-blocking activity of a Hox gene insulator in transgenic reporter assays depended on both CTCF and Cp190, and (2) that CTCF failed to be recruited to many sites on polytene chromosomes in Cp190 mutants58,59. The latter observation was, however, not reproduced in genome-wide ChIP experiments in Cp190 knock-down cells36. On the other hand, no common CTCF and Cp190 target genes were known46, and the interaction between CTCF and Cp190 was recently concluded to be dispensable in vivo45. The latter conclusion was based on deleting residues in CTCF that did not interact with Cp190 in our pull-down experiments (Supplementary Fig. 5b). We identified genes with concordant transcriptional changes upon loss of either CTCF or Cp190 that are potentially directly regulated by both proteins.
Is this interaction conserved in vertebrates? Around 40 Cp190-like proteins comprising an N-terminal BTB domain and zinc fingers exist in humans60, but Cp190 does not have a direct ortholog. The C-terminus of human CTCF is capable of interacting with the BTB domain of a Cp190-like protein called KAISO in yeast two-hybrid experiments61, reminiscent of the interaction between fly CTCF C-terminus and the BTB domain of Cp190 (Supplementary Fig. 5b). Whether CTCF transiently interacts with a BTB domain-containing protein in human cells or whether this interaction has not been maintained in vivo remains to be clarified.
How do Cp190 and CTCF collaborate? Incomplete overlap of DE genes in CTCF0 and Cp190KO mutants suggests that CTCF requires Cp190 at some loci but not others (Fig. 6a). Alternatively, additional common targets may be masked by other transcriptional changes in Cp190KO mutants or by maternal Cp190 rescuing early defects in these mutants. How Cp190 functions is not known, but it may contribute to CTCF’s insulator activity similarly to how Cp190 contributes to the activities of gypsy and some Hox gene boundary insulators46,62. Cp190 may help CTCF form CD boundaries, or Cp190 may function independently of boundary formation through unknown mechanisms that could uncover paradigms for controlling the communication between genes and regulatory elements.
Methods
Tissue-specific CTCF loss-of-function
(CTCFKO, UAS-FLP)/TM6B heterozygotes were crossed to CTCFKO/TM6B heterozygotes for an independently isolated CTCFKO allele that also carried an FRT-flanked genomic CTCF rescue transgene and one of various Gal4 drivers: expressed in neuroblasts [worniu-Gal4 (Bloomington stock 56553)], mature neurons [elav-Gal4 (Bloomington stock 25750)], or muscles [Mef2-Gal4 (Bloomington stock 25756)]. Resulting non-TM6B animals were transheterozygous for CTCFKO alleles, derived from a WT maternal germline, and expressed UAS-FLP under the control of a Gal4 driver leading to tissue-specific excision of the CTCF rescue transgene. w1118 (wildtype) and CTCFKO transheterozygous animals were used as controls.
Tissue-specific rescue of CTCF0 mutants
Females trans-heterozygous for two independently isolated CTCFKO alleles were rescued with an FRT-flanked genomic CTCF rescue transgene that was excised in their germline by expressing FLP recombinase under the control of nanos regulatory sequences. These females were crossed to CTCFKO/TM6B males carrying a UAS-CTCF-3xHA transgene (FlyORF stock F000619) and a Gal4 driver mentioned above or no Gal4 driver as control. Resulting non-TM6B animals were transheterozygous for CTCFKO alleles, derived from a maternal germline devoid of CTCF (CTCF0 mutant background) and expressed UAS-CTCF under the control of a Gal4 driver. w1118 animals were used as WT control.
Drosophila viability tests
Three sets of 30–40 third instar larvae of desired genotypes were transferred into separate vials and the numbers of pupae and fully hatched adults were recorded. The average percentage and standard deviation of animals alive at each developmental stage and over a 30-day period after hatching were scored and plotted in Kaplan-Meier survival plots with 95% confidence intervals from the triplicate experiments.
Antibodies
For this study, polyclonal rabbit antibodies were raised against CTCF1–293 and Cp1901–1096. Proteins were expressed recombinantly in E. coli and purified by tandem affinity purification using N-terminal GFP- and C-terminal His-tags. Tags were cleaved off with 3C protease, and the untagged proteins were used for immunization.
Western blotting
Forty third-instar larval CNSs per biological replicate were dissected in ice-cold PBS. Samples were sonicated in 100 µl of 20 mM Tris pH 7.5, 500 mM NaCl, 0.1% Triton X-100, 1× complete protease inhibitors (Roche) in a Bioruptor (settings on high, 5 min, 4 °C). Extracts were centrifuged for 5 min at maximum speed and total protein was quantified by Qubit protein assay (ThermoFisher). Calibrated amounts of extract from WT, CTCF0 and CTCFOE animals were loaded on a 4–12% acrylamide gel and probed with rabbit anti-CTCF1–293 crude serum (diluted 1:1000) and mouse anti-tubulin clone B-5-1-2 (Sigma T5168, diluted 1:10,000). CTCFOE animals expressed a CTCF cDNA under the control of upstream activating sequences (UAS) driven by a ubiquitous tubulin-Gal4 driver, and served as control. Chemiluminescence images of nitrocellulose membranes were visualized in Fiji v2.1.0/1.53c.
Chromatin preparation from larval CNSs
60 third-instar larval cuticles per biological replicate (two biological replicates per sample except CTCF ChIP-seq in WT performed in biological triplicates) were dissected in ice-cold PBS, then cross-linked 15 min at room temperature in 1.8% (v/v) paraformaldehyde, 50 mM HEPES pH 8, 100 mM NaCl, 1 mM EDTA, 1 mM EGTA. Crosslinking was stopped by washing for 10 min in 1 ml PBS, 0.01% Triton-X100, 125 mM glycine, then cuticles were washed for 10 min in 10 mM HEPES pH 7.6, 10 mM EDTA, 0.5 mM EGTA, 0.25% Triton X-100. CNSs were dissected from the cuticles in 10 mM HEPES pH 7.6, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 0.01% Triton X-100, then sonicated in 120 µl of RIPA buffer (10 mM Tris-HCl pH 8, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% SDS, 0.1% sodium deoxycholate, protease inhibitor cocktail) in AFA microtubes in a Covaris S220 sonicator for 5 min with a peak incident power of 140 W, a duty cycle of 5% and 200 cycles per burst. Sonicated chromatin was centrifuged to pellet insoluble material and snap-frozen.
ChIP-seq
ChIP was performed with 2 µl of rabbit polyclonal antibody crude sera against CTCF1–293 or Cp1901–1096, each incubated with half of the chromatin prepared from a biological replicate overnight at 4 °C. Twenty-five microliters of pre-mixed Protein A and G Dynabeads (Thermo Fisher 100-01D and 100-03D) were added for 3 h at 4 °C, then washed for 10 min each once with RIPA, four times with RIPA with 500 mM NaCl, once in LiCl buffer (10 mM Tris-HCl pH 8, 250 mM LiCl, 1 mM EDTA, 0.5% Igepal CA-630, 0.5% sodium deoxycholate) and twice in TE buffer (10 mM Tris-HCl pH 8, 1 mM EDTA). DNA was purified by RNase digestion, proteinase K digestion, reversal of crosslinks at 65 °C for 6 h, and elution from a QIAGEN Minelute PCR purification column. ChIP-seq libraries were prepared using the NEBNext Ultra II DNA Library Prep kit for Illumina. An equimolar pool of multiplexed ChIP-seq libraries at 4 nM was sequenced on the Illumina HiSeq4000 (150 bp paired-end).
ChIP-seq analysis
Paired-end ChIP-seq reads were demultiplexed and mapped to the dm6 genome using Micmap, a derivative of the fetchGWI tool63. Only chromosomes 2, 3, 4, and X were used. ChIP-seq peaks were called using the R package csaw64 v1.16.1 with a window width of 20 bp and spacing of 10 bp, ignoring duplicate reads. A background enrichment was evaluated as the median over all samples in the comparison of the average number of reads per 2 kb bin. Windows with less than threefold enrichment over background were filtered out. Data were normalized using the TMM method65 implemented in csaw. Differential binding analysis in csaw is based on the quasi-likelihood framework implemented in the edgeR package66. Results obtained on different windows were combined into regions by clustering adjacent windows. Combined p-values were evaluated for each region using csaw, and the Benjamini-Hochberg method was applied to control the false discovery rate. Regions with false discovery rate (FDR) < 0.01 and |fold change| > 2 were considered as differential binding regions and are reported in Supplementary Data files 1, 6, 7, and 8. Genuine CTCF peaks were identified by differential analysis of ChIP-seq signals in WT versus CTCF0 as being lower in the mutant samples relative to WT. Genuine Cp190 peaks were similarly identified by differential analysis of ChIP-seq signals in WT versus Cp190KO (Cp190 peaks in WT) or in CTCF0 versus Cp190KO (Cp190 peaks in CTCF0). Additional differential analyses were performed for Cp190 ChIP-seq signal in WT versus CTCF0 (for Fig. 5a). We defined ChIP occupancy as the best.log2FC obtained from csaw in the respective differential analysis. We defined peak positions as the best.pos obtained from csaw. To count overlaps between CTCF and Cp190 peaks in three-way comparisons shown in Fig. 5a, some CTCF and Cp190 peaks were split into 2 or 3 sub-regions.
Specifically, 740 WT CTCF peaks were split into 765 peaks, 6473 WT Cp190 peaks were split into 6474 peaks, and 1045 differentially bound Cp190 regions with lower occupancy in CTCF0 relative to WT were split into 1076 peaks. Accompanying the CTCF ChIP-seq, matches to the Drosophila CTCF motif MA0531.1 downloaded from the JASPAR website were indicated in all figures.
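The region-level filter described above can be illustrated with a minimal Python sketch. It is not the csaw/edgeR code used in this study: the function names and the dictionary layout are hypothetical, and we assume that |fold change| > 2 corresponds to |log2 fold change| > 1.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_differential_regions(regions, fdr_cutoff=0.01, min_abs_log2fc=1.0):
    """Keep regions with FDR < cutoff and |log2FC| > 1 (assumed to mean |FC| > 2).

    `regions` is a hypothetical list of dicts with keys "pvalue" and "log2fc".
    """
    fdr = benjamini_hochberg([r["pvalue"] for r in regions])
    return [
        dict(r, fdr=q)
        for r, q in zip(regions, fdr)
        if q < fdr_cutoff and abs(r["log2fc"]) > min_abs_log2fc
    ]
```

In this sketch a region passes only if both criteria hold, so a strongly significant region with a small fold change is still discarded.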
Hi-C library preparation
60 third-instar larval CNSs (~600,000 cells) per biological replicate were dissected in ice-cold PBS. CNSs or a single whole-bodied female fly were crushed in RPMI supplemented with 10% fetal bovine serum using a micro-pestle. Cells were fixed in 1% (v/v) paraformaldehyde for 10 min at room temperature. The Hi-C libraries were prepared using MboI and MseI as restriction enzymes. Restricted ends were marked with biotin, then ligated. Fragmented DNA was enriched for pairwise DNA junctions by biotin pull-down using Dynabeads MyOne Streptavidin T1 beads following the manufacturer’s instructions. Illumina sequencing libraries were prepared with standard protocols. 4 nM equimolar pools of multiplexed Hi-C libraries were subjected to paired-end sequencing on Illumina HiSeqX Ten and HiSeq4000 instruments.
Hi-C data processing
We pre-computed a table containing the positions of all restriction sites used for Hi-C present in the dm6 genome. The FASTQ read pairs were analyzed with a Perl script available for download in the Micmap63 package (see Code Availability) to locate and separate fusion sites using the patterns /GATCGATC/, /TTATAA/, /GATCTAA/ and /TTAGATC/. The maximal length of each read was trimmed at 60 nucleotides, then reads were mapped to the dm6 genome using Micmap and matched to their closest pre-computed genomic restriction site. Read pairs were discarded if they (1) mapped to non-unique positions in the reference genome, (2) had indels or >2 mismatches per read, (3) represented fusion of 2 oppositely oriented reads within 2 kb of each other, which may have not resulted from ligation of 2 digested fragments (these fragments were used to estimate local copy number status of the underpinning genomic region), (4) were likely additional copies of a given read pair, i.e., likely PCR duplicates. Only chromosomes 2, 3, 4, and X were considered.
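The fusion-site step can be sketched in Python as follows. This is an illustration only, not the Perl script from the Micmap package. The four patterns and the 60-nucleotide limit are taken from the text; the cut offset within each pattern, the handling of only the first junction per read, and trimming from the start of each segment are our assumptions.

```python
import re

# Junction patterns quoted in the text. The cut offsets are an assumption:
# each read segment keeps its own filled-in restriction end (GATC or TTA).
JUNCTION_CUTS = {"GATCGATC": 4, "TTATAA": 3, "GATCTAA": 4, "TTAGATC": 3}
JUNCTION_RE = re.compile("|".join(JUNCTION_CUTS))
MAX_READ_LENGTH = 60  # maximal read length retained for mapping


def split_at_junction(read):
    """Split a read at its first ligation junction and trim each segment."""
    match = JUNCTION_RE.search(read)
    if match is None:
        return [read[:MAX_READ_LENGTH]]
    cut = match.start() + JUNCTION_CUTS[match.group()]
    segments = [read[:cut], read[cut:]]
    # Drop empty segments (junction at the very end of the read).
    return [seg[:MAX_READ_LENGTH] for seg in segments if seg]
```

A read without any junction is returned as a single trimmed segment, so the same function can be applied to every read of a pair.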
To assess the correlation of biological replicates, samples were downsampled to 45 million contacts per replicate. Raw Hi-C contact matrices were created by binning Hi-C pairs at 10 kb resolution. These matrices were then normalized with the ICE normalization implemented in iced v0.5.267. Low coverage regions (bins with no contacts and those with the 2% smallest total number of contacts among bins) were filtered out. Pearson correlation coefficients were determined for every pair of normalized matrices by flattening each matrix and evaluating the Pearson correlation coefficient for the resulting vector, using only pairs of bins at a genomic distance below 1 Mb. The limitation on the distance was introduced to compare contacts at a scale relevant to the analyses performed in this manuscript which were at the level of CDs. Resulting Pearson correlation coefficients were ≥0.949 for all replicates, showing that they were well correlated and that WT and CTCF0 Hi-C matrices were globally similar. For the analyses presented in the main figures, pooled replicates of the same genotype were downsampled to 200 million contacts per genotype. Raw Hi-C contact matrices obtained by binning Hi-C pairs at 2 kb resolution were then normalized with the ICE normalization implemented in iced v0.5.267. Low coverage regions (bins with no contacts and those with the 2% smallest total number of contacts among bins) were filtered out before normalization (these regions are marked by gray lines in Hi-C maps shown in the figures).
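The replicate comparison can be sketched as below. This is a simplified illustration, not the code used in this study: matrices are assumed to be square lists of lists after normalization and filtering, only the upper triangle is flattened, and the function names are hypothetical.

```python
import math


def flatten_within_distance(matrix, bin_size, max_distance):
    """Upper-triangle entries whose genomic separation is below max_distance."""
    n = len(matrix)
    return [
        matrix[i][j]
        for i in range(n)
        for j in range(i, n)
        if (j - i) * bin_size < max_distance
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(m1, m2, bin_size=10_000, max_distance=1_000_000):
    """Correlate two contact matrices using only bin pairs closer than max_distance."""
    x = flatten_within_distance(m1, bin_size, max_distance)
    y = flatten_within_distance(m2, bin_size, max_distance)
    return pearson(x, y)
```

The default arguments mirror the 10 kb bins and 1 Mb distance limit stated above.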
For each normalized Hi-C contact matrix, CD boundaries were called using TopDom68. Given a window size w, a physical insulation score was defined for each bin i as:
$$\text{physical insulation score}_i = \log_2\left(\frac{\text{binSignal}_i}{\overline{\text{binSignal}}_{i,w}}\right) \tag{1}$$
where $\text{binSignal}_i$ is the average normalized Hi-C contact frequency between w bins upstream of bin i and w bins downstream of bin i determined by TopDom, and $\overline{\text{binSignal}}_{i,w}$ is the local average of binSignal on a window of size w around bin i. The strength of a boundary at bin i was thus estimated as the log2 of the binSignal value at bin i normalized by its local average on a window of size w. With this definition, lower insulation scores indicate stronger boundaries. We extracted CD boundaries and physical insulation scores for Hi-C matrices at 2 kb resolution using window sizes 20, 40, 80, and 160 kb. CD boundaries found with all window sizes were merged, and the average insulation score obtained with all window sizes was retained. To facilitate comparisons of CD boundaries found in WT and CTCF0 genotypes and avoid mismatches due to small fluctuations of CD boundary positions obtained with different window sizes or genotypes, groups of consecutive boundaries (i.e., within 2 kb of each other) were merged. Groups of consecutive boundaries were replaced by the boundary with the lowest insulation score (average of both genotypes for boundaries common to WT and CTCF0).
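A minimal Python sketch of the insulation score and of the merging of consecutive boundaries is given below. It is not the TopDom-based code used in this study. We assume that binSignal values are precomputed and strictly positive, that the local average is taken over bins i − w to i + w (clipped at chromosome ends) with w expressed in bins, and that "within 2 kb" corresponds to adjacent bins at 2 kb resolution.

```python
import math


def insulation_scores(bin_signal, w):
    """log2 of binSignal at bin i over its local mean (assumed window: i-w..i+w)."""
    n = len(bin_signal)
    scores = []
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        local_mean = sum(bin_signal[lo:hi]) / (hi - lo)
        scores.append(math.log2(bin_signal[i] / local_mean))
    return scores


def merge_consecutive(boundaries, scores, max_gap=1):
    """Replace runs of boundary bins at most max_gap bins apart by the
    bin with the lowest insulation score (the strongest boundary)."""
    merged, group = [], []
    for b in sorted(boundaries):
        if group and b - group[-1] > max_gap:
            merged.append(min(group, key=lambda k: scores[k]))
            group = []
        group.append(b)
    if group:
        merged.append(min(group, key=lambda k: scores[k]))
    return merged
```

Here `scores` can be a list indexed by bin or a dictionary keyed by bin, for example the insulation scores averaged over window sizes.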
Hi-C maps were visualized in R and Juicebox69 (see Supplementary Table 3 for links to interactive maps for browsing).
A/B compartment calling
A/B compartment calling was performed following the method proposed in Lieberman-Aiden et al.70. Each individual chromosome arm (chr2L, chr2R, chr3L, chr3R, chr4, chrX) was analyzed separately. Normalized Hi-C contact matrices at 2 kb resolution were considered after discarding invalid bins (low coverage regions) and bins around centromeres (chosen for exclusion as dm6 coordinates >22,170,000 for chr2L, <5,650,000 for chr2R, >22,900,000 for chr3L, <4,200,000 for chr3R). Observed-over-expected matrices were generated by dividing the normalized Hi-C contact matrices by the average number of normalized Hi-C contacts at the corresponding genomic distance. For each chromosome arm, the first eigenvector of the correlation matrix was obtained by principal component analysis of the observed-over-expected matrix. Each eigenvector was then centered around zero by subtracting its mean value, then multiplied by the sign of the Pearson correlation between the eigenvector and the number of expressed gene TSSs per 2 kb bin. Bins with positive eigenvector values were assigned to compartment A, and those with negative eigenvector values were assigned to compartment B. chr4 eigenvectors appeared to reflect a large-scale structure that separated the chromosome into two halves, and were thus excluded from Supplementary Fig. 3d.
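Two steps of this procedure, the observed-over-expected normalization and the orientation of the eigenvector, can be sketched in Python as follows. The eigenvector itself is assumed to be computed elsewhere by principal component analysis. Function names are hypothetical, and leaving bins with a value of exactly zero unassigned is our choice, since the text does not specify it.

```python
def observed_over_expected(matrix):
    """Divide each entry by the mean contact value at its genomic distance."""
    n = len(matrix)
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [
        [matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def assign_compartments(eigenvector, tss_counts):
    """Center the eigenvector, orient it by TSS density, and label bins A or B."""
    mean_e = sum(eigenvector) / len(eigenvector)
    centered = [v - mean_e for v in eigenvector]
    mean_t = sum(tss_counts) / len(tss_counts)
    # The sign of the Pearson correlation equals the sign of the covariance.
    cov = sum(c * (t - mean_t) for c, t in zip(centered, tss_counts))
    sign = 1 if cov >= 0 else -1
    labels = []
    for c in centered:
        value = sign * c
        labels.append("A" if value > 0 else "B" if value < 0 else None)
    return labels
```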
Comparison with CD boundaries from other Hi-C studies
To assess whether CD boundaries called in our study could correspond to small CDs resolved in higher resolution Hi-C contact maps (analyzed at 500 bp resolution instead of the 2 kb used here), we compared our CD boundary calls to CD coordinates published for Kc167 tissue culture cells by Eagen et al.14 and Ramírez et al.17 (converted from dm3 to dm6 genome coordinates using the liftOver tool, http://genome.ucsc.edu/cgi-bin/hgLiftOver). Because we could have mis-called such small domains as domain boundaries, we counted how many small (≤4 kb) CDs identified in those published studies were close (within 2 kb) to one of our CD boundaries. Eagen et al. did not report CDs smaller than 6 kb. Only 31 of our domain boundaries were within 2 kb of a ≤4 kb CD identified by Ramírez et al. Thus, very few (31/3970, or <1%) of our domain boundaries may correspond to a small domain defined by Ramírez et al. We next asked how many domain boundaries that disappear in CTCF0 mutants could correspond to small domains. Very few (4/567, or <1%) of our domain boundaries identified only in WT were within 2 kb of a ≤4 kb CD identified by Ramírez et al. Domain boundaries identified by Ramírez et al. are displayed together with domain boundaries identified in this study in all Hi-C screenshots throughout the manuscript for comparison.
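The counting logic can be sketched as below for a single chromosome. This is an illustration under assumptions: boundaries are given as single positions, published CDs as (start, end) pairs in the same coordinate system, and "within 2 kb" is interpreted as lying at most 2 kb outside the CD interval. The function name is hypothetical.

```python
def boundaries_near_small_domains(boundaries, domains, max_cd_size=4000, max_dist=2000):
    """Return boundaries lying within max_dist of a CD no larger than max_cd_size."""
    small = [(start, end) for start, end in domains if end - start <= max_cd_size]
    return [
        b for b in boundaries
        if any(start - max_dist <= b <= end + max_dist for start, end in small)
    ]
```

The reported fractions then correspond to the number of returned boundaries divided by the total number of boundaries examined.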
RNA-seq on larval CNSs
WT, CTCF0 and Cp190KO mutant third instar larval brains were dissected in ice-cold PBS. For RNA isolation, triplicates of 60 larval brains each were homogenized in TRIzol LS (ThermoFisher) with pestles (VWR) on ice. RNA was extracted following the manufacturer’s instructions, remaining DNA was digested with DNase I (Roche), and RNA was purified using RNAClean XP beads (Beckman Coulter). Strand-specific mRNA-seq libraries were prepared from 1 µg of total RNA after mRNA selection with NEBNext Oligo d(T)25 beads, using the NEBNext Ultra directional RNA library prep kit for Illumina following the manufacturer’s instructions. Multiplexed libraries were sequenced on one lane of a HiSeq2500 (100 bp paired-end for CTCF0 and WT control) or a HiSeq4000 (150 bp single-end for Cp190KO and WT control).
Differential RNA-seq analysis
RNA-seq reads were mapped both to the dm6 Drosophila melanogaster reference genome and to Flybase gene models and transcripts (dmel-all-r6.26.gtf.gz) using Micmap63. The results of both mappings were combined into spliced alignments in BAM file format. Then, htseq-count (v0.9.1) was used to produce read counts per gene71. Statistical analysis was performed in R (v3.5.1). Genes with <1 count per million in at least three replicate samples were filtered out using edgeR (v3.22.5)66. Normalization and differential expression analysis were performed in DESeq2 (v1.22.1)72 individually for both WT versus CTCF0 and WT versus Cp190KO samples. Statistical significance was tested by Wald test and the Benjamini-Hochberg method was used for multiple testing adjustment. A significance threshold of |fold change| > 1.5 and p-adjusted < 0.05 was used to identify DE genes. The R package ggplot2 (v3.2.1) was used for data visualization.
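The filtering and thresholding steps can be illustrated with the Python sketch below. It does not reproduce the edgeR or DESeq2 code used in this study. The expression filter implements a literal reading of the sentence above (a gene is removed when at least three samples have <1 count per million), which is an assumption, and |fold change| > 1.5 is applied on the log2 scale. Function names and the input layout are hypothetical.

```python
import math


def filter_low_expression(counts, min_cpm=1.0, n_samples=3):
    """Drop genes with CPM below min_cpm in at least n_samples samples.

    `counts` maps gene name to a list of raw read counts, one per sample.
    """
    genes = list(counts)
    n = len(counts[genes[0]])
    library_sizes = [sum(counts[g][s] for g in genes) for s in range(n)]
    kept = {}
    for g in genes:
        n_low = sum(
            1 for s in range(n)
            if counts[g][s] * 1e6 / library_sizes[s] < min_cpm
        )
        if n_low < n_samples:
            kept[g] = counts[g]
    return kept


def is_differentially_expressed(log2fc, padj, min_fold=1.5, alpha=0.05):
    """Apply the |fold change| > 1.5 and adjusted p-value < 0.05 thresholds."""
    return padj is not None and padj < alpha and abs(log2fc) > math.log2(min_fold)
```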
RNA-FISH
Labeled RNA probes were generated by in vitro transcription with Dig-UTP labeling mix (Roche 11277073910) and T7 RNA polymerase (Roche 10881767001) antisense to full-length complementary DNA clones of SP1029 (FI20034) and IFT52 (MIP14443), genomic DNA amplified from dm6 coordinates chr3L: 10263888-10266244, or cDNAs amplified using gene-specific primers from a cDNA library prepared from Drosophila embryos (see Supplementary Data 10 for primer sequences). After DNase I digestion for 20 min at 37 °C, probes were fragmented by incubating 20 min at 65 °C in 60 mM Na2CO3, 40 mM NaHCO3 pH 10.2, precipitated in 300 mM sodium acetate pH 5.2, 1.25 M LiCl, 50 mg/ml tRNA and 80% EtOH, resuspended in 50% formamide, 75 mM sodium citrate pH 5, 750 mM NaCl, 100 µg/ml salmon sperm DNA, 50 µg/ml heparin and 0.1% Tween20, and stored at −20 °C. Embryos or third instar larval cuticles were fixed in 4% paraformaldehyde for 30 min at room temperature, washed, and then stored in 100% MeOH at −20 °C for at least overnight. Samples were rehydrated in PBS with 0.1% Tween20, post-fixed in 4% paraformaldehyde for 20 min at room temperature, progressively equilibrated to hybridization buffer (50% formamide, 75 mM sodium citrate pH 5, 750 mM NaCl) and heated to 65 °C. RNA probes were diluted 1:50 in hybridization buffer, denatured at 80 °C for 10 min then placed on ice, and added to the samples overnight shaking at 65 °C. Samples were washed 6 times 10 min in hybridization buffer at 65 °C, then progressively equilibrated to PBS with 0.1% Triton X-100. Samples were incubated overnight at 4 °C in anti-dig peroxidase (Roche 11207733910) diluted 1:2000 in PBS, 0.1% Triton X-100, 1× Western blocking reagent (Sigma 1921673). Samples were washed six times 10 min in PBS with 0.1% Tween20, labeled with Cyanine 3 tyramide in the TSA Plus kit (Perkin Elmer NEL753001KT) for 3 min at room temperature, washed 6 times 10 min in PBS with 0.1% Tween20, and finally mounted with DAPI to stain DNA. 
Images were acquired on a Zeiss LSM 880 microscope with a ×20 objective and visualized with Fiji software v2.1.0/1.53c.
Insulator reporter
An insulator reporter (Fig. 4a) was designed with an enhancer (OpIE2) equidistant from EGFP and mCherry fluorescent reporters with basal Hsp70 promoters. A gypsy insulator is present in the reporter plasmid, downstream of the EGFP transcription unit. Selected CTCF binding sites (Supplementary Fig. 4c) were PCR-amplified from genomic DNA and cloned in between the enhancer and EGFP. Control reporters had a neutral spacer (a fragment of the bacterial Kanamycin resistance gene) or the gypsy insulator in between the enhancer and EGFP. In addition, one CTCF binding site (fragment N) was mutagenized by PCR to mutate 2 bp in a CTCF motif (ATGTCAGAGGGCGCT converted to ATGTCAGACAGCGCT). All plasmids were transfected in parallel into S2 cells (originally purchased from ATCC, reference number CRL-1963) in triplicates in a 96-well plate using 100 ng of reporter plasmid and Effectene (QIAGEN) following the manufacturer’s instructions. After 48 h, fluorescence was measured on a NovoCyte Flow Cytometer (ACEA) using FITC and PE-TexasRed detection settings. Recordings were gated to discard measurements of untransfected cells (Supplementary Fig. 4a). Distributions of mCherry/EGFP fluorescence ratios in thousands of single transfected cells were plotted and the median mCherry/EGFP ratio was extracted for each experiment. The average of these median values obtained for each replicate is plotted in Fig. 4c as a function of the total CTCF ChIP-seq read counts in S2 cells on the cloned fragment tested in the insulator reporter—extracted using bedtools multicov73 applied to CTCF ChIP-seq data in S2 cells42 (GEO accession GSM1015410).
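The readout plotted in Fig. 4c can be summarized with the short Python sketch below. It assumes that gating has already been applied and that each replicate is available as a list of (EGFP, mCherry) intensities per cell. Function names are hypothetical, and excluding cells with non-positive EGFP signal is our assumption.

```python
from statistics import mean, median


def replicate_median_ratio(cells):
    """Median mCherry/EGFP ratio over the gated single cells of one replicate."""
    return median(mcherry / egfp for egfp, mcherry in cells if egfp > 0)


def insulator_readout(replicates):
    """Average of the per-replicate median mCherry/EGFP ratios."""
    return mean(replicate_median_ratio(cells) for cells in replicates)
```

Because the tested fragment sits between the enhancer and EGFP, a higher mCherry/EGFP ratio indicates stronger enhancer blocking in this design.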
Recombinant protein pull-downs
Purification of N-terminal CTCF constructs
Sequences encoding WT or Y248A F250A mutant versions of the dmCTCF N-terminus (residues 1-293) were cloned into a pET-based vector with an N-terminal GFP-tag and a C-terminal His6 tag. The constructs were transformed into an E. coli expression strain (Rosetta), and 1 liter cultures were grown in TB-medium to an OD(600) of 1.0 at 37 °C. The culture temperature was then reduced to 18 °C and IPTG was added to a final concentration of 0.5 mM. Cells were harvested after overnight incubation at 18 °C by centrifugation, and the cell pellet was resuspended in 2 volumes of Lysis Buffer (50 mM Tris pH 7.5, 300 mM NaCl, 5% glycerol, 25 mM Imidazole). Cells were lysed by sonication, and the lysate was clarified by centrifugation at 50,000 × g at 4 °C. The supernatant was loaded onto a 5 ml HisTrap column (GE Healthcare), washed extensively with Lysis Buffer, and the bound material was eluted with Lysis Buffer supplemented with 400 mM Imidazole. The eluate was then diluted 10-fold with buffer QA (20 mM Tris pH 7.5, 100 mM NaCl, 5% glycerol), and the resulting solution was loaded onto a 5 ml HiTrap-Q column (GE Healthcare). After washing the column with 5 column volumes (cV) of QA buffer, the bound material was eluted with a 5 cV gradient from QA to QB (20 mM Tris pH 7.5, 1000 mM NaCl, 5% glycerol). Fractions containing the CTCF protein at sufficient purity were identified by SDS-PAGE followed by Coomassie staining. Protein aliquots were snap-frozen in liquid nitrogen and stored at −80 °C.
Purification of SA-Vtd complex
The sequences encoding dmSA (residues 102–1085) and Vtd (Rad21) (residues 273-458) were cloned into a pET-based vector with an N-terminal His10-TwinStrep-3C tag on SA. The complex was expressed in 1 liter of E.coli (Rosetta) grown in TB. Growth, induction of expression, and cell harvesting and lysis were carried out as described for CTCF constructs. Clarified lysates were loaded onto a 5 ml StrepTrap column (GE Healthcare), washed with 5 cV of Lysis buffer, and bound material was eluted with 8 cV of elution buffer (20 mM Tris pH 7.5, 100 mM NaCl, 5 % glycerol, 2.5 mM des-thiobiotin). The eluate was loaded on a 5 ml HiTrap-Q column (GE Healthcare), and after washing the column with 5 column volumes (cV) of QA buffer, the bound material was eluted with a 5 cV gradient from QA to QB (20 mM Tris pH 7.5, 1000 mM NaCl, 5% glycerol). Fractions containing the purified SA-Vtd complex were identified by SDS-PAGE and Coomassie staining, pooled, aliquoted, snap-frozen in liquid nitrogen, and stored at −80 °C.
Pulldowns between CTCF and SA-Vtd
Proteins were diluted to a final concentration of 2.5 µM in 500 µl of binding buffer (20 mM Tris pH 7.5, 150 mM potassium acetate, 10 % glycerol) and allowed to bind to each other at 4 °C for 2 h. Twenty microliters of this solution was removed as ‘input’ sample and boiled in SDS-PAGE loading buffer. GFP-binder beads (Agarose beads covalently bound to GFP-nanobody; 20 µl per reaction) were washed in binding buffer and added to the binding reactions for 30 min at 4 °C on a rotating wheel to bind to the GFP-tagged CTCF construct. Beads were harvested by centrifugation (1 min, 700 × g) and washed twice with 1 ml of binding buffer. The final immobilized material was eluted by boiling in 50 µl of SDS-PAGE loading buffer. Inputs and pulldowns were loaded onto a 12% SDS-PAGE gel, and the proteins were visualized by staining with Coomassie.
Pull-downs between C-terminal CTCF constructs and Cp190 BTB domain
Expression plasmids encoding GFP-His-tagged constructs of the C-terminal domain of CTCF (all with Ampicillin resistance) were co-transformed with an expression plasmid carrying a His-tagged Cp190 BTB-domain (with Kanamycin resistance) into the E. coli Rosetta strain. Colonies were inoculated in 10 ml TB cultures and grown at 37 °C to an OD(600) of 1. The culture temperature was then reduced to 18 °C, and 0.5 mM IPTG was added to induce protein expression. Cells were harvested after overnight incubation at 18 °C, and the pellets were resuspended in 2 volumes of lysis buffer (50 mM Tris pH 7.5, 200 mM NaCl, 5% glycerol, 25 mM Imidazole). Cells were lysed by sonication and the lysate was clarified by centrifugation at 16,000 × g for 10 min at 4 °C. The lysate was split into two halves, which were incubated for 1 h at 4 °C with either 20 µl of GFP-binder resin or 20 µl of Ni(2+)-NTA resin, to pull down only CTCF constructs or both CTCF and Cp190-BTB, respectively. The beads were then washed three times with 1 ml of Lysis buffer to remove non-specifically bound proteins. The bound material was eluted either by boiling in SDS-loading buffer (for GFP pulldowns) or by incubation with Lysis buffer supplemented with 500 mM Imidazole (for Ni(2+)-NTA pulldowns), and analysed by SDS-PAGE followed by Coomassie staining.
Co-purification of CTCF interactors from embryo nuclear extracts
Soluble nuclear protein extracts were prepared from WT (OregonR) 0–14 h embryos. Thirty grams of embryos were dechorionated, taken up in 30 ml of NU1 buffer (15 mM HEPES pH 7.6, 10 mM KCl, 5 mM MgCl2, 0.1 mM EDTA pH 8, 0.5 mM EGTA pH 8, 350 mM sucrose, 2 mM DTT, 0.2 mM PMSF), and dounce-homogenized. The lysate was filtered through a double layer of miracloth, then centrifuged 15 min at 9000 rpm at 4 °C. The nuclei pellet was resuspended and lysed in 30 ml of high-salt buffer (15 mM HEPES pH 7.9, 400 mM KCl, 1.5 mM MgCl2, 0.2 mM EDTA, 20% glycerol, 1 mM DTT, protease inhibitor cocktail) rotating for 20 min at 4 °C, and ultracentrifuged 1 h with a SW40 rotor at 38000 rpm at 4 °C. The lipid layer was removed by suction and the soluble nuclear extract was dialyzed into 15 mM HEPES pH 7.9, 200 mM KCl, 1.5 mM MgCl2, 0.2 mM EDTA pH 7.9, 20% glycerol, 1 mM DTT with a 6-8 kDa molecular weight cut-off membrane. Soluble nuclear extract was snap-frozen in liquid nitrogen, and stored at −80 °C. Drosophila CTCF1–293 fused to an N-terminal GFP-3C tag and a 3C-His6 C-terminal tag was purified from bacterial lysates by Ni-NTA affinity then ion-exchange chromatography as described above. Purified GFP-3C-CTCF1–293-3C-His6 was immobilized on GFP binder beads, of which 30 µl bead volume were then incubated with 6 mg of Drosophila embryo nuclear extract in a total volume of 10 ml of IP buffer (50 mM Tris-Cl pH 7.5, 150 mM potassium acetate, 2 mM MgCl2, 10% glycerol, 0.1 mM DTT, 0.2% Igepal, 1× complete protease inhibitor cocktail) rotating for 3 h at 4 °C. Beads were washed three times with IP buffer, rotating for 10 min at 4 °C for each wash. Proteins were eluted with 3C protease, adjusted to 1× SDS-loading buffer and loaded on an SDS-PAGE gel. A duplicate experiment was similarly performed with nuclear protein extracts prepared from another biological replicate embryo sample.
Peptides covering the entire CTCF full-length protein were recovered, indicating that pull-downs with CTCF N-terminus recovered interactors of full-length CTCF.
Mass spectrometry analysis
Protein samples were separated by SDS-PAGE and stained by Coomassie. Gel lanes between 15–300 kDa were excised into five pieces and digested with sequencing-grade trypsin. Extracted tryptic peptides were dried and resuspended in 0.05% trifluoroacetic acid, 2% (v/v) acetonitrile. Tryptic peptide mixtures were injected on a Dionex RSLC 3000 nanoHPLC system (Dionex, Sunnyvale, CA, USA) interfaced via a nanospray source to a high-resolution mass spectrometer LTQ-Orbitrap Velos Pro. Peptides were loaded onto a trapping microcolumn Acclaim PepMap100 C18 (20 mm × 100 μm ID, 5 μm, Dionex) before separation on a C18 reversed-phase custom-packed column using a gradient from 4 to 76% acetonitrile in 0.1% formic acid. In data-dependent acquisition controlled by Xcalibur software (Thermo Fisher), the 10 most intense multiply charged precursor ions detected with a full MS survey scan in the Orbitrap were selected for collision-induced dissociation (CID, normalized collision energy NCE = 35%) and analysis in the ion trap. The window for precursor isolation was of 4.0 m/z units around the precursor and selected fragments were excluded for 60 s from further analysis. Data files were analyzed with MaxQuant 1.6.3.4 incorporating the Andromeda search engine74,75 for protein identification and quantification based on IBAQ intensities76. The following modifications were specified: cysteine carbamidomethylation (fixed) and methionine oxidation and protein N-terminal acetylation (variable). The sequence databases used for searching were Drosophila melanogaster and Escherichia coli reference proteomes based on the UniProt database (www.uniprot.org, versions of 31 January 2019, containing 21,939 and 4915 sequences respectively), and a contaminant database containing the most usual environmental contaminants and the enzymes used for digestion (keratins, trypsin, etc). Mass tolerance was 4.5 ppm on precursors (after recalibration) and 0.5 Da on CID fragments.
Both peptide and protein identifications were filtered at 1% FDR relative to hits against a decoy database built by reversing protein sequences. The MaxQuant output table proteinGroups.txt was processed with Perseus software77 to remove proteins matched to the contaminants database as well as proteins identified only by modified peptides or reverse database hits. Next, the table was filtered to retain only proteins identified by a minimum of two peptides, the IBAQ quantitative values were log-2 transformed and missing values imputed with a constant value of 9.
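The filtering, transformation and imputation steps described above were carried out in Perseus; the sketch below only mirrors their logic in plain Python for illustration. The dictionary keys (`id`, `contaminant`, `reverse`, `only_by_site`, `peptides`, `ibaq`) are hypothetical stand-ins for the corresponding proteinGroups.txt columns, and treating a zero or absent intensity as a missing value is an assumption.

```python
import math

MIN_PEPTIDES = 2          # minimum number of peptides per protein, as stated in the text
IMPUTED_LOG2_IBAQ = 9.0   # constant used to replace missing values, as stated in the text


def process_protein_groups(rows):
    """Filter protein groups and return {protein id: log2 IBAQ value}.

    Each row is a dict with hypothetical keys: 'id', 'contaminant',
    'reverse', 'only_by_site', 'peptides' and 'ibaq'.
    """
    result = {}
    for row in rows:
        # Drop contaminants, reverse (decoy) hits and proteins
        # identified only by modified peptides.
        if row["contaminant"] or row["reverse"] or row["only_by_site"]:
            continue
        # Keep only proteins identified by at least two peptides.
        if row["peptides"] < MIN_PEPTIDES:
            continue
        ibaq = row["ibaq"]
        # Assumption: a zero or absent intensity counts as a missing value.
        if ibaq is None or ibaq <= 0:
            result[row["id"]] = IMPUTED_LOG2_IBAQ
        else:
            result[row["id"]] = math.log2(ibaq)
    return result


example_rows = [
    {"id": "P1", "contaminant": False, "reverse": False, "only_by_site": False, "peptides": 5, "ibaq": 1024.0},
    {"id": "P2", "contaminant": True, "reverse": False, "only_by_site": False, "peptides": 8, "ibaq": 4096.0},
    {"id": "P3", "contaminant": False, "reverse": False, "only_by_site": False, "peptides": 1, "ibaq": 2048.0},
    {"id": "P4", "contaminant": False, "reverse": False, "only_by_site": False, "peptides": 3, "ibaq": 0.0},
]
print(process_protein_groups(example_rows))
```

In this toy input, P2 is dropped as a contaminant, P3 is dropped for having a single peptide, and P4 receives the imputed constant.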
Generation of Cp190KO animals
We cloned ~1.5 kb homology arms (dm6 coordinates chr3R:15276111-15274519 and chr3R:15271056-15269404) into the pHD-DsRed-attP vector78. Guide RNAs close to the start and stop codons of the Cp190 open reading frame were cloned into pCFD3 vector79. Plasmids were co-injected into nanos-Cas9 embryos79. Experiments were performed in animals transheterozygous for two independent knockout alleles.
Generation of Cp190⁰ animals
Cp190KO mutants were rescued into viable and fertile adults with an FRT-flanked 7 kb Cp190 genomic rescue transgene (dm6 coordinates chr3R:15269425-15276409) amplified by PCR. The Cp190 rescue cassette was excised from male and female germlines through nanos-Gal4:VP16 (NGVP16)-driven expression of UAS-FLP. Cp190⁰ animals were collected from crosses between such males and females.
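As a quick consistency check on the dm6 coordinates quoted for the homology arms and the rescue transgene, the sketch below computes the length of each interval. It assumes 1-based inclusive coordinates and accepts either coordinate order; the function and dictionary names are illustrative.

```python
def interval_length(region):
    """Length in bp of a 'chrom:a-b' region, assuming 1-based inclusive coordinates."""
    _, span = region.split(":")
    first, second = (int(value) for value in span.split("-"))
    # The homology arms are written high-to-low, so take the absolute difference.
    return abs(first - second) + 1


REGIONS = {
    "homology arm 1": "chr3R:15276111-15274519",
    "homology arm 2": "chr3R:15271056-15269404",
    "rescue transgene": "chr3R:15269425-15276409",
}

for name, region in REGIONS.items():
    print(f"{name}: {interval_length(region)} bp")
```

Under this convention the two arms are 1593 bp and 1653 bp long and the rescue fragment is 6985 bp, in line with the approximate sizes of 1.5 kb and 7 kb given in the text.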
Statistics and reproducibility
All described replicate experiments are biological (not technical) replicates. For all box plots: center line, median; box limits, upper and lower quartiles; upper whisker extends to the largest value no further than 1.5× interquartile range from the upper hinge; lower whisker extends to the smallest value no further than 1.5× interquartile range from the lower hinge; points, outliers. Figure 2g: This experiment was repeated twice from independently grown bacterial cultures, with similar results. Figure 3e and Supplementary Fig. 1a–b: n = 10 independent third instar larvae per genotype were examined over two independent experiments each. All animals showed similar expression patterns for a given gene, which were characteristic of each genotype. RNA-FISH probes for additional genes were tested on larval nervous systems but discarded because they showed an inconsistent pattern (variable, asymmetric signal in the optic lobes in all genotypes) that we concluded was non-specific background. Figure 6c, e: n = 50 independent embryos per genotype were examined over two independent RNA-FISH experiments each. All animals showed similar expression patterns for a given gene, which were characteristic of each genotype. Supplementary Fig. 2a: The experiment was repeated twice with independently prepared extracts, with similar results. Supplementary Fig. 5b: The pull-down experiments were repeated twice from independently grown bacterial cultures, with similar results.
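The box plot convention stated above can be illustrated with a short sketch that computes the hinges, whiskers and outliers for a list of values. It is not the plotting code used for the figures; in particular, computing quartiles by linear interpolation (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, and the example values are made up.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Return median, hinges, whisker ends and outliers for a list of numbers."""
    data = sorted(values)
    # Assumption: quartiles by linear interpolation between data points.
    lower_hinge, median, upper_hinge = quantiles(data, n=4, method="inclusive")
    iqr = upper_hinge - lower_hinge
    upper_fence = upper_hinge + 1.5 * iqr
    lower_fence = lower_hinge - 1.5 * iqr
    return {
        "median": median,
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        # Whiskers end at the most extreme data points inside the fences.
        "upper_whisker": max(v for v in data if v <= upper_fence),
        "lower_whisker": min(v for v in data if v >= lower_fence),
        # Points beyond the fences are drawn individually as outliers.
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Made-up example values with one extreme point.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

For these example values the hinges are 3.25 and 7.75, so the upper whisker stops at 9 and the value 30 is reported as an outlier.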
Reporting summary
Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.
Acknowledgements
We thank Winship Herr, Richard Benton, Jean-Yves Roignant and Naoko Mizuno for critical comments on the manuscript. We thank Patrice Waridel and Manfredo Quadroni for mass spectrometry analyses. We thank René Dreos for advice on statistical analyses. MCG thanks Eileen Furlong for support during early phases preceding this work. Deep sequencing was performed at the Genomic Technologies Facility (GTF), mass spectrometry was performed at the Protein Analysis Facility (PAF) and imaging was performed at the Cellular Imaging Facility (CIF) at the Center for Integrative Genomics, Faculty of Biology and Medicine, University of Lausanne, Switzerland. This work was supported by the Swiss National Science Foundation (SNSF #184715 to M.C.G.) and the University of Lausanne.
Author contributions
M.C.G. conceived the project and designed experiments. E.L.A. and A.O. conceived and designed Hi-C experiments. A.K., G.M., I.O., A.O., M.T., P.C., A.S., and M.C.G. performed the experiments. J.D., P.C., A.S., O.D., F.M., C.I., Y.E., D.W., M.S.S., and N.G. analyzed data. Y.E. created links for interactive browsing of Hi-C and ChIP-seq data in Juicebox. M.C.G. prepared the manuscript with input from all authors.
Data availability
All sequencing data (Hi-C, ChIP-seq, RNA-seq) that support the findings of this study were deposited in Gene Expression Omnibus with accession code GSE146752. Hi-C maps are browsable on Juicebox (links in Supplementary Table 3). Mass spectrometry proteomics data were deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD019487. All other relevant data supporting the key findings of this study are available within the article and its Supplementary Information files or from the corresponding author upon reasonable request. Additional information is provided in Supplementary Data files 1–10 and a reporting summary for this Article is available as a Supplementary Information file. Source data are provided with this paper.
Code availability
All software used as described in the Methods to map, visualize and analyze data is published open source and freely available for download at the following links: “Micmap v2.20200223 [https://github.com/sib-swiss/micmap]”; “DESeq2 v1.22.2 [https://bioconductor.org/packages/release/bioc/html/DESeq2.html]”; “HTSeq v0.9.1 [https://github.com/simon-anders/htseq]”; “iced v0.5.2 [https://github.com/hiclib/iced]”; “TopDom v0.0.2 [https://github.com/jasminezhoulab/TopDom]”; “R v3.5.1 [https://www.R-project.org/]” with packages “csaw v1.16.1 [https://bioconductor.org/packages/release/bioc/html/csaw.html]”, “edgeR v3.22.5 [https://bioconductor.org/packages/release/bioc/html/edgeR.html]”, “Eulerr v6.0.0 [https://cran.r-project.org/package=eulerr]” and “ggplot2 v3.1.0 [https://ggplot2.tidyverse.org/]”; “bedtools multicov v2.29.2 [https://bedtools.readthedocs.io/en/latest/]”; “Juicebox v1.5.1 [aidenlab.org/juicebox]”. Custom scripts are provided at [https://github.com/gambettalab/kaushal2020/].
Competing interests
The authors declare no competing interests.
Footnotes
Peer review information Nature Communications thanks Elphège Nora, Sergey Razin, Félix Recillas-Targa and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Anjali Kaushal, Giriram Mohana, Julien Dorier.
Contributor Information
Erez Lieberman Aiden, Email: erez@erez.com.
Maria Cristina Gambetta, Email: mariacristina.gambetta@unil.ch.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-021-21366-2.
References
- 1. Rao SSP, et al. Cohesin loss eliminates all loop domains. Cell. 2017;171:305–309.e24. doi: 10.1016/j.cell.2017.09.026.
- 2. Rowley MJ, et al. Evolutionarily conserved principles predict 3D chromatin organization. Mol. Cell. 2017;67:837–852.e7.
- 3. Rowley MJ, Corces VG. Organizational principles of 3D genome architecture. Nat. Rev. Genet. 2018;19:1–800. doi: 10.1038/s41576-018-0060-8.
- 4. Rao SSP, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021.
- 5. Sanborn AL, et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl Acad. Sci. USA. 2015;112:E6456–E6465. doi: 10.1073/pnas.1518552112.
- 6. Fudenberg G, et al. Formation of chromosomal domains by loop extrusion. Cell Rep. 2016;15:2038–2049.
- 7. Haarhuis JHI, et al. The cohesin release factor WAPL restricts chromatin loop extension. Cell. 2017;169:693–707.e14. doi: 10.1016/j.cell.2017.04.013.
- 8. Schwarzer W, et al. Two independent modes of chromatin organization revealed by cohesin removal. Nature. 2017;551:51–56. doi: 10.1038/nature24281.
- 9. Wutz G, et al. Topologically associating domains and chromatin loops depend on cohesin and are regulated by CTCF, WAPL, and PDS5 proteins. EMBO J. 2017;36:3573–3599. doi: 10.15252/embj.201798004.
- 10. Li Y, et al. The structural basis for cohesin–CTCF-anchored loops. Nature. 2020;578:1–9. doi: 10.1038/s41586-019-1910-z.
- 11. Nora EP, et al. Targeted degradation of CTCF decouples local insulation of chromosome domains from genomic compartmentalization. Cell. 2017;169:930–944.e22. doi: 10.1016/j.cell.2017.05.004.
- 12. Chathoth KT, Zabet NR. Chromatin architecture reorganization during neuronal cell differentiation in Drosophila genome. Genome Res. 2019;29:613–625. doi: 10.1101/gr.246710.118.
- 13. Cubeñas-Potts C, et al. Different enhancer classes in Drosophila bind distinct architectural proteins and mediate unique chromatin interactions and 3D architecture. Nucleic Acids Res. 2017;45:1714–1730. doi: 10.1093/nar/gkw1114.
- 14. Eagen KP, Aiden EL, Kornberg RD. Polycomb-mediated chromatin loops revealed by a subkilobase-resolution chromatin interaction map. Proc. Natl Acad. Sci. USA. 2017;114:8764–8769. doi: 10.1073/pnas.1701291114.
- 15. Wang Q, Sun Q, Czajkowsky DM, Shao Z. Sub-kb Hi-C in D. melanogaster reveals conserved characteristics of TADs between insect and mammalian cells. Nat. Commun. 2018;9:188. doi: 10.1038/s41467-017-02526-9.
- 16. Hug CB, Grimaldi AG, Kruse K, Vaquerizas JM. Chromatin architecture emerges during zygotic genome activation independent of transcription. Cell. 2017;169:216–228.e19. doi: 10.1016/j.cell.2017.03.024.
- 17. Ramírez F, et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 2018;9:189. doi: 10.1038/s41467-017-02525-w.
- 18. Ulianov SV, et al. Active chromatin and transcription play a key role in chromosome partitioning into topologically associating domains. Genome Res. 2016;26:70–84.
- 19. Moore JM, et al. Loss of maternal CTCF is associated with peri-implantation lethality of Ctcf null embryos. PLoS ONE. 2012;7:e34915. doi: 10.1371/journal.pone.0034915.
- 20. Soshnikova N, Montavon T, Leleu M, Galjart N, Duboule D. Functional analysis of CTCF during mammalian limb development. Dev. Cell. 2010;19:819–830. doi: 10.1016/j.devcel.2010.11.009.
- 21. Gambetta MC, Furlong EEM. The insulator protein CTCF is required for correct Hox gene expression, but not for embryonic development in Drosophila. Genetics. 2018;210:129–136. doi: 10.1534/genetics.118.301350.
- 22. Arzate-Mejía RG, Cerecedo-Castillo AJ, Guerrero G, Furlan-Magaril M, Recillas-Targa F. In situ dissection of domain boundaries affect genome topology and gene transcription in Drosophila. Nat. Commun. 2020;11:894. doi: 10.1038/s41467-020-14651-z.
- 23. Ghavi-Helm Y, et al. Highly rearranged chromosomes reveal uncoupling between genome topology and gene expression. Nat. Genet. 2019;51:1272–1282. doi: 10.1038/s41588-019-0462-3.
- 24. Yokoshi M, Segawa K, Fukaya T. Visualizing the role of boundary elements in enhancer-promoter communication. Mol. Cell. 2020;78:224–235.e5.
- 25. Andrey G, et al. A switch between topological domains underlies HoxD genes collinearity in mouse limbs. Science. 2013;340:1234167–1234167. doi: 10.1126/science.1234167.
- 26. Nora EP, et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature. 2012;485:381–385. doi: 10.1038/nature11049.
- 27. Symmons O, et al. Functional and topological characteristics of mammalian regulatory domains. Genome Res. 2014;24:390–400. doi: 10.1101/gr.163519.113.
- 28. Flavahan WA, et al. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature. 2016;529:110–114. doi: 10.1038/nature16490.
- 29. Narendra V, et al. Transcription. CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. Science. 2015;347:1017–1021. doi: 10.1126/science.1262088.
- 30. Narendra V, Bulajić M, Dekker J, Mazzoni EO, Reinberg D. CTCF-mediated topological boundaries during development foster appropriate gene regulation. Genes Dev. 2016;30:2657–2662. doi: 10.1101/gad.288324.116.
- 31. Despang A, et al. Functional dissection of the Sox9-Kcnj2 locus identifies nonessential and instructive roles of TAD architecture. Nat. Genet. 2019;51:1263–1271. doi: 10.1038/s41588-019-0466-z.
- 32. Paliou C, et al. Preformed chromatin topology assists transcriptional robustness of Shh during limb development. Proc. Natl Acad. Sci. USA. 2019;116:12390–12399. doi: 10.1073/pnas.1900672116.
- 33. Rodríguez-Carballo E, et al. The HoxD cluster is a dynamic and resilient TAD boundary controlling the segregation of antagonistic regulatory landscapes. Genes Dev. 2017;31:2264–2281. doi: 10.1101/gad.307769.117.
- 34. Bartkuhn M, et al. Active promoters and insulators are marked by the centrosomal protein 190. EMBO J. 2009;28:877–888. doi: 10.1038/emboj.2009.34.
- 35. Van Bortle K, et al. Drosophila CTCF tandemly aligns with other insulator proteins at the borders of H3K27me3 domains. Genome Res. 2012;22:2176–2187. doi: 10.1101/gr.136788.111.
- 36. Schwartz YB, et al. Nature and function of insulator protein binding sites in the Drosophila genome. Genome Res. 2012;22:2188–2198. doi: 10.1101/gr.138156.112.
- 37. Tomancak P, et al. Global analysis of patterns of gene expression during Drosophila embryogenesis. Genome Biol. 2007;8:R145. doi: 10.1186/gb-2007-8-7-r145.
- 38. Brown JB, et al. Diversity and dynamics of the Drosophila transcriptome. Nature. 2014;512:393–399. doi: 10.1038/nature12962.
- 39. Nora EP, et al. Molecular basis of CTCF binding polarity in genome folding. Nat. Commun. 2020;11:5612. doi: 10.1038/s41467-020-19283-x.
- 40. Kyrchanova O, et al. The insulator functions of the Drosophila polydactyl C2H2 zinc finger protein CTCF: necessity versus sufficiency. Sci. Adv. 2020;6:eaaz3152. doi: 10.1126/sciadv.aaz3152.
- 41. Geyer PK, Corces VG. DNA position-specific repression of transcription by a Drosophila zinc finger protein. Genes Dev. 1992;6:1865–1873. doi: 10.1101/gad.6.10.1865.
- 42. Ong C-T, Van Bortle K, Ramos E, Corces VG. Poly(ADP-ribosyl)ation regulates insulator function and intrachromosomal interactions in Drosophila. Cell. 2013;155:148–159. doi: 10.1016/j.cell.2013.08.052.
- 43. Cuartero S, Fresán U, Reina O, Planet E, Espinàs ML. Ibf1 and Ibf2 are novel CP190‐interacting proteins required for insulator function. EMBO J. 2014;33:637–647. doi: 10.1002/embj.201386001.
- 44. Hansen AS, Pustova I, Cattoglio C, Tjian R, Darzacq X. CTCF and cohesin regulate chromatin loop stability with distinct dynamics. eLife. 2017;6:2848. doi: 10.7554/eLife.25776.
- 45. Bonchuk A, et al. Functional role of dimerization and CP190 interacting domains of CTCF protein in Drosophila melanogaster. BMC Biol. 2015;13:63. doi: 10.1186/s12915-015-0168-7.
- 46. Savitsky M, Kim M, Kravchuk O, Schwartz YB. Distinct roles of chromatin insulator proteins in control of the Drosophila Bithorax complex. Genetics 115.179309 (2016).
- 47. Nègre N, et al. A comprehensive map of insulator elements for the Drosophila genome. PLoS Genet. 2010;6:e1000814. doi: 10.1371/journal.pgen.1000814.
- 48. Rowley MJ, et al. Condensin II counteracts cohesin and RNA polymerase II in the establishment of 3D chromatin organization. Cell Rep. 2019;26:2890–2903.e3. doi: 10.1016/j.celrep.2019.01.116.
- 49. Pugacheva EM, et al. CTCF mediates chromatin looping via N-terminal domain-dependent cohesin retention. Proc. Natl Acad. Sci. USA 201911708 (2020).
- 50. Wendt KS, et al. Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature. 2008;451:796–801. doi: 10.1038/nature06634.
- 51. Tang Z, et al. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell. 2015;163:1611–1627. doi: 10.1016/j.cell.2015.11.024.
- 52. Wutz G, et al. ESCO1 and CTCF enable formation of long chromatin loops by protecting cohesinSTAG1 from WAPL. eLife. 2020;9:e52091. doi: 10.7554/eLife.52091.
- 53. Hansen AS. CTCF as a boundary factor for cohesin-mediated loop extrusion: evidence for a multi-step mechanism. Nucleus. 2020;11:132–148. doi: 10.1080/19491034.2020.1782024.
- 54. Haberle V, et al. Transcriptional cofactors display specificity for distinct types of core promoters. Nature. 2019;570:801. doi: 10.1038/s41586-019-1210-7.
- 55. Kraft K, et al. Serial genomic inversions induce tissue-specific architectural stripes, gene misexpression and congenital malformations. Nat. Cell Biol. 2019;21:305–310. doi: 10.1038/s41556-019-0273-x.
- 56. Özdemir I, Gambetta MC. The role of insulation in patterning gene expression. Genes. 2019;10:767. doi: 10.3390/genes10100767.
- 57. Yatskevich S, Rhodes J, Nasmyth K. Organization of chromosomal DNA by SMC complexes. Annu Rev. Genet. 2019;53:445–482. doi: 10.1146/annurev-genet-112618-043633.
- 58. Gerasimova TI, Lei EP, Bushey AM, Corces VG. Coordinated control of dCTCF and gypsy chromatin insulators in Drosophila. Mol. Cell. 2007;28:761–772. doi: 10.1016/j.molcel.2007.09.024.
- 59. Mohan M, et al. The Drosophila insulator proteins CTCF and CP190 link enhancer blocking to body patterning. EMBO J. 2007;26:4203–4214. doi: 10.1038/sj.emboj.7601851.
- 60. Letunic I, Doerks T, Bork P. SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 2015;43:D257–D260. doi: 10.1093/nar/gku949.
- 61. Defossez P-A, et al. The human enhancer blocker CTC-binding factor interacts with the transcription factor Kaiso. J. Biol. Chem. 2005;280:43017–43023. doi: 10.1074/jbc.M510802200.
- 62. Pai C-Y, Lei EP, Ghosh D, Corces VG. The centrosomal protein CP190 is a component of the gypsy chromatin insulator. Mol. Cell. 2004;16:737–748. doi: 10.1016/j.molcel.2004.11.004.
- 63. Iseli C, Ambrosini G, Bucher P, Jongeneel CV. Indexing strategies for rapid searches of short words in genome sequences. PLoS ONE. 2007;2:e579. doi: 10.1371/journal.pone.0000579.
- 64. Lun ATL, Smyth GK. csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows. Nucleic Acids Res. 2015;44:e45. doi: 10.1093/nar/gkv1191.
- 65. Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25. doi: 10.1186/gb-2010-11-3-r25.
- 66. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinform. Oxf. Engl. 2009;26:139–140. doi: 10.1093/bioinformatics/btp616.
- 67. Servant N, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259. doi: 10.1186/s13059-015-0831-x.
- 68. Shin H, et al. TopDom: an efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Res. 2015;44:e70. doi: 10.1093/nar/gkv1505.
- 69. Durand NC, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3:99–101. doi: 10.1016/j.cels.2015.07.012.
- 70. Lieberman-Aiden E, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
- 71. Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638.
- 72. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550–521. doi: 10.1186/s13059-014-0550-8.
- 73. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 74. Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008;26:1367–1372. doi: 10.1038/nbt.1511.
- 75. Cox J, et al. Andromeda: A Peptide Search Engine Integrated into the MaxQuant Environment. J. Proteome Res. 2011;10:1794–1805. doi: 10.1021/pr101065j.
- 76. Schwanhäusser B, et al. Global quantification of mammalian gene expression control. Nature. 2011;473:337–342. doi: 10.1038/nature10098.
- 77. Tyanova S, et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods. 2016;13:731–740. doi: 10.1038/nmeth.3901.
- 78. Gratz SJ, et al. Highly specific and efficient CRISPR/Cas9-catalyzed homology-directed repair in Drosophila. Genetics. 2014;196:961–971. doi: 10.1534/genetics.113.160713.
- 79. Port F, Chen H-M, Lee T, Bullock SL. Optimized CRISPR/Cas tools for efficient germline and somatic genome engineering in Drosophila. Proc. Natl Acad. Sci. USA. 2014;111:E2967–E2976. doi: 10.1073/pnas.1405500111.