Abstract
CCCTC-binding factor (CTCF) is critical to three-dimensional genome organization. Upon differentiation, CTCF insulates active and repressed genes within Hox gene clusters. We conducted a genome-wide CRISPR knockout (KO) screen to identify genes required for CTCF-boundary activity at the HoxA cluster, complemented by biochemical approaches. Among the candidates, we identified Myc-associated zinc-finger protein (MAZ) as a cofactor in CTCF insulation. MAZ colocalizes with CTCF at chromatin borders and, similar to CTCF, interacts with the cohesin subunit RAD21. MAZ KO disrupts gene expression and local contacts within topologically associating domains. Similar to CTCF motif deletions, MAZ motif deletions lead to derepression of posterior Hox genes immediately after CTCF boundaries upon differentiation, giving rise to homeotic transformations in mouse. Thus, MAZ is a factor contributing to appropriate insulation, gene expression and genomic architecture during development.
Subject terms: Epigenetics, Gene regulation, Gene targeting, Stem cells, Pattern formation
Genome-wide screens identify several genes, including MAZ, required for CTCF-mediated insulation. MAZ interacts with cohesin, and MAZ motif deletions derepress posterior Hox gene expression, leading to homeotic transformations in mouse.
Main
The precise regulation of gene expression is required to ensure proper embryonic development. Beyond the DNA sequence, the chromatin structure and spatial organization of the genome regulate transcriptional output. The genomes of higher eukaryotes are tightly folded and packaged within the nucleus1. The partitioning of the genome into independent chromatin domains occurs via insulators. Although several insulators are present in Drosophila2, CTCF is the main insulator protein in vertebrates3–5. CTCF is a highly conserved, ubiquitously expressed, 11-zinc-finger protein6 that is critical for development7,8 and enriched at the borders of topologically associating domains (TADs)9–11. Among the many proteins associated with CTCF at different loci4,12, only the cohesin complex colocalizes to most CTCF binding sites and is required for CTCF function13,14. CTCF-boundary activity is context dependent15. CTCF functions as a boundary between active and repressed chromatin domains, decorated by Trithorax and Polycomb, respectively, at Hox clusters upon differentiation of mouse embryonic stem cells (ESCs) into cervical motor neurons (MNs)16,17. This dynamic compartmentalization of Hox clusters into antagonistic domains allows CTCF-mediated looping to reshape regulatory interactions. Although there is a cell-type-specific modulation of CTCF-boundary activity, CTCF and cohesin occupancy appears stable across Hox clusters during the differentiation of ESCs into cervical MNs16,18. Thus, during differentiation, additional regulatory factors appear to be necessary to foster CTCF-mediated insulation properties.
To identify such putative factors affecting CTCF-boundary activity, we devised an unbiased genome-wide loss-of-function genetic screen involving a functional CTCF boundary within the HoxA cluster in cervical MNs. We complemented this screen with biochemical approaches to identify CTCF partners and colocalizing proteins on chromatin in ESCs and MNs (Fig. 1a). We identified MAZ as a CTCF cofactor functioning to insulate active chromatin boundaries from spreading into repressive regions at Hox clusters, among other candidates that were narrowed down via secondary loss-of-function screens. Through a series of functional assays performed in vitro and in vivo during development, we demonstrate that MAZ is integral to appropriate gene expression and architectural genome organization in the context of CTCF and cohesin.
Results
A dual reporter of Hox gene expression at the HoxA cluster
We aimed to identify boundary-associated factors that function to insulate the anterior from the posterior region of the HoxA cluster. To this end, we focused on the CTCF boundary that forms upon ESC differentiation into cervical MNs16,17. This CTCF boundary insulates active and repressed chromatin domains at the HoxA cluster, and its loss gives rise to defined transcriptional repercussions in cervical MNs16,17. We constructed a dual-reporter ESC line (Hoxa5:a7 ESCs) containing distinct fluorescent reporters of endogenous Hox gene expression on each side of the CTCF-demarcated boundary in the HoxA cluster using CRISPR technology19 (Fig. 1b and Extended Data Fig. 1a). The relative expression of Hox genes can be assayed in single cells, and any activity mediating CTCF-boundary formation can be assessed in this Hoxa5:a7 dual-reporter system. As expected based on previous studies16,20,21, Hoxa5-P2A-mCherry reporter expression was induced during cervical MN differentiation, whereas Hoxa7-P2A-eGFP remained repressed (Extended Data Fig. 1b–d). To confirm that Hoxa7-P2A-eGFP could report defects in the formation of the CTCF-dependent boundary, we deleted the CTCF binding sites between Hoxa5 and Hoxa7 genes in ESCs (CTCF (Δ5|6) or CTCF (Δ5|6:6|7), respectively) and demonstrated the derepression of Hoxa7-P2A-eGFP by fluorescence-activated cell sorting (FACS) analysis and reverse transcription quantitative polymerase chain reaction (RT-qPCR) (Extended Data Fig. 1b–d), as previously reported16 (Supplementary Note 1). The ~10–15% Hoxa7-P2A-eGFP-positive cells (Extended Data Fig. 1c,d) allowed for enough of a dynamic range to identify mutants that decreased or increased CTCF insulating properties.
Genome-wide CRISPR loss-of-function screen for CTCF-boundary function
To identify factors required for the integrity of the CTCF boundary, we performed an unbiased loss-of-function genetic screen on the Hoxa5:a7 dual-reporter ESCs using a pooled genome-wide library of single-guide RNAs (sgRNAs)22, as shown schematically in Fig. 1b. A Hoxa5:a7 ESC clone expressing Cas9 (Extended Data Fig. 1e) was transduced with the pooled lentiviral sgRNAs at a low multiplicity of infection (~0.4), as applied previously22,23, such that each transduced cell expressed a single sgRNA. The reporter ESCs (t = 0) were then differentiated into cervical MNs (t = 6) via the addition of all-trans-retinoic acid/smoothened agonist24 and sorted by FACS into two MN populations: (1) wild-type (WT) MNs (mCherry-positive/eGFP-negative cells; t = 6a) and (2) CTCF-boundary-disrupted MNs (double-positive cells; t = 6b). By preparing libraries at each time point, the relative sgRNA representation at t = 0, 6, 6a and 6b was compared using next-generation sequencing, as described previously22,23,25,26. This screen setup enabled identification of three sets of genes: (1) essential genes in ESCs (negative selection), (2) essential/differentiation-related genes in MNs (negative selection) and (3) genes affecting CTCF-boundary function (positive selection) (Supplementary Note 2).
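The rationale for a low multiplicity of infection can be illustrated with a short calculation. The sketch below assumes that the number of lentiviral integrations per cell follows a Poisson distribution, a common idealization that is not stated in the text; the function names are hypothetical and serve only this illustration.

```python
import math

def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k integrations (Poisson assumption)."""
    return math.exp(-moi) * moi ** k / math.factorial(k)

def infection_fractions(moi: float) -> dict:
    """Fractions of uninfected, singly and multiply infected cells at a given MOI."""
    uninfected = poisson_pmf(0, moi)
    single = poisson_pmf(1, moi)
    infected = 1.0 - uninfected
    return {
        "uninfected": uninfected,
        "single": single,
        "multiple": infected - single,
        # Share of transduced cells that carry exactly one sgRNA
        "single_among_infected": single / infected,
    }

fractions = infection_fractions(0.4)  # MOI of ~0.4, as used in the screen
for name, value in fractions.items():
    print(f"{name}: {value:.3f}")
```

Under this assumption, roughly four in five transduced cells carry exactly one sgRNA at an MOI of 0.4, and lowering the MOI further increases that share at the cost of fewer transduced cells.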
Identification of factors affecting CTCF insulation function
As expected from a functional screen, we observed selective loss of essential genes in the starting population (ESCs, t = 0) compared to the plasmid library (Fig. 1c, Extended Data Fig. 1f,g and Supplementary Dataset 1a), and further loss of genes essential/required for MN differentiation (MN, t = 6) compared to the ESC population (t = 0) (Fig. 1d, Extended Data Fig. 1h,i and Supplementary Dataset 1b), indicating the success of the screen. Among genes underrepresented in MNs compared to ESCs (false discovery rate (FDR) < 0.05), we observed Polycomb group genes, CTCF, cohesin components and several components related to the MN differentiation pathway. Our genome-wide screens were performed in duplicate by using independent genome-wide sublibraries (library A and library B) containing three sgRNAs per gene, resulting in four independent screens. In each screen, we identified ~1,000 genes positively selected in double-positive cells (CTCF-boundary-disrupted MNs, t = 6b) compared to mCherry-positive cells (WT MNs, t = 6a) using MAGeCK tools27,28 (Fig. 1e and Supplementary Dataset 2). Based on the four independent sublibrary screens, we narrowed down the list of candidates in CTCF-boundary-disrupted MNs compared to WT MNs to 215 genes (Fig. 1e,f and Supplementary Dataset 3). Notably, Maz was identified as a top candidate (rank = 2) in one of the genome-wide screens with library A and also detected in a similar screen (rank = 486) with library B containing an independent set of sgRNAs (Fig. 1e,f and Supplementary Datasets 2 and 3).
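The idea of positive selection in a sorted population can be sketched as a per-gene enrichment score. The example below is a simplified illustration and not the MAGeCK algorithm used in the study: it normalizes sgRNA read counts to reads per million, computes a log2 fold change per sgRNA and takes the median across the sgRNAs targeting each gene. All guide names, gene names and counts are hypothetical toy values.

```python
import math
from statistics import median

def normalize(counts: dict, scale: float = 1_000_000) -> dict:
    """Scale raw sgRNA read counts to reads per million."""
    total = sum(counts.values())
    return {guide: c * scale / total for guide, c in counts.items()}

def gene_log2fc(sorted_counts, control_counts, guide_to_gene, pseudocount=1.0):
    """Median log2 fold change (sorted vs control) over the sgRNAs of each gene."""
    s = normalize(sorted_counts)
    c = normalize(control_counts)
    per_gene = {}
    for guide, gene in guide_to_gene.items():
        lfc = math.log2((s[guide] + pseudocount) / (c[guide] + pseudocount))
        per_gene.setdefault(gene, []).append(lfc)
    return {gene: median(values) for gene, values in per_gene.items()}

# Hypothetical toy data: three sgRNAs per gene, as in the sublibraries
guide_to_gene = {f"A_{i}": "GeneA" for i in range(3)}
guide_to_gene.update({f"B_{i}": "GeneB" for i in range(3)})
control = {guide: 100 for guide in guide_to_gene}  # e.g. mCherry-positive cells
double_pos = {g: (300 if g.startswith("A_") else 100) for g in guide_to_gene}

scores = gene_log2fc(double_pos, control, guide_to_gene)
print(scores)
```

In this toy case the gene whose sgRNAs are overrepresented in the double-positive population receives a positive score. A dedicated tool additionally models count variance and reports significance, which this sketch does not attempt.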
Identification of proteins colocalizing with CTCF on chromatin
We complemented the locus-specific genetic screen with orthogonal biochemical approaches for the identification of proteins colocalizing with CTCF on chromatin. Unlike previous studies that aimed to identify CTCF partner proteins in soluble cellular fractions through the use of overexpression-based systems12,29, we identified proteins colocalizing with endogenous CTCF on chromatin that may or may not interact with CTCF but nonetheless may be important for its insulation properties in situ. To pull down CTCF under endogenous conditions, we generated an ESC line containing C-terminal FLAG-tagged CTCF via CRISPR technology19 (Extended Data Fig. 2a) and confirmed successful FLAG-CTCF immunoprecipitation from the nuclear fraction of ESCs (Extended Data Fig. 2b–f and Extended Data Fig. 2g for the immunoprecipitation in 293FT cells). To expand and identify factors colocalizing with CTCF on chromatin, we applied two biochemical methods: (1) FLAG-CTCF immunoprecipitation from native chromatin in ESCs and MNs (Extended Data Fig. 2c) and (2) FLAG-CTCF immunoprecipitation from crosslinked chromatin in ESCs and MNs (Fig. 1g), an adapted version of the chromatin immunoprecipitation (ChIP) mass spectrometry (MS) approach described previously30–33 (Supplementary Note 3). In both FLAG-CTCF ChIP-MS approaches, we identified known interactors and novel proteins interacting or cobinding with CTCF (Fig. 1g and Extended Data Fig. 2c; all candidates are listed in Supplementary Dataset 4). As expected, we recovered CTCF, cohesin components and accessory subunits and other chromatin remodelers (Fig. 1g and Extended Data Fig. 2c). Although the overlap between genetic and biochemical approaches is limited (Extended Data Fig. 2d and Supplementary Dataset 5; Supplementary Dataset 2 versus Supplementary Dataset 4 and Fig. 1g), the candidates identified in both approaches have the potential to be critical for CTCF function at the HoxA cluster and genome-wide, respectively. 
Interestingly, MAZ was identified uniquely in the crosslinked-based CTCF ChIP-MS, thereby constituting a representative candidate that overlaps with those identified from the Hox-related functional screens. Thus, MAZ might serve a role in regulating the CTCF boundary at the Hox loci. MAZ was also reported to colocalize with CTCF at ~48% of binding sites based on ENCODE ChIP sequencing (ChIP-seq) data in K562 cells34, as recently confirmed35. In a systematic study investigating DNA binding proteins at chromatin loops, the combinations of MAX-MYC-MAZ-CHD2 and CTCF-RAD21-SMC3 were reported36. Moreover, an algorithm detecting combinatorial motifs for transcription factors has revealed the presence of MAZ and CTCF along with others within the X chromosome37, reinforcing our observation here of their proximal binding on crosslinked chromatin.
Candidates after secondary CRISPR loss-of-function screens
Both the genetic and biochemical approaches revealed a large list of candidates, which were further narrowed down and validated through independent secondary genetic screens. To systematically narrow down candidates from the primary genome-wide screens (Supplementary Dataset 2) and check whether CTCF partners identified in Fig. 1g and Extended Data Fig. 2c (Supplementary Dataset 4) have a role at the CTCF boundary at the HoxA cluster, we performed secondary loss-of-function screens with a small custom library (Supplementary Dataset 6, Extended Data Fig. 2h and Supplementary Note 4). Importantly, these secondary screens were performed with increased statistical power in ESCs having either the WT Hoxa5:a7 reporter (Fig. 2a and Extended Data Fig. 3a,b) or the CTCF (Δ5|6:6|7) Hoxa5:a7 reporter (Fig. 2b and Extended Data Fig. 3c,d) to focus on candidates uniquely impacting the CTCF boundary in the WT background. Based on the rank of genes overrepresented in the Hoxa5:a7 dual-positive MN population compared to Hoxa5-positive cells, we identified 55 genes that disrupt the CTCF boundary in the WT background having intact CTCF binding sites (Fig. 2c and Supplementary Dataset 7). Similarly, we identified 165 genes that influence CTCF-boundary/Hoxa7 gene expression from screens performed in the CTCF (Δ5|6:6|7) background (Fig. 2d and Supplementary Dataset 8). Thus, the secondary screens resulted in a small list of 43 genes, which when mutated phenocopied the CTCF (Δ5|6) motif deletion in the presence of intact CTCF binding sites (Fig. 2e shows a comparison of secondary screens in both backgrounds; Supplementary Dataset 9). Importantly, the secondary screens also confirmed the identification of Maz uniquely in the WT background. Other genes shown in Fig. 2c,d are expected positive controls such as Ctcf, cohesin components/accessory subunits and Znf143, which encodes a protein that colocalizes with CTCF at TADs38,39 (Supplementary Note 4).
Validation of MAZ function at CTCF boundaries in Hox clusters
Among the candidates we identified as mimicking CTCF (Δ5|6) at the HoxA cluster, MAZ was ranked high in multiple primary screens, identified as a colocalizing factor with CTCF on chromatin and further validated through secondary screens. MAZ is a ubiquitously expressed protein that was initially identified as a regulatory protein associated with Myc gene expression40 and also identified as a regulatory factor for the insulin promoter41. To validate the screen results, we generated a MAZ KO in ESCs through CRISPR editing19 (Extended Data Fig. 3e,f). The MAZ KO did not produce a profound change in gene markers associated with ESC and MN fate (Extended Data Fig. 3g). In addition, the MAZ KO did not result in cell cycle changes in ESCs (Extended Data Fig. 3h,i). Importantly, the MAZ KO did not affect overall CTCF and cohesin levels (Extended Data Fig. 3f). However, as shown in Fig. 2f, the MAZ KO in MNs mimicked the specific deletion of the CTCF sites (Δ5|6:6|7) at the HoxA cluster and disrupted the boundary between active and repressed genes. In addition, the MAZ KO resulted in differential expression of ~2,400 genes in ESCs (Fig. 2g, top; Extended Data Fig. 4a for Gene Ontology (GO) analysis and Supplementary Dataset 10) and ~1,800 genes in MNs compared to WT (Fig. 2g, bottom, and Supplementary Dataset 11). GO analysis indicated that developmental processes, particularly anterior–posterior pattern specification, are enriched in MAZ KO MNs compared to WT MNs (Extended Data Fig. 4b). Consistent with MAZ having a role in CTCF-boundary integrity and the MAZ KO mimicking the CTCF binding site deletions, the MAZ KO led to a derepression of mainly posterior Hox genes after CTCF boundaries in MNs, but not in ESCs with the exception of Hoxc10 and Hoxd13 (Fig. 2g, Extended Data Fig. 4c–g and Supplementary Datasets 10–12). Notably, we did not observe further derepression of Hoxa7-eGFP upon differentiation of ESCs into MNs when comprising both CTCF (Δ5|6:6|7) and a MAZ KO (Fig. 2h,i), confirming our secondary screen results (Fig. 2c–e).
MAZ colocalizes with CTCF on chromatin
Based on our ChIP-seq analysis, ~20% of MAZ binding sites colocalize with CTCF in ESCs and MNs (Fig. 2j–l). The MAZ signal is specific given its loss in MAZ KO cells (Extended Data Fig. 5a and Supplementary Fig. 1) and the de novo detection of the MAZ motif within its binding sites in ESCs and MNs (Extended Data Fig. 5b,c). MAZ mostly binds to promoters in addition to introns, intergenic regions and other regions (Extended Data Fig. 5d). That CTCF and MAZ functionally cooperate beyond the Hox clusters is supported by our findings that 740 genes are commonly impacted when comparing differentially expressed genes reported in the case of CTCF loss (auxin treatment, 48 h) in ESCs42,43 and those in the case of MAZ loss (Fig. 2m and Extended Data Fig. 5e for CTCF and MAZ occupancies at these genes). As we initially identified the MAZ KO as influencing the CTCF boundary at the HoxA cluster (Fig. 2f), we compared ChIP-seq tracks of MAZ at the HoxA cluster to those of CTCF. MAZ appears to bind to DNA in proximity to CTCF as MAZ and CTCF colocalized at CTCF borders in Hox clusters (Fig. 3a; Fig. 3e and Extended Data Figs. 5a and 6 for HoxA; and Extended Data Fig. 7 for HoxD). MAZ KO in ESCs and MNs resulted in a slight decrease in CTCF binding at the boundary in the HoxA cluster (Extended Data Fig. 5a). We also observed a similar global decrease in CTCF binding in the MAZ KO (Extended Data Fig. 5f–i), suggesting a possible role of MAZ in stabilizing CTCF on chromatin (Supplementary Note 5).
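The colocalization fraction quoted above amounts to asking what share of one peak set overlaps at least one peak in another. A minimal sketch of that calculation follows, assuming peaks are given as half-open (chromosome, start, end) intervals; the coordinates are hypothetical toy values and the paper's actual overlap criteria are not described in this section.

```python
from collections import defaultdict

def fraction_overlapping(query, reference):
    """Fraction of query intervals overlapping at least one reference interval.

    Intervals are (chrom, start, end) tuples with half-open coordinates.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))
    hits = 0
    for chrom, start, end in query:
        # Two half-open intervals overlap if each starts before the other ends
        if any(start < r_end and r_start < end for r_start, r_end in by_chrom[chrom]):
            hits += 1
    return hits / len(query) if query else 0.0

# Hypothetical toy peak sets
maz_peaks = [("chr6", 100, 200), ("chr6", 500, 600), ("chr2", 100, 200),
             ("chr2", 900, 1000), ("chr1", 10, 20)]
ctcf_peaks = [("chr6", 150, 250), ("chr2", 1000, 1100)]

frac = fraction_overlapping(maz_peaks, ctcf_peaks)
print(f"{frac:.0%} of MAZ peaks overlap a CTCF peak")
```

The scan over all reference peaks per chromosome is adequate for a toy example; genome-scale peak sets would normally use sorted intervals or an interval tree.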
Repercussions of MAZ motif deletion at the Hox clusters
MAZ binds to a GC-rich motif on DNA (GGGAGGG) through its zinc fingers44 (Extended Data Fig. 5b,c shows MAZ motifs detected in ESCs and MNs). The distance analysis between MAZ and CTCF motifs indicates ~70–140 bp within a window of 500 bp centered on CTCF binding regions in ESCs and MNs (Extended Data Fig. 5j,k). We found MAZ binding motifs close to CTCF at the Hox boundaries (Fig. 3a), which were confirmed as such through FLAG-MAZ binding in vitro (Extended Data Figs. 6a and 7a). We also tested whether deletion of MAZ binding motifs at a specific Hox cluster mimics that of a CTCF binding site in the respective Hox cluster (Supplementary Note 6). As expected, MAZ binding site deletions at Hox clusters did not influence the cell cycle in ESCs (Extended Data Fig. 6b,c for HoxA and Extended Data Fig. 7b,c for HoxD clusters). Interestingly, MAZ binding site deletions generated at Hoxa5|6 (Fig. 3b, Extended Data Fig. 6 and Supplementary Fig. 2) and Hoxd4|8 (Extended Data Fig. 7) phenocopied the transcriptional repercussions of CTCF binding site deletions at the respective boundaries without loss of CTCF binding at the boundary (Fig. 3c–e, Extended Data Fig. 6d,e and Supplementary Datasets 13 and 14 for HoxA; Extended Data Fig. 7d–f for HoxD; and published results for the loss of the CTCF boundary16). These results pointed to a specific role of MAZ binding in regulating Hox gene expression at CTCF boundaries in multiple Hox clusters during differentiation (Supplementary Note 6). When we analyzed how chromatin domains were influenced upon deletion of the MAZ binding site at the Hoxa5|6 boundary, we observed spreading of active chromatin (H3K4me3) into the repressed region (H3K27me3) at the boundary (Fig. 3e and Extended Data Fig. 6d), similar to the CTCF binding site deletions shown in Extended Data Fig. 6e and that reported previously16. In accordance, CTCF depletion was also reported to impact transcriptional activity, but not the spread of H3K27me3 domains42. 
Similar to MAZ (Δ5|6) being ineffectual with respect to neighboring CTCF binding (Fig. 3e), CTCF (Δ5|6:6|7) did not affect adjacent MAZ binding at the HoxA cluster (Extended Data Fig. 6e). Nevertheless, we note that based on cleavage under targets and release using nuclease (CUT&RUN) analysis of the double-positive sorted population (Hoxa5-P2A-mCherry and Hoxa7-P2A-eGFP) in MNs, MAZ (Δ5|6) did not affect RAD21 binding, yet it modestly decreased CTCF binding and H3K27me3 (Extended Data Fig. 6d). Although H3K4me3 spreading (Fig. 3e) and decreased H3K27me3 were observed for MAZ Hoxa5|6 motif deletion (Extended Data Fig. 6d), the MAZ motif deletion at Hoxd4|8 exhibited only decreased H3K27me3 (Extended Data Fig. 7d). Thus, our results suggest that MAZ acts as a chromatin border factor alone, being partially additive with CTCF, and that alterations of the active and repressive chromatin marks can be context dependent.
According to the analysis of topological organization by circular chromosome conformation capture (4C), the interaction signal covers the HoxA cluster in ESCs as a single architectural domain not altered upon MAZ deletion (Δ5|6) (Fig. 3f), in accordance with the CTCF motif deletion shown in Extended Data Fig. 6f, and as reported previously16. However, upon differentiation into MNs, the HoxA cluster partitions into active and repressed regions (Fig. 3e)16, as reflected by the 4C interactions observed exclusively within the rostral part of the HoxA cluster (Fig. 3f). In contrast to ESCs, deletion of the MAZ Hoxa5|6 binding site affects the topological organization of the HoxA cluster in MNs (Fig. 3f), similar to that observed for CTCF(Δ5|6:6|7) in MNs (Extended Data Fig. 6f), and as reported previously16. Thus, MAZ (Δ5|6) impacts not only the partitioning of active and repressed chromatin domains and Hox gene expression, but also the structural organization of the HoxA cluster.
Effect of MAZ depletion on global genome organization
Besides its boundary role at Hox clusters, CTCF plays a pleiotropic role in three-dimensional (3D) genome structure. As shown here, MAZ colocalizes with CTCF at ~20% of MAZ binding sites in ESCs and MNs (Fig. 2j–l), and MAZ KO reduces CTCF binding (Extended Data Fig. 5f–i) and results in differential expression of ~2,000 genes (Fig. 2g). Thus, we examined the effect of MAZ KO on global 3D genomic organization by performing Hi-C in WT versus MAZ KO ESCs and MNs in biological duplicates (Fig. 4, Extended Data Figs. 8 to 10 and Supplementary Note 7). MAZ depletion resulted in a modest alteration of local contacts within TADs in ESCs (Extended Data Fig. 8d) and MNs (Fig. 4a), indicating its contribution to TAD integrity. In addition, analysis of differential loop activity showed alterations upon MAZ KO in both cell types (Fig. 4b and Extended Data Fig. 8e), indicating a global reduction in loop interactions relative to WT. Such loop changes were accompanied by significant alterations in the expression of genes that localize within these differential loops in both ESCs (Extended Data Fig. 8f) and MNs (Fig. 4c). In accordance, aggregate peak analysis (APA) showed that contact frequencies were decreased in MAZ KO ESCs (Extended Data Fig. 8g) and MNs (Fig. 4d) compared to WT and that the majority of loops examined had less than 2 Mb between the anchors. Interestingly, loops having CTCF, MAZ or CTCF and MAZ at loop anchors exhibited decreased contact frequencies upon MAZ KO (Fig. 4d and Extended Data Fig. 8g). These decreased signals observed at loop anchors containing CTCF (Fig. 4d and Extended Data Fig. 8g, top and bottom plots) could be attributable to the global decrease in CTCF binding in the absence of MAZ (Extended Data Fig. 5f–i). In particular, upon MAZ KO, we observed a mild downregulation of MAZ-containing loops in the Shh locus accompanied by its downregulation (Fig. 4e and Supplementary Dataset 11; and Extended Data Fig. 9, Supplementary Figs. 3–5 and Supplementary Note 7 for other loci). We also observed a larger-scale effect on intra-TAD activity and looping interactions accompanied by gene expression changes upon differentiation of ESCs into MNs (Extended Data Fig. 8h–l). As CTCF motifs are known to be convergent at loop anchors45–47, we analyzed the directionality for the CTCF motif and MAZ motifs shown in Extended Data Fig. 5b,c. Both ESCs and MNs exhibited MAZ towards the inside of the loop with respect to CTCF in a slightly higher number of loop anchors (Extended Data Fig. 10a,b). Moreover, the convergence observed for CTCF and MAZ motifs at Hi-C loop anchors in Extended Data Fig. 10c,d demonstrated that MAZ motifs can be mostly convergent and tandem at loop anchors; however, the frequency of convergence observed for MAZ is smaller than that for CTCF. Collectively, these results demonstrate that MAZ participates in the maintenance of local interactions within the TADs and other looping interactions.
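The motif-orientation categories used in such analyses can be made concrete with a small tally. The sketch below assumes the usual convention for loop anchors, in which a pair is convergent when the upstream (left) anchor motif lies on the forward strand and the downstream (right) anchor motif on the reverse strand; the strand pairs are hypothetical toy values, and the paper's exact procedure is not described in this section.

```python
from collections import Counter

def classify_pair(left_strand: str, right_strand: str) -> str:
    """Classify motif orientation at the left and right anchors of one loop."""
    if left_strand not in ("+", "-") or right_strand not in ("+", "-"):
        raise ValueError("strands must be '+' or '-'")
    if left_strand == "+" and right_strand == "-":
        return "convergent"   # motifs point toward each other
    if left_strand == "-" and right_strand == "+":
        return "divergent"    # motifs point away from each other
    return "tandem"           # both motifs on the same strand

def orientation_frequencies(pairs):
    """Relative frequency of each orientation class over a list of loops."""
    counts = Counter(classify_pair(left, right) for left, right in pairs)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical toy loops: (left anchor strand, right anchor strand)
toy_loops = [("+", "-"), ("+", "-"), ("+", "+"), ("-", "-"), ("-", "+")]
freqs = orientation_frequencies(toy_loops)
print(freqs)
```

Comparing such frequencies between two factors, as done for CTCF and MAZ in the text, only requires running the same tally on each factor's anchor motifs.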
RAD21 relocalization to MAZ binding sites upon loss of CTCF
We observed that, similar to CTCF12, MAZ coimmunoprecipitates with the cohesin component RAD21 (Fig. 5a and Extended Data Fig. 2f,g), as demonstrated recently by Xiao et al.35. CTCF, MAZ and RAD21 appear to colocalize at ~1,500 binding sites in ESCs (Fig. 5b), as described previously35. As cohesin was reported to be redistributed away from CTCF sites in the absence of CTCF47 (Fig. 5c and Supplementary Note 8) supporting the loop-extrusion model48,49, we explored the underlying DNA motifs in these regions of cohesin relocalization (Fig. 5c). Interestingly, the most enriched motif in the majority of relocalized RAD21 binding sites upon CTCF degradation resembled the MAZ binding motif (Fig. 5d and Extended Data Fig. 5b,c). Moreover, such redistributed RAD21 binding sites colocalized with MAZ binding in ESCs (Fig. 5c–e). Thus, our analyses suggest that RAD21 relocalizes to MAZ binding sites in the absence of CTCF in ESCs, implying a possible barrier function for MAZ.
Skeletal pattern defects upon MAZ motif deletion at HoxA cluster
Our findings point to MAZ being critical for the proper establishment of positional identity and topological organization in ESC-derived cervical MNs. Thus, we hypothesized that MAZ motif deletions would produce homeotic transformations in vivo, similar to that shown for CTCF16,17. We generated embryos with MAZ Hoxa5|6 motif deletions that ranged from 20 to 64 bp in cis to the MAZ motif using CRISPR (Supplementary Fig. 6) and investigated their skeletal development. In WT mice, there are 7 cervical (C1–C7), 13 thoracic (T1–T13), 6 lumbar (L1–L6) and 4 sacral (S1–S4) vertebrae50. Compared to WT mice, MAZ (Δ5|6) mouse embryos showed cervicothoracic C7-to-T1 transformation (Fig. 6a), similar to the homeotic transformations reported previously in the case of CTCF binding site deletions at the Hox clusters17. The observed phenotype indicates different levels of expressivity, mostly unilateral extension and ~78% penetrance (Fig. 6b). Thus, MAZ functions as a boundary factor in the HoxA cluster during development of the axial skeleton.
Discussion
In this study, we demonstrated that an unbiased genome-wide CRISPR screen coupled with biochemical approaches enabled the identification of factors that function similarly to and in conjunction with CTCF. Our results place MAZ as a boundary factor that functions in partitioning Hox clusters into insulated domains wherein Trithorax and Polycomb activities are important in maintaining the distinct Hox gene expression patterns critical to anterior–posterior positioning during development. MAZ KO or MAZ binding site deletions at active and repressed gene borders in Hox clusters phenocopy the effect of CTCF binding site deletions at Hox clusters16,17. In particular, the transcriptional effect of MAZ motif deletions in Hox clusters points to their requirement for transcriptional insulation. This scenario may constitute a precedent in which DNA neighboring a CTCF site can influence boundary activity51, not only by indicating the requirement for a distinct DNA motif but also by revealing an insulation factor, MAZ, at Hox clusters.
In addition to CTCF and cohesin, MAZ also contributes to the integrity of TADs and contacts within TADs, as recently reported in K562 cells35. Looping interactions are impacted upon loss of MAZ, although the effects are not as large scale as the loss of essential architectural proteins such as CTCF42 or cohesin52 (Supplementary Note 9). Based on our current model, MAZ binding adjacent to CTCF and interaction of each with cohesin support their function at loops, possibly with other proteins (discussed below), such that disruption of these loops is accompanied by altered gene expression (Fig. 7). Moreover, although our results suggest that in the absence of CTCF, MAZ might serve as a possible block to cohesin during loop extrusion, possibly with other factors (Fig. 7, right), this model remains to be tested (Supplementary Note 9).
Consistent with our findings, Maz−/− mice show perinatal lethality and developmental defects in the kidney and urinary tract53 and eye development54, although other phenotypes remain to be investigated (Supplementary Note 10). Deletion of a critical CTCF site separating chromatin domains resulted in Hoxd13 misexpression in the developing kidneys55. The cervicothoracic transformation we observed in the context of axial-skeleton development in mice with a MAZ motif deletion at Hoxa5|6 is similar to that observed for a CTCF motif deletion at the Hoxc5|6 region17. Although the transformation phenotype of the CTCF Hoxc5|6 mice has been shown to be fully penetrant, MAZ Hoxa5|6 motif deleted mice show similar penetrance levels to CTCF Hoxa5|6:7|9 motif deletions17. Our findings are in agreement with those obtained in loss-of-function studies for Hoxa5 and Hoxa6 exhibiting a similar ectopic rib at C750,56,57 and others for Hoxc5 and Hoxc6 showing cervicothoracic transformations50,58,59. Indeed, our observation of homeotic transformations in skeleton with the MAZ motif deletion at Hoxa5|6 reinforces the importance of MAZ during normal development.
Our findings point to MAZ functioning as an insulator-like factor at Hox clusters in vitro and in vivo, sharing other properties with CTCF such as cohesin interaction and being critical to global gene regulation and genome organization (Supplementary Note 11). Such regulation is critical for the spatial and temporal progression of gene expression to ensure proper development. We note that this report has identified other candidates that may be required for the integrity of the CTCF boundary at the HoxA cluster as well as chromatin-based CTCF partners or colocalizing proteins under endogenous conditions during differentiation. These candidates were systematically narrowed down based on their insulation function at the HoxA cluster. Although our CRISPR loss-of-function screens are limited to the identification of those genes that are mainly nonessential, our biochemical approaches identified both essential and nonessential CTCF partners in undifferentiated versus differentiated cells. Similar to MAZ, some of these other candidates could potentially contribute to CTCF, cohesin and/or MAZ function, reflecting their impact on gene regulation during development.
Methods
This study was performed in compliance with ethical regulations and approved by New York University (NYU)/NYU Grossman School of Medicine’s Institutional Biosafety Committee.
Cell culture and MN differentiation
E14TG2a mouse ESCs (ES-E14TG2a, ATCC, CRL-1821) were cultured in standard medium supplemented with LIF and 2i conditions (1 μM MEK1/2 inhibitor (PD0325901, Stemgent) and 3 μM GSK3 inhibitor (CHIR99021, Stemgent)). For MN differentiation, a previously described protocol was applied16. Briefly, ESCs were differentiated into embryoid bodies in 2 days, and further patterning was induced with addition of 1 μM all-trans-retinoic acid (Sigma) and 0.5 μM smoothened agonist (Calbiochem). Biological replicates refer to independent differentiation experiments. 293FT cells (R70007, Thermo Fisher Scientific) were cultured in standard medium as described in the manufacturer’s protocol.
CRISPR genome editing
sgRNAs were designed using CRISPR design tools (http://crispr.mit.edu/; currently available at https://benchling.com). All sgRNAs were cloned into the SpCas9-2AGFP vector (Addgene: PX458) or into the lentiviral vector lentiGuide-puro (52963, Addgene). The sgRNAs were transfected into ESCs using Lipofectamine 2000 (Invitrogen) as described previously16 or infected into an ESC clone expressing lentiCas9-blast (52962, Addgene). In the case of CRISPR knockin cell lines, donor DNA (1 μl of 10 μM single-stranded DNA oligonucleotide or 3 μg pBluescriptSK (+) plasmid containing donor DNA) was transfected with 1 μg px458-sgRNAs. Single clones from GFP-positive FACS-sorted cells or puromycin (InvivoGen)-resistant cells were genotyped and confirmed by sequencing. When necessary, PCR products were further assessed by TOPO cloning (Invitrogen) and sequencing to distinguish the amplified products of different alleles. The sequencing chromatograms were aligned in Benchling. All sgRNAs, donors and genotyping primers are shown in Supplementary Table 1.
Cell line generation for Hoxa5:a7 dual reporter in WT and CTCF (Δ5|6:6|7) backgrounds
To generate Hoxa5:a7 dual-reporter cells, ESCs were sequentially targeted at Hoxa5 and Hoxa7 loci, respectively. ESCs were initially transfected with sgRNA and donor pBluescriptSK (+) plasmid for Hoxa5-P2A-mCherry cell line generation using Lipofectamine (Invitrogen). Hoxa5-mCherry cell line was confirmed through genotyping, sequencing, and FACS analysis upon MN differentiation for the homozygous insertion of reporter. Next, the Hoxa5-mCherry cell line was transfected with sgRNA and donor pBluescriptSK (+) plasmid for generation of the dual Hoxa5:a7 knock-in cell line, which was confirmed by genotyping, sequencing, and FACS analysis for the homozygous insertion of reporter. To demonstrate Hoxa7-P2A-eGFP expression in MNs, CTCF binding sites at Hoxa5|6 and Hoxa6|7 were removed via sequential CRISPR genome editing using respective sgRNAs, generating CTCF (Δ5|6) and CTCF (Δ5|6:6|7) deletion lines in the Hoxa5:a7 dual-reporter background. For CRISPR library screen experiments, WT or CTCF (Δ5|6:6|7) dual-reporter lines were transduced with lentiCas9-blast (Addgene, 52962), and Cas9-expressing clones were obtained after selection with blasticidin (InvivoGen).
Cell line generation for FLAG-CTCF-tagged cells
To generate the CTCF C-terminal FLAG-tagged cell line, E14TG2a mouse ESCs were targeted with sgRNA in SpCas9-2AGFP vector (Addgene, PX458) and single-stranded donor oligonucleotide at the Ctcf locus. The cell line was confirmed by genotyping, sequencing, and western blot for FLAG-CTCF.
Cell line generation for MAZ KO cells
WT or CTCF (Δ5|6:6|7) Hoxa5:a7 dual-reporter cells expressing Cas9 were targeted with sgRNAs in lentiGuide-puro vector for Maz. Knock-out of Maz was confirmed by genotyping, sequencing, and western blot.
Cell line generation for MAZ binding site deletions
Hoxa5:a7 dual-reporter cells were targeted with sgRNAs in SpCas9-2AGFP vector (Addgene, PX458) for MAZ binding sites at HoxA, HoxD or HoxC clusters. Specific MAZ binding site deletions were confirmed by genotyping and sequencing.
CRISPR screens
CRISPR genome-wide screens were done using methods described previously22,23. Briefly, GeCKO genome-wide pooled CRISPR libraries (Addgene, 1000000053) were amplified and deep-sequenced to confirm sgRNA representations, as shown previously22. A Cas9-expressing Hoxa5:a7 ESC clone was transduced with the pooled lentiviral sgRNAs at a low multiplicity of infection (~0.4). The reporter ESCs were selected with puromycin, cultured for 7 days, differentiated into MNs in 6 days and sorted by FACS into two MN populations: (1) WT MNs (mCherry-positive/eGFP-negative cells) and (2) CTCF-boundary-disrupted MNs (double-positive cells). During the screens, 300× and 1,000× coverage was applied for genome-wide screens and secondary screens, respectively. CRISPR libraries were prepared at each time point and/or sorted population, and the relative sgRNA representation was assessed using next-generation sequencing, as described previously22,23.
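As a rough illustration of the coverage and multiplicity-of-infection figures given above, the following sketch estimates how many cells would need to be transduced to maintain a given coverage, assuming that lentiviral integrations follow a Poisson distribution. The function names are illustrative, and the library size in the usage note (60,000 sgRNAs) is a hypothetical placeholder rather than the size of the libraries used here.

```python
import math


def fraction_infected(moi: float) -> float:
    """Fraction of cells with at least one integration (Poisson assumption)."""
    return 1.0 - math.exp(-moi)


def fraction_single_among_infected(moi: float) -> float:
    """Among infected cells, the fraction carrying exactly one integration."""
    p_exactly_one = moi * math.exp(-moi)
    return p_exactly_one / fraction_infected(moi)


def cells_to_transduce(n_sgrnas: int, coverage: int, moi: float) -> int:
    """Cells to transduce so that infected cells number coverage x library size."""
    infected_needed = n_sgrnas * coverage
    return math.ceil(infected_needed / fraction_infected(moi))
```

For example, `cells_to_transduce(60_000, 300, 0.4)` gives the number of cells to plate for a hypothetical 60,000-sgRNA library at 300× coverage, and `fraction_single_among_infected(0.4)` indicates why a low multiplicity of infection keeps most infected cells at a single sgRNA.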
Custom library construction for secondary CRISPR screens
sgRNAs for custom library used in the secondary CRISPR screens were retrieved from a previously designed genome-wide mouse CRISPR KO pooled library (Brie)60. When required for several genes, sgRNAs were designed by using the Broad Institute CRISPRko gRNA design tools (https://portals.broadinstitute.org/gpp/public/analysis-tools/sgrna-design). All sgRNAs in the custom library in Supplementary Dataset 6 were synthesized as a pool by Twist Bioscience. The custom library was cloned into lentiGuide-puro vector, amplified and verified in terms of representation of all constructs using methods described previously61.
Flow cytometry
Cells were trypsinized, filtered and stained with 4,6-diamidino-2-phenylindole (Sigma) to eliminate dead cells during analysis of Hoxa5:a7 reporters in ESCs and MNs. Hoxa5:a7 dual fluorescent reporter cells in WT versus other backgrounds were assessed by using single-color fluorescent reporters as controls in the same cell type as analyzed (i.e., MNs). Hb9-T2A-eGFP reporter cells (not shown) were used as GFP control in MNs (Supplementary Fig. 7a). For cell cycle analysis, ESCs were fixed in 75% ethanol, and DNA was stained with propidium iodide (Thermo Fisher Scientific) after RNase A (Thermo Fisher Scientific) treatment. FlowJo (version 8.7) was used for all FACS analysis (Supplementary Fig. 7b).
Expression analysis
RNA was purified from cells with the RNeasy Plus Mini kit (Qiagen), and RT was performed on 1 μg RNA by using Superscript III (Life Technologies) and random hexamers (Thermo Fisher Scientific). RT-qPCRs were performed in replicates on 100 ng cDNA using PowerUp SYBR Green Master Mix (Thermo Fisher Scientific). The primers are listed in Supplementary Table 2. For RNA-seq analysis, 1 μg RNA was used to prepare ribo-minus RNA-seq libraries according to the manufacturer’s protocols by the NYU Genome Technology Center.
ChIP-seq
ChIP-seq experiments were performed as described previously62 (see details regarding ESC fixation in Oksuz et al.62 and MN fixation in Narendra et al.16). Briefly, cells were fixed with 1% formaldehyde, nuclei were isolated and chromatin was fragmented to ~250 bp using a Diagenode Bioruptor. ChIP was performed using antibodies listed in Supplementary Table 2. Chromatin from Drosophila (1:100 ratio to ESC- or MN-derived chromatin), and Drosophila-specific H2Av antibody was used as a spike-in control in each sample. For ChIP-seq, libraries were prepared as described previously16 using 1–30 ng immunoprecipitated DNA. ChIP-qPCRs were performed with PowerUp SYBR Green Master Mix (Thermo Fisher Scientific) and detected by the Stratagene Mx3005p or QuantStudio 5 (Thermo Fisher Scientific) instrument. All ChIP-qPCR primers are listed in Supplementary Table 2.
CUT&RUN
This method was performed as described previously63,64 using 100,000–200,000 cells that were sorted for double-positive (Hoxa5-P2A-mCherry and Hoxa7-P2A-eGFP) populations. WT MNs were treated similarly and collected through FACS. The cells were re-counted after sorting, and the published protocol65 by Janssens and Henikoff, detailed at https://www.protocols.io, was followed. CUT&RUN experiments were analyzed with the methods described for ChIP-seq below.
Preparation of 4C template
Cells were processed for 4C sequencing (4C-seq) as described previously16,66. Cells were trypsinized and counted, and 1 × 10^7 cells were crosslinked with the crosslinking solution (2% formaldehyde and 10% FBS in 1× PBS) for 10 min at room temperature. After the reaction was quenched with glycine, cells were lysed on ice with 1 ml lysis buffer (50 mM Tris, pH 7.3, 150 mM NaCl, 5 mM EDTA, 0.5% NP-40 and 1% Triton X-100) for 15 min. Nuclei were spun down and frozen at −80 °C. Upon thawing on ice, nuclei were resuspended in 360 µl H2O. Then, 60 µl of 10× DpnII restriction buffer and 15 µl of 10% SDS were added, and the samples were shaken for 1 h at 37 °C. Afterwards, 150 µl of 10% Triton X-100 was added, and the samples were incubated for 1 h at 37 °C. After taking 5 µl as an undigested control, the remaining nuclei were incubated overnight with 200 U DpnII restriction enzyme (New England Biolabs, R0543M). The next morning, an additional 200 U fresh DpnII was added for 6 h. After digestion, DpnII was inactivated with 80 µl 10% SDS, and a proximity ligation reaction was performed in a 7-ml volume using 4,000 U T4 DNA Ligase (Roche, M0202M). Then, 300 µg Proteinase K was added, and the crosslinks were reversed at 65 °C overnight. Samples were treated with 300 µg RNase A for 45 min at 37 °C the next day, and DNA was precipitated with ethanol. A second restriction digestion was performed with 50 U Csp6I (Fermentas, ER0211) in a 500-µl reaction volume. The enzyme was then inactivated at 65 °C for 25 min, and a proximity ligation reaction was done in a 14-ml volume with 6,000 U T4 DNA ligase. Finally, the resulting DNA was precipitated with ethanol and purified using the QIAquick PCR purification kit.
Preparation of Hi-C samples
Cells were collected, and 1 × 10^6 cells were fixed in 2% formaldehyde (Fisher Chemical) according to the ARIMA-Hi-C protocol. Samples were prepared and sequenced according to the manufacturer’s protocol by the NYU Grossman School of Medicine’s Genome Technology Center.
Cellular fractionation, immunoprecipitation and recombinant protein purification
All cellular fractionation and immunoprecipitation experiments were performed at 4 °C or on ice with buffers containing 1 μg ml−1 pepstatin, 1 μg ml−1 aprotinin, 1 μg ml−1 leupeptin, 0.3 mM PMSF, 10 mM sodium fluoride and 5 mM sodium orthovanadate. For FLAG affinity purification from native chromatin (native ChIP-mass spectrometry), nuclear extracts from ESCs and MNs were prepared using Buffer A and Buffer C, as described67. The cytosolic fraction was removed with buffer A (10 mM Tris, pH 7.9, 1.5 mM MgCl2, 10 mM KCl, and 0.5 mM dithiothreitol (DTT)). The pellet was resuspended in buffer C (20 mM Tris, pH 7.9, 25% glycerol, 420 mM NaCl, 1.5 mM MgCl2, 0.2 mM EDTA and 0.5 mM DTT) and incubated for 1 h to obtain nuclear extract. After removing the nuclear extract, the remaining nuclear pellet was solubilized by benzonase (Millipore) digestion in a buffer containing 50 mM Tris, pH 7.9, and 2 mM MgCl2. For FLAG affinity purification from native chromatin and MS, 20 mg nuclear pellet was incubated with 200 μl FLAG M2 beads in BC250 overnight and washed six times with BC250 containing 0.05% NP40, as described elsewhere68. Two elutions were performed with 0.5 mg ml−1 FLAG peptide in BC50 (without any protease inhibitors) with rotation at 4 °C for 2 h each, for a total of 4 h. The eluate was sent to the Biological Mass Spectrometry Facility of Robert Wood Johnson Medical School and Rutgers and analyzed by liquid chromatography tandem MS. Peptide counts are shown for the native ChIP-MS experiments in Supplementary Dataset 4.
For FLAG affinity purification from crosslinked chromatin (crosslinked ChIP-MS), a modified version of a previously reported protocol was applied32,33. Briefly, cells were crosslinked and sonicated as described above for ChIP-seq with the exception to obtain a larger fragment size that includes approximately three to five nucleosomes. Then, 3 mg chromatin was used for FLAG affinity purification, and FLAG elutions were performed after stringent washes as described previously32, but excluding the second step in the protocol wherein DNA is biotinylated. After decrosslinking, samples were sent to the Biological Mass Spectrometry Facility of Robert Wood Johnson Medical School and Rutgers and analyzed by liquid chromatography tandem MS.
For extraction in 293FT cells, CβF expression vectors containing cDNAs for CTCF (mouse) or MAZ (mouse) were transfected into 293FT cells using polyethylenimine (PEI), and nuclei were prepared using TMSD and BA450 buffers, as described previously69,70. Briefly, TMSD buffer (20 mM HEPES, 5 mM MgCl2, 85.5 g l−1 sucrose, 25 mM NaCl and 1 mM DTT) was used for cytosol removal, and nuclear extraction was done in BA450 buffer (20 mM HEPES, 450 mM NaCl, 5% glycerol and 0.2 mM EDTA). FLAG affinity purification and FLAG peptide elution were performed similarly in the nuclear fraction.
The FLAG-MAZ recombinant protein used in electrophoretic mobility shift assays (EMSA) was purified from 293FT cells expressing CβF expression vectors containing cDNA for MAZ as described before69,70. The nuclear extraction was performed as detailed above with TMSD buffer followed by BA450 buffer. FLAG affinity purification was performed with BA450 as the wash buffer, and FLAG-MAZ was eluted with FLAG peptide. The purity of FLAG-MAZ was assessed by Coomassie blue staining (~95% purity).
Library construction
All libraries were prepared according to the manufacturer’s instructions (Illumina). CRISPR libraries were prepared by performing two-step PCRs as described elsewhere23. Briefly, sgRNAs were amplified from genomic DNA by keeping the coverage maintained throughout the screens (300× for the GeCKO v2 library and 1,000× for the custom library in secondary screens) and performing secondary amplifications using Phusion polymerase (New England Biolabs) to attach Illumina adaptors (Supplementary Table 3). ChIP-seq libraries were prepared as described previously16. RNA-seq libraries were prepared using KAPA library preparation kits. Libraries for 4C-seq were constructed by attaching barcoded Illumina adapters to the 5ʹ end of the primer as described previously16 (Supplementary Table 6). PCR reactions were performed using the Expand Long Template PCR System (Roche), and approximately 100–700 bp DNA was gel purified and quantified before sequencing. Hi-C libraries were prepared according to the ARIMA standard Hi-C protocols by the NYU Grossman School of Medicine’s Genome Technology Center.
Electrophoretic mobility shift assays
Single-stranded oligonucleotides with MAZ DNA binding sites from the mouse HoxA and HoxD loci were annealed and radioisotope-labeled using 400 pmol double-stranded DNA, T4 PNK (Thermo Fisher Scientific, EK0031) and [γ-32P]ATP (Supplementary Table 5). The probes were purified by G-25 columns (GE Healthcare, 27532501). After the labeling reaction, 40 pM probe was resuspended in binding buffer (25 mM HEPES, 50 mM NaCl, 5% glycerol, 5 mM MgCl2, 1 mM ZnSO4 and 2 μg salmon sperm DNA). The reactions were then incubated with increasing amounts of mouse recombinant MAZ (0.25, 0.5 and 0.75 μg) for 4 h at 30 °C. After the incubation, the reactions were run on 5% acrylamide gels in 0.25× TAE buffer for 30 min at room temperature and 200 V. Finally, the acrylamide gel was dried and exposed overnight.
CRISPR zygotic injection
MAZ Hoxa5|6 mutant mice were generated by zygotic injection71 as described previously17. Briefly, 50 ng µl−1 gRNA template (Synthego) and 100 ng µl−1 Cas9 mRNA were injected into the cytoplasm of ~150 C57BL/6 zygotes in the NYU Grossman School of Medicine’s Rodent Genetic Engineering Laboratory. Surviving embryos were transferred to four pseudopregnant females, and a total of 27 pups were born. These pups were genotyped by PCR using genotyping primers (Supplementary Table 1) and Sanger sequencing, indicating the genomic alterations as summarized in Supplementary Figure 6. When required, TOPO cloning was applied to reveal different alleles by Sanger sequencing (Supplementary Fig. 6). Mouse studies were approved by NYU Grossman School of Medicine’s Institutional Animal Care and Use Committee. Housing conditions were as follows: dark/light cycle, 6:30 pm to 6:30 am (off) / 6:30 am to 6:30 pm (on); temperature, 21 °C ± 1 or 2 °C; and humidity range, 30–70%.
Alcian blue–Alizarin red staining of skeleton
The neonates (postnatal day 0.5) were dissected by removing the skin and organs, and skeletal staining was performed as described previously17. Embryos were fixed for 4 days in 95% ethanol with rocking at room temperature. Ethanol was replaced with Alcian blue stain (0.03% Alcian blue, 80% ethanol and 20% acetic acid) for 24 h with rocking at room temperature. Embryos were washed with 95% ethanol twice for 1 h each time with rocking at room temperature and transferred to 2% KOH solution for 24 h. The specimens were then stained with Alizarin red solution (0.03% Alizarin red and 1% KOH in water) for 24 h. Finally, the skeleton was further washed in 1% KOH/20% glycerol for 6 days, 1% KOH/50% glycerol for 10 days and passed to 100% glycerol. In case of long-term storage, the skeletons were transferred to 4:1 glycerol/ethanol.
Data analysis of CRISPR screens
MAGeCK (version 0.5.7) was used for all primary and secondary CRISPR screen analyses27,28. Genome-wide screens with GeCKO v2 library A (three sgRNAs per gene) and GeCKO v2 library B (three sgRNAs per gene) were analyzed together in total populations of ESCs and MNs to identify essential/differentiation-related genes (negative selection). The analysis was done separately for library A (two screens) and library B (two screens) in sorted MN populations to identify genes affecting CTCF-boundary function (positive selection). When there is no replicate in a CRISPR screen, MAGeCK estimates the mean and variance of all samples from both control and treated conditions, assuming that most sgRNAs have no effect on selection27. The PANTHER database was used for GO analysis72, and the PANTHER overrepresentation test tool was utilized for statistical analysis73. To generate Venn diagrams in CRISPR screens, web tools (http://genevenn.sourceforge.net) were used.
Data analysis of RNA-seq
RNA-seq data were analyzed as described previously16. Briefly, sequence reads were mapped to mm10 reference genome with bowtie2 (version 2.3.4.1) (ref. 74), and normalized differential gene expression was obtained with DESeq2 (version 1.26.0) (refs. 75,76). Differential gene expression analysis was performed using the Wald test built into DESeq2 with an FDR cutoff of 0.05. Relevant expression and P values are listed for differentially expressed genes in Supplementary Datasets 10, 11, 13 and 14. The PANTHER database was used for GO analysis72.
Data analysis of ChIP-seq
ChIP-seq experiments were analyzed as described previously62. In brief, sequence reads were mapped to mm10 reference genome with bowtie2 (version 2.3.4.1) using default parameters74. Quality filtering and removal of PCR duplicates were performed by using SAMtools (version 1.9) (ref. 77). After normalization with the spike-in Drosophila read counts, normalized ChIP-seq read densities were visualized in Integrative Genomics Viewer version 2.4.14 (ref. 78). MACS (version 1.4.2) was used for narrow peak calling using default parameters of macs2 (ref. 79). Heat maps were generated using deepTools in R (version 3.1.2) (ref. 80). The ChIPpeakAnno package (version 3.20.1) from Bioconductor81 was used to draw Venn diagrams to visualize the overlap among ChIP-seq samples. BEDTools (version 2.27.1) was also used to assess overlaps82. The replicates were assessed similarly by visualization in Integrative Genomics Viewer (version 2.4.14) and by generating heat maps. ChIP-seq BED file coordinates were converted into FASTA format using the fetch sequences tool within Regulatory Sequence Analysis Tools83; MEME (version 5.4.1) was used for motif analysis of MAZ in ESCs and MNs84, SpaMo (version 5.4.1) was used for distance analysis between CTCF and MAZ motifs in ESCs and MNs85 and Tomtom (version 5.4.1) was used as a motif comparison tool86. CTCF and MAZ occupancies in the subset of genes shown in Extended Data Fig. 5e were analyzed by using EaSeq software (version 1.111)87.
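The spike-in normalization step can be illustrated with a minimal sketch. It assumes a simple per-sample scale factor that is inversely proportional to the number of reads mapped to the Drosophila genome, so that every sample is scaled to the sample with the fewest spike-in reads. The exact scaling applied in the cited pipeline62 is not restated here, and the function names and read counts are hypothetical.

```python
from typing import Dict, List


def spike_in_scale_factors(spike_reads: Dict[str, int]) -> Dict[str, float]:
    """Return one factor per sample, scaling to the smallest spike-in count."""
    if not spike_reads:
        return {}
    if any(n <= 0 for n in spike_reads.values()):
        raise ValueError("every sample needs at least one spike-in read")
    reference = min(spike_reads.values())
    return {sample: reference / n for sample, n in spike_reads.items()}


def normalize_coverage(coverage: List[float], factor: float) -> List[float]:
    """Multiply a per-bin coverage track by the sample's scale factor."""
    return [value * factor for value in coverage]
```

Under this assumption, a sample with twice as many Drosophila reads as the reference sample has its mouse coverage track halved before visualization.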
Data analysis of 4C-seq
4C-seq data were analyzed using the 4C-ker (version 0.0.0.9000) pipeline88. Briefly, reads were mapped to mm10 reduced genome, and undigested and self-ligated fragments were removed. Near-bait analysis was generated in R by using 4C-ker tools.
Data analysis of Hi-C
All samples were prepared in two biological replicates. All Hi-C data were analyzed by the Hi-C bench platform (version 0.1) (ref. 89). Throughout our comprehensive analysis, the following operations were done using Hi-C bench. Internally, bowtie2 (ref. 90) was used to align the paired reads using mm10 reference genome and only the read pairs uniquely mapped to the same chromosome with the mapping quality ≥20 and the pair distance ≥25 kb were used. Then, the interaction matrix was tabulated by reading the coordinates of aligned reads in 20-kb bins. To ensure that each interaction bin showed equal visibility, the iterative correction method91 was used to normalize the bins.
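The read-pair filtering and binning criteria above can be sketched as follows. This is an illustrative reimplementation and not the Hi-C bench code: it assumes that the mapping-quality threshold applies to both mates, represents each read pair with a hypothetical `ReadPair` record, and omits the iterative correction step.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical minimal record for one aligned Hi-C read pair."""
    chrom1: str
    pos1: int
    mapq1: int
    chrom2: str
    pos2: int
    mapq2: int


def keep_pair(pair: ReadPair, min_mapq: int = 20, min_distance: int = 25_000) -> bool:
    """Keep cis pairs with both mates at MAPQ >= 20 and separation >= 25 kb."""
    return (
        pair.chrom1 == pair.chrom2
        and pair.mapq1 >= min_mapq
        and pair.mapq2 >= min_mapq
        and abs(pair.pos2 - pair.pos1) >= min_distance
    )


def bin_contacts(pairs: Iterable[ReadPair], bin_size: int = 20_000) -> Counter:
    """Count retained pairs per (chromosome, bin_i, bin_j) with bin_i <= bin_j."""
    counts: Counter = Counter()
    for pair in pairs:
        if not keep_pair(pair):
            continue
        i, j = sorted((pair.pos1 // bin_size, pair.pos2 // bin_size))
        counts[(pair.chrom1, i, j)] += 1
    return counts
```

Storing only the upper triangle (bin_i ≤ bin_j) is a design choice of this sketch; a symmetric matrix can be recovered from it before normalization.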
For the compartment analysis, the Hi-C interaction bins were divided into A and B compartments using the first principal component values from HOMER’s (version 4.11) runHiCpca92,93. Using Hi-C-bench, the compartment changes from comparison of two cell types for the bins in the interaction matrix were visualized by the stacked bar plot.
TADs were defined as shown before89,94 with the insulating window of 500 kb. The boundaries of TADs were called from the boundary score using the “ratio” method defined89, wherein each TAD boundary had a noticeably lower boundary score than the neighboring region. The score was calculated for each 20-kb bin using the window size of 250 kb, 500 kb and 1,000 kb. In the principal-component analysis to distinguish the differences, the boundary score for every replicate and cell type was combined, quantile normalized and plotted. Then, for each TAD, the magnitude of intra-TAD “activity” was defined as reported previously94. The cutoff for significantly differential TADs was Benjamini–Hochberg corrected Q value of 0.05 and no cutoff for the fold change.
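For reference, the Benjamini–Hochberg correction used for the differential-TAD cutoff can be written as a short function. This is a generic textbook implementation, not the code used within Hi-C bench, and the function name is illustrative.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted Q values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that Q values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values
```

A TAD would then be called significantly differential when its Q value is below 0.05, with no additional fold-change filter.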
Significantly enriched chromatin loops were called using FitHi-C (version 2.0.7) (ref. 95) with default parameters. To characterize the loops by CTCF and MAZ ChIP-seq levels, APA software46 was used to show the averaged profile. When filtering the Hi-C loops for the occupancy of CTCF and MAZ, a binary cutoff was placed such that the ChIP-seq signal at the anchors had values shown in Supplementary Table 4. Genome sequences matching the transcription factor motifs of mouse CTCF and MAZ from the Catalog of Inferred Sequence Binding Preferences96 were identified with PWMScan (version 1.1.9) (ref. 97). Visualizations of Hi-C and associated ChIP-seq data were generated with pyGenomeTracks (version 3.5) (ref. 98).
Analysis of CTCF/MAZ motif orientation in Hi-C anchors
A chromatin loop found by Hi-C can have one or multiple motif hits of transcription factors such as CTCF or MAZ, in either the 5ʹ or 3ʹ anchors or both. The similarity of sequence between the loci and the known transcription factor motifs was calculated using the motifFinder feature of Juicer (version 1.5) (ref. 99), and the location and the direction of motif matches were produced. To reduce the complexity and the potential false positives, the sequences were compared only at the intersection of loop anchors and the ChIP-seq peaks for respective transcription factors. Find Individual Motif Occurrences (FIMO) of the MEME suite (version 5.2.0) (refs. 100,101) was used with a P value cutoff of 10^−3 to associate anchors with motifs. In case of multiple motif hits in the anchors, motifFinder reported the one with the highest score. One of the CTCF motifs was chosen from the M1 motif102 and downloaded from Juicer’s reference data. Also, the position-weight matrices of CTCF and MAZ motifs found by our study (Extended Data Fig. 5b,c) were used.
For the pairwise motif orientation from 5ʹ and 3ʹ anchors, only the cases wherein motifs were located in both anchors were considered. If a loop contained motif hits wherein its 5ʹ anchor harbored a positive direction and its 3ʹ anchor had a negative direction, the loop was defined as having a convergent motif hit. In case of the negative direction on the 5ʹ anchor and the positive direction on the 3ʹ anchor, the loop was defined to contain a divergent motif hit. If both anchors carried motifs in the same direction (both positive or both negative), then the loop was defined as tandem. The proportion of convergent, tandem or divergent loops over the sum of loop groups was compared across experiment types.
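The orientation rules above can be expressed compactly in code. The sketch below assumes that each loop has already been reduced to one motif direction per anchor, encoded as "+" or "-"; the function names are illustrative and are not part of Juicer or the MEME suite.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

VALID_STRANDS = {"+", "-"}


def classify_loop(strand_5p: str, strand_3p: str) -> str:
    """Classify a loop from the motif direction at its 5' and 3' anchors."""
    if strand_5p not in VALID_STRANDS or strand_3p not in VALID_STRANDS:
        raise ValueError("strands must be '+' or '-'")
    if strand_5p == "+" and strand_3p == "-":
        return "convergent"
    if strand_5p == "-" and strand_3p == "+":
        return "divergent"
    return "tandem"  # both anchors point in the same direction


def orientation_proportions(loops: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Proportion of convergent, tandem and divergent loops among all loops."""
    counts = Counter(classify_loop(five, three) for five, three in loops)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: counts[name] / total
            for name in ("convergent", "tandem", "divergent")}
```

Loops lacking a motif in either anchor are expected to be excluded before calling `orientation_proportions`, in line with the restriction to loops with motifs in both anchors.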
Statistical analysis
Statistical analyses related to experiments are described above in each section. Statistical analyses in bar plots were performed using GraphPad Prism (version 9.2.0). The R package pcr (ref. 103) was used in Extended Data Fig. 4e–g.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41588-021-01008-5.
Acknowledgements
We thank L. Vales for reading and guidance on the manuscript; S. Tu, S. Krishnan and T. Escobar for discussions; O. Oksuz and B. Akgol Oksuz for initial discussions regarding sequencing data analysis; Y. Grobler for providing Drosophila S2R+ cells; S. Liu for discussions on CUT&RUN; D. Hernandez for technical assistance with mice; past and present members of the Reinberg laboratory for discussions as the work was in progress; G. Felsenfeld for sharing his group’s MAZ-related manuscript with us; D. J. Lamb and M. A. O’Neill for initial discussions about mice; and P. Lhoumaud for discussions on 4C-seq. We also thank the NYU Grossman School of Medicine’s Genome Technology Center, particularly A. Heguy, P. Zappile and P. Meyn, for help with sequencing; the Applied Bioinformatics Laboratories for providing bioinformatics support; NYU Grossman School of Medicine’s Cytometry and Cell Sorting Core, particularly P. Lopez and M. Gregory for help with FACS; NYU Grossman School of Medicine’s Rodent Genetic Engineering Laboratory, particularly S. Kim, for help with generating the mice; and Biological Mass Spectrometry Facility of Robert Wood Johnson Medical School and Rutgers, particularly H. Zheng, for help with proteomics. This study utilized computing resources at the High-Performance Computing Facility of the Center for Health Informatics and Bioinformatics at NYU Grossman School of Medicine. This work was supported by National Institutes of Health (NIH) grant R01NS100897 and the Howard Hughes Medical Institute (D.R.); NIH grant R01NS100897 (E.O.M.); American Cancer Society grant RSG-15-189-01-RMC, St. Baldrick’s Foundation grant 581357 and National Cancer Institute/NIH grant P01CA229086-01A1 (A.T.); NIH grants R35GM122515 and P01CA229086 (J.A.S.); NIH grant F31HD090892 (H.O.); Alex’s Lemonade Stand Foundation for Childhood Cancer (G.L.); and the Memorial Sloan Kettering T32 Clinical Scholars Program (V.N.). 
The NYU Grossman School of Medicine’s Genome Technology Center, the Applied Bioinformatics Laboratories, the NYU Grossman School of Medicine’s Cytometry and Cell Sorting Core, and NYU Grossman School of Medicine’s Rodent Genetic Engineering Laboratory are supported partially by NIH/National Cancer Institute Support Grant P30CA016087 at the Laura and Isaac Perlmutter Cancer Center. The Biological Mass Spectrometry Facility of Robert Wood Johnson Medical School and Rutgers is supported by NIH shared instrumentation grant S10OD01640.
Author contributions
H.O., E.O.M. and D.R. conceived the project, designed the experiments and wrote the paper; H.O. performed most of the experiments and the bioinformatic analysis (except for Hi-C); P.-Y.H. helped with immunoprecipitation, RT-qPCRs, mice and analysis; H.C. performed the bioinformatic analysis of Hi-C under the supervision of A.T.; V.N. advised in the initial design of study; G.L. advised on the mass spectrometry; E.G.-B. performed in vitro binding assays; and J.A.S. advised on the progression of this study.
Peer review
Peer review information
Nature Genetics thanks Guillaume Andrey and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Data availability
Sequencing data have been deposited at Gene Expression Omnibus (GSE157139). We used the publicly available datasets in Fig. 5b–e pertaining to CTCF-degron ESCs (GEO GSE98671 and GSE156868). The list of differentially expressed genes in CTCF-degron ESCs used in Fig. 2m was previously reported42. Proteomic data have been deposited to the ProteomeXchange Consortium via PRIDE (PXD030452 and PXD030543). Supplementary Datasets 1–14 are provided with this paper. Source data are provided with this paper.
Code availability
Analysis tools used in this study have been published before as described in Methods and Reporting Summary.
Competing interests
D.R. was a cofounder of Constellation Pharmaceuticals and Fulcrum Therapeutics but currently has no affiliation with either company. The other authors have no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data is available for this paper at 10.1038/s41588-021-01008-5.
Supplementary information
The online version contains supplementary material available at 10.1038/s41588-021-01008-5.
References
- 1. Misteli T. Beyond the sequence: cellular organization of genome function. Cell. 2007;128:787–800. doi: 10.1016/j.cell.2007.01.028.
- 2. Van Bortle K, et al. Drosophila CTCF tandemly aligns with other insulator proteins at the borders of H3K27me3 domains. Genome Res. 2012;22:2176–2187. doi: 10.1101/gr.136788.111.
- 3. Phillips-Cremins JE, Corces VG. Chromatin insulators: linking genome organization to cellular function. Mol. Cell. 2013;50:461–474. doi: 10.1016/j.molcel.2013.04.018.
- 4. Ong CT, Corces VG. CTCF: an architectural protein bridging genome topology and function. Nat. Rev. Genet. 2014;15:234–246. doi: 10.1038/nrg3663.
- 5. Lobanenkov VV, et al. A novel sequence-specific DNA binding protein which interacts with three regularly spaced direct repeats of the CCCTC-motif in the 5’-flanking sequence of the chicken c-myc gene. Oncogene. 1990;5:1743–1753.
- 6. Ohlsson R, Renkawitz R, Lobanenkov V. CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends Genet. 2001;17:520–527. doi: 10.1016/s0168-9525(01)02366-6.
- 7. Heath H, et al. CTCF regulates cell cycle progression of alphabeta T cells in the thymus. EMBO J. 2008;27:2839–2850. doi: 10.1038/emboj.2008.214.
- 8. Splinter E, et al. CTCF mediates long-range chromatin looping and local histone modification in the beta-globin locus. Genes Dev. 2006;20:2349–2354. doi: 10.1101/gad.399506.
- 9. Dixon JR, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
- 10. Nora EP, et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature. 2012;485:381–385. doi: 10.1038/nature11049.
- 11. Ciabrelli F, Cavalli G. Chromatin-driven behavior of topologically associating domains. J. Mol. Biol. 2015;427:608–625. doi: 10.1016/j.jmb.2014.09.013.
- 12. Zlatanova J, Caiafa P. CTCF and its protein partners: divide and rule? J. Cell Sci. 2009;122:1275–1284. doi: 10.1242/jcs.039990.
- 13. Rubio ED, et al. CTCF physically links cohesin to chromatin. Proc. Natl Acad. Sci. USA. 2008;105:8309–8314. doi: 10.1073/pnas.0801273105.
- 14. Wendt KS, et al. Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature. 2008;451:796–801. doi: 10.1038/nature06634.
- 15. Cuddapah S, et al. Global analysis of the insulator binding protein CTCF in chromatin barrier regions reveals demarcation of active and repressive domains. Genome Res. 2009;19:24–32. doi: 10.1101/gr.082800.108.
- 16. Narendra VR, et al. CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. Science. 2015;347:1017–1021. doi: 10.1126/science.1262088.
- 17. Narendra V, Bulajic M, Dekker J, Mazzoni EO, Reinberg D. CTCF-mediated topological boundaries during development foster appropriate gene regulation. Genes Dev. 2016;30:2657–2662. doi: 10.1101/gad.288324.116.
- 18. Nitzsche A, et al. RAD21 cooperates with pluripotency transcription factors in the maintenance of embryonic stem cell identity. PLoS One. 2011;6:e19470. doi: 10.1371/journal.pone.0019470.
- 19. Ran FA, et al. Genome engineering using the CRISPR-Cas9 system. Nat. Protoc. 2013;8:2281–2308. doi: 10.1038/nprot.2013.143.
- 20. Liu JP, Laufer E, Jessell TM. Assigning the positional identity of spinal motor neurons: rostrocaudal patterning of Hox-c expression by FGFs, Gdf11, and retinoids. Neuron. 2001;32:997–1012. doi: 10.1016/s0896-6273(01)00544-x.
- 21. Mazzoni EO, et al. Saltatory remodeling of Hox chromatin in response to rostrocaudal patterning signals. Nat. Neurosci. 2013;16:1191–1198. doi: 10.1038/nn.3490.
- 22. Sanjana NE, Shalem O, Zhang F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods. 2014;11:783–784. doi: 10.1038/nmeth.3047.
- 23. Shalem O, et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science. 2014;343:84–87. doi: 10.1126/science.1247005.
- 24. Wichterle H, Lieberam I, Porter JA, Jessell TM. Directed differentiation of embryonic stem cells into motor neurons. Cell. 2002;110:385–397. doi: 10.1016/s0092-8674(02)00835-8.
- 25. Shalem O, Sanjana NE, Zhang F. High-throughput functional genomics using CRISPR-Cas9. Nat. Rev. Genet. 2015;16:299–311. doi: 10.1038/nrg3899.
- 26. Parnas O, et al. A genome-wide CRISPR screen in primary immune cells to dissect regulatory networks. Cell. 2015;162:675–686. doi: 10.1016/j.cell.2015.06.059.
- 27. Li W, et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 2014;15:554. doi: 10.1186/s13059-014-0554-4.
- 28.Li W, et al. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biol. 2015;16:281. doi: 10.1186/s13059-015-0843-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Yusufzai TM, Tagami H, Nakatani Y, Felsenfeld G. CTCF tethers an insulator to subnuclear sites, suggesting shared insulator mechanisms across species. Mol. Cell. 2004;13:291–298. doi: 10.1016/s1097-2765(04)00029-2. [DOI] [PubMed] [Google Scholar]
- 30.Soldi M, Bonaldi T. The ChroP approach combines ChIP and mass spectrometry to dissect locus-specific proteomic landscapes of chromatin. J. Vis. Exp. 2014;(86):51220. [DOI] [PMC free article] [PubMed]
- 31.Wang CI, et al. Chromatin proteins captured by ChIP-mass spectrometry are linked to dosage compensation in Drosophila. Nat. Struct. Mol. Biol. 2013;20:202–209. doi: 10.1038/nsmb.2477. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Rafiee MR, Girardot C, Sigismondo G, Krijgsveld J. Expanding the circuitry of pluripotency by selective isolation of chromatin-associated proteins. Mol. Cell. 2016;64:624–635. doi: 10.1016/j.molcel.2016.09.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Tu S, LeRoy G, Reinberg D. Chromatin starts to come clean. Mol. Cell. 2016;64:439–441. doi: 10.1016/j.molcel.2016.10.022. [DOI] [PubMed] [Google Scholar]
- 34.Van Bortle K, et al. Insulator function and topological domain border strength scale with architectural protein occupancy. Genome Biol. 2014;15:R82. doi: 10.1186/gb-2014-15-5-r82. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Xiao T, Li X, Felsenfeld G. The Myc-associated zinc finger protein (MAZ) works together with CTCF to control cohesin positioning and genome organization. Proc. Natl Acad. Sci. USA. 2021;118:e2023127118. doi: 10.1073/pnas.2023127118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Zhang K, Li N, Ainsworth RI, Wang W. Systematic identification of protein combinations mediating chromatin looping. Nat. Commun. 2016;7:12249. doi: 10.1038/ncomms12249. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Zhang S, Liang Y, Wang X, Su Z, Chen Y. FisherMP: fully parallel algorithm for detecting combinatorial motifs from large ChIP-seq datasets. DNA Res. 2019;26:231–242. doi: 10.1093/dnares/dsz004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Bailey SD, et al. ZNF143 provides sequence specificity to secure chromatin interactions at gene promoters. Nat. Commun. 2015;2:6186. doi: 10.1038/ncomms7186. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Zhou Q, et al. ZNF143 mediates CTCF-bound promoter-enhancer loops required for murine hematopoietic stem and progenitor cell function. Nat. Commun. 2021;12:43. doi: 10.1038/s41467-020-20282-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Bossone SA, Asselin C, Patel AJ, Marcu KB. Maz, a zinc finger protein, binds to C-Myc and C2 gene-sequences regulating transcriptional initiation and termination. Proc. Natl Acad. Sci. USA. 1992;89:7452–7456. doi: 10.1073/pnas.89.16.7452. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Kennedy GC, Rutter WJ. Pur-1, a zinc-finger protein that binds to purine-rich sequences, transactivates an insulin promoter in heterologous cells. Proc. Natl Acad. Sci. USA. 1992;89:11498–11502. doi: 10.1073/pnas.89.23.11498. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Nora EP, et al. Targeted degradation of CTCF decouples local insulation of chromosome domains from genomic compartmentalization. Cell. 2017;169:930–944. doi: 10.1016/j.cell.2017.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Saldana-Meyer R, et al. RNA interactions are essential for CTCF-mediated genome organization. Mol. Cell. 2019;76:412–422. doi: 10.1016/j.molcel.2019.08.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Song J, et al. Two consecutive zinc fingers in Sp1 and in MAZ are essential for interactions with cis-elements. J. Biol. Chem. 2001;276:30429–30434. doi: 10.1074/jbc.M103968200. [DOI] [PubMed] [Google Scholar]
- 45.de Wit E, et al. CTCF binding polarity determines chromatin looping. Mol. Cell. 2015;60:676–684. doi: 10.1016/j.molcel.2015.09.023. [DOI] [PubMed] [Google Scholar]
- 46.Rao SS, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Nora EP, et al. Molecular basis of CTCF binding polarity in genome folding. Nat. Commun. 2020;11:5612. doi: 10.1038/s41467-020-19283-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Davidson IF, et al. DNA loop extrusion by human cohesin. Science. 2019;366:1338–1345. doi: 10.1126/science.aaz3418. [DOI] [PubMed] [Google Scholar]
- 49.Li Y, et al. The structural basis for cohesin-CTCF-anchored loops. Nature. 2020;578:472–476. doi: 10.1038/s41586-019-1910-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.McIntyre DC, et al. Hox patterning of the vertebrate rib cage. Development. 2007;134:2981–2989. doi: 10.1242/dev.007567. [DOI] [PubMed] [Google Scholar]
- 51.Huang H, et al. CTCF mediates dosage- and sequence-context-dependent transcriptional insulation by forming local chromatin domains. Nat. Genet. 2021;53:1064–1074. doi: 10.1038/s41588-021-00863-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Rao SSP, et al. Cohesin loss eliminates all loop domains. Cell. 2017;171:305–320. doi: 10.1016/j.cell.2017.09.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Haller M, Au J, O’Neill M, Lamb DJ. 16p11.2 transcription factor MAZ is a dosage-sensitive regulator of genitourinary development. Proc. Natl Acad. Sci. USA. 2018;115:E1849–E1858. doi: 10.1073/pnas.1716092115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Medina-Martinez O, et al. The transcription factor Maz is essential for normal eye development. Dis. Model Mech. 2020;13:dmm044412. doi: 10.1242/dmm.044412. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Darbellay F, et al. The constrained architecture of mammalian Hox gene clusters. Proc. Natl Acad. Sci. USA. 2019;116:13424–13433. doi: 10.1073/pnas.1904602116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Kostic D, Capecchi MR. Targeted disruptions of the murine Hoxa-4 and Hoxa-6 genes result in homeotic transformations of components of the vertebral column. Mech. Dev. 1994;46:231–247. doi: 10.1016/0925-4773(94)90073-6. [DOI] [PubMed] [Google Scholar]
- 57.Jeannotte L, Lemieux M, Charron J, Poirier F, Robertson EJ. Specification of axial identity in the mouse: role of the Hoxa-5 (Hox1.3) gene. Genes Dev. 1993;7:2085–2096. doi: 10.1101/gad.7.11.2085. [DOI] [PubMed] [Google Scholar]
- 58.Dasen JS, Liu JP, Jessell TM. Motor neuron columnar fate imposed by sequential phases of Hox-c activity. Nature. 2003;425:926–933. doi: 10.1038/nature02051. [DOI] [PubMed] [Google Scholar]
- 59.Garcia-Gasca A, Spyropoulos DD. Differential mammary morphogenesis along the anteroposterior axis in Hoxc6 gene targeted mice. Dev. Dyn. 2000;219:261–276. doi: 10.1002/1097-0177(2000)9999:9999<::AID-DVDY1048>3.0.CO;2-3. [DOI] [PubMed] [Google Scholar]
- 60.Doench JG, et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 2016;34:184–191. doi: 10.1038/nbt.3437. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Joung J, et al. Genome-scale CRISPR-Cas9 knockout and transcriptional activation screening. Nat. Protoc. 2017;12:828–863. doi: 10.1038/nprot.2017.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Oksuz O, et al. Capturing the onset of PRC2-mediated repressive domain formation. Mol. Cell. 2018;70:1149–1162. doi: 10.1016/j.molcel.2018.05.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Skene PJ, Henikoff S. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. Elife. 2017;6:e21856. doi: 10.7554/eLife.21856. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Meers MP, Bryson TD, Henikoff JG, Henikoff S. Improved CUT&RUN chromatin profiling tools. Elife. 2019;8:e46314. doi: 10.7554/eLife.46314. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Skene PJ, Henikoff JG, Henikoff S. Targeted in situ genome-wide profiling with high efficiency for low cell numbers. Nat. Protoc. 2018;13:1006–1019. doi: 10.1038/nprot.2018.015. [DOI] [PubMed] [Google Scholar]
- 66.van de Werken HJ, et al. 4C technology: protocols and data analysis. Methods Enzymol. 2012;513:89–112. doi: 10.1016/B978-0-12-391938-0.00004-5. [DOI] [PubMed] [Google Scholar]
- 67.Dignam JD, Lebovitz RM, Roeder RG. Accurate transcription initiation by RNA polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res. 1983;11:1475–1489. doi: 10.1093/nar/11.5.1475. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Tu S, et al. Co-repressor CBFA2T2 regulates pluripotency and germline development. Nature. 2016;534:387–390. doi: 10.1038/nature18004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.LeRoy G, et al. LEDGF and HDGF2 relieve the nucleosome-induced barrier to transcription in differentiated cells. Sci. Adv. 2019;5:eaay3068. doi: 10.1126/sciadv.aay3068. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.LeRoy G, Rickards B, Flint SJ. The double bromodomain proteins Brd2 and Brd3 couple histone acetylation to transcription. Mol. Cell. 2008;30:51–60. doi: 10.1016/j.molcel.2008.01.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Yang H, Wang H, Jaenisch R. Generating genetically modified mice using CRISPR/Cas-mediated genome engineering. Nat. Protoc. 2014;9:1956–1968. doi: 10.1038/nprot.2014.134. [DOI] [PubMed] [Google Scholar]
- 72.Mi H, Muruganujan A, Ebert D, Huang X, Thomas PD. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res. 2019;47:D419–D426. doi: 10.1093/nar/gky1038. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Mi H, Muruganujan A, Casagrande JT, Thomas PD. Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc. 2013;8:1551–1566. doi: 10.1038/nprot.2013.092. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106. doi: 10.1186/gb-2010-11-10-r106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Robinson JT, et al. Integrative genomics viewer. Nat. Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Zhang Y, et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Ramirez F, Dundar F, Diehl S, Gruning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014;42:W187–W191. doi: 10.1093/nar/gku365. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Zhu LJ, et al. ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinf. 2010;11:237. doi: 10.1186/1471-2105-11-237. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Nguyen NTT, et al. RSAT 2018: regulatory sequence analysis tools 20th anniversary. Nucleic Acids Res. 2018;46:W209–W214. doi: 10.1093/nar/gky317. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int Conf. Intell. Syst. Mol. Biol. 1994;2:28–36. [PubMed] [Google Scholar]
- 85.Whitington T, Frith MC, Johnson J, Bailey TL. Inferring transcription factor complexes from ChIP-seq data. Nucleic Acids Res. 2011;39:e98. doi: 10.1093/nar/gkr341. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS. Quantifying similarity between motifs. Genome Biol. 2007;8:R24. doi: 10.1186/gb-2007-8-2-r24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Lerdrup M, Johansen JV, Agrawal-Singh S, Hansen K. An interactive environment for agile analysis and visualization of ChIP-sequencing data. Nat. Struct. Mol. Biol. 2016;23:349–357. doi: 10.1038/nsmb.3180. [DOI] [PubMed] [Google Scholar]
- 88.Raviram R, et al. 4C-ker: a method to reproducibly identify genome-wide interactions captured by 4C-seq experiments. PLoS Comput. Biol. 2016;12:e1004780. doi: 10.1371/journal.pcbi.1004780. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Lazaris C, Kelly S, Ntziachristos P, Aifantis I, Tsirigos A. HiC-bench: comprehensive and reproducible Hi-C data analysis designed for parameter exploration and benchmarking. BMC Genomics. 2017;18:22. doi: 10.1186/s12864-016-3387-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91.Imakaev M, et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods. 2012;9:999–1003. doi: 10.1038/nmeth.2148. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Heinz S, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Heinz S, et al. Transcription elongation can affect genome 3D structure. Cell. 2018;174:1522–1536.e22. doi: 10.1016/j.cell.2018.07.047. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Kloetgen A, et al. Three-dimensional chromatin landscapes in T cell acute lymphoblastic leukemia. Nat. Genet. 2020;52:388–400. doi: 10.1038/s41588-020-0602-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95.Ay F, Bailey TL, Noble WS. Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res. 2014;24:999–1011. doi: 10.1101/gr.160374.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96.Weirauch MT, et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell. 2014;158:1431–1443. doi: 10.1016/j.cell.2014.08.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 97.Ambrosini G, Groux R, Bucher P. PWMScan: a fast tool for scanning entire genomes with a position-specific weight matrix. Bioinformatics. 2018;34:2483–2484. doi: 10.1093/bioinformatics/bty127. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 98.Ramirez F, et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 2018;9:189. doi: 10.1038/s41467-017-02525-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 99.Durand NC, et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 100.Bailey TL, et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37:W202–W208. doi: 10.1093/nar/gkp335. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101.Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–1018. doi: 10.1093/bioinformatics/btr064. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102.Schmidt D, et al. Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages. Cell. 2012;148:335–348. doi: 10.1016/j.cell.2011.11.058. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103.Ahmed M, Kim DR. pcr: an R package for quality assessment, analysis and testing of qPCR data. PeerJ. 2018;6:e4473. doi: 10.7717/peerj.4473. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
Sequencing data have been deposited at Gene Expression Omnibus (GSE157139). We used the publicly available datasets in Fig. 5b–e pertaining to CTCF-degron ESCs (GEO GSE98671 and GSE156868). The list of differentially expressed genes in CTCF-degron ESCs used in Fig. 2m was previously reported42. Proteomic data have been deposited to the ProteomeXchange Consortium via PRIDE (PXD030452 and PXD030543). Supplementary Datasets 1–14 are provided with this paper. Source data are provided with this paper.
The analysis tools used in this study have been published previously, as described in the Methods and Reporting Summary.