iScience. 2019 Jul 30;19:326–339. doi: 10.1016/j.isci.2019.07.041

TET-Catalyzed 5-Carboxylcytosine Promotes CTCF Binding to Suboptimal Sequences Genome-wide

Kyster K Nanan 1,2,5, David M Sturgill 1,5, Maria F Prigge 1, Morgan Thenoz 1,3, Allissa A Dillman 1,4, Mariana D Mandler 1, Shalini Oberdoerffer 1,6
PMCID: PMC6699469  PMID: 31404833

Summary

The mechanisms supporting dynamic regulation of CTCF-binding sites remain poorly understood. Here we describe the TET-catalyzed 5-methylcytosine derivative, 5-carboxylcytosine (5caC), as a factor driving new CTCF binding within genomic DNA. Through a combination of in vivo and in vitro approaches, we reveal that 5caC generally strengthens CTCF association with DNA and facilitates binding to suboptimal sequences. Dramatically, profiling of CTCF binding in a cellular model that accumulates genomic 5caC identified ~13,000 new CTCF sites. The new sites were enriched for overlapping 5caC and were marked by an overall reduction in CTCF motif strength. As CTCF has multiple roles in gene expression, these findings have wide-reaching implications and point to induced 5caC as a potential mechanism to achieve differential CTCF binding in cells.

Subject Areas: Genomics, Molecular Genetics, Molecular Mechanism of Gene Regulation

Graphical Abstract


Highlights

  • TET-catalyzed 5-carboxylcytosine (5caC) strengthens CTCF association with DNA

  • 5caC accumulation in genomic DNA through Tdg deletion produced ~13,000 new CTCF sites

  • Gains in CTCF binding were most pronounced at locations with suboptimal sequence motifs

  • 5caC promoted CTCF binding to otherwise refractory sequences in vitro



Introduction

CTCF is an 11 zinc-finger DNA-binding protein that regulates multiple critical genomic functions, including promoting long-range interactions between distal regions of the genome and insulating areas of active transcription from inactive regions (Phillips and Corces, 2009). Profiling of CTCF binding in human cells suggests tens of thousands of binding sites, more than half of which display tissue specificity (Chen et al., 2012, The ENCODE Project Consortium, 2012, Wang et al., 2012). Regulation of CTCF binding at variable locations is largely achieved through dynamic DNA methylation, wherein overlapping 5-methylcytosine (5mC) inhibits CTCF association with DNA (Bell and Felsenfeld, 2000, Hark et al., 2000, Maurano et al., 2015, Shukla et al., 2011, The ENCODE Project Consortium, 2012, Wang et al., 2012). Among the described functions for variable CTCF sites, we identified a role in alternative pre-mRNA splicing (Shukla et al., 2011). CTCF binding within actively transcribed genes transiently obstructs RNA polymerase II (pol II) elongation, thereby kinetically favoring spliceosome assembly at weak upstream splice sites (Shukla et al., 2011). In contrast, inhibition of CTCF binding through overlapping 5mC shifts splicing to competing downstream sites through loss of pol II pausing (Shukla et al., 2011). However, the mechanisms that dynamically regulate CTCF exchange in alternative splicing and other tissue-specific activities remained unknown.

We recently determined that the alpha-ketoglutarate-dependent dioxygenases TET1 and TET2 support CTCF function in splicing regulation by antagonizing overlapping 5mC at CTCF-binding sites (Marina et al., 2016). The TET proteins catalyze active DNA demethylation through successive oxidation of 5mC to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) (Ito et al., 2011, Tahiliani et al., 2009). In the final step of the demethylation pathway, 5caC is excised by the base-excision repair factor thymine DNA glycosylase (TDG), and base-excision repair restores unmodified cytosine (He et al., 2011, Maiti and Drohat, 2011). In contrast, reduced TET activity results in increased 5mC at CTCF-binding sites and associated exclusion of dependent upstream exons from spliced mRNA due to CTCF eviction (Marina et al., 2016). Curiously, the underlying DNA at these splicing-associated CTCF sites was not fully unmethylated but was rather marked by a steady level of 5caC (Marina et al., 2016). Although 5caC levels in genomic DNA are generally low, we readily detected the oxidized derivative within CTCF sites in actively dividing primary peripheral lymphocytes (Marina et al., 2016). Biochemical characterization confirmed CTCF interaction with 5caC-containing DNA in vitro that was, unexpectedly, enhanced compared with unmethylated DNA (Marina et al., 2016). However, the significance of CTCF association with 5caC in vivo remained unclear.

Here we directly examine whether and how 5caC influences CTCF binding in cells. By utilizing a cellular system that accumulates 5caC within genomic DNA (Cortazar et al., 2011), we observe a dramatic expansion in locations of CTCF binding, genome-wide. Characterization of CTCF binding at these de novo sites revealed unique features, including an enrichment for overlapping 5caC and loosening of the consensus CTCF-binding motif. Likewise, 5caC was found enriched at low-motif CTCF sites in primary T cells. CTCF and 5caC profiling in primary T cells and biochemical analysis support the notion that 5caC reinforces CTCF binding in suboptimal contexts. Together, these results provide a rationale for a perplexing aspect of CTCF biology: CTCF physically interacts with DNA methyltransferases (DNMTs) that establish 5mC in genomic DNA and the TET proteins that site-specifically oxidize 5mC in a pathway to demethylation (Dubois-Chevalier et al., 2014, Guastafierro et al., 2008, Zampieri et al., 2012). Our results raise the intriguing possibility that CTCF association with these factors acts to reinforce its own binding, and potentially create a platform for the dynamic regulation of CTCF binding as DNMT and TET levels vary during development.

Results

De Novo CTCF Sites Overlap with 5caC-Rich Regions in Tdg−/− Cells

Given the multiple critical roles played by CTCF in the nucleus, a deeper understanding of the molecular determinants that drive CTCF binding is warranted. We recently identified 5caC as one such factor: in electrophoretic mobility shift assays (EMSA), purified CTCF showed increased interaction with 5caC-containing DNA compared with unmethylated DNA (Marina et al., 2016). However, whether 5caC enhances CTCF binding within the complex environment of chromosomal DNA was unclear. To begin to address CTCF/5caC binding in vivo, we turned to knockout mouse embryonic stem cells (mESCs) lacking the base-excision factor Tdg (Tdg−/−) (Figure 1A) (Cortazar et al., 2011). Others previously established that Tdg depletion can be leveraged to boost the otherwise low level of 5caC within genomic DNA, without compromising other aspects of base-excision repair (Cortazar et al., 2011, He et al., 2011). Loss of Tdg signal in knockout mESCs was confirmed by immunoblotting (Figure 1B). Of note, mESCs were maintained on wild-type (Tdg+/+) mouse embryonic fibroblast (MEF) feeders to avoid unintended and uneven differentiation. Accordingly, residual Tdg detection in the knockout population can be attributed to minor MEF contamination during cell harvest. However, given that 5caC levels are exceedingly low in differentiated tissues (Ito et al., 2011, Wu and Zhang, 2017, Zhu et al., 2018), it is unlikely that residual fibroblasts substantially contributed to our analysis at the genome-wide level. In direct support of the integrity of the genomic results, MEF-specific mRNAs were severely depleted in RNA sequencing from wild-type and Tdg−/− mESCs (Figure S1A), and strain-specific single-nucleotide polymorphisms were appropriately assigned in sequencing of DNA and RNA inputs from wild-type and Tdg−/− mESCs (Figure S1B). Importantly, dot blot with a 5caC-specific antibody showed that 5caC levels were substantially increased in Tdg−/− mESC genomic DNA compared with wild-type mESC genomic DNA (Figure 1C).

Figure 1.

Figure 1

5caC Enhances CTCF Binding to Genomic DNA

(A) Schematic of cytosine and 5-carboxylcytosine (5caC) molecular structures.

(B) Tdg immunoblot in wild-type (WT) and Tdg−/− mESCs. Replicates are derived from independently maintained cultures. Relative Tdg abundance when compared with WT control is determined through β-tubulin normalization.

(C) 5caC dot blot in WT and Tdg−/− mESC genomic DNA. Replicates as in (B) with methylene blue staining as loading control. Relative 5caC abundance when compared with WT control is determined through methylene blue normalization.

(D) CTCF peaks determined through ChIP-seq in WT and Tdg−/− mESCs were parsed by occurrence in only WT or Tdg−/− cells (unique) or in both populations (common). The Venn diagram depicts the number of CTCF peaks per category.

(E) CTCF immunoblot in WT and Tdg−/− mESCs.

(F) CTCF ChIP-seq signal in WT and Tdg−/− mESCs centered on locations uniquely detected in Tdg−/− cells. Reduced CTCF signal in WT mESCs failed to reach the threshold for peak calling. Overlapping gray lines indicate CTCF signal in randomly shuffled sites from WT and Tdg−/− cells.

(G) Averaged 5caC meDIP-seq in Tdg−/− mESCs centered on CTCF peaks unique to Tdg−/− cells or commonly detected in Tdg−/− and WT mESCs.
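The relative abundances reported in panels (B) and (C) rest on a standard loading-control normalization: each band or dot is divided by its loading signal (β-tubulin or methylene blue), and the result is expressed relative to the WT sample. A minimal sketch, assuming background-subtracted densitometry values; the function name and the numbers are hypothetical, not data from this study:

```python
def relative_abundance(signal, loading, ref_signal, ref_loading):
    """Loading-normalized signal expressed relative to a reference sample (e.g. WT).

    All inputs are assumed to be background-subtracted densitometry values.
    """
    if min(loading, ref_signal, ref_loading) <= 0:
        raise ValueError("loading and reference values must be positive")
    return (signal / loading) / (ref_signal / ref_loading)


# Hypothetical densitometry values (arbitrary units), not data from this study.
wt_signal, wt_loading = 1200.0, 800.0  # e.g. 5caC dot and methylene blue stain
ko_signal, ko_loading = 4500.0, 750.0
print(relative_abundance(ko_signal, ko_loading, wt_signal, wt_loading))  # 4.0
```

By construction the reference sample itself always scores 1.0, so values above 1 read directly as fold enrichment over WT.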

Having confirmed the elevated presence of 5caC in Tdg−/− cells, we next explored the consequence for CTCF binding genome-wide. CTCF chromatin immunoprecipitation sequencing (ChIP-seq) was performed in duplicate in wild-type and Tdg−/− mESCs (Figures 1D and S1C, Table S1). Consistent with previous reports, peak calling identified ~40,000 CTCF peaks that were commonly detected in both mESC populations (Chen et al., 2012, The ENCODE Project Consortium, 2012, Wang et al., 2012). However, CTCF sites that were uniquely detected in one population or the other showed a dramatic imbalance: whereas ~1,900 sites were present in wild-type but not Tdg−/− mESCs, ~13,000 sites were found in Tdg−/− but not wild-type mESCs (Figure 1D, Table S2). In other words, Tdg loss was associated with a substantial increase in CTCF binding genome-wide. This increase was not attributable to altered CTCF expression, as immunoblotting revealed comparable CTCF protein levels in wild-type and Tdg−/− mESCs (Figure 1E). Notably, comparison of CTCF ChIP-seq read density within the Tdg−/− induced sites showed that CTCF binding was not entirely absent in wild-type cells but rather failed to reach the threshold for peak detection (Figure 1F). Importantly, CTCF signal was not observed in randomly shuffled sites, supporting the validity of the called peaks (Figure 1F). Likewise, CTCF ChIP-seq peaks showed high replicate concordance (Figure S1D). These findings suggest that de novo CTCF binding in Tdg−/− cells results from a cellular change, unrelated to CTCF abundance, that reinforces binding at otherwise weak sites.
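Sorting peaks into "common" and "unique" categories, as in Figure 1D, amounts to an interval-overlap query between the two peak sets. The sketch below assumes half-open (chrom, start, end) coordinates and treats any overlap of at least 1 bp as shared; the study's actual overlap criterion and tooling are not stated, so the helper names and the rule are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(peaks):
    """Index (chrom, start, end) peaks, half-open, for overlap queries.

    Per chromosome, keep starts sorted plus a running maximum of ends, so
    'does anything overlap [s, e)?' needs only one binary search.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts, running_max, best = [], [], float("-inf")
        for start, end in intervals:
            starts.append(start)
            best = max(best, end)
            running_max.append(best)
        index[chrom] = (starts, running_max)
    return index


def overlaps_any(peak, index):
    chrom, start, end = peak
    if chrom not in index:
        return False
    starts, running_max = index[chrom]
    k = bisect_left(starts, end)  # intervals [0, k) begin before `end`
    return k > 0 and running_max[k - 1] > start


def partition_peaks(wt_peaks, ko_peaks):
    """Split peaks into common (reported from the WT side), WT-only and KO-only."""
    wt_index, ko_index = build_index(wt_peaks), build_index(ko_peaks)
    common = [p for p in wt_peaks if overlaps_any(p, ko_index)]
    wt_only = [p for p in wt_peaks if not overlaps_any(p, ko_index)]
    ko_only = [p for p in ko_peaks if not overlaps_any(p, wt_index)]
    return common, wt_only, ko_only


# Hypothetical peaks, not data from this study.
wt = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 150)]
ko = [("chr1", 250, 400), ("chr1", 5000, 5200), ("chr2", 500, 600)]
common, wt_only, ko_only = partition_peaks(wt, ko)
```

Because "common" peaks are counted from the WT side here, one WT peak overlapping two Tdg−/− peaks counts once; Venn counts can therefore shift slightly depending on which set serves as the reference.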

Based on our previous EMSA showing enhanced CTCF binding in the presence of 5caC, we specifically queried CTCF sites in Tdg−/− cells for emergent 5caC. To this end, we examined 5caC methylated DNA immunoprecipitation sequencing (meDIP-seq) data from Tdg-depleted mESCs (Shen et al., 2013). In total, we identified >172,000 discrete genomic regions with 5caC enrichment in Tdg−/− cells. As expected, most of these 5caC-enriched regions did not overlap CTCF peaks: within the genome, modified cytosines exist in numerous sequence contexts, and diverse transcription factors compete for limited binding sites. Nevertheless, co-occurrence was statistically enriched compared with random genomic bins (Figure S1E). Critically, when the analysis was reciprocally focused on CTCF-binding sites (CBS), a striking association with 5caC emerged: a robust 5caC signal was observed directly within Tdg−/− induced CTCF peaks (Figure 1G). Of note, enrichment for overlapping 5caC relative to random genomic bins was not exclusive to Tdg−/− induced sites but was also observed at CBS that were commonly detected in wild-type and Tdg−/− cells (Figure S1F). Relevant in this regard, overall CTCF signal was also increased at common CBS, although to a lesser extent (Figures S1F and S1G). Overall, these data raise the intriguing possibility that 5caC globally reinforces CTCF binding in chromosomal DNA, allowing otherwise low-penetrance CBS to reach the threshold for detection and potentially for biological relevance.
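Enrichment of CTCF/5caC co-occurrence relative to random genomic positions can be assessed with a shuffle (permutation) test: count how many sites overlap a 5caC region, then repeat the count many times after placing same-length intervals at random. The sketch below is a generic version of that idea, assuming a single chromosome, half-open intervals, and no exclusion of assembly gaps or blacklisted regions; it is not the study's exact procedure, and all names are hypothetical.

```python
import random
from bisect import bisect_left


def make_index(regions):
    """Sort half-open (start, end) regions; keep a running maximum of ends."""
    regions = sorted(regions)
    starts = [s for s, _ in regions]
    run_max, best = [], float("-inf")
    for _, e in regions:
        best = max(best, e)
        run_max.append(best)
    return starts, run_max


def count_hits(sites, index):
    """Number of sites overlapping at least one indexed region."""
    starts, run_max = index
    hits = 0
    for s, e in sites:
        k = bisect_left(starts, e)
        if k > 0 and run_max[k - 1] > s:
            hits += 1
    return hits


def shuffle_test(sites, marks, chrom_len, n_perm=1000, seed=0):
    """Observed overlap count, mean null count, and empirical one-sided p-value."""
    rng = random.Random(seed)
    index = make_index(marks)
    observed = count_hits(sites, index)
    null = []
    for _ in range(n_perm):
        shuffled = []
        for s, e in sites:
            length = e - s
            start = rng.randrange(0, chrom_len - length + 1)  # stays on the chromosome
            shuffled.append((start, start + length))
        null.append(count_hits(shuffled, index))
    mean_null = sum(null) / n_perm
    p = (1 + sum(x >= observed for x in null)) / (1 + n_perm)
    return observed, mean_null, p


# Toy example (hypothetical): every site sits inside a mark.
marks = [(i * 20000, i * 20000 + 200) for i in range(50)]
sites = [(i * 20000 + 50, i * 20000 + 100) for i in range(50)]
observed, mean_null, p = shuffle_test(sites, marks, 1_000_000, n_perm=200, seed=1)
```

The "+1" in numerator and denominator keeps the empirical p-value from being reported as exactly zero, so its floor is 1/(1 + n_perm).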

Tdg−/− Induced CTCF Sites Display Unique Molecular Features In Vivo

Having established the presence of 5caC-rich induced CTCF sites in Tdg−/− cells, we next explored their molecular basis and biological relevance. In particular, we examined the induced subset for unique features related to motif strength. Our rationale derived from two observations: (1) CTCF association with DNA does not adhere to a strict consensus sequence (Nakahashi et al., 2013, Rhee and Pugh, 2011), and (2) weak CTCF binding was observed in wild-type mESCs at locations that were robustly detected in Tdg−/− cells (Figure 1F). These findings reflect a nuanced aspect of CTCF biology: the 11 zinc fingers associate with DNA to varying extents, resulting in a range of binding strengths and a relatively degenerate motif (Hashimoto et al., 2017, Nakahashi et al., 2013, Rhee and Pugh, 2011). We thus reasoned that overlapping 5caC at Tdg−/− induced CTCF sites may reflect a positive role for the cytosine modification in suboptimal contexts. Accordingly, empirically determined CTCF-binding sites in wild-type and Tdg−/− mESCs were assigned motif scores through the CTCFBSDB 2.0 database (Figure 2A, schematic) (Ziebarth et al., 2013). Considering that 5caC was globally enriched in Tdg−/− induced CTCF sites (Figures 1G, S1D, and S1E), it is notable that the de novo sites were generally characterized by reduced motif strength compared with commonly detected peaks (Figure 2B). To specifically interrogate the relationship between 5caC and motif strength, CTCF peaks were segregated into groups representing low, mid, and high motif scores (bottom quartile, middle two quartiles, and top quartile, respectively) (Figure 2A). This grouping placed 13,183 sites in the low-motif group. Examination of 5caC content within CTCF sites not only confirmed a general enrichment for 5caC across the spectrum of Tdg−/− induced sites but also revealed an inverse relationship to CTCF motif strength, wherein 5caC levels were highest in the low-motif subset (Figure 2C). Browser shot examples of genomic data depicting increased CTCF binding at low-motif locations with overlapping 5caC in Tdg−/− cells are shown in Figure S2A. As 5caC is most frequently observed within CpG dinucleotides, we further examined CpG content within CTCF sites parsed by motif strength. Low-motif sites were characterized by higher overall CpG density than mid- or high-motif sites (Figure 2D); this greater density of substrate for TET oxidation provides a rationale for the increased prevalence of 5caC. These findings support a role for 5caC in promoting new CTCF binding in Tdg−/− cells, particularly to suboptimal motifs.
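The quartile grouping described above can be expressed as a rank-based split: the lowest 25% of motif scores form the low group, the highest 25% the high group, and the remainder the mid group. A minimal sketch, assuming ties are broken by input order (tie handling is not specified in the text); the scores are hypothetical maximum motif scores, not values from this study.

```python
def bin_by_motif_score(scores):
    """Label each score 'low' (bottom quartile), 'mid' (middle two quartiles)
    or 'high' (top quartile) by rank. Ties are broken by input order."""
    n = len(scores)
    q = n // 4  # size of each outer quartile
    order = sorted(range(n), key=lambda i: scores[i])  # stable sort
    labels = ["mid"] * n
    for i in order[:q]:
        labels[i] = "low"
    for i in order[n - q:]:
        labels[i] = "high"
    return labels


# Hypothetical maximum motif scores for eight peaks.
scores = [18.2, 7.3, -2.1, 12.0, 9.5, 15.1, 3.4, 11.0]
labels = bin_by_motif_score(scores)
```

A rank-based split keeps the outer groups at exactly one quarter of the peaks each, which matches how the T-cell analysis reports the bottom 25% of scores as the low-motif group; a cutoff-based split on quantile values would instead let ties spill across groups.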

Figure 2.

Figure 2

Low-Motif CTCF Peaks Exhibit Greatest Gains in Binding at 5caC-rich DNA

(A) CTCF ChIP-seq peaks in WT and Tdg−/− mESCs were assigned motif scores through the CTCFBSDB 2.0 algorithm and segregated into high-, mid-, and low-motif bins.

(B) Box-and-whisker diagram of maximum motif score distribution in common versus Tdg−/− induced CTCF peaks. ∗∗p < 2.2 × 10−16, Mann-Whitney U test.

(C) Tdg−/− mESC 5caC meDIP-seq within empirically determined CTCF peaks, segregated by motif score. meDIP-seq abundance is given in natural log scale.

(D) CpG dinucleotide frequency within Tdg−/− mESC CTCF peaks, segregated by motif score. ∗∗p < 2.2 × 10−16, Mann-Whitney U test.

(E) Tdg−/− mESC RNA pol II ChIP-seq density within common and Tdg−/− induced CTCF peaks relative to randomly permuted (shuffled) control peaks. Within each window, bins are normalized so that aggregate signal reflects local polymerase stalling.

(F) Tdg−/− mESC Pol II ChIP-seq in common and Tdg−/− induced CTCF sites as in (E), segregated on the basis of overlapping 5caC. Analysis is performed on the subset of CTCF sites located within expressed genes.

(G) Tdg−/− mESC pol II ChIP-seq at 5caC-rich (+) and 5caC-poor (−) regions not associated with proximal CTCF binding. n.s. = not significant (p > 0.05), Mann-Whitney U test.
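Panels (B) and (D) compare distributions with the Mann-Whitney U test, and panel (D) summarizes CpG dinucleotide frequency. The sketch below computes a per-sequence CpG frequency (CpG count divided by the number of dinucleotide positions, one of several possible definitions) and a two-sided Mann-Whitney U test using the normal approximation with tie correction and no continuity correction. These choices are assumptions; the study's exact CpG metric and test settings are not given here.

```python
import math


def cpg_frequency(seq):
    """CpG dinucleotides per dinucleotide position on the given strand."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    hits = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return hits / (len(seq) - 1)


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U for x, p-value)."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    n = n1 + n2
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        raise ValueError("all values tied; test undefined")
    z = (u1 - mu) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

For peak sets numbering in the thousands, the normal approximation is accurate, which is why such comparisons routinely hit the floating-point floor for reported p-values.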

The above data establish the presence of 5caC-rich CTCF sites in Tdg−/− cells; to confirm functionality, we examined these sites for co-occurrence of RNA polymerase II (pol II). We and others have shown that CTCF binding within genic DNA transiently obstructs pol II elongation, resulting in local accumulation of pol II at CTCF sites (Lu and Tang, 2012, Paredes et al., 2013, Shukla et al., 2011). Consistent with the general distribution of CTCF binding, alignment to genomic features showed that ~45% of the Tdg−/− induced sites occur within gene bodies (Figure S2B, Table S3). To assess pol II levels within the induced sites, we performed pol II ChIP-seq in Tdg−/− cells. Pol II read density within both common and induced genic CTCF sites showed a clear accumulation consistent with bona fide CTCF binding, which was absent in randomized regions (Figure 2E). To further explore the relationship to 5caC, pol II occurrence was examined within genic Tdg−/− induced CTCF sites segregated on the basis of overlapping 5caC (5caC-rich [+] or 5caC-poor [−]). Pol II accumulated at both classes of CTCF sites, although levels were markedly higher at the 5caC(+) sites (Figure 2F). Although somewhat unexpected, this finding is consistent with our previous demonstration of increased CTCF binding to a 5caC-containing probe compared with an unmodified probe in EMSA (Marina et al., 2016). Thus, the increased pol II density at 5caC-rich CTCF sites in Tdg−/− cells raises the possibility that 5caC both promotes and strengthens CTCF binding in vivo. Importantly, pol II levels did not differ between 5caC-rich and 5caC-poor genic regions lacking proximal CTCF, demonstrating that pol II accumulation is not a general feature of 5caC-rich DNA (Figures 2G and S2C). Together, these results establish both the presence and functionality of 5caC-associated CTCF sites in Tdg−/− cells.
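Aggregate pol II profiles centered on CTCF sites (Figure 2E) are a form of metaplot. The caption states that bins are normalized within each window; the sketch below assumes this means scaling each window to sum to 1 before averaging, so that the profile reflects local shape (such as a peak at the site) rather than overall read depth. The input format, a list of binned signal values for one chromosome, and the function name are hypothetical.

```python
def metaplot(signal, centers, flank, normalize=True):
    """Average binned signal in windows of 2*flank + 1 bins centered on sites.

    signal: per-bin values for one chromosome (hypothetical input format).
    centers: bin indices of site centers; windows running off either end
    of the chromosome are skipped.
    normalize: if True, each window is scaled to sum to 1 before averaging;
    all-zero windows are then skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for c in centers:
        lo, hi = c - flank, c + flank + 1
        if lo < 0 or hi > len(signal):
            continue
        window = signal[lo:hi]
        if normalize:
            total = sum(window)
            if total == 0:
                continue
            window = [v / total for v in window]
        totals = [t + v for t, v in zip(totals, window)]
        used += 1
    if used == 0:
        raise ValueError("no usable windows")
    return [t / used for t in totals]


# Hypothetical binned pol II signal with enrichment at bins 3 and 9.
signal = [1, 1, 1, 5, 1, 1, 1, 1, 1, 9, 1, 1, 1]
profile = metaplot(signal, centers=[3, 9], flank=1)
```

The same function applied to shuffled centers gives the flat control profile against which a central peak is judged.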

Of note, mESCs are qualitatively distinct from differentiated tissues in several respects. Most relevant to the current analysis, embryonic stem cells are uniquely characterized by non-CpG methylation and overall higher levels of 5mC-oxidized derivatives (Guo et al., 2014, Huang et al., 2014, Ito et al., 2010, Ito et al., 2011, Koh et al., 2011, Lister et al., 2009, Ramsahoye et al., 2000). Thus, to examine the generality of our findings outside of mESCs, we turned to primary human lymphocytes. We previously co-detected 5caC and CTCF at pre-mRNA splicing-associated regions in naive CD4+ T lymphocytes, in which TET1 and TET2 expression was high, whereas levels decreased upon activation (Marina et al., 2016). Accordingly, we isolated naive CD4+ T cells from peripheral blood for genome-wide analysis of CTCF and 5caC through ChIP-seq and meDIP-seq, respectively (Figure 3A). As in the mESC analysis, experimentally determined CTCF sites were parsed based on motif strength into low-, mid-, and high-scoring groups (Figure 3B), with the bottom 25% of scores (7,016 CBS) comprising the low-motif group. Concordant with the mESC results, low-motif CTCF sites in CD4+ T cells were marked by increased CpG density compared with the mid- and high-scoring groups (Figure 3C). Intriguingly, the low-motif cohort showed the greatest change in CTCF occupancy during the naive-to-activated transition, with a net decrease in CTCF binding (Figure S3A). In support of developmental regulation through overlapping 5caC, low-motif CTCF sites in naive CD4+ T cells were further marked by elevated overlapping 5caC compared with the mid- and high-scoring groups (Figure 3D). A browser shot example depicting reduced 5caC coinciding with decreased CTCF binding at a low-motif CBS in activated versus naive T cells is shown in Figure S3B. Because 5caC levels in genomic DNA are commonly considered too low for detection at baseline, the occurrence of 5caC within CTCF sites in an unperturbed cellular setting is noteworthy and attests to the relevance of the association. Indeed, pol II ChIP-seq in naive CD4+ T cells revealed a clear enrichment for pol II within CTCF sites, with the highest pol II levels in the low-motif set, which also showed the highest levels of overlapping 5caC (Figure 3E). These results establish the low-motif sites as locations of bona fide CTCF binding with a role in the regulation of a biological function.

Figure 3.

Figure 3

5caC Associates with Low-Motif CTCF-Binding Sites in Primary Human CD4+ T Cells

(A) Naive CD4+ T cells were isolated from human peripheral blood through immunomagnetic enrichment and subjected to genome-wide analyses.

(B) Distribution of maximum CTCFBSDB 2.0-determined motif scores at CTCF peaks detected in naive T cells. Peaks were segregated into high, mid, and low bins corresponding to scores within the top quartile, middle two quartiles, and bottom quartile, respectively.

(C) CpG dinucleotide frequency within CD4+ T cell CTCF peaks, segregated by motif score. ∗∗p < 2.2 × 10−16, Mann-Whitney U test.

(D and E) 5caC meDIP-seq (D) and RNA pol II ChIP-seq (E) within determined CD4+ T cell CTCF peaks, segregated by motif score; (D) includes sites with 5caC abundance greater than genomic baseline (determined by shuffled control).

CTCF Binds Motif-Free DNA in the Presence of 5caC In Vitro

Our genome-wide data strongly support a role for 5caC in driving CTCF binding in a chromosomal setting. However, to move beyond correlations, we turned to in vitro systems for biochemical characterization of CTCF association with 5caC-containing DNA. In particular, based on the observed association of CTCF with 5caC-rich DNA at low-motif sites, we used EMSA to examine whether 5caC promotes CTCF binding in suboptimal contexts. Although EMSA differs from the chromosomal setting in its use of linear DNA probes, concerns related to artificiality are mitigated by the fact that CTCF binds to nucleosome-free DNA in vivo (Chen et al., 2012, Fu et al., 2008, Magbanua et al., 2015, Teif et al., 2014). Importantly, we previously revealed a surprising preference for 5caC-containing over unmethylated DNA in CTCF EMSA: overall complex formation was increased in the presence of overlapping 5caC, and unlabeled 5caC-containing DNA was generally a more effective competitor than unmethylated competitor (Marina et al., 2016). These results were observed with multiple probes and distinct CTCF protein sources, including recombinant CTCF and FLAG-tagged CTCF purified from cell culture (Marina et al., 2016). These findings are consistent with the genome-wide analysis of CTCF binding in Tdg−/− cells and highlight EMSA as an appropriate method for examining CTCF interaction with carboxylated DNA in vitro.

To validate the relationship between 5caC and CTCF motif strength uncovered in the genome-wide analysis, we generated EMSA probes embodying distinct CTCF-binding modalities in the presence and absence of 5caC. To represent strong ("CBS-high") and weak ("CBS-low") motifs, we used a CTCF-binding site located within exon 5 of the CD45 gene in either a wild-type or mutated context. We previously performed extensive characterization of the CD45 probes and showed that point mutations within the CTCF core, introduced to generate the CBS-low probe, abolished CTCF binding in EMSA with unmodified DNA (Marina et al., 2016, Shukla et al., 2011). Importantly, the introduced mutations do not alter CpG or general cytosine density in double-stranded DNA. For reference, the CTCFBSDB 2.0 algorithm assigned a score of 18.16 to the CBS-high CTCF-binding site and 7.26 to the CBS-low site (Figure 4A). EMSA was thus performed with FLAG-tagged CTCF purified from HEK293T lysates and radiolabeled CD45 probes. 5caC was incorporated through (1) PCR amplification in the presence of 2′-deoxy-5-carboxycytidine 5′-triphosphate (5-carboxy-dCTP) or dCTP and (2) commercial synthesis. The PCR approach yields a 72-bp probe centered on the CTCF-binding site with uniformly modified cytosines, whereas commercial synthesis yields a 41mer with three modified cytosines per DNA strand, all in a CpG context (Figure 4A). Although CTCF binding is generally compromised on shorter probes such as the 41mer (Marina et al., 2016), both approaches yielded consistent and clear results. With unmethylated probes, CTCF bound to the CBS-high, but not the CBS-low, site, as evidenced by complex formation in phosphorimager analysis (Figure 4B). Percent shift was calculated as the amount of label in complex with CTCF relative to free label (Figure 4B). In contrast, CTCF formed a robust and specific complex with both the CBS-high and CBS-low probes in the presence of overlapping 5caC, as demonstrated by cold competition and supershift with anti-CTCF antibody (Figure 4B). Notably, cold competition revealed reproducible differences between the unmodified and 5caC-containing complexes. With the PCR-generated probes carrying uniform cytosine or 5caC, unmodified competitor more effectively competed unmodified complexes, whereas carboxylated competitor more effectively competed carboxylated complexes (Figure 4B, left). This distinction may reflect a cumulative change in DNA structure or charge in heavily modified substrates, which in this case include 5caC outside of a CpG context. In support of this notion, with the 41mers carrying CpG-restricted modification, carboxylated and unmodified CBS-high competitors were comparably efficient at competing CTCF association with unmodified CBS-high DNA, whereas 5caC-containing competitor was more effective at disrupting complexes involving carboxylated DNA. Furthermore, whereas the unmodified CBS-low probe was an ineffective competitor in every circumstance, the low-motif 5caC-containing probe was on par with the carboxylated CBS-high competitor (Figure 4B, right). Overall, these EMSA results highlight the capacity of 5caC to overcome the sequence penalty associated with weak-motif CTCF sites.
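Percent shift is computed per lane from phosphorimager band volumes. The text describes it as label in complex relative to free label; the sketch below reads this as the bound fraction of total lane signal, bound / (bound + free) × 100, which is a common convention but an assumption here, with optional background subtraction. The function name and values are hypothetical.

```python
def percent_shift(bound, free, background=0.0):
    """Percent of labeled probe shifted into the protein complex in one lane.

    bound, free: phosphorimager volumes of the shifted band and the free-probe
    band; `background` is subtracted from each band and clipped at zero.
    Uses bound / (bound + free), an assumed reading of 'relative to free label'.
    """
    b = max(bound - background, 0.0)
    f = max(free - background, 0.0)
    total = b + f
    if total == 0:
        raise ValueError("no signal in lane")
    return 100.0 * b / total


# Hypothetical band volumes (arbitrary units), not data from this study.
print(percent_shift(350.0, 750.0, background=50.0))  # 30.0
```

If the ratio bound / free were intended instead, values would no longer be bounded by 100 and would grow steeply near saturation, which is one reason the bound fraction is usually preferred for binding curves.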

Figure 4.

Figure 4

5caC Enhances CTCF Association with Weak Binding Sites In Vitro

(A) Schematic of EMSA probes derived from the human CD45 gene. Point mutations were incorporated into a strong-motif CTCF site (CBS-high) within exon 5 to generate an analogous weak-motif probe (CBS-low). A region devoid of CTCF binding in vivo within exon 6 was selected as a “motif-free” control. CTCFBSDB 2.0 motif scores are shown. Underlined sequences indicate the exon 5 CTCF-binding site location, red text highlights mutated residues, and sequences in bold denote CpG locations.

(B) EMSA with affinity-purified CTCF and radiolabeled PCR-generated 72mer or commercially synthesized 41mer DNA probes with uniform or CpG-restricted cytosine [Cyt.] and 5-carboxylcytosine [5caC], respectively. Unlabeled CBS-high (H) or CBS-low (L) motif cold-competitor DNA added to indicated lanes; 10X and 100X molar excess used for 72mer and 41mer, respectively. Supershift performed with α-CTCF antibody. Quantification of bound probe indicated as % shift below figure; representative of n ≥ 3.

(C) EMSA using affinity-purified CTCF and motif-free PCR-generated CD45 exon 6 probes of varying carboxylation status; 10X molar excess of unlabeled CD45 exon 6 cold competitor where indicated; representative of n = 2.

(D) Titration EMSA performed using 41mer CD45 exon 5 DNA probes (3.76 nM); P indicates lane containing probe alone, CTCF concentration: 342.3–0.0418 nM.

(E) Saturation binding curves, derived from EMSAs in (D), and summary chart of apparent KD values for each 41mer probe; markers on curves indicate mean of two (CBS-high [Cyt.]) or three (CBS-high [5caC], CBS-low [5caC]) independent replicates, and error bars indicate standard deviation.

Of note, the genome-wide analysis of Tdg−/− induced CTCF sites revealed sequences for which the likelihood of CTCF binding was less than expected in random sequence (negative motif scores). Likewise, an unbiased mass spectrometry study that identified CTCF as a 5caC-specific reader utilized a probe entirely lacking elements consistent with known CTCF-interacting sequences (−8.04 motif score) (Spruijt et al., 2013, Ziebarth et al., 2013). To examine whether we could recapitulate CTCF binding to seemingly motif-free DNA in the presence of 5caC, we performed additional EMSA with a sequence lacking any characteristics of CTCF binding. Specifically, CD45 exon 6 does not contain any computationally predicted CTCF-binding sites (7.36 motif score, Figure 4A) and shows no evidence of CTCF binding in ChIP-qPCR and ChIP-seq (Marina et al., 2016, Shukla et al., 2011, The ENCODE Project Consortium, 2012). Exon 6 EMSA probes were generated by PCR in the presence of dCTP or 5-carboxy-dCTP. As predicted, purified CTCF did not interact with the unmethylated exon 6 probe under established binding conditions. In contrast, the 5caC-containing exon 6 probe formed a robust and specific complex with CTCF (Figure 4C). This unexpected finding clearly demonstrates the positive impact of 5caC on CTCF binding in vitro.
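The reading of negative motif scores given above (binding less likely than in random sequence) matches a log-odds score, which compares the probability of a sequence under a motif model with its probability under a background model. The sketch below illustrates that interpretation with a hypothetical 4-position matrix and a uniform background, taking the maximum over all positions on both strands; it is not the CTCF matrix or scoring scheme used by CTCFBSDB 2.0.

```python
import math

BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# Hypothetical 4-position frequency matrix (NOT the CTCF matrix used by
# CTCFBSDB 2.0); each row gives base probabilities at one motif position.
TOY_PFM = [
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.10, "C": 0.40, "G": 0.40, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds(word, pfm=TOY_PFM, bg=BACKGROUND):
    """Sum of log2(P(base | motif position) / P(base | background))."""
    return sum(math.log2(row[base] / bg[base]) for row, base in zip(pfm, word))


def max_motif_score(seq, pfm=TOY_PFM):
    """Best log-odds score over every window on both strands (ACGT only)."""
    seq = seq.upper()
    width = len(pfm)
    if len(seq) < width:
        raise ValueError("sequence shorter than motif")
    rev_comp = seq.translate(COMPLEMENT)[::-1]
    return max(
        log_odds(strand[i:i + width], pfm)
        for strand in (seq, rev_comp)
        for i in range(len(strand) - width + 1)
    )
```

A score of 0 means the best window is equally likely under motif and background; negative maxima, like the Tdg−/− induced sites described above, mean no window in the sequence looks more motif-like than random sequence.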

The above EMSAs establish that CTCF binding is qualitatively enhanced in the presence of 5caC. To quantify the strength of association, we determined apparent dissociation constants (KD,app) through saturation binding experiments with purified CTCF and radiolabeled CD45 exon 5 41mers. EMSA was performed with a fixed amount of CD45 probe representing CBS-high (+/− CpG 5caC) and CBS-low (+CpG 5caC) CTCF sites and decreasing amounts of purified CTCF (Figure 4D) (Heffler et al., 2012). The unmodified CBS-low probe was excluded from this analysis because CTCF binding was not detected in standard EMSA. Saturation binding curves were generated from percent shift to determine relative CTCF binding affinity as it relates to motif strength and 5caC (Heffler et al., 2012). Consistent with the enhanced binding visualized in standard EMSA, incorporation of 5caC into the three CpGs of the CBS-high probe strengthened CTCF binding, resulting in a nearly 2-fold increase in affinity compared with the unmodified CBS-high probe. Remarkably, 5caC within the CBS-low probe yielded an intermediate KD,app, reflecting stronger binding than to the unmodified CBS-high probe but moderately weaker binding than to the 5caC-containing CBS-high probe (Figure 4E). These KD,app values are within the established range for CTCF association with DNA (Hashimoto et al., 2017, Li et al., 2017, Martinez et al., 2014, Plasschaert et al., 2014). Taken together, these data corroborate the in vivo observations that overlapping 5caC, even in a minimal CpG setting, is sufficient to promote CTCF binding to weak consensus motifs.
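Apparent KD values from saturation EMSA are typically obtained by fitting percent shift against protein concentration. The sketch below fits a one-site hyperbola, shift = Bmax·[CTCF] / (KD,app + [CTCF]), by scanning KD,app on a logarithmic grid and solving for Bmax in closed form at each step. Two assumptions are worth stating: the titration in Figure 4D (342.3 to 0.0418 nM) is consistent with 14 two-fold dilutions, since 342.3 / 2^13 ≈ 0.0418, but the dilution scheme is not given in the text; and the hyperbola ignores depletion of free CTCF by the 3.76 nM probe, so a quadratic (tight-binding) model may be more appropriate when KD,app approaches the probe concentration. This is not a reproduction of the Heffler et al. (2012) procedure, and all names are hypothetical.

```python
def dilution_series(top_nm, n_points, factor=2.0):
    """Concentrations for a serial dilution starting at `top_nm` (assumed scheme)."""
    return [top_nm / factor ** k for k in range(n_points)]


def one_site(conc, kd, bmax):
    """One-site hyperbolic binding model (no ligand depletion)."""
    return bmax * conc / (kd + conc)


def fit_kd(conc, shift, log10_lo=-3.0, log10_hi=3.0, steps=4001):
    """Least-squares fit of one_site() by scanning KD (nM) on a log10 grid.

    For a fixed KD the model is linear in Bmax, so the best Bmax is
    sum(g * y) / sum(g * g) with g = c / (KD + c).
    """
    best = None
    for i in range(steps):
        kd = 10 ** (log10_lo + (log10_hi - log10_lo) * i / (steps - 1))
        g = [c / (kd + c) for c in conc]
        bmax = sum(gi * y for gi, y in zip(g, shift)) / sum(gi * gi for gi in g)
        sse = sum((y - bmax * gi) ** 2 for gi, y in zip(g, shift))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


# Synthetic check: noiseless data generated with KD = 5 nM and Bmax = 80%
# (values chosen for illustration, not results from this study).
conc = dilution_series(342.3, 14)
shift = [one_site(c, kd=5.0, bmax=80.0) for c in conc]
kd_app, bmax = fit_kd(conc, shift)
```

A grid scan is slower than a gradient-based optimizer but cannot diverge, and the log spacing gives uniform relative resolution (about 0.35% per step here) across the six decades searched. Fold changes in affinity, such as the nearly 2-fold effect of CpG 5caC described above, are then ratios of fitted KD,app values.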

An inherent limitation of EMSA is that protein:DNA interactions are analyzed in the absence of additional cellular variables. For example, while CTCF may interact with weak binding sites in the presence of 5caC in isolation, other factors may occupy such sites in vivo. To address this possibility, the CBS-high and CBS-low exon 5 probes were used as bait in a DNA affinity purification assay (DAPA) (Figure 5A). PCR-generated 72mer probes were biotinylated and immobilized on streptavidin-coated magnetic beads. Of note, capture of the 5caC-containing probes was slightly reduced relative to the unmodified probes, as assessed through SYBR Gold staining of the unbound portion (Figure 5B). Nevertheless, recovery of CBS-low versus CBS-high probes was comparable within each modification state, allowing direct comparisons related to motif strength (Figure 5B). Incubation with nuclear extracts from HEK293T cells expressing FLAG-CTCF allowed for the capture and subsequent elution of associated proteins. Immunoblotting of DAPA eluates from the unmodified DNA probes demonstrated a robust interaction between CTCF and the probe containing a CBS-high motif, whereas binding was not observed for the CBS-low probe (Figure 5C). In contrast, CTCF was recovered with both the CBS-high and CBS-low 5caC-containing probes. Consistent with the determined KD,app, CTCF association was reduced for the CBS-low 5caC-containing probe but was clearly visible (Figure 5C). Relatedly, although the uneven streptavidin immobilization precludes direct comparison between unmodified and 5caC-containing probes, the overall reduction in CTCF recovery with the 5caC-containing probes may reflect competition for binding with other factors that shape the binding landscape in vivo. Indeed, numerous nuclear factors exhibited 5caC-specific binding in an unbiased mass spectrometry screen (Spruijt et al., 2013). Overall, these DAPA results confirm that CTCF interacts with suboptimal DNA motifs in the presence of 5caC within the complex cellular milieu.

Figure 5.

Figure 5

Affinity Purification of CTCF from Nuclear Lysates with 5caC-Containing DNA

(A) Overview of CTCF DNA affinity precipitation assay (DAPA). Bead-bound DNA probes generated in the presence or absence of 5caC were used to enrich CTCF from nuclear lysates.

(B) SYBR Gold staining to assess DNA probe capture efficiency on streptavidin magnetic beads. CD45 exon 5 probes representing strong (CBS-high) and weak (CBS-low) CTCF motifs, generated by PCR in the presence of dCTP (Cyt.) or 5-carboxy-dCTP (5caC), were used.

(C) Immunoblotting of DAPA-recovered CTCF through the various probes.

CTCF Generates a Strong Footprint at Weak Binding Motifs in the Presence of 5caC

Together, the genome-wide analysis of Tdg−/− cells and the EMSA results demonstrate that 5caC enhances CTCF binding in suboptimal contexts. However, although these approaches quantify the extent of CTCF association, they do not indicate whether binding is qualitatively distinct. To establish whether CTCF occupies a similar or distinct stretch of DNA in the presence of 5caC, we performed in vitro DNase I footprinting. DNase I footprinting relies on time- and concentration-dependent DNase I cleavage at accessible positions (Brenowitz et al., 2001, Carey et al., 2013, Hampshire et al., 2007, Leblanc and Moss, 2015). This accessibility reflects the inherent structure of DNA molecules, which renders certain regions more or less exposed to the enzyme. Incubating an end-labeled probe with a protein of interest in the presence of DNase I can thus reveal the protein-binding site as the region protected from cleavage (Brenowitz et al., 2001, Carey et al., 2013, Hampshire et al., 2007, Leblanc and Moss, 2015) (Figure 6A).

Figure 6.

Figure 6

DNase I Hypersensitivity (DHS) Analysis of CTCF Associated with 5caC-Containing DNA

(A) Overview of CTCF DNase I footprinting with variably carboxylated radiolabeled DNA probes representing strong (CBS-high) and weak (CBS-low) CTCF motifs.

(B) Gel analysis of CTCF DNase I footprinting performed with the CD45 exon 5 probe, ±CpG 5caC. The DNase I reaction contained 342.3–42.8 nM CTCF and 7.52 nM DNA probe. The location of the CTCF-binding site is indicated by black bars. Carboxylated cytosine residues are indicated by C∗. M signifies oligonucleotide marker, and arrowheads show lanes used to generate histograms.

(C) Lane histogram densitometry analysis of overall probe digestion patterns in the presence and absence of CTCF as indicated in (B). Significant protection from cleavage is seen in the region overlapping the CTCF-binding site for the CBS-high[Cyt.] and CBS-low[5caC] probes.

It is well established that CTCF binds nucleosome-free DNA, and previous studies have demonstrated CTCF footprints of >20 bp (Chen et al., 2012, Filippova et al., 2001, Fu et al., 2008, Magbanua et al., 2015, Teif et al., 2014). We thus performed DNase I footprinting with the synthesized CD45 exon 5 41mer probe in which 5caC is restricted to three CpGs located within and adjacent to the CTCF-binding core (Figure 4A). Of note, DNase I cleavage displays some sequence preference, and 5caC protrusion into the major groove of the DNA double helix is known to slightly alter base-pairing thermodynamic stability (Dai et al., 2016, Herrera and Chaires, 1994, Szulik et al., 2015). Accordingly, as the interrogated 41mers differ in both absolute sequence context and modified nucleotide composition, it is unsurprising that gel analysis of DNase I hypersensitivity (DHS) patterns in the absence of CTCF showed some differences (Figure 6B). DNase I footprinting therefore informs on whether select nucleotides resist cleavage relative to other positions within the same probe. Importantly, adding CTCF to the DNase I reaction containing the unmodified strong motif probe produced a protected region encompassing the known location of CTCF binding (Figure 6B). In contrast, cleavage persisted within the core of the unmodified weak CTCF motif despite increasing CTCF (Figure 6B). Consistent with basal CTCF detection at Tdg−/− induced locations in wild-type mESCs (Figure 1F), minor protection from cleavage was observed upon CTCF addition to the weak motif probe. However, overall digestion patterns were virtually indistinguishable in the presence or absence of CTCF by lane histogram densitometry (Figure 6B). With this in mind, it is remarkable that the 5caC-containing weak motif generated a CTCF footprint that effectively mirrored that of the strong unmodified probe (Figures 6B and 6C). In addition to validating that overlapping 5caC enables CTCF binding in suboptimal contexts, these DHS results indicate that binding is qualitatively similar to binding at unmodified strong sites, suggesting a related mode of interaction. Taken together with the EMSA results, these in vitro data complement the genome-wide in vivo results and solidify the role of 5caC in reconstituting CTCF association at weak sequence motifs.
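Lane histogram comparisons of the kind shown in Figure 6C can be sketched as follows, assuming band intensities have already been quantified at matched positions for the probe alone and for the probe plus CTCF. This is a hypothetical illustration rather than the authors' densitometry pipeline: normalizing each lane to reference bands outside the expected binding site, the 0.5 protection threshold, the 3-band minimum run length, and all intensity values are assumptions. Normalizing within each lane mirrors the point made above that cleavage is compared across positions of the same probe, not across differently modified probes.

```python
from typing import List, Optional, Sequence, Tuple


def normalize(lane: Sequence[float], ref_idx: Sequence[int]) -> List[float]:
    """Scale a lane so its reference bands sum to 1.

    Reference bands are assumed to lie outside the binding site, so scaling by
    them corrects for loading and overall digestion differences between lanes.
    """
    scale = sum(lane[i] for i in ref_idx)
    return [v / scale for v in lane]


def protection(free: Sequence[float], bound: Sequence[float],
               ref_idx: Sequence[int]) -> List[float]:
    """Fractional protection per band: 1 - (normalized +CTCF / normalized probe alone).

    Positive values indicate protection; negative values indicate enhanced cleavage.
    """
    f = normalize(free, ref_idx)
    b = normalize(bound, ref_idx)
    return [1.0 - bi / fi if fi > 0 else 0.0 for fi, bi in zip(f, b)]


def footprints(prot: Sequence[float], threshold: float = 0.5,
               min_len: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) band indices of runs with protection >= threshold (threshold > 0)."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(list(prot) + [0.0]):  # trailing 0.0 closes a final run
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    return runs


if __name__ == "__main__":
    # Hypothetical band intensities (probe alone vs. probe + CTCF), 12 positions.
    free = [10, 12, 9, 11, 10, 10, 12, 9, 10, 11, 10, 12]
    bound = [10, 12, 9, 2, 1, 1, 2, 9, 10, 11, 10, 12]
    ref = [0, 1, 2, 7, 8, 9, 10, 11]
    prot = protection(free, bound, ref)
    print([round(p, 2) for p in prot])
    print("footprint bands:", footprints(prot))
```

In this toy input, bands 3 through 6 lose most of their signal when CTCF is present and are reported as a single protected run, whereas a weak-motif lane whose profile barely changes would yield no run at all.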

Discussion

Once considered a stable hereditary mark, DNA methylation is now appreciated to be dynamically regulated to shape and define gene expression in a cell-specific manner (Luo et al., 2018). Tissue-specific changes in methylation often occur within gene bodies (Deaton et al., 2011, Maunakea et al., 2010), where we previously identified a role in pre-mRNA splicing that is achieved through modulation of CTCF binding (Marina et al., 2016, Shukla et al., 2011). We showed that genic CTCF promotes inclusion of weak exons in spliced mRNA through local RNA polymerase II pausing, whereas DNA methylation has the opposite effect (Shukla et al., 2011). Our efforts to understand how dynamic methylation at CTCF sites is achieved further uncovered a role for TET-catalyzed oxidized 5mC derivatives (5oxiC) (Marina et al., 2016). In particular, we asked how methylation is modulated at specific CTCF sites while others are left unaffected. Given that CTCF is critical to numerous cellular processes, including general nuclear architecture, precise control of variable binding would be of paramount importance (Dixon et al., 2012, Handoko et al., 2011, Rao et al., 2014, Vietri Rudan et al., 2015, Zuin et al., 2014). We previously determined that splicing-associated "dynamic" CTCF sites are marked by overlapping oxidized derivatives, whereas static sites are unmethylated (Marina et al., 2016). Considering that CTCF has been biochemically associated with both the DNMT enzymes (which would evict CTCF) (Guastafierro et al., 2008, Zampieri et al., 2012) and the TET enzymes (which would facilitate CTCF binding) (Dubois-Chevalier et al., 2014), these results raised the possibility that 5oxiC facilitates CTCF binding and bookmarks locations of future CTCF eviction as TET activity declines. In other words, CTCF association with DNMT1 would ensure methylation of proximal CpGs after replication, whereas the TET enzymes would subsequently oxidize 5mC and enable CTCF binding. However, a problematic aspect of this model is that CTCF is incapable of binding the abundant oxidized 5mC derivative 5hmC (Marina et al., 2016), and the downstream derivatives 5fC and 5caC are detected only at low levels in genomic DNA (Ito et al., 2011, Lu et al., 2015).

Our demonstrations associating CTCF and 5caC reconcile these observations: CTCF robustly interacts with 5caC in vitro and in vivo, and 5caC is readily detected at CTCF sites. The latter point further suggests that CTCF may protect 5caC from removal by the base-excision repair enzymes (He et al., 2011). However, the truly unexpected aspect of this work is the observation that CTCF binding is seemingly enhanced by overlapping 5caC, suggesting a novel mode of binding. These results are consistent with a previous unbiased mass spectrometry study employing a short CpG-carboxylated DNA probe, in which CTCF was identified as a 5caC-specific reader (Spruijt et al., 2013). As in our EMSA experiments, this DNA fragment lacked a computationally identifiable CTCF-binding site. Combined with the genome-wide observation that CTCF-binding sites with overlapping 5caC are generally characterized by lower motif scores, these findings suggest unique binding determinants. Although the precise biophysical bases guiding CTCF binding at unmethylated versus 5caC-rich DNA are unclear, our DHS data comparing unmodified CBS-high to carboxylated CBS-high and CBS-low probes indicate that binding in these variably modified contexts is conformationally similar and motif-centric. These findings suggest that, rather than creating a true de novo environment involving novel CTCF contacts with unique sequences, 5caC stabilizes CTCF association with suboptimal motifs that are otherwise insufficient for protein retention. Indeed, it is well established that transcription factors dynamically engage substrate DNA (Voss and Hager, 2014). In this regard, 5caC may subtly alter the association (kon) and dissociation (koff) rates to reach an equilibrium favorable for bona fide CTCF binding. This prospect is in line with the EMSA results demonstrating a lower KD,app (that is, higher affinity) in the presence of 5caC, as well as the genome-wide data showing globally enhanced CTCF binding in the presence of overlapping 5caC.
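The link between rate constants and the equilibrium measurement invoked here is the standard relationship below, written for a simple single-site model and assuming that free CTCF is approximately equal to total CTCF. It shows that a lower KD, and therefore a higher affinity, can arise from a slower off-rate, a faster on-rate, or both, and what a 2-fold change in KD means for occupancy at a fixed protein concentration. The worked numbers are illustrative and are not values measured in this study.

```latex
% Single-site binding at equilibrium, assuming [CTCF]_free \approx [CTCF]_total
K_D = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}, \qquad
\theta = \frac{[\mathrm{CTCF}]}{K_D + [\mathrm{CTCF}]}

% Illustration: K_D halved (k_off halved, k_on doubled, or any equivalent change),
% occupancy evaluated at [CTCF] = K_D of the unmodified site
\theta_{\mathrm{unmod}} = \frac{K_D}{K_D + K_D} = \frac{1}{2}, \qquad
\theta_{\mathrm{5caC}} = \frac{K_D}{K_D/2 + K_D} = \frac{2}{3}
```

Because only the ratio of the rate constants enters KD, an equilibrium measurement such as EMSA cannot by itself distinguish which rate 5caC alters; kinetic measurements would be needed for that.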

As for how 5caC generates favorable conditions for CTCF binding, charge-based stabilization involving the negatively charged carboxylate group of 5caC and the cation-loaded zinc fingers of CTCF may play a role. Alternatively, the presence of 5caC within genomic DNA may promote a double-helical conformation that favors CTCF binding. Detailed biophysical studies will be required to resolve the precise mechanistic underpinnings of sequence- and modification-specific CTCF DNA binding. Additional investigation will also be required to determine why only a subset of 5caC-rich locations promote enhanced CTCF association. In this regard, the canonical CTCF motif logo may be driven principally by binding to high-affinity unmodified sites, thus obscuring the influence of less prevalent dynamic sites. In support of this premise, 1,490 determined CTCF-binding locations in T cells yielded motif scores lower than that of the "motif-free" exon 6 EMSA probe (score of 7.36). In addition to highlighting the relevance of 5caC localization in vivo (exon 6 is not marked by 5caC), these findings suggest nuances within CTCF motifs that are driven by elements outside of strict sequence. We believe 5caC to be one such element. Relatedly, CTCF has a variety of binding partners, such that proximal sequence motifs that engage associated factors may influence CTCF recruitment to otherwise weak sites (Holwerda and de Laat, 2013, Parelho et al., 2008). Finally, it is likely that other factors occupy the 5caC-rich sequences in vivo. Indeed, CTCF is only one of many factors that showed preferential association with 5caC-containing DNA in an unbiased mass spectrometry screen (Spruijt et al., 2013). Overall, additional investigation will be required to determine the basis of 5caC-associated CTCF binding in vivo.
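Motif scores such as the 7.36 value quoted for the exon 6 probe are typically computed by summing per-position log-odds scores from a position weight matrix over a candidate window. The scoring tool and CTCF matrix behind the values cited here are not specified in this section, so the following is only a generic sketch: it uses natural-log odds against a uniform background, a hypothetical pseudocount, and a hypothetical 4-position toy matrix rather than the CTCF motif, so its scores are not comparable to 7.36. The function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(freqs: Sequence[Dict[str, float]], background: float = 0.25,
                    pseudo: float = 0.01) -> List[Dict[str, float]]:
    """Convert per-position base frequencies to natural-log odds vs. a uniform background."""
    matrix = []
    for col in freqs:
        total = sum(col[b] + pseudo for b in BASES)
        matrix.append({b: math.log((col[b] + pseudo) / total / background) for b in BASES})
    return matrix


def score(seq: str, matrix: Sequence[Dict[str, float]]) -> float:
    """Sum of per-position log-odds for a window of the same width as the matrix."""
    return sum(col[base] for col, base in zip(matrix, seq.upper()))


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_hit(seq: str, matrix: Sequence[Dict[str, float]]) -> Tuple[float, int, str]:
    """Best-scoring window on either strand: (score, forward-strand start, strand)."""
    w, n = len(matrix), len(seq)
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for i in range(n - w + 1):
            # Convert minus-strand window positions back to forward-strand coordinates.
            start = i if strand == "+" else n - (i + w)
            hits.append((score(s[i:i + w], matrix), start, strand))
    return max(hits)


if __name__ == "__main__":
    # Hypothetical 4-position toy matrix with consensus CCGC; not the CTCF motif.
    toy = [
        {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
        {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    ]
    mat = log_odds_matrix(toy)
    print(round(score("CCGC", mat), 2), round(score("CAGC", mat), 2))
    print(best_hit("TTCCGCTT", mat), best_hit("AAGCGGAA", mat))
```

Both strands are scanned because the CTCF motif is not palindromic. Under a scheme like this, one or two mismatches at high-information positions are enough to pull a site below a calling threshold, which is the sense in which the sites discussed above carry suboptimal motifs.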

Although comprehensive analysis of the physiological impact of the widespread Tdg−/− induced CTCF sites is pending, one can infer that the observed increase in binding will affect diverse CTCF functions. CTCF has a demonstrated role in numerous aspects of nuclear biology, including chromatin insulation, long-range chromosomal interactions, and gene expression regulation (Ong and Corces, 2014). With respect to the latter, we provide evidence herein that the low-motif, 5caC-rich CTCF sites have a strong impact on RNA polymerase II pausing, suggesting a role in transcription elongation. Curiously, a recent global run-on sequencing study described a role for 5caC in reducing RNA polymerase II elongation in Tdg−/− cells (Wang et al., 2015). Our description of ~13,000 novel CTCF sites upon Tdg deletion raises the possibility that emergent CTCF binding contributed to the overall reduction in pol II processivity in Tdg−/− cells. Indeed, in our hands, pol II occupancy at 5caC-rich regions that were not marked by CTCF binding showed no elevation compared with 5caC-poor regions. Further analyses will be required to conclusively determine the source of altered elongation in Tdg−/− cells.

In sum, we describe a role for 5caC in modulating CTCF binding in cells. Given that 5caC levels vary during development (Wheldon et al., 2014), these results have significant implications for dynamic CTCF binding during tissue differentiation. A detailed analysis of CTCF and 5caC co-occurrence during organismal development will reveal the extent to which 5caC shapes CTCF tissue specificity. Importantly, our findings raise the possibility that CTCF sites may be engineered in the genome through targeted 5caC. As CRISPR/Cas9 technology continues to advance, one can envision applications both in studying specific CTCF sites and in assessing the overall impact of induced CTCF binding on nuclear architecture.

Limitations of the Study

In this study, we provide evidence that 5caC stabilizes CTCF binding to suboptimal DNA sequence contexts in vitro and in vivo. However, the net impact on cellular function was not assessed. Whether variations in 5caC levels that occur during development or in response to specific stimuli influence gene expression or genomic architecture through enhanced CTCF binding is unclear. In addition, subtle distinctions between our in vitro and in vivo results suggest that CTCF is in competition with other 5caC-sensitive transcription factors in a chromosomal setting. The identity of such factors and how they shape the transcriptional landscape in concert with CTCF remains to be determined.

Methods

All methods can be found in the accompanying Transparent Methods supplemental file.

Acknowledgments

We thank the members of the Center for Cancer Research Sequencing Facility (CCR-SF) at the National Cancer Institute (Frederick, MD) for providing Illumina sequencing services. We thank Dr. Primo Schär of the University of Basel for providing wild-type and Tdg−/− mESCs. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). This work is supported by the Intramural Research Program of NIH, the National Cancer Institute, and The Center for Cancer Research.

Author Contributions

Conceptualization, S.O.; Methodology, K.K.N., M.T., and S.O.; Data analysis and curation, D.M.S. and A.A.D.; Investigation and validation, K.K.N., M.F.P., A.A.D., and M.D.M.; Writing, K.K.N., D.M.S., and S.O.; Supervision and funding acquisition, S.O.

Declaration of Interests

The authors declare no competing interests.

Published: September 27, 2019

Footnotes

Supplemental Information can be found online at https://doi.org/10.1016/j.isci.2019.07.041.

Data and Code Availability

The accession number for the sequence data reported in this paper is GEO: GSE123101.

Supplemental Information

Document S1. Transparent Methods and Figures S1–S3
mmc1.pdf (1.1MB, pdf)
Table S1. Summary of CTCF ChIP-Seq Alignment Statistics in mESCs, Related to Figure 1
mmc2.xlsx (10.4KB, xlsx)
Table S2. Locations and Properties of Wild-Type and Tdg −/− mESC CTCF Peaks, Related to Figures 1 and 2
mmc3.xlsx (3.5MB, xlsx)
Table S3. Genomic Annotation of Tdg −/− mESC CTCF Peaks Using HOMER, Related to Figures 2 and S3
mmc4.xlsx (6.9MB, xlsx)

References

  1. Bell A.C., Felsenfeld G. Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature. 2000;405:482–485. doi: 10.1038/35013100.
  2. Brenowitz M., Senear D.F., Kingston R.E. DNase I footprint analysis of protein-DNA binding. Curr. Protoc. Mol. Biol. 2001:12.4.1–12.4.16. doi: 10.1002/0471142727.mb1204s07. Chapter 12, Unit 12.4.
  3. Carey M.F., Peterson C.L., Smale S.T. DNase I footprinting. Cold Spring Harb. Protoc. 2013;2013:469–478. doi: 10.1101/pdb.prot074328.
  4. Chen H., Tian Y., Shu W., Bo X., Wang S. Comprehensive identification and annotation of cell type-specific and ubiquitous CTCF-binding sites in the human genome. PLoS One. 2012;7:e41374. doi: 10.1371/journal.pone.0041374.
  5. Cortazar D., Kunz C., Selfridge J., Lettieri T., Saito Y., MacDougall E., Wirz A., Schuermann D., Jacobs A.L., Siegrist F. Embryonic lethal phenotype reveals a function of TDG in maintaining epigenetic stability. Nature. 2011;470:419–423. doi: 10.1038/nature09672.
  6. Dai Q., Sanstead P.J., Peng C.S., Han D., He C., Tokmakoff A. Weakened N3 hydrogen bonding by 5-formylcytosine and 5-carboxylcytosine reduces their base-pairing stability. ACS Chem. Biol. 2016;11:470–477. doi: 10.1021/acschembio.5b00762.
  7. Deaton A.M., Webb S., Kerr A.R., Illingworth R.S., Guy J., Andrews R., Bird A. Cell type-specific DNA methylation at intragenic CpG islands in the immune system. Genome Res. 2011;21:1074–1086. doi: 10.1101/gr.118703.110.
  8. Dixon J.R., Selvaraj S., Yue F., Kim A., Li Y., Shen Y., Hu M., Liu J.S., Ren B. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
  9. Dubois-Chevalier J., Oger F., Dehondt H., Firmin F.F., Gheeraert C., Staels B., Lefebvre P., Eeckhoute J. A dynamic CTCF chromatin binding landscape promotes DNA hydroxymethylation and transcriptional induction of adipocyte differentiation. Nucleic Acids Res. 2014;42:10943–10959. doi: 10.1093/nar/gku780.
  10. Filippova G.N., Thienes C.P., Penn B.H., Cho D.H., Hu Y.J., Moore J.M., Klesert T.R., Lobanenkov V.V., Tapscott S.J. CTCF-binding sites flank CTG/CAG repeats and form a methylation-sensitive insulator at the DM1 locus. Nat. Genet. 2001;28:335–343. doi: 10.1038/ng570.
  11. Fu Y., Sinha M., Peterson C.L., Weng Z. The insulator binding protein CTCF positions 20 nucleosomes around its binding sites across the human genome. PLoS Genet. 2008;4:e1000138. doi: 10.1371/journal.pgen.1000138.
  12. Guastafierro T., Cecchinelli B., Zampieri M., Reale A., Riggio G., Sthandier O., Zupi G., Calabrese L., Caiafa P. CCCTC-binding factor activates PARP-1 affecting DNA methylation machinery. J. Biol. Chem. 2008;283:21873–21880. doi: 10.1074/jbc.M801170200.
  13. Guo J.U., Su Y., Shin J.H., Shin J., Li H., Xie B., Zhong C., Hu S., Le T., Fan G. Distribution, recognition and regulation of non-CpG methylation in the adult mammalian brain. Nat. Neurosci. 2014;17:215–222. doi: 10.1038/nn.3607.
  14. Hampshire A.J., Rusling D.A., Broughton-Head V.J., Fox K.R. Footprinting: a method for determining the sequence selectivity, affinity and kinetics of DNA-binding ligands. Methods. 2007;42:128–140. doi: 10.1016/j.ymeth.2007.01.002.
  15. Handoko L., Xu H., Li G., Ngan C.Y., Chew E., Schnapp M., Lee C.W., Ye C., Ping J.L., Mulawadi F. CTCF-mediated functional chromatin interactome in pluripotent cells. Nat. Genet. 2011;43:630–638. doi: 10.1038/ng.857.
  16. Hark A.T., Schoenherr C.J., Katz D.J., Ingram R.S., Levorse J.M., Tilghman S.M. CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature. 2000;405:486–489. doi: 10.1038/35013106.
  17. Hashimoto H., Wang D., Horton J.R., Zhang X., Corces V.G., Cheng X. Structural basis for the versatile and methylation-dependent binding of CTCF to DNA. Mol. Cell. 2017;66:711–720.e3. doi: 10.1016/j.molcel.2017.05.004.
  18. He Y.F., Li B.Z., Li Z., Liu P., Wang Y., Tang Q., Ding J., Jia Y., Chen Z., Li L. Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA. Science. 2011;333:1303–1307. doi: 10.1126/science.1210944.
  19. Heffler M.A., Walters R.D., Kugel J.F. Using electrophoretic mobility shift assays to measure equilibrium dissociation constants: GAL4-p53 binding DNA as a model system. Biochem. Mol. Biol. Educ. 2012;40:383–387. doi: 10.1002/bmb.20649.
  20. Herrera J.E., Chaires J.B. Characterization of preferred deoxyribonuclease I cleavage sites. J. Mol. Biol. 1994;236:405–411. doi: 10.1006/jmbi.1994.1152.
  21. Holwerda S.J., de Laat W. CTCF: the protein, the binding partners, the binding sites and their chromatin loops. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2013;368:20120369. doi: 10.1098/rstb.2012.0369.
  22. Huang Y., Chavez L., Chang X., Wang X., Pastor W.A., Kang J., Zepeda-Martinez J.A., Pape U.J., Jacobsen S.E., Peters B. Distinct roles of the methylcytosine oxidases Tet1 and Tet2 in mouse embryonic stem cells. Proc. Natl. Acad. Sci. U S A. 2014;111:1361–1366. doi: 10.1073/pnas.1322921111.
  23. Ito S., D'Alessio A.C., Taranova O.V., Hong K., Sowers L.C., Zhang Y. Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature. 2010;466:1129–1133. doi: 10.1038/nature09303.
  24. Ito S., Shen L., Dai Q., Wu S.C., Collins L.B., Swenberg J.A., He C., Zhang Y. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science. 2011;333:1300–1303. doi: 10.1126/science.1210597.
  25. Koh K.P., Yabuuchi A., Rao S., Huang Y., Cunniff K., Nardone J., Laiho A., Tahiliani M., Sommer C.A., Mostoslavsky G. Tet1 and Tet2 regulate 5-hydroxymethylcytosine production and cell lineage specification in mouse embryonic stem cells. Cell Stem Cell. 2011;8:200–213. doi: 10.1016/j.stem.2011.01.008.
  26. Leblanc B.P., Moss T. In vitro DNase I footprinting. Methods Mol. Biol. 2015;1334:17–27. doi: 10.1007/978-1-4939-2877-4_2.
  27. Li W., Shang L., Huang K., Li J., Wang Z., Yao H. Identification of critical base pairs required for CTCF binding in motif M1 and M2. Protein Cell. 2017;8:544–549. doi: 10.1007/s13238-017-0387-5.
  28. Lister R., Pelizzola M., Dowen R.H., Hawkins R.D., Hon G., Tonti-Filippini J., Nery J.R., Lee L., Ye Z., Ngo Q.M. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462:315–322. doi: 10.1038/nature08514.
  29. Lu J., Tang M. CTCF-dependent chromatin insulator as a built-in attenuator of angiogenesis. Transcription. 2012;3:73–77. doi: 10.4161/trns.19634.
  30. Lu X., Han D., Zhao B.S., Song C.X., Zhang L.S., Dore L.C., He C. Base-resolution maps of 5-formylcytosine and 5-carboxylcytosine reveal genome-wide DNA demethylation dynamics. Cell Res. 2015;25:386–389. doi: 10.1038/cr.2015.5.
  31. Luo C., Hajkova P., Ecker J.R. Dynamic DNA methylation: in the right place at the right time. Science. 2018;361:1336–1340. doi: 10.1126/science.aat6806.
  32. Magbanua J.P., Runneburger E., Russell S., White R. A variably occupied CTCF binding site in the ultrabithorax gene in the Drosophila bithorax complex. Mol. Cell. Biol. 2015;35:318–330. doi: 10.1128/MCB.01061-14.
  33. Maiti A., Drohat A.C. Thymine DNA glycosylase can rapidly excise 5-formylcytosine and 5-carboxylcytosine: potential implications for active demethylation of CpG sites. J. Biol. Chem. 2011;286:35334–35338. doi: 10.1074/jbc.C111.284620.
  34. Marina R.J., Sturgill D., Bailly M.A., Thenoz M., Varma G., Prigge M.F., Nanan K.K., Shukla S., Haque N., Oberdoerffer S. TET-catalyzed oxidation of intragenic 5-methylcytosine regulates CTCF-dependent alternative splicing. EMBO J. 2016;35:335–355. doi: 10.15252/embj.201593235.
  35. Martinez F.P., Cruz R., Lu F., Plasschaert R., Deng Z., Rivera-Molina Y.A., Bartolomei M.S., Lieberman P.M., Tang Q. CTCF binding to the first intron of the major immediate early (MIE) gene of human cytomegalovirus (HCMV) negatively regulates MIE gene expression and HCMV replication. J. Virol. 2014;88:7389–7401. doi: 10.1128/JVI.00845-14.
  36. Maunakea A.K., Nagarajan R.P., Bilenky M., Ballinger T.J., D'Souza C., Fouse S.D., Johnson B.E., Hong C., Nielsen C., Zhao Y. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature. 2010;466:253–257. doi: 10.1038/nature09165.
  37. Maurano M.T., Wang H., John S., Shafer A., Canfield T., Lee K., Stamatoyannopoulos J.A. Role of DNA methylation in modulating transcription factor occupancy. Cell Rep. 2015;12:1184–1195. doi: 10.1016/j.celrep.2015.07.024.
  38. Nakahashi H., Kieffer Kwon K.R., Resch W., Vian L., Dose M., Stavreva D., Hakim O., Pruett N., Nelson S., Yamane A. A genome-wide map of CTCF multivalency redefines the CTCF code. Cell Rep. 2013;3:1678–1689. doi: 10.1016/j.celrep.2013.04.024.
  39. Ong C.T., Corces V.G. CTCF: an architectural protein bridging genome topology and function. Nat. Rev. Genet. 2014;15:234–246. doi: 10.1038/nrg3663.
  40. Paredes S.H., Melgar M.F., Sethupathy P. Promoter-proximal CCCTC-factor binding is associated with an increase in the transcriptional pausing index. Bioinformatics. 2013;29:1485–1487. doi: 10.1093/bioinformatics/bts596.
  41. Parelho V., Hadjur S., Spivakov M., Leleu M., Sauer S., Gregson H.C., Jarmuz A., Canzonetta C., Webster Z., Nesterova T. Cohesins functionally associate with CTCF on mammalian chromosome arms. Cell. 2008;132:422–433. doi: 10.1016/j.cell.2008.01.011.
  42. Phillips J.E., Corces V.G. CTCF: master weaver of the genome. Cell. 2009;137:1194–1211. doi: 10.1016/j.cell.2009.06.001.
  43. Plasschaert R.N., Vigneau S., Tempera I., Gupta R., Maksimoska J., Everett L., Davuluri R., Mamorstein R., Lieberman P.M., Schultz D. CTCF binding site sequence differences are associated with unique regulatory and functional trends during embryonic stem cell differentiation. Nucleic Acids Res. 2014;42:774–789. doi: 10.1093/nar/gkt910.
  44. Ramsahoye B.H., Biniszkiewicz D., Lyko F., Clark V., Bird A.P., Jaenisch R. Non-CpG methylation is prevalent in embryonic stem cells and may be mediated by DNA methyltransferase 3a. Proc. Natl. Acad. Sci. U S A. 2000;97:5237–5242. doi: 10.1073/pnas.97.10.5237.
  45. Rao S.S., Huntley M.H., Durand N.C., Stamenova E.K., Bochkov I.D., Robinson J.T., Sanborn A.L., Machol I., Omer A.D., Lander E.S. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021.
  46. Rhee H.S., Pugh B.F. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell. 2011;147:1408–1419. doi: 10.1016/j.cell.2011.11.013.
  47. Shen L., Wu H., Diep D., Yamaguchi S., D'Alessio A.C., Fung H.L., Zhang K., Zhang Y. Genome-wide analysis reveals TET- and TDG-dependent 5-methylcytosine oxidation dynamics. Cell. 2013;153:692–706. doi: 10.1016/j.cell.2013.04.002.
  48. Shukla S., Kavak E., Gregory M., Imashimizu M., Shutinoski B., Kashlev M., Oberdoerffer P., Sandberg R., Oberdoerffer S. CTCF-promoted RNA polymerase II pausing links DNA methylation to splicing. Nature. 2011;479:74–79. doi: 10.1038/nature10442.
  49. Spruijt C.G., Gnerlich F., Smits A.H., Pfaffeneder T., Jansen P.W., Bauer C., Munzel M., Wagner M., Muller M., Khan F. Dynamic readers for 5-(hydroxy)methylcytosine and its oxidized derivatives. Cell. 2013;152:1146–1159. doi: 10.1016/j.cell.2013.02.004.
  50. Szulik M.W., Pallan P.S., Nocek B., Voehler M., Banerjee S., Brooks S., Joachimiak A., Egli M., Eichman B.F., Stone M.P. Differential stabilities and sequence-dependent base pair opening dynamics of Watson-Crick base pairs with 5-hydroxymethylcytosine, 5-formylcytosine, or 5-carboxylcytosine. Biochemistry. 2015;54:1294–1305. doi: 10.1021/bi501534x.
  51. Tahiliani M., Koh K.P., Shen Y., Pastor W.A., Bandukwala H., Brudno Y., Agarwal S., Iyer L.M., Liu D.R., Aravind L. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science. 2009;324:930–935. doi: 10.1126/science.1170116.
  52. Teif V.B., Beshnova D.A., Vainshtein Y., Marth C., Mallm J.P., Hofer T., Rippe K. Nucleosome repositioning links DNA (de)methylation and differential CTCF binding during stem cell development. Genome Res. 2014;24:1285–1295. doi: 10.1101/gr.164418.113.
  53. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
  54. Vietri Rudan M., Barrington C., Henderson S., Ernst C., Odom D.T., Tanay A., Hadjur S. Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Rep. 2015;10:1297–1309. doi: 10.1016/j.celrep.2015.02.004.
  55. Voss T.C., Hager G.L. Dynamic regulation of transcriptional states by chromatin and transcription factors. Nat. Rev. Genet. 2014;15:69–81. doi: 10.1038/nrg3623.
  56. Wang H., Maurano M.T., Qu H., Varley K.E., Gertz J., Pauli F., Lee K., Canfield T., Weaver M., Sandstrom R. Widespread plasticity in CTCF occupancy linked to DNA methylation. Genome Res. 2012;22:1680–1688. doi: 10.1101/gr.136101.111.
  57. Wang L., Zhou Y., Xu L., Xiao R., Lu X., Chen L., Chong J., Li H., He C., Fu X.D. Molecular basis for 5-carboxycytosine recognition by RNA polymerase II elongation complex. Nature. 2015;523:621–625. doi: 10.1038/nature14482.
  58. Wheldon L.M., Abakir A., Ferjentsik Z., Dudnakova T., Strohbuecker S., Christie D., Dai N., Guan S., Foster J.M., Correa I.R., Jr. Transient accumulation of 5-carboxylcytosine indicates involvement of active demethylation in lineage specification of neural stem cells. Cell Rep. 2014;7:1353–1361. doi: 10.1016/j.celrep.2014.05.003.
  59. Wu X., Zhang Y. TET-mediated active DNA demethylation: mechanism, function and beyond. Nat. Rev. Genet. 2017;18:517–534. doi: 10.1038/nrg.2017.33.
  60. Zampieri M., Guastafierro T., Calabrese R., Ciccarone F., Bacalini M.G., Reale A., Perilli M., Passananti C., Caiafa P. ADP-ribose polymers localized on Ctcf-Parp1-Dnmt1 complex prevent methylation of Ctcf target sites. Biochem. J. 2012;441:645–652. doi: 10.1042/BJ20111417.
  61. Zhu Q., Stoger R., Alberio R. A lexicon of DNA modifications: their roles in embryo development and the germline. Front. Cell Dev. Biol. 2018;6:24. doi: 10.3389/fcell.2018.00024.
  62. Ziebarth J.D., Bhattacharya A., Cui Y. CTCFBSDB 2.0: a database for CTCF-binding sites and genome organization. Nucleic Acids Res. 2013;41:D188–D194. doi: 10.1093/nar/gks1165.
  63. Zuin J., Dixon J.R., van der Reijden M.I., Ye Z., Kolovos P., Brouwer R.W., van de Corput M.P., van de Werken H.J., Knoch T.A., van IJcken W.F. Cohesin and CTCF differentially affect chromatin architecture and gene expression in human cells. Proc. Natl. Acad. Sci. U S A. 2014;111:996–1001. doi: 10.1073/pnas.1317788111.
