This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2024 Apr 24:2024.04.24.590818. [Version 1] doi: 10.1101/2024.04.24.590818

Circadian regulation of stereotypic chromatin conformations at enhancers

Xinyu Y Nie 1, Jerome S Menet 1,2,#
PMCID: PMC11071494  PMID: 38712031

Summary

Cooperation between the circadian transcription factor (TF) CLOCK:BMAL1 and other TFs at cis-regulatory elements (CREs) is critical to daily rhythms of transcription. Yet, the modalities of this cooperation are unclear. Here, we analyzed the co-binding of multiple TFs on single DNA molecules in mouse liver using single molecule footprinting (SMF). We found that SMF reads clustered in stereotypic chromatin states that reflect distinguishable organization of TFs and nucleosomes, and that were remarkably conserved between all samples. DNA protection at CLOCK:BMAL1 binding motif (E-box) varied between CREs, from E-boxes being solely bound by CLOCK:BMAL1 to situations where other TFs competed with CLOCK:BMAL1 for E-box binding. SMF also uncovered CLOCK:BMAL1 cooperative binding at E-boxes separated by 250 bp, which structurally altered the CLOCK:BMAL1-DNA interface. Importantly, we discovered multiple nucleosomes with E-boxes at entry/exit sites that were removed upon CLOCK:BMAL1 DNA binding, thereby promoting the formation of open chromatin states that facilitate DNA binding of other TFs and that were associated with rhythmic transcription. These results demonstrate the utility of SMF for studying how CLOCK:BMAL1 and other TFs regulate stereotypical chromatin states at CREs to promote transcription.

Introduction

Cooperation between TFs at CREs (CREs being defined herein as enhancers and promoters) is critical to transcription activation1–11. Cooperativity can involve physical TF-TF interaction, but can also occur in the absence of direct interaction through a mechanism called nucleosome-mediated TF cooperation6,8,12. Due to the preferential binding of most TFs to naked DNA13, nucleosomes represent a physical barrier that inhibits TF-DNA binding. Because the footprint that TFs occupy on DNA is much smaller than the footprint of the histone octamer (usually < 30 bp vs 147 bp)14, binding of a TF on DNA can not only inhibit nucleosome compaction, but also facilitate the binding of other TFs at nearby locations6,12,15–20. However, how TFs cooperate to compete with histones, bind DNA, and regulate CRE activity is still unclear and remains a central question in the field of gene expression.

By switching from an active to an inactive state on a 24-hour basis, circadian TFs provide a system to address the role of TF cooperativity in transcription regulation. In mammals, nearly every cell harbors a circadian clock that initiates daily rhythms in gene expression to activate biological processes at the appropriate time of the day. Given the large number of rhythmically expressed genes, the activity of most transcriptional programs and biological pathways oscillates across the day, for example those controlling metabolic functions in anticipation of the 24-hour feeding-fasting rhythm21,22. The mammalian circadian clock is initiated by the heterodimeric basic-helix-loop-helix (bHLH) TF CLOCK:BMAL1. During the day, CLOCK:BMAL1 binds DNA to activate the transcription of Period (Per1, Per2, and Per3) and Cryptochrome (Cry1 and Cry2), which upon expression form a repressive complex that decreases CLOCK:BMAL1 DNA affinity at night and inhibits CLOCK:BMAL1-mediated transcription23–25. In addition to regulating core clock gene transcription, the molecular clock also initiates the rhythmic expression of thousands of genes to regulate the oscillation of biological processes21,22,26. Accumulating evidence indicates that CLOCK:BMAL1 is not sufficient to generate active CREs, and that the activity of CLOCK:BMAL1-bound CREs rather relies on the cooperation between CLOCK:BMAL1 and other TFs27–29.

Characterization of TF cooperative binding across an entire CRE on a single DNA molecule has been challenging. While techniques like chromatin immunoprecipitation coupled to high-throughput sequencing (ChIP-Seq) have helped uncover the genome-wide location of binding sites for many TFs in various species and tissues, they lack the ability to assess nucleosome occupancy and TF cooperation on the same DNA molecule. The recent development of SMF has overcome some of these challenges by enabling the detection of TF and nucleosome footprints at CREs with single-molecule resolution over a range of ~500 bp (ref. 6). Here, we adapted SMF to 18 mouse liver CREs in vivo which, combined with our custom-made computational pipeline, allowed us to demonstrate that CREs in vivo exhibit stereotypical chromatin conformations reflecting dynamic changes in TF binding and nucleosome positioning. Further dissection of how CLOCK:BMAL1 regulates these distinct chromatin states shed light on how SMF can be leveraged to determine how TFs structurally and hierarchically remodel chromatin, evict/shift nucleosomes, and promote the recruitment of other TFs to regulate transcription.

Results

CREs exhibit stereotypic chromatin conformations that are conserved between samples

To characterize the modality of CLOCK:BMAL1 cooperation with other TFs, we adapted SMF to mouse liver nuclei of wild-type (WT) mice collected at ZT06 (mid-day) and ZT18 (mid-night) and clock-deficient Bmal1−/− mice (BMKO) at ZT06 (n = 3 per group), representing conditions exhibiting strong, weak and no CLOCK:BMAL1 DNA binding, respectively30,31. SMF relies on treating nuclei with the exogenous methyltransferase M.CviPI to methylate exposed cytosines at GpC dinucleotides, followed by bisulfite conversion of unmodified cytosines to uracil (Fig. 1a). SMF thus labels every GpC across the genome based on DNA protection by chromatin-associated proteins and identifies simultaneous binding of TFs and nucleosomes on single DNA molecules. We carried out SMF at a total of 18 CREs (see methods) that were PCR-amplified to increase sequencing depth to a few thousand DNA molecules and sequenced as paired-end 250 bp (PE250) to characterize DNA protection on full-length CREs (Fig. 1a, S1). Analysis of footprinting signal was performed using a custom bioinformatic pipeline that returned SMF signal at reads reconstituted from PE250 for each CRE and animal, removed PCR duplicates, and output footprint signal across CREs as a binary string with protected GpCs labeled as 1 and exposed GpCs labeled as 0 (see methods for details; Fig. 1a). Bisulfite conversion efficiency was also computed at all HCH sites (H = A, C, or T) and found to be high, with 96.29 +/− 2.10% (mean +/− SD) of cytosines at HCH sites sequenced as thymine (Fig. S2).
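
The binary encoding and the HCH conversion check described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each read is already aligned to the top strand of the amplicon, gapless and of the same length as the reference, and it ignores the GCG context in which endogenous CpG methylation could confound the GpC call. The function names are hypothetical.

```python
def gpc_positions(reference):
    """0-based indices of the cytosine in every GpC dinucleotide of the reference."""
    return [i + 1 for i in range(len(reference) - 1) if reference[i:i + 2] == "GC"]


def footprint_string(reference, read):
    """Encode one bisulfite-converted read as a binary string over GpCs.

    A methylated (exposed) GpC cytosine resists conversion and is read as C;
    an unmethylated (protected) one is converted and read as T.
    Protected = 1, exposed = 0, as in the text.
    """
    bits = []
    for pos in gpc_positions(reference):
        base = read[pos]
        if base == "C":
            bits.append("0")  # methylated by M.CviPI, hence exposed
        elif base == "T":
            bits.append("1")  # converted, hence protected
        else:
            bits.append("N")  # sequencing error or no call (hypothetical placeholder)
    return "".join(bits)


def hch_conversion_rate(reference, reads):
    """Fraction of HCH cytosines (H = A, C, or T) sequenced as T across all reads."""
    hch = [i for i in range(1, len(reference) - 1)
           if reference[i] == "C" and reference[i - 1] != "G" and reference[i + 1] != "G"]
    converted = total = 0
    for read in reads:
        for pos in hch:
            if read[pos] in "CT":  # skip bases that are neither C nor T
                total += 1
                converted += read[pos] == "T"
    return converted / total if total else float("nan")
```

HCH cytosines are neither GpC nor CpG, so they should be unmethylated and fully converted; the fraction read as T therefore estimates conversion efficiency independently of the footprint itself.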

Fig. 1: CREs exhibit stereotypic chromatin conformations.

(a) Experimental design and bioinformatic analysis of single molecule footprinting (adapted from Sonmezer et al., 2021; ref. 6). (b) DNase-Seq, CLOCK ChIP-seq and BMAL1 ChIP-seq signals at an enhancer 1 kb upstream of the Inmt TSS (chr6:55152994-55153301; highlighted in yellow). (c) Heatmap illustrating SMF signal at the enhancer upstream of the Inmt TSS in mouse liver. Each line represents GpC protection events on a single DNA molecule, with exposed/methylated cytosines colored in yellow, and protected/unmethylated cytosines colored in green (WT ZT06), brown (WT ZT18) or blue (BMKO ZT06). Shades of green, brown, and blue distinguish biological replicates. Reads from all nine animals (3,385 per sample) were clustered by the BMD clustering algorithm into 17 clusters, and each column illustrates protection at a single GpC (range of 308 bp). Arrows illustrate four major protection events. (d) Percentage of reads allocated to clusters comprising reads from 1 to 9 samples for each of the 18 CREs. Each dot in a boxplot represents one of the 18 CREs.

We first sought to determine if DNA protection at CREs occurs at specific locations and/or exhibits distinct patterns. To that end, we binned for each CRE an equal number of reads from all nine biological samples to avoid overrepresentation of some samples vs. others, and performed a clustering analysis using the Binary Matrix Decomposition (BMD) algorithm that specializes in clustering binary data32. We found that reads within each CRE could be clustered into 11–21 chromatin states that reflect distinguishable organization of TFs and nucleosome signals (Fig. S3a). For example, reads at an enhancer upstream Inmt transcription start site (TSS), which is targeted by CLOCK:BMAL1 and used hereafter as a prototypical CRE with accessible chromatin as assessed by DNase-seq33 (Fig. 1b), were clustered into 17 clusters reflecting various levels of DNA protection (Fig. 1c). Interestingly, each cluster revealed a distinct protection profile. Cluster 1 (C1) showed high protection at nearly all GpCs, likely reflecting a condensed chromatin state. Clusters 2 to 8 had several regions displaying continuous protection over at least 75 bp, likely reflecting chromatin states being bound by one or more nucleosomes (Fig. 1c). Considering that most TFs have less access to nucleosomal DNA than to free DNA13, these clusters could reflect transcriptionally inactive states. In contrast, clusters 11 to 15 showed less nucleosome protection and increased protection at one or two consecutive GpCs that likely correspond to a TF bound to DNA. When all nucleosomes are evicted, CREs can reach a fully accessible chromatin state, as in clusters 16 and 17 (Fig. 1c). Together, this clustering analysis suggests that SMF detects the multiple configurations that CREs undergo because of dynamic changes in TF binding and nucleosome positioning.

A striking feature of these chromatin states is their conservation between all nine biological samples (Fig. S3b). At the Inmt enhancer, all clusters contained reads from at least eight animals, and 14 clusters had reads from all nine animals (Fig. 1c, S3c). This indicates that nucleosome positioning at CREs is not random but rather genetically encoded by DNA sequence and/or TF binding, as suggested elsewhere34,35. A similar result was found for all other CREs (Fig. S3b). More than 94.7% of all reads were binned in clusters common to 7, 8, or 9 animals (Fig. 1d, S3b), and clusters defined by reads from 6 or fewer samples were for the most part minor clusters encompassing less than 5.3% of reads (Fig. 1d, Fig. S3d). These data therefore indicate that chromatin conformations are highly conserved among samples, and robust to differences between genotypes and timepoints.
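
The conservation metric reported here (the percentage of reads falling in clusters shared by a given number of samples) can be sketched as follows. This is an illustrative reimplementation under the assumption that each read carries a sample label and a cluster label; the function name and input layout are hypothetical.

```python
from collections import defaultdict


def reads_by_sample_coverage(assignments):
    """Percentage of reads in clusters shared by k samples.

    assignments: iterable of (sample_id, cluster_id) pairs, one per read.
    Returns {k: percentage of all reads lying in clusters that contain
    reads from exactly k distinct samples}.
    """
    samples_in_cluster = defaultdict(set)
    reads_in_cluster = defaultdict(int)
    for sample, cluster in assignments:
        samples_in_cluster[cluster].add(sample)
        reads_in_cluster[cluster] += 1

    total = sum(reads_in_cluster.values())
    pct = defaultdict(float)
    for cluster, n_reads in reads_in_cluster.items():
        pct[len(samples_in_cluster[cluster])] += 100.0 * n_reads / total
    return dict(pct)
```

Summing the returned percentages for k = 7, 8, and 9 gives the kind of figure quoted in the text for clusters common to most animals.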

To confirm that our findings were not biased by the computational pipeline, we performed several additional analyses. First, we carried out a BMD clustering analysis of reads from each individual sample, reasoning that a chromatin state found by clustering all nine samples together should also be identified when clustering each sample individually. To quantitatively define the nature of all clusters, we computationally defined the protection state of each GpC as either bound by nucleosome or exposed DNA (either naked or bound by TF), and performed a hierarchical clustering analysis using the defined GpC protection state of all clusters of each sample (Fig. S4a). This analysis revealed that clusters from all nine individuals could be parsed in five major groups representing various levels and location of nucleosome protection (Fig. S4b). Importantly, 4 out of the 5 groups contained clusters from all animals, and many clusters were observed in all nine samples. The last group (group 2) contained clusters from 6 samples encompassing the three experimental conditions. Thus, BMD clustering analysis of individual samples confirmed that chromatin states at CREs are highly conserved between samples and likely reflect biologically relevant chromatin conformation.

Two additional analyses were carried out. First, we calculated the simple matching coefficient between each pair of reads at Inmt TSS upstream enhancer using 1,692 randomly selected reads for each of the nine samples (total of 15,228 reads and over 115 million pairwise comparisons), and performed a hierarchical clustering of the simple matching coefficient (Fig. S4c). No major segregation of reads based on timepoints or genotypes was observed, and computation of the percentage of reads found in the top ten clusters defined by the hierarchical clustering showed that clusters comprised reads from all nine samples, except two clusters where reads from a single replicate were lacking (Fig. S4d). Second, we performed a principal component analysis with reads from all samples, which did not reveal any sample-specific distributions, i.e., profiles were highly similar for all nine samples (Fig. S4e).
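
For reference, the simple matching coefficient between two binary footprint strings is the fraction of GpC positions at which the two reads agree. The sketch below (hypothetical function names, pure Python) also reproduces the pairwise comparison count quoted above.

```python
def simple_matching_coefficient(read_a, read_b):
    """Fraction of positions at which two equal-length binary reads agree."""
    if len(read_a) != len(read_b):
        raise ValueError("reads must cover the same GpCs")
    matches = sum(a == b for a, b in zip(read_a, read_b))
    return matches / len(read_a)


def n_pairs(n_reads):
    """Number of unordered read pairs among n_reads reads."""
    return n_reads * (n_reads - 1) // 2


# 9 samples x 1,692 reads = 15,228 reads, i.e. 115,938,378 unordered pairs
print(n_pairs(9 * 1692))
```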

Taken together, our data indicate that the chromatin states that we identified represent biologically-relevant chromatin conformations that are distinguishable based on the protection of DNA by histones and TFs, and that are conserved across timepoints and between wild-type and Bmal1−/− mice. Given that transcription occurs by bursting36–38, it is tempting to speculate that the multiple chromatin conformations that we observed at each CRE reflect the differences in transcriptional activity that exist between different single cells.

SMF defines the multiple positions of nucleosomes at CREs

The numerous chromatin conformations that we identified with SMF clustering analysis (Fig. 1c) were largely linked to the different positions of long stretches of DNA protection that we interpreted as nucleosome protection. Since SMF analysis of nucleosome positioning may reveal how TFs dynamically remodel chromatin at an unprecedented resolution, we sought to determine formally whether these long stretches of DNA protection reflect legitimate nucleosome signal. To this end, we first generated an algorithm that computationally defined within each cluster long stretches of GpC protection as nucleosomes, and calculated the percentage of nucleosome protection at each GpC for all CREs (see methods). Then, we compared this SMF nucleosome signal to that of the current gold standard for nucleosome mapping, i.e., sequencing of ~150 bp DNA fragments obtained from chromatin digestion with micrococcal nuclease (MNase-Seq)39. Using a public mouse liver MNase-seq dataset comprising nearly 1 billion nucleosomes from 47 biological replicates of WT and BMKO mice40, we found that SMF nucleosome signal strongly matched MNase-Seq signal, with SMF nucleosome protection signal across all 453 GpCs being positively correlated with MNase-seq signal (Pearson correlation coefficient: 0.79, p-value: 1.03 × 10^−97) (Fig. 2a, S5a). Importantly, SMF was also able to resolve complex profiles of MNase-Seq signal that are difficult to interpret. For example, the lack of clear nucleosome positioning by MNase-Seq at the Inmt enhancer (Fig. 2b) can be explained by the half-dozen positions that nucleosomes occupy at that enhancer (Fig. 1c). Moreover, events where MNase-Seq profiles indicate that the dyads of two adjacent nucleosomes are spaced less than 140 bp apart (i.e., as if nucleosomes were overlapping) are explained by SMF with nucleosomes being well positioned on distinct DNA molecules and parsed in different clusters (Fig. 2c, S5b). Thus, SMF clustering analysis improves the resolution of the location of nucleosomes across CREs, and enables the quantification of the percentage of alleles comprising a nucleosome at a specific position.
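
One way to picture the nucleosome-calling step is the sketch below. It is an assumption-laden illustration, not the algorithm described in the methods: a nucleosome is called wherever consecutive protected GpCs span at least `min_span` bp, with the 75 bp default borrowed from the continuous-protection criterion mentioned for clusters 2 to 8, and each cluster is summarized by a single consensus 0/1 profile. All names and the threshold are hypothetical.

```python
def call_nucleosome_gpcs(gpc_positions, protected, min_span=75):
    """Flag GpCs that lie in a run of consecutive protected GpCs whose first
    and last members are at least min_span bp apart.

    gpc_positions: bp coordinate of each GpC, in ascending order.
    protected: list of 0/1 integers (1 = protected), same length.
    """
    flags = [False] * len(protected)
    start = None
    for i in range(len(protected) + 1):
        inside = i < len(protected) and bool(protected[i])
        if inside and start is None:
            start = i  # a run of protected GpCs begins
        elif not inside and start is not None:
            # the run covers indices start .. i-1; keep it if it is long enough
            if gpc_positions[i - 1] - gpc_positions[start] >= min_span:
                for j in range(start, i):
                    flags[j] = True
            start = None
    return flags


def nucleosome_signal(gpc_positions, cluster_profiles, cluster_sizes, min_span=75):
    """Percentage of reads with nucleosome protection at each GpC, pooling
    clusters weighted by their read counts."""
    total = sum(cluster_sizes)
    signal = [0.0] * len(gpc_positions)
    for profile, size in zip(cluster_profiles, cluster_sizes):
        flags = call_nucleosome_gpcs(gpc_positions, profile, min_span)
        for k, flag in enumerate(flags):
            if flag:
                signal[k] += 100.0 * size / total
    return signal
```

The per-GpC signal returned by `nucleosome_signal` is the quantity that would then be correlated with MNase-Seq coverage at the same coordinates.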

Fig. 2: SMF improves the resolution of nucleosome positioning.

(a) Pearson correlation between SMF nucleosome signal and MNase-seq signal at 453 distinct GpCs located in 18 CREs. MNase-Seq datasets comprise 978,740,598 nucleosomes originating from 47 WT or BMKO mouse livers. (b) Nucleosome signal at Inmt TSS upstream enhancer. Top: MNase-seq signal; Middle: panel representing the definition of each GpC as either nucleosome (purple) or accessible chromatin (naked DNA or bound by TFs; yellow) in chromatin states C2 to C15 (see Fig. 1c); Bottom: SMF nucleosome signal (equivalent to percentage of nucleosome protection at each GpC). (c) SMF signal at Dbp promoter (chr7:45354206-45354625). Top left panel: SMF nucleosome signal; Middle and bottom left panels: MNase-seq signals displayed with nucleosome signal calculated for the full-length 147 bp nucleosome (middle left) or calculated for 75 bp centered on the nucleosome dyad (bottom left; commonly used to improve the resolution of nucleosome positioning by MNase-Seq). Right: heatmap displaying SMF profile as in Fig. 1c. Reads from all nine animals (1,715 reads for each sample) were clustered in 13 clusters, and each column illustrates protection at a single GpC (range of 420 bp). (d) Percentage of exposed DNA for each of the 18 CREs analyzed in this study, and calculated using SMF profiles and the computationally-defined nucleosome position. (e) Pearson correlation between SMF exposed DNA signal and MNase-seq signal using signals at 18 CREs. (f) Pearson correlation between SMF exposed DNA signal and DNase-seq signal using signals at 18 CREs. (g) MNase-seq nucleosome signals at Inmt TSS upstream enhancer, obtained from paired-end sequencing of high (left) and low (middle) MNase digested mouse liver nuclei. The ratio of low over high MNase-seq signal (right) was calculated to represent nucleosome fragility at Inmt TSS upstream enhancer.

Since nucleosomes represent a physical barrier for the binding of most TFs to DNA13, we next sought to leverage our data to determine the percentage of exposed DNA at each individual CRE. To that end, we calculated the proportion of exposed DNA for each cluster, which were then combined for each CRE (Fig. S5c). This analysis revealed that the extent to which DNA is exposed at CREs varied extensively, with some CREs exhibiting as low as 13.2% of nucleosomal DNA while others showed 68.9% of DNA wrapped around histones (Fig. 2d). Such differences imply that the regulation of CRE transcriptional activity is CRE-specific with, for example, increased involvement of chromatin remodelers for TF DNA binding at CREs exhibiting lower levels of exposed DNA. Unsurprisingly, the percentage of exposed DNA at each CRE was negatively correlated with MNase-Seq signal (Fig. 2e). Consistent with the nature of DNase-Seq that necessitates two closely spaced cleavage events for DNA selection and sequencing, we also observed a positive correlation between the percentage of exposed DNA and DNase-seq signal (Fig. 2f).

Nucleosomal DNA can partially unwrap from the histone octamer, and when this transient conformation is stabilized, nucleosomes become more susceptible to MNase digestion and are called fragile nucleosomes41. CREs are enriched in fragile nucleosomes, and nucleosome fragility can be increased without histone eviction in response to environmental changes that poise genes for transcriptional activation42,43. Given that SMF detects the different chromatin states that CREs undergo and resolves nucleosome positioning, we sought to determine whether fragile nucleosomes are associated with specific chromatin states through the analysis of public high and low MNase-Seq datasets from mouse liver44. At Inmt TSS upstream enhancer, we observed a peak found in low but not high MNase-Seq datasets, indicating the enrichment of a fragile nucleosome at that location (dyad located ~105 bp downstream of the 5’ end of SMF reads) (Fig. 2g). Inspection of DNA protection at that enhancer (Fig. 1c) revealed the presence of a nucleosome at position 38–168 bp in cluster C4, matching the low MNase-Seq peak at position ~50–150 bp (Fig. 2g). Examination of other CREs (Fig. S6a–c) confirmed that the combination of SMF with low vs. high MNase-Seq approaches can uncover the chromatin states that harbor a higher proportion of fragile nucleosomes, thus suggesting that the complementary usage of these assays could shed light on the mechanisms underlying the formation of fragile nucleosomes.

Protection at E-boxes is variable between CREs

CLOCK:BMAL1 binds DNA rhythmically, yet it remains unknown whether other TFs bind to the same E-boxes when CLOCK:BMAL1 DNA affinity is minimal at night. To address this possibility, we analyzed the 10 out of 18 CREs that exhibit rhythmic BMAL1 ChIP-Seq signal (Fig. S7a), have at least one E-box (CACGTG with up to one mismatch), and have a GpC located within 5 bp of the E-box (CLOCK:BMAL1 footprint on DNA is ~18 bp28,45; Fig. S7b), reasoning that binding of CLOCK:BMAL1 and/or other TFs to E-boxes should prevent GpC methylation by M.CviPI (sequences are provided in Table S1). Percentage of protection was calculated at 42 GpCs (27 E-boxes) using reads with E-boxes unprotected by a nucleosome, and was found to be significantly higher in WT ZT06 and lower in BMKO, consistent with CLOCK:BMAL1 DNA binding ability (Fig. 3a). Examination of protection at individual E-boxes revealed some variability. Consistent with the global pattern, DNA protection was significantly different between the three groups and higher in WT ZT06 at 6 E-boxes located in 3 CREs (Por distal enhancer, Inmt upstream TSS, and Nr1d1 1st intron; Fig. 3b). A similar trend was observed at three E-boxes in Dbp promoter, but statistical significance was not reached, likely because of variability between samples (GpCs 65, 218, 317; p < 0.1; Fig. 3b). More generally, protection of many GpCs at E-boxes was minimal in BMKO mice (e.g., 20/42 GpCs had less than 5% protection in BMKO mice, and BMKO was the group with lowest DNA protection for 26/42 GpCs), demonstrating that CLOCK:BMAL1 is primarily responsible for protecting these GpCs (Fig. 3a,b). However, we observed some CREs exhibiting similar levels of protection between groups, or even higher protection in WT ZT18. This is exemplified by a CRE located in an intergenic region, where three GpCs at E-box 3 showed 80% protection in BMKO ZT06 (GpCs 427, 431, 440; Fig. S7c). Given that open chromatin states at this CRE are enriched in the WT ZT06 group (Fig. S7c,d; see below), this suggests that CLOCK:BMAL1 binds DNA during the day, but that other TFs compete with CLOCK:BMAL1 for E-box binding, especially when CLOCK:BMAL1 DNA binding affinity is decreased. These other TFs may include USF1 and DEC1, which are bHLH TFs like CLOCK:BMAL1 and bind E-boxes (Fig. S7e).
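
The E-box selection criteria stated above (CACGTG with up to one mismatch, and a GpC within 5 bp) can be sketched in a few lines. This is an illustration only: how the 5 bp distance is measured is an assumption here (from either edge of the 6 bp motif, with GpCs inside the motif also counted), and the function names are hypothetical. Because CACGTG is its own reverse complement, a site and its reverse complement have the same number of mismatches to the motif, so scanning one strand suffices.

```python
EBOX = "CACGTG"


def find_eboxes(seq, max_mismatches=1):
    """Return (start, site) for every 6-mer within max_mismatches of CACGTG."""
    hits = []
    for i in range(len(seq) - len(EBOX) + 1):
        site = seq[i:i + len(EBOX)]
        mismatches = sum(a != b for a, b in zip(site, EBOX))
        if mismatches <= max_mismatches:
            hits.append((i, site))
    return hits


def gpcs_near_ebox(seq, ebox_start, flank=5):
    """0-based positions of GpC cytosines inside the E-box or within
    `flank` bp of either of its edges (distance convention is assumed)."""
    lo = ebox_start - flank
    hi = ebox_start + len(EBOX) - 1 + flank
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i:i + 2] == "GC" and lo <= i + 1 <= hi]
```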

Fig. 3. Patterns of protection at E-boxes differ between CREs.

(a) Percentage of protection at 42 GpCs located within 5 bp of E-boxes (CACGTG with up to one mismatch) for the 10 CREs bound by CLOCK:BMAL1 in WT ZT06 (green), WT ZT18 (brown), and BMKO ZT06 (blue). The percentage of protection was calculated using reads where GpCs were not protected by a nucleosome. Repeated measures ANOVA between the three groups. * p<0.05. (b) Scatter plot of the percentage of protection at the 42 E-box GpCs for all 9 samples, using Fig. 1c color code. The size and color of the bubble plot (left) depict the p-value obtained by one-way ANOVA (n = 3 samples/group). Circles are colored in grey if p>0.05. (c) SMF profile at an enhancer located in Nr1d1 1st intron (chr11:98664610-98665043) in mouse liver. The heatmap displays protection from GpC methylation as in Fig. 1c (n = 2,368 reads for each sample; range of 434 bp). Asterisks highlight GpCs at five E-boxes, with sequences provided below the heatmaps. (d) Normalized extent of co-binding (N.EOC) calculated between every GpC located within 5 bp of an E-box at Nr1d1 1st intron. Fisher’s exact test followed by Benjamini-Hochberg correction. Squared N.EOC values were plotted in heatmap. (e) Number of reads based on the protection profiles at E-box 4 and E-boxes 2/3 for each group. Protection events at E-box 4 (GpCs 408/411/419) were separated in 4 categories: all GpCs exposed (000), only GpC 408 exposed (011), all GpCs protected (111), and all other protection events (others). Protection events at E-boxes 2/3 (GpCs 124/131/138) were similarly separated in 4 categories: all exposed (000, yellow), only exposed at GpC 138 (110, orange), all protected (111, purple), and others (grey). (f) Schematic illustrating changes in GpC protection at E-boxes 2/3/4/5 based on CLOCK:BMAL1 DNA binding.

SMF resolution also provided the opportunity to examine CLOCK:BMAL1 cooperative binding between multiple E-boxes. To that end, we focused on a CRE located in the first intron of the clock gene Nr1d1 that contains 5 E-boxes (Fig. 3c). E-box 1 and E-box 2 are separated by 6 bp, an arrangement that has been characterized as a dual E-box and the preferred CLOCK:BMAL1 binding site46. Intriguingly, we observed CLOCK:BMAL1 binding on E-box 2 (GpC 124) but not E-box 1 (GpC 100), and protection at E-box 2 was instead associated with protection at E-box 3, which is located 9 bp away (Fig. 3c). Protection at E-boxes 4 and 5, which are 16 bp apart, was also strongly correlated (Fig. 3c). To quantitatively assess cooperative binding, we adapted a previously published methodology that computes a “normalized extent of co-binding” (N.EOC) between protection events at two GpCs (ref. 5), and used Fisher’s exact test to test if co-binding was non-random (see Fig. S7f and methods for details). This analysis confirmed the strong CLOCK:BMAL1 cooperative binding between E-box 2 and E-box 3, and between E-box 4 and E-box 5 (Fig. 3d). Cooperation at E-boxes 2/3 was only observed in WT ZT06, whereas cooperation at E-boxes 4/5 was similar at both timepoints. Interestingly, we also found a higher N.EOC between GpCs 124/131 (E-boxes 2/3), GpCs 411/419 (E-box 4) and GpCs 429/432/441 (E-box 5) in WT ZT06 than in the other two groups, suggestive of a cooperative binding of CLOCK:BMAL1 over a range of 280 bp (Fig. 3d). Examination of co-binding events at another CRE carrying multiple E-boxes in Dbp promoter also revealed cooperative CLOCK:BMAL1 binding at multiple E-boxes separated by over 250 bp, which was, as at Nr1d1 first intron, more potent in WT ZT06 (Fig. S7g).
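
The non-randomness test can be illustrated with a small pure-Python sketch that tabulates co-protection at two GpCs and applies Fisher's exact test. The N.EOC statistic itself is defined in the methods and is not reproduced here. The choice of a one-sided test for enrichment of co-protected reads is an assumption made for illustration, `i` and `j` are column indices into the binary read strings rather than bp coordinates, and the function names are hypothetical.

```python
from math import comb


def cobinding_table(reads, i, j):
    """Count reads by protection (1) or exposure (0) at GpC columns i and j.

    Returns (both, only_i, only_j, neither).
    """
    both = only_i = only_j = neither = 0
    for read in reads:
        a, b = read[i] == "1", read[j] == "1"
        if a and b:
            both += 1
        elif a:
            only_i += 1
        elif b:
            only_j += 1
        else:
            neither += 1
    return both, only_i, only_j, neither


def fisher_exact_greater(both, only_i, only_j, neither):
    """One-sided Fisher's exact p-value: probability of observing at least
    `both` co-protected reads if protection at i and j were independent,
    with the table margins held fixed (hypergeometric tail)."""
    n = both + only_i + only_j + neither
    row = both + only_i  # reads protected at i
    col = both + only_j  # reads protected at j
    denom = comb(n, col)
    p = 0.0
    for k in range(both, min(row, col) + 1):
        p += comb(row, k) * comb(n - row, col - k) / denom
    return p
```

When many GpC pairs are tested at one CRE, the resulting p-values would still need a multiple-testing correction such as the Benjamini-Hochberg procedure named in the Fig. 3 legend.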

Closer inspection of CLOCK:BMAL1 co-binding events at Nr1d1 first intron revealed that although CLOCK:BMAL1 protects GpC 138 (E-box 3) and GpC 408 (E-box 4), their co-binding was surprisingly much lower than the co-binding between GpCs 124/131/411/419 (Fig. 3d). We hypothesized that this may be caused by a conformational change in how CLOCK:BMAL1 binds DNA when co-bound on E-boxes 2/3/4. To test this hypothesis, we first separated protection events at E-box 4 (GpCs 408/411/419) in 4 categories: all GpCs exposed (000), only GpC 408 exposed (011), all GpCs protected (111), and all other protection events (others) (Fig. 3e). Next, we calculated for each protection category at E-box 4 the number of reads exhibiting specific protection profiles at E-boxes 2/3 (GpCs 124/131/138): all exposed (000, yellow), only exposed at GpC 138 (110, orange), all protected (111, purple), and others (grey). This analysis revealed that, at ZT06 when CLOCK:BMAL1 DNA binding is high, the proportion of CLOCK:BMAL1 binding at E-boxes 2/3 is nearly twice as high when CLOCK:BMAL1 is bound at E-box 4 as when it is unbound (60.1% vs. 33.2%), further confirming CLOCK:BMAL1 cooperative binding between E-boxes separated by 280 bp. Interestingly, the proportion of CLOCK:BMAL1 binding at E-boxes 2/3 was even higher when GpC 408 was exposed (87.4% vs. 60.1% of reads), with a large fraction of the reads emanating from protection events where GpC 138 at E-box 3 is exposed. Together, this suggests that CLOCK:BMAL1 binds to GpC 138 and GpC 408 differently when engaged in a long-range interaction (Fig. 3e). At ZT18 (when CLOCK:BMAL1 DNA binding is minimal) and in BMKO mice, the proportion of protection at E-boxes 2/3 was equivalent or lower when E-box 4 was protected. Thus, CLOCK:BMAL1 cooperative binding between E-boxes 2/3/4 was only enriched during the day, and may be mediated by proteins that specifically interact with CLOCK:BMAL1 at the time of increased DNA binding affinity (Fig. 3f).
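
The conditional proportions discussed in this paragraph amount to stratifying reads by their protection pattern at one set of GpCs and then tallying patterns at another set. A minimal sketch follows; the column indices are positions within the binary read string (not the GpC bp labels used in the text), and which E-box 2/3 patterns count as "bound" is an assumption to be set by the user. The function names are hypothetical.

```python
def pattern_at(read, columns):
    """Protection pattern of one read at the given GpC columns, e.g. '011'."""
    return "".join(read[c] for c in columns)


def conditional_fraction(reads, given_cols, given_pattern, target_cols, target_patterns):
    """Among reads showing `given_pattern` at `given_cols`, the fraction whose
    pattern at `target_cols` belongs to `target_patterns`."""
    subset = [r for r in reads if pattern_at(r, given_cols) == given_pattern]
    if not subset:
        return float("nan")
    hits = sum(pattern_at(r, target_cols) in target_patterns for r in subset)
    return hits / len(subset)
```

For instance, with hypothetical columns `(0, 1, 2)` for the E-box 2/3 GpCs and `(3, 4, 5)` for the E-box 4 GpCs, calling `conditional_fraction(reads, (3, 4, 5), "111", (0, 1, 2), {"110", "111"})` and repeating the call with `"000"` as the given pattern yields the two proportions being compared.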

CLOCK:BMAL1 promotes rhythmic chromatin opening and cooperation with other TFs

We next sought to determine if CLOCK:BMAL1 DNA binding affects chromatin states at CREs. A recent study showed that CLOCK:BMAL1 binds to the entry/exit site of nucleosomes, i.e., binds E-box(es) located at superhelix location (SHL) +/−5 to SHL +/−7 (ref. 47; Fig. 4a). To determine how prevalent CLOCK:BMAL1 nucleosome binding is, we used our SMF data at the 10 CREs bound by CLOCK:BMAL1 and computationally identified nine nucleosomes having an E-box located within 20 bp of the SMF nucleosome edge (Fig. 4b,c, S8; see methods for details). Remarkably, SMF nucleosome signal for seven out of the nine nucleosomes was lower in WT ZT06, when CLOCK:BMAL1 DNA binding is maximal (one-way ANOVA; 4 nucleosomes with p < 0.05; 3 other nucleosomes with p < 0.1). Moreover, analysis of the levels of exposed DNA revealed that chromatin at many CLOCK:BMAL1 CREs was more open in WT ZT06 than in WT ZT18 and BMKO mice (Fig. 4d, S9a). Importantly, this CLOCK:BMAL1-mediated chromatin opening was not prevalent at non-CLOCK:BMAL1 CREs (Fig. 4d, S9a). This therefore suggests that CLOCK:BMAL1 binding to nucleosome entry/exit sites occurs at many CREs, leads to nucleosome removal, and promotes an accessible chromatin environment.
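
The "E-box within 20 bp of the nucleosome edge" criterion can be expressed as an interval-overlap check. The sketch below is an illustration under assumptions, not the procedure from the methods: it treats the window as extending 20 bp on both sides of each edge, and the function name and coordinate convention (inclusive bp positions along the amplicon) are hypothetical.

```python
def ebox_near_nucleosome_edge(nuc_start, nuc_end, ebox_start, ebox_len=6, window=20):
    """True if any base of the E-box lies within `window` bp of either edge
    of a nucleosome spanning nuc_start..nuc_end (inclusive coordinates)."""
    ebox_end = ebox_start + ebox_len - 1
    for edge in (nuc_start, nuc_end):
        # overlap between [ebox_start, ebox_end] and [edge - window, edge + window]
        if ebox_start <= edge + window and ebox_end >= edge - window:
            return True
    return False
```

As a usage example, a nucleosome called at 130–262 with a 6 bp motif starting at 270 passes the check, whereas a motif starting at 190, near the dyad, does not.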

Fig. 4. Many nucleosomes with an E-box at the entry/exit site are removed upon CLOCK:BMAL1 binding.

(a) Schematic of CLOCK:BMAL1 preferential binding to nucleosome entry/exit site (left) and approach used to uncover E-box(es) at nucleosome entry/exit site. SHL = superhelix location. (b) Percentage of GpC protection at Inmt TSS upstream enhancer, chromatin state C9 (see Fig. 1c), showing a nucleosome (purple region; GpCs 130–262) with an E-box (red lines) at the entry/exit site. Shaded areas in green, brown, and blue represent the S.E.M. of 3 biological replicates. (c) Percentage of reads in clusters identified as having a nucleosome with E-box(es) at the entry/exit site for each group. Error bars correspond to the S.E.M. of 3 biological replicates. One-way ANOVA; * p<0.1; ** p<0.05. Clusters with nucleosomes at similar locations were combined. (d) Percentage of exposed DNA at CREs in WT ZT06, WT ZT18, and BMKO ZT06. CREs were separated based on CLOCK:BMAL1 binding by ChIP-seq. One-way ANOVA; p<0.05, red; p>0.05, grey.

Does the CLOCK:BMAL1-mediated increase in DNA accessibility enable other TFs to bind DNA and regulate transcription? At Inmt TSS upstream enhancer and Por distal enhancer, we observed that nucleosomes moved gradually (shifted to the right on heatmaps), thereby exposing more chromatin (Fig. S4a, S9b). As chromatin became more accessible, other TFs including members of the RFX family were able to bind DNA (Fig. 5a, S9c). This included a protection event at GpC 262 in the Inmt TSS upstream enhancer, which showed increased protection in WT ZT06 compared to the other two groups (9.9% of the reads were protected in WT ZT06 vs. 3.1% and 4.0% in WT ZT18 and BMKO, respectively; Fig. 5b). Importantly, this increased protection at GpC 262 was primarily caused by CLOCK:BMAL1-mediated opening of the chromatin, since GpC 262 protection was not different between the three groups when considering only reads where GpC 262 is exposed (i.e., intrinsic capability of a TF to bind naked DNA; Fig. 5c). Our results also showed that co-binding between CLOCK:BMAL1 and the TF protection event at GpC 262 was significantly higher in WT ZT06 compared to the other two groups, which coincides with the peak of Inmt transcription (Fig. 5e,f). A similar result was found at Por distal enhancer, where the percentage of protection at an RFX5 binding motif was higher in WT groups than in BMKO, but where no differences were observed when considering exposed reads (Fig. S9c,d). Overall, our data therefore suggest that CLOCK:BMAL1 promotes rhythms of chromatin accessibility that facilitate the binding of other TFs, and that this cooperation between CLOCK:BMAL1 and other TFs coincides with transcription activation.

Fig. 5. CLOCK:BMAL1-mediated chromatin opening facilitates the nearby binding of other TFs.

(a) TF motif analysis around GpC 262 at Inmt TSS upstream enhancer performed using Tomtom. (b) Percentage of protection at GpC 262 at Inmt TSS upstream enhancer, calculated using all reads. Error bars correspond to the S.E.M. of 3 biological replicates. (c) Percentage of protection at GpC 262 at Inmt TSS upstream enhancer, calculated using the number of reads where GpC 262 is exposed. One-way ANOVA was performed between the three groups. (d) Normalized extent of co-binding between CLOCK:BMAL1 and the candidate TF at GpC 262 of Inmt TSS upstream enhancer. Each dot represents the value of each sample, and the error bars correspond to the S.E.M. of 3 biological replicates. (e) Mouse liver Inmt pre-mRNA level across the 24-hour day, from public mouse liver total RNA-Seq datasets (GSE73554) from Atger et al., 2015 [62]. Black and blue lines represent nighttime-fed WT and BMKO mice, respectively.

Discussion

By applying SMF to mouse liver in vivo, we demonstrate that CREs adopt a defined set of chromatin conformations that reflect stereotypic organization of nucleosomes and transcription factors. These chromatin states are remarkably conserved between individuals, strongly suggesting that they are genetically encoded, i.e., the DNA sequence at CREs limits the position of nucleosomes to specific locations, instructs where TFs bind to remodel chromatin, or both. Carrying out SMF at CREs targeted by the pioneer-like circadian TF CLOCK:BMAL1 revealed that changes in CLOCK:BMAL1 DNA affinity did not generate activity-specific chromatin states, but rather increased the representation of states with more open chromatin. Given the bursting nature of transcription [36–38], it is tempting to speculate that these different conformations represent biologically relevant chromatin states that correspond to various stages of transcription regulation and coexist across single cells.

CLOCK:BMAL1 binds DNA rhythmically across tissues, yet the extent to which other bHLH TFs bind the same E-boxes at times of low CLOCK:BMAL1 DNA binding affinity remains largely unknown. A few TFs including USF1 [48], DEC1/2 (aka BHLHE40/41) [49], and MYC [50] are known to compete with CLOCK:BMAL1 for E-box access, but competition may be more widespread since the mammalian bHLH TF family consists of over 100 members [51,52]. So far, this question has been technically difficult to address, especially in vivo where DNA is chromatinized. Using SMF at 10 CREs targeted by CLOCK:BMAL1, we found that levels of E-box protection varied between CREs. At four CREs, E-box protection was minimal in BMKO mice and mirrored CLOCK:BMAL1 ChIP-Seq signal, indicating that CLOCK:BMAL1 is the only bHLH TF to target these E-boxes. Levels of protection at other E-boxes were more even between groups, including at an intergenic enhancer exhibiting very high protection even in BMKO mice. Given the overrepresentation of open chromatin states at that enhancer when CLOCK:BMAL1 ChIP-Seq signal is maximal, this suggests that CLOCK:BMAL1 binds DNA during the day and that other bHLH TFs bind at night. More generally, our data demonstrate that some (but not all) CLOCK:BMAL1-bound E-boxes are targeted by other TFs. Interestingly, changes in these TFs’ activity could alter how they compete with CLOCK:BMAL1, and thus affect CLOCK:BMAL1 DNA binding without altering its DNA binding affinity. Extending SMF to more CREs targeted by CLOCK:BMAL1 should help address this possibility.

SMF also provided valuable insights into the cooperation between CLOCK:BMAL1 molecules at CREs having multiple E-boxes. At Nr1d1 1st intron, CLOCK:BMAL1 did not co-bind the two E-boxes of its preferred dual E-box motif, but rather engaged in co-binding events at E-boxes separated by 9 or more bp. Interestingly, a long-range interaction of 274 bp was observed only in WT mice at ZT06, but not at ZT18 or in BMKO mice, and was associated with differences in E-box protection that imply structural changes in the CLOCK:BMAL1 interface with DNA. Given the over 50-fold difference in Nr1d1 transcription between ZT06 and ZT18 [53], and the low-level protection at E-boxes 4/5 at ZT18, this indicates that Nr1d1 transcription is mostly initiated when CLOCK:BMAL1 is co-bound to E-boxes 2/3/4/5. To some extent, these data about cooperation between CLOCK:BMAL1 molecules also support recent findings showing that CLOCK:BMAL1-mediated transcription does not just rely on its binding to a single E-box, but rather on its cooperation with other TFs [27–29].

CLOCK:BMAL1 was recently reported to bind the entry/exit site of nucleosomes and to interact with the H2A-H2B acidic patch [47], a mechanism that would explain its pioneer-like activity [40]. Remarkably, we found 9 nucleosomes (out of the 10 CREs we analyzed in this study) harboring an E-box at their entry/exit sites, with 7 of them showing decreased nucleosome signal in WT ZT06 compared to WT ZT18 or BMKO mice. CLOCK:BMAL1 binding at the entry/exit sites of nucleosomes is thus prevalent in the mouse genome, and it importantly promotes the removal and/or sliding of nucleosomes. This nucleosome remodeling is likely achieved by CLOCK:BMAL1-mediated recruitment of chromatin remodelers/modifiers [54], and may also involve CLOCK histone acetyltransferase activity [55]. By generating a more open chromatin landscape, CLOCK:BMAL1 increases the capability of other TFs that preferentially bind exposed DNA to access their binding sites. We found that this nucleosome-mediated TF cooperation mechanism applies to at least two protection events that contain a binding motif for the RFX TF family. Increased SMF resolution, which is currently limited to GpC dinucleotides, is likely to uncover additional TFs whose binding is increased by CLOCK:BMAL1-mediated chromatin opening.

In summary, our in vivo SMF study in mouse liver provides a mechanistic understanding of how CLOCK:BMAL1 DNA binding at CREs evicts/moves nucleosomes and increases DNA accessibility for other TFs that preferentially bind naked DNA to regulate gene expression across the 24-hour day. Our analysis pipeline also provides a framework for using SMF in mechanistic studies interrogating how TFs and other chromatin-associated proteins remodel the chromatin environment at CREs to favor the overrepresentation of chromatin conformations that are permissive to transcription.

Methods

Mice

Experiments with mice were approved by the Texas A&M University Institutional Animal Care and Use Committee. Adult male wild type (WT; C57BL/6NCrl strain) and Bmal1−/− mice (BMKO; backcrossed to C57BL/6NCrl background for a minimum of 8 generations) were housed under 12 h light:12 h dark, and provided food and water ad libitum. Mice were euthanized by isoflurane anesthesia followed by decapitation in the middle of the day (ZT06; WT and BMKO) or night (ZT18; WT only), with three biological replicates per group. Livers were dissected quickly after euthanasia, briefly washed in ice-cold 1X PBS, snap-frozen in liquid nitrogen, and stored at −80°C until nuclei purification.

Single molecule footprinting

The SMF protocol was adapted from Sönmezer et al., 2021 [6] for mouse liver as described in Michael et al., 2023 [47]. Briefly, nuclei from nine samples (WT ZT06, WT ZT18 and BMKO ZT06; 3 replicates of each) were purified as described in Menet et al., 2012 [53], and 250,000 nuclei were used for each methylation reaction. Nuclei were washed (50 mM Tris pH 8.5, 50 mM NaCl, 10 mM DTT) and resuspended in 1 ml M.CviPI reaction buffer (50 mM Tris pH 8.5, 50 mM NaCl, 300 mM sucrose, 10 mM DTT), to which 18.75 µl of 32 mM SAM and 200 U of M.CviPI were then added. After 7.5 min incubation in a 37°C water bath, the reaction was supplemented with 100 U of M.CviPI and 128 µmol of SAM for another round of 7.5 min incubation at 37°C. Then, 350 µl of SDS-containing buffer (20 mM Tris, 600 mM NaCl, 1% SDS, 10 mM EDTA) and 20 µl of Proteinase K (20 mg/ml) were added to stop the reaction, and samples were incubated overnight at 55°C. Genomic DNA (gDNA) was purified by phenol-chloroform extraction, and either 2 µg of gDNA was used for bisulfite conversion using the Epitect bisulfite conversion kit (QIAGEN 59124) or 200 ng of gDNA was used for enzymatic conversion (NEB E7125). A total of 18 CREs were selected for this study, including 10 CLOCK:BMAL1-targeted CREs. The first 8 CREs were selected based on CLOCK and BMAL1 ChIP-seq peaks, and the remaining 10 CREs were selected because they are located in two genes (Por and Inmt) studied with the original 8 CREs. A minimum of 10–12 ng of converted DNA was used to amplify each of the 18 CREs as in Sönmezer et al., 2021 [6]. Each PCR product was individually purified with 1.5x SPRI beads, and PCR products from the same liver sample were mixed and used to generate sequencing libraries with the NEBNext Ultra II Kit (amplification for 12 cycles). Libraries from the nine biological samples were pooled together and sequenced with a MiSeq v2 Nano Reagent kit or MiSeq v2 Reagent kit (paired-end 250 bp).

Sequence alignment

The Python package and parameters used for alignment were described in Michael et al., 2023 [47], and are provided as supplementary file 1. Briefly, the PairwiseAligner class from the Bio.Align Python package was used to align reads from fastq files, with match, mismatch, and gap scores set to 1.0, −0.2, and −0.5, respectively. A normalized alignment score was calculated as the alignment score divided by the length of the query sequence, giving a maximum score of 1.0. First, a pre-selection was performed to select reads that have an alignment score of 0.8 when aligning both forward and reverse primer sequences to the first ~25 nt of the reads in the paired-end fastq files. The full sequences of pre-selected paired-end reads were then aligned to bisulfite-converted target CRE sequences (HCH replaced by HTH, GC replaced by GY, CG replaced by YG, with Y = pyrimidine and H = not-G). Based on the alignment results at each nucleotide, the full-length query sequence was then reconstituted for read pairs in which both mates had alignment scores higher than 0.7, and the nucleotide with the higher quality score was used at each position where the two mates overlapped. Reads in which the cytosine of a GCH position was not sequenced as GC or GT were removed, along with PCR duplicates.
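The in silico conversion of the target CRE sequences described above (HCH to HTH, GC to GY, CG to YG) can be illustrated with a minimal sketch. This is not the script of supplementary file 1: the function name `convert_template` is hypothetical, only the top strand is handled, and treating a cytosine at the very start or end of the sequence as if its missing neighbor were "not G" is an assumption made here.

```python
def convert_template(seq: str) -> str:
    """Return a converted reference for the top strand of `seq` (sketch).

    Cytosines in a GpC or CpG context become Y (C or T, methylation
    unknown); all other cytosines (HCH) are assumed converted to T.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base != "C":
            out.append(base)
            continue
        # Assumption: a missing neighbor at the sequence edge counts as "not G".
        prev_is_g = i > 0 and seq[i - 1] == "G"
        next_is_g = i + 1 < len(seq) and seq[i + 1] == "G"
        out.append("Y" if (prev_is_g or next_is_g) else "T")
    return "".join(out)


print(convert_template("AGCTACATCGA"))
```

In this example the first cytosine follows a G (GpC), the second is in an HCH context, and the third precedes a G (CpG).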

UMI barcode

Amplification of the 10 CREs in Por and Inmt genes (see single molecule footprinting section above) was performed using primers having UMI barcodes added in both forward and reverse primers. To avoid biased incorporation of UMIs mapping perfectly to the amplicon, UMI barcodes contained 4 mismatched nucleotides directly upstream of the priming sequence and 4 random nucleotides (N) in 5’. The four mismatched nucleotides were designed to avoid annealing with the template nucleotide, e.g., if the template was an adenine (A), then a letter B (= any nucleotide other than an A, i.e., C, G, or T) was incorporated in the UMI barcode. If the template had a GC or CG, then a letter R (= A or G) was incorporated in the UMI. Because of the 250 bp sequencing length limitation for paired-end reads and our goal of sequencing GpCs in the middle of the amplicon, Por CRE8 has 6 N plus 4 mismatches in the forward primer, Por CRE7 and CRE10 have 6 N plus 5 mismatches in the forward primer, and none of the reverse primers for these 3 CREs had a UMI. Sequences of primers are provided in table S1, and details about which CREs were sequenced in each fastq file are provided in table S2.

Sequence alignment of CREs amplified with UMI-containing primers was performed as above with slight modifications. First, reads were called unique if they had a unique barcode in both forward and reverse primers. Then, if several reads had the same UMI combination, the read with the highest total quality score was kept. This score was calculated by adding the quality score (QC) at each position if QC ≥ 15, which is similar to samtools (as in DuplicateScoringStrategy class in samtools open-source code) [56]. A random read was chosen if multiple reads had the same maximum total quality score.
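The UMI-based duplicate removal described above can be sketched as follows. The read representation (dictionaries with `umi_fwd`, `umi_rev`, and `quals` keys), the function names, and the fixed random seed are all assumptions made for illustration; they are not taken from the authors' scripts.

```python
import random
from collections import defaultdict


def total_quality(quals, min_q=15):
    """Sum of per-base quality scores, counting only bases with Q >= min_q."""
    return sum(q for q in quals if q >= min_q)


def deduplicate(reads, seed=0):
    """Keep one read per (forward UMI, reverse UMI) combination.

    The read with the highest total quality is kept; ties are broken
    at random, as described in the text.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for read in reads:
        groups[(read["umi_fwd"], read["umi_rev"])].append(read)
    kept = []
    for group in groups.values():
        best = max(total_quality(r["quals"]) for r in group)
        candidates = [r for r in group if total_quality(r["quals"]) == best]
        kept.append(rng.choice(candidates))
    return kept
```

Grouping on the pair of UMIs rather than on each UMI separately mirrors the requirement that a read be unique in both forward and reverse barcodes.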

BMD Clustering

Only reads that successfully passed the sequence alignment steps were used for downstream analysis. First, the sample with the lowest number of reads among the nine samples was identified for each CRE, and its number of reads was used to downsample by random selection the number of reads in the other eight samples such that all nine samples contained an equal number of reads for each CRE in subsequent steps. Conversion information was extracted at each GCH position (except for GCH positions in the PCR primers) for each read, using 0 or 1 to represent unprotected (sequenced as GC) or protected (sequenced as GT), respectively. In the Nr1d1 1st intron, one GpC in the middle of the amplicon was not covered by the sequencing of either paired-end read and was thus removed from analysis. Binary Matrix Decomposition (BMD) clustering was applied on randomly selected reads from all nine samples for each CRE [32]. The generalBMD function from the bmdcluster Python package was used, with the parameter b (number of points to initialize bootstrap seed points) set to 1% of the total number of reads, and the parameter seed (seed of the pseudorandom number generator) set to 1. The final cluster number was defined by increasing the number of clusters stepwise from 2 until a cluster contained less than 1% of all reads, and was set such that no cluster contained less than 1% of the total number of reads. Once the number of clusters was set, reads were parsed based on their respective cluster and experimental group (WT ZT06, WT ZT18 and BMKO ZT06).
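The downsampling and binary encoding steps that precede BMD clustering can be sketched as below. The function names, the dictionary layout, and the convention that `gch_positions` holds the 0-based index of each GCH cytosine are assumptions made for this illustration; the clustering itself is not reproduced here.

```python
import random


def downsample(reads_by_sample, seed=1):
    """Randomly subsample every sample to the size of the smallest one."""
    rng = random.Random(seed)
    n = min(len(reads) for reads in reads_by_sample.values())
    return {s: rng.sample(reads, n) for s, reads in reads_by_sample.items()}


def encode_read(read_seq, gch_positions):
    """Encode a read as 0 (unprotected, C) or 1 (protected, T) per GCH."""
    encoded = []
    for pos in gch_positions:
        base = read_seq[pos]
        if base == "C":
            encoded.append(0)
        elif base == "T":
            encoded.append(1)
        else:
            # Such reads are expected to have been removed during alignment.
            raise ValueError(f"unexpected base {base!r} at GCH position {pos}")
    return encoded
```

Each encoded read then forms one row of the binary matrix passed to the clustering step.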

Computational definition of nucleosome protection

A custom python function named defineRegionCombinedSamples (provided as supplementary file 2) was generated to automatically define the protection profile as either nucleosome or exposed DNA (bound or not by a TF) for each cluster. A figure illustrating the criteria used in this study and described below is provided as Fig. S10. First, if the percentage of protection at three or more consecutive GpCs was larger than 60% (each of them), or if the percentage of protection at four or more consecutive GpCs was larger than 50% (each of them), then the position of the first and last GpCs of this putative nucleosome was determined. If the distance between the first and last GpCs was over 75 bp, or if the distance between the first and last GpCs was between 60 bp and 75 bp but the distance between the GpC before the first and the GpC after the last was over 120 bp, the footprint was defined as nucleosome protection.
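As a rough illustration of the first set of criteria above (and not a reproduction of defineRegionCombinedSamples), the sketch below scans the per-GpC protection percentages of one cluster. The function names are hypothetical, and several details are assumptions: the two thresholds are evaluated independently on maximal runs, the 60–75 bp range is treated as inclusive, and a footprint lacking a flanking GpC on either side never satisfies the 120 bp rule.

```python
def runs_above(values, threshold, min_len):
    """Yield (first, last) indices of maximal runs with values > threshold."""
    start = None
    # A sentinel below any threshold closes a run reaching the last GpC.
    for i, v in enumerate(list(values) + [float("-inf")]):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            if i - start >= min_len:
                yield start, i - 1
            start = None


def call_internal_nucleosomes(positions, protection):
    """Return (first, last) GpC index pairs called as nucleosome footprints.

    positions: GpC coordinates in bp; protection: % protected reads per GpC.
    """
    calls = set()
    for threshold, min_len in ((60.0, 3), (50.0, 4)):
        for first, last in runs_above(protection, threshold, min_len):
            span = positions[last] - positions[first]
            has_flanks = first > 0 and last < len(positions) - 1
            flank_span = (
                positions[last + 1] - positions[first - 1] if has_flanks else 0
            )
            if span > 75 or (60 <= span <= 75 and flank_span > 120):
                calls.add((first, last))
    return sorted(calls)
```

Footprints truncated at the amplicon ends are deliberately left to the additional criteria described in the following paragraphs.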

Because nucleosome protection signal may be truncated at the 5’ or 3’ end of each CRE/amplicon, we applied two other criteria for this specific situation. For the first criterion, a footprint was defined as nucleosome if the percentage of protection was larger than 50% at two or more consecutive GpCs starting from the first or last GpC of the PCR amplicon (first 5’ GpC or last 3’ GpC), and if the distance between the first and last GpC of this footprint was over 20 bp (many TFs have a footprint < 20 bp [14]). The second criterion looked for patterns having continuous protection on the same DNA molecule starting from the first or last GpC of the PCR amplicon (first 5’ GpC or last 3’ GpC). In the 5’ to 3’ direction, if the protection of the first GpC was over 50% and the protection of the next GpC was over 40%, we calculated whether the two protection events occurred on the same DNA molecule. This measure was called “continuity value” (e.g., in the script provided in supplementary file 2), and is defined as the number of reads being protected at two consecutive GpCs on the same DNA molecule divided by the number of protected reads at the first of the two GpCs. If the resulting “continuity value” was over 70%, the process was repeated for the next GpC(s) until the value was less than 70% or the GpC percentage of protection was lower than 40%. Finally, we required for this second criterion that the distance between the first and last protected GpCs be over 15 bp for the footprint to be defined as nucleosome. The same procedure was carried out starting from the 3’ end of the CRE amplicon, with criteria applied in the 3’ to 5’ direction.
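The “continuity value” defined above can be written as a short function. This is a minimal sketch: the name `continuity_value`, the representation of reads as lists of 0/1 values, and returning 0.0 when no read is protected at the first GpC are assumptions made here.

```python
def continuity_value(reads, i, j):
    """Fraction of reads protected at GpC i that are also protected at GpC j.

    reads: iterable of binary sequences (1 = protected, 0 = unprotected).
    i, j: column indices of two consecutive GpCs, i being the first one.
    """
    protected_first = [r for r in reads if r[i] == 1]
    if not protected_first:
        return 0.0  # assumption: undefined ratio reported as 0
    both = sum(1 for r in protected_first if r[j] == 1)
    return both / len(protected_first)


reads = [[1, 1], [1, 1], [1, 0], [0, 1]]
print(continuity_value(reads, 0, 1))
```

In this toy example, three reads are protected at the first GpC and two of them are also protected at the second, so the footprint would not pass the 70% threshold.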

An additional analysis was then carried out to define nucleosomes, and relied on the continuity of protection events on the same DNA molecules as described above with what we called the “continuity value”. Starting from the 5’ end of the CRE/amplicon, the algorithm identified GpC(s) having a percentage of protection over 40% and not defined as nucleosome. If the percentage of protection of the next GpC (in 3’) was over 30%, then the algorithm calculated the continuity value between the two GpCs, and continued computing this value with the next GpCs in 3’ until the continuity value was lower than 70% or the percentage of protection of the next GpC was lower than 30%. If 5 consecutive GpCs matched those criteria and the distance between the first GpC and the last GpC was over 75 bp, this footprint was defined as a nucleosome. The algorithm was also run in reverse orientation, using criteria starting from the 3’ end of the amplicon and running toward the 5’ end.

After defining nucleosomes using all criteria described above, the algorithm checked the undefined GpCs that were next to the nucleosomal GpCs. The aim of this step was to avoid calling signals that were likely nucleosomal as TF signals. At the GpC adjacent to the first GpC or last GpC of a nucleosome, if the percentage of protection was over 1%, the algorithm calculated the continuity value. If this value was higher than 70%, the footprint at this GpC was defined as nucleosome. The algorithm then continued to the next adjacent GpCs until the continuity value was lower than 70% or the percentage of protection at the GpC was less than 1%.

These criteria were applied throughout our analyses, except in cluster 14 at Nr1d1 1st intron for the calculation of TF cooperativity (Fig. 3d) and of levels of exposed DNA at CREs (Fig. 4d, S9a), where the definition at four GpCs between positions 424 and 441 was curated to exposed DNA based on visual inspection.

Analysis of protection profiles by simple matching coefficient

A total of 1,692 reads for each sample were randomly selected for the CRE at Inmt TSS upstream enhancer (number of reads corresponding to half of the reads of the sample with the lowest number of reads). Simple matching coefficient was calculated between each pair of reads using the formula below:

$$\text{Simple matching coefficient (SMC)} = \frac{M_{00} + M_{11}}{M_{00} + M_{11} + M_{10} + M_{01}}$$

M00: the total number of GpCs where read A and read B both have value 0.

M11: the total number of GpCs where read A and read B both have value 1.

M10: the total number of GpCs where read A has value 1 and read B has value 0.

M01: the total number of GpCs where read A has value 0 and read B has value 1.
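For two binary reads of equal length, the SMC reduces to the fraction of GpCs at which the two reads agree, since M00 + M11 counts the agreeing positions and the denominator counts all positions. A minimal sketch follows; the function name and the 0/1 list representation of reads are assumptions.

```python
def simple_matching_coefficient(read_a, read_b):
    """SMC between two binary reads: (M00 + M11) / number of GpCs."""
    if len(read_a) != len(read_b):
        raise ValueError("reads must cover the same GpCs")
    if not read_a:
        raise ValueError("reads must contain at least one GpC")
    matches = sum(1 for a, b in zip(read_a, read_b) if a == b)
    return matches / len(read_a)


print(simple_matching_coefficient([0, 1, 1, 0], [0, 1, 0, 1]))
```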

Hierarchical clustering was then performed on the matrix of SMC between each pair of reads (total of 15,228 reads and over 115 million pairwise comparisons), using the hierarchy module from the scipy.cluster Python package (method and metric parameters set to ‘average’ and ‘euclidean’, respectively). The heatmap illustrating the output of the hierarchical clustering of SMC was plotted with an annotation indicating the sample of origin of each read. The percentage of reads from each sample was calculated for the first ten branches of the dendrogram.

Analysis of protection profiles by PCA

A binary matrix with reads represented as vectors of binary values at GpCs was computed using reads at Inmt TSS upstream enhancer originating from all nine samples (each GpC corresponding to a column, and each read corresponding to a row). PCA was performed on this binary matrix using the PCA class from the sklearn.decomposition Python package, with the number of components set to the number of GCH positions within the CRE. The output of the PCA analysis was then parsed for each sample, with PC1 plotted over PC2.

BMD Clustering of individual samples

BMD clustering was performed for each individual sample with reads mapping to the Inmt TSS upstream enhancer. Nucleosome footprints were defined for every cluster of each sample using the custom-made function defineRegionCombinedSamples (see above). Outputs of this cluster definition (as either nucleosome or exposed DNA) were merged for all clusters of all nine samples and used to carry out hierarchical clustering. Five major hierarchical clusters were used to group the BMD clustering-based definition of chromatin states.

Conversion efficiency

Conversion efficiency was computed on reads where cytosines at GCH positions were sequenced as either GC or GT (but not GN, GA, or GG), after removal of PCR duplicates. Bisulfite or enzymatic conversion efficiency was computed by calculating the percentage of cytosines at HCH positions sequenced as A, T, G, C, or N for each CRE and for each sample, i.e., by dividing the number of A, T, G, C, or N calls by the total number of HCH positions in all reads of each CRE per sample, as shown in Fig. S2.

Conservation of chromatin states

A biological replicate was considered represented in a cluster/chromatin state if at least 1% of the reads of the cluster were assigned to that sample. This criterion was used to calculate the number of biological samples found in each cluster at each CRE, as reported in Fig. 1d and S3a,b.
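The 1% representation criterion above can be sketched as follows. The function name and the input layout (a mapping from sample name to the number of reads of that sample assigned to the cluster) are assumptions made for illustration.

```python
def samples_represented(cluster_counts, min_fraction=0.01):
    """Return the samples contributing at least `min_fraction` of a cluster.

    cluster_counts: dict mapping sample name -> number of reads from that
    sample assigned to the cluster.
    """
    total = sum(cluster_counts.values())
    if total == 0:
        return []
    return [s for s, n in cluster_counts.items() if n / total >= min_fraction]


counts = {"WT_ZT06_rep1": 600, "WT_ZT18_rep1": 395, "BMKO_rep1": 5}
print(len(samples_represented(counts)))
```

Here the third sample contributes only 0.5% of the reads of the cluster and is therefore not counted as represented.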

DNase-seq and ChIP-seq dataset analysis

Mouse liver DNase-seq (GSE37074 [57]), and liver ChIP-seq datasets for BMAL1 (GSE39860 [31] and GSE110602 [28]), CLOCK (GSE39860 [31] and GSE117488 [58]), USF1 (GSE44609 [48]), TFE3 (GSE160292 [59]), MYC (GSE76042 [60]), and bHLH40 (GSE207199 [61]) were downloaded as fastq files and mapped to mouse genome version mm39 using bowtie2 (if read length ≥ 51 bp) or bowtie (if read length ≤ 50 bp). Duplicated reads were removed, and bw files were generated for visualization by normalizing the number of uniquely mapped reads to 10,000,000 reads.

Motif analysis

TF motif analysis was performed using DNA sequences of ~20 bp centered on the GpC(s) exhibiting TF protection, using the online Tomtom motif comparison tool (version 5.5.5).

Calculation of exposed DNA level at CREs

BMD clustering was performed for each CRE with reads from all nine samples downsampled to the size of the sample with the lowest number of reads. Then, the protection profile was defined at each GpC in each cluster of each CRE using the custom-made function defineRegionCombinedSamples (see computational definition of nucleosome protection section above), with each GpC ending up defined as either nucleosome (N) or exposed (E). This step located the two GpCs at the edge of nucleosomes for every cluster (GpCs defined as N). To improve the definition of nucleosome coverage and avoid an overestimation of exposed DNA at CREs, we then retrieved the position of the GpC before the first and after the last GpCs of a nucleosome (these GpCs being defined as E), and extended the nucleosome position to the midpoint between each pair of adjacent N and E GpCs. Finally, nucleosome occupancy in each cluster was calculated by dividing the length of DNA protected by nucleosomes in a cluster by the total length of the CRE. The nucleosome occupancy of each cluster was multiplied by the fraction of reads in this cluster, and the resulting values were summed over all clusters of a CRE to give the total nucleosome level of the CRE. The exposed DNA level of a CRE was finally calculated by subtracting the total nucleosome level of the CRE from 1. Formulas are provided below:

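$$\text{Nucleosome occupancy of a cluster } (NO) = \frac{\text{total nucleosome length in a cluster}}{\text{CRE length}}$$
$$\text{Cluster ratio } (R) = \frac{\text{number of reads within cluster}}{\text{CRE total reads}}$$
$$\text{Nucleosome level} = \sum_{i=1}^{n} R_i \times NO_i$$
$$n: \text{number of clusters at a CRE}$$
$$\text{Exposed level} = 1 - \text{Nucleosome level}$$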
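The calculation of the exposed DNA level can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, each GpC is labeled "N" or "E" as in the text, and a nucleosome reaching the first or last GpC of the amplicon is assumed to end at that GpC because no flanking exposed GpC is available for the midpoint extension.

```python
def nucleosome_length(positions, labels):
    """Total bp covered by nucleosomes in one cluster.

    Each run of 'N' GpCs is extended to the midpoint between its edge
    GpC and the neighboring 'E' GpC, when such a neighbor exists.
    """
    total = 0.0
    i, n = 0, len(labels)
    while i < n:
        if labels[i] != "N":
            i += 1
            continue
        j = i
        while j + 1 < n and labels[j + 1] == "N":
            j += 1
        left = (positions[i - 1] + positions[i]) / 2 if i > 0 else positions[i]
        right = (positions[j] + positions[j + 1]) / 2 if j + 1 < n else positions[j]
        total += right - left
        i = j + 1
    return total


def exposed_dna_level(clusters, cre_length):
    """clusters: list of (number_of_reads, nucleosome_length_bp) tuples."""
    total_reads = sum(reads for reads, _ in clusters)
    nucleosome_level = sum(
        (reads / total_reads) * (length / cre_length) for reads, length in clusters
    )
    return 1.0 - nucleosome_level
```

For example, a CRE of 300 bp with one cluster holding 60% of the reads and 150 bp of nucleosome coverage, and a second fully exposed cluster, has a nucleosome level of 0.6 × 0.5 = 0.3 and an exposed level of 0.7.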

Cooperativity between TFs

Calculation of the normalized extent of co-binding (N. EOC) between TFs at two GpCs was adapted from Rao et al., 2021 [5] with slight modifications, and carried out in clusters where neither GpC was defined as nucleosome, using the formulas:

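$$P_{TF_i} = \frac{tf_i}{NE}$$
$$P_{observed} = \frac{cobound}{NE}$$
$$\text{Extent of co-binding } (EOC) = \frac{P_{observed}}{P_{expected}} = \frac{P_{observed}}{P_{TF_1} \times P_{TF_2}}$$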

Where:

tf: number of TF-protected reads in clusters where neither GpC1 nor GpC2 is defined as nucleosome.

NE: number of reads in clusters where neither GpC1 nor GpC2 is defined as nucleosome.

cobound: number of reads co-bound by TF1 and TF2 at GpC1 and GpC2.

Then the normalized extent of co-binding was calculated using the formulas:

$$\text{Normalization factor } (norm) = \frac{tf_1 + tf_2}{2 \times NE}$$
$$\text{Normalized extent of co-binding } (N.EOC) = norm \times EOC$$

P-values were calculated by Fisher’s exact test (null hypothesis: protection events at the two GpCs are independent), followed by Benjamini/Hochberg correction to adjust the p-values for multiple testing, using the fdr_correction function from the mne.stats Python package, with the false discovery rate set to 0.05.
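The co-binding calculation can be sketched from the read counts defined above. The function names are hypothetical, and combining the normalization factor with the EOC by multiplication is an assumption of this sketch. The statistical testing step is not reproduced here.

```python
def extent_of_cobinding(tf1, tf2, cobound, ne):
    """EOC: observed co-binding frequency over the frequency expected
    if the two protection events were independent.

    tf1, tf2: reads protected at GpC1 and GpC2; cobound: reads protected
    at both; ne: reads in clusters where neither GpC is nucleosomal.
    """
    p_tf1 = tf1 / ne
    p_tf2 = tf2 / ne
    p_observed = cobound / ne
    return p_observed / (p_tf1 * p_tf2)


def normalized_eoc(tf1, tf2, cobound, ne):
    """N.EOC, assuming the normalization factor multiplies the EOC."""
    norm = (tf1 + tf2) / (2 * ne)
    return norm * extent_of_cobinding(tf1, tf2, cobound, ne)


print(extent_of_cobinding(20, 10, 5, 100))
```

With 20 and 10 protected reads out of 100 and 5 co-bound reads, the expected co-binding frequency under independence is 0.2 × 0.1 = 0.02, against an observed 0.05, so co-binding is 2.5-fold more frequent than expected.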

MNase-seq analysis

24 WT and 23 BMKO single-end mouse liver MNase-seq datasets from Menet et al., 2014 [40] (GSE47145) were downloaded as fastq files and mapped to the mouse genome version mm39 using bowtie2. Reads located +/− 1 kb from each CRE were selected using samtools, and extended to 147 bp to cover full-length nucleosomes. Sequences from all datasets were merged and sorted using bamtools and samtools, respectively. Finally, relative nucleosome signal was normalized to the total number of mapped reads from all 47 samples to generate bigwig (bw) files. A similar analysis was performed using sequence definition/position adjusted to cover 75 bp centered on the nucleosome dyad to better visualize nucleosome position.

Analysis of paired-end mouse liver low and high MNase-seq datasets (nuclei treated with low and high MNase concentration) from Iwafuchi-Doi et al., 2016 [44] (GSE57559) was performed as above with only slight differences. Selection of sequencing reads included an additional step, where only concordant paired reads with a size between 100 bp and 200 bp and a bam-file flag number of 99 or 163 were selected. The analysis was performed on four merged high MNase-seq datasets (2 replicates from C3H WT mice and 2 replicates from C57BL6 WT mice), and on four merged low MNase-seq datasets (2 replicates from C3H WT mice and 2 replicates from C57BL6 WT mice). The bw files were generated by normalizing MNase-Seq signal to the total number of mapped reads having a bam-file flag of 99 or 163 from the four high MNase-seq or low MNase-seq datasets.

Comparison between SMF and MNase-seq nucleosome signals

SMF nucleosome signal for each sample at each GpC was calculated by adding the percentage of reads from clusters that were defined as nucleosome at that GpC. The averaged value from all nine samples was then calculated for each GpC of each CRE. The MNase-seq nucleosome signal at each GpC was extracted from the bw file generated from the Menet et al., 2014 datasets as described above [40], using reads extended to 147 bp for all 47 datasets. The MNase-seq nucleosome signal and averaged SMF nucleosome signal were calculated at a total of 453 GpCs from 18 CREs, and used to compute the Pearson correlation.

Analysis of nucleosomes having an E-box at the entry/exit site

A custom python script was generated and applied to automatically select nucleosomes that had E-box(es) at the entry/exit site, followed by a limited curation based on visual inspection. Nucleosomes were initially selected if E-boxes were located 20 bp within a nucleosome as defined by our custom script defineRegionCombinedSamples (the edge of the nucleosome corresponding to a GpC location, see above), or 20 bp outside the nucleosome but without a GpC located between the E-box and the edge of the nucleosome. Then, some criteria were applied to further refine the selection. If the nucleosome location started at the first GpC in 5’ or 3’ of the CRE (i.e., first GpC amplified when carrying out the PCR of bisulfite converted DNA), and the E-box was also located on the 5’ or 3’ side, this nucleosome should protect over a range of 110 bp. Nucleosomes with protection over 200 bp were excluded (e.g., condensed chromatin) unless there was a dip in the percentage of protection between two observed nucleosomes. Nucleosomes representing less than 5% of all reads were excluded from downstream analysis.

Supplementary Material

Supplement 1
media-1.zip (32.1KB, zip)
Supplement 2

Acknowledgments

We thank members of the Menet lab for helpful discussions throughout the project; Alicia Michael, Paul Hardin, and Joseph Rodriguez for insightful suggestions and comments on the manuscript; and Ana Velasquez for her contribution at the early stage of this project. We also thank Texas A&M Institute for Genome Sciences and Society (TIGSS) for their help with sequencing SMF libraries. Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing. This work was supported by a NIH NIGMS grant (R01GM145737) and startup funds from Texas A&M University (JSM).

Footnotes

Conflict of interests

The authors have no competing interests to declare.

Code availability

The script used to align paired-end reads is provided as supplementary file 1, and the code used to computationally define nucleosome protection (defineRegionCombinedSamples) is provided as supplementary file 2.

Figure representation

Box-plot elements (Fig. 1d, 2d, 3a, S3d) were represented as follows: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, all data values.

Data availability

The accession number for the data reported in this paper is GSE255510.

References

  • 1.Junion G. et al. A transcription factor collective defines cardiac cell fate and reflects lineage history. Cell 148, 473–486, doi: 10.1016/j.cell.2012.01.030 (2012). [DOI] [PubMed] [Google Scholar]
  • 2.Siersbaek R. et al. Transcription factor cooperativity in early adipogenic hotspots and super-enhancers. Cell reports 7, 1443–1455, doi: 10.1016/j.celrep.2014.04.042 (2014). [DOI] [PubMed] [Google Scholar]
  • 3. Tijssen M. R. et al. Genome-wide analysis of simultaneous GATA1/2, RUNX1, FLI1, and SCL binding in megakaryocytes identifies hematopoietic regulators. Developmental cell 20, 597–609, doi: 10.1016/j.devcel.2011.04.008 (2011).
  • 4. He Q. et al. High conservation of transcription factor binding and evidence for combinatorial regulation across six Drosophila species. Nature genetics 43, 414–420, doi: 10.1038/ng.808 (2011).
  • 5. Rao S., Ahmad K. & Ramachandran S. Cooperative binding between distant transcription factors is a hallmark of active enhancers. Molecular cell 81, 1651–1665 e1654, doi: 10.1016/j.molcel.2021.02.014 (2021).
  • 6. Sonmezer C. et al. Molecular Co-occupancy Identifies Transcription Factor Binding Cooperativity In Vivo. Molecular cell 81, 255–267 e256, doi: 10.1016/j.molcel.2020.11.015 (2021).
  • 7. Lemon B. & Tjian R. Orchestrated response: a symphony of transcription factors for gene control. Genes & development 14, 2551–2569, doi: 10.1101/gad.831000 (2000).
  • 8. Kim S. & Wysocka J. Deciphering the multi-scale, quantitative cis-regulatory code. Molecular cell 83, 373–392, doi: 10.1016/j.molcel.2022.12.032 (2023).
  • 9. Jolma A. et al. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature 527, 384–388, doi: 10.1038/nature15518 (2015).
  • 10. de Boer C. G. & Taipale J. Hold out the genome: a roadmap to solving the cis-regulatory code. Nature 625, 41–50, doi: 10.1038/s41586-023-06661-w (2024).
  • 11. Ibarra I. L. et al. Mechanistic insights into transcription factor cooperativity and its impact on protein-phenotype interactions. Nature communications 11, 124, doi: 10.1038/s41467-019-13888-7 (2020).
  • 12. Mirny L. A. Nucleosome-mediated cooperativity between transcription factors. Proceedings of the National Academy of Sciences of the United States of America 107, 22534–22539, doi: 10.1073/pnas.0913805107 (2010).
  • 13. Zhu F. et al. The interaction landscape between transcription factors and the nucleosome. Nature 562, 76–81, doi: 10.1038/s41586-018-0549-5 (2018).
  • 14. Vierstra J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736, doi: 10.1038/s41586-020-2528-x (2020).
  • 15. Miller J. A. & Widom J. Collaborative competition mechanism for gene activation in vivo. Molecular and cellular biology 23, 1623–1632 (2003).
  • 16. Moyle-Heyrman G., Tims H. S. & Widom J. Structural constraints in collaborative competition of transcription factors against the nucleosome. Journal of molecular biology 412, 634–646, doi: 10.1016/j.jmb.2011.07.032 (2011).
  • 17. Polach K. J. & Widom J. A model for the cooperative binding of eukaryotic regulatory proteins to nucleosomal target sites. Journal of molecular biology 258, 800–812, doi: 10.1006/jmbi.1996.0288 (1996).
  • 18. Adams C. C. & Workman J. L. Binding of disparate transcriptional activators to nucleosomal DNA is inherently cooperative. Molecular and cellular biology 15, 1405–1421 (1995).
  • 19. Vashee S., Melcher K., Ding W. V., Johnston S. A. & Kodadek T. Evidence for two modes of cooperative DNA binding in vivo that do not involve direct protein-protein interactions. Current biology : CB 8, 452–458 (1998).
  • 20. Morgunova E. & Taipale J. Structural perspective of cooperative transcription factor binding. Curr Opin Struct Biol 47, 1–8, doi: 10.1016/j.sbi.2017.03.006 (2017).
  • 21. Rijo-Ferreira F. & Takahashi J. S. Genomics of circadian rhythms in health and disease. Genome Med 11, 82, doi: 10.1186/s13073-019-0704-0 (2019).
  • 22. Bass J. & Lazar M. A. Circadian time signatures of fitness and disease. Science 354, 994–999, doi: 10.1126/science.aah4965 (2016).
  • 23. Partch C. L., Green C. B. & Takahashi J. S. Molecular architecture of the mammalian circadian clock. Trends in cell biology 24, 90–99, doi: 10.1016/j.tcb.2013.07.002 (2014).
  • 24. Takahashi J. S. Transcriptional architecture of the mammalian circadian clock. Nature reviews. Genetics, doi: 10.1038/nrg.2016.150 (2016).
  • 25. Ripperger J. A. & Schibler U. Rhythmic CLOCK-BMAL1 binding to multiple E-box motifs drives circadian Dbp transcription and chromatin transitions. Nature genetics 38, 369–374, doi: 10.1038/ng1738 (2006).
  • 26. Panda S. Circadian physiology of metabolism. Science 354, 1008–1015, doi: 10.1126/science.aah4967 (2016).
  • 27. Beytebiere J. R., Greenwell B. J., Sahasrabudhe A. & Menet J. S. Clock-controlled rhythmic transcription: is the clock enough and how does it work? Transcription, 1–10, doi: 10.1080/21541264.2019.1673636 (2019).
  • 28. Beytebiere J. R. et al. Tissue-specific BMAL1 cistromes reveal that rhythmic transcription is associated with rhythmic enhancer-enhancer interactions. Genes & development 33, 294–309, doi: 10.1101/gad.322198.118 (2019).
  • 29. Trott A. J. & Menet J. S. Regulation of circadian clock transcriptional output by CLOCK:BMAL1. PLoS genetics 14, e1007156, doi: 10.1371/journal.pgen.1007156 (2018).
  • 30. Rey G. et al. Genome-wide and phase-specific DNA-binding rhythms of BMAL1 control circadian output functions in mouse liver. PLoS biology 9, e1000595, doi: 10.1371/journal.pbio.1000595 (2011).
  • 31. Koike N. et al. Transcriptional architecture and chromatin landscape of the core circadian clock in mammals. Science 338, 349–354, doi: 10.1126/science.1226339 (2012).
  • 32. Li T. A general model for clustering binary data. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, 188–197 (2005).
  • 33. ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710, doi: 10.1038/s41586-020-2493-4 (2020).
  • 34. Struhl K. & Segal E. Determinants of nucleosome positioning. Nature structural & molecular biology 20, 267–273, doi: 10.1038/nsmb.2506 (2013).
  • 35. Iyer V. R. Nucleosome positioning: bringing order to the eukaryotic genome. Trends in cell biology 22, 250–256, doi: 10.1016/j.tcb.2012.02.004 (2012).
  • 36. Suter D. M. et al. Mammalian genes are transcribed with widely different bursting kinetics. Science 332, 472–474, doi: 10.1126/science.1198817 (2011).
  • 37. Tunnacliffe E. & Chubb J. R. What Is a Transcriptional Burst? Trends in genetics : TIG 36, 288–297, doi: 10.1016/j.tig.2020.01.003 (2020).
  • 38. Rodriguez J. & Larson D. R. Transcription in Living Cells: Molecular Mechanisms of Bursting. Annual review of biochemistry 89, 189–212, doi: 10.1146/annurev-biochem-011520-105250 (2020).
  • 39. Schones D. E. et al. Dynamic regulation of nucleosome positioning in the human genome. Cell 132, 887–898, doi: 10.1016/j.cell.2008.02.022 (2008).
  • 40. Menet J. S., Pescatore S. & Rosbash M. CLOCK:BMAL1 is a pioneer-like transcription factor. Genes & development 28, 8–13, doi: 10.1101/gad.228536.113 (2014).
  • 41. Brahma S. & Henikoff S. Epigenome Regulation by Dynamic Nucleosome Unwrapping. Trends Biochem Sci 45, 13–26, doi: 10.1016/j.tibs.2019.09.003 (2020).
  • 42. Xi Y., Yao J., Chen R., Li W. & He X. Nucleosome fragility reveals novel functional states of chromatin and poises genes for activation. Genome research 21, 718–724, doi: 10.1101/gr.117101.110 (2011).
  • 43. Jeffers T. E. & Lieb J. D. Nucleosome fragility is associated with future transcriptional response to developmental cues and stress in C. elegans. Genome research 27, 75–86, doi: 10.1101/gr.208173.116 (2017).
  • 44. Iwafuchi-Doi M. et al. The Pioneer Transcription Factor FoxA Maintains an Accessible Nucleosome Configuration at Enhancers for Tissue-Specific Gene Activation. Molecular cell 62, 79–91, doi: 10.1016/j.molcel.2016.03.001 (2016).
  • 45. Sobel J. A. et al. Transcriptional regulatory logic of the diurnal cycle in the mouse liver. PLoS biology 15, e2001069, doi: 10.1371/journal.pbio.2001069 (2017).
  • 46. Paquet E. R., Rey G. & Naef F. Modeling an evolutionary conserved circadian cis-element. PLoS computational biology 4, e38, doi: 10.1371/journal.pcbi.0040038 (2008).
  • 47. Michael A. K. et al. Cooperation between bHLH transcription factors and histones for DNA access. Nature 619, 385–393, doi: 10.1038/s41586-023-06282-3 (2023).
  • 48. Shimomura K. et al. Usf1, a suppressor of the circadian Clock mutant, reveals the nature of the DNA-binding of the CLOCK:BMAL1 complex in mice. eLife 2, e00426, doi: 10.7554/eLife.00426 (2013).
  • 49. Honma S. et al. Dec1 and Dec2 are regulators of the mammalian molecular clock. Nature 419, 841–844, doi: 10.1038/nature01123 (2002).
  • 50. Altman B. J. et al. MYC Disrupts the Circadian Clock and Metabolism in Cancer Cells. Cell metabolism 22, 1009–1019, doi: 10.1016/j.cmet.2015.09.003 (2015).
  • 51. de Martin X., Sodaei R. & Santpere G. Mechanisms of Binding Specificity among bHLH Transcription Factors. Int J Mol Sci 22, doi: 10.3390/ijms22179150 (2021).
  • 52. Lambert S. A. et al. The Human Transcription Factors. Cell 172, 650–665, doi: 10.1016/j.cell.2018.01.029 (2018).
  • 53. Menet J. S., Rodriguez J., Abruzzi K. C. & Rosbash M. Nascent-Seq reveals novel features of mouse circadian transcriptional regulation. eLife 1, e00011, doi: 10.7554/eLife.00011 (2012).
  • 54. Masri S., Orozco-Solis R., Aguilar-Arnal L., Cervantes M. & Sassone-Corsi P. Coupling circadian rhythms of metabolism and chromatin remodelling. Diabetes Obes Metab 17 Suppl 1, 17–22, doi: 10.1111/dom.12509 (2015).
  • 55. Doi M., Hirayama J. & Sassone-Corsi P. Circadian regulator CLOCK is a histone acetyltransferase. Cell 125, 497–508, doi: 10.1016/j.cell.2006.03.033 (2006).
  • 56. Danecek P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, doi: 10.1093/gigascience/giab008 (2021).
  • 57. Vierstra J. et al. Mouse regulatory DNA landscapes reveal global principles of cis-regulatory evolution. Science 346, 1007–1012, doi: 10.1126/science.1246426 (2014).
  • 58. Hong H. K. et al. Requirement for NF-kappaB in maintenance of molecular and behavioral circadian rhythms in mice. Genes & development 32, 1367–1379, doi: 10.1101/gad.319228.118 (2018).
  • 59. Gosis B. S. et al. Inhibition of nonalcoholic fatty liver disease in mice by selective inhibition of mTORC1. Science 376, eabf8271, doi: 10.1126/science.abf8271 (2022).
  • 60. Kress T. R. et al. Identification of MYC-Dependent Transcriptional Programs in Oncogene-Addicted Liver Tumors. Cancer research 76, 3463–3472, doi: 10.1158/0008-5472.CAN-16-0316 (2016).
  • 61. van den Berg L. et al. Sugar-responsive inhibition of Myc-dependent ribosome biogenesis by Clockwork orange. Cell reports 42, 112739, doi: 10.1016/j.celrep.2023.112739 (2023).
  • 62. Atger F. et al. Circadian and feeding rhythms differentially affect rhythmic mRNA transcription and translation in mouse liver. Proceedings of the National Academy of Sciences of the United States of America 112, E6579–6588, doi: 10.1073/pnas.1515308112 (2015).

Associated Data


Supplementary Materials

Supplement 1
media-1.zip (32.1KB, zip)
Supplement 2

Data Availability Statement

The accession number for the data reported in this paper is GSE255510.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
