Pioneer factor-nucleosome binding events during differentiation are motif-encoded

Michael P Meers; Derek H Janssens; Steven Henikoff

doi:10.1016/j.molcel.2019.05.025

Author manuscript; available in PMC: 2020 Aug 8.

Published in final edited form as: Mol Cell. 2019 Jun 25;75(3):562–575.e5. doi: 10.1016/j.molcel.2019.05.025

PMCID: PMC6697550 NIHMSID: NIHMS1532243 PMID: 31253573

Summary

Though the in vitro structural and in vivo spatial characteristics of transcription factor (TF) binding are well defined, TF interactions with chromatin and other companion TFs during development are poorly understood. To analyze such interactions in vivo, we profiled several TFs across a time course of human embryonic stem cell differentiation, and studied their interactions with nucleosomes and co-occurring TFs by Enhanced Chromatin Occupancy (EChO), a computational strategy for classifying TF interactions with chromatin. EChO shows that multiple individual TFs can employ either direct DNA binding or “pioneer” nucleosome binding at different enhancer targets. Nucleosome binding is not exclusively confined to inaccessible chromatin, but rather is correlated with local binding of other TFs, and with degeneracy at key bases in the pioneer factor target motif responsible for direct DNA binding. Our strategy reveals a dynamic exchange of TFs at enhancers across developmental time that is aided by pioneer nucleosome binding.

Keywords: Pioneer factors, chromatin, differentiation, transcription factors

eTOC Blurb

Meers et al. use a novel analysis strategy for genome-wide protein-DNA binding data to identify instances of “pioneer factor” binding to nucleosomes during stem cell differentiation. They show that pioneer factor-nucleosome binding occurs in the absence of strong binding motifs, often at accessible sites previously bound by other transcription factors.

Introduction

Developmental transcription factors (TFs) direct cell fate by binding to DNA motifs within complex networks of enhancers, to activate enhancer-regulated developmental gene expression programs (Spitz and Furlong, 2012). Such networks often depend upon the concerted activity of several different TFs expressed in specific spatial and temporal windows throughout development (St Johnston and Nusslein-Volhard, 1992). Developmental TFs must find their binding sites in a genome that is decorated with millions of nucleosomes that each wrap roughly 150 base pairs of DNA in a highly stable configuration (Kornberg and Lorch, 1999). Since nucleosome occupancy is often refractory to TFs binding at their motif targets, nucleosomes must be actively removed or otherwise outcompeted for TFs to access their enhancer targets during developmental transitions (Voss and Hager, 2014). It has been hypothesized that “pioneer” TFs are able to bind to DNA wrapped in nucleosomes at inactive enhancers to facilitate their displacement (Zaret and Carroll, 2011). Though there is substantial evidence in vitro indicating that TF-nucleosome interactions occur (Cirillo et al., 2002; Soufi et al., 2015; Zhu et al., 2018), it remains difficult to verify whether such interactions are employed regularly at natural enhancers in vivo, or whether TFs instead bind opportunistically during episodes of chromatin remodeling or DNA unwrapping from the nucleosome (Morris et al., 2014; Polach and Widom, 1995). Thus, there are at least two possible “modes” through which TFs can access their targets, via direct DNA binding or via nucleosome binding, whose relative usage during development is difficult to distinguish.

We recently developed CUT&RUN, an in situ chromatin profiling method that uses antibody-targeted Micrococcal Nuclease (MNase) to specifically liberate DNA fragments bound by a target protein, thus greatly reducing background noise and onerous processing steps associated with whole-genome chromatin fragmentation in ChIP-seq (Janssens et al., 2018; Skene et al., 2018; Skene and Henikoff, 2017). In contrast with ChIP-seq, which relies upon random chromatin fragmentation, CUT&RUN preserves information about the size of sequenced fragments, defined as the mapped distance between the 5’ and 3’ termini of a read pair, which reflects the protein barriers to processive MNase digestion (Kent et al., 2011; Noll, 1974). Size information from MNase-seq data has been used to infer the location of nucleosomes and non-nucleosomal particles at base pair resolution, based on their protection of DNA fragments of different sizes (Henikoff et al., 2011; Kent et al., 2011). Fragment-size analysis of CUT&RUN data indicates that small fragments of less than 120 base pairs represent direct TF contacts with DNA, whereas fragments larger than 150 bp indicate nucleosomal protection, enabling the elucidation of molecular substructure within genomic regions of enriched CUT&RUN signal (Skene and Henikoff, 2017). Fragment size therefore represents an attractive analysis tool for classifying the chromatin structure at TF binding sites to distinguish modes of binding in vivo.
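
The size thresholds quoted above can be written as a simple classifier. The sketch below is illustrative only: the function names, the half-open coordinate convention, and the "intermediate" label for 120 to 150 bp fragments are assumptions of this example and are not part of CUT&RUN or EChO.

```python
def fragment_length(start, end):
    """Length of a fragment given the mapped termini of a read pair.

    Assumes 0-based, half-open coordinates (BED-style); this convention is an
    assumption of the sketch, not something the text specifies.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    return end - start


def classify_fragment(length_bp, direct_max=120, nucleosome_min=150):
    """Label a fragment by size using the thresholds quoted in the text."""
    if length_bp < direct_max:
        return "direct"        # < 120 bp: direct TF contact with DNA
    if length_bp > nucleosome_min:
        return "nucleosomal"   # > 150 bp: nucleosomal protection
    return "intermediate"      # 120-150 bp: left unassigned by the text
```

For example, a read pair mapped to positions 1000 and 1085 yields an 85 bp fragment, which this sketch would label as a direct contact.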

Here we use CUT&RUN to profile several TFs that engage inactive enhancer chromatin during differentiation of human embryonic stem cells (hESCs) to definitive endoderm (DE), for which TF binding and gene expression signatures of differentiation are well known (Teo et al., 2011; Tsankov et al., 2015). We identify several multi-TF co-regulatory “modules” indicative of broad enhancer transitions during endoderm differentiation, and introduce a computational strategy to analyze the structure of TF-chromatin encounters in those modules, which we dubbed Enhanced Chromatin Occupancy (EChO). EChO accurately summarizes direct binding and nucleosome enrichment surrounding a variety of TFs, and infers the likelihood of TFs co-binding simultaneously in a complex versus binding in a mutually exclusive fashion to the same locus. When used to analyze changes in the interactions between TFs and chromatin at the same loci across differentiation, EChO identifies multiple modes of TF binding in chromatin, including both direct binding to DNA to the exclusion of nucleosomes, and “pioneer” TF-nucleosome binding and eviction. Surprisingly, we observe that multiple individual TFs use different modes of binding at different enhancer targets. Nucleosome binding vs. direct DNA binding by pioneer factors occurs independent of chromatin accessibility at their targets, and nucleosome binding is common at enhancers to which other factors bind throughout development. We further find that pioneer interactions with chromatin are correlated with binding motif strength, and that nucleosome binding targets with weaker motifs are accompanied by other motifs for TFs that exhibit similar pioneer nucleosome binding propensity. We conclude that pioneer factors trigger a temporal cascade of TF-nucleosome binding events, which work in a concerted manner to coordinate binding of enhancers across developmental time.

Results

Temporal chromatin profiling identifies stage-specific, multi-factor binding modules

We used CUT&RUN to profile replicates of 10 transcription factors, three histone PTMs, and IgG controls at five 24-hour time points during a time course of H1 hESC differentiation to DE. This generated a total of 135 datasets representing 65 different factor-timepoint combinations (Figure 1A, Table S1). Examination of differentiation time points by immunofluorescence showed a gradual reduction in expression of pluripotency factors such as Oct4, and a concomitant increase in DE factors such as Sox17 (Figure S1A), confirming the effectiveness of the differentiation protocol. This trend is also seen in spike-normalized CUT&RUN data (Figure 1B), indicating that CUT&RUN profiling accurately reflects developmental TF binding across differentiation.

Figure 1: A) Schematic of H1 hESC to definitive endoderm (DE) differentiation scheme. New H1 cells were seeded on each of five consecutive days, and incubated in DE culture medium for different periods of time before harvesting all cultures on Day 5 of incubation and carrying out CUT&RUN for the indicated targets in each of the cultures. B) Left: CUT&RUN data across a time course of hESC-to-DE differentiation for each factor indicated in a 4 kb window around their EChO foci. Foci were defined in the following time points: Day 1 for Sox2, Nanog and Oct4, Day 4 for GATA4 and EOMES, and Day 5 for Sox17, FoxA2, FoxA1, and CTCF. Right: Genome browser screenshots of time course CUT&RUN for select factors indicated at representative loci. See also Figure S1.

We used SEACR (Meers et al., 2019) to call enriched regions from all 65 factor-timepoint combinations, and identified the 26 most informative combinations, each meeting a threshold of at least 4000 called peaks. As expected, binding of pluripotency factors such as Oct4, Sox2 and Nanog was confined largely to H1 cells, whereas “intermediate” regulators including EOMES and GATA-4 were most enriched between Day 2 and Day 4, and DE-specific regulators such as Sox17 and FoxA1/FoxA2 were most enriched at Day 4 and Day 5 (Figure 1B). Correlation analysis of overlapping factor-timepoint enriched regions revealed a handful of coordinated factor-timepoint combinations (Figure 2A). For instance, CTCF binding was highly coordinated throughout the time course, indicative of its universal expression and strong affinity for its targets. In contrast, factor binding events specific to stem cells, “intermediate” stage, or endoderm were broadly grouped by time point rather than factor; H1-specific (Oct4, Sox2, Nanog) and Day 2-specific (EOMES, GATA-4, and Nanog) groupings were clearly delineated, and Day 3-, Day 4-, and DE-specific combinations could be distinguished within broad correlation between the three days, indicating a temporal progression of regulatory element usage (Figure 2A).
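
Figure 2A reports Jaccard similarity between factor-timepoint combinations. As a minimal sketch of that score, assuming each dataset has already been reduced to the set of region IDs in which it is enriched (the dataset names and IDs below are hypothetical toy values), pairwise similarities could be computed as follows. The hierarchical clustering step is not shown.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two sets of region IDs."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(enriched):
    """Map each unordered pair of dataset names to its Jaccard similarity."""
    return {
        (x, y): jaccard(enriched[x], enriched[y])
        for x, y in combinations(sorted(enriched), 2)
    }


# Toy data: region IDs enriched in each (hypothetical) factor-timepoint dataset.
enriched = {
    "Oct4_Day1": {1, 2, 3, 4},
    "Sox2_Day1": {2, 3, 4, 5},
    "FoxA2_Day5": {7, 8, 9},
}
scores = pairwise_jaccard(enriched)
```

In this toy input the two Day 1 datasets share three of five regions, while the Day 5 dataset shares none with either, which is the kind of time-point grouping the text describes.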

Figure 2: A) Hierarchical clustering of Jaccard Similarity scores of a genome-wide enrichment matrix for 26 factor-timepoint combinations. Dendrogram is displayed at left; colored boxes denoting time point (Day 1–Day 5) are displayed at right; black dashed-line boxes denote closely correlated factor-timepoint combinations as identified by hierarchical clustering. B) K-modes clustering of a binary peak overlap matrix for 82472 genomic regions overlapped by any one of 26 factor-timepoint combinations described in Figure 2A. Colored boxes denoting time point in order (with the exception of CTCF time points) are displayed at bottom. Coordinated TF modules are separated by dashed black lines, and informative factor-timepoint clusters for each module are highlighted by solid black boxes. See also Figure S2.

To determine the combinations of TFs that bind to specific loci across development, we clustered 82472 individual loci that showed enrichment for any one of the 26 informative factor-timepoint combinations previously identified (Figure 2B). Individual loci could largely be grouped into stage-specific “modules” of TF co-enrichment that clustered by time point, with the exception of constitutive CTCF-bound loci and a partitioning of the Day 5 cluster by enrichment of FoxA1/2 (Day 5 GATA4/Sox17/EOMES/FoxA (GSEF), vs. Day 5 GSE). The presence of stage-specific TF binding modules at particular loci largely matched the expression patterns of nearby genes. For instance, enhancers for pluripotency genes such as OCT4 and SOX2 were grouped in a Day 1-specific module characterized by Oct4, Sox2 and Nanog binding in H1, mesendoderm factors such as T were overlapped by a module defined by multi-factor binding in Day 2, and endoderm specific genes such as CXCR4 and SOX17 were specifically overlapped by later endoderm modules (Figure S2A–C). Additionally, small subsets of loci were overlapped by factor binding across Day 3 through Day 5 (Day 3–5), or across all timepoints (All). The latter loci were often nearby genes for factors previously identified to be broadly important for germ layer specification, including EOMES, OTX2, and MYC (Figure S2D). These results confirm that multiple factors collaborate differentially across developmental time to regulate enhancer usage, reminiscent of previously identified multi-factor ensembles that regulate developmentally significant enhancers (Tsankov et al., 2015; Whyte et al., 2013).

EChO uses average fragment size to characterize TF binding within chromatin

Sequenced fragment size can be used to differentiate direct TF binding to DNA from nucleosomal wrapping of DNA in both MNase-seq (Henikoff et al., 2011; Kent et al., 2011) and CUT&RUN (Skene and Henikoff, 2017). To use fragment size to characterize TF-chromatin interactions at individual loci, we developed a locally weighted averaging approach to approximate average fragment size from enriched regions with sufficient read depth, which we dubbed Enhanced Chromatin Occupancy (EChO) (Figure 3A–C, Figure S3A–C, STAR Methods). For a variety of CUT&RUN experiments profiling different chromatin-associating proteins, we used EChO to identify local minima in the locally weighted average fragment size profiles for each enriched region defined by SEACR, which we termed region “foci” (Figure 3A–C). Using K562 CTCF ChIP-exo data from a previous study (Rhee and Pugh, 2011) to define nuclease-limited ChIP-exo enriched sites as a proxy for CTCF binding, EChO foci were much closer matches to the ChIP-exo sites than summits defined by MACS2 (Figure S3D–E). Moreover, the observed distributions of average fragment sizes at foci defined from CUT&RUN datasets profiling different chromatin proteins mirrored their expected interactions with DNA. For instance, CUT&RUN from transcription factors such as Sox2 exhibited largely sub-nucleosomal sized foci (i.e. <150 bp) reflective of direct DNA binding; “active” histone post-translational modifications (PTMs) such as H3K4me2 were majority nucleosomal with a small-fragment tail likely reflective of either TF binding or sub-nucleosomal particles at active regions (Ramachandran et al., 2017); and “repressive” PTMs such as H3K27me3 and their associated non-histone cofactors such as SUZ12 were almost exclusively represented by large fragment foci, reflective of compacted chromatin (Figure 3D). These results indicate that EChO foci reflect sites of minimal DNA protection by protein particles.
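
To make the idea of a locally weighted average fragment size profile concrete, the following sketch computes a kernel-weighted mean fragment length at regular positions and reports local minima as candidate foci. It is not the EChO implementation (which is described in STAR Methods): the tricube kernel, the fixed span, the 10 bp step, the tolerance parameter, and all function names are assumptions chosen for illustration.

```python
def weighted_mean_size(fragments, position, span):
    """Tricube-weighted mean length of fragments whose midpoints lie within
    `span` bp of `position`. Returns None when no fragment is in range."""
    num = 0.0
    den = 0.0
    for midpoint, length in fragments:
        d = abs(midpoint - position) / span
        if d < 1.0:
            w = (1.0 - d ** 3) ** 3
            num += w * length
            den += w
    return num / den if den > 0 else None


def size_profile(fragments, start, end, span, step=10):
    """Average fragment size evaluated every `step` bp over [start, end]."""
    return [(pos, weighted_mean_size(fragments, pos, span))
            for pos in range(start, end + 1, step)]


def find_foci(profile, tol=1e-6):
    """Local minima of the profile; `tol` ignores floating-point noise on plateaus."""
    pts = [(pos, size) for pos, size in profile if size is not None]
    foci = []
    for i in range(1, len(pts) - 1):
        prev_size, size, next_size = pts[i - 1][1], pts[i][1], pts[i + 1][1]
        if size < prev_size - tol and size <= next_size + tol:
            foci.append(pts[i])
    return foci


# Toy region of (midpoint, length) pairs: small fragments near position 100,
# flanked on both sides by nucleosome-sized fragments.
fragments = [(0, 160), (20, 160), (40, 160),
             (95, 60), (100, 60), (105, 60),
             (160, 160), (180, 160), (200, 160)]
profile = size_profile(fragments, 0, 200, span=60)
foci = find_foci(profile)
```

In this toy region the only focus falls at position 100, where the weighted average drops to the size of the small fragments, while the flanking positions average toward nucleosomal sizes.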

Figure 3: A) Schematic of EChO methodology. In addition to traditional signal density (“Fragment counts”), EChO generates a continuous average fragment size profile (“Fragment size”), from which minimum local fragment size “foci” (denoted by red arrows) can be detected. B) Top: Schematic of fragment size averaging strategy, which requires a balance between minimizing error (left) and minimizing likelihood of overfitting (i.e. the “roughness” of the predicted average fragment size trendline) (right). Bottom: Plot of standard error vs. roughness for different span values used in locally weighted averages of fragment size across a sample CTCF enriched region. Optimal span is determined by calculating the point with maximum orthogonal Euclidean distance (red lines) below the line defined by the points representing minimum values for standard error and roughness (blue dashed line). C) Fragment position vs. size plots for the sample region from B) overlaid with EChO average fragment size profiles at different span values. Fully minimized roughness (top row) systematically overestimates fragment size at informative small fragment clusters (red boxes), whereas fully minimized standard error (bottom row) overfits a “phantom” small fragment region (blue box). D) Density plots describing distributions of average fragment size at all detected foci for each CUT&RUN experiment indicated. Vertical dashed black line represents 150 bp average fragment size. See also Figure S3.
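
The span selection rule in Figure 3B can be sketched geometrically. The example below assumes that each candidate span has already been scored for standard error and roughness, that standard error is plotted on the x axis and roughness on the y axis, and that the raw (unscaled) values are used; these choices, and the function name `pick_span`, are assumptions of this illustration rather than details confirmed by the text.

```python
import math


def pick_span(candidates):
    """Choose a span from (span, std_error, roughness) tuples.

    A line is drawn between the candidate with the lowest standard error and
    the candidate with the lowest roughness; the candidate lying farthest
    below that line (largest perpendicular distance) is returned, or None if
    no candidate lies below the line.
    """
    x1, y1 = min(candidates, key=lambda c: c[1])[1:]   # lowest standard error
    x2, y2 = min(candidates, key=lambda c: c[2])[1:]   # lowest roughness
    norm = math.hypot(x2 - x1, y2 - y1)
    if norm == 0 or x1 == x2:
        raise ValueError("error and roughness extremes do not define a usable line")
    slope = (y2 - y1) / (x2 - x1)
    best_span, best_dist = None, 0.0
    for span, x, y in candidates:
        y_on_line = y1 + slope * (x - x1)
        if y >= y_on_line:
            continue                       # on or above the line: skip
        dist = abs((x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)) / norm
        if dist > best_dist:
            best_span, best_dist = span, dist
    return best_span
```

With hypothetical scores such as `[(10, 1.0, 10.0), (20, 2.0, 4.0), (30, 3.0, 3.5), (40, 5.0, 1.0)]`, the span of 20 lies farthest below the line joining the two extremes and would be selected.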

We sought to determine whether EChO could classify specific TF enriched regions as direct binding or nucleosomal based on fragment size as previously reported (Skene and Henikoff, 2017). When we analyzed CTCF CUT&RUN data generated from either K562 cells in a previous study (Janssens et al., 2018) or from H1 hESCs in this study, in both cases the average fragment size reported at foci was largely bimodally distributed into small- and large-fragment populations (Figure 3D, Figure 4A). We hypothesized that the small- and large-fragment foci represent direct CTCF binding sites and flanking nucleosomes, respectively, both of which were previously identified in CTCF CUT&RUN (Skene and Henikoff, 2017). Indeed, K562 cell CTCF foci of smaller average fragment size corresponded almost exclusively to sites of strong CTCF motif residence (Figure S4A), whereas foci of larger fragment size lacked strong motif content and frequently marked sites flanking the small-fragment foci (Figure 4B, Figure S4A). These results indicate that direct factor binding sites vs. sites of nucleosome occupancy can be distinguished based on fragment size alone.

Figure 4: A) Top: EChO average fragment size profile for sample CTCF region (gold line), with foci highlighted (red and green boxes). Bottom: Histogram and tripartite Gaussian Mixture Model (GMM) describing calculated average fragment size at foci defined from CTCF CUT&RUN data from K562 cells. Two of the three Gaussian components unambiguously represent small (red) and large (green) fragment size foci. B) Top: Density plot of distance between CTCF foci classified by GMM grouping and the nearest CTCF ChIP-exo summit as defined by MACS2. Bottom: Schematic of hypothesized protein structure based on EChO focus fragment sizes. C) Schematic of EChO overlap analysis. Left: Loci at which two factors bind simultaneously or in sequence at the same site are considered “Shared”, and will exhibit high correlation and low cross-correlation lag between EChO profiles at foci. Right: Loci at which two factors bind independently of one another are considered “Specific”, and will exhibit low correlation and high cross-correlation lag between EChO profiles at foci. D) Cumulative distribution plots of Pearson’s correlation (left) and cross-correlation lag (right) for genome-wide comparisons between pairs of factors at foci, classified by modules identified in Figure 2B, as well as CTCF, Myc-Max, and FoxA2 Day 5-OSN Day 1 comparisons as controls. E) Genome browser screenshot and EChO profiles of Oct4, Sox2 and Nanog CUT&RUN in Day 1 at a subregion of the SOX2 locus. Shared and Sox2-specific foci are indicated by arrows. See also Figure S4.

Since EChO uses fragment size to map the most likely sites of direct binding within enriched regions at near base-pair resolution, we hypothesized that the similarity of fragment size distributions between different factors within the same co-enriched region might indicate simultaneous co-binding of those factors (or sequential binding at different time points) versus the factors binding independently of each other (Figure 4C). Consistent with this hypothesis, there was high similarity (as defined by Pearson’s R² and cross-correlation lag) between overlapping EChO profiles for all CTCF time points, consistent with CTCF constitutively binding the same targets, as well as between c-Myc and Max, which form a heterodimer at their binding sites (Blackwood and Eisenman, 1991) (Figure 4D). In contrast, profiles for Oct4, Sox2, or Nanog (OSN) and FoxA2 in distinct time points showed consistently low correlation and high cross-correlation lag. A closer examination of individually mapped fragments confirms that, whereas matched EChO profiles have largely overlapping small fragment densities at foci (Figure S4B–C), unmatched TF pairs have fragments for one TF that are more sparsely distributed around the foci for the other TF, indicating that direct binding for each TF may occur at adjacent sites independently of one another (Figure S4D–E). Curiously, we found that the majority of factor pairs exhibited low correlation and high cross-correlation lag, presenting the possibility that a significant proportion of TF binding events are TF-specific despite widespread co-enrichment between different TFs (Figure 4D). Interestingly, “shared” (i.e. matched EChO profiles) and “specific” (unmatched) foci occur in close proximity to each other in some cases, including Oct4, Sox2, and Nanog binding at the SOX2 locus (Figure 4E). Moreover, Sox2 foci that were shared with Oct4 were more likely than Sox2-specific foci to possess a strong Oct4-Sox2 tandem motif previously shown to facilitate binding of an Oct4-Sox2 heterodimer (Remenyi et al., 2003), indicating that shared/specific designations are dictated by motif incidence (Figure S4F). In all, these results show that EChO can be used to accurately characterize a variety of TF binding configurations in chromatin, including direct binding, nucleosomal protection, and multi-TF co-enrichment.
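
The two similarity measures used to separate "shared" from "specific" foci can be illustrated on a pair of equal-length profiles sampled at the same positions. The sketch below is a generic implementation of Pearson correlation and of the lag that maximizes it; the thresholds in `classify_pair` are hypothetical placeholders, lag is measured in sample steps rather than base pairs, and none of this is taken from the EChO code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(x, y, max_lag):
    """Return (lag, r) for the shift of y relative to x that maximizes correlation."""
    best = (0, pearson_r(x, y))
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        r = pearson_r(xs, ys)
        if r > best[1]:
            best = (lag, r)
    return best


def classify_pair(r, lag, r_min=0.8, max_abs_lag=1):
    """Hypothetical rule: high correlation at near-zero lag counts as 'shared'."""
    return "shared" if r >= r_min and abs(lag) <= max_abs_lag else "specific"


# Toy profiles: y has the same shape as x but is displaced by two sample steps.
x = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
y = [0, 0, 0, 1, 4, 9, 4, 1, 0, 0]
lag, r = best_lag(x, y, max_lag=3)
```

Here the profiles align perfectly only after a shift of two steps, so under the placeholder thresholds the pair would be labeled "specific", whereas a profile compared with itself would be labeled "shared".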

Pioneer TFs access their targets by both nucleosome binding and direct DNA binding

Pioneer TF binding to nucleosomes is thought to occur at the onset of TF expression to displace nucleosomes from inactive enhancers, and render them accessible for binding of secondary TFs (Zaret and Carroll, 2011). FoxA2 is well known to belong to a class of putative pioneer factors that can bind nucleosomes in vitro, and occupy sites that are depleted of DNase hypersensitivity signal in vivo (Cirillo et al., 2002; Iwafuchi-Doi et al., 2016), and GATA4, EOMES, and HMG box transcription factors such as Sox17 have all been similarly implicated in possessing characteristics of pioneer factors (Cirillo et al., 2002; Soufi et al., 2015; Zhu et al., 2018). Therefore, using the fragment size characteristics of EChO, we sought to test whether the intermediate and endoderm factors we profiled engage in pioneer nucleosome binding early in differentiation before directly binding their targets. We hypothesized that such events would manifest in a nucleosome-sized EChO profile at an early time point, followed by a small-fragment, direct DNA binding profile at later time points (Figure 5A). Therefore, we restricted our analysis to foci that exhibited small fragment binding (<120 bp average fragment size) at a late time point, and analyzed differences in EChO size profiles at the time points preceding it. We analyzed FoxA2 foci in this way and found that, whereas many of them exhibited small fragment binding throughout the time course, indicative of direct DNA binding early on, 29.4% of them exhibited an average fragment size larger than 120 bp in the earliest time point, suggestive of pioneer nucleosome binding (Figure 5B). We found similar results for each of GATA4, EOMES, and Sox17, where a subset of binding events exhibited a minimum average fragment size greater than 120 bp in the earliest time point (48.3%, 26.9%, and 36.4%, respectively) (Figure 5B). As a control, we examined CTCF fragment size trajectories across the entire time course at Day 5 CTCF direct binding foci, and found that no more than 9% exhibited an average fragment size greater than 120 bp at any prior time point, likely within the margin of error for the technique (discussed in STAR Methods) and consistent with CTCF remaining consistently bound at its targets (Figure 5B). These data suggest that, rather than binding all targets uniformly, pioneer TFs can employ either direct DNA binding or pioneer nucleosome binding to varying degrees to access particular targets.
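
The percentages above follow from a two-step filter: keep foci that are small-fragment at the late time point, then count those that were large-fragment at the earliest time point. A minimal sketch of that calculation, using hypothetical foci and an assumed dictionary layout, might look like this:

```python
def nucleosomal_fraction(foci, threshold_bp=120.0):
    """Among foci that are small-fragment (< threshold) at the late time point,
    return the fraction whose early average fragment size exceeds the threshold."""
    late_direct = [f for f in foci if f["late"] < threshold_bp]
    if not late_direct:
        return 0.0
    early_large = sum(1 for f in late_direct if f["early"] > threshold_bp)
    return early_large / len(late_direct)


# Toy foci with hypothetical average fragment sizes (bp) at early and late time points.
foci = [
    {"early": 165.0, "late": 80.0},   # large early, small late: candidate nucleosome binding
    {"early": 90.0, "late": 75.0},    # small throughout: direct DNA binding
    {"early": 110.0, "late": 95.0},   # small throughout: direct DNA binding
    {"early": 150.0, "late": 100.0},  # large early, small late: candidate nucleosome binding
    {"early": 170.0, "late": 160.0},  # excluded: not small-fragment at the late time point
]
fraction = nucleosomal_fraction(foci)
```

In this toy set, two of the four retained foci exceed 120 bp at the early time point, giving a fraction of one half.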

Figure 5: A) Schematic of strategy for identifying direct vs. nucleosomal TF binding to targets. Sites of direct, small-fragment binding at a terminal timepoint may be preceded by small fragment binding (orange arrow) or by large fragment binding (teal arrow) at early time points, corresponding to direct DNA binding or nucleosomal binding to access their targets, respectively. B) Violin plot describing distribution of fragment sizes in multiple time points (denoted by colors at top) at foci defined in the “late” time point (Day 5 for CTCF, Sox17, and FoxA2; Day 4 for GATA4 and EOMES). Black solid lines indicate median value for each violin. Black dashed line indicates 120 bp average fragment size; brackets describe the percentage of foci in the early time point for each factor that exhibit an average fragment size greater than 120 bp. C) Top: Density plot of centers of FoxA2 fragments >150 bp in length in a window around FoxA2 foci, divided into quartiles based on Early timepoint average fragment size, where Quartile 1 indicates largest fragment size (nucleosome binding) and Quartile 4 indicates smallest fragment size (direct DNA binding). Middle: Expansion of highlighted regions, demonstrating enrichment of fragments at Quartile 1 foci in the Early timepoint that is absent from Quartile 4 foci. Bottom: Schematic describing quartile partitioning. Factor foci defined in the latest analyzed time point were partitioned into quartiles based on average fragment size in the earliest analyzed time point. See also Figure S5.

To confirm that individual nucleosome-sized fragments are enriched at putative nucleosome binding sites, we examined FoxA2 foci more closely by first separating them into quartiles based on the observed fragment size at the earliest time point analyzed, where Quartile 1 represented the putative large-fragment binding of nucleosome binding events and Quartile 4 represented initial small-fragment binding of direct DNA binding events (Figure S5A). We then mapped the centers of FoxA2 CUT&RUN fragments larger than 150 bp in length from the earliest and latest time points in a window around each focus, to determine the local density of nucleosome-sized fragments directly around the eventual direct binding site. At the earliest time point, the nucleosomal quartile exhibited an enrichment of large fragments directly over the focus that was lacking in the direct binding quartile, and this disparity was eliminated by the latest time point (Figure 5C, Figure S5B). These data demonstrate that early in the time course, nucleosome binding events can be identified by the presence of nucleosome-sized fragments at the eventual terminal-binding focus. Importantly, rather than being represented exclusively by nucleosome-sized fragments, the nucleosomal quartile contains a mixture of small and large fragments centered at binding foci in the earliest time point (Figure S5C), consistent with a shift in the equilibrium between direct and nucleosome binding events at the single-molecule level. These results show that nucleosome binding is employed at a subset of binding sites by multiple developmental TFs.
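
The quartile partitioning used here ranks foci by their average fragment size at the earliest time point, with Quartile 1 holding the largest values. One simple way to assign such labels is sketched below; the rank-based split and the arbitrary handling of ties are assumptions of this example.

```python
def assign_quartiles(early_sizes):
    """Return a quartile label (1-4) per focus; Quartile 1 holds the largest
    early average fragment sizes and Quartile 4 the smallest."""
    n = len(early_sizes)
    order = sorted(range(n), key=lambda i: early_sizes[i], reverse=True)
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * 4 // n + 1
    return labels
```

For eight hypothetical foci with early sizes `[70, 200, 130, 180, 90, 160, 110, 150]`, the two largest (200 and 180 bp) fall in Quartile 1 and the two smallest (90 and 70 bp) in Quartile 4.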

Pioneer nucleosome binding is sensitive to motif strength rather than chromatin configuration

Sites of pioneer nucleosome interactions are thought to be uniquely inaccessible prior to TF binding. To determine differences in chromatin conformation prior to and during factor binding at sites of different initial fragment size, we profiled transposase accessibility by ATAC-seq (Buenrostro et al., 2013) and “active” histone post-translational modifications (PTMs) H3K27ac and H3K4me2 by CUT&RUN, as well as the “repressive” PTM H3K27me3 by CUT&RUN, across all time points. Accessibility-associated signal increased at sites of intermediate and endoderm TF binding across the time course as expected (Figure 6A, Figure S6A–C). However, contrary to expectation, binding sites were accessible and enriched for both active and repressive marks in undifferentiated ES cells, and nucleosome binding targets were not depleted of ATAC-seq, H3K27ac or H3K4me2, or enriched for H3K27me3 at earlier time points relative to direct targets (Figure 6A, Figure S6A–C). Instead, the nucleosomal quartile for Sox17, and to a lesser extent GATA4, exhibited higher ATAC-seq signal across the time course than all other quartiles (Figure S6A–C). Close investigation of the fragment sizes of H3K4me2 CUT&RUN at FoxA2 foci across the time course revealed that fragment size distributions were similar across all FoxA2 quartiles, indicating an equal likelihood of nucleosome occlusion across all binding sites (Figure S6D). These results indicate that differences in the propensity for nucleosome vs. direct DNA binding are not dictated by prior chromatin configuration.

Figure 6: A) Average plots of ATAC-seq (top row) and CUT&RUN for H3K27ac (2nd row), H3K4me2 (3rd row), and H3K27me3 (bottom row) at FoxA2 quartiles at five time points spanning hESC differentiation to endoderm. B) Boxplots describing motif score (as defined by FIMO) of the strongest motif in a 200 bp window surrounding each focus. Score is in comparison with a JASPAR-based refined position-weight matrix for each factor. Upper and lower box boundaries represent the 75th and 25th percentiles of the distribution (i.e. the upper boundary of the third quartile and the lower boundary of the second quartile, together spanning the interquartile range (IQR)); notches represent 1.58*IQR/sqrt(n); bars represent 1.5*IQR. C) Top: Position probability sequence logos for optimal (left) and degenerate (right) FoxA motifs identified in FoxA2 CUT&RUN. Bottom: fraction of FoxA2 motifs proximal to nucleosomal foci from Quartile 1 (“Nuc”, top row) or to direct binding foci from Quartile 4 (“Direct”, bottom row) that correspond to the “optimal” or “degenerate” motif categories. D) Fraction of direct or nucleosomal FoxA2 foci that contain a perfect match for the indicated motif within 200 bp of the focus. See also Figures S6 and S7.

DNA sequence might serve as an intrinsic signal that dictates binding mode in the absence of differences in chromatin accessibility. To determine differences in sequence at TF binding foci with different temporal binding strategies, we used the JASPAR database (Khan et al., 2018) to identify a composite motif position-weight matrix (PWM) corresponding to each factor, refined those PWMs to a “minimal” PWM composed of base positions with the highest information content from the composite, and used FIMO (Grant et al., 2011) to identify the closest match to the factor’s minimal PWM in a 200 bp window around each focus. For all factors analyzed, we found that motifs at nucleosome binding sites did not match the minimal PWM as closely as those at direct binding sites (Figure 6B). We were additionally able to use our fragment size-based focus prediction to measure the distance from the closest matching motif in the 200 bp window around the binding focus to the focus itself. We found that, for EOMES and FoxA2, nucleosomal foci were farther on average from the strongest local motif than direct binding foci (Figure S6E). Together, these results suggest that pioneer factors engage their targets via different binding modes based on the strength and positioning of the motifs they engage.
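
The idea of finding the closest PWM match in a window around a focus can be illustrated with a small log-odds scan over both strands. This is a simplified stand-in and not FIMO: the uniform background of 0.25, the toy PWM built from a consensus string, and all function names are assumptions of this sketch.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def log_odds(pwm, kmer, background=0.25):
    """Sum of log2(p / background) over the positions of a k-mer."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, kmer))


def best_match(pwm, window):
    """Best-scoring site on either strand: (score, offset, strand, sequence)."""
    k = len(pwm)
    best = None
    for strand, seq in (("+", window), ("-", reverse_complement(window))):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            score = log_odds(pwm, kmer)
            # Report the offset on the forward strand for both orientations.
            offset = i if strand == "+" else len(seq) - k - i
            if best is None or score > best[0]:
                best = (score, offset, strand, kmer)
    return best


def toy_pwm(consensus, p_match=0.85):
    """Hypothetical PWM: p_match for the consensus base, the rest split evenly."""
    p_other = (1.0 - p_match) / 3.0
    return [{b: (p_match if b == c else p_other) for b in "ACGT"} for c in consensus]


pwm = toy_pwm("TGTTTAC")
window = "AAATGTTTACGGG"
score, offset, strand, site = best_match(pwm, window)
```

The returned offset can then be compared with the position of the focus inside the window to obtain a motif-to-focus distance of the kind discussed above.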

We analyzed the sequence content of FoxA motifs (composed of the “optimal” highest PWM-scoring sequence T₁G₂T₃T₄T₅A₆C₇) at FoxA2 binding sites more closely, and found that, whereas nearly 41% of motifs at direct binding sites were a perfect match for the “optimal” sequence, fewer than 19% of motifs at nucleosome binding sites met this criterion, and 51% of motifs were degenerate in at least two base positions (Figure 6C). FoxA2 is predicted to engage in various sequence-specific major groove contacts with the terminal T₅A₆C₇ base pairs in the highest scoring motif, chiefly via Histidine 209 (H209) in helix 3 of its winged-helix domain (Li et al., 2017). H209 is orthologous to a Histidine residue in FoxA1, whose direct motif binding is impaired when that residue is mutated together with a nearby Asparagine (Sekiya et al., 2009). Whereas 40.9% of FoxA motifs at direct binding foci are degenerate in at least one base in the T₅A₆C₇ trinucleotide, more than 65% of nucleosomal FoxA motifs meet that criterion, indicating that motifs at nucleosomal sites are degenerate in base pairs that are most responsible for the integrity of the direct major groove contact. In contrast, direct and nucleosomal motifs are similarly degenerate at the proximal G₂T₃ (26.5% vs. 30.4%), which make contacts with the Wing 1 domain that also functions in nonspecific DNA backbone contacts that are important for nucleosome binding (Li et al., 2017; Sekiya et al., 2009). Thus, motif degeneracy occurs in base pairs that make contacts with protein residues most responsible for sequence-specific direct DNA binding. Our results indicate that motif content correlates strongly with pioneer factor nucleosome engagement strategy.
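
The degeneracy comparisons above amount to counting, for a set of motif instances, how many differ from the optimal sequence at particular positions. The sketch below shows that bookkeeping for the T₅A₆C₇ and G₂T₃ positions; the motif instances are hypothetical and the function names are assumptions of this example.

```python
OPTIMAL = "TGTTTAC"  # T1 G2 T3 T4 T5 A6 C7, as given in the text


def mismatched_positions(motif, optimal=OPTIMAL):
    """1-based positions at which a motif instance differs from the optimal sequence."""
    if len(motif) != len(optimal):
        raise ValueError("motif and optimal sequence must have equal length")
    return {i + 1 for i, (a, b) in enumerate(zip(motif, optimal)) if a != b}


def fraction_degenerate(motifs, positions):
    """Fraction of motif instances with at least one mismatch among `positions`."""
    hits = sum(1 for m in motifs if mismatched_positions(m) & set(positions))
    return hits / len(motifs)


# Hypothetical motif instances, for illustration only.
motifs = ["TGTTTAC", "TGTTTGC", "TATTTAC", "TGTTGAT"]
terminal = fraction_degenerate(motifs, {5, 6, 7})   # T5 A6 C7 trinucleotide
proximal = fraction_degenerate(motifs, {2, 3})      # G2 T3 dinucleotide
```

Running the same two calls separately on motifs near direct and nucleosomal foci would give the paired percentages that the paragraph compares.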

Nucleosome binding occurs in the presence of other TF binding events

Pioneer factors are expected to engage target enhancers to render them accessible for other factors to bind; therefore, we hypothesized that pioneer nucleosome binding would occur at enhancers that are not previously bound by other factors. However, when we classified fragment size quartiles for each factor by the co-binding modules identified in Figure 2B, we found that foci from the nucleosomal quartile were in fact more likely to occur in regions where other TFs bind prior to or during pioneer factor binding (i.e. residing in the “All” module) than would be expected by chance, and in fact represented greater than 50% of pioneer binding events for GATA4 and EOMES (Figure S7A). To determine whether local motif context is predictive of co-binding, we compared nucleosomal vs. direct DNA binding sites using DREME (Bailey, 2011) to identify short motif sequences de novo in windows surrounding foci for each factor. Notably, while a short GATA motif was detected in both nucleosomal and direct FoxA2 regions, a higher percentage of regions harbored a perfect match for the motif at nucleosomal sites than direct sites, and the GATA match was in fact more abundant than perfect matches for a minimal FoxA motif at nucleosomal sites (Figure 6D). This is consistent with the higher incidence of pioneer nucleosome binding at multi-factor sites (Figure S7A), and presents the possibility that nucleosome binding pioneer events are temporally aligned with other TFs at the same loci. Moreover, analysis of Pearson’s R² between GATA4 and FoxA2 EChO profiles on Day 3, at which FoxA2 is expected to engage in nucleosome binding, indicates a slightly higher propensity for nucleosomal binding events to engage in co-binding interactions vs. direct binding events (Figure S7B). In all, these results indicate that nucleosome binding does not occur exclusively in inactive chromatin, but is widespread at loci where multiple factors coordinate their binding across developmental time.

Discussion

In this study, we use CUT&RUN to efficiently profile several TFs and histone PTMs in a time course during hESC differentiation to endoderm. We identify coordinated enhancer modules at higher temporal resolution than previous studies (Tsankov et al., 2015), which reflect broad but rapid shifts in cis-element usage over developmental time. CUT&RUN is notable for the unique fragment-end information generated by processive MNase digestion of DNA protected by target proteins, which sonication-based ChIP-seq lacks due to random fragmentation of chromatin. For instance, fragment ends can be used to generate TF binding “footprints” at single-base resolution (Liu et al., 2018), and full fragment length can be used to infer direct factor binding to DNA vs. the presence of flanking nucleosomes (Skene and Henikoff, 2017). To further take advantage of this unique dimension, we introduce Enhanced Chromatin Occupancy (EChO), which generates continuous profiles of average fragment size spanning regions of CUT&RUN enrichment. EChO is able to predict the presence of direct CTCF binding vs. flanking nucleosomes from CTCF CUT&RUN data, and identifies fragment size distributions from a variety of TFs and histone PTMs that accurately reflect their various activities in chromatin, including direct DNA binding, partially displaced nucleosomes, or compacted nucleosomes. Though inferences of direct binding are possible from ChIP-seq and chromatin accessibility data (Bailey and Machanick, 2012; Boyle et al., 2011), they frequently rely upon TF motifs as presumed anchors, whereas many factors lack detectable binding motifs, lack known motifs at many apparent binding sites, or fail to bind known motif-containing loci (Joshi et al., 2007). In contrast, EChO analysis is motif-independent, and thus EChO fragment-size binding characteristics can be used as classifiers to identify subtle differences in motif character that are difficult to identify with motif-centric approaches.

Co-enrichment of CUT&RUN signal for multiple TFs at target enhancers may be indicative of physical cooperativity, for which there is abundant evidence in multicellular eukaryotes (Chronis et al., 2017; Moorman et al., 2006; Wang et al., 2006; Yan et al., 2013). However, the frequency with which multiple TFs bind enhancers coordinately remains difficult to assess. TF co-binding at specific sequences is often validated via in vitro binding mobility shift assays (Thanos and Maniatis, 1995), or by in vivo analysis of overlapping binding profiles (Moorman et al., 2006; Yan et al., 2013) or chromatin accessibility that considers motif structure within accessible regions (Boyle et al., 2011; Thurman et al., 2012). However, these strategies lack genome scale, high resolution of co-binding configurations, or direct confirmation of the presence of the factors in question, respectively. CUT&RUN combined with EChO combines all three characteristics by using antibodies to confirm factor identity, high-throughput sequencing to achieve genome scale, and comparison of fragment size profiles to infer direct binding of multiple confirmed factors at a resolution of tens of bases. As each fragment released in CUT&RUN is generated by cleavages on either side of the Protein A/MNase-bound TF, other TFs that bind to the same site within the population of cells will not confound the analysis. Thus, CUT&RUN with EChO can serve as effective tools to identify likely co-binding relationships in a variety of different in vivo temporal and developmental contexts.

Pioneer transcription factors are expressed at the onset of crucial developmental transitions, at which point it is thought that they bind inactive enhancers that are wrapped in nucleosomes, and render them accessible for the binding of downstream TFs and coactivators (Zaret and Carroll, 2011). In vitro binding studies indicate that binding to nucleosomes is expected to be common amongst human transcription factors (Zhu et al., 2018). However, transient DNA unwrapping from the nucleosome core (Polach and Widom, 1995), DNA exposure during replication (Ramachandran and Henikoff, 2016), and cycles of pervasive remodeling activity (Morris et al., 2014) can all facilitate general access of TFs to targets without direct nucleosome binding, and in vivo it remains exceedingly difficult to distinguish direct nucleosome binding activity from such alternatives. We sought to address this problem by tracking EChO fragment size across time from the onset of factor binding to terminal enhancer activation, and were surprised to find that individual factors access different enhancer targets via different strategies, either by directly accessing DNA or by pioneer binding to nucleosomes. Partitioning of pioneer factor targets is not without precedent (Johnson et al., 2018); however, contrary to expectation, we find that pioneer nucleosome binding is not associated with reduced chromatin accessibility or the absence of auxiliary factor binding prior to pioneering. These results are consistent with a single-molecule interpretation of the bulk data at hand that considers chromatin as a dynamic participant in the processes described above. 
For instance, our results indicate that bulk chromatin accessibility at a specific locus reflects a transition equilibrium between accessible and inaccessible states at each individual DNA molecule, consistent with variable nucleosome positioning at the same loci, and observation of both accessible and inaccessible states within the same DNase hypersensitive site, between different single cells (Chereji et al., 2018; Lai et al., 2018). This allows TFs to implement either direct DNA or nucleosome binding as dictated by a variety of kinetic or thermodynamic constraints, to which prior chromatin configuration does not appear to contribute substantially during hESC differentiation to endoderm. Similarly, rather than exclusively interrogating invariantly inaccessible sites devoid of auxiliary factor binding, pioneer factors may bind nucleosomes that are dynamically reshuffled across the enhancer as a consequence of other TF binding events. This hypothesis is consistent with a recent study showing that FoxA2 is necessary to precipitate chromatin opening at its binding sites during endoderm differentiation from mouse ESCs, but not sufficient in the absence of other factors (Cernilogar et al., 2019). Thus, we propose that pioneer nucleosome binding does not occur exclusively as a mechanism to uniquely access inactive chromatin, but rather in concert with a cascade of other TF binding events at the same enhancers across developmental time. In this light, a dynamic nucleosome environment that grants developmental TFs the opportunity to sample diverse chromatin configurations, both accessible and occluded, is likely a central feature of major developmental transitions such as the differentiation of embryonic stem cells to primordial germ layers.

Finally, we find that motif strength is anticorrelated with the incidence of pioneer nucleosome binding, suggesting that binding via a nucleosome may stabilize interactions at low-affinity motif targets. We posit a model in which TFs engage low-affinity motifs via a combination of direct DNA binding and nucleosomal anchoring to ensure eventual access, whereas high affinity motifs are bound effectively during opportunistic windows of DNA unwrapping from the nucleosomal core (Figure 7). Our results and model are consistent with structural data describing direct contacts between TFs and their motifs, particularly for FoxA2 (Li et al., 2017; Sekiya et al., 2009). From a developmental perspective, it is possible that the presence of pioneer binding acts as a “failsafe” that ensures target access that is robust to variations in chromatin configuration or factor expression across a range of motif targets. From an evolutionary perspective, the capacity to access low-affinity sites in the context of a nucleosome dramatically increases the search space for a given factor, and perhaps facilitated the evolution of new enhancers nucleated by partial pioneer factor binding and refined by developmental selective pressures. Such a mechanism also empowers a single factor with significant binding complexity with respect to sequence, which could obviate the proposed need for combinatorial cooperative binding of multiple factors to diversify sequence targets (Thanos and Maniatis, 1995). Alternatively, the relaxed binding constraints could in fact pave the way for the assembly of cooperative enhancers, by utilizing more common clusters of multiple weak motifs as raw material to select for more stringent co-binding. 
We observe that co-binding is less frequent than might be expected in a pure “enhanceosome” model of combinatorial TF complexes, suggesting that individual TFs stochastically engaging multiple sequences may in part be enabled by relaxed pioneer factor motif binding constraints (Arnosti and Kulkarni, 2005; Spitz and Furlong, 2012). It is likely that co-binding vs. single-factor binding and strong motif direct binding vs. nucleosome-mediated weak motif binding exert mutual influence upon each other during evolution to shape new enhancers and fine-tune existing ones. In all, this study uncovers several unexpected behaviors of pioneer factors in vivo, and suggests that future refinement and analysis of the temporal signatures of TF binding may prove to be extremely valuable in capturing in vivo binding dynamics at genome scale during essential developmental transitions.

Figure 7: Model of direct DNA vs. pioneer nucleosome binding events. Dynamic exchange occurs between four states: 1) No TF binding, target wrapped in a nucleosome (left), 2) No binding, target accessible (bottom), 3) TF binding target wrapped in nucleosome (top), and 4) TF binding accessible target (right). TFs engaging strong motifs (orange arrow) can more easily transition from state 2 to 4 due to high affinity for transiently accessible targets, whereas TFs engaging weak motifs (teal arrow) fail to bind their targets efficiently when accessible due to low affinity, and therefore occupy state 3 as a means to eventually bind their accessible targets directly.

STAR METHODS

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for resources and reagents should be directed to the Lead Contact, Steven Henikoff (steveh@fredhutch.org).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Human Cell culture

Human female K562 Chronic Myelogenous Leukemia cells (ATCC) were authenticated for STR, sterility, human pathogenic virus testing, mycoplasma contamination, and viability at thaw. H1 (WA01) male human embryonic stem cells (hESCs) (WiCell) were authenticated for karyotype, STR, sterility, mycoplasma contamination, and viability at thaw. Cells were cultured as previously described (Janssens et al., 2018). Briefly, K562 cells were cultured in liquid suspension. H1 cells were cultured in Matrigel (Corning)-coated plates at 37°C and 5% CO₂ using mTeSR-1 Basal Medium (STEMCELL Technologies) exchanged every 24 hours. H1 cultures were differentiated to definitive endoderm (DE) using the Stem Diff Definitive Endoderm Kit (STEMCELL Technologies). Briefly, new hESCs were seeded on each day during a five-day period, and cells were incubated with DE differentiation medium according to manufacturer’s recommendations for the remainder of the time until the completion of the five-day span. Thus, cells seeded on the first day underwent the entire five-day differentiation protocol and corresponded to “Day 5” of the time course, cells seeded on the second day underwent four out of five days of differentiation and corresponded to “Day 4”, and so on until the cells seeded on the last day were not exposed to any differentiation medium, and thus were simply re-seeded hESCs and corresponded to “Day 1”. On day five, all cell cultures were harvested and used for downstream CUT&RUN or immunofluorescence.

METHOD DETAILS

Immunofluorescence

Cells were cultured as described in Experimental Model and Subject Details, and immunofluorescence staining was conducted in-well in 12-well plates at room temperature. After culturing, cells were washed once with 1 mL 1x Phosphate Buffered Saline (PBS) with gentle rocking for 5 minutes, then were incubated with 4% Paraformaldehyde (PFA) in 1 mL 1x PBS for 15 minutes with gentle rocking. Wells were rinsed once with 1x PBS, then washed twice with 1 mL 1x PBS plus 0.1% Triton-X100 (1x PBST) for 5 minutes each with gentle rocking. Wells were then incubated with 0.5 mL 1x PBST with primary antibody added (Sox17 1:100; Oct4 1:250) for 2 hours with gentle rocking. After primary antibody incubation, we rinsed wells once with 1 mL 1x PBS, then washed them twice with 1 mL 1x PBS for five minutes each with gentle rocking. We then added 0.5 mL 1x PBST with Donkey anti-Goat-Rhodamine Red conjugated secondary antibody added (1:1000) and incubated wells for 1 hour with gentle rocking. Wells were rinsed once with 1 mL 1x PBS, washed once with 1x PBS for 5 minutes with gentle rocking, then incubated with 0.5 mL 1x PBST with Goat anti-Rabbit-Cy5 conjugated secondary antibody added (1:1000) for 1 hour with gentle rocking. Wells were rinsed once with 1 mL 1x PBS, washed twice with 1 mL 1x PBS for 5 minutes each with gentle rocking, and incubated with 0.5 mL 1x PBST with DAPI (1:50000) and Phalloidin-AlexaFluor 488 (Thermo Fisher #A12379, 1:100) for 20 minutes with gentle rocking. Wells were rinsed once with 1 mL 1x PBS, washed three times with 1 mL 1x PBS for 5 minutes each with gentle rocking, then wells were imaged in 1x PBS on an EVOS FL Auto 2 Cell Imaging System (Invitrogen) with 40x magnification.

CUT&RUN

CUT&RUN was carried out as previously described (Janssens et al., 2018; Skene et al., 2018). Briefly, cells were washed and bound to Concanavalin A-coated magnetic beads, then permeabilized with Wash Buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 0.5 mM spermidine and one Roche Complete protease inhibitor tablet per 50 mL) containing 0.05% digitonin (Dig Wash), and incubated with primary antibody overnight at 4°C. Cell-bead slurry was washed twice with Dig Wash, incubated with Protein A-MNase (pA-MN) for 1 hour at 4°C, then washed twice more with Dig Wash. Slurry was then placed on an ice-cold block and incubated with Dig Wash containing 2 mM CaCl₂ to activate pA-MN digestion. After an antibody-specific incubation period, one volume of 2x Stop Buffer (340 mM NaCl, 20 mM EDTA, 4 mM EGTA, 0.05% Digitonin, 0.05 mg/mL glycogen, 5 μg/mL RNase A, 2 pg/mL heterologous spike-in DNA) was added to stop the reaction, and fragments were released by 30-minute incubation at 37°C. Samples were centrifuged 5 minutes at 16000xg, and supernatant was recovered and DNA extracted via Phenol-Chloroform extraction and ethanol precipitation. Resulting DNA was used as input for library preparation as previously described (Skene et al., 2018).

To ensure consistency in sample processing, most CUT&RUN was carried out on an automated BioMek platform (Janssens et al., 2018), though due to logistical issues some samples needed to be carried out using standard, non-automated CUT&RUN. Table S1 describes CUT&RUN protocol details for every CUT&RUN reaction presented in this paper.

ATAC-seq

ATAC-seq was carried out as previously described (Buenrostro et al., 2013), with minor changes, using the Nextera DNA Library Prep Kit (Illumina). Briefly, cells were washed once with 1 mL Wash buffer and once with 900 μL RSB Buffer (20 mM HEPES pH 7.5, 10 mM NaCl, 3 mM MgCl₂). Mixture was then spun down, resuspended in 50 μL RSB Buffer with 0.1% Tween-20, 0.1% NP-40, and 0.01% Digitonin added, and incubated on ice for 3 minutes to permeabilize. After incubation, we added 1 mL RSB + 0.1% Tween-20 and pelleted cells for 10 minutes at 500xg at 4°C. Pellet was then resuspended on ice in 50 μL TR Mix (25 μL 2x TD Buffer, 2.5 μL Tn5 Transposase, 16.5 μL 1x PBS, 1 μL 5% Digitonin, 0.5 μL 10% Tween-20, 4.5 μL ddH₂O), then transferred to a 37°C water bath for 30 minutes for transposition reaction. Reactions were stopped by adding 50 μL RSB, 2 μL 10% SDS, and 1.5 μL Proteinase K (Thermo Fisher), mixing well, and incubating at 50°C for 1 hour to digest proteins. After incubation, 100 μL ddH₂O was added, and samples were phenol-chloroform extracted, ethanol precipitated, and resulting DNA was amplified for 14 cycles with i5/i7 adapters.

Data processing

Sequencing data were aligned to the UCSC hg19 genome build using bowtie2 (Langmead and Salzberg, 2012), version 2.2.5, with parameters --end-to-end --very-sensitive --no-mixed --no-discordant -q --phred33 -I 10 -X 700. Mapped reads were converted to paired-end BED files containing coordinates for the termini of each read pair, and then converted to bedgraph files using bedtools genomecov with parameter -bg (Quinlan and Hall, 2010).

EChO methodology

EChO was devised based on our observation that CUT&RUN data often adopts a stereotyped curvature surrounding sites of TF binding: small (<120 bp) fragments at the site of direct binding, flanked by nucleosomal (>150 bp) fragments on either side. Thus, we hypothesized that a locally weighted averaging approach would adequately summarize the data across such sites. EChO uses Locally Estimated Scatterplot Smoothing (LOESS) (Cleveland, 1978), a non-parametric polynomial regression strategy that takes local subsets of the input data in a sliding window and fits polynomial functions to each subset, then estimates a continuous “Loess curve” across the entire region. We used LOESS to generate a Loess curve estimating positional fragment size, based on a scatterplot of fragment centers vs. fragment sizes for all fragments within every CUT&RUN enriched region containing more than 10 fragments.
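To make the smoothing step concrete, the following is a minimal Python sketch of a locally weighted fit of fragment size against fragment center. It is a simplification and not the EChO implementation (which uses the “loess” function in R): it assumes a local linear fit with tricube weights over the nearest span × n fragments, and the function name and example data are hypothetical.

```python
import math

def local_linear_fit(xs, ys, span, x_eval):
    """Tricube-weighted local linear regression (a simplified LOESS).

    For each evaluation point, the ceil(span * n) nearest observations are
    weighted by the tricube kernel and a straight line is fitted to them.
    """
    n = len(xs)
    k = min(n, max(2, math.ceil(span * n)))
    fitted = []
    for x0 in x_eval:
        nearest = sorted((abs(x - x0), x, y) for x, y in zip(xs, ys))[:k]
        h = nearest[-1][0]  # bandwidth: distance to the k-th nearest point
        sw = su = sy = suu = suy = 0.0
        for d, x, y in nearest:
            if h == 0:
                w = 1.0  # all selected points coincide with x0
            else:
                w = (1.0 - (d / h) ** 3) ** 3 if d < h else 0.0
            u = x - x0  # center on x0 so the intercept is the fitted value
            sw += w
            su += w * u
            sy += w * y
            suu += w * u * u
            suy += w * u * y
        if sw == 0:
            fitted.append(sum(y for _, _, y in nearest) / len(nearest))
            continue
        denom = sw * suu - su * su
        if abs(denom) < 1e-12:
            fitted.append(sy / sw)  # degenerate case: weighted mean
        else:
            slope = (sw * suy - su * sy) / denom
            fitted.append((sy - slope * su) / sw)
    return fitted

# Hypothetical region: small fragments at the center, larger ones at the flanks.
centers = list(range(-100, 101, 10))
sizes = [60 + abs(c) for c in centers]
profile = local_linear_fit(centers, sizes, span=0.3, x_eval=[0, 100])
```

In this toy region the fitted fragment size at the center is lower than at the flank, which is the curvature that the averaging approach is meant to capture.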

Standard formulations of LOESS require a “span” value between 0 and 1 that dictates the fraction of the total data to incorporate within the sliding window. Small span values incorporate less data in each window, and therefore generate “rougher” profiles that interpolate the actual data more closely, whereas large span values incorporate more data, and generate “smoother” profiles at the expense of larger residual error. Low error is desirable to summarize the data accurately; however, full interpolation of the data can result in rough curves that summarize the data poorly in between data points. Therefore, we adopted the following approach to minimize both metrics simultaneously: 1) For each enriched region, we generated Loess curves using all span values between 0.05 and 1 at intervals of 0.05; 2) For each curve, we derived standard error and roughness values, the latter calculated as the standard deviation of the first derivative of the Loess curve; 3) We transformed the error and roughness values to z-scores, plotted them on a scatterplot, and calculated the orthogonal distance from each point to the line defined by the two points representing either minimum scaled error or minimum scaled roughness; 4) The span associated with the point with the largest orthogonal distance below the line was chosen as the optimal span to calculate the Loess curve.
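The span selection procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the EChO code: it assumes scaled error is plotted on the x-axis and scaled roughness on the y-axis, that z-scores use the sample standard deviation, and that “below the line” means a smaller roughness than the line at the same error. The function names and example values are hypothetical.

```python
import math

def zscores(values):
    """Scale values to z-scores using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def select_span(spans, errors, roughness):
    """Pick the span whose (error, roughness) point lies farthest below the
    line joining the minimum-error point and the minimum-roughness point."""
    ze, zr = zscores(errors), zscores(roughness)
    idx = range(len(spans))
    i_err = min(idx, key=ze.__getitem__)
    i_rough = min(idx, key=zr.__getitem__)
    # Order the two anchor points left to right so that dx >= 0.
    (x1, y1), (x2, y2) = sorted([(ze[i_err], zr[i_err]), (ze[i_rough], zr[i_rough])])
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        return spans[i_err]  # both anchors coincide; nothing to optimize

    def distance_below(i):
        # Signed orthogonal distance; positive when the point is under the line.
        cross = dx * (zr[i] - y1) - dy * (ze[i] - x1)
        return -cross / length

    return spans[max(idx, key=distance_below)]

# Hypothetical values: small spans give low error but high roughness.
example_spans = [0.1, 0.2, 0.3, 0.4, 0.5]
example_errors = [1.0, 1.2, 1.5, 3.0, 6.0]
example_roughness = [10.0, 4.0, 2.0, 1.5, 1.0]
best_span = select_span(example_spans, example_errors, example_roughness)
```

With these illustrative numbers the selected span is the intermediate value at the “elbow” of the error vs. roughness trade-off.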

We used EChO to define “foci” in the data that represent local minima in the average fragment size Loess curves. Foci were defined as any point in the curve where the first derivative transitions from negative to positive. For downstream analysis, we considered only foci that had the smallest average fragment size of all foci in a 200 bp window on either side of it.
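The focus definition above can be sketched in Python as follows, assuming the Loess curve is available as one value per base pair. The function names and the toy curve are hypothetical, and flat stretches where the derivative is exactly zero are ignored for simplicity.

```python
def find_foci(curve):
    """Indices where the first difference changes from negative to positive."""
    diffs = [b - a for a, b in zip(curve, curve[1:])]
    return [i + 1 for i in range(len(diffs) - 1) if diffs[i] < 0 and diffs[i + 1] > 0]

def filter_foci(foci, curve, window=200):
    """Keep foci with the smallest value among all foci within `window` bp on either side."""
    kept = []
    for f in foci:
        neighbours = [g for g in foci if abs(g - f) <= window]
        if all(curve[f] <= curve[g] for g in neighbours):
            kept.append(f)
    return kept

# Toy curve with local minima at positions 100, 250 and 500.
toy_curve = [min(80 + abs(i - 100), 110 + abs(i - 250), 95 + abs(i - 500))
             for i in range(600)]
all_foci = find_foci(toy_curve)
kept_foci = filter_foci(all_foci, toy_curve)
```

Here the focus at position 250 is discarded because a focus with a smaller average fragment size lies 150 bp away, whereas the focus at position 500 has no competitor within 200 bp.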

EChO software and implementation

All software associated with EChO is provided at https://github.com/FredHutch/EChO. EChO was implemented as follows: 1) We combined all replicates for each factor-timepoint combination to maximize the number of fragments used in size calculations. 2) For each SEACR (Meers et al. 2019) peak detected in a CUT&RUN dataset, we found all CUT&RUN fragments from the same dataset that overlapped it using bedtools intersect with the “-wao” flag (Quinlan and Hall, 2010). 3) For every fragment overlapping the peak in question, we calculated the distance of the center of the fragment from the center of the peak (with negative values corresponding to upstream and positive to downstream), and paired it with the fragment size in bp to produce a table of positions vs. sizes. 4) We implemented LOESS in R across 20 different span values between 0.05 and 1 at intervals of 0.05 using the “loess” function with the position vs. size table as input, and defined the Loess curve over the peak region using the “predict” function. 5) For each span value, we used the “s” output value from “loess” to identify standard error, and calculated a “roughness” value by taking the standard deviation of the first derivative of the Loess curve with the command sd(diff(Loess curve)), to produce a table of errors vs. roughness across all span values. 6) We eliminated any span values that failed to produce a prediction due to lack of data, used the “scale” command to produce z-scores of the error and roughness values, plotted the two terms for each span value in a scatterplot, and defined a line formed by the points representing the smallest scaled error value and the smallest scaled roughness value. 7) We calculated the orthogonal distance between the line and each point, and used the span for which the orthogonal distance below the line was maximized as the designated span value for the peak in question. 
8) We determined each base pair in the Loess curve at which the first derivative of the curve transitioned from negative to positive to identify local “foci” of minimum fragment size, and reported the positions and fragment sizes in an output file. 9) We repeated steps 3–8 for every SEACR peak in the dataset.

EChO validation

To compare EChO’s automatic span selection strategy described above to other possible span selection strategies, we used all CTCF enriched regions called by SEACR in chr1 as input to EChO using either automatic span selection (“Auto”), using a constant span for all regions, or using a constant number of fragments to inform the span for each region. As an example of the latter case, given a constant of 50 fragments used for span calculation, EChO would summarize a region with 100 total fragments with a span of 0.5 (50/100), whereas a region with 200 total fragments would use a span of 0.25 (50/200). For constant span tests, we used 10 span values between 0.1 and 1 at intervals of 0.1; for constant fragment number tests, we used 10 constant fragment number values between 10 and 100 at intervals of 10.

To evaluate the performance of Auto span selection vs. other strategies, we calculated metrics estimating residual error of small fragments and degree of fragment density bias in error. To represent residual error, for each fragment in any region that was less than 120 bp, we calculated a residual error between the actual fragment size and the estimated EChO fragment size for the genomic position of the center of that fragment, took its absolute value, and took the mean of all residuals calculated within each size selection strategy. To estimate density bias, for every fragment less than 120 bp, we plotted the absolute value of the residual error against the number of fragments used to calculate the estimated fragment size for that fragment’s position, plotted a best-fit linear model for the data, and used the absolute value of the best fit line’s slope as a proxy for density bias. We also noted for each strategy whether a span could be calculated for 100% of regions, or whether there were regions that failed to produce a span value. Failures usually occurred when there was not sufficient data within the region to derive a polynomial based on the selection strategy; for instance, a constant span of 0.1 applied to a region with only 20 fragments would fail because only two fragments would be used for Loess calculation. Once we obtained these three values for each selection strategy, we plotted the error estimation on the x-axis and the bias estimation on the y-axis of a scatterplot reporting all selection strategies (Figure S3B), and color-coded each point to denote whether 100% of regions succeeded or not.
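The two evaluation metrics can be sketched in Python as follows. The sketch assumes each fragment is summarized as a tuple of observed size, EChO-estimated size at its center, and number of fragments used for that estimate; this tuple layout, the function name, and the example values are hypothetical.

```python
def error_and_bias(fragments, max_size=120):
    """Return (mean absolute residual, absolute slope of residual vs. fragment count)
    for fragments whose observed size is below max_size."""
    small = [(abs(obs - pred), n) for obs, pred, n in fragments if obs < max_size]
    residuals = [r for r, _ in small]
    counts = [n for _, n in small]
    mean_error = sum(residuals) / len(residuals)
    mean_n = sum(counts) / len(counts)
    # Ordinary least-squares slope of |residual| on fragment count.
    sxy = sum((n - mean_n) * (r - mean_error) for r, n in small)
    sxx = sum((n - mean_n) ** 2 for n in counts)
    return mean_error, abs(sxy / sxx)

# (observed size, estimated size, fragments used); the 150 bp fragment is excluded.
example_fragments = [(60, 70, 10), (72, 80, 20), (94, 100, 30),
                     (104, 100, 40), (150, 140, 25)]
mean_error, density_bias = error_and_bias(example_fragments)
```

Computing this pair of values once per span selection strategy would give the coordinates for a scatterplot of error against bias.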

EChO overlapping profile analysis

EChO overlap analysis was conducted with pairwise combinations of CUT&RUN datasets. To categorize EChO foci into “shared” or “specific”, we first concatenated BED file lists of small fragment (<120 bp) foci for two CUT&RUN experiments of interest, refined the list to foci with the smallest average fragment size of all foci for the same factor within 50 bp, and sorted the concatenated list by genomic position. We then used Bedtools (Quinlan and Hall, 2010) to filter out any foci that do not overlap an enriched peak for both factors. Finally, we classified any two foci for different factors that fell within 50 bp of each other as “shared”, and all other foci as “specific”.
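The final classification step can be sketched in Python as follows, assuming the foci for each factor are integer positions on the same chromosome that have already been filtered as described, and that “within 50 bp” is inclusive. The peak-overlap filter is omitted, and the function name and positions are hypothetical.

```python
import bisect

def classify_foci(foci_a, foci_b, max_dist=50):
    """Label each focus 'shared' if a focus of the other factor lies within max_dist bp."""
    def label(own, other):
        other = sorted(other)
        labels = {}
        for pos in own:
            # First focus of the other factor at or beyond pos - max_dist.
            i = bisect.bisect_left(other, pos - max_dist)
            shared = i < len(other) and other[i] <= pos + max_dist
            labels[pos] = "shared" if shared else "specific"
        return labels
    return label(foci_a, foci_b), label(foci_b, foci_a)

labels_a, labels_b = classify_foci([100, 500, 900], [130, 700, 951])
```

In this example only the foci at 100 and 130 are labelled shared; the foci at 900 and 951 are 51 bp apart and therefore remain specific.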

To analyze Pearson’s correlation and cross-correlation lag between pairs of CUT&RUN datasets, we did the following: 1) For two CUT&RUN datasets, we first obtained the intersection of SEACR peaks using bedops with the “-i” flag (Neph et al., 2012), and filtered out any regions of peak intersection shorter than 401 bp. 2) Next, for each region of intersection, we obtained the span values used to calculate foci in the SEACR peak for each factor that overlaps the region of intersection, used them to recalculate Loess curves across the factor-specific SEACR peaks, and returned factor-specific vectors for the Loess curve values that corresponded to the region of intersection, resulting in a table of average fragment size values with one column corresponding to each factor and a length in rows equal to the length of the region of intersection. 3) From this table, we identified the minimum fragment size positions for each factor, and extracted from the table two subsets equal to 401 rows each corresponding to a 400 bp window surrounding the minimum fragment size position for each factor. 4) For each of the two subsets, we calculated Pearson’s correlation using the “cor” function in R, and calculated cross-correlation lag using the “ccf” function, with a maximum lag equal to half the length of the region of intersection. 5) We repeated steps 2–4 for every region of intersection detected from the two CUT&RUN datasets.
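Step 4 can be illustrated with a minimal Python sketch of Pearson’s correlation and a cross-correlation lag between two fragment size profiles. This is not the R code used in the study: the lag is defined here as the shift k that maximizes the covariance between x[t] and y[t + k], a sign convention that may differ from the one used by R. The function names and the toy profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_lag(x, y, max_lag):
    """Shift k in [-max_lag, max_lag] maximizing the covariance of x[t] with y[t + k]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]

    def cov_at(k):
        return sum(dx[t] * dy[t + k] for t in range(n) if 0 <= t + k < n)

    return max(range(-max_lag, max_lag + 1), key=cov_at)

# Toy profiles: a dip in average fragment size at index 20 (x) and index 25 (y).
profile_x = [150 - 60 * math.exp(-((t - 20) / 3) ** 2) for t in range(41)]
profile_y = [150 - 60 * math.exp(-((t - 25) / 3) ** 2) for t in range(41)]
r_value = pearson(profile_x, profile_y)
lag = best_lag(profile_x, profile_y, max_lag=20)
```

A nonzero lag indicates that the fragment size minima of the two factors are offset from one another within the shared region.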

EChO pioneer binding analysis

For EChO pioneer TF analysis, we incorporated three datasets from the same factor corresponding to initial, intervening, and terminal time points, and processed them as follows: 1) Foci of small average fragment size (<120 bp) from the terminal time point were filtered to include only foci with the smallest average fragment size of all foci within 50 bp. 2) We used bedtools intersect with the -c flag between the list of filtered foci and BED files for all CUT&RUN fragments for each factor-timepoint combination in the time course to filter out any foci with fewer than 20 fragments from any factor-timepoint combination mapping in a 400 bp window around the focus. 3) We intersected the fully filtered list of foci with peaks from the terminal time point using bedtools intersect with the “-wao” flag, and generated a BED file with coordinates for each peak that overlapped each focus, with the first focus coordinate included as the fourth field. 4) For each time point, we implemented EChO for each peak region, and generated a matrix of average fragment size values for every focus at every base pair in a 400 bp window around the focus. 5) For each time point, we defined minimum fragment size at the focus as the minimum fragment size in a 100 bp window around each focus. The corresponding minimum values for the initial time point were used to define quartiles for downstream analysis.

Data Visualization

Data plots were produced using R (https://www.r-project.org) core plotting utilities, ggplot2 (https://ggplot2.tidyverse.org), and Deeptools (Ramirez et al., 2016).

QUANTIFICATION AND STATISTICAL ANALYSIS

Correlation and clustering analyses

For both factor-timepoint correlation analysis and enhancer clustering analysis, we assessed overlap of peaks from 26 factor-timepoint combinations via bedops -i, and we merged adjacent regions and designated the entire merged region as being overlapped by any factor-timepoint combination that overlapped a component of the merged region. This analysis resulted in 82472 distinct regions overlapping at least one of the factor-timepoint combinations, and we generated a binary matrix of 82472 rows and 26 columns indicating the overlap status of every factor-timepoint combination in every region. For correlation analysis, we evaluated the Jaccard distance between each pair of factor-timepoint combinations, and generated a hierarchically clustered dendrogram based on those distance values. For enhancer clustering analysis, we used K-modes clustering by implementing the “kmodes” R command on 82472 regions, and used the “knee” method with within-group distance to select an optimal number of clusters.
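The Jaccard distance between two factor-timepoint columns of the binary overlap matrix can be sketched in Python as follows. The tiny matrix is hypothetical (the study’s matrix has 82472 rows and 26 columns), and the function names are illustrative.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two binary vectors of equal length."""
    inter = sum(1 for u, v in zip(a, b) if u and v)
    union = sum(1 for u, v in zip(a, b) if u or v)
    return 1.0 - inter / union if union else 0.0

def pairwise_distances(matrix):
    """Jaccard distance between every pair of columns of a binary matrix (rows = regions)."""
    columns = list(zip(*matrix))
    return {(i, j): jaccard_distance(columns[i], columns[j])
            for i in range(len(columns)) for j in range(i + 1, len(columns))}

# Hypothetical overlap matrix: 5 regions (rows) by 3 factor-timepoint combinations.
overlap = [[1, 1, 0],
           [1, 0, 0],
           [0, 0, 1],
           [0, 1, 1],
           [1, 1, 0]]
distances = pairwise_distances(overlap)
```

The resulting pairwise distances are the kind of input that a hierarchical clustering routine would take to build the dendrogram.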

Gaussian mixture model analysis

We used the normalmixEM command from the mixtools package in R (https://cran.r-project.org/web/packages/mixtools/index.html) to generate a tripartite Gaussian mixture model (GMM) that summarized a vector of average fragment sizes at CTCF foci from CTCF CUT&RUN data from K562 cells (Janssens et al., 2018). Three Gaussians were selected on the basis of best fit by log likelihood. Foci were assigned to one of the three Gaussians by selecting the one for which the posterior probability as reported by normalmixEM was the highest.
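The assignment step can be sketched in Python as follows, assuming the mixture weights, means, and standard deviations have already been estimated. The parameter values below are hypothetical and are not the fitted values from the CTCF data; the function names are illustrative.

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def assign_component(x, weights, means, sds):
    """Return (index of the component with the highest posterior, list of posteriors)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    posteriors = [j / total for j in joint]
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return best, posteriors

# Hypothetical three-component model of average fragment size (bp).
gmm_weights = [0.4, 0.3, 0.3]
gmm_means = [80.0, 150.0, 250.0]
gmm_sds = [15.0, 20.0, 40.0]
groups = [assign_component(size, gmm_weights, gmm_means, gmm_sds)[0]
          for size in (90.0, 160.0, 300.0)]
```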

Motif analyses

For all motif analyses, we used bedtools getfasta to generate FASTA files for designated windows around each group of foci being analyzed. For FIMO analyses, we used fimo with a -thresh value of 0.05 to capture all motif configurations, and then filtered the output file so that it contained only the single motif with the highest FIMO motif score (column 6 in the output) for each FASTA entry. For CTCF motif analysis, we used MEME to call de novo motifs in a 40 bp window around CTCF K562 foci belonging to each of the three groups identified by GMM. A strong CTCF motif was only identified in Group 1 (small-fragment foci); nevertheless, we used the position-weight matrix (PWM) for that motif as an input for FIMO (Grant et al., 2011) to test each of the GMM groups for the strength of the strongest CTCF motifs identified in each window. For other motif analyses focused on known motifs, including for GATA4, EOMES, Sox17, and FoxA2, we searched for motif PWMs from the JASPAR database (Khan et al., 2018), and selected PWMs with a preference for those derived from human systems or from ChIP-seq data where available. The JASPAR accession numbers for the selected motifs are as follows: GATA4: MA0482.1 (mouse ChIP-seq); EOMES: MA0800.1 (human SELEX); Sox17: MA0078.1 (mouse SELEX); FoxA2: MA0047.2 (mouse ChIP-seq). We then reduced the reported PWMs to refined PWMs that were defined by the outermost bases whose information content was greater than 1.0 bits; the threshold for Sox17 was loosened to 0.5 bits in order to select a sequence length comparable to the other three. The resulting refined PWMs correspond to the following base positions from the reported JASPAR motifs: GATA4: positions 2–8 (7 bp); EOMES: positions 2–9 (8 bp); Sox17: positions 3–9 (7 bp); FoxA2: positions 1–7 (7 bp). These refined PWMs were used as input to FIMO analysis as described above. For de novo motif analysis, we used DREME (Bailey, 2011) with default settings for each group of FASTA sequences tested.
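The PWM refinement step can be sketched in Python as follows. The sketch assumes a uniform background and no small-sample correction, so that the information content of a position is 2 + Σ p·log2(p) bits; the toy PWM and the function names are hypothetical and do not correspond to any JASPAR matrix.

```python
import math

def information_content(column):
    """Information content in bits of one PWM column (base probabilities)."""
    return 2.0 + sum(p * math.log2(p) for p in column if p > 0)

def refine_pwm(pwm, threshold=1.0):
    """Return 1-based (start, end) of the outermost positions whose
    information content exceeds the threshold, or None if no position does."""
    above = [i + 1 for i, col in enumerate(pwm) if information_content(col) > threshold]
    return (above[0], above[-1]) if above else None

# Toy PWM: one row per position, probabilities for A, C, G, T.
toy_pwm = [
    [0.25, 0.25, 0.25, 0.25],  # uninformative
    [0.97, 0.01, 0.01, 0.01],  # strongly conserved
    [0.50, 0.50, 0.00, 0.00],  # exactly 1 bit
    [0.01, 0.01, 0.01, 0.97],  # strongly conserved
    [0.25, 0.25, 0.25, 0.25],  # uninformative
    [0.70, 0.10, 0.10, 0.10],  # weakly conserved
]
strict_window = refine_pwm(toy_pwm, threshold=1.0)
loose_window = refine_pwm(toy_pwm, threshold=0.5)
```

Interior positions between the outermost qualifying bases are retained regardless of their own information content, and lowering the threshold extends the window, as described for Sox17.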

Statistical testing

For comparison of motif score and motif offset distributions in Figure 6B and Figure S7E, respectively, we used two-sample Kolmogorov-Smirnov (KS) tests to derive p-values designating the likelihood of the two sets of values being drawn from the same distribution. For comparison between quartiles in Figure S6B (distance between density distributions at foci) and Figure S7E (percentage of foci overlapping the “All” enhancer module), we randomly reassigned foci to quartiles 100 times (Figure S7E) or 1000 times (Figure S6B), recomputed the relevant metrics for each of the random assignments, and assigned a p-value that was equal to the percentile in which the experimental value fell relative to the distribution of values from randomized data.
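The quartile randomization described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the metric (`quartile_overlap_gap`), the function names, and the toy data are hypothetical, and the p-value is taken as the fraction of randomized values at least as large as the observed one.

```python
import random


def quartile_overlap_gap(overlaps, quartiles):
    """Hypothetical metric: overlap fraction in quartile 4 minus quartile 1."""
    def fraction(q):
        hits = [o for o, label in zip(overlaps, quartiles) if label == q]
        return sum(hits) / len(hits)
    return fraction(4) - fraction(1)


def permutation_percentile(values, labels, metric, n_perm=1000, seed=0):
    """Compare an observed metric with its distribution under random labels.

    Labels are shuffled `n_perm` times (quartile sizes are preserved),
    the metric is recomputed each time, and the p-value is the fraction
    of randomized values that are >= the observed value.
    """
    rng = random.Random(seed)
    observed = metric(values, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(metric(values, shuffled))
    p_value = sum(1 for x in null if x >= observed) / n_perm
    return observed, p_value


# Toy data: 40 foci in four equal quartiles; only quartile 4 overlaps.
quartiles = [q for q in (1, 2, 3, 4) for _ in range(10)]
overlaps = [label == 4 for label in quartiles]
observed, p_value = permutation_percentile(
    overlaps, quartiles, quartile_overlap_gap, n_perm=1000)
print(observed, p_value)
```

With a finite number of randomizations the smallest resolvable nonzero p-value is 1/`n_perm`, which is one reason to choose 1000 rather than 100 reassignments. For the two-sample KS comparisons, `scipy.stats.ks_2samp` is one readily available implementation if SciPy is installed.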

DATA AND SOFTWARE AVAILABILITY

All software required for implementation of SEACR is provided at https://github.com/FredHutch/SEACR, and software required for implementation of EChO is provided at https://github.com/FredHutch/EChO. All sequencing data have been deposited in GEO under accession code GSE128499.

Supplementary Material

NIHMS1532243-supplement-1.pdf (14.9 MB, pdf)

Table S1, related to Figure 1: Descriptions of all CUT&RUN reactions presented in this study. Details are listed in “Table S1 Legend” tab.

NIHMS1532243-supplement-2.xlsx (53.3 KB, xlsx)

Data S1, related to STAR Methods: Scripts for implementing EChO. EChO.fragsize.matrix.pl and EChO.fragsize.matrix.R are used to implement EChO as described in “EChO software and implementation” in STAR Methods. EChO.fragsize.corr.sh and EChO.fragsize.matrix.R are used to carry out EChO overlap analysis as described in “EChO overlapping profile analysis” in STAR Methods. Updated versions of these scripts and detailed usage guidelines are available at https://github.com/FredHutch/EChO.

NIHMS1532243-supplement-3.zip (5.9 KB, zip)

KEY RESOURCES TABLE

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Antibodies
Rabbit polyclonal anti-CTCF	Millipore	Cat# 07–729; Lot# 2971000; RRID: AB_441965
Rabbit polyclonal anti-FoxA2	Millipore	Cat# 07–633; Lot# 2890845; RRID: AB_390153
Rabbit polyclonal anti-Sox2	Abcam	Cat# ab92494; Lot# GR305857–10; RRID: AB_10585428
Rabbit polyclonal anti-FoxA1	Abcam	Cat# ab23738; Lot# GR292351–1; RRID: AB_2104842
Guinea Pig anti-Rabbit IgG	Antibodies-online	Cat# ABIN101961; Lot# 29313; RRID: AB_10775589
Rabbit monoclonal anti-H3K27me3	Cell Signaling Technology	Cat# 9733; Lot# 8; RRID: AB_2616029
Rabbit polyclonal anti-H3K27ac	Abcam	Cat# ab4729; Lot# GR3205526–1; RRID: AB_2118291
Rabbit polyclonal anti-H3K4me2	Upstate (Millipore)	Cat# 07–030; Lot# 26335; RRID: AB_11213050
Rabbit polyclonal anti-Oct4	Abcam	Cat# ab109183; Lot# GR120970–6; RRID: AB_10864777
Rabbit polyclonal anti-SUZ12	Abcam	Cat# ab12073; Lot# GR320578–1; RRID: AB_442939
Rabbit monoclonal anti-Nanog	Abcam	Cat# ab109250; Lot# GR247237–24; RRID: AB_10863442
Rabbit anti-mouse IgG	Abcam	Cat# ab46540; RRID: AB_2614925
Rabbit polyclonal anti-EOMES	Abcam	Cat# ab23345; Lot# GR3234551–1; RRID: AB_778267
Mouse monoclonal anti-H3.3	Abnova	Cat# H00003021-A01; Lot# 10316–51; RRID: AB_461517
Mouse monoclonal anti-GATA4	Santa Cruz Biotechnology	Cat# sc-25310X; Lot# H0918; RRID: AB_627667
Goat polyclonal anti-Sox17	R&D Systems	Cat# AF1924; Lot# KGA0916121; RRID: AB_355060
Goat polyclonal anti-Brachyury	R&D Systems	Cat# AF2085; Lot# KQP0617121; RRID: AB_2200235
Rabbit anti-Goat IgG	Abcam	Cat# ab6697; RRID: AB_955988
Donkey anti-Goat-Rhodamine Red	Jackson ImmunoResearch	Cat# 705–295-147
Goat anti-Rabbit-Cy5	Jackson ImmunoResearch	Cat# 111–175-144; RRID: AB_2338013
Critical Commercial Assays
mTeSR-1 Basal Medium	StemCell Technologies	Cat# 85850
StemDiff Definitive Endoderm Kit	StemCell Technologies	Cat# 05110
KAPA HiFi HotStart PCR Kit	KAPA Biosystems	Cat# KK2502
Nextera DNA Library Prep Kit	Illumina	Cat# FC-121–1030
Deposited Data
Raw and analyzed data	This paper	GEO: GSE128499
CTCF CUT&RUN data from K562 cells	Janssens et al. 2018	GEO: GSE120011
CTCF ChIP-exo data from K562 cells	Rhee et al. 2011	SRA: SRA044886
Experimental Models: Cell Lines
WA01 human embryonic stem cells	WiCell	Cat# WA01; Lot# WB35186; RRID: CVCL_9771
K562 Chronic myelogenous leukemia cell line	ATCC	Cat# CCL-243; RRID: CVCL_0004
Software and Algorithms
SEACR: Sparse Enrichment Analysis for CUT&RUN	Meers et al. 2019	https://github.com/FredHutch/SEACR
EChO: Enhanced Chromatin Occupancy	This paper	https://github.com/FredHutch/EChO
Bowtie2	Langmead and Salzberg, 2012	http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
Bedtools	Quinlan and Hall, 2010	https://bedtools.readthedocs.io/en/latest/
R	R project	https://www.r-project.org


Highlights

CUT&RUN maps transcription factor binding during stem cell differentiation.
EChO shows that pioneer factors bind to both accessible DNA and nucleosomes.
Pioneer factor-nucleosome interactions are dictated by binding motif strength.
Pioneer nucleosome binding occurs at previously bound accessible sites.

Acknowledgements

We thank Kami Ahmad, Antoine Molaro, and members of the Henikoff Lab for critical reading of the manuscript, Terri Bryson for help with cell culture, Christine Codomo for help preparing sequencing libraries, and Jorja Henikoff for help with alignment and processing of sequencing data. This work was supported by the Howard Hughes Medical Institute and a grant from the National Institutes of Health (4DN TCPA A093).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Declaration of Interests

The authors declare no competing interests.

References

Arnosti DN, and Kulkarni MM (2005). Transcriptional enhancers: Intelligent enhanceosomes or flexible billboards? J Cell Biochem 94, 890–898.
Bailey TL (2011). DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659.
Bailey TL, and Machanick P (2012). Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res 40, e128.
Blackwood EM, and Eisenman RN (1991). Max: a helix-loop-helix zipper protein that forms a sequence-specific DNA-binding complex with Myc. Science 251, 1211–1217.
Boyle AP, Song L, Lee BK, London D, Keefe D, Birney E, Iyer VR, Crawford GE, and Furey TS (2011). High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res 21, 456–464.
Buenrostro JD, Giresi PG, Zaba LC, Chang HY, and Greenleaf WJ (2013). Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10, 1213–1218.
Cernilogar FM, Hasenöder S, Wang Z, Scheibner K, Burtscher I, Sterr M, Smialowski P, Groh S, Evenroed IM, Gilfillan GD, et al. Pre-marked chromatin and transcription factor co-binding shape the pioneering activity of FoxA2. bioRxiv, 607721.
Chereji RV, Ramachandran S, Bryson TD, and Henikoff S (2018). Precise genome-wide mapping of single nucleosomes and linkers in vivo. Genome Biol 19, 19.
Chronis C, Fiziev P, Papp B, Butz S, Bonora G, Sabri S, Ernst J, and Plath K (2017). Cooperative Binding of Transcription Factors Orchestrates Reprogramming. Cell 168, 442–459 e420.
Cirillo LA, Lin FR, Cuesta I, Friedman D, Jarnik M, and Zaret KS (2002). Opening of compacted chromatin by early developmental transcription factors HNF3 (FoxA) and GATA-4. Mol Cell 9, 279–289.
Cleveland WS (1978). Robust Locally Weighted Regression and Smoothing Scatterplots. J Am Stat Assoc 74, 829–836.
Grant CE, Bailey TL, and Noble WS (2011). FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018.
Henikoff JG, Belsky JA, Krassovsky K, MacAlpine DM, and Henikoff S (2011). Epigenome characterization at single base-pair resolution. Proc Natl Acad Sci U S A 108, 18318–18323.
Iwafuchi-Doi M, Donahue G, Kakumanu A, Watts JA, Mahony S, Pugh BF, Lee D, Kaestner KH, and Zaret KS (2016). The Pioneer Transcription Factor FoxA Maintains an Accessible Nucleosome Configuration at Enhancers for Tissue-Specific Gene Activation. Mol Cell 62, 79–91.
Janssens DH, Wu SJ, Sarthy JF, Meers MP, Myers CH, Olson JM, Ahmad K, and Henikoff S (2018). Automated in situ chromatin profiling efficiently resolves cell types and gene regulatory programs. Epigenetics Chromatin 11, 74.
Johnson TA, Chereji RV, Stavreva DA, Morris SA, Hager GL, and Clark DJ (2018). Conventional and pioneer modes of glucocorticoid receptor interaction with enhancer chromatin in vivo. Nucleic Acids Res 46, 203–214.
Joshi R, Passner JM, Rohs R, Jain R, Sosinsky A, Crickmore MA, Jacob V, Aggarwal AK, Honig B, and Mann RS (2007). Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell 131, 530–543.
Kent NA, Adams S, Moorhouse A, and Paszkiewicz K (2011). Chromatin particle spectrum analysis: a method for comparative chromatin structure analysis using paired-end mode next-generation DNA sequencing. Nucleic Acids Res 39, e26.
Khan A, Fornes O, Stigliani A, Gheorghe M, Castro-Mondragon JA, van der Lee R, Bessy A, Cheneby J, Kulkarni SR, Tan G, et al. (2018). JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res 46, D1284.
Kornberg RD, and Lorch Y (1999). Twenty-five years of the nucleosome, fundamental particle of the eukaryote chromosome. Cell 98, 285–294.
Lai B, Gao W, Cui K, Xie W, Tang Q, Jin W, Hu G, Ni B, and Zhao K (2018). Principles of nucleosome organization revealed by single-cell micrococcal nuclease sequencing. Nature 562, 281–285.
Langmead B, and Salzberg SL (2012). Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359.
Li J, Dantas Machado AC, Guo M, Sagendorf JM, Zhou Z, Jiang L, Chen X, Wu D, Qu L, Chen Z, et al. (2017). Structure of the Forkhead Domain of FOXA2 Bound to a Complete DNA Consensus Site. Biochemistry 56, 3745–3753.
Liu N, Hargreaves VV, Zhu Q, Kurland JV, Hong J, Kim W, Sher F, Macias-Trevino C, Rogers JM, Kurita R, et al. (2018). Direct Promoter Repression by BCL11A Controls the Fetal to Adult Hemoglobin Switch. Cell 173, 430–442 e417.
Meers MP, Bryson TD, and Henikoff S (2019). A streamlined protocol and analysis pipeline for CUT&RUN chromatin profiling. bioRxiv, 569129.
Moorman C, Sun LV, Wang J, de Wit E, Talhout W, Ward LD, Greil F, Lu XJ, White KP, Bussemaker HJ, et al. (2006). Hotspots of transcription factor colocalization in the genome of Drosophila melanogaster. Proc Natl Acad Sci U S A 103, 12027–12032.
Morris SA, Baek S, Sung MH, John S, Wiench M, Johnson TA, Schiltz RL, and Hager GL (2014). Overlapping chromatin-remodeling systems collaborate genome wide at dynamic chromatin transitions. Nat Struct Mol Biol 21, 73–81.
Neph S, Kuehn MS, Reynolds AP, Haugen E, Thurman RE, Johnson AK, Rynes E, Maurano MT, Vierstra J, Thomas S, et al. (2012). BEDOPS: high-performance genomic feature operations. Bioinformatics 28, 1919–1920.
Noll M (1974). Subunit structure of chromatin. Nature 251, 249–251.
Polach KJ, and Widom J (1995). Mechanism of protein access to specific DNA sequences in chromatin: a dynamic equilibrium model for gene regulation. J Mol Biol 254, 130–149.
Quinlan AR, and Hall IM (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842.
Ramachandran S, Ahmad K, and Henikoff S (2017). Transcription and Remodeling Produce Asymmetrically Unwrapped Nucleosomal Intermediates. Mol Cell 68, 1038–1053 e1034.
Ramachandran S, and Henikoff S (2016). Transcriptional Regulators Compete with Nucleosomes Post-replication. Cell 165, 580–592.
Ramirez F, Ryan DP, Gruning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dundar F, and Manke T (2016). deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 44, W160–165.
Remenyi A, Lins K, Nissen LJ, Reinbold R, Scholer HR, and Wilmanns M (2003). Crystal structure of a POU/HMG/DNA ternary complex suggests differential assembly of Oct4 and Sox2 on two enhancers. Genes Dev 17, 2048–2059.
Rhee HS, and Pugh BF (2011). Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419.
Sekiya T, Muthurajan UM, Luger K, Tulin AV, and Zaret KS (2009). Nucleosome-binding affinity as a primary determinant of the nuclear mobility of the pioneer transcription factor FoxA. Genes Dev 23, 804–809.
Skene PJ, Henikoff JG, and Henikoff S (2018). Targeted in situ genome-wide profiling with high efficiency for low cell numbers. Nat Protoc 13, 1006–1019.
Skene PJ, and Henikoff S (2017). An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. Elife 6.
Soufi A, Garcia MF, Jaroszewicz A, Osman N, Pellegrini M, and Zaret KS (2015). Pioneer transcription factors target partial DNA motifs on nucleosomes to initiate reprogramming. Cell 161, 555–568.
Spitz F, and Furlong EE (2012). Transcription factors: from enhancer binding to developmental control. Nat Rev Genet 13, 613–626.
St Johnston D, and Nusslein-Volhard C (1992). The origin of pattern and polarity in the Drosophila embryo. Cell 68, 201–219.
Teo AK, Arnold SJ, Trotter MW, Brown S, Ang LT, Chng Z, Robertson EJ, Dunn NR, and Vallier L (2011). Pluripotency factors regulate definitive endoderm specification through eomesodermin. Genes Dev 25, 238–250.
Thanos D, and Maniatis T (1995). Virus induction of human IFN beta gene expression requires the assembly of an enhanceosome. Cell 83, 1091–1100.
Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B, et al. (2012). The accessible chromatin landscape of the human genome. Nature 489, 75–82.
Tsankov AM, Gu H, Akopian V, Ziller MJ, Donaghey J, Amit I, Gnirke A, and Meissner A (2015). Transcription factor binding dynamics during human ES cell differentiation. Nature 518, 344–349.
Voss TC, and Hager GL (2014). Dynamic regulation of transcriptional states by chromatin and transcription factors. Nat Rev Genet 15, 69–81.
Wang J, Rao S, Chu J, Shen X, Levasseur DN, Theunissen TW, and Orkin SH (2006). A protein interaction network for pluripotency of embryonic stem cells. Nature 444, 364–368.
Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, and Young RA (2013). Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153, 307–319.
Yan J, Enge M, Whitington T, Dave K, Liu J, Sur I, Schmierer B, Jolma A, Kivioja T, Taipale M, et al. (2013). Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites. Cell 154, 801–813.
Zaret KS, and Carroll JS (2011). Pioneer transcription factors: establishing competence for gene expression. Genes Dev 25, 2227–2241.
Zhu F, Farnung L, Kaasinen E, Sahu B, Yin Y, Wei B, Dodonova SO, Nitta KR, Morgunova E, Taipale M, et al. (2018). The interaction landscape between transcription factors and the nucleosome. Nature 562, 76–81.
Zhu Q, Liu N, Orkin SH, and Yuan G-C (2019). CUT&RUNTools: a flexible pipeline for CUT&RUN processing and footprint analysis. bioRxiv, 529081.
