Author manuscript; available in PMC: 2025 Aug 8.
Published in final edited form as: Mol Cell. 2024 Jul 16;84(15):2838–2855.e10. doi: 10.1016/j.molcel.2024.06.022

Systematic dissection of sequence features affecting binding specificity of a pioneer factor reveals binding synergy between FOXA1 and AP-1

Cheng Xu 1,2,4, Holly Kleinschmidt 1,2,4, Jianyu Yang 1,2, Erik M Leith 1,2, Jenna Johnson 1, Song Tan 1,2, Shaun Mahony 1,2, Lu Bai 1,2,3,5,*
PMCID: PMC11334613  NIHMSID: NIHMS2010578  PMID: 39019045

SUMMARY

Despite the unique ability of pioneer factors (PFs) to target nucleosomal sites in closed chromatin, they only bind a small fraction of their genomic motifs. The underlying mechanism of this selectivity is not well understood. Here, we design a high-throughput assay called chromatin immunoprecipitation with integrated synthetic oligonucleotides (ChIP-ISO) to systematically dissect sequence features affecting the binding specificity of a classic PF, FOXA1, in human A549 cells. Combining ChIP-ISO with in vitro and neural network analyses, we find that (1) FOXA1 binding is strongly affected by co-binding transcription factors (TFs) AP-1 and CEBPB; (2) FOXA1 and AP-1 show binding cooperativity in vitro; (3) FOXA1’s binding is determined more by local sequences than chromatin context, including eu-/heterochromatin; and (4) AP-1 is partially responsible for differential binding of FOXA1 in different cell types. Our study presents a framework for elucidating genetic rules underlying PF binding specificity and reveals a mechanism for context-specific regulation of its binding.

In brief

Pioneer factors only bind a subset of their genomic motifs in a cell-type-specific manner, which is critical for cell differentiation and development. Xu and Kleinschmidt et al. reveal that co-factor AP-1 plays a key role in directing FOXA1 binding in A549 cells and is partially responsible for its context specificity.

INTRODUCTION

Sequence-specific transcription factors (TFs) are major regulators of gene expression. Characterization of the location and strength of TF binding in the genome is therefore a critical step in understanding gene regulation. TF binding sites are typically identified using the weight matrices of their binding motifs. In higher eukaryotes, however, this method has weak predictive power for actual TF binding events. Many TFs bind <1% of their motifs across the genome, and their binding patterns can change in a cell-type-specific manner.1–3 Multiple features beyond the core sequence motif have been proposed to contribute to this phenomenon, including DNA shape,4,5 cooperative binding with other TFs,6–9 DNA methylation,10,11 nucleosome occupancy and chromatin accessibility,12,13 histone modifications,14,15 3D genome contacts,16 and variations in local TF concentrations.17,18 Among these potential factors, nucleosomes have a major inhibitory effect on the binding of many TFs, and chromatin accessibility tends to be correlated with TF binding.13,19–21

A subset of TFs known as “pioneer factors” (PFs) can stably associate with nucleosomal templates by recognizing partial sequence motifs and/or interacting with histones.15,22–25 Inside cells, PFs can overcome the nucleosomal barrier by targeting nucleosome-embedded motifs and generating accessible chromatin, which enables the binding of other TFs and triggers transcriptional activation.26–29 Given their ability to open chromatin in vivo and bind nucleosomal DNA in vitro, PFs should be able to access most, if not all, consensus motifs in the genome. This is indeed the case for PFs in budding yeast30 (Figure S1A). Surprisingly, like canonical TFs, PFs in higher eukaryotes also show highly selective and cell-type-specific binding. For example, FOXA1 is a classic PF capable of binding and opening highly compacted chromatin,31–34 but it only occupies 3.7% of its potential motifs in MCF-7 cells, and less than half of these binding events are shared with LNCaP cells.35 Our analysis found that only 10%–20% of consensus motifs are bound by FOXA1 in MCF-7 and A549 cells (Figure S1B). The molecular mechanism underlying such binding selectivity is not well understood.

TF binding is usually studied in the context of the native genome, where each binding event can be affected by multiple variables and individual effects are therefore hard to dissect. Here, we overcome this limitation by developing a method named “chromatin immunoprecipitation with integrated synthetic oligonucleotides (ChIP-ISO)” and apply this method to study FOXA1 binding in human A549 lung cancer cells. In this method, we engineer specific genetic features into synthetic sequences, integrate them into a fixed genomic locus, and measure FOXA1 binding in this highly controlled genetic and epigenetic context. In combination with in vitro and neural network analyses, our work reveals key determinants of PF binding, which has implications for PF function during development and differentiation.

RESULTS

ChIP-ISO assay allows highly parallel measurements of FOXA1 binding to thousands of integrated synthetic sequences

FOXA1 is expressed in A549 cells, where it plays important physiological roles.36,37 To study FOXA1 binding specificity in this cell line, we performed the ChIP-ISO procedure as shown in Figures 1A and S1C–S1J. Briefly, a synthetic oligo library (length: 193 bp, complexity: 3,203) was inserted into a truncated CCND1 enhancer (CCND1e),38 where all endogenous FOXA1 motifs are deleted. These sequences contain variations in genetic features that potentially affect FOXA1 binding (Table S1). The resulting plasmid library was integrated into the AAVS1 locus in the human genome through CRISPR-Cas9, and FOXA1 binding to these sequences was measured by ChIP followed by amplicon sequencing (STAR Methods). The same amplicon sequencing was performed on purified genomic sequences as the input control. Because the synthetic sequences associated with FOXA1 are enriched by ChIP, for each library sequence, the ratio of the read count in the ChIP sample to that in the input sample was used as a measure of FOXA1 binding strength (referred to as the “ChIP-ISO signal” below) (STAR Methods).
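The ChIP/input ratio described above can be illustrated with a minimal sketch. The depth normalization, the `min_input_reads` filter, and all names and counts below are assumptions made for illustration; the actual procedure is specified in the STAR Methods and may differ.

```python
def chip_iso_signal(chip_counts, input_counts, min_input_reads=10):
    """Per-sequence ChIP/input ratio after scaling each sample by its total reads.

    Both the depth scaling and the min_input_reads cutoff are illustrative
    assumptions, not the published processing steps.
    """
    chip_total = sum(chip_counts.values())
    input_total = sum(input_counts.values())
    signal = {}
    for seq_id, n_input in input_counts.items():
        if n_input < min_input_reads:
            continue  # too few input reads for a stable ratio
        chip_frac = chip_counts.get(seq_id, 0) / chip_total
        input_frac = n_input / input_total
        signal[seq_id] = chip_frac / input_frac
    return signal


# Hypothetical read counts for three library sequences.
chip = {"seq_A": 300, "seq_B": 100, "seq_C": 5}
inp = {"seq_A": 100, "seq_B": 100, "seq_C": 3}
signals = chip_iso_signal(chip, inp)
```

In this toy input, `seq_C` is dropped for low input coverage, and `seq_A` scores three times higher than `seq_B` because it has three times the ChIP reads at equal input.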

Figure 1. ChIP-ISO assay allows highly parallel measurements of FOXA1 binding to thousands of integrated synthetic sequences.

(A) Workflows for ChIP-ISO (1), EMSA-seq (2), and neural network analysis (3).

(B) Reproducibility of FOXA1 ChIP-ISO signal across two biological replicates. Black/gray dots represent sequences containing WT/mutated FOXA1 motifs. r: Pearson correlation.

(C) Histogram of the FOXA1 ChIP-ISO signals with the entire ISO library, fit by two Gaussian peaks (green and yellow: low and high peaks, respectively; red: superposition of the two).

(D) Comparison of FOXA1 ChIP-ISO signals with low-throughput ChIP-qPCR signals from three biological replicates over three individual ISO library sequences.

We obtained ChIP-ISO signals for 1,882 library sequences with high confidence (STAR Methods). We then performed a few tests to evaluate the ChIP-ISO method. Two biological replicates agree with an overall correlation coefficient of 0.78 (Figure 1B). The ChIP-ISO signals follow a bimodal distribution, with the lower peak representing sequences with no or low-level FOXA1 binding (Figure 1C). As expected, sequences with mutated FOXA1 motifs show lower binding (Figure 1B). Furthermore, we constructed three cell lines, each containing a single library sequence integrated into the AAVS1 site, and measured FOXA1 binding by ChIP followed by quantitative PCR (ChIP-qPCR). These low-throughput measurements agree well with the high-throughput results (Figure 1D). We therefore conclude that the ChIP-ISO method can accurately and efficiently measure FOXA1 binding to integrated synthetic sequences.
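Replicate agreement of the kind reported above is a Pearson correlation over per-sequence signals. A minimal sketch follows; the replicate values are hypothetical and the function name is ours.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ChIP-ISO signals for the same sequences in two replicates.
replicate_1 = [0.8, 1.2, 3.5, 6.1, 0.9]
replicate_2 = [1.0, 1.1, 2.9, 7.0, 1.3]
r = pearson_r(replicate_1, replicate_2)
```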

Co-binding of AP-1 strongly enhances FOXA1 binding to the CCND1e

The endogenous CCND1e in A549 cells is bound by FOXA1 and accompanied by high chromatin accessibility and H3K27ac signals38 (Figure S2A). It contains three FOXA1 motifs, with scores of 8.08, 8.56, and 13.37, respectively, as well as conserved binding sites of eight other TFs (Figures 2A and S2B). The first set of the library includes CCND1e mutants. Figure 2B shows the ChIP-ISO measurement on sequences with scanning mutations, where a 10-bp window is sequentially scrambled with a 3-bp step size. The most prominent drop in FOXA1 occupancy is observed when a region near the third FOXA1 motif is scrambled, indicating that this region contains key elements that recruit FOXA1.
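The scanning-mutation design (10-bp window, 3-bp step) can be sketched as follows. Shuffling the bases inside the window is an assumption about how "scrambled" is implemented, and the 193-bp input here is a random placeholder, not the CCND1e sequence.

```python
import random


def scanning_scrambles(seq, window=10, step=3, seed=0):
    """Return (start, variant) pairs, each with one window of bases shuffled.

    Shuffling preserves base composition inside the window; the published
    library may scramble differently (assumption).
    """
    rng = random.Random(seed)
    variants = []
    for start in range(0, len(seq) - window + 1, step):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        variant = seq[:start] + "".join(block) + seq[start + window:]
        variants.append((start, variant))
    return variants


# Random 193-bp placeholder sequence (not the real enhancer).
base_rng = random.Random(1)
placeholder = "".join(base_rng.choice("ACGT") for _ in range(193))
variants = scanning_scrambles(placeholder)
```

With a 193-bp template, window starts run from 0 to 183 in steps of 3, giving 62 variants under these settings.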

Figure 2. Co-binding of AP-1 strongly enhances FOXA1 binding to the CCND1e.

(A) Map of the 193-bp portion of the CCND1e explored in this study (chr11:69,654,913–69,655,105). Three FOXA1 motifs (orientations depicted by arrow directions) and motifs of potential co-binding TFs are labeled.

(B) ChIP-ISO signals over CCND1e variants containing scrambled sequences in a 10-bp moving window (step size: 3 bp). Bars are the averaged signals between two biological replicates (shown in light and dark gray, respectively). X: missing data. Gray box highlights an area where the scrambles lead to particularly low FOXA1 ChIP-ISO signals, indicating that these sequences are critical for FOXA1 binding.

(C) The effect on FOXA1 binding by manipulating individual FOXA1 motifs. Each FOXA1 motif is either mutated (mut), orientation-reversed (rev), or converted into the strongest consensus (con). Table lists the fold change and statistical significance of FOXA1 ChIP-ISO signal caused by these sequence variations. Two-tailed paired t test.

(D) FOXA1 ChIP-ISO signals over CCND1e variants containing 0, 1, 2, or 3 original (ori) FOXA1 motifs or 1 consensus (con) FOXA1 motif. X: missing data.

(E) Same as (C), but showing the effect on FOXA1 binding by mutating co-factor motifs.

(F) Box-and-whisker plots showing FOXA1 ChIP-ISO signals over otherwise identical CCND1e sequences containing WT or mutated AP-1, CEBPB, or SP1 motifs. Paired sequences are connected by a line. **** p < 0.0001, *** p < 0.001, ** p < 0.01, and ns, non-significant based on two-tailed paired t test (same below, unless specified).

(G) Pearson correlation coefficient between FOXA1 ChIP-ISO signals and CCND1 sequence variables, calculated using the total set or the subset containing an AP-1 motif. These numbers reflect the level of impact of each variable on FOXA1 binding.

(H) FOXA1 ChIP-ISO signal as a function of the linear distance between the 3rd FOXA1 motif and the AP-1 motif. Dots are data points from individual replicates and the line represents the average.

(I) Relation between the FOXA1 ChIP-ISO signal and the total FOXA1 motif score over CCND1 variants ± AP-1/CEBPB motifs.

(J) Effect of FOXA1 on AP-1 binding. Bar plots show FOXA1 and FOSL2 ChIP-qPCR for three biological replicates over integrated WT CCND1e or a variant with all three FOXA1 motifs mutated (knockin). ChIP-qPCR over a positive/negative control locus (P and N) and the native CCND1e are also shown. Error bars represent standard error (same below, unless specified). Two-tailed unpaired t test.

To more accurately pinpoint sequence features affecting FOXA1 binding, we first replaced each FOXA1 motif with mutated, reversed, or consensus versions (Figure S2C). As expected, mutating/strengthening FOXA1 motifs significantly reduces/increases FOXA1 binding, while reversing their orientation has a minor effect (Figure 2C). Consistent with Figure 2B, the third FOXA1 motif is more influential than the other two (Figures 2C and 2D). FOXA1 binding also increases with the number of FOXA1 motifs in a non-linear fashion (Figure 2D), indicating that there is a synergistic effect among multiple adjacent binding sites.

We next examined the effect of other sequence-specific TFs that potentially co-bind with FOXA1, which we refer to as “co-factors” below. As there are eight co-factor motifs, the ChIP-ISO library includes all 256 combinations, where each motif can be wild-type (WT) or mutated. We found that AP-1 and CEBPB mutations lead to a significant decrease in FOXA1 binding (Figure 2E). AP-1 has a particularly strong effect, as mutating its motif essentially eliminates FOXA1 binding for almost all CCND1e variants (Figure 2F). The presence of AP-1 highly correlates with FOXA1 binding, even more than the total FOXA1 motif score (Figure 2G). Low-throughput FOSL2 (a subunit of AP-1) and FOXA1 ChIP confirmed the abolished binding of both factors when the AP-1 motif is mutated (Figure S2D). These data indicate that AP-1 is a crucial co-factor that potentiates FOXA1 binding to the CCND1e. Interestingly, both AP-1 and CEBPB motifs are immediately adjacent to the third FOXA1 motif, which has the largest impact on FOXA1 binding. To test the significance of this observation, we moved the AP-1 motif away from the third motif and found that FOXA1 binding declines markedly with increasing distance (Figure 2H), indicating that motif proximity is important for AP-1-facilitated FOXA1 binding.
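The count of 256 follows from eight motifs with two states each (2^8). A short sketch enumerates the combinations and the WT/mutant pairs that differ only at one motif; apart from AP-1, CEBPB, and SP1, the co-factor names are placeholders.

```python
from itertools import product

# AP-1, CEBPB, and SP1 are named in the text; the rest are placeholders.
COFACTORS = ["AP-1", "CEBPB", "SP1", "cofactor_4", "cofactor_5",
             "cofactor_6", "cofactor_7", "cofactor_8"]


def motif_state_combinations(cofactors):
    """All assignments of WT/mut to each co-factor motif."""
    return [dict(zip(cofactors, states))
            for states in product(("WT", "mut"), repeat=len(cofactors))]


def pairs_differing_at(cofactors, target):
    """Pairs of designs that are identical except for the target motif."""
    others = [c for c in cofactors if c != target]
    pairs = []
    for states in product(("WT", "mut"), repeat=len(others)):
        background = dict(zip(others, states))
        pairs.append(({**background, target: "WT"},
                      {**background, target: "mut"}))
    return pairs


combos = motif_state_combinations(COFACTORS)
ap1_pairs = pairs_differing_at(COFACTORS, "AP-1")
```

Each co-factor therefore has 128 matched pairs, which is what makes a paired comparison of WT versus mutated motifs possible.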

Correlation analysis shows that FOXA1 binding is mostly affected by AP-1, its own motif, and CEBPB (Figure 2G). To understand the interplay between these factors, we plotted the FOXA1 binding strength as a function of the total FOXA1 motif score in the presence or absence of AP-1 and CEBPB (Figure 2I). Without AP-1 and CEBPB, FOXA1 can still bind strong motifs, but the presence of these two co-factors allows FOXA1 to target suboptimal motifs, at least in the CCND1e context (Figure 2I).

We also investigated the reciprocal effect of FOXA1 on AP-1 binding to determine whether they bind hierarchically or cooperatively. We generated another CCND1e variant with all three FOXA1 motifs mutated and measured the change in FOXA1 and AP-1 binding. Notably, both FOXA1 and AP-1 binding are drastically reduced on this mutated CCND1e (Figure 2J), supporting the scenario that the binding of these two TFs is mutually dependent.

AP-1 and CEBPB co-bind with FOXA1 and assist its binding genome wide

The case study of the CCND1e demonstrates the importance of co-factors in FOXA1 binding. We next investigated the generality of this phenomenon. We evaluated the co-binding of FOXA1 with other TFs in A549 cells based on the overlap between their ChIP sequencing (ChIP-seq) peaks and the occurrence of their motifs in FOXA1 peaks (STAR Methods). A large fraction (25%–45%) of FOXA1 ChIP-seq peaks overlap with the peaks of AP-1 subunits JUNB, JUND, and FOSL2 (Figure 3A), and vice versa (Figure S3A). Moreover, the most enriched motifs within FOXA1 peaks, aside from the FOXA1 motif itself, are those of the AP-1 subunits (p value < 10^−1,000) (Figure 3A). These data support widespread co-binding of FOXA1 and AP-1. Many other TFs, including CEBPB, also display significant co-binding with FOXA1 (Figure 3A).
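The peak-overlap fraction can be illustrated with a minimal interval-intersection sketch. Peaks are assumed to be half-open (chrom, start, end) tuples; the coordinates are hypothetical, and the quadratic scan is for clarity rather than genome-scale use.

```python
def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks_a that overlap at least one peak in peaks_b."""
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in peaks_a:
        # Half-open intervals overlap when each starts before the other ends.
        if any(start < e2 and s2 < end for s2, e2 in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks_a) if peaks_a else 0.0


# Hypothetical peak sets.
foxa1_peaks = [("chr1", 100, 200), ("chr1", 500, 600),
               ("chr2", 100, 200), ("chr2", 900, 1000)]
ap1_peaks = [("chr1", 150, 250), ("chr2", 950, 1100), ("chr3", 100, 200)]

foxa1_with_ap1 = fraction_overlapping(foxa1_peaks, ap1_peaks)
ap1_with_foxa1 = fraction_overlapping(ap1_peaks, foxa1_peaks)
```

The measure is asymmetric, which is why the overlap is reported in both directions.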

Figure 3. AP-1 and CEBPB co-bind with FOXA1 and assist its binding genome wide.

(A) Bioinformatic analysis of FOXA1/TF co-binding. Top: schematics of co-binding events. Bottom: for each TF, the dot plot shows the percentage of the overlapped FOXA1 ChIP-seq peaks (x axis) and the enrichment of its motif within FOXA1 peaks (y axis). TFs chosen for further analysis are labeled. AP-1 subunits are indicated in blue.

(B) Heatmaps of ChIP-seq/ATAC-seq and the corresponding intensity profiles in WT A549 cells over FOXA1 binding regions separated based on FOXA1 motif scores. Sequences in the top section contain strong consensus motifs (score > 16) and the ones at the bottom contain weak motifs (score < 12). FOXA1 ChIP-seq signal is plotted as the sequencing coverage (BamCoverage) from replicate 1 generated by this study. Other ChIP-seq and ATAC-seq signals are plotted as the averaged “fold change over control” of 2–3 replicates from ENCODE (same for all the heatmaps below, unless stated otherwise).

(C) The effect on FOXA1 binding by mutating co-factor motifs. The ISO set used in (C)–(E) is derived from genomic sequences that show overlapped FOXA1 and TF ChIP-seq peaks in A549 and contain both motifs in proximity (<30 bp). The table lists the fold change and statistical significance of FOXA1 ChIP-ISO signal upon mutations of co-factor motifs in these sequences. Two-tailed paired t test.

(D and E) Box-and-whisker plots showing the changes of FOXA1 ChIP-ISO signal upon mutations of AP-1 (D) or CEBPB motif (E). Two-tailed paired t test.

(F) DeepLift-SHAP feature attribution scores highlighting features used by a sequence-trained CNN to predict FOXA1 binding in A549 cells at the CCND1e.

(G) Top 5 motifs detected by TF-MoDISco in the genome-wide DeepLift-SHAP positive feature attribution scores at sites predicted by the CNN to be bound by FOXA1. The TF family of matching motifs is annotated for each motif, as is the number of seqlets used by TF-MoDISco to construct each motif.

Given that co-factors may permit FOXA1 binding at suboptimal motifs (Figure 2I), we analyzed FOXA1-TF co-binding at FOXA1 sites with different motif strengths. We separated FOXA1 binding events near strong consensus motifs (scores > 16) or very weak ones (scores < 12) (Figure 3B, left column). The average FOXA1 binding strength is comparable over these two sets of regions. Strikingly, co-binding predominantly occurs in the sites with low motif scores, and these sites also show active histone marks and high chromatin accessibility (Figure 3B). These data suggest that TF “hubs” tend to form over weaker motifs, and these co-binding events are more likely to be functional in gene regulation. This may represent a common strategy to ensure gene expression plasticity.
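The split by motif strength uses the two cutoffs stated above (score > 16 and score < 12). A minimal sketch, with hypothetical site identifiers and scores:

```python
def stratify_by_motif_score(sites, strong_cutoff=16.0, weak_cutoff=12.0):
    """Group (site_id, motif_score) pairs into strong, weak, and intermediate."""
    groups = {"strong": [], "weak": [], "intermediate": []}
    for site_id, score in sites:
        if score > strong_cutoff:
            groups["strong"].append(site_id)
        elif score < weak_cutoff:
            groups["weak"].append(site_id)
        else:
            groups["intermediate"].append(site_id)  # not used in the comparison
    return groups


# Hypothetical FOXA1-bound sites with their motif scores.
sites = [("site_1", 17.2), ("site_2", 9.5), ("site_3", 14.0), ("site_4", 16.0)]
groups = stratify_by_motif_score(sites)
```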

Co-binding between FOXA1 and TFs does not necessarily imply cooperativity among these factors. To test whether the TFs identified in Figure 3A indeed affect FOXA1 binding, we carried out additional mutational analyses for 15 TFs that show co-binding with FOXA1. For each TF, we selected 10–20 native genomic loci that it co-occupies with FOXA1 with proximal motifs (<30 bp) (Figure S3B) and included 193 bp sequences from these loci containing WT or mutated TF motifs in the ChIP-ISO library. AP-1 motif mutation again has the largest impact on FOXA1 binding, followed by CEBPB (Figures 3C–3E), indicating that these two factors promote FOXA1 binding at many genomic loci. These results also suggest that most co-localized TFs do not bind cooperatively with FOXA1.
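The WT-versus-mutant comparison is a paired design, summarized in the figure legends by a fold change and a paired t test. The sketch below computes a paired t statistic and a ratio-of-means fold change; the fold-change definition is an assumption, the signals are hypothetical, and converting the statistic to a p value would require a t-distribution CDF (for example from SciPy), which is omitted here.

```python
import math


def paired_t_statistic(wt, mut):
    """t statistic for paired differences (mut - wt), with n - 1 degrees of freedom."""
    diffs = [m - w for w, m in zip(wt, mut)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def mean_fold_change(wt, mut):
    """Ratio of mean mutant signal to mean WT signal (assumed definition)."""
    return (sum(mut) / len(mut)) / (sum(wt) / len(wt))


# Hypothetical ChIP-ISO signals for four loci, WT vs. motif-mutated.
wt_signal = [4.0, 6.0, 5.0, 8.0]
mut_signal = [1.0, 2.0, 1.5, 3.0]
t_stat = paired_t_statistic(wt_signal, mut_signal)
fold_change = mean_fold_change(wt_signal, mut_signal)
```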

To further assess whether co-factor motifs are predictive of genome-wide FOXA1 binding in A549 cells, we trained a convolutional neural network (CNN) to recognize FOXA1 ChIP-seq peaks using DNA-sequence features (Figure 1A). The CNN sequence-only model achieves high overall performance, with an area under precision-recall curve (auPRC) of 0.20 compared with 0.01 from a random predictor (calculated on held-out test regions). The CNN predictions of FOXA1 binding activities are generally consistent with ChIP-ISO measurements (Figure S3C). Using the DeepLift-SHAP feature attribution approach,39,40 we characterized which DNA base positions contribute toward the CNN’s FOXA1 binding predictions at specific loci. Over CCND1e, for example, DeepLift-SHAP strongly highlights the second and third FOXA1 motifs and the AP-1 motif as positive contributors to FOXA1 binding (Figure 3F). The feature attribution scores at the FOXA1, AP-1, and CEBPB motifs are weakened when mutated, and those from FOXA1 are strengthened when replaced with the consensus, consistent with ChIP-ISO measurements (Figure S3D). In addition, we ran the TF-MoDISco tool41 to compile feature attribution scores from across all ChIP-seq peaks into commonly occurring motif patterns. Alongside FOXA1 motif variants, TF-MoDISco identifies the AP-1 and CEBPB motifs as the most prominent features that the CNN uses to predict FOXA1 binding (Figure 3G). A GC-rich sequence similar to the SP1 motif is identified as the most negative feature (Figure S3E). In summary, the CNN analysis supports the findings that AP-1 and CEBPB assist FOXA1 binding genome wide in A549 cells.
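To make the auPRC comparison concrete: for a random predictor, the expected auPRC equals the fraction of positive examples, which is why a baseline near 0.01 implies roughly 1% bound regions in the test set. The sketch below uses average precision as one common estimator of auPRC; the labels and scores are hypothetical, and this is not the evaluation code used in the study.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision values at each true positive rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    true_pos = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


def random_baseline(labels):
    """Expected auPRC of a random predictor: the positive-class prevalence."""
    return sum(labels) / len(labels)


# Hypothetical held-out regions: 1 = FOXA1 peak, 0 = no peak.
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
ap = average_precision(labels, scores)
baseline = random_baseline(labels)
```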

AP-1 inhibition leads to motif-directed redistribution of FOXA1 binding in the genome

To further test the role of AP-1 in promoting FOXA1 binding, we measured the effect of inhibiting AP-1 on genome-wide FOXA1 binding. Because the AP-1 family has multiple homologs that may have redundant functions, we took advantage of a dominant-negative protein A-FOS to inhibit global AP-1 binding. A-FOS dimerizes with JUN family proteins to form a heterodimer that cannot bind DNA.42,43 We constructed an A549 cell line with doxycycline (Dox)-inducible A-FOS (Figures 4A and S4A) and verified that A-FOS induction leads to a near-complete global inhibition of AP-1 binding using FOSL2 ChIP-seq (Figures 4B and S4B). Over the sites where FOXA1 and AP-1 peaks overlap, the average FOXA1 ChIP-seq signal is significantly reduced by A-FOS induction, while FOXA1 binding over non-overlapping sites remains unchanged (Figures 4B–4D). Differential binding analysis revealed 1,340 reduced and 234 enhanced FOXA1 peaks in the presence of A-FOS (“lost” vs. “gained” peaks). Over 80% of the lost peaks contain AP-1 motifs and/or show AP-1 binding, much higher than the unchanged and the gained peaks (Figures 4E and 4F). Mutation of AP-1 motifs in three loci with lost peaks results in similar decreases of FOXA1 binding, measured by ChIP-ISO (Figure S4C). These results further support the notion that AP-1 directs FOXA1 binding.

Figure 4. AP-1 inhibition leads to motif-directed redistribution of FOXA1 binding in the genome.

(A) Schematic showing the effect of A-FOS induction. Upon doxycycline-induced overexpression of A-FOS, it dimerizes with Jun and thus prevents Fos:Jun heterodimer formation and chromatin binding.

(B) Representative genomic tracks of FOSL2 and FOXA1 ChIP-seq ± A-FOS induction in WT A549 cells. In the presence of A-FOS, FOXA1 binding is significantly reduced at the overlapped site, but not the unique site.

(C) Heatmap of FOSL2 and FOXA1 ChIP-seq signals over AP-1/FOXA1 overlapped sites and FOXA1 unique sites ± A-FOS. FOXA1 and FOSL2 ChIP-seq signals are plotted as the sequencing coverage (BamCoverage) from replicate 1 generated by this study.

(D) Profiles of the average FOXA1 ChIP-seq intensities in (C).

(E) Volcano plot showing differential FOXA1 binding ± A-FOS. Reduced (loss) and enhanced (gain) FOXA1 peaks in +A-FOS are highlighted.

(F) Overlap of loss/unchanged/gain FOXA1 peaks with AP-1. The upper panel shows the fraction that overlapped with FOSL2 peaks, and the lower panel shows the fraction that contains AP-1 motifs.

(G) Distribution of the two orientations of FOXA1/AP-1 motifs in loss/unchanged/gain peaks. Orange arrow: FOXA1 motif, blue pentagon: AP-1 (palindromic).

(H) Distribution enrichment of FOXA1/AP-1 motif distances in lost peaks vs. unchanged peaks separated by the two orientations. Significance was calculated based on the histogram in Figure S4E using right-tailed two-proportion Z test. The dashed lines correspond to the p value threshold of 0.05. The insert plots show only enrichment significance levels above the threshold in the shaded area.

(I) The distributions of the maximum FOXA1 motif score per peak for loss/unchanged/gain FOXA1 peaks. Two-tailed unpaired t test.

(J) Fold changes of the RNA-seq counts with A-FOS overexpression for the proximal genes near loss/unchanged/gain FOXA1 peaks. Two-tailed unpaired t test.

(K) Top 10 enriched Gene Ontology (GO) terms of the proximal genes near the loss FOXA1 peaks.

We next analyzed whether AP-1-enhanced FOXA1 binding depends on specific configurations of their motifs, i.e., relative orientation and distance (Figure S4D). The two motif orientations are evenly distributed regardless of the peak category (Figure 4G), while the distance is much more likely to be <8 bp in the lost peaks (Figures 4H and S4E). These results, along with the data in Figure 2H, show that proximity, but not a specific spacing or orientation between FOXA1 and AP-1 motifs, is required for their cooperativity. Weak enrichment is also observed near 10, 20, 30, and 40 bp, suggesting that the rotational orientation of these two motifs on the same side of the DNA may promote cooperativity. In addition, the maximum FOXA1 motif scores are significantly lower in lost peaks than in unchanged and gained peaks (Figure 4I). This is consistent with the observation in Figure 3B that FOXA1 binding over weaker motifs tends to be more AP-1 dependent. It also suggests that, upon AP-1 inhibition, FOXA1 is released from the weaker sites and re-distributed to stronger motifs.
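The distance enrichment in lost versus unchanged peaks is assessed with a right-tailed two-proportion Z test (Figure 4H legend). A minimal sketch of that test follows, using the pooled-variance form; the counts are hypothetical and the binning used in the study is not reproduced here.

```python
import math


def right_tailed_two_proportion_z(k1, n1, k2, n2):
    """Test whether proportion k1/n1 exceeds k2/n2 (pooled standard error).

    Returns (z, p_value), where p_value is the upper-tail normal probability.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts: peaks whose FOXA1/AP-1 motif distance falls in one bin.
z, p = right_tailed_two_proportion_z(k1=60, n1=100, k2=30, n2=100)
```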

To explore the functional role of FOXA1 binding events potentiated by AP-1, we conducted RNA sequencing (RNA-seq) in cells ±A-FOS overexpression. We found that the genes proximal to lost/gained peaks show significant down-/upregulation in the absence of AP-1 (Figure 4J). Differential expression analysis also revealed the same trend, and, in particular, CCND1 is downregulated in the presence of A-FOS (Figures S4F–S4H). These results indicate that AP-1-facilitated FOXA1 binding mostly activates gene expression in A549 cells. Gene Ontology (GO) analysis of the genes proximal to the lost peaks shows enrichment in the cell migration, tissue development, and signal transduction categories (Figure 4K), implying their cell-type-specific and differentiation-linked functions.

In vitro study of FOXA1 binding and cooperativity with AP-1

To directly evaluate the intrinsic FOXA1 binding activity, we developed an electrophoretic mobility shift assay followed by sequencing (EMSA-seq) to measure the in vitro binding affinities between FOXA1 and all library sequences simultaneously (Figure 1A; STAR Methods). In this method, EMSA was performed using mixed library DNA incubated with purified FOXA1 at different concentrations (Figures 5A and S5A). Shifted (FOXA1-bound) vs. unshifted (unbound) bands were then purified, PCR amplified, and subjected to amplicon sequencing. Normalized sequencing counts were converted into the “ratio bound in vitro” for each sequence, which was highly correlated between two replicates (Figure S5B; STAR Methods).
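One way to turn shifted and unshifted band counts into a per-sequence bound fraction is sketched below. The normalization shown (within-band read fractions weighted by each band's share of total DNA, supplied here as `shifted_fraction`) is an assumption for illustration; the actual conversion is defined in the STAR Methods.

```python
def ratio_bound(bound_counts, unbound_counts, shifted_fraction):
    """Estimate the bound fraction of each sequence from two band libraries.

    shifted_fraction is the share of total library DNA in the shifted band,
    e.g. from gel densitometry (hypothetical input, assumed normalization).
    """
    bound_total = sum(bound_counts.values())
    unbound_total = sum(unbound_counts.values())
    ratios = {}
    for seq_id in bound_counts.keys() | unbound_counts.keys():
        bound = bound_counts.get(seq_id, 0) / bound_total * shifted_fraction
        unbound = (unbound_counts.get(seq_id, 0) / unbound_total
                   * (1 - shifted_fraction))
        if bound + unbound > 0:
            ratios[seq_id] = bound / (bound + unbound)
    return ratios


# Hypothetical read counts from the shifted and unshifted bands.
bound_reads = {"s1": 80, "s2": 20}
unbound_reads = {"s1": 20, "s2": 80}
ratios = ratio_bound(bound_reads, unbound_reads, shifted_fraction=0.5)
```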

Figure 5. In vitro study of FOXA1 binding and cooperativity with AP-1.

(A) A representative EMSA-seq gel conducted on the ISO library with 0–60 nM of purified recombinant mouse FOXA1. The lower asterisk represents unbound oligonucleotides and the upper asterisk indicates FOXA1-bound oligonucleotides. L: 100 bp DNA ladder.

(B) FOXA1 bound fraction as a function of FOXA1 concentration measured by EMSA-seq. Data for a subset of the library, CCND1e variants, are plotted here. Green, yellow, orange, and red curves correspond to CCND1 variants with three, two, one, and zero FOXA1 motifs. The dotted line marks the fraction bound at 15 nM FOXA1, which is used to represent “binding strength in vitro” in (C)–(E).

(C) Pearson correlation coefficient between FOXA1 binding strength in vitro and CCND1 sequence variables.

(D) Correlation between the binding strength in vitro and total (summed) FOXA1 motif score for each sequence.

(E) Correlation between the binding strength in vivo (measured by ChIP-ISO) and that in vitro (EMSA-seq).

(F) Representative EMSA gels with FOXA1 titration ± 300 nM AP-1 (left, FOXA1 concentration ranges from 0 to 200 nM) or AP-1 titration ± 150 nM FOXA1 (right, AP-1 concentration ranges from 0 to 400 nM) using a CCND1 variant with the first two FOXA1 motifs mutated (FOXA112_mut). Different populations are labeled on the right side of the gel (FOXA1: orange, AP-1: blue).

(G) Quantification of the EMSA gel in (F). Error bar represents standard error for three replicates.

(H and I) Same as (F) and (G), but using different DNA templates and with two replicates performed on each template. Top: CCND1 variant with all three FOXA1 motifs mutated (FOXA1all_mut). Bottom: CCND1 variant with the first two FOXA1 motifs and AP-1 motif mutated (FOXA112_mut, AP-1mut).

FOXA1 binding in vitro is primarily determined by the motif strength. Among the CCND1e variants, for example, FOXA1 binding generally increases with the number of motifs (Figures 5B and S5C). The EMSA-seq signals of the whole library are highly correlated with FOXA1 motif score (Figures 5C and 5D). Different features of DNA shape play only a minor role (Figure S5D). Importantly, FOXA1 binding is no longer attenuated by mutations in AP-1 motifs in vitro and, if anything, shows a slight, but significant, increase (Figures 5C and S5E), confirming that the AP-1 effect is not due to inadvertent changes in intrinsic FOXA1 binding affinities. Although sequences with the highest FOXA1 ChIP-ISO signals (>7) tend to have stronger FOXA1 motifs and EMSA-seq signals than the average, FOXA1 binding in vitro in general correlates poorly with that in vivo (r = 0.21) (Figure 5E). This reinforces our previous finding that FOXA1 motif strength is only partially responsible for FOXA1 binding in vivo.

To investigate potential cooperativity between AP-1 and FOXA1 in vitro, we purified recombinant AP-1 (Figure S5A) and performed low-throughput EMSAs with AP-1 and FOXA1 using CCND1e DNA. To focus on the co-binding between AP-1 and the most proximal FOXA1 motif, we used a CCND1e template that has the first two FOXA1 motifs mutated (Figure 5F). The gels show distinct bands for DNA bound by FOXA1 or AP-1 alone, and a faint super-shift for DNA bound by both factors (Figure 5F). Quantification of the unbound band intensity shows that the presence of AP-1 moderately promotes the binding of FOXA1, and vice versa (Figure 5G). Interestingly, we observed slight binding cooperativity between FOXA1 and AP-1, even in the absence of one factor’s motif, though a super-shifted band was not visible (Figures 5H and 5I). These results suggest that FOXA1 and AP-1 may exhibit protein-protein interactions that allow them to recruit each other without direct DNA binding. This can, at least partially, explain the interdependency and cooperativity of these two factors in vivo.
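Binding cooperativity between two factors on one DNA template is often described with a simple equilibrium model in which the doubly bound state is weighted by a cooperativity factor ω (ω = 1 means independent binding, ω > 1 means each factor helps the other). The sketch below is a generic textbook model, not a fit to the EMSA data; all dissociation constants and concentrations are hypothetical.

```python
def foxa1_occupancy(foxa1_nM, ap1_nM, kd_foxa1_nM, kd_ap1_nM, omega=1.0):
    """Fraction of DNA bound by FOXA1 in a two-site model with cooperativity omega.

    States: free, FOXA1 only, AP-1 only, both (weighted by omega).
    """
    a = foxa1_nM / kd_foxa1_nM   # statistical weight of FOXA1-only state
    b = ap1_nM / kd_ap1_nM       # statistical weight of AP-1-only state
    partition = 1 + a + b + omega * a * b
    return (a + omega * a * b) / partition


# Hypothetical parameters: both factors at their Kd.
independent = foxa1_occupancy(50, 300, kd_foxa1_nM=50, kd_ap1_nM=300, omega=1.0)
cooperative = foxa1_occupancy(50, 300, kd_foxa1_nM=50, kd_ap1_nM=300, omega=10.0)
```

With ω = 1 the expression reduces to a / (1 + a), so AP-1 has no effect on FOXA1 occupancy; any ω > 1 raises FOXA1 occupancy in the presence of AP-1, and by symmetry raises AP-1 occupancy in the presence of FOXA1.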

FOXA1 binding is mostly determined by the local sequence, not the chromatin context

With the work above focusing on local sequences, we next explored how the larger-scale chromatin context can impact FOXA1 binding. In ChIP-ISO, native sequences containing FOXA1 motifs are moved from their endogenous loci to the euchromatic AAVS1 site (Figure S6A). Comparison of FOXA1 occupancy at the native vs. AAVS1 locus therefore allows us to infer the effect from the endogenous chromatin.

We first applied this strategy to FOXA1 sites within euchromatic regions. We selected two sets of native sequences where FOXA1 binding cannot be explained by its motif strength: those with high-score FOXA1 motifs but mostly weak binding (set 1) and vice versa (set 2) (Figures 6A and S6B). EMSA-seq shows that most of these sequences are bound by FOXA1 in vitro, with slightly higher occupancies in set 1 (Figure 6B). Strikingly, ChIP-ISO signals on these sequences at AAVS1 largely mimic the ChIP-seq intensities at the endogenous sites (Figure 6B), indicating that FOXA1 binding is mostly determined by the local sequences. Such local signals again involve co-factors, with set 2 sites being more enriched with AP-1 and CEBPB motifs and showing higher AP-1 and CEBPB binding (Figure 6C).

Figure 6. FOXA1 binding is mostly determined by the local sequence, not the chromatin context.

(A) ChIP-ISO test cases where strong/weak FOXA1 motifs show low/high FOXA1 binding, named as set 1 (N = 160) and 2 (N = 144), respectively. Left: FOXA1 motif scores. Right: heatmap of FOXA1 ChIP-seq signals in WT A549.

(B) Heatmap of FOXA1 ChIP-ISO signals (left) and EMSA-seq signals (fraction bound at 15 nM FOXA1, right) for sequences in (A).

(C) Heatmap of FOSL2, JUN, and CEBPB ChIP-seq signals in WT A549 (left) and number of FOXA1, FOSL2, and CEBPB motifs for sequences in (A).

(D) Heatmap of FOXA1, H3K9me3, H3K27me3, FOSL2, and JUN ChIP-seq signals in WT A549 for a set of ChIP-ISO library sequences derived from H3K9me3-marked regions (top, N = 56) and a control set with comparable FOXA1 binding derived from euchromatic loci (bottom, N = 56). FOXA1, H3K9me3, and H3K27me3 ChIP-seq signals are plotted as sequencing coverage (BamCoverage) from replicate 1 generated by this study.

(E) Violin plot of H3K9me3 ChIP-seq signals for sequences in (D) at their native genomic loci (left) and ChIP-ISO signals of the same sequences at AAVS1 (right). Two-tailed unpaired t test.

(F) Same as (E), but for FOXA1.

(G) Correlation between FOXA1 ChIP-ISO and ChIP-seq signals for all sequences derived from the native genome in our library. Orange: sequences from euchromatin, blue: H3K9me3-marked heterochromatin, green: H3K27me3-marked heterochromatin.

(H) Precision-recall curves showing performance of neural networks trained on FOXA1 ChIP-seq data in A549 cells. Each plot shows performance of a CNN trained using only sequence (blue lines); Bichrom trained using sequence and H3K9me3 (orange lines); Bichrom trained using sequence and ATAC-seq (green lines); Bichrom trained using sequence and a selection of five histone marks (red lines); and the baseline auPRC of a random predictor (purple lines). The left plot shows the performance of the neural networks across all held-out test sites, while the right plot shows performance at FOXA1 motif instances that overlap H3K9me3 or H3K27me3 peaks.

(I) Schematic of CRISPRi method where KRAB-dCas9 is induced by doxycycline to ectopically write H3K9me3 to the endogenous CCND1e in A549 cells. This system is used to measure the effect of H3K9me3 deposition on FOXA1 binding.

(J) H3K9me3 (left) and FOXA1 (right) ChIP-qPCR signals on three biological replicates at a positive control region (P), CCND1e (C1 and C2), and a negative control region (N) with (orange) or without (beige) KRAB-dCas9 induction. **** p < 0.0001 and ns, non-significant based on two-tailed unpaired t test.

We next investigated the effect of heterochromatin by selecting FOXA1 motifs from regions covered by H3K9me3 and H3K27me3. If heterochromatin has a strong inhibitory effect, we expect FOXA1 binding to increase when these sequences are transferred to euchromatin. We picked 56 sequences from H3K9me3-marked regions (Figure 6D). For comparison, we assembled a control set in euchromatin that lacks H3K9me3 but exhibits matching levels of FOXA1 binding (Figures 6D–6F). The differences in H3K9me3 signals between these two sets of regions disappear after they are relocated to the AAVS1 locus, confirming the elimination of H3K9me3 marks at the new site (Figure 6E). However, this does not lead to enhanced FOXA1 binding, as the sequences originally from heterochromatin still exhibit the same FOXA1 binding levels as the euchromatic control (Figure 6F). FOXA1 sites from H3K27me3 regions show a similar behavior (Figures S6C–S6E). Combining the data from eu- and heterochromatin, FOXA1 binding at the AAVS1 site is highly correlated with that in its native sites (r = 0.69) (Figure 6G). Overall, these data suggest that the native chromatin context, including H3K9me3 and H3K27me3 marks, plays a minor role in FOXA1 binding.

To test whether H3K9me3 enables better prediction of FOXA1 binding genome wide, we again turned to neural network analysis. Specifically, we used our previously described Bichrom neural network architecture to integrate DNA sequence and various chromatin features into the training process.44 Integrating ATAC-seq signals or a combination of histone marks into the neural network helps to improve performance in distinguishing held-out FOXA1-bound and unbound sites (Figures 6H and S6F). However, integrating H3K9me3 alone alongside DNA-sequence features does not improve performance (Figure 6H), suggesting that Bichrom is unable to learn any informative relationship between H3K9me3 and FOXA1 binding.

We noted that FOXA1 motifs in H3K9me3-covered regions are weaker and have no adjacent AP-1 motifs. To ensure that the absence of a heterochromatin effect is not simply due to the lack of suitable motifs, we artificially introduced H3K9me3 to the endogenous CCND1e near a strong FOXA1 binding site by targeting KRAB-dCas9 through CRISPR interference (CRISPRi) (Figure 6I). KRAB-dCas9 induction increased the local H3K9me3 level to ~10% of the highest H3K9me3 enrichment across the genome (P1) (Figure 6J), and this deposited H3K9me3 level is comparable with some genomic H3K9me3 peaks. Despite the significant H3K9me3 deposition, there is no change in FOXA1 binding compared with uninduced cells (Figure 6J). These results are in line with our finding that native chromatin context plays only a minor role, if any, in FOXA1 binding. Together, we conclude that FOXA1’s binding specificity in vivo is more determined by the local sequence than the epigenetic background.

Cell-type-specific binding of FOXA1 correlates with differential expression of AP-1

Previous studies have shown that FOXA1 binding varies among cell types.35 Based on our findings above, we hypothesized that differential availability of co-factors may contribute to such cell-type specificity. We therefore analyzed the RNA-seq data of FOXA1 and AP-1 subunits in three cancer cell lines, A549, HepG2, and MCF-7. Interestingly, the FOXA1 mRNA level is lower, but the levels of AP-1 subunits are higher, in A549 than in the other two cell lines (Figure 7A). Immunostaining confirmed this trend at the protein level (Figure S7A). Despite the lower expression of AP-1 in HepG2 and MCF-7, FOXA1 still co-binds with AP-1, but the level of overlap is significantly reduced compared with A549 (Figure 7B).

Figure 7. Cell-type-specific binding of FOXA1 correlates with differential expression of AP-1.

(A) RNA-seq counts, reported in transcripts per million (TPM), for FOXA1 and various AP-1 subunits in WT A549, HepG2, and MCF-7 cell lines. Red and blue mark the highest and lowest expression for each gene.

(B) Bioinformatic analysis of FOXA1/TF co-binding in HepG2 (top) and MCF-7 (bottom). For each TF, the dot plot shows the percentage of the overlapped FOXA1 ChIP-seq peaks (x axis) and the enrichment of its motif within FOXA1 peaks (y axis). AP-1 subunits are indicated in blue.

(C) Differential FOXA1 binding analysis in A549 and MCF-7 cells, with heatmaps showing shared, A549-enriched, and MCF-7-enriched peaks. FOSL2 and JUND ChIP-seq signals over the same regions are shown on the right.

(D) Occurrence probability of FOSL2 or JUND motif in common or differential FOXA1 peaks. Left: A549 vs. MCF-7; right: A549 vs. HepG2.

(E) Top ranking motifs detected by TF-MoDISco in the genome-wide DeepLift-SHAP positive feature attribution scores at sites predicted by a sequence-trained CNN to be bound by FOXA1 in A549, HepG2, and MCF-7 cells. The top five ranking motifs are shown unless TF-MoDISco returned fewer than five motifs. The TF family of best-matching motifs is annotated for each motif.

It is possible that the abundance of AP-1 in A549 allows it to play a more dominant role in directing FOXA1 binding than in HepG2 and MCF-7. If this is true, we would expect a higher representation of AP-1 motifs in A549-specific FOXA1 peaks than the HepG2 or MCF-7-specific peaks. To test this prediction, we performed differential binding analysis of FOXA1 in A549 vs. MCF-7/HepG2 and searched for motifs in common and cell-type-specific peaks (Figures 7C and S7B). Indeed, A549-specific FOXA1 peaks are much more likely to contain AP-1 motifs compared with shared or MCF-7/HepG2-specific peaks (Figure 7D). In A549 cells, AP-1 binding strongly correlates with that of FOXA1, and such correlation is weaker in the other two cell types (Figure 7C). Mild AP-1 binding, however, is still detectable over MCF-7- and HepG2-specific FOXA1 sites, despite the lower enrichment of AP-1 motifs. This may be due to the tethering of AP-1 by FOXA1, as indicated by the in vitro binding assay (Figures 5H and 5I). Overall, these data indicate that AP-1 is at least partially responsible for cell-type-specific binding of FOXA1.

To further characterize the importance of AP-1 in specifying FOXA1 binding, we trained DNA-sequence neural networks on 13 cell lines and tissue types, including the three above, with published FOXA1 ChIP-seq data (Figure S7C). We again performed feature attribution analysis at ChIP-seq peaks and compiled informative patterns into motifs using TF-MoDISco. Consistent with the analysis above, AP-1 and CEBPB are not top features for promoting FOXA1 binding in MCF-7 and HepG2 (Figure 7E). Interestingly, the AP-1 motif is identified as a highly informative feature for FOXA1 binding in the RT4 urinary bladder cell line, suggesting that AP-1 may assist FOXA1 binding in different cell types. In other cell lines, our neural networks identify additional informative co-factor motifs, including CTCF in 22Rv1 prostate carcinoma epithelial cells and AP-2 in both GP5d and SK-BR-3 cancer cells. We speculate that these factors may play analogous roles to AP-1 in assisting FOXA1 binding in these cell types.

DISCUSSION

Despite recent advances in genomic technology, extracting the genetic rules that govern TF/PF binding remains a formidable challenge. A significant hurdle arises from the fact that many genetic and epigenetic features can influence TF binding, and native genomes do not provide sufficient diversity to explore all possible combinations of these variables, especially within the constraints of evolution. In comparison, the ChIP-ISO assay utilizes artificially designed sequences that can circumvent evolutionary constraints and systematically perturb one genetic feature at a time.45 The synthetic sequences are inserted into the same genomic locus, which eliminates many variables caused by chromatin background, including the well-known ChIP artifact.46 Overall, the ChIP-ISO assay allows us to quantitatively dissect the contribution to TF binding from individual genetic features.

Cooperative binding is a commonly reported phenomenon among many TFs in mammalian cells, but PFs, including FOXA1, are often thought to function independently as the most upstream factors that interact with chromatin. It is therefore surprising that co-factors are key determinants of FOXA1 binding. These seemingly contradictory findings may be reconciled by considering the FOXA1 motif strength: while FOXA1 is able to bind to a subset of strong motifs in the absence of co-factors, its binding on suboptimal sites is strongly promoted by AP-1 and/or CEBPB (Figures 2I, 3B, and 4I). Co-factors also provide a mechanism for context-specific PF binding. Indeed, reduced AP-1 expression in MCF-7 and HepG2 cells releases FOXA1 near AP-1 motifs and allows it to occupy other genomic loci (Figure 7). This agrees with previous findings that the genomic distribution of FOXA1/2 can be affected by steroid receptors,47 GATA4,7 or PDX1.48 The co-factor-dependent weak sites tend to be situated in open chromatin with active histone marks and are therefore more likely to carry regulatory functions (Figure 3B). Consistently, native enhancers often contain suboptimal motifs with reduced TF binding affinities.49,50 Overall, these findings suggest that, although FOXA1 may use consensus motifs to engage with chromatin by itself, weaker motifs in conjunction with co-factors may play a more functional role during development to generate cell-type-specific binding and regulation.48,51,52

AP-1 is a ubiquitously expressed TF that is highly represented in distal enhancers in many cell types.53 It is therefore not surprising that the AP-1 motif is enriched near the binding sites of many TFs. More relevant to this work, AP-1 was shown to cobind with FOXA1 in breast and prostate cancer cells35,54,55 and with FOXA1/2 in pancreatic ductal adenocarcinoma.56 In most of these cases, however, it is not clear whether these factors bind independently, cooperatively, or hierarchically. Here, we clearly demonstrated that the binding of FOXA1 and AP-1 is mutually dependent in CCND1e, and such cooperativity may happen at other genomic loci. This finding, together with a previous proposal that AP-1 may also function as a PF,57 raises an intriguing possibility that FOXA1 and AP-1 may bind together to achieve sufficient pioneering activities in A549, which may contribute to cell-type-specific enhancer selection.58

The molecular mechanism of FOXA1 and AP-1 binding cooperativity requires further elucidation, but our current data provide some clues. First, the in vitro EMSA assay shows that AP-1 enhances FOXA1 binding, and vice versa. Mild cooperativity can even be observed on templates lacking FOXA1 or AP-1 motifs. These results indicate that there may be protein-protein interactions between these two factors that allow one to be tethered by the other. Consistent with this idea, weak AP-1 binding can be detected near many MCF-7- or HepG2-enriched FOXA1 binding sites in the absence of the AP-1 motif (Figures 7C and 7D). Second, AP-1 can stimulate FOXA1 binding with different distances and orientations between their motifs (Figures 4G and 4H). This argues against a model where FOXA1 and AP-1 form a rigid complex with highly specific interactions. It is more likely that they have weak and polymorphic interactions and that this type of “soft motif syntax” is commonly found among cooperative TFs.6,59,60 Third, although the cooperativity between FOXA1 and AP-1 is detected in vitro on naked DNA, the effect is weaker than that in vivo. It is therefore possible that their cooperativity in vivo is also promoted by nucleosomes.61 This would be consistent with the idea that FOXA1 and AP-1 have co-pioneering activities.

Finally, our study suggests that the genomic background, including the heterochromatin marks, plays a minor role in FOXA1 binding. Heterochromatin has been reported in the literature to both permit and inhibit PF binding.14,62,63 For example, one study overexpressed a FOXA1 homolog, FOXA2, in immortalized foreskin fibroblast cells and found that FOXA2 enrichment was generally depleted in H3K9me3 and H3K27me3 regions.7 However, the same study also pointed out that, instead of a direct inhibitory effect from heterochromatin, this may be due to the lack of FOXA2 binding sites within these domains. Another study found that gain of FOXA2 binding at lamin-enriched sites is correlated with loss of H3K9me3, indicating its ability to target heterochromatin.64 Here, our ChIP-ISO, CRISPRi, and neural network analyses suggest that native chromatin context at most plays a minor role in FOXA1 binding. In summary, using the ChIP-ISO approach, in combination with in vitro and in silico analyses, our study demonstrated that cooperative binding with co-factors is the primary mechanism by which PFs achieve binding specificity. This result argues against the model that PFs exclusively function independently and explains the context-dependency of PF activities observed in multiple studies.

Limitations of the study

In terms of the ChIP-ISO methodology, our current protocol suffers from low integration efficiency and low signal-to-noise ratio of the ChIP assay. As a result, the complexity of the library is limited to a few thousand, and a large number of cells are needed for the readout. These aspects should be improved in the future. In terms of the scientific findings, although the paper clearly establishes the cooperativity of FOXA1 and AP-1, and its potential role during cell differentiation, the detailed molecular nature of such cooperativity is still unknown. Finally, the effect of co-bound FOXA1 and AP-1 on local chromatin needs to be further elucidated.

STAR★METHODS

RESOURCE AVAILABILITY

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Lu Bai (lub15@psu.edu).

Materials availability

Requests for plasmids and cell lines should be directed to Lu Bai (lub15@psu.edu).

Data and code availability

  • Statement about the Data: ChIP-ISO datasets from this study, including the raw data in fastq format and the processed raw counts files, are available on Gene Expression Omnibus (GEO) with accession number GSE247411. ChIP-seq datasets produced in this study, including the raw data in fastq format and the processed BED and bigWig formats, are available on GEO with accession number GSE247412. RNA-seq datasets from this study, including the raw data in fastq format and the processed feature counts files, are available on GEO with accession number GSE247414. EMSA-seq datasets, including the raw data in fastq format and the processed raw counts files, are available on GEO with accession number GSE247431. Raw images for immunostaining experiments are available on Mendeley Data: https://data.mendeley.com/datasets/x6z25n3z2z/1. Previously published data used in this study are summarized in Table S2.

  • Statement about the Code: No new code is generated by this study.

  • General statement: “Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.”

EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS

Cell lines

WT A549 human lung carcinoma epithelial cells, a gift from Dr. Yanming Wang, were maintained in Ham’s F-12K (Kaighn’s) Medium (Gibco 21127022) supplemented with 10% FBS (Gibco 16000044) and 1% Penicillin-Streptomycin (Gibco 15070063). WT HepG2 human liver cancer cells and MCF-7 human breast cancer cells were obtained from ATCC and maintained in Dulbecco’s Modified Eagle Medium (DMEM) (Gibco 10569044) supplemented with 10% FBS (Gibco 16000044) and 1% Penicillin-Streptomycin (Gibco 15070063). All cells were cultured at 37°C in a humidified incubator with 5% CO2. Cells at passage number one were thawed and passaged at least an additional two times prior to experimental usage. All cell lines used in this study can be found in Table S3.

METHOD DETAILS

Synthetic oligonucleotide design

The ChIP-ISO sequence library comprises 3,203 different sequences, each 229bp in length with 193bp variable regions and 18bp primer-binding regions on the two sides, each containing a BsaI or BbsI recognition site (different subsets of oligos use different primers and cutting sites) (Figure S1C). A detailed breakdown of the library composition can be found in Table S1. The synthetic oligo library was ordered from Agilent (Product #G7220A).

To design the ISO library with CCND1e variants in Figure 2, a 193bp region from CCND1e (chr11:69,654,913–69,655,105) was selected (Figure S1E), and an internal BbsI cutting site was mutated to distinguish it from the native CCND1e. The library sequences were designed using a MATLAB program developed previously in the lab.65 Each FOXA1 motif was mutated, reversed, or converted into consensus (Figure S2C). For the motifs of the other eight co-factors, the 1 to 3 most consensus bases were mutated to their complementary bases to maintain GC content, while avoiding interfering with neighboring motifs (checked by MEME “FIMO”77, Version 5.5.4). Some combinations of FOXA1 and co-factor motif variations were also included.
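The complementary-base substitution described above can be illustrated with a minimal Python sketch. The function names, the example sequence, and the mutated positions are hypothetical and chosen only for illustration; the actual library was designed with the lab's MATLAB program, which is not reproduced here. The point of the sketch is that swapping A↔T and G↔C disrupts a motif while leaving the GC content unchanged.

```python
# Illustrative only: names, sequence, and positions are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def gc_content(seq):
    """Fraction of G/C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)

def complement_mutate(seq, positions):
    """Replace the bases at the given 0-based positions with their complements."""
    bases = list(seq)
    for pos in positions:
        bases[pos] = COMPLEMENT[bases[pos]]
    return "".join(bases)

# Hypothetical AP-1-like site (TGAGTCA) with short flanks; mutate three core bases.
original = "CCTGAGTCAGG"
mutant = complement_mutate(original, positions=[3, 4, 5])

print(original, round(gc_content(original), 3))
print(mutant, round(gc_content(mutant), 3))
```

In a real design, each mutant would still need to be rescanned (for example with FIMO, as described above) to confirm that no neighboring motif was created or destroyed.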

To design the ISO library containing native sequences with FOXA1 and TF cobinding in Figure 3, overlaps between the top 30,000 genomic FOXA1 peaks from our FOXA1 ChIP-seq data and TF ChIP-seq peaks from ENCODE were identified using BEDTools “intersect intervals”78 (Version 2.30.0). The number of overlapping regions was divided by 30,000 (number of total FOXA1 peaks) to give the percent of FOXA1 peaks overlapping the TF. The enrichment of the TF motif within the 30,000 FOXA1 peaks was calculated using MEME “AME”79 (Version 5.5.4). A selection of TFs having a large percentage of FOXA1 peak overlaps and high motif enrichment in FOXA1 peaks were chosen for further analysis. FOXA1 and TF motifs within FOXA1 / TF overlapped regions were identified using FIMO with Position Weight Matrices (PWM) obtained from JASPAR or CIS-BP. These regions were filtered to select only for sequences containing a FOXA1 and TF motif within 8–30bp measured from the center of each motif. 10 to 20 regions were included in the synthetic library for each TF. Additionally, for each region, versions containing a mutated FOXA1 or TF motif were also included.
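The two selection criteria in this paragraph, the percentage of FOXA1 peaks overlapping a TF and the 8–30bp motif spacing filter, can be sketched in Python as follows. This is a simplified stand-in for the BEDTools and FIMO steps, not the pipeline used in the study: the function names and toy coordinates are hypothetical, intervals are assumed to be half-open (BED style), and the 8–30bp range is assumed to be inclusive.

```python
# Illustrative only: toy coordinates and function names are hypothetical.

def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def percent_overlapping(foxa1_peaks, tf_peaks):
    """Percent of FOXA1 peaks that overlap at least one TF peak."""
    hits = sum(any(overlaps(p, q) for q in tf_peaks) for p in foxa1_peaks)
    return 100.0 * hits / len(foxa1_peaks)

def within_spacing(foxa1_motif, tf_motif, min_bp=8, max_bp=30):
    """Check center-to-center distance between two (start, end) motifs.

    The inclusive bounds are an assumption about how "8-30bp" is applied.
    """
    center_a = (foxa1_motif[0] + foxa1_motif[1]) / 2.0
    center_b = (tf_motif[0] + tf_motif[1]) / 2.0
    return min_bp <= abs(center_a - center_b) <= max_bp

foxa1_peaks = [("chr1", 100, 300), ("chr1", 1000, 1200),
               ("chr2", 50, 250), ("chr3", 10, 110)]
tf_peaks = [("chr1", 250, 400), ("chr2", 300, 500), ("chr3", 0, 20)]

pct = percent_overlapping(foxa1_peaks, tf_peaks)
print(pct)
```

The nested loop is quadratic in the number of peaks and is only suitable for toy inputs; with 30,000 peaks, a sorted-interval tool such as BEDTools is the practical choice.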

To design the ISO library “set 1” in Figure 6A, which contains strong FOXA1 motifs but shows very weak or no FOXA1 binding, we first used FIMO to locate all FOXA1 motifs in the genome and calculated their motif scores based on PWM. A subset of motifs with scores >16 that have no overlap with FOXA1 ChIP-seq peaks (evaluated by BEDTools “intersect intervals”) was selected for the library. For set 2, where weak FOXA1 motifs are associated with strong ChIP-seq peaks, we first sorted FOXA1 ChIP-seq peaks based on their intensities using deepTools2.80 Among the top 50% of the peaks, the corresponding genomic sequences (peak center +/− 100bp) were retrieved using BEDTools getfasta (Version 2.30.0), and FOXA1 motifs within these sequences were identified by FIMO. Sequences that contain a single motif with a score of 10–14.5 were selected for the library. To design the ISO library containing FOXA1 motifs covered by different epigenetic marks, FIMO was first used to identify FOXA1 motifs within FOXA1 ChIP-seq peaks and ChIP-seq peaks of different histone modifications. FOXA1 motifs present in both FOXA1 and H3K9me3/H3K27me3 peaks represent FOXA1-bound motifs within these repressive regions. FOXA1 motifs associated with FOXA1 peaks but absent in all histone modification datasets were labeled as FOXA1-bound motifs within unmarked regions. FOXA1-unbound motifs within repressive or unmarked regions were labeled similarly. A random subset of 50–60 sequences within each of these categories was included in the ChIP-ISO library, with the exception of FOXA1-bound motifs within H3K9me3-marked regions, in which all 21 regions were included. Each 193bp sequence was designed to include the genomic region surrounding the centered FOXA1 motif.
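The set 1 and set 2 selection rules can be summarized with a short Python sketch. The record fields (`score`, `in_peak`, `intensity`, `motif_scores`), the function names, and the toy values are hypothetical; in the study, motif scores came from FIMO and peak overlaps from BEDTools. Treating the 10–14.5 score window as inclusive is also an assumption.

```python
# Illustrative only: field names, function names, and values are hypothetical.

def select_set1(motifs, min_score=16.0):
    """Strong motifs (score > min_score) that do not overlap a FOXA1 ChIP-seq peak."""
    return [m for m in motifs if m["score"] > min_score and not m["in_peak"]]

def select_set2(peaks, low=10.0, high=14.5, top_fraction=0.5):
    """Among the strongest peaks, keep those with exactly one weak motif."""
    ranked = sorted(peaks, key=lambda p: p["intensity"], reverse=True)
    top = ranked[: int(len(ranked) * top_fraction)]
    return [p for p in top
            if len(p["motif_scores"]) == 1
            and low <= p["motif_scores"][0] <= high]

motifs = [
    {"id": "m1", "score": 17.2, "in_peak": False},  # strong, unbound: set 1
    {"id": "m2", "score": 16.5, "in_peak": True},   # strong but bound
    {"id": "m3", "score": 12.0, "in_peak": False},  # weak motif
]
peaks = [
    {"id": "p1", "intensity": 90, "motif_scores": [12.3]},        # set 2
    {"id": "p2", "intensity": 80, "motif_scores": [11.0, 15.0]},  # two motifs
    {"id": "p3", "intensity": 20, "motif_scores": [13.0]},        # bottom 50%
    {"id": "p4", "intensity": 10, "motif_scores": [18.0]},        # bottom 50%
]

set1_ids = [m["id"] for m in select_set1(motifs)]
set2_ids = [p["id"] for p in select_set2(peaks)]
print(set1_ids, set2_ids)
```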

Plasmid construction

All plasmids and primers used in this study can be found in Table S3. The plasmid library backbone was derived from pAAVS1-Nst-MCS, which was a gift from Knut Woltjen (Addgene plasmid # 80487; http://n2t.net/addgene:80487; RRID:Addgene_80487). A ~2kb human genomic region containing the CCND1e (chr11:69,653,809–69,655,876) was cloned between the PacI and SalI cutting sites. The two endogenous BsaI cutting sites on the resulting plasmid were mutated. The 193bp CCND1e sequence (chr11:69,654,913–69,655,105) was replaced by two BsaI cutting sites. The resulting plasmid named pCX1.10 was used as the backbone plasmid for ChIP-ISO plasmid library construction.

The plasmid expressing Cas9 and gRNA was derived from eSpCas9(1.1), a gift from Feng Zhang (Addgene plasmid # 71814; http://n2t.net/addgene:71814; RRID:Addgene_71814). sgRNA-T2 (5’-GGGGCCACTAGGGACAGGAT-3’) was cloned between the two BbsI cutting sites. The resulting plasmid was named pCX3.10.

To construct the plasmid for A-FOS overexpression, A-FOS sequence was PCR amplified from plasmid CMV500 A-FOS, a gift from Charles Vinson (Addgene plasmid # 33353; http://n2t.net/addgene:33353; RRID:Addgene_33353), and cloned into Xlone-GFP plasmid81 (a gift from Dr. Lance Lian), replacing the GFP gene between the KpnI and SpeI cutting sites. The resulting plasmid was named pCX4.1.

To design the piggyBac gRNA-containing vector used to randomly integrate CCND1e-targeting gRNAs into the genome, we followed a protocol described previously, with the following changes.82 Two CCND1e-targeted gRNAs were designed using CHOPCHOP at a distance of roughly −300bp from the first FOXA1 motif and +300bp from the third FOXA1 motif. Oligonucleotides (IDT) corresponding to these two gRNA sequences were cloned into pGEP179_pX330K (Addgene 137882) and pX330S-2 (Addgene Kit 1000000055) according to the kit instructions. These gRNA-containing vectors were assembled by Gibson assembly into a single entry vector for Gateway cloning into pGEP163 (Addgene 137881), resulting in the plasmid pGEP163_CCND1_U2_D4.

Generation of the plasmid library for ChIP-ISO

The synthetic oligonucleotide library was resuspended in TE buffer, pH 8.0, and diluted with water to 10 nM for PCR amplification. For each 1,000 types of oligonucleotides, 32 μl of 10 nM diluted synthetic library was amplified in a 400 μl PCR reaction (final template concentration 800 pM) for 13 cycles, using NEBNext Ultra II Q5 master mix (NEB M0544S). The PCR product was purified using Amicon Ultra-2mL 50K centrifugal filter (Millipore UFC205024) to exchange the PCR solution for 1X NEB CutSmart buffer. DNA concentration was estimated by agarose electrophoresis. ~1.8 μg of the amplified library was digested with 60 U of BsaI/BbsI at 37°C overnight, followed by adding 30 U of extra BsaI/BbsI and digestion at 37°C for another two hours. The digestion products were purified with AMPure XP beads (Beckman Coulter MSPP-A63880) with a beads to DNA ratio of 1.8. 5 μg of pCX1.10 plasmid was digested with 120 U of BsaI at 37°C overnight, followed by adding 30 U of extra BsaI and digestion at 37°C for another two hours. 10 U of CIP was added into the digestion reaction, followed by incubation at 37°C for one hour to dephosphorylate 5’-ends. The linearized plasmid backbone was purified with E.Z.N.A. cycle pure kit (Omega D6492–02). 600 ng linearized plasmid backbone and 48 ng of digested library (molar ratio of 1:3) were ligated with 5 μL (2000 U) of T4 DNA ligase (NEB M0202S) in a 100 μL reaction. The ligation reaction was incubated at 16°C overnight, purified with E.Z.N.A. cycle pure kit and eluted with 30 μL of water. The purified ligation product was transformed into 5-alpha electrocompetent E. coli (NEB C2989) via electroporation. In each electroporation reaction, 25 μL of electrocompetent cells was transformed with 2.5 μL of purified ligation product. An adequate number of electroporation reactions was performed to produce at least ~100,000 colonies per 1,000 types of oligonucleotides. The E. coli cells were then pooled and grown overnight with ampicillin selection, followed by plasmid extraction using the E.Z.N.A. plasmid DNA maxi kit (Omega D6922–02).
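The dilution and library-coverage figures quoted in this protocol follow from simple arithmetic, shown here as a small Python check. The function name is hypothetical; the input numbers are taken directly from the text above.

```python
# Illustrative arithmetic check; the function name is hypothetical.

def final_concentration_pM(stock_nM, stock_volume_ul, reaction_volume_ul):
    """Final concentration (pM) after diluting a stock into a reaction volume."""
    return stock_nM * 1000.0 * stock_volume_ul / reaction_volume_ul

# 32 ul of 10 nM library in a 400 ul PCR reaction
template_pM = final_concentration_pM(10, 32, 400)

# Target of ~100,000 colonies per 1,000 oligonucleotide types
colonies_per_oligo = 100_000 / 1_000

print(template_pM, colonies_per_oligo)
```

The first value reproduces the stated 800 pM template concentration, and the second shows that the colony target corresponds to roughly 100 bacterial colonies per library member.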

Generation of the cell library for ChIP-ISO

9.06 × 10⁶ WT A549 cells were plated per 15 cm dish in 22.5 mL of Ham’s F-12K (Kaighn’s) Medium supplemented with 10% FBS and 1% Penicillin-Streptomycin. 24 hours later, cells in each dish were transfected with 7.875 μg of pCX3.10 plasmid (expressing gRNA and Cas9), 23.625 μg of library plasmids, 96.75 μL of lipofectamine 3000 reagent (ThermoFisher L3000015) and 63 μL of P3000 reagent, which were diluted in Opti-MEM. 8–10 hours post-transfection, the media was replaced with fresh Ham’s F-12K (Kaighn’s) Medium supplemented with 10% FBS and 1% Penicillin-Streptomycin. 48 hours post-transfection, the cells in each dish were dissociated from the dish and split into two 15 cm dishes with Ham’s F-12K (Kaighn’s) Medium supplemented with 15% FBS, 1% Penicillin-Streptomycin and 600 μg/mL of G418. Media changes were performed every three to four days while the G418 selection was kept. When cell colonies were visible, the number of colonies was estimated by counting colonies inside randomly sampled grids on the dish under the microscope. An adequate number of transfection reactions was performed, which produced ~92,000 colonies for 3,203 types of sequences. The cells were then dissociated from the dishes, disaggregated, pooled and plated in new 15 cm dishes with Ham’s F-12K (Kaighn’s) Medium supplemented with 10% FBS, 1% Penicillin-Streptomycin and 300 μg/mL of G418. The pooled cell library was maintained and expanded for ChIP.
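The grid-based colony estimate and the resulting library coverage can be sketched as follows. The grid counts, grid area, and dish area are hypothetical placeholders (the text does not report them), and the function names are illustrative; only the ~92,000 colonies and 3,203 sequences come from the protocol above.

```python
# Illustrative only: grid counts and areas are hypothetical placeholders.

def estimate_colonies(grid_counts, grid_area_cm2, dish_area_cm2):
    """Extrapolate the colony number on a dish from randomly sampled grids."""
    mean_density = sum(grid_counts) / (len(grid_counts) * grid_area_cm2)
    return mean_density * dish_area_cm2

def coverage(total_colonies, n_sequences):
    """Average number of independent colonies per library sequence."""
    return total_colonies / n_sequences

# Hypothetical example: four sampled 1 cm^2 grids on a dish of about 150 cm^2
per_dish = estimate_colonies([4, 6, 5, 5], grid_area_cm2=1.0, dish_area_cm2=150.0)

# Values reported in the text: ~92,000 colonies for 3,203 sequences
fold = coverage(92_000, 3_203)

print(per_dish, round(fold, 1))
```

With the reported numbers, each library sequence is represented on average by roughly 29 independent integration events.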

Chromatin immunoprecipitation (ChIP) and ChIP-seq

ChIP was performed with the cell library for ChIP-ISO following a standard ChIP protocol. To fix protein-DNA interactions, formaldehyde (Ricca Chemical Company RSOF0010250A) was added to 2 × 10⁷ adherent log-phase cells to a final concentration of 1% and incubated for 10 minutes at room temperature. For FOXA1 and histone modification ChIP-ISO samples, 1.6 × 10⁸ cells and 8 × 10⁷ cells were fixed, respectively, in individual plates of 2 × 10⁷ cells for each replicate. Cross-linking was quenched by addition of glycine to a final concentration of 0.125 M and incubated for 5 minutes at room temperature. Cells were washed twice with cold 1X DPBS. Cells were scraped into cold 1X DPBS and pelleted at 4°C. In some cases, cell pellets were snap frozen using liquid nitrogen and stored at −80°C until ready to proceed. Fresh or thawed cell pellets were lysed by incubating cells in 2.5 mL cell lysis buffer (5 mM PIPES pH 8.0, 85 mM KCl, 0.5% NP-40) supplemented with protease inhibitor cocktail (Sigma-Aldrich P8340) for 10 minutes on ice. Cell nuclei were pelleted at 4°C and lysed in 150 μL nuclei lysis buffer (50 mM Tris-Cl pH 8.0, 10 mM EDTA, 1% SDS) supplemented with protease inhibitor cocktail for 10 minutes on ice. Chromatin was fragmented in Diagenode Pico with a circulating water bath at 4°C using the Shear and Go Easy Mode setting for 3 cycles (30 seconds on followed by 30 seconds off). Sonicated chromatin was centrifuged to remove cell debris and residual SDS precipitate. The supernatant containing sheared chromatin was pooled across all replicates, a 50 μL input DNA sample was reserved, and the remaining pool was split again into eight (FOXA1) or four (histone modification) chromatin samples. In some cases, supernatant containing sheared chromatin was snap frozen using liquid nitrogen and stored at −80°C until ready to proceed.

For each ChIP sample, 20 μL Magna ChIP Protein A+G Magnetic Beads (Sigma-Aldrich 16–663) was washed four times with 1X DPBS supplemented with 5 mg/mL BSA and subsequently crosslinked to 5 μg antibody (Anti-FOXA1 antibody: GeneTex, GTX100308; Anti-Fra2 antibody: Cell Signaling Technology, 19967S; Anti-H3K9me3 antibody: abcam, ab8898; and Anti-H3K27me3 antibody: abcam, ab6002) for two hours at 4°C. Antibody-crosslinked magnetic beads were washed four additional times with the DPBS/BSA solution. Each chromatin sample except the input was incubated with the washed antibody-crosslinked magnetic beads for two hours at 4°C. Next, the magnetic beads were washed five times with LiCl wash buffer (100 mM Tris pH 7.5, 500 mM LiCl, 1% NP-40, 1% sodium deoxycholate) and once with 1X TE buffer (10 mM Tris-HCl pH 7.5, 0.1 mM Na2EDTA) at room temperature. Immunoprecipitated chromatin was eluted from the magnetic beads by incubating in IP elution buffer (1% SDS, 0.1 M NaHCO3) for 1 hour at 65°C. The collected supernatant (containing immunoprecipitated chromatin) and the reserved input sample were both incubated with 40 μg RNase A and NaCl to a final concentration of 0.37 M overnight at 65°C. The next day, 80 μg proteinase K was added to the ChIP sample and 400 μg was added to the input sample and both were incubated at 55°C for 2 hours. The ChIP and input DNA was purified via phenol-chloroform extraction. At the DNA elution step, the individual ChIP DNA pellets were resuspended together to achieve a more concentrated sample.

For ChIP-seq, the same ChIP protocol was used, but with the following modifications. For each ChIP biological replicate, only 2 × 10⁷ adherent log-phase cells were fixed. 50 μL of sheared chromatin was reserved per replicate as the genomic input sample. At the DNA elution step, each ChIP pellet was resuspended separately. Prior to any downstream applications, the success of the ChIP reaction was determined via qPCR using various positive, negative, and locus-specific primer pairs. Sequencing libraries were constructed using the NEBNext Ultra II DNA library prep kit (NEB E7103L). 50 million paired-end 50bp reads were obtained for each ChIP and input sample using a NextSeq 2000 instrument.

Amplicon sequencing for ChIP-ISO

To amplify the integrated ChIP-ISO library sequences while excluding other genomic DNA, including native CCND1e, we performed two rounds of PCR amplification with a BbsI digestion step in between (Figure S1J). The primer pairs for the first round of PCR contain regions annealing to the CCND1e (outside the 193bp library sequence) at the 3’-ends, partial Illumina TruSeq adaptor sequences at the 5’-end and 0–3 random nucleotide spacers in between to increase sequence complexity. Primers with different numbers of spacers are mixed in equimolar ratio for the first round of PCR. Preliminary PCR tests were performed to decide the optimal cycle number that keeps the PCR reactions in exponential phase. We used 23–25 cycles for our first round of PCR. For the first round of PCR, 30 μl of ChIP DNA was amplified in a 100 μl PCR reaction using NEBNext Ultra II Q5 master mix. The PCR products were purified with AMPure XP beads with a beads to DNA ratio of 0.9. 15 μL of the purified PCR product was digested with 20 U of BbsI in a 30 μL reaction at 37°C for two hours. The digestion products were purified with AMPure XP beads with a beads to DNA ratio of 0.9, followed by the second round of PCR. The primer pairs for the second round of PCR contain the rest of the Illumina TruSeq adaptor sequences and sample indexes. For the second round of PCR, 2 μl of purified digestion product was amplified in a 50 μl PCR reaction for 8 cycles using NEBNext Ultra II Q5 master mix. The PCR products were purified with AMPure XP beads with a beads to DNA ratio of 0.8. Quality control was conducted with TapeStation (Agilent). 30 million paired-end 150bp reads were obtained for each ChIP-ISO and input sample using a NextSeq 2000 (Illumina) instrument. Demultiplexing was performed using DRAGEN BCL Convert (v3.8.4).

Low-throughput ChIP-qPCR test of binding on single integrated sequences

The overall process was the same as ChIP-ISO. Instead of the synthetic oligonucleotide library, single synthetic sequences were cloned into the pCX1.10 plasmid backbone. WT A549 cells were transfected with the resulting plasmids individually, together with pCX3.10 plasmid. For each synthetic sequence, the cell colonies were pooled together after G418 selection, and expanded for ChIP. To measure TF binding to the integrated synthetic sequences and the native CCND1 enhancer separately, quantitative PCR (qPCR) was conducted with primer pairs that can distinguish between the integrated and the native sequences (Table S3). Locked nucleic acids (LNAs) were incorporated into the primers to increase specificity. qPCRs were performed using the Agilent AriaMx Real-Time PCR System with the SYBR Green optical module (Emission 516.0 nm, Excitation 462.0 nm). The ChIP-qPCR signal for each sequence of interest was normalized to the ChIP-qPCR signal of a positive or negative control sequence.

RNA-seq

Cells were lysed in Trizol (ThermoFisher 15596026) and extracted with 0.2 volumes of chloroform, after which an equal volume of 100% ethanol was added. The RNeasy kit (Qiagen 74104) was then used to purify the RNA. 3 μg of purified RNA was treated with 2 U of RNase-free DNase I and purified again with the RNeasy kit. Sequencing libraries were constructed with the Illumina Stranded mRNA Prep kit (Illumina 20040532). 30 million paired-end 50bp reads were obtained for each RNA-seq sample using a NextSeq 2000 instrument. Data analysis was conducted based on a protocol from Batut et al.83

A-FOS cell line construction

To construct the A549 ePB tet-on A-FOS cell line, 6 × 10⁵ WT A549 cells were plated in one well of a 6-well plate in Ham’s F-12K (Kaighn’s) Medium supplemented with 10% FBS and 1% Penicillin-Streptomycin. 24 hours later, cells were transfected with 0.72 μg of piggyBac transposase plasmid (System Biosciences PB210PA-1) and 1.78 μg of pCX4.1 plasmid using Lipofectamine 3000 transfection reagent. 12 hours post-transfection, the media was replaced with fresh medium. 48 hours post-transfection, the cells in the well were dissociated from the dish and split into two wells of a 6-well plate with Ham’s F-12K (Kaighn’s) Medium supplemented with 15% FBS, 1% Penicillin-Streptomycin and 10 μg/mL of blasticidin. Media changes were performed every three to four days while the blasticidin selection was maintained. When cell colonies were visible, the cells were dissociated from the wells, disaggregated, pooled and maintained in Ham’s F-12K (Kaighn’s) Medium supplemented with 10% FBS, 1% Penicillin-Streptomycin and 5 μg/mL of blasticidin. The expanded cells were subjected to immunofluorescence, ChIP-seq and RNA-seq.

Recombinant protein expression and purification

Mouse FOXA1 (UniProtKB: P35582) fused to an N-terminal 6x-histidine tag was expressed in BL21(DE3)pLysS E. coli cells (Novagen 69388–3) at 37°C for 3 hours using the bacterial expression plasmid pET-28b-FOXA1, a gift from K. Zaret, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA.84 Harvested bacterial cells were resuspended and lysed by sonication in P300 buffer (50 mM sodium phosphate pH 7.0, 300 mM NaCl, 5 mM 2-mercaptoethanol, 1 mM benzamidine). Following extraction of soluble proteins, insoluble material containing FOXA1 was resuspended and sonicated in P300 buffer with 7 M urea added. Solubilized FOXA1 was isolated using Ni-NTA chromatography (GoldBio H-350–25), and further purified by Source S cation-exchange chromatography (Cytiva 17–0944-01). FOXA1 protein was refolded by dialyzing against 10 mM HEPES pH 7.5, 100 mM NaCl, 100 mM 2-mercaptoethanol, 0.9 M urea. Glycerol was added to 20% for storage.

Genes encoding human c-Fos (UniprotKB: P01100) fused to an N-terminal 6x-histidine tag and untagged human c-Jun (UniprotKB: P05412) were subcloned into pST3985 and pST50Tr,86 respectively, from pST39-F:cJun/6xHis:cFos, a gift from C.M. Chiang, UT Southwestern Medical Center, Dallas, TX.87 Expression of each protein was carried out separately in Rosetta2(DE3)pLysS E. coli cells (Novagen 70951) at 37°C. Soluble proteins were extracted as described for FOXA1, and insoluble materials containing c-Fos and c-Jun were processed separately. c-Fos was solubilized by resuspension and sonication in T100 buffer (20 mM Tris-HCl pH 7.5, 100 mM NaCl, 5 mM DTT). Following centrifugation, clarified extract was dialyzed into P300 buffer containing 7 M urea, and c-Fos was partially purified from it using Ni-NTA chromatography. Insoluble material containing c-Jun was washed three times in T100 buffer, followed by solubilization in 20 mM Tris-HCl pH 7.5, 1 mM EDTA, 1 mM DTT, 6 M guanidine-HCl. Refolding of c-Fos/c-Jun heterodimers was performed by stepwise dialysis as described by Ferguson and Goodrich88 and purified by cobalt metal-affinity chromatography (Talon resin, Clontech 635652). cFos/cJun was stored in 18 mM Tris-HCl pH 7.5, 90 mM NaCl, 9 mM 2-mercaptoethanol, 20% glycerol. Proteins were analyzed by SDS-PAGE (Figure S5A).

Electrophoretic mobility shift assay with sequencing (EMSA-seq)

EMSA-seq is inspired by a similar method, SeEN-seq.89 A pooled equimolar mixture of the ChIP-ISO synthetic oligonucleotide library was PCR-amplified and purified via agarose gel purification (Thermo Scientific K0691). The EMSA protocol was adapted from Michael et al.89 Briefly, 100 nM ChIP-ISO synthetic oligonucleotide library was incubated with 50X non-specific competitor DNA and 0–60 nM recombinant mouse 6xHis-FOXA1 in binding buffer (10 mM Tris-HCl pH 7.5, 1 mM MgCl2, 10 μM ZnCl2, 50 mM KCl, 3 mg/mL BSA, 10% glycerol, and 1 mM DTT) at room temperature for 30 minutes. Free and FOXA1-bound library sequences were separated on a 7.5% non-denaturing polyacrylamide gel (Bio-Rad 4561026) run in 1X Tris-Glycine at 200V at room temperature for 30 minutes. Gels were stained with 1 μg/mL Ethidium Bromide (Invitrogen 15585011) in 1X Tris-Glycine for 10 minutes at room temperature. Stained gels were visualized with a Bio-Rad GelDoc Go Imaging System using the Ethidium Bromide setting (Figure 5A). The FOXA1-bound and -unbound DNA bands were excised at each FOXA1 concentration, and the DNA was eluted from each polyacrylamide gel slice following a User-Developed Protocol for extraction of DNA fragments from polyacrylamide gel using the QIAGEN QIAquick Gel Extraction Kit (QIAGEN 28704). The gel-extracted DNA was PCR-amplified and purified using AMPure XP Beads (Beckman Coulter A63880) with a 0.9x bead cleanup ratio. The library was created in the same manner as the ChIP-ISO library.

30 million paired-end 150bp reads were obtained for FOXA1-bound and -unbound DNA samples at each protein concentration using a NextSeq 2000 instrument. The sequencing data was processed and analyzed in the same manner as the ChIP-ISO datasets. The number of filtered paired-end reads aligning to each ChIP-ISO library sequence was counted for each bound and unbound sample and normalized to the number of sequencing reads for each sample. The ratio of FOXA1-bound DNA to total input (FOXA1-bound + FOXA1-unbound) DNA was calculated for each synthetic oligonucleotide library sequence at each FOXA1 concentration and normalized to the corresponding FOXA1-bound ratio of negative control CCND1e-FOXA1all_mut (Index: 22). To correct for systematic error between the two replicates, these negative-normalized values were multiplied by a constant such that the slope of the linear fit of the plot of the two replicates approached 1 at each FOXA1 concentration. The adjusted ratios were further normalized to the highest ratio across all FOXA1 concentrations, forcing ratios to fall between 0 and 1 for ease of analysis. For simplicity, these values are called the “ratio bound in vitro.” To determine the FOXA1 concentration at which the ratio bound in vitro falls into the linear range for most sequences, these values were plotted for all library sequences in the top-down CCND1e FOXA1 motif category (Figure 5B). For each sequence, the data points were fit with the Hill slope equation $Y = B_{max} X^{h} / (K_d^{h} + X^{h})$, where X = FOXA1 concentration and Y = ratio bound in vitro, to model specific FOXA1 binding. For all quantitative analyses, the ratio of each synthetic oligonucleotide library sequence bound by FOXA1 in vitro was approximated by its FOXA1-bound ratio at 15 nM FOXA1, which was interpolated by averaging its FOXA1-bound ratios at 10 nM and 20 nM FOXA1. This FOXA1 concentration was chosen because it falls into the linear range of the FOXA1 binding curves of most sequences (Figure 5B).
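The ratio calculation described above can be illustrated with a minimal Python sketch. The function names and read counts below are hypothetical and are not taken from the study's code; the replicate slope correction and the final scaling to the highest ratio are omitted for brevity.

```python
def hill(x, bmax, kd, h):
    """Hill equation: Y = Bmax * X^h / (Kd^h + X^h)."""
    return bmax * x**h / (kd**h + x**h)


def ratio_bound(bound_reads, unbound_reads, bound_total, unbound_total):
    """Bound / (bound + unbound), after normalizing each count to its sample depth."""
    b = bound_reads / bound_total
    u = unbound_reads / unbound_total
    return b / (b + u)


def interpolate_15nm(ratio_10nm, ratio_20nm):
    """Approximate the 15 nM value as the mean of the 10 nM and 20 nM values."""
    return (ratio_10nm + ratio_20nm) / 2.0


# Hypothetical counts for one library sequence and for the negative control.
seq_ratio = ratio_bound(bound_reads=300, unbound_reads=100,
                        bound_total=1_000_000, unbound_total=1_000_000)
neg_ratio = ratio_bound(bound_reads=100, unbound_reads=300,
                        bound_total=1_000_000, unbound_total=1_000_000)

# Normalize the sequence's bound ratio to that of the negative control.
neg_normalized = seq_ratio / neg_ratio
```

In this sketch, a sequence whose normalized value exceeds 1 is bound more strongly than the negative control at that FOXA1 concentration.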

FOXA1 and AP-1 co-binding electrophoretic mobility shift assays (EMSAs)

Three CCND1e DNA templates were designed to include different FOXA1 and AP-1 motif mutants. FOXA112_mut has mutations in the two upstream FOXA1 motifs, FOXA1all_mut has mutations in all three FOXA1 motifs, and FOXA112_mut, AP-1mut has mutations in the two upstream FOXA1 motifs and single AP-1 motif. These DNA templates were individually PCR-amplified and purified using a PCR clean-up kit (Omega Bio-tek D6492). In the FOXA1 EMSAs, 100 nM CCND1e DNA was incubated with 50X non-specific competitor DNA and 0–200 nM recombinant mouse 6xHis-FOXA1 in DNA-binding buffer (10 mM Tris-HCl pH 7.5, 1 mM MgCl2, 10 μM ZnCl2, 50 mM KCl, 3 mg/mL BSA, 10% glycerol, and 1 mM DTT), with or without 300 nM recombinant human AP-1. Similarly, in the AP-1 EMSAs, up to 400 nM cJun/6His:cFos was titrated into the same buffer/DNA solution, with or without 150 nM FOXA1. The EMSA samples were incubated at room temperature for 30 minutes and separated on a 7.5% non-denaturing polyacrylamide gel (Bio-Rad 4561026) run in 1X Tris-Glycine at 200V at room temperature for 30 minutes. Gels were stained with 1 μg/mL Ethidium Bromide (Invitrogen 15585011) in 1X Tris-Glycine for 10 minutes at room temperature. Stained gels were visualized with a Bio-Rad GelDoc Go Imaging System using the Ethidium Bromide setting (Figures 5F and 5H). For the FOXA1 titration EMSAs, the unbound fraction was calculated at each FOXA1 concentration by normalizing the intensity of the free band to the intensity of the free band at 0 nM FOXA1 ± AP-1 (Figures 5G and 5I). The equivalent calculations were performed for the AP-1 titration EMSAs.
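As an illustration of the unbound-fraction calculation, the following Python sketch normalizes each titration series to its own 0 nM lane, so that the series with and without AP-1 are each scaled independently. The band intensities are hypothetical, and the function name is introduced here only for illustration.

```python
def unbound_fraction(free_band_intensities):
    """Normalize each free-band intensity to the 0 nM lane (the first entry)."""
    reference = free_band_intensities[0]
    if reference <= 0:
        raise ValueError("the 0 nM free-band intensity must be positive")
    return [intensity / reference for intensity in free_band_intensities]


# Hypothetical free-band intensities (arbitrary units) at 0, 50, 100, 200 nM FOXA1.
without_ap1 = unbound_fraction([1000.0, 800.0, 500.0, 250.0])
with_ap1 = unbound_fraction([900.0, 450.0, 225.0, 90.0])
```

With these made-up numbers, the free band is depleted faster in the presence of AP-1, which is the pattern expected when two factors bind cooperatively.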

CRISPRi

To integrate the KRAB-dCas9 construct into the AAVS1 locus, 2 × 10⁶ cells were co-transfected with 10 μg pT077 (Addgene 137879), 1.5 μg AAVS1 TALEN L (Addgene 59025) and 1.5 μg AAVS1 TALEN R (Addgene 59026) using Lipofectamine 3000 Transfection Reagent (Invitrogen L3000015) according to the manufacturer’s protocol. Transfected cells were transferred to a medium supplemented with 700 μg/mL G418 (Gibco 10131035) 24 hours post-transfection and maintained in this medium to allow for single-cell colony formation (~14 days). Single colonies were picked and seeded into 24-well plates. Colonies were maintained in G418-supplemented medium until they reached sufficient cell density, and those that retained normal cell morphology and growth rate were split for visualization of EGFP and maintenance. To visualize the EGFP expression of each colony, cells were plated into two wells of an 8-well dish (ibidi 80806); one well of cells was induced with 1 μg/mL doxycycline (Sigma-Aldrich D5207) 24 hours after plating and one well was left untreated. 48 hours post-induction, inducible expression of KRAB-dCas9 was confirmed by measuring EGFP expression in induced cells normalized to untreated cells. Colonies with high relative EGFP expression and homogeneity were frozen for storage. The colony with the highest EGFP expression and homogeneity was validated using genotyping PCR to confirm KRAB-dCas9 integration at the AAVS1 locus. To randomly integrate CCND1e-targeting gRNAs throughout the genome of KRAB-dCas9 cells, 6 × 10⁵ cells were co-transfected with 5 μg of gRNA-containing piggyBac vector and 1 μg of piggyBac transposase plasmid (System Biosciences PB210PA-1) using Lipofectamine 3000 Transfection Reagent. Transfected cells were transferred to a medium supplemented with 700 μg/mL G418 and 10 μg/mL Blasticidin S HCl (Gibco A11139) 24 hours post-transfection and maintained in this medium to allow for single-cell colony formation (~14 days).
Constitutive expression of gRNAs was confirmed by measuring the expression of mRFP in a mixed cell population. Anti-Histone H3 (tri-methyl K9) antibody (abcam ab8898) was used to perform H3K9me3 ChIP on the mixed cell population, followed by qPCR to confirm H3K9me3 deposition at the CCND1e. Anti-FOXA1 antibody (GeneTex GTX100308) was used to perform FOXA1 ChIP on the mixed cell population, followed by qPCR, to monitor any changes to FOXA1 binding at the CCND1e upon H3K9me3 deposition.

Immunofluorescence

Immunofluorescence experiments were performed according to a protocol from Yoney et al.90 The following primary antibodies and dilutions were used: FLAG (mouse monoclonal, Millipore-Sigma, F1804, 1:1000), FOXA1 (rabbit polyclonal, GeneTex, GTX100308, 1:500), and FOSL1 (mouse monoclonal, Santa Cruz Biotechnology, sc-28310, 1:50). The following secondary antibodies and dilutions were used: goat anti-mouse IgG(H+L) (Alexa Fluor 594, ThermoFisher, A-11005, 1:1000), and goat anti-rabbit IgG(H+L) (Alexa Fluor 488, ThermoFisher, A-11008, 1:500).

Immunostained A549 ePB tet-on A-FOS cells were imaged using a Leica DMI6000 with a Hamamatsu ORCA-R2 C10600 camera and SOLA SE light source. Images were acquired in phase contrast, GFP, and Texas Red channels with a 40x/1.30 objective. Immunostained WT A549, MCF-7 and HepG2 cells were imaged using a Zeiss Axio Observer 7 with an Axiocam 705 mono camera. Images were acquired in DIC, AF594, AF488 and DAPI channels with a 20x/0.8 objective. The average fluorescence intensity within the nucleus of each cell in the field was calculated using the Zeiss Bio Apps Gene Expression tool. The measurement area was limited to the cell nucleus, which was detected from the signal in the DAPI-stained channel. Average fluorescence intensity in the green/red channel was then measured for each cell in the field and normalized to the DAPI intensity in the same cell to correct for differences in cell permeability across cell types.

QUANTIFICATION AND STATISTICAL ANALYSIS

Sequencing data analysis for ChIP-ISO

Raw sequencing reads were filtered using fastp with default settings66 (Version 0.23.2), and the first three nucleotides were trimmed from each read using cutadapt67 (Version 4.4) to remove the 0–3 random nucleotide spacers introduced by the amplicon primers. Processed forward and reverse reads were merged into single reads based on their overlapping regions using NGmerge68 (Version 0.1). Merged reads were aligned to a FASTA file containing all ChIP-ISO library sequences (including their reversely ligated versions) using BWA-MEM269 (Version 2.2.1). BAM alignments containing at least 2 mismatched nucleotides were filtered out using BAMtools70 (Version 2.4.0). The number of filtered reads aligning to each ChIP-ISO library sequence was counted for each ChIP and input sample and normalized to the total number of sequencing reads for each sample. ChIP-ISO signal was calculated by dividing the normalized number of ChIP counts by the normalized number of input counts. Any sequence having fewer than 1000 input counts was excluded from further analyses, resulting in FOXA1 ChIP-ISO signals for 1,882 sequences. ChIP-ISO signal is reported and plotted as the average of two independent biological ChIP-ISO replicates.
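The signal calculation at the end of this pipeline can be sketched in Python as follows. The sequence identifiers, counts, and function names are hypothetical. Applying the 1000-count input threshold separately within each replicate is an assumption of this sketch, not a detail stated above.

```python
def chip_iso_signal(chip_counts, input_counts, chip_total, input_total, min_input=1000):
    """Return {sequence_id: signal}, skipping sequences with too few input reads."""
    signal = {}
    for seq_id, n_input in input_counts.items():
        if n_input < min_input:
            continue  # excluded: fewer than min_input input counts
        chip_norm = chip_counts.get(seq_id, 0) / chip_total   # depth-normalized ChIP
        input_norm = n_input / input_total                     # depth-normalized input
        signal[seq_id] = chip_norm / input_norm
    return signal


def average_replicates(rep1, rep2):
    """Average the signal for sequences that passed the filter in both replicates."""
    return {k: (rep1[k] + rep2[k]) / 2.0 for k in rep1 if k in rep2}


# Hypothetical counts for three library sequences in two biological replicates.
rep1 = chip_iso_signal({"seq_a": 4000, "seq_b": 500, "seq_c": 50},
                       {"seq_a": 2000, "seq_b": 1000, "seq_c": 900},
                       chip_total=1_000_000, input_total=1_000_000)
rep2 = chip_iso_signal({"seq_a": 6000, "seq_b": 1500, "seq_c": 10},
                       {"seq_a": 2000, "seq_b": 1000, "seq_c": 900},
                       chip_total=2_000_000, input_total=1_000_000)
mean_signal = average_replicates(rep1, rep2)
```

Here `seq_c` is dropped because its input count is below the threshold, and the reported signal for the remaining sequences is the mean of the two replicates.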

ChIP-seq data analysis

Paired-end reads were filtered using fastp with default settings and subsequently aligned to the human reference genome (hg38) using BWA-MEM2. Resulting BAM files were filtered for MAPQ scores > 20 using SAMtools71 (Version 1.8). Mapped regions within the ENCODE Blacklist were excluded from further analysis91 (hg38 Version 2). Read coverage was obtained separately for input and ChIP samples using deepTools “bamCoverage”, with a bin size of 10bp80 (Version 3.5.1), and visualized in IGV (Version 2.8.12) or the UCSC Genome Browser. ChIP peaks were called from pooled ChIP replicates using MACS2 callpeak with the default settings72 (Version 2.1.1.20160309). Heatmaps and intensity profiles were generated using computeMatrix, plotHeatmap, and plotProfile functions in deepTools280 (Version 3.5.4). To calculate the ChIP-seq signal, the number of filtered reads aligning to selected regions was counted for the ChIP and input sample using BEDTools “MultiCovBed” and normalized to the total number of sequencing reads for each sample. The ChIP-seq signal was then calculated by dividing the normalized number of ChIP counts by the normalized number of input counts.

A-FOS-related data analysis

To identify lost, unchanged and gained FOXA1 ChIP-seq peaks in +Dox versus −Dox conditions, differential binding analysis was conducted with Bioconductor “DiffBind”73 (Version 3.18) using default settings. The regions within each category were converted from BED to FASTA format using BEDTools “GetFasta”, and FOXA1 and AP-1 motif scanning was performed inside these regions using MEME “FIMO”. To identify proximal genes of FOXA1 ChIP-seq peaks, MEME “T-Gene”92 (Version 5.5.4) was first used to predict target genes for each category of peaks. The predicted target genes whose distances to the corresponding ChIP-seq peaks were smaller than 100 kb were selected as the proximal genes for further analysis. Gene ontology (GO) analysis on the proximal genes of the lost peaks was performed with “Metascape”74 (Version 3.5.20230501) using default settings.
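The 100 kb proximity filter can be expressed as a short Python sketch. The tuple layout (gene, peak, signed distance in bp) and all names are hypothetical; this sketch does not parse or reproduce the actual T-Gene output format.

```python
MAX_DISTANCE_BP = 100_000  # 100 kb cutoff described above


def proximal_genes(links, max_distance=MAX_DISTANCE_BP):
    """Return sorted gene names with at least one peak closer than the cutoff."""
    return sorted({gene for gene, _peak, distance in links
                   if abs(distance) < max_distance})


# Hypothetical predicted gene-peak links: (gene, peak, signed distance in bp).
links = [
    ("GENE_A", "peak_1", 12_000),
    ("GENE_B", "peak_1", -250_000),
    ("GENE_C", "peak_2", 99_999),
    ("GENE_A", "peak_3", 150_000),
]
selected = proximal_genes(links)
```

A gene is retained if any of its predicted links falls within the cutoff, so `GENE_A` is kept despite also having a distant link.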

Cell-type-specific FOXA1 and AP-1 motifs, binding, and expression

To identify FOXA1 binding sites that are enriched, shared, or depleted in A549 cells compared to HepG2 / MCF-7 cells, a HepG2 FOXA1 ChIP-seq BED file containing significant peaks from at least two replicates was acquired from the ENCODE Database, sorted by coordinate using BEDTools “sortBED”, and pooled with the top 30,000 significant WT A549 FOXA1 ChIP-seq peaks from two replicates we acquired. To determine the number of reads aligning to each region in the pooled BED file across cell types and replicates, the pooled BED file and the corresponding BAM files from two HepG2 and two A549 replicates were input into BEDTools “MultiCovBed” using default parameters. The output from this tool was input into Bioconductor “edgeR”75 (Version 3.34.0) using default settings to identify regions enriched, shared, or depleted in FOXA1 binding in A549 vs HepG2. Regions with a log2 fold-change less than −1 were labeled A549-depleted, between −1 and 1 were labeled shared, and greater than 1 were labeled A549-enriched. The regions within each category were converted from BED to FASTA format using BEDTools “GetFastaBed”, and each category was input into MEME “FIMO” to identify the number of regions containing FOS or JUN motifs. The percent of regions in each category containing each individual FOS or JUN motif was calculated.
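The category assignment and the motif percentage can be sketched in Python as shown below. The region identifiers and fold-change values are hypothetical. Assigning values of exactly −1 or 1 to the shared category is an assumption, since the text above does not state how the boundaries are handled.

```python
def classify_region(log2_fold_change):
    """Label a region by its A549 vs HepG2 log2 fold-change."""
    if log2_fold_change < -1:
        return "A549-depleted"
    if log2_fold_change > 1:
        return "A549-enriched"
    return "shared"  # assumption: boundary values of exactly -1 or 1 fall here


def percent_with_motif(regions, motif_hits):
    """Percent of regions that contain at least one hit for a given motif."""
    if not regions:
        return 0.0
    return 100.0 * sum(1 for r in regions if r in motif_hits) / len(regions)


# Hypothetical fold-changes per region and the set of regions with a motif hit.
log2fc = {"r1": -2.3, "r2": 0.4, "r3": 1.7, "r4": 3.0}
labels = {region: classify_region(value) for region, value in log2fc.items()}
enriched = [r for r, label in labels.items() if label == "A549-enriched"]
pct = percent_with_motif(enriched, motif_hits={"r3"})
```

The percentage is computed per category and per motif, so the same function would be called once for each FOS or JUN motif.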

Neural network architectures

The sequence-only convolutional neural network (CNN) model aims to predict FOXA1 ChIP-seq peaks using DNA sequence input. Briefly, one-hot encoded DNA sequence input of length 240bp is first passed through a 1D convolution layer of 256 filters, where each filter is of size 24 and stride 1. After convolution, the output is processed by ReLU activation and batch normalization. A 1D max-pooling layer of size 15 and stride 15 is then applied to pool the output. The pooled output is fed into a long short-term memory (LSTM) layer to output a 32-length vector. The output vector passes through two dense layers with ReLU activation and Dropout. Finally, a single sigmoid activated linear node outputs the prediction probability.
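To make the layer dimensions concrete, the following Python sketch one-hot encodes a toy 240bp sequence and works out the sequence length after the convolution and max-pooling layers. It assumes unpadded ("valid") convolution and pooling, which the description above does not state; the helper names are illustrative.

```python
def one_hot(seq):
    """One-hot encode DNA as [A, C, G, T] rows; unknown bases become all zeros."""
    table = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
             "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
    return [table.get(base, [0, 0, 0, 0]) for base in seq.upper()]


def output_length(length, kernel, stride=1):
    """Output length of an unpadded 1D convolution or pooling layer."""
    return (length - kernel) // stride + 1


encoded = one_hot("ACGTN" * 48)                                    # 240 x 4 input
after_conv = output_length(len(encoded), kernel=24, stride=1)      # 217 positions
after_pool = output_length(after_conv, kernel=15, stride=15)       # 14 positions
```

Under these assumptions, the LSTM layer would receive a sequence of 14 steps, each with 256 filter activations, and reduce it to the 32-length vector.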

The Bichrom models aim to assess whether chromatin features positively contribute to predicting FOXA1 binding and use a previously published interpretable bimodal neural network architecture named Bichrom.44 Bichrom consists of two independent subnetworks, corresponding to DNA sequence and chromatin input, respectively. The sequence sub-network is the trained CNN network described above with all the trained weights frozen. The final linear node is replaced by another linear node activated by a tanh function. The input to the chromatin sub-network consists of the relevant chromatin feature(s) coverage track(s), binned in 20bp bins, across the same 240bp region as the DNA sequence input. The chromatin feature input passes through a ReLU activated 1D convolution layer of 15 filters (kernel size 1) and an LSTM layer to output a 5-vector. A tanh activated linear node is then used to get the scalar output. The full Bichrom model works by combining the scalar values from both sub-networks into a sigmoid activated linear node to predict the TF binding label. Three Bichrom models were tested: one trained on DNA-sequence and ATAC-seq features; one trained on DNA-sequence and H3K9me3 features; and one trained on DNA-sequence and ATAC-seq, H3K9me3, H3K27ac, H3K4me1, H3K4me2, and H3K4me3 features. All chromatin features were sourced from ENCODE A549 ATAC/ChIP-seq experiments.
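Two steps of this architecture lend themselves to a small Python sketch: binning a 240bp coverage track into 20bp bins, and combining the two tanh-activated scalars in a sigmoid-activated linear node. Averaging within each bin is an assumption, and the weights and inputs are arbitrary illustrative values rather than trained parameters.

```python
import math


def bin_coverage(coverage, bin_size=20):
    """Average per-base coverage within consecutive bins (averaging is assumed)."""
    if len(coverage) % bin_size != 0:
        raise ValueError("coverage length must be a multiple of the bin size")
    return [sum(coverage[i:i + bin_size]) / bin_size
            for i in range(0, len(coverage), bin_size)]


def combine(seq_scalar, chrom_scalar, w_seq, w_chrom, bias):
    """Sigmoid-activated linear node over the two sub-network scalars."""
    z = w_seq * seq_scalar + w_chrom * chrom_scalar + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 240bp coverage track: low in the first half, higher in the second.
coverage = [1.0] * 120 + [3.0] * 120
bins = bin_coverage(coverage)   # 12 bins of 20bp each

# Hypothetical sub-network outputs (tanh keeps each scalar between -1 and 1).
p = combine(math.tanh(0.8), math.tanh(-0.2), w_seq=2.0, w_chrom=1.0, bias=0.0)
```

Because each sub-network contributes a single scalar, the relative size of the two learned weights gives a direct reading of how much the sequence and chromatin inputs each influence the prediction.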

Neural network training

For both model architectures, two chromosomes are held out, one for validation (chr11) and one for testing (chr17). The sampling strategies differ by model type. For the sequence-only CNN model, positive sample regions are obtained by randomly shifting 240bp long regions centered on ChIP-seq peak midpoints (−95bp<=shifting distance<95bp). Negative sample regions are sampled from four different sources: 1) flanking negative regions around ChIP-seq peaks (flanking distances: [450, −450, 500, −500, 1250, −1250, 1750, −1750]); 2) accessible regions not overlapping ChIP-seq peaks; 3) non-accessible regions not overlapping ChIP-seq peaks; 4) random regions sampled from the entire genome and not overlapping ChIP-seq peaks. The goal of sampling is to ensure the percentages of accessible regions in both positive and negative samples are the same. Bichrom models use the same positive sample regions as the sequence-only CNN model, while negative sample regions only consist of random regions sampled from the entire genome and not overlapping ChIP-seq peaks.
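The positive-window shifting, flanking negatives, and chromosome hold-out can be sketched in Python as follows. Interpreting each flanking distance as an offset applied to the peak midpoint is an assumption, and the check that negatives do not overlap ChIP-seq peaks is omitted. All names are illustrative.

```python
import random

WINDOW = 240
FLANK_OFFSETS = [450, -450, 500, -500, 1250, -1250, 1750, -1750]
HELD_OUT = {"chr11": "validation", "chr17": "test"}


def split_for(chrom):
    """Assign a chromosome to the validation, test, or training set."""
    return HELD_OUT.get(chrom, "train")


def positive_window(midpoint, rng):
    """240bp window centered on the midpoint, shifted by -95 <= shift < 95."""
    shift = rng.randrange(-95, 95)
    start = midpoint + shift - WINDOW // 2
    return start, start + WINDOW


def flanking_windows(midpoint):
    """240bp windows centered at each flanking offset from the midpoint (assumed)."""
    half = WINDOW // 2
    return [(midpoint + off - half, midpoint + off + half) for off in FLANK_OFFSETS]


rng = random.Random(0)                      # seeded for reproducibility
pos_start, pos_end = positive_window(5000, rng)
negatives = flanking_windows(5000)
```

Since the shift is smaller than half the window, every positive window still contains the peak midpoint, while the nearest flanking windows start more than 300bp away from it.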

Neural network feature attribution

The DeepLift-SHAP implementation from SHAP39,40 was employed to compute the attribution scores for each trained model. The hypothetical attribution scores were obtained by computing the DeepLift-SHAP score of all possible nucleotide choices at each base pair. Then TF-MoDISco was used to extract globally high-impact sequence patterns with the option -n 50000. The final sequence patterns were then compared to motifs from Cis-BP93 using Tomtom.94

DNA shape analysis

The DNA shape scores were computed by DNAShapeR76,95,96 using default settings. Each type of DNA shape score was plotted around the FOXA1 motif center.

KEY RESOURCES TABLE.

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies
Anti-FOXA1, rabbit polyclonal | GeneTex | Cat#GTX100308
Anti-Fra2, rabbit monoclonal | Cell Signaling Technology | Cat#19967S
Anti-H3K9me3, rabbit polyclonal | abcam | Cat#ab8898
Anti-H3K27me3, mouse monoclonal | abcam | Cat#ab6002
Anti-FLAG, mouse monoclonal | Millipore-Sigma | Cat#F1804
Anti-FOSL1, mouse monoclonal | Santa Cruz Biotechnology | Cat#sc-28310
Goat anti-mouse IgG(H+L), Alexa Fluor 594 | ThermoFisher | Cat#A-11005
Goat anti-rabbit IgG(H+L), Alexa Fluor 488 | ThermoFisher | Cat#A-11008
Bacterial and Virus Strains
NEB® 5-alpha competent E. coli | NEB | Cat#C2987
5-alpha electrocompetent E. coli | NEB | Cat#C2989
BL21(DE3)pLysS E. coli | Novagen | Cat#69388–3
Rosetta2(DE3)pLysS E. coli | Novagen | Cat#70951
Chemicals, Peptides, and Recombinant Proteins
Ham’s F-12K (Kaighn’s) Medium | Gibco | Cat#21127022
Dulbecco’s Modified Eagle Medium (DMEM) | Gibco | Cat#10569044
FBS | Gibco | Cat#16000044
Penicillin-Streptomycin | Gibco | Cat#15070063
NEBNext Ultra II Q5 master mix | NEB | Cat#M0544S
T4 DNA ligase | NEB | Cat#M0202S
AMPure XP beads | Beckman Coulter | Cat#MSPP-A63880
E.Z.N.A. cycle pure kit | Omega | Cat#D6492–02
E.Z.N.A. plasmid DNA maxi kit | Omega | Cat#D6922–02
Lipofectamine 3000 reagent | ThermoFisher | Cat#L3000015
37% formaldehyde | Ricca Chemical Company | Cat#RSOF0010250A
Protease inhibitor cocktail | Sigma-Aldrich | Cat#P8340
Magna ChIP Protein A+G Magnetic Beads | Sigma-Aldrich | Cat#16–663
NEBNext ultra II DNA library prep kit | NEB | Cat#E7103L
Illumina Stranded mRNA Prep kit | Illumina | Cat#20040532
Trizol | ThermoFisher | Cat#15596026
RNeasy kit | Qiagen | Cat#74104
Ni-NTA chromatography | GoldBio | Cat#H-350–25
Source S cation-exchange chromatography | Cytiva | Cat#17–0944-01
cobalt metal-affinity chromatography | Clontech | Cat#635652
7.5% non-denaturing polyacrylamide gel | Bio-Rad | Cat#4561026
Ethidium Bromide | Invitrogen | Cat#15585011
QIAGEN QIAquick Gel Extraction Kit | QIAGEN | Cat#28704
G418 | Gibco | Cat#10131035
Doxycycline | Sigma-Aldrich | Cat#D5207
Blasticidin S HCl | Gibco | Cat#A11139
Fluoromount-G Mounting Medium | ThermoFisher | Cat#00–4958-02
Mouse FOXA1 (UniProtKB: P35582) | This study | N/A
Human c-Fos (UniprotKB: P01100) | This study | N/A
Human c-Jun (UniprotKB: P05412) | This study | N/A
Deposited Data
ChIP-ISO datasets | This study (GEO) | GSE247411
ChIP–seq datasets | This study (GEO) | GSE247412
RNA-seq datasets | This study (GEO) | GSE247414
EMSA-seq datasets | This study (GEO) | GSE247431
ENCODE datasets | ENCODE (Table S2) | N/A
Other published datasets | Other publications (Table S2) | N/A
Raw images (immunostaining) | This paper; Mendeley Data | https://data.mendeley.com/datasets/x6z25n3z2z/1
Experimental Models: Cell Lines
A549 | Dr. Yanming Wang | N/A
HepG2 | ATCC | Cat#HB-8065
MCF-7 | ATCC | Cat#HTB-22
A549-HK1 | This study (Table S3) | N/A
A549-HK2 | This study (Table S3) | N/A
A549 ePB tet-on A-FOS | This study (Table S3) | N/A
A549-CX1 | This study (Table S3) | N/A
A549-CX2 | This study (Table S3) | N/A
A549-CX3 | This study (Table S3) | N/A
A549-CX4 | This study (Table S3) | N/A
Oligonucleotides
Synthetic oligo library | Agilent | Cat#G7220A
Primers | This study (Table S3) | N/A
gBlocks | This study (Table S3) | N/A
Recombinant DNA
pAAVS1-Nst-MCS | Addgene | Cat#80487
eSpCas9(1.1) | Addgene | Cat#71814
pXAT2 | Addgene | Cat#80494
CMV500 A-FOS | Addgene | Cat#33353
Xlone-GFP | Addgene | Cat#96930
pGEP179_pX330K | Addgene | Cat#137882
pX330S-2 | Addgene | Cat#1000000055
pGEP163 | Addgene | Cat#137881
pT077 | Addgene | Cat#137879
AAVS1 TALEN L | Addgene | Cat#59025
AAVS1 TALEN R | Addgene | Cat#59026
Super piggyBac transposase plasmid | System Biosciences | Cat#PB210PA-1
pCX1.4 | This study (Table S3) | N/A
pCX1.5 | This study (Table S3) | N/A
pCX1.16 | This study (Table S3) | N/A
pCX1.17 | This study (Table S3) | N/A
pCX1.10 | This study (Table S3) | N/A
pCX3.10 | This study (Table S3) | N/A
pCX4.1 | This study (Table S3) | N/A
pGEP163_CCND1_U2_D4 | This study (Table S3) | N/A
Software and Algorithms
OligoDesign3 | Chen et al. 202365 | N/A
FIMO (Version 5.5.4) | MEME | N/A
Intersect intervals (Version 2.30.0) | BEDTools | N/A
AME (Version 5.5.4) | MEME | N/A
getfasta (Version 2.30.0) | BEDTools | N/A
fastp (Version 0.23.2) | Chen et al. 201880 | N/A
cutadapt (Version 4.4) | Martin 201181 | N/A
NGmerge (Version 0.1) | Gaspar 201882 | N/A
BWA-MEM2 (Version 2.2.1) | Vasimuddin et al. 201983 | N/A
BAMtools (Version 2.4.0) | Barnett et al. 201184 | N/A
SAMtools (Version 1.8) | Li et al. 200985 | N/A
bamCoverage (Version 3.5.1) | deepTools2 | N/A
MACS2 callpeak (Version 2.1.1.20160309) | Zhang et al. 200887 | N/A
computeMatrix (Version 3.5.4) | deepTools2 | N/A
plotHeatmap (Version 3.5.4) | deepTools2 | N/A
plotProfile (Version 3.5.4) | deepTools2 | N/A
DiffBind (Version 3.18) | Ross-Innes et al. 201288 | N/A
T-Gene (Version 5.5.4) | MEME | N/A
Metascape (Version 3.5.20230501) | Zhou et al. 201990 | N/A
edgeR (Version 3.34.0) | Robinson et al. 201091 | N/A
SHAP | Lundberg et al. 201740 | N/A
Tomtom (Version 5.5.4) | MEME | N/A
DNAShapeR | Chiu et al. 201694 | N/A
Zeiss Bio Apps Gene Expression tool | Zeiss | N/A

Highlights.

  • Systematic dissection of genetic rules underlying FOXA1 binding specificity

  • FOXA1 binding is strongly promoted by co-binding TFs AP-1 and CEBPB

  • Chromatin context (e.g., heterochromatin) plays a minor role in FOXA1 binding

  • AP-1 is partially responsible for cell-type-specific binding of FOXA1

ACKNOWLEDGMENTS

We thank Dr. Kenneth Zaret for providing the FOXA1 bacterial expression construct, Dr. Cheng-Ming Chiang for providing the AP-1 polycistronic bacterial expression strain, and Dr. Xiaojun Lance Lian for providing the XLone piggyBac cargo strain. We also thank Dr. Yanming Wang for providing us with A549 human lung carcinoma cells. We are grateful to Dr. Cheryl Keller and others in the Huck Genomics Research Incubator for assistance, training, and discussions related to high-throughput sequencing. We acknowledge all members in the Bai lab for insightful comments on the manuscript. We also thank the members of the Center of Eukaryotic Gene Regulation at Pennsylvania State University for discussions and technical support. This work is supported by the National Institutes of Health (T32 GM125592 to H.K. and E.M.L., R35 GM127034 to S.T., R35 GM144135 to S.M., and R35 GM139654 to L.B.) and the Graduate Research Innovation fund from the Huck Institute of Life Sciences (to C.X.).

Footnotes

DECLARATION OF INTERESTS

The authors declare no competing interests.

SUPPLEMENTAL INFORMATION

Supplemental information can be found online at https://doi.org/10.1016/j.molcel.2024.06.022.

Associated Data

Data Availability Statement

  • Statement about the Data: ChIP-ISO datasets from this study, including the raw data in fastq format and the processed raw counts files, are available on Gene Expression Omnibus (GEO) with accession number GSE247411. ChIP-seq datasets produced in this study, including the raw data in fastq format and the processed BED and bigWig files, are available on GEO with accession number GSE247412. RNA-seq datasets from this study, including the raw data in fastq format and the processed feature counts files, are available on GEO with accession number GSE247414. EMSA-seq datasets, including the raw data in fastq format and the processed raw counts files, are available on GEO with accession number GSE247431. Raw images for immunostaining experiments are available on Mendeley Data: https://data.mendeley.com/datasets/x6z25n3z2z/1. Previously published data used in this study are summarized in Table S2.

  • Statement about the Code: No new code is generated by this study.

  • General statement: “Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.”

RESOURCES