eLife. 2017 Feb 10;6:e19272. doi: 10.7554/eLife.19272

Combinatorial bZIP dimers display complex DNA-binding specificity landscapes

José A Rodríguez-Martínez 1, Aaron W Reinke 2, Devesh Bhimsaria 1,3, Amy E Keating 2,4, Aseem Z Ansari 1,5,*
Editor: Michael R Green 6
PMCID: PMC5349851  PMID: 28186491

Abstract

How transcription factor dimerization impacts DNA-binding specificity is poorly understood. Guided by protein dimerization properties, we examined DNA binding specificities of 270 human bZIP pairs. DNA interactomes of 80 heterodimers and 22 homodimers revealed that 72% of heterodimer motifs correspond to conjoined half-sites preferred by partnering monomers. Remarkably, the remaining motifs are composed of variably-spaced half-sites (12%) or ‘emergent’ sites (16%) that cannot be readily inferred from half-site preferences of partnering monomers. These binding sites were biochemically validated by EMSA-FRET analysis and validated in vivo by ChIP-seq data from human cell lines. Focusing on ATF3, we observed distinct cognate site preferences conferred by different bZIP partners, and demonstrated that genome-wide binding of ATF3 is best explained by considering many dimers in which it participates. Importantly, our compendium of bZIP-DNA interactomes predicted bZIP binding to 156 disease associated SNPs, of which only 20 were previously annotated with known bZIP motifs.

DOI: http://dx.doi.org/10.7554/eLife.19272.001

Research Organism: Human

eLife digest

Most cells in our body contain the same DNA blueprint, which encodes all the genes needed for every process in the body. However, only certain genes need to be active in a particular cell at any given time. Proteins known as transcription factors control the activity of genes by binding to DNA near the start of the genes and switching genes on or off as required. Often transcription factors work together to regulate specific genes in response to signals from other cells or the environment. Failure to control the activity of genes can give rise to cancer, diabetes and a wide array of other diseases.

The bZIP family of transcription factors regulates the activities of many genes. These transcription factors work in pairs to bind a specific DNA site. They either partner with an identical molecule or with a different bZIP transcription factor. Different combinations of bZIP pairs prefer to bind different stretches of DNA. Except for a few examples, it is not yet understood how bZIP pairs work together to find the right target DNA.

Rodriguez-Martinez, Reinke, Bhimsaria et al. have identified all of the DNA sites that 102 pairs of human bZIP transcription factors can bind to. The experiments show that over two thirds of the bZIP pairs bind to DNA sequences that each individual partner prefers. However, in many cases, the choice of a partner can change the DNA sequence that the pair targets in a manner that could not have been predicted based on the preferences of each partner alone. This suggests that, by pairing up, bZIP transcription factors are able to change their preferences for which location they target in the DNA. The experiments also show that many of the 102 pairs could bind to more than one type of site. Thus, the ability of bZIP proteins to interact with different partners greatly expands the locations on genomic DNA from which they can regulate the activity of different genes.

DNA sequences vary between different individuals and some variants can predispose individuals to certain diseases. Rodriguez-Martinez et al. found that bZIP pairs can bind to over a hundred DNA variants that are associated with disease. The next challenge is to find out how specific variations in DNA can lead to the formation or elimination of bZIP binding sites that cause disease. In the future, DNA editing methods may make it possible to specifically fix such changes in our genomes to reduce the risk of disease.

DOI: http://dx.doi.org/10.7554/eLife.19272.002

Introduction

Multiple sequence-specific transcription factors (TFs) converge at enhancers and promoters to control the expression of genes (Ptashne and Gann, 2002). Such TF assemblages permit integration of multiple cellular signals to regulate targeted gene networks (Ciofani et al., 2012; Kittler et al., 2013; Xie et al., 2013). The combinatorial use of a limited number of TFs provides the means to finely control the complex cellular transcriptome. The ability of a given TF to interact with different partners expands the DNA targeting repertoire. It also enhances specificity by focusing combinations of factors to a more specific set of regulatory sites across the genome. Different partners also alter the regulatory potential of a given TF, at times completely altering its regulatory output such that the factor switches from an activator to a repressor of transcription depending on the binding partners with which it associates (Ptashne and Gann, 2002).

The bZIP class of human TFs is well suited to play a role in signal integration and combinatorial transcriptional control (Lamb and McKnight, 1991; Miller, 2009; Tsukada et al., 2011). bZIPs bind DNA as either homo- or heterodimers; the discovery in 1988 that JUN and FOS could bind to DNA as a heterodimer immediately suggested the potential for combinatorial regulation by this family (Franza et al., 1988; Lamb and McKnight, 1991). Interestingly, the human bZIP network displays greater ability to form heterodimers compared to simpler eukaryotes, suggesting that more complex combinatorial regulation may contribute to organismal complexity (Reinke et al., 2013). bZIP proteins also interact with other classes of TFs to stabilize higher order oligomers at enhancers (Jain et al., 1992; Murphy et al., 2013; Thanos and Maniatis, 1995). Their role in nucleating such multi-factor complexes is supported by the observation that certain bZIP dimers such as AP-1 (FOS•JUN) and CEBPA can function as ‘pioneer’ factors that bind inaccessible chromatin and enable the assembly of other TFs at regulatory sites (Biddie et al., 2011; Collins et al., 2014). Notably, the choice of dimerizing partner not only impacts DNA recognition properties but can also influence regulatory function of a given bZIP. For example, ATF3 homodimer acts as a repressor, whereas ATF3•JUN activates transcription (Hsu et al., 1992). As a class, bZIPs regulate diverse biological phenomena ranging from response to stress at the cellular level, organ development at the tissue level and viral defense, circadian patterning, memory formation, and ageing at an organismal level (Costa et al., 2003; Herdegen and Waetzig, 2001; Jung and Kwak, 2010; Male et al., 2012). Given their central role in various processes, mutations in bZIP proteins are implicated in the etiology of diseases ranging from cancer and diabetes to neuronal malfunction, developmental defects and behavioral dysfunction (Lopez-Bergami et al., 2010; Tsukada et al., 2011).

Fifty-three bZIPs encoded by the human genome can be grouped into 21 families, and as homodimers they are known to bind at least six distinct classes of DNA motifs, including sites labeled as TRE, CRE, CRE-L, CAAT, PAR, and MARE (Figure 1) (Deppmann et al., 2006; Jolma et al., 2013). Hai and Curran (1991) showed that some heterodimers have DNA-binding specificities that are distinct from those of each partnering bZIP. For example, JUN•ATF2 heterodimer binds to a cognate site in the ENK2 promoter that is not bound by either JUN•JUN or ATF2•ATF2 homodimers. However, the past 20 years have provided only a handful of additional examples of how bZIP heterodimerization influences DNA-binding specificity (Cohen et al., 2015; Jolma et al., 2015; Vinson et al., 1993; Yamamoto et al., 2006). Central questions about bZIP transcription factors remain unanswered: What is the influence of protein dimerization on DNA binding? Does DNA stabilize dimer formation? Which protein dimers can bind DNA? Which sequences do they bind? And how do bZIP-binding sites contribute to cellular function and the etiology of various diseases?

Figure 1. Overview of human bZIP homodimer and heterodimer DNA-binding specificities.

(A) Summary of SELEX-seq results categorized by protein-protein interaction (PPI) affinity (Reinke et al., 2013). Specificity profiles were classified as resulting in a motif arising from DNA binding by either a homodimer (brown) or a heterodimer (dark brown), or not resulting in a motif (white). Some profiles could not be unambiguously assigned to a homo vs. heterodimer (light brown). (B) Pairwise comparisons of the DNA-binding preferences of 102 bZIP dimers (22 homodimers and 80 heterodimers) using the z-scores for 1222 unique 10 bp sequences corresponding to the 50 top ranked sequences for each dimer. Throughout the paper, the biotinylated bZIP is listed first when describing a heterodimer. (C) Representative motifs bound by bZIP homodimers and heterodimers reported in this study. Heterodimer motifs were grouped as Conjoined, Variable spacer, and Emergent. The color code defined here for half sites (colored arrows above motifs) is used throughout the figures.

DOI: http://dx.doi.org/10.7554/eLife.19272.003

Figure 1—source data 1. Data for Figure 1C.
Pairwise comparison (Pearson's correlation) of the DNA-binding preferences of 102 bZIP dimers using the CSI intensity for 1222 10 bp sequences.
elife-19272-fig1-data1.xlsx (120.7KB, xlsx)
DOI: 10.7554/eLife.19272.004

Figure 1—figure supplement 1. Cognate site identification by SELEX-sequencing.

(A) In CSI by SELEX-seq, a DNA library with a randomized 20 bp region is incubated with a bZIP pair in which one bZIP partner (light grey) was biotinylated and the other partner (light grey) was labeled with fluorescein (blue star). bZIP partners were mixed in 3:1 molar ratios with the biotinylated partner at the lower concentration. Affinity purification using the less abundant biotinylated partner enriched for heterotypic dimers. (B) Reproducibility of CSI by SELEX-Seq. Scatter plots of CSI intensities (z-scores) for all 10-mers for replicate samples and (C) reciprocal samples.

Figure 1—figure supplement 2. ATF3 CSI Intensity (z-score) correlates with equilibrium association constant.

(A) DNA sequence of oligonucleotides used for determining binding constants. (B) Correlation between CSI intensity (z-score) and association constant (Ka) for ATF3. Binding constants were measured by EMSA. Error bars are ± S.D. of at least duplicate measurements. (C) Representative autoradiographs of EMSA experiments from which binding constants were calculated using non-linear regression.

Figure 1—figure supplement 3. Pairwise comparison of bZIP homodimers reported in this study and bZIP dimers reported by Jolma et al. (2013).

(A) Hierarchical clustering was performed using the CSI intensities (z-scores) of 871 unique 10 bp sequences corresponding to the 50 top ranked sequences identified from each dimer. Corresponding bZIP pairs are highlighted in matching color. (B) Scatter plots comparing CSI intensity (z-score) for all 10-mers of bZIP dimers from this study with bZIP dimers previously reported by Jolma et al. (2013). (top left) BATF3 vs. BATF3; (top right) CEBPG vs. CEBPG; (bottom left) ATF4 vs. ATF4; (bottom right) ATF4 vs. ATF4•CEBPG.

Figure 1—figure supplement 4. bZIP heterodimer specificity.

Pearson’s correlations (r) of all 10-mers between replicate experiments of bZIP dimers (top), and correlations between a bZIP heterodimer and the bZIP homodimer that was used to pull down the heterodimer (bottom). The average (± standard deviation) Pearson’s correlation (r) for eight replicate samples was 0.8 ± 0.1.

Figure 1—figure supplement 5. DNA sequence preferences for FOS•CEBPE, FOS•CEBPG, FOSL1•CEBPE, and FOSL1•CEBPG.

Left, PPI affinity for the corresponding heterodimer is shown. Middle, MEME motifs are represented as DNA logos. Right, 2-dimensional scatter plots comparing the CSI intensities for all 10-mers. CRE/CAAT (TGACGTAA) sites are colored red, and TRE/CAAT (TGAGCAA) sites are colored orange.

Resolving these issues requires systematic examination of the DNA-binding specificities of bZIP homo- and heterodimers. Fifty-three human bZIP proteins can potentially form as many as 1431 distinct dimers. Quantitative experiments using fluorescence resonance energy transfer (FRET) in solution indicated that ~30% of all possible bZIP dimers form in the absence of DNA. Most bZIPs can form dimers with different partners, potentially greatly expanding the repertoire of cognate sites that might be targeted by different heterodimers (Reinke et al., 2013). We used this protein-protein interaction (PPI) dataset to prioritize 270 bZIP dimers for bZIP-DNA interactome studies and to apply FRET-based methods to distinguish DNA-bound heterodimers from homodimers.
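The dimer counts quoted in this paper follow from counting unordered pairs of monomers with self-pairing (homodimers) allowed, which is n(n+1)/2 for n proteins. A minimal Python sketch of that arithmetic is given here; the function name is illustrative and not taken from the study.

```python
def count_possible_dimers(n_monomers: int) -> int:
    """Unordered pairs of monomers, homodimers included: n(n+1)/2."""
    if n_monomers < 0:
        raise ValueError("number of monomers must be non-negative")
    return n_monomers * (n_monomers + 1) // 2


# All 53 human bZIPs, and the 36 bZIPs examined in the Results.
print(count_possible_dimers(53))  # 1431
print(count_possible_dimers(36))  # 666
```

The same formula splits into n homodimers plus n(n-1)/2 heterodimers, so 36 proteins give 36 homodimers and 630 heterodimers.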

Insights that emerged from our compendium of bZIP-DNA interactomes include: (i) identification of new bZIP cognate sites, (ii) evidence for three classes of heterodimer-binding sites (conjoined half-sites, variably-spaced half-sites, and unpredicted emergent cognate sites), (iii) ability of individual bZIP heterodimers to target a range of binding sites, (iv) evidence for varying heterodimer selectivity between distinct sequences currently classified as a single consensus motif, (v) improved ability to account for in vivo genome-wide occupancy of heterodimers, and (vi) identification of bZIP cognate sites at 156 SNPs linked to human diseases and quantitative traits. DNA sequence preferences of bZIP heterodimers reported here serve as a valuable resource for many purposes including, but not limited to, evaluating potential bZIP dimer binding at genomic binding sites, providing hypotheses about mechanisms underlying the etiology of disease-linked SNPs, and predicting binding specificities of heterodimeric bZIPs from other species.

Results

Comprehensive bZIP-DNA interactomes

To elucidate DNA sequence-recognition properties of TFs that form obligate dimers, we examined 270 pairs of purified human bZIP proteins. These pairs were composed from 36 bZIP proteins representing 21 bZIP families and encompassing the diversity of all 53 bZIPs encoded in the human genome (Supplementary file 1A). Given the 666 potential dimeric pairs that can be formed with 36 bZIPs, we used biophysically measured PPIs to prioritize the dimers that were examined (Reinke et al., 2013). We selected 126 pairs (97 hetero- and 29 homodimers) that form stable dimers with PPI dissociation constants (Kd) less than 1 µM at 21°C in the absence of DNA. In addition, we tested 144 (137 hetero- and seven homo-) dimer combinations that do not stably associate in solution in the absence of DNA (Kd >1 µM at 21°C). For most TFs, including the bZIP class, the DNA specificity of the isolated DNA binding domain (DBD) is typically indistinguishable from the full-length factor (Jolma et al., 2013). Therefore, we focused our efforts on the bZIP domain, which comprises the basic region that binds DNA and the leucine zipper dimerization module that forms a coiled-coil. Recombinant proteins, overexpressed in bacteria, were purified to homogeneity. Two versions of each protein were made, one conjugated to biotin at the carboxyl terminus and the other without. This set of highly purified DNA binding proteins enabled examination of the innate DNA-binding sequence specificity of 36 representative human bZIPs. Individual bZIP partners were mixed in 3:1 molar ratios with the biotinylated partner at the lower concentration; affinity purification of the protein-DNA complexes using the less abundant biotinylated partner enriched for heterodimers. To additionally favor examination of the heterodimer, whenever possible, the interaction partner with the weaker homodimer was biotinylated and used for isolating protein-DNA complexes. Each protein dimer is denoted by a dot between the two monomers – for example, JUN•ATF3. The DNA binding sites are indicated either by a specific sequence or their classical designations, such as CRE or CAAT, or by half-sites connected by a hyphen, such as CRE-CAAT.

DNA-binding specificity of the pairs was queried using systematic evolution of ligands by exponential enrichment coupled to deep sequencing (SELEX-seq) (Figure 1—figure supplement 1) (Jolma et al., 2010; Slattery et al., 2011; Tietjen et al., 2011; Zhao et al., 2009; Zykovich et al., 2009). In our cognate site identification (CSI) effort using SELEX-seq, a DNA library spanning the entire sequence space of a 20-mer (10^12 different sequence permutations) was independently incubated with each of the 270 different bZIP pairs (234 hetero- and 36 homodimers), and protein-bound DNA sequences were enriched, amplified, and re-probed for an additional two cycles to further enrich cognate sites over non-cognate sites that comprise the majority of the starting library. The starting DNA library as well as selectively enriched sequences were barcoded and sequenced using massively parallel DNA sequencing methods. The CSI intensity, corresponding to the z-score for the enrichment of a sequence, was computed for each 10-mer as described in the methods. Repeated experiments demonstrated an average Pearson’s correlation r = 0.8 ± 0.1 between CSI intensities from replicates (Figure 1—figure supplement 1). The CSI intensity (z-score) correlates with the binding affinity for a particular sequence (Figure 1—figure supplement 2) (Carlson et al., 2010; Puckett et al., 2007; Tietjen et al., 2011). In three cases, reciprocal biotinylation of each partner was performed to ensure that the choice of partner did not skew the results (Pearson’s correlation for comparing experiments was r = 0.89–0.98; Figure 1—figure supplement 1C).
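To make the idea of a per-10-mer enrichment z-score concrete, the following Python sketch counts k-mers in selected and starting-library reads, forms an enrichment ratio, and standardizes it across all observed k-mers. This is only one plausible formulation under stated assumptions: the pseudocount, the ratio-based enrichment, and all function names are hypothetical, and the authors' actual computation is the one described in their Methods, which may differ.

```python
from collections import Counter
from math import sqrt

# Size of the sequence space covered by a fully randomized 20-mer.
LIBRARY_COMPLEXITY = 4 ** 20  # 1,099,511,627,776, i.e. about 10^12


def count_kmers(reads, k=10):
    """Count every k-mer window across a list of sequence reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def enrichment_zscores(selected_reads, library_reads, k=10, pseudocount=1.0):
    """Z-score of (selected frequency / library frequency) per k-mer.

    Assumption: enrichment is a simple frequency ratio with a pseudocount;
    this is an illustrative stand-in, not the published procedure.
    """
    selected = count_kmers(selected_reads, k)
    library = count_kmers(library_reads, k)
    kmers = sorted(set(selected) | set(library))
    n = len(kmers)
    if n == 0:
        raise ValueError("no k-mers found in the supplied reads")
    sel_total = sum(selected.values()) + pseudocount * n
    lib_total = sum(library.values()) + pseudocount * n
    ratios = {
        kmer: ((selected[kmer] + pseudocount) / sel_total)
        / ((library[kmer] + pseudocount) / lib_total)
        for kmer in kmers
    }
    mean = sum(ratios.values()) / n
    sd = sqrt(sum((r - mean) ** 2 for r in ratios.values()) / n)
    if sd == 0:
        raise ValueError("all k-mers are equally enriched; z-score undefined")
    return {kmer: (r - mean) / sd for kmer, r in ratios.items()}
```

With toy reads in which one 10-mer dominates the selected pool, that 10-mer receives the highest z-score, while the z-scores across all k-mers sum to zero by construction.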

Among the 270 pairs tested were 12 homodimers that had been previously examined by other groups (Jolma et al., 2013). Overall, we found excellent agreement between the cognate sites identified using highly purified bZIP modules in our study versus full-length proteins in unpurified cell lysates in other studies, with only a few inconsistent examples that can be seen in Figure 1—figure supplement 3A (e.g. MAFB, NFE2, ATF4). Interestingly, we found that the previously reported DNA specificity of ATF4 has a higher correlation with the specificity of ATF4•CEBPG heterodimer identified in this report than with the specificity of the ATF4 homodimer (Figure 1—figure supplement 3B), suggesting that CEBPG possibly formed a complex with ATF4 in the cell lysates used in the prior study (Jolma et al., 2013). This observation highlights the advantage of using highly purified proteins over cell lysates and validates our focus on the bZIP domain to capture the innate specificity of this class of transcription factors that bind DNA as obligate dimers.

Overall, 30 out of the 36 bZIP proteins tested in this study enriched specific DNA sequences as part of at least one dimer. bZIPs that are not known to bind DNA as homodimers did not yield cognate sites in our studies (e.g. JUNB and FOS) (Deng and Karin, 1993; Hai and Curran, 1991). 73 of 126 (58%) bZIP pairs that dimerize in the absence of DNA yielded specific cognate sites (Figure 1A). Surprisingly, 29 of the 144 (20%) bZIP pairs that do not stably associate in the absence of DNA (Kd >1 µM at 21°C) yielded evidence of sequence-specific binding to DNA, indicating that PPIs were stabilized by binding to specific DNA sites (Figure 1A; Supplementary file 1C). This finding has important implications, given that the majority of the potential bZIP PPI space consists of protein pairs that do not associate strongly in the absence of DNA (Reinke et al., 2013).

Conjoined, variably-spaced, and emergent cognate sites bound by heterodimers

For 184 bZIP-DNA interactomes that showed evidence for an enriched motif, we computationally parsed and retained datasets that could be attributed with high confidence to 80 heterodimers and 22 homodimers (Materials and methods). We assigned a specificity profile to a heterodimer when it bound sequences that were significantly different (t-test p<0.05) from the sequences preferred by the homodimer of the biotinylated bZIP (e.g. ATF4 vs. ATF4•CEBPA, r = 0.1; Figure 1—figure supplement 4) or when the biotinylated bZIP did not bind specific DNA sequences as a homodimer (e.g. FOS and JUNB).
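The comparison between a heterodimer profile and the profile of the biotinylated partner's homodimer can be illustrated with a Pearson correlation over the k-mers that both profiles share. The sketch below is a minimal, hypothetical helper in plain Python; it shows only the correlation step, not the t-test or the full assignment procedure used in the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def profile_correlation(profile_a, profile_b):
    """Correlate two {k-mer: CSI intensity} profiles over their shared k-mers."""
    shared = sorted(set(profile_a) & set(profile_b))
    return pearson_r([profile_a[k] for k in shared],
                     [profile_b[k] for k in shared])
```

A low correlation between a heterodimer pull-down and the corresponding homodimer, such as the r = 0.1 quoted for ATF4 vs. ATF4•CEBPA, is the kind of value this comparison would flag as a distinct specificity profile.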

Of the 22 homodimers, comprehensive DNA-binding specificity is reported for the first time for human ATF2, ATF3, ATF6, ATF6B, CEBPA, CREB1, FOSL1, JUN, MAFB, and NFE2L1. Hierarchical clustering of the 102 bZIP-DNA interactomes readily identified six previously known classes of bZIP-binding sites (TRE, CAAT, PAR, MARE, CRE, and CRE-L) (Figure 1B–C). Notably, several bZIP homodimers (ATF6, ATF6B, CREB3L1, and JUN) enriched more than one motif (Supplementary file 2). Examining the cognate sites bound by heterodimers highlighted the ability of some heterodimers to bind homodimer motifs as well as a range of other heterodimer-specific motifs. Such binding to multiple motifs is reminiscent of previous studies that reported bZIP dimers binding to different sites with different affinities (Badis et al., 2009; Kim and Struhl, 1995; König and Richmond, 1993). Interestingly, several heterodimers that bind classic bZIP homodimer motifs such as TRE, CRE-L, or CRE displayed clear differences in their preference for a subset of sequences categorized under a single consensus motif (e.g. compare motifs 9 and 10 in Figure 1 with the CRE-L site). This was also true for different homodimers (e.g. compare CRE-L binding profiles for ATF6, CREB3, CREB3L1, and XBP1 in Supplementary file 2). Thus, the binding data reported here reveal a sequence sub-structure to classic consensus motifs. Moreover, the sub-structure highlights differences in DNA-binding specificity between closely related dimers.

Three classes of bZIP heterodimer motifs were identified and are illustrated in Figure 1C: ‘Conjoined’ sites for which half-sites preferred by each contributing monomer are juxtaposed (such as the CRE-CAAT site represented by motif 1, or the MARE-CRE site of motif 7), ‘Variably-spaced’ sites for which half-sites overlap (as is the case in motifs 2 and 4), and ‘Emergent’ sites for which binding preferences could not have been readily inferred based on the half-site preferences of each partner (motifs 3, 5, 8, 9, and 10). In other words, an emergent site arises as a consequence of heterodimer formation and is not simply comprised of the conjoined or variably-spaced half-sites preferred by each monomer. An elegant study of Hox-Exd heterodimers identified ‘latent’ sites that were preferred by different Hox factors when they bound DNA in conjunction with Exd (Slattery et al., 2011). Preferences for different sequences at the interface of half-sites or sequences flanking the half sites were observed for different classes of Hox-Exd heterodimers. In our studies, we observed a change in half-site preference of certain bZIPs when they bound DNA as heterodimers. In some instances, homodimers bound with low affinity to sites that emerged as high-affinity sites in the context of a heterodimer, whereas in other cases entirely new site preferences emerged. We classified such newly acquired binding preferences as emergent sites because they are not readily inferred from the binding preferences of homodimers.

While a large fraction of heterodimers bind conjoined sites, it was surprising to find that closely related heterodimers such as FOS•CEBPG and FOS•CEBPE preferred different arrangements of half-sites, with the former heterodimer preferring the 8 bp conjoined CRE-CAAT site (motif 1, 5′TGACGCAA3′) and the latter preferring the 7 bp variably-spaced TRE-CAAT site (motif 4, 5′TGAGCAA3′). Figure 1—figure supplement 5 highlights the unexpectedly poor correlation between the binding preferences of these two heterodimers and between the binding preferences of FOSL1•CEBPG and FOSL1•CEBPE. Similarly, other heterodimers bound both conjoined and variably spaced motifs (see JUNB•ATF3 and MAFB•ATF5 in Supplementary file 2); however, the preference for one arrangement over the other was not amenable to predictions based on the binding preferences of each contributing partner of the heterodimer.
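The difference between a conjoined and a variably-spaced (overlapping) arrangement can be sketched as string composition of two half-sites. In the Python sketch below, the 4 bp half-site strings are an illustrative decomposition chosen so that the joined sequences reproduce the two sites quoted above; they are assumptions for the example rather than half-site definitions taken from the study.

```python
def join_half_sites(left, right, overlap=0):
    """Join two half-sites written on the same strand.

    overlap=0 juxtaposes them (conjoined); overlap>0 makes them share
    that many bases (one way to model a variably-spaced site).
    """
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit within both half-sites")
    if overlap and left[-overlap:] != right[:overlap]:
        raise ValueError("half-sites do not agree over the overlapping bases")
    return left + right[overlap:]


# Conjoined CRE-CAAT: 4 bp + 4 bp gives the 8 bp site.
print(join_half_sites("TGAC", "GCAA"))             # TGACGCAA
# Overlapping TRE-CAAT: the shared G gives the 7 bp site.
print(join_half_sites("TGAG", "GCAA", overlap=1))  # TGAGCAA
```

The point of the sketch is only that the same pair of half-site classes can yield sites of different length depending on the arrangement; which arrangement a given heterodimer prefers is an experimental result and is not predicted by this composition.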

Emergent sites pose a particular challenge for current models of DNA binding site predictions that are based on protein homology (Weirauch et al., 2014). Emergent cognate sites for heterodimers can be subdivided into two categories: (i) ‘gain-of-specificity’ motifs that display a change in half-site preferences for a bZIP or (ii) motifs that display a ‘loss-of-specificity’ for one half-site. An example of the first category includes a switch in the half-site preferences of BATF family members, from a CRE half-site (5′TGAC3′) that is preferred in homodimers to a CRE-L (5′CCAC3′) half-site that is preferred by many BATF-containing heterodimers (compare motifs 3 and 8–10, and see examples in Supplementary file 2 such as BATF2•ATF3, BATF2•JUN, BATF3•ATF3, BATF3•ATF4). An example of the second category is DDIT3•CEBPG binding to 5′ATTGCA3′ (motif 5) (Ubeda et al., 1996), with heterodimers displaying no apparent requirement for one half-site. Overall, for the 80 bZIP heterodimers with binding motifs reported here, 72% of the motifs can be classified as conjoined, 16% as emergent, and 12% as variably-spaced. Nine out of the 80 heterodimers (11%) enriched two motifs (Supplementary file 2). For example, BATF•CEBPG enriched both CRE-CAAT and CRE-L-CAAT motifs.

Specificity and energy landscapes reveal the entire spectrum of cognate sites bound by heterodimers

To examine the full specificity spectrum of individual bZIP dimers, we displayed DNA binding data as specificity and energy landscapes (SELs) (Carlson et al., 2010; Tietjen et al., 2011). In a SEL, all possible sequences of a given length are arranged within concentric circles based on their homology to a seed motif. The seed motif is often derived from position weight matrices (PWMs) of the most enriched sequences (Figure 2A). The innermost circle contains all sequences that have an exact sequence match to the seed motif (0-mismatch). As each enriched sequence placed in this ring is an exact match to the seed motif, the source of varying CSI intensity (z-score) is the contribution of the sequences flanking the seed motif. The 1-mismatch ring contains all sequences that differ from the seed motif at any one position, or a Hamming distance of one. The subsequent rings, going outwards, display sequences with increasing number of mismatches to the seed motif. The height and color of each point represents the CSI intensity for the corresponding sequence. As noted above, CSI intensity correlates with binding affinity where measured (Figure 1—figure supplement 2) (Carlson et al., 2010; Hauschild et al., 2009; Puckett et al., 2007; Tietjen et al., 2011; Warren et al., 2006). Although there are far more low-affinity sequences than enriched sequences (as depicted by the illustrative histogram in Figure 2A), the moderate-to-low affinity sites (low CSI intensity) are often overlooked by motif searching algorithms. Such sequences readily emerge in SEL display of the entire binding data (Carlson et al., 2010; Tietjen et al., 2011). In Figure 2A, we illustrate how SELs are built and we note that a SEL can be constructed using any sequence as a seed motif. The choice of a different seed motif simply alters the placement of the sequences on the landscape without changing the underlying binding preferences of a protein for a given sequence.
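The ring assignment described above can be sketched in a few lines of Python: each sequence is placed in the ring given by the fewest mismatches (Hamming distance) between the seed motif and any same-length window of the sequence. The function names are illustrative, the sketch ignores reverse-complement matches, and it does not reproduce the within-ring ordering or the plotting used for the published SELs.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_mismatches(sequence, seed):
    """Fewest mismatches between the seed and any window of the sequence."""
    k = len(seed)
    if len(sequence) < k:
        raise ValueError("sequence is shorter than the seed motif")
    return min(hamming(sequence[i:i + k], seed)
               for i in range(len(sequence) - k + 1))


def build_rings(intensities, seed):
    """Group (sequence, intensity) pairs by mismatch ring: 0, 1, 2, ..."""
    rings = {}
    for sequence, intensity in intensities.items():
        ring = min_mismatches(sequence, seed)
        rings.setdefault(ring, []).append((sequence, intensity))
    return rings


# Toy example mirroring Figure 2A: all 6-mers against a 4-mer seed.
# Intensities are placeholders (0.0), not measured values.
all_6mers = {"".join(p): 0.0 for p in product("ACGT", repeat=6)}
rings = build_rings(all_6mers, "TGAC")
print(len(rings[0]))  # 48 sequences contain an exact match to the seed
```

Because a sequence longer than the seed can contain the seed at several offsets, the 0-mismatch ring holds the same motif in different flanking contexts, which is why intensities vary within that ring.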

Figure 2. Specificity and energy landscapes (SELs) and motifs for bZIP heterodimers.

(A) SEL displays CSI intensities for all sequence permutations of a given binding site size (k-mers). Sequences are organized with respect to any selected seed motif; however, a k-mer representing the PWM-derived motif is typically used. CSI intensities correlate with equilibrium binding affinities. As an example, the arrangement of 6-mer sequences for a simplified 4-mer seed motif is shown. The innermost circle displays the intensities for all sequences that have an exact match to the seed motif (0-mismatch ring). In this ring, sequences are arranged in a clockwise manner with sequences that include residues 5′ of the seed motif at the start, sequences with residues that flank both 5′ and 3′ ends in the middle, and 3′ flanking sequences at the end (context). The subsequent 1-mismatch ring contains the sequences that differ at one position from the seed. The sequences are organized clockwise starting with mismatches at the first position and ending with mismatches at the last position of the motif. Within each sector, the mismatches at a given position (indicated by x) are organized in alphabetical order (A, C, G, and T). The 2-mismatch ring contains all permutations with two positional differences with the seed, similarly ordered. (B) Left, SEL for JUN•ATF3 heterodimer using CRE (5′TGACGTCA3′) as the seed motif. By displaying the 10 bp sequence space, preferred sequences become apparent. Peaks corresponding to emergent and variably-spaced sites are identified by arrows. Right, SEL displaying 12 bp sequences for ATF4•CEBPG heterodimer using CRE-CAAT (5′ATGACGCAAT3′) as the seed motif. (C) Heatmap of the relative CSI intensities of 102 bZIP dimers (columns) for the 10 sites highlighted in Figure 2B as well as constituent half-sites of the six classic bZIP motifs (rows). Displayed is the maximum CSI intensity of all the 10-mers matching the site. bZIP dimers are listed in the same order as in Figure 1B. ATF3, ATF4, CEBPG, and JUN homodimers are marked by asterisks. While bZIPs do not bind as monomers to half-sites, the occurrence of bZIP half-sites within motifs is displayed in the second set of rows to enable comparison between the half-site preferences versus the CSI intensity for motifs that display these half-sites in different combinations or in different contexts.

DOI: http://dx.doi.org/10.7554/eLife.19272.010

Figure 2—source data 1. Data for Figure 2C.
Relative CSI intensity for 102 bZIP dimers for different DNA-binding sites and half-sites.
DOI: 10.7554/eLife.19272.011

SEL plots of 102 bZIP homo- and heterodimers reveal that the impact of flanking sequence context and the range of different cognate sites bound by most bZIPs is far richer than might be inferred from motifs represented as PWMs (Supplementary file 2). In Figure 2B, the SELs of JUN•ATF3 and ATF4•CEBPG illustrate the broader insights that emerged from examining specificity profiles of these two heterodimers. JUN•ATF3 binds a CRE site composed of conjoined half-sites for JUN and ATF3. Visualizing the entire JUN•ATF3-DNA interactome via a SEL shows that the binding of JUN•ATF3 heterodimer to CRE is significantly influenced by the sequence context that flanks the motif (see affinity variations in the 0-mismatch ring). Additionally, the 3-mismatch ring of the SEL identifies several high-intensity peaks corresponding to additional cognate sites. As indicated, one of these is a variably spaced site, and another is an emergent site 5′TGACGCAT3′. Thus, the SEL highlights that this single heterodimer binds multiple classes of cognate sites. On the other hand, the SEL for ATF4•CEBPG shows that the seed motif 5′ATGACGCAAT3′ bound by this heterodimer is relatively insensitive to context effects (0-mismatch ring). The 1-mismatch ring indicates that both half-sites are not equally tolerant of mismatches, with mismatches in the 5′TGA3′ core of the CRE site dramatically reducing binding, whereas the 5′CAA3′ site is tolerant of deviations at the first position of the half-site but sensitive to deviations in the 5′AA3′ positions. Similar insights can be obtained from the SELs for each of the 102 bZIP dimers that are reported in Supplementary file 2.

Our compendium of SEL plots greatly extends previous reports that bZIP dimers bind a range of sequences with different degrees of affinity (Badis et al., 2009; Kim and Struhl, 1995; König and Richmond, 1993). To examine whether the set of sequences that are pointed out in SELs of JUN•ATF3 and ATF4•CEBPG from Figure 2B are bound by homodimers or any bZIP in our compendium, we displayed the relative preferences of each dimer for this set of binding sites in a heatmap (Figure 2C). Each column displays the relative preference of each of the 102 bZIP dimers for different sequences, including half-sites of all six classical homodimer motifs. An examination of row 3, which displays preferences of all 102 dimers for the emergent site 5′TGACGCAT3′, indicates that this site is highly preferred by JUN•ATF3 and to some extent by JUN•CEBPG but not by homodimers formed by ATF3, CEBPG, or JUN (denoted by asterisks). While not as exclusive as the JUN•ATF3 emergent site, the conjoined CRE-CAAT site is primarily targeted by heterodimers formed by the CEBP family of bZIPs. The heatmap does indicate that this heterodimer-preferred site permits low-affinity binding by the CEBPG homodimer. Interestingly, data in row 8 reveal that substituting 5′CAAT3′ with 5′TAAT3′ in the CRE-CAAT site perturbs binding by CEBP heterodimers in a non-uniform manner, unmasking hidden differential sequence preferences of related heterodimers that are opaque to current models that use protein homology to predict cognate site preferences. The C-to-T substitution also expands the repertoire of bZIPs that bind this mutated site. DBP, HLF, and NFIL3 as homo- and heterodimers display an unmistakable affinity for this modified CRE-CAAT site that recreates the PAR half-site that is a target of this set of bZIPs. On the other hand, a different substitution at the same position (5′GAAT3′) dramatically reduces the binding of all bZIPs to this version of the CRE-CAAT site (row 9). Furthermore, the importance of the sequences flanking a binding site (rows 4 and 5) or the contribution of each half-site to the binding of bZIPs (rows 8–10) is also made evident by the heatmap. In essence, SEL plots alongside comparative heatmaps of affinities of proteins for a range of cognate sites bring a new appreciation for the diversity of DNA sequences that can be targeted by a given factor.

EMSA-FRET analysis to validate heterodimer binding to different cognate sites

To validate the ability of heterodimers to bind cognate sites identified by CSI analysis, we used an electrophoretic mobility shift assay (EMSA) in which a FRET signal distinguished homo- vs. heterodimers in protein-DNA complexes (Figure 3A; Materials and methods) (Reinke et al., 2013). We first used EMSA-FRET to assay bZIP dimers formed by mixing fluorescein and TAMRA labeled versions of 16 proteins drawn from different bZIP families. For 15 homodimers for which we could detect binding to DNA, the mixed-dye homodimer could be easily distinguished from both of the single-dye homodimers, as shown for CEBPG and ATF3 homodimers binding to CAAT and CRE sites, respectively (Figure 3A). This assay was then used to demonstrate that the ATF3•CEBPG heterodimer bound the conjoined CRE-CAAT site better than either parental homodimer (Figure 3A). Furthermore, swapping fluorophores did not alter the binding properties of the resulting heterodimer (last panel of Figure 3A). DNA fluorescence coincides with the protein FRET signals, confirming that protein-DNA complexes were being observed in the EMSA gels (Figure 3—figure supplement 1A).

Figure 3. Influence of bZIP protein dimerization on DNA binding.

(A) EMSA-FRET assay used to quantify bZIP heterodimers and homodimers binding to DNA. Fluorescein and TAMRA are depicted as blue and green stars, respectively. In the EMSA gel, homodimers give rise to pseudo-colored blue (fluorescein) or green (TAMRA) signals, whereas heterodimers give a FRET signal that is pseudo-colored red. (B) EMSA-FRET results for bZIP dimers binding to selected heterodimer-specific emergent sites (brown) and conjoined half-sites (blue). Bar graphs show the percent of the indicated DNA oligomer bound by each dimer. The PPI strength of each dimer is indicated with gray-scale circles sized according to the Kd for a given protein-protein interaction. Homodimers are marked with an asterisk (*). (C) EMSA-FRET results for bZIP dimers tested for binding to DNA sites composed of conjoined half-sites. Left, dimers tested against two different sites composed of conjoined half-sites. Right, dimers tested against a single site. Data are displayed as in B.

DOI: http://dx.doi.org/10.7554/eLife.19272.012

Figure 3—figure supplement 1. Influence of bZIP protein dimerization on DNA binding.

(A) Detecting heterodimer DNA complexes using an EMSA-FRET assay. Top, Fluorescein signal in blue, TAMRA signal in green, and FRET signal in red. Bottom, TYE 665 labeled DNA site. (B) Three examples that explain the notation used in part C, summarizing data for DNA binding by homodimers and heterodimers composed of (left) ATF3 and DBP, (middle) ATF3 and CEBPA and (right) BATF3 and JUN. Within each example, rows indicate different bZIP dimers. The top row describes the homodimer formed by the first-mentioned bZIP, the bottom row is for the other homodimer, and the middle row contains data for the heterodimer. Within each example, each column represents binding to a different DNA site composed of two half-sites. DNA-binding affinity is indicated using a green-scale heatmap with key indicating % binding at far right. The color of the cell border indicates strength of the protein-protein interaction as measured previously by FRET, indicated by yellow-scale heatmap at right. ATF3•DBP example: top row is ATF3 homodimer binding to CRE-PAR, middle row is ATF3•DBP heterodimer binding to CRE-PAR, bottom row is DBP homodimer binding to CRE-PAR. ATF3•CEBPA example: top row is ATF3 homodimer, middle row is ATF3•CEBPA heterodimer, and bottom row is CEBPA homodimer. Binding is to CRE-CAAT in left column and TRE-CAAT in right column. BATF3•JUN example: top row is BATF3 homodimer, middle row is BATF3•JUN heterodimer and bottom row is JUN homodimer. Binding is to CRE-CRE in left column and CRE-CREA in right column. (C) Complete set of EMSA-FRET data. Examples in B are included in this grid and other cells can be interpreted analogously.

Figure 3—figure supplement 2. Heterospecific binding of DNA.

Top, DNA sequences composed of optimal half sites. Bottom, comparison of an optimal DNA site to a heterodimer-specific non-optimal DNA site. DNA sequences for EMSA-FRET experiments are reported in Supplementary file 1D.

We used this EMSA-FRET assay to quantify the DNA binding of 83 bZIP homodimers and heterodimers comprised of 16 proteins. Each heterodimer was systematically examined with DNA sites that were constructed by conjoining the preferred half-site(s) for each bZIP. Figure 3B shows EMSA-FRET data for six heterodimers and corresponding homodimers binding to heterodimer-specific sites (three conjoined sites and three emergent sites). For these sites, the CSI intensity for the heterodimers is higher than the scores for either of the two contributing homodimers. The EMSA-FRET data demonstrate clearly that neither the JUN nor the ATF3 homodimers associate with the emergent site identified for JUN•ATF3 (Figure 3B and Figure 3—figure supplement 2). Similarly, emergent sites identified for ATF4•CEBPA and ATF4•JUN, and several conjoined sites such as TRE-CAAT for ATF3•CEBPA, CRE-L-CAAT for BATF3•CEBPA, and CRE-CRE-L for BATF3•JUN, were validated by EMSA-FRET as bona fide heterodimer-specific cognate sites that show weaker binding, or no binding, by the contributing homodimers (Figure 3B). EMSA-FRET data also validate the ability of BATF3 to bind emergent CRE-L half-sites as a heterodimer (in addition to the CRE site preferred by the homodimer). The complete EMSA-FRET data are presented in a more compact format in Figure 3—figure supplement 1.

A striking result of our CSI analysis is that conjoined half-sites form a substantive fraction (~70%) of the cognate sites bound by heterodimers. To determine how frequently DNA half-sites derived from homodimer-binding data, when presented as conjoined sites, would bind the corresponding heterodimers, we tested DNA binding by EMSA-FRET for stably interacting bZIP heterodimers (PPI: Kd <1 µM at 21°C), and the corresponding homodimers (Figure 3C). Consistent with CSI analysis, 52 out of 56 bZIP pairs that form stable heterodimers bound the DNA site made by conjoining the half-site preferred by each monomer. Specific binding to conjoined sites was also detected for 6 out of 27 (22%) pairs that do not stably associate in the absence of DNA (PPI: Kd >1 µM at 21°C). This fraction is similar to the 20% of bZIP pairs (29 out of 144) that showed sequence-specific DNA binding in SELEX-seq experiments despite their apparent inability to dimerize in the absence of DNA.

ATF3: a case study of the influence of interacting partners on heterodimer cognate site preferences

Activating Transcription Factor 3 (ATF3) is a member of the CREB/ATF family. Initially identified as a suppressor of inflammation and the adaptive immune response in resting cells, ATF3 is now associated with numerous diseases including a variety of aggressive and widely occurring cancers (Tanaka et al., 2011; Thompson et al., 2009; Yin et al., 2008). ATF3 is able to interact with a large variety of TFs to function as a regulatory hub of cellular adaptive response (Gilchrist et al., 2006; Hai et al., 1999, 2010). As a homodimer, ATF3 binds to CRE sites and represses a wide array of genes (Hai et al., 1999, 2010). However, as a heterodimer with JUN or JUND, ATF3 activates transcription of targeted genes (Chu et al., 1994; Filén et al., 2010; Hsu et al., 1992).

To test the hypothesis that heterodimerization with other bZIPs might alter DNA-binding specificity and possibly genomic targets, we analyzed SELEX-seq for 20 different ATF3 heterodimers spanning the full range of PPI affinities. DNA-binding specificities could be assigned with high confidence to nine heterodimers that displayed a range of DNA sequence preferences, including affinity for the CRE site preferred by the homodimer (Figures 4A and B). Importantly, distinct DNA-binding preferences among ATF3•CEBP and ATF3•BATF heterodimers and their corresponding homodimers were detected. The motifs enriched by the ATF3 homo- and heterodimers can be described in five broad categories: CRE, TRE, CRE-CAAT, CRE-L, and the emergent 5′TGACGCAT3′ site (Figure 4B). Scatter plots illustrate instances where the CSI intensities of ATF3 heterodimers differ markedly from those of the parent homodimers (Figure 4C and D). For example, as evident from high CSI intensities, CRE-CAAT sites (red) are preferably bound by ATF3•CEBPG as compared to ATF3 or CEBPG (Figure 4C, top panel). Similarly, scores for TRE (green) and 5′TGACGCA3′ (black) are higher for JUN•ATF3 than for JUN or ATF3 (Figure 4C, middle panel). BATF3•ATF3 (Figure 4C, bottom panel) and BATF2•ATF3 (Figure 4D, top panel) enrich CRE-L sites (blue), further supporting that CRE-L is an emergent site for BATF family heterodimers (also with JUN in Figure 3C). Figure 4D further highlights the differences between CRE and TRE binding by ATF3 in its homo versus heterodimer state. An important and recurring observation is that several ATF3 heterodimers (BATF3•ATF3, JUN•ATF3, and JUNB•ATF3) can associate with the CRE site that is bound by the ATF3 homodimer (Figure 4).

Figure 4. ATF3 heterodimers bind a range of distinct cognate sites.

(A) Hierarchical clustering of pairwise comparisons of DNA-binding specificity (10-mers) for ATF3 homodimer and 9 ATF3-containing heterodimers. (B) DNA logos showing the MEME motifs derived from the top 1000 12-mer sequences for ATF3 homodimer and ATF3-containing heterodimers. Grey-scale circles next to dimer names indicate PPI strength using the scale from Figure 3. (C) 3-dimensional and (D) 2-dimensional scatter plots comparing the DNA-binding specificities of bZIP homodimers vs. ATF3-containing heterodimers. Scatter plots of quantile-normalized CSI intensities (z-scores) of ATF3 dimers for 80,000 10-mers are shown.

DOI: http://dx.doi.org/10.7554/eLife.19272.015

Heterodimer sites enriched in SELEX-seq map to occupied genomic loci in vivo

To determine the extent to which cognate sites identified by SELEX-seq can explain genome-wide occupancy in cells, we examined ChIP-seq data for ATF3 in four different human cell lines. H1 human embryonic stem cells, HEPG2 liver-derived hepatocellular carcinoma cells, and K562 erythroblastoma cells have been examined comprehensively (Encode, 2011). The fourth cell line, GBM1, is derived from glioblastoma multiforme, an aggressive brain cancer in which ATF3 is a tumor suppressor and its loss of function is indicative of high-grade cancer and poor prognosis (Gargiulo et al., 2013). As a first step, we identified ATF3 ChIP-seq peaks and examined the overlap between the genomic sites occupied by ATF3 in all four lines. Only a small number of sites (119) were common between the four cell lines, although the number increased to 1602 genomic loci if only the ENCODE cell lines (H1, K562, and HEPG2) were examined (Figure 5A). This is a minor fraction of the over 10,000 peaks identified in K562 and about a third of the 4808 ATF3-bound sites in H1 cells.
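The peak-overlap comparison described above can be sketched as a simple interval intersection. The overlap criterion used here (any shared base between two peaks on the same chromosome) is an assumption, since the exact rule is not stated in this section; the function names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def shared_peaks(reference, *others):
    """Peaks in `reference` that overlap at least one peak in every other peak set."""
    return [
        peak for peak in reference
        if all(any(overlaps(peak, q) for q in peaks) for peaks in others)
    ]

# Invented coordinates standing in for ChIP-seq peak calls in three cell lines.
h1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
k562 = [("chr1", 150, 250), ("chr2", 100, 120), ("chr3", 10, 20)]
hepg2 = [("chr1", 180, 220), ("chr1", 550, 580)]

common_two = shared_peaks(h1, k562)          # shared by H1 and K562
common_three = shared_peaks(h1, k562, hepg2)  # shared by all three lines
```

As in the Venn diagram, requiring a peak to be present in more cell lines can only shrink the shared set. This quadratic scan is adequate for a toy example; genome-scale peak sets would normally be sorted or indexed first.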

Figure 5. ATF3 binds to different genomic regions using diverse motifs.

(A) Venn diagram of the numbers of ATF3-bound regions determined by ChIP-seq in different cell lines. (B) Heatmap of the False Positive Rate (FPR)-cutoffs at which ATF3 ChIP-seq peaks (rows) are detected as positive for ATF3 or ATF3-dimer binding. Peaks were scored using CSI intensities of the ATF3 homodimer or ATF3-containing heterodimers (columns) in H1hESCs, K562, and HEPG2 cells, and clustered by FPR-cutoffs across all dimers. (C) Same as (B) for the glioblastoma multiforme (GBM1) cell line. Highlighted clusters (blue and green) contain DNA motifs preferred by different ATF3 dimers and are enriched with different Gene Ontology Biological Process terms. False Discovery Rates (q-values) for each GO term are shown. See Supplementary file 1H.

DOI: http://dx.doi.org/10.7554/eLife.19272.016

Figure 5—figure supplement 1. ROC curves.

(A) Area Under the Receiver Operating Characteristic curve (AUC-ROC) values for the intersection of ChIP-seq peaks determined using in vitro specificity profiles of the corresponding bZIP heterodimer, as described in Materials and methods. x-axis: False-Positive Rate; y-axis: True-Positive Rate (TPR). ChIP-seq peaks from specified cell lines were downloaded from the ENCODE project. (B) ROC curves and AUC values for ChIP-seq peaks (all peaks) determined using DNA-binding specificity profiles of the corresponding bZIP homodimer.

ATF3 ChIP-seq peaks likely include both homodimer and heterodimer bound regions. To assess how well the in vitro discovered cognate sites explain bound sites in a cellular context, we used area under the curve-receiver operating characteristic (AUC-ROC) values, plotting the true-positive rate (TPR) versus false-positive rate (FPR) for peak detection (Materials and methods). ATF3 homodimer sites spanning the entire spectrum of CSI intensities (z-scores) yielded 0.67–0.77 AUC values (Supplementary file 1E and Figure 5—figure supplement 1). Using the AUC-ROC approach, we examined the ability of CSI profiles of nine different ATF3 heterodimers as well as the ATF3 homodimer to identify ATF3 ChIP-seq peaks that might represent heterodimer-bound regions. We used published RNA-seq datasets to verify the expression of the bZIP genes used for the ATF3 heterodimer analysis (Supplementary file 1F) (Encode, 2011; Gargiulo et al., 2013). Each of the 10 CSI datasets captures a large but varying fraction of the ATF3 peaks and, intriguingly, these data reveal that different ATF3 heterodimers perform better in different cell lines (Supplementary file 1E). For example, JUN•ATF3 gives 0.85 AUC in the Glioblastoma line, whereas BATF3•ATF3 better explains the ChIP-seq peaks in H1 and HepG2 cells with AUC of 0.69 and 0.70, respectively. While AUC-ROC curves are not robust to subtle changes, the differences we observe may reflect underlying cell line specific differences in the abundance and regulatory roles of different ATF3 heterodimers. The underlying epigenetic landscapes would further exacerbate these differences. Nevertheless, when considered together, the ATF3 homodimer combined with nine different heterodimers can account for a much larger fraction of ChIP-seq peaks than can the homodimer alone. 
For example, at an FPR-cutoff of 0.10, in the Glioblastoma cell line, the ATF3 homodimer classified just 39% of the ATF3 ChIP-seq peaks as positive, whereas 85% of the peaks are classified positive by at least one of the ATF3-containing dimers at FPR 0.10. Similar analysis for other cell lines and at different FPR cutoffs is reported in Supplementary file 1G.
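The two quantities used in this analysis, the AUC of a single dimer and the fraction of peaks detected by at least one dimer at a fixed FPR cutoff, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from Materials and methods: each peak and each background region is assumed to carry one score per dimer, and all names and numbers are hypothetical.

```python
def auc_roc(pos_scores, neg_scores):
    """AUC as the probability that a random peak outscores a random background region (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def threshold_at_fpr(neg_scores, fpr):
    """Score cutoff such that at most a fraction `fpr` of background regions score strictly above it."""
    ranked = sorted(neg_scores, reverse=True)
    allowed = int(fpr * len(ranked))  # background regions allowed to pass
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]

def fraction_detected(peak_scores_by_dimer, neg_scores_by_dimer, fpr):
    """Fraction of peaks called positive by at least one dimer at the given FPR cutoff."""
    n_peaks = len(next(iter(peak_scores_by_dimer.values())))
    detected = [False] * n_peaks
    for dimer, scores in peak_scores_by_dimer.items():
        cutoff = threshold_at_fpr(neg_scores_by_dimer[dimer], fpr)
        for i, score in enumerate(scores):
            if score > cutoff:
                detected[i] = True
    return sum(detected) / n_peaks

# Invented scores: four peaks, ten background regions, two dimers.
background = [0.1, 0.3, 0.8, 1.2, 2.0, 0.4, 0.6, 0.9, 1.1, 0.2]
peaks = {"ATF3": [5.0, 1.0, 0.5, 4.0], "JUN_ATF3": [4.5, 6.0, 0.2, 1.0]}
negs = {"ATF3": background, "JUN_ATF3": background}

homodimer_auc = auc_roc(peaks["ATF3"], negs["ATF3"])
homodimer_only = fraction_detected({"ATF3": peaks["ATF3"]}, negs, 0.10)
any_dimer = fraction_detected(peaks, negs, 0.10)
```

In the toy data, adding the heterodimer profile recovers a peak that the homodimer profile misses at the same FPR cutoff, which mirrors the qualitative trend reported in the text.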

Given the cell-type-specific differences in genomic sites occupied by ATF3, we scored the ATF3-bound loci for each cell line using the CSI data for 10 ATF3 dimers. Peaks were then clustered based on the FPR-cutoffs for each bound region (Figure 5B–C and Materials and methods). All four cell lines show clear clusters of sites where one or more heterodimers detect a peak at a lower FPR compared to the ATF3 homodimer. Several such clusters are apparent for heterodimers with CEBP or BATF family members. A striking result that emerged from the analysis of the GBM1 cell line is that multiple ATF3-bound genomic loci were better described by ATF3 heterodimers than by the homodimer. For GBM1, we further examined two clusters of ChIP peaks for which heterodimers scored better than the ATF3 homodimer (Figure 5C, blue and green clusters in the dendrogram). In the blue cluster, de novo motif discovery revealed enrichment of a CRE-CAAT motif, which is the motif with maximal CSI intensities for CEBP•ATF3 dimers. De novo motif search of ChIP-seq peaks in the green cluster identified the TRE motif, which is the top ranked motif for ATF3 heterodimers formed with JUNB, JUN, FOS, and FOSL1, all of which are expressed in GBM1 cells (Supplementary file 1F). This is in contrast to the CRE motif preferred by the ATF3 homodimer. Gene ontology functional annotations of genes linked to the CRE-CAAT (blue) and TRE (green) clusters also differ substantially (Figure 5C and Supplementary file 1H). CRE-CAAT sites preferred by ATF3•CEBP heterodimers (blue cluster) were enriched for gene ontology (GO) terms related to immune response and JAK-STAT signaling, whereas TRE sites (green cluster) were enriched for GO terms associated with nutrient sensing, PDGF signaling, and cell junction regulation. This observation lends support to the notion that heterodimers drive cell-type and signal-specific gene networks.

Co-occupied genomic loci bear emergent and conjoined sites

Sharpening our focus to a subset of genomic loci that are co-occupied by ATF3 and another bZIP permitted us to examine whether heterodimer cognate sites were evident at co-occupied genomic loci. In Tier 1 ENCODE cell lines such as H1 and K562, occupancy of multiple TFs has been charted across the genome (Dunham et al., 2012). We first examined loci co-occupied by ATF3 and CEBPB or JUN. In H1 embryonic stem cells, we identified a region on chromosome 1 that shows overlapping ChIP peaks for ATF3 and CEBPB (Figure 6A, top panel). This locus is also resistant to DNase I, suggesting that ATF3 and CEBPB are binding to a seemingly inaccessible part of the genome. Plotting CSI intensities for a given TF across the genome generates CSI-Genomescapes (Figure 6A–B, bottom panels; Materials and methods). CSI-Genomescapes in the co-occupied region identified a high-intensity site for the ATF3•CEBPA heterodimer, whereas no high-intensity sequences were found for ATF3 or CEBPA homodimers (Figure 6A). CEBPA is the closest homolog to CEBPB for which CSI data were obtained. Similar CSI-Genomescape analysis of a locus with overlapping ATF3 and JUN peaks readily identified the JUN•ATF3 emergent site (5′TGACGCAT3′). This site is within DNase I accessible euchromatin, and CSI-Genomescapes provide scant support for either JUN or ATF3 homodimer binding to this site (Figure 6B).
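The idea behind a CSI-Genomescape, plotting an in vitro intensity for every position along a genomic sequence, can be sketched as a sliding-window lookup. The sketch assumes a table that maps k-mers to intensities and takes the higher score of the two strands; both choices are assumptions, and the tables, scores, and sequence below are invented for illustration (8-mers are used for brevity, whereas the study reports 10-mer data).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def genomescape(sequence, kmer_scores, k, default=0.0):
    """One score per window start: the table intensity of the k-mer on either strand."""
    track = []
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        score = max(
            kmer_scores.get(kmer, default),
            kmer_scores.get(reverse_complement(kmer), default),
        )
        track.append(score)
    return track

# Hypothetical intensity tables for a heterodimer and a homodimer.
heterodimer_table = {"TGACGCAT": 8.0}  # emergent site
homodimer_table = {"TGACGTCA": 9.0}    # consensus CRE

locus = "CCGGTGACGCATGGCC"  # invented sequence containing the emergent site
hetero_track = genomescape(locus, heterodimer_table, k=8)
homo_track = genomescape(locus, homodimer_table, k=8)
```

Here the heterodimer track peaks at the window that starts at index 4, where the emergent site lies, while the homodimer track stays flat, which is the kind of contrast the Genomescape panels are meant to display.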

Figure 6. bZIP heterodimer DNA sites are bound in vivo.

(A) ChIP-seq traces for ATF3 (blue) and CEBPB (orange) and DNase I hypersensitivity trace (black) in H1 human embryonic stem cells. Below, CSI-Genomescape for bound genomic regions for ATF3 and CEBPA homodimers and ATF3•CEBPA heterodimer. CEBPA and CEBPB share 76% identity over their bZIP domain. (B) ChIP-seq traces for ATF3 (blue) and JUN (green) and DNase I hypersensitivity trace (black) in K562 cells. Below, CSI-Genomescape for bound genomic region for ATF3 and JUN homodimers, and for JUN•ATF3 heterodimer. (C) Venn diagram of bound regions (ChIP-seq peaks) for ATF3 and CEBPB in H1hESC and for (D) ATF3 and JUN in K562 cells. (E) Violin plots of CSI-seq scores for the ChIP-seq peaks derived from the intersection of ATF3 and CEBPB ChIP peaks (1018 overlapping peaks) in H1 stem cells using in vitro data for ATF3, CEBPA, CEBPB (Jolma et al., 2013), CEBPE, CEBPG homodimers and ATF3•CEBPA, ATF3•CEBPE, and ATF3•CEBPG heterodimers. CSI intensities were quantile normalized. (F) Left, violin plots of CSI-seq scores for the ChIP-seq peaks derived from the intersection of ATF3 and JUN ChIP peaks (6539 overlapping peaks) in K562 cells. Right, violin plots for the subset of overlapping peaks of ATF3 and JUN containing a match for the heterodimer-specific site TGACGCAT (39 peaks). Peaks were scored using ATF3 and JUN homodimers, and the JUN•ATF3 heterodimer.

DOI: http://dx.doi.org/10.7554/eLife.19272.018

Figure 6—figure supplement 1. CSI intensities for bound genomic regions.

Violin plots of CSI intensities (z-scores) for (A) negative regions taken ±5 kb from the center of each overlapping ATF3 and CEBPB ChIP peak in H1 cells; (B) negative regions taken ±5 kb from the center of each overlapping ATF3 and JUN ChIP peak in K562 cells; (C) ATF3 ChIP-seq peaks after removing peaks that overlap with CEBPB in H1 cells; and (D) CEBPB ChIP-seq peaks after removing peaks that overlap with ATF3 in H1 cells.

Next, we identified genomic loci that are co-occupied by ATF3 and CEBPB in H1 (1018 overlapping ChIP peaks) or ATF3 and JUN in K562 cells (6539 overlapping ChIP peaks; Figure 6C–D). We then used CSI data of different homo- and heterodimers to assign CSI scores within these co-occupied regions. Violin plots clearly demonstrate that regions co-occupied by ATF3 and CEBPB have higher CSI intensities when scored with ATF3•CEBP heterodimers than when scored with ATF3 and CEBP homodimers (Figure 6E and Figure 6—figure supplement 1). In contrast, for loci co-occupied by JUN and ATF3, violin plots indicate that cognate sites for JUN•ATF3 heterodimer perform only marginally better at explaining the genomic binding data than sites preferred by JUN or ATF3 homodimers (Figure 6F, left panel). This observation is consistent with the ability of the JUN•ATF3 heterodimer to bind consensus CRE sites that are also bound by each contributing homodimer. The perceptibly higher CSI intensity when using JUN•ATF3 cognate sites might arise from heterodimer-preferred TRE sites or heterodimer-specific emergent sites. To examine this possibility, we utilized CSI-Genomescapes to score all co-occupied regions that include emergent heterodimer-specific 5′TGACGCAT3′ sites (39 sites). When this subset of genomic regions was examined with homodimer CSI data, the violin plots reveal the inability of ATF3 homodimer cognate sites to account for the ChIP signals, whereas JUN homodimers account for some of the JUN occupancy at those regions (Figure 6F, right panel). In contrast, the ATF3•JUN heterodimer cognate sites showed the highest scores for the emergent site.

Heterodimer-specific cognate sites map to SNPs associated with diseases

Armed with 102 CSI profiles of bZIP dimers, we scrutinized 5076 non-coding single-nucleotide polymorphisms (SNPs) that are associated with diseases and quantitative traits (Maurano et al., 2012). We reasoned that non-coding SNPs that are not assigned to known TF cognate sites might be explained with our compendium of new bZIP-DNA interactomes. As a first step, we calibrated our CSI data by examining SNPs that are known to alter binding by CREB1 and CEBPA (Figure 7A top panel and Figure 7—figure supplement 1A). The minor allele of rs10993994 in the promoter of the MSMB gene has been associated with prostate cancer and it creates a cognate site that is bound by CREB1 (Lou et al., 2009). Similarly, the minor allele of rs12740374 has been associated with myocardial infarction, aberrant plasma levels of low-density lipoprotein cholesterol (LDL-C), and enhanced expression of the SORT1 gene in the liver (Musunuru et al., 2010). Biochemical studies have demonstrated that the G-to-T change generates an optimal CAAT site that is bound by CEBPA. We applied CSI-Genomescape analysis to both SNPs. In both cases, the minor allele has a higher CSI intensity than the corresponding major allele, suggesting that the minor alleles of these SNPs create CEBPA- and CREB1-binding sites (Figure 7A and Figure 7—figure supplement 1). The CSI-Genomescape for the rs7631605 site is particularly interesting because it predicts disruption of the emergent site 5′TGACGCAT3′ (Figure 7A middle panel). This allele is associated with Alzheimer’s disease, mild cognitive impairment (MCI), and elevated levels of phosphorylated Tau-181P (Han et al., 2010). Additionally, CSI-Genomescape predicts that rs1869901, a variant associated with schizophrenia, impacts binding of FOS•CEBPE by altering a TRE-CAAT site (Figure 7A bottom panel).

Figure 7. bZIP heterodimers and human diseases and traits.

(A) CSI-Genomescape predicts increased binding by CREB1 to the alternate allele of rs10993994 and decreased binding to alternate alleles of rs7631605 and rs1869901 by JUN•ATF3 and FOS•CEBPE heterodimers, respectively. (B) Scatterplot of FOS•JUN predicted CSI intensities for reference and alternative alleles of 5076 autosomal SNPs linked to human diseases and quantitative traits identified in genome-wide association studies. SNPs and disease/traits classifications are from Maurano et al. (Maurano et al., 2012). (C) (left) Number of SNPs predicted to increase or decrease bZIP binding by twofold at different stringency levels determined by noise factor F (see Materials and methods). The F values at which a twofold difference in CSI score is predicted for rs12740374 (#) and rs10993994 (*) are indicated in red. (right) Distribution of predicted fold changes in bZIP binding for GWAS SNPs using CSI Intensities, using F = 25. Dashed lines mark a twofold change. Red lines indicate the predicted change in binding of CREB1 and CEBPA to rs10993994 (*) and rs12740374 (#). (D) Predicted fold-change in CSI score of sequences centered at SNPs linked to disease or quantitative traits. A total of 156 SNPs have a predicted increase (red) or decrease (blue) of ≥2 fold in CSI score for at least one bZIP dimer, when F = 25 (Materials and methods and Supplementary file 1I). Fold-changes are relative to the reference genome hg19. Rows (SNPs) are organized by class of disease/trait. Columns (bZIP dimers) are clustered by DNA specificity as in Figure 1.

DOI: http://dx.doi.org/10.7554/eLife.19272.020

Figure 7—figure supplement 1. Genomescapes, transcription factor binding, and chromatin environment for selected SNPs.

(A) Left, CSI Genomescape and right, UCSC genome browser screen shots of the genomic and chromatin context of SNPs rs12740374 and rs10993994. (B) UCSC genome browser screen shots of the genomic and chromatin context of SNPs rs3758354 and rs17293632. UCSC genome browser tracks for ChIP-seq peaks for selected bZIPs in ENCODE cell lines, ChIP-seq signal for histone 3 lysine 27 acetylation (H3K27Ac marks), and DNAse I hypersensitive regions.

A scatterplot of CSI intensity scores for FOS•JUN (AP-1) for reference (hg19) or alternate alleles reveals SNPs that create or disrupt binding sites (Figure 7B). The plot shows that nearly all the 5076 SNPs are near the origin and do not lead to large differences in CSI scores for the FOS•JUN heterodimer. However, a striking example of predicted increase in binding is rs3758354, a SNP associated with schizophrenia, depression, and bipolar disorder (Huang et al., 2010). In contrast, a decrease in FOS•JUN heterodimer binding is predicted for rs17293632, a variant linked to Crohn’s autoimmune disorder (Franke et al., 2010). ChIP-seq studies in several cell lines examined by the ENCODE consortium have shown binding by FOS and JUN to both loci, providing support that these sites are accessed by bZIP proteins in a cellular context (Figure 7—figure supplement 1B).

Extending beyond AP-1, we used 102 bZIP CSI profiles to score both alleles of the 5076 SNPs and calculated a predicted fold-change in CSI intensity, which correlates with binding affinity (Figure 1—figure supplement 2) (Carlson et al., 2010; Puckett et al., 2007). Similar correlations also hold true for other high-throughput platforms (Berger et al., 2006; Fordyce et al., 2010; Slattery et al., 2011). We added a noise factor to our scoring function to make the fold-change predictions less sensitive to low CSI intensities (Figure 7C; Materials and methods). A total of 156 SNPs yielded a greater than twofold difference in CSI intensity between the reference and alternate alleles (Figure 7C–D). Displaying the predicted increase (red) or decrease (blue) in binding by 102 bZIP dimers at 156 SNPs reveals minor alleles that are targeted by unique heterodimers as well as mutations that have wide-ranging impacts on multiple bZIP dimers. For example, rs10994336 is predicted to increase CSI intensity by at least twofold for 44 out of 102 bZIP pairs reported here. We also report that 80% of the identified changes impact bZIP heterodimers. In the richly annotated RegulomeDB database that ties SNP impact to occurrence of TF-binding sites, only 20 of 156 SNPs are currently annotated with a bZIP motif (Boyle et al., 2012). It is particularly important to note that many of the SNPs in the database are annotated with PWMs derived from bZIP homodimers, whereas our CSI intensity fold-change predictions for 22 homodimers and 80 heterodimers make use of the entire bZIP-DNA interactomes (all 10-mers). The clusters in Figure 7D also point to potential roles of bZIP proteins in less understood diseases and provide new hypotheses for the etiology of such diseases and traits.
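The effect of a noise factor on fold-change calls can be illustrated with a short sketch. The additive form used here, (alternate + F) / (reference + F), is an assumption made for illustration; the actual scoring function is defined in Materials and methods and may differ. The SNP labels and scores are invented.

```python
def fold_change(ref_score, alt_score, noise=25.0):
    """Damped fold change; the additive noise term is an assumed form, not the paper's formula."""
    return (alt_score + noise) / (ref_score + noise)

def flag_snps(snp_scores, noise=25.0, threshold=2.0):
    """Return SNPs whose damped fold change is at least `threshold` in either direction."""
    flagged = {}
    for snp, (ref, alt) in snp_scores.items():
        fc = fold_change(ref, alt, noise)
        if fc >= threshold or fc <= 1.0 / threshold:
            flagged[snp] = fc
    return flagged

# Invented (reference, alternate) intensities for three hypothetical SNPs.
toy_scores = {
    "snp_gain": (10.0, 100.0),    # strong predicted gain of binding
    "snp_loss": (120.0, 20.0),    # strong predicted loss of binding
    "snp_low_signal": (1.0, 4.0),  # fourfold raw ratio, but both scores are near noise
}
flagged = flag_snps(toy_scores, noise=25.0)
```

With this form, a SNP whose two alleles both score near background is not flagged even when the raw ratio is large, which is the stated purpose of the noise factor.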

Discussion

Transcription factors rarely function alone. Different TFs are activated by different cellular stimuli, and specific combinations of TFs converge at specific genomic loci to regulate expression of genes (Ptashne and Gann, 2002). Such combinatorial control provides the means to integrate multiple signals and tune the expression of specific genes or sculpt genome-wide transcriptomes in a nuanced manner. The ability of different TFs to form hetero-oligomers via PPI and protein-DNA interaction is an essential feature of this process. While most eukaryotic TFs can bind DNA as monomers, the bZIP class of TFs only binds DNA as homo- or heterodimers. The ability of bZIPs to form heterodimers appears to increase with increasing evolutionary complexity, with human bZIPs displaying more intricate heterodimerization networks than C. elegans and D. melanogaster, which, in turn, exhibit more complex dimerization networks than S. cerevisiae (Reinke et al., 2013). Comprehensive PPI analysis has shown that 36 human bZIP monomers can form nearly 217 heterodimers, greatly expanding the repertoire of factors that can potentially bind DNA (Reinke et al., 2013). We demonstrate that this diversity of dimers expands the DNA sequence space that can be targeted by bZIPs. Our study further reveals that nearly 20% of the non-interacting bZIP pairs examined can be induced to dimerize at cognate DNA sites, providing yet greater diversity from a modest number of contributing monomers.

Given the large repertoire of human bZIP heterodimers, this family of TFs is particularly amenable to effect combinatorial control. Indeed, this potential was recognized long ago (Bohmann et al., 1987; Franza et al., 1988; Lamb and McKnight, 1991). An ever-increasing body of evidence now implicates bZIPs in numerous aspects of cellular and organismal function. Given their importance, a systematic study of DNA binding by bZIP heterodimers is clearly essential to understanding their functions. However, despite large surveys charting the TF-DNA interactomes (Badis et al., 2009, 2008; Berger et al., 2008; Carlson et al., 2010; Fordyce et al., 2010; Franco-Zorrilla et al., 2014; Grove et al., 2009; Jolma et al., 2010, 2013; Kamesh et al., 2015; Nitta et al., 2015; Noyes et al., 2008; Siggers et al., 2012; Wei et al., 2010; Weirauch et al., 2014), bZIP dimers were under-scrutinized, with only a handful of heterodimers reported thus far (Cohen et al., 2015; Jolma et al., 2015; Mann et al., 2013). Thus, it was quite unclear prior to this work how dimerization between different bZIP partners would impact DNA recognition. The DNA-binding profiles for 80 heterodimers, which we report alongside equivalent data for 22 homodimers, constitute the largest bZIP heterodimer DNA-binding dataset reported to date and provide unprecedented insight into the impact of heterodimer formation.

Guided by PPI data, we examined the DNA-binding specificities of 126 stable dimers and 144 bZIP dimers that display no dimerization even at 1 µM. These 270 bZIP pairs represent a wide survey of the 666 potential pairs that can be formed by 36 monomers. The bZIP-DNA interactomes and specificity landscapes that emerged revealed three classes of cognate sites, and several heterodimers displayed an ability to interact with more than one class of binding site. Of the three classes, conjoined half-sites were the most abundant, with nearly 72% of all heterodimers displaying some affinity for such sites. The second class contained variably-spaced half-sites, often overlapping by a single nucleotide. The final class, comprising 16% of the sites, was the least expected ‘emergent’ class of binding sites, where new non-obvious preferences for half-sites were revealed. Emergent sites targeted by heterodimers fall into ‘loss of specificity’ or ‘gain of specificity’ categories, as defined above. EMSA-FRET analyses not only quantified the relative affinities of hetero- and homodimers for these sites but also revealed the widespread ability of heterodimers to associate with cognate sites that have typically been assumed to be bound by homodimers. More closely examining the emergent site targeted by ATF3•JUN, we find its occurrence at multiple locations across the human genome and, more importantly, several of these sites are co-occupied by ATF3 and JUN in vivo. Further emphasizing a physiological role for these non-obvious binding sites, a SNP that disrupts this site is linked to neurological diseases (Han et al., 2010).

The high granularity of our CSI data also revealed that sequences flanking well-studied homodimer motifs, such as CRE, can impart sub-structure to the motif that is recognized and preferentially bound by different bZIP pairs. Access to such nuanced specificity preferences allows better annotation of genome-wide binding data for bZIPs for which specificity profiles and high-quality ChIP data exist. This is particularly relevant because it is not uncommon for ChIP or genomic DNase I footprinting experiments to identify TF-bound regions that lack matches to the consensus motifs for a given TF. Our results suggest that a fraction of such in vivo occupied regions likely contain heterodimer binding sites. Another important insight from our comparative analysis of genome-wide binding profiles across four cell types is that a given heterodimer associates with a distinct set of genomic loci in each cell type. The results suggest that underlying chromatin and epigenetic landscapes in different cell types may contribute significantly to the sites that are accessed by bZIP dimers. In this context, the ability of ATF3•CEBPB to bind a cognate site within seemingly closed chromatin is consistent with the ability of bZIPs such as CEBPB and FOS•JUN to function as ‘pioneer’ factors that first associate with closed chromatin and enable binding of additional TFs to yield transcriptionally active euchromatin (Biddie et al., 2011; Garber et al., 2012). Whether the ability to bind just one half-site is important, or whether DNA-templated dimerization of bZIPs confers any added ability to bind an otherwise inaccessible enhancer in closed chromatin, remains to be determined.

Finally, the specificity and binding energy profiles of 102 bZIP dimers enable a more nuanced examination of SNPs that have been linked by genome-wide association studies to various diseases and quantitative traits. The vast majority of SNPs associated with diseases occur in non-coding regions of the genome, and most are not readily annotated by the available TF-DNA interactomes, perhaps in part because the focus has been on obtaining consensus motifs of monomeric or homodimeric TFs. Rather than consensus motifs, the use of the full spectrum of binding specificity may enable more accurate mapping of TF-binding sites onto SNPs that are linked to diseases and phenotypic traits. Our compendium of CSI profiles accurately predicted creation of known bZIP cognate sites by previously validated SNPs. Of the 156 SNPs predicted by our CSI profiles to impact bZIP binding, nearly 77% were mapped to bZIP heterodimers, highlighting the importance of determining protein-DNA interactomes for heterodimer TFs. Nearly 64% created bZIP binding sites and were ‘gain of function’ changes relative to the human reference genome. These results are consistent with the 10-fold greater abundance of bZIP heterodimers over homodimers and the observation that aberrant stimulation of gene networks is arguably a greater contributor to disease etiology (Bell et al., 2015; Lee and Young, 2013; Mansour et al., 2014). SNPs that disrupt binding also contribute to disease, an example of this form of regulatory perturbation being the loss of the emergent JUN•ATF3-binding site that is associated with Alzheimer’s and other neurological, cognitive and behavioral disorders. Our bZIP-DNA interactomes identify 156 SNPs that potentially impact 646 bZIP binding events, any one of which could contribute to the associated ailments.
Not only do our data help better annotate the genome, they also serve as an invaluable resource to generate hypotheses on how genetic variants may contribute to the etiology of a range of diseases.

The recent emergence of powerful high-throughput platforms for mapping protein-DNA interactomes has brought the goal of comprehensively mapping the binding specificities of all individual human TFs within reach (Carlson et al., 2010; Jolma et al., 2013; Stormo and Zhao, 2010; Weirauch et al., 2014). However, it is clear from our work as well as the recent work of others that the binding of TFs to each other, and/or to adjacent DNA sites, can influence binding specificity profiles in important ways (Ansari and Peterson-Kaufman, 2011; Garvie et al., 2001; Grove et al., 2009; Jolma et al., 2015; Mann et al., 2013; Siggers et al., 2012; Slattery et al., 2011). Our study heralds the important next wave of specificity mapping, in which the field will tackle the effects of higher order interactions and begin to relate these to the transcriptional control of key biological processes.

Materials and methods

bZIP cloning, expression, and labeling

Human bZIP proteins containing the basic-region and coiled-coil domains with an N-terminal 6x His tag and a C-terminal intein-chitin binding domain were expressed as described previously (Reinke et al., 2013). Sequences are in Supplementary file 1A. Briefly, Escherichia coli RP3098 cells transformed with bZIP clones were grown in 0.5 L LB cultures at 37°C to OD600 = 0.4–0.8. Expression was induced with the addition of 0.5 mM IPTG (isopropyl β-D-thiogalactopyranoside) and cultures were incubated for 3–4 hr, at which point cells were pelleted. Cell pellets were resuspended in 20 mM HEPES pH 8.0, 500 mM NaCl, 2 mM EDTA (ethylenediaminetetraacetic acid), 1 M guanidine-HCl, 0.2 mM PMSF (phenylmethylsulfonyl fluoride), and 0.1% Triton X-100. Cells were then sonicated and the lysate poured over a column of 1 ml chitin beads to bind the protein (NEB, Ipswich, MA). The column was then washed and equilibrated in EPL buffer (50 mM HEPES pH 8.0, 500 mM NaCl, 200 mM MESNA (2-mercaptoethanesulfonic acid), 1 M guanidine-HCl). The bZIP domain was then cleaved from the intein and labeled with biotin on the C-terminus by incubation for at least 16 hr in 1 ml EPL buffer containing 1 mg/ml cysteine-lysine-biotin peptide (CELTEK Peptides, Nashville, TN). The cleaved and biotin-labeled proteins were then eluted from the column using EPL buffer without MESNA and then diluted fivefold into denaturing buffer (6 M guanidine-HCl, 5 mM imidazole, 0.5 M NaCl, 20 mM TRIS, 1 mM DTT (dithiothreitol), pH 7.9) and bound to a column containing 1 ml Ni-NTA beads (QIAGEN, Hilden, Germany). Columns were washed and proteins eluted with 60% ACN (acetonitrile)/0.1% TFA (trifluoroacetic acid). The labeled proteins were then lyophilized, resuspended, and desalted using spin-columns (Bio-Rad, Hercules, CA). Proteins were stored in 10 mM potassium phosphate pH 4.5 at −80°C. Peptide concentrations were determined by measuring absorbance at 280 nm in 6 M guanidine-HCl/100 mM sodium phosphate pH 7.4.
The fluorescein and TAMRA-labeled proteins used in gel-shift assays were generated as described previously (Reinke et al., 2013).

Cognate site identification by HT-SELEX

Cognate-binding sites for bZIP homo- and heterodimers were determined by SELEX-seq (Jolma et al., 2010; Tietjen et al., 2011; Zhao et al., 2009; Zykovich et al., 2009). A DNA library (Integrated DNA Technologies, Inc.) with a central randomized 20 bp region (10^12 possible sequences) flanked by constant sequences for amplification was used (Supplementary file 1B). In vitro selections were performed as follows. For bZIP homodimers, purified, C-terminal biotinylated-bZIP proteins (50 nM) were added to 100 nM of DNA library (Binding buffer: 1x PBS (10 mM PO4^3-, 137 mM NaCl, 2.7 mM KCl), pH 7.6, 2.5 mM DTT, 50 ng/µl poly dI-dC, 0.1% BSA) and incubated at room temperature for 1 hr. The DNA library concentration and volume (20 µl) were such that there was a high probability of sampling at least one copy of every 20-mer sequence (10^12 permutations). bZIP-DNA complexes were enriched with streptavidin-coated magnetic beads (Dynabeads, Invitrogen, Carlsbad, CA) following the manufacturer’s protocol. After pull-down, three quick washes with 100 µl ice-cold binding buffer were performed to remove unbound DNA. Beads were resuspended in a PCR master mix (EconoTaq PLUS 2X Master Mix, Lucigen) and the DNA was amplified for 15 cycles. Amplified DNA was column purified (QIAGEN), quantified by absorbance at 260 nm, and used for subsequent binding rounds. Three rounds of selection were performed. For bZIP heterodimers, one bZIP partner had a C-terminal biotin tag. bZIP-DNA complexes were pulled down with streptavidin-coated magnetic beads. Several steps were followed to decrease DNA binding by competing homodimers: (1) a 1:3 molar ratio with an excess of the non-biotinylated bZIP was used to shift the thermodynamic equilibrium away from the biotin-labeled homodimer; (2) the biotin-bZIP used for pull-down was chosen as the more weakly interacting homodimer of the two interaction partners.
As a convention, when naming bZIP heterodimers, the bZIP-biotin is listed first, unless otherwise stated. After three rounds of selection, an additional PCR was done to incorporate Illumina sequencing adapters and a unique 6 bp barcode for multiplexing. The starting library (Round 0) was also barcoded. Up to 180 samples were combined and sequenced in a single Illumina GAIIx or HiSeq2000 lane.
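The library-coverage statement above can be checked with a short calculation. The following is a minimal sketch, assuming only the stated 100 nM library concentration, 20 µl reaction volume and 20 bp randomized region; the function names are illustrative and not part of the published pipeline.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def library_molecules(conc_molar, volume_liters):
    """Number of DNA molecules present in one binding reaction."""
    return conc_molar * volume_liters * AVOGADRO

def sequence_space(random_bp):
    """Number of distinct sequences in a fully randomized region."""
    return 4 ** random_bp

# Values taken from the text: 100 nM library in a 20 µl reaction, 20 bp random region
molecules = library_molecules(100e-9, 20e-6)
space = sequence_space(20)
mean_copies = molecules / space  # average copies of each possible 20-mer

print(f"molecules in reaction: {molecules:.3e}")
print(f"possible 20-mers:      {space:.3e}")
print(f"mean copies per 20-mer: {mean_copies:.2f}")
```

Under these inputs the reaction holds on the order of 10^12 molecules, comparable to the number of possible 20-mers, so each sequence is represented roughly once on average in the starting pool.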

Sequencing data analysis

Illumina sequencing yielded ~180 million reads per lane. Reads were de-multiplexed by requiring an exact match to the 6 bp barcode and truncated to include only the 20 bp derived from the random portion of the library. On average, we obtained 850,000 reads per barcode. The occurrence of every k-mer (lengths 8 through 14 bp) was counted using a sliding window of size k. To correct for biases in our starting DNA library, we took the ratio of the counts of every k-mer to the expected number of counts in the starting library. The starting library was modeled using a fifth-order Markov Model derived from the sequencing reads corresponding to the starting library (Round 0) (Slattery et al., 2011). We then calculated a CSI intensity (z-score = (x – µ)/ σ) for each k-mer, using the distribution of k-mer enrichment values for that dimer. The most enriched 10, 12, and 14 bp subsequences were used to derive PWM motifs using MEME. Samples that failed to enrich specific sequences relative to the starting library (Round 0) or that only enriched low-complexity sequences were not included in further analysis. Data files for 20 bp reads and normalized 10 bp sequences are available at https://ansarilab.biochem.wisc.edu/computation.html.
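The enrichment calculation described above can be sketched in a few functions. This is an illustrative reconstruction rather than the authors' code: the pseudocount, the use of the population standard deviation, the restriction to k-mers observed at least once, and all function names are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

def count_kmers(reads, k):
    """Count every k-mer in each read using a sliding window of size k."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def train_markov(reads, order=5):
    """Collect context and transition counts for an order-n Markov model."""
    context, transition = Counter(), Counter()
    for read in reads:
        for i in range(len(read) - order):
            ctx = read[i:i + order]
            context[ctx] += 1
            transition[ctx + read[i + order]] += 1
    return context, transition

def markov_probability(kmer, context, transition, order, pseudo=1.0):
    """Expected probability of a k-mer under the model (pseudocount is an assumption)."""
    total = sum(context.values())
    p = (context[kmer[:order]] + pseudo) / (total + pseudo * 4 ** order)
    for i in range(order, len(kmer)):
        ctx = kmer[i - order:i]
        p *= (transition[ctx + kmer[i]] + pseudo) / (context[ctx] + 4 * pseudo)
    return p

def csi_zscores(selected_reads, round0_reads, k=10, order=5):
    """z-score of observed/expected enrichment for every observed k-mer."""
    observed = count_kmers(selected_reads, k)
    context, transition = train_markov(round0_reads, order)
    total = sum(observed.values())
    enrichment = {
        kmer: n / (total * markov_probability(kmer, context, transition, order))
        for kmer, n in observed.items()
    }
    values = list(enrichment.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("all k-mers have identical enrichment")
    return {kmer: (e - mu) / sigma for kmer, e in enrichment.items()}

# Toy data only; real inputs would be the de-multiplexed 20 bp reads
round0 = ["ACGTACGTAC", "TGCATGCATG", "AACCGGTTAA", "GATTACAGAT"]
selected = ["TGACGTCA"] * 10 + ["ACGTACGT", "GATTACAG"]
z = csi_zscores(selected, round0, k=8, order=2)
print(max(z, key=z.get))
```

In this toy run the over-represented 8-mer receives the highest z-score; with real data the same functions would be applied per barcode with k between 8 and 14 and a fifth-order background model.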

Previously reported bZIP-DNA interaction data were downloaded from study PRJEB3289 in the European Nucleotide Archive (http://www.ebi.ac.uk/ena/data/view/PRJEB3289) (Jolma et al., 2013). 20 bp reads for bZIP proteins and their corresponding 20 bp DNA library (round 0) were analyzed as described previously.

Homodimer and heterodimer clustering

Binding profiles were defined for each bZIP pair using the CSI intensities (z-scores) of 1222 unique 10-mer sequences. This set of 10-mers is composed of the 50 highest-scoring sequences for each dimer. Unsupervised hierarchical clustering of pair-wise binding profile similarities, assessed by Pearson’s correlation coefficient r, was done using R. Dendrograms and heatmaps were generated using the heatmap.2 function in the gplots R-package. Heterodimers were labeled as such if the bZIP-DNA complex was pulled down by a biotinylated bZIP that does not bind DNA as a homodimer under our experimental conditions. If the bZIP used for pull-down of the bZIP heterodimer also bound DNA as a homodimer, the observed DNA specificity was assigned to the heterodimer only if the heterodimer specificity landscape was different (t-test p<0.05) from the homodimer specificity, assessed by correlation scores (Figure 1—figure supplement 4).
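The construction of binding profiles and their pairwise similarities can be illustrated as follows. This is a sketch in Python rather than the R workflow used in the study; the function names, the toy profiles, and the choice to score a k-mer missing from a profile as 0.0 are assumptions.

```python
from math import sqrt

def top_kmers(profile, n=50):
    """The n highest-scoring k-mers of one dimer's CSI profile."""
    return sorted(profile, key=profile.get, reverse=True)[:n]

def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def profile_correlations(profiles, n=50):
    """Pairwise r over the union of every dimer's top-n k-mers."""
    union = sorted({k for p in profiles.values() for k in top_kmers(p, n)})
    # Assumption: a k-mer absent from a profile is scored as 0.0
    vectors = {name: [p.get(k, 0.0) for k in union] for name, p in profiles.items()}
    names = sorted(vectors)
    return {(a, b): pearson(vectors[a], vectors[b]) for a in names for b in names}

# Toy profiles with 4-mers standing in for 10-mers
profiles = {
    "A": {"AAAA": 3.0, "CCCC": 1.0, "GGGG": 0.0},
    "B": {"AAAA": 6.0, "CCCC": 2.0, "GGGG": 0.0},
    "C": {"AAAA": 0.0, "CCCC": 1.0, "GGGG": 3.0},
}
r = profile_correlations(profiles, n=2)
print(r[("A", "B")], r[("A", "C")])
```

The resulting correlation matrix is the kind of input that a hierarchical clustering routine would take, for example after conversion to a distance such as 1 − r; that conversion is one common choice and is not stated in the text.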

Sequence logos

PWMs were derived from the 1000 most enriched 12-mer sequences (ranked by z-score) for each bZIP pair, using MEME (Bailey and Elkan, 1994). The most enriched 14-mer sequences were used for MAF dimers. MEME was run with the following parameters: -dna -mod anr -nmotifs 10 -minw 8 -maxw 18 -time 7200 -maxsize 60000 -revcomp.

Specificity and energy landscapes

SELs display high-throughput protein-DNA (or protein-RNA) binding data for both array and sequencing methods (Campbell et al., 2012; Carlson et al., 2010; Tietjen et al., 2011). The organization of data in SEL is detailed in Figure 2A. The SELs shown in this work were generated from 10-, 12-, or 14-mer intensity files. Seed motifs were derived from PWM-derived DNA logos or from the highest intensity k-mer, and are shown on top of each SEL. The length of the seed motifs has to be smaller than the k-mer length of the CSI intensity file. The software to generate SELs is provided as Source Code Files (SEL_10MER and SEL12MER_14MER).

Electrophoretic mobility shift assay–FRET

An electrophoretic mobility shift assay (EMSA) with fluorescence resonance energy transfer (FRET) readout was used to validate bZIP heterodimer binding to DNA. The assay relies on the ability to observe FRET between two fluorophores, TAMRA and fluorescein, as well as to detect each fluorophore in the absence of FRET (Figure 3A). The assay also measures DNA fluorescence to ensure that protein-DNA complexes are being examined. Two versions of each bZIP were made, one conjugated to TAMRA and the other to fluorescein. We observed that the fluorophores reproducibly retard (TAMRA) or increase (fluorescein) the mobility of the bZIP protein that they are attached to and thus assist in resolving each heterodimer with respect to the two homodimers formed by contributing partners. The sequences of all the DNA sites used are listed in Supplementary file 1D. Each site was flanked by six constant nucleotides on each side (GAGTCC-site-CCGTAG). Oligos modified on the 5′ end with the dye TYE 665 (IDT, Coralville, IA) were annealed with an unlabeled reverse-complement oligo. Binding reactions contained 50 nM each of the fluorescein- and TAMRA-labeled proteins and 10 nM annealed dye-labeled DNA in 20 µl of binding buffer (50 mM potassium phosphate pH 7.4, 150 mM KCl, 0.1% BSA, 0.1% Tween-20, 5 ng/µl poly (dI-dC), 0.5 mM TCEP). Samples were mixed, incubated at 37°C for 30 min, and then at 21°C for 30 min. NOVEX 6% DNA retardation gels were loaded with 16 µl of each sample (Life Technologies, Carlsbad, CA) and run at 300 V for 20–22 min at 22–25°C. Gels were then imaged using a Typhoon 9500 scanner (GE Healthcare Bio-Sciences Corp., Piscataway, NJ) with separate channels for fluorescein, TAMRA, TYE 665, and FRET. Bleed-through between channels was corrected using the spectral-unmixing plugin in ImageJ (http://rsb.info.nih.gov/ij/).
The amount of DNA bound for the homodimers was calculated by quantifying the DNA signal that corresponded to all three bound species (fluorescein homodimer, TAMRA homodimer, and mixed-dye homodimer). For the heterodimers, the amount of DNA bound was calculated by quantifying the DNA signal that corresponded to the mixed dye heterodimer. The amount of bound DNA was divided by the amount of unbound DNA run without protein added. For each heterodimer, the interaction was measured twice, with the fluorescein and TAMRA dye on different proteins, and the average of the two measurements is reported.

ChIP-seq data

ChIP-seq peaks from the ENCODE project used in this work were downloaded from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/ (Dunham et al., 2012). Overlapping genomic regions of ChIP-seq peaks were determined and extracted using bedops (Neph et al., 2012). For ATF3 ChIP-seq in GBM1 cells, aligned reads (.bam file) were downloaded from GEO (GSE33912). ATF3 peaks were called using the MACS tool (Zhang et al., 2008) in the Galaxy (Goecks et al., 2010) platform using default parameters. Overlapping ATF3-bound regions between different cell lines (Figure 5) were determined using the ChIPpeakAnno R-package (Zhu et al., 2010).

CSI genomescapes: scoring in vivo bound sites with in vitro data

A CSI Genomescape is a plot generated by assigning in vitro CSI intensities (z-scores) to genomic regions. To generate the CSI Genomescapes in Figures 6 and 7, a 10 bp sliding window was used to score reported ChIP-seq peaks using quantile-normalized CSI intensities for different bZIP dimers as follows: Given a bZIP pair and a ChIP-seq peak, the peak was assigned the maximum CSI intensity for any 10-mer within the reported peak.
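The peak-scoring rule can be written as a short function. This is a minimal sketch: the function name and toy intensity table are hypothetical, and it assumes the CSI table already holds the (quantile-normalized) intensity for whichever strand orientation of each 10-mer is looked up.

```python
def score_region(sequence, csi, k=10):
    """Maximum CSI intensity over all k-mers in a region (None if none is scored)."""
    best = None
    for i in range(len(sequence) - k + 1):
        score = csi.get(sequence[i:i + k].upper())
        if score is not None and (best is None or score > best):
            best = score
    return best

# Toy intensity table and peak sequence for illustration only
csi = {"AAAAAAAAAA": 0.5, "TGACGTCATC": 8.2}
peak = "GGAAAAAAAAAATGACGTCATCGG"
print(score_region(peak, csi))
```

Taking the maximum means a single high-scoring 10-mer anywhere in the peak determines the score, so peak length affects only the number of windows examined.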

Receiver operating characteristic

CSI Genomescapes of ChIP-seq data sets were then used to generate Receiver Operating Characteristic (ROC) curves to reflect how well the in vitro binding data for different bZIPs explains the ChIP-seq data. In this analysis, ChIP-seq peaks were used as a true positive set, whereas two regions of equal length ±5 kbp from the center of each peak (that did not overlap another ChIP-seq peak) were chosen to make the true negative set. The fraction of regions in the positive vs. negative sets with scores above a varying CSI intensity cutoff were plotted to generate ROC curves (True Positive Rate vs. False Positive Rate). ATF3-bound regions (ChIP-seq peaks) were scored with the CSI intensities for ATF3 homodimer or for ATF3-containing heterodimers to generate the areas under the curves. Heatmaps and clustergrams in Figure 5 were made by hierarchical clustering of the lowest FPR-cutoff values at which peaks were detected as positives using the CSI intensities of the ATF3 containing dimers. ROC curves and heatmaps were generated in MATLAB.
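The ROC construction can be sketched as below. The study generated these curves in MATLAB; this Python version is an illustrative equivalent in which the function names, the toy scores, and the use of "greater than or equal to" at each cutoff are assumptions.

```python
def roc_points(positive_scores, negative_scores):
    """(FPR, TPR) pairs obtained by lowering the CSI intensity cutoff step by step."""
    cutoffs = sorted(set(positive_scores) | set(negative_scores), reverse=True)
    points = [(0.0, 0.0)]
    for cutoff in cutoffs:
        tpr = sum(s >= cutoff for s in positive_scores) / len(positive_scores)
        fpr = sum(s >= cutoff for s in negative_scores) / len(negative_scores)
        points.append((fpr, tpr))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy scores: ChIP-seq peaks (positives) vs. flanking control regions (negatives)
peaks = [3.0, 1.0]
flanks = [2.0, 0.0]
auc = area_under_curve(roc_points(peaks, flanks))
print(auc)
```

An AUC of 0.5 corresponds to in vitro scores that do not separate peaks from flanking regions, and values approaching 1.0 indicate that the dimer's CSI profile explains the bound regions well.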

De novo motifs and functional annotation of ChIP-seq peaks

Motif finding within ChIP-seq peaks was done with MEME-ChIP with default settings (Machanick and Bailey, 2011). Enrichment of functional annotations of genomic regions was done with Genomic Regions Enrichment of Annotations Tool (GREAT) with default settings (McLean et al., 2010). Gene Ontology annotations that are significantly enriched (FDR < 0.05) by both binomial and hypergeometric test are shown. The False Discovery Rate (q-value) is corrected for multiple hypothesis tests.

Single-nucleotide polymorphism scoring

SNPs linked to diseases or quantitative traits by GWAS were obtained from Supplemental Table S2 of Maurano et al., which reports human SNPs associated with diseases and quantitative traits (Maurano et al., 2012). For each SNP, we considered a 21 bp region centered on the SNP (10 bp on each side) and assigned a score using the CSI intensity data for all 10-mers. We scored both alleles using a 10 bp sliding window and assigned the highest CSI intensity (z-score) in the 21 bp fragment; each 21 bp region was scored with twelve 10 bp windows. We calculated a predicted fold-difference in CSI intensity between a given SNP and its reference allele (hg19) using the following formula:

$$\text{Predicted fold-difference} = \frac{\text{CSI Intensity for alternate allele} - \text{Minimum CSI Intensity} + A}{\text{CSI Intensity for reference allele (hg19)} - \text{Minimum CSI Intensity} + A}$$

where A = (Maximum CSI Intensity – Minimum CSI Intensity) * F, Minimum CSI Intensity = minimum CSI Intensity (z-score) among the scored SNPs, Maximum CSI Intensity = maximum CSI Intensity (z-score) among the scored SNPs, and F is a noise factor that was varied from 1% to 90%, from lower to higher stringency in estimating the predicted difference in CSI intensity. We added the noise factor (F) to the formula to make the fold-change prediction less sensitive to low CSI scores and to decrease the number of false-positive predictions.
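The SNP scoring procedure can be illustrated with a small example. This sketch reads the formula as (alternate − minimum + A) / (reference − minimum + A), with A defined as above; the function names and toy CSI values are hypothetical.

```python
def windows(sequence, k=10):
    """All k-mers from a sliding window over the sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def allele_score(sequence, csi, k=10):
    """Highest CSI intensity among the k-mers of a 21 bp allele fragment."""
    return max(csi[w] for w in windows(sequence, k))

def fold_difference(alt_score, ref_score, min_csi, max_csi, noise=0.1):
    """Predicted fold-difference with offset A = (max - min) * F."""
    a = (max_csi - min_csi) * noise
    return (alt_score - min_csi + a) / (ref_score - min_csi + a)

# Toy 21 bp fragments differing only at the central position (index 10)
ref = "A" * 10 + "C" + "A" * 10
alt = "A" * 10 + "G" + "A" * 10
csi = {w: 1.0 for w in windows(ref) + windows(alt)}
csi["AAAAAAAAAG"] = 9.0  # hypothetical high-intensity 10-mer created by the alternate allele

ref_score = allele_score(ref, csi)
alt_score = allele_score(alt, csi)
fold = fold_difference(alt_score, ref_score, min_csi=1.0, max_csi=9.0, noise=0.5)
print(len(windows(ref)), ref_score, alt_score, fold)
```

Because A is added to both numerator and denominator, a larger noise factor pulls the ratio toward 1 when the reference score sits near the minimum, which is how low-intensity sites are kept from producing large spurious fold-changes.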

Acknowledgements

We thank Professor Parmesh Ramanathan and members of the Ansari and Keating laboratories for helpful discussions, Christos Kougentakis for technical assistance with EMSA assays, Laura Vanderploeg for help with the artwork, and Marie Adams from the University of Wisconsin Biotechnology Center DNA Sequencing Facility. This study was supported by NIH award R01 GM096466 to AEK, and NIH grants R01 CA133508 and U01 HL099773, and the W M Keck Medical Research Award to AZA. JARM was supported by the National Human Genome Research Institute (NHGRI) training grant of the Genome Sciences Training program (T32 HG002760).

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Funding Information

This paper was supported by the following grants:

  • National Institutes of Health R01GM096466 to Amy E Keating.

  • W. M. Keck Foundation to Aseem Z Ansari.

  • National Institutes of Health U01HL099773 to Aseem Z Ansari.

  • National Institutes of Health R01CA133508 to Aseem Z Ansari.

  • National Institutes of Health T32HG002760 to Jose A Rodriguez-Martinez.

Additional information

Competing interests

AZA: The sole member of VistaMotif, LLC and founder of the nonprofit WINStep Forward.

The other authors declare that no competing interests exist.

Author contributions

JAR-M, Conceptualization, Formal analysis, Investigation, Methodology, Writing—original draft, Writing—review and editing.

AWR, Conceptualization, Formal analysis, Investigation, Methodology, Writing—original draft, Writing—review and editing.

DB, Conceptualization, Software, Formal analysis, Investigation, Methodology, Writing—original draft, Writing—review and editing.

AEK, Conceptualization, Funding acquisition, Methodology, Writing—original draft, Writing—review and editing.

AZA, Conceptualization, Funding acquisition, Methodology, Writing—original draft, Writing—review and editing.

Additional files

Supplementary file 1.

(A) bZIP sequences. (B) DNA library and primers. (C) DNA stabilized bZIP dimers. (D) Oligonucleotide sequences for EMSA. (E) ROC-AUC. (F) Expression of bZIP genes. (G) ATF3 dimers in ChIP-seq peaks. (H) GREAT GO annotations. (I) SNP fold-change predictions.

DOI: http://dx.doi.org/10.7554/eLife.19272.022

elife-19272-supp1.xlsx (125.8KB, xlsx)
Supplementary file 2. MEME motifs and Sequence Specificity and Energy Landscapes (SEL) for human bZIP homodimers and heterodimers.

DOI: http://dx.doi.org/10.7554/eLife.19272.023

elife-19272-supp2.pdf (1.8MB, pdf)

Major datasets

The following dataset was generated:

José A Rodríguez-Martínez,Aaron W Reinke,Devesh Bhimsaria,Amy E Keating,Aseem Z Ansari,2017,Data from: Combinatorial dimerization of human bZIP factors confers preferences for different classes of DNA binding sites,https://ansarilab.biochem.wisc.edu/computation.html,Publicly available on the Ansari Lab (https://ansarilab.biochem.wisc.edu/)

The following previously published datasets were used:

Gargiulo G,Cesaroni M,Serresi M,Lancini C,De Vries N,Hulsman D,van Lohuizen M,2013,Functional Identification of Critical Bmi1 target genes in Neural Progenitor and Malignant Glioma cells,http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE33912,Publicly available at the NCBI Gene Expression Omnibus (accession no: GSE33912)

Jolma A,Yan J,Whitington T,Toivonen J,Nitta KR,Rastas P,Morgunova E,Enge M,Taipale M,Wei G,Palin K,Vaquerizas JM,Vincentelli R,Luscombe NM,Hughes TR,Lemaire P,Ukkonen E,Kivioja T,Taipale J,2017,DNA-binding specificities of human transcription factors,http://www.ebi.ac.uk/ena/data/view/PRJEB3289,Publicly available at the EMBL European Nucleotide Archive (accession no: PRJEB3289)

References

  1. Ansari AZ, Peterson-Kaufman KJ. A partner evokes latent differences between hox proteins. Cell. 2011;147:1220–1221. doi: 10.1016/j.cell.2011.11.046. [DOI] [PubMed] [Google Scholar]
  2. Badis G, Chan ET, van Bakel H, Pena-Castillo L, Tillo D, Tsui K, Carlson CD, Gossett AJ, Hasinoff MJ, Warren CL, Gebbia M, Talukder S, Yang A, Mnaimneh S, Terterov D, Coburn D, Li Yeo A, Yeo ZX, Clarke ND, Lieb JD, Ansari AZ, Nislow C, Hughes TR. A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Molecular Cell. 2008;32:878–887. doi: 10.1016/j.molcel.2008.11.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X, Kuznetsov H, Wang CF, Coburn D, Newburger DE, Morris Q, Hughes TR, Bulyk ML. Diversity and complexity in DNA recognition by transcription factors. Science. 2009;324:1720–1723. doi: 10.1126/science.1162327. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings. International Conference on Intelligent Systems for Molecular Biology. 1994;2:28–36. [PubMed] [Google Scholar]
  5. Bell RJ, Rube HT, Kreig A, Mancini A, Fouse SD, Nagarajan RP, Choi S, Hong C, He D, Pekmezci M, Wiencke JK, Wrensch MR, Chang SM, Walsh KM, Myong S, Song JS, Costello JF. Cancer. the transcription factor GABP selectively binds and activates the mutant TERT promoter in Cancer. Science. 2015;348:1036–1039. doi: 10.1126/science.aab0015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Berger MF, Philippakis AA, Qureshi AM, He FS, Estep PW, Bulyk ML. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nature Biotechnology. 2006;24:1429–1435. doi: 10.1038/nbt1246. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Berger MF, Badis G, Gehrke AR, Talukder S, Philippakis AA, Peña-Castillo L, Alleyne TM, Mnaimneh S, Botvinnik OB, Chan ET, Khalid F, Zhang W, Newburger D, Jaeger SA, Morris QD, Bulyk ML, Hughes TR. Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell. 2008;133:1266–1276. doi: 10.1016/j.cell.2008.05.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Biddie SC, John S, Sabo PJ, Thurman RE, Johnson TA, Schiltz RL, Miranda TB, Sung MH, Trump S, Lightman SL, Vinson C, Stamatoyannopoulos JA, Hager GL. Transcription factor AP1 potentiates chromatin accessibility and glucocorticoid receptor binding. Molecular Cell. 2011;43:145–155. doi: 10.1016/j.molcel.2011.06.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Bohmann D, Bos TJ, Admon A, Nishimura T, Vogt PK, Tjian R. Human proto-oncogene c-jun encodes a DNA binding protein with structural and functional properties of transcription factor AP-1. Science. 1987;238:1386–1392. doi: 10.1126/science.2825349. [DOI] [PubMed] [Google Scholar]
  10. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, Karczewski KJ, Park J, Hitz BC, Weng S, Cherry JM, Snyder M. Annotation of functional variation in personal genomes using RegulomeDB. Genome Research. 2012;22:1790–1797. doi: 10.1101/gr.137323.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Campbell ZT, Bhimsaria D, Valley CT, Rodriguez-Martinez JA, Menichelli E, Williamson JR, Ansari AZ, Wickens M. Cooperativity in RNA-protein interactions: global analysis of RNA binding specificity. Cell Reports. 2012;1:570–581. doi: 10.1016/j.celrep.2012.04.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Carlson CD, Warren CL, Hauschild KE, Ozers MS, Qadir N, Bhimsaria D, Lee Y, Cerrina F, Ansari AZ. Specificity landscapes of DNA binding molecules elucidate biological function. PNAS. 2010;107:4544–4549. doi: 10.1073/pnas.0914023107.
  13. Chu HM, Tan Y, Kobierski LA, Balsam LB, Comb MJ. Activating transcription factor-3 stimulates 3',5'-cyclic adenosine monophosphate-dependent gene expression. Molecular Endocrinology. 1994;8:59–68. doi: 10.1210/mend.8.1.8152431.
  14. Ciofani M, Madar A, Galan C, Sellars M, Mace K, Pauli F, Agarwal A, Huang W, Parkhurst CN, Muratet M, Newberry KM, Meadows S, Greenfield A, Yang Y, Jain P, Kirigin FK, Birchmeier C, Wagner EF, Murphy KM, Myers RM, Bonneau R, Littman DR. A validated regulatory network for Th17 cell specification. Cell. 2012;151:289–303. doi: 10.1016/j.cell.2012.09.016.
  15. Cohen DM, Won KJ, Nguyen N, Lazar MA, Chen CS, Steger DJ. ATF4 licenses C/EBPβ activity in human mesenchymal stem cells primed for adipogenesis. eLife. 2015;4:e06821. doi: 10.7554/eLife.06821.
  16. Collins C, Wang J, Miao H, Bronstein J, Nawer H, Xu T, Figueroa M, Muntean AG, Hess JL. C/EBPα is an essential collaborator in Hoxa9/Meis1-mediated leukemogenesis. PNAS. 2014;111:9899–9904. doi: 10.1073/pnas.1402238111.
  17. Costa RH, Kalinichenko VV, Holterman AX, Wang X. Transcription factors in liver development, differentiation, and regeneration. Hepatology. 2003;38:1331–1347. doi: 10.1016/j.hep.2003.09.034.
  18. Deng T, Karin M. JunB differs from c-Jun in its DNA-binding and dimerization domains, and represses c-Jun by formation of inactive heterodimers. Genes & Development. 1993;7:479–490. doi: 10.1101/gad.7.3.479.
  19. Deppmann CD, Alvania RS, Taparowsky EJ. Cross-species annotation of basic leucine zipper factor interactions: insight into the evolution of closed interaction networks. Molecular Biology and Evolution. 2006;23:1480–1492. doi: 10.1093/molbev/msl022.
  20. Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis C, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
  21. ENCODE. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biology. 2011;9:e1001046. doi: 10.1371/journal.pbio.1001046.
  22. Filén S, Ylikoski E, Tripathi S, West A, Björkman M, Nyström J, Ahlfors H, Coffey E, Rao KV, Rasool O, Lahesmaa R. Activating transcription factor 3 is a positive regulator of human IFNG gene expression. The Journal of Immunology. 2010;184:4990–4999. doi: 10.4049/jimmunol.0903106.
  23. Fordyce PM, Gerber D, Tran D, Zheng J, Li H, DeRisi JL, Quake SR. De novo identification and biophysical characterization of transcription-factor binding sites with microfluidic affinity analysis. Nature Biotechnology. 2010;28:970–975. doi: 10.1038/nbt.1675.
  24. Franco-Zorrilla JM, López-Vidriero I, Carrasco JL, Godoy M, Vera P, Solano R. DNA-binding specificities of plant transcription factors and their potential to define target genes. PNAS. 2014;111:2367–2372. doi: 10.1073/pnas.1316278111.
  25. Franke A, McGovern DP, Barrett JC, Wang K, Radford-Smith GL, Ahmad T, Lees CW, Balschun T, Lee J, Roberts R, Anderson CA, Bis JC, Bumpstead S, Ellinghaus D, Festen EM, Georges M, Green T, Haritunians T, Jostins L, Latiano A, Mathew CG, Montgomery GW, Prescott NJ, Raychaudhuri S, Rotter JI, Schumm P, Sharma Y, Simms LA, Taylor KD, Whiteman D, Wijmenga C, Baldassano RN, Barclay M, Bayless TM, Brand S, Büning C, Cohen A, Colombel JF, Cottone M, Stronati L, Denson T, De Vos M, D'Inca R, Dubinsky M, Edwards C, Florin T, Franchimont D, Gearry R, Glas J, Van Gossum A, Guthery SL, Halfvarson J, Verspaget HW, Hugot JP, Karban A, Laukens D, Lawrance I, Lemann M, Levine A, Libioulle C, Louis E, Mowat C, Newman W, Panés J, Phillips A, Proctor DD, Regueiro M, Russell R, Rutgeerts P, Sanderson J, Sans M, Seibold F, Steinhart AH, Stokkers PC, Torkvist L, Kullak-Ublick G, Wilson D, Walters T, Targan SR, Brant SR, Rioux JD, D'Amato M, Weersma RK, Kugathasan S, Griffiths AM, Mansfield JC, Vermeire S, Duerr RH, Silverberg MS, Satsangi J, Schreiber S, Cho JH, Annese V, Hakonarson H, Daly MJ, Parkes M. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci. Nature Genetics. 2010;42:1118–1125. doi: 10.1038/ng.717.
  26. Franza BR, Rauscher FJ, Josephs SF, Curran T. The Fos complex and Fos-related antigens recognize sequence elements that contain AP-1 binding sites. Science. 1988;239:1150–1153. doi: 10.1126/science.2964084.
  27. Garber M, Yosef N, Goren A, Raychowdhury R, Thielke A, Guttman M, Robinson J, Minie B, Chevrier N, Itzhaki Z, Blecher-Gonen R, Bornstein C, Amann-Zalcenstein D, Weiner A, Friedrich D, Meldrim J, Ram O, Cheng C, Gnirke A, Fisher S, Friedman N, Wong B, Bernstein BE, Nusbaum C, Hacohen N, Regev A, Amit I. A high-throughput chromatin immunoprecipitation approach reveals principles of dynamic gene regulation in mammals. Molecular Cell. 2012;47:810–822. doi: 10.1016/j.molcel.2012.07.030.
  28. Gargiulo G, Cesaroni M, Serresi M, de Vries N, Hulsman D, Bruggeman SW, Lancini C, van Lohuizen M. In vivo RNAi screen for BMI1 targets identifies TGF-β/BMP-ER stress pathways as key regulators of neural- and malignant glioma-stem cell homeostasis. Cancer Cell. 2013;23:660–676. doi: 10.1016/j.ccr.2013.03.030.
  29. Garvie CW, Hagman J, Wolberger C. Structural studies of Ets-1/Pax5 complex formation on DNA. Molecular Cell. 2001;8:1267–1276. doi: 10.1016/S1097-2765(01)00410-5.
  30. Gilchrist M, Thorsson V, Li B, Rust AG, Korb M, Roach JC, Kennedy K, Hai T, Bolouri H, Aderem A. Systems biology approaches identify ATF3 as a negative regulator of Toll-like receptor 4. Nature. 2006;441:173–178. doi: 10.1038/nature04768.
  31. Goecks J, Nekrutenko A, Taylor J, Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology. 2010;11:R86. doi: 10.1186/gb-2010-11-8-r86.
  32. Grove CA, De Masi F, Barrasa MI, Newburger DE, Alkema MJ, Bulyk ML, Walhout AJ. A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell. 2009;138:314–327. doi: 10.1016/j.cell.2009.04.058.
  33. Hai T, Curran T. Cross-family dimerization of transcription factors Fos/Jun and ATF/CREB alters DNA binding specificity. PNAS. 1991;88:3720–3724. doi: 10.1073/pnas.88.9.3720.
  34. Hai T, Wolfgang CD, Marsee DK, Allen AE, Sivaprasad U. ATF3 and stress responses. Gene Expression. 1999;7:321–335.
  35. Hai T, Wolford CC, Chang YS. ATF3, a hub of the cellular adaptive-response network, in the pathogenesis of diseases: is modulation of inflammation a unifying component? Gene Expression. 2010;15:1–11. doi: 10.3727/105221610X12819686555015.
  36. Han MR, Schellenberg GD, Wang LS, Alzheimer's Disease Neuroimaging Initiative. Genome-wide association reveals genetic effects on human Aβ42 and τ protein levels in cerebrospinal fluids: a case control study. BMC Neurology. 2010;10:14. doi: 10.1186/1471-2377-10-90.
  37. Hauschild KE, Stover JS, Boger DL, Ansari AZ. CSI-FID: high throughput label-free detection of DNA binding molecules. Bioorganic & Medicinal Chemistry Letters. 2009;19:3779–3782. doi: 10.1016/j.bmcl.2009.04.097.
  38. Herdegen T, Waetzig V. AP-1 proteins in the adult brain: facts and fiction about effectors of neuroprotection and neurodegeneration. Oncogene. 2001;20:2424–2437. doi: 10.1038/sj.onc.1204387.
  39. Hsu JC, Bravo R, Taub R. Interactions among LRF-1, JunB, c-Jun, and c-Fos define a regulatory program in the G1 phase of liver regeneration. Molecular and Cellular Biology. 1992;12:4654–4665. doi: 10.1128/MCB.12.10.4654.
  40. Huang J, Perlis RH, Lee PH, Rush AJ, Fava M, Sachs GS, Lieberman J, Hamilton SP, Sullivan P, Sklar P, Purcell S, Smoller JW. Cross-disorder genomewide analysis of schizophrenia, bipolar disorder, and depression. American Journal of Psychiatry. 2010;167:1254–1263. doi: 10.1176/appi.ajp.2010.09091335.
  41. Jain J, McCaffrey PG, Valge-Archer VE, Rao A. Nuclear factor of activated T cells contains Fos and Jun. Nature. 1992;356:801–804. doi: 10.1038/356801a0.
  42. Jolma A, Kivioja T, Toivonen J, Cheng L, Wei G, Enge M, Taipale M, Vaquerizas JM, Yan J, Sillanpää MJ, Bonke M, Palin K, Talukder S, Hughes TR, Luscombe NM, Ukkonen E, Taipale J. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Research. 2010;20:861–873. doi: 10.1101/gr.100552.109.
  43. Jolma A, Yan J, Whitington T, Toivonen J, Nitta KR, Rastas P, Morgunova E, Enge M, Taipale M, Wei G, Palin K, Vaquerizas JM, Vincentelli R, Luscombe NM, Hughes TR, Lemaire P, Ukkonen E, Kivioja T, Taipale J. DNA-binding specificities of human transcription factors. Cell. 2013;152:327–339. doi: 10.1016/j.cell.2012.12.009.
  44. Jolma A, Yin Y, Nitta KR, Dave K, Popov A, Taipale M, Enge M, Kivioja T, Morgunova E, Taipale J. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature. 2015;527:384–388. doi: 10.1038/nature15518.
  45. Jung KA, Kwak MK. The Nrf2 system as a potential target for the development of indirect antioxidants. Molecules. 2010;15:7266–7291. doi: 10.3390/molecules15107266.
  46. Kamesh N, Lambert SA, Yang AWH, Riddell J, Mnaimneh S, Zheng H, Albu M, Najafabadi HS, Reece-Hoyes JS, Bass JIF, Hughes TR, Weirauch MT, Walhout AJM. Mapping and analysis of Caenorhabditis elegans transcription factor sequence specificities. eLife. 2015;4:e06967. doi: 10.7554/eLife.06967.
  47. Kim J, Struhl K. Determinants of half-site spacing preferences that distinguish AP-1 and ATF/CREB bZIP domains. Nucleic Acids Research. 1995;23:2531–2537. doi: 10.1093/nar/23.13.2531.
  48. Kittler R, Zhou J, Hua S, Ma L, Liu Y, Pendleton E, Cheng C, Gerstein M, White KP. A comprehensive nuclear receptor network for breast cancer cells. Cell Reports. 2013;3:538–551. doi: 10.1016/j.celrep.2013.01.004.
  49. König P, Richmond TJ. The X-ray structure of the GCN4-bZIP bound to ATF/CREB site DNA shows the complex depends on DNA flexibility. Journal of Molecular Biology. 1993;233:139–154. doi: 10.1006/jmbi.1993.1490.
  50. Lamb P, McKnight SL. Diversity and specificity in transcriptional regulation: the benefits of heterotypic dimerization. Trends in Biochemical Sciences. 1991;16:417–422. doi: 10.1016/0968-0004(91)90167-T.
  51. Lee TI, Young RA. Transcriptional regulation and its misregulation in disease. Cell. 2013;152:1237–1251. doi: 10.1016/j.cell.2013.02.014.
  52. Lopez-Bergami P, Lau E, Ronai Z. Emerging roles of ATF2 and the dynamic AP1 network in cancer. Nature Reviews Cancer. 2010;10:65–76. doi: 10.1038/nrc2681.
  53. Lou H, Yeager M, Li H, Bosquet JG, Hayes RB, Orr N, Yu K, Hutchinson A, Jacobs KB, Kraft P, Wacholder S, Chatterjee N, Feigelson HS, Thun MJ, Diver WR, Albanes D, Virtamo J, Weinstein S, Ma J, Gaziano JM, Stampfer M, Schumacher FR, Giovannucci E, Cancel-Tassin G, Cussenot O, Valeri A, Andriole GL, Crawford ED, Anderson SK, Tucker M, Hoover RN, Fraumeni JF, Thomas G, Hunter DJ, Dean M, Chanock SJ. Fine mapping and functional analysis of a common variant in MSMB on chromosome 10q11.2 associated with prostate cancer susceptibility. PNAS. 2009;106:7933–7938. doi: 10.1073/pnas.0902104106.
  54. Machanick P, Bailey TL. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics. 2011;27:1696–1697. doi: 10.1093/bioinformatics/btr189.
  55. Male V, Nisoli I, Gascoyne DM, Brady HJ. E4BP4: an unexpected player in the immune response. Trends in Immunology. 2012;33:98–102. doi: 10.1016/j.it.2011.10.002.
  56. Mann IK, Chatterjee R, Zhao J, He X, Weirauch MT, Hughes TR, Vinson C. CG methylated microarrays identify a novel methylated sequence bound by the CEBPB|ATF4 heterodimer that is active in vivo. Genome Research. 2013;23:988–997. doi: 10.1101/gr.146654.112.
  57. Mansour MR, Abraham BJ, Anders L, Berezovskaya A, Gutierrez A, Durbin AD, Etchin J, Lawton L, Sallan SE, Silverman LB, Loh ML, Hunger SP, Sanda T, Young RA, Look AT. Oncogene regulation. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science. 2014;346:1373–1377. doi: 10.1126/science.1259037.
  58. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, Shafer A, Neri F, Lee K, Kutyavin T, Stehling-Sun S, Johnson AK, Canfield TK, Giste E, Diegel M, Bates D, Hansen RS, Neph S, Sabo PJ, Heimfeld S, Raubitschek A, Ziegler S, Cotsapas C, Sotoodehnia N, Glass I, Sunyaev SR, Kaul R, Stamatoyannopoulos JA. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–1195. doi: 10.1126/science.1222794.
  59. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, Bejerano G. GREAT improves functional interpretation of cis-regulatory regions. Nature Biotechnology. 2010;28:495–501. doi: 10.1038/nbt.1630.
  60. Miller M. The importance of being flexible: the case of basic region leucine zipper transcriptional regulators. Current Protein & Peptide Science. 2009;10:244–269. doi: 10.2174/138920309788452164.
  61. Murphy TL, Tussiwand R, Murphy KM. Specificity through cooperation: BATF-IRF interactions control immune-regulatory networks. Nature Reviews Immunology. 2013;13:499–509. doi: 10.1038/nri3470.
  62. Musunuru K, Strong A, Frank-Kamenetsky M, Lee NE, Ahfeldt T, Sachs KV, Li X, Li H, Kuperwasser N, Ruda VM, Pirruccello JP, Muchmore B, Prokunina-Olsson L, Hall JL, Schadt EE, Morales CR, Lund-Katz S, Phillips MC, Wong J, Cantley W, Racie T, Ejebe KG, Orho-Melander M, Melander O, Koteliansky V, Fitzgerald K, Krauss RM, Cowan CA, Kathiresan S, Rader DJ. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature. 2010;466:714–719. doi: 10.1038/nature09266.
  63. Neph S, Kuehn MS, Reynolds AP, Haugen E, Thurman RE, Johnson AK, Rynes E, Maurano MT, Vierstra J, Thomas S, Sandstrom R, Humbert R, Stamatoyannopoulos JA. BEDOPS: high-performance genomic feature operations. Bioinformatics. 2012;28:1919–1920. doi: 10.1093/bioinformatics/bts277.
  64. Nitta KR, Jolma A, Yin Y, Morgunova E, Kivioja T, Akhtar J, Hens K, Toivonen J, Deplancke B, Furlong EE, Taipale J. Conservation of transcription factor binding specificities across 600 million years of Bilateria evolution. eLife. 2015;4:e04837. doi: 10.7554/eLife.04837.
  65. Noyes MB, Christensen RG, Wakabayashi A, Stormo GD, Brodsky MH, Wolfe SA. Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell. 2008;133:1277–1289. doi: 10.1016/j.cell.2008.05.023.
  66. Ptashne M, Gann A. Genes and Signals. Cold Spring Harbor Laboratory Press; 2002.
  67. Puckett JW, Muzikar KA, Tietjen J, Warren CL, Ansari AZ, Dervan PB. Quantitative microarray profiling of DNA-binding molecules. Journal of the American Chemical Society. 2007;129:12310–12319. doi: 10.1021/ja0744899.
  68. Reinke AW, Baek J, Ashenberg O, Keating AE. Networks of bZIP protein-protein interactions diversified over a billion years of evolution. Science. 2013;340:730–734. doi: 10.1126/science.1233465.
  69. Siggers T, Chang AB, Teixeira A, Wong D, Williams KJ, Ahmed B, Ragoussis J, Udalova IA, Smale ST, Bulyk ML. Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-κB family DNA binding. Nature Immunology. 2012;13:95–102. doi: 10.1038/ni.2151.
  70. Slattery M, Riley T, Liu P, Abe N, Gomez-Alcala P, Dror I, Zhou T, Rohs R, Honig B, Bussemaker HJ, Mann RS. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell. 2011;147:1270–1282. doi: 10.1016/j.cell.2011.10.053.
  71. Stormo GD, Zhao Y. Determining the specificity of protein-DNA interactions. Nature Reviews Genetics. 2010;11:751–760. doi: 10.1038/nrg2845.
  72. Tanaka Y, Nakamura A, Morioka MS, Inoue S, Tamamori-Adachi M, Yamada K, Taketani K, Kawauchi J, Tanaka-Okamoto M, Miyoshi J, Tanaka H, Kitajima S. Systems analysis of ATF3 in stress response and cancer reveals opposing effects on pro-apoptotic genes in p53 pathway. PLoS One. 2011;6:e26848. doi: 10.1371/journal.pone.0026848.
  73. Thanos D, Maniatis T. Virus induction of human IFN beta gene expression requires the assembly of an enhanceosome. Cell. 1995;83:1091–1100. doi: 10.1016/0092-8674(95)90136-1.
  74. Thompson MR, Xu D, Williams BR. ATF3 transcription factor and its emerging roles in immunity and cancer. Journal of Molecular Medicine. 2009;87:1053–1060. doi: 10.1007/s00109-009-0520-x.
  75. Tietjen JR, Donato LJ, Bhimisaria D, Ansari AZ. Sequence-specificity and energy landscapes of DNA-binding molecules. Methods in Enzymology. 2011;497:3–30. doi: 10.1016/B978-0-12-385075-1.00001-9.
  76. Tsukada J, Yoshida Y, Kominato Y, Auron PE. The CCAAT/enhancer (C/EBP) family of basic-leucine zipper (bZIP) transcription factors is a multifaceted highly-regulated system for gene regulation. Cytokine. 2011;54:6–19. doi: 10.1016/j.cyto.2010.12.019.
  77. Ubeda M, Wang XZ, Zinszner H, Wu I, Habener JF, Ron D. Stress-induced binding of the transcriptional factor CHOP to a novel DNA control element. Molecular and Cellular Biology. 1996;16:1479–1489. doi: 10.1128/MCB.16.4.1479.
  78. Vinson CR, Hai T, Boyd SM. Dimerization specificity of the leucine zipper-containing bZIP motif on DNA binding: prediction and rational design. Genes & Development. 1993;7:1047–1058. doi: 10.1101/gad.7.6.1047.
  79. Warren CL, Kratochvil NC, Hauschild KE, Foister S, Brezinski ML, Dervan PB, Phillips GN, Ansari AZ. Defining the sequence-recognition profile of DNA-binding molecules. PNAS. 2006;103:867–872. doi: 10.1073/pnas.0509843102.
  80. Wei GH, Badis G, Berger MF, Kivioja T, Palin K, Enge M, Bonke M, Jolma A, Varjosalo M, Gehrke AR, Yan J, Talukder S, Turunen M, Taipale M, Stunnenberg HG, Ukkonen E, Hughes TR, Bulyk ML, Taipale J. Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. The EMBO Journal. 2010;29:2147–2160. doi: 10.1038/emboj.2010.106.
  81. Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, Lambert SA, Mann I, Cook K, Zheng H, Goity A, van Bakel H, Lozano JC, Galli M, Lewsey MG, Huang E, Mukherjee T, Chen X, Reece-Hoyes JS, Govindarajan S, Shaulsky G, Walhout AJ, Bouget FY, Ratsch G, Larrondo LF, Ecker JR, Hughes TR. Determination and inference of eukaryotic transcription factor sequence specificity. Cell. 2014;158:1431–1443. doi: 10.1016/j.cell.2014.08.009.
  82. Xie D, Boyle AP, Wu L, Zhai J, Kawli T, Snyder M. Dynamic trans-acting factor colocalization in human cells. Cell. 2013;155:713–724. doi: 10.1016/j.cell.2013.09.043.
  83. Yamamoto T, Kyo M, Kamiya T, Tanaka T, Engel JD, Motohashi H, Yamamoto M. Predictive base substitution rules that determine the binding and transcriptional specificity of Maf recognition elements. Genes to Cells. 2006;11:575–591. doi: 10.1111/j.1365-2443.2006.00965.x.
  84. Yin X, Dewille JW, Hai T. A potential dichotomous role of ATF3, an adaptive-response gene, in cancer development. Oncogene. 2008;27:2118–2127. doi: 10.1038/sj.onc.1210861.
  85. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of ChIP-Seq (MACS). Genome Biology. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
  86. Zhao Y, Granas D, Stormo GD. Inferring binding energies from selected binding sites. PLoS Computational Biology. 2009;5:e1000590. doi: 10.1371/journal.pcbi.1000590.
  87. Zhu LJ, Gazin C, Lawson ND, Pagès H, Lin SM, Lapointe DS, Green MR. ChIPpeakAnno: a bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics. 2010;11:237. doi: 10.1186/1471-2105-11-237.
  88. Zykovich A, Korf I, Segal DJ. Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Research. 2009;37:e151. doi: 10.1093/nar/gkp802.
eLife. 2017 Feb 10;6:e19272. doi: 10.7554/eLife.19272.030

Decision letter

Editor: Michael R Green

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Combinatorial bZIP dimers define complex DNA-binding specificity landscapes" for consideration by eLife. Your article has been reviewed by two peer reviewers, including Michael Green who is a member of our Board of Reviewing Editors, and Kevin Struhl (who is not responsible for the references to previous work in the reviews) as the Senior Editor.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This is an interesting manuscript on the DNA binding specificity of dimers in the human bZIP family. bZIP proteins are known to play critical regulatory roles in many cellular processes, and different members of this family are known to form both homo- and heterodimers (sometimes with more than one partner). Prior to the work described in this manuscript, it was unclear how dimerization influences DNA specificity. The authors used SELEX-seq experiments to characterize the binding specificities of 80 heterodimers and 22 homodimers, and used proper control experiments to ensure that the data is interpreted correctly (e.g. they tested that the choice of which bZIP monomer is biotinylated does not affect results).

This rich data set has the potential to change the way we analyze genomic binding of bZIP proteins, especially in terms of assessing the potential impact of SNPs on bZIP-DNA binding. However, in its current form, the manuscript is not ready for publication. Several sections are unclear and hard to read, and there are many incorrect references to figures or supplementary material. Please see specific comments and questions below.

Essential revisions:

1) The identification of "emergent sites" for some bZIP dimers is one of the major insights gained from the authors' large compendium of bZIP specificities. According to the authors, such sites cannot be predicted from the specificities of the homodimers. However, the section presenting this result, "Conjoined, Variably-spaced and Emergent cognate sites bound by heterodimers" is very hard to read.

The supplementary files are not numbered correctly: "[…] the homodimer DNA specificity (e.g., ATF4 vs. ATF4•CEBPA, r = 0.1; Figure 1—figure supplement 3)." I could not find this figure.

In Figure 1C, please annotate in the figure itself which motifs are "conjoined" vs. "variably-spaced" vs. "emergent". Arranging the motifs into these categories would make the manuscript easier to follow. For the motifs that are compared to one another, the authors should display them together and align them. Otherwise it is very hard to follow what the authors are trying to convey (see subsection “Conjoined, Variably-spaced and Emergent cognate sites bound by heterodimers” paragraph two). A direct comparison of FOS-CEBPG and FOS-CEBPE should be shown in a figure.

Related to the same paragraph and Figure 1, it is interesting that BATF3 as a homodimer binds TGACGTCA, but as heterodimers with CEBPA, ATF3, ATF5, and NFIL3, BATF proteins bind GTGG (a CRE-L half-site). Could it be that as homodimers BATF proteins can bind both CRE and CRE-L sites, but these two different sites cannot be captured by a single motif? Certain bZIP subfamilies have been known for many years to bind more than one type of motif. For example, see early work from Kevin Struhl's lab on AP-1 versus ATF/CREB bZIP domains: AP-1 proteins recognize overlapping TGAC half-sites TGA(C|G)TCA, while ATF/CREB proteins recognize adjacent half-sites TGACGTCA; however, "AP-1 proteins prefer to bind AP-1 sites, but they also bind ATF/CREB sites with only slightly lower affinity" (PMID: 7630732). Given that the authors use single motifs to represent the specificity of homo- and heterodimers, such preferences for more than one arrangement of half-sites are not illustrated in the paper.
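The distinction drawn above between overlapping and adjacent half-sites can be made concrete with a minimal Python sketch. It assumes exact consensus matching against the two patterns quoted in the comment, TGA(C|G)TCA and TGACGTCA; the labels "TRE" and "CRE", the function name `classify_sites`, and the example sequence are illustrative choices, not part of the authors' pipeline, and real sites tolerate mismatches that a regular expression does not capture.

```python
import re

# Consensus patterns quoted in the review comment (exact matches only).
TRE = re.compile(r"TGA[CG]TCA")  # AP-1 site: overlapping TGAC half-sites
CRE = re.compile(r"TGACGTCA")    # ATF/CREB site: adjacent TGAC half-sites


def classify_sites(seq):
    """Return sorted (start, kind, matched_sequence) tuples for consensus hits."""
    seq = seq.upper()
    hits = []
    for kind, pattern in (("TRE", TRE), ("CRE", CRE)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), kind, match.group()))
    return sorted(hits)


# Hypothetical sequence carrying one site of each arrangement.
for hit in classify_sites("ccTGACTCAggaTGACGTCAtt"):
    print(hit)
```

A single position weight matrix would have to average over the one-base difference in spacing between the two arrangements, which is why the comment asks for more than one motif per dimer.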

2) From the manuscript it seems clear that if we take only the top motif for each dimer, then "emergent" motifs cannot be predicted from the motifs of the individual homodimers. But if for each homodimer we consider all the motifs they recognize (sometimes with different affinities, other times with very similar affinity), can we then predict some of the "emergent" motifs?

On a related note: The authors mention that "Nine out of the 80 heterodimers (11%) enriched two motifs (Supplementary file 2). For example, BATF•CEBPG enriched both CRE-CAAT and CRE-L-CAAT motifs." But what about homodimers? Some homodimers might also bind two motifs.

The authors tried to address this issue by using "specificity and energy landscapes" (SEL). While I understand that these landscapes contain a lot more information than the motifs, they are not intuitive to interpret and it is not clear to me how one would immediately extract information about multiple half-site arrangements from SELs. A much simpler representation would be by using several motifs for each homo/hetero-dimer that binds more than 1 half-site arrangement.

3) Comparing SELs is also difficult. When two dimers have a site in common, how can we see that from the SELs? Unless the seed is exactly the same, the SELs are hard to compare.

The authors interpret some SELs in subsection “Specificity and Energy Landscapes (SELs) reveal the entire spectrum of cognate sites bound by heterodimers”, but couldn't the same insights be gained by simply deriving more than 1 motif from each data set?

Maybe one possibility for presenting the data is to generate a set of motifs representing all the half-site combinations observed in the data, and then for each dimer plot its affinities for sites matching each motif.
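One way to read this suggestion is sketched below in Python: enumerate candidate sites by joining each ordered pair of half-sites, with an optional spacer, and then score every dimer against the resulting list. The sketch covers only the enumeration step. The half-site label, the orientation convention (left half-site followed by the reverse complement of the right one), and the function names are assumptions made for illustration; only the TGAC half-site and the TGACGTCA site come from the text above.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_sites(half_sites, spacers=("",)):
    """Join every ordered pair of half-sites as left + spacer + revcomp(right).

    Keys are (left_label, right_label, spacer_length); values are site strings.
    """
    sites = {}
    for (name_l, left), (name_r, right) in product(half_sites.items(), repeat=2):
        for spacer in spacers:
            sites[(name_l, name_r, len(spacer))] = left + spacer + revcomp(right)
    return sites


# Hypothetical label "CRE-half" for the TGAC half-site named in the review.
sites = candidate_sites({"CRE-half": "TGAC"}, spacers=("", "N"))
print(sites)
```

Each generated site could then be looked up in a dimer's enrichment table to build the per-dimer affinity plot the reviewer describes; how enrichment is converted to affinity is left open here.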

4) Few studies of heterodimers binding to DNA carefully dissect the binding of hetero- versus homodimers. The authors did this using EMSA-FRET assays, and for a large number of heterodimers (67).

Although this data is very valuable, the way it is presented is not clear – see Figure 3 and associated text. I would recommend using a single color for PPIs, and a single color for PDIs (with intensity proportional to affinity of the interactions). Or even have separate plots for PPIs versus PDIs. Alternatively, the authors could show the data as barplots with PDI strength above the x-axis (i.e. as positive numbers) and PPIs underneath the x-axis (i.e. as negative numbers). Another option would be to use circles where the size reflects PDI strength and the color intensity reflects PPI strength. In subsection “EMSA-FRET analysis to validate heterodimer binding to different cognate sites” the authors say: "It is readily evident that neither the JUN nor the ATF3 homodimers associate with the emergent site identified by SELs for JUN•ATF3 (Figure 3C and Figure 3—figure supplement 1)." To this reviewer it was not evident, and it took quite a lot of time to extract this information from SELs and the figures in the manuscript.

5) The case study of ATF3's specificity being influenced by partner TFs is clear and interesting.

6) Regarding the in vivo ATF3 analysis: the authors found that the motifs of different ATF3 dimers are best at explaining the ATF3 ChIP-seq data in different cell lines. Did the authors verify that the partners identified by their analysis are actually expressed, at high enough levels, in those cell lines?

Also, are the reported differences in AUC-ROC values between different motifs (Figure 5B) significant? What is the magnitude of differences observed for motifs trained on replicate SELEX-seq experiments? The AUC-ROC values are generally not robust to small changes in the motifs. The analysis shown in Figure 5C-D, based on the SELEX-seq data, better reflects the cell-type specific differences in ATF3 heterodimer binding.
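The question of whether two AUC-ROC values differ meaningfully can be approached with a resampling test. The Python sketch below is one generic option and not the analysis used in the manuscript: it computes AUC as the probability that a bound region outscores an unbound one, then bootstraps the difference between two motifs scored on the same regions. All names and the example scores are hypothetical, and the paired resampling assumes both motifs were scored on identical region sets in the same order.

```python
import random


def auc_roc(pos_scores, neg_scores):
    """AUC as P(bound region outscores unbound region); ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def bootstrap_auc_diff(pos_a, neg_a, pos_b, neg_b, n_boot=1000, seed=0):
    """Percentile 95% interval for AUC(motif A) - AUC(motif B).

    Regions are resampled with replacement; the same resampled indices are
    used for both motifs so that the comparison stays paired.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        pi = [rng.randrange(len(pos_a)) for _ in pos_a]
        ni = [rng.randrange(len(neg_a)) for _ in neg_a]
        auc_a = auc_roc([pos_a[i] for i in pi], [neg_a[i] for i in ni])
        auc_b = auc_roc([pos_b[i] for i in pi], [neg_b[i] for i in ni])
        diffs.append(auc_a - auc_b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Hypothetical motif scores for four bound and four unbound regions.
low, high = bootstrap_auc_diff([5, 4, 3, 6], [1, 2, 3, 2], [4, 4, 2, 5], [1, 3, 3, 2])
print(low, high)
```

An interval that excludes zero would suggest the two motifs differ in how well they separate bound from unbound regions; the same procedure applied to motifs from replicate SELEX-seq experiments would give the baseline variability the comment asks about.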

7) The analysis in the last section of Results (Figure 7) is very interesting. However, the "noise factor" used in the analyses of SNP data seems ad hoc. What is the rationale for the formula used to calculate the noise factor?

eLife. 2017 Feb 10;6:e19272. doi: 10.7554/eLife.19272.031

Author response


Essential revisions:

1) The identification of "emergent sites" for some bZIP dimers is one of the major insights gained from the authors' large compendium of bZIP specificities. According to the authors, such sites cannot be predicted from the specificities of the homodimers. However, the section presenting this result, "Conjoined, Variably-spaced and Emergent cognate sites bound by heterodimers" is very hard to read.

We have revised this section of the manuscript extensively and worked to make it more accessible. We now present the reasoning that led us to describe heterodimer-preferred sites as emergent sites (Results section “Conjoined, Variably-spaced, and Emergent cognate sites bound by heterodimers”, third paragraph). We hope the clarity of this section is now improved.

The supplementary files are not numbered correctly: "[…] the homodimer DNA specificity (e.g., ATF4 vs. ATF4•CEBPA, r = 0.1; Figure 1—figure supplement 3)." I could not find this figure.

We have corrected figure supplement numbers in the revised manuscript.

In Figure 1C, please annotate in the figure itself which motifs are "conjoined" vs. "variably-spaced" vs. "emergent". Arranging the motifs into these categories would make the manuscript easier to follow. For the motifs that are compared to one another, the authors should display them together and align them. Otherwise it is very hard to follow what the authors are trying to convey (see subsection “Conjoined, Variably-spaced and Emergent cognate sites bound by heterodimers” paragraph two). A direct comparison of FOS-CEBPG and FOS-CEBPE should be shown in a figure.

We have rearranged the representative heterodimer motifs that are shown in Figure 1. Motifs are now grouped and labeled as conjoined, variably spaced, or emergent. As suggested by the reviewer, FOS-CEBPG (motif 1) is now aligned and placed on top of FOS-CEBPE (motif 4) to facilitate the comparison. In addition, we now include direct comparison of FOS•CEBPG vs. FOS•CEBPE and FOSL1•CEBPE vs FOSL1•CEBPG in Figure 1—figure supplement 5.

Related to the same paragraph and Figure 1, it is interesting that BATF3 as a homodimer binds TGACGTCA, but as heterodimers with CEBPA, ATF3, ATF5, and NFIL3, BATF proteins bind GTGG (a CRE-L half-site). Could it be that as homodimers BATF proteins can bind both CRE and CRE-L sites, but these two different sites cannot be captured by a single motif?

The sequence preferences of BATF, BATF2 and BATF3 homodimers were examined in this study, and of these, only BATF3 homodimer yielded a motif in our SELEX-seq studies (see Supplementary file 1A for the complete list of bZIP dimers examined). The data in Figures 1B, 2C, and 3C and in Supplementary file 2 (logos and landscapes) indicate that BATF3 homodimer binds to the CRE site. The CRE-L motif was not identified using the MEME motif-finding algorithm, and EMSA-FRET analysis showed that BATF3 homodimer binds to CRE, but not to CRE-L (Figure 3C). We conclude that CRE-L binding by BATF3 is extremely weak, if it occurs at all.

Related to the MEME search, we agree with the reviewer that a single motif would not capture both CRE and CRE-L sites. However, we respectfully note that our motif searches were not restricted to single motifs and we did identify examples where more than one motif is bound for both homo and heterodimers. We report the MEME motifs in Supplementary file 2 (see below).

Certain bZIP subfamilies have been known for many years to bind more than 1 type of motif. For example, see early work from Kevin Struhl's lab on AP-1 versus ATF/CREB bZIP domains: AP-1 proteins recognize overlapping TGAC half-sites TGA(C|G)TCA, while ATF/CREB proteins recognize adjacent half-sites TGACGTCA; however, "AP-1 proteins prefer to bind AP-1 sites, but they also bind ATF/CREB sites with only slightly lower affinity" (PMID: 7630732). Given that the authors use single motifs to represent the specificity of homo and heterodimers, such preferences for more than one arrangement of half-sites are not illustrated in the paper.

We now cite the groundbreaking study from the Struhl lab in our revised manuscript to provide better context for our work.

In agreement with the earlier studies, we do identify homo and heterodimers that bind multiple motifs, and such cases are marked in Supplementary file 2. In Figure 2B, we present a specificity and energy landscape (SEL) of JUN•ATF3 and highlight the ability of one dimer to bind multiple different binding sites. Examination of SELs for all 102 bZIPs reveals the extent to which different bZIPs associate with multiple motifs. In our revised SEL pipeline, we now include the feature to identify motifs within the sequences that contribute to peaks in a SEL plot. Given the non-uniform impact of changes in the core half-site or in the flanking sequences on the affinity of different bZIP dimers, simply listing all the arrangements results in tables that have the potential to obscure meaningful differences.

In response to the reviewers’ suggestion that we illustrate preferences of bZIPs for different half-site arrangements, we now include Figure 2C, wherein we display the relative affinity of all 102 dimers for half-sites that occur in one of the six classic motifs (second set of six rows). We also display the relative affinity of all 102 bZIPs for sequences identified from SELs of JUN•ATF3 and ATF4•CEBPG (first set of 10 rows in Figure 2C). Together, this heatmap illustrates the relative affinities of 102 bZIP dimers for the entire CRE (row 1), TRE (row 2), CRE-CAAT (rows 6 & 7), TRE-CAAT (row 10) sites as well as the affinities for half-sites from homodimers binding to the six classic sites (rows 11-16). We respectfully note that expanding this heatmap to include all possible half-site arrangements results in a complex table that is less accessible than SELs and that misses many subtle, yet meaningful differences between dimers (such as the differential impact of flanking sequences on the binding of CREB3 and CREB1 to the CRE site, row 5, or the non-uniform impact of changes in the core CAAT half-site on different CEBP-containing heterodimers, row 8). We propose that SEL plots together with affinity-based motif lists would most effectively capture the range of sequences that can be targeted by a given transcription factor.

2) From the manuscript it seems clear that if we take only the top motif for each dimer, then "emergent" motifs cannot be predicted from the motifs of the individual homodimers. But if for each homodimer we consider all the motifs they recognize (sometimes with different affinities, other times with very similar affinity), can we then predict some of the "emergent" motifs?

On a related note: The authors mention that "Nine out of the 80 heterodimers (11%) enriched two motifs (Supplementary file 2). For example, BATF•CEBPG enriched both CRE-CAAT and CRE-L-CAAT motifs." But what about homodimers? Some homodimers might also bind two motifs.

We use three approaches to identify and validate binding motifs for both homo and heterodimers. First, when applying motif-finding algorithms we do not limit the search to a single motif. Second, rather than focusing on top binding sites, we examine the entire bZIP-DNA interactome with SELs. As mentioned above, in SELs alternate motifs emerge as peaks in the mismatch rings (see Figure 2B). Finally, we use EMSA-FRET to compare and independently validate the sequence preferences of homo- and heterodimers (Figure 3).

In Figure 3C, we found BATF3 homodimer binds CRE sites but displays no binding to the CRE-L site. By comparison, the heterodimers formed by BATF3 (such as ATF2•BATF3 and BATF3•JUN), bound both CRE and CRE-L sites comparably. XBP provides an excellent example of a homodimer binding to both CRE and CRE-L sites with comparable affinity (Figure 3C) and yields both motifs (Supplementary file 2). Moreover, JUN homodimer binds to CRE while displaying no detectable affinity for CRE-L (Figure 3B).

We also note that while homodimers may weakly associate with some sequences, the heterodimerization-dependent switch in their ability to bind the same sequence with high affinity cannot be predicted by current models of protein-DNA recognition. Moreover, it is unclear why one weak site, over many others, would emerge as the heterodimer-preferred site. Therefore, while a list of possible sites may be compiled by mixing and matching half-sites, it is not possible to identify emergent sites a priori as the most preferred sites of heterodimers. For this reason, we term them emergent sites, as they arise due to heterodimer formation.

The authors tried to address this issue by using "specificity and energy landscapes" (SEL). While I understand that these landscapes contain a lot more information than the motifs, they are not intuitive to interpret and it is not clear to me how one would immediately extract information about multiple half-site arrangements from SELs. A much simpler representation would be by using several motifs for each homo/hetero-dimer that binds more than 1 half-site arrangement.

While the use of SELs to examine protein-DNA interactomes is not common, SELs do provide a comprehensive representation of the sequence specificity of a DNA binding protein. In fact, many of the emergent sites are not captured by motif finding algorithms and are often compressed into more generic/classic motifs. Our SEL-based evaluation of various publicly available protein-DNA interactomes revealed that up to 40% had compressed multiple related motifs into low-information content generic motifs (Carlson et al. PNAS 2010). Furthermore, the contribution of flanking sequences to binding even the most ideal consensus motifs is greatly dampened by most motif finding algorithms (as noted by us in publications from 2006 onwards and by others; see Gordon et al. Mol Cell 2013). Thus, we request the reviewers consider our inclusion of the heatmap (Figure 2C) alongside our SEL displays as a scholarly effort to increase the general awareness and appreciation of the breadth of cognate sequences bound by DNA (and RNA) binding proteins.

3) Comparing SELs is also difficult. When two dimers have a site in common, how can we see that from the SELs? Unless the seed is exactly the same, the SELs are hard to compare.

Indeed, as the reviewer points out, the best way to compare two dimers is to make landscapes for each bZIP using the same seed; this is easy to do, and we have found it very effective. Landscapes generated using the same seed permit the generation of a difference-SEL that readily reveals differential specificities of closely related or completely unrelated DNA binding proteins (for example, see Erwin et al., PNAS 2016). This reviewer’s point has prompted us to formalize the “differential-SEL” analysis and disseminate it via a publication in a bioinformatics methods-focused journal.
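The idea of comparing two dimers against a shared seed can be sketched as follows. This is an illustrative simplification only, not the differential-SEL pipeline itself: the function names, the example intensities, and the assumption that every scored sequence has the same length as the seed are all hypothetical. Sequences are grouped by their number of mismatches to the common seed (the "rings" of a landscape), and the per-sequence difference in intensity is reported for each ring.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def difference_sel(seed, scores_a, scores_b):
    """Group sequences scored for both dimers by mismatches to a shared seed.

    Returns {mismatch_count: {sequence: score_a - score_b}}.
    Simplifying assumption: every sequence has the same length as the seed.
    """
    rings = {}
    for seq in sorted(scores_a.keys() & scores_b.keys()):
        if len(seq) != len(seed):
            continue  # skip sequences that cannot be aligned position by position
        ring = hamming(seed, seq)
        rings.setdefault(ring, {})[seq] = scores_a[seq] - scores_b[seq]
    return rings


# Made-up relative intensities for two hypothetical dimers.
dimer_a = {"TGACGTCA": 1.0, "TGACGTGG": 0.2, "TGACGCAA": 0.5}
dimer_b = {"TGACGTCA": 0.9, "TGACGTGG": 0.8}
rings = difference_sel("TGACGTCA", dimer_a, dimer_b)
```

In this toy case the two dimers score similarly on the seed itself (ring 0) but differ markedly on a two-mismatch sequence (ring 2), which is the kind of contrast a difference landscape is meant to expose. Sequences scored for only one dimer are left out.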

The authors interpret some SELs in subsection “Specificity and Energy Landscapes (SELs) reveal the entire spectrum of cognate sites bound by heterodimers”, but couldn't the same insights be gained by simply deriving more than 1 motif from each data set?

Maybe one possibility for presenting the data is to generate a set of motifs representing all the half-site combinations observed in the data, and then for each dimer plot its affinities for sites matching each motif.

As we mention above, in our examination of publicly available protein-DNA interactomes, we find that infrequent binding sites as well as many closely related binding sites are often compressed into binding motifs that mask information content. That being said, we recognize that PWMs are a simpler way to represent particular motifs, and one that is familiar to the wider community. Thus, we propose the inclusion of motifs alongside SELs, as we present in Supplementary file 2, as a good strategy for representing the data. Note that the entries for several bZIPs in Supplementary file 2 do include two PWM motifs (grouped in brackets).

4) Few studies of heterodimers binding to DNA carefully dissect the binding of hetero- versus homodimers. The authors did this using EMSA-FRET assays, and for a large number of heterodimers (67).

Although this data is very valuable, the way it is presented is not clear – see Figure 3 and associated text. I would recommend using a single color for PPIs, and a single color for PDIs (with intensity proportional to affinity of the interactions). Or even have separate plots for PPIs versus PDIs. Alternatively, the authors could show the data as barplots with PDI strength above the x-axis (i.e. as positive numbers) and PPIs underneath the x-axis (i.e. as negative numbers). Another option would be to use circles where the size reflects PDI strength and the color intensity reflects PDI strength. In subsection “EMSA-FRET analysis to validate heterodimer binding to different cognate sites” the authors say: "It is readily evident that neither the JUN nor the ATF3 homodimers associate with the emergent site identified by SELs for JUN•ATF3 (Figure 3C and Figure 3—figure supplement 1)." To this reviewer it was not evident, and it took quite a lot of time to extract this information from SELs and the figures in the manuscript.

Based on the reviewer’s advice, we have completely revised Figure 3 and now display the data as a bar graph using the color and size scales suggested by the reviewer.

5) The case study of ATF3's specificity being influenced by partner TFs is clear and interesting.

We thank the reviewer for their appreciation of this study.

6) Regarding the in vivo ATF3 analysis: the authors found that the motifs of different ATF3 dimers are best at explaining the ATF3 ChIP-seq data in different cell lines. Did the authors verify that the partners identified by their analysis are actually expressed, at high enough levels, in those cell lines?

To address this concern, we examined RNA-seq data from the ENCODE project to verify that all bZIP genes described in Figure 5 are expressed in the ENCODE cell lines used in our analysis (i.e., H1, HEPG2, and K562). With the exception of BATF and BATF3, the bZIP monomers are expressed in these cell lines. In addition, RNA-seq data from the van Lohuizen laboratory (Gargiulo et al., 2013) show that the bZIP genes used for the analysis in Figure 5 are expressed in the GBM1 cell line that was used for ChIP-seq of ATF3. We now include this information in the section “Heterodimer sites enriched in SELEX-seq map to occupied genomic loci in vivo”: “We used published RNA-seq datasets to verify the expression of the bZIP genes used for the ATF3 heterodimer analysis (Supplementary file 1F).”

Also, are the reported differences in AUC-ROC values between different motifs (Figure 5B) significant? What is the magnitude of differences observed for motifs trained on replicate SELEX-seq experiments? The AUC-ROC values are generally not robust to small changes in the motifs. The analysis shown in Figure 5C-D, based on the SELEX-seq data, better reflects the cell-type specific differences in ATF3 heterodimer binding.

We fully concur with the reviewer that AUC-ROC values are not robust to small changes. We included those analyses to conform to the common practices in the field. Moreover, while we display the original AUC-ROC curves in Figure 5—figure supplement 1 and report AUC values in Supplementary file 1E, we have removed the table from Figure 5B.
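For readers less familiar with the metric, the sketch below shows one standard way to compute an AUC-ROC value: as the fraction of (bound, unbound) sequence pairs in which the bound sequence receives the higher motif score, with ties counted as half. It is an illustration with made-up scores and a hypothetical function name, not the analysis code used for Figure 5—figure supplement 1.

```python
def auc_roc(pos_scores, neg_scores):
    """AUC-ROC as the probability that a positive outscores a negative.

    Ties contribute 0.5, which matches the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up motif scores for sequences inside and outside ChIP-seq peaks.
bound = [0.9, 0.8, 0.4]
unbound = [0.7, 0.3, 0.2]
auc = auc_roc(bound, unbound)

# Re-scoring a single bound sequence (0.4 -> 0.25) changes the ranking of one pair.
auc_shifted = auc_roc([0.9, 0.8, 0.25], unbound)
```

Because the value depends only on the rank order of scores, a motif change that reorders even one pair of sequences moves the AUC, which is noticeable when the number of scored sequences is small.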

7) The analysis in the last section of Results (Figure 7) is very interesting. However, the "noise factor" used in the analyses of SNP data seems ad-hoc. What is the rationale for the formula used to calculate the noise factor?

In the SNP analysis, we calculated a predicted fold-change between the CSI score for a reference allele (from hg19) and the CSI score for the corresponding alternate allele using the formula described in the Single Nucleotide Polymorphism (SNP) scoring in Materials and methods. To make our fold-difference predictions robust, we added a “noise factor” to the CSI intensity in the numerator and the denominator. The rationale of the noise factor is to make the prediction less sensitive to calculating ratios with low CSI scores. Instead of adding a constant value to every CSI intensity to mitigate for low CSI intensities, we added a value (A) that varied according to the range of the CSI intensities for a given dimer. This was done to take into account the differences in the observed range of CSI intensities between dimers.
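The effect of the noise factor can be illustrated with a short sketch. The exact formula is the one given in Materials and methods and is not reproduced here; the code below only assumes, for illustration, that A is a fixed fraction of the range of CSI intensities observed for a dimer. The fraction of 0.1, the function names, and the example values are hypothetical.

```python
def noise_factor(csi_scores, fraction=0.1):
    """Hypothetical noise factor A: a fixed fraction of the dimer's CSI range."""
    return fraction * (max(csi_scores) - min(csi_scores))


def fold_change(csi_ref, csi_alt, a):
    """Predicted fold-change with A added to numerator and denominator."""
    return (csi_ref + a) / (csi_alt + a)


# Made-up CSI intensities for one dimer, spanning a range of 10.
dimer_scores = [0.0, 2.0, 8.0, 10.0]
a = noise_factor(dimer_scores)             # 1.0 for this toy range

raw_low = fold_change(0.02, 0.01, 0.0)     # ratio of two near-background scores
damped_low = fold_change(0.02, 0.01, a)    # same alleles with the noise factor
damped_high = fold_change(8.0, 2.0, a)     # a difference between strong sites
```

Without the noise factor, two near-background scores yield a two-fold "change"; with it, that ratio collapses to about 1.01, while a difference between well-bound sites remains large (3.0 here). Scaling A to each dimer's range keeps this damping comparable across dimers whose CSI intensities span different ranges.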

Associated Data


    Supplementary Materials

    Figure 1—source data 1. Data for Figure 1C.

    Pairwise comparison (Pearson's correlation) of the DNA-binding preferences of 102 bZIP dimers using the CSI intensity for 1222 10 bp sequences.

    DOI: http://dx.doi.org/10.7554/eLife.19272.004

    elife-19272-fig1-data1.xlsx (120.7KB, xlsx)
    Figure 2—source data 1. Data for Figure 2C.

    Relative CSI intensity for 102 bZIP dimers for different DNA-binding sites and half-sites.

    DOI: http://dx.doi.org/10.7554/eLife.19272.011

    Supplementary file 1.

    (A) bZIP sequences. (B) DNA library and primers. (C) DNA stabilized bZIP dimers. (D) Oligonucleotide sequences for EMSA. (E) ROC-AUC. (F) Expression of bZIP genes. (G) ATF3 dimers in ChIP-seq peaks. (H) GREAT GO annotations. (I) SNP fold-change predictions.

    DOI: http://dx.doi.org/10.7554/eLife.19272.022

    elife-19272-supp1.xlsx (125.8KB, xlsx)
    Supplementary file 2. MEME motifs and Sequence Specificity and Energy Landscapes (SEL) for human bZIP homodimers and heterodimers.

    DOI: http://dx.doi.org/10.7554/eLife.19272.023

    elife-19272-supp2.pdf (1.8MB, pdf)

    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
