Abstract
Protein–DNA binding is central to specificity in gene regulation, and methods for characterizing transcription factor (TF)–DNA binding remain crucial to studies of regulatory specificity. High-throughput (HT) technologies have revolutionized our ability to characterize protein–DNA binding by significantly increasing the number of binding measurements that can be performed. Protein-binding microarrays (PBMs) are a robust and powerful HT platform for studying DNA-binding specificity of TFs. Analysis of PBM-determined DNA-binding profiles has provided new insight into the scope and mechanisms of TF binding diversity. In this review, we focus specifically on the PBM technique and discuss its application to the study of TF specificity, in particular, the binding diversity of TF homologs and multi-protein complexes.
Keywords: protein-binding, microarray, specificity, multi-protein, DNA-binding, diversity
INTRODUCTION
Decoding the regulatory logic that governs gene transcription remains a major focus of biological research. Central to this goal is mapping transcription factors (TFs) to transcriptional regulatory elements (i.e. enhancers and promoters), and describing the mechanisms that determine which TFs bind to which regulatory elements. Protein–DNA binding is a primary mechanism of specificity in gene regulation, and efforts to characterize TF–DNA binding remain crucial to studies of regulatory specificity. Over the past 15 years, high-throughput (HT) methodologies have dramatically improved our ability to characterize the DNA-binding specificity of TFs by significantly increasing the number of protein–DNA-binding measurements that can be performed in an experiment. These methods use a range of technologies to increase the number of DNA sequences that can be assayed, including microarrays [1–9], HT sequencing [10–15], microfluidics [16] and cell-based selection systems (which can also be coupled with HT sequencing) [17–19] (Table 1). Studies have used these methods to generate large compendiums of detailed DNA-binding data for many TFs, facilitating genomic analyses of TF function and providing insight into mechanisms of TF-binding specificity [20, 23, 31–38].
Table 1:
Summary of HT methodologies used to study protein-DNA interactions
| Acronym | Name | Throughput | Query length (bp) | Probe length (bp) | Probe type | References |
|---|---|---|---|---|---|---|
| PBM | Protein-binding microarray | 10^4–10^5 | 8–36 | 60 | Microarray | [1–5, 20, 21] |
| CSI | Cognate site identifier | 10^4–10^5 | 8–10 | 34 nt ssDNA hairpin | Microarray | [8, 22] |
| HiTS-FLIP | High throughput sequencing - fluorescent ligand interaction profiling | 10^9 | 25 | 135 | Array/ Oligo library | [15] |
| Bind-n-Seq | Bind and sequence | 10^13 | 21 | 93 | Oligo library | [10] |
| EMSA-seq | EMSA followed by high throughput sequencing | 10^5–10^6 | 20 | 60 | Oligo library | [11] |
| HT-SELEX | High throughput systematic evolution of ligand by exponential enrichment | 10^6 | 14–40 | 66–96 | Oligo library | [12–14, 23] |
| MEGAshift | Microarray evaluation of genomic aptamers by shift | 10^3 | 35 | 35 | Oligo library | [7] |
| MITOMI | Mechanically induced trapping of molecular interactions | 10^2–10^3 | 15 | 33 | Oligo library | [16, 24] |
| B1H | Bacterial one-hybrid | 10^7 | 18 | N/A | Oligo library (in plasmid) | [18, 19, 25] |
| DIP-chip | DNA Immunoprecipitation followed by microarray hybridization (chip) | 10^5 | Genomic fragments | DNA sheared to 600 bp average | Genomic DNA | [26, 27] |
| HT-SPR | High throughput surface plasmon resonance | 10^2 | 16 | 100 | Synthesized oligos | [28, 29] |
| TIRF-PBM | Total internal reflectance fluorescence PBM | 10^2 | 12 | 51 | Synthesized oligos | [6, 30] |
Names and features of HT methodologies. Throughput (column 3) lists the approximate number of DNA sequences assayed using the methodologies as reported in the cited papers. Query length (column 4) lists the length of the queried sequence (i.e. binding sites) as reported in the cited papers. Probe length (column 5) lists the length of full DNA oligomer used in the assay; values include query sequence, constant regions and requisite primers/adapters. Probe type (column 6) lists the format of the DNA probe oligomer. Here we distinguish between individually synthesized DNA oligos and oligo libraries to distinguish the scale of the oligo pools. HiTS-FLIP is annotated as ‘Array/Oligo library’, as it uses an oligo library to construct a spatial array of probes in an Illumina sequencing flow cell to which protein is applied [15].
HT methods for measuring protein–DNA binding have provided particular insight into the DNA-binding diversity of TF homologs and multi-protein complexes. Gene duplication throughout evolution has led to genomes containing large families of TFs that share structurally similar DNA-binding domains (DBDs), many of which bind to similar DNA sequences [39–41]. Critical to understanding gene regulatory specificity is determining how TF homologs that recognize similar consensus sequences are able to target distinct genes. For example, humans have 28 ETS-family proteins, half of which are co-expressed in any given cell type, and most of which can bind to similar 5′-GGA(A/T)-3′ containing DNA sites [42]; determining what distinguishes the many co-expressed ETS proteins from one another remains a challenge. Fortunately, HT methodologies are providing more comprehensive DNA-binding data sets than were previously possible, and comparison of detailed TF–DNA-binding profiles is revealing a range of homolog-specific binding preferences that is shedding new light on the issues of TF homolog specificity and diversity. A related area in which HT methods are beginning to make an impact is the role of multi-protein complexes in gene regulatory specificity. Interactions with cofactor proteins are another important mechanism for TFs to diversify and enhance their target gene specificity [43, 44]. In this review, we focus on the widely used protein-binding microarray (PBM) technique, and discuss the application of PBMs to these two central aspects of TF specificity: (i) binding diversity between homologs and (ii) DNA binding of multi-protein complexes. The objective of this review is to address the ways that PBMs can be used to provide insight into TF specificity of homologs and protein complexes. Further, we note that although we discuss primarily PBM-based studies, our conclusions regarding the perspective provided by large-scale data sets are relevant for all HT methodologies.
PROTEIN-BINDING MICROARRAYS
PBMs are a well-established microarray-based technique to study the in vitro binding of proteins to DNA [1–5]. PBM experiments involve applying protein to a double-stranded DNA microarray and then quantifying the amount of DNA-bound protein using a fluorescently labeled antibody. Available high-density, multi-chambered microarray platforms allow the DNA binding of multiple protein samples to be tested in parallel to tens of thousands of DNA sequences [1, 31, 32]. This HT platform has facilitated numerous DNA-binding studies on large groups of TFs [20, 23, 31–38] that have provided rich data sets for comparative analysis of TF-DNA-binding specificity. Furthermore, PBM experiments can be used to measure protein–DNA-binding interactions spanning several orders of magnitude in affinity, and resolve binding affinities that differ by less than 1.5-fold [21]. Therefore, comparison of PBM-binding profiles provides a detailed and sensitive approach to compare the DNA-binding specificity of TFs.
PBMs can be used to assay binding to synthetic [1, 4, 31, 32, 49] or genome-derived [3, 21, 50, 51] DNA sequences. The universal PBM [1] is a particularly useful synthetic design that allows an unbiased and comprehensive measurement of TF binding to all possible 8 bp long DNA sites (i.e. 8-mers). The universal PBM design has been used in many of the large TF compendium analyses [20, 23, 31–38] and provides a powerful platform on which to discover new binding specificities and reveal TF binding differences. The UniPROBE database (http://thebrain.bwh.harvard.edu/unipobe) is a useful resource for universal PBM data sets from a range of species. One caveat with the universal PBM approach is that binding specificity is assayed using 8-mers, which may miss features inherent in longer binding sites (see Table 1). To address this issue, one study used a two-stage approach in which universal PBMs were first used to delimit potential binding site sequences, followed by an experiment on a focused set of several thousand longer 10 bp binding site sequences [49]. Other studies focused on specific TFs have selected microarray oligos based on prior knowledge about the TF specificity [50, 52–54]. PBMs using genome-derived DNA sequences have also been used to analyze TF specificity and have been instrumental in identifying features such as the role of flanking DNA (i.e. the genomic context of a DNA-binding site) [51] and coregulatory motifs for multi-protein complexes [21] (discussed more below). In summary, the PBM is a robust, HT methodology that provides a sensitive and flexible platform with which to examine TF–DNA-binding specificity.
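As an illustration of what "all possible 8-mers" means on a double-stranded array, the following minimal sketch enumerates every 8 bp sequence and treats a sequence and its reverse complement as the same site. The function names are illustrative and are not taken from any PBM analysis package. The resulting count, (4^8 + 4^4)/2, agrees with the 32,896 8-mers quoted in the Figure 1 legend.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Pick one representative for a sequence and its reverse complement."""
    return min(seq, reverse_complement(seq))


def distinct_kmers(k: int) -> set:
    """All k-mers, collapsing each sequence with its reverse complement."""
    return {canonical("".join(bases)) for bases in product("ACGT", repeat=k)}


eight_mers = distinct_kmers(8)
print(len(eight_mers))  # 32896
```

The 256 palindromic 8-mers (those equal to their own reverse complement) are counted once each, which is why the total is slightly more than half of 4^8 = 65,536.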
OVERVIEW OF HT METHODOLOGIES
In addition to PBMs, a wide range of alternate HT methodologies are available that capitalize on technological innovations (e.g. HT sequencing, microfluidics) to greatly increase the number of protein–DNA interactions that can be measured (Table 1). A comprehensive review of these HT methodologies is beyond the scope of this review, and we refer the readers to reviews that more thoroughly cover the field [45–47]. However, we briefly discuss several features that distinguish the methodologies. A primary feature of HT methodologies is the ‘throughput’, or number of protein–DNA-binding interactions that can be measured. Microarray-based approaches (PBM and CSI) can assay 10^4–10^5 DNA sequences. Using multiple microarrays, the CSI approach was extended to assay ∼10^6 DNA sequences covering all possible 10-mers [22]. Methodologies that use HT sequencing technology have reported even higher numbers of measured interactions: 10^6 (EMSA-seq, HT-SELEX), 10^9 (HiTS-FLIP) and 10^13 (Bind-n-Seq). The throughput of the sequencing-based approaches is ostensibly limited only by the size of the available oligo libraries and sequencing capacity, which is changing rapidly. Another important feature of these methodologies is the size of the DNA oligomers to which protein binding can be assayed. Microarray-based approaches are limited to the length of DNA oligo available on the microarray slides. The most widely used platform is from Agilent, which supports a 60 bp long DNA probe [1]. However, despite this limitation, the Agilent platform has been used to assay binding of multi-protein complexes greater than 200 kDa [21]. Sequencing and microfluidics-based approaches that use oligo libraries as the source of the DNA are less limited and have reported probe lengths between 33 and 135 bp. Still longer DNA oligos were reported for the DIP-chip method that assayed protein binding to genomic DNA fragments with an average length of 600 bp.
To date, most publications using HT methods have reported binding to synthetic sites (i.e. random and pseudo-random DNA sequences) ranging in size from 8 to 40 bp (Table 1). Despite their differences, different methodologies can produce highly similar protein–DNA-binding data [21, 55]. Direct comparison of PBM and MITOMI-based binding measurements to the same DNA sequences reported excellent agreement over several orders of magnitude in binding affinity [21]. A recent comparison of PBM and HT-SELEX binding data for 162 proteins found that binding models from both methods agreed well for most TFs [55]. However, this analysis also found differences: although PBM-derived 8-mer ranking is more accurate than HT-SELEX, HT-SELEX-derived models predicted in vivo chromatin immunoprecipitation (ChIP) data better. Future comparisons and more studies on diverse proteins should help to clarify when a particular approach may be preferred over another. In the meantime, the large number of available HT methodologies provides tremendous flexibility that should greatly facilitate progress in the HT analysis of protein-DNA binding.
DNA-BINDING DIVERSITY OF TF HOMOLOGS
Many TF homologs, which by definition share a structurally homologous DBD, bind to common consensus DNA sites [20, 23, 31–38], raising the question of how these TFs functionally distinguish themselves in vivo and regulate distinct target genes [42, 51]. Complicating matters, it has long been appreciated that TFs bind to degenerate sequences that defy simple descriptions (i.e. recognition codes) [56–58]. Comparisons of TF PBM-binding profiles to thousands of DNA sites over a wide affinity range are revealing that degenerate binding profiles of TFs exhibit both common and TF-specific binding preferences [20, 23, 31–38]. Here we review how PBMs have been used to identify binding preferences between TF homologs, and discuss several studies that have connected these in vitro differences with in vivo binding and functional differences.
Shared and TF-specific binding
Large PBM-based surveys have revealed that related TFs from many different structural classes exhibit common and TF-specific binding preferences. An illustrative example of this is the diversity seen among ETS-family proteins. ETS proteins share a common winged helix-turn-helix (wHTH) DBD termed the ETS domain, and can all bind to DNA binding sites with a common 5′-GGA(A/T)-3′ core element [33, 42]. A survey of ETS factor DNA binding using both HT-SELEX and PBMs confirmed these common binding features, while demonstrating complex ETS factor-specific preferences [33]. Here we describe the specificity differences as identified using the PBM-determined binding profiles; however, analogous ETS-class binding differences were also identified in the HT-SELEX data sets. Based on DNA-binding specificity, ETS factors can be grouped into four broad specificity classes (I, II, III and IV). Comparing the binding profiles between representative members of each class, namely Erg (class I), Elf2 (class II), Sfpi1/PU.1 (class III) and Spdef (class IV), illustrates the nature of common and TF-specific binding preferences, and highlights the complex specificity landscape revealed by HT methodologies (Figure 1). We observe that while a broad correlation exists between binding profiles of different ETS factors, there are clear binding preferences that can be visualized by spots occurring ‘off-diagonal’ (i.e. scoring higher on one axis than the other). For perspective, we include a comparison of Erg with another protein, Elk3, from the same specificity class.
Figure 1:

Comparison of ETS binding profiles. (A) Pair-wise binding profile comparison for ETS proteins Elf2, Spdef and Sfpi1 compared with Erg. Binding profiles include binding to all possible (32,896) 8 bp sequences (i.e. 8-mers), determined by universal PBM experiments [33]. Z scores are transformed PBM-derived 8-mer median signal intensities. Binding sites containing specific 7-mers are highlighted (see legend in top panel). (B) PBM-determined binding logos are shown for each factor [33]. (C) Binding profile comparison for Erg and Elk3, format as in (A). (A colour version of this figure is available online at: http://bfg.oxfordjournals.org)
Examining the distribution of the binding site scores (i.e. the scatter plot) provides an intuitive feel for the complexity of the relative binding preferences for each factor. To illustrate these differences, we highlight several groups of sites with different binding preferences for the ETS factors discussed (Figure 1). All factors bind with high affinity to 5′-CCGGAAG-3′ sites (Figure 1, red dots). However, binding to 5′-GGGGAAG-3′ sites (Figure 1, blue dots), which contain the 5′-GGAA-3′ core but differ in the 5′-flanking bases, varies strongly between the different factors. Examining the preference for sites with an alternate core element, 5′-CCGGATG-3′ (Figure 1, orange dots), shows similar differences between factors: these sites are bound poorly by Sfpi1, but with highest affinity by Spdef. These nuanced binding differences are also reflected in the different DNA-binding logos (Figure 1); however, examining the pair-wise profile comparisons highlights the complexity of the differences between the TF homologs and provides an additional intuitive feel for relative preferences for different DNA sequences. One aspect of TF diversity that becomes clear by these comparisons is that there are binding preferences for both high and lower affinity DNA sites. Therefore, a comprehensive characterization of homolog binding differences requires analyzing the full binding profiles provided by HT methods.
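The kind of 'off-diagonal' comparison described above can be sketched in a few lines. The example below assumes that each binding profile is available as a mapping from 8-mer to Z score; the Z scores shown are hypothetical toy values chosen only to make the example run and are not measurements from [33]. The helper names are likewise illustrative.

```python
from statistics import mean

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def contains_site(kmer, site):
    """True if the k-mer contains the site on either strand."""
    return site in kmer or reverse_complement(site) in kmer


def mean_preference(profile_x, profile_y, site):
    """Mean (y - x) Z-score difference over shared k-mers containing the site.

    Positive values indicate sites lying above the diagonal (preferred by y).
    """
    diffs = [
        profile_y[kmer] - profile_x[kmer]
        for kmer in profile_x
        if kmer in profile_y and contains_site(kmer, site)
    ]
    if not diffs:
        raise ValueError(f"no shared k-mers contain {site}")
    return mean(diffs)


# Hypothetical toy Z scores (NOT real PBM data).
erg = {"ACCGGAAG": 9.0, "CCGGAAGT": 8.5, "AGGGGAAG": 2.0, "GGGGAAGT": 1.5}
sfpi1 = {"ACCGGAAG": 8.0, "CCGGAAGT": 8.5, "AGGGGAAG": 7.0, "GGGGAAGT": 6.5}

print(mean_preference(erg, sfpi1, "CCGGAAG"))  # -0.5
print(mean_preference(erg, sfpi1, "GGGGAAG"))  # 5.0
```

In this toy example the 5′-CCGGAAG-3′ group sits close to the diagonal, whereas the 5′-GGGGAAG-3′ group is shifted strongly toward the second profile, which is the pattern that the highlighted dots in a pair-wise scatter plot make visible.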
Flanking sequence and DNA shape
The DNA specificity of TFs is often represented using relatively compact (5–15 bp long) binding motifs [59]. However, many studies have shown that nucleotides flanking the established core TF binding motifs can affect binding specificity [58, 60–62], and that DNA shape features play an important role in TF-DNA recognition [63, 64].
A recent study used PBMs to examine the role of flanking DNA in defining specificity for two TFs that have highly similar DNA binding motifs but distinct in vivo targets [51]. The basic helix-loop-helix (bHLH) TFs Cbf1 and Tye7 both bind to highly similar 5′-CACGTG-3′-like E-box motifs. PBM-binding profiles for Cbf1 and Tye7 using the universal PBM design identified TF-specific binding preferences, but these differences did not fully explain the in vivo binding observed in ChIP-chip assays. To assay the influence of flanking DNA on Cbf1 and Tye7 binding to native genomic sites, PBMs were used to measure the binding to 592 predicted binding sites taken directly from ChIP-positive and ChIP-negative regions throughout the yeast genome. PBM binding experiments to these genomic binding sites, each 30 bp long with the E-box site centered in the middle, were used to derive predictive binding models that accounted for increasingly distal flanking bases. The models that best discriminated Cbf1 and Tye7 in vivo binding included flanking DNA up to 11 bp away from the 5′-CACGTG-3′ core element, highlighting the effect played by distal flanking DNA. To understand what role these flanking sequences might play in the observed binding preferences, local DNA shape features of the flanks were analyzed and revealed distinct differences between Cbf1- and Tye7-preferred binding sites. This study highlights how binding profiles across both synthetic and genome-derived binding site sequences can provide important insights into the role of subtle influences such as flanking DNA and DNA structure. Another recent analysis of published PBM data further highlights the utility of HT data sets to examine the importance of nucleotides flanking the core TF binding element [65]. 
This study addressed the performance of PBM-derived binding motifs of different lengths, and demonstrated that longer motifs, which include the nucleotide positions flanking the core elements, perform better than shorter motifs in predicting in vitro binding behavior of the TF.
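To make the idea of "increasingly distal flanking bases" concrete, the following sketch extracts a core element together with a chosen number of flanking bases from a longer sequence. It is only an illustration of the windowing step under simple assumptions; it is not the modeling procedure used in [51] or [65], and the function name and toy sequence are hypothetical.

```python
import re


def core_windows(sequence, core="CACGTG", flank=11):
    """Return (core_start, window) for each core occurrence that has at least
    `flank` bases available on both sides.

    Only the given strand is searched, which is sufficient for a palindromic
    core such as CACGTG; a non-palindromic core would also need a
    reverse-complement search.
    """
    windows = []
    # A lookahead is used so that overlapping occurrences are also reported.
    for match in re.finditer(f"(?={core})", sequence):
        start = match.start()
        left = start - flank
        right = start + len(core) + flank
        if left >= 0 and right <= len(sequence):
            windows.append((start, sequence[left:right]))
    return windows


# Hypothetical 30 bp probe with the E-box centered (12 bp on each side).
probe = "A" * 12 + "CACGTG" + "T" * 12

for n in (0, 3, 11):
    start, window = core_windows(probe, flank=n)[0]
    print(n, len(window), window)
```

Varying `flank` from 0 upward yields nested windows around the same core, which is one simple way to set up a comparison of models that consider progressively more of the flanking sequence.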
In vitro differences correlate with in vivo differences
In vitro TF binding preferences have been shown to correlate with in vivo binding differences, and are therefore critically relevant for our understanding of target gene specificity between TF homologs. Analyses of the ETS proteins provide an illustrative example. Genome-wide ChIP-seq profiles were determined for representative factors of the four ETS specificity classes described previously and analyzed for binding site differences [33]. ETS binding sites that were most strongly enriched within the different ChIP-determined regions matched the in vitro-determined binding sites for TFs of the same specificity class. In other words, regions bound by Spdef in vivo are much more strongly enriched for Spdef binding sites as determined by PBM than for sites from ETS factors of other specificity classes. Therefore, although consensus sequences of the ETS factors may appear similar, many lower affinity sites that are much more class specific contribute strongly to in vivo binding differences (Figure 1). The ability of in vitro binding preferences to recapitulate in vivo binding differences has also been demonstrated for the closely related nuclear receptors (NRs) HNF4α, RXRα and COUPTF2 [54]. These NRs all bind as homodimers to the consensus direct repeat element 1 (DR1), 5′-AGGTCAnAGGTCA-3′ (‘n’ is any base). PBM-binding profiles for the three NRs show highly correlated binding to DR1-type sites, but revealed high-affinity HNF4α-specific sites of the form 5′-nnnnCAAAGTCCA-3′, termed the HNF4α-specific binding motif (H4-SBM) [31, 54]. Analysis of ChIP-seq binding data for a larger group of NRs demonstrated that the H4-SBMs were bound uniquely by HNF4α and could explain many HNF4α-specifically bound loci in vivo.
Furthermore, reporter assays demonstrated that although both HNF4α and RXRα could drive gene expression from promoters containing a DR1-type site (5′-AGGGCAgGGGTCA-3′), only HNF4α could drive expression from a mutant site resembling the H4-SBM (5′-AGTCCAgGGTCCA-3′, mutated bases underlined). These experiments demonstrate the utility of HT methods to identify TF preferences that help to explain in vivo binding and regulatory differences. Analyses that combine genome-scale ChIP studies, gene expression analysis and HT in vitro binding data will be critical to dissecting how the many co-expressed TF homologs execute their individual functions.
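Because typographic emphasis does not always survive in plain text, the positions at which the two reporter sites quoted above differ can be recovered directly from the sequences. The short sketch below does only that: it compares the two quoted 13 bp sites base by base. The function name is illustrative.

```python
def differing_positions(site_a, site_b):
    """Return the 1-based positions at which two equal-length sites differ.

    The comparison ignores case, so the lowercase spacer base is treated
    like any other base.
    """
    if len(site_a) != len(site_b):
        raise ValueError("sites must have the same length")
    return [
        index + 1
        for index, (a, b) in enumerate(zip(site_a.upper(), site_b.upper()))
        if a != b
    ]


dr1_type_site = "AGGGCAgGGGTCA"     # DR1-type reporter site quoted in the text
h4_sbm_like_site = "AGTCCAgGGTCCA"  # mutant site resembling the H4-SBM

print(differing_positions(dr1_type_site, h4_sbm_like_site))  # [3, 4, 10, 11]
```

The two sites therefore differ at two adjacent bases in each half-site, while the spacer and the remaining bases are unchanged.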
EXPLORING MECHANISMS OF TF–DNA-BINDING DIVERSITY
PBMs provide a powerful platform with which to examine biophysical mechanisms of TF–DNA-binding specificity. Comparing DNA-binding profiles between homologs or isoforms, or between wild-type and mutant versions of individual proteins, provides a practical way to relate these protein differences to binding differences [32, 33]. Studying the effect of protein differences on TF–DNA-binding interactions is central to improving our understanding of evolutionary adaptation of TFs and the role of protein mutations in gene regulation and disease. Here we review several studies in which PBMs have been used to explore mechanisms of TF-binding diversification.
Residue differences in the protein–DNA-binding interface
Complex networks of side-chain and water-mediated interactions in protein–DNA-binding interfaces are known to affect DNA binding and complicate simple models of binding specificity [58]. PBM-based studies of TF–DNA binding are exposing the full extent of binding complexity, while at the same time providing the high-resolution binding data needed to characterize the role of individual residues on TF-binding diversity. In a survey that examined the binding diversity between mouse homeodomain proteins, it was observed that a large group of 44 homeodomains bound with high affinity to a canonical 5′-TAATTA-3′ site [32]. However, high-affinity binding was also observed for many of these homeodomains to alternate sites that resemble the canonical AT-rich site but differed slightly, for example: 5′-TAATAG-3′ (Nkx1-1, Nkx1-2); 5′-TAATCG-3′ (Hlxb9, Hoxd1, Pax7); 5′-CAATT-3′ (Msx1, Msx2, Msx3); 5′-TAACnAG-3′ (Lbx2). To determine amino acids responsible for the observed binding variation, PBM-binding profiles were correlated with amino acid differences across the full set of 168 homeodomains examined in this study to establish predictive models of homeodomain binding specificity. Amino acid differences for a small set of established homeodomain base-contacting residues (i.e. positions 47, 50 and 54 of the major-groove recognition helix) were broadly predictive, but did not resolve many of the TF-specific differences identified by the PBM experiments. More accurate predictions required considering the amino acid identity at an extended set of 15 residue positions that account for all base-specific and phosphate backbone contacts in representative homeodomain crystal structures. A picture emerged in which a conserved set of amino acids shared across homologs defines their high affinity common sites (i.e. I47, Q50, N51, S54 mediate binding to 5′-TAATTA-3′), while a more varied set of residues, both base- and backbone-contacting, contribute to factor-specific binding differences.
A similar scenario was observed for the ETS proteins described previously (Figure 1) [33]. PBM profile comparisons revealed factor-specific binding preferences to both high- and lower-affinity variants that contained or resembled the canonical 5′-GGA(A/T)-3′ ETS core element, for example: 5′-GGGGAA-3′ (Sfpi1 preference for preceding GG); 5′-CCGGTT-3′ (Spdef tolerance for alternate 5′-GGTT-3′ core). Correlating the individual ETS-factor binding preferences with amino acid differences provided predictions regarding the residues affecting binding. Mutant ETS domains with substitutions predicted to switch the DNA-binding specificities were tested and shown to correctly switch the DNA-binding specificity. Similar to the homeodomain study, an invariant set of amino acids dictates binding to the common sites (a pair of invariant arginine residues mediates binding to the 5′-GGA(A/T)-3′ core element), while residues affecting ETS factor-specific binding to variant sites were found to occur throughout the wHTH DBD. These examples highlight the complex nature of TF-DNA binding, whereby conserved residues explain binding to common DNA sites by all homologs, but TF-specific differences can be mediated by a more diverse set of distributed amino acids that vary between homologs. The comparison of binding profiles across many homologs provides a powerful approach to identify the binding differences and the specificity-determining residues.
Modularity by distinct binding modes
In a recent study, we investigated TF-specific preferences identified in universal-PBM-binding profiles of C2H2 zinc finger (ZF) proteins from yeast and uncovered two new mechanisms for ZF DNA-binding diversification that allow ZF proteins to gain new sites in a modular fashion [66]. C2H2 ZF proteins (hereafter referred to as ‘ZF proteins’) bind DNA using arrays of ZF domains in which DNA-base contacts are mediated by amino acids at four canonical ‘recognition’ positions in each ZF domain. We compared binding profiles for a number of short two-ZF proteins that had identical amino acids at their canonical recognition positions and were expected to have identical DNA-binding specificity. While all factors with identical recognition residues shared common high affinity sites that conformed to known ZF recognition rules, we also observed widespread factor-specific binding differences. To address the biophysical mechanism of the observed non-canonical binding preferences, we focused on a set of three homologs (Msn2, Usv1 and Com2) that bound with high affinity to the yeast stress-response element 5′-AGGGG-3′, but also exhibited individual high-affinity binding to alternate sites: 5′-CGGGG-3′ (Msn2 specific), 5′-ATAGGAG-3′ (Com2 specific), 5′-AGGnAC-3′ (Usv1 specific). PBM experiments of multiple mutants of Com2 and Usv1 revealed that their individual binding preferences were the result of distinctly different biophysical mechanisms: Com2-specific binding involves a basic RGRK motif N-terminal to the ZF domains, while Usv1-specific binding involves a distributed set of residues across both ZF domains believed to stabilize an altered binding orientation of ZF1. Furthermore, our studies revealed that these factor-specific binding preferences are modular and can be selectively abrogated or engineered onto another homolog without affecting the binding to the common high-affinity sites.
PBM experiments of protein mutants were critical to our insights concerning the residues involved and the modular nature of these factor-specific behaviors. For example, comparison of the binding profiles for Com2 and a Com2 RGRK-to-RGEE mutant, in which we mutated the N-terminal basic motif, immediately revealed the loss of affinity for the Com2-specific sites (Figure 2A and B). The ability to monitor changes across the full DNA-binding specificity landscape in response to protein changes provides a particularly powerful way to uncover mechanisms of specificity.
Figure 2:

Comparison of wild-type and mutant Com2 binding profiles. (A) Comparison of Com2 and Com2 RK→EE mutant binding profiles, in relation to Msn2 (profiles as in Figure 1). Com2-preferred sites (sites bound significantly better by Com2 than Msn2 [66]), and high-affinity sites common to both Msn2 and Com2, are highlighted orange and red, respectively. (B) Binding logos determined from Com2-preferred sites (orange dots) and common sites (red dots). (C) Comparison of FoxP1-ES and FoxP1 8-mer binding profiles. Binding sites containing specific 7-mers are highlighted. (A colour version of this figure is available online at: http://bfg.oxfordjournals.org).
Alternative splicing and protein isoforms
Alternative splicing is an important mechanism of TF regulation that can dramatically impact the wiring of gene regulatory networks [67, 68]. However, the role of protein isoforms in DNA-binding diversification has not been widely studied. A study of the embryonic stem cell (ESC)-specific splice variant of the forkhead protein FoxP1, termed FoxP1-ES, illustrates the importance of alternative splicing in TF function and how PBMs provide a natural methodology to investigate the impact of splicing on DNA-binding specificity [69]. FoxP1-ES expression in ESCs is important for maintenance of ESC pluripotency. FoxP1 and FoxP1-ES are produced by alternative splicing of exon 18 (FoxP1 uses exon 18, while FoxP1-ES uses a variant exon 18b) that encodes a portion of the forkhead DBD of each factor. The two isoforms regulate distinct target gene sets and bind to distinct loci in vivo, suggesting distinct DNA-binding activities. PBM experiments identified clear DNA-binding differences between the two variants (Figure 2C). Both splice variants bound with high-affinity to a common 5′-ACAACAA-3′ site; however, only FoxP1 bound strongly to the canonical forkhead site 5′-GTAAACA-3′, while only FoxP1-ES bound strongly to variant sites, such as 5′-ATACAAA-3′ (Figure 2C). Further analysis showed that these binding differences correlated with in vivo binding and gene expression differences, leading to a proposed model in which pluripotency genes are regulated by FoxP1-ES binding to FoxP1-ES-specific sites, while differentiation genes are regulated by FoxP1 binding to FoxP1-specific sites. Thus, protein isoforms can have unique DNA-binding preferences that enable regulation of functionally distinct cellular programs. PBM analyses of other alternative splice variants will likely lead to similar connections between splicing, DNA binding and TF diversification.
DNA BINDING OF PROTEIN COMPLEXES
The DNA-binding specificity of individual TFs is a central determinant of their binding throughout the genome [59, 70]; however, TFs often bind DNA and function as multi-protein complexes [43, 58, 71]. Furthermore, it has been shown that protein complexes can exhibit novel DNA-binding specificities not evident from measurements of the individual proteins [14, 21]. Therefore, to determine the role of multi-protein complexes in target gene specification (i.e. multi-protein recognition codes [58]), HT methods are needed to characterize the DNA binding of protein complexes. Here we review several studies in which PBMs have been successfully used to characterize the DNA binding of protein complexes.
Functional diversity by alternative dimerization
Heterodimerization with multiple partners is a powerful mechanism to diversify TF functionality. In a study examining the network of bHLH TFs in C. elegans, PBMs were used to characterize the DNA-binding specificity of many bHLH heterodimers and to study the ability of dimerization partners to diversify TF DNA-binding specificity [38]. Fourteen bHLH factors in C. elegans can form heterodimers with the promiscuous HLH-2 protein. DNA-binding profiles for nine of these HLH-2 heterodimers revealed many dimer-specific binding preferences. For example, both the HLH-2:HLH-4 and the HLH-2:HLH-10 dimers recognized the core 5′-CACCTG-3′; but only HLH-2:HLH-10 recognized sites with the alternate core 5′-CATATG-3′ (Figure 3C). Many similar dimer-specific differences were observed between the heterodimers, highlighting the role of HLH-2 dimerization partners to alter its binding specificity. This study focused on obligate heterodimers of factors within the same structural class (i.e. all were bHLHs). However, there are also numerous examples of dimers involving proteins from different structural classes that can bind cooperatively to composite DNA binding elements, such as IRF4 (IRF domain) and Jun-BATF (bZIP dimer) binding to the composite AP-1-IRF element 5′-TCAnTCAGAAA-3′ (IRF site underlined) [72]. HT methods to evaluate the DNA binding of the different classes of dimers (i.e. obligate and opportunistic) will provide considerable insight into the role of protein complexes in gene regulatory specificity and the diversification of TF function.
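Consensus elements such as the composite AP-1-IRF element 5′-TCAnTCAGAAA-3′ can be located in a DNA sequence by translating the degenerate bases into a regular expression and scanning both strands. The sketch below is a minimal illustration of that idea; the helper names and the toy sequences are hypothetical, and only the subset of IUPAC codes listed in the table is handled.

```python
import re

# Subset of IUPAC nucleotide codes (n = any base, w = A or T, and so on).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def scan(sequence, consensus):
    """Return sorted (start, strand) hits of a consensus on both strands.

    `start` is the 0-based forward-strand coordinate of the leftmost base of
    the hit. A lookahead is used so that overlapping hits are reported.
    """
    body = "".join(IUPAC[base] for base in consensus.upper())
    pattern = re.compile(f"(?=({body}))")
    width = len(consensus)  # each consensus symbol matches exactly one base
    seq = sequence.upper()
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        hits.append((len(seq) - (m.start() + width), "-"))
    return sorted(hits)


# Hypothetical toy sequences containing one copy of the composite element.
forward_example = "GGTCAGTCAGAAACC"
reverse_example = reverse_complement("TCAATCAGAAA")

print(scan(forward_example, "TCAnTCAGAAA"))  # [(2, '+')]
print(scan(reverse_example, "TCAnTCAGAAA"))  # [(0, '-')]
```

The same helper can be used with other degenerate cores mentioned in this review; for example, the ETS core 5′-GGA(A/T)-3′ corresponds to the consensus string `GGAW`.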
Figure 3: Binding specificity of multi-protein complexes. (A) Median PBM probe fluorescence intensities (y-axis) for Met4 binding to 673 genomic Cbf1 sites in the presence of Cbf1 only (left-hand panel), or Cbf1 and Met28 (right-hand panel). X-axis coordinates are Kd values for Cbf1 binding to the respective sites. Cartoons in each panel represent the complexes being assayed. A subset of Met4 ‘recruitment sites’ (sites that mediate enhanced binding of Met4:Met28:Cbf1 complexes [21]; see logo in (B)) is highlighted (red dots) to illustrate that protein complex binding correlates with distinct sequence features. (B) Binding logo determined for Met4 recruitment sites [21]. (C) Comparison of HLH-2:HLH-10 and HLH-2:HLH-4 heterodimer 8-mer binding profiles. Binding sites containing specific 6-mers are highlighted. (A colour version of this figure is available online at: http://bfg.oxfordjournals.org).
Recruitment of non-DNA-binding cofactors
Recruitment of transcriptional cofactors that lack inherent DNA-binding capacity (i.e. non-DNA-binding cofactors) is an important aspect of transcriptional regulation [43]. Furthermore, recruitment of non-DNA-binding cofactors can occur in a DNA-site-specific manner and adds an additional layer of potential specificity to target gene regulation [58]. HT methods of measuring the DNA-binding specificity of protein complexes that involve recruited cofactors will be instrumental in dissecting these multi-protein recognition codes in gene regulatory networks. In a study focused on dissecting the regulatory specificity of yeast sulfur metabolism genes, we demonstrated that PBMs could be used to examine the DNA specificity of cofactor recruitment [21]. Furthermore, this approach allowed us to uncover a previously unknown coregulatory motif necessary for efficient assembly of a trimeric regulatory complex on DNA. Regulation of sulfur metabolism genes in yeast requires that Cbf1, a bHLH DNA-binding TF, recruits two non-DNA-binding factors, Met4 and Met28, to the promoters of target genes. Using a hybrid PBM/surface plasmon resonance (SPR) approach, we determined the equilibrium binding affinities of Cbf1 to binding sites throughout the genome. Cbf1-binding sites in the promoters of sulfur metabolism-related target genes were not particularly high affinity; therefore, we asked whether the target gene specificity resulted from additional specificity conferred by the multi-protein Met4:Met28:Cbf1 regulatory complex. PBMs were used to measure the recruitment of Met4 to several hundred Cbf1 sites derived from the genome (Figure 3). Met4 was recruited only when all three proteins were present in the binding reaction (i.e. an obligatory trimeric interaction), and assembled most strongly on specialized DNA sites found in the promoters of sulfur metabolism genes (Figure 3).
Cbf1 binds to 5′-CACGTG-3′-type E-box sequences; however, Met4:Met28:Cbf1 binding was enhanced when an additional 5′-RYAAT-3′ ‘recruitment motif’ (R = purine, Y = pyrimidine) was present immediately adjacent to the Cbf1 site: 5′-RYAATnnCACGTG-3′ (Figure 3). The importance of this co-occurring motif to gene expression was validated by reporter gene experiments in vivo. Based on sequence similarity between Met28 and the mammalian bZIP protein C/EBPβ, we proposed a model in which the non-DNA-binding factors Met4 and Met28 play an active role in recognizing the 5′-RYAAT-3′ recruitment motif, but are not able to do so with sufficient affinity in the absence of Cbf1. This study highlights the need to analyze the DNA-binding specificity of multi-protein complexes to understand gene regulatory logic, and the utility of HT approaches, such as the PBM, for these studies.
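As a concrete illustration of the composite site described above, the following Python sketch scans a DNA sequence on both strands for the degenerate pattern 5′-RYAATnnCACGTG-3′ by translating IUPAC codes into a regular expression. It is a minimal sketch under stated assumptions, not the method used in [21]: the names `motif_to_regex` and `find_sites` are hypothetical, the example sequence is invented, and a simple exact pattern match ignores the graded affinities that PBM and SPR measurements capture.

```python
import re

# Subset of IUPAC nucleotide codes needed for the motif in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any base
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif (case-insensitive input) into a regex."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def find_sites(seq: str, motif: str = "RYAATNNCACGTG"):
    """Return (start, strand, site) for every motif match on either strand.

    `start` is the 0-based position of the site on the forward strand;
    `site` is written 5'-to-3' on the strand where the motif matched.
    """
    seq = seq.upper()
    pattern = motif_to_regex(motif)
    n, k = len(seq), len(motif)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(n - k + 1):
            if pattern.fullmatch(s, i, i + k):
                start = i if strand == "+" else n - k - i
                hits.append((start, strand, s[i:i + k]))
    return hits


if __name__ == "__main__":
    # Invented example: one composite site embedded in flanking bases.
    print(find_sites("TTGCAATGGCACGTGAA"))
```

Both strands are scanned because the composite site, unlike the palindromic CACGTG E-box on its own, is not its own reverse complement, so the orientation of the recruitment motif relative to the E-box is informative.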
SUMMARY
Characterizing the DNA-binding specificity of proteins, and multi-protein complexes, is a critical component of efforts to understand specificity in gene regulation. HT methods for measuring protein–DNA binding have significantly improved our ability to characterize the DNA-binding specificity of TFs. The considerable increase in the number of protein–DNA measurements that can be made has qualitatively changed our understanding of TF binding by revealing a much more complex binding specificity landscape than previously appreciated. This has been most clearly seen in the comparison of TF homologs, where both shared and TF-specific binding preferences have been observed for many TF classes [20, 23, 31–38, 49, 53, 54]. Furthermore, studies have shown that these factor-specific binding preferences correlate with in vivo binding and functional differences [33, 54]. Comparative analysis of DNA-binding profiles from HT methods, such as PBMs, provides a powerful approach to identify functionally relevant differences between TFs and study the impact of TF-binding specificity in gene regulation.
HT methods have also provided tremendous insight into mechanisms of specificity. The ability to monitor how protein changes (e.g. residue mutations) affect the binding specificity landscape over thousands of different DNA sequences (e.g. Figure 2) provides a powerful approach to interrogate mechanisms of specificity. The application of HT methods to examine effects of protein alterations on DNA binding will likely reveal new insights into mechanisms of specificity, such as the role of alternate binding modes [21, 73] or protein isoforms [69]. One area that has not yet been widely studied using HT methods, but which may prove fruitful, is the role of posttranslational modifications (e.g. phosphorylation) in TF binding specificity. Future studies using HT methods to study mechanisms of specificity promise to provide a clearer picture of both the role of protein mutations in gene regulation and disease, and the evolution of TFs and gene regulatory networks.
Finally, we have discussed how PBMs have been used to study the DNA binding of multi-protein complexes, leading to insights regarding the role of promiscuous binding partners [38] and non–DNA-binding cofactors [21]. Other HT methods, such as HT-SELEX, have also been used to study protein complexes [14]. Multi-protein complexes greatly increase the potential binding diversity of TFs and provide an important means to integrate signals in the cell. Applying HT methods to studies of DNA binding by protein complexes will likely lead to many new insights regarding the role of multi-protein interactions in gene regulatory specificity. Integrating these types of data sets with in vivo binding and gene expression data will lead to more sophisticated models of specificity in gene regulation.
Key points.
High-throughput (HT) methods, such as protein-binding microarrays (PBMs), have greatly improved our ability to characterize DNA binding of TFs.
Transcription factor homologs exhibit common and factor-specific DNA-binding preferences that explain common and unique functions in vivo.
PBMs provide a powerful platform to associate protein alterations with changes in DNA-binding specificity.
PBMs can be used to study the DNA binding of multi-protein complexes.
FUNDING
This work was supported by NIH grant K22AI093793 to T.S.
Biographies
Kellen Andrilenas is a PhD student in the Biology Department at Boston University.
Ashley Penvose is a PhD student in the Biology Department at Boston University.
Trevor Siggers is an Assistant Professor in the Biology Department at Boston University.
References
- 1. Berger MF, Philippakis AA, Qureshi AM, et al. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat Biotechnol. 2006;24:1429–35. doi: 10.1038/nbt1246.
- 2. Bulyk ML, Gentalen E, Lockhart DJ, et al. Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat Biotechnol. 1999;17:573–7. doi: 10.1038/9878.
- 3. Mukherjee S, Berger MF, Jona G, et al. Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat Genet. 2004;36:1331–9. doi: 10.1038/ng1473.
- 4. Linnell J, Mott R, Field S, et al. Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res. 2004;32:e44. doi: 10.1093/nar/gnh042.
- 5. Field S, Udalova I, Ragoussis J. Accuracy and reproducibility of protein-DNA microarray technology. Adv Biochem Eng Biotechnol. 2007;104:87–110. doi: 10.1007/10_2006_035.
- 6. Bonham AJ, Neumann T, Tirrell M, et al. Tracking transcription factor complexes on DNA using total internal reflectance fluorescence protein binding microarrays. Nucleic Acids Res. 2009;37:e94. doi: 10.1093/nar/gkp424.
- 7. Tantin D, Gemberling M, Callister C, et al. High-throughput biochemical analysis of in vivo location data reveals novel distinct classes of POU5F1(Oct4)/DNA complexes. Genome Res. 2008;18:631–9. doi: 10.1101/gr.072942.107.
- 8. Warren CL, Kratochvil NC, Hauschild KE, et al. Defining the sequence-recognition profile of DNA-binding molecules. Proc Natl Acad Sci USA. 2006;103:867–72. doi: 10.1073/pnas.0509843102.
- 9. Hu S, Xie Z, Onishi A, et al. Profiling the human protein-DNA interactome reveals ERK2 as a transcriptional repressor of interferon signaling. Cell. 2009;139:610–22. doi: 10.1016/j.cell.2009.08.037.
- 10. Zykovich A, Korf I, Segal DJ. Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res. 2009;37:e151. doi: 10.1093/nar/gkp802.
- 11. Wong D, Teixeira A, Oikonomopoulos S, et al. Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol. 2011;12:R70. doi: 10.1186/gb-2011-12-7-r70.
- 12. Zhao Y, Granas D, Stormo GD. Inferring binding energies from selected binding sites. PLoS Comput Biol. 2009;5:e1000590. doi: 10.1371/journal.pcbi.1000590.
- 13. Jolma A, Kivioja T, Toivonen J, et al. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res. 2010;20:861–73. doi: 10.1101/gr.100552.109.
- 14. Slattery M, Riley T, Liu P, et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell. 2011;147:1270–82. doi: 10.1016/j.cell.2011.10.053.
- 15. Nutiu R, Friedman RC, Luo S, et al. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat Biotechnol. 2011;29:659–64. doi: 10.1038/nbt.1882.
- 16. Maerkl SJ, Quake SR. A systems approach to measuring the binding energy landscapes of transcription factors. Science. 2007;315:233–7. doi: 10.1126/science.1131007.
- 17. Li JJ, Herskowitz I. Isolation of ORC6, a component of the yeast origin recognition complex by a one-hybrid system. Science. 1993;262:1870–4. doi: 10.1126/science.8266075.
- 18. Meng X, Smith RM, Giesecke AV, et al. Counter-selectable marker for bacterial-based interaction trap systems. Biotechniques. 2006;40:179–84. doi: 10.2144/000112049.
- 19. Noyes MB, Meng X, Wakabayashi A, et al. A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system. Nucleic Acids Res. 2008;36:2547–60. doi: 10.1093/nar/gkn048.
- 20. Franco-Zorrilla JM, Lopez-Vidriero I, Carrasco JL, et al. DNA-binding specificities of plant transcription factors and their potential to define target genes. Proc Natl Acad Sci USA. 2014;111:2367–72. doi: 10.1073/pnas.1316278111.
- 21. Siggers T, Duyzend MH, Reddy J, et al. Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol Syst Biol. 2011;7:555. doi: 10.1038/msb.2011.89.
- 22. Puckett JW, Muzikar KA, Tietjen J, et al. Quantitative microarray profiling of DNA-binding molecules. J Am Chem Soc. 2007;129:12310–9. doi: 10.1021/ja0744899.
- 23. Jolma A, Yan J, Whitington T, et al. DNA-binding specificities of human transcription factors. Cell. 2013;152:327–39. doi: 10.1016/j.cell.2012.12.009.
- 24. Maerkl SJ, Quake SR. Experimental determination of the evolvability of a transcription factor. Proc Natl Acad Sci USA. 2009;106:18650–5. doi: 10.1073/pnas.0907688106.
- 25. Meng X, Brodsky MH, Wolfe SA. A bacterial one-hybrid system for determining the DNA-binding specificity of transcription factors. Nat Biotechnol. 2005;23:988–94. doi: 10.1038/nbt1120.
- 26. Liu X, Lee CK, Granek JA, et al. Whole-genome comparison of Leu3 binding in vitro and in vivo reveals the importance of nucleosome occupancy in target site selection. Genome Res. 2006;16:1517–28. doi: 10.1101/gr.5655606.
- 27. Liu X, Noll DM, Lieb JD, et al. DIP-chip: rapid and accurate determination of DNA-binding specificity. Genome Res. 2005;15:421–7. doi: 10.1101/gr.3256505.
- 28. Shumaker-Parry JS, Aebersold R, Campbell CT. Parallel, quantitative measurement of protein binding to a 120-element double-stranded DNA array in real time using surface plasmon resonance microscopy. Anal Chem. 2004;76:2071–82. doi: 10.1021/ac035159j.
- 29. Campbell CT, Kim G. SPR microscopy and its applications to high-throughput analyses of biomolecular binding events and their kinetics. Biomaterials. 2007;28:2380–92. doi: 10.1016/j.biomaterials.2007.01.047.
- 30. Bonham AJ, Wenta N, Osslund LM, et al. STAT1: DNA sequence-dependent binding modulation by phosphorylation, protein: protein interactions and small-molecule inhibition. Nucleic Acids Res. 2013;41:754–63. doi: 10.1093/nar/gks1085.
- 31. Badis G, Berger MF, Philippakis AA, et al. Diversity and complexity in DNA recognition by transcription factors. Science. 2009;324:1720–3. doi: 10.1126/science.1162327.
- 32. Berger MF, Badis G, Gehrke AR, et al. Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell. 2008;133:1266–76. doi: 10.1016/j.cell.2008.05.024.
- 33. Wei GH, Badis G, Berger MF, et al. Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J. 2010;29:2147–60. doi: 10.1038/emboj.2010.106.
- 34. Badis G, Chan ET, van Bakel H, et al. A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol Cell. 2008;32:878–87. doi: 10.1016/j.molcel.2008.11.020.
- 35. Zhu C, Byers KJ, McCord RP, et al. High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res. 2009;19:556–66. doi: 10.1101/gr.090233.108.
- 36. Gordan R, Murphy KF, McCord RP, et al. Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome Biol. 2011;12:R125. doi: 10.1186/gb-2011-12-12-r125.
- 37. De Silva EK, Gehrke AR, Olszewski K, et al. Specific DNA-binding by apicomplexan AP2 transcription factors. Proc Natl Acad Sci USA. 2008;105:8393–8. doi: 10.1073/pnas.0801993105.
- 38. Grove CA, De Masi F, Barrasa MI, et al. A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell. 2009;138:314–27. doi: 10.1016/j.cell.2009.04.058.
- 39. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
- 40. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science. 2001;291:1304–51. doi: 10.1126/science.1058040.
- 41. Tan K, McCue LA, Stormo GD. Making connections between novel transcription factors and their DNA motifs. Genome Res. 2005;15:312–20. doi: 10.1101/gr.3069205.
- 42. Hollenhorst PC, McIntosh LP, Graves BJ. Genomic and biochemical insights into the specificity of ETS transcription factors. Annu Rev Biochem. 2011;80:437–71. doi: 10.1146/annurev.biochem.79.081507.103945.
- 43. Levine M, Tjian R. Transcription regulation and animal diversity. Nature. 2003;424:147–51. doi: 10.1038/nature01763.
- 44. Wolberger C. Multiprotein-DNA complexes in transcriptional regulation. Annu Rev Biophys Biomol Struct. 1999;28:29–56. doi: 10.1146/annurev.biophys.28.1.29.
- 45. Wang J, Lu J, Gu G, et al. In vitro DNA-binding profile of transcription factors: methods and new insights. J Endocrinol. 2011;210:15–27. doi: 10.1530/JOE-11-0010.
- 46. Stormo GD, Zhao Y. Determining the specificity of protein-DNA interactions. Nat Rev Genet. 2010;11:751–60. doi: 10.1038/nrg2845.
- 47. Slattery M, Zhou T, Yang L, et al. Absence of a simple code: how transcription factors read the genome. Trends Biochem Sci. 2014;39:381–99. doi: 10.1016/j.tibs.2014.07.002.
- 48. Berger MF, Bulyk ML. Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nat Protoc. 2009;4:393–411. doi: 10.1038/nprot.2008.195.
- 49. Siggers T, Chang AB, Teixeira A, et al. Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-kappaB family DNA binding. Nat Immunol. 2012;13:95–102. doi: 10.1038/ni.2151.
- 50. Bolotin E, Liao H, Ta TC, et al. Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology. 2010;51:642–53. doi: 10.1002/hep.23357.
- 51. Gordan R, Shen N, Dror I, et al. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep. 2013;3:1093–104. doi: 10.1016/j.celrep.2013.03.014.
- 52. Udalova IA, Mott R, Field D, et al. Quantitative prediction of NF-kappa B DNA-protein interactions. Proc Natl Acad Sci USA. 2002;99:8167–72. doi: 10.1073/pnas.102674699.
- 53. Wong D, Teixeira A, Oikonomopoulos S, et al. Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol. 2011;12:R70. doi: 10.1186/gb-2011-12-7-r70.
- 54. Fang B, Mane-Padros D, Bolotin E, et al. Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors. Nucleic Acids Res. 2012;40:5343–56. doi: 10.1093/nar/gks190.
- 55. Orenstein Y, Shamir R. A comparative analysis of transcription factor binding models learned from PBM, HT-SELEX and ChIP data. Nucleic Acids Res. 2014;42:e63. doi: 10.1093/nar/gku117.
- 56. Matthews BW. Protein-DNA interaction. No code for recognition. Nature. 1988;335:294–5. doi: 10.1038/335294a0.
- 57. Pabo CO, Nekludova L. Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? J Mol Biol. 2000;301:597–624. doi: 10.1006/jmbi.2000.3918.
- 58. Siggers T, Gordan R. Protein-DNA binding: complexities and multi-protein codes. Nucleic Acids Res. 2013;42:2099–111. doi: 10.1093/nar/gkt1112.
- 59. Bulyk ML. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003;5:201. doi: 10.1186/gb-2003-5-1-201.
- 60. Leonard DA, Rajaram N, Kerppola TK. Structural basis of DNA bending and oriented heterodimer binding by the basic leucine zipper domains of Fos and Jun. Proc Natl Acad Sci USA. 1997;94:4913–8. doi: 10.1073/pnas.94.10.4913.
- 61. Morin B, Nichols LA, Holland LJ. Flanking sequence composition differentially affects the binding and functional characteristics of glucocorticoid receptor homo- and heterodimers. Biochemistry. 2006;45:7299–306. doi: 10.1021/bi060314k.
- 62. Garvie CW, Wolberger C. Recognition of specific DNA sequences. Mol Cell. 2001;8:937–46. doi: 10.1016/s1097-2765(01)00392-6.
- 63. Rohs R, Jin X, West SM, et al. Origins of specificity in protein-DNA recognition. Annu Rev Biochem. 2010;79:233–69. doi: 10.1146/annurev-biochem-060408-091030.
- 64. Rohs R, West SM, Sosinsky A, et al. The role of DNA shape in protein-DNA recognition. Nature. 2009;461:1248–53. doi: 10.1038/nature08473.
- 65. Orenstein Y, Mick E, Shamir R. RAP: accurate and fast motif finding based on protein-binding microarray data. J Comput Biol. 2013;20:375–82. doi: 10.1089/cmb.2012.0253.
- 66. Siggers T, Reddy J, Barron B, et al. Diversification of transcription factor paralogs via noncanonical modularity in C2H2 zinc finger DNA binding. Mol Cell. 2014;55:1–9. doi: 10.1016/j.molcel.2014.06.019.
- 67. Blencowe BJ. Alternative splicing: new insights from global analyses. Cell. 2006;126:37–47. doi: 10.1016/j.cell.2006.06.023.
- 68. Chen M, Manley JL. Mechanisms of alternative splicing regulation: insights from molecular and genomics approaches. Nat Rev Mol Cell Biol. 2009;10:741–54. doi: 10.1038/nrm2777.
- 69. Gabut M, Samavarchi-Tehrani P, Wang X, et al. An alternative splicing switch regulates embryonic stem cell pluripotency and reprogramming. Cell. 2011;147:132–46. doi: 10.1016/j.cell.2011.08.023.
- 70. Pique-Regi R, Degner JF, Pai AA, et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 2011;21:447–55. doi: 10.1101/gr.112623.110.
- 71. Johnson AD. Molecular mechanisms of cell-type determination in budding yeast. Curr Opin Genet Dev. 1995;5:552–8. doi: 10.1016/0959-437x(95)80022-0.
- 72. Singh H, Glasmacher E, Chang AB, et al. The Molecular Choreography of IRF4 and IRF8 with Immune System Partners. Cold Spring Harb Symp Quant Biol. 2014;78:101–4. doi: 10.1101/sqb.2013.78.020305.
- 73. Nakagawa S, Gisselbrecht SS, Rogers JM, et al. DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc Natl Acad Sci USA. 2013;110:12349–54. doi: 10.1073/pnas.1310430110.
