Biophysical Journal. 2022 Aug 17;121(20):3987–4000. doi: 10.1016/j.bpj.2022.08.015

Flexibility of flanking DNA is a key determinant of transcription factor affinity for the core motif

Debostuti Ghoshdastidar, Manju Bansal
PMCID: PMC9674967  PMID: 35978548

Abstract

Selective gene regulation is mediated by recognition of specific DNA sequences by transcription factors (TFs). The extremely challenging task of searching out specific cognate DNA binding sites among several million putative sites within the eukaryotic genome is achieved by complex molecular recognition mechanisms. Elements of this recognition code include the core binding sequence, the flanking sequence context, and the shape and conformational flexibility of the composite binding site. To unravel the extent to which DNA flexibility modulates TF binding, in this study, we employed experimentally guided molecular dynamics simulations of ternary complexes of the closely related Hox heterodimers Exd-Ubx and Exd-Scr with DNA. Results demonstrate that flexibility signatures embedded in the flanking sequences impact TF binding at the cognate binding site. A DNA sequence has intrinsic shape and flexibility features. While shape features are localized, our analyses reveal that flexibility features of the flanking sequences percolate several basepairs and allosterically modulate TF binding at the core. We also show that lack of flexibility in the motif context can render the cognate site resistant to protein-induced shape changes and subsequently lower TF binding affinity. Overall, this study suggests that flexibility-guided DNA shape, and not merely the static shape, is a key unexplored component of the complex DNA-TF recognition code.

Significance

Transcription factors (TFs) precisely perform an extremely complex task of recognizing target DNA binding sites amid millions of putative motifs within the genome. Elements of this recognition code are not only found within the core motif but also in the flanking sequences that are not directly contacted by the TF. In this work, we suggest that a key unexplored element of this recognition code is “DNA flexibility” by which DNA flanks, lying several basepairs away, can allosterically modulate the shape space traversed by the core motif. Our microsecond-long molecular dynamics simulation studies offer mechanistic insights into TF recognition of DNA, successfully explaining in vitro DNA binding affinity differences observed for homeodomain TFs amid putative binding sites.

Introduction

The mechanism underlying recognition of specific DNA sequences by transcription factors (TFs) has been investigated for over five decades. Yet no simple recognition code has been identified by which TFs select a subset of consensus motifs among millions of putative binding sites in the genome (1,2). Decades of research have in fact revealed the complexity of the in vivo genome association landscape of TFs. Apart from the primary sequence of the target DNA, chromatin accessibility, epigenetic modifications, shape of the DNA, and shape complementarity between the TF binding site (TFBS) and its partner are all known to modulate the high specificity of DNA recognition by TFs (2, 3, 4, 5, 6, 7, 8, 9, 10). A fundamental first step toward better understanding of this complex landscape is in vitro assays of DNA binding by TFs reviewed in (11, 12, 13). Developments in high-throughput in vitro binding assay techniques have paved the way for looking beyond the target sequence while elucidating TF-DNA recognition and binding specificities (14, 15, 16, 17). These studies have revealed that the context of the target DNA, ranging from bases immediately flanking TFBSs to higher-order level (e.g., DNA looping), can alter the DNA-TF recognition code (15,18, 19, 20, 21, 22, 23, 24).

Motifs lying distal from the cognate site generally aid in DNA:TF recognition by facilitating genome scanning and directing the TF toward its ideal binding site (8,25,26). However, the role of bases immediately flanking the cognate site (up to 2–3 nucleotides) cannot be generalized and appears to be unique to each TF (2,23,24,27,28). For some TFs, shape of the flanking DNA modulates binding specificity while for some others the presence of homotypic clusters flanking the binding motif confers additional specificity on DNA recognition akin to the cognate site itself (21,23,26,29, 30, 31, 32). For a comprehensive understanding of the role of flanking DNA on TF binding affinity, we recently investigated nine representative TFs from the three largest eukaryotic TF families—zinc finger, basic leucine zipper and homeodomain (28). DNA binding affinity was correlated with nine different nonredundant structural features of the flank—encompassing DNA shape, flexure, and stability. For the first time, flexibility of the flanking DNA emerged as a common modulator of TF-DNA binding across all representative TFs.

The significance of DNA flexibility in protein binding events is well established (33, 34, 35, 36, 37, 38, 39, 40, 41). The physiologically significant B-family of DNA exhibits high structural and conformational heterogeneity, where the same sequence exists in an ensemble of metastable states that are thermodynamically distinct from each other (42, 43, 44). The role of DNA flexibility in TF recognition has been extensively researched in the context of DNA looping by cofactor binding, DNA bending upon nucleosome positioning and severe deformation of DNA induced by protein binding (2,40,45, 46, 47, 48, 49). The more subtle role of flexure of sequences immediately flanking the cognate site, however, is still not clear. The structural dynamics of flanking DNA must be probed at single-nucleotide resolution to decipher its role in TF recognition, which is a challenging task for conventional structural biophysical techniques. In this regard molecular dynamics (MD) simulations have proven to be a powerful atomistic probe of intricate internal motions in DNA that contribute to its flexibility (50, 51, 52, 53). In this study we employ atomistic MD simulations to unravel the mechanism by which the flanking DNA alters binding specificity of two closely related Hox TF pairs Ultrabithorax (Ubx)-Extradenticle (Exd) and Sex combs reduced (Scr)-Exd for the cognate site. Upon altering the sequence of flanking DNA, based on in vitro high-throughput SELEX data, the conformational ensemble explored by the core binding site is significantly modulated. While the ensembles are very similar in apo-DNA, they exhibit striking differences in the presence of the Hox TF pair. Subsequently, the ensemble composition guides the final conformer selected by the TF, suggesting that flexibility signatures embedded in the flanking sequence allow TFs to distinguish the ideal cognate site from all putative sequences.

Materials and methods

Rationale for selection of model system

In vivo, a TF encounters an ensemble of structural variants explored by its binding site and finally binds a variant with the most compatible shape. Hence, as a model system we chose the Hox TF family, since its preference for binding a cognate site with a prominent shape feature, a narrow minor groove, is well established (54, 55, 56, 57, 58, 59). Moreover, the role of flanking sequence in modulating DNA binding specificity of Ubx and Scr homeodomains, in association with the cofactor Exd, is also reported (28). Thus, the Exd-Hox heterodimer offers a reasonable model system to explore the mechanism by which flanking sequences modulate TF binding at the cognate site. To investigate the role of flanking DNA in modulating TF binding affinity to the cognate site, we simulated Exd-Ubx in complex with six 20-mer DNA sequences (Table 1) and Exd-Scr with one 20-mer DNA sequence. All simulated DNA oligomers contained the 8-mer core binding site for Ubx (TGATTTAT) or Scr (AGATTAAT), flanked by varying 4-mers along with GC-capped termini. The sequences of the flanks were selected based on our earlier study, where we reported the binding affinity of Exd-Hox to 12-mer sequences containing the cognate site along with either the 5′ flank (5′-NNNN(A/T)GATT(A/T)AT-3′) or the 3′ flank (5′-(A/T)GATT(A/T)ATNNNN-3′) (Fig. S1 a). Combinations of high (H) and low (L) affinity flanks were used to prepare sequences for the simulation (Table 1). Based on the binding affinity of the two flanks, each sequence was named using a two-letter code, where the first letter indicates experimental binding affinity due to the 5′ flank and the second letter indicates affinity due to the 3′ flank. For example, the LH sequence bound by Exd-Ubx is composed of the core motif TGATTTAT flanked by a low (L) affinity 5′ flank and a high (H) affinity 3′ flank. It is important to note that, owing to the presence of the cognate site TGATTTAT in all the sequences studied here, Exd-Ubx binds to all these sequences, but the affinity of binding is clearly affected by changing the flanks (Fig. S1 a). Similarly, Exd-Scr was simulated in complex with the LL1 motif (5′-AAAAAGATTAATAAAA-3′) based on its affinity preferences (Fig. S1 b).

Table 1.

The 20-mer DNA sequences used for simulation in complex with Exd-Ubx homeodomain transcription factor

Sequence (5′→3′); numbers following each base give its position in the 16-mer.

Sl. no.  System  5′ cap  5′ flank (1–4)  Core motif (5–12)      3′ flank (13–16)  3′ cap
1        HH      GC      C1C2G3A4        T5G6A7T8T9T10A11T12    G13G14C15C16      GC
2        LH      GC      C1C2C3C4        T5G6A7T8T9T10A11T12    G13G14C15C16      GC
3        LH1     GC      A1A2A3A4        T5G6A7T8T9T10A11T12    G13G14C15C16      GC
4        HL      GC      C1C2G3A4        T5G6A7T8T9T10A11T12    A13A14A15A16      GC
5        LL      GC      C1C2C3C4        T5G6A7T8T9T10A11T12    A13A14A15A16      GC
6        LL1     GC      A1A2A3A4        T5G6A7T8T9T10A11T12    A13A14A15A16      GC

All sequences comprise the core motif preferentially bound by the homeodomain, flanked by varying 4-mer sequences on the 5′ and 3′ ends. All sequences were capped with a basepaired GC dinucleotide to prevent end fraying during the simulation. H, high-affinity flank; L, low-affinity flank; e.g., LH, low (L) affinity 5′ flank and high (H) affinity 3′ flank. Each sequence is written, 5′→3′, as GC cap, 5′ flank, core motif, 3′ flank, GC cap.
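The composition rule behind Table 1 (GC cap, 4-mer 5′ flank, 8-mer core, 4-mer 3′ flank, GC cap) can be sketched in a few lines of Python. This is only an illustration of how the six 20-mers are assembled; the function and variable names are hypothetical and are not part of the original workflow, which used Nucgen-Plus to build the duplexes.

```python
# Illustrative sketch: assemble the 20-mers of Table 1 from their parts.
# Names (CAP, FLANKS, build_20mer) are hypothetical, chosen for this example.
CAP = "GC"               # basepaired GC dinucleotide cap on both termini
CORE_UBX = "TGATTTAT"    # 8-mer core motif bound by Exd-Ubx

# (5' flank, 3' flank) for each system, as listed in Table 1.
FLANKS = {
    "HH": ("CCGA", "GGCC"),
    "LH": ("CCCC", "GGCC"),
    "LH1": ("AAAA", "GGCC"),
    "HL": ("CCGA", "AAAA"),
    "LL": ("CCCC", "AAAA"),
    "LL1": ("AAAA", "AAAA"),
}


def build_20mer(flank5, flank3, core=CORE_UBX, cap=CAP):
    """Return cap + 5' flank + core + 3' flank + cap, checking the length."""
    seq = cap + flank5 + core + flank3 + cap
    if len(seq) != 20:
        raise ValueError(f"expected a 20-mer, got {len(seq)} nt: {seq}")
    return seq


SEQUENCES = {name: build_20mer(f5, f3) for name, (f5, f3) in FLANKS.items()}

if __name__ == "__main__":
    for name, seq in SEQUENCES.items():
        print(f"{name:4s} {seq}")
```

In every sequence built this way the core motif occupies the same positions (string indices 6 to 13), so any difference between systems comes from the flanks alone.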

Molecular modeling of Exd-Hox heterodimer

The Exd-Ubx heterodimer binds the consensus site TGATTTAT with high specificity. Among all crystal structures of the Exd-Ubx-DNA ternary complex available in the Protein Data Bank (PDB: 1b8i, 4cyc, 4uus), the highest resolution complex (PDB: 1b8i, resolution = 2.4 Å) was used as the starting structure for MD simulation setup (60). The complex comprises the Exd and Ubx homeodomains bound in a head-to-tail orientation to the consensus DNA sequence 5′-TGATTTAT-3′ (Fig. 1 a). Several conserved Exd-Ubx interfacial residues, which are known to be crucial for its DNA-binding specificity (60, 61, 62, 63), are missing in the selected crystal structure (PDB: 1b8i) and were modeled (highlighted in Fig. 1 a) to ensure that simulations are carried out with starting structures that replicate accurate intermolecular interactions. The missing N- and C-termini of Ubx were modeled by homology modeling with the closely related homeodomain Scr using the UCSF Chimera molecular modeling software package (64). The six Exd-DNA-Ubx ternary complexes modeled in this study were each simulated for 500 ns, summing up to a total simulation time of 3 μs, thereby providing reliable prototypes for corroborating experimental evidence of the Exd-Ubx interface. Modeling of the Exd-DNA-Scr complex was carried out in a similar way and is described in the supporting material.

Figure 1.

The conserved Exd-Ubx binding interface. (a) Recognition and binding of DNA consensus motif 5′-TGATTTAT-3′ by Exd (green)-Ubx (ice blue) heterodimer. The recognition helices of Exd and Ubx (indicated by ) bind to opposite major groove faces of the DNA consensus site. The Ubx NTA binds to the minor groove of the consensus site. The crystal structure of Exd-DNA-Ubx was obtained from the PDB (1b8i). Missing regions critical for DNA-protein and protein-protein interactions were modeled (highlighted in opaque shade). DNA core motif (TGATTTAT) is in off-white and the 4-mer flanks in gray. (b) Interfacing residues between Exd and Ubx are depicted in surface mode and their (c) percentage occurrence during the simulation period is depicted using the Exd-HH-Ubx complex as a representative example. All interactions were calculated for the last 400 ns of the simulation time. CTE, C-terminal extension; H1/2/3, homeodomain 1/2/3; TALE, three-amino acid loop extension; HX, hexapeptide; NTA, N-terminal arm. To see this figure in color, go online.

Preparing Exd-Hox-DNA complex

To analyze the effect of flanking sequences on Exd-Hox:DNA binding, DNA duplexes with different sequences (Table 1) were generated using the Nucgen-Plus software suite with Bolshoy parameters (65,66). All sequences were capped with a GC dinucleotide to prevent end fraying from affecting the structural properties of the motif of interest. The 20-mer DNA duplexes were energy minimized and simulated in a TIP3P water box for 50 ns before using the structure to create the Exd-Hox-DNA complexes. The simulated sequence was aligned to the DNA duplex in the crystal structure (PDB: 1b8i) using the Needleman-Wunsch algorithm incorporated in the Chimera molecular modeling program (64). Following successful alignment, the DNA duplex from the crystal structure was replaced by the designed DNA sequence. Thereafter, the naked DNA simulations were extended up to at least 200 ns for each sequence. The same protocol was followed to generate Exd-Ubx-DNA complexes containing the different DNA sequences listed in Table 1 as well as the Exd-Scr-DNA complex.

MD simulation details

Each Exd-Hox-DNA complex was placed at the center of a rectangular TIP3P water box with periodic boundary conditions in all directions. The box dimensions were chosen such that any DNA/protein atom was at least 15 Å away from the box surface, preventing unwanted interactions with its image in translated unit cells. The required number of Na+ and Cl− counterions was added to first neutralize the systems and then attain a physiological salt concentration of 150 mM. All simulations were performed using the pmemd CUDA version of the Amber14 MD suite (67). The Amber OL15 force field for DNA, which incorporates parmbsc0 along with dihedral (beta, epsilon, zeta, chi) corrections, and the ff14SB force field for protein, were adopted. The solvated Exd-Hox-DNA complexes were equilibrated using an alternating heating and cooling protocol. The protocol involved energy minimization of the systems using 2500 cycles of steepest descent followed by 2500 cycles of conjugate gradient algorithms. This was followed by rapid heating of the system to 298 K followed by rapid minimization. This alternating heating and cooling protocol enables optimal equilibration of the solvent and ions around the biomolecular complex. The above protocol was repeated twice, reducing the restraint on the solute from 10 to 5 kcal/mol. Following equilibration of the solvent, the restraint on the solute was released and the system was gradually heated up to 298 K before carrying out a production MD simulation run of 500 ns for each solvated complex. All simulations were performed using a 2-fs time step, and snapshots were saved from the simulation for analysis every 2 ps. To enable volume variation, simulations were performed in an NPT ensemble using the Berendsen thermostat and barostat. SHAKE was used to constrain bond lengths between heavy atoms and hydrogens. Analyses of MD trajectories were carried out using in-house codes, the NUPARM software suite (65), and the cpptraj module in Amber 18 (67).
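The run parameters quoted above translate into simple bookkeeping that is useful when planning storage and analysis. The following Python sketch computes the number of integration steps and saved snapshots for one 500-ns production run, and gives a rough volume-based estimate of the ion pairs needed for 150 mM salt. The helper names and the example box volume are assumptions made for illustration; the actual box sizes are not stated in the text, and the estimate ignores the neutralizing counterions and the volume excluded by the solute.

```python
# Illustrative bookkeeping for the MD setup described above.
# Function names and the example box volume are hypothetical.
AVOGADRO = 6.02214076e23  # particles per mole


def md_bookkeeping(total_ns=500.0, dt_fs=2.0, save_ps=2.0):
    """Return (integration steps, steps between snapshots, saved snapshots)."""
    steps = round(total_ns * 1e6 / dt_fs)           # 1 ns = 1e6 fs
    steps_per_frame = round(save_ps * 1e3 / dt_fs)  # 1 ps = 1e3 fs
    frames = steps // steps_per_frame
    return steps, steps_per_frame, frames


def ion_pairs_for_salt(box_volume_A3, molar=0.150):
    """Rough count of Na+/Cl- pairs giving `molar` mol/L in the box volume.

    Assumption: uses the full box volume and ignores neutralizing ions.
    """
    litres = box_volume_A3 * 1e-27  # 1 cubic angstrom = 1e-27 L
    return round(molar * litres * AVOGADRO)


if __name__ == "__main__":
    steps, stride, frames = md_bookkeeping()
    print(f"steps per run: {steps}, save every {stride} steps, frames: {frames}")
    # Hypothetical cubic box of 100 A per side, used only as an example.
    print("ion pairs for 150 mM:", ion_pairs_for_salt(100.0 ** 3))
```

With a 2-fs step and a 2-ps save interval, each 500-ns run corresponds to 250 million steps and 250,000 snapshots, so the six Exd-Ubx complexes together yield 1.5 million frames for analysis.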

Results

The conserved Exd-Ubx interface

The Exd and Ubx proteins elegantly juxtapose to bind the 8-mer cognate sequence TGATTTAT with high specificity and affinity (Fig. 1 a). As illustrated in Fig. 1 b and c, Exd and Ubx interface through three different regions: 1) the HX motif of Ubx with the TALE pocket of Exd, 2) the linker region of Ubx with the C-terminal extension of Exd, and 3) the UbdA domain of Ubx with the H2 domain of Exd (60,61,63,68). The six systems simulated here, differing only in the flanking sequences, showed almost identical Exd-Ubx interfacial interactions. In addition to the well-explored interaction between the YPWM (or HX) motif of Ubx and a conserved hydrophobic pocket in Exd, we identified that Phe and His residues located N-terminally to the HX motif form hydrophobic interactions with Ile63 and Gln67 of Exd in all the simulated systems. In addition, Lys58, Gln61, and Glu65 of the Ubx H3 and UbdA domains form an interface with residues 42 to 46 of the Exd homeodomain (Gly42-Ile43-Thr44-Val45-Ser46). The importance of these interactions in effective recognition of the DNA consensus by the Exd-Ubx heterodimer has been described in earlier structural and mutational studies. Apart from the above interfacial interactions that are conserved in all systems, interactions between the linker region of Ubx and residues C-terminal to the HD region of Exd are also observed in all systems. However, the residues involved in these interactions differed in each of the simulated systems, possibly because both the linker region and the Exd C-terminus are highly disordered.

The dynamic Ubx-DNA binding interface

Unlike the Exd-Ubx interface, which was almost conserved across all the simulated systems, the Ubx-DNA binding interface was found to vary with the sequence of the flanks. Fig. 2 presents the time evolution of binding of the N-terminal arm (NTA) with the core DNA site in all six simulated systems. The NTA inserts into the DNA minor groove in all systems, but the dynamics of its interaction with DNA and the trajectory of the NTA vary significantly. The inherently dynamic NTA is confined in the minor groove by two forces: 1) conformational stabilization of the NTA upon Exd-Ubx binding and 2) binding affinity of the NTA for the core DNA. Since the Exd-Ubx interfacial interactions are largely identical in all the simulated systems (Fig. 1), we turned our attention to deciphering the basis for the altered affinity of the NTA for the cognate DNA site in the simulated systems.

Figure 2.

The flexible Ubx-DNA binding interface. Time evolution of binding of the Ubx N-terminal arm (NTA) with the cognate DNA motif (TGATTTAT) in all six simulated systems. The NTA is color coded based on the simulation time. To illustrate NTA:DNA binding dynamics, snapshots of MD runs were saved every 10 ns between 100 and 500 ns. The 20-mer DNA sequences used for simulation in complex with Exd-Ubx homeodomain are listed (see Table 1 for details). To see this figure in color, go online.

Same cognate sequence, different binding affinities

For a more detailed understanding of the NTA-DNA interaction, we determined residue-wise interaction energy between the residues constituting the NTA and the core region where it binds (sense strand: 5′-A7T8T9T10A11T12 N13N14-3′; and antisense strand: 5′-N′14N′13A′12T′11A′10A′9A′8T′7-3′) (Fig. S2). We could broadly categorize the interacting NTA residues into two types: those that penetrate the minor groove and therefore interact with both the DNA strands (boxed in the figure) and those that interact with the phosphate backbone of either strand. In all systems, residues penetrating the minor groove were primarily arginines, which have high affinity for the AT-rich electronegative minor groove of the cognate site. Among all arginines, Arg5 was found to interact with the minor groove in all six sequences. This is consistent with earlier structural and biochemical studies where the interaction of Arg5 of the NTA with the core DNA binding site was shown to be highly conserved among all Hox TFs in Drosophila (60,69,70).

Available crystal structures for DNA-bound complexes of Hox TFs have all shown the insertion of Arg5 at the YN step of the cognate sequence T5G6A7Y8N9N10A11Y12 (54,55,59,60). Our simulations reveal that the guanidino-N6 of Arg5 of the Ubx NTA can form a H-bond with O2 of thymine or N3 of adenine at basepairs 8, 9, or 10, occasionally forming a cross-strand bridge between A-N3 and T-O2. To capture this dynamic interaction, we slightly relaxed the definition of a H-bond to an N-H-O/N angle cutoff of 120° and an N-H … O/N distance cutoff of 3.5 Å. Since the core sequence occurs in all six DNA oligomers studied here, the conserved Arg5-mediated DNA-NTA interaction was present in all systems (Fig. 3). However, both the persistence and the specific site of interaction varied significantly. In the high-affinity HH and LH systems the well-studied Arg5-mediated T9-O2:A8′-N3 bridge was observed (T5G6A7T8T9T10A11T12) (54,55,60). However, in the low-affinity HL and LL systems, Arg5:DNA bridge formation was highly delayed and remained poorly persistent, indicating loss of binding affinity. The most significant loss of interaction, however, was observed for the LH1 and LL1 systems. Arg5 did not insert at the expected T8T9 basepair step in these systems, inserting instead at the adjacent T9T10 step, indicating loss of binding specificity. Such altered specificity has only been observed when the core DNA binding sequence is modified (54,55). However, in this study the core sequence was identical in all systems (Fig. 2). Clearly, altering the flanking sequence alone has a profound effect on both affinity and specificity of DNA-Ubx binding.
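The relaxed geometric criterion described above (angle cutoff of 120°, distance cutoff of 3.5 Å) can be expressed as a small per-frame test, and the persistence of an interaction as the fraction of frames that pass it. The Python sketch below is a generic illustration, not the analysis code used in this study. It assumes that the distance is measured between the donor and acceptor heavy atoms and that the angle is the donor-hydrogen-acceptor angle; the text does not specify these conventions, and the function names are hypothetical.

```python
import math

# Illustrative geometric H-bond test with the relaxed cutoffs quoted in the text.
# Assumptions: distance = donor...acceptor heavy-atom distance (angstrom),
# angle = donor-H...acceptor angle (degrees). Names are hypothetical.


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=120.0):
    """Return True if the donor-H...acceptor geometry satisfies both cutoffs."""
    dist = _norm(_sub(acceptor, donor))
    v1 = _sub(donor, hydrogen)      # vector from H to donor
    v2 = _sub(acceptor, hydrogen)   # vector from H to acceptor
    cos_angle = sum(a * b for a, b in zip(v1, v2)) / (_norm(v1) * _norm(v2))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    angle = math.degrees(math.acos(cos_angle))
    return dist <= max_dist and angle >= min_angle


def occupancy(flags):
    """Fraction of frames in which the H-bond is present."""
    flags = list(flags)
    return sum(1 for f in flags if f) / len(flags)


if __name__ == "__main__":
    # Toy coordinates (angstrom), not taken from any simulation.
    linear = is_hbond((0, 0, 0), (1, 0, 0), (2.9, 0, 0))
    bent = is_hbond((0, 0, 0), (1, 0, 0), (1, 1.5, 0))
    print(linear, bent, occupancy([linear, bent, True, False]))
```

Applying such a test to each saved snapshot gives a binary time series per donor-acceptor pair, which is the kind of presence/absence record plotted as dots in a persistence diagram.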

Figure 3.

Persistence of H-bond between Arg5 (R5) of Ubx and the DNA cognate sequence (T5G6A7T8T9T10A11T12) during the entire simulation period in all simulated systems. The guanidino group of Arg5 can H-bond with T-O2 (black) and with A-N3 (green). Simultaneous occurrence of black and green dots indicates formation of an R5-mediated H-bond bridge between T-O2 and A-N3. H-bonds between the cognate site and other arginines (R2 and R10) of the N-terminal arm are indicated in brown, as described in the schematic. An N-H-O/N angle cutoff of 120° and an N-H … O/N distance cutoff of 3.5 Å were used to capture the dynamic R5:DNA interaction. Bases in the antisense strand are indicated by (′). To see this figure in color, go online.

Shape preferences of the cognate site

It is well established that the conserved Arg5 of the Hox NTA identifies its preferred site of binding, the T8T9 step, by recognizing a typical DNA shape feature—a narrow stretch of minor groove formed by an A tract at the core of the consensus motif T5G6A7T8T9T10A11T12 (54, 55, 56, 57, 58, 59). Any alteration in this DNA shape profile is known to alter binding specificity of Hox TFs. The narrow groove width of the Hox DNA binding sites is evident from crystal structures of the TF-bound complexes. However, crystal structures of the corresponding naked DNA do not exist. Hence, no structural evidence exists to show if the different DNA binding sites pre-exist in unique structural variants.

To understand if the core sequence inherently possesses a typical minor groove width profile, we elucidated the shape space explored by the free DNA from simulations of all six DNA sequences. In agreement with an earlier study, our simulations confirmed that variations in minor groove width show maximum dependence on the local translation parameter slide and the rotational parameter roll (71). Hence, we represented the shape space traversed by the cognate DNA (T5G6A7T8T9T10A11T12) in terms of variation in minor groove width as a function of slide. Figs. 4 a and S3 depict that every basepair step within the core sequence samples a wide shape space, with some of them exhibiting strong preferences for certain structural variants. For example, the canonical site of Arg5 binding—the T8T9 step—inherently prefers a groove width that is ∼2 Å lower than the minor groove width at the neighboring steps. Moreover, within the cognate site, structural preferences of the same dinucleotide step were context dependent. For example, AT steps A7T8 and A11T12 in T5G6A7T8T9T10A11T12 exhibit starkly different shapes. Most importantly, the structural preference of the cognate site in free DNA was not impacted by the flanking sequence. Hence, in all six systems the core TFBS sampled near-identical shape space despite being flanked by different sequences (HH and HL sequences shown as representative examples in Figs. 4 a and S3).
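The shape-space representation used here, minor groove width as a function of slide with each cell weighted by its fraction of occurrence, amounts to a normalized two-dimensional histogram over trajectory frames. The following Python sketch shows one way to compute it for a single basepair step. It is a minimal illustration under stated assumptions: the bin widths are arbitrary choices, the function name is hypothetical, and the input values in the example are toy numbers rather than simulation output.

```python
import math
from collections import Counter

# Illustrative 2D "shape space" for one basepair step: fraction of frames
# falling in each (slide, minor groove width) bin.
# Bin widths and names are assumptions made for this example.


def shape_space(slide, groove_width, slide_bin=0.5, width_bin=1.0):
    """Return {(slide_bin_start, width_bin_start): fraction_of_frames}."""
    if len(slide) != len(groove_width) or not slide:
        raise ValueError("need two non-empty series of equal length")
    counts = Counter(
        (math.floor(s / slide_bin), math.floor(w / width_bin))
        for s, w in zip(slide, groove_width)
    )
    n_frames = len(slide)
    return {
        (i * slide_bin, j * width_bin): c / n_frames
        for (i, j), c in counts.items()
    }


if __name__ == "__main__":
    # Toy per-frame values (angstrom), not simulation data.
    slide = [-0.2, -0.3, 0.1, 0.6]
    width = [3.2, 3.9, 5.5, 5.1]
    for cell, frac in sorted(shape_space(slide, width).items()):
        print(cell, frac)
```

Computing this map separately for each step of the core motif, and for each flank variant, allows the preferred structural variants to be compared directly between sequences.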

Figure 4.

2D shape space explored by DNA in apo and complexed forms. (a) Structural sampling by the cognate site of apo-HH DNA (CCGAT5G6A7T8T9T10A11T12GGCC) shown for selected basepair steps. Fraction of occurrence of a particular structural variant is determined from the entire simulation period of 200 ns for apo DNA. The dotted line is used to guide the eye to perceive the minor groove width profile of the consensus site. Time evolution of shape transition of the (b) HH and (c) HL DNA in complex with Exd-Ubx heterodimer for basepair steps that are involved in binding of Ubx R5 residue and their immediate neighbors (T5G6A7T8T9T10A11T12). The basepair step in question is highlighted in boldface/underline. To see this figure in color, go online.

DNA shape: Rigid or flexible?

Unlike in free DNA, the 2D shape space explored by the core binding site in complexed DNA varied significantly when its 3′ flanking sequence was changed from the high-affinity GGCC flank (HH and LH, Figs. 4 b and S4 a) to the low-affinity AAAA flank (HL and LL, Figs. 4 c and S4 b). The differences are illustrated for basepairs that are involved in Arg5 binding (T8T9 and T9T10, see Fig. 3) and their immediate neighbors (T5G6A7T8T9T10A11T12). As evident from Fig. 4 b, in the core binding site of the HH sequence the two consecutive TT steps transit to a low minor groove width rapidly and in a concerted fashion at <100 ns, closely followed by the neighboring T10A11 and A11T12 steps. On the other hand, in the low-affinity HL sequence the TT steps show a concerted, but highly delayed, structural transition to the low minor groove width regime at ∼350 ns (Fig. 4 c). Moreover, unlike in HH DNA, the neighboring T10A11 and A11T12 steps in HL DNA transit to a higher groove width before the transition of the TT steps. Similar differences were observed between the LH and LL systems as well (Fig. S4). Thus, the same cognate DNA, which traverses a similar shape space in the apo form, chooses to assume different shapes in the protein bound form depending on the sequence of the DNA flanking the cognate site. This finding substantiates that the double helix does not remain restricted to a predisposed shape but can be flexed to sample diverse structural variants (42).

Since the cognate sequence sampled identical shape space in the apo form but diverse shape space in the protein-bound form, we examined whether these shape variations were induced by protein binding. In the HH and HL sequences, narrowing of minor groove width at basepair step T8T9 occurs in close conjunction with insertion of Arg5 at this basepair step (see Figs. 3 and 4, b and c). Thus, apparently Arg5 binds to the T8T9 step in HH-DNA at <100 ns, thereby causing an early narrowing of groove width, whereas it binds the T8T9 step of HL-DNA at ∼350 ns, thereby delaying groove width narrowing. However, this does not hold true across all systems. In LH, LH1, and LL1-DNA, Arg5 insertion and groove width narrowing are not correlated (Figs. 3, S4 a, and 5 a, respectively). Thus, shape transitions are not solely protein induced. On the contrary, in all systems, shape transition at the T8T9 step is correlated with that of the neighboring basepair steps. The T8T9 step and its immediate neighboring T9T10 step always transit in a concerted fashion to the lower groove width regime. The time of this concerted transition is closely dependent on the structural variation of the neighboring TA and AT steps (T5G6A7T8T9T10A11T12). For example, in sequences where the cognate site is flanked by an A tract on its 3′ end (HL, LL, and LL1), the TA and AT steps transit to a wide minor groove width variant early in the simulation. With its neighboring steps assuming a wide minor groove width, narrowing of groove width at the T8T9 step is delayed in these low-affinity sequences (see Figs. 4 c, S4 b, and 5 b). Reduced binding of Arg5 with these sequences is a consequence of this delayed groove width narrowing and not vice versa (Fig. 3). Conversely, when the cognate site is flanked on its 3′ end with a GGCC sequence, the TA and AT steps do not appear to have any shape preferences and merely follow the T8T9/T9T10 steps to transit to a narrow groove width variant. Thus, sequences flanking the TFBS, lying several basepairs away from the site of interaction, appear to allosterically influence shape of the core DNA motif and subsequently TF binding affinity.
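Comparing when different basepair steps switch to the narrow-groove regime requires an operational definition of a transition time. One simple possibility, sketched below in Python, is the first time at which the minor groove width falls below a threshold and remains there for several consecutive samples. This is an assumed heuristic for illustration only: the threshold, the hold length, the function name, and the toy traces are all hypothetical and are not the criteria or data of this study.

```python
# Illustrative heuristic for locating a groove-narrowing transition in a
# per-step time series. Threshold and hold length are assumed values.


def transition_time(times_ns, widths, threshold, hold=3):
    """First time at which `widths` drops below `threshold` and stays below
    it for `hold` consecutive samples; returns None if that never happens."""
    run = 0
    for i, w in enumerate(widths):
        run = run + 1 if w < threshold else 0
        if run == hold:
            return times_ns[i - hold + 1]
    return None


if __name__ == "__main__":
    # Synthetic toy traces sampled every 50 ns (not simulation data).
    times = list(range(0, 500, 50))
    early = [6, 6, 3, 3, 3, 3, 3, 3, 3, 3]
    late = [6, 6, 6, 3, 6, 6, 6, 3, 3, 3]
    print(transition_time(times, early, threshold=4.0))
    print(transition_time(times, late, threshold=4.0))
```

Requiring the width to stay below the threshold for several samples avoids counting a brief excursion, such as the single low value in the second toy trace, as a transition. Two steps can then be called concerted if their transition times agree within a chosen tolerance.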

Figure 5.

Time evolution of structural transition of the (a) LH1 and (b) LL1 DNA in complex with Exd-Ubx heterodimer for basepair steps that are involved in binding of Ubx R5 residue and their immediate neighbors (T5G6A7T8T9T10A11T12). The basepair step in question is highlighted in boldface/underline. To see this figure in color, go online.

Allosteric flexing of DNA shape

The shape space scanned by the cognate site is allosterically modulated by the A tract that flanks the cognate sequence on its 3′ end in the HL, LL, and LL1 systems (5′-TGATTTATAAAA-3′). Despite possessing the ideal core site for TF binding, these sequences are unable to flex to the appropriate protein-induced shape (Figs. 4 c, S4 b, and 5 b). The canonical Arg5 binding TT step in these sequences is flanked by a highly deformable TA step followed by an extremely rigid A tract. When present next to the rigid A tract, the already flexible TA step further becomes a point of large conformational fractures, especially in protein-bound complexes (72,73). For example, the widely studied TATA-box binding protein binds to the TA step at the minor groove of the TATA box motif (TATAa/tAa/tN), flexing the TA step to an almost A-DNA-like conformation (also known as the TA-DNA) with large positive roll, positive slide, and low twist.

The TATAAAAA subsequence of HL, LL, and LL1 DNA studied here closely resembles the TATA-box motif. However, unlike TATA-box binding protein, the alpha helix of Ubx binds to the major groove edge of DNA, restricting the TA step to a disfavored low roll, narrow major groove conformation for several hundred nanoseconds. A similar conformational stress at the TA step is not observed for the HH/LH sequences, clearly indicating that, in protein-bound complex, the conformation of the deformable TA step is context dependent. As shown in Fig. 6 a for HL DNA, this conformational stress at the TA step is released with the disruption of a highly conserved H-bond between A11 of the consensus motif (TATAAAAA) and Asn51 of the Ubx α3 recognition helix at ∼370 ns. This Asn:Adn H-bond is crucial to cognate site recognition by Ubx, and its disruption leads to weakening of the Exd-DNA-Ubx complex, as indicated by the comparatively increased dynamics of bound Ubx in the low-affinity systems (Fig. S5). Nonetheless, disruption of the Asn:Adn H-bond permits the TA step to freely assume its preferred large positive roll conformation (Fig. 6 a), leading to an increase in interphosphate distance on the major groove edge, signifying a widening of the major groove (Fig. 6 a). This release in conformational stress at the TA step enables the neighboring TT steps to rapidly transform to a narrow minor groove width variant (Fig. 4 c) and the canonical H-bond between Arg5 and the T8T9 step is formed (Figs. 6 d and 3 d: 370 ns). Thus, DNA shape at the canonical TT binding site of the Exd-Ubx TF heterodimer is modulated allosterically by the flexibility of flanking sequences lying several basepairs away from it. DNA shape can therefore be allosterically flexed.

Figure 6.

Sequence of events leading up to the formation of the canonical Arg5:DNA H-bond in the HL system (CCGAT5G6A7T8T9T10A11T12AAAA). The TA step in HL is held in an unfavorable conformation until (a) the disruption of a highly conserved H-bond between A11 of the consensus motif (TATAAAAA) and Asn51 of the Ubx α3 recognition helix (inset) at ∼370 ns (b) permits the TA step to freely assume its preferred large positive roll conformation, (c) leading to an increase in interphosphate distance on the major groove edge and finally enabling the neighboring TT steps to rapidly transform to a narrow minor groove width conformation to (d) form the canonical H-bond with Arg5. (a) and (b) are truncated at 400 ns since no changes were observed in these parameters beyond this time point. The interphosphate distance was calculated using the NUPARM suite (64). All other calculations were carried out using cpptraj v18.01 (66). To see this figure in color, go online.

Discussion

More than a decade has passed since the scientific community recognized that the presence of a consensus binding sequence alone is not sufficient to ensure TF binding. The focus has since shifted to the role of combinatorial interactions, where both the cognate binding site and its context are believed to dictate TF recognition and binding (8). The cognate TFBS presents base-specific chemical signatures and an appropriate three-dimensional shape that are conducive to binding a specific TF partner (21,49). But elucidating the role of the context is not simple, since the sequences flanking the cognate site do not engage in base-specific interactions with the TF. Hence, a significant body of work has focused on the shape of the flanking sequence and its role in modulating TF binding at the core (19,21,23,74). Results have pointed toward a possible role of GC/AT-rich flanks, since these present prominent shape features such as wide/narrow groove width and high/low propeller twist. However, shape features are localized and do not extend their influence beyond the immediate neighboring step. Hence, the mechanism by which the shape of the flank can impact binding at the core motif, which lies several basepairs away, has remained unclear.

Flanking DNA flexibility—A key element of the DNA-TF recognition code

To study the role of the sequence context on DNA recognition by TFs, it was imperative to study a TF that searches beyond the sequence alone to recognize a cognate motif by its shape. The widely studied homeodomain TF Exd-Ubx therefore became the system of choice. A change in the shape of the cognate site drastically alters the DNA binding abilities of this model Hox heterodimer (54, 55, 56, 57). In this study we employed, for the first time, an alternate strategy of retaining the shape of the cognate site and merely altering the sequence of its flanking context, which is not even contacted by the TF. Under the influence of the varying flanks, the same sequence traversed a wide range of shapes, termed here the “shape space.” Interestingly, this DNA shape space is discretized only in the presence of the TF, while remaining a continuum of indistinguishable shapes in its absence. Our analysis revealed that the preferred shape of the cognate sequence is directly correlated with the flexibility of the flanking DNA. Evidently, between sequence and shape there exists a moderator: flexibility. The right cognate sequence can be induced into the wrong shape by flexibility signatures lying outside of the cognate site, in the flanking region. Modulated by flexibility signatures in the context, the core sequence samples a different shape space. The composition of the shape ensemble in turn dictates TF binding affinity for the core sequence.

Flexibility gone wrong—Inaccurate flexibility leads to nonspecific TF binding

The significance of flanking DNA flexibility for TF binding is further underscored by two of our six simulated systems, LH1 and LL1, where the conserved Arg5 of Ubx fails to insert at the canonical T8T9 step and instead inserts at the T9T10 bp step of the cognate site (T5G6A7T8T9T10A11T12). As shown in Fig. 5, the groove width of the LH1 and LL1 sequences at the canonical binding site T8T9 and its neighboring T9T10 step converges to a narrow minor groove width variant, just like the rest of the systems. Despite this, Arg5 fails to insert at its ideal binding site in LH1 and LL1, reiterating that having the right shape may not be enough to ensure correct binding between a TF and its partner. The sequence feature that sets the LH1 and LL1 sequences apart from the rest of the simulated systems is the presence of an A tract flanking the 5′ end of the core TFBS. This A tract is not contacted directly by the Exd or Ubx homeodomains, hinting at its indirect impact on Ubx binding at the core sequence. The presence of a 5′ terminal A tract appears to impact Arg5 binding to LH1 and LL1 DNA in two ways: first, Arg5 fails to insert at the canonical T8T9 step, inserting instead at the T9T10 step; second, even at the T9T10 step, the Arg5:DNA H-bond is transient and weak.

The presence of the 5′-AAAA flank makes LH1/LL1 sequences asymmetric A tracts as shown below (only the A tract regions of the two sequences are marked for clarity):

LH1: 5′-AAAA------------
        -----------AAA----5′

LL1: 5′-AAAA-----------AAAA
        -----------AAA-----------5′

Asymmetric A tracts are known to be rigid. Experimentally, the rigidity of A tract sequences has been analyzed using multiple techniques. Among these, NMR studies have suggested that the rigidity of the A tract translates into rigidity in the hydration spine of sequences harboring A tracts (75, 76, 77, 78). Thus, as a measure of sequence rigidity, we tracked H-bond dynamics in the hydration spine of the DNA sequences in the apo and protein-bound forms. Persistence of the hydration spine was defined as the frequency of disruption of a cross-strand water bridge and its re-formation by a different water molecule. Fig. S6 compares the persistence of cross-strand H-bond bridges forming the minor groove hydration spine in the apo HH and LL1 sequences. Only those H-bond bridges that occurred in both sequences and had a lifetime of >20 ps (T5:A4ʹ, T8:T7ʹ, T9:A8ʹ, and T12:A1ʹ) were considered for probing the dynamics of the hydration spine of the cognate sequence T5G6A7T8T9T10A11T12. As evident from Fig. S6, all four water bridges were more persistent in the rigid LL1 DNA, with lifetimes extending up to several hundreds of picoseconds. The most striking difference was observed at the central T8:T7ʹ and T9:A8ʹ bridges. These bridges were highly persistent in LL1 but transient-to-absent in HH DNA. Hence, we tracked the dynamics of these central bridges as a measure of flexibility of the protein-bound LL1 and LH1 sequences in comparison with the high-affinity counterparts (like HH) (Fig. 7).
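The persistence measure described above can be illustrated with a minimal sketch. It assumes a hypothetical per-frame record of which water molecule bridges a given base pair (with -1 meaning no bridge) and a fixed frame spacing; the function name `bridge_lifetimes`, the encoding, and the convention that a run of n frames lasts n × spacing are illustrative assumptions, not the analysis pipeline used in this study.

```python
import numpy as np

def bridge_lifetimes(water_ids, dt_ps):
    """Lifetimes (ps) of contiguous runs in which the same water bridges a base pair.

    water_ids: one integer per frame; -1 means no bridging water (assumed encoding).
    dt_ps:     spacing between saved frames in picoseconds (assumed constant).
    """
    ids = np.asarray(water_ids)
    if ids.size == 0:
        return np.array([])
    # A run ends whenever the bridging water changes or the bridge breaks.
    change = np.flatnonzero(ids[1:] != ids[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [ids.size]))
    occupied = ids[starts] != -1          # discard runs without a bridge
    return (ends - starts)[occupied] * dt_ps

# Toy trajectory: water 7 bridges for 3 frames, leaves, returns for 1 frame,
# and is then replaced by water 9 for 2 frames.
lifetimes = bridge_lifetimes([-1, 7, 7, 7, -1, 7, 9, 9], dt_ps=10)
long_lived = lifetimes[lifetimes > 20]    # mirrors the >20 ps filter in the text
```

Tracking the water identity rather than a plain present/absent flag matters here, because replacement of one bridging water by another counts as a disruption under the definition given in the text.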

Figure 7.


Water-bridge dynamics as a measure of DNA flexibility. Persistence of water-mediated cross-strand H-bonds bridging (a–c) T8:T7ʹ and (d–f) T9:A8ʹ bases in the consensus site (5′-T5G6A7T8T9T10A11T12-3′) of HH, LH1, and LL1 sequences. “+” indicates the presence of a water-mediated H-bond bridge, i.e., a single water molecule simultaneously H-bonded to T8-O2 and T7′-O2 atoms (T8:T7ʹ bridge) or T9-O2 and A8′-N3 atoms (T9:A8ʹ). The presence of contiguous patches of + indicates highly persistent H-bond bridges (indicated by ). Discrete + symbols indicate H-bond bridges that are highly transient. Persistence of H-bond bridges increased with increasing number of A tracts (HH < LH1 < LL1) with lifetimes extending up to several tens of nanoseconds in the A tract containing sequences (LH1 and LL1). A schematic of the DNA duplex with cross-strand H-bond bridges (red lines) is given for reference. A stringent angle cut-off of 135° and distance cut-off of 3.0 Å between heavy atoms were used to define H-bonds. The H-bond bridges zip up the minor groove of LL1 DNA in a cooperative fashion. (g) Cross-correlation matrix of the step-wise minor groove width of the LL1-DNA shows positive correlation between the minor groove width of the central A tract (A9ʹ) and the 5′ A tract flank (A3). To see this figure in color, go online.
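The geometric H-bond criterion quoted in the caption can be sketched as follows. This is an illustrative check only: it assumes the angle is measured at the hydrogen (donor-H-acceptor) and that the distance is the donor-acceptor heavy-atom separation, and the function names `is_hbond` and `is_water_bridge` are hypothetical rather than taken from the software used in the study.

```python
import numpy as np

def is_hbond(donor, hydrogen, acceptor, max_dist=3.0, min_angle=135.0):
    """Geometric H-bond test on Cartesian coordinates given in angstroms.

    Assumed convention: distance between heavy atoms (donor, acceptor) and
    angle donor-H-acceptor measured at the hydrogen.
    """
    d, h, a = (np.asarray(p, dtype=float) for p in (donor, hydrogen, acceptor))
    heavy_dist = np.linalg.norm(a - d)
    v_hd, v_ha = d - h, a - h
    cos_t = np.dot(v_hd, v_ha) / (np.linalg.norm(v_hd) * np.linalg.norm(v_ha))
    angle = np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
    return bool(heavy_dist <= max_dist and angle >= min_angle)

def is_water_bridge(o, h1, h2, acc_a, acc_b, **cutoffs):
    """True if one water donates a distinct hydrogen to each of two acceptors."""
    return bool(
        (is_hbond(o, h1, acc_a, **cutoffs) and is_hbond(o, h2, acc_b, **cutoffs))
        or (is_hbond(o, h2, acc_a, **cutoffs) and is_hbond(o, h1, acc_b, **cutoffs))
    )
```

Passing `max_dist=3.5, min_angle=120.0` would apply looser cutoffs of the kind quoted for other interactions in this article, without changing the logic.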

In agreement with an earlier study, the lifetime of cross-strand water bridges was found to be higher in bound DNA versus in its free form, irrespective of the sequence (79). Interestingly, there appeared to be a competition between the Arg5:DNA H-bond and the water-mediated T8:T7ʹ and T9:A8ʹ H-bond bridges in the LH1 and LL1 sequences. As evident from Fig. 7, a–c, the T8:T7ʹ bridge and the Arg5:DNA H-bond (compare with Fig. 3) coexisted in the HH-DNA complex but were mutually exclusive in the LH1 and LL1 sequences. For example, in the LL1 system, in the first 100 ns of the simulation the presence of the Arg5-mediated T10:A9ʹ bridge (Fig. 3 f) competes out the water-mediated T8:T7ʹ bridge followed by the reverse at >100 ns (Fig. 7 c). Even more striking is the differential dynamics of the water-mediated T9:A8ʹ bridge (Fig. 7, d–f). The water-mediated T9:A8ʹ bridge in HH is transient (Fig. 7 b) since it is competed out by the highly persistent Arg5-mediated T9:A8ʹ bridge (Fig. 3 a). In the LH1 and LL1 sequences, however, the water-mediated bridges are highly persistent with lifetimes spanning several tens of nanoseconds. The lifetime of the water-mediated H-bond bridges increases with increasing number of flanking A tracts (HH < LH1 < LL1), indicating that rigidity increases with increasing number of flanking A tracts.
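One simple way to put a number on "coexisting" versus "mutually exclusive" interactions is to compare two per-frame presence series. The sketch below is only an illustration under stated assumptions: the inputs are hypothetical boolean series (one for the Arg5:DNA H-bond, one for a water bridge), and the Jaccard overlap is a metric chosen here for clarity, not one reported in the study.

```python
import numpy as np

def cooccupancy(series_a, series_b):
    """Summarize how often two per-frame interactions are present together.

    Returns occupancy of each interaction, the fraction of frames with both,
    and the Jaccard overlap (frames with both / frames with either).
    """
    a = np.asarray(series_a, dtype=bool)
    b = np.asarray(series_b, dtype=bool)
    if a.shape != b.shape or a.ndim != 1 or a.size == 0:
        raise ValueError("inputs must be non-empty 1D series of equal length")
    both = np.count_nonzero(a & b)
    either = np.count_nonzero(a | b)
    return {
        "occupancy_a": float(a.mean()),
        "occupancy_b": float(b.mean()),
        "fraction_both": both / a.size,
        # Near 0: mutually exclusive; near 1: the two interactions coexist.
        "jaccard": both / either if either else float("nan"),
    }

# Toy series: the two interactions never overlap, as in a competition scenario.
exclusive = cooccupancy([1, 1, 0, 0], [0, 0, 1, 1])
```

A Jaccard value close to zero alongside appreciable occupancy of both interactions would correspond to the competition pattern described above, whereas a value near one would correspond to coexistence.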

The rigid LH1 and LL1 DNA structures not only harbor a persistent hydration spine but, with its aid, undergo shape changes independent of protein binding. As shown in Fig. 7 g, a cross-correlation matrix of stepwise minor groove width of LL1 DNA shows a positive correlation between the minor groove width of the central A tract (5′-AAAATGATT9TATAAAA-3′) and that of the 5′ A tract (5′-AAA3ATGATTTATAAAA-3′). This indicates that, in LL1-DNA, a minor groove width narrowing initiates at the central A tract and cooperatively propagates to the 5′ A tract, independent of Ubx-Arg5 binding. We know from classical studies that the hydration spine can aid in such a zipping mechanism to close up the minor groove.
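A cross-correlation matrix of this kind can be sketched from a table of per-frame, per-step minor groove widths. The example assumes a hypothetical array with one row per trajectory frame and one column per basepair step, and it uses the Pearson correlation coefficient; the article does not state which correlation measure was used, so this choice is an assumption.

```python
import numpy as np

def groove_width_correlation(widths):
    """Pearson cross-correlation matrix between basepair steps.

    widths: array of shape (n_frames, n_steps) holding the minor groove
            width at each step for each frame (assumed layout).
    Returns an (n_steps, n_steps) matrix with entries in [-1, 1].
    """
    w = np.asarray(widths, dtype=float)
    if w.ndim != 2 or w.shape[0] < 2:
        raise ValueError("expected a 2D array with at least two frames")
    # rowvar=False treats each column (basepair step) as one variable.
    return np.corrcoef(w, rowvar=False)

# Toy data: steps 0 and 2 narrow and widen together, step 1 moves oppositely.
toy = [[1.0, 5.0, 2.0],
       [2.0, 4.0, 4.0],
       [3.0, 3.0, 6.0],
       [4.0, 2.0, 8.0]]
corr = groove_width_correlation(toy)
```

A positive off-diagonal entry means that the groove at two steps tends to narrow or widen in the same frames, which is the signature discussed above for the central and 5′ A tracts. A step whose width never changes would give undefined (NaN) entries, so such columns would need to be handled separately.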

Flanking DNA diminishes subtle shape differences between cognate sites of homologous TFs

A close homolog of the Ubx TF is the Scr homeodomain, which, in a similar heterodimeric form with the Exd cofactor, recognizes the cognate DNA site by its minor groove shape profile (55). While the Exd-Ubx heterodimer prefers a stretch of narrow groove width (TGATTTAT), Exd-Scr binds to a minor groove that is constricted at two different basepair steps (AGATTAAT). A conserved R5 is known to insert into the first pocket (GATT), while conserved residues R3 and His-12 insert into the second one (AATN) (55). The Hox proteins are extremely sensitive to the shape of their DNA binding partners, and Scr has been shown to bind poorly to the Ubx-motif due to their shape differences. Having demonstrated the significant ability of flanks to alter the shape of the cognate site, we asked a final question—what is the impact of flanking sequence on subtle shape differences between similar cognate motifs?

Scr and Ubx share similar flanking sequence preferences (Fig. S1 b) (28). Apo and complexed forms of a low-affinity Scr motif (5′-AAAAAGATTAATAAAA-3′), akin to the LL1 motif of Ubx, were simulated (supporting material). The apo LL1-DNA preferentially traversed the typical shape profile observed in the Exd-Scr-DNA crystal structure (Fig. 8 a) (55). However, this characteristic shape was lost in the complex, where the minor groove at the 3′ AT step (AGATTAAT) widened instead of undergoing the expected constriction (Fig. 8 b). This is because the highly flexible TA flank prevents the AT step from restricting to a low groove width profile. Surprisingly, the new shape attained by the Scr cognate site instead exhibits a striking resemblance to the shape of the Ubx motif, with both showing contiguous groove narrowing at the central basepair steps followed by increasing groove width toward the 3′ end (compare Fig. 8, b and c). As a result of this altered shape, the conserved His-12 of Scr loses interaction with the AATA pocket and is ejected early during the simulation (insets to Fig. 8 b). A similar loss of Scr-His:DNA interaction was observed when the Scr cognate site was replaced by the Ubx cognate site (55). Our results show that, by altering the flanking sequence alone, shape differences that enable closely related TFs to distinguish between their binding sites could be perturbed. Evidently, flanking sequences could significantly impact in vivo binding specificities of TFs that use subtle shape variations to distinguish between similar cognate sites.

Figure 8.


Flanking DNA sequence diminishes subtle shape distinctions between similar cognate sites. (a) Structural sampling by the cognate site of Scr LL1 DNA in the apo form shows preference for the characteristic minor groove width profile with groove constriction at two bp steps (AAAAA5G6A7T8T9A10A11T12AAAA). (b) The same shape profile is lost in the Exd-Scr-DNA complex, due to which the conserved His-12 residue of Scr is ejected from the groove within 25 ns of the simulation (inset). Conserved R3 and R5 residues continue to persist in H-bonding with respective bp steps (insets to bp step panels). In the complexed form with Exd-Scr, the cognate site instead exhibits stark resemblance to the groove width profile exhibited by (c) the DNA binding motif in the Exd-Ubx-DNA complex. An N-H-O/N angle cutoff of 120° and an N-H … O/N distance cutoff of 3.5 Å were used to capture the dynamic Arg (R3/R5):Thy-O2 interactions. Water-mediated interactions between His-12 and DNA were calculated using an angle cutoff of 135° and a distance cutoff of 3.5 Å (see Fig. 4 for all other details). To see this figure in color, go online.

Concluding remarks

Flexibility signatures in the motif context allosterically modulate the shape of the motif itself. Moreover, an inherent lack of flexibility in DNA flanks can make the core motif resistant to protein-induced shape changes. Our findings could offer a glimpse into DNA-binding events by TFs in vivo, where the shape space of the cognate motif can be modulated not only by the sequence of flanking DNA but also by cell type, environmental stimuli, or cofactor occupancy at neighboring binding sites (80). Lack of this key feature in TFBS search algorithms might explain the discrepancies observed between TFBS predicted in vitro and those observed in vivo (19,81).

Author contributions

Conceptualization, methodology, performance of simulations, data analysis, and manuscript preparation, D.G.D.; conceptualization, methodology, manuscript reviewing, and supervision, M.B.

Data availability

The authors declare that the data supporting the findings of this study are available within the article and its supporting material. The raw MD simulation trajectories can be obtained from the corresponding author (Manju Bansal) upon reasonable request.

Acknowledgments

The authors are grateful to Professor Aseem Z. Ansari and Dr Devesh Bhimsaria for useful discussions and critical comments that have contributed significantly to improving the manuscript. The authors are grateful to Professor Debasisa Mohanty and Professor Balasubramanian Gopal for providing computational resources. M.B. acknowledges the Indian National Science Academy for a Senior Scientist fellowship. The work was supported by a grant from the Ministry of Electronics and Information Technology (MeitY, project no.: CORP:DG:3191), Government of India, through the National Supercomputing Mission (NSM) program.

Declaration of interests

The authors declare no competing interests.

Editor: Tamar Schlick.

Footnotes

Supporting material can be found online at https://doi.org/10.1016/j.bpj.2022.08.015.

Supporting material

Document S1. Figures S1–S6
Document S2. Article plus supporting material

References

  • 1. Pabo C.O., Nekludova L. Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? J. Mol. Biol. 2000;301:597–624. doi: 10.1006/jmbi.2000.3918.
  • 2. Von Hippel P.H., Rees W.A., et al. Wilson K.S. Specificity mechanisms in the control of transcription. Biophys. Chem. 1996;59:231–246. doi: 10.1016/0301-4622(96)00006-3.
  • 3. Jones S., van Heyningen P., et al. Thornton J.M. Protein-DNA interactions: a structural analysis. J. Mol. Biol. 1999;287:877–896. doi: 10.1006/jmbi.1999.2659.
  • 4. Bai L., Morozov A.V. Gene regulation by nucleosome positioning. Trends Genet. 2010;26:476–483. doi: 10.1016/j.tig.2010.08.003.
  • 5. Zhu F., Farnung L., et al. Taipale J. The interaction landscape between transcription factors and the nucleosome. Nature. 2018;562:76–81. doi: 10.1038/s41586-018-0549-5.
  • 6. Yin Y., Morgunova E., et al. Taipale J. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science. 2017;356:eaaj2239. doi: 10.1126/science.aaj2239.
  • 7. Slattery M., Zhou T., et al. Rohs R. Absence of a simple code: how transcription factors read the genome. Trends Biochem. Sci. 2014;39:381–399. doi: 10.1016/j.tibs.2014.07.002.
  • 8. Dror I., Rohs R., Mandel-Gutfreund Y. How motif environment influences transcription factor search dynamics: finding a needle in a haystack. Bioessays. 2016;38:605–612. doi: 10.1002/bies.201600005.
  • 9. Inukai S., Kock K.H., Bulyk M.L. Transcription factor–DNA binding: beyond binding site motifs. Curr. Opin. Genet. Dev. 2017;43:110–119. doi: 10.1016/j.gde.2017.02.007.
  • 10. Shen N., Zhao J., et al. Gordan R. Divergence in DNA specificity among Paralogous transcription factors contributes to their differential in vivo binding. Cell Syst. 2018;6:470–483.e8. doi: 10.1016/j.cels.2018.02.009.
  • 11. Lambert S.A., Jolma A., et al. Weirauch M.T. The human transcription factors. Cell. 2018;172:650–665. doi: 10.1016/j.cell.2018.01.029.
  • 12. Levo M., Segal E. In pursuit of design principles of regulatory sequences. Nat. Rev. Genet. 2014;15:453–468. doi: 10.1038/nrg3684.
  • 13. Stormo G.D., Zhao Y. Determining the specificity of protein-DNA interactions. Nat. Rev. Genet. 2010;11:751–760. doi: 10.1038/nrg2845.
  • 14. Berger M.F., Philippakis A.A., et al. Bulyk M.L. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 2006;24:1429–1435. doi: 10.1038/nbt1246.
  • 15. Warren C.L., Kratochvil N.C.S., et al. Ansari A.Z. Defining the sequence-recognition profile of DNA-binding molecules. Proc. Natl. Acad. Sci. USA. 2006;103:867–872. doi: 10.1073/pnas.0509843102.
  • 16. Jolma A., Kivioja T., et al. Taipale J. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res. 2010;20:861–873. doi: 10.1101/gr.100552.109.
  • 17. Carlson C.D., Warren C.L., et al. Ansari A.Z. Specificity landscapes of DNA binding molecules elucidate biological function. Proc. Natl. Acad. Sci. USA. 2010;107:4544–4549. doi: 10.1073/pnas.0914023107.
  • 18. Jolma A., Yan J., et al. Taipale J. DNA-binding specificities of human transcription factors. Cell. 2013;152:327–339. doi: 10.1016/j.cell.2012.12.009.
  • 19. Cohen D.M., Lim H.W., Won K.J., Steger D.J. Shared nucleotide flanks confer transcriptional competency to bZip core motifs. Nucleic Acids Res. 2018;46:8371–8384. doi: 10.1093/nar/gky681.
  • 20. Bhimsaria D., Rodríguez-Martínez J.A., et al. Ansari A.Z. Specificity landscapes unmask submaximal binding site preferences of transcription factors. Proc. Natl. Acad. Sci. USA. 2018;115:E10586–E10595. doi: 10.1073/pnas.1811431115.
  • 21. Levo M., Zalckvar E., et al. Segal E. Unraveling determinants of transcription factor binding outside the core binding site. Genome Res. 2015;25:1018–1029. doi: 10.1101/gr.185033.114.
  • 22. Rodríguez-Martínez J.A., Reinke A.W., et al. Ansari A.Z. Combinatorial bZIP dimers display complex DNA-binding specificity landscapes. Elife. 2017;6:e19272-29. doi: 10.7554/eLife.19272.
  • 23. Gordân R., Shen N., et al. Bulyk M.L. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep. 2013;3:1093–1104. doi: 10.1016/j.celrep.2013.03.014.
  • 24. Zhang L., Martini G.D., et al. Pufall M.A. SelexGLM differentiates androgen and glucocorticoid receptor DNA-binding preference over an extended binding site. Genome Res. 2018;28:111–121. doi: 10.1101/gr.222844.117.
  • 25. Bansal M., Kumar A., Yella V.R. Role of DNA sequence based structural features of promoters in transcription initiation and gene expression. Curr. Opin. Struct. Biol. 2014;25:77–85. doi: 10.1016/j.sbi.2014.01.007.
  • 26. Kribelbauer J.F., Rastogi C., et al. Mann R.S. Low-affinity binding sites and the transcription factor specificity paradox in eukaryotes. Annu. Rev. Cell Dev. Biol. 2019;35:357–379. doi: 10.1146/annurev-cellbio-100617-062719.
  • 27. Chen Y., Bates D.L., et al. Chen L. DNA binding by GATA transcription factor suggests mechanisms of DNA looping and long-range gene regulation. Cell Rep. 2012;2:1197–1206. doi: 10.1016/j.celrep.2012.10.012.
  • 28. Yella V.R., Bhimsaria D., et al. Bansal M. Flexibility and structure of flanking DNA impact transcription factor affinity for its core motif. Nucleic Acids Res. 2018;46:11883–11897. doi: 10.1093/nar/gky1057.
  • 29. Crocker J., Abe N., et al. Stern D.L. Low affinity binding site clusters confer HOX specificity and regulatory robustness. Cell. 2015;160:191–203. doi: 10.1016/j.cell.2014.11.041.
  • 30. Kribelbauer J.F., Loker R.E., et al. Mann R.S. Context-dependent gene regulation by homeodomain transcription factor complexes revealed by shape-readout deficient proteins. Mol. Cell. 2020;78:152–167.e11. doi: 10.1016/j.molcel.2020.01.027.
  • 31. Afek A., Ilic S., et al. Akabayov B. DNA sequence context controls the binding and Processivity of the T7 DNA Primase. iScience. 2018;2:141–147. doi: 10.1016/j.isci.2018.03.019.
  • 32. Azad R.N., Zafiropoulos D., et al. Tullius T.D. Experimental maps of DNA structure at nucleotide resolution distinguish intrinsic from protein-induced DNA deformations. Nucleic Acids Res. 2018;46:2636–2647. doi: 10.1093/nar/gky033.
  • 33. Dlakić M., Harrington R.E. The effects of sequence context on DNA curvature. Proc. Natl. Acad. Sci. USA. 1996;93:3847–3852. doi: 10.1073/pnas.93.9.3847.
  • 34. Brukner I., Sánchez R., et al. Pongor S. Sequence-dependent bending propensity of DNA as revealed by DNase I: parameters for trinucleotides. EMBO J. 1995;14:1812–1818. doi: 10.1002/j.1460-2075.1995.tb07169.x.
  • 35. El Hassan M.A., Calladine C.R. Two distinct modes of protein-induced bending in DNA. J. Mol. Biol. 1998;282:331–343. doi: 10.1006/jmbi.1998.1994.
  • 36. Weber G., Essex J.W., Neylon C. Probing the microscopic flexibility of DNA from melting temperatures. Nat. Phys. 2009;5:769–773. doi: 10.1038/nphys1371.
  • 37. Travers A.A. The structural basis of DNA flexibility. Philos. Trans. A Math. Phys. Eng. Sci. 2004;362:1423–1438. doi: 10.1098/rsta.2004.1390.
  • 38. Bhattacharyya D., Kundu S., et al. Majumdar R. Sequence directed flexibility of dna and the role of cross-strand hydrogen bonds. J. Biomol. Struct. Dyn. 1999;17:289–300. doi: 10.1080/07391102.1999.10508362.
  • 39. Kalodimos C.G., Biris N., et al. Kaptein R. Structure and flexibility adaptation in nonspecific and specific protein-DNA complexes. Science. 2004;305:386–389. doi: 10.1126/science.1097064.
  • 40. Kitayner M., Rozenberg H., et al. Shakked Z. Diversity in DNA recognition by p53 revealed by crystal structures with Hoogsteen base pairs. Nat. Struct. Mol. Biol. 2010;17:423–429. doi: 10.1038/nsmb.1800.
  • 41. Zhang Y., Xi Z., et al. Crothers D.M. Predicting indirect readout effects in protein-DNA interactions. Proc. Natl. Acad. Sci. USA. 2004;101:8337–8341. doi: 10.1073/pnas.0402319101.
  • 42. Marathe A., Karandur D., Bansal M. Small local variations in B-form DNA lead to a large variety of global geometries which can accommodate most DNA-binding protein motifs. BMC Struct. Biol. 2009;9:24–26. doi: 10.1186/1472-6807-9-24.
  • 43. Völker J., Klump H.H., Breslauer K.J. DNA energy landscapes via calorimetric detection of microstate ensembles of metastable macrostates and triplet repeat diseases. Proc. Natl. Acad. Sci. USA. 2008;105:18326–18330. doi: 10.1073/pnas.0810376105.
  • 44. Mandelkern M., Elias J.G., et al. Crothers D.M. The dimensions of DNA in solution. J. Mol. Biol. 1981;152:153–161. doi: 10.1016/0022-2836(81)90099-1.
  • 45. Nelson H.C., Finch J.T., et al. Klug A. The structure of an oligo(dA)oligo(dT) tract and its biological implications. Nature. 1987;330:221–226. doi: 10.1038/330221a0.
  • 46. Kim S., Broströmer E., et al. Xie X.S. Probing allostery through DNA. Science. 2013;339:816–819. doi: 10.1126/science.1229223.
  • 47. Olson W.K., Gorin A.A., et al. Zhurkin V.B. DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc. Natl. Acad. Sci. USA. 1998;95:11163–11168. doi: 10.1073/pnas.95.19.11163.
  • 48. Satchwell S.C., Drew H.R., Travers A.A. Sequence Periodicities in chicken nucleosome core DNA rotational sequencing. J. Mol. Biol. 1986;191:659–675. doi: 10.1016/0022-2836(86)90452-3.
  • 49. Rozenberg H., Rabinovich D., et al. Shakked Z. Structural code for DNA recognition revealed in crystal structures of papillomavirus E2-DNA targets. Proc. Natl. Acad. Sci. USA. 1998;95:15194–15199. doi: 10.1073/pnas.95.26.15194.
  • 50. Pasi M., Maddocks J.H., et al. Lavery R. μABC: a systematic microsecond molecular dynamics study of tetranucleotide sequence effects in B-DNA. Nucleic Acids Res. 2014;42:12272–12283. doi: 10.1093/nar/gku855.
  • 51. Galindo-Murillo R., Roe D.R., Cheatham T.E. On the absence of intrahelical DNA dynamics on the μs to ms timescale. Nat. Commun. 2014;5:5152. doi: 10.1038/ncomms6152.
  • 52. Dans P.D., Danilāne L., et al. Orozco M. Long-timescale dynamics of the Drew-Dickerson dodecamer. Nucleic Acids Res. 2016;44:4052–4066. doi: 10.1093/nar/gkw264.
  • 53. Battistini F., Hospital A., et al. Orozco M. How B-DNA dynamics decipher sequence-selective protein recognition. J. Mol. Biol. 2019;431:3845–3859. doi: 10.1016/j.jmb.2019.07.021.
  • 54. Zeiske T., Baburajendran N., et al. Mann R.S. Intrinsic DNA shape accounts for affinity differences between hox-cofactor binding sites. Cell Rep. 2018;24:2221–2230. doi: 10.1016/j.celrep.2018.07.100.
  • 55. Joshi R., Passner J.M., et al. Mann R.S. Functional specificity of a hox protein mediated by the recognition of minor groove structure. Cell. 2007;131:530–543. doi: 10.1016/j.cell.2007.09.024.
  • 56. Dror I., Zhou T., et al. Rohs R. Covariation between homeodomain transcription factors and the shape of their DNA binding sites. Nucleic Acids Res. 2014;42:430–441. doi: 10.1093/nar/gkt862.
  • 57. Abe N., Dror I., et al. Mann R.S. Deconvolving the recognition of DNA shape from sequence. Cell. 2015;161:307–318. doi: 10.1016/j.cell.2015.02.008.
  • 58. Rube H.T., Rastogi C., et al. Bussemaker H.J. A unified approach for quantifying and interpreting DNA shape readout by transcription factors. Mol. Syst. Biol. 2018;14:e7902-16. doi: 10.15252/msb.20177902.
  • 59. Slattery M., Riley T., et al. Mann R.S. Cofactor binding evokes latent differences in DNA binding specificity between hox proteins. Cell. 2011;147:1270–1282. doi: 10.1016/j.cell.2011.10.053.
  • 60. Passner J.M., Ryoo H.D., et al. Aggarwal A.K. Structure of a DNA-bound Ultrabithorax-Extradenticle homeodomain complex. Nature. 1999;397:714–719. doi: 10.1038/17833.
  • 61. Chan S.K., Pöpperl H., et al. Mann R.S. An extradenticle-induced conformational change in a HOX protein overcomes an inhibitory function of the conserved hexapeptide motif. EMBO J. 1996;15:2476–2487. doi: 10.1002/j.1460-2075.1996.tb00605.x.
  • 62. Mann R.S., Affolter M. Hox proteins meet more partners. Curr. Opin. Genet. Dev. 1998;8:423–429. doi: 10.1016/S0959-437X(98)80113-5.
  • 63. Merabet S., Mann R.S. To Be specific or not: the critical relationship between hox and TALE proteins. Trends Genet. 2016;32:334–347. doi: 10.1016/j.tig.2016.03.004.
  • 64. Pettersen E.F., Goddard T.D., et al. Ferrin T.E. UCSF Chimera - a visualization system for exploratory research and analysis. J. Comput. Chem. 2004;25:1605–1612. doi: 10.1002/jcc.20084.
  • 65. Bansal M., Bhattacharyya D., Ravi B. NUPARM and NUCGEN: software analysis and generation of sequence dependent nucleic acid structures. Comput. Appl. Biosci. 1995;11:281–287. doi: 10.1093/bioinformatics/11.3.281.
  • 66. Bolshoy A., McNamara P., et al. Trifonov E.N. Curved DNA without A-A: experimental estimation of all 16 DNA wedge angles. Proc. Natl. Acad. Sci. USA. 1991;88:2312–2316. doi: 10.1073/pnas.88.6.2312.
  • 67. Case D.A., Brozell S.R., et al. Kollman P.A. University of California, San Francisco; 2018. AMBER.
  • 68. Foos N., Maurel-Zaffran C., et al. Graba Y. A Flexible extension of the drosophila ultrabithorax homeodomain defines a Novel Hox/PBC interaction mode. Structure. 2015;23:270–279. doi: 10.1016/j.str.2014.12.011.
  • 69. Billeter M., Güntert P., et al. Wüthrich K. Hydration and DNA recognition by homeodomains. Cell. 1996;85:1057–1065. doi: 10.1016/S0092-8674(00)81306-9.
  • 70. Ades S.E., Sauer R.T. Specificity of minor-groove and major-groove interactions in a homeodomain-DNA complex. Biochemistry. 1995;34:14601–14608. doi: 10.1021/bi00044a040.
  • 71. Bhattacharyya D., Bansal M. Groove width and depth of b-dna structures depend on local variation in slide. J. Biomol. Struct. Dyn. 1992;10:213–226. doi: 10.1080/07391102.1992.10508639.
  • 72. Juo Z.S., Chiu T.K., et al. Dickerson R.E. How proteins recognize the TATA box. J. Mol. Biol. 1996;261:239–254. doi: 10.1006/jmbi.1996.0456.
  • 73. Mack D.R., Chiu T.K., Dickerson R.E. Intrinsic bending and deformability at the T-A step of CCTTTAAAGG: a comparative analysis of T-A and A-T steps within A-tracts. J. Mol. Biol. 2001;312:1037–1049. doi: 10.1006/jmbi.2001.4994.
  • 74. Rohs R., Jin X., et al. Mann R.S. Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem. 2010;79:233–269. doi: 10.1146/annurev-biochem-060408-091030.
  • 75. Haran T.E., Mohanty U. The unique structure of A-tracts and intrinsic DNA bending. Q. Rev. Biophys. 2009;42:41–81. doi: 10.1017/S0033583509004752.
  • 76. Liepinsh E., Otting G., Wüthrich K. NMR observation of individual molecules of hydration water bound to DNA duplexes: direct evidence for a spine of hydration water present in aqueous solution. Nucleic Acids Res. 1992;20:6549–6553. doi: 10.1093/nar/20.24.6549.
  • 77. Duboué-Dijon E., Fogarty A.C., et al. Laage D. Dynamical disorder in the DNA hydration shell. J. Am. Chem. Soc. 2016;138:7610–7620. doi: 10.1021/jacs.6b02715.
  • 78. Nikolova E.N., Bascom G.D., et al. Al-Hashimi H.M. Probing sequence-specific DNA flexibility in A-tracts and Pyrimidine-Purine steps by nuclear magnetic resonance 13 C relaxation and molecular dynamics simulations. Biochemistry. 2012;51:8654–8664. doi: 10.1021/bi3009517.
  • 79. Chong S.H., Ham S. Anomalous dynamics of water confined in protein-protein and protein-DNA interfaces. J. Phys. Chem. Lett. 2016;7:3967–3972. doi: 10.1021/acs.jpclett.6b01858.
  • 80. Pan Y., Tsai C.J., et al. Nussinov R. Mechanisms of transcription factor selectivity. Trends Genet. 2010;26:75–83. doi: 10.1016/j.tig.2009.12.003.
  • 81. Mathelier A., Xin B., et al. Wasserman W.W. DNA shape features improve transcription factor binding site Predictions in vivo. Cell Syst. 2016;3:278–286.e4. doi: 10.1016/j.cels.2016.07.001.

Associated Data

Supplementary Materials

Document S1. Figures S1–S6
mmc1.pdf (1MB, pdf)
Document S2. Article plus supporting material
mmc2.pdf (4.9MB, pdf)

Data Availability Statement

The authors declare that the data supporting the findings of this study are available within the article and its supporting material. The raw MD simulation trajectories can be obtained from the corresponding author (Manju Bansal) upon reasonable request.


Articles from Biophysical Journal are provided here courtesy of The Biophysical Society
