Protein Science: A Publication of the Protein Society. 2001 Mar;10(3):504–518. doi: 10.1110/ps.35501

HMG-D complexed to a bulge DNA: An NMR model

Rachel Cerdan 1,1, Dominique Payet 1,2, Ji-Chun Yang 1, Andrew A Travers 1, David Neuhaus 1
PMCID: PMC2374130  PMID: 11344319

Abstract

An NMR model is presented for the structure of HMG-D, one of the Drosophila counterparts of mammalian HMG1/2 proteins, bound to a particular distorted DNA structure, a dA2 DNA bulge. The complex is in fast to intermediate exchange on the NMR chemical shift time scale and suffers substantial linebroadening for the majority of interfacial resonances. This essentially precludes determination of a high-resolution structure for the interface based on NMR data alone. However, by introducing a small number of additional constraints based on chemical shift and linewidth footprinting combined with analogies to known structures, an ensemble of model structures was generated using a computational strategy equivalent to that for a conventional NMR structure determination. We find that the base pair adjacent to the dA2 bulge is not formed and that the protein recognizes this feature in forming the complex; intermolecular NOE enhancements are observed from the sidechain of Thr 33 to all four nucleotides of the DNA sequence step adjacent to the bulge. Our results form the first experimental demonstration that when binding to deformed DNA, non-sequence-specific HMG proteins recognize the junction between duplex and nonduplex DNA. Similarities and differences of the present structural model relative to other HMG–DNA complex structures are discussed.

Keywords: Protein-DNA complex, NMR spectroscopy, HMG protein, bulge DNA, bent DNA


The high-mobility group (HMG) domain is a DNA-binding motif that occurs both in sequence-specific transcription factors, including LEF-1 and SRY, and in abundant chromosomal proteins typified by the vertebrate proteins HMG1 and HMG2 that bind DNA with little or no sequence specificity. The global fold of this domain is well conserved and consists of 3 α-helices arranged in an L-shape (Murphy and Churchill 2000; Travers 2000). This fold is stabilized by one major and two minor hydrophobic cores, the former consisting of three aromatic residues stacked edge to face. The concave surface of the HMG domain contacts the minor groove of linear B-form DNA inducing a substantial bend (Love et al. 1995; Jamieson et al. 1999; Lorenz et al. 1999; Murphy et al. 1999). In addition, both sequence-specific and nonspecific HMG domains exhibit a strong preference for binding to distorted DNA structures.

In this article we report an NMR model of the structure of HMG-D, one of the putative counterparts of HMG1 in Drosophila, bound to a DNA ligand containing a dA2 bulge. We chose this particular DNA ligand because it is a close structural analog of several biologically relevant targets for members of the nonspecific class of HMG proteins. Examples include UV-induced pyrimidine dimers, which are naturally occurring lesions in DNA, and also cis-platinum adducts (Sherman et al. 1985), which are induced in vivo by the anticancer drug cisplatin. Both pyrimidine dimer formation (Husain et al. 1988; Wang and Taylor 1991) and cis-platination (Gelasco and Lippard 1998; Coste et al. 1999) induce bends in the DNA of comparable magnitude to the bulge-induced bend (Aboul-ela et al. 1993; Dornberger et al. 1999) and have also been shown to be preferential ligands for nonspecific HMG domain class proteins (Pasheva et al. 1998; Pil and Lippard 1992). The interaction between these proteins and distorted DNA structures is of direct medical relevance. For example, steroid hormones induce overexpression of HMG-1 (Chau et al. 1998; He et al. 2000), a human counterpart of HMG-D, and sensitize breast cancer cells to cis-platin and the similar drug carboplatin (He et al. 2000). Related targets for nonspecific HMG proteins include four-way junctions (Bianchi et al. 1989; Webb and Thomas 1999), DNA microcircles (Payet and Travers 1997), and supercoiled DNA (Sheflin and Spaulding 1989; Sheflin et al. 1993).

HMG-D consists of a single HMG domain followed by a basic region and an acidic tail. It was previously shown (Payet and Travers 1997) that these flanking regions are important determinants of the DNA-binding properties of HMG-D, the basic region enhancing the affinity of the HMG domain for most DNA ligands and the acidic tail conferring the structural selectivity for such a bulged DNA. The DNA bend angle in a complex of HMG-D with a dA2 DNA bulge, as determined by FRET, is ∼95° (Lorenz et al. 1999). Two residues within the HMG domain of HMG-D, Val 32 and Thr 33 in the loop between α-helices I and II, are predicted to partially intercalate at the junction between duplex DNA and the bulged bases (Payet et al. 1999). Both these residues, especially Thr 33, are required for the efficient binding and bending of DNA by HMG-D (Payet et al. 1999), and in the crystal structure of HMG-D with a 10-bp duplex, both partially intercalate inducing a positive roll angle in the DNA (Murphy et al. 1999).

Our results form the first experimental demonstration that when binding to deformed DNA, non-sequence-specific HMG proteins recognize the junction between duplex and nonduplex DNA. In the present case, this junction occurs just beyond the dA2 bulge itself because the adjacent base pair is found to be disrupted in this DNA ligand, and we find that the sidechain of residue Thr 33 of the protein partially intercalates at this point. The results also confirm earlier footprinting work from this laboratory showing that the protein binds to the adjacent duplex rather than to the bend itself (Payet et al. 1999).

Results

Conditions for studying the complex

Complexes of full-length HMG-D with two different DNA ligands, each containing a bulge of two adenines interrupting a TTG sequence, were tested in this study. The first (an 18/16 mer; see Materials and Methods) was a slightly shortened version of the sequence used by Payet and Travers (1997) to demonstrate the preference of full-length HMG-D for bulged DNA over linear DNA. The second (a 14/12 mer; see Materials and Methods) was further shortened by removal of several AT base pairs in regions well away from the bulge and permutation of some GC base pairs at the termini; these changes were made to limit molecular mass and facilitate the NMR assignment. As in all cases the individual DNA strands have no appreciable self-complementarity, when the appropriate mixtures are annealed, only the bulge-containing heteroduplexes can form. The presence of these as pure species was fully confirmed by the physical and spectroscopic properties of the DNA, both when free and when complexed.

Complexes with both these DNA ligands were in fast exchange on the NMR chemical shift timescale. A clear indicator for this was the NHɛ signal of Trp 43, which moved from 10.13 to 10.51 ppm during titration of the protein with DNA to a 1 : 1 ratio but moved no further when excess DNA was added. For titrations of protein into DNA, imino signals of the DNA were similarly used to monitor stoichiometry. For both complexes, 1D 1H and 2D (15N, 1H) HSQC spectra were acquired under differing conditions of temperature (10°–40°C), pH (5.5–6.5), and salt concentration (10–200 mM) to locate the most favorable regime for NMR measurements. Addition of 5%, 10%, or 20% of 2H6-DMSO as cosolvent was also investigated, as this could increase the flexibility of the DNA by destabilizing the spine of water molecules lining the bottom of the major and minor grooves; in the event, however, no advantage was seen. Not unexpectedly, the shorter DNA ligand gave sharper NMR resonances for the complex, and the best overall combination of conditions found (adopted for the remainder of the study) was 30°C, pH 6.0, 10 mM NaCl, and 10 mM phosphate buffer. At higher temperatures, the HMG-D box domain starts to unfold (in either the free or bound states), while at lower temperatures broadening of many protein and DNA signals in the complex becomes more pronounced. Notably, however, signals from the basic region and acidic tail of the protein are not appreciably affected by these temperature changes.

To demonstrate binding of the shorter DNA to HMG-D 1–112, an electromobility shift assay (EMSA) was carried out. Because of the small number of base pairs, very dilute solutions of this DNA species would be expected to dissociate into single chains, precluding quantitative measurement of Kd for the complex using this method. However, the results do demonstrate that a 1 : 1 complex was formed and indicate an upper limit for the Kd of around $1 \times 10^{-6}$ M (data not shown). A rough lower limit on Kd may be estimated using the observation that the NMR signals are in fast exchange for Δδ values of at least 0.5 ppm (i.e., 300 Hz at 600 MHz); this implies a minimum value for koff given by $\sim(300 \times \pi)/2^{1/2} \approx 700\ \mathrm{s}^{-1}$, which in turn suggests a lower limit of $\sim 1 \times 10^{-7}$ M for Kd, assuming kon to be under diffusion control with a value of $\sim 10^{10}$–$10^{9}\ \mathrm{M}^{-1}\,\mathrm{s}^{-1}$. Significant caution is needed in interpreting these data, however, as the conditions for the EMSA experiments were necessarily rather different from those of the NMR experiments.
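The arithmetic behind this lower limit can be reproduced in a few lines. The sketch below is only an illustration of the estimate quoted in the text; the function names are hypothetical, and the two kon values are simply the ends of the diffusion-controlled range assumed above.

```python
import math

def min_koff_fast_exchange(delta_ppm, spectrometer_mhz):
    """Rough minimum koff (s^-1) for a signal still in fast exchange.

    Uses the expression quoted in the text: pi * delta_nu / sqrt(2),
    where delta_nu is the shift difference in Hz.
    """
    delta_nu_hz = delta_ppm * spectrometer_mhz  # for 1H, 1 ppm = spectrometer MHz in Hz
    return math.pi * delta_nu_hz / math.sqrt(2)

def kd_from_rates(koff, kon):
    """Kd (M) from koff (s^-1) and kon (M^-1 s^-1)."""
    return koff / kon

# Values taken from the text: 0.5 ppm at 600 MHz, kon between 1e10 and 1e9 M^-1 s^-1.
koff = min_koff_fast_exchange(0.5, 600.0)
kd_range = [kd_from_rates(koff, kon) for kon in (1e10, 1e9)]

print(f"koff >= {koff:.0f} s^-1")
print("Kd lower limit between", ", ".join(f"{kd:.1e} M" for kd in kd_range))
```

With these inputs koff comes out at roughly 670 s⁻¹ and Kd falls between about 7 × 10⁻⁸ and 7 × 10⁻⁷ M, consistent with the rounded values of ∼700 s⁻¹ and ∼1 × 10⁻⁷ M given above.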

NMR assignments

The strategy that we employed for assigning the NMR resonances of the complex and the free components is described in Materials and Methods. Significantly, many signals corresponding to sidechains at or near the protein–DNA interface of the complex were missing from the spectra, which hindered our efforts to define its structure using purely experimental data.

Given the importance of the residues of the basic region and acidic tail in modulating DNA-binding activity of the protein (Payet and Travers 1997), it is interesting to see how their NMR resonances are affected on complexation. The fact that these signals are sharp and resonate close to random coil chemical shifts indicates that the corresponding residues are largely unfolded, both in the free and bound forms of the protein. This is well demonstrated by projecting the CBCA(CO)NH data along the 15N axis so as to produce a (13C, 1H) plane (Fig. 1). Because of their slow relaxation, signals from the basic region and acidic tail dominate these spectra. Contributions from individual amino acid types may be quite easily distinguished using their random-coil 13C chemical shifts, and their numbers estimated. About five or six alanine residues are seen from the unstructured region in both spectra, while six are expected from the sequence; similarly, about six lysines are seen and nine expected, five aspartates seen and five expected, and four glutamates seen and four expected. In the case of glycine, more resonances (eight) are seen than expected (five). This is probably because of a partial deamidation-isomerization reaction at position Asn 80 to produce an iso-Asp residue (Chazin et al. 1989). This reaction occurs spontaneously at –Asn–Gly– sequences (particularly at elevated pH values) and also causes racemization, so the reaction could occur during isolation of the protein and the additional backbone amide signals seen for glycines could result from products having partial sequences at residues 80–81 such as –(L)Asp–Gly–, –(D)iso–Asp–Gly– and –(L)iso–Asp-Gly–. Such changes would not be expected to change appreciably the DNA-binding properties of the protein.

Fig. 1.

Projection along the 15N axis of 3D CBCA(CO)NH spectra of 13C, 15N double-labeled HMG-D 112, both (A) free and (B) bound to DNA. In A, some indicative assignments for peaks originating in the folded HMG domain are indicated, while in B, approximate random coil 13C shifts for different residue types are shown. Spectra were recorded at 600 MHz and 303 K. See text for further discussion.

Overall, comparison of these (and other) spectra shows that the resonances of the basic region and acidic tail are only very weakly affected by DNA binding in the present case. Almost all the chemical shifts of signals corresponding to this region of the protein are preserved on complexation, and their linewidths increase only slightly, as judged by the appearance of signals in the triple resonance and other spectra. Another point of interest is that the presence or absence of the basic region and acidic tail has negligible effects on signals from the remainder of the protein. Comparison of our assignments for free HMG-D 1–112 with the earlier data for free HMG-D 1–74 (Jones et al. 1994) shows that 1H and 15N shifts for the box domain are almost identical; aside from a few very localized effects of the truncation, differences are all within ±0.6 ppm for 15N and ±0.1 ppm for 1H. Thus it seems the residues of the basic region and acidic tail do not exert their effects on DNA binding selectivity by inducing significant changes in the structure of the box.

Turning to the DNA, a number of features in the spectra of the free DNA point to the conclusion that the T7–A20 base pair is probably not formed. In particular, the cross-strand NOESY cross peaks linking A20 H2 to the H2′ and (more weakly) H1′ signals of T7 are inconsistent with the geometry required for Watson-Crick base pairing. In addition, the (H1′, H2′) and (H1′, H2′′) J couplings for T7 (observed in the DQF-COSY spectrum) are nearly equal, suggesting a significant deviation from 2′-endo sugar pucker and again differentiating this nucleotide from others in the stems. Finally, the imino signal of T7 was not found in the spectra, whereas the other base pair adjacent to the bulge (G10–C19) gave an imino signal of comparable linewidth to those in the stems. The two adenines in the bulge itself gave rise to relatively weak cross peaks generally (and no imino resonances), but the only significant missing assignments were those of the two H2 signals. The sequential assignment pathway linking H1′ to H6/8 signals (and similarly that linking H2′/H2′′ to H6/8 signals) can be traced through the whole of the bulged strand; also, the residues to either side of the bulge (T7 and G10) do not show NOE interactions with one another. Taken together, these last two observations strongly suggest that stacking is maintained at least approximately throughout the bulged nucleotides, with no base being looped out (Aboul-ela et al. 1993). On the opposite strand, the low-field shift of A20 H8 is similar to that seen for terminal purine residues, while the NOE connectivities between A20 H8 and C19 H1′/H2′ are very weak or absent, suggesting that stacking between A20 and C19 is at least partly disrupted. However, A20 and A21 appear to be well stacked according to all the usual indicators (the H2/H2, H8/H8, and aromatic-sugar NOE connectivities are all present).

For the bound DNA, absence of the C1–G26 and C14–G15 imino signals again indicates that these base pairs are probably fraying, while the imino signal for the T7–A20 base pair is also missing; again, the sequential assignment pathways linking H6/8 signals to H1′ and to H2′/H2′′ were maintained throughout the bulged residues, while for the opposite strand the low-field H8 signal of A20 and the pattern of NOE connectivities is once more consistent with interrupted stacking between A20 and C19. Taken together, these observations suggest that the conformation of the bulge region is broadly similar between the free DNA and the bound DNA.

NMR footprinting and detection of intermolecular interactions

Figure 2 shows a comparison of backbone 1H and 15N chemical shifts for residues 1–74 of HMG-D in the free and DNA-bound forms. Consistent with the expected mode of DNA binding, whereby the DNA principally contacts regions at the start of helix 1 and in helix 2 of the protein, the most extensive changes in shifts occur in the N-terminal part of the HMG-box, with the largest changes being in the region of helix 2.

Fig. 2.

Difference of free and bound chemical shifts for backbone amides of HMG-D in the box region (1–77). Shift differences for 15N are shown as solid bars; those for 1H are shown as hatched bars.

Comparison of the HSQC and NOESY data between free and bound forms also showed that certain residues of the protein suffered varying degrees of line broadening in the complex, to the extent that several disappeared totally from the spectra. Most of these residues are clustered around the two expected sites of DNA intercalation, namely Met 13 and Val 32. Thus, in (15N, 1H) HSQC spectra of the complex, no backbone NH signals could be assigned for residues Val 32 or Val 35. Similarly, the backbone NH signals of Leu 14, Trp 15, and Glu 34 and the sidechain NH2 signals of Asn 17 were extremely weak; since for the complex these latter signals were only visible in HSQC spectra, their assignments rest on analogy with HSQC spectra of free protein. Other parts of these residues suffered from weak or missing signals. For instance, in 2D NOESY spectra, the only signals found for Trp 15 include some of the ring signals (15Nɛ1, Hɛ1, Hζ2, Hδ1), while for Leu 14, signals were only assigned for the δ methyl groups. Severe broadening was observed in the HSQC spectrum for the backbone NH signals of Ser 10, Tyr 12, Met 13, Leu 16, Asn 17, and Thr 33. Sidechain signals for these residues were located mainly using the 3D 15N HSQC-NOESY and 2D (1H, 1H) NOESY spectra of the complex, and in most cases these signals were severely broadened or, in some cases, unassignable. For instance, the only sidechain signals seen for Met 13 comprised Hγ and the ɛ methyl, for Leu 16 only the δ methyl signals were seen, and for Tyr 12 only an Hβ signal; in all cases even these signals were extremely broad and weak. Some of these features may be seen by comparing the (15N, 1H) HSQC spectra of free and bound protein in Figure 3.

Fig. 3.

(15N, 1H) HSQC spectra of (A) free and (B) bound HMG-D 112. Assignments for NH backbone signals of some residues involved in key DNA interactions are indicated in B; these signals are markedly weaker and broader than others in the spectrum of bound protein, and few correlations could be found from these signals in other experiments (see text). Spectra were recorded at 600 MHz and 303 K.

Thus, the principal regions of the protein where signals are strongly affected by DNA binding are from 10 to 17 and from 32 to 38. Outside these regions, a few isolated residues exhibit some relatively large chemical shift changes upon DNA binding, notably Gly 40 NH (Δδ = 0.46 ppm), Trp 43 NHɛ1 (Δδ = 0.40 ppm), and Arg 44 backbone NH (Δδ = 0.45 ppm). The largest chemical shift change on complexation is for the Lys 37 backbone 15N signal, which is assigned to 119.1 ppm in the free protein and to 108.3 ppm in the complex. In light of this very large 15N shift change, we regard this assignment as remaining somewhat tentative. However, it is not inconsistent with the model, as Lys 37 NH forms a hydrogen bond to the carbonyl of Thr 33 in the free protein that might be disrupted or distorted when the sidechain of Thr 33 intercalates the DNA in the complex (see below).
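To put such shift changes on a common frequency scale, the ppm differences can be converted to Hz. The sketch below is illustrative only: it uses the Trp 43 NHɛ titration endpoints (10.13 and 10.51 ppm) and the Lys 37 15N shifts (119.1 and 108.3 ppm) quoted in the text, assumes a 600 MHz spectrometer, and takes the 15N resonance frequency as approximately 0.1013 times the 1H frequency. The function name is hypothetical.

```python
# Approximate ratio of 15N to 1H resonance frequencies in the same magnet (assumed value).
N15_TO_H1_FREQ_RATIO = 0.1013

def shift_change(free_ppm, bound_ppm, nucleus="1H", h1_mhz=600.0):
    """Return (delta_ppm, delta_hz) for a resonance that moves on binding.

    delta_ppm is signed (bound minus free); delta_hz is the magnitude in Hz.
    """
    delta_ppm = bound_ppm - free_ppm
    carrier_mhz = h1_mhz * (N15_TO_H1_FREQ_RATIO if nucleus == "15N" else 1.0)
    return delta_ppm, abs(delta_ppm) * carrier_mhz

trp43_ppm, trp43_hz = shift_change(10.13, 10.51, "1H")
lys37_ppm, lys37_hz = shift_change(119.1, 108.3, "15N")

print(f"Trp 43 NHe: {trp43_ppm:+.2f} ppm ({trp43_hz:.0f} Hz)")
print(f"Lys 37 15N: {lys37_ppm:+.1f} ppm ({lys37_hz:.0f} Hz)")
```

On this scale the Lys 37 15N change corresponds to several hundred Hz, which is one way of seeing why such a shift is described as very large.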

Changes in the DNA resonances on complexation are somewhat more subtle than for the protein (Fig. 4); the only signals for which Δδ > 0.1 ppm are the H1′ of A8 and G10; the H2′ of A5 and T22; the H2′′ of A5, T6, T7, A8, A21, and T22; and the H3′ of T22. Nonetheless, the locations of these changes are largely confined to the first stem (i.e., from the C1–G26 base pair to T7 and A20) and to the bulge. Further evidence concerning the bulge region comes from 31P spectra (including a [31P, 1H] correlation spectrum of the free DNA), which show that signals from the phosphates of A8 and A9 are shifted downfield from the bulk of the 31P signals in the free DNA, but in the 31P spectrum of the complex, this relative shift is probably much reduced.

Fig. 4.

Differences of free and bound chemical shifts for 1H resonances of the DNA. Cases where δfree − δbound is negative are indicated by a minus sign close to the top of the relevant bar in the figure.

Extensive efforts were made to detect intermolecular NOE interactions. Complex containing (13C, 15N) double-labeled protein was used in 2D NOESY and 3D 13C HSQC-NOESY experiments with appropriate 13C half filters to search for cross peaks linking carbon-bound protons of the protein with any protons on the DNA (Otting and Wüthrich 1990); similarly, complex containing (2H, 15N) double-labeled protein was used in a 2D NOESY experiment including a 15N half filter in one dimension, so as to search for cross peaks linking nitrogen-bound protons of the protein (present because of solvent exchange once the deuterated protein is dissolved in H2O) with any protons on the DNA (Walters et al. 1997). In analyzing all of these spectra, great care was necessary throughout to exclude misinterpretation of artifactual cross peaks (such as intraprotein cross peaks linking amide 15N-H groups to the low level of residual carbon-bound protein protons in the experiments with complex containing perdeuterated protein). The overall procedure thus comprised an exhaustive combined analysis of the various filtered experiments in conjunction with all of the conventional 2D and 3D NOESY experiments.

At the end of this process, ∼10 cross peaks had been identified unambiguously as being intermolecular in origin. Of these, five could be unambiguously assigned to specific interactions, and a sixth was unambiguously assigned later in conjunction with preliminary model structures; these results are summarized in Table 1. This is quite a small number of cross peaks, but reasons are not hard to find. As discussed, many signals expected to be involved in the interface are substantially broadened in spectra of the complex, to an extent well beyond that expected purely on grounds of the increase in molecular mass, and in some cases signals disappear altogether in the complex. Such broadening greatly reduces the sensitivity of multidimensional NMR experiments, and particularly so the half-filtered experiments, as these necessarily include long fixed delays within the pulse sequence.

Table 1.

Intermolecular NOE interactions

HMG-D112 | DNA 12/14mer | Intensity category (a) | Experiments (b)
Thr 33 Me | T6 H1′ | m | 1, 2, 3
Thr 33 Me | T7 H1′ | w | 1, 2, 3
Thr 33 Me | A21 H1′ | vw | 1, 2, 3
Thr 33 Me | A20 H1′ | vw | 1, 2, 3
Thr 33 Me | A21 H2 | vw (c) | 1, 2, 3
Arg Hδ (d) | T18 H1′ and G12 H1′ | w | 4
Lys Hɛ (d) | T18 H1′ and G12 H1′ | w | 4
Ile 30 Cγ2H3 | ? (e) | w | 5
Ser Hβ (f) | T4 H4′/H3′ | w | 3

a Intensity categories: m (medium), w (weak), vw (very weak).

b Experiments used (for all cases the sample was (13C, 15N) double-labeled protein complexed to DNA): 1, 2D NOESY in 2H2O (τm = 150 ms; 600 MHz); 2, 2D NOESY (τm = 150 ms; 800 and 600 MHz) with filter to reject 13C- and 15N-coupled protons during t1 and to accept 13C- and 15N-coupled protons during t2; 3, 3D 13C NOESY-HSQC (τm = 150 ms; 800 MHz) with filter to reject 13C- and 15N-coupled protons during t1; 4, 2D NOESY (τm = 150 ms; 800 MHz) in H2O; 5, 3D 13C NOESY-HSQC (τm = 150 ms; 600 MHz).

c Observed in experiment 1 only.

d Not assigned sequentially to a specific residue.

e The nearest DNA proton to Ile 30 Cγ2H3 in the calculated structures is T7 H4′ (distance ∼8 Å in some structures). These protons could give rise to an NOE cross peak through spin-diffusion via Lys 31 CβH2, which lies between them in the calculated structures, but as this assignment is very tentative, we did not consider using it as a constraint.

f Assigned only by using preliminary structures.

We consider the most likely cause for this differential broadening of some interfacial signals to be intermediate-rate exchange between many locally different conformations. One requirement for substantial exchange broadening is that the broadened signal undergoes a significant change of chemical shift between the exchanging states (Δδ > 0). Thus, one plausible reason why only a few intermolecular cross peaks are detected is that these few involve just a relatively small number of interfacial signals that, fortuitously, do not change their chemical shifts greatly between the various locally different conformations present at the interface and, hence, suffer less broadening than other interfacial signals. It is unlikely that these broadening effects are directly connected to the fact that the complex is in fast exchange with respect to dissociation. Exchange broadening requires that more than one contributing exchange form is significantly populated, but in fact there is very little of either the free protein or free DNA present in the samples. The NMR titration used to establish 1 : 1 stoichiometry could be in error by up to ∼5%, but we consider this not to leave a sufficient excess of either free component to explain the extent of the observed effects. Further, broadening caused by dissociation of the complex would be expected only for the component that is present in excess. We find that broadening is present both for protein and for DNA signals (though less so for the latter) and, moreover, that the observed broadening is at least roughly consistent over all samples, which would not be expected if there were randomly different errors in the stoichiometry of each.
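The two requirements mentioned here, a nonzero shift difference and more than one significantly populated state, can be illustrated with the standard two-site fast-exchange approximation, R_ex = p_A p_B Δω² / k_ex. This is a textbook expression offered as a sketch, not the authors' analysis; it is valid only when k_ex is much larger than Δω, and the populations and exchange rate used below are hypothetical values chosen purely for illustration.

```python
import math

def rex_fast_exchange(p_minor, delta_nu_hz, kex):
    """Exchange contribution to R2 (s^-1) for two-site exchange in the fast limit.

    p_minor     : fractional population of the minor state (0..1)
    delta_nu_hz : chemical shift difference between the two states, in Hz
    kex         : total exchange rate (s^-1); must greatly exceed 2*pi*delta_nu_hz
    """
    delta_omega = 2.0 * math.pi * delta_nu_hz  # rad/s
    return p_minor * (1.0 - p_minor) * delta_omega ** 2 / kex

# Hypothetical inputs: 5% minor state, 300 Hz shift difference, kex = 14000 s^-1.
rex = rex_fast_exchange(0.05, 300.0, 14000.0)
extra_linewidth_hz = rex / math.pi  # added linewidth at half height

# No shift difference, or no second populated state, gives no exchange broadening.
rex_no_shift = rex_fast_exchange(0.05, 0.0, 14000.0)
rex_single_state = rex_fast_exchange(0.0, 300.0, 14000.0)

print(f"R_ex = {rex:.1f} s^-1, extra linewidth = {extra_linewidth_hz:.1f} Hz")
```

The quadratic dependence on Δω is what lets a few interfacial signals with small shift differences between conformations escape the broadening that affects their neighbors.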

Model of the complex

The highly dynamic nature of much of the HMG-D/bulge DNA complex, and particularly the effect this has on signals from contact regions, essentially precludes determination of a detailed, atomic-resolution structure for the interface based on NMR data alone. Nonetheless, by combining the available experimental data with a small number of nonexperimental restraints (see below and Materials and Methods), a restraint table was constructed from which reasonably precise model structures were calculated using essentially the same methodology as for a conventional NMR structure determination.

The two key assumptions made during this approach were that the protein binds in the minor groove of the DNA and that the sidechain of Met 13 intercalates into the DNA. The justification for both of these assumptions is very strong. Analysis of existing HMG protein-DNA complex structures has shown previously that in every case the protein lies in the minor groove of the DNA, with a few key protein sidechains in conserved positions intercalating between bases of the DNA (Hardman et al. 1995; Love et al. 1995; Werner et al. 1995; Allain et al. 1999; Murphy et al. 1999). For HMG-D, these sidechains can be identified with near certainty as those of Tyr 12, Met 13, Val 32, and Thr 33, based both on analogy with other cases and on the existing crystal structure of HMG-D 1–74 bound to a linear DNA duplex (Murphy et al. 1999). We chose to restrain the sidechain of Met 13 because this was at the opposite end of the interface from the experimental NOE-based restraints to Thr 33. Given the position of the Thr 33 sidechain contacts, the site of intercalation for Met 13 can reasonably be limited to either the A3–T24/T4–A23 step or the T4–A23/A5–T22 step, so restraints were defined so that they would be satisfied provided Met 13 intercalated into either of these sites, using the "$r^{-6}$ summation" ambiguous NOE protocol of Nilges (1995); all of the calculated structures placed the Met 13 sidechain in the T4–A23/A5–T22 step. Restraints were set to atoms on both sides of the groove so as not to force the protein to one side or the other, and the restraint upper bounds were set to 6 Å, which is at least 1 Å longer than the longest corresponding distances found in the crystal structure.
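The way an ambiguous restraint of this kind can be satisfied by either candidate site follows from the r⁻⁶ summation itself: the effective distance is (Σ rᵢ⁻⁶)^(−1/6), which is dominated by the shortest contributing distance. The sketch below is a minimal illustration, not the XPLOR implementation; the function names and example distances are hypothetical, and only the 6 Å upper bound comes from the text.

```python
def r6_sum_distance(distances):
    """Effective distance (same units as input) from r^-6 summation over candidates."""
    return sum(r ** -6 for r in distances) ** (-1.0 / 6.0)

def restraint_satisfied(distances, upper_bound=6.0):
    """True if the r^-6 summed distance lies within the restraint upper bound."""
    return r6_sum_distance(distances) <= upper_bound

# Hypothetical distances (in Angstrom) from a Met 13 atom to atoms at the two candidate steps.
near_one_site = [5.0, 15.0]    # intercalated at one site, far from the other
near_neither = [15.0, 15.0]    # far from both candidate sites

d_near = r6_sum_distance(near_one_site)
d_far = r6_sum_distance(near_neither)

print(f"{d_near:.2f} A -> satisfied: {restraint_satisfied(near_one_site)}")
print(f"{d_far:.2f} A -> satisfied: {restraint_satisfied(near_neither)}")
```

Because the summed distance is never longer than the shortest individual distance, a structure with Met 13 close to either step meets the restraint without the calculation having to be told which step is correct.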

To limit the search for the DNA structure to reasonable areas of conformation space, a number of further nonexperimental restraints, detailed in Materials and Methods, were introduced for the stems of the DNA (excluding T7 and A20). To an extent, these are needed to compensate for the fact that the XPLOR force field as implemented included no electrostatic terms, no hydrogen bonding terms (other than specifically assigned hydrogen-bonded distances), and no attractive van der Waals term. As already mentioned, preliminary calculations run with this restraint table allowed a further intermolecular interaction to be assigned, namely, linking the Ser 10 Hβ protons to the H4′ proton of T4; the final ensemble of structures was therefore calculated with this restraint included. The combination of calculation protocol and final restraint list resulted in a reasonable convergence rate, but even so, not all calculated structures had the protein located within the minor groove. After discarding those that did not, 10 structures were selected for further analysis (see Fig. 5) and minimized using the CHARMM 25A2 program as described in Materials and Methods. The CHARMM minimization resulted only in relatively small shifts of the coordinates (the rmsd for all heavy atoms was 0.61 ± 0.1Å across the 10 structures). Statistics for the resulting structures appear in Table 2.

Fig. 5.

Rmsd and XPLOR NOE energy profiles for the ensemble of calculated structures (before refinement using CHARMM). The rmsd values are independently calculated using each ensemble size, adding successive structures in order of increasing XPLOR NOE energy term. Missing points on the rmsd profile correspond to structures that were rejected because the protein was not wholly in the minor groove of the DNA. Filled circles represent the XPLOR energy terms, and open symbols represent the rmsd values corresponding to superposition using the protein (N, Cα, and C′ atoms of residues 11–59; open triangles with horizontal base), stem 1 of the DNA (heavy atoms of nucleotides 1–7 and 20–26; open triangles with horizontal top), the whole DNA (heavy atoms of all nucleotides; open diamonds) and the complex (N, Cα, and C′ atoms of protein residues 11–59 and heavy atoms of nucleotides 1–7 and 20–26; open circles).

Table 2.

Structural statistics for the ensemble of 10 model structures

Mean rmsd to mean structure | XPLOR structures | CHARMM structures
Protein: N, Cα and C′, residues 11–59 | 0.71 ± 0.23 Å | 0.73 ± 0.22 Å
DNA: heavy atoms, stem 1 (C1–G26 to T7–A20) | 0.75 ± 0.21 Å | 0.72 ± 0.21 Å
DNA: heavy atoms, all nucleotides | 1.79 ± 0.56 Å | 1.76 ± 0.59 Å
Complex: N, Cα and C′, residues 11–59 (protein) + heavy atoms, stem 1 (DNA) | 0.87 ± 0.17 Å | 0.87 ± 0.17 Å

XPLOR E (total) | 708.2 ± 69.8 kcal/mol
XPLOR E (noe) | 259.9 ± 10.5 kcal/mol
XPLOR E (cdih) | 108.4 ± 27.7 kcal/mol
Rmsd from distance restraints | 0.0445 ± 0.005 Å
Rmsd from torsion restraints | 1.21 ± 1.22°
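A "mean rmsd to mean structure" statistic of the kind reported in Table 2 can be computed as sketched below. This is a simplified illustration rather than the procedure actually used: it assumes the models have already been superposed on the chosen atom set, uses the sample standard deviation for the ± value (the convention used in the paper is not stated), and runs on a toy ensemble with made-up coordinates.

```python
import math
from statistics import mean, stdev

def mean_structure(ensemble):
    """Average coordinates atom by atom over pre-superposed models."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[i][k] for model in ensemble) / n_models for k in range(3))
        for i in range(n_atoms)
    ]

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate lists."""
    squared = sum(
        (x - y) ** 2 for p, q in zip(coords_a, coords_b) for x, y in zip(p, q)
    )
    return math.sqrt(squared / len(coords_a))

def rmsd_to_mean(ensemble):
    """Return (mean, sample standard deviation) of each model's rmsd to the mean structure."""
    average = mean_structure(ensemble)
    values = [rmsd(model, average) for model in ensemble]
    return mean(values), stdev(values)

# Toy ensemble: two models, two atoms each (coordinates are made up).
toy_ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
toy_mean, toy_sd = rmsd_to_mean(toy_ensemble)
print(f"{toy_mean:.2f} +/- {toy_sd:.2f}")
```

In a real calculation each model would first be fitted to a reference on the relevant atoms (for example N, Cα, and C′ of residues 11–59), which is why the table lists the atom set alongside each value.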

The overall architecture of the model of the complex is similar to that of other known HMG–DNA complexes, with the protein accommodated into a widened minor groove and the key intercalation interactions involving insertions into sites two base pairs apart (Fig. 6). Thus, the sidechains of Val 32 and Thr 33 are inserted into the T6–A21/T7–A20 step, while the sidechains of Tyr 12 and Met 13 are inserted into the T4–A23/A5–T22 step. Nucleotides T7 and A20 do not form a base pair; they consistently appear in the calculated structures with the base of A20 roughly parallel to that of T7 but displaced toward T6 and inserted slightly between the bases of T6 and T7. Other features of the bulge region show more variability across the ensemble, as does the bend angle of the DNA over the bulge (which varies between ∼40° and 70°). The stem carrying the protein is also significantly bent, particularly at the two intercalation sites (on average over the ensemble, the bend angle at the T4/A5 step is 22 ± 2°, and at the T6/T7 step it is 10 ± 2°).

Fig. 6.

Two views of the complex, using a ribbon representation for the protein. The lowest-energy structure from the final ensemble of 10 is shown. Key interfacial protein residues are shown in ball-and-stick style as follows: Ser 10 in pink, Tyr 12 in black, Met 13 in dark blue, Val 32 in cyan, and Thr 33 in green. (A) This view shows the two bulged adenines A8 and A9, the distorted T7-A20 base pair, the intercalation of the key interfacial residues, and the manner in which the L-shape of the protein follows the bent DNA helix of stem 1. (B) This view shows how the protein lies in the widened minor groove of stem 1 of the DNA. The figure was prepared using the program MOLMOL (Koradi et al. 1996).

Some key features of the interface in these model structures are highlighted by the CHARMM interaction energy histogram shown in Figure 7. The residues known to be involved in DNA intercalation show some of the largest intermolecular interactions, that to Met 13 being of course large in all structures as it was restrained, while several nearby residues, such as Pro 8 and Leu 9, pack against the DNA backbone. A number of other residues show significant interaction energies in several of the structures, including particularly a number of basic residues that can interact with phosphates of the DNA backbone. Although these are not wholly consistent among the different ensemble members, the data suggest that such basic sidechain to phosphate interactions include Arg 20 to T7, Lys 31 to A8, Lys 37 to A23, Arg 44 to T24, Lys 49 to C25, and Lys 60 to G26. In some cases, basic sidechains show interactions with different phosphates in different ensemble members; this probably reflects mainly the methods used to calculate the structures but could well also reflect some genuine variability in the solution structure. Examples include Lys 6 (which interacts with the phosphates of either G2 or A3) and Arg 7 (which interacts with the phosphates of T4, A5, or G26). Another phosphate interaction is that of Trp 43 NɛH with T24, which presumably accounts for the relatively large chemical shift change for the Trp 43 NɛH signal on complexation. The apparently unfavorable interaction energy of Glu 53 is probably artifactual; in the calculated structures, this sidechain consistently approaches the phosphate of C25, but in reality it is likely to be involved in a salt bridge within the protein, for instance, with Lys 49.
Interfacial hydrogen bonds from neutral protein sidechains are less consistently represented in the ensemble, but some that appear include Ser 10 OH to the O2 of T4 or T24, Tyr 12 OH to A23 O4′, and Asn 17 NH2 to A5 O3′; the donor-acceptor distances (typically 3–4 Å in the CHARMM structures) indicate that these may be water mediated.

Fig. 7.

Histogram showing protein–DNA interaction energies as a function of residue number, calculated using the program CHARMM for all 10 of the finally selected structures. Contributions to each interaction energy from the individual structures are indicated using different shading styles.

Discussion

The general features of HMG domain–DNA complexes so far described are highly conserved. In B-type domain complexes, the binding face of the domain presents a hydrophobic surface that conforms to a wide and shallow minor groove. In the center of the surface, a hydrophobic wedge is inserted deep into the minor groove. Within this wedge, one residue (Met in LEF-1 [Love et al. 1995], NHP6A [Allain et al. 1999], and HMG-D [Murphy et al. 1999]; Ile in SRY [King and Weiss 1993; Werner et al. 1995]) partially intercalates between two base pairs. The other components of this wedge, comprising residues 9, 12, and 43, are also conserved. The hydrophobic surface is flanked by a set of conserved basic residues that bind to the phosphodiester backbones and stabilize the complex. The non-sequence-specific HMG domains, but not the sequence-specific domains of SRY and LEF-1, also partially interact via a hydrophobic residue(s) close to the N terminus of helix 2 at a secondary site approximately two base steps distant from the primary intercalation. The nature of this interaction is variable. In the NMR model of the complex of NHP6A with a linear 20 mer (Allain et al. 1999), Phe 48 (corresponding to Val 32 in HMG-D) is proposed to stack in an edge-to-face manner with bases exposed in the minor groove. In contrast, in a crystal structure of the HMG domain (residues 1–74) of HMG-D with a linear 10 mer (Murphy et al. 1999), Val 32 and Thr 33 partially intercalate at the same step. In addition, this latter structure suggests a similar role for Ala 36 at the base step between the primary and secondary intercalations. Our data confirm that, as predicted from the model of an HMG-D–bulge DNA complex (Payet et al. 1999), Thr 33 partially intercalates in a similar manner to its intercalation in linear DNA (Murphy et al. 1999).

Although the presence of a DNA bulge normally increases the affinity of HMG-D relative to a corresponding duplex (Payet and Travers 1997), it is notable that, when bound, the protein does not interact directly with the bulged bases themselves. This contrasts with the A domain of HMG1, where Phe 37, in a similar position to Val 32 in HMG-D, partially intercalates directly into a cis-platin-induced kink at a G–G step (Ohndorf et al. 1999). This observation is concordant with the conclusions of Webb and Thomas (1999), who demonstrated that the A domain of HMG1 preferentially bound to the crossover of a four-way junction, while the B domain, which is structurally homologous to HMG-D, bound to the arms. A possible inference is that HMG-D, and by extension other B-type HMG domains, prefers to bind to a more smoothly bent duplex DNA than to preformed kinks. In this situation, the enhanced affinity for DNA bulges could be explained by the somewhat wider minor groove in the immediate vicinity of the bulge. As in the case of minicircles (Payet and Travers 1997), such a widening could accommodate the region containing Val 32 and Thr 33.

The overall architecture of the complex as presented here is fairly similar to that predicted in the earlier model of Payet et al. (1999), with the protein lying predominantly to one side of the bulge. The model predicts an interaction between the basic region of HMG-D and the phosphate backbone of one strand of the DNA on the other side of the bulge. While there is clear biochemical evidence that the basic region does increase DNA-binding affinity, it is also apparent from this work that such interactions as do occur with the present bulged DNA are insufficient to cause any appreciable effect on the NMR properties of the residues of either the basic region or the acidic tail. This contrasts with other NMR studies where intermolecular NOE interactions were detected between HMG proteins and DNA; for instance, for LEF-1 (Love et al. 1995), NHP6A (Allain et al. 1999), and most recently, HMG-D 1–100 (Dow et al. 2000). However, in all those cases the part of the protein outside the structured HMG domain comprises only a basic region, so it could be that the presence of the acidic tail in full-length HMG-D modifies the interaction between the unstructured portion of the protein and the DNA, making it undetectable by NMR.

At a more detailed level, some features of the interface between residues of the HMG box domain and the DNA are clearly also in line with expectations based on previous work. For instance, the sidechains of residues Met 13, Val 32, and/or Thr 33 are expected to intercalate partially or completely into the DNA. Of these, residues Met 13 and Val 32 are among the most severely broadened in the complex (to the extent that no sidechain signals could be detected for them), while residue Thr 33 gives the clearest evidence for intermolecular interactions. However, these cross peaks lead directly to the clearest apparent deviation from the predictions of the model. Our observations show that the sidechain of Thr 33 is inserted into the DNA sequence between the bases of T6 and T7, whereas in the model, Thr 33 and Val 32 were both inserted into the T7–A8 step (thereby involving directly the first of the bulged adenines). While we see no direct evidence for the position of Val 32, it seems most likely that this sidechain is also inserted into the same step in the DNA sequence as that of Thr 33 (and it is consistently placed there by the calculations leading to the models), leading to the conclusion that the position of the protein is shifted by one base pair away from the bulge, relative to the position to which it was assigned in the earlier model. We note, however, that, as discussed above, the bases T7 and A20 immediately on one side of the bulge do not appear to form a canonical base pair in either the free DNA or the complex. This contrasts with the results of NMR studies of bulged DNA species (Aboul-ela et al. 1993; Dornberger et al. 1999), similar to that studied by Payet et al. (1999). We infer that the preferred binding site of HMG-D is, at least in part, determined by a DNA distortion, in this case a lack of canonical base-pairing, and that the principle that Val 32 intercalates at the boundary between an intact DNA duplex and a distorted structure is conserved.

Materials and methods

Protein expression and purification

Expression of HMG-D 112 was carried out essentially as described by Payet and Travers (1997). The protein sequence is MSDKPKRPLS AYMLWLNSAR ESIKRENPGI KVTEVAKRGG ELWRAMKDKS EWEAKAAKAK DDYDRAVKEF EANGGSSAAN GGGAKKRAKP AKKVAKKSKK EESDEDDDDE SE (written in blocks of 10 purely to aid readability). 15N- and 13C-labeled samples were prepared by growing on minimal medium containing 15NH4Cl and U-13C6 glucose. Cells were grown at 37°C, and expression of HMG-D 112 was induced by adding isopropyl-β,D-thiogalactopyranoside (IPTG) to a final concentration of 250 μg/mL at A600 ∼0.8. Growth was continued at 25°C for 4 h to overproduce HMG-D. For the preparation of (2H, 15N) double-labeled protein, Martek 9-dN was used as the growth medium; in this medium, cells were grown for 24 h after induction by IPTG. Purification of HMG-D 112 was carried out as described by Jones et al. (1994) and Churchill et al. (1995). Approximate overall protein yields were 4–6 mg/L for the (13C, 15N) double-labeled protein and 0.5 mg/L for the (2H, 15N) double-labeled protein. Final samples were concentrated to 0.5–1.0 mM in 10 mM phosphate buffer (pH 6), 20 mM NaCl, 5% 2H2O, and 0.02% NaN3.
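As a quick consistency check on the residue numbering used throughout the text, the sequence given above can be parsed and indexed. The short Python sketch below is illustrative only; the names `BLOCKS`, `SEQUENCE`, `residue`, and `INTERFACE` are introduced here and are not part of the original work.

```python
# Sequence of HMG-D 112 as printed in the text, in blocks of 10 residues.
BLOCKS = (
    "MSDKPKRPLS AYMLWLNSAR ESIKRENPGI KVTEVAKRGG ELWRAMKDKS EWEAKAAKAK "
    "DDYDRAVKEF EANGGSSAAN GGGAKKRAKP AKKVAKKSKK EESDEDDDDE SE"
)

# Remove the readability spaces to obtain the contiguous sequence.
SEQUENCE = BLOCKS.replace(" ", "")


def residue(number):
    """Return the one-letter code at a 1-based residue number."""
    return SEQUENCE[number - 1]


# Interfacial residues named in the text, as (one-letter code, residue number).
INTERFACE = [("S", 10), ("Y", 12), ("M", 13), ("V", 32), ("T", 33), ("W", 43)]

if __name__ == "__main__":
    print("length:", len(SEQUENCE))
    for code, number in INTERFACE:
        print(code, number, "matches" if residue(number) == code else "MISMATCH")
```

With this numbering, Ser 10, Tyr 12, Met 13, Val 32, Thr 33, and Trp 43 all fall at the positions quoted in the text.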

DNA purification

The two single-stranded oligonucleotides (14-mer dCGATATTAAGAGCC and 12-mer dGGCTCAATATCG) were synthesized on an Applied Biosystems ABI394 DNA synthesizer and then purified by FPLC with an anion exchange column (Resource Q, Pharmacia) using a salt gradient in neutral conditions (10 mM Tris, 0.05–1.0 M NaCl). A Sep-Pak C18 cartridge was used to desalt the samples. The two oligonucleotides were first lyophilized and then dissolved in H2O or 2H2O, and their concentrations determined by measuring absorbance at 260 nm. Finally, the two strands were annealed at a 1 : 1 molar ratio at 1.0–1.5 mM concentration in the same buffer as the protein (omitting the sodium azide). The purity of the duplex was checked using analytical native gel electrophoresis on a 20% acrylamide gel, which showed only one band. The 18/16 oligomer, which was prepared similarly, had the sequence 18-mer dGCAAATATTAAGAAAACG; 16-mer dCGTTTTCAATATTTGC.
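The nucleotide numbering used in the text (for example C1–G26, T7–A20, and C14–G15) follows from numbering the 14-mer as 1–14 and the 12-mer as 15–26. The Python sketch below reconstructs that numbering and checks that the two stems are Watson-Crick complementary, leaving A8 and A9 as the bulge. The pairing rule in `partner` is inferred from the base pairs named in the text and should be read as an assumption, as should all function names.

```python
STRAND_14 = "CGATATTAAGAGCC"  # 14-mer, numbered 1-14 (5' to 3')
STRAND_12 = "GGCTCAATATCG"    # 12-mer, numbered 15-26 (5' to 3')

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base(number):
    """Return the base at a 1-based nucleotide number (1-26)."""
    if number <= 14:
        return STRAND_14[number - 1]
    return STRAND_12[number - 15]


def partner(number):
    """Partner on the 12-mer for a 14-mer nucleotide, or None for the bulge."""
    if 1 <= number <= 7:      # stem carrying the protein: C1-G26 ... T7-A20
        return 27 - number
    if 10 <= number <= 14:    # other stem: G10-C19 ... C14-G15
        return 29 - number
    return None               # A8 and A9 are unpaired


# All nominal base pairs as (14-mer number, 12-mer number).
PAIRS = [(i, partner(i)) for i in range(1, 15) if partner(i) is not None]

if __name__ == "__main__":
    for i, j in PAIRS:
        print(f"{base(i)}{i}-{base(j)}{j}", COMPLEMENT[base(i)] == base(j))
```

Note that this only checks sequence complementarity; whether a given pair actually forms (T7–A20 does not, according to the data above) is an experimental question.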

Preparation of the complex

To ensure that a 1 : 1 ratio of protein to DNA was achieved in samples of the complex, small aliquots of lyophilized protein were added to a solution of the DNA, following the stoichiometry by using chemical shift changes of the imino signals in 1D NMR spectra recorded between each addition. Alternatively, aliquots of the DNA were lyophilized and added to a solution of protein. In this case, the titration was mainly followed using 2D HSQC spectra recorded between each addition, using the NɛH signal of Trp 43 and the NH signal of Gly 40 to monitor stoichiometry, as these show significant changes on complexation. Most of the temperature, pH, salt concentration, and DMSO tests were performed using a 15N HMG-D–12/14-mer sample at 1 mM. The preferred conditions for studying the complex by NMR were found to be 20 mM NaCl, 10 mM phosphate buffer (pH 6) at 30°C. (15N, 13C) HMG-D–12/14-mer samples in H2O and in 2H2O, at 1 mM and 0.8 mM, respectively, were used for the majority of NMR experiments. The (2H, 15N) HMG-D–12/14-mer sample was at 0.2 mM in the same buffer.

NMR spectroscopy

NMR spectra were recorded on Bruker DMX 600, AMX 500, and Avance 800 spectrometers, equipped with either a 5-mm quadruple-resonance (1H/15N/13C/31P) probe (500 MHz) or a 5-mm triple-resonance (1H/15N/13C) probe (600 and 800 MHz). Data were processed using the program XWIN-NMR (Bruker GmbH) and analyzed using the programs XWIN-NMR and Felix (Molecular Simulations). Homonuclear experiments (for the DNA assignment in both the free and bound states) included 2D (1H, 1H) NOESY, TOCSY, and COSY experiments in 2H2O (spectral widths of 6613.75 Hz in both dimensions, using selective irradiation during the relaxation delay to suppress the water signal), and 2D (1H, 1H) NOESY experiments in H2O (spectral widths of 12,500 Hz in both dimensions, using a water flip-back scheme to suppress the water signal). Heteronuclear experiments (for labeled protein in both the free and bound states) included 2D 15N-HSQC (generally acquired with spectral widths of 4000 Hz for 15N and 8012.82 Hz for 1H), 3D HNCA, 3D CBCA(CO)NH, 3D 15N NOESY-HSQC, 3D 13C NOESY-HSQC, and 3D HCCH-TOCSY (free protein only). In the 3D experiments, spectral widths were generally 8012.82 Hz for protons (9615.4 Hz in HNCA), 1600 Hz for 15N, while for 13C dimensions they were 3000 Hz (HNCA), 10563.66 Hz (CBCA(CO)NH), or 9615.4 Hz (13C NOESY-HSQC), respectively. 
The following half-filtered experiments were acquired: first, a 2D NOESY (τm = 150 msec) run at 600 MHz with half-filters set to reject 13C- and 15N-coupled protons during t1 and to accept 13C- and 15N-coupled protons during t2, with heteronuclear decoupling applied during t2 but not during t1; second, a 2D NOESY (τm = 150 msec) run at 800 MHz with half-filters set to accept 13C- and 15N-coupled protons during t1 and to reject 13C- and 15N-coupled protons during t2, and with heteronuclear decoupling applied during t1 but not during t2; third, a 3D 13C NOESY-HSQC experiment (τm = 150 msec) run at 800 MHz with a filter set to reject 13C- and 15N-coupled protons during t1 and no heteronuclear decoupling applied during t1. All spectra were acquired in phase-sensitive mode, and frequency discrimination in indirect dimensions was achieved using TPPI, States-TPPI, or echo-antiecho (for 15N or 13C dimensions with gradient selection). Water suppression was achieved by selective irradiation during the relaxation delay and during the mixing time in NOESY and HSQC-NOESY experiments. 1H, 13C, and 15N chemical shifts were referenced following the method described by Wishart et al. (1995), using sodium 3,3,3-trimethylsilylpropionate (TSP) as an internal 1H reference.

Assignment

Signals for both the free and bound protein were assigned using a combination of through-coupling and NOE-based experiments. For the free protein, almost all 1H and 15N signals from the box region were assigned straightforwardly by analogy with the previously reported assignments for HMG-D 1–74 (Jones et al. 1994), while the 13Cα and 13Cβ assignments were obtained by analyzing HNCA and CBCA(CO)NH spectra, with HCCH-TOCSY data being used to obtain other sidechain 13C assignments. Using this approach, most of the 13Cα and 13Cβ signals from the box region (all but six Cα and all but 16 Cβ signals) were assigned, as were the majority of those from the acidic tail (all but two Cα and two Cβ signals). In the basic region, the highly repetitive nature of the sequence made full analysis difficult, but even so, assignments were obtained for seven residues.

Spectra from through-coupling experiments contained substantially fewer peaks in the case of the complex than in that of the free protein (Fig. 1), forcing the assignment process for the complex to make substantially greater use of NOE-based experiments. For signals of the box region, assignment was accomplished using a combination of triple-resonance experiments, comparison of data between the free and bound forms, and analysis of NOE connectivities. In contrast, the substantially unfolded basic region and acidic tail gave signals with good sensitivity in the triple-resonance experiments even for the complex, but these suffered from spectral crowding, as they all resonated close to random coil chemical shifts. Overall, almost all amide group 1H and 15N assignments were made in the box region and acidic tail (but only two in the highly overlapped basic region), while for the remaining proton signals of the box region, ∼50% of assignments were made, with a lower proportion in the unfolded basic region and acidic tail. Carbon assignments were difficult to make, but around one-third of 13Cα signals from the box and basic region were assigned, together with all but two of the 13Cα and 13Cβ signals from the acidic tail.

Assignment of the DNA signals for the 12/14 oligonucleotide, both free and bound, followed an essentially conventional strategy. 2D (1H, 1H) NOESY, TOCSY, and COSY data recorded in 2H2O were used to establish the spin-systems of the bases and sugars (excluding most H5′/H5" signals) and to provide sequential relationships through connectivities linking H1′ and H2′/H2" sugar signals to aromatic protons on neighboring bases. Imino protons were assigned using 2D (1H, 1H) NOESY data recorded in H2O. As expected, base pairs C1–G26 and C14–G15 gave relatively weak and broad imino signals as a result of rapid exchange with solvent, presumably because of fraying of the ends of the helices. When assigning the DNA signals from the complex, a 2D (1H, 1H) NOESY spectrum recorded from an H2O solution of a sample of the complex containing perdeuterated protein proved extremely useful, as it showed connectivities for the bound DNA freed from overlap with protein signals.

Model structure calculations

Structures were calculated using the program XPLOR 3.851 (Brünger 1992) and refined using the program CHARMM 24 (MacKerell et al. 1998). Input to the XPLOR calculations, in addition to the primary sequences of the protein and two DNA chains, comprised 760 NOE-derived distance restraints for the free protein, 48 torsion angle and 10 backbone H-bonding restraints (for five hydrogen bonds) for the free protein, 227 NOE-derived distance restraints for the DNA in the complex, 39 base-pairing H-bond restraints (for 11 basepairs), a number of nonexperimental restraints on the DNA (detailed below), and a small number of intermolecular distance restraints. The NOE, torsion angle, and backbone H-bonding restraints for the free protein were essentially those used previously to calculate the free protein structure (Jones et al. 1994) but were modified both to delete all restraints on the conformation of sidechains likely to be involved in the protein-DNA interface (residues 12, 13, 32, 33) and to remove multiplicity corrections so as to make the input compatible with use of r−6 summation for restraints involving symmetrical and nonstereoassigned groups. The NOE restraints for the DNA were compiled using experimental data from the complex, in particular the 2D (1H, 1H) NOESY spectrum recorded from an H2O solution of a sample of the complex containing perdeuterated protein.
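The r−6 summation mentioned above replaces a set of distances to equivalent or nonstereoassigned protons by a single effective distance, (Σ r_i^−6)^(−1/6). The Python sketch below shows only this arithmetic under that definition; it is not the XPLOR implementation, and the function name and example distances are hypothetical.

```python
def r6_sum_distance(distances):
    """Effective distance (same units as input) from r^-6 summation.

    Each contributing proton-proton distance r_i adds r_i**-6 to the sum,
    so short distances dominate and the result is never longer than the
    shortest individual distance.
    """
    total = sum(r ** -6 for r in distances)
    return total ** (-1.0 / 6.0)


if __name__ == "__main__":
    # Hypothetical distances (in Å) from one proton to the three protons
    # of a methyl group.
    methyl = [3.0, 4.0, 5.0]
    print(round(r6_sum_distance(methyl), 3))
```

Because the summed quantity already accounts for the number of contributing protons, no separate multiplicity correction is applied to the restraint limit, which is why such corrections were removed from the input.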

These NOE-based DNA restraints were augmented by a number of nonexperimental restraints designed to improve convergence by restricting the search to reasonable regions of conformation space. Aside from the hydrogen bonding restraints (for G–C pairs, these were set to N1–N3 < 3.0 Å, H1–N3 < 2.0 Å, N2–O2 < 3.0 Å, and O6–N4 < 3.0 Å; for A–T pairs, these were set to N1–N3 < 3.0 Å, N1–H3 < 2.0 Å, and N6–O4 < 3.0 Å) applied to enforce Watson-Crick base pairing for the nucleotides of the stems (other than C1–G26, T7–A20, and C14–G15), these restraints were very loose and comprised the following: (i) Torsion angle restraints to keep backbone angles in the DNA stems (excluding T7 and A20) in the general region of B-form values. These were set to −110° ≤ α ≤ −30°, +145° ≤ β ≤ +195°, +35° ≤ γ ≤ +85°, −210° ≤ ɛ ≤ −130°, and −160° ≤ ζ ≤ −60°; note that these ranges are set so widely that no backbone angle in the double-helical DNA of the complexes with LEF-1 (Love et al. 1995), SRY (Werner et al. 1995), or HMG-D (Murphy et al. 1999) would violate them. (ii) Lower-limit restraints for 1,3-related phosphorus–phosphorus distances within the stems to prevent excessive kinking of the backbone. These were applied to the following phosphorus–phosphorus distances: G2–T4, A3–A5, T4–T6, A5–T7, G10–G12, A11–C13, G12–C14, G16–T18, C17–C19, A20–T22, A21–A23, T22–T24, A23–C25, and T24–G26. The restraints were set to allow a closest approach of 10.5 Å; for comparison, the minimum 1,3-related phosphorus–phosphorus distance observed in the very highly kinked DNA double helix of the TATA-box protein-DNA complex (Kim et al. 1993) was 10.8 Å. (iii) Lower-limit distance restraints to prevent the two DNA stems from approaching each other too closely. These were set to allow a closest approach of 9 Å and were applied to the following phosphorus–phosphorus distances: A3–T18, A3–C19, A3–A20, T4–T18, T4–C19, and T4–A20.
(iv) Restraints on internal torsion angles of the sugar rings within the stems (excluding T7 and A20) to produce approximate 2′-endo puckering; these were set to −30° ≤ ν0 ≤ −10°, +10° ≤ ν1 ≤ +50°, −50° ≤ ν2 ≤ −10°, 0° ≤ ν3 ≤ +40°, and −20° ≤ ν4 ≤ +20°. (v) Restraints to maintain planarity within the base pairs (but note that no such restraint was applied to T7 and A20). (vi) Upper-limit distance restraints between the tips of the nucleotides in or near the bulge to prevent these bases looping out completely; upper limits of 6 Å were set for the distances T6 C4 to T7 C4, T7 C4 to A8 N1, A8 N1 to A9 N1, A9 N1 to G10 N1, and A20 N1 to A21 N1. All of these distance and torsion angle restraints were applied in exactly the same manner as the NMR-based distance and torsion angle restraints, using the same force constants.
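Some of the backbone ranges listed under (i) cross the ±180° boundary (for example +145° ≤ β ≤ +195°), so a check against them has to treat angles as periodic. The Python sketch below tests a set of backbone angles against those ranges; the dictionary of ranges is copied from the text, while the function names and the example angle values are hypothetical and chosen only for illustration.

```python
# Backbone torsion ranges (degrees) as quoted in restraint set (i).
BACKBONE_RANGES = {
    "alpha": (-110.0, -30.0),
    "beta": (145.0, 195.0),
    "gamma": (35.0, 85.0),
    "epsilon": (-210.0, -130.0),
    "zeta": (-160.0, -60.0),
}


def in_range(angle, lower, upper):
    """True if `angle` (degrees, any periodic image) lies in [lower, upper]."""
    # Map the angle into the 360-degree window that starts at `lower`,
    # so that e.g. -170 and +190 are treated as the same angle.
    shifted = lower + (angle - lower) % 360.0
    return shifted <= upper


def violations(angles):
    """Return the names of the backbone angles that fall outside their range."""
    return [
        name
        for name, value in angles.items()
        if not in_range(value, *BACKBONE_RANGES[name])
    ]


if __name__ == "__main__":
    # Illustrative, roughly B-like values; not taken from the structures.
    example = {"alpha": -62.0, "beta": 176.0, "gamma": 48.0,
               "epsilon": -169.0, "zeta": -104.0}
    print(violations(example))
```

A β value reported as −170° is accepted by this check because it is the same angle as +190°, which lies inside the quoted range.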

The experimental NOE-based intermolecular restraints comprised the four distance restraints to the Thr 33 sidechain methyl group (see Table 1; set to 5 Å) and (in the final round of calculations) the restraint from the Ser 10 Hβ protons to T4 H4′. In addition, nonexperimental restraints involving the Sδ atom of Met 13 were applied; these comprised restraints to A5 N9 and T22 N1, and "ambiguous" restraints to either T6 N1 or T4 N1 and to either A21 N9 or A23 N9.

Starting structures were constructed by randomizing all rotatable backbone angles in the protein (φ and ψ angles for each residue) and DNA (α, β, γ, ɛ, and ζ angles for each nucleotide) chains and then placing the three chains with their centers of mass at the vertices of an equilateral triangle with sides of length 60 Å. These starting coordinates were subjected to a two-stage simulated annealing/refinement protocol based on that of Allain et al. (1996) for protein–RNA complexes but modified to constrain the three chains to their initial starting separations (by fixing the coordinates of atoms Ala 36 Cα, T7 C1′, and A20 C1′) during the first part of the high-temperature dynamics stage; this initial positioning and constraining of the three chains was found empirically to improve convergence. The numbers of steps in each phase of the simulated annealing protocol were set to 1000 (initial minimization), 15,000 (high-temperature dynamics with Ala 36 Cα, T7 C1′, and A20 C1′ constrained), 30,000 (high-temperature dynamics with chain separations unconstrained), 3000 (increase VDW force constant and adjust asymptote), 2000 (slow cool), and 1500 (minimization); for the refinement stage of the protocol, they were set to 1000 (initial minimization), 2000 and 2000 (two stages of RMD while increasing torsion angle force constant from 5.0 to 50 kcal mol−1), 600 (cool), and 4000 (final minimization). Using this protocol, an ensemble of 50 structures was calculated. For the DNA, parameter and topology files from the program CNS version 1.0 (Brünger et al. 1998) were used ("dna-rna-allatom.param" and "dna-rna-allatom.topol") so as to allow correct 2′-endo puckering to be achieved in the sugar rings (in conjunction with the appropriate torsion restraints already mentioned). Force constants were set to 50 kcal mol−1 for distances and torsion angles, and a weight of 200 was applied for base-pair planarity restraints.
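The initial placement of the three chains can be pictured with a few lines of geometry. The Python sketch below generates three points at the vertices of an equilateral triangle with 60 Å sides; placing the triangle in the xy-plane and centering it on the origin are arbitrary choices made here for illustration, and the function name is hypothetical.

```python
import math


def equilateral_triangle(side):
    """Vertices of an equilateral triangle in the xy-plane, centered on the origin."""
    radius = side / math.sqrt(3.0)  # circumradius of an equilateral triangle
    return [
        (
            radius * math.cos(math.radians(90.0 + 120.0 * k)),
            radius * math.sin(math.radians(90.0 + 120.0 * k)),
            0.0,
        )
        for k in range(3)
    ]


if __name__ == "__main__":
    # One vertex per chain: the protein and the two DNA strands.
    vertices = equilateral_triangle(60.0)
    for a in range(3):
        for b in range(a + 1, 3):
            print(a, b, round(math.dist(vertices[a], vertices[b]), 3))
```

Each chain's center of mass would then be translated to one of these points before the annealing run.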

Structures were selected for further analysis based on both low values of the NOE energy term (<400 kcal mol−1) and the value of the dihedral angle θ formed by atoms Ser 10 Cβ, Thr 33 Cγ2, Thy 10 H4′, and Thy 4 H4′. This dihedral spans almost the whole length of the protein–DNA interface and is diagnostic for correct alignment of the components; models having the protein correctly positioned into the minor groove are always associated with θ values in the range of approximately −35° to −55°. Using these criteria, 10 structures from a total of 50 were selected and used as input for refinement in the program CHARMM.
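The two selection criteria can be expressed as a simple filter. The Python sketch below computes a dihedral angle from four atomic positions and keeps models whose NOE energy and θ fall within the stated limits. Treating the approximate θ range as hard cutoffs is an assumption made here, and the function names, dictionary keys, and example values are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) defined by four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def select(models, noe_max=400.0, theta_range=(-55.0, -35.0)):
    """Keep models with NOE energy below noe_max and theta inside theta_range."""
    low, high = theta_range
    return [
        m for m in models
        if m["e_noe"] < noe_max and low <= m["theta"] <= high
    ]


if __name__ == "__main__":
    # Hypothetical models; energies in kcal/mol, theta in degrees.
    models = [
        {"name": "m01", "e_noe": 255.0, "theta": -44.0},
        {"name": "m02", "e_noe": 610.0, "theta": -41.0},
        {"name": "m03", "e_noe": 270.0, "theta": 95.0},
    ]
    print([m["name"] for m in select(models)])
```

In practice θ for each model would be obtained by passing the coordinates of the four defining atoms to `dihedral`.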

In the final stage of the refinement, the CHARMM 25A2 program (Brooks et al. 1983) was used with the PARM22 all-hydrogen force field (MacKerell et al. 1998). Distance-dependent dielectric (ɛ = 4r) was used to compensate for the effect of the neglected solvent. Nonbonded interactions were cut off at 8.5 Å with a shift function applied between 6.5 and 7.5 Å for the electrostatic and van der Waals terms. No NMR restraints were included. A harmonic restraint was applied to parts of the model: 8 kcal mol−1 Å−1 on protein backbone, 6 kcal mol−1 Å−1 on DNA except bulge, 2 kcal mol−1 Å−1 on bulge (nucleotides 7–10, 19, and 20). The force constants for these restraints were gradually reduced to zero using the approach of Afshar et al. (1994); this employs three cycles of steepest descent and ABNR minimization (Brooks et al. 1983). The models were further minimized to a constant gradient of 0.2 kcal mol−1 Å−1 using ABNR. Interaction energies (electrostatic and van der Waals) were calculated between each protein residue and the whole DNA.
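With a distance-dependent dielectric ɛ = 4r, the Coulomb energy between two charges falls off as 1/r² rather than 1/r, which crudely mimics solvent screening. The Python sketch below shows this scaling only; it omits the shift function and cutoffs described above, uses an approximate conversion factor of 332.06 kcal mol−1 Å e−2 (the exact constant used by CHARMM may differ slightly), and its function name is hypothetical.

```python
# Approximate electrostatic conversion factor, kcal mol^-1 Å e^-2 (assumed value).
COULOMB_KCAL = 332.06


def coulomb_rdie(q1, q2, r, scale=4.0):
    """Coulomb energy (kcal/mol) with a distance-dependent dielectric eps = scale * r.

    q1, q2 are charges in units of e and r is the separation in Å.
    No shift function or cutoff is applied in this sketch.
    """
    eps = scale * r
    return COULOMB_KCAL * q1 * q2 / (eps * r)


if __name__ == "__main__":
    # Opposite unit charges, e.g. a lysine ammonium and a phosphate oxygen,
    # at two hypothetical separations.
    for r in (4.0, 8.0):
        print(r, round(coulomb_rdie(1.0, -1.0, r), 3))
```

Doubling the separation reduces the energy fourfold under this model, compared with twofold for a constant dielectric.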

The program CLUSTERPOSE (Diamond 1992, 1995) was used to calculate average structures of ensembles of NMR structures and determine the mean rmsd of these ensembles to the mean structure, and bend angles in the DNA were measured using the program CURVES (Lavery and Sklenar 1989).

Electronic data deposition

NMR resonance assignments have been deposited at BioMagResBank under accession numbers 4732 (free protein), 4733 (free DNA), and 4734 (complex). The 10 final structures (refined by CHARMM) have been deposited in the Protein Data Bank under accession code 1E7J.

Acknowledgments

We thank Niko Goeke and Nick Lowe for helping to prepare the deuterated protein sample; Fareed Aboul-ela and Mohammad Afshar for generous assistance with setting up the X-PLOR and CHARMM calculations and for many helpful discussions; Graeme Mitchison for assistance with analysis software; and the National 800 MHz NMR facility, University of Cambridge (particularly Dani Nietlispach), for machine time, assistance with the 800 MHz NMR experiments, and many useful discussions. R.C. and D.P. held Marie Curie grants from the European community, contract numbers ERBFMBICT972482 and ERBFMBICT961777, respectively.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.

Abbreviations

  • ABNR, adopted-basis Newton-Raphson

  • COSY, correlation spectroscopy

  • FRET, fluorescence resonance energy transfer

  • HMG, high-mobility group

  • HSQC, heteronuclear single quantum correlation

  • IPTG, isopropyl-β,D-thiogalactopyranoside

  • NOE, nuclear Overhauser effect

  • NOESY, nuclear Overhauser effect correlation spectroscopy

  • rmsd, root mean squared deviation

  • TOCSY, total correlation spectroscopy

  • TPPI, time-proportional phase incrementation

To facilitate the distinction between protein and DNA, amino acid residues are referred to throughout using three-letter codes (except in the figures), whereas nucleotides are referred to using one-letter codes.

Article and publication are at www.proteinscience.org/cgi/doi/10.1110/ps.35501.

References

  1. Aboul-ela, F., Murchie, A.I.H., Homans, S.W., and Lilley, D.M.J. 1993. Nuclear magnetic resonance study of a deoxyoligonucleotide duplex containing a three base bulge. J. Mol. Biol. 229: 173–188.
  2. Afshar, M., Caves, L.S.D., Guimard, L., Hubbard, R.E., Calas, B., Grassy, G., and Haiech, J. 1994. Investigating the high-affinity and low sequence specificity of calmodulin binding to its targets. J. Mol. Biol. 244: 554–571.
  3. Allain, F.H.-T., Yen, Y.-M., Masse, J.E., Schultze, P., Dieckmann, T., Johnson, R.C., and Feigon, J. 1999. Solution structure of the HMG protein NHP6A and its interaction with DNA reveals the structural determinants for non-sequence-specific binding. EMBO J. 18: 2563–2579.
  4. Bianchi, M.E., Beltrame, M., and Paonessa, G. 1989. Specific recognition of cruciform DNA by nuclear protein HMG1. Science 243: 1056–1059.
  5. Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., and Karplus, M. 1983. CHARMM—A program for macromolecular energy minimization and dynamics calculations. J. Comput. Chem. 4: 187–217.
  6. Brünger, A.T. 1992. X-PLOR version 3.1: A system for crystallography and NMR. Yale University Press, New Haven, CT.
  7. Brünger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Kunstleve, R.W.G., Jiang, J.S., Kuszewski, J., Nilges, M., Pannu, N.S., et al. 1998. Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Cryst. 54D: 905–921.
  8. Chau, K.Y., Lam, H.Y.P., and Lee, K.L.D. 1998. Estrogen treatment induces elevated expression of HMG1 in MCF-7 cells. Exp. Cell Res. 241: 269–272.
  9. Chazin, W.J., Kordel, J., Thulin, E., Hofmann, T., Drakenberg, T., and Forsén, S. 1989. Identification of an isoaspartyl linkage formed upon deamidation of bovine calbindin D9k and structural characterization by 2D 1H NMR. Biochemistry 28: 8646–8653.
  10. Churchill, M.E.A., Jones, D.N.M., Glaser, T., Hefner, H., Searles, M.A., and Travers, A.A. 1995. HMG-D is an architecture-specific protein that binds to DNA containing the dinucleotide TG. EMBO J. 14: 1264–1275.
  11. Coste, F., Malinge, J.M., Serre, L., Shepard, W., Roth, M., Leng, M., and Zelwer, C. 1999. Crystal structure of a double-stranded DNA containing a cisplatin interstrand cross-link at 1.63 Å resolution: Hydration at the platinated site. Nucleic Acids Res. 27: 1837–1846.
  12. Diamond, R. 1992. On the multiple simultaneous superposition of molecular structures by rigid body transformation. Protein Sci. 1: 1279–1287.
  13. ———. 1995. Coordinate-based cluster analysis. Acta Cryst. 51D: 127–135.
  14. Dornberger, U., Hillisch, A., Gollmick, F.A., Fritzsche, H., and Diekmann, S. 1999. Solution structure of a five-adenine bulge loop within a DNA duplex. Biochemistry 38: 12860–12868.
  15. Dow, L.K., Jones, D.N.M., Wolfe, S.A., Verdine, G.L., and Churchill, M.E.A. 2000. Structural studies of the high mobility group globular domain and basic tail of HMG-D bound to disulphide cross-linked DNA. Biochemistry 39: 9725–9736.
  16. Gelasco, A. and Lippard, S.J. 1998. NMR solution structure of a DNA dodecamer duplex containing a cis-diammineplatinum(II) d(GpG) intrastrand cross-link, the major adduct of the anticancer drug cisplatin. Biochemistry 37: 9230–9239.
  17. Hardman, C.H., Broadhurst, R.W., Raine, A.R.C., Grasser, K.D., Thomas, J.O., and Laue, E.D. 1995. Structure of the A-domain of HMG1 and its interaction with DNA as studied by heteronuclear three- and four-dimensional NMR spectroscopy. Biochemistry 34: 16596–16607.
  18. He, Q., Liang, C.H., and Lippard, S.J. 2000. Steroid hormones induce HMG1 overexpression and sensitize breast cancer cells to cisplatin and carboplatin. Proc. Natl. Acad. Sci. 97: 5768–5772.
  19. Husain, I., Griffith, J., and Sancar, A. 1988. Thymine dimers bend DNA. Proc. Natl. Acad. Sci. 85: 2558–2562.
  20. Jamieson, E.R., Jacobson, M.P., Barnes, C.M., Chow, C.S., and Lippard, S.J. 1999. Structural and kinetic studies of a cisplatin-modified DNA icosamer binding to HMG1 domain B. J. Biol. Chem. 274: 12346–12354.
  21. Jones, D.N.M., Searles, M.A., Shaw, G.L., Churchill, M.E.A., Ner, S.S., Keeler, J., Travers, A.A., and Neuhaus, D. 1994. The solution structure and dynamics of the HMG-Box motif of HMG-D from Drosophila melanogaster. Structure 2: 609–627.
  22. Kim, J.L., Nikolov, D.B., and Burley, S.K. 1993. Co-crystal structure of TBP recognizing the minor-groove of a TATA element. Nature 365: 520–527.
  23. King, C.Y. and Weiss, M.A. 1993. The SRY high-mobility group box recognizes DNA by partial intercalation in the minor groove—A topological mechanism of sequence specificity. Proc. Natl. Acad. Sci. 90: 11990–11994.
  24. Koradi, R., Billeter, M., and Wüthrich, K. 1996. MOLMOL: A program for display and analysis of macromolecular structures. J. Mol. Graphics 14: 51–55.
  25. Lavery, R. and Sklenar, H. 1989. Defining the structure of irregular nucleic acids: Conventions and principles. J. Biomolec. Struct. Dynamics 6: 655–667.
  26. Lorenz, M., Hillisch, A., Payet, D., Buttinelli, M., Travers, A., and Diekmann, S. 1999. DNA bending induced by high mobility group proteins studied by fluorescence resonance energy transfer. Biochemistry 38: 12150–12158.
  27. Love, J.J., Li, X.A., Case, D.A., Giese, K., Grosschedl, R., and Wright, P.E. 1995. Structural basis for DNA bending by the architectural transcription factor LEF-1. Nature 376 791–795. [DOI] [PubMed] [Google Scholar]
  28. MacKerell Jr., A.D., Bashford, D., Bellott, M., Dunbrack Jr., R.L., Evanseck, J., Field, M.J., Fischer, S., Gao, J., Guo, H., Ha, S. et al. 1998. All-hydrogen empirical potential for molecular modeling and dynamics studies of proteins using the CHARMM22 force field. J. Phys. Chem. 102B: 3586–3616.
  29. Murphy IV, F.V. and Churchill, M.E.A. 2000. Nonsequence-specific DNA recognition: A structural perspective. Structure 8: R83–R89.
  30. Murphy, F.V., Sweet, R.M., and Churchill, M.E.A. 1999. The structure of a chromosomal high mobility group protein–DNA complex reveals sequence-neutral mechanisms important for non-sequence-specific DNA recognition. EMBO J. 18: 6610–6618.
  31. Nilges, M. 1995. Calculation of protein structures with ambiguous distance restraints: Automated assignment of ambiguous NOE cross peaks and disulfide connectivities. J. Mol. Biol. 245: 645–660.
  32. Ohndorf, U.-M., Rould, M.A., He, Q., Pabo, C., and Lippard, S.J. 1999. Basis for recognition of cisplatin modified DNA by high-mobility-group proteins. Nature 399: 708–712.
  33. Otting, G. and Wüthrich, K. 1990. Heteronuclear filters in two-dimensional (1H-1H) NMR spectroscopy: Combined use with isotope labelling for studies of macromolecular conformation and intermolecular interactions. Quart. Rev. Biophys. 23: 39–96.
  34. Pasheva, E.A., Pashev, I.G., and Favre, A. 1998. Preferential binding of high mobility group 1 protein to UV-damaged DNA: Role of the COOH-terminal domain. J. Biol. Chem. 273: 24730–24736.
  35. Payet, D. and Travers, A. 1997. The acidic tail of the high mobility group protein HMG-D modulates the structural selectivity of DNA binding. J. Mol. Biol. 266: 66–75.
  36. Payet, D., Hillisch, A., Lowe, N., Diekmann, S., and Travers, A. 1999. The recognition of distorted DNA structures by HMG-D: A footprinting and molecular modelling study. J. Mol. Biol. 294: 79–91.
  37. Pil, P.M. and Lippard, S.J. 1992. Specific binding of chromosomal protein HMG-1 to DNA damaged by the anti-cancer drug cisplatin. Science 256: 234–237.
  38. Sheflin, L.G. and Spaulding, S.W. 1989. High mobility group protein 1 preferentially conserves torsion in negatively supercoiled DNA. Biochemistry 28: 5658–5664.
  39. Sheflin, L.G., Fucile, N.W., and Spaulding, S.W. 1993. The specific interactions of HMG1 and 2 with negatively supercoiled DNA are modulated by their acidic C-terminal domains and involve cysteine residues in their HMG1/2 boxes. Biochemistry 32: 3238–3248.
  40. Sherman, S.E., Gibson, D., Wang, A.H., and Lippard, S.J. 1985. X-ray structure of the major adduct of the anticancer drug cisplatin with DNA. Science 230: 412–417.
  41. Travers, A. 2000. Recognition of distorted DNA structures by HMG domains. Curr. Opin. Struct. Biol. 10: 102–109.
  42. Walters, K.J., Matsuo, H., and Wagner, G. 1997. A simple method to distinguish intermonomer nuclear Overhauser effects in homodimeric proteins with C-2 symmetry. J. Amer. Chem. Soc. 119: 5958–5959.
  43. Wang, C.I. and Taylor, J.-S. 1991. Site-specific effect of thymine dimer formation on dAn-dTn tract bending and its biological implications. Proc. Natl. Acad. Sci. 88: 9072–9076.
  44. Webb, M. and Thomas, J.O. 1999. Structure-specific binding of the two tandem HMG boxes of HMG1 to four-way junction DNA is mediated by the A domain. J. Mol. Biol. 294: 373–387.
  45. Werner, M.H., Ruth, J.R., Gronenborn, A.M., and Clore, G.M. 1995. Molecular basis of human 46X,Y sex reversal revealed from the 3-dimensional solution structure of the human SRY–DNA complex. Cell 81: 705–714.
  46. Wishart, D.S., Bigam, C.G., Yao, J., Abildgaard, F., Dyson, H.J., Oldfield, E., Markley, J.L., and Sykes, B.D. 1995. 1H, 13C and 15N chemical shift referencing in biomolecular NMR. J. Biomolec. NMR 6: 135–141.

Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society
