Author manuscript; available in PMC: 2011 Aug 24.
Published in final edited form as: Nature. 2011 Jan 26;470(7335):498–502. doi: 10.1038/nature09775

Transient Hoogsteen Base Pairs in Canonical Duplex DNA

Evgenia N Nikolova 1, Eunae Kim 2, Abigail A Wise 1, Patrick J O’Brien 3, Ioan Andricioaei 2,*, Hashim M Al-Hashimi 1,*
PMCID: PMC3074620  NIHMSID: NIHMS262665  PMID: 21270796

Abstract

Sequence-directed variations in the canonical DNA double helix structure that retain Watson-Crick base-pairing play important roles in DNA recognition, topology, and nucleosome positioning. By using nuclear magnetic resonance relaxation dispersion spectroscopy in concert with steered molecular dynamics simulations, we have observed transient sequence-specific excursions away from Watson-Crick base-pairing at CA and TA steps inside canonical duplex DNA towards low-populated and short-lived A•T and G•C Hoogsteen base-pairs. The observation of Hoogsteen base-pairs in DNA duplexes specifically bound to transcription factors and in damaged DNA sites implies that the DNA double helix intrinsically codes for excited state Hoogsteen base-pairs as a means of expanding its structural complexity beyond that which can be achieved based on Watson-Crick base-pairing. The methods presented here provide a new route for characterizing transient low-populated nucleic acid structures, which we predict will be abundant in the genome and constitute a second transient layer of the genetic code.


Soon after its discovery1, it was recognized that the DNA double helix could accommodate a range of conformations that retain Watson-Crick (WC) base-pairing2. Sequence-directed variations in duplex DNA structure, shape, and flexibility have since been shown to play fundamental roles in biology including in the indirect readout of DNA sequences by recognition factors3,4, nucleosome positioning4,5, and formation of loops and other large-scale architectures6 involved in DNA packaging, replication, transcription, and recombination. DNA duplexes resiliently maintain WC base-pairing even when supercoiled and wrapped around histone octamers in nucleosomes7 and when adopting left-handed double-helical conformations known as Z-DNA8. Deviations from WC base-pairing have so far only been observed in duplex DNA bound to proteins9,10 and small molecule ligands11,12 and in the context of damaged DNA13,14, but never within naked canonical B-DNA duplexes.

Thus far, atomic resolution structural studies of the iconic DNA double helix have exclusively focused on its dominant, experimentally accessible, ground state conformation. Far less is known about other low-energy DNA conformations that may be sampled only transiently in solution. NMR relaxation dispersion experiments15,16 provide a rare opportunity to detect and characterize such short-lived (<5 ms) and low-populated species (>0.1%), often referred to as “excited states”. This methodology has been widely used to characterize protein excited states that have been implicated in folding15,16, recognition17, and catalysis18, culminating in the recent structure determination of a transient protein-folding intermediate19. Recent advances in carbon-based relaxation dispersion experiments combined with selective labeling schemes have addressed limitations that have hindered application of this methodology to nucleic acids, allowing detection of excited states in both RNA20,21 and DNA21,22. However, the structures of nucleic acid excited states remain elusive.

Relaxation dispersion reveals base-pair specific excited states in CA steps of duplex DNA

We used a recently introduced carbon R1ρ relaxation dispersion NMR experiment that allows detection of excited states with enhanced timescale sensitivity21 to probe for the existence of excited states in canonical DNA duplexes (Fig. 1a). We uncovered chemical exchange processes directed towards excited states occurring specifically at A•T and G•C base-pairs in CA/TG steps (Fig. 1b), which together with TA steps are the most flexible dinucleotide steps in DNA and frequently the confluence point for local structural deformations23. In particular, we observed significant R1ρ relaxation dispersion, indicative of chemical exchange, at the base C8 and sugar C1’ carbons of the adenine and guanine residues, and at the C6 carbon of the cytosine residue. No relaxation dispersion was observed for the adenine base C2, the thymine residue, or the cytosine sugar C1’ sites (Fig. 1b, Supplementary Fig. 3). The exchange process is enhanced by longer 3’ neighboring A-tracts, modulated by positional context, and, for the G•C base-pair, highly pH dependent (Supplementary Fig. 2). A two-state analysis (A ⇌ B, with forward and reverse rate constants kA and kB) of the off-resonance relaxation dispersion data (Fig. 1c, Supplementary Fig. 2) revealed a single base-pair exchange process that is slightly faster for A•T than for G•C. This process is directed towards minutely populated excited states (pB ~ 0.64 % and ~ 0.47 % for G•C and A•T, respectively, at 26 °C). These states have exceptionally short lifetimes (τB = 1/kB ~ 1.5 ms and ~0.3 ms for G•C and A•T, 26 °C) and downfield-shifted carbon chemical shifts (ΔωAB(C8) ~ 2.7 – 3.2 ppm, ΔωAB(C1’) ~ 3.1 – 3.7 ppm, and ΔωAB(C6) ~ 2.2 ppm, Supplementary Table 2).
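The population and lifetime quoted above fix the two-state rate constants through the relations given in Methods: kB = 1/τB, kex = kA + kB, kA = pB·kex and kB = pA·kex. The short Python sketch below converts a population and a lifetime into rate constants. It uses the approximate 26 °C values from this paragraph as inputs. The function name is illustrative, and rounding in the quoted inputs carries through to the outputs.

```python
def two_state_rates(p_b: float, tau_b_s: float) -> dict:
    """Rate constants for A <-> B from the minor-state population pB and lifetime tauB (s).

    Uses kB = 1/tauB, kB = pA*kex and kA = pB*kex, with pA = 1 - pB.
    """
    if not 0.0 < p_b < 1.0:
        raise ValueError("p_b must lie strictly between 0 and 1")
    if tau_b_s <= 0.0:
        raise ValueError("tau_b_s must be positive")
    p_a = 1.0 - p_b
    k_b = 1.0 / tau_b_s   # reverse rate, excited -> ground (s^-1)
    k_ex = k_b / p_a      # total exchange rate kA + kB (s^-1)
    k_a = p_b * k_ex      # forward rate, ground -> excited (s^-1)
    return {"kA": k_a, "kB": k_b, "kex": k_ex}


if __name__ == "__main__":
    # Approximate 26 degC values from the text: (label, pB, tauB in seconds)
    for label, p_b, tau_b in [("G•C", 0.0064, 1.5e-3), ("A•T", 0.0047, 0.3e-3)]:
        rates = two_state_rates(p_b, tau_b)
        print(f"{label}: kA = {rates['kA']:.1f} s^-1, kex = {rates['kex']:.0f} s^-1")
```

With these inputs the forward rates come out at roughly 4 s⁻¹ (G•C) and 16 s⁻¹ (A•T). Both fall within the ~4–20 s⁻¹ range of forward rates listed in Supplementary Table 2.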

Figure 1. Detection of base-pair specific excited states in CA/TG steps of duplex DNA.

Figure 1

a, DNA constructs containing varying-length A-tracts with color-coded A•T and G•C base-pairs at CA/TG steps that show carbon chemical exchange. b, On-resonance 13C R1ρ relaxation dispersion profiles for A•T (26.0 °C) and G•C (30.5 °C) showing CA/TG-specific chemical exchange at purine base C8 and sugar C1’ and at cytosine base C6. Shown are the best base-pair global fits (solid line) to a two-state asymmetric exchange model (Supplementary Eq.1). c, Representative off-resonance relaxation dispersion profiles for corresponding C1’ sites and best global fits as in b. All error bars represent experimental uncertainty (one s.d.) estimated from mono-exponential fitting of duplicate sets of R1ρ data.

Chemical shift and kinetic-thermodynamic assignment of excited state Hoogsteen base-pairs

What is the excited state encoded by CA/TG steps and detected by NMR relaxation dispersion? NMR imino proton exchange measurements24,25 and computer simulations26 have previously shown that WC base-pairs can spontaneously break open and access extrahelical conformations. The forward exchange rates (kA) measured by relaxation dispersion (~ 4 – 20 s−1 at 26 °C, Supplementary Table 2) are within an order of magnitude of rates reported previously for base-pair opening (~ 40 – 400 s−1 at 25 °C)27. However, the excited states detected here are at least three orders of magnitude more populated (Supplementary Table 2), pointing to more energetically favorable species. To gain further insight into the excited state, we measured carbon R1ρ relaxation dispersion as a function of temperature (Fig. 2a). We then used transition-state theory and van’t Hoff analysis to extract a complete thermodynamic-kinetic description of the two-state equilibria (Supplementary Table 5). Semi-logarithmic van’t Hoff plots revealed a linear dependence characteristic of a two-state process for both A•T and G•C base-pairs (Fig. 2b). The analysis yielded activation free energies (~16 kcal/mol) and enthalpies (~12 – 26 kcal/mol) for the forward transition (Fig. 2c). These values are comparable to those reported previously for base-pair opening (~14 – 25 kcal/mol and ~8 – 29 kcal/mol, respectively)27,28. Thus, the transition to the excited state entails disruption of stacking and hydrogen-bonding interactions in the WC base-pair. However, for both A•T and G•C, this enthalpy loss is almost entirely recovered once the excited state is formed. In fact, the excited state is destabilized relative to the WC ground state in part by a less favorable entropy. This implies a more “rigid” excited state conformation, counter to what would be expected for a flexible looped-out state.
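Transition-state theory links each rate constant to an activation free energy through the Eyring equation, here with transmission coefficient κ = 1 as assumed in Methods. As a rough single-temperature check, the Python sketch below converts the 4 and 20 s⁻¹ ends of the 26 °C forward-rate range into ΔG‡. It is not a substitute for the full van’t Hoff fit described in Methods, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3        # gas constant, kcal mol^-1 K^-1
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J K^-1
H_PLANCK = 6.62607015e-34   # Planck constant, J s


def eyring_delta_g(rate_s: float, temp_c: float, kappa: float = 1.0) -> float:
    """Activation free energy (kcal/mol) from k = kappa*(kB*T/h)*exp(-dG/(R*T))."""
    temp_k = temp_c + 273.15
    attempt_freq = kappa * K_BOLTZMANN * temp_k / H_PLANCK  # s^-1, about 6e12 near room temperature
    return -R_KCAL * temp_k * math.log(rate_s / attempt_freq)


if __name__ == "__main__":
    for k_a in (4.0, 20.0):  # ends of the forward-rate range quoted at 26 degC
        print(f"kA = {k_a:>4} s^-1 -> dG_act ~ {eyring_delta_g(k_a, 26.0):.1f} kcal/mol")
```

Both ends of the range give values near 16 kcal/mol (about 15.7 and 16.7 kcal/mol), in line with the activation free energy quoted above.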

Figure 2. Kinetic-thermodynamic analysis of ground-to-excited state transitions.

Figure 2

a, Representative on-resonance 13C R1ρ relaxation dispersion profiles as a function of temperature for A16 (A6-DNA and A4-DNA), A3 (A2-DNA), and G10 (A6-DNA) C1’. b, Modified van’t Hoff plots showing the temperature dependence of the forward (kA) and reverse (kB) rate constants for the two-site exchange in the A•T and G•C base-pairs highlighted in Fig. 1a. Error bars represent experimental uncertainty (one s.d.) as determined from propagation of errors obtained from mono-exponential fitting of duplicate sets of R1ρ data. c, Corresponding kinetic-thermodynamic profiles for exchange between the Watson-Crick (WC) ground state and the excited state (ES) via a transition state (‡), showing activation and net changes in free energy (ΔG), enthalpy (ΔH), and entropy (TΔS) (referenced to 0).

Taken together, our data point to an excited state conformation whose formation requires complete disruption of the WC base-pair, but whose thermodynamic stability is comparable to that of a WC base-pair. One possibility is that the excited state represents an alternative base-pair. Here, the correlated exchange at purine C8 and C1’ nuclei and the large downfield carbon chemical shifts provide important clues. In particular, the magnitude and direction of ΔωAB(C8) and ΔωAB(C1’) are strongly indicative of an anti-to-syn transition, as deduced from a survey of carbon chemical shifts29 and density functional theory (DFT) calculations30. Remarkably, an anti-to-syn rotation of the adenine or guanine base results in a Hoogsteen (HG) base-pair optimally stabilized by two hydrogen bonds (Fig. 3a). The HG G•C base-pair would require protonation of cytosine N3 (G•C+; pKa(N3) ~ 4.2, ref. 31). This can in turn explain three observations: the relaxation dispersion at cytosine C6 at neutral and acidic pH, the downfield-shifted excited state C6 chemical shift, and the pH dependence of dispersion at carbon sites in the G•C but not the A•T base-pair (see Supplementary Discussion). HG base-pairs have been widely observed in non-canonical DNA structures, such as DNA triplexes and quadruplexes, where they are specifically recognized by proteins32. In a few cases, HG base-pairs have also been observed in duplex DNA containing alternating AT repeats33 or in complex with ligands11,12 and proteins. Examples include the active site of DNA polymerase-ι34 and a complex between TATA binding protein (TBP) and a mutant TATA-box DNA9, in which HG G•C+ base-pairs are formed at near-neutral pH.
Indeed, the free energy differences measured between the ground and excited states (3.0 – 3.5 kcal/mol for A•T, 2.9 kcal/mol for G•C+) compare well with estimates for the stability of HG base-pairs in an all-anti triplex relative to duplex WC base-pairs (3.2 – 3.7 kcal/mol for A•T, 3.1 – 4.2 kcal/mol for G•C+)35. An excited-state HG base-pair can explain the lower enthalpy and lower entropy of the excited state relative to a base-pair open state. Its formation requires a ~180° base rotation around the glycosidic linkage and disruption of the WC base-pair, consistent with the measured transition state barriers. These excited state HG base-pairs may have evaded detection by prior solvent exchange measurements for two reasons. First, the thymine imino proton in A•T is protected by hydrogen bonding and/or the guanine imino proton in G•C+ is inaccessible. Second, their exchange rates are slower. For example, a non-hydrogen-bonded G H1 resonance has been observed in a G(syn)•AH+(anti) mismatch base-pair at 20 °C inside a DNA duplex36. This observation supports slow solvent exchange of imino protons in such syn-purine base-pairs.
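At equilibrium pB/pA = kA/kB, so the net free energy of the excited state can also be estimated directly from its population as ΔG = −RT ln(pB/pA). The Python sketch below applies this relation to the approximate 26 °C populations quoted earlier. It is a single-temperature consistency check only, because the values above come from the temperature-dependent analysis. The function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def excited_state_delta_g(p_b: float, temp_c: float) -> float:
    """Free energy of excited state B relative to ground state A, dG = -RT ln(pB/pA), in kcal/mol."""
    if not 0.0 < p_b < 1.0:
        raise ValueError("p_b must lie strictly between 0 and 1")
    temp_k = temp_c + 273.15
    return -R_KCAL * temp_k * math.log(p_b / (1.0 - p_b))


if __name__ == "__main__":
    for label, p_b in [("A•T", 0.0047), ("G•C", 0.0064)]:
        print(f"{label}: dG ~ {excited_state_delta_g(p_b, 26.0):.1f} kcal/mol")
```

The populations give about 3.2 kcal/mol for A•T and 3.0 kcal/mol for G•C, close to the measured free energy differences.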

Figure 3. Chemical shift assignment of excited state Hoogsteen base-pairs.

Figure 3

a, Chemical structures for WC and HG A•T and G•C+ base-pairs. The HG geometry can be achieved by purine rotation around the glycosidic bond (χ) and base-flipping (θ), affecting simultaneously C8 and C1’ (yellow). b, Depiction of C8 and C1’ chemical shifts relative to WC for the excited state (ES, grey); an N1-methyladenine modified A6-DNA (HG(1mA), red), an N1-methylguanine modified A6-DNA (HG(1mG), violet), and an echinomycin-bound DNA (HG(drug), green) forming HG base-pairs; a simulated A16•T9 HG base-pair in A6-DNA (HG(MD), blue); a C3’-endo locked A6-DNA (WC(LNA), orange); and representative cartoons.

Trapping HG base-pairs in duplex DNA

To further test for the existence of HG base-pairs, we used chemical modifications to trap the excited state HG base-pair within duplex DNA. This allowed us to directly compare the carbon chemical shift signatures of the trapped HG base-pair with those measured for the excited state by relaxation dispersion. We installed an N1-methylated adenine (1mA) at the CA step. 1mA is a common DNA lesion known to sterically impair WC base-pairing and favor HG base-pairing13,37. This trapped the HG A•T base-pair in the A6-DNA duplex, as confirmed by analysis of NOE connectivity and proton chemical shifts (Supplementary Fig. 5). Using instead the structural analog of 1mA, an N1-methylated guanine (1mG), at the same CA step, we also trapped an HG G•C+ base-pair, which had not been observed previously (Supplementary Fig. 5). The ability to independently trap HG A•T or G•C+ base-pairs is consistent with the relaxation dispersion data showing that transitions to excited state A•T and G•C HG base-pairs within CA steps are semi-independent of one another. Strikingly, the differences in carbon chemical shift measured between modified and unmodified DNA were in excellent agreement with the chemical shift differences measured between the ground and excited state by relaxation dispersion (Fig. 3b). The trapped 1mA shows a noticeably more downfield chemical shift than the unmodified excited state HG base-pair. This could be attributed to changes in the electronic environment from the added methyl group and positive charge (not present in 1mG), as supported by DFT calculations (Supplementary Information), and possibly to slight differences in HG geometry. Conversely, resonances that showed small differences in carbon chemical shift between modified and unmodified constructs (Supplementary Table 4) exhibited little to no carbon chemical exchange (Supplementary Fig. 3).
Our data rule out other non-canonical base-pairing modes involving pyrimidine base rotation, as well as sugar repuckering from C2’-endo to C3’-endo, which yields upfield rather than downfield nucleobase carbon chemical shifts (Fig. 3b, Supplementary Fig. 4). Thus, comparison of known HG ground state chemical shifts with the excited state chemical shifts strongly supports assignment of the excited state as an HG base-pair.

As an inverse experiment, we asked whether TA steps, which have also been observed to form HG base-pairs in duplex DNA bound to small molecule ligands11,12, exhibit the characteristic HG excited state. We measured carbon relaxation dispersion data for a palindromic DNA sequence that has previously been shown to form tandem A•T HG base-pairs in solution38 when in complex with the bis-intercalating antibiotic echinomycin (Fig. 3b). Strikingly, we observe the same chemical exchange pattern in the TA step. The carbon chemical shift differences between ground and excited state also agree closely with those between the free (WC) DNA and drug-bound (HG) DNA (Fig. 3b, Supplementary Fig. 6). Thus, excited state HG base-pairs are not restricted to CA/TG steps but also occur at TA steps, and they can be conformationally captured by recognition factors. In fact, our recent observation of chemical exchange in a TA step containing a 1,N6-ethenoadenine adduct21 could potentially be explained by transient anti-to-syn excursions as in the HG base-pair. Moreover, TBP binds an HG-containing mutant TATA box, where a WC base-pair would not be tolerated, more weakly than the wild-type sequence with all-WC base-pairing (ΔΔG ~3 kcal/mol)39. This weaker affinity could be attributed to conformational selection of a low-populated HG base-pair and could be further correlated with transcriptional regulation of gene repression40.

NMR-informed simulations of the WC-to-HG transition

To assess the energetic and stereochemical feasibility of the proposed WC-to-HG transition, and to gain insight into the transition pathway, we used conjugate peak refinement (CPR)41 methods. These simulated multiple transition pathways between WC and excited state HG base-pairs that sample various glycosidic (χ) and base opening (θ) angles (Fig. 4a). CPR trajectories for A16•T9 and G10•C15 in A6-DNA showed smooth WC-to-HG transitions via anti-to-syn purine rotation around χ, accompanied by minor adjustments in the complementary pyrimidine residue and neighboring WC base-pairs (Fig. 4a, Supplementary Fig. 7 and Movies). Optimal pathways feature purine base-flips into the major groove at low base-pair opening angles. Their transition states, in which the purine base is near-orthogonal to its pyrimidine partner, require complete disruption of WC base-pairing, as also observed experimentally by relaxation dispersion. This unusual geometry is well accommodated by the rather large inter-base-pair spacing in the B-form helix (i.e. base-pair rise ~3.3 Å) and its ability to mold without loss of other WC base-pairing. Although a large spread of transition barriers is sampled, the minimal energy barriers and net energy changes are within 2 kcal/mol of the NMR-derived ensemble enthalpy terms (Fig. 4b). The final HG state for G10•C15 yields generally lower relative CPR energies. It adopts a variable syn-guanine geometry with either one or no optimal hydrogen bonds to the cytosine, because the N3 protonation that stabilizes the ideal HG base-pair is absent. On occasion it instead forms an intraresidue hydrogen bond to a backbone oxygen (Supplementary Movie 2).

Figure 4. Watson-Crick to Hoogsteen base-pair transition simulations.

Figure 4

a, Pseudo-free-energy (E, kcal/mol) contour plots as a function of (χ, θ) pairs for A16•T9 obtained from multiple CPR trajectories (a–d). b, Initial WC and final excited-state (ES) HG structures and representative lowest-energy (b1) and highest-energy (c3) transition state (‡) structures of A16•T9, illustrating the span of CPR transition barriers. Their relative potential energies (averaged for WC and HG) are compared with enthalpies (H) derived from NMR R1ρ relaxation dispersion for chemical exchange or from NMR imino proton exchange for A•T base-pair opening27. c, Snapshots from a representative CPR transition pathway (a1) for A16•T9.

The carbon chemical shifts computed for the WC and lowest-energy HG geometries using DFT also exhibited the characteristic downfield-shifted ΔωAB(C8) and ΔωAB(C1’) observed experimentally by relaxation dispersion (Fig. 3b, Supplementary Table 4). Conversely, the predicted chemical shifts revealed little and/or random variations between WC and HG base-pairs for adenine C2, thymine C6, and cytosine C1’, consistent with the lack of observable relaxation dispersion at those sites (Supplementary Fig. 8). Comprehensive DFT calculation of carbon chemical shifts for conformers sampled in various pathways between WC and HG states (Supplementary Fig. 8) yielded a small number of geometries that match the measured excited-state chemical shifts, but that could readily be excluded because they involve barrierless transitions and/or represent high-energy structures that disagree with experimentally derived enthalpies.

Our data strongly argue that the DNA double helix codes for a pre-existing WC-to-HG equilibrium, with HG base-pairs representing an accessible and energetically competent alternative to WC base-pairing that present very distinct electrostatic and hydrophobic signatures. This makes it possible to trap HG base-pairs by interactions with cellular triggers, thereby expanding the structural and functional diversity of the double helix beyond that which can be achieved based on an alphabet of only WC base-pairing. There are several examples of transcription factors including TBP9 and p53 tumor suppressor10 that specifically recognize HG base-pairs embedded in different WC contexts, where the modulation in binding affinity, conceivably, by an HG base-pair could even be correlated with an essential biological function40. In addition, HG base-pairs are often trapped by oxidative and alkylation lesions14, providing unique recognition signals for repair enzymes in search of damage sites in a sea of undamaged DNA. Transient formation of HG base-pairs inside B-DNA may also serve to promote non-canonical structures such as contiguous HG motifs, especially in tandem CA and TA repeats, or more dramatic transformations to Z-DNA42, and may well exist in much greater abundance for native genomic DNA, which is under torsional stress in the cellular environment. The methods presented here provide a general strategy for detecting and characterizing excited states in DNA and RNA, which we predict will be abundant in the genome and constitute another transient layer of the genetic code.

METHODS SUMMARY

Detailed methods on DNA sample preparation and assignment, NMR relaxation dispersion data collection and analysis, MD/CPR simulations, and DFT chemical shift calculations can be found in Methods.

Full Methods and associated references are available in the online version of the paper at www.nature.com/nature.

METHODS

Preparation and NMR resonance assignment of unlabeled and 13C/15N-labeled DNA

Isotopically labeled DNA dodecamers (Fig. 1a) were synthesized by in vitro primer extension using a template hairpin DNA (IDT, Inc.), Klenow fragment DNA polymerase (NEB, Inc.), and uniformly 13C/15N-labeled dNTPs (Isotec, Sigma-Aldrich). Single-stranded DNA products were purified by 20% denaturing polyacrylamide gel electrophoresis (PAGE), isolated by passive elution and desalted on a C18 reverse-phase column (Sep-pak, Waters). Oligonucleotides were lyophilized and resuspended in NMR buffer (15 mM sodium phosphate pH 6.8, 25 mM sodium chloride, 0.1 mM EDTA, 10% D2O). Complementary oligonucleotides were annealed at an equimolar ratio, typically at 0.5–1.0 mM for NMR studies. Unlabeled DNA oligonucleotides were purchased from IDT, Inc. (A2-DNA, A4-DNA, A6-DNA in Fig. 1a, and E-DNA in Fig. S6), Exiqon A/S (A6-DNAA16LNA and A2-DNAA16LNA in Fig. S4) and Midland Certified, Inc. (A6-DNA1mA16 and A6-DNA1mG10 in Fig. S5). Unlabeled DNA constructs, including samples equivalent to the 13C/15N-labeled DNA, were prepared as described21 at 2.0–4.0 mM concentrations and assigned using conventional 1H-1H NOESY in 10% D2O at 5 °C and/or 26 °C. The 2:1 complex between E-DNA and echinomycin (Selleck Chemicals) was prepared as previously described38. All NMR experiments were performed on a Bruker Avance 600 MHz NMR spectrometer equipped with a 5 mm triple-resonance cryogenic probe.

Selective 13C R relaxation dispersion

Rotating-frame (R1ρ) relaxation dispersion profiles were measured at a single field (14.1 T) using a selective carbon experiment with a 1D acquisition scheme21. Relative to conventional 2D relaxation dispersion methods, this scheme extends the sensitivity to chemical exchange into millisecond timescales. On-resonance data were recorded at variable (100 to 3500 Hz) effective spinlock field strengths (ωeff) (Supplementary Tables 1 and 3) for various sites in 13C/15N-labeled and unlabeled DNA constructs. For 13C/15N-labeled DNA samples (Supplementary Fig. 2) and the E-DNA octamer (Supplementary Fig. 6), off-resonance dispersion data were collected as a function of temperature at various spinlock offset frequencies (Ωeff) and at three to four different spinlock powers (ω1) (Supplementary Tables 1 and 3). In each case, the following relaxation delays were used: {0, 4, 8, 12 (2X), 16, 20, 26, 32 (2X) ms} for C2/C6/C8 and {0, 4, 8, 12 (2X), 18, 26, 34, 42 (2X) ms} for C1’ in 13C/15N-labeled DNA constructs; {0, 40(2X) ms} for C8 and {0, 48(2X) ms} for C1’ at 17 °C, {0, 42(2X) ms} for C8 (17.0 °C) and {0, 60(2X) ms} for C1’ at 26 °C in E-DNA; {0, 30(2X) ms} for C8 (17.0 °C) in A6-DNAA16LNA. Data points corresponding to Hartmann-Hahn matching conditions were omitted from the data fits as previously described21. Data were processed using nmrPipe43. The effective transverse relaxation rates (R2eff = R2 + Rex) were computed by fitting the resonance intensities to monoexponential decays using Mathematica 6.0 (Wolfram Research, Inc., Champaign, IL).
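The conversion of peak intensities into a relaxation rate can be sketched as follows. This minimal standard-library Python example is not the Mathematica procedure used here. It swaps in a log-linear least-squares fit of ln I versus delay, which weights points differently from a direct nonlinear fit of the decay. The delay list reuses the C2/C6/C8 schedule above, with duplicated points entered twice, and the intensities are noise-free synthetic values (hypothetical).

```python
import math
from typing import Sequence


def fit_decay_rate(delays_s: Sequence[float], intensities: Sequence[float]) -> float:
    """Rate R (s^-1) for I(t) = I0*exp(-R*t), from a least-squares line through (t, ln I)."""
    if len(delays_s) != len(intensities) or len(delays_s) < 2:
        raise ValueError("need at least two matching (delay, intensity) points")
    if any(i <= 0 for i in intensities):
        raise ValueError("intensities must be positive to take logarithms")
    n = len(delays_s)
    logs = [math.log(i) for i in intensities]
    t_mean = sum(delays_s) / n
    y_mean = sum(logs) / n
    s_tt = sum((t - t_mean) ** 2 for t in delays_s)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(delays_s, logs))
    if s_tt == 0.0:
        raise ValueError("delays must not all be identical")
    return -s_ty / s_tt  # the slope of ln I versus t equals -R


if __name__ == "__main__":
    # C2/C6/C8 delay schedule from the text (ms -> s); duplicated delays appear twice
    delays = [d / 1000 for d in (0, 4, 8, 12, 12, 16, 20, 26, 32, 32)]
    synthetic = [1000.0 * math.exp(-25.0 * t) for t in delays]  # toy data with R = 25 s^-1
    print(f"R1rho ~ {fit_decay_rate(delays, synthetic):.2f} s^-1")
```

With real, noisy data a weighted or nonlinear fit is generally preferable, since taking logarithms inflates the noise of weak late-delay points.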

Measured relaxation dispersion profiles with on- and off-resonance data were fit using Mathematica 6.0 (Wolfram Research, Inc., Champaign, IL) to Eq. 1 (below). Eq. 1 assumes a two-state equilibrium (A ⇌ B, with forward rate constant kA and reverse rate constant kB) with an asymmetric population distribution (pA ≫ pB)44:

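$$R_{1\rho} = R_1\cos^2\theta + R_2\sin^2\theta + \frac{\sin^2\theta\, p_A p_B\, \Delta\omega_{AB}^2\, k_{ex}}{\Omega_B^2 + \omega_1^2 + k_{ex}^2} \qquad (1)$$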

where R1 and R2 are the intrinsic longitudinal and transverse relaxation rates, respectively (assumed to be identical for the A and B species). ΩA and ΩB are the resonance offsets of the two states from the spinlock carrier, and ω1 is the spinlock strength. The tilt angle of the effective field is θ = arctan(ω1/Ωave), with ΔωAB = ΩB − ΩA and Ωave = pAΩA + pBΩB. Here pA (pB) is the major (minor) state fractional population (pA + pB = 1). kex = kA + kB is the exchange rate constant for the two-state equilibrium, where kA = pBkex and kB = pAkex are the forward and reverse rate constants, respectively. Similar results were obtained with more complex two-state exchange models that do not assume a skewed population distribution, including the Laguerre approximation45 (data not shown). Statistical analysis indicated that applying the more complex model to extract exchange parameters was not justified here. Temperature-dependent data for base/sugar resonances within the same nucleotide or base-pair were fit individually and globally with shared parameters (kex and pB for each temperature) (Supplementary Tables 2 and 3). The best-fit model was selected using an F-test (data not shown). The test compares chi-square (χ2) values against the F-distribution to determine whether a more complex model (i.e. individual fits) is statistically justified over a simpler model nested within it (i.e. shared-parameter fits). The chemical shift difference ΔωAB was assumed to be invariant over the narrow temperature range investigated. On-resonance R1ρ profiles were fit by a simplified two-state fast-exchange expression44 (kex ≫ ΔωAB):

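$$R_{1\rho} = R_2 + R_{ex} = R_2 + \frac{\Phi_{ex}\, k_{ex}}{\omega_1^2 + k_{ex}^2}, \qquad \Phi_{ex} = p_A p_B\, \Delta\omega_{AB}^2 \qquad (2)$$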

, where all parameters are as described above.
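Eqs. 1 and 2 translate directly into code. The Python sketch below evaluates both, assuming all frequencies are in rad/s and measured from the spinlock carrier. The tilt angle uses atan2, so the on-resonance case (Ωave = 0, θ = 90°) needs no special handling, and Ωave is taken as the population-weighted average pAΩA + pBΩB. Function names and all numerical inputs in the example are illustrative assumptions, including the ~150.9 MHz 13C frequency at 14.1 T used to convert ppm. On resonance and in the fast-exchange limit the two expressions should agree, which gives a quick check on both.

```python
import math


def r1rho_eq1(omega1, offset_a, offset_b, p_b, k_ex, r1, r2):
    """Eq. 1: R1rho (s^-1) for two-state exchange with skewed populations (pA >> pB)."""
    p_a = 1.0 - p_b
    omega_ave = p_a * offset_a + p_b * offset_b  # population-weighted average offset
    d_omega = offset_b - offset_a                # Δω_AB = Ω_B - Ω_A
    theta = math.atan2(omega1, omega_ave)        # tan θ = ω1 / Ω_ave; θ = 90° on resonance
    sin2, cos2 = math.sin(theta) ** 2, math.cos(theta) ** 2
    r_ex = sin2 * p_a * p_b * d_omega ** 2 * k_ex / (offset_b ** 2 + omega1 ** 2 + k_ex ** 2)
    return r1 * cos2 + r2 * sin2 + r_ex


def r1rho_eq2(omega1, r2, p_b, d_omega, k_ex):
    """Eq. 2: on-resonance R1rho (s^-1) in the fast-exchange limit (kex >> Δω_AB)."""
    phi_ex = (1.0 - p_b) * p_b * d_omega ** 2    # Φ_ex = pA pB Δω_AB^2
    return r2 + phi_ex * k_ex / (omega1 ** 2 + k_ex ** 2)


if __name__ == "__main__":
    two_pi = 2.0 * math.pi
    d_omega = two_pi * 3.0 * 150.9   # hypothetical 3 ppm difference at ~150.9 MHz
    p_b, k_ex = 0.005, 50000.0       # hypothetical values chosen so that kex >> Δω_AB
    for nu1 in (100.0, 1000.0, 3500.0):  # spinlock strengths in Hz; ω1 = 2π ν1
        omega1 = two_pi * nu1
        # Carrier at the population-weighted average offset, i.e. on resonance
        full = r1rho_eq1(omega1, -p_b * d_omega, (1 - p_b) * d_omega, p_b, k_ex, r1=2.0, r2=20.0)
        fast = r1rho_eq2(omega1, 20.0, p_b, d_omega, k_ex)
        print(f"nu1 = {nu1:6.0f} Hz: Eq. 1 -> {full:.3f} s^-1, Eq. 2 -> {fast:.3f} s^-1")
```

In general, on-resonance fast-exchange data constrain only Φex = pApBΔωAB² and kex. Separating pB from ΔωAB requires the off-resonance data that Eq. 1 describes.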

Thermodynamic Analysis

The observed temperature dependence of the forward (kA) and reverse (kB) rate constants (Supplementary Table 2) was fit by a modified van’t Hoff equation that accounts for statistical compensation effects and assumes a smooth energy surface27:

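$$\ln\!\left(\frac{k_i(T)}{T}\right) = \ln\!\left(\frac{k_B\,\kappa}{h}\right) - \frac{\Delta G_i^{\ddagger}(T_{hm})}{R\,T_{hm}} - \frac{\Delta H_i^{\ddagger}(T_{hm})}{R}\left(\frac{1}{T} - \frac{1}{T_{hm}}\right) \qquad (3)$$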

where ki (i = A, B) is the rate constant, ΔGi‡ and ΔHi‡ are the free energy and enthalpy of activation, respectively, R is the universal gas constant, and T is temperature. Thm is the harmonic mean of the n experimental temperatures, computed as $T_{hm} = n / \sum_{j=1}^{n} (1/T_j)$. In the pre-exponential factor of Eyring’s theory, kB is Boltzmann’s constant (not the reverse rate constant kB), h is Planck’s constant, and κ is the transmission coefficient (assumed to be 1). The entropy of activation (ΔSi‡) was calculated as follows:

$$\Delta S_i^{\ddagger} = \frac{\Delta H_i^{\ddagger} - \Delta G_i^{\ddagger}(T_{hm})}{T_{hm}} \qquad (4)$$

Semi-logarithmic plots are included in Fig. 2b and best-fit thermodynamic parameters are reported in Supplementary Table 5.
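Because Eq. 3 is linear in (1/T − 1/Thm), the activation parameters can be recovered with an ordinary straight-line fit. The slope gives −ΔHi‡/R, the intercept at T = Thm gives ΔGi‡(Thm), and Eq. 4 then gives ΔSi‡. The Python sketch below is unweighted, whereas a fit to real data would normally weight points by their uncertainties, and the function name is illustrative. The centred abscissa sums to zero by construction of the harmonic mean, so the intercept is simply the mean of ln(k/T).

```python
import math
from typing import Sequence, Tuple

R_KCAL = 1.987204e-3        # gas constant, kcal mol^-1 K^-1
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J K^-1
H_PLANCK = 6.62607015e-34   # Planck constant, J s


def fit_modified_vant_hoff(temps_k: Sequence[float], rates: Sequence[float],
                           kappa: float = 1.0) -> Tuple[float, float, float, float]:
    """Fit Eq. 3 by linear least squares and apply Eq. 4.

    Returns (dG_act at T_hm [kcal/mol], dH_act [kcal/mol], dS_act [kcal/(mol K)], T_hm [K]).
    """
    n = len(temps_k)
    if n < 2 or len(rates) != n:
        raise ValueError("need at least two matching (temperature, rate) points")
    t_hm = n / sum(1.0 / t for t in temps_k)          # harmonic mean temperature
    xs = [1.0 / t - 1.0 / t_hm for t in temps_k]      # centred inverse temperature
    ys = [math.log(k / t) for k, t in zip(rates, temps_k)]
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean               # ln(k/T) at T = T_hm
    d_h = -slope * R_KCAL                             # slope = -dH_act / R
    d_g = R_KCAL * t_hm * (math.log(kappa * K_BOLTZMANN / H_PLANCK) - intercept)
    d_s = (d_h - d_g) / t_hm                          # Eq. 4
    return d_g, d_h, d_s, t_hm
```

A useful sanity check before fitting measured kA or kB is to feed in noise-free rates generated from Eyring’s equation with chosen ΔH‡ and ΔS‡. The fit should return those values.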

An alternative interpretation of the thermodynamic parameters is given by the phenomenological Ferry law46,47, which incorporates a lower energy barrier with a rough enthalpic surface:

$$\ln k_i = \ln C - \frac{\Delta H_i^{\ddagger}}{RT} - \frac{\langle H_i^2 \rangle}{(RT)^2} \qquad (5)$$

where C is a constant and $\langle H_i^2 \rangle^{1/2}$ represents the enthalpy due to ruggedness of the barrier. The maximum $\langle H_i^2 \rangle^{1/2}$ values were calculated by setting the smooth, Arrhenius-like enthalpic barrier to zero (ΔHi‡ = 0) and are reported in Supplementary Table 4.
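Setting ΔHi‡ = 0 in Eq. 5 leaves ln ki linear in 1/T² with slope −⟨Hi²⟩/R². The maximum roughness therefore follows from a straight-line fit, as in the minimal Python sketch below. It assumes rates in s⁻¹ and temperatures in K, and the function name is illustrative.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def max_ferry_roughness(temps_k: Sequence[float], rates: Sequence[float]) -> float:
    """Upper bound on <H^2>^(1/2) (kcal/mol) from Eq. 5 with the smooth barrier set to zero.

    With dH_act = 0, ln k = ln C - <H^2>/(R T)^2, so ln k is linear in 1/T^2
    with slope -<H^2>/R^2.
    """
    n = len(temps_k)
    if n < 2 or len(rates) != n:
        raise ValueError("need at least two matching (temperature, rate) points")
    xs = [1.0 / t ** 2 for t in temps_k]
    ys = [math.log(k) for k in rates]
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    if slope >= 0.0:
        raise ValueError("rates do not decrease with 1/T^2; no roughness term can be extracted")
    return R_KCAL * math.sqrt(-slope)
```

Because the whole temperature dependence is attributed to roughness, the returned value is an upper bound, matching the "maximum" values described above.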

Molecular dynamics simulations of WC-to-HG base-pair transition pathways

An initial duplex DNA in standard B-form with Watson-Crick (WC) base-pairing corresponding to A6-DNA was generated using 3DNA48. Hoogsteen (HG) base-pairs were generated at the A16•T9, A3•T22, and G10•C15 positions using known X-ray coordinates49,50. In these base-pairs the purine adopts a syn conformation, while the complementary pyrimidine retains an anti conformation (see Fig. 3a). Initially, all conformers with WC or HG base-pairing were equilibrated through a series of energy minimizations using the Adopted Basis Newton-Raphson (ABNR) algorithm. Canonical (NVT) ensemble MD simulations were performed with the CHARMM27 all-atom force field51 and the generalized Born molecular volume (GBMV) implicit solvation model (GBMV II)52–54. The velocity-Verlet algorithm was used with a time step of 2 fs. A temperature of 300 K was kept constant with a Nosé-Hoover thermostat55,56. The cutoff for non-bonded list generation was set to 21 Å, the cutoff for non-bonded interactions was set to 18 Å, and the onset of switching for non-bonded interactions was set to 16 Å. The SHAKE algorithm was used to constrain covalent bonds involving hydrogen atoms. To conserve the duplex DNA structure during pre-equilibration, flat-bottom distance restraints were applied, which prevented the hydrogen-bond donors from moving more than 2.0 Å. The simulations ran for a total of 6.0 ns (Supplementary Fig. 7).

For the collection and analysis of equilibrium data, the first 1 ns of simulation data was discarded. We constructed a 2D free-energy map using two reaction coordinates, the DNA backbone root-mean-square deviation (rmsd) and the potential energy. Minimizing this approximate free energy surface allowed us to choose the most populated structure in the canonical ensemble [F = −kBT ln P(X,Y), where X and Y are the reaction coordinates, kB is Boltzmann’s constant, and T is the absolute temperature (Supplementary Fig. 7)]. The DNA backbone rmsd was computed over the P, O5’, C5’, C4’, C3’, and O3’ atoms. The reference structure was a standard B-form DNA (twist angle Ω = 36.0°, rise per base-pair along the helix axis = 3.3 Å) without minimization. To probe the transition pathway between WC and HG base-pair conformers, we used the conjugate peak refinement (CPR) method41. CPR is applicable to complex isomerization reactions, including allosteric transitions in proteins and more general conformational changes in macromolecules. The resulting paths follow the adiabatic energy surface without applying any constraints. Path-points between saddle-points ensure the continuity of the path but are not necessarily at the absolute bottom of the energy valley. The initial WC-to-HG pathways were generated using a targeted molecular dynamics method that applied a holonomic constraint, gradually decreasing the rmsd to the final target structure. In each CPR cycle, a heuristic procedure modified the path by improving, removing, or inserting one path-point so that the new path avoids the maximum energy peak. Finally, to refine the CPR path further, we used a synchronous chain minimization of all path-points, under the constraint that the points move within hyper-planes orthogonal to the path.
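Building the 2D free-energy map F = −kBT ln P(X,Y) amounts to histogramming the sampled (rmsd, potential energy) pairs and taking logarithms of the bin probabilities. The minimal Python sketch below assumes per-mole units, so the gas constant R replaces kB and F comes out in kcal/mol. The bin widths are hypothetical parameters, and discarding the first 1 ns is left to the caller. The most populated structure then corresponds to the bin with the lowest F.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

R_KCAL = 1.987204e-3  # kcal mol^-1 K^-1 (per-mole analogue of kB)


def free_energy_map(xs: Sequence[float], ys: Sequence[float], x_bin: float, y_bin: float,
                    temp_k: float = 300.0) -> Dict[Tuple[int, int], float]:
    """F = -RT ln P(X, Y) on a 2D grid, in kcal/mol, shifted so that min(F) = 0.

    Keys are integer bin indices; empty bins are omitted (their F would be infinite).
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need matching, non-empty coordinate series")
    counts = Counter((math.floor(x / x_bin), math.floor(y / y_bin)) for x, y in zip(xs, ys))
    total = sum(counts.values())
    f = {b: -R_KCAL * temp_k * math.log(c / total) for b, c in counts.items()}
    f_min = min(f.values())
    return {b: v - f_min for b, v in f.items()}


def most_populated_bin(fmap: Dict[Tuple[int, int], float]) -> Tuple[int, int]:
    """Bin with the lowest free energy, i.e. the most populated region."""
    return min(fmap, key=fmap.get)
```

In practice, xs and ys would hold the backbone rmsd and potential energy of each retained frame. The frames falling in the most populated bin are the candidates for the representative structure.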
The most populated structures with the WC base-pair or with the HG base-pair from the conventional MD simulations corresponded to the “reactant” and “product” wells, respectively, on the energy surface. To sample a wider range of possible transition pathways, we added putative intermediates that differed in either the flip-over or the flip-out. These were generated by modifying, respectively, the glycosidic angle χ (O4’-C1’-N9-C4) of the purine base and a center-of-mass pseudo-dihedral angle θ28, which describes the extent of base opening. CPR data are reported in Supplementary Fig. 7.
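The glycosidic angle χ used to define these intermediates is an ordinary four-atom torsion, O4’-C1’-N9-C4 for a purine. The minimal standard-library Python sketch below computes it from Cartesian coordinates, with coordinates as plain (x, y, z) tuples. The sign follows the usual IUPAC torsion convention. The syn/anti cutoff at ±90° is a common textbook convention assumed here rather than taken from the Methods, and all helper names are illustrative.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def glycosidic_conformation(chi_deg: float) -> str:
    """Assumed convention: syn for -90 < chi < 90, anti otherwise."""
    return "syn" if -90.0 < chi_deg < 90.0 else "anti"
```

For a purine, χ would be obtained as `dihedral_deg(o4p, c1p, n9, c4)` from the four atom positions of a given frame.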

Density functional theory (DFT) calculations of carbon chemical shifts

DFT chemical shift calculations were conducted on a high-performance computing cluster using Gaussian 0357. DNA conformations of the target A•T and G•C base-pairs that represent a range of (χ, θ) pairs were selected from each simulated WC-to-HG transition pathway and capped by 3’OH/5’OH (UCSF Chimera58) for DFT calculations (Supplementary Fig. 8) without further geometry optimization. NMR 13C chemical shifts were calculated using the GIAO method at the B3LYP/6-311+G(2d,p) level of theory. The isotropic carbon shieldings (σISO) were referenced to TMS (σTMS = 182.759) using the relationship δISO = σTMS − σISO, where the structure of TMS was optimized at the same level of theory. Computed carbon chemical shifts were referenced to the most stable WC conformer in a given transition pathway and compared with the NMR excited-state chemical shift differences (ΔωAB) in Supplementary Table 4. For benchmarking, similar calculations were performed on single guanosine nucleotides with anti and syn glycosidic conformations. These were taken from crystal structures of the Oxytricha nova telomeric G-quadruplex (5’ (G)4(T)4(G)4; PDB IDs: 1JPQ, 1JRN, 2GWQ, 2GWE, and 2NPR), with hydrogen atoms added (UCSF Chimera58) and no further geometry optimization. The results were compared with observed NMR C8 chemical shifts for the same G-quadruplex (courtesy of Dr. Michelle Gill and Dr. Patrick Loria) (Supplementary Fig. 8).
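The referencing step and the comparison with ΔωAB can be sketched in a few lines of Python. Every conformer is referenced to the same σTMS, so that constant cancels when shifts are expressed relative to the WC conformer. The differences compared with ΔωAB therefore do not depend on it. The shielding values below are hypothetical placeholders, not computed results. The ~150.9 MHz 13C frequency at 14.1 T, used to convert ppm into the rad/s units of Eq. 1, is an assumed nominal value, and the function names are illustrative.

```python
import math
from typing import Dict


def shift_from_shielding(sigma_iso: float, sigma_tms: float) -> float:
    """delta_iso = sigma_TMS - sigma_iso (ppm)."""
    return sigma_tms - sigma_iso


def shifts_relative_to(reference: str, shieldings: Dict[str, float],
                       sigma_tms: float) -> Dict[str, float]:
    """Chemical-shift differences (ppm) of each conformer relative to a reference conformer."""
    delta_ref = shift_from_shielding(shieldings[reference], sigma_tms)
    return {name: shift_from_shielding(s, sigma_tms) - delta_ref for name, s in shieldings.items()}


def ppm_to_rad_per_s(delta_ppm: float, larmor_mhz: float) -> float:
    """Convert a shift difference in ppm to angular frequency; ppm x MHz gives Hz."""
    return 2.0 * math.pi * delta_ppm * larmor_mhz


if __name__ == "__main__":
    c8 = {"WC": 42.0, "HG": 39.0}  # hypothetical C8 shieldings (ppm)
    for sigma_tms in (182.0, 190.0):  # the TMS reference cancels in the differences
        print(shifts_relative_to("WC", c8, sigma_tms))
    print(f"{ppm_to_rad_per_s(3.0, 150.9):.0f} rad/s")
```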

Supplementary Material

Acknowledgements

We thank Dr. Alexandar L. Hansen, Scott Horowitz, and Prof. Juli Feigon for valuable discussions and suggestions, and Dr. Alexander V. Kurochkin for NMR expertise. We gratefully acknowledge the Michigan Economic Development Corporation and the Michigan Technology Tri-Corridor for support in the purchase of a 600 MHz spectrometer. This work was supported by NSF CAREER awards (MCB 0644278 to H.M.A. and CHE-0918817 to I.A.) and an NIH grant (R01GM089846). E.N.N. acknowledges support by a Rackham International and Predoctoral Fellowship awarded by the University of Michigan.

Footnotes

Author Contributions E.N.N. prepared DNA samples assisted by A.A.W. and performed and analyzed all NMR experiments and DFT calculations; E.N.N. and H.M.A. conceived the idea of an HG excited-state base-pair and approaches to investigate its formation; I.A. and E.K. performed and analyzed the CPR simulations; P.J.O. provided expertise and guidance for damaged DNA studies along with critical manuscript revisions; H.M.A. and E.N.N., with help from P.J.O., E.K., and I.A., wrote the paper.

Author Information Reprints and permissions information is available at www.nature.com/reprints.

References

  • 1. Watson JD, Crick FH. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature. 1953;171:737–738. doi: 10.1038/171737a0.
  • 2. Record MT, Jr, et al. Double helical DNA: conformations, physical properties, and interactions with ligands. Annu Rev Biochem. 1981;50:997–1024. doi: 10.1146/annurev.bi.50.070181.005025.
  • 3. Koudelka GB, Mauro SA, Ciubotaru M. Indirect readout of DNA sequence by proteins: the roles of DNA sequence-dependent intrinsic and extrinsic forces. Prog Nucleic Acid Res Mol Biol. 2006;81:143–177. doi: 10.1016/S0079-6603(06)81004-4.
  • 4. Rohs R, et al. The role of DNA shape in protein-DNA recognition. Nature. 2009;461:1248–1253. doi: 10.1038/nature08473.
  • 5. Segal E, et al. A genomic code for nucleosome positioning. Nature. 2006;442:772–778. doi: 10.1038/nature04979.
  • 6. Saiz L, Vilar JM. DNA looping: the consequences and its control. Curr Opin Struct Biol. 2006;16:344–350. doi: 10.1016/j.sbi.2006.05.008.
  • 7. Richmond TJ, Davey CA. The structure of DNA in the nucleosome core. Nature. 2003;423:145–150. doi: 10.1038/nature01595.
  • 8. Wang AH, et al. Molecular structure of a left-handed double helical DNA fragment at atomic resolution. Nature. 1979;282:680–686. doi: 10.1038/282680a0.
  • 9. Patikoglou GA, et al. TATA element recognition by the TATA box-binding protein has been conserved throughout evolution. Genes Dev. 1999;13:3217–3230. doi: 10.1101/gad.13.24.3217.
  • 10. Kitayner M, et al. Diversity in DNA recognition by p53 revealed by crystal structures with Hoogsteen base pairs. Nat Struct Mol Biol. 2010;17:423–429. doi: 10.1038/nsmb.1800.
  • 11. Ughetto G, et al. A comparison of the structure of echinomycin and triostin A complexed to a DNA fragment. Nucleic Acids Res. 1985;13:2305–2323. doi: 10.1093/nar/13.7.2305.
  • 12. Seaman FC, Hurley L. Interstrand cross-linking by bizelesin produces a Watson-Crick to Hoogsteen base-pairing transition region in d(CGTAATTACG)2. Biochemistry. 1993;32:12577–12585. doi: 10.1021/bi00210a005.
  • 13. Yang H, Zhan Y, Fenn D, Chi LM, Lam SL. Effect of 1-methyladenine on double-helical DNA structures. FEBS Lett. 2008;582:1629–1633. doi: 10.1016/j.febslet.2008.04.013.
  • 14. Shanmugam G, Kozekov ID, Guengerich FP, Rizzo CJ, Stone MP. Structure of the 1,N2-ethenodeoxyguanosine adduct opposite cytosine in duplex DNA: Hoogsteen base pairing at pH 5.2. Chem Res Toxicol. 2008;21:1795–1805. doi: 10.1021/tx8001466.
  • 15. Palmer AG, 3rd. NMR characterization of the dynamics of biomacromolecules. Chem Rev. 2004;104:3623–3640. doi: 10.1021/cr030413t.
  • 16. Korzhnev DM, Kay LE. Probing invisible, low-populated States of protein molecules by relaxation dispersion NMR spectroscopy: an application to protein folding. Acc Chem Res. 2008;41:442–451. doi: 10.1021/ar700189y.
  • 17. Boehr DD, Nussinov R, Wright PE. The role of dynamic conformational ensembles in biomolecular recognition. Nat Chem Biol. 2009;5:789–796. doi: 10.1038/nchembio.232.
  • 18. Henzler-Wildman K, Kern D. Dynamic personalities of proteins. Nature. 2007;450:964–972. doi: 10.1038/nature06522.
  • 19. Korzhnev DM, Religa TL, Banachewicz W, Fersht AR, Kay LE. A transient and low-populated protein-folding intermediate at atomic resolution. Science. 2010;329:1312–1316. doi: 10.1126/science.1191723.
  • 20. Johnson JE, Jr, Hoogstraten CG. Extensive backbone dynamics in the GCAA RNA tetraloop analyzed using 13C NMR spin relaxation and specific isotope labeling. J Am Chem Soc. 2008;130:16757–16769. doi: 10.1021/ja805759z.
  • 21. Hansen AL, Nikolova EN, Casiano-Negroni A, Al-Hashimi HM. Extending the range of microsecond-to-millisecond chemical exchange detected in labeled and unlabeled nucleic acids by selective carbon R(1rho) NMR spectroscopy. J Am Chem Soc. 2009;131:3818–3819. doi: 10.1021/ja8091399.
  • 22. Shajani Z, Varani G. 13C relaxation studies of the DNA target sequence for hhai methyltransferase reveal unique motional properties. Biochemistry. 2008;47:7617–7625. doi: 10.1021/bi7020469.
  • 23. Travers AA. The structural basis of DNA flexibility. Philos Transact A Math Phys Eng Sci. 2004;362:1423–1438. doi: 10.1098/rsta.2004.1390.
  • 24. Pardi A, Morden KM, Patel DJ, Tinoco I, Jr. Kinetics for exchange of imino protons in the d(C-G-C-G-A-A-T-T-C-G-C-G) double helix and in two similar helices that contain a G·T base pair, d(C-G-T-G-A-A-T-T-C-G-C-G), and an extra adenine, d(C-G-C-A-G-A-A-T-T-C-G-C-G). Biochemistry. 1982;21:6567–6574. doi: 10.1021/bi00268a038.
  • 25. Gueron M, Kochoyan M, Leroy JL. A single mode of DNA base-pair opening drives imino proton exchange. Nature. 1987;328:89–92. doi: 10.1038/328089a0.
  • 26. Perez A, Luque FJ, Orozco M. Dynamics of B-DNA on the microsecond time scale. J Am Chem Soc. 2007;129:14739–14745. doi: 10.1021/ja0753546.
  • 27. Coman D, Russu IM. A nuclear magnetic resonance investigation of the energetics of basepair opening pathways in DNA. Biophys J. 2005;89:3285–3292. doi: 10.1529/biophysj.105.065763.
  • 28. Song K, et al. An improved reaction coordinate for nucleic acid base flipping studies. J Chem Theory Comput. 2009;5:3105–3113. doi: 10.1021/ct9001575.
  • 29. Greene KL, Wang Y, Live D. Influence of the glycosidic torsion angle on 13C and 15N shifts in guanosine nucleotides: investigations of G-tetrad models with alternating syn and anti bases. J Biomol NMR. 1995;5:333–338. doi: 10.1007/BF00182274.
  • 30. Xu X, Au-Yeung S. Investigation of chemical shift and structure relationships in nucleic acids using NMR and density functional theory methods. J Phys Chem B. 2000;104:5641–5650.
  • 31. Izatt RM, Christensen JJ, Rytting JH. Sites and thermodynamic quantities associated with proton and metal ion interaction with ribonucleic acid, deoxyribonucleic acid, and their constituent bases, nucleosides, and nucleotides. Chem Rev. 1971;71:439–481. doi: 10.1021/cr60273a002.
  • 32. Ghosal G, Muniyappa K. Hoogsteen base-pairing revisited: resolving a role in normal biological processes and human diseases. Biochem Biophys Res Commun. 2006;343:1–7. doi: 10.1016/j.bbrc.2006.02.148.
  • 33. Abrescia NG, Thompson A, Huynh-Dinh T, Subirana JA. Crystal structure of an antiparallel DNA fragment with Hoogsteen base pairing. Proc Natl Acad Sci U S A. 2002;99:2806–2811. doi: 10.1073/pnas.052675499.
  • 34. Nair DT, Johnson RE, Prakash L, Prakash S, Aggarwal AK. Human DNA polymerase iota incorporates dCTP opposite template G via a G·C+ Hoogsteen base pair. Structure. 2005;13:1569–1577. doi: 10.1016/j.str.2005.08.010.
  • 35. Powell SW, Jiang L, Russu IM. Proton exchange and base pair opening in a DNA triple helix. Biochemistry. 2001;40:11065–11072. doi: 10.1021/bi010890a.
  • 36. Sau AK, et al. Evidence for A+(anti)-G(syn) mismatched base-pairing in d-GGTAAGCGTACC. FEBS Lett. 1995;377:301–305. doi: 10.1016/0014-5793(95)01362-8.
  • 37. Lu L, Yi C, Jian X, Zheng G, He C. Structure determination of DNA methylation lesions N1-meA and N3-meC in duplex DNA using a cross-linked protein-DNA system. Nucleic Acids Res. 2010;38:4415–4425. doi: 10.1093/nar/gkq129.
  • 38. Gilbert DE, van der Marel GA, van Boom JH, Feigon J. Unstable Hoogsteen base pairs adjacent to echinomycin binding sites within a DNA duplex. Proc Natl Acad Sci U S A. 1989;86:3006–3010. doi: 10.1073/pnas.86.9.3006.
  • 39. Hoopes BC, LeBlanc JF, Hawley DK. Contributions of the TATA box sequence to rate-limiting steps in transcription initiation by RNA polymerase II. J Mol Biol. 1998;277:1015–1031. doi: 10.1006/jmbi.1998.1651.
  • 40. Meyer T, Carlstedt-Duke J, Starr DB. A weak TATA box is a prerequisite for glucocorticoid-dependent repression of the osteocalcin gene. J Biol Chem. 1997;272:30709–30714. doi: 10.1074/jbc.272.49.30709.
  • 41. Fischer S, Karplus M. Conjugate peak refinement: an algorithm for finding reaction paths and accurate transition states in systems with many degrees of freedom. Chem Phys Lett. 1992;194:252–261.
  • 42. Segers-Nolten GM, Sijtsema NM, Otto C. Evidence for Hoogsteen GC base pairs in the proton-induced transition from right-handed to left-handed poly(dG-dC)·poly(dG-dC). Biochemistry. 1997;36:13241–13247. doi: 10.1021/bi971326w.
  • 43. Delaglio F, et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J Biomol NMR. 1995;6:277–293. doi: 10.1007/BF00197809.
  • 44. Palmer AG, 3rd, Massi F. Characterization of the dynamics of biomacromolecules using rotating-frame spin relaxation NMR spectroscopy. Chem Rev. 2006;106:1700–1719. doi: 10.1021/cr0404287.
  • 45. Miloushev VZ, Palmer AG, 3rd. R(1rho) relaxation for two-site chemical exchange: general approximations and some exact solutions. J Magn Reson. 2005;177:221–227. doi: 10.1016/j.jmr.2005.07.023.
  • 46. Ferry J, Grandine L, Fitzgerald E. The relaxation distribution function of polyisobutylene in the transition from rubberlike to glasslike behavior and its dependence on temperature. Phys Rev. 1953;91:217.
  • 47. Denisov VP, Peters J, Horlein HD, Halle B. Using buried water molecules to explore the energy landscape of proteins. Nat Struct Biol. 1996;3:505–509. doi: 10.1038/nsb0696-505.
  • 48. Olson WK, et al. A standard reference frame for the description of nucleic acid base-pair geometry. J Mol Biol. 2001;313:229–237. doi: 10.1006/jmbi.2001.4987.
  • 49. Abrescia NG, Gonzalez C, Gouyette C, Subirana JA. X-ray and NMR studies of the DNA oligomer d(ATATAT): Hoogsteen base pairing in duplex DNA. Biochemistry. 2004;43:4092–4100. doi: 10.1021/bi0355140.
  • 50. Aishima J, et al. A Hoogsteen base pair embedded in undistorted B-DNA. Nucleic Acids Res. 2002;30:5244–5252. doi: 10.1093/nar/gkf661.
  • 51. MacKerell AD, Jr, Banavali N, Foloppe N. Development and current status of the CHARMM force field for nucleic acids. Biopolymers. 2000;56:257–265. doi: 10.1002/1097-0282(2000)56:4<257::AID-BIP10029>3.0.CO;2-W.
  • 52. Chocholousova J, Feig M. Implicit solvent simulations of DNA and DNA-protein complexes: agreement with explicit solvent vs experiment. J Phys Chem B. 2006;110:17240–17251. doi: 10.1021/jp0627675.
  • 53. Feig M, et al. Performance comparison of generalized born and Poisson methods in the calculation of electrostatic solvation energies for protein structures. J Comput Chem. 2004;25:265–284. doi: 10.1002/jcc.10378.
  • 54. Lee MS, Feig M, Salsbury FR, Jr, Brooks CL, 3rd. New analytic approximation to the standard molecular volume definition and its application to generalized Born calculations. J Comput Chem. 2003;24:1348–1356. doi: 10.1002/jcc.10272.
  • 55. Nose S. A unified formulation of the constant temperature molecular-dynamics methods. J Chem Phys. 1984;81:511–519.
  • 56. Hoover WG. Canonical dynamics: equilibrium phase-space distributions. Phys Rev A. 1985;31:1695–1697. doi: 10.1103/physreva.31.1695.
  • 57. Frisch MJ, et al. Gaussian 03, Revision C.02. Wallingford CT: Gaussian, Inc.; 2004.
  • 58. Pettersen EF, et al. UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem. 2004;25:1605–1612. doi: 10.1002/jcc.20084.
