Nucleic Acids Research. 2021 Nov 18;49(21):12540–12555. doi: 10.1093/nar/gkab936

Revealing A-T and G-C Hoogsteen base pairs in stressed protein-bound duplex DNA

Honglue Shi 1, Isaac J Kimsey 2, Stephanie Gu 3, Hsuan-Fu Liu 4, Uyen Pham 5, Maria A Schumacher 6, Hashim M Al-Hashimi 7,8
PMCID: PMC8643651  PMID: 34792150

Abstract

Watson–Crick base pairs (bps) are the fundamental unit of genetic information and the building blocks of the DNA double helix. However, A-T and G-C can also form alternative ‘Hoogsteen’ bps, expanding the functional complexity of DNA. We developed ‘Hoog-finder’, which uses structural fingerprints to rapidly screen Hoogsteen bps, which may have been mismodeled as Watson–Crick in crystal structures of protein–DNA complexes. We uncovered 17 Hoogsteen bps, 7 of which were in complex with 6 proteins never before shown to bind Hoogsteen bps. The Hoogsteen bps occur near mismatches, nicks and lesions and some appear to participate in recognition and damage repair. Our results suggest a potentially broad role for Hoogsteen bps in stressed regions of the genome and call for a community-wide effort to identify these bps in current and future crystal structures of DNA and its complexes.

INTRODUCTION

One of the cornerstones of molecular biology is that A pairs with T and G with C to form Watson–Crick base pairs (bps) (Figure 1A). However, soon after the discovery of the DNA double helix, it was shown that A-T and G-C could also pair in an alternative conformation known as the ‘Hoogsteen’ bp (1,2) (Figure 1A). A Hoogsteen bp can be obtained by flipping the purine base in a Watson–Crick bp from the anti to syn conformation and then forming a unique set of hydrogen bonds (H-bonds) with the partner pyrimidine requiring protonation of cytosine-N3 (Figure 1A). Relative to Watson–Crick bps, Hoogsteen pairing requires that the two bases also come into closer proximity by ∼2.0–2.5 Å. This has been shown to locally constrict the helical diameter and to cause kinking of the DNA double helix toward the major groove by ∼10° (3).

Figure 1.

Hoog-finder to rapidly identify putative Hoogsteen bps in crystal structures of protein–DNA complexes. (A) Dynamic equilibrium between Watson–Crick and Hoogsteen bps. (B) Generating the training and negative training datasets. 2mFo-DFc electron density maps (contoured at ∼1 σ) are shown in gray, whereas red and blue regions represent mFo-DFc difference electron density maps contoured at around + 3σ and -3σ, respectively. Steric clashes and H-bonds between the two bases are denoted using a pink and a green dashed line, respectively. (C) Representative 2mFo-DFc and mFo-DFc electron density maps for original Hoogsteen (left, solid boxes) and the corresponding mismodeled Watson–Crick models (right, dashed boxes) highlighting the unique structural fingerprints of mismodeled Watson–Crick bps. Gray and purple meshed regions represent 2mFo-DFc densities at 1.0σ and 3.0σ, respectively, while blue and red meshed regions are mFo-DFc difference densities contoured at 3.0σ and -3.0σ, respectively. Also shown is the stereochemistry assessed by MolProbity (Materials and Methods section). All bp structures and electron densities in the training dataset are provided in Supplementary Figure S1. (D) 2D scatter plot comparing C1′-C1′ distance, shear, and opening for Hoogsteen bps mismodeled as Watson–Crick (red, n = 28) and the canonical Watson–Crick dataset (16) (blue, n = 149). The three structural criteria are denoted as the dashed line. (E) Workflow used to identify putative Hoogsteen bps mismodeled as Watson–Crick. (F) Percentage distribution of bps identified to be Hoogsteen (HG, orange), Watson–Crick (WC, sky blue) and ambiguous bps (AMB, yellow). Data shown for non-redundant bps following data curation (Materials and Methods section). Also shown is the percentage of Hoogsteen bps found in stressed regions of DNA.

Following their initial discovery, Hoogsteen bps were observed in a handful of crystal structures of protein–DNA complexes and shown to participate in DNA shape recognition (4–7). An early example was the crystal structure (PDB: 1IHF) of duplex DNA in complex with the integration host factor (IHF) protein (4). The structure included an unusual A(anti)-T Hoogsteen bp in which the adenine base was in the anti rather than syn conformation. The bp was located immediately adjacent to a nick used to aid crystallization (4).

More conventional A(syn)-T and G(syn)-C+ Hoogsteen bps in which the purine base is in the syn conformation were subsequently reported in crystal structures of intact DNA duplexes in complex with transcription factors, including the TATA box-binding protein (TBP) (5) (PDB: 1QN3, 6NJQ), MATα2 homeodomain (6) (PDB: 1K61) and the DNA binding domain of the p53 tumor suppressor protein (7) (PDB: 3KZ8). Beyond transcription factors, crystallographic and biochemical studies also revealed Hoogsteen bps in the active sites of specialized polymerases including human polymerase ι (8,9) (PDB: 1T3N, 2ALZ) and Sulfolobus solfataricus polymerase Dpo4 (10,11) (PDB: 1RYS, 1S0M), in which they were proposed to be involved in mediating the bypass of DNA damage during replication. These crystal structures, together with structures of certain DNA–drug complexes (12), established Hoogsteen bps as an alternative to Watson–Crick bps that imparts unique characteristics to the DNA.

NMR studies later revealed Hoogsteen bps are ubiquitous in DNA duplexes. Across a wide variety of sequence and positional contexts, A-T and G-C Watson-Crick bps were shown to exist in dynamic equilibrium with their Hoogsteen counterparts (13) (Figure 1A). The population (∼0.1–1.0%) of the minor Hoogsteen conformation exceeds that of other conformational states commonly stabilized by proteins such as the base open conformation (14) by more than two orders of magnitude. Since Hoogsteen bps can also occur in any sequence context (15), it is surprising that they have not been more extensively observed in crystal structures of DNA, particularly in protein–DNA complexes, in which the DNA structure is often highly distorted and conformationally stressed. Indeed, Hoogsteen bps appear to favor stressed regions in which the helix is unwound and/or kinked toward the major groove as well as at terminal ends of the DNA (16,17) and in which neighboring bps are partially melted (3).

Prior crystallographic studies have underscored the difficulty distinguishing Watson–Crick from Hoogsteen bps especially when the electron density is of moderate or low quality (7,8,18–20). Because a Watson–Crick bp is generally assumed initially unless there are other data to indicate otherwise, or the structure is at high resolution and reveals a clear non-Watson–Crick conformation, some of the Watson–Crick bps in current crystal structures of DNA in the Protein Data Bank (PDB) (21) could be ambiguous. Some might even be better modeled as Hoogsteen bps.

Re-analyzing the electron density for some ∼100,000 DNA bps bound to proteins in the PDB to assess the degree to which the data supports the Watson–Crick versus a Hoogsteen model is laborious and impractical. To help streamline this analysis, a recent study (20) developed an automated approach, which uses differences in electron density expected for Watson–Crick versus Hoogsteen bp models as fingerprints to identify Hoogsteen bps mismodeled as Watson–Crick. This work identified eight Hoogsteen bps mismodeled as Watson–Crick at terminal ends of DNA sites and in structures of DNA in complex with the polymerase Dpo4 which had previously been shown to bind DNA with Hoogsteen bps at certain positions (10,11,22,23).

Here, we developed an alternative structure-guided approach termed ‘Hoog-finder’ to rapidly screen for Hoogsteen bps that may have been mismodeled as Watson–Crick in crystal structures of protein–DNA complexes. Using Hoog-finder, we uncovered 17 bps that better satisfy the electron density and also result in improved stereochemistry when modeled as Hoogsteen relative to Watson–Crick. Seven of these Hoogsteen bps were observed in DNA in complexes of six proteins never before shown to bind DNA in a Hoogsteen conformation. Interestingly, almost all of the newly uncovered Hoogsteen bps were adjacent to mismatches, lesions, nicks and terminal ends, and some of them appear to play roles in DNA recognition and damage repair. In addition, more than half of the ∼200 bps examined had ambiguous electron density. Among these, 21 bps had slightly better fits to the electron density and/or resulted in improved stereochemistry when modeled as Hoogsteen relative to Watson–Crick. Thus, our results point to potentially broader roles for Hoogsteen bps than currently appreciated, particularly in stressed regions of the genome, and call for a community-wide effort to identify these bps in current and future crystal structures of DNA.

MATERIALS AND METHODS

Generating training dataset of Hoogsteen base pairs mismodeled as Watson–Crick

The training dataset (n = 28) was generated based on a previous X-ray structural survey of Hoogsteen bps (16). We selected all the non-redundant Hoogsteen bps from Table 1 in Zhou et al. (16), excluding structures with no deposited structure factors (e.g. Triostin A–DNA complex, PDB: 1VS2), with multiple models (e.g. terminal bps in Echinomycin–DNA complex, PDB: 1XVN), or with modified purine bases (e.g. the m1A(syn)-T bp in ALKBH2–DNA complex, PDB: 3H8O). To this dataset we also added two recent examples of G(syn)-C+ Hoogsteen bps from two recently solved crystal structures of the TBP–DNA complex (PDB: 6NJQ, 6UEO), which were not included in Zhou et al. (16). The final dataset contained a total of 28 Hoogsteen bps (22 A(syn)-T and 6 G(syn)-C+ Hoogsteen) (Figure 1C, Supplementary Figure S1, Table S1 and Supplementary Discussion S1).

All the Hoogsteen bps in the training dataset were then mismodeled as Watson–Crick bps using the following procedure: (i) The coordinates for the syn purine residue in the Hoogsteen bp were removed from the original coordinate file. (ii) An omit map was derived from the coordinates with the syn purine nucleotide in question removed, by three cycles of refinement in phenix.refine (24) using the default settings in the PHENIX software (25). (iii) An anti purine residue was modeled into the resulting omit map and optimized via real space refinement using COOT (26). (iv) A second round of refinement was conducted using the same phenix.refine routine with the remodeled coordinates. The stereochemistry of the different bp models was assessed using MolProbity (27).

Identification of structural fingerprints for the training dataset

X3DNA-DSSR (28) was used to analyze all the nucleotide torsion angles (α, β, γ, δ, ϵ, ζ, χ, sugar phase angle) as well as all the bp parameters (shear, stretch, stagger, buckle, propeller twist, opening, C1′-C1′ distance) of the mismodeled Watson–Crick bps in the training dataset (n = 28) and of canonical Watson–Crick bps (n = 149) from a previous structural survey (16) (Supplementary Figure S2A–C). The signs of the raw output values for the bp parameters shear and buckle were adjusted according to the index order of purine and pyrimidine as described in (29).

Generating the negative training dataset of Watson–Crick base pairs mismodeled as Hoogsteen

The negative training dataset (n = 10) was generated by selecting a subset of well-resolved Watson–Crick bps (five A-T and five G-C bps) from the canonical Watson–Crick bps (n = 149) from Zhou et al. (16) (Supplementary Figure S3 and Table S2). We performed a similar procedure as described in ‘Generating training dataset of Hoogsteen base pairs mismodeled as Watson–Crick’, but this time we flipped the anti purine to be the syn conformation and followed the same refinement protocol. The bp parameters (shear, stretch, stagger, buckle, propeller twist, opening) cannot be interpreted because they are ill-defined for the Hoogsteen bp given a change in the coordinate reference frame as described previously (29).

Screening putative Hoogsteen candidates in X-ray structures using Hoog-finder

The PDB coordinates and structure factor files of X-ray structures of protein–DNA complexes (defined as PDB structures with both DNA and protein present in the macromolecular entities) with resolution ≤ 3.5 Å were downloaded from the RCSB website (www.rcsb.org) on 29 August 2020. For palindromic DNA duplexes that were deposited as single chains in the asymmetric unit (ASU), the biological assemblies containing the double stranded models were downloaded from RCSB and processed by X3DNA-DSSR with the symmetry flag ‘--symm’. X3DNA-DSSR was then used to parse the structural descriptors of bps from all PDB structures into a searchable database, which included nucleotide local torsion angles (α, β, γ, δ, ϵ, ζ, χ, sugar phase angle), bp parameters (shear, stretch, stagger, buckle, propeller, opening) and C1′-C1′ distance. We then searched for potential candidate bps that were Hoogsteen but mismodeled as Watson–Crick based on the following queries:

  • We only considered dA-dT or dG-dC bp with Watson–Crick geometry defined by the Leontis-Westhof (LW) notation as ‘cWW’, ‘cWS’, ‘cW.’, which excluded all the trans bps, Hoogsteen bps, platform bps and all bps involved in bp multiplets (e.g. triplets).

  • Based on the structural fingerprints of mismodeled Watson–Crick bps in the training dataset, we only considered bps that satisfy shear >0.5 Å, opening >10° and C1′-C1′ distance <10.0 Å simultaneously.

  • We manually checked and excluded bps from tertiary interactions, misaligned bps that are false positives, bps in DNA regions with potential two-fold statistical disorder, bps with multiple modeled conformations and identical bps due to crystal symmetry.
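The automated queries above can be sketched as a simple filter. This is a minimal illustration rather than the authors' implementation: the `BasePair` record and its field names are hypothetical, the descriptors are assumed to have been parsed from X3DNA-DSSR output beforehand, and the shear sign is assumed to have been adjusted already as described earlier. The manual exclusions in the last query are not modeled.

```python
from dataclasses import dataclass

# Leontis-Westhof classes accepted as Watson-Crick geometry in the first query.
WC_LIKE_LW = {"cWW", "cWS", "cW."}
# Only dA-dT and dG-dC pairs are considered.
CANONICAL_PAIRS = {frozenset(("A", "T")), frozenset(("G", "C"))}


@dataclass
class BasePair:
    """Hypothetical record of descriptors parsed from DSSR output."""
    base1: str                  # one-letter base name, e.g. "A"
    base2: str
    lw: str                     # Leontis-Westhof class, e.g. "cWW"
    shear: float                # in angstroms
    opening: float              # in degrees
    c1_c1: float                # C1'-C1' distance in angstroms
    in_multiplet: bool = False  # True if part of a triplet or higher


def is_candidate(bp: BasePair) -> bool:
    """Return True if bp passes the automated (non-manual) queries."""
    if frozenset((bp.base1, bp.base2)) not in CANONICAL_PAIRS:
        return False
    if bp.lw not in WC_LIKE_LW or bp.in_multiplet:
        return False
    # All three structural fingerprints must hold simultaneously.
    return bp.shear > 0.5 and bp.opening > 10.0 and bp.c1_c1 < 10.0


# Illustrative values only; they are not taken from any deposited structure.
distorted = BasePair("A", "T", "cWW", shear=0.8, opening=15.0, c1_c1=9.6)
canonical = BasePair("G", "C", "cWW", shear=0.1, opening=2.0, c1_c1=10.5)
print(is_candidate(distorted), is_candidate(canonical))
```

Requiring all three thresholds at once keeps the candidate list short enough for the manual density inspection that follows.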

The local electron density for bps satisfying all the above queries (Starting dataset, n = 215) (Figure 1E and Supplementary Tables S3 and S4) was manually inspected. Cases with either local density too weak to model any bp (n = 91) or well-resolved Watson–Crick density (n = 58) were excluded (Figure 1E, Supplementary Figure S4A–B, Supplementary Tables S3 and S4). The remaining bps (Filtered dataset, n = 66) were subjected to a procedure similar to that used in ‘Generating training dataset of Hoogsteen base pairs mismodeled as Watson–Crick’, this time flipping the anti purine to the syn conformation to generate Hoogsteen bps for structural refinement. We then compared the agreement with the electron density and any improvement in stereochemistry of the two bases between the Watson–Crick and the Hoogsteen models. The 22 Hoogsteen bps identified using this procedure did not resemble the distorted Hoogsteen geometry in the negative training dataset. The remaining 44 bps were denoted ambiguous bps (Supplementary Figure S4C and Supplementary Discussion S2).
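The bookkeeping in this paragraph can be checked with a few lines of arithmetic. The counts are those stated in the text; the variable names are only illustrative.

```python
# Counts reported for the manual triage of the Starting dataset.
starting = 215            # bps passing all screening queries
weak_density = 91         # local density too weak to model any bp
clear_watson_crick = 58   # well-resolved Watson-Crick density

# Bps carried forward to Hoogsteen remodeling and refinement.
filtered = starting - weak_density - clear_watson_crick

hoogsteen = 22            # better modeled as Hoogsteen after refinement
ambiguous = filtered - hoogsteen

print(filtered, ambiguous)
```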

In the structures where we found Hoogsteen or ambiguous Hoogsteen bps, there is sometimes more than one copy of the protein–DNA complex within a single ASU. However, not all the equivalent bp positions were identified by our structure-based screening. This is either because they did not form H-bonds detectable by 3DNA and were therefore not included in the Parent dataset, or because they failed to satisfy all three criteria applied due to subtle structural differences between different protein–DNA complexes (Supplementary Table S4). Therefore, we manually analyzed the electron densities of additional bps which were not identified by the structure-based screening. Indeed, we found four more Hoogsteen bps and seven more ambiguous Hoogsteen bps (Supplementary Table S4). Note that bps at the same positions as those in the other protein–DNA complexes in the ASU were considered redundant bps and subsequently removed from the curated dataset.

For the final structure refinement of all the putative Hoogsteen structures in the Hoogsteen dataset, we incorporated TLS refinement. We did not observe significant improvements or differences in the electron density when adding TLS refinement, but R-factors were improved. As TLS was not used in the original refinement of many of the structures, for a faithful comparison of R-factors between Watson–Crick and Hoogsteen models, we also repeated the same TLS refinement protocol on the corresponding original Watson–Crick model for all structures with Hoogsteen bps and those containing ambiguous Hoogsteen bps which display a slight preference as a Hoogsteen. The R-factors between Watson–Crick and Hoogsteen models are listed in Supplementary Table S5.

As the primary focus is on Watson–Crick and Hoogsteen bps, we did not inspect and fix potential modeling errors in the protein structures. However, we note that improvement of protein modeling in cases where there are errors may improve the overall electron density, which would require further investigation.

A similar analysis was also carried out for structures of DNA without protein bound but no putative Hoogsteen bps emerged.

Structural analysis

DNA global shape

DNA major and minor groove widths were quantified by the P-P distance metric (30) using X3DNA-DSSR (28). DNA inter-helical local kinking and twisting were quantified by an Euler angle approach as described before (16). In this approach, two 2-bp idealized B-form DNA helices (H1 and H2) were generated by 3DNA (31) and were superimposed on the DNA structure immediately above and below a specific junction (J) bp. H1 is specified by the 5′-direction of one of the J residues (in ‘nt_1’ columns in Supplementary Table S9). The resulting orientation of H1 and H2 was then calculated using three inter-helical Euler angles (αh, βh, γh) relative to a reference helix, in which the two helices are coaxially aligned in an idealized B-form helix geometry (16). The inter-helical Euler angle βh (0° ≤ βh ≤ 180°) therefore defines the local kink angle about the J bp, while γh (-180° ≤ γh ≤ 180°) defines the directionality of kinking, with -90° < γh < 90° indicating major groove and -180° ≤ γh ≤ -90° or 90° ≤ γh ≤ 180° indicating minor groove directed kinking, respectively (16). The inter-helical twist angle ζh = αh + γh describes the relative twist between H1 and H2, with ζh > 0° and ζh < 0° representing over- and unwinding, respectively (16). All the calculations with poor alignment to the idealized B-form helix (RMSD > 2 Å using all backbone atoms) were excluded, as poor agreement with an idealized helix leads to unreliable Euler angles.
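How the three Euler angles are read can be illustrated with a short sketch. The function name and return format are hypothetical, and the sketch assumes that major groove directed kinking corresponds to the complement of the stated minor groove range (|γh| < 90°). It only interprets angles that have already been computed; it does not perform the helix superposition.

```python
def describe_junction(alpha_h: float, beta_h: float, gamma_h: float) -> dict:
    """Interpret inter-helical Euler angles (in degrees) at a junction bp."""
    if not 0.0 <= beta_h <= 180.0:
        raise ValueError("beta_h must lie in [0, 180] degrees")
    if not -180.0 <= gamma_h <= 180.0:
        raise ValueError("gamma_h must lie in [-180, 180] degrees")

    # Assumption: |gamma_h| < 90 is major groove directed; the rest is
    # the minor groove range given in the text.
    direction = "major groove" if abs(gamma_h) < 90.0 else "minor groove"

    # Inter-helical twist: positive is overwinding, negative is unwinding.
    zeta_h = alpha_h + gamma_h
    if zeta_h > 0:
        winding = "overwound"
    elif zeta_h < 0:
        winding = "unwound"
    else:
        winding = "neither"

    return {
        "kink_angle": beta_h,
        "kink_direction": direction,
        "twist": zeta_h,
        "winding": winding,
    }


# Illustrative angles only; not taken from any structure in the study.
print(describe_junction(alpha_h=-40.0, beta_h=12.0, gamma_h=30.0))
```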

DNA protein interactions

H-bonding and van der Waals interactions between DNA and protein were detected by a web-based tool: DNAproDB (32) (https://dnaprodb.usc.edu/index.html).

NMR experiments

All the DNA constructs (hpCG, hpTA, hpTG, hpTT) used for NMR R1ρ measurements are summarized in Supplementary Figure S11A. 13C,15N uniformly labeled DNA samples were synthesized following the procedure described in Zimmer and Crothers, 1995 (33). The buffer used for NMR measurement was composed of 25 mM NaCl, 15 mM Na3PO4, 0.1 mM EDTA, 10% D2O at pH 5.9.

NMR 13C R1ρ experiments were carried out on a Bruker Avance III 600 MHz spectrometer equipped with a triple-resonance HCN cryogenic probe as described previously (34). Resonance assignments for hpTG were reported previously (34) while assignments for other constructs were readily obtained by overlaying spectra onto those of the hpTG construct. The spinlock powers and offsets used in the R1ρ experiments are summarized in Supplementary Table S10. The analysis of R1ρ data was also described in a prior study (34). The fitting parameters of all the R1ρ profiles are listed in Supplementary Table S11.

RESULTS

Structural fingerprints of Hoogsteen base pairs mismodeled as Watson–Crick

We hypothesized that mismodeling a Watson–Crick bp into electron density belonging to a Hoogsteen bp could result in a distorted Watson–Crick geometry deviating from the canonical Watson–Crick conformation (Figure 1B–D). These geometrical distortions could then be used as ‘structural fingerprints’ to screen the PDB for Hoogsteen bps that had been mismodeled as Watson–Crick before examining the electron density, which is laborious and time-consuming.

To examine whether or not Hoogsteen bps mismodeled as Watson–Crick have unique geometrical distortions, we built a training dataset (Supplementary Table S1) of previously reported A(syn)-T and G(syn)-C+ Hoogsteen bps (16) with available structure factors. The dataset comprised 22 A(syn)-T and six G(syn)-C+ Hoogsteen bps from 23 crystal structures of duplex DNA, of which 22 were DNA–protein complexes and one was a naked DNA duplex (Supplementary Table S1).

For each Hoogsteen bp in the training dataset, we generated an omit map by removing the syn purine. We then deliberately mismodeled the bp as Watson–Crick by introducing a purine residue in the anti conformation. The resulting structure was refined using PHENIX (24,25) to generate coordinates and electron density maps for the structure with a mismodeled Watson–Crick bp (Figure 1C, Supplementary Figure S1 and Materials and Methods). Except for two bps, which had ambiguous electron density, the Hoogsteen bps showed better agreement with the electron density and better stereochemistry when assessed by MolProbity (27) compared to the Watson–Crick bps (Figure 1C, Supplementary Figure S1, Supplementary Discussion S1 and Materials and Methods). However, the extent of improvement varied from case to case, in agreement with the original publications.

We then compared the geometrical features of the mismodeled Watson–Crick bps with those of canonical Watson–Crick bps. The canonical Watson–Crick geometry was defined based on n = 149 bps obtained from a prior survey (16) (Materials and Methods) with well-defined density satisfying the Watson–Crick geometry. The geometrical features analyzed included backbone torsion angles, sugar pucker, bp parameters, C1′-C1′ inter-nucleotide distance, as well as major and minor groove widths (Supplementary Figure S2A–C).

For most structural parameters, including backbone torsion angles, sugar pucker and groove widths, we did not observe a clear distinction between the mismodeled and canonical Watson–Crick bps (Supplementary Figure S2D–F). However, for all the mismodeled Watson–Crick bps, the C1′-C1′ distance was consistently reduced by >0.6 Å (from ∼10.6 Å to <10.0 Å) relative to the canonical Watson–Crick geometry (Figure 1C,D and Supplementary Figure S2D). Constriction of the C1′-C1′ distance by ∼2.0–2.5 Å has been shown to be one of the most distinguishing structural (3,35) as well as functional (29,36,37) characteristics of Hoogsteen bps relative to Watson–Crick, and it is not surprising that modeling Watson–Crick bps into density belonging to Hoogsteen bps would result in a constriction (Figure 1A). In addition, the purine base was also consistently displaced toward the major groove (shear > 0.5 Å) and adopted a more open conformation (opening > 10°) relative to a canonical Watson–Crick bp (Figure 1D and Supplementary Figure S2D). These deviations likely accommodate constriction of the C1′-C1′ distance; without them, the two bases would sterically clash.
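The C1′-C1′ distance itself is simply the Euclidean distance between the two C1′ atoms of the paired nucleotides. A minimal sketch follows, assuming the coordinates (in Å) have already been extracted from the coordinate file; the function names and the example coordinates are hypothetical.

```python
import math


def c1_c1_distance(c1_a, c1_b) -> float:
    """Euclidean distance (in angstroms) between two C1' atom positions."""
    return math.dist(c1_a, c1_b)


def is_constricted(distance: float, cutoff: float = 10.0) -> bool:
    """True if the C1'-C1' distance falls below the 10.0 angstrom fingerprint."""
    return distance < cutoff


# Made-up coordinates for illustration; not from a deposited structure.
purine_c1 = (12.4, 3.1, 7.8)
pyrimidine_c1 = (3.0, 4.2, 9.5)
d = c1_c1_distance(purine_c1, pyrimidine_c1)
print(round(d, 2), is_constricted(d))
```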

Conversely, using a negative training dataset (n = 10) of Watson–Crick bps, we also asked whether there were ‘structural fingerprints’, which could be used to identify cases in which a Watson–Crick bp was mismodeled as Hoogsteen (Figure 1B). Indeed, we found that such mismodeled Hoogsteen bps have C1′-C1′ distances exceeding 10.0 Å, with the syn purine base being substantially displaced toward the minor groove resulting in loss of the H-bond between the purine-N7 and pyrimidine-N3 and oftentimes resulting in steric clashes between purine-N6/O6 and pyrimidine-N4/O4/N3 (Supplementary Figure S3, Table S2 and Materials and Methods).

Based on these results, we developed ‘Hoog-finder’, a structure-based approach to rapidly identify Hoogsteen bps which may have been mismodeled as Watson–Crick. Such bps could be identified if they satisfied all three ‘positive structural fingerprints’ (C1′-C1′ distance < 10 Å, shear > 0.5 Å, and opening > 10°) while also not satisfying the ‘negative structural fingerprints’ after being remodeled as Hoogsteen bps. As an initial test, Hoog-finder identified seven of eight Hoogsteen bps that were mismodeled as Watson–Crick and found in a prior analysis (20), with the one exception only satisfying two of the positive structural fingerprints.

Structure-based approach for identifying Hoogsteen base pairs mismodeled as Watson–Crick

We used Hoog-finder to screen 97,100 Watson–Crick bps in a Parent dataset (n = 97,100) representing 4002 crystal structures of all DNA-protein complexes in the PDB as of 29 August 2020 with resolution better than 3.5 Å. Hoog-finder identified 215 Watson–Crick bps in 173 crystal structures (Figure 1E; Materials and Methods). Pseudo-palindromic DNA sites, which displayed possible statistical disorder (20), were not included in the analysis (Materials and Methods). The electron density for each of these bps was then analyzed manually.

Of the 215 Watson-Crick bps examined, 58 showed good agreement with electron density and favorable stereochemistry as assessed using MolProbity (Supplementary Figure S4A and Supplementary Tables S3 and S4). These bps were annotated as ‘Watson–Crick’. These Watson–Crick bps were slightly distorted with geometrical features falling at the edge of the cutoff for all three structural criteria (Supplementary Figure S5). For 91 bps, the electron density around the bp in question was too weak to evaluate the Hoogsteen or Watson–Crick model (Supplementary Figure S4B and Supplementary Tables S3 and S4). These bps were annotated as ‘ambiguous’.

The remaining 66 bps were refined using PHENIX to compare the Hoogsteen containing model versus the model containing a Watson–Crick bp (Figure 1E). The resolutions of these structures ranged from 1.7 Å to 3.2 Å but most were better than 2.5 Å (Supplementary Table S5). However, while the electron densities surrounding the bps were, as expected, generally better in the high-resolution structures, the quality of the electron density varied locally around each bp and thus had to be analyzed in a case-by-case manner.

For each bp, we first generated an omit map by removing the anti purine residue. We then introduced a syn purine residue and refined the structure using PHENIX (Materials and Methods). We then assessed the agreement with the electron density maps between the refined Hoogsteen and original Watson–Crick model as well as the stereochemistry of the two bps. Bps showing much better agreement with the electron density in either Watson–Crick or Hoogsteen conformations were annotated as ‘Watson–Crick’ and ‘Hoogsteen’, respectively. In general, these bps showed better stereochemistry with the model that best fits the electron density. Bps showing a slight preference with the electron density and/or improved stereochemistry either due to lower number of steric clashes or more favorable H-bonding were labeled as ‘ambiguous Watson–Crick’ and ‘ambiguous Hoogsteen’. If no preference was observed, the bp was again labeled ‘ambiguous’. A list with all annotated bps is provided in Supplementary Table S4.

Interestingly, among these 66 bps examined, 22 showed better agreement with the electron density and stereochemistry when modeled as Hoogsteen relative to Watson–Crick (see Figures 1E, 2A, 3D, 4C, 5C, 6B, 7A,B and Supplementary Figure S6). Notably, in these cases, omit maps were not necessary to reveal that the bps had been incorrectly modeled as Watson–Crick, with the Hoogsteen conformation providing the optimal fit. As in the training dataset (Supplementary Figure S1), the improved agreement with the electron density varied from case to case: in some cases the improvement was very substantial (e.g. PDB 5A0W in Figure 6B), whereas in other cases the Hoogsteen was clearly the better model but the difference relative to Watson–Crick was not as strong (e.g. PDB 5WN0 in Figure 4C). Except for nine terminal Hoogsteen bps, all of which formed crystal contacts, there were no crystal contacts observed with the remaining 13 non-terminal Hoogsteen bps that were identified. The other 44 bps were ambiguous (Supplementary Figure S4C and Supplementary Discussion S2), with 23 showing slightly better agreement with Hoogsteen (‘ambiguous Hoogsteen’) (Supplementary Figure S7 and Materials and Methods). In addition, we observed that for all of our Hoogsteen bps and even ambiguous Hoogsteen bps, the A-T N6–O4, G-C N1/N7–N3 and G-C O6–N4 H-bonding distances are generally shorter relative to the Watson–Crick models (Supplementary Figure S8).

Figure 2.

Hoogsteen base pairs in Dpo4. (A) Comparison of 2mFo-DFc and mFo-DFc electron density maps calculated with models containing the original Watson–Crick (left) and Hoogsteen models (right). Electron density meshes and stereochemistry are as described in Figure 1C. The favored and less favored models are indicated using solid and dashed boxes, respectively. The boxes are in gray for ambiguous bps. A complete set of data is provided in Supplementary Figures S6, S7 and S9. (B) 3D structures of the protein–DNA complex showing the Hoogsteen bps. (C) Schematic showing the DNA (bolded PDB ID) containing Hoogsteen bps (in orange) and ambiguous Hoogsteen bps (in yellow next to orange stars). The corresponding structures (unbolded PDB ID) containing Watson–Crick bps (in sky blue) and ambiguous Watson–Crick bps (in yellow next to sky blue stars) are also shown inside the same dashed box. The lesions and mismatches are highlighted in red. (D) Distribution of C1′-C1′ distance between bases in bps at positions n-2 and n-4 in the Dpo4 DNA without lesions or mismatches (see PDB ID in Supplementary Table S8) (sky blue), showing constriction at n-2 compared to the distribution of Watson–Crick bps in B-DNA (in gray) from Afek et al. (29). (E) Close-up of the Hoogsteen bp in the active site of Dpo4, which is more compressed than idealized B-form DNA, thus avoiding a potential steric clash between the DNA backbone at n-2 and Ile248.

Figure 3.

Hoogsteen base pairs next to mismatches. (A and D) Comparison of 2mFo-DFc and mFo-DFc electron density maps for the Watson–Crick (left) and the corresponding Hoogsteen models (right) for (A) the G-C bp next to an A-C mismatch in TBP, and (D) the A-T bp next to a C-T mismatch in T5-flap endonuclease. Note that the G-C bp in (A) was modeled as a Hoogsteen bp in PDB 6UEO. Electron density meshes and stereochemistry are as described in Figure 1C and the box scheme is as described in Figure 2A. (B and E) 3D structures of the protein–DNA complex showing the Hoogsteen bps for (B) TBP and (E) T5-flap endonuclease. (C and F) Schematic showing the DNA containing Hoogsteen bps (in orange), as well as the mismatches (in red) for (C) TBP and (F) T5-flap endonuclease. (G) Hairpin DNA with and without mismatch used in NMR measurements. Hoogsteen populations were measured at G6-C14 bp. (H) NMR off-resonance R1ρ profiles of G6-C8. Spin-lock powers are color coded. Error bars were estimated using a Monte-Carlo scheme and are smaller than data points (Materials and Methods).

Figure 4.

Hoogsteen base pairs in APE1-exo. (A–C) Comparison of 2mFo-DFc and mFo-DFc electron density maps for the original Watson–Crick (left) and corresponding Hoogsteen models (right) for the A-T bp at position n-1 and G-C bp at position n-2 (A and B) with C-T mismatch at position n in (A) a substrate complex (PDB: 5WN4) and (B) a product complex (PDB: 5WN1), and (C) with C-G match at position n in a substrate complex (PDB: 5WN0). Electron density meshes and stereochemistry are as described in Figure 1C and the box scheme is as described in Figure 2A. (D) 3D structures of the protein–DNA complex showing the Hoogsteen bps. (E) Schematic showing the DNA containing Hoogsteen bps (in orange), Watson–Crick bps (in sky blue) and ambiguous Watson–Crick bps (in yellow next to sky blue stars), as well as the R177 and mismatches (in red). Also shown is the nick (red solid circle) and cleavage site (red dashed circle).

Figure 5.

Hoogsteen base pairs in TtAgo. (A–C) Comparison of 2mFo-DFc and mFo-DFc electron density maps for the original Watson–Crick (left) and the corresponding Hoogsteen models (right) for the A-T bp at position n + 4 in (A) an inactive substrate complex with a 15-mer target DNA, (B) an active substrate complex with a 16-mer target DNA, and (C) a product complex with a 19-mer target DNA with a nick between positions n and n + 1. Electron density meshes and stereochemistry are as described in Figure 1C and the box scheme is as described in Figure 2A. A complete set of data for other bps and structures is provided in Supplementary Figure S7. (D) 3D structures of the protein–DNA complex showing the Hoogsteen bps. (E) Schematic showing the DNA containing Hoogsteen bps (in orange), ambiguous Watson–Crick bps (in yellow next to sky blue stars), as well as the E512 (in red). Also shown are the nick (red unfilled circle) and metal ions (red filled circle). Disordered DNA regions are denoted with transparency.

Figure 6.

Hoogsteen base pairs in I-DMOI endonuclease. (A and B) Comparison of 2mFo-DFc and mFo-DFc electron density maps for the original Watson–Crick (left) and the corresponding Hoogsteen models (right) for the G-C bp at position n-2 in (A) a wild-type substrate complex and (B) the E117A mutant substrate complex. Electron density meshes and stereochemistry are as described in Figure 1C and the box scheme is as described in Figure 2A. (C) 3D structures of the protein–DNA complex showing the Hoogsteen bps. (D) Schematic showing the DNA containing Hoogsteen bps (in orange), Watson–Crick bps (in sky blue), as well as the E117/A117 (in red). Also shown are the metal ions (red filled circle).

Figure 7.

Terminal Hoogsteen base pairs. (A and B) Comparison of 2mFo-DFc and mFo-DFc electron density maps for the Watson–Crick (left) and the corresponding Hoogsteen models (right) for (A) a G-C terminal bp in Homing endonuclease I-Onul and (B) an A-T terminal bp in Esp1396I. Electron density meshes and stereochemistry are as described in Figure 1C and the box scheme is as described in Figure 2A. A complete set of data is provided in Supplementary Figures S6 and S7. (C) An example of crystal stacking interactions in a terminal Hoogsteen bp.

As a positive control, our pipeline correctly uncovered all Hoogsteen bps that were mismodeled as Watson–Crick within the same position in the four complexes in the crystallographic ASU that were also identified in a prior study (20) (Supplementary Tables S5 and S6). In addition, several bps annotated as Hoogsteen or ambiguous Hoogsteen were found in structures of proteins previously shown to bind DNA in a Hoogsteen conformation (Supplementary Figures S6 and S7, Supplementary Tables S5–S7 and Supplementary Discussions S3 and S4). These include the tumor suppressor p53 (7) and Sulfolobus solfataricus polymerase Dpo4 (10).

In summary, ∼10% (n = 22) of the bps identified using our pipeline were Hoogsteen, ∼63% (n = 135) were ambiguous (including those with weak density), and only ∼27% (n = 58) were Watson–Crick. The percentages of Hoogsteen (n = 17, ∼9%), Watson–Crick (n = 52, ∼26%) and ambiguous (n = 130, ∼65% with n = 21 ambiguous Hoogsteen) bps did not change substantially when curating the data to account for redundant bps (Figure 1F, Supplementary Table S3 and Materials and Methods). As expected, we observed neither substantial overall structural changes (coordinate RMSD) nor substantial differences in R-work/R-free (R-factors) when the structures were refined with a Watson–Crick or Hoogsteen conformation (Supplementary Table S5). This underscores the limitations of R-factors in differentiating model differences that comprise a small percentage of the total structure. Interestingly, in a set of high-resolution structures (resolution better than 2.5 Å), for 13 out of 15 Hoogsteen bps mismodeled as Watson–Crick, the B-factors of the purine residue were slightly lower when modeling the bp as Hoogsteen relative to Watson–Crick (Supplementary Table S6), which is consistent with the Hoogsteen bp providing a better fit. Although the differences are small, local B-factors may be better parameters than R-factors for assessing model selection, noting that B-factors are generally most reliable in high-resolution structures. In all, these results suggest there may be widespread ambiguities regarding the nature of base pairing in existing structures of DNA that are not well documented, and that Hoog-finder provides a means for effectively identifying such bps.
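The percentages quoted in this summary follow directly from the reported counts. The short Python sketch below simply recomputes them as an arithmetic check; the function name `class_percentages` is hypothetical and is not part of Hoog-finder.

```python
def class_percentages(counts):
    """Return each class's share of the total, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Counts as reported in the text (before and after curating redundant bps).
all_hits = {"Hoogsteen": 22, "ambiguous": 135, "Watson-Crick": 58}
curated = {"Hoogsteen": 17, "ambiguous": 130, "Watson-Crick": 52}

for label, counts in (("all", all_hits), ("curated", curated)):
    shares = class_percentages(counts)
    rounded = {name: round(value, 1) for name, value in shares.items()}
    print(label, "total =", sum(counts.values()), rounded)
```

Both sets sum to roughly 200 bps, in line with the ∼200 bps whose electron density was analyzed by the pipeline.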

Hoogsteen bps are located near stressed DNA sites

Most of the newly identified Hoogsteen bps were located in stressed regions of DNA duplexes, which we define to be bps that are not flanked by canonical Watson–Crick bps as detected using X3DNA-DSSR (28). Among the 17 Hoogsteen bps, 13 A(syn)-T and four G(syn)-C+, 16 (94%) were at or near stressed regions of DNA duplexes. Two were found next to a mismatch, two near lesions, two next to a nick, one near a melted bp, and nine were terminal bps (Figure 1F and Supplementary Table S6). For comparison, only ∼20% of the total bps in the Parent dataset (∼100 000 bps) were in stressed regions of DNA duplexes.
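The definition of a stressed bp used here (a bp not flanked by canonical Watson–Crick bps) can be illustrated with a minimal Python sketch. The label list, the `"WC"` label and the function `is_stressed` are hypothetical and do not reflect the X3DNA-DSSR output format; treating a terminal bp as stressed because it lacks a neighbor on one side is an assumption of this sketch.

```python
def is_stressed(annotations, i):
    """True if base pair i lacks a canonical Watson-Crick neighbor on either side.

    `annotations` is a hypothetical per-position list of labels along one
    duplex, where "WC" marks a canonical Watson-Crick pair. A terminal pair
    has no neighbor on one side and is counted as stressed (an assumption).
    """
    left = annotations[i - 1] if i > 0 else None
    right = annotations[i + 1] if i + 1 < len(annotations) else None
    return left != "WC" or right != "WC"


# Toy duplex with one internal mismatch (illustrative labels only).
duplex = ["WC", "WC", "WC", "mismatch", "WC", "WC"]
stressed = [is_stressed(duplex, i) for i in range(len(duplex))]

# Fraction of Hoogsteen bps at or near stressed sites versus the Parent dataset.
hoogsteen_fraction = 16 / 17   # counts reported in the text
background_fraction = 0.20     # ~20% of all bps, as reported in the text
enrichment = hoogsteen_fraction / background_fraction
```

In this toy duplex the two terminal bps and the two bps next to the mismatch are flagged. The ratio computed at the end is a plain comparison of the two reported fractions, not a statistical test.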

Hoogsteen base pairs near lesions

Among the eight non-terminal Hoogsteen bps, three were in crystal structures of DNA bound to the low fidelity polymerase Sulfolobus solfataricus polymerase Dpo4 (38) (Figure 2AC, Supplementary Figure S6, Table S6 and Supplementary Discussion S3). Figure 2A shows the improvement in the electron density observed with the Hoogsteen versus Watson–Crick model in some cases accompanied by better stereochemistry including reduced steric clashes and more favorable H-bonding. An additional five ambiguous Hoogsteen bps in Dpo4 structures were also identified that showed a slightly better fit to the electron density when modeled as Hoogsteen relative to Watson–Crick and also showed some improvement in stereochemistry (Supplementary Figure S7, Table S7 and Supplementary Discussion S3). The large number of crystal structures available for Dpo4-DNA complexes (n = 162) provided a unique opportunity to assess the role of DNA stress, in this case lesions and mismatches, in determining preferences for a Hoogsteen versus the Watson–Crick conformation. To aid this statistical analysis, we also considered those five ambiguous bps in Dpo4-DNA structures that show a slight preference for the Hoogsteen conformation.

Prior studies have identified Hoogsteen bps in crystal structures of Dpo4 in which they were proposed to accommodate lesion-induced DNA distortions (10,11,22,23) to allow bypass of damage during replication (39). Our new findings expand this Hoogsteen landscape, revealing Hoogsteen bps adjacent to a wider variety of damaged nucleotides (such as 2,4-difluorotoluene and S-methanocarba-dATP), sampling a broader variety of positions (n-3 in addition to the previously documented n-1 and n-2) relative to the active site, with two or as many as three consecutive Hoogsteen or ambiguous bps forming adjacent to one another (Figure 2BC and Supplementary Tables S6 and S7).

Importantly, Hoogsteen bps were only observed in Dpo4-DNA crystal structures (n = 9) with duplexes containing lesions or mismatches (Figure 2C, Supplementary Figure S10 and Supplementary Discussion S3). By contrast, 26 Dpo4-DNA crystal structures lacking lesions or mismatches were purely Watson–Crick (Supplementary Table S8). However, not all structures containing mismatches or lesions feature Hoogsteen or ambiguous bps. Instead, bps in duplexes containing the lesions can be Hoogsteen, Watson–Crick or ambiguous depending on the identity of the base partner and/or the position of the lesion along the duplex (Figure 2C and Supplementary Figure S10).

As noted, Hoogsteen bps in the Dpo4-DNA structures tend to be observed at positions n-1 and n-2 near the active site (n) (Figure 2C and Supplementary Figure S10). Interestingly, we noticed that the C1′-C1′ distances at n-2 were slightly pre-constricted even when the bps are Watson–Crick in Dpo4 DNA lacking lesions or mismatches (Figure 2D). Without the constriction at this position, steric collisions would occur with the Dpo4 protein (Figure 2E). Thus, it appears that Dpo4 actively constricts the bp at this position, and that this in turn increases the propensity to form a Hoogsteen bp. A similar mechanism has been proposed to explain the preference of polymerase ι for Hoogsteen bps in its active site (n) (36). These findings reinforce a prominent role for Hoogsteen bps in DNA damage and mismatch bypass by Dpo4. The Hoogsteen bps might serve to better absorb the conformational stress and deviation from canonical Watson–Crick geometry imposed by damaged nucleotides or mismatches.

Hoogsteen base pairs near mismatches

We recently reported the first series of crystal structures for a transcription factor bound to a DNA duplex containing mismatches (29). Although not discussed in the original publication, one of the structures (PDB: 6UEO) included a G(syn)-C+ Hoogsteen bp immediately adjacent to a partially melted A-C mismatch within the consensus sequence of TBP, a transcription factor shown previously to bind matched DNA in a Hoogsteen conformation (5). The G(syn)-C+ Hoogsteen bp occurs at an unstacked step, an environment similar to duplex terminal ends, in which Hoogsteen bps are frequently found (16) (Figure 3AC).

Interestingly, our new analysis identified other Hoogsteen bps next to mismatches, including two A(syn)-T Hoogsteen bps sandwiched between two C-T mismatches in a complex involving the T5 flap endonuclease (T5Fen). Here, the electron density and stereochemistry strongly favor the Hoogsteen over the Watson–Crick model (Figure 3D). This enzyme trims branched DNAs that arise from Okazaki-fragment synthesis (40). The C-T mismatches were used to aid crystallization (PDB: 5HP4) in a region distant from the active site (41) (Figure 3DF). Like Hoogsteen bps, pairing to form a C-T mismatch requires constriction of the two bases by ∼2.0–2.5 Å. Indeed, pre-constricted pyrimidine-pyrimidine mismatches such as C-T and T-T have recently been shown to mimic the distortions induced by Hoogsteen bps (29). The T5Fen crystal structure suggests that in addition to structurally mimicking the constricted Hoogsteen conformation (29), these mismatches can also promote Hoogsteen bps at neighboring sites.

We tested the above hypothesis for naked duplex DNA under solution conditions using the off-resonance R1ρ relaxation dispersion (RD) NMR experiment (42–44). The R1ρ experiment can characterize chemical exchange between a major conformational state and a minor, low-populated and short-lived species. The experiment measures the effect of resonance broadening due to chemical exchange as a function of the spin-lock power (ωSL) and frequency (Ω) of a continuous radiofrequency (RF) field. Chemical exchange results in the appearance of a peak in the off-resonance R1ρ profile, which can be fit to an appropriate kinetic model to obtain the population of the minor species (pB), the exchange rate (kex = k1 + k-1) and the chemical shift difference between the major and minor states (ΔωB = ωminor – ωmajor). We measured off-resonance R1ρ profiles for guanine-C8 in a G-C bp surrounded by Watson–Crick bps and then examined how the exchange varies when introducing neighboring G-T or T-T mismatches (Figure 3G). In the Watson–Crick control, we observed the R1ρ profiles expected for Hoogsteen exchange (13,45) (pB ∼ 0.5%, kex ∼ 600 s⁻¹, ΔωB ∼ 3 ppm). The R1ρ profiles differed for the duplexes with mismatches. Strikingly, a 2-state fit of these data reveals that the equilibrium G-C+ Hoogsteen population increased by 3- and 13-fold when placed next to G-T and T-T mismatches, respectively (Figure 3GH and Supplementary Figure S11).
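For a two-state exchange at equilibrium, the fitted parameters are linked by pB = k1/kex, so the forward and reverse rate constants follow from pB and kex. The Python sketch below applies this standard relation to the approximate control values quoted above and lists the minor-state populations implied by the reported fold increases. The function name `two_state_rates` is hypothetical, and no exchange rates are computed for the mismatched duplexes because their kex values are not given here.

```python
def two_state_rates(p_b, k_ex):
    """Split k_ex = k1 + k_minus1 into forward and reverse rate constants.

    For A <-> B at equilibrium, p_b = k1 / k_ex, so k1 = p_b * k_ex and
    k_minus1 = (1 - p_b) * k_ex.
    """
    if not 0.0 < p_b < 1.0:
        raise ValueError("p_b must be a fraction between 0 and 1")
    k1 = p_b * k_ex
    return k1, k_ex - k1


p_b_control = 0.005    # ~0.5% Hoogsteen population (approximate, from the text)
k_ex_control = 600.0   # ~600 s^-1 (approximate, from the text)
k1, k_minus1 = two_state_rates(p_b_control, k_ex_control)

# Minor-state populations implied by the reported 3- and 13-fold increases.
implied = {name: fold * p_b_control for name, fold in (("G-T", 3), ("T-T", 13))}

print(f"k1 ~ {k1:.1f} s^-1, k-1 ~ {k_minus1:.1f} s^-1")
for name, p_b in implied.items():
    print(f"next to {name}: pB ~ {100 * p_b:.1f}%")
```

Because the input values are approximate, the resulting numbers should be read as order-of-magnitude estimates rather than fitted parameters.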

Hoogsteen base pairs near nicks

Nicked DNA is a form of damage and reaction intermediate that various enzymes act upon during DNA replication, damage repair and gene editing (46–48). Our pipeline identified Hoogsteen bps near nicked sites in crystal structures of DNA duplexes bound to two different proteins, human AP endonuclease 1 (APE1) and Thermus thermophilus Argonaute (TtAgo).

APE1 is a multifunctional enzyme. One of its roles is as an exonuclease that removes 3′ lesions (49,50) to enable downstream repair. Through its exonuclease activity, APE1 is proposed to help proofread polymerase β insertions during BER by removing mis-inserted bases to regenerate a gapped DNA (51–53). In this role, APE1 needs to act on the mis-inserted mismatched base adjacent to a 3′ nick while discriminating against a correctly inserted Watson–Crick bp.

In the crystal structure (PDB: 5WN4) of the catalytically active substrate complex of APE1 bound to a nicked DNA duplex containing template thymine and mis-inserted cytosine, the T-C mismatch within the active site is melted and the DNA backbone is sharply bent within the catalytic pocket (Figure 4A, DE). The n-1 and n-2 bps adjacent to the mismatch have weak electron density and we annotate them as ambiguous (Figure 4A). A similar structure was observed (PDB: 5WN1) for the product complex following excision of the mis-inserted cytosine in which positions n-1 and n-2 form well-resolved Watson–Crick bps (Figure 4B, DE).

In contrast, in the corresponding APE1-DNA crystal structure (PDB: 5WN0) with template guanine, the correctly inserted cytosine formed the expected Watson–Crick G-C bp (Figure 4D and E). However, the structure of this complex differs substantially from that of the T-C mismatch. The inserted cytosine is displaced 7.5 Å away from the active site. In the original publication (53), this inactive APE1-DNA conformation was proposed to explain how it discriminates and avoids cleaving matched Watson–Crick DNA.

Interestingly, our analysis identifies the n-1 and n-2 bps in this inactive structure to be A(syn)-T and G(syn)-C+ Hoogsteen bps, respectively (Figure 4C). The electron density here is not as strong as for some of the other examples and it is clearer for the position n-1 versus n-2, with both positions showing improved stereochemistry with the Hoogsteen model (Figure 4C). By unwinding the DNA ∼12°, the Hoogsteen bps appear to induce a register shift so that they now occupy the active site in place of the G-C Watson–Crick bp, displacing the inserted cytosine away from the active site (Figure 4D,E and Supplementary Table S9). In addition, one of the key catalytic residues, Arg177 is recruited to the Hoogsteen bps where it stacks on the thymine base and forms H-bonds with the thymine phosphate backbone at position n-1. Notably, position n-1 was also shown to be a Hoogsteen bp in Figure 4B of the original publication by Whitaker et al. (53); however, the bp is modeled as Watson–Crick in the deposited PDB and no reference was made to the Hoogsteen bp in the publication (53). The Hoogsteen bps may help increase the specificity of APE1 through an induced-fit (54) mechanism by stabilizing a catalytically inactive conformation when bound to a matched Watson–Crick bp. It should be noted that APE1 functions primarily as an endonuclease to cleave abasic sites in the base excision repair pathway and interestingly, no Hoogsteen bps were identified in these functional contexts in this study.

Our analysis also identified an A(syn)-T Hoogsteen bp near a nick in crystal structures (PDB: 4KPY, 4NCA, 4NCB) of the TtAgo–DNA complex. TtAgo employs short 13–25 nt single-stranded DNA guides to introduce nicks between positions n and n + 1 in single-stranded RNA during RNA silencing (55,56) and in single-stranded DNA as part of a defense system (56,57). Prior crystal structures of TtAgo complexes with guide and target DNA revealed a transition between inactive and active conformations that ensures specificity toward substrates of specific length. During this transition, the highly conserved catalytic residue Glu512 moves near the binding pocket where it contacts the DNA backbone at position n + 4, forming water mediated contacts with catalytic metal ions (58).

In both the inactive (PDB: 4N41) and the active (PDB: 5GQ9) substrate complexes, where the target DNA is not cleaved, the bp at position n + 4 is better modeled as a Watson–Crick bp (Figure 5A,B, D and E). However, in a crystal structure (PDB: 4KPY) of a product complex where the target DNA is cleaved (with DNA nicked between n and n + 1), our analysis indicates that the bp at position n + 4 is an A(syn)-T Hoogsteen bp (Figure 5C). Here, the strong positive difference densities around adenine N7/C5/N6 and at N3 in the Watson–Crick conformation essentially disappear when the base is modeled and refined in the Hoogsteen conformation (Figure 5C). The Hoogsteen bp retains the same contacts with Glu512 as observed in the Watson–Crick conformation (Figure 5E). Although it remains unclear what interactions favor the Hoogsteen bp, the density at this position also slightly favors the Hoogsteen conformation in two other related crystal structures (PDB: 4NCA and 4NCB) (Supplementary Figure S7 and Table S7). Moreover, a preference to form a Hoogsteen bp at position n + 4 was robustly observed for the same bps in complexes that were present as multiple copies in the crystallographic asymmetric unit (ASU). Therefore, these data are suggestive of a Watson–Crick to Hoogsteen transition taking place during the catalytic cycle, but this requires further investigation.

Our analysis also identified an ambiguous A(syn)-T Hoogsteen bp adjacent to a nick in the crystal structure of an inactive hairpin-forming complex of the RAG1/2 recombinase (PDB: 5ZDZ) (Supplementary Figure S12 and Table S7). Together with the prior crystal structure of the IHF-DNA complex (4), these results suggest a preference for Hoogsteen bps adjacent to nicked sites.

Hoogsteen bps involving interactions with metal ions

Our analysis also identified a Hoogsteen bp in a crystal structure of the homing endonuclease I-DMOI that appears to be stabilized through interactions with metal ions. I-DMOI sequence-specifically recognizes and cleaves a stretch of 22 bps of double-stranded DNA (59). In the crystal structure (PDB: 4UN9) of the catalytically active conformation, the DNA within the active site is locally overwound and has a substantially narrowed minor groove (Supplementary Table S9). The catalytic residue Glu117 contacts two metals, termed MB and MC, which in turn form a network of interactions with the DNA, stabilizing a strained conformation at positions n-1 to n-3 (Figure 6A, C, D).

Interestingly, in the corresponding crystal structure (PDB: 5A0W) of a mutant of I-DMOI with Glu117 replaced by Ala117, MC is no longer observed, while MB changes coordination, likely to compensate for the loss of contacts with Glu117 (Figure 6C,D). This alteration in metal coordination is accompanied by a change in the DNA conformational strain, particularly at position n-1. Rather than base pairing, the adenine and partner thymine stack on top of each other, thus constricting the DNA (Figure 6D). Immediately adjacent to this unusual A/T stack, at position n-2, our analysis identified a G(syn)-C+ Hoogsteen bp, which may help to absorb the unusual constriction at the neighboring position n-1 (Figure 6B). Here, the electron density and stereochemistry very clearly favor the Hoogsteen over the Watson–Crick model (Figure 6B). As proposed in the original paper (60), it is possible that the newly positioned metal MB and phosphate group stabilize this new type of strain. The same bp in the other complexes in the ASU was also identified as Hoogsteen.

It is noteworthy that the ambiguous Hoogsteen bp observed next to a nick in the crystal structure of the inactive hairpin-forming complex of the RAG1/2 recombinase (PDB: 5ZDZ) also featured changes in metal coordination to the DNA relative to the active Watson–Crick form (Supplementary Figure S12, Table S7 and Supplementary Discussion S5), providing an additional example in which metals appear to participate in Hoogsteen bp formation.

Terminal Hoogsteen bps

Many biochemical processes act on the terminal ends of DNA duplexes, including homologous recombination and nonhomologous end joining (61). There is evidence showing a preference for Hoogsteen bps to form within terminal ends of DNA duplexes. The prior Hoogsteen survey (16) identified at least 10 A(syn)-T and two G(syn)-C+ terminal Hoogsteen bps distant from the protein binding site in 10 crystal structures of protein–DNA complexes. In addition, Hintze et al. (20) identified an additional 4 A(syn)-T and two G(syn)-C+ Hoogsteen bps that were mismodeled as Watson–Crick, also distant from the protein binding site. As noted by Hintze et al. (20), some of these terminal Hoogsteen bps could be stabilized by crystal contacts. However, solution state NMR RD studies also show a 4-fold higher propensity to form Hoogsteen bps at DNA terminal ends relative to the center of a DNA duplex (62).

Our current analysis uncovered two new terminal Hoogsteen bps that are also positioned distant from protein binding sites (Supplementary Table S6). These include a G(syn)-C+ bp in the DNA of a homing endonuclease I-Onul complex (PDB: 3QQY) (Figure 7A), and an A(syn)-T bp in the DNA complexed with the regulatory protein Esp1396I of the type II restriction-modification (RM) system (PDB: 4IWR) (Figure 7B). Here, both the electron density and stereochemistry favor the Hoogsteen over the Watson–Crick model (Figure 7A,B). We also identified several ambiguous Hoogsteen bps at DNA terminal ends (Supplementary Figure S7, Table S7 and Supplementary Discussion S6). We cannot rule out that these terminal Hoogsteen bps are induced by crystal contacts, as all of them are involved in packing with neighboring symmetry-related molecules in the crystal unit cell. For example, the terminal Hoogsteen bp in Esp1396I stacks with a symmetry-related Hoogsteen bp from a neighboring complex in the crystal (Figure 7C).

DISCUSSION

It is commonly assumed that A-T and G-C bps in duplex DNA are Watson–Crick. However, prior studies showed that certain proteins (4–7) and drugs (12) bind to specific DNA sequences and render the Hoogsteen bp as the dominant conformation at certain positions. Our results suggest that Hoogsteen bps are not restricted to a few transcription factors or specialized polymerases, but may in fact be a more common feature of conformationally stressed DNA, also found in complexes with enzymes that repair or cleave DNA.

In particular, forms of stress that result in the constriction of the helical diameter, such as pyrimidine–pyrimidine mismatches and stacking of base partners, or that result in environments mimicking the terminal ends, such as nicks, appear to favor the Hoogsteen conformation. Interestingly, in the crystal structure of the IHF–DNA complex (4), a Hoogsteen bp was only observed at the nicked site, whereas a Watson–Crick bp was observed at a pseudo-symmetry related site lacking the nick. In addition, solution NMR studies (63) revealed that the Hoogsteen bp observed in the crystal structure of the complex does not form in an intact DNA duplex lacking the nick. Thus, the Hoogsteen bp observed in the IHF–DNA complex can directly be attributed to the nick.

The enrichment of Hoogsteen bps in non-canonical regions also has implications for the occurrence of ambiguous Hoogsteen bps in crystal structures of RNA. In stark contrast to duplex B-DNA, rA-rU and rG-rC Hoogsteen bps have been shown to be highly energetically disfavored in A-RNA duplexes due to unique constraints imposed by the A-form geometry (35,64). Nevertheless, rA-rU and rG-rC Hoogsteen bps have been observed near non-canonical regions, such as near bulges and internal loops (PDB: 1HR2, 2R8S) and in the context of tertiary contacts (PDB: 3G78). Given the potential to form Hoogsteen bps in these contexts, there is also room for ambiguity during crystallographic refinement. Preliminary application of Hoog-finder reveals several candidate hits within a subset of crystal structures of RNA and protein–RNA complexes (Supplementary Figure S13), including one unambiguous rA-rU Hoogsteen bp which was mismodeled as Watson–Crick in the crystal structure (PDB: 3PDR) of the M-box riboswitch, one ambiguous rA-rU Hoogsteen bp at the same position in the same RNA (PDB: 2QBZ), and another unambiguous rA-rU long-range Hoogsteen bp involving tertiary contacts in the crystal structure (PDB: 5DAR) of a 74-nt fragment of RNA in complex with 50S ribosomal protein L10. Future studies should therefore also comprehensively examine the potential occurrence of Hoogsteen bps mismodeled as Watson–Crick in RNA structures.

The crystallographic and NMR evidence presented here showing a preference for Hoogsteen bps near mismatches is of particular interest considering a recent study (29) showing that introducing mismatches at specific positions in duplex DNA, including the pyrimidine–pyrimidine mismatches that we find favor Hoogsteen bps, can increase transcription factor binding affinity. High affinity transcription factor binding to mismatched DNA could compete with damage repair and promote mutagenesis at transcription factor binding sites (65). The increased binding affinity imparted by mismatches was previously attributed in part to pre-paying the energetic cost of deforming the DNA for protein recognition. Based on our results, Hoogsteen bps near mismatches could also contribute to high affinity binding to mismatched DNA.

Even for the bps annotated as Hoogsteen, the weight of the crystallographic evidence varied from case to case. Whether these newly uncovered Hoogsteen bps also form under physiological solution conditions remains to be established. It will therefore be important to apply complementary solution-state approaches to test the validity of these Hoogsteen bps, resolve ambiguous bps, and also provide insights into any Watson–Crick to Hoogsteen dynamics that may be taking place. In the case of p53-DNA complexes, the tandem A(syn)-T Hoogsteen bps observed in crystal structures (7) could be verified independently under solution conditions using chemical substitutions (37) and more recently, via high throughput binding measurements (29). Similarly, the G(syn)-C+ Hoogsteen bps observed in crystal structures of TBP (5) were recently verified under solution conditions using IR spectroscopy (66). These and other chemical probing approaches (67) could be used to verify the newly identified Hoogsteen bps under solution conditions.

While we have proposed potential roles for some of the newly identified Hoogsteen bps, future studies could more directly examine their biological significance. Here, approaches similar to those first introduced to study polymerase ι (9,68) could be applied: one examines how deazapurine substitutions which selectively destabilize the Hoogsteen bp (69), or pyrimidine–pyrimidine substitutions which mimic the Hoogsteen bp (29), impact binding affinity and/or enzymatic activity.

There is good reason to believe that additional Hoogsteen bps remain to be uncovered that are presently modeled as Watson–Crick in existing crystal structures of DNA. Our pipeline only analyzed the electron density for ∼200 out of ∼90 000 bps satisfying all three positive structural fingerprints, yet based on our training dataset, we know that some Hoogsteen bps only satisfy a subset of the criteria. There are an additional ∼1400 bps that remain to be analyzed that satisfy the key C1′-C1′ distance criterion, which appears to be the most reliable diagnostic feature of a Hoogsteen conformation. In addition, Hoog-finder will likely fail to identify Hoogsteen-like conformations found in a previous survey of crystal structures (16), in which the two base partners are not constricted but form H-bonds with syn purine bases.
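The C1′-C1′ distance criterion can be illustrated with a minimal Python sketch that computes the distance between the two C1′ atoms of a bp and flags constricted pairs. The function names, the 9.5 Å cutoff and the coordinates are all placeholders chosen for illustration; they are not the thresholds or code used by Hoog-finder.

```python
import math


def c1_c1_distance(c1_purine, c1_pyrimidine):
    """Euclidean distance (in Å) between the two C1' atoms of a base pair."""
    return math.dist(c1_purine, c1_pyrimidine)


def is_constricted(distance, cutoff=9.5):
    """Flag a pair whose C1'-C1' distance falls below `cutoff`.

    The 9.5 Å default is a placeholder for illustration only; it is not the
    threshold used by Hoog-finder.
    """
    return distance < cutoff


# Made-up coordinates (Å), not taken from any PDB entry: one pair with a
# Watson-Crick-like separation and one shortened by about 2 Å.
wc_like = c1_c1_distance((0.0, 0.0, 0.0), (10.5, 0.0, 0.0))
hg_like = c1_c1_distance((0.0, 0.0, 0.0), (8.4, 0.0, 0.0))

for label, d in (("Watson-Crick-like", wc_like), ("Hoogsteen-like", hg_like)):
    print(f"{label}: {d:.1f} Å, constricted = {is_constricted(d)}")
```

A distance filter of this kind only nominates candidates. As the text notes, each candidate bp still has to be rebuilt and refined against the electron density before it can be annotated as Hoogsteen, Watson–Crick or ambiguous.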

Equally importantly, many of the DNA bps analyzed in existing crystal structures could not be definitively modeled as either Watson–Crick or Hoogsteen. Among the ∼200 bps satisfying all three structural fingerprints, over 60% were ambiguous. In this regard it is notable that our data indicate that Hoogsteen bps tend to be located adjacent to mismatches, lesions or nicks, which may be more flexible. However, it remains to be seen whether the weakening of electron density at some of these sites originates from increased flexibility. Future studies should also explore the application of ensemble-based refinement of both Watson–Crick and Hoogsteen models with fractional populations (70,71). Together with prior studies showing the ambiguity when modeling Hoogsteen versus Watson–Crick (7,8,18–20), these results underscore the importance of exercising caution when modeling DNA bases, testing Hoogsteen and other conformational states as possible alternatives, and annotating those bps that have ambiguous electron density.

Our approach identified 13 new Hoogsteen bps (Supplementary Table S6) that were not previously identified in the study by Hintze et al. (20), which used the pattern of difference electron density peaks as the sole diagnostic (Figure 1B). Indeed, some of the bps that we found to be mismodeled as Watson–Crick but are really Hoogsteen did not show all the expected diagnostic difference electron density peaks used by Hintze et al. (20) (Supplementary Discussion S1). However, this is not surprising given the relatively low resolutions of some of the structures and/or the weak electron density in the vicinity of the given bps; hence the difference densities in some of the structures were not highly reliable. In fact, half of the Hoogsteen bps mismodeled as Watson–Crick in the training dataset lack the precise diagnostic difference density peaks and therefore could not be identified by the find_purine_decoy program developed in Hintze et al. (20) (Supplementary Figure S1 and Supplementary Discussion S1). Future studies could combine aspects of the two approaches to most effectively flag potential Hoogsteen bps mismodeled as Watson–Crick.

Finally, we hope that these findings will help spur a community-wide effort to re-analyze existing structures of DNA for Hoogsteen and perhaps other bp conformations, to find ways to resolve bp conformation ambiguities in crystal structures, and to consider the Hoogsteen conformation when solving future crystal structures of DNA.

DATA AVAILABILITY

The PDB coordinate files and structure factor amplitudes (MTZ) files of all the structural models rebuilt and refined with Hoogsteen bps in our study can be downloaded from: https://github.com/alhashimilab/HoogsteenInTheData. The Hoog-Finder Python program with the user manual can be downloaded from: https://github.com/alhashimilab/Hoog-Finder. All other data supporting the findings are available within the article and its Supplementary Data.

ACKNOWLEDGEMENTS

We thank members of the Al-Hashimi laboratory for assistance and critical comments on the manuscript, Prof. Zippora Shakked (Weizmann Institute of Science) for bringing to our attention the Hoogsteen base pair in the crystal structure of T5 flap endonuclease (PDB: 5HP4), Prof. Mark Wilson (University of Nebraska) for critical input during early stages of the project, and Dr. Bradley Hintze (Duke University) for assistance.

Author contributions: H.S., M.A.S. and H.M.A. conceived the project and experimental design. H.S. performed the structural survey. H.S. performed X-ray structure refinement and other structural analysis, with assistance from M.A.S., S.G., H.-F.L. and U.P. I.J.K. prepared NMR samples, performed NMR experiments and analyzed NMR data. H.M.A. and H.S. wrote the manuscript with critical input from M.A.S.

Notes

Present address: Honglue Shi, California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, Berkeley, CA 94720, USA.

Present address: Isaac J. Kimsey, Nymirum, 4324 S. Alston Avenue, Durham, NC 27713, USA.

Contributor Information

Honglue Shi, Department of Chemistry, Duke University, Durham, NC 27710, USA.

Isaac J Kimsey, Department of Biochemistry, Duke University School of Medicine, Durham, NC 27710, USA.

Stephanie Gu, Department of Biochemistry, Duke University School of Medicine, Durham, NC 27710, USA.

Hsuan-Fu Liu, Department of Biochemistry, Duke University School of Medicine, Durham, NC 27710, USA.

Uyen Pham, Department of Biochemistry, Duke University School of Medicine, Durham, NC 27710, USA.

Maria A Schumacher, Department of Biochemistry, Duke University School of Medicine, Durham, NC 27710, USA.

Hashim M Al-Hashimi, Department of Chemistry, Duke University, Durham, NC 27710, USA; Department of Biochemistry, Duke University School of Medicine, Durham, NC 27710, USA.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Institutes of Health [R01GM089846 to H.M.A.; R35GM130290 to M.A.S.]. Funding for open access charge: National Institutes of Health [R01GM089846 to H.M.A.; R35GM130290 to M.A.S.].

Conflict of interest statement. None declared.

REFERENCES

  • 1. Felsenfeld G., Davies D.R., Rich A. Formation of a three-stranded polynucleotide molecule. J. Am. Chem. Soc. 1957; 79:2023–2024.
  • 2. Hoogsteen K. The structure of crystals containing a hydrogen-bonded complex of 1-methylthymine and 9-methyladenine. Acta Crystallogr. 1959; 12:822–823.
  • 3. Sathyamoorthy B., Shi H., Zhou H., Xue Y., Rangadurai A., Merriman D.K., Al-Hashimi H.M. Insights into Watson–Crick/Hoogsteen breathing dynamics and damage repair from the solution structure and dynamic ensemble of DNA duplexes containing m1A. Nucleic Acids Res. 2017; 45:5586–5601.
  • 4. Rice P.A., Yang S., Mizuuchi K., Nash H.A. Crystal structure of an IHF-DNA complex: a protein-induced DNA U-turn. Cell. 1996; 87:1295–1306.
  • 5. Patikoglou G.A., Kim J.L., Sun L., Yang S.H., Kodadek T., Burley S.K. TATA element recognition by the TATA box-binding protein has been conserved throughout evolution. Genes Dev. 1999; 13:3217–3230.
  • 6. Aishima J., Gitti R.K., Noah J.E., Gan H.H., Schlick T., Wolberger C. A Hoogsteen base pair embedded in undistorted B-DNA. Nucleic Acids Res. 2002; 30:5244–5252.
  • 7. Kitayner M., Rozenberg H., Rohs R., Suad O., Rabinovich D., Honig B., Shakked Z. Diversity in DNA recognition by p53 revealed by crystal structures with Hoogsteen base pairs. Nat. Struct. Mol. Biol. 2010; 17:423–429.
  • 8. Nair D.T., Johnson R.E., Prakash S., Prakash L., Aggarwal A.K. Replication by human DNA polymerase-iota occurs by Hoogsteen base-pairing. Nature. 2004; 430:377–380.
  • 9. Nair D.T., Johnson R.E., Prakash L., Prakash S., Aggarwal A.K. Human DNA polymerase iota incorporates dCTP opposite template G via a G.C + Hoogsteen base pair. Structure. 2005; 13:1569–1577.
  • 10. Ling H., Boudsocq F., Plosky B.S., Woodgate R., Yang W. Replication of a cis-syn thymine dimer at atomic resolution. Nature. 2003; 424:1083–1087.
  • 11. Ling H., Sayer J.M., Plosky B.S., Yagi H., Boudsocq F., Woodgate R., Jerina D.M., Yang W. Crystal structure of a benzo[a]pyrene diol epoxide adduct in a ternary complex with a DNA polymerase. Proc. Natl. Acad. Sci. USA. 2004; 101:2265–2269.
  • 12. Wang A.H., Ughetto G., Quigley G.J., Hakoshima T., van der Marel G.A., van Boom J.H., Rich A. The molecular structure of a DNA-triostin A complex. Science. 1984; 225:1115.
  • 13. Nikolova E.N., Kim E., Wise A.A., O’Brien P.J., Andricioaei I., Al-Hashimi H.M. Transient Hoogsteen base pairs in canonical duplex DNA. Nature. 2011; 470:498–502.
  • 14. Gueron M., Kochoyan M., Leroy J.L. A single mode of DNA base-pair opening drives imino proton exchange. Nature. 1987; 328:89–92.
  • 15. Alvey H.S., Gottardo F.L., Nikolova E.N., Al-Hashimi H.M. Widespread transient Hoogsteen base pairs in canonical duplex DNA with variable energetics. Nat. Commun. 2014; 5:4786.
  • 16. Zhou H., Hintze B.J., Kimsey I.J., Sathyamoorthy B., Yang S., Richardson J.S., Al-Hashimi H.M. New insights into Hoogsteen base pairs in DNA duplexes from a structure-based survey. Nucleic Acids Res. 2015; 43:3420–3433.
  • 17. Nikolova E.N., Zhou H., Gottardo F.L., Alvey H.S., Kimsey I.J., Al-Hashimi H.M. A historical account of hoogsteen base-pairs in duplex DNA. Biopolymers. 2013; 99:955–968.
  • 18. Wang J. DNA polymerases: Hoogsteen base-pairing in DNA replication? Nature. 2005; 437:E6–E7.
  • 19. Aggarwal A., Nair D., Johnson R., Prakash L., Prakash S. Hoogsteen base-pairing in DNA replication? Reply. Nature. 2005; 437:E7.
  • 20. Hintze B.J., Richardson J.S., Richardson D.C. Mismodeled purines: implicit alternates and hidden Hoogsteens. Acta Crystallogr. D Struct. Biol. 2017; 73:852–859.
  • 21. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000; 28:235–242.
  • 22. Bauer J., Xing G.X., Yagi H., Sayer J.M., Jerina D.M., Ling H. A structural gap in Dpo4 supports mutagenic bypass of a major benzo[a]pyrene dG adduct in DNA through template misalignment. Proc. Natl. Acad. Sci. USA. 2007; 104:14905–14910.
  • 23. Zhao L., Christov P.P., Kozekov I.D., Pence M.G., Pallan P.S., Rizzo C.J., Egli M., Guengerich F.P. Replication of N2,3-Ethenoguanine by DNA Polymerases. Angew. Chem. Int. Ed. 2012; 51:5466–5469.
  • 24. Afonine P.V., Grosse-Kunstleve R.W., Echols N., Headd J.J., Moriarty N.W., Mustyakimov M., Terwilliger T.C., Urzhumtsev A., Zwart P.H., Adams P.D. Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr. Sect. D Biol. Crystallogr. 2012; 68:352–367.
  • 25. Adams P.D., Afonine P.V., Bunkoczi G., Chen V.B., Davis I.W., Echols N., Headd J.J., Hung L.W., Kapral G.J., Grosse-Kunstleve R.W. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. Sect. D Struct. Biol. 2010; 66:213–221.
  • 26. Emsley P., Lohkamp B., Scott W.G., Cowtan K. Features and development of Coot. Acta Crystallogr. D. 2010; 66:486–501.
  • 27. Williams C.J., Headd J.J., Moriarty N.W., Prisant M.G., Videau L.L., Deis L.N., Verma V., Keedy D.A., Hintze B.J., Chen V.B. et al. MolProbity: more and better reference data for improved all-atom structure validation. Protein Sci. 2018; 27:293–315.
  • 28. Lu X.J., Bussemaker H.J., Olson W.K. DSSR: an integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Res. 2015; 43:e142.
  • 29. Afek A., Shi H., Rangadurai A., Sahay H., Senitzki A., Xhani S., Fang M., Salinas R., Mielko Z., Pufall M.A. et al. DNA mismatches reveal conformational penalties in protein–DNA recognition. Nature. 2020; 587:291–296.
  • 30. El Hassan M.A., Calladine C.R. Two distinct modes of protein-induced bending in DNA. J. Mol. Biol. 1998; 282:331–343.
  • 31. Lu X.J., Olson W.K. 3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures. Nucleic Acids Res. 2003; 31:5108–5121.
  • 32. Sagendorf J.M., Markarian N., Berman H.M., Rohs R. DNAproDB: an expanded database and web-based tool for structural analysis of DNA-protein complexes. Nucleic Acids Res. 2020; 48:D277–D287.
  • 33. Zimmer D.P., Crothers D.M. NMR of enzymatically synthesized uniformly 13C15N-labeled DNA oligonucleotides. Proc. Natl. Acad. Sci. USA. 1995; 92:3091–3095.
  • 34. Kimsey I.J., Petzold K., Sathyamoorthy B., Stein Z.W., Al-Hashimi H.M. Visualizing transient Watson-Crick-like mispairs in DNA and RNA duplexes. Nature. 2015; 519:315–320.
  • 35. Rangadurai A., Zhou H., Merriman D.K., Meiser N., Liu B., Shi H., Szymanski E.S., Al-Hashimi H.M. Why are Hoogsteen base pairs energetically disfavored in A-RNA compared to B-DNA? Nucleic Acids Res. 2018; 46:11099–11114.
  • 36. Makarova A.V., Kulbachinskiy A.V. Structure of human DNA polymerase iota and the mechanism of DNA synthesis. Biochemistry. Biokhimiia. 2012; 77:547–561.
  • 37. Golovenko D., Bräuning B., Vyas P., Haran T.E., Rozenberg H., Shakked Z. New Insights into the Role of DNA Shape on Its Recognition by p53 Proteins. Structure. 2018; 26:1237–1250.
  • 38. Ling H., Boudsocq F., Woodgate R., Yang W. Crystal structure of a Y-family DNA polymerase in action: a mechanism for error-prone and lesion-bypass replication. Cell. 2001; 107:91–102.
  • 39. Boudsocq F., Iwai S., Hanaoka F., Woodgate R. Sulfolobus solfataricus P2 DNA polymerase IV (Dpo4): an archaeal DinB-like DNA polymerase with lesion-bypass properties akin to eukaryotic poleta. Nucleic Acids Res. 2001; 29:4607–4616.
  • 40. Maga G., Villani G., Tillement V., Stucki M., Locatelli G.A., Frouin I., Spadari S., Hubscher U. Okazaki fragment processing: modulation of the strand displacement activity of DNA polymerase delta by the concerted action of replication protein A, proliferating cell nuclear antigen, and flap endonuclease-1. Proc. Natl. Acad. Sci. USA. 2001; 98:14298–14303.
  • 41. AlMalki F.A., Flemming C.S., Zhang J., Feng M., Sedelnikova S.E., Ceska T., Rafferty J.B., Sayers J.R., Artymiuk P.J. Direct observation of DNA threading in flap endonuclease complexes. Nat. Struct. Mol. Biol. 2016; 23:640–646.
  • 42. Mulder F.A., Mittermaier A., Hon B., Dahlquist F.W., Kay L.E. Studying excited states of proteins by NMR spectroscopy. Nat. Struct. Mol. Biol. 2001; 8:932–935.
  • 43. Palmer A.G. 3rd, Massi F. Characterization of the dynamics of biomacromolecules using rotating-frame spin relaxation NMR spectroscopy. Chem. Rev. 2006; 106:1700–1719.
  • 44. Hansen A.L., Nikolova E.N., Casiano-Negroni A., Al-Hashimi H.M. Extending the range of microsecond-to-millisecond chemical exchange detected in labeled and unlabeled nucleic acids by selective carbon R(1rho) NMR spectroscopy. J. Am. Chem. Soc. 2009; 131:3818–3819.
  • 45. Shi H., Clay M.C., Rangadurai A., Sathyamoorthy B., Case D.A., Al-Hashimi H.M. Atomic structures of excited state A–T Hoogsteen base pairs in duplex DNA by combining NMR relaxation dispersion, mutagenesis, and chemical shift calculations. J. Biomol. NMR. 2018; 70:229–244.
  • 46. Wilson D.M. 3rd, Takeshita M., Grollman A.P., Demple B. Incision activity of human apurinic endonuclease (Ape) at abasic site analogs in DNA. J. Biol. Chem. 1995; 270:16002–16007.
  • 47. Timson D.J., Singleton M.R., Wigley D.B. DNA ligases in the repair and replication of DNA. Mutat. Res. 2000; 460:301–318.
  • 48. Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 2012; 337:816–821.
  • 49. Chou K.M., Cheng Y.C. An exonucleolytic activity of human apurinic/apyrimidinic endonuclease on 3′ mispaired DNA. Nature. 2002; 415:655–659.
  • 50. Wong D., DeMott M.S., Demple B. Modulation of the 3′→5′-exonuclease activity of human apurinic endonuclease (Ape1) by its 5′-incised Abasic DNA product. J. Biol. Chem. 2003; 278:36242–36249.
  • 51. Freudenthal B.D., Beard W.A., Perera L., Shock D.D., Kim T., Schlick T., Wilson S.H. Uncovering the polymerase-induced cytotoxicity of an oxidized nucleotide. Nature. 2015; 517:635–639.
  • 52. Caglayan M., Horton J.K., Dai D.P., Stefanick D.F., Wilson S.H. Oxidized nucleotide insertion by pol beta confounds ligation during base excision repair. Nat. Commun. 2017; 8:14045.
  • 53. Whitaker A.M., Flynn T.S., Freudenthal B.D. Molecular snapshots of APE1 proofreading mismatches and removing DNA damage. Nat. Commun. 2018; 9:399.
  • 54. Koshland D.E. Application of a theory of enzyme specificity to protein synthesis. Proc. Natl. Acad. Sci. USA. 1958; 44:98–104.
  • 55. Wang Y., Juranek S., Li H., Sheng G., Tuschl T., Patel D.J. Structure of an argonaute silencing complex with a seed-containing guide DNA and target RNA duplex. Nature. 2008; 456:921–926.
  • 56. Wang Y.L., Juranek S., Li H.T., Sheng G., Wardle G.S., Tuschl T., Patel D.J. Nucleation, propagation and cleavage of target RNAs in Ago silencing complexes. Nature. 2009; 461:754–761.
  • 57. Swarts D.C., Jore M.M., Westra E.R., Zhu Y., Janssen J.H., Snijders A.P., Wang Y., Patel D.J., Berenguer J., Brouns S.J.J. et al. DNA-guided DNA interference by a prokaryotic Argonaute. Nature. 2014; 507:258–261.
  • 58. Sheng G., Zhao H., Wang J., Rao Y., Tian W., Swarts D.C., van der Oost J., Patel D.J., Wang Y. Structure-based cleavage mechanism of Thermus thermophilus Argonaute DNA guide strand-mediated DNA target cleavage. Proc. Natl. Acad. Sci. USA. 2014; 111:652–657.
  • 59. Dalgaard J.Z., Garrett R.A., Belfort M. Purification and characterization of two forms of I-DmoI, a thermophilic site-specific endonuclease encoded by an archaeal intron. J. Biol. Chem. 1994; 269:28885–28892.
  • 60. Molina R., Besker N., Marcaida M.J., Montoya G., Prieto J., D’Abramo M. Key Players in I-DmoI Endonuclease Catalysis Revealed from Structure and Dynamics. ACS Chem. Biol. 2016; 11:1401–1407.
  • 61. Scully R., Panday A., Elango R., Willis N.A. DNA double-strand break repair-pathway choice in somatic mammalian cells. Nat. Rev. Mol. Cell Biol. 2019; 20:698–714.
  • 62. Xu Y., McSally J., Andricioaei I., Al-Hashimi H.M. Modulation of Hoogsteen dynamics on DNA recognition. Nat. Commun. 2018; 9:1473.
  • 63. Zhou H.Q., Sathyamoorthy B., Stelling A., Xu Y., Xue Y., Pigli Y.Z., Case D.A., Rice P.A., Al-Hashimi H.M. Characterizing Watson-Crick versus Hoogsteen Base Pairing in a DNA-Protein Complex Using Nuclear Magnetic Resonance and Site-Specifically C-13- and N-15-Labeled DNA. Biochemistry. 2019; 58:1963–1974.
  • 64. Zhou H., Kimsey I.J., Nikolova E.N., Sathyamoorthy B., Grazioli G., McSally J., Bai T., Wunderlich C.H., Kreutz C., Andricioaei I. et al. m1A and m1G disrupt A-RNA structure through the intrinsic instability of Hoogsteen base pairs. Nat. Struct. Mol. Biol. 2016; 23:803–810.
  • 65. Sabarinathan R., Mularoni L., Deu-Pons J., Gonzalez-Perez A., López-Bigas N. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature. 2016; 532:264–267.
  • 66. Stelling A.L., Liu A.Y., Zeng W.J., Salinas R., Schumacher M.A., Al-Hashimi H.M. Infrared Spectroscopic Observation of a G-C+ Hoogsteen Base Pair in the DNA:TATA-Box Binding Protein Complex Under Solution Conditions. Angew. Chem. Int. Ed. 2019; 58:12010–12013.
  • 67. Xu Y., Manghrani A., Liu B., Shi H., Pham U., Liu A., Al-Hashimi H.M. Hoogsteen base pairs increase the susceptibility of double-stranded DNA to cytotoxic damage. J. Biol. Chem. 2020; 295:15933–15947.
  • 68. Johnson R.E., Prakash L., Prakash S. Biochemical evidence for the requirement of Hoogsteen base pairing for replication by human DNA polymerase iota. Proc. Natl. Acad. Sci. USA. 2005; 102:10466–10471.
  • 69. Nikolova E.N., Gottardo F.L., Al-Hashimi H.M. Probing transient Hoogsteen hydrogen bonds in canonical duplex DNA using NMR relaxation dispersion and single-atom substitution. J. Am. Chem. Soc. 2012; 134:3667–3670.
  • 70. Lang P.T., Ng H.L., Fraser J.S., Corn J.E., Echols N., Sales M., Holton J.M., Alber T. Automated electron-density sampling reveals widespread conformational polymorphism in proteins. Protein Sci. 2010; 19:1420–1431.
  • 71. Riley B.T., Wankowicz S.A., de Oliveira S.H.P., van Zundert G.C.P., Hogan D.W., Fraser J.S., Keedy D.A., van den Bedem H. qFit 3: Protein and ligand multiconformer modeling for X-ray crystallographic and single-particle cryo-EM density maps. Protein Sci. 2021; 30:270–285.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press