Abstract

High-order quantum chemistry is applied to hydrogen-bonded natural DNA nucleobase pairs [adenine:thymine (A:T) and guanine:cytosine (G:C)] and non-natural Hachimoji nucleobase pairs [isoguanine:1-methylcytosine (B:S) and 2-aminoimidazo[1,2a][1,3,5]triazin-4(1H)-one:6-amino-5-nitropyridin-2-one (P:Z)] to see how the intermolecular interaction energies and their energetic components (electrostatics, exchange-repulsion, induction/polarization, and London dispersion interactions) vary among the base pairs. We examined the Hoogsteen (HG) geometries in addition to the traditional Watson–Crick (WC) geometries. Coupled-cluster theory through perturbative triples [CCSD(T)] extrapolated to the complete basis set (CBS) limit and high-order symmetry-adapted perturbation theory (SAPT) at the SAPT2+(3)(CCD)δMP2/aug-cc-pVTZ level are used to estimate highly accurate noncovalent interaction energies. Electrostatic interactions are the most attractive component of the interaction energies, but the sum of induction/polarization and London dispersion is nearly as large, for all base pairs and geometries considered. Interestingly, the non-natural Hachimoji base pairs interact more strongly than the corresponding natural base pairs, by −21.8 (B:S) and −0.3 (P:Z) kcal mol–1 in the WC geometries, according to CCSD(T)/CBS. This is consistent with the H-bond distances being generally shorter in the non-natural base pairs. The natural base pairs are energetically more stabilized in their Hoogsteen geometries than in their WC geometries. The Hoogsteen geometry makes the A:T base pair slightly more stable, by −0.8 kcal mol–1, and it greatly stabilizes the G:C+ base pair, by −15.3 kcal mol–1. The G:C+ stabilization is mainly due to the fact that C has typically added a proton when found in Hoogsteen geometries. By contrast, Hoogsteen geometries are substantially less favorable than WC geometries for non-natural Hachimoji base pairs, by 17.3 (B:S) and 13.8 (P:Z) kcal mol–1.
Introduction
Noncovalent interactions (NCI) play a key role in many applications in (bio)chemistry/physics, materials and nanoscience, and biology. NCI govern such processes as drug binding, supramolecular assembly, and crystal packing of organics. In biochemistry, the architectures of biomacromolecules such as deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins are determined by NCI among the building blocks.1,2
In 1953, James Watson and Francis Crick reported a double helix structure for double-stranded DNA.3 This double helix structure assembles through H-bonding between the nucleobases to form Watson–Crick (WC) base pairs: adenine (A) with thymine (T), and guanine (G) with cytosine (C). For RNA, T is replaced with uracil (U). A decade after Watson and Crick published their model of the DNA double helix, Karst Hoogsteen discovered that A:T(U) and G:C+ can pair in a different geometrical orientation now called Hoogsteen pairing (here C+ denotes a protonated cytosine).4
In 2014, Malyshev and co-workers opened up a novel technique in artificial (or synthetic) biology by introducing exogenous non-natural nucleic acid base pairs into a living organism's DNA, demonstrating the feasibility of propagating an augmented genetic alphabet.5 Further advances in synthetic biology have demonstrated the creation of semisynthetic DNA- and RNA-like systems built from eight nucleotide “letters,” the original four for DNA (or RNA), plus four non-natural so-called “Hachimoji” (HM) nucleobases, allowing a total of four orthogonal pairs in DNA or RNA.6 Analogous to natural double-stranded DNA and RNA, these non-natural nucleobases can also form double helices. Along with natural nucleobase H-bond pairs, Hachimoji DNA and RNA can involve additional H-bond pairs resulting from new synthetic nucleobases: isoguanine (B), 2-aminoimidazo[1,2a][1,3,5]triazin-4(1H)-one (P), isocytosine (rS), 1-methylcytosine (S), and 6-amino-5-nitropyridin-2-one (Z).6 These are the Hachimoji nucleobases synthesized by Hoshika and co-workers in early 2019.6 The non-natural DNA nucleobase S bonds with B, and P bonds with Z, while B bonds with rS in the non-natural RNA system. These non-natural DNA building blocks P and B are analogues of the purine system, and S (and rS) and Z are analogues of the pyrimidine system. These synthetic systems meet the structural requirements needed to support Darwinian evolution, including a polyelectrolyte backbone, predictable thermodynamic stability, and stereoregular building blocks that fit a Schrödinger aperiodic crystal.6 Like natural DNA, Hachimoji DNA, though non-natural, can also support the evolution of organisms.4 Non-natural DNA has been suggested for numerous applications, including information storage and drug design.6−9
It is well established that natural DNA and RNA duplexes are primarily stabilized by the forces of WC H-bonding and nucleobase stacking (interstrand and intrastrand).10−12 The H-bonds are stronger than the π-stacking interactions, although both are essential for the stability of natural DNA and RNA duplexes in aqueous solution.12−14 It has been reported that H-bonding is primarily electrostatic in nature with a non-negligible dispersion contribution.15,16 Jurečka and co-workers have reported16−18 high-level computations for H-bonding interaction energies of several DNA base pairs, including some with nonstandard substituents, using second-order Møller–Plesset perturbation theory (MP2) with coupled-cluster corrections including noniterative triples contributions [CCSD(T)].19 Hesselmann and co-workers have examined20 the interaction energy between A:T and G:C base pairs and their physical components using density functional theory-based symmetry-adapted perturbation theory (DFT-SAPT),21,22 and Fonseca Guerra and Bickelhaupt conducted a similar study23 using an energy decomposition scheme of the supermolecular DFT energy.
However, to date, no high-level theoretical study has directly compared H-bonding in non-natural Hachimoji nucleobases to that in natural WC base pairs. Here, we present such an analysis using high orders of symmetry-adapted perturbation theory (SAPT) to obtain the electrostatic, exchange-repulsion, induction/polarization, and London dispersion contributions to the interaction in each base pair, complemented by CCSD(T) computations in the complete basis set (CBS) limit to obtain benchmark-quality interaction energies. Our analysis includes Watson–Crick-type geometries, as well as Hoogsteen-type geometries (newly proposed here for the Hachimoji base pairs). We analyze interaction energies and their components in terms of the geometries of the base pairs and find in general that interaction strength in these systems correlates with the sum of H-bond and H-bond-like closest-contact distances.
Theoretical Methods
In the present study, we have modeled a total of eight base pairs: natural WC base pairs (A:T and G:C) and non-natural HM base pairs (P:Z and B:S), as well as Hoogsteen (HG) geometry base pairs (A:T and G:C+) and HG-type HM base pairs (P:Z and B:S). We employ the B3LYP-D3(BJ)24−26 method, which has become routinely applied to NCI geometry optimizations, with the popular correlation-consistent basis sets of Dunning augmented with diffuse functions [aug-cc-pVXZ (X = D, T, Q); abbreviated throughout as aXZ].27 All three basis sets provide essentially identical geometries. The geometry optimizations were done using the Q-Chem code.28 Geometries were also optimized using (frozen core) MP2, and results were very similar to the DFT geometries. We used the B3LYP-D3(BJ)/aDZ geometries for further analysis. All results are obtained in the gas phase, which is sufficient to quantify the intrinsic attraction between the nucleobases. In the condensed phase, additional interactions will be present (e.g., interactions between the nucleobases and water molecules and also nearby nucleobases and DNA backbone), but the details will vary depending on the particular environment. Moreover, such additional interactions are not expected to modify or “tune” the direct nucleobase–nucleobase interactions.29
At the DFT geometries, we obtained high-quality intermolecular interaction energies. To obtain noncovalent interaction energies as well as their physically meaningful components, we used the “gold standard”30 of symmetry-adapted perturbation theory, i.e., SAPT2+(3)(CCD)δMP2/aXZ, where X = D, T. This level of theory includes the following terms:
[Equation 1]
where
[Equation 2]
and
[Equation 3]
IE denotes an interaction energy. The two numbers in superscripts refer to the order of the perturbation for the intermolecular interaction and the intramolecular electron correlation, respectively. Cross terms like exchange-induction and exchange-dispersion are counted as induction and dispersion, respectively. A detailed discussion of the above terms can be found in our previous paper.30 This level of SAPT is expected to provide accurate noncovalent interaction energies and their components. Interaction energies should be similar to those from the reliable CCSD(T)/CBS method.30 We also considered some lower levels of SAPT for comparison purposes.
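For orientation, the two δ corrections named in SAPT2+(3)(CCD)δMP2 are conventionally defined as shown below. This is the form used in the Psi4 SAPT documentation; that the authors' numbered equations take exactly this form is an assumption, and IE denotes a supermolecular interaction energy computed in the same basis set.

```latex
\begin{align}
\delta_{\mathrm{HF}}^{(2)} &= \mathrm{IE}_{\mathrm{HF}}
  - \left( E_{\mathrm{elst}}^{(10)} + E_{\mathrm{exch}}^{(10)}
  + E_{\mathrm{ind,resp}}^{(20)} + E_{\mathrm{exch\text{-}ind,resp}}^{(20)} \right) \\
\delta_{\mathrm{MP2}} &= \mathrm{IE}_{\mathrm{MP2}} - \mathrm{IE}_{\mathrm{SAPT2}}
\end{align}
```

In this convention, δHF collects induction-type effects beyond second order in the intermolecular perturbation, and δMP2 collects the terms present in supermolecular MP2 but absent from SAPT2.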
Our most accurate estimates of the interaction energies are obtained using the focal-point approach to estimate the CCSD(T)/CBS limit,31,32 whereby the MP2/CBS limit [computed using the two-point extrapolation scheme of Helgaker,33 denoted as MP2/CBS(aXZ, a[X+1]Z)] is corrected for higher-order electron correlation effects by adding the difference between CCSD(T) and MP2 as computed in a smaller basis set, denoted by δMP2CCSD(T). In particular, these noncovalent interaction energy values are computed at the MP2/CBS(aTZ, aQZ) + δMP2CCSD(T)/X and MP2/CBS(aQZ, a5Z) + δMP2CCSD(T)/X levels of theory, where X = aDZ or aTZ. These computations will be abbreviated here as CCSD(T)/CBS[aTQZ;δ:X] and CCSD(T)/CBS[aQ5Z;δ:X]. This focal-point approach has been widely used to estimate CCSD(T) at the CBS limit for noncovalent interaction energy values.18,34−49
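The focal-point arithmetic described above can be sketched in a few lines. This is a minimal illustration, not the authors' workflow: the function names are hypothetical, the two-point formula is the usual X^-3 form for correlation energies, and how the Hartree–Fock part is treated is not specified in the text and is left out here.

```python
def helgaker_two_point(e_corr_x, e_corr_y, x, y):
    """Two-point X^-3 extrapolation of a correlation energy.

    e_corr_x, e_corr_y: correlation energies in basis sets with
    cardinal numbers x < y (e.g. 3 for aTZ, 4 for aQZ).
    Assumes E(X) = E_CBS + A / X**3 and solves for E_CBS.
    """
    return (y**3 * e_corr_y - x**3 * e_corr_x) / (y**3 - x**3)


def focal_point_ccsdt_cbs(mp2_cbs, ccsdt_small, mp2_small):
    """Add the small-basis CCSD(T) - MP2 correction to an MP2/CBS value."""
    delta = ccsdt_small - mp2_small  # higher-order correlation correction
    return mp2_cbs + delta


# Synthetic check with made-up numbers (not data from this work):
# energies that follow E_CBS + A / X**3 exactly are recovered exactly.
_E_CBS, _A = -10.0, 2.0
_e3 = _E_CBS + _A / 3**3
_e4 = _E_CBS + _A / 4**3
recovered = helgaker_two_point(_e3, _e4, 3, 4)
```

The same two functions apply unchanged to total energies of dimer and monomers or directly to interaction energies, because both steps are linear in the energies.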
CCSD(T) interaction energies are computed with and without the Boys–Bernardi counterpoise correction.50−52 Core electrons are kept frozen in all interaction energy computations, which have been performed with the quantum chemistry program Psi4 v1.5.53,54
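As a reminder of what the counterpoise correction changes, the sketch below computes a supermolecular interaction energy from three total energies. The function name and the numerical values are hypothetical placeholders; the only difference between the corrected and uncorrected schemes is which basis set the monomer energies are computed in.

```python
HARTREE_TO_KCAL_MOL = 627.5095  # approximate conversion factor


def interaction_energy(e_dimer, e_monomer_a, e_monomer_b):
    """Supermolecular interaction energy in kcal/mol from energies in hartree.

    Counterpoise-corrected: pass monomer energies computed in the full
    dimer basis (ghost functions on the partner monomer).
    Uncorrected: pass monomer energies computed in their own basis sets.
    """
    return (e_dimer - e_monomer_a - e_monomer_b) * HARTREE_TO_KCAL_MOL


# Placeholder energies (hartree), chosen only to exercise the function.
ie_example = interaction_energy(-1.0, -0.4, -0.5)
```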
Results and Discussion
Watson–Crick Natural and Hachimoji Non-Natural DNA Base Pairs
Optimized structures of the four H-bonded nucleic acid base pairs are shown in Figure 1. The DFT computed structures show planar H-bonded structures for the natural base pairs (A:T and G:C) and non-natural base pairs (B:S and P:Z). The computed H-bonding closest-contact distances (distance between hydrogen being donated and the acceptor atom, in Å) are included in the figure. Each base pair features three H-bonds with intermolecular contact distances about 1.9 Å or less, except for A:T which has two such bonds plus a long 2.7 Å weak C–H···O bond.
Figure 1.

Optimized structures of the H-bonded natural WC (A:T, G:C) and non-natural HM (B:S, P:Z) base pairs at the B3LYP-D3(BJ)/aug-cc-pVDZ level of theory. The dotted black lines mark the H-bonds, labeled with their distances (Å), for each base pair.
High-quality total interaction energy values approaching the CCSD(T)/CBS limit, along with our highest level SAPT computations at the SAPT2+(3)(CCD)δMP2/aTZ level of theory, are presented in Table 1 for the WC-type (A:T, G:C, B:S, P:Z) and HG-type (A:T, G:C+, B:S, P:Z) nucleic acid base pairs. Table S1 of the Supporting Information additionally presents CCSD(T)/CBS results without counterpoise correction, and also using the smaller aDZ basis set for the CCSD(T) correction, to demonstrate basis set convergence. The difference in interaction energies between the CCSD(T)/CBS[aTQZ;δ:aTZ] and CCSD(T)/CBS[aQ5Z;δ:aTZ] computations is tiny, amounting to ≤0.02 kcal mol–1 for the WC- and HG-type base pairs, indicating that the MP2 portion of the interaction energy is very well converged. The differences between CCSD(T)/CBS[aQ5Z;δ:aTZ] and CCSD(T)/CBS[aQ5Z;δ:aDZ] (provided in the SI) are rather small (ranging from 0.2 to 0.4 kcal mol–1 with counterpoise correction, and 0.0 to 0.1 kcal mol–1 without counterpoise correction) and suggest that aTZ is probably sufficient to converge the coupled-cluster part of the focal point procedure to within ∼0.1–0.2 kcal mol–1. Additionally, the difference between counterpoise-corrected and uncorrected interaction energies ranges from 0.1 to 0.2 kcal mol–1 for CCSD(T)/CBS[aQ5Z;δ:aDZ] and 0.0 to 0.1 kcal mol–1 for CCSD(T)/CBS[aQ5Z;δ:aTZ] for WC-type base pairs. This suggests that any remaining basis set superposition error in our best estimates is rather small. We may also compare our best total interaction energy values at the CCSD(T)/CBS limit to the CCSD(T)/CBS[aTQZ;δ:DZ] results reported by Jurečka and co-workers18 for the A:T and G:C base pairs. Although their methodology appears similar, it neglects diffuse functions in the δMP2CCSD(T) correction, and the interaction energies of Jurečka et al. are underbound by 0.5–0.9 kcal mol–1 relative to our best estimates with larger basis sets.
Table 1. Total Interaction Energy (in kcal mol–1) of WC-Type and HG-Type Nucleic Acid Base Pairs.
| Method | WC A:T | WC G:C | WC B:S | WC P:Z | HG A:T | HG G:C+ | HG B:S | HG P:Z |
|---|---|---|---|---|---|---|---|---|
| CCSD(T)/CBS[aQ5Z;δ:aTZ] | –17.41 | –33.03 | –39.19 | –33.36 | –18.21 | –48.34 | –21.86 | –19.56 |
| CCSD(T)/CBS[aTQZ;δ:aTZ] | –17.40 | –33.01 | –39.17 | –33.34 | –18.20 | –48.32 | –21.85 | –19.55 |
| SAPT2+(3)(CCD)δMP2/aTZ | –16.87 | –32.82 | –39.30 | –33.11 | –17.63 | –48.22 | –22.00 | –19.59 |
| CCSD(T)/CBS[aTQZ;δ:DZ]a | –16.9 | –32.1 | | | | | | |
| DFT-SAPT/aTZb | –15.2 | –29.8 | | | | | | |

a From Jurečka and co-workers (ref 18). b From Hesselmann et al. (ref 20).
Results from our highest-level SAPT method, SAPT2+(3)(CCD)δMP2/aTZ, are quite close to our best CCSD(T)/CBS estimates. The range of differences between SAPT2+(3)(CCD)δMP2/aTZ and CCSD(T)/CBS[aQ5Z;δ:aTZ] is from −0.11 to 0.54 kcal mol–1 for WC-type natural and non-natural base pairs. This indicates that our highest level of SAPT should be quite suitable for analyzing the various components of the interaction energy; we present this analysis below. In Tables S3–S4 and Figures S4–S11, we present a comparison between different levels of SAPT with aTZ and aDZ basis sets. Changes of a few kcal mol–1 in the interaction energies are found as we add more terms in SAPT. The modest changes in interaction energy when the basis set increases from aDZ to aTZ suggest that the SAPT results using the aTZ basis are fairly well converged with respect to the basis set. Finally, Table 1 also provides a comparison to the DFT-SAPT/aTZ results of Hesselmann et al.20 for the A:T and G:C base pairs. Their results are underbound by 2.2 (A:T) and 3.2 (G:C) kcal mol–1 compared to our highest-level CCSD(T)/CBS results, for errors of about 10%–13%. SAPT2+(3)(CCD)δMP2/aTZ reduces the underbinding to 0.5 and 0.2 kcal mol–1, respectively (or about 1%–3%), albeit at a significantly increased computational cost compared to DFT-SAPT.
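The deviations quoted in this paragraph follow directly from the Table 1 entries. The short script below reproduces them for the WC-type pairs; the dictionary layout and variable names are illustrative choices, while the numbers are copied from Table 1.

```python
# Table 1 values (kcal/mol) for the WC-type base pairs.
TABLE1_WC = {
    "A:T": {"ccsdt_cbs": -17.41, "sapt": -16.87},
    "G:C": {"ccsdt_cbs": -33.03, "sapt": -32.82},
    "B:S": {"ccsdt_cbs": -39.19, "sapt": -39.30},
    "P:Z": {"ccsdt_cbs": -33.36, "sapt": -33.11},
}
# DFT-SAPT/aTZ literature values listed in Table 1 (A:T and G:C only).
DFT_SAPT = {"A:T": -15.2, "G:C": -29.8}

# Positive deviation = underbound relative to CCSD(T)/CBS[aQ5Z;d:aTZ].
sapt_dev = {
    pair: round(v["sapt"] - v["ccsdt_cbs"], 2) for pair, v in TABLE1_WC.items()
}
dft_sapt_dev = {
    pair: round(e - TABLE1_WC[pair]["ccsdt_cbs"], 2) for pair, e in DFT_SAPT.items()
}
# Relative underbinding of DFT-SAPT, in percent of the reference magnitude.
dft_sapt_pct = {
    pair: 100.0 * dev / abs(TABLE1_WC[pair]["ccsdt_cbs"])
    for pair, dev in dft_sapt_dev.items()
}

for pair in TABLE1_WC:
    print(f"{pair}: SAPT - CCSD(T)/CBS = {sapt_dev[pair]:+.2f} kcal/mol")
```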
The natural G:C base pair has nearly twice the interaction energy of the natural A:T base pair (Table 1). These interaction energies are determined primarily by the H-bonds formed between base pairs. We note that the G:C base pair has three H-bonds, whereas A:T has only two H-bonds and a long, weak C–H···O contact (Figure 1). For the non-natural base pairs, the B:S base pair has a somewhat more negative interaction energy than the P:Z base pair (by 5.8 kcal mol–1). Two of the H-bond lengths are very similar for B:S and P:Z, but one H-bond is somewhat weaker (lengthened by 0.072 Å) in the P:Z base pair (Figure 1). Comparing the H-bond lengths in G:C vs P:Z, which have very similar interaction energies, we see that the middle H-bond is about the same length in both complexes (shorter by 0.009 Å in P:Z), while the top H-bond in the figure lengthens by 0.145 Å in P:Z, and the bottom H-bond shortens by 0.184 Å. These bond length changes thus seem to approximately cancel out in the determination of the overall interaction energies of G:C vs P:Z. Indeed, the sum of the three closest-contact H-bond distances in each base pair (including the weak C–H···O contact in A:T) seems to correlate well with the total interaction energy: 6.374 Å and −17.4 kcal mol–1 (A:T), 5.436 Å and −33.0 kcal mol–1 (G:C), 5.388 Å and −33.4 kcal mol–1 (P:Z), and 5.291 Å and −39.2 kcal mol–1 (B:S).
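The correlation noted above can be quantified with a Pearson coefficient over the four (distance sum, interaction energy) pairs quoted in the text. This is only a rough check: with four points the coefficient is indicative at best, and the helper name `pearson_r` is our own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# (sum of three closest-contact distances in angstrom, IE in kcal/mol),
# values as quoted in the text for the WC-type geometries.
WC_PAIRS = {
    "A:T": (6.374, -17.4),
    "G:C": (5.436, -33.0),
    "P:Z": (5.388, -33.4),
    "B:S": (5.291, -39.2),
}

distance_sums = [d for d, _ in WC_PAIRS.values()]
energies = [e for _, e in WC_PAIRS.values()]
r = pearson_r(distance_sums, energies)
print(f"Pearson r = {r:.3f}")
```

A positive coefficient close to 1 here means that longer summed contacts go with less negative (weaker) interaction energies, which is the trend described in the text.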
SAPT Energy Component Analysis
The point of view that H-bonding is mainly stabilized by electrostatic contributions20 is supported by Figure 2, in the sense that electrostatics is the dominant attractive contribution. In addition, the electrostatic term varies the most between base pairs. However, note that the attractive electrostatic contributions are not enough to overcome the exchange-repulsion terms. Indeed, the sum of electrostatics and exchange-repulsion is repulsive, ranging from +4.3 to +7.4 kcal mol–1 across the A:T, G:C, B:S, and P:Z base pairs (detailed energy component data are provided in the Supporting Information Tables S5 and S6). Figure 2 shows that the induction and London dispersion components also contribute significantly to the total interaction energy values. While the induction and dispersion contributions are similar for the A:T base pair (within 2.0 kcal mol–1), for the other base pairs induction is significantly larger than dispersion, by 7.8 kcal mol–1 or more. Overall, all four components are very important for determining the interaction energy. For the B:S base pair, the electrostatic contribution is the strongest among the base pairs. Furthermore, we note that the electrostatic, exchange-repulsion, induction, and London dispersion components for the B:S base pair are stronger by at least −8.0, 7.7, −4.6, and −1.3 kcal mol–1, respectively, than in the A:T, G:C, and P:Z base pairs. SAPT interaction energy components tend to be stronger for intermolecular interactions with shorter contacts, and as noted above, the sum of the H-bond contact distances is the shortest (5.291 Å) for B:S. P:Z and G:C have energy components that are similar to each other, but smaller in magnitude than for B:S. This is consistent with the sum of H-bond contact distances being larger for these dimers than for B:S, but similar to each other (5.388 Å for P:Z and 5.436 Å for G:C). Finally, the SAPT energy components are the smallest in magnitude for A:T, which has only two real H-bonds plus a weak C–H···O contact (summing to 6.374 Å).
Figure 2.

Total noncovalent interaction energy (total IE) values and SAPT energy components (electrostatic, exchange, induction, dispersion) for the WC (A:T, G:C) and HM (B:S, P:Z) nucleic acid base pairs at the SAPT2+(3)(CCD)δMP2/aTZ level of theory.
Hoogsteen Natural and Hoogsteen-Type Non-Natural Hachimoji DNA Base Pairs
Next, we consider natural Hoogsteen (HG) base pair geometries, A(syn):T(anti) and G(syn):C+(anti), formed by rotating the purine base in the WC-type base pair 180° around the glycosidic bond torsion angle, so that it adopts a syn conformation (0° < χ < 90°, where χ is the glycosidic bond torsion angle as illustrated in Figure 3 and Figure S1) rather than an anti conformation (−180° < χ < −90°), followed by translation of the two bases by ∼2.0–2.5 Å, thus allowing the formation of a set of H-bonds (Figure 3 and Figure S1).55−58 In a similar way, we create HG-type HM base pairs to form B(syn):S(anti) and P(syn):Z(anti). Optimized structures of the natural HG base pairs and the non-natural P:Z HG geometry are found to be planar, while the non-natural B:S HG geometry becomes nonplanar. We suspect that unfavorable H···H intermolecular interactions [H from B(syn) and NH2 from S(anti), Figure 3] might be responsible for the nonplanar geometry of the HG B:S base pair.
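The syn/anti ranges quoted above translate into a simple classifier for a glycosidic torsion angle. The function name and the "other" label for angles outside both quoted ranges are our own assumptions; the two ranges are taken from the text.

```python
def glycosidic_conformation(chi_deg):
    """Classify a glycosidic torsion angle (degrees, in (-180, 180]).

    Uses the ranges quoted in the text: syn for 0 < chi < 90 and
    anti for -180 < chi < -90. Anything else is reported as "other".
    """
    if 0.0 < chi_deg < 90.0:
        return "syn"
    if -180.0 < chi_deg < -90.0:
        return "anti"
    return "other"


# Illustrative angles only, not values measured from the optimized structures.
for chi in (45.0, -120.0, 120.0):
    print(chi, glycosidic_conformation(chi))
```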
Figure 3.
Optimized structures of the HG (A:T, G:C+) and HG-type HM (B:S, P:Z) nucleic acid base pairs at the B3LYP-D3(BJ)/aDZ level of theory. The dotted black lines mark the H-bonds, labeled with their distances (Å), for each base pair.
The interaction energy results at the equilibrium geometries of the HG-type base pairs are tabulated in Table 1 and Table S1. Additional comparison tables of interaction energy results for CCSD(T)/CBS with and without counterpoise correction and with different SAPT approaches and basis sets are also presented in the Supporting Information. As indicated in Table 1 and Tables S1 and S2, the interaction energies of the HG-type base pairs are in excellent agreement with each other as we vary the basis sets used in the CCSD(T)/CBS estimates, or when we compare those values to our highest-level SAPT values (within a few tenths of one kcal mol–1). This is consistent with our findings for the WC geometries. Herein, we mainly discuss the results obtained using the highest-level CCSD(T)/CBS[aQ5Z;δ:aTZ] focal-point approach and the SAPT2+(3)(CCD)δMP2 approach.
From Table 1, we note that the HG-type (A:T, G:C+) base pairs are more stabilized (have more negative interaction energies) than the WC-type base pairs. The interaction energy differences at our best level of theory, CCSD(T)/CBS[aQ5Z;δ:aTZ], are −0.8 kcal mol–1 for A:T and −15.3 kcal mol–1 for G:C+. For the Hoogsteen G:C+ base pair, the N–H···O contact distance is slightly shorter, and the N–H···N distance is significantly shorter than in the WC geometry. However, the final N–H···O H-bond is replaced with a weak C–H···O contact (2.802 Å), which one might expect to lead to a much weaker overall interaction energy in the HG geometry. Instead, the interaction is much stronger in the HG geometry because of the protonation of C, leading to strong ion–dipole interactions. This explanation is confirmed by the SAPT analysis below.
The H-bond contact distances for A:T are similar in the HG and WC geometries, consistent with the similar binding energies. The slightly stronger interaction of the A:T base pair in the HG geometry than in the WC geometry is consistent with the experimental infrared-spectra analysis of the A:T base pair in a nonpolar solvent.59−61 This result of a stronger interaction in the HG geometry is also consistent with previous quantum chemical studies by Hobza et al.62−65
In contrast to the natural base pairs, the non-natural Hachimoji (B:S and P:Z) base pairs are energetically less stable in their HG-type geometries than in their WC-type geometries. As mentioned above, for the HG-type B:S base pair, we observed a nonplanar tilted configuration. The significantly weaker interaction in this HG geometry (−21.9 kcal mol–1) vs the WC geometry (−39.2 kcal mol–1) is consistent with elongated intermolecular H-bond contact distances. For the HG-type P:Z base pair, the weaker interaction energy in the HG geometry (−19.6 kcal mol–1) vs the WC-type geometry (−33.4 kcal mol–1) is again consistent with longer intermolecular H-bond contacts. Although one N–H···O contact is medium length (1.815 Å), the other N–H···O contact is long (2.050 Å), and the remaining weak C–H···O H-bond is quite long (2.260 Å). Also note that both of the N–H···O contacts deviate significantly from the linear arrangement of the three atoms that would be optimal for a strong H-bond. Nevertheless, as in the case of the WC geometries, we continue to see a correlation between the sum of the closest-contact H-bond distances and the interaction energy, at least for A:T and P:Z: 6.389 Å and −18.2 kcal mol–1 for A:T and 6.125 Å and −19.6 kcal mol–1 for P:Z. G:C+ deviates from this trend (6.164 Å and −48.3 kcal mol–1 due to the aforementioned ion–dipole interactions), and B:S has only two significant H-bond-like contacts.
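The HG-versus-WC energy differences discussed in this section can be recomputed from the CCSD(T)/CBS[aQ5Z;δ:aTZ] row of Table 1. The data layout and names below are illustrative; note that HG G:C+ is compared against neutral WC G:C, as in the text.

```python
# CCSD(T)/CBS[aQ5Z;d:aTZ] interaction energies from Table 1 (kcal/mol).
WC = {"A:T": -17.41, "G:C": -33.03, "B:S": -39.19, "P:Z": -33.36}
HG = {"A:T": -18.21, "G:C+": -48.34, "B:S": -21.86, "P:Z": -19.56}

# Map each HG-type pair to the WC-type pair it is compared with.
HG_TO_WC = {"A:T": "A:T", "G:C+": "G:C", "B:S": "B:S", "P:Z": "P:Z"}

# Negative value = HG geometry binds more strongly than WC geometry.
hg_minus_wc = {
    hg: round(HG[hg] - WC[wc], 2) for hg, wc in HG_TO_WC.items()
}

for pair, diff in hg_minus_wc.items():
    trend = "HG stronger" if diff < 0 else "WC stronger"
    print(f"{pair}: IE(HG) - IE(WC) = {diff:+.2f} kcal/mol ({trend})")
```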
SAPT Energy Component Analysis
We analyzed the SAPT2+(3)(CCD)δMP2/aTZ energy components for the HG-type DNA base pairs (Figure 4). The largest energy component in Figure 4 is the electrostatic interaction between G and C+ (nearly twice as large as in B:S and P:Z). This is due to the very strong ion–neutral interaction in this base pair, as asserted above. We also note an unusually strong induction/polarization term for this base pair, as the ion–induced dipole contribution will be very large compared to dipole–induced dipole terms for the other base pairs. These strongly attractive contributions pull the bases close together (leading to two short H-bond contact distances noted above), thus also leading to an increase in the exchange-repulsion energy.
Figure 4.

Total noncovalent interaction energy (total IE) values and SAPT components (electrostatic, exchange, induction, dispersion) for the HG (A:T, G:C+) and HG-type HM (B:S, P:Z) nucleic acid base pairs at the SAPT2+(3)(CCD)δMP2/aTZ level of theory.
Looking more generally at Figure 4, it is interesting to note that the exchange-repulsion term is approximately the same size as the attractive electrostatics term (thus largely canceling it out), except for A:T, where it is slightly larger (by 6.5 kcal mol–1). For the WC geometries, exchange-repulsion was larger in magnitude than electrostatics for every base pair. This could be due to generally shorter H-bond distances in the WC geometries. Like in the WC base pairs, induction and dispersion components contribute significantly to the total interaction energies, with induction dominating over dispersion in all four base pairs, although the difference is <2.0 kcal mol–1, except for the G:C+ base pair where the difference is 16.8 kcal mol–1.
Conclusions
Noncovalent forces play a critical role in determining biomolecular structure, including the structures of DNA and RNA. Here, we have presented a detailed, high-level theoretical examination of H-bonding in both natural (A:T and G:C) and non-natural Hachimoji (B:S and P:Z) DNA base pairs, in their standard Watson–Crick (WC) type geometries, and also in Hoogsteen (HG) type geometries. Interestingly, the non-natural Hachimoji-type base pairs interact more strongly than the corresponding natural base pairs, by −21.8 (B:S) and −0.3 kcal mol–1 (P:Z) in the WC geometries, according to coupled-cluster theory [CCSD(T)] in the complete-basis-set limit. The stronger interaction in B:S vs A:T is due to the formation of a third H-bond in B:S.
In addition, the natural base pairs are energetically more stabilized in their HG-type geometries than in their WC-type geometries: A:T by a minor −0.8 and G:C+ by −15.3 kcal mol–1. The stronger interaction of A:T in the HG-type geometry than in the WC-type geometry is consistent with the experimental infrared-spectra analysis of A:T dimers in a nonpolar solvent.59−61 By contrast, HG-type geometries are substantially less stable than WC-type geometries for non-natural Hachimoji DNA base pairs, B:S by 17.3 and P:Z by 13.8 kcal mol–1. We also noted a general correlation between the total interaction energy and the sum of the closest-contact H-bond distances (including the weak C–H···O contact in A:T and in the HG geometry of P:Z).
We have applied high-order quantum mechanical energy component analysis, using symmetry-adapted perturbation theory (SAPT), to understand the base pair interaction energies in terms of the physically meaningful components of electrostatics, exchange-repulsion, induction/polarization, and London dispersion. SAPT analysis confirmed that electrostatic interactions are the most attractive in base pair interactions, consistent with conventional wisdom about the nature of H-bonding interactions. However, the sum of induction and dispersion is nearly as large as electrostatics for both natural and non-natural base pairs, and for WC- and HG-type geometries. The magnitude of the SAPT components also tended to correlate with the sum of the closest-contact H-bond-type interactions. However, all SAPT terms are much larger in magnitude in the Hoogsteen G:C+ base pair than in any other base pair considered. This is consistent with very strong ion–dipole interactions in this complex which are absent in the other complexes, and it explains why the interaction energy is much stronger for this base pair even though it has only two true H-bonds.
The present interaction energies are expected to be the most accurate theoretical values reported to date for these important prototype systems, including for the newly proposed structures of Hoogsteen-type geometries for the non-natural Hachimoji base pairs. As such, the interaction energies may serve as valuable reference data. We also hope that the detailed energetic analysis of the interaction energy components, and how they relate to structure, may aid future studies of H-bonding between nucleobases and the design of additional non-natural nucleobases.
Acknowledgments
The authors gratefully acknowledge financial support from the U.S. National Science Foundation through Grant CHE-1955940. We thank Dr. Lori A. Burns, Dr. Derek P. Metcalf, and Dr. Zachary L. Glick for helpful discussions.
Data Availability Statement
The Supporting Information includes Cartesian coordinates for all geometries considered in this work. Most computations were performed using the open-source Psi4 quantum chemistry program, version 1.5, freely available from https://psicode.org. As noted in the Theoretical Methods section, geometry optimizations using B3LYP-D3(BJ) were performed using the Q-Chem code, version 5.1, available from https://q-chem.com.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.3c00428.
Scheme for torsion angles, MBIS atomic charge analysis, tables for comparison of aDZ/aTZ and CP/no-CP interaction energies, tables with interaction energies at different levels of SAPT and basis sets, SAPT interaction energy components, figures for the difference in total interaction energy values at some lower levels of SAPT for WC-, HM-, and HG-type natural and non-natural base pairs at different basis sets.(PDF)
Optimized geometries (xyz) for all base pairs at B3LYP-D3(BJ)/aDZ level of theory (ZIP)
The authors declare no competing financial interest.
References
- Černý J.; Hobza P. Non-covalent interactions in biomacromolecules. Phys. Chem. Chem. Phys. 2007, 9, 5291–5303. DOI: 10.1039/b704781a.
- Sherrill C. D. Computations of noncovalent π interactions. Rev. Comput. Chem. 2009, 26, 1–38. DOI: 10.1002/9780470399545.ch1.
- Watson J. D.; Crick F. H. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature 1953, 171, 737–738. DOI: 10.1038/171737a0.
- Hoogsteen K. The crystal and molecular structure of a hydrogen-bonded complex between 1-methylthymine and 9-methyladenine. Acta Crystallogr. 1963, 16, 907–916. DOI: 10.1107/S0365110X63002437.
- Malyshev D. A.; Dhami K.; Lavergne T.; Chen T.; Dai N.; Foster J. M.; Corrêa I. R.; Romesberg F. E. A semi-synthetic organism with an expanded genetic alphabet. Nature 2014, 509, 385–388. DOI: 10.1038/nature13314.
- Hoshika S.; Leal N. A.; Kim M.-J.; Kim M.-S.; Karalkar N. B.; Kim H.-J.; Bates A. M.; Watkins N. E.; SantaLucia H. A.; Meyer A. J.; DasGupta S.; Piccirilli J. A.; Ellington A. D.; SantaLucia J.; Georgiadis M. M.; Benner S. A. Hachimoji DNA and RNA: A genetic system with eight building blocks. Science 2019, 363, 884–887. DOI: 10.1126/science.aat0971.
- Kumawat R. L.; Pathak B. Conductance and tunnelling current characteristics for individual identification of synthetic nucleic acids with a graphene device. Phys. Chem. Chem. Phys. 2022, 24, 15756–15766. DOI: 10.1039/D2CP01255C.
- Wang X.; Hoshika S.; Peterson R. J.; Kim M.-J.; Benner S. A.; Kahn J. D. Biophysics of artificially expanded genetic information systems. Thermodynamics of DNA duplexes containing matches and mismatches involving 2-amino-3-nitropyridin-6-one (Z) and imidazo [1, 2-a]-1, 3, 5-triazin-4 (8H) one (P). ACS Synth. Biol. 2017, 6, 782–792. DOI: 10.1021/acssynbio.6b00224.
- Molt R. W. Jr; Georgiadis M. M.; Richards N. G. Consecutive non-natural PZ nucleobase pairs in DNA impact helical structure as seen in 50 μs molecular dynamics simulations. Nucleic Acids Res. 2017, 45, 3643–3653. DOI: 10.1093/nar/gkx144.
- Šponer J.; Riley K. E.; Hobza P. Nature and magnitude of aromatic stacking of nucleic acid bases. Phys. Chem. Chem. Phys. 2008, 10, 2595–2610. DOI: 10.1039/b719370j.
- Parker T. M.; Hohenstein E. G.; Parrish R. M.; Hud N. V.; Sherrill C. D. Quantum-mechanical analysis of the energetic contributions to π stacking in nucleic acids versus rise, twist, and slide. J. Am. Chem. Soc. 2013, 135, 1306–1316. DOI: 10.1021/ja3063309.
- Yakovchuk P.; Protozanova E.; Frank-Kamenetskii M. D. Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. Nucleic Acids Res. 2006, 34, 564–574. DOI: 10.1093/nar/gkj454.
- Rezac J.; Hobza P. On the nature of DNA-duplex stability. Chem. Eur. J. 2007, 13, 2983–2989. DOI: 10.1002/chem.200601120.
- Cerny J.; Kabelác M.; Hobza P. Double-helical→ ladder structural transition in the B-DNA is induced by a loss of dispersion energy. J. Am. Chem. Soc. 2008, 130, 16055–16059. DOI: 10.1021/ja805428q.
- Thanthiriwatte K. S.; Hohenstein E. G.; Burns L. A.; Sherrill C. D. Assessment of the performance of DFT and DFT-D methods for describing distance dependence of hydrogen-bonded interactions. J. Chem. Theory Comput. 2011, 7, 88–96. DOI: 10.1021/ct100469b.
- Jurečka P.; Hobza P. True stabilization energies for the optimal planar hydrogen-bonded and stacked structures of guanine ··· cytosine, adenine ··· thymine, and their 9-and 1-methyl derivatives: complete basis set calculations at the MP2 and CCSD (T) levels and comparison with experiment. J. Am. Chem. Soc. 2003, 125, 15608–15613. DOI: 10.1021/ja036611j.
- Šponer J.; Jurečka P.; Hobza P. Accurate interaction energies of hydrogen-bonded nucleic acid base pairs. J. Am. Chem. Soc. 2004, 126, 10142–10151. 10.1021/ja048436s. [DOI] [PubMed] [Google Scholar]
- Jurečka P.; Šponer J.; Černỳ J.; Hobza P. Benchmark database of accurate (MP2 and CCSD (T) complete basis set limit) interaction energies of small model complexes, DNA base pairs, and amino acid pairs. Phys. Chem. Chem. Phys. 2006, 8, 1985–1993. 10.1039/B600027D. [DOI] [PubMed] [Google Scholar]
- Raghavachari K.; Trucks G. W.; Pople J. A.; Head-Gordon M. A 5th-Order Perturbation Comparison of Electron Correlation Theories. Chem. Phys. Lett. 1989, 157, 479–483. 10.1016/S0009-2614(89)87395-6. [DOI] [Google Scholar]
- Hesselmann A.; Jansen G.; Schütz M. Interaction energy contributions of H-bonded and stacked structures of the AT and GC DNA base pairs from the combined density functional theory and intermolecular perturbation theory approach. J. Am. Chem. Soc. 2006, 128, 11730–11731. 10.1021/ja0633363. [DOI] [PubMed] [Google Scholar]
- Hesselmann A.; Jansen G.; Schuetz M. Density-functional theory-symmetry-adapted intermolecular perturbation theory with density fitting: A new efficient method to study intermolecular interaction energies. J. Chem. Phys. 2005, 122, 014103. 10.1063/1.1824898. [DOI] [PubMed] [Google Scholar]
- Misquitta A. J.; Podeszwa R.; Jeziorski B.; Szalewicz K. Intermolecular potentials based on symmetry-adapted perturbation theory with dispersion energies from time-dependent density-functional calculations. J. Chem. Phys. 2005, 123, 214103. 10.1063/1.2135288. [DOI] [PubMed] [Google Scholar]
- Fonseca Guerra C.; Bickelhaupt F. M. Orbital interactions in strong and weak hydrogen bonds are essential for DNA replication. Angew. Chem., Int. Ed. 2002, 41, 2092–2095. . [DOI] [PubMed] [Google Scholar]
- Stephens P. J.; Devlin F. J.; Chabalowski C. F.; Frisch M. J. Ab initio calculation of vibrational absorption and circular dichroism spectra using density functional force fields. J. Phys. Chem. 1994, 98, 11623–11627. 10.1021/j100096a001. [DOI] [Google Scholar]
- Grimme S.; Ehrlich S.; Goerigk L. Effect of the damping function in dispersion corrected density functional theory. J. Comput. Chem. 2011, 32, 1456–1465. 10.1002/jcc.21759. [DOI] [PubMed] [Google Scholar]
- Becke A. D. A new mixing of Hartree-Fock and local density-functional theories. J. Chem. Phys. 1993, 98, 1372–1377. 10.1063/1.464304. [DOI] [Google Scholar]
- Kendall R. A.; Dunning T. H. Jr; Harrison R. J. Electron affinities of the first-row atoms revisited. Systematic basis sets and wave functions. J. Chem. Phys. 1992, 96, 6796–6806. 10.1063/1.462569. [DOI] [Google Scholar]
- Shao Y.; Gan Z.; Epifanovsky E.; Gilbert A. T.B.; Wormit M.; Kussmann J.; Lange A. W.; Behn A.; Deng J.; Feng X.; Ghosh D.; Goldey M.; Horn P. R.; Jacobson L. D.; Kaliman I.; Khaliullin R. Z.; Kus T.; Landau A.; Liu J.; Proynov E. I.; Rhee Y. M.; Richard R. M.; Rohrdanz M. A.; Steele R. P.; Sundstrom E. J.; Woodcock H. L.; Zimmerman P. M.; Zuev D.; Albrecht B.; Alguire E.; Austin B.; Beran G. J. O.; Bernard Y. A.; Berquist E.; Brandhorst K.; Bravaya K. B.; Brown S. T.; Casanova D.; Chang C.-M.; Chen Y.; Chien S. H.; Closser K. D.; Crittenden D. L.; Diedenhofen M.; DiStasio R. A.; Do H.; Dutoi A. D.; Edgar R. G.; Fatehi S.; Fusti-Molnar L.; Ghysels A.; Golubeva-Zadorozhnaya A.; Gomes J.; Hanson-Heine M. W.D.; Harbach P. H.P.; Hauser A. W.; Hohenstein E. G.; Holden Z. C.; Jagau T.-C.; Ji H.; Kaduk B.; Khistyaev K.; Kim J.; Kim J.; King R. A.; Klunzinger P.; Kosenkov D.; Kowalczyk T.; Krauter C. M.; Lao K. U.; Laurent A. D.; Lawler K. V.; Levchenko S. V.; Lin C. Y.; Liu F.; Livshits E.; Lochan R. C.; Luenser A.; Manohar P.; Manzer S. F.; Mao S.-P.; Mardirossian N.; Marenich A. V.; Maurer S. A.; Mayhall N. J.; Neuscamman E.; Oana C. M.; Olivares-Amaya R.; O’Neill D. P.; Parkhill J. A.; Perrine T. M.; Peverati R.; Prociuk A.; Rehn D. R.; Rosta E.; Russ N. J.; Sharada S. M.; Sharma S.; Small D. W.; Sodt A.; Stein T.; Stuck D.; Su Y.-C.; Thom A. J.W.; Tsuchimochi T.; Vanovschi V.; Vogt L.; Vydrov O.; Wang T.; Watson M. A.; Wenzel J.; White A.; Williams C. F.; Yang J.; Yeganeh S.; Yost S. R.; You Z.-Q.; Zhang I. Y.; Zhang X.; Zhao Y.; Brooks B. R.; Chan G. K.L.; Chipman D. M.; Cramer C. J.; Goddard W. A.; Gordon M. S.; Hehre W. J.; Klamt A.; Schaefer H. F.; Schmidt M. W.; Sherrill C. D.; Truhlar D. G.; Warshel A.; Xu X.; Aspuru-Guzik A.; Baer R.; Bell A. T.; Besley N. A.; Chai J.-D.; Dreuw A.; Dunietz B. D.; Furlani T. R.; Gwaltney S. R.; Hsu C.-P.; Jung Y.; Kong J.; Lambrecht D. S.; Liang W.; Ochsenfeld C.; Rassolov V. A.; Slipchenko L. V.; Subotnik J. 
E.; Van Voorhis T.; Herbert J. M.; Krylov A. I.; Gill P. M.W.; Head-Gordon M. Advances in molecular quantum chemistry contained in the Q-Chem 4 program package. Mol. Phys. 2015, 113, 184–215. 10.1080/00268976.2014.952696. [DOI] [Google Scholar]
- Sirianni D. A.; Zhu X.; Sitkoff D. F.; Cheney D. L.; Sherrill C. D. The influence of a solvent environment on direct non-covalent interactions between two molecules: A symmetry-adapted perturbation theory study of polarization tuning of π–π interactions by water. J. Chem. Phys. 2022, 156, 194306. 10.1063/5.0087302. [DOI] [PubMed] [Google Scholar]
- Parker T. M.; Burns L. A.; Parrish R. M.; Ryno A. G.; Sherrill C. D. Levels of symmetry adapted perturbation theory (SAPT). I. Efficiency and performance for interaction energies. J. Chem. Phys. 2014, 140, 094106. 10.1063/1.4867135. [DOI] [PubMed] [Google Scholar]
- East A. L.; Allen W. D. The heat of formation of NCO. Chem. Phys. 1993, 99, 4638–4650. 10.1063/1.466062. [DOI] [Google Scholar]
- Csaszar A. G.; Allen W. D.; Schaefer H. F. III In pursuit of the ab initio limit for conformational energy prototypes. Chem. Phys. 1998, 108, 9751–9764. 10.1063/1.476449. [DOI] [Google Scholar]
- Halkier A.; Helgaker T.; Jørgensen P.; Klopper W.; Koch H.; Olsen J.; Wilson A. K. Basis-set convergence in correlated calculations on Ne, N2, and H2O. Chem. Phys. Lett. 1998, 286, 243–252. 10.1016/S0009-2614(98)00111-0. [DOI] [Google Scholar]
- Marshall M. S.; Burns L. A.; Sherrill C. D. Basis set convergence of the coupled-cluster correction, δMP2CCSD(T): Best practices for benchmarking non-covalent interactions and the attendant revision of the S22, NBC10, HBC6, and HSG databases. J. Chem. Phys. 2011, 135, 194102. 10.1063/1.3659142. [DOI] [PubMed] [Google Scholar]
- Burns L. A.; Mayagoitia Á. V.; Sumpter B. G.; Sherrill C. D. Density-functional approaches to noncovalent interactions: A comparison of dispersion corrections (DFT-D), exchange-hole dipole moment (XDM) theory, and specialized functionals. J. Chem. Phys. 2011, 134, 084107. 10.1063/1.3545971. [DOI] [PubMed] [Google Scholar]
- Burns L. A.; Marshall M. S.; Sherrill C. D. Appointing silver and bronze standards for noncovalent interactions: A comparison of spin-component-scaled (SCS), explicitly correlated (F12), and specialized wavefunction approaches. J. Chem. Phys. 2014, 141, 234111. 10.1063/1.4903765. [DOI] [PubMed] [Google Scholar]
- Koch H.; Fernández B.; Christiansen O. The benzene–argon complex: a ground and excited state ab initio study. J. Chem. Phys. 1998, 108, 2784–2790. 10.1063/1.475669. [DOI] [Google Scholar]
- Tsuzuki S.; Honda K.; Uchimaru T.; Mikami M.; Tanabe K. Origin of attraction and directionality of the π/π interaction: model chemistry calculations of benzene dimer interaction. J. Am. Chem. Soc. 2002, 124, 104–112. 10.1021/ja0105212. [DOI] [PubMed] [Google Scholar]
- Sinnokrot M. O.; Valeev E. F.; Sherrill C. D. Estimates of the ab initio limit for π- π interactions: The benzene dimer. J. Am. Chem. Soc. 2002, 124, 10887–10893. 10.1021/ja025896h. [DOI] [PubMed] [Google Scholar]
- Jurečka P.; Hobza P. On the convergence of the (ΔECCSD(T) - ΔEMP2) term for complexes with multiple H-bonds. Chem. Phys. Lett. 2002, 365, 89–94. 10.1016/S0009-2614(02)01423-9. [DOI] [Google Scholar]
- Hobza P.; Šponer J. Toward true DNA base-stacking energies: MP2, CCSD(T), and complete basis set calculations. J. Am. Chem. Soc. 2002, 124, 11802–11808. 10.1021/ja026759n. [DOI] [PubMed] [Google Scholar]
- Sinnokrot M. O.; Sherrill C. D. High-accuracy quantum mechanical studies of π- π interactions in benzene dimers. J. Phys. Chem. A 2006, 110, 10656–10668. 10.1021/jp0610416. [DOI] [PubMed] [Google Scholar]
- Boese A. D.; Martin J. M.; Klopper W. Basis set limit coupled cluster study of H-bonded systems and assessment of more approximate methods. J. Phys. Chem. A 2007, 111, 11122–11133. 10.1021/jp072431a. [DOI] [PubMed] [Google Scholar]
- Janowski T.; Pulay P. High accuracy benchmark calculations on the benzene dimer potential energy surface. Chem. Phys. Lett. 2007, 447, 27–32. 10.1016/j.cplett.2007.09.003. [DOI] [Google Scholar]
- Pitonak M.; Janowski T.; Neogrády P.; Pulay P.; Hobza P. Convergence of the CCSD (T) correction term for the stacked complex methyl adenine- methyl thymine: comparison with lower-cost alternatives. J. Chem. Theory Comput. 2009, 5, 1761–1766. 10.1021/ct900126q. [DOI] [PubMed] [Google Scholar]
- Takatani T.; Hohenstein E. G.; Malagoli M.; Marshall M. S.; Sherrill C. D. Basis set consistent revision of the S22 test set of noncovalent interaction energies. J. Chem. Phys. 2010, 132, 144104. 10.1063/1.3378024. [DOI] [PubMed] [Google Scholar]
- Faver J. C.; Benson M. L.; He X.; Roberts B. P.; Wang B.; Marshall M. S.; Kennedy M. R.; Sherrill C. D.; Merz K. M. Jr Formal estimation of errors in computed absolute interaction energies of protein- ligand complexes. J. Chem. Theory Comput. 2011, 7, 790–797. 10.1021/ct100563b. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Carrell E. J.; Thorne C. M.; Tschumper G. S. Basis set dependence of higher-order correlation effects in π-type interactions. J. Chem. Phys. 2012, 136, 014103. 10.1063/1.3671950. [DOI] [PubMed] [Google Scholar]
- Sirianni D. A.; Alenaizan A.; Cheney D. L.; Sherrill C. D. Assessment of density functional methods for geometry optimization of bimolecular van der Waals complexes. J. Chem. Theory Comput. 2018, 14, 3004–3013. 10.1021/acs.jctc.8b00114. [DOI] [PubMed] [Google Scholar]
- Boys S. F.; Bernardi F. The calculation of small molecular interactions by the differences of separate total energies. Some procedures with reduced errors. Mol. Phys. 1970, 19, 553–566. 10.1080/00268977000101561. [DOI] [Google Scholar]
- Feller D. Application of systematic sequences of wave functions to the water dimer. J. Chem. Phys. 1992, 96, 6104–6114. 10.1063/1.462652. [DOI] [Google Scholar]
- Burns L. A.; Marshall M. S.; Sherrill C. D. Comparing counterpoise-corrected, uncorrected, and averaged binding energies for benchmarking noncovalent interactions. J. Chem. Theory Comput. 2014, 10, 49–57. 10.1021/ct400149j. [DOI] [PubMed] [Google Scholar]
- Parrish R. M.; Burns L. A.; Smith D. G. A.; Simmonett A. C.; DePrince A. E.; Hohenstein E. G.; Bozkaya U.; Sokolov A. Y.; Di Remigio R.; Richard R. M.; Gonthier J. F.; James A. M.; McAlexander H. R.; Kumar A.; Saitow M.; Wang X.; Pritchard B. P.; Verma P.; Schaefer H. F.; Patkowski K.; King R. A.; Valeev E. F.; Evangelista F. A.; Turney J. M.; Crawford T. D.; Sherrill C. D. Psi4 1.1: An open-source electronic structure program emphasizing automation, advanced libraries, and interoperability. J. Chem. Theory Comput. 2017, 13, 3185–3197. 10.1021/acs.jctc.7b00174. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Smith D. G. A.; Burns L. A.; Simmonett A. C.; Parrish R. M.; Schieber M. C.; Galvelis R.; Kraus P.; Kruse H.; Di Remigio R.; Alenaizan A.; James A. M.; Lehtola S.; Misiewicz J. P.; Scheurer M.; Shaw R. A.; Schriber J. B.; Xie Y.; Glick Z. L.; Sirianni D. A.; O’Brien J. S.; Waldrop J. M.; Kumar A.; Hohenstein E. G.; Pritchard B. P.; Brooks B. R.; Schaefer H. F.; Sokolov A. Y.; Patkowski K.; DePrince A. E.; Bozkaya U.; King R. A.; Evangelista F. A.; Turney J. M.; Crawford T. D.; Sherrill C. D. PSI4 1.4: Open-source software for high-throughput quantum chemistry. J. Chem. Phys. 2020, 152, 184108. 10.1063/5.0006002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhou H.; Kimsey I. J.; Nikolova E. N.; Sathyamoorthy B.; Grazioli G.; McSally J.; Bai T.; Wunderlich C. H.; Kreutz C.; Andricioaei I.; Al-Hashimi H. M. m1A and m1G disrupt A-RNA structure through the intrinsic instability of Hoogsteen base pairs. Nat. Struct. Mol. Biol. 2016, 23, 803–810. 10.1038/nsmb.3270. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rangadurai A.; Zhou H.; Merriman D. K.; Meiser N.; Liu B.; Shi H.; Szymanski E. S.; Al-Hashimi H. M. Why are Hoogsteen base pairs energetically disfavored in A-RNA compared to B-DNA?. Nucleic Acids Res. 2018, 46, 11099–11114. 10.1093/nar/gky885. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Takahashi S.; Sugimoto N. Watson-Crick versus Hoogsteen Base Pairs: Chemical Strategy to Encode and Express Genetic Information in Life. Acc. Chem. Res. 2021, 54, 2110–2120. 10.1021/acs.accounts.0c00734. [DOI] [PubMed] [Google Scholar]
- Hoogsteen K. The structure of crystals containing a hydrogen-bonded complex of 1-methylthymine and 9-methyladenine. Acta Crystallogr. 1959, 12, 822–823. 10.1107/S0365110X59002389. [DOI] [Google Scholar]
- Kyogoku Y.; Higuchi S.; Tsuboi M. Intra-red absorption spectra of the single crystals of 1-methyl-thymine, 9-methyladenine and their 1:1 complex. Spectrochimica Acta Part A: Molecular Spectroscopy 1967, 23, 969–983. 10.1016/0584-8539(67)80022-9. [DOI] [Google Scholar]
- Kyogoku Y.; Lord R.; Rich A. Hydrogen bonding specificity of nucleic acid purines and pyrimidines in solution. Science 1966, 154, 518–520. 10.1126/science.154.3748.518. [DOI] [PubMed] [Google Scholar]
- Kyogoku Y.; Lord R.; Rich A. An infrared study of the hydrogen-bonding specificity of hypoxanthine and other nucleic acid derivatives. Biochim. Biophys. Acta, Nucleic Acids Protein Synth. 1969, 179, 10–17. 10.1016/0005-2787(69)90116-6. [DOI] [PubMed] [Google Scholar]
- Šponer J.; Sabat M.; Burda J. V.; Leszczynski J.; Hobza P. Interaction of the adenine- thymine Watson- Crick and adenine- adenine reverse-Hoogsteen DNA base pairs with hydrated group IIa (Mg2+, Ca2+, Sr2+, Ba2+) and IIb (Zn2+, Cd2+, Hg2+) metal cations: absence of the base pair stabilization by metal-induced polarization effects. J. Phys. Chem. B 1999, 103, 2528–2534. 10.1021/jp983744w. [DOI] [Google Scholar]
- Šponer J.; Leszczyński J.; Hobza P. Nature of nucleic acid- base stacking: nonempirical ab initio and empirical potential characterization of 10 stacked base dimers. Comparison of stacked and H-bonded base pairs. J. Phys. Chem. A 1996, 100, 5590–5596. 10.1021/jp953306e. [DOI] [Google Scholar]
- Kratochvíl M.; Šponer J.; Hobza P. Global Minimum of the Adenine ··· Thymine Base Pair Corresponds Neither to Watson-Crick Nor to Hoogsteen Structures. Molecular Dynamic/Quenching/AMBER and ab Initio beyond Hartree-Fock Studies. J. Am. Chem. Soc. 2000, 122, 3495–3499. 10.1021/ja9936060. [DOI] [Google Scholar]
- Ryjacek F.; Engkvist O.; Vacek J.; Kratochvil M.; Hobza P. Hoogsteen and stacked structures of the 9-methyladenine center dot center dot center dot 1-methylthymine pair are populated equally at experimental conditions: Ab initio and molecular dynamics study. J. Phys. Chem. A 2001, 105, 1197–1202. 10.1021/jp003078a. [DOI] [Google Scholar]
Data Availability Statement
The Supporting Information includes Cartesian coordinates for all geometries considered in this work. Most computations were performed using the open-source Psi4 quantum chemistry program, version 1.5, freely available from https://psicode.org. As noted in the Theoretical Methods section, geometry optimizations using B3LYP-D3(BJ) were performed using the Q-Chem code, version 5.1, available from https://q-chem.com.

