Pseudobond parameters for QM/MM studies involving nucleosides, nucleotides, and their analogs

Robin Chaudret; Jerry M Parks; Weitao Yang

doi:10.1063/1.4772182

2013 Jan 24;138(4):045102.



PMCID: PMC3568090 PMID: 23387624

Abstract

In biological systems involving nucleosides, nucleotides, or their respective analogs, the ribose sugar moiety is the most common reaction site, for example, during DNA replication and repair. In contrast, the nucleobases, which make up a sizable portion of nucleotide molecules, are usually unreactive during such processes. In quantum mechanical/molecular mechanical simulations of nucleic acid reactivity, it may therefore be advantageous to describe specific ribosyl or ribosyl phosphate groups quantum mechanically and their respective nucleobases with a molecular mechanics potential function. Here, we have extended the pseudobond approach to enable quantum mechanical/molecular mechanical simulations involving nucleotides, nucleosides, and their analogs in which the interface between the two subsystems is located between the sugar and the base, namely, at the C(sp³)–N(sp²) bond. The pseudobond parameters were optimized on a training set of 10 molecules representing several nucleotide and nucleoside bases and analogs, and they were then tested on a larger test set of 20 molecules. Particular emphasis was placed on providing accurate geometries and electrostatic properties, including electrostatic potential (ESP)-fitted, natural bond orbital (NBO), and atoms in molecules (AIM) charges, as well as AIM first moments. We also tested the optimized parameters on four nucleotide and nucleoside analogs of pharmaceutical relevance and a small polypeptide (triglycine). Accuracy was maintained for these systems, which highlights the generality and transferability of the pseudobond approach.

INTRODUCTION

In complex reactive systems, the influence of the macromolecular environment on chemical reactions is important. These effects are particularly relevant for biochemical systems, in which amino acid residues not only interact with ligands to facilitate specific binding but also participate directly in chemical reactions as catalysts. The computational cost of describing a complete biomolecular system quantum mechanically is, in general, prohibitive, so classical molecular mechanics approaches are needed to model the biomolecular environment. The quantum mechanical/molecular mechanical (QM/MM) approach¹ enables chemical reactions to be modeled accurately while also including the influence of the macromolecular environment and solvent in simulations of complex molecular systems. The application of QM/MM methods to biological systems has grown steadily over the past two decades or more.²⁻⁷ Although the approach has been successful in numerous applications,⁸,⁹ several problems remain to be addressed. One such problem involves the interface between the QM and MM subsystems. The proper description of the junction between the two subsystems has been widely studied,⁸⁻¹³ and several approaches have been developed to treat it. The methods used most often in QM/MM simulations are the link atom,¹⁴⁻¹⁶ frozen orbital,¹⁷⁻²⁵ pseudobond,²⁶⁻²⁸ and quantum capping²⁹ approaches, each with its own advantages and disadvantages.

The pseudobond method was developed to provide a straightforward and accurate interface between the QM and MM subsystems without introducing additional degrees of freedom.²⁶ Here, we provide a brief summary of the method. A fluorine atom with a single free valence electron replaces the first MM atom at the QM/MM boundary and completes the valence of the QM subsystem. This atom is described by a special basis set and effective core potential (ECP), both of which are parameterized to mimic the original bond across the boundary while perturbing the electronic structure of the QM subsystem as little as possible. In the initial pseudobond formulation,²⁶ a 6-31G(d) basis set was used for all atoms (including the fluorine), and only C_ps(sp³)–C(sp³) bonds were considered, enabling subsystem partitioning across aliphatic C–C bonds in, for example, protein side chains.²⁶,²⁷ The test set used in that work to assess parameter accuracy consisted of eight simple functionalized alkanes (i.e., ethanol, ethylamine, ethanethiol, and propionic acid in neutral and charged protonation states), and the molecular properties included various bond distances, angles, bond dissociation energies, and Mulliken charges.³⁰ In subsequent work, C_ps(sp³)–C(sp²) and C_ps(sp³)–N(sp²) pseudobonds were also developed, which enabled QM/MM boundaries to be placed along the protein backbone.²⁷ A smaller STO-2G basis set and an angular-momentum-independent ECP were used for increased efficiency.
Later, C_ps(sp³)–C(sp³), C_ps(sp³)–C(sp²), and C_ps(sp³)–N(sp²) pseudobond parameters were developed that provided an improved description of electrostatics and deprotonation energies while maintaining accurate geometries.²⁸ In that work, the criteria used during parameter optimization and testing included accurate geometries (i.e., bonds and angles), bond dissociation energies of the parameterized bond, deprotonation energies of ionizable side chains, and electrostatic potential (ESP)-fitted charges rather than Mulliken charges, which are known to depend strongly on the basis set.

Previous pseudobond development was oriented primarily toward studying enzymes. It is therefore desirable to extend the approach to nucleosides, nucleotides, and their analogs for simulations of DNA polymerization or cleavage. Nucleosides can be divided into two distinct subunits: the nucleobase and the ribosyl or deoxyribosyl group. Likewise, nucleotides are composed of a nucleobase and a (deoxy)ribosyl phosphate. In chemical processes involving nucleosides or nucleotides, reactivity occurs mainly at the sugar ring, whereas the bases are typically unreactive and interact with the reactive center only through non-covalent interactions. Partitioning a nucleotide into a reactive (i.e., QM) sugar group and an unreactive (MM) base is therefore desirable because it significantly reduces the computational cost relative to describing the entire nucleotide or nucleoside quantum mechanically.

Here, we develop pseudobond parameters specifically for QM/MM simulations of nucleic acid-containing systems. These parameters enable partitioning across the N(sp²)–C(sp³) bond between the sugar moiety and the nucleobase, with the sugar in the QM subsystem and the base in the MM subsystem. In Secs. 2–5, we first provide a brief summary of the pseudobond method. We then describe the training and testing sets and the approach used to carry out the pseudobond parameter optimizations. Next, we assess the accuracy of various molecular properties obtained with the resulting parameters. We then analyze the influence of the QM basis set on the accuracy of pseudobond-containing systems. Finally, we test the generality of the optimized parameters on a set of chemically diverse molecules.

THEORY

As in previous pseudobond formulations,²⁷,²⁸ the atom described by the pseudopotential and STO-2G basis set is a fluorine atom, which has nine electrons. Two electrons are described by the ECP, and the remaining seven are valence electrons, leaving one free valence for bonding to the rest of the QM subsystem. The nuclear charge is that of fluorine. The STO-2G basis functions have the form

$$\Phi_s = g_s(\alpha_1, R) + d_1\, g_s(\alpha_2, R),$$

$$\Phi_p = g_p(\alpha_1, R) + d_2\, g_p(\alpha_2, R),$$

where g_s and g_p are normalized s- and p-type Gaussian functions, α₁ and α₂ are exponents, and d₁ and d₂ are coefficients. The functional form of the ECP is

$$V_{\mathrm{eff}}(r) = \frac{a \exp(-b r^2)}{r},$$

where a and b are the coefficient and exponent, respectively. Thus, six adjustable parameters are optimized: a, b, α₁, α₂, d₁, and d₂.
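To make the functional forms concrete, the following minimal Python sketch evaluates the contracted s and p_z functions and the ECP of the pseudobond fluorine atom using the Tot parameters listed in Table 1. It assumes atomic units (bohr, hartree), takes g_s and g_p to be standard normalized Cartesian Gaussian primitives (only the p_z component is shown), and applies the contraction exactly as written above without renormalizing Φ. These conventions are our assumptions, and the helper names are hypothetical rather than part of any quantum chemistry package.

```python
import math

# Tot parameters from Table 1 (atomic units assumed).
TOT = {"a": 24.92677, "b": 18.70758, "alpha1": 1.82997,
       "alpha2": 0.28389, "d1": 3.95334, "d2": 1.19779}


def g_s(alpha: float, r: float) -> float:
    """Normalized s-type Gaussian primitive at distance r from the center."""
    return (2.0 * alpha / math.pi) ** 0.75 * math.exp(-alpha * r * r)


def g_pz(alpha: float, z: float, r: float) -> float:
    """Normalized p_z-type Gaussian primitive; z is the Cartesian z offset."""
    norm = (128.0 * alpha ** 5 / math.pi ** 3) ** 0.25
    return norm * z * math.exp(-alpha * r * r)


def phi_s(r: float, p: dict = TOT) -> float:
    """Contracted s function: g_s(alpha1) + d1 * g_s(alpha2)."""
    return g_s(p["alpha1"], r) + p["d1"] * g_s(p["alpha2"], r)


def phi_pz(z: float, r: float, p: dict = TOT) -> float:
    """Contracted p_z function: g_p(alpha1) + d2 * g_p(alpha2)."""
    return g_pz(p["alpha1"], z, r) + p["d2"] * g_pz(p["alpha2"], z, r)


def v_eff(r: float, p: dict = TOT) -> float:
    """Angular-momentum-independent ECP a * exp(-b r^2) / r, for r > 0."""
    if r <= 0.0:
        raise ValueError("r must be positive: the ECP diverges at the nucleus")
    return p["a"] * math.exp(-p["b"] * r * r) / r


if __name__ == "__main__":
    # Tabulate the contracted s function and the ECP at a few radii.
    for r in (0.1, 0.3, 0.5, 1.0):
        print(f"r = {r:.1f} bohr  phi_s = {phi_s(r):.4f}  V_eff = {v_eff(r):.3e}")
```

Under these unit assumptions, the large exponent b makes the ECP very short-ranged: it falls to a negligible value by roughly 1 bohr from the boundary-atom nucleus.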

TRAINING AND TESTING SETS

Here, we adopt the following notation to describe the molecules used in the parameter optimization. A standard molecule (StdMol) is a reference molecule that contains a specific covalent linkage to be reproduced using a pseudobond (Figure 1a). A pseudobond-containing molecule (PsMol) is the analog of a StdMol in which a portion of the molecule has been replaced by a fluorine atom described by the parameterized ECP and basis set. The pseudobond parameters were optimized using a training set consisting of pairs of molecules (StdMol, PsMol). In this work, three different training sets were used:

The purine training set included models of adenosine, deoxyadenosine, guanosine, and deoxyguanosine in which the –CH₂OH group on C4′ was replaced by CH₃ (Figure 1b).
The pyrimidine training set included models of ribothymidine (5-methyluridine), deoxythymidine, cytidine, deoxycytidine, uridine, and deoxyuridine in which the –CH₂OH group on C4′ was replaced by CH₃ (Figure 1b).
The complete (Total) training set included all molecules from both the purine and pyrimidine sets.

Figure 1. (a) Example of a truncated nucleoside (deoxycytidine) analog used in the training set during pseudobond parameter optimization. Atoms in the QM subsystem are shown in black, the N_ps boundary atom in blue, and MM atoms in red. (b) Representation of the different training and test sets used in this work.

In the following, five different sets of parameters are discussed. The Pur1 and Pur2 (or Pyr1 and Pyr2) sets were optimized using only the purine (or pyrimidine) training set, starting from two different sets of initial parameters. The Total (Tot) set was optimized using the Total training set, which is simply the union of the purine and pyrimidine training sets; only one initial guess was used in that case.

After parameter optimization, the performance of the resulting parameters was tested using a larger set of molecules and properties. In addition to the molecules in the total training set, the testing set included the monophosphorylated (phosphate dianion) counterparts of each nucleoside (i.e., the nucleotides) (Figure 1b). The properties included in the assessment were geometries (bond lengths, angles, and dihedrals), ESP and NBO³¹,³² charges, and AIM³³,³⁴ charges and AIM first moments³⁵ (M1). The AIM M1 is the first electrostatic moment of an atomic basin (the region of space assigned to an atom in AIM) and can be interpreted as the dipole of an atom in a molecule. Unlike in the parameter optimizations, for which the geometry of each PsMol was kept fixed at its corresponding StdMol geometry, full geometry optimizations were carried out for all molecules in the testing set. As in previous studies,²⁷,²⁸ pure QM optimizations were performed instead of full QM/MM optimizations.
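For readers less familiar with the AIM quantities, the charge of atom A is q_A = Z_A − ∫_ΩA ρ(r) dr and its first moment is M1_A = −∫_ΩA (r − R_A) ρ(r) dr, where Ω_A is the atomic basin and R_A the nuclear position. The minimal sketch below evaluates both by quadrature, assuming that grid points, quadrature weights, densities, and basin assignments have already been exported from an AIM program; the function name and input layout are hypothetical, and the sign convention (electrons carry charge −1) is an assumption, since programs differ in how they report M1.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def aim_charge_and_m1(atom: int, Z: float, R: Vec3,
                      points: Sequence[Vec3], weights: Sequence[float],
                      density: Sequence[float], owner: Sequence[int]):
    """Return (q_A, M1_A vector, |M1_A|) for the basin owned by `atom`.

    Integrals are approximated by the quadrature sum_k w_k f(r_k); owner[k]
    is the index of the atom whose basin contains grid point k (hypothetical
    input format).
    """
    n_electrons = 0.0
    m1 = [0.0, 0.0, 0.0]
    for r, w, rho, a in zip(points, weights, density, owner):
        if a != atom:
            continue
        n_electrons += w * rho
        for c in range(3):
            # Electrons carry charge -1, hence the minus sign (assumed convention).
            m1[c] -= w * rho * (r[c] - R[c])
    q = Z - n_electrons
    return q, tuple(m1), math.sqrt(sum(x * x for x in m1))


# Toy example: two grid points on the z axis, both in the basin of atom 0.
q, m1, m1_norm = aim_charge_and_m1(
    0, Z=1.0, R=(0.0, 0.0, 0.0),
    points=[(0.0, 0.0, 0.5), (0.0, 0.0, -0.5)], weights=[1.0, 1.0],
    density=[0.4, 0.2], owner=[0, 0])
print(q, m1, m1_norm)
```

In the toy example, 0.6 electrons in the basin give q = +0.4, and the excess density on the +z side gives a first moment pointing along −z (M1_z = −0.1 a.u.).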

The properties of each PsMol were compared with those of the corresponding StdMol. All atoms of the QM subsystem were included in the comparisons except the phosphate group and the C5′ hydrogens, which were omitted from the charge and dipole comparisons; likewise, any bond, angle, or dihedral involving these atoms was excluded from the geometry comparisons. The phosphate group was excluded because it is highly mobile, which leads to large, artificial variations. As for the training sets, three test sets (purine, pyrimidine, and total) were defined to investigate the influence of the training set on the accuracy of the pseudobond. The purine testing set contained all molecules from the purine training set and their monophosphate equivalents. Likewise, the pyrimidine testing set included the pyrimidine training set molecules and their monophosphate equivalents. Finally, the total testing set combined the purine and pyrimidine testing sets.
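The comparison described above can be sketched as follows: per-atom values for a PsMol and its StdMol are matched by atom label, excluded atoms are dropped, and the rmse together with the largest positive and negative signed errors is returned, in the spirit of the statistics reported in Table 2 and the maximum deviations quoted in the results. The atom labels, the dictionary layout, and the PsMol − StdMol sign convention are assumptions for illustration, not the analysis code used in this work.

```python
import math
from typing import Dict, Iterable, Tuple


def compare_atomic_property(ps: Dict[str, float], std: Dict[str, float],
                            exclude: Iterable[str] = ()) -> Tuple[float, float, float]:
    """Return (rmse, largest positive error, largest negative error).

    Errors are PsMol - StdMol (assumed sign convention). Atoms missing from
    either molecule (e.g., MM base atoms, or a pseudobond atom given its own
    label) and atoms listed in `exclude` are skipped.
    """
    skip = set(exclude)
    errors = [ps[label] - std[label]
              for label in ps if label in std and label not in skip]
    if not errors:
        raise ValueError("no atoms left to compare")
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rmse, max(errors), min(errors)


# Hypothetical ESP charges (a.u.); "Nps" is the pseudobond atom, "P" and
# "H5'1" stand in for the excluded phosphate and C5' hydrogen atoms.
std = {"C1'": 0.30, "C2'": -0.10, "C4'": 0.05, "H5'1": 0.08, "P": 1.20}
ps = {"C1'": 0.36, "C2'": -0.14, "C4'": 0.07, "H5'1": 0.20, "P": 1.50, "Nps": -0.20}
rmse, max_pos, max_neg = compare_atomic_property(ps, std, exclude={"P", "H5'1"})
print(f"rmse = {rmse:.3f}, max = {max_pos:+.2f}, min = {max_neg:+.2f}")
```

Matching by label rather than by position keeps the comparison robust to different atom orderings in the PsMol and StdMol output files.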

PARAMETER OPTIMIZATION

The various sets of pseudobond parameters were optimized by minimizing an objective error function, E_total, using the gradient-free Powell optimization procedure.³⁶ The objective function consisted of a weighted sum of error terms:

$$E_{\mathrm{total}} = \sum_{i \in \mathrm{properties}} \omega_i\, E_i(\lambda), \qquad \lambda = (a, b, \alpha_1, \alpha_2, d_1, d_2),$$

where E_i(λ) is the error in property i computed with the parameter vector λ and ω_i is the weight assigned to that property. The weight factors were taken from our previous work²⁸ and were chosen, by taking into account the relative magnitudes of the energies and properties, so that each term contributes approximately equally. However, geometries and ESP charges were given greater weights than bond dissociation energies to reflect their relative importance. The error in a given property was calculated as the difference between the value of property i for the standard reference molecule (StdMol) and the same property computed for the truncated molecule containing a pseudobond (PsMol). The three properties used in the optimization are:

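The gradient norm error was calculated as the norm of a single vector containing all of the individual geometric gradients of the pseudobond-containing molecules evaluated at their fully optimized StdMol geometries. A small gradient norm indicates that each StdMol geometry lies close to a stationary point of the corresponding PsMol potential energy surface, i.e., that the pseudobond reproduces the reference geometry.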
The ESP charge error was calculated as the rms deviation in ESP charges between all non-pseudobond atoms in PsMol and its corresponding StdMol.
The bond dissociation energy error was calculated as the rms deviation between the bond dissociation energies of the parameterized bonds (i.e., N_ps(sp²)–C(sp³) bonds) of molecules containing a pseudobond (PsMol) and their corresponding StdMols.

In all cases, a relative weight of 1.0 was used for the gradient norm term, 0.46 for the ESP charges, and 0.0006 for the bond dissociation energies; no further manual adjustment of the weights was performed. The units of each weight are the inverse of those of the corresponding property. All calculations were performed at the B3LYP³⁷,³⁸/6-31G*³⁹ level of theory using GAUSSIAN09⁴⁰ unless otherwise noted.
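The weighted objective can be sketched as below, assuming that the three error terms are formed as described (the norm of the concatenated PsMol gradient vector at the StdMol geometries, and rms deviations of ESP charges and bond dissociation energies) and combined with the weights quoted above. The QM calculations that would supply gradients, charges, and energies for a trial λ are not shown, the inputs in the usage line are hypothetical placeholders, and pooling ESP charges over all molecules is our assumption. In practice, such a function of λ could be handed to a derivative-free Powell minimizer (for example, `scipy.optimize.minimize` with `method="Powell"`), rerunning the QM calculations at each function evaluation; this is not necessarily the implementation used in this work.

```python
import math
from typing import List, Sequence

# Weights quoted above (gradient norm, ESP charges, bond dissociation energies).
W_GRAD, W_ESP, W_BDE = 1.0, 0.46, 0.0006


def rms_deviation(ps: Sequence[float], std: Sequence[float]) -> float:
    """Root-mean-square of the PsMol - StdMol differences."""
    if len(ps) != len(std) or not ps:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((p - s) ** 2 for p, s in zip(ps, std)) / len(ps))


def gradient_norm_error(grads: List[Sequence[float]]) -> float:
    """Norm of one vector concatenating every PsMol gradient evaluated at its
    StdMol geometry; it is zero only if every StdMol geometry is stationary."""
    return math.sqrt(sum(g * g for mol in grads for g in mol))


def total_error(grads, esp_ps, esp_std, bde_ps, bde_std) -> float:
    """Weighted objective E_total for one trial parameter vector lambda.

    The inputs would come from QM calculations with the trial pseudobond
    parameters (not shown); ESP charges are pooled over all molecules here.
    """
    return (W_GRAD * gradient_norm_error(grads)
            + W_ESP * rms_deviation(esp_ps, esp_std)
            + W_BDE * rms_deviation(bde_ps, bde_std))


# Hypothetical numbers for two molecules.
e = total_error(grads=[[0.003, -0.004], [0.0, 0.0]],
                esp_ps=[0.10, -0.20], esp_std=[0.10, -0.20],
                bde_ps=[100.0, 95.0], bde_std=[98.0, 97.0])
print(f"E_total = {e:.4f}")
```

Collecting all gradients into one vector before taking the norm means that a single badly reproduced geometry dominates the term, which is consistent with the definition given above.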

RESULTS AND DISCUSSION

Five sets of optimized pseudobond parameters are listed in Table 1. Each set was obtained by optimizing a different initial guess against a particular training set. The initial guesses were taken from previously optimized parameters or were pseudorandom numbers within a specified range. In general, the different parameter sets provide similar accuracies (Table 2), suggesting that, for a given bond, the choice of training molecules has only a minor influence on the accuracy of the parameters, provided that reasonably similar and chemically sensible training sets are used. For example, the Pur2 parameter set yields errors comparable to those of the Pyr2 set when assessed on the total test set. Thus, the pseudobond parameter sets exhibit some degree of transferability and can model the desired bond in different molecular systems. In the remainder of this work, only the Tot parameters, optimized on the total training set, are discussed further; on the total test set they give the lowest angle rmse and errors comparable to those of the other sets for the remaining properties (Table 2c). Bond dissociation energy errors computed for the total training set molecules are provided in the supplementary material (Table S2).⁴² All errors are less than 5% of the total bond dissociation energy.

Table 1.

Five sets of pseudobond parameters optimized on the total training set (Tot), the purine training set (Pur1 and Pur2), and the pyrimidine training set (Pyr1 and Pyr2).

Set	a	b	α₁	α₂	d₁	d₂
Tot	24.92677	18.70758	1.82997	0.28389	3.95334	1.19779
Pur1	24.88664	17.70140	1.87873	0.28288	4.53586	1.27451
Pur2	33.87962	25.08505	2.13980	0.30586	6.73700	1.33034
Pyr1	33.29602	26.98681	2.04912	0.31768	5.00416	1.24585
Pyr2	22.92485	15.27730	1.68929	0.27526	4.18348	1.20014


Table 2.

RMS error for ESP, NBO, and AIM charges, AIM M1, bond distances, and bond angles for the different optimized parameters. The tables summarize the rmse values obtained for the purine (a), pyrimidine (b), and total (c) test sets.

(a) Purine test set
RMSE	Tot	Pur1	Pur2	Pyr1	Pyr2
ESP charge (a.u.)	0.070	0.075	0.067	0.067	0.069
NBO charge (a.u.)	0.026	0.031	0.024	0.021	0.033
AIM charge (a.u.)	0.081	0.084	0.087	0.071	0.075
AIM M1 (a.u.)	0.042	0.043	0.042	0.040	0.041
Distances (Å)	0.007	0.010	0.007	0.008	0.007
Angles (°)	0.92	1.59	0.92	1.24	0.92
(b) Pyrimidine test set
RMSE	Tot	Pur1	Pur2	Pyr1	Pyr2
ESP charge (a.u.)	0.071	0.074	0.074	0.071	0.071
NBO charge (a.u.)	0.021	0.024	0.020	0.016	0.028
AIM charge (a.u.)	0.067	0.072	0.066	0.058	0.068
AIM M1 (a.u.)	0.035	0.037	0.038	0.035	0.037
Distances (Å)	0.017	0.018	0.018	0.017	0.017
Angles (°)	1.19	1.44	1.26	1.40	1.39
(c) Total test set
RMSE	Tot	Pur1	Pur2	Pyr1	Pyr2
ESP charge (a.u.)	0.071	0.074	0.071	0.070	0.071
NBO charge (a.u.)	0.023	0.027	0.022	0.018	0.030
AIM charge (a.u.)	0.079	0.077	0.080	0.070	0.077
AIM M1 (a.u.)	0.038	0.040	0.040	0.037	0.039
Distances (Å)	0.014	0.015	0.015	0.014	0.014
Angles (°)	1.09	1.50	1.13	1.34	1.22


Geometric properties

The bond and angle deviations of the PsMols relative to their StdMol counterparts are shown in Figure 2 and Table 2. The rmse is 0.01 Å for bonds and 1.1° for angles. The maximum signed errors are +0.06 Å (O3′–H3′ bond of deoxycytidine monophosphate) and −0.09 Å (O3′–H3′ bond of thymidine monophosphate) for bonds, and +5.1° (C3′–C4′–C5′ angle of deoxyadenosine) and −5.0° (C3′–C4′–H4′ angle of deoxyadenosine) for angles. These rmse values are only slightly larger than those obtained for proteins in the previous study.²⁸ Most of the bond and angle deviations arise from a hydrogen bond between a phosphoryl oxygen and H3′ of the sugar ring, whose strength differs depending on whether the pseudobond is present. This hydrogen bond is artificial and would not be present in a real QM/MM simulation, where steric effects from the MM environment would preclude such an interaction. Figure 2 shows that the deviations in the geometric parameters are generally minor, although they are more significant for angles than for bonds. Deviations in dihedral angles tend to be larger than those in angles, mainly because the phosphoryl group in the nucleotides can rotate, a degree of freedom that is absent in the nucleosides. Further analysis of dihedrals is provided in the supplementary material.⁴²
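The internal coordinates compared in this section can be computed from Cartesian coordinates with standard vector formulas, as in the plain-Python sketch below (the function names are ours). The dihedral difference is wrapped into [−180°, 180°), so that torsions of 179° and −179° are counted as differing by 2° rather than 358°, which matters for flexible torsions such as phosphoryl rotations. This illustrates the standard formulas and is not the analysis code used in this work.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Vec, b: Vec) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def bond(p1: Vec, p2: Vec) -> float:
    """Distance between two atoms (same units as the coordinates)."""
    d = _sub(p2, p1)
    return math.sqrt(_dot(d, d))


def angle(p1: Vec, p2: Vec, p3: Vec) -> float:
    """Bond angle p1-p2-p3 in degrees."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> float:
    """Signed dihedral p1-p2-p3-p4 in degrees (IUPAC sign convention)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def dihedral_error(ps_deg: float, std_deg: float) -> float:
    """PsMol - StdMol dihedral difference wrapped into [-180, 180)."""
    return (ps_deg - std_deg + 180.0) % 360.0 - 180.0


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
print(dihedral_error(-179.0, 179.0))  # 2.0, not -358.0
```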

Figure 2. Correlations of (a) bond distances (in Å) and (b) angles (in °) between pseudobond-containing molecules (PsMol) and the corresponding standard molecules (StdMol) for the total test set using the Tot parameters.

Atomic partial charges

In this study, three atomic partial charge schemes were examined, as well as, for the first time in pseudobond studies, the AIM atomic first moment M1, which is associated with the atomic dipole (Table 2). The ESP charge rmse is 0.07 a.u., and the maximum deviations are +0.35 a.u. (C4′ of adenosine monophosphate) and −0.32 a.u. (C2′ of adenosine monophosphate), which are of the same order of magnitude as the errors in previous studies. These errors are similar to those for the AIM charges (rmse = 0.08 a.u.; maximum deviations +0.19 a.u. and −0.32 a.u.; see also Figure 3c) but larger than those for the NBO charges (rmse = 0.02 a.u.; maximum deviations +0.07 a.u. and −0.11 a.u.; see also Figure 3b). These results show that the pseudobond method reproduces not only ESP charges, which are important for obtaining accurate QM/MM electrostatic energies, but also AIM and NBO charges, which are commonly used to analyze atomic charges in quantum chemical studies.

Figure 3. Correlation between pseudobond-containing molecules (PsMol) and standard molecules (StdMol) for (a) ESP charges, (b) NBO charges, (c) AIM charges, and (d) AIM M1 computed for the total testing set.

The AIM M1 values are also well reproduced in the pseudobond systems (rmse = 0.04 a.u.; maximum deviations +0.18 a.u. and −0.18 a.u.; see also Figure 3d). This indicates that the polarization properties of the molecules with and without pseudobonds are similar, and therefore the polarization of the QM subsystem should be similar at the pure QM and QM/MM levels. Future applications of polarizable QM/MM schemes will likely benefit from this property.

Additional tests

Three additional tests were performed to study the generality of the pseudobond approach for DNA-like systems. First, the basis set dependence of the accuracy of the pseudobond parameters was investigated. Using the optimized Tot parameter set, the basis set for the rest of the QM subsystem was varied among LANL2DZ, 3-21G, 6-31G(d), 6-31+G(d), 6-31++G(d,p), and 6-311++G(d,p). These basis sets were chosen to be representative of those typically used in QM/MM simulations, to give an idea of the accuracy to expect from the pseudobonds in biomolecular simulations. Second, we assessed the transferability of the optimized parameters to chemically diverse, pharmaceutically relevant systems. Four drug molecules, each containing an N(sp²)–C(sp³) bond, were chosen; the chemical structures of acyclovir, clofarabine, emtricitabine, and tenofovir are shown in Figure 4. Acyclovir (Figure 4a), or acycloguanosine, is a guanosine analog used to treat viral infections. Clofarabine (Figure 4b) is a purine nucleoside antimetabolite used in leukemia treatment. Emtricitabine (Figure 4c) and tenofovir (Figure 4d) are nucleoside and nucleotide reverse transcriptase inhibitors, respectively, used to treat HIV. The electronic structures of these compounds differ significantly from those of standard nucleosides and nucleotides. Third, we included a polypeptide, triglycine (Figure 4e), to test whether the pseudobonds can also be used to cut along the peptide backbone in protein systems.

Figure 4. Representations of the molecules used for pseudobond testing: (a) acyclovir, (b) clofarabine, (c) emtricitabine, (d) tenofovir, and (e) GLY₃ (triglycine) polypeptide. Atoms in the QM subsystem are shown in black, the N_ps boundary atom in blue, and MM atoms in red.

Basis set effects

During the parameter optimizations, all calculations were performed at the B3LYP/6-31G(d) level of theory. However, it is important to assess the performance of the resulting parameters when other basis sets are used to describe the QM subsystem. Six basis sets were tested in total, and the results are summarized in Table 3. As expected, the 6-31G(d) basis set gives, on average, better results than the others, although the other basis sets also provide reasonable results. The most pronounced differences are in the dihedral terms: with every basis set except 6-31G(d), the ring conformations of the PsMols differ significantly from those of the StdMols, which produces dihedral rmse values of 13–31° compared with 2.3° for 6-31G(d). Interestingly, larger and smaller basis sets yield practically the same levels of accuracy relative to their respective StdMol references. Because the pseudobonds were optimized for use with the 6-31G(d) basis set, the general trend, although an oversimplification, is that the more a basis set differs from 6-31G(d), the larger the deviation. On average, however, the errors remain of the same order of magnitude as for the training basis set, 6-31G(d), suggesting that the pseudobond parameters can be used with other basis sets, provided that the accuracy for important properties, e.g., dihedral angles, is checked carefully.

Table 3.

RMS errors for charges and AIM first moments M1 (a.u.), bond distances (Å), angles (degrees), and dihedrals (degrees) for adenosine monophosphate computed with the Tot parameters using various basis sets for the QM subsystem.

RMSE	ESP charges	NBO charges	AIM charges	AIM M1	Bond	Angle	Dihedral
LANL2DZ	0.057	0.017	0.056	0.016	0.017	1.71	31.0
3-21G	0.034	0.027	0.066	0.014	0.014	1.25	13.1
6-31G(d)	0.020	0.022	0.074	0.040	0.005	1.69	2.3
6-31+G(d)	0.055	0.026	0.069	0.026	0.013	1.55	25.9
6-31++G(d,p)	0.055	0.024	0.088	0.018	0.013	1.57	26.2
6-311++G(d,p)	0.056	0.030	0.033	0.022	0.010	2.40	27.7


The dependence of the accuracy of the pseudobond method on the particular quantum chemical method was not studied here. However, it has been shown previously²⁶,²⁸ that the accuracy is quite insensitive to the particular method used, including HF, MP2, and various DFT methods.

Generality of the pseudobond parameters

Finally, to assess the generality of the parameterized pseudobonds, four diverse molecules of pharmaceutical relevance and one polypeptide were selected for further testing. These molecules were chosen to have QM (and/or MM) subsystems that are quite different from those of standard nucleosides and nucleotides. The results are summarized in Table 4 and in Table S1 of the supplementary material.⁴²

Table 4.

RMS errors for partial charges (a.u.), AIM M1 (a.u.), bond distances (Å), and angles (degrees) for acyclovir, clofarabine, emtricitabine, tenofovir, and the triglycine polypeptide.

RMSE	ESP	NBO	AIM charge	AIM M1	Bond	Angle
Acyclovir	0.044	0.026	0.090	0.035	0.004	1.15
Clofarabine	0.038	0.020	0.063	0.050	0.004	0.44
Emtricitabine	0.080	0.023	0.129	0.029	0.006	0.73
Tenofovir	0.064	0.013	0.104	0.081	0.004	0.51
Triglycine	0.023	0.018	0.006	0.016	0.003	0.58


Overall, the accuracy for these molecules is, on average, quite reasonable. Acyclovir shows the largest angle deviations, with an rms deviation of slightly more than 1°. During the QM optimization of the full molecule (StdMol), a hydrogen bond forms between the terminal hydroxyethoxy group in the QM subsystem and N7 of the guanine moiety. This hydrogen bond closes a 9-membered ring and therefore slightly constrains the conformation. Because N7 is not included in the QM subsystem of the PsMol, this ring cannot form during the PsMol geometry optimization, and the QM subsystem relaxes freely. Interestingly, the pseudobond also showed good accuracy for the polypeptide triglycine with any of the parameter sets, indicating that the new pseudobond parameters can also be used to cut along the peptide backbone in protein systems. The present pseudobond parameters can therefore be considered general for C(sp³)–N(sp²) bonds, although, as discussed above, their accuracy with basis sets other than 6-31G(d) should be checked. The pseudobond parameters provided here were optimized to reproduce ground-state properties. It is, however, possible to generalize the approach, in the same spirit as the work of Slaviček and Martinez,⁴¹ by including excited-state or other properties in the optimization.

CONCLUSIONS

Here, we have developed pseudobond parameters for QM/MM simulations involving nucleosides, nucleotides, and their analogs in which the boundary is located across the bond between the ribosyl group and the nucleobase. Because reactions of biological interest commonly involve the (deoxy)ribose sugar (e.g., DNA replication, repair, and phosphoryl transfer), the nucleobase can be described using molecular mechanics, which significantly improves the efficiency of the simulation. Using a previously described protocol, we carried out pseudobond parameter optimizations with three different training sets and tested their performance on seven molecular properties (bond distances, angles, dihedrals, ESP, NBO, and AIM charges, and the AIM first moment M1) using a test set of 20 molecules. No large differences in accuracy were observed among the optimized pseudobond sets, with each training set yielding essentially equal performance. The size of the training set therefore does not appear to be crucial for pseudobond generality, at least for chemically similar training molecules.

Pseudobonds provide not only accurate ESP charges, which are important for obtaining accurate QM/MM electrostatic energies, but also accurate AIM and NBO charges, which may be more physically and chemically meaningful quantities. Perhaps more importantly, the approach also reproduces the AIM first moment M1, which is related to the atomic dipole, suggesting that the approach is suitable for polarizable simulations. The pseudobond approach is therefore a valuable tool for linking QM and MM subsystems in QM/MM simulations, including future applications involving polarizable force fields.

The pseudobond parameters developed here should enable accurate and efficient QM/MM calculations to be carried out on biomolecular systems containing DNA-like molecules. They are also general enough to be used at different levels of theory, with different basis sets, and for any system containing a covalent C(sp³)–N(sp²) bond at the QM/MM interface (from sugar/base to protein backbone junctions). We have also shown that the pseudobond approach is robust enough to preserve dipolar properties and may therefore be applicable in future polarizable QM/MM schemes.

The optimization program, scripts, and instructions for use are available upon request from the authors.

ACKNOWLEDGMENTS

Support from the National Institutes of Health (NIH R01-GM061870) is greatly appreciated.

References

Warshel A. and Levitt M., J. Mol. Biol. 103, 227–249 (1976). 10.1016/0022-2836(76)90311-9 [DOI] [PubMed] [Google Scholar]
Wu P., Cisneros G. A., Hu H., Chaudret R., Hu X. Q., and Yang W. T., J. Phys. Chem. B 116, 6889–6897 (2012). 10.1021/jp212643j [DOI] [PMC free article] [PubMed] [Google Scholar]
Garcia-Viloca M., Gao J., Karplus M., and Truhlar D. G., Science 303, 186–195 (2004). 10.1126/science.1088172 [DOI] [PubMed] [Google Scholar]
Field M. J., J. Comput. Chem. 23, 48–58 (2002). 10.1002/jcc.1156 [DOI] [PubMed] [Google Scholar]
Shurki A. and Warshel A., Protein Simulations (Academic, San Diego, 2003), pp. 249–313. [Google Scholar]
Mulholland A. J., Lyne P. D., and Karplus M., J. Am. Chem. Soc. 122, 534–535 (2000). 10.1021/ja992874v [DOI] [Google Scholar]
Hu H. and Yang W. T., Annu. Rev. Phys. Chem. 59, 573–601 (2008). 10.1146/annurev.physchem.59.032607.093618
Senn H. M. and Thiel W., Angew. Chem., Int. Ed. 48, 1198–1229 (2009). 10.1002/anie.200802019
Hu H. and Yang W. T., J. Mol. Struct. (THEOCHEM) 898, 17–30 (2009). 10.1016/j.theochem.2008.12.025
Lin H. and Truhlar D. G., Theor. Chem. Acc. 117, 185–199 (2007). 10.1007/s00214-006-0143-z
Hagiwara Y. and Tateno M., J. Phys.: Condens. Matter 22, 413101 (2010). 10.1088/0953-8984/22/41/413101
Ranaghan K. E. and Mulholland A. J., Int. Rev. Phys. Chem. 29, 65–133 (2010). 10.1080/01442350903495417
Acevedo O. and Jorgensen W. L., Acc. Chem. Res. 43, 142–151 (2010). 10.1021/ar900171c
Field M. J., Bash P. A., and Karplus M., J. Comput. Chem. 11, 700–733 (1990). 10.1002/jcc.540110605
Singh U. C. and Kollman P. A., J. Comput. Chem. 7, 718–730 (1986). 10.1002/jcc.540070604
Amara P. and Field M. J., Theor. Chem. Acc. 109, 43–52 (2003). 10.1007/s00214-002-0413-3
Thery V., Rinaldi D., Rivail J. L., Maigret B., and Ferenczy G. G., J. Comput. Chem. 15, 269–282 (1994). 10.1002/jcc.540150303
Monard G., Loos M., Thery V., Baka K., and Rivail J. L., Int. J. Quantum Chem. 58, 153–159 (1996). 10.1002/(SICI)1097-461X(1996)58:2<153::AID-QUA4>3.0.CO;2-X
Assfeld X. and Rivail J. L., Chem. Phys. Lett. 263, 100–106 (1996). 10.1016/S0009-2614(96)01165-7
Ferre N., Assfeld X., and Rivail J. L., J. Comput. Chem. 23, 610–624 (2002). 10.1002/jcc.10058
Gao J. L., Amara P., Alhambra C., and Field M. J., J. Phys. Chem. A 102, 4714–4721 (1998). 10.1021/jp9809890
Amara P., Field M. J., Alhambra C., and Gao J. L., Theor. Chem. Acc. 104, 336–343 (2000). 10.1007/s002140000153
Garcia-Viloca M. and Gao J. L., Theor. Chem. Acc. 111, 280–286 (2004). 10.1007/s00214-003-0512-9
Pu J. Z., Gao J. L., and Truhlar D. G., J. Phys. Chem. A 108, 632–650 (2004). 10.1021/jp036755k
Pu J. Z., Gao J. L., and Truhlar D. G., ChemPhysChem 6, 1853–1865 (2005). 10.1002/cphc.200400602
Zhang Y., Lee T.-S., and Yang W., J. Chem. Phys. 110, 46–54 (1999). 10.1063/1.478083
Zhang Y. K., J. Chem. Phys. 122, 024114 (2005). 10.1063/1.1834899
Parks J. M., Hu H., Cohen A. J., and Yang W. T., J. Chem. Phys. 129, 154106 (2008). 10.1063/1.2994288
DiLabio G. A., Hurley M. M., and Christiansen P. A., J. Chem. Phys. 116, 9578 (2002). 10.1063/1.1477182
Mulliken R. S., J. Chem. Phys. 23, 1833 (1955). 10.1063/1.1740588
Reed A. E. and Weinhold F., J. Chem. Phys. 78, 4066 (1983). 10.1063/1.445134
Reed A. E., Weinstock R. B., and Weinhold F., J. Chem. Phys. 83, 735 (1985). 10.1063/1.449486
Bader R. F. W., Atoms in Molecules: A Quantum Theory (Oxford University Press, Oxford, 1990).
Matta C. F. and Boyd R. J., The Quantum Theory of Atoms in Molecules: From Solid State to DNA and Drug Design (Wiley-VCH, Weinheim, 2007).
Pilme J. and Piquemal J. P., J. Comput. Chem. 29, 1440–1449 (2008). 10.1002/jcc.20904
Press W. H., Teukolsky S. A., Vetterling W. T., and Flannery B. P., Numerical Recipes in Fortran 77: The Art of Scientific Computing, 2nd ed. (Cambridge University Press, 1992).
Becke A. D., J. Chem. Phys. 98, 5648–5652 (1993). 10.1063/1.464913
Lee C., Yang W., and Parr R. G., Phys. Rev. B 37, 785–789 (1988). 10.1103/PhysRevB.37.785
Hariharan P. C. and Pople J. A., Theor. Chim. Acta 28, 213–222 (1973). 10.1007/BF00533485
Frisch M. J., Trucks G. W., Schlegel H. B., et al., GAUSSIAN 09, Gaussian, Inc., Wallingford, CT, 2009.
Slavicek P. and Martinez T. J., J. Chem. Phys. 124, 084107 (2006). 10.1063/1.2173992
See supplementary material at http://dx.doi.org/10.1063/1.4772182 for a discussion of the accuracy of pseudobonds regarding dihedral angles and the bond dissociation errors.