Author manuscript; available in PMC: 2025 Sep 2.
Published in final edited form as: J Phys Chem B. 2012 Aug 21;116(34):10342–10356. doi: 10.1021/jp304678d

Exploring, Refining, and Validating the Paradynamics QMMM Sampling

Nikolay V Plotnikov 1, Arieh Warshel 1,*
PMCID: PMC12401620  NIHMSID: NIHMS2104425  PMID: 22853800

Abstract

The performance of the paradynamics (PD) reference potential approach in QMMM calculations is examined. It is also clarified that, in contrast to some possible misunderstandings, this approach provides a rigorous strategy for QMMM free energy calculations. In particular, the PD approach provides a gradual and controlled way of improving the evaluation of the free energy perturbation associated with moving from the EVB reference potential to the target QMMM surface. This is achieved by moving from the linear response approximation to the full free energy perturbation approach in evaluating the free energy changes. We also present a systematic way of improving the reference potential by using Gaussian-based correction potentials along a reaction coordinate. In parallel, we review other recent adaptations of the reference potential approach, emphasizing and demonstrating the advantage of using the EVB potential as a reference potential, relative to semiempirical QMMM molecular orbital potentials. We also compare the PD results to those obtained by direct calculations of the potentials of the mean force (PMF). Additionally, we propose a way of accelerating the PMF calculations by using Gaussian-based negative potentials along the reaction coordinate (which are also used in the PD refinement). Finally, we discuss performance of the PD and the metadynamics approaches in ab initio QMMM calculations and emphasize the advantage of using the PD approach.


I. INTRODUCTION

Modeling chemical processes in condensed phases using the QMMM method has become mainstream in computer simulations.1-5 Although recent methodological advances in electronic structure calculations and in parallel computing, as well as an increase in available computer power, have opened new opportunities for predictive ab initio QMMM simulations, the evaluation of free energies in ab initio QMMM (QM(ai)MM) studies has remained one of the major challenges in the field. One of the most promising options for performing QM(ai)MM free energy calculations has been our approach6-8 of using a reference potential (RP) for the QMMM potential, together with the recent related versions discussed below. The most recent version of our idea of using the RP is implemented in the paradynamics (PD) model.9 The PD model is based in large part on a number of earlier works and ideas, including (a) those which established an efficient way of sampling the RP using the empirical valence bond10 (EVB) method combined with the evaluation of the free energy surface by a specialized combination11,12 of the free energy perturbation (FEP13) and umbrella sampling (US14) protocols, with the energy gap as a reaction coordinate;11,12 (b) those works which introduced the idea of using the EVB RP and the FEP/US approach in semiempirical7 and ab initio6 QMMM free energy calculations; and (c) the ideas of refining the EVB RP8 to minimize the energy difference between the EVB RP and the QM(ai)MM target potential (TP), and of using the linear response approximation15,16 (LRA) for calculating the FEP while moving from the RP to the TP at the transition state (TS) and at the reactant state (RS).

The PD approach has been found to be quite effective:9 a typical evaluation of the activation free energy barrier by the PD approach (using the EVB RP) was estimated to require ~200 times fewer calls to the TP than the metadynamics (MTD) studies involving direct sampling of the TP.17 Thus, the PD approach provides a rigorous way of dealing with the high computational cost of sampling the QM(ai)MM potential, which is necessary to get an accurate free energy barrier.

In the present work, we report recent advances in expanding capabilities of the PD approach, clarifying possible misunderstandings and introducing several practical advances.

The first point that will be addressed is a misunderstanding associated with the fact that implementation of the PD scheme of moving from the RP to the TP has been mostly done with the LRA approach. This might have led some to the assumption that our approach is limited by validity of the LRA and thus does not yield an estimate of the relevant error. However, as we tried hard to clarify, the perturbation between the two potentials can be done with a full FEP treatment. This point will be demonstrated here.

The second point that will be addressed in this work is our finding that the iterative PD approach9 of refining the EVB RP has not been fully robust. In particular, in some cases, one may want to change the EVB functional form rather than just refine the EVB parameters. In order to overcome this problem, we developed an approach that at least in some cases can be very effective in constructing and refining the RP in an alternative, more general way. This new approach involves fitting the potential energy scan (PES) along the specified reaction coordinate (RC) in the gas phase (or in an implicit solvent) for both the RP and the TP by Gaussian functions. The refined RP is constructed as the original RP modified with a correction that is taken as the difference between the Gaussian functions that approximate the TP and the RP.
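The Gaussian-based correction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a one-dimensional RC, a fixed grid of Gaussian centers with a common width, and linear least squares for the amplitudes. The toy PES functions `e_rp` and `e_tp`, the grid, and all function names are hypothetical, and the toy surfaces are chosen to be exactly representable by the basis so the result is easy to check.

```python
import math

def gauss_basis(x, centers, width):
    """Unit-amplitude Gaussians centered at `centers`, evaluated at x."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_gaussians(xs, ys, centers, width, ridge=1e-8):
    """Least-squares amplitudes for fixed-center Gaussians (normal equations)."""
    rows = [gauss_basis(x, centers, width) for x in xs]
    n = len(centers)
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve(ata, aty)

def gauss_sum(x, amps, centers, width):
    return sum(a * g for a, g in zip(amps, gauss_basis(x, centers, width)))

# Hypothetical PES scans along the RC (kcal/mol); toy functions only.
def e_rp(x):
    return 20.0 * math.exp(-(x / 0.5) ** 2)

def e_tp(x):
    return (14.0 * math.exp(-((x - 0.5) / 0.5) ** 2)
            + 3.0 * math.exp(-((x + 1.0) / 0.5) ** 2))

scan = [-2.0 + 0.1 * i for i in range(41)]      # RC values of the scan
centers = [-2.0 + 0.5 * i for i in range(9)]    # assumed Gaussian centers
width = 0.5                                     # assumed common width
amps_rp = fit_gaussians(scan, [e_rp(x) for x in scan], centers, width)
amps_tp = fit_gaussians(scan, [e_tp(x) for x in scan], centers, width)

def refined_rp(x):
    """Original RP plus the difference between the two Gaussian fits."""
    return (e_rp(x) + gauss_sum(x, amps_tp, centers, width)
            - gauss_sum(x, amps_rp, centers, width))

err_before = max(abs(e_rp(x) - e_tp(x)) for x in scan)
err_after = max(abs(refined_rp(x) - e_tp(x)) for x in scan)
print(f"max |RP - TP| before: {err_before:.3f}, after: {err_after:.2e}")
```

For real scans the fit is only approximate, so the residual between the refined RP and the TP depends on the number, placement, and width of the Gaussians.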

The original RP approach7 has been adopted and implemented in a number of methods aimed at QM(ai)MM calculations of activation free energies in enzymes18-20 and in aqueous solutions,21 as well as at evaluation of free energies of solvation,22,23 binding,24 and conformational changes.25 For example, in one recent approach, a semiempirical AM1/MM potential was used as the RP for the configurational sampling and for the construction of the reference free energy surface (FES). This was followed by an optimization on a DFT QMMM potential at a number of points along the chosen RC, yielding the least energy reaction path. The entropic contribution was approximated by the difference between the activation free energy and the activation energy obtained on the AM1/MM RP. Furthermore, single point energy calculations on the optimized DFT QMMM least energy path, using CCSD18 or MP2,19 were proposed to provide the activation free energy for a high level ab initio QMMM surface. Another technique, introduced by Yang and co-workers,26 and adopted elsewhere,27 provides an estimate of the QM(ai)MM free energy profiles with a fixed solute at a predetermined RC (which is determined, for instance, through iterative sequential optimization of the QM and MM regions independently, where in each step one of the regions is kept fixed while the second is optimized26). This is followed by a single-step FEP from the MM-type RP to a QMMM TP (with averaging on the RP). The reference FES is constructed by sampling on the RP. The ESP-derived solute charges for the MM-type RP are periodically updated by the QMMM calculations. This method seems to be capable of capturing both the fixed solute polarization and the solvent polarization in a physically consistent way. A related approach involves single point energy evaluations20,28 with a high level QMMM, performed at fixed solute geometries (in a similar way to that used in the work18 mentioned earlier). 
The energy difference between the DFT QMMM and the high-level QMMM was assumed to approximate the FEP while moving between these two potentials.27 In another recent work, a different semiempirical RP, PM3MM,25 was used to construct the reference FES, followed by a single step FEP from the RP to a DFT QMMM TP with the average being calculated on the RP.

The overall RP results, obtained in the above studies, clearly indicate that this approach is a powerful and versatile tool, which allows one to overcome the high cost of direct sampling on the QM(ai)MM potential. However, there is a great potential for improvements to some of the existing RP treatments, especially considering the fact that some of them overlooked our earlier, and arguably more consistent, RP treatment. For instance, one can foresee certain disadvantages to the use of minimization in exploring the multidimensional DFT QMMM surface, as it is expensive and rather computationally inefficient. In addition to this, such a strategy can lead to multiple local minima. One has to keep in mind that the entropic contribution to the activation free energy barrier can be different for the TP and the RP, and that the activation energy is not equal to the activation enthalpy. Another serious problem is the slow convergence of the single step FEP approach when the average is taken only on the RP. This is especially problematic when the difference between the TP and the RP is significant (and their overlap in the phase space is small). In this case, sampling on the RP poorly represents the relevant region of interest on the TP.

The above problems are largely eliminated by the PD approach mainly due to the following features: (a) using the end points LRA that takes the averages both on the RP and on the TP and (b) using the PD refinement, which ensures faster LRA convergence due to a higher overlap between the TP and the refined RP. These points will be illustrated and discussed below.

An important element of the PD approach is the idea of refining the RP so it will be as close as possible to the TP. Our main refinement strategy starts with evaluation of the energy gap between the RP and the TP for the geometries generated by MD trajectories propagated at the RS and at the TS on the RP as well as on the TP. This step is followed by a least-squares minimization of the energy difference (plus other quantities). A somewhat similar refinement strategy has also been introduced in another context.29 Although this approach is very effective, we found that, in cases where improving the EVB requires a new functional form, we can obtain an improved refinement by adding Gaussian functions to the RP in a way that minimizes the difference between the RP and the TP along the RC. A related idea of deforming the original potential to eliminate local minima was proposed by Scheraga and co-workers30 in order to find the global minimum on a multidimensional surface. The use of a single negative Gaussian in the bias potential was also found to be useful in an MC study performed by Jorgensen and co-workers;31 however, it was for a rather stand-alone, specific case. A more general strategy of improved sampling by iteratively fitting the negative of the original potential with Gaussians in the local elevation method32 was formulated by van Gunsteren and co-workers. This idea was adopted, generalized in an elegant way, and popularized in the MTD approach of Parrinello and co-workers.33 As will be clarified below, unlike these approaches, ours does not require expensive iterative sampling while building the negative potential. This step is accomplished by fitting the PES, and using the negative potential approximated by Gaussians in our FEP/US PMF34 calculations. 
We accelerate convergence of PMF calculations with explicit solvent models by using the EVB type solvent driving potential, which forces the solvent to be polarized by the solvated solute charges along the reaction path.

In summary, we carry out two separate applications with the Gaussian functions: (a) during the PD refinement procedure, we evaluate two PESs (on the RP and on the TP), fit them both with Gaussian functions, and take the difference as a correction to the original RP; and (b) as the improved sampling strategy in FEP/US PMF calculations, we evaluate the negative of the original potential (without any iterative approaches) and use the corresponding Gaussians to create a flat mapping potential. The sampling approach is augmented by the EVB-type solvent driving potential for the condensed phase calculations with explicit solvent models. At any rate, our Gaussian refinement strategy is as related to the MTD as our idea of building the RP that will be as similar as possible to the TP. As we clarified in our previous work,9 this early RP idea is formally identical to the MTD idea of building the negative of the TP but allows for more effective implementation.

II. METHODS AND RESULTS

II.1. The Standard Paradynamics Approach.

The central idea of the PD (RP) approach is that the extensive configurational sampling required to calculate the QM(ai)MM free energy barrier can be done on a computationally inexpensive reference potential6-8,16 rather than directly on the expensive target QM(ai)MM potential. Our standard approach, described in Figure 1, involves construction of the free energy surface (FES) using the EVB10 method and the FEP/US technique.12,14 This is followed by evaluating the FEP using the LRA approach, while moving from the EVB RP to the TP at the RS and at the TS.

Figure 1.


The thermodynamic cycle used in the paradynamics approach to calculate the free energy barriers. The blue curve is the free energy surface, Δg(ξ), obtained for the reference potential with the corresponding barrier, Δg (blue arrow); the red curve is the free energy surface for the target potential, for which we want to evaluate the free energy barrier (red arrow). To achieve that, we estimate the free energy of moving from the reference potential to the target potential at the RS and at the TS (shown by black arrows).

In considering activation barriers for chemical reactions, we follow our previous works16 and clarify the difference between the free energy profile (or the PMF), Δg, and the free energy of the system in a particular state, ΔG. This is done by starting with the expression for the rate constant, k:35,36

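$$k = \kappa \frac{\langle |\dot{\xi}| \rangle_{TS}}{2\Delta\xi^{\ddagger}} \cdot \frac{\Delta\xi^{\ddagger} \exp(-\beta \Delta g^{\ddagger})}{\int_{-\infty}^{\xi^{\ddagger}} \exp(-\beta \Delta g(x))\,dx} = \kappa \frac{\langle |\dot{\xi}| \rangle_{TS}}{2\Delta\xi^{\ddagger}} \exp(-\beta \Delta G^{\ddagger}) \tag{1}$$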

where $\kappa$ is the transmission factor, $\dot{\xi}$ is the velocity along the RC, $\xi$; $\Delta\xi^{\ddagger}$ is the width of the TS region; $\Delta g_{RS}$ is the value of the free energy profile (or PMF) at the minimum at the RS ($x = \xi_{RS}$); and $\Delta g^{\ddagger}$ is its value at the TS ($x = \xi^{\ddagger}$). Here $\Delta G^{\ddagger}$ is defined by

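$$\exp(-\beta \Delta G^{\ddagger}) = \frac{\exp\left(-\beta(\Delta g^{\ddagger} - \Delta g_{RS})\right)}{\frac{1}{\Delta\xi^{\ddagger}} \int_{-\infty}^{\xi^{\ddagger}} \exp\left(-\beta(\Delta g(x) - \Delta g_{RS})\right) dx} \tag{2a}$$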

or

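$$\Delta G^{\ddagger} = \Delta g^{\ddagger} - \Delta g_{RS} + kT \ln\left(\frac{1}{\Delta\xi^{\ddagger}} \int_{-\infty}^{\xi^{\ddagger}} \exp\left(-\beta(\Delta g(x) - \Delta g_{RS})\right) dx\right) \tag{2b}$$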

where the last term of eq 2b provides a convenient way of incorporating the entropic effect of the ground state into the TS-theory rate constant.35 For the free energy of the RS, we also have

$$e^{-\beta \Delta G(RS)} = \int_{-\infty}^{\xi^{\ddagger}} e^{-\beta \Delta g(x)}\,dx \tag{3}$$

Throughout this work, the lower case Δg(ξ) denotes the value of the free energy profile, or the PMF, at a particular RC value (e.g., at the TS or at the RS), and the capital ΔG denotes the free energy of a particular state (e.g., RS or TS defined by eq 3 or by Δg).
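The relation between the profile value and the state free energy (eq 2b) can be evaluated numerically from a tabulated PMF. The sketch below is illustrative only: it assumes a grid that ends at the TS, approximates the lower integration limit by the start of the grid, and uses a hypothetical harmonic reactant well; `activation_free_energy`, the well curvature, and the TS width of 0.1 Å are all assumptions.

```python
import math

KT = 0.596  # kcal/mol, roughly 300 K

def activation_free_energy(xs, dg, width_ts, kt=KT):
    """Eq 2b on a grid: xs ascending and ending at the TS, dg the PMF on xs."""
    dg_rs = min(dg)        # profile value at the RS minimum
    dg_ts = dg[-1]         # profile value at the TS (last grid point)
    boltz = [math.exp(-(g - dg_rs) / kt) for g in dg]
    # Trapezoidal estimate of the integral of exp(-beta (dg(x) - dg_RS)).
    integral = sum(0.5 * (boltz[i] + boltz[i + 1]) * (xs[i + 1] - xs[i])
                   for i in range(len(xs) - 1))
    return dg_ts - dg_rs + kt * math.log(integral / width_ts)

# Hypothetical PMF: harmonic reactant well at x = -1.5 Å, truncated at the TS (x = 0).
CURVATURE = 20.0  # kcal/(mol Å^2), toy value
xs = [-3.5 + 0.001 * i for i in range(3501)]
dg = [0.5 * CURVATURE * (x + 1.5) ** 2 for x in xs]

dg_act = activation_free_energy(xs, dg, width_ts=0.1)
print(f"dg(TS) - dg(RS) = {dg[-1] - min(dg):.2f} kcal/mol, dG(act) = {dg_act:.2f} kcal/mol")
```

For a harmonic well the integral is Gaussian, so the logarithmic term reduces to kT ln(sqrt(2 pi kT / curvature) / width), which provides a simple check of the numerical result.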

The second key idea of the PD approach, which drastically reduces its computational cost, is the use of the LRA approach to calculate the free energy differences between the RP and the TP shown in Figure 1.

For example, in the case where the TS position is identical for the two free energy surfaces, we can write16

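$$\Delta\Delta g^{\ddagger}_{E_{\mathrm{REF}} \to E_{\mathrm{TARG}}}(\xi^{\ddagger}_{E_{\mathrm{TARG}}}) = -kT \ln \frac{q_{E_{\mathrm{TARG}}}(\xi^{\ddagger}_{E_{\mathrm{TARG}}})}{q_{E_{\mathrm{REF}}}(\xi^{\ddagger}_{E_{\mathrm{TARG}}})} = -kT \ln \left\langle \exp\left(-\frac{E_{\mathrm{TARG}}(\xi^{\ddagger}_{E_{\mathrm{TARG}}}) - E_{\mathrm{REF}}(\xi^{\ddagger}_{E_{\mathrm{TARG}}})}{kT}\right) \right\rangle_{E_{\mathrm{REF}}} \approx \frac{1}{2}\left( \left\langle E_{\mathrm{TARG}}(\xi^{\ddagger}_{E_{\mathrm{TARG}}}) - E_{\mathrm{REF}}(\xi^{\ddagger}_{E_{\mathrm{TARG}}}) \right\rangle_{E_{\mathrm{TARG}}} + \left\langle E_{\mathrm{TARG}}(\xi^{\ddagger}_{E_{\mathrm{TARG}}}) - E_{\mathrm{REF}}(\xi^{\ddagger}_{E_{\mathrm{TARG}}}) \right\rangle_{E_{\mathrm{REF}}} \right) \tag{4}$$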

where the qs are the partition functions at the specific RC values.

Finally, the third important component of the PD approach is the refinement of the RP, which brings it closer to the TP and ensures fast convergence of the FEP calculation from the RP to the TP. The above LRA expression also provides a framework for the PD refinement procedure, where the EVB parameters are refined by seeking the minimum of the least-squares function:9

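$$\mathcal{T}(\mathbf{p}, \mathbf{r}) = k_1 \sum_{i=1}^{N} \left( E_i^{\mathrm{EVB}}(\mathbf{p}, \mathbf{r}) - E_i^{\mathrm{QM}}(\mathbf{r}) \right)^2 + k_2 \sum_{i=1}^{N} \sum_{j=1}^{3} \left( \frac{\partial E_i^{\mathrm{EVB}}(\mathbf{p}, \mathbf{r})}{\partial x_j} - \frac{\partial E_i^{\mathrm{QM}}(\mathbf{r})}{\partial x_j} \right)^2 \tag{5}$$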

by either using a simplified Newton–Raphson approach, where we refine one EVB parameter, pi, at a time and the result of the (k+1)-th iteration is given by

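$$p_i^{k+1} = p_i^{k} - g \left[ \frac{\partial \mathcal{T}}{\partial p_i} \right]^{k} \Big/ \left[ \frac{\partial^{2} \mathcal{T}}{\partial p_i^{2}} \right]^{k} \tag{6}$$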

or just by refining the vector of EVB parameters, p, in the optimal steepest descent approach, where the result of the k+1-th iteration is given by

$$\mathbf{p}^{k+1} = \mathbf{p}^{k} - s \cdot \mathrm{grad}\, \mathcal{T}(\mathbf{p}^{k}) \tag{7}$$
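The steepest descent update of eq 7, applied to the least-squares function of eq 5, can be illustrated on a toy problem. Everything below is a sketch under stated assumptions: a one-dimensional coordinate (so the sum over j has a single term), a hypothetical two-parameter model standing in for the EVB energy, synthetic "QM" data, a finite-difference gradient, and a backtracking choice of the step s rather than the authors' optimal-step procedure.

```python
K1, K2 = 1.0, 1.0   # weights of the energy and gradient terms (assumed)
R0 = 1.8            # reference geometry of the toy model (assumed)
GEOMS = [R0 + 0.1 * i for i in range(-3, 4)]   # sampled geometries

def e_model(p, r):
    """Hypothetical stand-in for E^EVB(p, r): offset plus harmonic term."""
    return p[0] + p[1] * (r - R0) ** 2

def de_model(p, r):
    return 2.0 * p[1] * (r - R0)

def e_qm(r):
    """Synthetic target energies (not real QM data)."""
    return 2.0 + 50.0 * (r - R0) ** 2

def de_qm(r):
    return 100.0 * (r - R0)

def target(p):
    """Eq 5 for the one-dimensional toy."""
    e_term = sum((e_model(p, r) - e_qm(r)) ** 2 for r in GEOMS)
    g_term = sum((de_model(p, r) - de_qm(r)) ** 2 for r in GEOMS)
    return K1 * e_term + K2 * g_term

def gradient(f, p, h=1e-5):
    """Central finite-difference gradient of f with respect to p."""
    grad = []
    for i in range(len(p)):
        up = p[:]
        up[i] += h
        dn = p[:]
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad

def steepest_descent(f, p, iters=300):
    """Eq 7 with the step s chosen by backtracking (sufficient decrease)."""
    for _ in range(iters):
        g = gradient(f, p)
        g2 = sum(x * x for x in g)
        f0 = f(p)
        s = 1.0
        for _ in range(40):
            trial = [pi - s * gi for pi, gi in zip(p, g)]
            if f(trial) <= f0 - 0.3 * s * g2:
                p = trial
                break
            s *= 0.5
    return p

p_fit = steepest_descent(target, [0.0, 30.0])
print("refined parameters:", [round(v, 4) for v in p_fit])
```

In this toy the model can reproduce the target exactly, so the refined parameters approach the values used to generate the synthetic data; with a real EVB potential the minimum of eq 5 is generally nonzero.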

The PD approach can be applied to any RP. However, the EVB potential has been the RP of choice, due to its very low computational cost, good performance in condensed phases (in particular in studies of enzymatic reactions16), physically consistent treatment of the solute–solvent electrostatic coupling, and other advantages discussed elsewhere.37

While the refinement scheme given above is for EVB parameters, we also provide below a benchmark example of using a more expensive (compared to EVB) semiempirical RP to demonstrate that the PD approach is general as far as the RP is concerned.

II.2. Validating the LRA by a Full FEP Treatment.

II.2.1. The Case of Two General Potentials.

Although the LRA treatment of eq 4 has been effective, its overall convergence may be problematic when the difference between the RP and the TP is significant. This has led some to assume that the validity of the LRA treatment limits application of our PD approach. However, what is missing in this assumption is the fact that LRA is just the end point approximation of the full FEP treatment which can be easily implemented by using

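$$\Delta\Delta G_{\mathrm{FEP}} = -kT \sum_{m=1}^{n-1} \ln \left\langle \exp\left(-\frac{E_{m+1} - E_m}{kT}\right) \right\rangle_{E_m} \tag{8}$$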

where n is the number of simulation windows, between which the perturbation parameter λ (here weight of the TP) changes from 0 to 1, and Em is the mapping potential given by

$$E_m = (1 - \lambda_m) E_{\mathrm{REF}} + \lambda_m E_{\mathrm{TARG}} \tag{9}$$

EREF and ETARG designate the RP and the TP, respectively. Better convergence is obtained, however, if we take the average of forward and backward FEP (see, e.g., ref 34):

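$$\Delta\Delta G_{\mathrm{FEP}} = \frac{1}{2}\left( \sum_{m=1}^{n-1} -kT \ln \left\langle \exp\left(-\frac{E_{m+1} - E_m}{kT}\right) \right\rangle_{E_m} - \sum_{m=2}^{n} -kT \ln \left\langle \exp\left(-\frac{E_{m-1} - E_m}{kT}\right) \right\rangle_{E_m} \right) \tag{10}$$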
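The forward, backward, and averaged FEP estimates of eqs 8-10 can be illustrated on a model system. The sketch below is not the simulation setup of this work: it assumes two hypothetical one-dimensional harmonic potentials with the same force constant, for which each mapping potential of eq 9 has a Gaussian Boltzmann distribution that can be sampled exactly, and for which the exact free energy difference equals the constant offset `C`.

```python
import math
import random

KT = 0.596                                  # kcal/mol, roughly 300 K
K_SPRING, A, B, C = 10.0, 0.0, 0.5, 2.0     # hypothetical toy parameters

def e_ref(x):
    return 0.5 * K_SPRING * (x - A) ** 2

def e_targ(x):
    return 0.5 * K_SPRING * (x - B) ** 2 + C

def e_map(x, lam):
    """Mapping potential of eq 9."""
    return (1.0 - lam) * e_ref(x) + lam * e_targ(x)

def sample(lam, n, rng):
    """Exact Boltzmann samples of e_map: a Gaussian centered at (1-lam)*A + lam*B."""
    mu = (1.0 - lam) * A + lam * B
    sigma = math.sqrt(KT / K_SPRING)
    return [rng.gauss(mu, sigma) for _ in range(n)]

def fep_step(lam_from, lam_to, xs):
    """-kT ln < exp(-(E_to - E_from)/kT) >, averaged over samples of E_from."""
    vals = [math.exp(-(e_map(x, lam_to) - e_map(x, lam_from)) / KT) for x in xs]
    return -KT * math.log(sum(vals) / len(vals))

def fep(n_windows=11, n_samples=5000, seed=0):
    rng = random.Random(seed)
    lams = [m / (n_windows - 1) for m in range(n_windows)]
    data = [sample(lam, n_samples, rng) for lam in lams]
    forward = sum(fep_step(lams[m], lams[m + 1], data[m])
                  for m in range(n_windows - 1))            # eq 8
    backward = sum(fep_step(lams[m], lams[m - 1], data[m])
                   for m in range(1, n_windows))            # TARG -> REF
    return forward, backward, 0.5 * (forward - backward)    # eq 10

forward, backward, average = fep()
print(f"forward {forward:.3f}, backward {backward:.3f}, average {average:.3f} kcal/mol")
```

The backward sum estimates the free energy of moving from the TP to the RP, so it enters eq 10 with a minus sign; hysteresis between the forward and backward values gives a rough indication of convergence.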

With the accurate results of the FEP method, we can explore the validity of the end point LRA, which gives (see also eq 4) the relevant ΔΔG by15

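$$\Delta\Delta G_{\mathrm{LRA}} = \frac{1}{2}\left( \left\langle E_{\mathrm{TARG}} - E_{\mathrm{REF}} \right\rangle_{E_{\mathrm{REF}}} + \left\langle E_{\mathrm{TARG}} - E_{\mathrm{REF}} \right\rangle_{E_{\mathrm{TARG}}} \right) \tag{11}$$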

Furthermore, to demonstrate how the performance of the LRA is affected by the difference between the two potentials, we also examined the n-points LRA between the adjacent mapping potentials given by eq 9:

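$$\Delta\Delta G_{n\mathrm{LRA}} = \sum_{m=1}^{n-1} \frac{1}{2}\left( \left\langle E_{m+1} - E_m \right\rangle_{E_m} + \left\langle E_{m+1} - E_m \right\rangle_{E_{m+1}} \right) \tag{12}$$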

Obviously, for large n, this expression converges to the regular FEP.
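The end-point LRA of eq 11 and the n-point LRA of eq 12 can be compared on the same kind of model system. As before, this is a sketch under stated assumptions: two hypothetical harmonic potentials with equal force constants, sampled exactly, for which the exact free energy difference equals the offset `C`. For such a system the response is strictly linear, so the two-sided LRA is exact, while the one-sided averages are not.

```python
import math
import random

KT = 0.596                                  # kcal/mol, roughly 300 K
K_SPRING, A, B, C = 10.0, 0.0, 0.5, 2.0     # hypothetical toy parameters

def e_ref(x):
    return 0.5 * K_SPRING * (x - A) ** 2

def e_targ(x):
    return 0.5 * K_SPRING * (x - B) ** 2 + C

def e_map(x, lam):
    return (1.0 - lam) * e_ref(x) + lam * e_targ(x)

def sample(lam, n, rng):
    """Exact Boltzmann samples of the mapping potential (Gaussian)."""
    mu = (1.0 - lam) * A + lam * B
    return [rng.gauss(mu, math.sqrt(KT / K_SPRING)) for _ in range(n)]

def mean(values):
    return sum(values) / len(values)

def lra_estimates(n_windows=6, n_samples=5000, seed=0):
    rng = random.Random(seed)
    lams = [m / (n_windows - 1) for m in range(n_windows)]
    data = [sample(lam, n_samples, rng) for lam in lams]
    # Single-point averages of the energy gap on the RP and on the TP.
    gap_on_ref = mean([e_targ(x) - e_ref(x) for x in data[0]])
    gap_on_targ = mean([e_targ(x) - e_ref(x) for x in data[-1]])
    end_point = 0.5 * (gap_on_ref + gap_on_targ)              # eq 11
    n_point = 0.0                                             # eq 12
    for m in range(n_windows - 1):
        on_m = mean([e_map(x, lams[m + 1]) - e_map(x, lams[m]) for x in data[m]])
        on_next = mean([e_map(x, lams[m + 1]) - e_map(x, lams[m]) for x in data[m + 1]])
        n_point += 0.5 * (on_m + on_next)
    return gap_on_ref, gap_on_targ, end_point, n_point

gap_on_ref, gap_on_targ, end_point, n_point = lra_estimates()
print(f"<gap>_REF {gap_on_ref:.2f}, <gap>_TARG {gap_on_targ:.2f}, "
      f"LRA {end_point:.2f}, n-point LRA {n_point:.2f} kcal/mol")
```

In this toy the single-point averages differ from the exact value by about 1.25 kcal/mol in opposite directions, which illustrates why the average is taken on both potentials.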

In the present case, we start our examination of the validity of the LRA treatment by evaluating the FEP and LRA perturbations between the PM3MM38 and PM6MM39 potentials of the MOPAC200940 package, with the correct treatment of the solute polarization1,7 described in section II.3.3 (see also simulation 1 in the Supporting Information for additional computational details). This is done for an SN2 reaction in water between methyl chloride and chloride (here PM3MM is the RP, and PM6MM is the TP).

For the FEP treatment at the TS, we used (in addition to the QMMM potential) the harmonic potential:

$$E_{\mathrm{CONS}} = K(\xi - \xi_0)^2 \tag{13}$$

where ξ is the reaction coordinate defined as

$$\xi = R(\mathrm{C{-}Cl_A}) - R(\mathrm{C{-}Cl_B}) \tag{14}$$

Note that $\xi_0(x^{\ddagger}_{\mathrm{PM3}}) = \xi_0(x^{\ddagger}_{\mathrm{PM6}}) = 0$.
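The reaction coordinate of eq 14 and the harmonic constraint of eq 13 are simple to evaluate from Cartesian coordinates. The helper names and the example geometry below are hypothetical; the force constant of 100 kcal/(mol Å²) is the value used in the constrained potentials of this section.

```python
import math

def reaction_coordinate(c, cl_a, cl_b):
    """Eq 14: xi = R(C-Cl_A) - R(C-Cl_B); coordinates in Å."""
    return math.dist(c, cl_a) - math.dist(c, cl_b)

def constraint_energy(xi, xi0, k=100.0):
    """Eq 13: E_CONS = K (xi - xi0)^2, with K in kcal/(mol Å^2)."""
    return k * (xi - xi0) ** 2

def constrained_energy(e_qmmm, xi, xi0, k=100.0):
    """QMMM energy plus the harmonic constraint along the RC."""
    return e_qmmm + constraint_energy(xi, xi0, k)

# Hypothetical symmetric geometry: both C-Cl distances equal, so xi = 0.
carbon = (0.0, 0.0, 0.0)
cl_a = (2.3, 0.0, 0.0)
cl_b = (-2.3, 0.0, 0.0)
xi_ts = reaction_coordinate(carbon, cl_a, cl_b)
print(xi_ts, constraint_energy(0.1, 0.0))
```

A displacement of 0.1 Å from the target RC value costs about 1 kcal/mol with this force constant, which is why the sampling stays close to ξ0.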

We also evaluated the free energy by the thermodynamic integration method (which is very similar to the regular FEP approach), using

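$$\Delta\Delta G_{\mathrm{TDI}} = \int_0^1 \frac{\partial G}{\partial \lambda}\,d\lambda = \sum_{m=2}^{n} \frac{1}{2}\left( \left\langle E_{\mathrm{TARG}} - E_{\mathrm{REF}} \right\rangle_{E_m} + \left\langle E_{\mathrm{TARG}} - E_{\mathrm{REF}} \right\rangle_{E_{m-1}} \right) \Delta\lambda \tag{15}$$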

The ΔΔG was evaluated for the difference between the potentials EREF and ETARG, which correspond, respectively, to EPM3MM and EPM6MM. These potentials were contained within the TS region by

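$$E_{\mathrm{REF}} = E_{\mathrm{PM3MM}} + 100\left(\xi - \xi_0\big|_{\xi_0 = 0}\right)^2, \qquad E_{\mathrm{TARG}} = E_{\mathrm{PM6MM}} + 100\left(\xi - \xi_0\big|_{\xi_0 = 0}\right)^2 \tag{16}$$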

Similarly, for the RS, we have

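$$E_{\mathrm{REF}} = E_{\mathrm{PM3MM}} + 100\left(\xi - \xi_0\big|_{\xi_0^{RS} = 1.75}\right)^2, \qquad E_{\mathrm{TARG}} = E_{\mathrm{PM6MM}} + 100\left(\xi - \xi_0\big|_{\xi_0^{RS} = 1.75}\right)^2 \tag{17}$$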

The performance of the FEP and LRA calculations is compared in Figure 2, where we see remarkably good performance of the end-point LRA treatment with nearly identical results to those of the full FEP treatment. Note that Figure 2 gives estimates of the free energy difference between the RP and the TP, which were modified by adding the constraining harmonic potentials in order to keep the trajectories at the specific RC values (RS and TS). Note, also, that the harmonic potentials used to obtain the sampling within the RS and within the TS have no bias effect at the RC values (specified by the corresponding ξ0) at which the FEP was calculated, and a small effect in the vicinity of these points.

Figure 2.


Comparative performance of the LRA approach and of a full multi-step FEP treatment in estimating the free energy of moving from $E_{\mathrm{PM3MM}}$ to $E_{\mathrm{PM6MM}}$ at the TS (A) and at the RS (B). The estimates were obtained by: 2-point LRA (black dashes); single-point LRA with the average of the energy gap, $E_{\mathrm{PM6MM}} - E_{\mathrm{PM3MM}}$, calculated on $E_{\mathrm{PM3MM}}$ (red dashes) and on $E_{\mathrm{PM6MM}}$ (green dashes); forward FEP (magenta squares); backward FEP (cyan triangles); average FEP (blue line); n-point LRA (black triangles); TDI (orange stars). Note that FEP-based estimates are hardly distinguishable due to overlap (they are shown in insets).

Of course, a convincing validation requires us to show that the barrier calculated for the TP using the PD approach is the same as the barrier obtained in a separate PMF calculation. In order to be able to compare the ΔΔg of the PD treatment to the corresponding difference between the full PMF calculations we have to take into account the fact that what we get from the PMF corresponds to the incomplete partition functions, qs, of eq 4, whereas the FEP simulations described above sample the whole phase space. Thus, strictly speaking, in order to compare with the difference between the individual PMFs, we have to determine ΔΔg(ξ) through the difference between the free energy functions of the RC (at its specific values with removing the effect of the constraining harmonic potential). This, in turn, can be compared to the ΔΔG estimates obtained by the FEP treatment (eqs 8-15) between the potentials of eqs 16 and 17. This can be done with a treatment, which is a variant of the EVB FEP/US mapping formulation.11,12,41 Namely,

$$\Delta\Delta g_{\mathrm{PM3MM} \to \mathrm{PM6MM}}(\xi) = \Delta g_{\mathrm{PM6MM}}(\xi) - \Delta g_{\mathrm{PM3MM}}(\xi) \tag{18}$$

where the free energy functions are given by

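$$\Delta g_{\mathrm{PM3MM}}(\xi) = \Delta\Delta G_m - kT \ln \left\langle \delta(x - \xi) \exp\left(-\frac{E_{\mathrm{PM3MM}} - E_m}{kT}\right) \right\rangle_{E_m} \tag{19}$$

and

$$\Delta g_{\mathrm{PM6MM}}(\xi) = \Delta\Delta G_m - kT \ln \left\langle \delta(x - \xi) \exp\left(-\frac{E_{\mathrm{PM6MM}} - E_m}{kT}\right) \right\rangle_{E_m}$$

where

$$E_m = (1 - \lambda_m) E_{\mathrm{PM3MM}} + \lambda_m E_{\mathrm{PM6MM}} + E_{\mathrm{CONS}}$$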

In this expression, we deal with the change from EREF to ETARG (in the EVB treatment, we go from E1 to E2) and ΔΔGm is the result of moving from EPM3MM+ECONS to Em by the FEP procedure. The second term removes the effect of the constraining potential and selects the particular values of RC.
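The way the second term of eq 19 removes the constraining potential can be illustrated with a one-dimensional toy. The sketch below assumes a hypothetical inverted-parabola "barrier top" standing in for the unconstrained potential, a single mapping window (so ΔΔG_m is set to zero), exact Gaussian sampling of the constrained potential, and a histogram bin in place of the delta function. None of these choices come from the simulations of this work.

```python
import math
import random

KT = 0.596               # kcal/mol, roughly 300 K
K_CONS, XI0 = 100.0, 0.0 # constraint of eq 13, centered at the TS
CURV = 40.0              # curvature of the toy barrier top (assumed)

def e_true(x):
    """Toy stand-in for the unconstrained potential along the RC."""
    return -CURV * x * x

def e_cons(x):
    return K_CONS * (x - XI0) ** 2

def e_map(x):
    """Mapping potential: unconstrained potential plus the constraint."""
    return e_true(x) + e_cons(x)

def sample_mapping(n, rng):
    # With XI0 = 0, e_map = (K_CONS - CURV) x^2 is harmonic, so its
    # Boltzmann distribution is a Gaussian centered at zero.
    sigma = math.sqrt(KT / (2.0 * (K_CONS - CURV)))
    return [rng.gauss(XI0, sigma) for _ in range(n)]

def free_energy_function(xs, xi, bin_width, ddg_m=0.0):
    """Eq 19 with the delta function replaced by a bin of width bin_width."""
    total = sum(math.exp(-(e_true(x) - e_map(x)) / KT)
                for x in xs if abs(x - xi) <= 0.5 * bin_width)
    density = total / (len(xs) * bin_width)
    return ddg_m - KT * math.log(density)

rng = random.Random(1)
xs = sample_mapping(50000, rng)
dg_ts = free_energy_function(xs, 0.0, 0.02)
dg_off = free_energy_function(xs, 0.1, 0.02)
print(f"dg(0.1) - dg(0.0) = {dg_off - dg_ts:.3f} kcal/mol (toy potential gives -0.4)")
```

Although the samples are drawn from the constrained potential, the reweighted profile recovers the downhill shape of the unconstrained toy potential, which is the sense in which the bias is removed.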

To demonstrate the point discussed above, we start by building a histogram for the MD trajectories to see the distribution of the energy gap between the two potentials, $E_{\mathrm{PM3MM}} - E_{\mathrm{PM6MM}}$. As can be seen from Figure 3A, the distribution of the energy gap for the MD trajectories propagated during calculation of the free energy of moving from the RP to the TP at the RS is centered at −3.5 kcal/mol. Similarly, the corresponding gap evaluated near the TS is centered at 4 kcal/mol. These are the most probable differences between the two potentials constrained at the RS and at the TS. Adding up these estimates gives an estimate of the total difference between the activation free energy barriers of the RP and of the TP, which is 7.5 kcal/mol. These estimates are close to the values obtained by performing the FEP between the two potentials (see Figure 2A and B).

Figure 3.


(A) The distribution of the energy gap, $E_{\mathrm{PM3MM}} - E_{\mathrm{PM6MM}}$, obtained from the MD trajectory runs used in FEP calculations while moving from $E_{\mathrm{PM3MM}} + E_{\mathrm{CONS}}(\xi)$ to $E_{\mathrm{PM6MM}} + E_{\mathrm{CONS}}(\xi)$, where $\xi = d(\mathrm{C{-}Cl_L}) - d(\mathrm{C{-}Cl_A})$ is the solute RC, in the RS and TS regions (designated by red and green, respectively). (B) The sampling distribution of the nuclear RC, obtained from the MD runs used in FEP calculation of moving from $E_{\mathrm{PM3MM}} + E_{\mathrm{CONS}}(\xi)$ to $E_{\mathrm{PM6MM}} + E_{\mathrm{CONS}}(\xi)$ in the RS and TS regions (which are designated by red and green, respectively).

Further examination of the distribution of the nuclear RC is given in Figure 3B. The figure shows that the harmonic potentials effectively narrowed the sampling to the corresponding target values of the RC. Thus, one can evaluate the free energy perturbations at any value of the RC specified by the ξ0 parameter of the harmonic potential. While the ξ0 parameter specifies the RC value at which the free energy perturbation is estimated, the force constant controls the width of the RC distribution. Note, however, that all other degrees of freedom are sampled. We also found that, while increasing the force constant further narrowed the distribution in the RC, it had very little effect on the distribution of the energy gap and almost no effect on the FEP estimate.
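As a rough guide to how the force constant controls the width (an illustrative estimate, not a result of this work), if the constraint of eq 13 dominated the local curvature along the RC, the RC distribution would be Gaussian with standard deviation

$$\sigma_{\xi} \approx \sqrt{\frac{kT}{2K}} = \sqrt{\frac{0.596}{2 \times 100}}\ \text{Å} \approx 0.055\ \text{Å}$$

assuming T ≈ 300 K (kT ≈ 0.596 kcal/mol) and K = 100 kcal/(mol Å²). Under the same assumption, quadrupling K halves the width; the curvature of the underlying potential along the RC shifts the actual value.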

At this stage, we estimated the free energy functions of the RC at the RS and at the TS regions, ΔgPM3MM(ξ) and ΔgPM6MM(ξ), using eq 19 (note that this treatment removes the bias potential of eqs 16 and 17). The resulting differences between the PM6MM and PM3MM free energy functions, ΔΔg(ξ), are shown in Figure 4, e.g., ~−4 kcal/mol at the TS (ξ=0) and ~3.5 kcal/mol at the RS (ξ=1.75). Thus, we demonstrated that the estimates obtained by the full FEP and by the LRA evaluation of the free energy of moving from the RP to the TP (which are modified by the harmonic potentials to keep the sampling within the RS and the TS regions) are in excellent agreement with the difference between the free energy functions at the corresponding RC values (obtained by removing the bias potential using eq 19).

Figure 4.


The free energy functions of the solute RC, ξ, obtained at the RC values corresponding to the TS (A) and to the RS (B), for the full mapping between PM3MM and PM6MM. The blue line designates ΔgPM6MM and the red line designates ΔgPM3MM, obtained as weighted averages over all mapping potentials, blue and red dots, respectively.

Finally, we compared in Figure 5 our careful evaluation to the difference between the PMFs calculated for PM3MM and PM6MM potentials. Each PMF was obtained (see also simulation 2 in the Supporting Information) by introducing a harmonic constraint at different RC values, ξ0, along the reaction path, and combining the simulation windows using the WHAM equations.42-44 Inspecting the PMFs, one finds that the difference between the activation free energy barriers is about 7.5 kcal/mol, which is in perfect agreement with the results obtained by the FEP-LRA treatment (eqs 8-15) and by the FEP/US approach (eqs 18 and 19).

Figure 5.


The PMF obtained for the PM3MM potential with the solute charges derived by the Mulliken analysis (red) and by the ESP-fitting (blue) as well as for PM6MM potential with the Mulliken solute charges (green). The PMFs have been calculated by the WHAM procedure.

In summary, in the above section, we demonstrated by using two MO-type QMMM potentials that the LRA approach of eq 4 provides a very good approximation for the full FEP/US treatment while giving significant time savings compared to the full FEP treatment. Also note that the LRA approach performs remarkably well when the averages are calculated on both potentials. Finally, by evaluating both the PMFs for the TP and for the RP and the free energy perturbation from the RP to the TP at the TS and the RS, the consistency of our approach is established.

II.2.2. Using the EVB as a Reference Potential.

In this section, we will expand on the FEP-LRA evaluation of ΔΔG for moving from the EVB RP to a QMMM potential. Here it is useful to comment on the evaluation of eq 4 at particular values of the RC when one uses EVB as the RP. As we have shown in the previous example, the harmonic constraint on the solute RC is quite efficient for estimating the FEP at a particular RC value for an MO-based QMMM potential (where it is sufficient to have a constraint with K = 100 kcal/(Å² mol) to stay at the specified RC). For the EVB potential, we found that a more effective way to keep EVB at its TS (where E1 = E2) involves using the approximation11

$$E_{\mathrm{EVB}}(x^{\ddagger}_{\mathrm{EVB}}) \approx 0.5 E_1 + 0.5 E_2 - H_{12} \tag{20}$$

where we use the fact that eq 20 is equal to the adiabatic EVB at the TS:

$$E_{\mathrm{EVB}} = c_1^2 E_1 + c_2^2 E_2 - 2 c_1 c_2 H_{12} \tag{21}$$

The use of this expression is equivalent to constraining the eigenvector components to be equal (so that both of the EVB diabatic states equally contribute to the adiabatic state).
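The equivalence between eq 20 and the adiabatic energy of eq 21 at E1 = E2 can be checked with the closed-form solution of a two-state problem. The sketch below assumes a real, nonzero off-diagonal element, treats H12 in eqs 20 and 21 as a positive magnitude, and uses arbitrary illustrative energies; the function names are hypothetical.

```python
import math

def evb_adiabatic(e1, e2, h12):
    """Ground-state energy and squared eigenvector components of
    the 2x2 matrix [[e1, h12], [h12, e2]] (requires h12 != 0 or e1 != e2)."""
    gap = e1 - e2
    root = math.sqrt(gap * gap + 4.0 * h12 * h12)
    e_ground = 0.5 * (e1 + e2) - 0.5 * root
    c1_sq = 0.5 * (1.0 - gap / root)
    return e_ground, c1_sq, 1.0 - c1_sq

def evb_ts_approx(e1, e2, h12):
    """Eq 20: equal weights on both diabatic states."""
    return 0.5 * e1 + 0.5 * e2 - abs(h12)

def evb_from_coefficients(e1, e2, h12, c1_sq, c2_sq):
    """Eq 21 written with squared coefficients and a positive H12 magnitude."""
    return c1_sq * e1 + c2_sq * e2 - 2.0 * math.sqrt(c1_sq * c2_sq) * abs(h12)

# At the EVB TS (E1 = E2) the two expressions coincide and c1^2 = c2^2 = 0.5.
e_ts, c1_ts, c2_ts = evb_adiabatic(10.0, 10.0, 30.0)
# Away from the TS, eq 20 lies above the adiabatic ground state.
e_off, c1_off, c2_off = evb_adiabatic(0.0, 40.0, 30.0)
print(e_ts, evb_ts_approx(10.0, 10.0, 30.0), e_off, evb_ts_approx(0.0, 40.0, 30.0))
```

Because the equal-weight combination is a trial state rather than the eigenstate, eq 20 is an upper bound to the adiabatic energy and is exact only where the diabatic energies cross.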

Here we estimate the FEP between the EVB and MO-based PM3MM potentials (see also simulation 3 in the Supporting Information) at the TS using the potentials:

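$$E_{\mathrm{REF}} = E_{\mathrm{EVB}}(x^{\ddagger}_{\mathrm{EVB}}) + 100\left(\xi - \xi_0\big|_{\xi_0 = 0}\right)^2 \quad \text{and} \quad E_{\mathrm{TARG}} = E_{\mathrm{PM3MM}} + 100\left(\xi - \xi_0\big|_{\xi_0 = 0}\right)^2 \tag{22}$$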

and at the RS using

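$$E_{\mathrm{REF}} = E_{\mathrm{EVB}} + 100\left(\xi - \xi_0\big|_{\xi_0 = 1.5}\right)^2 \quad \text{and} \quad E_{\mathrm{TARG}} = E_{\mathrm{PM3MM}} + 100\left(\xi - \xi_0\big|_{\xi_0 = 1.5}\right)^2$$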

The estimate of ΔΔG by the end-point LRA approach was found to provide again a very good approximation to the full FEP (see Figure 6). However, we see the importance of averaging on both the RP as well as on the TP.45 The FEPs at the TS and RS are, respectively, ~2 and ~6 kcal/mol; thus, the activation free energy barrier for EVB is ~4 kcal/mol higher than the PM3MM activation free energy barrier based on this estimate.

Figure 6.


Comparative performance of the LRA approach and of a full multi-step FEP treatment in estimating the free energy of moving from $E_{\mathrm{EVB}}$ to $E_{\mathrm{PM3MM}}$ at the TS (A) and at the RS (B). The estimates were obtained by: 2-point LRA (black dashes); single-point LRA with the average of the energy gap, $E_{\mathrm{PM3MM}} - E_{\mathrm{EVB}}$, calculated on $E_{\mathrm{EVB}}$ (red dashes) and on $E_{\mathrm{PM3MM}}$ (green dashes); forward FEP (magenta squares); backward FEP (cyan triangles); average FEP (blue line); n-point LRA (black triangles); TDI (orange stars). Note that FEP-based estimates are hardly distinguishable due to overlap (they are shown in insets).

In the next step, we constructed (see Figure 7) the histograms of the energy gap $E_{\mathrm{EVB}} - E_{\mathrm{PM3MM}}$ for the FEP calculations at the RS and at the TS. The histograms show that the most probable difference between the TP and the RP, $E_{\mathrm{PM3MM}} - E_{\mathrm{EVB}}$, is ~7 kcal/mol at the RS and ~2 kcal/mol at the TS, which practically coincides with the full FEP and LRA estimates.

Figure 7.


The distribution of the energy gap between the EVB potential and the PM3MM potential, obtained while performing FEP calculations at the RS (green) and the distribution of the gap between the EVB mapping potential and PM3MM while doing FEP at the TS (red).

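$$\Delta g_{\mathrm{PM3MM}}(\xi) = \Delta\Delta G_m - kT \ln \left\langle \delta(x - \xi) \exp\left(-\frac{E_{\mathrm{PM3MM}} - E_m}{kT}\right) \right\rangle_{E_m} \tag{23}$$

$$\Delta g_{\mathrm{EVB}}(\xi) = \Delta\Delta G_m - kT \ln \left\langle \delta(x - \xi) \exp\left(-\frac{E_{\mathrm{EVB}} - E_m}{kT}\right) \right\rangle_{E_m}$$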

Next, using eq 23, we find that the difference between the free energy functions at the TS, ξ=0, is ~4.5 kcal/mol and at the RS(ξ=1.5) is ~5.5 kcal/mol (see Figure 8). While this is close to the value found from the FEP calculation at the RS, the difference between the free energy functions at the TS is ~3 kcal/mol higher than the corresponding FEP estimate. This is probably due to use of the potential given by eq 20 in the FEP calculation, since if we construct the free energy function along the RC for the approximate EVB potential at the TS (see also Figure 9):

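$$\Delta g_{\mathrm{EVB}(x^{\ddagger}_{\mathrm{EVB}})}(\xi) = \Delta\Delta G_m - kT \ln \left\langle \delta(x - \xi) \exp\left(-\frac{E_{\mathrm{EVB}}(x^{\ddagger}_{\mathrm{EVB}}) - E_m}{kT}\right) \right\rangle_{E_m} \tag{24}$$
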
Figure 8.


The free energy functions along the reaction coordinate $\xi = d(\mathrm{C{-}Cl_L}) - d(\mathrm{C{-}Cl_A})$ obtained by FEP calculations of the free energy for moving from the EVB potential to the PM3MM in water at the TS (A) and at the RS (B). The blue line designates $\Delta g_{\mathrm{EVB}}$ (weight-averaged over the blue points), while the red line designates $\Delta g_{\mathrm{PM3MM}}$ (weight-averaged over red points).

Figure 9.


The free energy functions along the reaction coordinate obtained by FEP calculations of the free energy for moving from the EVB mapping potential to PM3MM in water at the TS. The blue line designates the EVB function (weight-averaged over blue points), while the red line designates the PM3MM function (weight-averaged over red points).

The difference between the free energy functions at the TS is about 3 kcal/mol, which is close to the FEP estimate of 2 kcal/mol. In any case, the correction to the PM3MM free energy barrier based on eq 23 is about −1 kcal/mol, while on the basis of the FEP-LRA (with the approximation of the EVB potential at the TS by eq 24) it is −4 kcal/mol.

The calculated EVB free energy profile and the PM3MM PMF are given in Figure 10. The EVB free energy barrier is found to be ~1.5 kcal/mol higher than the PM3MM free energy barrier, which is in good agreement with the results calculated in the PD model using the FEP/US of eq 23. Alternatively, if the EVB free energy profile vs the EVB energy gap is taken as the reference free energy surface, for which the activation free energy barrier is ~3.5 kcal/mol higher than the barrier on the PM3MM PMF, good agreement with the corresponding FEPLRA estimate of 4 kcal/mol is achieved.

Figure 10.

(A) The free energy profile along the nuclear RC obtained by the FEP/US approach using the EVB reference potential (blue dots) and PMF for the PM3MM target potential (red line); (B) EVB FEP/US free energy profile along the energy gap between the EVB diabatic states.

The above approach evaluates only the vertical free energy changes, and thus when the TS coordinates on the RP and on the TP are different, we have to evaluate the PMFs on both surfaces in the TS region (see ref 16 for the full discussion of this case). More specifically, if the RC for the QM TS is known (e.g., from the gas phase or from a minimization with an implicit solvent model), we can evaluate the vertical transition from the QM TS on the RP and then use the PMF (again on the RP) to obtain the free energy of moving from the QM TS on the EVB potential to the actual EVB TS. In this case, the PMF on the EVB potential can be estimated using the FEP/US method,6 as is done in Figure 10. If we do not know the exact QM TS, then the PMF on the TP is required. This can be obtained by one of the approaches described below, applied only in the TS region, which still gives significant time savings compared to evaluating the PMF over the full range of the RC. In general, there is no need to use any constraints while performing the perturbation at the RS, since the corresponding RSs are effectively sampled when the trajectories are propagated on both the RP and the TP.

II.3. PD Refinement of the Reference Potential by Modifying Its Functional Form.

II.3.1. Refining the Intramolecular Potential in the Gas Phase Using Gaussians.

In considering the PD refinement, we noted that it is possible to take advantage of the fact that evaluating the PES in the gas phase for chemical reactions is a relatively trivial operation, which is routinely performed nowadays. This makes the gas-phase PES an easily available source of data for our refinement procedure.

With the gas phase PES, we can simply fit the TP and the RP (e.g., EVB potential) by a set of Gaussian functions. For instance, for the TP, we minimize the least-squares function, using the optimal steepest descent with analytical derivatives, with respect to the parameters of the Gaussians:

$$\mathcal{T}(\{\alpha_k\},\{A_k\}) = \sum_{i=1}^{N} \left( \Gamma_{\mathrm{TARG}}(r_i) - E_{\mathrm{TARG}}(r_i) \right)^2 \tag{25}$$

The result of this fitting is a function, Γ, which approximates the TP in the range of the RC where the PES scan was performed:

$$\Gamma_{\mathrm{TARG}} = \sum_{k} A_k \exp\left(-\alpha_k (x - \xi_k)^2\right) \tag{26}$$

It was found that a combination of three Gaussians was sufficient to fit the PES for PM3, PM6, and EVB (see Figure 11 and Table 1). For the details of the fitting algorithm, see also the Supporting Information. This procedure can be carried out for both the RP and the TP. In the next step, we use the obtained Γ-functions to refine the original RP (e.g., the EVB RP) using:
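The fit of eqs 25 and 26 can be illustrated with a minimal sketch. It is not the Maple script from the Supporting Information: it assumes that the Gaussian centers are held fixed (eq 25 lists only the amplitudes and exponents as variables), it replaces the "optimal" step choice with a simple backtracking line search, and it uses a synthetic scan generated from the EVB row of Table 1. The starting guess and all function names are hypothetical.

```python
import math

def gamma(xi, params):
    """Sum of Gaussians (eq 26); params is a list of (A, alpha, center)."""
    return sum(A * math.exp(-alpha * (xi - c) ** 2) for A, alpha, c in params)

def loss_and_grad(params, scan):
    """Least-squares objective of eq 25 and its analytical derivatives
    with respect to each A_k and alpha_k (centers are kept fixed)."""
    loss = 0.0
    grad = [[0.0, 0.0] for _ in params]
    for xi, energy in scan:
        resid = gamma(xi, params) - energy
        loss += resid ** 2
        for k, (A, alpha, c) in enumerate(params):
            d2 = (xi - c) ** 2
            g = math.exp(-alpha * d2)
            grad[k][0] += 2.0 * resid * g            # d(loss)/dA_k
            grad[k][1] -= 2.0 * resid * A * d2 * g   # d(loss)/dalpha_k
    return loss, grad

def fit(params, scan, n_iter=200):
    """Steepest descent; a step is accepted only if it lowers the objective."""
    loss, grad = loss_and_grad(params, scan)
    for _ in range(n_iter):
        gnorm2 = sum(ga * ga + gb * gb for ga, gb in grad)
        step, accepted = 1.0, False
        while step > 1e-12 and gnorm2 > 0.0:
            # exponents are clamped at zero so the Gaussians stay bounded
            trial = [(A - step * ga, max(alpha - step * gb, 0.0), c)
                     for (A, alpha, c), (ga, gb) in zip(params, grad)]
            trial_loss, trial_grad = loss_and_grad(trial, scan)
            if trial_loss <= loss - 1e-4 * step * gnorm2:
                params, loss, grad = trial, trial_loss, trial_grad
                accepted = True
                break
            step *= 0.5
        if not accepted:
            break
    return params, loss

# Synthetic "PES scan" built from the EVB row of Table 1 (illustration only).
reference = [(-3.92, 1.58, -1.1), (6.98, 3.66, 0.0), (-3.92, 1.58, 1.1)]
scan = [(0.1 * i, gamma(0.1 * i, reference)) for i in range(-15, 16)]
guess = [(-3.0, 1.0, -1.1), (6.0, 3.0, 0.0), (-3.0, 1.0, 1.1)]  # hypothetical start
initial_loss, _ = loss_and_grad(guess, scan)
fitted, final_loss = fit(guess, scan)
print(initial_loss, final_loss)
```

In a real application the synthetic scan would be replaced by the energies from the gas-phase or implicit-solvent scan, and a quasi-Newton optimizer would likely converge faster than plain steepest descent.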

$$E'_{\mathrm{EVB}} = E_{\mathrm{EVB}} - \Gamma_{\mathrm{EVB}} + \Gamma_{\mathrm{QM}} \tag{27}$$

Figure 11.

Fitting the Γ-functions. (A) ΓEVB (green line) fitted to the EVB gas-phase PMF/WHAM (red triangles). The EVB potential used in the simulation (blue squares) was modified by removing the effect of the EVB distance constraints for the RC values outside the range −1 < ξ < 1. (B) PM3 and PM6 gas phase energy scans from MOPAC2009 (red stars and green triangles) and the corresponding Γ-functions.

Table 1.

Parameters for the Γ-Functions Presented in Figure 11 Composed of Three Gaussians: Ai exp(−αi(ξ − ξi)²)

| Potential | A1, A3 (kcal/mol) | A2 (kcal/mol) | α1, α3 (Å−1) | α2 (Å−1) | ξ1 (ξ3) (Å) | ξ2 (Å) |
|---|---|---|---|---|---|---|
| EVB | −3.92 | 6.98 | 1.58 | 3.66 | −1.1 (1.1) | 0.0 |
| PM3 | −4.39 | 6.80 | 2.40 | 2.22 | −1.0 (1.0) | 0.0 |
| PM6 | −5.25 | 3.35 | 2.89 | 3.70 | −1.0 (1.0) | 0.0 |
| PM3cosmo | −0.18 | 20.85 | 0.63 | 2.47 | −1.2 (1.2) | 0.0 |
| PM6cosmo | −1.02 | 16.58 | 1.34 | 2.16 | −1.1 (1.1) | 0.0 |

This refinement procedure aims to bring the RP close to the TP in a given range of the RC, since:

$$E_{\mathrm{TARGET}} \approx E'_{\mathrm{REF}} = E_{\mathrm{REF}} - \Gamma_{\mathrm{REF}} + \Gamma_{\mathrm{TARG}} \tag{28}$$

Now we have the refined RP that provides an improved approximation for the TP which is essential for good convergence of the LRA. This approach is particularly useful when the EVB functional form is sophisticated and the refinement of its parameters is lengthy and tedious.
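As an illustration of eqs 27 and 28, the following sketch evaluates the Γ-functions from the EVB and PM6 rows of Table 1 and applies the correction to a reference energy. It assumes the three-Gaussian form with a negative exponent, with A1 = A3 and α1 = α3 as the table layout suggests; the function names are hypothetical and the snippet is not taken from the authors' code.

```python
import math

# (A [kcal/mol], alpha, center [A]) triples read from Table 1;
# the outer Gaussians share A and alpha and sit at -xi_1 and +xi_1.
TABLE1 = {
    "EVB": [(-3.92, 1.58, -1.1), (6.98, 3.66, 0.0), (-3.92, 1.58, 1.1)],
    "PM6": [(-5.25, 2.89, -1.0), (3.35, 3.70, 0.0), (-5.25, 2.89, 1.0)],
}

def gamma(xi, name):
    """Gamma-function of the named potential at reaction coordinate xi."""
    return sum(A * math.exp(-a * (xi - c) ** 2) for A, a, c in TABLE1[name])

def refined_energy(e_ref, xi, ref="EVB", targ="PM6"):
    """E'_REF = E_REF - Gamma_REF(xi) + Gamma_TARG(xi), as in eq 28."""
    return e_ref - gamma(xi, ref) + gamma(xi, targ)

# Print the correction Gamma_PM6 - Gamma_EVB along the reaction coordinate.
for xi in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(xi, round(gamma(xi, "PM6") - gamma(xi, "EVB"), 2))
```

Because the correction depends only on the reaction coordinate, it can be added to the reference energy at every MD step at negligible cost.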

As a quick demonstration of the refinement procedure described above, we consider an example with an EVB potential serving as the RP and the PM6 potential serving as the TP (see also simulation 4 in the Supporting Information). Figure 12 illustrates the effect of the Γ-correction refinement (using eq 27) of the original EVB RP on the reference free energy surface. As seen from the figure, the calculated PMF barrier on the original EVB RP is 10.5 kcal/mol, whereas the PMF barrier on the refined EVB RP is 8.8 kcal/mol, which is in excellent agreement with the activation free energy barrier of 8.9 kcal/mol for PM6 (calculated using the harmonic approximation for the entropic contribution). Thus, this relatively simple functional form is quite effective in reducing the difference between the reference FES and the target FES.

Figure 12.

The gas-phase EVB PMF obtained by using the WHAM approach and the harmonic constraints on different values of the RC. The red line is for the PMF on the original EVB reference potential, while the green line is for the PMF obtained on the refined EVB reference potential (EEVB − ΓEVB + ΓPM6). The black stars are the free energy points calculated on PM6 (target potential) using the harmonic approximation for the activation entropy.

II.3.2. Application of the Gamma-Correction in PMF Calculations: Derivation.

Obviously the main application of QMMM calculations is in studies of reactions in condensed phases. However, prior to extension of the refinement approach proposed in section II.3.1 to the condensed phases, we would like to report several practical applications of this approach to calculating the PMF, which were found to be quite useful and can be also applied to condensed phase PMF calculations.

Since PMF calculations can be quite expensive, it is important to look for ways to optimize the corresponding calculations. Here we tried to exploit an element of the MTD approach (although, as will be clarified in the concluding discussion, with an entirely different philosophy). That is, as pointed out by van Gunsteren and co-workers32 and as implemented elegantly in the MTD33 approach, flattening the original potential by iteratively adding to it the negative of the original potential, approximated by a sum of Gaussians, improves the convergence of free energy sampling. Implementation of this strategy in the MTD approach results in a high computational cost of building the negative potential; see the detailed discussion in our recent paper.9 However, here we bypassed this cost by using the idea of the PD refinement described in section II.3.1. Namely, we derive the negative potential by fitting the corresponding PES in the gas phase by a sum of Gaussians using eqs 25 and 26. Thus, in addition to their use in the PD refinement, the Γ-functions of eq 26 help in convergence of the PMF calculations. This is achieved by flattening the original potential by adding to it the negative of the corresponding Γ-function. That is, following the FEP/US approach to PMF,34 we construct the mapping potentials of the form

$$E_m = (1-\lambda_m) E_{\mathrm{RS}} + \lambda_m E_{\mathrm{PS}} \tag{29}$$

where

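$$
\begin{aligned}
E_{\mathrm{RS}} &= E_{\mathrm{QM}} - \Gamma_{\mathrm{QM}} + E_{\mathrm{CONS}}(\xi_{\mathrm{RS}}) \\
E_{\mathrm{PS}} &= E_{\mathrm{QM}} - \Gamma_{\mathrm{QM}} + E_{\mathrm{CONS}}(\xi_{\mathrm{PS}})
\end{aligned}
\tag{30}
$$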

where one can, for instance, use ECONS of eq 13.
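The mapping potential of eqs 29 and 30 can be sketched as follows. Since eq 13 is not reproduced in this section, the constraint is assumed here to be a harmonic term K(ξ − ξ0)², which is the form that appears in eq 33; the force constant and the end-point values of ξ are hypothetical placeholders.

```python
def e_cons(xi, xi0, k=50.0):
    """Harmonic constraint K*(xi - xi0)**2 (assumed form, hypothetical K)."""
    return k * (xi - xi0) ** 2

def mapping_potential(e_qm, gamma_qm, xi, lam, xi_rs=-1.5, xi_ps=1.5, k=50.0):
    """E_m = (1 - lam)*E_RS + lam*E_PS, with both end states built on the
    flattened potential E_QM - Gamma_QM (eqs 29 and 30)."""
    flat = e_qm - gamma_qm
    e_rs = flat + e_cons(xi, xi_rs, k)
    e_ps = flat + e_cons(xi, xi_ps, k)
    return (1.0 - lam) * e_rs + lam * e_ps

# Energy of one frame on a ladder of mapping potentials (illustrative numbers).
for m in range(0, 11, 5):
    lam = m / 10.0
    print(lam, mapping_potential(e_qm=3.0, gamma_qm=2.5, xi=0.2, lam=lam))
```

With the same force constant at both ends, the two constraints combine into a single harmonic well centered at (1 − λ)ξRS + λξPS plus the constant Kλ(1 − λ)(ξRS − ξPS)², so changing λ simply slides the window along the flattened potential.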

To obtain the PMF, we start with the FEP approach for the corresponding change in free energy (using eqs 8 and 10). Then, the PMF is evaluated by using a modification of eq 19 (FEP/US approach):

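$$\Delta g(\xi) = \Delta G_m - kT \ln \left\langle \delta(x-\xi)\, \exp\left(-\frac{E_{\mathrm{TARG}}(x) - E_m(x)}{kT}\right) \right\rangle_{E_m} \tag{31}$$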

Furthermore, the results from different simulation windows are combined by

$$\overline{\Delta g}(\xi) = \sum_{i}^{\mathrm{frames}} \frac{N_i(\xi)}{\sum_{j}^{\mathrm{frames}} N_j(\xi)}\, \Delta g_i(\xi) \tag{32}$$

where Ni(ξ) is the number of times MD visited a particular RC value, ξ, while propagating on the ith mapping potential. The practical application of this approach to PMF calculations, when the solvent is modeled using an implicit solvent model, will be illustrated below.
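A minimal sketch of how eqs 31 and 32 could be evaluated from stored trajectory data is given below. It assumes that each frame stores the reaction coordinate together with the target and mapping energies, replaces the delta function by a histogram bin (so the constant bin-width factor is dropped), and takes kT ≈ 0.596 kcal/mol, which corresponds to roughly 300 K. All names and the data layout are hypothetical.

```python
import math
from collections import defaultdict

KT = 0.596  # kcal/mol, approximately 300 K (assumed temperature)

def window_profile(frames, delta_g_m, bin_width=0.1, kt=KT):
    """Eq 31 for one mapping potential.
    frames: list of (xi, e_targ, e_map) sampled on that mapping potential.
    Returns {bin_index: (delta_g, n_visits)}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for xi, e_targ, e_map in frames:
        b = math.floor(xi / bin_width)
        sums[b] += math.exp(-(e_targ - e_map) / kt)
        counts[b] += 1
    n_total = len(frames)
    return {b: (delta_g_m - kt * math.log(sums[b] / n_total), counts[b])
            for b in sums}

def combine(profiles):
    """Eq 32: average over windows weighted by the number of visits to each bin."""
    num = defaultdict(float)
    den = defaultdict(int)
    for profile in profiles:
        for b, (dg, n) in profile.items():
            num[b] += n * dg
            den[b] += n
    return {b: num[b] / den[b] for b in num}

# Toy data: four frames in which the target and mapping energies coincide.
toy = [(0.05, 1.0, 1.0), (0.05, 2.0, 2.0), (0.15, 0.5, 0.5), (0.15, 0.0, 0.0)]
print(combine([window_profile(toy, delta_g_m=0.0)]))
```

In the toy case the exponential weights are all unity, so each bin simply returns −kT ln of its population fraction; with real data the weights remove the bias of the constraint and of the Γ-function.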

II.3.3. Application of the Gamma-Correction in PMF Calculations: Practical Examples for Condensed Phase Calculations.

When moving to a condensed phase, we should take into account the fact that different QM(ai)MM potentials will have different charge distributions for the solute, and that this has to be reflected in the RP. A straightforward application of the Γ-function approach (described in section II.3.2) to condensed phases is to evaluate the PES with a solvent model, and subsequently derive the corresponding Γ-function that reflects the solvation effects. This strategy is obviously impractical when one deals with explicit solvent models, which are extremely expensive for QM(ai)MM studies. However, performing reasonable minimization is not expensive when one uses an implicit solvent model.

To illustrate this approach, as a first step, we examined the option of using the Γ-function which captures the solvation effect by the COSMO model.46 As a test system, we took the same SN2 reaction described above and fitted the ΓCOSMO function to the PES obtained using MOPAC200940 with the PM3+COSMO combination (see Figure 13). The free energy profile, ΔgPM3,COSMO, was obtained by the approach described in section II.3.2 with the MD simulation run on the flat potential, EPM3,COSMO − ΓCOSMO. The high efficiency of sampling at the TS region and its uniform distribution along the studied range of the RC is shown in Figure 14.

Figure 13.

The PMFs obtained for the PM3 potential with explicit (green) and implicit COSMO (red) solvation models. The PMFs were obtained by adding to the PM3(COSMO) and to the PM3MM potentials the negative of ΓCOSMO. The figure also depicts the ΓCOSMO (blue line) function, derived from PES performed on the PM3 potential with the COSMO solvation model (blue triangles).

Figure 14.

The distribution of the nuclear RC during MD simulation on the potential EPM3MM − ΓCOSMO (red) and on the potential EPM3+COSMO − ΓCOSMO (green). The histogram for EPM3MM − ΓCOSMO shows poor sampling at the TS for the explicit model, since ΓCOSMO does not make the explicit potential fully flat.

The proposed approach provides an effective and powerful way that outperforms the recent attempts to obtain the FES of a solute in implicit solvent models.47 In particular, the FEP/US PMF allows movement along the specified reaction path in the most efficient way as far as the sampling is concerned (since the FEP increments are chosen in terms of the energy difference between the two adjacent states, which is what matters most for the fastest free energy convergence). The representative Γ-function flattens the original potential within a given simulation window, and thus it naturally allows for proper sampling in all directions perpendicular to the RC.

Unfortunately, the implicit solvent models have difficulties in capturing changes of the solute cavity during chemical reaction (even with a reasonable calibration) and can miss other microscopic features of solvent. Thus, we may have significant errors in estimates of the TS solvation (see Table 2 and simulation 5 in the Supporting Information). Furthermore, the implicit models cannot capture the microscopic physics of protein interiors.

Table 2.

Comparing the Solvation Energies for the COSMO and for the Explicit SCAAS Models

| Structure | ΔGAC(0 → Qgas), kcal/mol | ΔHf,COSMO − ΔHf,GAS, kcal/mol |
|---|---|---|
| XTS(PM3) (fixed) | −52.1 | −61.1 |
| XRS(PM3) (fixed) | −67.8 | −70.4 |
| ΔΔGsolv | 15.7 | 9.3 |

In an attempt to overcome the limitations of implicit solvent models, we considered the performance of the proposed approach for the explicit solvent models. We obtained the PMF with the explicit solvent model, ΔgPM3MM, (using the ESP-derived QM charges) by running on the potential

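$$E_m = \langle \Psi_{\mathrm{polar}} | H_{\mathrm{PM3MM}} | \Psi_{\mathrm{polar}} \rangle - \Gamma_{\mathrm{PM3,COSMO}} + (1-\lambda_m) K (\xi - \xi_{\mathrm{RS}})^2 + \lambda_m K (\xi - \xi_{\mathrm{PS}})^2 \tag{33}$$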

with the polarized PM3MM Hamiltonian

$$h_{\mathrm{PM3MM}}(i) = h_{\mathrm{PM3}}(i) + \sum_{j}^{\mathrm{MM\ atoms}} \frac{q_j}{r_{ij}} \tag{34}$$

which we have implemented in MOPAC2009, following our earlier work7 (see also simulation 6 in the Supporting Information). While the resulting PMF (see Figure 13) is in excellent agreement with the PMF separately obtained by the WHAM approach (see simulation 2 in the Supporting Information), the efficiency of sampling on the potential given by eq 33 at the TS (shown in Figure 14) is very low. This can be explained by comparing the barriers in the explicit and the implicit solvation models. The addition of ΓCOSMO does not make the TS region on the explicit potential fully flat due to the differences in the TS solvation shown in Table 2.

The PMF on the PM3+COSMO potential resulted in a lower free energy barrier than that obtained by the energy scan. Furthermore, with the explicit solvent model, we also obtained a higher free energy barrier. To explain the difference in the activation free energy between different solvent models, we calculated the solvation free energies for the gas-phase optimized PM3 RS and TS using MOPAC2009 with the COSMO46 model and by the adiabatic charging model.35 From the results given in Table 2, one can see that there is a difference in solvation of the RS and of the TS between the explicit and implicit models, with the highest disagreement between the implicit and explicit solvent models found for the TS solvation. Essentially, for this SN2 reaction, ΔΔGSOLV is a rough approximation of the contribution to the activation free energy barrier due to the solvation difference (if we add the gas phase barrier of ~12 kcal/mol, we get roughly the PM3MM barrier given in Figure 13). The deficiencies of the barrier obtained by this specific COSMO implementation are beyond the scope of this work, since different implicit models can give different results and it is always important to compare the solvation estimates by the different models (see, e.g., the careful study of ref 48). In general, any attempts to obtain a reliable QMMM surface should involve careful calibration of the model on the observed solvation energies (as we have been doing with the EVB model for a very long time), but this is not the issue addressed in the present work.
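The arithmetic behind this estimate can be checked with a few lines, using only the numbers from Table 2 and the approximate gas-phase barrier quoted above. The dictionary layout and variable names are illustrative.

```python
# Solvation free energies (kcal/mol) of the fixed PM3 structures, from Table 2.
solvation = {
    "explicit (adiabatic charging)": {"TS": -52.1, "RS": -67.8},
    "implicit (COSMO)": {"TS": -61.1, "RS": -70.4},
}
GAS_PHASE_BARRIER = 12.0  # kcal/mol, approximate value quoted in the text

ddg_solv = {}
rough_barrier = {}
for model, g in solvation.items():
    # Solvation contribution to the barrier: TS solvation minus RS solvation.
    ddg_solv[model] = g["TS"] - g["RS"]
    rough_barrier[model] = GAS_PHASE_BARRIER + ddg_solv[model]
    print(model, round(ddg_solv[model], 1), round(rough_barrier[model], 1))
```

The explicit model gives a solvation contribution of 15.7 kcal/mol and the COSMO model 9.3 kcal/mol, so the two rough barrier estimates differ by 6.4 kcal/mol, almost all of which comes from the 9.0 kcal/mol difference in the TS solvation.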

In section II.3.3, we showed that the approach of section II.3.2 provides an extremely powerful way of evaluating the PMF with an implicit solvent model. This is significant, since the implicit solvent models are frequently used in studies of reference solution reactions, and such studies require a very tedious manual mapping in order to obtain reliable estimates.49 Now we have provided a systematic and effective way of obtaining the relevant free energy surfaces.

Mapping with an implicit solvent model does not present major problems, apart from missing the nonequilibrium solvation effect45 (which leads to underestimating the barrier obtained with implicit solvation) and the other well-known disadvantages of continuum models. However, in section II.3.3, we also saw that evaluation of the PMF becomes much more challenging when we deal with explicit solvent models. Here the selection of the proper mapping potential is not trivial, as we have to force the combined solute–solvent coordinate to respond to the change in the solute charge distributions. Here we can take advantage of the fact that any RP-based QMMM calculation in polar environments (water or enzymes), and, in particular, in the approaches that use semiempirical MO-based QMMM potentials as the RP,19,25 can be refined using the Γ-correction of eq 28. This correction can be derived, for instance, by eqs 25 and 26 with an implicit solvent model for the original RP and for the TP (e.g., COSMO for which the analytical gradients are available), using:

$$E'_{\mathrm{REF}} = E_{\mathrm{REF}} - \Gamma_{\mathrm{REF,IMPL}} + \Gamma_{\mathrm{TARG,IMPL}} \tag{35}$$

Even though evaluating the PES with the implicit solvent is relatively expensive, it describes more consistently the solute changes along the reaction path in the condensed phases than the corresponding gas phase PES, since the solute polarization is captured in a physically more reasonable way by the implicit solvent models.

In view of the length of section II.3, we summarize below what was done in this section. Section II.3.1 showed how to refine the RP using the Γ-correction. Section II.3.2 introduced another practical use of the Γ-correction for improving the sampling efficiency while evaluating the PMF. Section II.3.3 demonstrated a successful, straightforward application with an implicit solvent model and a less efficient application with the explicit solvent (which should become more efficient if the agreement between the implicit and explicit solvation models is improved). Moreover, eq 35 provides a practical recipe for extending the RP refinement described in section II.3.1 to the condensed phase.

II.4. Extension to the Condensed Phase Using the EVB-Type Solvent Potential.

In this section, we review application of the methods described in section II.3 when they are combined with an idea of using the EVB-type solvent driving potential. This allows for implementation of eq 30 in a practical way when the difference between the implicit and explicit models is significant or when it is desirable to have transferable Γ-functions (e.g., from the gas phase to water or from water to protein). These functions can be easily derived in the gas phase (or with an implicit solvent model) and subsequently applied to calculations in condensed phases without reparameterization. The Γ-correction, fitted to the gas phase calculations, contains only information about changes in the solute along the reaction path, namely, the information about the intramolecular part of the potential. To incorporate the effect of solvent and still take advantage of the gas-phase derived Γ-function, we consider a mapping potential where the QM and MM regions are coupled classically by

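$$E_{\mathrm{map}} = E_{\mathrm{PM3,gas}} - \Gamma_{\mathrm{PM3,gas}} + (1-\lambda)\left(E_{\mathrm{CONS}}(\xi_{\mathrm{RS}}) + E_{sS}(Q_{\mathrm{RS}})\right) + \lambda\left(E_{\mathrm{CONS}}(\xi_{\mathrm{PS}}) + E_{sS}(Q_{\mathrm{PS}})\right) \tag{36}$$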

where QRS and QPS are the vectors of the QM solvated charges at the RS and at the PS.

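$$E_{sS}(Q_k) = \sum_{i=1}^{\mathrm{QM}} \sum_{j=1}^{\mathrm{MM}} \left( \frac{Q_i^{(k)} q_j}{r_{ij}} + \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right) \tag{37}$$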

Equation 37 contains the EVB-type solvent driving potential, which polarizes the solvent in the correct direction toward the product state by the solvated solute charges, and captures the nonequilibrium solvent effect. However, this treatment should be applied with care to SN1 and other charge separation reactions11 (since essentially the QM region is described by the gas phase QM potential, which is perturbed by the MM force field without the correct solute polarization). This mapping potential allows separating the QM and MM parts for the straightforward application of the gas-phase Γ-functions, while addressing the sampling problems separately (namely, requiring that the intra QM barrier is flattened with the gas-phase Γ-function and the solvent is pulled toward the TS by the EVB-type potential).
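The solute–solvent term of eq 37 is a plain double sum over QM and MM atoms, as the sketch below shows. It assumes Cartesian coordinates in Å, charges in units of e, and an explicit Coulomb conversion factor of about 332 kcal·Å/(mol·e²), which eq 37 leaves implicit. The function name, the data layout, and the `lj` callback are hypothetical.

```python
import math

COULOMB = 332.06  # kcal*A/(mol*e^2); unit conversion assumed for this sketch

def solute_solvent_energy(solute, solvent, lj):
    """Eq 37 for one set of solute charges.
    solute:  list of (Q_i, (x, y, z)) for the QM atoms
    solvent: list of (q_j, (x, y, z)) for the MM atoms
    lj:      function (i, j) -> (A_ij, B_ij) repulsion/dispersion parameters"""
    total = 0.0
    for i, (big_q, r_i) in enumerate(solute):
        for j, (small_q, r_j) in enumerate(solvent):
            r = math.dist(r_i, r_j)
            a_ij, b_ij = lj(i, j)
            total += COULOMB * big_q * small_q / r + a_ij / r ** 12 - b_ij / r ** 6
    return total

# One solute atom and one solvent atom 2 A apart (illustrative numbers only).
solute = [(-1.0, (0.0, 0.0, 0.0))]
solvent = [(0.5, (2.0, 0.0, 0.0))]
print(solute_solvent_energy(solute, solvent, lambda i, j: (0.0, 0.0)))
```

In the mapping potential of eq 36, this function would be called twice per frame, once with the RS charges and once with the PS charges, and the two results mixed with the weights 1 − λ and λ.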

A correct treatment of the solute polarization in the case of a MO-based QMMM potential is obtained by:

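$$E_{\mathrm{QMMM}} = \langle \Psi_{\mathrm{polar}} | H_{\mathrm{QMMM}} | \Psi_{\mathrm{polar}} \rangle + E_{\mathrm{VdW}} \tag{38}$$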

Note that this treatment was used for most of the calculations in this work.

At this stage, it is interesting to see whether we can extend the approach of eq 28 to simulations of processes in polar environments. As was mentioned above, the gas phase Γ-function only contains information about the intramolecular contributions and says nothing about the intermolecular interactions with solvent. Equation 35 with the Γ-function obtained with an implicit solvent can provide a higher-level approximation, but this requires further examination. In particular, it is interesting to examine whether such a correction (obtained by fitting the PES performed with an implicit solvent model) can be directly used to refine the RP for enzymatic simulation. Similarly, application of the EVB-type solvent driving potential combined with eq 33 should be further explored for purposes of improving the sampling of a MO-based QMMM RP including the correct solute polarization.

A hypothetical situation of refining a general RP (e.g., a MO-type QMMM PM3MM potential) is considered in Figure 15A. Here the original reference potential is given by

$$E_{\mathrm{REF}} = E_{\mathrm{PM3,gas}} + E_{sS}(Q_{\mathrm{PM3,gas}}) \tag{39}$$

Figure 15.

The PMFs obtained for the CH3Cl + Cl− SN2 reaction in water. (A) (red) PMF constructed for the EPM3MM potential (polarized PM3 Hamiltonian with the solvent polarized by the ESP charges) using the WHAM approach. (blue) PMF obtained by sampling on the flat EPM3(GAS) − ΓPM3(GAS) with the EVB-type solvent driving potential using the FEP/US approach for the potential EPM3(GAS) + EsS(QPM3). (B) Demonstrating the refinement of the original reference potential, EPM3(GAS) + EsS(QPM3), by forcing it to approximate the PM6MM target potential. (red) PMF for the EPM6MM potential (polarized PM6 Hamiltonian with the solvent polarized by the ESP charges) constructed using WHAM. (blue) The refined reference free energy surface obtained by sampling on the flat EPM3(GAS) − ΓPM3(GAS) with the EVB-type solvent driving potential, using the FEP/US approach, for ETARG = EPM3 − ΓPM3 + ΓPM6 + ESOLV(QPM3).

(with the corresponding FES obtained after removing the bias of eq 36 using eq 31, and combining the multiple simulation windows using eq 32) (see simulation 7 in the Supporting Information). The corresponding original FES (as well as the PMF obtained using eq 38) are given in Figure 15A. To refine this reference potential for the PD calculation of the free energy barrier on the PM6MM potential, we use

$$E'_{\mathrm{REF}} = E_{\mathrm{REF}} - \Gamma_{\mathrm{PM3,gas}} + \Gamma_{\mathrm{PM6,gas}} \tag{40}$$

with the mapping potential given by eq 36. The refined reference FES (for the potential of eq 40) as well as the target PM6MM FES (obtained using the PM6MM version of eq 38) are given in Figure 15B. In actual QM(ai)MM studies, one can substitute the PM6MM TP (which was used here only for demonstration purposes) by an expensive QM(ai)MM potential. In such cases, it is reasonable to assume that the corresponding gas phase potential and the Γ-function of eq 40 can be substituted with the corresponding correction obtained with an implicit solvent model (e.g., COSMO).

Turning now to the EVB RP, we note that this model describes the correct polarization of solute by including the effect of solvent in the EVB diagonal elements.11 Thus, for the EVB RP, the Γ-correction in the gas phase is the correction for the intramolecular interactions within the solute. Moreover, the solute–solvent interaction term can be refined by using

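$$E'_{\mathrm{EVB}} = E_{\mathrm{INTRA}} + \Delta\Gamma_{\mathrm{INTRA}} + c_1^2 E_{1,sS}(Q_{\mathrm{RS}}^{\mathrm{TARG}}) + c_2^2 E_{2,sS}(Q_{\mathrm{PS}}^{\mathrm{TARG}}) \tag{41}$$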

In other words, in the case of the EVB RP, the gas phase correction, ΔΓ, refines the intramolecular part of the EVB RP by the relationship:

$$\Delta\Gamma_{\mathrm{INTRA}} = \Gamma_{\mathrm{TARG}} - \Gamma_{\mathrm{EVB}} \tag{42}$$

Since the original EVB RP can be partitioned as

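$$E_{\mathrm{EVB}} = c_1^2 E_1 + c_2^2 E_2 - 2 c_1 c_2 H_{12} = E_{\mathrm{INTRA}} + c_1^2 E_{1,sS}(Q_{\mathrm{RS}}^{0}) + c_2^2 E_{2,sS}(Q_{\mathrm{PS}}^{0}) \tag{43}$$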

The only minor problem here can be that the intramolecular interactions reflected by the ΓEVB are calculated with the original vector of EVB charges, Q0, and one might need to recalculate the ΓEVB for the new set of charges to improve the accuracy of the ΔΓINTRA correction.
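The two-state algebra behind the first equality of eq 43 can be checked numerically. The sketch below diagonalizes a 2 × 2 EVB Hamiltonian in closed form and verifies that the ground-state energy equals c1²E1 + c2²E2 − 2c1c2H12 when c1 and c2 are taken as non-negative magnitudes and H12 as positive; this sign convention is an assumption of the sketch. The split into intramolecular and solute–solvent terms is not modeled, and the function names are hypothetical.

```python
import math

def evb_ground_state(e1, e2, h12):
    """Lowest eigenvalue of [[E1, H12], [H12, E2]] and the state weights."""
    gap = e1 - e2
    root = math.sqrt(gap ** 2 + 4.0 * h12 ** 2)
    e_g = 0.5 * (e1 + e2) - 0.5 * root
    # Weight of diabatic state 1 in the ground state (0.5 if fully degenerate).
    c1_sq = 0.5 * (1.0 - gap / root) if root > 0.0 else 0.5
    return e_g, c1_sq, 1.0 - c1_sq

def partitioned_energy(e1, e2, h12):
    """c1^2*E1 + c2^2*E2 - 2*c1*c2*|H12| with non-negative c1 and c2."""
    _, c1_sq, c2_sq = evb_ground_state(e1, e2, h12)
    return c1_sq * e1 + c2_sq * e2 - 2.0 * math.sqrt(c1_sq * c2_sq) * abs(h12)

# Illustrative diabatic energies and coupling in kcal/mol.
e_g, c1_sq, c2_sq = evb_ground_state(-5.0, 12.0, 8.0)
print(e_g, c1_sq, c2_sq, partitioned_energy(-5.0, 12.0, 8.0))
```

Because the weights c1² and c2² multiply the diabatic solute–solvent terms in eqs 41 and 43, replacing the charge vectors changes only those weighted terms and leaves the intramolecular part to the ΔΓ correction.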

In conclusion, we demonstrate in Figure 16 how we refine the EVB RP originally derived for the PM3 gas-phase calculations. In other words, we refine the EVB RP for PM3MM and PM6MM as the TPs (see also simulation 8 in the Supporting Information).

Figure 16.

Demonstrating the refinement of the EVB reference potential in condensed phases, using the correction potential −ΓEVB + ΓTARG fitted by Gaussians plus a vector of the new EVB charges derived for the target potential. The free energy profiles obtained by the EVB FEP/US approach along the EVB energy gap (A) and along the nuclear RC (B). (blue) The original reference potential with parameters refined for the gas phase PM3; (red) the refined EVB reference potential for the PM3MM target potential; (green) the refined EVB reference potential for the PM6MM target potential.

III. CONCLUDING DISCUSSION

This work refines, quantifies, and validates our paradynamics model. The validation starts by comparing the LRA approach to the full FEP treatment for two arbitrary MO-type QMMM potentials, and is also carried out for EVB as a RP. The LRA is found to be sufficiently accurate in evaluation of the free energy perturbation when moving from the RP to the TP at the TS and at the RS. It is also found that it is sufficient to use a harmonic constraint potential in determining the free energy correction at the specific value of the RC. Overall, it is concluded that the calculated ΔΔG between the two modified potentials is a good approximation for the difference between the free energy functions, ΔΔg(ξ), of the TP and of the RP at the coordinate ξ. That is, it is found that

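$$\Delta\Delta G_{\mathrm{LRA}}\left(E_{\mathrm{REF}} + E_{\mathrm{CONS}}(\xi) \rightarrow E_{\mathrm{TARG}} + E_{\mathrm{CONS}}(\xi)\right) \approx \Delta\Delta G_{\mathrm{FEP}} \approx \Delta g_{\mathrm{TARG}}(\xi) - \Delta g_{\mathrm{REF}}(\xi) \tag{44}$$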
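The two estimators compared in eq 44 can be sketched as follows, using the standard end-point LRA expression (the mean of the energy gap averaged on the RP and on the TP) and a one-sided exponential average on the RP as the simplest FEP limit. The constraint E_CONS(ξ) appears in both potentials and therefore cancels in the gap. The sample gaps, the value of kT, and the function names are illustrative assumptions.

```python
import math

KT = 0.596  # kcal/mol, approximately 300 K (assumed temperature)

def lra(gap_on_ref, gap_on_targ):
    """End-point LRA: 0.5 * (<E_TARG - E_REF>_REF + <E_TARG - E_REF>_TARG)."""
    mean_ref = sum(gap_on_ref) / len(gap_on_ref)
    mean_targ = sum(gap_on_targ) / len(gap_on_targ)
    return 0.5 * (mean_ref + mean_targ)

def fep_single_step(gap_on_ref, kt=KT):
    """One-sided FEP: -kT * ln < exp(-(E_TARG - E_REF)/kT) >_REF."""
    avg = sum(math.exp(-g / kt) for g in gap_on_ref) / len(gap_on_ref)
    return -kt * math.log(avg)

# Energy gaps E_TARG - E_REF (kcal/mol) collected on each surface (toy numbers).
gaps_ref = [1.0, 3.0]
gaps_targ = [0.0, 2.0]
print(lra(gaps_ref, gaps_targ), fep_single_step(gaps_ref))
```

The one-sided average is dominated by the rare frames with the smallest gap, which is one way to see why averaging on both end points is less sensitive to poor overlap between the two surfaces.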

Several practical improvements for the PD reference potential are proposed. More specifically, a novel idea of refining the RP (usually following the initial refinement of the EVB parameters in the case of the EVB RP) in a general case is put forward and tested on the CH3Cl + Cl− SN2 reaction in water. The refinement approach is based on modifying the RP with the help of Gaussian functions fitted to the PESs along the RC performed on the original RP and on the TP. This technique is shown to be efficient for gas phase simulations, as well as with the implicit solvent models with no further modifications. Moreover, the extension of this model to simulation in condensed phases with the explicit solvent models is made in a reasonable way, although further validation is needed.

It should be clarified here that our approach for adding Gaussian functions shares common features with the MTD basic idea of flattening the original potential by adding to it the negative of the model potential.32,33 However, in contrast to spending an enormous amount of time while iteratively elevating the local minima of the ab initio QMMM potential,50 we propose a simple approach of building the negative potential by simply evaluating the gas phase or the implicit solvent PES along the RC.

We note here that some RP approaches, which are very similar to the PD approach (e.g., ref 51), expressed the optimistic perspective that the results obtained therein are superior to our earlier RP results. What has been overlooked is that those works addressed the rather trivial challenge where the solute is fixed (this case has been already handled in our 1992 work7). That is, with a fixed solute, the calculations converge very fast. Nevertheless, obtaining reliable solvation free energies by ab initio calculation is still a challenge even with a fixed solute.3 Here we must point out that even a very recent innovative idea of combining lambda dynamics and metadynamics52 in solvation calculations is significantly less efficient than our RP approach, where almost all the charging processes are done with a classical potential.53

The present work demonstrated that the end point LRA approach, involving averaging both on the RP and on the TP, practically coincides with the full n-step FEP, whereas the averages taken only on the RP result in a higher error. This point is important, since most of the works24,25,51,54 that adopted our RP idea have not yet moved to the end points LRA treatment. Of course, the error associated with the initial LRA estimate will further decrease as the RP is refined to increase the overlap with the TP. In any case, both elements of our PD approach, the refinement of the RP and the LRA estimate of the FEP, are crucial for the efficiency and accuracy when the RP is used. In fact, using the refined PD RP, one can sufficiently sample the TP to ensure the free energy convergence (since the computationally cheap refined RP has high overlap with the TP) as well as to use the highest possible level of theoretical treatment for the QM region given limited computational resources (with evaluation of the true FEP). In fact, we demonstrated in our previous work9 that the PD approach is about 2 orders of magnitude more efficient (in terms of calling the ab initio QM calculations) than MTD. These advantages of the PD approach might help in avoiding the possible corresponding artifacts encountered by some researchers in MTD QMMM studies, which resulted in difficulties of reproducing the experimental catalytic activity of enzymes.55 In this respect, we would like to clarify that even the extreme efforts and technically impressive (by the size of the QM region) results in studies of B12 enzymes56 by MTD are not necessarily as reliable as one might tend to think considering the elegance of the approach. 
In fact, the conclusions56 that the surface cannot be concerted are problematic, as the actual surface is quite flat in the diagonal range and the actual calculations involve a very short sampling time for the PMF calculations (1.5 ps for each simulation window) and a very short (5 fs) time interval between the iterative depositions of Gaussians in the MTD sampling, which might not be sufficient for the accurate elevation process. Furthermore, we would like to clarify that there is not a single experiment that actually excludes the concerted mechanism (clearly no such experiment exists in solution where there is no relevant model system with both the Co–C bond and the hydrogen donor). Here we would like to state that the strategy (e.g., ref 57) of using very careful QM calculations in solution (with a higher level QM model than that used in ref 56) has produced a concerted path, and that moving the solution surface to the protein environment by a calibrated EVB model is expected to be more reliable (both in terms of the sampling and in terms of extrapolation of a reliable reference system) than the direct MTD in the protein site. For more solid conclusions, it is crucial to perform the MTD simulations also for the solution reaction (this is also crucial for obtaining the catalytic effect57). Furthermore, studies of enzyme catalysis require major experience and validation in treating electrostatic effects in proteins and such validations have not yet been reported by the MTD studies, whereas this has been done reliably by reference potential QM(ai)MM calculations of pKas.3 Here again, it would be interesting to see a comparison between the PD and MTD studies, and since the EVB (with a very careful calibration) has already reproduced quantitatively the catalysis in B12 with a concerted path, we tend to believe that the same results (or at least a flat surface) will be obtained by the PD QM(ai)MM approach with the EVB RP.

As clarified in our previous work, the MTD is formally similar to the earlier PD approach9 in the sense that the RP is built iteratively. However, the important difference, which results in major time savings in the PD approach, is that, while the search for the negative RP is carried out blindly by direct sampling of the TP in MTD calculations, PD calculations start with an RP that is already close to the TP and then refine it iteratively. Of course, our PD approach requires an approximate knowledge of the reaction path, but as we pointed out in detail elsewhere,9 we find it useful not to have a black box blind approach in studies of chemical problems. Our recipe for overcoming the blind search is quite simple. It involves the initial mapping with an implicit solvent model that can provide the main mechanistic options, the relevant RC, and the reaction paths. It is true that there is a conceivable advantage in fully automated black box approaches (provided they are sufficiently efficient). However, when one faces, in principle, a choice between simplified “artificial intelligence” and human intellect and experience (that is, knowing the fundamental chemical concepts, or identifying the possible reaction paths in solution using fast models), the latter has an advantage in solving complex problems. Thus, we consider the present work as another illustration of the advantage and power of the PD in QMMM studies.

Supplementary Material

Supporting Information for publication “Exploring, Refining and Validating the Paradynamics QM/MM Sampling”

Computational details of the performed simulations and description of the implemented MOPAC-MOLARIS QMMM interface and the script in Maple9.5 used in fitting potential energy scans with sums of Gaussians. This material is available free of charge via the Internet at http://pubs.acs.org.

ACKNOWLEDGMENTS

This work was supported by NIH grant GM 24492 and NSF grant MCB-08364000. We would like to express our profound gratitude to Dr. J. J. Stewart for letting us work with the code of MOPAC 2009 in these QMMM studies. We gratefully acknowledge the University of Southern California’s High Performance Computing and Communications Center for computer time.

Footnotes

The authors declare no competing financial interest.

REFERENCES
