Author manuscript; available in PMC: 2022 Sep 14.
Published in final edited form as: J Chem Theory Comput. 2021 Sep 1;17(9):5745–5758. doi: 10.1021/acs.jctc.1c00565

Machine Learning Assisted Free Energy Simulation of Solution–Phase and Enzyme Reactions

Xiaoliang Pan, Junjie Yang, Richard Van, Evgeny Epifanovsky, Junming Ho, Jing Huang, Jingzhi Pu, Ye Mei, Kwangho Nam, Yihan Shao
PMCID: PMC9070000  NIHMSID: NIHMS1801843  PMID: 34468138

Abstract

Despite recent advances in the development of machine learning potentials (MLPs) for biomolecular simulations, there has been limited effort in developing stable and accurate MLPs for enzymatic reactions. Here, we report a protocol for performing machine learning assisted free energy simulation of solution-phase and enzyme reactions at an ab initio quantum mechanical and molecular mechanical (ai-QM/MM) level of accuracy. Within our protocol, the MLP is built to reproduce the ai-QM/MM energy as well as forces on both QM (reactive) and MM (solvent/enzyme) atoms. As an alternative strategy, a delta machine learning potential (ΔMLP) is trained to reproduce the differences between ai-QM/MM and semi-empirical (se) QM/MM energy and forces. To account for the effect of the condensed–phase environment in both MLP and ΔMLP, the DeePMD representation of a molecular system is extended to incorporate external electrostatic potential and field on each QM atom. Using the Menshutkin and chorismate mutase reactions as examples, we show that the developed MLP and ΔMLP reproduce the ai-QM/MM energy and forces with an error on average less than 1.0 kcal/mol and 1.0 kcal/mol/Å for representative configurations along the reaction pathway. For both reactions, MLP/ΔMLP-based simulations yielded free energy profiles that differed by less than 1.0 kcal/mol from the reference ai-QM/MM results, but only at a fractional computational cost.


1. Introduction

To accurately model solution-phase and enzyme reactions, it would be desirable to perform direct ab initio quantum mechanical and molecular mechanical (ai-QM/MM) free energy simulations.1–11 In a typical ai-QM/MM free energy simulation, the ai-QM/MM potential is evaluated for each configuration of the system of interest, in which a quantum mechanical (QM) reactive region (typically containing up to 150 atoms) is embedded in thousands or more molecular mechanical (MM) atoms (i.e., the rest of the enzyme or the solvent atoms).12,13 In practice, the ai-QM/MM potential needs to be computed for 10⁵ or 10⁶ configurations in umbrella sampling calculations14 or other enhanced sampling methods (such as metadynamics), before the mean free energy pathway and corresponding reaction free energy profile can be obtained. As such, direct ai-QM/MM free energy simulations are rather compute-intensive, requiring O(10⁵) CPU hours,15–18 and thus have yet to gain wide use.

To avoid such steep costs of direct ai-QM/MM free energy simulations, Gao,19 Warshel20 and others21–25 developed indirect free energy simulations, where sampling is carried out using a reference semi-empirical QM/MM (se-QM/MM) potential and then the free energy result is corrected with a se-QM/MM→ai-QM/MM thermodynamic perturbation or with an interpolation between the two levels of potential, for example along the energy minimized reaction path.26,27 Alternatively, one can adopt the multiple timestep simulation methodology from Tuckerman,28–31 Schlick,32,33 Nam,34 Roux,35 Rothlisberger,36 and others,37 where se-QM/MM trajectory propagation at regular/inner timesteps is coupled with ai-QM/MM trajectory corrections at outer timesteps. Needless to say, the accuracy of both indirect and multiple-timestep simulations is controlled by the quality of the se-QM/MM potential in use. In many cases, it is beneficial to re-optimize the se-QM/MM parameters23,38 or directly modify the internal forces39 to ensure a proper thermodynamic perturbation or interpolation correction or to maintain a stable multiple-timestep trajectory.

Recently, Yang,40–42 Gastegger,43 York,44 Riniker45 and coworkers have proposed machine learning (ML) as a new strategy to address the computational cost of direct ai-QM/MM free energy simulations. Specifically, for configurations of interest, artificial neural network (ANN) models were designed and trained to reproduce either the ai-QM/MM potential, i.e., the machine learning potential (MLP), or the difference between the ai-QM/MM and se-QM/MM potentials, hereafter referred to as the delta machine learning potential (ΔMLP). The resulting MLP or ΔMLP was then employed (in lieu of the ai-QM/MM potential) to drive the dynamical sampling of the enzyme system. These MLPs/ΔMLPs led to fairly accurate free energy barriers, with errors of around 1.0 kcal/mol, for several solution-phase reactions.40–42,44 In these approaches, however, the effects of solvent on the reacting system were rather homogeneous, and their applicability to reactions in a heterogeneous environment, such as in an enzyme, has not been fully explored.

When compared to the construction of MLPs for gas-phase and small periodic systems, these are substantial achievements because the training of MLP for reproducing the ai-QM/MM potential is more challenging. Naively adding thousands of MM atoms to commonly used ML descriptors of a gas-phase molecular system [such as symmetry functions for constructing high-dimensional neural network potentials (HDNNP) descriptors,46–48 Coulomb matrix,49 Faber-Christensen-Huang-Lilienfeld (FCHL) representation,50 tensor representation,51 embedding matrix,52,53 and PhysNet54] would lead to an explosively large number of inputs for the ANN model, thus jeopardizing the convergence (in training) and affordability (for both training and production).

In the development of the above mentioned MLPs/ΔMLPs, two approaches have been proposed to efficiently include MM atoms in the ANN models. The first is an “implicit” approach, where MM atoms are implicitly accounted for through their collective perturbation to the electronic structure of the QM region. For example, Yang and coworkers40,41 used semi-empirical QM Mulliken charges as perturbed by all the MM charges or the total MM electrostatic potential on each QM atom,42 whereas Gastegger and coworkers43 captured the effect of MM atoms through their net dipolar field on each QM atom. The second is an “explicit” approach, in which any MM atom within a “cutoff” distance (such as 6.0Å) from the QM region is included in the computation of ML descriptors. This approach was proposed by York and coworkers44 for the QM/MM expansion to the embedding matrix within the DeePMD coding framework,53,55 and also utilized by Riniker and coworkers45 to extend the HDNNP descriptors to QM/MM calculations.

Inspired by these approaches to include the MM environment in the MLP development, in this work, we aimed to develop a more robust protocol for training MLPs/ΔMLPs for free energy simulations of enzyme reactions by incorporating the effects of long-range MM electrostatic interactions, for example under periodic boundary conditions. [In a separate manuscript from us,56 a different ai-QM/MM-based machine learning approach was proposed, where an ANN was trained to produce a set of chaperone polarizabilities that augment the insufficient polarizability of the semi-empirical Hamiltonian.] Anticipating some amino acid side chains and/or solvent molecules to move in and out of the cutoff boundary with the progression of the reaction, we opted not to follow the “cutoff” approach, because the number of MM atoms retained in the cutoff might change, for example, between neighboring umbrella sampling windows and thus additional smoothing57,58 might be needed to ensure a continuous potential energy surface. Furthermore, in enzymatic reactions, the long-range electrostatic interactions play a critical role in many catalytic reactions as well as in the stability of the molecular dynamics (MD) simulations. In addition, within the “implicit” approach, we opted not to utilize semi-empirical Mulliken charges on QM atoms, which would not be available if we aimed to directly produce the ai-QM/MM-quality MLP to propagate the MD trajectories. In the end, our approach resembles that of Yang and coworkers42 and of Gastegger et al.,43 but the details differ significantly, as described briefly below.

Compared to previous efforts, three key features differentiate our approach from those of other groups. Firstly, long-range electrostatic effects of the MM environment are incorporated rigorously in the training of the MLP models. As will be demonstrated in Section 2, this is achieved by representing the enzyme/solvent environment as MM electrostatic potential and field (ESPF) within our recent QM/MM with augmentary charges (QM/MM-AC) scheme.59 Secondly, a more reliable sampling/collection of training configurations is acquired by calibrating the se-QM/MM Hamiltonian. In the case of ΔMLP, this calibration reduces the magnitude of the Hamiltonian differences and thus lessens the need for iterative training of the MLP model. Finally, standalone MLP and ΔMLP are developed side-by-side, thanks to the construction of a single set of ML descriptors, thereby allowing a direct comparison of the two potentials for reactions of interest. With this development, we have simulated the catalytic reaction of an enzyme, chorismate mutase, which can be considered to be the first realistic and successful application of MLP/ΔMLP to model enzyme reactions under full periodic boundary conditions.

This article is organized as follows. Section 2 introduces our overall methodology (incorporation of electrostatic embedding potential into the MLP, description of long-range QM/MM electrostatics, umbrella sampling), while more computational details for the training and free energy simulations are presented in Section 3. Results for solution-phase Menshutkin reaction and chorismate mutase-catalyzed Claisen rearrangement are presented in Section 4, with concluding remarks made in Section 5.

2. Machine Learning Potential

2.A. Descriptors for the QM atoms

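Our current MLP/ΔMLP implementation is based on Deep Potential - Smooth Edition (DeepPot-SE),52,53 where each QM atom i is represented by its local environment matrix to capture the internal interactions within the QM region (with N QM atoms). Specifically, the i-th atom in the QM region, at position $\mathbf{R}_i$, has n = N − 1 neighbors. The corresponding environment matrix (n-by-4) is

$$
\mathcal{R}_i = \begin{pmatrix}
s(R_{1i}) & s(R_{1i})\,\dfrac{x_{1i}}{R_{1i}} & s(R_{1i})\,\dfrac{y_{1i}}{R_{1i}} & s(R_{1i})\,\dfrac{z_{1i}}{R_{1i}} \\
s(R_{2i}) & s(R_{2i})\,\dfrac{x_{2i}}{R_{2i}} & s(R_{2i})\,\dfrac{y_{2i}}{R_{2i}} & s(R_{2i})\,\dfrac{z_{2i}}{R_{2i}} \\
\vdots & \vdots & \vdots & \vdots \\
s(R_{ni}) & s(R_{ni})\,\dfrac{x_{ni}}{R_{ni}} & s(R_{ni})\,\dfrac{y_{ni}}{R_{ni}} & s(R_{ni})\,\dfrac{z_{ni}}{R_{ni}}
\end{pmatrix}, \tag{1}
$$

where $s(R_{ji}) = 1/|\mathbf{R}_j - \mathbf{R}_i|$ contains the Coulomb interaction, $R_{ji} = |\mathbf{R}_j - \mathbf{R}_i|$, and $(x_{ji}, y_{ji}, z_{ji})$ are the Cartesian components of $\mathbf{R}_j - \mathbf{R}_i$. No screening was applied to the Coulomb interaction, due to the small size of the QM region. Next, an embedding neural network, $G_i$, maps a single value, $s(R_{ji})$, through multiple hidden layers into $m_1$ outputs

$$
G_i = \begin{pmatrix}
(G[s(R_{1i})])_1 & (G[s(R_{1i})])_2 & \cdots & (G[s(R_{1i})])_{m_1} \\
(G[s(R_{2i})])_1 & (G[s(R_{2i})])_2 & \cdots & (G[s(R_{2i})])_{m_1} \\
\vdots & \vdots & \ddots & \vdots \\
(G[s(R_{ni})])_1 & (G[s(R_{ni})])_2 & \cdots & (G[s(R_{ni})])_{m_1}
\end{pmatrix}. \tag{2}
$$

The encoded feature matrix $D_i$ is $m_1$-by-$m_2$

$$
D_i = (G_i^{1})^{T}\, \mathcal{R}_i\, \mathcal{R}_i^{T}\, G_i^{2}, \tag{3}
$$

where $G_i^{1}$ is the same as $G_i$ and $G_i^{2}$ is a matrix that consists of the first $m_2$ columns of $G_i$ with $m_2 < m_1$. Both $m_1$ and $m_2$ are hyperparameters of the model.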
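As a minimal sketch of Eqs. 1 to 3, the following pure-Python snippet builds the environment matrix for one QM atom and contracts it into the feature matrix. The function names are illustrative, and `toy_embedding` is a fixed, hypothetical stand-in for the trained embedding network; it is not the network used in this work.

```python
import math

def environment_matrix(coords, i):
    """Rows (s, s*x/R, s*y/R, s*z/R) for every neighbor j != i, with s = 1/R_ji."""
    rows = []
    for j, rj in enumerate(coords):
        if j == i:
            continue
        d = [a - b for a, b in zip(rj, coords[i])]  # R_j - R_i
        r = math.sqrt(sum(c * c for c in d))
        s = 1.0 / r
        rows.append([s] + [s * c / r for c in d])
    return rows

def toy_embedding(s, m1):
    """Hypothetical stand-in for the embedding network G: one scalar -> m1 outputs."""
    return [math.tanh((k + 1) * s) for k in range(m1)]

def descriptor(coords, i, m1=4, m2=2):
    """Feature matrix D_i = (G1)^T R R^T G2 of shape m1-by-m2 (Eq. 3)."""
    R = environment_matrix(coords, i)             # n x 4
    G = [toy_embedding(row[0], m1) for row in R]  # n x m1
    n = len(R)
    # A = G^T R has shape m1 x 4; G2 is the first m2 columns of G,
    # so R^T G2 is the transpose of the first m2 rows of A.
    A = [[sum(G[j][a] * R[j][c] for j in range(n)) for c in range(4)]
         for a in range(m1)]
    return [[sum(A[a][c] * A[b][c] for c in range(4)) for b in range(m2)]
            for a in range(m1)]
```

Because the contraction only involves inner products of the rows of the environment matrix, the resulting feature matrix is unchanged when the whole QM region is rotated, which can be checked numerically with the functions above.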

2.B. Descriptors for the MM environment

For QM/MM interactions, our model closely resembles the regular QM/MM models, where the MM atoms are represented as point charges with no atomic identities for the QM-MM electrostatic interactions, and the QM-MM van der Waals (vdW) interactions are treated at the MM level.4,7,60–62

In regular QM/MM models, there are three general schemes to treat the QM-MM electrostatic interactions, namely, the continuous, the surrogate, and the hybrid schemes,59 depending on whether the continuous electron density or its surrogate charges or a combination interact with the MM atomic charges. Since electrons are not described explicitly in most MLPs, it is natural to adopt a surrogate-like scheme in MLP where QM atoms interact with MM atoms through the electrostatic potential and field generated,

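$$
\phi_i = \sum_{B \in \mathrm{MM}} \frac{q_B}{|\mathbf{R}_i - \mathbf{R}_B|}, \tag{4}
$$

$$
\mathbf{E}_i = \sum_{B \in \mathrm{MM}} \frac{q_B\,(\mathbf{R}_i - \mathbf{R}_B)}{|\mathbf{R}_i - \mathbf{R}_B|^{3}}, \tag{5}
$$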

or higher-order Taylor expansions at the QM atomic sites. In contrast to normal QM/MM calculations, where one either evaluates the core Hamiltonian contribution associated with ϕi and Ei or derives multipole moments on each QM atom to interact with them, we choose to add the electrostatic potential ϕi and field Ei on each QM atom directly to the list of the input features of the MLP. In this way, we can control the accuracy of not only the QM-MM interactions but also their variation with respect to the change in the QM or MM atoms, i.e., forces.
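The two input features can be illustrated with a short sketch that sums the point-charge potential and field of Eqs. 4 and 5 at a single QM atomic site. The function name and data layout are assumptions made for illustration, and atomic units are assumed so that no Coulomb constant appears.

```python
import math

def esp_and_field(r_qm, mm_sites):
    """Electrostatic potential and field at r_qm from MM point charges.

    mm_sites is an iterable of (charge, (x, y, z)) pairs (illustrative layout).
    Returns (phi, (Ex, Ey, Ez)) following Eqs. 4 and 5.
    """
    phi = 0.0
    field = [0.0, 0.0, 0.0]
    for q, r_mm in mm_sites:
        d = [a - b for a, b in zip(r_qm, r_mm)]  # R_i - R_B
        dist = math.sqrt(sum(c * c for c in d))
        phi += q / dist
        for k in range(3):
            field[k] += q * d[k] / dist ** 3
    return phi, tuple(field)
```

For a single unit charge at the origin and a QM site two length units away along x, this gives a potential of 0.5 and a field of 0.25 pointing along +x, as expected from Coulomb's law.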

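Specifically, the contributions of the local electrostatic potential, $\phi_i$, and field, $\mathbf{E}_i$, are fed into embedding neural networks similar to Eq. 2, leading to additional feature matrices, $D_i^{\phi}$ and $D_i^{F}$, to capture the effect of the MM environment. For example, for the local electrostatic potential, a single value $\phi_i$ is mapped to $m_1'$ outputs through multiple hidden layers, which is used as the feature matrix $D_i^{\phi}$, i.e.,

$$
D_i^{\phi} = \begin{pmatrix} (G(\phi_i))_1 & (G(\phi_i))_2 & \cdots & (G(\phi_i))_{m_1'} \end{pmatrix}. \tag{6}
$$

The electric field on each QM atom is projected onto the vectors pointing toward its neighbors and scaled by the inverse of the distance, i.e.,

$$
F_i = \begin{bmatrix}
s(R_{1i})\,\dfrac{\mathbf{R}_{1i}}{R_{1i}} \cdot \mathbf{E}_i \\
s(R_{2i})\,\dfrac{\mathbf{R}_{2i}}{R_{2i}} \cdot \mathbf{E}_i \\
\vdots \\
s(R_{ni})\,\dfrac{\mathbf{R}_{ni}}{R_{ni}} \cdot \mathbf{E}_i
\end{bmatrix}. \tag{7}
$$

Then, each element of the projected local field $F_i$ is mapped to $m_1''$ outputs through multiple hidden layers, i.e.,

$$
G_i^{F} = \begin{pmatrix}
(G(F_{1i}))_1 & (G(F_{1i}))_2 & \cdots & (G(F_{1i}))_{m_1''} \\
(G(F_{2i}))_1 & (G(F_{2i}))_2 & \cdots & (G(F_{2i}))_{m_1''} \\
\vdots & \vdots & \ddots & \vdots \\
(G(F_{ni}))_1 & (G(F_{ni}))_2 & \cdots & (G(F_{ni}))_{m_1''}
\end{pmatrix}. \tag{8}
$$

The final feature matrix is an $m_1''$-by-$m_2''$ matrix, which can be calculated as

$$
D_i^{F} = (G_i^{F1})^{T}\, G_i^{F2}, \tag{9}
$$

where $G_i^{F1}$ is the same as $G_i^{F}$ and $G_i^{F2}$ is a matrix that consists of the first $m_2''$ columns of $G_i^{F}$. In Eqs. 6 and 9, $m_1'$, $m_1''$, and $m_2''$ are hyperparameters of the model.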
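The projection in Eq. 7 can be sketched as follows. Since s(R) = 1/R, each element reduces to the dot product of the neighbor vector with the field divided by the squared distance. The function name is illustrative, and the neighbor vector is taken as R_j − R_i, consistent with the definition of s in Section 2.A.

```python
import math

def projected_field(coords, i, e_field):
    """Elements F_ji = s(R_ji) * (R_ji / |R_ji|) . E_i for all neighbors j != i."""
    projected = []
    for j, rj in enumerate(coords):
        if j == i:
            continue
        d = [a - b for a, b in zip(rj, coords[i])]  # vector R_j - R_i
        r = math.sqrt(sum(c * c for c in d))
        dot = sum(dk * ek for dk, ek in zip(d, e_field))
        projected.append((1.0 / r) * dot / r)       # s(R) times unit-vector projection
    return projected
```

Because only dot products between internal vectors and the field enter, the projected values do not change when the QM coordinates and the external field are rotated together, which is what makes them suitable inputs for an embedding network.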

2.C. Fitting Neural Network

After the three sets of feature matrices (i.e., Eqs. 3, 6 and 9) are obtained from the embedding neural networks, they are fed into a fitting neural network (NN) to obtain the total energy,

$$
E = \sum_i \mathrm{NN}\left(D_i, D_i^{\phi}, D_i^{F}\right). \tag{10}
$$

With a differentiable neural network model, the analytical energy derivatives in Fig. 1 are also obtained. During the standalone MLP model training, the loss function combines the error in total energy, forces on QM atoms, as well as ESP potential/field values on MM atoms,

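$$
\mathcal{L} = \lambda_1 \left| E - E^{\mathrm{ref}} \right|^2
+ \lambda_2 \sum_{i \in \mathrm{QM}} \left| -\frac{\partial E}{\partial \mathbf{R}_i} + \frac{\partial E^{\mathrm{ref}}}{\partial \mathbf{R}_i} \right|^2
+ \lambda_3 \sum_{B \in \mathrm{MM}} \left| \frac{\partial E}{\partial q_B} - \phi_B^{\mathrm{ref}} \right|^2
+ \lambda_4 \sum_{B \in \mathrm{MM}} \left| -\frac{1}{q_B}\frac{\partial E}{\partial \mathbf{R}_B} - \mathbf{E}_B^{\mathrm{ref}} \right|^2, \tag{11}
$$

where the learned forces on QM atoms, $-\partial E/\partial \mathbf{R}_i$, and ESP potential/field values, $\partial E/\partial q_B$ and $-(1/q_B)\,\partial E/\partial \mathbf{R}_B$, are obtained from the differentiable NN model in use. (We note that, with an efficient treatment of the long-range QM/MM interactions as described in the next subsection, the third and fourth terms in Eq. 11 sum over only the charges on the inner MM atoms within 10 Å from the QM region. Thus, the number of terms in the summations remains unchanged even for large systems.) The same feature matrices can be applied to the development of the ΔMLP model, in which the energy, force, and ESP differences between the ai-QM/MM and se-QM/MM models are used as the reference values in Eq. 11. In both MLP models, λ1, λ2, λ3, and λ4 are tunable prefactors and can be adjusted during the model training.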
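The structure of the loss in Eq. 11 can be sketched in plain Python as below. This is only an illustration: the dictionary keys and function names are assumptions, and the predicted forces, potentials, and fields are taken as already computed, whereas in practice they would come from automatic differentiation of the network energy.

```python
def sq_norm_diff(a, b):
    """Squared Euclidean norm of the difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mlp_loss(pred, ref, lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Four-term loss: energy, QM forces, ESP on MM sites, field on MM sites.

    pred and ref are dicts (illustrative layout) with keys
    'energy' (float), 'qm_forces' (list of 3-vectors),
    'mm_esp' (list of floats), and 'mm_field' (list of 3-vectors).
    """
    l1, l2, l3, l4 = lambdas
    loss = l1 * (pred["energy"] - ref["energy"]) ** 2
    loss += l2 * sum(sq_norm_diff(f, g)
                     for f, g in zip(pred["qm_forces"], ref["qm_forces"]))
    loss += l3 * sq_norm_diff(pred["mm_esp"], ref["mm_esp"])
    loss += l4 * sum(sq_norm_diff(f, g)
                     for f, g in zip(pred["mm_field"], ref["mm_field"]))
    return loss
```

For a ΔMLP, the same function would apply with the reference entries replaced by the ai-QM/MM minus se-QM/MM differences.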

Figure 1:


Workflow for the training of MLP and its use to generate energy and forces for MD simulations. Qi is the ESP charge on QM atom i, which is fitted using the electrostatic potential ϕB on inner MM atom positions.

2.D. Long-Range QM/MM Interactions

For a condensed-phase reaction, the MM environment contains thousands of (or more) atoms. An explicit account of all these atoms in the MLP training becomes very inefficient, if feasible at all. A common strategy around this is to completely ignore the long-range electrostatics due to MM atoms beyond a cutoff distance.44,45 In the context of ΔMLP-learning, this strategy means that the long-range electrostatics of ΔMLP is approximated at the low-level method. However, a systematic analysis has yet to be carried out to gauge the impact of solvent/enzyme atoms moving across the cutoff boundary during the simulation on the developed ML potentials, as it can contribute to a discontinuity on the potential energy surface.

In lieu of this approximation, we have decided to adopt our recent QM/MM with augmentary charges (QM/MM-AC) model59 to accurately capture long-range QM/MM electrostatics in this work. Specifically, only inner MM atoms, namely those within a cutoff distance (10 Å in this work) from the QM region, are explicitly incorporated into the training of MLP and ΔMLP. This can be easily achieved by replacing the MM charges in Eqs. 4 and 5 with

$$
q_B \rightarrow q_B^{<,\mathrm{C}} + q_B^{\mathrm{AC}}, \tag{12}
$$

where $q_B^{<,\mathrm{C}}$ refers to the “continuous” portion of the charges on inner MM atoms and $q_B^{\mathrm{AC}}$ are the augmentary charges projected on the inner MM atoms from all outer MM charges and the “surrogate” portion of charges on the inner MM atoms. More specifically,

$$
q_B^{<,\mathrm{C}} = S(d_B^{\min})\, q_B^{<}, \tag{13}
$$

where $q_B^{<}$ refers to the charges on inner MM atoms and $d_B^{\min}$ is a soft minimum distance of MM atom B to the QM atoms. $S(d_B^{\min})$ is a switching function that decays from 1 to 0 smoothly when $d_B^{\min}$ increases from 0 to the cutoff distance. For all outer MM charges and the “surrogate” portion of charges on the inner MM atoms, the electrostatic potential $\phi_i^{>}$ that they exert on the QM atoms is calculated using the particle mesh Ewald (PME) summation method.63,64 Then the augmentary charges $q_B^{<,\mathrm{AC}}$ are calculated as

$$
q_B^{<,\mathrm{AC}} = \sum_i \phi_i^{>} \left( (WK)^{-1} \right)_{i,B} w_B, \tag{14}
$$

where K is the Coulomb interaction tensor between QM atoms and inner MM atoms, and $w_B$ is a weighting function that scales the Coulomb interaction tensor to maintain a smooth potential energy surface at the cutoff distance, $(WK)_{i,B} = w_B K_{i,B}$.

Then, the QM/MM-AC charges, $q_B^{\mathrm{AC}}$ in Eq. 12, are precomputed for inner MM atoms of each configuration before our training of MLP and ΔMLP. After these models are constructed, the electrostatic potential on inner MM atom positions, $\phi_B = \partial E/\partial q_B$, is used to fit the ESP charge $Q_i$ on QM atom i,

$$
Q_i = \sum_B \left( K^{-1} \right)_{i,B} \phi_B. \tag{15}
$$

As shown in Fig. 1, these charges are then used to provide the electrostatic potential to update the forces on outer MM atoms, through the procedure described in Ref. 59.

2.E. Implementation and Training Details

Our method, which combines the Deep Potential – Smooth Edition (DeepPot-SE)52 descriptors for QM atoms and QM/MM-AC-based descriptors for the MM environment, was implemented in PyTorch.

In this work, the hyperparameters were not fully optimized; the recommended values from the DeePMD-kit package were adopted where applicable. For each of the local embedding networks, three hidden layers with 25, 50, and 100 (m1) neurons were used. For each of the electrostatic potential embedding networks and the electric field embedding networks, three hidden layers with 5, 10, and 20 (m1′ and m1″) neurons were used. For the local embedding networks and the electric field embedding networks, the size of the axis filters was chosen to be 4 (m2 and m2″). For the fitting networks, three hidden layers, each of which has 240 neurons, were used.

Due to the pairwise nature of the “interactions” between the QM atoms (no distance cutoff for these interactions in our implementation), the formal computational complexity of the model is quadratic with the size of the QM region. While the embedding network for the QM-QM interactions has a cost that scales quadratically, the calculations of the embedding networks for the QM-MM and fitting network scale linearly.

The (standalone) MLP model was trained using the Adam optimizer. During the training, the batch size was set to 32, and the initial learning rate was set to 10⁻⁴. Following DeePMD-kit,53 the prefactor for the energy error in the loss function in Eq. 11 was much smaller than the other three at the beginning, and they gradually evolved to the same value towards the end of the training. The learning rate decayed exponentially by a factor of 0.95 every epoch. A total of 100 epochs were performed for each system. A separate ΔMLP model was also trained to reproduce the difference between the PM3*/MM and B3LYP/6–31G*/MM models using the same architecture and hyperparameters as the standalone MLP model.
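To make the learning-rate schedule concrete, the short sketch below evaluates an exponential decay with the stated initial rate and per-epoch factor. The function name and the convention that epoch 0 uses the undecayed rate are assumptions; the text does not specify the indexing.

```python
def learning_rate(epoch, initial_rate=1e-4, decay=0.95):
    """Learning rate after `epoch` decay steps (epoch 0 is the initial rate)."""
    return initial_rate * decay ** epoch

# Print the schedule at a few points over the 100 epochs described above.
for epoch in (0, 1, 10, 50, 100):
    print(epoch, learning_rate(epoch))
```

Under this convention the rate falls by a little more than two orders of magnitude over 100 epochs, to roughly 6 × 10⁻⁷.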

For each of the two reaction systems (i.e., the solution-phase Menshutkin reaction system and the chorismate mutase reaction system), 40,000 frames were collected from the PM3*/MM simulations (see the next Section for details) and then randomly split into sets of 38,400 and 1,600 samples for training and validation, respectively. The testing set consisted of 2,000 additional configurations sampled along the reaction pathway from the ai-QM/MM simulations. For each sample in the training/validation sets, B3LYP/6–31G*/MM single point calculations were performed. The QM/MM-AC method59 was used to project the MM charges that were more than 10Å away from any QM atom and all the charges from the periodic images onto the MM atomic sites within the 10Å cutoff. Besides the energies and the gradients of the QM atoms, the electrostatic potentials and fields on the inner MM atomic sites were also collected.

3. Simulation Details

3.A. System Setup and Equilibration

In this work, we started with the Menshutkin reaction, a widely-used model system for solution chemistry, for developing the protocol of training the MLP and ΔMLP. Then the protocol is applied to the chorismate mutase reaction, which is a popular system to test new enzyme simulation methods because of its small QM region (24 QM atoms) and because the QM region is not covalently linked to the MM region.

For the Menshutkin reaction (Figure 2a), the solutes (ammonia and chloromethane) were solvated in a cubic box of 723 TIP3P65 waters, and modeled by the general Amber force field (GAFF).66 For the chorismate mutase reaction (Figure 2b), the initial structure was prepared based on the X-ray crystal structure (PDB ID 2CHT; ref 67) of Bacillus subtilis chorismate mutase complexed with a transition state analog, which was modified to the substrate chorismate manually, and solvated in a cubic box of 13,067 TIP3P65 waters. Eight sodium counterions were added to neutralize the system. The substrate and the enzyme were modeled using GAFF66 and the AMBER ff14SB force field,68 respectively.

Figure 2:


Schemes for (a) Menshutkin and (b) chorismate mutase reactions.

Both systems were equilibrated at 300 K and 1 atm using Langevin dynamics with a friction coefficient of 5 ps⁻¹ and a Berendsen barostat with a relaxation time of 1 ps under periodic boundary conditions. The solutes were restrained to their initial positions by a weak harmonic potential for the Menshutkin reaction, whereas no restraints were applied for the enzymatic system. The particle mesh Ewald (PME) summation method63,64 was used to treat the electrostatic interactions, while the van der Waals interactions were smoothly switched off at a cutoff of 10 Å. The SHAKE algorithm69 was used to constrain all bonds involving hydrogen atoms, and a time step of 2 fs was used for the MD integration using the leapfrog integrator. After equilibration (500 ps and 17 ns for the Menshutkin and chorismate mutase reactions, respectively), the sizes of the simulation boxes were ~28 Å × 28 Å × 28 Å and ~76 Å × 76 Å × 76 Å for the Menshutkin and chorismate mutase reactions, respectively, which were used for the subsequent simulations in the NVT ensemble. The classical MD simulations were performed using the PMEMD program from the Amber20 package.70

3.B. QM/MM Calculations

To simulate the bond breaking/forming process, a QM/MM description of the system was needed. To sample at the target level of theory within a moderate amount of computer time (<500,000 CPU hours), the B3LYP functional71–73 and 6–31G* basis set74 (in conjunction with the MM force field) were chosen as the reference ai-QM/MM method for the two test systems in this study. On the other hand, the PM3*/MM model, where the parameters of the standard PM3 method75 were recalibrated through force-matching,23 was used to simulate the reactions under study. We note that despite the same notation, the PM3* model parameters are different between the two reaction systems. For the Menshutkin reaction, the parameters were directly taken from our previous study,23 whereas for the chorismate mutase reaction, an improved version of the parameter set was used (Table S1).

The setup of the QM/MM MD simulations was overall similar to the classical ones. During the QM/MM MD simulations, the solutes and the substrate were described by the QM method, while the rest of the system (solvent or protein) was described by the force fields used in the classical simulations. The QM/MM-AC method59 was used to capture the long-range QM-MM electrostatic interactions. The SHAKE algorithm69 was only applied to the MM subsystem, and the integration time step was set to 1 fs. The QM/MM MD simulations were performed using our QM/MM interface QMHub (https://github.com/panxl/qmhub) and a modified version of the SANDER program from the AmberTools20 package.70 All DFT/MM calculations were performed using Q-Chem 5.2.76

3.C. Umbrella Sampling

To achieve good coverage of all configurations relevant to the reactive process, enhanced sampling MD simulations are needed. Umbrella sampling14 is a method that can enhance the sampling of the system along one or more predefined reaction coordinates by adding harmonic biasing potentials to restrain the system to the region of interest of the reaction coordinate. Typically, a series of restraint centers ξi0 are determined to cover the region of interest of the reaction coordinate. Then, for each restraint center ξi0 (also called window), separate simulations are conducted with the harmonic biasing potential, in the form of,

$$
E_{\mathrm{bias},i}(\xi) = k_i \left( \xi - \xi_i^{0} \right)^2, \tag{16}
$$

added to the Hamiltonian of the system, where $k_i$ is the force constant of the harmonic potential at window i and ξ refers to the reaction coordinate. In practice, the number and locations of windows and the force constants are determined to ensure sufficient overlap of the sampled configurations between neighboring windows while minimizing the computational cost. In this study, we also applied the Hamiltonian replica exchange molecular dynamics (HREMD)77 technique to accelerate the convergence of the free energy simulation, in which exchanges of the biasing potentials between neighboring windows were attempted at a fixed interval of 100 fs.
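The bias of Eq. 16 and a replica-exchange attempt between two neighboring windows can be sketched as follows. The acceptance rule shown is the standard Metropolis criterion for Hamiltonian replica exchange, stated here as an assumption because the text does not spell it out; only the bias terms differ between windows, so the unbiased energies cancel. Function names and the numerical values in the usage note are illustrative.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/mol/K

def bias_energy(xi, center, k):
    """Harmonic umbrella bias in the form of Eq. 16 (no factor of 1/2)."""
    return k * (xi - center) ** 2

def exchange_probability(xi_i, xi_j, center_i, center_j, k_i, k_j,
                         temperature=300.0):
    """Metropolis probability of swapping the biases of windows i and j."""
    beta = 1.0 / (KB * temperature)
    before = bias_energy(xi_i, center_i, k_i) + bias_energy(xi_j, center_j, k_j)
    after = bias_energy(xi_j, center_i, k_i) + bias_energy(xi_i, center_j, k_j)
    delta = beta * (after - before)
    # Accept downhill swaps outright; this also avoids overflow in exp().
    return 1.0 if delta <= 0.0 else math.exp(-delta)
```

For example, two replicas sitting exactly at window centers 0.05 Å apart, with a force constant of 150 kcal/mol/Å² at 300 K, would swap with a probability of about 0.28, which illustrates why closely spaced windows are needed for efficient exchange.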

To generate the training/validation sets that cover all the relevant configuration space for the reactions, the above-mentioned umbrella sampling technique was used to collect configurations along the predefined reaction coordinates, using the PM3*/MM potential. For both reaction systems, the reaction coordinates were defined as $\xi = d_{\mathrm{break}} - d_{\mathrm{form}}$, where $d_{\mathrm{break}}$ and $d_{\mathrm{form}}$ were the bond lengths of the breaking and forming bonds, respectively. Specifically, $\xi = d_{\mathrm{C–Cl}} - d_{\mathrm{C–N}}$ for the Menshutkin reaction and $\xi = d_{\mathrm{C–O}} - d_{\mathrm{C–C}}$ for the chorismate mutase reaction (see Figure 2). For each reaction system, a total of 80 windows were evenly distributed with an interval of 0.05 Å to cover ξ ranging from −1.975 to 1.975 Å, and the force constants were set to 150 kcal/mol/Å² for all the windows. For each window, a 50 ps simulation was conducted, and the configurations were saved every 0.1 ps, which resulted in 500 configurations. Overall, 40,000 frames were collected for each reaction system, and for each configuration saved, the B3LYP/6–31G*/MM calculation was performed to generate the reference data.
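The window layout and frame count quoted above follow from simple arithmetic, which the sketch below reproduces. Variable names are illustrative.

```python
# Umbrella sampling layout: 80 windows spaced 0.05 Å apart, starting at -1.975 Å.
n_windows = 80
spacing = 0.05
first_center = -1.975
centers = [round(first_center + spacing * i, 3) for i in range(n_windows)]

# Frames per window: 50 ps of sampling, one frame saved every 0.1 ps.
frames_per_window = int(round(50.0 / 0.1))
total_frames = n_windows * frames_per_window

print(centers[0], centers[-1], frames_per_window, total_frames)
```

The last center lands on 1.975 Å, and 80 windows with 500 frames each give the 40,000 configurations used for training and validation.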

After the ML model was trained, the same umbrella sampling simulations were performed where PM3*/MM was replaced by the MLP/MM or PM3*+ΔMLP/MM model. Multistate Bennett acceptance ratio (MBAR)78 method as implemented in the pymbar package (https://github.com/choderalab/pymbar) was used to compute the free energy profile.

4. Results and Discussion

4.A. Energy Conservation During Microcanonical Ensemble Simulations

As a validation of the developed MLP and ΔMLP models, especially the conservation of the total energy, we first performed 100 ps microcanonical ensemble (NVE) MD simulations for the reactant of the chorismate mutase reaction using the PM3*/MM, MLP/MM, and PM3*+ΔMLP/MM models. The results are presented in Figure 3. Overall, good energy conservation was observed, with energy drifts of (3.96 ± 0.08) × 10⁻³ kcal/mol/ps, (−1.81 ± 0.08) × 10⁻³ kcal/mol/ps, and (4.17 ± 0.08) × 10⁻³ kcal/mol/ps for the PM3*/MM, MLP/MM, and PM3*+ΔMLP/MM models, respectively. In addition, we compared the initial 500 fs trajectories with an NVE trajectory using B3LYP/6–31G*/MM under the same initial conditions (Figure S2). This comparison showed that the MLP and PM3*+ΔMLP models diverged less from the DFT/MM trajectory than the PM3* model.
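One common way to quantify an energy drift of this kind is the slope of a least-squares line through the total energy versus time. The sketch below assumes that definition, which the text does not state explicitly, and uses synthetic data in place of a real trajectory.

```python
def energy_drift(times_ps, energies):
    """Least-squares slope of total energy versus time (energy units per ps)."""
    n = len(times_ps)
    t_mean = sum(times_ps) / n
    e_mean = sum(energies) / n
    numerator = sum((t - t_mean) * (e - e_mean)
                    for t, e in zip(times_ps, energies))
    denominator = sum((t - t_mean) ** 2 for t in times_ps)
    return numerator / denominator

# Synthetic 100 ps trace with a built-in drift of 0.004 kcal/mol/ps.
times = [0.1 * i for i in range(1001)]
energies = [-1000.0 + 0.004 * t for t in times]
print(energy_drift(times, energies))
```

Applied to a real NVE trajectory, the same function would return a single slope; estimating its uncertainty would need an additional error analysis that is not shown here.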

Figure 3:


Conservation of the total energy in 100ps NVE simulations of the chorismate mutase reaction using A) PM3*, B) MLP, and C) PM3*+ΔMLP models. In each figure, the line shown in orange indicates the drift of energy (see the values mentioned in the main text for each model).

4.B. Overall Free Energy Results

The overall free energy results for the two reactive systems studied are summarized in Table 1. For the aqueous Menshutkin reaction, the B3LYP/MM free energy barrier was calculated to be 15.5 ± 0.1 kcal/mol and the free energy of reaction was −26.9 ± 0.1 kcal/mol. Despite extensive reparameterization as outlined in Section 3.B, the PM3*/MM model still substantially overestimated the barrier height with a value of 23.3 ± 0.1 kcal/mol. Similarly, the free energy of reaction was overestimated by 18.6 kcal/mol. In contrast, much improved free energy barriers and reaction free energies were obtained with MLP sampling (16.1 ± 0.1 kcal/mol for the barrier and −25.9 ± 0.1 kcal/mol for the reaction free energy) and also with PM3*+ΔMLP sampling (15.1 ± 0.1 kcal/mol for the barrier and −27.5 ± 0.1 kcal/mol for the reaction free energy), both within 1.0 kcal/mol of the reference B3LYP values.

Table 1:

Free Energy Barriers and Reaction Free Energies (kcal/mol) for Menshutkin (MEN) and Chorismate Mutase (CM) Reactions^a

System   Quantity               PM3*           MLP^b          PM3* + ΔMLP    B3LYP
MEN      Free energy barrier    23.3 ± 0.1     16.1 ± 0.1     15.1 ± 0.1     15.5 ± 0.1
MEN      Reaction free energy   −8.3 ± 0.1     −25.9 ± 0.1    −27.5 ± 0.1    −26.9 ± 0.1
CM       Free energy barrier    14.9 ± 0.1     13.9 ± 0.1     13.9 ± 0.1     13.6 ± 0.1
CM       Reaction free energy   −15.5 ± 0.1    −16.3 ± 0.1    −16.6 ± 0.1    −16.8 ± 0.1

a The numbers after the plus–minus sign show the uncertainties estimated by the MBAR method.

b For the chorismate mutase reaction, the reported results are from the 2nd iteration MLP.

Similar improvements were found for the chorismate mutase reaction. Both MLP and PM3*+ΔMLP simulations reproduced the barrier height of 13.6 ± 0.1 kcal/mol and the free energy of reaction of −16.8 ± 0.1 kcal/mol, both with errors less than 0.5 kcal/mol from the reference B3LYP values. Considering the length of the sampling for each simulation, the three results can be considered to be essentially the same. On the other hand, while remaining excellent, the PM3* model produced a barrier of 14.9 ± 0.1 kcal/mol and a reaction free energy of −15.5 ± 0.1 kcal/mol, which were only 1.3 kcal/mol higher than their corresponding B3LYP values.

We note that our standalone MLP and ΔMLP models have not been fully optimized in terms of their computational cost. With the current set of hyperparameters (as specified in Section 2.E), the MLP sampling of the Menshutkin reaction system took about 800 CPU hours, which was about 3 times the cost of the PM3* sampling but still 16 times faster than the B3LYP simulation (Table 2). For PM3*+ΔMLP, since both PM3* and ΔMLP energy/force evaluations were performed for each configuration, the cost increased to around 1,300 CPU hours, which was still 10 times faster than the reference B3LYP simulation. It should be noted that our MLP and ΔMLP models were implemented using the PyTorch framework, which supports both CPU and GPU parallelization. However, the timings reported here were based only on CPU calculations.

Table 2:

CPU Time (in 10³ h) for the Computation of Free Energy Profiles for Menshutkin (MEN) and Chorismate Mutase (CM) Reactions [a]

Reaction | PM3* | MLP      | PM3* + ΔMLP | B3LYP
MEN      | 0.3  | 0.8      | 1.3         | 12.4
CM       | 1.5  | 11.9 [b] | 8.2         | 379.5

[a] Umbrella sampling simulations were performed for 50 ps per window for 80 windows. Thus, a total of 4 ns of umbrella sampling MD simulations was performed for each QM/MM model. The time for recalibrating PM3, which is ~0.6 × 10³ h and ~0.9 × 10³ h for the Menshutkin and chorismate mutase reactions, respectively, is not included.

[b] Including time from two iterations, i.e., a total of 100 ps for each umbrella sampling window.

For the chorismate mutase system, the MLP sampling took 11,900 CPU hours, which was about 8 times slower than the PM3* sampling. The higher MLP/PM3* timing ratio arose largely from two rounds of MLP training performed to obtain a stable MLP model around the reaction transition state region (see Section 4.D for details). In comparison to the 379,500 CPU hours for the B3LYP sampling, the MLP sampling offered a 32-fold speedup. With PM3*+ΔMLP, only one round of ML model training was needed, which led to a net 8,200 CPU hours for the sampling. This was 46 times faster than the reference B3LYP simulation.
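The speedup factors quoted in the text follow directly from the CPU times in Table 2. As a minimal sketch (the dictionary layout, the ASCII label "PM3*+DeltaMLP" and the function name `speedup` are illustrative choices, not part of the original work), the ratios can be recomputed as follows:

```python
# CPU times in units of 10^3 h, copied from Table 2.
# "PM3*+DeltaMLP" is an ASCII label used here for the PM3* + ΔMLP model.
cpu_time = {
    "MEN": {"PM3*": 0.3, "MLP": 0.8, "PM3*+DeltaMLP": 1.3, "B3LYP": 12.4},
    "CM": {"PM3*": 1.5, "MLP": 11.9, "PM3*+DeltaMLP": 8.2, "B3LYP": 379.5},
}


def speedup(reaction, model, reference="B3LYP"):
    """Return the ratio of the reference CPU time to the model CPU time."""
    times = cpu_time[reaction]
    return times[reference] / times[model]


for reaction in cpu_time:
    for model in ("MLP", "PM3*+DeltaMLP"):
        ratio = speedup(reaction, model)
        print(f"{reaction} {model}: {ratio:.1f}x faster than B3LYP")
    # Cost of the standalone MLP relative to the cheaper PM3* sampling.
    overhead = 1.0 / speedup(reaction, "MLP", reference="PM3*")
    print(f"{reaction} MLP: {overhead:.1f}x the cost of PM3*")
```

Because Table 2 reports times to one decimal place in units of 10³ h, the recomputed ratios agree with the quoted factors only after rounding.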

Below we go over each reactive system in more detail, and show how such speedups were obtained.

4.C. Menshutkin Reaction

For the Menshutkin reaction, the standard PM3 model (marked red), which is shown in the bottom panels of Fig. 4, deviates substantially from the reference B3LYP values. In terms of the root-mean-square error (RMSE), the PM3 energy differed by 6.24 kcal/mol, while the corresponding force error on QM atoms was 12.90 kcal/mol/Å (Table S2). Similarly, the mean unsigned error (MUE) was 5.00 kcal/mol for the energy and 9.62 kcal/mol/Å for the force. The RMS errors in the QM electrostatic potential and field at MM atom positions were substantially smaller at 1.33 kcal/mol/e and 0.95 kcal/mol/Å/e, respectively, and their corresponding MUE values were 0.67 kcal/mol/e and 0.24 kcal/mol/Å/e. The maximum errors (MAXE) were largest for PM3 among the tested QM/MM models: 21.55 kcal/mol for the energy, 54.32 kcal/mol/Å for the force, 38.41 kcal/mol/e for the electrostatic potential and 49.39 kcal/mol/Å/e for the field.
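For readers who want to reproduce this kind of comparison, the three error measures used here (RMSE, MUE and MAXE) can be sketched as below. This is a minimal illustration under stated assumptions: the function name `error_metrics` and the toy numbers are invented, and the exact convention used in the paper for forces (for example, per Cartesian component versus per atom) is not specified in this section.

```python
import math


def error_metrics(predicted, reference):
    """Return RMSE, MUE and MAXE between two equally long sequences of numbers."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),  # root-mean-square error
        "MUE": sum(abs(e) for e in errors) / n,             # mean unsigned error
        "MAXE": max(abs(e) for e in errors),                # maximum absolute error
    }


# Toy values only (not data from the paper): model vs. reference energies.
model_energies = [1.0, 2.0, 3.0, 4.0]
reference_energies = [0.0, 2.0, 5.0, 4.0]
print(error_metrics(model_energies, reference_energies))
```

The RMSE weights large deviations more heavily than the MUE, which is why the two are reported together, while MAXE flags isolated outliers that an average could hide.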

Figure 4:

Accuracy of MLP (top), PM3*+ΔMLP (middle), PM3 and PM3* (bottom) energy, forces, electrostatic potential (ϕ) and electric field (E) for the 2,000 testing configurations for aqueous Menshutkin reaction. In each figure, the reference values are obtained from the B3LYP/6–31G*/MM calculations. The root-mean-square error (RMSE) value is also shown for each method.

Through our automated reparameterization (which placed a larger weight on the force errors), the PM3* model (marked blue in the bottom panels of Fig. 4) substantially reduced the force error to 4.42 kcal/mol/Å for RMSE and 3.21 kcal/mol/Å for MUE. Meanwhile, the PM3* energy error remained at 6.38 kcal/mol for RMSE and 5.49 kcal/mol for MUE, which was evident from the broad distribution of the energy difference between the PM3* and B3LYP models in Fig. 5A.

Figure 5:

Distribution of high-level (B3LYP/6–31G*) and low-level (PM3*, MLP, and PM3*+ΔMLP) energy differences for configurations collected from B3LYP/MM MD trajectories (blue) or low-level MD trajectories (orange) of the aqueous Menshutkin reaction.

When ΔMLP was trained, it was designed to reproduce the difference between the B3LYP and PM3* potentials. As shown in the middle panels of Fig. 4, the energy error of the combined PM3*+ΔMLP model was 0.78 kcal/mol for RMSE and 0.62 kcal/mol for MUE, both below 1.0 kcal/mol and the lowest among the models tested. The reduction in the force error was more impressive, reaching an RMSE of 0.79 kcal/mol/Å and an MUE of 0.53 kcal/mol/Å. A moderate but more systematic improvement of the ESP and field values at MM atom positions was also noticeable. As a consequence, the distribution of PM3*+ΔMLP and B3LYP energy differences shown in Fig. 5C became much narrower. The corresponding standard deviation of the energy difference was 0.7 kcal/mol, well below the 1.7–2.5 kcal/mol threshold for a lower-level theory to accurately reproduce the sampling from a high-level model.79–81
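The logic of the ΔMLP correction and of the standard-deviation check can be illustrated with a small sketch. Everything below is hypothetical: the function names, the toy energies and the "predicted" corrections are invented for illustration, and the use of the population standard deviation is an assumption, since the section does not state which estimator was used.

```python
import statistics


def delta_targets(high_level, low_level):
    """Targets a delta model is fitted to: high-level minus low-level values."""
    return [h - l for h, l in zip(high_level, low_level)]


def apply_correction(low_level, delta_predicted):
    """Corrected prediction: low-level value plus the learned correction."""
    return [l + d for l, d in zip(low_level, delta_predicted)]


def gap_spread(high_level, model):
    """Population standard deviation of the (high-level minus model) energy gap."""
    return statistics.pstdev([h - m for h, m in zip(high_level, model)])


# Invented energies (kcal/mol) for four configurations; not data from the paper.
high = [10.0, 12.0, 15.0, 11.0]             # stands in for B3LYP/MM
low = [13.0, 19.0, 16.0, 18.5]              # stands in for PM3*/MM
delta_predicted = [-3.2, -6.9, -1.1, -7.4]  # imperfect output of a hypothetical delta model

corrected_energies = apply_correction(low, delta_predicted)
print("training targets:", delta_targets(high, low))
print("spread before correction:", round(gap_spread(high, low), 2))
print("spread after correction:", round(gap_spread(high, corrected_energies), 2))
```

A constant offset between two potentials does not change the sampled ensemble, so it is the spread of the energy gap, rather than its mean, that is compared with the 1.7–2.5 kcal/mol range quoted above.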

Alternatively, an MLP can be directly trained to reproduce the B3LYP energy/force as well as the QM ESP and field at MM atom positions. As shown in the top panels of Fig. 4, a standalone MLP can reproduce the B3LYP energy with 1.03 kcal/mol RMSE and 0.79 kcal/mol MUE and the forces with 1.17 kcal/mol/Å RMSE and 0.75 kcal/mol/Å MUE, which were only slightly worse than PM3*+ΔMLP. It therefore also led to a narrow distribution of energy differences in Fig. 5B, with a standard deviation of 1.5 kcal/mol that is somewhat larger than that of PM3*+ΔMLP.

Because they reproduce the B3LYP energy and forces, both the MLP and PM3*+ΔMLP models led to proper sampling of the pathway (shown in Fig. 6A) and thus to an accurate reaction free energy profile for the Menshutkin reaction. In addition, we expect the conformational space sampled by both models to overlap very well with that of B3LYP.

Figure 6:

(A) Sampled pathway and (B) potential of mean force for the aqueous Menshutkin reaction based on umbrella sampling using PM3*+ΔMLP and MLP potentials in comparison to PM3* and B3LYP/6–31G* results. The pathways are represented by the average bond lengths for each of the 80 windows in the umbrella sampling simulations (shown in Figure S2). The stars show the locations of the transition states on the pathways.

4.D. Chorismate Mutase Catalysis

As in the case of the Menshutkin reaction in aqueous solution, the chorismate mutase system was poorly described by the PM3 model, with an energy error of 12.08 kcal/mol and a force error of 13.59 kcal/mol/Å, as shown in the bottom panels of Fig. 7. A substantial improvement to the PM3 model was achieved through reparameterization, which reduced the energy error to 2.25 kcal/mol and the force error to 4.88 kcal/mol/Å. The improvement of the PM3* model is also noticeable from the comparison of the MUE and MAXE values in Table S3.

Figure 7:

Accuracy of MLP (top), PM3*+ΔMLP (middle), PM3 and PM3* (bottom) energy and forces for the 2,000 testing configurations for the chorismate mutase reaction.

The energy error for the chorismate mutase system was brought within 1.0 kcal/mol with PM3*+ΔMLP, whose RMS energy error was 0.69 kcal/mol and RMS force error was 0.88 kcal/mol/Å. This was also evident in the change from the wide PM3*–B3LYP energy distribution in Fig. 8A to the narrow peaks for the ΔMLP-corrected model in Fig. 8D. In addition, we point out the substantial reduction of the MAXE for the forces from 33.29 kcal/mol/Å for PM3* to 7.18 kcal/mol/Å for PM3*+ΔMLP, suggesting a very robust improvement in the accuracy of the PM3*+ΔMLP model.

Figure 8:

Distribution of high-level (B3LYP/6–31G*) and low-level (PM3*, MLP, and PM3*+ΔMLP) energy differences for configurations collected from B3LYP/MM MD trajectories (blue) or low-level MD trajectories (orange) of the chorismate mutase reaction.

With the standalone MLP, one round of training using the configurations collected from the PM3*/MM MD trajectory did yield good agreement with the target B3LYP/MM energy (RMSE: 0.99 kcal/mol and MUE: 0.78 kcal/mol). However, the force error was found to be 1.84 kcal/mol/Å for RMSE and 1.36 kcal/mol/Å for MUE. The MAXE of 14.78 kcal/mol/Å was also noticeably large. With such large errors in the forces, we observed some instability in the MD simulations using this MLP, especially near the transition state region. To address this issue, we constructed an expanded pool of configurations combining the configurations used in the first MLP training with configurations selected from this round of MLP sampling, and performed a second round of MLP training using the expanded configuration pool. Previously, Pu and coworkers showed that iterative parameterization of the se-QM model can systematically improve the accuracy of the (reparameterized) QM model,38 and the present second round of MLP training can be considered a second iteration of the MLP model training. The resultant MLP model, referred to here as MLP(2nd), reduced the RMS force error to 1.10 kcal/mol/Å (and the MUE to 0.81 kcal/mol/Å), accompanied by an RMS energy error of 0.73 kcal/mol and an MUE of 0.58 kcal/mol. More impressively, the MAXE for the forces was reduced to 8.32 kcal/mol/Å, which was only 1.1 kcal/mol/Å larger than that of PM3*+ΔMLP. The MLP(2nd) model also showed a much narrower MLP(2nd)–B3LYP energy distribution than the MLP(1st)–B3LYP one, as shown in Fig. 8.
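The two-round procedure described above (train, sample with the trained model, add selected configurations to the pool, retrain) can be sketched generically. This is only a schematic under stated assumptions: `iterate_training` and the four callables passed to it are hypothetical placeholders, and the toy stand-ins (a one-dimensional "configuration", a quadratic "label", a nearest-neighbour "model") bear no relation to the actual B3LYP/MM labelling, DeePMD-style training or umbrella sampling used in the paper.

```python
def iterate_training(initial_configs, label, train, sample, select, n_rounds=2):
    """Train, sample with the model, expand the labelled pool, and retrain."""
    labelled = [(c, label(c)) for c in initial_configs]
    model = train(labelled)
    for _ in range(n_rounds - 1):
        # New configurations come from sampling with the current model.
        new_configs = select(sample(model))
        # Only the newly selected configurations need high-level labels.
        labelled.extend((c, label(c)) for c in new_configs)
        model = train(labelled)
    return model, labelled


# Toy stand-ins, invented purely so that the sketch runs.
def toy_label(x):
    return x * x  # placeholder for a high-level energy evaluation


def toy_train(labelled):
    table = sorted(labelled)

    def model(x):
        # Nearest-neighbour lookup as a placeholder for a fitted potential.
        return min(table, key=lambda pair: abs(pair[0] - x))[1]

    return model


def toy_sample(model):
    return [0.5, 1.5, 2.5]  # placeholder for configurations from MD with the model


def toy_select(configs):
    return configs[:2]  # placeholder for a selection or clustering step


model, pool = iterate_training([0.0, 1.0, 2.0], toy_label, toy_train, toy_sample, toy_select)
print(len(pool), model(1.5))
```

Keeping the first-round configurations in the pool, rather than replacing them, matches the description above of an expanded pool that combines both sets.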

The MLP(2nd) and PM3*+ΔMLP models yielded an accurate sampling of the reaction pathway, as shown in Fig. 9A. This is in contrast to the PM3* model, whose sampling was shown in Fig. 8d of Ref. 23 to deviate substantially from the B3LYP pathway between the transition state and the product. As a result of the accurate energetics and sampling, both the MLP(2nd) and PM3*+ΔMLP models produced free energy profiles of the chorismate mutase reaction that matched the B3LYP profile in Fig. 9B.

Figure 9:

(A) Sampled pathway and (B) potential of mean force for the chorismate mutase reaction based on umbrella sampling using PM3*, PM3*+ΔMLP and MLP (the 2nd iteration) potentials in comparison to B3LYP/6–31G* results. The pathways are represented by the average bond lengths for each of the 80 windows in the umbrella sampling simulations (in Figure S3). The stars show the locations of the transition states on the pathways.

5. Conclusions

Inspired by the work of Yang, York, Gastegger, Riniker and coworkers, we explored the construction of system-specific machine-learning potentials (MLPs) to accelerate the generation of ab initio-quality QM/MM free energy profiles of enzymatic reactions. Several advances were made in this work:

  • The deep-potential of E and coworkers (i.e., DeepPot-SE) was combined with our QM/MM-AC electrostatic model to provide descriptors for a QM reactive region embedded in an enzyme or solution environment (described by MM point charges).

  • A reparameterized semi-empirical quantum model, PM3*, was utilized to generate the configurations for training the atomistic machine-learning potentials. The deep-potential and QM/MM-AC descriptors of these configurations were then used to train the embedding and fitting neural networks to provide MLPs or ΔMLPs to propagate MD trajectories of condensed-phase reactive systems.

  • The ΔMLP model, when combined with the PM3* model in umbrella sampling simulations, reproduced the reaction free energies and barrier heights of two model reactions (the aqueous Menshutkin reaction and the chorismate mutase reaction) to within 1.0 kcal/mol. A 10–fold and a 46–fold reduction in CPU time, respectively, were achieved in comparison to the reference B3LYP/6–31G*/MM free energy simulations.

  • The standalone MLP, which sought to directly reproduce the B3LYP energy/force on QM atoms and the fitted electrostatic potentials and fields on MM atoms, could reproduce the aqueous Menshutkin free energy profile with a 16-fold saving in CPU time. For the chorismate mutase reaction, however, two rounds of MLP training were found to be necessary to ensure stable MD trajectories. With the larger QM region in the chorismate mutase reaction, the saving in CPU time was 32–fold.

On the other hand, several limitations need to be addressed in the future:

  • The hyperparameters for our MLP and ΔMLP (which were fixed in this work) need to be systematically explored to find an optimal set for each machine learning potential in terms of efficiency and accuracy.

  • In our current protocol, the samples from one or more iterations were all labelled using the high-level method. Active learning or concurrent learning44,55 as well as a more robust data clustering algorithm can be adopted to reduce the cost of both data labelling and training.

  • Our protocol for reparameterizing semi-empirical QM models and for training MLPs needs to be tested on typical enzyme reactions, where the computational savings are expected to be even more substantial with a larger QM region.

  • Given the generality of our QM/MM electrostatics treatment, an MLP trained with the wild-type enzyme should be applicable to its mutants. This transferability of the trained MLPs has yet to be tested and verified.

  • More accurate ab initio QM/MM models beyond B3LYP/6–31G*/MM should be adopted as the labelling method for our MLP and ΔMLP training.

Work in these directions is currently being pursued.

Supplementary Material

supporting information

Acknowledgement

We thank Drs. Linfeng Zhang, Han Wang, and Mr. Jinzhe Zeng for helpful discussions. This work was supported by the National Institutes of Health through grant R43GM133270 to EE, KN, and YS. JP and YS are also supported by the National Institutes of Health (grant: R01GM135392). YS also acknowledges support from the Oklahoma Center for the Advancement of Science and Technology (grant: HR18-130), and from the Office of the Vice President of Research and the College of Arts and Sciences at the University of Oklahoma (OU). KN is also supported by the National Institutes of Health (grants: R01GM132481 and R01GM138472). YM is supported by the National Natural Science Foundation of China (Grant No. 22073030). The authors thank the OU Supercomputing Center for Education & Research (OSCER) for the computational resources.

Footnotes

Supporting Information Available

The following files are available free of charge.
  • A brief introduction of the improved semi-empirical QM recalibration protocol, recalibrated PM3 parameters for the chorismate mutase reaction (Table S1), errors with respect to B3LYP/6–31G*/MM for the Menshutkin (Table S2) and the chorismate mutase (Table S3) reactions, the total energy during the initial 500 fs of the NVE simulations (Figure S1), and samples in each window of the umbrella sampling simulations for the Menshutkin (Figure S2) and chorismate mutase (Figure S3) reactions.

References

  • (1).Gao J; Ma S; Major DT; Nam K; Pu J; Truhlar DG Mechanisms and Free Energies of Enzymatic Reactions. Chem. Rev 2006, 106, 3188–3209.
  • (2).Warshel A Computer Simulations of Enzyme Catalysis: Methods, Progress, and Insights. Ann. Rev. Biophys. Biomol. Struct 2003, 32, 425–443.
  • (3).Klähn M; Braun-Sand S; Rosta E; Warshel A On Possible Pitfalls in ab Initio Quantum Mechanics/Molecular Mechanics Minimization Approaches for Studies of Enzymatic Reactions. J. Phys. Chem. B 2005, 109, 15645–15650.
  • (4).Lin H; Truhlar DG QM/MM: What Have We Learned, Where Are We, and Where Do We Go From Here? Theor. Chem. Acc 2007, 117, 185–199.
  • (5).Hu H; Yang W Free Energies of Chemical Reactions in Solution and in Enzymes with Ab Initio Quantum Mechanics/Molecular Mechanics Methods. Ann. Rev. Phys. Chem 2008, 59, 573–601.
  • (6).Senn HM; Thiel W QM/MM Studies of Enzymes. Curr. Op. Chem. Biol 2007, 11, 182–187.
  • (7).Senn HM; Thiel W QM/MM Methods for Biomolecular Systems. Angew. Chem. Int. Ed 2009, 48, 1198–1229.
  • (8).Lonsdale R; Ranaghan KE; Mulholland AJ Computational Enzymology. Chem. Commun 2010, 46, 2354.
  • (9).van der Kamp MW; Mulholland AJ Combined Quantum Mechanics/Molecular Mechanics (QM/MM) Methods in Computational Enzymology. Biochemistry 2013, 52, 2708–2728.
  • (10).Masgrau L; Truhlar DG The Importance of Ensemble Averaging in Enzyme Kinetics. Acc. Chem. Res 2015, 48, 431–438.
  • (11).Cui Q; Pal T; Xie L Biomolecular QM/MM Simulations: What Are Some of the “Burning Issues”? J. Phys. Chem. B 2021, 125, 689–702.
  • (12).Karplus M Development of Multiscale Models for Complex Chemical Systems: From H+H2 to Biomolecules (Nobel Lecture). Angew. Chem. Int. Ed 2014, 53, 9992–10005.
  • (13).Warshel A Multiscale Modeling of Biological Functions: From Enzymes to Molecular Machines (Nobel Lecture). Angew. Chem. Int. Ed 2014, 53, 10020–10031.
  • (14).Torrie GM; Valleau JP Monte Carlo Free Energy Estimates Using Non-Boltzmann Sampling: Application to the Sub-Critical Lennard-Jones Fluid. Chem. Phys. Lett 1974, 28, 578–581.
  • (15).Wang L; Yu X; Hu P; Broyde S; Zhang Y A Water-Mediated and Substrate-Assisted Catalytic Mechanism for Sulfolobus solfataricus DNA Polymerase IV. J. Am. Chem. Soc 2007, 129, 4731–4737.
  • (16).Rosta E; Nowotny M; Yang W; Hummer G Catalytic Mechanism of RNA Backbone Cleavage by Ribonuclease H from Quantum Mechanics/Molecular Mechanics Simulations. J. Am. Chem. Soc 2011, 133, 8934–8941.
  • (17).Wong K-Y; Gu H; Zhang S; Piccirilli JA; Harris ME; York DM Characterization of the Reaction Path and Transition States for RNA Transphosphorylation Models from Theory and Experiment. Angew. Chem. Int. Ed 2012, 51, 647–651.
  • (18).Ganguly A; Thaplyal P; Rosta E; Bevilacqua PC; Hammes-Schiffer S Quantum Mechanical/Molecular Mechanical Free Energy Simulations of the Self-Cleavage Reaction in the Hepatitis Delta Virus Ribozyme. J. Am. Chem. Soc 2014, 136, 1483–1496.
  • (19).Gao J Absolute Free Energy of Solvation From Monte Carlo Simulations Using Combined Quantum and Molecular Mechanical Potentials. J. Phys. Chem 1992, 96, 537–540.
  • (20).Muller RP; Warshel A Ab Initio Calculations of Free Energy Barriers for Chemical Reactions in Solution. J. Phys. Chem 1995, 99, 17516–17524.
  • (21).Heimdal J; Ryde U Convergence of QM/MM free-energy perturbations based on molecular-mechanics or semiempirical simulations. Phys. Chem. Chem. Phys 2012, 14, 12592.
  • (22).Li P; Jia X; Pan X; Shao Y; Mei Y Accelerated Computation of Free Energy Profile at ab Initio Quantum Mechanical/Molecular Mechanics Accuracy via a Semi-Empirical Reference Potential. I. Weighted Thermodynamics Perturbation. J. Chem. Theory Comput 2018, 14, 5583–5596.
  • (23).Pan X; Li P; Ho J; Pu J; Mei Y; Shao Y Accelerated Computation of Free Energy Profile at ab initio Quantum Mechanical/Molecular Mechanical Accuracy via a Semi-Empirical Reference Potential. II. Recalibrating Semi-Empirical Parameters with Force Matching. Phys. Chem. Chem. Phys 2019, 21, 20595–20605.
  • (24).Hu W; Li P; Wang J-N; Xue Y; Mo Y; Zheng J; Pan X; Shao Y; Mei Y Accelerated Computation of Free Energy Profile at Ab Initio Quantum Mechanical/Molecular Mechanics Accuracy via a Semiempirical Reference Potential. 3. Gaussian Smoothing on Density-of-States. J. Chem. Theory Comput 2020, 16, 6814–6822.
  • (25).Wang J-N; Liu W; Li P; Mo Y; Hu W; Zheng J; Pan X; Shao Y; Mei Y Accelerated Computation of Free Energy Profile at Ab Initio Quantum Mechanical/Molecular Mechanics Accuracy via a Semiempirical Reference Potential. 4. Adaptive QM/MM. J. Chem. Theory Comput 2021, 17, 1318–1325.
  • (26).Ruiz-Pernía JJ; Silla E; Tuñón I; Martí S; Moliner V Hybrid QM/MM Potentials of Mean Force with Interpolated Corrections. J. Phys. Chem. B 2004, 108, 8427–8433.
  • (27).Martí S; Moliner V; Tuñón I Improving the QM/MM Description of Chemical Processes: A Dual Level Strategy To Explore the Potential Energy Surface in Very Large Systems. J. Chem. Theory Comput 2005, 1, 1008–1016.
  • (28).Tuckerman M; Berne BJ; Martyna GJ Reversible Multiple Time Scale Molecular Dynamics. J. Chem. Phys 1992, 97, 1990–2001.
  • (29).Martyna GJ; Tuckerman ME; Tobias DJ; Klein ML Explicit Reversible Integrators for Extended Systems Dynamics. Mol. Phys 1996, 87, 1117–1157.
  • (30).Leimkuhler B; Margul DT; Tuckerman ME Stochastic, Resonance-Free Multiple Time-Step Algorithm for Molecular Dynamics with Very Large Time Steps. Mol. Phys 2013, 111, 3579–3594.
  • (31).Margul DT; Tuckerman ME A Stochastic, Resonance-Free Multiple Time-Step Algorithm for Polarizable Models That Permits Very Large Time Steps. J. Chem. Theory Comput 2016, 12, 2170–2180.
  • (32).Barth E; Schlick T Overcoming Stability Limitations in Biomolecular Dynamics. I. Combining Force Splitting via Extrapolation with Langevin Dynamics in LN. J. Chem. Phys 1998, 109, 1617–1632.
  • (33).Barth E; Schlick T Extrapolation Versus Impulse in Multiple-Timestepping Schemes. II. Linear Analysis and Applications to Newtonian and Langevin Dynamics. J. Chem. Phys 1998, 109, 1633–1642.
  • (34).Nam K Acceleration of Ab Initio QM/MM Calculations under Periodic Boundary Conditions by Multiscale and Multiple Time Step Approaches. J. Chem. Theory Comput 2014, 10, 4175–4183.
  • (35).Chen Y; Kale S; Weare J; Dinner AR; Roux B Multiple Time-Step Dual-Hamiltonian Hybrid Molecular Dynamics – Monte Carlo Canonical Propagation Algorithm. J. Chem. Theory Comput 2016, 12, 1449–1458.
  • (36).Liberatore E; Meli R; Rothlisberger U A Versatile Multiple Time Step Scheme for Efficient ab Initio Molecular Dynamics Simulations. J. Chem. Theory Comput 2018, 14, 2834–2842.
  • (37).Pan X; Epifanovsky E; Liu J; Pu J; Nam K; Shao Y Accelerating ab initio QM/MM Molecular Dynamics Simulations With Multiple TimeStep Integration and a Recalibrated Semi-empirical QM/MM Hamiltonian To Be Submitted
  • (38).Zhou Y; Pu J Reaction Path Force Matching: A New Strategy of Fitting Specific Reaction Parameters for Semiempirical Methods in Combined QM/MM Simulations. J. Chem. Theory Comput 2014, 10, 3038–3054.
  • (39).Kim B; Snyder R; Nagaraju M; Zhou Y; Ojeda-May P; Keeton S; Hege M; Shao Y; Pu J Reaction Path-Force Matching in Collective Variables: Determining Ab Initio QM/MM Free Energy Profiles by Fitting Mean Force. J. Chem. Theory Comput 2021, in press.
  • (40).Shen L; Wu J; Yang W Multiscale Quantum Mechanics/Molecular Mechanics Simulations with Neural Networks. J. Chem. Theory Comput 2016, 12, 4934–4946.
  • (41).Wu J; Shen L; Yang W Internal Force Corrections with Machine Learning for Quantum Mechanics/Molecular Mechanics Simulations. J. Chem. Phys 2017, 147, 161732.
  • (42).Shen L; Yang W Molecular Dynamics Simulations with Quantum Mechanics/Molecular Mechanics and Adaptive Neural Networks. J. Chem. Theory Comput 2018, 14, 1442–1455.
  • (43).Gastegger M; Schütt KT; Müller K-R Machine learning of solvent effects on molecular spectra and reactions 2020.
  • (44).Zeng J; Giese TJ; Ekesan S; York DM Development of Range-Corrected Deep Learning Potentials for Fast, Accurate Quantum Mechanical/molecular Mechanical Simulations of Chemical Reactions in Solution; preprint, 2021.
  • (45).Böselt L; Thürlemann M; Riniker S Machine Learning in QM/MM Molecular Dynamics Simulations of Condensed-Phase Systems. J. Chem. Theory Comput 2021, 17, 2641–2658.
  • (46).Behler J; Parrinello M Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces. Phys. Rev. Lett 2007, 98, 146401.
  • (47).Behler J Atom-Centered Symmetry Functions for Constructing High-Dimensional Neural Network Potentials. J. Chem. Phys 2011, 134, 074106.
  • (48).Behler J Constructing High-Dimensional Neural Network Potentials: A Tutorial Review. Int. J. Quantum Chem 2015, 115, 1032–1050.
  • (49).Rupp M; Tkatchenko A; Müller K-R; von Lilienfeld OA Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Phys. Rev. Lett 2012, 108, 058301.
  • (50).Faber FA; Christensen AS; Huang B; von Lilienfeld OA Alchemical and Structural Distribution Based Representation for Universal Quantum Machine Learning. J. Chem. Phys 2018, 148, 241717.
  • (51).Huo H; Rupp M Unified Representation of Molecules and Crystals for Machine Learning. arXiv preprint 2018, arXiv:1704.06439.
  • (52).Zhang L; Han J; Wang H; Saidi W; Car R; E, W. End-to-end Symmetry Preserving Inter-atomic Potential Energy Model for Finite and Extended Systems. Advances in Neural Information Processing Systems 2018.
  • (53).Wang H; Zhang L; Han J; E W DeePMD-kit: A Deep Learning Package for Many-Body Potential Energy Representation and Molecular Dynamics. Comput. Phys. Commun 2018, 228, 178–184.
  • (54).Unke OT; Meuwly M PhysNet: A Neural Network for Predicting Energies, Forces, Dipole Moments, and Partial Charges. J. Chem. Theory Comput 2019, 15, 3678–3693.
  • (55).Zhang Y; Wang H; Chen W; Zeng J; Zhang L; Wang H; E W DP-GEN: A Concurrent Learning Platform for the Generation of Reliable Deep Learning Based Potential Energy Models. Comput. Phys. Commun 2020, 253, 107206.
  • (56).Kim B; Shao Y; Pu J Doubly Polarized QM/MM with Machine Learning Chaperone Polarizability Submitted for Publication
  • (57).York DM; Karplus M A Smooth Solvation Potential Based on the Conductor-Like Screening Model. J. Phys. Chem. A 1999, 103, 11060–11079.
  • (58).Lange AW; Herbert JM A Smooth, Nonsingular, and Faithful Discretization Scheme for Polarizable Continuum Models: The Switching/Gaussian Approach. J. Chem. Phys 2010, 133, 244111.
  • (59).Pan X; Nam K; Epifanovsky E; Simmonett AC; Rosta E; Shao Y A Simplified Charge Projection Scheme for Long-Range Electrostatics in ab initio QM/MM Calculations. J. Chem. Phys 2021, 154, 024115.
  • (60).Riccardi D; Li G; Cui Q Importance of van der Waals Interactions in QM/MM Simulations. J. Phys. Chem. B 2004, 108, 6467–6478.
  • (61).Riccardi D; Schaefer P; Yang; Yu H; Ghosh N; PratResina X; König P; Li G; Xu D; Guo H; Elstner M; Cui Q Development of Effective Quantum Mechanical/Molecular Mechanical (QM/MM) Methods for Complex Biological Processes. J. Phys. Chem. B 2006, 110, 6458–6469.
  • (62).Freindorf M; Shao Y; Furlani TR; Kong J Lennard-Jones Parameters for the Combined QM/MM Method Using the B3LYP/6–31G*/AMBER Potential. J. Comput. Chem 2005, 26, 1270–1278.
  • (63).Darden T; York D; Pedersen L Particle Mesh Ewald: An N·log(N) Method for Ewald Sums in Large Systems. J. Chem. Phys 1993, 98, 10089–10092.
  • (64).Essmann U; Perera L; Berkowitz ML; Darden T; Lee H; Pedersen LG A Smooth Particle Mesh Ewald Method. J. Chem. Phys 1995, 103, 8577–8593.
  • (65).Jorgensen WL; Chandrasekhar J; Madura JD; Impey RW; Klein ML Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys 1983, 79, 926–935.
  • (66).Wang J; Wolf RM; Caldwell JW; Kollman PA; Case DA Development and Testing of a General Amber Force Field. J. Comput. Chem 2004, 25, 1157–1174.
  • (67).Chook YM; Ke H; Lipscomb WN Crystal Structures of the Monofunctional Chorismate Mutase from Bacillus Subtilis and Its Complex with a Transition State Analog. Proc. Natl. Acad. Sci 1993, 90, 8600–8603.
  • (68).Maier JA; Martinez C; Kasavajhala K; Wickstrom L; Hauser KE; Simmerling C ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput 2015, 11, 3696–3713.
  • (69).Ryckaert J-P; Ciccotti G; Berendsen HJ Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes. J. Comput. Phys 1977, 23, 327–341.
  • (70).Case DA; Belfon K; Ben-Shalom IY; Brozell SR; Cerutti DS; Cheatham TE III,; Cruzeiro VWD; Darden TA; Duke RE; Giambasu G; Gilson MK; Gohlke H; Goetz AW; Harris R; Izadi S; Izmailov SA; Kasavajhala K; Kovalenko A; Krasny R; Kurtzman T; Lee TS; LeGrand S; Li P; Lin C; Liu J; Luchko T; Luo R; Man V; Merz KM; Miao Y; Mikhailovskii O; Monard G; Nguyen H; Onufriev A; Pan F; Pantano S; Qi R; Roe DR; Roitberg A; Sagui C; Schott-Verdugo S; Shen J; Simmerling C; Skrynnikov NR; Smith J; Swails J; Walker RC; Wang J; Wilson L; Wolf RM; Wu X; Xiong Y; Xue Y; York DM; Kollman PA AMBER 2020, University of California, San Francisco. 2020.
  • (71).Becke AD Density-Functional Exchange-Energy Approximation with Correct Asymptotic Behavior. Phys. Rev. A 1988, 38, 3098–3100.
  • (72).Becke AD A New Mixing of Hartree–Fock and Local Density-Functional Theories. J. Chem. Phys 1993, 98, 1372–1377.
  • (73).Lee C; Yang W; Parr RG Development of the Colle-Salvetti Correlation-Energy Formula into a Functional of the Electron Density. Phys. Rev. B 1988, 37, 785–789.
  • (74).Hariharan PC; Pople JA The Influence of Polarization Functions on Molecular Orbital Hydrogenation Energies. Theor. Chim. Acta 1973, 28, 213–222.
  • (75).Stewart JJP Optimization of Parameters for Semiempirical Methods II. Applications. J. Comput. Chem 1989, 10, 221–264.
  • (76).Shao Y; Gan Z; Epifanovsky E; Gilbert AT; Wormit M; Kussmann J; Lange AW; Behn A; Deng J; Feng X; Ghosh D; Goldey M; Horn PR; Jacobson LD; Kaliman I; Khaliullin RZ; Kuś T; Landau A; Liu J; Proynov EI; Rhee YM; Richard RM; Rohrdanz MA; Steele RP; Sundstrom EJ; Woodcock HL; Zimmerman PM; Zuev D; Albrecht B; Alguire E; Austin B; Beran GJO; Bernard YA; Berquist E; Brandhorst K; Bravaya KB; Brown ST; Casanova D; Chang C-M; Chen Y; Chien SH; Closser KD; Crittenden DL; Diedenhofen M; DiStasio RA; Do H; Dutoi AD; Edgar RG; Fatehi S; Fusti-Molnar L; Ghysels A; Golubeva-Zadorozhnaya A; Gomes J; Hanson-Heine MW; Harbach PH; Hauser AW; Hohenstein EG; Holden ZC; Jagau T-C; Ji H; Kaduk B; Khistyaev K; Kim J; Kim J; King RA; Klunzinger P; Kosenkov D; Kowalczyk T; Krauter CM; Lao KU; Laurent AD; Lawler KV; Levchenko SV; Lin CY; Liu F; Livshits E; Lochan RC; Luenser A; Manohar P; Manzer SF; Mao S-P; Mardirossian N; Marenich AV; Maurer SA; Mayhall NJ; Neuscamman E; Oana CM; Olivares-Amaya R; O’Neill DP; Parkhill JA; Perrine TM; Peverati R; Prociuk A; Rehn DR; Rosta E; Russ NJ; Sharada SM; Sharma S; Small DW; Sodt A; Stein T; Stück D; Su Y-C; Thom AJ; Tsuchimochi T; Vanovschi V; Vogt L; Vydrov O; Wang T; Watson MA; Wenzel J; White A; Williams CF; Yang J; Yeganeh S; Yost SR; You Z-Q; Zhang IY; Zhang X; Zhao Y; Brooks BR; Chan GK; Chipman DM; Cramer CJ; Goddard WA; Gordon MS; Hehre WJ; Klamt A; Schaefer HF; Schmidt MW; Sherrill CD; Truhlar DG; Warshel A; Xu X; Aspuru-Guzik A; Baer R; Bell AT; Besley NA; Chai J-D; Dreuw A; Dunietz BD; Furlani TR; Gwaltney SR; Hsu C-P; Jung Y; Kong J; Lambrecht DS; Liang W; Ochsenfeld C; Rassolov VA; Slipchenko LV; Subotnik JE; Voorhis TV; Herbert JM; Krylov AI; Gill PM; Head-Gordon M Advances in Molecular Quantum Chemistry Contained in the Q-Chem 4 Program Package. Mol. Phys 2014, 113, 184–215.
  • (77).Fukunishi H; Watanabe O; Takada S On the Hamiltonian Replica Exchange Method for Efficient Sampling of Biomolecular Systems: Application to Protein Structure Prediction. J. Chem. Phys 2002, 116, 9058–9067.
  • (78).Shirts MR; Chodera JD Statistically Optimal Analysis of Samples from Multiple Equilibrium States. J. Chem. Phys 2008, 129, 124105.
  • (79).Pinnick ER; Calderon CE; Rusnak AJ; Wang F Achieving Fast Convergence of ab initio Free Energy Perturbation Calculations with the Adaptive Force-Matching Method. Theor. Chem. Acc 2012, 131.
  • (80).Ryde U How Many Conformations Need To Be Sampled To Obtain Converged QM/MM Energies? The Curse of Exponential Averaging. J. Chem. Theory Comput 2017, 13, 5745–5752.
  • (81).Boresch S; Woodcock HL Convergence of Single-Step Free Energy Perturbation. Mol. Phys 2017, 115, 1200–1213.
