Abstract
Meaningful efforts in computer-aided drug design (CADD) need accurate molecular mechanical force fields to quantitatively characterize protein-ligand interactions, ligand hydration free energies and other ligand physical properties. Atomic models of new compounds are commonly generated by analogy from the pre-defined tabulated parameters of a given force field. Two widely used approaches following this strategy are the General Amber Force Field (GAFF) and the CHARMM General Force Field (CGenFF). An important limitation of using pre-tabulated parameter values is that they may be inadequate in the context of a specific molecule. To resolve this issue, we previously introduced the General Automated Atomic Model Parameterization (GAAMP) for automatically generating the parameters of atomic models of small molecules using the results from ab initio quantum mechanical (QM) calculations as target data. The GAAMP protocol uses QM data to optimize the bond, valence angle, and dihedral angle internal parameters, and atomic partial charges. However, because the treatment of van der Waals interactions based on QM is challenging and may often be unreliable, the Lennard-Jones 6-12 parameters are kept unchanged from the initial atom-type assignments (GAFF or CGenFF), which limits the accuracy that can be achieved by these models. To address this issue, a new set of van der Waals Lennard-Jones 6-12 parameters was systematically optimized to reproduce experimental neat liquid densities and enthalpies of vaporization for a large set of 430 compounds covering a wide range of chemical functionalities. Calculations of the hydration free energy indicate that optimal accuracy for these models is achieved when the molecule-water van der Waals dispersion is rescaled by a factor of 1.115. The final optimized model yields an average unsigned error in the hydration free energies of 0.79 kcal/mol.
Section: Molecular Mechanics, Enthalpy, Liquid density, solvation, free energy, hydration
Introduction
Molecular dynamics (MD) simulations based on classical molecular mechanical (MM) force fields are increasingly used to provide atomic-level insights in studies of biological phenomena.1 Force fields aim to represent the quantum-mechanical (QM) Born-Oppenheimer (BO) potential energy surface of molecular systems in terms of physically meaningful components that are modeled via a combination of simple analytic functions with multiple parameters. While force fields that explicitly treat induced polarization are progressing rapidly,2-3 non-polarizable additive potential functions that represent polarization in an average manner with effective fixed partial charges achieve a reasonably accurate representation of the condensed phase. To date, the most widely used additive force fields for MD simulations of biomolecular systems are CHARMM,4-8 AMBER,9 OPLS,10 and GROMOS.11 These force fields were empirically optimized to reproduce a number of calculated QM and experimental properties for basic biological constituents, including proteins, nucleic acids, carbohydrates and lipids.12 As a result, they cover only a fairly restricted set of chemical functionalities, and models of additional compounds are typically generated by chemical similarity. One of the challenges encountered in computer-aided drug design (CADD) arises when models are needed for compounds that have no close analogs within the existing biomolecular force fields. Meaningful CADD efforts require accurate molecular mechanical force fields to quantitatively characterize protein-ligand interactions.13-17 The best way to address this issue is to have an objective algorithmic procedure to automatically parameterize an arbitrary molecule from pre-defined tabulated values in a manner that is consistent with a given force field.13, 18-20
Currently, two of the most widely used such procedures are the general Amber force field (GAFF)13, 21 and the CHARMM general force field (CGenFF).14, 20 The program Antechamber22 in AmberTools automatically parameterizes small compounds in accord with GAFF;13, 21 atom types and internal parameters (bonds, angles, dihedrals and improper dihedrals) of a given compound are assigned automatically from tabulated values according to an AMBER-consistent classification, while atomic charges are fitted to match the results of quantum mechanical (QM) or semi-empirical calculations.23 CGenFF generates CHARMM-consistent force field parameters for small compounds and drug-like molecules according to knowledge-based rules.14, 20, 24 These computational tools represent important advances that greatly broaden the range of chemical systems that can be studied with simulations by enabling an objective and automatic parametrization of novel molecules. More importantly, such procedures avoid the subjective manual adjustments of force field parameters, which ultimately undermine the predictive value of computations based on atomic models. As an extension of these methods aimed at achieving an automatic parametrization for small molecules, we introduced the General Automated Atomic Model Parameterization (GAAMP),25 which was designed to determine force field parameters using ab initio QM calculations as the primary target data. The method has been implemented in a web server (GAAMP, http://gaamp.lcrc.anl.gov/) and can also be set up locally as a script (https://github.com/gaamp). The overall algorithm comprises four main steps. First, GAAMP generates starting atomic models consistent with the atom-type assignments from either GAFF13, 21 or CGenFF.14, 20 Second, GAAMP optimizes the geometry using QM and then refines the internal structural parameters (bonds and angles).
Third, GAAMP optimizes the atomic partial charges by simultaneously seeking to match the QM electrostatic potential (ESP) as well as compound-water interactions determined from QM calculations. Fourth, GAAMP automatically identifies the “soft” dihedrals (i.e., those with low energy barriers that are most likely to undergo conformational change), and then iteratively refines the associated dihedral parameters on the basis of QM data.25
The resulting GAAMP models retain the internal topology and the van der Waals (i.e., Lennard-Jones (LJ)) parameters from the initial force field (GAFF or CGenFF), but with bond, angle, dihedral, and atomic partial charge parameters re-optimized on the basis of QM data. The bond, angle and dihedral parameters are optimized from QM data with high confidence, and avoid the context-dependent limitations that necessarily arise from tabulated standard values. Similarly, the electrostatic charges of the model can be determined with high confidence by combining ESP9, 26 and water interactions,4 which yields more accurate models.25 The GAAMP protocol makes the best use of QM data to determine all the needed parameters, free from the constraint of tabulated parameters that may be inadequate in the context of a specific molecule. It is important to note, however, that the LJ parameters are directly transferred from the initial atom-type assignments of GAFF or CGenFF. By choice, the LJ parameters are not modified by the GAAMP algorithm because an automated treatment of van der Waals interactions based on QM is challenging and may often be unreliable. Although GAFF and CGenFF were certainly developed with great care, this remains one aspect of the final GAAMP model that may be a concern because the properties of solvated systems can be highly sensitive to very small changes in the LJ parameters. This situation severely limits the accuracy that can be achieved by automated parametrization procedures to model arbitrary small drug-like molecules.
The goal of the present work is to derive a set of optimal LJ parameters, which dominate the thermodynamic properties of pure liquids,27 adjusted to reproduce experimental neat liquid densities and enthalpies of vaporization27 and hydration free energies for a large set of more than 400 compounds covering a wide range of chemical functionalities. To date, empirical optimization of LJ parameters targeting experimental data has yielded models that accurately reproduce the targeted set of data, though they may be limited in their ability to satisfactorily reproduce a wider range of experimental data. There have been a number of attempts to improve the ability of force fields to describe experimental observables, especially in the context of biomolecules. Head-Gordon and coworkers optimized solvent-water van der Waals interactions to reproduce experimental hydration free energies for a set of 47 small molecules that incorporated all of the chemical functionalities of standard protein side chains and backbone groups.28 In the present effort we extend that approach by starting with the known GAFF or CGenFF LJ parameters, which act as a high-quality initial guess, and optimize the parameters in a redundant fashion (i.e., multiple molecules contain the same LJ atom type) targeting an extended set of experimental data. This effort involved MD simulations of neat liquids conducted to calculate densities and enthalpies of vaporization for more than 400 compounds with known experimental values. Subsequently, free energy perturbation simulations were carried out to calculate the hydration free energies for a set of 426 compounds.
Method
The functional form of the potential function used in the parametrization is,29
(1) |
With some small variations regarding the internal energy terms, this functional form is essentially the same as that used in the non-polarizable AMBER30 and CHARMM4 force fields; e.g., CHARMM includes Urey-Bradley terms that are absent from AMBER. The most important difference concerns the 1-4 non-bonded charge-charge interactions, which are scaled by a factor of 0.833 in the AMBER force field and by a factor of 1.0 in the CHARMM force field. The choice of 1-4 non-bonded scaling factor affects potential energies about dihedral angles, such that the associated dihedral parameters must be optimized in a fashion that is consistent with this choice. Unless specified otherwise, the LJ parameters for pairs of atoms i and j are constructed using the Lorentz-Berthelot combination rule,31 $E_{\min,ij} = \sqrt{E_{\min,i}\,E_{\min,j}}$ and $R_{\min,ij} = (R_{\min,i} + R_{\min,j})/2$.
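The Lorentz-Berthelot rule and the resulting pair interaction can be illustrated with a minimal sketch. It assumes the LJ 6-12 term is written in the Emin/Rmin form, E_min[(R_min/r)^12 - 2(R_min/r)^6], with a positive well depth and with R_min taken as the full minimum-energy distance of the like pair; the function names and the numerical values in the usage note are illustrative only and are not taken from the parameter set.

```python
import math


def lorentz_berthelot(emin_i, rmin_i, emin_j, rmin_j):
    """Combine per-atom LJ parameters into pair parameters.

    emin_*: well depths in kcal/mol (positive numbers, an assumed convention).
    rmin_*: minimum-energy distances of the like pairs in Angstrom.
    """
    emin_ij = math.sqrt(emin_i * emin_j)   # geometric mean of the well depths
    rmin_ij = 0.5 * (rmin_i + rmin_j)      # arithmetic mean of the radii
    return emin_ij, rmin_ij


def lj_energy(r, emin_ij, rmin_ij):
    """LJ 6-12 pair energy; equals -emin_ij at r = rmin_ij."""
    x6 = (rmin_ij / r) ** 6
    return emin_ij * (x6 * x6 - 2.0 * x6)
```

For example, `lorentz_berthelot(0.1, 3.8, 0.4, 3.4)` gives a pair well depth of about 0.2 kcal/mol and a pair distance of about 3.6 Å, and `lj_energy` evaluated at that distance returns the negative of the well depth.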
The GAAMP algorithm25 was used to generate initial molecular mechanics models based on the General Amber Force Field (GAFF)13, 21 for all the small compounds. Starting from a structure file in Protein Data Bank (PDB) or mol2 format comprising all atoms, GAAMP parametrization proceeded in three main steps: (1) verification and adjustment of equilibrium bond length and angle parameters, (2) charge fitting using QM target data including ESP and specific interactions with water molecules, and (3) dihedral parameter fitting using QM target data. Previous results suggest that including compound-water interactions as target data can substantially improve the quality of partial charges derived from RESP when a compound has hydrogen-bond donors/acceptors.25 All the ab initio calculations required by GAAMP were performed with the program Gaussian 09,32 though other QM codes could be utilized. AM1 was used for the pre-optimization of the initial structure before calling Antechamber21 to generate an initial force field based on GAFF13, 21 with AM1-BCC charges.23 In the following, this initial model is referred to as “GAFF/AM1-BCC”. For GAAMP models, calculations at the HF/6-31G* level were used for geometry optimization as well as ESP calculation. The interactions between water and the target molecule to parametrize were calculated at the HF/6-31G* level without BSSE correction, following the recommended prescription as developed in the context of the CHARMM additive force field.5, 14, 20, 24 This involves scaling the target HF/6-31G* interaction energies by 1.16 in the case of neutral species, while no scaling (scale factor = 1.0) is used for charged species. Calculations at the HF/6-31G* or MP2/6-31G* level were used to perform adiabatic 1D dihedral potential energy scans (PES), in which a single dihedral angle was constrained with geometry optimization of the remaining degrees of freedom at each step in the PES.
The gradient-based optimizer L-BFGS33-34 was used for charge and dihedral parameter optimization. An augmented Lagrangian algorithm35-36 combined with L-BFGS33-34 was used for the geometry optimization with constraints on selected soft dihedrals for GAAMP.
To optimize the complete set of LJ well-depth and radii parameters (p1, p2, …, pn) associated with all the atom types, we construct an objective function,
(2) |
where the sum runs over all the molecules m in the training set, $V_m^{\mathrm{calc}}$ and $V_m^{\mathrm{exp}}$ are the calculated and experimental pure solvent molecular (specific) volumes for molecule m, respectively, $\Delta H_m^{\mathrm{calc}}$ and $\Delta H_m^{\mathrm{exp}}$ are the calculated and experimental enthalpies of vaporization for molecule m, respectively, and WV and WΔH are the statistical weights ascribed to these two properties in the objective function. Equal weights were used in the present optimization. The molecular volume is calculated from MD simulations as the average total volume of the liquid box divided by the number of molecules in the system, V = 〈Vbox〉/N, and the enthalpy of vaporization is calculated as,
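(3) $\Delta H_{\mathrm{vap}} = \langle u_{\mathrm{gas}} \rangle - \langle u_{\mathrm{liquid}} \rangle + k_{\mathrm{B}}T$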
where 〈ugas〉 is the average potential energy of one isolated molecule in the gas phase and 〈uliquid〉 is the average potential energy per molecule in the liquid box. The derivative of the objective function with respect to an arbitrary LJ parameter, pi can be expressed as,
(4) |
where the derivative of a property Q (molecular volume or the potential energy) with respect to the parameter pi can be expressed as,
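(5) $\dfrac{\partial \langle Q \rangle}{\partial p_i} = \left\langle \dfrac{\partial Q}{\partial p_i} \right\rangle - \dfrac{1}{k_{\mathrm{B}}T}\left[ \left\langle Q\,\dfrac{\partial U}{\partial p_i} \right\rangle - \langle Q \rangle \left\langle \dfrac{\partial U}{\partial p_i} \right\rangle \right]$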
It should be noted that the first term 〈∂Q/∂pi〉 is non-zero only if Q depends explicitly on the parameter pi (i.e., this term is absent for the molecular volume but not for the enthalpy of vaporization).
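The parameter derivative of an ensemble average can be estimated directly from per-frame quantities saved during a simulation, as the sketch below illustrates. It makes two assumptions that the text does not confirm: the derivative takes the standard fluctuation form (explicit term minus a covariance with ∂U/∂p divided by kBT), and the objective function uses squared relative deviations. All function and argument names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def ensemble_derivative(q, dq_dp, du_dp, kT):
    """Estimate d<Q>/dp from per-frame samples.

    q      : property Q for each frame
    dq_dp  : explicit derivative dQ/dp for each frame (zeros for the volume)
    du_dp  : derivative of the potential energy U with respect to p
    kT     : thermal energy in the same units as U
    """
    covariance = mean([a * b for a, b in zip(q, du_dp)]) - mean(q) * mean(du_dp)
    return mean(dq_dp) - covariance / kT


def objective(v_calc, v_exp, h_calc, h_exp, w_v=1.0, w_h=1.0):
    """Weighted sum of squared relative deviations (an assumed form)."""
    total = 0.0
    for vc, ve, hc, he in zip(v_calc, v_exp, h_calc, h_exp):
        total += w_v * (vc / ve - 1.0) ** 2 + w_h * (hc / he - 1.0) ** 2
    return total
```

Because the covariance is a difference of two noisy averages, an estimator of this kind converges slowly with simulation length, which is consistent with the noisy gradients reported for the optimization.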
The optimization of the LJ parameters aims at better reproducing the experimental densities and enthalpies of vaporization for a large training set of 430 neutral model compounds. The gradient-based optimizer L-BFGS33-34 was used for charge and dihedral parameter optimization, while in-house programs to drive the LJ optimization were written in C++, bash shell script and Python. Analytical gradients were used for the L-BFGS optimizer. To determine the average properties of the neat liquid of each compound, a box comprising roughly 200-300 molecules (volume of roughly 35 × 35 × 35 Å3) was constructed and simulated for 2 ns with MD under periodic boundary conditions (PBC). The simulation length was progressively increased to 5 ns in the final stages of optimization to reduce the statistical uncertainty. At every cycle of optimization, the MD simulations were started from the previously equilibrated systems with new LJ parameters. The systems were simulated at constant pressure and temperature using a Langevin thermostat and a Langevin piston.37 Long-range electrostatic interactions were computed using particle mesh Ewald (PME) summation38-39 with an Ewald splitting parameter of 0.34 Å-1, a grid spacing of ∼0.6 Å, and a sixth-order interpolation of the charge to the grid. Non-bonded van der Waals interactions were smoothly switched to zero between 10 and 12 Å, and a long-range correction for missing van der Waals dispersion interactions was applied.40 The RATTLE algorithm42 was used to fix the length of the bonds connecting heavy atoms and hydrogen atoms in the compound. All MD simulations were carried out using the program NAMD.43
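The two neat-liquid target properties follow from simple averages over the simulation output. The sketch below assumes that energies are in kcal/mol, that the liquid energies are totals for the whole box, and that the enthalpy of vaporization includes the usual kBT term for an ideal gas; the temperature is passed in as an argument because the simulation temperature is not stated here, and the names are hypothetical.

```python
KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def molecular_volume(box_volumes, n_molecules):
    """V = <V_box> / N, with box volumes in cubic Angstrom."""
    return mean(box_volumes) / n_molecules


def enthalpy_of_vaporization(u_gas, u_liquid_box, n_molecules, temperature):
    """<u_gas> - <u_liquid> + kB*T, per molecule, in kcal/mol.

    u_gas        : potential energies of one isolated molecule, per frame
    u_liquid_box : total potential energies of the liquid box, per frame
    """
    u_liquid = mean(u_liquid_box) / n_molecules
    return mean(u_gas) - u_liquid + KB * temperature
```

As a plausibility check, a 35 × 35 × 35 Å3 box holding 250 molecules corresponds to a molecular volume of 171.5 Å3.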
The initial LJ parameters for all the compounds in the training set were based on GAFF,13, 21 while the atomic charges and dihedral parameters were determined using GAAMP.25 These initial LJ parameters were then optimized in two distinct stages. In the first stage, the liquid properties for all the compounds of the training set were recalculated at each optimization step using the new set of LJ parameters, while the charges and dihedral parameters determined from the initial GAAMP parametrization were kept unchanged. In the second stage, the parameters were further optimized according to the same procedure, but the charges and dihedral parameters were periodically re-generated using GAAMP to ensure full consistency of the models. A total of three full cycles of optimization were carried out on all the atom types in this second stage.
The LJ parameters of almost all the GAFF atom types were optimized, including H, C, N, O, S, F, Cl, and Br; iodine (I) was not included owing to a lack of experimental data. A small number of new atom types were added for C, Cl, H, and O when deemed necessary. The parameters of hydrogen atoms were normally optimized together with the heavy atoms to which they are bonded, with the exception of the polar hydroxyl H, which was assigned a zero LJ well depth and radius by convention. A progressive strategy was adopted in which seven main groups of compounds with different chemical functionalities were considered. Group 1 comprises H and C atoms (9 atom types) from nonpolar alkanes (24 compounds); group 2 comprises H and C atoms (2 atom types) in aromatic rings (14 compounds); group 3 comprises additional H and C atoms (6 atom types) in other compounds with only CH elements (32 compounds); group 4 comprises carbon, oxygen and associated polar H atoms (9 atom types) in other compounds with only CHO elements, including alcohols, phenols, ketones, aldehydes, and carboxylic acids (148 compounds); groups 5 and 6 comprise N and associated polar H atoms (4 atom types) in compounds with only CHN elements (71 compounds); and finally, group 7 comprises S, F, Cl, Br and associated H atoms (4 atom types) in sulfur- and halogen-containing compounds (93 compounds). A total of 382 compounds were considered for stage 1 of the optimization. In stage 1, about 10 to 20 iterations of LJ optimization were carried out for each group. No noticeable improvement of the objective function could be achieved beyond 10-20 iterations because the gradients are noisy. An additional subset of about 78 compounds that were not included in the training set was also considered periodically to check the overall consistency and the correctness of the experimental data. Creation of new atom types from the initial GAFF set was deemed necessary when large disparities were observed.
The LJ parameters were further optimized according to the same progressive strategy in the second stage, carrying out about 5-10 steps before regenerating the charges and dihedrals via GAAMP. The complete optimization resulted in new LJ parameters for a total of 41 atom types (See file in Supp. Information).
To provide an additional test of the set of optimized LJ parameters, we calculated the hydration free energy of 426 compounds using the new GAAMP models with optimized LJ. The absolute solvation free energy of the compounds was calculated and decomposed into three components (repulsive, dispersive and charge terms) following an alchemical free energy perturbation (FEP) replica-exchange molecular dynamics (REMD) simulation protocol developed in our group.44-46 The systems were simulated with PBC under conditions of constant pressure and constant temperature with PME summation.38-39 A replica-exchange method45, 47 was used to enhance the sampling and achieve better convergence. The implementation of the alchemical FEP/REMD45 in NAMD48 was utilized for these calculations. The compounds were solvated in a cubic box of TIP3P water molecules49 with an edge of 20 Å. The CHARMM version of the TIP3P water model was used, in which small LJ potentials are ascribed to the two hydrogens.4 For each value of the thermodynamic coupling parameter, λ, equilibrium properties were averaged over a 500 ps MD simulation after an initial equilibration of 300 ps. Exchanges of neighboring replicas were attempted every 200 fs. The weighted histogram analysis method (WHAM)50 was used in data processing. A long-range correction for missing dispersion interactions40 was added to the calculated solvation free energy. The hydration free energies calculated using the standard GAFF/AM1-BCC parameters13, 21, 23 are also included for comparison, relying principally on the values previously reported by Mobley and co-workers.51-52 Those hydration free energies calculated using the standard GAFF/AM1-BCC also include a long-range correction for missing dispersion.40 All the information about the model compounds and the set of optimized LJ parameters is provided in the Supplementary Information.
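Exchanges between neighboring λ replicas are commonly accepted with a Metropolis criterion. The sketch below shows that standard criterion for two replicas at the same temperature, on the assumption that it matches the FEP/REMD protocol cited above; it does not use the NAMD interface, and the function and argument names are hypothetical.

```python
import math
import random


def exchange_probability(u_a_xa, u_b_xb, u_a_xb, u_b_xa, kT):
    """Metropolis acceptance probability for swapping two configurations.

    u_a_xa : energy of configuration x_a under the Hamiltonian of replica a
    u_b_xb : energy of configuration x_b under the Hamiltonian of replica b
    u_a_xb : energy of configuration x_b under the Hamiltonian of replica a
    u_b_xa : energy of configuration x_a under the Hamiltonian of replica b
    """
    delta = (u_a_xb + u_b_xa) - (u_a_xa + u_b_xb)
    if delta <= 0.0:
        return 1.0
    return math.exp(-delta / kT)


def attempt_exchange(u_a_xa, u_b_xb, u_a_xb, u_b_xa, kT, rng=random):
    """Return True if the swap is accepted for one random draw."""
    return rng.random() < exchange_probability(u_a_xa, u_b_xb, u_a_xb, u_b_xa, kT)
```

With attempts every 200 fs, a 500 ps production window corresponds to 2500 exchange attempts per replica pair.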
Although the current set of LJ parameters shows high accuracy and transferability in reproducing experimental properties of 430 diverse compounds, it is important to consider the performance of the model for compounds independent from the training set. Intuitively, the overall performance of the model ought to be better, on average, for compounds that are similar to the members of the training set. Inaccuracies and uncertainty should increase for compounds that differ markedly from the training set. Atom types that appear frequently in the training set are more globally constrained by experimental data than atom types that appear only infrequently. Accordingly, we define a simple empirical similarity fitness score of a compound as,
(6) |
where N is the number of atoms in the compound, αi is the atom type ascribed to the atom i, and W(α) is the normalized statistical frequency of the atom type α in the entire training set of 430 compounds. The set of statistical weights W(α) extracted from the training set is shown in Supplementary Figure S1 and also given in Supplementary Information (training_atom_types_weights.xls). The atom types occurring with the highest frequency are hydrogen atoms attached to carbon with 3 (HC33) or 2 (HC32) hydrogens, sp2 aromatic carbon (CA), hydrogen on aromatic carbon (HA), sp3 carbon atoms with 2 (C32) or 3 (C33) hydrogens, hydrogen atoms attached to carbon in a closed ring (HC3), sp3 carbon in a closed ring (C3R), carbonyl oxygen (O) and carbon (C), ether and ester oxygen (OS), and chlorine not on an aromatic ring (CL). As expected, the model yields more accurate results when the empirical similarity fitness score S is larger (Supplementary Figure S2). The error on the molecular volume is generally less than 2-3% for the majority of compounds, but errors larger than 5% are more frequent when S is smaller than 0.11. The error on the enthalpy of vaporization is generally less than 5% for the majority of compounds when S is larger than 0.11, but errors larger than 10% are considerably more frequent when S is smaller than 0.11. On the basis of this analysis, we now consider the performance of the model for compounds from a validation set independent from the training set. To better illustrate the performance of the model, we deliberately selected 12 compounds with a high value of S and 12 compounds with a low value of S (Supplementary Tables S1-S3). The validation set enables us to compare 10 computed neat liquid densities, 9 computed neat liquid enthalpies of vaporization, and 10 computed hydration free energies with experimental data.
Again, the results from this validation set show that the model is reliably more accurate for compounds that display a high degree of similarity to the set of atom types found in the training set. The average error on the molecular volume is 3% for the 5 compounds with S>0.092 but 8% for the 5 compounds with S<0.057 (Supplementary Table S1). The average error on the enthalpy of vaporization is 5% for the 5 compounds with S>0.1 but 11% for the 4 compounds with S<0.062 (Supplementary Table S2). However, the average unsigned error on the hydration free energy is 0.74 kcal/mol for the 5 compounds with S>0.1 and 0.68 kcal/mol for the 5 compounds with S<0.035 (Supplementary Table S3), which suggests that the model is more robust with respect to this property. This is encouraging since the free energy is critical for determining the binding affinity of drugs.
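A score of this kind is straightforward to evaluate once the weights W(α) are tabulated. The sketch below assumes that S is the arithmetic mean of W(α) over the atoms of the compound and that atom types absent from the training set contribute zero; neither assumption is confirmed by the text, and the weights in the usage note are invented for illustration.

```python
def similarity_score(atom_types, weights):
    """Mean training-set frequency of the atom types in one compound.

    atom_types : list of atom-type labels, one per atom
    weights    : dict mapping atom-type label to its normalized frequency W
    """
    if not atom_types:
        raise ValueError("a compound needs at least one atom")
    # Atom types never seen in training are given zero weight (assumption).
    return sum(weights.get(t, 0.0) for t in atom_types) / len(atom_types)
```

With hypothetical weights `{"CA": 0.12, "HA": 0.10}`, a benzene-like list of six `CA` and six `HA` atoms scores about 0.11, while a compound built entirely from unseen atom types scores 0.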
Results and Discussion
The GAAMP25 algorithm was used to generate initial models based on GAFF.13, 21 The dihedral parameters, which are indirectly affected by the 1-4 non-bonded interactions, were kept constant during most of the optimization and re-calculated at each cycle toward the end of the global optimization. However, as these parameters are affected by the 1-4 LJ interactions, they should be re-optimized to match the QM torsion PES every time the LJ parameters are changed. In practice, the impact of the dihedral re-parametrization on the liquid properties is minor. To ensure complete consistency, the dihedral parameters were re-generated in the last iteration of the optimization. A gradient-based optimizer (L-BFGS) was used to systematically optimize the LJ parameters, starting from the GAFF parameters for 41 atom types. The optimization aims at better reproducing the experimental densities and enthalpies for a large training set of 430 model compounds by seeking to minimize the objective function F(p1, p2, …, pn), as expressed in Eq. (2). The target experimental data for the training set of model compounds is given in Supp. Info. As shown in Figure 1, the set of compounds in the training set covers the chemical functionalities and atom types present in drug molecules found in DrugBank.
Global optimization of the LJ parameters was carried out in stages, progressively increasing the chemical complexity of the compounds. For the different groups of compounds described above, about 10-20 iterations were performed, yielding a total of ∼40 iterations. During the optimization, the total error associated with the different atom types was monitored, and new atom types were introduced to better represent specific chemical contexts. In the following we refer to the resulting molecular mechanical force field model as GAAMP/GAFF-LJ*. Moreover, 4 atom types for carbon, hydrogen, oxygen, and chlorine from the initial GAFF model were expanded into 11 new atom types, yielding a total of 52 atom types in the final model. One notable example is the carbon atom type C3 of GAFF, which was expanded to 5 different carbon types: C30 (sp3 carbon atoms with 0 bonded hydrogen atoms), C31 (sp3 carbon atoms with 1 bonded hydrogen atom), C32 (sp3 carbon atoms with 2 bonded hydrogen atoms), C33 (sp3 carbon atoms with 3 bonded hydrogen atoms), and C3R (sp3 carbon in cyclic aliphatic molecules such as cyclohexane). The original C3 atom type is removed from the final force field. In addition, the GAFF hydrogen atom type HC has been expanded to HC31 (hydrogen atom attached to carbon with 1 hydrogen), HC32 (hydrogen atom attached to carbon with 2 hydrogens), HC33 (hydrogen atom attached to carbon with 3 hydrogens), and HC3R (hydrogen atom attached to carbon in a cyclic aliphatic system); the GAFF atom type OH has been expanded to OH for general hydroxyl, and OHP for hydroxyl on an aromatic ring (e.g., phenol); and the GAFF atom type Cl has been expanded to CL for general chlorine, and CLA for chlorine on an aromatic ring.
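The expansion of the GAFF C3 and HC types can be expressed as a small lookup, sketched below. It assumes that membership in an aliphatic ring takes precedence over the hydrogen count when both rules could apply, which the text does not specify; the function names are hypothetical and are not part of GAAMP or Antechamber.

```python
def refine_sp3_carbon(n_hydrogens, in_aliphatic_ring):
    """Map a GAFF C3 carbon to one of C30, C31, C32, C33 or C3R."""
    if in_aliphatic_ring:
        return "C3R"  # assumed to override the hydrogen-count rule
    if n_hydrogens not in (0, 1, 2, 3):
        raise ValueError("no refined type defined for this hydrogen count")
    return "C3" + str(n_hydrogens)


def refine_aliphatic_hydrogen(carbon_type):
    """Map a GAFF HC hydrogen to HC31, HC32, HC33 or HC3R from its carbon."""
    if carbon_type not in ("C31", "C32", "C33", "C3R"):
        raise ValueError("carbon type carries no refined hydrogen type")
    return "H" + carbon_type
```

For example, a methyl carbon outside a ring maps to `C33` and its hydrogens to `HC33`, whereas a cyclohexane carbon maps to `C3R` and its hydrogens to `HC3R`.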
The fitted liquid properties for the 430 compounds are compared with experiment in Figure 2. The average unsigned relative errors for the molecular volume are 2.8% and 1.8% using the parameters from the standard GAFF/AM1-BCC and GAAMP/GAFF-LJ*, respectively. Least-squares linear regression for the experimental/simulation molecular volume displays a slope close to unity for GAFF/AM1-BCC (0.9828x+3.532) and GAAMP/GAFF-LJ* (0.9937x+0.5142), indicating that both models provide a reasonable picture of packing, though the slightly larger intercept at the origin is indicative of a small systematic mismatch with GAFF/AM1-BCC (about 3.5 Å3). The average unsigned relative errors on the enthalpy of vaporization are 17.9% and 5.9% using the parameters from GAFF/AM1-BCC and GAAMP/GAFF-LJ*, respectively. The experimental-simulation Pearson correlation coefficient is improved from 0.86 with GAFF/AM1-BCC to 0.96 with GAAMP/GAFF-LJ*. Least-squares linear regression for the experimental/simulation enthalpy of vaporization is 1.290x-1.968 for GAFF/AM1-BCC, and 1.032x-0.26281 for GAAMP/GAFF-LJ*. GAFF/AM1-BCC displays a slope that is considerably larger than unity (1.290) and a large intercept at the origin (-1.968), indicative of a systematic bias in the model. In this regard, it is important to note that the GAFF/AM1-BCC neat liquid models were simulated without a long-range correction to account for the missing dispersion interactions. If such a long-range correction were included, the enthalpy of vaporization in Figure 2 would shift up, yielding an even worse agreement with experimental values. In contrast, there is a significant improvement with the GAAMP/GAFF-LJ* model, both with respect to the slope close to unity and the small intercept at the origin. The evolution of the LJ parameters from their initial GAFF values to their final optimized values is displayed in Figure 3.
Overall, the LJ parameters resulting from the global optimization remained fairly close to their initial values, with maximum changes on the order of 0.1 kcal/mol and 0.1 Å in Emin and Rmin, respectively. Thus, it was possible to improve the accuracy of the initial model with relatively small changes to the LJ parameters, which highlights the great sensitivity of the results to the non-bonded interactions. Importantly, the specific atom types that were introduced diverged in opposite directions from their initial GAFF value to produce the final LJ parameters (e.g., OH and OHP, CL and CLA).
To provide an additional test of the optimized set of LJ parameters, free energy perturbation simulations were carried out to calculate the hydration free energies for 426 compounds. There have been a number of studies aimed at examining the accuracy of MM force fields by computing the solvation free energy for a large set of compounds.46, 52-55 What distinguishes the present effort from these previous studies is that the calculated solvation free energies here are based on the GAAMP charges and dihedral potentials combined with a set of LJ parameters specifically optimized to fit the properties of the neat liquids. The Pearson correlation coefficient between the calculated hydration free energies and experiment is 0.94 for both the GAFF/AM1-BCC and GAAMP/GAFF-LJ* models. The average unsigned errors are 1.04 and 1.47 kcal/mol using the parameters from GAFF/AM1-BCC and GAAMP/GAFF-LJ*, respectively. Least-squares linear regression yields 0.8972x-0.9273 for GAFF/AM1-BCC, and 0.9016x+1.158 for GAAMP/GAFF-LJ*. The distribution of deviations of the calculated hydration free energy relative to the experimental values is shown in Figure 4. The GAAMP/GAFF-LJ* calculated hydration free energies are systematically overestimated by about +1.8 kcal/mol relative to the experimental target values. In comparison, the hydration free energies from the GAFF/AM1-BCC model are underestimated by about -1.5 kcal/mol compared to experiment, although the deviations are broader. Thus, while significant improvement with GAAMP/GAFF-LJ* occurs with the neat liquids, there is a small degradation relative to GAFF/AM1-BCC for the hydration free energies.
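The summary statistics quoted for such comparisons can be computed as in the sketch below. It assumes the regression line is fitted with the experimental value as the independent variable (calculated = slope × experimental + intercept), which is one reading of the text rather than a stated convention; the function name and dictionary keys are hypothetical.

```python
import math


def summarize(calc, expt):
    """Signed and unsigned mean errors, Pearson r, and a least-squares line."""
    n = len(calc)
    errors = [c - e for c, e in zip(calc, expt)]
    mean_x = sum(expt) / n
    mean_y = sum(calc) / n
    sxx = sum((x - mean_x) ** 2 for x in expt)
    syy = sum((y - mean_y) ** 2 for y in calc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expt, calc))
    slope = sxy / sxx
    return {
        "mean_error": sum(errors) / n,
        "mean_unsigned_error": sum(abs(d) for d in errors) / n,
        "pearson_r": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }
```

A model whose values all sit 1 kcal/mol above experiment gives a mean error and mean unsigned error of 1, a slope of 1, an intercept of 1, and a Pearson coefficient of 1, which illustrates why a high correlation alone does not rule out a systematic offset.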
Given the substantial improvement of the neat liquid properties with the optimized LJ parameters, this result is somewhat disconcerting. After all, as observed in Figure 3, the agreement of the average properties of the neat liquids with experiment is significantly better with the GAAMP/GAFF-LJ* model compared to the original GAFF/AM1-BCC model. One consideration is that the GAAMP/GAFF-LJ* models are treated with a long-range correction for missing van der Waals dispersion. However, the long-range dispersion correction should make the calculated hydration free energies systematically more favorable, whereas the GAAMP/GAFF-LJ* values are not sufficiently favorable. Consistent with this is a recent study that examined the hydration free energy of more than 500 compounds modeled from GAFF/AM1-BCC. That study, which employed a long-range dispersion correction, reported an average error of 0.31 kcal/mol, an unsigned error of 1.12 kcal/mol, and a Pearson R value of 0.933. The smaller average error in that study, as compared to the value of 1.5 kcal/mol determined in this work, is consistent with the inclusion of the long-range dispersion correction. However, including a long-range LJ correction would significantly increase the deviation of the enthalpies of vaporization calculated from the GAFF/AM1-BCC model with respect to experiment, as the values are already too favorable (see Figure 2). These results indicate the challenge of obtaining a nonbonded model that accurately treats both the neat liquid and aqueous solvation thermodynamic properties, though it is critical to employ a force field model that matches both the properties of neat liquids and solvation free energies. This is particularly important because the binding of drug molecules corresponds to a transfer from an aqueous environment to the interior of a protein that is more akin to an organic solvent.
Further considerations suggested that the systematic overestimation of the hydration free energies might be due to the underestimation of van der Waals dispersive interactions arising from the water model used in the simulation.56-58 In a study of unfolded and disordered states of proteins, Best et al.59 concluded that protein conformations were too collapsed because the nonspecific protein-water dispersion interactions in the context of the TIP3P model were not sufficiently strong. Simply rescaling the Lennard-Jones well depth resulting from the Lorentz-Berthelot combination rule by a factor of γ=1.10 was found to produce more realistic dimensions of unfolded and intrinsically disordered proteins. Such a mild perturbation is not expected to affect the overall behavior of folded proteins in MD simulations. Similarly, Piana et al.57 redesigned the dispersion interactions of the TIP4P water model to produce disordered protein states that agree more closely with experimental measurements, and a correction to the CHARMM TIP3P water model in the context of the additive CHARMM36 force field has been presented.58 Following up on these ideas, we determined the optimal rescaling of the van der Waals dispersion interactions between the compounds and water that shifts the calculated hydration free energies toward the experimental values. The change in free energy produced by the scaling could be evaluated using free energy perturbation theory based on the unperturbed simulations. As observed in Figure 4, a modest strengthening of the small molecule-water van der Waals interactions by a scaling factor of γ=1.115 is sufficient to remove the systematic deviation between the calculated and measured hydration free energies. Let us denote this model as GAAMP/GAFF-LJ*-γ.
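The re-evaluation of the free energy from unperturbed simulations can be illustrated with the standard exponential-averaging (Zwanzig) estimator. The sketch below is an illustration under stated assumptions rather than the procedure actually used here: it assumes that the scaled solute-water LJ energy is simply γ times the unscaled one (as is the case when only a pair well depth is scaled), that `u_lj_solute_water` holds this energy in kcal/mol for frames sampled at γ=1, and that the temperature is 298.15 K. The function name is hypothetical.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def scaling_free_energy_shift(u_lj_solute_water, gamma, temperature=298.15):
    """Free energy change (kcal/mol) for scaling the solute-water LJ energy by gamma.

    Uses dG = -kT ln < exp(-beta * (gamma - 1) * U) > over unperturbed frames.
    The temperature default is an assumption, not a value taken from the text.
    """
    u = np.asarray(u_lj_solute_water, dtype=float)
    beta = 1.0 / (KB_KCAL * temperature)
    x = -beta * (gamma - 1.0) * u          # exponent for each frame
    x_max = x.max()                        # shift for numerical stability
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return float(-log_mean / beta)

# Toy check with a constant (made-up) interaction energy of -10 kcal/mol:
# the shift then reduces to (gamma - 1) * U.
shift_demo = scaling_free_energy_shift([-10.0] * 50, gamma=1.115)
```

Because the solute has no water to interact with in the gas phase, this shift in the solvated state would carry over directly to the hydration free energy under the assumptions above.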
Using the latter, the mean error on the solvation free energy of 426 compounds is 0.12 kcal/mol, the average unsigned error is 0.79 kcal/mol, the least-squares linear regression is 0.876x-0.202, and the Pearson correlation coefficient with the experimental values is 0.93. Using a similar scaling factor with GAFF/AM1-BCC, the mean error on the solvation free energy of 426 compounds is -0.73 kcal/mol, and the average unsigned error is 1.04 kcal/mol. The smaller impact of the scaling factor on GAFF/AM1-BCC is due to the truncation of the LJ interactions. With the scaled GAAMP/GAFF-LJ*-γ model, the absolute error on the solvation free energy is less than 1 kcal/mol for 75% of the compounds in the training set. In contrast, only 49% of the compounds satisfy this criterion for the original GAFF model. Interestingly, the errors on the solvation free energy from GAFF/AM1-BCC and GAAMP/GAFF-LJ*-γ relative to the experimental values are fairly uncorrelated (Pearson correlation coefficient of 0.28). This suggests that there is no systematic source of inaccuracy common to both models.
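The two diagnostics used in this comparison, the fraction of compounds within 1 kcal/mol of experiment and the correlation between the errors of two models, can be sketched as follows. This is an illustrative sketch only; the function names are hypothetical and the example values are made up.

```python
import numpy as np

def fraction_within(calc, expt, threshold=1.0):
    """Fraction of compounds whose absolute error is below threshold (kcal/mol)."""
    err = np.abs(np.asarray(calc, dtype=float) - np.asarray(expt, dtype=float))
    return float(np.mean(err < threshold))

def error_correlation(calc_a, calc_b, expt):
    """Pearson correlation between the signed errors of two models."""
    expt = np.asarray(expt, dtype=float)
    err_a = np.asarray(calc_a, dtype=float) - expt
    err_b = np.asarray(calc_b, dtype=float) - expt
    return float(np.corrcoef(err_a, err_b)[0, 1])

# Made-up example: model B makes exactly twice the error of model A.
expt_demo = [0.0, 0.0, 0.0, 0.0]
model_a = [0.5, -0.5, 2.0, -3.0]
model_b = [1.0, -1.0, 4.0, -6.0]
frac_a = fraction_within(model_a, expt_demo)
corr_ab = error_correlation(model_a, model_b, expt_demo)
```

A correlation near 1, as in this toy case, would point to a shared source of error; a low value such as the 0.28 reported above indicates that the two models fail on largely different compounds.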
The value of the solute-solvent scaling factor, γ=1.115, is remarkably similar to the scaling factor of 1.10 previously proposed by Best et al.59 in the context of unfolded proteins. It is noteworthy that both studies represented the solvent water with the TIP3P model49 in its CHARMM version, with small LJ potentials ascribed to the two hydrogens.4 The main rationale for rescaling the solute-water dispersion is that this avoids perturbing the properties of the neat liquid. In the present study, the LJ well depth of the water oxygen atom was scaled to shift the hydration free energy. It should be noted, however, that this procedure also affects the effective radius of the oxygen through the LJ repulsive contribution. Consequently, while the scaling makes the overall solute-solvent dispersive energy more favorable, it also reduces the hydrogen bonding energy with water very slightly. For example, the hydrogen bonding interaction between a hydroxyl group donor and a water molecule acceptor decreases by 0.15 kcal/mol, and the donor-acceptor distance increases by 0.02 Å. Alternatively, one might rebalance the dispersive interaction of a solute with water by scaling up the LJ well depth of the TIP3P hydrogen HT; because the radius of the hydrogen is small (Rmin/2 = 0.2245 Å), altering its well depth is not expected to significantly impact hydrogen bonding.58 For the sake of simplicity, only a scaling of the LJ well depth of the TIP3P oxygen OT was considered here.
In practice, scaling of the LJ well depth is easily implemented within the CHARMM parameter file by changing the LJ well depth of the TIP3P oxygen OT from its normal value of -0.1521 to the value of -0.18909, which affects all interactions of the solutes with water, and then restoring the correct water-water LJ interactions (OT-OT, OT-HT, and HT-HT) via the pair-specific NBFIX option to make them consistent with the CHARMM version of the TIP3P water model. While the solute-solvent scaling factor would yield a more accurate solvation free energy for small compounds, the mild perturbation would not affect the overall behavior of folded proteins in MD simulations.
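The arithmetic behind this recipe can be checked with a short sketch. Under the Lorentz-Berthelot rule the pair well depth is the geometric mean of the two atomic well depths, so scaling every solute-OT pair by γ requires scaling the OT well depth itself by γ², and 0.1521 × 1.115² ≈ 0.18909. The code below works with well-depth magnitudes (the CHARMM parameter file stores them as negative numbers). The OT well depth and the HT Rmin/2 are taken from the text; the OT Rmin/2 of 1.7682 Å and the HT well depth of 0.046 kcal/mol are assumptions based on the commonly distributed CHARMM TIP3P parameters and should be checked against the parameter file actually in use. The helper name `lb_pair` is hypothetical.

```python
import math

GAMMA = 1.115
EPS_OT = 0.1521     # kcal/mol, magnitude of the OT well depth (from the text)
RMIN2_HT = 0.2245   # Angstrom, Rmin/2 of HT (from the text)
RMIN2_OT = 1.7682   # Angstrom, assumed standard CHARMM TIP3P value
EPS_HT = 0.0460     # kcal/mol, assumed standard CHARMM TIP3P value

def lb_pair(eps_i, rmin2_i, eps_j, rmin2_j):
    """Lorentz-Berthelot pair parameters: geometric-mean well depth, summed radii."""
    return math.sqrt(eps_i * eps_j), rmin2_i + rmin2_j

# Scaling the atomic well depth by gamma**2 scales each solute-OT pair by gamma.
eps_ot_scaled = EPS_OT * GAMMA ** 2

# Pair-specific values that restore the original water-water interactions.
nbfix = {
    ("OT", "OT"): lb_pair(EPS_OT, RMIN2_OT, EPS_OT, RMIN2_OT),
    ("OT", "HT"): lb_pair(EPS_OT, RMIN2_OT, EPS_HT, RMIN2_HT),
    ("HT", "HT"): lb_pair(EPS_HT, RMIN2_HT, EPS_HT, RMIN2_HT),
}

# Example with a hypothetical solute atom type (well depth 0.1, Rmin/2 2.0).
eps_before, _ = lb_pair(0.1, 2.0, EPS_OT, RMIN2_OT)
eps_after, _ = lb_pair(0.1, 2.0, eps_ot_scaled, RMIN2_OT)
pair_ratio = eps_after / eps_before
```

The ratio `pair_ratio` equals γ for any solute atom type, which is why a single change to the OT entry, combined with pair-specific water-water overrides, is enough to apply the scaling to all solutes.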
The present results reflect the inherent limitations of additive non-polarizable force fields with respect to the treatment of nonbond interactions. While optimization of the density and enthalpy of vaporization of neat liquids as well as the hydration free energies could be performed simultaneously, the present results with both GAAMP and GAFF indicate that the quality of the fit to the individual properties (neat vs. aqueous condensed phases) would be significantly sacrificed. This limitation is, in part, due to the additive nature of the force field: the nonbond model, which includes overestimated charges to account for the omission of explicit polarization, cannot satisfy both the neat and aqueous environments. In the present study, the GAAMP optimization of the electrostatic parameters combined with the subsequent LJ optimization yields excellent enthalpies of vaporization for the neat liquids of a large collection of compounds. Assuming that the interior of a protein is akin to an organic liquid, the set of optimized LJ parameters should yield a more accurate representation of ligand-protein interactions. However, despite the improvement in the neat liquid properties, the calculated hydration free energies were not favorable enough. This issue was addressed here by scaling the solute-water dispersion interactions by a factor γ=1.115. An alternative would be to further overestimate the partial atomic charges (i.e., make the solutes intrinsically more polarized), thereby making the interactions with water more favorable and leading to better agreement with the hydration free energies. However, a strategy based on charge rescaling would be challenging because it would require a global optimization of the neat liquid properties together with the hydration free energies. Furthermore, it is unclear whether charge rescaling could improve the hydration free energy of nonpolar compounds.
Ultimately, extension of force fields to explicitly include electronic polarization may help to overcome this limitation, though efforts in our laboratories with the polarizable Drude force field indicate that treating the full range of condensed phase environments at a high level of accuracy is still challenging.3, 60
Conclusion
Molecular mechanical force fields are constructed from a combination of simple analytic functions with multiple parameters for the purpose of accurately representing the potential energy surface of the system. Relying on pre-tabulated parameters optimized for a large training set of molecules,14, 20, 24 or determining those parameters from ab initio QM calculations,61-63 represent two extreme approaches for producing a final molecular mechanical model. Both have advantages and disadvantages. Force fields relying on empirical parameters trained on a very large set of representative compounds display internal robustness, while force fields with parameters determined from ab initio calculations provide the most relevant representation of the specific properties of a given molecule.64-65 In practice, however, both the empirical and the ab initio approaches have their limitations. For instance, transferability can be an issue with data-driven approaches relying on pre-tabulated parameters when the training set does not fully cover all the relevant chemical space, causing a failure of the force field to accurately represent the Born-Oppenheimer potential energy surface in the context of specific chemical functionalities. On the other hand, the accuracy of QM approaches can vary widely depending on the ab initio level, especially in the context of dispersive interactions, and this level is typically limited by the necessity to remain computationally affordable.
The force field designed here by combining the GAAMP algorithm with optimized LJ parameters represents a compromise between these two extremes. Whenever accurate parameters can be determined consistently and with high confidence from moderate-level QM calculations, this advantage ought to be exploited as much as possible. For example, the electrostatic charges of the model are determined by combining ESP9, 26 and water interactions.4 A previous investigation indicates that this procedure yields reliably accurate models.25 Similarly, internal energy terms such as the bond, angle and dihedral parameters can be optimized from QM data. Determination of the dihedral parameters from QM is particularly important because it avoids the context-dependent limitations that necessarily arise from tabulated standard dihedral values. In this regard, it is important to understand that the effective torsion potential for a given molecule is affected not only by the dihedral parameters ascribed to this torsion but also by the 1-4 van der Waals and electrostatic interactions. Consistency between 1-4 nonbonded interactions and torsion potential may be achieved if all the parameters of a force field model are determined from pre-tabulated values, as in CGenFF. However, internal inconsistencies may arise when the charges are determined via QM but the dihedral parameters rely on pre-tabulated values, as in GAFF.13, 21 Generally, it is difficult to achieve sufficient coverage of the chemical space with a training set, and pre-tabulated values ultimately become limited. The LJ parameters stand as an exception to this rule. An accurate treatment of van der Waals interactions from ab initio QM calculations, while feasible, requires high-level approaches to be consistently reliable. In contrast, a treatment of van der Waals interactions based on moderate-level QM calculations can be extremely unreliable.
While freeing the LJ parameters from the constraint of pre-tabulated values would be useful, this does not seem to be practical at this point. For this reason, it seems more advantageous to use a set of empirical LJ parameters assigned from the chemical context and optimized for a large series of compounds. The atom type assignments could be obtained either from GAFF13, 21 or CGenFF.14, 20 In the present effort, the initial atom types from GAFF were used. To achieve better accuracy, the set of 41 atom types of the GAFF model was expanded to a total of 52 atom types. This remains a relatively small number of empirical LJ parameters. For the hydration free energies of a large set of compounds, the optimized LJ parameters alone did not improve upon the performance of the GAFF/AM1-BCC model, but a modest strengthening of the solute-water van der Waals interactions by a factor of 1.115 considerably improved the results, with an average unsigned error of 0.79 kcal/mol. This improved GAAMP/GAFF-LJ* force field, which can be fully general and automated, will enable us to carry out free energy computations of chemical accuracy in a wide range of applications. Although the accuracy of the hydration free energy from the final set of optimized LJ parameters (with the solute-solvent rescaling) is good but imperfect, it is important to keep in mind that the accuracy of the liquid properties was improved considerably compared to GAFF/AM1-BCC. As the binding free energy of a compound depends on its free energy in the protein site relative to its free energy in bulk water, the improved properties of a wide range of liquids should hopefully yield a better representation of the interactions of small molecules in a protein environment as well. A similar strategy is currently being used to optimize the LJ parameters for the Drude polarizable force field.
Acknowledgments
We gratefully acknowledge the computing resources provided by the Laboratory Computing Resource Center at Argonne National Laboratory and by the Beagle computer. This work is supported by grants R01-GM072558, R01-GM051501 and U54-GM087519 from NIH/NIGMS.
Footnotes
Supporting information: The final computed enthalpies and volumes of neat liquids and the hydration free energies using GAFF/AM1-BCC and the GAAMP/GAFF-LJ*-γ force field with optimized LJ parameters are provided together with all experimental values in the files “gaamp_gaff_name_vol_hvap.xlsx” and “solvation_free_enrgy.xlsx”, respectively. All experimental data used in the present work were taken from ref. 52 and the National Institute of Standards and Technology (NIST) Chemistry WebBook (https://webbook.nist.gov/chemistry). The weights for the empirical similarity fitness score of a compound are given in “training_atom_types_weights.xls”. The complete optimized GAAMP force fields for the small molecules (in the form of RTF and PRM files in CHARMM format) can be downloaded from the GitHub link https://github.com/gaamp/Optimized-Additive-Force-Files-For-Large-Collection-of-Small-Molecules-/blob/master/optimized_ff.tar.gz
References
- 1. Karplus M, McCammon JA. Molecular dynamics simulations of biomolecules. Nat Struct Biol. 2002;9(9):646–52. doi: 10.1038/nsb0902-646.
- 2. Ponder JW, Wu C, Ren P, Pande VS, Chodera JD, Schnieders MJ, Haque I, Mobley DL, Lambrecht DS, Di Stasio RA, Jr, Head-Gordon M, Clark GN, Johnson ME, Head-Gordon T. Current status of the AMOEBA polarizable force field. J Phys Chem B. 2010;114(8):2549–64. doi: 10.1021/jp910674d.
- 3. Lemkul JA, Huang J, Roux B, MacKerell AD., Jr An Empirical Polarizable Force Field Based on the Classical Drude Oscillator Model: Development History and Recent Applications. Chem Rev. 2016;116(9):4983–5013. doi: 10.1021/acs.chemrev.5b00505.
- 4. Mackerell AD, Jr, Bashford D, Bellott M, Dunbrack RL, Evanseck JD, Field MJ, Fischer S, Gao J, Guo H, Ha S, Joseph-McCarthy D, Kuchnir L, Kuczera K, Lau FTK, Mattos C, Michnick S, Ngo T, Nguyen DT, Prodhom B, Reiher WE, Roux B, Schlenkrich M, Smith JC, Stote R, Straub J, Watanabe M, Wiorkiewicz-Kuczera J, Yin D, Karplus M. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B. 1998;102(18):3586–3616. doi: 10.1021/jp973084f.
- 5. Foloppe N, MacKerell AD. All-atom empirical force field for nucleic acids: I. Parameter optimization based on small molecule and condensed phase macromolecular target data. J Comp Chem. 2000;21(2):86–104.
- 6. MacKerell AD, Jr, Banavali N. All-Atom Empirical Force Field for Nucleic Acids: 2) Application to Molecular Dynamics Simulations of DNA and RNA in Solution. J Comp Chem. 2000;21:105–120.
- 7. MacKerell AD, Jr, Feig M, Brooks CL., 3rd Improved treatment of the protein backbone in empirical force fields. J Am Chem Soc. 2004;126:698–699. doi: 10.1021/ja036959e.
- 8. Klauda JB, Venable RM, Freites JA, O'Connor JW, Tobias DJ, Mondragon-Ramirez C, Vorobyov I, MacKerell AD, Jr, Pastor RW. Update of the CHARMM all-atom additive force field for lipids: validation on six lipid types. J Phys Chem B. 2010;114(23):7830–43. doi: 10.1021/jp101759q.
- 9. Wang JM, C P, Kollman PA. How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem. 2000;21(12):1049–1074.
- 10. Kaminski GA, F RA, Tirado-Rives J, Jorgensen WL. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J Phys Chem B. 2001;105(28):6474–6487.
- 11. Oostenbrink C, V A, Mark AE, Van Gunsteren WF. A biomolecular force field based on the free enthalpy of hydration and solvation: The GROMOS force-field parameter sets 53A5 and 53A6. J Comput Chem. 2004;25(13):1656–1676. doi: 10.1002/jcc.20090.
- 12. Mackerell AD., Jr Empirical force fields for biological macromolecules: overview and issues. J Comp Chem. 2004;25(13):1584–604. doi: 10.1002/jcc.20082.
- 13. Wang J, Wolf RM, Caldwell JW, Kollman PA, Case DA. Development and testing of a general amber force field. J Comp Chem. 2004;25(9):1157–74. doi: 10.1002/jcc.20035.
- 14. Vanommeslaeghe K, Hatcher E, Acharya C, Kundu S, Zhong S, Shim J, Darian E, Guvench O, Lopes P, Vorobyov I, Mackerell AD., Jr CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J Comp Chem. 2010;31(4):671–90. doi: 10.1002/jcc.21367.
- 15. Chodera JD, Mobley DL, Shirts MR, Dixon RW, Branson K, Pande VS. Alchemical free energy methods for drug discovery: progress and challenges. Curr Opin Struct Biol. 2011;21(2):150–60. doi: 10.1016/j.sbi.2011.01.011.
- 16. Rocklin GJ, Mobley DL, Dill KA. Calculating the sensitivity and robustness of binding free energy calculations to force field parameters. J Chem Theo Comp. 2013;9(7):3072–3083. doi: 10.1021/ct400315q.
- 17. Perez A, Morrone JA, Simmerling C, Dill KA. Advances in free-energy-based simulations of protein folding and ligand binding. Curr Opin Struct Biol. 2016;36:25–31. doi: 10.1016/j.sbi.2015.12.002.
- 18. Mohamadi F, R NGJ, Guida WC, Liskamp R, Lipton M, Caufield C, Chang G, Hendrickson T, Still WC. Macromodel - an Integrated Software System for Modeling Organic and Bioorganic Molecules Using Molecular Mechanics. J Comput Chem. 1990;11(14):440–467.
- 19. Udier-Blagovic M, MDT P, Pearlman SA, Jorgensen WL. Accuracy of free energies of hydration using CM1 and CM3 atomic charges. J Comput Chem. 2004;25(11):1322–1332. doi: 10.1002/jcc.20059.
- 20. Vanommeslaeghe K, Raman EP, MacKerell AD., Jr Automation of the CHARMM General Force Field (CGenFF) II: Assignment of Bonded Parameters and Partial Atomic Charges. J Chem Inf Mod. 2012;52(12):3155–3168. doi: 10.1021/ci3003649.
- 21. Wang J, Wang W, A KP, Case DA. Automatic atom type and bond type perception in molecular mechanical calculations. J Mol Graphics Mod. 2006;25:247–260. doi: 10.1016/j.jmgm.2005.12.005.
- 22. Wang ZX, Zhang W, Wu C, Lei H, Cieplak P, Duan Y. Strike a balance: optimization of backbone torsion parameters of AMBER polarizable force field for simulations of proteins and peptides. J Comp Chem. 2006;27:781–790. doi: 10.1002/jcc.20386.
- 23. Jakalian A, B BL, Jack DB, Bayly CI. Fast, efficient generation of high-quality atomic Charges. AM1-BCC model: I. Method. J Comput Chem. 2000;21(2):132–146. doi: 10.1002/jcc.10128.
- 24. Vanommeslaeghe K, MacKerell AD., Jr Automation of the CHARMM General Force Field (CGenFF) I: bond perception and atom typing. J Chem Inf Mod. 2012;52(12):3144–54. doi: 10.1021/ci300363c.
- 25. Huang L, Roux B. Automated Force Field Parameterization for Non-Polarizable and Polarizable Atomic Models Based on Target Data. J Chem Theo Comp. 2013;9(8). doi: 10.1021/ct4003477.
- 26. Bayly C, Cieplak P, Cornell W, Kollman P. A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges. J Phys Chem. 1993;97:10269–?.
- 27. Mackerell AD, Karplus M. Importance of Attractive Van der waals Contribution in Empirical Energy Function Models for the Heat of Vaporization of Polar Liquids. J Phys Chem. 1991;95(26):10559–10560.
- 28. Nerenberg PS, Jo B, So C, Tripathy A, Head-Gordon T. Optimizing solute-water van der Waals interactions to reproduce solvation free energies. J Phys Chem B. 2012;116(15):4524–4534. doi: 10.1021/jp2118373.
- 29. Brooks BR, Brooks CL, 3rd, Mackerell AD, Jr, Nilsson L, Petrella RJ, Roux B, Won Y, Archontis G, Bartels C, Boresch S, Caflisch A, Caves L, Cui Q, Dinner AR, Feig M, Fischer S, Gao J, Hodoscek M, Im W, Kuczera K, Lazaridis T, Ma J, Ovchinnikov V, Paci E, Pastor RW, Post CB, Pu JZ, Schaefer M, Tidor B, Venable RM, Woodcock HL, Wu X, Yang W, York DM, Karplus M. CHARMM: the biomolecular simulation program. J Comp Chem. 2009;30(10):1545–614. doi: 10.1002/jcc.21287.
- 30. Cornell WD, Cieplak P, Bayly CI, Gould IR, M KM, Jr, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA. A second generation force field for the simulation of proteins and nucleic acids. J Am Chem Soc. 1995;117:5179–5197.
- 31. Allen MP, Tildesley DJ. Computer Simulation of Liquids. Oxford Science Publications, Clarendon Press; Oxford: 1989.
- 32. Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE, Robb MA, Cheeseman JR, Scalmani G, Barone V, Mennucci B, Petersson GA, Nakatsuji H, Caricato M, Li X, Hratchian HP, Izmaylov AF, Bloino J, Zheng G, Sonnenberg JL, Hada M, Ehara M, Toyota K, Fukuda R, Hasegawa J, Ishida M, Nakajima T, Honda Y, Kitao O, Nakai H, Vreven T, Montgomery JJA, Peralta JE, Ogliaro F, Bearpark M, Heyd JJ, Brothers E, Kudin KN, Staroverov VN, Kobayashi R, Normand J, Raghavachari K, Rendell A, Burant JC, Iyengar SS, Tomasi J, Cossi M, Rega N, Millam NJ, Klene M, Knox JE, Cross JB, Bakken V, Adamo C, Jaramillo J, Gomperts R, Stratmann RE, Yazyev O, Austin AJ, Cammi R, Pomelli C, Ochterski JW, Martin RL, Morokuma K, Zakrzewski VG, Voth GA, Salvador P, Dannenberg JJ, Dapprich S, Daniels AD, Farkas Ö, Foresman JB, Ortiz JV, Cioslowski J, Fox DJ. Gaussian 09. Gaussian, Inc; Wallingford CT: 2009.
- 33. Nocedal J. Updating Quasi-Newton Matrices with Limited Storage. Math Comput. 1980;35(151):773–782.
- 34. Liu DC, N J. On the Limited Memory Bfgs Method for Large-Scale Optimization. Math Program. 1989;45(3):503–528.
- 35. Conn AR, G NIM, Toint PL. A Globally Convergent Augmented Lagrangian Algorithm for Optimization with General Constraints and Simple Bounds. SIAM J Numer Anal. 1991;28(2):545–572.
- 36. Birgin EG, M JM. Improving ultimate convergence of an augmented Lagrangian method. Optim Method Softw. 2008;23(2):177–195.
- 37. Feller S, Zhang Y, Pastor R, Brooks B. Constant pressure molecular dynamics simulation - the Langevin piston method. J Chem Phys. 1995;103:4613–4621.
- 38. Darden T, Y D, Pedersen L. Particle Mesh Ewald - an N.Log(N) Method for Ewald Sums in Large Systems. J Chem Phys. 1993;98(12):10089–10092.
- 39. Essmann U, Perera L, Berkowitz M, Darden T, Lee H, Pedersen L. A smooth particle mesh Ewald method. J Chem Phys. 1995;103:8577–8593.
- 40. Shirts MR, Mobley DL, Chodera JD, Pande VS. Accurate and efficient corrections for missing dispersion interactions in molecular Simulations. J Phys Chem B. 2007;111(45):13052–13063. doi: 10.1021/jp0735987.
- 41. Miyamoto S, Kollman PA. Settle: An analytical version of the SHAKE and RATTLE algorithm for rigid water models. J Comp Chem. 1992;13(8):952–962.
- 42. Andersen HC. Rattle - a Velocity Version of the Shake Algorithm for Molecular-Dynamics Calculations. J Comput Phys. 1983;52(1):24–34.
- 43. Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, Chipot C, Skeel RD, Kale L, Schulten K. Scalable molecular dynamics with NAMD. J Comp Chem. 2005;26(16):1781–802. doi: 10.1002/jcc.20289.
- 44. Deng YQ, Roux B. Hydration of amino acid side chains: Nonpolar and electrostatic contributions calculated from staged molecular dynamics free energy simulations with explicit water molecules. J Phys Chem B. 2004;108(42):16567–16576.
- 45. Jiang W, Hodoscek M, Roux B. Computation of Absolute Hydration and Binding Free Energy with Free Energy Perturbation Distributed Replica-Exchange Molecular Dynamics (FEP/REMD). J Chem Theo Comp. 2009;5(10):2583–2588. doi: 10.1021/ct900223z.
- 46. Shivakumar D, Deng YQ, Roux B. Computations of Absolute Solvation Free Energies of Small Molecules Using Explicit and Implicit Solvent Model. J Chem Theo Comp. 2009;5(4):919–930. doi: 10.1021/ct800445x.
- 47. Sugita Y, Okamoto Y. Replica-exchange multicanonical algorithm and multicanonical replica-exchange method for simulating systems with rough energy landscape. Chem Phys Lett. 2000;329:261–270.
- 48. Jiang W, Phillips JC, Huang L, Fajer M, Meng Y, Gumbart JC, Luo Y, Schulten K, Roux B. Generalized Scalable Multiple Copy Algorithms for Molecular Dynamics Simulations in NAMD. Comp Phys Comm. 2014;185(3):908–916. doi: 10.1016/j.cpc.2013.12.014.
- 49. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. Comparison of simple potential functions for simulating liquid water. J Chem Phys. 1983;79:926–935.
- 50. Nina M, Beglov D, Roux B. Atomic radii for continuum electrostatics calculations based on molecular dynamics free energy simulations. J Phys Chem B. 1997;101(26):5239–5248.
- 51. Mobley DL, Dumont E, Chodera JD, Dill KA. Comparison of charge models for fixed-charge force fields: small-molecule hydration free energies in explicit solvent. J Phys Chem B. 2007;111(9):2242–54. doi: 10.1021/jp0667442.
- 52. Mobley DL, Bayly CI, Cooper MD, Shirts MR, Dill KA. Small molecule hydration free energies in explicit solvent: An extensive test of fixed-charge atomistic simulations. J Chem Theo Comp. 2009;5(2):350–358. doi: 10.1021/ct800409d.
- 53. Shivakumar D, Harder E, Damm W, Friesner RA, Sherman W. Improving the Prediction of Absolute Solvation Free Energies Using the Next Generation OPLS Force Field. J Chem Theo Comp. 2012;8(8):2553–8. doi: 10.1021/ct300203w.
- 54. Mobley DL, Guthrie JP. FreeSolv: a database of experimental and calculated hydration free energies, with input files. J Comp-Aided Mol Des. 2014;28(7):711–20. doi: 10.1007/s10822-014-9747-x.
- 55. Matos GDR, Kyu DY, Loeffler HH, Chodera JD, Shirts MR, Mobley DL. Approaches for calculating solvation free energies and enthalpies demonstrated with an update of the FreeSolv database. J Chem Eng Data. 2017;62(5):1559–1569. doi: 10.1021/acs.jced.7b00104.
- 56. Best RB, Zheng W, Mittal J. Correction to Balanced Protein-Water Interactions Improve Properties of Disordered Proteins and Non-Specific Protein Association. J Chem Theo Comp. 2015;11(4):1978. doi: 10.1021/acs.jctc.5b00219.
- 57. Piana S, Donchev AG, Robustelli P, Shaw DE. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J Phys Chem B. 2015;119(16):5113–5123. doi: 10.1021/jp508971m.
- 58. Huang J, Rauscher S, Nawrocki G, Ran T, Feig M, de Groot BL, Grubmuller H, MacKerell AD., Jr CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat Methods. 2017;14(1):71–73. doi: 10.1038/nmeth.4067.
- 59. Best RB, Zheng W, Mittal J. Balanced Protein-Water Interactions Improve Properties of Disordered Proteins and Non-Specific Protein Association. J Chem Theo Comp. 2014;10(11):5113–5124. doi: 10.1021/ct500569b.
- 60. Baker CM, Lopes PE, Zhu X, Roux B, Mackerell AD., Jr Accurate Calculation of Hydration Free Energies using Pair-Specific Lennard-Jones Parameters in the CHARMM Drude Polarizable Force Field. J Chem Theo Comp. 2010;6(4):1181–1198. doi: 10.1021/ct9005773.
- 61. McDaniel JG, Schmidt JR. Physically-Motivated Force Fields from Symmetry-Adapted Perturbation Theory. J Phys Chem A. 2013;117(10):2053–2066. doi: 10.1021/jp3108182.
- 62. McDaniel JG, Schmidt JR. First-Principles Many-Body Force Fields from the Gas Phase to Liquid: A “Universal” Approach. J Phys Chem B. 2014;118(28):8042–8053. doi: 10.1021/jp501128w.
- 63. McDaniel JG, Schmidt JR. Next-Generation Force Fields from Symmetry-Adapted Perturbation Theory. Ann Rev Phys Chem, Vol 67. 2016;67:467–488. doi: 10.1146/annurev-physchem-040215-112047.
- 64. Bukowski R, Szalewicz K, Groenenboom GC, van der Avoird A. Predictions of the properties of water from first principles. Science. 2007;315(5816):1249–1252. doi: 10.1126/science.1136371.
- 65. Paesani F. Getting the Right Answers for the Right Reasons: Toward Predictive Molecular Simulations of Water with Many-Body Potential Energy Functions. Acc Chem Res. 2016;49(9):1844–51. doi: 10.1021/acs.accounts.6b00285.