Author manuscript; available in PMC: 2020 Jul 1.
Published in final edited form as: J Chem Theory Comput. 2019 Oct 25;15(11):6225–6242. doi: 10.1021/acs.jctc.9b00748

Binding thermodynamics of host-guest systems with SMIRNOFF99Frosst 1.0.5 from the Open Force Field Initiative

David R Slochower 1, Niel M Henriksen 1, Lee-Ping Wang 2, John D Chodera 3, David L Mobley 4, Michael K Gilson 1
PMCID: PMC7328435  NIHMSID: NIHMS1057924  PMID: 31603667

Abstract

Designing ligands that bind their target biomolecules with high affinity and specificity is a key step in small-molecule drug discovery, but accurately predicting protein-ligand binding free energies remains challenging. Key sources of errors in the calculations include inadequate sampling of conformational space, ambiguous protonation states, and errors in force fields. Noncovalent complexes between a host molecule with a binding cavity and a drug-like guest molecule have emerged as powerful model systems. Host-guest complexes reduce many of the errors that affect more complex protein-ligand binding systems: their small size greatly facilitates conformational sampling, and one can choose systems that avoid ambiguities in protonation states. These features, combined with their ease of experimental characterization, make host-guest systems ideal for testing and ultimately optimizing force fields in the context of binding thermodynamics calculations.

The Open Force Field Initiative aims to create a modern, open software infrastructure for automatically generating and assessing force fields using data sets. The first force field to arise out of this effort, named SMIRNOFF99Frosst, has approximately one tenth the number of parameters, in version 1.0.5, compared to typical general small molecule force fields, such as GAFF. Here, we evaluate the accuracy of this initial force field, using free energy calculations of 43 α and β-cyclodextrin host-guest pairs for which experimental thermodynamic data are available, and compare with matched calculations using two versions of GAFF. For all three force fields, we used TIP3P water and AM1-BCC charges. The calculations are performed using the attach-pull-release (APR) method as implemented in the open source package, pAPRika. For binding free energies, the root mean square error of the SMIRNOFF99Frosst calculations relative to experiment is 0.9 [0.7, 1.1] kcal/mol, while the corresponding results for GAFF 1.7 and GAFF 2.1 are 0.9 [0.7, 1.1] kcal/mol and 1.7 [1.5, 1.9] kcal/mol, respectively, with 95% confidence ranges in brackets. These results suggest that SMIRNOFF99Frosst performs competitively with existing small molecule force fields and is a parsimonious starting point for optimization.

1.2. Introduction

The accurate prediction of protein-ligand binding free energies is a central goal of computational chemistry, with key applications in early stage drug discovery. However, calculations of protein-ligand binding thermodynamics still involve a number of challenging choices, including the choice of empirical force field, specifying the protonation states of ionizable residues, adding hydrogens and otherwise adjusting the initial protein structure, and positioning the candidate ligand in the binding pocket. Predictions of protein-ligand absolute binding free energies have achieved root mean square errors around 1–2 kcal/mol for “well-behaved” systems1–3, with deviations an order of magnitude larger for some protein families with slow degrees of freedom4. Retrospective relative free energy calculations on a series of congeneric ligands, using proprietary methods, have also achieved root mean square errors compared to experiment of around 1 kcal/mol5–7. However, it is not possible to determine how much of the prediction error can be attributed to each of the decisions made by the modeler, as opposed to accuracy limitations of the force field.

By minimizing the ambiguities involved in modeling protein-ligand complexes, host-guest systems offer a way to isolate and directly probe force field error. A variety of techniques for computing absolute binding free energies have been applied to host-guest systems, and some have shown accuracy as good as ~1 kcal/mol, as highlighted in the recent SAMPL5 and SAMPL6 blind challenges1,8. The techniques applied to this problem have included both quantum and classical dynamics, employing a range of energy and solvation models, with some techniques having knowledge-based steps, docking, or clustering9–16. The attach-pull-release (APR) method has consistently been ranked among the most reliable techniques for predicting binding thermodynamics of host-guest complexes in blind challenges8,17. In APR, the reversible work of transferring the guest from the binding site to solution, via a physical pathway, is computed using a series of umbrella sampling windows. Each window is simulated, and the partial derivative of the restraint energy with respect to the restraint target is integrated across the windows to generate a potential of mean force along the pulling coordinate. This yields the binding free energy at standard state, ΔG°, after applying an analytic correction to account for the effective concentration of the guest during the simulation18. Furthermore, subtracting the mean potential energies obtained from long simulations of the solvated bound complex and the solvated dissociated complex yields the binding enthalpy, ΔH19. Together, ΔG° and ΔH can be combined to determine the binding entropy at standard state, ΔS°. Thus, APR provides the complete thermodynamic signature of a host-guest binding reaction: ΔG°, ΔH, and −TΔS°.

Cyclodextrins, in particular, are ideal host molecules for testing computational methods. They are neutral across a broad pH range, with well-characterized structures20, and bind both small molecule fragments and drug-like guest molecules with reasonable affinity, from near −1 kcal/mol to about −5 kcal/mol in the present work21, and with higher affinity for some cyclodextrin derivatives21. Moreover, cyclodextrins are stable in a wide range of experimental conditions and their high millimolar aqueous solubility allows a range of different experimental techniques to be used to measure their binding to guests22. Here, we report the calculation of binding free energies, enthalpies, and entropies of small guest molecules with functional groups often found in drugs to α- and β-cyclodextrin host molecules, converged to within 0.1 kcal/mol statistical uncertainty, using the APR method. These calculations offer an opportunity to benchmark—and ultimately optimize—new and existing force fields.

The first force field produced by the Open Force Field Initiative, SMIRNOFF99Frosst v1.0.5, was released in late 201823,24. It is derived from AMBER parm9925 and Merck’s parm@Frosst26. Instead of relying on atom types to assign force field parameters to compounds, which is the procedure followed by the LEaP program used to assign parameters to molecules in AmberTools27, SMIRNOFF99Frosst and the Open Force Field Toolkit use separately defined local chemical environments for each atom, bond, angle, and dihedral, to apply force field parameters specified by SMIRKS strings28. This process simplifies and effectively uncouples the parameters for each term in the force field. For example, the addition of a new Lennard-Jones parameter does not require creating a new atom type that forces the addition of new bonded, angle, and dihedral parameters. This approach leads to a much leaner force field specification; there are over 3000 lines of parameters in GAFF v1.729, over 6000 lines of parameters in GAFF v2.1, and just 322 lines of parameters in SMIRNOFF99Frosst v1.0.530. It is important to note that SMIRNOFF99Frosst is not yet optimized at this stage, only compressed; subsequent work will focus on optimizing SMIRNOFF99Frosst and other SMIRNOFF-family force fields to fit quantum and experimental data31. In the following text, SMIRNOFF99Frosst refers to version 1.0.5 of the force field, unless otherwise noted.

Thus far, SMIRNOFF99Frosst has been tested on hydration free energies of 642 small molecules and the densities and dielectric constants of 45 pure organic liquids23. Here, we benchmark SMIRNOFF99Frosst, GAFF v1.7, and GAFF v2.1 using noncovalent binding thermodynamics for 43 host-guest complexes (including two hosts and 33 unique guests) for which experimental thermodynamics data are available, representing three different functional group moieties. We first compare the results of SMIRNOFF99Frosst with those of the conventional force fields GAFF v1.7 and GAFF v2.1, based on how well the calculated binding free energies, enthalpies, and entropies agree with experiment. We then characterize the differences in host conformations sampled by SMIRNOFF99Frosst compared to the other two force fields.

1.3. Methods

1.3.1. Choice of host-guest systems

In this study, we report the binding thermodynamics of 43 host-guest complexes (Figure 1 and Table 1) computed using three different force fields. The complexes consist of either α- or β-cyclodextrin as host molecules and a series of small molecule guests containing ammonium, carboxylate, or cyclic alcohol functional groups. The cyclodextrins in the current study are cyclic polymers consisting of six (αCD) or seven (βCD) glucose monomers in the shape of a truncated cone. The equilibrium constants and standard molar enthalpies of binding for these 43 complexes have been measured using isothermal titration calorimetry (ITC) at pH = 6.90 and T = 298 K, and nuclear magnetic resonance spectroscopy (NMR) at pH = 7.0 and T = 298 ± 1 K32. Calculations on these host-guest systems have been performed previously33, and, as in the prior study, we considered only a single stereoisomer for the 1-methylammonium guests because it was not clear whether a mixture or a pure solution was used in Rekharsky, et al.32, and the ΔG° difference between each stereoisomer is expected to be < 0.1 kcal/mol34.

Figure 1:

Structures of the two cyclodextrin hosts and 33 guest molecules in this study which together comprise 43 unique host-guest pairs. The simulation “residue name” is written beneath each guest.

Table 1:

The 43 unique host-guest combinations used in this study. The formal charge of each guest is listed in brackets. The guest names correspond to Tables 1 and 2 in Rekharsky et al.32.

Host-guest ID | Host | Guest | Charge | SMILES
a-bam | αCD | 1-butylamine | +1 | CCCC[NH3+]
a-nmb | αCD | n-methylbutylamine | +1 | CCCC[NH2+]C
a-mba | αCD | 1-methylbutylamine (a) | +1 | CCC[C@@H](C)[NH3+]
a-pam | αCD | 1-pentylamine | +1 | CCCCC[NH3+]
a-ham | αCD | 1-hexylamine | +1 | CCCCCC[NH3+]
a-nmh | αCD | n-methylhexylamine | +1 | CCCCCC[NH2+]C
a-mha | αCD | 1-methylhexylamine (a) | +1 | CCCCC[C@@H](C)[NH3+]
a-hpa | αCD | 1-heptylamine | +1 | CCCCCCC[NH3+]
a-mhp | αCD | 1-methylheptylamine (b) | +1 | CCCCCC[C@H](C)[NH3+]
a-oam | αCD | 1-octylamine | +1 | CCCCCCCC[NH3+]
b-ham | βCD | 1-hexylamine | +1 | CCCCCC[NH3+]
b-mha | βCD | 1-methylhexylamine (a) | +1 | CCCCC[C@@H](C)[NH3+]
b-oam | βCD | 1-octylamine | +1 | CCCCCCCC[NH3+]
a-cbu | αCD | cyclobutanol | 0 | C1CC(C1)O
a-cpe | αCD | cyclopentanol | 0 | C1CCC(C1)O
a-chp | αCD | cycloheptanol | 0 | C1CCCC(CC1)O
a-coc | αCD | cyclooctanol | 0 | C1CCCC(CCC1)O
b-cbu | βCD | cyclobutanol | 0 | C1CC(C1)O
b-cpe | βCD | cyclopentanol | 0 | C1CCC(C1)O
b-mch | βCD | 1-methylcyclohexanol | 0 | CC1(CCCCC1)O
b-m4c | βCD | cis-4-methylcyclohexanol | 0 | CC1CCC(CC1)O
b-m4t | βCD | trans-4-methylcyclohexanol | 0 | CC1CCC(CC1)O
b-chp | βCD | cycloheptanol | 0 | C1CCCC(CC1)O
b-coc | βCD | cyclooctanol | 0 | C1CCCC(CCC1)O
a-but | αCD | butanoate | −1 | CCCC(=O)[O-]
a-pnt | αCD | pentanoate | −1 | CCCCC(=O)[O-]
a-hex | αCD | hexanoate | −1 | CCCCCC(=O)[O-]
a-hx2 | αCD | trans-2-hexenoate | −1 | CCC/C=C/C(=O)[O-]
a-hx3 | αCD | trans-3-hexenoate | −1 | CC/C=C/CC(=O)[O-]
a-hep | αCD | heptanoate | −1 | CCCCCCC(=O)[O-]
a-hp6 | αCD | 6-heptenoate | −1 | C=CCCCCC(=O)[O-]
a-oct | αCD | octanoate | −1 | CCCCCCCC(=O)[O-]
b-pnt | βCD | pentanoate | −1 | CCCCC(=O)[O-]
b-hex | βCD | hexanoate | −1 | CCCCCC(=O)[O-]
b-hep | βCD | heptanoate | −1 | CCCCCCC(=O)[O-]
b-ben | βCD | benzoate | −1 | c1ccc(cc1)C(=O)[O-]
b-pha | βCD | phenylacetate | −1 | c1ccc(cc1)CC(=O)[O-]
b-mp3 | βCD | 3-methylphenylacetate | −1 | Cc1cccc(c1)CC(=O)[O-]
b-mp4 | βCD | 4-methylphenylacetate | −1 | Cc1ccc(cc1)CC(=O)[O-]
b-mo3 | βCD | 3-methoxyphenylacetate | −1 | COc1cccc(c1)CC(=O)[O-]
b-mo4 | βCD | 4-methoxyphenylacetate | −1 | COc1ccc(cc1)CC(=O)[O-]
b-pb3 | βCD | 3-phenylbutanoate | −1 | C[C@H](CC(=O)[O-])c1ccccc1
b-pb4 | βCD | 4-phenylbutanoate | −1 | c1ccc(cc1)CCCC(=O)[O-]

(a) Only the R enantiomer was considered.

(b) Only the S enantiomer was considered.

SMILES strings are written as canonical isomeric SMILES as implemented in the OpenEye OEChem Toolkit version 2.0.235.

1.3.2. Application of force field parameters

We sought to compare force fields directly and therefore attempted to minimize additional differences among the simulations with each force field. In all simulations, we applied AM1-BCC36,37 partial atomic charges to both the host and guest molecules using the antechamber program in AmberTools1627.

The Open Force Field Toolkit provides a mechanism for user-specified charges. If no charges are supplied, the toolkit will generate AM1-BCC charges, which is the recommended charge scheme. The host charges were calculated using a single glucose molecule with methoxy caps on the O1 and O4 alcohols (Figure 2), so that each glucose monomer in the cyclodextrin polymer has identical charges. After removing the capping atoms, the net charge of the glucose monomer was −0.064 e. To ensure neutrality of the glucose monomer, the charge remainder was distributed across all atoms in proportion to the magnitude of the partial charge of each atom. The minimum and maximum charge adjustments were 0.000684 and 0.007245 e, respectively. Using the entire αCD molecule as an input to antechamber results in partial atomic charges that differ by at most 0.02 e, compared to using a single monomer, and requires reducing the maximum path length used to determine the equivalence of atomic charges (Figure S1). We used TIP3P water38 and Joung-Cheatham monovalent ion parameters39 in each simulation set.
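The proportional redistribution of the residual charge described above can be illustrated with a minimal sketch. The function name and the example charges are hypothetical (they are not the glucose monomer charges used in this work); the example values are only chosen so that their sum equals the −0.064 e remainder quoted in the text.

```python
def neutralize_charges(charges):
    """Shift partial charges so that they sum to zero.

    The excess charge is split across all atoms in proportion to the
    magnitude of each atom's partial charge.
    """
    excess = sum(charges)
    total_magnitude = sum(abs(q) for q in charges)
    if total_magnitude == 0:
        raise ValueError("cannot redistribute charge over all-zero charges")
    return [q - excess * abs(q) / total_magnitude for q in charges]


# Hypothetical partial charges (in e) for a small fragment; NOT the
# actual glucose monomer charges from this study.
example = [-0.40, 0.15, 0.10, -0.25, 0.30, 0.036]
adjusted = neutralize_charges(example)

print(f"net charge before: {sum(example):+.4f} e")
print(f"net charge after:  {sum(adjusted):+.4f} e")
```

Because the remainder is negative here, every atom receives a small positive adjustment, and atoms with larger partial charges absorb a larger share.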

Figure 2:

Atom names (A) and GAFF atom types (B) for a glucose monomer in αCD shown with two flanking monomers. The remaining three glucose monomers are hidden for clarity.

GAFF v1.7 bond, angle, torsion, and Lennard-Jones parameters were applied using the tleap program distributed with AmberTools16. GAFF v2.1 parameters were applied in the same manner as the GAFF v1.7 parameters, using the tleap program distributed with AmberTools18 and replacing leaprc.gaff with leaprc.gaff2 in the tleap input file.

To apply SMIRNOFF99Frosst parameters, we followed a multistep process, beginning with the AMBER-format .prmtop and .inpcrd GAFF v1.7 files. The host and guest molecules were parameterized with version 0.0.3 of the Open Force Field Toolkit, which uses the OpenEye OEChem Toolkit version 2.0.235 to read molecular coordinates and topologies and create a serialized representation of the molecular system, and with version 1.0.5 of the SMIRNOFF99Frosst force field, specified in version 1.0 of the SMIRNOFF format. Once parameterized with SMIRNOFF99Frosst, the topology and coordinates for the host-guest complex were combined with the solvent and ions, which retained their TIP3P water parameters and Joung-Cheatham ion parameters, respectively. This was accomplished using the ParmEd program40, which enables saving the OpenMM system created by the Open Force Field Toolkit in AMBER-format .prmtop and .inpcrd files. Ongoing updates to the Open Force Field Toolkit may result in changes to how this procedure is carried out in the future.

1.3.3. Thermodynamic calculation

We used the attach-pull-release (APR) method, as implemented in the open source package pAPRika version 0.0.3, to calculate absolute binding free energies. A complete description of the APR method has been provided in the literature13,17,19,41. The attachment and release phases each consisted of 15 independent windows. During the attachment phase, the force constants of the restraints on the host and guest are scaled by a λ parameter that goes from λ = 0, at which point all restraints are turned off, to λ = 1, at which point all restraints are at their maximum force constant. The λ windows are more densely spaced where the force constant is smaller to improve sampling along highly curved regions of the potential of mean force. These restraints include a set of distance, angle, and torsion restraints that orient the host and guest along the long axis of the simulation box. A separate set of conformational restraints was applied between neighboring glucose units of the cyclodextrin to minimize deformations of the host molecule as the guest molecule is pulled out. The conformational restraints were applied along the pseudodihedrals O5(n)–C1(n)–O1(n)–C4(n+1) and C1(n)–O1(n)–C4(n+1)–C5(n+1) to improve convergence and sampling of the bound state (see Figure 2 for atom names). To further improve sampling of weak-binding guests, we applied a hard wall restraint that confined the guest molecule to within 12.3 Å of αCD and 13.5 Å of βCD during the bound state.

The release phase is the conceptual reverse of the attach phase, in which the conformational restraints on the host are gradually turned off (λ = 1→0) in the absence of the guest. This explicit release phase is performed once for αCD and once for βCD, as it is independent of guest molecule. Finally, an analytic correction is performed to compute the work of moving the guest from the restricted volume enforced by the APR restraints to standard state at 1 M concentration.

The pulling phase consisted of 45 independent, equally spaced windows. During the pulling phase, the λ parameter represents the target value of a distance restraint with constant force constant. This target distance is increased uniformly in 44 increments of 0.4 Å, yielding windows that separate the host and guest by 18 Å over the course of the calculation.
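As a quick check of the window arithmetic, the sketch below lists the distance targets for the pulling phase. The starting distance `START_DISTANCE` is a hypothetical placeholder, since the initial host-guest separation is not stated in this section; only the number of windows and the 0.4 Å increment come from the text.

```python
N_WINDOWS = 45        # from the text: 45 equally spaced pull windows
INCREMENT = 0.4       # Å per window, from the text
START_DISTANCE = 6.0  # Å, hypothetical initial restraint target (assumption)

# One target distance per window; the first window sits at the start value.
targets = [START_DISTANCE + i * INCREMENT for i in range(N_WINDOWS)]
span = targets[-1] - targets[0]

print(f"{len(targets)} windows, {len(targets) - 1} increments")
print(f"first target: {targets[0]:.1f} Å, last target: {targets[-1]:.1f} Å")
print(f"total pull distance: {span:.1f} Å")
```

With 44 increments of 0.4 Å the total displacement comes to 17.6 Å, consistent with the roughly 18 Å separation quoted above.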

Due to the asymmetry of the primary and secondary alcohols of cyclodextrin (Figure S3), as well as of the small molecule guests, there are generally two distinct binding poses that do not interconvert during the simulation timescale. To account for this effect, we separately compute the binding free energy and enthalpy for each orientation13 and combine the results to produce a single value for each host-guest combination using the following equation:

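$$\Delta G = -RT \ln\left[\exp\left(-\beta \Delta G_\text{primary}\right) + \exp\left(-\beta \Delta G_\text{secondary}\right)\right]$$

The total binding enthalpy is weighted by both the binding enthalpy and binding free energy in each orientation using the following equation:

$$\Delta H = \frac{\Delta H_\text{primary} \exp\left(-\beta \Delta G_\text{primary}\right) + \Delta H_\text{secondary} \exp\left(-\beta \Delta G_\text{secondary}\right)}{\exp\left(-\beta \Delta G_\text{primary}\right) + \exp\left(-\beta \Delta G_\text{secondary}\right)}$$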

In this manuscript, we refer to calculations where the guest functional group in the bound state is at the primary face of cyclodextrin with a -p suffix, and calculations where it is at the secondary face of cyclodextrin with a -s suffix.
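The combination of the -p and -s results is a Boltzmann-weighted sum, which the following minimal sketch illustrates. The function name, the default temperature of 298.15 K, and the input values are assumptions made for illustration and are not results from this work.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol K)


def combine_orientations(dG_p, dG_s, dH_p, dH_s, temperature=298.15):
    """Combine primary (-p) and secondary (-s) orientation results.

    Each orientation is weighted by exp(-beta * dG), so the more
    favorable orientation dominates both the free energy and enthalpy.
    """
    beta = 1.0 / (R * temperature)
    w_p = math.exp(-beta * dG_p)
    w_s = math.exp(-beta * dG_s)
    dG = -R * temperature * math.log(w_p + w_s)
    dH = (dH_p * w_p + dH_s * w_s) / (w_p + w_s)
    return dG, dH


# Hypothetical values in kcal/mol, for illustration only.
dG, dH = combine_orientations(dG_p=-3.0, dG_s=-1.5, dH_p=-4.0, dH_s=-2.0)
print(f"combined dG = {dG:.2f} kcal/mol, combined dH = {dH:.2f} kcal/mol")
```

The combined free energy is always at least as favorable as the better of the two orientations; when the two orientations are equally favorable, it is lower by RT ln 2.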

Thermodynamic integration42 and the multistate Bennett acceptance ratio estimator (MBAR)43 were used to compute the binding free energy (ΔG°). The results presented in the main text are those analyzed using thermodynamic integration to be consistent with prior analysis presented in Henriksen, et al.33. The binding enthalpy (ΔH) was computed as the difference in mean potential energy of the bound state (in the absence of any restraints) and the unbound state (where the guest is held far away from the host, but the conformational restraints on the host are disabled). The binding entropy (ΔS°) was computed by subtraction using ΔG° and ΔH.
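The subtraction step follows from ΔG° = ΔH − TΔS°. A minimal sketch is given below; the function name and numbers are hypothetical, and the uncertainty is propagated in quadrature as described in the Statistics section.

```python
import math


def entropy_term(dG, dG_sem, dH, dH_sem):
    """Return (-T*dS, SEM) obtained by subtraction: -T*dS = dG - dH.

    The uncertainties of dG and dH are added in quadrature.
    """
    minus_TdS = dG - dH
    sem = math.sqrt(dG_sem ** 2 + dH_sem ** 2)
    return minus_TdS, sem


# Hypothetical inputs in kcal/mol, for illustration only.
minus_TdS, sem = entropy_term(dG=-3.0, dG_sem=0.3, dH=-5.0, dH_sem=0.4)
print(f"-T*dS = {minus_TdS:.2f} +/- {sem:.2f} kcal/mol")
```

A positive −TΔS° in this example indicates an entropic penalty that partly offsets a favorable enthalpy.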

1.3.4. Simulations

Simulations were performed with the pmemd.cuda module of AMBER 16 (calculations with the GAFF v1.7 force field) and AMBER 18 (calculations with the GAFF v2.1 and SMIRNOFF99Frosst force fields) molecular dynamics software27,44. Each window for each system was independently solvated and simulated. Simulation data for the host-guest complexes using GAFF v1.7 were taken from Henriksen, et al.33 and are described in additional detail therein. Solvation consisted of 2000 TIP3P waters for the αCD systems and 2210 waters for the βCD systems in an orthorhombic box. The host and guest were oriented via non-interacting dummy atoms along the simulation box’s long z axis, to allow use of an elongated periodic box that reduces the amount of solvent required for the calculation. Each simulation contained enough Na+ or Cl− ions to neutralize the host-guest complex and an additional 50 mM NaCl to match the experimental conditions32. In the GAFF simulations, hydrogen mass repartitioning45 was used to increase the mass of hydrogen atoms by a factor of 3 and decrease the mass of the bound heavy atoms proportionally, keeping the total molecular weight of each molecule constant and enabling a simulation timestep of 4 fs. Hydrogen mass repartitioning produces negligible changes in computed thermodynamic observables for other cyclodextrin-guest calculations, with deviations within statistical uncertainty13. Equilibration consisted of 50,000 steps of energy minimization, 100 ps of heating from 0 to 300 K, and then 2000 ps of additional NPT simulation. AMBER’s Langevin thermostat with a collision rate of 1 ps−1, the Monte Carlo barostat, a nonbonded cutoff of 9 Å, and default PME parameters were used for the NPT simulations. An isotropic analytic correction to the Lennard-Jones interactions is applied beyond the cutoff distance46.
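The mass bookkeeping behind hydrogen mass repartitioning can be sketched as follows. This is an illustration under the common assumption that each heavy atom gives up exactly the mass gained by its bonded hydrogens; the function name is hypothetical and this is not the code used to prepare the simulations.

```python
def repartition(heavy_mass, n_hydrogens, h_mass=1.008, factor=3.0):
    """Scale hydrogen masses and compensate on the bonded heavy atom.

    Returns (new_heavy_mass, new_hydrogen_mass). The total mass of the
    group (heavy atom plus its hydrogens) is unchanged.
    """
    new_h = h_mass * factor
    new_heavy = heavy_mass - n_hydrogens * (new_h - h_mass)
    if new_heavy <= 0:
        raise ValueError("heavy atom mass would become non-positive")
    return new_heavy, new_h


# Example: a methyl carbon (12.011 amu) bonded to three hydrogens.
new_c, new_h = repartition(heavy_mass=12.011, n_hydrogens=3)
before = 12.011 + 3 * 1.008
after = new_c + 3 * new_h
print(f"C: 12.011 -> {new_c:.3f} amu, H: 1.008 -> {new_h:.3f} amu")
print(f"group mass before = {before:.3f} amu, after = {after:.3f} amu")
```

Slowing the fastest hydrogen motions in this way is what permits the longer 4 fs timestep while leaving the molecular weight unchanged.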
Production NPT simulations were run for a minimum of 2.5 ns and maximum of 50 ns per window, except for the windows used to calculate the enthalpy, which were each simulated for 1 μs. In the GAFF v1.7 and GAFF v2.1 simulations, the exact length of each window’s simulation was determined by the uncertainty in the work done in each λ window. In particular, for restraint energy U in λ window i, we define the instantaneous SEM of ∂U/∂λi as σ(λi), and each window (except for the windows used to calculate ΔH) was simulated until the value of w(λi), defined as

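$$w(\lambda_i) = \begin{cases} \sigma(\lambda_i)\,\dfrac{\lambda_{i+1}}{2} & i = 0 \\ \sigma(\lambda_i)\,\dfrac{\lambda_{i+1} - \lambda_{i-1}}{2} & i \in [1, N-1] \\ \sigma(\lambda_i)\,\dfrac{1 - \lambda_{i-1}}{2} & i = N \end{cases} \tag{1}$$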

fell below a threshold of 0.02 kcal/mol during the attach phase and 0.1 kcal/mol during the pull phase.

The second term in Equation 1 scales the uncertainty in the work in each λ window by the nonuniform spacing of the λ windows. w(λi) is the approximate contribution of window λi to the overall PMF uncertainty. Excluding the first and last window, the average window length was 11.8 ns and 5.39 ns for GAFF v1.7 and GAFF v2.1 simulations, respectively. We took a more direct approach with the SMIRNOFF99Frosst simulations, due to changes in pAPRika that allowed us to target uncertainties of the same magnitude as in the GAFF simulations, by running each window for a constant length of 10 ns, except for the first and last window which ran for 1 μs to converge ΔH for all three force fields.
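Equation 1 can be read as a per-window uncertainty scaled by half the distance between the neighboring λ values. The sketch below evaluates it for a hypothetical, nonuniform λ schedule; the function name and the example values are assumptions for illustration, and the schedule is assumed to run from 0 to 1 as in the attach phase.

```python
def window_weights(lambdas, sems):
    """Approximate contribution of each window to the PMF uncertainty.

    lambdas: window positions, assumed to run from 0 to 1.
    sems:    SEM of dU/dlambda in each window, sigma(lambda_i).
    """
    if len(lambdas) != len(sems) or len(lambdas) < 2:
        raise ValueError("need matching lists with at least two windows")
    last = len(lambdas) - 1
    weights = []
    for i, sigma in enumerate(sems):
        if i == 0:
            width = lambdas[1] / 2.0
        elif i == last:
            width = (1.0 - lambdas[last - 1]) / 2.0
        else:
            width = (lambdas[i + 1] - lambdas[i - 1]) / 2.0
        weights.append(sigma * width)
    return weights


# Hypothetical schedule, denser near lambda = 0; SEMs in kcal/mol.
lambdas = [0.0, 0.1, 0.4, 1.0]
sems = [1.0, 1.0, 1.0, 1.0]
for lam, w in zip(lambdas, window_weights(lambdas, sems)):
    print(f"lambda = {lam:.1f}: w = {w:.2f} kcal/mol")
```

Windows that span a wide λ interval contribute more for the same σ(λi), which is why they need longer simulations to reach the same threshold.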

1.3.5. Statistics

The uncertainty in the work done by each restraint in each simulation window, σ(λi), was estimated using blocking analysis47, in a manner which has been shown to yield good agreement with uncertainties obtained from independent replicates13. In particular, rather than looking for a plateau in the standard error of the mean (SEM) as the size of the blocks increased, as originally described by Flyvbjerg and Petersen47, we instead use the largest SEM obtained for any block size. This avoids the requirement of detecting a plateau and yields a more conservative estimate; i.e., a larger SEM. Then, using Gaussians with the mean and SEM of ∂U/∂λ in each window, new values of ∂U/∂λ were bootstrap sampled for each window 100,000 times and combined to create artificial data for 100,000 notional APR calculations. These were integrated across all windows with splines to generate 100,000 estimates of ΔG°. We report the mean and standard deviation of these 100,000 results as the final mean and its SEM. The SEM of ΔH was computed from the SEM of the total potential energy in each end point window, estimated using blocking analysis, added in quadrature. The SEM of −TΔS° was calculated using the uncertainties in ΔG° and ΔH added in quadrature.
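The "largest SEM over all block sizes" rule can be sketched with a simple block-doubling loop. This is an illustrative implementation, not the pAPRika code: the function name, the pairwise block-doubling scheme, and the `min_blocks` cutoff are all assumptions, and the test series is synthetic.

```python
import math
import random
import statistics


def max_blocked_sem(series, min_blocks=4):
    """Return the largest SEM found over successive block sizes.

    Blocks are formed by repeatedly averaging adjacent pairs, which
    doubles the block size at every pass (assumed scheme).
    """
    blocks = list(series)
    if len(blocks) < min_blocks:
        raise ValueError("series too short for blocking analysis")
    largest = 0.0
    while len(blocks) >= min_blocks:
        sem = statistics.stdev(blocks) / math.sqrt(len(blocks))
        largest = max(largest, sem)
        # Average adjacent pairs; a trailing unpaired value is dropped.
        blocks = [(blocks[i] + blocks[i + 1]) / 2.0
                  for i in range(0, len(blocks) - 1, 2)]
    return largest


# Synthetic correlated series: each random value is repeated 8 times.
rng = random.Random(0)
base = [rng.gauss(0.0, 1.0) for _ in range(64)]
series = [x for x in base for _ in range(8)]

naive = statistics.stdev(series) / math.sqrt(len(series))
print(f"naive SEM:       {naive:.4f}")
print(f"max blocked SEM: {max_blocked_sem(series):.4f}")
```

For correlated data the naive SEM is too small, and taking the maximum over block sizes gives a more conservative estimate without having to detect a plateau.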

For each force field, we computed the root mean squared error (RMSE), mean signed error (MSE), the coefficient of determination (R2), Kendall’s rank correlation coefficient (τ), and the slope and intercept of the linear regressions of the computed properties against the experimental values. The R2 values for the subsets of ligands with each functional group are also reported in the bottom right corner in each graph. Comparisons with experiment have 43 measurements, for the 43 unique host-guest complexes listed in Table 1; comparisons between force fields have 86 data points, representing the calculations for the two orientations of the guest, “p” and “s”, in the binding site (see above). The overall RMSE and R2 statistics for each comparison are reported as the sample mean estimated from using all the data, with the 95% confidence interval, from bootstrapping over the set of complexes, in brackets.
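A minimal sketch of the error statistics and a percentile bootstrap over complexes is shown below. The function names, the number of bootstrap cycles, the percentile method, and the data are assumptions for illustration; the signed error is taken as calculated minus experimental, so a negative MSE corresponds to overestimated binding.

```python
import math
import random


def rmse(calc, expt):
    """Root mean squared error of calculated vs experimental values."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, expt)) / len(calc))


def mse(calc, expt):
    """Mean signed error, calculated minus experimental."""
    return sum(c - e for c, e in zip(calc, expt)) / len(calc)


def bootstrap_ci(stat, calc, expt, n_boot=1000, seed=0):
    """95% percentile interval from resampling complexes with replacement."""
    rng = random.Random(seed)
    n = len(calc)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(stat([calc[i] for i in idx], [expt[i] for i in idx]))
    values.sort()
    return values[int(0.025 * n_boot)], values[int(0.975 * n_boot) - 1]


# Hypothetical binding free energies in kcal/mol, for illustration only.
calc = [-2.1, -3.4, -1.0, -4.2, -2.8, -3.9]
expt = [-1.6, -3.0, -1.5, -3.5, -3.1, -3.2]

low, high = bootstrap_ci(rmse, calc, expt)
print(f"RMSE = {rmse(calc, expt):.2f} [{low:.2f}, {high:.2f}] kcal/mol")
print(f"MSE  = {mse(calc, expt):.2f} kcal/mol")
```

Resampling whole complexes, rather than individual simulation windows, reflects how much the statistic depends on which host-guest pairs happen to be in the data set.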

1.4. Results

This results section is organized as follows. We first present a comparison of binding free energies (ΔG°) and binding enthalpies (ΔH) of small molecule guests to α-cyclodextrin (αCD) and β-cyclodextrin (βCD), computed with SMIRNOFF99Frosst and two versions of the General AMBER Force Field (GAFF29). We then detail how the conformational preferences of the host molecules change between force fields and seek insight into key parameter differences between SMIRNOFF99Frosst and GAFF and their effects.

1.4.1. Comparison with experimental binding free energies, enthalpies, and entropies

1.4.1.1. Binding free energies

Despite having far fewer numerical parameters, SMIRNOFF99Frosst does about as well as GAFF v1.7 and arguably better than GAFF v2.1 at replicating binding free energies measured by ITC or NMR. Thus, SMIRNOFF99Frosst yields an overall ΔG° RMSE from experiment of 0.9 [0.7, 1.1] kcal/mol across the 43 host-guest systems, compared to the statistically indistinguishable 0.9 [0.7, 1.1] kcal/mol for GAFF v1.7, and distinct from 1.7 [1.5, 1.9] kcal/mol for GAFF v2.1 (where the 95% confidence interval is written in brackets) as detailed in Figure 3; Tables 2, S5.

Figure 3:

Comparison of calculated absolute binding free energies (ΔG°) and binding enthalpies (ΔH) with experiment with SMIRNOFF99Frosst parameters (A, B), GAFF v1.7 parameters (C, D), or GAFF v2.1 parameters (E, F) applied to both host and guest. The orange, blue, and purple coloring distinguish the functional group of the guest as an ammonium, alcohol, or carboxylate, respectively.

Table 2:

Predicted thermodynamic properties for each force field relative to experiment in kcal/mol.

Property | Force field | RMSE | MSE | R2 | Slope | Intercept | Tau
ΔG° | SMIRNOFF99Frosst | 0.91 [0.71, 1.13] | −0.01 [−0.28, 0.26] | 0.34 [0.12, 0.56] | 0.49 [0.26, 0.72] | −1.55 [−0.80, −2.29] | 0.40 [0.57, 0.23]
ΔG° | GAFF v1.7 | 0.88 [0.72, 1.08] | 0.46 [0.23, 0.69] | 0.54 [0.33, 0.71] | 0.69 [0.47, 0.91] | −0.48 [0.22, −1.16] | 0.52 [0.65, 0.38]
ΔG° | GAFF v2.1 | 1.68 [1.51, 1.85] | −1.56 [−1.74, −1.37] | 0.82 [0.61, 0.92] | 1.19 [0.96, 1.34] | −1.00 [−0.52, −1.62] | 0.73 [0.82, 0.61]
ΔH | SMIRNOFF99Frosst | 1.85 [1.41, 2.30] | 0.76 [0.26, 1.28] | 0.44 [0.21, 0.66] | 0.85 [0.54, 1.19] | 0.41 [1.55, −0.50] | 0.53 [0.69, 0.34]
ΔH | GAFF v1.7 | 2.54 [2.08, 3.00] | 1.84 [1.31, 2.37] | 0.39 [0.17, 0.62] | 0.80 [0.47, 1.18] | 1.36 [2.67, 0.31] | 0.50 [0.65, 0.32]
ΔH | GAFF v2.1 | 2.21 [1.77, 2.65] | −1.64 [−2.10, −1.20] | 0.75 [0.58, 0.87] | 1.38 [1.15, 1.63] | −0.69 [0.16, −1.43] | 0.67 [0.79, 0.52]
−TΔS° | SMIRNOFF99Frosst | 1.90 [1.49, 2.32] | −0.78 [−1.29, −0.24] | 0.40 [0.14, 0.63] | 0.90 [0.51, 1.29] | −0.83 [−0.34, −1.34] | 0.33 [0.50, 0.13]
−TΔS° | GAFF v1.7 | 2.21 [1.74, 2.68] | −1.38 [−1.90, −0.86] | 0.43 [0.16, 0.68] | 0.95 [0.54, 1.38] | −1.41 [−0.96, −1.89] | 0.32 [0.50, 0.10]
−TΔS° | GAFF v2.1 | 1.80 [0.68, 3.19] | −0.00 [−0.98, 1.27] | 0.48 [0.00, 0.97] | 1.13 [−0.22, 1.96] | 0.08 [1.14, −1.79] | 0.46 [0.82, −0.02]

On the whole, GAFF v1.7 agrees well with SMIRNOFF99Frosst (Figure S6), as the RMSE and MSE between their results are 0.8 [0.6, 1.0] kcal/mol and −0.5 [−0.3, −0.7] kcal/mol. This result is not surprising as GAFF v1.7 and SMIRNOFF99Frosst may be considered cousin force fields with a common ancestor in AMBER’s parm99. Both SMIRNOFF99Frosst and GAFF v1.7 systematically underestimate the binding affinity for cyclic alcohols, with MSEs of 0.7 [0.2, 1.2] kcal/mol and 0.9 [0.4, 1.4] kcal/mol, respectively. In contrast, GAFF v2.1 significantly overestimates the binding of all compounds, leading to MSE and RMSE values of −1.6 [−1.7, −1.4] kcal/mol and 1.6 [1.4, 1.8] kcal/mol, respectively. However, GAFF v2.1 has a particularly good correlation with experiment across all functional group classes, with R2 of 0.8 [0.6, 0.9], compared with 0.3 [0.1, 0.6] and 0.5 [0.3, 0.7] for SMIRNOFF99Frosst and GAFF 1.7, respectively. This may trace to differences in the host conformations sampled by GAFF v2.1, which indicate a more consistently open cyclodextrin “pocket” for guests to bind (Figure 14), as detailed below.

Figure 14:

Top: Root mean square deviation (RMSD in Å) of free βCD in the three force fields. Each RMSD is calculated relative to the initial structure, a gas-phase minimization of βCD with GAFF v1.7. A 1000 frame moving average is plotted in red. Middle: top-view of the unoccupied cavity of βCD with no guest (200 snapshots over 1 μs). Bottom: side-view of the unoccupied cavity. The carbons are colored blue in SMIRNOFF99Frosst, green in GAFF v1.7, and purple in GAFF v2.1. Hydrogen atoms have been hidden for clarity.

1.4.1.2. Binding enthalpies and entropies

In the case of binding enthalpies (Figure 3), SMIRNOFF99Frosst agrees the best with experiment (RMSE = 1.8 [1.4, 2.3] kcal/mol), followed by GAFF v2.1 (RMSE = 2.2 [1.8, 2.7] kcal/mol), and then GAFF v1.7 (RMSE = 2.5 [2.0, 3.0] kcal/mol). In some cases, GAFF v1.7 underestimates ΔH by over 3 kcal/mol and up to 5 kcal/mol (b-chp). For binding entropies, GAFF v2.1 has the lowest RMSE relative to experiment (RMSE = 1.47 [1.1, 2.0] kcal/mol), followed by SMIRNOFF99Frosst (RMSE = 1.9 [1.5, 2.3] kcal/mol), and GAFF v1.7 (RMSE = 2.2 [1.7, 2.7] kcal/mol) (Figure S2, Figure S7). All force fields perform poorly at replicating −TΔS° for carboxylate guests, with RMSEs ranging from 1.8 [0.7, 3.2] kcal/mol (GAFF v2.1) to 3.0 [2.1, 3.9] kcal/mol (GAFF v1.7) (Figure S8). All force fields also underestimate the entropic component of binding of a-coc (αCD:cyclooctanol) relative to experiment, by 3–5 kcal/mol. This is likely due to the poor fit of cyclooctanol inside the cavity of αCD, particularly in the primary orientation (Figure 4). Overall, SMIRNOFF99Frosst and GAFF v1.7 yield rather different binding enthalpies (RMSE = 1.6 [1.3, 2.0] kcal/mol) and entropies (RMSE = 1.6 [1.2, 2.0] kcal/mol). The deviations between SMIRNOFF99Frosst and GAFF v2.1 are higher for ΔH (RMSE = 3.0 [2.5, 3.4] kcal/mol) and lower for −TΔS° (RMSE = 1.9 [1.6, 2.2] kcal/mol).

Figure 4:

Comparisons of binding free energy (ΔG) between guests in either the primary or secondary orientation of αCD or βCD, for SMIRNOFF99Frosst (A), GAFF v1.7 (B), or GAFF v2.1 (C). Arrows point from ΔG° for the secondary to ΔG° for the primary cavity. (D) An overlay of cyclooctanol bound state positions (400 snapshots over 1 μs) with αCD (left) or βCD (right) in GAFF v2.1.

Analysis of the simulations with MBAR produces very slightly improved results for SMIRNOFF99Frosst ΔG°, ΔH, and −TΔS° compared to experiment (Table S4), but they do not appear to be statistically significant.

1.4.2. Guest preferences for binding in the primary or secondary orientation

The asymmetry of the hosts and the guests leads to two distinct bound states for each host-guest pair: one where the functional group of the guest sits at the primary face of the host and another where the functional group of the guest sits at the secondary face (18). The difference in binding free energy between these two orientations (ΔΔGorientation) can be large, at around 2 kcal/mol for SMIRNOFF99Frosst and GAFF v1.7 and 5 kcal/mol for GAFF v2.1. SMIRNOFF99Frosst predicts the largest ΔΔGorientation for the ammonium-containing butylamine and pentylamine guests with αCD (Figure 4, Figure S4, Figure S5), with the primary orientation being more favorable. Thus, the cationic ammonium groups are predicted to prefer the narrower primary portal of the host. GAFF v1.7 predicts a large ΔΔGorientation for the cyclic alcohols cyclooctanol and cycloheptanol, with the secondary orientation having a more favorable ΔG. When GAFF v2.1 is used, the differences between primary and secondary binding range even higher, greater than 4 kcal/mol, for αCD with these two guests. This effect is due, at least in part, to steric clashes in the bound state for very large guests (Figure 4, D), especially in the narrow primary cavity of the smaller αCD. It is worth noting that the experimental measurement for the a-coc (αCD:cyclooctanol) complex has very large uncertainties associated with both ΔG° and ΔH.

1.4.3. Comparison of results for αCD versus βCD

It is of interest to compare the results between αCD and βCD by focusing on the ten guests for which experimental data are available with both hosts. The SMIRNOFF99Frosst and GAFF v1.7 force fields both yield somewhat more accurate binding affinities for αCD (RMSE = 0.8 [0.5, 1.1] kcal/mol) than for βCD (RMSE = 1.0 [0.8, 1.3] kcal/mol), whereas no clear pattern is observed for GAFF v2.1 (Figure S9). Much as seen for the two orientations of the guest molecules within each host, GAFF v2.1 yields relatively large differences in predicted free energies for each guest between the two hosts, but it does not seem to be more accurate for either host relative to the other.

1.4.4. Trends by guest functional group

The SMIRNOFF99Frosst force field yields rather accurate binding free energies for binding of the ammonium guests (MSE = −0.1 [−0.5, 0.3] kcal/mol and RMSE = 0.7 [0.4, 1.1] kcal/mol) to both αCD and βCD (Figure 6, Figure S10, and Table S5). It also replicates the experimental trends that shorter-chain molecules bind less strongly, and that each guest binds more strongly to αCD than βCD. The results are also reasonably good for the cyclic alcohols (MSE = 0.7 [0.2, 1.2] kcal/mol and RMSE = 1.1 [0.7, 1.6] kcal/mol) (Figure 7, Figure S11, and Table S7), though the predicted affinities for αCD are uniformly too weak, while those for βCD are mostly too strong. Finally, SMIRNOFF99Frosst yields rather accurate binding affinities for the carboxylate guests with both αCD and βCD (MSE = −0.4 [−0.7, 0.0] kcal/mol and RMSE = 0.9 [0.6, 1.2] kcal/mol) (Figure 8, Figure S12, and Table S6).
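The error statistics quoted above, with bracketed 95% ranges, can be illustrated with a minimal sketch of the mean signed error, the root mean square error, and a percentile bootstrap over guests. This is an assumption-laden illustration, not the analysis code used for the paper: it resamples host-guest pairs only and ignores the per-value simulation and experimental uncertainties, and the function names and data are hypothetical.

```python
import math
import random


def mean_signed_error(calc, expt):
    """Average of (calculated - experimental); negative means binding is too strong."""
    return sum(c - e for c, e in zip(calc, expt)) / len(calc)


def root_mean_square_error(calc, expt):
    """Square root of the mean squared deviation from experiment."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, expt)) / len(calc))


def bootstrap_interval(calc, expt, statistic, n_boot=2000, seed=0):
    """Percentile 95% interval from resampling (calc, expt) pairs with replacement."""
    rng = random.Random(seed)
    n = len(calc)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(statistic([calc[i] for i in idx], [expt[i] for i in idx]))
    values.sort()
    lower = values[int(0.025 * (n_boot - 1))]
    upper = values[int(0.975 * (n_boot - 1))]
    return lower, upper


# Hypothetical binding free energies in kcal/mol (not data from this study).
calc = [-2.1, -3.0, -1.4, -4.2, -2.8]
expt = [-1.9, -3.3, -1.6, -3.8, -2.5]
mse = mean_signed_error(calc, expt)
rmse = root_mean_square_error(calc, expt)
rmse_interval = bootstrap_interval(calc, expt, root_mean_square_error)
```

Restricting `calc` and `expt` to one functional group class at a time gives per-class statistics of the kind reported in this section.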

Figure 6:

Binding free energy (ΔG°) comparisons showing ammonium-containing guests in color and highlighted, for αCD (A) and βCD (B). Darker colors indicate shorter chain molecules. Non-ammonium guests are shown as smaller gray circles.

Figure 7:

Binding free energy (ΔG°) comparisons showing alcohol-containing guests in color and highlighted, for αCD (A) and βCD (B). Darker colors indicate smaller molecules. Non-alcohol guests are shown as smaller gray circles.

Figure 8:

Binding free energy (ΔG°) comparisons showing carboxylate-containing guests in color and highlighted, for αCD (A) and βCD (B). Darker colors indicate smaller molecules. Non-carboxylate guests are shown as smaller gray circles.

GAFF v1.7 tends to predict slightly weaker binding than SMIRNOFF99Frosst, whereas GAFF v2.1 predicts much stronger binding for all classes of guest compounds (Figures S5, S6, and S7).

1.4.5. Differences in cyclodextrin force field parameters between SMIRNOFF99Frosst and GAFF

We now summarize differences among the parameters assigned to the host αCD by SMIRNOFF99Frosst, a descendant of parm99 and parm@Frosst; GAFF v1.7 (released circa March 2015 according to gaff.dat distributed with AMBER16); and GAFF v2.1 (which has not yet been published). On going from GAFF v1.7 to GAFF v2.1, the bond and angle parameters were updated to reproduce small molecule geometries obtained from high-level quantum mechanical calculations and vibrational spectra of over 600 molecules; the torsion parameters were optimized to reproduce the potential energy surfaces of torsion angles in 400 model compounds; and the Lennard-Jones coefficients were redeveloped to reproduce interaction energies and pure liquid properties, as specified in the footer of gaff2.dat provided with AmberTools18. Note that chemically analogous atoms, bonds, angles and torsions in αCD and βCD are assigned identical parameters.

1.4.5.1. Lennard-Jones

The SMIRNOFF99Frosst and GAFF v1.7 force fields assign identical σ and ε parameters to the atoms of αCD. Note that hydroxyl hydrogens are assigned σ = 0 Å and ε = 0 kcal/mol in both GAFF v1.7 and SMIRNOFF99Frosst v1.0.5, but later versions of SMIRNOFF99Frosst, produced after the calculations in the current manuscript, adopt small σ and ε values based on a similar atom type in parm@Frosst (48–50). The GAFF v2.1 parameters differ in assigning shallower wells for oxygens and larger σ values for the hydroxyl hydrogens (Figure 9).
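As a reminder of how σ and ε enter the energy, the following sketch evaluates a 12-6 Lennard-Jones interaction with Lorentz-Berthelot combining rules, the combination used in AMBER-family force fields. The numerical parameters are illustrative placeholders, not values taken from Figure 9, and the function names are hypothetical.

```python
import math


def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j):
    """Arithmetic mean of sigma (Å) and geometric mean of epsilon (kcal/mol)."""
    return 0.5 * (sigma_i + sigma_j), math.sqrt(eps_i * eps_j)


def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones energy in kcal/mol at separation r (Å)."""
    if sigma == 0.0 or epsilon == 0.0:
        # A pair with zero well depth (or zero size) contributes no LJ energy.
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Placeholder parameters for an oxygen-like atom and a hydroxyl hydrogen
# with sigma = 0 and epsilon = 0, as described in the text.
sigma_o, eps_o = 3.0, 0.2
sigma_h, eps_h = 0.0, 0.0

sigma_oh, eps_oh = lorentz_berthelot(sigma_o, eps_o, sigma_h, eps_h)
energy_oh = lennard_jones(2.0, sigma_oh, eps_oh)    # zero: no LJ interaction
energy_oo = lennard_jones(3.5, sigma_o, eps_o)      # attractive beyond the minimum
```

Because the mixed well depth is a geometric mean, a hydroxyl hydrogen with ε = 0 has no Lennard-Jones interaction with any other atom; giving it even a small nonzero ε and σ, as in later SMIRNOFF99Frosst versions and GAFF v2.1, restores a short-range repulsion.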

Figure 9:

A comparison of Lennard-Jones nonbonded σ (A) and ε (B) parameters for SMIRNOFF99Frosst and GAFF v2.1. Values that differ by more than 10% are labeled in red. Atom names refer to Figure 2.

1.4.5.2. Bond stretches

Equilibrium bond lengths are very similar among the three force fields (Figure S13), but there are noticeable differences among the force constants (Figure 10). Thus, compared to GAFF v1.7, SMIRNOFF99Frosst tends to have slightly larger bond force constants, except for the O–H hydroxyl bond force constant, which is much stronger. In GAFF v2.1, the O–H hydroxyl bond force constant is very close to that of SMIRNOFF99Frosst, but the carbon-oxygen bond constants are distinctly weaker.

Figure 10:

A comparison of bond force constants between SMIRNOFF99Frosst and GAFF v1.7 (A), or SMIRNOFF99Frosst and GAFF v2.1 (B). Values that differ by more than 10% are labeled in red. Atom names refer to Figure 2.

1.4.5.3. Bond angles

Relative to GAFF v1.7 and GAFF v2.1, SMIRNOFF99Frosst has fewer unique angle parameters applied to αCD; several distinct parameters appear to be compressed into a single force constant, around 50 kcal/mol/rad² (Figure 11). These parameters correspond to C–C–C, C–O–C, and O–C–O angles. The C–C–C angles are primarily around the ring of the glucose monomer. The C–O–C angles are both around the ring and between monomers (e.g., C1–O1–C4 and C1–O5–C5). Weaker force constants for these parameters in GAFF v1.7 compared to GAFF v2.1 may lead to increased flexibility.
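The link between an angle force constant and flexibility can be made rough-and-ready with the classical thermal spread of a single harmonic angle. The sketch below assumes the AMBER-style energy form E = k(θ − θ0)², without a factor of 1/2, and treats the angle as isolated from the rest of the molecule, so it gives only an order-of-magnitude guide; the function name and default temperature are assumptions.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def angle_thermal_spread_deg(force_constant, temperature=298.15):
    """Standard deviation (degrees) of an isolated harmonic angle at equilibrium.

    For E = k * (theta - theta0)**2 the Boltzmann distribution is Gaussian
    with variance RT / (2k), where k is in kcal/mol/rad^2.
    """
    rt = R_KCAL * temperature
    return math.degrees(math.sqrt(rt / (2.0 * force_constant)))


# Spread for the ~50 kcal/mol/rad^2 force constant mentioned in the text,
# and for a hypothetical constant twice as stiff.
spread_50 = angle_thermal_spread_deg(50.0)
spread_100 = angle_thermal_spread_deg(100.0)
```

Under these assumptions a force constant of 50 kcal/mol/rad² corresponds to a spread of roughly 4.4° at room temperature, and doubling the force constant narrows the spread by a factor of √2.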

Figure 11:

A comparison of angle force constants between SMIRNOFF99Frosst and GAFF v1.7 (A) or SMIRNOFF99Frosst and GAFF v2.1 (B). A comparison of equilibrium angle values between SMIRNOFF99Frosst and GAFF v1.7 (C) or SMIRNOFF99Frosst and GAFF v2.1 (D). Values that differ by more than 10% are labeled in red. Precise atom names have been omitted to compress multiple angles with the same parameter values into a single label.

1.4.5.4. Dihedral parameters

The dihedral parameters in SMIRNOFF99Frosst and GAFF v1.7 are extremely similar (where differences in barrier heights occur, they are in the hundredths or thousandths of 1 kcal/mol), with the exception of the H1–C1–C2–O2 parameter (Figure 2). For this dihedral, which corresponds to GAFF atom types h2-c3-c3-oh and SMIRKS pattern [#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4], SMIRNOFF99Frosst applies a single term with periodicity = 1 and GAFF v1.7 applies a single term with periodicity = 3 (Table S8, Figure 12).

Figure 12:

(A) The atoms in the H1–C1–C2–O2 dihedral marked in purple on a glucose monomer in cyclodextrin. (B) The dihedral energy term applied to H1–C1–C2–O2 in SMIRNOFF99Frosst and GAFF v1.7. Atom names refer to Figure 2.

The dihedral parameters in GAFF v2.1 differ from those in SMIRNOFF99Frosst in a number of ways. Several dihedrals have a different number of terms (Table S9). This is partly due to the addition of dihedral terms with a barrier height of exactly 0.00 kcal/mol in GAFF, which are used to override wildcard parameters that might match the same atom types. For example, GAFF v2.1 applies a three-term energy function to the atom types c3-os-c3-c3, whereas SMIRNOFF99Frosst employs a two-term energy function for the hydroxyl rotation SMIRKS pattern [#6X4:1]-[#6X4:2]-[#8X2H0:3]-[#6X4:4]; in GAFF v2.1, however, only the terms with periodicity 2 and 3 have nonzero barrier heights. Similarly, SMIRNOFF99Frosst uses two nonzero terms to model the potential barrier for the SMIRKS pattern [#6X4:1]-[#6X4:2]-[#8X2H1:3]-[#1:4], yet GAFF v2.1 applies a single term with a barrier height of exactly 0.00 kcal/mol for this rotation (atom types c3-c3-oh-ho). The fact that GAFF employs dihedral terms with zero amplitude highlights the complexity that would be required to optimize existing force fields that have accumulated legacy parameters needed to maintain backwards compatibility with older force fields and simulation codes.

In other cases, SMIRNOFF99Frosst and GAFF v2.1 disagree on the barrier height after matching the periodicity and phase for a given dihedral. For example, the amplitudes for the O1–C1–O5–C5 dihedral are 1.35 kcal/mol and 0.97 kcal/mol for SMIRNOFF99Frosst and GAFF v2.1, respectively, for the term with periodicity = 1, whereas the amplitudes are 0.85 kcal/mol and 1.24 kcal/mol for SMIRNOFF99Frosst and GAFF v2.1, respectively, for the term with periodicity = 2. It is notable that the barrier heights in GAFF v2.1 are similar in magnitude to those in SMIRNOFF99Frosst, yet GAFF v2.1 produces much more rigid structures (Table 3, Figure 14), as detailed in the following section. Moreover, many of the dihedrals that act between a pair of neighboring glucose monomers (i.e., inter-residue dihedrals) in cyclodextrin differ in their periodicities, phases, and amplitudes between SMIRNOFF99Frosst and GAFF v2.1 (Table 4, Figure 13). The dihedral acting on atoms O1n–C4n+1–C5n+1–O5n+1 is quite significantly different, with multiple minima and barrier heights. This dihedral partially controls the rotation of glucose monomers towards or away from the interior of the cyclodextrin cavity. Surprisingly, glucose monomers in GAFF v2.1 penetrate the open cavity much less frequently than in SMIRNOFF99Frosst, despite the lower and broader dihedral energy in GAFF v2.1.
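To see how the per-term amplitudes combine into a torsion profile, the sketch below sums periodic terms of the form k[1 + cos(nφ − γ)] for the O1–C1–O5–C5 dihedral, using the periodicities, phases, and heights listed for that dihedral in Table 3. It assumes that the tabulated "height" is the amplitude k in this expression, which is the usual AMBER and SMIRNOFF convention but is not stated explicitly here; the function and variable names are hypothetical, and other torsions sharing the same central bond would add to the real profile.

```python
import math


def torsion_energy(phi_deg, terms):
    """Sum of periodic torsion terms k * (1 + cos(n * phi - phase)) in kcal/mol.

    `terms` is a list of (periodicity, phase in degrees, amplitude in kcal/mol).
    """
    phi = math.radians(phi_deg)
    return sum(
        k * (1.0 + math.cos(n * phi - math.radians(phase)))
        for n, phase, k in terms
    )


# O1-C1-O5-C5 terms as listed in Table 3 (periodicity, phase, height).
SMIRNOFF_O1_C1_O5_C5 = [(1, 0.0, 1.35), (2, 0.0, 0.85), (3, 0.0, 0.10)]
GAFF21_O1_C1_O5_C5 = [(1, 0.0, 0.97), (2, 0.0, 1.24), (3, 0.0, 0.00)]

# Scan the profile in 10 degree steps for each force field.
scan = [
    (phi, torsion_energy(phi, SMIRNOFF_O1_C1_O5_C5), torsion_energy(phi, GAFF21_O1_C1_O5_C5))
    for phi in range(-180, 181, 10)
]
```

With these terms the two profiles are close at φ = 0° (4.60 versus 4.42 kcal/mol) but differ more at φ = 180° (1.70 versus 2.48 kcal/mol), which illustrates how similar individual amplitudes can still give differently shaped energy surfaces.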

Table 3:

Dihedral barrier height differences between SMIRNOFF99Frosst and GAFF v2.1 for cases where the phase and periodicity of the energy term match but the barrier height does not. Atom names refer to Figure 2. Barrier height in kcal/mol.

| SMIRKS | Atom 1 | Atom 2 | Atom 3 | Atom 4 | Per | Phase | SMIRNOFF99Frosst height (kcal/mol) | GAFF v2.1 height (kcal/mol) |
|---|---|---|---|---|---|---|---|---|
| [#6X4:1]-[#6X4:2]-[#6X4:3]-[#6X4:4] | C1 | C2 | C3 | C4 | 1 | 0 | 0.20 | 0.11 |
| [#6X4:1]-[#6X4:2]-[#6X4:3]-[#6X4:4] | C1 | C2 | C3 | C4 | 2 | 0 | 0.25 | 0.29 |
| [#6X4:1]-[#6X4:2]-[#6X4:3]-[#6X4:4] | C1 | C2 | C3 | C4 | 3 | 0 | 0.18 | 0.13 |
| [*:1]-[#6X4:2]-[#6X4:3]-[*:4] | C1 | C2 | C3 | O3 | 3 | 0 | 0.16 | 0.21 |
| [*:1]-[#6X4:2]-[#8X2H0:3]-[*:4] | C1 | O5 | C5 | H5 | 3 | 0 | 0.38 | 0.34 |
| [#6X4:1]-[#6X4:2]-[#6X4:3]-[#6X4:4] | C2 | C3 | C4 | C5 | 1 | 0 | 0.20 | 0.11 |
| [#6X4:1]-[#6X4:2]-[#6X4:3]-[#6X4:4] | C2 | C3 | C4 | C5 | 2 | 0 | 0.25 | 0.29 |
| [#6X4:1]-[#6X4:2]-[#6X4:3]-[#6X4:4] | C2 | C3 | C4 | C5 | 3 | 0 | 0.18 | 0.13 |
| [#6X4:1]-[#6X4:2]-[#6X4:3]-[#6X4:4] | C3 | C4 | C5 | C6 | 1 | 0 | 0.20 | 0.11 |
| [#6X4:1]-[#6X4:2]-[#6X4:3]-[#6X4:4] | C3 | C4 | C5 | C6 | 2 | 0 | 0.25 | 0.29 |
| [#6X4:1]-[#6X4:2]-[#6X4:3]-[#6X4:4] | C3 | C4 | C5 | C6 | 3 | 0 | 0.18 | 0.13 |
| [*:1]-[#6X4:2]-[#6X4:3]-[*:4] | C4 | C5 | C6 | O6 | 3 | 0 | 0.16 | 0.21 |
| [#1:1]-[#6X4:2]-[#6X4:3]-[#1:4] | H1 | C1 | C2 | H2 | 3 | 0 | 0.15 | 0.16 |
| [#1:1]-[#6X4:2]-[#6X4:3]-[#1:4] | H2 | C2 | C3 | H3 | 3 | 0 | 0.15 | 0.16 |
| [*:1]-[#6X4:2]-[#8X2:3]-[#1:4] | H2 | C2 | O2 | HO2 | 3 | 0 | 0.17 | 0.11 |
| [#1:1]-[#6X4:2]-[#6X4:3]-[#1:4] | H3 | C3 | C4 | H4 | 3 | 0 | 0.15 | 0.16 |
| [*:1]-[#6X4:2]-[#8X2:3]-[#1:4] | H3 | C3 | O3 | HO3 | 3 | 0 | 0.17 | 0.11 |
| [#1:1]-[#6X4:2]-[#6X4:3]-[#1:4] | H4 | C4 | C5 | H5 | 3 | 0 | 0.15 | 0.16 |
| [#1:1]-[#6X4:2]-[#6X4:3]-[#1:4] | H5 | C5 | C6 | H61 | 3 | 0 | 0.15 | 0.16 |
| [#1:1]-[#6X4:2]-[#6X4:3]-[#1:4] | H5 | C5 | C6 | H62 | 3 | 0 | 0.15 | 0.16 |
| [#6X4:1]-[#8X2:2]-[#6X4:3]-[#8X2:4] | O1 | C1 | O5 | C5 | 1 | 0 | 1.35 | 0.97 |
| [#6X4:1]-[#8X2:2]-[#6X4:3]-[#8X2:4] | O1 | C1 | O5 | C5 | 2 | 0 | 0.85 | 1.24 |
| [#6X4:1]-[#8X2:2]-[#6X4:3]-[#8X2:4] | O1 | C1 | O5 | C5 | 3 | 0 | 0.10 | 0.00 |
| [*:1]-[#6X4:2]-[#6X4:3]-[*:4] | O2 | C2 | C3 | C4 | 3 | 0 | 0.16 | 0.21 |
| [#8X2:1]-[#6X4:2]-[#6X4:3]-[#8X2:4] | O2 | C2 | C3 | O3 | 2 | 0 | 1.18 | 1.13 |
| [#8X2:1]-[#6X4:2]-[#6X4:3]-[#8X2:4] | O2 | C2 | C3 | O3 | 3 | 0 | 0.14 | 0.90 |
| [*:1]-[#6X4:2]-[#6X4:3]-[*:4] | O3 | C3 | C4 | C5 | 3 | 0 | 0.16 | 0.21 |
| [*:1]-[#6X4:2]-[#8X2:3]-[#1:4] | H61 | C6 | O6 | HO6 | 3 | 0 | 0.17 | 0.11 |
| [*:1]-[#6X4:2]-[#8X2:3]-[#1:4] | H62 | C6 | O6 | HO6 | 3 | 0 | 0.17 | 0.11 |

Table 4:

Inter-residue dihedral parameter differences between SMIRNOFF99Frosst and GAFF v2.1. Atom names refer to Figure 2. NP: not present. Barrier height in kcal/mol.

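| ID | Atom 1 | Res 1 | Atom 2 | Res 2 | Atom 3 | Res 3 | Atom 4 | Res 4 | Per | Phase | SMIRNOFF99Frosst height (kcal/mol) | GAFF v2.1 height (kcal/mol) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C1 | n | O1 | n | C4 | n+1 | C3 | n+1 | 1 | 0 | NP | 0.00 |
| 1 | C1 | n | O1 | n | C4 | n+1 | C3 | n+1 | 2 | 0 | 0.10 | 0.16 |
| 1 | C1 | n | O1 | n | C4 | n+1 | C3 | n+1 | 3 | 0 | 0.38 | 0.24 |
| 2 | C1 | n | O1 | n | C4 | n+1 | C5 | n+1 | 1 | 0 | NP | 0.00 |
| 2 | C1 | n | O1 | n | C4 | n+1 | C5 | n+1 | 2 | 0 | 0.10 | 0.16 |
| 2 | C1 | n | O1 | n | C4 | n+1 | C5 | n+1 | 3 | 0 | 0.38 | 0.24 |
| 3 | C2 | n | C1 | n+1 | O1 | n+1 | C4 | n+1 | 1 | 0 | NP | 0.00 |
| 3 | C2 | n | C1 | n+1 | O1 | n+1 | C4 | n+1 | 2 | 0 | 0.10 | 0.16 |
| 3 | C2 | n | C1 | n+1 | O1 | n+1 | C4 | n+1 | 3 | 0 | 0.38 | 0.24 |
| 4 | O1 | n | C4 | n+1 | C3 | n+1 | O3 | n+1 | 1 | 0 | NP | 0.02 |
| 4 | O1 | n | C4 | n+1 | C3 | n+1 | O3 | n+1 | 2 | 0 | 1.18 | 0.00 |
| 4 | O1 | n | C4 | n+1 | C3 | n+1 | O3 | n+1 | 3 | 0 | 0.14 | 1.01 |
| 5 | O1 | n | C4 | n+1 | C5 | n+1 | O5 | n+1 | 1 | 0 | NP | 0.17 |
| 5 | O1 | n | C4 | n+1 | C5 | n+1 | O5 | n+1 | 2 | 0 | 1.18 | 0.00 |
| 5 | O1 | n | C4 | n+1 | C5 | n+1 | O5 | n+1 | 3 | 0 | 0.14 | 0.00 |

Figure 13: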

The dihedral energy term applied to three inter-residue dihedrals in SMIRNOFF99Frosst and GAFF v2.1. Atom names refer to Figure 2.

There are no improper dihedrals in αCD or βCD, nor in any of the guests.

1.4.6. Structural consequences of the force field parameter differences

We observed a substantial difference between the conformational flexibility of the uncomplexed cyclodextrins in solution when simulated with GAFF v2.1 versus SMIRNOFF99Frosst and GAFF v1.7. With SMIRNOFF99Frosst and GAFF v1.7, the average RMSD of βCD, relative to the initial structure, is between 2.0 and 2.5 Å over 43 μs of unrestrained simulation, while with GAFF v2.1, the average RMSD is <1.0 Å (Figure 14). Not only are the RMSDs greater for SMIRNOFF99Frosst and GAFF v1.7, but there is greater variance in their RMSDs compared to GAFF v2.1, indicating greater flexibility. This large difference in structural fluctuations is clearly visible in the structure overlays also shown in the figure, which show that GAFF v2.1 is the only one of the three force fields that leads to maintenance of a clearly defined binding cavity. In this respect, it is similar to the q4md-CD force field (51), which was designed specifically for cyclodextrins and which also maintains a relatively well-defined cavity (33).

This difference may be further analyzed by considering the “flip” pseudodihedral O2n–C1n–C4n+1–O3n+1, which characterizes the orientation of glucose monomers relative to their neighbors. An angle of 0° corresponds approximately to a glucose that forms part of a cylindrical wall of the binding cavity, while an angle of ±90° indicates a glucose that has flipped to put its plane parallel to the top and bottom of the cylinder, partly filling the cavity. This dihedral is tightly distributed in GAFF v2.1, with all seven instances having a Gaussian-like distribution centered around −10° (Figure 15, A). GAFF v1.7 and SMIRNOFF99Frosst display a mixed population of monomers both aligned with, and perpendicular to, the cyclodextrin cavity. In particular, during a single 1 μs simulation, each monomer will sample conformations at 0° and ±90°, as indicated by the timeseries in Figure 15, B. As detailed in the Discussion, the less flexible representation afforded by GAFF v2.1 agrees better with available NMR and crystallographic data.
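A pseudodihedral of this kind is computed exactly like an ordinary dihedral, from the Cartesian coordinates of the four named atoms in each frame. The sketch below shows one way to do this with the standard atan2 formula and then estimate the fraction of "flipped" frames. The function names and the 45° cutoff separating wall-like from flipped glucose units are assumptions of this illustration, not definitions used in the paper.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], for four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def fraction_flipped(angles_deg, threshold_deg=45.0):
    """Fraction of frames whose |pseudodihedral| exceeds a hypothetical cutoff."""
    return sum(1 for a in angles_deg if abs(a) > threshold_deg) / len(angles_deg)


# Toy coordinates (Å) standing in for O2(n), C1(n), C4(n+1), O3(n+1) in one frame.
angle = dihedral_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))

# Toy time series of pseudodihedral values in degrees.
flipped = fraction_flipped([-10.0, 5.0, 88.0, -92.0])
```

Applying `dihedral_deg` to each of the seven glucose pairs of βCD in every frame and histogramming the results would give population curves of the kind shown in Figure 15, A.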

Figure 15:

(A) Population histograms of the pseudodihedral in free βCD, averaged over 43 μs, for each force field; one curve is drawn for each pseudodihedral in βCD. Representative structures for SMIRNOFF99Frosst and GAFF v2.1 are indicated by the black arrows. (B) Timeseries of a pseudodihedral in a GAFF v2.1 simulation (red), or SMIRNOFF99Frosst simulation (blue).

1.5. Discussion

As a terse representation of a GAFF-like force field, SMIRNOFF99Frosst performs remarkably well. Despite having far fewer parameters than GAFF v1.7 and GAFF v2.1, SMIRNOFF99Frosst performs as well as GAFF v1.7 and arguably better than GAFF v2.1 on estimated binding free energies of small molecules to αCD and βCD, based on the mean signed error relative to experiment. Moreover, SMIRNOFF99Frosst performs better than either GAFF v1.7 or GAFF v2.1 on predicted binding enthalpies, with a mean signed error less than 1 kcal/mol. It should be noted that the binding free energy and enthalpy root mean squared errors (RMSE) and mean signed errors (MSE) for GAFF v2.1 are not substantially worse than those of SMIRNOFF99Frosst, and GAFF v2.1 has statistically significantly better correlations with the experimental data. GAFF v2.1 has excellent agreement with experiment on predicted binding entropy, followed by SMIRNOFF99Frosst and then GAFF v1.7. Taken together, these results support the notion that a force field with many fewer parameters can provide competitive performance. The reduction in the number of parameters, and the simplification of the force field specification, will make it easier to iteratively refine and optimize SMIRNOFF99Frosst against experimental data and the results of quantum mechanical calculations.

However, both SMIRNOFF99Frosst and GAFF v1.7 result in excessively flexible representations of the cyclodextrin hosts, as detailed below. Cézard et al. present strong NMR evidence that the vicinal ³J H5–H6’ (atom names H5–H62 in Figure 2) and ³J H5–H6” (atom names H5–H61) couplings show minimal fluctuation in distance over a number of timescales, suggesting little change in the population of rotamers (51). This is also evident in X-ray structures, where the rigidity of the cyclodextrin ring is retained as long as water is present in the cavity and the torsional angles between adjacent glucose units show little variance (0.3–0.6°) across different crystal structures (52). The combination of X-ray and NMR data suggests that the specialized q4md-CD (51) force field, and the rigid GAFF v2.1 (33) force field, better model the flexibility of the CD cavity. The CHARMM36 force field displays similar structural dynamics to q4md-CD, with certain GROMOS force fields even more rigid than those (53). The present results suggest that, as SMIRNOFF99Frosst is further developed, it will be important to include sugars and other carbohydrates in the training sets used to develop parameters. Unfortunately, it may be challenging to find the types of high quality experimental data typically used to train force fields (heats of vaporization, heats of mixing, hydration free energies, and partition coefficients, among others) for biologically relevant sugars. Proper accounting of sugars, and protein-sugar interactions, will be especially useful for modeling physiologically relevant protein structures, such as proteoglycans and glycopeptides.

The greater rigidity of the cyclodextrins when simulated with GAFF v2.1 may contribute to its tendency to generate greater binding affinities and more negative enthalpies than the other two force fields, as a more rigid host may avoid an energy penalty associated with flipping the glucose residues out of the binding cavity to accommodate a guest molecule. The better preorganized cavity might also relate to the uniformly higher correlations between calculation and experiment for GAFF v2.1. On the other hand, it is perhaps unexpected that the force field that best represents the conformational preferences of the cyclodextrin yields consistently too negative binding free energies and enthalpies. It is worth noting that the magnitude of these effects will depend on the guest parameters, as well as on the water model and ion parameters.

More broadly, the results presented in this manuscript further demonstrate that host-guest binding thermodynamics can be used to benchmark force fields, to help diagnose issues with parameters applied to specific functional groups, and to suggest directions for improvements. We are therefore continuing to build out experimental host-guest datasets tuned for this purpose, and to further streamline host-guest binding thermodynamics calculations so that binding data can be used alongside other data types, such as liquid properties, by automated tools for optimizing force field parameters.

Figure 5:

Shown are the αCD and βCD binding free energies for each guest, highlighting the differences in binding to the two hosts for SMIRNOFF99Frosst (A), GAFF v1.7 (B), or GAFF v2.1 (C). The binding affinity for αCD is circled in black. Thin colored lines connect data points for the same guest. Color is used purely to distinguish among the guests.

1.8. Acknowledgments

This work was funded in part by grant GM061300 to MKG from the National Institute of General Medical Sciences of the NIH. This work used computational resources from the Triton Shared Computing Cluster at UCSD. JDC was funded in part by grants R01 GM121505 and R01 GM124270 from the National Institute of General Medical Sciences of the NIH and P30 CA008748 from the National Cancer Institute of the NIH. The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.

1.9. Disclosures

The authors declare the following competing financial interest(s): MKG has an equity interest in and is a cofounder and scientific advisor of VeraChem LLC. JDC is a member of the Scientific Advisory Board of OpenEye Scientific Software. The Chodera laboratory receives or has received funding from multiple sources, including the National Institutes of Health, the National Science Foundation, the Parker Institute for Cancer Immunotherapy, Relay Therapeutics, Entasis Therapeutics, Silicon Therapeutics, EMD Serono (Merck KGaA), AstraZeneca, XtalPi, the Molecular Sciences Software Institute, the Starr Cancer Consortium, the Open Force Field Consortium, Cycle for Survival, a Louis V. Gerstner Young Investigator Award, the Einstein Foundation, and the Sloan Kettering Institute. A complete funding history for the Chodera lab can be found at http://choderalab.org/funding.

1.10. List of abbreviations

APR

attach-pull-release

CD

cyclodextrin

GAFF

Generalized AMBER Force Field

Footnotes

1.6. Code and data availability

• GitHub repository used to convert AMBER input files from GAFF force field to SMIRNOFF99Frosst.

• GitHub repository for setting up the attach-pull-release calculations using paprika version 0.0.3.

• GitHub repository for analyzing the simulations and generating the plots in this manuscript.

• GitHub repository for the Open Force Field group containing the toolkit and force field XML file.

1.11. Supporting Information

Figures S1 through S13, Tables S1 through S9, and copies of Tables 1 through 4 and S1 through S9 in CSV format.

This information is available free of charge via the Internet at http://pubs.acs.org.

1.12. References

  • (1).Rizzi A; Murkli S; McNeill JN; Yao W; Sullivan M; Gilson MK; Chiu MW; Isaacs L; Gibb BC; Mobley DL; et al. Overview of the SAMPL6 Host-Guest Binding Affinity Prediction Challenge. J Comput Aided Mol Des 2018, 32 (10), 937–963. 10.1007/s10822-018-0170-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (2).Heinzelmann G; Henriksen NM; Gilson MK Attach-Pull-Release Calculations of Ligand Binding and Conformational Changes on the First BRD4 Bromodomain. J. Chem. Theory Comput 2017, 13 (7), 3260–3275. 10.1021/acs.jctc.7b00275. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (3).Cole DJ; Cabeza de Vaca I; Jorgensen WL Computation of Protein-Ligand Binding Free Energies Using Quantum Mechanical Bespoke Force Fields. Med. Chem. Commun 2019, 10 (7), 1116–1120. 10.1039/c9md00017h. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (4).Aldeghi M; Heifetz A; Bodkin MJ; Knapp S; Biggin PC Predictions of Ligand Selectivity from Absolute Binding Free Energy Calculations. J. Am. Chem. Soc 2017, 139 (2), 946–957. 10.1021/jacs.6b11467. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (5).Wang L; Wu Y; Deng Y; Kim B; Pierce L; Krilov G; Lupyan D; Robinson S; Dahlgren MK; Greenwood J; et al. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J. Am. Chem. Soc 2015, 137 (7), 2695–2703. 10.1021/ja512751q. [DOI] [PubMed] [Google Scholar]
  • (6).Roos K; Wu C; Damm W; Reboul M; Stevenson JM; Lu C; Dahlgren MK; Mondal S; Chen W; Wang L; et al. OPLS3e: Extending Force Field Coverage for Drug-Like Small Molecules. J. Chem. Theory Comput 2019, 15 (3), 1863–1874. 10.1021/acs.jctc.8b01026. [DOI] [PubMed] [Google Scholar]
  • (7).Song L; Lee T-S; Zhu C; York DM; Merz KM Jr. Validation of AMBER/GAFF for Relative Free Energy Calculations, 2019. 10.26434/chemrxiv.7653434. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (8).Yin J; Henriksen NM; Slochower DR; Shirts MR; Chiu MW; Mobley DL; Gilson MK Overview of the SAMPL5 Host-Guest Challenge: Are We Doing Better? J Comput Aided Mol Des 2016, 31 (1), 1–19. 10.1007/s10822-016-9974-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (9).Zheng Z; Ucisik MN; Merz KM The Movable Type Method Applied to Protein-Ligand Binding. J. Chem. Theory Comput 2013, 9 (12), 5526–5538. 10.1021/ct4005992. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (10).Papadourakis M; Bosisio S; Michel J Blinded Predictions of Standard Binding Free Energies: Lessons Learned from the SAMPL6 Challenge. J Comput Aided Mol Des 2018, 32 (10), 1047–1058. 10.1007/s10822-018-0154-6. [DOI] [PubMed] [Google Scholar]
  • (11).Bhakat S; Söderhjelm P Resolving the Problem of Trapped Water in Binding Cavities: Prediction of Host-Guest Binding Free Energies in the SAMPL5 Challenge by Funnel Metadynamics. J Comput Aided Mol Des 2016, 31 (1), 119–132. 10.1007/s10822-016-9948-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (12).Tofoleanu F; Lee J; Pickard IV FC; König G; Huang J; Baek M; Seok C; Brooks BR Absolute Binding Free Energies for Octa-Acids and Guests in SAMPL5. J Comput Aided Mol Des 2016, 31 (1), 107–118. 10.1007/s10822-016-9965-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (13).Henriksen NM; Fenley AT; Gilson MK Computational Calorimetry: High-Precision Calculation of Host-Guest Binding Thermodynamics. J. Chem. Theory Comput 2015, 11 (9), 4377–4394. 10.1021/acs.jctc.5b00405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (14).Han K; Hudson PS; Jones MR; Nishikawa N; Tofoleanu F; Brooks BR Prediction of CB[8] Host-Guest Binding Free Energies in SAMPL6 Using the Double-Decoupling Method. J Comput Aided Mol Des 2018, 32 (10), 1059–1073. 10.1007/s10822-018-0144-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (15).Nishikawa N; Han K; Wu X; Tofoleanu F; Brooks BR Comparison of the Umbrella Sampling and the Double Decoupling Method in Binding Free Energy Predictions for SAMPL6 Octa-Acid Host-Guest Challenges. J Comput Aided Mol Des 2018, 32 (10), 1075–1086. 10.1007/s10822-018-0166-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (16).Song LF; Bansal N; Zheng Z; Merz KM Detailed Potential of Mean Force Studies on Host-Guest Systems from the SAMPL6 Challenge. J Comput Aided Mol Des 2018, 32 (10), 1013–1026. 10.1007/s10822-018-0153-7. [DOI] [PubMed] [Google Scholar]
  • (17).Muddana HS; Fenley AT; Mobley DL; Gilson MK The SAMPL4 Host-Guest Blind Prediction Challenge: An Overview. J Comput Aided Mol Des 2014, 28 (4), 305–317. 10.1007/s10822-014-9735-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (18).Gilson M; Given J; Bush B; McCammon J The Statistical-Thermodynamic Basis for Computation of Binding Affinities: A Critical Review. Biophysical Journal 1997, 72 (3), 1047–1069. 10.1016/s0006-3495(97)78756-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (19).Fenley AT; Henriksen NM; Muddana HS; Gilson MK Bridging Calorimetry and Simulation Through Precise Calculations of Cucurbituril-Guest Binding Enthalpies. J. Chem. Theory Comput 2014, 10 (9), 4069–4078. 10.1021/ct5004109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (20).Saenger W; Jacob J; Gessler K; Steiner T; Hoffmann D; Sanbe H; Koizumi K; Smith SM; Takaha T Structures of the Common Cyclodextrins and Their Larger AnaloguesBeyond the Doughnut. Chem. Rev 1998, 98 (5), 1787–1802. 10.1021/cr9700181. [DOI] [PubMed] [Google Scholar]
  • (21).Kellett K; Kantonen SA; Duggan BM; Gilson MK Toward Expanded Diversity of Host-Guest Interactions via Synthesis and Characterization of Cyclodextrin Derivatives. J Solution Chem 2018, 47 (10), 1597–1608. 10.1007/s10953-018-0769-1. [DOI] [Google Scholar]
  • (22).Del Valle E Cyclodextrins and Their Uses: A Review. Process Biochemistry 2004, 39 (9), 1033–1046. 10.1016/s0032-9592(03)00258-9. [DOI] [Google Scholar]
  • (23).Mobley DL; Bannan CC; Rizzi A; Bayly CI; Chodera JD; Lim VT; Lim NM; Beauchamp KA; Slochower DR; Shirts MR; et al. Escaping Atom Types in Force Fields Using Direct Chemical Perception. J. Chem. Theory Comput 2018, 14 (11), 6076–6092. 10.1021/acs.jctc.8b00640. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (24).The Open Force Field Initiative. Open Force Field Initiative https://openforcefield.org/. [Google Scholar]
  • (25).Cheatham TE III; Cieplak P; Kollman PA A Modified Version of the Cornellet al.Force Field with Improved Sugar Pucker Phases and Helical Repeat. Journal of Biomolecular Structure and Dynamics 1999, 16 (4), 845–862. 10.1080/07391102.1999.10508297. [DOI] [PubMed] [Google Scholar]
  • (26).An Informal AMBER Small Molecule Force Field: parm@Frosst http://www.ccl.net/cca/data/parm_at_Frosst/ (accessed Oct 4, 2019).
  • (27).Case D; Ben-Shalom I; Brozell S; Cerutti D; Cheatham TI; Cruzeiro V; Darden T; Duke R; Ghoreishi D; Gilson M; et al. AMBER 2018. [Google Scholar]
  • (28).Daylight>SMIRKS Tutorial https://daylight.com/dayhtml_tutorials/languages/smirks/ (accessed Oct 4, 2019).
  • (29).Wang J; Wolf RM; Caldwell JW; Kollman PA; Case DA Development and Testing of a General Amber Force Field. J. Comput. Chem 2004, 25 (9), 1157–1174. 10.1002/jcc.20035. [DOI] [PubMed] [Google Scholar]
  • (30).A General Small Molecule Force Field Descended from AMBER99 and Parm@Frosst, Available in the SMIRNOFF Format: Openforcefield/smirnoff99Frosst, Open Force Field Initiative, 2019. [Google Scholar]
  • (31).Shirts MR; Chodera JD; Mobley DL; Gilson MK; Wang Lee-Ping. Open Force Field Roadmap. Unpublished 2019. 10.13140/rg.2.2.27587.86562. [DOI] [Google Scholar]
  • (32).Rekharsky MV; Mayhew MP; Goldberg RN; Ross PD; Yamashoji Y; Inoue Y Thermodynamic and Nuclear Magnetic Resonance Study of the Reactions of α- and β-Cyclodextrin with Acids, Aliphatic Amines, and Cyclic Alcohols. J. Phys. Chem. B 1997, 101 (1), 87–100. 10.1021/jp962715n.
  • (33).Henriksen NM; Gilson MK Evaluating Force Field Performance in Thermodynamic Calculations of Cyclodextrin Host-Guest Binding: Water Models, Partial Charges, and Host Force Field Parameters. J. Chem. Theory Comput 2017, 13 (9), 4253–4269. 10.1021/acs.jctc.7b00359.
  • (34).Rekharsky M; Inoue Y Chiral Recognition Thermodynamics of β-Cyclodextrin: the Thermodynamic Origin of Enantioselectivity and the Enthalpy-Entropy Compensation Effect. J. Am. Chem. Soc 2000, 122 (18), 4418–4435. 10.1021/ja9921118.
  • (35).OpenEye Scientific Software. OEChem Toolkit 2019Apr. 2; Santa Fe, NM: http://www.eyesopen.com.
  • (36).Jakalian A; Jack DB; Bayly CI Fast, Efficient Generation of High-Quality Atomic Charges. AM1-BCC Model: II. Parameterization and Validation. J. Comput. Chem 2002, 23 (16), 1623–1641. 10.1002/jcc.10128.
  • (37).Jakalian A; Bush BL; Jack DB; Bayly CI Fast, Efficient Generation of High-Quality Atomic Charges. AM1-BCC Model: I. Method. J. Comput. Chem 2000, 21 (2), 132–146.
  • (38).Jorgensen WL; Chandrasekhar J; Madura JD; Impey RW; Klein ML Comparison of Simple Potential Functions for Simulating Liquid Water. The Journal of Chemical Physics 1983, 79 (2), 926–935. 10.1063/1.445869.
  • (39).Joung IS; Cheatham TE III. Determination of Alkali and Halide Monovalent Ion Parameters for Use in Explicitly Solvated Biomolecular Simulations. J. Phys. Chem. B 2008, 112 (30), 9020–9041. 10.1021/jp8001614.
  • (40).Shirts MR; Klein C; Swails JM; Yin J; Gilson MK; Mobley DL; Case DA; Zhong ED Lessons Learned from Comparing Molecular Dynamics Engines on the SAMPL5 Dataset. J Comput Aided Mol Des 2016, 31 (1), 147–161. 10.1007/s10822-016-9977-1.
  • (41).Velez-Vega C; Gilson MK Overcoming Dissipation in the Calculation of Standard Binding Free Energies by Ligand Extraction. J. Comput. Chem 2013, n/a-n/a. 10.1002/jcc.23398.
  • (42).Kirkwood JG Statistical Mechanics of Fluid Mixtures. The Journal of Chemical Physics 1935, 3 (5), 300–313. 10.1063/1.1749657.
  • (43).Shirts MR; Chodera JD Statistically Optimal Analysis of Samples from Multiple Equilibrium States. The Journal of Chemical Physics 2008, 129 (12), 124105. 10.1063/1.2978177.
  • (44).The Amber Molecular Dynamics Package http://www.ambermd.org.
  • (45).Hopkins CW; Le Grand S; Walker RC; Roitberg AE Long-Time-Step Molecular Dynamics Through Hydrogen Mass Repartitioning. J. Chem. Theory Comput 2015, 11 (4), 1864–1874. 10.1021/ct5010406.
  • (46).Steinbrecher T; Mobley DL; Case DA Nonlinear Scaling Schemes for Lennard-Jones Interactions in Free Energy Calculations. The Journal of Chemical Physics 2007, 127 (21), 214108. 10.1063/1.2799191.
  • (47).Flyvbjerg H; Petersen HG Error Estimates on Averages of Correlated Data. The Journal of Chemical Physics 1989, 91 (1), 461–466. 10.1063/1.457480.
  • (48).Add hydroxyl hydrogen radii, remove generics, update for 1.0.7 release by davidlmobley · Pull Request #74 · openforcefield/smirnoff99Frosst https://github.com/openforcefield/smirnoff99Frosst/pull/74 (accessed Oct 4, 2019).
  • (49).Remove generics, add hydrogen radii, rename ffxml files by davidlmobley · Pull Request #101 · openforcefield/openforcefield https://github.com/openforcefield/openforcefield/pull/101 (accessed Oct 4, 2019).
  • (50).Adjust hydroxyl hydrogen to have a small radius, requires more research · Issue #61 · openforcefield/smirnoff99Frosst https://github.com/openforcefield/smirnoff99Frosst/issues/61 (accessed Oct 4, 2019).
  • (51).Cézard C; Trivelli X; Aubry F; Djedaïni-Pilard F; Dupradeau F-Y Molecular Dynamics Studies of Native and Substituted Cyclodextrins in Different Media: 1. Charge Derivation and Force Field Performances. Phys. Chem. Chem. Phys 2011, 13 (33), 15103. 10.1039/c1cp20854c.
  • (52).Hingerty B; Saenger W Topography of Cyclodextrin Inclusion Complexes. 8. Crystal and Molecular Structure of the α-Cyclodextrin-Methanol-Pentahydrate Complex. Disorder in a Hydrophobic Cage. J. Am. Chem. Soc 1976, 98 (11), 3357–3365. 10.1021/ja00427a050.
  • (53).Gebhardt J; Kleist C; Jakobtorweihen S; Hansen N Validation and Comparison of Force Fields for Native Cyclodextrins in Aqueous Solution. J. Phys. Chem. B 2018, 122 (5), 1608–1626. 10.1021/acs.jpcb.7b11808.
