Author manuscript; available in PMC: 2013 Feb 28.
Published in final edited form as: J Comput Aided Mol Des. 2011 Dec 24;26(5):543–550. doi: 10.1007/s10822-011-9525-y

Predicting binding affinities of host-guest systems in the SAMPL3 blind challenge

The performance of relative free energy calculations

Gerhard König 1, Bernard R Brooks 1
PMCID: PMC3584352  NIHMSID: NIHMS426311  PMID: 22198474

Abstract

Relative free energy calculations based on molecular dynamics simulations were combined with available experimental binding free energies to predict unknown binding affinities of acyclic Cucurbituril complexes in the blind SAMPL3 competition. The predictions showed good agreement with experimental results, yielding root mean square errors of about 2.6 kcal/mol for seven host-guest systems. However, the standard deviations found in our simulations ranged up to 2.4 kcal/mol, which indicates the need for better sampling. We compare the performance of three different approaches: Bennett’s Acceptance Ratio Method and Thermodynamic Integration based on both the trapezoidal and Simpson’s rule. Surprisingly, both Bennett’s Acceptance Ratio Method and Thermodynamic Integration with the trapezoidal rule lead to the same root mean square error. We also evaluate the influence of the protonation states of the amine groups of the guest molecules, showing that the deprotonated forms exhibit a poorer correspondence to experimental results with a root mean square error of 5.2 kcal/mol. In addition, we demonstrate that a decrease of the buffer concentration by about 20 mM in our simulations can raise the root mean square error to 3.8 kcal/mol.

Keywords: binding free energy calculations, Bennett’s acceptance ratio, Thermodynamic Integration, protonation state, buffer concentration

1 Introduction

So-called “free energy simulations” are considered among the most accurate and general methodologies in the field of computational chemistry. They provide means to study diverse processes such as the binding affinities of ligands[25, 21], enzymatic reactions[11], the solvation of organic molecules[20], as well as the effect of point mutations[26].

However, most applications of free energy simulations do not provide reliable data on the accuracy of the method, since either a.) there is no reference data for an assessment of its quality or b.) the free energy calculations were conducted after the experimental reference results became available. In the latter case, the reference results lead to a selection process, where simulations with a high agreement with experiment become published, while simulations with a poor agreement will not be disclosed. Since only successes are reported this way, an unrealistically positive picture is presented.

Such forms of bias can be avoided by employing blind studies, where the reference results are not known a priori. In computational chemistry, the SAMPL blind challenges have been established during the past years for assessing methods for the prediction of free energies[24, 8, 7]. Those challenges supplied data on the expected accuracies of current force fields, in particular with respect to the solute-water interactions. In the form of the hydrophobic effect, those interactions play a vital role in all biological processes. In SAMPL0 and SAMPL1, the root mean square errors (RMSE) of free energy calculations based on molecular dynamics simulations with explicit solvent ranged between 1.3[24] and 3.5 kcal/mol[8, 19], which gives a good picture of the errors that can be expected from this method for small systems. E.g., the same approach yielded a RMSE of 2.8 kcal/mol when predicting the solvation free energies of 23 small organic compounds in the SAMPL2 competition[13].

In SAMPL3, the prediction of binding affinities of host-guest systems (ΔGbind) was added to the list of challenges. These systems include Cucurbituril molecular containers that are able to selectively bind ligands with cationic groups. Through their binding-pocket-like structure, host-guest systems exhibit many of the features of protein-ligand complexes, while still being small enough to be computationally tractable - no global conformational changes or unfolding events can occur during the simulation. This relative simplicity and robustness makes them a very useful benchmark system for computational methods. For example, Moghaddam et al. employed M2 free energy calculations to design guest systems with ultrahigh affinity to Cucurbit[7]uril[22], obtaining RMSE from experimental results between 2.7 and 4.6 kcal/mol for all compounds included in their study.

In a recent publication[17], Ma et al. described the synthesis of the acyclic Cucurbituril congener employed in the SAMPL3 challenge. They also tested its function as a host to a structurally diverse set of ammonium ions. This study included experimentally determined binding constants between ~$10^{5}$ and $10^{9}$ M$^{-1}$ for 26 guest molecules. In our work, we employ these experimental binding affinities as a starting point of our binding affinity predictions. That is, our absolute binding free energy predictions ($\Delta G_{\mathrm{bind}}^{\mathrm{target}}$) are a combination of the experimentally derived absolute binding free energy of a reference molecule ($\Delta G_{\mathrm{bind}}^{\mathrm{ref}}$) as given in Ref. [17] and computed relative binding free energies to the target molecules of the SAMPL3 competition ($\Delta\Delta G_{\mathrm{bind}}^{\mathrm{ref}\rightarrow\mathrm{target}}$):

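$$\Delta G_{\mathrm{bind}}^{\mathrm{target}} = \Delta G_{\mathrm{bind}}^{\mathrm{ref}} + \Delta\Delta G_{\mathrm{bind}}^{\mathrm{ref}\rightarrow\mathrm{target}} \qquad (1)$$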

The reference molecules[17] and the corresponding targets of the SAMPL3 competition are shown on the left and the right side of Fig. 1. By employing relative (alchemical) free energy calculations rather than absolute calculations, the mutations of the system are relatively mild, i.e. most interactions within the complex are only slightly affected by the transformation. Through error compensation, we expect this approach to reduce the errors due to the imperfections of the force field[18].

Fig. 1.


Chemical structures of the guest molecules. Reference molecules of known binding affinity are shown on the left side of the arrows and the corresponding target molecules of the free energy calculations for the SAMPL3 challenge are shown on the right side.

To calculate such relative free energy differences, several free energy methods are available, whereof Bennett’s acceptance ratio method (BAR)[1] and thermodynamic integration (TI)[12] are among the most widely used (for a short description of the two methods, see the Methods section). While BAR is generally considered more efficient than TI, a recent publication by Bruckner and Boresch suggests that TI can be as efficient as BAR[4] granted that a good numerical quadrature scheme is employed. If the available simulation lengths are short, TI sometimes even outperforms BAR in terms of efficiency. We, therefore, decided to employ both BAR and TI for the analysis of the trajectories, using both the trapezoidal (TI-TR) and Simpson’s rule (TI-SI) for the numerical quadrature step in TI. This allows us to compare the relative competitiveness of those three methods.

The remainder of this paper is organized as follows. First, we outline the methods employed in more detail. We then present the results for BAR, TI-TR and TI-SI and assess their accuracy. Finally, we conclude with a short discussion on the influence of parameters such as the protonation state of the guest molecule and the buffer concentration on the binding free energy results.

2 Methods

We calculated the relative binding free energies of the seven guest molecules in the SAMPL3 challenge (labeled 1–7 in Fig. 1, numbers shown in bold). The relative free energy calculations were started from reference molecules of known binding affinity as published by Ma et al.[17] (numbers in italics). In particular, structures 26, 19 and 11 from Ref. [17] were employed (in order of appearance from top to bottom on the left side of Fig. 1). Their corresponding absolute binding free energies were reported to be −7.0, −8.0 and −11.3 kcal/mol. Reference molecule 26 was employed for molecules 1, 3, 4, 5 and 7. We assumed that those molecules contain only a single protonated amine group. For the other two reference molecules 19 and 11 and their corresponding target molecules 2 and 6, we assumed that they contain two protonated amine groups. In our simulations, the four carboxyl groups of the host molecule were deprotonated. Binding affinities of the two stereoisomers of guest molecule 1 were calculated separately (named 1S and 1R). The presented data for 1 are the average of the 1S and 1R results. All free energy calculations were conducted with CHARMM[2, 3], using the PERT module of CHARMM and the CHARMM General Force Field for organic molecules (CGenFF)[29], program version 0.9.1 beta, as provided on www.paramchem.org.

2.1 Outline of the free energy methods

TI[12] involves numerical quadrature to determine the free energy difference between two states 0 and 1. Between 0 and 1, several intermediate states can be generated by mixing the respective potential energy functions U. The mixing ratio between U0 for state 0 and U1 for state 1 is given by the factor λ (e.g., U(λ) = λU1 + (1 − λ)U0). In TI, λ is considered a continuous variable that can be used for differentiation or integration. Using integration, the free energy difference between 0 and 1 can be regarded as

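$$\Delta G = \int_0^1 \frac{\partial G(\lambda)}{\partial \lambda}\, d\lambda \qquad (2)$$

which leads to the equation for TI

$$\Delta G^{\mathrm{TI}} = \int_0^1 d\lambda \left\langle \frac{\partial U(\lambda)}{\partial \lambda} \right\rangle_{\lambda} \qquad (3)$$

In practice, this integral is evaluated by conducting several simulations at discrete values of λ to evaluate $\langle \partial U(\lambda)/\partial \lambda \rangle_{\lambda}$ and then employing numerical quadrature to approximate the integral. This can be done by using the trapezoidal rule or numerical quadrature schemes of higher order such as Simpson’s rule[5].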
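The quadrature step can be sketched in a few lines of Python. This is an illustration only, not the CHARMM workflow used in this work: it applies the trapezoidal rule to the 12 λ points and Simpson's rule to the 11 equally spaced λ points listed in Section 2.2. The function names are illustrative, and `dU_dlambda` is a hypothetical smooth cubic that stands in for the simulated ensemble averages of ∂U/∂λ.

```python
def trapezoid(x, y):
    """Trapezoidal rule; also valid for unevenly spaced x."""
    return sum(0.5 * (x[i + 1] - x[i]) * (y[i] + y[i + 1]) for i in range(len(x) - 1))


def simpson(x, y):
    """Composite Simpson's rule; needs an odd number of equally spaced points."""
    n = len(x)
    if n < 3 or n % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number of points (>= 3)")
    h = (x[-1] - x[0]) / (n - 1)
    odd = sum(y[1:-1:2])   # interior points that carry weight 4
    even = sum(y[2:-1:2])  # interior points that carry weight 2
    return h / 3.0 * (y[0] + 4.0 * odd + 2.0 * even + y[-1])


# Lambda grids quoted in the text (12 points for TI-TR, 11 for TI-SI).
lam_tr = [0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
lam_si = [i / 10 for i in range(11)]


def dU_dlambda(lam):
    """Hypothetical smooth integrand (kcal/mol); its exact integral on [0, 1] is 2.0."""
    return 4.0 * lam**3 + 3.0 * lam**2 - 2.0 * lam + 1.0


dG_tr = trapezoid(lam_tr, [dU_dlambda(l) for l in lam_tr])
dG_si = simpson(lam_si, [dU_dlambda(l) for l in lam_si])
print(f"TI-TR: {dG_tr:.4f}  TI-SI: {dG_si:.4f}  exact: 2.0000")
```

For a smooth, noise-free integrand such as this one, Simpson's rule is exact up to rounding, whereas the trapezoidal rule retains a small discretization error.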

BAR[1], on the other hand, requires two simulations. At each end point of the free energy calculation the potential energy differences are evaluated, using

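$$\Delta G = \beta^{-1} \ln \left( \frac{\left\langle f(U_0 - U_1 + C) \right\rangle_1}{\left\langle f(U_1 - U_0 - C) \right\rangle_0} \right) + C \qquad (4)$$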

where f denotes the Fermi function, f(x) = 1/(1 + exp(βx)), with β = 1/(k_B T). Indexes 0 and 1 indicate that the ensemble averages are calculated over all coordinate frames generated for the initial and final state, respectively. Bennett showed that C can be found through an iterative procedure. The BAR equation can also be derived using maximum likelihood techniques[27].
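A minimal Python sketch of the iterative BAR solution is given below. It is not the implementation used in this work. It solves the self-consistent form of Bennett's equation for ΔG by bisection, where the function names, the temperature of 300 K and the synthetic Gaussian energy differences are all assumptions made for illustration. The synthetic forward and backward distributions are constructed to be mutually consistent with a hypothetical true ΔG of 2.0 kcal/mol.

```python
import math
import random

KT = 0.0019872 * 300.0  # k_B*T in kcal/mol at an assumed temperature of 300 K
BETA = 1.0 / KT


def fermi(x):
    """Fermi function 1 / (1 + exp(x)) for a reduced argument, guarded against overflow."""
    if x > 700.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def bar(w_forward, w_reverse, tol=1e-8):
    """Solve Bennett's self-consistent equation for dG by bisection.

    w_forward: U1 - U0 evaluated on frames sampled in state 0
    w_reverse: U0 - U1 evaluated on frames sampled in state 1
    """
    m = math.log(len(w_forward) / len(w_reverse))

    def imbalance(dg):
        lhs = sum(fermi(m + BETA * (w - dg)) for w in w_forward)
        rhs = sum(fermi(-m + BETA * (w + dg)) for w in w_reverse)
        return lhs - rhs  # increases monotonically with dg

    lo = min(min(w_forward), -max(w_reverse)) - 50.0 * KT
    hi = max(max(w_forward), -min(w_reverse)) + 50.0 * KT
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic, hypothetical data: Gaussian energy differences with overlapping distributions.
rng = random.Random(0)
true_dg, sigma = 2.0, 1.0  # kcal/mol
shift = 0.5 * BETA * sigma**2
w_f = [rng.gauss(true_dg + shift, sigma) for _ in range(5000)]
w_r = [rng.gauss(-true_dg + shift, sigma) for _ in range(5000)]

dg_est = bar(w_f, w_r)
print(f"BAR estimate: {dg_est:.2f} kcal/mol (true value {true_dg:.2f})")
```

Bisection is used here only because the imbalance function is monotonic in ΔG; any one-dimensional root finder would serve the same purpose.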

2.2 Computation of relative binding affinities

Each relative binding free energy ($\Delta\Delta G_{\mathrm{bind}}^{\mathrm{ref}\rightarrow\mathrm{target}}$) was calculated with the standard thermodynamic cycle, which includes two kinds of calculations: 1.) transforming the reference molecule to the target molecule in solution ($\Delta G_{\mathrm{H_2O}}^{\mathrm{ref}\rightarrow\mathrm{target}}$) and 2.) conducting the same transformation while bound to the host ($\Delta G_{\mathrm{host}}^{\mathrm{ref}\rightarrow\mathrm{target}}$). Thus,

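$$\Delta\Delta G_{\mathrm{bind}}^{\mathrm{ref}\rightarrow\mathrm{target}} = \Delta G_{\mathrm{host}}^{\mathrm{ref}\rightarrow\mathrm{target}} - \Delta G_{\mathrm{H_2O}}^{\mathrm{ref}\rightarrow\mathrm{target}} \qquad (5)$$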
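The bookkeeping of Eqs. (1) and (5) can be illustrated with a short Python sketch. The function names are illustrative. The reference value of −7.0 kcal/mol is the experimental binding free energy of reference molecule 26 quoted in Section 2, whereas the two alchemical leg values are hypothetical numbers chosen only to show the arithmetic.

```python
def relative_binding(dg_host, dg_water):
    """Eq. (5): ddG_bind(ref->target) = dG_host(ref->target) - dG_H2O(ref->target)."""
    return dg_host - dg_water


def absolute_binding(dg_bind_ref, ddg_bind):
    """Eq. (1): dG_bind(target) = dG_bind(ref) + ddG_bind(ref->target)."""
    return dg_bind_ref + ddg_bind


dg_bind_ref = -7.0  # reference molecule 26, experimental value from the text (kcal/mol)
dg_host = -35.2     # hypothetical alchemical leg while bound to the host (kcal/mol)
dg_water = -36.9    # hypothetical alchemical leg in water (kcal/mol)

ddg = relative_binding(dg_host, dg_water)
dg_target = absolute_binding(dg_bind_ref, ddg)
print(f"ddG_bind = {ddg:+.1f} kcal/mol, dG_bind(target) = {dg_target:.1f} kcal/mol")
```

Because only the difference of the two legs enters the result, errors that affect both legs equally cancel.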

In all $\Delta G_{\mathrm{host}}^{\mathrm{ref}\rightarrow\mathrm{target}}$ and $\Delta G_{\mathrm{H_2O}}^{\mathrm{ref}\rightarrow\mathrm{target}}$ simulations, 1492 TIP3P water molecules [10, 23] were present. Na+ or Cl− ions were added to neutralize the total charge of the system and an additional NaCl pair was included to improve the sampling and obtain an ionic strength similar to the experimental conditions. The simulation box was a truncated octahedron. The side length L of the cube from which the octahedron was generated was originally L = 40.0 Å. However, we used constant pressure during free energy simulations. Integration of the equations of motion was carried out with the velocity-Verlet algorithm as implemented in the TPCNTRL module of CHARMM[15]; the time step was 2 fs. The temperature was maintained at about 300 K using two separate Nosé-Hoover thermostats[9] for solute and solvent. SHAKE[28] was used to keep the water geometry rigid. Lennard-Jones interactions were switched off between 10 and 12 Å, while electrostatic interactions were computed with the Particle Mesh Ewald method [6]. Each host-guest system was equilibrated for 200 ps before production.

All alchemical mutations were split into 11–12 intermediate λ steps, using soft core Lennard-Jones and electrostatic interactions. 12 λ points were used for the BAR and TI-TR results (λ = 0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0). Since Simpson’s rule requires an odd number of equally spaced data points for numerical quadrature, the TI-SI results are based on 11 λ points (λ = 0.0, 0.1, …, 1.0). Each λ point was simulated for 3 ns and all simulations were repeated four times, starting from different initial random velocities, to allow the calculation of standard deviations. Thus, each result submitted to SAMPL3 was based on a total simulation time of about 288 ns, leading to a combined computational effort equivalent to 2.3 µs for all predictions taken together.
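The quoted simulation times can be reproduced with simple arithmetic, sketched below in Python. The λ count, the length per λ point and the number of repetitions are taken from the text. The factor of two (one leg in the host, one in water) and the count of eight transformations (guests 2–7 plus the stereoisomers 1S and 1R) are assumptions that happen to reproduce the stated totals.

```python
n_lambda = 12          # lambda points per transformation (from the text)
ns_per_lambda = 3      # ns simulated at each lambda point (from the text)
n_repeats = 4          # independent repetitions (from the text)
n_legs = 2             # assumption: one leg in the host and one leg in water
n_transformations = 8  # assumption: guests 2-7 plus stereoisomers 1S and 1R

ns_per_prediction = n_lambda * ns_per_lambda * n_repeats * n_legs
total_us = ns_per_prediction * n_transformations / 1000.0
print(f"{ns_per_prediction} ns per prediction, {total_us:.1f} us in total")
```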

2.3 Free energy simulations with different protonation states

To determine the influence of the protonation state of the guest on the binding affinity result ($\Delta\Delta\Delta G_{\mathrm{bind}}^{\mathrm{deprot}}$), additional simulations were conducted after the deadline for the SAMPL3 competition. For this purpose, the protonated and deprotonated states of all reference and target molecules were simulated with implicit solvent to determine the free energy differences. Again, a thermodynamic cycle was employed, using a.) free energy simulations between the protonated and deprotonated state in water for the reference ($\Delta G_{\mathrm{H_2O}}^{\mathrm{ref\cdot H^+}\rightarrow\mathrm{ref}}$), as well as the target ($\Delta G_{\mathrm{H_2O}}^{\mathrm{target\cdot H^+}\rightarrow\mathrm{target}}$) and b.) the corresponding simulations while bound to the host molecule ($\Delta G_{\mathrm{host}}^{\mathrm{ref\cdot H^+}\rightarrow\mathrm{ref}}$, $\Delta G_{\mathrm{host}}^{\mathrm{target\cdot H^+}\rightarrow\mathrm{target}}$), leading to

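$$\Delta\Delta G_{\mathrm{H_2O}}^{\mathrm{deprot}} = \Delta G_{\mathrm{H_2O}}^{\mathrm{target\cdot H^+}\rightarrow\mathrm{target}} - \Delta G_{\mathrm{H_2O}}^{\mathrm{ref\cdot H^+}\rightarrow\mathrm{ref}} \qquad (6)$$

$$\Delta\Delta G_{\mathrm{host}}^{\mathrm{deprot}} = \Delta G_{\mathrm{host}}^{\mathrm{target\cdot H^+}\rightarrow\mathrm{target}} - \Delta G_{\mathrm{host}}^{\mathrm{ref\cdot H^+}\rightarrow\mathrm{ref}} \qquad (7)$$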

so that

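$$\Delta\Delta\Delta G_{\mathrm{bind}}^{\mathrm{deprot}} = \Delta\Delta G_{\mathrm{host}}^{\mathrm{deprot}} - \Delta\Delta G_{\mathrm{H_2O}}^{\mathrm{deprot}} \qquad (8)$$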

The free energy differences were calculated using Langevin dynamics simulations with a friction coefficient of 5 ps$^{-1}$ on all heavy atoms. Random forces were applied according to the target temperature of 300 K. To justify a time step of 1.5 fs, hydrogen masses were set to 10 amu. The effect of the solvent was modeled with GBMV[16]. In previous studies, GBMV showed very good agreement with explicit solvent results for several relative solvation free energies (RMSE = 0.5 kcal/mol[14]); therefore, the expected error due to the implicit solvent model can be assumed to be small. Free energy differences were determined with BAR, using two steps: one for changing the charges and the other for changing the atom types. For guest molecule 6, there was not sufficient phase space overlap to obtain converged results; therefore, no data are shown for this guest. The simulation length at each endpoint was 15 ns, the first 1.5 ns of which were discarded as equilibration. Each simulation was repeated three times with different random seeds.

2.4 Simulations with different buffer concentrations

To determine the influence of the buffer concentration on the binding affinity result ($\Delta\Delta\Delta G_{\mathrm{bind}}^{\Delta c_{\mathrm{buffer}}}$), the additional NaCl pairs present in the binding affinity calculations (discussed in Section 2.2) were removed alchemically from the host-guest systems 2, 4 and 6. Again, a thermodynamic cycle was employed, using free energy simulations between the systems that include the additional NaCl pair and systems where the electrostatic and Lennard-Jones interactions of that pair are turned off. This was done a.) in aqueous solution for the reference molecule ($\Delta G_{\mathrm{H_2O}}^{\mathrm{ref+NaCl}\rightarrow\mathrm{ref}}$), as well as the target molecule ($\Delta G_{\mathrm{H_2O}}^{\mathrm{target+NaCl}\rightarrow\mathrm{target}}$) and b.) while bound to the host molecule ($\Delta G_{\mathrm{host}}^{\mathrm{ref+NaCl}\rightarrow\mathrm{ref}}$, $\Delta G_{\mathrm{host}}^{\mathrm{target+NaCl}\rightarrow\mathrm{target}}$), leading to

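$$\Delta\Delta G_{\mathrm{H_2O}}^{\Delta c_{\mathrm{buffer}}} = \Delta G_{\mathrm{H_2O}}^{\mathrm{target+NaCl}\rightarrow\mathrm{target}} - \Delta G_{\mathrm{H_2O}}^{\mathrm{ref+NaCl}\rightarrow\mathrm{ref}} \qquad (9)$$

$$\Delta\Delta G_{\mathrm{host}}^{\Delta c_{\mathrm{buffer}}} = \Delta G_{\mathrm{host}}^{\mathrm{target+NaCl}\rightarrow\mathrm{target}} - \Delta G_{\mathrm{host}}^{\mathrm{ref+NaCl}\rightarrow\mathrm{ref}} \qquad (10)$$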

so that

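$$\Delta\Delta\Delta G_{\mathrm{bind}}^{\Delta c_{\mathrm{buffer}}} = \Delta\Delta G_{\mathrm{host}}^{\Delta c_{\mathrm{buffer}}} - \Delta\Delta G_{\mathrm{H_2O}}^{\Delta c_{\mathrm{buffer}}} \qquad (11)$$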

The corresponding free energy simulations were conducted using the same simulation setup as described in subsection 2.2. The electrostatic and Lennard-Jones interactions of the Na+ and Cl− ions were turned off in 12 λ steps (λ = 0.0, 0.1, …, 0.9, 0.95, 1.0), using soft cores. Each λ point was simulated for 2 ns. Free energy differences were calculated using TI and the trapezoidal rule.

3 Results and Discussion

The results for the absolute binding free energies ($\Delta G_{\mathrm{bind}}$) of the seven host-guest systems are shown in Table 1. Three free energy methods were employed: BAR (second column), TI-TR (third column) and TI-SI (fourth column). The ± sign represents the corresponding standard deviations, which were calculated from four repetitions of each calculation. Generally, the standard deviations are very high, ranging between 0.5 and 1.3 kcal/mol for BAR, between 0.4 and 1.9 kcal/mol for TI-TR and between 0.6 and 2.4 kcal/mol for TI-SI. However, the average standard deviations were similar for all three methods (0.8 for BAR and TI-TR, 1.0 for TI-SI). Since the standard deviations reflect the quality of the sampling, this indicates that the binding free energy results are not converged. Significantly longer trajectories would have been required to achieve what we consider adequate standard deviations of about 0.3 kcal/mol.

Table 1.

Computed absolute binding free energies ΔGbind and their corresponding standard deviations for the seven guest molecules of the SAMPL3 challenge. Three different methods were used: Bennett’s acceptance ratio method (BAR) with 12 λ points and thermodynamic integration with the trapezoidal rule and 12 λ points (TI-TR), as well as Simpson’s rule with 11 λ points (TI-SI). The experimental results (Exp.) of the binding free energies are shown on the right side in bold. The root mean square errors (RMSE) of the computational results are shown in the last row in bold. All free energy differences are in kcal/mol.

| Guest | ΔGbind (BAR) | ΔGbind (TI-TR) | ΔGbind (TI-SI) | Exp. |
|---|---|---|---|---|
| 1 | −9.1 ± 0.9 | −8.5 ± 0.4 | −7.5 ± 0.7 | −5.8 |
| 2 | −7.3 ± 0.5 | −7.4 ± 0.6 | −6.8 ± 0.6 | −7.1 |
| 3 | −5.3 ± 1.3 | −3.8 ± 1.9 | −2.0 ± 2.4 | −6.8 |
| 4 | −8.6 ± 0.7 | −7.3 ± 1.3 | −8.7 ± 1.4 | −4.2 |
| 5 | −9.3 ± 0.8 | −9.0 ± 0.7 | −6.6 ± 0.6 | −6.1 |
| 6 | −9.0 ± 1.0 | −8.7 ± 1.2 | −7.1 ± 1.0 | −10.7 |
| 7 | −6.4 ± 0.7 | −5.0 ± 0.9 | −4.5 ± 1.0 | −7.9 |
| RMSE | 2.6 | 2.6 | 3.2 | |

The experimental results for the binding free energies are presented in the rightmost column of Table 1. They form the basis for the root mean square errors (RMSE) presented in the last line for each computational method. The RMSE serve as a measure for the accuracy of each method. In terms of RMSE, the accuracies of BAR and TI-TR are equal (2.6 kcal/mol), while the errors of TI-SI are higher (3.2 kcal/mol). All three results fall into the same range of RMSE as experienced during past SAMPL competitions for the hydration free energies of organic molecules (between 1.3[24] and 3.5 kcal/mol[8, 19]). This demonstrates that solvation free energies are a good benchmark system for free energy calculations of even larger molecular complexes.
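As a check on the tabulated values, the RMSE of each method can be recomputed from the Table 1 entries with a short Python sketch. The variable and function names are illustrative; all numbers are copied from Table 1.

```python
import math

# Experimental binding free energies from Table 1 (kcal/mol), keyed by guest number.
experiment = {1: -5.8, 2: -7.1, 3: -6.8, 4: -4.2, 5: -6.1, 6: -10.7, 7: -7.9}

# Computed binding free energies from Table 1 (kcal/mol).
predictions = {
    "BAR":   {1: -9.1, 2: -7.3, 3: -5.3, 4: -8.6, 5: -9.3, 6: -9.0, 7: -6.4},
    "TI-TR": {1: -8.5, 2: -7.4, 3: -3.8, 4: -7.3, 5: -9.0, 6: -8.7, 7: -5.0},
    "TI-SI": {1: -7.5, 2: -6.8, 3: -2.0, 4: -8.7, 5: -6.6, 6: -7.1, 7: -4.5},
}


def rmse(calc, ref):
    """Root mean square error over all guests present in calc."""
    return math.sqrt(sum((calc[g] - ref[g]) ** 2 for g in calc) / len(calc))


rmse_by_method = {name: rmse(values, experiment) for name, values in predictions.items()}
for name, value in rmse_by_method.items():
    print(f"{name}: RMSE = {value:.1f} kcal/mol")
```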

When comparing the accuracy of the three methods, the relatively weak performance of TI-SI might come as a surprise, considering that recent studies[5] demonstrated that TI, in connection with Simpson’s rule (or other higher-order numerical integration schemes), is by far superior to the simple trapezoidal rule. However, better quadrature methods can enhance the efficiency of TI only if the shape of the integrand ∂U/∂λ is well-behaved and the values of ∂U/∂λ are converged. In Fig. 2 we show a typical ∂U/∂λ plot from our calculations. The four lines represent four different repetitions of the simulation. As can be seen, the aforementioned conditions are not met: both the uncertainties of ∂U/∂λ (as illustrated by the differences between the four repetitions) and the changes of ∂U/∂λ (a steep decrease between λ = 0.0 and λ = 0.1) are very high. In such cases, it is more efficient to introduce additional λ-points in the problematic regions of the ∂U/∂λ plot and to run longer simulations rather than employing high-order numerical quadrature. This is reflected by the difference of the RMSE of TI-TR and TI-SI (2.6 versus 3.2 kcal/mol).
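The effect of a steep region near λ = 0 on the quadrature can be illustrated with a toy calculation in Python. The integrand below is a hypothetical exponential decay, not data from our simulations, and the amplitude and decay length are arbitrary assumptions. The sketch compares the trapezoidal rule on 11 equally spaced points, the trapezoidal rule with one extra point at λ = 0.05, and Simpson's rule on the 11 equally spaced points.

```python
import math


def trapezoid(x, y):
    """Trapezoidal rule; also valid for unevenly spaced x."""
    return sum(0.5 * (x[i + 1] - x[i]) * (y[i] + y[i + 1]) for i in range(len(x) - 1))


def simpson_uniform(y, h):
    """Composite Simpson's rule for an odd number of equally spaced points."""
    return h / 3.0 * (y[0] + 4.0 * sum(y[1:-1:2]) + 2.0 * sum(y[2:-1:2]) + y[-1])


AMPLITUDE = -100.0  # hypothetical value of dU/dlambda at lambda = 0 (kcal/mol)
DECAY = 0.03        # hypothetical decay length of the steep region


def integrand(lam):
    return AMPLITUDE * math.exp(-lam / DECAY)


exact = AMPLITUDE * DECAY * (1.0 - math.exp(-1.0 / DECAY))

lam11 = [i / 10 for i in range(11)]
lam12 = [0.0, 0.05] + lam11[1:]  # same grid with one extra point in the steep region

err_tr11 = abs(trapezoid(lam11, [integrand(l) for l in lam11]) - exact)
err_tr12 = abs(trapezoid(lam12, [integrand(l) for l in lam12]) - exact)
err_si = abs(simpson_uniform([integrand(l) for l in lam11], 0.1) - exact)

print(f"trapezoid, 11 points: error {err_tr11:.2f} kcal/mol")
print(f"trapezoid, 12 points: error {err_tr12:.2f} kcal/mol")
print(f"Simpson,   11 points: error {err_si:.2f} kcal/mol")
```

For this toy integrand, the single extra λ point reduces the trapezoidal error at least as much as switching to Simpson's rule on the coarser grid. The example does not include statistical noise, which further limits the benefit of higher-order schemes.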

Fig. 3.


Comparison of the BAR and TI-TR results for 12 λ-points. The experimental binding free energies of the guest molecules (x-axis) are plotted versus the computational results (y-axis). The TI-TR results and error bars are shown in red, while each BAR result and its corresponding error bar are marked by a blue X and a dashed line. The line of ideal correspondence between experiments and predictions is shown in green, while the range within 2 kcal/mol from the experimental results is indicated by two orange lines.

It is interesting to observe that BAR and TI-TR produce the same RMSE in two different ways (see Fig. 3). With the notable exception of 2, most TI-TR predictions are consistently off from the experimental results by about 2 to 3 kcal/mol (lying outside the orange lines in Fig. 3). Those deviations can probably be attributed to the problems of numerical quadrature as illustrated in Fig. 2. For BAR, the predictions can be divided into two groups: the outliers 1, 4 and 5, which exhibit a RMSE of 3.7 kcal/mol, and the group of 2, 3, 6 and 7 with a relatively low RMSE of 1.4 kcal/mol. Since there are no consistent chemical patterns that distinguish the two groups, the errors are not likely to arise simply from imperfections of the force field. We assume that the cause of this effect is probably a mixture of fortuitous cancellation of errors and insufficient sampling.

Another aspect of the accuracy of the binding free energy simulations was the selection of protonation states of both the guest molecules and the host. Since the protonation states in our simulations were picked based on similarities to molecules of known pKa values in chemical textbooks, we were interested to see what would have happened if we had chosen the deprotonated state for our simulations. For that purpose, additional free energy calculations were conducted after the deadline for SAMPL3 to determine the change of binding affinity after deprotonation of the guest molecules ($\Delta\Delta\Delta G_{\mathrm{bind}}^{\mathrm{deprot}}$). The results for $\Delta\Delta\Delta G_{\mathrm{bind}}^{\mathrm{deprot}}$ (second column) as well as the corresponding absolute binding affinities (third column) are shown in Table 2. The standard deviations due to error propagation (after the ± sign) and the RMSE of the resulting absolute binding free energies (last row) are also presented.

Table 2.

Effect of employing the deprotonated form of the guest molecules on the binding affinity ($\Delta\Delta\Delta G_{\mathrm{bind}}^{\mathrm{deprot}}$). The second column shows the changes of the binding free energies due to deprotonation (plus the associated standard deviations), while the rightmost column shows the resulting absolute binding free energies of the deprotonated guests and the standard deviations due to error propagation. The corresponding RMSE for the deprotonated guests is presented in the last row. All free energy differences are in kcal/mol.

| Guest | ΔΔΔGbind (deprot) | ΔGbind (deprot) |
|---|---|---|
| 1 | 0.0 ± 3.2 | −9.1 ± 3.2 |
| 2 | 6.5 ± 1.2 | −0.8 ± 1.3 |
| 3 | 3.8 ± 3.3 | −1.5 ± 3.6 |
| 4 | −2.5 ± 2.6 | −11.1 ± 2.7 |
| 5 | 1.6 ± 2.7 | −7.7 ± 2.8 |
| 7 | 0.6 ± 2.7 | −5.8 ± 2.8 |
| RMSE | | 5.2 |

When taken as a whole, the binding affinity results of the deprotonated guest molecules deviate significantly from the experimental results, as indicated by the RMSE of 5.2 kcal/mol. Compared with the results of the protonated forms from Table 1, only guest molecule 5 lies closer to the experimental value of −6.1 kcal/mol than its corresponding protonated form (−7.7 kcal/mol instead of −9.3 kcal/mol). This indicates that the guest molecules are protonated in their bound form. Generally, the standard deviations for $\Delta\Delta\Delta G_{\mathrm{bind}}^{\mathrm{deprot}}$ are very high, ranging between 1.2 and 3.3 kcal/mol. Therefore, most of the results are not statistically significant (i.e., distinguishable from zero). This can be attributed to two effects: a.) insufficient sampling and b.) the detachment of the guest molecule from the host in some simulations. The latter problem could have been avoided by employing restraints to restrict the sampling of the guest to the binding pocket.
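The propagated standard deviations in Table 2 can be approximated with a short Python sketch. The propagation formula is not spelled out in the text; the sketch assumes that the uncertainties of the BAR result (Table 1) and of the deprotonation correction (Table 2) are independent and add in quadrature. Under this assumption, the tabulated values for guests 2, 4, 5 and 7 are reproduced to one decimal place. The names are illustrative.

```python
import math

# Standard deviations in kcal/mol, keyed by guest number.
sd_bar = {2: 0.5, 4: 0.7, 5: 0.8, 7: 0.7}     # BAR column of Table 1
sd_deprot = {2: 1.2, 4: 2.6, 5: 2.7, 7: 2.7}  # second column of Table 2


def combine(sd_a, sd_b):
    """Combine two standard deviations, assuming independent errors (sum in quadrature)."""
    return math.sqrt(sd_a**2 + sd_b**2)


combined = {g: combine(sd_bar[g], sd_deprot[g]) for g in sd_bar}
for guest, sd in combined.items():
    print(f"guest {guest}: propagated standard deviation = {sd:.1f} kcal/mol")
```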

In their experimental paper[17], Ma et al. presented data on the dependence of the binding affinity on the buffer concentration. They hypothesized that the cations in solution bind to the host molecule and thereby reduce the affinity toward the guests. This effect was demonstrated for one guest molecule by changing the sodium phosphate buffer concentration from 24.5 mM to 57.2 mM, which caused the binding affinity to change by about 0.4 kcal/mol. Since in our binding free energy simulations sodium chloride was employed instead of sodium phosphate, we had increased the buffer concentration relative to the experimental conditions in order to obtain about the same ionic strength (i.e. 40–50 mM instead of 20 mM). To check whether our binding affinity results were affected by the buffer concentration employed in our simulations, we conducted additional simulations after the deadline for the SAMPL3 competition. In those simulations, we lowered the buffer concentration to about 20–30 mM by alchemically removing one ion pair from our simulations. The changes of the binding free energy due to this change of buffer concentration ($\Delta\Delta\Delta G_{\mathrm{bind}}^{\Delta c_{\mathrm{buffer}}}$) are shown in Table 3.

Table 3.

Effect of the buffer concentration on the binding affinity ($\Delta\Delta\Delta G_{\mathrm{bind}}^{\Delta c_{\mathrm{buffer}}}$) as a result of alchemically lowering the sodium chloride concentration in our simulations by about 10–20 mM. The resulting absolute binding free energies of the systems with a concentration of 20–30 mM and the standard deviations due to error propagation are shown in the rightmost column. The corresponding RMSE for the systems with lowered buffer concentration is presented in the last row. All free energy differences are in kcal/mol.

| Guest | ΔΔΔGbind (Δc buffer) | ΔGbind (20 mM buffer) |
|---|---|---|
| 1 | −0.7 ± 3.1 | −9.8 ± 3.2 |
| 2 | 0.9 ± 1.7 | −6.4 ± 1.8 |
| 3 | 2.4 ± 2.7 | −2.9 ± 3.0 |
| 4 | 4.7 ± 2.2 | −3.9 ± 2.3 |
| 5 | −0.7 ± 1.6 | −10.0 ± 1.8 |
| 6 | 1.1 ± 3.6 | −7.9 ± 3.7 |
| 7 | 5.3 ± 3.0 | −1.1 ± 3.1 |
| RMSE | | 3.8 |

The $\Delta\Delta\Delta G_{\mathrm{bind}}^{\Delta c_{\mathrm{buffer}}}$ results fluctuate between −0.7 and +5.3 kcal/mol. This finding is surprising, given that the buffer competitively binds to the host molecule, and, therefore, $\Delta\Delta\Delta G_{\mathrm{bind}}^{\Delta c_{\mathrm{buffer}}}$ should be approximately the same for all binding free energy predictions. The high standard deviations of up to 3.6 kcal/mol indicate that the simulations are not converged, i.e. the sampling of the ions was not complete. Thus, our free energy results will strongly depend on the initial positions of the ions. Only the results for 4 and 7 can be considered statistically significant. The absolute binding affinity of 4 lies at −3.9 kcal/mol, which is significantly closer to the experimental value of −4.2 kcal/mol than in the original case. On the other hand, the correspondence of 7 with experimental results is lowered. In total, the RMSE of all binding free energies at the lowered buffer concentration is higher than in our original predictions (3.8 versus 2.6 kcal/mol), demonstrating that the use of the right ionic strength is more important than reproducing the buffer concentration.
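The rightmost column of Table 3 and its RMSE can be recomputed with a short Python sketch. It assumes, as the tabulated numbers suggest, that the buffer corrections were added to the BAR results of Table 1. All values are copied from Tables 1 and 3; the names are illustrative.

```python
import math

# All values in kcal/mol, keyed by guest number.
experiment = {1: -5.8, 2: -7.1, 3: -6.8, 4: -4.2, 5: -6.1, 6: -10.7, 7: -7.9}
dg_bar = {1: -9.1, 2: -7.3, 3: -5.3, 4: -8.6, 5: -9.3, 6: -9.0, 7: -6.4}     # Table 1, BAR
ddd_buffer = {1: -0.7, 2: 0.9, 3: 2.4, 4: 4.7, 5: -0.7, 6: 1.1, 7: 5.3}      # Table 3

# Assumption: corrected value = BAR prediction + buffer correction.
dg_low_buffer = {g: dg_bar[g] + ddd_buffer[g] for g in dg_bar}

rmse_low_buffer = math.sqrt(
    sum((dg_low_buffer[g] - experiment[g]) ** 2 for g in experiment) / len(experiment)
)

for guest, value in dg_low_buffer.items():
    print(f"guest {guest}: {value:.1f} kcal/mol")
print(f"RMSE = {rmse_low_buffer:.1f} kcal/mol")
```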

4 Conclusions

The binding free energies of seven host-guest systems were predicted using relative free energy calculations. For the analysis of the trajectories, three different methods were employed: BAR using 12 λ points, TI with the trapezoidal rule using 12 λ-points and TI with Simpson’s rule using 11 λ-points. While both BAR and TI with 12 λ-points resulted in a RMSE of 2.6 kcal/mol, the corresponding TI result with 11 λ-points yielded a RMSE of 3.2 kcal/mol. We demonstrated that this difference can be traced back to the shape and the uncertainties of the ∂U/∂λ integrand.

Overall, our data show that binding affinities of host-guest systems can be determined with about the same accuracy as solvation free energies of relatively small organic molecules in the previous SAMPL challenges (i.e., with a root mean square error of about 2–3 kcal/mol). This demonstrates that solvation free energies are indeed a valuable benchmark system. Given that up to six groups of unknown protonation state and in very close proximity are involved in the binding process of the host-guest systems employed here, the target of the predictions is very challenging. This is reflected by the large differences between the simulations with protonated and unprotonated guest molecules: simulations of the unprotonated state exhibited a significantly lower correspondence to experimental results (RMSE of 5.2 kcal/mol). However, changing the buffer concentration by 20 mM can also increase the RMSE to 3.8 kcal/mol.

A striking feature of all results is the very high standard deviations, which range between 0.4 and 2.4 kcal/mol (or, relative to the absolute results, between 5 and 120%). About 30% of the RMSE can be explained in terms of those uncertainties. Considering that the simulation lengths were very short (3 ns for each λ point), these standard deviations only signify the uncertainties due to the sampling of a local energy minimum. This means that our free energy results are far from converged and, therefore, should be taken with a grain of salt. Instead of regarding our results as an absolute measure of the quality of current force fields, they should rather be seen as an indicator of where more methodological development in the field of free energy simulations will be required. We believe that there is still significant room for improvement, especially in the area of sampling and the handling of groups with unknown protonation states. We, therefore, look forward to future free energy prediction challenges.

Fig. 2.


Illustration of the difficulties encountered by TI: Plot of ∂U/∂λ as a function of λ for the free energy difference between 26 and 3 in complex with the host. The four different colors indicate the results of four different simulations. Between λ = 0.0 and λ = 0.1, the curves are very steep and the uncertainties of ∂U/∂λ are very high. This causes large errors in the numerical quadrature step of TI.

Acknowledgements

The authors would like to thank A. Damjanović and R. Venable for stimulating discussions on the potential protonation states of the host and guest molecules as well as A. Okur and F. Pickard for helpful comments on the manuscript.

References

  • 1.Bennett CH. Efficient estimation of free energy differences from Monte Carlo data. J. Comp. Phys. 1976;22:245–268. [Google Scholar]
  • 2.Brooks B, Brooks C, III, Mackerell A, Jr, Nilsson L, Petrella R, Roux B, Won Y, Archontis G, Bartels C, Boresch S, Caisch A, Caves L, Cui Q, Dinner A, Feig M, Fischer S, Gao J, Hodoscek M, Im W, Kuczera K, Lazaridis T, Ma J, Ovchinnikov V, Paci E, Pastor R, Post C, Pu J, Schaefer M, Tidor B, Venable R, Woodcock H, Wu X, Yang W, York D, Karplus M. CHARMM: The Biomolecular Simulation Program. J. Comp. Chem. 2009;30(10, Sp. Iss. SI):1545–1614. doi: 10.1002/jcc.21287. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M. CHARMM: A program for macromolecular energy, minimization and dynamics calculations. J. Comput. Chem. 1983;4:187–217. [Google Scholar]
  • 4.Bruckner S, Boresch S. Efficiency of Alchemical Free Energy Simulations. I. A Practical Comparison of the Exponential Formula, Thermodynamic Integration, and Bennett’s Acceptance Ratio Method. J. Comp. Chem. 2011;32(7):1303–1319. doi: 10.1002/jcc.21713. [DOI] [PubMed] [Google Scholar]
  • 5.Bruckner S, Boresch S. Efficiency of alchemical free energy simulations. II. Improvements for thermodynamic integration. J. Comp. Chem. 2011;32(7):1320–1333. doi: 10.1002/jcc.21712. [DOI] [PubMed] [Google Scholar]
  • 6.Essmann U, Perera L, Berkowitz ML, Darden T, Lee H, Pedersen LG. A smooth particle mesh Ewald method. J. Chem. Phys. 1995;103:8577–8593. [Google Scholar]
  • 7.Geballe MT, Skillman AG, Nicholls A, Guthrie JP, Taylor PJ. The SAMPL2 blind prediction challenge: introduction and overview. J. Comput-Aided Mol. Des. 2010;24(4, SI):259–279. doi: 10.1007/s10822-010-9350-8. [DOI] [PubMed] [Google Scholar]
  • 8.Guthrie JP. A Blind Challenge for Computational Solvation Free Energies: Introduction and Overview. J. Phys. Chem. B. 2009;113(14):4501–4507. doi: 10.1021/jp806724u. [DOI] [PubMed] [Google Scholar]
  • 9.Hoover WG. Canonical dynamics: Equlibrium phase-space distributions. Phys. Rev. A. 1985;31:1695–1697. doi: 10.1103/physreva.31.1695. [DOI] [PubMed] [Google Scholar]
  • 10.Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983;79:926. [Google Scholar]
  • 11.Kästner J, Senn H, Thiel S, Otte N, Thiel W. QM/MM free-energy perturbation compared to thermodynamic integration and umbrella sampling: Application to an enzymatic reaction. J. Chem. Theory Comput. 2006;2(2):452–461. doi: 10.1021/ct050252w. [DOI] [PubMed] [Google Scholar]
  • 12.Kirkwood JG. Statistical mechanics of fluid mixtures. J. Chem. Phys. 1935;3:300–313. [Google Scholar]
  • 13.Klimovich PV, Mobley DL. Predicting hydration free energies using all-atom molecular dynamics simulations and multiple starting conformations. J. Comput.-Aided Mol. Des. 2010;24(4, Sp. Iss. SI):307–316. doi: 10.1007/s10822-010-9343-7. [DOI] [PubMed] [Google Scholar]
  • 14.König G, Boresch S. Hydration Free Energies of Amino Acids: Why Side Chain Analog Data Are Not Enough. J. Phys. Chem. B. 2009;113(26):8967–8974. doi: 10.1021/jp902638y. [DOI] [PubMed] [Google Scholar]
  • 15.Lamoureux G, Roux B. Modeling induced polarization with classical Drude oscillators: Theory and molecular dynamics simulation algorithm. J. Chem. Phys. 2003;119(6):3025–3039. [Google Scholar]
  • 16.Lee MS, Feig M, Salsbury FR, Brooks CL., III New analytic approximation to the standard molecular volume definition and its application to generalized Born calculations. J. Comput. Chem. 2003;24:1348–1356. doi: 10.1002/jcc.10272. [DOI] [PubMed] [Google Scholar]
  • 17.Ma D, Zavalij PY, Isaacs L. Acyclic Cucurbit[n]uril Congeners Are High Affinity Hosts. J. Org. Chem. 2010;75(14):4786–4795. doi: 10.1021/jo100760g. [DOI] [PubMed] [Google Scholar]
  • 18.Merz KM. Limits of free energy computation for protein-ligand interactions. J. Chem. Theory Comput. 2010;6:1018–1027. doi: 10.1021/ct100102q. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Mobley DL, Bayly CI, Cooper MD, Dill KA. Predictions of Hydration Free Energies from All-Atom Molecular Dynamics Simulations. J. Phys. Chem. B. 2009;113(14):4533–4537. doi: 10.1021/jp806838b. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Mobley DL, Bayly CI, Cooper MD, Shirts MR, Dill KA. Small molecule hydration free energies in explicit solvent: An extensive test of fixed-charge atomistic simulations. J. Chem. Theory Comp. 2009;5(2):350–358. doi: 10.1021/ct800409d. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Mobley DL, Graves AP, Chodera JD, McReynolds AC, Shoichet BK, Dill KA. Predicting absolute ligand binding free energies to a simple model site. J. Mol. Biol. 2007;371(4):1118–1134. doi: 10.1016/j.jmb.2007.06.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Moghaddam S, Yang C, Rekharsky M, Ko YH, Kim K, Inoue Y, Gilson MK. New Ultrahigh Affinity Host-Guest Complexes of Cucurbit[7]uril with Bicyclo[2.2.2]octane and Adamantane Guests: Thermodynamic Analysis and Evaluation of M2 Affinity Calculations. J. Am. Chem. Soc. 2011;133(10):3570–3581. doi: 10.1021/ja109904u. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Neria E, Fischer S, Karplus M. Simulation of activation free energies in molecular systems. J. Chem. Phys. 1996;105:1902. [Google Scholar]
  • 24.Nicholls A, Mobley DL, Guthrie JP, Chodera JD, Bayly CI, Cooper MD, Pande VS. Predicting small-molecule solvation free energies: An informal blind test for computational chemistry. J. Med. Chem. 2008;51:769–779. doi: 10.1021/jm070549+. [DOI] [PubMed] [Google Scholar]
  • 25.Oostenbrink C, van Gunsteren W. Free energies of ligand binding for structurally diverse compounds. Proc. Natl. Acad. Sci. U.S.A. 2005;102(19):6750–6754. doi: 10.1073/pnas.0407404102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Seeliger D, de Groot B. Protein Thermostability Calculations Using Alchemical Free Energy Simulations. Biophys J. 2010;98(10):2309–2316. doi: 10.1016/j.bpj.2010.01.051. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Shirts MR, Bair E, Hooker G, Pande VS. Equilibrium free energies from nonequilibrium measurements using maximum-likelihood methods. Phys. Rev. Lett. 2003;91:140601. doi: 10.1103/PhysRevLett.91.140601. [DOI] [PubMed] [Google Scholar]
  • 28.Van Gunsteren WF, Berendsen HJC. Algorithms for macromolecular dynamics and constraint dynamics. Mol. Phys. 1977;34:1311–1327. [Google Scholar]
  • 29.Vanommeslaeghe K, Hatcher E, Acharya C, Kundu S, Zhong S, Shim J, Darian E, Guvench O, Lopes P, Vorobyov I, MacKerell AD., Jr CHARMM General Force Field: A Force Field for Drug-Like Molecules Compatible with the CHARMM All-Atom Additive Biological Force Fields. J. Comp. Chem. 2010;31(4):671–690. doi: 10.1002/jcc.21367. [DOI] [PMC free article] [PubMed] [Google Scholar]
