Abstract
The Binding Energy Distribution Analysis Method (BEDAM) protocol has been employed as part of the SAMPL4 blind challenge to predict the binding free energies of a set of octa-acid host-guest complexes. The resulting predictions were consistently judged as some of the most accurate predictions in this category of the SAMPL4 challenge in terms of quantitative accuracy and statistical correlation relative to the experimental values, which were not known at the time the predictions were made. The work has been conducted as part of a hands-on graduate class laboratory session. Collectively the students, aided by automated setup and analysis tools, performed the bulk of the calculations and the numerical and structural analysis. The success of the experiment confirms the reliability of the BEDAM methodology and it shows that physics-based atomistic binding free energy estimation models, when properly streamlined and automated, can be successfully employed by non-specialists.
1 Introduction
The accurate prediction of the strength of molecular association is an important and largely unsolved problem from both chemical[1] and medicinal[2] perspectives. Conventional approaches, such as docking, have reached a high level of maturity as high-throughput virtual screening[3, 4, 5, 6] and structure prediction tools.[7, 8] However methods based on interaction-energy scoring alone[9, 10] are often not optimally suited to pick out trends at the level of resolution necessary to address finer aspects of drug development such as lead optimization, specificity, toxicity, and resistance. Atomistic physics-based free energy models, which take into account dynamical aspects of molecular recognition,[11, 12, 2, 13, 14, 15, 16, 17, 18] have the potential to bridge this gap. However the reliability and general applicability of free energy models of binding remain to be fully established.[19, 20, 21, 22]
Most of the work with physics-based free energy models reported in the literature has focused on small retrospective datasets, which do not give an accurate and unbiased picture of the state of the field. The SAMPL series of community blind challenges[23, 24, 25] and related efforts[26] have played a key role in giving an unbiased view of the advantages as well as the challenges related to the application of free energy models of binding. In the recent SAMPL4 experiment for example, our group has employed our free energy methodology to screen a large set of HIV integrase inhibitor candidates[27, 28] where full treatment of conformational dynamics and entropic effects was found to be key to reach the observed level of prediction accuracy.
While, with the help of experiments such as SAMPL, theories, models and practices continue to improve, one key obstacle towards wider adoption of free energy models is the scarcity of automated and easy to use software tools. For example, although automated tools are beginning to appear,[29] it is notoriously laborious to plan free energy transformations to compute the relative binding free energies of a set of compounds. In many circumstances, such as in virtual screening, differences in ligand scaffolds are too great to accommodate conventional free energy transformations. In this respect absolute rather than relative binding free energy methods offer some advantages. Additional obstacles towards adoption are due to learning barriers posed by molecular dynamics engines, each with its own set of parameters and settings (topology construction, force field parameter assignment, soft-core potentials, restraints, long-range electrostatic treatments, etc.)[20] often incompatible with other molecular dynamics engines. Addressing some of these usability issues and making binding free energy tools more user friendly would enable a wide community of non-specialists to access binding free energy tools and to apply them in a variety of contexts, ultimately leading to new insights and discoveries.
As part of the octa-acid SAMPL4 affinity challenge, in this work we apply the Binding Energy Distribution Analysis Method (BEDAM), an absolute binding free energy protocol,[30, 31] to the blind prediction of the binding free energies of a set of host-guest systems.[32, 33] The bulk of the computational work reported here has been conducted by the students of the Statistical Thermodynamics class at the department of Chemistry at Rutgers University. The BEDAM method has been successfully applied to a variety of systems including protein-ligand binding complexes[30, 21, 34, 28] and host-guest complexes,[35] including the challenging ones presented as part of the previous SAMPL3 edition.[36]
In addition to a further opportunity for an unbiased validation of the methodology, the primary aim of the work has been to involve a group of students from various disciplines in a classroom project reflective of applied collaborative research. The BEDAM/SAMPL4 host-guest exercise was particularly suited for this. It allowed a direct application to molecular systems of the statistical thermodynamics concepts covered in the course. As in actual research, outcomes were not known or guaranteed. In addition, given the relatively small size of the host-guest systems, the computational load was expected to be compatible with the time and computational resources available to the class. The work also involved studying literature material about the available laboratory measurements[37] in order to prepare the molecular systems appropriately and validate the computational protocol before applying it to obtain predictions.
One of the challenges with the introduction of advanced computational modeling tools in the classroom is that a significant amount of time is required to familiarize the students with the usage of the modeling software, the format for inputs and outputs, algorithmic details, etc. Besides consuming valuable class time, this process is often of limited utility to the majority of students who either are not directly engaged in computational research or whose home laboratories utilize a different suite of modeling software. This complication was largely bypassed here by using an easy-to-use graphical front-end (Maestro, by Schrödinger, Inc.) combined with the BEDAM automatic workflow tool developed in our laboratory.[38] This was essentially the same protocol we used to automate the free energy calculations for the SAMPL4 HIV integrase screening challenge.[28] The project was set up in such a way that students prepared the molecular systems using the graphical front-end and provided these to the BEDAM workflow, which in turn produced, without further intervention, all of the inputs required by the molecular modeling package. The same workflow was used to process the simulation data to provide binding free energy estimates and to streamline structural and other thermodynamic analyses.
This study confirms that it is valuable from multiple perspectives to package complex free energy simulation protocols into a form that allows the automated processing of large datasets and at the same time is accessible to non-specialists. The features of the BEDAM methodology, which does not require explicit solvation, multiple complex free energy transformations and elaborate conformational restraining steps, are conducive to a high degree of automation.
2 Methods
2.1 Overall Organization of the Project
Our group focused primarily on the SAMPL4 HIV Integrase screening challenge.[25, 28] Participation in the octa-acid host-guest challenge was organized as a classroom experiment as part of the Statistical Thermodynamics graduate class that the senior authors (E.G. and R.M.L.) were teaching at the time. The aim of the experiment was both to recruit the help of students and to expose them to a realistic applied research study. Contrary to most classroom experiences, but not unlike actual research scenarios, neither the students nor their instructors had knowledge of the “right” answers. However, also similar to most research scenarios, literature data were available to validate the model and gain confidence in the predictions.
Each student was assigned a small set of host-guest complexes to investigate. The molecular simulation software and related scripts and force field data were provided by the instructors. Students were responsible for building the molecular structures of the guests (either from scratch and/or starting from PubChem sources or using files provided by the SAMPL4 organizers) using the Maestro program ensuring correct protonation, Lewis structure and initial conformation. The students were also responsible for building the initial conformation of the complex by placing the ligand in a reasonable binding mode within the cavity of the host. Students submitted the prepared files for the host and the guests to the automated BEDAM workflow[38] to generate input files for the parallel calculation with the IMPACT program.[39] Students were also responsible for submitting the corresponding parallel jobs to a computing cluster and for retrieving and analyzing the resulting outputs.
Student reports on the host-guest experiment counted towards their final class grade. Students were asked to describe not only their calculations but also to observe overall binding affinity trends by retrieving and discussing the results obtained by other students. Conversely students were asked to complete their calculations and analysis within assigned deadlines so as to be able to promptly address requests from others. Again, this organization reflects actual collaborative research scenarios. At completion of the class the instructor collected the student predictions and submitted them to SAMPL.
2.2 The Binding Energy Distribution Analysis Method
The Binding Energy Distribution Analysis Method (BEDAM)[30] computes the absolute binding free energy between a receptor A and a ligand B employing a λ-dependent effective potential energy function with implicit solvation[40] (see below) of the form
$$U_\lambda(\mathbf{r}) = U_0(\mathbf{r}) + \lambda\, u(\mathbf{r}) \tag{1}$$
where r = (rA, rB) denotes the atomic coordinates of the complex, with rA and rB denoting those of the receptor and ligand, respectively,
$$U_0(\mathbf{r}) = U(\mathbf{r}_A) + U(\mathbf{r}_B) \tag{2}$$
is the effective potential energy of the complex when receptor and ligand are dissociated, and
$$u(\mathbf{r}) = U(\mathbf{r}_A, \mathbf{r}_B) - U(\mathbf{r}_A) - U(\mathbf{r}_B) \tag{3}$$
is the binding energy function defined for each conformation r = (rA, rB) of the complex as the difference between the effective potential energies U(r) of the bound and dissociated conformations of the complex without internal conformational rearrangements. To improve convergence of the free energy near λ = 0, a modified binding energy function is employed of the form
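$$u'(\mathbf{r}) = \begin{cases} u_{\max} \tanh\!\left( u(\mathbf{r}) / u_{\max} \right) & u(\mathbf{r}) > 0 \\ u(\mathbf{r}) & u(\mathbf{r}) \le 0 \end{cases} \tag{4}$$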
where umax is some large positive value (set in this work as 1000 kcal/mol). This modified binding energy function, which is used in place of the actual binding energy function [Eq. (3)] wherever it appears, caps the maximum unfavorable value of the binding energy while leaving unchanged the value of favorable binding energies.[31]
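As a rough illustration of Eqs. (1) and (4), the following Python sketch evaluates a capped binding energy and the λ-dependent effective potential. The function names and the tanh form of the cap are assumptions made for this example; the only properties taken from the text are that favorable binding energies are left unchanged and that unfavorable ones cannot exceed umax = 1000 kcal/mol.

```python
import numpy as np


def capped_binding_energy(u, u_max=1000.0):
    """Return a soft-capped binding energy u' in kcal/mol.

    ASSUMPTION: a tanh cap is used for u > 0. Favorable values (u <= 0)
    pass through unchanged; unfavorable values saturate at u_max.
    """
    u = np.asarray(u, dtype=float)
    capped = u_max * np.tanh(u / u_max)
    return np.where(u > 0.0, capped, u)


def effective_potential(u0, u, lam, u_max=1000.0):
    """Hybrid potential U_lambda = U_0 + lambda * u' (illustrative helper).

    u0  : energy of the dissociated state, U(r_A) + U(r_B)
    u   : binding energy of the same conformation
    lam : alchemical coupling parameter, 0 (uncoupled) to 1 (coupled)
    """
    return u0 + lam * capped_binding_energy(u, u_max)


# Hypothetical binding energies: favorable, mildly clashing, severe clash.
example = capped_binding_energy([-12.0, 5.0, 1.0e6])
```

With these hypothetical inputs the favorable value is returned as is, the mildly unfavorable one is nearly unchanged, and the severe clash is limited to umax, which is the behavior that keeps sampling near λ = 0 numerically manageable.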
The binding free energy ΔGb is by definition the difference in free energy between the states at λ = 1 and λ = 0. The standard free energy of binding is related to this by the relation[11]
$$\Delta G_b^\circ = -kT \ln C^\circ V_{\rm site} + \Delta G_b \tag{5}$$
where C° is the standard concentration of ligand molecules (C° = 1 M, or equivalently 1/1,668 Å−3) and Vsite is the volume of the binding site (see below). The multistate Bennett acceptance ratio estimator (MBAR)[41, 42] is used here to compute the binding free energy ΔGb from a set of binding energies, u, sampled from molecular dynamics simulations at a series of λ values. For later use we introduce here the reorganization free energy for binding defined by the expression[14]
$$\Delta G_{\rm reorg}^\circ = \Delta G_b^\circ - \Delta E_b \tag{6}$$
where ΔEb = 〈u〉1 is the average binding energy of the complex and ΔG°b is the standard binding free energy. The former is computed from the ensemble of conformations of the complex collected at λ = 1 and ΔG°reorg is computed by difference using Eq. (6).
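To make the MBAR step more concrete, here is a minimal NumPy sketch of the self-consistent MBAR equations specialized to a potential that is linear in λ, so that the reduced energy of sample n in state k is β λk un. This is not the pymbar or IMPACT implementation; the function names, the plain fixed-point iteration, and the synthetic Gaussian test data are assumptions chosen for illustration.

```python
import numpy as np


def mbar_binding_free_energy(u_pooled, lambdas, n_per_state, beta,
                             max_iter=2000, tol=1e-10):
    """Estimate G(lambda_last) - G(lambda_first) from pooled binding energies.

    u_pooled    : binding energies from all replicas, concatenated
    lambdas     : lambda values of the states (first = 0, last = 1)
    n_per_state : number of samples contributed by each state
    beta        : 1/kT in units consistent with u_pooled
    """
    lambdas = np.asarray(lambdas, dtype=float)
    u = np.asarray(u_pooled, dtype=float)
    log_n = np.log(np.asarray(n_per_state, dtype=float))
    reduced = beta * lambdas[:, None] * u[None, :]      # shape (K, N)
    f = np.zeros(lambdas.size)                          # reduced free energies
    for _ in range(max_iter):
        # log of sum_k N_k exp(f_k - beta*lambda_k*u_n), one value per sample
        log_denom = np.logaddexp.reduce(
            log_n[:, None] + f[:, None] - reduced, axis=0)
        f_new = -np.logaddexp.reduce(-reduced - log_denom[None, :], axis=1)
        f_new -= f_new[0]                               # fix the gauge
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return (f[-1] - f[0]) / beta


def gaussian_demo(seed=0, n=2000):
    """Synthetic check: Gaussian binding energies have a closed-form answer."""
    rng = np.random.default_rng(seed)
    beta, mu, sigma = 1.0, 2.0, 1.0                     # hypothetical values
    lambdas = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
    # At coupling lambda the Gaussian mean shifts to mu - beta*lambda*sigma^2.
    samples = [rng.normal(mu - beta * lam * sigma**2, sigma, size=n)
               for lam in lambdas]
    estimate = mbar_binding_free_energy(
        np.concatenate(samples), lambdas, [n] * lambdas.size, beta)
    exact = mu - 0.5 * beta * sigma**2
    return estimate, exact
```

Calling `gaussian_demo()` returns the MBAR estimate next to the analytical value for the synthetic model, which is a convenient sanity check before applying the same function to binding energies collected from real replicas.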
2.3 The AGBNP2 Solvation Model
The potential energy of the system is described by the OPLS-AA/AGBNP2 effective potential in which the OPLS-AA[43, 44, 39] force field accounts for covalent and non-bonded interatomic interactions and the effect of the solvent is represented implicitly by means of the Analytic Generalized Born plus Non-Polar (AGBNP2) implicit solvent model.[40] A full description of the AGBNP2 model is available elsewhere.[40] Here we give a brief summary of the elements that have been tuned for the present application (see below).
The AGBNP2 model computes the solvation free energy of the solute, ΔGsolv, as the sum of electrostatic, ΔGelec, non-polar, ΔGnp, and short-range solute-water hydrogen bonding, ΔGhb, contributions:
$$\Delta G_{\rm solv} = \Delta G_{\rm elec} + \Delta G_{\rm np} + \Delta G_{\rm hb} \tag{7}$$
The electrostatic term is described by means of a variation of the continuum dielectric Generalized Born model.[45, 46] The non-polar term is further decomposed into a cavity hydration free energy ΔGcav, expressed in terms of solute surface areas, and a solute-solvent average dispersion interaction energy ΔGvdW given by the expression
$$\Delta G_{\rm vdW} = \sum_i \alpha_i \frac{a_i}{(B_i + R_w)^3} \tag{8}$$
where Bi is the Born radius of atom i, Rw = 1.4 Å represents the radius of a water molecule, ai is a van der Waals energy integration factor solely dependent on the Lennard-Jones parameters of the solute atom and the water model,[47, 48] and αi ≃ 1 is an atom type-dependent dimensionless adjustable parameter.[46]
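The role of the α parameters can be illustrated with a short Python sketch. It assumes that Eq. (8) has the form of a sum over atoms of αi ai / (Bi + Rw)^3; the ai values and Born radii below are hypothetical placeholders, not AGBNP2 parameters, and ai is taken as negative (attractive dispersion) as an assumption.

```python
import numpy as np

R_W = 1.4  # radius of a water molecule in Angstrom, as quoted in the text


def vdw_solvation(alpha, a, born_radii, r_w=R_W):
    """Solute-solvent dispersion energy, sum_i alpha_i * a_i / (B_i + R_w)^3."""
    alpha = np.asarray(alpha, dtype=float)
    a = np.asarray(a, dtype=float)
    born = np.asarray(born_radii, dtype=float)
    return float(np.sum(alpha * a / (born + r_w) ** 3))


# Hypothetical atoms lining a cavity (made-up a_i and Born radii).
a_i = [-120.0, -120.0, -120.0]      # kcal/mol * Angstrom^3, illustrative only
born = [2.0, 2.5, 3.0]              # Angstrom, illustrative only

dg_default = vdw_solvation([0.7, 0.7, 0.7], a_i, born)
dg_reduced = vdw_solvation([0.5, 0.5, 0.5], a_i, born)
```

Because the term is linear in αi, lowering α uniformly from 0.7 to 0.5 scales the contribution of those atoms by 5/7, making their hydration less favorable without changing the Born radii.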
The hydrogen bonding term,
$$\Delta G_{\rm hb} = \sum_w h_w\, p_w \tag{9}$$
is computed in terms of spherical hydration volumes w, typically located around hydrogen bonding donor and acceptor sites.[40] The geometrical parameter pw, expressed as the fraction of the hydration site not occupied by solute atoms, measures the effective water occupancy of the site and the adjustable parameter hw, which depends on the type of hydrogen bonding site, controls the strength of the solute-solvent interaction (or more precisely the portion of it not captured by the continuum model).[40] While normally used for hydrogen bonding sites contributing favorably to the solute hydration free energy, here and elsewhere[36, 35] we have also employed this same functional form to describe hydration sites contributing unfavorably to the hydration free energy (see below); the distinction being the sign of the hw parameter, negative for hydrogen bonding sites and positive for the unfavorable solvation free energy sites.
2.4 System preparation and tuning
The octa-acid host was prepared from the structure file provided by the SAMPL organizers using the Maestro program (Schrödinger, Inc.) with standard OPLS2005 parameters. The guests were prepared similarly. All carboxylates of the host and the guests were modeled as unprotonated, giving the host an overall net charge of −8. Both axial and equatorial conformations of cyclic alkyl rings of the guests were investigated separately. The axial conformations led to significantly less favorable binding free energies and were not considered in the analysis.
A preliminary binding free energy calculation for guest 1 with default AGBNP2 parameters resulted in an unstable complex, which was regarded as unreasonable. Accordingly, steps were taken to correct this defect. Given the hydrophobicity and depth of the binding cavity of the octa-acid host, it was reasoned that the discrepancy was due to water enclosure effects[49, 50] not well represented by our continuum solvent model. Two scenarios are possible. In the first, the cavity is hydrated by restricted low entropy and/or high energy water molecules which, when released into the bulk upon guest binding, contribute favorably to binding. In the second, the cavity is partially dewetted, resulting in weak interactions between the solvent and the host atoms in the interior of the cavity. In the complex these are replaced by interactions between the host and the guest, again contributing favorably to binding. As indicated by explicit solvent simulations in which the binding cavity of the octa-acid host was observed to fluctuate from empty to completely filled with water,[51, 52] the two effects (low water entropy and low water occupancy in the cavity) may, in fact, occur concomitantly. In either case, both effects contribute favorably to host-guest binding and, as described below, can be modeled similarly in the context of the implicit solvation model we have employed.
As illustrated in Fig. 1 the interior of the host is composed of an outer larger cavity and an inner smaller cavity. Four alkyl hydrogen atoms of the host point towards the smaller cavity.[37] Similar to a previous approach for a β-cyclodextrin host,[35] we employed these as attachment points for custom AGBNP2 hydration sites with unfavorable hydration strength parameters [hw in Eq. (9)]. The results submitted to SAMPL4 were obtained with hw = 2 kcal/mol, although, given that these sites are significantly occluded even in the absence of a bound guest, their individual contribution to the binding free energy is only a fraction of this value. We used a different strategy to model water enclosure effects in the larger cavity of the host. This cavity is lined with aromatic rings lacking hydrogen atoms suitable to serve as attachment points for hydration sites. Instead, we opted to reduce the van der Waals α parameters [see Eq. (8)] for the aromatic carbon atoms lining this cavity from 0.7 to 0.5. Both modifications work towards making the hydration free energy of the host less favorable relative to the complex thereby decreasing the desolvation penalty for binding. Given the limited scope of the classroom experiment, a full parameter optimization campaign was not carried out. The same modified parameters above were applied to both sets of complexes, those with known binding affinities and those with unknown affinities as part of the SAMPL4 challenge.
Figure 1.
Surface representation of the octa-acid host (with guest 7 bound). The guest (green carbon atoms) occupies the central cavity which is composed of an outer large cavity and a deeper smaller cavity occupied in this case by the methyl group of the guest.
2.5 Computational details
Force field parameters were assigned using Schrödinger’s automatic atomtyper.[39] Parallel alchemical Hamiltonian Replica Exchange molecular dynamics simulations were conducted with the IMPACT program.[39] The simulation temperature was set to 300 K. We employed 16 λ states at λ = 0, 0.001, 0.002, 0.004, 0.005, 0.006, 0.008, 0.01, 0.02, 0.04, 0.07, 0.1, 0.25, 0.5, 0.75, and 1. The binding site was defined as the set of conformations in which the center of mass of the ligand was within 8 Å of the center of mass of the host. The ligand was sequestered within this binding site volume by means of a flat-bottom harmonic potential. Based on this definition the value of the term −kT ln C°Vsite in Eq. (5) is −0.15 kcal/mol. No other restraints were applied.
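The quoted standard-state term can be checked with a few lines of Python. The sketch below assumes a spherical binding site of radius 8 Å, a temperature of 300 K, and the standard volume of 1,668 Å3 per molecule used in the text; the flat-bottom restraint is included only to show its shape, and its force constant is a hypothetical value, since the one used in the simulations is not stated.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def standard_state_correction(radius, temperature=300.0, v0=1668.0):
    """Return -kT ln(C0 * Vsite) in kcal/mol for a spherical site.

    radius : binding-site radius in Angstrom
    v0     : volume per molecule at 1 M in Angstrom^3 (value from the text)
    """
    v_site = 4.0 / 3.0 * math.pi * radius ** 3
    return -K_B * temperature * math.log(v_site / v0)


def flat_bottom_restraint(distance, radius=8.0, k=3.0):
    """Flat-bottom harmonic restraint energy on the center-of-mass distance.

    Zero inside the site, harmonic outside. The force constant k
    (kcal/mol/Angstrom^2) is a HYPOTHETICAL placeholder.
    """
    excess = max(0.0, distance - radius)
    return 0.5 * k * excess ** 2


correction = standard_state_correction(8.0)
```

Rounded to two decimals, `correction` reproduces the −0.15 kcal/mol quoted above, so for this site definition the standard-state term is a small shift compared with the spread of the binding free energies.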
BEDAM calculations were performed for 1.4 ns of molecular dynamics per replica (22.4 ns total for each complex). Data from the last nanosecond of each replica trajectory was used for free energy analysis. Binding free energy estimates converged quickly; differences between estimates obtained using the first third and the full data set were all smaller than 1 kcal/mol. Binding energies were sampled with a frequency of 1 ps for a total of 16,000 binding energy samples per complex. Uncertainties in the binding free energies were estimated from MBAR[41] and scaled by a factor of 10 to reflect the correlation length of approximately 50 ps estimated from binding energy trajectories of guest 1. The binding free energy predictions were submitted to the SAMPL4 octa-acid challenge on July 20 2013 and assigned prediction ID #140.
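The factor of 10 applied to the MBAR uncertainties is consistent with a standard statistical-inefficiency argument. As a worked estimate, assuming that the quoted correlation length is the integrated autocorrelation time τ of the binding energy and that samples are spaced by Δt = 1 ps:

```latex
g = 1 + \frac{2\tau}{\Delta t}
  \approx 1 + \frac{2 \times 50\ \mathrm{ps}}{1\ \mathrm{ps}} = 101,
\qquad
\sigma_{\rm corrected} \approx \sqrt{g}\;\sigma_{\rm MBAR}
  \approx 10\,\sigma_{\rm MBAR}
```

Under this reading, the 16,000 correlated samples per complex carry roughly the information of 16,000/g ≈ 160 independent samples.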
3 Results and Performance
3.1 Binding Free Energy Validation
Table 1 reports the computed binding free energies for the octa-acid complexes for which experimental binding free energies were available at the time of the SAMPL4 challenge.[37] With the exception of the complexes with the two longest linear alkyl carboxylates (decanoate and octanoate) whose affinity is overestimated, there is good agreement between calculated and experimental binding free energies. The cause of the discrepancy for long chain carboxylates is not clear. The complex with the shorter hexanoate guest is predicted correctly and so are the complexes with the more compact adamantane and cyclohexane derivatives. As the SAMPL4 set did not contain long chain carboxylates, which appear problematic with the current model, we did not explore this issue further.
Table 1.
Calculated binding free energies for complexes of the octa-acid host with a set of guests with published experimental affinities.[37]
Guest | ΔG°b(expt)a,b | ΔG°b(calc)a
---|---|---
Decanoate | −6.87 | −10.6±0.4
Octanoate | −6.02 | −7.9±0.4
Hexanoate | −4.85 | −4.4±0.2
1-Adamantane carboxylate | −8.25 | −7.0±0.5
3-Noradamantane carboxylate | −7.42 | −8.1±0.5
Cyclohexane carboxylate | −5.04 | −5.3±0.2

a In kcal/mol.
b From reference [37].
3.2 Blind Predictions
The blind binding free energy predictions submitted to SAMPL4 are listed in Table 2 and shown in Fig. 2 compared to the experimental measurements, which were not known to us prior to the submission of the predictions.[32] Trans-4-methyl-cyclohexane carboxylate (guest 7) and 4-chlorobenzoate (guest 4) are correctly predicted as the strongest and next to strongest binders in this set. The calculated binding free energies for these guests are in quantitative agreement with the experiments (for example for guest 7, −7.2 kcal/mol predicted vs. −7.6 kcal/mol experimentally). At the other end of the spectrum, benzoate (guest 1) and cyclopentane carboxylate (guest 8) are correctly predicted as the weakest binders, although for these two guests the agreement is not as quantitative (for benzoate the binding affinity is underestimated by 2.7 kcal/mol). In general, the computational model predicts larger variations in binding free energies than observed, as confirmed by the greater-than-one slope of the correlation line of the calculated binding free energies relative to experiment (Fig. 2). For example, methylation at the 4-position of guest 1 (yielding guest 2) is predicted to favor binding by approximately 4 kcal/mol whereas measurements show a variation of approximately half this value.
Table 2.
Experimental binding free energies and calculated binding free energies, binding energies and reorganization free energies for the complexes of the octa-acid host with the SAMPL4 set of guests.
Guest no. | Structure | ΔG°b(expt)a,b | ΔG°b(calc)a | ΔEb(calc)a | ΔG°reorg(calc)a
---|---|---|---|---|---
1 | (image) | −3.73±0.04 | −1.0±0.2 | −9.3±0.6 | 8.3±0.6
2 | (image) | −5.9±0.1 | −5.1±0.2 | −16.4±0.5 | 11.3±0.5
3 | (image) | −6.28±0.02 | −4.0±0.2 | −15.5±1.1 | 11.5±1.1
4 | (image) | −6.72±0.03 | −6.0±0.2 | −15.1±0.5 | 9.1±0.5
5 | (image) | −5.3±0.1 | −4.1±0.2 | −13.9±0.6 | 9.8±0.6
6 | (image) | −5.6±0.1 | −5.3±0.2 | −12.0±0.5 | 6.7±0.5
7 | (image) | −7.6±0.1 | −7.2±0.1 | −19.6±0.5 | 12.4±0.5
8 | (image) | −3.73±0.04 | −1.2±0.2 | −10.2±0.6 | 9.0±0.6
9 | (image) | −6.61±0.02 | −4.8±0.2 | −14.5±0.6 | 9.7±0.6

a In kcal/mol.
b From reference [32].
Figure 2.
Calculated standard binding free energies of the SAMPL4 octa-acid complexes plotted against the corresponding experimental measurements. The continuous line is the 1:1 line and the dashed line is the least-squares line (slope = 1.5).
As the thermodynamic decomposition data in Table 2 show, trends in binding affinity are generally determined by host-guest interaction energies measured by the binding energies ΔEb. The strongest binder (guest 7) is also the one with the most negative binding energy (−19.6 kcal/mol) whereas the weakest binders (guests 1 and 8) are the ones with the least negative binding energies (−9.3 and −10.2 kcal/mol, respectively). As is often the case, however, the range of variation of the binding energy (10.3 kcal/mol) is significantly larger than the range of binding free energies (6.2 kcal/mol) due to the compensating effect of reorganization (ΔG°reorg in Table 2). The reorganization free energy measures entropic losses and intramolecular strain of the host and the guest upon binding,[14] which generally become increasingly unfavorable with increasing strength of host-guest interactions. The strongest binder (guest 7) is also the one which incurs the highest reorganization penalty while the weakest binders incur some of the lowest. In the middle of the pack, however, the balance between favorable host-guest interactions and unfavorable reorganization losses is more complex. For example, guest 2 would be predicted as the second strongest binder based on interaction energies alone, surpassing guest 4 by more than 1 kcal/mol. The binding free energy scores, however, correctly predict the opposite due to a 2 kcal/mol advantage of guest 4 in terms of reorganization penalty.
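The decomposition discussed above can be retraced directly from the calculated columns of Table 2. The short Python sketch below applies the difference relation of Eq. (6) and compares the ranking by binding energy with the ranking by binding free energy; the dictionary layout and helper names are choices made for this example only.

```python
# Calculated values from Table 2, in kcal/mol: guest -> (dG_b, dE_b)
TABLE2_CALC = {
    1: (-1.0, -9.3), 2: (-5.1, -16.4), 3: (-4.0, -15.5),
    4: (-6.0, -15.1), 5: (-4.1, -13.9), 6: (-5.3, -12.0),
    7: (-7.2, -19.6), 8: (-1.2, -10.2), 9: (-4.8, -14.5),
}


def reorganization(dg_b, de_b):
    """Reorganization free energy as the difference dG_b - dE_b."""
    return dg_b - de_b


def ranking(column):
    """Guests ordered from most to least favorable for column 0 (dG) or 1 (dE)."""
    return sorted(TABLE2_CALC, key=lambda guest: TABLE2_CALC[guest][column])


reorg = {g: round(reorganization(*v), 1) for g, v in TABLE2_CALC.items()}
by_free_energy = ranking(0)
by_binding_energy = ranking(1)

# Spread of each quantity across the nine guests
dg_values = [v[0] for v in TABLE2_CALC.values()]
de_values = [v[1] for v in TABLE2_CALC.values()]
dg_range = max(dg_values) - min(dg_values)
de_range = max(de_values) - min(de_values)
```

Comparing `by_binding_energy` with `by_free_energy` shows guests 2 and 4 exchanging places in second position, which is the reorganization effect described in the paragraph above.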
As summarized in the overview paper,[33] ours were judged as some of the most accurate predictions of the SAMPL4 challenge. Our submission ranked best (among the 13 octa-acid entries made public) in terms of root mean square error with respect to both absolute and relative binding free energy measures. For the latter, using the notation in reference [33], the root mean square error after subtracting the average signed error was RMSE_o = 0.9 kcal/mol and the root mean square error of all pairs of relative binding free energies was RMSE_r = 1.3 kcal/mol. Our predictions also performed best in terms of correlation slope (slope = 1.5, but interestingly behind a null model based on guest size), and second best in terms of correlation coefficient (R2 = 0.9). These quality metrics were statistically equivalent to those of absolute binding free energy predictions obtained by Ryde and coworkers with an explicit solvation model.[52]
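Error metrics of the kind quoted above can be recomputed from the experimental and calculated columns of Table 2. The NumPy sketch below follows the verbal definitions given in the text (offset-corrected RMSE, RMSE over all pairs of relative binding free energies, least-squares slope, and squared correlation coefficient); it is an independent recalculation under those assumptions, not the evaluation script used by the SAMPL4 organizers.

```python
import numpy as np

# Table 2, guests 1-9, in kcal/mol
EXPT = np.array([-3.73, -5.9, -6.28, -6.72, -5.3, -5.6, -7.6, -3.73, -6.61])
CALC = np.array([-1.0, -5.1, -4.0, -6.0, -4.1, -5.3, -7.2, -1.2, -4.8])


def prediction_metrics(calc, expt):
    """Return common accuracy and correlation metrics as a dictionary."""
    err = calc - expt
    i, j = np.triu_indices(err.size, k=1)        # all unordered pairs i < j
    slope, intercept = np.polyfit(expt, calc, 1)
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        # RMSE after removing the mean signed error
        "rmse_o": float(np.sqrt(np.mean((err - err.mean()) ** 2))),
        # RMSE of relative binding free energies over all pairs
        "rmse_r": float(np.sqrt(np.mean((err[i] - err[j]) ** 2))),
        "slope": float(slope),
        "r2": float(np.corrcoef(expt, calc)[0, 1] ** 2),
    }


metrics = prediction_metrics(CALC, EXPT)
```

The `metrics` dictionary can then be compared entry by entry with the values reported in the overview paper.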
The predominant binding mode seen in the simulations is, as expected, one in which the hydrophobic ring of the guest is set into the cavity with the carboxylate group oriented towards the solvent (Fig. 3). Substituents in the 4th position of the ring occupy the inner cavity of the host. This happens for guests 2, 3, 4, and 7. In these guests the substituent is in register to occupy the lower cavity of the host while leaving the carboxylate group optimally solvated. Guest 5, with the chlorine substitution at the 3rd position, prefers mostly to not occupy the inner cavity rather than sacrificing optimal solvation of the carboxylate group (see Fig. 3). The calculations generally reproduce the observed trend that binding to the inner cavity contributes to stronger binding. In agreement with the experiments complexes with guests 2, 4 and 7 are more strongly bound than their respective homologues (guests 1, 5, and 6) not capable of occupying the inner cavity.
Figure 3.
Representative structures of the complexes of the octa-acid host with the nine cyclic carboxylate guests investigated as part of the SAMPL4 challenge. The structures displayed here are the final frames of the trajectory of the BEDAM replica at λ = 1.
Experimental trends also identify interactions with the larger outer cavity as an important determinant of binding, an aspect that appears to be underestimated by the computational model. For example, guest 9, the third strongest binder experimentally despite the lack of interactions with the inner cavity, is ranked only fifth by the model. Similarly, as noted above, the affinities of guests 1 and 8, while ranked correctly, are significantly underestimated. On the other hand, the binding of guest 3 is also underestimated even though it occupies the inner cavity, sacrificing in part good interactions with the outer cavity in order to accommodate the longer ethyl substituent. As noted,[32] the stronger observed binding of guest 3 relative to guest 2 is contrary to expectations, and the model, which predicts the opposite relative ranking, fails to shed light on the underlying molecular mechanism for the anomaly.
4 Discussion
Overall, binding affinity prediction methods have performed well on the SAMPL4 host-guest challenge,[33, 52, 53, 54, 55] confirming the steady progress of the field, and the valuable contribution of blind experiments of this kind towards this progress. The binding free energy predictions made as part of this work were among the top scoring submissions for the octa-acid binding affinity challenge evaluated by the SAMPL4 organizers.[33] The present results, together with previous successful experiences in SAMPL challenges,[36] and the good ligand screening performance in the concurrent SAMPL4 HIV integrase challenge,[28] add further confidence in the reliability of the BEDAM protocol for binding free energy estimation. The present work also demonstrates the accessibility of the technology to non-experts, thanks to an automated workflow and the minimal set of structural assumptions required by the model.
As in previous work,[35] tuning of the implicit solvation model to properly treat enclosed hydration sites has been important to achieve good accuracy. Conventional solvation models based on homogeneous continuous descriptions of the solvent do not adequately treat hydration in deep hydrophobic solute cavities. In particular, it has been shown in several contexts that the displacement into the bulk of high free energy water molecules enclosed within receptor cavities can contribute favorably to ligand association.[49, 56] The atypical properties of water in molecular-sized volumes are difficult to model accurately even in the context of explicit representations of the solvent.[57, 58] As in this work, our approach to address these challenges has been to parameterize empirical geometrical models against experimental data. The advantage of this approach is that it can yield, depending on the availability and quality of the experimental data, representations of the thermodynamics of hydration at a level of accuracy equivalent and possibly superior to models of higher complexity. However, when adopting empirical approaches of this kind, transferability of parameters cannot be assumed. In this work the choice of parameters was guided by existing experimental data on the octa-acid system[37] and previous experiences with similar hydration cavities in other host-guest systems.[36, 50, 35]
The octa-acid binding cavity is in many respects representative of hydrophobic binding sites on protein surfaces where complex hydration patterns significantly affect binding propensities.[49] The fact that reliable predictions were achieved in the present application despite very limited parameterization indicates that similar strategies could be successfully employed for protein receptors. Recent advances in inhomogeneous solvation theory analysis[50, 56] potentially offer a suitable route to automated parametrization from explicit solvent simulations.
The model generally confirms the expected trends in the SAMPL4 set of the octa-acid host system.[32] Guests capable of occupying the inner smaller cavity without sacrificing solvent exposure of the carboxylate group tend to bind the octa-acid host more strongly, as do guests containing an alkyl ring rather than an aromatic ring. The model provides further insights and details on the molecular origins of these trends. For example, it has been suggested that guest 4 (chlorine substituent at position 4) binds more strongly than guest 2 (methyl substituent) due to added hydrogen bonding-like interactions between the chlorine atom and the benzal hydrogen atoms of the host pointing towards the inner cavity.[32] However, the computational model, by predicting a stronger host-guest interaction energy for guest 2 relative to guest 4 by more than 1 kcal/mol (see the ΔEb(calc) column in Table 2), appears to contradict this hypothesis. In our model the greater affinity of guest 4 for the host is due to its smaller reorganization free energy penalty relative to guest 2. We hypothesize that this is due to added intramolecular strain imposed on the host to open up the inner cavity slightly to accommodate the methyl group. This is an example of the commonly observed compensation between binding energies and reorganization free energy components:[35, 28] stronger receptor-ligand interactions are often achieved at the expense of entropic losses and intramolecular strain and, as a result, the outcome in terms of binding free energy is often the result of a subtle balance that is difficult to predict.
As an additional example, we find that the higher affinity of alkyl ring-containing guests is due to their stronger interaction energies with the host. For instance, the average binding energies of guests 6 and 7 are approximately 3 kcal/mol more favorable than those of the corresponding aromatic guests (guests 1 and 2). This is due to a combination of the larger number of hydrogen atom interaction centers in the alkyl ring and the smaller average distance between the carbon atoms of the guest and the atoms of the host afforded by the puckering of the ring. This conclusion is in agreement with the analogous analysis based on binding site volume occupancy.[32] Interestingly, in this case, unlike the example above, the stronger interaction energy actually translates into stronger binding affinity because the reorganization free energy component either reinforces the interaction energy difference (compare the reorganization free energy values of guest 6 and guest 1 in Table 1) or opposes it only slightly (guest 7 relative to guest 2).
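The two comparisons above can be summarized with a short worked relation. This is a sketch only: it assumes the decomposition of the binding free energy into an average binding energy and a reorganization term used in earlier BEDAM work,[35, 28] and the symbols below are illustrative notation that does not appear in this section.

```latex
% Assumed decomposition (notation is illustrative, not taken from this section):
%   \Delta G^\circ_b : standard binding free energy
%   \Delta E_b       : average host-guest binding (interaction) energy
%   \Delta G^\circ_{\mathrm{reorg}} : reorganization free energy penalty
\Delta G^\circ_b = \Delta E_b + \Delta G^\circ_{\mathrm{reorg}}

% For two guests A and B, with \Delta\Delta X = X(A) - X(B):
\Delta\Delta G^\circ_b = \Delta\Delta E_b + \Delta\Delta G^\circ_{\mathrm{reorg}}

% Guest 4 (A) versus guest 2 (B), using only the signs stated in the text:
%   \Delta\Delta E_b > 1~\mathrm{kcal/mol}   (guest 2 interacts more strongly)
%   \Delta\Delta G^\circ_b < 0               (guest 4 binds more strongly)
\Delta\Delta G^\circ_{\mathrm{reorg}}
  = \Delta\Delta G^\circ_b - \Delta\Delta E_b < -1~\mathrm{kcal/mol}
```

Under this assumption, the reorganization penalty of guest 4 must be smaller than that of guest 2 by more than 1 kcal/mol for the observed ranking to hold. For the alkyl versus aromatic guests, the interaction energy difference of approximately 3 kcal/mol dominates because the reorganization difference is either of the same sign or small.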
The SAMPL4 host-guest binding experiment offered an invaluable opportunity to incorporate a realistic research task into a classroom context. The blind nature of the experiment, where no one was in possession of the right answer, created a unique collaborative network among the students and between students and instructors. The best approach to each problem was worked out based on the collective wisdom of the class rather than being selected as the approach that gives the “right” answer. Validation of the model, just as in applied research, became not the goal of the exercise but rather the means to obtain a set of predictions of the highest quality possible.
To streamline the calculations, we employed a highly automated BEDAM workflow capable of preparing inputs for the molecular dynamics engine from a minimal set of user parameters: the structure files for receptor and ligand, and the maximum center of mass distance between the two, which defines the binding site volume (see Methods). This automation strategy, which has also been employed to automate hundreds of binding free energy calculations for the HIV integrase screening challenge as part of the same SAMPL4 experiment,[28] made it possible to run the simulations with little user knowledge of the system preparation details, input file syntax, and parallel execution commands for the molecular dynamics engine. Features of Schrödinger’s molecular modeling environment, such as the graphical user interface and automatic force field parameter assignment, also played a key role in making these complex calculations accessible to students.
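The role of the center of mass distance parameter can be illustrated with a minimal sketch. It assumes that a ligand counts as being in the binding site when the distance between the receptor and ligand centers of mass does not exceed the user-supplied maximum; the function names, toy coordinates, and cutoff value are hypothetical and are not taken from the BEDAM workflow itself.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted average position of a set of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * xyz[i] for xyz, m in zip(coords, masses)) / total
        for i in range(3)
    )


def in_binding_site(rec_coords, rec_masses, lig_coords, lig_masses, max_com_dist):
    """True if the ligand center of mass lies within max_com_dist
    of the receptor center of mass (illustrative definition only)."""
    d = math.dist(
        center_of_mass(rec_coords, rec_masses),
        center_of_mass(lig_coords, lig_masses),
    )
    return d <= max_com_dist


# Toy system: two equal-mass "receptor" atoms and one "ligand" atom.
receptor = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
receptor_masses = [12.0, 12.0]
ligand = [(1.0, 3.0, 0.0)]
ligand_masses = [12.0]

# Hypothetical cutoff; the value used in the actual calculations is given in Methods.
print(in_binding_site(receptor, receptor_masses, ligand, ligand_masses, max_com_dist=4.0))
```

In this toy case the two centers of mass are 3 distance units apart, so the ligand is inside a site defined by a cutoff of 4 and outside one defined by a cutoff of 2.5. A production workflow might compute the centers of mass over selected atom subsets rather than whole molecules.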
5 Conclusions
As part of the SAMPL4 blind challenge, we have employed the BEDAM protocol to predict the binding free energies of a set of octa-acid host-guest complexes. Our predictions consistently scored among the best submitted to SAMPL4 in this category (best in terms of root mean square error and correlation slope, and second best in terms of correlation coefficient). The experiment has been conducted as part of a hands-on graduate class laboratory exercise. Collectively the students, guided by the instructors, performed the bulk of the calculations and the numerical and structural analysis. Students were encouraged to share data and prepared reports, on which this work is based, discussing their results in the context of those of all of the other students. Overall, participation in this SAMPL4 challenge has been a very instructive experience for both the students and their instructors. The success of the experiment confirms the reliability of physics-based atomistic binding free energy estimation models and shows that these, when properly streamlined and automated, can be successfully employed by non-specialists.
Acknowledgments
This work has been supported in part by research grants from the National Institutes of Health (GM30580) and the National Science Foundation Cyber-enabled Discovery and Innovation Award (CHE-1125332). The calculations reported in this work have been performed at the BioMaPS High Performance Computing Center at Rutgers University, funded in part by NIH shared instrumentation grant no. 1 S10 RR022375. We gratefully acknowledge Lauren Wickstrom for helpful discussions during the preparation of the manuscript.
References
- 1. Kluwer AM, Kapre R, Hartl F, Lutz M, Spek AL, Brouwer AM, Van Leeuwen PWNM, Reek JNH. Self-assembled biomimetic [2Fe2S]-hydrogenase-based photocatalyst for molecular hydrogen evolution. Proc Natl Acad Sci USA. 2009;106(26):10460–10465. doi: 10.1073/pnas.0809666106.
- 2. Jorgensen William L. The many roles of computation in drug discovery. Science. 2004;303:1813–1818. doi: 10.1126/science.1096361.
- 3. Goodsell David S, Morris Garrett M, Olson Arthur J. Automated docking of flexible ligands: applications of autodock. J Molec Recogn. 1996;9:1–5. doi: 10.1002/(sici)1099-1352(199601)9:1<1::aid-jmr241>3.0.co;2-6.
- 4. Shoichet Brian K, McGovern Susan L, Wei Binqing, Irwin John J. Lead discovery using molecular docking. Curr Opin Chem Biol. 2002;6:439–446. doi: 10.1016/s1367-5931(02)00339-3.
- 5. Zhou Zhiyong, Felts Anthony K, Friesner Richard A, Levy Ronald M. Comparative performance of several flexible docking programs and scoring functions: enrichment studies for a diverse set of pharmaceutically relevant targets. J Chem Inf Model. 2007;47:1599–1608. doi: 10.1021/ci7000346.
- 6. Perryman Alexander L, Santiago Daniel N, Forli Stefano, Santos-Martins Diogo, Olson Arthur J. Virtual screening with autodock vina and the common pharmacophore engine of a low diversity library of fragments and hits against the three allosteric sites of hiv integrase: participation in the SAMPL4 protein–ligand binding challenge. J Comp Aided Mol Des. 2014;28:1–13. doi: 10.1007/s10822-014-9709-3.
- 7. Friesner Richard A, Banks Jay L, Murphy Robert B, Halgren Thomas A, Klicic Jasna J, Mainz Daniel T, Repasky Matthew P, Knoll Eric H, Shelley Mee, Perry Jason K. Glide: a new approach for rapid, accurate docking and scoring. 1. method and assessment of docking accuracy. J Med Chem. 2004;47(7):1739–1749. doi: 10.1021/jm0306430.
- 8. Morris Garrett M, Huey Ruth, Lindstrom William, Sanner Michel F, Belew Richard K, Goodsell David S, Olson Arthur J. Autodock4 and autodocktools4: Automated docking with selective receptor flexibility. J Comp Chem. 2009;30:2785–2791. doi: 10.1002/jcc.21256.
- 9. Brown Scott P, Muchmore Steven W. Large-scale application of high-throughput molecular mechanics with poisson-boltzmann surface area for routine physics-based scoring of protein-ligand complexes. J Med Chem. 2009;52:3159–3165. doi: 10.1021/jm801444x.
- 10. Repasky Matthew P, Murphy Robert B, Banks Jay L, Greenwood Jeremy R, Tubert-Brohman Ivan, Bhat Sathesh, Friesner Richard A. Docking performance of the Glide program as evaluated on the Astex and DUD datasets: a complete set of Glide SP results and selected results for a new scoring function integrating WaterMap and Glide. J Comp Aided Mol Des. 2012;26:787–799. doi: 10.1007/s10822-012-9575-9.
- 11. Gilson MK, Given JA, Bush BL, McCammon JA. The statistical-thermodynamic basis for computation of binding affinities: A critical review. Biophys J. 1997;72:1047–1069. doi: 10.1016/S0006-3495(97)78756-3.
- 12. Boresch S, Tettinger F, Leitgeb M, Karplus M. Absolute binding free energies: A quantitative approach for their calculation. J Phys Chem B. 2003;107:9535–9551.
- 13. Mobley David L, Dill Ken A. Binding of small-molecule ligands to proteins: “what you see” is not always “what you get”. Structure. 2009;17:489–498. doi: 10.1016/j.str.2009.02.010.
- 14. Gallicchio Emilio, Levy Ronald M. Recent theoretical and computational advances for modeling protein-ligand binding affinities. Adv Prot Chem Struct Biol. 2011;85:27–80. doi: 10.1016/B978-0-12-386485-7.00002-8.
- 15. Gallicchio Emilio, Levy Ronald M. Advances in all atom sampling methods for modeling protein-ligand binding affinities. Curr Opin Struct Biol. 2011;21:161–166. doi: 10.1016/j.sbi.2011.01.010.
- 16. Wang Lingle, Berne BJ, Friesner Richard A. On achieving high accuracy and reliability in the calculation of relative protein-ligand binding affinities. Proc Natl Acad Sci U S A. 2012;109:1937–1942. doi: 10.1073/pnas.1114017109.
- 17. Mobley David L, Klimovich Pavel V. Perspective: Alchemical free energy calculations for drug discovery. J Chem Phys. 2012;137:230901. doi: 10.1063/1.4769292.
- 18. Velez-Vega Camilo, Gilson Michael K. Overcoming dissipation in the calculation of standard binding free energies by ligand extraction. J Comp Chem. 2013;34(27):2360–2371. doi: 10.1002/jcc.23398.
- 19. Shirts MR, Mobley DL, Brown SP. Structure Based Drug Discovery, chapter Free energy calculations in structure-based drug design. Cambridge University Press; 2010.
- 20. Chodera John D, Mobley David L, Shirts Michael R, Dixon Richard W, Branson Kim, Pande Vijay S. Alchemical free energy methods for drug discovery: Progress and challenges. Curr Opin Struct Biol. 2011;21:150–160. doi: 10.1016/j.sbi.2011.01.011.
- 21. Lapelosa Mauro, Gallicchio Emilio, Levy Ronald M. Conformational transitions and convergence of absolute binding free energy calculations. J Chem Theory Comput. 2012;8:47–60. doi: 10.1021/ct200684b.
- 22. Mobley David L. Lets get honest about sampling. J Comp Aided Mol Des. 2012;26:93–95. doi: 10.1007/s10822-011-9497-y.
- 23. Geballe Matthew T, Geoffrey Skillman A, Nicholls Anthony, Peter Guthrie J, Taylor Peter J. The SAMPL2 blind prediction challenge: introduction and overview. J Comp Aided Mol Des. 2010;24(4):259–279. doi: 10.1007/s10822-010-9350-8.
- 24. Muddana Hari S, Daniel Varnado C, Bielawski Christopher W, Urbach Adam R, Isaacs Lyle, Geballe Matthew T, Gilson Michael K. Blind prediction of host–guest binding affinities: a new SAMPL3 challenge. J Comp Aided Mol Des. 2012;26(5):475–487. doi: 10.1007/s10822-012-9554-1.
- 25. Mobley David L, Liu Shuai, Lim Nathan M, Wymer Karisa L, Perryman Alexander L, Forli Stefano, Deng Nanjie, Su Justin, Branson Kim, Olson Arthur J. Blind prediction of hiv integrase binding from the SAMPL4 challenge. J Comp Aided Mol Des. 2014:1–19. doi: 10.1007/s10822-014-9723-5.
- 26. Dunbar James B, Jr, Smith Richard D, Damm-Ganamet Kelly L, Ahmed Aqeel, Esposito Emilio Xavier, Delproposto James, Chinnaswamy Krishnapriya, Kang You-Na, Kubish Ginger, Gestwicki Jason E, et al. CSAR data set release 2012: Ligands, affinities, complexes, and docking decoys. J Chem Inf Model. 2013;53(8):1842–1852. doi: 10.1021/ci4000486.
- 27. Peat Thomas S, Dolezal Olan, Newman Janet, Mobley David, Deadman John J. Interrogating HIV integrase for compounds that bind–a SAMPL challenge. J Comp Aided Mol Des. 2014;28:347–362. doi: 10.1007/s10822-014-9721-7.
- 28. Gallicchio Emilio, Deng Nanjie, He Peng, Perryman Alexander L, Santiago Daniel N, Forli Stefano, Olson Arthur J, Levy Ronald M. Virtual screening of integrase inhibitors by large scale binding free energy calculations: the SAMPL4 challenge. J Comp Aided Mol Des. 2014;28:475–490. doi: 10.1007/s10822-014-9711-9.
- 29. Liu Shuai, Wu Yujie, Lin Teng, Abel Robert, Redmann Jonathan P, Summa Christopher M, Jaber Vivian R, Lim Nathan M, Mobley David L. Lead optimization mapper: automating free energy calculations for lead optimization. J Comp Aided Mol Des. 2013;27(9):755–770. doi: 10.1007/s10822-013-9678-y.
- 30. Gallicchio Emilio, Lapelosa Mauro, Levy Ronald M. Binding energy distribution analysis method (BEDAM) for estimation of protein-ligand binding affinities. J Chem Theory Comput. 2010;6:2961–2977. doi: 10.1021/ct1002913.
- 31. Tan Zhiqiang, Gallicchio Emilio, Lapelosa Mauro, Levy Ronald M. Theory of binless multi-state free energy estimation with applications to protein-ligand binding. J Chem Phys. 2012;136:144102. doi: 10.1063/1.3701175.
- 32. Gibb Corinne LD, Gibb Bruce C. Binding of cyclic carboxylates to octa-acid deep-cavity cavitand. J Comp Aided Mol Des. 2013:1–7. doi: 10.1007/s10822-013-9690-2.
- 33. Muddana Hari S, Fenley Andrew T, Mobley David L, Gilson Michael K. The SAMPL4 host–guest blind prediction challenge: an overview. J Comp Aided Mol Des. 2014;28:1–13. doi: 10.1007/s10822-014-9735-1.
- 34. Gallicchio E. Role of ligand reorganization and conformational restraints on the binding free energies of DAPY non-nucleoside inhibitors to HIV reverse transcriptase. Mol Biosc. 2012;2:7–22. doi: 10.4236/cmb.2012.21002.
- 35. Wickstrom Lauren, He Peng, Gallicchio Emilio, Levy Ronald M. Large scale affinity calculations of cyclodextrin host-guest complexes: Understanding the role of reorganization in the molecular recognition process. J Chem Theory Comput. 2013;9:3136–3150. doi: 10.1021/ct400003r.
- 36. Gallicchio E, Levy RM. Prediction of SAMPL3 host-guest affinities with the binding energy distribution analysis method (BEDAM). J Comp Aided Mol Design. 2012;25:505–516. doi: 10.1007/s10822-012-9552-3.
- 37. Sun Hao, Gibb Corinne LD, Gibb Bruce C. Calorimetric analysis of the 1: 1 complexes formed between a water-soluble deep-cavity cavitand, and cyclic and acyclic carboxylic acids. Supramolecular Chemistry. 2008;20(1–2):141–147.
- 38. https://github.com/egallicc/bedam_workflow.
- 39. Banks JL, Beard JS, Cao Y, Cho AE, Damm W, Farid R, Felts AK, Halgren TA, Mainz DT, Maple JR, Murphy R, Philipp DM, Repasky MP, Zhang LY, Berne BJ, Friesner RA, Gallicchio E, Levy RM. Integrated modeling program, applied chemical theory (IMPACT). J Comp Chem. 2005;26:1752–1780. doi: 10.1002/jcc.20292.
- 40. Gallicchio Emilio, Paris Kristina, Levy Ronald M. The AGBNP2 implicit solvation model. J Chem Theory Comput. 2009;5:2544–2564. doi: 10.1021/ct900234u.
- 41. Shirts Michael R, Chodera John D. Statistically optimal analysis of samples from multiple equilibrium states. J Chem Phys. 2008;129:124105. doi: 10.1063/1.2978177.
- 42. Tan Zhiqiang, Gallicchio Emilio, Lapelosa Mauro, Levy Ronald M. Theory of binless multi-state free energy estimation with applications to protein-ligand binding. J Chem Phys. 2012;136:144102. doi: 10.1063/1.3701175.
- 43. Jorgensen WL, Maxwell DS, Tirado-Rives J. Development and testing of the opls all-atom force field on conformational energetics and properties of organic liquids. J Am Chem Soc. 1996;118:11225–11236.
- 44. Kaminski GA, Friesner RA, Tirado-Rives J, Jorgensen WL. Evaluation and reparameterization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J Phys Chem B. 2001;105:6474–6487.
- 45. Qiu D, Shenkin PS, Hollinger FP, Still CW. The GB/SA continuum model for solvation. a fast analytical method for the calculation of approximate born radii. J Phys Chem A. 1997;101:3005–3014.
- 46. Gallicchio E, Levy RM. AGBNP: an analytic implicit solvent model suitable for molecular dynamics simulations and high-resolution modeling. J Comput Chem. 2004;25:479–499. doi: 10.1002/jcc.10400.
- 47. Levy Ronald M, Zhang Linda Y, Gallicchio Emilio, Felts Anthony K. On the nonpolar hydration free energy of proteins: surface area and continuum solvent models for the solute-solvent interaction energy. J Am Chem Soc. 2003;125:9523–9530. doi: 10.1021/ja029833a.
- 48. Su Y, Gallicchio E. The non-polar solvent potential of mean force for the dimerization of alanine dipeptide: The role of solute-solvent van der waals interactions. Biophys Chem. 2004;109:251–260. doi: 10.1016/j.bpc.2003.11.007.
- 49. Young Tom, Abel Robert, Kim Byungchan, Berne Bruce J, Friesner Richard A. Motifs for molecular recognition exploiting hydrophobic enclosure in protein-ligand binding. Proc Natl Acad Sci U S A. 2007;104:808–813. doi: 10.1073/pnas.0610202104.
- 50. Nguyen Crystal N, Young Tom Kurtzman, Gilson Michael K. Grid inhomogeneous solvation theory: Hydration structure and thermodynamics of the miniature receptor cucurbit [7] uril. J Chem Phys. 2012;137(4):044101. doi: 10.1063/1.4733951.
- 51. Ewell Jeffrey, Gibb Bruce C, Rick Steven W. Water inside a hydrophobic cavitand molecule. J Phys Chem B. 2008;112(33):10272–10279. doi: 10.1021/jp804429n.
- 52. Mikulskis Paulius, Cioloboc Daniela, Andrejić Milica, Khare Sakshi, Brorsson Joakim, Genheden Samuel, Mata Ricardo A, Söderhjelm Pär, Ryde Ulf. Free-energy perturbation and quantum mechanical study of SAMPL4 octa-acid host–guest binding energies. J Comp Aided Mol Des. 2014:1–26. doi: 10.1007/s10822-014-9739-x.
- 53. Hogues Hervé, Sulea Traian, Purisima Enrico O. Exhaustive docking and solvated interaction energy scoring: lessons learned from the SAMPL4 challenge. J Comp Aided Mol Des. 2014;28:1–11. doi: 10.1007/s10822-014-9715-5.
- 54. Coleman Ryan G, Sterling Teague, Weiss Dahlia R. SAMPL4 & DOCK3.7: lessons for automated docking procedures. Journal of computer-aided molecular design. 2014;28(3):201–209. doi: 10.1007/s10822-014-9722-6.
- 55. Muddana Hari S, Yin Jian, Sapra Neil V, Fenley Andrew T, Gilson Michael K. Blind prediction of SAMPL4 cucurbit [7]uril binding affinities with the mining minima method. J Comp Aided Mol Des. 2014;28:1–12. doi: 10.1007/s10822-014-9726-2.
- 56. Beuming Thijs, Che Ye, Abel Robert, Kim Byungchan, Shanmugasundaram Veerabahu, Sherman Woody. Thermodynamic analysis of water molecules at the surface of proteins and applications to binding site prediction and characterization. Proteins: Struct Funct Bioinf. 2012;80(3):871–883. doi: 10.1002/prot.23244.
- 57. Giovambattista Nicolas, Rossky Peter J, Debenedetti Pablo G. Effect of pressure on the phase behavior and structure of water confined between nanoscale hydrophobic and hydrophilic plates. Phys Rev E. 2006;73(4):041604. doi: 10.1103/PhysRevE.73.041604.
- 58. Wang Lingle, Friesner Richard A, Berne Bruce J. Hydrophobic interactions in model enclosures from small to large length scales: non-additivity in explicit and implicit solvent models. Faraday discussions. 2010;146:247–262. doi: 10.1039/b925521b.