The Journal of Chemical Physics. 2018 Mar 14;148(10):104114. doi: 10.1063/1.5017136

Implicit ligand theory for relative binding free energies

Trung Hai Nguyen 1, David D L Minh 1,a)
PMCID: PMC5851784  PMID: 29544299

Abstract

Implicit ligand theory enables noncovalent binding free energies to be calculated based on an exponential average of the binding potential of mean force (BPMF)—the binding free energy between a flexible ligand and rigid receptor—over a precomputed ensemble of receptor configurations. In the original formalism, receptor configurations were drawn from or reweighted to the apo ensemble. Here we show that BPMFs averaged over a holo ensemble yield binding free energies relative to the reference ligand that specifies the ensemble. When using receptor snapshots from an alchemical simulation with a single ligand, the new statistical estimator outperforms the original.

I. INTRODUCTION

Noncovalent associations between receptors and small organic ligands play an important role in many biological processes, including the recognition of substrates by enzymes, of hormones by receptors, and of pharmaceuticals by drug targets. Many computational methods have been developed to help characterize these interactions and to guide rational drug design.1–5

The strength of noncovalent association between a receptor R and ligand L to form a complex RL, R + L ⇌ RL, is quantified by the standard binding free energy,

$$\Delta G^\circ_{RL} = -k_B T \ln \frac{C^\circ C_{RL}}{C_R C_L}, \tag{1}$$

where $k_B$ is Boltzmann’s constant and $T$ is the temperature in kelvin. $C_X$ is the equilibrium concentration of the species $X \in \{R, L, RL\}$ and $C^\circ$ is the standard concentration (usually 1 M, equivalent to one molecule per 1660 Å³). It is often more convenient to calculate the relative binding free energy between a ligand of interest and a reference ligand, $L_o$,

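$$\Delta\Delta G^\circ_{RL} = \Delta G^\circ_{RL} - \Delta G^\circ_{RL_o}. \tag{2}$$

The definition of $\Delta G^\circ_{RL_o}$ is analogous to that of $\Delta G^\circ_{RL}$ except that $L_o$ replaces $L$. Methods to compute binding free energies span a wide spectrum of theoretical rigor and computational expense.2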
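As a rough numerical illustration of Eqs. (1) and (2), the sketch below converts the 1 M standard state into a volume per molecule and evaluates a standard binding free energy from equilibrium concentrations. The function names, the 300 K default temperature, and the example concentrations are assumptions made for illustration; they are not taken from the paper.

```python
import math

AVOGADRO = 6.02214076e23        # molecules per mole (exact SI definition)
ANGSTROM3_PER_LITER = 1.0e27    # 1 L = (0.1 m)^3 and 1 m = 1e10 Angstrom
KB_KCAL = 0.0019872041          # Boltzmann constant in kcal/(mol K)


def standard_volume_A3():
    """Volume per molecule at 1 M, in cubic Angstrom (about 1660)."""
    return ANGSTROM3_PER_LITER / AVOGADRO


def standard_binding_free_energy(c_rl, c_r, c_l, temperature=300.0):
    """Eq. (1) with all concentrations in mol/L, so that C° = 1 M."""
    c_standard = 1.0
    kT = KB_KCAL * temperature
    return -kT * math.log(c_standard * c_rl / (c_r * c_l))


def relative_binding_free_energy(dg_ligand, dg_reference):
    """Eq. (2): free energy of a ligand relative to the reference ligand."""
    return dg_ligand - dg_reference


if __name__ == "__main__":
    print(f"1 M corresponds to one molecule per {standard_volume_A3():.1f} A^3")
    # Hypothetical equilibrium with a micromolar dissociation constant.
    dg = standard_binding_free_energy(1e-6, 1e-6, 1e-6)
    print(f"Standard binding free energy: {dg:.2f} kcal/mol")
```

With concentrations expressed in mol/L the standard concentration drops out numerically, which is why the 1/1660 Å³ form only matters when configurational integrals are evaluated in Å-based units.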

Among methods to compute binding free energies, alchemical pathway methods6 are the most rigorous and computationally expensive. These methods, which employ physically unrealistic intermediate thermodynamic states, are used for both ΔGRL° and ΔΔGRL° calculations. In the former, configurations of the complex are sampled from a series of states where the ligand is decoupled or physically separated from the receptor.7–10 In the latter, one ligand is converted into another. Accurate protein-ligand binding free energy calculations have been reported in many publications.11–18 Because sampling of complex conformations needs to be repeated for every receptor-ligand pair, however, performing alchemical free energy calculations for a large library of ligands is very computationally demanding.

Implicit ligand theory (ILT)19 has the potential to facilitate fast and accurate binding free energy calculations between a large library of ligands and a single receptor.

ILT is based on a formal separation of receptor and ligand configurational sampling: binding free energies are an exponential average of the binding potential of mean force (BPMF)—the binding free energy between a flexible ligand and rigid receptor—over an ensemble of receptor conformations. Calculations based on ILT are as rigorous as any alchemical binding free energy calculation in implicit solvent. The approach has a key benefit that the receptor conformational ensemble can be thoroughly sampled once and recycled for binding free energy calculations with multiple ligands.

The name ILT originates from a mathematical analogy to implicit solvent models. In the same way that implicit solvent models formally invoke an integral over all solvent degrees of freedom to yield the solvation free energy, ILT invokes an integral over all ligand degrees of freedom to yield the BPMF. In practice, however, the computation of these integrals is different. While solvation free energies are usually based on a continuum dielectric model that ignores the positions of solvent molecules, BPMFs have hitherto been based on Monte Carlo simulations20 or fast Fourier transform operations21 that explicitly consider the position of the ligand.

Although BPMF calculations are nontrivial, they can be performed relatively quickly compared to binding free energy calculations with a flexible receptor. First, they only require sampling of ligands, as opposed to entire complexes. Second, receptor-ligand interaction energies can be quickly obtained by interpolating a 3D grid (this strategy is widely used in molecular docking22,23). Once interaction energy grids have been computed and stored, calculation times become independent of the number of atoms in the receptor.

Although the approach is promising, a key barrier to the successful application of ILT is sampling receptor conformations relevant to the holo (ligand-bound) ensembles of the ligands of interest. The original formulation of ILT involves an exponential average of BPMFs over the apo (ligand-free) ensemble of the receptor. This strategy may be successful for binding processes that occur by conformational selection—that is, when binding-competent conformations are among those populated in the equilibrium prior to binding (Fig. 1). However, if binding occurs by induced fit, such that conformations relevant to binding are rarely accessed in the apo ensemble, then sampling from the apo ensemble is unlikely to be a successful strategy for ILT-based binding free energy calculations. In these cases, the simplest approach to obtaining binding-competent conformations of a receptor may be simulations of another holo ensemble. In the left panel of Fig. 1, for example, sampling the blue holo ensemble may be useful for binding free energy calculations of the red ligand.

FIG. 1.

Schematic representation of apo and holo ensembles in configuration space. The circles denote regions in configuration space that are important to various thermodynamic ensembles. Solid blue and dashed red circles indicate two different ligands. In the left panel, a series of circles of the same color indicate one alchemical pathway from the apo to a holo ensemble. The left panel describes induced-fit binding processes in which apo and holo ensembles are distinct. The right panel describes conformational selection processes in which the holo ensembles are subsets of the apo ensemble.

In a recent study involving ILT-based binding free energy calculations between T4 lysozyme and 141 small molecules, Xie, Nguyen, and Minh24 obtained binding-competent receptor configurations by performing alchemical pathway simulations with 6 different ligands. This approach could be described in the context of the left panel of Fig. 1 as sampling from all of the solid blue circles to calculate the binding free energy of the dashed red ligand. Sampling from the entire alchemical pathway was necessary in order to accurately reweight each receptor conformation to the apo ensemble.

In this paper, we extend ILT by deriving a statistical estimator for ΔΔGRL° that circumvents the requirement of reweighting receptor configurations to the apo ensemble. Instead, receptor configurations need to be drawn from or reweighted to a holo ensemble. The new formalism yields the binding free energy relative to the ligand used to sample the holo ensemble. We will refer to the original expression as the apo estimator and the new one as the holo estimator. The holo estimator should be especially useful if ligand binding induces a conformational change that is consistent between different ligands. In this case, receptor snapshots can be drawn from a single simulation of a complex with one ligand. On the other hand, if holo ensembles are distinct subregions within the apo ensemble (right panel in Fig. 1), another receptor sampling strategy is likely to yield more accurate binding free energies.

In addition to being based on BPMFs, our new framework for relative binding free energy calculations differs from previous methods in that it does not require ligand transformations. Other ΔΔGRL° calculations require the free energy of transforming the reference ligand into the ligands of interest. This requirement limits the diversity of ligands for which ΔΔGRL° can be accurately computed. By contrast, the holo estimator only requires that ligands can bind to the same receptor ensemble.

After introducing the holo estimator, we apply it to binding free energy calculations between T4 lysozyme and 24 ligands based on previously calculated BPMFs.24 Although the previous receptor sampling strategy was not necessarily ideal for the holo estimator, analyzing the same BPMFs allows us to directly compare the performance of the two estimators.

II. THEORY

The absolute standard binding free energy between a receptor R and ligand L can be expressed in terms of configurational integrals for the complex, $Z_{RL}$, receptor, $Z_R$, and ligand, $Z_L$, as6

$$\Delta G^\circ_{RL} = -k_B T \ln \left( \frac{Z_{RL}}{Z_R Z_L} \frac{C^\circ}{8\pi^2} \right), \tag{3}$$
$$Z_{RL} = \int I(\xi) J(\xi) e^{-\beta U(r_{RL})} \, dr_{RL}, \tag{4}$$
$$Z_X = \int e^{-\beta U(r_X)} \, dr_X, \quad X \in \{R, L\}, \tag{5}$$

where $\beta = (k_B T)^{-1}$. $U(r_X)$ is the potential energy of species $X$ with coordinates $r_X$ in implicit solvent. Coordinates of the complex, $r_{RL}$, are partitioned into internal coordinates of the receptor and ligand, $r_R$ and $r_L$, and external coordinates describing the relative translation and rotation of the species, $\xi$. $I(\xi)$ is an indicator function equal to one when the receptor and ligand are considered bound and zero otherwise. $J(\xi)$ is the Jacobian for the transformation from Cartesian coordinates into the coordinates used for $\xi$. The factor $8\pi^2$ comes from an integral over the full range of Euler angles in the unbound complex.

The key insight of ILT is that standard binding free energies can be written in terms of an exponential average of the BPMF, $B(r_R)$,19

$$\Delta G^\circ_{RL} = -k_B T \ln \frac{\int e^{-\beta B(r_R)} e^{-\beta U(r_R)} \, dr_R}{\int e^{-\beta U(r_R)} \, dr_R} - k_B T \ln \frac{\Omega C^\circ}{8\pi^2}, \tag{6}$$
$$B(r_R) = -k_B T \ln \frac{\int I(\xi) J(\xi) e^{-\beta \Psi(r_{RL})} e^{-\beta U(r_L)} \, dr_L \, d\xi}{\int I(\xi) J(\xi) e^{-\beta U(r_L)} \, dr_L \, d\xi}, \tag{7}$$
$$\Psi(r_{RL}) = U(r_{RL}) - U(r_R) - U(r_L), \tag{8}$$

where $\Omega = \int I(\xi) J(\xi) \, d\xi$, an integral of the indicator function over all external degrees of freedom. If no restraints are placed on rotation of the ligand, $\Omega = 8\pi^2 V_{site}$, where $V_{site}$ is the volume of the binding site. Equation (6) is an expectation over the apo ensemble, where the probability density of $r_R$ is proportional to $\exp[-\beta U(r_R)]$. [It builds on the observation by Gallicchio, Lapelosa, and Levy25 that Eq. (3) is an ensemble average of the interaction energy over the apo ensemble.] Hence, $\Delta G^\circ_{RL}$ can be estimated using the apo estimator,

$$\Delta\hat{G}^\circ_{RL} = -k_B T \ln \sum_{r_R} w_a(r_R) e^{-\beta \hat{B}(r_R)} - k_B T \ln \frac{\Omega C^\circ}{8\pi^2}, \tag{9}$$

where $\hat{B}(r_R)$ is an estimate of the BPMF for the receptor configuration $r_R$ and $w_a(r_R)$ is the normalized statistical weight of $r_R$ in the apo ensemble. If receptor snapshots are drawn from the apo ensemble, then $w_a(r_R) = 1/n$, where $n$ is the number of receptor samples.
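The apo estimator of Eq. (9) can be sketched in a few lines. This is a minimal illustration rather than the AlGDock implementation: the function names, the kcal/mol units, the 300 K default, and the optional `site_volume` argument (which assumes unrestrained ligand rotation, so that Ω = 8π²V_site) are all assumptions. The weighted sum is evaluated in log space because exponentials of BPMFs can span many orders of magnitude.

```python
import math

KB_KCAL = 0.0019872041                       # Boltzmann constant, kcal/(mol K)
STANDARD_VOLUME_A3 = 1.0e27 / 6.02214076e23  # A^3 per molecule at 1 M


def log_sum_exp(terms):
    """Numerically stable log(sum(exp(t) for t in terms))."""
    largest = max(terms)
    return largest + math.log(sum(math.exp(t - largest) for t in terms))


def apo_estimator(bpmfs, weights=None, temperature=300.0, site_volume=None):
    """Sketch of Eq. (9).

    bpmfs       : BPMF estimates (kcal/mol), one per receptor snapshot
    weights     : normalized apo-ensemble weights; uniform (1/n) if omitted
    site_volume : binding site volume in A^3; if None, the standard-state
                  term is skipped (hypothetical convenience option)
    """
    kT = KB_KCAL * temperature
    n = len(bpmfs)
    if weights is None:
        weights = [1.0 / n] * n
    # log of each term w_a * exp(-beta * B); zero-weight snapshots drop out
    log_terms = [math.log(w) - b / kT for w, b in zip(weights, bpmfs) if w > 0.0]
    dg = -kT * log_sum_exp(log_terms)
    if site_volume is not None:
        # Omega * C° / (8 pi^2) = V_site * C° = V_site / (1660.5 A^3)
        dg -= kT * math.log(site_volume / STANDARD_VOLUME_A3)
    return dg


if __name__ == "__main__":
    example_bpmfs = [-8.0, -2.0, -1.0]  # hypothetical values in kcal/mol
    print(apo_estimator(example_bpmfs))
```

Because the average is exponential, the estimate is dominated by the snapshots with the most favorable BPMFs, so it always lies between the minimum and the arithmetic mean of the inputs.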

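An expression for relative binding free energies in terms of configurational integrals can be obtained by substituting Eq. (3) and the analogous expression for $\Delta G^\circ_{RL_o}$ into the definition of $\Delta\Delta G^\circ_{RL}$,

$$\Delta\Delta G^\circ_{RL} = -k_B T \ln \left( \frac{Z_{RL}}{Z_L} \frac{Z_{L_o}}{Z_{RL_o}} \right). \tag{10}$$

By expanding the configurational integrals, multiplying by $\int I(\xi_o) J(\xi_o) \, d\xi_o / \int I(\xi) J(\xi) \, d\xi = 1$, and factoring out $B(r_R)$, we obtain the main theoretical result of our present paper,

$$\Delta\Delta G^\circ_{RL} = -k_B T \ln \frac{\int I(\xi_o) J(\xi_o) e^{-\beta U(r_{RL_o})} e^{-\beta [B(r_R) - \Psi(r_{RL_o})]} \, dr_{RL_o}}{\int I(\xi_o) J(\xi_o) e^{-\beta U(r_{RL_o})} \, dr_{RL_o}}, \tag{11}$$

where $\Psi(r_{RL_o}) = U(r_{RL_o}) - U(r_R) - U(r_{L_o})$.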
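The intermediate steps between Eqs. (10) and (11) can be written out as follows. This is a reconstruction from the definitions in Eqs. (4), (5), (7), and (8), not a quotation from the paper; the symbol $\Omega_o$ (the integral of the indicator function for the reference ligand) is introduced here for bookkeeping, and the last step assumes $\Omega_o = \Omega$, which is the condition expressed by the factor of one in the text.

```latex
\begin{aligned}
\frac{Z_{RL}}{Z_L}
  &= \frac{1}{Z_L} \int e^{-\beta U(r_R)}
     \left[ \int I(\xi) J(\xi)\, e^{-\beta \Psi(r_{RL})} e^{-\beta U(r_L)} \, dr_L \, d\xi \right] dr_R
   = \Omega \int e^{-\beta B(r_R)} e^{-\beta U(r_R)} \, dr_R , \\
\int e^{-\beta B(r_R)} e^{-\beta U(r_R)} \, dr_R
  &= \frac{1}{\Omega_o Z_{L_o}} \int I(\xi_o) J(\xi_o)\,
     e^{-\beta [U(r_R) + U(r_{L_o})]} e^{-\beta B(r_R)} \, dr_{RL_o} \\
  &= \frac{1}{\Omega_o Z_{L_o}} \int I(\xi_o) J(\xi_o)\,
     e^{-\beta U(r_{RL_o})} e^{-\beta [B(r_R) - \Psi(r_{RL_o})]} \, dr_{RL_o} , \\
\frac{Z_{RL}}{Z_L} \frac{Z_{L_o}}{Z_{RL_o}}
  &= \frac{\Omega}{\Omega_o}\,
     \frac{\int I(\xi_o) J(\xi_o)\, e^{-\beta U(r_{RL_o})} e^{-\beta [B(r_R) - \Psi(r_{RL_o})]} \, dr_{RL_o}}
          {\int I(\xi_o) J(\xi_o)\, e^{-\beta U(r_{RL_o})} \, dr_{RL_o}} .
\end{aligned}
```

The first line uses $\int I(\xi) J(\xi) e^{-\beta U(r_L)} \, dr_L \, d\xi = \Omega Z_L$ together with Eq. (7); the second inserts the reference ligand as a spectator whose integral contributes $\Omega_o Z_{L_o}$; the third uses $U(r_R) + U(r_{L_o}) = U(r_{RL_o}) - \Psi(r_{RL_o})$. Taking $-k_B T \ln$ of the final ratio with $\Omega_o = \Omega$ gives Eq. (11).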

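Equation (11) is an expectation over the holo ensemble with the reference ligand, where the probability density of $r_{RL_o}$ is proportional to $\exp[-\beta U(r_{RL_o})]$. Hence, $\Delta\Delta G^\circ_{RL}$ can be estimated using the holo estimator,

$$\Delta\Delta\hat{G}^\circ_{RL} = -k_B T \ln \sum_{r_{RL_o}} w_h(r_{RL_o}) e^{-\beta [\hat{B}(r_R) - \Psi(r_{RL_o})]} - \Delta\Delta\hat{G}^\circ_{RL_o}, \tag{12}$$
$$\Delta\Delta\hat{G}^\circ_{RL_o} = -k_B T \ln \sum_{r_{RL_o}} w_h(r_{RL_o}) e^{-\beta [\hat{B}_o(r_R) - \Psi(r_{RL_o})]}, \tag{13}$$

where $w_h(r_{RL_o})$ is the statistical weight of $r_R$ in the holo ensemble of the receptor complexed with the reference ligand. If receptor snapshots are drawn from the holo ensemble, then $w_h(r_{RL_o}) = 1/n$, where $n$ is the number of receptor samples. $\hat{B}_o(r_R)$ is the estimate of $B_o(r_R)$, the BPMF of the reference ligand.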

The quantity estimated by Eq. (13), $\Delta\Delta G^\circ_{RL_o}$, is the “self” relative binding free energy (the relative binding free energy difference between the reference ligand and itself), which should be zero in the limit of infinite sampling. Because the estimated value can be nonzero for a finite number of receptor snapshots, correcting Eq. (12) by $\Delta\Delta\hat{G}^\circ_{RL_o}$ ensures that the free energy difference between the reference ligand and itself is zero regardless of the number of receptor samples.
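Equations (12) and (13) can be sketched together, since the “self” term reuses the same weights and interaction energies. The code below is an illustrative sketch under assumed conventions (energies in kcal/mol, 300 K default, uniform weights when none are supplied); the function names are hypothetical and it is not the authors' script.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def log_sum_exp(terms):
    """Numerically stable log(sum(exp(t) for t in terms))."""
    largest = max(terms)
    return largest + math.log(sum(math.exp(t - largest) for t in terms))


def holo_exponential_average(bpmfs, interaction_energies, weights, kT):
    """-kT ln sum_i w_i exp(-(B_i - Psi_i) / kT), the sum in Eqs. (12)-(13)."""
    log_terms = [
        math.log(w) - (b - psi) / kT
        for w, b, psi in zip(weights, bpmfs, interaction_energies)
        if w > 0.0
    ]
    return -kT * log_sum_exp(log_terms)


def holo_estimator(bpmfs, ref_bpmfs, interaction_energies,
                   weights=None, temperature=300.0):
    """Return (corrected relative free energy, 'self' term).

    bpmfs                : BPMF estimates of the ligand of interest
    ref_bpmfs            : BPMF estimates of the reference ligand
    interaction_energies : Psi(r_RLo) for each holo snapshot
    weights              : holo-ensemble weights; uniform (1/n) if omitted
    """
    kT = KB_KCAL * temperature
    n = len(bpmfs)
    if weights is None:
        weights = [1.0 / n] * n
    raw = holo_exponential_average(bpmfs, interaction_energies, weights, kT)
    self_term = holo_exponential_average(ref_bpmfs, interaction_energies, weights, kT)
    return raw - self_term, self_term


if __name__ == "__main__":
    # Hypothetical numbers for three receptor snapshots (kcal/mol).
    psi = [-12.0, -10.5, -11.0]
    ref = [-11.5, -10.0, -11.2]
    lig = [-9.0, -9.5, -8.0]
    print(holo_estimator(lig, ref, psi))
```

Subtracting the self term guarantees that passing the reference ligand's own BPMFs as `bpmfs` returns exactly zero, which mirrors the correction described above.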

In the limit of infinite receptor sampling, Eqs. (9) and (12) should converge to the same values for binding free energies. Based on a finite number of BPMFs, however, they provide numerically distinct estimates. This can be seen by noting that the factor $w_a(r_R)$ that appears in Eq. (9) is not equal to the factor $w_h(r_{RL_o}) e^{\beta \Psi(r_{RL_o})}$ that appears in Eq. (12); their ratio,

$$\frac{w_h(r_{RL_o}) e^{\beta \Psi(r_{RL_o})}}{w_a(r_R)} \propto e^{-\beta U(r_{L_o})},$$

is not equal to one. Reweighting the holo to the apo ensemble requires not only the interaction energy but also the potential energy of the ligand by itself.

III. COMPUTATIONAL METHODS

The calculations in this study were built on the foundation laid by Xie, Nguyen, and Minh.24 In the previous work, YANK,10 which implements an alchemical pathway method using a flexible receptor, was used to calculate absolute binding free energies between T4 lysozyme and 24 small organic molecules. The calculations yielded precise free energy estimates, with a standard deviation less than 0.5 kcal/mol for 22 systems and less than 1 kcal/mol for all 24 systems. YANK results for 18 systems were used as benchmarks for our ILT-based estimators. The other 6 ligands—methylpyrrole, benzene, p-xylene, phenol, n-hexylbenzene, and DL-camphor—were used as reference ligands.

In the present work, we used the Jensen-Shannon divergence (JSD)26 to compare different ensembles sampled in the YANK calculations. The benchmark holo ensembles were compared to the reference holo ensembles and to the apo ensemble. Based on 15 000 snapshots from each ensemble, a principal components analysis was performed based on heavy atoms within 5 Å of valine 111, and 2D probability densities were calculated using a Gaussian kernel density estimate with a bandwidth of 0.1. The probability densities were compared by the JSD, defined as

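$$D_{JS}(p \| q) = \frac{1}{2} D_{KL}(p \| m) + \frac{1}{2} D_{KL}(q \| m), \tag{14}$$

where $p$, $q$, and $m = \frac{1}{2}(p + q)$ are multivariate probability density functions. $D_{KL}$ is the Kullback-Leibler divergence, given by

$$D_{KL}(p \| m) = \int p(x) \ln \frac{p(x)}{m(x)} \, dx, \tag{15}$$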

where x is, in general, a multivariate variable.
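For densities evaluated on a common grid, Eqs. (14) and (15) reduce to sums. The sketch below assumes the two densities have already been estimated (for example by a kernel density estimate on the principal component projection) and discretized on the same grid; the function names are hypothetical. With the natural logarithm of Eq. (15), the JSD ranges from 0 for identical densities to ln 2 for densities with no overlap.

```python
import math


def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence sum p ln(p/q), natural log.

    Bins with p == 0 contribute nothing; q must be nonzero wherever p is.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Discrete Jensen-Shannon divergence between two gridded densities."""
    total_p, total_q = sum(p), sum(q)
    p = [value / total_p for value in p]   # normalize to unit probability
    q = [value / total_q for value in q]
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical flattened 2D densities on a shared four-bin grid.
    apo = [0.1, 0.4, 0.4, 0.1]
    holo = [0.0, 0.2, 0.6, 0.2]
    print(js_divergence(apo, holo))
```

Because the mixture $m$ is nonzero wherever either input is nonzero, the JSD stays finite even when the two ensembles do not overlap, unlike the Kullback-Leibler divergence between $p$ and $q$ directly.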

For the ILT-based binding free energy calculations in the study of Xie, Nguyen, and Minh,24 groups of 96 receptor snapshots were extracted from every alchemical state of YANK simulations with the reference ligands. (Subsequently, the term group will be used to refer to snapshots drawn from a simulation with a single reference ligand.) BPMFs between 141 ligands and all 576 receptor snapshots were calculated using AlGDock20 with the OBC2 generalized Born/surface area implicit solvent model.27 In the AlGDock calculations, the AMBER ff14 force field28 was used for the receptor and the generalized AMBER force field29 with Bondi radii30 and AM1BCC partial charges31,32 for ligands. After obtaining the BPMFs, Eq. (9) was used to compute absolute binding free energies. The multistate Bennett Acceptance Ratio (MBAR)33 was used to compute weights (which we will refer to as MBAR weights) for all receptor snapshots in the apo ensemble (see Ref. 24 for details). Using MBAR weights for $w_a(r_R)$, however, was found to not be the most numerically stable scheme. Better results were obtained with weighting scheme (c) in the study of Xie, Nguyen, and Minh,24 in which each of the selected snapshots was assigned the cumulative MBAR weight of its alchemical state and each group was given equal weight.

In this work, Eq. (12) was used with the same BPMFs to calculate binding free energies for the 24 benchmark ligands relative to each of the 6 reference ligands. Each free energy estimate used up to 96 receptor snapshots from each group. Interaction energies $\Psi(r_{RL_o})$ were calculated using a Python script based on OpenMM 6.3.1.34,35 The weights $w_h(r_{RL_o})$ were calculated using MBAR.33 Unlike in the study of Xie, Nguyen, and Minh,24 assigning each selected snapshot the cumulative MBAR weight of the alchemical state from which it was sampled yielded poor results: an extremely large root mean square error (RMSE) and essentially no correlation with respect to YANK (see Fig. S1 of the supplementary material). This poor performance resulted from steric overlap between the ligand and receptor in some snapshots from mostly decoupled alchemical states, which leads to large $\Psi(r_{RL_o})$. Therefore, $w_h(r_{RL_o})$ in Eq. (12) was based on each snapshot’s MBAR weight (this is actually a more statistically rigorous procedure).

Free energy calculations based on receptor snapshots from YANK calculations with different ligands were combined using simple rules. Starting with ΔΔĜRL° from the holo estimator, ΔĜRL° was calculated by adding the YANK free energy for the reference ligand. Given multiple ΔĜRL° from the holo estimator with multiple different ligands, the minimum value was taken as the consensus estimate. As a comparison, ΔĜRL° was also calculated using the best estimator from the study of Xie, Nguyen, and Minh24—Eq. (9) with wa(rR) based on the cumulative MBAR weight of alchemical states. The consensus estimate was taken to be the exponential average of the BPMF across all 6 groups.
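The two combination rules described above can be sketched as follows. The function names, the dictionary layout, and the numbers in the usage example are hypothetical. The apo consensus here follows the weighting described in this section (weights normalized within each group, each group given equal weight) and omits the standard-state term of Eq. (9) for brevity.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def log_sum_exp(terms):
    """Numerically stable log(sum(exp(t) for t in terms))."""
    largest = max(terms)
    return largest + math.log(sum(math.exp(t - largest) for t in terms))


def consensus_holo(relative_estimates, reference_free_energies):
    """Minimum over groups of (relative estimate + reference free energy).

    Both arguments map a reference ligand name to a value in kcal/mol.
    Returns the consensus absolute estimate and the group that supplied it.
    """
    absolute = {
        name: ddg + reference_free_energies[name]
        for name, ddg in relative_estimates.items()
    }
    best = min(absolute, key=absolute.get)
    return absolute[best], best


def consensus_apo(group_bpmfs, group_weights, temperature=300.0):
    """Exponential average of BPMFs pooled over groups, equal weight per group."""
    kT = KB_KCAL * temperature
    n_groups = len(group_bpmfs)
    log_terms = []
    for bpmfs, weights in zip(group_bpmfs, group_weights):
        total = sum(weights)
        for b, w in zip(bpmfs, weights):
            if w > 0.0:
                # weight normalized within the group, then shared across groups
                log_terms.append(math.log(w / (total * n_groups)) - b / kT)
    return -kT * log_sum_exp(log_terms)


if __name__ == "__main__":
    ddg = {"benzene": 1.0, "phenol": -0.5}              # hypothetical
    reference_dg = {"benzene": -5.0, "phenol": -4.0}    # hypothetical
    print(consensus_holo(ddg, reference_dg))
```

Taking the minimum keeps only one group's estimate per ligand, whereas the pooled exponential average lets every snapshot contribute in proportion to its weight.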

IV. RESULTS

A. For single groups, the holo estimator is generally more consistent with YANK than the apo estimator

Although both the apo and holo estimators should be consistent with each other and with YANK in the limit of asymptotic sampling, their performance can be compared based on results for a finite number of receptor snapshots.

For the benchmark ligands, performance was dependent on the group, but the holo estimator was generally more consistent with YANK (Figs. 2 and 3 and Table I).36 Overall, the mean and median of Pearson’s R are higher with the holo estimator (0.88 and 0.89, respectively) than with the apo estimator (0.78 and 0.79, respectively). In 4 of 6 groups, the Pearson’s R is significantly higher for the holo estimator. For the p-xylene and DL-camphor groups, the Pearson’s R for the two estimators is comparable. On a related note, the least squares linear regression lines for the holo estimator have slopes closer to 1 except for the p-xylene and DL-camphor groups (Table I). Across the 6 groups, the mean and median RMSE are lower for the holo estimator (2.99 and 1.83 kcal/mol, respectively) than for the apo estimator (3.41 and 3.56 kcal/mol, respectively). The RMSE of the holo estimator is lower in 4 of the 6 groups. In the exceptions, the DL-camphor and n-hexylbenzene groups, much of the error comes from a constant offset, as evidenced by the large y-intercept in the linear regression.

FIG. 2.

Binding free energies for the benchmark ligands estimated by YANK (x-axis) and AlGDock (y-axis) using the apo estimator (left column) or holo estimator (right column). Active ligands are shown as small dots and inactive ones as diamonds. Error bars denote the standard deviation from three independent YANK calculations (x-axis) or from bootstrapping BPMFs (y-axis), with the range of error bars representing a single standard deviation. A least-squares linear regression is shown as a dashed line. Each row corresponds to a group. Results for other ligands are shown in Fig. 3.

FIG. 3.

Binding free energies for the benchmark ligands, continued from Fig. 2.

TABLE I.

Comparing the consistency of apo and holo estimators to YANK free energies. Statistics are based on the benchmark ligands. The RMSE is in kcal/mol.

| Reference ligand | Apo: linear regression | Apo: RMSE | Apo: Pearson’s R | Holo: linear regression | Holo: RMSE | Holo: Pearson’s R |
|---|---|---|---|---|---|---|
| Methylpyrrole | y = 0.74x + 1.73 | 3.39 (0.38) | 0.81 (0.08) | y = 1.08x + 0.41 | 1.20 (0.31) | 0.94 (0.03) |
| Benzene | y = 0.56x + 2.08 | 4.48 (0.45) | 0.77 (0.08) | y = 0.89x − 0.89 | 1.48 (0.46) | 0.89 (0.06) |
| p-xylene | y = 1.2x + 0.05 | 2.14 (0.43) | 0.89 (0.05) | y = 0.75x − 1.02 | 1.72 (0.36) | 0.83 (0.07) |
| Phenol | y = 0.59x − 0.69 | 3.73 (0.7) | 0.59 (0.18) | y = 0.73x − 0.44 | 1.94 (0.33) | 0.82 (0.07) |
| DL-camphor | y = 1.07x + 1.65 | 1.96 (0.19) | 0.92 (0.04) | y = 0.88x + 3.23 | 3.98 (0.26) | 0.91 (0.04) |
| n-hexylbenzene | y = 0.67x + 2.68 | 4.76 (0.69) | 0.69 (0.11) | y = 0.98x + 7.41 | 7.65 (0.33) | 0.89 (0.05) |
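The three statistics reported in Table I can be computed from paired free energies as in the sketch below, where `x` plays the role of the YANK benchmark and `y` the ILT-based estimate. The function name and the toy data are hypothetical. The example is constructed so that the estimates track the benchmark perfectly apart from a constant shift, which shows how a high Pearson’s R can coexist with a large RMSE.

```python
import math


def compare_to_benchmark(x, y):
    """Least-squares line y = slope * x + intercept, RMSE, and Pearson's R."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)) / n)
    pearson_r = sxy / math.sqrt(sxx * syy)
    return {"slope": slope, "intercept": intercept,
            "rmse": rmse, "pearson_r": pearson_r}


if __name__ == "__main__":
    benchmark = [-6.0, -5.0, -4.0]   # hypothetical benchmark values, kcal/mol
    estimate = [-5.0, -4.0, -3.0]    # same trend, shifted by +1 kcal/mol
    print(compare_to_benchmark(benchmark, estimate))
```

In this toy case the slope and Pearson’s R are both 1, while the RMSE and the intercept both equal the 1 kcal/mol offset.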

The offset observed when using the holo estimator for the DL-camphor and n-hexylbenzene groups is due to the “self” relative binding free energy estimates. On one hand, the estimates have large variance. For the DL-camphor and n-hexylbenzene groups, which led to the highest RMSE, the standard deviation of the estimate is significantly larger than with the other ligands (Table II). It is also notable that for n-hexylbenzene, ΔΔĜRLo° is negative, contributing to the large positive deviation observed in ILT-based ΔΔGRL° calculations with this reference ligand.

TABLE II.

“Self” relative binding free energies for 6 reference ligands. The numbers in parentheses are standard deviations estimated by bootstrapping—drawing N samples from the previously sampled population with replacement—100 times.

| Reference ligand | $\Delta\Delta\hat{G}^\circ_{RL_o}$ (kcal/mol) |
|---|---|
| Methylpyrrole | 4.06 (0.41) |
| Benzene | 4.56 (0.49) |
| p-xylene | 4.61 (0.98) |
| Phenol | 1.96 (0.85) |
| DL-camphor | 4.15 (1.41) |
| n-hexylbenzene | −1.81 (2.69) |
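The bootstrap procedure in the caption of Table II can be sketched as follows. This is a simplified illustration that assumes uniform snapshot weights (the paper's estimates use MBAR weights); the function names, the fixed random seed, and the example values are assumptions.

```python
import math
import random


def exponential_average(values, kT):
    """-kT ln <exp(-v / kT)> with uniform weights, shifted for stability."""
    lowest = min(values)
    mean_exp = sum(math.exp(-(v - lowest) / kT) for v in values) / len(values)
    return lowest - kT * math.log(mean_exp)


def bootstrap_std(values, estimator, n_resamples=100, seed=0):
    """Standard deviation of an estimator over bootstrap resamples.

    Each resample draws len(values) items from `values` with replacement.
    """
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    mean = sum(estimates) / n_resamples
    variance = sum((e - mean) ** 2 for e in estimates) / (n_resamples - 1)
    return math.sqrt(variance)


if __name__ == "__main__":
    kT = 0.0019872041 * 300.0  # kcal/mol at an assumed 300 K
    # Hypothetical exponents B_o - Psi for six snapshots, in kcal/mol.
    exponents = [-8.0, -2.0, -1.0, 0.0, -3.0, -5.0]
    print(bootstrap_std(exponents, lambda v: exponential_average(v, kT)))
```

Because the exponential average is dominated by the lowest values, resamples that happen to omit them shift the estimate strongly, which is one way a broad distribution produces the large standard deviations seen for some reference ligands.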

B. For single groups, the two estimators converge at a similar rate

The convergence of free energy calculations to their values for 96 snapshots is similar between the two estimators. The RMSE for both estimators follows a similar decreasing trend and starts to level off at essentially the same number of receptor snapshots (Fig. 4). Noticeable differences between the two estimators can be seen for the benzene group, where the holo estimator shows slightly faster convergence, and the DL-camphor group, where the opposite is true. Nevertheless, these differences may not be statistically significant because they are contained within the error bars. In 4 of the 6 groups in Fig. 5, the convergence of Pearson’s R is not very different between the two estimators. The holo estimator seems to converge slightly faster for the n-hexylbenzene group and slower for the DL-camphor group. Therefore, on average, the two estimators perform equally well in terms of convergence with respect to the number of receptor snapshots.

FIG. 4.

Convergence of binding free energies. RMSE with respect to the final results of binding free energies estimated using apo (blue solid lines) and holo (green dashed lines) estimators for single groups. Error bars denote the standard deviation from bootstrapping. Different panels correspond to different reference ligands.

FIG. 5.

Convergence of binding free energies. Pearson’s R with respect to the final results of binding free energies estimated using apo (blue solid lines) and holo (green dashed lines) estimators for single groups. Error bars denote the standard deviation from bootstrapping. Different panels correspond to different reference ligands.

C. For multiple groups, the estimators are equally consistent with YANK but the holo estimator converges more slowly

In contrast to using single groups, consensus estimates for the two estimators are equally consistent with YANK (Fig. 6). The holo estimator gives slightly higher RMSE (1.74 ± 0.45 kcal/mol) than the apo estimator (1.64 ± 0.22 kcal/mol), but the difference is within their error bars. The Pearson’s R is also the same, 0.90 ± 0.04 for the apo estimator and 0.89 ± 0.04 for the holo estimator.

FIG. 6.

Binding free energies for 24 ligands estimated by YANK (x-axis) and AlGDock (y-axis) using the apo estimator (left) or holo estimator (right). Results from 6 groups were combined using the exponential average across BPMFs from all groups for the apo estimator or the minimum of six estimates for the holo estimator. Active ligands are shown as dots and inactive ones as diamonds. Error bars denote the standard deviation from three independent YANK calculations (x-axis) or from bootstrapping BPMFs (y-axis), with the range of error bars representing a single standard deviation. The linear regression is shown as a dashed line.

When using a consensus estimate from multiple groups, the holo estimator converges to its final value for 576 snapshots more slowly than the apo estimator (Fig. 7). Both the RMSE and the Pearson’s R for the holo estimator level off at about 150 receptor snapshots, whereas for the apo estimator, they level off after about only 20-50 receptor snapshots.

FIG. 7.

Convergence of binding free energies. RMSE (left) and Pearson’s R (right) with respect to the final results of binding free energies estimated using apo (blue solid lines) and holo (green dashed lines) estimators for receptor snapshots combined from 6 YANK simulations. Error bars denote the standard deviation from bootstrapping.

D. Configuration space overlap is associated with accurate ΔGRL° estimates

Hitherto, we have used the same BPMFs for both the apo and holo estimators. While this approach has enabled us to directly compare the estimators, it does not specifically address the relationship between configuration space overlap and the accuracy of binding free energy calculations; receptor snapshots are from the same configuration space. In this section, we explore this relationship by calculating ΔGRL° based on receptor snapshots drawn from only the apo or holo ensembles.

In our simulations of T4 lysozyme complexes, most benchmark holo ensembles are closer to most reference holo ensembles than to the apo ensemble, but there are important exceptions. The JSD between most benchmark holo ensembles and reference holo ensembles with methylpyrrole, benzene, p-xylene, and phenol is lower than with the apo ensemble, leading to a lower median JSD (Fig. 8). However, the JSD between these ensembles and the benchmark ligands nitrobenzene, methanol, benzaldehyde oxime, and 1,2-diiodobenzene is very high. (The latter three of these benchmark ligands are inactive.) Due to these outliers, the mean JSD between the benchmark holo ensembles and the reference holo ensembles is higher than with the apo ensemble. For the other two reference holo ensembles, with DL-camphor and n-hexylbenzene, the JSD between these ensembles and the benchmark holo ensembles is very high. As previously mentioned, these two groups lead to the largest RMSE (Table I and Fig. 3).

FIG. 8.

The Jensen-Shannon divergence between the benchmark holo ensembles (rows) and the apo and reference holo ensembles (columns).

Accurate ILT-based binding free energy estimates are associated with configuration space overlap between the reference ensemble and benchmark holo ensembles. When the JSD is low (below 0.1), all ILT-based estimates are close to the YANK estimates (within around 2 kcal/mol). On the other hand, when the JSD is higher, there is a much broader range of deviation from YANK (Fig. 9). Notably, for snapshots from the n-hexylbenzene and DL-camphor ensembles, the binding free energy estimated by ILT is significantly overestimated (weaker than YANK).

FIG. 9.

Jensen-Shannon divergence versus deviation of ILT estimates from YANK. Absolute binding free energy estimates using the apo estimator based on 36 BPMFs to receptor snapshots drawn from the apo state of six YANK simulations are shown as red circles. Relative binding free energy estimates using the holo estimator based on 6 BPMFs to receptor snapshots drawn from the holo state of the YANK simulation for the reference ligand are shown as follows: methylpyrrole (blue downward triangles), benzene (green upward triangles), p-xylene (cyan leftward triangles), phenol (magenta rightward triangles), DL-camphor (yellow squares), and n-hexylbenzene (black diamonds).

V. DISCUSSION

The most parsimonious explanation for the aforementioned JSD relationships is that binding of our ligands to T4 lysozyme occurs through conformational selection, similar to the right panel of Fig. 1. All of the benchmark holo ensembles appear to be subregions of the apo ensemble, as evidenced by the relatively low mean, median, and maximum JSD. Different holo ensembles, however, make up potentially distinct subregions of the apo ensemble. Most benchmark ligands, except for nitrobenzene, methanol, benzaldehyde oxime, and 1,2-diiodobenzene, favor the same subregion as the reference ligands methylpyrrole, benzene, p-xylene, and phenol, but different subregions from DL-camphor and n-hexylbenzene.

The comparative performance of apo and holo $\Delta G^\circ_{RL}$ estimators may also be interpreted in terms of these configuration space relationships. The holo estimator outperforms the apo estimator for single groups because most benchmark holo ensembles are closer to reference holo ensembles than to apo ensembles. The similarity of ensembles means that reweighting factors have less variance and are more numerically stable. In cases when holo ensembles do not overlap, the holo estimator is able to accurately calculate $\Delta G^\circ_{RL}$ from groups because they contain snapshots from the apo ensemble. However, if snapshots are exclusively from the holo ensemble and configuration space overlap is weak, then binding free energies are significantly overestimated.

Although the holo estimator exhibits superior performance for single groups, it comes with a caveat that the “self” relative binding free energy may lead to a constant offset. Equation (13), which we used to calculate $\Delta\Delta G^\circ_{RL_o}$, is an exponential average. Exponential averages converge slowly if the exponent, in this case $-\beta[\hat{B}_o(r_R) - \Psi(r_{RL_o})]$, is broadly distributed. In the specific case of the n-hexylbenzene group, the negative estimate implies that snapshots for which $\hat{B}_o(r_R) < \Psi(r_{RL_o})$ dominate the distribution. In turn, this implies that sampling of ligand configurations in the YANK simulation was limited, such that BPMF calculations accessed lower-energy configurations. In future calculations, an imprecise or negative $\Delta\Delta G^\circ_{RL_o}$ may indicate problems with accurate absolute $\Delta G^\circ_{RL}$ estimates based on the given reference holo ensemble.
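A standard worked case makes the sensitivity of exponential averages to broad distributions concrete. Suppose, purely as an illustrative assumption that is not made in this paper, that the dimensionless quantity $x = \beta[\hat{B}_o(r_R) - \Psi(r_{RL_o})]$ is normally distributed over the sampled snapshots with mean $\mu$ and variance $\sigma^2$. Then the average in Eq. (13) has a closed form:

```latex
\begin{aligned}
\left\langle e^{-x} \right\rangle
  &= \int \frac{1}{\sqrt{2\pi\sigma^2}}\,
     e^{-(x-\mu)^2 / 2\sigma^2}\, e^{-x} \, dx
   = e^{-\mu + \sigma^2/2}, \\
-k_B T \ln \left\langle e^{-x} \right\rangle
  &= k_B T \left( \mu - \frac{\sigma^2}{2} \right).
\end{aligned}
```

Under this assumption the result lies below the mean by $k_B T \sigma^2 / 2$, and the integrand $e^{-x} p(x)$ peaks at $x = \mu - \sigma^2$, that is, $\sigma$ standard deviations below the mean. A finite set of snapshots rarely samples that tail when $\sigma$ is large, so the estimate depends heavily on a few low-$x$ snapshots and has a large variance.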

Another limitation of the holo estimator is in the analysis of multiple groups. In the apo estimator, methods for combining data from multiple groups24 do not discard any information. On the other hand, our consensus holo estimator based on the minimum ΔGRL° from multiple independent calculations essentially discards information from all but one group. This inefficient use of data is the likely reason that the consensus holo estimator converges to its final value more slowly.

Given these advantages and disadvantages of the holo estimator, the appropriate choice of estimator for future studies will likely depend on the receptor sampling scheme. If only receptor snapshots from an alchemical simulation with a single complex are available, the holo estimator is likely to be superior. The holo estimator is even more likely to perform better when the receptor is sampled from a single holo ensemble as opposed to a series of alchemical states. On the other hand, the apo estimator converges faster when combining receptor snapshots from multiple alchemical simulations. It is also obviously more suited to situations when the receptor is sampled from the apo ensemble.

The present framework for relative binding free energy calculations has some similarities but is distinct from that developed by Oostenbrink and van Gunsteren.37 Both approaches rely on precomputing a holo ensemble, which can be of an arbitrary (and possibly unphysical) reference state. Based on structural and chemical intuition, Oostenbrink and van Gunsteren37 designed large and soft reference compounds to induce receptor conformations that are relevant for binding to a diverse set of ligands. This sampling strategy can also be applied for the current framework to generate samples for our new estimator, Eq. (12). However, our approach differs in that we do not compute the free energy of transforming a ligand into a reference compound. To calculate these transformation free energies, Oostenbrink and van Gunsteren37 developed clever schemes to map the coordinates of the mostly planar aromatic ligands to the reference compounds. On the other hand, Eq. (12) is based on independent BPMF calculations between each ligand and structures from the apo ensemble. Our approach is more computationally expensive but applicable to libraries of even more diverse and flexible ligands.

Very recently, Jandova et al.38 adapted the one-step perturbation framework of Oostenbrink and van Gunsteren37 to calculate the free energy of amino acid mutagenesis. They used unphysical reference amino acids that incorporated soft-core particles. Free energies of transforming the reference amino acids into standard amino acids were computed by mapping different rotational orientations of a precomputed library of amino acid side chain conformations onto the reference state. This strategy could potentially be applied to relative binding free energies for ligands with flexible regions branching from a common scaffold.

VI. CONCLUSIONS

We have derived a new binding free energy estimator for ILT. In contrast to the previous estimator, the new one is based on a holo ensemble and gives binding free energies relative to the reference ligand that defines the holo ensemble. For receptor snapshots generated by alchemical simulations of T4 lysozyme and six different ligands, we have demonstrated that, when using a single group, the newly derived estimator is generally more accurate than the previous apo estimator and converges at a similar rate. When combining multiple groups, the two estimators are equally accurate in the limit of many receptor snapshots, but the holo estimator converges more slowly. The accuracy of the holo estimator appears to depend on the configuration space overlap between the reference ensemble and the holo ensemble for the ligand of interest.

SUPPLEMENTARY MATERIAL

See supplementary material for binding free energies estimated using the holo estimator when each receptor snapshot is assigned the MBAR weight of the alchemical state it represents instead of the MBAR weight of the individual snapshot (Fig. S1). Figure S2 shows mean holo-ensemble MBAR weights for snapshots drawn from each alchemical state.

ACKNOWLEDGMENTS

We thank Bing Xie for providing data from Xie, Nguyen, and Minh.24 This research was supported by the National Institutes of Health (Grant No. R15GM114781).

REFERENCES

  • 1. Jorgensen W. L., Science 303, 1813 (2004). doi:10.1126/science.1096361
  • 2. Gilson M. K. and Zhou H.-X., Annu. Rev. Biophys. Biomol. Struct. 36, 21 (2007). doi:10.1146/annurev.biophys.36.040306.132550
  • 3. Michel J. and Essex J. W., J. Comput.-Aided Mol. Des. 24, 639 (2010). doi:10.1007/s10822-010-9363-3
  • 4. Chodera J. D., Mobley D. L., Shirts M. R., Dixon R. W., Branson K., and Pande V. S., Curr. Opin. Struct. Biol. 21, 150 (2011). doi:10.1016/j.sbi.2011.01.011
  • 5. Mobley D. L. and Klimovich P. V., J. Chem. Phys. 137, 230901 (2012). doi:10.1063/1.4769292
  • 6. Gilson M. K., Given J. A., Bush B. L., and McCammon J. A., Biophys. J. 72, 1047 (1997). doi:10.1016/s0006-3495(97)78756-3
  • 7. Deng Y. and Roux B., J. Phys. Chem. B 113, 2234 (2009). doi:10.1021/jp807701h
  • 8. Bekker G.-J., Kamiya N., Araki M., Fukuda I., Okuno Y., and Nakamura H., J. Chem. Theory Comput. 13, 2389 (2017). doi:10.1021/acs.jctc.6b01127
  • 9. Heinzelmann G., Henriksen N. M., and Gilson M. K., J. Chem. Theory Comput. 13, 3260 (2017). doi:10.1021/acs.jctc.7b00275
  • 10. Wang K., Chodera J. D., Yang Y., and Shirts M. R., J. Comput.-Aided Mol. Des. 27, 989 (2013). doi:10.1007/s10822-013-9689-8
  • 11. Michel J. and Essex J. W., J. Med. Chem. 51, 6654 (2008). doi:10.1021/jm800524s
  • 12. Boyce S. E., Mobley D. L., Rocklin G. J., Graves A. P., Dill K. A., and Shoichet B. K., J. Mol. Biol. 394, 747 (2009). doi:10.1016/j.jmb.2009.09.049
  • 13. Ge X. and Roux B., J. Phys. Chem. B 114, 9525 (2010). doi:10.1021/jp100579y
  • 14. Wang L., Berne B. J., and Friesner R. A., Proc. Natl. Acad. Sci. U. S. A. 109, 1937 (2012). doi:10.1073/pnas.1114017109
  • 15. Zhu S., Travis S. M., and Elcock A. H., J. Chem. Theory Comput. 9, 3151 (2013). doi:10.1021/ct400104x
  • 16. Wang L., Wu Y., Deng Y., Kim B., Pierce L., Krilov G., Lupyan D., Robinson S., Dahlgren M. K., Greenwood J., Romero D. L., Masse C., Knight J. L., Steinbrecher T., Beuming T., Damm W., Harder E., Sherman W., Brewer M., Wester R., Murcko M., Frye L., Farid R., Lin T., Mobley D. L., Jorgensen W. L., Berne B. J., Friesner R. A., and Abel R., J. Am. Chem. Soc. 137, 2695 (2015). doi:10.1021/ja512751q
  • 17. Aldeghi M., Heifetz A., Bodkin M. J., Knapp S., and Biggin P. C., Chem. Sci. 7, 207 (2016). doi:10.1039/c5sc02678d
  • 18. Wan S., Bhati A. P., Zasada S. J., Wall I., Green D., Bamborough P., and Coveney P. V., J. Chem. Theory Comput. 13, 784 (2017). doi:10.1021/acs.jctc.6b00794
  • 19. Minh D. D. L., J. Chem. Phys. 137, 104106 (2012). doi:10.1063/1.4751284
  • 20. Minh D. D. L., e-print arXiv:1507.03703v1 (2015).
  • 21. Nguyen T. H., Zhou H.-X., and Minh D. D. L., J. Comput. Chem. 39, 621 (2017). doi:10.1002/jcc.25139
  • 22. Pattabiraman N., Levitt M., Ferrin T. E., and Langridge R., J. Comput. Chem. 6, 432 (1985). doi:10.1002/jcc.540060510
  • 23. Meng E. C., Shoichet B. K., and Kuntz I. D., J. Comput. Chem. 13, 505 (1992). doi:10.1002/jcc.540130412
  • 24. Xie B., Nguyen T. H., and Minh D. D. L., J. Chem. Theory Comput. 13, 2930 (2017). doi:10.1021/acs.jctc.6b01183
  • 25. Gallicchio E., Lapelosa M., and Levy R. M., J. Chem. Theory Comput. 6, 2961 (2010). doi:10.1021/ct1002913
  • 26. Lin J., IEEE Trans. Inf. Theory 37, 145 (1991). doi:10.1109/18.61115
  • 27. Onufriev A., Bashford D., and Case D. A., Proteins: Struct., Funct., Bioinf. 55, 383 (2004). doi:10.1002/prot.20033
  • 28. Ponder J. W. and Case D. A., Adv. Protein Chem. 66, 27 (2003). doi:10.1016/s0065-3233(03)66002-x
  • 29. Wang J., Wang W., Kollman P. A., and Case D. A., J. Mol. Graphics Modell. 25, 247 (2006). doi:10.1016/j.jmgm.2005.12.005
  • 30. Bondi A., J. Phys. Chem. 68, 441 (1964). doi:10.1021/j100785a001
  • 31. Jakalian A., Bush B. L., Jack D. B., and Bayly C. I., J. Comput. Chem. 21, 132 (2000).
  • 32. Jakalian A., Jack D. B., and Bayly C. I., J. Comput. Chem. 23, 1623 (2002). doi:10.1002/jcc.10128
  • 33. Shirts M. R. and Chodera J. D., J. Chem. Phys. 129, 124105 (2008). doi:10.1063/1.2978177
  • 34. Eastman P. and Pande V. S., Comput. Sci. Eng. 12, 34 (2010). doi:10.1109/mcse.2010.27
  • 35. Eastman P., Friedrichs M., and Chodera J., J. Chem. Theory Comput. 9, 461 (2013). doi:10.1021/ct300857j
  • 36. The strong performance of the holo estimator implies that the weights are numerically stable. As expected, snapshots drawn from alchemical states near the holo state contribute the largest weights to the holo ensemble (see Fig. S2 in the supplementary material).
  • 37. Oostenbrink C. and van Gunsteren W. F., Proc. Natl. Acad. Sci. U. S. A. 102, 6750 (2005). doi:10.1073/pnas.0407404102
  • 38. Jandova Z., Fast D., Setz M., Pechlaner M., and Oostenbrink C., J. Chem. Theory Comput. 14, 894 (2018). doi:10.1021/acs.jctc.7b01099

Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
