The Journal of Chemical Physics. 2012 Sep 13;137(10):104106. doi: 10.1063/1.4751284

Implicit ligand theory: Rigorous binding free energies and thermodynamic expectations from molecular docking

David D. L. Minh
PMCID: PMC3460968  PMID: 22979849

Abstract

A rigorous formalism for estimating noncovalent binding free energies and thermodynamic expectations from calculations in which receptor configurations are sampled independently from the ligand is derived. Due to this separation, receptor configurations only need to be sampled once, facilitating the use of binding free energy calculations in virtual screening. Demonstrative calculations on a host-guest system yield good agreement with previous free energy calculations and isothermal titration calorimetry measurements. Implicit ligand theory provides guidance on how to improve existing molecular docking algorithms and insight into the concepts of induced fit and conformational selection in noncovalent macromolecular recognition.

INTRODUCTION

The goal of molecular docking is to predict the most stable configuration of a noncovalent complex between a ligand and receptor. Based on this configuration, the complex is assigned a score which may be used to approximately rank the binding affinity of one ligand to the receptor versus another. Molecular docking has many potential applications, and has been most prominently applied to the virtual screening1, 2 of chemical libraries to aid the development of pharmaceuticals.

Given the three-dimensional structure of a protein receptor, docking algorithms have proven reasonably adept at sampling stable conformations of small organic ligands in the complex. Unfortunately, current scoring functions perform poorly at predicting binding free energies.3, 4, 5 Hence, docking is typically used to filter a large library of potential ligands to a smaller binder-enriched library that may be pursued experimentally or by more accurate and expensive computational methods.6, 7, 8, 9, 10 Even in this capacity, however, scoring functions are inconsistent, frequently presenting false positives (ligands predicted to bind that actually have weak or no affinity) and false negatives (ligands predicted not to bind that actually have significant affinity). For example, docking programs often have difficulty distinguishing binding compounds from decoys in which the chemical connectivity has been randomized.3, 11 Improved scoring functions would increase the capability to discern binders from non-binders.

The improvement of scoring functions, however, has been hindered by the lack of a rigorous formalism for obtaining binding free energies from molecular docking. While molecular docking calculations are usually performed with a rigid receptor, existing formalisms for binding free energies require a flexible receptor. Here, I derive a formalism, implicit ligand theory, for estimating binding free energies and thermodynamic expectations based on docking ligands to rigid-receptor structures. I also describe practical aspects of statistical estimation, present example calculations, and discuss how physics-based (as opposed to empirical or knowledge-based) docking algorithms (see Ref. 12) may be modified to exploit it. Beyond molecular docking, implicit ligand theory provides insight into the concepts of induced fit and conformational selection in noncovalent macromolecular recognition.

THEORY

The standard binding free energy, the free energy of a noncovalent association between a receptor R and ligand L to form a complex RL, R + L ⇌ RL, is

$$\Delta G^\circ = -\beta^{-1} \ln \left( \frac{C^\circ C_{RL}}{C_R C_L} \right), \tag{1}$$

where β = (kBT)−1 is the inverse of Boltzmann's constant, kB, times the temperature in Kelvin, T, C° is the standard concentration (typically 1 M), and CX is the equilibrium concentration of species X ∈ {R, L, RL}.13
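As a small worked illustration of Eq. 1, the following sketch evaluates the standard binding free energy from equilibrium concentrations. The concentrations, the function name, and the choice of kcal/mol units are assumptions made here for illustration; they are not taken from the paper.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def standard_binding_free_energy(c_rl, c_r, c_l, temperature=300.0, c_standard=1.0):
    """Eq. 1: dG = -kT ln(C0 * C_RL / (C_R * C_L)), concentrations in mol/L."""
    kT = K_B * temperature
    return -kT * math.log(c_standard * c_rl / (c_r * c_l))

# Hypothetical equilibrium concentrations: 1 uM free receptor and ligand, 1 mM complex.
dg = standard_binding_free_energy(c_rl=1e-3, c_r=1e-6, c_l=1e-6)
print(f"Standard binding free energy: {dg:.2f} kcal/mol")
```

A more negative value corresponds to tighter binding; when all three concentrations equal the standard concentration the expression evaluates to zero.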

Statistical thermodynamics relates the standard binding free energy to a ratio of configurational partition functions14

$$\Delta G^\circ = -\beta^{-1} \ln \left( \frac{Z_{RL,N}\, Z_N}{Z_{R,N}\, Z_{L,N}} \, \frac{C^\circ}{8\pi^2} \right), \tag{2}$$
$$Z_{RL,N} = \int I_\xi \, e^{-\beta U(r_{RL}, r_S)} \, dr_{RL} \, dr_S, \tag{3}$$
$$Z_{Y,N} = \int e^{-\beta U(r_Y, r_S)} \, dr_Y \, dr_S, \tag{4}$$
$$Z_N = \int e^{-\beta U(r_S)} \, dr_S, \tag{5}$$

in which symmetry numbers and a small pressure-volume term have been omitted from Eq. 2. $Z_{RL,N}$ and $Z_{Y,N}$ are configurational partition functions of the complex and of the species Y ∈ {R, L}, respectively, in N molecules of solvent. The potential energy $U(r_X, r_S)$ depends on $r_X$, the internal coordinates of the receptor, ligand, or both in complex (the external degrees of freedom have been analytically integrated), and $r_S$, the coordinates of N molecules of solvent. The complex coordinates $r_{RL}$ may be decomposed into the internal coordinates of the receptor, $r_R$, and of the ligand, $r_L$, and six degrees of freedom describing their relative translation and rotation, $\xi_L$. For simplicity, Jacobians for the transformation from Cartesian coordinates to a system with separated internal and external degrees of freedom are not shown in Eqs. 3 and 4. In $Z_{RL,N}$, the indicator function $I_\xi \equiv I(\xi_L)$ takes values between 0 and 1 and determines whether the receptor and ligand are complexed or not. For tight-binding complexes, the binding free energy is insensitive to the precise definition of $I_\xi$.14

Implicit solvent theory

The configurational integrals in Eq. 2 may be expressed in a formally equivalent but simpler form using implicit-solvent theory.14 In implicit solvent theory, the interaction energy is defined as ψ(rX, rS) = U(rX, rS) − U(rX) − U(rS), where U(rX) is the potential energy of species X by itself and U(rS) the potential energy of the solvent by itself. By integrating the configurational partition functions over rS, we may define the ratios

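$$Z_{RL} \equiv \frac{Z_{RL,N}}{Z_N} = \int I_\xi \, e^{-\beta [U(r_{RL}) + W(r_{RL})]} \, dr_{RL}, \tag{6}$$
$$Z_Y \equiv \frac{Z_{Y,N}}{Z_N} = \int e^{-\beta [U(r_Y) + W(r_Y)]} \, dr_Y, \tag{7}$$

where

$$W(r_X) = -\beta^{-1} \ln \frac{\int e^{-\beta \psi(r_X, r_S)} \, e^{-\beta U(r_S)} \, dr_S}{\int e^{-\beta U(r_S)} \, dr_S} \tag{8}$$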

is a potential of mean force (PMF) that can be interpreted as the constant-pressure reversible work of transferring the species X from the gas phase into the solvent. In biomolecular modeling, W(rX) is frequently estimated as the sum of an electrostatic term from the Poisson-Boltzmann (PB) equation15 (or the generalized Born approximation16), and a non-electrostatic term, which to a first approximation is proportional to the molecular surface area.

In terms of implicit solvent configurational integrals, the standard binding free energy is

$$\Delta G^\circ = -\beta^{-1} \ln \left( \frac{Z_{RL}}{Z_R Z_L} \, \frac{C^\circ}{8\pi^2} \right). \tag{9}$$

As most implicit solvent models fail to account for specific interactions, such as hydrogen bonding, that can have important structural and energetic consequences, binding free energy calculations in implicit solvent are generally expected to be less accurate than those in explicit solvent.17 Nevertheless, binding free energy calculations in implicit solvent have yielded promising agreement with experimental results (e.g., Refs. 18, 19, 20, 21).

Implicit ligand theory

The development of implicit ligand theory is very similar to that of implicit solvent theory. It involves defining the effective potential as $\mathcal{U}(r_X) = U(r_X) + W(r_X)$, the effective interaction energy as $\Psi(r_{RL}) = \mathcal{U}(r_{RL}) - \mathcal{U}(r_R) - \mathcal{U}(r_L)$, and

$$B(r_R) = -\beta^{-1} \ln \frac{\int I_\xi \, e^{-\beta \Psi(r_{RL})} \, e^{-\beta \mathcal{U}(r_L)} \, dr_L \, d\xi_L}{\int I_\xi \, e^{-\beta \mathcal{U}(r_L)} \, dr_L \, d\xi_L} \equiv -\beta^{-1} \ln \left\langle e^{-\beta \Psi} \right\rangle_{L,I}^{r_L,\xi_L}, \tag{10}$$

which is a potential of mean force that will subsequently be referred to as the binding PMF. Throughout this paper, angled brackets $\langle \ldots \rangle_{X,\ldots}^{r}$ will be used to denote an ensemble average over the coordinates r listed in the superscript with respect to the density proportional to $q_{X,\ldots}$, where X describes the coordinates in the effective potential $\mathcal{U}(r_X)$, and … are labels. Here, $q_{L,I}(r_L, \xi_L) = I_\xi \, e^{-\beta \mathcal{U}(r_L)}$. Within angled brackets, I will use a shorthand notation in which functions implicitly depend on coordinates, e.g., $\Psi \equiv \Psi(r_{RL})$.

In terms of the binding PMF, Eq. 9 may be written as

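$$\begin{aligned}
\Delta G^\circ &= -\beta^{-1} \ln \left( \frac{\int I_\xi \, e^{-\beta \mathcal{U}(r_{RL})} \, dr_{RL}}{\int e^{-\beta \mathcal{U}(r_R)} \, dr_R \int e^{-\beta \mathcal{U}(r_L)} \, dr_L} \, \frac{C^\circ}{8\pi^2} \right) \\
&= -\beta^{-1} \ln \left( \frac{\int I_\xi \, e^{-\beta [\mathcal{U}(r_R) + \Psi(r_{RL}) + \mathcal{U}(r_L)]} \, dr_{RL}}{\int e^{-\beta \mathcal{U}(r_R)} \, dr_R \int e^{-\beta \mathcal{U}(r_L)} \, dr_L} \, \frac{C^\circ}{8\pi^2} \right) \\
&= -\beta^{-1} \ln \left( \frac{\int e^{-\beta [B(r_R) + \mathcal{U}(r_R)]} \, dr_R}{\int e^{-\beta \mathcal{U}(r_R)} \, dr_R} \, \frac{\Omega C^\circ}{8\pi^2} \right) \\
&\equiv -\beta^{-1} \ln \left\langle e^{-\beta B} \right\rangle_{R}^{r_R} + \Delta G_\xi,
\end{aligned} \tag{11}$$

where $\Omega = \int I_\xi \, d\xi_L$ (which may be analytically tractable) is the binding site volume, $\Delta G_\xi = -\beta^{-1} \ln \left( \Omega C^\circ / 8\pi^2 \right)$ is the free energy of confining the ligand external degrees of freedom to the binding site, and $q_R(r_R) = e^{-\beta \mathcal{U}(r_R)}$. Equations 10 and 11 are the central theoretical results of this paper.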
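To make the confinement term concrete, the sketch below evaluates ΔGξ = −β⁻¹ ln(ΩC°/8π²) for a spherical site in which all ligand orientations are allowed, so that Ω is the sphere volume times 8π². The 0.75 Å radius mirrors the site definition used in the demonstration later in the paper; the function name, the kcal/mol units, and the conversion of 1 M to one molecule per 1660.539 Å³ are choices made here for illustration.

```python
import math

K_B = 0.0019872041          # Boltzmann constant in kcal/(mol K)
STANDARD_VOLUME = 1660.539  # volume per molecule at 1 M, in cubic Angstroms

def confinement_free_energy(omega, temperature=300.0):
    """dG_xi = -kT ln(Omega * C0 / (8 pi^2)), with Omega in A^3 times rotational volume."""
    kT = K_B * temperature
    c_standard = 1.0 / STANDARD_VOLUME  # standard concentration in molecules per A^3
    return -kT * math.log(omega * c_standard / (8.0 * math.pi ** 2))

# Spherical site of radius 0.75 A with unrestricted orientations (assumed geometry).
omega = (4.0 / 3.0) * math.pi * 0.75 ** 3 * 8.0 * math.pi ** 2
dg_xi = confinement_free_energy(omega)
print(f"Confinement free energy: {dg_xi:.2f} kcal/mol")
```

Because such a site is much smaller than the volume per molecule at the standard concentration, the term is positive: confining the ligand costs free energy.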

Implicit ligand theory provides a rigorous framework for binding free energies that separates the sampling of receptor and ligand configurations. In Eq. 11, the receptor probability density is independent of any ligand configuration. Likewise, the probability density of ligand internal coordinates in Eq. 10 is independent from the receptor configuration. In practice, however, sampling from this ligand distribution may lead to slow convergence (this point will later be discussed in greater detail). The primary benefit of implicit ligand theory is that the computationally expensive step of sampling receptor configurations only needs to be performed once. Predicting binding free energies for a chemical library is then limited by the much faster process of sampling ligand conformations.

Thermodynamic expectations

In addition to estimating the binding free energy, implicit ligand theory may also be used to estimate expected values of observables in the bound ensemble. Observables may include, for example, the mean potential energy, interaction energy, or distance between a ligand and receptor atom. Towards this end, it is useful to define a rigid-receptor expectation of an observable O(rRL), weighted by the interaction energy,

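$$\Theta(r_R) = \frac{\int I_\xi \, O(r_{RL}) \, e^{-\beta \Psi(r_{RL})} \, e^{-\beta \mathcal{U}(r_L)} \, dr_L \, d\xi_L}{\int I_\xi \, e^{-\beta \mathcal{U}(r_L)} \, dr_L \, d\xi_L} \equiv \left\langle O \, e^{-\beta \Psi} \right\rangle_{L,I}^{r_L,\xi_L}. \tag{12}$$

If the observable is solely a function of the receptor configuration, then $\Theta(r_R)$ reduces to $O(r_R) \, e^{-\beta B(r_R)}$.

In terms of Eqs. 10 and 12, the expectation of $O(r_{RL})$ with respect to the density proportional to $q_{RL,I}(r_{RL}) = I_\xi \, e^{-\beta \mathcal{U}(r_{RL})}$ is

$$\begin{aligned}
\langle O \rangle_{RL,I}^{r_{RL}} &\equiv \frac{\int I_\xi \, O(r_{RL}) \, e^{-\beta \mathcal{U}(r_{RL})} \, dr_{RL}}{\int I_\xi \, e^{-\beta \mathcal{U}(r_{RL})} \, dr_{RL}} \\
&= \frac{\int I_\xi \, O(r_{RL}) \, e^{-\beta [\mathcal{U}(r_R) + \Psi(r_{RL}) + \mathcal{U}(r_L)]} \, dr_{RL}}{\int I_\xi \, e^{-\beta [\mathcal{U}(r_R) + \Psi(r_{RL}) + \mathcal{U}(r_L)]} \, dr_{RL}} \\
&= \frac{\int \Theta(r_R) \, e^{-\beta \mathcal{U}(r_R)} \, dr_R}{\int e^{-\beta [B(r_R) + \mathcal{U}(r_R)]} \, dr_R} \\
&\equiv \frac{\langle \Theta \rangle_{R}^{r_R}}{\left\langle e^{-\beta B} \right\rangle_{R}^{r_R}} = \langle \Theta \rangle_{R}^{r_R} \, e^{\beta [\Delta G^\circ - \Delta G_\xi]}.
\end{aligned} \tag{13}$$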

Equations 12 and 13 significantly generalize implicit ligand sampling,22 a method to estimate the potential of mean force for the ligand center of mass. The results of Cohen et al.22 may be obtained by choosing the observable as a Dirac delta function for the ligand center of mass, taking a natural logarithm, and multiplying by $-\beta^{-1}$. Cohen et al.22 applied implicit ligand sampling to study gas migration pathways in myoglobin, but the possibility of estimating other observables and binding free energies has not been previously recognized.

ESTIMATION

Applying implicit ligand theory to predicting binding free energies involves three steps:

  1. Sampling receptor configurations.

  2. Estimating the binding PMF, B(rR), for each receptor configuration.

  3. Estimating ΔG° from B(rR) estimates.

In this section, I present several ways, roughly in order of increasing complexity, that these steps may be accomplished. A variant of one approach will be demonstrated later in the paper.

Receptor configurations

Receptor configurations can be drawn from $q_R(r_R)$, any (possibly unnormalized) distribution $q_{R,w}(r_R)$ on the same support as $q_R(r_R)$ and for which $w(r_R) = q_R(r_R)/q_{R,w}(r_R)$ may be calculated, or from multiple distributions satisfying these conditions. Regardless of the sampling method, however, convergence of free energy estimates requires representative sampling of both the bound and unbound receptor configuration space. A particularly straightforward protocol is to sample from the distribution proportional to $q_R(r_R) = e^{-\beta \mathcal{U}(r_R)}$: one conducts a molecular dynamics (MD) simulation in the implicit solvent used for $W(r_R)$, collecting snapshots at evenly spaced intervals that are longer than the statistical correlation time. This protocol may be satisfactory if receptor fluctuations are minimal and the ligand does not significantly perturb the receptor configurational ensemble.

For a receptor that undergoes larger structural fluctuations, sampling from multiple energetic minima may be facilitated by applying an external biasing potential (e.g., a harmonic bias) on one or more order parameters. If it is known that a ligand significantly perturbs the receptor configurational ensemble, it can be useful to introduce multiple alchemical intermediates into a simulation. Alchemical calculations may involve a coupling parameter λ, defined such that the two groups (e.g., the receptor and ligand) are non-interacting at λ = 0 and fully interacting with λ = 1. Simulations are conducted with λ at these end points and at multiple values in between. Sampling in each stage may be enhanced by Hamiltonian replica exchange (e.g., Jiang et al.,23 Gallicchio et al.,21 Gallicchio and Levy24), which entails stochastically swapping the coordinates of different simulations with a probability that preserves the Boltzmann distribution. Receptor configurations obtained through a flexible-receptor Hamiltonian replica exchange with a single ligand may subsequently be used for implicit ligand free energy calculations with other ligands in the chemical library.
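The exchange step of Hamiltonian replica exchange can be sketched as follows. This assumes the common Metropolis acceptance rule for swapping configurations between two replicas at the same temperature with different potentials (for example, two values of λ); the cited implementations may differ in detail, and all function names and energies here are hypothetical.

```python
import math
import random

def swap_probability(u_i_xi, u_j_xj, u_i_xj, u_j_xi, kT):
    """Metropolis probability of exchanging configurations x_i and x_j.

    u_a_xb is the energy of configuration x_b evaluated with potential U_a.
    """
    delta = (u_i_xj + u_j_xi) - (u_i_xi + u_j_xj)
    if delta <= 0.0:
        return 1.0
    return math.exp(-delta / kT)

def attempt_swap(u_i_xi, u_j_xj, u_i_xj, u_j_xi, kT, rng=random):
    """Return True if the proposed exchange is accepted."""
    return rng.random() < swap_probability(u_i_xi, u_j_xj, u_i_xj, u_j_xi, kT)

# Hypothetical energies (kcal/mol) of two configurations under two values of lambda.
p = swap_probability(u_i_xi=1.0, u_j_xj=2.0, u_i_xj=1.5, u_j_xi=2.5, kT=0.596)
print(f"Swap acceptance probability: {p:.3f}")
```

Accepting swaps with this probability leaves the joint Boltzmann distribution of the replicas unchanged, which is the property the text requires of the exchange move.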

As a caveat, implicit ligand theory does not provide a formal justification for docking to multiple experimentally determined structures (e.g., Ref. 25) or any other set of structures in which w(rR) is unknown (e.g., homology modeling or flexible docking). One potential way to use information about multiple structures is to conduct multiple MD simulations with external potentials biased towards one or more of the structures. To facilitate later analysis, the external potentials should be set up to promote overlap in the configuration space of different simulations.

Estimating a binding PMF

The binding PMF B(rR) may be expressed in terms of a ratio of partition functions,

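$$B(r_R) = -\beta^{-1} \ln \frac{\int I_\xi \, e^{-\beta \mathcal{U}(r_{RL})} \, dr_L \, d\xi_L}{\int I_\xi \, e^{-\beta [\mathcal{U}(r_L) + \mathcal{U}(r_R)]} \, dr_L \, d\xi_L}, \tag{14}$$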

which clarifies that B(rR) is a special type of free energy difference in which the receptor configuration rR is rigid. Thus, B(rR) may be calculated using any one of many available methods to estimate free energy differences,26 including free energy perturbation (FEP),27 thermodynamic integration (TI),28 and the Bennett acceptance ratio (BAR).29 While formally equivalent, free energy methods can have dramatically different convergence properties.

Based on the form of Eq. 10, the most straightforward estimation protocol is FEP. One can, for example, draw ligand configurations from the distribution proportional to $q_L(r_L, \xi_L) = e^{-\beta \mathcal{U}(r_L)}$ by conducting an MD simulation of the ligand in the appropriate implicit solvent and collecting snapshots at sufficiently long intervals. Because $q_L$ is independent of $\xi_L$, the external degrees of freedom sampled from the simulation may be replaced by a new $\xi_L$ sampled from the distribution proportional to $q_{\xi,I} = I_\xi$. The expectation in Eq. 10 may then be estimated by the sample mean

$$\hat{B}(r_R) = -\beta^{-1} \ln \frac{1}{N} \sum_{n=1}^{N} e^{-\beta \Psi(r_{RL,n})}, \tag{15}$$

where $r_{RL,n}$ is the nth of N samples of the complex. Throughout this paper, $\hat{A}$ will denote a statistical estimator, that is, an equation used to calculate a quantity based on sampled data.
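A minimal sketch of the sample-mean estimator in Eq. 15 follows. It is not the implementation used in this work; the interaction energies are made-up numbers, and the log-mean-exp shift is a standard numerical safeguard added here so that clashing poses with very large Ψ do not cause overflow or underflow problems.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    shift = max(values)
    return shift + math.log(sum(math.exp(v - shift) for v in values) / len(values))

def binding_pmf_fep(psi, kT):
    """Eq. 15: B_hat = -kT ln[(1/N) sum_n exp(-psi_n / kT)]."""
    return -kT * log_mean_exp([-p / kT for p in psi])

kT = 0.596  # kcal/mol near 300 K
# Hypothetical interaction energies (kcal/mol) for six ligand poses in one rigid receptor.
psi = [50.0, 12.0, -3.0, 200.0, -8.5, 75.0]
b_hat = binding_pmf_fep(psi, kT)
print(f"Estimated binding PMF: {b_hat:.2f} kcal/mol")
```

The estimate always lies between the lowest sampled Ψ and that value plus kT ln N, which shows how a few low-energy poses can dominate the exponential average.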

In exponential averages such as Eq. 15, a small subset of samples may contribute a large portion of the sum. The limiting case of an individual important sample inspires the severe dominant state approximation, in which a single value of Ψ(rRL) is used to estimate B(rR). Exponential averages may also be estimated via a cumulant expansion,27 here shown for Eq. 10 to the fourth order,

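$$B(r_R) \approx \langle \Psi \rangle_{L,I}^{r_L,\xi_L} - \frac{\beta}{2!} \left\langle \delta\Psi^2 \right\rangle_{L,I}^{r_L,\xi_L} + \frac{\beta^2}{3!} \left\langle \delta\Psi^3 \right\rangle_{L,I}^{r_L,\xi_L} - \frac{\beta^3}{4!} \left[ \left\langle \delta\Psi^4 \right\rangle_{L,I}^{r_L,\xi_L} - 3 \left( \left\langle \delta\Psi^2 \right\rangle_{L,I}^{r_L,\xi_L} \right)^2 \right], \tag{16}$$

where $\delta\Psi = \Psi(r_{RL}) - \langle \Psi \rangle_{L,I}^{r_L,\xi_L}$. Each expectation in the cumulant expansion may be estimated by the sample mean.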
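The fourth-order cumulant expansion can be sketched with sample means of the central moments, as below. The function name, the `order` argument, and the toy energies are illustrative assumptions; the truncation is only expected to be accurate when the distribution of Ψ is narrow relative to kT.

```python
def binding_pmf_cumulant(psi, kT, order=4):
    """Cumulant-expansion estimate of B truncated at `order` (1 to 4)."""
    beta = 1.0 / kT
    n = len(psi)
    mean = sum(psi) / n
    dev = [p - mean for p in psi]              # deviations of Psi from its sample mean
    m2 = sum(d ** 2 for d in dev) / n          # second central moment
    m3 = sum(d ** 3 for d in dev) / n          # third central moment
    m4 = sum(d ** 4 for d in dev) / n          # fourth central moment
    terms = [
        mean,
        -beta / 2.0 * m2,
        beta ** 2 / 6.0 * m3,
        -beta ** 3 / 24.0 * (m4 - 3.0 * m2 ** 2),
    ]
    return sum(terms[:order])

# Toy sample: two equally likely energies of -1 and +1 (in units where kT = 1).
b2 = binding_pmf_cumulant([-1.0, 1.0], kT=1.0, order=2)
b4 = binding_pmf_cumulant([-1.0, 1.0], kT=1.0, order=4)
print(b2, b4)
```

For this toy sample the exact exponential average gives −ln cosh(1), so comparing `b2` and `b4` against that value indicates how quickly the truncated series approaches it.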

While formally correct, this approach to ligand sampling can converge slowly if most ligand configurations placed in the binding site have overlapping atoms and high values of $\Psi(r_{RL})$. One potential solution to this problem is to sample the external degrees of freedom from a distribution biased towards energetically favorable orientations by a confining potential $U_c(\xi_L)$. Multiplying and dividing Eq. 10 by $\Omega_c = \int I_\xi \, e^{-\beta U_c(\xi_L)} \, d\xi_L$ and the integrand in the numerator by $e^{-\beta U_c(\xi_L)}$ leads to

$$B(r_R) = -\beta^{-1} \ln \left\langle e^{-\beta [\Psi - U_c]} \right\rangle_{L,I_c}^{r_L,\xi_L} - \beta^{-1} \ln \frac{\Omega_c}{\Omega}, \tag{17}$$

where $q_{L,I_c} = I_\xi \, e^{-\beta [\mathcal{U}(r_L) + U_c(\xi_L)]}$. Good choices for $U_c(\xi_L)$, which may be ascertained from existing molecular docking algorithms (as will be discussed later in the paper), will favor the sampling of poses with low $\Psi(r_{RL})$.

Alternatively, the binding PMF may be calculated using the inverse form of Eq. 10,

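$$B(r_R) = \beta^{-1} \ln \frac{\int I_\xi \, e^{\beta \Psi(r_{RL})} \, e^{-\beta \mathcal{U}(r_{RL})} \, dr_L \, d\xi_L}{\int I_\xi \, e^{-\beta \mathcal{U}(r_{RL})} \, dr_L \, d\xi_L} = \beta^{-1} \ln \left\langle e^{\beta \Psi} \right\rangle_{RL,I}^{r_L,\xi_L}. \tag{18}$$

Ligand configurations from the distribution proportional to $q_{RL,I}(r_L, \xi_L) = I_\xi \, e^{-\beta \mathcal{U}(r_{RL})}$ may be sampled, for example, from an implicit-solvent MD simulation in which the receptor is held rigid and the ligand is allowed to move, and the expectation estimated using the sample mean estimator.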

This straightforward procedure is also problematic because of the rarity of sampling configurations in which the ligand is separated from the receptor or in which they overlap. While these configurations are insignificant in the conformational ensemble in which receptor and ligand are fully interacting, they are relevant to the ensemble of noninteracting ligand and receptor, and the convergence of free energy differences requires phase space overlap between adjacent thermodynamic states.26 The phase space overlap problem may also be alleviated by calculating the free energy difference with a reference state in which the external degrees of freedom are confined,

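$$B(r_R) = \beta^{-1} \ln \left\langle e^{\beta [\Psi - U_c]} \right\rangle_{RL,I}^{r_L,\xi_L} - \beta^{-1} \ln \frac{\Omega_c}{\Omega}. \tag{19}$$

The binding PMF may be estimated from the same samples as with Eq. 18, and will be more accurate the more closely $e^{-\beta U_c(\xi_L)}$ resembles the distribution of $\xi_L$ in the complex.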

As discussed, phase space overlap problems are often resolved by introducing multiple alchemical stages into a calculation, and sampling may be enhanced by Hamiltonian replica exchange. With multiple stages, the total free energy difference between states with λ = 0 and λ = 1 is the sum of free energy differences between adjacent stages, each of which may be estimated by FEP,27 TI,28 or BAR.29 Alternatively, the total free energy difference may be estimated by the multistate Bennett acceptance ratio (MBAR).30

Estimating the binding free energy

Once $\hat{B}(r_R)$ is evaluated for each receptor configuration, the binding free energy may be calculated by estimating an ensemble average. The appropriate method for estimating ΔG° depends on how the receptor configurations $r_R$ are sampled. If they are drawn from the distribution $q_R(r_R)$, then the expectation in Eq. 11 may be estimated by the sample mean

$$\Delta \hat{G}^\circ = -\beta^{-1} \ln \frac{1}{N} \sum_{n=1}^{N} e^{-\beta \hat{B}(r_{R,n})} + \Delta G_\xi, \tag{20}$$

in which $\hat{B}(r_{R,n})$ is the estimated binding PMF for the nth of N receptor configurations. Because the implicit-ligand expression for the binding free energy, Eq. 11, has the same form as Eq. 10, the dominant state approximation and cumulant expansion may also be applied.

If receptor configurations are drawn from a biased distribution, the importance sampling identity,

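$$\langle O \rangle_T = \frac{\int O(r) \, q_T(r) \, dr}{\int q_T(r) \, dr} = \frac{\int O(r) \, w(r) \, q_S(r) \, dr}{\int w(r) \, q_S(r) \, dr} = \frac{\langle w O \rangle_S}{\langle w \rangle_S}, \tag{21}$$

may be applied. In this generic expression, $w(r) = q_T(r)/q_S(r)$ is a ratio of unnormalized densities $q_T(r)$ for the target distribution and $q_S(r)$ for the sampling distribution. Using the sample mean estimator and importance sampling identity for the expectation in Eq. 11 leads to

$$\Delta \hat{G}^\circ = -\beta^{-1} \ln \frac{\sum_{n=1}^{N} w(r_{R,n}) \, e^{-\beta \hat{B}(r_{R,n})}}{\sum_{n=1}^{N} w(r_{R,n})} + \Delta G_\xi. \tag{22}$$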

If receptor configurations are drawn from multiple biased distributions, then the expectation may be estimated using MBAR.30
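The single-distribution estimators, Eqs. 20 and 22, can be combined in one short routine, sketched below under the assumption that the binding PMF estimates and importance weights are already available as plain lists. With uniform weights the weighted form reduces to the unweighted sample mean. All numbers and names are hypothetical.

```python
import math

def binding_free_energy(b_hat, dg_xi, kT, weights=None):
    """Eq. 22 (or Eq. 20 when weights is None): weighted exponential average of B_hat."""
    if weights is None:
        weights = [1.0] * len(b_hat)
    exponents = [-b / kT for b in b_hat]
    shift = max(exponents)  # subtract the largest exponent for numerical stability
    numerator = sum(w * math.exp(x - shift) for w, x in zip(weights, exponents))
    return -kT * (shift + math.log(numerator / sum(weights))) + dg_xi

kT = 0.596
b_hat = [-20.0, -22.5, -21.0]  # hypothetical binding PMFs (kcal/mol) for three snapshots
dg_unweighted = binding_free_energy(b_hat, dg_xi=4.0, kT=kT)
dg_weighted = binding_free_energy(b_hat, dg_xi=4.0, kT=kT, weights=[0.2, 1.0, 3.0])
print(dg_unweighted, dg_weighted)
```

Only the ratios of the weights matter, so unnormalized importance weights can be passed directly.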

Thermodynamic expectations

Thermodynamic expectations may be estimated from the same data as the binding free energy. The appropriate estimator for $\Theta(r_R)$ will depend on how the ligand configurations were sampled. Once $\Theta(r_R)$ is estimated for every sampled receptor configuration, the appropriate estimator for the expectation in Eq. 13 similarly depends on how the receptor configurations were sampled. In the simplest case for $\Theta(r_R)$, if ligand configurations are sampled from $q_{\xi,I}$, then $\Theta(r_R)$ may be estimated by a sample mean. In other cases, $\Theta(r_R)$ and $\langle O \rangle_{RL,I}^{r_{RL}}$ may be estimated using importance sampling, MBAR,30 or a combination thereof.
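For the simplest case, the ratio in Eq. 13 can be sketched as follows, assuming receptor snapshots were drawn from the unbiased receptor distribution and ligand poses for each snapshot from the non-interacting ligand distribution, so that plain sample means apply at both levels. The nested-list layout and all values are hypothetical.

```python
import math

def bound_state_expectation(psi, obs, kT):
    """Estimate <O> in the bound ensemble as <Theta>_R / <exp(-beta B)>_R.

    psi[i][n] and obs[i][n] are the interaction energy and observable for
    ligand pose n docked to receptor snapshot i.
    """
    shift = min(min(row) for row in psi)  # common shift; cancels in the ratio
    theta_sum = 0.0
    boltzmann_sum = 0.0
    for psi_row, obs_row in zip(psi, obs):
        weights = [math.exp(-(p - shift) / kT) for p in psi_row]
        theta_sum += sum(w * o for w, o in zip(weights, obs_row)) / len(weights)
        boltzmann_sum += sum(weights) / len(weights)
    return theta_sum / boltzmann_sum

kT = 0.596
psi = [[-8.0, 3.0, 40.0], [-2.0, -6.5, 15.0]]  # hypothetical interaction energies
obs = [[3.1, 4.0, 6.2], [3.8, 3.3, 5.0]]        # hypothetical ligand-receptor distances
mean_obs = bound_state_expectation(psi, obs, kT)
print(f"Bound-state mean of the observable: {mean_obs:.2f}")
```

Because the same Boltzmann factors appear in the numerator and denominator, an observable that is constant returns that constant, which is the consistency property discussed later for the MBAR-based estimator.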

DEMONSTRATION

As a demonstration, implicit ligand theory calculations were performed to estimate the standard binding free energy of various ligands to Cucurbit[7]uril (CB[7]) in water. The binding of CB[7] to a number of ferrocenes, adamantanes, and bicyclooctanes has been well characterized by both isothermal calorimetry and second-generation mining minima (M2)18, 19 free energy calculations.31, 32 Receptor configurations were sampled by molecular dynamics, binding PMFs estimated with a multi-stage alchemical calculation and MBAR,30 and the binding free energy calculated using Eq. 20 or the dominant state approximation.

Methods

Molecular dynamics simulations at 300 K were performed with a slightly modified33 compilation of NAMD34 version 2.9. When appropriate, CB[7] was fixed using the fixedAtoms parameter. The “commercial” force field parameters and topologies from Moghaddam et al.32 were used for both CB[7] and its ligands. To match the force field from Moghaddam et al.32 as closely as possible, 1–4 electrostatics were scaled by 0.5 and the nonbonded cutoff was set to 999 Å, which effectively turns off cutoffs. Water was represented with the generalized Born surface area (GBSA) implicit-solvent model without ions and a surface tension of 0.006 kcal/mol/Å². The receptor dielectric was 1.0 and solvent dielectric was 78.5. A time step of 1 fs (using a 2 fs time step with fixed atoms led to unstable trajectories) was used with Langevin dynamics.

CB[7] was minimized for 2500 steps and thermalized by increasing the temperature by 10 K and reinitializing velocities every 100 steps from 0 to 300 K. Receptor snapshots were saved every 0.1 ns from a trajectory of 10 ns.

Binding PMFs for every ligand in Moghaddam et al.32 with the minimized CB[7] structure (15 repetitions each) and 100 receptor simulation snapshots (1 repetition each) were estimated using Hamiltonian replica exchange, which can simultaneously dock a ligand and compute its binding free energy.21, 24 The implementation is similar to that from Gallicchio and Levy,24 except that the receptor configuration is fixed. A reservoir of ligand configurations24 was generated by simulating the ligand for up to 10 ns and saving snapshots every 10 ps. Simulations of the complex in which λ controls the extent of interaction between the ligand and receptor were run with $\lambda \in \{0, 10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0\}$. As implemented in NAMD, intermediate values of λ used a soft-core potential with a van der Waals shift coefficient of 5. Electrostatic interactions were turned on when λ = 0.5. Using the colvars module, a flat-bottom harmonic potential with a spring constant of 10 kcal mol⁻¹ Å⁻¹ and starting at 0.75 Å was used to restrain the center-of-mass distance between the ligand core (heavy atoms except for the R groups in Moghaddam et al.32) and the receptor heavy atoms. This potential keeps the ligand within the binding site when interactions are turned off. The binding site volume, $\Omega = \int I_\xi \, d\xi_L$, is approximated as $\frac{4}{3}\pi (0.75^3)(8\pi^2)$. Because NAMD does not allow the simultaneous use of alchemical decoupling and implicit solvent, simulations were conducted in vacuum.

The replica exchange simulation was initiated by taking a random ligand configuration, applying a random rotation, and randomly placing it within the binding site. This initial configuration was minimized and thermalized with the same protocol as with CB[7], except that it was done in vacuum. The thermalized structure was used to start each replica. Occasionally, the random placement of the ligand led to high forces that caused the simulations to crash; in this case, the simulation was restarted with a different random initial configuration.

After every 5 ps of simulation for every value of λ, 1000 replica exchanges were attempted between each pair of adjacent λ windows. After each set of replica exchange attempts, the ligand configuration for λ = 0 was replaced with a random ligand configuration from the reservoir, randomly rotated, and placed in the binding site. (This type of reservoir swap satisfies detailed balance.) The simulation was conducted for 25 cycles, saving snapshots every 0.5 ps, for a total of 2 ns of simulation for each binding PMF. The docking and equilibration period, defined as the time before the potential energy of the fully coupled state is within 20 kBT of its energy for the final snapshot, was ignored in subsequent analysis.
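The equilibration criterion described above can be sketched as a simple scan over the potential energy time series of the fully coupled state. The text does not specify whether "within 20 kBT" is one-sided; this sketch assumes an absolute difference, and the energies are made-up numbers.

```python
def equilibration_index(energies, kT, tolerance=20.0):
    """Index of the first snapshot whose energy is within tolerance*kT of the final one."""
    final = energies[-1]
    for index, energy in enumerate(energies):
        if abs(energy - final) <= tolerance * kT:
            return index
    return len(energies) - 1  # not reached: the final snapshot always satisfies the test

kT = 0.6
# Hypothetical potential energies (kcal/mol) of the lambda = 1 replica over time.
energies = [500.0, 120.0, 30.0, 5.0, -2.0, 1.0, 0.0]
start = equilibration_index(energies, kT)
production = energies[start:]  # snapshots kept for analysis
print(start, production)
```

Snapshots before the returned index would be discarded as the docking and equilibration period.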

Because alchemical coupling calculations were performed in vacuum, binding PMFs were estimated based on a decomposition of B(rR),

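$$\begin{aligned}
B(r_R) &= B_{cpl} + B_{RL} - B_L - \Delta U(r_R), \\
B_{cpl} &= -\beta^{-1} \ln \frac{\int I_\xi \, e^{-\beta U(r_{RL})} \, dr_L \, d\xi_L}{\int I_\xi \, e^{-\beta [U(r_L) + U(r_R)]} \, dr_L \, d\xi_L}, \\
B_{RL} &= -\beta^{-1} \ln \frac{\int I_\xi \, e^{-\beta \Delta U(r_{RL})} \, e^{-\beta U(r_{RL})} \, dr_L \, d\xi_L}{\int I_\xi \, e^{-\beta U(r_{RL})} \, dr_L \, d\xi_L}, \\
B_L &= -\beta^{-1} \ln \frac{\int I_\xi \, e^{-\beta \Delta U(r_L)} \, e^{-\beta U(r_L)} \, dr_L \, d\xi_L}{\int I_\xi \, e^{-\beta U(r_L)} \, dr_L \, d\xi_L}.
\end{aligned} \tag{23}$$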

Bcpl is the free energy of turning on the interactions between the ligand and the rigid receptor in vacuum. BRL, BL, and ΔU(rR) are free energies of transferring the complex, ligand, and receptor, respectively, from vacuum to the target state (in implicit solvent). They are based on ΔU(rX) = UT(rX) − U(rX), the potential energy difference between rX in the target state versus the state from which configurations were sampled (in vacuum). Bcpl was estimated by applying MBAR30 to snapshots from every 0.5 ps of simulation, and BRL and BL by single-step FEP. (Evaluating transfer free energies by MBAR would require the computationally expensive step of calculating target-state potential energies for every snapshot.)

This decomposition makes it straightforward to evaluate B(rR) for a variety of force fields using the same configurational samples. In this work, four are compared:

  1. NAMD: the total potential energy from using GBSA in NAMD;34

  2. M2: the total potential energy from using the GBSA model in the M2 program;18, 19

  3. PB: Poisson-Boltzmann electrostatic solvation free energies from the University of Houston Brownian Dynamics (UHBD) program35 and bond, angle, dihedral, Coulomb, and van der Waals energies from the M2 program;18, 19

  4. PBSA: Poisson-Boltzmann electrostatic solvation free energies from UHBD35 and bond, angle, dihedral, Coulomb, van der Waals, and nonpolar surface area energies from M2,18, 19 the combination used in Moghaddam et al.32

During this step, the NAMD, M2, and UHBD programs are used strictly for single-point energy evaluations, not for minimization or dynamics. Poisson-Boltzmann energies were calculated with a grid spacing of 0.18 Å with dimensions such that the maximum dimensions of the molecule are 0.7 (or less) of the final grid.32 For comparison, binding PMFs were also calculated from the dominant state approximation with PBSA energies, using the lowest value of Ψ(rRL) observed in the simulations with λ = 0 or λ = 1.

Because receptor configurations were sampled from a simulation in GBSA implicit solvent, binding free energies were estimated by using Eq. 22. Binding free energies were also estimated with the dominant state approximation: using the lowest observed value of $\hat{B}(r_R)$ to estimate $-\beta^{-1} \ln \left\langle e^{-\beta B} \right\rangle_{R}^{r_R}$.

To demonstrate the calculation of thermodynamic expectations and for comparison with results from Moghaddam et al.,32 the mean values of six PBSA energies – van der Waals, Coulomb, electrostatic solvation, valence (bond + angle + dihedral), nonpolar solvation, and total – were estimated for the complex, the receptor, and the ligand. Mean PBSA energies for the ligand and receptor were estimated by applying the importance sampling identity to the ligand from the non-interacting system in vacuum and to the receptor from the GBSA simulation, respectively. Occasionally, energies in the ligand trajectory briefly spiked to very high values. In estimating the mean PBSA energies, these spikes were filtered out by removing data points in which the total PBSA energy is at least 100 kBT larger than the PBSA energy of the final snapshot. As the spikes were likely caused by the finite molecular dynamics time step, they would probably be avoided by using a propagator that exactly preserves the Boltzmann distribution, e.g., hybrid Monte Carlo.36

Towards estimating the mean PBSA energies of the complex, rigid-receptor expectations were estimated by applying MBAR30 to snapshots from the non-interacting and fully interacting states,

$$\hat{\Theta}(r_R) = \frac{\sum_{n=1}^{N} w(r_{RL,n}) \, O(r_{RL,n})}{\sum_{n=1}^{N} w(r_{RL,n})}, \qquad w(r_{RL}) = \frac{e^{-\beta \left( U_{PBSA}(r_{RL}) - U_0(r_{RL}) \right)}}{1 + \frac{N_1}{N_0} \, e^{-\beta \left( U_1(r_{RL}) - \hat{B}_{cpl} - U_0(r_{RL}) \right)}}, \tag{24}$$

where $U_0(r_{RL})$ and $U_1(r_{RL})$ are the potential energies of the non-interacting and fully interacting complexes, respectively, $U_{PBSA}(r_L)$ is the PBSA energy of only the ligand, and $r_{RL,n}$ is the nth of N snapshots of either the non-interacting ($N_0$ snapshots) or fully interacting complex ($N_1$ snapshots). $\hat{B}_{cpl}$ was estimated by using MBAR30 with all replicas. While it would be possible to estimate the mean PBSA energies using all snapshots from all replicas, this was avoided because of the computational expense of Poisson-Boltzmann calculations, which can take over a minute per snapshot. After obtaining $\hat{\Theta}(r_R)$, the importance sampling identity, Eq. 21, was used to estimate the expectations in Eq. 13. To ensure consistency of the estimator (an estimate of a constant yields the same constant), $\hat{\Theta}(r_R)$ was calculated for O = 1, in which case $\langle \hat{\Theta} \rangle_{R}^{r_R} = \left\langle e^{-\beta B} \right\rangle_{R}^{r_R}$. This estimate of $\left\langle e^{-\beta B} \right\rangle_{R}^{r_R}$ was used in the denominator of Eq. 13.

Results

Highlighting the importance of an accurate molecular mechanics model, binding PMF estimates are strongly dependent on the force field, as shown in Table 1. For the large and highly charged bicyclooctane B11, switching the force field causes the binding PMF to change nearly 40 kcal/mol! With increasing magnitude of charge, larger Coulomb energies lead to larger values of Bcpl and larger electrostatic solvation free energies increase the magnitude of BRL, BL, and ΔU(rR) (for estimates of Bcpl, BRL, and BL, see Table I of the supplementary material37). Thus, estimating the binding PMF with Eq. 23 entails the difficult task of computing a relatively small difference between large values. The importance of the force field has also been noted for M2 calculations.18, 19 An alternate implementation, e.g., conducting replica exchange within implicit solvent rather than vacuum, may not require the implicit-solvent model to be as accurate.

Table 1.

The mean and standard deviation (in parentheses) of 15 independent estimates of the binding PMF, $B(r_R)$, in kcal/mol for various ligands to the minimized structure of CB[7], based on applying Eq. 23 with different force fields (NAMD, M2, PB, and PBSA columns) or on using the minimum observed value of the interaction energy $\Psi(r_{RL})$ from PBSA energies during the λ = 0 and λ = 1 simulations (min{Ψ(rRL)} column). The bottom rows show the correlation coefficient (R²) and root mean square error (RMSE, Eq. 25, in kcal/mol) with respect to isothermal titration calorimetry experiments (ITC) and mining minima calculations (Gilson) from Moghaddam et al.32 that result from the dominant state approximation: calculating $\Delta \hat{G}^\circ$ by using a single binding PMF estimate $\hat{B}(r_R)$ as an estimate for $-\beta^{-1} \ln \left\langle e^{-\beta B} \right\rangle_{R}^{r_R}$ in Eq. 11.

| Ligand | NAMD | M2 | PB | PBSA | min{Ψ(rRL)} |
|---|---|---|---|---|---|
| AD1 | −14.1 (0.79) | −22.0 (0.51) | −23.0 (0.82) | −25.5 (0.83) | −31.3 (0.55) |
| AD2 | −32.5 (0.15) | −29.0 (0.13) | −26.8 (0.12) | −29.4 (0.12) | −36.9 (0.30) |
| AD3 | −31.0 (0.16) | −30.7 (0.18) | −28.9 (0.23) | −31.6 (0.23) | −40.3 (0.28) |
| AD4 | −44.0 (0.94) | −36.7 (1.11) | −24.0 (1.12) | −26.9 (1.12) | −36.1 (0.45) |
| AD5 | −32.2 (0.68) | −29.0 (0.25) | −26.0 (0.14) | −28.5 (0.14) | −36.2 (0.29) |
| B02 | −12.8 (0.41) | −18.8 (0.38) | −19.8 (0.53) | −22.6 (0.53) | −30.6 (0.54) |
| B05 | −40.4 (0.29) | −30.6 (0.40) | −19.5 (0.50) | −22.3 (0.50) | −34.3 (0.63) |
| B11 | −52.4 (1.50) | −38.5 (1.81) | −14.1 (1.72) | −17.5 (1.63) | −39.3 (1.90) |
| F01 | −1.8 (1.57) | −5.5 (0.81) | −10.9 (0.53) | −13.6 (0.53) | −24.7 (0.33) |
| F02 | −14.8 (0.96) | −14.1 (0.67) | −16.2 (0.42) | −19.2 (0.42) | −31.7 (0.38) |
| F03 | −16.3 (1.61) | −13.1 (1.03) | −16.4 (0.95) | −19.5 (0.94) | −31.1 (0.71) |
| F06 | −30.6 (0.18) | −20.0 (0.21) | −21.9 (0.18) | −25.4 (0.18) | −37.2 (0.24) |
| R² (ITC) | 0.884 | 0.750 | 0.454 | 0.490 | 0.883 |
| RMSE (ITC) | 12.8 | 7.9 | 4.9 | 4.7 | 12.2 |
| R² (Gilson) | 0.827 | 0.907 | 0.705 | 0.712 | 0.792 |
| RMSE (Gilson) | 10.4 | 4.8 | 5.4 | 4.5 | 11.3 |

With 2 ns of total simulation for all replicas, the standard deviation of binding PMF estimates from PBSA energies ranges from 0.12 to 1.63 kcal/mol (Table 1), with most estimates toward the lower end of this range. For all of the components of Eq. 23, the mean estimate does not appear to shift after about 0.75 ns, and additional sampling reduces the standard deviation of the estimate (see Fig. 1 and Fig. 1 in the supplementary material37). There is no unique component that limits the convergence of B^(rR); the slowest converging component varies from ligand to ligand. The binding PMF estimate B^(rR) and the minimal interaction energy min {Ψ(rRL)} converge at about the same rate, suggesting that the limiting factor for convergence is finding a configuration with the lowest interaction energy. This interpretation is corroborated by the fact that the largest ligands with the most rotatable bonds (see Moghaddam et al.32 for structures) also have the most variance in B^(rR), as flexibility increases the challenge of finding configurations with low Ψ(rRL).

Figure 1.

The mean and standard deviation of 15 independent estimates of B(rR), Bcpl, BRL, BL, and min{Ψ(rRL)} (kcal/mol) based on PBSA energies as a function of total MD simulation time for the ligand B02. Analogous plots for the other ligands in this study are available as Fig. 1 in the supplementary material.37

The accuracy of binding free energy estimates was assessed with the correlation coefficient and root mean square error (RMSE)

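$$\mathrm{RMSE}(m_1, m_2) = \sqrt{\frac{1}{L}\sum_{l=1}^{L}\left(\Delta G_{l,m_1} - \Delta G_{l,m_2}\right)^2} \qquad (25)$$

between methods $m_1$ and $m_2$, where $\Delta G_{l,m}$ is the binding free energy estimate for ligand $l$ of $L$ ligands using method $m$ (Tables 1, 2).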
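As a minimal sketch of Eq. 25, the following Python snippet recomputes the RMSE between the PBSA column of Table 2 and the ITC and Gilson columns. The function name `rmse` and the list names are illustrative and are not taken from the source code accompanying this paper. Because the inputs are the rounded table entries, the results can only be expected to agree with the tabulated RMSE values to about one decimal place.

```python
import math

def rmse(a, b):
    """Root mean square error between two equal-length lists (Eq. 25)."""
    if len(a) != len(b):
        raise ValueError("both methods must cover the same ligands")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Binding free energies (kcal/mol) copied from Table 2,
# in the ligand order AD1-AD5, B02, B05, B11, F01-F03, F06.
itc = [-14.1, -19.4, -20.4, -21.5, -19.1, -13.4,
       -19.5, -20.6, -12.9, -16.8, -17.2, -21.0]
gilson = [-18.2, -25.9, -25.6, -29.7, -24.1, -12.0,
          -23.1, -22.4, -10.2, -12.4, -12.2, -17.8]
pbsa = [-20.1, -25.4, -26.2, -27.1, -24.4, -18.1,
        -21.4, -20.5, -7.6, -14.6, -13.2, -19.7]

rmse_itc = rmse(pbsa, itc)
rmse_gilson = rmse(pbsa, gilson)
print(f"RMSE(PBSA, ITC)    = {rmse_itc:.2f} kcal/mol")
print(f"RMSE(PBSA, Gilson) = {rmse_gilson:.2f} kcal/mol")
```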

Table 2.

Estimates of the binding free energy ΔG° (kcal/mol) of various ligands to CB[7]. First, binding PMFs B(rR) are estimated based on Eq. 23 for 100 receptor snapshots from a simulation in GBSA implicit solvent. Then ΔG^ is calculated using Eq. 22. The value in parentheses is the standard deviation from bootstrapping: the binding free energy is estimated based on 1000 random selections of 100 binding PMFs. The ITC and Gilson columns are isothermal titration calorimetry measurements and M2 calculations, respectively, taken from Moghaddam et al.32 The bottom rows are the correlation coefficient (R2) and root mean square error (RMSE, Eq. 25) with respect to the ITC and Gilson columns.

| Ligand | ITC | Gilson | NAMD | M2 | PB | PBSA |
| --- | --- | --- | --- | --- | --- | --- |
| AD1 | −14.1 | −18.2 | −9.4 (0.23) | −16.3 (0.15) | −17.6 (0.25) | −20.1 (0.25) |
| AD2 | −19.4 | −25.9 | −27.9 (0.19) | −24.3 (0.22) | −22.9 (0.27) | −25.4 (0.26) |
| AD3 | −20.4 | −25.6 | −35.7 (5.03) | −28.6 (1.87) | −23.5 (0.23) | −26.2 (0.23) |
| AD4 | −21.5 | −29.7 | −40.5 (0.21) | −33.7 (0.32) | −24.3 (1.11) | −27.1 (1.06) |
| AD5 | −19.1 | −24.1 | −29.5 (1.24) | −24.0 (0.20) | −22.0 (0.35) | −24.4 (0.34) |
| B02 | −13.4 | −12.0 | −9.0 (0.38) | −13.7 (0.16) | −15.4 (0.26) | −18.1 (0.25) |
| B05 | −19.5 | −23.1 | −38.0 (0.40) | −27.7 (0.27) | −18.6 (0.27) | −21.4 (0.27) |
| B11 | −20.6 | −22.4 | −51.2 (0.34) | −37.3 (0.24) | −17.2 (0.53) | −20.5 (0.51) |
| F01 | −12.9 | −10.2 | 0.3 (0.82) | −0.6 (0.34) | −4.9 (0.26) | −7.6 (0.25) |
| F02 | −16.8 | −12.4 | −12.0 (0.70) | −9.6 (0.75) | −11.7 (0.70) | −14.6 (0.71) |
| F03 | −17.2 | −12.2 | −10.2 (0.16) | −7.3 (0.24) | −10.2 (0.22) | −13.2 (0.22) |
| F06 | −21.0 | −17.8 | −24.1 (0.34) | −14.1 (0.46) | −16.2 (0.51) | −19.7 (0.52) |
| R2 (ITC) |  | 0.782 | 0.870 | 0.745 | 0.671 | 0.704 |
| RMSE (ITC) |  | 4.6 | 14.0 | 9.0 | 4.4 | 4.5 |
| R2 (Gilson) |  |  | 0.841 | 0.892 | 0.923 | 0.925 |
| RMSE (Gilson) |  |  | 11.3 | 5.9 | 3.4 | 2.4 |

Binding free energy estimates based on the binding PMF for a minimized receptor structure suffice to provide high correlation with experiment (R2 = 0.884 for NAMD) and M2 free energy calculations (R2 = 0.827 for NAMD) (see Table 1). Surprisingly, binding free energies from NAMD GBSA calculations are more highly correlated with these benchmarks than ΔG^ from PBSA calculations. Ironically, the high correlation may be explained by inaccurately large binding PMF values resulting from highly charged ligands, as the molecules in this set with the strongest charges also tend to have stronger binding affinities. Although the correlation coefficient is high, the RMSE is also considerable, over 10 kcal/mol. Similar performance (R2 and RMSE) is observed by using the dominant state approximation with PBSA calculations. In contrast, using Eq. 23 with PBSA leads to less correlated (lower R2) but more accurate (lower RMSE) estimates of the binding free energy.

Even for this simple system, binding free energy estimates are substantially improved by using multiple receptor structures (Table 2). With binding PMFs from PBSA energies for 100 receptor structures, there is both higher correlation and lower RMSE with respect to the ITC experiments (R2 = 0.704, RMSE = 4.5 kcal/mol) and especially with respect to M2 free energy calculations (R2 = 0.925, RMSE = 2.4 kcal/mol).

While there are some variations on the order of a few kcal/mol, mean potential energy changes upon complexation are also consistent with results from Moghaddam et al.32 (Table 3). Minor discrepancies between M2 and implicit ligand free energy and mean potential energy calculations may be explained by a combination of imperfect sampling in the current calculations and the approximations in M2. As the described calculations were performed in vacuum, the samples may not be from the same configurational space as those in implicit solvent. On the other hand, M2 assumes that the energy landscapes of the ligand, receptor, and complex are truncated harmonic wells with anharmonicity corrections.

Table 3.

Estimates of the mean potential energy changes (kcal/mol) upon the binding of various ligands to CB[7]. The columns refer to van der Waals (VDW), Coulomb (Coul), electrostatic solvation (PB), valence (Val, bond + angle + dihedral), nonpolar solvation (NP), and total energies. The value in the parentheses is the standard deviation from bootstrapping: the observable is estimated based on 1000 random selections of 100 values of Θ^. In Table II of the supplementary material,37 mean potential energies for the ligand, receptor, and complex are also shown.

| Ligand | VDW | Coul | PB | Val | NP | Total |
| --- | --- | --- | --- | --- | --- | --- |
| AD1 | −32.5 (0.471) | 0.1 (1.509) | 4.8 (1.547) | −5.0 (2.785) | −2.5 (0.011) | −35.2 (2.547) |
| AD2 | −33.6 (0.931) | −65.8 (1.032) | 64.9 (0.783) | −5.9 (1.910) | −2.5 (0.017) | −42.9 (1.693) |
| AD3 | −32.8 (0.718) | −64.4 (0.693) | 62.2 (0.855) | −5.7 (2.128) | −2.6 (0.009) | −43.4 (2.388) |
| AD4 | −38.1 (1.400) | −125.2 (3.283) | 124.4 (1.003) | 1.9 (4.475) | −2.7 (0.070) | −39.9 (2.817) |
| AD5 | −33.3 (1.374) | −65.1 (1.549) | 64.8 (1.415) | −4.9 (1.782) | −2.5 (0.025) | −40.9 (1.834) |
| B02 | −33.3 (0.622) | −5.8 (1.067) | 9.7 (0.770) | 1.4 (3.379) | −2.7 (0.022) | −30.6 (2.187) |
| B05 | −32.9 (0.896) | −138.2 (1.231) | 138.0 (1.151) | −2.3 (1.236) | −2.8 (0.013) | −38.1 (1.673) |
| B11 | −39.9 (1.192) | −199.3 (2.280) | 212.0 (1.066) | −5.6 (5.431) | −3.4 (0.075) | −36.2 (4.475) |
| F01 | −26.2 (0.497) | −8.2 (1.824) | 14.2 (1.119) | 8.7 (4.842) | −2.7 (0.017) | −14.3 (4.079) |
| F02 | −26.9 (1.518) | −65.7 (2.078) | 65.9 (0.923) | −0.9 (2.237) | −3.0 (0.012) | −30.6 (2.253) |
| F03 | −28.7 (0.832) | −58.0 (0.987) | 64.2 (0.649) | −0.4 (3.552) | −3.0 (0.015) | −26.1 (3.428) |
| F06 | −35.1 (1.154) | −116.1 (0.810) | 120.9 (0.651) | −8.8 (4.862) | −3.5 (0.013) | −42.6 (4.132) |

Compared to the full procedure for estimating the binding PMF, applying the dominant state approximation to the binding PMF leads to a reduction in the correlation with M2 results and an increase in the RMSE (Table 4). In contrast, applying the dominant state approximation to calculate ΔG^ from B^(rR) leads to a near-constant reduction of about 3 kcal/mol in the estimated binding free energy. While the RMSE increases, the correlation with M2 results remains nearly identical. Given a set of B^(rR) results, however, there is essentially no reason to apply the dominant state approximation rather than Eq. 22.

Table 4.

Estimates of the binding free energy ΔG° (kcal/mol) using the PBSA model. First, the binding PMF B(rR) is estimated with the dominant state approximation (min {Ψ(rR)}) or based on Eq. 23, using Hamiltonian replica exchange for Bcpl (HREX). Then, ΔG^ is from the dominant state approximation (min{B^(rR)}) or based on Eq. 22 (EXP). The bottom rows show the correlation coefficient (R2) and root mean square error (RMSE, Eq. 25) with respect to isothermal titration calorimetry experiments (ITC) and mining minima calculations (Gilson) from Moghaddam et al.,32 and the fourth column.

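Column headings give the B^(rR) estimator / ΔG^ estimator.

| Ligand | min{Ψ(rR)} / min{B^(rR)} | min{Ψ(rR)} / EXP | HREX / min{B^(rR)} | HREX / EXP |
| --- | --- | --- | --- | --- |
| AD1 | −28.6 | −27.2 | −22.0 | −20.1 |
| AD2 | −36.4 | −34.6 | −27.6 | −25.4 |
| AD3 | −38.1 | −36.8 | −27.6 | −26.2 |
| AD4 | −43.1 | −40.4 | −29.8 | −27.1 |
| AD5 | −35.8 | −33.6 | −26.8 | −24.4 |
| B02 | −29.8 | −27.9 | −21.0 | −18.1 |
| B05 | −37.9 | −35.6 | −23.7 | −21.4 |
| B11 | −48.5 | −45.7 | −23.1 | −20.5 |
| F01 | −22.7 | −21.3 | −10.2 | −7.6 |
| F02 | −30.9 | −28.8 | −17.0 | −14.6 |
| F03 | −28.7 | −27.0 | −14.5 | −13.2 |
| F06 | −35.6 | −33.8 | −21.3 | −19.7 |
| R2 (ITC) | 0.849 | 0.855 | 0.684 | 0.704 |
| RMSE (ITC) | 17.3 | 15.3 | 5.8 | 4.5 |
| R2 (Gilson) | 0.787 | 0.795 | 0.926 | 0.925 |
| RMSE (Gilson) | 15.8 | 13.9 | 3.5 | 2.4 |
| R2 (Exp, fourth column) | 0.723 | 0.736 | 0.996 |  |
| RMSE (Exp, fourth column) | 15.5 | 13.6 | 2.3 |  |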

There is considerable variation in the binding PMFs for the 100 receptor structures (Fig. 2 and Fig. 2 in the supplementary material37). For most of the ligands, the range of binding PMFs spans 10–20 kcal/mol. While the binding PMF of the minimized structure is often near the lower end of the binding PMF distribution, this is not always the case. For larger ligands, other receptor structures appear to yield lower binding PMFs. The fact that a single structure does not always lead to the lowest binding PMF shows a major limitation of using a single receptor structure to estimate binding free energies.

Figure 2.

(a) Histogram of binding PMF estimates B^(rR) (kcal/mol) of B02 to 100 snapshots of CB[7] using PBSA energies. The vertical line shows the mean binding PMF for the minimized receptor structure. (b) and (c) Estimates of the binding free energy ΔG° of B02 to CB[7] (kcal/mol), using PBSA energies, as a function of the number of receptor snapshots. The line and error bars denote the mean and standard deviation from bootstrapping: the binding free energy is estimated 100 times using random selections of N out of 100 binding PMFs. Analogous plots for the other ligands in this study are available as Figs. 2 and 3 in the supplementary material.37

In spite of the variability of binding PMFs, for the ligands in the test set, the average value of ΔG° appears to stabilize after a relatively small number (about 15) of receptor snapshots (Fig. 2 and Fig. 3 in the supplementary material37). Using a greater number of snapshots slightly reduces the variance of binding free energy estimates. After a certain point, however, further reduction in the variance of ΔG^ is limited by the variance in binding PMF estimates.
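The subsampling analysis behind this kind of convergence plot can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used for Fig. 2: the binding PMFs are synthetic Gaussian values, the temperature of 300 K is assumed, selections are drawn without replacement, the standard-state term ΔGξ is omitted, and the names `exp_average` and `subsample_curve` are hypothetical.

```python
import math
import random
import statistics

KT = 0.0019872 * 300.0  # kcal/mol; the temperature of 300 K is an assumption

def exp_average(pmfs, kT=KT):
    """-kT ln <exp(-B/kT)>, shifted by min(B) for numerical stability."""
    b_min = min(pmfs)
    mean_boltz = sum(math.exp(-(b - b_min) / kT) for b in pmfs) / len(pmfs)
    return b_min - kT * math.log(mean_boltz)

def subsample_curve(pmfs, sizes, n_repeats=100, seed=0):
    """Mean and standard deviation of the estimate for each subsample size N."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        # rng.sample draws n distinct snapshots (no replacement).
        estimates = [exp_average(rng.sample(pmfs, n)) for _ in range(n_repeats)]
        curve[n] = (statistics.mean(estimates), statistics.stdev(estimates))
    return curve

# Synthetic stand-in for 100 binding PMFs (kcal/mol); not data from the paper.
data_rng = random.Random(1)
pmfs = [data_rng.gauss(-15.0, 3.0) for _ in range(100)]

curve = subsample_curve(pmfs, sizes=[5, 15, 50, 100])
for n, (mean, sd) in curve.items():
    print(f"N = {n:3d}   mean = {mean:7.2f}   sd = {sd:5.2f}")
```

With selections drawn without replacement, the spread necessarily vanishes at N = 100, so the informative part of such a curve is how quickly the mean levels off at smaller N.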

DISCUSSION

While the good agreement between implicit ligand and M2 calculations provides a proof of principle, the convergence and accuracy of implicit ligand calculations will differ with other classes of receptor-ligand pairs. With protein-ligand pairs, for example, representative sampling of receptors and finding low-energy poses of the ligand will likely require much more MD simulation time. On the other hand, many protein-ligand systems are not as strongly charged and may be less sensitive to the electrostatic solvation free energy. Because of this variability, assessments of the feasibility of implicit ligand calculations in different classes of systems will prove valuable. Tests for convergence and accuracy may be similar to those performed for the CB[7] system.

Numerous opportunities remain for further methodological improvement and optimization of implicit ligand free energy calculations. The accuracy of implicit ligand calculations (and M2 calculations) may be limited by the quality of the force field. The decomposition of the binding PMF in Eq. 23 provides a facile means to integrate alternate and potentially more expensive potential energies, e.g., quantum mechanical calculations or more sophisticated nonpolar solvation free energies. Modeling may also be improved by the inclusion of a few explicit water molecules (see supplementary material37). Another potential avenue for improvement is the fine-tuning of the replica exchange protocol (e.g., using implicit solvent or optimizing the number of stages and values of λ for a particular system) or implementing alternative methods to estimate the binding PMF and binding free energy.

Even without modifying the replica exchange protocol, computations may be accelerated by optimizing existing MD simulation packages for implicit ligand theory. Few modern MD simulation programs take full advantage of rigid degrees of freedom by skipping the calculation of pairwise interactions between rigid atoms. Even fewer implicit-solvent models are designed with rigid receptors in mind;38 implicit ligand theory may inspire the development of such models.
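The saving from skipping rigid-rigid pairs can be illustrated with a toy pairwise energy. The sketch below is not how NAMD or any other MD package is implemented; the Lennard-Jones parameters, the coordinates, and the names `build_pair_lists` and `energy_after_shift` are all assumptions made for illustration. Pairs between fixed receptor atoms are evaluated once and cached, so each subsequent energy evaluation only visits pairs that involve a mobile ligand atom.

```python
import itertools
import math

def lj(r, epsilon=0.1, sigma=3.4):
    """Lennard-Jones pair energy; parameter values are arbitrary."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def build_pair_lists(n_rigid, n_total):
    """Split all pairs (i < j) into rigid-rigid pairs and pairs with a mobile atom.

    Atoms 0 .. n_rigid - 1 are rigid, so a pair is rigid-rigid iff j < n_rigid.
    """
    rigid_rigid, mobile = [], []
    for i, j in itertools.combinations(range(n_total), 2):
        (rigid_rigid if j < n_rigid else mobile).append((i, j))
    return rigid_rigid, mobile

def pair_energy(coords, pairs):
    return sum(lj(math.dist(coords[i], coords[j])) for i, j in pairs)

# Toy system: 25 fixed "receptor" atoms on a grid and a 3-atom "ligand".
receptor = [(4.0 * i, 4.0 * j, 0.0) for i in range(5) for j in range(5)]
ligand = [(8.0, 8.0, 4.0), (12.0, 8.0, 4.0), (8.0, 12.0, 4.0)]
n_rigid, n_total = len(receptor), len(receptor) + len(ligand)

rigid_pairs, mobile_pairs = build_pair_lists(n_rigid, n_total)
e_rigid = pair_energy(receptor, rigid_pairs)  # computed once and cached

def energy_after_shift(dx):
    """Total energy after translating the ligand by dx along x, reusing e_rigid."""
    moved = [(x + dx, y, z) for x, y, z in ligand]
    return e_rigid + pair_energy(receptor + moved, mobile_pairs)

print(len(rigid_pairs), "pairs cached;", len(mobile_pairs), "pairs per evaluation")
```

In this toy case the per-evaluation work drops from all 378 pairs to the 78 that involve the ligand; the ratio becomes more favorable as the rigid receptor grows relative to the ligand.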

Implicit ligand theory also provides guidance on how to understand and improve existing molecular docking algorithms. The definition of Ψ(rRL) provides a straightforward functional form that can be used to account for solvation free energies and ligand internal energies (strain), which have been noted to be important factors in binding free energies,39 but are frequently ignored in the interaction energy functions used by docking packages. Implicit ligand theory also delineates how to improve the ranking of different ligands. Molecular docking packages currently rank receptor-ligand binding free energies based on a single low-energy configuration. As such, they apply the crudest form of implicit ligand theory, the dominant state approximation, to estimate both the binding PMF and the binding free energy. With important modifications to existing algorithms and the application of more complex estimators, the accuracy of scoring functions should be enhanced.

One important potential change to molecular docking is the inclusion of multiple receptor configurations. While most modern docking packages account for the orientation and flexibility of the ligand, the large number of coordinates makes the treatment of receptor flexibility challenging. A number of groups have improved docking performance by treating receptor flexibility by using multiple structures from crystallography25, 40, 41 or MD simulations42 (the relaxed complex method7, 43, 44). Molecular dynamics simulations have also revealed binding sites not discovered by crystallography.45, 46 In the case of Human Immunodeficiency Virus integrase, insight into a new binding site even inspired the development of a new drug.47

Despite this success, it has hitherto remained unclear how to combine information from docking to different receptor snapshots. While averaging strategies have been empirically compared,48 the default strategy has been to rank the ligand using the minimal energy from docking to all the snapshots. With implicit ligand theory, it is clear that the binding free energy may also be estimated by using an exponential average or cumulant expansion of the binding PMF (which may still come from the dominant state approximation) for different snapshots.
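The three ways of combining per-snapshot binding PMFs mentioned here can be compared in a short sketch. It assumes a simple sample mean over snapshots, a temperature of 300 K, and synthetic PMF values, and it omits the standard-state term ΔGξ; the function names are hypothetical and the code is not taken from the paper's source distribution.

```python
import math
import random
import statistics

KT = 0.0019872 * 300.0  # kcal/mol; the temperature of 300 K is an assumption

def dominant_state(pmfs):
    """Dominant state approximation: the lowest binding PMF."""
    return min(pmfs)

def exp_average(pmfs, kT=KT):
    """Exponential average -kT ln <exp(-B/kT)> over snapshots."""
    b_min = min(pmfs)
    mean_boltz = sum(math.exp(-(b - b_min) / kT) for b in pmfs) / len(pmfs)
    return b_min - kT * math.log(mean_boltz)

def cumulant2(pmfs, kT=KT):
    """Second-order cumulant expansion: <B> - var(B) / (2 kT)."""
    return statistics.mean(pmfs) - statistics.pvariance(pmfs) / (2.0 * kT)

# Synthetic binding PMFs (kcal/mol) with a narrow spread; not data from the paper.
rng = random.Random(0)
pmfs = [rng.gauss(-20.0, 0.5) for _ in range(1000)]

print(f"dominant state      : {dominant_state(pmfs):8.2f}")
print(f"exponential average : {exp_average(pmfs):8.2f}")
print(f"2nd-order cumulant  : {cumulant2(pmfs):8.2f}")
print(f"plain mean          : {statistics.mean(pmfs):8.2f}")
```

The exponential average always lies between the minimum and the mean of the binding PMFs. The truncated cumulant expansion is only a reasonable substitute when the spread of binding PMFs is not much larger than kT, which is the regime chosen for the synthetic data above.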

The computational expense of the relaxed complex approach may be reduced by clustering snapshots and selecting a representative snapshot from each cluster.44 Assuming that the binding PMF is constant within the cluster, estimated averages may be weighted by the cluster size. Using a clustering algorithm based on QR factorization to select 33 representative structures, Amaro et al.44 were able to accurately reproduce a histogram of docking scores to over 400 structures. Further research will be necessary to develop and validate algorithms that reliably cluster receptor configurations in which the binding PMF is nearly constant.
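A cluster-weighted estimate of this kind might look like the sketch below. The representative PMFs and cluster populations are invented for illustration, the temperature of 300 K is assumed, ΔGξ is omitted, and `weighted_exp_average` is a hypothetical name. Under the stated assumption that the binding PMF is constant within a cluster, the weighted average over representatives is identical to the unweighted exponential average over every snapshot.

```python
import math

KT = 0.0019872 * 300.0  # kcal/mol; the temperature of 300 K is an assumption

def weighted_exp_average(pmfs, weights, kT=KT):
    """-kT ln sum_i w_i exp(-B_i/kT), with the weights normalized to sum to 1."""
    total = sum(weights)
    b_min = min(pmfs)  # shift for numerical stability
    boltz = sum(w / total * math.exp(-(b - b_min) / kT)
                for b, w in zip(pmfs, weights))
    return b_min - kT * math.log(boltz)

# Hypothetical representative binding PMFs (kcal/mol) and cluster populations.
rep_pmfs = [-18.0, -15.5, -12.0]
cluster_sizes = [10, 60, 30]

clustered = weighted_exp_average(rep_pmfs, cluster_sizes)

# Equivalent calculation in which every snapshot carries its cluster's PMF.
expanded = [b for b, n in zip(rep_pmfs, cluster_sizes) for _ in range(n)]
per_snapshot = weighted_exp_average(expanded, [1] * len(expanded))

print(f"cluster-weighted: {clustered:.3f}   per-snapshot: {per_snapshot:.3f}")
```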

Compared to the inclusion of multiple receptor structures, a more difficult task is the estimation of B(rR) using molecular docking, as this involves a paradigm shift from searching for a minimum to sampling from a distribution. Docking algorithms may be broadly classified into two categories: matching and docking simulation.49 Matching algorithms such as DOCK50 attempt to match a ligand into a model of the binding site. DOCK models both the ligand and the binding site as a set of spheres and uses algorithms from graph theory to align the ligand spheres into the binding site spheres. In docking simulation methods such as AutoDOCK49, 51, 52, 53 and MCDOCK,54 the ligand starts outside the binding site and its configuration and orientation are progressively modified to search for the lowest energy configuration of the complex.

Matching algorithms can estimate B(rR) by a postprocessing algorithm. That is, after low-energy complexes are found, they may be used to bias receptor-independent random sampling of the ligand orientation by a confining potential $U_c(\xi_L)$, for use in Eq. 17. With a harmonic potential for $U_c(\xi_L)$, the ligand orientation will come from a Gaussian distribution. An alternative postprocessing algorithm is to use the lowest energy structure from a matching algorithm as a starting point for a rigid-receptor MD simulation. This is not prohibitively expensive; Graves et al.8 even used MD simulations with flexibility near the binding site as a postprocessing step for molecular docking. Samples from this simulation would be used to estimate B(rR) based on Eq. 18 or 19.
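The connection between a harmonic confining potential and Gaussian sampling can be made concrete for the translational part of the ligand's external coordinates. The sketch below assumes a potential of the form 0.5 k |x − x0|² centered on a docked pose, a temperature of 300 K, and an arbitrary spring constant; rotational degrees of freedom are not treated, and `sample_confined_positions` is a hypothetical name.

```python
import math
import random
import statistics

KT = 0.0019872 * 300.0  # kcal/mol; the temperature of 300 K is an assumption

def sample_confined_positions(center, k_spring, n, kT=KT, seed=0):
    """Draw positions from exp(-U_c/kT) with U_c = 0.5 * k * |x - center|^2.

    Each Cartesian component is Gaussian with variance kT / k_spring.
    """
    sigma = math.sqrt(kT / k_spring)
    rng = random.Random(seed)
    return [tuple(rng.gauss(c, sigma) for c in center) for _ in range(n)]

# Hypothetical docked ligand center (Angstrom) and spring constant (kcal/mol/A^2).
center = (1.0, 2.0, 3.0)
k_spring = 2.0
samples = sample_confined_positions(center, k_spring, n=20000)

var_x = statistics.pvariance([s[0] for s in samples])
print(f"sample variance along x: {var_x:.3f}   expected kT/k: {KT / k_spring:.3f}")
```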

Docking simulation methods, on the other hand, will need to be modified to sample from a known distribution rather than to search for the minimum energy. This change may not require a complete revamp. Docking simulation algorithms are often based on Monte Carlo approaches, which preserve a desired distribution or may be readily modified to do so. For example, MCDOCK54 and early generations of AutoDOCK51, 52 use simulated annealing, a procedure for which it is possible to calculate the importance sampling weight.55
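How importance weights arise from an annealing schedule can be seen in a one-dimensional toy version of annealed importance sampling.55 The sketch below is not the MCDOCK or AutoDOCK algorithm: the two quadratic energy functions (in units of kT), the linear schedule, and the Metropolis step size are assumptions chosen so that the exact answer, ln(Z1/Z0) = ln 0.5, is known.

```python
import math
import random

def u0(x):
    """Reference energy in units of kT: standard normal distribution."""
    return 0.5 * x * x

def u1(x):
    """Target energy in units of kT: normal with mean 1 and standard deviation 0.5."""
    return 0.5 * ((x - 1.0) / 0.5) ** 2

def annealed_log_weight(rng, n_stages=100, n_mc=5, step=1.0):
    """One annealing run from u0 to u1; returns the log importance weight."""
    x = rng.gauss(0.0, 1.0)  # exact sample from exp(-u0)
    log_w = 0.0
    for k in range(1, n_stages + 1):
        lam_prev, lam = (k - 1) / n_stages, k / n_stages
        # Weight update is evaluated at the current point, before it is moved.
        log_w -= (lam - lam_prev) * (u1(x) - u0(x))
        # Metropolis moves that leave exp(-[(1 - lam) u0 + lam u1]) invariant.
        for _ in range(n_mc):
            trial = x + rng.uniform(-step, step)
            delta = ((1.0 - lam) * (u0(trial) - u0(x))
                     + lam * (u1(trial) - u1(x)))
            if delta <= 0.0 or rng.random() < math.exp(-delta):
                x = trial
    return log_w

def log_mean_exp(values):
    """Numerically stable ln of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

rng = random.Random(0)
log_weights = [annealed_log_weight(rng) for _ in range(1000)]
estimate = log_mean_exp(log_weights)  # estimates ln(Z1/Z0)
exact = math.log(0.5)
print(f"estimated ln(Z1/Z0) = {estimate:.3f}   exact = {exact:.3f}")
```

The average of the weights estimates the ratio of partition functions even though each annealing run is far too short to equilibrate, which is the property that would let a search-oriented annealing protocol double as a sampler with known weights.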

In addition to providing a path to rigorous binding free energies from molecular docking, implicit ligand theory also quantifies existing notions56 about whether molecular recognition proceeds by induced fit or conformational selection.57, 58 As all receptor configurations have finite Boltzmann probability, the issue is a matter of degree. Suppose that a receptor binds to two different ligands with the same binding free energy, one by conformational selection and the other by induced fit. If, to a good approximation, the complex is dominated by a single structure with receptor configuration $r_R^*$ such that $B(r_R) = \infty$ for all other receptor configurations, then Eq. 11 simplifies to $\Delta G^\circ = U(r_R^*) + B(r_R^*) + \beta^{-1}\ln Z_R + \Delta G_\xi$. For the ligand that binds by conformational selection, the probability $p(r_R^*) = e^{-\beta U(r_R^*)}/Z_R$ is reasonably high. In the induced fit complex, $U(r_R^*)$ is much less favorable and $B(r_R^*)$ must compensate accordingly to achieve the same ΔG°.
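The simplification can be written out in a few lines. The full statement of Eq. 11 lies outside this section, so the steps below assume that it has the form of an exponential average of the binding PMF over the receptor's Boltzmann distribution plus the standard-state term; the discrete sum over receptor states $i$ with probabilities $p_i$ is an illustrative simplification.

```latex
\begin{aligned}
\Delta G^\circ
  &= -\beta^{-1} \ln \sum_i p_i \, e^{-\beta B_i} + \Delta G_\xi,
  \qquad p_i = \frac{e^{-\beta U_i}}{Z_R} \\
  &\approx -\beta^{-1} \ln \left[ \frac{e^{-\beta U(r_R^*)}}{Z_R} \,
     e^{-\beta B(r_R^*)} \right] + \Delta G_\xi \\
  &= U(r_R^*) + B(r_R^*) + \beta^{-1} \ln Z_R + \Delta G_\xi .
\end{aligned}
```

Read this way, conformational selection and induced fit differ only in how a fixed sum $U(r_R^*) + B(r_R^*)$ is divided between the receptor's own energy and the binding PMF.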

Source code and data used in this paper are available at https://simtk.org/home/implicit_ligand .

ACKNOWLEDGMENTS

The author thanks David Beratan for being a supportive postdoctoral advisor, Aaron Virshup and Shahar Keinan for helpful discussions, John Chodera and David Mobley for comments on the manuscript, Yi Wang for suggesting CB[7] as a test case, Michael Gilson for providing parameters for CB[7] and its ligands, Clayton Jarratt for pinpointing the cause of negative surface areas in NAMD, and Emilio Gallicchio for sharing source code for the binding energy distribution analysis method (BEDAM). Calculations were performed using Duke Shared Computing Resources (DSCR). This research was funded by the National Science Foundation (NSF) (CHE10-57953), the National Institutes of Health (NIH) (2P50 GM-067082-06-10 and N00014-11-1-0729).

References

  1. Shoichet B. K., Nature (London) 432, 862 (2004). 10.1038/nature03197
  2. Klebe G., Drug Discovery Today 11, 580 (2006). 10.1016/j.drudis.2006.05.012
  3. Kim R. and Skolnick J., J. Comput. Chem. 29, 1316 (2008). 10.1002/jcc.20893
  4. Moitessier N., Englebienne P., Lee D., Lawandi J., and Corbeil C. R., Br. J. Pharmacol. 153, S7 (2009). 10.1038/sj.bjp.0707515
  5. Plewczynski D., Laźniewski M., Augustyniak R., and Ginalski K., J. Comput. Chem. 32, 742 (2010). 10.1002/jcc.21643
  6. Wang J., Morin P., Wang W., and Kollman P., J. Am. Chem. Soc. 123, 5221 (2001). 10.1021/ja003834q
  7. Lin J., Perryman A., Schames J., and McCammon J., Biopolymers 68, 47 (2003). 10.1002/bip.10218
  8. Graves A. P., Shivakumar D. M., Boyce S. E., Jacobson M. P., Case D. A., and Shoichet B. K., J. Mol. Biol. 377, 914 (2008). 10.1016/j.jmb.2008.01.049
  9. Thompson D. C., Humblet C., and Joseph-McCarthy D., J. Chem. Inf. Model. 48, 1081 (2008). 10.1021/ci700470c
  10. Hou T., Wang J., Li Y., and Wang W., J. Comput. Chem. 32, 866 (2010). 10.1002/jcc.21666
  11. Chang M. W., Ayeni C., Breuer S., and Torbett B. E., PLoS ONE 5, e11955 (2010). 10.1371/journal.pone.0011955
  12. Huang N., Kalyanaraman C., Bernacki K., and Jacobson M. P., Phys. Chem. Chem. Phys. 8, 5166 (2006). 10.1039/b608269f
  13. Activities have been assumed to be unity, a reasonable approximation in the limit of low concentrations.
  14. Gilson M. K., Given J., Bush B., and McCammon J., Biophys. J. 72, 1047 (1997). 10.1016/S0006-3495(97)78756-3
  15. Wang J., Tan C., Tan Y.-H., Lu Q., and Luo R., Commun. Comput. Phys. 3, 1010 (2008), available at http://www.global-sci.com/issue/abstract/readabs.php?vol=3&page=1010.
  16. Feig M. and Brooks C., Curr. Opin. Struct. Biol. 14, 217 (2004). 10.1016/j.sbi.2004.03.009
  17. Michel J. and Essex J. W., J. Med. Chem. 51, 6654 (2008). 10.1021/jm800524s
  18. Chang C.-E. and Gilson M. K., J. Comput. Chem. 24, 1987 (2003). 10.1002/jcc.10325
  19. Chang C.-E., Potter M. J., and Gilson M. K., J. Phys. Chem. B 107, 1048 (2003). 10.1021/jp027149c
  20. Lee M., Biophys. J. 90, 864 (2005). 10.1529/biophysj.105.071589
  21. Gallicchio E., Lapelosa M., and Levy R. M., J. Chem. Theory Comput. 6, 2961 (2010). 10.1021/ct1002913
  22. Cohen J., Arkhipov A., Braun R., and Schulten K., Biophys. J. 91, 1844 (2006). 10.1529/biophysj.106.085746
  23. Jiang W., Hodoscek M., and Roux B., J. Chem. Theory Comput. 5, 2583 (2009). 10.1021/ct900223z
  24. Gallicchio E. and Levy R. M., J. Comput.-Aided Mol. Des. 26(5), 505 (2012). 10.1007/s10822-012-9552-3
  25. Rao S., Sanschagrin P. C., Greenwood J. R., Repasky M. P., Sherman W., and Farid R., J. Comput.-Aided Mol. Des. 22, 621 (2008). 10.1007/s10822-008-9182-y
  26. Free Energy Calculations, edited by Chipot C. and Pohorille A. (Springer, Berlin, 2007), Vol. 86.
  27. Zwanzig R., J. Chem. Phys. 22, 1420 (1954). 10.1063/1.1740193
  28. Kirkwood J. G., J. Chem. Phys. 3, 300 (1935). 10.1063/1.1749657
  29. Bennett C. H., J. Comput. Phys. 22, 245 (1976). 10.1016/0021-9991(76)90078-4
  30. Shirts M. R. and Chodera J. D., J. Chem. Phys. 129, 124105 (2008). 10.1063/1.2978177
  31. Moghaddam S., Inoue Y., and Gilson M. K., J. Am. Chem. Soc. 131, 4012 (2009). 10.1021/ja808175m
  32. Moghaddam S., Yang C., Rekharsky M., Ko Y. H., Kim K., Inoue Y., and Gilson M. K., J. Am. Chem. Soc. 133, 3570 (2011). 10.1021/ja109904u
  33. Using the linear combination of pairwise overlap algorithm, NAMD 2.9 calculates a negative surface area for CB[7]. NAMD directly uses Appendix B of Weiser et al.,59 in which the P1 parameter for the N sp3 atom type with 1 bonded neighbor is 7.8602 × 10^−2, which is smaller than other P1 values and the corresponding P2 parameter. By definition, P1 should be larger than P2. To bring this parameter in line with other P1 values and to make it larger than P2, this parameter was multiplied by 10. The modified code yields a positive surface area for CB[7].
  34. Phillips J. C., Braun R., Wang W., Gumbart J., Tajkhorshid E., Villa E., Chipot C., Skeel R. D., Kalé L., and Schulten K., J. Comput. Chem. 26, 1781 (2005). 10.1002/jcc.20289
  35. Davis M. E., Madura J. D., Luty B. A., and McCammon J., Comput. Phys. Commun. 62, 187 (1991). 10.1016/0010-4655(91)90094-2
  36. Duane S., Kennedy A. D., Pendleton B. J., and Roweth D., Phys. Lett. B 195, 216 (1987). 10.1016/0370-2693(87)91197-X
  37. See supplementary material at http://dx.doi.org/10.1063/1.4751284 for tables and figures.
  38. Guvench O., Weiser J., Shenkin P., Kolossváry I., and Still W., J. Comput. Chem. 23, 214 (2001). 10.1002/jcc.1167
  39. Mobley D. L. and Dill K. A., Structure 17, 489 (2009). 10.1016/j.str.2009.02.010
  40. Craig I. R., Essex J. W., and Spiegel K., J. Chem. Inf. Model. 50, 511 (2010). 10.1021/ci900407c
  41. Bottegoni G., Rocchia W., Rueda M., Abagyan R., and Cavalli A., PLoS ONE 6, e18845 (2011). 10.1371/journal.pone.0018845
  42. Nichols S. E., Baron R., Ivetac A., and McCammon J. A., J. Chem. Inf. Model. 51, 1439 (2011). 10.1021/ci200117n
  43. Lin J., Perryman A., Schames J., and McCammon J., J. Am. Chem. Soc. 124, 5632 (2002). 10.1021/ja0260162
  44. Amaro R. E., Baron R., and McCammon J. A., J. Comput.-Aided Mol. Des. 22, 693 (2008). 10.1007/s10822-007-9159-2
  45. Schames J., Henchman R., Siegel J., Sotriffer C., Ni H., and McCammon J., J. Med. Chem. 47, 1879 (2004). 10.1021/jm0341913
  46. Amaro R. E., Minh D. D. L., Cheng L. S., Lindstrom W. M., Olson A. J., Lin J.-H., Li W. W., and McCammon J. A., J. Am. Chem. Soc. 129, 7764 (2007). 10.1021/ja0723535
  47. Durrant J. D. and McCammon J. A., BMC Biol. 9, 71 (2011). 10.1186/1741-7007-9-71
  48. Paulsen J. L. and Anderson A. C., J. Chem. Inf. Model. 49, 2813 (2009). 10.1021/ci9003078
  49. Morris G., Goodsell D., Halliday R., Huey R., Hart W., Belew R., and Olson A., J. Comput. Chem. 19, 1639 (1998).
  50. Kuntz I. D., Blaney J. M., Oatley S. J., Langridge R., and Ferrin T. E., J. Mol. Biol. 161, 269 (1982). 10.1016/0022-2836(82)90153-X
  51. Goodsell D. and Olson A., Proteins 8, 195 (1990). 10.1002/prot.340080302
  52. Morris G., Goodsell D., Huey R., and Olson A., J. Comput.-Aided Mol. Des. 10, 293 (1996). 10.1007/BF00124499
  53. Huey R., Morris G. M., Olson A. J., and Goodsell D. S., J. Comput. Chem. 28, 1145 (2007). 10.1002/jcc.20634
  54. Liu M. and Wang S., J. Comput.-Aided Mol. Des. 13, 435 (1999). 10.1023/A:1008005918983
  55. Neal R., Stat. Comput. 11, 125 (2001). 10.1023/A:1008923215028
  56. Carlson H., Curr. Opin. Chem. Biol. 6, 447 (2002). 10.1016/S1367-5931(02)00341-1
  57. Xu Y., Colletier J. P., Jiang H., Silman I., Sussman J. L., and Weik M., Protein Sci. 17, 601 (2008). 10.1110/ps.083453808
  58. Bucher D., Grant B. J., and McCammon J. A., Biochemistry 50, 10530 (2011). 10.1021/bi201481a
  59. Weiser J., Shenkin P. S., and Still W. C., J. Comput. Chem. 20, 217 (1999).

Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
