The Journal of Chemical Physics. 2012 Dec 21;137(23):230901. doi: 10.1063/1.4769292

Perspective: Alchemical free energy calculations for drug discovery

David L Mobley 1,2,a), Pavel V Klimovich 2
PMCID: PMC3537745  PMID: 23267463

Abstract

Computational techniques see widespread use in pharmaceutical drug discovery, but typically prove unreliable in predicting trends in protein-ligand binding. Alchemical free energy calculations seek to change that by providing rigorous binding free energies from molecular simulations. Given adequate sampling and an accurate enough force field, these techniques yield accurate free energy estimates. Recent innovations in alchemical techniques have sparked a resurgence of interest in these calculations. Still, many obstacles stand in the way of their routine application in a drug discovery context, including the one we focus on here, sampling. Sampling of binding modes poses a particular challenge as binding modes are often separated by large energy barriers, leading to slow transitions. Binding modes are difficult to predict, and in some cases multiple binding modes may contribute to binding. In view of these hurdles, we present a framework for dealing carefully with uncertainty in binding mode or conformation in the context of free energy calculations. With careful sampling, free energy techniques show considerable promise for aiding drug discovery.

INTRODUCTION

Structure-based drug design seeks to predict binding

Structure-based drug design seeks to take an experimental structure of a drug target and identify or design a small molecule which binds to this macromolecular target in a desired way, modulating its function and thereby treating a target disease or condition.1, 2, 3 The goal is a process which begins with a structure and yields a good drug.2, 4 Unfortunately, every aspect of this process has proven challenging.

Currently, computational techniques are applied throughout the discovery process.3, 5, 6, 7, 8 Of particular interest here are the early to middle stages of the process, where one seeks to first identify an initial “hit” (a small molecule which binds to the target with sufficient affinity to be interesting) and then improve this molecule's properties, affinity, and specificity to the point where it is a good candidate for further development as a drug.6, 7, 9 That is, we seek computational techniques which can be applied to hit identification (often called “virtual screening” as this is usually applied to screen libraries of compounds) and the lead optimization stage of drug discovery.

At the earliest stages of this process, large libraries of existing compounds are often considered,3, 10, 11, 12, 13 so computational techniques need to above all be fast, even if unreliable for many individual molecules.14 But once initial hits are identified, the goal changes from identifying which molecules bind, to making these initial hits bind better, otherwise improving their properties, or expanding chemical diversity.9 At this stage (lead optimization), accuracy is a much larger consideration than speed, as experiments now involve synthesis of new molecules and can be slow and expensive.3, 15 An accurate computational method could reduce the need for synthesis and experiment and accelerate the process.

Existing computational methods used in the pharmaceutical industry are mostly focused on the earliest stages of this process—library screening. These methods, including docking, chemoinformatics, and ligand-based methods, are highly approximate and often empirical. While they can be helpful for screening large libraries, their ability to predict binding strength is typically extremely poor.7, 10, 16, 17

Free energy calculations could guide structure-based design

Free energy calculations based on molecular simulations show promise at providing higher accuracy for the lead optimization stage of discovery. These calculations yield binding free energy (or affinity) results which are correct given the force field (which provides the potential energy as a function of system configuration), at least in the limit of adequate sampling and simulation time.4, 18, 19, 20, 21, 22 Our focus is mainly on “alchemical” free energy calculations, which see the most use, and particularly on innovations that make these calculations appealing for drug discovery, as well as challenges that still stand in the way of their widespread use.

Binding free energy calculations yield either absolute free energies (measuring the free energy of binding of a single ligand to a single receptor), or, more commonly, relative free energies (comparing the binding of related ligands to a receptor or a single ligand to related receptors). Relative calculations are generally thought to be more efficient.4, 19, 20, 23 This is partly because practitioners hope for a cancellation of errors where, for example, inadequate sampling of receptor motions for one ligand would have similar effects on binding of another ligand, thus yielding minimal errors. While this cancellation is not guaranteed, it seems reasonable to us that for small ligand modifications, relative free energy calculations will indeed be more efficient. Also, accurate prediction of relative binding free energies would be well suited to applications in lead optimization, as noted above. For these reasons, we focus here on relative calculations.

Alchemical free energy calculations work by introducing a series of intermediate unphysical states spanning between the desired end states. For example, for relative calculations, binding free energies are compared by changing one ligand into another or turning off interactions of one ligand in a receptor while turning on interactions of another ligand in a receptor (Fig. 3). The intermediate states here are, roughly speaking, associated with fractional presence of each ligand, as discussed in more detail elsewhere.4, 18, 19, 20, 21

Figure 3.

Standard thermodynamic cycle for relative free energy calculations. P represents the protein (or receptor) and L the ligands. To compute the relative binding free energy of two ligands, L1 and L2, to the same protein, L2 is mutated into L1 in solution yielding ΔΔGsolv, and L2 into L1 in the binding site, yielding ΔΔGsite. The difference, ΔΔGsite − ΔΔGsolv, is related to the difference of ΔG1 and ΔG2. In the figure, each box denotes a separate solvated system which does not interact with the other boxes.

Alchemical free energy calculations are built on groundwork laid by Kirkwood24 and Zwanzig,25 among others. Kirkwood described a coupling parameter approach, where a parameter (λ) controls interaction strength between a molecule or particle and the remainder of the system, and described how this could be used to compute free energy differences using thermodynamic integration (TI). Zwanzig showed that a free energy difference between two states can be computed via an appropriate exponential average of energy differences over an ensemble of configurations.25 The coupling parameter approach underlies all alchemical free energy simulations, and TI and the Zwanzig relation can be thought of as analysis techniques for these calculations.
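The two analysis routes named above can be illustrated on a toy problem. The following is a minimal sketch, not a protocol from this work: it assumes two one-dimensional harmonic wells (spring constants `K0` and `K1`, reduced units with `BETA = 1`), for which the free energy difference is known analytically, and it samples each state exactly rather than by simulation. All names and parameter values are illustrative assumptions.

```python
import math
import random

BETA = 1.0          # inverse temperature, reduced units (assumption)
K0, K1 = 1.0, 2.0   # spring constants of the two end states (assumption)


def sample_x(k, n, rng):
    """Draw n Boltzmann-distributed positions for U(x) = 0.5 * k * x**2."""
    sigma = 1.0 / math.sqrt(BETA * k)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def zwanzig(n=100_000, seed=1):
    """Exponential averaging: dF = -(1/beta) ln < exp(-beta (U1 - U0)) >_0."""
    rng = random.Random(seed)
    xs = sample_x(K0, n, rng)
    mean_boltz = sum(math.exp(-BETA * 0.5 * (K1 - K0) * x * x) for x in xs) / n
    return -math.log(mean_boltz) / BETA


def thermodynamic_integration(n_windows=11, n=20_000, seed=2):
    """TI: dF = integral over lambda of < dU/dlambda >_lambda (trapezoid rule)."""
    rng = random.Random(seed)
    lambdas = [i / (n_windows - 1) for i in range(n_windows)]
    means = []
    for lam in lambdas:
        k_lam = K0 + lam * (K1 - K0)  # coupling parameter scales the spring constant
        xs = sample_x(k_lam, n, rng)
        means.append(sum(0.5 * (K1 - K0) * x * x for x in xs) / n)
    h = lambdas[1] - lambdas[0]
    return h * (0.5 * means[0] + sum(means[1:-1]) + 0.5 * means[-1])


def exact():
    """Analytic free energy difference between the two harmonic wells."""
    return 0.5 * math.log(K1 / K0) / BETA


if __name__ == "__main__":
    print(f"exact   : {exact():.4f}")
    print(f"Zwanzig : {zwanzig():.4f}")
    print(f"TI      : {thermodynamic_integration():.4f}")
```

In this toy case both estimators converge easily because the end states overlap strongly; real alchemical transformations usually need intermediate λ states for the same reason the TI route uses several windows here.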

Since these techniques predate molecular simulations, they were quickly applied in new ways in simulations. Many early applications focused on solvation free energies.26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38 Soon after the basic idea for relative binding free energy calculations was laid out,39 the first applications to binding followed, in host-guest systems30, 40, 41 and proteins.35, 42, 43, 44, 45, 46, 47 Several early reviews provide useful perspective.48, 49

Despite early enthusiasm for alchemical calculations, a number of key innovations were necessary to make them more robust. The development of “soft-core potentials”50 or “separation-shifted scaling”51 led to much better convergence52 for transformations involving changes in the number of atoms or chemical structure, opening up new applications and improving performance. Rediscovered appreciation for the Bennett acceptance ratio approach53 for computing free energies54, 55 yielded a much more efficient analysis tool than the Zwanzig relation, and the subsequent generalization into the multistage Bennett acceptance ratio took this still further.56 Today, these improvements make solvation free energy calculations for small or at least fragment-like molecules essentially routine,57, 58 even for hundreds of molecules59, 60, 61, 62, 63 and binding free energy calculations are much more tractable.

Relative free energy calculations have seen some significant applications. The Jorgensen lab at Yale routinely applies MC-based alchemical free energy calculations to help guide optimization of lead candidates in an early-stage drug discovery setting.15, 64 Several other groups in academia have applied them in similar efforts.65, 66, 67 Industry has also found some use for these techniques,68, 69, 70 and several studies have argued that these methods are accurate enough to be used routinely in a drug discovery context.4, 71, 72, 73, 74 Several recent reviews cover foundations and highlights.4, 18, 19, 20, 75, 76

Robust free energy calculations could have a profound impact on the drug discovery process with a modest level of accuracy. To see this, consider a hypothetical discovery pipeline (following Ref. 4). A computational chemist participates in an existing drug discovery project which already has several hits which otherwise look promising but lack sufficient affinity. The computational chemist's job, each week, is to take a list of proposed compounds which could be made next and select the most promising for synthesis. For example, the medicinal chemistry team might propose 100 compounds and the computational chemist might need to select 10 for synthesis. Assume our goal is to gain a factor of 10 in binding affinity (or make the binding free energy better by ∼1.4 kcal/mol). What accuracy do we need to achieve to dramatically reduce the time or number of compounds that must be synthesized to reach this goal? It turns out that even a very modest level of accuracy can provide significant benefits for lead optimization (Fig. 1). To quantitatively analyze this, we assume our computational method yields correct affinity predictions with a given level of Gaussian random noise, and ask what level of noise we can tolerate. It turns out that the distribution of affinity changes seen in actual compounds proposed by medicinal chemists is very nearly Gaussian and centered at an affinity change of zero.77 Thus, a rather simple statistical analysis is possible. Screening a fixed number of compounds will result in finding an increasing number of highly potent compounds as the method's noise goes down (Fig. 1). We can use this to reduce the number of molecules which must be synthesized. Assume we will actually screen up to 10 molecules each week, and ask how many total molecules must be screened to gain a factor of 10 in affinity after filtering by our computational method. 
With 0.5 kcal/mol of noise, the number screened is reduced by a factor of 8; with 1.0 kcal/mol of noise, a factor of 5, and with 2.0 kcal/mol of noise, a factor of 3 (see Ref. 4). Thus, a method which could screen ∼10–100 molecules per week with even 2 kcal/mol of noise would impact lead optimization by reducing the synthesis needed in a lead series by a factor of 3.
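The qualitative effect of prediction noise on screening can be reproduced with a small Monte Carlo sketch. This is not the analysis of Ref. 4. It assumes that true affinity changes follow a zero-centered Gaussian whose width `sigma_true` is a hypothetical 1.0 kcal/mol (the text states only that the distribution is nearly Gaussian and centered at zero), that predictions equal the true value plus Gaussian noise, and that the 10 best-predicted of 100 proposed compounds are synthesized. "Enrichment" here is the hit rate among selected compounds divided by the hit rate among all proposed compounds, where a hit improves binding by at least 1.4 kcal/mol.

```python
import random


def enrichment(noise, n_proposed=100, n_selected=10, sigma_true=1.0,
               threshold=-1.4, n_trials=2000, seed=0):
    """Hit-rate enrichment from filtering proposals with a noisy predictor.

    noise      : standard deviation of the prediction error (kcal/mol)
    sigma_true : assumed width of the true affinity-change distribution
    threshold  : a compound is a hit if its true change is <= threshold
    """
    rng = random.Random(seed)
    hits_selected = 0
    hits_all = 0
    for _ in range(n_trials):
        true = [rng.gauss(0.0, sigma_true) for _ in range(n_proposed)]
        predicted = [t + rng.gauss(0.0, noise) for t in true]
        # Most negative predicted change first (strongest predicted improvement).
        order = sorted(range(n_proposed), key=lambda i: predicted[i])
        chosen = order[:n_selected]
        hits_selected += sum(true[i] <= threshold for i in chosen)
        hits_all += sum(t <= threshold for t in true)
    rate_selected = hits_selected / (n_trials * n_selected)
    rate_all = hits_all / (n_trials * n_proposed)
    return rate_selected / rate_all


if __name__ == "__main__":
    for noise in (0.5, 1.0, 2.0):
        print(f"noise {noise:.1f} kcal/mol -> enrichment {enrichment(noise):.1f}")
```

The trend, rather than the specific numbers, is the point: enrichment falls as noise grows but stays above 1 even for a fairly noisy predictor. The values depend directly on the assumed `sigma_true`.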

Figure 1.

The probability of synthesizing a compound with a particular binding free energy change. Filled regions indicate those compounds with at least a factor of 10 gain in binding affinity, and are labeled with the reduction in the number of compounds which would need to be synthesized (on average) to gain this factor of 10 in affinity. Blue is the approximate distribution observed experimentally for compounds proposed internally at Abbott; orange, green, and red are distributions generated by filtering molecules with a hypothetical computational method which gives correct free energy estimates with 2.0, 1.0, and 0.5 kcal/mol of noise, respectively. Figure adapted from Ref. 4.

In reality, binding affinity is just one consideration in lead optimization—improvements in affinity must be balanced against other factors such as solubility, bio-availability, stability, and so on.6, 7, 8 Still, accurate tools for affinity prediction will provide tremendous help. For example, ligand modifications made for solubility reasons must not ruin the binding affinity, so affinity prediction will be useful even while optimizing other factors.7, 20

In summary, free energy calculations could dramatically aid structure-based drug design, one of our overarching goals. But a concrete and hopefully realistic near-term goal is to routinely achieve accuracy better than 2 kcal/mol (root-mean-square error) in computing relative binding free energies of related molecules. If this is possible with reasonable computational efficiency, it would yield real improvements in early-stage drug discovery.

Free energy calculations face serious challenges

Free energy calculations, while in principle rigorous, face three main challenges. One is simply a logistical challenge—these calculations remain difficult to set up, conduct, and analyze, and choices which may be relatively unimportant for other types of molecular simulations (such as the choice of thermostat, barostat, or even cutoff78) can adversely affect accuracy. But beyond logistics, computed free energies can be in error because of limitations in the force field or due to inadequate sampling. In the limit of infinite sampling, most analysis approaches are guaranteed to yield correct free energies for the force field. But practical simulations may often fall short of this limit, and research on how much simulation is needed to reach this limit is ongoing.

Here, our interest is free energy techniques themselves—that is, obtaining correct results for the particular setup and choice of force field. So we focus primarily on limitations due to sampling, as new force field developments can generally easily be incorporated in free energy simulations. Evidence from solvation free energy studies suggests that current force fields actually may be good enough to reach the level of accuracy needed for relevance to pharmaceutical drug discovery, further justifying our focus on sampling.59, 60, 61, 62, 63, 79, 80, 81 It is certainly the case that force field limitations do exist,59, 62, 63, 82 but these are not our focus here. There are many different places where sampling can go wrong. We might have difficulty finding the relevant configurations of a system, or obtaining correct populations for these configurations. This could occur for the receptor, such as in the case of receptor conformational changes, or for the ligand (for ligand conformational changes or changes in binding mode).

Assessing the magnitude of the error introduced by sampling problems in relative free energy calculations can be difficult, but some compelling evidence shows these problems introduce significant errors. Since the sum of free energies around any thermodynamic cycle is by definition zero, cycle closure errors provide a lower bound on the amount of sampling error present in free energy calculations. In cycle closure analysis, the relative binding free energy of two ligands is computed by at least two different paths, forming a cycle. Cycle closure errors are reported relatively infrequently, but in cases where they have been reported, cycles may fail to close (indicating an error) by as much as several kcal/mol, far in excess of statistical error estimates,83, 84, 85, 86, 87 suggesting serious sampling problems. This may contribute to relative free energy calculations' reputation for unreliability, where performance can be good in some systems and terrible in others.
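Cycle closure analysis amounts to summing signed relative free energies around a closed path of ligands. The sketch below assumes a hypothetical three-ligand cycle with made-up ΔΔG values in kcal/mol. The ligand names, the numbers, and the helper `cycle_closure` are illustrative and are not taken from any of the cited studies.

```python
def cycle_closure(edges, cycle):
    """Sum relative free energies around a closed path of ligands.

    edges : dict mapping (a, b) -> computed ddG for transforming a into b
    cycle : sequence of ligand names; the path returns to the first entry
    An edge traversed against its computed direction contributes -ddG.
    """
    nodes = list(cycle)
    total = 0.0
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        if (a, b) in edges:
            total += edges[(a, b)]
        elif (b, a) in edges:
            total -= edges[(b, a)]
        else:
            raise KeyError(f"no computed transformation between {a} and {b}")
    return total


# Hypothetical computed values (kcal/mol), for illustration only.
computed = {("A", "B"): -1.2, ("B", "C"): 0.7, ("A", "C"): 0.4}

if __name__ == "__main__":
    error = cycle_closure(computed, ["A", "B", "C"])
    print(f"cycle closure error: {error:+.2f} kcal/mol")
```

A perfectly converged set of calculations would give zero. A nonzero sum much larger than the combined statistical uncertainty of the legs points to a systematic problem such as inadequate sampling.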

Some significant error in free energy calculations, then, originates with sampling problems. Considering a flexible ligand binding to a flexible receptor, our fundamental problem is that multiple states contribute to binding. These include multiple conformations of the receptor, multiple conformations of the ligand, and multiple orientations of the ligand,88 as we discuss further in Sec. 1E. Adequate sampling means that our simulations have visited all of the relevant states in the correct proportions, and thus we can obtain correct free energies. Some or all of these states will be within the same local minimum on the energy landscape, so interconversion between them will be relatively quick. But other states may be separated by substantial barriers. We will call these distinct metastable states separated by barriers “binding modes.” Different binding modes may differ in ligand orientation, ligand conformation, receptor (usually protein) structure, or even presence or absence of water molecules in the binding site region, for example.88 Given this (broad) definition of a binding mode, then, typical sampling errors in free energy calculations originate from a failure to adequately sample relevant ligand-receptor binding modes.

Many sampling problems involve binding mode sampling challenges

This problem of binding mode sampling is particularly important for relative free energy calculations. Specifically, relative free energy calculations (as we discuss below) typically assume that two related inhibitors share a common binding mode or that any energy barriers between binding modes are small, so their different potential binding modes interconvert quickly. While this is sometimes the case, there are many cases where the binding mode is not shared or not known a priori and interconversion is slow, leading to the potential for serious errors, as we show in the examples that follow.

To be more specific, whenever the dominant ligand binding mode is not definitively known for the two ligands being compared, and shared by these two ligands, the thermodynamic cycle may not close, except in the limit of impractically long simulations. In this case, no accurate affinity prediction is possible. Computed relative free energies may be incorrect by an unknown amount which, as we highlight below, is related to the free energy of changing binding modes. Recall that by “binding mode” here we include ligand orientation and also ligand and receptor conformation.

Binding modes are difficult to predict

Both experimental and computational evidence suggests this binding mode problem poses a real challenge. In lead optimization, one typically starts from a known inhibitor of a protein (often with a known binding mode) and modifies it to attempt to improve binding.19, 70, 89 In cases where the ligand binding mode is maintained as functional groups are modified, conventional relative free energy calculations may often work well without any special treatment of alternate potential binding modes. However, small modifications to ligands do on occasion yield big differences in ligand binding orientation,65, 88, 90, 91, 92, 93, 94, 95, 96, 97 and there is currently no way to know when this will happen. For example, Stout et al. developed a series of thymidylate synthase (TS) inhibitors based on phenolphthalein and phthalein derivatives. They found that changing a five membered ring to a six membered ring dramatically altered the binding mode, then adding a nitro group (removing two chlorines) to the resulting compound introduced yet another unexpected binding mode (Fig. 2). These changes led to marked differences in affinity and specificity for the enzyme, and the authors highlighted how some other structures show multiple binding orientations for individual inhibitors. They speculate that many TS inhibitors actually have multiple binding orientations, with the affinity involving a combination of these.91 A variety of data suggests this same conclusion may hold true in a variety of other systems.20, 68, 88, 97, 98, 99, 100, 101, 102, 103 Going beyond ligand orientation, closely related ligands can bind with significantly different protein conformations. 
Protein conformational changes can be important, and even protein side chain differences in the binding site can be slow and, calculations indicate, thermodynamically significant.101, 103, 104, 105 Ligand modifications can also result in changes of binding mode by way of displacing water molecules.87 So related ligands may have significantly different binding modes, in terms of binding orientation, protein conformation, or even binding site water occupancy.

Figure 2.

Thymidylate synthase inhibitors. (a) Initial binding mode. (b) Modified binding mode on addition of a nitro group and deletion of chlorines. (PDB codes 1TSL and 1TSM; Ref. 91).

The situation is even worse if we move beyond lead optimization and imagine using relative free energy calculations to screen binding of a small library of more dissimilar compounds. As structural dissimilarity grows, we can expect that alternate binding modes (in the form of alternate ligand orientations or conformations, or alternate protein structures) become increasingly likely. Furthermore, in screening a small library of compounds, there is no guarantee we know the bound structure of either ligand, let alone that they share a common binding mode.

Hence, the issues discussed here are likely to affect relative free energy calculations in a variety of different applications, even for drug-like ligands.

Computation provides some evidence concerning how much binding mode sampling problems can affect binding free energy estimates. Previously, we found that an alternate stable (and potentially reasonable) binding mode of phenol in a model binding site differed by 4.0 kcal/mol in free energy from the true binding mode, yet was predicted to be the dominant binding mode by some docking techniques.106 Mutating phenol into benzene in this binding mode, for example, would therefore yield computed binding free energies erroneously favoring binding of benzene over phenol by 4 kcal/mol, so appreciable errors are possible. Problems of similar magnitude were also observed in some relative free energy calculations: the ΔΔG between different small ligands in the polar lysozyme cavity varied by up to 4 kcal/mol depending on how the mutation was set up, for reasons that were attributed to sampling of binding modes.103

These trends seem to hold true in pharmaceutically relevant binding sites. For thrombin inhibitors, addition of a methyl group yields a ring flip which alters the binding mode; without special care, free energy calculations miss this change in binding mode and yield errors around 1 kcal/mol in computed relative free energies.105 Work on neutrophil elastase using end-point free energy calculations considered multiple binding modes and found one case where the relative free energy was clearly wrong due to two different kinetically distinct conformations of a ligand (by up to 1.6 kcal/mol). Obtaining correct binding free energies required including the free energy of flipping the relevant portion of the ligand.65 Michel et al. used dual-topologies to compute the relative free energy between different potential binding modes of the same ligands,107, 108 highlighting the fact that adequate sampling over binding modes is not guaranteed, nor is the binding mode always obvious. Here, subtle differences in ligand composition could easily alter the binding mode,107 and stable binding modes differed by 1 kcal/mol107 to 7 kcal/mol.108 One other study of particular interest highlighted multiple binding modes of catechol-O-methyltransferase inhibitors which had to be treated separately in relative calculations because of time scale issues.68 Multiple distinct binding modes were seen and in one case differed by up to 1.5 kcal/mol. Other applications have highlighted similar issues.18, 109, 110, 111, 112 Water plays a thermodynamically significant role as well—slow water motions into and out of binding sites can yield errors in computed relative free energies in excess of 10 kcal/mol.87 Thus, available computational data, though limited, suggest that issues relating to uncertain, incorrect, or changing ligand binding modes can introduce errors up to 7 kcal/mol in relative binding free energy calculations when these effects are ignored. 
Errors relating to binding mode may be a factor in both the overall unreliability of these calculations and in the cycle closure problems noted above.

Clearly, binding free energy calculations face very real sampling challenges. Modifying an existing ligand with a known binding mode, without taking into account the possibility of an important, slow change in binding mode, can lead to errors of several kcal/mol in relative free energy calculations. Uncertainty in ligand binding mode can introduce similar errors. We now discuss how typical relative free energy calculations are done, then highlight how binding mode changes can introduce these errors, and provide a general framework for handling multiple binding modes and changes in binding mode in the context of relative free energy calculations.

THEORY OF RELATIVE FREE ENERGY CALCULATIONS

Relative free energy calculations normally assume adequate sampling

Here, our interest is accurate calculations of relative binding free energies of different ligands to a receptor. We must assume, based on the analysis above, that their binding modes may not be completely known. As noted, the idea of “binding mode” here includes receptor conformation, as well as ligand position and orientation and ligand conformation.

Relative binding free energy calculations employ a thermodynamic cycle like that shown in Fig. 3. This cycle compares binding of ligands L1 and L2 to a single protein receptor, using an alchemical transformation of L2 into L1. The cycle involves turning L2 into L1 in solvent (or replacing L2 with L1), which yields ΔΔGsolv, and turning (or replacing) L2 into L1 in the receptor binding site, which yields ΔΔGsite. Since these compose two legs of a thermodynamic cycle as shown, we can then write ΔG1 − ΔG2 = ΔΔGsite − ΔΔGsolv, where ΔG1 is the standard binding free energy (also sometimes called the absolute binding free energy) for L1 and ΔG2 is the corresponding quantity for L2. Alternatively, ΔΔG1→2 = ΔΔGsite − ΔΔGsolv, where ΔΔG1→2 is the relative binding free energy.
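As a worked illustration of the cycle relation ΔG1 − ΔG2 = ΔΔGsite − ΔΔGsolv, the sketch below combines two hypothetical alchemical legs and converts the result into a ratio of binding constants. The leg values are invented for illustration; the conversion assumes 298.15 K and the standard relation ΔG = −RT ln K, under which a tenfold change in affinity corresponds to roughly 1.4 kcal/mol.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def cycle_ddg(ddg_site, ddg_solv):
    """Relative binding free energy from the two alchemical legs of the cycle."""
    return ddg_site - ddg_solv


def affinity_ratio(ddg, temperature=298.15):
    """Ratio of binding constants K1/K2 implied by ddg = dG1 - dG2."""
    return math.exp(-ddg / (R_KCAL * temperature))


if __name__ == "__main__":
    # Hypothetical leg values in kcal/mol, for illustration only.
    ddg = cycle_ddg(ddg_site=-3.1, ddg_solv=-1.7)
    print(f"ddG = {ddg:.2f} kcal/mol, K1/K2 = {affinity_ratio(ddg):.1f}")
```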

This thermodynamic cycle is the standard approach for relative binding free energies, but its implementation can take several forms. Single topology relative free energy calculations actually involve directly changing L2 into L1, while dual topology relative free energy calculations involve turning off the interactions of one ligand in the binding site while turning on interactions of the other ligand in the binding site.113 This issue is discussed in more detail in the supplementary material;113 here we focus mainly on single topology calculations, though the same considerations apply to dual topology calculations.113

Alternate binding modes pose potential problems for relative binding free energy calculations

To understand the nature of challenges related to binding mode sampling in free energy calculations, consider the case where related ligands L1 and L2 both bind to a receptor (as in Fig. 4 for binding of 5-chloro-2-methylphenol vs. 2-ethylphenol). Assume L1 has a known, single, dominant binding mode, but L2 has two distinct potential binding modes, both of which are stable in MD simulations on time scales longer than our typical binding free energy calculations. This scenario is described in Fig. 5. In our example here, L2 is a ligand which consists of L1 plus an additional functional group (green rectangle). L2 might share a binding mode with L1 (Fig. 5a). But L2 might alternatively have a modified binding mode (Fig. 5b).

Figure 4.

Relative free energy calculations might be done to compare binding of 5-chloro-2-methylphenol, left, with binding of 2-ethylphenol, right. Atoms which would be transformed into dummy atoms in a single topology calculation113 are shown in magenta and the scaffold is shown in black.

Figure 5.

Thermodynamic cycles for comparing binding of L1 and L2, where L2 has two possible binding modes, as indicated by the positioning of the green rectangle. In (a), L1 and L2 share a common scaffold (gray spheres) which has the same binding mode in both ligands (as in Fig. 3). In (b), L1 and L2 share a common scaffold (gray spheres) which is actually rotated in binding of L2 relative to L1.

Both these cycles yield identical results only in the limit of adequate sampling. While adequate sampling may be straightforward in the case of Fig. 5a, since the two ligands share the same binding mode, energy barriers mean it will be very challenging in Fig. 5b, which involves a change in binding mode. In this latter case, the computed ΔΔG1→2 will depend on the starting structure and be incorrect unless the simulations sample enough binding mode interconversion events.

To sum up, whenever different potential ligand binding modes are separated by large kinetic barriers, thermodynamic cycles of the sort in Fig. 5 are unlikely to close with normal simulation lengths due to convergence problems. This will be especially problematic whenever the binding mode of one (or more) of the potential ligands in the calculations is uncertain, or when their binding modes are different.

Longer simulations or separation of states can both solve convergence problems

Currently, there are two established approaches to solving these types of sampling problems. The most straightforward approach is simply to simulate longer until convergence is adequate; after all, both thermodynamic cycles yield correct free energies in the limit of infinite sampling. But this may not always be practical.114 Another approach involves separation of states (also called “integration over parts”115)—specifically, focusing sampling on individual regions of phase space likely to be important and treating kinetically distinct states separately.115, 116, 117, 118 For example, for proteins with multiple relevant conformations, one might consider each distinct stable conformation individually (such as in cases of different stable rotamers or isomers).104, 115, 116, 117, 119, 120, 121, 122, 123 For a ligand with multiple binding orientations, one might consider different orientations separately.65, 68, 106, 115, 116, 124 Here, we can think of each of these approaches as considering a different stable binding mode separately. In this scenario, the main advantage is that we only have to adequately sample each binding mode, not transitions between binding modes. The price we have to pay, however, is that we must obtain the relative free energies of the different binding modes. Still, in cases where transitions between binding modes are slow but transitions within binding modes are fast, this can dramatically improve convergence of free energy estimates.106, 124
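The bookkeeping behind separation of states can be written compactly. As a sketch, assume the bound state divides into non-overlapping binding modes i, each with its own configurational integral Z_i and its own binding free energy ΔG_i computed with sampling restricted to that mode (these symbols are introduced here for illustration). The total binding free energy then follows from adding the mode contributions, and it can be rewritten relative to any chosen reference mode r:

```latex
\begin{align}
Z_{\mathrm{bound}} &= \sum_{i} Z_i, \\
\Delta G &= -\beta^{-1} \ln \sum_{i} \exp\left(-\beta\, \Delta G_i\right) \\
         &= \Delta G_r - \beta^{-1} \ln\left[ 1 + \sum_{i \neq r}
            \exp\left(-\beta\,(\Delta G_i - \Delta G_r)\right) \right].
\end{align}
```

The last form makes the cost explicit: besides the binding calculation in the reference mode, one needs the free energy of each other mode relative to the reference.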

There is a simple general expression for relative free energies involving multiple binding modes

Separation of states approaches have a long history in free energy calculations, even predating simulations themselves.116 There have been a reasonable number of applications of these techniques to protein conformational changes, but relatively few to binding free energy calculations. And of these, most applications were in absolute free energy calculations,101, 103, 106, 124 with only very few to relative free energy calculations.68, 107

Here, we are interested in a general approach to handle multiple binding modes within relative free energy calculations. In this approach, we pick a reference binding mode for the ligand which we will use for doing the actual binding calculation, from which we will obtain ΔΔGsite, r, the binding free energy in that particular mode. Additionally, we need ΔGLn, ir, the free energies of taking each ligand Ln between binding mode number i and the reference binding mode r (“interconversion free energies” (IFE), in our terminology). Then we can write (motivated by Refs. 106 and 124; see the supplementary material113 for the derivation) the following general expression for the relative binding free energy:

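$$\Delta\Delta G_{1\to 2} = \Delta\Delta G_{\mathrm{site},r} - \Delta\Delta G_{\mathrm{solv}} + \beta^{-1}\ln\frac{1+\sum_{\mathrm{modes}\ i\neq r}\exp\left(-\beta\,\Delta G_{L_1,ir}\right)}{1+\sum_{\mathrm{modes}\ i\neq r}\exp\left(-\beta\,\Delta G_{L_2,ir}\right)} \qquad (1)$$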

Here the sum runs over different stable binding modes. Two important limiting cases of this expression are: (1) When all of the ΔGLn, ir are large and positive (when all other binding modes except r are unfavorable), Eq. 1 reduces to ΔΔG1 → 2 = ΔΔGsite, r − ΔΔGsolv; and (2) When all other binding modes are equivalent to the reference binding mode (when all binding modes are equal), then all of the ΔGLn, ir are zero and ΔΔG1 → 2 = ΔΔGsite, r − ΔΔGsolv + β⁻¹ ln(N/M), where N is the number of equivalent binding modes of L1 and M is the number of equivalent binding modes of L2. In the latter case, the term involving the logarithm essentially amounts to a ligand symmetry number correction analogous to those in Ref. 106.
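The general expression and its limiting cases can be checked numerically. The sketch below assumes Eq. 1 has the form ΔΔG1 → 2 = ΔΔGsite, r − ΔΔGsolv + β⁻¹ ln[(1 + Σ exp(−βΔGL1, ir)) / (1 + Σ exp(−βΔGL2, ir))], with each IFE taken as the free energy of moving the ligand from the reference mode r to mode i. That sign convention is inferred from the two limiting cases in the text and is an assumption here. The function name and all numerical inputs are hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def relative_binding_free_energy(ddg_site_r, ddg_solv, ife_l1, ife_l2,
                                 temperature=298.15):
    """Relative binding free energy (kcal/mol) with multiple binding modes.

    ife_l1, ife_l2: interconversion free energies of ligands L1 and L2
    from the reference mode r to each other mode i (assumed convention).
    """
    kt = KB_KCAL * temperature
    z1 = 1.0 + sum(math.exp(-dg / kt) for dg in ife_l1)
    z2 = 1.0 + sum(math.exp(-dg / kt) for dg in ife_l2)
    return ddg_site_r - ddg_solv + kt * math.log(z1 / z2)


# Limit (1): every other mode is strongly unfavorable, so no correction.
print(relative_binding_free_energy(-1.0, 0.5, [50.0], [50.0]))

# Limit (2): N = 3 equivalent modes for L1 and M = 2 for L2,
# giving a correction of kT ln(3/2).
print(relative_binding_free_energy(-1.0, 0.5, [0.0, 0.0], [0.0]))
```

With this convention, a ligand L2 that has more equivalent binding modes than L1 receives a more favorable relative binding free energy, as expected for a symmetry-number correction.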

This approach requires clear separation into multiple, non-overlapping binding modes. As we will discuss below, this is simple in the limit of extremely slow interconversions between binding modes, but when only a few transitions occur on simulation time scales, a clear definition of each binding mode is necessary.

Overall, this expression is a general one for handling multiple binding modes in the context of relative free energy calculations. Next, we consider several additional limiting cases.

In general, we may not know the binding mode of either ligand

In the general case (Eq. 1) each ligand L1 and L2 may have multiple binding modes88 or at least we may not know their dominant binding modes. To deal with this, we might dock each ligand into the binding site and run some short (i.e., nanosecond) MD simulations from several different starting poses.103, 106, 111 From these, we might identify stable states for each ligand which would be potential binding modes, but if time scales for interconversion are slow, we will have no data at this point about which binding modes are the most important to binding—we will simply know of a small set of stable binding modes of each ligand. Consider the case of two ligands, each with two possible binding modes (Fig. 6). Depending on how the calculations are set up, we can imagine two possible values for ΔG1—one going to binding mode 1 (left) of L1 and one going to binding mode 2 (right) of L1. Similarly, there are two possible values for ΔG2. The more negative of these free energies for each ligand will correspond to the more dominant binding mode,106 and we do not know a priori which this will be. This is one of the things we would like to find out, in addition to ΔΔG1→2.

Figure 6.


To correctly treat binding of L1 and L2 when both have multiple potential binding modes, we need a different approach depending on the preferred binding mode, as discussed in the text. Calculating ΔGL1, 1→2 and ΔGL2, 1→2 will tell us which binding modes are preferred, and we can combine this with Eq. 1 to get the correct binding free energy.

Assume that simulation time scales are short enough that the ligand does not switch between binding modes. Then, because each ligand has two possible binding modes, calculations will obtain one of the two different potential values for the free energy change in the binding site, ΔΔGsite, 1 and ΔΔGsite, 2, depending on the choice of reference binding mode for the calculations.125 In the limit where a single (common) binding mode dominates, the corresponding ΔΔGsite, r will yield the correct binding free energy, but we do not know which is correct at this point.

We note three limiting scenarios in this situation:

  1. Both ligands prefer the left (#1) binding mode.

  2. Both ligands prefer the right (#2) binding mode.

  3. The ligands prefer different binding modes (assume L1 prefers the left binding mode and L2 prefers the right binding mode).

These three cases require slightly different approaches, but to distinguish between them, we need to know which binding mode of each ligand is preferred. This, as discussed in Sec. 2B, will require a separate calculation of ΔGL1, 1→2, the free energy of taking L1 from the left to the right binding mode, and ΔGL2, 1→2, the corresponding free energy for L2. Calculation of these may be challenging in its own right, though we provide some discussion below of how these may be obtained (Sec. 3). For now, assume we have obtained these IFEs, allowing us to distinguish between the scenarios noted above.

With these quantities in hand, we now consider our three scenarios:

  1. Left binding mode: In this case, we follow the thermodynamic cycle labeled “Cycle 1” and ΔΔG1 → 2 = ΔΔGsite, 1 − ΔΔGsolv.

  2. Right binding mode: In this case, we follow the thermodynamic cycle labeled “Cycle 2” and ΔΔG1 → 2 = ΔΔGsite, 2 − ΔΔGsolv.

  3. Different binding modes: In this case, we obtain ΔΔG1 → 2 = ΔΔGsite, 1 − ΔΔGsolv + ΔGL2, 1 → 2 or ΔΔG1 → 2 = ΔΔGsite, 2 − ΔΔGsolv − ΔGL1, 1 → 2. These are thermodynamically equivalent, though in practice they will have different convergence properties. These follow from Eq. 1 in the appropriate limits.126

If the possibility of multiple binding modes is ignored, as in many relative calculations, then typically scenario 1 or scenario 2 will be assumed and no IFEs (ΔGL1, 1→2 and ΔGL2, 1→2) are calculated. This introduces an error of ΔGL2, 1→2 (or ΔGL1, 1→2 depending on the cycle) when scenario 3 describes binding. As noted in the Introduction, these free energies can be large, up to 7 kcal/mol.
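To make the role of the IFEs concrete, the following sketch classifies a two-ligand, two-mode system into the three scenarios above from the signs of ΔGL1, 1→2 and ΔGL2, 1→2, and reports the equilibrium population of the right-hand mode for a given IFE. It is an illustration under stated assumptions, not the authors' code: a negative IFE is taken to mean that mode 2 (right) is preferred, and the function names and inputs are hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def mode2_population(ife_1_to_2, temperature=298.15):
    """Equilibrium fraction of a ligand in mode 2 of a two-mode system."""
    kt = KB_KCAL * temperature
    return 1.0 / (1.0 + math.exp(ife_1_to_2 / kt))


def classify_scenario(ife_l1, ife_l2):
    """Return 1, 2, or 3 following the scenario numbering in the text.

    An IFE of exactly zero is treated as 'left preferred' for simplicity.
    """
    l1_prefers_right = ife_l1 < 0.0
    l2_prefers_right = ife_l2 < 0.0
    if not l1_prefers_right and not l2_prefers_right:
        return 1  # both ligands prefer the left mode
    if l1_prefers_right and l2_prefers_right:
        return 2  # both ligands prefer the right mode
    return 3      # the ligands prefer different modes


# Hypothetical IFEs in kcal/mol: L1 prefers left, L2 prefers right.
print(classify_scenario(2.0, -3.0))
print(f"L2 population in mode 2: {mode2_population(-3.0):.3f}")
```

In scenario 3, running only a single cycle and skipping the IFEs omits a term of roughly the size of the relevant IFE, which is why knowing these signs and magnitudes matters.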

Lead optimization campaigns will in some cases begin with knowledge of the dominant binding mode of one of the ligands. But even if we know the binding mode of one ligand, we may not know the binding mode of the other. Thus we still need ΔGL1, 1→2, as long as transitions between binding modes in the site are slow compared to simulation time scales.

The same framework applies when handling mutations to a common scaffold

In general, one ligand may not be a subset of the other, so instead of directly mutating L1 into L2, the free energy calculation may need to pass through a common scaffold (as shown in the supplementary material,113 Fig. 1(b)). Additionally, we may need to consider even more potential binding modes—for example, for mutations of 5-chloro-2-methylphenol to catechol, we tried four potential binding modes (which differed substantially in free energy) since there are four ways to overlay it onto catechol while preserving the position of the hydroxyl.106 To handle such cases, we introduce an additional intermediate state (Fig. 7) corresponding to the scaffold. Depending on whether these are single or dual topology calculations, specification of this scaffold may be explicit or implicit (supplementary material,113 Fig. 1).

Figure 7.


In the fully general case, each ligand (Ligand 1, Ligand 2) may have multiple potential binding modes, and one ligand may not be a subset of the other. In this case, for single topology calculations, we need to pass through an intermediate state or scaffold which is a common substructure of both ligands. Computing the relative free energy may require computing all of the ΔΔGsite, 1...ΔΔGsite, N, as well as the free energy of moving the scaffold between different scaffold binding modes. Here the different ligands are shown with colored additions representing functional groups; the common substructure is gray.

Figure 7 shows a variety of possible structures (which represent metastable states or potential binding modes) which may affect the relative binding free energy.127 What exact information do we need to get the relative binding free energy in this case? Minimally, we require a path from every stable potential binding mode to every other stable potential binding mode. We have considerable flexibility in how these paths can be constructed, including:

  1. Compute the free energy of altering the binding mode of the scaffold only, and all of the binding site free energies ΔΔGsite, i.

  2. Compute the free energy of altering the binding mode of each ligand in its binding site between all binding modes, and only one ΔΔGsite, i, avoiding the need for scaffold IFEs.

  3. Compute the free energy of altering the binding mode of just one ligand in its binding site, and all of the binding site free energies ΔΔGsite, i.

As we will discuss below in Sec. 3, different approaches for computing the binding mode IFEs may give us reason to favor one of these scenarios over the others, but all of these yield the requisite information. Addition of some redundancy is advisable to allow computation of cycle closure errors as a test for convergence.
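A cycle closure check of the kind mentioned above can be sketched as follows. Because free energy is a state function, the signed free energy changes around any closed path should sum to zero, so a nonzero sum estimates the combined error of the legs. The state labels, leg directions, and numbers below are hypothetical and chosen only for illustration.

```python
def cycle_closure_error(legs):
    """Sum of free energy changes around a closed, directed cycle.

    legs: list of (start_state, end_state, delta_g) tuples in path order.
    Raises ValueError if consecutive legs do not connect into a loop.
    """
    for (_, end, _), (start, _, _) in zip(legs, legs[1:] + legs[:1]):
        if end != start:
            raise ValueError("legs do not form a closed cycle")
    return sum(delta_g for _, _, delta_g in legs)


# Hypothetical cycle over two ligands in two binding modes (kcal/mol).
# A leg traversed against its computed direction enters with flipped sign.
legs = [
    ("L1:mode1", "L2:mode1", -1.0),  # binding site mutation in mode 1
    ("L2:mode1", "L2:mode2", -3.0),  # IFE of L2
    ("L2:mode2", "L1:mode2", 6.2),   # reverse of the mutation in mode 2
    ("L1:mode2", "L1:mode1", -2.5),  # reverse of the IFE of L1
]
print(f"cycle closure error: {cycle_closure_error(legs):.2f} kcal/mol")
```

A closure error that is large compared with the statistical uncertainty of the individual legs suggests that at least one leg has not converged.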

Assuming we have this information, we then use IFEs in combination with the other information available to identify which binding mode(s) are dominant and then construct a thermodynamic cycle such as Fig. 6 which includes our scaffold to get the correct relative free energy.

While our discussion here has focused on single topology calculations, similar issues face dual topology calculations as we note in the supplementary material.113

RECOMMENDED APPROACHES

For ligands with slow binding mode changes or unknown binding modes, efficient relative free energy calculations require the free energies of taking ligands between stable binding modes (the IFE) in the binding site. Otherwise, free energy calculations will need to be extremely long in order to achieve convergence. We envision several potential ways to obtain the IFE in the binding site (the free energies, ΔGLn, ir (Eq. 1), to take each binding mode to the reference binding mode).

Two different types of IFEs would suffice. We could use IFEs of the scaffold (as in Fig. 7) or IFEs of actual ligands. Scaffold IFEs may be easier to obtain due to fewer steric hindrances (and in special cases such as a scaffold consisting of a symmetric ring, scaffold IFEs may be zero128), though IFEs of a ligand can provide additional physical insight. These IFEs could be obtained in a number of ways, including potential of mean force approaches, relative free energy calculations between different binding modes,65, 68, 107, 108 and even absolute binding free energy calculations to get relative free energies of binding modes.106, 124 Relative calculations between binding modes seem likely to work particularly well, and would probably best be implemented with a dual topology strategy, where the ligand is turned off in one binding mode while being turned on in another binding mode. Orientational restraints106 could be used to ensure the two binding modes are well separated and the correct IFE is obtained, as in Rocklin et al.129 Other potential strategies for computing IFEs include Markov state models,124 metadynamics, and even the “deactivated morphing” approach.130

We recommend one of two approaches to calculate the IFEs in a relatively general way, though more alternatives are becoming available:

  1. Using absolute free energy calculations to compute the free energy to change the binding mode of the scaffold in the binding site. While this may seem overkill for a relative free energy calculation, there are two main benefits. First, it actually enables a set of relative free energy calculations to be used to obtain absolute binding free energies (by virtue of knowing the absolute binding free energy of our scaffold). Second, it scales extremely well with additional related ligands. Particularly, imagine instead of two related ligands we want to compare, we have 10 which share a common scaffold. We can compute absolute free energies of the scaffold binding modes just once, and use this one set of absolute binding free energy calculations to compute correct relative or absolute binding free energies of all 10 of our ligands. This is done simply by adding additional ligands (i.e., in Fig. 7) and connecting each ligand to the common scaffold.

  2. Using relative free energy calculations combined with orientational restraints to calculate the free energy to change the binding mode of each ligand in the binding site. With a dual topology scheme, we can turn a ligand into dummy atoms in one binding mode while turning it on from dummy atoms in another, potentially using orientational restraints to ensure that no interconversion between binding modes happens during the calculations and thus obtaining a correct IFE.107, 108, 129

Both of these approaches have the caveat that, as noted previously, treating different binding modes separately in these calculations requires careful separation of kinetically distinct binding modes.106, 124 Consider a ligand L1 which has two metastable binding modes A and B, and its transformation into L2 which has the same two binding modes. When the barrier between A and B is large enough that no binding mode interconversion occurs, separation into distinct binding modes is trivial. And when rapid interconversion between A and B occurs, no separation is necessary. But in the intermediate regime, when just one or several transitions between A and B occur (but not enough to ensure convergence), either the simulations need to be lengthened, or the simulation data need to be sorted by binding mode and each binding mode analyzed separately.
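For the intermediate regime, sorting simulation data by binding mode might look like the sketch below. It assumes each frame has already been assigned a binding mode label (for example from an orientational order parameter, which is not shown); the labels and the function name are hypothetical.

```python
def split_frames_by_mode(labels):
    """Group frame indices by binding mode and count mode transitions.

    labels: per-frame binding mode labels in trajectory order.
    Returns (frames_by_mode, n_transitions).
    """
    frames_by_mode = {}
    for index, label in enumerate(labels):
        frames_by_mode.setdefault(label, []).append(index)
    # A transition is any pair of consecutive frames with different labels.
    n_transitions = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return frames_by_mode, n_transitions


# Hypothetical trajectory that visits mode B once and returns to mode A.
labels = ["A", "A", "B", "B", "A"]
frames_by_mode, n_transitions = split_frames_by_mode(labels)
print(frames_by_mode, n_transitions)
```

A small transition count signals the intermediate regime described above, where each mode's frames would be analyzed separately rather than pooled.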

A third approach is to use enhanced sampling techniques21, 131, 132 to sample all relevant binding modes easily within the context of a single free energy calculation, eliminating the need for IFEs. However, success with these techniques so far seems confined mainly to model binding sites.21, 105, 110, 128, 133

Here, we focused on cases where likely possible binding modes can be discovered relatively easily, which is sometimes true but certainly will not always be the case. We have typically done this by docking ligands into binding sites, running molecular dynamics simulations beginning from a wide range of substantially different docking poses, and then clustering the resulting simulation snapshots to identify a variety of different binding modes which are well populated.101, 103, 106 In model sites we have studied, this has typically yielded a variety of plausible binding modes from which, using free energy calculations, we have often been able to identify the dominant binding modes.101, 103, 106 Others have had some success with a similar strategy.65, 124 However, in general the problem of identifying likely binding modes will be extremely challenging, especially in cases of dramatic receptor rearrangement on ligand binding, or where ligand modifications introduce unexpected and slow conformational rearrangement in the protein. In such cases, discovering candidate binding modes may be much more difficult. In principle the same considerations raised here still apply, but new approaches for binding mode exploration will likely be needed.

SUMMARY AND CONCLUSIONS

Relative free energy calculations are increasingly intriguing for structure-based drug discovery, yet still notoriously unreliable. Major challenges include force fields and sampling, and here our focus is on sampling. Problems with binding mode sampling can pose significant challenges, and the straightforward approach of simply lengthening simulations until convergence is achieved may not be adequate. Relative free energy calculations, we argue, can benefit from separation of states approaches, where different stable binding modes are treated separately to enhance convergence. This is especially important whenever the binding mode of one or both ligands is not absolutely known (that is, in essentially all discovery applications) and whenever the ligands being compared do not share the same binding mode. If ligand binding mode interconversions are ignored, thermodynamic cycles used in relative free energy calculations often fail to close, resulting in errors in computed binding free energies of several kcal/mol. Using the framework here requires free energies to take each ligand between its different potential binding modes, or to take the scaffold (common substructure) between its potential binding modes. One route for obtaining these free energies is absolute binding free energy calculations.

In general, when binding mode sampling is ignored in relative free energy calculations and binding mode interconversions are slow, the error introduced is related to the free energy to take either ligand from the binding mode actually used in the calculations to the correct binding mode(s), and can in some cases be several kcal/mol.88 It is difficult to say how often these effects will be important without more exhaustive computational or experimental study. As noted, at this point our data consist primarily of examples of unexpected binding modes and multiple relevant binding modes. But we suspect (probably conservatively) that at least 10% of cases will be affected by problems with binding mode sampling.

Historically, one argument in favor of relative free energy calculations over absolute binding free energy calculations is that the former are simpler and do not require turning the entire ligand into dummy atoms in the binding site. However, convergence of relative free energy calculations when ligand binding modes are uncertain (as in any predictive context) will likely require computing the free energy of taking the ligands or scaffold between different kinetically stable binding modes. One straightforward route to that information is provided by absolute binding free energy calculations, and another is to use dual topology relative free energy calculations to obtain the binding mode IFE for each ligand after a careful separation of binding modes.

In our view, the time is ripe for a careful examination of binding mode sampling issues, perhaps in model systems such as the lysozyme binding sites100, 101, 103, 105, 106, 128, 134, 135, 136, 137, 138, 139 and the cytochrome C cavities.109, 140, 141, 142 These are relatively simple binding sites which bind small, fragment-like ligands and yet are susceptible to the binding mode sampling problems discussed here. Much of the work in these sites has been with absolute free energy calculations (with notable exceptions105, 109, 128), but the available high quality data pave the way for a careful examination of relative free energy calculations in this context. These also should provide an excellent test case for new methodological innovations aimed at tackling binding mode sampling problems.105, 128

Overall, we believe the literature and our analysis suggest ligand binding mode sampling presents serious challenges to relative free energy calculations. It is easy in practice to ignore these issues and plunge ahead with such calculations. But the data suggest this is unwise if these calculations are ever to be robust enough for routine application in a predictive setting such as drug discovery. The issue of orientational and binding mode sampling for ligands needs to receive much more attention in relative free energy calculations (as it is beginning to in absolute free energy calculations101, 103, 106, 124). Hopefully, the separation of states approach we have highlighted will provide a route forward.

Already, a good number of free energy studies have achieved the level of accuracy needed to be helpful in drug discovery. If we can advance binding free energy techniques to where hydration free energy studies are today in terms of speed and accuracy, they will play a dramatically different role in the drug discovery process.

ACKNOWLEDGMENTS

D.L.M. thanks Vijay S. Pande (Stanford) for encouraging him to publish this analysis and for helpful discussions. We appreciate Gabriel Rocklin (UCSF), Thomas Steinbrecher (Karlsruhe Institute of Technology), Julien Michel (Edinburgh), John Chodera (Berkeley), Emilio Gallicchio (Rutgers), Robert Abel (Schrödinger), and Michael Shirts (University of Virginia) for a critical reading of the manuscript. We acknowledge the National Institutes of Health (Grant No. 1R15GM096257-01A1), the Louisiana Board of Regents Research Competitiveness and Research Enhancement Subprograms as well as the Louisiana Optical Network Initiative (supported by the Louisiana Board of Regents Post-Katrina Support Fund Initiative Grant No. LEQSF(2007-12)-ENH-PKSFI-PRS-01), and the National Science Foundation under NSF EPSCoR Cooperative Agreement No. EPS-1003897 with additional support from the Louisiana Board of Regents.

References

  1. Kuntz I. D., Meng E. C., and Shoichet B. K., Acc. Chem. Res. 27, 117 (1994). doi: 10.1021/ar00041a001
  2. Shoichet B. K., Nature (London) 432, 862 (2004). doi: 10.1038/nature03197
  3. Steinbrecher T., in Protein-Ligand Interactions, edited by Gohlke H. (Wiley-VCH, 2012).
  4. Shirts M. R., Mobley D., and Brown S. P., in Drug Design: Structure- and Ligand-based Approaches, edited by Merz K. M., Jr., Ringe D., and Reynolds C. H. (Cambridge University Press, 2010).
  5. Jorgensen W. L., Science 303, 1813 (2004). doi: 10.1126/science.1096361
  6. Schnecke V. and Bostrom J., Drug Discovery Today 11, 43 (2006). doi: 10.1016/S1359-6446(05)03703-7
  7. Reynolds C. H., in Drug Design: Structure- and Ligand-based Approaches, edited by Merz K. M., Jr., Ringe D., and Reynolds C. H. (Cambridge University Press, 2010), pp. 181–196.
  8. Green D. V. S., Leach A. R., and Head M. S., J. Comput.-Aided Mol. Des. 26, 51 (2011). doi: 10.1007/s10822-011-9514-1
  9. Manly C., Chandrasekhar J., Ochterski J., Hammer J., and Warfield B., Drug Discovery Today 13, 99 (2008). doi: 10.1016/j.drudis.2007.10.019
  10. Enyedy I. J. and Egan W. J., J. Comput.-Aided Mol. Des. 22, 161 (2008). doi: 10.1007/s10822-007-9165-4
  11. Klebe G., Drug Discovery Today 11, 580 (2006). doi: 10.1016/j.drudis.2006.05.012
  12. McInnes C., Curr. Opin. Chem. Biol. 11, 494 (2007). doi: 10.1016/j.cbpa.2007.08.033
  13. Erhardt P. W. and Proudfoot J. R., Compr. Med. Chem. II 1, 29 (2007). doi: 10.1016/B0-08-045044-X/00002-X
  14. Shoichet B. K., McGovern S., Wei B., and Irwin J., Curr. Opin. Chem. Biol. 6, 439 (2002). doi: 10.1016/S1367-5931(02)00339-3
  15. Jorgensen W. L., in Drug Design: Structure- and Ligand-based Approaches, edited by Merz K. M., Jr., Ringe D., and Reynolds C. H. (Cambridge University Press, 2010), pp. 1–14.
  16. Perola E., Walters W. P., and Charifson P. S., Proteins 56, 235 (2004). doi: 10.1002/prot.20088
  17. Warren G. L., Andrews C. W., Capelli A.-M., Clarke B., Lalonde J., Lambert M. H., Lindvall M., Nevins N., Semus S. F., Senger S., Tedesco G., Wall I. D., Woolven J. M., Peishoff C. E., and Head M. S., J. Med. Chem. 49, 5912 (2006). doi: 10.1021/jm050362n
  18. Christ C. D., Mark A. E., and van Gunsteren W. F., J. Comput. Chem. 31, 1569 (2010). doi: 10.1002/jcc.21450
  19. Michel J. and Essex J. W., J. Comput.-Aided Mol. Des. 24, 649 (2010). doi: 10.1007/s10822-010-9363-3
  20. Chodera J. D., Mobley D., Shirts M. R., Dixon R. W., Branson K., and Pande V. S., Curr. Opin. Struct. Biol. 21, 150 (2011). doi: 10.1016/j.sbi.2011.01.011
  21. Gallicchio E. and Levy R. M., Curr. Opin. Struct. Biol. 21, 161 (2011). doi: 10.1016/j.sbi.2011.01.010
  22. Here, our focus is on methods which are in principle exact, so we omit discussion of approximate end-point methods which are sometimes also called free energy calculations.
  23. It will be important to test this in simple systems such as model binding sites, but very little such work has yet been done.
  24. Kirkwood J. G., J. Chem. Phys. 3, 300 (1935). doi: 10.1063/1.1749657
  25. Zwanzig R. W., J. Chem. Phys. 22, 1420 (1954). doi: 10.1063/1.1740193
  26. Mruzik M., Abraham F., Schreiber D., and Pound G., J. Chem. Phys. 64, 481 (1976). doi: 10.1063/1.432264
  27. Mezei M., Swaminathan S., and Beveridge D. L., J. Am. Chem. Soc. 100, 3255 (1978). doi: 10.1021/ja00478a070
  28. Okazaki S., Nakanishi K., Touhara H., and Adachi Y., J. Chem. Phys. 71, 2421 (1979). doi: 10.1063/1.438647
  29. Postma J. P. M., Berendsen H. J. C., and Haak J. R., Faraday Symp. Chem. Soc. 17, 55 (1982). doi: 10.1039/fs9821700055
  30. Lybrand T. P., Ghosh I., and McCammon J. A., J. Am. Chem. Soc. 107, 7793 (1985). doi: 10.1021/ja00311a112
  31. Jorgensen W. L. and Ravimohan C., J. Chem. Phys. 83, 3050 (1985). doi: 10.1063/1.449208
  32. Bash P. A., Singh U. C., Langridge R., and Kollman P. A., Science 236, 564 (1987). doi: 10.1126/science.3576184
  33. Cieplak P., Singh U. C., and Kollman P. A., Int. J. Quantum Chem. 32, 65 (1987). doi: 10.1002/qua.560320811
  34. Fleischman S. and Brooks C. L., III, J. Chem. Phys. 87, 3029 (1987). doi: 10.1063/1.453039
  35. Brooks C. L., III, Int. J. Quantum Chem. 34, 221 (1988). doi: 10.1002/qua.560340720
  36. Hermans J., Pathiaseril A., and Anderson A., J. Am. Chem. Soc. 110, 5982 (1988). doi: 10.1021/ja00226a009
  37. Jorgensen W. L., Blake J., and Buckner J., Chem. Phys. 129, 193 (1989). doi: 10.1016/0301-0104(89)80004-7
  38. Rao B. and Singh U. C., J. Am. Chem. Soc. 111, 3125 (1989). doi: 10.1021/ja00191a003
  39. Tembe B. L. and McCammon J. A., Comput. Chem. 8, 281 (1984). doi: 10.1016/0097-8485(84)85020-2
  40. Lybrand T. P., McCammon J. A., and Wipff G., Proc. Natl. Acad. Sci. U.S.A. 83, 833 (1986). doi: 10.1073/pnas.83.4.833
  41. Jorgensen W. L., Boudon S., and Nguyen T. B., J. Am. Chem. Soc. 111, 755 (1989). doi: 10.1021/ja00184a067
  42. Wong C. F. and McCammon J. A., J. Am. Chem. Soc. 108, 3830 (1986). doi: 10.1021/ja00273a048
  43. Hermans J. and Subramaniam S., Isr. J. Chem. 27, 225 (1986).
  44. Bash P. A., Singh U. C., Brown F. K., Langridge R., and Kollman P. A., Science 235, 574 (1987). doi: 10.1126/science.3810157
  45. Rao S. N., Singh U. C., Bash P. A., and Kollman P. A., Nature (London) 328, 551 (1987). doi: 10.1038/328551a0
  46. Gao J., Kuczera K., Tidor B., and Karplus M., Science 244, 1069 (1989). doi: 10.1126/science.2727695
  47. Hermans J., Yun R., and Anderson A., J. Comput. Chem. 13, 429 (1992). doi: 10.1002/jcc.540130406
  48. Straatsma T. P. and McCammon J., Annu. Rev. Phys. Chem. 43, 407 (1992). doi: 10.1146/annurev.pc.43.100192.002203
  49. Reynolds C., King P., and Richards W., Mol. Phys. 76, 251 (1992). doi: 10.1080/00268979200101321
  50. Beutler T., Mark A. E., van Schaik R. C., Gerber P. R., and van Gunsteren W. F., Chem. Phys. Lett. 222, 529 (1994). doi: 10.1016/0009-2614(94)00397-1
  51. Zacharias M., Straatsma T. P., and McCammon J. A., J. Chem. Phys. 100, 9025 (1994). doi: 10.1063/1.466707
  52. Steinbrecher T., Mobley D., and Case D. A., J. Chem. Phys. 127, 214108 (2007). doi: 10.1063/1.2799191
  53. Bennett C., J. Comput. Phys. 22, 245 (1976). doi: 10.1016/0021-9991(76)90078-4
  54. Shirts M. R., Bair E., Hooker G., and Pande V. S., Phys. Rev. Lett. 91, 140601 (2003). doi: 10.1103/PhysRevLett.91.140601
  55. Lu N., Kofke D. A., and Woolf T. B., J. Comput. Chem. 25, 28 (2004). doi: 10.1002/jcc.10369
  56. Shirts M. R. and Chodera J. D., J. Chem. Phys. 129, 124105 (2008). doi: 10.1063/1.2978177
  57. Shirts M. R., Pitera J. W., Swope W. C., and Pande V. S., J. Chem. Phys. 119, 5740 (2003). doi: 10.1063/1.1587119
  58. Shirts M. R. and Pande V. S., J. Chem. Phys. 122, 134508 (2005). doi: 10.1063/1.1877132
  59. Mobley D., Bayly C. I., Cooper M. D., Shirts M. R., and Dill K. A., J. Chem. Theory Comput. 5, 350 (2009). doi: 10.1021/ct800409d
  60. Shivakumar D., Deng Y., and Roux B., J. Chem. Theory Comput. 5, 919 (2009). doi: 10.1021/ct800445x
  61. Shivakumar D., Williams J., Wu Y., Damm W., Shelley J., and Sherman W., J. Chem. Theory Comput. 6, 1509 (2010). doi: 10.1021/ct900587b
  62. Knight J. L. and Brooks C. L., III, J. Comput. Chem. 32, 2909 (2011). doi: 10.1002/jcc.21876
  63. Shivakumar D., Harder E., Damm W., Friesner R. A., and Sherman W., J. Chem. Theory Comput. 8, 2553 (2012). doi: 10.1021/ct300203w
  64. Luccarelli J., Michel J., Tirado-Rives J., and Jorgensen W. L., J. Chem. Theory Comput. 6, 3850 (2010). doi: 10.1021/ct100504h
  65. Steinbrecher T., Case D., and Labahn A., J. Med. Chem. 49, 1837 (2006). doi: 10.1021/jm0505720
  66. Steinbrecher T., Hrenn A., Dormann K., Merfort I., and Labahn A., Bioorg. Med. Chem. 16, 2385 (2008). doi: 10.1016/j.bmc.2007.11.070
  67. Lawrenz M., Wereszczynski J., Amaro R., Walker R., Roitberg A. E., and McCammon J. A., Proteins 78, 2523 (2010). doi: 10.1002/prot.22761
  68. Palma P. N., Bonifácio M. J., Loureiro A. I., and Soares-da-Silva P., J. Comput. Chem. 33, 970 (2012). doi: 10.1002/jcc.22926
  69. Reddy M. R. and Erion M. D., J. Am. Chem. Soc. 123, 6246 (2001). doi: 10.1021/ja0103288
  70. Erion M. D., Dang Q., Reddy M. R., Kasibhatla S. R., Huang J., Lipscomb W. N., and van Poelje P. D., J. Am. Chem. Soc. 129, 15480 (2007). doi: 10.1021/ja074869u
  71. Chipot C., Rozanska X., and Dixit S. B., J. Comput.-Aided Mol. Des. 19, 765 (2005). doi: 10.1007/s10822-005-9021-3
  72. Chipot C., New Algorithms for Macromolecular Simulation (Springer, 2006), pp. 184–209.
  73. Pearlman D. and Charifson P., J. Med. Chem. 44, 3417 (2001). doi: 10.1021/jm0100279
  74. Pearlman D., J. Med. Chem. 48, 7796 (2005). doi: 10.1021/jm050306m
  75. Steinbrecher T. and Labahn A., Curr. Med. Chem. 17, 767 (2010). doi: 10.2174/092986710790514453
  76. de Ruiter A. and Oostenbrink C., Curr. Opin. Chem. Biol. 15, 547 (2011). doi: 10.1016/j.cbpa.2011.05.021
  77. Hajduk P. J. and Sauer D., J. Med. Chem. 51, 553 (2008). doi: 10.1021/jm070838y
  78. Shirts M. R., Mobley D., Chodera J. D., and Pande V. S., J. Phys. Chem. B 111, 13052 (2007). doi: 10.1021/jp0735987
  79. Klimovich P. and Mobley D., J. Comput.-Aided Mol. Des. 24, 307 (2010). doi: 10.1007/s10822-010-9343-7
  80. Mobley D., Liu S., Cerutti D. S., Swope W. C., and Rice J. E., J. Comput.-Aided Mol. Des. 26, 551 (2012). doi: 10.1007/s10822-011-9528-8
  81. Kehoe C. W., Fennell C. J., and Dill K. A., J. Comput.-Aided Mol. Des. 26, 563 (2012). doi: 10.1007/s10822-011-9536-8
  82. Mobley D., Bayly C. I., Cooper M. D., and Dill K. A., J. Phys. Chem. B 113, 4533 (2009). doi: 10.1021/jp806838b
  83. van den Bosch M., Swart M., Snijders J., Berendsen H. J. C., Mark A. E., Oostenbrink C., van Gunsteren W. F., and Canters G., ChemBioChem 6, 738 (2005). doi: 10.1002/cbic.200400244
  84. de Graaf C., Oostenbrink C., Keizers P. H. J., van Vugt-Lussenburg B. M. A., Commandeur J. N. M., and Vermeulen N. P. E., Curr. Drug Metab. 8, 59 (2007). doi: 10.2174/138920007779315062
  85. Dolenc J., Oostenbrink C., Koller J., and van Gunsteren W. F., Nucleic Acids Res. 33, 725 (2005). doi: 10.1093/nar/gki195
  86. Villa A., Zangi R., Pieffet G., and Mark A. E., J. Comput.-Aided Mol. Des. 17, 673 (2003). doi: 10.1023/B:JCAM.0000017374.53591.32
  87. Michel J., Tirado-Rives J., and Jorgensen W. L., J. Phys. Chem. B 113, 13337 (2009). doi: 10.1021/jp9047456
  88. Mobley D. and Dill K. A., Structure 17, 489 (2009). doi: 10.1016/j.str.2009.02.010
  89. Jorgensen W. L., Ruiz-Caro J., Tirado-Rives J., Basavapathruni A., Anderson K. S., and Hamilton A. D., Bioorg. Med. Chem. Lett. 16, 663 (2006). doi: 10.1016/j.bmcl.2005.10.038
  90. Badger J., Minor I., Kremer M. J., Oliveira M. A., Smith T. J., Griffith J. P., Guerin D. M., Krishnaswamy S., Luo M., and Rossmann M. G., Proc. Natl. Acad. Sci. U.S.A. 85, 3304 (1988). doi: 10.1073/pnas.85.10.3304
  91. Stout T. J., Tondi D., Rinaldi M., Barlocco D., Pecorari P., Santi D. V., Kuntz I. D., Stroud R. M., Shoichet B. K., and Costi M. P., Biochemistry 38, 1607 (1999). doi: 10.1021/bi9815896
  92. Böhm H. and Klebe G., Angew. Chem., Int. Ed. 35, 2588 (1996). doi: 10.1002/anie.199625881
  93. Kim K. H., J. Comput.-Aided Mol. Des. 21, 63 (2007). doi: 10.1007/s10822-007-9106-2
  94. Kim K. H., J. Comput.-Aided Mol. Des. 21, 421 (2007). doi: 10.1007/s10822-007-9126-y
  95. Pei Z., Li X., Longenecker K., von Geldern T. W., Wiedeman P. E., Lubben T. H., Zinker B. A., Stewart K., Ballaron S. J., Stashko M. A., Mika A. K., Beno D. W. A., Long M., Wells H., Kempf-Grote A. J., Madar D. J., McDermott T. S., Bhagavatula L., Fickes M. G., Pireh D., Solomon L. R., Lake M. R., Edalji R., Fry E. H., Sham H. L., and Trevillyan J. M., J. Med. Chem. 49, 3520 (2006). doi: 10.1021/jm051283e
  96. Reich S. H., Melnick M., Davies J. F., Appelt K., Lewis K. K., Fuhry M. A., Pino M., Trippe A. J., Nguyen D., and Dawson H., Proc. Natl. Acad. Sci. U.S.A. 92, 3298 (1995). 10.1073/pnas.92.8.3298 [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. Stoll V., Stewart K. D., Maring C. J., Muchmore S. W., Giranda V., Gu Y.-G. Y., Wang G., Chen Y., Sun M., Zhao C., Kennedy A. L., Madigan D. L., Xu Y., Saldivar A., Kati W., Laver G., Sowin T., Sham H. L., Greer J., and Kempf D., Biochemistry 42, 718 (2003). 10.1021/bi0205449 [DOI] [PubMed] [Google Scholar]
  98. Montfort W. R., Perry K. M., Fauman E. B., Finer-Moore J. S., Maley G. F., Hardy L., Maley F., and Stroud R. M., Biochemistry 29, 6964 (1990). 10.1021/bi00482a004 [DOI] [PubMed] [Google Scholar]
  99. Stubbs M. T., Reyda S., Dullweber F., Moller M., Klebe G., Dorsch D., Mederski W. W. K. R., and Wurziger H., ChemBioChem 3, 246 (2002). [DOI] [PubMed] [Google Scholar]
  100. Graves A. P., Brenk R., and Shoichet B. K., J. Med. Chem. 48, 3714 (2005). 10.1021/jm0491187 [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Mobley D., Graves A. P., Chodera J. D., McReynolds A., Shoichet B. K., and Dill K. A., J. Mol. Biol. 371, 1118 (2007). 10.1016/j.jmb.2007.06.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  102. Constantine K. L., Mueller L., Metzler W. J., McDonnell P. A., Todderud G., Goldfarb V., Fan Y., Newitt J. A., Kiefer S. E., Gao M., Tortolani D., Vaccaro W., and Tokarski J., J. Med. Chem. 51, 6225 (2008). 10.1021/jm800747w [DOI] [PubMed] [Google Scholar]
  103. Boyce S. E., Mobley D., Rocklin G. J., Graves A. P., Dill K. A., and Shoichet B. K., J. Mol. Biol. 394, 747 (2009). 10.1016/j.jmb.2009.09.049 [DOI] [PMC free article] [PubMed] [Google Scholar]
  104. Mobley D., Chodera J. D., and Dill K. A., J. Chem. Theory Comput. 3, 1231 (2007). 10.1021/ct700032n [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Wang L., Berne B. J., and Friesner R. A., Proc. Natl. Acad. Sci. U.S.A. 109, 1937 (2012). 10.1073/pnas.1114017109 [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Mobley D., Chodera J. D., and Dill K. A., J. Chem. Phys. 125, 084902 (2006). 10.1063/1.2221683 [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Michel J. and Essex J. W., J. Med. Chem. 51, 6654 (2008). 10.1021/jm800524s [DOI] [PubMed] [Google Scholar]
  108. Michel J., Verdonk M., and Essex J. W., J. Chem. Theory Comput. 3, 1645 (2007). 10.1021/ct700081t [DOI] [PubMed] [Google Scholar]
  109. Banba S. and Brooks C. L. III, J. Chem. Phys. 113, 3423 (2000). 10.1063/1.1287147 [DOI] [Google Scholar]
  110. Banba S., Guo Z., and Brooks C. L. III, J. Phys. Chem. B 104, 6903 (2000). 10.1021/jp001177i [DOI] [Google Scholar]
  111. Stjernschantz E. and Oostenbrink C., Biophys. J. 98, 2682 (2010). 10.1016/j.bpj.2010.02.034 [DOI] [PMC free article] [PubMed] [Google Scholar]
  112. Riniker S., Christ C. D., Hansen N., Mark A. E., Nair P. C., and van Gunsteren W. F., J. Chem. Phys. 135, 024105 (2011). 10.1063/1.3604534 [DOI] [PubMed] [Google Scholar]
  113. See supplementary material at http://dx.doi.org/10.1063/1.4769292 for additional information on single versus dual topology calculations, a derivation of Eq. 1, and a discussion of scaffolds in relative free energy calculations.
  114. Mobley D., J. Comput.-Aided Mol. Des. 26, 93 (2012). 10.1007/s10822-011-9497-y [DOI] [PubMed] [Google Scholar]
  115. Gallicchio E. and Levy R. M., in Advances in Protein Chemistry and Structural Biology, edited by Christov C. (Elsevier, 2011), pp. 27–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
  116. DeVoe H., in Structure and Stability of Biological Macromolecules, edited by Timasheff S. N. and Fasman G. D. (Marcel Dekker, New York, 1969), pp. 1–63. [Google Scholar]
  117. Straatsma T. P. and McCammon J. A., J. Chem. Phys. 90, 3300 (1989). 10.1063/1.456651 [DOI] [Google Scholar]
  118. Leitgeb M., Schröder C., and Boresch S., J. Chem. Phys. 122, 084109 (2005). 10.1063/1.1850900 [DOI] [PubMed] [Google Scholar]
  119. Wade R. C. and McCammon J. A., J. Mol. Biol. 225, 697 (1992). 10.1016/0022-2836(92)90395-Z [DOI] [PubMed] [Google Scholar]
  120. Hodel A., Rice L. M., Simonson T., Fox R. O., and Brünger A. T., Protein Sci. 4, 636 (1995). 10.1002/pro.5560040405 [DOI] [PMC free article] [PubMed] [Google Scholar]
  121. Pitera J. W. and Kollman P. A., Proteins 41, 385 (2000). [DOI] [PubMed] [Google Scholar]
  122. Hritz J. and Oostenbrink C., J. Chem. Phys. 128, 144121 (2008). 10.1063/1.2888998 [DOI] [PubMed] [Google Scholar]
  123. König G. and Boresch S., J. Comput. Chem. 32, 1082 (2010). 10.1002/jcc.21687 [DOI] [PubMed] [Google Scholar]
  124. Jayachandran G., Shirts M. R., Park S., and Pande V. S., J. Chem. Phys. 125, 084901 (2006). 10.1063/1.2221680 [DOI] [PubMed] [Google Scholar]
  125. From one perspective, there are actually four possible values, since each ligand has two possible binding modes. However, the premise of our discussion is that the ligand does not transition between binding modes during the simulations, meaning that only two values for ΔΔGsite are likely in simulations.
  126. When the reference binding mode r = 1 and ΔGL2, 2→1 is large and negative, or the reference binding mode r = 2 and ΔGL1, 1→2 is large and negative, respectively.
  127. Note that Fig. shows ligands paired so that ΔΔGsite, i involves a calculation where the scaffold remains in a particular reference binding mode (metastable state). This is not an absolute requirement—we could attempt a calculation from binding mode 1 of L1 to binding mode 2 of L2. However, unless the scaffold is much faster to alter its binding mode than the individual ligands, this calculation will typically be much more demanding than calculation within a particular stable binding mode.
  128. Khavrutskii I. V. and Wallqvist A., J. Chem. Theory Comput. 7, 3001 (2011). 10.1021/ct2003786 [DOI] [PMC free article] [PubMed] [Google Scholar]
  129. Rocklin G. J., Mobley D., and Dill K. A., “Separated topologies – A method for relative binding free energy calculations using orientational restraints” (unpublished). [DOI] [PMC free article] [PubMed]
  130. Park S., Lau A. Y., and Roux B., J. Chem. Phys. 129, 134102 (2008). 10.1063/1.2982170 [DOI] [PubMed] [Google Scholar]
  131. Zheng L. and Yang W., J. Chem. Theory Comput. 8, 810 (2012). 10.1021/ct200726v [DOI] [PubMed] [Google Scholar]
  132. Gallicchio E. and Levy R. M., J. Comput.-Aided Mol. Des. 26, 505 (2012). 10.1007/s10822-012-9552-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  133. Jiang W. and Roux B., J. Chem. Theory Comput. 6, 2559 (2010). 10.1021/ct1001768 [DOI] [PMC free article] [PubMed] [Google Scholar]
  134. Morton A., Baase W., and Matthews B. W., Biochemistry 34, 8564 (1995). 10.1021/bi00027a006 [DOI] [PubMed] [Google Scholar]
  135. Morton A. and Matthews B. W., Biochemistry 34, 8576 (1995). 10.1021/bi00027a007 [DOI] [PubMed] [Google Scholar]
  136. Graves A. P., Shivakumar D., Boyce S. E., Jacobson M. P., Case D., and Shoichet B. K., J. Mol. Biol. 377, 914 (2008). 10.1016/j.jmb.2008.01.049 [DOI] [PMC free article] [PubMed] [Google Scholar]
  137. Deng Y. and Roux B., J. Chem. Theory Comput. 2, 1255 (2006). 10.1021/ct060037v [DOI] [PubMed] [Google Scholar]
  138. Gallicchio E., Lapelosa M., and Levy R. M., J. Chem. Theory Comput. 6, 2961 (2010). 10.1021/ct1002913 [DOI] [PMC free article] [PubMed] [Google Scholar]
  139. Malmstrom R. D. and Watowich S. J., J. Chem. Inf. Model. 51, 1648 (2011). 10.1021/ci200126v [DOI] [PMC free article] [PubMed] [Google Scholar]
  140. Fitzgerald M. M., Churchill M. J., McRee D. E., and Goodin D. B., Biochemistry 33, 3807 (1994). 10.1021/bi00179a004 [DOI] [PubMed] [Google Scholar]
  141. Musah R. A., Jensen G. M., Bunte S. W., Rosenfeld R. J., and Goodin D. B., J. Mol. Biol. 315, 845 (2002). 10.1006/jmbi.2001.5287 [DOI] [PubMed] [Google Scholar]
  142. Baron R. and McCammon J. A., Biochemistry 46(37), 10629 (2007). 10.1021/bi700866x [DOI] [PubMed] [Google Scholar]

Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
