The Journal of Chemical Physics. 2012 Apr 9;136(14):144102. doi: 10.1063/1.3701175

Theory of binless multi-state free energy estimation with applications to protein-ligand binding

Zhiqiang Tan 1,a), Emilio Gallicchio 2,a), Mauro Lapelosa 2,b), Ronald M Levy 2
PMCID: PMC3339880  PMID: 22502496

Abstract

The weighted histogram analysis method (WHAM) is routinely used for computing free energies and expectations from multiple ensembles. Existing derivations of WHAM require observations to be discretized into a finite number of bins. Yet, WHAM formulas seem to hold even if the bin sizes are made arbitrarily small. The purpose of this article is to demonstrate both the validity and value of the multi-state Bennett acceptance ratio (MBAR) method seen as a binless extension of WHAM. We discuss two statistical arguments to derive the MBAR equations, in parallel to the self-consistency and maximum likelihood derivations already known for WHAM. We show that the binless method, like WHAM, can be used not only to estimate free energies and equilibrium expectations, but also to estimate equilibrium distributions. We also provide a number of useful results from the statistical literature, including the determination of MBAR estimators by minimization of a convex function. This leads to an approach to the computation of MBAR free energies by optimization algorithms, which can be more effective than existing algorithms. The advantages of MBAR are illustrated numerically for the calculation of absolute protein-ligand binding free energies by alchemical transformations with and without soft-core potentials. We show that binless statistical analysis can accurately treat sparsely distributed interaction energy samples as obtained from unmodified interaction potentials that cannot be properly analyzed using standard binning methods. This suggests that binless multi-state analysis of binding free energy simulations with unmodified potentials offers a straightforward alternative to the use of soft-core potentials for these alchemical transformations.

INTRODUCTION

The weighted histogram analysis method (WHAM) (Ref. 1) has emerged as an effective, general method for computing free energies and expectations from multiple ensembles, for example, at different temperatures or with different biasing potentials.2, 3 There are a variety of ways to derive and understand WHAM, including the self-consistency approach1, 4 and the maximum likelihood approach.2, 5, 6 However, all existing derivations in the computational physics literature involve discretizing observations into a finite number of bins in order to construct proper histograms. On the other hand, it has been recognized that WHAM formulas remain mathematically defined even if the bin sizes are made arbitrarily small or, equivalently, if the actual data instead of their discretizations are used (e.g., Sec. 8.3.2 of Ref. 4). However, no formal account exists in the chemical physics literature of whether and under what conditions such a binless extension is valid.

At the same time, there have been extensive developments in the mathematical and statistical fields of theory and methods leading to essentially the binless extension of WHAM.7, 8, 9, 10 Shirts and Chodera11 presented the binless method as the result of making the optimal choice among a large class of estimators,10 and called it the multi-state Bennett acceptance ratio method (MBAR) because the method reduces to the optimal Bennett acceptance ratio (BAR) (Refs. 12, 13) in the case of only two ensembles. In this article, we discuss two statistical arguments to derive MBAR equations, in parallel to the self-consistency and maximum likelihood derivations already known for WHAM. Disseminating these concepts to the chemical physics community is helpful to better appreciate the theoretical foundations of the method and to highlight the connections between MBAR and WHAM, building on the established familiarity and expertise of practitioners with the latter.

To understand the binless formulation of WHAM from a theoretical perspective, an important quantity to consider is the measure of states, a non-negative measure from which the density of states is defined as the (Radon-Nikodym) derivative with respect to the counting or Lebesgue measure.14 From this perspective, the validity of MBAR as binless WHAM can be seen as follows. The measure of states can be consistently estimated in the sense that integrals of the density of states can be estimated with standard errors inversely proportional to the square root of the sample size, even though the density of states, in general, cannot be pointwise estimated at the usual rate of standard errors. Examples of integrals include the partition function or the probability that the value of a system observable falls into a given bin.

We also provide a number of analytically and computationally useful results on MBAR from the statistical literature. The maximum likelihood derivation shows that the MBAR estimators can be obtained by minimizing a convex objective function, equivalent to solving a system of self-consistent equations. Various fast and reliable numerical algorithms have been developed for such optimization problems. For example, the trust region algorithm is globally convergent at the second order.15 Computing MBAR estimates by these optimization algorithms can be more effective than by algorithms in current use;11 relevant comparisons have recently been reported in the context of solving WHAM equations.6

Statistical large-sample theory gives not only conditions under which the MBAR estimates are consistent and asymptotically normal but also formulas for asymptotic variance matrices, as the sample size grows to infinity. Although the theory can be applied to correlated data,7 the variance formulas are much simplified if the observations from each ensemble are independent.10, 16 These formulas can be used for variance estimation provided that observations are subsampled to be approximately independent. Alternatively, as also done here, block bootstrapping17 can be used to estimate statistical uncertainties taking into account data correlations.

We illustrate the advantages of MBAR, based on the sampled values directly without binning, over conventional WHAM, with binning, on the calculation of absolute protein-ligand binding free energies by alchemical transformations. These calculations take various forms18 but they all consist of collecting samples from simulations distributed along a suitable thermodynamic path connecting the coupled and uncoupled states of the ligand-receptor complex. The path is parameterized by a progress parameter λ whereby, for example, λ = 0 corresponds to the uncoupled state and λ = 1 to the coupled state. The progress parameter λ, in turn, dials the parameters of a hybrid potential in such a way that at λ = 1 it represents the bound complex and at λ = 0 the ligand and receptor are not interacting.19, 20, 21, 22, 23

In typical applications, the binding free energy is computed from the free energy differences between neighboring λ-states using only data collected at these states, with either pairwise exponential estimators or the more accurate BAR free energy estimator.13, 24, 25 These and analogous binding free energy estimators are notoriously affected by end point numerical instabilities near λ = 0, when the ligand and the receptor are nearly uncoupled. Under these conditions conformations are generated in which receptor and ligand atoms interpenetrate each other, yielding very large interaction energies. These cause instabilities which are difficult to overcome unless specialized soft-core potentials are employed.22, 26, 27, 28

Multi-state free energy estimation methods such as WHAM and MBAR (Refs. 3, 11) are beginning to be employed in binding free energy calculations. The general idea behind these methods is to efficiently extract information from all of the intermediate states so as to achieve binding free energy estimates with smaller statistical variance. One example in this class of methods is the binding energy distribution analysis method (BEDAM),29, 30 which is employed here. The method is based on the analysis of samples of the binding energy of the complex (defined as the change in the effective potential energy of the complex with implicit solvation for bringing receptor and ligand from infinite separation to the bound conformation) without internal conformational rearrangements. In BEDAM, the end point problem with unmodified potentials manifests itself near λ = 0 as large binding energy values spread over an extremely wide range, which, as we will show, makes the application of binning-based methods such as WHAM infeasible. Binless methods such as MBAR do not suffer from the same issues and are shown to be able to treat data sets of this kind. This observation opens the possibility that using binless multi-state inference methods such as MBAR in conjunction with standard functional forms for the interaction potentials could be as effective as using modified soft-core potentials to circumvent the end point problem of binding free energy calculations.

THEORY AND METHODS

Setup

Consider a generalized ensemble whose Boltzmann probability density function is

$$\frac{1}{Z_\theta}\, e^{-\theta^T u(x)}, \tag{1}$$

where u is a column vector of d generalized energy functions of the configuration x of the system, θ is a column vector, also of length d, of corresponding coefficients, and

$$Z_\theta = \int e^{-\theta^T u(x)}\, dx \tag{2}$$

is the generalized configurational partition function in physics or the normalizing constant in statistics. Throughout, a superscript T denotes transpose so that for two vectors a and b each of length d,

$$a^T b = \sum_{k=1}^{d} a_k b_k, \tag{3}$$

where $a_k$ and $b_k$ are vector elements, gives the inner product of a and b.

The foregoing notation is suitable to accommodate various applications. For example, the canonical ensemble at inverse temperature β = 1/kBT and potential energy function U(x), is recovered by setting d = 1, θ = β, and u(x) = U(x) in Eq. 3. Similarly, the isothermal grand-canonical ensemble for a neat substance is recovered with d = 2, θ = (β, βμ), and u(x) = (U(x), N), where μ is the chemical potential and N the number of particles, so that $\theta^T u(x) = \beta\,(U(x) + \mu N)$. (Note that in this case the system configuration x includes atomic coordinates as well as the number of particles N, and Eq. 2 includes a summation over N.) A variety of ensembles commonly used in molecular simulations can also be accommodated by this notation. For example, each replica of a temperature replica exchange simulation is a canonical ensemble at the corresponding temperature as described above. Free energy perturbation and “umbrella sampling” setups are obtained by setting the potential energy vector as u(x) = (U0(x), ω1(x), …, ωd(x)), where U0(x) is the reference potential and ωk(x) is the perturbation or umbrella potential in window k, and by setting the coefficient vector in window k as θk = (β, 0, …, 0, β, 0, …, 0), in which all elements are zero except for the first (corresponding to reference potential U0) and the (k + 1)th element corresponding to the perturbation potential ωk(x). For the binding free energy application illustrated below, we adopt the latter setup but with a simplified notation afforded by the particular linear form, ωk(x) = λkb(x), of the perturbation (see Sec. 3).
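As a small illustration of this bookkeeping, the sketch below builds the umbrella-sampling coefficient vector θk and evaluates the inner product of Eq. 3 for one configuration. The function names (`inner`, `umbrella_theta`) and all numerical values are hypothetical and chosen only for illustration.

```python
def inner(a, b):
    """Inner product a^T b of two equal-length vectors (Eq. 3)."""
    return sum(ak * bk for ak, bk in zip(a, b))


def umbrella_theta(beta, k, d):
    """Coefficient vector for window k (1-based) out of d windows.

    The vector has length d + 1: beta in position 0 (reference potential U0)
    and in position k (perturbation potential of window k), zeros elsewhere.
    """
    theta = [0.0] * (d + 1)
    theta[0] = beta
    theta[k] = beta
    return theta


beta = 1.0
# Made-up generalized energies (U0, w1, w2, w3) of a single configuration.
u = [-10.0, 0.5, 2.0, 4.5]
theta2 = umbrella_theta(beta, 2, 3)
reduced = inner(theta2, u)  # equals beta * (U0 + w2)
```

Storing the full vector u for every configuration is what later allows the same sample to be reweighted to any window, since only the coefficient vector changes.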

The notation introduced above is also useful to obtain compact expressions for thermodynamic observables. For example, the distribution and expectation of some observable c(x) under Eq. 1 can be obtained in compact form (see, for example, Eq. 22) by formally including c(x) as a component of the generalized energy vector u(x) with the corresponding coefficient in θ set to zero, so as to leave the physical system energy $\theta^T u(x)$ unchanged. In the following, we will implicitly assume that the generalized energy vector u(x) includes components related to system observables.

Assume that simulations are conducted at m coefficient vectors θj (j = 1, …, m) and with the same energy vector u(x). (Note that in this notation the dimensionality, d, of the θ and u vectors and the number of simulations, m, are, in general, distinct; for example, for temperature replica exchange d = 1 while m is the number of replicas.) Denote by {xji: i = 1, …, nj} the set of configurations of size nj obtained from the jth simulation, and denote by uji = u(xji) the corresponding generalized energy vectors, which, as discussed above, may also include system observables. The total sample size is $n = \sum_{j=1}^{m} n_j$. Typically, the low-dimensional vectors uji are stored, instead of the high-dimensional, full configurations xji. For example, in the case of free energy perturbation calculations, uji = (U0(xji), ω1(xji), …, ωd(xji)) contains the value of the perturbation potential, ωj(xji), corresponding to the same window as the observed conformation, xji, as well as values of the perturbation potential, ωk(xji), k ≠ j, for all other windows for the same conformation. This specification of u(x) captures well the type of data manipulations needed in multi-state inference methods such as WHAM and, as will be seen, the binless extension of WHAM.

Under Eq. 1, the induced probability density function of u(x) at θ is of the form

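$$\frac{1}{Z_\theta}\, \Omega(u)\, e^{-\theta^T u}, \tag{4}$$

where Ω(u), formally defined as

$$\Omega(u) = \int \delta\bigl(u(x) - u\bigr)\, dx \tag{5}$$

is a generalized density of states, which does not depend on θ. The partition function $Z_\theta$ can also be determined from Ω(u) as

$$Z_\theta = \int \Omega(u)\, e^{-\theta^T u}\, du. \tag{6}$$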

The density function 1 and relationship 2 are replaced by Eqs. 4, 6, respectively, when the data are reduced from xji to uji (i = 1, …, nj; j = 1, …, m).
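To make Eqs. 4 and 6 concrete, the following sketch evaluates them for a discrete analogue in which u is a scalar taking finitely many values, so the integral becomes a sum. The two-level density of states and the function names are assumptions made for illustration and do not come from the simulations discussed in this article.

```python
import math


def partition_function(omega, theta):
    """Discrete analogue of Eq. 6: Z_theta = sum_u Omega(u) * exp(-theta * u)."""
    return sum(w * math.exp(-theta * u) for u, w in omega.items())


def energy_distribution(omega, theta):
    """Probability of each value u under the discrete analogue of Eq. 4."""
    z = partition_function(omega, theta)
    return {u: w * math.exp(-theta * u) / z for u, w in omega.items()}


# Hypothetical two-level system: Omega(0) = Omega(1) = 1.
omega = {0.0: 1.0, 1.0: 1.0}
theta = math.log(2.0)
z = partition_function(omega, theta)   # 1 + 1/2
p = energy_distribution(omega, theta)  # probabilities 2/3 and 1/3
```

Note that Ω(u) is shared by all values of θ; only the exponential factor and the normalization change.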

From WHAM to binless WHAM

WHAM, first proposed by Ferrenberg and Swendsen,1 can be used to compute various quantities of interest. The method involves constructing a histogram, Nj(u), from each sample {uji: i = 1, …, nj}, where Nj(u) indicates the number of observations falling into a bin about u, for example, an interval or a rectangle if u(x) is one- or two-dimensional. Then Ω(u) is estimated by

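$$\hat{\Omega}(u)\,\Delta u = \frac{\sum_{r=1}^{m} N_r(u)}{\sum_{r=1}^{m} n_r \hat{Z}_{\theta_r}^{-1} e^{-\theta_r^T u}}, \tag{7}$$

where the partition function estimators $(\hat{Z}_{\theta_1},\ldots,\hat{Z}_{\theta_m})$ are defined by self-consistency according to Eq. 6

$$\hat{Z}_{\theta_k} = \sum_u \frac{\sum_{r=1}^{m} N_r(u)}{\sum_{r=1}^{m} n_r \hat{Z}_{\theta_r}^{-1} e^{(\theta_k-\theta_r)^T u}} \quad (k=1,\ldots,m), \tag{8}$$

where the summation ∑u is taken over all possible bins centered at u of size Δu. The estimators $(\hat{Z}_{\theta_1},\ldots,\hat{Z}_{\theta_m})$ are determined up to a multiplicative constant. It is customary to pick a reference value, for example, $Z_{\theta_1}$, and then estimate the ratios $(Z_{\theta_2}/Z_{\theta_1},\ldots,Z_{\theta_m}/Z_{\theta_1})$ from Eq. 8.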
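The self-consistent solution of Eqs. 7 and 8 can be sketched as a simple fixed-point iteration. The code below is a minimal illustration for scalar u, not the implementation used in this work; the function name `wham_ratios`, the fixed iteration count, and the toy histograms (whose counts are set equal to their expected values for a two-level system with Ω = 1 in each bin) are assumptions.

```python
import math


def wham_ratios(centers, counts, thetas, n_iter=200):
    """Fixed-point iteration of Eqs. 7 and 8 for scalar u.

    centers[b]  : representative energy of bin b
    counts[r][b]: histogram count N_r of simulation r in bin b
    thetas[r]   : scalar coefficient of simulation r
    Returns the estimated ratios Z_k / Z_1.
    """
    m = len(thetas)
    n = [sum(row) for row in counts]
    total = [sum(counts[r][b] for r in range(m)) for b in range(len(centers))]
    z = [1.0] * m
    for _ in range(n_iter):
        # Eq. 7: density-of-states estimate per bin (times the bin size).
        omega = [
            total[b] / sum(n[r] / z[r] * math.exp(-thetas[r] * u) for r in range(m))
            for b, u in enumerate(centers)
        ]
        # Eq. 8, written as Z_k = sum_u Omega(u) * exp(-theta_k * u).
        z = [sum(w * math.exp(-th * u) for w, u in zip(omega, centers)) for th in thetas]
        z = [zk / z[0] for zk in z]  # fix the reference value Z_1 = 1
    return z


centers = [0.0, 1.0]
thetas = [0.0, math.log(2.0)]
counts = [[30, 30], [40, 20]]  # toy counts equal to their expected values
ratios = wham_ratios(centers, counts, thetas)
```

For this toy input the exact ratio is Z2/Z1 = 1.5/2 = 0.75, which the iteration recovers because the histograms match their expectations exactly.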

Again by relationship 6, the partition function Zθ at any other parameter value is estimated by

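$$\hat{Z}_{\theta} = \sum_u \frac{\sum_{r=1}^{m} N_r(u)}{\sum_{r=1}^{m} n_r \hat{Z}_{\theta_r}^{-1} e^{(\theta-\theta_r)^T u}}. \tag{9}$$

Furthermore, let h(u) be a function of u, for example, a component of u, and denote by $\langle h\rangle_\theta$ the expectation of h(u) under Eq. 4, that is, the expectation of h(u(x)) under Eq. 1. From Eqs. 4, 7, the WHAM estimate $\hat{h}_\theta$ for $\langle h\rangle_\theta$ is

$$\hat{h}_\theta = \frac{1}{\hat{Z}_\theta} \sum_u h(u)\, \frac{\sum_{r=1}^{m} N_r(u)}{\sum_{r=1}^{m} n_r \hat{Z}_{\theta_r}^{-1} e^{(\theta-\theta_r)^T u}}. \tag{10}$$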

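This estimator depends on $(\hat{Z}_\theta,\hat{Z}_{\theta_1},\ldots,\hat{Z}_{\theta_m})$ up to a multiplicative constant, that is, only depends on the ratios $(\hat{Z}_\theta/\hat{Z}_{\theta_1},\hat{Z}_{\theta_2}/\hat{Z}_{\theta_1},\ldots,\hat{Z}_{\theta_m}/\hat{Z}_{\theta_1})$. It is interesting to note that the summation over bins in Eq. 10 can be equivalently expressed in terms of a weighted average over observations

$$\hat{h}_\theta = \sum_{j}\sum_{i} h(u_{ji}^{b})\, F_{ji}(\theta), \tag{11}$$

where $u_{ji}^{b}$ is a representative generalized energy of the bin containing $u_{ji}$, and $F_{ji}$ is the “WHAM weight” of $u_{ji}$ that, by comparing Eqs. 11, 10, is defined as

$$F_{ji}(\theta) = \frac{\hat{Z}_\theta^{-1}}{\sum_{r=1}^{m} n_r \hat{Z}_{\theta_r}^{-1} e^{(\theta-\theta_r)^T u_{ji}^{b}}} = \frac{1}{\hat{Z}_\theta}\, e^{-\theta^T u_{ji}^{b}}\, G_{ji} \tag{12}$$

and

$$G_{ji} = \frac{1}{\sum_{r=1}^{m} n_r \hat{Z}_{\theta_r}^{-1} e^{-\theta_r^T u_{ji}^{b}}} \tag{13}$$

is the θ-independent component of the WHAM weight $F_{ji}(\theta)$ for each observation.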

Equation 11 states that the expectation value of any observable can be obtained by attaching a statistical weight $F_{ji}(\theta)$ to each observation $u_{ji}$ which depends on the bin to which it is assigned. An obvious simplification is to express the WHAM estimate of $\langle h\rangle_\theta$ and the WHAM weights [Eqs. 11, 12] in terms of the actual observations $u_{ji}$ rather than their closest bin representatives $u_{ji}^{b}$. This idea, which has been noted before without formal justification in the computational physics literature,4, 31 leads naturally to a binless extension of WHAM. A closely related formalism has been developed in statistics for computing normalizing constants.7, 8, 9, 10 Although this method can be derived by various statistical arguments, it is essentially an extension of WHAM without binning data. Below we give a formal derivation of the binless method by importance weighting and self-consistency.

To understand binless WHAM, it is useful to introduce the concept of the measure G defined by

$$dG = \Omega(u)\, du, \tag{14}$$

that is, $G(A)=\int_A \Omega(u)\,du$ for every measurable set A of u. Informally, Eq. 14 says that for an infinitesimal bin about u of size du, the weight assigned under G is Ω(u)du. Hereafter G is called the measure of states. The concept of measure can be used to reformulate the ideas developed above. Denote by $F_\theta$ the probability distribution of u(x) under Eq. 1, that is, the probability distribution with density function 4. Then, from Eqs. 4, 14, $F_\theta$ is related to G as

$$dF_\theta = \frac{1}{Z_\theta}\, e^{-\theta^T u}\, \Omega(u)\, du = \frac{1}{Z_\theta}\, e^{-\theta^T u}\, dG, \tag{15}$$

that is, $F_\theta(A)=Z_\theta^{-1}\int_A e^{-\theta^T u}\,dG$ for every measurable set A of u. For an infinitesimal bin about u of size du, the probability assigned under $F_\theta$ is the density function 4 times du and hence is $Z_\theta^{-1} e^{-\theta^T u}$ times the weight assigned under G. The partition function $Z_\theta$ by Eq. 6 can then be expressed as

$$Z_\theta = \int e^{-\theta^T u}\, dG. \tag{16}$$

See, for example, Ref. 14 for discussion of measure-theoretic concepts.

The pooled data {uji: i = 1, …, nj, j = 1, …, m} can be regarded as an approximate sample from the mixture distribution, $F_*$, whose components are $(F_{\theta_1},\ldots,F_{\theta_m})$ with proportions (n1/n, …, nm/n). (Note that the pooled data are not strictly an independent and identically distributed sample from $F_*$, which would involve randomly selecting a distribution $F_{\theta_r}$ with probability nr/n (r = 1, …, m), simulating one observation from $F_{\theta_r}$, and then repeating this process n times. The numbers of observations from $(F_{\theta_1},\ldots,F_{\theta_m})$ would be random, instead of being fixed at (n1, …, nm). To highlight main ideas, this difference is ignored in the derivation below. The resulting estimators are, however, evaluated in Sec. 2D without making this simplification.) Then, in analogy with Eq. 15, $F_*$ is related to G as

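$$dF_* = \left\{\sum_{r=1}^{m} \frac{n_r}{n}\, Z_{\theta_r}^{-1} e^{-\theta_r^T u}\right\} \Omega(u)\, du = \left\{\sum_{r=1}^{m} \frac{n_r}{n}\, Z_{\theta_r}^{-1} e^{-\theta_r^T u}\right\} dG. \tag{17}$$

For an infinitesimal bin about u of size du, the probability assigned under $F_*$ is the expression in the curly brackets times the weight assigned under G. Dividing both sides of Eq. 17 by the quantity in the curly brackets gives

$$dG = \left\{\sum_{r=1}^{m} \frac{n_r}{n}\, Z_{\theta_r}^{-1} e^{-\theta_r^T u}\right\}^{-1} dF_*. \tag{18}$$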

For an infinitesimal bin about u of size du, the weight assigned under G is the inverse of the quantity in the curly brackets times the probability assigned under F*.

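Relationship 18 can be used for estimating G from the pooled data by importance weighting. Recall that the pooled data form an approximate sample from $F_*$. Then $F_*$ can be estimated by the empirical distribution $\hat{F}_*$ for which each observation $u_{ji}$ is assigned the probability $n^{-1}$. By Eq. 18, the resulting estimator $\hat{G}$ is a discrete measure for which each observation $u_{ji}$ is assigned the weight

$$\hat{G}(u_{ji}) = \frac{1}{\sum_{r=1}^{m} n_r \hat{Z}_{\theta_r}^{-1} e^{-\theta_r^T u_{ji}}}, \tag{19}$$

where $(\hat{Z}_{\theta_1},\ldots,\hat{Z}_{\theta_m})$ are defined by self-consistency according to Eq. 16

$$\hat{Z}_{\theta_k} = \sum_{j=1}^{m}\sum_{i=1}^{n_j} e^{-\theta_k^T u_{ji}}\, \hat{G}(u_{ji}) = \sum_{j=1}^{m}\sum_{i=1}^{n_j} \frac{1}{\sum_{r=1}^{m} n_r \hat{Z}_{\theta_r}^{-1} e^{(\theta_k-\theta_r)^T u_{ji}}} \quad (k=1,\ldots,m). \tag{20}$$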

Formulas 19, 20 provide a binless extension of Eqs. 7, 8 in WHAM.
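Formulas 19 and 20 can be sketched as a fixed-point iteration over the pooled observations. The code below is an illustrative sketch rather than the implementation of any particular software package: the name `binless_log_z`, the log-space evaluation, the fixed iteration count, and the toy data (pooled scalar energies whose frequencies equal their expected values for a two-level system) are all assumptions.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def binless_log_z(q, n_r, n_iter=200):
    """Self-consistent iteration of Eqs. 19 and 20 in log space.

    q[r][i] = theta_r^T u_i for pooled observation i evaluated at state r
    n_r[r]  = number of observations drawn from state r
    Returns log(Z_r / Z_1) for every state r.
    """
    m, n = len(q), len(q[0])
    log_z = [0.0] * m
    for _ in range(n_iter):
        # Eq. 19: log of the weight G_hat assigned to each observation.
        log_g = [
            -logsumexp([math.log(n_r[r]) - log_z[r] - q[r][i] for r in range(m)])
            for i in range(n)
        ]
        # Eq. 20: Z_k = sum_i exp(-theta_k^T u_i) * G_hat(u_i).
        new = [logsumexp([log_g[i] - q[k][i] for i in range(n)]) for k in range(m)]
        log_z = [v - new[0] for v in new]  # fix the reference log Z_1 = 0
    return log_z


thetas = [0.0, math.log(2.0)]
pooled_u = [0.0] * 70 + [1.0] * 50   # 60 observations from each of two states
q = [[th * u for u in pooled_u] for th in thetas]
log_z = binless_log_z(q, [60, 60])
```

Only the pooled observations and the per-state sample sizes enter the iteration; which state produced a given observation is not needed. For this toy input the estimate converges to Z2/Z1 = 0.75.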

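Again by relationship 16, the partition function $Z_\theta$ at any other parameter value is estimated by

$$\hat{Z}_{\theta} = \sum_{j=1}^{m}\sum_{i=1}^{n_j} e^{-\theta^T u_{ji}}\, \hat{G}(u_{ji}) = \sum_{j=1}^{m}\sum_{i=1}^{n_j} \frac{1}{\sum_{r=1}^{m} n_r \hat{Z}_{\theta_r}^{-1} e^{(\theta-\theta_r)^T u_{ji}}}. \tag{21}$$

The expectation $\langle h\rangle_\theta$ is by definition $Z_\theta^{-1}\int h(u)\, e^{-\theta^T u}\, dG$ and hence estimated by

$$\frac{1}{\hat{Z}_\theta} \sum_{j=1}^{m}\sum_{i=1}^{n_j} h(u_{ji})\, e^{-\theta^T u_{ji}}\, \hat{G}(u_{ji}) = \frac{1}{\hat{Z}_\theta} \sum_{j=1}^{m}\sum_{i=1}^{n_j} \frac{h(u_{ji})}{\sum_{r=1}^{m} n_r \hat{Z}_{\theta_r}^{-1} e^{(\theta-\theta_r)^T u_{ji}}}. \tag{22}$$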

Formulas 21, 22 provide a binless extension of Eqs. 9, 10 in WHAM. In addition, we see that the WHAM weights 13, identified heuristically earlier, coincide (except for the difference between $u_{ji}^{b}$ and $u_{ji}$) with the discrete measure with weights 19 derived from the statistical theory sketched out above. Therefore, the binless formulation of WHAM, while it appears straightforward, is nevertheless rooted in fundamental statistical concepts.

It is worth emphasizing that the binless method, like WHAM, can be used not only to estimate partition functions $Z_\theta$ and equilibrium expectations $\langle h\rangle_\theta$, but also to estimate equilibrium distributions $F_\theta$. Recall that u(x) is in general a vector of multiple components and $F_\theta$ is the joint distribution of those components under Eq. 1. By relationship 15, $F_\theta$ is estimated by a discrete distribution $\hat{F}_\theta$ on the pooled data with probabilities

$$\hat{F}_\theta(u_{ji}) = \frac{\hat{Z}_\theta^{-1}}{\sum_{r=1}^{m} n_r \hat{Z}_{\theta_r}^{-1} e^{(\theta-\theta_r)^T u_{ji}}}. \tag{23}$$

In other words, $F_\theta$ is approximated by attaching weight 23 to each observation $u_{ji}$ in the pooled data, where the weights sum up to 1 by Eq. 21. As a result of this approximation, the marginal distribution of h(u(x)) under Eq. 1 is approximated by attaching the same weight 23 to $h(u_{ji})$ for each $u_{ji}$ in the pooled data. Then the expectation $\langle h\rangle_\theta$ is approximated as before (Eq. 22) by a weighted average of the form $\sum_{j=1}^{m}\sum_{i=1}^{n_j} h(u_{ji})\, \hat{F}_\theta(u_{ji})$.
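The weights of Eq. 23 and the weighted average of Eq. 22 can be sketched as follows. The function names are hypothetical, and the values of log Ẑ for the sampled states are taken as given inputs (here, assumed already-converged values for a toy two-level system) rather than computed.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def binless_weights(q, n_r, log_z, q_target):
    """Normalized weights of Eq. 23, one per pooled observation.

    q[r][i]     = theta_r^T u_i at the sampled states
    log_z[r]    = log of the estimated partition function of sampled state r
    q_target[i] = theta^T u_i at the state of interest
    """
    m = len(q)
    log_w = []
    for i in range(len(q_target)):
        log_den = logsumexp([math.log(n_r[r]) - log_z[r] - q[r][i] for r in range(m)])
        log_w.append(-q_target[i] - log_den)
    log_norm = logsumexp(log_w)  # log of Z_hat_theta, Eq. 21
    return [math.exp(lw - log_norm) for lw in log_w]


def expectation(h_values, weights):
    """Weighted average of Eq. 22."""
    return sum(h * w for h, w in zip(h_values, weights))


thetas = [0.0, math.log(2.0)]
pooled_u = [0.0] * 70 + [1.0] * 50
q = [[th * u for u in pooled_u] for th in thetas]
log_z = [0.0, math.log(0.75)]          # assumed converged values for this toy data
q_target = q[1]                        # reweight to the second sampled state
w = binless_weights(q, [60, 60], log_z, q_target)
mean_u = expectation(pooled_u, w)      # exact value for the toy system is 1/3
```

Normalizing by the sum of the unnormalized weights is equivalent to dividing by the estimate of Eq. 21, so the partition function at the target state never has to be supplied separately.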

The above approximation to the marginal distribution of h(u(x)) under Eq. 1 can be visualized as a weighted histogram with suitable bins. The height of each bin is the sum of $\hat{F}_\theta(u_{ji})$ over those $u_{ji}$ in the pooled data for which $h(u_{ji})$ falls into the bin. The histogram can be normalized into a probability density plot, where the height of each bin is divided by the bin size. If θ = θk for some k, this weighted histogram based on the pooled data provides a better approximation than the raw histogram of h(uki) based on the observations uki from $F_{\theta_k}$ only. On the other hand, a comparison of these two histograms can be used to assess goodness of simulations. A substantial discrepancy between the two histograms suggests that the quality of simulations is questionable, that is, the simulated data are actually not distributed according to Eq. 4.
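A weighted histogram of this kind takes only a few lines. The sketch below assumes equal-width bins on a user-chosen interval; the function name and the sample values and weights are made up for illustration.

```python
def weighted_histogram(values, weights, lo, hi, n_bins):
    """Weighted histogram of `values` on [lo, hi) with equal-width bins.

    Returns (heights, density): heights[b] is the sum of the weights of the
    values falling into bin b, and density[b] is heights[b] / bin width.
    """
    width = (hi - lo) / n_bins
    heights = [0.0] * n_bins
    for v, w in zip(values, weights):
        if lo <= v < hi:
            b = min(int((v - lo) / width), n_bins - 1)  # guard against rounding
            heights[b] += w
    density = [h / width for h in heights]
    return heights, density


h_values = [0.1, 0.2, 0.9]       # hypothetical observable values h(u_ji)
weights = [0.2, 0.3, 0.5]        # hypothetical weights summing to 1
heights, density = weighted_histogram(h_values, weights, 0.0, 1.0, 2)
```

Binning is used here only for display; the weights themselves are computed from the unbinned observations.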

Maximum likelihood

We describe a derivation of binless WHAM by the method of nonparametric maximum likelihood taking G as an infinite-dimensional unknown parameter.9 The likelihood of the jth sample from Fθj is by Eq. 15

$$L_j = \prod_{i=1}^{n_j} \frac{1}{Z_{\theta_j}}\, e^{-\theta_j^T u_{ji}}\, G(u_{ji}), \tag{24}$$

where $G(u_{ji})$ is the mass assigned to the singleton $u_{ji}$ under G, and $Z_{\theta_j}=\int e^{-\theta_j^T u}\, dG$, a functional of G, by Eq. 16. The likelihood of the pooled sample is then $L=\prod_{j=1}^{m} L_j$. The method of nonparametric maximum likelihood is to find $\hat{G}$ which maximizes the likelihood L among all possible non-negative measures including discrete measures.

There are two steps to find the maximum likelihood estimator $\hat{G}$. First, it is sufficient to restrict our search to discrete measures supported on the set of pooled data {uji: i = 1, …, nj, j = 1, …, m}. If a positive mass is assigned under G to any set outside the pooled data, then relocating the mass evenly to each observation in the pooled data only increases L. Second, for a discrete measure G, put $w_{ji} = G(u_{ji})$. The likelihood at G is

$$L = \prod_{j=1}^{m}\prod_{i=1}^{n_j} \frac{1}{Z_{\theta_j}}\, e^{-\theta_j^T u_{ji}}\, w_{ji}, \tag{25}$$

where $Z_{\theta_r}=\sum_{j=1}^{m}\sum_{i=1}^{n_j} e^{-\theta_r^T u_{ji}}\, w_{ji}$ for r = 1, …, m. Taking the log of the likelihood gives

$$\log L = \left\{\sum_{j=1}^{m}\sum_{i=1}^{n_j} \log w_{ji} - \sum_{r=1}^{m} n_r \log Z_{\theta_r}\right\} - \sum_{j=1}^{m}\sum_{i=1}^{n_j} \theta_j^T u_{ji}. \tag{26}$$

The term outside the curly brackets does not depend on wji and can be ignored. Taking the partial derivative of log L with respect to wji gives

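$$\frac{1}{w_{ji}} - \sum_{r=1}^{m} n_r\, \frac{e^{-\theta_r^T u_{ji}}}{Z_{\theta_r}} = 0 \tag{27}$$

or

$$w_{ji} = \frac{1}{\sum_{r=1}^{m} n_r Z_{\theta_r}^{-1} e^{-\theta_r^T u_{ji}}}, \tag{28}$$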

which leads to the basic formulas 19, 20. Furthermore, substituting the expression of wji into the term inside the curly bracket in Eq. 26 yields

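$$-\sum_{j=1}^{m}\sum_{i=1}^{n_j} \log\left\{\sum_{r=1}^{m} n_r Z_r^{-1} e^{-\theta_r^T u_{ji}}\right\} - \sum_{r=1}^{m} n_r \log Z_r, \tag{29}$$

which is a function of (Z1, …, Zm) only. Multiplying this function by $-n^{-1}$ and then subtracting log n gives the function κ below (Eq. 31).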

It is interesting that the maximum likelihood estimator $\hat{G}$ is always a discrete measure, even though the actual measure G is not. This discrete approximation of G by $\hat{G}$ serves precisely our computational purpose. A complication is that even though there is a general statistical theory to justify the method of maximum likelihood with a finite-dimensional unknown parameter, the validity of the estimators obtained by the method of nonparametric likelihood needs to be established on a case-by-case basis. Fortunately, a statistical theory of binless WHAM has been rigorously developed in statistics, and is reviewed in Sec. 2D.

The foregoing derivation takes the measure of states G as the underlying unknown parameter. Equivalently, the method of nonparametric maximum likelihood can be applied with a reparameterization taking Fθ0 as the unknown parameter for some fixed, reference value θ0. By Eq. 15, Fθ is related to Fθ0 as

$$dF_\theta = \frac{Z_{\theta_0}}{Z_\theta}\, e^{-(\theta-\theta_0)^T u}\, dF_{\theta_0}. \tag{30}$$

By invariance of maximum likelihood under reparameterization, the resulting estimator of Fθ0 is the same as Eq. 23 with θ set to θ0. Furthermore, the formulas 20, 21, 22 remain the same as before. This derivation is essentially an extension of the derivation of WHAM by Bartels and Karplus, and Gallicchio et al.2, 5

Statistical theory

As seen from Sec. 2B, the estimators in binless WHAM are similar to those in WHAM, but based on the actual data without binning. While this construction seems heuristically easy, a central issue is to evaluate statistical and computational properties of binless WHAM. We point out a number of results, drawn from related statistical work, that demonstrate the usefulness of the binless formulation of WHAM. Although there are results applicable to correlated data,7 we assume for simplicity that {uji: i = 1, …, nj; j = 1, …, m} are independent.

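First, the estimators $(\hat{Z}_{\theta_1},\ldots,\hat{Z}_{\theta_m})$ are defined by Eq. 20, a system of nonlinear equations. Remarkably, an equivalent characterization is that $\log(\hat{Z}_{\theta_1},\ldots,\hat{Z}_{\theta_m})$ are jointly a minimizer of the criterion function7, 16

$$\kappa(\log z_1,\ldots,\log z_m) = \frac{1}{n}\sum_{j=1}^{m}\sum_{i=1}^{n_j} \log\left\{\sum_{r=1}^{m} \frac{n_r}{n}\, z_r^{-1} e^{-\theta_r^T u_{ji}}\right\} + \sum_{r=1}^{m} \frac{n_r}{n} \log z_r. \tag{31}$$

See Sec. 2C above for the derivation of κ by maximum likelihood. The function κ is invariant under translation: κ(a + log z1, …, a + log zm) = κ(log z1, …, log zm) for an arbitrary constant a, in agreement with the fact that $\log(\hat{Z}_{\theta_1},\ldots,\hat{Z}_{\theta_m})$ are only determined up to an additive constant. Moreover, κ is bounded from below, by application of Jensen's inequality to the log of the term in the curly brackets

$$\log\left\{\sum_{r=1}^{m} \frac{n_r}{n}\, z_r^{-1} e^{-\theta_r^T u_{ji}}\right\} \ge \sum_{r=1}^{m} \frac{n_r}{n} \log\left(z_r^{-1} e^{-\theta_r^T u_{ji}}\right) = -\sum_{r=1}^{m} \frac{n_r}{n} \log z_r - \sum_{r=1}^{m} \frac{n_r}{n}\, \theta_r^T u_{ji}. \tag{32}$$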

Finally, if one of (log z1, …, log zm) is fixed, for example log z1 = 0, then κ is strictly convex.16 The convexity can be directly shown by the fact that

$$\sum_{r=1}^{m} \frac{n_r}{n}\, z_r^{-1} e^{-\theta_r^T u_{ji}} \tag{33}$$

is a positive combination of exponentials of linear functions of (log z1, …, log zm), and consequently the log of this term (a log-sum-exp function) is convex in (log z1, …, log zm). Therefore, $\log(\hat{Z}_{\theta_2}/\hat{Z}_{\theta_1},\ldots,\hat{Z}_{\theta_m}/\hat{Z}_{\theta_1})$ can be obtained as a unique minimizer of κ(0, log z2, …, log zm). This approach of minimizing a convex function can be more effective than solving the system of nonlinear equations 20 by the self-consistency or the Newton-Raphson algorithm.11 See Appendix A for details.
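To illustrate the optimization view, the sketch below evaluates κ of Eq. 31 together with its gradient and minimizes it. Plain gradient descent with a fixed step is used here only as a simple stand-in for the trust-region or Newton-type algorithms referred to in the text; the function names, step size, iteration count, and toy data are assumptions.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def kappa_and_gradient(zeta, q, n_r):
    """kappa of Eq. 31 and its gradient at zeta = (log z_1, ..., log z_m).

    q[r][i] = theta_r^T u_i for pooled observation i; n_r[r] = sample sizes.
    """
    m, n = len(q), len(q[0])
    frac = [nr / n for nr in n_r]
    kappa = sum(f * z for f, z in zip(frac, zeta))
    grad = list(frac)
    for i in range(n):
        terms = [math.log(frac[r]) - zeta[r] - q[r][i] for r in range(m)]
        lse = logsumexp(terms)
        kappa += lse / n
        for r in range(m):
            # exp(terms[r] - lse) is the mixture share of state r at u_i.
            grad[r] -= math.exp(terms[r] - lse) / n
    return kappa, grad


def minimize_kappa(q, n_r, step=1.0, n_iter=300):
    """Gradient descent on kappa with log z_1 pinned to zero after each step."""
    zeta = [0.0] * len(q)
    for _ in range(n_iter):
        _, grad = kappa_and_gradient(zeta, q, n_r)
        zeta = [z - step * g for z, g in zip(zeta, grad)]
        zeta = [z - zeta[0] for z in zeta]  # allowed by translation invariance
    return zeta


thetas = [0.0, math.log(2.0)]
pooled_u = [0.0] * 70 + [1.0] * 50   # toy pooled data, 60 observations per state
q = [[th * u for u in pooled_u] for th in thetas]
zeta = minimize_kappa(q, [60, 60])
_, grad_at_min = kappa_and_gradient(zeta, q, [60, 60])
```

Setting the gradient to zero reproduces the self-consistency equations 20, so for this toy input the minimizer gives the same ratio Z2/Z1 = 0.75 as the fixed-point iteration. A second-order method would typically need far fewer iterations than the fixed-step descent used here.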

Second, the estimators $(\hat{Z}_{\theta_2}/\hat{Z}_{\theta_1},\ldots,\hat{Z}_{\theta_m}/\hat{Z}_{\theta_1})$ are always consistent (that is, converge in probability to the true values) and asymptotically normally distributed as the sample size nj tends to infinity and nj/n is fixed for each j.10, 16 The connectedness condition required for the general result of Gill et al.16 and Tan10 is satisfied here because the weighting function $e^{-\theta^T u}$ is positive. Moreover, the estimator $\hat{Z}_\theta/\hat{Z}_{\theta_1}$ is consistent and asymptotically normally distributed provided that the variance under $F_*$ of the density ratio of $F_\theta$ over $F_*$ is finite

$$\int \left\{\sum_{r=1}^{m} \frac{n_r}{n}\, Z_{\theta_r}^{-1} e^{(\theta-\theta_r)^T u}\right\}^{-2} dF_* < \infty. \tag{34}$$

Similarly, the estimator of $\langle h\rangle_\theta$ is consistent and asymptotically normally distributed provided that the variance under $F_*$ of h(u) times the density ratio of $F_\theta$ over $F_*$ is finite

$$\int h^2(u) \left\{\sum_{r=1}^{m} \frac{n_r}{n}\, Z_{\theta_r}^{-1} e^{(\theta-\theta_r)^T u}\right\}^{-2} dF_* < \infty. \tag{35}$$

These conditions require that the mixture “umbrella” distribution F* should provide sufficient coverage of Fθ, so that observations from F* can be weighted by the density ratio of Fθ over F* to estimate Fθ. Therefore, interpolation is in general valid, but extrapolation needs to be considered more carefully. For example, for the application in Sec. 3, it is important to obtain observations from the end thermodynamic states, in addition to intermediate states, in order to estimate the free energy differences between them. Obtaining observations from thermodynamic states however close to the end states, but not at end states, would require extrapolation whereby condition 34 would be difficult to verify.

Third, the asymptotic variance matrix of $(\hat{Z}_{\theta_2}/\hat{Z}_{\theta_1},\ldots,\hat{Z}_{\theta_m}/\hat{Z}_{\theta_1})$ and $\hat{Z}_\theta/\hat{Z}_{\theta_1}$ jointly can be consistently estimated without using any generalized inverse such as the Moore-Penrose inverse.10 This approach differs from that of Kong et al.9 and Shirts and Chodera11 based on the asymptotic variance matrix of $(\hat{Z}_{\theta_1},\hat{Z}_{\theta_2},\ldots,\hat{Z}_{\theta_m})$, which necessarily involves use of generalized inverses. Similarly, the asymptotic variance of the estimator of $\langle h\rangle_\theta$ can be consistently estimated. The resulting variance formula is appropriate even when h(u) is not always non-negative, in contrast with Shirts and Chodera (Sec. IV of Ref. 11). See Appendix B for details.

Fourth, when m = 2, the estimator $\hat{Z}_{\theta_2}/\hat{Z}_{\theta_1}$ is equivalent to Bennett's optimal acceptance ratio method (BAR),12 which attains the smallest asymptotic variance among bridge sampling estimators of the form8, 13

$$\frac{n_1^{-1}\sum_{i=1}^{n_1} \alpha(u_{1i})\, e^{-(\theta_2-\theta_1)^T u_{1i}}}{n_2^{-1}\sum_{i=1}^{n_2} \alpha(u_{2i})}, \tag{36}$$

where α(·) is an arbitrary function, for example, $\alpha(u)=\min\bigl(e^{-(\theta_1-\theta_2)^T u},1\bigr)$. In general, the estimators $(\hat{Z}_{\theta_2}/\hat{Z}_{\theta_1},\ldots,\hat{Z}_{\theta_m}/\hat{Z}_{\theta_1})$ and $\hat{Z}_\theta/\hat{Z}_{\theta_1}$ jointly attain the smallest asymptotic variance matrix in the order on positive-definite matrices among a class of extended bridge sampling estimators based on Eq. 36.10, 11 Similarly, the estimator of $\langle h\rangle_\theta$ attains the smallest variance among corresponding extended bridge sampling estimators. For this reason, the binless method was called the multi-state Bennett acceptance ratio method (MBAR) by Shirts and Chodera.11
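A bridge sampling estimator of the form of Eq. 36 can be sketched for scalar u as below, with α passed in as an arbitrary function. The function names, the two choices of α, and the toy samples (whose frequencies equal their expected values for a two-level system) are assumptions made for illustration; this is not the optimal BAR choice of α, which requires an iterative solution.

```python
import math


def bridge_ratio(u1, u2, theta1, theta2, alpha):
    """Bridge sampling estimate of Z_theta2 / Z_theta1 (Eq. 36), scalar u.

    u1, u2 : observations drawn at theta1 and theta2, respectively
    alpha  : any function of u for which both averages are well defined
    """
    num = sum(alpha(u) * math.exp(-(theta2 - theta1) * u) for u in u1) / len(u1)
    den = sum(alpha(u) for u in u2) / len(u2)
    return num / den


theta1, theta2 = 0.0, math.log(2.0)
u1 = [0.0] * 30 + [1.0] * 30   # toy sample at theta1
u2 = [0.0] * 40 + [1.0] * 20   # toy sample at theta2


def metropolis_alpha(u):
    """Metropolis-type choice, min(exp((theta2 - theta1) * u), 1)."""
    return min(math.exp((theta2 - theta1) * u), 1.0)


def exponential_alpha(u):
    """Alternative choice, exp((theta2 - theta1) * u)."""
    return math.exp((theta2 - theta1) * u)


ratio_metropolis = bridge_ratio(u1, u2, theta1, theta2, metropolis_alpha)
ratio_exponential = bridge_ratio(u1, u2, theta1, theta2, exponential_alpha)
```

Because the toy samples match their expectations exactly, both choices of α return the exact ratio 0.75; with real, finite samples the estimates differ and their variances depend on α.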

APPLICATION: ESTIMATION OF BINDING FREE ENERGIES

This section illustrates the application of the binless method using both the MBAR software (as developed by Shirts and Chodera11) and our computational implementation based on Sec. 2D (referred to as unbinned WHAM or UWHAM), to the estimation of protein-ligand binding free energies. As we will show, due to the wide range of values of the binding energies involved, it is difficult to apply the conventional WHAM binning method to this problem unless soft-core potentials are employed. In contrast, the binless approach yields consistent results in all cases.

The binding free energy measures the propensity of a receptor R to be associated in solution with a ligand L. The binding free energy is by definition the difference between the free energy of the receptor-ligand complex and the free energy of the dissociated receptor and ligand. In this work, binding free energies are estimated by simulation in the context of BEDAM,29 which, in the present formalism, can be summarized as follows.

Working within the implicit solvent representation, the potential energy of a conformation x of the complex can be written as18, 29

$$U_{RL}(x) = U_R(x) + U_L(x) + b(x), \tag{37}$$

where UR and UL are the potential energies of the dissociated receptor and ligand in solution and b(x) is the binding energy of conformation x of the complex, defined as the change in potential energy for bringing into contact the receptor and ligand from infinite separation without intramolecular conformational rearrangements. Based on the notation developed in Sec. 2A, we recognize that the coupled (ligand and receptor fully interacting) and decoupled (noninteracting ligand and receptor) ensembles can be cast in the form of the generalized ensemble representation of Eqs. 1, 2, 3 with a two-dimensional potential energy function vector u = (U0, b) where

$$U_0(x) = U_R(x) + U_L(x) \tag{38}$$

is the reference potential energy function corresponding to the uncoupled state and b is the binding energy function. Using Eq. 37, the potential energy of the decoupled state corresponds to the coefficient vector θdcpld = (β, 0) and the one for the coupled ensemble is θcpld = (β, β). The binding free energy is then given by the ratio of the corresponding partition functions $Z_{\theta_{\mathrm{cpld}}}$ and $Z_{\theta_{\mathrm{dcpld}}}$:

$$\Delta G_b = -kT \log \frac{Z_{\theta_{\mathrm{cpld}}}}{Z_{\theta_{\mathrm{dcpld}}}}. \tag{39}$$

Note that the observable standard binding free energy also includes a standard state concentration-dependent term18, 19, 29 which, being constant among the systems investigated, is included in the results30 but not further discussed in this work.

A series of intermediate states k = 1, …, m are introduced with potential energies

Uk(x)=U0(x)+λkb(x), (40)

where λ1 = 0 corresponds to the decoupled state and λm = 1 corresponds to the coupled state. The intermediate states with λi between 0 and 1 serve as interpolating states in which receptor and ligand partially interact to connect, in a free energy sense, the two end states.25 In general, as stated in Sec. 2A, an (m + 1)-dimensional potential energy vector u = (U0(x), ω1(x), …, ωm(x)), with ωk(x) = λkb(x), and corresponding (m + 1)-dimensional θ vectors are necessary to describe this collection of ensembles. However, in this case, taking advantage of the linear form of ωk(x), it is convenient to fold the λk dependence into the coefficient vector θ so as to lower the dimensionality of the generalized energy vector. By doing so, each of the states corresponds to a two-dimensional θ vector of the form θk = (β, βλk), which multiplies the potential energy vector u = (U0, b) introduced above to yield, by means of Eq. 3, the potential energy functions in Eq. 40.

The partition function of each state is computed from Eq. 22 setting $Z_{\theta_{\mathrm{dcpld}}} = Z_{\theta_1} = 1$. Using Eq. 3 and the above, it is easy to see that the term $(\theta_k - \theta_r)^T u_{ji}$ in Eq. 20 in this case simplifies to

$$(\theta_k - \theta_r)^T u_{ji} = \beta(\lambda_k - \lambda_r)\, b_{ji}, \tag{41}$$

which does not include the total reference potential energy U0 and depends only on the binding energy bji of the ith sampled conformation xji from a simulation at λ = λj. Analogously, it is straightforward to show that Eq. 31 simplifies to

$$\kappa(\log z_1, \ldots, \log z_m) = c + \frac{1}{n}\sum_{j=1}^{m}\sum_{i=1}^{n_j} \log\left[\sum_{r=1}^{m}\frac{n_r}{n}\, z_r^{-1} e^{-\beta\lambda_r b_{ji}}\right] + \sum_{r=1}^{m}\frac{n_r}{n}\log z_r, \tag{42}$$

where c is a constant that depends only on the observations of U0 and does not affect the position of the minimum.
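As an illustration of Eq. 42, the following is a minimal Python sketch of how the criterion function could be evaluated from the binding energy samples alone, dropping the constant c. The function and argument names (`kappa`, `log_z`, `samples`) are hypothetical and not taken from the UWHAM code, and every state is assumed to contribute at least one sample. The inner sum is evaluated in log space so that very large binding energies do not cause overflow.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def kappa(log_z, lambdas, samples, beta):
    """Evaluate Eq. 42 without the constant c.

    log_z[r]   : trial value of log z_r for state r
    lambdas[r] : coupling parameter lambda_r of state r
    samples[j] : binding energies b_ji collected at lambda_j (non-empty)
    beta       : 1/kT in units consistent with the binding energies
    """
    m = len(lambdas)
    counts = [len(s) for s in samples]
    n = sum(counts)
    total = 0.0
    for state_samples in samples:
        for b in state_samples:
            # log of sum_r (n_r/n) z_r^{-1} exp(-beta * lambda_r * b)
            terms = [math.log(counts[r] / n) - log_z[r] - beta * lambdas[r] * b
                     for r in range(m)]
            total += logsumexp(terms)
    return total / n + sum(counts[r] / n * log_z[r] for r in range(m))


# Hypothetical toy data: two states, three samples in total.
example_value = kappa([0.0, -0.3], [0.0, 1.0], [[1.0, 2.0], [0.5]], beta=1.0)
```

Adding the same constant to every log z leaves the value unchanged, which is why one of them (here log z1) has to be fixed before minimizing.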

Similarly, in the denominator of the WHAM equation (Eq. 8), the $(\theta_k - \theta_r)^T u$ term reduces to β(λk − λr)b, which depends only on the binned value b of the binding energy. Furthermore, expressing Eq. 8 as

$$\hat{Z}_{\lambda_k} = \sum_{U_0}\sum_{b} \frac{\sum_{r=1}^{m} N_r(U_0, b)}{\sum_{r=1}^{m} n_r \hat{Z}_{\lambda_r}^{-1} e^{\beta(\lambda_k - \lambda_r) b}} = \sum_{b} \frac{\sum_{r=1}^{m} N_r(b)}{\sum_{r=1}^{m} n_r \hat{Z}_{\lambda_r}^{-1} e^{\beta(\lambda_k - \lambda_r) b}}, \tag{43}$$

we see that the two-dimensional histogram Nr(u) = Nr(U0, b) can be replaced by the one-dimensional marginal histogram $N_r(b) = \sum_{U_0} N_r(U_0, b)$ of the binding energy. Consequently, in both the WHAM and MBAR calculations that follow it has been sufficient to collect only the binding energy samples from the molecular simulations.
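The right-hand side of Eq. 43 lends itself to a simple fixed-point iteration over the one-dimensional histograms. The sketch below is one way this could look in Python. The function name, the fixed iteration count, and the normalization by the first state are assumptions made for illustration, not a description of the WHAM code used in this work. The exponentials are evaluated directly, so the sketch is only suitable for moderate values of β(λk − λr)b.

```python
import math


def wham_partition_functions(bin_centers, histograms, lambdas, beta, n_iter=500):
    """Iterate Eq. 43 to self-consistency.

    bin_centers[k]   : binding energy b of bin k
    histograms[r][k] : N_r(b_k), counts in bin k from the simulation at lambda_r
    Returns the estimates Z_k, normalized so that the first state has Z = 1.
    """
    m = len(lambdas)
    counts = [sum(h) for h in histograms]  # n_r
    pooled = [sum(histograms[r][k] for r in range(m))
              for k in range(len(bin_centers))]  # sum_r N_r(b_k)
    z = [1.0] * m
    for _ in range(n_iter):
        new_z = []
        for lam_k in lambdas:
            total = 0.0
            for b, n_b in zip(bin_centers, pooled):
                if n_b == 0:
                    continue  # empty bins do not contribute
                denom = sum(counts[r] / z[r] * math.exp(beta * (lam_k - lambdas[r]) * b)
                            for r in range(m))
                total += n_b / denom
            new_z.append(total)
        z = [value / new_z[0] for value in new_z]
    return z
```

With histograms that match the exact populations of a two-level toy system, the iteration recovers the exact ratio of partition functions, which is a convenient sanity check before applying it to simulation data.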

Binding energies are collected from Hamiltonian replica exchange all-atom molecular dynamics simulations of the protein complexes as described29, 30, 32 for a series of λ values from 0 (decoupled state) to 1 (coupled state). The binding energy data is then fed into Eq. 8, using binning, or Eq. 42, without binning, to compute the ratios of partition functions and ultimately the binding free energy from Eq. 39. See below for a description of the biological systems and simulation settings.

WHAM estimates with binning

The distributions of binding energies depend critically on the λ value at which they are obtained. At λ = 1, when the ligand and the receptor fully interact, binding energies are typically centered around favorable (negative) values (see Fig. 1). In contrast at λ = 0, in the absence of receptor-ligand interactions, the ligand is likely to sample conformations with unfavorable clashes between receptor and ligand atoms, corresponding to large unfavorable (positive) values of the binding energy (see Fig. 2). In principle, because the Lennard-Jones and Coulomb interatomic potentials tend to infinity at zero interatomic separation, there is no finite upper limit to the range of binding energies that can be observed. As shown here, this causes major difficulties for the binning of binding energy data to be used in conjunction with WHAM (Eq. 8), since in this case the binding energy samples are spread out very sparsely in a region spanning many orders of magnitude which is impossible to bin reliably without using very wide bins leading to large integration errors.

Figure 1.

Computed probability density at λ = 1, p1(b), for the complex with ligand 6 with the unmodified potential.30 The line represents the UWHAM estimate from the data collected from all λ-replicas. The crosses correspond to the probability density computed from the histogram of the binding energy data at only λ = 1. Good correspondence between the two densities is observed.

Figure 2.

Computed probability density at λ = 0, p0(b), for the complex with ligand 6 with the unmodified potential.30 The line represents the UWHAM estimate from the data collected from all λ-replicas. The crosses correspond to the probability density computed from the histogram of the binding energy data at only λ = 0. There is good correspondence between the two densities in the range explored by the λ = 0 replica. The binding energy grid used for this plot has 200 bins, equally spaced (0.5 kcal/mol bin sizes) for negative binding energies and with exponentially increasing spacing for positive values up to $10^9$ kcal/mol. Even though p0(b) is predicted to be maximal at approximately b = 20 kcal/mol, it is rare to observe binding energies in that range because of the small integrated cumulative probability at low binding energies (note the logarithmic layout of the binding energy axis). The UWHAM estimate instead extends to as low as −40 kcal/mol (the lowest observed sample at all λ’s) with an estimated probability density on the order of $10^{-28}$ (kcal/mol)$^{-1}$ (not shown for clarity).

Conventional wisdom dictates that the number of bins should be small enough so that each bin contains more than a few samples so as to minimize statistical noise in the resulting histograms. On the other hand, the binning resolution should be sufficiently fine so as to avoid significant integration errors when replacing the integral in Eq. 6 with the summation over bins in Eq. 8. It is not always clear how to balance these opposing requirements especially when, as in this case, the range of values to be binned is unbounded. Of course, as shown above, we now know that it is justifiable to increase the number of bins indefinitely, reaching the limit where the WHAM formula is indistinguishable from the MBAR formula, which is based on the sampled values directly without binning.

In Table 1 we report WHAM binding free energy estimates for the complex with ligand 2 (see below for a description of protein-ligand complexes) varying the number of bins. In these calculations a uniform grid spacing has been used in the favorable binding energy range and an exponentially increasing bin spacing for unfavorable binding energies. We see that the results change significantly as the grid resolution is increased. With fewer bins and coarser bin widths WHAM under-predicts binding affinities. As the number of bins is increased the estimate of the binding free energy gets closer to the limiting value of ΔGb ≃ −2.2 kcal/mol obtained with, effectively, an unlimited number of bins (see results in Table 3). These results indicate that binning the unfavorable range of binding energies, even with a thousand bins, leads to large errors.

Table 1.

WHAM results for ligand 2 with the unmodified potential varying the number of bins.

Nbins               ΔGb [1]
100                  3.50
120                  7.62
150                  0.81
200                 −0.07
250                 −0.51
1000                −1.45
MBAR/UWHAM [2]      −2.21

[1] In kcal/mol.
[2] MBAR/UWHAM result from Table 3.

Table 3.

Comparison of MBAR/UWHAM and WHAM computed binding free energies for the six complexes of FKBP with (“soft-core”) and without (“unmodified”) the soft-core binding energy function.

Ligand   Expt [1]         WHAM [1,2]       MBAR/UWHAM [1,2]                  MBAR/UWHAM [1,3] (sub-sampled)
                          Soft-core        Unmodified       Soft-core        Unmodified       Soft-core
2        −7.80 ± 0.1      −2.46 ± 0.18     −2.21 ± 0.12     −2.56 ± 0.19     −2.22 ± 0.45     −2.10 ± 0.51
3        −8.40 ± 0.1      −3.90 ± 0.24     −3.86 ± 0.20     −4.01 ± 0.23     −4.63 ± 0.47     −3.85 ± 0.48
5        −9.50 ± 0.1      −3.85 ± 0.37     −4.13 ± 0.23     −3.98 ± 0.32     −4.03 ± 0.50     −4.30 ± 0.50
6        −10.80 ± 0.3     −3.74 ± 0.36     −3.74 ± 0.21     −3.86 ± 0.31     −3.77 ± 0.49     −3.79 ± 0.49
8        −10.90 ± 0.1     −4.45 ± 0.28     −5.42 ± 0.14     −4.59 ± 0.30     −5.81 ± 0.51     −3.61 ± 0.53
9        −11.10 ± 0.2     −6.19 ± 0.35     −6.03 ± 0.19     −6.31 ± 0.32     −6.20 ± 0.55     −6.34 ± 0.55

[1] In kcal/mol.
[2] Using all of the data, statistical errors computed by block-bootstrapping.
[3] Using 1-in-50 sub-sampled data and statistical errors computed as described in Appendix B.

Data censoring bias

One simple way to circumvent the need for binning a very large range of binding energies is to terminate the binning at a large but finite grid value bc and assign all of the samples with values larger than this maximum to the last bin (data censoring).29 This approach intuitively appears valid based on the argument that unfavorable binding energies much larger than the thermal energy are equally unlikely to be sampled by the complex regardless of their specific value. However, as shown in Table 2, this leads to significant bias in the binding free energy estimates. The estimates in Table 2 are obtained for ligand 2 with 250 bins and the binning limit, bc, indicated. The results show that using a small bc (but still much larger than binding energy values achievable at λ = 1 at standard temperature) leads to overestimation of binding affinities, and that the estimate progressively shifts to less negative values as bc is increased, but overshoots the correct value because, with a fixed number of bins, the bins become too coarse as the energy limit of the last bin increases.

Table 2.

WHAM results for ligand 2 with the unmodified potential changing the energy limit of the last bin with Nbins = 250.

bc [1]      ΔGb [1]
20          −5.92
80          −5.23
200         −4.88
10^7        −1.46
10^9        −0.51

[1] In kcal/mol.

The origin of the data censoring bias can be understood in general terms by recognizing that it amounts to assuming that the potential energy of the system is bounded although no limit is actually present. In other words, the data is being analyzed with a statistical model inconsistent with the system that generated the data. To understand the effect in numerical terms consider the denominator in Eq. 43 for $b = b_c \gg kT$. When λk = 1, the quantities exp [β(1 − λr)bc] are all at least 1 and some are very large. It follows that the sum in the denominator is large and the contribution to $Z_{\lambda=1}$ from the bin at bc is negligible regardless of the specific value of bc. A similar conclusion can be reached for any large value of λk. However, for small values of λk such that βλkbc ≃ 1 the values of the quantities exp [β(λk − λr)bc] can vary significantly depending on the specific value of bc. This leads to incorrect estimates of the free energy profile at small λ’s and, in turn, of the total free energy change.
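The λ dependence of the censoring effect can be illustrated with a short numerical sketch. The thermal energy (0.596 kcal/mol, roughly 300 K), the sample value, and the helper names below are assumptions chosen for illustration. The code compares the factor exp(−βλb) of a single clashing conformation before and after it is moved to the last bin at bc.

```python
import math

KT = 0.596  # kcal/mol, roughly 300 K; an assumed value for illustration


def censor(b, b_c):
    """Move any binding energy above the binning limit b_c to the last bin."""
    return min(b, b_c)


def boltzmann_factor(b, lam, kT=KT):
    """exp(-beta * lambda * b) for binding energy b at coupling lambda."""
    return math.exp(-lam * b / kT)


b_sample = 1.0e6  # hypothetical clashing conformation, in kcal/mol
b_c = 200.0       # binning limit, one of the values listed in Table 2

for lam in (1.0, 1.0e-6):
    exact = boltzmann_factor(b_sample, lam)
    censored = boltzmann_factor(censor(b_sample, b_c), lam)
    print(f"lambda = {lam:g}: exact = {exact:.3e}, censored = {censored:.3e}")
```

At λ = 1 both factors are vanishingly small, so censoring is harmless, whereas at a small λ the censored sample carries a markedly different factor from the uncensored one. This is the regime in which the free energy profile is distorted.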

Soft-core binding energy function

Another approach that we have explored in this work is to, in effect, prevent the generation of large binding energies by adopting a soft core potential in the simulations. Soft core potentials are commonly used to attempt to improve the convergence of free energy calculations.26, 27, 33 In this work, a soft core potential is introduced in terms of a modified binding energy function b′(x) of the form30, 34

$$b'(x) = \begin{cases} b_{\max} \tanh[b(x)/b_{\max}], & b(x) > 0 \\ b(x), & b(x) \le 0, \end{cases} \tag{44}$$

where bmax is some large positive value, set in this work to either $10^3$ kcal/mol (soft core) or $10^9$ kcal/mol (referred to below as the “unmodified” binding energy function). The modified binding energy function b′(x) serves the purpose of capping the maximum value of the binding energy while leaving unchanged the values of favorable binding energies. Here it is used throughout in the molecular simulations and the statistical analysis in place of the actual binding energy function. The potential energy of the λ = 0 state is equal to U0(x) (see Eq. 40) and is unaffected by the binding energy function. Furthermore, the λ = 1 state with the soft core binding energy function is virtually indistinguishable from the original one, as large positive values of the binding energy are never sampled during the simulation. We conclude therefore that the free energy difference between the λ = 0 and λ = 1 states (that is, the binding free energy) is not significantly affected by the introduction of the modified binding energy function.28 Indeed, as shown below, we obtain statistically indistinguishable binding free energy estimates with the two binding energy functions, with any small difference possibly attributable to other factors, such as insufficient equilibration and convergence.
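The capping behavior of Eq. 44 is easy to verify numerically. The following Python sketch implements the function as written, with a hypothetical function name: favorable (non-positive) binding energies pass through unchanged, while positive values saturate smoothly at bmax.

```python
import math


def soft_core_binding_energy(b, b_max):
    """Modified binding energy of Eq. 44 (b and b_max in the same units)."""
    if b > 0.0:
        return b_max * math.tanh(b / b_max)
    return b


# With b_max = 1e3 kcal/mol, small positive values are nearly unchanged and
# very large clashes are capped at b_max.
for b in (-25.0, 1.0, 500.0, 1.0e9):
    print(b, soft_core_binding_energy(b, 1.0e3))
```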

Table 3 reports WHAM binding free energy estimates obtained with the soft core binding energy function (Eq. 44) with bmax = $10^3$ kcal/mol. These calculations employed a grid with 250 bins similar to the one used above for the unmodified binding energy function (Table 1) but extending only up to b = bmax, since no samples are present beyond this value. The limited extent of the range of binding energies makes it possible to select a sufficiently fine binning grid with a reasonable number of bins. Because the binding free energy estimates so obtained are in agreement with the MBAR/UWHAM estimates (see below) obtained with the unmodified potential function, we conclude that the soft core WHAM results indeed reflect the correct binding free energies for this system. Conversely, based on the results above, we conclude that application of WHAM to the data with the unmodified potential leads to incorrect results with any reasonable binning choice we attempted.

MBAR/UWHAM estimates without binning

As discussed in Sec. 2, binless free energy estimation methods make it unnecessary to bin the data in order to compute free energies. Binding energy samples bji, i = 1, …, nj, from each simulation at λ = λj are simply fed into Eq. 20, which is solved for the Zλj’s by self-consistency11 (referred to as the MBAR implementation) or by minimization of Eq. 42 (referred to here as the UWHAM implementation); see below for details on the numerical implementation. The resulting binding free energy estimates for the six protein-ligand systems are given in Table 3. Identical results are obtained with either the MBAR or UWHAM implementations. Also reported in Table 3 are the results obtained with WHAM on the data with the soft core potential.

We immediately notice that the MBAR/UWHAM results with the unmodified potential agree very closely with those using the soft core potential. The fact that we obtained consistent results from two independent sets of simulations, each providing very different binding energy datasets, is a strong indication that both of these results reflect the actual binding free energies for these systems. This is a significant result because it shows that binless methods are capable of treating correctly the distribution at high binding energies even though this extends to extremely large values ($10^9$ kcal/mol) and it is extremely sparsely sampled. For example, for the complex with ligand 6 there is on average only one observation every 10 000 kcal/mol in the range between $10^6$ and $10^7$ kcal/mol, a regime in which, clearly, binning is not a feasible option. As discussed above, reliable WHAM results could be obtained only for the soft core data because of challenges with binning the unmodified binding energy data. The agreement between MBAR/UWHAM soft core and unmodified potential results and with the WHAM soft core results confirms the ability of the binless inference methods to handle non soft-core data reliably.

The MBAR results obtained are based on the same equation (Eq. 8) derived here.11 The only difference is the computational procedure to solve it. UWHAM uses a minimization procedure with the criterion function (Eq. 42), whereas MBAR employs a self-consistent procedure optionally supplemented by Newton-Raphson iterations. Here we have used the simple self-consistent solution starting with the default initial guess $Z_{\lambda_k} = 1$ (the same initial guess used for UWHAM). In our experience, UWHAM has provided a converged solution in significantly less computational time than MBAR (seconds vs. minutes typically), a feature that has been particularly helpful in block-bootstrapping uncertainty calculations involving 100 independent free energy evaluations per ligand. MBAR and UWHAM yielded virtually identical results, thereby validating numerically the new minimization procedure presented here.
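For readers who want to see the self-consistent route in concrete form, the sketch below iterates the binless equation in its λ-specialized form, Z_k = Σ_ji exp(−βλk bji) / Σ_r n_r Z_r^{-1} exp(−βλr bji), which follows from Eq. 41. It is a plain Python illustration with a hypothetical function name and a fixed iteration count, not the MBAR package, and it assumes every state has at least one sample. All sums are carried out in log space so that binding energies spanning many orders of magnitude can be handled.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def mbar_self_consistent(samples, lambdas, beta, n_iter=500):
    """Return log Z_k for each state, with log Z of the first state set to 0.

    samples[j] : binding energies b_ji collected at lambda_j (non-empty)
    """
    m = len(lambdas)
    counts = [len(s) for s in samples]
    pooled = [b for state_samples in samples for b in state_samples]
    log_z = [0.0] * m  # initial guess Z_k = 1
    for _ in range(n_iter):
        # log of sum_r n_r Z_r^{-1} exp(-beta * lambda_r * b) for each sample
        log_denoms = [
            logsumexp([math.log(counts[r]) - log_z[r] - beta * lambdas[r] * b
                       for r in range(m)])
            for b in pooled
        ]
        new_log_z = [
            logsumexp([-beta * lam_k * b - d for b, d in zip(pooled, log_denoms)])
            for lam_k in lambdas
        ]
        log_z = [value - new_log_z[0] for value in new_log_z]
    return log_z
```

The binding free energy estimate then follows from Eq. 39 as −kT times the difference between the last and first entries of the returned list.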

The last two columns in Table 3 are the results based on subsampled data,35 including the point estimates and analytical errors computed as described in Appendix B. For each system, a subsample of size 20 has been selected, with 1 in every 50 time points, from the original sample of size 1000. The point estimates are reasonably close to those based on the original samples. The analytical errors, assuming uncorrelated data, are approximately 0.50 kcal/mol for all the systems, with or without the soft-core potential. Adjusting for sample sizes, the errors based on uncorrelated data of size 1000 would be about $0.50/\sqrt{50} \approx 0.07$ kcal/mol. Comparison of such adjusted errors with the block-bootstrap errors for the original data then indicates statistical inefficiency36 caused by correlations. For example, for ligand 2, the factor of statistical inefficiency due to correlated data is $(0.12/0.07)^2 = 2.9$ for the unmodified potential and $(0.19/0.07)^2 = 7.4$ for the soft-core potential. It is interesting to note that the statistical inefficiencies with the soft-core potential are consistently larger than those with the unmodified potential, implying smaller correlations and faster convergence of binding free energies with the latter.
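The arithmetic behind these statistical inefficiency factors can be reproduced in a few lines. The sketch below uses the ligand 2 errors from Table 3; the variable and function names are hypothetical.

```python
import math

n_full = 1000     # observations per replica in the original sample
n_sub = 20        # observations per replica after 1-in-50 sub-sampling
error_sub = 0.50  # kcal/mol, analytical error from the sub-sampled data

# Error expected from n_full uncorrelated observations: scale by sqrt(n_sub / n_full).
error_uncorrelated = error_sub / math.sqrt(n_full / n_sub)
error_rounded = round(error_uncorrelated, 2)  # the text quotes this to two decimals


def statistical_inefficiency(bootstrap_error, uncorrelated_error):
    """Squared ratio of the correlated-data error to the uncorrelated-data error."""
    return (bootstrap_error / uncorrelated_error) ** 2


g_unmodified = statistical_inefficiency(0.12, error_rounded)  # ligand 2, unmodified
g_soft_core = statistical_inefficiency(0.19, error_rounded)   # ligand 2, soft-core
print(error_rounded, round(g_unmodified, 1), round(g_soft_core, 1))
```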

MBAR/UWHAM probability densities

The BEDAM binding free energy theory highlights the fundamental importance of the probability densities pλ(b) of the binding energy as a function of the progress parameter λ. For example, we have shown29 that the binding free energy (Eq. 39) can be written as

$$\Delta G_b = -kT \log \int p_0(b)\, e^{-\beta b}\, db, \tag{45}$$

where p0(b) is the probability density of the binding energies at λ = 0, that is in absence of ligand-receptor interactions. The probability density of binding energies p1(b) of the ligand-receptor coupled state is also of special interest. The mean of p1(b) is the average binding energy $\langle b \rangle_1$ which measures the driving force toward binding provided by favorable ligand-receptor interactions. The difference between the binding free energy and the average binding energy is the binding reorganization free energy that measures energetic strain and entropic factors which oppose binding. In addition to thermodynamic decompositions of this kind, p1(b) also leads to conformational decompositions of the binding free energy. p1(b) can be interpreted as the contribution to the binding affinity of conformations with binding energy b and, consequently, distinct macrostates of the complex contribute to binding affinity proportionally to the integrated intensity of the corresponding components of p1(b).18, 29
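The connection between Eq. 39 and Eq. 45 can be spelled out in one line. The following is a sketch that uses only Eqs. 37 and 38 and writes $\langle \cdot \rangle_0$ for an average over the decoupled (λ = 0) ensemble, a notation introduced here for illustration:

```latex
\frac{Z_{\theta_{\mathrm{cpld}}}}{Z_{\theta_{\mathrm{dcpld}}}}
  = \frac{\int e^{-\beta [U_0(x) + b(x)]}\, dx}{\int e^{-\beta U_0(x)}\, dx}
  = \left\langle e^{-\beta b} \right\rangle_0
  = \int p_0(b)\, e^{-\beta b}\, db ,
\qquad\text{so that}\qquad
\Delta G_b = -kT \log \int p_0(b)\, e^{-\beta b}\, db .
```

The last equality in the chain simply groups conformations by their binding energy, which is why the low-energy tail of p0(b) controls the estimate.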

It is straightforward to estimate these probability densities by binning and WHAM using Eqs. 7, 4, which for the present application can be condensed as2

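$$p_\lambda(b)\, \Delta b = \frac{1}{Z_\lambda} \frac{\sum_r N_r(b)}{\sum_r n_r Z_{\lambda_r}^{-1} e^{\beta(\lambda - \lambda_r) b}}, \tag{46}$$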

where pλ(b) is the estimate of the probability density in correspondence with a bin centered at b, with bin width Δb, and Nr(b) is the number of observations in that bin from the simulation at λ = λr. As presented above (see Eqs. 23, 22) the procedure to obtain probability densities and their moments (such as expectation values) is somewhat different when using binless methods. First each binding energy observation bji is assigned a λ-dependent statistical weight given in this case by

$$F_\lambda(b_{ji}) = \frac{1}{Z_\lambda} \frac{1}{\sum_r n_r Z_{\lambda_r}^{-1} e^{\beta(\lambda - \lambda_r) b_{ji}}}. \tag{47}$$

The sum of statistical weights over the samples is automatically unitary. Expectation values are computed as weighted averages using the weights in Eq. 47. For example the average binding energy at λ is

$$\langle b \rangle_\lambda = \sum_{j} \sum_{i} b_{ji}\, F_\lambda(b_{ji}). \tag{48}$$

Averages of other properties can be obtained similarly by replacing bji in Eq. 48 with any property of the sampled conformation ji. As discussed in Sec. 2, this expression can also be used to estimate probability densities, such as the binding energy densities pλ(b). These can be approximated by the relationship

$$p_\lambda(b_k)\, \Delta b_k \simeq \langle \delta_{b_k}(b) \rangle_\lambda, \tag{49}$$

where δbk(b) is a function defined as 1 if the argument falls within the bin centered at binding energy bk with width Δbk and zero otherwise. Then the average in Eq. 49 is computed using the equivalent of Eq. 48

$$p_\lambda(b_k)\, \Delta b_k \simeq \sum_{j} \sum_{i} \delta_{b_k}(b_{ji})\, F_\lambda(b_{ji}). \tag{50}$$

So the UWHAM calculation of pλ(b) basically consists of binning samples based on their binding energies and then creating a histogram in which the height of each bin is the sum of the weights Fλ(bji) of the observations collected in that bin.
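The reweighting steps of Eqs. 47, 48, and 50 can be sketched in Python as follows. The function names, the use of bin edges rather than bin centers and widths, and the log-space evaluation are choices made for this illustration. The log Z values are assumed to come from a previous MBAR or UWHAM solution, and every state is assumed to have at least one sample.

```python
import math
from bisect import bisect_right


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def uwham_weights(pooled, lam, lambdas, counts, log_z, beta):
    """Normalized weights F_lambda(b_ji) of Eq. 47 for the pooled samples."""
    log_w = []
    for b in pooled:
        log_denom = logsumexp([
            math.log(counts[r]) - log_z[r] + beta * (lam - lambdas[r]) * b
            for r in range(len(lambdas))
        ])
        log_w.append(-log_denom)
    log_z_lam = logsumexp(log_w)  # log Z_lambda, which normalizes the weights
    return [math.exp(v - log_z_lam) for v in log_w]


def weighted_average(values, weights):
    """Eq. 48: expectation at lambda as a weighted sum over all samples."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_histogram(pooled, weights, edges):
    """Eq. 50: sum of weights in each bin [edges[k], edges[k+1])."""
    heights = [0.0] * (len(edges) - 1)
    for b, w in zip(pooled, weights):
        k = bisect_right(edges, b) - 1
        if 0 <= k < len(heights):
            heights[k] += w
    return heights
```

Dividing each height by the corresponding bin width gives the density estimate pλ(bk). Samples outside the range of the edges are simply ignored by this sketch.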

Figures 1 and 2 illustrate the p1(b) and p0(b) probability densities obtained by UWHAM and Eq. 50 for the complex with ligand 6.30 These are compared with the corresponding probability density estimates from the histograms of the data collected only at λ = 1 and λ = 0, respectively. There is good agreement between the two estimates in the region of binding energies well sampled at the respective λ values, further validating the UWHAM results. The tails of the probability densities are estimated much more accurately by UWHAM than by the direct histograms because these are rarely sampled by the simulations conducted only at a specific λ. The UWHAM probability densities are instead estimated from data obtained from simulations at multiple λ values between 0 and 1 which explore a much wider range of binding energies. Obtaining accurate tails of probability densities is very important in a variety of applications, for example when employing Eq. 45 to estimate the binding free energy from p0(b) (see Fig. 2). Due to the exponential term in the integrand of Eq. 45, the p0(b) density in the range −40 < b < −10 kcal/mol dominates the estimate of the binding free energy, and the data collected at λ = 0 constitute a very poor estimate of p0(b) in this region of binding energies (although this is difficult to see in Fig. 2 because of the log-log representation).

Simulation setup and numerical analysis

BEDAM calculations29 were performed for six complexes of FKBP with ligands 2, 3, 5, 6, 8, and 9 from Ref. 37, from a ligand set which was the subject of previous binding free energy calculations.30, 38, 39 Complexes were prepared as described30 based on the crystal structures of ligands 8 and 9 (PDB ID’s 1FKG and 1FKH, respectively). Two BEDAM calculations were conducted for each complex, both employing Eq. 44 to represent the protein-ligand interaction potential, one with bmax = $10^9$ kcal/mol (referred to as the unmodified potential) and the other with bmax = $10^3$ kcal/mol (referred to as the soft-core potential). Soft-core calculations employed 15 BEDAM replicas at λ = 0, $10^{-3}$, $2 \times 10^{-3}$, $4 \times 10^{-3}$, $6 \times 10^{-3}$, $8 \times 10^{-3}$, $10^{-2}$, $2 \times 10^{-2}$, $6 \times 10^{-2}$, 0.1, 0.25, 0.5, 0.75, 0.9, and 1. Calculations with the unmodified potential employed 18 replicas at λ = 0, $10^{-9}$, $10^{-8}$, $10^{-7}$, $10^{-6}$, $10^{-5}$, $10^{-4}$, $10^{-3}$, $10^{-2}$, $10^{-1}$, 0.15, 0.25, 0.35, 0.5, 0.75, 0.9, and 1. Hamiltonian replica exchange simulations were conducted for 2 ns per replica (396 ns total simulation time). Binding energies were recorded at 1 ps intervals during the second half of the simulations, yielding 1000 observations per replica.

WHAM analysis has been performed employing Eqs. 43, 45, 46 as described2 on the collected binding energy data, bji, using a binning grid starting at −40 kcal/mol (the lowest recorded binding energy value) and extending to a set maximum (see Tables 1, 2) for the unmodified binding energy function or to the maximum allowed value (bmax = $10^3$ kcal/mol) for the soft core data. Grid spacing was set to 0.3 kcal/mol in the (−40, −10) binding energy range, increasing exponentially starting from this value at a rate adjusted to reach the variable maximum set value with the given number of bins.
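A grid of this kind can be generated along the following lines. This is only a sketch of one possible construction: the geometric growth of the bin widths, the bisection used to find the growth rate, and the function name are assumptions, since the text does not specify how the rate was adjusted.

```python
def binning_grid(b_min, b_uniform_max, spacing, b_max, n_bins):
    """Bin edges: uniform spacing up to b_uniform_max, then geometrically
    growing widths chosen so that the last edge lands on b_max."""
    n_uniform = round((b_uniform_max - b_min) / spacing)
    n_growing = n_bins - n_uniform
    span = b_max - b_uniform_max
    if n_growing <= 0 or n_growing * spacing >= span:
        raise ValueError("grid does not need (or cannot fit) growing bins")

    def covered(growth):
        # Total width of the growing bins: spacing * (g + g^2 + ... + g^n_growing)
        return sum(spacing * growth ** k for k in range(1, n_growing + 1))

    low, high = 1.0, 2.0
    while covered(high) < span:
        high *= 2.0
    for _ in range(200):  # bisection on the growth factor
        mid = 0.5 * (low + high)
        if covered(mid) < span:
            low = mid
        else:
            high = mid
    growth = 0.5 * (low + high)

    edges = [b_min + i * spacing for i in range(n_uniform + 1)]
    for k in range(1, n_growing + 1):
        edges.append(edges[-1] + spacing * growth ** k)
    return edges


# Example with the settings quoted in the text: 250 bins reaching 1e9 kcal/mol.
edges = binning_grid(-40.0, -10.0, 0.3, 1.0e9, 250)
```

With these settings the widths of the last bins are a sizable fraction of the maximum itself, which makes concrete why the unfavorable range cannot be resolved with a few hundred bins.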

UWHAM analysis was conducted on the same binding energy data to obtain the logarithm of the partition functions logZλk (relative to log Z0, which is set to 0) by minimization of the function κ(log z1, …, log zm) in Eq. 42 with respect to log zk, setting the free energy of the unbound state to zero (log z1 = log Z0 = 0). For the minimization, we used the trust region algorithm15 as implemented in the R statistical package “trust”.40 A similar procedure has been recently proposed in the context of WHAM.6 The code for the UWHAM R module we employed and a set of use examples in R are available from the authors upon request. MBAR calculations were performed using the code kindly provided by Chodera and Shirts.11

Statistical uncertainties were computed using the block bootstrapping method.17 Similarly to the traditional block averaging approach,41 the method consists of dividing the sampled data in Nb time-contiguous blocks (Nb = 20 in this work). However, in this context blocks span all of the replicas; that is, each block contains the data generated from all of the replicas in the same time window. Each block is then assigned an integer identifier and the list of block identifiers plays the same role as the data samples in the standard bootstrap method. Namely, a new block identifier list of length Nb is created by sampling with replacement from the original list and a corresponding new binding energy dataset is generated by collating the data contained in the blocks of the new list. This is repeated a number of times (100 times in this case) and the statistical uncertainty of the binding free energy is estimated from the standard deviation of the free energy values from each bootstrap sample. The advantage of this bootstrap technique is that it accounts for time correlation of samples originating from each replica as well as cross-correlations between replicas due to λ exchanges.
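The procedure just described can be sketched in a few lines of Python. The function name and signature are hypothetical. The `estimator` argument stands for any function that maps a resampled dataset to a single number, for instance a binding free energy obtained from a binless analysis. All replicas are assumed to have the same number of time-ordered observations, and trailing observations that do not fill a whole block are dropped.

```python
import random
import statistics


def block_bootstrap_error(samples, estimator, n_blocks=20, n_boot=100, seed=0):
    """Standard deviation of `estimator` over block-bootstrap resamples.

    samples[j] : time-ordered observations from replica j (equal lengths)
    estimator  : function taking a list of per-replica lists, returning a float
    """
    rng = random.Random(seed)
    block_len = len(samples[0]) // n_blocks
    if block_len == 0:
        raise ValueError("not enough observations for the requested blocks")
    estimates = []
    for _ in range(n_boot):
        # Draw block identifiers with replacement; the same time windows are
        # used for every replica so that cross-replica correlations are kept.
        chosen = [rng.randrange(n_blocks) for _ in range(n_blocks)]
        resampled = [
            [x for blk in chosen
             for x in series[blk * block_len:(blk + 1) * block_len]]
            for series in samples
        ]
        estimates.append(estimator(resampled))
    return statistics.stdev(estimates)
```

With 1000 observations per replica and 20 blocks, each block would hold 50 consecutive observations from every replica.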

CONCLUSION

We demonstrate the statistical validity and usefulness of interpreting MBAR as a binless formulation of WHAM. Like WHAM, the binless formulation can be used not only to estimate free energies and equilibrium expectations, but also to estimate equilibrium distributions. This development allows practitioners to easily build on their current applications of WHAM, but without discretizing observations into bins, which may sometimes incur substantial biases. This is illustrated for alchemical absolute binding free energy calculations using the BEDAM technique. While UWHAM and MBAR11 binless implementations yield equivalent results for both the unmodified and soft-core potentials, binning of the unmodified data leads to substantial biases which vary depending on the level of discretization. These results indicate that binless multi-state inference approaches are potentially a straightforward alternative to soft-core potentials for binding free energy alchemical calculations.

ACKNOWLEDGMENTS

This work has been supported in part by research grants from the National Institutes of Health (Grant No. GM30580) and the National Science Foundation (CDI type II Grant No. 1125332 and DMS-0749718). The calculations reported in this work have been performed at the BioMaPS High Performance Computing Center at Rutgers University funded in part by the NIH shared instrumentation (Grant Nos. 1 S10 RR022375 and 1 S10 RR027444), and on the Lonestar4 cluster at the Texas Advanced Computing Center under TeraGrid/XSEDE National Science Foundation allocation (Grant No. MCB100145). The authors are grateful to John Chodera and Michael Shirts for providing valuable suggestions and guidance.

APPENDIX A: COMPUTING POINT ESTIMATORS

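To compute $\log(\hat{Z}_{\theta_2}/\hat{Z}_{\theta_1}, \ldots, \hat{Z}_{\theta_m}/\hat{Z}_{\theta_1})$, we minimize κ(0, ζ2, …, ζm) using the trust region algorithm implemented by the R package trust.40 This algorithm is globally convergent at the second order (Sec. 4.2 of Ref. 15). Below we provide formulas for evaluating κ and its gradient and Hessian, which are required by the trust region algorithm.

Arrange the pooled data into a column vector $(u_1, \ldots, u_n)^T$. Let $\Pi_s$ be the m × m diagonal matrix with the (j, j)th element nj/n, and $Q_s$ and $W_s$ be the n × m matrices, respectively, with (i, j)th element

$$Q_{ij} = e^{-\zeta_j} e^{-\theta_j^T u_i}, \qquad W_{ij} = \frac{e^{-\zeta_j} e^{-\theta_j^T u_i}}{\sum_{r=1}^{m} \frac{n_r}{n}\, e^{-\zeta_r} e^{-\theta_r^T u_i}}.$$

Write $\zeta = (\zeta_1, \ldots, \zeta_m)^T$ and $1_m$ (or $1_n$) as the column vector of m (or n) ones. Then

$$\kappa(\zeta) = \frac{1_n^T}{n} \log(Q_s \Pi_s 1_m) + \sum_{r=1}^{m} \frac{n_r}{n} \zeta_r, \qquad \frac{\partial \kappa}{\partial \zeta}(\zeta) = -\Pi_s \frac{W_s^T 1_n}{n} + \Pi_s 1_m, \qquad \frac{\partial^2 \kappa}{\partial \zeta\, \partial \zeta^T}(\zeta) = -\frac{1}{n} \Pi_s W_s^T W_s \Pi_s + \mathrm{diag}\left(\Pi_s \frac{W_s^T 1_n}{n}\right),$$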

where diag(c) is the diagonal matrix with (j, j) element cj for a vector c = (c1, …, cm). The gradient (or Hessian) of κ(0, ζ2, …, ζm) is formed by deleting the first element (or the first row and column) from that of κ(ζ).
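The formulas above translate directly into code. The following Python sketch evaluates κ, its gradient, and its Hessian for the binding energy application, where θj^T ui is replaced by βλj bi (the U0 contribution only shifts κ by a constant and cancels in W). The function name and the pure-Python loops are choices made for illustration; this is not the UWHAM R module. The length of `pooled` is assumed to equal the sum of `counts`, and each count is assumed to be positive.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def kappa_with_derivatives(zeta, lambdas, counts, pooled, beta):
    """Return (kappa, gradient, Hessian) at zeta = (log z_1, ..., log z_m)."""
    m = len(lambdas)
    n = len(pooled)
    pi = [c / n for c in counts]                 # diagonal of Pi_s
    value = sum(p * z for p, z in zip(pi, zeta))
    col_mean = [0.0] * m                         # W_s^T 1_n / n
    gram = [[0.0] * m for _ in range(m)]         # W_s^T W_s / n
    for b in pooled:
        log_q = [-zeta[j] - beta * lambdas[j] * b for j in range(m)]
        log_s = logsumexp([math.log(pi[j]) + log_q[j] for j in range(m)])
        value += log_s / n
        w = [math.exp(lq - log_s) for lq in log_q]   # row i of W_s
        for j in range(m):
            col_mean[j] += w[j] / n
            for k in range(m):
                gram[j][k] += w[j] * w[k] / n
    grad = [pi[j] * (1.0 - col_mean[j]) for j in range(m)]
    hess = [[-pi[j] * gram[j][k] * pi[k] + (pi[j] * col_mean[j] if j == k else 0.0)
             for k in range(m)] for j in range(m)]
    return value, grad, hess
```

Deleting the first entry of the gradient and the first row and column of the Hessian gives the quantities needed to minimize over ζ2, …, ζm with ζ1 held at zero, for example by Newton or trust region steps.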

APPENDIX B: COMPUTING VARIANCE MATRICES

Suppose that $\hat{Z}_{\theta}/\hat{Z}_{\theta_1}$ is computed for k values, θm + 1, …, θm + k, of θ. Let R be the n × m matrix with (i, j)th element $(n_j/n)^{-1}$ if ui is sampled from Fθj and 0 otherwise, and W be the n × (m + k) matrix with (i, j)th element

$$W_{ij} = \frac{(\hat{Z}_{\theta_j}/\hat{Z}_{\theta_1})^{-1} e^{-\theta_j^T u_i}}{\sum_{r=1}^{m} \frac{n_r}{n} (\hat{Z}_{\theta_r}/\hat{Z}_{\theta_1})^{-1} e^{-\theta_r^T u_i}}.$$

Let $I_{m+k}$ be the identity matrix of size m + k, $0_{(m+k) \times k}$ be the (m + k) × k matrix of zeros, and

$$O = \frac{1}{n} W^T W, \qquad B = \left(O_s \Pi_s,\; 0_{(m+k) \times k}\right) - I_{m+k}, \qquad D = \frac{1}{n} W^T R, \qquad A = O - D\, \Pi_s D^T,$$

where $O_s$ is the (m + k) × m matrix consisting of the first m columns in O. The (j, r)th element of D is the sample average of (i, r)th elements of W for i = 1, …, n such that ui is sampled from Fθj. The asymptotic variance matrix of $\log(\hat{Z}_{\theta_2}/\hat{Z}_{\theta_1}, \ldots, \hat{Z}_{\theta_m}/\hat{Z}_{\theta_1}, \hat{Z}_{\theta_{m+1}}/\hat{Z}_{\theta_1}, \ldots, \hat{Z}_{\theta_{m+k}}/\hat{Z}_{\theta_1})$ can be consistently estimated by

$$\frac{1}{n} B_{(1)}^{-1} A_{(1)} \left(B_{(1)}^{T}\right)^{-1}, \tag{B1}$$

where $A_{(1)}$ and $B_{(1)}$ are formed by deleting the first row and column from A and B. Alternatively, formula B1 can be used with A replaced by $O - O_s \Pi_s O_s^T$. The resulting formula does not require the use of the information about which observation is sampled from which distribution.

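Suppose that $\langle h \rangle_\theta$ is estimated for θ = θ1, …, θm, θm + 1, …, θm + k. Write formula 22 as the ratio $\hat{Z}_{\theta}^{h}/\hat{Z}_{\theta}$, where

$$\hat{Z}_{\theta}^{h} = \sum_{j=1}^{m} \sum_{i=1}^{n_j} \frac{h(u_{ji})}{\sum_{r=1}^{m} n_r \hat{Z}_{\theta_r}^{-1} e^{(\theta - \theta_r)^T u_{ji}}}.$$

Redefine W as the n × (m + k) matrix with (i, j)th element

$$W_{ij} = \frac{e^{-\theta_j^T u_i}}{\sum_{r=1}^{m} \frac{n_r}{n} (\hat{Z}_{\theta_r}/\hat{Z}_{\theta_1})^{-1} e^{-\theta_r^T u_i}}.$$

Let $W^h$ be the n × (m + k) matrix with (i, j)th element

$$W_{ij}^{h} = \frac{h(u_i)\, e^{-\theta_j^T u_i}}{\sum_{r=1}^{m} \frac{n_r}{n} (\hat{Z}_{\theta_r}/\hat{Z}_{\theta_1})^{-1} e^{-\theta_r^T u_i}}.$$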

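Now replace W by $(W, W^h)$ throughout and redefine

$$B = \left\{O_s C^{-1} \Pi_s C^{-1},\; 0_{(2m+2k) \times (m+2k)}\right\} - I_{2m+2k},$$

where $O_s$ is the (2m + 2k) × m matrix consisting of the first m columns in O, and C is the (m + k) × (m + k) diagonal matrix with (j, j)th element $\hat{Z}_{\theta_j}/\hat{Z}_{\theta_1}$. The asymptotic variance matrix of $(\hat{Z}_{\theta_2}/\hat{Z}_{\theta_1}, \ldots, \hat{Z}_{\theta_{m+k}}/\hat{Z}_{\theta_1}, \hat{Z}_{\theta_1}^{h}/\hat{Z}_{\theta_1}, \ldots, \hat{Z}_{\theta_{m+k}}^{h}/\hat{Z}_{\theta_1})$ can be consistently estimated by formula B1, which is denoted by V. The asymptotic variance matrix of $(\hat{Z}_{\theta_1}^{h}/\hat{Z}_{\theta_1}, \ldots, \hat{Z}_{\theta_{m+k}}^{h}/\hat{Z}_{\theta_{m+k}})$ can be consistently estimated by

$$C^{-1} \left(-C^h,\; I_{m+k}\right) \begin{pmatrix} 0 & 0_{2m+2k-1}^{T} \\ 0_{2m+2k-1} & V \end{pmatrix} \begin{pmatrix} -C^h \\ I_{m+k} \end{pmatrix} C^{-1},$$

where $0_{2m+2k-1}$ is the column vector of 2m + 2k − 1 zeros and $C^h$ is the (m + k) × (m + k) diagonal matrix with (j, j)th element $\hat{Z}_{\theta_j}^{h}/\hat{Z}_{\theta_j}$.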


Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
