Bayesian Multistate Bennett Acceptance Ratio Methods

Xinqiang Ding

doi:10.1021/acs.jctc.3c01212

J Chem Theory Comput. 2024 Feb 22;20(5):1878–1888. doi: 10.1021/acs.jctc.3c01212

PMCID: PMC10938511 PMID: 38385533

Abstract

The multistate Bennett acceptance ratio (MBAR) method is a prevalent approach for computing the free energies of thermodynamic states. In this work, we introduce BayesMBAR, a Bayesian generalization of the MBAR method. By integration of configurations sampled from thermodynamic states with a prior distribution, BayesMBAR computes a posterior distribution of free energies. Using the posterior distribution, we derive free energy estimations and compute their associated uncertainties. Notably, when a uniform prior distribution is used, BayesMBAR recovers the MBAR’s result but provides more accurate uncertainty estimates. Additionally, when prior knowledge about free energies is available, BayesMBAR can incorporate this information into the estimation procedure by using nonuniform prior distributions. As an example, we show that by incorporating the prior knowledge about the smoothness of free energy surfaces, BayesMBAR provides more accurate estimates than the MBAR method. Given MBAR’s widespread use in free energy calculations, we anticipate BayesMBAR to be an essential tool in various applications of free energy calculations.

1. Introduction

Computing the free energies of thermodynamic states is a central problem in computational chemistry and physics. It has wide-ranging applications including computing protein–ligand binding affinities,¹ predicting molecular solubilities,² and estimating phase equilibria,³⁻⁵ among other tasks. For states whose free energies are not analytically tractable, their free energies are often estimated using numerical methods.⁶ These methods typically involve sampling configurations from states of interest and subsequently computing their free energies based on sampled configurations. In this work, we focus on the second step of estimating free energies, assuming that equilibrium configurations have been sampled using Monte Carlo sampling or molecular dynamics.

The multistate Bennett acceptance ratio (MBAR) method⁷ is a common technique for estimating free energies given sampled configurations. Equivalent formulations of the MBAR method were also developed in different contexts.⁸⁻¹⁰ For the purpose of this study, we refer to this method and its equivalent formulations as MBAR. The MBAR method not only offers an estimate of free energies but also provides the statistical uncertainty associated with the estimate. In situations where a large number of configurations are available, the MBAR estimator is unbiased and has the smallest variance among estimators reliant on sampled configurations.⁷,⁹ However, the properties of the MBAR estimator and its associated uncertainty estimate remain largely unexplored when the number of configurations is small. Furthermore, in such scenarios, it becomes desirable to incorporate prior knowledge into the estimation procedure.

A systematic approach of integrating prior knowledge into an estimation procedure is Bayesian inference.¹¹ Bayesian inference treats unknown quantities (free energies in this case) as random variables and incorporates prior knowledge into the estimation procedure by employing prior distributions and the Bayes’ theorem. In terms of free energy estimation, prior knowledge could come from previous simulations, experiments, or physical knowledge of a system. A common instance of physical prior knowledge of free energies is that free energy surfaces along a collective coordinate are usually smooth. Combining prior knowledge with observed data (configurations sampled from thermodynamic states), Bayesian inference computes the posterior distribution of unknown quantities. The posterior distribution provides both estimates of the unknown quantities and uncertainties of the estimates.

Estimating free energies using Bayesian inference was investigated in multiple studies. For instance, Stecher et al.¹² used a Gaussian process as the prior distribution over smooth free energy surfaces. The resulting posterior distribution, given configurations from umbrella sampling, was utilized to estimate the free energy surfaces and the associated uncertainty. Shirts and Ferguson¹³ parameterized free energy surfaces using splines and constructed prior distributions using a Gaussian prior on spline coefficients. Unlike these studies that primarily focused on estimating free energy surfaces from biased simulations, the works of Habeck,^14,15 Ferguson,¹⁶ and Maragakis et al.¹⁷ were aimed at estimating densities of states and free energy differences using Bayesian inference. Methods developed in these studies are direct Bayesian generalizations of the weighted histogram analysis method (WHAM) and the Bennett acceptance ratio (BAR) method.¹⁸

This work focuses on improving the accuracy of estimating free energies of discrete thermodynamic states when the number of sampled configurations is small. For this purpose, we developed a Bayesian generalization of the MBAR method, which we term BayesMBAR. With several benchmark examples, we show that when the number of configurations is small, BayesMBAR provides not only superior uncertainty estimates compared to MBAR but also more accurate estimates of free energies by incorporating prior knowledge into the estimation procedure.

2. Methods

The MBAR method is commonly understood as a set of self-consistent equations that are not amenable to the development of its Bayesian generalization. To develop a Bayesian generalization of MBAR, we first emphasize the probabilistic nature of the MBAR method. Although there are multiple statistical models from which the MBAR method can be derived, we build upon the reverse logistic regression model,¹⁰ which treats free energies as parameters and provides a likelihood for inference. To convert the reverse logistic regression model into a Bayesian model, we treat free energies as random variables and place a prior distribution on them. Then the posterior distribution of free energies is computed using Bayes’ theorem. Samples from the posterior distribution are efficiently generated using Hamiltonian Monte Carlo (HMC) methods.^19,20 These samples are used to estimate free energies and quantify the uncertainty of the estimate. Hyperparameters of the prior distribution are automatically optimized by maximizing the marginal likelihood of data (Bayesian evidence). We present the details of BayesMBAR in the following sections.

2.1. Reverse Logistic Regression Model of MBAR

Computing the free energies of thermodynamic states is closely related to computing the normalizing constants of Bayesian models. Multiple methods have been developed in statistics for estimating normalizing constants,^10,21−23 and these methods are directly applicable for estimating free energies. Here, we focus on the reverse logistic regression method proposed by Geyer¹⁰ and show that the solution of this method is equivalent to the MBAR method.

Let us assume that we aim to calculate the free energies of m thermodynamic states (up to an additive constant) by sampling their configurations. Let u_i(x), i = 1, ..., m be the reduced potential energy functions⁷ of the m states. The free energy of the ith state is defined as

$$F_i = -\log \int_{\Gamma} \exp(-u_i(x))\, dx, \tag{1}$$

where Γ is the configuration space. For the ith state, n_i uncorrelated configurations, {x_ik, k = 1, ..., n_i}, are sampled from its Boltzmann distribution p_i(x; F_i) = exp(−[u_i(x) – F_i]). Here, p_i(x; F_i) means that it is a distribution of the random variable x with the parameter F_i. We will henceforth use such a notation of separating parameters from random variables with a semicolon in probability distributions. To estimate free energies, Geyer¹⁰ proposed the following retrospective formulation. This formulation treats indices of states in an unconventional manner. Let us use y_ik to denote the index of the state from which configuration x_ik is sampled.¹⁰ By construction, y_ik = i for all i and k. Although indices of states for sampled configurations are determined in the sampling setup, they are treated as a multinomially distributed random variable with parameters π = (π₁, ..., π_m). Because n_i configurations are sampled from state i, the maximum likelihood estimate of π_i is $\hat{\pi}_i = n_i/n$, where $n = \sum_{i=1}^{m} n_i$. The concatenation of state indices and configurations, (y, x), is viewed as samples from the joint distribution of p(y, x), which is defined as

$$p(y = i, x; F, \log \pi) = \pi_i\, p_i(x; F_i) = \exp(-[u_i(x) - F_i - \log \pi_i])$$

for i ∈ {1, ..., m}. Following a retrospective argument, the reverse logistic regression method estimates the free energies by asking the following question. Given that a configuration x is observed, what is the probability that it is sampled from state y = i rather than from other states? Using Bayes’ theorem, we can compute this retrospective conditional probability as

$$p(y = i \mid x; F, \log \pi) = \frac{\exp(-[u_i(x) - F_i - \log \pi_i])}{\sum_{j=1}^{m} \exp(-[u_j(x) - F_j - \log \pi_j])}, \tag{4}$$

where F = (F₁, ..., F_m) and log π = (log π₁, ..., log π_m). The free energies are estimated by maximizing the product of the retrospective conditional probabilities of all configurations, which is equivalent to maximizing the log-likelihood

$$\ell(F, \log \pi) = \sum_{i=1}^{m} \sum_{k=1}^{n_i} \log p(y_{ik} = i \mid x_{ik}; F, \log \pi). \tag{5}$$

The log-likelihood function ℓ(F, log π) in eq 5 depends on F and log π only through their sum ϕ = F + log π, so F and log π are not separately estimable from maximizing the log-likelihood. The solution is to substitute log π_i with the empirical estimate $\log \hat{\pi}_i = \log(n_i/n)$. Then, by setting the derivative ∂ℓ/∂F to zero, we obtain

$$\hat{F}_r = -\log \sum_{i=1}^{m} \sum_{k=1}^{n_i} \frac{\exp(-u_r(x_{ik}))}{\sum_{j=1}^{m} n_j \exp(-[u_j(x_{ik}) - \hat{F}_j])} \tag{6}$$

for r = 1, ..., m. Here, $\hat{F} = (\hat{F}_1, \ldots, \hat{F}_m)$ is the solution that maximizes $\ell(F, \log \hat{\pi})$. Equation 6 is identical to the MBAR equation and reduces to the BAR equation¹⁸ when m = 2. The technique described above is termed “reverse logistic regression” based on two primary insights. First, the log-likelihood in eq 5 bears resemblance to that found in multiclass logistic regression. Second, the primary goal of this method is to estimate F, the intercept term. This differs from traditional logistic regression, where the aim is to determine regression coefficients and predict the response variable y.
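As a minimal sketch of how eq 6 can be solved in practice, the following pure-Python function iterates the self-consistent equations until the free energies stop changing. The function name `solve_mbar`, the layout of the reduced-energy table `u`, and the convention of fixing the first free energy to zero are illustrative assumptions, not the implementation used in this work.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def solve_mbar(u, n_per_state, tol=1e-10, max_iter=10000):
    """Self-consistent iteration for the MBAR equations (eq 6).

    u[j][s] is the reduced energy of pooled sample s evaluated in state j
    (hypothetical layout); n_per_state[j] is the number of samples drawn
    from state j. Returns free energies shifted so that F[0] = 0.
    """
    m = len(u)
    n_total = len(u[0])
    log_n = [math.log(n) for n in n_per_state]
    F = [0.0] * m
    for _ in range(max_iter):
        # log of sum_j n_j exp(-[u_j(x) - F_j]) for every pooled sample
        log_den = [
            logsumexp([log_n[j] - u[j][s] + F[j] for j in range(m)])
            for s in range(n_total)
        ]
        # right-hand side of eq 6 for every state r
        F_new = [
            -logsumexp([-u[r][s] - log_den[s] for s in range(n_total)])
            for r in range(m)
        ]
        # free energies are defined up to an additive constant
        F_new = [f - F_new[0] for f in F_new]
        if max(abs(a - b) for a, b in zip(F, F_new)) < tol:
            return F_new
        F = F_new
    return F
```

Because free energies are determined only up to an additive constant, the sketch reports them relative to the first state. For two states whose reduced potentials differ by a constant c, the iteration returns c regardless of which configurations were sampled.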

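The uncertainty of the estimate $\hat{F}$ is computed using asymptotic analysis of the log-likelihood function ℓ(F, log π) in eq 5. Because the log-likelihood function ℓ(F, log π) depends on F and log π only through their sum ϕ = F + log π, the observed Fisher information matrix computed using the log-likelihood function can only be used to compute the asymptotic covariance of the sum ϕ. The observed Fisher information matrix at $\hat{\phi} = \hat{F} + \log \hat{\pi}$ is

$$\mathcal{I}(\hat{\phi}) = \sum_{i=1}^{m} \sum_{k=1}^{n_i} \left[ \operatorname{diag}(p_{ik}) - p_{ik}\, p_{ik}^{\top} \right], \tag{7}$$

where p_ik is a column vector of $[p(y = 1 \mid x_{ik}; \hat{F}, \log \hat{\pi}), \ldots, p(y = m \mid x_{ik}; \hat{F}, \log \hat{\pi})]$, and diag(p_ik) is a diagonal matrix with p_ik as its diagonal elements. The asymptotic covariance matrix of $\hat{\phi}$ is the Moore–Penrose pseudoinverse of the observed Fisher information matrix, i.e., $\operatorname{cov}(\hat{\phi}) = \mathcal{I}^{+}(\hat{\phi})$. To compute the asymptotic covariance matrix of $\hat{F}$, we assume that $\hat{F}$ and $\log \hat{\pi}$ are asymptotically independent. Then the asymptotic covariance matrix of $\hat{F}$ can be computed as

$$\operatorname{cov}(\hat{F}) = \operatorname{cov}(\hat{\phi}) - \operatorname{cov}(\log \hat{\pi}) = \mathcal{I}^{+}(\hat{\phi}) - \operatorname{diag}(1/n_1, \ldots, 1/n_m) + \frac{1}{n} \mathbf{1}\mathbf{1}^{\top}, \tag{8}$$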

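where 1 is a column vector of m ones. The asymptotic covariance matrix in eq 8 is the same as that derived in ref (9) and is commonly used in the MBAR method.⁷ With the asymptotic covariance matrix of $\hat{F}$, we can compute the asymptotic variance of free energy differences using the identity $\operatorname{var}(\hat{F}_r - \hat{F}_s) = \operatorname{cov}(\hat{F})_{rr} + \operatorname{cov}(\hat{F})_{ss} - 2 \operatorname{cov}(\hat{F})_{rs}$, where $\operatorname{cov}(\hat{F})_{rs}$ is the (r, s)th element of the matrix $\operatorname{cov}(\hat{F})$.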
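The covariance of log π̂ that is subtracted in eq 8 can be obtained with the delta method. The short derivation below is a sketch that assumes only that the state counts (n₁, ..., n_m) follow a multinomial distribution with n trials and probabilities π; it is not reproduced from the original derivation.

$$
\begin{aligned}
\operatorname{cov}(\hat{\pi}) &= \tfrac{1}{n}\left[\operatorname{diag}(\pi) - \pi\pi^{\top}\right], \\
\operatorname{cov}(\log \hat{\pi}) &\approx \operatorname{diag}(\pi)^{-1}\, \operatorname{cov}(\hat{\pi})\, \operatorname{diag}(\pi)^{-1}
 = \tfrac{1}{n}\left[\operatorname{diag}(\pi)^{-1} - \mathbf{1}\mathbf{1}^{\top}\right], \\
\operatorname{cov}(\log \hat{\pi})\Big|_{\pi_i = n_i/n} &= \operatorname{diag}(1/n_1, \ldots, 1/n_m) - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}.
\end{aligned}
$$

Under the independence assumption stated above, cov(ϕ̂) = cov(F̂) + cov(log π̂), so subtracting the last line from the pseudoinverse of the observed Fisher information matrix gives the covariance of F̂.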

2.2. Bayesian MBAR

As shown above, the reverse logistic regression model formulates MBAR as a statistical model. It provides a likelihood function (eq 5) for computing the MBAR estimate $\hat{F}$ and the associated asymptotic covariance. Based on this formulation, we developed BayesMBAR by turning the reverse logistic regression into a Bayesian model. In BayesMBAR, we treat F as a random variable instead of a parameter and place a prior distribution on F. The posterior distribution of F is then used to estimate F. Let us represent the prior distribution of F as p(F; θ), where θ denotes the parameters of the prior distribution, often called hyperparameters. Borrowing from the reverse logistic regression, we use the retrospective conditional probability in eq 4 as the likelihood function, i.e., p(y|x, F) = p(y|x; F, log π). We note that F is treated as a random variable in p(y|x, F), whereas it is a parameter in p(y|x; F, log π). The log π term in the likelihood function is substituted with the maximum likelihood estimate $\log \hat{\pi}$. With these definitions, the posterior distribution of F given sampled configurations and state indices is

$$p(F \mid Y, X) \propto p(F; \theta) \prod_{i=1}^{m} \prod_{k=1}^{n_i} p(y_{ik} \mid x_{ik}, F), \tag{9}$$

where Y = {y_ik: i = 1, ..., m; k = 1, ..., n_i} and X = {x_ik: i = 1, ..., m; k = 1, ..., n_i}. Using the posterior distribution in eq 9, we can compute various quantities of interest, such as the posterior mode and the posterior mean, both of which can serve as point estimates of F. In addition, we can use the posterior covariance matrix as an estimate of the uncertainty of these point estimates. However, to carry out these calculations, we need to address the following questions that commonly arise in Bayesian inference.

2.3. Choosing the Prior Distribution

To fully specify the BayesMBAR model, we must choose a prior distribution for F. We could use information about F from previous simulations or experiments to construct the prior distribution, if such information is available. For example, the prior distribution of protein–ligand binding free energies could be constructed using free energies computed with fast but less accurate methods such as docking. The information could also come from the binding free energies of similar protein–ligand systems. Turning such information into a prior distribution will depend on domain experts’ experience and likely vary from case to case. In this work, we focus on scenarios where such information is not available. In this scenario, we propose to use two types of distributions as the prior: the uniform distribution and the Gaussian distribution.

2.3.1. Using Uniform Distributions as the Prior

As the MBAR method has proven to be a highly effective method for estimating free energies in many applications, a conservative strategy for choosing the prior distribution is to minimize the deviation of BayesMBAR from MBAR. Such a strategy leads to using the uniform distribution as the prior distribution because it makes the maximum a posteriori probability (MAP) estimate of BayesMBAR the same as the MBAR estimate. Specifically, if we set the prior distribution of F to be the uniform distribution, i.e., p(F; θ) ∝ constant, the posterior distribution of F in eq 9 becomes the same as the likelihood function. Therefore, maximizing the posterior distribution of F is equivalent to maximizing the log-likelihood function in eq 5.

Although the MAP estimate of BayesMBAR with a uniform prior distribution simply recovers the MBAR estimate, BayesMBAR provides two advantages. First, in addition to the MAP estimate, BayesMBAR also offers the posterior mean as an alternative point estimate of F. Second, BayesMBAR produces a posterior distribution of F, which can be used to estimate the uncertainty of the estimate. As shown in the Results section, the uncertainty estimate from BayesMBAR is more accurate than that from MBAR when the number of configurations is small.

2.3.2. Using Gaussian Distributions as the Prior

In many applications, we are interested in computing free energies along collective coordinates such as distances, angles, or alchemical parameters. In such cases, we often have the prior knowledge that the free energy surface is a smooth function F(λ) of the collective coordinate λ. A widely used approach to encode such knowledge into Bayesian inference is to use a Gaussian process²⁴ as the prior distribution. A Gaussian process is a collection of random variables, any finite number of which has a joint Gaussian distribution. A Gaussian process is fully specified by its mean function μ(λ) and covariance function k(λ, λ′). The value of the covariance function k(λ, λ′) is the covariance between F(λ) and F(λ′). The covariance function is often designed to encode the smoothness of the function. Specifically, the covariance k(λ, λ′) between F(λ) and F(λ′) increases as λ and λ′ become closer. When the mean function is smooth and a covariance function such as the squared exponential covariance function is used, the Gaussian process is a probability distribution of smooth functions.

In BayesMBAR, we focus on estimating free energies at discrete values of the collective coordinate, (λ₁, ..., λ_m), instead of the whole free energy surface. Projecting the Gaussian process over free energy surfaces onto discrete values of λ, we obtain as the prior distribution of F = [F(λ₁), ..., F(λ_m)] a multivariate Gaussian distribution with the mean vector μ = [μ(λ₁), ..., μ(λ_m)] and the covariance matrix Σ. The (i, j)th element of Σ is computed as Σ_ij = k(λ_i, λ_j) and represents the covariance between F(λ_i) and F(λ_j). As in many applications of Gaussian processes, we set the mean function to be a constant function, i.e., μ = (c, ..., c), where c is a hyperparameter to be optimized. The choice of the covariance function is a key ingredient of constructing the prior distribution, as it encodes our assumption about the free energy surface’s smoothness.²⁴ Several well-studied covariance functions are suitable for use in BayesMBAR. In this study, we use the squared exponential covariance function as an example, noting that other types of covariance functions can be used as well. The squared exponential covariance function is defined as

$$k(\lambda, \lambda') = \sigma^2 \exp\left(-\frac{r^2}{2 l^2}\right), \tag{10}$$

where r = |λ – λ′| and the variance scale σ and the length scale l are hyperparameters to be optimized. Every function F(λ) from such Gaussian processes has infinitely many derivatives and is very smooth. The hyperparameters σ and l control the variance and the length scale of function F(λ), respectively. The collective coordinate λ is not restricted to scalars and can be a vector. When λ is a vector, the length scale l could be either a scalar or a vector of the same dimension as λ, the latter of which means that the length scale can be different for different dimensions in λ. When a vector length scale is used, the squared exponential covariance function is defined as

$$k(\lambda, \lambda') = \sigma^2 \exp\left(-\tfrac{1}{2} (\lambda - \lambda')^{\top} L^{-2} (\lambda - \lambda')\right), \tag{11}$$

where L is a diagonal matrix with values of l as its diagonal elements. To allow the variances to be different for different entries in F = [F(λ₁), ..., F(λ_m)], we add a constant $\sigma_i^2$ to the variance of F(λ_i), i.e., $\Sigma_{ii} = k(\lambda_i, \lambda_i) + \sigma_i^2$. With the mean function and the covariance function defined, the prior distribution of F is fully specified as a multivariate Gaussian distribution of

$$p(F; \theta) = \mathcal{N}(F; \mu_\theta, \Sigma_\theta), \tag{12}$$

where μ_θ and Σ_θ are the mean vector and the covariance matrix, respectively. They depend on the hyperparameters $\theta = (c, \sigma, l, \sigma_1, \ldots, \sigma_m)$. The hyperparameters θ are optimized by maximizing the Bayesian evidence, as described in the following sections.
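To make the construction of the prior covariance matrix concrete, the sketch below builds Σ for a scalar collective coordinate using the squared exponential covariance function plus a per-state constant on the diagonal. The function names and the argument `extra_sd` (the square roots of the added diagonal constants) are hypothetical and chosen only for illustration.

```python
import math


def squared_exponential(lam_a, lam_b, sigma, length):
    """Squared exponential covariance for scalar collective coordinates."""
    r = abs(lam_a - lam_b)
    return sigma ** 2 * math.exp(-r ** 2 / (2.0 * length ** 2))


def prior_covariance(lams, sigma, length, extra_sd):
    """Covariance matrix of F = [F(lam_1), ..., F(lam_m)] under the Gaussian prior.

    extra_sd[i] ** 2 is the constant added to the variance of F(lam_i)
    (illustrative parameterization).
    """
    m = len(lams)
    cov = [
        [squared_exponential(lams[i], lams[j], sigma, length) for j in range(m)]
        for i in range(m)
    ]
    for i in range(m):
        cov[i][i] += extra_sd[i] ** 2
    return cov


# Three states along a scalar coordinate: nearby states are more strongly correlated.
cov = prior_covariance([0.0, 0.5, 1.0], sigma=2.0, length=0.5, extra_sd=[0.1, 0.2, 0.3])
```

In this example the covariance between the first and second states is larger than that between the first and third states, which is how the prior encodes smoothness along λ.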

2.4. Computing Posterior Statistics

With the prior distribution of F defined as above, the posterior distribution defined in eq 9 contains rich information about F. Specifically, the MAP or the posterior mean can be used as point estimates of F and the posterior covariance matrix can be used to compute the uncertainty of the estimate.

2.4.1. Computing the MAP Estimate

The MAP estimate of F is the value that maximizes the posterior distribution density, i.e., $\hat{F}_{\mathrm{MAP}} = \arg\max_{F} \log p(F \mid Y, X)$. When the prior distribution is chosen to be either a uniform distribution or a Gaussian distribution, log p(F|Y, X) is a concave function of F. This means that the MAP estimate is the unique global maximum of the posterior distribution density and can be efficiently computed using standard optimization algorithms. In BayesMBAR, we implemented the L-BFGS-B algorithm²⁵ and Newton’s method to compute the MAP estimate.
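For the two-state case with a uniform prior, the concavity of the log-posterior makes Newton's method easy to illustrate. The sketch below is an assumption-laden toy version: it works with the scalar ΔF = F₂ – F₁, takes as input the energy differences u₂(x) – u₁(x) of the sampled configurations, and clamps the step size as a simple safeguard. The function names and the clamp are illustrative and are not taken from the BayesMBAR implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def map_delta_f(du_state1, du_state2, tol=1e-12, max_iter=100, max_step=5.0):
    """Newton's method for the MAP estimate of dF = F2 - F1 (uniform prior).

    du_state1 / du_state2 hold u2(x) - u1(x) for samples drawn from state 1 / 2.
    """
    shift = math.log(len(du_state2) / len(du_state1))  # log(pi2_hat / pi1_hat)
    delta_f = 0.0
    for _ in range(max_iter):
        # P(y = 2 | x) for state-1 samples and P(y = 1 | x) for state-2 samples
        p1 = [sigmoid(delta_f + shift - du) for du in du_state1]
        p2 = [sigmoid(du - delta_f - shift) for du in du_state2]
        grad = -sum(p1) + sum(p2)
        hess = -sum(p * (1.0 - p) for p in p1) - sum(p * (1.0 - p) for p in p2)
        if hess == 0.0:
            break  # flat region in floating point; no usable Newton direction
        step = max(-max_step, min(max_step, -grad / hess))
        delta_f += step
        if abs(step) < tol:
            break
    return delta_f
```

With a uniform prior, the value returned by this sketch corresponds to the BAR estimate, since maximizing the posterior is then the same as maximizing the log-likelihood.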

2.4.2. Computing the Mean and the Covariance Matrix of the Posterior Distribution

Computing the posterior mean and the covariance matrix is more challenging than computing the MAP estimate. It involves computing an integral with respect to the posterior distribution density. When there are only two states, we compute the posterior mean and the covariance matrix by numerical integration. When there are more than two states, numerical integration is not feasible. In this case, we estimate the posterior mean and covariance matrix by sampling from the posterior distribution using the No-U-Turn Sampler (NUTS).²⁰ The NUTS is a variant of HMC methods¹⁹ and has the advantage of automatically tuning the step size and the number of steps. The NUTS sampler has been shown to be highly efficient in sampling from high-dimensional distributions for Bayesian inference problems.^26,27 In BayesMBAR, an extra factor that further improves the efficiency of the NUTS sampler is that the posterior distribution is log-concave (the logarithm of its density is a concave function of F), which means that the sampler does not need to cross low-density (high-energy) regions during sampling. In BayesMBAR, we use the NUTS sampler as implemented in the Python package, BlackJAX.²⁸
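The two-state numerical integration can be sketched with a simple grid over ΔF = F₂ – F₁. The example below assumes a uniform prior, a fixed integration range, and two hypothetical harmonic potentials u₁(x) = k₁x²/2 and u₂(x) = k₂(x – c)²/2 with an arbitrarily chosen offset c; these choices, the function names, and the grid settings are illustrative assumptions rather than the setup used in this work.

```python
import math
import random


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_likelihood(delta_f, du_state1, du_state2):
    """Retrospective log-likelihood as a function of dF = F2 - F1.

    du_state1 / du_state2 hold u2(x) - u1(x) for samples from state 1 / 2.
    """
    shift = math.log(len(du_state2) / len(du_state1))  # log(pi2_hat / pi1_hat)
    ll = -sum(softplus(delta_f + shift - du) for du in du_state1)
    ll -= sum(softplus(du - delta_f - shift) for du in du_state2)
    return ll


def posterior_summary(du_state1, du_state2, lo=-40.0, hi=40.0, num=4001):
    """Grid-based MAP, posterior mean, and posterior SD under a uniform prior."""
    step = (hi - lo) / (num - 1)
    grid = [lo + i * step for i in range(num)]
    log_post = [log_likelihood(g, du_state1, du_state2) for g in grid]
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]  # unnormalized posterior
    total = sum(weights)
    mean = sum(g * w for g, w in zip(grid, weights)) / total
    var = sum((g - mean) ** 2 * w for g, w in zip(grid, weights)) / total
    return grid[log_post.index(top)], mean, math.sqrt(var)


def sample_du(k1, k2, center2, n, rng):
    """Draw n samples per state and return u2 - u1 for each (assumed potentials)."""
    def u1(x):
        return 0.5 * k1 * x * x

    def u2(x):
        return 0.5 * k2 * (x - center2) ** 2

    xs1 = [rng.gauss(0.0, 1.0 / math.sqrt(k1)) for _ in range(n)]
    xs2 = [rng.gauss(center2, 1.0 / math.sqrt(k2)) for _ in range(n)]
    return [u2(x) - u1(x) for x in xs1], [u2(x) - u1(x) for x in xs2]


rng = random.Random(0)
du1, du2 = sample_du(k1=25.0, k2=36.0, center2=0.2, n=50, rng=rng)
map_est, mean_est, sd_est = posterior_summary(du1, du2)
true_delta_f = 0.5 * math.log(36.0 / 25.0)  # analytic value for harmonic wells
print(map_est, mean_est, sd_est, true_delta_f)
```

The grid range must be wide enough to cover the posterior tails, which decay exponentially in ΔF; for systems with poor overlap between states the range may need to be enlarged.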

2.5. Optimizing the Hyperparameters

When Gaussian distributions with a specific covariance function are used as the prior distribution of F (eq 12), we need to make decisions about the values of hyperparameters. Such decisions are referred to as model selection problems in Bayesian inference, and several principles have been proposed and used in practice. In BayesMBAR, we use the Bayesian model selection principle, which is to choose the model that maximizes the marginal likelihood of the data. The marginal likelihood of the data is also called the Bayesian evidence and is defined as

$$p(Y \mid X; \theta) = \int p(Y \mid F, X)\, p(F; \theta)\, dF.$$

Because the Bayesian evidence is a multidimensional integral, computing it with numerical integration is not feasible. In BayesMBAR, we use ideas from variational inference²⁹ and Monte Carlo integration⁹ to approximate it and optimize the hyperparameters.

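We introduce a variational distribution q(F) and use the evidence lower bound (ELBO) of the marginal likelihood as the objective function for optimizing the hyperparameters. Specifically, the ELBO is defined as

$$\mathrm{ELBO}(\theta) = \mathbb{E}_{F \sim q}\left[\log p(Y \mid F, X) + \log p(F; \theta) - \log q(F)\right].$$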

It is straightforward to show that $\mathrm{ELBO}(\theta) = \log p(Y \mid X; \theta) - D_{\mathrm{KL}}(q \,\|\, p(F \mid Y, X; \theta)) \le \log p(Y \mid X; \theta)$, where D_KL(q∥p(F|Y, X; θ)) is the Kullback–Leibler divergence between q(F) and p(F|Y, X; θ). Therefore, the ELBO is a lower bound of the log marginal likelihood of data, and the gap between them is the Kullback–Leibler divergence between q(F) and p(F|Y, X; θ). This suggests that to make the ELBO a good approximation of the log marginal likelihood, we should choose q(F) that is close to p(F|Y, X; θ).
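For completeness, the relation between the ELBO and the log marginal likelihood follows from Bayes’ theorem, p(F|Y, X; θ) = p(Y|F, X) p(F; θ)/p(Y|X; θ), in a few lines. The steps below are a standard variational-inference argument written in the notation of this section:

$$
\begin{aligned}
\log p(Y \mid X; \theta)
&= \mathbb{E}_{F \sim q}\left[\log \frac{p(Y \mid F, X)\, p(F; \theta)}{p(F \mid Y, X; \theta)}\right] \\
&= \mathbb{E}_{F \sim q}\left[\log \frac{p(Y \mid F, X)\, p(F; \theta)}{q(F)}\right]
 + \mathbb{E}_{F \sim q}\left[\log \frac{q(F)}{p(F \mid Y, X; \theta)}\right] \\
&= \mathrm{ELBO}(\theta) + D_{\mathrm{KL}}\left(q \,\|\, p(F \mid Y, X; \theta)\right).
\end{aligned}
$$

Because the Kullback–Leibler divergence is nonnegative, the ELBO can never exceed the log marginal likelihood, and the two are equal only when q(F) coincides with the posterior distribution.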

Although we could in principle use p(F|Y, X; θ) as the variational distribution q(F) (then the ELBO would be equal to the log marginal likelihood), it is not practical because computing the gradient of the ELBO with respect to the hyperparameters would require sampling from p(F|Y, X; θ) at every iteration of the optimization and is computationally too expensive. Instead, we choose q(F) to be a Gaussian distribution to approximate the posterior distribution based on the following observations. The posterior distribution density p(F|Y, X; θ) is equal to the product of the likelihood function p(Y|F, X) and the prior distribution p(F; θ) up to a normalization constant. The likelihood term p(Y|F, X) is a log-concave function of F and does not depend on θ, so we can approximate it using a fixed Gaussian distribution $\mathcal{N}(\mu_0, \Sigma_0)$, where the mean μ₀ and the covariance matrix Σ₀ are computed by sampling F from p(Y|F, X) once. Because the prior distribution p(F; θ) is also a Gaussian distribution, $\mathcal{N}(\mu_\theta, \Sigma_\theta)$, multiplying the fixed Gaussian distribution with the prior yields another Gaussian distribution $\mathcal{N}(\mu_q, \Sigma_q)$, where μ_q and Σ_q can be analytically computed as

$$\Sigma_q = \left(\Sigma_0^{-1} + \Sigma_\theta^{-1}\right)^{-1}, \qquad \mu_q = \Sigma_q \left(\Sigma_0^{-1} \mu_0 + \Sigma_\theta^{-1} \mu_\theta\right).$$

Therefore, we choose the variational distribution q(F) to be the Gaussian distribution $\mathcal{N}(\mu_q, \Sigma_q)$, where μ_q and Σ_q are computed as above and depend on θ analytically. We compute the ELBO and its gradient with respect to θ using the reparameterization trick.³⁰ Specifically, we reparameterize the variational distribution q(F) using F = μ_q + Σ_q^{1/2}ε, where ε is a random variable with the standard Gaussian distribution. The ELBO can then be written as

$$\mathrm{ELBO}(\theta) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\left[\log p(Y \mid \mu_q + \Sigma_q^{1/2} \epsilon, X)\right] - D_{\mathrm{KL}}\left(q(F) \,\|\, p(F; \theta)\right).$$

The first term on the right-hand side can be estimated by sampling ε from the standard Gaussian distribution and evaluating the log-likelihood log p(Y|μ_q + Σ_q^{1/2}ε, X). The second term can be computed analytically. The gradient of the ELBO with respect to θ is computed using automatic differentiation.³¹
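A one-dimensional toy version of this estimator may help fix ideas. The sketch below treats a single free energy difference ΔF with a Gaussian prior N(mu_p, var_p), combines it with a fixed Gaussian approximation N(mu0, var0) of the likelihood, and estimates the ELBO by Monte Carlo with reparameterized draws. All function names, the scalar setting, and the fixed random seed are illustrative assumptions; the sketch evaluates the ELBO only and does not compute gradients by automatic differentiation.

```python
import math
import random


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_likelihood(delta_f, du_state1, du_state2):
    """Two-state retrospective log-likelihood as a function of dF = F2 - F1."""
    shift = math.log(len(du_state2) / len(du_state1))
    ll = -sum(softplus(delta_f + shift - du) for du in du_state1)
    ll -= sum(softplus(du - delta_f - shift) for du in du_state2)
    return ll


def gaussian_product(mu0, var0, mu_p, var_p):
    """Mean and variance of the normalized product N(mu0, var0) * N(mu_p, var_p)."""
    var_q = 1.0 / (1.0 / var0 + 1.0 / var_p)
    mu_q = var_q * (mu0 / var0 + mu_p / var_p)
    return mu_q, var_q


def kl_gaussian(mu_q, var_q, mu_p, var_p):
    """D_KL(N(mu_q, var_q) || N(mu_p, var_p)) in one dimension."""
    return 0.5 * (math.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)


def elbo(mu0, var0, mu_p, var_p, du_state1, du_state2, num_draws=1000, seed=0):
    """Monte Carlo ELBO estimate using the reparameterization F = mu_q + sd_q * eps."""
    rng = random.Random(seed)  # fixed seed: the estimate is a deterministic function of the prior
    mu_q, var_q = gaussian_product(mu0, var0, mu_p, var_p)
    sd_q = math.sqrt(var_q)
    expected_ll = sum(
        log_likelihood(mu_q + sd_q * rng.gauss(0.0, 1.0), du_state1, du_state2)
        for _ in range(num_draws)
    ) / num_draws
    return expected_ll - kl_gaussian(mu_q, var_q, mu_p, var_p)
```

Comparing the returned value across different prior parameters (mu_p, var_p) mimics, in a crude way, the hyperparameter optimization described above; a gradient-based optimizer would replace this comparison in practice.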

3. Results

3.1. Computing the Free Energy Difference between Two Harmonic Oscillators

We first tested the performance of BayesMBAR by computing the free energy difference between two harmonic oscillators. In this case, because there are only two states, BayesMBAR reduces to a Bayesian generalization of the BAR method, and we use BayesBAR to refer to it. The two harmonic oscillators are defined by quadratic potential energy functions u₁(x) and u₂(x) with force constants k₁ and k₂, respectively, where u₁ and u₂ are in units of k_BT. The objective is to compute the free energy difference between them, i.e., ΔF = F₂ – F₁, where $F_i = -\log \int \exp(-u_i(x))\, dx$ for i = 1 and 2.

We first drew n₁ and n₂ samples from the Boltzmann distributions of u₁ and u₂, respectively. Then we used BayesBAR with the uniform prior distribution to estimate the free energy difference. To benchmark BayesBAR, we also computed the free energy difference using the BAR method and compared the results from both methods with the true value (Table 1). The force constants are set to be k₁ = 25 and k₂ = 36. The numbers of samples, n₁ and n₂, are set equal and range from 10 to 5000. For each sample size, we repeated the calculation K = 100 times and computed the root mean square error (RMSE), bias, and standard deviation (SD) of the estimates. The RMSE is computed as $\sqrt{\frac{1}{K} \sum_{k=1}^{K} (\Delta \hat{F}_k - \Delta F)^2}$, where $\Delta \hat{F}_k$ is the estimate from the kth repeat and ΔF is the true value. The bias is computed as $\overline{\Delta \hat{F}} - \Delta F$, where $\overline{\Delta \hat{F}} = \frac{1}{K} \sum_{k=1}^{K} \Delta \hat{F}_k$, and the SD is computed as $\sqrt{\frac{1}{K} \sum_{k=1}^{K} (\Delta \hat{F}_k - \overline{\Delta \hat{F}})^2}$.
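The three error metrics can be computed from the K repeated estimates in a few lines. The sketch below uses the 1/K normalization for the SD, which is an assumption chosen so that RMSE² = bias² + SD² holds exactly; the function name `error_metrics` is illustrative.

```python
import math


def error_metrics(estimates, true_value):
    """Return (RMSE, bias, SD) of repeated estimates of a known quantity."""
    k = len(estimates)
    mean = sum(estimates) / k
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / k)
    bias = mean - true_value
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / k)
    return rmse, bias, sd


# Toy example with three repeats of an estimate whose true value is 1.5.
rmse, bias, sd = error_metrics([1.0, 2.0, 3.0], true_value=1.5)
```

With this normalization the squared RMSE decomposes into the squared bias plus the squared SD, which is the decomposition used when comparing the MAP and posterior mean estimates.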

Table 1. Free Energy Difference between the Two Harmonic Oscillators (k₁ = 25, k₂ = 36).

	RMSE		bias		SD		estimate of SD
n₁(=n₂)	MAP^a	mean^b	MAP	mean	MAP	mean	BayesBAR	asymptotic	Bennett’s	bootstrap
10	2.45	2.40	0.85	0.90	2.29	2.23	4.08	39.24	1.10	1.53
13	2.53	2.47	0.87	0.92	2.38	2.29	3.55	19.69	1.11	1.47
18	1.92	1.84	0.16	0.22	1.91	1.83	3.09	11.58	1.09	1.31
28	1.72	1.65	0.44	0.48	1.66	1.58	2.58	7.06	1.07	1.14
48	1.27	1.19	0.16	0.19	1.26	1.17	1.90	2.92	1.07	1.11
99	1.34	1.23	–0.19	–0.11	1.33	1.22	1.38	1.64	0.98	0.96
304	0.83	0.79	0.00	0.03	0.83	0.79	0.80	0.81	0.74	0.70
5000	0.19	0.18	–0.00	–0.00	0.19	0.18	0.20	0.20	0.20	0.20

^a MAP estimate of BayesBAR (equivalent to the BAR estimate).

^b Posterior mean estimate of BayesBAR.

Because the uniform prior distribution is used, the MAP estimate of BayesBAR is identical to the BAR estimate. Besides the MAP estimate, BayesBAR also provides the posterior mean estimate, which is computed using numerical integration. Compared to the MAP estimate (the BAR estimate), the posterior mean estimate has a smaller RMSE. Decomposing the RMSE into bias and SD, we found that the posterior mean estimate has a larger bias but a smaller SD than the MAP estimate. The decrease in SD more than compensates for the increase in bias for the posterior mean estimate, which leads to its smaller RMSE. Statistical testing shows that the differences in RMSE and bias between the MAP estimate and the posterior mean estimate are statistically significant (p-value < 0.05), while the difference in SD is not (Figures S1 and S2). Although the MAP estimate and the posterior mean estimate have different RMSEs, the difference is small, and both estimates converge to the true value as the sample size increases. This suggests that both estimates can be used interchangeably in practice.

Besides the MAP and the posterior mean estimate for ΔF, BayesBAR offers an estimate of the uncertainty (whose true values are included in the two columns beneath the label SD in Table 1) by using the posterior SD (the BayesBAR column in Table 1). For benchmarking, we also calculated the uncertainty estimate using asymptotic analysis, Bennett’s method, and the bootstrap method. Because each repeat produces an uncertainty estimate, we used the average from all K repeats as the uncertainty estimate of each method, denoted as “estimate of SD” in Table 1.

When the number of configurations is small, the asymptotic analysis significantly overestimates the uncertainty, while both Bennett’s method and the bootstrap method tend to underestimate the uncertainty. Practically, overestimating uncertainty is favored over underestimating, as the former prompts further configuration collection, whereas the latter might cause the user to stop sampling prematurely. Nevertheless, excessive overestimation is not ideal either, as it might result in gathering an unnecessarily large number of configurations. Given these considerations, BayesBAR’s uncertainty estimate overestimates the uncertainty modestly and thus is a better choice than the other methods. As the sample size increases, the uncertainty estimates from all methods converge to the true value.

The asymptotic analysis tends to overestimate uncertainty much more than BayesBAR. This is because the asymptotic analysis approximates the posterior distribution of ΔF with a Gaussian distribution centered at the MAP estimate. Such an approximation is generally accurate for a large number of configurations. However, with a smaller number of configurations, this approximation becomes imprecise, leading to considerable overestimation of uncertainty. Figure 1 provides a visual comparison, contrasting the posterior distribution of ΔF as determined by BayesBAR with the Gaussian approximation from the asymptotic analysis for an experiment where n₁ = n₂ = 18.

Figure 1. Probability densities of the posterior distribution (solid line) of ΔF and the approximate Gaussian distribution (dashed line) used by the asymptotic analysis for the two harmonic oscillator system with n₁ = n₂ = 18.

3.2. Computing Free Energy Differences among Three Harmonic Oscillators

We next tested the performance of BayesMBAR on a multistate system. The system consists of three harmonic oscillators with unitless quadratic potential energy functions u₁(x), u₂(x), and u₃(x), whose force constants are k₁ = 16, k₂ = 25, and k₃ = 36. The free energy differences among the three harmonic oscillators are analytically known. As for the two harmonic oscillator system, we first drew n samples from the Boltzmann distribution of each harmonic oscillator. We used BayesMBAR with the uniform prior to estimate the free energy differences by computing both the MAP estimate and the posterior mean estimate. The posterior mean estimate is computed by sampling from the posterior distribution using the NUTS sampler instead of numerical integration. Figure 2 shows the posterior distribution of the free energy differences (F₂ – F₁ and F₃ – F₁) and a subset of samples drawn from the posterior distribution in one repeat of the calculation when n = 18. As shown in Figure 2b,c, samples from the NUTS sampler decorrelate quickly and can efficiently traverse the posterior distribution.

Figure 2. Probability density and samples of the posterior distribution of F₂ – F₁ and F₃ – F₁ for the three harmonic oscillators with n = 18. (a) Contours are the logarithm of the posterior distribution density. Dots are a subset of samples drawn from the posterior distribution using the NUTS sampler. (b,c) First 300 samples of F₂ – F₁ and F₃ – F₁ drawn from the posterior distribution using the NUTS sampler.

For benchmarking purposes, we conducted the calculation 100 times (K = 100) for each sample size n, and we derived metrics including the RMSE, bias, and SD of the estimate (Table 2). Given the use of a uniform prior, BayesMBAR’s MAP estimate is the same as the MBAR estimate. When contrasted with the MBAR estimate, the posterior mean estimate has lower SD but higher bias. When factoring in both SD and bias, the posterior mean estimate has a smaller RMSE compared to the MBAR estimate. The differences in RMSE and bias are statistically significant when n ≤ 99 and the difference in SD is not for any sample size (Figures S3–S6). As in the case of two harmonic oscillators, the difference in RMSE between the two estimators is minimal, and both estimates converge to the correct value as sample size grows. In terms of uncertainty, BayesMBAR offers a superior estimate compared to established techniques such as asymptotic analysis or the bootstrap method, especially with limited configuration sizes. Notably, BayesMBAR’s uncertainty estimate avoids the underestimation seen with the bootstrap method. Simultaneously, compared to the asymptotic analysis, BayesMBAR’s uncertainty estimate has a more modest overestimation.

Table 2. Free Energy Differences among the Three Harmonic Oscillators (k₁ = 16, k₂ = 25, and k₃ = 36).

F₂ – F₁
n	RMSE (MAP)	RMSE (mean)	bias (MAP)	bias (mean)	SD (MAP)	SD (mean)	estimated SD (BayesMBAR)	estimated SD (asymptotic)	estimated SD (bootstrap)
10	1.89	1.83	0.45	0.53	1.84	1.75	2.28	5.31	1.26
13	1.84	1.76	0.46	0.52	1.78	1.68	1.93	3.20	1.19
18	1.41	1.30	–0.09	0.01	1.41	1.30	1.62	2.17	1.05
28	1.18	1.12	0.13	0.19	1.17	1.10	1.31	1.55	0.91
48	0.79	0.75	–0.04	0.01	0.79	0.75	0.97	1.00	0.83
99	0.67	0.64	–0.06	–0.04	0.66	0.64	0.69	0.70	0.63
304	0.39	0.39	–0.01	–0.00	0.39	0.39	0.40	0.40	0.40
5000	0.09	0.09	–0.00	0.00	0.09	0.09	0.10	0.10	0.10

F₃ – F₁
n	RMSE (MAP)	RMSE (mean)	bias (MAP)	bias (mean)	SD (MAP)	SD (mean)	estimated SD (BayesMBAR)	estimated SD (asymptotic)	estimated SD (bootstrap)
10	2.93	2.85	0.89	1.02	2.79	2.66	4.63	28.27	2.02
13	3.26	3.18	1.25	1.35	3.01	2.87	4.16	20.74	1.86
18	2.53	2.40	0.12	0.27	2.53	2.39	3.39	10.12	1.85
28	2.28	2.20	0.62	0.73	2.20	2.08	2.87	6.35	1.56
48	1.73	1.64	0.16	0.26	1.72	1.62	2.26	3.91	1.37
99	1.53	1.43	0.36	0.42	1.49	1.37	1.58	1.84	1.21
304	0.99	0.95	–0.00	0.03	0.99	0.95	0.89	0.91	0.81
5000	0.23	0.22	0.01	0.01	0.23	0.22	0.22	0.23	0.22


3.3. Computing the Hydration Free Energy of Phenol

We further tested the performance of BayesMBAR on a real system that involves several collective variables. Specifically, we use BayesMBAR to compute the hydration free energy of phenol with an alchemical approach. In this approach, we modify the nonbonded interactions between phenol and water using an alchemical variable λ = (λ_elec, λ_vdw), where λ_elec and λ_vdw are the alchemical variables for the electrostatic and the van der Waals interactions, respectively. The electrostatic interaction between atoms i and j, which carry charges q_i and q_j and are separated by a distance r_ij, is linearly scaled by 1 – λ_elec. The van der Waals interaction is modified by λ_vdw using a soft-core Lennard-Jones potential³² with Lennard-Jones parameters ε_ij and σ_ij for atoms i and j and with α = 0.5.

When (λ_elec, λ_vdw) = (0, 0), the nonbonded interactions between phenol and water are turned on and phenol is in the water phase. When (λ_elec, λ_vdw) = (1, 1), the nonbonded interactions are turned off and phenol is in the vacuum phase. The hydration free energy of phenol is equal to the free energy difference between the two states λ = (0, 0) and λ = (1, 1). To compute the free energy difference, we introduce seven intermediate states through which λ_elec and λ_vdw are gradually changed from (0, 0) to (1, 1). The values of λ_elec and λ_vdw for the intermediate and end states are included in Table 3.

Table 3. Values of λ_elec and λ_vdw Used in Computing the Hydration Free Energy of Phenol.

λ index	1	2	3	4	5	6	7	8	9
λ_elec	0	0.33	0.66	1	1	1	1	1	1
λ_vdw	0	0	0	0	0.2	0.4	0.6	0.8	1
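As an illustration of how the alchemical variables act on a single atom pair, the sketch below encodes the schedule of Table 3 together with one possible form of the two scaled interactions. The Coulomb prefactor (set to 1 here) and the exact soft-core expression, a commonly used Beutler-type form, are assumptions chosen to be consistent with the symbols defined above; the function names are hypothetical.

```python
# Alchemical schedule from Table 3 (state indices 1..9).
LAMBDA_ELEC = [0.0, 0.33, 0.66, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
LAMBDA_VDW = [0.0, 0.0, 0.0, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def scaled_coulomb(q_i, q_j, r_ij, lam_elec, prefactor=1.0):
    """Coulomb pair energy scaled linearly by (1 - lam_elec).

    Assumption: `prefactor` stands in for the unit-dependent Coulomb constant.
    """
    return (1.0 - lam_elec) * prefactor * q_i * q_j / r_ij


def softcore_lj(r_ij, eps_ij, sigma_ij, lam_vdw, alpha=0.5):
    """Soft-core Lennard-Jones pair energy (assumed Beutler-type form).

    At lam_vdw = 0 this reduces to the standard 12-6 potential; at
    lam_vdw = 1 it vanishes; for lam_vdw > 0 it stays finite as r_ij -> 0.
    """
    x = alpha * lam_vdw + (r_ij / sigma_ij) ** 6
    return (1.0 - lam_vdw) * 4.0 * eps_ij * (1.0 / x ** 2 - 1.0 / x)


# Pair energies along the schedule for one hypothetical atom pair.
energies = [
    scaled_coulomb(0.4, -0.8, 3.0, le) + softcore_lj(3.0, 0.2, 3.2, lv)
    for le, lv in zip(LAMBDA_ELEC, LAMBDA_VDW)
]
```

The schedule first removes the electrostatic interaction at full van der Waals coupling (states 1 to 4) and only then softens the van der Waals interaction (states 5 to 9), so that charges are never left unshielded on a softened core.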


We use the general AMBER force field³³ for phenol and the TIP3P model³⁴ for water. The particle mesh Ewald (PME) method³⁵ is used to compute the electrostatic interactions. Lennard-Jones interactions are truncated at 12 Å, with a switching function starting at 10 Å. We use OpenMM³⁶ to run NPT molecular dynamics simulations for all states at 300 K and 1 atm, using the middle-scheme³⁷ Langevin integrator with a friction coefficient of 1 ps^–1 and the Monte Carlo barostat³⁸ applied every 25 steps. Each simulation is run for 20 ns with a time step of 2 fs. Configurations are saved every 20 ps so that they are uncorrelated,³⁹ as verified in the Supporting Information (Figure S7). We use BayesMBAR to compute the free energy differences with n configurations from each state and repeat the calculation K = 100 times. Because the ground-truth hydration free energy is not known analytically, we use as the benchmark the MBAR estimate computed using all configurations sampled from all repeats, i.e., 100,000 configurations from each state.
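The bookkeeping behind these numbers can be checked with a few lines of arithmetic. The sketch assumes that each of the K = 100 repeats is a separate 20 ns simulation per state, which is what the quoted total of 100,000 configurations implies; the variable names are illustrative.

```python
# All times in femtoseconds to keep the arithmetic in integers.
sim_length_fs = 20 * 1_000_000   # 20 ns
time_step_fs = 2                 # 2 fs
save_interval_fs = 20 * 1_000    # 20 ps
n_repeats = 100                  # K = 100

steps_per_run = sim_length_fs // time_step_fs        # MD steps per simulation
configs_per_run = sim_length_fs // save_interval_fs  # saved configurations per state per repeat
benchmark_configs = configs_per_run * n_repeats      # configurations per state in the benchmark
```

With these settings, each run yields 1000 saved configurations per state, which is why the largest per-state sample size used in the repeats is n = 1000 and the pooled benchmark uses 100,000 configurations per state.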

3.3.1. Uniform Prior

We first tested the performance of BayesMBAR with the uniform prior using different numbers of configurations. Here, the n configurations from each state are not randomly sampled from the configurations saved during the 20 ns of simulation. Instead, we use the first n configurations to mimic the situation in production calculations, where configurations are saved sequentially. The results are summarized in Table 4. Compared to the MAP estimate (the MBAR estimate), the posterior mean estimate has a smaller SD but a larger bias, as observed in the harmonic oscillator systems. In terms of RMSE, the posterior mean estimate has a slightly larger RMSE than the MAP estimate when n ≤ 12, unlike in the harmonic oscillator systems, and the two are essentially equal for larger n. However, the differences in RMSE and bias between the MAP estimate and the posterior mean are not statistically significant except when n = 5000 (Figure S8), and the difference in SD is not statistically significant for any sample size (Figure S8). This further suggests that both estimates can be used interchangeably in practice. We also compared the uncertainty estimates from BayesMBAR, the asymptotic analysis, and the bootstrap method. The BayesMBAR estimate of the uncertainty is closer to the true value than that of the asymptotic analysis, and it does not underestimate the uncertainty as the bootstrap method does when the number of configurations is small. In addition to the free energy difference between the two end states, we also compared the uncertainty estimates for the free energies of all states (Figure 3). When the number of configurations is small (n = 5), the uncertainty estimates from BayesMBAR are closer to the true uncertainty than those from the asymptotic analysis and do not underestimate the uncertainty as the bootstrap method does.

Table 4. Hydration Free Energy (in the Unit of k_BT) of Phenol Computed Using BayesMBAR with the Uniform Prior.

n	RMSE (MAP)	RMSE (mean)	bias (MAP)	bias (mean)	SD (MAP)	SD (mean)	estimated SD (BayesMBAR)	estimated SD (asymptotic)	estimated SD (bootstrap)
5	2.75	2.81	–0.73	–1.02	2.65	2.62	2.89	4.48	2.33
7	2.53	2.56	–0.60	–0.86	2.45	2.42	2.48	3.36	1.99
12	1.94	1.96	–0.23	–0.42	1.92	1.91	1.88	2.15	1.59
25	1.34	1.33	–0.03	–0.15	1.34	1.33	1.26	1.28	1.16
75	0.69	0.69	0.01	–0.04	0.69	0.69	0.73	0.73	0.71
1000	0.19	0.19	–0.00	–0.01	0.19	0.19	0.20	0.20	0.20
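For readers unfamiliar with the bootstrap baseline in the last column, the following generic sketch shows how a bootstrap SD is obtained: the data are resampled with replacement, the estimator is recomputed on each resample, and the SD of the replicates is reported. The function name, the number of resamples, and the use of a simple sample mean as the estimator are illustrative assumptions; in a multistate calculation, the configurations of each state would be resampled separately and the free energy estimator rerun on every resample.

```python
import math
import random


def bootstrap_sd(samples, estimator, n_boot=200, seed=0):
    """Bootstrap SD of `estimator`, using n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(samples)
    replicates = []
    for _ in range(n_boot):
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    mean = sum(replicates) / n_boot
    return math.sqrt(sum((r - mean) ** 2 for r in replicates) / (n_boot - 1))


def sample_mean(values):
    """Stand-in estimator for illustration; not a free energy estimator."""
    return sum(values) / len(values)


# Hypothetical data with only a handful of points, as in the small-n rows.
data = [0.0, 1.0, 2.0, 3.0, 10.0]
sd_estimate = bootstrap_sd(data, sample_mean)
```

Because every resample can contain only values that were already observed, a bootstrap estimate built from very few configurations cannot reflect regions of configuration space that were never sampled.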


Figure 3. Free energy estimates of all states for computing the hydration free energy of phenol. SD_F (BayesMBAR), SD_F (asymptotic), and SD_F (bootstrap) are the average of the uncertainty estimates using BayesMBAR, the asymptotic analysis, and the bootstrap method, respectively, when n = 5. SD_F (true) is the true uncertainty when n = 5. F_ref is the MBAR estimate computed using all configurations sampled from all repeats.

3.3.2. Normal Prior

The free energy surface along the alchemical variable λ is expected to be smooth; therefore, we can use a normal prior distribution in BayesMBAR to encode this prior knowledge. The covariance of the normal prior is given by the squared exponential covariance function, which is parameterized by a variance σ² and by length scales l_elec and l_vdw for λ_elec and λ_vdw, respectively.
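A minimal sketch of such a covariance function, evaluated on the nine alchemical states of Table 3, is given below. It assumes the standard squared exponential form with one length scale per alchemical variable, k(λ, λ′) = σ² exp(–(λ_elec – λ′_elec)²/(2 l_elec²) – (λ_vdw – λ′_vdw)²/(2 l_vdw²)); the hyperparameter values are placeholders chosen for illustration, not optimized values.

```python
import math

# Alchemical states (lambda_elec, lambda_vdw) from Table 3.
STATES = list(zip(
    [0.0, 0.33, 0.66, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
))


def sq_exp_cov(lam_a, lam_b, variance, l_elec, l_vdw):
    """Squared exponential covariance with separate length scales (assumed form)."""
    d_elec = (lam_a[0] - lam_b[0]) / l_elec
    d_vdw = (lam_a[1] - lam_b[1]) / l_vdw
    return variance * math.exp(-0.5 * (d_elec ** 2 + d_vdw ** 2))


# Placeholder hyperparameters (hypothetical values).
variance, l_elec, l_vdw = 4.0, 0.5, 0.5
cov = [[sq_exp_cov(a, b, variance, l_elec, l_vdw) for b in STATES] for a in STATES]
```

Neighboring states along the schedule receive a large prior covariance and distant states a small one, which is how the prior expresses smoothness of the free energy surface; shorter length scales allow the surface to vary more rapidly.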

The hyperparameters in the covariance function and the mean parameter of the prior distribution are optimized by maximizing the Bayesian evidence. After optimizing the hyperparameters, we use the MAP and the posterior mean estimators to estimate the free energy difference between the two end states and compare them to the MAP estimator with the uniform prior distribution, which is identical to the MBAR estimator (Table 5).

Table 5. Comparison of the Performance of BayesMBAR with the Uniform Prior (MAP Estimate) and the Normal Prior (MAP and Posterior Mean Estimates) for Computing the Hydration Free Energy of Phenol.

n	RMSE (uniform, MAP)	RMSE (normal, MAP)	RMSE (normal, mean)	bias (uniform, MAP)	bias (normal, MAP)	bias (normal, mean)	SD (uniform, MAP)	SD (normal, MAP)	SD (normal, mean)
5	2.75	2.18	2.15	–0.73	0.69	0.57	2.65	2.07	2.07
7	2.53	1.97	1.96	–0.60	0.66	0.55	2.45	1.86	1.88
12	1.94	1.60	1.61	–0.23	0.56	0.49	1.92	1.50	1.53
25	1.34	1.22	1.22	–0.03	0.43	0.38	1.34	1.15	1.16
75	0.69	0.75	0.75	0.01	0.11	0.09	0.69	0.74	0.74
1000	0.19	0.20	0.20	–0.00	–0.02	–0.02	0.19	0.19	0.19


By incorporating the prior knowledge of the smoothness of the free energy surface, the BayesMBAR estimator with a normal prior distribution has a smaller RMSE than the MBAR estimator when the number of configurations is small (n ≤ 25 in Table 5). As the number of configurations increases, the BayesMBAR estimate converges to the MBAR estimate. When the number of configurations is small, the data carry limited information about the free energy, and the prior knowledge of the free energy surface excludes unlikely results and helps improve the estimate. When the number of configurations is large, the inference is dominated by the data, and the prior knowledge becomes less important because the prior used here is relatively weak. This behavior is desirable: prior knowledge should be used when data alone are not sufficient to make a good inference, and it should not bias the inference when data are sufficient.
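The way a prior fades as data accumulate can be illustrated with a textbook conjugate model, which is a deliberately simplified stand-in and not the BayesMBAR likelihood: a normal prior on an unknown mean combined with n noisy observations of known noise SD. All names and numbers below are hypothetical.

```python
def posterior_mean(prior_mean, prior_sd, data_mean, data_sd, n):
    """Posterior mean of a normal mean with a normal prior and known noise SD.

    The result is a precision-weighted average of the prior mean and the
    sample mean; the weight of the data grows linearly with n.
    """
    w_prior = 1.0 / prior_sd ** 2
    w_data = n / data_sd ** 2
    return (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)


# Prior centered at 0, data centered at 1 (toy numbers).
few = posterior_mean(0.0, 1.0, 1.0, 1.0, n=1)      # pulled halfway toward the prior
many = posterior_mean(0.0, 1.0, 1.0, 1.0, n=1000)  # essentially the sample mean
```

In this toy model, the prior contributes a fixed amount of precision while the data contribute precision proportional to n, which mirrors the qualitative trend in Table 5: a visible effect of the prior at small n and near agreement with the data-only estimate at large n.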

4. Conclusion and Discussion

In this study, we developed BayesMBAR, a Bayesian generalization of the MBAR method based on the reverse logistic regression formulation of MBAR. BayesMBAR provides a posterior distribution of free energies, which is used to estimate free energies and to compute the estimation uncertainty. When uniform distributions are used as the prior, the MAP estimate of BayesMBAR recovers the MBAR estimate. Besides the MAP estimate, BayesMBAR provides the posterior mean estimate of the free energy. Compared to the MAP estimate, the posterior mean estimate tends to have a larger bias but a smaller SD. A possible reason is that the posterior mean estimate takes into account the whole spread of the posterior distribution, which makes it more stable over repeated calculations and, at the same time, more susceptible to extreme values. The difference in accuracy between the MAP estimate and the posterior mean estimate is small, and both estimates converge to the true value as the number of configurations increases. Therefore, both estimates can be used interchangeably in practice. In BayesMBAR, the estimation uncertainty is computed using the posterior SD. All benchmark systems in this study show that this uncertainty estimate from BayesMBAR is better than that from the asymptotic analysis and Bennett’s method, especially when the number of configurations is small.

As a Bayesian method, BayesMBAR is able to incorporate prior knowledge about the free energy into the estimation. We demonstrated this feature by using a normal prior distribution to encode prior knowledge of the smoothness of free energy surfaces. All hyperparameters in the prior distribution are automatically optimized by maximizing the Bayesian evidence. By using such prior knowledge, BayesMBAR provides a more accurate estimate than the MBAR method when the number of configurations is small and converges to the MBAR estimate when the number of configurations is large.

To facilitate the adoption of BayesMBAR, we provide an open-source Python package at https://github.com/DingGroup/BayesMBAR. In cases where prior knowledge is not available, we recommend using BayesMBAR with a uniform distribution as the prior distribution. It takes the same input as MBAR and can thus be easily integrated into existing workflows, with the benefit of providing better uncertainty estimates. For computing free energy differences among points on a smooth free energy surface, we recommend using BayesMBAR with normal distributions as the prior. Because the hyperparameters in the normal prior distribution are automatically optimized, the extra input to BayesMBAR from the user, compared to MBAR, is the value of the collective variables associated with each thermodynamic state. In terms of covariance functions, although we used the squared exponential covariance function in this study, other covariance functions such as the Matérn covariance function and the rational quadratic covariance function²⁴ could also be used. The choice of covariance functions could also be informed by comparing the Bayesian evidence after optimizing their hyperparameters.

Because BayesMBAR needs to sample from the posterior distribution, it is computationally more expensive than MBAR, which uses the asymptotic analysis to compute the estimation uncertainty. Table S1 shows the running time required by MBAR and BayesMBAR for computing the hydration free energy of phenol when n = 1000 configurations are used from each alchemical state. Both MBAR and BayesMBAR were run on a graphics processing unit (GPU), and the FastMBAR⁴⁰ implementation was used for MBAR calculations. Considering that most of the computational cost of calculating free energies lies in sampling configurations from the equilibrium distribution, and that BayesMBAR provides better uncertainty estimates and, when prior knowledge is available, more accurate free energy estimates, we believe that the extra computational cost is worthwhile in practice.

BayesMBAR could also be extended to incorporate other types of prior knowledge about free energy, such as knowledge from other calculations or experimental data. For example, when computing the relative binding free energy of two ligands, A and B, with a protein using alchemical free energy methods, results from cheaper calculations such as docking or molecular mechanics/generalized Born surface area (MM/GBSA) calculations could be used as prior knowledge. Specifically, the relative binding free energy is often calculated as ΔΔG = ΔG_bound – ΔG_unbound, where ΔG_bound and ΔG_unbound are the free energy differences of changing ligand A to B alchemically in the bound and unbound states, respectively. Computing ΔG_unbound is often much cheaper than computing ΔG_bound because the unbound state is in solvent and thus does not require simulations with the protein. Therefore, ΔG_unbound can be efficiently computed to a high precision. On the other hand, docking or MM/GBSA calculations could provide rough estimates of the absolute binding free energies of ligands A and B, whose difference provides an estimate μ of ΔΔG and an associated uncertainty σ, i.e., ΔΔG has a normal distribution with mean μ and SD σ. Combining ΔG_unbound with the distribution of ΔΔG from docking or MM/GBSA calculations, we could construct a normal distribution on ΔG_bound and use it as the prior distribution when computing ΔG_bound with BayesMBAR. The ΔG_bound estimated using BayesMBAR could then be combined with ΔG_unbound to compute ΔΔG. Rough estimates of ΔΔG could also come from experimental data such as qualitative competitive binding assays that only indicate whether ligand A binds better than ligand B. In this case, the prior distribution of ΔΔG could be a bounded uniform distribution with a lower or upper bound of 0. Similarly, combining such prior knowledge on ΔΔG with ΔG_unbound, we could construct a bounded uniform distribution on ΔG_bound and use it as the prior distribution for computing ΔG_bound with BayesMBAR. We believe that such extensions of BayesMBAR could be useful in practice and will be explored in future studies.
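The construction of a prior from cheaper calculations can be sketched as follows. The sketch writes the relative binding free energy as the bound-state free energy difference minus the unbound-state one (symbol and function names are chosen here for illustration) and assumes that the rough estimate of the relative binding free energy and the computed unbound-state value are independent; all numerical values are hypothetical.

```python
import math


def normal_prior_on_bound(mu_ddg, sigma_ddg, dg_unbound, sd_unbound=0.0):
    """Mean and SD of a normal prior on dG_bound = ddG + dG_unbound.

    Assumes ddG ~ Normal(mu_ddg, sigma_ddg) is independent of the
    unbound-state estimate, so means add and variances add.
    """
    mean = mu_ddg + dg_unbound
    sd = math.sqrt(sigma_ddg ** 2 + sd_unbound ** 2)
    return mean, sd


def uniform_bounds_on_bound(dg_unbound, ddg_lower=None, ddg_upper=None):
    """Shift one-sided bounds on ddG into bounds on dG_bound."""
    lower = -math.inf if ddg_lower is None else ddg_lower + dg_unbound
    upper = math.inf if ddg_upper is None else ddg_upper + dg_unbound
    return lower, upper


# Hypothetical inputs in units of k_BT.
prior_mean, prior_sd = normal_prior_on_bound(-1.5, 2.0, dg_unbound=3.2, sd_unbound=0.1)
# A qualitative assay implying ddG <= 0 (sign convention assumed) bounds dG_bound from above.
bounds = uniform_bounds_on_bound(3.2, ddg_upper=0.0)
```

When the unbound-state free energy difference is known to high precision, the width of the resulting prior is set almost entirely by the uncertainty of the cheaper calculation.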

Acknowledgments

The author thanks the Tufts University High Performance Compute Cluster that was utilized for the research reported in this paper.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.3c01212.

Statistical tests comparing the MAP estimator and the posterior mean estimator of BayesMBAR using the uniform prior distribution for computing free energy differences, saved configurations being uncorrelated in computing the hydration free energy of phenol, the standard error of estimating the ensemble mean potential energy using saved configurations sampled from alchemical states of phenol, and running time of MBAR and BayesMBAR (PDF)

The author declares no competing financial interest.

Special Issue

Published as part of the Journal of Chemical Theory and Computation virtual special issue “Machine Learning and Statistical Mechanics: Shared Synergies for Next Generation of Chemical Theory and Computation”.

References

(1) Chodera J. D.; Mobley D. L.; Shirts M. R.; Dixon R. W.; Branson K.; Pande V. S. Alchemical Free Energy Methods for Drug Discovery: Progress and Challenges. Curr. Opin. Struct. Biol. 2011, 21, 150–160. DOI: 10.1016/j.sbi.2011.01.011.
(2) Mobley D. L.; Bayly C. I.; Cooper M. D.; Shirts M. R.; Dill K. A. Small Molecule Hydration Free Energies in Explicit Solvent: An Extensive Test of Fixed-Charge Atomistic Simulations. J. Chem. Theory Comput. 2009, 5, 350–358. DOI: 10.1021/ct800409d.
(3) Panagiotopoulos A. Z. Monte Carlo Methods for Phase Equilibria of Fluids. J. Phys.: Condens. Matter 2000, 12, R25. DOI: 10.1088/0953-8984/12/3/201.
(4) Dybeck E. C.; Abraham N. S.; Schieber N. P.; Shirts M. R. Capturing Entropic Contributions to Temperature-Mediated Polymorphic Transformations Through Molecular Modeling. Cryst. Growth Des. 2017, 17, 1775–1787. DOI: 10.1021/acs.cgd.6b01762.
(5) Schieber N. P.; Dybeck E. C.; Shirts M. R. Using Reweighting and Free Energy Surface Interpolation to Predict Solid-Solid Phase Diagrams. J. Chem. Phys. 2018, 148, 144104. DOI: 10.1063/1.5013273.
(6) Chipot C.; Pohorille A. Free Energy Calculations; Springer, 2007; Vol. 86.
(7) Shirts M. R.; Chodera J. D. Statistically Optimal Analysis of Samples from Multiple Equilibrium States. J. Chem. Phys. 2008, 129, 124105. DOI: 10.1063/1.2978177.
(8) Tan Z.; Gallicchio E.; Lapelosa M.; Levy R. M. Theory of Binless Multi-State Free Energy Estimation with Applications to Protein-Ligand Binding. J. Chem. Phys. 2012, 136, 144102. DOI: 10.1063/1.3701175.
(9) Kong A.; McCullagh P.; Meng X.-L.; Nicolae D.; Tan Z. A Theory of Statistical Models for Monte Carlo Integration. J. R. Stat. Soc. Ser. B Methodol. 2003, 65, 585–604. DOI: 10.1111/1467-9868.00404.
(10) Geyer C. J. Estimating Normalizing Constants and Reweighting Mixtures; 1994.
(11) Berger J. O. Statistical Decision Theory and Bayesian Analysis; Springer Science & Business Media, 2013.
(12) Stecher T.; Bernstein N.; Csányi G. Free Energy Surface Reconstruction from Umbrella Samples Using Gaussian Process Regression. J. Chem. Theory Comput. 2014, 10, 4079–4097. DOI: 10.1021/ct500438v.
(13) Shirts M. R.; Ferguson A. L. Statistically Optimal Continuous Free Energy Surfaces from Biased Simulations and Multistate Reweighting. J. Chem. Theory Comput. 2020, 16, 4107–4125. DOI: 10.1021/acs.jctc.0c00077.
(14) Habeck M. Bayesian Reconstruction of the Density of States. Phys. Rev. Lett. 2007, 98, 200601. DOI: 10.1103/PhysRevLett.98.200601.
(15) Habeck M. Bayesian Estimation of Free Energies from Equilibrium Simulations. Phys. Rev. Lett. 2012, 109, 100601. DOI: 10.1103/PhysRevLett.109.100601.
(16) Ferguson A. L. BayesWHAM: A Bayesian Approach for Free Energy Estimation, Reweighting, and Uncertainty Quantification in the Weighted Histogram Analysis Method. J. Comput. Chem. 2017, 38, 1583–1605. DOI: 10.1002/jcc.24800.
(17) Maragakis P.; Ritort F.; Bustamante C.; Karplus M.; Crooks G. E. Bayesian Estimates of Free Energies from Nonequilibrium Work Data in the Presence of Instrument Noise. J. Chem. Phys. 2008, 129, 024102. DOI: 10.1063/1.2937892.
(18) Bennett C. H. Efficient Estimation of Free Energy Differences from Monte Carlo Data. J. Comput. Phys. 1976, 22, 245–268. DOI: 10.1016/0021-9991(76)90078-4.
(19) Neal R. M. MCMC Using Hamiltonian Dynamics. Handbook of Markov Chain Monte Carlo; CRC Press, 2011; Vol. 2, p 2.
(20) Hoffman M. D.; Gelman A. The No-U-turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 2014, 15, 1593–1623.
(21) Skilling J. Nested Sampling for General Bayesian Computation. Bayesian Anal. 2006, 1, 833–859.
(22) Meng X.-L.; Schilling S. Warp Bridge Sampling. J. Comput. Graph Stat. 2002, 11, 552–586. DOI: 10.1198/106186002457.
(23) Meng X.-L.; Wong W. H. Simulating Ratios of Normalizing Constants via a Simple Identity: A Theoretical Exploration. Stat. Sin. 1996, 6, 831–860.
(24) Rasmussen C. E.; Williams C. K. I. Gaussian Processes for Machine Learning; The MIT Press, 2005.
(25) Liu D. C.; Nocedal J. On the Limited Memory BFGS Method for Large Scale Optimization. Math. Program. 1989, 45, 503–528. DOI: 10.1007/BF01589116.
(26) Carpenter B.; Gelman A.; Hoffman M. D.; Lee D.; Goodrich B.; Betancourt M.; Brubaker M.; Guo J.; Li P.; Riddell A. Stan: A Probabilistic Programming Language. J. Stat. Software 2017, 76, 1–32. DOI: 10.18637/jss.v076.i01.
(27) Xu K.; Ge H.; Tebbutt W.; Tarek M.; Trapp M.; Ghahramani Z. AdvancedHMC.jl: A Robust, Modular and Efficient Implementation of Advanced HMC Algorithms. Proceedings of the 2nd Symposium on Advances in Approximate Bayesian Inference; 2020; pp 1–10.
(28) Lao J.; Louf R. Blackjax: A Sampling Library for JAX; 2020.
(29) Jordan M. I.; Ghahramani Z.; Jaakkola T. S.; Saul L. K. In Learning in Graphical Models; Jordan M. I., Ed.; Springer Netherlands: Dordrecht, 1998; pp 105–161.
(30) Kingma D. P.; Welling M. Auto-Encoding Variational Bayes; 2022.
(31) Bradbury J.; Frostig R.; Hawkins P.; Johnson M. J.; Leary C.; Maclaurin D.; Necula G.; Paszke A.; VanderPlas J.; Wanderman-Milne S.; Zhang Q. JAX: Composable Transformations of Python+NumPy Programs; 2018.
(32) Beutler T. C.; Mark A. E.; van Schaik R. C.; Gerber P. R.; van Gunsteren W. F. Avoiding Singularities and Numerical Instabilities in Free Energy Calculations Based on Molecular Simulations. Chem. Phys. Lett. 1994, 222, 529–539. DOI: 10.1016/0009-2614(94)00397-1.
(33) Wang J.; Wolf R. M.; Caldwell J. W.; Kollman P. A.; Case D. A. Development and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–1174. DOI: 10.1002/jcc.20035.
(34) Jorgensen W. L.; Chandrasekhar J.; Madura J. D.; Impey R. W.; Klein M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926–935. DOI: 10.1063/1.445869.
(35) Darden T.; York D.; Pedersen L. Particle Mesh Ewald: An N·log(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98, 10089–10092. DOI: 10.1063/1.464397.
(36) Eastman P.; Swails J.; Chodera J. D.; McGibbon R. T.; Zhao Y.; Beauchamp K. A.; Wang L.-P.; Simmonett A. C.; Harrigan M. P.; Stern C. D.; Wiewiora R. P.; Brooks B. R.; Pande V. S. OpenMM 7: Rapid Development of High Performance Algorithms for Molecular Dynamics. PLoS Comput. Biol. 2017, 13, e1005659. DOI: 10.1371/journal.pcbi.1005659.
(37) Zhang Z.; Liu X.; Yan K.; Tuckerman M. E.; Liu J. Unified Efficient Thermostat Scheme for the Canonical Ensemble with Holonomic or Isokinetic Constraints via Molecular Dynamics. J. Phys. Chem. A 2019, 123, 6056–6079. DOI: 10.1021/acs.jpca.9b02771.
(38) Åqvist J.; Wennerström P.; Nervall M.; Bjelic S.; Brandsdal B. O. Molecular Dynamics Simulations of Water and Biomolecules with a Monte Carlo Constant Pressure Algorithm. Chem. Phys. Lett. 2004, 384, 288–294. DOI: 10.1016/j.cplett.2003.12.039.
(39) Chodera J. D. A Simple Method for Automated Equilibration Detection in Molecular Simulations. J. Chem. Theory Comput. 2016, 12, 1799–1805. DOI: 10.1021/acs.jctc.5b00784.
(40) Ding X.; Vilseck J. Z.; Brooks C. L., III. Fast Solver for Large Scale Multistate Bennett Acceptance Ratio Equations. J. Chem. Theory Comput. 2019, 15, 799–802. DOI: 10.1021/acs.jctc.8b01010.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

ct3c01212_si_001.pdf^{(2MB, pdf)}

[ref1] Chodera J. D.; Mobley D. L.; Shirts M. R.; Dixon R. W.; Branson K.; Pande V. S. Alchemical Free Energy Methods for Drug Discovery: Progress and Challenges. Curr. Opin. Struct. Biol. 2011, 21, 150–160. 10.1016/j.sbi.2011.01.011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref2] Mobley D. L.; Bayly C. I.; Cooper M. D.; Shirts M. R.; Dill K. A. Small Molecule Hydration Free Energies in Explicit Solvent: An Extensive Test of Fixed-Charge Atomistic Simulations. J. Chem. Theory Comput. 2009, 5, 350–358. 10.1021/ct800409d. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref3] Panagiotopoulos A. Z. Monte Carlo Methods for Phase Equilibria of Fluids. J. Phys.: Condens. Matter 2000, 12, R25. 10.1088/0953-8984/12/3/201. [DOI] [Google Scholar]

[ref4] Dybeck E. C.; Abraham N. S.; Schieber N. P.; Shirts M. R. Capturing Entropic Contributions to Temperature-Mediated Polymorphic Transformations Through Molecular Modeling. Cryst. Growth Des. 2017, 17, 1775–1787. 10.1021/acs.cgd.6b01762. [DOI] [Google Scholar]

[ref5] Schieber N. P.; Dybeck E. C.; Shirts M. R. Using Reweighting and Free Energy Surface Interpolation to Predict Solid-Solid Phase Diagrams. J. Chem. Phys. 2018, 148, 144104. 10.1063/1.5013273. [DOI] [PubMed] [Google Scholar]

[ref6] Chipot C.; Pohorille A.. Free Energy Calculations; Springer, 2007; Vol. 86. [Google Scholar]

[ref7] Shirts M. R.; Chodera J. D. Statistically Optimal Analysis of Samples from Multiple Equilibrium States. J. Chem. Phys. 2008, 129, 124105. 10.1063/1.2978177. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref8] Tan Z.; Gallicchio E.; Lapelosa M.; Levy R. M. Theory of Binless Multi-State Free Energy Estimation with Applications to Protein-Ligand Binding. J. Chem. Phys. 2012, 136, 144102. 10.1063/1.3701175. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref9] Kong A.; McCullagh P.; Meng X.-L.; Nicolae D.; Tan Z. A Theory of Statistical Models for Monte Carlo Integration. J. R. Stat. Soc. Ser. B Methodol. 2003, 65, 585–604. 10.1111/1467-9868.00404. [DOI] [Google Scholar]

[ref10] Geyer C. J.Estimating Normalizing Constants and Reweighting Mixtures; 1994.

[ref11] Berger J. O.Statistical Decision Theory and Bayesian Analysis; Springer Science & Business Media, 2013. [Google Scholar]

[ref12] Stecher T.; Bernstein N.; Csányi G. Free Energy Surface Reconstruction from Umbrella Samples Using Gaussian Process Regression. J. Chem. Theory Comput. 2014, 10, 4079–4097. 10.1021/ct500438v. [DOI] [PubMed] [Google Scholar]

[ref13] Shirts M. R.; Ferguson A. L. Statistically Optimal Continuous Free Energy Surfaces from Biased Simulations and Multistate Reweighting. J. Chem. Theory Comput. 2020, 16, 4107–4125. 10.1021/acs.jctc.0c00077. [DOI] [PubMed] [Google Scholar]

[ref14] Habeck M. Bayesian Reconstruction of the Density of States. Phys. Rev. Lett. 2007, 98, 200601. 10.1103/PhysRevLett.98.200601. [DOI] [PubMed] [Google Scholar]

[ref15] Habeck M. Bayesian Estimation of Free Energies from Equilibrium Simulations. Phys. Rev. Lett. 2012, 109, 100601. 10.1103/PhysRevLett.109.100601. [DOI] [PubMed] [Google Scholar]

[ref16] Ferguson A. L. BayesWHAM: A Bayesian Approach for Free Energy Estimation, Reweighting, and Uncertainty Quantification in the Weighted Histogram Analysis Method. J. Comput. Chem. 2017, 38, 1583–1605. 10.1002/jcc.24800. [DOI] [PubMed] [Google Scholar]

[ref17] Maragakis P.; Ritort F.; Bustamante C.; Karplus M.; Crooks G. E. Bayesian Estimates of Free Energies from Nonequilibrium Work Data in the Presence of Instrument Noise. J. Chem. Phys. 2008, 129, 024102. 10.1063/1.2937892. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref18] Bennett C. H. Efficient Estimation of Free Energy Differences from Monte Carlo Data. J. Comput. Phys. 1976, 22, 245–268. 10.1016/0021-9991(76)90078-4. [DOI] [Google Scholar]

[ref19] Neal R. M.MCMC Using Hamiltonian Dynamics. Handbook of Markov Chain Monte Carlo; CRC Press, 2011; Vol. 2, p 2. [Google Scholar]

[ref20] Hoffman M. D.; Gelman A. The No-U-turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 2014, 15, 1593–1623. [Google Scholar]

[ref21] John Skilling Nested Sampling for General Bayesian Computation. Bayesian Anal. 2006, 1, 833–859. [Google Scholar]

[ref22] Meng X.-L.; Schilling S. Warp Bridge Sampling. J. Comput. Graph Stat. 2002, 11, 552–586. 10.1198/106186002457. [DOI] [Google Scholar]

[ref23] Meng X.-L.; Wong W. H. Simulating Ratios of Normalizing Constants via a Simple Identity: A Theoretical Exploration. Stat. Sin. 1996, 6, 831–860. [Google Scholar]

[ref24] Rasmussen C. E.; Williams C. K. I.. Gaussian Processes for Machine Learning; The MIT Press, 2005. [Google Scholar]

[ref25] Liu D. C.; Nocedal J. On the Limited Memory BFGS Method for Large Scale Optimization. Math. Program. 1989, 45, 503–528. 10.1007/BF01589116. [DOI] [Google Scholar]

[ref26] Carpenter B.; Gelman A.; Hoffman M. D.; Lee D.; Goodrich B.; Betancourt M.; Brubaker M.; Guo J.; Li P.; Riddell A. Stan: A Probabilistic Programming Language. J. Stat. Software 2017, 76, 1–32. 10.18637/jss.v076.i01. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref27] Xu K.; Ge H.; Tebbutt W.; Tarek M.; Trapp M.; Ghahramani Z.. AdvancedHMC.Jl: A Robust, Modular and e Cient Implementation of Advanced HMC Algorithms. Proceedings of the 2nd Symposium on Advances in Approximate Bayesian Inference; 2020; pp 1–10.

[ref28] Lao J.; Louf R.. Blackjax: A Sampling Library for JAX; 2020.

[ref29] Jordan M. I.; Ghahramani Z.; Jaakkola T. S.; Saul L. K.. In Learning in Graphical Models; Jordan M. I., Ed.; Springer Netherlands: Dordrecht, 1998; pp 105–161. [Google Scholar]

[ref30] Kingma D. P.; Welling M.. Auto-Encoding Variational Bayes; 2022.

[ref31] Bradbury J.; Frostig R.; Hawkins P.; Johnson M. J.; Leary C.; Maclaurin D.; Necula G.; Paszke A.; VanderPlas J.; Wanderman-Milne S.; Zhang Q.. JAX: Composable Transformations of Python+NumPy Programs; 2018.

[ref32] Beutler T. C.; Mark A. E.; van Schaik R. C.; Gerber P. R.; van Gunsteren W. F. Avoiding Singularities and Numerical Instabilities in Free Energy Calculations Based on Molecular Simulations. Chem. Phys. Lett. 1994, 222, 529–539. 10.1016/0009-2614(94)00397-1. [DOI] [Google Scholar]

[ref33] Wang J.; Wolf R. M.; Caldwell J. W.; Kollman P. A.; Case D. A. Development and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–1174. 10.1002/jcc.20035. [DOI] [PubMed] [Google Scholar]

[ref34] Jorgensen W. L.; Chandrasekhar J.; Madura J. D.; Impey R. W.; Klein M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926–935. 10.1063/1.445869. [DOI] [Google Scholar]

[ref35] Darden T.; York D.; Pedersen L. Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993, 98, 10089–10092. 10.1063/1.464397. [DOI] [Google Scholar]

[ref36] Eastman P.; Swails J.; Chodera J. D.; McGibbon R. T.; Zhao Y.; Beauchamp K. A.; Wang L.-P.; Simmonett A. C.; Harrigan M. P.; Stern C. D.; Wiewiora R. P.; Brooks B. R.; Pande V. S. OpenMM 7: Rapid Development of High Performance Algorithms for Molecular Dynamics. PLoS Comput. Biol. 2017, 13, e1005659 10.1371/journal.pcbi.1005659. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref37] Zhang Z.; Liu X.; Yan K.; Tuckerman M. E.; Liu J. Unified Efficient Thermostat Scheme for the Canonical Ensemble with Holonomic or Isokinetic Constraints via Molecular Dynamics. J. Phys. Chem. A 2019, 123, 6056–6079. 10.1021/acs.jpca.9b02771. [DOI] [PubMed] [Google Scholar]

[ref38] Åqvist J.; Wennerström P.; Nervall M.; Bjelic S.; Brandsdal B. O. Molecular Dynamics Simulations of Water and Biomolecules with a Monte Carlo Constant Pressure Algorithm. Chem. Phys. Lett. 2004, 384, 288–294. 10.1016/j.cplett.2003.12.039. [DOI] [Google Scholar]

[ref39] Chodera J. D. A Simple Method for Automated Equilibration Detection in Molecular Simulations. J. Chem. Theory Comput. 2016, 12, 1799–1805. 10.1021/acs.jctc.5b00784.

[ref40] Ding X.; Vilseck J. Z.; Brooks C. L., III. Fast Solver for Large Scale Multistate Bennett Acceptance Ratio Equations. J. Chem. Theory Comput. 2019, 15, 799–802. 10.1021/acs.jctc.8b01010.

