Abstract
The multistate Bennett acceptance ratio (MBAR) method is a prevalent approach for computing the free energies of thermodynamic states. In this work, we introduce BayesMBAR, a Bayesian generalization of the MBAR method. By integration of configurations sampled from thermodynamic states with a prior distribution, BayesMBAR computes a posterior distribution of free energies. Using the posterior distribution, we derive free energy estimations and compute their associated uncertainties. Notably, when a uniform prior distribution is used, BayesMBAR recovers the MBAR’s result but provides more accurate uncertainty estimates. Additionally, when prior knowledge about free energies is available, BayesMBAR can incorporate this information into the estimation procedure by using nonuniform prior distributions. As an example, we show that by incorporating the prior knowledge about the smoothness of free energy surfaces, BayesMBAR provides more accurate estimates than the MBAR method. Given MBAR’s widespread use in free energy calculations, we anticipate BayesMBAR to be an essential tool in various applications of free energy calculations.
1. Introduction
Computing the free energies of thermodynamic states is a central problem in computational chemistry and physics. It has wide-ranging applications including computing protein–ligand binding affinities,1 predicting molecular solubilities,2 and estimating phase equilibria,3−5 among other tasks. For states whose free energies are not analytically tractable, their free energies are often estimated using numerical methods.6 These methods typically involve sampling configurations from states of interest and subsequently computing their free energies based on sampled configurations. In this work, we focus on the second step of estimating free energies, assuming that equilibrium configurations have been sampled using Monte Carlo sampling or molecular dynamics.
The multistate Bennett acceptance ratio (MBAR) method7 is a common technique for estimating free energies given sampled configurations. Equivalent formulations of the MBAR method were also developed in different contexts.8−10 For the purpose of this study, we refer to this method and its equivalent formulations as MBAR. The MBAR method not only offers an estimate of free energies but also provides the statistical uncertainty associated with the estimate. In the limit of a large number of configurations, the MBAR estimator is unbiased and has the smallest variance among estimators reliant on sampled configurations.7,9 However, the properties of the MBAR estimator and its associated uncertainty estimate remain largely unexplored when the number of configurations is small. Furthermore, in such scenarios, it becomes desirable to incorporate prior knowledge into the estimation procedure.
A systematic approach to integrating prior knowledge into an estimation procedure is Bayesian inference.11 Bayesian inference treats unknown quantities (free energies in this case) as random variables and incorporates prior knowledge into the estimation procedure by employing prior distributions and Bayes’ theorem. In terms of free energy estimation, prior knowledge could come from previous simulations, experiments, or physical knowledge of a system. A common instance of physical prior knowledge of free energies is that free energy surfaces along a collective coordinate are usually smooth. Combining prior knowledge with observed data (configurations sampled from thermodynamic states), Bayesian inference computes the posterior distribution of unknown quantities. The posterior distribution provides both estimates of the unknown quantities and uncertainties of the estimates.
Estimating free energies using Bayesian inference was investigated in multiple studies. For instance, Stecher et al.12 used a Gaussian process as the prior distribution over smooth free energy surfaces. The resulting posterior distribution, given configurations from umbrella sampling, was utilized to estimate the free energy surfaces and the associated uncertainty. Shirts and Ferguson13 parameterized free energy surfaces using splines and constructed prior distributions using a Gaussian prior on spline coefficients. Unlike these studies that primarily focused on estimating free energy surfaces from biased simulations, the works of Habeck,14,15 Ferguson,16 and Maragakis et al.17 were aimed at estimating densities of states and free energy differences using Bayesian inference. Methods developed in these studies are direct Bayesian generalizations of the weighted histogram analysis method (WHAM) and the Bennett acceptance ratio (BAR) method.18
This work focuses on improving the accuracy of estimating free energies of discrete thermodynamic states when the number of sampled configurations is small. For this purpose, we developed a Bayesian generalization of the MBAR method, which we term BayesMBAR. With several benchmark examples, we show that when the number of configurations is small, BayesMBAR provides not only superior uncertainty estimates compared to MBAR but also more accurate estimates of free energies by incorporating prior knowledge into the estimation procedure.
2. Methods
The MBAR method is commonly understood as a set of self-consistent equations that are not amenable to the development of its Bayesian generalization. To develop a Bayesian generalization of MBAR, we first emphasize the probabilistic nature of the MBAR method. Although there are multiple statistical models from which the MBAR method can be derived, we build upon the reverse logistic regression model,10 which treats free energies as parameters and provides a likelihood for inference. To convert the reverse logistic regression model into a Bayesian model, we treat free energies as random variables and place a prior distribution on them. Then the posterior distribution of free energies is computed using Bayes’ theorem. Samples from the posterior distribution are efficiently generated using Hamiltonian Monte Carlo (HMC) methods.19,20 These samples are used to estimate free energies and quantify the uncertainty of the estimate. Hyperparameters of the prior distribution are automatically optimized by maximizing the marginal likelihood of data (Bayesian evidence). We present the details of BayesMBAR in the following sections.
2.1. Reverse Logistic Regression Model of MBAR
Computing the free energies of thermodynamic states is closely related to computing the normalizing constants of Bayesian models. Multiple methods have been developed in statistics for estimating normalizing constants,10,21−23 and these methods are directly applicable for estimating free energies. Here, we focus on the reverse logistic regression method proposed by Geyer10 and show that the solution of this method is equivalent to the MBAR method.
Let us assume that we aim to calculate the free energies of m thermodynamic states (up to an additive constant) by sampling their configurations. Let ui(x), i = 1, ..., m be the reduced potential energy functions7 of the m states. The free energy of the ith state is defined as
$$F_i = -\log \int_{\Gamma} \exp[-u_i(x)]\,\mathrm{d}x \tag{1}$$
where Γ is the configuration space.
For the ith state, ni uncorrelated configurations, {xik, k = 1, ..., ni}, are sampled from its Boltzmann distribution pi(x; Fi) = exp(−[ui(x) – Fi]). Here, pi(x; Fi) denotes a distribution of the random variable x with the parameter Fi. We will henceforth use this notation of separating parameters from random variables with a semicolon in probability distributions. To estimate free energies, Geyer10 proposed the following retrospective formulation, which treats indices of states in an unconventional manner. Let yik denote the index of the state from which configuration xik is sampled.10 By construction, yik = i for all i and k. Although the state indices of sampled configurations are fixed by the sampling setup, they are treated as a multinomially distributed random variable with parameters π = (π1, ..., πm). Because ni configurations are sampled from state i, the maximum likelihood estimate of πi is $\hat{\pi}_i = n_i/n$, where $n = \sum_{i=1}^{m} n_i$. The concatenation of state indices and configurations, (y, x), is viewed as samples from the joint distribution p(y, x), which is defined as
$$p(y = i; \log \pi) = \pi_i \tag{2}$$
$$p(x \mid y = i; F_i) = p_i(x; F_i) = \exp(-[u_i(x) - F_i]) \tag{3}$$
for i ∈ {1, ..., m}. Following a retrospective argument, the reverse logistic regression method estimates the free energies by asking the following question. Given that a configuration x is observed, what is the probability that it is sampled from state y = i rather than from other states? Using Bayes’ theorem, we can compute this retrospective conditional probability as
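$$p(y = i \mid x; F, \log \pi) = \frac{\pi_i \exp(-[u_i(x) - F_i])}{\sum_{j=1}^{m} \pi_j \exp(-[u_j(x) - F_j])} = \frac{\exp(-[u_i(x) - F_i - \log \pi_i])}{\sum_{j=1}^{m} \exp(-[u_j(x) - F_j - \log \pi_j])} \tag{4}$$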
where F = (F1, ..., Fm) and log π = (log π1, ..., log πm). The free energies are estimated by maximizing the product of the retrospective conditional probabilities of all configurations, which is equivalent to maximizing the log-likelihood
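$$\ell(F, \log \pi) = \sum_{i=1}^{m} \sum_{k=1}^{n_i} \log p(y_{ik} = i \mid x_{ik}; F, \log \pi) = \sum_{i=1}^{m} \sum_{k=1}^{n_i} \left[ -(u_i(x_{ik}) - F_i - \log \pi_i) - \log \sum_{j=1}^{m} \exp(-[u_j(x_{ik}) - F_j - \log \pi_j]) \right] \tag{5}$$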
The log-likelihood function $\ell(F, \log \pi)$ in eq 5 depends on F and log π only through their sum ϕ = F + log π, so F and log π are not separately estimable from maximizing the log-likelihood. The solution is to substitute log πi with the empirical estimate $\log \hat{\pi}_i = \log(n_i/n)$. Then, by setting the derivative $\partial \ell / \partial F$ to zero, we obtain
$$\hat{F}_r = -\log \sum_{i=1}^{m} \sum_{k=1}^{n_i} \frac{\exp[-u_r(x_{ik})]}{\sum_{j=1}^{m} n_j \exp(-[u_j(x_{ik}) - \hat{F}_j])} \tag{6}$$
for r = 1, ..., m. $\hat{F} = (\hat{F}_1, \ldots, \hat{F}_m)$ is the solution that maximizes $\ell(F, \log \hat{\pi})$. Equation 6 is identical to the MBAR equation and reduces to the BAR equation18 when m = 2. The technique described above is termed “reverse logistic regression” for two reasons. First, the log-likelihood in eq 5 resembles that of multiclass logistic regression. Second, the primary goal of this method is to estimate F, the intercept term. This differs from traditional logistic regression, where the aim is to determine regression coefficients and predict the response variable y.
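The self-consistent structure of eq 6 can be illustrated with a minimal sketch in plain Python. It assumes that eq 6 has the standard MBAR form, in which each free energy is minus the log of a sum over all pooled configurations. The function name `solve_mbar`, the simple fixed-point iteration, and the toy system (two zero-centered harmonic oscillators with force constants 1 and 4, for which the exact difference is ½ log 4) are illustrative assumptions, not the authors' implementation or benchmark system.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def solve_mbar(u, n_samples, n_iter=1000, tol=1e-10):
    """Fixed-point iteration for the MBAR self-consistent equations.

    u[j][s] is the reduced energy of pooled configuration s in state j;
    n_samples[j] is the number of configurations drawn from state j.
    Returns free energies shifted so that F[0] = 0.
    """
    m, n_total = len(u), len(u[0])
    log_n = [math.log(n) for n in n_samples]
    F = [0.0] * m
    for _ in range(n_iter):
        # log of the mixture denominator for every pooled configuration
        log_den = [
            logsumexp([log_n[j] + F[j] - u[j][s] for j in range(m)])
            for s in range(n_total)
        ]
        F_new = [
            -logsumexp([-u[r][s] - log_den[s] for s in range(n_total)])
            for r in range(m)
        ]
        # free energies are defined up to an additive constant
        F_new = [f - F_new[0] for f in F_new]
        converged = max(abs(a - b) for a, b in zip(F, F_new)) < tol
        F = F_new
        if converged:
            break
    return F


# Hypothetical toy system: u_i(x) = 0.5 * k_i * x**2 (both centered at zero).
rng = random.Random(0)
k = [1.0, 4.0]
n_samples = [1000, 1000]
xs = [
    rng.gauss(0.0, 1.0 / math.sqrt(k[i]))
    for i in range(2)
    for _ in range(n_samples[i])
]
u = [[0.5 * kj * x * x for x in xs] for kj in k]
F_hat = solve_mbar(u, n_samples)
exact = 0.5 * math.log(k[1] / k[0])
```

Production codes typically replace the plain fixed-point update with a faster optimizer, but the update above is enough to see how each free energy depends on all the others through the shared denominator.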
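The uncertainty of the estimate $\hat{F}$ is computed using asymptotic analysis of the log-likelihood function $\ell(F, \log \pi)$ in eq 5. Because the log-likelihood function depends on F and log π only through their sum ϕ = F + log π, the observed Fisher information matrix computed using the log-likelihood function can only be used to compute the asymptotic covariance of the sum ϕ. The observed Fisher information matrix at $\hat{\phi} = \hat{F} + \log \hat{\pi}$ is
$$\mathcal{I}(\hat{\phi}) = \sum_{i=1}^{m} \sum_{k=1}^{n_i} \left[ \mathrm{diag}(p_{ik}) - p_{ik} p_{ik}^{\top} \right] \tag{7}$$
where pik is a column vector of $[p(y = 1 \mid x_{ik}; \hat{F}, \log \hat{\pi}), \ldots, p(y = m \mid x_{ik}; \hat{F}, \log \hat{\pi})]$ and diag(pik) is a diagonal matrix with pik as its diagonal elements. The asymptotic covariance matrix of $\hat{\phi}$ is the Moore–Penrose pseudoinverse of the observed Fisher information matrix, i.e., $\mathrm{cov}(\hat{\phi}) = \mathcal{I}(\hat{\phi})^{+}$. To compute the asymptotic covariance matrix of $\hat{F}$, we assume that $\hat{F}$ and $\log \hat{\pi}$ are asymptotically independent. Then the asymptotic covariance matrix of $\hat{F}$ can be computed as
$$\mathrm{cov}(\hat{F}) = \mathrm{cov}(\hat{\phi}) - \mathrm{cov}(\log \hat{\pi}) = \mathcal{I}(\hat{\phi})^{+} - \mathrm{diag}(1/n_1, \ldots, 1/n_m) + \mathbf{1}\mathbf{1}^{\top}/n \tag{8}$$
where 1 is a column vector of m ones. The asymptotic covariance matrix in eq 8 is the same as that derived in ref (9) and is commonly used in the MBAR method.7 With the asymptotic covariance matrix of $\hat{F}$, we can compute the asymptotic variance of free energy differences using the identity $\mathrm{var}(\hat{F}_r - \hat{F}_s) = \mathrm{cov}(\hat{F})_{rr} + \mathrm{cov}(\hat{F})_{ss} - 2\,\mathrm{cov}(\hat{F})_{rs}$, where $\mathrm{cov}(\hat{F})_{rs}$ is the (r, s)th element of the matrix $\mathrm{cov}(\hat{F})$.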
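As a consistency check, consider the special case m = 2. This worked example is added here for illustration and assumes that eq 7 has the multinomial-logistic form $\sum_{i,k} [\mathrm{diag}(p_{ik}) - p_{ik} p_{ik}^{\top}]$ and that eq 8 has the form $\mathcal{I}^{+} - \mathrm{diag}(1/n_1, \ldots, 1/n_m) + \mathbf{1}\mathbf{1}^{\top}/n$. Writing $s$ for the sum, over all configurations, of the product of the two retrospective probabilities, the variance of the free energy difference reduces to a closed form:

```latex
\begin{aligned}
\mathcal{I}(\hat{\phi}) &= s \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix},
\qquad s = \sum_{i=1}^{2} \sum_{k=1}^{n_i} p_{ik,1}\, p_{ik,2}, \\
\mathcal{I}(\hat{\phi})^{+} &= \frac{1}{4s} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \\
\mathrm{var}(\hat{F}_2 - \hat{F}_1) &= \frac{1}{s} - \frac{1}{n_1} - \frac{1}{n_2}.
\end{aligned}
```

The $\mathbf{1}\mathbf{1}^{\top}/n$ term cancels in the difference. The result has the same form as the asymptotic variance usually quoted for the two-state BAR estimator, in line with eq 6 reducing to the BAR equation when m = 2.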
2.2. Bayesian MBAR
As shown above, the reverse logistic regression model formulates MBAR as a statistical model. It provides a likelihood function (eq 5) for computing the MBAR estimate $\hat{F}$ and the associated asymptotic covariance. Based on this formulation, we developed BayesMBAR by turning the reverse logistic regression into a Bayesian model. In BayesMBAR, we treat F as a random variable instead of a parameter and place a prior distribution on F. The posterior distribution of F is then used to estimate F.
Let us represent the prior distribution of F as p(F; θ), where θ denotes the parameters of the prior distribution, often called hyperparameters. Borrowing from the reverse logistic regression, we use the retrospective conditional probability in eq 4 as the likelihood function, i.e., p(y|x, F) = p(y|x; F, log π). We note that F is treated as a random variable in p(y|x, F), whereas it is a parameter in p(y|x; F, log π). The log π term in the likelihood function is substituted with the maximum likelihood estimate $\log \hat{\pi}$. With these definitions, the posterior distribution of F given sampled configurations and state indices is
$$p(F \mid Y, X) \propto p(F; \theta)\, p(Y \mid F, X) = p(F; \theta) \prod_{i=1}^{m} \prod_{k=1}^{n_i} p(y_{ik} \mid x_{ik}, F) \tag{9}$$
where Y = {yik: i = 1, ..., m; k = 1, ..., ni} and X = {xik: i = 1, ..., m; k = 1, ..., ni}. Using the posterior distribution in eq 9, we can compute various quantities of interest, such as the posterior mode and the posterior mean, both of which can serve as point estimates of F. In addition, we can use the posterior covariance matrix as an estimate of the uncertainty of these point estimates. However, to carry out these calculations, we need to address the following questions that commonly arise in Bayesian inference.
2.3. Choosing the Prior Distribution
To fully specify the BayesMBAR model, we must choose a prior distribution for F. We could use information about F from previous simulations or experiments to construct the prior distribution, if such information is available. For example, the prior distribution of protein–ligand binding free energies could be constructed using free energies computed with fast but less accurate methods such as docking. The information could also come from the binding free energies of similar protein–ligand systems. Turning such information into a prior distribution will depend on domain experts’ experience and likely vary from case to case. In this work, we focus on scenarios where such information is not available. In this scenario, we propose to use two types of distributions as the prior: the uniform distribution and the Gaussian distribution.
2.3.1. Using Uniform Distributions as the Prior
As the MBAR method has proven to be a highly effective method for estimating free energies in many applications, a conservative strategy for choosing the prior distribution is to minimize the deviation of BayesMBAR from MBAR. Such a strategy leads to using the uniform distribution as the prior distribution because it makes the maximum a posteriori probability (MAP) estimate of BayesMBAR the same as the MBAR estimate. Specifically, if we set the prior distribution of F to be the uniform distribution, i.e., p(F; θ) ∝ constant, the posterior distribution of F in eq 9 becomes the same as the likelihood function. Therefore, maximizing the posterior distribution of F is equivalent to maximizing the log-likelihood function in eq 5.
Although its MAP estimate simply recovers the MBAR estimate, BayesMBAR with a uniform prior distribution provides two advantages. First, in addition to the MAP estimate, BayesMBAR also offers the posterior mean as an alternative point estimate of F. Second, BayesMBAR produces a posterior distribution of F, which can be used to estimate the uncertainty of the estimate. As shown in the Results section, the uncertainty estimate from BayesMBAR is more accurate than that from MBAR when the number of configurations is small.
2.3.2. Using Gaussian Distributions as the Prior
In many applications, we are interested in computing free energies along collective coordinates such as distances, angles, or alchemical parameters. In such cases, we often have the prior knowledge that the free energy surface is a smooth function F(λ) of the collective coordinate λ. A widely used approach to encode such knowledge into Bayesian inference is to use a Gaussian process24 as the prior distribution. A Gaussian process is a collection of random variables, any finite number of which has a joint Gaussian distribution. A Gaussian process is fully specified by its mean function μ(λ) and covariance function k(λ, λ′). The value of the covariance function k(λ, λ′) is the covariance between F(λ) and F(λ′). The covariance function is often designed to encode the smoothness of the function. Specifically, the covariance k(λ, λ′) between F(λ) and F(λ′) increases as λ and λ′ become closer. When the mean function is smooth and a covariance function such as the squared exponential covariance function is used, the Gaussian process is a probability distribution of smooth functions.
In BayesMBAR, we focus on estimating free energies at discrete values of the collective coordinate, (λ1, ..., λm), instead of the whole free energy surface. Projecting the Gaussian process over free energy surfaces onto discrete values of λ, we obtain as the prior distribution of F = [F(λ1), ..., F(λm)] a multivariate Gaussian distribution with the mean vector μ = [μ(λ1), ..., μ(λm)] and the covariance matrix Σ. The (i, j)th element of Σ is computed as Σij = k(λi, λj) and represents the covariance between F(λi) and F(λj). As in many applications of Gaussian processes, we set the mean function to be a constant function, i.e., μ = (c, ..., c), where c is a hyperparameter to be optimized. The choice of the covariance function is a key ingredient of constructing the prior distribution, as it encodes our assumption about the free energy surface’s smoothness.24 Several well-studied covariance functions are suitable for use in BayesMBAR. In this study, we use the squared exponential covariance function as an example, noting that other types of covariance functions can be used as well. The squared exponential covariance function is defined as
$$k(\lambda, \lambda') = \sigma^2 \exp\left(-\frac{r^2}{2 l^2}\right) \tag{10}$$
where r = |λ – λ′| and the variance scale σ and the length scale l are hyperparameters to be optimized. Every function F(λ) from such Gaussian processes has infinitely many derivatives and is very smooth. The hyperparameters σ and l control the variance and the length scale of function F(λ), respectively. The collective coordinate λ is not restricted to scalars and can be a vector. When λ is a vector, the length scale l could be either a scalar or a vector of the same dimension as λ, the latter of which means that the length scale can be different for different dimensions in λ. When a vector length scale is used, the squared exponential covariance function is defined as
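$$k(\lambda, \lambda') = \sigma^2 \exp\left[-\frac{1}{2} (\lambda - \lambda')^{\top} L^{-2} (\lambda - \lambda')\right] \tag{11}$$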
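where L is a diagonal matrix with the values of l as its diagonal elements. To allow the variances to be different for different entries in F = [F(λ1), ..., F(λm)], we add a constant $\tilde{\sigma}_i^2$ to the variance of F(λi), i.e., $\Sigma_{ii} = k(\lambda_i, \lambda_i) + \tilde{\sigma}_i^2$. With the mean function and the covariance function defined, the prior distribution of F is fully specified as a multivariate Gaussian distribution
$$p(F; \theta) = \mathcal{N}(F; \mu_\theta, \Sigma_\theta) = (2\pi)^{-m/2} |\Sigma_\theta|^{-1/2} \exp\left[-\frac{1}{2} (F - \mu_\theta)^{\top} \Sigma_\theta^{-1} (F - \mu_\theta)\right] \tag{12}$$
where μθ and Σθ are the mean vector and the covariance matrix, respectively. They depend on the hyperparameters $\theta = (c, \sigma, l, \tilde{\sigma})$, where $\tilde{\sigma} = (\tilde{\sigma}_1, \ldots, \tilde{\sigma}_m)$. The hyperparameters θ are optimized by maximizing the Bayesian evidence, as described in the following sections.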
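To make the construction of the prior covariance matrix concrete, the following minimal sketch builds Σ for a scalar collective coordinate. It assumes the standard squared exponential form k(λ, λ′) = σ² exp(−r²/(2l²)); the function name `squared_exponential_cov`, the argument `extra_var` (the per-state constants added to the diagonal), and all numerical values are hypothetical choices for illustration.

```python
import math


def squared_exponential_cov(lambdas, sigma, length_scale, extra_var=None):
    """Covariance matrix with entries sigma^2 * exp(-r^2 / (2 l^2)).

    extra_var, if given, holds one non-negative constant per state that is
    added to the corresponding diagonal element.
    """
    m = len(lambdas)
    cov = [
        [
            sigma ** 2
            * math.exp(-((lambdas[i] - lambdas[j]) ** 2) / (2.0 * length_scale ** 2))
            for j in range(m)
        ]
        for i in range(m)
    ]
    if extra_var is not None:
        for i in range(m):
            cov[i][i] += extra_var[i]
    return cov


# Hypothetical example: four states along a scalar coordinate.
lambdas = [0.0, 0.25, 0.5, 1.0]
cov = squared_exponential_cov(
    lambdas, sigma=2.0, length_scale=0.5, extra_var=[0.1] * 4
)
```

Entries of `cov` decay as states move apart along λ, which is how the prior expresses that neighboring states should have similar free energies.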
2.4. Computing Posterior Statistics
With the prior distribution of F defined as above, the posterior distribution defined in eq 9 contains rich information about F. Specifically, the MAP or the posterior mean can be used as point estimates of F and the posterior covariance matrix can be used to compute the uncertainty of the estimate.
2.4.1. Computing the MAP Estimate
The MAP estimate of F is the value that maximizes the posterior distribution density, i.e., $\hat{F}_{\mathrm{MAP}} = \arg\max_{F} \log p(F \mid Y, X)$. When the prior distribution is chosen to be either a uniform distribution or a Gaussian distribution, log p(F|Y, X) is a concave function of F. This means that the MAP estimate is the unique global maximum of the posterior distribution density and can be efficiently computed using standard optimization algorithms. In BayesMBAR, we implemented the L-BFGS-B algorithm25 and Newton’s method to compute the MAP estimate.
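For two states and a uniform prior, the MAP problem is one-dimensional and Newton's method can be written in a few lines. The sketch below is an illustration under stated assumptions, not the BayesMBAR implementation: it fixes F1 = 0, works with the energy differences u2(x) − u1(x), and clips each Newton step to at most 1 in magnitude as a simple safeguard. The function name `map_two_states` and the toy system (zero-centered harmonic oscillators with force constants 1 and 4) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def map_two_states(du_from_1, du_from_2, n_iter=100, tol=1e-10):
    """Newton's method for dF = F2 - F1 under a uniform prior.

    du_from_i holds u2(x) - u1(x) for configurations sampled from state i.
    Assumes the two states overlap, so the second derivative is nonzero.
    """
    log_ratio = math.log(len(du_from_2) / len(du_from_1))
    dF = 0.0
    for _ in range(n_iter):
        # probability that each configuration is assigned to the other state
        q1 = [sigmoid(dF + log_ratio - du) for du in du_from_1]
        q2 = [sigmoid(du - dF - log_ratio) for du in du_from_2]
        grad = sum(q2) - sum(q1)
        hess = -sum(q * (1.0 - q) for q in q1 + q2)
        step = max(-1.0, min(1.0, -grad / hess))
        dF += step
        if abs(step) < tol:
            break
    return dF


# Hypothetical toy system: u_i(x) = 0.5 * k_i * x**2.
rng = random.Random(1)
k1, k2 = 1.0, 4.0
x1 = [rng.gauss(0.0, 1.0 / math.sqrt(k1)) for _ in range(1000)]
x2 = [rng.gauss(0.0, 1.0 / math.sqrt(k2)) for _ in range(1000)]
du_from_1 = [0.5 * (k2 - k1) * x * x for x in x1]
du_from_2 = [0.5 * (k2 - k1) * x * x for x in x2]
dF_map = map_two_states(du_from_1, du_from_2)
dF_exact = 0.5 * math.log(k2 / k1)
```

Because the log posterior is concave in this setting, the iteration has a single maximum to find; the step clipping only matters when the starting point is far from it.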
2.4.2. Computing the Mean and the Covariance Matrix of the Posterior Distribution
Computing the posterior mean and the covariance matrix is more challenging than computing the MAP estimate. It involves computing an integral with respect to the posterior distribution density. When there are only two states, we compute the posterior mean and the covariance matrix by numerical integration. When there are more than two states, numerical integration is not feasible. In this case, we estimate the posterior mean and covariance matrix by sampling from the posterior distribution using the No-U-Turn Sampler (NUTS).20 NUTS is a variant of HMC methods19 and has the advantage of automatically tuning the step size and the number of steps. NUTS has been shown to be highly efficient in sampling from high-dimensional distributions for Bayesian inference problems.26,27 In BayesMBAR, an extra factor that further improves the efficiency of NUTS is that the posterior distribution density is log-concave, which means that the sampler does not need to cross low-density (high-energy) regions during sampling. In BayesMBAR, we use the NUTS implementation in the Python package BlackJAX.28
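The two-state case can be sketched with a simple grid integration. The example below assumes a uniform prior, so the unnormalized log posterior of ΔF = F2 − F1 equals the retrospective log-likelihood. The function names, the grid limits, and the toy system (zero-centered harmonic oscillators with force constants 1 and 4) are hypothetical choices for illustration and do not reproduce the authors' integration scheme.

```python
import math
import random


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def log_likelihood(dF, du_from_1, du_from_2):
    """Retrospective log-likelihood of dF = F2 - F1 for two states.

    du_from_i holds u2(x) - u1(x) for configurations sampled from state i.
    """
    shift = dF + math.log(len(du_from_2) / len(du_from_1))
    ll = -sum(softplus(shift - du) for du in du_from_1)
    ll -= sum(softplus(du - shift) for du in du_from_2)
    return ll


def posterior_summary(du_from_1, du_from_2, lo, hi, n_grid=2001):
    """Posterior mode, mean, and SD of dF on a uniform grid (uniform prior)."""
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    logp = [log_likelihood(g, du_from_1, du_from_2) for g in grid]
    top = max(logp)
    w = [math.exp(v - top) for v in logp]
    z = sum(w)
    mean = sum(g * wi for g, wi in zip(grid, w)) / z
    var = sum((g - mean) ** 2 * wi for g, wi in zip(grid, w)) / z
    mode = grid[logp.index(top)]
    return mode, mean, math.sqrt(var)


# Hypothetical toy system: u_i(x) = 0.5 * k_i * x**2.
rng = random.Random(2)
k1, k2 = 1.0, 4.0
x1 = [rng.gauss(0.0, 1.0 / math.sqrt(k1)) for _ in range(100)]
x2 = [rng.gauss(0.0, 1.0 / math.sqrt(k2)) for _ in range(100)]
du_from_1 = [0.5 * (k2 - k1) * x * x for x in x1]
du_from_2 = [0.5 * (k2 - k1) * x * x for x in x2]
mode, mean, sd = posterior_summary(du_from_1, du_from_2, lo=-3.0, hi=5.0)
dF_exact = 0.5 * math.log(k2 / k1)
```

The grid must be wide enough to cover essentially all of the posterior mass; with more than two states the same approach would require a grid in m − 1 dimensions, which is why sampling is used instead.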
2.5. Optimizing the Hyperparameters
When Gaussian distributions with a specific covariance function are used as the prior distribution of F (eq 12), we need to make decisions about the values of hyperparameters. Such decisions are referred to as model selection problems in Bayesian inference, and several principles have been proposed and used in practice. In BayesMBAR, we use the Bayesian model selection principle, which is to choose the model that maximizes the marginal likelihood of the data. The marginal likelihood of the data is also called the Bayesian evidence and is defined as
$$p(Y \mid X; \theta) = \int p(Y \mid F, X)\, p(F; \theta)\, \mathrm{d}F \tag{13}$$
Because the Bayesian evidence is a multidimensional integral, computing it with numerical integration is not feasible. In BayesMBAR, we use ideas from variational inference29 and Monte Carlo integration9 to approximate it and optimize the hyperparameters.
We introduce a variational distribution q(F) and use the evidence lower bound (ELBO) of the marginal likelihood as the objective function for optimizing the hyperparameters. Specifically, the ELBO is defined as
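$$\mathcal{L}(q, \theta) = \mathbb{E}_{F \sim q}\left[\log p(Y \mid F, X) + \log p(F; \theta) - \log q(F)\right] \tag{14}$$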
It is straightforward to show that $\mathcal{L}(q, \theta) = \log p(Y \mid X; \theta) - D_{\mathrm{KL}}(q \,\|\, p(F \mid Y, X; \theta)) \le \log p(Y \mid X; \theta)$, where DKL(q∥p(F|Y, X; θ)) is the Kullback–Leibler divergence between q(F) and p(F|Y, X; θ). Therefore, the ELBO is a lower bound of the log marginal likelihood of data, and the gap between them is the Kullback–Leibler divergence between q(F) and p(F|Y, X; θ). This suggests that to make the ELBO a good approximation of the log marginal likelihood, we should choose q(F) that is close to p(F|Y, X; θ).
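For completeness, the bound can be obtained in two steps. The derivation below is a standard one, written here under the assumption that the ELBO is defined as the expectation under q of log p(Y|F, X) + log p(F; θ) − log q(F); it uses only the product rule p(Y|F, X) p(F; θ) = p(F|Y, X; θ) p(Y|X; θ).

```latex
\begin{aligned}
\mathcal{L}(q, \theta)
  &= \mathbb{E}_{q}\left[\log \frac{p(Y \mid F, X)\, p(F; \theta)}{q(F)}\right]
   = \mathbb{E}_{q}\left[\log \frac{p(F \mid Y, X; \theta)\, p(Y \mid X; \theta)}{q(F)}\right] \\
  &= \log p(Y \mid X; \theta) - D_{\mathrm{KL}}\big(q \,\|\, p(F \mid Y, X; \theta)\big).
\end{aligned}
```

Since the Kullback–Leibler divergence is non-negative and vanishes only when q equals the posterior, the inequality and its equality condition follow directly.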
Although we could in principle use p(F|Y, X; θ) as the variational distribution q(F) (in which case the ELBO would equal the log marginal likelihood), this is not practical because computing the gradient of the ELBO with respect to the hyperparameters would require sampling from p(F|Y, X; θ) at every iteration of the optimization, which is computationally too expensive. Instead, we choose q(F) to be a Gaussian distribution that approximates the posterior distribution, based on the following observations. The posterior distribution density p(F|Y, X; θ) is equal to the product of the likelihood function p(Y|F, X) and the prior distribution p(F; θ) up to a normalization constant. The likelihood term p(Y|F, X) is a log-concave function of F and does not depend on θ, so we can approximate it using a fixed Gaussian distribution $\mathcal{N}(\mu_0, \Sigma_0)$, where the mean μ0 and the covariance matrix Σ0 are computed by sampling F from p(Y|F, X) once. Because the prior distribution p(F; θ) is also a Gaussian distribution, $\mathcal{N}(\mu_\theta, \Sigma_\theta)$, multiplying the fixed Gaussian distribution $\mathcal{N}(\mu_0, \Sigma_0)$ with the prior yields another Gaussian distribution $\mathcal{N}(\mu_q, \Sigma_q)$, where μq and Σq can be analytically computed as
$$\mu_q = \Sigma_q \left(\Sigma_0^{-1} \mu_0 + \Sigma_\theta^{-1} \mu_\theta\right) \tag{15}$$
$$\Sigma_q = \left(\Sigma_0^{-1} + \Sigma_\theta^{-1}\right)^{-1} \tag{16}$$
Therefore, we choose the variational distribution q(F) to be the Gaussian distribution $\mathcal{N}(\mu_q, \Sigma_q)$, where μq and Σq are computed as above and depend on θ analytically. We compute the ELBO and its gradient with respect to θ using the reparameterization trick.30 Specifically, we reparameterize the variational distribution q(F) using $F = \mu_q + \Sigma_q^{1/2} \varepsilon$, where ε is a random variable with the standard Gaussian distribution. The ELBO can then be written as
$$\mathcal{L}(\theta) = \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, I)}\left[\log p(Y \mid \mu_q + \Sigma_q^{1/2} \varepsilon, X)\right] - D_{\mathrm{KL}}\big(q(F) \,\|\, p(F; \theta)\big) \tag{17}$$
The first term on the right-hand side can be estimated by sampling ε from the standard Gaussian distribution and evaluating the log-likelihood $\log p(Y \mid \mu_q + \Sigma_q^{1/2} \varepsilon, X)$. The second term can be computed analytically. The gradient of the ELBO with respect to θ is computed using automatic differentiation.31
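A one-dimensional sketch shows how the two terms are evaluated. It assumes the ELBO is written as the expected log-likelihood under q minus the Kullback–Leibler divergence from q to the Gaussian prior, and that q is the precision-weighted product of two Gaussians. All function names and numbers are hypothetical, and the toy log-likelihood is exactly quadratic so that the log evidence is known in closed form for comparison. Gradients with respect to the hyperparameters, which the text obtains by automatic differentiation, are not shown.

```python
import math
import random


def gaussian_product(mu0, var0, mu_p, var_p):
    """Mean and variance of the normalized product of two 1D Gaussians."""
    var_q = 1.0 / (1.0 / var0 + 1.0 / var_p)
    mu_q = var_q * (mu0 / var0 + mu_p / var_p)
    return mu_q, var_q


def kl_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) in closed form."""
    return 0.5 * (
        math.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )


def elbo_estimate(log_lik, mu_q, var_q, mu_p, var_p, n_draws=20000, seed=0):
    """Monte Carlo ELBO using the reparameterization F = mu_q + sd_q * eps."""
    rng = random.Random(seed)
    sd_q = math.sqrt(var_q)
    expected_ll = (
        sum(log_lik(mu_q + sd_q * rng.gauss(0.0, 1.0)) for _ in range(n_draws))
        / n_draws
    )
    return expected_ll - kl_gaussians(mu_q, var_q, mu_p, var_p)


# Hypothetical toy problem with an exactly quadratic log-likelihood.
mu0, var0 = 1.0, 0.25   # Gaussian fitted to the likelihood
mu_p, var_p = 0.0, 4.0  # Gaussian prior


def log_lik(f):
    return -((f - mu0) ** 2) / (2.0 * var0)


mu_q, var_q = gaussian_product(mu0, var0, mu_p, var_p)
elbo_matched = elbo_estimate(log_lik, mu_q, var_q, mu_p, var_p)
elbo_mismatched = elbo_estimate(log_lik, 0.0, 1.0, mu_p, var_p)
# Closed-form log evidence for this toy problem.
log_evidence = 0.5 * math.log(var0 / (var0 + var_p)) - (mu0 - mu_p) ** 2 / (
    2.0 * (var0 + var_p)
)
```

In this toy problem q coincides with the exact posterior, so `elbo_matched` agrees with `log_evidence` up to Monte Carlo noise, while a poorly chosen q gives a visibly looser bound.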
3. Results
3.1. Computing the Free Energy Difference between Two Harmonic Oscillators
We first tested the performance of BayesMBAR by computing the free energy difference between two harmonic oscillators. In this case, because there are only two states, BayesMBAR reduces to a Bayesian generalization of the BAR method, which we refer to as BayesBAR. The two harmonic oscillators are defined by the potential energy functions $u_1(x)$ and $u_2(x)$ with force constants k1 and k2, respectively; u1 and u2 are in units of kBT. The objective is to compute the free energy difference between them, i.e., ΔF = F2 – F1, where $F_i = -\log \int \exp[-u_i(x)]\,\mathrm{d}x$ for i = 1 and 2.
We first drew n1 and n2 samples from the Boltzmann distributions of u1 and u2, respectively. Then we used BayesBAR with the uniform prior distribution to estimate the free energy difference. To benchmark BayesBAR, we also computed the free energy difference using the BAR method and compared the results from both methods with the true value (Table 1). The force constants are set to k1 = 25 and k2 = 36. The numbers of samples, n1 and n2, are set equal and range from 10 to 5000. For each sample size, we repeated the calculation K = 100 times and computed the root mean square error (RMSE), bias, and standard deviation (SD) of the estimates. The RMSE is computed as $\sqrt{\frac{1}{K} \sum_{k=1}^{K} (\Delta\hat{F}_k - \Delta F)^2}$, where $\Delta\hat{F}_k$ is the estimate from the kth repeat and ΔF is the true value. The bias is computed as $\overline{\Delta\hat{F}} - \Delta F$, where $\overline{\Delta\hat{F}} = \frac{1}{K} \sum_{k=1}^{K} \Delta\hat{F}_k$, and the SD is computed as $\sqrt{\frac{1}{K} \sum_{k=1}^{K} (\Delta\hat{F}_k - \overline{\Delta\hat{F}})^2}$.
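The three error metrics can be computed from the K repeated estimates as in the short sketch below. The function name `error_metrics` and the example numbers are hypothetical. The SD is normalized by K (an assumption, chosen so that RMSE² = bias² + SD² holds exactly, which agrees with the tabulated values to within rounding).

```python
import math


def error_metrics(estimates, true_value):
    """Return (RMSE, bias, SD) of repeated estimates of a known quantity."""
    K = len(estimates)
    mean = sum(estimates) / K
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / K)
    bias = mean - true_value
    # SD normalized by K (assumption), so that rmse^2 = bias^2 + sd^2
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / K)
    return rmse, bias, sd


# Hypothetical estimates from three repeats of a calculation.
rmse, bias, sd = error_metrics([1.0, 2.0, 3.0], true_value=1.5)
```

The decomposition makes explicit how an estimator with a somewhat larger bias can still have a smaller RMSE if its SD is sufficiently reduced.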
Table 1. Free Energy Difference between the Two Harmonic Oscillators (k1 = 25, k2 = 36).
| n1 (= n2) | RMSE, MAP (a) | RMSE, mean (b) | bias, MAP | bias, mean | SD, MAP | SD, mean | estimate of SD, BayesBAR | estimate of SD, asymptotic | estimate of SD, Bennett’s | estimate of SD, bootstrap |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 2.45 | 2.40 | 0.85 | 0.90 | 2.29 | 2.23 | 4.08 | 39.24 | 1.10 | 1.53 |
| 13 | 2.53 | 2.47 | 0.87 | 0.92 | 2.38 | 2.29 | 3.55 | 19.69 | 1.11 | 1.47 |
| 18 | 1.92 | 1.84 | 0.16 | 0.22 | 1.91 | 1.83 | 3.09 | 11.58 | 1.09 | 1.31 |
| 28 | 1.72 | 1.65 | 0.44 | 0.48 | 1.66 | 1.58 | 2.58 | 7.06 | 1.07 | 1.14 |
| 48 | 1.27 | 1.19 | 0.16 | 0.19 | 1.26 | 1.17 | 1.90 | 2.92 | 1.07 | 1.11 |
| 99 | 1.34 | 1.23 | –0.19 | –0.11 | 1.33 | 1.22 | 1.38 | 1.64 | 0.98 | 0.96 |
| 304 | 0.83 | 0.79 | 0.00 | 0.03 | 0.83 | 0.79 | 0.80 | 0.81 | 0.74 | 0.70 |
| 5000 | 0.19 | 0.18 | –0.00 | –0.00 | 0.19 | 0.18 | 0.20 | 0.20 | 0.20 | 0.20 |
(a) MAP estimate of BayesBAR (equivalent to the BAR estimate).
(b) Posterior mean estimate of BayesBAR.
Because the uniform prior distribution is used, the MAP estimate of BayesBAR is identical to the BAR estimate. Besides the MAP estimate, BayesBAR also provides the posterior mean estimate, which is computed using numerical integration. Compared to the MAP estimate (the BAR estimate), the posterior mean estimate has a smaller RMSE. Decomposing the RMSE into bias and SD, we found that the posterior mean estimate has a larger bias but a smaller SD than the MAP estimate. The decrease in SD more than compensates for the increase in bias, which leads to the smaller RMSE of the posterior mean estimate. Statistical testing shows that the differences in RMSE and bias between the MAP estimate and the posterior mean estimate are statistically significant (p-value < 0.05), while the difference in SD is not (Figures S1 and S2). Although the MAP estimate and the posterior mean estimate have different RMSEs, the difference is small, and both estimates converge to the true value as the sample size increases. This suggests that both estimates can be used interchangeably in practice.
Besides the MAP and the posterior mean estimate for ΔF, BayesBAR offers an estimate of the uncertainty (whose true values are included in the two columns beneath the label SD in Table 1) by using the posterior SD (the BayesBAR column in Table 1). For benchmarking, we also calculated the uncertainty estimate using asymptotic analysis, Bennett’s method, and the bootstrap method. Because each repeat produces an uncertainty estimate, we used the average from all K repeats as the uncertainty estimate of each method, denoted as “estimate of SD” in Table 1.
When the number of configurations is small, the asymptotic analysis significantly overestimates the uncertainty, while both Bennett’s method and the bootstrap method tend to underestimate the uncertainty. Practically, overestimating uncertainty is favored over underestimating, as the former prompts further configuration collection, whereas the latter might cause the user to stop sampling prematurely. Nevertheless, excessive overestimation is not ideal either, as it might result in gathering an unnecessarily large number of configurations. Given these considerations, BayesBAR’s uncertainty estimate overestimates the uncertainty modestly and thus is a better choice than the other methods. As the sample size increases, the uncertainty estimates from all methods converge to the true value.
The asymptotic analysis tends to overestimate uncertainty much more than BayesBAR. This is because the asymptotic analysis approximates the posterior distribution of ΔF with a Gaussian distribution centered at the MAP estimate. Such an approximation is generally accurate for a large number of configurations. However, with a smaller number of configurations, this approximation becomes imprecise, leading to considerable overestimation of uncertainty. Figure 1 provides a visual comparison, contrasting the posterior distribution of ΔF as determined by BayesBAR with the Gaussian approximation from the asymptotic analysis for an experiment where n1 = n2 = 18.
Figure 1.
Probability densities of the posterior distribution (solid line) of ΔF and the approximate Gaussian distribution (dashed line) used by the asymptotic analysis for the two harmonic oscillator system with n1 = n2 = 18.
3.2. Computing Free Energy Differences among Three Harmonic Oscillators
We next tested the performance of BayesMBAR on a multistate system. The system consists of three harmonic oscillators with unitless potential energy functions $u_1(x)$, $u_2(x)$, and $u_3(x)$ and force constants k1 = 16, k2 = 25, and k3 = 36, respectively. The free energy differences among the three harmonic oscillators are analytically known. As for the system of two harmonic oscillators, we first drew n samples from the Boltzmann distribution of each harmonic oscillator. We used BayesMBAR with the uniform prior to estimate the free energy differences by computing both the MAP estimate and the posterior mean estimate. The posterior mean estimate is computed by sampling from the posterior distribution using the NUTS sampler instead of numerical integration. Figure 2 shows the posterior distribution of the free energy differences (F2 – F1 and F3 – F1) and a subset of samples drawn from the posterior distribution in one repeat of the calculation when n = 18. As shown in Figure 2b,c, samples from the NUTS sampler decorrelate quickly and efficiently traverse the posterior distribution.
Figure 2.
Probability density and samples of the posterior distribution of F2 – F1 and F3 – F1 for the three harmonic oscillators with n = 18. (a) Contours are the logarithm of the posterior distribution density. Dots are a subset of samples drawn from the posterior distribution using the NUTS sampler. (b,c) First 300 samples of F2 – F1 and F3 – F1 drawn from the posterior distribution using the NUTS sampler.
For benchmarking purposes, we conducted the calculation 100 times (K = 100) for each sample size n and computed the RMSE, bias, and SD of the estimate (Table 2). Given the use of a uniform prior, BayesMBAR’s MAP estimate is the same as the MBAR estimate. Compared with the MBAR estimate, the posterior mean estimate has a lower SD but a higher bias. Taking both SD and bias into account, the posterior mean estimate has a smaller RMSE than the MBAR estimate. The differences in RMSE and bias are statistically significant when n ≤ 99, whereas the difference in SD is not significant for any sample size (Figures S3–S6). As in the case of two harmonic oscillators, the difference in RMSE between the two estimators is small, and both estimates converge to the correct value as the sample size grows. In terms of uncertainty, BayesMBAR offers a better estimate than established techniques such as asymptotic analysis or the bootstrap method, especially when the number of configurations is small. Notably, BayesMBAR’s uncertainty estimate avoids the underestimation seen with the bootstrap method and overestimates the uncertainty much less than the asymptotic analysis does.
Table 2. Free Energy Differences among the Three Harmonic Oscillators (k1 = 16, k2 = 25, and k3 = 36).
F2 – F1

| n | RMSE (MAP) | RMSE (mean) | bias (MAP) | bias (mean) | SD (MAP) | SD (mean) | estimate of SD (BayesMBAR) | estimate of SD (asymptotic) | estimate of SD (bootstrap) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 1.89 | 1.83 | 0.45 | 0.53 | 1.84 | 1.75 | 2.28 | 5.31 | 1.26 |
| 13 | 1.84 | 1.76 | 0.46 | 0.52 | 1.78 | 1.68 | 1.93 | 3.20 | 1.19 |
| 18 | 1.41 | 1.30 | –0.09 | 0.01 | 1.41 | 1.30 | 1.62 | 2.17 | 1.05 |
| 28 | 1.18 | 1.12 | 0.13 | 0.19 | 1.17 | 1.10 | 1.31 | 1.55 | 0.91 |
| 48 | 0.79 | 0.75 | –0.04 | 0.01 | 0.79 | 0.75 | 0.97 | 1.00 | 0.83 |
| 99 | 0.67 | 0.64 | –0.06 | –0.04 | 0.66 | 0.64 | 0.69 | 0.70 | 0.63 |
| 304 | 0.39 | 0.39 | –0.01 | –0.00 | 0.39 | 0.39 | 0.40 | 0.40 | 0.40 |
| 5000 | 0.09 | 0.09 | –0.00 | 0.00 | 0.09 | 0.09 | 0.10 | 0.10 | 0.10 |

F3 – F1

| n | RMSE (MAP) | RMSE (mean) | bias (MAP) | bias (mean) | SD (MAP) | SD (mean) | estimate of SD (BayesMBAR) | estimate of SD (asymptotic) | estimate of SD (bootstrap) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 2.93 | 2.85 | 0.89 | 1.02 | 2.79 | 2.66 | 4.63 | 28.27 | 2.02 |
| 13 | 3.26 | 3.18 | 1.25 | 1.35 | 3.01 | 2.87 | 4.16 | 20.74 | 1.86 |
| 18 | 2.53 | 2.40 | 0.12 | 0.27 | 2.53 | 2.39 | 3.39 | 10.12 | 1.85 |
| 28 | 2.28 | 2.20 | 0.62 | 0.73 | 2.20 | 2.08 | 2.87 | 6.35 | 1.56 |
| 48 | 1.73 | 1.64 | 0.16 | 0.26 | 1.72 | 1.62 | 2.26 | 3.91 | 1.37 |
| 99 | 1.53 | 1.43 | 0.36 | 0.42 | 1.49 | 1.37 | 1.58 | 1.84 | 1.21 |
| 304 | 0.99 | 0.95 | –0.00 | 0.03 | 0.99 | 0.95 | 0.89 | 0.91 | 0.81 |
| 5000 | 0.23 | 0.22 | 0.01 | 0.01 | 0.23 | 0.22 | 0.22 | 0.23 | 0.22 |
3.3. Computing the Hydration Free Energy of Phenol
We further tested the performance of BayesMBAR on a real system that involves several collective variables. Specifically, we use BayesMBAR to compute the hydration free energy of phenol by using an alchemical approach. In this approach, we modify the nonbonded interactions between phenol and water using an alchemical variable λ = (λelec, λvdw), where λelec and λvdw are alchemical variables for the electrostatic and the van der Waals interactions, respectively. The electrostatic interaction is linearly scaled by 1 – λelec as
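$$u_{\mathrm{elec}}(r_{ij}; \lambda_{\mathrm{elec}}) = (1 - \lambda_{\mathrm{elec}}) \, \frac{q_i q_j}{r_{ij}} \tag{18}$$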
where qi and qj are the charges of atoms i and j, respectively, and rij is the distance between them. The van der Waals interaction is modified by λvdw using a soft-core Lennard-Jones potential32 as
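$$u_{\mathrm{vdw}}(r_{ij}; \lambda_{\mathrm{vdw}}) = 4 \epsilon_{ij} (1 - \lambda_{\mathrm{vdw}}) \left[ \frac{1}{\left( \alpha \lambda_{\mathrm{vdw}} + (r_{ij}/\sigma_{ij})^{6} \right)^{2}} - \frac{1}{\alpha \lambda_{\mathrm{vdw}} + (r_{ij}/\sigma_{ij})^{6}} \right] \tag{19}$$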
where εij and σij are the Lennard-Jones parameters of atoms i and j; and α = 0.5. When (λelec, λvdw) = (0, 0), the nonbonded interactions between phenol and water are turned on and phenol is in the water phase. When (λelec, λvdw) = (1, 1), the nonbonded interactions are turned off and phenol is in the vacuum phase. The hydration free energy of phenol is equal to the free energy difference between the two states λ = (0, 0) and λ = (1, 1). To compute the free energy difference, we introduce seven intermediate states through which λelec and λvdw are gradually changed from (0, 0) to (1, 1). The values of λelec and λvdw for the intermediate and end states are included in Table 3.
Table 3. Values of λelec and λvdw Used in Computing the Hydration Free Energy of Phenol.
| λ index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| λelec | 0 | 0.33 | 0.66 | 1 | 1 | 1 | 1 | 1 | 1 |
| λvdw | 0 | 0 | 0 | 0 | 0.2 | 0.4 | 0.6 | 0.8 | 1 |
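To make the alchemical path concrete, the following sketch evaluates a pair interaction along the schedule of Table 3. It is an illustration under assumptions: the Coulomb term is written in Gaussian units, the soft-core term uses a commonly used Beutler-type form with α = 0.5, and the function names and example parameters are hypothetical. The exact expressions used in the simulations may differ.

```python
import math

ALPHA = 0.5  # soft-core parameter quoted in the text

# (lambda_elec, lambda_vdw) for states 1..9, copied from Table 3.
SCHEDULE = [(0, 0), (0.33, 0), (0.66, 0), (1, 0), (1, 0.2),
            (1, 0.4), (1, 0.6), (1, 0.8), (1, 1)]

def u_elec(r, qi, qj, lam_elec):
    """Coulomb energy scaled linearly by (1 - lambda_elec); Gaussian units assumed."""
    return (1.0 - lam_elec) * qi * qj / r

def u_vdw(r, eps, sigma, lam_vdw, alpha=ALPHA):
    """Assumed Beutler-type soft-core Lennard-Jones energy."""
    s = alpha * lam_vdw + (r / sigma) ** 6
    return 4.0 * eps * (1.0 - lam_vdw) * (1.0 / s ** 2 - 1.0 / s)

def lennard_jones(r, eps, sigma):
    """Standard 12-6 Lennard-Jones energy, for comparison at lambda_vdw = 0."""
    x = (sigma / r) ** 6
    return 4.0 * eps * (x * x - x)

# Energy of one hypothetical atom pair in every alchemical state.
pair_energies = [u_elec(1.2, 0.4, -0.8, le) + u_vdw(1.2, 0.3, 1.0, lv)
                 for le, lv in SCHEDULE]
```

With this form, the soft-core term reduces to the standard Lennard-Jones potential at λvdw = 0, vanishes at λvdw = 1, and stays finite at rij = 0 for intermediate λvdw. In the Table 3 schedule, the electrostatic interaction is fully scaled off (states 1 to 4) before λvdw is increased.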
We use the general AMBER force field33 for phenol and the TIP3P model34 for water. The particle mesh Ewald (PME) method35 is used to compute the electrostatic interactions. Lennard-Jones interactions are truncated at 12 Å, with a switching function starting at 10 Å. We use OpenMM36 to run NPT molecular dynamics simulations for all states at 300 K and 1 atm using the middle scheme37 Langevin integrator with a friction coefficient of 1 ps–1 and the Monte Carlo barostat38 with a frequency of 25 steps. Each simulation is run for 20 ns with a time step of 2 fs. Configurations are saved every 20 ps, which yields 1000 configurations per state in each simulation; this spacing is long enough for the saved configurations to be uncorrelated39 (Figure S7). We use BayesMBAR to compute the free energy differences with n configurations from each state and repeat the calculation K = 100 times. Because the ground truth hydration free energy is not known analytically, we use as the benchmark the MBAR estimate computed using all configurations sampled from all repeats, i.e., 100,000 configurations from each state.
3.3.1. Uniform Prior
We first tested the performance of BayesMBAR with the uniform prior using different numbers of configurations. Here, the n configurations from each state are not randomly sampled from the configurations saved during the 20 ns of simulation. Instead, we use the first n configurations to mimic production calculations, where configurations are saved sequentially. The results are summarized in Table 4. Compared to the MAP estimate (the MBAR estimate), the posterior mean estimate has a smaller SD but a larger bias, as observed in the harmonic oscillator systems. Unlike in the harmonic oscillator systems, however, the posterior mean estimate has a slightly larger RMSE than the MAP estimate when the number of configurations is small (n ≤ 12). The differences in RMSE and bias between the MAP estimate and the posterior mean estimate are not statistically significant except when n = 5, and the difference in SD is not statistically significant for any sample size (Figure S8). This further suggests that both estimates can be used interchangeably in practice. We also compared the uncertainty estimates from BayesMBAR, asymptotic analysis, and the bootstrap method. The BayesMBAR estimate of the uncertainty is closer to the true value than that of the asymptotic analysis, and it does not underestimate the uncertainty as the bootstrap method does when the number of configurations is small. In addition to the free energy difference between the two end states, we also compared the uncertainty estimates for the free energies of all states (Figure 3). When the number of configurations is small (n = 5), the uncertainty estimates from BayesMBAR are closer to the true uncertainty than those from the asymptotic analysis and do not underestimate the uncertainty as the bootstrap method does.
Table 4. Hydration Free Energy (in the Unit of kBT) of Phenol Computed Using BayesMBAR with the Uniform Prior.
| n | RMSE (MAP) | RMSE (mean) | bias (MAP) | bias (mean) | SD (MAP) | SD (mean) | estimate of SD (BayesMBAR) | estimate of SD (asymptotic) | estimate of SD (bootstrap) |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 2.75 | 2.81 | –0.73 | –1.02 | 2.65 | 2.62 | 2.89 | 4.48 | 2.33 |
| 7 | 2.53 | 2.56 | –0.60 | –0.86 | 2.45 | 2.42 | 2.48 | 3.36 | 1.99 |
| 12 | 1.94 | 1.96 | –0.23 | –0.42 | 1.92 | 1.91 | 1.88 | 2.15 | 1.59 |
| 25 | 1.34 | 1.33 | –0.03 | –0.15 | 1.34 | 1.33 | 1.26 | 1.28 | 1.16 |
| 75 | 0.69 | 0.69 | 0.01 | –0.04 | 0.69 | 0.69 | 0.73 | 0.73 | 0.71 |
| 1000 | 0.19 | 0.19 | –0.00 | –0.01 | 0.19 | 0.19 | 0.20 | 0.20 | 0.20 |
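For readers unfamiliar with the bootstrap column, the sketch below shows the general recipe: resample the data with replacement, recompute the estimate, and take the SD over resamples. It is a toy illustration only. A single-state exponential-averaging estimator stands in for MBAR, the data are synthetic, and all names and parameters are hypothetical.

```python
import math
import random

def exp_average_free_energy(du):
    """Estimate F = -ln <exp(-dU)> (in kBT) with a log-sum-exp shift for stability."""
    m = min(du)
    return m - math.log(sum(math.exp(-(x - m)) for x in du) / len(du))

def bootstrap_sd(data, estimator, n_boot=200, seed=0):
    """SD of the estimator over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(estimator(resample))
    mean = sum(reps) / n_boot
    return math.sqrt(sum((x - mean) ** 2 for x in reps) / (n_boot - 1))

# Synthetic reduced energy differences (not simulation data).
rng = random.Random(1)
du = [rng.gauss(2.0, 1.0) for _ in range(25)]
f_hat = exp_average_free_energy(du)
sd_hat = bootstrap_sd(du, exp_average_free_energy)
```

Because every resample is drawn from the same small set of configurations, the spread over resamples can only reflect variation already present in that set, which is one intuition for why the bootstrap can underestimate the uncertainty when n is small.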
Figure 3.
Free energy estimates of all states for computing the hydration free energy of phenol. SD_F (BayesMBAR), SD_F (asymptotic), and SD_F (bootstrap) are the average of the uncertainty estimates using BayesMBAR, the asymptotic analysis, and the bootstrap method, respectively, when n = 5. SD_F (true) is the true uncertainty when n = 5. Fref is the MBAR estimate computed using all configurations sampled from all repeats.
3.3.2. Normal Prior
The free energy surface along the alchemical variable λ is expected to be smooth; therefore, we can use a normal prior distribution in BayesMBAR to encode this prior knowledge. The covariance between the free energies of two states λ and λ′ under this prior is given by the squared exponential covariance function, defined as
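$$k(\lambda, \lambda') = \sigma^{2} \exp\left[ -\frac{(\lambda_{\mathrm{elec}} - \lambda'_{\mathrm{elec}})^{2}}{2 l_{\mathrm{elec}}^{2}} - \frac{(\lambda_{\mathrm{vdw}} - \lambda'_{\mathrm{vdw}})^{2}}{2 l_{\mathrm{vdw}}^{2}} \right] \tag{20}$$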
where σ2 is the variance and lelec and lvdw are the length scales for λelec and λvdw, respectively.
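As an illustration, the sketch below builds the prior covariance matrix over the nine alchemical states of Table 3. It assumes the standard squared exponential form with a factor of 1/2 in the exponent and a separate length scale per alchemical variable; the hyperparameter values and function names are hypothetical placeholders, since in the paper the hyperparameters are optimized rather than fixed by hand.

```python
import math

# (lambda_elec, lambda_vdw) for states 1..9, copied from Table 3.
SCHEDULE = [(0, 0), (0.33, 0), (0.66, 0), (1, 0), (1, 0.2),
            (1, 0.4), (1, 0.6), (1, 0.8), (1, 1)]

def sq_exp_cov(lam_a, lam_b, sigma2, l_elec, l_vdw):
    """Squared exponential covariance between two alchemical states."""
    d_elec = (lam_a[0] - lam_b[0]) / l_elec
    d_vdw = (lam_a[1] - lam_b[1]) / l_vdw
    return sigma2 * math.exp(-0.5 * (d_elec ** 2 + d_vdw ** 2))

def cov_matrix(states, sigma2, l_elec, l_vdw):
    """Full covariance matrix as a list of rows."""
    return [[sq_exp_cov(a, b, sigma2, l_elec, l_vdw) for b in states]
            for a in states]

# Hypothetical hyperparameters, chosen only for illustration.
K = cov_matrix(SCHEDULE, sigma2=4.0, l_elec=0.5, l_vdw=0.5)
```

Neighboring states along the path receive a large prior covariance and distant states a small one, which is how the prior expresses that the free energy varies smoothly with λ.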
The hyperparameters in the covariance function and the mean parameter of the prior distribution are optimized by maximizing the Bayesian evidence. After optimizing the hyperparameters, we use the MAP and the posterior mean estimators to estimate the free energy difference between the two end states and compare them (Table 5) to the MAP estimator with the uniform prior distribution, which is identical to the MBAR estimator.
Table 5. Comparison of the Performance of BayesMBAR with the Uniform Prior (MAP Estimate) and the Normal Prior (MAP and Posterior Mean Estimates) for Computing the Hydration Free Energy of Phenol.
| n | RMSE (uniform, MAP) | RMSE (normal, MAP) | RMSE (normal, mean) | bias (uniform, MAP) | bias (normal, MAP) | bias (normal, mean) | SD (uniform, MAP) | SD (normal, MAP) | SD (normal, mean) |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 2.75 | 2.18 | 2.15 | –0.73 | 0.69 | 0.57 | 2.65 | 2.07 | 2.07 |
| 7 | 2.53 | 1.97 | 1.96 | –0.60 | 0.66 | 0.55 | 2.45 | 1.86 | 1.88 |
| 12 | 1.94 | 1.60 | 1.61 | –0.23 | 0.56 | 0.49 | 1.92 | 1.50 | 1.53 |
| 25 | 1.34 | 1.22 | 1.22 | –0.03 | 0.43 | 0.38 | 1.34 | 1.15 | 1.16 |
| 75 | 0.69 | 0.75 | 0.75 | 0.01 | 0.11 | 0.09 | 0.69 | 0.74 | 0.74 |
| 1000 | 0.19 | 0.20 | 0.20 | –0.00 | –0.02 | –0.02 | 0.19 | 0.19 | 0.19 |
By incorporating the prior knowledge of the smoothness of the free energy surface, the BayesMBAR estimator with a normal prior distribution has a smaller RMSE than the MBAR estimator, especially when the number of configurations is small. As the number of configurations increases, the BayesMBAR estimator converges to the MBAR estimate. When the number of configurations is small, the information about free energy from data is limited, and the prior knowledge of the free energy surface excludes unlikely results and helps improve the estimate. When the number of configurations is large, the inference is dominated by the data, and the prior knowledge becomes less important because the prior knowledge used here is a relatively weak prior. This behavior is desirable because prior knowledge should be used when data alone are not sufficient to make a good inference and at the same time not bias the inference when data are sufficient.
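The shift from prior-dominated to data-dominated inference can be seen in a much simpler setting than BayesMBAR. The sketch below uses the textbook conjugate update for the mean of a normal distribution with known noise SD; it is a toy model with hypothetical numbers, not the BayesMBAR posterior.

```python
def normal_posterior(prior_mean, prior_sd, data_mean, data_sd, n):
    """Posterior mean and SD of a normal mean with known noise SD (conjugate update)."""
    prior_prec = 1.0 / prior_sd ** 2   # precision contributed by the prior
    data_prec = n / data_sd ** 2       # precision contributed by n observations
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
    return post_mean, post_var ** 0.5

# The prior pulls the estimate toward 0 when n is small and fades as n grows.
for n in (5, 25, 1000):
    print(n, normal_posterior(prior_mean=0.0, prior_sd=2.0,
                              data_mean=1.0, data_sd=2.0, n=n))
```

The data precision grows linearly with n while the prior precision stays fixed, so the posterior mean moves from a compromise between prior and data toward the data mean alone.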
4. Conclusion and Discussion
In this study, we developed BayesMBAR, a Bayesian generalization of the MBAR method based on the reverse logistic regression formulation of MBAR. BayesMBAR provides a posterior distribution of free energy, which is used to estimate free energies and compute the estimation uncertainty. When uniform distributions are used as the prior, the MAP estimate of BayesMBAR recovers the MBAR estimate. Besides the MAP estimate, BayesMBAR provides the posterior mean estimate of the free energy. Compared to the MAP estimate, the posterior mean estimate tends to have a larger bias but a smaller SD. A possible reason is that the posterior mean estimate takes into account the whole spread of the posterior distribution, which makes it more stable over repeated calculations and, at the same time, more susceptible to extreme values. The difference in accuracy between the MAP estimate and the posterior mean estimate is small, and both estimates converge to the true value as the number of configurations increases. Therefore, both estimates can be used interchangeably in practice. In BayesMBAR, the estimation uncertainty is computed using the posterior SD. All benchmark systems in this study show that this uncertainty estimate from BayesMBAR is better than those from the asymptotic analysis and Bennett’s method, especially when the number of configurations is small.
As a Bayesian method, BayesMBAR is able to incorporate prior knowledge about the free energy into the estimation. We demonstrated this feature by using a normal prior distribution to encode prior knowledge of the smoothness of free energy surfaces. All hyperparameters in the prior distribution are automatically optimized by maximizing the Bayesian evidence. By using such prior knowledge, BayesMBAR provides a more accurate estimate than the MBAR method when the number of configurations is small and converges to the MBAR estimate when the number of configurations is large.
To facilitate the adoption of BayesMBAR, we provide an open-source Python package at https://github.com/DingGroup/BayesMBAR. In cases where prior knowledge is not available, we recommend using BayesMBAR with a uniform distribution as the prior distribution. It takes the same input as MBAR and can thus be easily integrated into existing workflows, with the benefit of providing better uncertainty estimates. For computing free energy differences among points on a smooth free energy surface, we recommend using BayesMBAR with normal distributions as the prior. Because the hyperparameters in the normal prior distribution are automatically optimized, the extra input to BayesMBAR from the user, compared to MBAR, is the value of the collective variables associated with each thermodynamic state. In terms of covariance functions, although we used the squared exponential covariance function in this study, other covariance functions such as the Matérn covariance function and the rational quadratic covariance function24 could also be used. The choice of covariance functions could also be informed by comparing the Bayesian evidence after optimizing their hyperparameters.
Because BayesMBAR needs to sample from the posterior distribution, it is computationally more expensive than MBAR, which uses the asymptotic analysis to compute the estimation uncertainty. Table S1 shows the running time required by MBAR and BayesMBAR for computing the hydration free energy of phenol when n = 1000 configurations are used from each alchemical state. Both MBAR and BayesMBAR were run on a graphics processing unit, and the FastMBAR40 implementation was used for MBAR calculations. Considering that most of the computational cost for calculating free energies lies in sampling configurations from the equilibrium distribution, and that BayesMBAR provides better uncertainty estimates and more accurate free energy estimates when prior knowledge is available, we believe that the extra computational cost is worthwhile in practice.
BayesMBAR could also be extended to incorporate other types of prior knowledge about free energy such as knowledge from other calculations or experimental data. For example, when computing relative binding free energies of two ligands, A and B, with a protein using alchemical free energy methods, results from cheaper calculations such as docking or molecular mechanics/generalized Born surface area (MM/GBSA) calculations could be used as prior knowledge. Specifically, the relative binding free energy is often calculated as ΔΔG = ΔG_bound – ΔG_unbound, where ΔG_bound and ΔG_unbound are free energy differences of changing ligand A to B alchemically in the bound and unbound states, respectively. Computing ΔG_unbound is often much cheaper than computing ΔG_bound because the unbound state is in solvent and thus does not require simulations with the protein. Therefore, ΔG_unbound can be efficiently computed to a high precision. On the other hand, docking or MM/GBSA calculations could provide rough estimates of the absolute binding free energies of ligands A and B, whose difference provides an estimate μ of ΔΔG and an associated uncertainty σ, i.e., ΔΔG has a normal distribution with mean μ and SD σ. Combining ΔG_unbound with the distribution of ΔΔG from docking or MM/GBSA calculations, we could construct a normal distribution on ΔG_bound and use it as the prior distribution when computing ΔG_bound with BayesMBAR. The ΔG_bound estimated using BayesMBAR could then be combined with ΔG_unbound to compute ΔΔG. Rough estimates of ΔΔG could also come from experimental data such as qualitative competitive binding assays that only indicate whether ligand A binds better than ligand B. In this case, the prior distribution of ΔΔG could be a bounded uniform distribution with a lower or upper bound of 0. Similarly, combining such prior knowledge on ΔΔG with ΔG_unbound, we could construct a bounded uniform distribution on ΔG_bound and use it as the prior distribution for computing ΔG_bound with BayesMBAR. We believe that such an extension of BayesMBAR could be useful in practice and will be explored in future studies.
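The prior construction described above amounts to a shift by the unbound-leg free energy. The sketch below writes the relative binding free energy as ΔΔG = ΔG_bound – ΔG_unbound (notation assumed here), so that ΔG_bound = ΔΔG + ΔG_unbound. The function names are hypothetical, and adding the variances assumes that the rough ΔΔG estimate and the ΔG_unbound estimate are independent.

```python
import math

def normal_prior_on_bound(mu_ddg, sigma_ddg, dg_unbound, sd_unbound=0.0):
    """(mean, SD) of a normal prior on dG_bound implied by ddG ~ N(mu, sigma)."""
    mean = mu_ddg + dg_unbound
    # Variances add under the independence assumption stated above.
    sd = math.sqrt(sigma_ddg ** 2 + sd_unbound ** 2)
    return mean, sd

def one_sided_bounds_on_bound(ddg_is_positive, dg_unbound):
    """(lower, upper) bound on dG_bound implied by the sign of ddG."""
    if ddg_is_positive:
        return dg_unbound, math.inf
    return -math.inf, dg_unbound

# Hypothetical inputs in kBT: rough ddG of -1.5 +/- 2.0 and dG_unbound = 3.0.
prior_mean, prior_sd = normal_prior_on_bound(-1.5, 2.0, 3.0)
lower, upper = one_sided_bounds_on_bound(False, 3.0)
```

Only the bound implied by the sign of ΔΔG is computed here; the opposite bound of a bounded uniform prior is left open and would have to be chosen separately.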
Acknowledgments
The author thanks the Tufts University High Performance Compute Cluster that was utilized for the research reported in this paper.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.3c01212.
Statistical tests comparing the MAP estimator and the posterior mean estimator of BayesMBAR using the uniform prior distribution for computing free energy differences, saved configurations being uncorrelated in computing the hydration free energy of phenol, the standard error of estimating the ensemble mean potential energy using saved configurations sampled from alchemical states of phenol, and running time of MBAR and BayesMBAR (PDF)
The author declares no competing financial interest.
Special Issue
Published as part of the Journal of Chemical Theory and Computation virtual special issue “Machine Learning and Statistical Mechanics: Shared Synergies for Next Generation of Chemical Theory and Computation”.
References
- Chodera J. D.; Mobley D. L.; Shirts M. R.; Dixon R. W.; Branson K.; Pande V. S. Alchemical Free Energy Methods for Drug Discovery: Progress and Challenges. Curr. Opin. Struct. Biol. 2011, 21, 150–160. 10.1016/j.sbi.2011.01.011.
- Mobley D. L.; Bayly C. I.; Cooper M. D.; Shirts M. R.; Dill K. A. Small Molecule Hydration Free Energies in Explicit Solvent: An Extensive Test of Fixed-Charge Atomistic Simulations. J. Chem. Theory Comput. 2009, 5, 350–358. 10.1021/ct800409d.
- Panagiotopoulos A. Z. Monte Carlo Methods for Phase Equilibria of Fluids. J. Phys.: Condens. Matter 2000, 12, R25. 10.1088/0953-8984/12/3/201.
- Dybeck E. C.; Abraham N. S.; Schieber N. P.; Shirts M. R. Capturing Entropic Contributions to Temperature-Mediated Polymorphic Transformations Through Molecular Modeling. Cryst. Growth Des. 2017, 17, 1775–1787. 10.1021/acs.cgd.6b01762.
- Schieber N. P.; Dybeck E. C.; Shirts M. R. Using Reweighting and Free Energy Surface Interpolation to Predict Solid-Solid Phase Diagrams. J. Chem. Phys. 2018, 148, 144104. 10.1063/1.5013273.
- Chipot C.; Pohorille A. Free Energy Calculations; Springer, 2007; Vol. 86.
- Shirts M. R.; Chodera J. D. Statistically Optimal Analysis of Samples from Multiple Equilibrium States. J. Chem. Phys. 2008, 129, 124105. 10.1063/1.2978177.
- Tan Z.; Gallicchio E.; Lapelosa M.; Levy R. M. Theory of Binless Multi-State Free Energy Estimation with Applications to Protein-Ligand Binding. J. Chem. Phys. 2012, 136, 144102. 10.1063/1.3701175.
- Kong A.; McCullagh P.; Meng X.-L.; Nicolae D.; Tan Z. A Theory of Statistical Models for Monte Carlo Integration. J. R. Stat. Soc. Ser. B Methodol. 2003, 65, 585–604. 10.1111/1467-9868.00404.
- Geyer C. J. Estimating Normalizing Constants and Reweighting Mixtures; 1994.
- Berger J. O. Statistical Decision Theory and Bayesian Analysis; Springer Science & Business Media, 2013.
- Stecher T.; Bernstein N.; Csányi G. Free Energy Surface Reconstruction from Umbrella Samples Using Gaussian Process Regression. J. Chem. Theory Comput. 2014, 10, 4079–4097. 10.1021/ct500438v.
- Shirts M. R.; Ferguson A. L. Statistically Optimal Continuous Free Energy Surfaces from Biased Simulations and Multistate Reweighting. J. Chem. Theory Comput. 2020, 16, 4107–4125. 10.1021/acs.jctc.0c00077.
- Habeck M. Bayesian Reconstruction of the Density of States. Phys. Rev. Lett. 2007, 98, 200601. 10.1103/PhysRevLett.98.200601.
- Habeck M. Bayesian Estimation of Free Energies from Equilibrium Simulations. Phys. Rev. Lett. 2012, 109, 100601. 10.1103/PhysRevLett.109.100601.
- Ferguson A. L. BayesWHAM: A Bayesian Approach for Free Energy Estimation, Reweighting, and Uncertainty Quantification in the Weighted Histogram Analysis Method. J. Comput. Chem. 2017, 38, 1583–1605. 10.1002/jcc.24800.
- Maragakis P.; Ritort F.; Bustamante C.; Karplus M.; Crooks G. E. Bayesian Estimates of Free Energies from Nonequilibrium Work Data in the Presence of Instrument Noise. J. Chem. Phys. 2008, 129, 024102. 10.1063/1.2937892.
- Bennett C. H. Efficient Estimation of Free Energy Differences from Monte Carlo Data. J. Comput. Phys. 1976, 22, 245–268. 10.1016/0021-9991(76)90078-4.
- Neal R. M. MCMC Using Hamiltonian Dynamics. Handbook of Markov Chain Monte Carlo; CRC Press, 2011; Vol. 2, p 2.
- Hoffman M. D.; Gelman A. The No-U-turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 2014, 15, 1593–1623.
- Skilling J. Nested Sampling for General Bayesian Computation. Bayesian Anal. 2006, 1, 833–859.
- Meng X.-L.; Schilling S. Warp Bridge Sampling. J. Comput. Graph Stat. 2002, 11, 552–586. 10.1198/106186002457.
- Meng X.-L.; Wong W. H. Simulating Ratios of Normalizing Constants via a Simple Identity: A Theoretical Exploration. Stat. Sin. 1996, 6, 831–860.
- Rasmussen C. E.; Williams C. K. I. Gaussian Processes for Machine Learning; The MIT Press, 2005.
- Liu D. C.; Nocedal J. On the Limited Memory BFGS Method for Large Scale Optimization. Math. Program. 1989, 45, 503–528. 10.1007/BF01589116.
- Carpenter B.; Gelman A.; Hoffman M. D.; Lee D.; Goodrich B.; Betancourt M.; Brubaker M.; Guo J.; Li P.; Riddell A. Stan: A Probabilistic Programming Language. J. Stat. Software 2017, 76, 1–32. 10.18637/jss.v076.i01.
- Xu K.; Ge H.; Tebbutt W.; Tarek M.; Trapp M.; Ghahramani Z. AdvancedHMC.Jl: A Robust, Modular and Efficient Implementation of Advanced HMC Algorithms. Proceedings of the 2nd Symposium on Advances in Approximate Bayesian Inference; 2020; pp 1–10.
- Lao J.; Louf R. Blackjax: A Sampling Library for JAX; 2020.
- Jordan M. I.; Ghahramani Z.; Jaakkola T. S.; Saul L. K. In Learning in Graphical Models; Jordan M. I., Ed.; Springer Netherlands: Dordrecht, 1998; pp 105–161.
- Kingma D. P.; Welling M. Auto-Encoding Variational Bayes; 2022.
- Bradbury J.; Frostig R.; Hawkins P.; Johnson M. J.; Leary C.; Maclaurin D.; Necula G.; Paszke A.; VanderPlas J.; Wanderman-Milne S.; Zhang Q. JAX: Composable Transformations of Python+NumPy Programs; 2018.
- Beutler T. C.; Mark A. E.; van Schaik R. C.; Gerber P. R.; van Gunsteren W. F. Avoiding Singularities and Numerical Instabilities in Free Energy Calculations Based on Molecular Simulations. Chem. Phys. Lett. 1994, 222, 529–539. 10.1016/0009-2614(94)00397-1.
- Wang J.; Wolf R. M.; Caldwell J. W.; Kollman P. A.; Case D. A. Development and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–1174. 10.1002/jcc.20035.
- Jorgensen W. L.; Chandrasekhar J.; Madura J. D.; Impey R. W.; Klein M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926–935. 10.1063/1.445869.
- Darden T.; York D.; Pedersen L. Particle Mesh Ewald: An N·log(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98, 10089–10092. 10.1063/1.464397.
- Eastman P.; Swails J.; Chodera J. D.; McGibbon R. T.; Zhao Y.; Beauchamp K. A.; Wang L.-P.; Simmonett A. C.; Harrigan M. P.; Stern C. D.; Wiewiora R. P.; Brooks B. R.; Pande V. S. OpenMM 7: Rapid Development of High Performance Algorithms for Molecular Dynamics. PLoS Comput. Biol. 2017, 13, e1005659. 10.1371/journal.pcbi.1005659.
- Zhang Z.; Liu X.; Yan K.; Tuckerman M. E.; Liu J. Unified Efficient Thermostat Scheme for the Canonical Ensemble with Holonomic or Isokinetic Constraints via Molecular Dynamics. J. Phys. Chem. A 2019, 123, 6056–6079. 10.1021/acs.jpca.9b02771.
- Åqvist J.; Wennerström P.; Nervall M.; Bjelic S.; Brandsdal B. O. Molecular Dynamics Simulations of Water and Biomolecules with a Monte Carlo Constant Pressure Algorithm. Chem. Phys. Lett. 2004, 384, 288–294. 10.1016/j.cplett.2003.12.039.
- Chodera J. D. A Simple Method for Automated Equilibration Detection in Molecular Simulations. J. Chem. Theory Comput. 2016, 12, 1799–1805. 10.1021/acs.jctc.5b00784.
- Ding X.; Vilseck J. Z.; Brooks C. L., III. Fast Solver for Large Scale Multistate Bennett Acceptance Ratio Equations. J. Chem. Theory Comput. 2019, 15, 799–802. 10.1021/acs.jctc.8b01010.