Abstract
Umbrella sampling efficiently yields equilibrium averages that depend on exploring rare states of a model by biasing simulations to windows of coordinate values and then combining the resulting data with physical weighting. Here, we introduce a mathematical framework that casts the step of combining the data as an eigenproblem. The advantage to this approach is that it facilitates error analysis. We discuss how the error scales with the number of windows. Then, we derive a central limit theorem for averages that are obtained from umbrella sampling. The central limit theorem suggests an estimator of the error contributions from individual windows, and we develop a simple and computationally inexpensive procedure for implementing it. We demonstrate this estimator for simulations of the alanine dipeptide and show that it emphasizes low free energy pathways between stable states in comparison to existing approaches for assessing error contributions. Our work suggests the possibility of using the estimator and, more generally, the eigenvector method for umbrella sampling to guide adaptation of the simulation parameters to accelerate convergence.
I. INTRODUCTION
One of the main uses of molecular simulations is the calculation of equilibrium averages. For understanding reaction processes, the free energy projected onto selected coordinates (collective variables) is of special interest. It relates directly to the probabilities of the coordinates taking particular values, and it can provide valuable information about the stable states, the barriers between them, and the origin of their stabilization. Furthermore, it is the starting point for most rate theories. Although in principle the free energy can be estimated from a long unbiased simulation, in practice doing so is challenging because bottlenecks slow the exploration of the configuration space. In other words, transitions between regions of the space are very infrequent in comparison to local fluctuations.
Various methods have been introduced to overcome this problem. Here, we consider one of the oldest and still most widely used such methods, umbrella sampling (US).1,2 In this approach, the collective-variable interval of interest is covered by a series of simulations, in each of which the system is biased such that sampling is restricted to a relatively narrow window of values of the collective variables. This can be accomplished by addition of a biasing potential that is small in the window and large outside it. The information from the different simulations must be combined, and the effect of the bias removed, to obtain the overall free energy profile. This requires consistently normalizing the probabilities in different windows, a task that is complicated by the fact that the simulations are run independently.
Considerable effort has been devoted to determining how best to combine the results from different simulations. Initially, researchers manually adjusted the zero of free energy in each window to make the full free energy profile continuous and, often, smooth; conflicting results arising from limited sampling at the window peripheries were removed. The desire to use all the simulation data motivated the introduction of estimators that allow for systematically combining the data from different simulations. By far, the most widely used of these in chemical physics applications is the weighted histogram analysis method (WHAM). The multistate Bennett acceptance ratio (MBAR) method, as it is referred to in the molecular-simulation literature and will be referred to here, is closely related but does not rely on binning the data.3–6 Both WHAM and MBAR can be derived from maximum-likelihood or minimum asymptotic variance principles, assuming independent, identically distributed sampling in each window, and have corresponding statistical optimality properties under those conditions. Recent extensions seek to improve performance when the sampling is limited and to extend the algorithm to more general ensembles.7,8
In the present paper, we introduce an alternative scheme for estimating the free energy from US simulation data. In this approach, the normalization constants needed to combine information from separate simulation windows are the components of the eigenvector of a stochastic matrix that can be constructed from running averages in the windows. We thus term our method Eigenvector Method for Umbrella Sampling (EMUS). The advantage of our method is that it lends itself to error analysis. Following previous work,9–11 we measure error with the asymptotic variance.
Our paper is organized as follows. After giving some background on US in Section II, we formulate EMUS in Section III. In Sections IV and V, we show that EMUS performs comparably to WHAM and MBAR, and discuss its connection with the latter. In Section VI, we use scaling arguments with simplifying assumptions to show that accounting for the error associated with combining the data is important and limits the speedups that can be achieved by increasing the number of simulation windows. In Section VII, we provide the full numerical analysis, which applies generally, without simplifying assumptions. Specifically, we derive a central limit theorem for averages from EMUS and use it to develop a means for estimating the error contributions from individual windows. We demonstrate the method for the free energy projected onto the ϕ and ψ dihedral angles of the alanine dipeptide and compare the error contributions with those from an estimator introduced by Zhu and Hummer.12 We conclude in Section VIII.
II. BACKGROUND ON UMBRELLA SAMPLING
Here, we review umbrella sampling and establish basic terms and notation. The goal is the calculation of an average of an observable g over a time-independent probability distribution π,
(1) ⟨g⟩ = ∫ g(x) π(x) dx.
At thermal equilibrium, π is the Boltzmann distribution,
(2) π(x) = exp[−H0(x)/kBT] / ∫ exp[−H0(x′)/kBT] dx′,
where H0 is the system Hamiltonian, kB is Boltzmann’s constant, and T is the temperature. In particular, we can express the free energy difference between two states S1 and S2 as
(3) ΔF = −kBT ln( ⟨1S2⟩ / ⟨1S1⟩ ),
where 1S is the indicator function
(4) 1S(x) = 1 if x ∈ S and 1S(x) = 0 otherwise.
Similarly, the reversible work to constrain a collective variable q(x) to a particular value q′, also known as the potential of mean force (PMF), may be written as
(5) W(q′) = −kBT ln ⟨δ(q(x) − q′)⟩.
For complex systems, averages of the form in (1) must be evaluated numerically. Typically, this is done by generating a chain of related configurations, Xt, using Metropolis Monte Carlo methods or molecular dynamics, and by assuming ergodicity. Namely, as the number of configurations N goes to infinity, ⟨g⟩ is the limit of the sample mean,
(6) ⟨g⟩ = limN→∞ (1/N) Σt g(Xt).
In all practical sampling methods, successive configurations are strongly correlated. While ergodicity guarantees that sample means converge to averages over π, convergence can be extremely slow if the correlation between subsequent points is strong. This is the case when sampling π relies on visiting low-probability states, such as transition states of chemical reactions.
US methods address this issue by enforcing sampling of different regions of configuration space (windows), introducing L nonnegative bias functions ψi and then using L independent simulations to sample from the biased probability distributions,
(7) πi(x) = ψi(x) π(x) / ∫ ψi(x′) π(x′) dx′.
The essential idea is that sampling each πi is fast because ψi is chosen so that relatively likely states under πi are not separated by relatively unlikely states. This is accomplished by restricting the set of states on which ψi is non-negligible so that π is closer to constant on that set. In Section VI C we make this point more carefully by examining a regime in which umbrella sampling can be shown to be exponentially more efficient than direct simulation. A popular choice is to use bias functions that take a Gaussian form,
(8) ψi(x) = exp[ −ki (q(x) − qi0)² / (2kBT) ],
such that
(9) πi(x) ∝ exp[ −( H0(x) + ki (q(x) − qi0)²/2 ) / kBT ].
This corresponds to adding a harmonic potential centered at qi0 with spring constant ki to the system Hamiltonian. We call the relative normalization constant of the ith biased distribution zi,
(10) zi = ∫ ψi(x) π(x) dx / Σj ∫ ψj(x) π(x) dx.
We also define the free energy in window i as
(11) −kBT ln zi.
We denote averages over the biased distributions by
(12) ⟨g⟩i = ∫ g(x) πi(x) dx.
Overall averages of interest, 〈g〉, can be estimated as zi-weighted sums of averages computed in each of the windows. We detail our prescription in Section III.
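For concreteness, the Gaussian bias in (8) and the equivalent harmonic restraint in (9) can be set up as in the short sketch below; the helper name, the value of kB, and the unit choices are assumptions made only for this illustration.

```python
import numpy as np

kB = 0.0019872041  # kcal/(mol K); value assumed for this sketch
T = 300.0          # K

def make_window(center, k):
    """Return the Gaussian bias function psi_i and the harmonic bias potential
    U_i(q) = k (q - center)^2 / 2 that generates it via psi_i = exp(-U_i/(kB T))."""
    def psi(q):
        return np.exp(-k * (q - center) ** 2 / (2.0 * kB * T))
    def bias_potential(q):
        return 0.5 * k * (q - center) ** 2
    return psi, bias_potential

# Example: 20 windows spanning a dihedral angle from -180 to 180 degrees.
centers = np.linspace(-180.0, 180.0, 20, endpoint=False)
windows = [make_window(c, k=0.0076) for c in centers]  # k in kcal/(mol deg^2), illustrative
```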
III. THE EIGENVECTOR METHOD FOR UMBRELLA SAMPLING
In this section, we present the Eigenvector Method for Umbrella Sampling (EMUS). We begin by defining
(13) g*(x) = g(x) / Σj ψj(x)
for any function g. Then, we observe that
(14) ⟨g⟩ = ∫ g(x) π(x) dx = Σi ∫ g*(x) ψi(x) π(x) dx = Σi ( Σj ∫ ψj(x′) π(x′) dx′ ) zi ⟨g*⟩i.
The factor in parentheses can be taken out of the sum over i. To express this factor in terms of computable averages, we repeat the same steps with g = 1,
(15) 1 = ⟨1⟩ = Σi ( Σj ∫ ψj(x′) π(x′) dx′ ) zi ⟨1*⟩i,
so that, dividing (14) by (15),
(16) ⟨g⟩ = Σi zi ⟨g*⟩i / Σi zi ⟨1*⟩i.
Consequently, if we can evaluate the zi, the ⟨g*⟩i, and the ⟨1*⟩i, then we can assemble the original average of interest. The averages can be computed from sequences (typically independent for each i) that sample the πi. Umbrella sampling methods differ primarily in how the zi are computed.
To express the constants zi in terms of averages over the biased distributions, we take g(x) = ψj(x) in (14). Then, zi solves
Σi zi Fij = zj,   j = 1, …, L,
where
(17) Fij = ⟨ψj*⟩i.
That is, the vector of normalization constants z is a left eigenvector of the matrix F with eigenvalue one. Under conditions to be elaborated in Section III B, the solution to (17) is uniquely specified when we notice that
(18) Σi zi = 1.
A. Computational procedure
In the EMUS algorithm, we estimate the entries of F and the averages ⟨g*⟩i and ⟨1*⟩i by sample means, then assemble the estimate of ⟨g⟩ using (16). To be precise, we denote the sample means by
(19) F̄ij = (1/Ni) Σt ψj*(Xti),
(20) ḡ*i = (1/Ni) Σt g*(Xti),
(21) 1̄*i = (1/Ni) Σt 1*(Xti),
where Xti, t = 1, …, Ni, denotes the sequence of configurations sampled in window i.
EMUS proceeds as follows:
1. Choose the biasing functions ψi.
2. Compute trajectories that sample states from the biased distributions πi.
3. Calculate the matrix F̄ and the averages ḡ*i and 1̄*i.
4. Solve the eigenproblem
(22) Σi ziEMUS F̄ij = zjEMUS,   j = 1, …, L,   normalized so that Σi ziEMUS = 1,
for zEMUS.
5. Compute the estimate of ⟨g⟩,
(23) ⟨g⟩EMUS = Σi ziEMUS ḡ*i / Σi ziEMUS 1̄*i,
by substituting zEMUS and the sample means in (16); a code sketch of steps 3–5 is given below.
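A minimal sketch of steps 3–5, assuming the raw bias-function values and the observable have already been evaluated along each window's trajectory, might look as follows in Python; the function and argument names are hypothetical, and this is an illustration rather than the released implementation mentioned next.

```python
import numpy as np

def emus_estimate(psi_traj, g_traj):
    """Minimal sketch of the EMUS procedure (steps 3-5).

    psi_traj[i] : (N_i, L) array of raw bias-function values psi_j(X_t) along
                  the window-i trajectory.
    g_traj[i]   : (N_i,) array of g(X_t) along the window-i trajectory.
    Returns the normalization constants z and the EMUS estimate of <g>.
    """
    L = len(psi_traj)
    Fbar = np.empty((L, L))
    gbar = np.empty(L)
    onebar = np.empty(L)
    for i in range(L):
        denom = psi_traj[i].sum(axis=1)                        # sum_j psi_j(X_t)
        Fbar[i] = (psi_traj[i] / denom[:, None]).mean(axis=0)  # Eq. (19)
        gbar[i] = (g_traj[i] / denom).mean()                   # Eq. (20)
        onebar[i] = (1.0 / denom).mean()                       # Eq. (21)
    # Left eigenvector of Fbar with eigenvalue one, normalized to sum to one.
    evals, evecs = np.linalg.eig(Fbar.T)
    z = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    z /= z.sum()
    return z, np.dot(z, gbar) / np.dot(z, onebar)
```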
We have provided an implementation of this algorithm online, along with implementations of the iteration described in Section IV and the asymptotic error estimate found in Section VII B 1.14 We remark that when one wishes to compute a free energy difference or a ratio of two observables, as in Equation (3), it is not necessary to compute the 1̄*i. Instead, one may use the formula
(24) ⟨g1⟩ / ⟨g2⟩ = Σi zi ⟨g1*⟩i / Σi zi ⟨g2*⟩i,
where g1 and g2 are arbitrary functions.
B. The eigenvector problem
In this section, we give conditions under which the eigenvector problem has a unique solution. First, we show that F is a stochastic matrix; that is, each element Fij is nonnegative and every row of F sums to one:
(25) Σj Fij = ⟨ Σj ψj* ⟩i = ⟨1⟩i = 1.
The entries of F are nonnegative since we require that the bias functions be nonnegative. One can show that the sample-mean matrix F̄ is also stochastic by similar arguments.
A stochastic matrix J has a unique left eigenvector with eigenvalue one (up to normalization) if it is irreducible: for every possible grouping of the indices into two nonempty, disjoint sets, A and B, Jij ≠ 0 for some i ∈ A and j ∈ B.15 In fact, this statement remains true when J is nonnegative with largest eigenvalue equal to one. For any such matrix we let z(J) denote the continuous function returning the unique left eigenvector of J corresponding to eigenvalue one, normalized so that its entries sum to one.
In the case of the particular stochastic matrix F defined in (17), these statements imply that if, for any division of the indices into sets A and B, there is a sufficient overlap between the sets ∪i∈A{x : ψi(x) > 0} and ∪j∈B{x : ψj(x) > 0}, then there will be a unique solution z(F) to (17), which necessarily equals the relative normalization constants z defined in (10). Because z(J) is a continuous function of its arguments, z(F̄) converges to z as F̄ converges to F. Consequently, EMUS produces a consistent estimator in the sense that if the sample averages used to estimate the entries Fij and the averages ⟨g*⟩i and ⟨1*⟩i converge (in probability or with probability one) to the true values, then the estimate of ⟨g⟩ also converges (in the same sense).
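As a small side illustration (not part of the estimator itself), irreducibility of a nonnegative L × L matrix such as F̄ can be checked numerically with the classical criterion that (I + F̄) raised to the power L − 1 has strictly positive entries; the helper below is a sketch under that criterion.

```python
import numpy as np

def is_irreducible(F):
    """Return True if the nonnegative square matrix F is irreducible, using the
    criterion that (I + F)^(n-1) is entrywise positive for an n x n matrix."""
    n = F.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + F, n - 1)
    return bool(np.all(M > 0.0))
```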
IV. THE CONNECTION BETWEEN EMUS AND MBAR
Building upon earlier work in the statistics literature,3,4,16 Shirts and Chodera6 suggested a class of algorithms for estimating free energy differences between states, which they termed MBAR. This method is similar to WHAM but does not require binning the simulation data to form histograms (see Tan et al.9). In this section, we explain the relation between EMUS and MBAR.6 We also derive a new iterative method for solving the MBAR equations, and we show that our iteration leads naturally to a new family of related consistent estimators.
The starting point of Shirts and Chodera6 is the identity (see their (5))
(26) |
where αij(x) is an arbitrary function. They proposed the choice
(27) |
where ni is the number of uncorrelated samples in window i. Substituting (27) into (26) gives
(28) zj = Σi ni ⟨ ψj / Σk (nk/zk) ψk ⟩i.
We can cast (28) in a form reminiscent of EMUS by writing
(29) zj = Σi zi Fij(z),
where
(30) Fij(w) = (ni/wi) ⟨ ψj / Σk (nk/wk) ψk ⟩i
for any vector w with positive entries. EMUS corresponds to setting w = n so that
(31) Fij(n) = ⟨ ψj / Σk ψk ⟩i = ⟨ψj*⟩i = Fij,
and (26) reduces to the eigenproblem (17).
In practice, one must replace the matrix Fij(w) in (30) by the sample mean approximation
(32) F̄ij(w) = (ni/wi) (1/Ni) Σt [ ψj(Xti) / Σk (nk/wk) ψk(Xti) ].
Substituting F̄ij(z) for Fij(z) in (29) yields the equation
(33) zjMBAR = Σi ziMBAR F̄ij(zMBAR)
for zMBAR, which we refer to here as the MBAR estimator. If the samples are independent, MBAR is the nonparametric maximum-likelihood estimator of z.3
In practice, the samples are not independent for a given i, and the ni must be estimated from data. Several algorithms for estimating the ni have been proposed.17–19 Shirts and Chodera6 base their estimates on the integrated autocorrelation times of physically motivated coordinates, and we follow this common practice here. In fact, once the ni have been estimated, Shirts and Chodera6 suggest replacing sample averages over all Ni points by sample averages over the ni points obtained by including only every Ni/ni-th sample along the trajectory. We note that both the subsampling approach and the one in (32) correspond to approximations of expression (26) with (27), and we regard both as variations on the MBAR estimator. When the samples are independent, the two approaches are the same. In tests of the iterative EMUS algorithm introduced below, we find estimates to be insensitive to the choice of ni and they can be set equal to 1, though in that case the estimator no longer corresponds directly to MBAR.
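One concrete (and deliberately crude) way to carry out the estimation of ni and the subsampling just described is sketched below: the integrated autocorrelation time of a physically motivated coordinate is estimated with a simple truncated sum, and one frame is kept per statistical inefficiency. The function names are hypothetical, and production work would use a more careful estimator such as ACOR or the time-series utilities distributed with pyMBAR.

```python
import numpy as np

def statistical_inefficiency(x, max_lag=None):
    """Crude estimate of g = 1 + 2*tau from a truncated autocorrelation sum;
    robust estimators (e.g., ACOR) should be preferred in practice."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    if max_lag is None:
        max_lag = max(2, n // 10)
    c0 = np.dot(x, x) / n
    rho = [np.dot(x[:-t], x[t:]) / (n * c0) for t in range(1, min(max_lag, n - 1))]
    return 1.0 + 2.0 * float(np.sum(rho))

def subsample_window(frames, colvar):
    """Keep roughly uncorrelated frames from one window: every g-th frame,
    with g estimated from the collective-variable time series `colvar`.
    Returns the retained frames and n_i, the effective number of samples."""
    g = max(1, int(np.ceil(statistical_inefficiency(colvar))))
    kept = frames[::g]
    return kept, len(kept)
```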
As written above, the MBAR estimator (33) resembles an eigenvector problem. However, the dependence of F̄ij(z) on z implies that the solution must be obtained self-consistently. The approach advocated by Shirts and Chodera for computing the MBAR estimator corresponds in the framework described here to solving (33) by a Newton-type iteration. However, the eigenvector form of (33) suggests an alternative approach. Rather than Newton’s method, we employ the following algorithm:
1. As an initial guess for zMBAR, choose a vector z0 with positive entries. Estimate the ni. Set m = 0.
2. (a) Calculate F̄(zm) according to (32).
   (b) Calculate a new estimate zm+1 of zMBAR by solving the eigenproblem
   (34) Σi zim+1 F̄ij(zm) = zjm+1,   j = 1, …, L,   normalized so that Σj zjm+1 = 1.
3. If zm+1 and zm differ by more than a chosen tolerance,
   (a) increment m;
   (b) go to Step 2.
A similar algorithm was proposed by Meng and Wong.20
To show that this iteration makes sense, we must prove that the eigenproblem (34) always has a unique solution and that zm converges to zMBAR as m goes to infinity. To see that the eigenproblem has a solution, first observe that if F̄(w) is irreducible for one vector w with positive entries, then it is irreducible for all vectors with positive entries. When applying the EMUS method, we thus assume that F̄(w) is irreducible. Moreover, observe that for any positive vector w, the vector with entries ni/wi is a right eigenvector of F̄(w) with eigenvalue one and positive entries. It follows from the Perron-Frobenius theorem that the matrix F̄(w) has a unique left eigenvector with eigenvalue one and that this eigenvector has positive entries. Thus, the eigenproblem always has a unique solution. We do not have a proof that the iterates converge. However, since zMBAR is a fixed point of the iteration, if the iterates do converge, their limit must be zMBAR. In practice, we find that the iteration converges quickly, usually to a relative error of 10−6 within 10 iterates.
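Under the same assumed data layout as the earlier sketch (per-frame raw bias-function values, with the ni estimated separately), the fixed-point iteration above might be implemented as follows; the names and the simple convergence test are illustrative and do not reproduce any particular released code.

```python
import numpy as np

def solve_z(Fbar):
    """Left eigenvector with eigenvalue one of a nonnegative matrix whose
    Perron eigenvalue is one, normalized so that its entries sum to one."""
    evals, evecs = np.linalg.eig(Fbar.T)
    z = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return z / z.sum()

def iterative_emus(psi_traj, n, tol=1e-6, max_iter=50):
    """Self-consistent (iterative EMUS) solution of the MBAR equations.

    psi_traj[i] : (N_i, L) array of raw bias-function values psi_j(X_t) in window i.
    n           : length-L array of effective numbers of uncorrelated samples.
    """
    n = np.asarray(n, dtype=float)
    L = len(psi_traj)
    z = n / n.sum()                                 # initial guess proportional to n
    for _ in range(max_iter):
        Fbar = np.empty((L, L))
        for i in range(L):
            psi = psi_traj[i]                       # (N_i, L)
            denom = psi @ (n / z)                   # sum_k (n_k/z_k) psi_k(X_t)
            Fbar[i] = (n[i] / z[i]) * (psi / denom[:, None]).mean(axis=0)
        z_new = solve_z(Fbar)                       # eigenproblem, as in step 2(b)
        if np.max(np.abs(z_new - z) / z) < tol:
            return z_new
        z = z_new
    return z
```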
In addition to its apparently rapid convergence, another argument in favor of the algorithm that we introduce above for solving (33) is that each iteration of the scheme results in a new consistent estimator. We will use the term iterative EMUS to refer to this family of estimators. With the initial guess z0 = n, the result, z1, of the first iteration is the EMUS estimator defined in Section III. In the Appendix, we show that for any fixed finite number of iterations m, zm is also a consistent estimator of the vector z of normalization constants. By contrast, other schemes, such as Newton’s method, for solving (33) may require that the number of iterations goes to infinity to obtain a consistent estimate. We also remark that the consistency result in the Appendix holds as long as the ni converge to non-random, positive values with increasing numbers of samples Ni. They can be chosen as described above, or simply set to a fixed value.
Differences between the iterative EMUS scheme above and the application of Newton’s method proposed by Shirts and Chodera6 are mostly matters of implementation. As we will see in Section V, the results are not very sensitive to these computational details; most of the accuracy in the iterative EMUS approach is achieved in the first step. In any case, we remind the reader that the primary goal of this paper is to characterize those properties of the broader umbrella sampling approach that are essential to its success, not to analyze details of implementation.
While we focus here on potentials of mean force, the MBAR estimator has been applied to a broader category of free energy problems, including the analysis of single-molecule pulling experiments and alchemical free energy calculations.6,21,22 The close relation between EMUS and MBAR indicates that error analysis of EMUS may provide insight into the sources of error in MBAR for these problems, but we do not pursue this idea further in the present work.
V. NUMERICAL COMPARISON
To test the algorithm numerically, we performed 100 independent umbrella sampling calculations for the PMF of the ϕ coordinate of the alanine dipeptide (i.e., N-acetyl-alanyl-N′-methylamide) in vacuum. Simulations were run using GROMACS version 5.1.1 with harmonic bias potentials applied using the PLUMED 2.2.0 software package.23,24 The molecule was represented by the CHARMM 27 force field without CMAP corrections,25 with covalent bonds to hydrogen atoms constrained by the LINCS algorithm.26 Twenty windows were evenly spaced along the ϕ dihedral angle. The force constant was ki = 0.007 605 35 kcal mol−1 deg−2, such that the standard deviation of the Gaussian bias functions was 9∘. In each window, we integrated the equations of motion with the GROMACS leap-frog Langevin integrator with a 1 fs time step and a time constant of 0.1 ps. The temperature was 300 K. The system was equilibrated for 40 ps and then sampled for 100 ps, saving structures every 10 fs.
The data were then analyzed with EMUS, Grossfield’s implementation of WHAM,27 and the algorithm proposed by Zhu and Hummer (ZH) (see Equation (A1) and the discussion following it in Appendix A of that paper12). The data were also analyzed with pyMBAR;6 as pyMBAR gave results virtually identical to WHAM, the results are not shown. In Figure 1, we show the resulting average potentials of mean force, as well as the standard deviation of the estimates over the 100 runs. WHAM and EMUS converge to the same result. This is to be expected, as both algorithms are consistent (i.e., they converge to the exact result as the number of samples in each window tends to infinity; see Section VII), although WHAM exhibits a small bias from the binning of data for the histograms.6 The standard deviations of the free energies are slightly higher for EMUS than for the other two algorithms, but these differences are negligible compared with those expected for physically weighted simulations. Moreover, the relative performances are likely to be problem dependent. We note that ZH is based on thermodynamic integration; the finite number of integration points causes quadrature error28 and, in turn, a systematic error in the barrier height.
In Figure 1(c), we apply the self-consistent iteration described in Section IV. For this calculation, we estimate the number of independent samples in each window (ni) from the integrated autocorrelation time of the ϕ dihedral angle time series. We plot the standard deviation of the values of z calculated after the first iteration (EMUS), the second iteration, and after convergence to a relative residual smaller than 10−6. In general, convergence is achieved after an average of 9 iterations; none of the 100 data sets required more than 15 iterations. However, we note that after two iterations, the estimates of z already have a standard deviation equivalent to that of the WHAM algorithm. In this article we focus on the scheme corresponding to the first iteration only and do not attempt to analyze the improvement due to multiple iterations. In our tests the performance gain from multiple iterations is negligible compared to the improvement over direct approximation of free energy differences using long unbiased trajectories.
VI. JUSTIFICATION FOR UMBRELLA SAMPLING BY SCALING ARGUMENTS
The quality of a statistical estimate from umbrella sampling depends strongly on the choices made for the simulation windows. In this section, we discuss how the error scales as properties of the simulation change. We begin in Section VI A with a description of a prevalent justification for the use of US. We show in Section VI B that this argument is incomplete and, in turn, misleading. In Section VI C, we provide an alternative justification; namely, we show that in the low temperature limit, the cost to achieve a fixed accuracy by US grows slowly compared to direct simulation. In this section we make several simplifying assumptions that allow us to draw precise conclusions about the scaling properties of EMUS. In Section VII we provide error bounds for EMUS under much more general assumptions.
A. Scaling in the limit of many windows
To justify umbrella sampling, it is often suggested that the total computational time required to accurately sample statistics is inversely proportional to the number of windows, L.17,29–32 The argument for this scaling proceeds as follows.
• Divide a one-dimensional collective variable space into L windows of equal length, inversely proportional to L (i.e., L−1).
• Assume the windows are small enough that no free energy barriers exist in each window. The time to explore a window should be diffusion limited and proportional to the length of the window squared. Therefore, the simulation time required to accurately sample statistics in one window is also proportional to L−2.
• Because there are L windows, the total simulation time required to compute averages to fixed accuracy should scale as L × L−2 = L−1.
While this argument is now standard,17,29–32 Virnau and Müller33 observed that the error for computing the free energy difference between phases of Lennard-Jones particles with an approximately fixed amount of sampling was insensitive to the number of windows in practice, and they noted that the argument above neglects the error associated with combining the data from different simulation windows. Nguyen and Minh recently made a similar suggestion for a related class of methods.34 This intuition is supported by our analysis in Section VI B, which shows that the total computational cost to achieve a fixed accuracy should be at best insensitive to the choice of L, so long as it is sufficiently large.
B. A simple model problem
To perform a more precise analysis, we make a number of simplifying assumptions. We emphasize that these assumptions are in force only for the purposes of the scaling arguments in this section. We provide more general error bounds for EMUS in Section VII.
Assumption 6.1.
The total computation time, N, is divided equally among the windows such that Ni = N/L.
Assumption 6.2.
The ψi are functions on the one-dimensional interval [0, 1], and the set of points where ψi is non-zero, {q : ψi > 0}, is an interval of length |{q : ψi > 0}| ≤ γ/L. We also assume that ψiψj = 0 unless |j − i| ≤ 1. Consequently, both the exact matrix F and the sample mean are tri-diagonal. This assumption clearly does not hold when the bias functions ψi are Gaussian. Nonetheless, the rapid decay of Gaussian bias functions away from their peaks guarantees that entries of F and far from the diagonal are very small, such that we expect our conclusions to still hold (though their justification would be more complicated).
Assumption 6.3.
The overlap of ψi and ψi±1 (i.e., the integral of their product) is large enough that
(35) Fi,i−1 ≥ δ and Fi,i+1 ≥ δ for some δ > 0, for all L and for all i ≤ L. If our last assumption holds, but this one does not, then we can find more than one vector z satisfying Equations (17) and (18). This assumption is a slightly stronger version of the notion of irreducibility that we defined earlier (see Section III B). Note that we require the irreducibility to hold uniformly in the large L limit, and we thus introduce the δ, which is independent of L.
Assumption 6.4.
Sample averages computed in different windows are independent, i.e., the sample means computed from the window-i data and those computed from the window-j data are independent for j ≠ i. We do not assume (here or anywhere else in this paper) that samples generated within a single window are independent. Indeed, even if the samples from πi are independent, F̄i,i−1 and F̄i,i+1 are dependent random variables.
As an example average, let us consider the error in the free energy difference between the first and last windows,
(36) −kBT ln( zL / z1 ).
Assumption 6.2 is sufficient for z to be in detailed balance with respect to F (Kelly, Ref. 35, Lemma 1.5 and Section 1.3),
(37) zi Fi,i+1 = zi+1 Fi+1,i,   i = 1, …, L − 1.
Using (37) recursively,
(38) zL / z1 = Πi=1…L−1 Fi,i+1 / Fi+1,i,   i.e.,   ln( zL / z1 ) = Σi=1…L−1 ln( Fi,i+1 / Fi+1,i ).
To understand the error (variance) of the terms in (38), we must further specify Fi,i+1 and Fi,i−1.
Assumption 6.5.
For Nmin and Lmin sufficiently large, when Ni ≥ Nmin and L ≥ Lmin,
(39) Kmin/(Ni L2) ≤ Var[ ln( F̄i,i+1 / F̄i,i−1 ) ] ≤ Kmax/(Ni L2) for i = 2, 3, …, L − 1, and the same upper and lower bounds hold for Var[ln F̄1,2] and Var[ln F̄L,L−1]. This is just a precise interpretation of the diffusion limited sampling assumption made in the standard justification of US reproduced in Section VI A. Under such an assumption we expect both F̄i,i+1 and F̄i,i−1 to have variance on the order of 1/(NiL2) and, in light of (35), the function ln(x/y) is smooth near (x, y) = (Fi,i+1, Fi,i−1). These considerations are closely related to Lemma 7.2 in Section VII A.
With all the assumptions in hand, we now complete the argument by taking the variance of both sides of (38). Since samples from different windows are independent, the variance of the estimated free energy difference is a sum of contributions from each window,
(40) Var[ Σi=1…L−1 ln( F̄i,i+1 / F̄i+1,i ) ] = Var[ ln F̄1,2 ] + Σi=2…L−1 Var[ ln( F̄i,i+1 / F̄i,i−1 ) ] + Var[ ln F̄L,L−1 ].
Using (39) and substituting Ni = N/L, we find that, as long as N/L ≥ Nmin and L ≥ Lmin,
(41) Kmin/N ≤ Var[ Σi=1…L−1 ln( F̄i,i+1 / F̄i+1,i ) ] ≤ Kmax/N.
To verify that this conclusion carries over to harmonic bias potentials, we performed multiple umbrella sampling calculations for a Brownian particle on a flat potential on the interval [0, 1] with a stepsize of 1.0 × 10−6 and kBT = 1.0 using Gaussian bias functions with a standard deviation of 1/L. The number of windows was varied from L = 10 to 46 in steps of 2. For each value, a total of 107 steps were distributed equally in the windows, and the US calculation was repeated 480 times. We then calculated the mean square error of the free energy difference between the first and last window over the 480 replicates and determined how the mean square error scaled with L. Rather than the mean square error varying inversely with the number of windows, the data plotted in Figure 2 support a scaling of L0, consistent with (41).
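A scaled-down analogue of this test is sketched below (far fewer steps and replicates than the 107 steps and 480 replicates quoted above, and a larger step size, purely to keep the illustration fast); the clamped boundaries, parameter values, and function names are assumptions of the sketch. On the flat potential the exact free energy difference between the first and last windows is zero, so the squared estimates are squared errors.

```python
import numpy as np

rng = np.random.default_rng(0)
kT, dt, nsteps_total, n_reps = 1.0, 1.0e-4, 200_000, 5   # reduced settings

def run_window(center, sigma, nsteps):
    """Overdamped Langevin dynamics on a flat potential in [0, 1] with the
    harmonic bias generated by a Gaussian bias function of width sigma."""
    q = np.empty(nsteps)
    x = center
    for t in range(nsteps):
        x += -kT * (x - center) / sigma**2 * dt \
             + np.sqrt(2.0 * kT * dt) * rng.standard_normal()
        x = min(max(x, 0.0), 1.0)          # crude (clamped) boundaries
        q[t] = x
    return q

def emus_delta_f(L):
    """EMUS estimate of the free energy difference between windows 1 and L."""
    sigma = 1.0 / L
    centers = np.linspace(0.0, 1.0, L)
    Fbar = np.empty((L, L))
    for i in range(L):
        q = run_window(centers[i], sigma, nsteps_total // L)
        psi = np.exp(-(q[:, None] - centers[None, :])**2 / (2.0 * sigma**2))
        Fbar[i] = (psi / psi.sum(axis=1, keepdims=True)).mean(axis=0)
    evals, evecs = np.linalg.eig(Fbar.T)
    z = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    z /= z.sum()
    return -kT * np.log(z[-1] / z[0])

for L in (10, 20, 40):
    mse = np.mean([emus_delta_f(L)**2 for _ in range(n_reps)])
    print(f"L = {L:3d}   mean square error = {mse:.2e}")
```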
It is worth noting that the inverse scaling with total cost N in (41) is exactly the scaling one would expect for the variance of an estimate of the free energy difference constructed from a molecular dynamics trajectory of length N. Because US and direct simulations of comparable total numbers of steps require comparable computational effort (ignoring the overhead associated with combining the simulation data, which is typically small in comparison with the computational cost of the sampling), the benefits of US must be encoded in the constants Kmin and Kmax. A dramatic demonstration of this observation is the purpose of Section VI C.
C. The low temperature limit
To understand the benefits of umbrella sampling, we must study its performance in the presence of free energy barriers. In particular, we compare the performance of umbrella sampling to physically weighted sampling as the temperature goes to zero. In this limit, the cost of direct sampling increases exponentially with 1/T, while, as we show, the cost of umbrella sampling increases only algebraically. A formal discussion is given in a separate publication;36 here, we present a simple plausibility argument.
Owing to the free energy barriers, the assumption of diffusive dynamics in each window no longer holds. Instead, we expect a form typical of reaction rate theories in each window. We define ΔWi as the maximum difference in the PMF in window i,
(42) ΔWi = max{q : ψi(q) > 0} W(q) − min{q : ψi(q) > 0} W(q).
Assumption 6.5 ’:
We now replace the upper and lower bounds in (39) by the upper bound
(43) for i = 2, 3, …, L − 1 with analogous replacements for i = 1 and i = L, as long as Ni ≥ Nmin and L ≥ Lmin. The constant K here is assumed to be independent of temperature. This bound captures the diffusion limited sampling assumption when L is very large, but is more detailed than (39) in that it captures (crudely) the increasing difficulty of the sampling problem as the temperature decreases with all other parameters held fixed. Under reasonable additional assumptions on the underlying potential, the bias functions ψi, and the sampling scheme, one can rigorously establish an asymptotic (large Ni) bound of the form in (43).36
Substituting this new bound into (40), we find that, if L ≥ Lmin and N/L ≥ Nmin, then
(44) |
As the temperature decreases, we choose to increase L such that ΔWi/kBT is bounded above. This can be achieved by scaling L linearly with 1/T: if the derivative of the PMF is bounded in absolute value by a constant, choosing L so that
(45) |
ensures that ΔWi/kBT is bounded by Ω (since we have assumed that the argument of W is in [0, 1]). On the other hand, our assumption that the length of {q : ψi > 0} (Assumption 6.2) does not exceed γ/L implies that
(46) |
Consequently, as long as (45) holds,
(47) |
Finally, substituting this result into (44) we find that if L ≥ Lmin and N/L ≥ Nmin, then
(48) |
With the best possible (smallest) choice of L allowed by (45), this bound becomes
(49) |
The remarkable feature of the bound in (49) is that it is independent of T. This does not mean that the cost to achieve a fixed accuracy is independent of T. However, it does imply that as the temperature is decreased, we do not have to increase Nmin to maintain a fixed accuracy. Expression (45) and the fact that Ni ≥ Nmin together imply that, under the assumptions of this section, the computational cost of obtaining an accurate estimate of the free energy difference by US increases algebraically with (kBT)−1. That scaling is to be compared with the exponential growth in (kBT)−1 of the cost required to achieve the same accuracy by direct simulation.
Finally, we remark that while our analysis provides a convincing explanation for the performance benefits offered by umbrella sampling, it neglects a number of practical realities. First, we have ignored the cost of equilibrating the simulation in each window, which can be challenging. Second, we have ignored the practical difficulties that arise when the number of windows grows large. The bias functions (restraints) can introduce additional free energy barriers that slow mixing in the degrees of freedom orthogonal to the collective variable, and, if sufficiently restrictive, could in principle necessitate modifying the elementary simulation step sizes. Our analysis reveals, however, that these difficulties are not responsible for the slow (L0) error scaling observed for large L.
VII. ANALYSIS OF THE ERROR OF EMUS
In this section, we study the error of EMUS in full generality, without imposing the simplifying assumptions of Section VI. Our main results are a central limit theorem for EMUS (Theorem 7.4) and an easily computed, practical error estimator which reveals the contributions of the different windows to the total error. These results may be used to compare the efficiency of EMUS and other methods and to study how the efficiency of EMUS depends on parameters such as the number of samples allocated to each window.
A. A central limit theorem for EMUS
Before developing the error analysis, we define a single notation for EMUS which incorporates both the case of a free-energy difference and the case of an ensemble average. In either case, one must compute F̄ and also the window averages of g1* and g2* for two real-valued functions g1 and g2. To compute a free energy difference, we choose g1 and g2 based on (3),
(50) g1 = 1S1 and g2 = 1S2.
To compute an ensemble average 〈g〉, we choose g1 and g2 based on (16),
(51) g1 = g and g2 = 1.
We furthermore define the function
(52) |
so that
(53) |
where we remind the reader that, for each i, the process Xti, t = 1, …, Ni, samples the biased distribution πi. Define
(54) |
and let
(55) |
denote the corresponding vector of exact averages. Using the notation defined in Section III B, the EMUS estimator takes the form of a function B evaluated at the vector of sample means, where for a free-energy difference,
(56) |
and for an ensemble average,
(57) |
We now proceed with the error analysis. First, we characterize the error of the sample means over the biased distributions. As discussed by Frenkel and Smit (Ref. 17, Appendix D), the variance of a sample mean may be expanded in terms of the integrated autocovariance of the process. We define the autocovariance function of the vector of observables defined in (52), evaluated along the window-i trajectory, to be
(58) |
where T denotes a vector transpose, and here the outer 〈…〉i denotes the exact average not only over the initial configuration sampled from πi but also over subsequent points of the sequence. Note that Ci(t) is a (L + 2) × (L + 2) matrix. We define the integrated autocovariance to be
(59) |
The integrated autocovariance is the leading order coefficient in an expansion of the covariance (see Ref. 17, D.1.3),
(60) |
where o(1/Ni) denotes terms that go to zero faster than 1/Ni (i.e., Nio(1/Ni) → 0).
Under certain conditions on the sampling process in each window, one can strengthen the expansion of the covariance (60) to a central limit theorem (CLT) for the window sample means. We expect such a CLT to hold for most problems and most sampling methods in computational statistical physics. However, to avoid a lengthy and technical digression, we simply take the CLT as an assumption; we justify this assumption in more detail in another work,36 and we refer to the work of Lelièvre et al. (Ref. 37, Section 2.3.1.2) for a general discussion of the CLT in the context of computational statistical physics.
Assumption 7.1 Central limit theorem for the window sample means —
We assume that
(61) where Σi ∈ ℝ(L+2)×(L+2) is the integrated autocovariance matrix defined in (59). The symbol ⇒ denotes convergence in distribution as Ni → ∞. Notice that when the elements of the sequence are independent and drawn from πi, then Σi = Ci(0). More generally, samples are correlated, so Σi includes a factor that accounts for the time to decorrelate.
Having characterized the errors in the sample means, we now study how these errors propagate through the EMUS algorithm. Our goal is to prove a CLT for EMUS. We accomplish this using the delta method.
Lemma 7.2 The delta method; Proposition 6.2 of Bilodeau and Brenner38 —
Let θN be a sequence of random variables taking values in ℝd. Assume that a central limit theorem holds for θN with mean μ ∈ ℝd and asymptotic covariance matrix Σ ∈ ℝd×d; that is, assume
(62) √N (θN − μ) ⇒ N(0, Σ). Let Φ : ℝd → ℝ be a function differentiable at μ. Then we have the central limit theorem
(63) √N ( Φ(θN) − Φ(μ) ) ⇒ N(0, ∇Φ(μ)T Σ ∇Φ(μ)) for the sequence of random variables Φ(θN).
To motivate the delta method, we observe that if X has distribution N(μ, Σ), then ∇Φ(μ)TX has distribution N(∇Φ(μ)Tμ, ∇Φ(μ)TΣ∇Φ(μ)). That is, according to the delta method, the asymptotic distribution of Φ(X) is the linearization of Φ at μ applied to the asymptotic distribution of X. Thus, one may regard the delta method as a rigorous version of the standard error propagation formula based on linearization.
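As a concrete (entirely synthetic) check of this linearization picture, the sketch below applies the delta method to Φ(x, y) = ln(x/y), the kind of function that appears when free-energy differences are built from sample means, and compares the predicted asymptotic variance with a direct Monte Carlo estimate; all numerical values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic mean and covariance for theta_N; Phi(x, y) = ln(x / y).
mu = np.array([0.6, 0.3])
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
grad = np.array([1.0 / mu[0], -1.0 / mu[1]])      # gradient of Phi at mu

# Delta-method prediction for the asymptotic variance of Phi(theta_N).
var_delta = grad @ Sigma @ grad

# Monte Carlo check: theta_N is the mean of N i.i.d. draws, so Cov(theta_N) = Sigma/N.
N, reps = 400, 5000
theta = rng.multivariate_normal(mu, Sigma, size=(reps, N)).mean(axis=1)
phi = np.log(theta[:, 0] / theta[:, 1])
print(var_delta, N * phi.var())                    # the two numbers should agree closely
```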
We prove the CLT for EMUS by applying the delta method with the full vector of window sample means taking the place of θN and with the function B taking the place of Φ. We require the following assumptions in addition to Assumption 7.1.
Assumption 7.3.
We assume the following:
1. The proportion of the total number of samples drawn from each window is constant in the limit as N → ∞; that is,
(64) limN→∞ Ni/N = κi > 0.
2. Sampling in different windows is independent; that is, the data generated in window i are independent of the data generated in window j when j ≠ i.
3. The biasing functions ψi are chosen so that F is irreducible; see Section III B.
We now give the CLT for EMUS.
Theorem 7.4 Central Limit Theorem for EMUS —
Let Assumptions 7.1 and 7.3 hold. Let
(65) denote the vector of partial derivatives of B with respect to the window-i sample means, evaluated at the exact averages. Under the assumptions stated above,
(66) where
(67) We refer to σ2 as the asymptotic variance of EMUS.
Proof.
First, we write down a central limit theorem for the full vector of sample means. We have that
(68) by Assumption 7.1 and (64). Since the sampling in different windows is assumed to be independent, (68) implies
(69) where Σ ∈ ℝL(L+2)×L(L+2) is the block diagonal matrix
(70) Second, we verify that B is differentiable at the vector of exact averages. Since F is assumed to be an irreducible stochastic matrix, z(F) is differentiable at F. We refer to Thiede et al.,39 Lemma 3.1, for a complete explanation. It follows from the chain rule that B is differentiable at the exact averages.
Finally, applying Lemma 7.2 with B playing the role of Φ and the vector of sample means playing the role of θN concludes the proof.□
The asymptotic variance σ2 appearing in Theorem 7.4 measures the rate at which the error of EMUS decreases with the number of samples. To make this precise, we observe that Theorem 7.4 is equivalent to the following asymptotic result concerning confidence intervals. For every α > 0,
(71) |
where P denotes a probability and erf denotes the error function.
The asymptotic variance is commonly used to measure the efficiency of an estimator. We refer to the work of van der Vaart40 for an explanation and for a discussion of other possibilities. In Section VII B, we explain how the proportion κi of samples allocated to each window may be adjusted to minimize the asymptotic variance of EMUS, thereby maximizing efficiency.
We note that a central limit theorem similar to Theorem 7.4 has been proved for the MBAR estimator by Gill et al. (Ref. 4, Proposition 2.2). However, the authors of this work do not study the dependence of the asymptotic variance on the parameters, as we do. In fact, the MBAR estimator is significantly more complicated than EMUS, and its dependence on the number of windows and the allocation of samples is harder to understand.
We36 use a result similar to Theorem 7.4 to generalize the conclusions of Section VI to periodic and multi-dimensional reaction coordinates and to a wider class of observables than free energy differences. We show both that the asymptotic variance is constant in the limit of large L and that the work required to compute an average to fixed precision increases only algebraically in the low temperature limit. In addition, we use recently developed perturbation estimates for Markov chains39 to quantify the dependence of the asymptotic variance of EMUS on the degree to which the bias functions overlap.
B. Estimating the asymptotic variance of EMUS
Our goal in this section is to derive a computable estimate of the asymptotic variance σ2, which can be decomposed to assess the contributions from individual windows to errors in averages. We recall that formula (67) for σ2 involves partial derivatives of B. Our estimate of σ2 requires explicit formulas for these partial derivatives. We provide the appropriate expressions, both for ensemble averages and for free-energy differences, in Lemma 7.5. Following the partial derivatives, we present an algorithm for evaluating the error estimate and demonstrate it for the alanine dipeptide. Finally, we compare the results with the output of a procedure from Zhu and Hummer (ZH)12 in Section VII B 3.
Lemma 7.5.
We have the following formulas for the partial derivatives of B:
1. When EMUS is used to compute an ensemble average, B is defined by (57), and we have
(72)
(73)
(74)
2. When EMUS is used to compute a free-energy difference, B is defined by (56), and we have
(75)
(76)
(77)
3. When EMUS is used to compute the free energy of the kth window, B is defined by (11), and we have
(78)
Note that for a free energy difference between windows, we can simply subtract the derivatives for the corresponding windows,
(79)
Proof.
We begin by reminding the reader that the output of EMUS is the vector of window normalization constants, z, which depends on the sample mean F̄. Because all other averages and, in turn, their derivatives rely on z, we need to determine the sensitivity of each element of z to each element of F̄. Since F̄ is a stochastic matrix, some care must be taken in defining this derivative. We resolve the technical difficulties in detail elsewhere; see Ref. 36 and Ref. 39 (Lemma 3.1). Here, to obtain the derivative of z with respect to the matrix entries, we perturb the matrix,
(80) where E is an arbitrary matrix, ε is a scalar, and we assume that the sum is also a stochastic matrix. The right hand side follows from the chain rule, effectively treating each element of the matrix as a separate argument to each element z. Then, we employ a relation from Golub and Meyer (Ref. 13, Theorem 3.1),
(81) where # denotes the group inverse, a generalized matrix inverse similar to the Moore-Penrose inverse. It is defined as satisfying AA#A = A, A#AA# = A#, AA# = A#A. We refer to Golub and Meyer13 for further discussion of the group inverse and an algorithm for computing it. Finally, we equate (80) and (81) and solve for the derivative of interest,
(82) Thus the sensitivity of each element of z to each element of can be computed from linear algebra operations.
With (82), we can now compute the derivatives of B. We derive the formulas for the free-energy difference explicitly; the other cases are similar. In this case,
(83) By the chain rule,
(84) and
(85) The stated result follows by substituting g1 = 1S1, g2 = 1S2, and the expression in (82) for the derivative of z.□
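The derivative formula (82) requires the group inverse. One standard way to obtain it for an irreducible stochastic matrix F (not necessarily the algorithm of Golub and Meyer cited above) uses the fundamental matrix: (I − F)# = (I − F + ez)−1 − ez, where e is the column vector of ones and z is the invariant row vector. A sketch under that identity:

```python
import numpy as np

def group_inverse(F, z):
    """Group inverse (I - F)^# for an irreducible stochastic matrix F with
    invariant row vector z (z F = z, sum(z) = 1), via the identity
    (I - F)^# = (I - F + e z)^(-1) - e z."""
    n = F.shape[0]
    ez = np.outer(np.ones(n), z)
    return np.linalg.inv(np.eye(n) - F + ez) - ez

# Self-check of the three defining properties of the group inverse.
F = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.3, 0.7]])
evals, evecs = np.linalg.eig(F.T)
z = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
z /= z.sum()
A, Ag = np.eye(3) - F, group_inverse(F, z)
print(np.allclose(A @ Ag @ A, A), np.allclose(Ag @ A @ Ag, Ag), np.allclose(A @ Ag, Ag @ A))
```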
1. Computational procedure
We now provide a practical procedure that uses the derivatives above to estimate σ2 from trajectories that sample the distributions πi. For clarity, we assume that the system is equilibrated (i.e., each Xti has distribution πi, so that the process is stationary) throughout this section.
We begin by rewriting (67) as
(86) |
where
(87) |
Defining the sequence
(88) |
we find that
(89) |
which is the integrated autocovariance of that sequence.
We thus propose the following algorithm, given simulation data:
1. Compute the sample means F̄ij and the window averages of g1* and g2*.
2. Compute z(F̄) and the group inverse appearing in (82) using the algorithm of Golub and Meyer.13
3. Evaluate the partial derivatives of B at the sample means using the formulas in Lemma 7.5.
4. For each window, compute the scalar sequence
(90)
5. Compute an estimate of the integrated autocovariance of each sequence from step 4 using an algorithm such as ACOR.41
6. Compute the estimate of σ2,
(91)
Since F̄, the window averages of g1* and g2*, and z are all computed in the process of obtaining the EMUS averages, estimating σ2 requires only one additional pass over the simulation data. This additional cost is insignificant compared with that of computing the trajectories.
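Once the per-window scalar sequences from step 4 are in hand, the remaining work is to estimate their integrated autocovariances and combine them window by window (here weighted by 1/κi, consistent with allocating Ni = κiN samples to window i); a sketch is given below. The windowed autocovariance estimator is a crude stand-in for ACOR, and the square-root rule used for the importances is the allocation that minimizes a sum of the form Σi ci/κi, normalized so that equal contributions give μi = 1.

```python
import numpy as np

def integrated_autocovariance(x, max_lag=None):
    """Crude windowed estimate of c(0) + 2 * sum_{t>=1} c(t) for a scalar
    series; a robust estimator such as ACOR should be preferred in practice."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    if max_lag is None:
        max_lag = max(2, n // 10)
    c0 = np.dot(x, x) / n
    tail = sum(np.dot(x[:-t], x[t:]) / n for t in range(1, min(max_lag, n - 1)))
    return c0 + 2.0 * tail

def error_budget(chi_series, N):
    """chi_series[i]: the scalar sequence from step 4 for window i.
    Returns an estimate of sigma^2 and per-window importances."""
    L = len(chi_series)
    iacov = np.array([integrated_autocovariance(c) for c in chi_series])
    kappa = np.array([len(c) for c in chi_series], dtype=float) / N
    sigma2 = np.sum(iacov / kappa)
    contrib = np.sqrt(np.maximum(iacov, 0.0))      # guard against noisy negative estimates
    mu = L * contrib / contrib.sum()
    return sigma2, mu
```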
Both (67) and its approximation (91) decompose the asymptotic variance of EMUS into a sum of contributions from each window. By comparing the sizes of terms in the sum, we can determine the degrees to which different windows contribute to the error. In principle, this information can be used to guide modification of the parameters of the simulation to improve efficiency. For instance, one might adjust the proportion of samples allocated to each window, κi, to minimize the asymptotic variance. From (86), the asymptotic variance σ2 is minimized when κi ∝ χi (see Equation (42) of Ref. 12). Consequently, we can define the relative importance of window i as
(92) μi = L χi / Σj χj,
where the normalization is chosen so that μi = 1, regardless of L, if all windows have the same importance. The relative importance represents how many samples would be allocated to a window to optimally estimate a specific observable, compared to a uniform distribution over all windows.
2. Numerical results
To study the behavior of these estimates, we performed a two-dimensional umbrella sampling calculation with restraints on the ϕ and ψ dihedral angles of the alanine dipeptide. Parameters were the same as in the one-dimensional calculation above, with the addition of 20 bias functions in the ψ dihedral with the same force constant, creating a grid of 400 windows. Each window was equilibrated for 40 ps and sampled for a further 150 ps, with the collective variable values output every 10 fs.
In Figures 3 and 4(a), we plot the two-dimensional PMF from EMUS and the importances for the free energy difference between two windows located at the C7 equatorial and C7 axial configurations. Comparison shows that the importances are high for windows on low free energy pathways between the two windows of interest. Two such pathways exist. In the representation in Figure 3, one proceeds up and to the left of the C7 equatorial basin and then (via the periodic boundaries) enters the C7 axial basin through transition state 1 (TS1 in Figure 3). The other pathway proceeds down then right through transition state 2 (TS2 in Figure 3). Of these two pathways, the first has a lower free energy barrier. We observe that the EMUS importances are larger for windows located on this pathway. In contrast, windows off these pathways in regions with high free energies have very low importances presumably because, though sampling error may be large in those regions, they do not contribute significantly to the desired averages. We refer the reader to Thiede et al.39 for a mathematical discussion of the sensitivities of the averages.
We expect the importances to depend on the computed average. To illustrate that this is the case numerically, we show the log importances for the free energy difference between a window in the C7 axial basin and one located on TS1 in Figure 4(b). Compared to Figure 4(a), the importances are higher in the C7 axial basin and lower in the C7 equatorial basin, which highlights that the importances depend on the average computed and do not simply mirror the free energy. In Figures 4(c) and 4(d), we plot the importances for estimating the window free energy (not the free energy difference) of the window on TS1 and the window in the C7 axial basin, respectively. We note that the importances in the C7 equatorial basin are higher in Figures 4(c) and 4(d) than in 4(b). This suggests that when the free energy difference between the two windows is considered, there is some cancellation of the errors arising in the C7 equatorial basin.
3. Comparison with other algorithms for determining error contributions
Zhu and Hummer12 proposed an algorithm for determining window free energies by calculating the mean restraining forces for each window and using thermodynamic integration to estimate free energy differences between adjacent windows. These are combined using least squares to calculate window free energies. Like EMUS, this algorithm allows one to construct error estimates that can be decomposed into contributions from individual windows. The authors give an expression for the error in the free energy of one window. This expression can be easily extended to the free energy difference between two windows, giving
(93) |
where the average appearing in (93) is the mean force exerted by the bias function for window k in the αth dimension. The constants cikα and cjkα are defined in Appendix A of Zhu and Hummer.12 The authors propose that these error estimates are applicable to WHAM and other umbrella sampling algorithms.
Both (67) and (93) are sums of contributions from individual windows. Using the formalism introduced in Section VII B 1, we define the process
(94) |
and define the corresponding integrated autocovariance of that process. This allows us to define importances for the Zhu and Hummer algorithm analogously to those for EMUS (see (92)).
We applied the ZH error analysis to the two-dimensional umbrella sampling data used in Section VII B 2 and calculated the importances for the same free energy difference as in Figure 4(a) (Figure 5(a)). Rather than falling along the low free energy pathways, as for EMUS, the ZH importances mirror the autocorrelation times (Figure 5(b)). This indicates that windows have large ZH importances if they have large fluctuations in free energy. We thus see that different algorithms emphasize different windows in US. We can understand the behaviors of these two algorithms by considering (88) and (94). The factor in (88) depends explicitly on the normalization constant for each window (see Lemma 7.5). By contrast, the factor in (94) depends only on the relative positions of the windows and not on their free energies.
VIII. CONCLUSIONS
The success of an umbrella sampling simulation depends on the choice of windows (i.e., how the system is biased) and the estimator used to determine the normalization constants of the windows from trajectory data. Here, we show that the normalization constants can be obtained from an eigenvector of a stochastic matrix. This eigenvector method for umbrella sampling (EMUS) can be viewed as the first step in an implementation of the MBAR estimator. In our experience, this first step is nearly converged, and machine precision is reached in only a few iterations. Moreover, each iteration yields a consistent estimate. Most importantly, error analysis is considerably easier for EMUS than MBAR because the elements of the stochastic matrix do not depend on the normalization constants.
Within this framework, we revisited a common scaling argument for justifying umbrella sampling and showed that once the number of windows becomes sufficiently large, the scheme does not benefit from the addition of more windows (i.e., the variance is not further reduced for a fixed computational effort). We show that the potential benefits of the umbrella sampling strategy are best demonstrated in an alternative scaling regime, in which the temperature decreases (or, equivalently, free-energy barrier heights increase) as the number of windows increases; in that regime the efficiency improvement over direct simulation is exponential in the inverse temperature.
Our main theoretical result is a central limit theorem for the statistical averages obtained from EMUS. This result relies on the delta method, which we use to characterize the propagation of the asymptotic error through the solution of a stochastic matrix eigenproblem. The central limit theorem provides an expression for the asymptotic variance of the averages of interest. It is a sum of contributions from individual windows, and we use it to develop a prescription for estimating the relative importances of windows for averages from the trajectory data. For free energy differences of states of the alanine dipeptide, we find numerically that the importances are largest for low-free energy pathways that connect the specific states of interest. These results suggest that the importances could serve as the basis for adaptive schemes that focus computational effort on the windows of most importance. Even more interesting would be to adjust the bias functions as the simulation progresses. How best to do this remains an open area of investigation.
Acknowledgments
This research was supported by National Institutes of Health (NIH) Grant No. 5 R01 GM109455-02. We wish to thank Jonathan Mattingly, Jeremy Tempkin, and Charlie Matthews for helpful discussions.
APPENDIX: CONSISTENCY OF ITERATIVE EMUS
Here, we prove that for fixed finite m, zm is a consistent estimator of the vector of normalization constants z. With the initial guess z0 = n, the result, z1, of the first iteration is the EMUS estimator. We now show that z2 is also consistent in the sense that if the trajectory averages defining converge then z2 converges to z. Because the various sequences in question are sequences of random variables, one must specify what is meant by convergence. The argument below applies when convergence refers either to convergence in probability or convergence with probability one (almost sure convergence) as long as the notion of convergence is consistent throughout. Consistency of zm follows by induction on m using a similar argument.
For any positive vector w, we define
(A1) |
and we write
(A2) |
where
(A3) |
We then observe that
(A4) |
Because hij(u, x) ≤ ui/uj,
(A5) |
Therefore,
(A6) |
for a continuous function γ defined for positive vectors u and u′ and such that γ(u, u) = 0 for any u. The function must explode when the entries of u or u′ approach 0. Now define
(A7) |
where z is the exact vector of normalization constants. By (A6), we have
(A8) |
As the number of samples N increases, F̄(z) converges to F(z). Moreover, since z1 is the EMUS estimate of z, z1 converges to z. Therefore, u converges to its exact counterpart, and (A8) implies that F̄ij(z1) converges to Fij(z). Finally, since the function mapping an irreducible, stochastic matrix to its invariant vector is continuous, it follows that z2 converges to the invariant vector of F(z), which is z. This verifies the consistency of z2.
REFERENCES
1. Torrie G. M. and Valleau J. P., J. Comput. Phys., 187 (1977). doi: 10.1016/0021-9991(77)90121-8
2. Pangali C., Rao M., and Berne B., J. Chem. Phys., 2975 (1979). doi: 10.1063/1.438701
3. Vardi Y., Ann. Stat., 178 (1985). doi: 10.1214/aos/1176346585
4. Gill R. D., Vardi Y., and Wellner J. A., Ann. Stat., 1069 (1988). doi: 10.1214/aos/1176350948
5. Kumar S., Bouzida D., Swendsen R. H., Kollman P. A., and Rosenberg J. M., J. Comput. Chem., 1011 (1992). doi: 10.1002/jcc.540130812
6. Shirts M. R. and Chodera J. D., J. Chem. Phys., 124105 (2008). doi: 10.1063/1.2978177
7. Rosta E. and Hummer G., J. Chem. Theory Comput., 276 (2014). doi: 10.1021/ct500719p
8. Mey A. S., Wu H., and Noé F., Phys. Rev. X, 041018 (2014). doi: 10.1103/PhysRevX.4.041018
9. Tan Z., Gallicchio E., Lapelosa M., and Levy R. M., J. Chem. Phys., 144102 (2012). doi: 10.1063/1.3701175
10. Lelièvre T., Stoltz G., and Rousset M., Free Energy Computations: A Mathematical Perspective (World Scientific, 2010).
11. Minh D. D. and Chodera J. D., J. Chem. Phys., 134110 (2009). doi: 10.1063/1.3242285
12. Zhu F. and Hummer G., J. Comput. Chem., 453 (2012). doi: 10.1002/jcc.21989
13. Golub G. H. and Meyer C. D., Jr., SIAM J. Algebraic Discrete Methods, 273 (1986). doi: 10.1137/0607031
14. Thiede E. H., EMUS, 2016, https://github.com/ehthiede/EMUS.
15. Schneider H., Linear Algebra Appl., 139 (1977). doi: 10.1016/0024-3795(77)90070-2
16. Tan Z., J. Am. Stat. Assoc., 1027 (2004). doi: 10.1198/016214504000001664
17. Frenkel D. and Smit B., Understanding Molecular Simulation: From Algorithms to Applications (Academic Press, 2001).
18. Geyer C. J., Stat. Sci., 473 (1992). doi: 10.1214/ss/1177011137
19. Doss H. and Tan A., J. R. Stat. Soc.: Ser. B, 683 (2014). doi: 10.1111/rssb.12049
20. Meng X.-L. and Wong W. H., Statistica Sinica, 831 (1996).
21. Paliwal H. and Shirts M. R., J. Chem. Theory Comput., 4700 (2013). doi: 10.1021/ct4005068
22. Shirts M. R., Mobley D. L., and Chodera J. D., Annu. Rep. Comput. Chem., 41 (2007). doi: 10.1016/S1574-1400(07)03004-6
23. Abraham M. J., Murtola T., Schulz R., Páll S., Smith J. C., Hess B., and Lindahl E., SoftwareX, 19 (2015). doi: 10.1016/j.softx.2015.06.001
24. Tribello G. A., Bonomi M., Branduardi D., Camilloni C., and Bussi G., Comput. Phys. Commun., 604 (2014). doi: 10.1016/j.cpc.2013.09.018
25. MacKerell A. D., Banavali N., and Foloppe N., Biopolymers, 257 (2000).
26. Hess B., Bekker H., Berendsen H. J. C., and Fraaije J. G. E. M., J. Comput. Chem., 1463 (1997).
27. Grossfield A., “WHAM: The weighted histogram analysis method (version 2.0.9),” 2013, http://membrane.urmc.rochester.edu/content/wham.
28. Süli E. and Mayers D. F., An Introduction to Numerical Analysis (Cambridge University Press, 2003).
29. Chandler D., Introduction to Modern Statistical Mechanics (Oxford University Press, 1987).
30. Chipot C. and Pohorille A., Free Energy Calculations (Springer, 2007), p. 86.
31. van Duijneveldt J. and Frenkel D., J. Chem. Phys., 4655 (1992). doi: 10.1063/1.462802
32. Wojtas-Niziurski W., Meng Y., Roux B., and Bernèche S., J. Chem. Theory Comput., 1885 (2013). doi: 10.1021/ct300978b
33. Virnau P. and Müller M., J. Chem. Phys., 10925 (2004). doi: 10.1063/1.1739216
34. Nguyen T. H. and Minh D. D., J. Chem. Theory Comput., 2154 (2016). doi: 10.1021/acs.jctc.6b00060
35. Kelly F. P., Reversibility and Stochastic Networks (Cambridge University Press, 2011).
36. Dinner A. R., Thiede E. H., Van Koten B., and Weare J., “Stratification of Markov chain Monte Carlo sampling” (unpublished).
37. Lelièvre T., Stoltz G., and Rousset M., Free Energy Computations: A Mathematical Perspective (Imperial College Press, Hackensack, NJ, 2010), p. 458.
38. Bilodeau M. and Brenner D., Theory of Multivariate Statistics (Springer Science & Business Media, 2008).
39. Thiede E., Van Koten B., and Weare J., SIAM J. Matrix Anal. Appl., 917 (2015). doi: 10.1137/140987900
40. van der Vaart A. W., Asymptotic Statistics, Cambridge Series on Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, New York, 1998), p. 443.
41. Foreman-Mackey D. and Goodman J., ACOR 1.1.1, 2014, https://pypi.python.org/pypi/acor/1.1.1.