Abstract
An optimized method for estimating path-ensemble averages using data from processes driven in opposite directions is presented. Based on this estimator, bidirectional expressions for reconstructing free energies and potentials of mean force from single-molecule force spectroscopy—valid for biasing potentials of arbitrary stiffness—are developed. Numerical simulations on a model potential indicate that these methods perform better than unidirectional strategies.
Crooks’ path-ensemble average theorem (Eq. 1) encompasses a set of exact results in nonequilibrium statistical mechanics pertinent to systems driven from thermal equilibrium by a time-dependent external potential [1]. These include Jarzynski’s equality [2] and the Crooks fluctuation theorem [3], which relate equilibrium free energy differences to the nonequilibrium work distribution, as well as reweighting relations that allow one to recover arbitrary equilibrium ensemble averages from measurements of driven nonequilibrium processes [1]. Because of the intimate connection between such processes and molecular force spectroscopy, these theorems have been widely invoked to extract free energies and potentials of mean force (PMFs) from single-molecule pulling experiments [4–8].
While formally correct, the practical utility of these relations is limited by the presence of exponential averages of the work, which are dominated by rare events and therefore have notoriously slow convergence properties [9]. In order to improve their convergence, strategies such as work-weighted trajectory sampling [10–12] have been proposed. Here we suggest another method to accelerate the convergence of these averages: including trajectories from the reverse process in the forward path-ensemble. This is motivated in part by the observation that the exponential average of the work in the forward process is dominated by those rare trajectories that resemble time-reversed counterparts (“conjugate twins”) of typical trajectories generated by the reverse protocol [13]. Thus, our goals are to construct optimized forward path-ensemble average estimators that explicitly include such trajectories, and apply them to the problem of estimating free energies and potentials of mean force from single-molecule pulling experiments.
The starting point of our analysis is Crooks’ path-ensemble average theorem, which relates the forward average of an arbitrary functional $\mathcal{F}[\Gamma]$ of the phase space trajectory Γ = {q(t), p(t)} to its work-weighted average in the reverse process, namely [1]
$$\langle \mathcal{F}[\Gamma] \rangle_F = \left\langle \hat{\mathcal{F}}[\Gamma]\, e^{-\beta (W[\Gamma] + \Delta F)} \right\rangle_R. \tag{1}$$
In the above, the forward average 〈…〉F is an average over all trajectories (path-ensemble average) generated in the forward process, wherein an external parameter (e.g. the position of a harmonic trap in a single-molecule pulling experiment) is driven from the value A to B in τ units of time after equilibration at A, while 〈…〉R is a similarly defined average in the reverse direction, from B to A. The total work W[Γ] accumulated up to the final time τ is defined in terms of the time-dependent Hamiltonian H = H(q(t), p(t); t) as $W[\Gamma] = \int_0^\tau dt\, \partial H/\partial t$, while ΔF = FB − FA is the free energy difference between the equilibrium states corresponding to the endpoints A and B, and β is the inverse temperature. Finally, the notation $\hat{\mathcal{F}}[\Gamma] \equiv \mathcal{F}[\hat{\Gamma}]$ is a shorthand for the value of the functional when evaluated over the time-reversal of Γ, viz. Γ̂ = {q(τ − t), −p(τ − t)}.
By choosing $\mathcal{F}[\Gamma] = \delta(\Gamma - \Gamma')$ in Eq. (1), using the property W[Γ̂] = −W[Γ], and dropping the prime, one obtains an identity between the distribution of trajectories in the two directions [1, 14],
$$\rho_F(\Gamma) = \rho_R(\hat{\Gamma})\, e^{\beta (W[\Gamma] - \Delta F)}, \tag{2}$$
where ρF(Γ) and ρR(Γ) are the probabilities of observing a particular trajectory Γ in the forward and reverse processes, respectively. This result offers a means of achieving the aforementioned goal: trajectories from the reverse process can indeed be included in the forward path-ensemble when their density is reweighted by e^{β(W − ΔF)}. Our next goal is to optimally combine direct estimates of ρF(Γ) from forward processes with indirect estimates obtained from ρR(Γ) via Eq. (2); this will be done with the weighted histogram analysis method (WHAM) [15, 16].
The objective of WHAM is to find an optimal (i.e. least variance) estimator for a desired probability distribution from a series of independent estimates of biased distributions, where “biased” here means that the distribution of interest is related to the remaining ones by a simple reweighting factor. To be specific, given a series of normalized distributions $\rho_i(x)$ of a random variable x, with i = 1, …, M, and M unbiasing relations of the form
$$\rho(x) = f_i(x)\, \rho_i(x), \tag{3}$$
where ρ(x) is the distribution of interest and fi(x) is the unbiasing factor for the i-th distribution, the WHAM strategy seeks a linear combination of M independent estimates of ρ(x) obtained from the measured biased distributions via Eq. (3), such that its variance σ²[ρ(x)] is minimized. This results in [15, 16]
$$\rho(x) = \frac{\sum_{i=1}^{M} n_i\, \rho_i(x)}{\sum_{i=1}^{M} n_i / f_i(x)}, \tag{4}$$
where ni is the number of samples in the estimate of the i-th distribution. (For notational simplicity, here we do not distinguish the exact distribution and its sample estimate). Applied to the problem of estimating ρF (Γ) from nF forward and nR reverse trajectories, Eqs. (2)–(4) give an optimized estimator for the forward probability distribution of trajectories in terms of the measured forward and reverse densities:
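$$\rho_F(\Gamma) = \frac{n_F\, \rho_F(\Gamma) + n_R\, \rho_R(\hat{\Gamma})}{n_F + n_R\, e^{-\beta (W[\Gamma] - \Delta F)}}. \tag{5}$$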
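We are now ready to derive the main results of our paper. Taking the average of the functional $\mathcal{F}[\Gamma]$ using the optimized density from Eq. (5), we obtain the following estimator for the forward path-ensemble average of $\mathcal{F}$:
$$\langle \mathcal{F} \rangle_F = \left\langle \frac{n_F\, \mathcal{F}[\Gamma]}{n_F + n_R\, e^{-\beta (W[\Gamma] - \Delta F)}} \right\rangle_F + \left\langle \frac{n_R\, \hat{\mathcal{F}}[\Gamma]}{n_F + n_R\, e^{\beta (W[\Gamma] + \Delta F)}} \right\rangle_R, \tag{6}$$
where in the last average we have again used the property that the total work is odd under time-reversal. (An analogous expression for the reverse path-ensemble can be obtained by switching the definitions of forward and reverse.) This general result forms the basis of our bidirectional method, and different applications can be obtained with suitable choices of $\mathcal{F}$.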
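As a minimal sketch of how the bidirectional average of Eq. (6) might be evaluated from sampled data, the Python function below sums the forward and reverse contributions with their respective weights. The function name, argument names, and data layout are illustrative assumptions, not part of the original work; the caller is assumed to have already evaluated the functional on each forward trajectory and on the time reversal of each reverse trajectory, and to supply an estimate of ΔF.

```python
import math

def bidirectional_average(f_fwd, w_fwd, f_rev_hat, w_rev, delta_f, beta=1.0):
    """Sample estimate of the forward path-ensemble average in Eq. (6).

    f_fwd[i]     : functional evaluated on the i-th forward trajectory
    w_fwd[i]     : total work of the i-th forward trajectory
    f_rev_hat[j] : functional evaluated on the time reversal of the
                   j-th reverse trajectory
    w_rev[j]     : total work of the j-th reverse trajectory
    delta_f      : estimate of the free energy difference F_B - F_A
    """
    n_f, n_r = len(w_fwd), len(w_rev)
    total = 0.0
    # The 1/n_F of the forward sample mean cancels the n_F in the numerator.
    for f, w in zip(f_fwd, w_fwd):
        total += f / (n_f + n_r * math.exp(-beta * (w - delta_f)))
    # Likewise, the 1/n_R of the reverse sample mean cancels n_R.
    for f, w in zip(f_rev_hat, w_rev):
        total += f / (n_f + n_r * math.exp(beta * (w + delta_f)))
    return total
```

For very large values of |βW| the plain exponentials can overflow, so a production implementation would likely evaluate the weights in log space.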
Our first example is concerned with free energy differences, where we choose $\mathcal{F}[\Gamma] = e^{-\beta W_0^t}$, with $W_a^b$ defined as the partial work between times a and b along the trajectory Γ, i.e. $W_a^b = \int_a^b dt\, \partial H/\partial t$. (Note that, according to this notation, the total work W coincides with $W_0^\tau$.) Invoking Jarzynski’s equality for the l.h.s. of Eq. (6), this choice of $\mathcal{F}$ gives
$$e^{-\beta \Delta F_t} = \left\langle \frac{n_F\, e^{-\beta W_0^t}}{n_F + n_R\, e^{-\beta (W - \Delta F)}} \right\rangle_F + \left\langle \frac{n_R\, e^{\beta W_{\tau - t}^{\tau}}}{n_F + n_R\, e^{\beta (W + \Delta F)}} \right\rangle_R, \tag{7}$$
where ΔFt = Ft − FA is the free energy difference between the equilibrium states defined by the Hamiltonians H(q, p; t) and H(q, p; 0), and in the last average we have used the property $W_0^t[\hat{\Gamma}] = -W_{\tau - t}^{\tau}[\Gamma]$. For the particular cases where t = 0 or t = τ, this result can be rearranged to yield the Bennett Acceptance Ratio (BAR) formula for ΔF [17], as generalized to nonequilibrium processes by Crooks [1] (for a multistate extension, see [18]). The above equation further generalizes BAR to estimate intermediate free energy differences ΔFt. Operationally, when estimating an intermediate free energy difference, we must first estimate ΔF to use in the r.h.s. of Eq. (7). This can be accomplished with BAR, which has been shown to be a maximum likelihood estimator of ΔF [19].
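The two-step procedure described above (first ΔF from BAR, then ΔFt from Eq. (7)) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names, the bisection solver, and the bracketing heuristic are choices made here, and the inputs are assumed to be lists of total and partial work values per trajectory.

```python
import math

def fermi(x):
    """Numerically stable 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def bar(w_fwd, w_rev, beta=1.0, tol=1e-10):
    """Solve the BAR equation for Delta F = F_B - F_A by bisection."""
    n_f, n_r = len(w_fwd), len(w_rev)
    m = math.log(n_f / n_r)

    def g(df):  # increasing in df; its root is the BAR estimate
        fwd = sum(fermi(m + beta * (w - df)) for w in w_fwd)
        rev = sum(fermi(-m + beta * (w + df)) for w in w_rev)
        return fwd - rev

    # Initial bracket from the data, widened until it contains the root.
    lo = min(min(w_fwd), -max(w_rev)) - 1.0
    hi = max(max(w_fwd), -min(w_rev)) + 1.0
    while g(lo) > 0:
        lo -= hi - lo
    while g(hi) < 0:
        hi += hi - lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def delta_f_t(w0t_fwd, w_fwd, wtail_rev, w_rev, delta_f, beta=1.0):
    """Intermediate free energy difference from Eq. (7).

    w0t_fwd[i]   : partial work from time 0 to t on forward trajectory i
    wtail_rev[j] : partial work from time tau - t to tau on reverse trajectory j
    w_fwd, w_rev : total work on each forward / reverse trajectory
    delta_f      : total free energy difference, e.g. from bar()
    """
    n_f, n_r = len(w_fwd), len(w_rev)
    m = math.log(n_f / n_r)
    total = 0.0
    for w0t, w in zip(w0t_fwd, w_fwd):
        # fermi(...) / n_f equals 1 / (n_F + n_R exp(-beta (W - dF)))
        total += math.exp(-beta * w0t) * fermi(-m - beta * (w - delta_f)) / n_f
    for wtail, w in zip(wtail_rev, w_rev):
        # fermi(...) / n_f equals 1 / (n_F + n_R exp(+beta (W + dF)))
        total += math.exp(beta * wtail) * fermi(-m + beta * (w + delta_f)) / n_f
    return -math.log(total) / beta
```

In use, one would call `bar` once on the total work values and then call `delta_f_t` for each time slice of interest with the corresponding partial work values.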
Free energy differences can also be estimated using a cumulant expansion of Jarzynski’s equality [7]. In order to analyze bidirectional data with this approach, one should apply Eq. (6) to estimate moments of the work distribution, choosing $\mathcal{F} = (W_0^t)^n$. This is more rigorous than a method which applies the Crooks fluctuation theorem between states which are not in equilibrium [20]. A bidirectional estimator for the energetic contribution to ΔFt can be obtained by choosing $\mathcal{F} = H(q(t), p(t); t)\, e^{-\beta W_0^t}$ in Eq. (6). Upon normalization by $e^{-\beta \Delta F_t}$, this results in the average energy at time t, as was shown in the unidirectional case [21].
In the context of single-molecule pulling experiments, the system is typically driven out of equilibrium by a time-dependent potential Vt = V(zt; t) acting on a collective coordinate zt = z(q(t)) (e.g. the end-to-end distance of a protein) such that the total Hamiltonian is H = H0 + Vt, where H0 is the (time-independent) Hamiltonian in the absence of the external perturbation. In this case, the free energy difference ΔFt involves the equilibrium states of the system corresponding to the potentials Vt and V0. However, one is often more interested in the potential of mean force G0(z) of the unperturbed Hamiltonian, i.e. in the effective potential dictating the equilibrium distribution of z-values in the absence of the external potential. Although in the limit of sufficiently stiff potentials the free energy difference approaches the PMF [7], this approximation fails for soft springs [22] such as those used in optical tweezer experiments [8], in which case one should use more rigorous methods. One approach starts from the observation that the equilibrium distribution of z-values in the absence of the external potential (i.e. the unbiased distribution) is given by $\rho_0(z) = c^{-1} \langle \delta(z - z_t)\, e^{-\beta W_0^t} \rangle_F\, e^{\beta V(z;t)}$ [4, 5], where c is an overall normalization constant, which can be shown to be independent of t. With this result in mind, a bidirectional estimator for ρ0(z) can be obtained from Eq. (6) by choosing $\mathcal{F}[\Gamma] = \delta(z - z_t)\, e^{-\beta W_0^t}$. Moreover, since this expression for ρ0(z) is correct for all times t, different estimates of ρ0(z) can be obtained from different time-slices during the pulling process, and these can in turn be combined according to the WHAM prescription (Eqs. (3) and (4)). Indeed, rewriting the above result for ρ0(z) in the form of Eq. (3), viz.
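$$\rho_0(z) = \left[ e^{\beta \Delta F_t} \left\langle \delta(z - z_t)\, e^{-\beta W_0^t} \right\rangle_F \right] \frac{e^{\beta [V(z;t) - \Delta F_t]}}{c}, \tag{8}$$
where the factor $e^{\beta \Delta F_t}$ is introduced to normalize the distribution in square brackets and c is the normalization constant introduced above, one arrives at the Hummer-Szabo estimator for ρ0(z) [4, 5],
$$e^{-\beta G_0(z)} = \frac{\sum_t \left\langle \delta(z - z_t)\, e^{-\beta W_0^t} \right\rangle_F\, e^{\beta \Delta F_t}}{\sum_t e^{-\beta [V(z;t) - \Delta F_t]}}, \tag{9}$$
where G0(z) ≡ −β⁻¹ ln ρ0(z) is defined up to an additive constant. The above PMF formalism has been extended to account for multiple pulling protocols [23] and multiple dimensions [24].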
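In order to optimally include trajectories from the reverse perturbation in Eq. (9), we choose $\mathcal{F}[\Gamma] = \delta(z - z_t)\, e^{-\beta W_0^t}$ in Eq. (6) and substitute the ensuing expression for $\langle \delta(z - z_t)\, e^{-\beta W_0^t} \rangle_F$ in Eq. (9). This leads to our bidirectional PMF estimator:
$$e^{-\beta G_0(z)} = \frac{\sum_t \left[ \left\langle \dfrac{n_F\, \delta(z - z_t)\, e^{-\beta W_0^t}}{n_F + n_R\, e^{-\beta (W - \Delta F)}} \right\rangle_F + \left\langle \dfrac{n_R\, \delta(z - z_{\tau - t})\, e^{\beta W_{\tau - t}^{\tau}}}{n_F + n_R\, e^{\beta (W + \Delta F)}} \right\rangle_R \right] e^{\beta \Delta F_t}}{\sum_t e^{-\beta [V(z;t) - \Delta F_t]}}, \tag{10}$$
where ΔFt is estimated via Eq. (7) and ΔF = ΔFτ via BAR. (As in WHAM, ΔFt can also be estimated self-consistently by iterative cycles of Eq. (10) and numerically integrating $e^{-\beta \Delta F_t} = \int e^{-\beta [G_0(z) + V(z;t)]}\, dz \,/ \int e^{-\beta [G_0(z') + V(z';0)]}\, dz'$.) If we switch the definitions of forward and reverse, then G0(z) differs by the constant ΔF.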
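A histogram-based evaluation of the bidirectional PMF estimator, Eq. (10), might look like the following Python sketch. Several details are assumptions made here for illustration: the delta function is replaced by bin membership, the biasing potential is taken to be harmonic with stiffness `k_s` and evaluated at bin midpoints, all time slices carry equal weight, and the function name and array layout are hypothetical.

```python
import math

def bidirectional_pmf(z_f, w_f, z_r, w_r, df_t, centers, edges, k_s, beta=1.0):
    """Histogram sketch of the bidirectional PMF estimator, Eq. (10).

    z_f[i][k], w_f[i][k] : position and accumulated work (from time 0) of
        forward trajectory i at time slice k = 0..K
    z_r[j][k], w_r[j][k] : the same for reverse trajectory j, indexed in the
        reverse process's own time
    df_t[k]    : free energy difference at slice k; df_t[K] is the total
    centers[k] : trap center of the forward protocol at slice k
    edges      : increasing bin edges for z
    Returns G_0 per bin, up to an additive constant (inf for empty bins).
    """
    n_f, n_r = len(z_f), len(z_r)
    n_bins, last = len(edges) - 1, len(df_t) - 1
    d_f = df_t[last]

    def bin_of(z):
        for b in range(n_bins):
            if edges[b] <= z < edges[b + 1]:
                return b
        return None

    # Per-trajectory weights depend only on the total work.
    wt_f = [1.0 / (n_f + n_r * math.exp(-beta * (w[last] - d_f))) for w in w_f]
    wt_r = [1.0 / (n_f + n_r * math.exp(beta * (w[last] + d_f))) for w in w_r]

    num = [0.0] * n_bins
    den = [0.0] * n_bins
    for k in range(last + 1):
        boost = math.exp(beta * df_t[k])
        for i in range(n_f):
            b = bin_of(z_f[i][k])
            if b is not None:
                num[b] += boost * wt_f[i] * math.exp(-beta * w_f[i][k])
        for j in range(n_r):
            # Forward time t_k corresponds to reverse time tau - t_k.
            b = bin_of(z_r[j][last - k])
            if b is not None:
                tail = w_r[j][last] - w_r[j][last - k]  # work from tau - t to tau
                num[b] += boost * wt_r[j] * math.exp(beta * tail)
        for b in range(n_bins):
            mid = 0.5 * (edges[b] + edges[b + 1])
            bias = 0.5 * k_s * (mid - centers[k]) ** 2
            den[b] += math.exp(-beta * (bias - df_t[k]))
    return [-math.log(n / d) / beta if n > 0 else math.inf
            for n, d in zip(num, den)]
```

Because the bin width is not divided out, the result is shifted by a constant for uniform bins, which is harmless since G0(z) is only defined up to an additive constant.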
To demonstrate these results, we perform Brownian dynamics simulations on a one-dimensional potential whose unperturbed Hamiltonian is H0(z) = (5z³ − 10z + 3)z (as used by Hummer [25]). The time-dependent Hamiltonian is H(z; t) = H0(z) + V(z; t), with V(z; t) = ks(z − z̄(t))²/2 and ks chosen as 15. In the forward direction, the center of the potential z̄(t) is linearly varied from −1.5 to 1.5 over 750 steps; it is varied from 1.5 to −1.5 in the reverse direction. Before pulling, trajectories are equilibrated for 100 steps. Dynamics are run with a diffusion coefficient D = 1, temperature parameter β = 1, and time step Δt = 0.001. Work is calculated with the discrete formula .
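The simulation protocol described above could be sketched in Python as follows. The parameter values are those quoted in the text; the integrator (Euler-Maruyama), the choice of starting the equilibration at the trap center, and in particular the work discretization (the change in the trap potential at fixed position when the trap center is updated) are assumptions made for this sketch, since they are not fully specified here. The function names are hypothetical.

```python
import math
import random

def force(z, center, k_s=15.0):
    """Minus the z-derivative of H0(z) + V(z; t), with H0 = 5z^4 - 10z^2 + 3z."""
    return -(20.0 * z ** 3 - 20.0 * z + 3.0) - k_s * (z - center)

def pull(rng, start=-1.5, end=1.5, n_steps=750, n_equil=100,
         k_s=15.0, d=1.0, beta=1.0, dt=0.001):
    """One Brownian dynamics pulling trajectory.

    Returns (zs, ws): positions and accumulated work at each of the
    n_steps + 1 time slices. Use start=1.5, end=-1.5 for the reverse process.
    """
    noise = math.sqrt(2.0 * d * dt)
    z = start  # assumption: begin equilibration at the trap center
    for _ in range(n_equil):
        z += beta * d * force(z, start, k_s) * dt + noise * rng.gauss(0.0, 1.0)
    zs, ws = [z], [0.0]
    work = 0.0
    for n in range(n_steps):
        c_old = start + (end - start) * n / n_steps
        c_new = start + (end - start) * (n + 1) / n_steps
        # Assumed work increment: change of V at fixed z as the trap moves.
        work += 0.5 * k_s * ((z - c_new) ** 2 - (z - c_old) ** 2)
        # Euler-Maruyama step in the updated potential.
        z += beta * d * force(z, c_new, k_s) * dt + noise * rng.gauss(0.0, 1.0)
        zs.append(z)
        ws.append(work)
    return zs, ws
```

A forward trajectory would then be generated with `pull(random.Random(seed))` and a reverse one with `pull(random.Random(seed), start=1.5, end=-1.5)`; the returned lists have the per-slice layout that a time-slice analysis of position and work requires.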
For ΔFt estimates on this model system, our bidirectional strategy outperforms existing methods (Fig. 1). Unidirectional estimates of ΔFt based on Jarzynski’s equality are markedly biased as the states are further perturbed from the starting equilibrium state. Chelli and coworkers have also developed an asymptotically correct bidirectional estimator that reduces to BAR at the end states [26]. However, the derivation offered by these authors is limited to deterministic systems, and although we have empirical evidence that their estimator approaches the correct ΔFt for Brownian simulations in the limit of a large number of trajectories (data not shown), it leads to a more pronounced bias than Eq. (7) for simulations under the above conditions (Fig. 1). Since a general derivation of the results of Ref. [26] is not yet available, at present it is difficult to identify the source of discrepancy between these two estimators, and we leave this question for future investigations.
Our PMF reconstruction methods also compare favorably with unidirectional methods (Fig. 2). As with ΔFt, reconstructed PMFs from separate forward and reverse processes increasingly overestimate the PMF farther from the region sampled by the original state. In contrast, our bidirectional formula, Eq. (10), optimally combines the data to reduce this bias. The method of Chelli and coworkers for PMF reconstruction requires a stiff-spring assumption, and hence is not applicable here.
In summary, building on the observation that the convergence of Jarzynski’s nonequilibrium work average is dominated by time-reversed counterparts of trajectories generated via the reverse process [13], we have introduced a formula that optimally includes such trajectories in generic nonequilibrium path-averages (Eq. (6)). As an application of this result, we have derived a bidirectional estimator for free energy differences in terms of nonequilibrium measurements of work (Eq. (7)). Although it reduces to BAR for the special case of endpoint free energy differences ΔF, our formula also allows for the estimation of intermediate values ΔFt of the free energy during the switching process. When applied to the problem of estimating potentials of mean force G0(z) in nonequilibrium force spectroscopy, our methods yield a bidirectional estimator for G0(z) that optimally combines time-slices from forward and reverse measurements of position and work (Eq. (10)). Numerical comparison of our formula with unidirectional estimates based on the Jarzynski equality [2] or the Hummer-Szabo method [4, 5] reveals that our reconstructed free energy differences are of better overall quality than those from these unidirectional estimators, which are increasingly biased as one drives the system farther away from its original equilibrium state. It has been noted that faster pullings farther from equilibrium contain less instrument noise and therefore lead to more accurate free energy estimates [27]. It is thus expected that our bidirectional estimators will further improve the quality of such experimental estimates by appreciably reducing the finite-sample bias due to fast pullings.
Acknowledgments
We thank Christopher Jarzynski and Attila Szabo for helpful discussions. This research was supported by the Intramural Research Program of the NIH, NIDDK.
References
1. Crooks GE. Phys Rev E. 2000;61:2361.
2. Jarzynski C. Phys Rev Lett. 1997;78:2690.
3. Crooks GE. Phys Rev E. 1999;60:2721. doi: 10.1103/physreve.60.2721.
4. Hummer G, Szabo A. Proc Natl Acad Sci USA. 2001;98:3658. doi: 10.1073/pnas.071034098.
5. Hummer G, Szabo A. Acc Chem Res. 2005;38:504. doi: 10.1021/ar040148d.
6. Liphardt J, Dumont S, Smith SB, Tinoco I Jr, Bustamante C. Science. 2002;296:1832. doi: 10.1126/science.1071152.
7. Park S, Khalili-Araghi F, Tajkhorshid E, Schulten K. J Chem Phys. 2003;119:3559.
8. Collin D, Ritort F, Jarzynski C, Smith SB, Tinoco I, Bustamante C. Nature. 2005;437:231. doi: 10.1038/nature04061.
9. Jarzynski C. Phys Rev E. 1997;56:5018.
10. Sun S. J Chem Phys. 2003;118:5769.
11. Ytreberg F, Zuckerman D. J Chem Phys. 2004;120:10876. doi: 10.1063/1.1760511.
12. Oberhofer H, Dellago C. Comput Phys Commun. 2008.
13. Jarzynski C. Phys Rev E. 2006;73:046105. doi: 10.1103/PhysRevE.73.046105.
14. Crooks GE. J Stat Phys. 1998;90:1481.
15. Ferrenberg AM, Swendsen RH. Phys Rev Lett. 1989;63:1195. doi: 10.1103/PhysRevLett.63.1195.
16. Kumar S, Bouzida D, Swendsen RH, Kollman PA, Rosenberg JM. J Comput Chem. 1992;13:1011.
17. Bennett C. J Comput Phys. 1976;22:245.
18. Maragakis P, Spichty M, Karplus M. Phys Rev Lett. 2006;96:100602. doi: 10.1103/PhysRevLett.96.100602.
19. Shirts MR, Bair E, Hooker G, Pande VS. Phys Rev Lett. 2003;91:140601. doi: 10.1103/PhysRevLett.91.140601.
20. Kosztin I, Barz B, Janosi L. J Chem Phys. 2006;124. doi: 10.1063/1.2166379.
21. Nummela J, Yassin F, Andricioaei I. J Chem Phys. 2008;128:024104. doi: 10.1063/1.2817332.
22. Minh DDL, McCammon JA. J Phys Chem B. 2008. doi: 10.1021/jp0733163.
23. Minh DDL. Phys Rev E. 2006;74:061120. doi: 10.1103/PhysRevE.74.061120.
24. Minh DDL. J Phys Chem B. 2007;111:4137. doi: 10.1021/jp068656n.
25. Hummer G. In: Free Energy Calculations. Chipot C, Pohorille A, editors. Vol. 86. Springer; Berlin: 2007.
26. Chelli R, Marsili S, Procacci P. Phys Rev E. 2008;77:031104. doi: 10.1103/PhysRevE.77.031104.
27. Maragakis P, Ritort F, Bustamante C, Karplus M, Crooks GE. 2007. doi: 10.1063/1.2937892. arXiv:0707.0089v1 [cond-mat.statmech].