Author manuscript; available in PMC: 2008 Aug 11.
Published in final edited form as: Phys Rev Lett. 2008 May 6;100(18):180602. doi: 10.1103/PhysRevLett.100.180602

Optimized free energies from bidirectional single-molecule force spectroscopy

David D L Minh 1,*, Artur B Adib 1
PMCID: PMC2504746  NIHMSID: NIHMS48903  PMID: 18518359

Abstract

An optimized method for estimating path-ensemble averages using data from processes driven in opposite directions is presented. Based on this estimator, bidirectional expressions for reconstructing free energies and potentials of mean force from single-molecule force spectroscopy—valid for biasing potentials of arbitrary stiffness—are developed. Numerical simulations on a model potential indicate that these methods perform better than unidirectional strategies.


Crooks’ path-ensemble average theorem (Eq. 1) encompasses a set of exact results in nonequilibrium statistical mechanics pertinent to systems driven from thermal equilibrium by a time-dependent external potential [1]. These include Jarzynski’s equality [2] and the Crooks fluctuation theorem [3], which relate equilibrium free energy differences to the nonequilibrium work distribution, as well as reweighting relations that allow one to recover arbitrary equilibrium ensemble averages from measurements of driven nonequilibrium processes [1]. Because of the intimate connection between such processes and molecular force spectroscopy, these theorems have been widely invoked to extract free energies and potentials of mean force (PMFs) from single-molecule pulling experiments [4–8].

While formally correct, the practical utility of these relations is limited by the presence of exponential averages of the work, which are dominated by rare events and therefore have notoriously slow convergence properties [9]. In order to improve their convergence, strategies such as work-weighted trajectory sampling [10–12] have been proposed. Here we suggest another method to accelerate the convergence of these averages: including trajectories from the reverse process in the forward path-ensemble. This is motivated in part by the observation that the exponential average of the work in the forward process is dominated by those rare trajectories that resemble time-reversed counterparts (“conjugate twins”) of typical trajectories generated by the reverse protocol [13]. Thus, our goals are to construct optimized forward path-ensemble average estimators that explicitly include such trajectories, and apply them to the problem of estimating free energies and potentials of mean force from single-molecule pulling experiments.

The starting point of our analysis is Crooks’ path-ensemble average theorem, which relates the forward average of an arbitrary functional $\mathcal{F} = \mathcal{F}[\Gamma]$ of the phase space trajectory $\Gamma = \{q(t), p(t)\}$ to its work-weighted average in the reverse process, namely [1]

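$$\langle \mathcal{F} \rangle_F = \left\langle \hat{\mathcal{F}}\, e^{-\beta (W + \Delta F)} \right\rangle_R. \tag{1}$$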

In the above, the forward average $\langle \cdots \rangle_F$ is an average over all trajectories (path-ensemble average) generated in the forward process, wherein an external parameter (e.g. the position of a harmonic trap in a single-molecule pulling experiment) is driven from the value A to B in τ units of time after equilibration at A, while $\langle \cdots \rangle_R$ is a similarly defined average in the reverse direction, from B to A. The total work $W[\Gamma]$ accumulated up to the final time τ is defined in terms of the time-dependent Hamiltonian $H = H(q(t), p(t); t)$ as $W = \int_0^{\tau} (\partial H/\partial t)\,dt$, while $\Delta F = F_B - F_A$ is the free energy difference between the equilibrium states corresponding to the endpoints A and B. Finally, the notation $\hat{\mathcal{F}} \equiv \mathcal{F}[\hat{\Gamma}]$ is a shorthand for the value of the functional when evaluated over the time-reversal of Γ, viz. $\hat{\Gamma} = \{q(\tau - t), -p(\tau - t)\}$.

By choosing $\mathcal{F}[\Gamma] = \delta[\Gamma - \Gamma']$ in Eq. (1) and using the property $W[\hat{\Gamma}] = -W[\Gamma]$, one obtains an identity between the distributions of trajectories in the two directions [1, 14],

$$\rho_F(\Gamma) = e^{\beta (W - \Delta F)}\, \rho_R(\hat{\Gamma}), \tag{2}$$

where $\rho_F(\Gamma)$ and $\rho_R(\Gamma)$ are the probabilities of observing a particular trajectory Γ in the forward and reverse processes, respectively. This result offers a means of achieving the aforementioned goal: trajectories from the reverse process can indeed be included in the forward path-ensemble when their density is reweighted by $e^{\beta (W - \Delta F)}$. Our next goal is to optimally combine direct estimates of $\rho_F(\Gamma)$ from forward processes with indirect estimates obtained from $\rho_R(\Gamma)$ via Eq. (2); this will be done with the weighted histogram analysis method (WHAM) [15, 16].

The objective of WHAM is to find an optimal (i.e. least variance) estimator for a desired probability distribution from a series of independent estimates of biased distributions, where “biased” here means that the distribution of interest is related to the remaining ones by a simple reweighting factor. To be specific, given a series of normalized distributions $\rho_i^b(x)$ of a random variable x, with i = 1, …, M, and M unbiasing relations of the form

$$\rho(x) = f_i(x)\, \rho_i^b(x), \tag{3}$$

where ρ(x) is the distribution of interest and $f_i(x)$ is the unbiasing factor for the i-th distribution, the WHAM strategy seeks a linear combination of M independent estimates of ρ(x) obtained from the measured biased distributions $\rho_i^b(x)$ via Eq. (3), such that its variance $\sigma^2[\rho(x)]$ is minimized. This results in [15, 16]

$$\rho(x) = \frac{\sum_{i=1}^{M} n_i\, \rho_i^b(x)}{\sum_{i=1}^{M} n_i\, f_i^{-1}(x)}, \tag{4}$$

where $n_i$ is the number of samples in the estimate of the i-th distribution. (For notational simplicity, here we do not distinguish the exact distribution and its sample estimate.) Applied to the problem of estimating $\rho_F(\Gamma)$ from $n_F$ forward and $n_R$ reverse trajectories, Eqs. (2)–(4) give an optimized estimator for the forward probability distribution of trajectories in terms of the measured forward and reverse densities:

$$\rho_F(\Gamma) = \frac{n_F\, \rho_F(\Gamma) + n_R\, \rho_R(\hat{\Gamma})}{n_F + n_R\, e^{-\beta (W - \Delta F)}}. \tag{5}$$
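As a sanity check on the combination rule of Eq. (4), the following minimal sketch applies it on a grid to two biased views of a known target density. The function name `wham_combine`, the Gaussian target, and the two Gaussian bias factors are illustrative assumptions, not part of the original work. Because the biased densities are constructed exactly here, the combination should return the target regardless of the sample counts.

```python
import math

def wham_combine(n, rho_b, f):
    """Eq. (4) on a common grid: sum_i n_i rho_i^b(x) / sum_i n_i / f_i(x)."""
    rho = []
    for k in range(len(rho_b[0])):
        num = sum(n_i * r_i[k] for n_i, r_i in zip(n, rho_b))
        den = sum(n_i / f_i[k] for n_i, f_i in zip(n, f))
        rho.append(num / den)
    return rho

# Hypothetical toy problem: a standard normal target on a uniform grid
dx = 0.02
x = [-4.0 + dx * k for k in range(401)]
target = [math.exp(-0.5 * xi**2) / math.sqrt(2.0 * math.pi) for xi in x]

# Two biased, normalized distributions rho_i^b = target * bias_i / Z_i,
# so that the unbiasing factor of Eq. (3) is f_i = Z_i / bias_i
rho_b, f = [], []
for center in (-1.0, 1.0):
    bias = [math.exp(-(xi - center)**2) for xi in x]
    Z = sum(t * b for t, b in zip(target, bias)) * dx
    rho_b.append([t * b / Z for t, b in zip(target, bias)])
    f.append([Z / b for b in bias])

# Sample counts n_i only set the relative weights of the two estimates
rho_est = wham_combine([100, 300], rho_b, f)
```

With noisy histograms in place of the exact `rho_b`, the same function weights each estimate by its sample count, which is the sense in which Eq. (5) mixes forward and reverse trajectory densities.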

We are now ready to derive the main results of our paper. Taking the average of $\mathcal{F}[\Gamma]$ using the optimized density from Eq. (5), we obtain the following estimator for the forward path-ensemble average of $\mathcal{F}$:

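$$\langle \mathcal{F} \rangle_F = \left\langle \frac{n_F\, \mathcal{F}}{n_F + n_R\, e^{-\beta (W - \Delta F)}} \right\rangle_F + \left\langle \frac{n_R\, \hat{\mathcal{F}}}{n_F + n_R\, e^{\beta (W + \Delta F)}} \right\rangle_R, \tag{6}$$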

where in the last average we have again used the property that the total work is odd under time-reversal. (An analogous expression for the reverse path-ensemble can be obtained by switching the definitions of forward and reverse.) This general result forms the basis of our bidirectional method, and different applications can be obtained with suitable choices of $\mathcal{F}$.
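A sample version of Eq. (6) can be sketched as follows. Each path-ensemble average becomes a sum over the measured trajectories divided by their number, so the prefactors $n_F$ and $n_R$ in the numerators cancel. The function name `bidirectional_average`, the argument layout, and the Gaussian toy work distributions (chosen so that they satisfy the Crooks fluctuation theorem) are assumptions made for illustration. ΔF is taken as already known.

```python
import math
import random

def bidirectional_average(F_fwd, W_fwd, Fhat_rev, W_rev, dF, beta=1.0):
    """Sample estimate of the forward path-ensemble average, Eq. (6).

    F_fwd[i]    : functional evaluated on forward trajectory i
    W_fwd[i]    : total work of forward trajectory i
    Fhat_rev[j] : functional evaluated on the time reversal of reverse trajectory j
    W_rev[j]    : total work of reverse trajectory j, as measured in the reverse process
    dF          : free energy difference F_B - F_A (assumed known, e.g. from BAR)
    """
    n_f, n_r = len(W_fwd), len(W_rev)
    fwd = sum(F / (n_f + n_r * math.exp(-beta * (W - dF)))
              for F, W in zip(F_fwd, W_fwd))
    rev = sum(Fh / (n_f + n_r * math.exp(beta * (W + dF)))
              for Fh, W in zip(Fhat_rev, W_rev))
    return fwd + rev

# Hypothetical toy data: Gaussian work distributions with means
# +dF + beta*sigma^2/2 (forward) and -dF + beta*sigma^2/2 (reverse)
rng = random.Random(0)
beta, dF, sigma, n = 1.0, 2.0, 1.0, 5000
W_fwd = [rng.gauss(dF + 0.5 * beta * sigma**2, sigma) for _ in range(n)]
W_rev = [rng.gauss(-dF + 0.5 * beta * sigma**2, sigma) for _ in range(n)]

# Choose the functional exp(-beta W); the work is odd under time reversal,
# so on reverse trajectories the time-reversed functional is exp(+beta W_rev)
F_fwd = [math.exp(-beta * W) for W in W_fwd]
Fhat_rev = [math.exp(beta * W) for W in W_rev]
estimate = bidirectional_average(F_fwd, W_fwd, Fhat_rev, W_rev, dF, beta)
```

For this choice of functional the exact forward average is $e^{-\beta \Delta F}$ by Jarzynski's equality, so `estimate` can be compared with `math.exp(-beta * dF)`. The plain exponentials can overflow for very large work values; a production implementation would likely use a numerically stable logistic form.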

Our first example is concerned with free energy differences, where we choose $\mathcal{F} = e^{-\beta W_0^t}$, with $W_a^b = W_a^b[\Gamma]$ defined as the partial work between times a and b along the trajectory Γ, i.e. $W_a^b \equiv \int_a^b (\partial H/\partial t)\,dt$. (Note that, according to this notation, the total work W coincides with $W_0^{\tau}$.) Invoking Jarzynski’s equality $e^{-\beta (F_t - F_A)} = \langle e^{-\beta W_0^t} \rangle_F$ for the l.h.s. of Eq. (6), this choice of $\mathcal{F}$ gives

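$$e^{-\beta \Delta F_t} = \left\langle \frac{n_F\, e^{-\beta W_0^t}}{n_F + n_R\, e^{-\beta (W - \Delta F)}} \right\rangle_F + \left\langle \frac{n_R\, e^{\beta W_{\tau - t}^{\tau}}}{n_F + n_R\, e^{\beta (W + \Delta F)}} \right\rangle_R, \tag{7}$$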

where $\Delta F_t = F_t - F_A$ is the free energy difference between the equilibrium states defined by the Hamiltonians $H(q, p; t)$ and $H(q, p; 0)$, and in the last average we have used the property $W_0^t[\hat{\Gamma}] = -W_{\tau - t}^{\tau}[\Gamma]$. For the particular cases where t = 0 or t = τ, this result can be rearranged to yield the Bennett Acceptance Ratio (BAR) formula for ΔF [17], as generalized to nonequilibrium processes by Crooks [1] (for a multistate extension, see [18]). The above equation further generalizes BAR to estimate intermediate free energy differences $\Delta F_t$. Operationally, when estimating an intermediate free energy difference, we must first estimate ΔF to use in the r.h.s. of Eq. (7). This can be accomplished with BAR, which has been shown to be a maximum likelihood estimator of ΔF [19].
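The two-step procedure described above (first ΔF from BAR, then $\Delta F_t$ from Eq. (7)) might be sketched as below. The BAR condition is written here as the requirement that Eq. (6) with $\mathcal{F} = 1$ returns exactly 1, which is algebraically equivalent to the usual acceptance-ratio equation, and it is solved by bisection. The function names, the bisection bracket, and the Gaussian toy data are assumptions for illustration only.

```python
import math
import random

def fermi(x):
    """Numerically stable 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def bar_delta_f(W_fwd, W_rev, beta=1.0, n_iter=200):
    """Solve the BAR self-consistency condition for dF by bisection."""
    n_f, n_r = len(W_fwd), len(W_rev)
    m = math.log(n_r / n_f)

    def g(dF):
        # Eq. (6) with functional 1, minus 1; strictly decreasing in dF
        s = sum(fermi(-beta * (W - dF) + m) for W in W_fwd)
        s += sum(fermi(beta * (W + dF) + m) for W in W_rev)
        return s / n_f - 1.0

    lo, hi = -1.0, 1.0
    while g(lo) < 0.0:      # expand until the root is bracketed
        lo *= 2.0
    while g(hi) > 0.0:
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def delta_f_t(W0t_fwd, W_fwd, Wtail_rev, W_rev, dF, beta=1.0):
    """Eq. (7): W0t_fwd[i] is W_0^t on forward trajectory i and
    Wtail_rev[j] is W_{tau-t}^{tau} on reverse trajectory j."""
    n_f, n_r = len(W_fwd), len(W_rev)
    m = math.log(n_r / n_f)
    s = sum(math.exp(-beta * w) * fermi(-beta * (W - dF) + m)
            for w, W in zip(W0t_fwd, W_fwd))
    s += sum(math.exp(beta * w) * fermi(beta * (W + dF) + m)
             for w, W in zip(Wtail_rev, W_rev))
    return -math.log(s / n_f) / beta

# Hypothetical Gaussian work data consistent with Crooks' theorem, true dF = 2
rng = random.Random(0)
W_fwd = [rng.gauss(2.5, 1.0) for _ in range(5000)]
W_rev = [rng.gauss(-1.5, 1.0) for _ in range(5000)]
dF_bar = bar_delta_f(W_fwd, W_rev)

# Endpoint checks: t = tau uses the total work, t = 0 uses zero partial work
dF_tau = delta_f_t(W_fwd, W_fwd, W_rev, W_rev, dF_bar)
dF_zero = delta_f_t([0.0] * 5000, W_fwd, [0.0] * 5000, W_rev, dF_bar)
```

With ΔF fixed at its BAR value, Eq. (7) reproduces that value at t = τ and returns zero at t = 0, which is a convenient consistency check before evaluating intermediate times.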

Free energy differences can also be estimated using a cumulant expansion of Jarzynski’s equality [7]. In order to analyze bidirectional data with this approach, one should apply Eq. (6) to estimate moments of the work distribution, choosing $\mathcal{F} = W^n$. This is more rigorous than a method which applies the Crooks fluctuation theorem between states which are not in equilibrium [20]. A bidirectional estimator for the energetic contribution to $\Delta F_t$ can be obtained by choosing $\mathcal{F} = H(q, p; t)\, e^{-\beta (W_0^t - \Delta F_t)}$ in Eq. (6). This results in the average energy at time t, as was shown in the unidirectional case [21].
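To illustrate the moment-based route, the sketch below estimates forward moments of the partial work $W_0^t$ with Eq. (6) and inserts them into a second-order cumulant expansion, $\Delta F_t \approx \langle W_0^t \rangle - \beta\,\mathrm{var}(W_0^t)/2$. Truncating at second order, the function name `bidirectional_moment`, and the Gaussian toy data are assumptions for illustration; the endpoint ΔF is taken as already estimated (for example with BAR). On a reverse trajectory the time-reversed functional is $(-W_{\tau - t}^{\tau})^n$, since the work is odd under time reversal.

```python
import math
import random

def bidirectional_moment(W0t_fwd, W_fwd, Wtail_rev, W_rev, dF, order, beta=1.0):
    """Forward moment <(W_0^t)^order>_F estimated via Eq. (6).

    W0t_fwd[i]  : partial work W_0^t on forward trajectory i
    Wtail_rev[j]: partial work W_{tau-t}^{tau} on reverse trajectory j
    W_fwd, W_rev: total works, which set the trajectory weights
    """
    n_f, n_r = len(W_fwd), len(W_rev)
    fwd = sum(w**order / (n_f + n_r * math.exp(-beta * (W - dF)))
              for w, W in zip(W0t_fwd, W_fwd))
    rev = sum((-w)**order / (n_f + n_r * math.exp(beta * (W + dF)))
              for w, W in zip(Wtail_rev, W_rev))
    return fwd + rev

# Hypothetical Gaussian work data (true dF = 2, unit variance), evaluated at
# t = tau so that the partial work equals the total work
rng = random.Random(0)
beta, dF, n = 1.0, 2.0, 20000
W_fwd = [rng.gauss(2.5, 1.0) for _ in range(n)]
W_rev = [rng.gauss(-1.5, 1.0) for _ in range(n)]

m1 = bidirectional_moment(W_fwd, W_fwd, W_rev, W_rev, dF, 1, beta)
m2 = bidirectional_moment(W_fwd, W_fwd, W_rev, W_rev, dF, 2, beta)
dF_cumulant = m1 - 0.5 * beta * (m2 - m1**2)
```

For Gaussian work the second-order expansion is exact, so `dF_cumulant` should land close to the input ΔF here; for non-Gaussian work distributions the truncation introduces its own bias.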

In the context of single-molecule pulling experiments, the system is typically driven out of equilibrium by a time-dependent potential $V_t = V(z_t; t)$ acting on a collective coordinate $z_t = z(q(t))$ (e.g. the end-to-end distance of a protein) such that the total Hamiltonian is $H = H_0 + V_t$, where $H_0$ is the (time-independent) Hamiltonian in the absence of the external perturbation. In this case, the free energy difference $\Delta F_t$ involves the equilibrium states of the system corresponding to the potentials $V_t$ and $V_0$. However, one is often more interested in the potential of mean force $G_0(z)$ of the unperturbed Hamiltonian, i.e. in the effective potential dictating the equilibrium distribution of z-values in the absence of the external potential. Although in the limit of sufficiently stiff potentials the free energy difference approaches the PMF [7], this approximation fails for soft springs [22] such as those used in optical tweezer experiments [8], in which case one should use more rigorous methods. One approach starts from the observation that the equilibrium distribution of z-values in the absence of the external potential (i.e. the unbiased distribution) is given by $\rho_0(z) = C^{-1} e^{\beta V(z;t)} \langle \delta(z - z_t)\, e^{-\beta W_0^t} \rangle_F$ [4, 5], where $C = \langle e^{-\beta (W_0^t - V_t)} \rangle_F$ is an overall normalization constant, which can be shown to be independent of t. With this result in mind, a bidirectional estimator for $\rho_0(z)$ can be obtained from Eq. (6) by choosing $\mathcal{F} = \delta(z - z_t)\, e^{-\beta W_0^t}$. Moreover, since this expression for $\rho_0(z)$ is correct for all times t, different estimates of $\rho_0(z)$ can be obtained from different time-slices during the pulling process, and these can in turn be combined according to the WHAM prescription (Eqs. (3) and (4)). Indeed, rewriting the above result for $\rho_0(z)$ in the form of Eq. (3), viz.

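$$\rho_0(z) = C^{-1} e^{\beta [V(z;t) - \Delta F_t]} \left[ \frac{\langle \delta(z - z_t)\, e^{-\beta W_0^t} \rangle_F}{e^{-\beta \Delta F_t}} \right], \tag{8}$$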

where the factor $e^{-\beta \Delta F_t} = \langle e^{-\beta W_0^t} \rangle_F$ is introduced to normalize the distribution in square brackets, one arrives at the Hummer-Szabo estimator for $\rho_0(z)$ [4, 5],

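$$e^{-\beta G_0(z)} = \frac{\sum_t \dfrac{\langle \delta(z - z_t)\, e^{-\beta W_0^t} \rangle_F}{e^{-\beta \Delta F_t}}}{\sum_t e^{-\beta [V(z;t) - \Delta F_t]}}, \tag{9}$$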

where $G_0(z) \equiv -\beta^{-1} \ln \rho_0(z)$ is defined up to an additive constant. The above PMF formalism has been extended to account for multiple pulling protocols [23] and multiple dimensions [24].

In order to optimally include trajectories from the reverse perturbation in Eq. (9), we choose $\mathcal{F} = \delta(z - z_t)\, e^{-\beta W_0^t}$ in Eq. (6) and substitute the ensuing expression for $\langle \delta(z - z_t)\, e^{-\beta W_0^t} \rangle_F$ in Eq. (9). This leads to our bidirectional PMF estimator:

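$$e^{-\beta G_0(z)} = \frac{\sum_t \left[ \left\langle \dfrac{n_F\, \delta(z - z_t)\, e^{-\beta W_0^t}}{n_F + n_R\, e^{-\beta (W - \Delta F)}} \right\rangle_F + \left\langle \dfrac{n_R\, \delta(z - z_{\tau - t})\, e^{\beta W_{\tau - t}^{\tau}}}{n_F + n_R\, e^{\beta (W + \Delta F)}} \right\rangle_R \right] \Big/ e^{-\beta \Delta F_t}}{\sum_t e^{-\beta [V(z;t) - \Delta F_t]}}, \tag{10}$$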

where $\Delta F_t$ is estimated via Eq. (7) and $\Delta F = \Delta F_{\tau}$ via BAR. (As in WHAM, $\Delta F_t$ can also be estimated self-consistently by iterative cycles of Eq. (10) and numerically integrating $e^{-\beta \Delta F_t} = \int e^{-\beta [G_0(z) + V(z;t)]}\,dz \,\big/ \int e^{-\beta [G_0(z') + V(z';0)]}\,dz'$.) If we switch the definitions of forward and reverse, then $G_0(z)$ differs by the constant ΔF.
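A histogram version of Eq. (10) might look like the sketch below, in which the delta functions are replaced by counts in uniform bins (the constant bin width only shifts $G_0$ by an additive constant). The function name `bidirectional_pmf`, the array layout, and the callable `trap(z, k)` standing in for $V(z; t_k)$ are assumptions for illustration. Reverse trajectories are stored on the reverse process's own clock, so slice k of the forward protocol corresponds to index K − k of a reverse trajectory.

```python
import math

def bidirectional_pmf(z_f, w_f, z_r, w_r, dF, dF_t, trap, edges, beta=1.0):
    """Histogram estimate of G0 per bin from Eq. (10), up to a constant.

    z_f[i][k], w_f[i][k]: position and cumulative work W_0^{t_k} of forward
        trajectory i at time slice k = 0..K; z_r, w_r: same for reverse runs.
    dF: F_B - F_A; dF_t[k]: free energy difference at slice k (e.g. Eq. (7)).
    trap(z, k): bias potential V(z; t_k) of the forward protocol.
    edges: uniform bin edges in z.
    """
    n_f, n_r = len(z_f), len(z_r)
    n_bins, K = len(edges) - 1, len(dF_t) - 1
    width = edges[1] - edges[0]

    def bin_of(z):
        b = int((z - edges[0]) // width)
        return b if 0 <= b < n_bins else None

    # Trajectory weights depend only on the total work of each trajectory
    wt_f = [1.0 / (n_f + n_r * math.exp(-beta * (w[K] - dF))) for w in w_f]
    wt_r = [1.0 / (n_f + n_r * math.exp(beta * (w[K] + dF))) for w in w_r]

    num = [0.0] * n_bins
    den = [0.0] * n_bins
    for k in range(K + 1):
        norm = math.exp(beta * dF_t[k])          # division by exp(-beta dF_t)
        for z, w, wt in zip(z_f, w_f, wt_f):
            b = bin_of(z[k])
            if b is not None:
                num[b] += norm * wt * math.exp(-beta * w[k])
        for z, w, wt in zip(z_r, w_r, wt_r):
            j = K - k                            # reverse clock: tau - t
            b = bin_of(z[j])
            if b is not None:
                # partial work W_{tau-t}^{tau} = total minus work up to index j
                num[b] += norm * wt * math.exp(beta * (w[K] - w[j]))
        for b in range(n_bins):
            z_mid = edges[b] + 0.5 * width
            den[b] += math.exp(-beta * (trap(z_mid, k) - dF_t[k]))
    # Bins never visited get an infinite PMF
    return [-math.log(a / d) / beta if a > 0.0 else math.inf
            for a, d in zip(num, den)]

# Tiny hand-checkable case (hypothetical): no trap, no work, two time slices
G = bidirectional_pmf(z_f=[[0.1, 0.1]], w_f=[[0.0, 0.0]],
                      z_r=[[0.1, 0.9]], w_r=[[0.0, 0.0]],
                      dF=0.0, dF_t=[0.0, 0.0],
                      trap=lambda z, k: 0.0, edges=[0.0, 0.5, 1.0])
```

In the toy case three of the four recorded positions fall in the first bin, so the two PMF values differ by ln 3. Passing an empty reverse data set reduces the weights to $1/n_F$, which recovers the unidirectional form of Eq. (9).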

To demonstrate these results, we perform Brownian dynamics simulations on a one-dimensional potential whose unperturbed Hamiltonian is $H_0(z) = (5z^3 - 10z + 3)z$ (as used by Hummer [25]). The time-dependent Hamiltonian is $H(z; t) = H_0(z) + V(z; t)$, with $V(z; t) = k_s (z - \bar{z}(t))^2/2$ and $k_s$ chosen as 15. In the forward direction, the center of the potential $\bar{z}(t)$ is linearly varied from −1.5 to 1.5 over 750 steps; it is varied from 1.5 to −1.5 in the reverse direction. Before pulling, trajectories are equilibrated for 100 steps. Dynamics are run with a diffusion coefficient D = 1, temperature parameter β = 1, and time step Δt = 0.001. Work is calculated with the discrete formula $W_a^b = \sum_{t=a}^{b - \Delta t} [H(z(t + \Delta t); t + \Delta t) - H(z(t + \Delta t); t)]$.
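The simulation protocol described above could be sketched as follows. The Euler-Maruyama update, the choice to start each trajectory at the trap center before equilibration, and all function and constant names are assumptions; the original integrator is not specified beyond the parameters quoted in the text. The work increment follows the discrete formula given above: the particle is moved with the trap at its old position, then the trap is shifted and the energy change at the new particle position is accumulated.

```python
import math
import random

KS, BETA, D, DT = 15.0, 1.0, 1.0, 0.001   # parameters quoted in the text
N_PULL, N_EQUIL = 750, 100

def dH0_dz(z):
    # H0(z) = (5 z^3 - 10 z + 3) z = 5 z^4 - 10 z^2 + 3 z
    return 20.0 * z**3 - 20.0 * z + 3.0

def trap_energy(z, center):
    return 0.5 * KS * (z - center)**2

def step(z, center, rng):
    """One Euler-Maruyama step of overdamped Langevin dynamics (assumed scheme)."""
    force = -dH0_dz(z) - KS * (z - center)
    noise = math.sqrt(2.0 * D * DT) * rng.gauss(0.0, 1.0)
    return z + BETA * D * force * DT + noise

def pull(rng, forward=True):
    """Return positions and cumulative work at each of the N_PULL + 1 time slices."""
    start, end = (-1.5, 1.5) if forward else (1.5, -1.5)
    z = start                              # assumption: begin at the trap center
    for _ in range(N_EQUIL):
        z = step(z, start, rng)
    zs, works, w = [z], [0.0], 0.0
    for k in range(N_PULL):
        c_old = start + (end - start) * k / N_PULL
        c_new = start + (end - start) * (k + 1) / N_PULL
        z = step(z, c_old, rng)            # move with the trap at its old center
        w += trap_energy(z, c_new) - trap_energy(z, c_old)   # then shift the trap
        zs.append(z)
        works.append(w)
    return zs, works

rng = random.Random(1)
fwd = [pull(rng, forward=True) for _ in range(50)]
rev = [pull(rng, forward=False) for _ in range(50)]
mean_W_fwd = sum(w[-1] for _, w in fwd) / len(fwd)
mean_W_rev = sum(w[-1] for _, w in rev) / len(rev)
```

The sum `mean_W_fwd + mean_W_rev` estimates the total dissipated work over a forward and reverse cycle and should be positive. The per-slice positions and cumulative works are the inputs needed by Eqs. (7) and (10).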

For $\Delta F_t$ estimates on this model system, our bidirectional strategy outperforms existing methods (Fig. 1). Unidirectional estimates of $\Delta F_t$ based on Jarzynski’s equality are markedly biased as the states are further perturbed from the starting equilibrium state. Chelli and coworkers have also developed an asymptotically correct bidirectional estimator that reduces to BAR at the end states [26]. However, the derivation offered by these authors is limited to deterministic systems, and although we have empirical evidence that their estimator approaches the correct $\Delta F_t$ for Brownian simulations in the limit of a large number of trajectories (data not shown), it leads to a more pronounced bias than Eq. (7) for simulations under the above conditions (Fig. 1). Since a general derivation of the results of Ref. [26] is not yet available, at present it is difficult to identify the source of discrepancy between these two estimators, and we leave this question for future investigations.

FIG. 1.


(Color online) Comparison of $\Delta F_t$ estimators: Jarzynski’s equality applied to 500 forward (rightward triangles) or reverse pullings (leftward triangles, time reversed so that $\Delta F_t = \Delta F$ at t = 0.75); our optimized estimator, Eq. (7) (upward triangles) and Eq. (16) of Ref. [26] (dotted line) applied to 250 pullings in each direction. The exact $\Delta F_t$, calculated by applying Gauss-Kronrod quadrature in MATLAB 7.5 to numerically integrate $\int e^{-\beta H(z;t)}\,dz$ over $\bar{z}(t) - 5 < z < \bar{z}(t) + 5$, is shown as a shaded line.

Our PMF reconstruction methods also compare favorably with unidirectional methods (Fig. 2). As with $\Delta F_t$, reconstructed PMFs from separate forward and reverse processes increasingly overestimate the PMF farther from the region sampled by the original state. In contrast, our bidirectional formula, Eq. (10), optimally combines the data to reduce this bias. The method of Chelli and coworkers for PMF reconstruction requires a stiff-spring assumption, and hence is not applicable here.

FIG. 2.


(Color online) Comparison of PMF estimators: (a) Hummer and Szabo’s method, Eq. (9), applied to 500 forward (rightward triangles) or reverse (leftward triangles) pullings. (b) Our estimator, Eq. (10), applied to 250 forward and 250 reverse pullings (upward triangles). The shaded line is the exact PMF. PMFs from forward and bidirectional data are shifted to align with the exact PMF at z = −1.25; for the reverse, they are aligned at z = 1.25. In (b), the harmonic potential used in our pullings is shown as a dashed line.

In summary, building on the observation that the convergence of Jarzynski’s nonequilibrium work average is dominated by time-reversed counterparts of trajectories generated via the reverse process [13], we have introduced a formula that optimally includes such trajectories in generic nonequilibrium path-averages (Eq. (6)). As an application of this result, we have derived a bidirectional estimator for free energy differences in terms of nonequilibrium measurements of work (Eq. (7)). Although it reduces to BAR for the special case of endpoint free energy differences ΔF, our formula also allows for the estimation of intermediate values $\Delta F_t$ of the free energy during the switching process. When applied to the problem of estimating potentials of mean force $G_0(z)$ in nonequilibrium force spectroscopy, our methods yield a bidirectional estimator for $G_0(z)$ that optimally combines time-slices from forward and reverse measurements of position and work (Eq. (10)). Numerical comparison of our formula with unidirectional estimates based on the Jarzynski equality [2] or the Hummer-Szabo method [4, 5] reveals that our reconstructed free energy differences are of better overall quality than those from these unidirectional estimators, which are increasingly biased as one drives the system farther away from its original equilibrium state. It has been noted that faster pullings farther from equilibrium contain less instrument noise and therefore lead to more accurate free energy estimates [27]. It is thus expected that our bidirectional estimators will further improve the quality of such experimental estimates by appreciably reducing the finite-sample bias due to fast pullings.

Acknowledgments

We thank Christopher Jarzynski and Attila Szabo for helpful discussions. This research was supported by the Intramural Research Program of the NIH, NIDDK.

References
