Author manuscript; available in PMC: 2013 Feb 13.
Published in final edited form as: Phys Rev Lett. 2011 Mar 29;106(13):138102. doi: 10.1103/PhysRevLett.106.138102

Compaction and Tensile Forces Determine the Accuracy of Folding Landscape Parameters from Single Molecule Pulling Experiments

Greg Morrison 1,2, Changbong Hyeon 3, Michael Hinczewski 2, D Thirumalai 2
PMCID: PMC3571105  NIHMSID: NIHMS298082  PMID: 21517423

Abstract

We establish a framework for assessing whether the transition state location of a biopolymer, which can be inferred from single molecule pulling experiments, corresponds to the ensemble of structures that have equal probability of reaching either the folded or unfolded states (Pfold = 0.5). Using results for the forced unfolding of an RNA hairpin, an exactly soluble model, and an analytic theory, we show that Pfold is solely determined by s, an experimentally measurable molecular tensegrity parameter, which is a ratio of the tensile force and a compaction force that stabilizes the folded state. Applications to folding landscapes of DNA hairpins and a leucine zipper with two barriers provide a structural interpretation of single molecule experimental data. Our theory can be used to assess whether molecular extension is a good reaction coordinate using measured free energy profiles.


The response of biopolymers to mechanical force (f), at the single molecule level, has produced direct estimates of many features of their folding landscapes, which in turn has given a deeper understanding of how proteins and RNA fold. In particular, single molecule pulling experiments directly measure the distribution of forces needed to rupture biomolecules, roughness, and shapes of folding landscapes [1–4]. Such measurements have made it possible to decipher the molecular origin of elasticity and mechanical stability of the building blocks of life, which is the first step in describing how they interact to function in the cellular context. The major challenge is to provide a firm theoretical basis for interpreting the physical meaning and reliability of the folding landscape parameters that are extracted from trajectories that project dynamics in multidimensional space onto a one-dimensional molecular extension, which is conjugate to f.

The key characteristics of the folding landscape of biomolecules that can be extracted from single molecule force spectroscopy measurements are the f-dependent position of the transition state (TS), the distance (Δx = xTS − xN) from the ensemble of conformations that define the basin of attraction corresponding to the native states (NBA), and the free energy barrier (ΔF). The assumption in the analysis of the single molecule force spectroscopy data is that the molecular extension is a good reaction coordinate for RNA and proteins, which implies that a single degree of freedom accurately describes the behavior of the multiple degrees of freedom explored by the biomolecule. The structural meaning of Δx, a parameter that is unique to single molecule force spectroscopy, has never been made clear. Despite many subtleties in determining Δx from measurements [5–7], xTS is most easily identified as a local maximum of the free energy profile at the transition midforce fm, F(R|fm), which can be constructed by measuring the statistics of end-to-end distance P(R) at f = fm [1–4,8,9]. This method has been experimentally used to obtain sequence-dependent folding landscapes of DNA hairpins [2] and, more recently, proteins [3]. In order to render physical meaning to Δx we address two questions here: (i) Does Δx represent the structures in the transition state ensemble (TSE)? The TSE is a subset of structures that have equal probability of reaching the NBA or unfolded basin of attraction (UBA). (ii) Can a molecular tensegrity (short for tensional integrity) parameter [10] s = fc/fm, expressing balance between the internal compaction force fm = ΔFUF/ΔxUF and the tensile force fc = ΔF(fm)/Δx(fm), describe the adequacy of Δx(fm) in describing the TSE structures?
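As a minimal sketch of how xTS and Δx might be read off an extension distribution, the Python snippet below converts P(R) into βF(R) = −log P(R) and takes the highest interior local maximum as the barrier top. The two-Gaussian density, its parameters, and the helper names (free_energy_profile, barrier_top, two_state_density) are illustrative assumptions, not data or code from this work.

```python
import math

def free_energy_profile(prob):
    """Return beta*F = -log P for each bin (infinite where P = 0)."""
    return [(-math.log(p) if p > 0 else math.inf) for p in prob]

def barrier_top(r_grid, beta_f):
    """Return R at the highest interior local maximum of beta*F, or None."""
    best = None
    for i in range(1, len(r_grid) - 1):
        is_peak = beta_f[i] > beta_f[i - 1] and beta_f[i] >= beta_f[i + 1]
        if is_peak and math.isfinite(beta_f[i]):
            if best is None or beta_f[i] > beta_f[best]:
                best = i
    return None if best is None else r_grid[best]

def two_state_density(r, r_n=4.0, r_u=9.0, width=0.8):
    """Hypothetical equal-weight two-basin density (assumption, in nm)."""
    gauss = lambda mu: math.exp(-0.5 * ((r - mu) / width) ** 2)
    return 0.5 * (gauss(r_n) + gauss(r_u))

# Extension grid from 0.05 nm to 12.95 nm in 0.05 nm steps.
r_grid = [0.05 * i for i in range(1, 260)]
density = [two_state_density(r) for r in r_grid]
norm = sum(density)
prob = [d / norm for d in density]

beta_f = free_energy_profile(prob)
r_ts = barrier_top(r_grid, beta_f)   # barrier top of the synthetic profile
delta_x = r_ts - 4.0                 # distance from the native minimum
```

For this symmetric synthetic density the barrier top falls midway between the two basins; with measured data, P(R) would come from a histogram of R(t) at f = fm.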

We use simulations of an RNA hairpin and an exactly soluble model, both of which are apparent two-state folders as indicated by F(R|f), to answer the two questions posed above. The TS is a surface in the multidimensional folding landscape (stochastic separatrix [11]) across which the flux to the NBA and the UBA is identical. This implies that the fraction of folding trajectories corresponding to Δx that start from the TS should have equal probability (Pfold ≈ 0.5 [12]) of reaching the NBA and the UBA [12,13]. At f = fm, the mean dwell times in the NBA and the basin of attraction corresponding to unfolded conformations (UBA) are identical, so that $\int_0^{x_{TS}} dR\,P(R) = \int_{x_{TS}}^{\infty} dR\,P(R) = 0.5$. However, it is unclear whether or not the barrier top position is consistent with the requirement Pfold ≈ 0.5 in force experiments.

To directly ascertain whether the barrier top position denotes the TSE (requiring Pfold ≈ 0.5 in force experiments), we study folding of P5GA, an RNA hairpin, for which the NBA and UBA are equally populated at fm = 14.7 pN [8,9]. Both free energy profiles and the kinetics predicted by Kramers’ theory show excellent agreement with the simulation results [9] and formally establish that extension R is a good reaction coordinate for describing hopping kinetics at f ≈ fm. In practice, experimental time traces that have a number of transitions such as the one in Fig. 1(a) can be used to estimate Pfold. With absorbing boundary conditions imposed at RN = 2 nm and RU = 7.5 nm [Fig. 1(c)], we directly count the number of molecules from 47 points belonging to the TS region [R = (4.1–4.2) nm; Fig. 1(c)] that reach R < RN (folded) and R > RU (unfolded). Although the R distribution of the 47 points is uniform [Fig. 1(a)], we obtained Pfold ≈ 0.74. To determine Pfold by using the ensemble method, we launched 100 trajectories from each of the 47 structures and monitored their evolution [Fig. 1(b)] by using Brownian dynamics simulations [14] with the multidimensional energy function for the hairpin [8]. We find that Pfold = 0.65 [Fig. 1(c)], which is similar to the value obtained by analyzing the folding trajectory. An examination of the individual trajectories reveals that many molecules, initially with a gradient toward the UBA, recross the transition barrier to reach the NBA [blue trajectories in Fig. 1(b)]. Conversely, most of the molecules directly reach R = RN if they initially fall into the NBA, showing few recrossing events. Although the precise percentage of molecules reaching the UBA or NBA depends on the particular value of the boundary (RN and RU), our simulations emphasize the importance of the recrossing dynamics, which is known to cause significant deviations from the transition state theory.
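The trajectory-counting estimate described above can be sketched as follows. This is a hedged illustration only: the function name pfold_from_trace and the short trace are invented for the example, and the window and boundaries simply reuse the values quoted in the text. Each frame inside the TS window is assigned the first absorbing boundary that the trace reaches afterwards.

```python
def pfold_from_trace(trace, ts_window, r_n, r_u):
    """Estimate P_fold from one extension time trace R(t).

    Every frame with R inside ts_window is a starting point; its fate is the
    first later frame with R <= r_n (folded) or R >= r_u (unfolded).
    Frames with no later absorption are ignored.
    Returns (p_fold, number_of_counted_points).
    """
    lo, hi = ts_window
    fate = None              # next absorbing boundary, seen scanning backwards
    folded = unfolded = 0
    for r in reversed(trace):
        if r <= r_n:
            fate = "N"
        elif r >= r_u:
            fate = "U"
        elif lo <= r <= hi and fate is not None:
            if fate == "N":
                folded += 1
            else:
                unfolded += 1
    total = folded + unfolded
    p_fold = folded / total if total else float("nan")
    return p_fold, total

# Made-up trace in nm (assumption, not simulation output from this work).
trace = [2.0, 3.0, 4.15, 3.0, 1.9, 3.5, 4.1, 5.0, 7.6,
         6.0, 4.2, 4.15, 3.0, 1.5, 4.15]
p_fold, n_points = pfold_from_trace(trace, (4.1, 4.2), r_n=2.0, r_u=7.5)
```

The backward scan visits each frame once, so long traces with many hopping events remain cheap to analyze. The last frame of this toy trace sits in the window but never reaches a boundary, so it is not counted.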

FIG. 1.


(color online). Relaxation dynamics of the ensemble of P5GA hairpins from the barrier top of F(R|fm). (a) R(t) at f = fm. TSE in the range R = (4.1–4.2) nm are shown in the green box with a uniform R distribution. (b) Starting from the TSE of hairpins from (a), 35% of the trajectories (red) reach UBA and 65% of the trajectories (blue) reach NBA. (c) Free energy profile at f = fm. (d) TSE of P5GA has a broad distribution in the number of base pairs (Nbp).

Deviation from Pfold ≈ 0.50 suggests that the global coordinate R alone is not sufficient to rigorously describe the hopping kinetics of P5GA. At least one other auxiliary coordinate is needed, and for structural reasons we take it to be the number Nbp of base pairs [2]. The broad asymmetric distribution of Nbp within the narrow TS region [R = (4.1–4.2) nm] implied by F(R|fm) [Fig. 1(d)] shows that hopping kinetics at fm should be described by a multidimensional (at least two-dimensional) folding landscape even though the f-dependent rates of hopping between the NBA and the UBA can be reliably predicted by using F(R|fm) [9].

To further examine whether the R coordinate alone is sufficient to determine the TSE structures, we consider an analytically solvable generalized Rouse model (GRM) [9] that has a single bond in the interior of the chain whose presence corresponds to the NBA. The GRM Hamiltonian [15]

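$$\beta H = \frac{3}{2a^2}\int_0^N ds\,\dot{\mathbf{r}}^2(s) - \beta\mathbf{f}\cdot\int_0^N ds\,\dot{\mathbf{r}}(s) + \beta V_c\big[|\mathbf{r}(s_1)-\mathbf{r}(s_2)|\big], \tag{1}$$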

where Vc[x] = k(x² − c²)/2 for x ≤ c and Vc[x] = 0 for x > c, describes a simple Gaussian chain under tension with an additional cutoff harmonic interaction at the interior points s1 and s2. The distribution of R for the GRM, with Δs = |s2 − s1|, is

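$$P(R;\Delta s) \propto R\,\sinh(\beta f R)\,e^{-3R^2/2(N-\Delta s)a^2}\int_0^\infty dx\,x\,\sinh\!\left[\frac{3Rx}{(N-\Delta s)a^2}\right] e^{-3x^2 N/2\Delta s(N-\Delta s)a^2 - \beta V_c[x]}. \tag{2}$$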

To obtain Pfold we set N = 22 for the number of bonds, a = 0.545 nm for their spacing, β = 1/kBT, βk = 1.65 nm⁻², c = 4 nm for the strength and cutoff distance of the interior bond interaction, respectively, and Δs = 18. For this set of parameters, the transition midforce [where P(x < c) = P(x > c)] is fm ≈ 16.8 pN. The two minima and the position of the barrier top of the free energy βF(R|fm) = −log[P(R)] are easily determined as RN ≈ 3.91 nm, RU ≈ 9.39 nm, and RTS = 6.00 nm, respectively [Fig. 2(a)].

FIG. 2.


(color online). Relaxation dynamics of GRM chains from the barrier top of βF(R|fm). (a) βF(R|fm) for the GRM with interior interaction. Parameters are listed in the text. Red symbols are from the simulation result sampled by using the chain with the Hamiltonian of Eq. (1). The solid line is the theoretical fit. The arrow shows the position of the barrier top. (b) The distribution of interior distance at R = RTS. The black line is the theoretical prediction. (c) The initial distribution of R prior to the relaxation dynamics, which is very tight around R ≈ RTS = 6 nm. (d) The probability of being folded as a function of R. Open circles show the exact numerical solution for the sharp potential [Eq. (1)]. The filled circles are from simulations. The solid black line shows the approximate solution using the smoothed potential [Eq. (3)].

To calculate Pfold, we prepared 5000 GRM chains with R = RTS = 6 nm that corresponds to the maximum in F(R|fm) [Fig. 2(a)] and allowed the system to relax to either of the two basins of attraction, ending the simulation when the chain extension attains the value RN or RU. These simulations are performed by using the Hamiltonian in Eq. (1) and not on the simple one-dimensional profile βF(R) = −log[P(R)]. We find 60% (40%) of the initial chains from RTS reach R = RN (R = RU), which implies Pfold = 0.6. Similar to the P5GA, the GRM dynamics projected onto the R coordinate using both sets of parameters exhibit a number of recrossing events.

To understand the relation between x (the structural coordinate which specifies the NBA and UBA in the GRM) and chain extension R (pulling coordinate that is conjugate to f), we approximate the sharp interaction in Eq. (1) by a smoothed potential:

$$\beta V_c[x] \approx \beta V_S[x] = -\log\!\left(e^{-\beta k(x^2-c^2)/2} + 1\right), \tag{3}$$

by taking advantage of the clear separation between the UBA and the NBA [Fig. 2(a)]. Defining $\mathcal{N}_k = \langle e^{-\beta k(x^2-c^2)/2}\,\delta(|\mathbf{R}|-R)\rangle_0$, with 〈…〉0 denoting an average over the Gaussian backbone, we can compute approximately $P_R(x<c) = \int_0^c dx\,\langle\delta(|\mathbf{R}|-R)\rangle \approx \mathcal{N}_k/(\mathcal{N}_{k=0}+\mathcal{N}_k)$, with 〈…〉 representing an average over the GRM potential in Eq. (3). The probability of bond formation satisfies

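$$P_R(x<c) \approx \left[1 + \lambda^{3/2}\exp\!\left(\frac{3R^2\kappa\sigma^2}{2Na^2\lambda} - \frac{\beta k c^2}{2}\right)\right]^{-1}, \tag{4}$$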

where κ = Na²βk/3, σ = Δs/N, and λ = 1 + σ(1 − σ)κ. At R = 6 nm, Eq. (4) gives PR(x < c) ≈ 0.402 ≠ 0.5. Thus, when the dynamics is initiated from the top of the apparent free energy barrier using the R variable, their internal coordinate (x) is populated primarily with unfolded conformations [Fig. 2(b)]. Despite the fact that the UBA is primarily populated at R = RTS, we find that Pfold ≈ 0.6, so that the NBA will be predominantly populated as the trajectories progress. The midpoint of transition [PR(x < c) = 0.5] occurs at $R_{\mathrm{mid}} = [(Na^2\lambda/3\kappa\sigma^2)(\beta k c^2 - 3\log\lambda)]^{1/2}$ = 5.91 nm, which deviates slightly from RTS = 6.00 nm. The results from VS[x], which are in excellent agreement with the simulations as well as the numerical results using Vc[x] (Fig. 2), also suggest that hopping kinetics in the GRM involves coupling between x and R. Thus, even in this simple system, accurate location of the TSE should consider two-dimensional free energy profiles (see [6,16]).
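The numbers quoted for Eq. (4) can be checked with a few lines of Python. The sketch below evaluates the bond-formation probability and the midpoint extension for the parameter set given in the text (N = 22, a = 0.545 nm, βk = 1.65 nm⁻², c = 4 nm, Δs = 18). The function names are our own, chosen for illustration.

```python
import math

def grm_constants(n_bonds, a, beta_k, delta_s):
    """Return (kappa, sigma, lambda) as defined below Eq. (4)."""
    kappa = n_bonds * a**2 * beta_k / 3.0
    sigma = delta_s / n_bonds
    lam = 1.0 + sigma * (1.0 - sigma) * kappa
    return kappa, sigma, lam

def p_bond_formed(r, n_bonds, a, beta_k, c, delta_s):
    """Eq. (4): probability that the interior bond is formed (x < c) at extension r."""
    kappa, sigma, lam = grm_constants(n_bonds, a, beta_k, delta_s)
    exponent = (3.0 * r**2 * kappa * sigma**2 / (2.0 * n_bonds * a**2 * lam)
                - beta_k * c**2 / 2.0)
    return 1.0 / (1.0 + lam**1.5 * math.exp(exponent))

def r_midpoint(n_bonds, a, beta_k, c, delta_s):
    """Extension at which Eq. (4) equals 0.5."""
    kappa, sigma, lam = grm_constants(n_bonds, a, beta_k, delta_s)
    prefactor = n_bonds * a**2 * lam / (3.0 * kappa * sigma**2)
    return math.sqrt(prefactor * (beta_k * c**2 - 3.0 * math.log(lam)))

# Parameter set from the text (lengths in nm, beta_k in nm^-2).
params = dict(n_bonds=22, a=0.545, beta_k=1.65, c=4.0, delta_s=18)
p_at_ts = p_bond_formed(6.0, **params)   # close to 0.402
r_mid = r_midpoint(**params)             # close to 5.91 nm
```

Evaluating p_bond_formed at r_mid returns 0.5 by construction, which is a quick consistency check between Eq. (4) and the Rmid expression.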

To answer the second question, we introduce a molecular tensegrity parameter, which is a ratio of tensile force and a force that determines the stability of the biopolymer. The limit of mechanical stability of the NBA is determined by the critical unbinding force fc = ΔF(fm)/Δx(fm). At the midpoint force f = fm, the tensegrity parameter s = fc/fm determines whether the applied external tension is sufficient to overcome the stability of the NBA. Models that approximate the free energy profiles using a cubic or cusp potential [7] could alter fc or fm from the form suggested above. However, because s involves the ratio of the two forces, the precise numerical factors are not relevant. Barrier crossings from the NBA over the TS are governed by the competition between fc and the applied force f = fm. For fc ≫ fm, barrier recrossing from the NBA to the TS will be extremely rare, while if fc ≪ fm, barrier crossing events will be common. We would therefore expect that, as the ratio s = fc/fm increases, barrier recrossings from within the NBA to the TS will decrease, and the probability of reaching R = RN will increase. Thus, Pfold, the structural link to Δx, should be determined by the experimentally measurable tensegrity parameter s.
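As a small worked illustration of the definition s = fc/fm with fc = ΔF/Δx, the snippet below converts a barrier height given in units of kBT into a force. The barrier height and Δx used here are hypothetical placeholders, not values reported in this work; only fm = 14.7 pN is taken from the P5GA discussion, and kBT ≈ 4.1 pN nm is the usual room-temperature approximation.

```python
KBT_PN_NM = 4.1  # approximate k_B T near room temperature, in pN nm

def tensegrity(delta_f_kbt, delta_x_nm, f_m_pn, kbt=KBT_PN_NM):
    """Return s = f_c / f_m with f_c = Delta F / Delta x.

    delta_f_kbt : barrier height at f = f_m, in units of k_B T
    delta_x_nm  : distance from the native minimum to the barrier top, in nm
    f_m_pn      : transition midforce, in pN
    """
    f_c = delta_f_kbt * kbt / delta_x_nm   # critical unbinding force in pN
    return f_c / f_m_pn

# Hypothetical barrier (2 k_B T) and distance (2 nm); f_m = 14.7 pN from the text.
s_example = tensegrity(delta_f_kbt=2.0, delta_x_nm=2.0, f_m_pn=14.7)
# Doubling the distance to the barrier halves s at fixed barrier height.
s_wider = tensegrity(delta_f_kbt=2.0, delta_x_nm=4.0, f_m_pn=14.7)
```

Because all three inputs can be read from a measured free energy profile at f = fm, s can be computed without any kinetic information.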

To confirm this expectation, we ran 20 different sets of GRM parameters (1000 runs each), with fm ranging between 5 and 25 pN, βk ranging from 0.1 to 600 nm⁻², c between 0.26 and 6 nm, and Δs between N/2 and N. Figure 3 shows that Punfold = 1 − Pfold decreases monotonically with s = fc/fm. The dependence of Punfold on s (Fig. 3) can be derived by considering diffusion in a 1D version of the GRM potential in Eq. (1): $U(x) = -fx + \tfrac{1}{2}k_u x^2 + V_c(x)$, with ku being the curvature in the bottom of the UBA. Assuming constant diffusivity, $P_{\mathrm{unfold}} = \int_{x_N}^{c} dx\,e^{\beta U(x)} \big/ \int_{x_N}^{x_U} dx\,e^{\beta U(x)}$ [17], where the positions of the transition barrier, native, and unfolded wells are given by c, xN, and xU, respectively. For s ≫ 1, we obtain $P_{\mathrm{unfold}} = (1+s)^{-1}/2$, with the entire dependence on the energy landscape encoded in s. The analytical result, plotted as a solid curve in Fig. 3, captures the overall trend of the GRM and P5GA simulations.
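The constant-diffusivity expression for Punfold is a ratio of two one-dimensional integrals and is easy to evaluate numerically. The sketch below uses a plain trapezoid rule and a piecewise-linear cusp potential as a stand-in. The cusp shape, its slopes, and the function names are assumptions made for illustration and are not the GRM potential itself.

```python
import math

def splitting_probability(beta_u, x_n, x_ts, x_u, n=20000):
    """P_unfold for a walker started at x_ts with absorbing ends x_n and x_u.

    Evaluates int_{x_n}^{x_ts} e^{beta U} dx / int_{x_n}^{x_u} e^{beta U} dx
    with the trapezoid rule; beta_u(x) returns U(x)/k_B T.
    """
    def integral(lo, hi):
        h = (hi - lo) / n
        total = 0.5 * (math.exp(beta_u(lo)) + math.exp(beta_u(hi)))
        for i in range(1, n):
            total += math.exp(beta_u(lo + i * h))
        return total * h

    left = integral(x_n, x_ts)     # native side of the barrier
    right = integral(x_ts, x_u)    # unfolded side of the barrier
    return left / (left + right)

def cusp(c, slope_n, slope_u):
    """Hypothetical cusp barrier at c with different slopes on each side (units of k_B T / nm)."""
    return lambda x: -slope_n * (c - x) if x < c else -slope_u * (x - c)

# Symmetric barrier and symmetric boundaries: both basins equally likely.
p_sym = splitting_probability(cusp(4.0, 2.0, 2.0), x_n=2.0, x_ts=4.0, x_u=6.0)
# Steeper native side: trajectories from the barrier top favor the NBA.
p_asym = splitting_probability(cusp(4.0, 4.0, 1.0), x_n=2.0, x_ts=4.0, x_u=8.0)
```

In this toy potential a steeper native-side slope lowers Punfold, which mirrors the qualitative trend with s described above.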

FIG. 3.


(color online). Punfold as a function of s = ΔF/(fmΔx) for a variety of GRM parameters (colored circles). Red corresponds to small βk ~ (0.1–1) nm⁻², purple to intermediate βk ~ (1–50) nm⁻², and blue to high βk ~ (50–500) nm⁻². The point sizes indicate the magnitude of the midpoint force, with the smallest points at fm = 5 pN and the largest at fm = 25 pN. The large square indicates the results of the P5GA simulations with Punfold = 0.35. The gray line is the theoretical prediction $P_{\mathrm{unfold}}(s) = (1+s)^{-1}/2$ for s ≫ 1. The solid line in the inset shows the predicted Punfold(s) for s ≪ 1 [Eq. (5)] using the DNA hairpin free energy profiles (not deconvolved) in Ref. [2]. The dashed line is for the leucine zipper, which has two barriers, using data from Ref. [3]. As explained in Ref. [18], numbers 1 and 2 are for NBA → I and I → UBA transitions, respectively. Because of different kU values [Eq. (5)], dashed and solid lines do not coincide.

For s ≪ 1, Punfold can be approximated as

$$P_{\mathrm{unfold}} = \frac{1}{2}\left[\frac{\Phi}{1+\Phi} + \frac{32 f_m^2(\Phi-1)}{\pi k_U(\Phi+1)^3}\,s\right]^{-1}, \tag{5}$$

where $\Phi = 4f_m/\sqrt{2\pi k_B T k_U}$ and $k_U = \partial^2 U/\partial x^2$ at x = xU. This relation can be used to predict the degree to which the extension is a good reaction coordinate for DNA hairpins. We calculated s from the measured energy landscapes in Ref. [2] and obtained Punfold by using Eq. (5). We took fm ≈ 12.5 pN and βkU ≈ 0.2 nm⁻². The predicted Pfold (inset of Fig. 3) varies between 0.54 and 0.61 for the DNA hairpin sequences A–D, close to the ideal value of 1/2, indicating that extension is a reasonable coordinate except possibly for sequence C, which has a T:T base pair (bp) mismatch 7 bp from the stem. The limitation of extension as a reaction coordinate for this sequence is consistent with the observation that folding of sequence C has an intermediate as indicated by the three minima in F(R|fm) [2].

To illustrate that the theory is applicable when there are multiple barriers [4], we analyze the data for a leucine zipper, which unfolds by populating an intermediate. In this case there are two tensegrity parameters: s1 (= 0.05) and s2 (= 0.15) (see [18]). The predicted Punfold values show that the extension is a good reaction coordinate for the NBA → I transition but less so for the I → UBA transition (Fig. 3).

Our results can be confirmed by analyzing long time folding trajectories R(t) as long as multiple hopping events occur. Because Δx is f-dependent, it follows that tensegrity can be altered by changing f. Thus, the extension may be a good reaction coordinate over a certain range of f but may not remain so under all loading conditions.


Acknowledgments

This work was supported in part by grants from National Research Foundation of Korea (No. 2010-0000602) (C. H.) and National Institutes of Health (No. GM089685) (D. T.).
