The Journal of Chemical Physics. 2009 Mar 24;130(12):124903. doi: 10.1063/1.3082151

How accurate are polymer models in the analysis of Förster resonance energy transfer experiments on proteins?

Edward P O’Brien 1,2, Greg Morrison 1,3, Bernard R Brooks 2, D Thirumalai 1,4,a)
PMCID: PMC2736576  PMID: 19334885

Abstract

Single molecule Förster resonance energy transfer (FRET) experiments are used to infer the properties of the denatured state ensemble (DSE) of proteins. From the measured average FRET efficiency, ⟨E⟩, the distance distribution P(R) is inferred by assuming that the DSE can be described as a polymer. The single parameter in the appropriate polymer model (Gaussian chain, wormlike chain, or self-avoiding walk) for P(R) is determined by equating the calculated and measured ⟨E⟩. In order to assess the accuracy of this “standard procedure,” we consider the generalized Rouse model (GRM), whose properties [⟨E⟩ and P(R)] can be analytically computed, and the Molecular Transfer Model for protein L, for which accurate simulations can be carried out as a function of guanidinium hydrochloride (GdmCl) concentration. Using the precisely computed ⟨E⟩ for the GRM and protein L, we infer P(R) using the standard procedure. We find that the mean end-to-end distance can be accurately inferred (less than 10% relative error) using ⟨E⟩ and polymer models for P(R). However, the values extracted for the radius of gyration (Rg) and the persistence length (lp) are less accurate. For protein L, the errors in the inferred properties increase as the GdmCl concentration increases for all polymer models. The relative error in the inferred Rg and lp, with respect to the exact values, can be as large as 25% at the highest GdmCl concentration. We propose a self-consistency test, requiring measurements of ⟨E⟩ by attaching dyes to different residues in the protein, to assess the validity of describing the DSE using the Gaussian model. Application of the self-consistency test to the GRM shows that even for this simple model, which exhibits an order→disorder transition, the Gaussian P(R) is inadequate. Analysis of experimental FRET efficiencies with dyes at several locations for the cold shock protein, and of simulation results for protein L, for which accurate FRET efficiencies between various locations were computed, shows that at high GdmCl concentrations there are significant deviations in the DSE P(R) from the Gaussian model.

INTRODUCTION

Much of our understanding of how proteins fold comes from experiments in which folding is initiated from an ensemble of initially unfolded molecules whose structures are hard to characterize.1 In many experiments, the initial structures of the denatured state ensemble (DSE) are prepared by adding an excess amount of denaturants or by raising the temperature above the melting temperature (Tm) of the protein.2 Theoretical studies have shown that folding mechanisms depend on the initial conditions, i.e., the nature of the DSE.3 Thus, a quantitative description of protein folding mechanisms requires a molecular characterization of the DSE—a task that is made difficult by the structural diversity of the ensemble of unfolded states.4, 5

In an attempt to probe the role of initial conditions on folding, single molecule Förster resonance energy transfer (FRET) experiments are being used to infer the properties of unfolded proteins. The major advantage of these experiments is that they can measure the FRET efficiencies of the DSE under solution conditions where the native state is stable. The average denaturant-dependent FRET efficiency ⟨E⟩ has been used to infer the global properties of the polypeptide chain in the DSE as the external conditions are altered. The properties of the DSE are inferred from ⟨E⟩ by assuming a polymer model for the DSE, from which the root mean squared distance between two dyes attached at residues i and j along the protein sequence (Rij=⟨∣ri−rj∣⟩), the distribution of the end-to-end distance P(R) (where R=∣rN−r0∣), the root mean squared end-to-end distance ($R_{ee} = \langle R^2 \rangle^{1/2}$), the root mean squared radius of gyration ($R_g = \langle R_g^2 \rangle^{1/2}$), and the persistence length (lp) of the denatured protein6, 7, 8, 9, 10, 11, 12, 13, 14, 15 can be calculated.

In FRET experiments, donor (D) and acceptor (A) dyes are attached at two locations along the protein sequence,4, 16 and hence can only provide information about correlations between them. The efficiency of energy transfer E between the D and A is equal to $(1 + r^6/R_0^6)^{-1}$, where r is the distance between the dyes, and R0 is the dye-dependent Förster distance.4, 16 Because of conformational fluctuations, there is a distribution of r, P(r), which depends on external conditions such as the temperature and denaturant concentration. As a result, the average FRET efficiency ⟨E⟩ is given by

$$\langle E \rangle = \int_0^{\infty} \left(1 + \frac{r^6}{R_0^6}\right)^{-1} P(r)\,dr \qquad (1)$$

under most experimental conditions due to the central limit theorem.17 If the dyes are attached to the ends of the chain, then P(r)=P(R). Even if ⟨E⟩ is known accurately, the extraction of P(R) from the integral equation [Eq. 1] is fraught with numerical instabilities. In experimental applications to biopolymers, a functional form for P(r) is assumed in order to satisfy the equality in Eq. 1. The form of P(r) is based on a particular polymer model that depends on only a single parameter (see Table 1): the Gaussian chain (dependent on the Kuhn length a), the wormlike chain (WLC) (dependent on the persistence length lp), and the self-avoiding walk (SAW) (dependent on the average end-to-end distance Ree). For the chosen polymer model meant to represent the biopolymer of interest, the free parameter (a, lp, or Ree) is determined numerically to satisfy Eq. 1. Using this method (referred to as the “standard procedure” in this article), several researchers have estimated Rg and lp as a function of the external conditions for protein L,11, 14 cold shock protein (CspTm),13 and RNase H.16 The justification for using homopolymer models to analyze FRET data comes from the anecdotal comparison of the Rg measured using x-ray scattering experiments and the Rg extracted from analysis of Eq. 1.4
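The standard procedure can be illustrated with a minimal Python sketch for the Gaussian chain. It is not the implementation used in this work: the function names, the trapezoidal quadrature, the integration cutoff of 8 Ree, and the bisection bracket are all assumptions made for illustration. The numerical values (N=63, a0=3.8 Å, R0=23 Å) are those quoted for the GRM in this paper.

```python
import math

def gaussian_p(r, n_bonds, kuhn):
    """Gaussian-chain end-to-end distribution P(R) from Table 1."""
    msd = n_bonds * kuhn ** 2  # mean squared end-to-end distance, N a^2
    norm = (3.0 / (2.0 * math.pi * msd)) ** 1.5
    return 4.0 * math.pi * r ** 2 * norm * math.exp(-1.5 * r ** 2 / msd)

def mean_efficiency(n_bonds, kuhn, r0, n_grid=2000):
    """Eq. 1 by the trapezoidal rule on [0, 8 Ree] (assumed cutoff).

    The integrand vanishes at r = 0 and is negligible at the cutoff,
    so the two end points are omitted from the sum.
    """
    h = 8.0 * math.sqrt(n_bonds) * kuhn / n_grid
    total = 0.0
    for i in range(1, n_grid):
        r = i * h
        total += gaussian_p(r, n_bonds, kuhn) / (1.0 + (r / r0) ** 6)
    return total * h

def fit_kuhn_length(e_measured, n_bonds, r0, lo=0.1, hi=50.0):
    """Bisection for the Kuhn length a; <E> falls as the chain swells."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mean_efficiency(n_bonds, mid, r0) > e_measured:
            lo = mid  # model chain too compact: increase a
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: generate <E> from a known a, then recover a from <E>.
e_ref = mean_efficiency(63, 3.8, 23.0)
a_fit = fit_kuhn_length(e_ref, 63, 23.0)
```

Because ⟨E⟩ decreases monotonically with the single model parameter, a bracketing root finder is sufficient here; the same pattern would apply to lp for the WLC or Ree for the SAW.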

Table 1.

Polymer models and their properties.

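Polymer model | End-to-end distribution P(R) (a) | Radius of gyration Rg | Persistence length lp
--- | --- | --- | ---
Gaussian | $4\pi R^2 \left(\frac{3}{2\pi N a^2}\right)^{3/2} \exp\left(-\frac{3R^2}{2Na^2}\right)$ | $a\sqrt{N/6}$ | $\frac{N a^2}{2L} = \frac{a}{2}$
WLC (b) | $\frac{4\pi R^2 C_1}{L\left(1-(R/L)^2\right)^{9/2}} \exp\left(-\frac{3L}{4 l_p \left(1-(R/L)^2\right)}\right)$ | $\left[\frac{L}{6C_2} - \frac{1}{4C_2^2} + \frac{1}{4LC_2^3} - \frac{1-\exp(-L/l_p)}{8C_2^4L^2}\right]^{1/2}$ | $R_{ee}^2 = 2 l_p L - 2 l_p^2 + 2 l_p^2 \exp(-L/l_p)$ (c)
Self-avoiding polymer (d) | $\frac{a}{R_{ee}}\left(\frac{R}{R_{ee}}\right)^{2+\theta} \exp\left(-b\left(\frac{R}{R_{ee}}\right)^{\delta}\right)$ | N/A | N/A

(a) The average end-to-end distance $R_{ee} = \left(\int R^2 P(R)\,dR\right)^{1/2}$.

(b) L and lp are the contour length and persistence length, respectively. $C_1 = \left(\pi^{3/2} e^{-\alpha} \alpha^{-3/2} \left(1 + 3\alpha^{-1} + \tfrac{15}{4}\alpha^{-2}\right)\right)^{-1}$, where α=3L∕(4lp). C2=1∕(2lp).

(c) Using the simulated ⟨R2⟩, lp was solved for numerically using this equation.

(d) θ and δ are equal to 0.3 and 2.5, respectively. The constants a and b are determined from the zeroth and second moments of P(R), $\int P(R)\,dR = \int R^2 P(R)\,dR = 1$ (with R measured in units of Ree), resulting in values of a=3.67853 and b=1.23152.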
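The constants a and b of the self-avoiding polymer distribution in Table 1 can be checked with a short Python sketch. It assumes the reduced variable x=R∕Ree, for which both moment conditions reduce to gamma functions; the helper name `moment` is hypothetical.

```python
import math

THETA, DELTA = 0.3, 2.5  # exponents quoted in footnote d of Table 1

def moment(m, b):
    """Integral of x**m * exp(-b * x**DELTA) over x >= 0, in closed form."""
    p = (m + 1.0) / DELTA
    return math.gamma(p) / (DELTA * b ** p)

# With x = R/Ree, the zeroth moment integrates x**(2+THETA) and the second
# moment integrates x**(4+THETA). Setting the two equal fixes b.
p0 = (3.0 + THETA) / DELTA
p2 = (5.0 + THETA) / DELTA
b = (math.gamma(p2) / math.gamma(p0)) ** (1.0 / (p2 - p0))

# Normalization of the zeroth moment then fixes a.
a = 1.0 / moment(2.0 + THETA, b)
```

Under these assumptions the sketch reproduces the tabulated a=3.67853 and b=1.23152 to the quoted precision.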

Here, we study an analytically solvable generalized Rouse model (GRM)18 and the Molecular Transfer Model (MTM) for protein L19 to assess the accuracy of using polymer models to solve Eq. 1. In the GRM, two monomers that are not covalently linked interact through a harmonic potential that is truncated at a distance c. The presence of the additional length scale, c, which reflects the interaction between nonbonded beads, results in the formation of an ordered state as the temperature (T) is varied. A more detailed discussion of these models can be found in Sec. 4. For the GRM, P(R) can be analytically calculated, and hence the reliability of the standard procedure to solve Eq. 1 can be unambiguously established. We find that the accuracy of the polymer models in extracting the exact values in the GRM depends on the location of the monomers that are constrained by the harmonic interaction. Using coarse-grained simulations of protein L, we show that the error between the exact quantity and that inferred using the standard procedure depends on the property of interest. For example, the inferred end-to-end distribution P(R) is in qualitative, but not quantitative agreement with the exact P(R) distribution obtained from accurate simulations. In general, the DSE of protein L is better characterized by the SAW polymer model than the Gaussian chain model.

We propose that the accuracy of the popular Gaussian model can be assessed by measuring ⟨E⟩ with dyes attached at multiple sites in a protein.13, 20, 21 If the DSE can be described by a Gaussian chain, then the parameters extracted by attaching the dyes at position i and j can be used to predict ⟨E⟩ for dyes at other points. The proposed self-consistency test shows that the Gaussian model only qualitatively accounts for the experimental data of CspTm, simulation results for protein L, and the exact analysis of the GRM.

RESULTS AND DISCUSSION

We present the results in three sections. In Secs. 2A and 2B, we examine how accurately the standard procedure (described in Sec. 1) infers the properties of the denatured state of the GRM and protein L models. Section 2C presents results of the Gaussian self-consistency (GSC) test applied to these models. We also analyze experimental data for CspTm to assess the extent to which the DSE deviates from a Gaussian chain.

GRM

The GRM is a simple modification of the Gaussian chain with N bonds and Kuhn length a0, which includes a single, noncovalent bond between two monomers at positions s1 and s2 (Fig. 1). The monomers at s1 and s2 interact with a truncated harmonic potential with spring constant k, with strength κ=kc2∕2, where c is the distance at which the interaction vanishes [Eq. 4]. The GRM minimally represents a two-state system, with a clear demarcation between ordered [with ∣r(s2)−r(s1)∣≤c] and disordered [with ∣r(s2)−r(s1)∣>c] states. Unlike other polymer models (see Table 1), which are characterized by a single length scale, the GRM is described by a0 and the energy scale κ. For βκ→0 (the high temperature limit, where β=1∕kBT), the simple Gaussian chain is recovered (see Sec. 4 for details). By varying βκ, a disorder→order transition can be induced (see Fig. 1). The presence of the interaction between monomers s1 and s2 approximately mimics persistence of structure in the DSE of proteins. If the fraction of ordered states, fO, exceeds 0.5 (Fig. 1 inset), we assume that the residual structure is present with high probability. The exact analysis of the GRM when ∣r(s2)−r(s1)∣≤c allows us to examine the effect of structure in the DSE on the global properties of unfolded states.

Figure 1.


Top figures show a schematic sketch of the GRM, with the donor and acceptor at the end points, represented by the green spheres, and the interacting monomers at s1 and s2 represented by the red spheres. In the ordered configuration, the monomers at s1 and s2 are tightly bound. The bottom figure shows the exact and the inferred end-to-end distribution functions P(r) for interior interactions (Δs=31). The blue lines correspond to the Gaussian chain model, light green lines to the SAW, and the symbols to the exact GRM distribution. Dashed lines and red circles are for βκ=6.6, while solid lines and red squares correspond to βκ=2. In the inset we show the fraction of ordered states as a function of βκ. Note that 75% of the structures are ordered at βκ=6.6, yet the inferred Gaussian P(r) is in excellent agreement with the exact result.

Because ⟨E⟩ can be calculated exactly for the GRM [see Eq. 5], it can be used to quantitatively study the accuracy of solving Eq. 1 using the standard procedure.6, 10, 11, 13, 14 Given the best fit for the Gaussian chain (Kuhn length a), WLC (persistence length lp), and SAW (average end-to-end distance Ree), as described in Table 1, many quantities of interest can be inferred [P(R) or Rg, for example], and compared to the exact results for the GRM. The extent to which the exact and inferred properties deviate, due to the additional single energy scale in the GRM, is an indication of the accuracy of the standard procedure used to analyze Eq. 1.

P(R) is accurately inferred using the Gaussian polymer model

If the interacting monomers are located near the end points of the chain, the end-to-end distribution function is bimodal, with a clear distinction between the ordered and disordered regions.18 However, if the monomers s1 and s2 are in the interior of the chain, the two-state behavior is obscured because the distribution function becomes unimodal. In Fig. 1, we show the exact and inferred P(R) functions for a chain with N=63, a0=3.8 Å, c=2a0, and ∣s2−s1∣=(N−1)∕2=31. We take the Förster distance [Eq. 1] to be R0=23 Å, on the order of $\langle R^2 \rangle_{\kappa=0}^{1/2}$, for the GRM. The distributions are unimodal for both weakly (βκ=2) and strongly (βκ=6.6) interacting monomers.

The strength of the interaction is most clearly captured with the fraction of conformations in the ordered state, fO, with fO=0.25 for the weakly interacting chain and fO=0.75 for the strongly interacting chain (inset of Fig. 1). The inferred Gaussian distribution functions are in excellent agreement with the exact result. Because of the underlying Gaussian Hamiltonian in the GRM, the rather poor agreement in the inferred SAW distribution seen in Fig. 1 is to be expected. We also note that the GRM is inherently flexible so that the WLC and Gaussian chains produce virtually identical distributions.

The accuracy of the inferred Rg depends on the location of the interaction

The two-state nature of the GRM is obscured by the relatively long unstructured regions of the chain, similar to the effect seen in laser optical tweezers experiments with flexible handles.18 As a result, P(R) is well represented by a Gaussian chain, with a smaller inferred Kuhn length, $a \le a_0$ (Fig. 2). For large βκ, where the ordered state is predominantly occupied and r(s2)≈r(s1), the end-to-end distribution function is well approximated by a Gaussian chain with N*=N−Δs bonds. Consequently, the single length scale for the Gaussian chain decreases to $a \approx a_0\sqrt{1 - \Delta s/N} \approx 0.71\,a_0$ for large values of βκ (Fig. 2).
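The limiting Kuhn length can be obtained in one step by matching mean squared end-to-end distances. The following is a sketch under the assumption that, in the fully ordered state, the Δs bonds between the bound monomers contribute nothing to R, so that a chain of N bonds with Kuhn length a must reproduce a chain of N−Δs bonds with Kuhn length a0:

```latex
N a^2 = (N - \Delta s)\, a_0^2
\quad\Longrightarrow\quad
a = a_0 \sqrt{1 - \frac{\Delta s}{N}}
  = a_0 \sqrt{1 - \frac{31}{63}}
  \approx 0.71\, a_0 .
```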

Figure 2.


The inferred Kuhn length a as a function of βκ for the GRM. Ree monotonically decreases as a function of the interaction strength, leading to the decrease in $a/a_0$. The Kuhn length a reaches its limiting value of $a \approx a_0\sqrt{1 - \Delta s/N}$ when fO≈1.

Because the two-state nature of the chain is obscured for certain values of ∣s2−s1∣, the Gaussian chain gives an excellent approximation to the end-to-end distribution function. However, the radius of gyration Rg is not as accurately obtained using the Gaussian chain model, as shown in Fig. 3. The exact Rg for the GRM reflects both the length scale a0 and the energy scale βκ, which cannot be fully described by the single inferred length scale a in the Gaussian chain. For the GRM, Rg depends not only on the separation between the monomers Δs, but also explicitly on s1 (i.e., where the interaction is along the chain; see Fig. 3 and Sec. 4), which cannot be captured by the Gaussian chain. If the interacting monomers are in the middle of the chain [s1=(N+1)∕4=16 and Δs=31], the inferred Rg is in excellent agreement with the exact result (Fig. 3). The relative error in Rg (the difference between the inferred and exact values, divided by the exact value) is no larger than 2% in magnitude. However, for interactions near the end point of the chain, with s1=0 and the same Δs=31, the relative error between the inferred and exact values of Rg is ∼−14%. The large errors arise because the radius of gyration depends on the behavior of all of the monomers, so that the energy scale βκ plays a much larger role in the determination of Rg than of Ree.

Figure 3.


Comparison of the exact (symbols) and inferred (blue line) values of the radius of gyration (Rg) as a function of βκ for Δs=31. Shown are Rg’s for the GRM with s1=0 (open symbols) and s1=16 (filled symbols) for N=63. The structures in the ordered state are shown schematically. The Rg obtained using the standard procedure is independent of s1, while the exact result is not. The inset shows the relative errors between the inferred and exact values of Rg.

MTM for protein L

Protein L is a 64 residue protein [Fig. 4a] whose folding has been studied by a variety of methods.11, 14, 22, 23, 24 More recently, single molecule FRET experiments have been used to probe changes in the DSE as the concentration of GdmCl is increased from 0 to 7M.11, 14 From the measured GdmCl-dependent ⟨E⟩, the properties of the DSE, such as Ree, P(R), and Rg, were extracted by solving Eq. 1, and assuming a Gaussian chain P(R).11, 14 To further determine the accuracy of polymer models in the analysis of ⟨E⟩, we use simulations of protein L in the same range of the concentration of denaturant, [C], as used in experiments.6, 9

Figure 4.


(a) A secondary structure representation of protein L in its native state. Starting from the N-terminus, the residues are numbered 1–64. (b) The average FRET efficiency between the various (i,j) residue pairs in protein L vs GdmCl concentration. The (i,j) pair for each ⟨Eij⟩ curve, computed using MTM simulations, is indicated by the two numbers next to each line. For example, the numbers “1–64” beneath the black line indicate that i=1 and j=64. The solid black line (lowest values of ⟨E⟩) is computed for the dyes at the end points.

The average end-to-end distance is accurately inferred from FRET data

In a previous study,19 we showed that the predictions based on MTM simulations for protein L are in excellent agreement with experiments. From the calculated ⟨E⟩ with the dyes at the end points [solid black line in Fig. 4b], which is in quantitative agreement with experimental measurements,19 we determine the model parameter Ree or lp by assuming that the exact P(R) can be approximated by the three polymer models in Table 1. Comparison of the exact value of Ree to the inferred value RF, obtained using the simulation results for ⟨E⟩, shows good agreement for all three polymer models [Fig. 5a]. There are deviations between Ree and RF at [C]>Cm, the midpoint of the folding transition. The maximum relative error [see inset of Fig. 5a] we observe is about 10% at the highest concentration of GdmCl. The SAW model provides the most accurate estimate of Ree at GdmCl concentrations above Cm, with a relative error ≤0.05, and the Gaussian model gives the least accurate values, with a relative error ≤0.10 [Fig. 5a]. Because excluded volume interactions are relevant in the DSE of real proteins, the better agreement using the SAW is to be expected.

Figure 5.


(a) The root mean squared end-to-end distance (Ree) as a function of GdmCl concentration for protein L. The average Ree (black circles) and Ree for the subpopulation of the DSE [(red) squares] from simulations are shown. The values of Ree inferred by solving Eq. 1 by the standard procedure using the Gaussian chain, WLC, and SAW polymer models are shown for comparison as the top, middle, and bottom solid lines, respectively. The inset shows the relative errors between the exact values of Ree and the values inferred using the FRET efficiency vs GdmCl concentration. The top, middle, and bottom lines correspond to the Gaussian chain, WLC, and SAW polymer models, respectively. (b) Simulation results for the denatured state end-to-end distance distribution P(R) at 2.4M GdmCl [solid (red) squares] and 6M GdmCl [open (red) squares] at T=327.8 K are compared with the P(R)s inferred using the Gaussian chain, WLC, and SAW polymer models at 2.4M GdmCl (dashed lines) and 6M GdmCl (solid lines). The top, middle, and bottom lines correspond to the SAW, WLC, and Gaussian chain polymer models, respectively.

Polymer models do not give quantitative agreement with the exact P(R)

The inferred distribution functions, PF(R)’s, obtained by the standard procedure (as described in the introduction) at [C]=2M and 6M GdmCl differ from the exact results [Fig. 5b]. Surprisingly, the agreement between P(R) and PF(R) is worse at higher [C]. The range of R explored and the width of the exact distribution are less than predicted by the polymer models. The Gaussian chain and the SAW models account only for chain entropy, while the WLC only models the bending energy of the protein. However, in protein L (and in other proteins) intramolecular attractions are still present even when [C]=6M>Cm. As a result, the range of R explored in the protein L simulations is expected to be less than in these polymer models. Only at [C]∕Cm⪢1 and∕or at high T are proteins expected to be described by Flory random coils. Our results show that although it is possible to use models that can give a single quantity correctly (Ree, for example), the distribution functions are less accurate. The results in Fig. 5b show that P(R), inferred from the polymer models, agrees only qualitatively with the exact P(R), with the SAW model being the most accurate [Fig. 5b]. While the MTM will not perfectly reproduce all of the fine details of protein L under all situations, we expect it to produce more realistic results than idealized polymer models, which have no specific intrachain interactions.

Inferred Rg and lp differ significantly from the exact values

The solution of Eq. 1 using a Gaussian chain or WLC model yields a and lp, from which Rg can be analytically calculated (Table 1). Figures 6a and 6b, which compare the FRET inferred Rg and lp with the corresponding values obtained using MTM simulations, show that the relative errors are substantial. At high [C] values, RgF deviates from Rg by nearly 25% if the Gaussian chain model is used [Fig. 6a]. The value of Rg is ≈26 Å at [C]=8 M, while RgF using the Gaussian chain model is ≈31 Å. In order to obtain reliable estimates of Rg, an accurate calculation of the distance distribution between all the heavy atoms in a protein is needed. Therefore, it is reasonable to expect that errors in the inferred P(R) are propagated, leading to a poor estimate of internal distances, and thus to a larger error in Rg. A similar inference can be drawn about the persistence length obtained using polymer models [Fig. 6b]. Plotting lpF as a function of [C] [Fig. 6b], against $l_p = R_{ee}^2/(2L)$, shows that lp is overestimated at concentrations above 1M GdmCl, with the error increasing as [C] increases. The error is smaller when the Gaussian chain model is used.
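The conversions from an inferred end-to-end distance to Rg and lp follow directly from the relations in Table 1, and can be sketched in Python as below. This is an illustrative sketch only: the function names and bisection bracket are assumptions, and the contour length used in the example (64 residues at 3.8 Å each) is a hypothetical choice, not a value taken from this paper.

```python
import math

def gaussian_rg(r_ee):
    """Gaussian chain: Rg = a*sqrt(N/6) and Ree = a*sqrt(N), so Rg = Ree/sqrt(6)."""
    return r_ee / math.sqrt(6.0)

def gaussian_lp(r_ee, contour):
    """Gaussian chain: lp = N a^2 / (2L) = Ree^2 / (2L)."""
    return r_ee ** 2 / (2.0 * contour)

def wlc_ree2(lp, contour):
    """WLC: Ree^2 = 2 lp L - 2 lp^2 (1 - exp(-L/lp)); expm1 avoids cancellation."""
    return 2.0 * lp * contour + 2.0 * lp ** 2 * math.expm1(-contour / lp)

def wlc_lp(r_ee, contour):
    """Bisection for lp; wlc_ree2 increases monotonically with lp (needs Ree < L)."""
    lo, hi = 1e-6 * contour, 1e2 * contour
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if wlc_ree2(mid, contour) < r_ee ** 2:
            lo = mid  # chain too flexible: increase lp
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical contour length: 64 residues at 3.8 Å per residue (assumption).
L_CONTOUR = 64 * 3.8
lp_back = wlc_lp(math.sqrt(wlc_ree2(10.0, L_CONTOUR)), L_CONTOUR)
```

For the Gaussian chain both quantities are fixed by Ree alone, which is why any error in the inferred P(R) carries over directly to Rg and lp.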

Figure 6.


(a) Comparison of Rg from direct simulations of protein L and that obtained by solving Eq. 1 using the Gaussian chain and WLC polymer models. The top line (magenta) shows the WLC fit, the bottom line (blue) shows the Gaussian fit, squares (red) show the DSE Rg from the simulation, and black circles show the average simulated Rg. The inset shows the relative errors as a function of GdmCl concentration; top and bottom lines correspond to the WLC and Gaussian chain polymer models, respectively. (b) Same as (a) except the figure is for lp. Top and bottom lines correspond to the inferred lp using the WLC and Gaussian chain polymer models, respectively. Top and bottom sets of squares correspond to a direct analysis of the simulations using the WLC and Gaussian chain polymer models, respectively.

Gaussian self-consistency test shows the DSE is non-Gaussian

The extent to which the Gaussian chain accurately describes the ensemble of conformations that are sampled at different values of the external conditions (temperature or denaturants) can be assessed by performing a self-consistency test. A property of a Gaussian chain is that if the root mean square distance, Rij, between two monomers i and j is known, then Rkl, the distance between any other pair of monomers k and l, can be computed using

$$R_{kl} = \sqrt{\frac{|k-l|}{|i-j|}}\; R_{ij}. \qquad (2)$$

Thus, if the conformations of a protein (or a polymer) can be modeled as a Gaussian chain, then Rij inferred from the FRET efficiency ⟨Eij⟩ should accurately predict Rkl and the FRET efficiency ⟨Ekl⟩, if the dyes were to be placed at monomers k and l. We refer to this criterion as the GSC test, and the extent to which the predicted Rkl from Eq. 2 deviates from the exact Rkl reflects deviations from the Gaussian model description of the DSE.
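The GSC test can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the results in this paper: the function names, quadrature grid, and bisection bracket are assumptions, and the check at the end uses an ideal Gaussian chain (a=3.8 Å, R0=23 Å), for which the test must return zero error by construction.

```python
import math

def gaussian_mean_e(r_rms, r0, n_grid=2000):
    """<E> from Eq. 1 for a Gaussian P(r) with rms dye separation r_rms."""
    h = 8.0 * r_rms / n_grid  # integrate out to 8 r_rms (assumed cutoff)
    norm = 4.0 * math.pi * (3.0 / (2.0 * math.pi * r_rms ** 2)) ** 1.5
    total = 0.0
    for i in range(1, n_grid):  # integrand is ~0 at both end points
        r = i * h
        p = norm * r ** 2 * math.exp(-1.5 * r ** 2 / r_rms ** 2)
        total += p / (1.0 + (r / r0) ** 6)
    return total * h

def infer_rms(e_obs, r0):
    """Invert gaussian_mean_e by bisection on a log scale (<E> falls as r_rms grows)."""
    lo, hi = 1e-2 * r0, 1e2 * r0
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        if gaussian_mean_e(mid, r0) > e_obs:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def gsc_predict(e_ij, i, j, k, l, r0):
    """Infer Rij from <Eij>, scale it with Eq. 2, and predict <Ekl> with Eq. 1."""
    r_ij = infer_rms(e_ij, r0)
    r_kl = math.sqrt(abs(k - l) / abs(i - j)) * r_ij
    return r_kl, gaussian_mean_e(r_kl, r0)

def relative_error(predicted, exact):
    return (predicted - exact) / exact

# Ideal Gaussian chain: the GSC test should be satisfied exactly.
e_20 = gaussian_mean_e(3.8 * math.sqrt(20), 23.0)
r_40, e_40 = gsc_predict(e_20, 0, 20, 0, 40, 23.0)
err_40 = relative_error(e_40, gaussian_mean_e(3.8 * math.sqrt(40), 23.0))
```

For measured or simulated data, the exact ⟨Ekl⟩ would replace the Gaussian reference in the last line, and a nonzero `err_40` would then quantify the deviation from Gaussian behavior.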

GSC test for the GRM

For the GRM, with a nonbonded interaction between monomers s1 and s2, we calculate ⟨Eij⟩ using Eq. 8 with j fixed at 0 and for i=20, 40, and 60. Using the exact results for ⟨Eij⟩, the values of Rij are inferred assuming that P(r) is a Gaussian chain. From the inferred Rij, the values of ⟨Ekl⟩ and Rkl can be calculated using Eqs. 1 and 2, respectively. We note that since $R_{kl}/R_{ij} = \sqrt{|k-l|/|i-j|}$ [Eq. 2] for any pair (k,l) using the Gaussian chain model, the prediction of the Gaussian chain will be independent of the particular choices of k and l, as long as their difference is held constant. We first apply the GSC test to a GRM in which fO≈0.75 due to a favorable interaction between monomers s1=16 and s2=47. There are discrepancies between the values of the Gaussian inferred (RklG) and exact Rkl distances, as well as the inferred (EklG) and exact ⟨Eij⟩ efficiencies when a Gaussian model is used (Fig. 7). The relative errors in the predicted values of the FRET efficiency and the interdye distances can be as large as 30%–40%, depending on the choice of i and j (see insets in Fig. 7). We note that the relative error in the end-to-end distance is small for dyes near the end points [the green line in Fig. 7b], in agreement with the results shown in Fig. 1. The errors decrease as fO decreases, with a maximum error of 20% when fO=0.5, and 10% when fO=0.25 (data not shown). By construction, the GRM is a Gaussian chain when fO=0 and therefore the relative errors will vanish at sufficiently small βκ (data not shown). These results show that even for the GRM, with only one nonbonded interaction in an otherwise Gaussian chain, its DSE cannot be accurately described using a Gaussian chain model. Thus, even if the overall end-to-end distribution P(r) for the GRM is well approximated as a Gaussian (as seen in Fig. 1), the internal Rkl monomer pair distances can deviate from predictions of the Gaussian chain model.

Figure 7.


GSC test using (a) the FRET efficiency and (b) the average end-to-end distance for the GRM with fO=0.75 and interaction sites at s1=16 and s2=47. In both (a) and (b) the solid lines are the inferred properties and the open symbols are the exact values. In both (a) and (b), j=0 and the blue, magenta, and green lines correspond to a dye at i=20, 40, and 60, respectively. The insets show the relative error for ⟨Ekl⟩ and Rkl. Note that the relative error would be zero if the Gaussian chain accurately modeled the GRM.

GSC test for protein L

We apply the GSC test to our simulations of protein L at GdmCl concentrations of [C]=2.0M (below Cm=2.4M) and [C]=7.5M (well above Cm). While our simulations allow us to compute the DSE ⟨Eij⟩ for all possible (i,j) pairs, we examine only a subset of ⟨Eij⟩ as a function of GdmCl concentration [Fig. 4b]. By choosing multiple j values for the same value of i, we can determine whether distant residues along the backbone are close together spatially, which may offer insights into three-point correlations in denatured states. We note that all values of ⟨Eij⟩ in Fig. 4 are monotonically decreasing, except for the (1,14) pair. This is due to the fact that the native state has a beta strand between these two residues; as the protein denatures, they come closer together, increasing the FRET efficiency. We use these values for ⟨Eij⟩ in the GSC test. The results are shown in Figs. 8a and 8b. Relative errors in ⟨Ekl⟩ as large as 36% at 2.0M GdmCl and 50% at 7.5M GdmCl are found, with the lowest errors generally seen for residues close to one another along the backbone, in agreement with the results from the GRM [Fig. 7a inset]. In addition, the number of data points that underestimate ⟨Ekl⟩ increases as [C] is changed from 7.5 to 2.0M for ∣k−l∣<20. Despite these differences, the gross features in Figs. 8a and 8b are concentration independent. Because the error does not vanish for all (k,l) pairs [Figs. 8a and 8b], we conclude that the DSE of protein L cannot be modeled as a Gaussian chain.

Figure 8.


The Gaussian self-consistency test applied to simulated DSE ⟨Eij⟩ data of protein L using the (i,j) pairs listed in Fig. 4b. Shown are the relative errors at (a) 2.0M GdmCl and (b) 7.5M GdmCl. In both (a) and (b), solid (green) circles correspond to ∣i−j∣=13, open (orange) squares to ∣i−j∣=16, solid (blue) squares to ∣i−j∣=19, open (brown) circles to ∣i−j∣=29, asterisks (cyan) to ∣i−j∣=30, diamonds (red) to ∣i−j∣=34, solid (violet) triangles to ∣i−j∣=44, open (gray) triangles to ∣i−j∣=50, and crosses (magenta) to ∣i−j∣=54. Each symbol corresponds to a line in Fig. 4b, with the colors of the symbols corresponding to the colors of the lines, except for the 1–64 pair (not shown here).

GSC test for CspTm

In an interesting single molecule experiment, Hoffmann et al.13 measured FRET efficiencies by attaching donor and acceptor dyes to pairs of residues at five different locations of CspTm. They analyzed the data by assuming that the DSE properties can be mimicked using a Gaussian chain model. We used the GSC test to predict ⟨Ekl⟩ for dyes separated by ∣k−l∣ along the sequence using the experimentally measured values ⟨Eij⟩.

The relative error in ⟨Ekl⟩ [Eq. 2] should be zero if CspTm can be accurately modeled as a Gaussian chain. However, there are significant deviations (up to 17%) between the predicted and experimental values (Fig. 9). The relative error is fairly insensitive to the denaturant concentration [compare Figs. 9a, 9b]. It is interesting to note that the trends in Fig. 9 are qualitatively similar to the relative errors in the GRM at fO>0. Based on these observations we conclude tentatively that whenever the DSE is ordered to some extent (i.e., when there is persistent residual structure) then we expect deviations from a homopolymer description of the DSE of proteins. At the very least, the GSC test should be routinely used to assess errors in the modeling of the DSE as a Gaussian chain.

Figure 9.


The GSC test using experimental data from CspTm. One dye was placed at one end point, and the location of the other was varied. We show the relative error of the predicted ⟨E⟩, using Eqs. 1 and 2, versus the distance between the dyes (∣k−l∣) for [C]=2M (a) and 5M (b). In both (a) and (b), triangles correspond to ∣i−j∣=33, x’s to ∣i−j∣=45, diamonds to ∣i−j∣=46, squares to ∣i−j∣=57, and circles to ∣i−j∣=65. The trends in Figs. 7 and 8 are similar.

CONCLUSIONS

In order to assess the accuracy of polymer models to infer the properties of the DSE of proteins from measurement of FRET efficiencies, we studied two models for which accurate calculations of all the equilibrium properties can be carried out. Introduction of a nonbonded interaction between two monomers in a Gaussian chain (the GRM) leads to a disorder-order transition as the temperature is lowered. The presence of “residual structure” in the GRM allows us to clarify its role in the use of the Gaussian chain model to fit the accurately calculated FRET efficiency. Similarly, we have used the MTM for protein L to calculate precisely the denaturant-dependent ⟨E⟩ from which we extracted the global properties of the DSE by solving Eq. 1 using the P(R)’s for the polymer models in Table 1. Quantitative comparison of the exact values of a number of properties of the DSE (obtained analytically for the GRM and accurately using simulations for protein L) and the values inferred from ⟨E⟩ has allowed us to assess the accuracy with which polymer models can be used to analyze the experimental data. The major findings and implications of our study are listed below.

(1) The polymer models, in conjunction with the measured ⟨E⟩, can accurately infer values of Ree, the average end-to-end distance. However, P(R), lp, and Rg are not quantitatively reproduced. For the GRM, Rg is underestimated, whereas it is overestimated for protein L. The simulations show that the absolute value of the relative error in the inferred Rg can be nearly 25% at elevated GdmCl concentration.

  • (2)

    We propose a simple self-consistency test to determine the ability of the Gaussian chain model to correctly infer the properties of the DSE of a polymer. Because the Gaussian chain depends only on a single length scale, the FRET efficiency can be predicted for varying dye positions once ⟨E⟩ is accurately known for one set of dye positions. The GSC test shows that neither the GRM, simulations of protein L, nor experimental data on CspTm can be accurately modeled using the Gaussian chain. The relative errors between the exact and predicted FRET efficiencies can be as high as 50%. For the GRM, we find that the variation in the FRET efficiency as a function of the dye position changes abruptly if one dye is placed near an interacting monomer. Taken together, these findings suggest that it is possible to infer the structured regions in the DSE by systematically varying the location of the dyes. This is because the FRET efficiency varies monotonically with the dye separation in the Gaussian chain model. An experiment that shows nonmonotonic behavior in ⟨Eij⟩ as the dye positions i and j are varied is a clear signal of non-Gaussian behavior, and sharp changes in the FRET efficiency as a function of ∣i−j∣ may indicate strongly interacting sites [see Fig. 7a].

  • (3)

    The properties of the DSE inferred from Eq. 1 become increasingly accurate as [C] decreases. At first glance this finding may be surprising, especially considering that stabilizing intrapeptide interactions are expected to be weakened at high GdmCl concentrations [C], and therefore the protein should be more “polymerlike.” The range of R-values sampled at low [C] is much smaller than at high [C]. Protein L swells as [C] is increased, as a consequence of the increase in the solvent quality. It is possible that [C]≈2.4M might be close to a Θ-solvent (favorable intrapeptide and solvent-peptide interactions are almost neutralized) so that P(R) can be approximated by a polymer model. The inaccuracy of polymer models in describing P(R) at [C]=6M suggests that only at much higher concentrations does protein L behave as a random coil. In other words, T=327.8 K and [C]=6M is not an athermal (good) solvent.

  • (4)

    It is somewhat surprising that polymer models, which do not have side chains or any preferred interactions between the beads, are qualitatively correct in characterizing the DSE of proteins with complex intramolecular interactions. In addition, even [C]=6M GdmCl is not an athermal solvent, suggesting that at lower [C] values the aqueous denaturant may be closer to a Θ-solvent. A consequence of this observation is that for many globular proteins, the extent of collapse may not be significant, resulting in the nearness of the concentrations at which collapse and folding transitions occur, as shown by Camacho and Thirumalai25 some time ago. We suggest that only by exploring the changes in the conformations of polypeptide chains over a wide range of temperature and denaturant concentrations can one link the variations of the DSE properties (compaction) and folding (acquisition of a specific structure).

THEORY AND COMPUTATIONAL METHODS

GRM model

In order to understand the effect of a single noncovalent interaction between two monomers along a chain, we consider a Gaussian chain with Kuhn length $a_0$ and N bonds, with a harmonic attraction between monomers $s_1$ and $s_2$ that is cut off at a distance c. The Hamiltonian for the GRM is

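$$\beta H = \frac{3}{2a_0^2}\int_0^N ds\,\dot{\mathbf{r}}^2(s) + \beta V[\mathbf{r}(s_2)-\mathbf{r}(s_1)], \tag{3}$$
$$\beta V[\mathbf{r}] = \begin{cases} \dfrac{k\,\mathbf{r}^2}{2}, & |\mathbf{r}| < c \\[4pt] \dfrac{k\,c^2}{2}, & |\mathbf{r}| \ge c, \end{cases} \tag{4}$$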

where k is the spring constant that constrains r(s2)−r(s1) to a harmonic well. The Hamiltonian in Eq. 3 allows the exact determination of many quantities of interest. Defining x=r(s2)−r(s1) and Δs=s2−s1, we can determine most averages of interest for the GRM using

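$$\langle \cdots \rangle = \frac{\int d^3x\, d^3r_N\,(\cdots)\,G(\mathbf{x},\mathbf{r}_N;\Delta s,N)}{\int d^3x\, d^3r_N\,G(\mathbf{x},\mathbf{r}_N;\Delta s,N)}, \tag{5}$$
$$G(\mathbf{x},\mathbf{r}_N;\Delta s,N) = \exp\left(-\frac{3\mathbf{x}^2}{2\Delta s\,a_0^2} - \frac{3(\mathbf{r}_N-\mathbf{x})^2}{2(N-\Delta s)\,a_0^2} - \beta V[\mathbf{x}]\right). \tag{6}$$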
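A minimal numerical sketch of Eqs. 5 and 6 may help fix ideas; it is an illustration, not the procedure used to produce the results of this paper. For any quantity that depends only on x=∣x∣, the Gaussian integral over rN in Eq. 5 cancels between numerator and denominator, so the average reduces to a one-dimensional radial integral with weight x² exp(−3x²/(2Δs a0²) − βV[x]). The function names, the trapezoidal quadrature, the integration cutoff, and the parameter values used below are assumptions made for this sketch.

```python
import math

def beta_V(x, k, c):
    """Truncated harmonic attraction of Eq. 4, in units of k_B T."""
    return 0.5 * k * min(x, c) ** 2

def radial_weight(x, ds, a0, k, c):
    """Radial weight for x = |r(s2) - r(s1)| once r_N is integrated out of Eq. 6."""
    return x * x * math.exp(-3.0 * x * x / (2.0 * ds * a0 * a0) - beta_V(x, k, c))

def average_over_x(f, ds, a0, k, c, n=20000):
    """<f(x)> from Eq. 5 for a function of |x| only (trapezoidal rule)."""
    x_max = 12.0 * a0 * math.sqrt(ds)  # assumed cutoff; the weight is negligible beyond it
    h = x_max / n
    num = den = 0.0
    for j in range(n + 1):
        x = j * h
        w = radial_weight(x, ds, a0, k, c) * (0.5 if j in (0, n) else 1.0)
        num += f(x) * w
        den += w
    return num / den

def mean_x2(ds, a0, k, c):
    """<x^2> for the interacting pair of monomers."""
    return average_over_x(lambda x: x * x, ds, a0, k, c)

# Illustrative (assumed) parameters: ds = 30 bonds, a0 = 3.8, c = 2 * a0.
free = mean_x2(30, 3.8, 0.0, 7.6)    # no attraction: Gaussian-chain value ds * a0**2
bound = mean_x2(30, 3.8, 0.5, 7.6)   # attraction switched on: <x^2> is reduced
print(free, bound)
```

With k=0 the sketch recovers the Gaussian-chain value ⟨x²⟩=Δs a0², which is a convenient check on the quadrature before the attraction is switched on.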

Cα-SCM protein model and GdmCl denaturation

We use the coarse-grained Cα-side chain model (Cα-SCM) to model protein L (for details, see the supporting information in Ref. 19). In the Cα-SCM each residue in the polypeptide chain is represented using two interaction sites, one that is centered on the α-carbon atom and another that is located at the center of mass of the side chain.26 Langevin dynamics simulations27 are carried out in the underdamped limit at zero molar guanidinium chloride. Simulation details are given in Ref. 19.

We model the denaturation of protein L by GdmCl using the MTM.19 The MTM combines simulations at zero molar GdmCl with experimentally measured transfer free energies, using a reweighting method28, 29, 30 to predict the equilibrium properties of proteins at any GdmCl concentration of interest.
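The reweighting idea can be illustrated in a deliberately simplified, single-temperature form. This is an assumption made for the sketch: the MTM of Ref. 19 is built on the WHAM equations and uses simulations at several temperatures. Here each conformation sampled at zero molar GdmCl is assigned a Boltzmann factor exp(−ΔGtr/kBT), where ΔGtr is its transfer free energy at the concentration of interest, and averages are taken with those weights. The function name, the input lists, and the numbers are hypothetical.

```python
import math

def reweighted_average(values, dg_transfer, kT):
    """Average of `values` over conformations sampled at 0 M GdmCl,
    reweighted by exp(-dG_tr / kT) to the target concentration.

    values      : property of interest for each sampled conformation
    dg_transfer : transfer free energy of each conformation (same units as kT)
    """
    shift = min(dg_transfer)  # subtract the minimum to keep the exponentials well scaled
    weights = [math.exp(-(g - shift) / kT) for g in dg_transfer]
    z = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / z

# Hypothetical data: three conformations with end-to-end distances in Angstrom;
# the more expanded ones are assumed to have more favorable transfer free energies.
kT = 0.651                       # kcal/mol at T = 327.8 K
r_ee = [10.0, 20.0, 30.0]
print(reweighted_average(r_ee, [0.0, 0.0, 0.0], kT))    # unweighted mean
print(reweighted_average(r_ee, [0.0, -1.0, -2.0], kT))  # shifted toward expanded states
```

When all transfer free energies are equal the weights cancel and the zero-molar average is recovered, which is the expected limit of any such reweighting.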

ANALYSIS

GRM

The average squared end-to-end distance can be computed directly from Eq. 5, using $\langle R_{ee}^2\rangle = N a_0^2 + (\langle x^2\rangle - \Delta s\,a_0^2)$. The exact expression for ⟨x2⟩ is easily determined, but somewhat lengthy, and we omit the explicit result here. Also of interest is the end-to-end distribution function, $P(R)=\langle\delta[\mathbf{r}_N-\mathbf{R}]\rangle$, which can be obtained from Eq. 5. In order to determine the probability of an interior bond being in the “ordered” state [i.e., the fraction of residual structures, see the inset for Fig. 1a], we compute the interior distribution, $P_I(X)=\langle\delta[\mathbf{x}-\mathbf{X}]\rangle$, so that $f_O=\int_{|\mathbf{x}|\le c} d^3x\,P_I(\mathbf{x})$. The radius of gyration requires a more complicated integral than the one found in Eq. 5, but we find

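$$\langle R_g^2\rangle = \frac{N a_0^2}{6} + \left(\langle x^2\rangle - \Delta s\,a_0^2\right)\left[\frac{\Delta s}{3N} + \frac{s_1}{N} - \left(\frac{\Delta s}{2N} + \frac{s_1}{N}\right)^2\right]. \tag{7}$$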

Note that unlike the average end-to-end distance, the radius of gyration depends not only on Δs, but also on s1.
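The two closed-form results above can be evaluated directly once ⟨x²⟩ is known. The sketch below is illustrative only; the function names and the numerical values are assumptions, and Eq. 7 is used with the bracket read as Δs/(3N) + s1/N − (Δs/(2N) + s1/N)². Two limits support this reading: for ⟨x²⟩=Δs a0² both expressions reduce to the Gaussian-chain values Na0² and Na0²/6, and for s1=0, Δs=N, ⟨x²⟩=0 Eq. 7 gives Na0²/12, the known result for a Gaussian ring.

```python
def mean_ree2(n_bonds, a0, ds, x2):
    """<R_ee^2> = N a0^2 + (<x^2> - ds a0^2)."""
    return n_bonds * a0 ** 2 + (x2 - ds * a0 ** 2)

def mean_rg2(n_bonds, a0, s1, ds, x2):
    """<R_g^2> from Eq. 7; unlike <R_ee^2>, it depends on s1 as well as ds."""
    u = s1 / n_bonds
    v = ds / n_bonds
    bracket = v / 3.0 + u - (v / 2.0 + u) ** 2
    return n_bonds * a0 ** 2 / 6.0 + (x2 - ds * a0 ** 2) * bracket

# Assumed illustrative values: N = 64 bonds, a0 = 3.8, ds = 30, <x^2> = 100.
for s1 in (5, 17):
    # Same <R_ee^2> for both placements, different <R_g^2>.
    print(s1, mean_ree2(64, 3.8, 30, 100.0), mean_rg2(64, 3.8, s1, 30, 100.0))
```

Moving the interacting pair along the chain at fixed Δs leaves the end-to-end distance unchanged but alters the radius of gyration, which is the s1 dependence noted in the text.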

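The FRET efficiency for a system with dyes attached to $\mathbf{r}(j=0)=0$ and $\mathbf{r}(i)$, $\langle E\rangle=\langle[1+(|\mathbf{r}(i)|/R_0)^6]^{-1}\rangle$, is determined from Eq. 5 as

$$\langle E(i)\rangle = \begin{cases} E_G(i), & 0 \le i \le s_1 \\[6pt] \dfrac{\int_0^\infty dx\,dr\; g_1(x,r;\{s_j\})\,[1+(r/R_0)^6]^{-1}}{\int_0^\infty dx\,dr\; g_1(x,r;\{s_j\})}, & s_1 < i < s_2 \\[10pt] \dfrac{\int_0^\infty dx\,dr\; g_2(x,r;\{s_j\})\,[1+(r/R_0)^6]^{-1}}{\int_0^\infty dx\,dr\; g_2(x,r;\{s_j\})}, & s_2 \le i \le N, \end{cases} \tag{8}$$

where EG(i) is the FRET efficiency for a Gaussian chain with i bonds, and

$$g_1(x,r;\{s_j\}) = x\,r\,\sinh\left(\frac{3(i-s_1)\,x\,r}{\lambda a_0^2}\right)\exp\left(-\frac{3(i\,x^2+\Delta s\,r^2)}{2\lambda a_0^2} - \beta V[x]\right), \tag{9}$$
$$g_2(x,r;\{s_j\}) = x\,r\,\sinh\left(\frac{3\,x\,r}{(i-\Delta s)\,a_0^2}\right)\exp\left(-\frac{3x^2}{2\Delta s\,a_0^2} - \frac{3(x^2+r^2)}{2(i-\Delta s)\,a_0^2} - \beta V[x]\right), \tag{10}$$
$$\lambda = (s_2+s_1)\,i - s_1^2 - i^2. \tag{11}$$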

This result allows us to compute the GSC test, after a numerical integral over r.
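The Gaussian-chain side of the GSC test can be sketched as follows. This is a minimal illustration rather than the code used for the figures: it assumes the standard Gaussian distribution P(r) ∝ r² exp(−3r²/(2⟨r²⟩)) with ⟨r²⟩=∣i−j∣a², fits the single length scale a to the efficiency for one dye pair by bisection, and then predicts the efficiency for a second pair. The function names, the quadrature, the sign convention for the relative error, and the Förster radius of 55 Å are assumptions.

```python
import math

def gaussian_fret(n_bonds, a, r0, n=4000):
    """<E> for two dyes separated by n_bonds bonds of a Gaussian chain with Kuhn length a."""
    msq = n_bonds * a * a             # mean squared dye-dye distance
    r_max = 12.0 * math.sqrt(msq)     # assumed cutoff of the radial integral
    h = r_max / n
    num = den = 0.0
    for j in range(n + 1):
        r = j * h
        w = r * r * math.exp(-1.5 * r * r / msq) * (0.5 if j in (0, n) else 1.0)
        num += w / (1.0 + (r / r0) ** 6)
        den += w
    return num / den

def fit_kuhn_length(e_target, n_bonds, r0, lo=1e-3, hi=1e3):
    """Solve gaussian_fret(n_bonds, a, r0) = e_target for a by bisection on log(a).
    <E> decreases monotonically as a grows, so the root is unique."""
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        if gaussian_fret(n_bonds, mid, r0) > e_target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def gsc_relative_error(e_ij, n_ij, e_kl, n_kl, r0):
    """Fit a to the pair (i, j), predict <E> for the pair (k, l), and return
    (relative error, predicted <E>); error defined here as (predicted - exact) / exact."""
    a = fit_kuhn_length(e_ij, n_ij, r0)
    e_pred = gaussian_fret(n_kl, a, r0)
    return (e_pred - e_kl) / e_kl, e_pred

# Consistency check on data generated from a true Gaussian chain (assumed a = 6 Angstrom).
e33 = gaussian_fret(33, 6.0, 55.0)
e65 = gaussian_fret(65, 6.0, 55.0)
print(gsc_relative_error(e33, 33, e65, 65, 55.0))
```

For input generated from a Gaussian chain the relative error vanishes to within the bisection tolerance; a nonzero error for the exact ⟨E(i)⟩ of Eq. 8, or for measured efficiencies, is what the GSC test detects.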

Protein L

Averages and distributions were computed using the MTM,19 which combines experimentally measured transfer free energies,31 converged simulations, and the Weighted Histogram Analysis Method (WHAM) equations.28, 29, 30 The WHAM equations use the simulation time series of the potential energy and of the property of interest at various temperatures and give a best estimate of the averages and distributions of that property. The native state ensemble (NSE) and DSE subpopulations were defined as having a structural root mean squared deviation, after least squares minimization, of less than or greater than 5 Å relative to the crystal structure for the NSE and DSE, respectively. The exact values of lp are computed using the average R from simulations and the relationships listed in Table 1.
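The separation into NSE and DSE subpopulations amounts to a weighted average within each RMSD class. The sketch below is a schematic illustration with hypothetical inputs: the function name and the example numbers are assumptions, the weights stand for whatever statistical weights the reweighting assigns to each conformation, and a conformation with an RMSD of exactly 5 Å (a case the definition above leaves open) is assigned to the DSE.

```python
def subpopulation_averages(rmsd, values, weights, cutoff=5.0):
    """Weighted averages of `values` for the NSE (rmsd < cutoff) and the
    DSE (rmsd >= cutoff). Returns None for an empty subpopulation."""
    sums = {"NSE": [0.0, 0.0], "DSE": [0.0, 0.0]}
    for d, v, w in zip(rmsd, values, weights):
        key = "NSE" if d < cutoff else "DSE"
        sums[key][0] += v * w   # weighted sum of the property
        sums[key][1] += w       # total weight of the subpopulation
    return {key: (vw / w if w > 0.0 else None) for key, (vw, w) in sums.items()}

# Hypothetical conformations: RMSD (Angstrom), a property such as Rg, and weights.
result = subpopulation_averages(
    rmsd=[1.2, 3.0, 8.5, 12.0],
    values=[12.0, 13.0, 25.0, 27.0],
    weights=[1.0, 1.0, 1.0, 3.0],
)
print(result)
```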

Notation

Throughout the paper, exact values of all quantities are reported without superscript or subscript. For the GRM, exact values are analytically obtained or calculated by performing a one-dimensional integral numerically. For convenience, exact results for protein L refer to converged simulations. While these simulations have residual errors, the simplicity of the MTM has allowed us to calculate all properties of interest with arbitrary accuracy. The use of subscript or superscript is, unless otherwise stated, reserved for quantities that are extracted by solving Eq. 1 using the polymer models listed in Table 1.

ACKNOWLEDGMENTS

We thank Sam Cho, Govardan Reddy, and David Pincus for their comments on the manuscript. E.O. thanks Guy Ziv for many useful discussions on experimental aspects of FRET measurements and analysis. This work was supported in part by grants from the NSF (No. 05-14056) to D.T., by an NIH GPP Biophysics Fellowship to E.O., and by the Intramural Research Program of the NIH, National Heart Lung and Blood Institute.

References

  1. Jackson S. E., Folding Des. 3, R81 (1998). doi:10.1016/S1359-0278(98)00033-9
  2. Fersht A. R., Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding, 2nd ed. (Freeman, New York, 1999).
  3. Klimov D. K. and Thirumalai D., J. Mol. Biol. 353, 1171 (2005).
  4. Schuler B. and Eaton W. A., Curr. Opin. Struct. Biol. 18, 16 (2008). doi:10.1016/j.sbi.2007.12.003
  5. Haran G., J. Phys.: Condens. Matter 15, R1291 (2003). doi:10.1088/0953-8984/15/32/201
  6. Deniz A. A., Laurence T. A., Beligere G. S., Dahan M., Martin A. B., Chemla D. S., Dawson P. E., Schultz P. G., and Weiss S., Proc. Natl. Acad. Sci. U.S.A. 97, 5179 (2000). doi:10.1073/pnas.090104997
  7. Navon A., Ittah V., Landsman P., Scheraga H. A., and Haas E., Biochemistry 40, 105 (2001).
  8. Rhoades E., Cohen M., Schuler B., and Haran G., J. Am. Chem. Soc. 126, 14686 (2004). doi:10.1021/ja046209k
  9. Sinha K. K. and Udgaonkar J. B., J. Mol. Biol. 353, 704 (2005).
  10. Kuzmenkina E. V., Heyes C. D., and Nienhaus G. U., Proc. Natl. Acad. Sci. U.S.A. 102, 15471 (2005). doi:10.1073/pnas.0507728102
  11. Sherman E. and Haran G., Proc. Natl. Acad. Sci. U.S.A. 103, 11539 (2006). doi:10.1073/pnas.0601395103
  12. Saxena A. M., Udgaonkar J. B., and Krishnamoorthy G., J. Mol. Biol. 359, 174 (2006).
  13. Hoffmann A., Kane A., Nettels D., Hertzog D. E., Baumgartel P., Lengefeld J., Reichardt G., Horsley D. A., Seckler R., Bakajin O., and Schuler B., Proc. Natl. Acad. Sci. U.S.A. 104, 105 (2007). doi:10.1073/pnas.0604353104
  14. Merchant K. A., Best R. B., Louis J. M., Gopich I. V., and Eaton W. A., Proc. Natl. Acad. Sci. U.S.A. 104, 1528 (2007).
  15. Thirumalai D. and Klimov D. K., Curr. Opin. Struct. Biol. 9, 197 (1999). doi:10.1016/S0959-440X(99)80028-1
  16. Nienhaus G. U., Macromol. Biosci. 6, 907 (2006). doi:10.1002/mabi.200600158
  17. Gopich I. V. and Szabo A., J. Phys. Chem. B 107, 5058 (2003). doi:10.1021/jp027481o
  18. Hyeon C., Morrison G., and Thirumalai D., Proc. Natl. Acad. Sci. U.S.A. 105, 9604 (2008). doi:10.1073/pnas.0802484105
  19. O’Brien E. P., Ziv G., Haran G., Brooks B. R., and Thirumalai D., Proc. Natl. Acad. Sci. U.S.A. 105, 13403 (2008). doi:10.1073/pnas.0802113105
  20. Magg C., Kubelka J., Holtermann G., Haas E., and Schmid F. X., J. Mol. Biol. 360, 1067 (2006).
  21. Sinha K. K. and Udgaonkar J. B., J. Mol. Biol. 370, 385 (2007).
  22. Yi Q., Scalley M. L., Simons K. T., Gladwin S. T., and Baker D., Folding Des. 2, 271 (1997). doi:10.1016/S1359-0278(97)00038-2
  23. Plaxco K. W., Millett I. S., Segel D. J., Doniach S., and Baker D., Nat. Struct. Biol. 6, 554 (1999). doi:10.1038/9329
  24. Kim D. E., Fisher C., and Baker D., J. Mol. Biol. 298, 971 (2000). doi:10.1006/jmbi.2000.3701
  25. Camacho C. J. and Thirumalai D., Proc. Natl. Acad. Sci. U.S.A. 90, 6369 (1993). doi:10.1073/pnas.90.13.6369
  26. Klimov D. K. and Thirumalai D., Proc. Natl. Acad. Sci. U.S.A. 97, 2544 (2000). doi:10.1073/pnas.97.6.2544
  27. Veitshans T., Klimov D., and Thirumalai D., Folding Des. 2, 1 (1997). doi:10.1016/S1359-0278(97)00002-3
  28. Ferrenberg A. M. and Swendsen R. H., Phys. Rev. Lett. 63, 1195 (1989). doi:10.1103/PhysRevLett.63.1195
  29. Kumar S., Bouzida D., Swendsen R. H., Kollman P. A., and Rosenberg J. M., J. Comput. Chem. 13, 1011 (1992). doi:10.1002/jcc.540130812
  30. Shea J., Nochomovitz Y. D., Guo Z., and Brooks C. L., J. Chem. Phys. 109, 2895 (1998). doi:10.1063/1.476842
  31. Auton M. and Bolen D. W., Biochemistry 43, 1329 (2004).

Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
