Proceedings of the National Academy of Sciences of the United States of America. 2012 Sep 14;109(40):16155-16160. doi: 10.1073/pnas.1207719109

Polymer scaling laws of unfolded and intrinsically disordered proteins quantified with single-molecule spectroscopy

Hagen Hofmann a,1, Andrea Soranno a, Alessandro Borgia a, Klaus Gast b, Daniel Nettels a, Benjamin Schuler a,1
PMCID: PMC3479594  PMID: 22984159

Abstract

The dimensions of unfolded and intrinsically disordered proteins are highly dependent on their amino acid composition and solution conditions, especially salt and denaturant concentration. However, the quantitative implications of this behavior have remained unclear, largely because the effective theta-state, the central reference point for the underlying polymer collapse transition, has eluded experimental determination. Here, we used single-molecule fluorescence spectroscopy and two-focus correlation spectroscopy to determine the theta points for six different proteins. While the scaling exponents of all proteins converge to 0.62 ± 0.03 at high denaturant concentrations, as expected for a polymer in good solvent, the scaling regime in water strongly depends on sequence composition. The resulting average scaling exponent of 0.46 ± 0.05 for the four foldable protein sequences in our study suggests that the aqueous cellular milieu is close to effective theta conditions for unfolded proteins. In contrast, two intrinsically disordered proteins do not reach the Θ-point under any of our solvent conditions, which may reflect the optimization of their expanded state for the interactions with cellular partners. Sequence analyses based on our results imply that foldable sequences with more compact unfolded states are a more recent result of protein evolution.

Keywords: protein folding, single-molecule FRET, coil-globule transition, polymer theory


It has become increasingly clear that the structure and dynamics of unfolded proteins are essential for understanding protein folding (1–3) and the functional properties of intrinsically disordered proteins (IDPs) (4–6). Theoretical concepts from polymer physics (7–9) have frequently been used to describe the properties of unfolded polypeptide chains (4, 10, 11) with the goal of establishing the link between protein folding and collapse (12–15). However, the methodology to test many of these concepts experimentally has only become available rather recently (2, 16, 17). A considerable body of experimental and theoretical work suggests that the dimensions of unfolded proteins depend on parameters such as amino acid composition (4), temperature (18), and solvent quality (3, 10, 15, 19). The continuous collapse of polymers has been treated exhaustively by a number of theories (20–24) based on general principles that relate the dimensions and the length of a chain to its free energy. However, a prerequisite for the quantitative application of these theories and their comparison to experimental results is that the dimensions of the Θ-state are known, which serves as an essential reference state. At the Θ-point*, chain–chain and chain–solvent interactions balance such that the polymer is at a critical point, at which the thermodynamic phase boundaries disappear. As a result, the polypeptide chain obeys the same length scaling as an ideal chain without excluded volume and intrachain interactions. However, the Θ-conditions for protein chains are unknown.
Besides its importance for obtaining the correct thermodynamic parameters of the chain, such as excluded volume and interaction energies, the Θ-state for proteins has been suggested to be of special biological relevance since folding is predicted to occur most efficiently when the Θ-point coincides with the transition midpoint for folding (9, 25, 26), while several previous results have been taken to suggest that unfolded proteins and folding intermediates are below the Θ-point under physiological conditions (27–30).

One way of obtaining this missing information is by means of scaling laws (20, 22) that relate the radius of gyration of the unfolded protein (RG) to its length (N) via RG ∝ N^ν. By determining the scaling exponent ν at different solvent conditions, the Θ-conditions are identified as the conditions for which ν = 1/2. Here we used single-molecule Förster resonance energy transfer (smFRET) to systematically determine the dimensions of seventeen chain segments with different lengths in six different unfolded proteins at a wide range of denaturant concentrations, resulting in a large data set (Fig. 1A and SI Appendix, Table S1). To investigate the sequence dependence of the Θ-conditions, we chose four foldable proteins [cold shock protein, CspTm (3); cyclophilinA, hCyp (31); spectrin domains R15 and R17 (32)] and two more highly charged IDPs (prothymosin α, ProTα, and the N-terminal domain of HIV Integrase, IN) (4) (Fig. 1A and SI Appendix, Table S1). Estimates for the scaling exponent ν, the Θ-conditions, and the free energy of solvation could be obtained for all six proteins.
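The scaling relation above can be illustrated with a minimal sketch: a straight-line fit in log-log space recovers ν and the prefactor from pairs of segment length and RG. The segment lengths and RG values below are synthetic placeholders, not data from this study, and the analysis in this paper constrains the prefactor through Eq. 3 instead of fitting it freely, so this is only an illustration of the principle.

```python
import numpy as np

def fit_scaling_exponent(n_bonds, r_g):
    """Fit R_G = rho0 * N**nu by linear regression of log(R_G) on log(N)."""
    slope, intercept = np.polyfit(np.log(n_bonds), np.log(r_g), 1)
    return slope, np.exp(intercept)  # (nu, rho0)

# Hypothetical segment lengths (number of bonds) and synthetic R_G values in nm,
# generated from a good-solvent scaling law with nu = 0.6 and rho0 = 0.2 nm.
n_bonds = np.array([33, 50, 66, 90, 120, 163], dtype=float)
r_g = 0.2 * n_bonds ** 0.6

nu, rho0 = fit_scaling_exponent(n_bonds, r_g)
print(f"nu = {nu:.3f}, rho0 = {rho0:.3f} nm")
```

Under this picture, the Θ-conditions correspond to the denaturant concentration at which the fitted ν crosses 1/2.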

Fig. 1.

Structures and amino acid compositions of the proteins used in this study (A) and single-molecule FRET efficiency histograms for CspTm (Csp66, SI Appendix, Table S1) at different concentrations of GdmCl (B). (A) Mean net charge, including the charges of the attached fluorophores, versus mean hydrophobicity per residue for hCyp, CspTm, R15, R17, IN, and ProTα (variants ProT53 and ProT54, SI Appendix) (circles). Error bars are standard deviations of mean net charge and mean hydrophobicity of the different variants of each protein. The density plot represents the distribution of 10,905 monomeric proteins with a sequence similarity ≤ 30% taken from the Protein Data Bank. The horizontal dashed line indicates a mean net charge of zero. Diagonal dashed lines indicate the separation line between intrinsically disordered and folded proteins suggested by Uversky et al. (48).

Results

To probe the dimensions of the unfolded states of the six proteins, we attached AlexaFluor 488 as a donor and AlexaFluor 594 as an acceptor chromophore at different positions within the polypeptide chains (SI Appendix, Table S1). The labeled proteins were investigated with confocal smFRET while freely diffusing in solution. In the resulting transfer efficiency histograms for each protein and variant, up to three peaks are observed: The peak at very high transfer efficiency (E) results from folded molecules, and the peak at E ≈ 0 results from molecules lacking an active acceptor dye (Fig. 1B and SI Appendix, Figs. S1–S3). We focus exclusively on the peak at intermediate transfer efficiencies, which results from unfolded molecules (Fig. 1B). The use of smFRET allows us to discriminate this population of unfolded molecules from folded molecules even in the virtual absence of denaturant (SI Appendix, Figs. S1–S3). With increasing concentration of the denaturant GdmCl, the transfer efficiency distributions of the unfolded subpopulations of all variants show a pronounced shift to lower E values, corresponding to an expansion of the polypeptide chains (Fig. 1B and SI Appendix, Figs. S1–S3), as observed previously for a broad range of proteins and peptides (3, 10, 15, 19, 33).

Chain dimensions from FRET efficiencies.

Quantitative information about the dimensions of the unfolded proteins can be obtained from the average values ⟨E⟩ of their transfer efficiency peaks. We used the coil-to-globule transition theory of Sanchez (21) to extract the chain dimensions from ⟨E⟩. The advantage of this theory is its ability to describe the dimensions of a chain under all solvent conditions by explicitly taking into account effects such as excluded volume, intrachain interactions, and multibody interactions (10, 11, 21). The theory provides an expression for the probability density function of the radius of gyration rG in the form of a Boltzmann-weighted Flory–Fisk distribution (11, 34):

[Eq. 1; equation image not reproduced] [1]

Here, RGΘ is the root mean squared radius of gyration of the Θ-state; ε is the mean interaction energy between amino acids; ϕ is the volume fraction of the chain; n is the number of amino acids in the chain segment probed by FRET; Z is a normalization factor; and q is the excess free energy per monomer with respect to the ideal chain (11). An expression similar to Eq. 1 was also obtained in heteropolymer theories (12, 13), showing that Eq. 1 is not specific for homopolymers (SI Appendix). Note, however, that none of these descriptions take into account effects from sequence complexities; e.g., the patterning of residues.
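The image for Eq. 1 is not reproduced in this text. As a hedged reconstruction, assuming the standard form of the Sanchez theory as it is commonly written for the Flory–Fisk distribution (and not checked against the original equation), the distribution may be sketched as follows, with energies in units of kBT and with ϕ depending on rG through ϕ = ϕ0 (RGΘ/rG)^3:

```latex
P(r_G, \varepsilon, R_{G\Theta}) = Z^{-1}\, r_G^{6}\,
  \exp\!\left[-\frac{7}{2}\,\frac{r_G^{2}}{R_{G\Theta}^{2}} - n\,q(\phi,\varepsilon)\right],
\qquad
q(\phi,\varepsilon) = -\frac{1}{2}\,\phi\,\varepsilon + \frac{1-\phi}{\phi}\,\ln(1-\phi)
```

Expanding the assumed q for small ϕ gives q ≈ -1 + (1 - ε)ϕ/2 + ϕ^2/6, so the two-body term vanishes at ε = 1 and attraction corresponds to ε > 0. This matches the sign convention and the Θ-condition (ε ≈ 1) used elsewhere in the text, but the exact form and prefactors should be taken from refs. 11 and 21.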

In order to relate the distribution P(rG,ε,RGΘ) to a segment end-to-end distance distribution P(r,ε,RGΘ), which is required to describe the transfer efficiencies of the polypeptide chains, we used the conditional probability density function P(r|rG) suggested by Ziv and Haran (11) (SI Appendix, Eq. S1). The observed mean transfer efficiency ⟨E⟩ is related to Eq. 1 by

[Eq. 2; equation image not reproduced] [2]

where R0 is the Förster radius (5.4 nm in our case) and L is the contour length of the protein segment probed. Importantly, the root mean squared radius of gyration of the chain segment, RG = ⟨rG^2⟩^(1/2), is largely independent of the specific value of RGΘ (SI Appendix, Fig. S8), which allows us to determine RG for every protein segment from its mean transfer efficiency, ⟨E⟩. We then use the scaling of RG with the number of peptide bonds in the unfolded protein segments, RG ∝ N^ν, to determine RGΘ from the conditions at which ν = 1/2. With the correct value of RGΘ, we then determine ε exactly. P(r|rG) (SI Appendix, Eq. S1) assumes unfolded proteins to be spherical in shape, which is an approximation (35–37), but we investigated the accuracy of Eq. 2 by simulation and found the error in RG to be ≤ 6% (SI Appendix, Fig. S5).
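The step from a distance distribution to a mean transfer efficiency can be sketched numerically. The example below averages the Förster expression E(r) = 1/(1 + (r/R0)^6) over a distance distribution truncated at the contour length L. As a stand-in for P(r,ε,RGΘ), it uses the end-to-end distribution of an ideal Gaussian chain, which is an assumption made only for illustration and is not the Sanchez-based distribution used in this work; the function names and test values are hypothetical.

```python
import numpy as np

R0 = 5.4  # Förster radius in nm, as stated in the text

def fret_efficiency(r, r0=R0):
    """Förster transfer efficiency at donor-acceptor distance r (nm)."""
    return 1.0 / (1.0 + (r / r0) ** 6)

def gaussian_chain_pdf(r, rms_r):
    """End-to-end distance density of an ideal chain (stand-in for P(r))."""
    a = 3.0 / (2.0 * rms_r ** 2)
    return 4.0 * np.pi * r ** 2 * (a / np.pi) ** 1.5 * np.exp(-a * r ** 2)

def mean_efficiency(rms_r, contour_length, n_points=20001):
    """Average E(r) over P(r) on [0, L], renormalizing the truncated density."""
    r = np.linspace(0.0, contour_length, n_points)
    p = gaussian_chain_pdf(r, rms_r)
    e = fret_efficiency(r)

    def trapezoid(y):
        return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(r))

    return trapezoid(e * p) / trapezoid(p)

# Hypothetical segment: contour length 25 nm, two different chain dimensions.
e_compact = mean_efficiency(3.0, 25.0)
e_expanded = mean_efficiency(7.0, 25.0)
print(e_compact, e_expanded)
```

In practice the relation is used in the inverse direction: the chain dimension is adjusted until the computed mean efficiency matches the measured ⟨E⟩.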

The radius of gyration of polymers scales with the number of bonds (N) according to the power-law relation RG = ρ0 N^ν. The specific value of ν depends on the dimensions of the chain, with a value of 3/5 for the expanded coil state (22), 1/2 for the Θ-state, and 1/3 for the most compact globule state (21, 35). In contrast, the value of the prefactor ρ0 depends on the details of the monomer and the bond geometry. For a self-avoiding chain with scaling exponent ν, RG is given by (38)

[Eq. 3; equation image not reproduced] [3]

(The derivation for a special case can also be found in ref. 34). Here, b = 0.38 nm (39) is the distance between two Cα-atoms, and lp is the persistence length (SI Appendix). Values for ρ0 from experiments (0.19 ± 0.03 nm and 0.2 ± 0.1 nm) (40, 41) and simulations (0.22 ± 0.02 nm, 0.24 nm, 0.198 ± 0.037 nm, and 0.199 nm) (42–45) obtained under good solvent conditions (ν = 3/5) yield lp ≈ 0.40 nm, in agreement with persistence lengths from force spectroscopy experiments (39). Since the range of segment lengths accessible with smFRET is not broad enough to determine ρ0 independently, we fixed lp (but not ρ0) to this value of 0.40 nm. For comparison, a free fit of the length scaling of RG for 10,905 folded proteins selected from the Protein Data Bank results in ν = 0.34 and a persistence length of lp = [value missing in source] (Fig. 2) (35), but even using this value for our analysis as an upper bound does not change our conclusions (SI Appendix).
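The image for Eq. 3 is not reproduced in this text. The sketch below assumes the form RG = [2 lp b / ((2ν + 1)(2ν + 2))]^(1/2) N^ν. This form is an assumption, chosen because it reduces to the ideal-chain result RG^2 = lp b N/3 at ν = 1/2 and because it returns lp ≈ 0.40 nm from the ρ0 values quoted above at ν = 3/5. The function names are hypothetical.

```python
import math

B = 0.38  # distance between consecutive C-alpha atoms in nm (from the text)

def rho0(nu, lp, b=B):
    """Prefactor of R_G = rho0 * N**nu under the assumed form of Eq. 3."""
    return math.sqrt(2.0 * lp * b / ((2.0 * nu + 1.0) * (2.0 * nu + 2.0)))

def persistence_length(rho0_value, nu, b=B):
    """Invert the assumed Eq. 3 to obtain lp from a measured prefactor."""
    return rho0_value ** 2 * (2.0 * nu + 1.0) * (2.0 * nu + 2.0) / (2.0 * b)

def radius_of_gyration(n_bonds, nu, lp=0.40, b=B):
    """R_G in nm for a segment of n_bonds bonds."""
    return rho0(nu, lp, b) * n_bonds ** nu

# Prefactors quoted in the text for good solvent (nu = 3/5), in nm.
reported_rho0 = [0.19, 0.2, 0.22, 0.24, 0.198, 0.199]
mean_rho0 = sum(reported_rho0) / len(reported_rho0)
lp_estimate = persistence_length(mean_rho0, 0.6)

print(f"lp from mean rho0: {lp_estimate:.2f} nm")
print(f"rho0 at the theta point: {rho0(0.5, 0.40):.3f} nm")
```

Fixing lp and letting ν vary means that ρ0 changes with ν, which is why the text distinguishes between fixing lp and fixing ρ0.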

Fig. 2.

Radius of gyration, RG, for all proteins and variants as a function of the number of bonds, Nbonds = N + l, at different GdmCl concentrations (see color scale). Each dye linker was estimated to be equivalent to 4.5 peptide bonds (l = 9) (61). Colored dashed lines are fits according to Eq. 3 with lp = 0.40 nm. The contour plots represent the distribution of RG values for the folded proteins shown in Fig. 1A. Gray circles are the RG values determined for unfolded proteins via SAXS, taken from Kohn et al. (40). Open blue circles are RG values of denatured proteins under native conditions determined with SAXS, taken from Uzawa et al. (30). Black solid lines are fits of the data taken from Kohn et al. (40) and of the 10,905 monomeric native proteins from the Protein Data Bank with Eq. 3. The resulting scaling exponents are indicated.

Identifying the Θ conditions from FRET and two-focus FCS.

Previous measurements of the scaling exponent ν for unfolded proteins at high concentrations of denaturant resulted in values between 0.50 and 0.67 (40, 41, 46, 47). In the most extensive study, RG for 28 proteins was determined by SAXS in the presence of high concentrations of GdmCl or urea (40). From this data set, ν = 0.598 ± 0.028 was obtained, indistinguishable from the theoretical prediction of 3/5 for an excluded volume chain (22), which indicates that unfolded proteins are in the coil-state and in good solvent at high concentrations of denaturant (Fig. 2). Under comparable solvent conditions (6 M GdmCl), we found the RG values from smFRET to be in remarkable agreement with RG = 0.2 nm · N^(3/5), the scaling law obtained with SAXS (40) (Fig. 2). The scaling exponents we obtained at 6 M GdmCl range from 0.59 for hCyp to 0.63 for the hydrophilic IDP integrase. The high ν-value of prothymosin α (ν = 0.67), a highly negatively charged IDP (4, 48), points towards a specific interaction of the chain with the denaturant GdmCl (4), as previously suggested based on molecular dynamics simulations (49).

A decrease in the concentration of GdmCl leads to a compaction and to a corresponding decrease of ν for all six unfolded proteins (Figs. 2 and 3A). While the values of ν are close to 3/5 at high GdmCl concentrations for all proteins, they diverge with decreasing denaturant (Fig. 3A). Due to electrostatic repulsion at low ionic strength, the scaling exponents for the two charged IDPs, IN and ProTα, increase in water (4), reaching values of 0.58 for IN and 0.70 for ProTα. In contrast to the IDPs, the scaling exponents of the four foldable proteins decrease monotonically with decreasing solvent quality, but a substantial divergence of their scaling exponents is observed at the lowest denaturant concentrations, suggesting an increasing effect of sequence composition on the chain dimensions. The scaling exponents range from 0.40 for the most hydrophobic sequence (hCyp) to 0.51 for the most hydrophilic (R17), with a mean value of ν = 0.46 ± 0.05 in water—i.e., close to the Θ-regime.

Fig. 3.

Scaling exponents (A) and phase transition surface (B) for the unfolded proteins and variants of this study. (A) Error bars represent the uncertainties of the fits shown in Fig. 2, and the distributions in water (Left) and 6 M GdmCl (Right) reflect the changes in the scaling exponents upon variation of lp by ± 10% around its estimated value of 0.40 nm. (B) Comparison between experimentally determined expansion factors α (filled circles) for all variants and proteins of this study and the numerically computed expansion factors α with our estimate for RGΘ using Eq. 1. Shaded volumes indicate the regimes of attractive (ε > 0) and repulsive (ε < 0) intrachain interaction energies. The gray shaded region indicates the transition regime between αc = 1, the critical value for infinitely long chains, and αc = 1 + (19/22)ϕ0, the approximation for finite chains as given by Sanchez (21). Here, ϕ0 is the volume fraction of the Θ-state relative to the most compact state (SI Appendix).

An independent experimental approach to probe the collapse transition and the resulting change in the scaling exponents of polymers is the comparison of RG with the average hydrodynamic radius, RH. While both RG and RH are measures of the dimensions of the chain, their relative magnitude depends on the scaling regime (20), and the ratio RG/RH has thus been used to locate the collapse transition (50). To determine RH with sufficient precision, we used two-focus fluorescence correlation spectroscopy (2f-FCS) (51) (SI Appendix, Fig. S4), where the crosscorrelation between the fluorescence intensities from two partially overlapping foci is used to determine the diffusion time. The distance between the foci was determined to high accuracy by calibration with dynamic light scattering data (SI Appendix), resulting in very accurate translational diffusion coefficients and hydrodynamic radii. Fig. 4A shows the comparison of RH from 2f-FCS with RG determined from smFRET as a function of the GdmCl activity for singly labeled unfolded hCyp, the largest polypeptide chain of this study. As expected, RH increases with increasing concentration of GdmCl, confirming the expansion of the unfolded protein observed with smFRET (Fig. 4A). As observed previously (10, 41), the ratio RG/RH does not approach the expected limit of 1.5 at high concentrations of GdmCl. This might be the result of residual intrachain interactions even at high GdmCl concentrations, or of a direct interaction of guanidinium ions with the unfolded polypeptide chain (49), leading to slower diffusion and higher apparent values for RH. At low GdmCl activities, where the latter effect should be negligible, RG/RH decreases in a cooperative fashion, indicating a pronounced change in the scaling behavior and the scaling exponent of unfolded hCyp. The maximally compact state (RG/RH ≈ 0.77) (20, 50), however, is not reached even at the lowest accessible GdmCl activities (aGdmCl = 0.05; GdmCl = 0.25 M) (Fig. 4B), as suggested also by the scaling exponent of ν = 0.45 ± 0.03. These results support our estimates for the scaling exponents of unfolded hCyp from smFRET (Fig. 3A).
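The conversion from a translational diffusion coefficient to RH, and the comparison of RG/RH with its limiting values, can be sketched with the Stokes–Einstein relation RH = kBT/(6πηD). The temperature, the viscosity (pure water at 25 °C), and the example diffusion coefficient below are assumptions for illustration only; GdmCl solutions are more viscous than water, so a real analysis would need the viscosity at each denaturant concentration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

# Limiting ratios mentioned in the text.
RATIO_GOOD_SOLVENT = 1.5                 # expected limit at high denaturant
RATIO_COMPACT_SPHERE = math.sqrt(3 / 5)  # homogeneous sphere, about 0.77

def hydrodynamic_radius(diffusion_coefficient, temperature=298.15, viscosity=0.89e-3):
    """Stokes-Einstein radius in m from D (m^2/s), T (K), and viscosity (Pa s)."""
    return K_B * temperature / (6.0 * math.pi * viscosity * diffusion_coefficient)

# Hypothetical diffusion coefficient of 1e-10 m^2/s.
r_h = hydrodynamic_radius(1.0e-10)
print(f"R_H = {r_h * 1e9:.2f} nm")

# Hypothetical radius of gyration of 3.0 nm for the same chain.
ratio = 3.0e-9 / r_h
print(f"R_G/R_H = {ratio:.2f}")
```

A ratio between the two limits indicates a chain that is neither a fully expanded coil nor a maximally compact globule.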

Fig. 4.

Comparison between the radii of gyration and the hydrodynamic radii for hCyp as a function of GdmCl activity. (A) Radius of gyration, RG, (blue circles) for Cyp163 (SI Appendix, Table S1) rescaled to the full length sequence (Nbonds = 166 + 9) according to the scaling laws shown in Fig. 2, and hydrodynamic radius (RH) determined from 2f-FCS (red circles) for the donor-labeled variant CypV2C as a function of the denaturant activity, aGdmCl. Error bars for RG were estimated from the change in lp by ± 10%. Error bars for RH represent the standard deviation of ± 0.1 nm estimated from the calibration of the instrument (SI Appendix). Solid lines are fits according to y = y(0) + γaGdmCl/(K + aGdmCl), where y is RG or RH, respectively. Inset: Arrangement of the foci with parallel and vertical polarization in the 2f-FCS setup (51). (B) RG/RH as a function of the GdmCl activity. Error bars result from the error propagation of the uncertainties shown in A. The solid line is the ratio of the fits shown in A.

Interaction energies and the Tanford transfer model.

The determination of the scaling exponents (Fig. 3A) now allows us to compute the absolute values of the intrachain interaction energies ε for the six unfolded proteins from the measured transfer efficiencies using Eq. 2. The radius of gyration of the Θ-state, RGΘ, which we obtained from Eq. 3 at ν = 1/2, the interaction energy ε, and the chain length N then fully determine the phase transition behavior of the unfolded chains within the framework of Sanchez theory (21). A comparison of the experimental data with a numerical evaluation of Eq. 1 in terms of the expansion factor α = RG/RGΘ shows how the cooperativity of the collapse transition increases with increasing chain length (Fig. 3B). Strictly speaking, a second-order phase transition of the Landau type is only obtained in the limit of N → ∞ (21). Hence, for the finite size of the proteins investigated here, with 33 ≤ N ≤ 163, the transitions are pseudo-second-order, resulting in a rounding of the transition (21, 52).

Since the absolute value of ε depends on specific numerical factors in the theory, it is instructive to investigate the difference between the interaction energies in water, ε(0), and GdmCl solution ε(aGdmCl), respectively, Δε = ε(0) - ε(aGdmCl). The values of Δε determined for the different interdye variants of length nDA can then be rescaled to the full-length protein (ntotal) according to Δεtotal = Δε(nDA/ntotal)^(1/2) (SI Appendix). Δεtotal shows a pronounced dependence on the GdmCl activity for all six proteins (Fig. 5A). The effect of GdmCl on protein chains can be modeled as a preferential interaction of the denaturant with the polypeptide chain (49, 53). This weak-binding model describes the solvation free energy for the polypeptide chain as Δgsol = -βγ log(1 + K aGdmCl), where γ corresponds to the effective number of binding sites for GdmCl molecules, K is the apparent equilibrium constant for binding, and β = (RT)^-1, where R is the ideal gas constant and T is the temperature. Fits with this model provide a good description of the change in Δεtotal with GdmCl activity for all proteins investigated here (Fig. 5A). In addition, we find a remarkable agreement of the absolute values of Δεtotal with the transfer free energies (Δgsol) of the polypeptide chains from water into GdmCl solutions (54) calculated based on their amino acid sequences (Fig. 5 A and B and SI Appendix, Fig. S6). This accordance suggests that the expansion of unfolded proteins, at least for the proteins investigated here, can be explained quantitatively by the change in free energy upon interaction of GdmCl molecules with the chain, implying Δεtotal = Δgsol. This finding strongly supports the use of this equality in a heteropolymer theory of protein folding (13) and in the molecular transfer model, where it was employed to predict the dimensions of denatured proteins at varying concentrations of GdmCl (14).
A simple thermodynamic cycle, in which the total intrachain interaction energy, -εtotal(0), is reduced by the free energy of transferring the amino acid sequence from water to GdmCl (Δgsol), illustrates the effect of GdmCl on the intrachain interaction energy, -εtotal(a), and RG (Fig. 5C). Finally, these results directly support the correlation between the m-value for folding and the free energy change of collapse predicted by Alonso and Dill (13) and found experimentally by Ziv and Haran (11) (SI Appendix).
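The two relations used in this section, the rescaling of Δε to the full-length chain and the weak-binding dependence on denaturant activity, can be written as a short sketch. The weak-binding term is implemented here in reduced form as -γ ln(1 + K a); reading "log" as the natural logarithm and expressing the result in units of RT are assumptions about the intended convention, and all parameter values are hypothetical.

```python
import math

def rescale_to_full_length(delta_eps, n_da, n_total):
    """Rescale the interaction-energy change of an interdye segment of length
    n_da to the full-length chain: delta_eps * (n_da / n_total)**0.5."""
    return delta_eps * math.sqrt(n_da / n_total)

def weak_binding_energy(activity, gamma, k_assoc):
    """Reduced solvation free energy of the weak-binding model,
    -gamma * ln(1 + K * a), assumed here to be in units of RT."""
    return -gamma * math.log(1.0 + k_assoc * activity)

# Hypothetical parameters: 20 effective binding sites, K = 0.5 per unit activity.
for activity in (0.0, 1.0, 5.0):
    print(activity, weak_binding_energy(activity, gamma=20.0, k_assoc=0.5))

# Hypothetical segment of 50 residues in a 200-residue protein.
print(rescale_to_full_length(2.0, n_da=50, n_total=200))
```

The logarithmic form saturates slowly, so the solvation free energy keeps decreasing with activity while its slope flattens, which is the qualitative behavior described for Δεtotal in Fig. 5A.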

Fig. 5.

Relative intrachain interaction energies, Δεtotal, as a function of GdmCl activity, and comparison between Δεtotal and Δgsol. (A) Δεtotal for the proteins of this study (circles, colors as in Fig. 3B) together with the fits according to the Schellman weak binding model (gray solid line), and, for comparison, the Tanford transfer free energies Δgsol calculated for the full-length sequences (black line) according to ref. 54. Contributions from the backbone and side chains to Δgsol are shaded in blue and green, respectively. The effect of the δgsol-values estimated for Glu and Asp on Δgsol is indicated as a light green shaded area. From the discrepancy between Δεtotal and Δgsol for ProTα, we obtained δgsol for Glu and Asp at 6 M GdmCl to be -798 cal mol^-1 (SI Appendix, Eq. S14 and Table S2). (B) Correlation between Δεtotal and Δgsol and thermodynamic cycle (C) illustrating the effect of GdmCl on the chain energy as explained in the main text. State 1 is a hypothetical expanded unfolded state in water and state 3 is the same state in the presence of GdmCl. State 2 is the collapsed unfolded state in water.

Effect of sequence composition on the scaling exponent.

A detailed analysis of the effect of sequence composition on the scaling exponents of the six proteins in water reveals a pronounced positive correlation between ν and the net charge of the polypeptide (Fig. 6A), and a negative correlation between ν and sequence hydrophobicity (Fig. 6B). A similar correlation has recently been observed in molecular dynamics simulations of protamines, positively charged intrinsically disordered peptides (55). These correlations allow us to estimate the scaling exponents also for other proteins. Values of the scaling exponents predicted for the unfolded states of 10,905 monomeric proteins from the Protein Data Bank, based on the correlation between ν and net charge (Fig. 6A, Inset), and ν and hydrophobicity (Fig. 6B, Inset) indicate that the majority of these proteins fall into the range of the scaling exponents observed with the foldable proteins in this study. A value of 0.45 ± 0.03 is obtained as a mean value of the two distributions, remarkably close to the value expected for the Θ-state (ν = 1/2).
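The two sequence descriptors used in these correlations can be computed from a sequence alone, as sketched below. The sketch assumes the Kyte–Doolittle hydropathy scale rescaled to the range 0 to 1, counts Lys and Arg as +1 and Asp and Glu as -1 at pH 7, and treats His as neutral; the hydrophobicity scale actually used for Fig. 6 is not stated in this text, and the charges of attached fluorophores are not included. The empirical relations between these descriptors and ν (SI Appendix, Eq. S29) are not reproduced here.

```python
# Kyte-Doolittle hydropathy values (assumed scale; see lead-in).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charges at pH 7 (His treated as neutral, an assumption).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

def mean_net_charge(sequence):
    """Net charge per residue, ignoring termini and any attached dyes."""
    return sum(CHARGE.get(aa, 0.0) for aa in sequence) / len(sequence)

def mean_hydrophobicity(sequence):
    """Mean Kyte-Doolittle hydropathy, rescaled so that Arg = 0 and Ile = 1."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in sequence) / len(sequence)

# Hypothetical example sequence, not one of the proteins studied here.
example = "MKEDLLGSAAKRVEE"
print(mean_net_charge(example), mean_hydrophobicity(example))
```

Applying such functions to a large set of sequences gives the charge and hydrophobicity distributions from which the predicted scaling exponents are derived.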

Fig. 6.

Scaling exponents, sequence composition, and evolutionary trends. (A) Correlation between the scaling exponents of the proteins and the net charges of their sequences at pH 7. (B) Correlation between the scaling exponents of the six proteins and the mean hydrophobicity of their sequences. Horizontal error bars are the standard deviations as shown in Fig. 1A; vertical error bars reflect the changes in the scaling exponents upon variation of lp by ± 10%. Dashed lines in A and B are global fits according to empirical equations chosen to give reasonable limits of ν (SI Appendix, Eq. S29). Insets: Frequency histograms of the predicted scaling exponents for the unfolded states of the proteins selected from the Protein Data Bank shown in Fig. 1 A and B based on the fits in A (red) and B (blue), respectively. The shaded areas indicate the regime of scaling exponents between ν = 0.40 and ν = 0.51, which encompass 93% of proteins in A and 71% of proteins in B. (C–E) Distributions of predicted scaling exponents (Top) and mean net charge versus hydrophobicity (Bottom) for 50,000 amino acid sequences drawn randomly from the amino acid frequency distribution of the last universal ancestor (C), current proteins (D), and predicted for the distant future (E). The mean scaling exponents are indicated. See SI Appendix, Eqs. S29–S31 for calculation of the scaling exponents. Amino acid frequencies were taken from table 3 in ref. 60.

Discussion

In order to quantify the thermodynamics of unfolded proteins with polymer theory, information about the Θ-point of the unfolded protein is indispensable (11, 21). Using smFRET, we determined the effective Θ-point of unfolded polypeptide chains by extracting the scaling exponents for four foldable proteins (CspTm, hCyp, R15, R17) and two intrinsically disordered proteins (ProTα and IN). The RG-values and scaling exponents obtained at high GdmCl are in quantitative agreement with values from SAXS (40) (Fig. 2) and SANS (41), indicating that smFRET is not only a precise but also an accurate method to determine the chain dimensions of unfolded proteins. With the ability to resolve subpopulations, smFRET allows us additionally to obtain the full range of scaling exponents down to physiological solvent conditions.

The higher net charge of the two intrinsically disordered proteins IN and ProTα (Fig. 1A) affects the scaling exponents and leads to an increase of ν at very low GdmCl concentrations (Fig. 3A). The resulting expanded conformations under physiological conditions might reflect an optimization of the sequences for the interaction with their cellular ligands, in keeping with suggestions from theory and simulations that binding kinetics can be accelerated in extended unfolded conformer ensembles (5). In contrast to the IDPs, the scaling exponents of the four foldable proteins decrease monotonically with decreasing solvent quality (Fig. 3A). However, with a mean scaling exponent of 0.46 ± 0.05 in water, they are still much more expanded than a dense globule, which would obey a scaling exponent of 1/3, as observed for folded globular proteins. Note that the scaling exponents of the two coexisting regimes, folded and unfolded, in water are significantly different (νfolded = 0.34, νunfolded ≈ 0.46). Although theories for homopolymers predict a phase separation into compact globules (ν = 1/3) and expanded chains (ν = 1/2) in poor solvent at high concentrations of the polymer (23), these theories are insufficient to reconcile the two coexisting scaling regimes under our experimental conditions of almost infinite dilution.

In heteropolymer theory, the effective intrachain interaction energy can be approximated by the sum of two mean-field terms, one for backbone interactions (εbb) and one for side-chain interactions (εsc), ε = εbb + εsc. Simulations (29) and experiments (33, 56) suggest that backbone interactions of polypeptide chains are attractive in water, implying that water is a poor solvent for the polypeptide chain backbone with εbb > 1. Our mean scaling exponent of 0.46 ± 0.05 of unfolded proteins in water (i.e. ε ≈ 1) (Fig. 3 A and B) would then imply that εsc is on average repulsive, i.e. εsc < 0. Hence, backbone and side-chain interactions nearly compensate in water, leading to a chain close to its critical point. In case the cooperative formation of specific interactions in folded proteins exceeds the mean-field energy term ε, compact folded proteins with ν = 1/3 and expanded unfolded proteins with ν > 1/3 can coexist. This scenario is in accord with lattice simulations that suggest that the folding of proteins can occur without populating a dense unstructured globule (57).

What do our results imply for protein folding? Although a collapse to a very dense state (ν = 1/3 and RG/RH = 0.77) favors folding by reducing the conformational entropy, it could drastically slow down the dynamics of the chain (57) by processes such as internal friction, which have been shown to increase with increasing compaction of unfolded proteins (16, 17, 33, 58). However, especially during the early stages of the folding process, many interactions have to be sampled to find the correct contacts that incrementally decrease the energy of the protein. Simulations based on simple models predict that unfolded chains close to the Θ-regime can accomplish this optimization process more efficiently than chains that are in the completely collapsed globule regime (9, 25, 26). Our results for hCyp, CspTm, R15, and R17 (Figs. 2 and 3), and a comparison of their hydrophobicity and net charge with those of a large number of foldable protein sequences (Fig. 6) imply that natural sequences are indeed close to this regime, and only very few proteins are expected to reach the maximally compact regime with ν = 1/3 in their unfolded state (Fig. 6). However, not only extreme compaction, but also expansion caused by a high net charge of the polypeptide (4, 55) can impede folding, as exemplified by IDPs that are folding incompetent without their biological ligands (48). An intermediate regime of compaction as prevalent in current sequences (Fig. 6) therefore indeed seems most favorable for folding. Within this regime, however, topology-specific effects such as contact order (59) appear to play the dominant role in determining the folding rates of current foldable proteins.

The correlations among net charge, hydrophobicity, and scaling exponents (Fig. 6) finally also allow us to assess the change in average chain dimensions during protein evolution. Based on bioinformatics analyses (60), ancestral proteins are assumed to have consisted of only eight to ten different amino acids with high average hydrophilicity (Fig. 6 C–E). The resulting scaling exponent of 0.53 ± 0.06 for these ancestral proteins (SI Appendix, Eqs. S29–S31) is close to what we observe for current IDPs, implying that IDPs may be remnants of ancestral protein sequences, whereas foldable sequences with more compact unfolded states are a more recent result of protein evolution (Fig. 6 C–E).

Materials and Methods

The expression, purification, and labeling of the protein variants and the single-molecule measurements are described in detail in the SI Appendix.

ACKNOWLEDGMENTS.

We thank Robert Best, Gilad Haran, Rohit Pappu, and Devarajan Thirumalai for helpful discussions. This work was supported by the Swiss National Science Foundation, the Swiss National Center of Competence in Research for Structural Biology, and by a Starting Investigator Grant of the European Research Council.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1207719109/-/DCSupplemental.

*The critical point for heteropolymers is an effective Θ-point (24), but for convenience, we will use the term Θ-point also for heteropolymers.

References

  • 1. Hagen SJ, Hofrichter J, Szabo A, Eaton WA. Diffusion-limited contact formation in unfolded cytochrome c: Estimating the maximum rate of protein folding. Proc Natl Acad Sci USA. 1996;93:11615–11617. doi: 10.1073/pnas.93.21.11615.
  • 2. Bieri O, et al. The speed limit for protein folding measured by triplet–triplet energy transfer. Proc Natl Acad Sci USA. 1999;96:9597–9601. doi: 10.1073/pnas.96.17.9597.
  • 3. Schuler B, Lipman E, Eaton W. Probing the free-energy surface for protein folding with single-molecule fluorescence spectroscopy. Nature. 2002;419:743–747. doi: 10.1038/nature01060.
  • 4. Müller-Späth S, et al. Charge interactions can dominate the dimensions of intrinsically disordered proteins. Proc Natl Acad Sci USA. 2010;107:14609–14614. doi: 10.1073/pnas.1001743107.
  • 5. Shoemaker B, Portman J, Wolynes P. Speeding molecular recognition by using the folding funnel: The fly-casting mechanism. Proc Natl Acad Sci USA. 2000;97:8868–8873. doi: 10.1073/pnas.160259697.
  • 6. Sugase K, Dyson H, Wright PE. Mechanism of coupled folding and binding of an intrinsically disordered protein. Nature. 2007;447:1021–1025. doi: 10.1038/nature05858.
  • 7. Chan HS, Dill KA. Polymer principles in protein structure and stability. Annu Rev Biophys Biophys Chem. 1991;20:447–490. doi: 10.1146/annurev.bb.20.060191.002311.
  • 8. Onuchic JN, Luthey-Schulten Z, Wolynes PG. Theory of protein folding: The energy landscape perspective. Annu Rev Phys Chem. 1997;48:545–600. doi: 10.1146/annurev.physchem.48.1.545.
  • 9. Thirumalai D, O’Brien E, Morrison G, Hyeon C. Theoretical perspectives on protein folding. Annu Rev Biophys. 2010;39:159–183. doi: 10.1146/annurev-biophys-051309-103835.
  • 10. Sherman E, Haran G. Coil-globule transition in the denatured state of a small protein. Proc Natl Acad Sci USA. 2006;103:11539–11543. doi: 10.1073/pnas.0601395103.
  • 11. Ziv G, Haran G. Protein folding, protein collapse, and Tanford’s transfer model: Lessons from single-molecule FRET. J Am Chem Soc. 2009;131:2942–2947. doi: 10.1021/ja808305u.
  • 12. Bryngelson J, Wolynes P. A simple statistical field-theory of heteropolymer collapse with application to protein folding. Biopolymers. 1990;30:177–188.
  • 13. Alonso DO, Dill KA. Solvent denaturation and stabilization of globular proteins. Biochemistry. 1991;30:5974–5985. doi: 10.1021/bi00238a023.
  • 14. O’Brien E, Ziv G, Haran G, Brooks B, Thirumalai D. Effects of denaturants and osmolytes on proteins are accurately predicted by the molecular transfer model. Proc Natl Acad Sci USA. 2008;105:13403–13408. doi: 10.1073/pnas.0802113105.
  • 15. Haran G. How, when, and why proteins collapse: The relation to folding. Curr Opin Struct Biol. 2012;22:14–20. doi: 10.1016/j.sbi.2011.10.005.
  • 16. Waldauer S, Bakajin O, Lapidus L. Extremely slow intramolecular diffusion in unfolded protein L. Proc Natl Acad Sci USA. 2010;107:13713–13717. doi: 10.1073/pnas.1005415107.
  • 17. Nettels D, Gopich I, Hoffmann A, Schuler B. Ultrafast dynamics of protein collapse from single-molecule photon statistics. Proc Natl Acad Sci USA. 2007;104:2655–2660. doi: 10.1073/pnas.0611093104.
  • 18. Nettels D, et al. Single-molecule spectroscopy of the temperature-induced collapse of unfolded proteins. Proc Natl Acad Sci USA. 2009;106:20740–20745. doi: 10.1073/pnas.0900622106.
  • 19. Schuler B, Eaton W. Protein folding studied by single-molecule FRET. Curr Opin Struct Biol. 2008;18:16–26. doi: 10.1016/j.sbi.2007.12.003.
  • 20. Grosberg A, Kuznetsov D. Quantitative theory of the globule-to-coil transition. 4. Comparison of theoretical results with experimental data. Macromolecules. 1992;25:1996–2003.
  • 21. Sanchez I. Phase transition behavior of the isolated polymer chain. Macromolecules. 1979;12:980–988.
  • 22. Flory P. The configuration of real polymer chains. J Chem Phys. 1949;17:303–310.
  • 23. de Gennes P-G. Scaling Concepts in Polymer Physics. Ithaca, NY and London: Cornell Univ Press; 1979. pp. 113–123.
  • 24. Ha B-Y, Thirumalai D. Conformations of a polyelectrolyte chain. Phys Rev A. 1992;46:R3012–R3015. doi: 10.1103/physreva.46.r3012.
  • 25. Camacho C, Thirumalai D. Kinetics and thermodynamics of folding in model proteins. Proc Natl Acad Sci USA. 1993;90:6369–6372. doi: 10.1073/pnas.90.13.6369.
  • 26. Thirumalai D. From minimal models to real proteins: Time scales for protein-folding kinetics. J Phys (Paris) 1995;5:1457–1467.
  • 27. Uversky VN. Natively unfolded proteins: A point where biology waits for physics. Protein Sci. 2002;11:739–756. doi: 10.1110/ps.4210102.
  • 28. Crick SL, Jayaraman M, Frieden C, Wetzel R, Pappu RV. Fluorescence correlation spectroscopy shows that monomeric polyglutamine molecules form collapsed structures in aqueous solutions. Proc Natl Acad Sci USA. 2006;103:16764–16769. doi: 10.1073/pnas.0608175103.
  • 29. Tran HT, Mao A, Pappu RV. Role of backbone-solvent interactions in determining conformational equilibria of intrinsically disordered proteins. J Am Chem Soc. 2008;130:7380–7392. doi: 10.1021/ja710446s.
  • 30. Uzawa T, et al. Time-resolved small-angle X-ray scattering investigation of the folding dynamics of heme oxygenase: Implication of the scaling relationship for the submillisecond intermediates of protein folding. J Mol Biol. 2006;357:997–1008. doi: 10.1016/j.jmb.2005.12.089.
  • 31. Kallen J, et al. Structure of human cyclophilin and its binding site for cyclosporin A determined by X-ray crystallography and NMR spectroscopy. Nature. 1991;353:276–279. doi: 10.1038/353276a0.
  • 32. Wensley B, et al. Experimental evidence for a frustrated energy landscape in a three-helix-bundle protein family. Nature. 2010;463:685–688. doi: 10.1038/nature08743.
  • 33. Möglich A, Joder K, Kiefhaber T. End-to-end distance distributions and intrachain diffusion constants in unfolded polypeptide chains indicate intramolecular hydrogen bond formation. Proc Natl Acad Sci USA. 2006;103:12394–12399. doi: 10.1073/pnas.0604748103.
  • 34. Flory P. Statistical Mechanics of Chain Molecules. Munich, Vienna, and New York: Carl Hanser Verlag; 1989.
  • 35. Dima R, Thirumalai D. Asymmetry in the shapes of folded and denatured states of proteins. J Phys Chem B. 2004;108:6564–6570.
  • 36. Theodorou DN, Suter UW. Shape of unperturbed linear-polymers: Polypropylene. Macromolecules. 1985;18:1206–1214.
  • 37. Tran HT, Pappu RV. Toward an accurate theoretical framework for describing ensembles for proteins under strongly denaturing conditions. Biophys J. 2006;91:1868–1886. doi: 10.1529/biophysj.106.086264.
  • 38. Hammouda B. SANS from homogeneous polymer mixtures: A unified overview. Adv Polymer Sci. 1993;106:87–133.
  • 39. Zhou H. Polymer models of protein stability, folding, and interactions. Biochemistry. 2004;43:2141–2154. doi: 10.1021/bi036269n.
  • 40. Kohn J, et al. Random-coil behavior and the dimensions of chemically unfolded proteins. Proc Natl Acad Sci USA. 2004;101:12491–12496. doi: 10.1073/pnas.0403643101.
  • 41. Wilkins D, et al. Hydrodynamic radii of native and denatured proteins measured by pulse field gradient NMR techniques. Biochemistry. 1999;38:16424–16431. doi: 10.1021/bi991765q.
  • 42. Goldenberg D. Computational simulation of the statistical properties of unfolded proteins. J Mol Biol. 2003;326:1615–1633. doi: 10.1016/s0022-2836(03)00033-0.
  • 43. Vitalis A, Wang X, Pappu R. Quantitative characterization of intrinsic disorder in polyglutamine: Insights from analysis based on polymer theories. Biophys J. 2007;93:1923–1937. doi: 10.1529/biophysj.107.110080.
  • 44. Fitzkee N, Rose G. Reassessing random-coil statistics in unfolded proteins. Proc Natl Acad Sci USA. 2004;101:12497–12502. doi: 10.1073/pnas.0404236101.
  • 45. Zhou H. Dimensions of denatured protein chains from hydrodynamic data. J Phys Chem B. 2002;106:5769–5775.
  • 46. Damaschun G, Damaschun H, Gast K, Zirwer D. Denatured states of yeast phosphoglycerate kinase. Biochemistry (Moscow) 1998;63:259–275.
  • 47. Tanford C, Kawahara K, Lapanje S. Proteins in 6M guanidine hydrochloride: Demonstration of random coil behavior. J Biol Chem. 1966;241:1921–1923.
  • 48. Uversky V, Gillespie J, Fink A. Why are “natively unfolded” proteins unstructured under physiologic conditions. Proteins. 2000;41:415–427. doi: 10.1002/1097-0134(20001115)41:3<415::aid-prot130>3.0.co;2-7.
  • 49. O’Brien E, Dima R, Brooks B, Thirumalai D. Interactions between hydrophobic and ionic solutes in aqueous guanidinium chloride and urea solutions: Lessons for protein denaturation mechanism. J Am Chem Soc. 2007;129:7346–7353. doi: 10.1021/ja069232+.
  • 50. Wu C, Zhou S. First observation of the molten globule state of a single homopolymer chain. Phys Rev Lett. 1996;77:3053–3055. doi: 10.1103/PhysRevLett.77.3053.
  • 51. Dertinger T, et al. Two-focus fluorescence correlation spectroscopy: A new tool for accurate and absolute diffusion measurements. Chemphyschem. 2007;8:433–443. doi: 10.1002/cphc.200600638.
  • 52. Steinhauser MO. A molecular dynamics study on universal properties of polymer chains in different solvent qualities. Part I. A review of linear chain properties. J Chem Phys. 2005;122:94901–94913. doi: 10.1063/1.1846651.
  • 53. Schellman J. Fifty years of solvent denaturation. Biophys Chem. 2002;96:91–101. doi: 10.1016/s0301-4622(02)00009-1.
  • 54. Nozaki Y, Tanford C. The solubility of amino acids, diglycine, and triglycine in aqueous guanidine hydrochloride solutions. J Biol Chem. 1970;245:1648–1652.
  • 55. Mao AH, Crick SL, Vitalis A, Chicoine CL, Pappu RV. Net charge per residue modulates conformational ensembles of intrinsically disordered proteins. Proc Natl Acad Sci USA. 2010;107:8183–8188. doi: 10.1073/pnas.0911107107.
  • 56. Teufel DP, Johnson CM, Lum JK, Neuweiler H. Backbone-driven collapse in unfolded protein chains. J Mol Biol. 2011;409:250–262. doi: 10.1016/j.jmb.2011.03.066.
  • 57. Gutin A, Abkevich V. Is burst hydrophobic collapse necessary for protein folding? Biochemistry. 1995;34:3066–3076. doi: 10.1021/bi00009a038.
  • 58. Soranno A, et al. Quantifying internal friction in unfolded and intrinsically disordered proteins with single molecule spectroscopy. Proc Natl Acad Sci USA. 2012. doi: 10.1073/pnas.1117368109.
  • 59. Plaxco K, Simons K, Baker D. Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol. 1998;277:985–994. doi: 10.1006/jmbi.1998.1645.
  • 60. Jordan IK, et al. A universal trend of amino acid gain and loss in protein evolution. Nature. 2005;433:633–638. doi: 10.1038/nature03306.
  • 61. Hoffmann A, et al. Mapping protein collapse with single-molecule fluorescence and kinetic synchrotron radiation circular dichroism spectroscopy. Proc Natl Acad Sci USA. 2007;104:105–110. doi: 10.1073/pnas.0604353104.


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
