Significance
Although novel experimental approaches open exciting opportunities for understanding the elusive protein-folding pathways, interpreting experiments in terms of microscopic folding mechanisms often poses a serious challenge. Here, we demonstrate that extensive sets of global and site-specific unfolding data for two small proteins can be quantitatively explained by a simple statistical–mechanical model derived from known native protein structures. Remarkably, differences in folding between two proteins with similar structures are captured without the need to consider sequence-specific interresidue interactions. This finding is significant because it implies that knowledge of the native structure, which implicitly includes sequence information, is sufficient for predicting detailed, site-specific folding mechanisms.
Keywords: protein thermodynamics, site-specific folding, Ising-like models, statistical mechanics
Abstract
Residue-level unfolding of two helix-turn-helix proteins—one naturally occurring and one de novo designed—is reconstructed from multiple sets of site-specific 13C isotopically edited infrared (IR) and circular dichroism (CD) data using Ising-like statistical-mechanical models. Several model variants are parameterized to test the importance of sequence-specific interactions (approximated by Miyazawa–Jernigan statistical potentials), local structural flexibility (derived from the ensemble of NMR structures), interhelical hydrogen bonds, and native contacts separated by intervening disordered regions (through the Wako–Saitô–Muñoz–Eaton scheme, which disallows such configurations). The models are optimized by directly simulating experimental observables: CD ellipticity at 222 nm for model proteins and their fragments and 13C-amide I′ bands for multiple isotopologues of each protein. We find that data can be quantitatively reproduced by the model that allows two interacting segments flanking a disordered loop (double sequence approximation) and incorporates flexibility in the native contact maps, but neither sequence-specific interactions nor hydrogen bonds are required. The near-identical free energy profiles as a function of the global order parameter are consistent with expected similar folding kinetics for nearly identical structures. However, the predicted folding mechanism for the two motifs is different, reflecting the order of local stability. We introduce free energy profiles for “experimental” reaction coordinates—namely, the degree of local folding as sensed by site-specific 13C-edited IR, which highlight folding heterogeneity and contrast its overall, average description with the detailed, local picture.
The original protein-folding problem of structure prediction from amino acid sequences is for many small proteins an accomplished goal (1–3). By contrast, the seemingly much simpler problem of the folding mechanism of a known protein structure remains unsolved, even for the smallest proteins (4). The main difficulty stems from limited experimental information about partially folded and intermediate states of proteins along their folding pathways. Experiments that use multiple spectroscopic probes with complementary or site-specific structure sensitivities can, under favorable circumstances, overcome this obstacle. Noncoincident equilibrium unfolding curves from different probes are a clear sign of noncooperative transitions (5), which, in principle, allow intermediate states to be detected and characterized (6). Muñoz and coworkers pioneered the multiprobe equilibrium approach for studies of “downhill” folding (5, 7, 8), where cooperativity is minimal, but subsequently demonstrated its applicability to other fast-folding proteins (9–13) and extended the analysis to kinetics (14). Following their work, experiments from other laboratories have reported probe-dependent folding equilibria and kinetics in a number of small proteins (15–26). However, because the spectroscopic signals do not directly report on structure and are often subject to interferences from nonstructural effects that lead to ambiguities in their interpretations, the challenge lies in relating the observed experimental data to the underlying structural and energetic states of the protein.
In our laboratory, the telltale nonoverlapping unfolding transitions were observed by site-specific 13C isotopically edited IR spectroscopy (27) in two small helix–turn–helix (hth) proteins (19, 25) (Fig. 1). hth is an important structural motif, often found as an autonomously stable unit in larger helical domains. Such autonomous motifs, of which the P22 subdomain (28) is an example (Fig. 1A), are believed to be important as potential folding nuclei, or “foldons” (16, 29). hth motifs are also excellent models for studying folding of secondary and tertiary structure; the αtα (Fig. 1B) was de novo designed (30) specifically for that purpose. Both motifs have in common that tertiary, interhelical interactions are critical for their folding as evident from the lack of any residual structure in peptide fragments corresponding to the individual helices (19, 25). Conversely, site-specific unfolding data reveal quite distinct patterns of local thermal stabilities: Whereas the P22 subdomain unfolds from its N terminus toward the turn (19), αtα appears to unfold from the turn toward the chain ends (25). Here, to explain these data within the unifying framework of a physical model, we use one of the simplest, but powerful, descriptions of protein folding—the Ising-like statistical–mechanical model.
Fig. 1.
Structure of hth motifs. (A, Upper and B, Upper) Representative structures solved by NMR for the P22 subdomain (A; PDB ID code 1GP8) and de novo designed αtα (B; PDB ID code 1ABZ) are shown here. (A, Lower and B, Lower) Contact maps weighted by the fractional occurrence of each contact in the NMR ensemble (on the scale 0.0–1.0, white to red).
Ising-like models have an impressive track record in reproducing experimental folding data (7, 31–34) and even yield folding mechanisms consistent with all-atom molecular dynamics (MD) simulations (35). However, replicating unfolding data from multiple local sites represents a new test for these simple models. In addition, because the Ising-like models are derived solely from the protein native structure, the even greater challenge is to explain the differences in the local unfolding, as indicated by 13C experimental data (19, 25), of two proteins with nearly identical structures. Finally, because one of the model proteins (the P22 subdomain) is naturally occurring, whereas the other (αtα) is de novo designed, the native-centric assumption, which is usually justified by natural evolution, may not be strictly valid (36). The significance of the modeling is therefore not just in interpreting experimental data, but also in validating the underlying assumptions of the Ising-like models, which have important implications for the general understanding of protein folding. Furthermore, evaluating the importance of additional approximations and specific model parameters for the correct description of the experimental data can highlight their respective roles in determining the observed folding behavior. We evaluate the effects of sequence-specific interactions, which we approximate by Miyazawa–Jernigan (MJ) statistical contact potentials (37), backbone hydrogen bonds (38), and formation of nonlocal contacts between separate native segments. For the latter, we compare two widely used variants of the Ising-like model: the double-sequence approximation with loops (DSA/L) (32–35), which allows such contacts, and Wako–Saitô–Muñoz–Eaton (WSME) (39, 40), where the contacts only form within a contiguous native stretch. Because the hth motifs are among the simplest structures that combine both short- and long-range contacts (Fig. 1), they are ideal models for such tests.
Results and Discussion
Optimization of Ising-Like Models.
The detailed description of the Ising-like models and their parameterization is given in SI Appendix. All are derived from coarse-grained contact maps, with each contact weighted by its relative abundance in the NMR structural set (Fig. 1). This weighting not only deals with the technical problem of choosing which NMR structure to use, but also includes additional information about how well defined the structure of particular regions is. We test the significance of the weighting by considering a contact map for an average structure (SI Appendix, Fig. S1 A and B). The adjustable parameters of the model—the contact interaction energy, entropy cost of ordering a native bond and the Flory characteristic ratio for the DSA/L model—were optimized for each hth motif by simultaneously fitting all experimental data, including the circular dichroism (CD) for the fragments (Fig. 2). The site-specific interaction potentials are approximated by the MJ matrix (37) with a single adjustable scaling factor such that the number of adjustable parameters is the same as for nonspecific interactions. Including hydrogen bonds adds an extra parameter for the hydrogen-bond energy.
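The weighting of contacts by their abundance in the NMR ensemble can be illustrated with a minimal sketch. It assumes one representative point per residue and a simple distance cutoff; the function name, the input layout, the 6 Å cutoff, and the minimum sequence separation are illustrative assumptions, not the definitions used in SI Appendix.

```python
import numpy as np

def weighted_contact_map(coords, cutoff=6.0, min_separation=2):
    """Fraction of ensemble members in which each residue pair is in contact.

    coords: (n_models, n_residues, 3) array, one point per residue
            (hypothetical layout, e.g. C-alpha positions in angstroms).
    cutoff, min_separation: illustrative values, not taken from the paper.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise distances inside each model: (n_models, n_res, n_res).
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    contacts = dist < cutoff
    # Discard pairs that are too close along the sequence.
    idx = np.arange(coords.shape[1])
    contacts &= np.abs(idx[:, None] - idx[None, :]) >= min_separation
    # Averaging booleans over models gives the fractional occurrence.
    return contacts.mean(axis=0)

# Two toy "models" of a three-residue chain: residues 0 and 2 touch in one only.
models = [
    [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]],
]
cm = weighted_contact_map(models)
print(cm[0, 2])
```

A contact present in every model receives weight 1.0, whereas a contact found in only part of the ensemble contributes proportionally less, which is how regional flexibility enters the model.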
Fig. 2.
Modeling experimental unfolding data with different Ising-like model parameterizations. The P22 subdomain (A) and αtα (B) thermal unfolding probed by CD ellipticity and site-specific 13C isotope-edited spectroscopy. (A, Lower and B, Lower) Results of fitting the experimental data with several variants of the Ising-like model. (A, Upper and B, Upper) Fractional populations of the isotopically labeled stretches, derived from the sets of temperature-dependent experimental 13C amide I′ spectra (SI Appendix) by SMSA decomposition (symbols) and best model predictions (solid lines). The color scheme corresponds to the highlighted 13C-labeled regions in the cartoon protein representations. (A, Lower and B, Lower) Mean residue molar ellipticity at 222 nm of each protein (red), a fragment corresponding to the N-terminal helix (blue), and the C-terminal helix (black). (A, i and B, i) DSA/L with a weighted contact map (CM) and a single contact energy term. (A, ii and B, ii) DSA/L with a weighted CM and residue-specific potentials approximated by MJ map. (A, iii and B, iii) DSA/L with a CM for an average structure and a single contact energy term. (A, iv and B, iv) The WSME model with a weighted CM and a single contact energy term.
Additional parameters are necessary for simulation of experimental signals (SI Appendix). The CD ellipticities at 222 nm are calculated by using a standard length-dependent formula for the α-helix and random coil (41) and generally accepted ranges for temperature-dependent baselines (SI Appendix). Including the fragment CD is crucial; otherwise, the models tend to fold the helices independently. To reproduce temperature-dependent 13C amide I′ IR spectra, the Ising-like model is combined with the recently developed Shifted Multivariate Spectra Analysis (SMSA) method (42). Because of the extensive amount of data, we only show fractional folded populations (Fig. 2). The actual experimental spectra and their model predictions are found in SI Appendix, Figs. S2–S5. The reliability of the model can be further verified by inspection of resulting spectral components (SI Appendix, Figs. S2–S5) and by the consistency between estimated temperature-dependent amide I′ frequency shifts (SI Appendix, Table S3) and known shifts of the amide I′ in model compounds (19, 25). The resulting model parameters are also summarized in SI Appendix, Tables S1 and S2.
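A length-dependent helix signal is commonly written as [θ]n = [θ]∞(1 − k/n), where n is the number of residues in a helical segment. The sketch below uses that generic form only to show why the fragment data constrain the model; the constants, the zero coil signal, and the function names are placeholders, and the exact expression and baselines of ref. 41 are not reproduced here.

```python
def helix_ellipticity(n, theta_inf=-40000.0, k=3.0):
    """Per-residue ellipticity of an n-residue helix (placeholder constants)."""
    return theta_inf * (1.0 - k / n)

def mean_residue_ellipticity(helix_lengths, n_total, theta_coil=0.0,
                             theta_inf=-40000.0, k=3.0):
    """Mean residue ellipticity of one chain conformation.

    helix_lengths: lengths of the helical segments present in the conformation.
    n_total: total number of residues; non-helical residues count as coil.
    """
    n_helix = sum(helix_lengths)
    signal = sum(n * helix_ellipticity(n, theta_inf, k) for n in helix_lengths)
    signal += (n_total - n_helix) * theta_coil
    return signal / n_total

# Ten helical residues give a weaker signal when split into two short helices.
one_long = mean_residue_ellipticity([10], 10)
two_short = mean_residue_ellipticity([5, 5], 10)
print(one_long, two_short)
```

In a full calculation this per-conformation signal would be averaged over the Boltzmann-weighted states of the Ising-like model at each temperature.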
Significance of Residue-Specific Interactions.
The most remarkable and quite unexpected result is that just a single energy parameter for all interresidue contacts in the DSA/L Ising-like model (32–35) is sufficient for explaining all of the experimental data for both motifs (Fig. 2 A, i and B, i). By contrast, inclusion of sequence-specific interactions via the MJ matrix (37) produces worse results, although it is not immediately obvious from Fig. 2 A, ii and B, ii, which only compares CD data and fractional populations of native states. However, examination of the underlying component amide I′ IR spectra (SI Appendix, Fig. S3) shows that several are unphysical. Specifically, for the N terminus of the P22 subdomain (SI Appendix, Fig. S3A) and the N terminus of the turn in αtα (SI Appendix, Fig. S3J), the folded component spectra are too intense, which compensates for the folded population being too low on the corresponding segments predicted by the model with MJ potentials. In addition, for the P22 subdomain, other spectral components are suspect as well, notably the unfolded ones for both the N and C termini of the first helix. By contrast, without MJ potentials, both intensities and bandshapes (SI Appendix, Fig. S2) are clearly more in line with expectations. The folded spectrum for the C terminus of helix 1 in αtα still appears too intense (SI Appendix, Fig. S2J), perhaps implying that the folded population is too low, but the intensity is now comparable to that measured for other double-labeled segments (e.g., SI Appendix, Fig. S2 D and L). The difficulty with modeling this particular segment may also stem from the fact that it is essentially disordered, even at the lowest temperature. Conversely, for turn-C, which is another segment with similarly low native population, the component spectra do not get excessively intense, even with MJ potentials.
These results have two important implications. First, residue-specific interactions do not seem to be necessary for inferring the folding mechanism; a single energy parameter common to all interresidue contacts is sufficient to account for all of the data. This result parallels the multiple successful applications of the model of Henry, Eaton, and others, which likewise uses only a single contact energy (7, 31–35), and it is also consistent with the findings of Muñoz and coworkers (43, 44) that sequence variability plays a less significant role in determining folding energy landscapes of natural proteins in comparison with topology and size. Obviously, this result does not mean that the specific amino acid sequence is unimportant: it is responsible for folding and stability of the protein in the first place. It merely suggests that all of the sequence-specific interactions are already incorporated in the folded structure or, strictly speaking, in its contact map (Fig. 1). Second, even in such a case, amino acid-specific parameters, if described correctly, should not make the results worse. That they do means that MJ potentials do not correctly capture the interresidue contact interactions, at least not in the context of the Ising-like model for protein folding, in either the P22 subdomain or αtα.
Dynamics in the Native Structure: Weighted Contact Maps.
A notable result in Fig. 2 is that most of the isotopically labeled segments have <100% folded populations at the start of the unfolding curves (lowest temperatures), but retain >0% at high temperatures. That the protein is never completely folded is an important result of the model and cannot be captured, for example, by a simple chemical mass-action scheme that assumes transitions from fully folded to fully unfolded (19, 25). Partly folded, flexible regions are present in all proteins, but are expected to be abundant, particularly in small proteins and fragments that are frequently used as models for folding studies. Likewise, incomplete unfolding at high temperature is common and evidenced here by the residual α-helical CD at high temperatures, particularly for the P22 subdomain (Fig. 2A).
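For contrast, the two-state mass-action scheme mentioned above can be sketched in a few lines. The sketch assumes a temperature-independent unfolding enthalpy (no heat capacity term); the midpoint and enthalpy values are arbitrary illustrations, not fitted parameters.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)

def two_state_fraction_folded(T, Tm, dH):
    """Folded fraction for a two-state transition.

    dG_unfold(T) = dH * (1 - T / Tm), with dH assumed temperature independent.
    """
    dG_unfold = dH * (1.0 - T / Tm)
    K = math.exp(-dG_unfold / (R * T))  # unfolding equilibrium constant
    return 1.0 / (1.0 + K)

# Arbitrary illustrative parameters: Tm = 320 K, dH = 30 kcal/mol.
for T in (270.0, 320.0, 370.0):
    print(T, round(two_state_fraction_folded(T, 320.0, 30.0), 4))
```

By construction the curve runs from nearly fully folded to nearly fully unfolded, so it cannot represent segments that start below 100% or end above 0% folded population.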
Because our model is derived from weighted contact maps (Fig. 1), it contains information about the relative flexibility of particular regions. The significance of this weighting can be tested by comparing the model derived from the contact map of an average structure (SI Appendix, Fig. S1 A and B). Here, the fit exhibits little change for the P22 subdomain (Fig. 2 A, iii), but is notably worse for αtα (Fig. 2 B, iii). Note that the fragment CD could not be fitted at all. The effect on the P22 subdomain is small, most likely because the most flexible region is on the N terminus, which has few contacts, whereas the most critical interhelical contacts near the turn are well established (SI Appendix, Fig. S1C). It should also be noted that averaging the structure and its associated contact map may be somewhat problematic because it may not lead to physically meaningful results. However, from a purely practical standpoint, weighting the contact maps according to the set of structures at hand was the most straightforward way to represent the experimental structural information and, at least for αtα, is clearly beneficial to the modeling.
Importance of Nonlocal Contacts: The WSME Model.
The popular WSME scheme (39, 40) only considers contacts within a native stretch. This approximation greatly simplifies the enumeration of the partition function (45), which can be done exactly without any restriction on the number of native stretches. However, the neglect of contacts between separate native segments also results in a failure of the model to reproduce experimental data (Fig. 2 A, iv and B, iv). For αtα, where data clearly show that helices must come into contact with the unfolded section in between, this result can be expected. However, the same result for the P22 subdomain demonstrates that the importance of nonlocal contacts is general. Despite the restriction to only two native segments, the DSA/L scheme provides a more realistic representation of the unfolding than an unlimited number of native segments allowing only local interactions. Even additional modifications, such as consideration of two separate entropy parameters for helical and nonhelical parts to compensate for the additional unordered loop entropy term in DSA/L, did not lead to any significant improvement (SI Appendix, Fig. S6 C and D). This finding is consistent with recent results of Henry et al. (35), who show that DSA/L is generally a very good approximation for small proteins with fewer than ∼50 amino acids.
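The difference between the two contact rules can be made concrete with a toy counting sketch. It assumes a state is a tuple of 0/1 flags (1 = native) and a contact is an index pair; the function names and the eight-unit example are hypothetical, and the loop entropy term of DSA/L is deliberately left out.

```python
def native_segments(state):
    """Return (start, end) index pairs of contiguous native runs."""
    segments, start = [], None
    for k, s in enumerate(state):
        if s and start is None:
            start = k
        elif not s and start is not None:
            segments.append((start, k - 1))
            start = None
    if start is not None:
        segments.append((start, len(state) - 1))
    return segments

def wsme_contacts(state, contacts):
    """WSME rule: a contact counts only if everything from i to j is native."""
    return sum(1 for i, j in contacts if all(state[i:j + 1]))

def dsal_contacts(state, contacts, max_segments=2):
    """DSA/L-like rule: at most two native segments, and a contact counts
    whenever both partners are native, even across a disordered loop.
    Returns None for states excluded by the segment limit."""
    if len(native_segments(state)) > max_segments:
        return None
    return sum(1 for i, j in contacts if state[i] and state[j])

# Toy hairpin: two native arms separated by a disordered turn.
contacts = [(0, 2), (5, 7), (0, 7), (1, 6), (2, 5)]
state = (1, 1, 1, 0, 0, 1, 1, 1)
print(wsme_contacts(state, contacts), dsal_contacts(state, contacts))
```

For this state the WSME rule keeps only the two intra-arm contacts, whereas the DSA/L-like rule also counts the three contacts that bridge the disordered turn.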
Hydrogen Bonds.
The role of polypeptide backbone hydrogen bonds vs. specific side-chain interactions in protein folding is still questioned (38). The models above did not explicitly consider interhelical hydrogen bonds, but their success in describing the experimental data suggests that the contribution of hydrogen bonds is not significant. Not surprisingly, when testing this hypothesis directly by including an additional hydrogen-bond energy term, the effect is minimal (SI Appendix, Fig. S6E), and the resulting hydrogen-bond energy parameters are near zero (SI Appendix, Table S1). This result is in agreement with past implementations of Ising-like models (32–35) that do not explicitly consider hydrogen-bond energies, but is somewhat at odds with experimental results, which suggest that protein stability is substantially affected by the backbone hydrogen bonds (46, 47). The likely explanation for this discrepancy is that the hydrogen bonds may be effectively included in the local (i, i + 4) interresidue contacts. As our tests show, when these local contacts are omitted from the energy function, hydrogen bonds become critical for folding (SI Appendix, Fig. S6F and Table S1). For the P22 subdomain, the hydrogen-bond terms give essentially the same results as the local contacts (Fig. 2 A, i); in contrast, the local contacts yield much better fits to the experimental data for αtα (Fig. 2 B, i).
Comparison of the Thermal Unfolding of the Two hth Motifs.
Successfully optimizing a model that explains experimental data for both studied proteins provides the basis for quantitative, detailed comparison of folding. Such comparison of the P22 subdomain and αtα is interesting for several reasons. First, it underlines the roles of the overall topology and sequence-specific interactions (48, 49) in determining the local stability of native structural elements. As demonstrated above, even a simple model based on a coarse-grained representation of native contacts with a single, common interaction energy parameter can capture quite distinct unfolding behavior in very similar structures. Although ideally one would like to fit the data for both proteins with the same set of parameters, such goals are unfeasible for the heavily coarse-grained approach tested here. Generally, the higher the level of coarse-graining, the more specific the parameters become (50). Nevertheless, although the parameter values differ somewhat, consistency between different variants of the model in fitting the sets of data for both proteins is remarkable (SI Appendix, Tables S1 and S2).
Second, because αtα is de novo designed, the neglect of all nonnative interactions by the Ising-like model, which can be rationalized on the basis of molecular evolution, becomes questionable. However, the Ising-like model works just as well for the αtα as it does for the naturally occurring P22 subdomain, and the effects of modifying the model and imposing additional simplifications are the same for both (Fig. 2). This finding implies that native interactions must dominate the stabilization of the partially folded, intermediate structures in both motifs. Conversely, the details of these interactions are very different, as evident from distinct patterns of thermal stability in individual labeled regions of the two motifs. From the heat maps in Fig. 3, the differences in unfolding captured by site-specific experiments are apparent: The P22 subdomain unfolds from the N terminus and is most stable at the helical segments near the turn, compared with αtα, which has little defined structure in the turn region and the highest stability near the centers and toward the termini of the α-helices. It is interesting to note that even in the P22 subdomain the turn itself is predicted to be less stable than the helices, which was missed by analysis with chemical mass-action models (19), although the general order of unfolding of the other segments agreed with the Ising-like model. For αtα, the unfolding curves of the two turn segments could not even be fitted with a two-state model; this result was attributed to these segments being essentially disordered (25), which the present Ising-like model analysis confirms.
Fig. 3.

Residue-by-residue thermal unfolding of two hth motifs. Population folded (from 0.0 blue, to 1.0 red) of individual peptide bonds for P22 subdomain (Left) and αtα (Right) as a function of temperature. Cartoons on the right of each map show the location of the N-terminal (h1) and C-terminal (h2) helices.
Free Energy Profiles: Kinetics and Mechanism of Folding.
Our experimental data only report on equilibrium folding and, consequently, contain no information about how the motifs actually fold in time. However, previous work has established empirical correlations between the degrees of unfolding cooperativity in equilibrium and the free energy barriers and folding kinetics (9, 51). In addition, because the Ising-like model provides a complete statistical–mechanical description of all states of the amino acid chain, the folding kinetics and mechanism can be inferred. The kinetics is often well described as diffusion on a one-dimensional free energy profile (34, 52), calculated as a function of a suitable coordinate that measures the degree of folding (34, 35). Because each residue is by definition native or unfolded in the Ising-like models, the number of native peptide bonds is a natural choice for a reaction coordinate. The free energy surfaces for both motifs (Fig. 4A) are similar, and the only notable difference is the greater variation of the αtα free energy with temperature, reflecting a more pronounced and sharper unfolding transition (Fig. 2). The profiles for both proteins are essentially barrier-less at low temperatures, and only an insignificant barrier to folding appears near the transition midpoints (∼0.4 and ∼0.3 kcal⋅mol−1, respectively). Negligible folding free-energy barriers are consistent with the heterogeneous unfolding process, where effectively a continuum of states with varying degrees of native structure are populated, and suggest very fast folding paralleling experimental data on other similar motifs (29, 53) and general trends expected for helical proteins of this size (44, 54).
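A one-dimensional profile of this kind is obtained by summing the Boltzmann weights of all states that share the same number of native peptide bonds, F(n) = −RT ln Σ exp(−G/RT). The brute-force sketch below illustrates the projection for a toy chain; the free energy function (a single contact energy eps, an entropy cost dS per native bond, a two-segment limit, no loop entropy), the parameter values, and the function name are simplifying assumptions rather than the parameterization in SI Appendix.

```python
import math
from itertools import product

R = 1.987e-3  # gas constant in kcal/(mol K)

def free_energy_profile(n_bonds, contacts, eps, dS, T, max_segments=2):
    """F(n) = -RT ln(sum of weights of states with n native bonds).

    eps: energy per formed contact (kcal/mol, negative = stabilizing).
    dS:  entropy change per native bond (kcal/(mol K), negative = cost).
    """
    z = [0.0] * (n_bonds + 1)
    for state in product((0, 1), repeat=n_bonds):
        # Count native segments as the number of 0 -> 1 starts.
        n_seg = sum(1 for k, s in enumerate(state)
                    if s and (k == 0 or not state[k - 1]))
        if n_seg > max_segments:
            continue
        n_native = sum(state)
        n_contacts = sum(1 for i, j in contacts if state[i] and state[j])
        g = eps * n_contacts - T * dS * n_native
        z[n_native] += math.exp(-g / (R * T))
    return [-R * T * math.log(v) for v in z]

# Toy hairpin with arbitrary parameters.
toy_contacts = [(0, 2), (5, 7), (0, 7), (1, 6), (2, 5)]
for n, f in enumerate(free_energy_profile(8, toy_contacts, -1.0, -0.002, 300.0)):
    print(n, round(f, 2))
```

Exhaustive enumeration is feasible only for very short chains; it is used here purely to make the projection onto the reaction coordinate explicit.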
Fig. 4.
Comparison of the folding mechanism as predicted by the model. (A) Free energy profiles for the P22 subdomain (Left) and αtα (Right) as the function of the number of native peptide bonds, plotted approximately every 14 K from 274 K (blue) to 344 K (yellow). (B) Probability of being folded (0.0 blue to 1.0 red) for each peptide bond at each value of the overall number of native peptide bonds at 270 K and 350 K.
The similarity of the one-dimensional free energy profiles obscures the differences in folding mechanism that would be expected from distinct patterns of local thermodynamic stability, because each value of the reaction coordinate averages many actual microstates. More detail is revealed if the contribution of each individual peptide bond to the given value of the reaction coordinate is plotted, as in Fig. 4B. Viewed together with the free energy profile (Fig. 4A), these plots illustrate the predicted order of structure formation during the folding transition, i.e., diffusion of the population distribution from low to high degree of native structure. It is evident that the folding progresses from the segments with higher stability, which start forming near the top of the free energy barrier, to the least stable parts, which do not reach their native states until the folded minimum or not at all. Specifically, in the P22 subdomain, the first to form would be the helical structure in the vicinity of the turn, whereas in αtα it is near the helix termini. The N-terminal helix in αtα is predicted to start folding before the other helix, which is expected because the fragment experiments show that helix 1 does show some degree of autonomous stability (Fig. 2).
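The per-bond decomposition described above is a conditional probability: the Boltzmann-weighted fraction of states with n native bonds in which bond i is native. A minimal sketch, assuming the same kind of toy free energy (one contact energy eps, an entropy cost dS per native bond, at most two native segments, no loop entropy) and hypothetical parameter values:

```python
import math
from itertools import product

import numpy as np

R = 1.987e-3  # gas constant in kcal/(mol K)

def bond_probability_by_n(n_bonds, contacts, eps, dS, T, max_segments=2):
    """Return P[n, i]: probability that bond i is native given n native bonds."""
    z = np.zeros(n_bonds + 1)
    occupancy = np.zeros((n_bonds + 1, n_bonds))
    for state in product((0, 1), repeat=n_bonds):
        n_seg = sum(1 for k, s in enumerate(state)
                    if s and (k == 0 or not state[k - 1]))
        if n_seg > max_segments:
            continue
        n_native = sum(state)
        n_contacts = sum(1 for i, j in contacts if state[i] and state[j])
        g = eps * n_contacts - T * dS * n_native
        w = math.exp(-g / (R * T))
        z[n_native] += w
        occupancy[n_native] += w * np.array(state)
    # Normalize each row by the partial partition function for that n.
    return occupancy / z[:, None]

# Toy hairpin with arbitrary parameters.
toy_contacts = [(0, 2), (5, 7), (0, 7), (1, 6), (2, 5)]
P = bond_probability_by_n(8, toy_contacts, -1.0, -0.002, 300.0)
print(np.round(P, 2))
```

Each row of the result sums to n, so reading the rows from top to bottom shows which bonds tend to be native as the overall degree of folding increases.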
“Experimental” Reaction Coordinates: Probe-Specific Free Energy Profiles.
Although the number of native peptide bonds is the natural reaction coordinate for Ising-like models, experiments do not directly measure the overall fraction of native or unfolded amide bonds. The CD is sensitive to the amount of α-helix and its length, whereas the 13C edited amide I′ IR spectrum reports on the change in backbone conformation of the labeled segments. The experimental signals are often “built” into the free energy profiles as a dividing surface (14, 34, 52), but a more straightforward alternative is to consider a specific reaction coordinate for each probe. One could then construct a free energy profile as sensed by the particular experimental technique used to measure folding. Comparison of such surfaces with the global one can reveal how closely each probe captures the overall folding and, conversely, how well the overall reaction coordinate reflects unfolding measured by the specific probes. Moreover, the probe-specific profiles can be used to estimate—or analyze—the folding kinetics measured by each particular method.
Fig. 5 displays free energy profiles calculated for such probe-specific reaction coordinates (SI Appendix). The total number of α-helical residues, which mostly accounts for the observed CD ellipticity at 222 nm, gives very similar profiles to those in Fig. 4A. This finding is hardly surprising, because the folded structure is mostly α-helix, and the α-helical content, as measured by CD, should therefore closely follow the overall unfolding. On the other hand, profiles calculated as a function of the folded fraction of each 13C-labeled stretch, as measured by the IR (Fig. 5B), are quite distinct. The less stable regions (e.g., N terminus of the P22 subdomain and turn segments in αtα) show broad free energy minima consistent with their considerable flexibility, which, upon increase in temperature, tend to further broaden and shift toward the lower values of the order parameter. By contrast, the most stable parts (e.g., C terminus of the first helix of the P22 subdomain and N terminus of helix 1 of αtα) have deep, narrow native minima that do not shift with temperature. Therefore, although proteins are often divided into two-state folders and non-two-state folders, both gradual and two-state-like folding scenarios may actually be observed within a single protein, depending on the type and position of the experimental probe used for its detection. Moreover, although Ising-like models assume only two states for each individual amide bond, the two-state assumption is obviously not generally valid, even for stretches as short as two amides; in this case, it would be justified only for the most highly thermodynamically stable segments of each model hth motif.
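A probe-specific profile differs from the global one only in the projection: states are grouped by the number of native bonds inside the labeled stretch rather than in the whole chain. The sketch below assumes a toy free energy (single contact energy eps, entropy cost dS per native bond, two-segment limit, no loop entropy); the stretch positions, parameter values, and function name are hypothetical.

```python
import math
from itertools import product

R = 1.987e-3  # gas constant in kcal/(mol K)

def probe_free_energy(n_bonds, contacts, eps, dS, T, stretch, max_segments=2):
    """Free energy versus the number of native bonds within stretch=(first, last)."""
    first, last = stretch
    z = [0.0] * (last - first + 2)
    for state in product((0, 1), repeat=n_bonds):
        n_seg = sum(1 for k, s in enumerate(state)
                    if s and (k == 0 or not state[k - 1]))
        if n_seg > max_segments:
            continue
        n_contacts = sum(1 for i, j in contacts if state[i] and state[j])
        g = eps * n_contacts - T * dS * sum(state)
        # Project onto the labeled stretch only.
        m = sum(state[first:last + 1])
        z[m] += math.exp(-g / (R * T))
    return [-R * T * math.log(v) for v in z]

# Toy hairpin: compare a stretch at the chain end with one in the turn.
toy_contacts = [(0, 2), (5, 7), (0, 7), (1, 6), (2, 5)]
for stretch in ((0, 1), (3, 4)):
    f = probe_free_energy(8, toy_contacts, -1.0, -0.002, 300.0, stretch)
    print(stretch, [round(v, 2) for v in f])
```

A two-bond stretch has only three possible values of the coordinate, which is consistent with the coarse appearance of profiles for short labeled segments.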
Fig. 5.
Free energy profiles for experimental reaction coordinates. (A) Free energy profiles as a function of the total number of helical residues at approximately every 14 K from 274 K (blue) to 344 K (yellow) for the P22 subdomain (Left) and αtα (Right). (B) The free energy as a function of the folded probability for each individual 13C-labeled stretch at two temperatures. The colors correspond to the label color scheme in Fig. 2. The apparent noise in some of the plots is due to the limited number of configurations for certain values of P, as stretches as short as two peptide bonds are considered.
Concluding Remarks
Site-specific experiments offer valuable indications of the folding cooperativity and of the relative thermodynamic stability of the individual probed structural segments. Only when combined with a microscopic model for protein folding, however, do the details of the underlying conformational states emerge, along with insights into the origins of the observed behavior. Ising-like statistical–mechanical models continue to demonstrate their utility for interpreting protein-folding experimental data. With very few free parameters, these simple models capture local thermodynamic stabilities, as probed by site-specific experiments, and, most notably, their differences between two structurally nearly identical proteins. The latter is particularly remarkable, considering that the Ising-like models are based solely on the coarse-grained representation of the native structure. Although one of the studied proteins is de novo designed, the success of the Ising-like model in reproducing its unfolding suggests that the basic assumptions, namely, the dominance of native interactions, are equally valid. Furthermore, no sequence-specific interactions are needed; subtle differences in the native contacts and their rigidity, as reflected in the weighted contact maps, are sufficient to explain the unfolding stability patterns. Although this finding does not imply that the specific amino acid sequence is unimportant, it suggests that its effects are already implicit in the details of the folded structure. Model comparison also highlights the importance of nonlocal contacts between disjoint native stretches for the correct description of the noncooperative unfolding of the studied proteins. Disallowing such long-range contact formation (WSME model) generally leads to predictions of a more cooperative behavior, at odds with the experimental data.
Finally, the free energy surfaces constructed from the optimized model, although based only on equilibrium experiments, hint at the mechanism of folding. The stability appears to be key, because the more stable native segments are generally predicted to form first, consistent both with experimental studies of directed stability perturbations (49, 55) and with the recent state-of-the-art MD simulations (2).
Materials and Methods
The P22 subdomain [Protein Data Bank (PDB) ID code 1GP8] and de novo designed αtα (PDB ID code 1ABZ) were synthesized by using Fmoc-based solid-phase peptide synthesis techniques on PS-3 and Tribute automated peptide synthesizers (Protein Technologies), respectively, with isotope variants synthesized with amino acids 13C labeled at C=O. CD was performed on a Jasco J-815 spectropolarimeter in a 1-mm path length quartz cuvette and IR on a Bruker Tensor 27 FTIR spectrometer, equipped with a DLaTGS detector. Both CD and IR measurements were conducted in D2O-based buffer solution at pH 7.4 and pH 2.3 (uncorrected) for P22 and αtα, respectively.
Ising-like model partition function derivations for the DSA/L and WSME models generally followed Kubelka et al. (34) and Bruscolini and Pelizzola (45), respectively. Model variants were obtained by altering the energy function to incorporate the MJ potential matrix (37) or interhelical hydrogen bonds, and by defining the contact map either from a single averaged structure or with contacts fractionally weighted according to their occurrence in the NMR ensemble. The CD ellipticity at 222 nm was modeled directly using well-established methods (41). For simulating 13C amide I′ IR data, the Ising-like model was combined with the SMSA method (42).
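To illustrate the distinction between the two model families, the following Python sketch enumerates all microstates of a toy chain by brute force and builds a free energy profile along the number of native residues. It is only a sketch under stated assumptions, not the partition-function derivations of refs. 34 and 45: the contact map, the contact energy `EPS`, the per-residue ordering entropy `DS`, the `LOOP` penalty, and all function names are hypothetical and chosen purely for demonstration. The "wsme" rule counts a contact only when both residues lie in the same native stretch; the "dsa" rule allows at most two native stretches and also counts contacts between them.

```python
import itertools
import math

R = 8.314e-3  # gas constant, kJ/(mol K)


def native_stretches(state):
    """Return (start, end) index pairs (inclusive) for each run of native residues."""
    runs, start = [], None
    for i, m in enumerate(state):
        if m and start is None:
            start = i
        elif not m and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(state) - 1))
    return runs


def state_free_energy(state, contacts, eps, dS, T, rule, loop_penalty=0.0):
    """Free energy (kJ/mol) of one microstate, or None if the rule forbids it."""
    if rule not in ("wsme", "dsa"):
        raise ValueError("rule must be 'wsme' or 'dsa'")
    runs = native_stretches(state)
    if rule == "dsa" and len(runs) > 2:
        return None  # double sequence approximation: at most two native stretches
    run_of = {k: r for r, (a, b) in enumerate(runs) for k in range(a, b + 1)}
    g = -T * dS * sum(state)  # entropic cost of ordering residues (dS < 0)
    bridged = False
    for (i, j), w in contacts.items():
        if i not in run_of or j not in run_of:
            continue
        if run_of[i] == run_of[j]:
            g += eps * w  # contact inside one contiguous native stretch
        elif rule == "dsa":
            g += eps * w  # contact across a disordered loop (toy DSA/L rule)
            bridged = True
    return g + (loop_penalty if bridged else 0.0)


def profile(n, contacts, eps, dS, T, rule, loop_penalty=0.0):
    """Return {n_native: -RT ln Z(n_native)} in kJ/mol by exhaustive enumeration."""
    z = {}
    for state in itertools.product((0, 1), repeat=n):
        g = state_free_energy(state, contacts, eps, dS, T, rule, loop_penalty)
        if g is None:
            continue
        k = sum(state)
        z[k] = z.get(k, 0.0) + math.exp(-g / (R * T))
    return {k: -R * T * math.log(v) for k, v in sorted(z.items())}


# Hypothetical 12-residue helix-turn-helix: helix A = 0..4, turn = 5..6, helix B = 7..11.
# Fractional weights mimic contacts present in only part of an NMR ensemble.
N = 12
CONTACTS = {
    (0, 3): 1.0, (1, 4): 1.0, (0, 4): 1.0,      # within helix A
    (7, 10): 1.0, (8, 11): 1.0, (7, 11): 1.0,   # within helix B
    (1, 10): 1.0, (2, 9): 0.5, (4, 7): 0.8,     # interhelical
}
EPS, DS, T, LOOP = -2.0, -0.010, 300.0, 1.5  # kJ/mol, kJ/(mol K), K, kJ/mol

wsme = profile(N, CONTACTS, EPS, DS, T, "wsme")
dsa = profile(N, CONTACTS, EPS, DS, T, "dsa", loop_penalty=LOOP)
for k in range(N + 1):
    print(f"{k:2d}  WSME {wsme[k]:7.2f}  DSA {dsa[k]:7.2f}")
```

Exhaustive enumeration scales as 2^N and is practical only for toy chains like this one, so it serves here only to make the two contact rules concrete.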
Acknowledgments
We thank Milan Balaz for the use of circular dichroism and high performance liquid chromatography instruments and William A. Eaton for helpful discussions. This work was supported by National Science Foundation CAREER Grant 0846140.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1506309112/-/DCSupplemental.
References
- 1. Freddolino PL, Harrison CB, Liu Y, Schulten K. Challenges in protein folding simulations: Timescale, representation, and analysis. Nat Phys. 2010;6(10):751–758. doi: 10.1038/nphys1713.
- 2. Lindorff-Larsen K, Piana S, Dror RO, Shaw DE. How fast-folding proteins fold. Science. 2011;334(6055):517–520. doi: 10.1126/science.1208351.
- 3. Adhikari AN, Freed KF, Sosnick TR. De novo prediction of protein folding pathways and structure using the principle of sequential stabilization. Proc Natl Acad Sci USA. 2012;109(43):17442–17447. doi: 10.1073/pnas.1209000109.
- 4. Skinner JJ, et al. Benchmarking all-atom simulations using hydrogen exchange. Proc Natl Acad Sci USA. 2014;111(45):15975–15980. doi: 10.1073/pnas.1404213111.
- 5. Muñoz V. Thermodynamics and kinetics of downhill protein folding investigated with a simple statistical mechanical model. Int J Quantum Chem. 2002;90(4-5):1522–1528.
- 6. Eaton WA. Searching for “downhill scenarios” in protein folding. Proc Natl Acad Sci USA. 1999;96(11):5897–5899. doi: 10.1073/pnas.96.11.5897.
- 7. Garcia-Mira MM, Sadqi M, Fischer N, Sanchez-Ruiz JM, Muñoz V. Experimental identification of downhill protein folding. Science. 2002;298(5601):2191–2195. doi: 10.1126/science.1077809.
- 8. Sadqi M, Fushman D, Muñoz V. Atom-by-atom analysis of global downhill protein folding. Nature. 2006;442(7100):317–321. doi: 10.1038/nature04859.
- 9. Naganathan AN, Doshi U, Muñoz V. Protein folding kinetics: Barrier effects in chemical and thermal denaturation experiments. J Am Chem Soc. 2007;129(17):5673–5682. doi: 10.1021/ja0689740.
- 10. Fung A, Li P, Godoy-Ruiz R, Sanchez-Ruiz JM, Muñoz V. Expanding the realm of ultrafast protein folding: gpW, a midsize natural single-domain with α+β topology that folds downhill. J Am Chem Soc. 2008;130(23):7489–7495. doi: 10.1021/ja801401a.
- 11. Naganathan AN, Li P, Perez-Jimenez R, Sanchez-Ruiz JM, Muñoz V. Navigating the downhill protein folding regime via structural homologues. J Am Chem Soc. 2010;132(32):11183–11190. doi: 10.1021/ja103612q.
- 12. Naganathan AN, Muñoz V. Thermodynamics of downhill folding: Multi-probe analysis of PDD, a protein that folds over a marginal free energy barrier. J Phys Chem B. 2014;118(30):8982–8994. doi: 10.1021/jp504261g.
- 13. Sborgi L, et al. Interaction networks in protein folding via atomic-resolution experiments and long-time-scale molecular dynamics simulations. J Am Chem Soc. 2015;137(20):6506–6516. doi: 10.1021/jacs.5b02324.
- 14. Li P, Oliva FY, Naganathan AN, Muñoz V. Dynamics of one-state downhill protein folding. Proc Natl Acad Sci USA. 2009;106(1):103–108. doi: 10.1073/pnas.0802986106.
- 15. Yang WY, Pitera JW, Swope WC, Gruebele M. Heterogeneous folding of the trpzip hairpin: Full atom simulation and experiment. J Mol Biol. 2004;336(1):241–251. doi: 10.1016/j.jmb.2003.11.033.
- 16. Maity H, Maity M, Krishna MMG, Mayne L, Englander SW. Protein folding: The stepwise assembly of foldon units. Proc Natl Acad Sci USA. 2005;102(13):4741–4746. doi: 10.1073/pnas.0501043102.
- 17. Ma H, Gruebele M. Kinetics are probe-dependent during downhill folding of an engineered lambda6-85 protein. Proc Natl Acad Sci USA. 2005;102(7):2283–2287. doi: 10.1073/pnas.0409270102.
- 18. Hauser K, Krejtschi C, Huang R, Wu L, Keiderling TA. Site-specific relaxation kinetics of a tryptophan zipper hairpin peptide using temperature-jump IR spectroscopy and isotopic labeling. J Am Chem Soc. 2008;130(10):2984–2992. doi: 10.1021/ja074215l.
- 19. Amunson KE, Ackels L, Kubelka J. Site-specific unfolding thermodynamics of a helix-turn-helix protein. J Am Chem Soc. 2008;130(26):8146–8147. doi: 10.1021/ja802185e.
- 20. Liu F, Gao YG, Gruebele M. A survey of λ repressor fragments from two-state to downhill folding. J Mol Biol. 2010;397(3):789–798. doi: 10.1016/j.jmb.2010.01.071.
- 21. Nagarajan S, et al. Differential ordering of the protein backbone and side chains during protein folding revealed by site-specific recombinant infrared probes. J Am Chem Soc. 2011;133(50):20335–20340. doi: 10.1021/ja2071362.
- 22. Jones KC, Peng CS, Tokmakoff A. Folding of a heterogeneous β-hairpin peptide from temperature-jump 2D IR spectroscopy. Proc Natl Acad Sci USA. 2013;110(8):2828–2833. doi: 10.1073/pnas.1211968110.
- 23. Kishore M, Krishnamoorthy G, Udgaonkar JB. Critical evaluation of the two-state model describing the equilibrium unfolding of the PI3K SH3 domain by time-resolved fluorescence resonance energy transfer. Biochemistry. 2013;52(52):9482–9496. doi: 10.1021/bi401337k.
- 24. Walters BT, Mayne L, Hinshaw JR, Sosnick TR, Englander SW. Folding of a large protein at high structural resolution. Proc Natl Acad Sci USA. 2013;110(47):18898–18903. doi: 10.1073/pnas.1319482110.
- 25. Kubelka GS, Kubelka J. Site-specific thermodynamic stability and unfolding of a de novo designed protein structural motif mapped by 13C isotopically edited IR spectroscopy. J Am Chem Soc. 2014;136(16):6037–6048. doi: 10.1021/ja500918k.
- 26. Davis CM, Cooper AK, Dyer RB. Fast helix formation in the B domain of protein A revealed by site-specific infrared probes. Biochemistry. 2015;54(9):1758–1766. doi: 10.1021/acs.biochem.5b00037.
- 27. Decatur SM. Elucidation of residue-level structure and dynamics of polypeptides via isotope-edited infrared spectroscopy. Acc Chem Res. 2006;39(3):169–175. doi: 10.1021/ar050135f.
- 28. Sun Y, et al. Structure of the coat protein-binding domain of the scaffolding protein from a double-stranded DNA virus. J Mol Biol. 2000;297(5):1195–1202. doi: 10.1006/jmbi.2000.3620.
- 29. Religa TL, et al. The helix-turn-helix motif as an ultrafast independently folding domain: the pathway of folding of Engrailed homeodomain. Proc Natl Acad Sci USA. 2007;104(22):9272–9277. doi: 10.1073/pnas.0703434104.
- 30. Fezoui Y, Connolly PJ, Osterhout JJ. Solution structure of alpha t alpha, a helical hairpin peptide of de novo design. Protein Sci. 1997;6(9):1869–1877. doi: 10.1002/pro.5560060907.
- 31. Muñoz V, Thompson PA, Hofrichter J, Eaton WA. Folding dynamics and mechanism of beta-hairpin formation. Nature. 1997;390(6656):196–199. doi: 10.1038/36626.
- 32. Henry ER, Eaton WA. Combinatorial modeling of protein folding kinetics: Free energy profiles and rates. Chem Phys. 2004;307(2-3):163–185.
- 33. Cellmer T, Henry ER, Hofrichter J, Eaton WA. Measuring internal friction of an ultrafast-folding protein. Proc Natl Acad Sci USA. 2008;105(47):18320–18325. doi: 10.1073/pnas.0806154105.
- 34. Kubelka J, Henry ER, Cellmer T, Hofrichter J, Eaton WA. Chemical, physical, and theoretical kinetics of an ultrafast folding protein. Proc Natl Acad Sci USA. 2008;105(48):18655–18662. doi: 10.1073/pnas.0808600105.
- 35. Henry ER, Best RB, Eaton WA. Comparing a simple theoretical model for protein folding with all-atom molecular dynamics simulations. Proc Natl Acad Sci USA. 2013;110(44):17880–17885. doi: 10.1073/pnas.1317105110.
- 36. Best RB, Hummer G, Eaton WA. Native contacts determine protein folding mechanisms in atomistic simulations. Proc Natl Acad Sci USA. 2013;110(44):17874–17879. doi: 10.1073/pnas.1311599110.
- 37. Miyazawa S, Jernigan RL. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J Mol Biol. 1996;256(3):623–644. doi: 10.1006/jmbi.1996.0114.
- 38. Rose GD, Fleming PJ, Banavar JR, Maritan A. A backbone-based theory of protein folding. Proc Natl Acad Sci USA. 2006;103(45):16623–16633. doi: 10.1073/pnas.0606843103.
- 39. Wako H, Saitô N. Statistical mechanical theory of the protein conformation. II. Folding pathway for protein. J Phys Soc Jpn. 1978;44(6):1939–1945.
- 40. Muñoz V, Eaton WA. A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc Natl Acad Sci USA. 1999;96(20):11311–11316. doi: 10.1073/pnas.96.20.11311.
- 41. Scholtz JM, Qian H, York EJ, Stewart JM, Baldwin RL. Parameters of helix-coil transition theory for alanine-based peptides of varying chain lengths in water. Biopolymers. 1991;31(13):1463–1470. doi: 10.1002/bip.360311304.
- 42. Kubelka J. Multivariate analysis of spectral data with frequency shifts: Application to temperature dependent infrared spectra of peptides and proteins. Anal Chem. 2013;85(20):9588–9595. doi: 10.1021/ac402083p.
- 43. De Sancho D, Doshi U, Muñoz V. Protein folding rates and stability: How much is there beyond size? J Am Chem Soc. 2009;131(6):2074–2075. doi: 10.1021/ja808843h.
- 44. De Sancho D, Muñoz V. Integrated prediction of protein folding and unfolding rates from only size and structural class. Phys Chem Chem Phys. 2011;13(38):17030–17043. doi: 10.1039/c1cp20402e.
- 45. Bruscolini P, Pelizzola A. Exact solution of the Muñoz-Eaton model for protein folding. Phys Rev Lett. 2002;88(25 Pt 1):258101. doi: 10.1103/PhysRevLett.88.258101.
- 46. Bunagan MR, Gao J, Kelly JW, Gai F. Probing the folding transition state structure of the villin headpiece subdomain via side chain and backbone mutagenesis. J Am Chem Soc. 2009;131(21):7470–7476. doi: 10.1021/ja901860f.
- 47. Culik RM, Jo H, DeGrado WF, Gai F. Using thioamides to site-specifically interrogate the dynamics of hydrogen bond formation in β-sheet folding. J Am Chem Soc. 2012;134(19):8026–8029. doi: 10.1021/ja301681v.
- 48. Baker D. A surprising simplicity to protein folding. Nature. 2000;405(6782):39–42. doi: 10.1038/35011000.
- 49. Zarrine-Afsar A, Larson SM, Davidson AR. The family feud: Do proteins with similar structures fold via the same pathway? Curr Opin Struct Biol. 2005;15(1):42–49. doi: 10.1016/j.sbi.2005.01.011.
- 50. Zhang Z, Pfaendtner J, Grafmüller A, Voth GA. Defining coarse-grained representations of large biomolecules and biomolecular complexes from elastic network models. Biophys J. 2009;97(8):2327–2337. doi: 10.1016/j.bpj.2009.08.007.
- 51. Naganathan AN, Sanchez-Ruiz JM, Muñoz V. Direct measurement of barrier heights in protein folding. J Am Chem Soc. 2005;127(51):17970–17971. doi: 10.1021/ja055996y.
- 52. Yang WY, Gruebele M. Folding at the speed limit. Nature. 2003;423(6936):193–197. doi: 10.1038/nature01609.
- 53. Du D, Gai F. Understanding the folding mechanism of an α-helical hairpin. Biochemistry. 2006;45(44):13131–13139. doi: 10.1021/bi0615745.
- 54. Naganathan AN, Muñoz V. Scaling of folding times with protein size. J Am Chem Soc. 2005;127(2):480–481. doi: 10.1021/ja044449u.
- 55. McCallister EL, Alm E, Baker D. Critical role of beta-hairpin formation in protein G folding. Nat Struct Biol. 2000;7(8):669–673. doi: 10.1038/77971.