Abstract
Advances in simulation techniques and computing hardware have created a substantial overlap between the timescales accessible to atomic-level simulations and those on which the fastest-folding proteins fold. Here we demonstrate, using simulations of four variants of the human villin headpiece, how simulations of spontaneous folding and unfolding can provide direct access to thermodynamic and kinetic quantities such as folding rates, free energies, folding enthalpies, heat capacities, Φ-values, and temperature-jump relaxation profiles. Quantitative comparison of simulation results with various forms of experimental data probing different aspects of the folding process enables a robust assessment of the accuracy of the calculations while providing a detailed structural interpretation of the experimental observations. In the example studied here, the analysis of folding rates, Φ-values, and folding pathways supports the notion that a norleucine double mutant of villin folds five times faster than the wild-type sequence, but along a slightly different pathway. This work showcases how computer simulation has developed into a mature tool for the quantitative study of protein folding and dynamics that can provide a valuable complement to experimental techniques.
Keywords: Amber ff99SB*-ILDN, enthalpy, heat capacity, pre-exponential factor, transition path time
Proteins are synthesized in the cell or in vitro as unstructured polypeptide chains that, in most cases, self-assemble into their functionally active three-dimensional shapes. This process, called protein folding, occurs on a broad range of timescales, from microseconds to seconds or longer. From a purely physical-chemical perspective, it should in principle be possible to characterize the folding mechanism of a given protein at atomistic resolution, and to reconstruct its free-energy landscape from its primary sequence alone, through molecular dynamics (MD) simulations based on elementary physical principles. This direct approach has rarely been pursued because even the simplest systems representing a protein immersed in water consist of several thousand atoms, and simulating their behavior on the timescales typical of protein folding is computationally extremely demanding. The discovery and design of fast-folding proteins (1) significantly narrowed the timescale gap between simulations and experiments, making such simulations feasible, at least for the fastest-folding proteins.
The C-terminal fragment of the villin headpiece [referred to in the remainder of this paper simply as “villin” (2)], one of the fastest-folding protein domains known (3), has proven to be an excellent target for folding simulations with physics-based force fields and an atomistically detailed representation of both the solute and the surrounding solvent (4–7). Until recently, the length of such simulations was limited to a few microseconds, a timescale sufficient to capture at best a single folding event (8, 9). Given this limitation, it has been difficult to connect the data produced by short, non-equilibrium simulations directly to experimental observations, unless sufficient statistics were generated to allow the construction of coarse-grained kinetic models that approximate the underlying folding dynamics (10, 11). While these models have proven useful for obtaining certain insights into the folding mechanism, it has been difficult to produce consistent predictions of the thermodynamics of villin folding (10–13), and where comparison has been possible between kinetic models built from multiple short simulations and independent, long equilibrium MD simulations (14), substantial differences have been observed in the folding free energies.
Recent advances in computer hardware have, however, extended the timescale accessible to simulation up to the millisecond (15), thus creating a broad overlap with the microsecond timescale characteristic of fast-folding proteins such as villin, and allowing for the direct calculation of equilibrium thermodynamic and kinetic properties from the simulation data (16, 17). Substantial improvements have also been made in the molecular mechanics force fields used in MD simulations (14, 18–20).
Taking advantage of improvements in both simulation speed and force fields, we have employed equilibrium MD simulations to study the folding kinetics and thermodynamics of several variants of villin. Some researchers have suggested that temperature-jump experiments may underestimate the folding time of villin (4, 5) and may be more sensitive to processes other than protein folding. It should be noted, however, that some of these theoretical estimates were based on models that combine the results of short, individual non-equilibrium trajectories, thereby introducing additional sources of uncertainty into the quantitative comparison between simulation results and experimental observations. The equilibrium simulations presented here overcome this difficulty: at least 30 folding and unfolding events are observed in each trajectory, making it possible to compute thermodynamic and kinetic quantities directly, without building approximate models of the system. Further, we compared simulations of different variants, which allows Φ-values to be calculated directly in a manner analogous to experiments (17, 21).
Most of the results presented here are in good agreement with previous experimental findings, with the notable exception of the heat capacity for folding, which appears to be smaller than the value extracted from calorimetric data. We find that the double norleucine (Nle/Nle) mutant (3) folds approximately five times faster than the wild-type protein, thus supporting the original interpretation of the experimental data (3). In agreement with our previous observations (14), the results reported here also indicate that both the number of helical residues and the Trp side-chain environment are sensitive to the folding/unfolding process, supporting the notion that experiments probing these quantities, such as infrared (IR)- and fluorescence-detected temperature-jump experiments, may be used to determine folding and unfolding rates (22).
Results and Discussion
Equilibrium Reversible Folding Simulation of the Villin Headpiece C-terminal Fragment.
We first assessed the transferability of the Amber ff99SB*-ILDN (18, 19, 23) force field across different protein classes (SI Text), found it to be reasonable, and then used this force field to investigate the kinetics and thermodynamics of villin folding. In particular, we performed equilibrium MD simulations, at multiple temperatures, of the human villin C-terminal fragment (24), the Nle/Nle double mutant (3), and variants in which we introduced the F10L mutation into either the wild-type protein or the Nle/Nle variant. Each simulation was run for at least 300 μs and contained between 30 and 150 folding and unfolding events (Table 1).
Table 1.
Folding kinetics and thermodynamics from equilibrium MD simulations of wild-type (WT) villin headpiece C-terminal fragment and three of its variants
| Variant | Tsim (K) | L (μs) | n | ΔGf | ΔHf (MD) | ΔHf (VH) | ΔCv (MD) | ΔCv (ΔΔH) | τf (μs) | ⟨τTP⟩ (μs) | k0 (μs-1) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| WT (HP-35) | 345 | 398 | 30 | 0.8(2) | −15.1(4) |  | 0.2(2) |  | 19(5) | 0.5(1) | 0.65 |
| WT (HP-35) | 360 | 319 | 31 | 1.6(2) | −18(3) | −19(7) | 0.1(2) | 0.2(2) | 16(4) | 0.24(5) | 1.49 |
| WT-F10L | 345 | 371 | 30 | 1.6(3) | −10(2) |  | 0.4(2) |  | 24(4) | 0.39(7) | 0.90 |
| Nle/Nle | 360 | 305 | 61 | −0.6(2) | −16(3) |  | 0.1(1) |  | 3.2(6) | 0.19(2) | 1.05 |
| Nle/Nle | 370 | 395 | 150 | 0.0(1) | −18.2(8) | −22(8) | 0.07(2) | 0.1(2) | 2.3(2) | 0.15(1) | 1.05 |
| Nle/Nle | 380 | 301 | 140 | 0.7(1) | −21.2(9) | −26(5) | 0.0(4) | 0.1(1) | 3.0(4) | 0.12(1) | 2.33 |
| Nle/Nle-F10L | 360 | 301 | 110 | 0.4(1) | −14.7(8) |  | 0.2(1) |  | 3.5(5) | 0.21(2) | 1.00 |
| Nle/Nle-F10L | 370 | 300 | 130 | 0.8(1) | −16(1) | −15(5) | 0.3(1) | 0.1(1) | 2.9(3) | 0.18(1) | 1.10 |
The temperature of each MD simulation (Tsim) is reported together with the total length (L) and the total number of observed folding and unfolding events (n). The trajectories have been partitioned into folded and unfolded segments using a transition-based assignment (14, 26). The folding free energy (ΔGf, kcal mol-1) is calculated from the ratio of the folded and unfolded fractions. Folding enthalpies (ΔHf, kcal mol-1) were calculated either from the folding free energies at different temperatures using the van’t Hoff equation (VH) or as ΔH = ΔU + VΔP, where ΔU is the difference in average force-field energy between the folded and unfolded states (MD). The heat capacities (ΔCv, kcal mol-1 K-1) were calculated either from the difference in the fluctuations of the force-field energy between the folded and unfolded states (MD) or from the temperature dependence of the folding enthalpy (ΔΔH). The folding time (τf) is calculated as the average waiting time in the unfolded state. The pre-exponential factor for folding (k0) is estimated from the folding time and the mean transition path time (⟨τTP⟩) using Kramers’ theory (40).
The simulation trajectories were analyzed using the stable-state approximation (25) and partitioned into folded and unfolded states through a “transition-based assignment” (26) of the time series of the Cα-RMSD from the experimental native structure (14), in which one RMSD cutoff defines the folded state and a second, larger cutoff defines the unfolded state. We previously demonstrated that this approach is robust (14); it also leads to a natural definition of transition paths as the portions of the trajectory in which the system is transitioning between the two cutoffs.
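A minimal Python sketch of a two-cutoff, transition-based assignment of an RMSD time series, together with a simple estimate of the mean waiting time in the unfolded state (time spent unfolded divided by the number of folding events) and of individual transition path durations. The function names, the "F"/"U" labels, and the rule that frames between the cutoffs keep the label of the last state visited are illustrative assumptions, not the exact implementation of (14, 26).

```python
from typing import List, Optional, Tuple

State = Optional[str]  # "F", "U", or None before the first core region is reached


def transition_based_assignment(
    rmsd: List[float], folded_cut: float, unfolded_cut: float
) -> Tuple[List[State], List[Tuple[str, int, int]]]:
    """Assign frames to folded/unfolded states using two RMSD cutoffs.

    Frames with RMSD < folded_cut lie in the folded core and frames with
    RMSD > unfolded_cut in the unfolded core; frames in between keep the label
    of the last core visited. Each transition path is returned as
    (direction, last frame in the old core, first frame in the new core).
    """
    if folded_cut >= unfolded_cut:
        raise ValueError("folded_cut must be smaller than unfolded_cut")
    states: List[State] = []
    paths: List[Tuple[str, int, int]] = []
    current: State = None
    last_in_core = -1
    for i, r in enumerate(rmsd):
        core: State
        if r < folded_cut:
            core = "F"
        elif r > unfolded_cut:
            core = "U"
        else:
            core = None  # transition region: keep the previous label
        if core is not None:
            if current is not None and core != current:
                direction = "folding" if core == "F" else "unfolding"
                paths.append((direction, last_in_core, i))
            current, last_in_core = core, i
        states.append(current)
    return states, paths


def mean_folding_time(states: List[State], paths, dt: float) -> float:
    """Time spent in the unfolded state divided by the number of folding events."""
    n_fold = sum(1 for direction, _, _ in paths if direction == "folding")
    t_unfolded = dt * sum(1 for s in states if s == "U")
    return t_unfolded / n_fold if n_fold else float("inf")


def transition_path_times(paths, dt: float) -> List[float]:
    """Duration of each transition path, from leaving one core to entering the other."""
    return [dt * (end - start) for _, start, end in paths]
```

With a real trajectory, `rmsd` would be the Cα-RMSD per saved frame and `dt` the saving interval; the cutoff values themselves are not specified here and would have to be chosen as in (14).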
Calculation of Thermodynamic Properties.
Partitioning the trajectory into folded and unfolded states allows direct calculation of the folding free energy from the ratio of the populations of the two states. The calculated (14) melting temperatures of the wild type and the Nle/Nle double mutant (325 K and 370 K, respectively) are in reasonably good agreement with the experimental values of 342 K (27) and 361 K (3), and remarkably consistent with the value of 320 K estimated for wild-type villin in an independent metadynamics simulation with the same force field (28). At 360 K we calculate that the Nle/Nle double mutant is 2.2 kcal mol-1 more stable than wild-type villin, compared to the experimental result of 1 kcal mol-1 (3). Also, the F10L mutation destabilizes the Nle/Nle double mutant by 1 kcal mol-1 and wild-type villin by 0.8 kcal mol-1, in excellent agreement with the experimentally measured value of 1 kcal mol-1 for the same mutation in the wild-type sequence at 340 K (29). We conclude that the simulations satisfactorily reproduce both the absolute and relative stabilities of these three proteins.
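The population-based free energy can be sketched as follows, assuming per-frame "F"/"U" labels from a two-state assignment and the sign convention ΔGf = −RT ln(pF/pU), which matches the signs in Table 1 (positive values favor the unfolded state). The helper names are hypothetical.

```python
import math
from typing import Optional, Sequence

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1


def folding_free_energy(states: Sequence[Optional[str]], temperature: float) -> float:
    """Delta G_f = -RT ln(p_F / p_U) from per-frame labels; other labels are ignored."""
    n_f = sum(1 for s in states if s == "F")
    n_u = sum(1 for s in states if s == "U")
    if n_f == 0 or n_u == 0:
        raise ValueError("both states must be sampled to estimate Delta G_f")
    return -R_KCAL * temperature * math.log(n_f / n_u)


def stability_change(dg_mutant: float, dg_reference: float) -> float:
    """Delta Delta G = Delta G_f(mutant) - Delta G_f(reference); positive means destabilizing."""
    return dg_mutant - dg_reference
```

With the Table 1 entries, `stability_change(1.6, 0.8)` gives the 0.8 kcal mol-1 destabilization of WT by F10L at 345 K, and `stability_change(-0.6, 1.6)` gives the −2.2 kcal mol-1 difference between the Nle/Nle mutant and the wild type at 360 K.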
The folding enthalpy was calculated from the difference in internal energy between the folded and unfolded states (Table 1). Similarly, the folding heat capacity was calculated from the differences in the fluctuations of the potential energy in the two states. These values can be compared with the folding enthalpies and heat capacities obtained from the temperature dependence of the folding free energy and of the folding enthalpy, respectively. Both methods result in similar folding enthalpies, with magnitudes for the three proteins ranging between 14 and 26 kcal mol-1 (Table 1). These values are consistent with those observed in previous simulations performed with different force fields (14), but generally appear to be slightly smaller than the value of 29 kcal mol-1 determined from calorimetry for wild-type villin (30) and the approximately 25 kcal mol-1 estimated from a van’t Hoff analysis for the Nle/Nle double mutant (3).
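The van’t Hoff route can be sketched as a finite-difference form of the Gibbs–Helmholtz relation, d(ΔG/T)/d(1/T) = ΔH, applied to folding free energies at two neighboring temperatures. Treating ΔH as constant over the interval is a simplifying assumption of this sketch, and the function name is hypothetical.

```python
def van_t_hoff_enthalpy(dg1: float, t1: float, dg2: float, t2: float) -> float:
    """Delta H from Delta G at two temperatures via d(Delta G/T)/d(1/T) = Delta H.

    Energies in kcal/mol, temperatures in K; assumes Delta H is approximately
    constant between t1 and t2.
    """
    if t1 == t2:
        raise ValueError("temperatures must differ")
    return (dg2 / t2 - dg1 / t1) / (1.0 / t2 - 1.0 / t1)
```

Using the Nle/Nle free energies in Table 1, the 360/370 K pair gives about −22.2 kcal mol-1 and the 370/380 K pair about −25.9 kcal mol-1, in line with the tabulated VH values of −22(8) and −26(5).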
As is the case for the folding enthalpies, the folding heat capacities calculated from the change of the enthalpy with temperature and from the fluctuations of the force-field energy agree within a relatively large error (Table 1). However, the calculated values, ranging between 0 and 0.2 kcal mol-1 K-1, are smaller than the corresponding ΔCp value obtained from calorimetry experiments on the wild-type protein [0.457 kcal mol-1 K-1 (30)]. Similarly small heat capacities have also been observed in simulations performed with a different force field (16). This discrepancy is most likely due to deficiencies in the force-field representation of the system, and may be related to the weak temperature dependence of the stability of helical structures (9). This weak temperature dependence appears to be a general characteristic of the force fields commonly used for protein simulations (19, 20), suggesting that some aspects of protein folding, such as helix formation, involve many-body effects that may not be fully reproduced by simple pairwise-additive force fields. The importance of such effects is likely to be system- and size-dependent: while it appears to be possible to obtain good quantitative agreement with experiment in folding simulations of small proteins with current force fields, folding simulations of larger proteins may require functional forms beyond those in current use.
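The fluctuation route can be sketched with the canonical-ensemble formula Cv = (⟨E²⟩ − ⟨E⟩²)/(kB T²), applied separately to potential energies sampled in each state. Using only the force-field (potential) energy captures the configurational part of Cv; in classical mechanics the kinetic contribution is the same in both states and cancels in the difference. Returning Cv(unfolded) − Cv(folded), which is positive when the unfolded state has the larger heat capacity as in calorimetry, is our assumed sign convention; the function names are hypothetical.

```python
import statistics
from typing import Sequence

R_KCAL = 0.0019872  # Boltzmann constant per mole, kcal mol^-1 K^-1


def heat_capacity(energies: Sequence[float], temperature: float) -> float:
    """Canonical fluctuation formula C_v = (<E^2> - <E>^2) / (k_B T^2), energies in kcal/mol."""
    return statistics.pvariance(energies) / (R_KCAL * temperature ** 2)


def unfolding_heat_capacity(
    e_folded: Sequence[float], e_unfolded: Sequence[float], temperature: float
) -> float:
    """C_v(unfolded) - C_v(folded) from potential energies sampled in each state."""
    return heat_capacity(e_unfolded, temperature) - heat_capacity(e_folded, temperature)
```

Because variances converge slowly, the per-state energy samples need to be long compared with the energy correlation time, which is one reason the MD-based ΔCv values in Table 1 carry large errors.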
It has recently been suggested that unfolded states in MD simulations with simple water models such as TIP3P may be unusually compact owing to inaccuracies in the force-field description of hydration enthalpies (31). In our simulations of villin, the average radius of gyration of the unfolded state (approximately 11 Å) is only 1 Å larger than that of the folded state (approximately 10 Å). This relatively small change may be ascribed in part to the substantial fraction of helix that is observed in the unfolded state even at the melting temperature (14). Indeed, in our simulation of the FiP35 WW domain, a much less helical protein with the same number of amino acid residues, the radius of gyration of the unfolded state (approximately 16 Å) is 1.5 times larger than that of the folded state, suggesting that the unusual compactness of the unfolded state observed for villin is not necessarily an intrinsic property of all proteins simulated with this force field. In any case, while errors in the enthalpy and heat capacity affect the temperature dependence of calculated properties, most of this study was carried out in a relatively narrow range of temperatures (between 345 and 380 K), and we do not observe dramatic deviations from experiment when comparing simulations performed in this limited range.
Kinetics.
As described above, we used a transition-based assignment (14, 26) to define folded and unfolded segments of the trajectories, and calculated folding and unfolding rates from the waiting times in these states (Table 1). Experimental measurements of folding and unfolding rates are, however, generally based on observing the time-resolved relaxation of a spectroscopic signal after a small external perturbation; for fast-folding proteins this is typically a sudden increase in temperature of 5–10 K. For a two-state folder, the relaxation rate obtained from such an experiment is the sum of the folding and unfolding rates, and this relaxation rate can—together with a measured equilibrium constant—thus be used to determine the folding and unfolding rates. In the case of villin, such temperature-jump experiments have been performed using either tryptophan fluorescence (3, 27) or the amide I band in IR spectroscopy (32) as spectroscopic signals.
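For a two-state folder these relations can be written in a few lines. This Python sketch (hypothetical helper names; ΔGf in kcal mol-1, assuming ΔGf = −RT ln([F]/[U]) as in Table 1) computes the relaxation time 1/(kf + ku) expected after a small temperature jump, and performs the inverse split used experimentally, kf = kobs K/(1 + K) and ku = kobs/(1 + K), where K = kf/ku.

```python
import math
from typing import Tuple

R_KCAL = 0.0019872  # kcal mol^-1 K^-1


def equilibrium_constant(dg_fold: float, temperature: float) -> float:
    """K = [F]/[U] = exp(-Delta G_f / RT)."""
    return math.exp(-dg_fold / (R_KCAL * temperature))


def relaxation_time(k_f: float, k_u: float) -> float:
    """Two-state relaxation time after a small perturbation: 1 / (k_f + k_u)."""
    return 1.0 / (k_f + k_u)


def rates_from_relaxation(k_obs: float, K: float) -> Tuple[float, float]:
    """Split an observed relaxation rate into (k_f, k_u) using K = k_f / k_u."""
    k_u = k_obs / (1.0 + K)
    return K * k_u, k_u
```

For example, for the Nle/Nle mutant at 370 K (ΔGf = 0, τf = 2.3 μs), K = 1, so ku = kf and the expected relaxation time is 1.15 μs.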
We recently described an approach by which the kinetics observed in a long MD simulation can be compared more directly with the results of temperature-jump experiments (14). In the limit of a small temperature perturbation, the relaxation kinetics observed in such an experiment can be compared to an MD simulation by calculating the autocorrelation function (ACF) of a quantity that mimics the experimentally employed spectroscopic probe (14). This analysis requires (i) simulations that are much longer than the timescale of the relaxation of interest, so that the ACFs can be calculated with reasonable statistical accuracy, and (ii) an efficient method to calculate the spectroscopic signal from the structures observed in the simulation. Most of the simulations presented here satisfy the first requirement, as they are one to two orders of magnitude longer than the folding and unfolding times (Table 1). To the best of our knowledge, however, no computationally affordable method has been reported that allows a highly accurate calculation of IR spectra or excited-state energies for a large number of conformations in a complex protein system. We have therefore limited ourselves to calculating quantities that are expected to be correlated with the spectroscopic signal of interest, although the exact nature of the correlation is unknown. This approach is expected to be sufficient for the estimation of relaxation times, but may prevent quantitative comparisons with the amplitude of the signal observed in experiment, in particular when relaxations on multiple timescales are present.
The spectroscopic signal from the amide I band in IR spectroscopy is primarily sensitive to the number and geometry of backbone hydrogen bonds, and is therefore a global measure of the secondary structure content in the protein. In our calculations we approximate this signal by the total number of amino acid residues that are found in a helical geometry (33). The fluorescence properties of tryptophan residues are strongly influenced by fluctuations of the electric field surrounding the indole ring, which in turn is determined by the type of environment that surrounds the Trp side chain (34). In the absence of an accurate and efficient method to predict fluorescence properties, we approximate this signal by the solvent-accessible surface area (SASA) of the indole ring (14), although we note that a very recent study suggests that a more detailed description of the local geometry may be required to fully capture the fluorescence properties of villin (35). We therefore calculated the ACFs of the number of helical residues and the indole SASA from our equilibrium simulations of villin folding. Fitting of the ACFs to a single exponential decay resulted in considerable discrepancies for short lag times. They could, however, be fitted well to a double exponential decay (Fig. 1A) with a fast phase (time constants between 40 ns and 200 ns) and a slow phase (time constants between 0.4 μs and 5 μs). A similar two-exponential fit was deemed necessary in the analysis of temperature-jump experiments (3).
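The ACF analysis can be sketched in dependency-free Python: a normalized autocorrelation of a scalar time series (for example, the number of helical residues or the indole SASA per frame, computed elsewhere) and a least-squares fit to a sum of two exponentials. The grid search over time constants, with amplitudes obtained from the 2×2 linear normal equations, is a stand-in we chose for a general nonlinear least-squares fit; it is not the fitting procedure used in this work, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Normalized ACF C(lag) = <dx(t) dx(t+lag)> / <dx^2> by direct summation."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    var = sum(v * v for v in d) / n
    return [
        sum(d[i] * d[i + lag] for i in range(n - lag)) / ((n - lag) * var)
        for lag in range(max_lag + 1)
    ]


def fit_two_exponentials(
    t: Sequence[float], c: Sequence[float], tau_grid: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Least-squares fit c(t) ~ a1*exp(-t/tau1) + a2*exp(-t/tau2) with tau1 < tau2.

    tau_grid must be sorted in ascending order. Returns (a1, tau1, a2, tau2).
    """
    basis = [[math.exp(-ti / tau) for ti in t] for tau in tau_grid]
    ss = [sum(b * b for b in e) for e in basis]             # e_k . e_k
    sy = [sum(b * y for b, y in zip(e, c)) for e in basis]  # e_k . c
    yy = sum(y * y for y in c)
    best = None
    for i in range(len(tau_grid)):
        for j in range(i + 1, len(tau_grid)):
            s12 = sum(p * q for p, q in zip(basis[i], basis[j]))
            det = ss[i] * ss[j] - s12 * s12
            if det <= 1e-12 * ss[i] * ss[j]:
                continue  # nearly collinear basis functions
            a1 = (sy[i] * ss[j] - sy[j] * s12) / det
            a2 = (sy[j] * ss[i] - sy[i] * s12) / det
            rss = yy - a1 * sy[i] - a2 * sy[j]  # residual sum of squares at the optimum
            if best is None or rss < best[0]:
                best = (rss, a1, tau_grid[i], a2, tau_grid[j])
    if best is None:
        raise ValueError("no valid pair of time constants on the grid")
    return best[1:]
```

The resolution of the fitted time constants is limited by the grid spacing; a log-spaced grid spanning the fast (tens of ns) and slow (μs) phases is a reasonable starting point.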
Fig. 1.
Simulated T-jump and TTET experiments. (A) Simulated IR T-jump experiment from the wild-type villin simulation performed at 345 K. The autocorrelation function for the number of helical residues as reported by STRIDE (33) (black) is used as a proxy for the decay of the amide band IR absorption following a T-jump and can be fitted to the sum of two exponential decays (red) with timescales of 0.12 and 5.2 μs. The long timescale describes the folding/unfolding transition, as demonstrated by the excellent agreement (B) between relaxation times calculated from the simulated IR and fluorescence T-jump experiments and the sum of the folding and unfolding rates (kf and ku) obtained from a two-state analysis of the simulation trajectories. (C) Simulated TTET experiments from the wild-type villin simulation at 345 K. Four experiments with probes located on different pairs of residues were simulated. The decay profile of probe absorbance, monitored in TTET experiments, was calculated from the kinetics of contact formation using the contact definitions of (10). The dashed and dotted black lines show the decomposition of the solid black curve (probes located on residues 0 and 23) into the contributions coming from the folded (dashed) and unfolded (dotted) states. (D) Native state structure of the villin headpiece showing in red the side chains corresponding to the residues where donor and acceptor probes were attached in TTET experiments (36). The side chain of residue Phe10 is shown in cyan.
The faster relaxation was originally attributed to intra-basin dynamics (3, 27) and has been the subject of extensive triplet–triplet energy transfer (TTET) studies (36) and simulations (10, 28). We have simulated TTET experiments using the definition of TTET-active states proposed in (10) and the trajectory data of wild-type villin at 345 K, as these are the simulation data most readily comparable to the TTET experiment; analysis of the Nle/Nle double mutant trajectory at 360 K gave very similar results. The calculated triplet decay profiles are all fully consistent with the experimental results obtained in 2 M guanidinium hydrochloride (36). For probes placed on the N-terminus and residue 23 (0/23) and residues 7/23 (Fig. 1D) the triplet decay profiles are very similar (Fig. 1C) and can be modelled by two exponential decays with timescales of 60–100 ns and approximately 4.5 μs. A decomposition of the decay into the folded and unfolded state components shows that the slow phase corresponds to the unfolding transition and the fast phase to contact formation in the unfolded state. The calculated triplet decay profiles generated by probes placed on residues 23/35 and on the N-terminus and residue 35 (0/35) (Fig. 1D) can also be modelled by double-exponential decays (Fig. 1C), but this time with timescales of a few tens of nanoseconds and 190–230 ns. In these cases, the fast phase can be attributed to contact formation in the unfolded state, while the slower phase is generated by partial melting of helix 3 in the folded state, consistent with previous interpretations (10).
The timescale of the slow relaxation is very similar for the two spectroscopic probes and suggests that both the Trp side-chain environment and number of helical residues are sensitive to the global folding and unfolding reaction. To quantify the extent to which this is the case, we plotted the long-timescale relaxation time constants obtained from fitting the ACFs against the relaxation time calculated directly from the folding and unfolding rates (Fig. 1B). The results show a strong correlation between the relaxation time obtained from the ACFs and the values obtained from the mean waiting times in the folded and unfolded states, consistent with recent experimental studies indicating that Trp fluorescence properties can be used to monitor the folding/unfolding transition in villin (22). Over the temperature range studied here, we find—in agreement with experiments—that the folding rates are only weakly dependent on temperature (3, 27), whereas the unfolding rates increase substantially as the temperature is increased (Table 1). We also find that wild-type villin folds substantially slower than the Nle/Nle double mutant. At 360 K—where we have data for both proteins—the wild-type protein folds in 16 μs and the double mutant in 3 μs; the relative folding rates of the two proteins are in excellent agreement with the experimental measurements (3, 27), while the absolute folding rates appear to be a factor of three slower than the rates experimentally determined at 300 K.
The ability to calculate folding rates and equilibrium constants directly from simulations of reversible folding also allows us to calculate protein-engineering Φ-values in a manner analogous to experiments (17, 21, 37). We performed simulations of villin variants in which we introduced the F10L mutation into both the wild-type and Nle/Nle backgrounds (Fig. 1D). This mutation disrupts several key hydrophobic interactions between helices 1 and 2 and is expected to have a sizable effect on the stability of the protein. As described above, this mutation gives rise to a destabilization of 0.8 to 1 kcal mol-1 when introduced into either the Nle/Nle variant or wild-type villin (Table 1). We find that at 345 K the F10L mutation appears to decrease the folding rate of wild-type villin, and the resulting Φ-value of 0.2 ± 0.2 is consistent with the fractional Φ-values measured experimentally at 310 K (Φ = 0.3) and 340 K (Φ = 0.6) (29). In contrast, this mutation has no observable effect on the folding rate of the Nle/Nle variant, and the calculated Φ-value for folding is therefore small (0.0 ± 0.2).
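The Φ-value calculation can be sketched as Φ = ΔΔG‡/ΔΔGeq, with ΔΔG‡ = RT ln(τf,mut/τf,ref) and ΔΔGeq = ΔGf,mut − ΔGf,ref. Writing the transition-state term from folding times assumes, as is standard in Φ-value analysis, that the mutation leaves the pre-exponential factor unchanged; the function name is hypothetical.

```python
import math

R_KCAL = 0.0019872  # kcal mol^-1 K^-1


def phi_value(tau_f_ref: float, tau_f_mut: float,
              dg_ref: float, dg_mut: float, temperature: float) -> float:
    """Phi = Delta Delta G_TS / Delta Delta G_eq for a point mutation.

    Delta Delta G_TS = RT ln(tau_f,mut / tau_f,ref) assumes an unchanged
    pre-exponential factor; Delta Delta G_eq = Delta G_f,mut - Delta G_f,ref.
    """
    ddg_eq = dg_mut - dg_ref
    if ddg_eq == 0:
        raise ValueError("Phi is undefined when the mutation does not change stability")
    ddg_ts = R_KCAL * temperature * math.log(tau_f_mut / tau_f_ref)
    return ddg_ts / ddg_eq
```

With the Table 1 values for WT and WT-F10L at 345 K (τf = 19 and 24 μs, ΔGf = 0.8 and 1.6 kcal mol-1), `phi_value(19.0, 24.0, 0.8, 1.6, 345.0)` returns approximately 0.2, matching the value quoted above.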
The small difference in the Φ-value calculated for wild-type villin and for the Nle/Nle variant might suggest that the introduction of two stabilizing Nle-residues in helix 3 in the Nle/Nle double mutant subtly shifts the folding pathway relative to that in the wild-type protein (38). In our previous analysis of the Nle/Nle variant in the Amber ff99SB*-ILDN force field we found that helix 3 in general forms early during the folding pathway, and that helix 1 almost invariably forms last (14); a result that is in good agreement with the very low Φ-value that we here obtained computationally for F10L in the Nle/Nle background. A comparable analysis of the order of helix formation in wild-type villin shows that helix 2 forms first in 80% of the folding events (as compared to only 30% in the Nle/Nle double mutant). Thus, the larger Φ-value for F10L in the WT background reflects genuine, but subtle, differences in the folding mechanisms of the two proteins in our simulations. Future experiments probing the folding kinetics and thermodynamics of the F10L mutant in the Nle/Nle variant could be used to validate or disprove the pathway shift proposed on the basis of MD simulations.
The transition path time is the time it takes the molecule to cross between the folded and unfolded basins, and can be substantially shorter than the mean waiting times between folding and unfolding events (39, 40). The mean transition path time is of considerable interest because it contains useful information about the properties of the free-energy barrier. It also determines the time resolution needed in experiments to resolve individual folding events in single-molecule studies (39, 40). For each folding and unfolding event we determined the transition path time as the time required to transition fully between the RMSD cutoffs used to define the two states. The calculated mean values range between 120 and 460 ns and show an approximately exponential dependence on the inverse temperature (Fig. 2A). The calculated mean transition path times can be used to estimate the pre-exponential factor for folding (k0) using the relation (40):
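$$\langle \tau_{\mathrm{TP}} \rangle \approx \frac{\ln\left[ 2 e^{\gamma} \ln\left( k_0 \tau_f \right) \right]}{2\pi k_0} \qquad [1]$$

where γ ≈ 0.577 is Euler’s constant and τf is the folding time.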
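Because Eq. 1 cannot be inverted in closed form, k0 has to be found numerically. This sketch assumes that Eq. 1 has the parabolic-barrier form derived from Kramers’ theory, ⟨τTP⟩ = ln[2e^γ ln(k0 τf)]/(2πk0), where kf = 1/τf = k0 exp(−ΔG‡/RT), so that ln(k0 τf) is the reduced barrier height. It solves for k0 by bisection in u = ln(k0 τf); the function names, the restriction to barriers of at least 1 RT, and the bracketing interval are our own choices.

```python
import math
from typing import Tuple

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def mean_transition_path_time(k0: float, tau_f: float) -> float:
    """Assumed parabolic-barrier relation <tau_TP> = ln(2 e^gamma ln(k0 tau_f)) / (2 pi k0)."""
    barrier_rt = math.log(k0 * tau_f)  # Delta G_barrier / RT, since 1/tau_f = k0 exp(-Delta G/RT)
    return math.log(2.0 * math.exp(EULER_GAMMA) * barrier_rt) / (2.0 * math.pi * k0)


def prefactor_from_transition_paths(tau_tp: float, tau_f: float,
                                    u_max: float = 50.0,
                                    tol: float = 1e-12) -> Tuple[float, float]:
    """Solve the relation above for k0 given <tau_TP> and tau_f; returns (k0, Delta G/RT).

    For u = ln(k0 tau_f) >= 1 the predicted <tau_TP> decreases monotonically with u,
    so bisection on [1, u_max] finds the unique root on this high-barrier branch.
    """
    def excess(u: float) -> float:
        return mean_transition_path_time(math.exp(u) / tau_f, tau_f) - tau_tp

    lo, hi = 1.0, u_max
    if excess(lo) < 0:
        raise ValueError("<tau_TP> too long for a barrier of at least 1 RT")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid  # predicted path time still too long: the barrier must be higher
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    return math.exp(u) / tau_f, u
```

For the wild-type entry at 345 K (τf = 19 μs, ⟨τTP⟩ = 0.5 μs) this returns k0 ≈ 0.7 μs-1 and a reduced barrier of about 2.6 RT, close to the tabulated 0.65 μs-1 given the uncertainty in ⟨τTP⟩.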
Fig. 2.
Arrhenius plots of mean transition path time and pre-exponential factor. Mean transition path times (A) and pre-exponential factors (B) observed across the seven different simulations of villin folding plotted as a function of the inverse of temperature. Pre-exponential factors are estimated from the mean transition path times and folding times using Kramers’ theory (40). The apparent activation free energy for diffusion can be obtained from the slope of an Arrhenius plot, in which the logarithm of the pre-exponential factor is plotted against the inverse temperature.
The values of k0 estimated with this approach range between (0.5 μs)-1 and (1.5 μs)-1, in remarkable agreement with previous estimates (1), and also appear to be temperature-dependent (Fig. 2B). For diffusion on a rough potential, it has been proposed that k0 should depend exponentially on either the inverse temperature or the square of the inverse temperature (41), with the exponent related to the roughness of the free-energy surface (41). We note that the exact functional form describing the temperature dependence of k0 is not known, and a substantial amount of additional data spanning a larger temperature interval would be required to infer it from simulation. A crude estimate of the roughness of the energy landscape can be obtained from an Arrhenius plot of k0 (Fig. 2B). Assuming a simple exponential dependence on the inverse temperature gives “effective activation energies” for diffusion of approximately 7 kcal mol-1 for wild-type villin, approximately 5 kcal mol-1 for the Nle/Nle double mutant, and approximately 1 kcal mol-1 for the F10L mutant. [For a model in which the exponent scales with the inverse square of the temperature (41), the corresponding numbers are approximately 2 kcal mol-1 for wild-type villin, approximately 2 kcal mol-1 for the Nle/Nle double mutant, and approximately 1 kcal mol-1 for the F10L mutant.]
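Both estimates reduce to a least-squares slope. This sketch fits ln k0 against 1/T (Arrhenius form, E = −R × slope) and against 1/T² for one common rough-landscape form, ln k0 = A − (ε/RT)², as in Zwanzig’s model of diffusion on a potential with Gaussian roughness, so that ε = R √(−slope). Whether this matches the exact model of (41) is an assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence

R_KCAL = 0.0019872  # gas constant, kcal mol^-1 K^-1


def _slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y against x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def arrhenius_energy(temps: Sequence[float], k0s: Sequence[float]) -> float:
    """Effective activation energy E (kcal/mol) from ln k0 = A - E / (R T)."""
    return -R_KCAL * _slope([1.0 / t for t in temps], [math.log(k) for k in k0s])


def roughness_energy(temps: Sequence[float], k0s: Sequence[float]) -> float:
    """Roughness scale eps (kcal/mol) from ln k0 = A - (eps / (R T))**2."""
    s = _slope([1.0 / t ** 2 for t in temps], [math.log(k) for k in k0s])
    if s >= 0:
        raise ValueError("k0 does not decrease with 1/T^2; model not applicable")
    return R_KCAL * math.sqrt(-s)
```

With only two or three temperatures per variant, as here, both slopes are very poorly constrained, which is why the text treats the resulting energies as crude estimates.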
Finally, we find that the distribution of transition path times for individual folding and unfolding events is rather broad, characterized by a “lag” at short times where no events are found and a roughly exponential decay at longer times. In Fig. 3 we show a histogram of the transition path times for the 150 folding and unfolding events observed for the Nle/Nle double mutant at 370 K. The mean value is 150 ns, and the histogram peaks at around 40 ns. This shape is qualitatively consistent with that predicted by a number of theories (41–44); a substantially larger number of transitions would, however, be required to determine which of these theories best fits the simulation results. The short-time end of this distribution sets a natural lower limit on the timescales needed to observe folding events in shorter MD simulations. As long as the folding and unfolding relaxation times are substantially longer than the transition path time, as is the case for the simulations performed here, a two-state analysis will produce a reasonable description of the kinetics of the system.
Fig. 3.
Distribution of transition path times. The plot shows the distribution of the observed transition path times for the 150 folding/unfolding events observed in the Nle/Nle double mutant simulation at 370 K. On the longest timescales the distribution is roughly exponential. On the shortest timescales, however, there is a clear “lag time” so that almost no events are observed at the shortest timescales. The distribution can be fitted to a simple model with two intermediate states on the transition path (red) (40), although some discrepancy is observed for short timescales, suggesting that the actual transition is a more complex process.
Estimation of the Folding Free-Energy Barrier.
We used a previously described variational approach (45, 46) to estimate the free-energy barrier for the folding of villin. Briefly, in this approach, a reaction coordinate is optimized to maximally separate the transition paths from the stable folded and unfolded states. The free-energy profile calculated along the optimized reaction coordinate is therefore expected to provide a reasonable estimate of the actual folding free-energy barrier. The free-energy barriers for folding calculated with this approach range between 1.1 kcal mol-1 and 2.3 kcal mol-1 (Fig. 4). These values are consistent with the pre-exponential factors of (0.5 μs)-1 to (1.5 μs)-1 estimated from the mean transition path times and the calculated folding times of 2.3 to 20 μs. In Fig. 4 we report the free-energy profiles for the Nle/Nle double mutant, for which simulations are available at three temperatures. While the three profiles are remarkably similar, the shape of the calculated barrier changes with temperature. In the 360-K simulation of the Nle/Nle mutant, a sparsely populated folding intermediate can be observed; analysis of the trajectory suggests that this intermediate corresponds to formation of helix 3 and the turn between helices 2 and 3. As the temperature is increased, this intermediate disappears, and at 380 K the top of the barrier moves toward the native state, suggestive of Hammond behavior. The free-energy barriers calculated for the wild-type villin simulations (1.7 kcal mol-1 and 2.3 kcal mol-1 at 345 K and 360 K, respectively) can be compared to the 0.5–2 kcal mol-1 estimated experimentally using a number of approaches (30).
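Once a one-dimensional coordinate is available, reading off a profile and barrier is straightforward. This sketch computes F(x) = −RT ln p(x) from a histogram and measures the folding barrier from the unfolded minimum to the highest point between the two minima. It does not reproduce the variational optimization of the coordinate (45, 46); it simply assumes a coordinate rescaled so that the folded state lies near 0 and the unfolded state near 1, as in Fig. 4, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

R_KCAL = 0.0019872  # kcal mol^-1 K^-1


def free_energy_profile(samples: Sequence[float], temperature: float,
                        n_bins: int = 50, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """F(x) = -RT ln p(x) on n_bins uniform bins, shifted so that min F = 0.

    Empty bins get +inf; samples outside [lo, hi) are ignored.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples inside [lo, hi)")
    rt = R_KCAL * temperature
    f = [-rt * math.log(c / total) if c else math.inf for c in counts]
    f_min = min(f)
    return [v - f_min for v in f]


def folding_barrier(profile: Sequence[float]) -> float:
    """Height of the highest point between the folded minimum (left half) and the
    unfolded minimum (right half), measured from the unfolded minimum."""
    if len(profile) < 2:
        raise ValueError("profile needs at least two bins")
    half = len(profile) // 2
    i_f = min(range(half), key=lambda i: profile[i])
    i_u = min(range(half, len(profile)), key=lambda i: profile[i])
    top = max(profile[i_f:i_u + 1])
    return top - profile[i_u]
```

Splitting the coordinate into halves to locate the two minima is a crude heuristic that only makes sense for a rescaled coordinate of this kind.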
Fig. 4.
Free-energy profiles for the fast-folding Nle/Nle double mutant of villin. Profiles for the folding of the Nle/Nle double mutant at three temperatures, projected onto an optimized one-dimensional coordinate. The reaction coordinate was optimized separately for each simulation. To facilitate comparison, the profiles have been rescaled along the x-axis and translated so that the folded state lies at a coordinate value of approximately 0 and the unfolded state at approximately 1. The folding free-energy barriers calculated from the profiles are 1.10 kcal mol⁻¹ at 360 K, 1.14 kcal mol⁻¹ at 370 K, and 1.70 kcal mol⁻¹ at 380 K.
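A hedged sketch of how profiles like these, and barrier heights read from them, can be obtained once a one-dimensional coordinate has been chosen: histogram the coordinate over the trajectory, convert populations to free energies with F(x) = −RT ln p(x), map the folded and unfolded minima to 0 and 1, and take the folding barrier as the highest point between the two wells minus the free energy of the unfolded well. The bin counts, this barrier convention, and all function names are illustrative assumptions; the coordinate optimization itself (45, 46) is not reproduced.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def free_energy_profile(counts, temperature_k):
    """F = -RT ln p per histogram bin, shifted so that min(F) = 0 (kcal/mol).

    Empty bins are unsampled, so they get +inf rather than a finite value.
    """
    total = sum(counts)
    rt = R_KCAL * temperature_k
    f = [-rt * math.log(c / total) if c > 0 else math.inf for c in counts]
    f_min = min(f)
    return [x - f_min for x in f]


def rescale_coordinate(centers, folded_x, unfolded_x):
    """Affine map sending the folded minimum to 0 and the unfolded one to 1."""
    return [(x - folded_x) / (unfolded_x - folded_x) for x in centers]


def folding_barrier(f, folded_bin, unfolded_bin):
    """Highest F between the two wells minus F at the unfolded minimum."""
    lo, hi = sorted((folded_bin, unfolded_bin))
    return max(f[lo:hi + 1]) - f[unfolded_bin]


if __name__ == "__main__":
    counts = [400, 120, 20, 60, 300]     # made-up populations, folded at bin 0
    centers = [0.2, 0.4, 0.6, 0.8, 1.0]  # raw coordinate values (arbitrary)
    f = free_energy_profile(counts, 370.0)
    x = rescale_coordinate(centers, folded_x=0.2, unfolded_x=1.0)
    for xi, fi in zip(x, f):
        print(f"x = {xi:.2f}  F = {fi:.2f} kcal/mol")
    print(f"barrier = {folding_barrier(f, 0, 4):.2f} kcal/mol")
```

Because only free-energy differences enter, the barrier for these toy counts is simply RT ln(300/20) ≈ 1.99 kcal mol⁻¹ at 370 K, independent of the bin width and of the shift applied to F.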
Methods
The Anton specialized hardware (15) was used to perform MD simulations with the Amber ff99SB*-ILDN (18, 19, 23) force field, following the protocol described in (14). Further details on the simulation and analysis methods are reported in the SI Text.
Conclusion
We have further validated the Amber ff99SB*-ILDN force field and used it to examine the folding process of villin, demonstrating how long-timescale molecular dynamics simulations can provide direct access to a range of thermodynamic and kinetic properties of folding. Most of the calculated observables are in reasonably good agreement with experiment; the largest discrepancies are that the calculated heat capacity of folding is smaller than the experimental value and that the calculated folding rates are a factor of three slower than those measured. Our simulations indicate that the folding pathways of wild-type villin and the Nle/Nle double mutant are slightly different, and they suggest a possible way to probe this difference experimentally.
Although the main focus of our work was to demonstrate the feasibility of comparing folding simulations directly with experiments, we also note that many of our results are in line with expectations based on existing theories of protein folding such as energy landscape theory (47). These theories describe protein folding as a diffusive process on a rough free-energy landscape. Consistent with this picture, we find that even when the free-energy barrier is marginal, landscape roughness still limits folding to the microsecond timescale, and that the relaxation kinetics can be exponential even when barriers are small. Landscape theory also suggests that the folding mechanism is more easily affected by perturbations than the native-state structure, in good agreement with our findings.
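One quantitative handle on how roughness slows diffusive folding is Zwanzig's result for a one-dimensional potential whose roughness has a Gaussian distribution with root-mean-square amplitude ε (41): the effective diffusion coefficient is reduced to D* = D exp[−(ε/k_BT)²]. The short sketch below evaluates the slowdown factor D/D* in molar units (RT in place of k_BT); the function name and the ε values are arbitrary illustrations, not quantities fitted to villin.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def roughness_slowdown(eps_kcal, temperature_k):
    """Zwanzig slowdown D/D* = exp[(eps/RT)^2] for Gaussian roughness of rms eps."""
    return math.exp((eps_kcal / (R_KCAL * temperature_k)) ** 2)


if __name__ == "__main__":
    for eps in (0.5, 1.0, 1.5):  # illustrative rms roughness, kcal/mol
        print(f"eps = {eps} kcal/mol -> {roughness_slowdown(eps, 370.0):.1f}x slower")
```

Because the exponent grows with the square of ε/RT, roughness of about 1.5 RT already slows diffusion roughly tenfold (exp(2.25) ≈ 9.5) without any change in the barrier height.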
ACKNOWLEDGMENTS.
We thank Ron O. Dror and William A. Eaton for helpful discussions and a critical reading of the manuscript and Mollie Kirk for editorial assistance.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1201811109/-/DCSupplemental.
References
- 1. Kubelka J, Hofrichter J, Eaton WA. The protein folding “speed limit”. Curr Opin Struct Biol. 2004;14:76–88. doi: 10.1016/j.sbi.2004.01.013.
- 2. McKnight CJ, Doering DS, Matsudaira PT, Kim PS. A thermostable 35-residue subdomain within villin headpiece. J Mol Biol. 1996;260:126–134. doi: 10.1006/jmbi.1996.0387.
- 3. Kubelka J, Chiu TK, Davies DR, Eaton WA, Hofrichter J. Sub-microsecond protein folding. J Mol Biol. 2006;359:546–553. doi: 10.1016/j.jmb.2006.03.034.
- 4. Ensign DL, Kasson PM, Pande VS. Heterogeneity even at the speed limit of folding: Large-scale molecular dynamics study of a fast-folding variant of the villin headpiece. J Mol Biol. 2007;374:806–816. doi: 10.1016/j.jmb.2007.09.069.
- 5. Freddolino PL, Schulten K. Common structural transitions in explicit-solvent simulations of villin headpiece folding. Biophys J. 2009;97:2338–2347. doi: 10.1016/j.bpj.2009.08.012.
- 6. Mittal J, Best RB. Tackling force-field bias in protein folding simulations: Folding of Villin HP35 and Pin WW domains in explicit water. Biophys J. 2010;99:L26–L28. doi: 10.1016/j.bpj.2010.05.005.
- 7. Duan Y, Kollman PA. Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science. 1998;282:740–744. doi: 10.1126/science.282.5389.740.
- 8. Freddolino PL, Harrison CB, Liu Y, Schulten K. Challenges in protein-folding simulations. Nat Phys. 2010;6:751–758. doi: 10.1038/nphys1713.
- 9. Best RB. Atomistic molecular simulations of protein folding. Curr Opin Struct Biol. 2012;22:52–61. doi: 10.1016/j.sbi.2011.12.001.
- 10. Beauchamp KA, Ensign DL, Das R, Pande VS. Quantitative comparison of villin headpiece subdomain simulations and triplet–triplet energy transfer experiments. Proc Natl Acad Sci USA. 2011;108:12734–12739. doi: 10.1073/pnas.1010880108.
- 11. Beauchamp KA, et al. MSMBuilder2: Modeling conformational dynamics at the picosecond to millisecond scale. J Chem Theory Comput. 2011;7:3412–3419. doi: 10.1021/ct200463m.
- 12. Bowman GR, Beauchamp KA, Boxer G, Pande VS. Progress and challenges in the automated construction of Markov state models for full protein systems. J Chem Phys. 2009;131:124101. doi: 10.1063/1.3216567.
- 13. Bowman GR, Pande VS. Protein folded states are kinetic hubs. Proc Natl Acad Sci USA. 2010;107:10890–10895. doi: 10.1073/pnas.1003962107.
- 14. Piana S, Lindorff-Larsen K, Shaw DE. How robust are protein folding simulations with respect to force field parameterization? Biophys J. 2011;100:L47–L49. doi: 10.1016/j.bpj.2011.03.051.
- 15. Shaw DE, et al. Millisecond-scale molecular dynamics simulations on Anton; Proceedings of the Conference on High Performance Computing, Networking, Storage and Analysis (SC09); New York: ACM; 2009.
- 16. Lindorff-Larsen K, Piana S, Dror RO, Shaw DE. How fast-folding proteins fold. Science. 2011;334:517–520. doi: 10.1126/science.1208351.
- 17. Shaw DE, et al. Atomic-level characterization of the structural dynamics of proteins. Science. 2010;330:341–346. doi: 10.1126/science.1187409.
- 18. Hornak V, et al. Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins. 2006;65:712–725. doi: 10.1002/prot.21123.
- 19. Best RB, Hummer G. Optimized molecular dynamics force fields applied to the helix-coil transition of polypeptides. J Phys Chem B. 2009;113:9004–9015. doi: 10.1021/jp901540t.
- 20. Lindorff-Larsen K, et al. Systematic validation of protein force fields against experimental data. PLoS ONE. 2012;7:e32131. doi: 10.1371/journal.pone.0032131.
- 21. Settanni G, Rao F, Caflisch A. Phi-value analysis by molecular dynamics simulations of reversible folding. Proc Natl Acad Sci USA. 2005;102:628–633. doi: 10.1073/pnas.0406754102.
- 22. Cellmer T, Buscaglia M, Henry ER, Hofrichter J, Eaton WA. Making connections between ultrafast protein folding kinetics and molecular dynamics simulations. Proc Natl Acad Sci USA. 2011;108:6103–6108. doi: 10.1073/pnas.1019552108.
- 23. Lindorff-Larsen K, et al. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins. 2010;78:1950–1958. doi: 10.1002/prot.22711.
- 24. Chiu TK, et al. High-resolution X-ray crystal structures of the villin headpiece subdomain, an ultrafast folding protein. Proc Natl Acad Sci USA. 2005;102:7517–7522. doi: 10.1073/pnas.0502495102.
- 25. Northrup SH, Hynes JT. The stable states picture of chemical reactions. I. Formulation for rate constants and initial condition effects. J Chem Phys. 1980;73:2700–2714.
- 26. Buchete NV, Hummer G. Coarse master equations for peptide folding dynamics. J Phys Chem B. 2008;112:6057–6069. doi: 10.1021/jp0761665.
- 27. Kubelka J, Eaton WA, Hofrichter J. Experimental tests of villin subdomain folding simulations. J Mol Biol. 2003;329:625–630. doi: 10.1016/s0022-2836(03)00519-9.
- 28. Saladino G, Marenchino M, Gervasio FL. Bridging the gap between folding simulations and experiments: The case of the villin headpiece. J Chem Theory Comput. 2011;7:2675–2680. doi: 10.1021/ct2002489.
- 29. Kubelka J, Henry ER, Cellmer T, Hofrichter J, Eaton WA. Chemical, physical, and theoretical kinetics of an ultrafast folding protein. Proc Natl Acad Sci USA. 2008;105:18655–18662. doi: 10.1073/pnas.0808600105.
- 30. Godoy-Ruiz R, et al. Estimating free-energy barrier heights for an ultrafast folding protein from calorimetric and kinetic data. J Phys Chem B. 2008;112:5938–5949. doi: 10.1021/jp0757715.
- 31. Best RB, Mittal J. Protein simulations with an optimized water model: Cooperative helix formation and temperature-induced unfolded state collapse. J Phys Chem B. 2010;114:14916–14923. doi: 10.1021/jp108618d.
- 32. Bunagan MR, Gao J, Kelly JW, Gai F. Probing the folding transition state structure of the villin headpiece subdomain via side chain and backbone mutagenesis. J Am Chem Soc. 2009;131:7470–7476. doi: 10.1021/ja901860f.
- 33. Frishman D, Argos P. STRIDE: A web server for secondary structure assignment from known atomic coordinates of proteins. Proteins. 1995;23:566–579. doi: 10.1093/nar/gkh429.
- 34. Callis PR. Predicting fluorescence lifetimes and spectra of biopolymers. Methods Enzymol. 2011;487:1–38. doi: 10.1016/B978-0-12-381270-4.00001-9.
- 35. Tusell JR, Callis PR. Simulations of tryptophan fluorescence dynamics during folding of the villin headpiece. J Phys Chem B. 2012;116:2586–2594. doi: 10.1021/jp211217w.
- 36. Reiner A, Henklein P, Kiefhaber T. An unlocking/relocking barrier in conformational fluctuations of villin headpiece subdomain. Proc Natl Acad Sci USA. 2010;107:4955–4960. doi: 10.1073/pnas.0910001107.
- 37. Matouschek A, Kellis JT, Jr, Serrano L, Fersht AR. Mapping the transition state and pathway of protein folding by protein engineering. Nature. 1989;340:122–126. doi: 10.1038/340122a0.
- 38. Lei H, Chen C, Xiao Y, Duan Y. The protein folding network indicates that the ultrafast folding mutant of villin headpiece subdomain has a deeper folding funnel. J Chem Phys. 2011;134:205104. doi: 10.1063/1.3596272.
- 39. Chung HS, Louis JM, Eaton WA. Experimental determination of upper bound for transition path times in protein folding from single-molecule photon-by-photon trajectories. Proc Natl Acad Sci USA. 2009;106:11837–11844. doi: 10.1073/pnas.0901178106.
- 40. Chung HS, McHale K, Louis JM, Eaton WA. Single molecule fluorescence experiments determine protein folding transition path times. Science. 2012;335:981–984. doi: 10.1126/science.1215768.
- 41. Zwanzig R. Diffusion in a rough potential. Proc Natl Acad Sci USA. 1988;85:2029–2030. doi: 10.1073/pnas.85.7.2029.
- 42. Zhang BW, Jasnow D, Zuckerman DM. Efficient and verified simulation of a path ensemble for conformational change in a united-residue model of calmodulin. Proc Natl Acad Sci USA. 2007;104:18043–18048. doi: 10.1073/pnas.0706349104.
- 43. Malinin SV, Chernyak VY. Transition times in the low-noise limit of stochastic dynamics. J Chem Phys. 2010;132:014504. doi: 10.1063/1.3278440.
- 44. Chaudhury S, Makarov DE. A harmonic transition state approximation for the duration of reactive events in complex molecular rearrangements. J Chem Phys. 2010;133:034118. doi: 10.1063/1.3459058.
- 45. Best RB, Hummer G. Reaction coordinates and rates from transition paths. Proc Natl Acad Sci USA. 2005;102:6732–6737. doi: 10.1073/pnas.0408098102.
- 46. Hummer G. Position-dependent diffusion coefficients and free energies from Bayesian analysis of equilibrium and replica molecular dynamics simulations. New J Phys. 2005;7:34.
- 47. Onuchic JN, Wolynes PG. Theory of protein folding. Curr Opin Struct Biol. 2004;14:70–75. doi: 10.1016/j.sbi.2004.01.009.