Abstract
The dynamics of proteins in the unfolded state can be quantified in computer simulations by calculating a spectrum of relaxation times which describes the time scales over which the population fluctuations decay to equilibrium. If the unfolded state space is discretized we can evaluate the relaxation time of each state. We derive a simple relation that shows the mean first passage time to any state is equal to the relaxation time of that state divided by the equilibrium population. This explains why mean first passage times from state to state within the unfolded ensemble can be very long but the energy landscape can still be smooth (minimally frustrated). In fact, when the folding kinetics is two-state, all of the unfolded state relaxation times within the unfolded free energy basin are faster than the folding time. This result supports the well-established funnel energy landscape picture and resolves an apparent contradiction between this model and the recently proposed kinetic hub model of protein folding. We validate these concepts by analyzing a Markov State Model of the kinetics in the unfolded state and folding of the mini-protein NTL9 constructed from a 2.9 millisecond simulation provided by D. E. Shaw Research.
It has been more than fifty years since the protein-folding problem was first posed [1]. That proteins can fold into their native conformations so fast, despite their vast number of possible conformations, is a puzzle that came to be known as the Levinthal paradox. The “funnel energy landscape” provided a way to visualize the solution to the puzzle, but it has deeper meaning beyond the pictures [2, 3]. The recently introduced kinetic hub model [4] of folding calls into question one aspect of the protein folding funnel, namely its smoothness, an important characteristic associated with the principle of minimal frustration [5, 6]. Also, in contrast to the smooth funnel model, the kinetic hub model assigns a significant role to non-native interactions in the folding process. The hub model refers both to kinetic features of protein folding, i.e. the properties of first passage times within the unfolded free energy basin and between unfolded and folded states, and to topological features of protein folding, i.e. the connectivity of unfolded states with each other. In this manuscript we resolve an apparent contradiction between kinetic features of the hub model and the smooth funnel model of protein folding; we also comment on the meaning of the hub-like network topology in light of our kinetic analysis.
In a recent paper [7] we addressed the question “how long does it take to equilibrate the unfolded state of a protein?” We found that when a protein equilibrates within the unfolded state free energy basin on a faster time scale than the time it takes to fold, the folding will follow two-state kinetics [8–10], regardless of the number of folding pathways and barriers. So if there are multiple pathways with different barriers, these features of the energy landscape will be hidden from direct observation when the population fluctuations within the unfolded ensemble equilibrate more rapidly than the time course of the folding [11]. The mean first passage times (MFPTs) from state to state within the unfolded ensemble can be expressed in terms of the eigenvalues and eigenvectors of the transition matrix with an absorbing boundary condition at the target state. We showed that for mini-proteins, the MFPTs within the unfolded basin are typically much longer than the time required for the population fluctuations to relax. In this manuscript we derive a simple expression that relates the relaxation times of the individual coarse-grained states to the MFPTs to those states; this is a very general relation that is valid whether the relaxation is fast or slow compared to the folding process.
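The relaxation time, introduced in our previous study, quantifies the decay of the population fluctuations to their equilibrium values. The definitions of the relaxation time of state i and the total relaxation time are
$$\tau_{\mathrm{relax}}(i) = \int_0^\infty \frac{T_{ii}(t) - P_{\mathrm{eq}}(i)}{1 - P_{\mathrm{eq}}(i)}\,dt \tag{1}$$
$$\tau_{\mathrm{relax}}^{\mathrm{total}} = \sum_i P_{\mathrm{eq}}(i)\,\tau_{\mathrm{relax}}(i) \tag{2}$$
where $T_{ii}(t)$ is the ith diagonal element of the transition matrix and $P_{\mathrm{eq}}(i)$ is the equilibrium population of state i. $\tau_{\mathrm{relax}}^{\mathrm{total}}$ is simply the weighted average of the relaxation times of all the states within a particular ensemble.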
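As a quick check of these definitions, consider a hypothetical two-state system in which state i is left with rate constant $k_1$ and re-entered with rate constant $k_2$ (both symbols are illustrative assumptions, not quantities from the NTL9 model). Taking the relaxation time of state i to be the time integral of $[T_{ii}(t) - P_{\mathrm{eq}}(i)]/[1 - P_{\mathrm{eq}}(i)]$, the calculation can be sketched as:

```latex
\begin{align*}
P_{\mathrm{eq}}(i) &= \frac{k_2}{k_1 + k_2}, \\
T_{ii}(t) &= P_{\mathrm{eq}}(i) + \bigl[1 - P_{\mathrm{eq}}(i)\bigr]\, e^{-(k_1 + k_2) t}, \\
\tau_{\mathrm{relax}}(i) &= \int_0^\infty e^{-(k_1 + k_2) t}\, dt = \frac{1}{k_1 + k_2}.
\end{align*}
```

In this simple case the relaxation time of state i is shorter than both the lifetime $1/k_1$ of the state and the waiting time $1/k_2$ needed to reach it.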
Another important time scale is the MFPT to a state i. The MFPT to state i is the average time that a trajectory takes to reach state i for the first time, with the initial conditions chosen according to the thermodynamic equilibrium populations excluding state i, which can be expressed as
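$$\mathrm{MFPT}(\to i) = \sum_{j \neq i} \frac{P_{\mathrm{eq}}(j)}{1 - P_{\mathrm{eq}}(i)} \int_0^\infty \sum_{k \neq i} T^{\mathrm{abs}\to i}_{kj}(t)\,dt \tag{3}$$
where $T^{\mathrm{abs}\to i}_{kj}(t)$ is the element, for going from state j to state k in time t, of the transition matrix with an absorbing boundary condition at state i, $\mathbf{T}^{\mathrm{abs}\to i}$.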
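The two definitions above can be evaluated directly by numerical quadrature. The following is a minimal sketch, assuming a continuous-time description in which the transition matrix is $\exp(\mathbf{K}t)$ for a rate matrix $\mathbf{K}$; the 3-state rate matrix, the helper names `propagator` and `trapezoid`, and the integration grid are all hypothetical choices made for illustration and are not taken from the NTL9 model.

```python
import numpy as np

# Hypothetical 3-state rate matrix: K[j, i] is the rate constant for i -> j,
# so every column sums to zero and populations evolve as dp/dt = K @ p.
K = np.array([[-1.0,  0.5,  0.0],
              [ 1.0, -0.7,  2.0],
              [ 0.0,  0.2, -2.0]])

def propagator(M, times):
    """Return exp(M t) for every t in times, shape (len(times), n, n)."""
    w, V = np.linalg.eig(M)
    Vinv = np.linalg.inv(V)
    # exp(M t) = V diag(exp(w t)) V^-1, evaluated for all times at once
    return np.einsum("ik,tk,kj->tij", V, np.exp(np.outer(times, w)), Vinv).real

def trapezoid(y, dt):
    """Trapezoid rule on a uniform grid."""
    return dt * (y.sum() - 0.5 * (y[0] + y[-1]))

# Equilibrium populations: solve K @ pi = 0 together with sum(pi) = 1.
n = K.shape[0]
A = np.vstack([K, np.ones((1, n))])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

times = np.linspace(0.0, 200.0, 20001)   # long enough for all decays here
dt = times[1] - times[0]
target = 2                               # state i

# Relaxation time of state i: integral of the normalized decay of T_ii(t).
T = propagator(K, times)
decay = (T[:, target, target] - pi[target]) / (1.0 - pi[target])
tau_relax = trapezoid(decay, dt)

# MFPT to state i: remove row and column i (absorbing boundary), start from
# the equilibrium populations renormalized over the remaining states, and
# integrate the probability of not yet having reached i.
keep = np.arange(n) != target
K_abs = K[np.ix_(keep, keep)]
p0 = pi[keep] / pi[keep].sum()
survival = (propagator(K_abs, times) @ p0).sum(axis=1)
mfpt = trapezoid(survival, dt)

print(f"tau_relax({target}) = {tau_relax:.4f}, MFPT(->{target}) = {mfpt:.4f}")
```

Quadrature on a time grid mirrors the definitions most closely; for larger models the same integrals are usually obtained from linear solves instead.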
Another time scale of the system which can be used to characterize the dynamics is the lifetime of a state. The lifetime of state i is defined as the average time that a trajectory stays in state i during each visit. Note that the probability density $P_i(t)$ for the lifetime distribution of a state i in a Markov State Model (MSM) is an exponential distribution,
$$P_i(t) = \lambda_i e^{-\lambda_i t} \tag{4}$$
where $\lambda_i$ is the sum of all the outgoing rate constants from state i, so that the average lifetime of state i is $\langle t \rangle_i = 1/\lambda_i$.
We consider the following scheme to characterize the dynamics of the unfolded free energy basin of a protein. We choose any state i and follow the motions between state i and all the other states within the unfolded basin, which we label U-i. In general, the distribution of U-i lifetimes is not single exponential. However, we can relate the average lifetimes of the states i and U-i to their corresponding populations:
$$\frac{\langle t \rangle_i}{\langle t \rangle_{U-i}} = \frac{P_{\mathrm{eq}}(i)}{1 - P_{\mathrm{eq}}(i)} \tag{5}$$
In reference [7], we showed that the following relation holds when the equilibration within the unfolded basin is much faster than the folding time:
$$\mathrm{MFPT}(\to i) \approx \langle t \rangle_{U-i} \tag{6}$$
In other words, when the unfolded free energy basin equilibrates rapidly, the MFPT to any state i within the unfolded state ensemble (U) is approximately equal to the lifetime of the collective state formed from all the other states within the unfolded basin excluding state i.
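The lifetimes and relaxation times of the states within the unfolded basin are fundamental time scales which characterize the kinetics within the unfolded basin; many kinetic and thermodynamic quantities can be written in terms of them. For example, the equilibrium conditions can be expressed as:
$$P_{\mathrm{eq}}(i) = 1 - \frac{1}{T}\sum_l t_l = \frac{\langle t \rangle_i}{\langle t \rangle_i + \langle t \rangle_{U-i}} \tag{7}$$
$$1 - P_{\mathrm{eq}}(i) = \frac{1}{T}\sum_l t_l = \frac{\langle t \rangle_{U-i}}{\langle t \rangle_i + \langle t \rangle_{U-i}} \tag{8}$$
where $t_l$ denotes the time that a trajectory stays in state U-i during the lth visit, as schematically illustrated in Fig. 1, and T is the total length of the trajectory.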
We now show that the MFPT to state i can be expressed as a ratio of the second and first moments of the lifetime distribution of the collective state U-i. We consider a very long trajectory of total length T which moves between state i and state U-i many times, as shown schematically in Fig. 1. The MFPT to state i can be calculated by picking a point at random while the trajectory is in state U-i, for example point A in Fig. 1, and clocking the time it takes to get to state i from point A in state U-i. This is repeated many times to build up the passage time distribution. Suppose point A is chosen as the starting point, which as shown in the figure is located during the second visit to U-i with lifetime $t_2$. The contribution of the second visit to the MFPT is equal to the product of the probability of choosing a starting point within the second visit of the trajectory to U-i, which is given by $t_2 / \sum_n t_n$, times the averaged first passage time from point A to state i, which is given by $t_2/2$, since the point A can be located at any point in time along the second visit of the trajectory to state U-i with equal probability. The weighted sum of all possible first passage times to state i from any place along the trajectory while it is in state U-i can then be written:
$$\mathrm{MFPT}(\to i) = \sum_l \frac{t_l}{\sum_n t_n} \cdot \frac{t_l}{2} = \frac{\sum_l t_l^2}{2 \sum_n t_n} \tag{9}$$
where $t_l$ and $t_n$ are the lifetimes of state U-i during the lth and nth visit of the trajectory to state U-i.
Dividing both the numerator and the denominator of the right hand side of eq. 9 by N, which is the total number of visits to state U-i, gives
$$\mathrm{MFPT}(\to i) = \frac{\frac{1}{N}\sum_l t_l^2}{\frac{2}{N}\sum_n t_n} = \frac{\langle t^2 \rangle_{U-i}}{2 \langle t \rangle_{U-i}} \tag{10}$$
Eq. 10 shows that the MFPT to state i can be expressed as one half of the second moment divided by the first moment of the lifetime distribution of state U-i.
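The trajectory argument can be checked with a small sampling experiment. The sketch below is illustrative only: the visit lifetimes are drawn from a hypothetical two-component mixture (chosen so that the U-i lifetime distribution is not single exponential), starting points are placed uniformly along the time spent in U-i, and the average waiting time until the end of the current visit is compared with one half of the second moment divided by the first moment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lifetimes of successive visits to U-i: mostly short visits
# with occasional long ones, so the distribution is not single exponential.
n_visits = 20000
short_visits = rng.exponential(0.5, n_visits)
long_visits = rng.exponential(5.0, n_visits)
lifetimes = np.where(rng.random(n_visits) < 0.8, short_visits, long_visits)

# One half of the second moment divided by the first moment.
mfpt_moments = np.mean(lifetimes**2) / (2.0 * np.mean(lifetimes))

# Direct estimate: concatenate all visits to U-i, pick starting points
# uniformly in time, and clock the time left until the visit ends
# (that is, until the trajectory reaches state i).
visit_ends = np.cumsum(lifetimes)
start_points = rng.uniform(0.0, visit_ends[-1], 200000)
visit_index = np.searchsorted(visit_ends, start_points, side="right")
waiting_times = visit_ends[visit_index] - start_points
mfpt_sampled = waiting_times.mean()

print(f"moment ratio: {mfpt_moments:.3f}, sampled: {mfpt_sampled:.3f}")
```

Because a uniformly chosen time point is more likely to fall inside a long visit, the sampled MFPT exceeds half the mean lifetime whenever the lifetimes are broadly distributed.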
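To write the relaxation times in terms of lifetimes is more complicated. The diagonal transition matrix elements in eq. 1 need to be decomposed into the sum of the contributions from various classes of trajectories, which are associated with different numbers of departures from the starting state, as follows,
$$\begin{aligned} T_{ii}(t) = {} & e^{-\lambda_i t} + \int_0^t d\tau_2 \int_0^{\tau_2} d\tau_1\, \lambda_i e^{-\lambda_i \tau_1}\, P_{U-i}(\tau_2 - \tau_1)\, e^{-\lambda_i (t - \tau_2)} \\ & + \int_0^t d\tau_4 \int_0^{\tau_4} d\tau_3 \int_0^{\tau_3} d\tau_2 \int_0^{\tau_2} d\tau_1\, \lambda_i e^{-\lambda_i \tau_1}\, P_{U-i}(\tau_2 - \tau_1)\, \lambda_i e^{-\lambda_i (\tau_3 - \tau_2)}\, P_{U-i}(\tau_4 - \tau_3)\, e^{-\lambda_i (t - \tau_4)} + \cdots \end{aligned} \tag{11}$$
where $P_{U-i}(\tau)$ represents the probability density that the trajectory leaves state U-i at a time $\tau$ after entering it. The first term in the equation above is the probability that the trajectory never leaves state i within the time t. The second term corresponds to the probability that the trajectory leaves state i at time $\tau_1$ and comes back to state i at time $\tau_2$, then stays in state i through the end of the time t. The third term corresponds to the case that the trajectory leaves state i twice, at times $\tau_1$ and $\tau_3$ respectively, and returns to state i at times $\tau_2$ and $\tau_4$. The remaining terms can be similarly expressed.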
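Laplace transforming eq. 11 and using the convolution theorem [12], we have
$$\tilde{T}_{ii}(s) = \frac{1}{s + \lambda_i} + \frac{\lambda_i \tilde{P}_{U-i}(s)}{(s + \lambda_i)^2} + \frac{\lambda_i^2 \tilde{P}_{U-i}^2(s)}{(s + \lambda_i)^3} + \cdots = \frac{1}{s + \lambda_i} \sum_{n=0}^{\infty} \left[ \frac{\lambda_i \tilde{P}_{U-i}(s)}{s + \lambda_i} \right]^n \tag{12}$$
where s is the complex argument of the Laplace transform. Eq. 12 is a geometric series which can be summed,
$$\tilde{T}_{ii}(s) = \frac{1}{s + \lambda_i - \lambda_i \tilde{P}_{U-i}(s)} \tag{13}$$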
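To expand $\tilde{P}_{U-i}(s)$ in powers of the complex argument s, we use the Taylor expansion about s = 0,
$$\tilde{P}_{U-i}(s) = \int_0^\infty e^{-st} P_{U-i}(t)\,dt = 1 - s \langle t \rangle_{U-i} + \frac{s^2}{2} \langle t^2 \rangle_{U-i} + O(s^3) \tag{14}$$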
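To express the relaxation time of state i in eq. 1 in terms of the lifetimes of state U-i, we evaluate the integral by taking the s → 0 limit,
$$\tau_{\mathrm{relax}}(i) = \int_0^\infty \frac{T_{ii}(t) - P_{\mathrm{eq}}(i)}{1 - P_{\mathrm{eq}}(i)}\,dt = \lim_{s \to 0} \frac{\tilde{T}_{ii}(s) - P_{\mathrm{eq}}(i)/s}{1 - P_{\mathrm{eq}}(i)} \tag{15}$$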
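Substituting $\tilde{T}_{ii}(s)$ and $P_{\mathrm{eq}}(i)$ using eqs. 13, 14 and 7, with $\lambda_i = 1/\langle t \rangle_i$, and ignoring the terms of $O(s^3)$ or higher, leads to the following relation:
$$\left[1 - P_{\mathrm{eq}}(i)\right] \tau_{\mathrm{relax}}(i) = \lim_{s \to 0} \left[ \frac{1}{s\left(1 + \lambda_i \langle t \rangle_{U-i}\right) - \frac{s^2}{2} \lambda_i \langle t^2 \rangle_{U-i}} - \frac{1}{s\left(1 + \lambda_i \langle t \rangle_{U-i}\right)} \right] = \frac{\lambda_i \langle t^2 \rangle_{U-i}}{2\left(1 + \lambda_i \langle t \rangle_{U-i}\right)^2} \tag{16}$$
$$\left[1 - P_{\mathrm{eq}}(i)\right] \tau_{\mathrm{relax}}(i) = \frac{\langle t^2 \rangle_{U-i}}{2 \langle t \rangle_{U-i}} \cdot \frac{\langle t \rangle_i}{\langle t \rangle_i + \langle t \rangle_{U-i}} \cdot \frac{\langle t \rangle_{U-i}}{\langle t \rangle_i + \langle t \rangle_{U-i}} \tag{17}$$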
On the right hand side of eq. 17, the first term is MFPT to state i (eq. 10), the second term is Peq(i) (eq. 7), and the third term is 1 − Peq(i) (eq. 8). Therefore, the general relationship between the MFPT and the relaxation time can be written as,
$$\mathrm{MFPT}(\to i) = \frac{\tau_{\mathrm{relax}}(i)}{P_{\mathrm{eq}}(i)} \tag{18}$$
A similar result was derived by Szabo et al [13, 14] in a different context using a Green’s function approach. The analysis above is exact and general for the relationship between the MFPTs and relaxation times no matter what the shape of the energy landscape is.
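The relation between the MFPT to a state and the relaxation time of that state divided by its equilibrium population can be checked numerically on an arbitrary kinetic network. The sketch below assumes a continuous-time rate-matrix description; the random 6-state rate matrix and the helper names are hypothetical and serve only as an illustration. The time integrals are replaced by standard closed-form linear algebra rather than explicit integration.

```python
import numpy as np

def stationary_distribution(K):
    """Solve K @ pi = 0 with sum(pi) = 1 (columns of K sum to zero)."""
    n = K.shape[0]
    A = np.vstack([K, np.ones((1, n))])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def relaxation_times(K, pi):
    """Integral of [T_ii(t) - pi_i] / (1 - pi_i) with T(t) = exp(K t)."""
    n = K.shape[0]
    Pi = np.outer(pi, np.ones(n))        # every column equals pi
    Z = np.linalg.inv(Pi - K) - Pi       # equals int_0^inf [exp(K t) - Pi] dt
    return np.diag(Z) / (1.0 - pi)

def mfpt_to_state(K, pi, i):
    """MFPT to state i, starting from equilibrium populations excluding i."""
    keep = np.arange(K.shape[0]) != i
    K_abs = K[np.ix_(keep, keep)]        # absorbing boundary at state i
    p0 = pi[keep] / pi[keep].sum()
    # int_0^inf 1^T exp(K_abs t) p0 dt = -1^T K_abs^{-1} p0
    return -np.sum(np.linalg.solve(K_abs, p0))

# Hypothetical random network: rates[j, i] is the rate constant for i -> j.
rng = np.random.default_rng(1)
n_states = 6
rates = rng.uniform(0.05, 2.0, size=(n_states, n_states))
np.fill_diagonal(rates, 0.0)
K = rates - np.diag(rates.sum(axis=0))

pi = stationary_distribution(K)
tau_relax = relaxation_times(K, pi)
mfpt = np.array([mfpt_to_state(K, pi, i) for i in range(n_states)])

for i in range(n_states):
    print(f"state {i}: MFPT = {mfpt[i]:.4f}, "
          f"tau_relax / P_eq = {tau_relax[i] / pi[i]:.4f}")
```

The random rates used here do not satisfy detailed balance, so the agreement of the two columns does not depend on microscopic reversibility.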
We constructed an MSM for the dynamics of NTL9 from a 2.9 millisecond MD trajectory provided by D. E. Shaw Research [15–17].
In Fig. 2B, the spectrum of the implied timescales for NTL9 is shown. With the unmodified equilibrium boundary conditions, there is a major gap between the slowest implied timescale and the other implied timescales, which indicates that the folding is two-state. After applying a reflecting boundary at the folded state (F) [7], the dynamics is restricted to U. As shown in Table I, when a reflecting boundary at F is imposed, the relaxation times within U (~0.4 μs) are much shorter than the slowest implied timescale (~7 μs) obtained with unmodified equilibrium boundary conditions. Together these results show that two-state folding follows a simple paradigm: all the unfolded states pre-equilibrate before the U ensemble folds with single exponential kinetics at the slowest implied timescale, because the equilibration within U is an order of magnitude faster than the folding process.
Table I.
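Protein | Granularity | $T_{\mathrm{fold}}$ (μs) | Slowest implied timescale (μs) | $\tau_{\mathrm{relax}}^{\mathrm{total}}$ in unmodified equilibrium BC (μs) | $\tau_{\mathrm{relax}}^{\mathrm{total}}$ in reflecting BC (μs) |
---|---|---|---|---|---|
NTL9 | 20 | 8.38 | 6.97 | 6.79 | 0.36 |
NTL9 | 100 | 9.96 | 8.27 | 7.52 | 0.76 |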
Rugged NTL9 | 20 | 12.52 | 59.02 | 10.26 | 79.04 |
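For readers unfamiliar with implied timescales, the standard MSM relation $t_k = -\tau_{\mathrm{lag}} / \ln \mu_k$ between the eigenvalues $\mu_k$ of a transition matrix at lag time $\tau_{\mathrm{lag}}$ and the implied timescales can be sketched as follows. The 3-state transition matrix, the lag time, and the function name are hypothetical and are not the NTL9 model; the example only illustrates how a gap in the spectrum separates one slow process from the faster ones.

```python
import numpy as np

def implied_timescales(T, lag):
    """Implied timescales -lag / ln(mu_k) from the non-stationary eigenvalues.

    Assumes real, positive eigenvalues, as expected for a reversible
    transition matrix estimated at a sufficiently long lag time.
    """
    mu = np.sort(np.linalg.eigvals(T).real)[::-1]
    return -lag / np.log(mu[1:])          # skip the stationary eigenvalue 1

# Hypothetical column-stochastic transition matrix: states 0 and 1 exchange
# quickly (an "unfolded" pair), state 2 exchanges slowly with both.
T = np.array([[0.70, 0.29, 0.01],
              [0.29, 0.70, 0.01],
              [0.01, 0.01, 0.98]])
lag = 1.0                                 # lag time in arbitrary units

timescales = implied_timescales(T, lag)
print("implied timescales:", timescales)
print("gap ratio:", timescales[0] / timescales[1])
```

A large ratio between the slowest and the next implied timescale is the signature of two-state behavior discussed in the text.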
Extremely long MFPTs (~1 ms) between different regions within the unfolded state ensemble are observed in NTL9 [4]; they are orders of magnitude longer than the interconversion time between unfolded and folded states. This observation was attributed to non-native interactions and the enormity of conformational space. Using eq. 18 we can reconcile the apparent contradiction between the long MFPTs between different regions of the NTL9 unfolded state ensemble and the rapid equilibration within the unfolded state ensemble [2, 5, 18–20]. There are two contributions to the MFPT to state i. First, the MFPT is proportional to the relaxation time of state i, which depends on the structure of the entire free energy landscape of the unfolded basin. Second, the MFPT to state i is inversely proportional to the equilibrium population of state i. The fact that many mini-proteins, including NTL9, exhibit fast equilibration within the unfolded free energy basin is hidden by the extremely long MFPTs between regions of the unfolded state ensemble with small populations. For NTL9, long MFPTs to the unfolded states are mainly due to their small equilibrium populations, even though all of the relaxation times within U are very short (~400 ns) under reflecting boundary conditions.
The kinetic hub model of folding also emphasizes topological features of the folding problem, namely that the native state has high network connectivity, hence the use of the term “hub” [21–23]. While the theory we present focuses on kinetic aspects of protein folding, the kinetic and topological features are not completely separate. That the total relaxation times within the unfolded basin of NTL9 and other mini-proteins are much shorter than the folding time suggests that on these short time scales, the first passage paths between any two unfolded states are direct. This is indeed what we found. Therefore, whether or not the connectivity appears to be hub-like depends on the time window under consideration. At short times, of the order of the relaxation time of the unfolded state ensemble, the connections between any two unfolded states are direct, while at longer times the folded state acts as a hub.
We now clarify the relationship between the relaxation times and lifetimes in the case of rapid equilibration. Combining eqs. 17 and 18 and considering that the resolution of the MSM is fine-grained, so that the populations of individual unfolded states are relatively small, an approximate expression can be written,
$$\tau_{\mathrm{relax}}(i) \approx \frac{\langle t^2 \rangle_{U-i}}{2 \langle t \rangle_{U-i}^2}\, \langle t \rangle_i \tag{19}$$
where, as above, $\langle t \rangle_{U-i}$ and $\langle t^2 \rangle_{U-i}$ are the first and second moments of the lifetime distribution of state U-i and $\langle t \rangle_i$ is the average lifetime of state i. In the case of rapid equilibration, the lifetime $t_l$ of state U-i obeys an exponential distribution, for which $\langle t^2 \rangle_{U-i} = 2 \langle t \rangle_{U-i}^2$, and therefore the coefficient of the lifetime ($\langle t^2 \rangle_{U-i} / 2 \langle t \rangle_{U-i}^2$) is 1. So in the limit of rapid equilibration within the unfolded free energy basin, the relaxation time of a state is approximately equal to its lifetime. When the condition of rapid equilibration doesn’t hold, the coefficient of the lifetime gets much larger than 1, so that the relaxation time becomes much longer than the lifetime.
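The growth of the lifetime coefficient when an internal barrier slows equilibration can be illustrated on a toy model. The sketch below assumes a hypothetical 4-state linear chain 0-1-2-3 with target state i = 0 and an adjustable rate `k_barrier` between states 1 and 2; none of these values come from the NTL9 model. The moments of the U-i lifetime distribution are obtained from the rate matrix restricted to U-i, and the relaxation time is computed independently from its definition. Without the small-population approximation, combining eqs. 17 and 18 gives the relaxation time as the coefficient times the lifetime of state i times $1 - P_{\mathrm{eq}}(i)$, which is what the code compares.

```python
import numpy as np

def chain_rate_matrix(k_barrier):
    """Hypothetical linear chain 0-1-2-3; K[j, i] is the rate for i -> j."""
    K = np.zeros((4, 4))
    for a, b, k in [(0, 1, 1.0), (1, 2, k_barrier), (2, 3, 1.0)]:
        K[b, a] = k
        K[a, b] = k
    K -= np.diag(K.sum(axis=0))          # columns sum to zero
    return K

def lifetime_analysis(K, i):
    """Lifetime moments of U-i and the relaxation time of state i."""
    n = K.shape[0]
    keep = np.arange(n) != i
    lam = -K[i, i]                       # total outgoing rate from state i
    t_i = 1.0 / lam                      # average lifetime of state i
    entry = K[keep, i] / lam             # where the trajectory enters U-i
    K_abs = K[np.ix_(keep, keep)]        # dynamics inside U-i until return to i
    x = np.linalg.solve(K_abs, entry)
    t1 = -x.sum()                                 # first moment of U-i lifetime
    t2 = 2.0 * np.linalg.solve(K_abs, x).sum()    # second moment
    coeff = t2 / (2.0 * t1**2)

    # Relaxation time from its definition, via the closed-form time integral.
    A = np.vstack([K, np.ones((1, n))])
    pi, *_ = np.linalg.lstsq(A, np.append(np.zeros(n), 1.0), rcond=None)
    Pi = np.outer(pi, np.ones(n))
    Z = np.linalg.inv(Pi - K) - Pi
    tau_relax = Z[i, i] / (1.0 - pi[i])
    return {"t_i": t_i, "t1": t1, "coeff": coeff,
            "p_eq": pi[i], "tau_relax": tau_relax}

smooth = lifetime_analysis(chain_rate_matrix(1.0), i=0)
rugged = lifetime_analysis(chain_rate_matrix(0.01), i=0)

for name, r in [("smooth", smooth), ("rugged", rugged)]:
    exact = r["coeff"] * r["t_i"] * (1.0 - r["p_eq"])
    print(f"{name}: coefficient = {r['coeff']:.3f}, "
          f"tau_relax = {r['tau_relax']:.3f}, "
          f"coefficient * lifetime * (1 - P_eq) = {exact:.3f}")
```

In this toy chain the average U-i lifetime is the same in both cases, so the larger coefficient in the rugged case comes entirely from the broader lifetime distribution.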
In Fig. 2A we compare MFPTs to the NTL9 unfolded states using the exact formula for the MFPTs expressed in terms of relaxation times (eq. 18) with the corresponding results using the approximate expression involving lifetimes (eqs. 5 and 6). We see that the exact and approximate results agree very well with each other, because NTL9 is a two-state folder. What happens when fast equilibration within U no longer holds? To examine this case, we introduced an internal barrier in the unfolded state ensemble. As expected (eq. 19), the relaxation times can become much larger than the lifetimes in the case of slow equilibration within U, and the green circles in Fig. 2A deviate from the reference MFPTs. Furthermore, Fig. 2C and the last row of Table I show that the total relaxation time within U is even longer than the folding time for the rugged NTL9. These results confirm that the relaxation times within the unfolded free energy basin reflect the ruggedness of the free energy landscape.
In this Letter we have derived a general expression which relates the MFPT to any state to the corresponding relaxation time of that state. First, the MFPT is proportional to the relaxation time ($\tau_{\mathrm{relax}}(i)$) of that state, which reflects the degree of ruggedness of the unfolded basin. Second, the MFPT to any state i is inversely proportional to the equilibrium population of that state. For mini-proteins which follow two-state folding, the generally long MFPTs to unfolded states reflect the small equilibrium populations of the target states, which depend on the resolution of the model, while the relaxation times of these unfolded states are short due to the smooth landscape of the unfolded basin. Finally, we note that our results are also consistent with a recent study [24] which showed that non-native interactions play no role in determining the folding mechanism of a two-state folder.
Acknowledgments
This work has been supported by a grant from the National Institutes of Health (GM30580). W. D. would like to thank Dr. Bin W. Zhang for very helpful discussions.
Contributor Information
Wei Dai, Department of Physics and Astronomy, Rutgers the State University of New Jersey, Piscataway, NJ 08854.
Anirvan M. Sengupta, Department of Physics and Astronomy, Rutgers the State University of New Jersey, Piscataway, NJ 08854
Ronald M. Levy, Center for Biophysics and Computational Biology and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122-1801 and Department of Chemistry, Temple University, Philadelphia, PA 19122-1801
References
- 1. Dill KA, MacCallum JL. Science. 2012;338:1042. doi: 10.1126/science.1219021.
- 2. Wolynes P, Onuchic J, Thirumalai D. Science. 1995;267:1619. doi: 10.1126/science.7886447.
- 3. Karplus M. Nat Chem Biol. 2011;7:401. doi: 10.1038/nchembio.565.
- 4. Bowman GR, Pande VS. Proc Natl Acad Sci USA. 2010;107:10890. doi: 10.1073/pnas.1003962107.
- 5. Bryngelson JD, Onuchic JN, Socci ND, Wolynes PG. Proteins: Struct, Funct, Bioinf. 1995;21:167. doi: 10.1002/prot.340210302.
- 6. Haq O, Andrec M, Morozov AV, Levy RM. PLoS Comput Biol. 2012;8:e1002675. doi: 10.1371/journal.pcbi.1002675.
- 7. Levy RM, Dai W, Deng NJ, Makarov DE. Protein Sci. 2013;22:1459. doi: 10.1002/pro.2335.
- 8. Wolynes PG. Q Rev Biophys. 2005;38:405. doi: 10.1017/S0033583505004075.
- 9. Lane TJ, Schwantes CR, Beauchamp KA, Pande VS. J Chem Phys. 2013;139:145104. doi: 10.1063/1.4823502.
- 10. Ellison PA, Cavagnero S. Protein Sci. 2006;15:564. doi: 10.1110/ps.051758206.
- 11. Deng NJ, Dai W, Levy RM. J Phys Chem B. 2013;117:12787. doi: 10.1021/jp401962k.
- 12. Schiff JL. The Laplace Transform: Theory and Applications. Springer; New York: 1999.
- 13. Bicout DJ, Szabo A. J Chem Phys. 1997;106:10292.
- 14. Szabo A, Schulten K, Schulten Z. J Chem Phys. 1980;72:4350.
- 15. Bowman GR, Beauchamp KA, Boxer G, Pande VS. J Chem Phys. 2009;131:124101. doi: 10.1063/1.3216567.
- 16. Lindorff-Larsen K, Piana S, Dror RO, Shaw DE. Science. 2011;334:517. doi: 10.1126/science.1208351.
- 17. Beauchamp KA, Bowman GR, Lane TJ, Maibaum L, Haque IS, Pande VS. J Chem Theory Comput. 2011;7:3412. doi: 10.1021/ct200463m.
- 18. Socci ND, Onuchic JN, Wolynes PG. J Chem Phys. 1996;104:5860.
- 19. Onuchic JN, Luthey-Schulten Z, Wolynes PG. Annu Rev Phys Chem. 1997;48:545. doi: 10.1146/annurev.physchem.48.1.545.
- 20. Dill KA, Chan HS. Nat Struct Biol. 1997;4:10. doi: 10.1038/nsb0197-10.
- 21. Lane TJ, Pande VS. J Phys Chem B. 2012;116:6764. doi: 10.1021/jp212332c.
- 22. Dickson A, Brooks CL. J Chem Theory Comput. 2012;8:3044. doi: 10.1021/ct300537s.
- 23. Rao F, Caflisch A. J Mol Biol. 2004;342:299. doi: 10.1016/j.jmb.2004.06.063.
- 24. Best RB, Hummer G, Eaton WA. Proc Natl Acad Sci USA. 2013;110:17874. doi: 10.1073/pnas.1311599110.