Abstract
The reactive flux between folded and unfolded states of a two-state protein, whose coarse-grained dynamics is described by a master equation, is expressed in terms of the commitment or splitting probabilities of the microstates in the bottleneck region. This allows one to determine how much each transition through a dividing surface contributes to the reactive flux. By repeating the analysis for a series of dividing surfaces or, alternatively, by partitioning the reactive flux into contributions of unidirectional pathways that connect reactants and products, insight can be gained into the mechanism of protein folding. Our results for the flux in a network with complex connectivity, obtained using the discrete counterpart of Kramers’ theory of activated rate processes, show that the number of reactive transitions is typically much smaller than the total number of transitions that cross a dividing surface at equilibrium.
INTRODUCTION
Discrete coarse-graining of the dynamics is a powerful approach to studying protein folding and other molecular processes that occur on time scales inaccessible to current atomistic simulations.1–17 Configuration space is first divided into appropriately defined discrete microstates, and then the dynamics is modeled using a master equation that describes stochastic transitions among these microstates. The required rate coefficients can be extracted using a variety of strategies, for instance by monitoring transitions between coarse-grained states in molecular simulations.7–17
Here we establish the relation between the macroscopic kinetics and the underlying microscopic dynamics on a network with complex connectivity (Fig. 1). The focus is on providing a theoretical framework to unravel mechanisms of protein folding when the dynamics is described by a master equation. However, our results are applicable to any discrete kinetic system, such as those that arise in systems biology or the modeling of enzyme catalysis.
One of the main results of this paper is a remarkably simple expression for the reactive flux in terms of the commitment or splitting probabilities, pfold, of microstates in the bottleneck region that separates reactants from products. The pfold of a microstate is defined as the probability that a protein starting from this microstate folds before it unfolds. While this result simplifies the calculation of the rate, its primary importance is that it gives the contribution to the reactive flux of each transition through a dividing surface located in the bottleneck region. By repeating such an analysis for a series of dividing surfaces or from the decomposition of the reactive flux into a sum of contributions of unidirectional pathways, one can gain insight into the mechanism of protein folding.
Our expression for the reactive flux is applicable to discrete dynamics with complex connectivity of the microstates (Fig. 1) and is not based on even an implicit assumption that low-dimensional reaction coordinates exist. We shall show that in general the reactive flux is much smaller than the upper bound given by transition state theory (TST). In TST, which ignores recrossings, the reactive flux is approximated by the number of transitions per unit time that cross a dividing surface in one direction. Our expression for the reactive flux shows that the TST contribution of every transition i→j that crosses the dividing surface is reduced by the difference in pfold's of microstates j and i. When transitions occur between microstates with similar values of pfold, this difference is much smaller than unity.
The commitment or splitting probability, pfold, plays a key role in our analysis. pfold(i) can be found from trajectories obtained in simulations or in state-resolved single-molecule experiments by counting the fraction of trajectory fragments starting in microstate i that reach the fully folded state before the unfolded state. The importance of this quantity appears to have been first recognized by Ryter18 who suggested using the stochastic separatrix (the locus of the phase space points for which the splitting probability is equal to 1∕2) as the boundary separating reactants from products. This idea has been exploited in the theory of chemical reactions19 and was introduced into the protein-folding literature by Du et al.20 who defined the transition state ensemble (TSE) as the set of microstates with pfold close to 1∕2. In this paper we shall derive the expression for the reactive flux in terms of pfold using the discrete analogs of the Kramers21 and Chandler22 approaches.
THEORY
Our analysis is based on the master equation that describes the evolution of the probability pi(t) of finding the protein in microstate i at time t, i=1,2,…,N,
$$\frac{dp_i(t)}{dt}=\sum_{j=1}^{N}K_{ij}\,p_j(t) \tag{1}$$
The ij-th element of the N×N rate matrix K, Kij, is the rate constant for transition j→i (i≠j) and Kjj=−∑i≠jKij. The equilibrium probability of finding the protein in microstate i, peq(i), is the solution to Kpeq=0 with the normalization $\sum_i p_{\mathrm{eq}}(i)=1$. The elements of the rate matrix satisfy the condition of detailed balance, Kijpeq(j)=Kjipeq(i), which follows from the requirement that there is no net flux between microstates at equilibrium.
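The properties of the rate matrix listed above can be checked numerically on a toy network. The following is a minimal sketch, not a model taken from this work: the free energies `G`, the edge list, and the symmetric rate form Kij=exp[−(Gi−Gj)∕2] are all assumptions chosen only because they satisfy detailed balance with peq(i)∝exp(−Gi).

```python
import math

# Hypothetical 4-microstate network: free energies (units of kT) and connections.
G = [0.0, 3.0, 3.5, -1.0]
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
N = len(G)

# Boltzmann populations, normalized so that sum_i peq(i) = 1.
Z = sum(math.exp(-g) for g in G)
peq = [math.exp(-g) / Z for g in G]

# K[i][j] is the rate constant for the transition j -> i (assumed rate model).
K = [[0.0] * N for _ in range(N)]
for a, b in edges:
    K[a][b] = math.exp(-(G[a] - G[b]) / 2.0)
    K[b][a] = math.exp(-(G[b] - G[a]) / 2.0)
for j in range(N):
    K[j][j] = -sum(K[i][j] for i in range(N) if i != j)  # K_jj = -sum_{i != j} K_ij

# Residual of K peq = 0 and the largest violation of detailed balance.
residual = [sum(K[i][j] * peq[j] for j in range(N)) for i in range(N)]
db_violation = max(abs(K[i][j] * peq[j] - K[j][i] * peq[i])
                   for i in range(N) for j in range(N))
print(residual, db_violation)
```

Both quantities vanish to rounding error, which is what one expects for any rate matrix built from symmetric equilibrium fluxes Kijpeq(j).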
We shall assume that the structure of the rate matrix, K, is such that folding can be accurately described by a phenomenological kinetic scheme
$$\mathrm{U}\;\underset{k_U}{\overset{k_F}{\rightleftharpoons}}\;\mathrm{F} \tag{2}$$
where kF and kU are the folding and unfolding rate constants for transitions between unfolded (U) and folded (F) states of the protein. According to this scheme, the reactive flux in the U→F direction at time t is kFfU(t), where fU(t) is the population of the unfolded state at time t. At equilibrium this flux, which is the number of U→F transitions per unit time, is equal to the reactive flux in the opposite (F→U) direction. Both fluxes, denoted by J, are given by
$$J=k_F f_U^{\mathrm{eq}}=k_U f_F^{\mathrm{eq}} \tag{3}$$
where $f_F^{\mathrm{eq}}$ and $f_U^{\mathrm{eq}}$ are the fractional populations of the folded and unfolded states of the protein at equilibrium.
For the kinetics to be described by the scheme in Eq. 2, the majority of the protein population must be localized in two basins separated by a narrow bottleneck, so that the intrabasin relaxation occurs much faster than the interbasin exchange, and thus kinetics of equilibration is essentially single exponential. In such a system all microstates can be separated into three groups: definitely unfolded (UU), definitely folded (FF), and intermediate (I) or bottleneck microstates (Fig. 1). There is some freedom in the choice of the bottleneck boundaries (two dashed lines in Fig. 1) since the reactive flux is rather insensitive to their precise location. Although the equilibrium population of the I-microstates is small, ∑i∊Ipeq(i)⪡∑i∊Lpeq(i), L=FF,UU, these microstates play an important role since all transitions between the UU- and FF-basins occur only through the bottleneck, UU⇔I⇔FF (i.e., we assume that there are no direct transitions between FF- and UU-basins, Kij=Kji=0, i∊UU,j∊FF).
To establish the relationship between the reactive flux at equilibrium and the discrete protein dynamics, we follow Kramers and consider a steady state in which all trajectories entering the FF-basin from the bottleneck are instantly returned to the UU-basin to maintain local equilibrium in this basin. The resulting steady-state distribution, pss(i), satisfies
$$\sum_{j}K_{ij}\,p_{\mathrm{ss}}(j)=0,\qquad i\in I \tag{4}$$
subject to the boundary conditions pss(i)=peq(i) if i∊UU and pss(i)=0 if i∊FF. The reactive flux J is approximated by the steady-state flux through an arbitrary dividing surface Σ in the bottleneck region, which is necessarily crossed by any trajectory connecting the UU- and FF-basins,
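$$J=\sum_{i\in F^*,\,j\in U^*}\left[K_{ij}\,p_{\mathrm{ss}}(j)-K_{ji}\,p_{\mathrm{ss}}(i)\right] \tag{5}$$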
where F∗ and U∗ denote the sets of the microstates in the I region located on the F- and U-sides of Σ that are connected by a single transition.
The solution to Eq. 4 can be written in terms of peq(i) and the splitting probability, pfold(i), defined as the probability of reaching the FF-basin before reaching the UU-basin starting from microstate i. For i∊FF, pfold(i)=1, while for i∊UU, pfold(i)=0. The remaining pfold's, as indicated in Ref. 23, can be found by solving the discrete version of Onsager’s equation24 for the splitting probability,
$$\sum_{i}p_{\mathrm{fold}}(i)\,K_{ij}=0,\qquad j\in I \tag{6}$$
This equation can be derived by considering the relation between the pfold of microstate j and those of microstates that are connected to it by a single transition (i.e., microstates i for which Kij≠0). After one transition from microstate j, the protein ends up in microstate i with probability Kij∕∑l≠jKlj. From the probability conservation it then follows that
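$$p_{\mathrm{fold}}(j)=\frac{\sum_{i\neq j}p_{\mathrm{fold}}(i)\,K_{ij}}{\sum_{l\neq j}K_{lj}},\qquad j\in I \tag{7}$$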
Since ∑l≠jKlj=−Kjj, Eq. 7 is equivalent to Eq. 6.
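The one-step relation pfold(j)=∑i≠jpfold(i)Kij∕∑l≠jKlj also suggests a simple way to compute the splitting probabilities: substitute repeatedly until nothing changes. The sketch below assumes a hypothetical four-microstate linear chain with unit rates; the function name `pfold_by_iteration` and the Gauss-Seidel sweep are illustrative choices, and a direct linear solve of Eq. 6 would serve equally well.

```python
def pfold_by_iteration(K, UU, FF, tol=1e-13, max_sweeps=100000):
    """Iterate the one-step relation in place until the largest update is below tol."""
    n = len(K)
    q = [1.0 if i in FF else 0.0 for i in range(n)]  # boundary values: 1 in FF, 0 in UU
    for _ in range(max_sweeps):
        biggest = 0.0
        for j in range(n):
            if j in UU or j in FF:
                continue  # boundary values stay fixed
            escape = sum(K[l][j] for l in range(n) if l != j)  # equals -K_jj
            new = sum(q[i] * K[i][j] for i in range(n) if i != j) / escape
            biggest = max(biggest, abs(new - q[j]))
            q[j] = new
        if biggest < tol:
            break
    return q

# Hypothetical linear chain 0 - 1 - 2 - 3 with unit rates; K[i][j] is the rate j -> i.
K = [[-1.0, 1.0, 0.0, 0.0],
     [1.0, -2.0, 1.0, 0.0],
     [0.0, 1.0, -2.0, 1.0],
     [0.0, 0.0, 1.0, -1.0]]
q = pfold_by_iteration(K, UU={0}, FF={3})

# Residual of sum_i pfold(i) K_ij = 0 for the two bottleneck microstates.
residual = [sum(q[i] * K[i][j] for i in range(4)) for j in (1, 2)]
print(q, residual)
```

For this chain the splitting probability grows linearly along the chain, so the two bottleneck microstates have pfold equal to 1∕3 and 2∕3.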
Comparing Eqs. 4, 6 and using the condition of detailed balance, Kijpeq(j)=Kjipeq(i), it can be shown that
$$p_{\mathrm{ss}}(i)=\left[1-p_{\mathrm{fold}}(i)\right]p_{\mathrm{eq}}(i) \tag{8}$$
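The intermediate step can be spelled out. The following sketch checks that the product form pss(i)=[1−pfold(i)]peq(i) satisfies the steady-state condition ∑jKijpss(j)=0 for i∊I. It uses only detailed balance, the fact that the columns of K sum to zero, and the equation for pfold written with the indices i and j interchanged.

```latex
\begin{align*}
\sum_j K_{ij}\,[1-p_{\mathrm{fold}}(j)]\,p_{\mathrm{eq}}(j)
  &= p_{\mathrm{eq}}(i)\sum_j K_{ji}\,[1-p_{\mathrm{fold}}(j)]
  && \text{(detailed balance)}\\
  &= p_{\mathrm{eq}}(i)\Big[\underbrace{\sum_j K_{ji}}_{=\,0}
     \;-\;\underbrace{\sum_j p_{\mathrm{fold}}(j)\,K_{ji}}_{=\,0\ \text{for}\ i\in I}\Big]\\
  &= 0 .
\end{align*}
```

The boundary conditions follow directly: pfold(i)=0 in the UU-basin gives pss(i)=peq(i), and pfold(i)=1 in the FF-basin gives pss(i)=0.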
Substituting this solution for pss(i) into Eq. 5 and again using the condition of detailed balance, we obtain one of the main results of this paper,
$$J=\sum_{i\in F^*,\,j\in U^*}K_{ij}\,p_{\mathrm{eq}}(j)\left[p_{\mathrm{fold}}(i)-p_{\mathrm{fold}}(j)\right] \tag{9}$$
which is the expression for the reactive flux when the underlying dynamics involves stochastic transitions on a network of microstates with complex connectivity (Fig. 1). One can use this expression to find the folding and unfolding rate constants by means of the flux-over-population formula, $k_F=J/f_U^{\mathrm{eq}}$ and $k_U=J/f_F^{\mathrm{eq}}$ [see Eq. 3]. In the special case when the microstates correspond to lattice points in a multidimensional space and the transitions occur only between the nearest neighbors, the continuum limit of Eq. 9 has been obtained by E and Vanden-Eijnden.25 We now rederive the result in Eq. 9 by another method.
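As a numerical illustration, the flux J=∑Kijpeq(j)[pfold(i)−pfold(j)], summed over connected pairs with i on the F side and j on the U side, can be evaluated on a small network. Everything specific in this sketch is an assumption made for illustration: the five-microstate network, its free energies, the rate model, and the identification of the unfolded and folded populations with peq of the single UU and FF microstates.

```python
import math

# Hypothetical network: 0 is the UU-basin, 4 is the FF-basin, 1-3 form the bottleneck I.
G = [0.0, 4.0, 4.5, 4.0, -1.0]  # free energies in units of kT (assumed)
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
UU, FF = {0}, {4}
n = len(G)

Z = sum(math.exp(-g) for g in G)
peq = [math.exp(-g) / Z for g in G]

# Assumed rate model obeying detailed balance; K[i][j] is the rate for j -> i.
K = [[0.0] * n for _ in range(n)]
for a, b in edges:
    K[a][b] = math.exp(-(G[a] - G[b]) / 2.0)
    K[b][a] = math.exp(-(G[b] - G[a]) / 2.0)
for j in range(n):
    K[j][j] = -sum(K[i][j] for i in range(n) if i != j)

# Splitting probabilities by repeated substitution into
# pfold(j) = sum_{i != j} pfold(i) K_ij / (-K_jj), with fixed boundary values.
pfold = [1.0 if i in FF else 0.0 for i in range(n)]
for _ in range(5000):
    for j in range(n):
        if j not in UU and j not in FF:
            pfold[j] = sum(pfold[i] * K[i][j] for i in range(n) if i != j) / -K[j][j]

def reactive_flux(F_side):
    """Reactive flux through the surface that puts the microstates in F_side on the F side."""
    return sum(K[i][j] * peq[j] * (pfold[i] - pfold[j])
               for i in F_side for j in range(n) if j not in F_side)

J_near_U = reactive_flux({1, 2, 3, 4})  # surface just outside the UU-basin
J_middle = reactive_flux({3, 4})        # a surface cutting through the bottleneck
J_near_F = reactive_flux({4})           # surface just outside the FF-basin

k_F = J_near_U / peq[0]  # flux over population (unfolded population taken as peq[0])
k_U = J_near_U / peq[4]  # folded population taken as peq[4]
print(J_near_U, J_middle, J_near_F, k_F, k_U)
```

The three surfaces give the same flux to rounding error, so in this sketch the rate constants do not depend on where the surface is drawn.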
To obtain this expression using Chandler’s approach,22 consider the “number” correlation function CF(t)=⟨IF(t)IF(0)⟩, where IF is unity when the protein is in one of the microstates on the F side of Σ and zero otherwise. In the two-state phenomenological description, Eq. 2, this correlation function is $C_F(t)=f_F^{\mathrm{eq}}\{f_F^{\mathrm{eq}}+f_U^{\mathrm{eq}}\exp[-(k_F+k_U)t]\}$, so that minus its time derivative at t=0 is equal to the reactive flux, $-dC_F(t)/dt|_{t=0}=k_U f_F^{\mathrm{eq}}=J$. In the framework of the more detailed coarse-grained description, Eq. 1, this correlation function is given by CF(t)=∑i,j∊FGij(t)peq(j), where Gij(t) is the probability of finding the protein in microstate i at time t, given that it starts from microstate j at t=0. Using the evolution equation, dGij(t)∕dt=∑nKinGnj(t) with the initial condition Gij(0)=δij, the time derivative of the correlation function can be written as
$$\frac{dC_F(t)}{dt}=\sum_{i,j\in F}\sum_{n}K_{in}\,G_{nj}(t)\,p_{\mathrm{eq}}(j) \tag{10}$$
Using the relations Gij(t)peq(j)=Gji(t)peq(i) and Kijpeq(j)=Kjipeq(i), which follow from the condition of detailed balance, and the fact that ∑i∊FKij=−∑i∊UKij, one can recast Eq. 10 as
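$$-\frac{dC_F(t)}{dt}=\sum_{i\in F^*,\,n\in U^*}K_{in}\,p_{\mathrm{eq}}(n)\Big[\sum_{j\in F}G_{ji}(t)-\sum_{j\in F}G_{jn}(t)\Big] \tag{11}$$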
where i and n are “connected” microstates located on opposite sides of the dividing surface. As t→0, −dCF(t)∕dt reduces to the TST estimate for the reactive flux through the dividing surface Σ,
$$J_{\mathrm{TST}}=\sum_{i\in F^*,\,n\in U^*}K_{in}\,p_{\mathrm{eq}}(n) \tag{12}$$
To get a better estimate of the reactive flux, consider −dCF(t)∕dt on a timescale that is much smaller than the mean interbasin equilibration time, (kF+kU)−1, but much larger than the time spent in the bottleneck region by a protein starting in a microstate near the dividing surface. On such a time scale, once the protein reaches the FF- or UU-basin, it does not return to the bottleneck region, so these basins are effectively absorbing. This allows us to approximate the sums in Eq. 11 as: ∑j∊FGji(t)≈∑j∊FFGji(t)≈pfold(i), i∊U∗ or F∗. As a result, −dCF(t)∕dt in Eq. 11 reduces to the reactive flux in Eq. 9.
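The size of the recrossing correction can be seen in a deliberately simple case. The sketch below assumes a hypothetical linear chain of equally populated microstates with a uniform hopping rate, for which the splitting probability is linear in the microstate index. It compares the one-way (TST) flux Kijpeq(j) across each bond with the same term multiplied by pfold(i)−pfold(j).

```python
from fractions import Fraction

n = 11                      # microstates 0..10; 0 plays the role of UU and 10 of FF
k = Fraction(1)             # uniform hopping rate between neighbors (assumed)
peq = [Fraction(1, n)] * n  # flat landscape: equal populations (assumed)
pfold = [Fraction(i, n - 1) for i in range(n)]  # linear profile

# Check that the linear profile solves sum_i pfold(i) K_ij = 0 for the interior states,
# where each interior state j has K_jj = -2k and two neighbors with rate k.
residual = [k * pfold[j - 1] + k * pfold[j + 1] - 2 * k * pfold[j] for j in range(1, n - 1)]

ratios = []
for cut in range(n - 1):    # dividing surface between microstates cut and cut + 1
    J_tst = k * peq[cut]                                       # one-way flux, no recrossing correction
    J_reactive = k * peq[cut] * (pfold[cut + 1] - pfold[cut])  # same term weighted by the pfold difference
    ratios.append(J_reactive / J_tst)
print(ratios)
```

Every bond gives the same ratio, 1∕(n−1), so lengthening the bottleneck makes the TST estimate progressively worse in this toy model.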
One can also obtain the expression for the reactive flux, Eq. 9, using the following simple argument. Consider an equilibrium ensemble of identical proteins. Each protein in the ensemble can be in one of the two basins or in the bottleneck. We denote the fraction of proteins in microstate i,i∊I, that entered the bottleneck from the UU-basin by νU(i). The reactive flux is the average number of trajectories that go from the UU-basin to the FF-basin per unit time. The forward (i.e., in the U→F direction) flux through Σ, Jf(Σ), due to such trajectories is given by Jf(Σ)=∑i∊F∗,j∊U∗pfold(i)Kijpeq(j)νU(j). Since a reactive trajectory can cross Σ several times, the reactive flux is Jf(Σ) minus the flux due to backward transitions of these trajectories through Σ, Jb(Σ)=∑i∊F∗,j∊U∗pfold(j)Kjipeq(i)νU(i). Thus, the reactive flux is
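$$J=J_f(\Sigma)-J_b(\Sigma)=\sum_{i\in F^*,\,j\in U^*}\left[p_{\mathrm{fold}}(i)\,K_{ij}\,p_{\mathrm{eq}}(j)\,\nu_U(j)-p_{\mathrm{fold}}(j)\,K_{ji}\,p_{\mathrm{eq}}(i)\,\nu_U(i)\right] \tag{13}$$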
From the time-reversal symmetry of the underlying microscopic dynamics it follows that, at equilibrium, for any trajectory fragment that goes from microstate j to microstate i in time t, there is a counterpart going in the opposite direction. As a consequence, the fraction of proteins in microstate i, i∊I, that entered the bottleneck from the UU-basin, νU(i), is the same as the fraction of proteins that start from this microstate and reach the UU-basin before reaching the FF-basin, so that
$$\nu_U(i)=1-p_{\mathrm{fold}}(i) \tag{14}$$
Using this relation and the condition of detailed balance, Kijpeq(j)=Kjipeq(i), in Eq. 13, we recover our result for the reactive flux in Eq. 9.
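The algebra behind this last step is short. As a sketch, write the reactive flux as the forward flux Jf(Σ) minus the backward flux Jb(Σ) defined above, replace νU by one minus the splitting probability, and use detailed balance to factor out Kijpeq(j):

```latex
\begin{align*}
J &= \sum_{i\in F^*,\,j\in U^*}\Big\{p_{\mathrm{fold}}(i)\,K_{ij}\,p_{\mathrm{eq}}(j)\,[1-p_{\mathrm{fold}}(j)]
      -p_{\mathrm{fold}}(j)\,K_{ji}\,p_{\mathrm{eq}}(i)\,[1-p_{\mathrm{fold}}(i)]\Big\}\\
  &= \sum_{i\in F^*,\,j\in U^*}K_{ij}\,p_{\mathrm{eq}}(j)\,
      \Big\{p_{\mathrm{fold}}(i)\,[1-p_{\mathrm{fold}}(j)]-p_{\mathrm{fold}}(j)\,[1-p_{\mathrm{fold}}(i)]\Big\}\\
  &= \sum_{i\in F^*,\,j\in U^*}K_{ij}\,p_{\mathrm{eq}}(j)\,\big[p_{\mathrm{fold}}(i)-p_{\mathrm{fold}}(j)\big].
\end{align*}
```

The quadratic terms pfold(i)pfold(j) cancel, which is why only the difference of splitting probabilities survives.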
DISCUSSION
As the simplest example, consider the three-state scheme U⇔I⇔F, where kXY is the rate constant for the transition Y→X with X,Y=U,I,F. When the population of I is small, the rates kF and kU can be obtained from the three rate equations by setting dI∕dt=0. For the folding rate this procedure yields kF=kFIkIU∕(kUI+kFI). If one chooses the dividing surface to be between the U and I states, our formalism [Eq. 9] predicts that the reactive flux is J=[pfold(I)−pfold(U)]kIUpeq(U). Since pfold(U)=0 and pfold(I)=kFI∕(kUI+kFI), kF=J∕peq(U) agrees with the above result obtained by making the steady-state assumption for the population of I. It can be readily verified that if one chooses the dividing surface to be between I and F, our formalism yields the identical expression for kF. In fact, it can be shown in general that the reactive flux in Eq. 9 is the same for any dividing surface chosen so that any reactive trajectory unavoidably crosses this surface.
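The three-state result is easy to confirm with exact arithmetic. In this sketch the four rate constants are arbitrary illustrative values (not taken from any system), chosen so that the intermediate is sparsely populated; the equilibrium populations follow from detailed balance.

```python
from fractions import Fraction

# Hypothetical rate constants; k_XY is the rate constant for Y -> X, as in the text.
k_IU, k_UI = Fraction(1, 100), Fraction(50)
k_FI, k_IF = Fraction(30), Fraction(1, 200)

# Equilibrium populations from detailed balance, k_IU peq(U) = k_UI peq(I), etc.
w_U = Fraction(1)
w_I = w_U * k_IU / k_UI
w_F = w_I * k_FI / k_IF
Z = w_U + w_I + w_F
peq_U, peq_I, peq_F = w_U / Z, w_I / Z, w_F / Z

pfold_U, pfold_F = Fraction(0), Fraction(1)
pfold_I = k_FI / (k_UI + k_FI)

J_UI = (pfold_I - pfold_U) * k_IU * peq_U  # dividing surface between U and I
J_IF = (pfold_F - pfold_I) * k_FI * peq_I  # dividing surface between I and F
kF_steady_state = k_FI * k_IU / (k_UI + k_FI)
print(J_UI, J_IF, J_UI / peq_U, kF_steady_state)
```

Both dividing surfaces return the same flux, and J∕peq(U) reproduces the steady-state expression for kF exactly.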
Whereas J is independent of the choice of Σ, individual contributions to the reactive flux may be both positive and negative. However, if we specify Σ using pfold(i), then all contributions have the same sign. Let us choose the dividing surface Σα so that all microstates with pfold(i)<α are on the U-side of Σα, while all microstates with pfold(i)>α are on the F-side of this dividing surface, 0<α<1. We denote the local reactive flux flowing in the U→F direction from microstate i to a microstate j that has a greater pfold by Ji→j,
$$J_{i\to j}=K_{ji}\,p_{\mathrm{eq}}(i)\left[p_{\mathrm{fold}}(j)-p_{\mathrm{fold}}(i)\right],\qquad p_{\mathrm{fold}}(j)>p_{\mathrm{fold}}(i) \tag{15}$$
Since all fluxes Ji→j crossing Σα are positive, we can determine the importance of each transition based on the magnitude of its contribution to the reactive flux [Fig. 1b]. By repeating this procedure for a series of dividing surfaces Σα, one can evaluate the importance of different folding∕unfolding pathways. Of particular interest are transitions that cross the dividing surface with α=1∕2, since microstates involved in such transitions form the TSE. Using Eq. 9 one can ascertain the dynamically most relevant members of the TSE, i.e., microstates i and j, for which the local reactive flux is the largest. Interestingly, these microstates need not be those with the highest equilibrium populations. Thus, one can determine whether the reactive flux is localized (i.e., only a few pairs of microstates contribute) or delocalized (i.e., there are similar contributions from many transitions). Finally we note that when transitions occur between microstates with similar values of pfold(i), the difference of the splitting probabilities is much smaller than unity, [pfold(j)−pfold(i)]⪡1. The reactive flux in Eq. 9 is then well below its TST upper bound given in Eq. 12, in which recrossings are ignored.
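The sign behavior of the individual contributions can be illustrated on a toy network. The sketch below is built on assumptions: a hypothetical five-microstate network with an assumed detailed-balance rate model. It evaluates the terms Kijpeq(j)[pfold(i)−pfold(j)] for a series of surfaces Σα placed between successive pfold values, and for one surface that is not defined by pfold.

```python
import math

# Hypothetical network: 0 is UU, 4 is FF, 1-3 are bottleneck microstates.
G = [0.0, 4.0, 4.5, 4.0, -1.0]  # free energies in units of kT (assumed)
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
UU, FF = {0}, {4}
n = len(G)
Z = sum(math.exp(-g) for g in G)
peq = [math.exp(-g) / Z for g in G]

K = [[0.0] * n for _ in range(n)]  # K[i][j] is the rate for j -> i (assumed model)
for a, b in edges:
    K[a][b] = math.exp(-(G[a] - G[b]) / 2.0)
    K[b][a] = math.exp(-(G[b] - G[a]) / 2.0)
for j in range(n):
    K[j][j] = -sum(K[i][j] for i in range(n) if i != j)

q = [1.0 if i in FF else 0.0 for i in range(n)]  # pfold by repeated substitution
for _ in range(5000):
    for j in range(n):
        if j not in UU and j not in FF:
            q[j] = sum(q[i] * K[i][j] for i in range(n) if i != j) / -K[j][j]

def contributions(F_side):
    """Terms K_ij peq(j) [pfold(i) - pfold(j)], keyed by the transition j -> i."""
    return {(j, i): K[i][j] * peq[j] * (q[i] - q[j])
            for i in F_side for j in range(n)
            if j not in F_side and K[i][j] > 0.0}

# One surface Sigma_alpha between each pair of successive pfold values.
levels = sorted(set(q))
alphas = [(lo + hi) / 2.0 for lo, hi in zip(levels, levels[1:])]
by_alpha = [contributions({i for i in range(n) if q[i] > a}) for a in alphas]
totals = [sum(c.values()) for c in by_alpha]

# A surface not defined by pfold: microstate 1 is placed on the F side.
arbitrary = contributions({1, 4})
print(totals, sorted(arbitrary.items()))
```

In this example every Σα gives only positive terms, while the last surface mixes positive and negative terms yet sums to the same total flux.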
Instead of dividing the total reactive flux into contributions due to transitions crossing a dividing surface, we can decompose it into a sum of contributions due to unidirectional paths connecting the UU- and FF-basins. The local reactive flux between microstates i and j (pfold(j)>pfold(i)) is positive, Ji→j>0, and directed in the U→F direction. Because of the probability conservation, the sums of local reactive fluxes entering and exiting a microstate i are equal, ∑kJk→i=∑jJi→j=Ji, where pfold(k)<pfold(i)<pfold(j). One can construct a directed graph that connects the UU- and FF-basins by following the local reactive fluxes [Fig. 1b]. The contribution of each unidirectional path on this graph to the total reactive flux can be determined as follows. A path that starts from i0∊UU and ends up at iL∊FF passing through microstates i1,i2,…,iL−1∊I contributes $J_{i_0\to i_1}\prod_{l=1}^{L-1}\left(J_{i_l\to i_{l+1}}/J_{i_l}\right)$ to the reactive flux. The sum of all “path” fluxes is equal to the total reactive flux J. The unidirectional path with the largest contribution can be found, for example, by using Dijkstra’s algorithm for the shortest path, identifying ln(Ji∕Ji→j) as the “length” associated with the transition from i to j, and ln(J∕Ji0→i1) as the length associated with the entrance into the bottleneck from the UU-basin.
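The path decomposition can be sketched in a few lines. The local reactive fluxes below are hypothetical numbers chosen to conserve flux at each bottleneck node; they are not derived from any rate matrix. Each step i→j multiplies the path weight by Ji→j∕Ji, where Ji is the total flux through microstate i, and the dominant path is found with Dijkstra’s algorithm using ln(Ji∕Ji→j) as the step length, with Ji replaced by J for the step out of the UU-basin.

```python
import heapq
import math

# Hypothetical local reactive fluxes J_{i->j} (arbitrary units), directed toward higher pfold.
flux = {("U", "a"): 6.0, ("U", "b"): 4.0,
        ("a", "b"): 1.0, ("a", "c"): 3.0, ("a", "F"): 2.0,
        ("b", "c"): 2.0, ("b", "F"): 3.0,
        ("c", "F"): 5.0}

out = {}
for (i, j), Jij in flux.items():
    out.setdefault(i, []).append((j, Jij))
through = {i: sum(Jij for _, Jij in nbrs) for i, nbrs in out.items()}  # J_i for each node
J_total = through["U"]  # single UU node, so its outflow is the total reactive flux J

def all_paths(node, path, weight):
    """Yield (path, path flux); each step i -> j multiplies the weight by J_{i->j} / J_i."""
    if node == "F":
        yield tuple(path), weight
        return
    for j, Jij in out[node]:
        yield from all_paths(j, path + [j], weight * Jij / through[node])

paths = dict(all_paths("U", ["U"], J_total))

def dominant_path(source="U", target="F"):
    """Dijkstra's shortest path with step length ln(J_i / J_{i->j})."""
    dist, prev, done = {source: 0.0}, {}, set()
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if i in done:
            continue
        done.add(i)
        if i == target:
            break
        for j, Jij in out.get(i, []):
            nd = d + math.log(through[i] / Jij)
            if nd < dist.get(j, math.inf):
                dist[j], prev[j] = nd, i
                heapq.heappush(heap, (nd, j))
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return tuple(reversed(path)), J_total * math.exp(-dist[target])

best_path, best_flux = dominant_path()
print(paths, best_path, best_flux)
```

Because the step lengths are non-negative logarithms, the shortest path is the one with the largest product of branching ratios, and exhaustive enumeration is only needed here to check that the path fluxes add up to J.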
As an illustration consider the four-state model shown in Fig. 2a. In this model the unfolded (U) and folded (F) states of the protein are separated by two bottleneck states I1 and I2, through which all U-to-F transitions occur. Given rate constants for which pfold(I2)≥pfold(I1), the local reactive fluxes obtained from Eq. 15 are shown in Fig. 2b. There are four distinct dividing surfaces in the bottleneck region. Since the total reactive flux J is independent of the choice of the dividing surface we have
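$$J=J_{U\to1}+J_{U\to2}=J_{U\to2}+J_{1\to2}+J_{1\to F}=J_{U\to1}+J_{2\to F}-J_{1\to2}=J_{1\to F}+J_{2\to F} \tag{16}$$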
From this it follows that JU→1=J1→2+J1→F and J2→F=JU→2+J1→2. These relations represent flux conservation at the nodes. Thus, there are only three independent local reactive fluxes. The total reactive flux can be partitioned among the three unidirectional “flux” paths as follows: J(U→I2→F)=JU→2, J(U→I1→F)=JU→1[J1→F∕(J1→2+J1→F)]=J1→F, and J(U→I1→I2→F)=JU→1[J1→2∕(J1→2+J1→F)]=J1→2. The sum of these fluxes is equal to J.
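These relations can be verified with exact arithmetic. The sketch below specifies the four-state model through hypothetical equilibrium one-way fluxes cij=Kijpeq(j)=Kjipeq(i); this shorthand and the numerical values are assumptions made for illustration. It solves for the two unknown splitting probabilities and evaluates the local reactive fluxes Ji→j=Kjipeq(i)[pfold(j)−pfold(i)].

```python
from fractions import Fraction as Fr

# Hypothetical equilibrium one-way fluxes c_ij = K_ij peq(j) = K_ji peq(i)
# for the five connections among U, I1, I2, and F.
c_U1, c_U2, c_12, c_1F, c_2F = Fr(2), Fr(1), Fr(1), Fr(1), Fr(3)

# The pfold equation multiplied by peq(j) reads sum_i c_ij [pfold(i) - pfold(j)] = 0
# for j = I1, I2, with pfold(U) = 0 and pfold(F) = 1. Solve the 2x2 system by Cramer's rule.
a11, a12, b1 = c_U1 + c_12 + c_1F, -c_12, c_1F
a21, a22, b2 = -c_12, c_U2 + c_12 + c_2F, c_2F
det = a11 * a22 - a12 * a21
q1 = (b1 * a22 - a12 * b2) / det  # pfold(I1)
q2 = (a11 * b2 - a21 * b1) / det  # pfold(I2)

# Local reactive fluxes, each directed toward the microstate with the larger pfold.
J_U1, J_U2 = c_U1 * q1, c_U2 * q2
J_12 = c_12 * (q2 - q1)
J_1F, J_2F = c_1F * (1 - q1), c_2F * (1 - q2)

# Total flux through the four dividing surfaces of the bottleneck.
surfaces = [J_U1 + J_U2, J_U2 + J_12 + J_1F, J_U1 + J_2F - J_12, J_1F + J_2F]

# Partition among the three unidirectional paths.
paths = {"U-I2-F": J_U2,
         "U-I1-F": J_U1 * J_1F / (J_12 + J_1F),
         "U-I1-I2-F": J_U1 * J_12 / (J_12 + J_1F)}
print(q1, q2, surfaces, paths)
```

With these illustrative numbers pfold(I1)=8∕19 and pfold(I2)=13∕19, all four surfaces give J=29∕19, and the three path fluxes are 13∕19, 11∕19, and 5∕19.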
CONCLUDING REMARKS
To summarize, the phenomenological kinetic scheme in Eq. 2 describes the equilibration of the populations of two basins where the majority of the protein population is localized. The equilibration occurs via a sparsely populated bottleneck region, which connects these basins. Although the equilibration rate is determined by dynamics in the bottleneck, this region does not appear in the phenomenological description since the protein spends a small fraction of time there. One of the main results of our analysis is the expression for the reactive flux in Eq. 9, which relates the phenomenological description of two-state protein folding and a more detailed description of the protein dynamics that involves transitions among discrete coarse-grained microstates. This expression allows one to determine how much each transition through a dividing surface contributes to the observed rate. Alternatively, one can divide the total reactive flux into a sum of contributions due to all directed paths connecting the definitely unfolded and definitely folded microstates. Although our primary focus has been on problems where the reactive flux is related to the phenomenological rate constants describing two-state kinetics, the expression in Eq. 9 is actually valid for an arbitrary partitioning of the microstates into three subsets. Specifically, it gives the number of trajectory fragments per unit time that leave the UU region and enter the FF region, irrespective of whether the intermediate microstates are sparsely populated or even exist.
ACKNOWLEDGMENTS
This study was supported by the Intramural Research Program of the NIH, Center for Information Technology and National Institute of Diabetes and Digestive and Kidney Diseases.
References
- Zwanzig R., Szabo A., and Bagchi B., Proc. Natl. Acad. Sci. U.S.A. 89, 20 (1992), doi:10.1073/pnas.89.1.20; Zwanzig R., Proc. Natl. Acad. Sci. U.S.A. 92, 9801 (1995), doi:10.1073/pnas.92.21.9801.
- Munoz V. and Eaton W. A., Proc. Natl. Acad. Sci. U.S.A. 96, 11311 (1999), doi:10.1073/pnas.96.20.11311; Henry E. R. and Eaton W. A., Chem. Phys. 307, 163 (2004), doi:10.1016/j.chemphys.2004.06.064.
- Ozkan S. B., Bahar I., and Dill K. A., Nat. Struct. Biol. 8, 765 (2001), doi:10.1038/nsb0901-765; Schonbrun J. and Dill K. A., Proc. Natl. Acad. Sci. U.S.A. 100, 12678 (2003), doi:10.1073/pnas.1735417100.
- Zhang W. and Chen S. J., J. Chem. Phys. 119, 8716 (2003), doi:10.1063/1.1613255.
- Evans D. A. and Wales D. J., J. Chem. Phys. 121, 1080 (2004), doi:10.1063/1.1759317.
- Pollak E., Auerbach A., and Talkner P., Biophys. J. 95, 4258 (2008), doi:10.1529/biophysj.108.136358.
- Becker O. M. and Karplus M., J. Chem. Phys. 106, 1495 (1997), doi:10.1063/1.473299.
- Cieplak M., Henkel M., Karbowski J., and Banavar J. R., Phys. Rev. Lett. 80, 3654 (1998), doi:10.1103/PhysRevLett.80.3654.
- de Groot B. L., Daura X., Mark A. E., and Grubmüller H., J. Mol. Biol. 309, 299 (2001), doi:10.1006/jmbi.2001.4655.
- Swope W. C., Pitera J. W., and Suits F., J. Phys. Chem. B 108, 6571 (2004), doi:10.1021/jp037421y.
- Singhal N., Snow C. D., and Pande V. S., J. Chem. Phys. 121, 415 (2004), doi:10.1063/1.1738647; Park S. and Pande V. S., J. Chem. Phys. 124, 054118 (2006), doi:10.1063/1.2166393; Chodera J. D., Singhal N., Pande V. S., Dill K. A., and Swope W. C., J. Chem. Phys. 126, 155101 (2007), doi:10.1063/1.2714538; Singhal Hinrichs N. and Pande V. S., J. Chem. Phys. 126, 244101 (2007), doi:10.1063/1.2740261.
- Rao F. and Caflisch A., J. Mol. Biol. 342, 299 (2004), doi:10.1016/j.jmb.2004.06.063.
- Chekmarev D. S., Ishida T., and Levy R. M., J. Phys. Chem. B 108, 19487 (2004), doi:10.1021/jp048540w; Andrec M., Felts A. K., Gallicchio E., and Levy R. M., Proc. Natl. Acad. Sci. U.S.A. 102, 6801 (2005), doi:10.1073/pnas.0408970102.
- Sriraman S., Kevrekidis I. G., and Hummer G., J. Phys. Chem. B 109, 6479 (2005), doi:10.1021/jp046448u.
- Buchete N. V. and Hummer G., J. Phys. Chem. B 112, 6057 (2008), doi:10.1021/jp0761665.
- Noé F., Horenko I., Schütte C., and Smith J. C., J. Chem. Phys. 126, 155102 (2007), doi:10.1063/1.2714539.
- Faradjian A. K. and Elber R., J. Chem. Phys. 120, 10880 (2004), doi:10.1063/1.1738640.
- Ryter D., Physica A 142, 103 (1987), doi:10.1016/0378-4371(87)90019-7; Ryter D., J. Stat. Phys. 49, 751 (1987), doi:10.1007/BF01009355.
- Klosek M. M., Matkowsky B. J., and Schuss Z., Ber. Bunsenges. Phys. Chem. 95, 331 (1991); Pollak E., Berezhkovskii A. M., and Schuss Z., J. Chem. Phys. 100, 334 (1994), doi:10.1063/1.467002; Talkner P., Chem. Phys. 180, 199 (1994), doi:10.1016/0301-0104(93)E0426-V; Drozdov A. N. and Talkner P., Phys. Rev. E 54, 1660 (1996); Geissler P. L., Dellago C., and Chandler D., J. Phys. Chem. B 103, 3706 (1999), doi:10.1021/jp984837g; Bolhuis P. G., Chandler D., Dellago C., and Geissler P. L., Annu. Rev. Phys. Chem. 53, 291 (2002), doi:10.1146/annurev.physchem.53.082301.113146; Hummer G., J. Chem. Phys. 120, 516 (2004), doi:10.1063/1.1630572.
- Du R., Pande V. S., Grosberg A. Yu., Tanaka T., and Shakhnovich E., J. Chem. Phys. 108, 334 (1998), doi:10.1063/1.475393.
- Kramers H. A., Physica (Amsterdam) 7, 284 (1940), doi:10.1016/S0031-8914(40)90098-2; Hänggi P., Talkner P., and Borkovec M., Rev. Mod. Phys. 62, 251 (1990), doi:10.1103/RevModPhys.62.251; Nitzan A., Chemical Dynamics in Condensed Phases (Oxford University Press, Oxford, 2006).
- Chandler D., J. Chem. Phys. 68, 2959 (1978), doi:10.1063/1.436049.
- Berezhkovskii A. and Szabo A., J. Chem. Phys. 121, 9186 (2004), doi:10.1063/1.1802674; Berezhkovskii A. and Szabo A., J. Chem. Phys. 122, 079902 (2004), doi:10.1063/1.1844397.
- Onsager L., Phys. Rev. 54, 554 (1938), doi:10.1103/PhysRev.54.554.
- E W. and Vanden-Eijnden E., J. Stat. Phys. 123, 503 (2006), doi:10.1007/s10955-005-9003-9.