Author manuscript; available in PMC: 2008 Dec 12.
Published in final edited form as: J Chem Phys. 2003 Oct 22;119(16):8716–8729. doi: 10.1063/1.1613255

Analyzing the biopolymer folding rates and pathways using kinetic cluster method

Wenbing Zhang 1, Shi-Jie Chen 1,a)
PMCID: PMC2601668  NIHMSID: NIHMS55561  PMID: 19079645

Abstract

A kinetic cluster method enables us to analyze biopolymer folding kinetics with discrete rate-limiting steps by classifying biopolymer conformations into pre-equilibrated clusters. The overall folding kinetics is determined by the intercluster transitions. Due to the complex energy landscapes of biopolymers, the intercluster transitions have multiple pathways and can have kinetic intermediates (local free-energy minima) distributed on the intercluster pathways. We focus on the RNA secondary structure folding kinetics. The dominant folding pathways and the kinetic partitioning mechanism can be identified and quantified from the rate constants for different intercluster pathways. Moreover, the temperature dependence of the folding rate can be analyzed from the interplay between the stabilities of the on-pathway (nativelike) and off-pathway (misfolded) conformations and from the kinetic partitioning between different intercluster pathways. The predicted folding kinetics can be directly tested against experiments.

I. INTRODUCTION

The conformations and conformational transitions of biopolymers, such as RNAs, DNAs, and proteins, play critical roles in their biological functions. It is therefore important to develop a theory that can predict and analyze the kinetics of biopolymer conformational transitions. Unlike small molecules, biopolymers are mostly flexible chain molecules and have a large number of accessible chain conformations, which together form an exceedingly complex energy landscape. For example, a chain molecule can fold and unfold along many different routes that connect the initial and final states. Moreover, due to the complex interplay between entropy and enthalpy, the chain can either fold along smooth, fast downhill pathways or be trapped by kinetic intermediates (traps) that must be disrupted in the folding process.1–6 Therefore, in order to treat realistic biopolymers, a folding kinetics theory must be able to account for the large number of chain conformations, the multiple kinetic pathways, and the kinetic traps.

Predicting the detailed folding kinetics from first-principles analytical calculations is generally not yet possible for realistic biopolymers. One of the key problems in developing a folding kinetics theory is how to treat the enormously large conformational ensemble. For example, we are not able to quantitatively analyze the RNA folding kinetics based on the complete ensemble of RNA secondary structures. In this paper, by dividing the full conformational ensemble into several interconnected clusters (the kinetic cluster method), we develop a rigorous statistical mechanical framework to predict the detailed RNA secondary structure folding kinetics, including the folding and unfolding rates, dominant pathways, possible kinetic intermediates, and the temperature dependence of the folding kinetics. With the kinetic cluster method, we can make reliable predictions for the folding kinetics through analytical calculations, and the predicted results can be directly tested against experiments.

A number of studies have applied the master equation approach to describe protein and RNA folding kinetics.7–19 Consider a chain molecule that has Ω conformational states; let Pi be the fractional population (probability) of state i. The time evolution of Pi obeys the following master equation:

$$\frac{dP_i}{dt}=\sum_{j}\left(k_{ji}P_j-k_{ij}P_i\right),$$

for i = 1,2,…,Ω, where kij and kji are the rate constants for transitions from state i to j and from j to i, respectively. The rate constants form the rate matrix. For a given initial condition of the populational distribution, the populational kinetics P(t) = col (P1(t), P2(t), P3(t),…) is determined by the linear superposition of the eigenmodes of the rate matrix

$$\mathbf{P}(t)=\sum_{m=0}^{\Omega-1}C_m\,\mathbf{n}_m\,e^{-\lambda_m t},\tag{1}$$

where −λm and nm are the mth eigenvalue and eigenvector of the rate matrix, respectively, and Cm is the coefficient determined by the initial condition at time t = 0. Each eigenmode (−λm, nm) represents a kinetic mode of rate λm.
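The eigenmode expansion in Eq. (1) can be sketched numerically as follows. This is only a minimal illustration, assuming a toy three-state chain with Metropolis-like rate constants; the helper names `rate_matrix` and `populations` and the free energies in `G` are hypothetical and are not taken from the paper.

```python
import numpy as np

def rate_matrix(k):
    """Build the matrix M such that dP/dt = M @ P.

    k[i, j] is the rate constant for the transition i -> j.
    Off-diagonal M[i, j] = k[j, i]; each column sums to zero so that
    the total population is conserved.
    """
    k = np.array(k, dtype=float)
    np.fill_diagonal(k, 0.0)
    M = k.T.copy()
    np.fill_diagonal(M, -k.sum(axis=1))  # total escape rate from each state
    return M

def populations(M, p0, t):
    """Evaluate P(t) = sum_m C_m n_m exp(-lambda_m t), as in Eq. (1)."""
    eigvals, eigvecs = np.linalg.eig(M)     # eigenvalues are -lambda_m
    coeffs = np.linalg.solve(eigvecs, p0)   # C_m from the initial condition
    return (eigvecs @ (coeffs * np.exp(eigvals * t))).real

# Toy chain 0 <-> 1 <-> 2 with assumed free energies (k_B T = 1).
G = np.array([0.0, 1.0, -2.0])
k = np.zeros((3, 3))
for i, j in [(0, 1), (1, 2)]:
    k[i, j] = np.exp(-max(G[j] - G[i], 0.0))  # obeys detailed balance
    k[j, i] = np.exp(-max(G[i] - G[j], 0.0))

M = rate_matrix(k)
p_late = populations(M, np.array([1.0, 0.0, 0.0]), 1e4)
boltzmann = np.exp(-G) / np.exp(-G).sum()
```

At long times only the zero-eigenvalue mode survives, so `p_late` should approach the Boltzmann distribution `boltzmann`.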

The master equation description of the folding kinetics is general. It considers the full ensemble of the chain conformational states and accounts for the transitions between each and every pair of the kinetically connected states. However, since the number of chain conformations Ω increases exponentially as the chain length grows, the applicability of the master equation approach for realistic biopolymers is strongly limited by the large size (Ω×Ω) of the rate matrix. If there exist discrete rate-limiting steps for the kinetic process, however, it is possible to “renormalize” the conformational space into a number of clusters. The large ensemble of chain conformations can thus be drastically reduced into a much smaller number of conformational clusters.20–22

Different clusters are separated by the rate-limiting steps. If the rate-limiting steps involve sufficiently high kinetic barriers, the microstates within each cluster would have sufficient time to equilibrate and form a macrostate (in local equilibrium) before crossing the intercluster barriers to enter other (kinetically neighboring) clusters. The transitions between different clusters (macrostates) determine the overall folding kinetics of the molecule.

One might use the classical transition state theory to estimate the intercluster rate constants. Classical transition state theory, originally developed for chemical reaction dynamics, is based on two assumptions:23 reactants and products are thermal equilibrium ensembles, and molecules that pass through the transition state proceed to the products. It is not directly applicable to the intercluster dynamics in biopolymer folding. First, there are multiple parallel pathways connecting different clusters, and thus the intercluster kinetics is more convoluted. Second, the pathways often involve kinetic intermediates between the clusters. The kinetic intermediates (and traps) can cause the “rebounds” effect: the molecule may cross an intercluster barrier but then immediately find itself trapped in a local minimum, causing it to recross the intercluster barrier.

A number of attempts have been made to model the intercluster kinetics.20–22,24–26 In general, two types of methods have been proposed. In the first type of approach, the intercluster rate is computed based on the transition state with the lowest barrier. For each pair of initial and final states, the optimized pathway is used to estimate the rate constant. In the second type of approach, conformations interconvertible through barrierless transitions are classified as a cluster, and the rate constants are calculated based on all the possible intercluster kinetic pathways. Our present kinetic cluster method is based on the pre-equilibrated macrostates, which include the barrierless conformational cluster used in the previous model25,26 as a special subset, so the present approach is more general. From the intercluster rate constants, we construct the reduced rate matrix. From the eigenvalues and eigenvectors of the reduced rate matrix, we can analyze the folding/unfolding rates and the pathways for the overall kinetics.

The folding kinetics analysis is often extremely convoluted due to the large number of possible conformations and pathways and the complex shape of the free-energy landscape. With the kinetic cluster approach developed here, we can find the dominant folding pathways. Also, from the interplay between the stabilities of the nativelike and the misfolded states and from the kinetic trapping effects, we can predict the temperature dependence of the folding rate from analytical calculations. Moreover, the present method may be a first step for the development of a method to treat biopolymer folding kinetics with long chain length. A master equation approach to the long chain folding kinetics has been limited by the large conformational ensemble, while the present kinetic cluster method can reduce the large conformational ensemble into several clusters.

Pre-equilibration and cluster formation have been observed in previous experiments and computer simulations,27–38 and the pre-equilibrated macrostates have been used to quantitatively analyze the chemical kinetics and other small molecule systems.33–38 The present kinetic cluster approach is a systematic and analytical treatment for complex biopolymer folding kinetics, which has strong sequence dependence and involves multiple pathways, intermediates, traps, and networks of clusters. It would be a step toward a statistical mechanical framework for the ab initio prediction of the folding rates, pathways, traps (intermediates), and their temperature dependences from the sequence.

II. KINETIC CLUSTER ANALYSIS FOR BIOPOLYMER FOLDING KINETICS

A. Intercluster kinetics

We assume that, according to the rate-limiting steps, the conformational ensemble can be divided into two clusters denoted by C and N, such that the intracluster transitions (C↔C and N↔N) are much faster than the intercluster transitions (C↔N). We also assume that cluster C has ΩC conformations: C1,C2,…,CΩC, and cluster N has ΩN conformations: N1,N2,…,NΩN.

In general, there are multiple pathways that can connect the clusters. We use ωCN to denote the number of the intercluster pathways, and use Ci↔Ni to denote the ith pathway, where conformations Ci and Ni are connected by a kinetic move. We call such conformations Ci and Ni on the intercluster pathways “pathway conformations,” and call the other conformations “nonpathway conformations.”

Considering that Ci and Ni occupy certain fractional populations, the effective contribution of the Ci↔Ni pathway to the overall intercluster transition rates is given by the following equations:

$$k^{\mathrm{eff}}_{C_i\to N_i}=P_{C_i}\,k_{C_i\to N_i};\qquad k^{\mathrm{eff}}_{N_i\to C_i}=P_{N_i}\,k_{N_i\to C_i};\tag{2}$$

where PCi and PNi are the fractional populations of Ci and Ni in the respective clusters and are determined by the following equilibrium distribution:

$$P_{C_i}=e^{-(G_{C_i}-G_C)/k_BT};\qquad P_{N_i}=e^{-(G_{N_i}-G_N)/k_BT},\tag{3}$$

where GC and GN are the free energy of the conformational ensemble in clusters C and N, respectively,

$$G_C=-k_BT\ln\left(\sum_{j=1}^{\Omega_C}e^{-G_{C_j}/k_BT}\right);\qquad G_N=-k_BT\ln\left(\sum_{j=1}^{\Omega_N}e^{-G_{N_j}/k_BT}\right).\tag{4}$$

The total intercluster transition rate can be computed as the sum over all the ωCN pathways

$$k_{CN}=\sum_{i=1}^{\omega_{CN}}k^{\mathrm{eff}}_{C_i\to N_i};\qquad k_{NC}=\sum_{i=1}^{\omega_{CN}}k^{\mathrm{eff}}_{N_i\to C_i}.\tag{5}$$
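Equations (2)–(5) translate directly into a short calculation. The sketch below is illustrative only: the function names, the three-state cluster, and the pathway rate constants are assumptions chosen for the example, not values from the paper.

```python
import math

def cluster_free_energy(G_states, kT):
    """Eq. (4): free energy of a cluster from its conformational free energies."""
    return -kT * math.log(sum(math.exp(-g / kT) for g in G_states))

def intercluster_rate(G_states, pathways, kT):
    """Eqs. (2), (3), (5): sum of P_i * k_i over the intercluster pathways.

    pathways is a list of (index of the pathway conformation in G_states,
    rate constant of the kinetic move that leaves the cluster).
    """
    G_cluster = cluster_free_energy(G_states, kT)
    rate = 0.0
    for idx, k in pathways:
        P = math.exp(-(G_states[idx] - G_cluster) / kT)  # Eq. (3)
        rate += P * k                                     # Eqs. (2) and (5)
    return rate

# Hypothetical cluster C: three conformations, two of which lie on pathways.
G_C = [0.0, 1.0, 2.0]
paths_C = [(1, 1e-3), (2, 1e-2)]
k_CN = intercluster_rate(G_C, paths_C, kT=1.0)
```

The same function gives the reverse rate when it is called with the free energies and pathway conformations of cluster N.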

B. Formation of the kinetic clusters

In general, the clusters should satisfy the condition that the intracluster transitions are much faster than the intercluster transitions, such that each cluster is a quasiclosed system. From Eqs. (2)–(5), slow intercluster rates kCN and kNC can arise from (a) a small number ωCN of intercluster pathways; (b) small fractional populations PCi and PNi for the pathway conformations, i.e., high free energies of Ci (and Ni) in the respective clusters [see Fig. 1(A)]; (c) small rate constants kCi→Ni and kNi→Ci for each intercluster pathway, i.e., a high kinetic barrier between Ci and Ni [see Fig. 1(B)].

FIG. 1.


Free-energy landscape and the formation of a cluster. (A) The pathway conformation Ci has a high free energy and a small population and thus a slow rate of escape from cluster C; (B) the intercluster pathways (Ci↔Ni) have high free-energy barriers and slow rates.

The pathway conformations effectively form the boundary of a cluster in the conformational space. In order for a cluster to form a pre-equilibrated macrostate, the rates for the (pathway) conformations to escape from the cluster must be small as compared to the rates to enter the cluster. Mathematically, this condition can be expressed by the following inequalities for the rate constants related to the pathway conformations Ci and Ni (i = 1,2,…,ωCN):

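$$\sum_{N_j\in N}k_{C_i\to N_j}\ll\sum_{C_j\in C}k_{C_i\to C_j};\qquad \sum_{C_j\in C}k_{N_i\to C_j}\ll\sum_{N_j\in N}k_{N_i\to N_j}.\tag{6}$$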

The clusters in Figs. 1(A) and 1(B) represent two different scenarios that satisfy the above conditions.

The classification of the conformations into clusters depends on the temperature, solvent condition, etc., because different conditions may have different rate-limiting steps that divide the clusters. For example, under a strong folding condition, the overall relaxation kinetics would be dominated by the folding process. In this case, the rate-limiting steps can be (a) the slow steps in the formation of native intrachain contacts and (b) the slow steps in the disruption of non-native contacts. Under conditions that the unfolding process becomes dominant, the rate-limiting steps can be the slow disruption of the native contacts.

C. Dominant intercluster pathways

Suppose the folded native state is in cluster N. The probability for a molecule to fold through a pathway Ci→Ni is determined by the ratio $k^{\mathrm{eff}}_{C_i\to N_i}/k_{CN}$, where $k^{\mathrm{eff}}_{C_i\to N_i}$ is the contribution from the Ci→Ni pathway [see Eq. (2)] and $k_{CN}$ is the total rate for all the pathways [see Eq. (5)], and the probability for an unfolding reaction through pathway Ni→Ci is determined by the ratio $k^{\mathrm{eff}}_{N_i\to C_i}/k_{NC}$. The dominant pathways in the overall kinetics are the most probable Ci→Ni and Ni→Ci transitions. Since $k^{\mathrm{eff}}_{C_i\to N_i}$, $k^{\mathrm{eff}}_{N_i\to C_i}$, $k_{CN}$, and $k_{NC}$ are strongly temperature dependent [see Eqs. (2)–(5)], the dominant pathways can be quite different at different temperatures.
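The switch of the dominant pathway with temperature can be illustrated with a toy calculation. The two pathways, their free energies, and their rate constants below are assumed purely for illustration, and `pathway_probabilities` is a hypothetical helper. Because the cluster free energy cancels in the ratio of an effective pathway rate to the total rate, only the free energies of the pathway conformations enter.

```python
import math

def pathway_probabilities(G_path, k_path, T):
    """Return k_eff_i / k_total for each pathway (k_B = 1).

    G_path[i] is the free energy of the pathway conformation C_i and
    k_path[i] is the rate constant of the kinetic move C_i -> N_i.
    The cluster free energy G_C is a common factor and drops out.
    """
    weights = [math.exp(-g / T) * k for g, k in zip(G_path, k_path)]
    total = sum(weights)
    return [w / total for w in weights]

# Assumed toy landscape: pathway 0 starts from a low-free-energy conformation
# but has a slow crossing step; pathway 1 starts higher but crosses faster.
G_path = [0.0, 1.0]
k_path = [math.exp(-3.0), math.exp(-1.0)]

p_cold = pathway_probabilities(G_path, k_path, T=0.2)
p_hot = pathway_probabilities(G_path, k_path, T=1.0)
```

In this toy landscape, pathway 0 carries most of the flux at T = 0.2 and pathway 1 takes over at T = 1.0, because the population of the higher-lying pathway conformation grows with temperature.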

D. Effect of the kinetic intermediates on the intercluster kinetics

Due to the complex free-energy landscape of biopolymers, there may exist kinetic intermediates distributed on the intercluster pathways. These kinetic intermediates play an important role in determining the rate and the dominant pathways for the intercluster transitions and for the overall kinetics of the system.

In Fig. 2 we show a schematic one-dimensional free-energy landscape that involves one kinetic intermediate [Figs. 2(A)–2(C)] and multiple kinetic intermediates [Fig. 2(D)] on an intercluster pathway. Physically, if cluster C represents a macrostate for non-native conformations that form a deep kinetic trap, Fig. 2(A) shows that after the molecule is detrapped from C through conformation Ci, it immediately finds itself trapped in a local minimum a before proceeding to fold into the native cluster N (through, e.g., state Ni).

FIG. 2.


An intercluster pathway (Ci↔Ni) that involves one local minimum ai: (A) ΔGb and ΔGf are on the same order of magnitude; (B) ΔGb ≪ ΔGf; (C) ΔGb = 0 (cooperative transition between Ci and Ni); and (D) two local minima ai and a′i.

The free-energy landscape and the rate constants related to a determine that a cannot be treated as a pathway conformation in either cluster C or cluster N [see Eq. (6)]. One can treat the kinetic intermediate as a separate state, and solve the master equation for the multicluster system including local minima a’s as separate states. However, given the large number of intercluster pathways, such an approach could be computationally difficult and the analysis for the results would be quite convoluted in general. Therefore, it is useful to derive an analytical expression for the folding rates based on simple physical analysis.

1. The folding rate

We use Fig. 2(A) to illustrate the general method. Suppose the native state is in cluster N; then, the transition C→N represents a folding process, and the folding rate kF is equal to the intercluster transition rate for C→N. As shown in Fig. 2(A), since a certain fraction of the population that enters the intermediate state a from state Ci would quickly rebound to re-enter cluster C, the effective population for the forward folding process would be reduced.

We assume that the intermediate state a is not sufficiently stable to form a deep kinetic trap, and there is thus no significant accumulation in the population for state a (steady-state approximation). In this case, the fractional population that flows back to cluster C from the state a is kb/(kf+kb), where kb and kf [see Fig. 2(A)] are the rate constants for the backward rebound transitions and the forward folding transitions, respectively, and the fractional population that flows into the native cluster N is kf/(kf+kb). These factors are similar to the transmission coefficients in the transition state theory for chemical kinetics.39 Considering the multiple pathways in the rebound and folding transitions, kb and kf can be the total rates for all the possible backward rebound transitions (from a) and for all the possible forward folding transitions (from a), respectively: $k_b=\sum_{C_j\in C}k_{a\to C_j}$; $k_f=\sum_{N_j\in N}k_{a\to N_j}$.

Considering the distribution of the fractional population PCi of state Ci in cluster C [see Eq. (3)], we obtain the following effective rate constant kCi→N for the folding transition Ci→N through state a:

$$k^{(\mathrm{eff})}_{C_i\to N}=P_{C_i}\,k_{C_i\to a}\,\frac{k_f}{k_f+k_b}.\tag{7}$$

The overall folding rate kF is given by the sum over all the possible pathways between C and N:

$$k_F=\sum_{i=1}^{\omega_{CN}}k^{(\mathrm{eff})}_{C_i\to N}=\sum_{i=1}^{\omega_{CN}}P_{C_i}\,k_{C_i\to a}\,\frac{k_f}{k_f+k_b}.\tag{8}$$

Similarly, the unfolding rate kU for the NC transition is given by

$$k_U=\sum_{i=1}^{\omega_{CN}}P_{N_i}\,k_{N_i\to a}\,\frac{k_b}{k_f+k_b}.\tag{9}$$

As we show in the following sections, Eqs. (8) and (9) give quite accurate estimates of the folding and unfolding rates for intermediate-state-mediated intercluster transitions.

The above analysis of the rebounds effect on the intercluster kinetics and the rates kF and kU is based on the assumption that the local minimum (a) is not a deep trap and thus there is no significant trapping of the population. Equations (8) and (9) show that kF and kU depend on the intermediates a through the relative ratio kf/kb rather than the absolute values of kf and kb. In terms of the free energies shown in Figs. 2(A)–2(C), kF and kU depend only on the difference in the free-energy barriers, ΔGf − ΔGb, not on the absolute values of ΔGf (for the forward transition a→Ni) or ΔGb (for the backward transition a→Ci). However, if the barriers ΔGf and ΔGb are very high, a would become very stable and the trapping effect would be significant. In the deep trapping case, the folding and unfolding rates kF and kU would depend on the absolute values of ΔGf and ΔGb. In fact, as the local minimum a becomes deeper, the folding and unfolding would be slower due to the populational accumulation and the related kinetic pause.
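The steady-state estimate and its breakdown for a deep trap can be checked on the smallest possible system, a single chain C ↔ a ↔ N treated as three states. The sketch below assumes one pathway with PCi = PNi = 1; the rate constants and function names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def cluster_rates(k_Ca, k_f, k_b, k_Na):
    """Eqs. (8)-(9) for a single pathway with P_Ci = P_Ni = 1."""
    k_F = k_Ca * k_f / (k_f + k_b)
    k_U = k_Na * k_b / (k_f + k_b)
    return k_F, k_U

def exact_relaxation_rate(k_Ca, k_f, k_b, k_Na):
    """Slowest nonzero relaxation rate of the three-state chain C <-> a <-> N."""
    M = np.array([
        [-k_Ca,  k_b,          0.0],
        [ k_Ca, -(k_b + k_f),  k_Na],
        [ 0.0,   k_f,         -k_Na],
    ])
    rates = np.sort(np.abs(np.linalg.eigvals(M)))
    return rates[1]  # rates[0] is the zero (equilibrium) mode

# Shallow intermediate: escape from a is fast compared with entry into a.
k_F, k_U = cluster_rates(1e-4, 0.5, 1.0, 1e-6)
exact_shallow = exact_relaxation_rate(1e-4, 0.5, 1.0, 1e-6)

# Deep trap: same ratio k_f / k_b, but both escape rates are scaled down.
exact_deep = exact_relaxation_rate(1e-4, 0.5e-6, 1.0e-6, 1e-6)
```

For the shallow intermediate, `k_F + k_U` should agree closely with `exact_shallow`. For the deep trap the estimate is unchanged, since it depends only on the ratio kf/kb, whereas `exact_deep` is much slower, in line with the trapping argument above.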

2. Folding rate and the stability of the intermediate states

For a schematic one-dimensional free-energy landscape shown in Fig. 2(A), we assume that the rate constants are determined by the corresponding kinetic barriers as the following:

$$k_{C_i\to a}=e^{-\Delta G_f^{(0)}/k_BT};\qquad k_f=e^{-\Delta G_f/k_BT};\qquad k_b=e^{-\Delta G_b/k_BT}.$$

The rate for the intermediate state-mediated intercluster transitions, according to Eq. (8), is given by

$$k_F=P_{C_i}\,k_{C_i\to a}\,\frac{k_f}{k_f+k_b}=\frac{P_{C_i}\,k_{C_i\to a}\,k_f}{e^{-\Delta G_f/k_BT}+e^{-\Delta G_b/k_BT}}.\tag{10}$$

For a fixed (forward) kinetic barrier ΔGf, a higher barrier ΔGb for the recrossing transition leads to a reduced rate for the rebound process, which effectively makes the forward folding process more productive. For a sufficiently low barrier ΔGb ≪ ΔGf, as shown in Fig. 2(B), the intermediate state a can quickly exchange populations with conformations (Ci) in cluster C to reach local equilibrium. As a result, a can be included in cluster C as a pathway conformation. In the limiting case with ΔGb→0, as shown in Fig. 2(C), the intermediate state a does not exist, and the intercluster transition becomes a cooperative process with no intermediates. The folding rate for the cooperative process is $k_F^{\mathrm{coop}}=P_{C_i}\,e^{-(\Delta G_f^{(0)}+\Delta G_f)/k_BT}=P_{C_i}\,k_{C_i\to a}\,k_f$.

From the above expressions for $k_F$ and $k_F^{\mathrm{coop}}$, we find that, for processes with the same total folding barrier $\Delta G_f^{(0)}+\Delta G_f$, the noncooperative process (with the intermediate a) could have a faster folding rate than the cooperative process ($k_F>k_F^{\mathrm{coop}}$) provided that $e^{-\Delta G_f/k_BT}+e^{-\Delta G_b/k_BT}<1$. Physically, the above condition requires sufficiently high free-energy barriers ΔGf and ΔGb, i.e., a sufficiently stable intermediate (but not stable enough to become a deep kinetic trap). The physical mechanism of such local-minimum-mediated folding “acceleration” is the slowdown of the rebound process caused by the kinetic barrier ΔGb for the recrossing, as compared with the cooperative process, which has no such kinetic barrier (ΔGb = 0).
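The acceleration condition can be checked numerically from Eq. (10) and the cooperative rate. The barrier heights below are assumed values in units of kBT, and `folding_rates` is a hypothetical helper written only for this illustration.

```python
import math

def folding_rates(dG0, dGf, dGb, P_Ci=1.0, kT=1.0):
    """Return (k_F, k_coop) for one pathway through a single intermediate a.

    dG0 is the barrier for C_i -> a, dGf the forward barrier a -> N_i,
    and dGb the backward (recrossing) barrier a -> C_i.
    """
    k_Ca = math.exp(-dG0 / kT)
    k_f = math.exp(-dGf / kT)
    k_b = math.exp(-dGb / kT)
    k_F = P_Ci * k_Ca * k_f / (k_f + k_b)  # Eq. (10)
    k_coop = P_Ci * k_Ca * k_f             # cooperative process, no intermediate
    return k_F, k_coop

# High barriers around a: exp(-2) + exp(-2) < 1, so folding is accelerated.
fast_kF, fast_coop = folding_rates(dG0=3.0, dGf=2.0, dGb=2.0)

# Low barriers around a: exp(-0.1) + exp(-0.1) > 1, so no acceleration.
slow_kF, slow_coop = folding_rates(dG0=3.0, dGf=0.1, dGb=0.1)
```

The ratio `k_F / k_coop` equals 1/(kf + kb), so the comparison reduces to whether the sum of the two Boltzmann factors is below or above one.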

The accelerated folding rate due to the entropic effect of the intermediates in the transition states was previously reported.40 It was found that for different landscapes with the same uphill barrier ΔGf(0) in Fig. 2, stabilization of the intermediates could lead to a faster folding rate. The folding acceleration reported in that work is different from that reported here: First, we compare different folding processes with the same total folding barrier ΔGf(0)+ΔGf rather than those with the same ΔGf(0). Second, the acceleration of folding here is due to the reduced rebounds and the enhanced productive folding rather than the enhanced entropic effects in the transition state ensemble.

We can generalize the above approach to treat processes involving multiple intermediates between clusters C and N. For two sequentially distributed intermediates a and a′ shown in Fig. 2(D), with forward and backward rate constants (kf, kb) out of a and (k′f, k′b) out of a′, we consider the following iterative process: a fraction k′b/(k′f+k′b) of the population in state a′ would rebound to state a, and a fraction kf/(kf+kb) of such rebound population would refold and re-enter state a′. Therefore, similar to Eq. (7), the effective rate constant for the folding transition Ci→N is given by

$$k^{(\mathrm{eff})}_{C_i\to N}=P_{C_i}\,k_{C_i\to a}\left(\frac{k_f}{k_f+k_b}\right)\left(\frac{k'_f}{k'_f+k'_b}\right)\times\sum_{n=0}^{\infty}\left(\frac{k_f}{k_f+k_b}\,\frac{k'_b}{k'_f+k'_b}\right)^{n},$$

and the total folding rate is given by Eq. (8) with the rate constant $k^{(\mathrm{eff})}_{C_i\to N}$ given above. After simplification, we find $k^{(\mathrm{eff})}_{C_i\to N}=P_{C_i}\,k_{C_i\to a}\,k_f k'_f/[(k_f+k_b)k'_f+k_b k'_b]$, from which we obtain the following condition for the folding acceleration (i.e., $k^{(\mathrm{eff})}_{C_i\to N}$ > the cooperative folding rate $P_{C_i}\,k_{C_i\to a}\,k_f k'_f$): $(k_f+k_b)k'_f+k_b k'_b<1$.
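The simplification follows from summing a geometric series. As a sketch, write $k_f, k_b$ for the forward and backward rate constants out of $a$ and $k'_f, k'_b$ for those out of $a'$ (the primed symbols are a notational assumption made here), and abbreviate the rebound-and-refold factor as $x$:

```latex
\begin{aligned}
x &= \frac{k_f}{k_f+k_b}\,\frac{k'_b}{k'_f+k'_b}, \qquad 0 \le x < 1 \ \text{for } k_b > 0,\\[4pt]
\sum_{n=0}^{\infty} x^{n} &= \frac{1}{1-x}
  = \frac{(k_f+k_b)(k'_f+k'_b)}{(k_f+k_b)(k'_f+k'_b)-k_f k'_b}
  = \frac{(k_f+k_b)(k'_f+k'_b)}{(k_f+k_b)\,k'_f+k_b k'_b},\\[4pt]
k^{(\mathrm{eff})}_{C_i\to N}
  &= P_{C_i}\,k_{C_i\to a}\,\frac{k_f}{k_f+k_b}\,\frac{k'_f}{k'_f+k'_b}\,\frac{1}{1-x}
  = \frac{P_{C_i}\,k_{C_i\to a}\,k_f k'_f}{(k_f+k_b)\,k'_f+k_b k'_b}.
\end{aligned}
```

Comparing the last line with a cooperative rate $P_{C_i}\,k_{C_i\to a}\,k_f k'_f$ shows that the two-intermediate pathway is faster exactly when the denominator $(k_f+k_b)k'_f+k_b k'_b$ is smaller than one.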

III. ILLUSTRATIVE CALCULATIONS FOR BIOPOLYMER FOLDING PROBLEM

A. The model

Our purpose here is twofold: (i) to demonstrate the implementation of the kinetic cluster method for biopolymer folding and (ii) to validate the method through comparisons with the results from the original exact master equations with the complete conformational ensemble. We choose a simplified RNA hairpinlike folding model which allows the formation and disruption of all the possible hairpin conformations. We assume that the conformations are stabilized by the (sequence-dependent) stacking interactions; therefore, we can use stacks (= two adjacent intrachain contacts; see Fig. 3) to describe the conformational states. Unstacked intrachain contacts are unstable and can be disrupted quickly. As a result, two conformations that differ by unstacked intrachain contacts are classified as the same conformational state. To further simplify the calculation, we neglect the loop entropies. The total free energy of a conformation i, according to the nearest-neighbor model,41,42 is equal to the additive sum of the free energies for all the constituent stacks: $\Delta G_i=-\sum_{\text{all stacks}}(\Delta H-T\Delta S)$, where ΔS and ΔH are the entropic and enthalpic changes upon the formation of the corresponding stack, respectively. The native state for a given sequence is the state with the lowest free energy.

FIG. 3.


A kinetic move is defined as the formation/disruption of a stack (two adjacent intrachain contacts): (x,a,b,y) in (A) and (x2,x1,y1,y2) in (B), or a stacked contact: (x,y) in (C).

The methodology developed here does not rely on any particular choice of kinetic moves. However, to be specific in the illustrative computation, we define a kinetic move to be the formation or breaking of a stack or a stacked intrachain contact; see Fig. 3. As shown in the figure, a kinetic move corresponds to the addition/deletion of either one stack [see Figs. 3(A) and 3(B)] or possibly two consecutive stacks [see Fig. 3(C)].

We assume that the rate constant kij for a kinetic move from state i to state j is determined by the free-energy barrier ΔGij for the kinetic move: $k_{ij}=e^{-\Delta G_{ij}/k_BT}$. Furthermore, we assume that the formation of a stack is rate limited by the associated entropic decrease −ΔS and the disruption of a stack is rate limited by the associated enthalpic increase ΔH. Therefore, ΔGij is equal to TΔS and ΔH for the formation and disruption of a stack, respectively, and the rate constants are given by

$$k_{ij}=\left(e^{-\Delta S},\;e^{-\Delta H/T}\right)\ \text{for the (formation, disruption) of the stack},\tag{11}$$

where we have set kB = 1 to simplify the notation.
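As a sketch of Eq. (11) and of the additive stack free energy, the snippet below evaluates both for the parameter values used in the illustrative model of this section. The function names are hypothetical, and ΔS and ΔH are passed as positive magnitudes, following the sign convention of Eq. (11).

```python
import math

def stack_rates(dS, dH, T):
    """Eq. (11) with k_B = 1: (formation, disruption) rate constants of a stack."""
    return math.exp(-dS), math.exp(-dH / T)

def conformation_free_energy(stacks, T):
    """Additive free energy G = -sum(dH - T*dS) over the stacks of a conformation.

    stacks is a list of (dS, dH) pairs, one pair per stack.
    """
    return -sum(dH - T * dS for dS, dH in stacks)

T = 0.2
k_form, k_break = stack_rates(dS=15.0, dH=6.0, T=T)         # rate-limiting stack
G_five = conformation_free_energy([(5.0, 2.0)] * 5, T)      # five uniform stacks
G_special = conformation_free_energy([(15.0, 6.0)], T)      # one special stack
```

With these rules the ratio `k_form / k_break` equals the Boltzmann factor `exp(-G_special / T)` of the stack, so the kinetic moves satisfy detailed balance with respect to the stack free energies.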

If at temperature T, the formation/disruption of a stack, e.g., (x,a,b,y) in Fig. 3, is distinctively slow as compared with the formation/disruption of all other stacks, the formation/disruption of stack (x,a,b,y) could be a rate-limiting step. As a result, conformations with and without this rate-limiting stack can be classified into two clusters, as illustrated in Fig. 1(B). Mathematically, for such well-separated distribution of the rate constants, the rate matrix can be transformed into block form21 such that the conformations within each cluster would have sufficient time to equilibrate before the rate-limiting stack is formed or disrupted; see Fig. 4.

FIG. 4.


The block-form rate matrix for a two-cluster system, where the intracluster transition rates are large and the intercluster transition rates are small (and sparsely distributed). The column and row indexes represent the conformational states.
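Bringing a rate matrix into block form amounts to applying the same permutation to its rows and columns so that the states of each cluster become contiguous. The sketch below assumes a four-state toy system with fast intracluster and slow intercluster rates; the labels, rates, and the helper `to_block_form` are illustrative assumptions.

```python
import numpy as np

def to_block_form(M, labels):
    """Reorder states so that the members of each cluster are contiguous.

    labels[i] is the cluster index of state i. Rows and columns are permuted
    together, so the kinetics described by the matrix is unchanged.
    """
    order = np.argsort(labels, kind="stable")
    return M[np.ix_(order, order)], order

# Toy system: states 0 and 2 form one cluster, states 1 and 3 the other.
labels = np.array([0, 1, 0, 1])
fast, slow = 1.0, 1e-6
same_cluster = labels[:, None] == labels[None, :]
M = np.where(same_cluster, fast, slow)
np.fill_diagonal(M, 0.0)
np.fill_diagonal(M, -M.sum(axis=0))  # columns sum to zero (population conserved)

M_block, order = to_block_form(M, labels)
```

After the reordering, the two diagonal 2×2 blocks of `M_block` hold the fast intracluster rates and the off-diagonal blocks hold only the slow intercluster rates.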

We allow the formation of all possible stacks, native or non-native, in the folding/unfolding processes. A stack is called native if the particular stack exists in the native state, and is called non-native otherwise. For the folding kinetics, (a) the formation of a native stack with very large entropic reduction −ΔS is slow [see Eq. (11)] and is an on-pathway rate-limiting step; (b) the disruption of a non-native stack with very high enthalpic barrier ΔH is slow [see Eq. (11)] and is an off-pathway rate-limiting step. In general, for a given RNA sequence, we can identify the folding rate-limiting steps by examining the ΔS and ΔH parameters for all the possible stacks and other structural units. The formation of native structures with large ΔS values constitutes on-pathway rate-limiting steps, and the non-native structures with large ΔH are off-pathway rate-limiting kinetic traps.

We choose a 16-nt RNA hairpin-forming chain and assume uniform entropic and enthalpic changes for the formation of all stacks: (−ΔS0,−ΔH0) = (−5,−2), except for those special rate-determining stacks with distinctively larger enthalpic or entropic parameters. We denote the entropy and enthalpy parameters for the special rate-limiting stacks as ΔH* and ΔS*. For RNA secondary structures under 1M NaCl salt condition, the sequence-dependent enthalpic and entropic parameters for different stacks have been measured experimentally,41 but here we use the simplified values for the purpose of model illustration, though the method developed here can be directly applied to realistic RNA molecules by using the realistic enthalpy and entropy parameters. For the 16-nt hairpin chain here, there are 391 stack-based hairpin conformational states in total. For the enthalpic and entropic parameters that we use, the native structure is a hairpin structure with a helical stem of seven consecutive stacks [see Fig. 5(A)].

FIG. 5.


(A) The native state and (B) the 35 intercluster pathways. Clusters N and C are for the conformations with and without the rate-limiting stack [shaded in (A)].

We will apply the kinetic cluster method to investigate the following representative folding scenarios for the simplified 16-nt chain model: folding with one on-pathway rate-limiting step, folding with one off-pathway kinetic trap, and folding with two different on-pathway rate-limiting steps. The purpose of the third model calculation is to illustrate the application of the kinetic cluster analysis to a folding process with multiple rate-limiting steps. Our strategy is to design different folding pathways by assigning different energy and entropy parameters (ΔH* and ΔS*) for the designed rate-determining stacks. For realistic RNA molecules, such an approach effectively corresponds to the design of different sequences. For each designed model sequence, the kinetic cluster results will be tested against the exact solutions of the master equation for the complete 391 chain conformations.

B. Folding and unfolding kinetics with an on-pathway rate-limiting step

As shown in Fig. 5(A), we assume that the formation of the stack (5,6,11,12) involves a large entropic decrease ΔS* (>ΔS0): (−ΔS*,−ΔH*) = (−15,−6). For the given entropy and enthalpy parameters, the heat capacity melting curve shows43–46 a single melting transition at the melting temperature Tm = 0.4. We first investigate the kinetics at temperature T = 0.2 (<Tm). The native state has a fractional population of 98% at T = 0.2. Therefore, the relaxation process is predominantly a folding process.

1. Kinetic clusters

Because ΔS* and ΔH* are larger than ΔS0 and ΔH0, respectively, according to Eq. (11), the formation/disruption of the native stack (5,6,11,12) is distinctively slower than the formation/disruption of other stacks. As shown in Fig. 6, the rate matrix can be transformed into a block form, and the conformations can be classified as two clusters: N and C for conformations with and without the rate-limiting stack (5,6,11,12). Clusters C and N have ΩC = 353 and ΩN = 38 conformations, respectively.

FIG. 6.


The transformed block-form rate matrix obtained by sorting conformational states into two distinctive groups (clusters) such that the intracluster transition rates are large and the intercluster transition rates are small (and sparsely distributed).

2. Intercluster kinetics

There are ωCN = 35 intercluster pathways, each corresponding to the addition or deletion of stack (5,6,11,12). Most pathways have rate constants $(e^{-\Delta S^*},e^{-\Delta H^*/T})=(e^{-15},e^{-30})$ for the (formation, disruption) of the (5,6,11,12) stack; other pathways involve the simultaneous formation/disruption of two stacks [e.g., C4↔N4 in Fig. 5(B); see also Fig. 3(C)] and thus have much slower rates $(e^{-(\Delta S_0+\Delta S^*)},e^{-(\Delta H_0+\Delta H^*)/T})=(e^{-17},e^{-42.5})$. Equations (2)–(5) give the intercluster rate constants $k_{CN}=2.088\times10^{-10}$ and $k_{NC}=6.3\times10^{-16}$. Under the folding condition T = 0.2, the folding rate kF = kCN is much larger than the unfolding rate kU = kNC, and the relaxation rate kR is determined by $k_R=k_{CN}+k_{NC}\simeq2.088\times10^{-10}$.

The above kinetic cluster analysis can be tested by the rigorous master equation approach for the complete conformational ensemble of the 391 states. By diagonalizing the 391×391 rate matrix, we find that the first four nonzero eigenvalues are $2.087\times10^{-10}$, $2.27\times10^{-6}$, $4.88\times10^{-6}$, and $5.17\times10^{-6}$. Because there exists a large gap between the first and the second nonzero eigenvalue, the relaxation kinetics is predominantly determined by the slowest mode with a rate constant of $2.087\times10^{-10}$, which is in very good agreement with the result $k_R\simeq2.088\times10^{-10}$ computed above from the kinetic cluster analysis.
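The level of agreement and the size of the eigenvalue gap can be quantified directly from the numbers quoted in the text. The short check below only restates that arithmetic; the variable names are arbitrary.

```python
# Values quoted in the text for T = 0.2.
k_CN = 2.088e-10           # folding rate from the kinetic cluster analysis
k_NC = 6.3e-16             # unfolding rate from the kinetic cluster analysis
lambda_1 = 2.087e-10       # slowest nonzero mode of the 391 x 391 rate matrix
lambda_2 = 2.27e-6         # second nonzero mode

k_R = k_CN + k_NC                                 # two-cluster relaxation rate
relative_error = abs(k_R - lambda_1) / lambda_1   # cluster method vs master equation
gap = lambda_2 / lambda_1                         # separation of time scales
```

The relative deviation is below 0.1%, and the second mode is roughly four orders of magnitude faster than the slowest one, which is what justifies treating the relaxation as a single-exponential two-cluster process.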

3. Folding pathways

We assume that the chain is initially in the fully extended conformation. After an initial quick equilibration, the ΩC = 353 conformations within cluster C form a kinetic intermediate before folding to cluster N. There are ten stable conformations each with free energy G = −5, which is much lower than the next lowest free energy G = −4 with a gap of 5kBT.

The ten kinetic intermediates contain no native contacts and so they appear as off-pathway kinetic traps in the populational kinetics in Fig. 7. However, since the detrapping from these intermediates is faster than the formation of the rate-limiting stack (5,6,11,12), the ten intermediates are neither off-pathway kinetic traps nor obligatory on-pathway intermediates, because they are not pathway conformations. Therefore, the disruption of a non-native kinetic intermediate is not necessarily an off-pathway rate-limiting step.

FIG. 7.

The population kinetics solved from the exact master equation for the complete conformational ensemble for the folding from the fully extended state. The dashed line is for the population of each of the ten kinetic intermediates, and the dotted line is for the total population of the ten intermediates.

The most probable pathway is found to be $C_{35} \to N_{35}$, owing to the low free energy (and large fractional population) of state $C_{35}$. About 95% of the population folds through this pathway. The most probable folding pathways depend on the initial state. For example, folding from an initial state within the native cluster N would be a fast intracluster equilibration.

Figure 8 gives the folding, unfolding, and the relaxation rate for a broad temperature range. The kinetic cluster method gives quite accurate results as compared with the results from the exact 391×391 rate matrix.

FIG. 8.

The temperature dependence of the folding rate kF (dashed line) and the unfolding rate ku (dashed line) solved from the kinetic cluster analysis. The solid line and the symbols are for the relaxation rate kF+ku solved from the cluster model and from the master equation, respectively.

C. Folding with an off-pathway kinetic trap

We assume that the non-native stack (2,3,6,7) is stabilized by a large enthalpic decrease $\Delta H^* > \Delta H_0$: $(-\Delta S^*, -\Delta H^*) = (-2, -3.4)$. Therefore, the disruption of the non-native stack (2,3,6,7) is an off-pathway rate-limiting step due to the large enthalpic barrier $\Delta H^*$. The thermal melting curve (Refs. 43–46) shows a single peak at the melting temperature $T_m = 0.4$. We consider the folding process at $T = 0.2 < T_m$. At $T = 0.2$, the native state [see Fig. 5(A)] occupies a fractional population of 92.3%, and thus the relaxation process is predominantly a folding process. As a result, the relaxation rate is approximately equal to the folding rate $k_F$.

1. Kinetic clusters

The breaking of the non-native stack (2,3,6,7), with a rate constant of $e^{-\Delta H^*/T} = e^{-17}$, is much slower than both the disruption rate $e^{-\Delta H_0/T} = e^{-10}$ and the formation rate $e^{-\Delta S_0} = e^{-5}$ for all the other stacks. Therefore, all conformations with the stack (2,3,6,7) form a pre-equilibrated cluster C, and the disruption of the stack (2,3,6,7) in these conformations is the rate-limiting step in the folding process.

We note that the formation rate $e^{-\Delta S^*} = e^{-2}$ for the rate-limiting stack (2,3,6,7) is faster than the disruption and formation rates for all other stacks. Therefore, it is not appropriate to classify all the conformations without the (2,3,6,7) stack as a pre-equilibrated cluster, because these conformations may form the (2,3,6,7) stack and thus escape the cluster quickly; as a result, conformations without the (2,3,6,7) stack do not form a quasiclosed system, and the conditions in Eq. (6) for the formation of a cluster are not satisfied. However, the folding rate, which is essentially equal to the rate for the detrapping from cluster C, is mainly determined by the populational distribution of the trapped conformations in cluster C, not the distribution of the conformations outside cluster C. Therefore, we can compute the folding rate from the inter-“cluster” transitions: cluster C → (all the conformations outside cluster C). In addition, due to the large rate for the formation of the (2,3,6,7) stack, conformations detrapped from cluster C could quickly return to cluster C. Such a “rebounds” effect plays an important role in the folding kinetics.

2. Detrapping kinetics

We use N to denote the ensemble of conformations outside cluster C, i.e., all the conformations without the stack (2,3,6,7). There are ΩC = 6 conformations in cluster C and ωCN = 6 corresponding detrapping pathways; see Fig. 9. Without considering the rebounds effect, the total detrapping rate kCN is determined by Eq. (5) as the sum over the six pathways.

FIG. 9.

The detrapping pathways for the six trapped states.

We use the detrapping pathway $C_6 \to N_6$ to illustrate the calculation. The transition $C_6 \to N_6$ has a rate constant of $k_{C_6\to N_6} = e^{-\Delta H^*/T} = e^{-17}$ for the breaking of the (2,3,6,7) stack. To account for the rebounds effect, we note that $N_6$ is connected to 25 kinetically neighboring states through 25 different pathways. Among these 25 kinetically neighboring states, one ($C_6$) is in the trapping cluster C and the other 24 are outside C. The pathway (state $N_6$) → (state $C_6$) for the formation of the (2,3,6,7) stack is the recrossing transition and has a rate constant of $k_b(N_6) = e^{-\Delta S^*} = e^{-2}$. Among the other 24 pathways, 23 correspond to the different ways to add a stack to $N_6$ with a rate constant of $e^{-\Delta S_0}$, and one corresponds to the disruption of the (1,2,15,16) stack with a rate constant of $e^{-\Delta H_0/T}$. The total rate for the detrapping through state $N_6$ is given by the sum over the 24 pathways: $k_f(N_6) = 23\,e^{-\Delta S_0} + e^{-\Delta H_0/T} = 23\,e^{-5} + e^{-10}$.

According to Eq. (7), for a folding process starting from state $C_6$, a fraction $k_f(N_6)/(k_b(N_6)+k_f(N_6)) = 53.5\%$ of the population would fold to conformations in N. The remaining 46.5% of the population would recross the barrier and become retrapped in cluster C before refolding to the detrapped ensemble N. Therefore, the effective rate of detrapping from cluster C through the route $C_6 \to N_6 \to$ (ensemble N) is given by Eq. (7): $k^{(\mathrm{eff})}_{C_6\to N} = P_{C_6}\,k_{C_6\to N_6}\,k_f(N_6)/(k_b(N_6)+k_f(N_6)) = 4.41\times10^{-9}$. According to Eq. (8), the total rate for $C \to N$ is determined by the sum over all six trapped conformations $C_1$–$C_6$ in Fig. 9: $k_F = \sum_{i=1}^{6} k^{(\mathrm{eff})}_{C_i\to N} = 1.07\times10^{-8}$. Under the strong folding condition $T = 0.2$, the unfolding rate $k_U$ is small, and the relaxation rate is $\simeq k_F = 1.07\times10^{-8}$. This result is in close agreement with the exact master equation result, which gives a lowest nonzero eigenvalue of $1.01\times10^{-8}$.
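The rebound correction for the route through $N_6$ can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the variable names are ours, and the population $P_{C_6}$ is not given explicitly in the text, so it is back-computed here from the rebound-free rate $0.827\times10^{-8}$ that the paper lists for this pathway.

```python
import math

T = 0.2
dS0, dH0 = 5.0, 2.0            # ordinary stacks: exp(-dS0) = e^-5, exp(-dH0/T) = e^-10
dS_star, dH_star = 2.0, 3.4    # trapping stack (2,3,6,7)

k_break_trap = math.exp(-dH_star / T)            # C6 -> N6, equals e^-17
k_b = math.exp(-dS_star)                         # N6 -> C6 recrossing, e^-2
k_f = 23 * math.exp(-dS0) + math.exp(-dH0 / T)   # 24 forward moves out of N6

commit = k_f / (k_b + k_f)     # fraction of N6 population that does not rebound

# Assumption: P_C6 is inferred from the rebound-free pathway rate quoted in the paper.
P_C6 = 0.827e-8 / k_break_trap
k_eff = P_C6 * k_break_trap * commit             # effective detrapping rate via C6

print(f"commit fraction = {commit:.3f}, k_eff = {k_eff:.3e}")
```

The committed fraction is close to the 53.5% quoted above, and the effective rate agrees with $4.41\times10^{-9}$ to within rounding of the inputs.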

3. The importance of the rebounds effect and the dominant detrapping pathways

The detrapping rates without the rebounds effect are $k_{C_i\to N} = 0.00557$, 0.827, 0.827, 0.827, 0.827, and 0.827 ($\times10^{-8}$) for i = 1, 2, 3, 4, 5, 6, respectively, and the effective rates with the rebounds effect are $k^{(\mathrm{eff})}_{C_i\to N} = 0.00354$, 0.000277, 0.0751, 0.213, 0.34, and 0.44 ($\times10^{-8}$), respectively. The total rate is $4.14\times10^{-8}$ without the rebounds effect and $1.07\times10^{-8}$ with it. Neglecting the rebounds effect therefore causes a significant inaccuracy (an overestimate by a factor of about 4) in the predicted transition rate. In addition, from the largest $k^{(\mathrm{eff})}_{C_i\to N}$ values, we find that the dominant contributions come from the detrapping of states $C_4$, $C_5$, and $C_6$.
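The totals and the dominant detrapping routes follow directly from the six pathway rates listed above. The sketch below simply sums the quoted numbers (in units of $10^{-8}$) and ranks the pathways; the dictionary and variable names are illustrative, not the authors'.

```python
# Pathway rates quoted in the text, in units of 1e-8, for trapped states C1..C6.
k_no_rebound = [0.00557, 0.827, 0.827, 0.827, 0.827, 0.827]
k_eff = [0.00354, 0.000277, 0.0751, 0.213, 0.34, 0.44]

total_no_rebound = sum(k_no_rebound)     # total rate ignoring recrossing
total_eff = sum(k_eff)                   # total rate with the rebounds effect
overestimate = total_no_rebound / total_eff

# Share of the total effective rate carried by each trapped state.
shares = {f"C{i}": k / total_eff for i, k in enumerate(k_eff, start=1)}
dominant = sorted(shares, key=shares.get, reverse=True)[:3]
dominant_share = sum(shares[s] for s in dominant)

print(f"totals: {total_no_rebound:.2f}e-8 vs {total_eff:.2f}e-8, ratio {overestimate:.2f}")
print(f"dominant routes: {dominant}, combined share {dominant_share:.1%}")
```

With these inputs the three largest contributions come from $C_6$, $C_5$, and $C_4$, which together carry more than 90% of the effective detrapping rate.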

D. Two on-pathway high barriers

In this section, we show how to treat a network of clusters in the presence of multiple rate-limiting steps. We assume that the formation of each of the native stacks (2,3,14,15) and (5,6,11,12) is limited by the large entropic cost $\Delta S^*$ ($> \Delta S_0$): $(-\Delta S^*, -\Delta H^*) = (-15, -6)$, so the formation of (2,3,14,15) and the formation of (5,6,11,12) are both rate-limiting steps in the folding process. For the given parameters, the melting temperature of the molecule is found (Refs. 43–46) to be $T_m = 0.4$. We consider the kinetics at temperature $T = 0.25$ ($< T_m$).

1. Kinetic clusters

The conformational space can be classified into the following four clusters: C for the 275 conformations with neither of the two stacks formed, C′ for the 78 conformations with only stack (2,3,14,15) formed, C″ for the 30 conformations with only stack (5,6,11,12) formed, and N for the eight conformations with both stacks formed. The native state [see Fig. 5(A)] is in cluster N, and the fully unfolded state is in cluster C. The existence of the four clusters is evident from the block-form rate matrix shown in Fig. 10, such that the intracluster transition rates are notably higher than the (sparse) intercluster transition rates.

FIG. 10.

The transformed block-form rate matrix obtained by sorting conformational states into four distinctive groups (clusters).

2. Intercluster kinetics

The folding process from cluster C to cluster N can be represented by two parallel pathways: C↔C′↔N and C↔C″↔N. The folding kinetics can be solved from the master equation for the four-state system (C, C′, C″, and N). The key is to compute the 4×4 rate matrix for the transition rates between different clusters.

There are $\omega_{CC''} = 26$ pathways connecting conformations in clusters C and C″, each corresponding to the addition or deletion of the (5,6,11,12) stack, with rates $e^{-\Delta S^*} = e^{-15}$ and $e^{-\Delta H^*/T} = e^{-24}$, respectively. Equations (2)–(5) give the following intercluster transition rates: $k_{C\to C''} = 6.38\times10^{-10}$ and $k_{C''\to C} = 2.02\times10^{-11}$.

Equations (2) and (5) give the following most probable pathways for C→C″: $C_4\to C''_2$; $C_9\to C''_6$; $C_{10}\to C''_3$; $C_5\to C''_4$, each with a rate constant of $1.37\times10^{-10}$. Therefore, among the 26 pathways, the probability for the molecule to take any one of the above four pathways is $1.37\times10^{-10}/6.38\times10^{-10} = 21.5\%$, and thus about 4×21.5% = 86% of the C→C″ transitions proceed through these four pathways.
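The pathway probabilities above are simple ratios of rate constants. The short sketch below repeats the arithmetic and, as an extra illustration that is not in the original text, estimates the mean rate of the remaining 22 pathways; all names are ours.

```python
k_C_to_Cpp = 6.38e-10      # total C -> C'' rate quoted in the text
k_top_pathway = 1.37e-10   # rate of each of the four dominant pathways
n_top, n_paths = 4, 26

p_single = k_top_pathway / k_C_to_Cpp   # probability of one dominant pathway
p_top = n_top * p_single                # probability of any of the four

# Illustrative extra: average rate left for each of the other 22 pathways.
k_rest_mean = (k_C_to_Cpp - n_top * k_top_pathway) / (n_paths - n_top)

print(f"single: {p_single:.1%}, top four: {p_top:.0%}, mean of rest: {k_rest_mean:.2e}")
```

On these numbers each of the remaining pathways is, on average, more than an order of magnitude slower than a dominant one.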

Using a similar analysis, we can compute the intercluster rate constants and pathways for the other clusters, and obtain the following 4×4 rate matrix for the four-cluster system:

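$$
\begin{bmatrix}
-8.54\times10^{-9} & 7.90\times10^{-9} & 6.38\times10^{-10} & 0.0 \\
1.49\times10^{-11} & -2.34\times10^{-9} & 0.0 & 2.33\times10^{-9} \\
2.02\times10^{-11} & 0.0 & -3.904\times10^{-8} & 3.902\times10^{-8} \\
0.0 & 1.89\times10^{-12} & 1.89\times10^{-12} & -3.78\times10^{-12}
\end{bmatrix},
$$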

where the rows and columns are in the order of C, C′, C″, and N. The above rate matrix has the following eigenvalues:

$$\lambda_0 = 0;\quad \lambda_1 = 2.32\times10^{-9};\quad \lambda_2 = 8.56\times10^{-9};\quad \lambda_3 = 3.90\times10^{-8}. \tag{12}$$

From the eigenvector analysis (Ref. 17), we find that the kinetic modes for $\lambda_1$, $\lambda_2$, and $\lambda_3$ correspond to the (rate-limiting) transitions C′→N, C→C′, and C″→N, respectively. This is consistent with the following relationships between the eigenvalues and the intercluster rate constants: $\lambda_1 \simeq k_{C'\to N} + k_{N\to C'}$, $\lambda_2 \simeq k_{C\to C'} + k_{C'\to C}$, and $\lambda_3 \simeq k_{C''\to N} + k_{N\to C''}$.
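The eigenvalues in Eq. (12) can be reproduced, to within rounding, from the printed 4×4 matrix. The sketch below is ours and not the authors' procedure: it uses a plain QR iteration in pure Python (a textbook technique swapped in for a library eigensolver), rebuilds each diagonal element as the total exit rate of the cluster so that probability is conserved, and works in units of $10^{-9}$. The helper names `matmul` and `qr`, the constant shift, and the iteration count are illustrative assumptions.

```python
import math

# Off-diagonal intercluster rates k[i][j] (i -> j) in units of 1e-9,
# ordered C, C', C'', N, copied from the 4x4 matrix in the text.
k = [
    [0.0,    7.90,    0.638,   0.0],
    [0.0149, 0.0,     0.0,     2.33],
    [0.0202, 0.0,     0.0,     39.02],
    [0.0,    0.00189, 0.00189, 0.0],
]
n = len(k)

# Assumption: each diagonal is rebuilt as the total exit rate, so rows of the
# rate matrix sum to zero; A = -(rate matrix) has the relaxation rates as eigenvalues.
A = [[(sum(k[i]) if i == j else -k[i][j]) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)] for i in range(n)]

def qr(M):
    """Modified Gram-Schmidt factorization M = Q R."""
    q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [M[i][j] for i in range(n)]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(q[m] * v[m] for m in range(n))
            v = [v[m] - R[i][j] * q[m] for m in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / R[j][j] for x in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

shift = 1.0   # constant shift keeps the iterated matrix nonsingular (A has eigenvalue 0)
B = [[A[i][j] + (shift if i == j else 0.0) for j in range(n)] for i in range(n)]
for _ in range(500):      # QR iteration: B <- R Q preserves the eigenvalues
    Q, R = qr(B)
    B = matmul(R, Q)

eigs = sorted((B[i][i] - shift) * 1e-9 for i in range(n))
print(eigs)
```

Sorted, the three nonzero values should fall within about 1% of $\lambda_1$, $\lambda_2$, and $\lambda_3$ in Eq. (12); small differences are expected because the printed matrix entries are rounded.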

Figure 11 shows that for a wide range of temperatures (T = 0.25–2.0), the relaxation rates obtained from the kinetic cluster analysis agree with the eigenvalues of the original 391×391 rate matrix.

FIG. 11.

The lines are for the first three nonzero eigenvalues of the 4×4 rate matrix for the four-cluster system for T = 0.2– 1. The symbols are for the first three nonzero eigenvalues of the original 391×391 rate matrix for the complete conformational ensemble.

3. Folding pathways

Initially, the 275 conformations within cluster C quickly equilibrate and are distributed according to their free energies. As a result, the four most stable states in cluster C would appear as kinetic intermediates in the populational kinetics.

From the two lowest eigenmodes $\lambda_1$ and $\lambda_2$, we find that the major folding pathway is C→C′ (rate = $\lambda_2$) followed by C′→N (rate = $\lambda_1$). Furthermore, according to the dominant pathways for C→C′ and C′→N, we obtain the two parallel most probable folding pathways shown in Fig. 12.

FIG. 12.

The most probable folding pathways from C to N.

A notable feature of the dominant pathways is the absence of the C→C″ transition. One might expect that the two parallel pathways, C→C′→N and C→C″→N, would have equal probability because they have exactly the same total kinetic barrier $2T\Delta S^*$, where $T\Delta S^*$ is the kinetic barrier for the formation of either stack (2,3,14,15) or stack (5,6,11,12). However, the kinetic partitioning is determined not only by the total barrier along the pathway, but also by the distribution of the barriers along the pathway.

Physically, because $k_{C\to C''} \ll k_{C\to C'}$, cluster C″ is produced much more slowly than cluster C′, and thus the population in cluster C″ does not accumulate significantly. As a result, most of the population folds along the C→C′→N pathway. According to the intercluster rate constants (see the 4×4 rate matrix above for the four-cluster system), the relative populational partitioning between folding along C→C′ and folding along C→C″ is approximately $k_{C\to C'}/k_{C\to C''} = 7.90\times10^{-9}/6.38\times10^{-10} = 12.38$.

From the populational kinetics for each cluster [see Fig. 13(A)], we find a well-populated transient accumulation for cluster C′, while there is virtually no populational accumulation for cluster C″ during the folding process. Furthermore, in Fig. 13(B), we show the net flux for the transitions C→C′, C′→N, C→C″, and C″→N: $P_{C\to C'}(t) = \int_0^t \left[P_C(t')\,k_{C\to C'} - P_{C'}(t')\,k_{C'\to C}\right] dt'$, etc. We find that the fluxes for the transitions C→C′ and C′→N are much larger than the fluxes for C→C″ and C″→N. These results confirm that the dominant pathway is C→C′→N.
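The cluster populations and net fluxes described here can be illustrated by integrating the four-cluster master equation directly. The following sketch is an illustration under our own assumptions, not the calculation behind Fig. 13: all population starts in cluster C, the integrator is a classical fourth-order Runge-Kutta scheme with a hand-picked step size, and the fluxes are accumulated with the trapezoidal rule.

```python
# Intercluster rates (i -> j) from the 4x4 matrix in the text; order C, C', C'', N.
k = [
    [0.0,      7.90e-9,  6.38e-10, 0.0],
    [1.49e-11, 0.0,      0.0,      2.33e-9],
    [2.02e-11, 0.0,      0.0,      3.902e-8],
    [0.0,      1.89e-12, 1.89e-12, 0.0],
]
n = 4

def dPdt(P):
    """Master equation: gain from the other clusters minus loss to them."""
    return [sum(P[j] * k[j][i] - P[i] * k[i][j] for j in range(n)) for i in range(n)]

def net_flux_rates(P):
    """Instantaneous net fluxes C->C' and C->C''."""
    return (P[0] * k[0][1] - P[1] * k[1][0],
            P[0] * k[0][2] - P[2] * k[2][0])

P = [1.0, 0.0, 0.0, 0.0]     # assumption: all population starts in cluster C
dt, steps = 1.0e7, 500       # hypothetical step size and horizon (t_end = 5e9)
peak = P[:]                  # running maximum population of each cluster
flux_Cp = flux_Cpp = 0.0     # time-integrated net fluxes C->C' and C->C''

for _ in range(steps):
    f1, f2 = net_flux_rates(P)
    # classical fourth-order Runge-Kutta step
    s1 = dPdt(P)
    s2 = dPdt([p + 0.5 * dt * d for p, d in zip(P, s1)])
    s3 = dPdt([p + 0.5 * dt * d for p, d in zip(P, s2)])
    s4 = dPdt([p + dt * d for p, d in zip(P, s3)])
    P = [p + dt * (a + 2 * b + 2 * c + d) / 6 for p, a, b, c, d in zip(P, s1, s2, s3, s4)]
    g1, g2 = net_flux_rates(P)
    flux_Cp += 0.5 * dt * (f1 + g1)      # trapezoidal time integral of the flux
    flux_Cpp += 0.5 * dt * (f2 + g2)
    peak = [max(a, b) for a, b in zip(peak, P)]

print("peak populations (C, C', C'', N):", [round(x, 3) for x in peak])
print("integrated net flux ratio C->C' / C->C'':", round(flux_Cp / flux_Cpp, 1))
```

In this sketch cluster C′ transiently holds more than half of the population, cluster C″ never exceeds a few percent, and the integrated flux ratio stays close to the rate ratio of about 12 quoted above.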

FIG. 13.

(A) The population kinetics of each cluster. (B) The net flux for the cluster transitions.

IV. TEMPERATURE DEPENDENCE OF THE FOLDING RATE

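From Eqs. (2)–(5), the intercluster transition rate $k_{C\to N}$ can be written in the following form:

$$k_{C\to N} = \frac{\sum_{i=1}^{\Omega_{CN}} k_{C_i\to N_i}\,(P_i/P_{\rm path})}{1 + (P_{\rm nonpath}/P_{\rm path})}, \tag{13}$$

where $k_{C_i\to N_i}$ is the rate constant for the transition $C_i\to N_i$, $G_{C_i}$ is the free energy of state $C_i$, the ratio

$$P_i/P_{\rm path} = e^{-G_{C_i}/T} \Big/ \sum_{j=1}^{\Omega_{CN}} e^{-G_{C_j}/T} \tag{14}$$

is the fractional population of the ith pathway conformation, and the ratio

$$P_{\rm nonpath}/P_{\rm path} = \sum_{j=1+\Omega_{CN}}^{\Omega_C} e^{-G_{C_j}/T} \Big/ \sum_{j=1}^{\Omega_{CN}} e^{-G_{C_j}/T} \tag{15}$$

is the relative population between the nonpathway and the pathway conformations in cluster C.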

From Eq. (13), the temperature dependence of the folding rate kF comes from the following two factors:

  1. $P_{\rm nonpath}/P_{\rm path}$, the relative population of the nonpathway and pathway conformations. A larger population of the available pathway conformations gives a larger total intercluster folding rate. Moreover, a larger energy gap between the pathway and nonpathway conformations leads to a stronger temperature dependence of $P_{\rm nonpath}/P_{\rm path}$ and thus a stronger temperature dependence of the folding rate.

  2. The heterogeneity in the population of pathway conformations, as quantified by the fractional distribution $P_i/P_{\rm path}$ for each pathway conformation $C_i$ ($i = 1, 2, \ldots, \Omega_{CN}$). If the different intercluster transitions $C_i\to N_i$ all have the same rate constant $k_{C_i\to N_i} = k_0$, the temperature dependence due to $P_i/P_{\rm path}$ vanishes because $\sum_{i=1}^{\Omega_{CN}} k_{C_i\to N_i}\,(P_i/P_{\rm path}) = k_0$ is a constant.

In general, different pathway conformations can have different rate constants; e.g., pathway i may be slower than pathway j: $k_{C_i\to N_i} < k_{C_j\to N_j}$. In such a case, if the energy $E_i$ of conformation i is higher than the energy $E_j$ of conformation j, the relative population $P_i/P_j \sim e^{-(E_i-E_j)/T}$ increases as the temperature increases, giving a higher probability to fold through the slow pathway $C_i\to N_i$ and thus a slower folding rate.
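Equations (13)–(15) translate into a short function. The sketch below is a minimal illustration with hypothetical inputs: the function name `intercluster_rate`, the free-energy values, and the choice to treat the free energies as fixed numbers (rather than recomputing them from ΔH and ΔS at each temperature) are all our assumptions.

```python
import math

def intercluster_rate(T, G_path, k_path, G_nonpath):
    """Eq. (13) with k_B = 1: returns (k_CN, P_nonpath/P_path)."""
    w_path = [math.exp(-g / T) for g in G_path]        # Boltzmann weights, pathway states
    w_non = [math.exp(-g / T) for g in G_nonpath]      # Boltzmann weights, nonpathway states
    z_path = sum(w_path)
    frac = [w / z_path for w in w_path]                # Eq. (14): P_i / P_path
    ratio = sum(w_non) / z_path                        # Eq. (15): P_nonpath / P_path
    weighted = sum(kk * f for kk, f in zip(k_path, frac))
    return weighted / (1.0 + ratio), ratio

k0 = math.exp(-15.0)              # common rate constant for every pathway
G_path = [-1.0, -2.0]             # hypothetical pathway free energies
G_nonpath = [-3.0, -3.0, 0.0]     # hypothetical nonpathway free energies

for T in (0.2, 0.4, 0.8):
    k_cn, ratio = intercluster_rate(T, G_path, [k0, k0], G_nonpath)
    print(f"T = {T}: P_nonpath/P_path = {ratio:.2f}, k_CN/k0 = {k_cn / k0:.4f}")
```

With equal rate constants the numerator of Eq. (13) reduces to $k_0$, so the printed ratio $k_{CN}/k_0$ equals $1/(1 + P_{\rm nonpath}/P_{\rm path})$ and the temperature dependence comes only from the first factor in the list above.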

The above two factors are determined by the free-energy landscape of the molecule. Different free-energy landscapes, e.g., due to different sequences, can have very different temperature dependences of the folding kinetics. Mutations can have different effects on $P_{\rm nonpath}/P_{\rm path}$ and on $P_i/P_{\rm path}$, and the interplay between these two factors leads to a complex sequence dependence of the temperature dependence of the folding kinetics. For example, mutations can change the folding rate by altering the relative stability and hence the relative distribution ($P_{\rm nonpath}/P_{\rm path}$) of the pathway and nonpathway conformations in cluster C. Since the predominant majority of the nonpathway conformations contain non-native stacks, stabilization of the non-native stacks can lead to more populated (misfolded) nonpathway conformations, causing a larger $P_{\rm nonpath}/P_{\rm path}$ (and a smaller folding rate $k_F$). Moreover, the stabilization of the misfolded conformations in cluster C may cause the formation of metastable misfolded kinetic intermediates. The interplay of these two effects would cause slower folding in general.

Similarly, destabilization of the non-native stacks leads to a smaller population of the nonpathway conformations (i.e., a smaller $P_{\rm nonpath}/P_{\rm path}$) and thus faster folding. For example, in the aforementioned folding model with one on-pathway rate-limiting step, if the entropy and enthalpy parameters of a non-native stack are changed from $(-\Delta S_0, -\Delta H_0) = (-5, -2)$ to $(-\Delta S_n, -\Delta H_n) = (-7, -1.40)$, a non-native stack is destabilized by a free-energy change of $(\Delta H_n - T\Delta S_n) - (\Delta H_0 - T\Delta S_0) = 1$ at temperature $T = 0.2$, and we expect accelerated folding. Indeed, our kinetic cluster analysis shows that the folding rate $k_F$ is significantly increased from $2.087\times10^{-10}$ to $7.59\times10^{-8}$.
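The free-energy change quoted for the destabilized non-native stack can be checked numerically. In this sketch we assume that the quoted pairs are the signed entropy and enthalpy changes of stack formation (so both are negative), and that the stack free energy is G = ΔH − TΔS; the function name and the speedup ratio are our additions.

```python
T = 0.2

def stack_free_energy(dH, dS, T):
    """Free energy of forming one stack, G = dH - T*dS, with signed dH and dS."""
    return dH - T * dS

# Assumption: the listed pairs are signed values, e.g. dS = -5 and dH = -2.
G_old = stack_free_energy(dH=-2.0, dS=-5.0, T=T)     # original non-native stack
G_new = stack_free_energy(dH=-1.40, dS=-7.0, T=T)    # destabilized non-native stack
ddG = G_new - G_old                                  # destabilization at T = 0.2

k_before, k_after = 2.087e-10, 7.59e-8               # folding rates quoted in the text
speedup = k_after / k_before

print(f"G_old = {G_old:.2f}, G_new = {G_new:.2f}, ddG = {ddG:.2f}, speedup = {speedup:.0f}x")
```

Under this sign convention the stack goes from G = −1 to G ≈ 0, and the quoted rates correspond to a folding speedup of a few hundred fold.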

The correlation between the folding speed and the relative stability of the native and non-native stacks would help explain and design mutational folding kinetics experiments. In what follows, based on the 16-nt RNA hairpin model in Fig. 5(A), we investigate the temperature dependence of the folding rate for four mutated model systems.

A. The formation of native stack (6,7,10,11) as a rate-limiting step

Among the 341 conformations in cluster C, there are $\Omega_{CN} = 50$ pathway conformations. All 50 intercluster pathways are for the formation of the rate-limiting stack (6,7,10,11) with the rate constant $k_0 = e^{-\Delta S^*}$. Therefore, according to Eq. (13), the temperature dependence of $k_F$ comes from the factor $P_{\rm nonpath}/P_{\rm path}$. Figures 14(I) and 14(II) show the results for two different models.

FIG. 14.

(A) The temperature dependence of the folding rate $k_F$ (dashed line), the unfolding rate $k_u$ (dotted line), and the relaxation rate $k_F + k_u$ (solid line) solved from the kinetic cluster method. The filled square symbol represents the lowest nonzero eigenvalue of the original 391×391 rate matrix. (B) The temperature dependence of the folding rate $k_F$ (solid line) and the ratio $P_{\rm nonpath}/P_{\rm path}$ for the populations of the misfolded state and the on-pathway state (dashed line).

In Fig. 14(I), we assume (−ΔS0,−ΔH0) = (−3.0, −10.0) for all the native stacks except for the rate-limiting stack (6,7,10,11) which has (−ΔS*,−ΔH*) = (−12.0, −40.0), and we assume (−ΔSn,−ΔHn) = (−3.0,−3.0) for all the non-native stacks. As shown in the figure, as T increases, the nonpathway conformations become more and more stable relative to the pathway conformations, causing a monotonically decreasing folding rate.

Figure 14(II) shows the kinetics for a similar model with a different set of parameters: (−ΔS0,−ΔH0) = (−3.0,−6.0), (−ΔS*,−ΔH*) = (−12.0,−24.0), and (−ΔSn,−ΔHn)=(−4.0,−5.8). The relative stability of the non-pathway conformations first increases then decreases as T is increased, causing a V-shaped kF versus T curve.

B. The formation of stack (5,6,11,12) as a rate-limiting step

In this case, there exist two clusters, N and C, for conformations with and without the (5,6,11,12) stack, respectively. There are $\Omega_C = 353$ conformations in cluster C and $\Omega_N = 38$ conformations in cluster N, and $\Omega_{CN} = 35$ of the conformations in cluster C are pathway conformations. Unlike the model with (6,7,10,11) as the rate-limiting stack, the current model involves inhomogeneous rate constants for different intercluster pathways. There are 32 pathways for the formation of the rate-limiting stack (5,6,11,12) alone, each with a rate constant of $k_1 = e^{-\Delta S^*}$. The remaining three pathways correspond to the formation of two consecutive stacks [see Fig. 3(C)]: the (5,6,11,12) stack and a non-rate-limiting native stack. Each of these three pathways has a rate constant of $k_2 = e^{-(\Delta S^* + \Delta S_0)}$. In this case, both $P_{\rm nonpath}/P_{\rm path}$ and $P_i/P_{\rm path}$ for each pathway conformation $C_i$ ($i = 1, 2, \ldots, \Omega_{CN}$) contribute to the temperature dependence of $k_F$. The combination of $P_{\rm nonpath}/P_{\rm path}$ and $P_i/P_{\rm path}$ can give very complex $k_F$ versus T behavior. Figures 14(III) and 14(IV) show the results for two different models.

In Fig. 14(III), we assume $(-\Delta S_0, -\Delta H_0) = (-3.0, -10.0)$, $(-\Delta S^*, -\Delta H^*) = (-12.0, -40.0)$, and $(-\Delta S_n, -\Delta H_n) = (-3.0, -3.0)$. The factor $P_{\rm nonpath}/P_{\rm path}$ shows a V-shaped behavior as a function of T, which alone would tend to cause a Λ shape (increasing then decreasing) of the folding rate. However, the $P_i/P_{\rm path}$ factor tends to cause monotonically faster folding at higher temperatures. This is because the conformations $C_i$ on the 32 fast-folding pathways (with rate $k_1$) become more stable than the slow-folding conformations, so $P_i/P_{\rm path}$ is larger for the fast-folding conformations. The combination of the above two factors leads to a monotonically increasing folding rate.

In Fig. 14(IV), we assume $(-\Delta S_0, -\Delta H_0) = (-8.0, -10.0)$, $(-\Delta S^*, -\Delta H^*) = (-15.0, -40.0)$, and $(-\Delta S_n, -\Delta H_n) = (-3.0, -3.0)$. The $P_{\rm nonpath}/P_{\rm path}$ factor, which shows an inverted N shape, dominates over the $P_i/P_{\rm path}$ factor, causing an N shape for the $k_F$ versus T curve.

V. APPLICATION TO REALISTIC RNA FOLDING KINETICS

For a realistic RNA hairpin-forming sequence, the clustering is so complex that any method based on simple inspection of the rate constants becomes impossible. The free energies and the rate matrix can be constructed by using the experimentally measured enthalpy and entropy parameters (Ref. 41) ΔH and ΔS for all possible base stacks. The sequence and the native structure of the hairpin-forming RNA are shown in Fig. 15(A). There are in total 879 native and non-native structures for this sequence.

FIG. 15.

(A) The native structure for a realistic RNA hairpin-forming sequence. (B) The populational kinetics for a folding reaction at T = 30 °C. Kinetics for clusters (cluster 2, 3, 4, and 5) whose fractional population never exceeds 10% are not shown in the figure. (C) The network of kinetic pathways between different clusters. The completely unfolded state is in cluster 1 and the native state is in cluster 7.

After examining the enthalpies and entropies of formation for all the possible base stacks, we find that there are two on-pathway rate-limiting steps, corresponding to the formation of the native stacks (3,4,18,19) = (U,C,G,A) and (5,6,16,17) = (G,A,U,C), and two off-pathway rate-limiting steps, corresponding to the disruption of the non-native stacks (5,6,11,12) = (G,A,U,C) and (11,12,18,19) = (U,C,G,A). According to these rate-limiting steps, the conformational space can be classified into the following seven clusters: cluster 1 for the 586 conformations without any of the four stacks formed, cluster 2 for the 105 conformations with stack (3,4,18,19), cluster 3 for the 51 conformations with stack (5,6,11,12), cluster 4 for the 76 conformations with stack (5,6,16,17), cluster 5 for the 36 conformations with stack (11,12,18,19), cluster 6 for the five conformations with both stacks (3,4,18,19) and (5,6,11,12), and cluster 7 for the 20 conformations with both stacks (3,4,18,19) and (5,6,16,17). The native state is in cluster 7, the fully unfolded state is in cluster 1, and there are three misfolded traps: clusters 3, 5, and 6.

We consider a folding process starting from the completely unfolded state in cluster 1 at T = 30 °C. The chain initially undergoes fast equilibration to form the quasiequilibrium cluster 1. After the initial formation of cluster 1, the chain folds by different pathways to the native state, which has a fractional population of 85% in the final equilibrium state. As shown in Fig. 15(C), some of the folding population directly crosses an on-pathway rate-limiting step (e.g., pathways 1→2→7 and 1→4→7), and some of the population undergoes trapping and detrapping before folding to the native state by crossing an on-pathway rate-limiting step (e.g., 1→5→1→4→7, 1→3→6→2→7, etc.).

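Using Eqs. (2)–(5), we obtain the 7×7 rate matrix k for the seven-cluster system

$$
\mathbf{k} =
\begin{bmatrix}
-42.0 & 18.6 & 2.00 & 7.16 & 14.2 & 0.0 & 0.0 \\
2.01 & -1.023\times10^{3} & 0.0 & 0.0 & 0.0 & 0.213 & 1.021\times10^{3} \\
29.4 & 0.0 & -58.5 & 0.0 & 0.0 & 29.1 & 0.0 \\
2.98 & 0.0 & 0.0 & -7.93\times10^{2} & 0.0 & 0.0 & 7.90\times10^{2} \\
3.66 & 0.0 & 0.0 & 0.0 & -3.66 & 0.0 & 0.0 \\
0.0 & 29.4 & 29.4 & 0.0 & 0.0 & -58.8 & 0.0 \\
0.0 & 0.493 & 0.0 & 9.88\times10^{-2} & 0.0 & 0.0 & -0.592
\end{bmatrix},
$$

where the matrix element $k_{ij}$ ($i \neq j$) is the rate constant for the transition from cluster i to cluster j. The above rate matrix has the following eigenvalues in increasing order: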

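$$\lambda_0 = 0;\quad \lambda_1 = 2.30;\quad \lambda_2 = 27.6;\quad \lambda_3 = 44.5;\quad \lambda_4 = 88.5;\quad \lambda_5 = 7.93\times10^{2};\quad \lambda_6 = 1.02\times10^{3}. \tag{16}$$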

Using the eigenvector analysis (Ref. 17), we can identify the kinetic modes for each of the eigenvalues. For example, the lowest nonzero eigenvalue $\lambda_1$ corresponds to the intercluster transition 5→1 for the detrapping of the stack (11,12,18,19), $\lambda_2$ corresponds to the detrapping of stack (5,6,11,12), and $\lambda_3$ corresponds to the formation of the on-pathway rate-limiting stacks (3,4,18,19) and (5,6,16,17). From the rate matrix, we can also estimate the fractional population for each pathway starting from cluster 1.

  1. Cluster 1→cluster 5 (misfolded): $k_{15}/(-k_{11}) = 34\%$, which means that about 34% of the population will first be misfolded into cluster 5 before detrapping (eigenmode $\lambda_1$). Moreover, since $k_{15} > k_{51}$, we would expect kinetic accumulation in cluster 5.

  2. Cluster 1→cluster 3 (misfolded): k13/(−k11) = 5%, which implies that about 5% of the population would first fold into cluster 3.

  3. Cluster 1→cluster 2 (on-pathway) and cluster 1→cluster 4 (on-pathway): $(k_{12}+k_{14})/(-k_{11}) = 61\%$. Since cluster 2 is kinetically connected to the misfolded trapping clusters 3 and 6, the fractional population that folds without being trapped would be less than 61%, and the fractional population that folds through trapping–detrapping processes would be more than 5%. Moreover, because $k_{27} \gg k_{12}$ and $k_{47} \gg k_{14}$, the population in clusters 2 and 4 quickly folds into the native cluster 7. As a result, there is no significant kinetic accumulation of population in clusters 2 and 4.
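The three branching fractions in the list above follow from the first row of the 7×7 rate matrix. The sketch below repeats that arithmetic; the dictionary layout and variable names are ours, and the factor of 10 used to express "much larger" is an arbitrary illustrative threshold.

```python
# Exit rates from cluster 1, read from the first row of the 7x7 matrix.
k_from_1 = {2: 18.6, 3: 2.00, 4: 7.16, 5: 14.2}
k_out = sum(k_from_1.values())      # total exit rate, i.e. -k11 (about 42.0)

frac_to_5 = k_from_1[5] / k_out                      # first step into trap cluster 5
frac_to_3 = k_from_1[3] / k_out                      # first step into trap cluster 3
frac_on_path = (k_from_1[2] + k_from_1[4]) / k_out   # first step into clusters 2 or 4

# Rates used in the accumulation arguments (also read from the matrix).
k51, k27, k47 = 3.66, 1.021e3, 7.90e2
accumulates_in_5 = k_from_1[5] > k51                 # entry faster than escape
drains_fast = k27 > 10 * k_from_1[2] and k47 > 10 * k_from_1[4]

print(f"{frac_to_5:.0%} {frac_to_3:.0%} {frac_on_path:.0%}")   # prints: 34% 5% 61%
print(accumulates_in_5, drains_fast)
```

Both boolean checks come out true for the quoted rates, matching the expectation of accumulation in cluster 5 and of no significant accumulation in clusters 2 and 4.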

According to Eqs. (16) and (1), we obtain the population of the native cluster 7: $P_{\rm cluster\,7}(t) = P^{\rm eq}_{\rm cluster\,7} - 0.40\,e^{-\lambda_1 t} - 0.156\,e^{-\lambda_2 t} - 0.462\,e^{-\lambda_3 t}$, where $P^{\rm eq}_{\rm cluster\,7} = 0.999$ is the equilibrium population of cluster 7. Because the fractional population of the native state within cluster 7 is $P^{\rm eq}_{\rm native\,state}/P^{\rm eq}_{\rm cluster\,7} = 0.85$, we obtain the following native populational kinetics: $P_{\rm native\,state}(t) = 0.85 - 0.33\,e^{-\lambda_1 t} - 0.13\,e^{-\lambda_2 t} - 0.39\,e^{-\lambda_3 t}$. In Fig. 15(B), we plot the populational kinetics curves for the native state and for all seven clusters. Indeed, we find that cluster 5 is a kinetic intermediate, while clusters 2 and 4 do not show significant kinetic accumulation, which agrees well with the analysis.

To validate our kinetic cluster analysis, we have also solved the eigenmodes for the original 879×879 rate matrix for the complete 879 conformational states. The first seven eigenvalues are

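$$\lambda_0 = 0;\quad \lambda_1 = 2.27;\quad \lambda_2 = 27.5;\quad \lambda_3 = 43.8;\quad \lambda_4 = 88.1;\quad \lambda_5 = 7.92\times10^{2};\quad \lambda_6 = 1.02\times10^{3}. \tag{17}$$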

We find that the first seven eigenvalues of the 879×879 rate matrix agree nearly exactly with the above eigenvalues of the 7×7 rate matrix for the intercluster kinetics. Furthermore, the native populational kinetics solved from the original 879×879 rate matrix is given by $P_{\rm native\,state}(t) = 0.85 - 0.34\,e^{-\lambda_1 t} - 0.13\,e^{-\lambda_2 t} - 0.38\,e^{-\lambda_3 t}$, which also agrees very well with the results from the cluster analysis.

VI. DISCUSSIONS

We have developed a kinetic cluster method to analyze the folding rates and folding pathways. The method is based on the classification of the conformational ensemble into clusters. Different clusters are separated by high kinetic barriers, and thus the conformations in each cluster can pre-equilibrate before crossing the intercluster barriers. In terms of the clusters, the overall kinetic process can be represented as the intercluster transitions.

Our intercluster transition rate calculation accounts for the effect of multiple pathways and the effect of possible local minima on the free-energy landscape between the clusters. We are able to identify the dominant pathways from the intercluster kinetic analysis. In addition, we found that for nontrapping local minima, the increase of the stability may accelerate the folding process.

Conformations can be classified into different clusters in different temperature regimes. For example, if the formation of a native stack n causes a large entropic loss ΔSn, conformations without the rate-limiting stack n would form a pre-equilibrated cluster C. If there exists a non-native stack nn that requires a large enthalpic cost ΔHnn for disruption, then for low temperatures T < ΔHnn/ΔSn, the barrier ΔHnn (for the breaking of nn) would exceed the barrier TΔSn (for the formation of n), and thus the disruption of the misfolded conformations with stack nn is slower than the formation of the rate-limiting native stack n. As a result, conformations with stack nn must be separated out from cluster C to form a separate cluster. However, for temperatures T > ΔHnn/ΔSn, the misfolded conformations with stack nn can quickly equilibrate with other conformations in cluster C, and thus need not be treated as a separate cluster.
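The temperature criterion in this paragraph amounts to comparing two elementary rates. The sketch below encodes that comparison; the function name and the parameter values (ΔHnn = 3.4, ΔSn = 15) are hypothetical and chosen only for illustration.

```python
import math

def needs_separate_trap_cluster(T, dH_nn, dS_n):
    """True when breaking the non-native stack nn, rate exp(-dH_nn/T), is slower
    than forming the rate-limiting native stack n, rate exp(-dS_n)."""
    return math.exp(-dH_nn / T) < math.exp(-dS_n)

# Hypothetical parameters, for illustration only.
dH_nn, dS_n = 3.4, 15.0
T_cross = dH_nn / dS_n          # crossover temperature dH_nn / dS_n

for T in (0.20, 0.30):
    print(T, needs_separate_trap_cluster(T, dH_nn, dS_n))
print(f"crossover temperature = {T_cross:.3f}")
```

Below the crossover temperature the function returns true, meaning the conformations containing stack nn would be split off as their own cluster; above it they can be merged into cluster C.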

Moreover, within a given temperature regime, the conformations can be classified into the same set of clusters, but the folding rate can be quite different at different temperatures. Based on the kinetic cluster analysis, we are able to analyze the temperature dependence of the folding and unfolding rates from the relative stability between the pathway conformations and the nonpathway conformations and that between different pathway conformations. The ability to predict and analyze the temperature dependence of the folding rate would help us design sequences with a specific temperature dependence of the folding rate.

For all the sequences and energy landscapes that we have investigated, we found that the quasiequilibrium condition was satisfied for all the conformations within the clusters. We assume that the intracluster transitions are fast so that fast equilibration between conformations can be realized. For conformations that cannot be directly converted through a single kinetic move, we assume that there exist fast routes through multiple (fast) kinetic moves to connect these conformations. In fact, such fast intracluster routes have been identified for a number of conformations that we have examined.

The present theory has been illustrated by the computation with the simple hairpin-forming molecules. For the purpose of illustrating the principle of the method, we used simplified models, but the kinetic cluster theory developed here is general and can be developed to treat the folding problems of complex biopolymers with realistic chain length.

Acknowledgments

This work has been supported by grants from NIH (GM063732) and from AHA National Center (0130064N).

References
