Significance
Theoretical and experimental efforts during the last two decades have revealed many important features of protein folding. However, these efforts have largely focused on single-domain proteins, and new methods and concepts are needed to understand the rich variety of multidomain proteins. To understand the relationship between the topology and folding pathways of multidomain proteins, we developed an Ising-like structure-based model. Our simulated results showed that a single dominant pathway is selected in an example protein, dihydrofolate reductase (DHFR), and that this pathway avoids a rapid decrease in conformation entropy during the course of folding. In contrast, in its circular permutant, in which the topological complexity of wild-type DHFR is resolved, two pathways coexist, resulting in complex folding behavior.
Keywords: folding intermediates, internal friction, eWSME model
Abstract
How do the folding mechanisms of multidomain proteins depend on protein topology? We addressed this question by developing an Ising-like structure-based model and applying it to the analysis of the free-energy landscapes and folding kinetics of an example protein, Escherichia coli dihydrofolate reductase (DHFR). DHFR has two domains: one comprises discontinuous N- and C-terminal parts of the chain, and the other comprises a continuous middle part. The simulated folding pathway of DHFR is a sequential process in which the continuous domain folds first, followed by the discontinuous domain. This order avoids the rapid decrease in conformation entropy that the association of the N- and C-terminal parts would cause during the early phase of folding. Our simulated results are consistent with the experimental data on folding kinetics and predict an off-pathway structural fluctuation at equilibrium. In a circular permutant in which the topological complexity of wild-type DHFR is resolved, the balance between energy and entropy is modulated, resulting in the coexistence of the two folding pathways. This coexistence of pathways should account for the experimentally observed complex folding behavior of the circular permutant.
The topology of a protein conformation, that is, the spatial arrangement of structural units and the chain connectivity among them, is a key determinant of protein folding mechanisms (1–5). However, predicting a folding pathway is a subtle problem when a protein comprises multiple regions of cooperative structure formation (i.e., foldons or domains). Suppose a protein has n such cooperative regions, and each region tends to show a two-state–like transition between ordered and disordered states. The protein as a whole can then have $2^n$ conformation states, and multiple folding routes passing through them are possible. The statistical weights of these folding routes should be determined both by the interactions among structural regions and by the strength of cooperativity within individual regions (6). When multiple competing routes coexist, the observed folding pathway of an ensemble of molecules should be a superposition of these routes. The dominant folding pathway can then switch when the solution conditions are changed or mutations are introduced. The multiplicity and flexibility of pathways are important even for small single-domain proteins such as ribosomal protein S6 (7, 8) and are evident for proteins that have repeating structures (9–13). For proteins comprising multiple domains (14), the multiplicity of possible folding pathways is significant. In proteins with n independently foldable domains, several factors should determine the relative importance of the conformation states in the folding process. These are the length and structure of the domains (13), the topological connectivity of the linkers between domains (3), and the interactions at the domain interfaces (3, 15, 16). Fig. 1A shows an example protein for the case $n = 2$.
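The counting in the preceding paragraph can be made concrete. Treat each of the n regions as strictly two-state, and consider only routes that order one region per step. The $2^n$ coarse states are then the vertices of an n-dimensional cube, and there is one route for each ordering of the regions, $n!$ routes in total. The short Python sketch below enumerates both. It illustrates this counting under those simplifying assumptions only and is not part of the model developed in this paper. The function names are hypothetical.

```python
from itertools import permutations, product


def coarse_states(n):
    """All 2**n assignments of ordered (1) / disordered (0) to n cooperative regions."""
    return list(product((0, 1), repeat=n))


def sequential_routes(n):
    """Routes from the all-disordered to the all-ordered state that order one
    region per step; there is one route per ordering of the regions (n! in total)."""
    routes = []
    for order in permutations(range(n)):
        state = [0] * n
        path = [tuple(state)]
        for region in order:
            state[region] = 1  # this region becomes ordered
            path.append(tuple(state))
        routes.append(path)
    return routes


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, len(coarse_states(n)), len(sequential_routes(n)))
    for route in sequential_routes(2):  # the two routes for n = 2
        print(route)
```

For n = 2 this gives four states and two routes, (0, 0) → (1, 0) → (1, 1) and (0, 0) → (0, 1) → (1, 1). The relative statistical weights of these routes are what the interactions and cooperativity discussed above must decide.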
Fig. 1.
Examples of two-domain proteins with different topological complexities. (A) Human γD-crystallin (PDB ID: 1HK0), which has two independently foldable domains connected by a single linker. (B) DHFR (PDB ID: 1rx1), which is topologically more complex, comprising two domains, DLD (blue) and ABD (pink). DLD is a discontinuous domain comprising the N- and C-terminal parts of the chain, and ABD and DLD are connected by two linkers. The positions of linkers are designated by red arrows.
The above mechanism for determining folding intermediates and pathways of multidomain proteins is not applicable when domains have mutually correlated folding tendencies. In particular, the correlation between domains may be significant in a topologically complex protein, which has a domain comprising multiple discontinuous parts of the chain. For example, consider a discontinuous domain that consists of an N-terminal and a C-terminal segment of the chain, and a continuous domain that consists of the segment between them. Continuous parts of the chain tend to form “islands” of ordered structure (17, 18) and can serve as folding nuclei. The discontinuous domain may therefore not be an independent folding unit but may instead depend on the continuous domain. In this paper, we theoretically analyze how a folding pathway is selected in multidomain proteins that have a discontinuous domain. We use Escherichia coli dihydrofolate reductase (DHFR) as an example and compare it with its circular permutant, which consists only of continuous domains.
As shown in Fig. 1B, DHFR is a 159-residue protein consisting of two domains: a discontinuous loop domain (DLD) (residues 1–37 and 107–159) and an adenosine-binding domain (ABD) (residues 38–106). DLD does not consist of a single contiguous region of the chain but of separate N- and C-terminal parts, so the structural ordering of DLD can be correlated with that of ABD. DHFR has been intensively investigated as a model protein (19–29), and these studies have led to a picture in which DHFR folds along the following pathway:
[1] |
Here is an intermediate exhibiting heterogeneous compactness with DLD being only partially compacted but ABD attaining a native-like compactness (19). appeared in after folding was initiated from the unfolded state (U). During , further structural development was observed both in ABD and in DLD (19), which led to in which the secondary structures were reasonably formed (20–22) and two subsets of hydrogen-bonding networks were formed in ABD and DLD (23). During ms, structures of ABD and DLD were further organized, which led to the hyperfluorescent intermediate state, , consisting of four substates, , which matured through four parallel pathways on timescales of s to reach the four native conformers, collectively denoted by {N} in Eq. 1 (24–27). It is plausible that the slow process (several hundred seconds) during the phases is due to intense “internal friction” (30–32) in the glassy dynamics of conformation (33), including formation/disruption of nonnative contacts, the effects of proline isomerization, and the cis–trans isomerization of Gly95 and Gly96. Apart from this complexity during the last phase, the folding scheme in Eq. 1 can be regarded as a hierarchical assembly of structures that begins from the ordering of each domain at the early phase of and proceeds to the formation of the whole protein during the later phase of (19). The questions, therefore, are how such a sequential pathway is realized in DHFR and how the topological complexity of DHFR affects pathway selection.
The Extended WSME Model
In this paper, we describe the configuration of each residue in a coarse-grained manner using an Ising-like binary variable $m_i$. We set $m_i = 1$ when the ith residue has the native-like configuration and $m_i = 0$ when it is disordered. DHFR has 10 proline residues, for which $m_i = 1$ represents the trans isomer and $m_i = 0$ the cis isomer. The degree of DHFR folding is then represented by order parameters defined in the individual domains, $M_{\mathrm{DLD}} = \sum_{i \in \mathrm{DLD}} m_i$ and $M_{\mathrm{ABD}} = \sum_{i \in \mathrm{ABD}} m_i$. The folding order parameter of the whole DHFR is $M = M_{\mathrm{DLD}} + M_{\mathrm{ABD}}$.
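As a concrete illustration of these order parameters, the following Python sketch counts the native-like residues in each DHFR domain from a binary configuration. It uses the domain boundaries given in the text (DLD: residues 1–37 and 107–159; ABD: residues 38–106). The function `order_parameters` and the list layout of the configuration are our own choices for this sketch.

```python
# Domain boundaries of E. coli DHFR (1-based residue numbers), as given in the text.
N_RES = 159
DLD = set(range(1, 38)) | set(range(107, 160))  # residues 1-37 and 107-159
ABD = set(range(38, 107))                        # residues 38-106


def order_parameters(m):
    """Return (M_DLD, M_ABD, M) for a configuration m.

    m is a sequence of 159 binary values; m[i - 1] is 1 if residue i is
    native-like and 0 if it is disordered.
    """
    if len(m) != N_RES:
        raise ValueError(f"expected {N_RES} values, got {len(m)}")
    m_dld = sum(m[i - 1] for i in DLD)
    m_abd = sum(m[i - 1] for i in ABD)
    return m_dld, m_abd, m_dld + m_abd


if __name__ == "__main__":
    # A configuration in which only ABD is native-like.
    abd_only = [1 if i in ABD else 0 for i in range(1, N_RES + 1)]
    print(order_parameters(abd_only))
```

With these boundaries DLD has 90 residues and ABD has 69. The configuration in the usage example therefore gives (0, 69, 69).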
Molecular dynamics (MD) simulations of a structure-based coarse-grained model showed the existence of an on-pathway intermediate state in the folding of DHFR (34). Although this simulated intermediate was interpreted as (34), its DLD was poorly structured, making it more similar to than to or , in which the structure of DLD is reasonably formed (19). Because experimental data for and became available (19, 21, 22) after this MD result was reported, we revisit the problem of the DHFR folding intermediates in this paper. To highlight the relationship between the topology of the native conformation and the folding pathway, we develop a simple theoretical model. It extends the structure-based model first developed by Wako and Saito (35, 36) and systematically applied to proteins by Muñoz and Eaton (18). In this study, we apply the extended version of the Wako–Saito–Muñoz–Eaton (WSME) model to the early phases , , and of DHFR folding. The effects of topology should be more evident in these phases than in the final processes, during which the atomic packing inside the protein is completed. We circumvent the complexity of the slow phases by using a structure [Protein Data Bank (PDB) ID: 1rx1] that binds the reduced form of nicotinamide adenine dinucleotide phosphate (NADPH) as the model native conformation. Ligand binding stabilizes, among the four coexisting native conformers, the one that can accommodate the ligand (37). We therefore expect that this choice of model native conformation emphasizes the pathway toward that conformer.
To account for proline isomerization, we modify the original model (18, 35, 36) and express the energy in the WSME model as a function of the configuration of all residues,
[2] |
where is the energy stabilization due to the formation of a native contact between residues i and j, and . The pattern of native pairs is defined by setting when residues i and j are in contact in the native conformation and otherwise. , where , is the energy cost of proline isomerization and is the ith residue type. Although the i dependence of represents the heterogeneity of the chemical environment around individual proline residues, in this study, we assume the same value, , for all proline sites for simplicity. (See Materials and Methods for the definition of contacts and the parameter values of and .) It should be noted that the many-residue effects are explicitly considered in the first term of Eq. 2. This many-body feature allows this simple model to adequately describe the free-energy surfaces (3, 6, 18, 38, 39) and kinetics (17, 39, 40) of many proteins. An additional advantage of using this model is that the analytical forms of partition functions, and , can be exactly derived (41, 42); thus, free-energy surfaces and other quantities are readily calculated from to obtain a clear view of the folding process. The one-dimensional representation is , where is a sum over patterns of under the constraint of a given M, and is the entropic gain due to structural disordering at the ith residue. (See Materials and Methods for values of .) In a similar manner, the 2D representation, , is defined by applying the constrained sum .
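The many-body contact term and the constrained sum can be illustrated on a toy chain. This Python sketch makes several assumptions:

- It uses the common WSME form of the contact energy, $E = -\sum_{i<j} \varepsilon \Delta_{ij} \prod_{k=i}^{j} m_k$, with a single stabilizing energy $\varepsilon > 0$ ($m_k = 1$ for native-like and $0$ for disordered residues).
- It omits the proline term.
- It assigns the same entropy gain s to every disordered residue.

It computes $F(M)$ by brute-force enumeration of all $2^N$ configurations. This is feasible only for very short chains and is not the exact analytical method of refs. 41 and 42. The contact list and parameter values are hypothetical.

```python
import math
from itertools import product

KB = 0.0019872  # Boltzmann constant, kcal/(mol K)


def wsme_energy(m, contacts, eps):
    """Assumed WSME contact energy: the native contact (i, j), i < j, contributes
    -eps only if every residue from i through j is native-like (m[k] == 1)."""
    return -eps * sum(1 for i, j in contacts if all(m[i:j + 1]))


def free_energy_profile(n, contacts, eps, s, temp):
    """F(M) = -kT ln Z(M), M = 0..n, by enumerating all 2**n configurations.

    s is the entropy gain (kcal/(mol K)) of each disordered residue, so each
    configuration has weight exp(-(E - T * s * n_disordered) / kT).
    """
    beta = 1.0 / (KB * temp)
    z = [0.0] * (n + 1)
    for m in product((0, 1), repeat=n):
        n_ordered = sum(m)
        g = wsme_energy(m, contacts, eps) - temp * s * (n - n_ordered)
        z[n_ordered] += math.exp(-beta * g)  # constrained sum at fixed M
    return [-KB * temp * math.log(zm) for zm in z]


if __name__ == "__main__":
    # Hypothetical 10-residue chain with a few native contacts.
    contacts = [(0, 4), (1, 3), (5, 9), (6, 8), (2, 7)]
    for m_value, f in enumerate(free_energy_profile(10, contacts, 1.5, 0.003, 300.0)):
        print(m_value, round(f, 3))
```

Because a contact counts only when the whole stretch from i to j is native-like, disordering a single residue in the middle of an island removes every contact that spans it. This is the many-residue effect referred to above.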
The partition function of the WSME model can be calculated exactly because it is a sum over mosaic patterns of native-like islands in the protein chain (41). This emphasis on continuous regions with the native-like configuration is therefore an important feature of the WSME model. It is also why the model describes the cooperativity among multiple residues of single-domain proteins well (17). However, because of this emphasis on many-residue effects, the WSME model may overestimate the cooperativity among residues in larger proteins. As illustrated in Fig. 2, in the WSME model a native contact is formed only within a continuous island of ordered structure (Fig. 2A). The contact is lost when an intervening residue becomes disordered and directs the chain in the “wrong” direction (Fig. 2B). When a long flexible segment between the residues to be contacted is disordered, however, the chain could recover the proper direction and form the native contact (Fig. 2C). The WSME model assumes the statistical weight of this latter case to be small, because a chain confined in a compact domain should not have sufficient room to form a long loop that can adjust its direction. This assumption should be invalid for multidomain proteins with complex topologies, in which two domains are connected by multiple linkers. Here, to include the statistical weight neglected in the WSME model, we develop an extended version of the WSME model (eWSME model). With the eWSME model, we avoid overestimating cooperativity without losing the desired emphasis on continuous regions.
Fig. 2.
Illustration of how a native contact is formed in the eWSME model. (A) A native contact is formed between i and j when all residues from i through j have the native-like configuration and residues i and j are in contact in the native conformation. (B) The native contact is lost when a residue between i and j is disordered. (C) The native contact could also be formed when a long flexible segment lies between i and j, but this case is not taken into account in the original WSME model. (D) The Boltzmann weight of the configuration in C can be included in the partition function in the same way as in the summation of the WSME model when the chain is virtually closed into a ring. Residues with the native-like configuration are shown by colored circles and those with disordered configurations by white circles.
We first consider a virtual system in which the N- and C-terminal parts of the original protein are linked to form a ring. The energy function of this system is defined by
[3] |
where , , and ; thus, the native contacts between the ordered segments at the N- and C-terminal parts are considered, even when ABD is disordered in DHFR (Fig. 2D). By calculating or from in a manner similar to calculating or , we can calculate the partition function of the eWSME model by
[4] |
where is the entropic cost to bring the N and C termini to positions in the native conformation, which is estimated by assuming a Gaussian chain of length with as the total number of residues and Å. A persistence length of 20 Å is used (43) (Materials and Methods). when , and when , so that is the interpolation between and . For , we have and . Eq. 4 can be exactly calculated, and thus the 1D and 2D free-energy surfaces of folding, and , and other quantities are derived from . We also investigate the kinetics of folding, using the eWSME model with Monte Carlo simulations.
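The effect of the virtual ring on which contacts are counted can be sketched in a few lines of Python. This follows our reading of Fig. 2D. Take a contact between a residue i in the N-terminal part and a residue j in the C-terminal part. In the ring, it can be formed through the arc that runs from j to the C terminus and, across the virtual link, from the N terminus to i. This holds even when the middle segment (ABD in DHFR) is disordered. The sketch only tests whether a given contact is counted. It does not reproduce the energy function of Eq. 3 or the weighting by the ring-closure entropy in Eq. 4. Residue indices are 0-based, and the function names are hypothetical.

```python
def contact_formed_linear(m, i, j):
    """Original WSME rule: residues i..j (0-based, i < j) must all be native-like."""
    return all(m[i:j + 1])


def contact_formed_ring(m, i, j):
    """Assumed virtual-ring rule: the contact (i, j) is counted if either arc of
    the ring joining i and j is entirely native-like, i.e., residues i..j, or
    residues j..N-1 followed, across the virtual N-C link, by residues 0..i."""
    wrap_arc = all(m[j:]) and all(m[:i + 1])
    return contact_formed_linear(m, i, j) or wrap_arc


if __name__ == "__main__":
    # Ordered N- and C-terminal segments with a disordered middle segment,
    # as when ABD is disordered but both parts of DLD are native-like.
    m = [1, 1, 0, 0, 0, 0, 1, 1]
    print(contact_formed_linear(m, 1, 6), contact_formed_ring(m, 1, 6))
```

In the usage example, the linear rule rejects the contact between residues 1 and 6 because the middle segment is disordered. The ring rule accepts it through the wrap-around arc.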
Results
Free-Energy Surfaces and Kinetics of Folding.
A cysteine-free mutant (C85A/C152S) of DHFR (AS-DHFR) has been frequently used in experiments to prevent the formation of a nonnative disulfide bond during the folding process. To compare the calculated results with experimental data, we also consider AS-DHFR in this paper. Fig. 3 shows the one-dimensional free-energy surface, , of AS-DHFR folding that is calculated using the eWSME model at different temperatures. Three basins in this free-energy landscape correspond to the unfolded (U), intermediate (I), and native (N) states. U and N have the same free energy at the folding temperature: K.
Fig. 3.
One-dimensional representation of the free-energy surface, , of AS-DHFR at different temperatures. M is the number of residues with the native-like configuration.
The basin of the intermediate, I, in is a superposition of multiple basins, which are resolved in the 2D representation of the free-energy surface, , as shown in Fig. 4A. On this 2D surface, which is calculated at temperature K, we find two distinct basins, and . Furthermore, is found around where the slope of the free-energy surface vanishes and the population of folding trajectories can accumulate. At these candidate intermediates and at U, the degree of local structural ordering of individual residues, , is calculated as shown in Fig. 4B, where is the average calculated by . In , ABD is almost native-like, but DLD is poorly structured. In , ABD and the N-terminal part of DLD are native-like, but the C-terminal part remains disordered. The N-terminal part of DLD folds earlier than the C-terminal part because it lies close to ABD (Fig. 1B) and therefore forms more native contacts with ABD than the C-terminal part does. In , DLD is almost native-like, but ABD is poorly structured.
Fig. 4.
Two-dimensional representation of the DHFR free-energy surface resolves the multiplicity of intermediate states. (A) (Upper) Free-energy surface of AS-DHFR folding, , at K. (Lower) The one-dimensional cross section of the free-energy surface at . (B) Expected values for the degree of local structural ordering, , at states U, , , and . Approximately 90% of proline residues remain in trans in the unfolded state, so that 10 proline residues show spikes of in the U state.
With this coexistence of multiple states, two different pathways are possible: The one that connects them is , which we refer to as , and the other is , which we refer to as . Along and , the heights of the free-energy barriers to be surmounted are and , respectively. Because , it would be expected that is the dominant folding pathway. We confirm this expectation by analyzing the kinetics of folding trajectories obtained from Monte Carlo (MC) simulations with the eWSME model. For these MC simulations, we adopt the single-site–updating Metropolis algorithm previously used for a single-domain protein with the WSME model (39). We focus on processes within 1 s and therefore neglect the extremely slow cis–trans isomerization of proline residues, so that the proline sites retain their configuration in the initial U state. In the U state, ∼90% of proline residues assume the trans configuration with our parameterization of (Materials and Methods). We interpret the experimentally observed intermediate as a state kinetically trapped by intense internal friction, including isomerization of the remaining 10% of proline residues. On this interpretation, most of the molecules in reside within the basin of N on the free-energy surface in our representation. (See Materials and Methods for MC calculations.)
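The argument from barrier heights can be made quantitative under a rough assumption. Suppose the flux through each pathway scales as $\exp(-\Delta F^{\ddagger}/k_BT)$ with equal prefactors, an assumption we make only for this illustration. The ratio of fluxes then depends only on the barrier difference. The following Python sketch evaluates this. The barrier values in the usage example are hypothetical and are not the values computed in this study.

```python
import math

KB = 0.0019872  # Boltzmann constant, kcal/(mol K)


def pathway_flux_ratio(barrier_1, barrier_2, temp):
    """k1/k2 for two competing pathways, assuming Arrhenius-like rates
    exp(-dF/kT) with equal prefactors; barriers in kcal/mol, temp in K."""
    return math.exp(-(barrier_1 - barrier_2) / (KB * temp))


def fraction_via_first(barrier_1, barrier_2, temp):
    """Fraction of the folding flux that passes through pathway 1."""
    ratio = pathway_flux_ratio(barrier_1, barrier_2, temp)
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # Hypothetical barriers: pathway 1 lower by 2 kcal/mol at 300 K.
    print(round(fraction_via_first(3.0, 5.0, 300.0), 3))
```

A barrier difference of 2 kcal/mol at 300 K already sends about 97% of the flux through the lower-barrier route. Even a modest difference can therefore make one pathway dominant. Equal barriers split the flux evenly, which is the situation relevant to the coexistence of pathways.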
Fig. 5A shows the logarithms of the extent of folding in individual domains, and , where the overbar denotes the average over an ensemble of 200 MC trajectories at a given time instant t, and and are the numbers of residues in DLD and ABD, respectively. Starting from the ensemble of initial conformations distributed around the basin of U at , the temporal development of and exhibits multiple phases. During the initial phase of , the structure of ABD develops while the DLD structure remains undeveloped. Here represents N trials of changing at randomly chosen residues during the MC calculations; this unit should correspond to the characteristic timescale for changing the whole chain configuration. During , the structure of DLD develops much more slowly than that of ABD, which indicates a shift of the population of trajectories toward . This initial phase is followed by further development of both and during . For , and are fitted by lines with gradients of and , respectively, corresponding to the time constants and . These large values of and are due to the slowness of proline isomerization, as can be verified by changing the frequency of updating proline residues in the MC simulations (Fig. S1). Each value of is near 1 or 0 at the designated points on the plane shown in Fig. 4B. Nevertheless, the structural order averaged over the ensemble of MC trajectories develops gradually in each domain (Fig. 5A). This is in agreement with the observed gradual development of structural order (19).
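Converting the slope of a log-linear fit into a time constant can be sketched as follows. We assume that the fitted quantity y(t) decays as a single exponential, $y \propto \exp(-t/\tau)$, so that $\ln y$ is linear in t with slope $-1/\tau$. The exact quantity plotted in Fig. 5A may be defined differently. The synthetic data and function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b x; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def time_constant(ts, signal):
    """tau from a straight-line fit of ln(signal) = const - t / tau."""
    _, slope = fit_line(ts, [math.log(v) for v in signal])
    return -1.0 / slope


if __name__ == "__main__":
    # Synthetic single-exponential decay with tau = 4 (arbitrary time units).
    ts = [float(t) for t in range(10)]
    ys = [0.5 * math.exp(-t / 4.0) for t in ts]
    print(time_constant(ts, ys))
```

A small gradient in such a plot corresponds to a long time constant. This is why the slow proline-related processes appear as nearly flat lines in the last phase.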
Fig. 5.
The folding kinetics of AS-DHFR obtained with MC simulations. (A) Time evolution of the logarithm of the extent of folding in each domain, (red) and (green). and have three phases for , , and . The last phase was fitted by lines with small slopes of for and for . (B–D) Distribution of 200 MC trajectories plotted on the 2D free-energy surface, , at characteristic times of (B), (C), and (D).
At the characteristic times of , , and , the distributions of positions of MC trajectories in the 2D space of are shown in Fig. 5 B–D. Comparison with the free-energy surface in Fig. 4A shows that no population of the simulated trajectories shifts toward , and the simulated DHFR folding follows a single sequential pathway of , which is consistent with experimental observations (19–29). In the later phase of folding, internal friction should begin to lengthen the timescale of the folding process (30–32). This effect can be taken into account by regarding as a function of M, , a relationship similar to the one used previously (30). Here α is the ratio of the effective internal viscosity to the solution viscosity. It was estimated as by assuming 0.8 centipoise for the solution viscosity and by fitting the WSME results to the data for small proteins (30). is , where is the average energy at a given M. [See Fig. S2 for .] In this study, we estimate , using the values of M calculated at and , and the M obtained by extrapolating the calculated trajectories to . Using these values for α and with ps, the simulated timescales are , , and ms, which agree semiquantitatively with the experimentally observed values of , , and ms (19). In multidomain proteins, the heterogeneous distribution of frustration (44, 45), particularly at the domain interface, should make internal friction more evident than in small proteins. Thus, should depend nonlinearly on , which would lengthen the late-phase timescales further than in the present estimates. Therefore, the simulated phases , , and correspond to the observed phases , , and , respectively. In addition, Fig. 5 B–D suggests that the simulated intermediates and correspond to and , respectively.
By decomposing the free energy into energy and conformation entropy (Fig. S3 A and B), we find that is entropically more stable than and that is energetically more stable than . Accordingly, the folding pathway selected in DHFR folds the continuous domain, ABD, earlier than the discontinuous domain, DLD. This order avoids the rapid entropy reduction that the assembly of the N- and C-terminal parts in would cause. The existence of two distinct basins, and , suggests that each domain can be energetically stabilized by the formation of native contacts within it. In addition, because of the topology of DHFR, DLD folding requires a larger reduction in entropy than ABD folding does. This entropic effect can account for the strong preference for , which has a lower than .
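The decomposition of the free energy at a fixed order parameter can be illustrated on a toy chain. At each M, the Boltzmann-averaged energy $E(M)$ is computed over the configurations with that M, and the entropy follows from $S(M) = [E(M) - F(M)]/T$. The Python sketch below makes the following assumptions:

- It uses brute-force enumeration.
- The energy is WSME-type: a contact (i, j) contributes $-\varepsilon$ only if residues i through j are all native-like.
- Each disordered residue carries a uniform entropy s.

This is not the calculation behind Fig. S3, which uses the eWSME partition function. All parameter values are hypothetical.

```python
import math
from itertools import product

KB = 0.0019872  # Boltzmann constant, kcal/(mol K)


def decompose(n, contacts, eps, s, temp):
    """Return (F, E, S) as lists over M = 0..n for a toy WSME-type chain.

    Assumed model: a contact (i, j) contributes -eps only if residues i..j are
    all native-like; each disordered residue contributes entropy s.
    F(M) = -kT ln Z(M), E(M) = Boltzmann-averaged energy, S(M) = (E - F) / T.
    """
    beta = 1.0 / (KB * temp)
    z = [0.0] * (n + 1)
    e_weighted = [0.0] * (n + 1)
    for m in product((0, 1), repeat=n):
        n_ordered = sum(m)
        energy = -eps * sum(1 for i, j in contacts if all(m[i:j + 1]))
        weight = math.exp(-beta * (energy - temp * s * (n - n_ordered)))
        z[n_ordered] += weight
        e_weighted[n_ordered] += energy * weight
    free = [-KB * temp * math.log(zm) for zm in z]
    energy_avg = [ew / zm for ew, zm in zip(e_weighted, z)]
    entropy = [(e - f) / temp for e, f in zip(energy_avg, free)]
    return free, energy_avg, entropy


if __name__ == "__main__":
    F, E, S = decompose(8, [(0, 3), (4, 7), (2, 5)], 1.0, 0.002, 300.0)
    for m_value in range(9):
        print(m_value, round(F[m_value], 3), round(E[m_value], 3), round(S[m_value], 5))
```

The entropy obtained this way includes both the residue entropy s and the multiplicity of configurations at a given M. This is the sense in which conformation entropy is used in the comparison of intermediates above.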
Although the basin does not appear along the folding pathway, should be populated at equilibrium as a fluctuation from the basin of the N state. Therefore, the present results predict that individual DHFR molecules at equilibrium should show either of two types of fluctuation: fluctuation along PathABD or fluctuation along PathDLD. This prediction could be tested with single-molecule observations of DHFR at equilibrium. With the original WSME model (18, 35, 36), however, the basin is absent from the calculated 2D free-energy surface (Fig. S4). This is because that model overestimates the cooperativity among residues and consequently overemphasizes chain connectivity. The eWSME model corrects this overestimation and restores the balance between the effects of chain connectivity and those of the dense distribution of native contacts in DLD.
Circular Permutant.
To analyze the effects of changing the topology, circular permutants of DHFR (cp-DHFR) have been investigated experimentally (22, 28, 29). One circular permutant was obtained by cutting the chain between residues 38 and 39 and connecting the original N and C termini. In this permutant, both ABD and DLD comprise continuous parts of the chain, so that the topological complexity of wild-type DHFR, or the discontinuity in DLD, is resolved. The correlation between ABD and DLD should therefore be reduced, which would enhance the multiplicity or flexibility of pathways. Indeed, for this cp-DHFR, the observed kinetic fluorescence data were well fitted by two exponentials, which suggests that two parallel pathways coexist (29). This coexistence is reproduced in the calculated 2D free-energy surface. Because of the destabilization of and the stabilization of in cp-DHFR (Fig. 6A), the accessibilities to and from U are similar to each other with and . By decomposing the free energy into energy and conformation entropy (Fig. S3 C–F), we find that is both entropically and energetically stabilized by this circular permutation. The reason is that the circular permutation aids the formation of native contacts within the now continuous DLD. Although is entropically destabilized because the entropic cost of across the 38–39 cleavage is introduced, the energetic effects dominate the change in the free-energy landscape. They shift the balance between energy and entropy to lower the free-energy barrier . The simulated kinetic population shift indicates that the ensemble of MC trajectories splits into two directions (Fig. 6 B–D), which shows that the folding process is a superposition of two pathways. Thus, the experimentally observed complexity of the folding pathways can be attributed to the resolution of the topological complexity of the native conformation.
Fig. 6.
Two-dimensional representation of change in the free-energy surface and temporal shifts of populations of MC trajectories for folding of a circular permutant of DHFR (cp-DHFR). (A) Change in the free-energy surface of folding, at K, caused by the circular permutation. (B–D) Distributions of 200 MC trajectories simulated at K plotted on the 2D space of at (B), (C), and (D). (E) Effects of circular permutation (CP) on folding kinetics are illustrated schematically.
Summary and Discussion
For many single-domain proteins, the energy landscape perspective has been successfully applied to understand their folding mechanisms (33, 46). The multiplicity and flexibility of folding pathways have also been investigated for multidomain proteins (3, 4, 15, 16, 33). Questions remain, however, about how the different types of folding mechanisms of multidomain proteins can be distinguished from the energy landscape perspective. These types are folding through multiple or flexible pathways and folding along a single sequential pathway. Using a simple structure-based model, we have shown that the folding mechanism is determined by the balance between two tendencies. One is to avoid the steep decrease in entropy caused by the association of discontinuous parts of the chain. The other is to promote energetic stabilization through the formation of compact domains.
Using DHFR as an example of a multidomain protein, we have shown that the entropic mechanism reflecting chain connectivity is an important factor in determining the folding mechanism. The continuous domain, ABD, folds earlier than the discontinuous domain, DLD. The intermediate in which DLD is structured but ABD is disordered does not appear on the folding pathway but can appear during equilibrium fluctuations. This should result in a heterogeneous distribution of fluctuations in the ensemble of DHFR molecules. The flexibility of ABD should be important for regulating the allosteric transition of DHFR (47). In a circular permutant of DHFR, in which both ABD and DLD are continuous domains, both pathways are accessible from the unfolded state. Fig. 6E shows a schematic of the entropic mechanism that determines folding pathways. The balance between the entropic and energetic mechanisms can be shifted by changing the topology, and complex folding behavior can emerge when the topology of the native conformation is simplified. In more general cases, it would be intriguing to examine how this balance shifts when some parts of a protein are energetically more stabilized. In particular, functionally important parts may be energetically stabilized through evolutionary design and may fold in the early phase (48) even if the corresponding folding route is entropically unfavorable. Early folding of functional loops has been observed in a single-domain protein, IL-1β (49).
The eWSME model, modified from the original WSME model, was shown to be suitable for describing proteins that have discontinuous domains. This modification can be extended further to more general cases. In proteins that have three or more domains, the effects of the association of discontinuous parts can be considered by adding terms that represent the virtual closure of loops. Each such term corresponds to the energetic stabilization of a domain independently of the other domains. The required virtual loop closures should be determined from diagrams that represent the chain connectivity among domains (50). This method should contribute to a unified view of the various folding mechanisms of multidomain proteins.
Materials and Methods
Entropy Cost of Ring Closure.
The entropic cost of ring closure in Eq. 4 was estimated by taking the logarithm of the probability that the two ends of a Gaussian chain are at a given distance r, . This expression was derived in a manner similar to equation 2 in ref. 43 but under a different geometrical constraint, where , Å is the distance between neighboring atoms, Å is the persistence length of a peptide chain, and r is the distance between the Cα atoms of the N- and C-terminal ends of the chain in the native conformation.
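This Python sketch shows the general form of such an estimate, though not the exact expression used here. It uses the textbook radial distribution of a Gaussian chain. The mean-square end-to-end distance is taken from the worm-like chain in its long-chain limit, $\langle R^2 \rangle = 2 l_p L$, with contour length $L = n b$. The geometric constraint used in this study (following ref. 43) differs and is not reproduced. The persistence length of 20 Å is the value stated in the text. The bond length b = 3.8 Å (a typical Cα–Cα distance), the shell width dr, and the example numbers are our assumptions.

```python
import math

KB = 0.0019872  # Boltzmann constant, kcal/(mol K)


def mean_square_end_to_end(n_bonds, bond_length=3.8, persistence_length=20.0):
    """<R^2> (A^2) of a worm-like chain in its Gaussian limit: 2 * l_p * L."""
    contour_length = n_bonds * bond_length
    return 2.0 * persistence_length * contour_length


def ring_closure_entropy(r, n_bonds, bond_length=3.8, persistence_length=20.0, dr=1.0):
    """Entropy change (kcal/(mol K)) on fixing the chain ends at distance r (A).

    P(r) dr = 4 pi r^2 (3 / (2 pi <R^2>))^(3/2) exp(-3 r^2 / (2 <R^2>)) dr,
    and the entropy change is kB ln[P(r) dr] for an assumed shell width dr (A).
    """
    if r <= 0.0:
        raise ValueError("r must be positive")
    r2 = mean_square_end_to_end(n_bonds, bond_length, persistence_length)
    density = (4.0 * math.pi * r**2
               * (3.0 / (2.0 * math.pi * r2)) ** 1.5
               * math.exp(-3.0 * r**2 / (2.0 * r2)))
    return KB * math.log(density * dr)


if __name__ == "__main__":
    # Hypothetical example: 100 bonds, ends 10 A apart in the native structure.
    ds = ring_closure_entropy(10.0, 100)
    print(f"dS = {ds:.4f} kcal/(mol K); -T dS at 300 K = {-300.0 * ds:.2f} kcal/mol")
```

For ends that are close in the native structure, the cost grows with chain length because $\langle R^2 \rangle$ grows. Closing a long loop is therefore entropically more expensive than closing a short one.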
Contacts Between Residues.
The native conformation of AS-DHFR (C85A/C152S mutant of DHFR) was assumed to be the same as that of wild-type DHFR (PDB ID: 1rx1). The pattern of contacts among residues was read from 1rx1 by deleting the sulfur atom of residue 85 and by assuming that the sulfur atom of residue 152 was replaced by an oxygen atom at the same position. Residues i and j were defined to be in contact in the native conformation, and therefore , when the distance between any pair of heavy atoms in the two residues was less than 4 Å for . The same used for AS-DHFR was used to define the native contacts in the cp-DHFR model. The ligand atoms were neglected in the model.
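The contact definition can be sketched in Python as follows. We assume that heavy-atom coordinates have already been parsed from the PDB file and grouped per residue; parsing is not shown. The 4 Å cutoff is the value stated in the text. The minimum sequence separation `min_sep` is a hypothetical placeholder for the sequence-separation condition in the contact definition. The function also returns the number of heavy-atom pairs in contact for each residue pair, the quantity used to set the contact energies under Parameters.

```python
import math
from itertools import combinations


def native_contacts(residues, cutoff=4.0, min_sep=1):
    """Contact map from heavy-atom coordinates.

    residues: list where residues[i] is a list of (x, y, z) heavy-atom
    coordinates (A) of residue i (0-based). Returns {(i, j): n_ij} for residue
    pairs with j - i >= min_sep that have at least one heavy-atom pair closer
    than cutoff; n_ij is the number of such atom pairs.
    """
    contacts = {}
    for i, j in combinations(range(len(residues)), 2):
        if j - i < min_sep:
            continue
        n_pairs = sum(
            1
            for a in residues[i]
            for b in residues[j]
            if math.dist(a, b) < cutoff
        )
        if n_pairs:
            contacts[(i, j)] = n_pairs
    return contacts


if __name__ == "__main__":
    # Hypothetical three-residue toy coordinates (A).
    toy = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], [(10.0, 0.0, 0.0), (6.5, 0.0, 0.0)]]
    print(native_contacts(toy))
```

`math.dist` requires Python 3.8 or later. For a full protein, a spatial index or vectorized distance computation would be faster than this all-pairs loop, but the loop keeps the definition transparent.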
Parameters.
The parameter in Eq. 2 was defined by , where when the number of pairs of heavy atoms in contact, , was , , , , and . In Figs. 4 and 5, simulations were performed at K. The value kcal/mol was selected to make the simulated folding temperature close to the observed value: K (29). when the ith residue was proline, cal⋅mol−1⋅K−1 when the ith residue was Gly, and cal⋅mol−1⋅K−1 for other residues (41). Eighty to ninety percent of proline residues are considered to remain in the trans configuration upon unfolding (27, 51), and we selected kcal/mol to make this ratio 90% for the sake of computational efficiency.
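A two-level estimate shows how a proline energy can be tuned to a target trans fraction. In this Python sketch we assume the following for an isolated proline site in the unfolded state:

- There is no entropy difference between the isomers.
- The cis isomer carries an extra energy $\varepsilon_P$ relative to trans.

The trans fraction is then $p = 1/[1 + \exp(-\varepsilon_P/k_BT)]$, so $\varepsilon_P = k_BT \ln[p/(1-p)]$. For p = 0.9 this is $k_BT \ln 9$. This simplification and the temperature used in the example are our assumptions, not the parameterization reported above.

```python
import math

KB = 0.0019872  # Boltzmann constant, kcal/(mol K)


def trans_fraction(eps_p, temp):
    """Equilibrium trans fraction of a two-level proline site (cis penalised by eps_p)."""
    return 1.0 / (1.0 + math.exp(-eps_p / (KB * temp)))


def eps_for_trans_fraction(p, temp):
    """Cis penalty (kcal/mol) that yields trans fraction p at temperature temp (K)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return KB * temp * math.log(p / (1.0 - p))


if __name__ == "__main__":
    eps = eps_for_trans_fraction(0.9, 300.0)  # hypothetical temperature
    print(round(eps, 2), round(trans_fraction(eps, 300.0), 3))
```

At 300 K this two-level estimate gives roughly 1.3 kcal/mol for a 90% trans fraction.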
Monte Carlo Simulations.
Corresponding to in Eq. 4, we defined the energy function as . At every step we generated a trial configuration by updating the variable at a randomly chosen site i, and we accepted or rejected the trial by using as the effective energy in the Metropolis algorithm. The initial conformations for these MC simulations were prepared by performing MC calculations at K.
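The following Python sketch shows a single-site Metropolis update on a toy WSME-type chain. The effective energy here is an assumption: the contact energy minus T times the entropy of disordered residues. It is not necessarily the exact expression used for the simulations in this study. One MC step is taken as N single-site trials, in line with the time unit described in Results. The function names, contact list, and parameters are hypothetical.

```python
import math
import random

KB = 0.0019872  # Boltzmann constant, kcal/(mol K)


def effective_energy(m, contacts, eps, s, temp):
    """Assumed effective energy: WSME contact energy minus T times the entropy
    gained by disordered residues (s per residue, kcal/(mol K))."""
    energy = -eps * sum(1 for i, j in contacts if all(m[i:j + 1]))
    return energy - temp * s * m.count(0)


def metropolis_sweep(m, contacts, eps, s, temp, rng):
    """One MC step = len(m) single-site trials; m (list of 0/1) is updated in place."""
    beta = 1.0 / (KB * temp)
    current = effective_energy(m, contacts, eps, s, temp)
    for _ in range(len(m)):
        i = rng.randrange(len(m))
        m[i] = 1 - m[i]  # trial flip: native-like <-> disordered
        trial = effective_energy(m, contacts, eps, s, temp)
        delta = trial - current
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            current = trial  # accept the trial configuration
        else:
            m[i] = 1 - m[i]  # reject: restore the previous value
    return m


if __name__ == "__main__":
    rng = random.Random(0)
    chain = [0] * 12  # start fully disordered
    toy_contacts = [(0, 3), (1, 4), (6, 9), (7, 10), (2, 8)]
    for step in range(200):
        metropolis_sweep(chain, toy_contacts, 1.2, 0.002, 300.0, rng)
    print(sum(chain), chain)
```

Recomputing the full energy at every trial keeps the sketch simple. A production code would update only the contacts whose spanning segment contains site i.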
Supplementary Material
Acknowledgments
We thank Drs. Munehito Arai and George Chikenji for fruitful discussions. This work was supported by Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research, Strategic Programs for Innovative Research, and the Computational Materials Science Initiative, Japan.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
See Commentary on page 15863.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1406244111/-/DCSupplemental.
References
- 1. Shank EA, Cecconi C, Dill JW, Marqusee S, Bustamante C. The folding cooperativity of a protein is controlled by its chain topology. Nature. 2010;465(7298):637–640. doi: 10.1038/nature09021.
- 2. Ferreiro DU, Wolynes PG. The capillarity picture and the kinetics of one-dimensional protein folding. Proc Natl Acad Sci USA. 2008;105(29):9853–9854. doi: 10.1073/pnas.0805287105.
- 3. Itoh K, Sasai M. Cooperativity, connectivity, and folding pathways of multidomain proteins. Proc Natl Acad Sci USA. 2008;105(37):13865–13870. doi: 10.1073/pnas.0804512105.
- 4. Li W, Terakawa T, Wang W, Takada S. Energy landscape and multiroute folding of topologically complex proteins adenylate kinase and 2ouf-knot. Proc Natl Acad Sci USA. 2012;109(44):17789–17794. doi: 10.1073/pnas.1201807109.
- 5. Klimov DK, Thirumalai D. Symmetric connectivity of secondary structure elements enhances the diversity of folding pathways. J Mol Biol. 2005;353(5):1171–1186. doi: 10.1016/j.jmb.2005.09.029.
- 6. Itoh K, Sasai M. Multidimensional theory of protein folding. J Chem Phys. 2009;130(14):145104. doi: 10.1063/1.3097018.
- 7. Lindberg MO, Oliveberg M. Malleability of protein folding pathways: A simple reason for complex behaviour. Curr Opin Struct Biol. 2007;17(1):21–29. doi: 10.1016/j.sbi.2007.01.008.
- 8. Haglund E, et al. The HD-exchange motions of ribosomal protein S6 are insensitive to reversal of the protein-folding pathway. Proc Natl Acad Sci USA. 2009;106(51):21619–21624. doi: 10.1073/pnas.0907665106.
- 9. Chavez LL, Gosavi S, Jennings PA, Onuchic JN. Multiple routes lead to the native state in the energy landscape of the β-trefoil family. Proc Natl Acad Sci USA. 2006;103(27):10254–10258. doi: 10.1073/pnas.0510110103.
- 10. Werbeck ND, Itzhaki LS. Probing a moving target with a plastic unfolding intermediate of an ankyrin-repeat protein. Proc Natl Acad Sci USA. 2007;104(19):7863–7868. doi: 10.1073/pnas.0610315104.
- 11. Ferreiro DU, Walczak AM, Komives EA, Wolynes PG. The energy landscapes of repeat-containing proteins: Topology, cooperativity, and the folding funnels of one-dimensional architectures. PLOS Comput Biol. 2008;4(5):e1000070. doi: 10.1371/journal.pcbi.1000070.
- 12. Schafer NP, et al. Discrete kinetic models from funneled energy landscape simulations. PLoS ONE. 2012;7(12):e50635. doi: 10.1371/journal.pone.0050635.
- 13. Hagai T, Azia A, Trizac E, Levy Y. Modulation of folding kinetics of repeat proteins: Interplay between intra- and interdomain interactions. Biophys J. 2012;103(7):1555–1565. doi: 10.1016/j.bpj.2012.08.018.
- 14. Han J-H, Batey S, Nickson AA, Teichmann SA, Clarke J. The folding and evolution of multidomain proteins. Nat Rev Mol Cell Biol. 2007;8(4):319–330. doi: 10.1038/nrm2144.
- 15. Arviv O, Levy Y. Folding of multidomain proteins: Biophysical consequences of tethering even in apparently independent folding. Proteins. 2012;80(12):2780–2798. doi: 10.1002/prot.24161.
- 16. Wang Y, Chu X, Suo Z, Wang E, Wang J. Multidomain protein solves the folding problem by multifunnel combined landscape: Theoretical investigation of a Y-family DNA polymerase. J Am Chem Soc. 2012;134(33):13755–13764. doi: 10.1021/ja3045663.
- 17. Henry ER, Best RB, Eaton WA. Comparing a simple theoretical model for protein folding with all-atom molecular dynamics simulations. Proc Natl Acad Sci USA. 2013;110(44):17880–17885. doi: 10.1073/pnas.1317105110.
- 18. Muñoz V, Eaton WA. A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc Natl Acad Sci USA. 1999;96(20):11311–11316. doi: 10.1073/pnas.96.20.11311.
- 19. Arai M, Iwakura M, Matthews CR, Bilsel O. Microsecond subdomain folding in dihydrofolate reductase. J Mol Biol. 2011;410(2):329–342. doi: 10.1016/j.jmb.2011.04.057.
- 20. Kuwajima K, Garvey EP, Finn BE, Matthews CR, Sugai S. Transient intermediates in the folding of dihydrofolate reductase as detected by far-ultraviolet circular dichroism spectroscopy. Biochemistry. 1991;30(31):7693–7703. doi: 10.1021/bi00245a005.
- 21. Arai M, Iwakura M. Probing the interactions between the folding elements early in the folding of Escherichia coli dihydrofolate reductase by systematic sequence perturbation analysis. J Mol Biol. 2005;347(2):337–353. doi: 10.1016/j.jmb.2005.01.033.
- 22. Arai M, Maki K, Takahashi H, Iwakura M. Testing the relationship between foldability and the early folding events of dihydrofolate reductase from Escherichia coli. J Mol Biol. 2003;328(1):273–288. doi: 10.1016/s0022-2836(03)00212-2.
- 23. Jones BE, Matthews CR. Early intermediates in the folding of dihydrofolate reductase from Escherichia coli detected by hydrogen exchange and NMR. Protein Sci. 1995;4(2):167–177. doi: 10.1002/pro.5560040204.
- 24. Touchette NA, Perry KM, Matthews CR. Folding of dihydrofolate reductase from Escherichia coli. Biochemistry. 1986;25(19):5445–5452. doi: 10.1021/bi00367a015.
- 25. Jennings PA, Finn BE, Jones BE, Matthews CR. A reexamination of the folding mechanism of dihydrofolate reductase from Escherichia coli: Verification and refinement of a four-channel model. Biochemistry. 1993;32(14):3783–3789. doi: 10.1021/bi00065a034.
- 26. Heidary DK, O’Neill JC, Jr, Roy M, Jennings PA. An essential intermediate in the folding of dihydrofolate reductase. Proc Natl Acad Sci USA. 2000;97(11):5866–5870. doi: 10.1073/pnas.100547697.
- 27. Texter FL, Spencer DB, Rosenstein R, Matthews CR. Intramolecular catalysis of a proline isomerization reaction in the folding of dihydrofolate reductase. Biochemistry. 1992;31(25):5687–5691. doi: 10.1021/bi00140a001.
- 28. Smith VF, Matthews CR. Testing the role of chain connectivity on the stability and structure of dihydrofolate reductase from E. coli: Fragment complementation and circular permutation reveal stable, alternatively folded forms. Protein Sci. 2001;10(1):116–128. doi: 10.1110/ps.26601.
- 29. Svensson A-KE, Zitzewitz JA, Matthews CR, Smith VF. The relationship between chain connectivity and domain stability in the equilibrium and kinetic folding mechanisms of dihydrofolate reductase from E. coli. Protein Eng Des Sel. 2006;19(4):175–185. doi: 10.1093/protein/gzj017.
- 30. Cellmer T, Henry ER, Hofrichter J, Eaton WA. Measuring internal friction of an ultrafast-folding protein. Proc Natl Acad Sci USA. 2008;105(47):18320–18325. doi: 10.1073/pnas.0806154105.
- 31. Chahine J, Oliveira RJ, Leite VB, Wang J. Configuration-dependent diffusion can shift the kinetic transition state and barrier height of protein folding. Proc Natl Acad Sci USA. 2007;104(37):14646–14651. doi: 10.1073/pnas.0606506104.
- 32. Best RB, Hummer G. Coordinate-dependent diffusion in protein folding. Proc Natl Acad Sci USA. 2010;107(3):1088–1093. doi: 10.1073/pnas.0910390107.
- 33. Ferreiro DU, Komives EA, Wolynes PG. Frustration in biomolecules. arXiv:1312.0867. 2013.
- 34. Clementi C, Jennings PA, Onuchic JN. How native-state topology affects the folding of dihydrofolate reductase and interleukin-1β. Proc Natl Acad Sci USA. 2000;97(11):5871–5876. doi: 10.1073/pnas.100547897.
- 35. Wako H, Saito N. Statistical mechanical theory of protein conformation I. General considerations and the application to homopolymers. J Phys Soc Jpn. 1978;44:1931–1938.
- 36. Wako H, Saito N. Statistical mechanical theory of protein conformation II. Folding pathway for protein. J Phys Soc Jpn. 1978;44:1939–1945.
- 37. Zhang Z, Rajagopalan PTR, Selzer T, Benkovic SJ, Hammes GG. Single-molecule and transient kinetics investigation of the interaction of dihydrofolate reductase with NADPH and dihydrofolate. Proc Natl Acad Sci USA. 2004;101(9):2764–2769. doi: 10.1073/pnas.0400091101.
- 38. Itoh K, Sasai M. Flexibly varying folding mechanism of a nearly symmetrical protein: B domain of protein A. Proc Natl Acad Sci USA. 2006;103(19):7298–7303. doi: 10.1073/pnas.0510324103.
- 39. Yu W, et al. Cooperative folding kinetics of BBL protein and peripheral subunit-binding domain homologues. Proc Natl Acad Sci USA. 2008;105(7):2397–2402. doi: 10.1073/pnas.0708480105.
- 40. Zamparo M, Pelizzola A. Kinetics of the Wako-Saitô-Muñoz-Eaton model of protein folding. Phys Rev Lett. 2006;97(6):068106. doi: 10.1103/PhysRevLett.97.068106.
- 41. Bruscolini P, Pelizzola A. Exact solution of the Muñoz-Eaton model for protein folding. Phys Rev Lett. 2002;88(25 Pt 1):258101. doi: 10.1103/PhysRevLett.88.258101.
- 42. Gō N, Abe H. Noninteracting local-structure model of folding and unfolding transition in globular proteins. I. Formulation. Biopolymers. 1981;20(5):991–1011. doi: 10.1002/bip.1981.360200511.
- 43. Galzitskaya OV, Finkelstein AV. A theoretical search for folding/unfolding nuclei in three-dimensional protein structures. Proc Natl Acad Sci USA. 1999;96(20):11299–11304. doi: 10.1073/pnas.96.20.11299.
- 44. Ferreiro DU, Hegler JA, Komives EA, Wolynes PG. Localizing frustration in native proteins and protein assemblies. Proc Natl Acad Sci USA. 2007;104(50):19819–19824. doi: 10.1073/pnas.0709915104.
- 45. Wensley BG, et al. Separating the effects of internal friction and transition state energy to explain the slow, frustrated folding of spectrin domains. Proc Natl Acad Sci USA. 2012;109(44):17795–17799. doi: 10.1073/pnas.1201793109.
- 46. Onuchic JN, Wolynes PG. Theory of protein folding. Curr Opin Struct Biol. 2004;14(1):70–75. doi: 10.1016/j.sbi.2004.01.009.
- 47. Terada TP, Kimura T, Sasai M. Entropic mechanism of allosteric communication in conformational transitions of dihydrofolate reductase. J Phys Chem B. 2013;117(42):12864–12877. doi: 10.1021/jp402071m.
- 48. Nagao C, Terada TP, Yomo T, Sasai M. Correlation between evolutionary structural development and protein folding. Proc Natl Acad Sci USA. 2005;102(52):18950–18955. doi: 10.1073/pnas.0509163102.
- 49. Capraro DT, Gosavi S, Roy M, Onuchic JN, Jennings PA. Folding circular permutants of IL-1β: Route selection driven by functional frustration. PLoS ONE. 2012;7(6):e38512. doi: 10.1371/journal.pone.0038512.
- 50. Nelson ED, Grishin NV. Scaling approach to the folding kinetics of large proteins. Phys Rev E Stat Nonlin Soft Matter Phys. 2006;73(1 Pt 1):011904. doi: 10.1103/PhysRevE.73.011904.
- 51. Piana S, Lindorff-Larsen K, Shaw DE. Atomic-level description of ubiquitin folding. Proc Natl Acad Sci USA. 2013;110(15):5915–5920. doi: 10.1073/pnas.1218321110.