Abstract
Repeat proteins have unique elongated structures that, unlike globular proteins, are quite modular. Despite their simple one-dimensional structure, repeat proteins exhibit intricate folding behavior with a complexity similar to that of globular proteins. Therefore, repeat proteins allow one to quantify fundamental aspects of the biophysics of protein folding. One important feature of repeat proteins is the interfaces between the repeating units. In particular, the distribution of stabilities within and between the repeats was previously suggested to affect their folding characteristics. In this study, we explore how the interface affects folding kinetics and cooperativity by investigating two families of repeat proteins, namely, the Ankyrin and tetratricopeptide repeat proteins, which differ in the number of interfacial contacts that are formed between their units as well as in their folding behavior. By using simple topology-based models, we show that modulating the energetic strength of the interface relative to that of the repeat itself can drastically change the protein stability, folding rate, and cooperativity. By further dissecting the interfacial contacts into several subsets, we isolated the effects of each of these groups on folding kinetics. Our study highlights the importance of interface connectivity in determining the folding behavior.
Introduction
Understanding the mechanisms that underlie protein folding at the molecular level requires a detailed understanding of how the various interactions, whether short- or long-range and of a native or nonnative nature, contribute to the overall stability and kinetics. It is a challenging task to quantify the nonadditive effects of these interactions, but it is an essential one if we are to predict the effects of point mutations on folding. In many studies, both experimental and computational, investigators have pursued this goal and focused on various small globular domains to decipher folding at the atomistic level (1–5).
Although small globular proteins are attractive because of their small size and the fact that their folding often follows a simple two-state mechanism, repeat proteins may be more suitable for this task. Repeat proteins can be considered as a simple case of multidomain proteins composed of relatively small domains that have a simple topology and interact only with their neighboring domains (on the other hand, repeat proteins can be seen as more complicated because often their domains cannot fold by themselves). Long-range contacts in repeat proteins are rare, and the folding of such proteins is dominated by local interactions. In previous studies, the one-dimensionality of the structure of repeat proteins, which are composed of repetitive units of similar size and structure packed together in a linear chain, enabled the deletion or addition of a particular repeating unit, illustrating their modular nature and their higher tolerance for manipulations relative to globular proteins (6–9). Furthermore, previous works in which repeat proteins were designed to be composed of identical consensus repeats had an additional advantage for dissecting folding energetics because their stability is more homogeneously distributed along the proteins (10–14). Despite their simplicity and homogeneity, which have been useful for folding studies, repeat proteins exhibit complex folding behavior similar to that observed in globular proteins, as reflected, for example, by their high stability, cooperative folding, and multiple folding pathways (11,15–23).
The folding kinetics and pathways of repeat proteins have been extensively studied, and it was shown that in natural repeat proteins, where the units are not identical, the local stabilities of the subunits play an important role in dictating them (24–26). To avoid these variations, investigators designed a series of repeat proteins with nearly identical subunits of varying numbers based on consensus sequences (which were determined using evolutionary information) (10,12,14). Using these designed series of proteins, researchers performed in-depth analyses of the folding stability, kinetics, and cooperativity, concentrating on the effects of the number of subunits and their topology (10–14).
Studies of the folding of the two best-studied series, the Ankyrin (ANK) repeat and the tetratricopeptide (TPR) repeat, have revealed interesting trends. In both of these consensus-designed series, the stability of the protein was shown to increase with the number of subunits (13,27,28). The ANK proteins were observed to fold more slowly than their respective TPR analogs, which have a similar number of residues. In addition, whereas the ANK proteins seemed to fold in a highly cooperative manner, even when the protein included a large number of repeating units, the TPR proteins folded sequentially, populating many intermediates with partial numbers of folded repeat subunits (29).
These contrasting behaviors with regard to folding cooperativity were examined in several experimental and computational studies (28,30–32). It was found that the intricate balance between the intrinsic stabilities of the subunits and their stabilization, owing to interface formation with neighboring units, is important for determining the folding cooperativity of the entire protein. Using simple topology-based models, we were previously able to distinguish between these two series, which suggests that the topology of these structures and their internal connectivity are a major factor that influences cooperativity (31). Here, we examine the effects of formation and the relative strength of the interface between the subunits on the cooperativity and kinetics of the proteins. We show that by modulating the relative strength of the interface, one can significantly alter the folding mechanism, the folding rate, and the extent of cooperativity. For the ANK proteins, we further examine the effect of the interface on the folding reaction by elucidating the contribution of different parts of the interface to the folding characteristics.
Materials and Methods
Coarse-grained molecular-dynamics simulations
We used two series of repeat proteins, ANK and TPR, with a growing number of repeat units (2, 3, 4, 6, 8, and 10 units), which are based on designed proteins that have identical repeating units (PDB: 1N0R and 2FO7), to reduce any effects that might be caused by variation among the units. We studied their folding using molecular dynamics (MD) simulated by the Langevin equation. For our simulations, we used a simple native-topology-based model (also termed the Gō model) that assumes a perfectly funneled energy landscape (33). This model has reproduced experimental kinetic rates and pathways, as well as captured other processes involved in folding, such as folding intermediates, protein dimerization, and assembly (19,34–38). More recently, this model was used to study the effects of confinement (39), tethering (40), and modification by natural posttranslational modifications on the folding of the modified protein (41,42).
To make an in-depth study of the thermodynamics and kinetic properties of the studied proteins, we applied a reduced representation of the proteins. In this method, each amino acid is represented by a single bead centered at the Cα atom. Interactions between the residues are arranged so that the lowest energy will be obtained for the native-state structure. Attractive interactions are introduced between native contacts when these pairs are in proximity, whereas the rest of the pairs have repulsive interactions. The contacting residue pairs are determined using the Contacts of Structural Units method (43). In a vanilla model, all interactions are equally scored, with no regard to their original chemical nature. However, to study the relative importance of some of the contacts, such as those in the interfaces, we artificially change the energetic strength by changing the parameter of the corresponding contacts. Details of the energy function appear below:
We manipulate the strength of the intra- and interrepeat contacts by modifying the values of εintra and εinter, respectively. In the unmodified (wild-type) ANK and TPR proteins with different numbers of repeating units, there is no discrimination between the strengths of the intra- and interrepeat contacts: the strength of all types of contacts is uniform and equals one. In the modified variants, however, we changed these values in several ways. First, because the ANK proteins have ∼30% more intercontacts than intracontacts, and the situation is nearly opposite in the TPR proteins, we tried to manipulate the original balance of contacts and to imitate the situation in the other series. We therefore increased the strength of the interfaces of the TPR protein series (εinter), and decreased it analogously in the ANK protein series. This was carried out in two stages. The first modification adjusted the total strength of the interface (all of the interfacial contacts) to that of the internal repeats, so that the total enthalpy (the number of contacts multiplied by their coefficients) of the interface would equal that inside the repeating units. In other words, we required that the ratio λ = (N × εinter)/(M × εintra), where N is the number of intercontacts and M is the number of intracontacts, be altered to λ = 1. Because the unmodified ANK proteins had a λ-value of ∼1.33, we changed εinterANK from 1 to 0.78. Similarly, because λ ∼ 0.70 in the unmodified TPR series, we changed εinterTPR to 1.28. Subsequently, the second modification further increased (or decreased) the coefficients, so that the total enthalpy in the modified protein series resembled the normal situation in the other series.
Thus, we created a TPR-like ANK series by changing εinterANK to 0.56 (obtaining λ ∼0.70), and an ANK-like TPR series by changing εinterTPR to 1.89 (obtaining a λ-value of ∼1.33). In addition to these manipulations, we varied the εintra and εinter values, or those of a subset of these contacts, over the range 0.5 to 1.5 in selected ANK and TPR systems.
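The rescaling of the interface coefficient described above can be sketched as follows. This is a minimal illustration of the relation λ = (N × εinter)/(M × εintra), not the code used in the study; the function names and the contact counts in the usage lines are hypothetical, and the values reported in the text depend on the actual contact counts of each protein.

```python
def interface_ratio(n_inter, n_intra, eps_inter=1.0, eps_intra=1.0):
    """lambda = (N * eps_inter) / (M * eps_intra)."""
    if n_inter <= 0 or n_intra <= 0:
        raise ValueError("contact counts must be positive")
    return (n_inter * eps_inter) / (n_intra * eps_intra)


def eps_inter_for_target(n_inter, n_intra, target_lambda, eps_intra=1.0):
    """Interface coefficient that yields the requested lambda."""
    if n_inter <= 0 or n_intra <= 0:
        raise ValueError("contact counts must be positive")
    return target_lambda * n_intra * eps_intra / n_inter


# Hypothetical contact counts, chosen only for illustration.
N_INTER, N_INTRA = 40, 30
original_lambda = interface_ratio(N_INTER, N_INTRA)            # uniform coefficients
balanced_eps = eps_inter_for_target(N_INTER, N_INTRA, 1.0)     # stage 1: lambda = 1
print(original_lambda, balanced_eps)
```

The same helper covers the second stage: passing the λ-value of the other series as `target_lambda` gives the coefficient for the "TPR-like" or "ANK-like" variant.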
To determine the folding rates of the various systems, we used the mean passage time (MPT) method, in which the folding rate of the protein is derived from the average length of time the protein remains unfolded before the folding transition (kF = ln(1/MPT), and is therefore negative). The more negative kF becomes, the slower the folding reaction. The length and number of simulations for each protein were sufficient to ensure adequate sampling. This measure correlated with experimental folding rates in various studies (31,36,44).
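As a minimal sketch of this measure, the rate can be computed from a list of passage times collected over independent folding simulations. The function name and the input format are assumptions made for illustration; the time unit is whatever the simulation reports.

```python
import math


def folding_rate_from_passage_times(passage_times):
    """Return kF = ln(1 / MPT), where MPT is the mean of the passage times."""
    if not passage_times:
        raise ValueError("at least one passage time is required")
    if any(t <= 0 for t in passage_times):
        raise ValueError("passage times must be positive")
    mpt = sum(passage_times) / len(passage_times)
    return math.log(1.0 / mpt)
```

Because kF is the logarithm of an inverse time, it is negative whenever the MPT exceeds one time unit, and a longer MPT gives a more negative value.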
To compute the potential of mean force (PMF, the free energy along a given coordinate; in this case, Q, the number of native contacts) curves of the various systems, we sampled a large array of temperatures around TF (the folding temperature, at which the stabilities of the folded and unfolded states are equal). We then used the weighted histogram analysis method (WHAM) (45) to compute thermodynamic values such as free energy, enthalpy, and heat capacity.
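For orientation, a PMF along Q can be estimated from a single-temperature trajectory by Boltzmann inversion of the histogram of Q, F(Q) = −kB T ln P(Q). The sketch below shows only this simpler estimate and is not WHAM, which additionally combines histograms from several temperatures; the function name and the default kB = 1 (reduced units) are assumptions.

```python
import math
from collections import Counter


def pmf_from_samples(q_samples, temperature, k_b=1.0):
    """Single-temperature PMF along Q, shifted so that its minimum is zero."""
    if not q_samples:
        raise ValueError("no samples given")
    counts = Counter(q_samples)
    total = len(q_samples)
    free = {q: -k_b * temperature * math.log(c / total) for q, c in counts.items()}
    f_min = min(free.values())
    # Values of Q that were never visited are simply absent from the result.
    return {q: f - f_min for q, f in sorted(free.items())}
```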
A simple capillarity theory for the folding of repeat proteins
We constructed a simple model based on the one-dimensional (1D) nature of repeat proteins to determine the dependence of thermodynamic stability on the number of repeats and the intra- and interrepeat stability. Repeat proteins can be modeled so that each repeat unit has enthalpic and entropic contributions that are independent of the other repeats. We assume that the folded state of a repeat protein has a vanishing configurational entropy, and that the unfolded state has a vanishing enthalpic contribution. A protein with a single repeating unit has EIntra = [−TF(1)S(1)], where EIntra is the enthalpic contribution of a single repeat, S(1) is its entropy in the unfolded state, and TF(1) is its folding temperature. When considering a protein with n > 1 repeating units, the enthalpic contribution from (n-1) interfaces of EInter is added to the contributions from the folding of n repeats. Therefore, we obtain the equation nEIntra + C(n-1)EInter = −TF(n)nS(1). In this equation, we assume extensivity in the entropy, so that S(n) = nS(1). The constant C corrects for different scaling of the enthalpic contribution of the intra- and interrepeat energies. From this, the relationship between folding temperature and the number of repeat units can be easily obtained:
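The relationship can be reconstructed from the two expressions given above. The derivation below is a sketch under the stated assumptions and uses no quantities beyond those already defined:

```latex
\begin{align}
E_{\mathrm{Intra}} &= -T_F(1)\,S(1), \\
n\,E_{\mathrm{Intra}} + C\,(n-1)\,E_{\mathrm{Inter}} &= -T_F(n)\,n\,S(1), \\
\Rightarrow\quad T_F(n) &= T_F(1)\left[\,1 + C\,\frac{n-1}{n}\,\frac{E_{\mathrm{Inter}}}{E_{\mathrm{Intra}}}\right].
\end{align}
```

In this form, TF(n) is linear in (n − 1)/n and approaches the constant value TF(1)[1 + C EInter/EIntra] for large n.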
This simple model can also be used to predict the correlation between the folding barrier, n and λ. To that end, we added a capillarity term, Ecap, to the model, which is the energy cost, due to surface tension, of forming an additional interface between folded and unfolded regions in a 1D protein with n repeating units, of which n′ consecutive units are folded. The free energy of such a state at an arbitrary temperature T reads F ∼ (n′ − 1)EInter + n′EIntra − T(n − n′)S(1) + Ecap. At n′ = 0, there is no interface and Ecap vanishes. When n′ increases, this term increases and saturates once the folded/unfolded interface has a cross section equal to that of the completely folded protein. The free-energy term can be rewritten as F ∼ n′(T − TF(n))S(1) − TnS(1) − (1 − n′/n)EInter + Ecap. Both EInter and Ecap depend on n and n′. EInter can be assumed to saturate when n′ = 2. At the folding temperature (T = TF(n)), the maximum of the free-energy profile follows F# ∼ Ecap − a(1 − 2/n)EInter, which increases as n increases. The constant a corrects for the fraction of the interface that is formed at the transition state.
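The rewriting of the free energy can be checked term by term. The sketch below assumes C = 1 in the stability relation nEIntra + C(n − 1)EInter = −TF(n)nS(1), which is an assumption made here only to keep the algebra short:

```latex
\begin{align}
F &\sim (n'-1)\,E_{\mathrm{Inter}} + n'\,E_{\mathrm{Intra}} - T\,(n-n')\,S(1) + E_{\mathrm{cap}}, \\
E_{\mathrm{Intra}} &= -T_F(n)\,S(1) - \frac{n-1}{n}\,E_{\mathrm{Inter}} \qquad (C = 1), \\
F &\sim n'\,\bigl[T - T_F(n)\bigr]\,S(1) - T\,n\,S(1) - \left(1 - \frac{n'}{n}\right)E_{\mathrm{Inter}} + E_{\mathrm{cap}}.
\end{align}
```

At T = TF(n) the first term vanishes, and setting n′ = 2 in the interface term gives the factor (1 − 2/n) that appears in the barrier expression.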
Results and Discussion
The folding kinetics and cooperativity of repeat proteins are strongly affected by their native-state topology
As previously mentioned, the two best-studied families of repeat proteins, ANK and TPR, exhibit different folding behaviors despite the similarities in their general structural properties (Fig. 1, A and B). Members of the TPR family tend to fold faster and in a sequential manner (16,17), forming many intermediates during their folding reaction. In contrast, their analogs in the ANK family fold more slowly and in a rather cooperative way (7). We note that the slow folding of ANK is partly affected by the presence of proline residues. However, the relatively slow rates are observed even in cases where a prolyl isomerase was added, and therefore the slow folding originates mostly from the protein topology. Using a simple coarse-grained model, which assigns favorable interactions to the contacts present in the native state, we were able to reproduce this behavior using the ANK and TPR series with a growing number of identical repeating units (31) (see Fig. 1, C and D).
In both series, as the protein comprises a larger number of units, its folding rate at the temperature of folding (TF, the temperature at which the folded and unfolded states are equally stable) decreases. This is consistent with experimental results obtained in both the ANK and TPR series (13,27). However, whereas the TPR proteins fold in a sequential manner, forming intermediates with different numbers of folded repeats, the proteins in the ANK series remain largely two-state folders, even when they contain a large number of units. Another prominent difference between the two series is that the longer members of the ANK series fold significantly more slowly than the shorter members in the series (when each is measured at zero stability, at the temperature where the folded and unfolded states have equal stabilities), whereas in the TPR series the differences are not so large (13,27,31).
The delicate balance between the stability of the internal units and their coupling, as manifested in the size and stability of each of their interfaces, is thought to be pivotal in determining the formation of the folding nucleus, the propagation rate, and the overall kinetics and cooperativity. We therefore suspected that the observed differences between the two series may originate in the interface properties of the two families. Although the number of contact pairs within the internal repeats in both families is similar (∼43–48 pairs of residues in contact inside the repeat unit), the number of contacts at the interface is different (∼32 in the TPR family, and ∼60 in the ANK family). It is therefore possible that the higher number of contacts in the ANK family (∼30% more contacts in the interface than in the repeat unit itself) is responsible for the cooperative and slow-folding behavior, whereas the less-dense interface of the TPR (∼30% fewer contacts in the interface than internally in each repeat unit) is related to the faster sequential folding behavior of the TPR series.
Modulation of the interface affects the kinetics and cooperativity of the repeat proteins
We hypothesized that by modifying the relative strength of the interfaces, that is, by changing the coefficient of attraction between these residues in our simulation model, we would be able to modify the properties of the unmodified protein, and in particular its folding cooperativity and kinetics. Because all of the contacts in the original systems have equal strength, we can alter a subset of them and introduce stability variation among the contacts. Therefore, we increased the strength of the interactions at the interfaces of the TPR protein series in an attempt to imitate the situation in the ANK system, where the interface is denser. Analogously, we decreased the strength of the interface of the ANK protein series to imitate the TPR situation with the sparser interface. Accordingly, we changed the values of λ (the relative strength of the inter- and intracontacts) of TPR from 0.70 to 1.33, and those of ANK from 1.33 to 0.70. This was carried out in two stages. In the first modification, the total strength of the interface (all of the interfacial contacts) was adjusted to that of the internal repeats so that the enthalpic contribution of the intra- and interrepeat contacts would be equal (namely, λ = 1.0, which is obtained by making M × εintra = N × εinter, where M and N are the number of intra- and interrepeat contacts, respectively, and εintra and εinter are the enthalpic contributions gained by forming an individual intra- and interrepeat contact, respectively). This was achieved by reducing the coefficients in the ANK system of each of the interfacial contacts from the original value of 1 to a value of 0.78. In the TPR system, we increased the coefficients of the interface from 1 to 1.28, to achieve equality between the intra- and intercontacts.
The second modification further increased (or decreased) the strength of the interrepeat contacts so that the ratio of the enthalpy of the total inter- and intrarepeat contacts in the modified protein series now resembled the normal situation in the other series. Thus, we created a TPR-like ANK series and an ANK-like TPR series in terms of relative interface strength. Our working hypothesis was that the sparser and weaker the interface becomes, the less cooperatively and the faster the protein will fold.
Our hypothesis was confirmed to a large extent when we examined the kinetics results (Fig. 1 D). With the ANK series, the modified proteins with weakened interfaces fold significantly faster than their original analogs. A striking example is ANK10, whose interface was reduced to ∼56% (λ∼ 0.7; see Materials and Methods), a level similar to that observed for the original TPR10 analog. The folding rate of this interface-reduced ANK10 is equivalent to that of the original ANK3. However, it still folds more slowly than the slowest folder in the original TPR family, demonstrating that there are other determinants at play, such as the connectivity and the stability of the internal units.
With the TPR series, manipulating the interface affected the folding kinetics more modestly, and our ability to slow down the folding reaction and make it more cooperative was less successful. As we increased the strength of the interfacial contacts, the proteins tended to fold more slowly, especially in the longer members of the series; however, the results were not as pronounced as with the ANK series. This may stem from the fact that most of the contacts are relatively short-ranged in the TPR, and therefore attempting to divide them into two groups and prioritizing one of them has modest effects. An alternative explanation is that our manipulations are too strong to reach the desired state, and instead disturb the system so that a stable intermediate form, composed of the group of contacts that have the stronger coefficient, will be formed.
We next examined how the interface manipulations affected the folding cooperativity in the two series. Fig. 2 shows the coupling between the folding of individual repeating units in the unmodified ANK10 and TPR10 proteins, and in two manipulated systems with weakened or strengthened interfacial contacts (in the figure, the coupling in folding of each pair of repeats is estimated by the correlation coefficient of finding both repeats in the same states during simulation time). High coupling between most of the repeats is indicative of the degree to which the system folds in an overall two-state manner. The unmodified ANK10 system tends to fold in a highly cooperative manner, which is manifested in the high correlation between the folding behaviors of all the units, except for the outermost ones. As the interface is weakened, however, this cooperativity diminishes, and the repeating units become more independent. In the ANK10 system with the weakest interface, the coupling between the units is significantly lower than in the original series and resembles a multistate TPR-like behavior. However, its cooperativity remains higher than in the original TPR10 system (which is in agreement with our observation regarding the kinetics).
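One way to compute such a coupling measure is the Pearson correlation between the folded/unfolded time series of each pair of repeats. The sketch below is illustrative only: the binary state assignment per frame, the function names, and the input layout are assumptions, and the criterion used in the study to call a repeat folded is not specified here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a repeat that never changes state has no defined coupling
    return sxy / math.sqrt(sxx * syy)


def coupling_matrix(states):
    """states[r][t] is 1 if repeat r is folded at frame t, else 0."""
    k = len(states)
    return [[pearson(states[i], states[j]) for j in range(k)] for i in range(k)]
```

A matrix with values near 1 for all pairs corresponds to two-state folding, whereas high values only next to the diagonal indicate coupling restricted to neighboring repeats.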
Strengthening the interface of the TPR10 system results in higher coupling between the units. However, as with the kinetics, the sequential nature of the folding reaction is not drastically modulated, and the coupling is increased between neighboring units but not between units that are not adjacent. This again highlights the role other factors may play in the folding behavior.
To investigate these phenomena in greater molecular detail, we plotted the radius of gyration (Rg) of each system as a function of the number of contacts (Q) that are formed during the folding reaction (Fig. 3). The free-energy landscapes projected along Rg and Q are shown for ANK5 (which folds significantly faster than ANK10 and thus provides better sampling) and TPR10 systems (Fig. 3, A and C). The ANK system clearly displays two-state behavior, i.e., two highly populated regions that correspond to the folded and unfolded states are observed. In contrast to the ANK protein, the TPR protein is characterized by a long, continuous, evenly populated region, which corresponds to the nature of its sequential folding. When the strength of the ANK interface is reduced (Fig. 3 B), the two-state behavior is weakened and consequently the region between the folded and unfolded states becomes more populated, indicating that the system has more modular folding with more intermediates. However, increasing the strength of the interface in the TPR system does not result in a large difference in the population of various folding intermediates (Fig. 3 D).
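For a one-bead-per-residue representation, Rg of a conformation can be computed directly from the bead coordinates. The following is a minimal sketch that assumes equal bead masses and a list of (x, y, z) tuples as input; the function name is hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Rg of equal-mass beads: root-mean-square distance from the centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("no coordinates given")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    msd = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2 for p in coords) / n
    return math.sqrt(msd)
```

Pairing this value with the number of formed native contacts, Q, for every frame gives the points from which a two-dimensional landscape of the kind described above can be histogrammed.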
When we manipulated the systems in the opposite direction (i.e., by increasing the strength of the interfaces in the ANK proteins and decreasing the interfaces in the TPR proteins), we observed consistent results (see Fig. S1 in the Supporting Material). Interestingly, the ANK systems with the stronger interface became more cooperative (and their folding rates did not increase), indicating that the ANK systems follow a consistent trend in folding cooperativity and kinetics as a function of interface strength. However, the TPR systems with the weakened interface were not significantly affected by this manipulation, much like the TPR systems in which we increased the interface strength.
A capillarity model for folding of repeat proteins captures the relationship among interface strength, folding stability, and kinetics
To explain the correlation between the number of repeats, the strength of their interfaces, the protein stability, and folding kinetics, we constructed a simple model based on the 1D nature of repeat proteins. The thermodynamic parameters of the model are the inter- and intrarepeat stabilities and the entropy of each repeat in the unfolded state. According to this model, which assumes a vanishing entropy and a vanishing enthalpy in the folded and unfolded states, respectively, the folding temperature of a protein comprising n repeats increases as a consequence of the extra stabilization provided by the interface formed between each neighboring folded repeat. The folding temperature of protein with n repeats, TF(n), increases with the number of repeats, but reaches a constant value for large n values.
We examined this relationship between the folding temperature and n for the modified and unmodified ANK and TPR proteins by plotting TF(n) as a function of (n − 1)/n (see Materials and Methods). We noted that the coarse-grained simulations nicely predict the experimental stability of the series of unmodified TPR proteins with 2–10 repeating units (R = 0.96; Fig. 4 A). Fig. 4 A shows these plots for the different ANK and TPR systems, whose TF value was estimated from the coarse-grained MD simulations. All of the plots obey linear relationships. The slopes of the linear fits, which correspond to EInter/EIntra (namely, λ), agree very well with the value of λ used in the simulations (when C = 5 in all fits). The success of this simple model in predicting the stability of the various ANK and TPR systems strongly supports the view that the stability of repeat proteins is controlled by the number of repeats, n, and the ratio between the energetic strengths of the inter- and intrarepeat contacts, λ.
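The fitting step can be sketched with an ordinary least-squares line through the points ((n − 1)/n, TF(n)). The example below uses synthetic values generated from the linear form implied by the model in Materials and Methods, TF(n) = TF(1)[1 + Cλ(n − 1)/n]; the numbers are not simulation or experimental data, and the function names are assumptions.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def model_tf(n, tf1, c_lambda):
    """Folding temperature from the 1D model; c_lambda stands for C * lambda."""
    return tf1 * (1.0 + c_lambda * (n - 1) / n)


# Synthetic illustration only: TF(1) = 1.0 and C * lambda = 0.5 are made-up values.
repeats = [2, 3, 4, 6, 8, 10]
xs = [(n - 1) / n for n in repeats]
ys = [model_tf(n, 1.0, 0.5) for n in repeats]
slope, intercept = linear_fit(xs, ys)
```

In this parameterization the intercept estimates TF(1) and the ratio slope/intercept estimates Cλ, so λ follows once C is fixed.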
This simple model can also be used to predict the correlation between the folding barrier, n and λ. To that end, we added a capillarity term, Ecap, to the model, which is the energy cost of forming an additional interface due to surface tension in elongating the 1D folded repeat. At the folding temperature, TF(n), the major energetic barrier in the free-energy profile for folding of a protein with n repeating units follows F# = Ecap − a(1 − 2/n)EInter. To examine the dependence of the folding rate on the number of repeats and the strength of the interface between the repeats, we plotted the simulated folding rates as a function of (1 − 2/n) for the modified ANK and TPR systems. The simulated folding rates depend linearly on (1 − 2/n), as predicted by the simple theoretical model (all correlation coefficients are > 0.9), and they also depend on the strength of the interface. The slopes of the linear fits, which correspond to EInter, are indeed larger for systems with tighter interfaces. The value of the slope is similar to the value of EInter used in the simulations (when a is assigned a value of 10).
Modulation of a specific set of contacts in the ANK system is responsible for altering the folding behavior
A consistent trend is observed in our various analyses regarding the kinetics and cooperativity in the ANK system: When the interface between the ANK units becomes weaker, the protein folds faster and more sequentially (i.e., it becomes TPR-like; see Fig. 1 D and top panels of Fig. 2). Because this simple relationship is important for explaining numerous observations from both experiments and simulations, we analyzed the ANK system in greater detail to pinpoint the contacts or regions of the interface that are responsible for the most significant effects observed during the folding process.
We first analyzed the propensity of each contact to appear in the ANK and TPR systems in the transition-state ensemble, here defined as the ensemble of states in which 25–50% of the contacts are formed (as in most of the studied systems, the occurrence of conformations in this region is rare (see Fig. 3)); other definitions do not significantly alter the results. We noted that unlike the TPR system, where the contacts of the interface are relatively evenly formed, the contacts that are formed with a higher propensity in the ANK system tend to cluster in the interface between the longer loops that connect the units (interloop contacts). Other interfacial contacts, between the helices of two adjacent units (interhelical contacts), tend to form in later stages of the folding reaction (Fig. 4). The cluster of contacts in spatial proximity that are formed earlier in the folding reaction resembles the behavior of various globular proteins that fold through a nucleation process. The fact that many of the TPR contacts are more homogeneously formed than the ANK contacts is in agreement with the observation that the TPR proteins fold in a sequential manner, populating various intermediates along the way.
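The contact-propensity analysis can be sketched as follows: select the frames whose fraction of formed native contacts lies in the 25–50% window, then average each contact's formed/unformed flag over those frames. The per-frame 0/1 contact representation, the function name, and the inclusive window boundaries are assumptions made for illustration.

```python
def tse_contact_propensity(frames, q_min=0.25, q_max=0.50):
    """Per-contact formation frequency within the transition-state window.

    frames: list of equal-length lists of 0/1 flags, one flag per native contact.
    """
    if not frames:
        raise ValueError("no frames given")
    n_contacts = len(frames[0])
    if n_contacts == 0 or any(len(f) != n_contacts for f in frames):
        raise ValueError("frames must be non-empty and of equal length")
    tse = [f for f in frames if q_min <= sum(f) / n_contacts <= q_max]
    if not tse:
        raise ValueError("no frames fall inside the transition-state window")
    return [sum(f[i] for f in tse) / len(tse) for i in range(n_contacts)]
```

Contacts with a propensity close to 1 in this ensemble are candidates for the folding nucleus; changing `q_min` and `q_max` tests the sensitivity to the chosen definition.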
We speculated that modulating these two sets of interfacial contacts (the interhelix and interloop subsets of the interface, which include 29 and 31 contacts, respectively) separately would result in very different folding behaviors, and would reveal the regions of the interface that are responsible for the slow, cooperative behavior observed in the ANK proteins. We therefore modified the strength of each of these contact subsets by either increasing or decreasing the εinter parameter of these contacts, and studied their folding behavior. Strikingly, when we reduced the strength of the interfacial contacts between the helices (which are only ∼50% of the interfacial contacts), we observed a folding behavior similar to that achieved when we reduced the strength of the contacts in the entire interface. This was observed in the folding rate analysis (see Fig. 6 A), the PMF plots (see Fig. 6 B), and the cooperativity of the modulated system (Fig. S1).
Decreasing the strength of the interfacial contacts between the helices speeds up folding because this modification strengthens the nucleation region (i.e., decreasing the strength of the interfacial contacts between the helices effectively increases the relative strength of the interfacial contacts between the loops, and increases the intrarepeat contacts). A similar argument applies for decreasing the strength of the entire interface, which effectively results in higher stability of the repeating units and thus faster kinetics. The latter effect is due to reducing the cooperativity for folding by prioritizing the formation of the repeating units, which may then fold more sequentially than when the interface between them is tighter. The two systems that have the smallest folding barriers are the systems in which we weakened the strength of either the entire interfacial contacts or just the interhelical contacts (or, alternatively, increased the strength of the intrarepeat contacts alone or with the interfacial contacts between the helices; Fig. 5). Another prominent behavior observed in these systems is the higher population of conformations with an intermediate number of contacts during the folding reaction and the lack of clear, two-state behavior, which was observed in the original, unmodified ANK system (Figs. 3 and 6 B).
In contrast, when we reduced the εinter value of interfacial contacts between the loops, the system kinetics was not accelerated and the cooperativity remained high (Fig. 6, A and B, and Fig. S2). Weakening the interfacial contacts between the loops does not accelerate folding because these contacts are part of the transition state. Similarly, decreasing the strength of the intrarepeat contacts slows down the folding kinetics because they are part of the folding nucleus as well. Equivalently, increasing the strength of these contacts will result in a significant increase in the folding rate (Fig. 6 A).
We noted that manipulating only part of the interface (either the interhelix part or the interloop part) resulted in a complex perturbation to the system. Reducing the strength of the entire interface prioritizes the internal repeat contacts, which are relatively short-ranged and spatially clustered, and therefore the effect is relatively straightforward. When we reduce the strength of one part of the interface, we indirectly increase the probability of forming contacts in the other part of the interface, and we also strengthen, in relative terms, the internal contacts of the repeating units. Therefore, the latter manipulation results in a complex prioritization process whereby both the short-range internal contacts and some of the longer-range interfacial contacts are prioritized at the expense of the other part of the interface. Consequently, weakening one part of the interface does not simply affect the system in a partial manner in comparison with weakening the entire interface. Instead, a more complex behavior is expected to occur, as indeed is observed when the PMF curves (Fig. 6 B) are examined. When we compare the curves of the various modified systems with the PMF curve of the unmodified ANK5 system, we can see that the transition state is lowered in some of the systems, resulting in faster folding rates. When we reduce the strength of the contacts of the entire interface (in red), we obtain significantly faster kinetics, whereas reducing the strength of the intrarepeat contacts (in blue) results in a system that folds at a rate similar to that of the original, unmodified ANK5 system.
When different subsets of the interfacial contacts are altered, we obtain different results: when we reduce the strength of the interfacial contacts that reside between loops (in orange), we observe a significant acceleration of the folding rate, whereas reducing the strength of the contacts that reside between helices (in cyan) does not lead to significant changes in the kinetics.
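To make the comparison of PMF curves concrete, the following sketch estimates a free-energy profile along the number of native contacts Q and reads off a barrier height. It is a simplified illustration under stated assumptions: it uses a plain histogram at a single temperature, F(Q) = -kT ln P(Q), and does not reproduce a weighted histogram analysis (45); the function names, the basin positions, and the toy samples are all hypothetical.

```python
import math
from collections import Counter


def pmf_from_samples(q_samples, kT=1.0):
    """F(Q) = -kT ln P(Q), shifted so that the lowest point is zero."""
    counts = Counter(q_samples)
    total = len(q_samples)
    f = {q: -kT * math.log(n / total) for q, n in counts.items()}
    f_min = min(f.values())
    return {q: v - f_min for q, v in sorted(f.items())}


def folding_barrier(pmf, q_unfolded, q_folded):
    """Highest F(Q) between the two basins, relative to the unfolded basin."""
    top = max(v for q, v in pmf.items() if q_unfolded <= q <= q_folded)
    return top - pmf[q_unfolded]


# Toy two-state data: two well-populated basins and a rarely visited barrier region.
samples = [10] * 400 + [50] * 10 + [90] * 400
pmf = pmf_from_samples(samples)
barrier = folding_barrier(pmf, q_unfolded=10, q_folded=90)  # in units of kT
```

In this picture, a modification that lowers the barrier relative to the unfolded basin corresponds to faster folding, and a more populated intermediate region of Q corresponds to the loss of clear two-state behavior.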
To obtain a more microscopic understanding of how manipulating the strength of the inter- and intrarepeat contacts affects the folding mechanism and cooperativity, we followed the fraction of native contacts within a repeat and in the interfaces it forms with its two neighboring repeats. While the intrarepeat contacts of repeat i form monotonically along the folding reaction, we detect a conflict in the formation of the two interfaces it forms with repeat i − 1 and repeat i + 1. This is manifested by attenuated formation of the interface with repeat i + 1 while the interface with repeat i − 1 is formed. This conflict is reminiscent of the topological frustration found in the folding of some proteins in which there is a preferred ordering of the folding pathway, whereas some alternative pathways that are quite probable may result in slow folding and may even require backtracking (46,47). The frustration between the interfaces, which is a consequence of the quasi-1D geometry of the repeat proteins, becomes more severe when the interrepeat contacts between the loops are weakened, which results in a slower folding rate (Fig. 6 C). According to this analysis, the systems that fold faster than the unmodified one (those in which the strength of the entire interface or of the interrepeat contacts between the helices is decreased) show less frustration in the formation of the two interfaces (and a higher probability of forming the intrarepeat contacts; Fig. 6 C). This weaker coupling in the formation of the two interfaces of a given repeat unit is a manifestation of lower cooperativity in the folding of repeat proteins.
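The per-repeat analysis described above can be sketched as follows. This is a minimal illustration, assuming that the formed contacts of each snapshot have already been identified upstream (e.g., by a distance criterion); the function names and the toy contact sets are hypothetical and are not taken from the simulations reported here.

```python
def fraction_formed(native, formed):
    """Fraction of the native contacts in `native` that are present in `formed`."""
    native = set(native)
    return len(native & set(formed)) / len(native) if native else 0.0


def repeat_profile(frame_contacts, intra, iface_prev, iface_next):
    """For one snapshot, return (Q_intra(i), Q_interface(i-1, i), Q_interface(i, i+1))."""
    return (
        fraction_formed(intra, frame_contacts),
        fraction_formed(iface_prev, frame_contacts),
        fraction_formed(iface_next, frame_contacts),
    )


# Toy native contact sets for repeat i (made-up residue pairs).
intra = {(1, 5), (2, 6)}
iface_prev = {(1, 30), (2, 31)}
iface_next = {(5, 60), (6, 61)}

# Toy snapshot: repeat i and its interface with i - 1 are formed,
# whereas the interface with i + 1 is only half formed.
frame = {(1, 5), (2, 6), (1, 30), (2, 31), (5, 60)}
profile = repeat_profile(frame, intra, iface_prev, iface_next)
```

Averaging such triplets over snapshots that share the same total number of native contacts would show whether one interface lags while the other forms, which is the signature of the frustration discussed in the text.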
Conclusions
Investigators have extensively used repeat proteins to study various aspects of protein folding, taking advantage of their simplicity and special attributes. As with multidomain proteins (48,49), the interactions between the repeat units were found to modulate folding and affect misfolding, two factors that affected their evolution (50,51). Here, using coarse-grained MD simulations based on the native-state structure of proteins, we were able to capture some of the unique characteristics of folding in two different series of repeat proteins (the ANK and TPR proteins). The results show that their folding kinetics slows down and their stability increases with the number of repeats. Furthermore, our simulations show that the folding rate and cooperativity of members of the ANK and TPR families are very different, as observed in previous experiments. The origin of these differences may lie in the relative stabilities of different regions along the proteins, which was manifested in our simulations by the relative strength of the inter- and intrarepeat regions. By modulating the relative strength of the inter- and intrarepeat interactions, we were able to significantly alter the folding behavior of the ANK proteins and thereby transform it into a TPR-like behavior in terms of both kinetics and cooperativity. The effect of the relative strength of intra- and intercontacts on the coupling between the folding of neighboring repeating units is reminiscent of the folding determinants of coupled folding-binding processes (52–55).
Our results, which are in line with other theoretical and experimental studies (28,30,32), point to the importance of the relative strength of the interface in the folding behavior of repeat proteins. In general, the more intrinsically unstable the subunits are, and the more they rely on the interactions they form with adjacent units, the more likely it is that the folding reaction will be highly cooperative. The interface between the ANK repeats was experimentally found to be significantly more stabilizing than the repeat itself (15). One way to break this coupling is to increase the heterogeneity of the energy between the repeats (56). Recently, a similar approach was applied to the TPR family, where increasing the stability of the interface between its repeats changed the folding from a multistate mechanism to a two-state one (32). In particular, a simple folding theory based on the capillarity model for propagating the linear protein (57–59) is in very good agreement with the simulation results. This simple model predicts that the folding rate of a protein with n repeating units is controlled by the energetics of the interface, and that its stability is determined by the ratio between the inter- and intrarepeat stabilities.
Our simulations show that these intricate behaviors can be captured by simple models (19,30,31), and point to the importance of the native topology in dictating the folding kinetics and cooperativity of these proteins. In addition, they show that the coupling in the folding of the different repeats, in cases where the interfaces are more stable than the repeats, arises from frustration in forming the interfaces with the two neighboring repeats. Destabilization of the interfaces (or of the regions of the interfaces that participate in the transition state for propagation of the folding) diminishes this conflict and accelerates folding via accumulation of intermediates in which only some repeats are folded. We note that these effects on kinetics are still observed when we change the definition of the repeat. Several studies have suggested definitions of the repeating unit that differ from the one we used, including a unit that is half of the repeat (30) or one and a half repeats (19). When we simulated ANK5 and modulated the strength of the newly defined interfaces, we observed a similar acceleration in the kinetics (data not shown). This is because the contacts that affect the folding behavior the most by participating in the transition state (here called interloop contacts) are part of the newly defined interfaces (or at least a large portion of them are; see Fig. S2).
Finally, our molecular analysis of the different regions in the ANK protein’s interface shows how modulation of different subregions can affect the folding reaction very differently. Whereas the interactions that are formed between the loops of adjacent units (interloop contacts) seem to play an important role in forming the transition state, the contacts between the helices (interhelical contacts) seem to form in later stages of the folding reaction. Therefore, we expect that modulating a subset of regions in the interface of various repeat proteins would result in a diverse array of behaviors, further enriching our understanding of the folding of repeat proteins.
Acknowledgments
This work was supported by the Kimmelman Center for Macromolecular Assemblies and the Israeli Science Foundation. Y.L. holds the Lillian and George Lyttle Career Development Chair.
Supporting Material
References
- 1. Fersht A.R., Daggett V. Protein folding and unfolding at atomic resolution. Cell. 2002;108:573–582. doi: 10.1016/s0092-8674(02)00620-7.
- 2. Onuchic J.N., Wolynes P.G. Theory of protein folding. Curr. Opin. Struct. Biol. 2004;14:70–75. doi: 10.1016/j.sbi.2004.01.009.
- 3. Dill K.A., Ozkan S.B., Weikl T.R. The protein folding problem. Annu. Rev. Biophys. 2008;37:289–316. doi: 10.1146/annurev.biophys.37.092707.153558.
- 4. Shakhnovich E. Protein folding thermodynamics and dynamics: where physics, chemistry, and biology meet. Chem. Rev. 2006;106:1559–1588. doi: 10.1021/cr040425u.
- 5. Oliveberg M., Wolynes P.G. The experimental survey of protein-folding energy landscapes. Q. Rev. Biophys. 2005;38:245–288. doi: 10.1017/S0033583506004185.
- 6. Main E.R., Lowe A.R., Regan L. A recurring theme in protein engineering: the design, stability and folding of repeat proteins. Curr. Opin. Struct. Biol. 2005;15:464–471. doi: 10.1016/j.sbi.2005.07.003.
- 7. Kloss E., Courtemanche N., Barrick D. Repeat-protein folding: new insights into origins of cooperativity, stability, and topology. Arch. Biochem. Biophys. 2008;469:83–99. doi: 10.1016/j.abb.2007.08.034.
- 8. Barrick D., Ferreiro D.U., Komives E.A. Folding landscapes of ankyrin repeat proteins: experiments meet theory. Curr. Opin. Struct. Biol. 2008;18:27–34. doi: 10.1016/j.sbi.2007.12.004.
- 9. Ferreiro D.U., Wolynes P.G. The capillarity picture and the kinetics of one-dimensional protein folding. Proc. Natl. Acad. Sci. USA. 2008;105:9853–9854. doi: 10.1073/pnas.0805287105.
- 10. Mosavi L.K., Minor D.L., Jr., Peng Z.Y. Consensus-derived structural determinants of the ankyrin repeat motif. Proc. Natl. Acad. Sci. USA. 2002;99:16029–16034. doi: 10.1073/pnas.252537899.
- 11. Binz H.K., Stumpp M.T., Plückthun A. Designing repeat proteins: well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins. J. Mol. Biol. 2003;332:489–503. doi: 10.1016/s0022-2836(03)00896-9.
- 12. Stumpp M.T., Forrer P., Plückthun A. Designing repeat proteins: modular leucine-rich repeat protein libraries based on the mammalian ribonuclease inhibitor family. J. Mol. Biol. 2003;332:471–487. doi: 10.1016/s0022-2836(03)00897-0.
- 13. Wetzel S.K., Settanni G., Plückthun A. Folding and unfolding mechanism of highly stable full-consensus ankyrin repeat proteins. J. Mol. Biol. 2008;376:241–257. doi: 10.1016/j.jmb.2007.11.046.
- 14. Main E.R., Xiong Y., Regan L. Design of stable α-helical arrays from an idealized TPR motif. Structure. 2003;11:497–508. doi: 10.1016/s0969-2126(03)00076-5.
- 15. Mello C.C., Barrick D. An experimentally determined protein folding energy landscape. Proc. Natl. Acad. Sci. USA. 2004;101:14102–14107. doi: 10.1073/pnas.0403386101.
- 16. Kajander T., Cortajarena A.L., Regan L. A new folding paradigm for repeat proteins. J. Am. Chem. Soc. 2005;127:10188–10190. doi: 10.1021/ja0524494.
- 17. Main E.R., Stott K., Regan L. Local and long-range stability in tandemly arrayed tetratricopeptide repeats. Proc. Natl. Acad. Sci. USA. 2005;102:5721–5726. doi: 10.1073/pnas.0404530102.
- 18. Ferreiro D.U., Cervantes C.F., Komives E.A. Stabilizing IκBα by “consensus” design. J. Mol. Biol. 2007;365:1201–1216. doi: 10.1016/j.jmb.2006.11.044.
- 19. Ferreiro D.U., Cho S.S., Wolynes P.G. The energy landscape of modular repeat proteins: topology determines folding mechanism in the ankyrin family. J. Mol. Biol. 2005;354:679–692. doi: 10.1016/j.jmb.2005.09.078.
- 20. Interlandi G., Wetzel S.K., Caflisch A. Characterization and further stabilization of designed ankyrin repeat proteins by combining molecular dynamics simulations and experiments. J. Mol. Biol. 2008;375:837–854. doi: 10.1016/j.jmb.2007.09.042.
- 21. Stagg L., Samiotakis A., Wittung-Stafshede P. Residue-specific analysis of frustration in the folding landscape of repeat β/α protein apoflavodoxin. J. Mol. Biol. 2010;396:75–89. doi: 10.1016/j.jmb.2009.11.008.
- 22. Löw C., Weininger U., Balbach J. Folding mechanism of an ankyrin repeat protein: scaffold and active site formation of human CDK inhibitor p19(INK4d). J. Mol. Biol. 2007;373:219–231. doi: 10.1016/j.jmb.2007.07.063.
- 23. Werbeck N.D., Itzhaki L.S. Probing a moving target with a plastic unfolding intermediate of an ankyrin-repeat protein. Proc. Natl. Acad. Sci. USA. 2007;104:7863–7868. doi: 10.1073/pnas.0610315104.
- 24. Löw C., Weininger U., Balbach J. Structural insights into an equilibrium folding intermediate of an archaeal ankyrin repeat protein. Proc. Natl. Acad. Sci. USA. 2008;105:3779–3784. doi: 10.1073/pnas.0710657105.
- 25. Lowe A.R., Itzhaki L.S. Rational redesign of the folding pathway of a modular protein. Proc. Natl. Acad. Sci. USA. 2007;104:2679–2684. doi: 10.1073/pnas.0604653104.
- 26. Tripp K.W., Barrick D. Rerouting the folding pathway of the Notch ankyrin domain by reshaping the energy landscape. J. Am. Chem. Soc. 2008;130:5681–5688. doi: 10.1021/ja0763201.
- 27. Javadi Y., Main E.R. Exploring the folding energy landscape of a series of designed consensus tetratricopeptide repeat proteins. Proc. Natl. Acad. Sci. USA. 2009;106:17383–17388. doi: 10.1073/pnas.0907455106.
- 28. Aksel T., Majumdar A., Barrick D. The contribution of entropy, enthalpy, and hydrophobic desolvation to cooperativity in repeat-protein folding. Structure. 2011;19:349–360. doi: 10.1016/j.str.2010.12.018.
- 29. Main E.R., Jackson S.E., Regan L. The folding and design of repeat proteins: reaching a consensus. Curr. Opin. Struct. Biol. 2003;13:482–489. doi: 10.1016/s0959-440x(03)00105-2.
- 30. Ferreiro D.U., Walczak A.M., Wolynes P.G. The energy landscapes of repeat-containing proteins: topology, cooperativity, and the folding funnels of one-dimensional architectures. PLoS Comput. Biol. 2008;4:e1000070. doi: 10.1371/journal.pcbi.1000070.
- 31. Hagai T., Levy Y. Folding of elongated proteins: conventional or anomalous? J. Am. Chem. Soc. 2008;130:14253–14262. doi: 10.1021/ja804280p.
- 32. Phillips J.J., Javadi Y., Main E.R. Modulation of the multi-state folding of designed TPR proteins through intrinsic and extrinsic factors. Protein Sci. 2011;21:327–38. doi: 10.1002/pro.2018.
- 33. Onuchic J.N., Luthey-Schulten Z., Wolynes P.G. Theory of protein folding: the energy landscape perspective. Annu. Rev. Phys. Chem. 1997;48:545–600. doi: 10.1146/annurev.physchem.48.1.545.
- 34. Clementi C., Nymeyer H., Onuchic J.N. Topological and energetic factors: what determines the structural details of the transition state ensemble and “en-route” intermediates for protein folding? An investigation for small globular proteins. J. Mol. Biol. 2000;298:937–953. doi: 10.1006/jmbi.2000.3693.
- 35. Takagi F., Koga N., Takada S. How protein thermodynamics and folding mechanisms are altered by the chaperonin cage: molecular simulations. Proc. Natl. Acad. Sci. USA. 2003;100:11367–11372. doi: 10.1073/pnas.1831920100.
- 36. Chavez L.L., Onuchic J.N., Clementi C. Quantifying the roughness on the free energy landscape: entropic bottlenecks and protein folding rates. J. Am. Chem. Soc. 2004;126:8426–8432. doi: 10.1021/ja049510+.
- 37. Whitford P.C., Miyashita O., Onuchic J.N. Conformational transitions of adenylate kinase: switching by cracking. J. Mol. Biol. 2007;366:1661–1671. doi: 10.1016/j.jmb.2006.11.085.
- 38. Cho S.S., Levy Y., Wolynes P.G. Quantitative criteria for native energetic heterogeneity influences in the prediction of protein folding kinetics. Proc. Natl. Acad. Sci. USA. 2009;106:434–439. doi: 10.1073/pnas.0810218105.
- 39. Mittal J., Best R.B. Thermodynamics and kinetics of protein folding under confinement. Proc. Natl. Acad. Sci. USA. 2008;105:20233–20238. doi: 10.1073/pnas.0807742105.
- 40. Zhuang Z., Jewett A.I., Shea J.E. The effect of surface tethering on the folding of the src-SH3 protein domain. Phys. Biol. 2009;6:015004. doi: 10.1088/1478-3975/6/1/015004.
- 41. Shental-Bechor D., Levy Y. Effect of glycosylation on protein folding: a close look at thermodynamic stabilization. Proc. Natl. Acad. Sci. USA. 2008;105:8256–8261. doi: 10.1073/pnas.0801340105.
- 42. Hagai T., Levy Y. Ubiquitin not only serves as a tag but also assists degradation by inducing protein unfolding. Proc. Natl. Acad. Sci. USA. 2010;107:2001–2006. doi: 10.1073/pnas.0912335107.
- 43. Sobolev V., Wade R.C., Edelman M. Molecular docking using surface complementarity. Proteins. 1996;25:120–129. doi: 10.1002/(SICI)1097-0134(199605)25:1<120::AID-PROT10>3.0.CO;2-M.
- 44. Koga N., Takada S. Roles of native topology and chain-length scaling in protein folding: a simulation study with a Go-like model. J. Mol. Biol. 2001;313:171–180. doi: 10.1006/jmbi.2001.5037.
- 45. Kumar S., Rosenberg J.M., Kollman P.A. The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. J. Comput. Chem. 1992;13:1011–1021.
- 46. Gosavi S., Chavez L.L., Onuchic J.N. Topological frustration and the folding of interleukin-1β. J. Mol. Biol. 2006;357:986–996. doi: 10.1016/j.jmb.2005.11.074.
- 47. Capraro D.T., Roy M., Jennings P.A. β-Bulge triggers route-switching on the functional landscape of interleukin-1β. Proc. Natl. Acad. Sci. USA. 2012;109:1490–1493. doi: 10.1073/pnas.1114430109.
- 48. Batey S., Nickson A.A., Clarke J. Studying the folding of multidomain proteins. HFSP J. 2008;2:365–377. doi: 10.2976/1.2991513.
- 49. Han J.H., Batey S., Clarke J. The folding and evolution of multidomain proteins. Nat. Rev. Mol. Cell Biol. 2007;8:319–330. doi: 10.1038/nrm2144.
- 50. Wright C.F., Teichmann S.A., Dobson C.M. The importance of sequence diversity in the aggregation and evolution of proteins. Nature. 2005;438:878–881. doi: 10.1038/nature04195.
- 51. Reshef D., Itzhaki Z., Schueler-Furman O. Increased sequence conservation of domain repeats in prokaryotic proteins. Trends Genet. 2010;26:383–387. doi: 10.1016/j.tig.2010.06.003.
- 52. Wang J., Lu Q., Lu H.P. Single-molecule dynamics reveals cooperative binding-folding in protein recognition. PLoS Comput. Biol. 2006;2:e78. doi: 10.1371/journal.pcbi.0020078.
- 53. Wang J., Zhang K., Wang E. Dominant kinetic paths on biomolecular binding-folding energy landscape. Phys. Rev. Lett. 2006;96:168101. doi: 10.1103/PhysRevLett.96.168101.
- 54. Turjanski A.G., Gutkind J.S., Hummer G. Binding-induced folding of a natively unstructured transcription factor. PLoS Comput. Biol. 2008;4:e1000060. doi: 10.1371/journal.pcbi.1000060.
- 55. Bhattacherjee A., Wallin S. Coupled folding-binding in a hydrophobic/polar protein model: impact of synergistic folding and disordered flanks. Biophys. J. 2012;102:569–578. doi: 10.1016/j.bpj.2011.12.008.
- 56. Street T.O., Bradley C.M., Barrick D. Predicting coupling limits from an experimentally determined energy landscape. Proc. Natl. Acad. Sci. USA. 2007;104:4907–4912. doi: 10.1073/pnas.0608756104.
- 57. Wolynes P.G. Folding funnels and energy landscapes of larger proteins within the capillarity approximation. Proc. Natl. Acad. Sci. USA. 1997;94:6170–6175. doi: 10.1073/pnas.94.12.6170.
- 58. Qi X., Portman J.J. Capillarity-like growth of protein folding nuclei. Proc. Natl. Acad. Sci. USA. 2008;105:11164–11169. doi: 10.1073/pnas.0711527105.
- 59. Trizac E., Levy Y., Wolynes P.G. Capillarity theory for the fly-casting mechanism. Proc. Natl. Acad. Sci. USA. 2010;107:2746–2750. doi: 10.1073/pnas.0914727107.