Abstract
The understanding of the folding mechanisms of single-domain proteins is an essential step in the understanding of protein folding in general. Recently, we developed a mesoscopic CA–CB side-chain protein model, which was successfully applied in protein structure prediction, studies of protein thermodynamics, and modeling of protein complexes. In this research, this model is employed in a detailed characterization of the folding process of a simple globular protein, the B1 domain of IgG-binding protein G (GB1). There is a vast body of experimental facts and theoretical findings for this protein. Performing unbiased, ab initio simulations, we demonstrated that the GB1 folding proceeds via the formation of an extended folding nucleus, followed by slow structure fine-tuning. Remarkably, a subset of native interactions drives the folding from the very beginning. The emerging comprehensive picture of GB1 folding perfectly matches and extends the previous experimental and theoretical studies.
Introduction
Significant theoretical and experimental research efforts are devoted to understanding how proteins fold into their native structures. Determination of the folded structure is a priority for complete biochemical protein characterization. However, a detailed understanding of the folding process requires characterization of all alternative protein conformations that emerge along the folding pathway, including the unfolded state and partially folded intermediates. Elucidation of the principles governing the folding mechanism will have broad implications for predicting structure from sequence, protein design, and an understanding of the formation and propagation of prions and amyloids.
Theoretical studies lead to a better understanding of experimental results by providing easy-to-interpret structural models. Molecular mechanics is a powerful method for studying complex molecular systems. However, there is a gap between the time scales of classical molecular dynamics (MD) simulation and the time scales of protein folding. Only small, ultrafast-folding proteins (folding in the microsecond range) are now tractable by means of classical MD simulations (1). An average protein folds orders of magnitude slower. Perhaps the largest protein folded using an all-atom potential and an electrostatic-driven Monte Carlo (MC) procedure was the 46-residue staphylococcal protein A (2). Direct folding simulations of the same protein were performed using an implicit solvent model combined with high-temperature all-atom MD (3). An energy landscape study of a protein of similar size (40 residues) was also carried out with an all-atom free energy force field (4).
So far, for larger proteins, all-atom simulations of the entire folding process, from a random coil to the native state, are possible only for Go models. There have been a number of Go potential-based studies of protein G using the simplified model (5), the all-atom model (6), or with a weak Go-like contribution to the applied force field (7). In Go models, only native interactions are taken into account. Consequently, the lowest energy of the native conformation is guaranteed. The obvious weak point of such an approach is that the knowledge of the native structure is needed to construct the Go potential. A significant shortcoming also results from neglecting nonnative interactions, thereby ignoring their sometimes important role in the folding mechanisms (8).
Because of the time scale limitations of the all-atom molecular mechanics, reduced models offer the most promising possibilities to study large-scale protein rearrangements, as recently demonstrated by Liwo et al. (9). Langevin dynamics with the physics-based united-residue force field was applied successfully to the folding of real proteins. Minimalist protein models with potentials biased only on native secondary structure in combination with a sequence design strategy enabled folding kinetics studies of proteins G and L (10), the two proteins with similar folds but different folding mechanisms.
We have used a reduced protein lattice model and MC dynamics to perform equilibrium folding simulations at various folding stages, beginning from the denatured state. The use of a reduced representation of polypeptide chains led to a significant reduction of the conformational space (11), thereby enabling the search for the native state at a reasonable time scale. Compared with the experimental results, we have obtained a similar sequence of folding events and have identified the interactions critical for the folding process.
Our simulations show that the folding of the B1 domain of IgG-binding protein G (GB1) is initiated by the formation of a specific nucleus involving the hydrophobic core residues. These residues were previously found by Shakhnovich and co-workers to participate in the specific nucleation event. That study employed all-atom MC simulations with the Go potential (6) and φ values derived from protein engineering as restraints (12). Moreover, they have shown that the nucleus residues are evolutionarily conserved among proteins that share a similar fold but have very little sequence similarity (12). This finding strongly suggests that the native fold topology is a main factor determining the character of the transition state. In this article we present ab initio folding studies that are not driven by any restraints or by structure-specific potentials. This study confirms the specific nucleation process and provides a detailed description of the sequence of nucleation and folding events, from the highly denatured state to the formation of the native-like globule.
The CA–CB side-chain (CABS) model (13) used in this work was successfully employed by the Kolinski-Bujnicki group in edition 6 of Critical Assessment of Protein Structure Prediction, a community-wide testing experiment of protein structure prediction methods (14). The approach ranked second best overall and, more importantly, in the new fold category, after Rosetta, a method based on the recombination of short fragments extracted from known protein structures (15). Both methods employed an MC search of the conformational space. In comparative modeling cases, the CABS force field has been supplemented with weak spatial restraints derived from homologous templates. Based on the previous denatured state simulations, it was postulated that the CABS model can be used not only for protein structure prediction, but also to study the folding mechanism (16,17). In this study, we employ knowledge-based statistical potentials only, the same as in the ab initio protein structure prediction. The only way for the evolutionary information to enter the force field is the predicted secondary structure (in the commonly used three-letter code), providing a weak bias toward protein-like local interactions. This is the only structure-specific input in the CABS ab initio modeling procedure, introduced to take advantage of the relatively high accuracy of contemporary secondary structure prediction methods.
Methods
CABS model description
The CABS protein representation and the model force field have been described in detail recently (13). In the reduced representation of the CABS protein chains, each residue is represented by four united groups: Cα, Cβ, the center of mass of the side group, and the center of the peptide bond. Positions of the Cα atoms are restricted to a simple cubic lattice with the lattice grid equal to 0.61 Å. A large number (800) of possible orientations of the virtual Cα-Cα bonds ensure lack of lattice anisotropy effects. On the other hand, the lattice representation facilitates very fast computation of interactions and local conformational transitions. The Cα trace provides a convenient reference frame for the definition of the position of the remaining interaction centers, which are located off-lattice. In the studies of protein dynamics, the simulation process is controlled by the asymmetric Metropolis MC scheme with a long random sequence of local conformational updates. A single step of the MC algorithm consists of several attempts to execute various local conformational transitions for each residue of the model chain. The sequence of the attempts at particular transitions is generated in a random fashion. The MC process simulates the long-time stochastic dynamics of a polypeptide chain.
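The acceptance step of such an MC scheme can be illustrated with a minimal sketch. It assumes the standard (symmetric) Metropolis criterion and a toy one-dimensional energy function; the CABS lattice, move set, and asymmetric scheme are not reproduced here, and the names `metropolis_accept`, `toy_energy`, and `run_mc` are hypothetical.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng):
    """Standard Metropolis criterion: always accept downhill moves,
    accept uphill moves with probability exp(-delta_e / T)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def toy_energy(x):
    # Hypothetical double-well energy, a stand-in for the model force field.
    return (x * x - 1.0) ** 2

def run_mc(n_steps, temperature, seed=0):
    """Run a toy isothermal trajectory; returns the list of sampled states."""
    rng = random.Random(seed)
    x = 0.0
    trajectory = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-0.5, 0.5)  # random local update
        delta_e = toy_energy(trial) - toy_energy(x)
        if metropolis_accept(delta_e, temperature, rng):
            x = trial
        trajectory.append(x)  # rejected moves repeat the current state
    return trajectory
```

In a sketch like this, one isothermal trajectory corresponds to one call of `run_mc` at a fixed temperature, and repeating the call over a grid of temperatures mimics the series of isothermal runs.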
The force field consists of several potentials of the mean force derived from a statistical analysis of the structural correlations seen in the known protein structures. The short-range interactions include generic protein-like conformational biases and statistical potentials describing local conformational propensities. A model of the main chain network of hydrogen bonds controls mutual packing of β-strands and a proper cooperative assembly of helices. The sequence and geometric context-dependent statistical potentials describe side-group interactions with a cooperative (multibody) component built in an implicit fashion. The CABS model has been tested extensively in numerous applications, including protein structure prediction, protein docking, and studies of long-time dynamics and thermodynamics of proteins and protein assemblies.
CABS simulations and data analysis
During the simulations analyzed in this work, multiple 10,000,000-MC-step isothermal trajectories were collected at different temperatures. The BioShell package (18) for protein modeling computation was used for managing and analyzing the large volume of simulation data.
The side-chain contact patterns were derived from the distributions of the distances between the centers of gravity of the side chains, present in the CABS model, so side-chain reconstruction was not necessary. The values of the contact cutoff distances depend on the identity and mutual orientation of the amino acids involved: two amino acids were assumed to be “in contact” when the distance between any pair of their heavy atoms was smaller than 4.5 Å.
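A distance-based contact map of this kind can be sketched as follows. This is a simplified illustration that assumes a single uniform cutoff applied to side-chain centers (the 4.5 Å default is only a placeholder), whereas the cutoffs described above depend on residue identity and orientation. The function names are hypothetical and indices are 0-based.

```python
import numpy as np

def contact_map(centers, cutoff=4.5):
    """Boolean N x N matrix: True where two centers are closer than cutoff (in Å)."""
    centers = np.asarray(centers, dtype=float)
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    contacts = dist < cutoff
    np.fill_diagonal(contacts, False)  # a residue is not in contact with itself
    return contacts

def contact_pairs(contacts, min_separation=1):
    """List (i, j) pairs with i < j and j - i >= min_separation."""
    i_idx, j_idx = np.nonzero(np.triu(contacts, k=min_separation))
    return list(zip(i_idx.tolist(), j_idx.tolist()))
```

Setting `min_separation=5` keeps only contacts between residues at least five positions apart in the sequence.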
The estimations of the density of states (Fig. 1, a, b, and d, and Fig. 2) were computed using T-pile (19) from 10,000,000-MC-step isothermal trajectories. Simulations were performed independently twice at all temperatures from 1.90 to 2.10 with 0.01 increments. Each simulation produced 100,000 structures and required ∼2 d using a standard machine (3 GHz CPU, Linux box).
The software used for the analysis of the simulation data as well as the execution of CABS can be downloaded from our website (20). The movie illustrating the evolution of the density of states (and other system properties) with changing temperature can also be viewed at our homepage ((20), in the Files/Movies section).
Multiscale modeling
The multiscale modeling procedure for the data presented (see Fig. 5) consisted of the following steps. First, the missing backbone atoms were reconstructed using the BBQ algorithm (21). Subsequently, the side-chain rotamers were added using SCWRL (22). Because BBQ and SCWRL use libraries from known structures, an implicit underlying assumption is that the local conformational characteristics seen in folded structures are not far from the related protein chain characteristics in the denatured state. As discussed in the context of short-range statistical potentials, this is a legitimate working hypothesis (13). The resulting all-atom models were subjected to a short refinement procedure. We ran the all-atom minimization with frozen α-carbons using the Amber7 ff99 force field (23), Amber charges, a dielectric constant equal to 1.0, and the Powell minimization method implemented in Sybyl (Tripos, St. Louis, MO), without initial optimization. The refinement procedure was iterated 1000 times to improve the arrangement of the side-chain rotamers. Side-chain positions sometimes changed significantly, but longer iterations did not bring meaningful changes. Recently, a very similar procedure has been effectively used in a hierarchical approach to model refinement and final structure selection after coarse-grained modeling with CABS (24). Such multiscale methodology leads to a very good approximation of a model's energy (and free energy). Thus, it may be very useful in comprehensive studies of protein-folding energetics, which are now being pursued in our laboratory.
Results and discussion
Key role of the second hairpin in GB1 folding
The B1 domain of streptococcal protein G is a small, very regular α/β structure composed of 56 amino acids. The fold (25) consists of a four-stranded β-sheet and an α-helix tightly packed against the sheet. The sheet can be described as consisting of two symmetrically spaced hairpins: the first one, formed by N-terminal strands and the first β turn (β1-turn1-β2), and the second one, formed by C-terminal strands and the second β turn (β3-turn2-β4). Numerous experimental and theoretical studies highlight the early formation of the second hairpin and its key role in the GB1 folding. The second hairpin was found to be stable in isolation (26) and protected from hydrogen/deuterium exchange early during the folding of the entire protein (27). The isolated second β-hairpin folds on the 10-μs time scale, which is two orders of magnitude faster than the intact protein folding (28). The second hairpin fragment became a model short peptide, and its folding was extensively studied by MD and MC simulations (29). The importance of the second hairpin was also confirmed by φ value analysis (30), which suggested the presence of the well-formed second β-turn in the transition state ensemble.
The role of the supersecondary structural elements in GB1 folding was investigated by coarse-grained MC folding and unfolding MD simulations, which indicated that the helix-second hairpin fragment can stabilize itself to some extent independently of the rest of the protein, whereas the first hairpin cannot (31).
Hydrogen-deuterium exchange and NMR studies suggested that three regions of GB1 may correspond to nucleation sites: the second β-hairpin, the middle of the α-helix, and the N-terminus of the α-helix (27,32). These findings were also found to be consistent with MD unfolding studies (33). Full atom MD in specific solvents suggested that the β-sheet is more mobile and might be expected to unfold earlier than the helix (34). According to protein engineering studies of protein G, complex consequences of the folding kinetics of single point mutations in the helix may suggest its structural diversity during the folding (30). However, effects of several mutations suggested that the helix's C-terminus is better defined than the rest of the helix at the folding transition state ensemble (30).
Protein G unfolding dynamics was also investigated by the older version of the CABS force field with a different chain representation (SICHO model) (35). The first β-hairpin was found to unfold first and to be significantly less stable than the second β-hairpin. Here, we present a much more detailed study. Extensive isothermal simulations from a highly denatured state to the folding temperature were performed, providing a detailed overview of the entire folding process. Changes of the protein properties with the system temperature (T), measured by CABS energy, similarity to the native structure (coordinate root mean-square deviation, cRMSD), and radius of gyration, are shown in Fig. 1. Temperature dependence of the folding progress, with respect to the particular secondary structure elements, measured by the number of native contacts within the given substructure, is illustrated in Fig. 2. According to Fig. 2, the second hairpin and the terminal strands fold in a cooperative two-state fashion, whereas the helix folds in a continuous manner. Furthermore, the native arrangement of the terminal strands seems to have the largest contribution to the overall folding cooperativity. The first hairpin is definitely the least-ordered substructure after the folding transition.
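The two geometric measures used above, radius of gyration and cRMSD, can be computed as in the following sketch. It assumes unweighted coordinates (for example one point per residue) and uses the Kabsch superposition for cRMSD; which atoms or united groups enter the published values is not specified here, and the function names are hypothetical.

```python
import numpy as np

def radius_of_gyration(coords):
    """Root mean-square distance of the points from their centroid."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))

def crmsd(coords_a, coords_b):
    """Coordinate RMSD after optimal rigid superposition (Kabsch algorithm)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    # Singular values of the covariance matrix determine the best rotation.
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the best fit would be a reflection.
    if np.linalg.det(u @ vt) < 0.0:
        s[-1] = -s[-1]
    # Minimal squared deviation: sum|a|^2 + sum|b|^2 - 2 * sum(singular values).
    msd = ((a ** 2).sum() + (b ** 2).sum() - 2.0 * s.sum()) / len(a)
    return float(np.sqrt(max(msd, 0.0)))
```

Both functions take an N x 3 array of coordinates; `crmsd` additionally requires the two structures to list equivalent points in the same order.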
The early formation of the helix and the second β-hairpin is also apparent in residue-specific analysis of the MC trajectories. A native-like arrangement of the side-chain contacts in the transition structures at the transition temperature (Tt) can clearly be seen in Fig. 3 d. Definitely, the helix is the most-ordered residual structure at Tt. It is stabilized by a number of short-range contacts between the side chains separated in the sequence by 3 or 4 positions. Another noteworthy fragment of the local structure at Tt is the second β-hairpin (Fig. 3 d). However, with respect to the long-range interactions, the second hairpin is the main nucleation site, responsible for the early structure formation.
Nucleus residues: Y3, L5, F30, W43, Y45, F52
Changes of the protein properties with the system temperature (Fig. 1) suggest a two-state folding kinetics. Two main assemblies are present along the temperature coordinate that correspond to the denatured and more native-like conformations (definitely less heterogeneous than denatured). Transient conformations between these two assemblies were characterized at Tt by residue-specific contact studies (Fig. 3) and by an average side-chain contact map (Fig. 4 b). This analysis and side-chain contact studies presented below show that the folding transition is associated with a specific nucleation event.
Six residues (Y3, L5, F30, W43, Y45, F52) were found to form the folding nucleus during the MC dynamics simulations described here. The nucleus geometry is shown in Fig. 5 (bottom). The composition of the nucleus is in agreement with the all-atom MC simulations using φ-values as restraints performed by Shakhnovich and co-workers (12). Moreover, the same set of residues was identified by these authors via an inspection of residue conservation in proteins that share fold but not sequence similarity to protein G. The analysis of conservation has shown that all six nucleus residues are among the 10 most frequently conserved in the protein G-like folds. The remaining four of the 10 most conserved ones are T18, A23, A26, and V54. Interestingly, within their secondary structure subunits, A26 and V54 are the second most frequent long-range interacting residues (at Tt): A26 after F30 in the helix, and V54 after F52 in the second hairpin (Fig. 3, b and d). As the authors noted, the nucleus residues are conserved with a high statistical significance with respect to the rest of the sequence, which is consistent with similar observations for other fold families (36). These findings support the idea that the transition state structures and folding mechanisms are determined by the fold topology of native proteins (37). There is a very valuable confirmation of this idea in folding/denaturation studies of ubiquitin (38), which identified a folding nucleus quite similar to that observed by us in GB1. The two proteins have completely different sequences, although they share the same overall fold, and therefore it is very likely that their folding mechanisms are similar.
Folding mechanism
Analysis of native contact clusters (in terms of long-range interactions, Table 1) reveals a well-defined sequence of folding events. The folding process can be described as the sequential assembly of the elements of the supersecondary structure, with the key residues attaching successively to the folding nucleus:
1. The first folding event is the formation of the second β-hairpin (strongly stabilized by three hydrophobic residues: W43, Y45, F52).
2. With the decrease of T, contacts between the α-helix (F30) and the second β-hairpin (W43, Y45, F52) are increasingly stronger, and finally, at the Tt, they become more persistent than the key contacts within the second hairpin.
3. The next folding event is the nucleation of the β-sheet residues between β1- and β4-strands beginning from L5 (β1) and F52 (β4), at the beginning assisted by W43 (β3).
4. The involvement of the last nucleus residue in the nucleation process, Y3, results in the formation of the central part of the β-sheet (β1–β4) and the correct fold topology. A fluctuating native-like globule is formed.
Table 1.
| T | 1 | %1 | 2 | %2 | 3 | %3 | 4 | %4 |
|---|---|---|---|---|---|---|---|---|
| 2.10 | 45-52 | 25.7 | 43-52, 45-52 | 15.0 | 43-52, 43-54, 45-52 | 6.9 | 30-43, 30-52, 43-52, 45-52 | 3.1 |
| | 43-52 | 25.0 | 43-52, 43-54 | 9.6 | 30-43, 30-52, 43-52 | 4.6 | 5-43, 5-52, 43-52, 45-52 | 2.2 |
| | 30-43 | 20.7 | 43-54, 45-52 | 9.2 | 30-43, 43-52, 45-52 | 4.5 | 30-43, 43-52, 43-52, 43-54 | 2.1 |
| | 5-43 | 17.8 | 30-43, 45-52 | 6.9 | 30-52, 43-52, 45-52 | 4.3 | 30-52, 43-52, 43-54, 54-52 | 2.0 |
| | 30-52 | 14.3 | 30-52, 43-52 | 6.9 | 30-43, 30-52, 45-52 | 4.0 | 30-43, 30-52, 43-52, 43-54 | 1.8 |
| 2.04 | 45-52 | 32.0 | 43-52, 45-52 | 19.8 | 43-52, 43-54, 45-52 | 9.9 | 30-43, 30-52, 43-52, 45-52 | 5.8 |
| | 43-52 | 31.7 | 43-52, 45-52 | 12.9 | 30-43, 30-52, 43-52 | 8.6 | 5-43, 5-52, 43-52, 45-52 | 4.3 |
| | 30-43 | 28.3 | 43-54, 45-52 | 12.7 | 30-43, 43-52, 45-52 | 8.0 | 30-43, 43-52, 43-54, 45-52 | 4.1 |
| | 5-43 | 26.5 | 30-43, 45-52 | 12.4 | 30-52, 43-52, 45-52 | 7.8 | 30-52, 43-52, 43-54, 45-52 | 4.0 |
| | 30-52 | 22.4 | 30-52, 43-52 | 11.9 | 30-43, 30-52, 45-52 | 7.3 | 26-52, 30-43, 30-52, 43-52 | 3.6 |
| 2.01 | 43-52 | 38.5 | 43-52, 45-52 | 24.4 | 43-52, 43-54, 45-52 | 13.3 | 30-43, 30-52, 43-52, 45-52 | 8.4 |
| | 45-52 | 37.2 | 30-52, 43-52 | 17.7 | 30-43, 30-52, 43-52 | 12.4 | 30-43, 43-52, 43-54, 45-52 | 6.5 |
| | 30-43 | 33.2 | 43-54, 45-52 | 17.1 | 30-43, 43-52, 45-52 | 11.4 | 3-52, 5-52, 5-54, 7-54 | 6.4 |
| | 5-43 | 30.6 | 30-43, 43-52 | 16.6 | 30-52, 43-52, 45-52 | 11.2 | 30-52, 43-52, 43-54, 45-52 | 6.3 |
| | 30-52 | 29.4 | 43-52, 43-54 | 16.3 | 30-43, 30-52, 45-52 | 10.4 | 5-43, 5-52, 43-52, 45-52 | 6.1 |
| 1.98 | 43-52 | 46.9 | 43-52, 45-52 | 28.9 | 30-43, 30-52, 43-52 | 19.3 | 30-43, 30-52, 43-52, 45-52 | 12.2 |
| | 45-52 | 42.1 | 30-52, 43-52 | 26.5 | 43-52, 43-54, 45-52 | 16.7 | 3-52, 5-52, 5-54, 7-54 | 11.9 |
| | 30-43 | 41.3 | 30-43, 43-52 | 24.8 | 30-43, 43-52, 45-52 | 16.3 | 5-52, 30-43, 30-52, 43-52 | 11.3 |
| | 30-52 | 39.8 | 30-43, 30-52 | 23.3 | 5-52, 30-52, 43-52 | 15.7 | 4-51, 5-52, 6-53, 7-54 | 11.0 |
| | 5-43 | 37.2 | 30-43, 45-52 | 21.7 | 30-52, 43-52, 45-52 | 15.6 | 3-52, 5-52, 7-54, 30-52 | 10.7 |
| 1.96 | 43-52 | 52.4 | 30-52, 43-52 | 32.6 | 30-43, 30-52, 43-52 | 20.8 | 3-52, 5-52, 5-54, 7-54 | 15.8 |
| | 30-52 | 47.2 | 43-52, 45-52 | 32.0 | 3-52, 5-52, 30-52 | 18.3 | 4-51, 5-52, 6-53, 7-54 | 14.7 |
| | 30-43 | 46.4 | 30-43, 43-52 | 30.6 | 3-52, 5-52, 5-54 | 18.2 | 3-52, 5-52, 7-54, 30-52 | 14.4 |
| | 45-52 | 45.0 | 30-43, 30-52 | 28.9 | 5-52, 30-52, 43-52 | 18.1 | 3-52, 5-52, 5-54, 30-52 | 14.2 |
| | 5-52 | 44.0 | 30-43, 45-52 | 25.6 | 3-52, 5-52, 7-54 | 18.0 | 3-52, 5-52, 6-53, 7-54 | 14.0 |
| | 3-52 | 39.9 | | | | | | |
For each temperature, the five most frequently appearing native contacts (1), doublets (2), triplets (3), and quadruplets of native contacts (4) are shown. Their frequencies, expressed as the percentages of the snapshots in the corresponding trajectories, are also presented. At T = 1.96, near the folding Tt, the sixth contact (in the order of the frequency of occurrence) is additionally shown. Data were collected from isothermal trajectories of 10,000,000 MC steps.
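Frequencies of this kind can be obtained by counting, over all snapshots of a trajectory, how often each set of native contacts is present simultaneously. The sketch below is one way to do that, under the assumption that a doublet, triplet, or quadruplet is counted whenever all of its contacts co-occur in a snapshot; the function name and the input layout (one set of contact pairs per snapshot) are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cluster_frequencies(snapshots, native_contacts, size, top=5):
    """Percentage of snapshots containing each co-occurring set of `size` native contacts.

    snapshots: iterable of sets of (i, j) contacts, one set per snapshot.
    native_contacts: set of (i, j) contacts present in the native structure.
    Returns up to `top` (cluster, percentage) pairs, most frequent first.
    """
    counts = Counter()
    n_snapshots = 0
    for contacts in snapshots:
        n_snapshots += 1
        native_here = sorted(set(contacts) & native_contacts)
        # Every subset of `size` native contacts present together counts once.
        counts.update(combinations(native_here, size))
    return [(cluster, 100.0 * n / n_snapshots)
            for cluster, n in counts.most_common(top)]
```

Calling the function with `size` equal to 1, 2, 3, and 4 yields the four pairs of columns of a table with this layout.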
Temperature dependence of the native contact clustering and the sequential acquisition of the native-like secondary structure (Fig. 2) are consistent with protein engineering studies of the β-sheet formation. Mutations in β1-, β3-, and β4-strands have intermediate φ-values, suggesting partial formation of a three-stranded β-sheet composed of these strands in the folding transition state ensemble, whereas the first hairpin and the helix have relatively low φ-values (30). A similar picture of the transition state can also be inferred from the observed subsets of long-range (|i − j| ≥ 5) native-like side-chain contacts seen in the transition structures at Tt (Fig. 3 d). Strands β1, β3, and β4 are the most native-like. The internal ordering of the helix is stabilized mainly by local (short-range) interactions, and therefore, single mutations within the helix should not have much impact on the whole GB1 folding mechanism.
It should be emphasized that the description presented here of the protein G folding nucleus growth (Table 1), to our knowledge, is the first in-depth, residue-level report of the nucleation process from the very beginning of a protein of comparable or larger size. In previously mentioned work describing all-atom Go simulations of protein G (6), particularly early folding stages are reported less thoroughly, in terms of secondary structure element acquisition. Namely, three folding pathways were observed, each involving formation of its own assembly: helix-first hairpin, helix-second hairpin, and β1–β4. All pathways appeared to converge to the same folding nucleus, which perfectly agrees with what we found in our simulations, although it is difficult to compare the earlier folding stages.
Because the description of the folding nucleus at the level of individual residues matches the above all-atom Go simulations and evolutionary studies, it seems justified to conclude that the resolution of the protein representation in the CABS model is high enough. The minimalist model used in folding kinetic studies of proteins G and L (10) seems to be an example of an insufficient protein representation. As the authors admit, their model inadequately describes the β-sheet structure (it forms a β-strand bundle instead of a β-sheet for protein G). Their protein G folding studies emphasize the role of the assembly of the β3-strand with β2 and β1, especially in the formation of an intermediate (nonnative interactions, not reported before), and, for example, completely neglect the role of β1–β4 formation in the fold assembly, which is so important in the present study.
Highly denatured state exhibits native-like long-range interactions
At T = 2.1, GB1 is highly unfolded with the average radius of gyration, Rg = 16.4 Å, and the chain expands occasionally to an Rg range of 35 Å (for Rg distribution at different temperatures see Fig. 1 d). In this highly swollen state, the protein begins to form the first native-like, long-range interactions (for the side-chain contact map see Fig. 4 a). The average size at T = 2.1 is smaller than the GB1 random coil size (23 Å) (39), mostly because of the partial formation of the helix. As can be seen in Fig. 4 a, the center of the helix is the most persistent fragment of the residual structure, present in more than half of the snapshots. The rest of the structure, especially with respect to the nonlocal interactions, remains highly disordered. Remarkably, the most frequently occurring long-range side-chain contacts and their fluctuating clusters involve native contacts between the residues found to participate in the folding nucleus (Table 1, T = 2.1). The most frequent nonnative long-range (|i − j| ≥ 5) contacts are also observed for the residues participating in the nucleus: Y33–W43 (present in 25% of snapshots), T25–F30 (17%), Y45–K50 (16%), Y45–T51 (15%), and Y3–W43 (15%). The frequent contacts of Y33 may indicate an important role of this residue in the tight packing of a slightly deformed helix against the sheet. This issue has also been discussed by others (34,40).
Recently, there has been an increasing number of reports suggesting a significant presence of the secondary structure and hydrophobic clustering, even under highly denaturing conditions (41,42). Moreover, it was found that a highly denatured protein can exhibit long-range ordering loosely resembling the native-like topology (43). Therefore, the folding process can be directed from the very beginning, and the search of the conformational space could be more efficient when not starting from an accidental structure. The current view is that understanding of the folding process may become possible after more complete structural studies of the denatured state.
Clearly, from the very beginning of the folding process, the residual structure is initiated by hydrophobic interactions. The major role of the hydrophobic interactions in determining the folding route was also found in a reduced modeling study of protein G, whereby the hydrophobic interactions excised from physical energy terms critically affected the folding route (7). The MC simulations presented here confirm the crucial role of the hydrophobic interactions in the initiation and propagation of protein folding (44).
It should be pointed out that although the simulations started from the native structure, the relaxation of the system is so fast that any effects of memory of the initial conformation are completely eliminated.
Structural characteristics of the native-like globule
The formation of the folding nucleus is followed by the assembly of a native-like loosely packed globular structure with the average cRMSD above 5 Å from the native at the Tt (see Fig. 1 c). To investigate structural properties of this globular state, we extracted a large number of snapshots from the 100,000-structure isothermal MC trajectory at Tt. Over 14,000 structures (the structures from the low-energy basin having CABS energy values between −240 and −320 and cRMSD from native between 4 and 8 Å) were extracted and clustered using a single-link clustering algorithm (45). In the single-link clustering, the distance between the clusters is defined as the distance between their closest members. With the cutoff distance of 2.2 Å, the largest cluster (representing the lowest free-energy basin) consists of 2557 structures. The structures in this cluster have the correct native topology, although with highly variable structural details. The remaining clusters are smaller by two orders of magnitude and represent incorrect folds (incorrect arrangements of the strands in the β-sheet or the C-terminal and the N-terminal strands shifted in the register by two residues). The schematic drawing of the centroid of the largest cluster and the comparison of the sizes of the top 10 clusters are given in Fig. 6. Clearly, relatively well-defined native-like structures dominate the simulation trajectory.
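Single-link clustering at a fixed cutoff is equivalent to finding the connected components of a graph in which two structures are joined whenever their distance is below the cutoff. The following minimal sketch illustrates this with a union-find structure; it assumes a precomputed symmetric distance matrix (for example pairwise cRMSD) and is not the implementation used to produce the results above. The function name is hypothetical.

```python
import numpy as np

def single_link_clusters(dist, cutoff):
    """Single-link clustering at a fixed cutoff.

    Two structures end up in the same cluster if they are connected by a chain
    of neighbors, each pair closer than `cutoff`. `dist` is a symmetric N x N
    matrix. Returns a list of clusters (lists of indices), largest first.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] < cutoff:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two clusters

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)
```

The double loop visits every pair once, so for sets of more than ten thousand structures a vectorized or compiled variant would be preferable in practice.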
Is there an intermediate state?
GB1 was initially thought to fold by a two-state process, as do many other small proteins (46). However, continuous-flow fluorescence measurements of GB1 kinetics demonstrated clear deviations from the kinetics expected for a simple two-state process and indicated the presence of an intermediate (47). The time course of the refolding revealed a prominent exponential phase with a time constant of 600–700 μs followed by a second, rate-limiting process with a time constant of 2 ms or longer, depending on denaturant concentration. According to Park et al. (47), the biphasic kinetics of the folding can be modeled quantitatively on the basis of a three-state folding mechanism (folding through an intermediate). The ensemble of intermediate states exhibits native-like fluorescence properties: W43 becomes largely buried during the initial phase of folding. Additionally, the denaturant-dependent rate constant studies provided insight into the solvent-accessible surface area associated with each transition (47). According to this analysis, the initial barrier, the first transition state TS1, represents a well-solvated ensemble of states with α = 0.29 (the α values indicate the change in the solvent-accessible surface area relative to the unfolded state; for the unfolded state, α = 0, and for the native state, α = 1), whereas both the intermediate and the second transition state, TS2, are nearly as desolvated as the native state (α = 0.85). The high α value for the intermediate state, αIntermediate = 0.85, implies that it represents a compact set of conformations with a solvent-exposed surface area only slightly larger than that of the native state, which is consistent with its native-like fluorescence properties. A comparison with other proteins for which αIntermediate values have been determined on the basis of stopped-flow data shows that the intermediate in the GB1 folding is unusually compact (48). According to Park et al. (47), this is probably a consequence of the fact that the hydrophobic core of GB1 is relatively large for a protein of its size.
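To make the link between biphasic kinetics and a three-state mechanism concrete, the following minimal sketch evaluates the populations of a sequential scheme U → I → N. It is an illustration only, not the kinetic model fitted by Park et al. (47): both steps are assumed irreversible, the denaturant dependence is ignored, and the rate constants are simply the reciprocals of the time constants quoted above (650 μs taken as the midpoint of the 600–700 μs range, and 2 ms as the lower bound of the slow phase). All function and constant names are hypothetical.

```python
import math


def three_state_populations(t, k1, k2):
    """Fractions of U, I and N at time t for irreversible U -> I -> N.

    Starts from pure U at t = 0. Rates are in 1/s and t is in seconds.
    The closed form below requires k1 != k2.
    """
    if k1 == k2:
        raise ValueError("closed form assumes k1 != k2")
    u = math.exp(-k1 * t)
    i = k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    n = 1.0 - u - i  # populations are conserved
    return u, i, n


def time_of_max_intermediate(k1, k2):
    """Time at which the intermediate population peaks (dI/dt = 0)."""
    return math.log(k1 / k2) / (k1 - k2)


# Assumed rate constants derived from the quoted time constants.
TAU_FAST = 650e-6  # s, midpoint of 600-700 microseconds
TAU_SLOW = 2e-3    # s, lower bound of the rate-limiting phase
K1 = 1.0 / TAU_FAST
K2 = 1.0 / TAU_SLOW

t_peak = time_of_max_intermediate(K1, K2)
u_peak, i_peak, n_peak = three_state_populations(t_peak, K1, K2)
```

Because the first step is faster than the second, the intermediate accumulates transiently before converting to the native state. This accumulation is the feature that separates a three-state time course from a single-exponential, two-state one.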
The plots shown in Fig. 1 suggest a two-state behavior, although a closer inspection of the structures at the temperatures just below the region of the steepest changes of structural properties clearly indicates that this cooperative chain collapse leads to the molten globule state (Fig. 2), consistent with that described by Park et al. (47).
The molten globule is known as a stable, collapsed state with partial native-like ordering that proteins can adopt under certain conditions (49,50). The molten globule exhibits a somewhat native secondary and tertiary structure, although with a high mobility of slightly exposed side chains. Interestingly, in our earlier simulation studies of other small single-domain proteins (S. Kmiecik and A. Kolinski, unpublished results), structures significantly more compact and closer to the native structures were usually observed. This is another indication of the three-state folding of GB1. In some simulations of GB1 we observed that long relaxation of the molten globule-like structures, after the initial fast collapse to the proper topology, leads to more closely packed structures with much smaller overall fluctuations near the native state. Because of the coarse-grained character of the CABS model, these compact structures exhibit only partially native-like packing (the best structures were ∼2.5 Å from the native). The backbone geometry and the main-chain hydrogen bond network were native-like, although only a fraction of the side chains became fully fixed (compare the work by Hubner et al. (12)).
Interestingly, the presence of an intermediate is still under debate. Krantz et al. (51) questioned the validity of the analysis by Roder and co-workers (47) and suggested folding without accumulation of an intermediate, an interpretation that was later refuted by Roder et al. (52).
Our GB1 simulations support the folding scenario starting from the formation of a loosely packed ensemble of relatively compact states with a native-like overall fold, followed by a rate-limiting formation of the unique native structure with its tightly packed core. Whether the two folding stages can be observed experimentally depends on the stability of the compact intermediates (47). Such accumulation of compact states with the native-like features during the GB1 folding was also observed in MD simulations (53).
Distinct mechanisms of GB1 and CI2 folding
It is instructive to compare the GB1 folding mechanism with that of another small protein, CI2, which is also a paradigm for kinetic studies. Results of CABS modeling of the CI2 folding pathway have been published recently (16). We observed that GB1 and CI2 fold by somewhat different mechanisms. This is in slight disagreement with the interpretation of the experimental data suggested by Daggett and Fersht (54). According to them, CI2 folds via nucleation-collapse around an extended nucleus, similarly to what has been observed in this work for GB1. Indeed, in the case of GB1, all nucleus residues take part in the nucleation event at very early stages of folding. In contrast, CI2 folds via the assembly of distinct cooperative subunits (16). At the folding transition, the only native tertiary interactions are observed between the two central strands, β3β4. Consolidation of the helix and β3β4 takes place at lower temperatures. In other words, in our simulations CI2 folding can be described as a sequential formation of secondary structure modules, whereas protein G folds as a single structural module (exactly as interpreted by Daggett and Fersht in the case of CI2). Interestingly, when the CABS energy (or radius of gyration) is studied as a function of temperature during the CI2 folding, the stepwise formation of cooperative subunits does not affect its exponential thermal characteristics. It appears that the differences observed in simulations in the folding pathways of GB1 and CI2 are actually in agreement with the available experimental data; the mechanistic picture provided by the simulations is easier to analyze and interpret comprehensively.
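For readers who wish to reproduce a radius-of-gyration profile of the kind mentioned above, the following minimal sketch shows one way to compute it. It is not part of the CABS software: it assumes one equally weighted bead per residue (for example, the Cα position), and the function names and the layout of the input dictionary (temperature mapped to a list of conformations) are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Radius of gyration of equally weighted beads.

    `coords` is a sequence of (x, y, z) tuples, e.g. C-alpha positions
    in angstroms. Returns the root-mean-square distance of the beads
    from their geometric center.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one bead")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


def mean_rg_by_temperature(snapshots):
    """Average radius of gyration at each temperature.

    `snapshots` maps a temperature to a non-empty list of conformations
    (each a list of (x, y, z) tuples). Temperatures are returned in
    increasing order.
    """
    return {
        temperature: sum(radius_of_gyration(c) for c in confs) / len(confs)
        for temperature, confs in sorted(snapshots.items())
    }
```

Plotting the returned averages against temperature gives a thermal profile of chain compaction. As noted above, such a global measure can remain smooth even when the underlying assembly proceeds in distinct steps.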
Conclusions
The theoretical model of protein G folding resulting from these computer studies is consistent with the experimental observations (30,47) as well as with previous theoretical studies performed by the all-atom Go model, revealing the existence of a well-defined folding nucleus (6,12) and providing a comprehensive picture of the folding mechanism.
This approach may be a very useful tool for qualitative studies of entire folding pathways of large proteins and macromolecular assemblies. Because a procedure exists for fast and accurate protein chain rebuilding (21) with subsequent all-atom refinement and model assessment (24), the proposed method should be applicable to detailed computational studies of long-time dynamics of biomolecular systems. An example of such multiscale modeling is schematically illustrated in Fig. 5, where the rebuilding of the atomic details enabled precise identification of the interaction of the side chains forming the folding nucleus.
The physical folding mechanism observed in the CABS simulations strongly suggests that the interactions in the denatured state are very similar to those in the native structures. Consequently, the knowledge-based potentials derived from native structures are a good approximation of the interactions in the denatured state (16).
The simulations described here provide detailed insight into the folding mechanism at the level of individual residues. The agreement of the results with experimental and theoretical findings shows that the proposed MC dynamics and sampling scheme mimic qualitative features of continuous long-time protein dynamics. This opens up the possibility of efficient multiscale computational studies of protein dynamics, folding mechanisms, and protein docking mechanisms.
Acknowledgments
The authors thank Dr. Dominik Gront for the calculations of the density of states (19), which enabled fine data visualization (used in Figs. 1 and 2), and for valuable assistance with the contact cluster statistics.
Some simulations described here were conducted at the Computer Center of the Faculty of Chemistry of the University of Warsaw.
Footnotes
Editor: Ron Elber.
References
- 1. Duan Y., Kollman P.A. Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science. 1998;282:740–744. doi: 10.1126/science.282.5389.740.
- 2. Vila J.A., Ripoll D.R., Scheraga H.A. Atomically detailed folding simulation of the B domain of staphylococcal protein A from random structures. Proc. Natl. Acad. Sci. USA. 2003;100:14812–14816. doi: 10.1073/pnas.2436463100.
- 3. Jang S., Kim E., Shin S., Pak Y. Ab initio folding of helix bundle proteins using molecular dynamics simulations. J. Am. Chem. Soc. 2003;125:14841–14846. doi: 10.1021/ja034701i.
- 4. Herges T., Wenzel W. In silico folding of a three helix protein and characterization of its free-energy landscape in an all-atom force field. Phys. Rev. Lett. 2005;94:018101. doi: 10.1103/PhysRevLett.94.018101.
- 5. Prieto L., de Sancho D., Rey A. Thermodynamics of Go-type models for protein folding. J. Chem. Phys. 2005;123:154903. doi: 10.1063/1.2064888.
- 6. Shimada J., Shakhnovich E.I. The ensemble folding kinetics of protein G from an all-atom Monte Carlo simulation. Proc. Natl. Acad. Sci. USA. 2002;99:11175–11180. doi: 10.1073/pnas.162268099.
- 7. Lee S.Y., Fujitsuka Y., Kim D.H., Takada S. Roles of physical interactions in determining protein-folding mechanisms: molecular simulation of protein G and α spectrin SH3. Proteins. 2004;55:128–138. doi: 10.1002/prot.10576.
- 8. Blanco F.J., Ortiz A.R., Serrano L. Role of a nonnative interaction in the folding of the protein G B1 domain as inferred from the conformational analysis of the α-helix fragment. Fold. Des. 1997;2:123–133. doi: 10.1016/s1359-0278(97)00017-5.
- 9. Liwo A., Khalili M., Scheraga H.A. Ab initio simulations of protein-folding pathways by molecular dynamics with the united-residue model of polypeptide chains. Proc. Natl. Acad. Sci. USA. 2005;102:2362–2367. doi: 10.1073/pnas.0408885102.
- 10. Brown S., Head-Gordon T. Intermediates and the folding of proteins L and G. Protein Sci. 2004;13:958–970. doi: 10.1110/ps.03316004.
- 11. Levitt M., Warshel A. Computer simulation of protein folding. Nature. 1975;253:694–698. doi: 10.1038/253694a0.
- 12. Hubner I.A., Shimada J., Shakhnovich E.I. Commitment and nucleation in the protein G transition state. J. Mol. Biol. 2004;336:745–761. doi: 10.1016/j.jmb.2003.12.032.
- 13. Kolinski A. Protein modeling and structure prediction with a reduced representation. Acta Biochim. Pol. 2004;51:349–371.
- 14. Kolinski A., Bujnicki J.M. Generalized protein structure prediction based on combination of fold-recognition with de novo folding and evaluation of models. Proteins. 2005;61:84–90. doi: 10.1002/prot.20723.
- 15. Bradley P., Malmstrom L., Qian B., Schonbrun J., Chivian D., Kim D.E., Meiler J., Misura K.M., Baker D. Free modeling with Rosetta in CASP6. Proteins. 2005;61(Suppl 7):128–134. doi: 10.1002/prot.20729.
- 16. Kmiecik S., Kolinski A. Characterization of protein-folding pathways by reduced-space modeling. Proc. Natl. Acad. Sci. USA. 2007;104:12330–12335. doi: 10.1073/pnas.0702265104.
- 17. Kmiecik S., Kurcinski M., Rutkowska A., Gront D., Kolinski A. Denatured proteins and early folding intermediates simulated in a reduced conformational space. Acta Biochim. Pol. 2006;53:131–144.
- 18. Gront D., Kolinski A. BioShell–a package of tools for structural biology computations. Bioinformatics. 2006;22:621–622. doi: 10.1093/bioinformatics/btk037.
- 19. Gront D., Kolinski A. T-Pile–a package for thermodynamic calculations for biomolecules. Bioinformatics. 2007;23:1840–1842. doi: 10.1093/bioinformatics/btm259.
- 20. Home page of the Laboratory of Theory of Biopolymers. http://www.biocomp.chem.uw.edu.pl.
- 21. Gront D., Kmiecik S., Kolinski A. Backbone building from quadrilaterals: a fast and accurate algorithm for protein backbone reconstruction from α carbon coordinates. J. Comput. Chem. 2007;28:1593–1597. doi: 10.1002/jcc.20624.
- 22. Canutescu A.A., Shelenkov A.A., Dunbrack R.L. A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci. 2003;12:2001–2014. doi: 10.1110/ps.03154503.
- 23. Case D.A., Pearlman D.A., Caldwell J.W., Cheatham T.E. III, Wang J., Ross W.S., Simmerling C.L., Darden T.A., Merz K.M., Stanton R.V., Cheung A.I., Vincent J.J., Crowley M., Tsui V., Gohlke H., Radmer R.J., Duan Y., Pitera J., Massova I., Seibel G.L., Singh U.C., Weiner P.K., Kollman P.A. Amber 7. University of California, San Francisco; 2002.
- 24. Kmiecik S., Gront D., Kolinski A. Towards the high-resolution protein structure prediction. Fast refinement of reduced models with all-atom force field. BMC Struct. Biol. 2007;7:43. doi: 10.1186/1472-6807-7-43.
- 25. Gronenborn A.M., Filpula D.R., Essig N.Z., Achari A., Whitlow M., Wingfield P.T., Clore G.M. A novel, highly stable fold of the immunoglobulin binding domain of streptococcal protein G. Science. 1991;253:657–661. doi: 10.1126/science.1871600.
- 26. Blanco F.J., Rivas G., Serrano L. A short linear peptide that folds into a native stable β-hairpin in aqueous solution. Nat. Struct. Biol. 1994;1:584–590. doi: 10.1038/nsb0994-584.
- 27. Kuszewski J., Clore G.M., Gronenborn A.M. Fast folding of a prototypic polypeptide: the immunoglobulin binding domain of streptococcal protein G. Protein Sci. 1994;3:1945–1952. doi: 10.1002/pro.5560031106.
- 28. Munoz V., Thompson P.A., Hofrichter J., Eaton W.A. Folding dynamics and mechanism of β-hairpin formation. Nature. 1997;390:196–199. doi: 10.1038/36626.
- 29. Kolinski A., Ilkowski B., Skolnick J. Dynamics and thermodynamics of β-hairpin assembly: insights from various simulation techniques. Biophys. J. 1999;77:2942–2952. doi: 10.1016/S0006-3495(99)77127-4.
- 30. McCallister E.L., Alm E., Baker D. Critical role of β-hairpin formation in protein G folding. Nat. Struct. Biol. 2000;7:669–673. doi: 10.1038/77971.
- 31. Derreumaux P. Role of supersecondary structural elements in protein G folding. J. Chem. Phys. 2003;119:4940–4944.
- 32. Frank M.K., Clore G.M., Gronenborn A.M. Structural and dynamic characterization of the urea denatured state of the immunoglobulin binding domain of streptococcal protein G by multidimensional heteronuclear NMR spectroscopy. Protein Sci. 1995;4:2605–2615. doi: 10.1002/pro.5560041218.
- 33. Brooks C.L. Protein and peptide folding explored with molecular simulations. Acc. Chem. Res. 2002;35:447–454. doi: 10.1021/ar0100172.
- 34. Sheinerman F.B., Brooks C.L. A molecular dynamics simulation study of segment B1 of protein G. Proteins. 1997;29:193–202. doi: 10.1002/(sici)1097-0134(199710)29:2<193::aid-prot7>3.0.co;2-e.
- 35. Kolinski A., Klein P., Romiszowski P., Skolnick J. Unfolding of globular proteins: Monte Carlo dynamics of a realistic reduced model. Biophys. J. 2003;85:3271–3278. doi: 10.1016/S0006-3495(03)74745-6.
- 36. Mirny L., Shakhnovich E. Evolutionary conservation of the folding nucleus. J. Mol. Biol. 2001;308:123–129. doi: 10.1006/jmbi.2001.4602.
- 37. Alm E., Baker D. Matching theory and experiment in protein folding. Curr. Opin. Struct. Biol. 1999;9:189–196. doi: 10.1016/S0959-440X(99)80027-X.
- 38. Babu C.R., Hilser V.J., Wand A.J. Direct access to the cooperative substructure of proteins and the protein ensemble via cold denaturation. Nat. Struct. Mol. Biol. 2004;11:352–357. doi: 10.1038/nsmb739.
- 39. Smith C.K., Bu Z., Anderson K.S., Sturtevant J.M., Engelman D.M., Regan L. Surface point mutations that significantly alter the structure and stability of a protein's denatured state. Protein Sci. 1996;5:2009–2019. doi: 10.1002/pro.5560051007.
- 40. Clore G.M., Gronenborn A.M. Localization of bound water in the solution structure of the immunoglobulin binding domain of streptococcal protein G. Evidence for solvent-induced helical distortion in solution. J. Mol. Biol. 1992;223:853–856. doi: 10.1016/0022-2836(92)90247-h.
- 41. Klein-Seetharaman J., Oikawa M., Grimshaw S.B., Wirmer J., Duchardt E., Ueda T., Imoto T., Smith L.J., Dobson C.M., Schwalbe H. Long-range interactions within a nonnative protein. Science. 2002;295:1719–1722. doi: 10.1126/science.1067680.
- 42. Kazmirski S.L., Wong K.B., Freund S.M., Tan Y.J., Fersht A.R., Daggett V. Protein folding from a highly disordered denatured state: the folding pathway of chymotrypsin inhibitor 2 at atomic resolution. Proc. Natl. Acad. Sci. USA. 2001;98:4349–4354. doi: 10.1073/pnas.071054398.
- 43. Shortle D., Ackerman M.S. Persistence of native-like topology in a denatured protein in 8M urea. Science. 2001;293:487–489. doi: 10.1126/science.1060438.
- 44. Dyson H.J., Wright P.E., Scheraga H.A. The role of hydrophobic interactions in initiation and propagation of protein folding. Proc. Natl. Acad. Sci. USA. 2006;103:13057–13061. doi: 10.1073/pnas.0605504103.
- 45. Gront D., Kolinski A. HCPM–program for hierarchical clustering of protein models. Bioinformatics. 2005;21:3179–3180. doi: 10.1093/bioinformatics/bti450.
- 46. Jackson S.E. How do small single-domain proteins fold? Fold. Des. 1998;3:R81–R91. doi: 10.1016/S1359-0278(98)00033-9.
- 47. Park S.H., Shastry M.C., Roder H. Folding dynamics of the B1 domain of protein G explored by ultrarapid mixing. Nat. Struct. Biol. 1999;6:943–947. doi: 10.1038/13311.
- 48. Roder H., Colon W. Kinetic role of early intermediates in protein folding. Curr. Opin. Struct. Biol. 1997;7:15–28. doi: 10.1016/s0959-440x(97)80004-8.
- 49. Ptitsyn O.B. Molten globule and protein folding. Adv. Protein Chem. 1995;47:83–229. doi: 10.1016/s0065-3233(08)60546-x.
- 50. Kuwajima K. The molten globule state as a clue for understanding the folding and cooperativity of globular-protein structure. Proteins. 1989;6:87–103. doi: 10.1002/prot.340060202.
- 51. Krantz B.A., Mayne L., Rumbley J., Englander S.W., Sosnick T.R. Fast and slow intermediate accumulation and the initial barrier mechanism in protein folding. J. Mol. Biol. 2002;324:359–371. doi: 10.1016/s0022-2836(02)01029-x.
- 52. Roder H., Maki K., Cheng H. Early events in protein folding explored by rapid mixing methods. Chem. Rev. 2006;106:1836–1861. doi: 10.1021/cr040430y.
- 53. Sheinerman F.B., Brooks C.L. Calculations on folding of segment B1 of streptococcal protein G. J. Mol. Biol. 1998;278:439–456. doi: 10.1006/jmbi.1998.1688.
- 54. Daggett V., Fersht A. The present view of the mechanism of protein folding. Nat. Rev. Mol. Cell Biol. 2003;4:497–502. doi: 10.1038/nrm1126.
- 55. Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211.
- 56. DeLano W.L. The PyMOL Molecular Graphics System. 2002. http://www.pymol.org.
- 57. Humphrey W., Dalke A., Schulten K. VMD: visual molecular dynamics. J. Mol. Graph. 1996;14:33–38. doi: 10.1016/0263-7855(96)00018-5.