Author manuscript; available in PMC: 2013 Jun 14.
Published in final edited form as: J Phys Chem B. 2012 Jan 24;116(23):6645–6653. doi: 10.1021/jp210497h

Folding Simulations of the A and B Domain of Protein G

Maksim Kouza 1,*, Ulrich H E Hansmann 1,*
PMCID: PMC3337360  NIHMSID: NIHMS349445  PMID: 22214186

Abstract

We study wild type and mutants of the A and B domain of protein G using all-atom Go-models. Our data substantiate the usefulness of such simulations for probing the folding mechanism of proteins and demonstrate that multi-funnel versions of such models also allow one to probe more complicated funnel landscapes. In our case, such models reproduce the experimentally observed distributions of the GA98 and GB98 mutants, which differ by only one residue but fold into different structures. They also reveal details of the folding mechanism in these two proteins.

Keywords: Protein folding, Go-models, Molecular Dynamics, energy landscapes, Computer Simulations

Introduction

Anfinsen’s seminal experiments1 on denaturation and refolding of ribonuclease A showed that proteins can fold spontaneously into their active structure. This implies that the three-dimensional structure of a protein is encoded in its sequence of amino acids. Correspondingly, one often finds that homology between protein sequences implies similarity between their structures. This observation is the basis of many present structure prediction algorithms, but it is not without counterexamples. Recently, Orban, Bryan and co-workers performed a set of mutation experiments,2,3 starting from the A and B domain of protein G (GA and GB), that led to proteins with sequence identities over 90% but distinct structures and functions. In the extreme case (GA98 and GB98), these proteins differed by only a single residue that acts as a switch between the two structures.2 Such behavior is difficult to understand in a simple funnel picture4,5 which assumes that the free energy landscape has a “funnel”-like shape when projected on a suitable order parameter (often the number of native contacts), with multiple possible folding pathways leading to a unique native state.

The mutation experiments of Bryan and Orban suggest an extension of this picture to a double-funnel model where the mutations decide the relative weight of the two funnels, with, in the extreme case, a single residue acting as a gatekeeper between the two basins of attraction. This is an intriguing concept, as it would suggest that the sequence of amino acids in a protein may contain information not only on the native structure but also on other structures that are related to the evolutionary history or future of the protein (a protein may accumulate mutations that do not change its present structure but, together with additional mutations, could lead to new structures) or that correspond to kinetically important intermediates.6,7 In the case of GA98 the competing structure (resembling the B domain of protein G instead of the A domain) is also observed experimentally with a certain, but low, probability.8 However, in general the competing, dormant structures appear with little probability, and it is experimentally difficult to test whether a given protein can potentially take an alternate structure given different environmental conditions or further mutations.

These limitations do not exist in computer simulations, which in principle allow one to map the free energy landscape of a protein and to explore its basins of attraction or the connecting transition states. Unfortunately, computer simulations that probe the fundamental processes of folding, binding, and aggregation of proteins, or their interaction within a cell, are extremely difficult for realistic protein models: all-atom models lead to a rough energy landscape with a huge number of local minima separated by high barriers. The resulting computational cost increases exponentially with the size of the protein. While this problem can be alleviated with the use of sophisticated simulation techniques such as parallel tempering,9,10 multicanonical sampling,11 or other generalized ensemble techniques12 by Harold Scheraga13,14 and others (for a recent review, see, for instance, Ref. 15), it limits the size of proteins that can be studied, even if coarse-grained models16 are used. If the interest is not in predicting new structures but in understanding the folding mechanism of proteins with known structures, the problem can be lessened by using all-atom or simple Cα Go-models as introduced by Brooks,17 Onuchic18–20 and others.21–25 The underlying idea in all Go-models is to energetically favor the formation of contacts that are observed in the (known) native structure of the protein over that of other contacts. Hence, these models are especially suitable for simulating proteins with a single folding funnel.

Our interest is in proteins with a more complex free energy landscape. For these cases it was recently proposed26 to extend Go-models such that the contacts found in two distinct structures are energetically favored. In the present paper, we explore what we can learn from such Go-model simulations about the folding of the A and B domain of protein G, whose native structures are shown in Fig. 1. For this purpose, we study the folding mechanism, free energy landscape and transition states of the wild types of these two proteins, and compare our results with the GA98 and GB98 mutants that differ in only a single residue (see also Fig. 1).

Fig. 1.

Folded structure of the A domain and B domain of protein G as deposited in the Protein Data Bank under identifiers 2FS1 and 1PGB, respectively. The mutants GA98 and GB98 share the fold of the parent wild type. Sequences of wild types and mutants are listed in one-letter code.

Our interest is in both probing the differences between wild type and mutant, and testing the reliability of our model. For the latter purpose, we simulate the A domain (B domain) with an all-atom Go-model that favors the contacts found in the native structure of the B domain (A domain). We find that our all-atom Go-model seems to distinguish between a sequence that “fits” a certain fold and one that does not. Simple Go-models that have only a single funnel are compared with the extended models that can also describe more complicated landscapes. While the simple all-atom Go-models allow one to probe the folding mechanism of the wild type GA and GB proteins, their use is limited for the mutants GA98 and GB98. Only the modified versions correctly reproduce the experimental observation that GA98 samples both the GA and the GB fold while GB98 samples only the GB fold, although the two sequences differ in only a single residue. We propose a mechanism that explains the differences between the two mutants.

Methods

Our simulations rely on the all-atom Go-model recently introduced by the Onuchic Lab.20 In this model the energy of a protein configuration is given by

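$$
E = \sum_{\text{bonds}} \epsilon_r (r - r_0)^2 + \sum_{\text{angles}} \epsilon_\theta (\theta - \theta_0)^2 + \sum_{\text{impropers/planar}} \epsilon_\chi (\chi - \chi_0)^2 + \sum_{\text{backbone}} \epsilon_{BB} F_D(\varphi) + \sum_{\text{sidechains}} \epsilon_{SC} F_D(\varphi) + \sum_{\text{noncontacts}} \epsilon_{NC} \left(\frac{\sigma_{NC}}{r}\right)^{12} + \sum_{\text{contacts}} \epsilon_C \left[\left(\frac{\sigma_{ij}}{r}\right)^{12} - 2\left(\frac{\sigma_{ij}}{r}\right)^{6}\right] \tag{1}
$$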

with $F_D(\varphi) = [1 - \cos(\varphi - \varphi_0)] + \frac{1}{2}[1 - \cos(3(\varphi - \varphi_0))]$. The harmonic first term accounts for chain connectivity, the second term represents the bond angle potential, and the potential for the improper and planar degrees of freedom is described by the third term. Flexible dihedrals are described by the fourth and fifth terms. A soft-sphere repulsive potential (the sixth term in Eq. 1) disfavors the formation of non-native contacts. Finally, the non-local interaction energy between atoms in a native contact is modeled by a 6–12 Lennard-Jones potential. A comprehensive listing of parameters can be found in Ref. 20.
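As a rough illustration of three of the terms in Eq. 1, the following Python sketch evaluates the dihedral function F_D, the 6–12 native-contact term and the repulsive non-contact term for a single dihedral or atom pair. The function names and the default energy scales of 1.0 are illustrative assumptions, not parameters taken from Ref. 20.

```python
import math

def dihedral_term(phi, phi0):
    """F_D(phi) = [1 - cos(phi - phi0)] + 0.5 * [1 - cos(3 * (phi - phi0))]."""
    d = phi - phi0
    return (1.0 - math.cos(d)) + 0.5 * (1.0 - math.cos(3.0 * d))

def contact_term(r, sigma, eps_c=1.0):
    """Native-contact term eps_c * [(sigma/r)**12 - 2 * (sigma/r)**6]."""
    x = (sigma / r) ** 6
    return eps_c * (x * x - 2.0 * x)

def noncontact_term(r, sigma_nc, eps_nc=1.0):
    """Purely repulsive term eps_nc * (sigma_nc/r)**12 for non-native pairs."""
    return eps_nc * (sigma_nc / r) ** 12
```

In this form the contact term has its minimum at r = sigma with depth -eps_c, and F_D vanishes at the native dihedral angle phi0, which is how the native geometry enters the energy function.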

Simulation details

To model the wild-type proteins and their mutants we use the implementation of the above energy function by SMOG (Structure-based MOdels in GROMACS). This publicly available web server, located at http://smog.ucsd.edu,27 accepts PDB files as input and returns the topology and parameter files needed to set up the corresponding Go-model simulation in the GROMACS program package.28 Given a reference configuration, SMOG generates topology and other input files that allow an immediate simulation of the protein if only a single funnel is assumed. In the cases where we assumed a more complex topology of the energy landscape, we edited the topology files by merging the information (i.e. contact and dihedral maps) of the two funnels. The few contacts that appeared in both funnels were treated separately by setting their energy to zero. In this way we avoided additional roughness of the energy landscape from competing interactions resulting from these contacts.
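The merging of two contact maps described above can be sketched as follows. This is a minimal illustration only: the dictionary representation, the function name `merge_contact_maps` and the toy values are assumptions and do not reflect the actual SMOG or GROMACS topology format; the merging of dihedral maps is not covered.

```python
def merge_contact_maps(contacts_a, contacts_b):
    """Combine two native contact maps into one dual-funnel map.

    contacts_a, contacts_b: dicts mapping an atom pair (i, j) with i < j
    to a (sigma, epsilon) tuple. Pairs present in both maps get epsilon = 0,
    following the treatment of shared contacts described in the text.
    """
    merged = {}
    for pair in set(contacts_a) | set(contacts_b):
        in_a, in_b = pair in contacts_a, pair in contacts_b
        if in_a and in_b:
            sigma, _ = contacts_a[pair]  # assumption: keep fold A's distance
            merged[pair] = (sigma, 0.0)  # shared contact: energy set to zero
        elif in_a:
            merged[pair] = contacts_a[pair]
        else:
            merged[pair] = contacts_b[pair]
    return merged

# Hypothetical toy maps for two folds (not data from the paper)
fold_a = {(1, 10): (0.60, 1.0), (2, 9): (0.55, 1.0)}
fold_b = {(1, 10): (0.80, 1.0), (3, 20): (0.50, 1.0)}
dual = merge_contact_maps(fold_a, fold_b)
```

With the shared pair switched off, each remaining contact favors exactly one of the two folds, which avoids the competing interactions mentioned above.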

We use GROMACS 4.5.3 (Ref. 28) with Langevin dynamics, starting with an extended protein placed in a cubic box with a minimal distance of 1 nm from the protein to the box walls. The time step is 0.0005 and we used $2 \times 10^8$ steps for production runs. For different systems we use different temperatures in the range 102–125, which is in all cases close to the folding temperature, i.e. allowing the protein to fold and unfold. The corresponding values are given in the text. Note that time and temperatures are given in reduced units, as our force field is not a physical energy function. Determining the critical (folding) temperature required several test runs for each protein. Temperatures at which the populations of folded and unfolded states are approximately equal were chosen for the main simulation runs of $2 \times 10^8$ MD steps. For analysis we reweighted the data so obtained to the folding temperatures, which we define by the condition that the free energy basins of folded and unfolded configurations are of equal depth.
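The text does not specify the reweighting scheme. The sketch below assumes standard single-histogram reweighting with k_B = 1 (reduced units); the function names and the per-frame inputs (energies, folded/unfolded flags) are hypothetical.

```python
import math

def reweight_weights(energies, t_sim, t_target):
    """Normalized weights that reweight samples from t_sim to t_target.

    Each sample gets w ~ exp(-(1/t_target - 1/t_sim) * E), with k_B = 1
    (reduced units). Exponents are shifted before exponentiation for stability.
    """
    dbeta = 1.0 / t_target - 1.0 / t_sim
    exponents = [-dbeta * e for e in energies]
    shift = max(exponents)
    raw = [math.exp(x - shift) for x in exponents]
    total = sum(raw)
    return [w / total for w in raw]

def folded_fraction(is_folded, weights):
    """Reweighted population of the frames flagged as folded."""
    return sum(w for f, w in zip(is_folded, weights) if f)
```

One possible use is to scan `t_target` around the simulation temperature and pick the value at which the folded and unfolded populations, or the corresponding basin depths, balance.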

Results

We first present our results from Go-model simulations of the wild types of the A and B domain. In Fig. 2a we show for the A domain the root-mean-square deviation (rmsd) to the PDB structure as a function of simulation time. The frequent changes between configurations with rmsd below 3 Å (i.e. folded configurations) and those with rmsd above 15 Å indicate that our simulation at T = 125 (in reduced units) is close to the folding temperature. The same is true for our simulation of the B domain at T = 110 shown in Fig. 2b. The corresponding plots of the free energy as a function of the number of native contacts at these two temperatures in Fig. 3 display in both cases two basins of attraction of almost equal depth, corresponding to the folded and denatured states. The two states are separated by a free energy barrier of ≈ 2 kBT. Note again that kB here is not the usual Boltzmann constant but an arbitrarily chosen constant that sets the energy (or temperature) scale.
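A free energy profile of this kind is commonly estimated from the histogram of the order parameter, F(Q) = -kBT ln P(Q). The following sketch assumes that estimator and a time series of integer contact counts; the function name and the optional per-frame weights are illustrative assumptions.

```python
import math
from collections import Counter

def free_energy_profile(q_series, weights=None):
    """Return {Q: F(Q)} in units of k_B*T, with F = -ln P(Q), minimum at 0."""
    if weights is None:
        weights = [1.0] * len(q_series)
    hist = Counter()
    for q, w in zip(q_series, weights):
        hist[q] += w  # weighted histogram of the contact number
    total = sum(hist.values())
    f = {q: -math.log(c / total) for q, c in hist.items()}
    f_min = min(f.values())
    return {q: v - f_min for q, v in sorted(f.items())}
```

In this picture, a value of Q that is visited ten times less often than the basins lies ln 10 ≈ 2.3 kBT above them, which gives a feeling for what a barrier of about 2 kBT means in terms of sampling.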

Fig. 2.

Root-mean-square deviation (rmsd) to the corresponding PDB structure as a function of time for a) the A-domain (GA) and b) the B-domain (GB) of protein G. Data are from all-atom Go-model simulations at the respective folding temperatures $T_f^{\mathrm{GA}} = 125$ and $T_f^{\mathrm{GB}} = 110$.

Fig. 3.

Free energy as a function of the number of native contacts. The data are from all-atom Go-model simulations at the respective folding temperatures $T_f^{\mathrm{GA}} = 125$ and $T_f^{\mathrm{GB}} = 110$.

An advantage of all-atom Go-model simulations over those relying on physical force fields is the ability to sample with high statistics the free energy landscape of proteins as large as ours at the folding temperature. This allows one to probe the folding mechanism of proteins. In Fig. 4a we show for GA the free energy landscape at the folding temperature as a function of rmsd and number of native contacts. The L-shaped landscape indicates that the rmsd first decreases rapidly with increasing number of contacts, but stays low and decreases little further once ≈ 100 contacts are formed. Clearly visible is a transition area separating the two regimes of folded and unfolded states. More details on the folding mechanism can be derived from Fig. 4b, where the free energy is plotted as a function of the number of native contacts and the relative contact order (RCO) parameter29 defined by:

$$
\mathrm{RCO} = \frac{\sum_{i<j} \Delta_{ij}\,|i - j|}{N L} \tag{2}
$$

where N is the total number of contacts, L is the total number of residues, Δij = 1 if residues i and j form a contact and Δij = 0 otherwise. The two regions appear also in this plot, with the unfolded region characterized by a number of native contacts below 100 and an RCO of 0.06, while the folded region has an increased RCO of 0.08. This indicates that initially local contacts are formed, while long range contacts are formed once a certain number is reached. This is a common scenario for helix bundles32,33 and consistent with a framework model30,31 which assumes as a first step the formation of helices with their short range contacts resulting from hydrogen bonding. In a second step, these helices arrange themselves relative to each other and form long range contacts, with those between the N- and C-terminal helices forming after those between the terminal and the central helix (data not shown). These inter-helical contacts lead to the observed increase of the RCO. This picture is supported by our analysis of transition states shown in Fig. 6 and discussed later. This final folding step, the arrangement of the three helices, is connected with an increase of side-chain contacts, as can be seen from Fig. 4c. In this figure the free energy is plotted as a function of main-chain contacts (i.e. contacts only between backbone atoms) and side-chain contacts (contacts that involve side-chain atoms). Once the number of main-chain contacts grows above ≈ 30, the number of side-chain contacts increases rapidly, while below this threshold the number of side-chain contacts grows only slowly.
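The relative contact order can be computed directly from a list of formed contacts. The sketch below assumes contacts are given as residue index pairs; the function name and the toy contact lists (a chain of 56 residues with local i, i+4 contacts) are hypothetical and are not the paper's data.

```python
def relative_contact_order(contacts, n_residues):
    """RCO = (sum over formed contacts of |i - j|) / (N * L).

    contacts: iterable of residue index pairs (i, j) that are in contact;
    n_residues: chain length L. N is the number of contacts in the list.
    """
    pairs = list(contacts)
    if not pairs or n_residues <= 0:
        raise ValueError("need at least one contact and a positive chain length")
    total_separation = sum(abs(i - j) for i, j in pairs)
    return total_separation / (len(pairs) * n_residues)

# Toy example: purely local helical contacts, then two long-range contacts added
helical = [(k, k + 4) for k in range(1, 40)]
with_long_range = helical + [(2, 50), (5, 45)]
rco_local = relative_contact_order(helical, 56)
rco_mixed = relative_contact_order(with_long_range, 56)
```

Adding only a few long-range contacts to a set of local ones raises the RCO, which is the kind of increase the text associates with the packing of helices against each other.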

Fig. 4.

Two-dimensional free energy landscapes as a function of various quantities for the A domain ((a)–(c)) and B domain ((d)–(f)) of protein G. The data are from all-atom Go-model simulations at the respective folding temperatures $T_f^{\mathrm{GA}} = 125$ and $T_f^{\mathrm{GB}} = 110$.

Fig. 6.

Percentage of native state (FS), transition state (TS) and denatured state (DS) as a function of rmsd as obtained in simulations of the four all-atom Go-models GAGA, GBGA, GAGB and GBGB at the respective folding temperatures $T_f^{\mathrm{GAGA}} = 125$, $T_f^{\mathrm{GBGA}} = 125$, $T_f^{\mathrm{GAGB}} = 113$, and $T_f^{\mathrm{GBGB}} = 110$. The curves are normalized such that the area under each curve is one. The absolute frequencies of states are listed above each curve. Also shown are typical folded and transition state configurations. The N-terminus of each configuration is marked by a dot.

The corresponding free energy landscapes for the B domain are shown in Fig. 4d–f. The free energy as a function of number of native contacts and rmsd is less L-shaped than for the A domain. The transition happens at a larger rmsd value (≈ 10 Å for GB compared to ≈ 7 Å for GA), and even in the folded region the rmsd decreases strongly with increasing number of contacts. Note also that the two regions are more separated than in the GA case. This can also be seen in the free energy landscape projected on contact order and number of native contacts. There is a very sharp transition between a region with a small number of contacts and low contact order and a region where the contact order is much larger and stays constant with increasing number of contacts. The larger values of the contact order result from the β-sheets in GB, which are inherently non-local. Formation of the N-terminal β-hairpin (S1 with S2) seems to be the limiting step in the folding process, as can also be seen from the transition states in Fig. 6. Once formed, the hairpin seems to initiate formation of the S4 strand followed by the S3 strand. The helix forms and dissolves intermittently, and is stable only once it is in contact with the four β-strands. As in the case of GA, side chain ordering and formation of side chain contacts seem to happen only after the protein has assumed its fold. The sequence of events is somewhat different from the one observed in coarse-grained simulations of the protein by the Kolinski lab,34 which found that folding starts with formation of the C-terminal hairpin (S3–S4), followed by contacts of this hairpin with the helix. A third step is the formation of the N-terminal hairpin S1–S2, followed by completion of the correct fold with formation of the S1–S4 contacts. It is not clear to us whether these differences result from the differing dynamics (Monte Carlo moves versus molecular dynamics) or from the choice of force fields.

An inherent problem with Go-model simulations is that the energy function is not physical. By energetically favoring contacts that appear in the native structure, one already makes assumptions about the folding mechanism. Hence, there is a danger that the observed folding mechanisms are artifacts of the energy function. To probe how much the form of our energy function influences our results, we made the following test. We took the GA (GB) sequence and forced it into the GB (GA) fold. For this purpose, we started with the GB (GA) fold and changed the sequence of amino acids in 47 successive in silico “mutations” from GB (GA) to GA (GB). After each “mutation”, done with Modeller,35 the protein configuration was minimized carefully to avoid steric clashes while at the same time preserving the parent fold. We used the contacts in the two configurations so generated to derive two all-atom Go-models, which we call GAGB and GBGA, that enforce for the GA sequence the GB fold as lowest-energy state and, vice versa, for the GB sequence the GA fold. We then compared these artificial Go-models with the realistic ones, GAGA and GBGB, where the GA (GB) sequence folds into the GA (GB) fold. In Fig. 5 we contrast the free energy landscape as a function of native contacts at the respective folding temperatures for GAGA with GBGA, and for GBGB with GAGB. The inset displays the same quantities as a function of the number of native main chain contacts. Note that in both cases the free energy barrier between folded and unfolded state is substantially lower for the artificial model, and the transition is broader and less pronounced. The difference is especially prominent in the free energy as a function of main chain native contacts. We observe such behavior also in all other free energy plots that we have considered (data not shown).
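The bookkeeping behind such a series of single-site substitutions can be sketched as below. The left-to-right order of substitutions, the function name and the short toy sequences are assumptions for illustration; the text does not state the order used, and the modelling and minimization after each step (done with Modeller in the paper) are not reproduced here.

```python
def mutation_path(start_seq, target_seq):
    """Yield one single-site substitution per differing position.

    Each item is (position, old_residue, new_residue, intermediate_sequence),
    with 1-based positions and substitutions applied from left to right.
    """
    if len(start_seq) != len(target_seq):
        raise ValueError("sequences must have equal length")
    current = list(start_seq)
    for pos, (old, new) in enumerate(zip(start_seq, target_seq)):
        if old != new:
            current[pos] = new
            yield pos + 1, old, new, "".join(current)

# Hypothetical toy sequences (not the GA or GB sequences)
steps = list(mutation_path("MKTAYIAK", "MRTAYLAK"))
```

The number of steps equals the number of positions at which the two sequences differ, and the last intermediate is the target sequence threaded onto the parent fold.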

Fig. 5.

Free energy as a function of the number Q of native contacts for (a) GAGA and GBGA, and (b) GBGB and GAGB. The inset shows the free energy as a function of main chain contacts. Data are from all-atom Go-model simulations at the respective folding temperatures $T_f^{\mathrm{GAGA}} = 125$, $T_f^{\mathrm{GBGA}} = 125$, $T_f^{\mathrm{GAGB}} = 113$, and $T_f^{\mathrm{GBGB}} = 110$. Here, GAGB (GBGA) marks a Go-model constructed such that the GA (GB) sequence has the GB (GA) fold as lowest energy state.

While the folding mechanisms did not change, we find a less clear order in which long range contacts are formed. This can also be seen in the distribution of transition states displayed in Fig. 6. In this figure we show for GAGA, GBGA, GAGB and GBGB the distribution of folded states, transition states and denatured states as a function of rmsd. From an analysis of Fig. 4 and the corresponding plots for GAGB and GBGA (not shown) we define here a state as a transition state if its free energy lies in the range close to the maximum free energy as a function of the number of native contacts. Structures from regions with smaller or greater free energy are referred to as unfolded or folded, respectively. Note that the Y-axis shows relative weights, i.e. the area under each curve is set to one. The total frequency of each group is listed on top of the respective curve. Also shown in the figures are typical folded (top or bottom) and transition state (left or right side) configurations. Note that not only is the absolute frequency of transition states higher in GBGA (GAGB) than in the native GAGA (GBGB), but the distribution is also broader. The transition states and folded states themselves appear to be less defined for the artificial models. We believe that this is because the sequence of a protein also optimizes the side chain contacts in its native state. Forcing the GA (GB) sequence into the GB (GA) fold leads to a large number of competing non-native interactions that broaden the transition state distribution. Hence, interestingly, our all-atom Go-model seems to distinguish between a sequence that “fits” a certain fold and one that does not.
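One way to turn such a definition into a procedure is sketched below. It assumes that the two basin positions are supplied by hand and that "close to the maximum" means within a tolerance of the barrier top; the tolerance of 0.5 kBT, the function name and the toy profile are all hypothetical choices, not values from the paper.

```python
def classify_states(profile, q_unfolded, q_folded, tolerance=0.5):
    """Label each Q in a free energy profile {Q: F} as 'DS', 'TS' or 'FS'.

    The barrier top is the largest F between the two basin positions.
    Q values between the basins with F within `tolerance` (in k_B*T) of the
    top span the 'TS' window; lower Q is 'DS', higher Q is 'FS'.
    """
    between = {q: f for q, f in profile.items() if q_unfolded < q < q_folded}
    if not between:
        raise ValueError("no bins between the two basins")
    f_top = max(between.values())
    ts = sorted(q for q, f in between.items() if f >= f_top - tolerance)
    labels = {}
    for q in profile:
        if ts[0] <= q <= ts[-1]:
            labels[q] = "TS"
        elif q < ts[0]:
            labels[q] = "DS"
        else:
            labels[q] = "FS"
    return labels

# Hypothetical toy profile in units of k_B*T
toy = {20: 0.0, 40: 1.2, 60: 2.0, 80: 1.7, 100: 0.9, 120: 0.1}
labels = classify_states(toy, q_unfolded=20, q_folded=120)
```

Frames can then be grouped by the label of their contact number, and the rmsd distribution of each group plotted as in Fig. 6.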

We extend now our analysis to the two mutants GA98 and GB98, where the GA and GB sequences are mutated towards two sequences that differ in only a single residue (45LEU in GA98 vs. 45TYR in GB98) but both keep their original fold and function. In Fig. 7a and b we show the free energy landscape as a function of the number of native contacts Q for GA98 and GB98. The inset shows the same quantity as a function of main chain contacts only. Data are again from simulations at the folding temperatures ($T_f^{\mathrm{GA98}} = 125$ and $T_f^{\mathrm{GB98}} = 110$). For GA98 the position of the free energy barrier is shifted to the right of that of GA, and its height is raised by about one kBT. On the other hand, for GB98 the barrier is slightly lower and shifted to the left of that in GB. These shifts result from the differences in the number and kind of side chain contacts. When looking at the free energy as a function of main chain contacts the shifts are smaller, and for GA98 the barrier height differs little from that of GA. On the other hand, the barrier in the free energy as a function of main chain contacts is lowered for GB98 by about one kBT compared with GB. Note that for the wild types (GA and GB) the height of the barrier differs little between the case where the free energy is projected on the total number of native contacts and the case where it is projected on the number of native main chain contacts. On the other hand, there are larger differences between the two cases for both mutants (GA98 and GB98). This indicates that the arrangement of side chains plays a more important role in the folding of the mutants than in that of the wild types. We remark that, unlike for the wild types, the corresponding figures for GA98GB and GB98GA (i.e. the GA98 (GB98) sequence forced into the GB (GA) fold) look very similar to those for GA98GA and GB98GB (data not shown). This is not unexpected, as the two sequences GA98 and GB98 differ in only one residue.

Fig. 7.

Free energy as a function of the number Q of native contacts. In (a) we compare GAGA with GA98GA, in (b) GBGB with GB98GB. The inset shows the same quantity as a function of main chain contacts only. In (c) we compare the free energy landscapes of the regular Go-models GA98GA and GB98GB with those where the GA98 (GB98) sequence has the GB98 (GA98) fold as lowest energy state. The latter two models are named GA98GB and GB98GA. All results are from all-atom Go-model simulations at their folding temperatures $T_f^{\mathrm{GAGA}} = 125$, $T_f^{\mathrm{GA98GA}} = 125$, $T_f^{\mathrm{GA98GB}} = 110$, $T_f^{\mathrm{GBGB}} = 110$, $T_f^{\mathrm{GB98GB}} = 110$ and $T_f^{\mathrm{GB98GA}} = 123$.

When the energy landscape is plotted as a function of two variables, the transition appears to be sharper for the mutants than for the wild types. This can be seen in Fig. 8, where we plot the free energy as a function of relative contact order RCO and number of native contacts for a) GA98, b) GA, c) GB98, and d) GB. The transition region is considerably less populated for GA98 and GB98 than in the corresponding wild types, with the difference more pronounced for GA and GA98. This can also be seen from the frequency of transition states shown in Fig. 9, and indicates that for the mutants folding is more of an all-or-nothing process than for the wild types and requires narrow pathways in the funnel landscape. On the other hand, little difference is found in the folding kinetics. Like the wild type (GA), GA98 folds by first forming the three helices that then arrange themselves in the final fold, with contacts between the central helix and the terminal helices forming before those between the two terminal helices. In the same way, GB98, like GB, first forms the N-terminal hairpin S1–S2, which then initiates formation of the S4 strand followed by the S3 strand, with the helix forming transiently but being stable only after forming contacts with the four β-strands. The similarity in the transition states between Fig. 6 and Fig. 9 is consistent with this picture of the same folding mechanisms in wild types and mutants.

Fig. 8.

Free energy as a function of relative contact order RCO and number Q of native contacts for (a) GA98GA, (b) GAGA, (c) GB98GB, and (d) GBGB. All data are from all-atom Go-model simulations at the folding temperatures $T_f^{\mathrm{GA98GA}} = 125$, $T_f^{\mathrm{GAGA}} = 125$, $T_f^{\mathrm{GB98GB}} = 110$ and $T_f^{\mathrm{GBGB}} = 110$.

Fig. 9.

Percentage of native state (FS), transition state (TS) and denatured state (DS) as a function of rmsd as obtained in simulations of the four all-atom Go-models GA98GA, GB98GA, GA98GB and GB98GB at the respective folding temperatures $T_f^{\mathrm{GA98GA}} = 125$, $T_f^{\mathrm{GB98GA}} = 123$, $T_f^{\mathrm{GA98GB}} = 110$, and $T_f^{\mathrm{GB98GB}} = 110$. The curves are normalized such that the area under each curve is one. The absolute frequencies of states are listed above each curve. Also shown are typical folded and transition state configurations. The N-terminus of each configuration is marked by a dot.

By construction, Go-models energetically favor the formation of contacts that are observed in the (known) native structure of the protein over that of other contacts. Hence, these models are especially suitable for simulating proteins with a single folding funnel. However, we can expect in the case of GA98 and GB98 that the free energy landscape is more diverse. As the two sequences differ in only a single residue, we conjecture that the landscape consists of two funnels, with that specific residue acting as a gatekeeper between the two basins of attraction. Such behavior cannot be modeled with a simple Go-model, and our results presented above for GA98 and GB98 implicitly assume that the effects of the secondary funnel are minor. However, in the case of GA98 the competing structure (resembling the B domain of protein G instead of the A domain) is also observed experimentally,8 indicating that this assumption is not valid.

For this reason, we have also studied the two mutants GA98 and GB98 with Go-models that explicitly assume a folding landscape with two funnels (leading to either the GA or the GB fold). In Fig. 10 we show the time series and energy landscape for a simulation of GA98 in such a two-funnel all-atom Go-model. Both the time series of rmsd (with respect to the GA and GB fold) and the free energy landscape demonstrate that GA98 can assume in this model both the GA and the GB fold, but the free energy of the protein in the GB fold is about 3 kBT higher than that of the protein in the GA fold, and comparable to that of the unfolded protein. Note that the free energy changes continuously between the various states, and no obvious barriers can be seen. On the other hand, the corresponding figure for GB98 in Fig. 11 reveals a different behavior. Again, the protein can assume both folds, and again the “correct” fold (GB) is favored by about 3 kBT over the competing GA fold, whose free energy is also comparable to that of the unfolded states, but in this case there is a clearly defined barrier isolating the GA states. This is consistent with the experimental results, where GA98 is also observed with small frequency in the GB fold while GB98 is not found in a GA fold.
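For orientation, a free energy difference can be translated into a relative population via the Boltzmann factor. The short estimate below is a sketch that assumes the quoted 3 kBT can be read as the free energy difference $\Delta F$ between the two fold ensembles as a whole; the symbols $P_{\mathrm{favored}}$ and $P_{\mathrm{competing}}$ for the two populations are introduced here for illustration only.

$$
\frac{P_{\mathrm{competing}}}{P_{\mathrm{favored}}} = e^{-\Delta F / k_B T} = e^{-3} \approx 0.05
$$

Under this assumption, the competing fold would be populated roughly one-twentieth as often as the favored one at the folding temperature, that is, with a low but non-zero probability.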

Fig. 10.

a) Rmsd with respect to the GA98 (black) and GB98 (red) structure as a function of time. The corresponding free energy landscape is shown in b) together with typical structures in the main local minima. The data are from an all-atom Go-model simulation at the folding temperature $T_f^{\mathrm{GA98/GAGB}} = 111$. The Go-model was modified such that the model has two global minima corresponding to the GA98 and GB98 structures.

Fig. 11.

a) Rmsd with respect to the GA98 (red) and GB98 (black) structure as a function of time. The corresponding free energy landscape is shown in b) together with typical structures in the main local minima. The data are from an all-atom Go-model simulation at the folding temperature $T_f^{\mathrm{GB98/GAGB}} = 102$. The Go-model was modified such that the model has two global minima corresponding to the GA98 and GB98 structures.

For both GA98 and GB98 a switch from one to the other configuration implies going through the ensemble of unfolded configurations. In the case of GB98 there is a large barrier between the ensemble of unfolded states and the ensemble of configurations with the GA fold, and a much smaller one between unfolded states and those with a GB fold. On the other hand, for GA98 no such large barrier exists that separates the ensemble of unfolded states from that of configurations in the competing GB fold. As the two proteins differ by a single residue (45LEU in GA98 vs. 45TYR in GB98), we can expect that the behavior of this residue leads to the different folding landscapes, i.e. creating the barrier for GB98 but not for GA98.

Our first assumption was that residue 45 selects the conformation of the C-terminal segment, which afterwards acts as a template for the structure of the N-terminal segment. Such a mechanism was observed in unfolding simulations of the similar mutants GA59 and GB59 (which have only 59% sequence identity instead of the 98% identity between GA98 and GB98) in Ref. 36. However, in our simulations the observed folding events indicate that folding into the GA structure starts with formation of the three helices that then form inter-helical contacts, while folding into the GB fold starts with formation of the N-terminal hairpin S1–S2, followed by contacts S1–S4 and later S3–S4 and helix-strand contacts. While the folding of GA98 is similar to the one observed in unfolding simulations of GA59 by the Daggett group,36 this is not the case for GB98. The Daggett group concluded that GB59 starts folding by first forming the S3–S4 hairpin, which becomes stabilized by contacts with the helix and prevents folding toward the helix bundle of the GA fold. Instead, our results indicate that the differences between GA98 and GB98 result from the different ways in which 45LEU and 45TYR form long range contacts, not from the way they form their local secondary structure contacts.

This is consistent with an analysis of contacts formed by these residues in the transition region of Fig. 10 and Fig. 11, which indicates that typical helical short range contacts such as between residue 49 and 49 are formed with similar frequency: 65% in GA98 versus 68% in GB98. The same is true for sheet contacts such as between residues 45 and 52, which are found for GA98 with 33% probability and for GB98 with 32%. Similarly, we also find little difference between the transition regions in the frequency of the long range contacts that stabilize the GB fold. For instance, we find contacts between residues 23 and 45 (helix-strand 3) in GA98 with 5% and in GB98 with 4% frequency. However, we find a pronounced difference in contacts between residues 32–35 and 45, through which the central helix interacts with the C-terminal helix in the GA fold. For instance, the hydrophobic contacts between 33ILE and 45LEU are found in GA98 with 48% frequency, but those between 33ILE and 45TYR in GB98 with only 29%. Formation of such contacts between the central and the C-terminal helix is the first step after formation of the secondary structure in the folding into the GA fold. We conjecture that the reduced probability of 45TYR to form such inter-helical contacts is what leads to the barrier in GB98 which protects against the GA fold in this protein. On the other hand, switching from 45TYR to 45LEU increases the frequency of such inter-helical contacts (i.e. favoring the GA fold), but does not change the frequency of the long range contacts typical for the GB fold. In addition, these contacts, for instance the helix-strand contacts between residues 23 and 45, form late in the process of folding to the GB structure. Formation of these contacts therefore does not constitute a bottleneck in the folding. Hence, the barrier that separates the GB fold in GA98 is much lower than the one separating the GA fold in GB98.
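Contact frequencies of this kind can be obtained by counting, over the frames assigned to the transition region, how often a given residue pair is in contact. The sketch below assumes each frame has already been reduced to a set of formed residue pairs (i, j) with i < j; the function name and the toy frames are hypothetical and do not reproduce the percentages quoted above.

```python
def contact_frequency(frames, pair):
    """Fraction of frames whose contact set contains the residue pair."""
    if not frames:
        raise ValueError("no frames given")
    key = tuple(sorted(pair))  # contact sets store pairs as (i, j) with i < j
    hits = sum(1 for contacts in frames if key in contacts)
    return hits / len(frames)

# Hypothetical transition-region frames, each a set of formed contacts
ts_frames = [{(33, 45), (45, 52)}, {(33, 45)}, {(23, 45)}, set()]
freq_33_45 = contact_frequency(ts_frames, (45, 33))
freq_23_45 = contact_frequency(ts_frames, (23, 45))
```

Comparing such frequencies for the same residue pair in the two mutants is what singles out the contacts between the central helix and residue 45 in the analysis above.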

Conclusions

We have simulated wild type and mutants of the A and B domain of protein G using all-atom Go-models. While Go-models by construction lead to pre-selected structures as lowest-energy states, our simulations show a clear difference between sequences that “fit” a certain fold and ones that do not. For the wild type GA and GB, simple all-atom Go-model simulations allow one to study the folding mechanism of these proteins. However, such models, which by construction have only a single folding funnel, will fail when the energy landscape of a protein is more complex. In the case of the mutants GA98 and GB98, which differ in only a single residue but have very different distributions of folded structures, we therefore tried a modified Go-model that incorporates folding funnels to both the GA and the GB fold. This model not only correctly reproduced the experimentally observed distributions but also revealed details of the folding mechanism in these two mutants. Such multi-funnel Go-models may therefore be suitable to study the evolutionary history of proteins, structural transitions in proteins, or to incorporate ensemble information into a simulation. For instance, in an upcoming study (Ping J. Hansmann, U.H.E., unpublished results) we study a protein taking into account structural information from a whole NMR ensemble instead of only a single structure.

Acknowledgments

This work was supported, in part, by research grant CHE-08090002 of the National Science Foundation and GM62838 of the National Institutes of Health (USA).

References
