Author manuscript; available in PMC: 2008 Dec 9.
Published in final edited form as: J Phys Chem B. 2007 Jan 11;111(1):293–309. doi: 10.1021/jp065810x

Molecular Dynamics with the United-Residue (UNRES) Force Field. Ab initio Folding Simulations of Multi-chain Proteins

Ana V Rojas 1,2,3, Adam Liwo 1, Harold A Scheraga 1,*
PMCID: PMC2597722  NIHMSID: NIHMS61767  PMID: 17201452

Abstract

The implementation of molecular dynamics with the United-Residue (UNRES) force field is extended to treat multi-chain proteins. Constant temperature was maintained in the simulations with Berendsen or Langevin thermostats. The method was tested on three α-helical proteins [1G6U and GCN4-p1, each with two chains; and 1C94, with four chains]. Simulations were carried out for both isolated single chains and the multi-chain complexes. The proteins were folded by starting from the extended conformation with random initial velocities, and with the chains parallel to each other. No symmetry constraints or structure information were included for the single chains or the multi-chain complexes. In the case of single-chain simulations, a high percentage of the trajectories (100% for 1G6U, 90% for GCN4-p1 and 80% for 1C94) converged to native-like structures (assumed as the experimental structure of a monomer in the multi-chain complex), showing that, for the proteins studied in this work with the UNRES force field, the interactions between chains are not critical for stabilization of the individual chains. In the case of multi-chain simulations, the native structures of the 1G6U and GCN4-p1 complexes, but not that of 1C94, are predicted successfully. The association of the subunits does not follow a unique mechanism; the monomers were observed to fold both before and simultaneously with their association.

1. Introduction

Predicting the native structure of a protein from knowledge of its amino acid sequence by an ab initio (physics-based) approach remains one of the most difficult problems in contemporary computational biology. An even more challenging problem is the prediction of the folding pathway of a protein. The ab initio approach has the advantage that it provides thermodynamic and kinetic information about the different stages of the folding process as well as the final structure. To accomplish such predictions, it is necessary to simulate the folding process in real time, starting from a statistical coil (unfolded) conformation, until the native structure is reached. For such a simulation to be realistic, it should ideally include atomic details of both the system and the solvent.1 However, with today's computational power, explicit-solvent all-atom molecular dynamics (MD) algorithms can simulate only events that range up to nanoseconds for typical proteins or microseconds for very small ones.13 These time scales are at least one order of magnitude smaller than the folding times of proteins. To overcome this problem, all-atom simulations either implement alternative sampling methods, such as umbrella sampling,4 or simulate the unfolding process and some aspects of refolding.1,2 Simulations primarily treat single-chain proteins but, in some cases,5–10 computations are carried out for oligomers. In general, simulations of oligomers study either the stability of a specific structure6,10 or the kinetics of folding and/or assembly5,7,8,9 of the subunits. Stability studies are usually carried out by all-atom MD,6,7,10 but this technique is computationally too expensive to study the kinetics of the folding process. To reduce the computational cost, the main approach has made use of minimal models; a minimal model is one for which each amino acid is represented by a few interaction sites, reducing the dimensionality of the problem.
Although minimal models provide less detailed information than all-atom models, they can reach longer simulation times. Some minimal models have further reduced the computational cost by using a Gō-type potential,7,8,9 which creates a funnel-like landscape biased towards the native structure, thereby speeding up the folding process. It has also been possible to study the kinetics of oligomeric proteins without including any structural knowledge of the particular protein of interest. For example, Vieth et al.5 used a lattice model with a statistical potential (i.e., biased towards structures in a library, but not towards the particular structure being studied) and Monte Carlo (MC) dynamics to study the folding pathway of the GCN4 leucine zipper from randomly generated initial structures.

With a minimal model, we have recently11–14 developed a molecular dynamics algorithm for the physics-based united-residue (UNRES) force field that was previously developed in our laboratory.15–23 We will refer to this implementation of UNRES as UNRES/MD. UNRES was originally designed and parameterized to locate native-like structures of proteins as the lowest in potential energy by unrestricted global optimization. The latest version of UNRES, referred to as the 4P force field,23 was optimized on four training proteins: 1GAB (all-α), 1E0L (all-β), 1E0G (α + β) and 1IGD (α + β). It performed well in the CASP6 exercise;24 the largest molecule that was folded with this force field contained 208 amino acid residues. The average length of correctly predicted segments of α-helical proteins with this force field is 67 residues (Table 8 in ref 23).

Since the degrees of freedom corresponding to the fastest motions are averaged out18 in UNRES, UNRES/MD was able to simulate events that fall into the microsecond time scale.13 After the success of UNRES/MD with single-chain proteins, it seemed natural to generalize the method to treat multiple-chain proteins. A multi-chain version of UNRES and a global optimization search based on Conformational Space Annealing (CSA) had previously been developed in our laboratory.25 However, that implementation required the use of symmetry to achieve proper folding. Here, we present an extension of UNRES/MD, which can simulate the folding pathway of oligomeric proteins from an extended conformation without imposing symmetry constraints of any kind. As with the single-chain UNRES/MD calculations, Berendsen dynamics (BD) and Langevin dynamics (LD) are used to mimic energy exchange with the solvent and, consequently, to maintain constant temperature. LD provides a more realistic picture through explicit inclusion of the non-conservative friction and random forces, which account for collisions of the protein with the solvent molecules. Because the purpose of this work was to extend the UNRES/MD approach to multi-chain systems and not to develop an improved force field, we used systems that the 4P force field could treat to test the method. We ran simulations on the following three α-helical proteins of known native structure: 1G6U (two chains, 48 residues each), GCN4-p1 (PDB ID 2ZTA; two chains, 33 residues each), and 1C94 (four chains, 38 residues each).
The complexity and size of these proteins is similar to that of the α-helical proteins tested in our previous work on single-chain UNRES/MD13 and the size of the smallest of them (2ZTA) is within the average size of structural segments of α-helical proteins that can be predicted successfully with the 4P force field.23 These systems are, therefore, appropriate to test the UNRES/MD approach for multichain proteins, given the limitations of the present force field. We did not use β or α+β proteins because we found in our earlier work13 that UNRES/MD generally produces non-native α-helical structures for such proteins, even though the native structures are global energy minima in the UNRES energy surface; this happens because the conformational entropy is neglected in force-field parameterization. We believe that this problem can be overcome by improving the force field and introducing entropic effects, an activity which is presently ongoing in our laboratory.

2. Methods

2.1 United Residues Force Field

UNRES is a coarse-grained model15–23 in which the backbone is represented as a sequence of α-carbon (Cα) atoms linked by virtual bonds designated as dC, with united peptide groups (p’s) in their centers. United side chains (SC) are connected by virtual bonds designated as dX to the backbone at the Cα positions, with the center of mass of the side chain at the end of dX (see Fig. 1). The geometry of the protein is then fully described by the virtual-bond vectors dC and dX. Since the forces in UNRES are exerted on the peptide groups and side chains, hereafter we will use the term “interacting sites” to refer to both united peptide groups and side chains. The complete UNRES potential energy function for a single chain is given by

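$$
U_{\mathrm{Single\,ch.}} = \sum_{j}\sum_{i<j} U_{SC_iSC_j} + \sum_{ss} U_{Cys_i^{ss}Cys_j^{ss}} + w_{SCp}\sum_{j}\sum_{i\neq j} U_{SC_ip_j} + w_{el}\sum_{j}\sum_{i<j-1} U_{p_ip_j} + w_{tor}\sum_{i} U_{tor}(\gamma_i) + w_{tord}\sum_{i} U_{tord}(\gamma_i,\gamma_{i+1}) + w_{b}\sum_{i} U_{b}(\theta_i) + w_{rot}\sum_{i} U_{rot}(\alpha_i,\beta_i) + \sum_{m=3}^{6} w_{corr}^{(m)} U_{corr}^{(m)} + w_{vib}\sum_{i} U_{vib}(d_i) \tag{1}
$$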

where the indices i and j run over the residues. The terms $U_{SC_iSC_j}$ (derived and parameterized in ref 16) correspond to the mean free energy of hydrophobic (hydrophilic) interactions between the side chains. These terms implicitly contain the contributions from the interactions of the side chain with the solvent. The terms $U_{Cys_i^{ss}Cys_j^{ss}}$ (derived and parameterized in ref 26) account for the energy of disulfide bonds, with ss running through all those pairs of half-cystines that are known a priori to form disulfide bonds.26 The terms $U_{SC_ip_j}$ correspond to the excluded-volume potential of the side chain-peptide group interactions. The terms $U_{p_ip_j}$ (derived in ref 15 and parameterized in ref 21) represent the energy of average electrostatic interactions between backbone peptide groups. The terms $U_{tor}$ and $U_{tord}$ (derived and parameterized in ref 20) are the torsional and the double-torsional potentials, respectively, for the rotation about a given virtual bond or two consecutive virtual bonds. The terms $U_{b}$ and $U_{rot}$ (derived and parameterized in ref 17) are the virtual-angle-bending and side-chain-rotamer potentials, respectively. The terms $U_{corr}^{(m)}$ (derived in ref 18 and parameterized in ref 21) correspond to the correlations (of order m) between peptide-group electrostatic and backbone-local interactions. The terms $U_{vib}(d_i)$ (derived and parameterized in ref 11), $d_i$ being the length of the ith virtual bond, are simple harmonic potentials defined by eq. (2).

$$
U_{vib}(d_i) = \frac{1}{2}\,k\,(d_i - d_i^{o})^2 \tag{2}
$$

where k is a force constant, currently set at 500 kcal/(mol × Å²), and $d_i^{o}$ is the average length (corresponding to that used in the fixed-bond UNRES) of the ith virtual bond. The w’s in eq. (1) are the weights of the respective terms.
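As a small numerical illustration of eq. (2), the sketch below evaluates the harmonic stretching energy and its restoring force for one virtual bond. The function names and the reference length of 3.8 Å (a typical Cα-Cα distance) are assumptions made for this example; the per-bond averages actually used come from fixed-bond UNRES and are not listed here.

```python
def u_vib(d, d0, k=500.0):
    """Harmonic virtual-bond stretching energy of eq. (2), in kcal/mol.

    d  : current virtual-bond length in Å
    d0 : reference (average) length in Å
    k  : force constant in kcal/(mol Å^2); 500 is the value quoted in the text
    """
    return 0.5 * k * (d - d0) ** 2


def f_vib(d, d0, k=500.0):
    """Restoring force along the bond, -dU/dd, in kcal/(mol Å)."""
    return -k * (d - d0)


# Illustrative only: d0 = 3.8 Å is an assumed reference length.
energy = u_vib(3.9, 3.8)   # bond stretched by 0.1 Å
force = f_vib(3.9, 3.8)    # negative: pulls the bond back toward d0
```

With this force constant, a 0.1 Å stretch already costs about 2.5 kcal/mol, so the virtual bonds stay close to their reference lengths.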

Fig. 1.

UNRES representation of a polypeptide chain. Filled circles represent the united peptide groups (p), and open circles represent the Cα atoms, which serve as geometric points. Ellipsoids with their centers of mass at SC positions represent UNRES side chains. The p's are located halfway between two consecutive Cα atoms, at positions (1/2)dC. The conformation of the polypeptide chain can be described fully by either the coordinates of all the dC and dX vectors or by the virtual-bond angles θ, the virtual-bond dihedral angles γ, and the angles α and β defining the orientation of the side chain with respect to the backbone.

The UNRES force field has also been extended to multiple-chain proteins.25 In the present work, the inter-chain interaction energies (and their form, parameters, and weights) were taken to be the same as those of the intra-chain terms in the treatment of single chains. However, since the interacting sites between chains are not backbone-connected, not all the terms present in equation (1) contribute to the inter-chain energy. The interaction energy between two different chains (identified by superscripts k and l, respectively) can be expressed by:

$$
U_{\mathrm{Interch.}}^{k,l} = \sum_{i}\sum_{j} U_{SC_i^kSC_j^l} + \sum_{ss} U_{Cys_i^{ss,k}Cys_j^{ss,l}} + w_{SCp}\sum_{i}\sum_{j} U_{SC_i^kp_j^l} + w_{SCp}\sum_{i}\sum_{j} U_{p_i^kSC_j^l} + w_{el}\sum_{i}\sum_{j} U_{p_i^kp_j^l} + \sum_{m=3}^{6} w_{corr,nonadj}^{(m)} U_{corr,nonadj}^{(m)} \tag{3}
$$

where $U_{corr,nonadj}^{(m)}$ represents the correlation terms corresponding to interactions between non-adjacent residues. The different terms in eq. (3) have the same form, and the weights have the same values, as those in eq. (1). Detailed descriptions of each of the terms in equation (1) and equation (3) can be found in references 15 through 18, 20, 21 and 26. It should be noted that eq. (3) is different from eq. (1) of reference 25 because the latter represents the complete multiple-chain UNRES potential energy, whereas eq. (3) accounts only for the interaction between two chains in the system. Hence, eq. (3) is only part of the contribution to the complete multiple-chain potential energy. It should also be mentioned here that, for the force field used in this work (4P force field23), the weights of the fifth- and sixth-order correlation terms, $w_{corr}^{(5)}$ and $w_{corr}^{(6)}$ in eq. (1) and $w_{corr,nonadj}^{(5)}$ and $w_{corr,nonadj}^{(6)}$ in eq. (3), are zero,23 but these terms have been included in the equations for completeness.

In order to mimic peptide concentrations, the system was confined within a soft sphere. This was done by adding another term, Uconf, to the potential energy, causing each interacting site (either a peptide group or a side chain) to feel an attractive force toward the center of the sphere whenever it is outside the boundary of the sphere. This potential, which is added to eq. (1) and eq. (3), is defined by eq. (4).

$$
U_{conf} = \sum_{k}\sum_{i} u_{conf}^{ik} \tag{4}
$$

where $u_{conf}^{ik}$, the confining potential acting on interacting site i in chain k, is given by

$$
u_{conf}^{ik} =
\begin{cases}
0 & \text{if } r_{ik} \le R_0 \\
k_c\,(r_{ik} - R_0)^4 & \text{if } r_{ik} > R_0
\end{cases} \tag{5}
$$

where $k_c$ is a force constant with unit value ($k_c$ = 1 kcal/(mol × Å⁴)), $r_{ik}$ is the distance from interacting site i to the center of the sphere (placed at the center of mass of the initial conformation) and $R_0$ is the radius of the sphere. The radius of the sphere determines the volume of the system [volume = $4\pi R_0^3/3$]. Therefore, the value of $R_0$ and the number of peptide chains in the solution determine the peptide concentration of the simulated solution (see section 3 for details of the concentrations used in the simulations); in all simulations, the number of chains was taken as the number of chains in the multi-chain complex.
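The link between sphere radius, chain count, and concentration can be made concrete with a short calculation. The sketch below is illustrative only: the function names are invented for this example, and the radii it produces are derived from the volume formula above rather than quoted from the simulations.

```python
import math

AVOGADRO = 6.02214076e23      # 1/mol
A3_PER_LITER = 1.0e27         # 1 L = 1 dm^3 = 1e27 Å^3


def sphere_radius(n_chains, conc_molar):
    """Radius R0 (Å) of a sphere holding n_chains at conc_molar (mol/L)."""
    volume_l = n_chains / (conc_molar * AVOGADRO)
    volume_a3 = volume_l * A3_PER_LITER
    return (3.0 * volume_a3 / (4.0 * math.pi)) ** (1.0 / 3.0)


def u_conf_site(r, r0, kc=1.0):
    """Eq. (5): quartic penalty for a site at distance r (Å) from the center."""
    return 0.0 if r <= r0 else kc * (r - r0) ** 4


def f_conf_radial(r, r0, kc=1.0):
    """Radial force -dU/dr; negative values pull the site back inward."""
    return 0.0 if r <= r0 else -4.0 * kc * (r - r0) ** 3


r_dimer = sphere_radius(2, 1.0e-3)      # two chains at 1 mM
r_tetramer = sphere_radius(4, 1.0e-2)   # four chains at 10 mM
```

Under these assumptions, two chains at 1 mM correspond to a radius of roughly 93 Å, and four chains at 10 mM to roughly 54 Å. The quartic form leaves sites inside the sphere untouched and grows steeply outside it, which is what makes the wall "soft" rather than a hard boundary.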

Combining equation (1), equation (3) and equation (4), we obtain the multiple-chain UNRES potential energy [eq. (6)].

$$
U = \sum_{k} U_{\mathrm{Single\,ch.}}^{k} + \sum_{k}\sum_{l>k} U_{\mathrm{Interch.}}^{k,l} + U_{conf} \tag{6}
$$

where the indices k and l run through the different chains.
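The bookkeeping of eq. (6) can be sketched in a few lines: one single-chain term per chain, one inter-chain term per unordered pair of chains, and one confinement term. The three callables below are hypothetical placeholders standing in for eq. (1), eq. (3) and eq. (4); they are not UNRES routines.

```python
from itertools import combinations


def total_energy(chains, u_single, u_inter, u_conf):
    """Assemble eq. (6) from user-supplied energy terms.

    chains   : list of per-chain coordinate objects (any type)
    u_single : callable(chain) -> single-chain energy, eq. (1)
    u_inter  : callable(chain_k, chain_l) -> pair energy, eq. (3)
    u_conf   : callable(chains) -> confinement energy, eq. (4)
    All three callables are hypothetical placeholders.
    """
    intra = sum(u_single(c) for c in chains)
    # combinations() visits each unordered pair once, i.e. l > k.
    inter = sum(u_inter(a, b) for a, b in combinations(chains, 2))
    return intra + inter + u_conf(chains)
```

For a tetramer, the pair sum runs over six chain pairs, so the number of inter-chain evaluations grows quadratically with the number of chains.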

2.2 Equations of Motion

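In order to find the time evolution of a system, it is necessary to solve the equations of motion of the system. In general, for a system with generalized coordinates $q_1, q_2, \dots, q_n$ and generalized velocities $\dot{q}_1, \dot{q}_2, \dots, \dot{q}_n$, this is equivalent to solving the set of Lagrange’s equations:

$$
\frac{d}{dt}\left[\frac{\partial}{\partial \dot{q}_i} L(q_1, q_2, \dots, \dot{q}_1, \dot{q}_2, \dots)\right] - \frac{\partial}{\partial q_i} L(q_1, q_2, \dots, \dot{q}_1, \dot{q}_2, \dots) = Q_i \tag{7}
$$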

where i = 1, …,n, L is the Lagrangian of the system and the Qi's are the generalized dissipative (Rayleigh) forces acting on the system.

The Qi's are non-conservative forces and, therefore, cannot be derived from the potential energy of the system. For our system, these non-conservative forces are the friction and stochastic forces; they represent collisions with the solvent molecules due to a net motion of the system and random impact of the fluctuating solvent molecules on the solute molecules, respectively, as well as the net effect of averaging out the internal secondary degrees of freedom of the protein molecule. Each Cartesian component of each generalized force will have the form

$$
Q_i = -\gamma_i v_i(t) + f_i^{rand} \tag{8}
$$

with $\gamma_i$ and $v_i(t)$ being the friction coefficient and velocity related to the ith coordinate, and $f_i^{rand}$ being a stochastic force with zero mean and intensity given by eq. (9).27

$$
\langle f_i^{rand}(t)\, f_j^{rand}(t+\tau) \rangle = 2\gamma_i R T_0\, \delta(\tau)\, \delta_{ij} \tag{9}
$$

where R is the universal gas constant, $T_0$ is the temperature of the bath, $\delta(\tau)$ is the Dirac delta function (evaluated at an arbitrary time interval τ), and $\delta_{ij}$ is the Kronecker delta function. When the $Q_i$’s are identified with the sum of the stochastic and friction forces, they account for the coupling of the protein chain(s) under study to the solvent, which in turn, acts as a thermostat, thereby maintaining an average constant temperature of the system.

Following previous work,11 we chose to describe each chain by a set of virtual-bond vectors $\mathbf{dC}_i^k$ and $\mathbf{dX}_i^k$, with $\mathbf{dC}_i^k$ being the vector pointing from $C_i^k$ to $C_{i+1}^k$, except for $\mathbf{dC}_0^k$, which points from the origin to the first Cα in the chain, and $\mathbf{dX}_i^k$ the vector pointing from $C_i^k$ to $SC_i^k$ (see Fig. 1). The superscript k indicates the chain to which reference is being made. The entries corresponding to glycine residues are omitted from the list of dX's since they have zero length. A “dummy" Cα atom is introduced at the beginning (end) of the chain if the first (last) residue is not glycine and if the chain is unblocked.15

To simplify the notation, the $\mathbf{dC}_i^k$ and $\mathbf{dX}_i^k$ vectors will be grouped in a single vector $\mathbf{q}^k = (\mathbf{dC}_0^k, \mathbf{dC}_s^k, \dots, \mathbf{dC}_e^k, \mathbf{dX}_1^k, \mathbf{dX}_2^k, \dots, \mathbf{dX}_m^k)^T$. The indices s and e correspond to the first and last real residue, i.e., s = 1 if the first residue is Gly and s = 2 otherwise. Likewise, if the last residue is a dummy one, the index e = n − 1, with n being the number of residues in the chain, and e = n otherwise. The index m is the number of non-glycine residues in the chain. It should be noted that, although we have omitted the superscripts, the values of s, e, n and m might in principle be different for different chains within the complex.

The coordinates $\mathbf{x}_{p_i}^k$ and $\mathbf{x}_{SC_i}^k$ of the united peptide groups and side chains can be reconstructed from the $\mathbf{dC}_i^k$ and $\mathbf{dX}_i^k$ vectors through equation (10) and equation (11).

$$
\mathbf{x}_{p_i}^k = \mathbf{dC}_0^k + \sum_{j=s}^{i-1} \mathbf{dC}_j^k + \frac{1}{2}\,\mathbf{dC}_i^k \tag{10}
$$

$$
\mathbf{x}_{SC_i}^k = \mathbf{dC}_0^k + \sum_{j=s}^{i-1} \mathbf{dC}_j^k + \mathbf{dX}_i^k \tag{11}
$$

Defining vectors $\mathbf{x}^k = (\mathbf{x}_{p_s}^k, \dots, \mathbf{x}_{p_e}^k, \mathbf{x}_{SC_1}^k, \dots, \mathbf{x}_{SC_m}^k)$, equation (10) and equation (11) can be expressed in matrix form, obtaining a single equation for each chain.

$$
\mathbf{x}^k = \mathbf{A}^k \mathbf{q}^k \tag{12}
$$

where $\mathbf{A}^k$ is the matrix that transforms from the generalized coordinates $\mathbf{q}^k$ of the kth chain to the Cartesian coordinates of the interacting sites $\mathbf{x}^k$ of the same chain. The same relation holds for the velocities $\mathbf{v}^k = (\mathbf{v}_{p_s}^k, \dots, \mathbf{v}_{p_e}^k, \mathbf{v}_{SC_1}^k, \dots, \mathbf{v}_{SC_m}^k)$:

$$
\mathbf{v}^k = \mathbf{A}^k \dot{\mathbf{q}}^k \tag{13}
$$
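The mapping from virtual-bond vectors to site coordinates in equations (10) through (13) is a running sum along the backbone. The sketch below shows one way to carry it out for a single chain without building the matrix explicitly. It is a simplified illustration: the function and argument names are invented here, indices start at zero, and the dummy-residue bookkeeping (the s and e indices) is left out.

```python
def add(u, v):
    """Component-wise sum of two 3-vectors."""
    return tuple(a + b for a, b in zip(u, v))


def scale(c, v):
    """Multiply a 3-vector by a scalar."""
    return tuple(c * a for a in v)


def reconstruct_sites(dc0, dc, dx):
    """Rebuild peptide-group and side-chain positions for one chain.

    dc0 : vector from the origin to the first Cα
    dc  : list of backbone vectors; dc[i] points from Cα i to Cα i+1
    dx  : list with one entry per Cα; dx[i] points from Cα i to its
          side-chain center, or is None for glycine (no side-chain site)
    Returns (peptide_positions, side_chain_positions).
    """
    ca = tuple(dc0)                      # running Cα position
    peptides, side_chains = [], []
    for i in range(len(dc) + 1):
        if dx[i] is not None:
            side_chains.append(add(ca, dx[i]))            # eq. (11)
        if i < len(dc):
            peptides.append(add(ca, scale(0.5, dc[i])))   # eq. (10)
            ca = add(ca, dc[i])          # advance to the next Cα
    return peptides, side_chains


peptides, side_chains = reconstruct_sites(
    (1.0, 0.0, 0.0),
    [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
    [(0.0, 0.0, 1.0), None, (0.0, 0.0, 2.0)],
)
```

Because the map is linear, passing the time derivatives of the virtual-bond vectors in place of the vectors themselves returns the site velocities, which is the content of eq. (13).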

Then, when solving Lagrange’s equations, we obtain a relation, for each chain, of the form

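$$
\frac{d}{dt}\left[\nabla_{\dot{\mathbf{q}}^k} K^k(\mathbf{q}^k, \dot{\mathbf{q}}^k)\right] + \nabla_{\mathbf{q}^k} U(\mathbf{q}^1, \mathbf{q}^2, \dots, \mathbf{q}^N) = \mathbf{f}_{fric}^k + \mathbf{f}_{rand}^k \tag{14}
$$

where k indicates the chain in question, $K^k$ is its kinetic energy, $\mathbf{f}_{fric}^k$ and $\mathbf{f}_{rand}^k$ are the friction and random forces acting on that chain, and N is the total number of chains in the protein.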

The different chains are coupled only through the UNRES potential energy U, which also includes the free energy of the solvent implicitly in the $U_{SC_iSC_j}$ terms. The kinetic energy of a specific chain does not contain any dependence on the coordinates from a different chain. This enabled us to easily generalize the single-chain equations derived in references 11 and 12 to the multi-chain problem. We obtained the set of equations

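$$
\ddot{\mathbf{q}}^k = -[\mathbf{G}^k]^{-1}\, \nabla_{\mathbf{q}^k} U(\mathbf{q}^1, \mathbf{q}^2, \dots, \mathbf{q}^N) - [\mathbf{G}^k]^{-1} \left[(\mathbf{A}^k)^T \mathbf{\Gamma}^k \mathbf{A}^k\right] \dot{\mathbf{q}}^k + [\mathbf{G}^k]^{-1} (\mathbf{A}^k)^T \mathbf{f}_{rand}^k \tag{15}
$$

where $\mathbf{\Gamma}^k$ is a diagonal matrix containing the friction coefficients of the interacting sites (peptide groups and side chains), and $\mathbf{G}^k$ is the inertia matrix, defined by eq. (16)

$$
\mathbf{G}^k = (\mathbf{A}^k)^T \mathbf{M}^k \mathbf{A}^k + \mathbf{H}^k \tag{16}
$$

where $\mathbf{M}^k$ is a diagonal matrix containing the masses of the interacting sites, and $\mathbf{H}^k$, also a diagonal matrix, is the part of the inertia matrix corresponding to the internal stretching of the virtual bonds. $\mathbf{M}^k$ and $\mathbf{H}^k$ are defined by eq. (28) and (29), respectively, of ref. 11. Details of the derivation of equation (15) and equation (16) can be found in references 11 and 12.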

The components of the vector of random forces are calculated from a normal distribution according to28–30

$$
(\mathbf{f}_{rand}^k)_i = \sqrt{\frac{2\gamma_i R T}{\delta t}}\; N(0,1) \tag{17}
$$

where $(\mathbf{f}_{rand}^k)_i$ is the random force acting on the ith site from chain k, $\gamma_i$ is the friction coefficient associated with that site, R is the universal gas constant, T is the temperature of the bath, δt is the integration time step, and N(0,1) is a three-dimensional normal distribution with zero mean and unit variance.
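Sampling the random force of eq. (17) amounts to drawing three independent standard normal numbers per site and scaling them by a common amplitude. The sketch below uses only the Python standard library; the function name is invented for this example, and the units of the friction coefficient and time step are assumed to be mutually consistent (unit conversion is not handled).

```python
import math
import random

R_GAS = 1.98720425864e-3   # universal gas constant in kcal/(mol K)


def random_force(gamma, temperature, dt, rng=random):
    """Draw one Cartesian random-force vector for a site, as in eq. (17).

    gamma       : friction coefficient of the site
    temperature : bath temperature in K
    dt          : integration time step
    rng         : object with a gauss(mu, sigma) method (the random module
                  or a random.Random instance for reproducible streams)
    """
    sigma = math.sqrt(2.0 * gamma * R_GAS * temperature / dt)
    return tuple(sigma * rng.gauss(0.0, 1.0) for _ in range(3))


# Reproducible example with a seeded generator; the values of gamma and dt
# are arbitrary illustrations, not parameters from the simulations.
force = random_force(1.5, 300.0, 0.05, random.Random(2007))
```

The 1/δt under the square root is the discrete counterpart of the Dirac delta in eq. (9): a shorter time step needs a larger random kick per step to deliver the same fluctuation intensity.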

2.3 Simulations in the Microcanonical Ensemble

As in our previous work11, we first carried out MD calculations in the microcanonical ensemble. In this case, the stochastic and the friction forces are set to zero; therefore, the total energy of the system should be conserved. In order to check that the total energy condition was satisfied, we carried out simulations on two chains of an unblocked ALA10 polypeptide with the variable time step as described in section 3 of reference 11. The simulations showed that the fluctuations in the total energy are negligible when compared with those in the kinetic and potential energies. The total energy is conserved, although only to the extent that it is conserved in reference 11. The results of the microcanonical simulations are not shown here since that is an issue that has already been addressed in reference 11.

2.4 Simulations in the Canonical Ensemble

The microcanonical picture of a completely isolated system in which all the forces are known does not always correspond to typical experimental conditions. For this reason, the canonical ensemble, NVT, for which the temperature (i.e., average kinetic energy) of the system remains constant, is a more desirable choice for carrying out MD simulations.

Langevin Dynamics (LD)

As pointed out in section 2.2, the system can be kept at a constant temperature by inserting stochastic and friction terms in the equations of motion, yielding a Langevin equation, namely eq. (15). The trajectory of the system is obtained by numerical integration of eq. (15), as described in equation (12) through equation (17) of reference 12.

Berendsen Dynamics (BD)

There are other methods to maintain a constant temperature heat bath with less computational effort. These methods can be classified in two large groups: extended Lagrangian methods31,32 and rescaling of velocities.27,33 The method that we chose for our MD simulations belongs to the second category and is known as the Berendsen thermostat.27 The idea behind this method is that the system is forced to have the same kinetic energy as if it were subject to the forces in equation (8). To accomplish this, the velocities are rescaled by a factor

$$
\lambda = \left[1 + \frac{\delta t}{\tau_T}\left(\frac{T_0}{T(t)} - 1\right)\right]^{1/2} \tag{18}
$$

at every simulation step, where δt is the time step, $T_0$ is the reference temperature, $\tau_T$ is an adjustable parameter (known as the time constant of the thermostat), and T(t), the instantaneous temperature of the system at time t, is given by eq. (19).

$$
T(t) = \frac{2K(t)}{R\,D} \tag{19}
$$

where K(t) is the kinetic energy of the system, R is the universal gas constant, and D is the number of degrees of freedom of the system. As a result, the system is globally coupled to a heat bath at temperature $T_0$. Although this method has not been proven to generate a true canonical ensemble, it has the advantage that the coupling can be made as weak as desired by manipulating the constant $\tau_T$. It has been shown27 that small values of $\tau_T$ (strong coupling) reduce the fluctuations in the kinetic energy K at the expense of increasing fluctuations in the total energy E. Based on our earlier work,11 we set $\tau_T$ = 48.9 fs = 1 m.t.u. (molecular time unit) and δt = 0.05 m.t.u. These values were tested by carrying out MD simulations with Berendsen dynamics on a system composed of two chains of an unblocked ALA10 polypeptide at a concentration of 1 mM. During the simulations, the fluctuations in the total (E), kinetic (K) and potential (U) energies were monitored. The simulations showed that the parameters used for the single chain were appropriate for the multi-chain complex as well.
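The velocity rescaling of eq. (18) and eq. (19) is compact enough to write out directly. The sketch below is an illustration with invented function names; the defaults for the time step and the thermostat time constant are the values quoted above, expressed in m.t.u.

```python
import math

R_GAS = 1.98720425864e-3   # universal gas constant in kcal/(mol K)


def instantaneous_temperature(kinetic_energy, n_dof):
    """Eq. (19): temperature (K) from kinetic energy (kcal/mol) and D."""
    return 2.0 * kinetic_energy / (R_GAS * n_dof)


def berendsen_lambda(t_inst, t_ref, dt=0.05, tau=1.0):
    """Eq. (18): velocity scaling factor; dt and tau in the same time unit."""
    return math.sqrt(1.0 + (dt / tau) * (t_ref / t_inst - 1.0))


def rescale(velocities, lam):
    """Multiply every velocity vector by the scaling factor."""
    return [tuple(lam * c for c in v) for v in velocities]


lam_hot = berendsen_lambda(600.0, 300.0)    # system too hot: factor below 1
lam_cold = berendsen_lambda(150.0, 300.0)   # system too cold: factor above 1
```

With δt/τT = 0.05, each step removes only a small part of the temperature mismatch, which is why the coupling is described as weak: the kinetic energy relaxes toward the target over many steps rather than being reset at once.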

The parameters and weights in UNRES have been determined by a hierarchical optimization method.19,21,22,34 The idea behind this method is to reproduce a funnel-like energy landscape with energy decreasing as the number of native-like elements in a structure increases.19,34 Because the 4P force field was designed to find native-like structures as global minima in the potential energy surface, the free-energy gaps between the native-like structures and the lowest-energy non-native structure of the training protein were overemphasized in the optimization process.23 Consequently, the optimal folding temperature for the MD simulations with the UNRES 4P force field turned out to be 800 K.13 This value gave the best compromise between folding time and stability of the native-like structures for several benchmark proteins.13 This high temperature was not a problem while carrying out single-chain simulations because the internal forces acting on a polypeptide chain were tuned to this high temperature. However, in multi-chain simulations, the chains move with respect to each other, and the external motions are too strong to allow association. Therefore, we rescaled all energy-term weights by a factor of 3/8 to reduce the folding temperature to 300 K. This operation changes only the energy scale but not the structure of the energy landscape.
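The reason a uniform rescaling leaves the landscape intact can be seen from the Boltzmann factor. As a short worked check, write s for the common scale factor (a symbol introduced here for illustration) and scale both the energy and the temperature by it:

$$
U' = sU,\quad T' = sT \;\Longrightarrow\; \exp\!\left(-\frac{U'}{RT'}\right) = \exp\!\left(-\frac{sU}{R\,sT}\right) = \exp\!\left(-\frac{U}{RT}\right), \qquad s = \frac{300\ \mathrm{K}}{800\ \mathrm{K}} = \frac{3}{8}
$$

The relative populations of conformations at 300 K with the rescaled weights are therefore the same as those at 800 K with the original weights. The positions of the minima and saddle points are also unchanged, because $\nabla(sU) = s\nabla U$ vanishes at the same conformations as $\nabla U$.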

3. Results

To study different aspects of the UNRES/MD multiple-chain implementation, we carried out a number of tests. We compared Langevin (LD) and Berendsen dynamics (BD) by carrying out multiple-chain simulations with the same initial conditions with each method. To test whether the presence of other chains was a necessary condition to fold the monomers, we also carried out single-chain simulations with BD and LD, and compared the structures obtained with those of the monomers in the crystal structure of the oligomers. Finally, since the method failed to predict the native structure of 1C94, additional simulations starting from the pdb structure were carried out for this protein. This was done to check whether the native structure was not found because of insufficient simulation time or because the force field was not good enough to properly represent the energy landscape of this protein.

All the runs (both single-chain and multi-chain), except those starting from the PDB structure, were started with the chains in an extended conformation. In all cases, the initial velocities of the peptide groups and side chains were randomly generated. In the multi-chain runs, the chains were placed parallel to each other, separated by a distance large enough (20 Å for GCN4-p1 and 1C94, and 40 Å for 1G6U) to allow them to rearrange independently. Since the chains rapidly adjust to an equilibrium ensemble, after starting from extended conformations, the simulations are practically independent of the starting condition. The initial velocities were selected from a Gaussian distribution corresponding to the average kinetic energy at the simulation temperature, as in our earlier work,11 and the temperature was held constant at 300 K during all the simulations.

In the multi-chain runs, for those starting from the extended conformation, the radius of the confining sphere was initially set large enough to fit the extended chains. After the first 24 ns of simulation, the radius of the sphere was decreased slowly until the desired concentration (1 mM for the dimers and 10 mM for the tetramer) was reached. This concentration, although higher than those concentrations used in the experiments,35–37 was chosen because it resulted in a volume large enough to fit the chains without altering their structures, and small enough for the monomers to find each other and interact in a short period of time.

To classify the runs into success and failure, we monitored the Cα rmsd between the computed structures and the crystal structure. If this value, hereafter referred to as ρ, fell below a cut-off value, ρcut, the protein was considered to have folded. The folding time τf, defined as the time at which ρ fell below the cut-off ρcut for the first time, and the residence time τres, defined as the fraction of the total time that ρ was below ρcut, were also computed. For 1G6U, ρcut was 5 Å for the monomers and 7 Å for the dimers; for GCN4-p1, ρcut was 3.4 Å for the monomers and 4.8 Å for the dimers; and, for 1C94, ρcut was 4 Å for the monomers, 5.6 Å for the dimers and 8 Å for the tetramers. If the monomers were folded by this criterion and were stable, and the arrangement of the chains was stable but not native, the overall structure was classified as misfolded. If this criterion was not met, the structure was classified as non-folding.
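The folding criterion above translates directly into a small analysis routine over an rmsd time series. The sketch below is an illustration with invented names; it assumes evenly spaced samples, so that the fraction of frames below the cut-off equals the fraction of time, and it assigns the total simulation time as the folding time of a run that never folds, following the convention stated in the footnotes of Table 1.

```python
def folding_stats(times, rmsd, cutoff):
    """Return (folded, tau_f, tau_res) for one trajectory.

    times  : sample times in ns, ascending and evenly spaced (assumed)
    rmsd   : Cα rmsd from the reference structure at each time, in Å
    cutoff : rmsd threshold ρcut in Å
    """
    below = [r < cutoff for r in rmsd]
    folded = any(below)
    # First time the rmsd drops below the cut-off; for a run that never
    # folds, the simulation length is used instead.
    tau_f = times[below.index(True)] if folded else times[-1]
    # Fraction of sampled frames spent below the cut-off.
    tau_res = sum(below) / len(below)
    return folded, tau_f, tau_res


# Toy series for illustration; not simulation data.
result = folding_stats([0, 1, 2, 3, 4], [12.0, 8.0, 4.5, 6.0, 3.0], 5.0)
```

In the toy series the rmsd first drops below 5 Å at t = 2, leaves the basin once, and returns, so the run counts as folded with a residence fraction of 2/5.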

Domain swapped dimer (PDB ID 1G6U)

1G6U is a synthetic α-helical homodimer with 48 residues per chain37. Each monomer consists of two α-helix segments, with the shortest (14 residues) helix packed against the longest (28 residues) helix. The monomers assemble forming a three-α-helix bundle with the long helices in antiparallel position (Fig. 2A). We will refer to the shortest helix as H1 and the longest helix as H2 (see Fig. 3). In order to provide a better description of the folding trajectories, we monitored the rmsd (Table 1) with respect to the native structure for the entire protein, for each of the monomers and for each of the helices (H1 and H2). To determine the folding times of H1 and H2, we set their cut-off rmsd at 1.5 Å and 4 Å, respectively.

Fig. 2.

(A) Experimental structure of 1G6U. (B) The most native-like structure (Cα rmsd = 1.79 Å) obtained with BD UNRES MD. (C) An example of a misfolded structure. The C terminus of each chain is marked.

Fig. 3.

Superposition of one of the monomers in the 1G6U experimental dimer structure (black) on the most native-like structure (grey) (Cα rmsd = 1.22 Å) obtained with the UNRES MD simulations of the monomer using BD. The N-terminal helix H1 and the C-terminal helix H2 are indicated as well as the C terminus.

Table 1.

Summary of trajectories for 1G6U

Dimer

| Algorithm | Nf | 〈τf〉 [ns] | 〈τf(H1)〉 [ns] | 〈τf(H2)〉 [ns] | ρmin [Å] | 〈τres〉 | <E>f [kcal/mol] | Nmf | <E>mf [kcal/mol] | CPU time [h] |
|---|---|---|---|---|---|---|---|---|---|---|
| Berendsen | 9 (20) | 4.8 (0.30) | 0.14 | 0.16 | 1.79 | 49% | −402 | 1 | −401 | 2.9 |
| Langevin | 2 (19) | 14.9 (4.0) | 0.35 | 0.20 | 2.38 | 36% | −403 | 2 | −398 | 3.9 |

Monomer

| Algorithm | Nf | 〈τf〉 [ns] | 〈τf(H1)〉 [ns] | 〈τf(H2)〉 [ns] | ρmin [Å] | 〈τres〉 | <E>f [kcal/mol] |
|---|---|---|---|---|---|---|---|
| Berendsen | 10 | 0.92 | 0.18 | 0.21 | 1.22 | 86% | −186 |
| Langevin | 10 | 2.6 | 0.25 | 0.24 | 1.28 | 69% | −188 |

Nf: Number of trajectories (out of 10) that folded to native-like structures. In the dimer simulations, the number of monomers (out of 20, since there were 2 monomers on each of the 10 dimer simulations) that folded to native-like structure is indicated between parentheses.

〈τf〉: Average folding time. The folding time was defined as the time at which the rmsd with respect to the crystal structure fell below the cut-off value (7 Å for the dimers and 5 Å for the monomers). In those runs for which the rmsd never went below the cut-off, the folding time was considered to be the simulation time (12 ns for Berendsen and 16 ns for Langevin). In the dimer simulations, the average folding time of the monomers is indicated between parentheses.

〈τf(H1)〉: Average folding time for the N-terminal helix, H1. The folding time was defined as the time at which the rmsd with respect to the crystal structure fell below 1.5 Å.

〈τf(H2)〉: Average folding time for the C-terminal helix, H2. The folding time was defined as the time at which the rmsd with respect to the crystal structure fell below 4 Å.

ρmin: The lowest rmsd in all of the fluctuating trajectories.

〈τres〉: Fraction of the time that the peptide spent in the native basin averaged over all the folding trajectories.

<E>f: Average potential energy over all structures in the native (f) basin.

Nmf: Number of trajectories (out of 10) that yielded misfolded structures.

<E>mf: Average potential energy over all structures in the misfolded (mf) basin.

CPU time: Average CPU time (in hours) per 1 ns of simulation on a single 3.06 GHz Intel Pentium IV Xeon processor.

Monomers

As can be seen in Table 1, all the simulations of the monomers converged to native-like structures, showing that dimerization is not necessary for the folding and stabilization of the individual chains. The most native-like structure, 1.22 Å from native, was produced by BD. A superposition of this structure and the native structure is shown in Fig. 3.

Figure 4 shows potential energy and ρ values for an LD trajectory (panels A and B, respectively) and a BD trajectory (panels C and D, respectively) for an isolated monomer of 1G6U. As can be seen from Fig. 4, the native basin was very stable and, with both methods, once the peptide adopted native-like structures, the fluctuations in the potential energy and ρ became smaller, and the peptide remained in the native basin.

Fig. 4.


Variation of the potential energy (A) and the Cα rmsd from the native structure of the monomer in the dimer (B), during the folding of an isolated monomer of 1G6U obtained with Langevin dynamics. The solid horizontal line at −187.1 kcal/mol in panel A is the mean value of the energy after the monomer has reached the native basin. In panel B, the dashed horizontal line at 5 Å corresponds to the cut-off rmsd above which the monomer structure is considered to have left the native basin, and the solid horizontal line at 2.7 Å indicates the mean Cα rmsd of the monomer inside the native basin. Panels C and D contain the same information as panels A and B, respectively, for a trajectory obtained with Berendsen dynamics. The solid horizontal line at −187.1 kcal/mol in panel C is the mean value of the energy after the monomer has reached the native basin, and the solid horizontal line at 2.7 Å in panel D is the mean Cα rmsd inside the native basin of the monomer from the monomer in the native structure of the dimer.

Dimers

In the simulation of dimers, the initial separation distance between chains was 40 Å, the initial arrangement was parallel, and the simulation time was about 12 ns for BD and 16 ns for LD. The final concentration of 1 mM was achieved within the first ns. The results are summarized in Table 1. Both algorithms, BD and LD, folded the protein. In general the folding times with BD were shorter than with LD, as observed in our earlier work on single-chain proteins.12 BD also produced the most native-like structure, which is shown in Fig. 2B.

From the simulations, it became evident that the energy landscape generated by the 4P UNRES force field has two basins with low free energy. One of these basins corresponds to the native structure and the other one to a structure that differs from the native in that the long helices are parallel to each other instead of antiparallel (see Fig. 2C). We will refer to the latter structure as a misfolded one. Both structures were very stable and, once the protein fell into one of these basins, it would not escape within the simulation time (12 ns for BD and 16 ns for LD). The difference in average potential energy between the native and the misfolded basin is very small (see Table 1). Thus, it is natural to expect that, for some trajectories, the forces will drive the system to the native basin and, for some others, to the misfolded basin. Indeed, this is what was observed in these simulations. Presumably, improvement of the 4P UNRES force field will stabilize the native basin to a greater extent compared to the non-native basin.
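The native and misfolded basins described above differ in the relative orientation of the long helices (antiparallel versus parallel). One simple way to tell the two apart from Cα coordinates is to compare approximate helix axes, as in the sketch below. This is an assumption-laden illustration rather than the procedure used in this work: the axis estimate (centroid of the last few Cα atoms minus centroid of the first few), the `window` parameter, and the toy coordinates are all hypothetical.

```python
def centroid(points):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def helix_axis(ca_coords, window=4):
    """Approximate helix axis, pointing from the N-terminal to the C-terminal end.

    Uses the centroids of the first and last `window` C-alpha atoms, which
    averages out most of the helical wobble for a reasonably straight helix.
    """
    start = centroid(ca_coords[:window])
    end = centroid(ca_coords[-window:])
    return tuple(e - s for s, e in zip(start, end))

def relative_orientation(helix_1, helix_2, window=4):
    """Return 'parallel' or 'antiparallel' from the sign of the axis dot product.

    A dot product near zero (roughly perpendicular helices) is ambiguous and
    is reported here as 'antiparallel' only by convention.
    """
    a = helix_axis(helix_1, window)
    b = helix_axis(helix_2, window)
    dot = sum(x * y for x, y in zip(a, b))
    return "parallel" if dot > 0 else "antiparallel"

# Toy straight "helices" along z, 1.5 A rise per residue (illustrative only).
up = [(0.0, 0.0, 1.5 * i) for i in range(12)]
up_shifted = [(10.0, 0.0, 1.5 * i) for i in range(12)]
down = [(10.0, 0.0, 1.5 * (11 - i)) for i in range(12)]
```

For the toy coordinates, `relative_orientation(up, up_shifted)` gives `"parallel"` and `relative_orientation(up, down)` gives `"antiparallel"`.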

Snapshots of a successful trajectory obtained with LD are shown in Figure 5. For the same trajectory, the values of ρ and the potential energy as a function of time are shown in Fig. 6. The snapshots show that helix formation takes less than 1 ns and, for this particular example, the packing of the helices on both monomers takes about 3 ns. Also for this example, the monomers fold independently, but they are close enough so that, after the subunits have folded, they can overcome the friction forces to turn around (since the initial orientation of the helices is parallel, but in the native structure the orientation is antiparallel) and assemble in less than 2 ns. The folding of the dimer is completed in a total time of 5 ns. The two LD trajectories that converged to the native basin (see Table 1) showed the folding mechanism illustrated in Fig. 5.

Fig. 5.


Example of a successful trajectory of 1G6U obtained with Langevin dynamics. The C terminus of each chain is marked.

Fig. 6.


Variation of the potential energy (A) and the Cα rmsd from the native structure for the dimer (B) in a successful trajectory of 1G6U obtained with Langevin dynamics. For the same trajectory, panels C and D show the variation of the Cα rmsd from the native for each of the monomers. The solid horizontal line at −403 kcal/mol in panel A is the mean value of the energy after the dimer has reached the native basin, and the solid line at 4.8 Å in panel B is the mean Cα rmsd inside the native basin of the dimer. The dashed horizontal line in panels B, C and D corresponds to the cut-off rmsd (7 Å for the dimer and 5 Å for the monomers) above which a structure is considered to have left the native basin. The solid horizontal line at 3.3 Å in panels C and D is the mean Cα rmsd inside the native basin of the monomer.

Figure 7 shows snapshots of an LD trajectory leading to a misfolded structure. The values of ρ and the potential energy for this trajectory are shown in Fig. 8. For this particular trajectory, chain A folds first (cf. panels C and D), and chain B folds while it binds to form the dimer (cf. panels B and C). The formation of the dimer in Fig. 8 corresponds to the stabilization of ρ around 15.6 Å in panel B. For the other LD trajectory that converges to the misfolded basin, the assembly mechanism was similar to that described in Figure 5, in the sense that the monomers folded completely before they assembled. Thus, folding of the monomers followed by their assembly does not always lead to the native basin.

Fig. 7.


Example of a trajectory of 1G6U, obtained with Langevin dynamics, leading to the misfolded structure. The C terminus of each chain is marked.

Fig. 8.


Variation of the potential energy (A) and the Cα rmsd from the native structure for the dimer (B) for a misfolding trajectory of 1G6U obtained with Langevin dynamics. The misfolded structure differs from the native in that the long helices are parallel to each other instead of antiparallel. For the same trajectory, panels C and D show the variation of the Cα rmsd from the native for each of the monomers. In panels A and B, the solid horizontal line (at −401 kcal/mol in panel A and 15.7 Å in panel B) is the mean value of the energy and the Cα rmsd from the native, respectively, after the protein has fallen into the misfolded basin. The dashed horizontal line in panels C and D corresponds to the 5 Å cut-off rmsd, above which the monomers are considered to have left the native basin; i.e., the monomers folded, but the overall structure was misfolded. The solid horizontal line in panels C and D (at 5.9 Å in panel C and 3.1 Å in panel D) is the mean Cα rmsd inside the native basin of the monomer.

Those LD trajectories that did not converge to the native or misfolded basin reached a state (called non-folded) in which either one or both monomers were folded, but they had not yet assembled within the 16 ns simulation time. Their structures were similar to either the 3-ns or the 4-ns snapshot in Fig. 5.

With BD, all the simulations converged to either the native or the misfolded basin (see Table 1). Among those runs that converged to the native basin, two different pathways were observed, one in which the subunits fold before their assembly (“lock and key” mechanism) and another one in which the subunits fold simultaneously with their assembly (“induced fit” mechanism). Although only a few runs followed the latter assembly mechanism (3 out of 9 folding trajectories), this pathway seems to be 2.5 times faster on average than the assembly of already folded subunits, which is not surprising since, after the monomers are folded, they might collide several times until they find the right orientation, which will in general slow down the process. Fig. 9 and Fig. 10 illustrate these two folding pathways. For the trajectory shown in Fig. 9 (fast folding pathway), folding and association of the chains occur simultaneously, with the dimer folding in less than 0.4 ns, while for the trajectory shown in Fig. 10 (slow folding pathway), although the chains collide several times (snapshots at 0.23 ns, 0.33 ns, 5.83 ns and 6.44 ns), only the last collision results in the formation of the dimer. There is a long period between the snapshots at 0.58 ns and 5.83 ns (this period is not shown in the snapshots), during which the chains remain folded, but they do not collide at all. Fig. 11 contains the values of ρ and the potential energy as a function of time, corresponding to the trajectory shown in the snapshots in Fig. 10. In Fig. 11, two pronounced drops in energy can be seen (panel A). The first one corresponds to the folding of the monomers (ρ below 5 Å in panels C and D), and the second one corresponds to the assembly of the dimer (ρ below 7 Å in panel B).
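The distinction between the two pathways can be expressed in terms of the ρ time series: in the “lock and key” case both monomers cross their cut-off well before the dimer does, whereas in the “induced fit” case the crossings nearly coincide. The sketch below encodes that reading using the 1G6U cut-offs quoted in the text (5 Å for the monomers, 7 Å for the dimer). The gap threshold `min_gap_ns` is a hypothetical parameter chosen for illustration, as are the function names and the toy series; the classification in this work was based on inspection of the trajectories.

```python
def first_below(times_ns, rmsd_A, cutoff_A):
    """First time at which the rmsd is below the cut-off, or None if never."""
    for t, r in zip(times_ns, rmsd_A):
        if r < cutoff_A:
            return t
    return None

def classify_pathway(times_ns, rho_dimer, rho_mono_a, rho_mono_b,
                     dimer_cutoff_A=7.0, monomer_cutoff_A=5.0, min_gap_ns=0.5):
    """Label a trajectory by comparing monomer and dimer folding times.

    min_gap_ns is an assumed threshold: if the dimer folds at least this
    long after the slower monomer, the run is called 'lock and key'.
    """
    t_dimer = first_below(times_ns, rho_dimer, dimer_cutoff_A)
    t_a = first_below(times_ns, rho_mono_a, monomer_cutoff_A)
    t_b = first_below(times_ns, rho_mono_b, monomer_cutoff_A)
    if t_dimer is None or t_a is None or t_b is None:
        return "not folded"
    t_monomers = max(t_a, t_b)  # time at which both monomers are folded
    if t_dimer - t_monomers >= min_gap_ns:
        return "lock and key"
    return "induced fit"

# Toy series (made up for illustration).
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
slow = classify_pathway(times,
                        [30.0, 28.0, 25.0, 20.0, 12.0, 6.0],
                        [15.0, 4.0, 3.0, 3.0, 3.0, 3.0],
                        [15.0, 9.0, 4.0, 3.0, 3.0, 3.0])
fast = classify_pathway(times,
                        [30.0, 6.0, 5.0, 5.0, 5.0, 5.0],
                        [15.0, 4.0, 3.0, 3.0, 3.0, 3.0],
                        [15.0, 4.5, 3.0, 3.0, 3.0, 3.0])
```

With these toy inputs, `slow` is labelled `"lock and key"` and `fast` is labelled `"induced fit"`. The result depends on the chosen gap, so borderline trajectories still call for visual inspection.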

Fig. 9.


Example of a fast folding trajectory of 1G6U obtained with Berendsen dynamics. The C terminus of each chain is marked.

Fig. 10.


Example of a slow folding trajectory of 1G6U obtained with Berendsen dynamics. The C terminus of each chain is marked.

Fig. 11.


Variation of the potential energy (A) and the Cα rmsd from the native structure for the dimer (B) in a successful trajectory of 1G6U obtained with Berendsen dynamics. For the same trajectory, panels C and D show the variation of the Cα rmsd from the native structure for each of the monomers. The solid horizontal line at −401 kcal/mol in panel A is the mean value of the energy after the protein has reached the native basin. The dashed horizontal line in panels B, C and D corresponds to the cut-off rmsd (7 Å for the dimer and 5 Å for the monomers) above which a structure is considered to have left the native basin. The solid horizontal line at 4.5 Å in panel B is the mean Cα rmsd inside the native basin of the dimer. The solid horizontal line in panels C (2.8 Å) and D (3.1 Å) is the mean Cα rmsd inside the native basin of the monomer.

The folding mechanism of the only BD trajectory that converged to the misfolded basin was similar to the one described in Fig. 9 (fast folding pathway), except that the orientation of the chains was parallel instead of antiparallel as in the native structure.

When comparing the folding of the isolated monomers of 1G6U in the single- and multi-chain simulations, we found that, with LD, the average folding time of the monomers in the single-chain simulations was shorter than in the multi-chain simulations (see Table 1), which suggests that the interactions between chains might slow down the folding of the individual chains. To further elucidate whether this delay occurs in the formation of helices H1 and H2 or in their packing, we compared the folding times of H1 and H2 in the single-chain simulations with their folding times in the multi-chain simulations. We found almost no difference in the average folding time of H2 and, in the case of H1, the formation of the helix seems to be slightly faster for the single-chain simulations (see Table 1). This suggests that, for 1G6U with LD, the interactions between the chains can hinder the packing of helices H1 and H2, and can also slow down the formation of the shortest helix (H1).

With BD, on average, the monomers folded three times faster in the multi-chain simulations than in the single-chain simulations (0.30 ns compared to 0.92 ns) (see Table 1). Further analysis of the folding times of helices H1 and H2 showed that H1 and H2 fold at approximately the same rate for single-chain and multi-chain simulations (see Table 1). This indicates that interactions between chains enhance the packing of H1 and H2, but have no substantial effect on the formation of the helical structures.

The fact that the packing of H1 and H2 is favored by multi-chain interactions with BD and hindered with LD might be explained as follows: with BD, in which the friction forces are absent, the chains can move very fast, and if a collision that does not favor the packing of H1 and H2 has taken place, the chains can quickly rearrange to find a better orientation while, with LD, the reorientation of the chains is much slower due to the friction forces from the solvent. With both methods, collisions will sometimes favor the packing of H1 and H2 and other times hamper it; the only difference is that, with BD, the chains can collide more frequently, and overall (when averaged over several trajectories) the presence of another chain will favor single-chain folding.
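The contrast drawn above between the two thermostats can be made concrete with a generic textbook sketch: Berendsen coupling only rescales velocities toward the target temperature, whereas Langevin dynamics adds an explicit friction term and a random force. The code below shows the standard Berendsen scaling factor and a simple Euler-Maruyama velocity update for a one-dimensional Langevin equation. It is not the integrator implemented in UNRES MD; the function names and the parameters `tau` and `gamma` are assumptions introduced for illustration.

```python
import math
import random

def berendsen_scale(T_inst, T_target, dt, tau):
    """Standard Berendsen velocity-scaling factor.

    lambda = sqrt(1 + (dt / tau) * (T_target / T_inst - 1)).
    No friction or random force acts on the particles; all velocities
    are simply multiplied by lambda at each step.
    """
    return math.sqrt(1.0 + (dt / tau) * (T_target / T_inst - 1.0))

def langevin_velocity_step(v, force, mass, gamma, kT, dt, rng):
    """One Euler-Maruyama step of m dv = F dt - gamma m v dt + sqrt(2 gamma m kT) dW.

    gamma is a friction rate (1/time); the friction term damps the velocity,
    and the random kick restores the thermal fluctuations.
    """
    noise = math.sqrt(2.0 * gamma * kT * dt / mass) * rng.gauss(0.0, 1.0)
    return v + (force / mass - gamma * v) * dt + noise

# With gamma = 0 the Langevin update reduces to plain Newtonian motion.
v_newton = langevin_velocity_step(1.0, 2.0, 1.0, 0.0, 1.0, 0.1, random.Random(0))

# A system colder than the target is sped up (lambda > 1) by Berendsen coupling.
lam_cold = berendsen_scale(T_inst=250.0, T_target=300.0, dt=0.1, tau=1.0)
```

In this picture, the friction term `-gamma * v` is what slows the reorientation of already folded chains in LD, while the Berendsen factor stays close to 1 near the target temperature and leaves the motion essentially undamped.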

3.2 GCN4 Leucine Zipper (PDB ID 2ZTA)

The GCN4 leucine zipper (GCN4-p1), derived from the yeast transcriptional activator GCN4, is an α-helical homodimer consisting of two parallel chains with 33 residues per chain35 (Fig. 12 A). Since the helices in GCN4-p1 wrap around each other, its motif is known as a coiled coil. The coiled coil motif is found in many proteins and, for this reason, GCN4-p1 and its mutants have been the subject of numerous studies5,35,38. In particular, simulations of the folding pathway of GCN4-p1 have been carried out by Vieth et al.5, as mentioned in the Introduction.

Fig. 12.


(A) Experimental structure of GCN4-p1. (B) The most native-like structure (Cα rmsd = 1.19 Å) obtained with LD UNRES MD. (C) An example of a misfolded structure. The N terminus of each chain is indicated.

Monomers

With both the BD and LD methods, 9 out of 10 monomer trajectories converged to native-like structures, as can be seen from Table 2. Moreover, these native-like structures were quite stable, indicating that dimerization is not necessary for the folding and stabilization of the individual chains. A superposition of the most native-like structure, obtained with BD, and the experimental structure is shown in Fig. 13 A. Those BD and LD trajectories that did not find the native basin by the end of the simulation showed structures with ρ values around 11 Å, in which the helix was bent, packing against itself, as shown in Fig. 13 B.

Table 2.

Summary of trajectories for GCN4-p1

dimer

Algorithm   Nf      〈τf〉 [ns]   ρmin [Å]   〈τres〉   <E>f [kcal/mol]   Nmf   <E>mf [kcal/mol]   CPU time [h]
Berendsen   4(17)   6.6(3.2)    1.22       29 %     −214              6     −217               1.5
Langevin    3(16)   9.1(3.4)    1.19       81 %     −218              1     −225               1.9

monomer

Algorithm   Nf   〈τf〉 [ns]   ρmin [Å]   〈τres〉   <E>f [kcal/mol]
Berendsen   9    1.5         0.59       69 %     −104
Langevin    9    2.7         0.70       74 %     −97

Nf: Number of trajectories (out of 10) that folded to native-like structures. In the dimer simulations, the number of monomers (out of 20, since there were 2 monomers on each of the 10 dimer simulations) that folded to native-like structure is indicated between parentheses.

〈τf〉: Average folding time. The folding time was defined as the time at which the rmsd with respect to the crystal structure fell below the cut-off value (4.8 Å for the dimers and 3.4 Å for the monomers). In those runs for which the rmsd never went below the cut-off, the folding time was considered to be the simulation time (12 ns). For BD and LD, in the dimer simulations, the average folding time of the monomers is indicated between parentheses.

ρmin: The lowest rmsd in all of the fluctuating trajectories;

〈τres〉: Fraction of the time that the peptide spent in the native basin averaged over all the folding trajectories.

<E>f: Average potential energy over all structures in the native f basin;

Nmf: Number of trajectories (out of 10) that yielded misfolded structures;

<E>mf: Average potential energy over all structures in the misfolded mf basin;

CPU time: Average CPU time (in hours) per 1 ns of simulation on a single 3.06 GHz Intel Pentium IV Xeon processor;
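The quantities 〈τres〉 and <E>f in the legend can be obtained from the rmsd and energy time series by selecting the frames that lie inside the native basin. The sketch below assumes equally spaced frames and takes the residence fraction over the whole trajectory; the legend does not say whether the fraction is computed over the full run or only over the portion after folding, so that choice is an assumption, as are the function names and the toy data.

```python
def native_fraction(rmsd_A, cutoff_A):
    """Fraction of frames with rmsd below the cut-off (equally spaced frames assumed)."""
    if not rmsd_A:
        raise ValueError("empty trajectory")
    inside = sum(1 for r in rmsd_A if r < cutoff_A)
    return inside / len(rmsd_A)

def mean_in_basin(values, rmsd_A, cutoff_A):
    """Mean of `values` (e.g. potential energy) over frames inside the native basin.

    Returns None if no frame lies inside the basin.
    """
    selected = [v for v, r in zip(values, rmsd_A) if r < cutoff_A]
    if not selected:
        return None
    return sum(selected) / len(selected)

# Toy monomer trajectory (made up); 3.4 A is the GCN4-p1 monomer cut-off from the legend.
rmsd = [11.0, 10.5, 3.0, 2.0, 3.6, 1.8, 2.2, 2.9]
energy = [-80.0, -82.0, -97.0, -99.0, -90.0, -98.0, -100.0, -96.0]

frac = native_fraction(rmsd, cutoff_A=3.4)
e_native = mean_in_basin(energy, rmsd, cutoff_A=3.4)
```

For the toy series, five of the eight frames lie inside the basin, so `frac` is 0.625 and `e_native` is −98.0. Averaging these per-trajectory values over the folding trajectories would then give the tabulated quantities.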

Fig. 13.


(A) Superposition of one of the monomers from the experimental structure of GCN4-p1 (black) on the most native-like structure (grey) (Cα rmsd = 0.59 Å) obtained with BD UNRES MD. (B) A structure that was often found during the folding pathway of GCN4-p1 (with both BD and LD), and was the final structure of those trajectories that did not find the native basin. The N-terminus is indicated.

The structure shown in Fig. 13 B was also found along the pathway of some of the trajectories that converged to native-like structures. Potential energy and ρ values as a function of time, for an LD trajectory showing such behavior, are shown in Fig. 14 (potential energy in panel A and ρ values in panel B). During the first 3 ns of simulation of this trajectory, the peptide adopts structures similar to that shown in Fig. 13 B, which correspond to the plateau in ρ values around 11 Å in panel B. At the third ns of simulation, the monomer finds the native basin (ρ falls below the 3.4 Å cut-off in panel B), and the energy drops considerably (see panel A), showing that the structure in Fig. 13 B is only a local minimum and does not compete with the native structure.

Fig. 14.


Variation of the potential energy (A) and the Cα rmsd from the native structure of the monomer in the dimer (B), during the folding of an isolated monomer of GCN4-p1 obtained with Langevin dynamics. The solid horizontal line at −98 kcal/mol in panel A is the mean value of the energy after the monomer has reached the native basin. In panel B, the dashed horizontal line at 3.4 Å corresponds to the cut-off rmsd above which the monomer structure is considered to have left the native basin, and the solid horizontal line at 1.9 Å is the mean Cα rmsd of the monomer inside the native basin. Panels C and D contain the same information as panels A and B, respectively, for a trajectory obtained with Berendsen dynamics. The solid horizontal line at −105 kcal/mol in panel C is the mean value of the energy after the monomer has reached the native basin, and the solid horizontal line at 1.8 Å in panel D is the mean Cα rmsd inside the native basin of the monomer from the monomer in the native structure of the dimer.

Not all the trajectories that converged to the native basin exhibited the folding pathway described in the previous paragraph. In other simulations, a fast folding pathway was observed, with the monomer rapidly finding the native basin without spending time in any intermediate structure. An example of such behavior can be seen in the BD trajectory shown in panels C and D of Fig. 14 (potential energy in panel C and ρ values in panel D). This behavior was the most commonly observed among all the runs (both BD and LD).

In general, with both BD and LD, the native basin was very stable, which can be inferred from the behavior of ρ in panels B and D of Fig. 14; once ρ crossed the 3.4 Å rmsd cut-off (equivalent to finding the native basin), it remained within this cut-off most of the time.

Dimers

The initial separation distance between chains was 26 Å, and the initial arrangement was parallel. Both methods, BD and LD, generated trajectories leading to native-like structures within 12 ns of simulation. The results are summarized in Table 2. The equilibrium concentration of 1 mM was reached during the first 24 ps of simulation. Again, as for 1G6U, two families of stable structures (corresponding to basins with low free energy) were found; one of them was native-like and the other one differed from the native structure in that the orientation of the helices was antiparallel instead of parallel. The most native-like structure generated by UNRES MD, as well as an example of a misfolded structure, are shown in Fig. 12 B and Fig. 12 C, respectively.

When running in the LD mode, two different pathways were observed: one in which folding and assembly of the subunits were coupled (the “induced fit” mechanism), and another in which the subunits folded before they assembled (the “lock and key” mechanism). Of the three LD trajectories that converged to the native basin, two folded by the induced fit mechanism and the remaining one by the lock and key mechanism. Snapshots from one of the runs that folded by the induced fit mechanism are shown in Fig. 15, and the potential energy and ρ values for the same trajectory are shown in Fig. 16. In Fig. 15, dimerization starts with the association of the small helical segments at the N-termini and propagates toward the C-termini simultaneously with formation of the helices. The two trajectories folding by this mechanism folded in less than 0.3 ns, which was ten times faster than the trajectory folding by the lock and key mechanism. Snapshots from the trajectory folding by the lock and key mechanism are shown in Fig. 17, and the corresponding potential energy and ρ values as a function of time are shown in Fig. 18. In Fig. 17, the folding of the helices is almost completed at the 0.20 ns snapshot, but the chains fail to bind and move apart. It takes almost 5 ns more for the chains to find the right orientation and form the dimer. This folding mechanism will in general lead to a longer folding time since, once the individual chains adopt their native structure, moving through the solvent in order to find the proper packing is hard, while if the subunits are already attached (in the right place) the rate of folding is limited only by the folding of the individual chains.

Fig. 15.


Example of a fast folding trajectory of GCN4-p1 obtained with Langevin dynamics. The N terminus of each chain is marked.

Fig. 16.


Variation of the potential energy (A) and the Cα rmsd from the native structure for the dimer (B) in a fast folding trajectory of GCN4-p1 obtained with Langevin dynamics. For the same trajectory, panels C and D show the variation of the Cα rmsd from the native structure for each of the monomers. In panel A, the solid horizontal line at −220 kcal/mol is the mean value of the energy after the dimer has reached the native basin. The dashed horizontal line in panels B, C and D corresponds to the cut-off rmsd (4.8 Å for the dimer and 3.4 Å for the monomers) above which a structure is considered to have left the native basin. The solid horizontal line at 3.1 Å in panel B is the mean Cα rmsd inside the native basin of the dimer. The solid horizontal line in panels C and D (at 2.1 Å in panel C and 2.2 Å in panel D) is the mean Cα rmsd inside the native basin of the monomer.

Fig. 17.


Example of a slow folding trajectory of GCN4-p1 obtained with Langevin dynamics. The N terminus of each chain is marked.

Fig. 18.


Variation of the potential energy (A) and the Cα rmsd from the native structure for the dimer (B) in the slow folding trajectory of GCN4-p1 obtained with Langevin dynamics. For the same trajectory, panels C and D show the variation of the Cα rmsd from the native structure for each of the monomers. In panel A, the solid horizontal line at −219 kcal/mol is the mean value of the energy after the dimer has reached the native basin. The dashed horizontal line in panels B, C and D corresponds to the cut-off rmsd (4.8 Å for the dimer and 3.4 Å for the monomers) above which a structure is considered to have left the native basin. The solid horizontal line at 3.0 Å in panel B is the mean Cα rmsd inside the native basin of the dimer. The solid horizontal line in panels C and D (at 2.0 Å in panel C and 2.1 Å in panel D) is the mean Cα rmsd inside the native basin of the monomer.

When running in the BD mode, for some of the trajectories, the protein jumped from one basin to the other one. The potential energy and ρ values for a representative trajectory presenting this behavior are shown in Fig. 19. It can be seen that the dimer (panel B) folds and misfolds without affecting the structure of the monomers (panels C and D), which is consistent with our results from single-chain simulations indicating that the monomers are stable by themselves.
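Jumps of the kind described above can be located in a ρ time series by splitting the trajectory into consecutive stretches spent inside and outside the native basin. The sketch below does this with the 4.8 Å dimer cut-off quoted for GCN4-p1. It only detects departures from the native basin; deciding that the system has entered the misfolded basin would need an additional criterion (for example, the helix orientation), which is not attempted here. The function names and the toy series are hypothetical.

```python
from itertools import groupby

def basin_segments(times_ns, rmsd_A, cutoff_A):
    """Split a trajectory into consecutive runs inside/outside the native basin.

    Returns a list of (in_native, t_start, t_end) tuples in time order.
    """
    labelled = [(r < cutoff_A, t) for t, r in zip(times_ns, rmsd_A)]
    segments = []
    for in_native, run in groupby(labelled, key=lambda item: item[0]):
        run_times = [t for _, t in run]
        segments.append((in_native, run_times[0], run_times[-1]))
    return segments

def count_exits(segments):
    """Number of times the trajectory leaves the native basin."""
    return sum(1 for prev, cur in zip(segments, segments[1:])
               if prev[0] and not cur[0])

# Toy dimer rmsd series (made up for illustration).
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
rho = [20.0, 4.0, 3.5, 3.8, 16.0, 16.5, 4.2, 3.9]

segments = basin_segments(times, rho, cutoff_A=4.8)
n_exits = count_exits(segments)
```

For the toy series, the trajectory enters the native basin at 1 ns, leaves it between 3 and 4 ns, and returns at 6 ns, giving one exit. In practice, brief excursions across the cut-off caused by fluctuations would need to be filtered out before counting.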

Fig. 19.


Variation of the potential energy (A) and the Cα rmsd from the native structure for the dimer (B) in a folding trajectory of GCN4-p1 obtained with Berendsen dynamics. For the same trajectory, panels C and D show the variation of the Cα rmsd from the native structure for each of the monomers. The dimer remains in the native basin for almost 5 ns, after which it jumps to the misfolded basin. The solid horizontal lines at −214 kcal/mol and −218 kcal/mol in panel A correspond to the mean values of the potential energy inside the native basin and the misfolded basin, respectively. The solid horizontal lines at 3.7 Å and 16.3 Å in panel B correspond to the mean Cα rmsd inside the native basin and the misfolded basin, respectively. The dashed horizontal line in panels B, C and D corresponds to the cut-off rmsd (4.8 Å for the dimer and 3.4 Å for the monomers) above which a structure is considered to have left the native basin. The solid horizontal line in panels C (at 2.3 Å) and D (at 2.6 Å) corresponds to the mean Cα rmsd inside the native basin of the monomer.

As observed for 1G6U, the average potential energies of the native and misfolded basins were very similar (see Table 2), the slightly lower values for the misfolded structures being within the expected error in the potential function.

When comparing the folding times for the monomers in the multi-chain simulations with those in the single-chain simulations, we noticed that, with both BD and LD, the isolated monomers folded, on average, slightly faster. A closer look at those monomers that, in multi-chain simulations, had the largest folding times, or did not fold at all, shows that the folding was delayed because the monomers were trapped in structures similar to that shown in Fig. 13 B. In all of these simulations, the dimers were formed, but one or both chains had this bent structure. As already mentioned, this structure was also found along the pathway of some of the trajectories in the simulations of isolated monomers, but the fact that the isolated monomers were able to find the native structure faster indicates that multi-chain interactions might stabilize the structure shown in Fig. 13 B.

Those trajectories that did not converge to the native or misfolded basin reached a state (called non-folded) in which a dimer was formed, but one or both chains had the non-native-like structure shown in Fig. 13 B.

It should be emphasized that UNRES MD reflects the energy landscape produced by the UNRES 4P force field. The presence of non-native stable structures is a feature of the force field, not the method. Improvement of the 4P UNRES force field is expected to stabilize the native over the non-native basin to a greater extent.

3.3 retro-GCN4 Leucine Zipper (PDB ID 1C94)

1C94 is a synthetic α-helical homotetramer of 38 residues per chain. The sequence of 1C94 corresponds to the reversed sequence of the Leucine Zipper portion of GCN4, viz., GCN4-p1 (section 3.2). GCN4-p1 consists of 33 residues, and 1C94 consists of the same 33 residues but in reversed order from N- to C-terminus; in addition, 1C94 is extended at the N-terminus with the tripeptide sequence CYS-GLY-GLY, and at the C-terminus with GLN-LEU.36 Thus, 1C94 is referred to as the retro-GCN4 Leucine Zipper. The crystal structure, consisting of four α-helices oriented parallel to each other (see Fig. 20 A), was modeled36 as a dimer of dimers since mass spectroscopic analysis indicated that the chains were covalently linked in pairs by disulfide bonds.36

Fig. 20.


Experimental structure of 1C94 (A) and examples of misfolded structures obtained with LD and BD UNRES MD (B and C). The N-terminus of each chain is indicated.

Monomers

As can be seen from Table 3, 9 out of 10 monomer Langevin trajectories and all 10 Berendsen trajectories converged to native-like structures. The remaining Langevin trajectory, which did not find the native basin by the end of the simulation, showed structures with ρ values around 13 Å in which the helix is broken, packing against itself. An example of such a structure is shown in Fig. 21 B. With an older version of the UNRES force field (α0 force field39), Saunders and Scheraga25 identified a structure of the type shown in Fig. 21 B as the lowest UNRES energy structure. With the force field used in this work (4P force field)23, however, these types of structures have higher energy than the native-like structures, as can be seen by comparing the two Langevin trajectories shown in Fig. 22. Panels A and B show the energy and ρ values, respectively, for the LD trajectory with final structures similar to that shown in Fig. 21 B, and panels C and D show the same information for the LD trajectory converging to the native basin. The mean value of the potential energy in the native basin is indicated with the solid line at −152 kcal/mol in panel C, which is 12 kcal/mol lower than the same quantity in panel A, showing that the UNRES 4P potential energy is lower in the native basin.

Table 3.

Summary of trajectories for 1C94

tetramer

Columns Nf, 〈τf〉, Nmf and <E>mf: from extended conformation. Columns Nn and <E>n: from crystal structure.

Algorithm   Nf      〈τf〉 [ns]   Nmf   <E>mf [kcal/mol]   Nn   <E>n [kcal/mol]   CPU time [h]
Berendsen   0(18)   2.2         4     −504               --   --                6.9
Langevin    0(18)   2.6         3     −508               3    508               8.1

monomer

Algorithm   Nf   〈τf〉 [ns]   ρmin [Å]   〈τres〉   <E>f [kcal/mol]
Berendsen   10   1.4         1.36       81 %     −107
Langevin    9    2.0         1.28       83 %     −152

Nf: Number of trajectories (out of 10) that folded to native-like structures starting from the extended conformation. In the multi-chain simulations, the number of monomers (out of 40, since there were 4 monomers on each of the 10 simulations of tetramers) that folded to native-like structure is indicated between parentheses.

〈τf〉: Average folding time of the monomers. The folding time was defined as the time at which the rmsd with respect to the crystal structure fell below 4 Å. In those runs for which the rmsd never went below the cut-off, the folding time was considered to be the simulation time (12 ns for the isolated monomer simulations, 35 ns for the tetramers simulations with LD, and 26 ns for the tetramer simulations with BD). The average folding times for the multi-chain complex are not calculated since none of the simulations led to native-like tetramers.

Nmf: Number of trajectories (out of 10) that yielded misfolded structures.

<E>mf : Average potential energy over all structures in the misfolded basin.

Nn: Number of trajectories, out of 10 simulations started with the crystal structure as the initial conformation, that, after 8 ns of simulation, still had native-like structures (rmsd with respect to crystal structure below 8 Å).

<E>n: Average potential energy over all those trajectories that, starting with the crystal structure, remained in the native basin after 8 ns of simulation.

CPU time: Average CPU time (in hours) per 1 ns of simulation on a single 3.06 GHz Intel Pentium IV Xeon processor;

ρ min: The lowest rmsd in all of the fluctuating trajectories;

<E>f : Average potential energy over all structures in the native f basin;

Fig. 21.


(A) Superposition of one of the monomers in the experimental structure of 1C94 (black) on the most native-like structure (grey) (Cα rmsd = 1.28 Å) obtained with the BD UNRES MD. (B) A structure that was often found during the folding pathway of 1C94 (either with BD or LD), and was the final structure of the monomer LD trajectory that did not find the native basin. The N-terminus is indicated.

Fig. 22.


Variation of the potential energy (A) and the Cα rmsd from the native structure of the monomer in the tetramer (B) as a function of time, for an LD trajectory of an isolated monomer of 1C94 converging to a non-native-like structure (which is shown in Fig. 21 B). In panel A, the solid horizontal line at −140 kcal/mol is the mean value of the energy after the monomer has adopted the non-native stable structure. In panel B, the solid horizontal line at 12.8 Å is the mean Cα rmsd after the peptide has adopted the non-native structure. Panels C and D contain the same information as panels A and B, respectively, for an LD trajectory converging to the native basin. The solid horizontal line at −152 kcal/mol in panel C is the mean value of the energy after the peptide has reached the native basin. In panel D, the dashed horizontal line at 4 Å corresponds to the cut-off rmsd above which the structure is considered to have left the native basin, and the solid horizontal line at 3.1 Å is the mean Cα rmsd inside the native basin.

Fig. 23 shows the potential energy (panel A) and ρ values (panel B) for a sample trajectory obtained with BD. As this example illustrates, the Berendsen trajectories showed higher energies (panel A) and larger fluctuations in the ρ values (panel B) than the LD runs (panels C and D in Fig. 22). A possible explanation is that, in BD, the absence of friction forces allows larger conformational changes. No simulations were carried out for 1C94 dimers.

Fig. 23.

Variation of the potential energy (A) and the Cα rmsd from the native structure (B), during the folding of an isolated monomer of 1C94 obtained with Berendsen dynamics. In panel A, the solid horizontal line at −105 kcal/mol is the mean value of the energy after the protein has reached the native basin. The dashed horizontal line, at 4 Å, in panel B corresponds to the cut-off rmsd above which the structure is considered to have left the native basin, and the solid horizontal line at 3.3 Å in the same panel is the mean Cα rmsd inside the native basin.

Tetramers

Berendsen and Langevin simulations were carried out starting with the four chains in the extended conformation, with each pair of chains cross-linked by disulfide bonds. The chains were in the same plane, parallel to each other and with a 20 Å distance between consecutive chains. Based on the experimental data36, the CYS residue at the first N-terminal position was assumed to form a disulfide bond with the corresponding CYS residue in another chain; however, this residue was never included in the rmsd calculations since it is not resolved in the experimental structure. The simulation time was 35 ns for LD runs, and 28 ns for BD runs. The equilibrium concentration of 10 mM was reached during the first 50 ps of simulation.
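As a rough check on the starting geometry described above, the following sketch converts a molar concentration into the side of an equivalent cubic volume and lays out the in-plane offsets of the parallel chains. It is only an illustration: the function names are hypothetical, the cubic shape of the volume is an assumption, and counting the concentration per chain (rather than per tetramer) is also an assumption not stated in this section.

```python
AVOGADRO = 6.02214076e23       # particles per mole
ANGSTROM3_PER_LITER = 1.0e27   # 1 L = 1e-3 m^3 = 1e27 cubic angstroms


def cubic_box_side(n_chains, concentration_molar):
    """Side (in angstroms) of a cube holding n_chains at the given molarity.

    Assumption: the concentration is counted per chain.
    """
    volume_liters = n_chains / (concentration_molar * AVOGADRO)
    return (volume_liters * ANGSTROM3_PER_LITER) ** (1.0 / 3.0)


def extended_chain_offsets(n_chains, spacing=20.0):
    """In-plane offsets (in angstroms) of parallel chains, centred on zero."""
    start = -0.5 * spacing * (n_chains - 1)
    return [start + i * spacing for i in range(n_chains)]


# Four chains at 10 mM, placed 20 angstroms apart in one plane.
side = cubic_box_side(4, 0.010)
offsets = extended_chain_offsets(4)
print(f"equivalent cube side: {side:.1f} A")
print("chain offsets (A):", offsets)
```

Under these assumptions, the 60 Å span of the initial four-chain arrangement fits inside the equivalent cube, whose side is somewhat below 90 Å.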

None of the trajectories obtained with UNRES/MD yielded native-like structures. However, both methods found stable structures consisting of two parallel dimers bound together in antiparallel orientation (instead of parallel as in the native structure), examples of which are shown in Fig. 20 B and 20 C. In the structure shown in Fig. 20 B, the dimers have native-like structures, but the area of contact between the dimers is very small. On the other hand, the structure shown in Fig. 20 C has better packing, but the dimers have non-native-like structures, and the disulfide-linked monomers are not parallel to each other, but slightly twisted in order to align in an antiparallel orientation with the monomers from the other dimer. These two structures have approximately the same potential energy (≈ −507 kcal/mol); we will refer to either of them as misfolded structures. Fig. 24 shows the potential energy (panel A) and ρ values for the tetramer (panel B) and for the dimers (panels C and D) as a function of time for the trajectory leading to the structure in Fig. 20 B. It can be seen that, by the end of the simulation, the ρ values for the tetramer stabilize around 22 Å (indicated by a solid line in panel B) while, for the dimers, they remain below or close to the 5.6 Å cut-off (indicated by the dashed lines in panels C and D). The potential energy also stabilizes by the end of the simulation, with values around −510 kcal/mol (indicated by a solid line in panel A).

Fig. 24.

Variation of the potential energy (A) and the Cα rmsd from the native structure (B) for a misfolding trajectory of 1C94, starting with extended chains, obtained with Langevin dynamics. For the same trajectory, panels C and D show the variation of the Cα rmsd from the native structure for each of the dimers. In panels A and B, the solid horizontal line is the mean value of the energy (at −510 kcal/mol) and Cα rmsd from the native structure (at 22.4 Å), respectively, after the tetramer has found the misfolded basin. The dashed horizontal line in panels C and D corresponds to the 5.6 Å cut-off rmsd, above which the dimers are considered to have left the native basin, i.e., the dimers folded but the overall structure was misfolded.

To determine whether the native structure of the tetramer was not found because of imperfections in the UNRES 4P force field, or simply because the simulation times were too short, we carried out a set of 8 ns Langevin dynamics simulations with the crystal structure as the initial conformation. As can be seen in Table 3, 3 out of 10 simulations remained in the native basin. Potential energy and ρ values corresponding to one of the trajectories that did not remain in the native basin are shown in Fig. 25. It is important to notice that, although the tetramer leaves the native basin (ρ values crossing the dashed line at the 8 Å cut-off in panel B), there is no substantial change in the potential energy (panel A). We calculated the average potential energy among those structures that remained in the native basin and compared it with the average energy among the misfolded structures. The values obtained were almost equal (see Table 3), suggesting that, on energetic grounds alone, the protein could adopt either conformation with similar probability. However, when starting from the extended conformation, none of the simulations led to a native-like structure. Therefore, the energy landscape generated by the UNRES 4P potential makes the antiparallel conformation more easily accessible than the parallel (native) conformation; i.e., the free energy of the misfolded basin is lower than that of the native basin.
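The bookkeeping behind this comparison can be sketched as follows: each trajectory is classified by whether its Cα rmsd ever exceeds the cut-off, and the potential energy is averaged over frames inside the basin. This is a minimal sketch, not the analysis code used in this work; the function names are hypothetical, the toy numbers are invented for illustration, and treating "remained in the native basin" as "never crossed the cut-off" is an assumption.

```python
from statistics import mean


def stays_in_basin(rmsd_series, cutoff):
    """True if no frame of the trajectory exceeds the rmsd cut-off."""
    return all(r <= cutoff for r in rmsd_series)


def basin_mean_energy(energies, rmsd_series, cutoff):
    """Mean energy over frames at or below the cut-off (None if there are none)."""
    inside = [e for e, r in zip(energies, rmsd_series) if r <= cutoff]
    return mean(inside) if inside else None


# Invented toy trajectories (rmsd in angstroms, energy in kcal/mol).
trajectories = {
    "run_a": {"rmsd": [3.1, 4.0, 5.2, 6.1],
              "energy": [-512.0, -509.0, -511.0, -508.0]},
    "run_b": {"rmsd": [3.0, 6.5, 9.2, 12.4],
              "energy": [-511.0, -510.0, -509.0, -510.0]},
}
TETRAMER_CUTOFF = 8.0  # tetramer cut-off quoted in the text

stayed = [name for name, t in trajectories.items()
          if stays_in_basin(t["rmsd"], TETRAMER_CUTOFF)]
fraction = len(stayed) / len(trajectories)
e_native = basin_mean_energy(trajectories["run_a"]["energy"],
                             trajectories["run_a"]["rmsd"],
                             TETRAMER_CUTOFF)
print(stayed, fraction, e_native)
```

The same two functions apply to the dimers and monomers by passing the 5.6 Å or 4 Å cut-off instead.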

Fig. 25.

Variation of the potential energy (A) and the Cα rmsd from the native structure (B) for a trajectory of 1C94 that did not remain in the native basin, obtained with Langevin dynamics, with the crystal structure as the initial conformation. The solid horizontal line at −510 kcal/mol in panel A is the mean value of the energy during the simulation. The dashed horizontal line in panel B corresponds to the 8 Å cut-off rmsd, above which the tetramer is considered to have left the native basin. For the same trajectory, panels C and D show the variation of the Cα rmsd from the native structure for each of the dimers. The dashed horizontal line in panels C and D corresponds to the 5.6 Å cut-off rmsd, above which the dimers are considered to have left the native basin.

When comparing the folding times of the monomers in the single- and multi-chain simulations (Table 3), we did not find any appreciable difference, indicating that, for this protein, multi-chain interactions do not play an important role in the folding of the monomers.

We conclude that the failure to fold the protein to the native tetramer with the UNRES 4P force field should be attributed to imperfections in the potential rather than to insufficient simulation time because, first, for the two preceding proteins (1G6U and GCN4-p1), we observed the formation of both the native and the non-native dimers and, second, in our previous implementation of UNRES to search for the native structures of multi-chain proteins with CSA25,40, the native structure of retro-GCN4 could be predicted by global optimization only when native symmetry constraints were imposed. Improvement of the 4P UNRES force field is expected to stabilize the native basin to a greater extent compared to the non-native basin.

4. Conclusions

The UNRES/MD implementation described in reference 13 was extended to treat multi-chain proteins. The method was tested on three α-helical proteins, two dimers and one tetramer.

To simulate a constant-temperature bath, two alternative methods were implemented: the Berendsen thermostat (BD) and a method based on the Langevin equation (LD). The latter method includes friction and stochastic forces explicitly, whereas in the former these forces are included implicitly. When comparing the time required for each method to find the global minimum of the energy, BD proved to be much faster than LD, as observed in our earlier studies on single-chain proteins.12 However, it should be noted that, despite its predictive efficiency, BD might not reproduce the true folding pathway. LD, which reproduces a true canonical ensemble, should be used instead when studying the kinetics of the folding process, as in reference 14.
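The difference between the two thermostats can be illustrated on a single degree of freedom. The sketch below uses the standard Berendsen velocity-scaling factor and a simple Euler-Maruyama update of the Langevin equation; it is not the UNRES integrator, which acts on the UNRES generalized coordinates as described in the earlier references. All function and parameter names are hypothetical, and consistent (reduced) units are assumed throughout.

```python
import math
import random


def berendsen_scale(t_inst, t_bath, dt, tau):
    """Berendsen velocity-scaling factor for one time step.

    t_inst: instantaneous temperature, t_bath: bath temperature,
    dt: time step, tau: coupling time (same units as dt).
    """
    return math.sqrt(1.0 + (dt / tau) * (t_bath / t_inst - 1.0))


def langevin_velocity_step(v, force, mass, gamma, kT, dt, rng):
    """One Euler-Maruyama velocity update of the Langevin equation.

    Friction (-gamma * v) and the random force appear explicitly;
    the noise amplitude follows from fluctuation-dissipation.
    """
    sigma = math.sqrt(2.0 * gamma * kT * dt / mass)
    return v + (force / mass - gamma * v) * dt + sigma * rng.gauss(0.0, 1.0)


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Berendsen: velocities are rescaled toward the bath temperature.
heating = berendsen_scale(t_inst=250.0, t_bath=300.0, dt=0.1, tau=1.0)
cooling = berendsen_scale(t_inst=350.0, t_bath=300.0, dt=0.1, tau=1.0)

# Langevin: with gamma = 0 the update reduces to plain Newtonian motion.
v_free = langevin_velocity_step(1.0, 2.0, 4.0, 0.0, 1.0, 0.1, rng)
v_bath = langevin_velocity_step(1.0, 2.0, 4.0, 5.0, 1.0, 0.1, rng)
print(heating, cooling, v_free, v_bath)
```

In the Berendsen update no friction term acts on the velocities, which is consistent with the faster conformational rearrangements reported for BD, whereas the Langevin update damps each velocity at every step.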

Simulations of single chains and multi-chain complexes were carried out with BD and LD. Single-chain simulations indicate that, for each of the three α-helical proteins tested in this work, the structure adopted by the monomer in the multi-chain complex is also the lowest UNRES 4P energy structure of the isolated monomer. In general, the folding times of the monomers in the single-chain simulations were shorter than those in the multi-chain simulations, which indicates that, with the UNRES 4P force field, the short-range interactions, responsible for the folding of the single-chain α-helices, are impaired by the interactions between different chains. However, the folding of 1G6U with BD (section 3.1) was the exception. In these simulations, the monomers folded faster when they were allowed to interact with another monomer, i.e., the correct packing of the two helices in each monomer is favored by the interactions with another monomer. Although the wrong orientation of the monomers with respect to each other can sometimes hinder the packing of the helices, with BD, in which the friction forces are absent, the chains can rearrange quickly to find a more favorable orientation that will aid the packing of each monomer. This behavior is probably an artifact of BD, and might not represent the folding mechanism of 1G6U.

It is important to note that, although some of the trajectories led to non-native-like structures, these structures were indeed free-energy minima within the context of UNRES 4P. In the case of the two dimers, the non-native structure was competing with the native one. This competition was reflected in our simulations, especially in the case of GCN4-p1 for which the dimer switched from one structure to the other. In the case of 1C94, the results were poor since none of the trajectories yielded the native structure. The reason for this failure might be found in the defects of the UNRES parameters. Improvement of these parameters is ongoing research in our laboratory.

It must be emphasized that the goal of this work was to test the implementation of UNRES/MD on multi-chain proteins and not to improve the 4P force field and, therefore, we chose relatively simple systems which the force field could treat to test the approach, as pointed out in the Introduction. The UNRES 4P force field was trained using four proteins with different topologies and tested on 66 proteins with chain length from 28 to 144 amino acid residues. The average size of correctly predicted segments of α-helical proteins was about 67 residues23. The parameterization procedure and the limitations of the UNRES 4P force field are described extensively in reference 23. The reason for such limitation must be found in the old parameterization procedure19,21–23 which neglected conformational entropy, an issue that has been addressed in the new procedure that is currently being developed in our laboratory and will be reported in a separate paper. The multichain UNRES/MD method, with a force field developed by this new procedure, was recently used to predict the structure of a homodimer in the CASP7 experiment. The predicted monomeric structures were complemented with MD simulations of dimers, considerably improving the quality of the predictions. Our CASP7 results will be reported in a separate paper.

Finally, in contrast to earlier calculations of multi-chain complexes, with CSA (Conformational Space Annealing)41,42 as a global optimization algorithm, in which symmetry constraints had to be imposed in order to simulate the experimental structure, no such constraints were imposed here. Apparently, on the time scale achieved in MD with UNRES, the search of the conformational space of a dimer is more efficient than with CSA.

Acknowledgments

We thank Dr. M. Khalili for comments on this paper. This research was supported by grants from the National Institutes of Health (NIH; GM-14312), from the NIH Fogarty International Center (TW-7193), from the National Science Foundation (MCB05-41633), and from the Polish Ministry of Education and Science (3 T09A 032 6). It was carried out by using the resources of our 880-processor Beowulf cluster at the Baker Laboratory of Chemistry and Chemical Biology, Cornell University, the National Science Foundation Terascale Computing System at the Pittsburgh Supercomputer Center, the National Center for Supercomputing Applications System at the University of Illinois at Urbana-Champaign, and the resources of the Center for Computation and Technology at Louisiana State University, which is supported by funding from the Louisiana legislature’s Information Technology Initiative.

References

  • 1. Day R, Daggett V. Adv. Protein Chem. 2003;66:373. doi: 10.1016/s0065-3233(03)66009-2.
  • 2. Fersht AR, Daggett V. Cell. 2002;108:573. doi: 10.1016/s0092-8674(02)00620-7.
  • 3. Kubelka J, Hofrichter J, Eaton WA. Curr. Opinion Struct. Biol. 2004;14:76. doi: 10.1016/j.sbi.2004.01.013.
  • 4. Shea JE, Brooks CL III. Annu. Rev. Phys. Chem. 2001;52:499. doi: 10.1146/annurev.physchem.52.1.499.
  • 5. Vieth M, Kolinski A, Brooks CL, Skolnick J. J. Mol. Biol. 1994;237:361. doi: 10.1006/jmbi.1994.1239.
  • 6. Ma B, Nussinov R. Proc. Natl. Acad. Sci. U.S.A. 2002;99:14126. doi: 10.1073/pnas.212206899.
  • 7. Levy Y, Caflisch A, Onuchic JN, Wolynes PG. J. Mol. Biol. 2004;340:67. doi: 10.1016/j.jmb.2004.04.028.
  • 8. Yang S, Cho SS, Levy Y, Cheung MS, Levine H, Wolynes PG, Onuchic JN. Proc. Natl. Acad. Sci. U.S.A. 2004;101:13786. doi: 10.1073/pnas.0403724101.
  • 9. Levy Y, Papoian GA, Onuchic JN, Wolynes PG. Israel J. Chem. 2004;44:281.
  • 10. Yang S, Levine H, Onuchic JN, Cox DL. FASEB J. 2005;19:1778. doi: 10.1096/fj.05-4067hyp.
  • 11. Khalili M, Liwo A, Rakowski F, Grochowski P, Scheraga HA. J. Phys. Chem. B. 2005;109:13785. doi: 10.1021/jp058008o.
  • 12. Khalili M, Liwo A, Jagielska A, Scheraga HA. J. Phys. Chem. B. 2005;109:13798. doi: 10.1021/jp058007w.
  • 13. Liwo A, Khalili M, Scheraga HA. Proc. Natl. Acad. Sci. U.S.A. 2005;102:2362. doi: 10.1073/pnas.0408885102.
  • 14. Khalili M, Liwo A, Scheraga HA. J. Mol. Biol. 2006;355:546. doi: 10.1016/j.jmb.2005.10.056.
  • 15. Liwo A, Pincus MR, Wawak RJ, Rackovsky S, Scheraga HA. Protein Sci. 1993;2:1715. doi: 10.1002/pro.5560021016.
  • 16. Liwo A, Ołdziej S, Pincus MR, Wawak RJ, Rackovsky S, Scheraga HA. J. Comput. Chem. 1997;18:849.
  • 17. Liwo A, Pincus MR, Wawak RJ, Rackovsky S, Ołdziej S, Scheraga HA. J. Comput. Chem. 1997;18:874.
  • 18. Liwo A, Czaplewski C, Pillardy J, Scheraga HA. J. Chem. Phys. 2001;115:2323.
  • 19. Liwo A, Arłukowicz P, Czaplewski C, Ołdziej S, Pillardy J, Scheraga HA. Proc. Natl. Acad. Sci. U.S.A. 2002;99:1937. doi: 10.1073/pnas.032675399.
  • 20. Ołdziej S, Kozlowska U, Liwo A, Scheraga HA. J. Phys. Chem. A. 2003;107:8035.
  • 21. Liwo A, Ołdziej S, Czaplewski C, Kozłowska U, Scheraga HA. J. Phys. Chem. B. 2004;108:9421.
  • 22. Ołdziej S, Liwo A, Czaplewski C, Pillardy J, Scheraga HA. J. Phys. Chem. B. 2004;108:16934.
  • 23. Ołdziej S, Lagiewka J, Liwo A, Czaplewski C, Chinchio M, Nanias M, Scheraga HA. J. Phys. Chem. B. 2004;108:16950.
  • 24. Ołdziej S, Czaplewski C, Liwo A, Chinchio M, Nanias M, Vila JA, Khalili M, Arnautova YA, Jagielska A, Makowski M, Schafroth HD, Kazmierkiewicz R, Ripoll DR, Pillardy J, Saunders JA, Kang Y-K, Gibson KD, Scheraga HA. Proc. Natl. Acad. Sci. U.S.A. 2005;102:7547. doi: 10.1073/pnas.0502655102.
  • 25. Saunders JA, Scheraga HA. Biopolymers. 2003;68:300. doi: 10.1002/bip.10226.
  • 26. Czaplewski C, Ołdziej S, Liwo A, Scheraga HA. Protein Eng. Des. Selection. 2004;17:29. doi: 10.1093/protein/gzh003.
  • 27. Berendsen HJC, Postma JPM, van Gunsteren WF, DiNola A, Haak JR. J. Chem. Phys. 1984;81:3684.
  • 28. de Gennes P-G. Scaling concepts in polymer physics. Ithaca: Cornell University Press; 1979. Chapter VI.
  • 29. Veitshans T, Klimov D, Thirumalai D. Fold. Des. 1996;2:1. doi: 10.1016/S1359-0278(97)00002-3.
  • 30. Cieplak M, Hoang TX, Robbins MO. Proteins: Struct., Funct., Genet. 2002;49:104. doi: 10.1002/prot.10188.
  • 31. Nosé S. Mol. Phys. 1984;52:255.
  • 32. Hoover WG. Phys. Rev. A. 1985;31:1695. doi: 10.1103/physreva.31.1695.
  • 33. Andersen HC. J. Chem. Phys. 1980;72:2384.
  • 34. Liwo A, Arłukowicz P, Ołdziej S, Czaplewski C, Makowski M, Scheraga HA. J. Phys. Chem. B. 2004;108:16918.
  • 35. O'Shea EK, Klemm JD, Kim PS, Alber T. Science. 1991;254:539. doi: 10.1126/science.1948029.
  • 36. Mittl PRE, Deillon C, Sargent D, Liu N, Klauser S, Thomas RM, Gutte B, Grütter MG. Proc. Natl. Acad. Sci. U.S.A. 2000;97:2562. doi: 10.1073/pnas.97.6.2562.
  • 37. Ogihara NL, Ghirlanda G, Bryson JW, Gingery M, DeGrado WF, Eisenberg D. Proc. Natl. Acad. Sci. U.S.A. 2001;98:1404. doi: 10.1073/pnas.98.4.1404.
  • 38. Harbury PB, Zhang T, Kim PS, Alber T. Science. 1993;262:1401. doi: 10.1126/science.8248779.
  • 39. Lee J, Ripoll DR, Czaplewski C, Pillardy J, Wedemeyer WJ, Scheraga HA. J. Phys. Chem. B. 2001;105:7291.
  • 40. Saunders JA, Scheraga HA. Biopolymers. 2003;68:318. doi: 10.1002/bip.10227.
  • 41. Lee J, Scheraga HA, Rackovsky S. J. Comput. Chem. 1997;18:1222.
  • 42. Lee J, Liwo A, Scheraga HA. Proc. Natl. Acad. Sci. U.S.A. 1999;96:2025. doi: 10.1073/pnas.96.5.2025.
