Author manuscript; available in PMC: 2011 Jun 15.
Published in final edited form as: J Phys Chem B. 2009 Jul 2;113(26):9004–9015. doi: 10.1021/jp901540t

Optimized molecular dynamics force fields applied to the helix-coil transition of polypeptides

Robert B Best †,*, Gerhard Hummer
PMCID: PMC3115786  NIHMSID: NIHMS296844  PMID: 19514729

Abstract

Obtaining the correct balance of secondary structure propensities is a central priority in protein force-field development. Given that current force fields differ significantly in their α-helical propensities, a correction to match experimental results would be highly desirable. We have determined simple backbone energy corrections for two force fields to reproduce the fraction of helix measured in short peptides at 300 K. As validation, we show that the optimized force fields produce results in excellent agreement with nuclear magnetic resonance experiments for folded proteins and short peptides not used in the optimization. However, despite the agreement at ambient conditions, the dependence of the helix content on temperature is too weak, a problem shared with other force fields. A fit of the Lifson-Roig helix-coil theory shows that both enthalpy and entropy of helix formation are too small. The helix extension parameter w agrees well with experiment, but its entropic and enthalpic components are both only about half the respective experimental estimates. Our structural and thermodynamic analyses point toward the physical origins of these shortcomings in current force fields, and suggest ways to address them in future force-field development.

Introduction

It is now more than 30 years since the development of the first molecular mechanics force fields1,2 and the first molecular dynamics simulation of a protein.3 Simulations of biomolecules using classical energy functions have found widespread application, from protein structure determination and protein folding to studies of enzyme function and drug screening. While recent advances in computer power mean that sampling of the energy landscape is less of a limitation, the longer simulations now possible are making it increasingly evident that more accurate energy functions (force fields) will be important for reliable results.4 Force-field parameters are continually being improved, mainly by using a combination of ab initio quantum mechanical calculations augmented with empirical data from small molecules representing fragments of amino acids (for proteins).5 Recent developments in protein force fields have focused on improvements to the backbone potential based on comparisons with quantum calculations,5–9 even leading to the implementation of new terms in the CHARMM energy function.5,8 The new force fields result in an improved treatment of folded proteins, giving better agreement with experimental data in solution (e.g., from nuclear magnetic resonance (NMR) measurements) and low root-mean-square deviations from the crystal structures (typically <1.5 Å for long MD trajectories) of small globular proteins.10,11

Although reproduction of native-state structure and dynamics of folded proteins is an important requirement for any force field, the properties of weakly structured peptides are a much more challenging target. With timescales of equilibration between different conformations being ≈ 0.1 − 1µs for helix formation or ≈ 1 − 10µs for β-hairpin formation,12 obtaining equilibrium sampling from simulation is difficult. Furthermore, proper conformational sampling of peptides requires the correct statistical weights for different structures (as determined by the force field). This problem is exemplified by the need to balance α-helical, polyproline II (ppII), and β structure, posing a central issue in protein force-field development. Apart from direct relevance to so-called “natively unstructured” proteins, which have recently attracted a great deal of interest,13 the accurate balance of secondary structure in force fields is important because it is related to many other applications of biomolecular simulation, for example protein folding, aggregation, association and conformational change.

We recently compared an extensive set of experimental scalar couplings14 for Ala5 with long equilibrium simulations of the peptide using 12 different force fields.15 This type of data is very useful because it reflects the backbone conformation of the peptide through its (ϕ, ψ) torsion angles, and can be simply calculated from the simulation trajectories via the Karplus equation.16 Since short peptides do not have a single dominant structure, such comparisons provide a sensitive test of the “intrinsic” secondary structure preferences of a given force field.17 We found that most of the force fields tested showed discrepancies with the experimental data, but that the more recent refinements were much closer to experiment than the older generation. However, even amongst these recent force fields, there were significant differences in secondary structure propensity.15 Similarly, simulations of loop closure dynamics in short peptides indicated markedly different structural propensities among the force fields used,18 as was also found in simulations of the helix-coil transition.19 All-atom folding simulations by other groups have also suggested that the secondary structure balance in current force fields may need adjustment.4

Although we had previously found that several force fields gave similar overall agreement with the experimental J-couplings,15 we focus here on Amber ff037 and Amber ff99SB.9 As shown below, their α-helical propensities are, respectively, too high and too low relative to experiment. We applied a perturbative energy correction (using torsional terms) to match experimental data for two peptides: (i) the scalar couplings for the short Ala5 peptide used in our previous study14,15 and (ii) helical populations determined from NMR chemical shift data for a longer helix-forming peptide, Ac-(AAQAA)3-NH2. The second data set was included because short peptides such as Ala5 are too short to form a stable α helix:14,15 mounting evidence for short Ala-based peptides indicates that they favor extended, ppII-like backbone conformations over the right-handed alpha-helical αR region of the Ramachandran map;14,20–24 however, longer Ala-based peptides do form α-helix.25–27 To avoid detrimental effects from over-fitting the optimized force fields, we applied only the minimum correction to achieve an acceptable match to experiment. We show that the revised force fields match residual dipolar coupling (RDC) data for simulations of folded ubiquitin about as well as their parents or better, and are able to reproduce NMR scalar couplings for a longer 19-residue peptide with a significant helical population.14 However, the helix content in Ac-(AAQAA)3-NH2 depends more weakly on temperature in the simulations than in experiment, both for the parent and optimized force fields. Fits of the statistical mechanical Lifson-Roig (LR) model for helix formation to the simulation data suggest that the weak temperature dependence arises from the enthalpic and entropic contributions to helix formation both being too small. We discuss possible structural and energetic causes of this phenomenon in an effort to point the way for future force-field improvements.

Methods

Simulation Details

Simulations of Ala5 were run with the simulation package Gromacs,28,29 following a protocol similar to that of our previous work,15 and with the implementation of the Amber force fields by Sorin and Pande.30 The peptide was unblocked and protonated at both N and C termini, corresponding to the experimental conditions of pH 2.14 Molecular dynamics simulations of the peptide in a 30 Å cubic simulation box of explicit TIP3P water31 were run at a constant temperature of 300 K and a constant pressure of 1 atm, with long range electrostatic terms evaluated by particle-mesh Ewald (PME) with a 1.0 Å grid spacing and a 9 Å cutoff for short-range interactions. For each force field, four runs of 50 ns each were initiated from different starting configurations. Further details of the simulation protocols are as published.15

Replica exchange molecular dynamics (REMD) simulations of the blocked peptide Ac-(AAQAA)3-NH2 were run using Gromacs28,29 with 32 replicas spanning a temperature range of 278 K to 595 K. The peptide was solvated in a truncated octahedron simulation cell of 1022 TIP3P water molecules with an initial distance of 35 Å between the nearest faces of the cell. This cell was equilibrated for 200 ps at 300 K and a constant pressure of 1 atm. Subsequently, all REMD simulations were done at constant volume, with long range electrostatics calculated using PME with a 1.2 Å grid spacing and 9 Å cutoff. Dynamics was propagated with a Langevin integration algorithm using a friction of 1 ps−1, and replica exchange attempts every 1 ps (every 500 steps with a time step of 2 fs). Typical acceptance probabilities for the replica exchange were in the range 0.1–0.5. All replica exchange runs used the same set of initial configurations, which were taken from the final configurations of a preliminary replica exchange simulation with ff99SB. The simulations were run for at least 30 ns per replica, of which the first 10 ns were discarded in the analysis (with an aggregate of ≈ 1 µs for each force field). To test for possible system size dependence, additional simulations of Ac-(AAQAA)3-NH2 in a 45 Å truncated octahedron box solvated by 2268 water molecules were run for 30 ns using a similar protocol, in this case with 32 replicas at 5 K intervals between 278 and 433 K.
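The replica exchange bookkeeping in this paragraph can be illustrated with a minimal sketch. It assumes the standard Metropolis criterion for swapping two replicas, which the text does not spell out, and it reproduces only the 5 K ladder of the larger-box runs; the spacing of the 278 K to 595 K ladder is not given in the text and is not guessed here. The function names are illustrative.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def linear_ladder(t_min, step, n_replicas):
    """Temperatures at a fixed interval, as in the larger-box control runs."""
    return [t_min + step * i for i in range(n_replicas)]

def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance for swapping the configurations of two replicas.

    e_i and e_j are the potential energies (kcal/mol) of the configurations
    currently held at temperatures t_i and t_j (K).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    if delta >= 0.0:  # favourable swap, also avoids overflow in exp()
        return 1.0
    return math.exp(delta)

ladder = linear_ladder(278.0, 5.0, 32)       # 278 K ... 433 K
steps_between_attempts = round(1.0 / 0.002)  # 1 ps at a 2 fs time step
```

The last two lines check the numbers quoted above: 32 replicas at 5 K intervals starting from 278 K end at 433 K, and one exchange attempt per picosecond corresponds to 500 steps of 2 fs.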

Additional simulations were performed for the unblocked peptide HEWL19, derived from hen egg-white lysozyme with sequence KVFGRC(SMe)ELAAAMKRHGLDN. The structure and parameters for the S-methylated Cys 6 were adapted from those for methionine and are given in Supporting Information (SI) Figure 1 and Table 1 respectively. Both termini as well as all acidic side chains were protonated, corresponding to the experimental conditions of pH 2.14 The peptide was solvated in a truncated octahedron simulation cell with a 42 Å distance between nearest faces, and equilibrated at constant pressure for 200 ps at 300 K. Constant volume REMD was run with 32 replicas spanning the temperature range 278 K to 472 K, for 27 ns, of which the first 10 ns were discarded in the analysis. All other parameters were the same as for Ac-(AAQAA)3-NH2.

Native state simulations of ubiquitin were run starting from the crystal structure 1UBQ.32 The protein was solvated by 2586 explicit TIP3P water molecules in a cubic simulation box of 45 Å length with long range electrostatics calculated using PME with a 1.2 Å grid spacing and 9 Å cutoff. To neutralize the system charge, 7 sodium and 8 chloride ions were added. Dynamics was propagated for 30 ns at constant pressure (1 atm) and temperature (300 K) using a Nosé-Hoover thermostat33 and Parrinello-Rahman barostat.34

Definition of Helical States

We define helical structure in the spirit of the LR theory, that is to say, in terms of which regions of the Ramachandran map are occupied. We define two α-helical regions in the map:

  • The α+ region of the (ϕ, ψ) map is defined as15 ϕ ∈ [−160°,−20°] and ψ ∈ [−120°,50°]. This inclusive definition aims to capture all residues that are in the right-handed αR free energy basin of the Ramachandran map, even if these are isolated and not part of a genuine helix (red squares in Figure 1), i.e., the α free energy minimum in the context of “random coil”.

  • The αh region of the (ϕ, ψ) map is defined more stringently as35 ϕ ∈ [−100°, −30°] and ψ ∈ [−67°,−7°]. This definition is intended to capture the narrower (ϕ, ψ)-angle distribution of residues within actual α-helices (black squares in Figure 1; the insets show that this stringent definition covers the helical free energy minima for regions with defined secondary structure).

Figure 1.


Ramachandran potentials of mean force (PMFs), − log(P(ϕ, ψ)). To give an indication of the PMF in the absence of any helix formation, histograms of (ϕ, ψ) were constructed from the interior residues of Ala5 (which forms almost no α-helix) in simulations using the modified force fields (A) ff99SB* and (B) ff03*, and (C) from a PDB database, excluding all residues in defined secondary structure.48 The region defined as the α+ basin15 is shown by a red box in each figure. A black box indicates the αh region used in the definition of α-helix.19 As an indication of the PMF within a helix, insets in (A) and (B) show the corresponding PMFs within an α-helix. This was calculated from simulations of the helix-forming peptide Ac-(AAQAA)3-NH2, by counting only segments of three or more consecutive residues whose (ϕ, ψ) angles are within the α basin. The inset in (C) shows the statistical potential derived by considering only residues in segments with secondary structure in the PDB database.48

Residues which lie within the αh region of the Ramachandran map are denoted as helical (h) for the purposes of LR analysis. All residues outside the αh region are defined as “coil” (c). A helical segment is one which has at least three consecutive residues whose (ϕ, ψ) angles fall within the αh boundaries (i.e., the smallest helix is … chhhc…); since the peptide Ac-(AAQAA)3-NH2 is blocked, all (ϕ, ψ) pairs are properly defined. The fraction of helix 〈hi〉sim for a given residue i in the simulation is calculated as the fraction of time spent by that residue within helical segments. Alternative definitions of helical structure are discussed in the SI text.
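The helix assignment described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the paper: the function names are hypothetical, each frame is assumed to be a list of (ϕ, ψ) pairs in degrees, and every residue of a qualifying run (including its two end residues) is counted as being within the segment, which is one possible reading of the definition.

```python
def in_alpha_h(phi, psi):
    """True if (phi, psi), in degrees, lies in the stringent alpha_h box."""
    return -100.0 <= phi <= -30.0 and -67.0 <= psi <= -7.0

def helical_mask(dihedrals, min_length=3):
    """Flag residues that belong to a run of >= min_length consecutive h residues."""
    states = [in_alpha_h(phi, psi) for phi, psi in dihedrals]
    mask = [False] * len(states)
    start = None
    for i, is_h in enumerate(states + [False]):  # sentinel closes a trailing run
        if is_h and start is None:
            start = i
        elif not is_h and start is not None:
            if i - start >= min_length:
                for j in range(start, i):
                    mask[j] = True
            start = None
    return mask

def fraction_helix_per_residue(frames):
    """Fraction of frames in which each residue sits inside a helical segment."""
    n_res = len(frames[0])
    counts = [0] * n_res
    for dihedrals in frames:
        for i, flag in enumerate(helical_mask(dihedrals)):
            counts[i] += flag
    return [c / len(frames) for c in counts]
```

In the LR analysis itself, only the interior residues of a segment carry the weight w; the mask above would need to be trimmed by one residue at each end to count those.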

Calculation of Helix-Coil Parameters

We have analyzed the data using the LR model for the helix-coil transition. In this model, the statistical weight of a given conformation is the product of weights for individual residues. A residue in the coil conformation is assigned a relative weight 1, and a residue of type i in a helical conformation, but not within a helical segment, is assigned a weight vi. Residues in helical segments receive a weight of wi, except for the terminal residues in the segment which have weight vi (since usually v ≪ w, this creates a “nucleation penalty” for initial helix formation). The subscripts i indicate that, in some models, v and w depend on the residue types. The partition function of this model for a peptide with N amino acid residues can be written in a compact matrix form as:36,37

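\[ Z = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix} \left( \prod_{i=1}^{N} \mathbf{M}_i \right) \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \tag{1} \]

in which the matrix Mi is defined as:

\[ \mathbf{M}_i = \begin{pmatrix} w_i & v_i & 0 \\ 0 & 0 & 1 \\ v_i & v_i & 1 \end{pmatrix} \tag{2} \]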

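The average “fraction of helix” for a given residue i is defined as the relative population of conformations where it has weight wi, given by $\langle h_i \rangle_{\mathrm{LR}} = \partial \ln Z / \partial \ln w_i$. The average fraction of helix for the whole peptide, $\langle f_h \rangle_{\mathrm{LR}}$, is the average over these for the non-terminal residues, $\langle f_h \rangle_{\mathrm{LR}} = (N-2)^{-1} \sum_{i=2}^{N-1} \langle h_i \rangle$ (since terminal residues cannot have a w weight by definition). The average number of helical segments $\langle n_s \rangle_{\mathrm{LR}}$ is given by $\langle n_s \rangle_{\mathrm{LR}} = \sum_{i=3}^{N} \partial^2 \ln Z / \partial \ln w_{i-1}\, \partial \ln v_i$, in which $\partial^2 \ln Z / \partial \ln w_{i-1}\, \partial \ln v_i$ identifies the fraction of the ensemble in which residue i is at the C-terminal end of a helix.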
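As an illustration of the transfer-matrix form of Z and of the derivative that defines the per-residue helix fraction, the following sketch evaluates both for arbitrary per-residue weights. It is not the authors' code: the function names are hypothetical, and the derivative with respect to ln w is taken by central finite differences purely for brevity.

```python
import math

def lr_partition(w, v):
    """Lifson-Roig partition function Z for per-residue weights w[i], v[i]."""
    row = [0.0, 0.0, 1.0]  # left end vector (0 0 1)
    for wi, vi in zip(w, v):
        # Multiply the row vector by M_i = [[w, v, 0], [0, 0, 1], [v, v, 1]].
        row = [row[0] * wi + row[2] * vi,
               row[0] * vi + row[2] * vi,
               row[1] + row[2]]
    return row[1] + row[2]  # contract with the right end vector (0 1 1)^T

def helix_fraction_per_residue(w, v, eps=1e-6):
    """<h_i> = d ln Z / d ln w_i, by central finite differences in ln w_i."""
    fractions = []
    for i in range(len(w)):
        up = list(w)
        up[i] = w[i] * math.exp(eps)
        down = list(w)
        down[i] = w[i] * math.exp(-eps)
        dlnz = math.log(lr_partition(up, v)) - math.log(lr_partition(down, v))
        fractions.append(dlnz / (2.0 * eps))
    return fractions

def mean_helix_fraction(w, v):
    """<f_h>: average of <h_i> over the N - 2 non-terminal residues."""
    h = helix_fraction_per_residue(w, v)
    return sum(h[1:-1]) / (len(h) - 2)
```

For three residues the only conformation containing a w weight is hhh, with weight v·w·v, so Z = (1 + v)³ − v³ + v²w and the helix fraction of the central residue is v²w/Z; this provides a simple hand check of the matrix product.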

We have fitted the simulation data to the model using a Bayesian formalism, where we estimate the probability of a set of chosen parameters, given the simulation data, via

\[ p(\text{parameters} \mid \text{data}) \propto p(\text{data} \mid \text{parameters})\, p(\text{parameters}) \tag{3} \]

We use a uniform prior, p(parameters) = 1 with ln v and ln w as free parameters, and estimate the probability of the observed data arising from the LR model with a given set of parameters using the likelihood function:

\[ L = p(\text{data} \mid \text{parameters}) = \prod_{k=1}^{N_k} \rho_k = \prod_{k=1}^{N_k} \frac{1}{Z} \prod_{i=1}^{N} x_{i,k} \tag{4} \]

The Nk observations correspond to different conformations saved from the simulation, ρk is the equilibrium probability of observation k consistent with the LR partition function, and xi,k is the weight of residue i in observation k (i.e., 1, wi or vi). The overall log-likelihood can then be written as

\[ \ln L = \sum_i N_{w,i} \ln w_i + \sum_i N_{v,i} \ln v_i - N_k \ln Z \tag{5} \]

where Nw,i (Nv,i) is the total number of times that residue i has weight wi (vi) in the Nk conformations. The model parameters, and their distributions, are inferred by running a Monte Carlo simulation in parameter space. Moves were made in ln w and ln v, with a move being always accepted if ln L increases, or accepted with probability Ltrial/L if ln L decreases. The distribution of parameters is collected after an initial “equilibration” period.
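The Monte Carlo search in parameter space can be sketched as below. This is a simplified illustration under stated assumptions rather than the procedure as actually implemented: it uses a single residue-independent pair (w, v), so that Nw and Nv are totals over all residues, and the uniform trial moves, step size, starting point, run length and random seed are all arbitrary choices made here.

```python
import math
import random

def ln_z(ln_w, ln_v, n_res):
    """ln Z of the LR model with the same w and v for every residue."""
    w, v = math.exp(ln_w), math.exp(ln_v)
    row = [0.0, 0.0, 1.0]
    for _ in range(n_res):
        row = [row[0] * w + row[2] * v, row[0] * v + row[2] * v, row[1] + row[2]]
    return math.log(row[1] + row[2])

def log_likelihood(ln_w, ln_v, n_w, n_v, n_k, n_res):
    """Eq 5 for residue-independent weights: N_w ln w + N_v ln v - N_k ln Z."""
    return n_w * ln_w + n_v * ln_v - n_k * ln_z(ln_w, ln_v, n_res)

def sample_parameters(n_w, n_v, n_k, n_res, steps=20000, burn_in=5000,
                      step_size=0.02, start=(0.0, -2.0), seed=0):
    """Metropolis sampling of (ln w, ln v) under a uniform prior."""
    rng = random.Random(seed)
    ln_w, ln_v = start
    current = log_likelihood(ln_w, ln_v, n_w, n_v, n_k, n_res)
    samples = []
    for step in range(steps):
        trial_w = ln_w + rng.uniform(-step_size, step_size)
        trial_v = ln_v + rng.uniform(-step_size, step_size)
        trial = log_likelihood(trial_w, trial_v, n_w, n_v, n_k, n_res)
        # Uphill moves are always accepted; downhill moves are accepted
        # with probability L_trial / L.
        if trial >= current or rng.random() < math.exp(trial - current):
            ln_w, ln_v, current = trial_w, trial_v, trial
        if step >= burn_in:  # discard the initial "equilibration" period
            samples.append((ln_w, ln_v))
    return samples
```

The mean and spread of the returned samples then give the parameter estimates and their uncertainties; because ln Z is convex in (ln w, ln v), the log-likelihood in eq 5 has a single maximum.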

Residual Dipolar Couplings

The RDC Dab between nuclei a and b is calculated from the simulation structures as

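\[ D_{ab} = D_{ab}^{\max}\, \mathrm{tr}\{\mathbf{A} \boldsymbol{\Phi}_{ab}\} = D_{ab}^{\max}\, \hat{\mathbf{r}}_{ab}^{T} \mathbf{A}\, \hat{\mathbf{r}}_{ab} \tag{6} \]
\[ \Phi_{ab,\alpha\beta} = \hat{r}_{ab,\alpha}\, \hat{r}_{ab,\beta}, \quad \alpha, \beta \in \{x, y, z\} \tag{7} \]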

where A is the alignment tensor describing the orientational distribution of the protein in a particular alignment medium,38 and $\hat{\mathbf{r}}_{ab}$ is a unit vector pointing from a to b, defined within an intramolecular coordinate frame. The constant $D_{ab}^{\max}$ is related to the gyromagnetic ratios of the nuclei a and b and their distance rab via $D_{ab}^{\max} = \mu_0 \gamma_a \gamma_b / 4\pi^2 r_{ab}^3$. Although in principle the bond length rab should be included in the ensemble average, all the RDCs were for backbone amide nitrogen and proton (NH) pairs, for which the bond lengths are constrained to the equilibrium values in the simulation. In practice rab was set to 1.04 Å,39 although the actual value is irrelevant in this case as any change would be compensated by a scaling of the best-fit alignment tensor. Assuming that intramolecular motion is uncorrelated with the alignment tensor, Φab can be replaced by an intramolecular ensemble average over the simulation, obtained after a least-squares alignment of the protein backbone to the crystal structure. The alignment tensor A for a particular alignment medium is determined by linear least-squares fitting of all the calculated dipolar couplings (eq 6) against the corresponding experimental couplings.40 Agreement between experimental and simulation-derived dipolar couplings is quantified, for each alignment medium, in terms of $Q = \{\sum_i (D_{i,\mathrm{sim}} - D_{i,\mathrm{expt}})^2 / D_{i,\mathrm{expt}}^2\}^{1/2}$, with Qcum being the sum of Q over all alignment media.
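To make eqs 6 and 7 concrete, the sketch below builds the ensemble-averaged tensor Φab from a list of unit bond vectors and evaluates the coupling for a given alignment tensor. It is illustrative only: the function names are hypothetical, the vectors are assumed to be already normalized and expressed in the common frame obtained after backbone superposition, and the least-squares fit that determines A from experiment is not shown.

```python
def mean_phi(unit_vectors):
    """Ensemble average of Phi, with elements <r_alpha * r_beta> (eq 7)."""
    n = len(unit_vectors)
    return [[sum(r[a] * r[b] for r in unit_vectors) / n for b in range(3)]
            for a in range(3)]

def rdc(d_max, alignment, phi):
    """Eq 6: D = D_max * tr(A Phi), for 3x3 tensors given as nested lists."""
    trace = sum(alignment[a][b] * phi[b][a] for a in range(3) for b in range(3))
    return d_max * trace
```

Because eq 6 is linear in the five independent elements of the symmetric, traceless tensor A, fitting A to a set of experimental couplings is an ordinary linear least-squares problem once Φab has been averaged for each bond.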

Generalized Order Parameters

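We compute Lipari-Szabo41 generalized order parameters $S_{ab}^2$ for backbone NH bonds. For nuclei a and b, $S_{ab}^2$ can be calculated from the simulation using the relation:42

\[ S_{ab}^2 = \frac{3}{2}\, \mathrm{tr} \langle \boldsymbol{\Phi}_{ab} \rangle^2 - \frac{1}{2} \left( \mathrm{tr} \langle \boldsymbol{\Phi}_{ab} \rangle \right)^2 \tag{8} \]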

where Φab is defined in eq 7. This expression is valid provided that the simulation length (30 ns) is much longer than the correlation time for intramolecular fluctuations of the NH bond vector orientation;42 eq 8 will tend to underestimate the order parameters if intramolecular motions are present with time scales longer than the correlation time for molecular tumbling (≈ 2 ns).43
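A minimal sketch of eq 8 follows, assuming that Φab in the formula denotes the tensor of eq 7 averaged over the trajectory and that the NH unit vectors have already been extracted after backbone superposition. The function name is hypothetical.

```python
def order_parameter(unit_vectors):
    """S^2 = 3/2 tr(<Phi>^2) - 1/2 (tr <Phi>)^2 from a list of unit vectors."""
    n = len(unit_vectors)
    phi = [[sum(r[a] * r[b] for r in unit_vectors) / n for b in range(3)]
           for a in range(3)]
    trace_of_square = sum(phi[a][b] * phi[b][a]
                          for a in range(3) for b in range(3))
    trace = sum(phi[a][a] for a in range(3))
    return 1.5 * trace_of_square - 0.5 * trace ** 2
```

Two limiting cases serve as a check: a bond vector that never moves gives S² = 1, while a vector distributed equally over the ±x, ±y and ±z directions gives an averaged tensor of one third of the identity and hence S² = 0.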

Results

Peptide NMR Results for Original Force Fields

To measure the deviation between experimental data and observables calculated from simulation, we define a parameter χ2 as the mean square deviation from experiment, normalized for the uncertainty.15 In the case of scalar couplings J, this gives:

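\[ \chi^2 = \frac{1}{N} \sum_{i=1}^{N} \frac{\left( \langle J_i \rangle_{\mathrm{sim}} - J_{i,\mathrm{expt}} \right)^2}{\sigma_i^2} \tag{9} \]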

In this expression, 〈Ji〉sim is the mean of coupling i, back-calculated from simulation, with Ji,expt the corresponding experimental value. For the calculation of scalar couplings from simulation we use the “DFT2” set of Karplus equation parameters,15 based on the work of Case et al.44 The uncertainties σi are assumed to be dominated by the uncertainty of the Karplus parameters.15 Using this metric, we found that, although many older force fields were in sharp disagreement with published J-couplings for Ala5,14 a small group of recent force fields were in better accord with the experimental data15 (χ2 ≤ 2.25). We note here that in our previous publication, the large disagreement reported for ff99SB was erroneous.45 In fact, ff99SB is amongst the best performing force fields (χ2 = 1.7). The scalar couplings for the two force fields are shown alongside the experimental data in Table 1. Thus, in terms of agreement with the scalar coupling data, ff03 and ff99SB are essentially indistinguishable, and the deviations from experiment are comparable to the experimental uncertainty. However, the population in the α+ basin in the Ramachandran map (defined as15 ϕ ∈ [−160°, −20°] and ψ ∈ [−120°, 50°]) in ff03 is about double that in ff99SB (Table 1).
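The comparison in eq 9 can be sketched as follows. The generic Karplus form J = A cos²θ + B cos θ + C, with θ the torsion angle plus a phase offset, is standard, but the coefficients passed in below are placeholders: the actual “DFT2” parameters are not listed in the text and are not reproduced here. The function names are hypothetical.

```python
import math

def karplus(torsion_deg, a, b, c, offset_deg=0.0):
    """Generic Karplus relation J = A cos^2(theta) + B cos(theta) + C."""
    theta = math.radians(torsion_deg + offset_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c

def mean_coupling(torsions_deg, a, b, c, offset_deg=0.0):
    """<J>_sim: average the coupling, not the angle, over the trajectory."""
    values = [karplus(t, a, b, c, offset_deg) for t in torsions_deg]
    return sum(values) / len(values)

def chi_squared(sim_means, expt_values, sigmas):
    """Eq 9: mean squared deviation from experiment in units of sigma_i."""
    n = len(expt_values)
    return sum(((s - e) / sg) ** 2
               for s, e, sg in zip(sim_means, expt_values, sigmas)) / n
```

A value of χ2 near 1 means that the deviations from experiment are, on average, of the same size as the assumed uncertainties.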

Table 1.

Structural characteristics of peptides used for parameter optimization. Populations of the α+, β and ppII regions of the Ramachandran map15 averaged over all internal residues are listed (errors in parentheses), together with the fraction α-helix and the χ2 with respect to the experimental data (scalar couplings J for Ala5 and fraction helix per residue 〈hi〉 for Ac-(AAQAA)3-NH2; Figure 2). For ff99SB and ff99SB*, most of the unaccounted population in the Ramachandran map is located in the αL region.

Peptide / Property     ff03          ff99SB        ff03*         ff99SB*

Ala5:
  % α+                 29.3 (2.9)    15.7 (1.2)    16.4 (1.8)    22.5 (2.2)
  % β                  28.5 (1.1)    37.8 (1.4)    32.5 (0.7)    34.5 (1.3)
  % ppII               41.5 (2.0)    42.3 (1.9)    50.5 (1.3)    39.8 (1.6)
  % α-helix             1.9 (0.9)     0.0 (0.2)     0.2 (0.2)     0.6 (0.3)
  χ2(J)                 1.2           1.7           1.1           1.7

Ac-(AAQAA)3-NH2:
  % α+                 93.9 (0.5)    26.9 (1.2)    45.9 (1.8)    48.5 (1.3)
  % β                   2.7 (0.2)    32.4 (0.8)    26.4 (1.0)    22.7 (0.7)
  % ppII                3.3 (0.2)    31.8 (0.9)    27.1 (0.9)    21.8 (0.6)
  % α-helix            82.7 (0.7)     1.8 (0.3)    18.8 (2.1)    14.2 (2.1)
  χ2(〈hi〉)             2600          570           1.9           3.7

Although isolated residues with α-helical (ϕ,ψ) angles occur in unblocked Ala5, it forms only a very small amount of real α-helix (with at least three consecutive residues within the α-helical region of the Ramachandran map), as shown in Table 1. We have therefore augmented our training data set with 13C chemical shift data from the helix-coil transition in a longer helix-forming peptide. The fraction of helix in Ac-(AAQAA)3-NH2 has been estimated from a careful analysis of carbonyl chemical shifts,46 as a function of temperature. This peptide is ideal for simulation, having a relatively high helix population for its size, and being uncharged. The absence of charged (e.g., lysine) residues may better reflect the intrinsic helix propensity of alanine.47 We have carried out long replica exchange molecular dynamics simulations of this peptide in the two force fields and find dramatically different helical populations at ambient temperatures: with ff03, the peptide forms an α helix most of the time, whilst in ff99SB, the α-helical population is negligible. Figure 2 shows the fraction of helix in each force field obtained from the replica at 303 K. The correspondingly large χ2 for deviations from the experimentally estimated fraction of helix at 303 K is also given in Table 1; in this case the errors σi are assumed to be dominated by the sampling error of the simulation. Clearly, ff03 and ff99SB have very different helical propensities and neither agrees particularly well with the experimental data for Ac-(AAQAA)3-NH2. Note that the exact definition of helix does not affect these conclusions (as is shown in SI Figures 2 and 3).

Figure 2.


Helix formation in the peptide Ac-(AAQAA)3-NH2. Fraction of helix 〈hi〉 (see text) is given for each residue of Ac-(AAQAA)3-NH2 from NMR chemical shifts (magenta circles)46 and from simulations at 303 K: ff03 (blue squares), ff99SB (red triangle down), ff03* (black squares), ff99SB* (orange triangle down). Where error bars are not visible, they are smaller than the symbol size.

Strategy for Correction of (ϕ, ψ) Surface

There are many terms in the force fields, most obviously charges, Lennard-Jones terms and torsion angles, which may influence their secondary structure preferences (i.e., for α, β or ppII structure), and there is surely much redundancy in methods for achieving a desired “correction”, given the number of available parameters. Following earlier efforts,9,35 we focus on the (ϕ, ψ) torsional terms because these are most directly connected with the Ramachandran map. Moreover, (ϕ, ψ) torsional modifications should not result in detrimental effects on the non-bonded (charge and Lennard-Jones) terms, which have been carefully parameterized for high quality condensed-phase simulations. Our choice can also be retrospectively justified by the small resulting corrections of around 0.5 kBT (see below), indicating that the additional terms are merely fine tuning force fields which are already close to optimal.

To limit the number of free parameters and avoid over-fitting, we apply a simple cosine correction term to ψ, since this torsion angle is the major determinant of helix propensity:

\[ V_1(\psi; k_\psi, \delta_\psi) = k_\psi \left[ 1 + \cos(\psi - \delta_\psi) \right] \tag{10} \]

with parameters kψ and δψ giving respectively the magnitude and phase offset of the correction. At first sight this correction does not fit within the “Amber philosophy” for torsional parameterization, which favors physically motivated torsion parameters (i.e., not allowing arbitrary phase offsets δ). However, since current Amber force fields include torsion parameters for both ψ (N-Cα-C-N) and ψ′(Cβ-Cα-C-N), it is straightforward to map the correction terms with a phase offset δ to the existing terms with δ = 0 by recognizing that ψ′ ≈ ψ + 120°. For D-amino acids, the sign of the phase correction would be reversed.
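The correction of eq 10 and the mapping onto zero-phase ψ and ψ′ terms can be sketched as follows, using the Table 2 parameters. The decomposition assumes ψ′ = ψ + 120° exactly and drops the constant kψ, which does not affect forces; the function names are hypothetical, and the way the resulting amplitudes are entered in a particular force-field file (for example, a negative amplitude expressed as a 180° phase) is not addressed.

```python
import math

# Table 2 values: k_psi in kcal/mol, delta_psi in degrees.
CORRECTIONS = {"ff03*": (0.3575, 285.5), "ff99SB*": (0.1788, 105.4)}

def psi_correction(psi_deg, k_psi, delta_deg):
    """Eq 10: V1 = k_psi * [1 + cos(psi - delta_psi)], angles in degrees."""
    return k_psi * (1.0 + math.cos(math.radians(psi_deg - delta_deg)))

def zero_phase_amplitudes(k_psi, delta_deg):
    """Amplitudes (a, b) with a*cos(psi) + b*cos(psi + 120) = k*cos(psi - delta).

    Obtained by expanding both sides in cos(psi) and sin(psi) and matching
    coefficients, under the assumption psi' = psi + 120 degrees.
    """
    delta = math.radians(delta_deg)
    b = -2.0 * k_psi * math.sin(delta) / math.sqrt(3.0)
    a = k_psi * math.cos(delta) + 0.5 * b
    return a, b
```

With the Table 2 parameters, eq 10 is largest at ψ = δψ for ff03* (285.5°, equivalent to −74.5°) and smallest at ψ = δψ − 180° for ff99SB* (−74.6°), so the two corrections respectively penalize and favor nearly the same part of the ψ axis, consistent with lowering the helix content of ff03 and raising that of ff99SB.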

We used two experimental data sets for parameter optimization. The first was a comprehensive set of scalar couplings for unblocked Ala5, giving residue-specific information on (ϕ, ψ) torsion preferences.14 This is the minimal peptide which can form a single turn of α-helix, and so can be thought of as a model for helix nucleation. Although longer polyalanines were also studied by Graf et al.,14 like Ala5, all formed very little helix (generally at least ten residues are required for significant α-helix population). As a model peptide which can form longer helices (a measure of helix extension), we chose the blocked 15-residue peptide Ac-(AAQAA)3-NH2. This peptide has a small but significant population of α-helix (≈ 20% at 300 K) and in addition (i) it is mostly alanine-based, like the peptides used in the scalar coupling study of Graf et al.,14 (ii) it is uncharged, and blocked at the termini, simplifying the system for simulation, and (iii) there is an extensive residue-resolved NMR study of helix formation as a function of temperature using chemical shifts.46 Our approach was to find the minimal correction (eq 10) such that the force fields would reproduce both the scalar couplings for Ala5 and the fraction of helix in Ac-(AAQAA)3-NH2 at 300 K.

A number of rounds of optimization were carried out to match the data for the longer peptide. In the first round a straightforward perturbation of the original force fields was used to estimate optimal correction parameters. Since simulations with these first round parameters revealed them to be an overcorrection, subsequent rounds kept δψ fixed and used a simple binary search in order to pinpoint the optimal value of kψ. The final force-field parameters are given in Table 2. Plots of the final V(ψ) corrections are given in SI, Figure 4.
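The binary search over kψ at fixed δψ can be sketched as below. In the actual optimization each evaluation of the helix fraction requires a full replica exchange simulation, so the callable `helix_fraction` is a stand-in supplied by the user; the function name, tolerance and the assumption that the helix fraction varies monotonically with kψ over the bracketing interval are choices made for this illustration.

```python
def bisect_k(helix_fraction, target, k_low, k_high, tol=1e-3, max_iter=50):
    """Find k such that helix_fraction(k) is close to target by bisection.

    Assumes helix_fraction is monotonic in k on [k_low, k_high] and that
    the target value is bracketed by the two end points.
    """
    increasing = helix_fraction(k_high) > helix_fraction(k_low)
    for _ in range(max_iter):
        if k_high - k_low < tol:
            break
        k_mid = 0.5 * (k_low + k_high)
        f_mid = helix_fraction(k_mid)
        # Keep the half-interval that still brackets the target.
        if (f_mid < target) == increasing:
            k_low = k_mid
        else:
            k_high = k_mid
    return 0.5 * (k_low + k_high)
```

Since every call to `helix_fraction` is expensive and carries sampling error, in practice only a few bisection steps would be affordable, and the tolerance would be set by the statistical uncertainty of the simulated helix fraction rather than by the interval width.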

Table 2.

Optimized corrections to ff03 and ff99SB, yielding revised force fields ff03* and ff99SB*, respectively. The parameters refer to a cosine correction to ψ given by eq 10.

Parameter ff03* ff99SB*
kψ (kcal/mol) 0.3575 0.1788
δψ (degrees) 285.5 105.4

Below, we refer to the revised force fields as ff03* and ff99SB*.

Results with Optimized Force Fields

We first discuss the simulation results for Ala5 with the modified force fields. The scalar coupling χ2 values for both original and modified force fields are listed in Table 1 (a full list of the experimental and calculated couplings is available as SI Table 3). We find that the improvements in χ2 with respect to the scalar couplings are very modest, indicating the parent force fields are already close to optimal, in terms of the Ala5 data; the final parameters were therefore mainly determined by the data for the longer peptide. Note that early in the optimization process, we derived force fields with lower χ2 for Ala5. These force fields, however, formed only extended conformations with the longer peptide and were most likely overfitted. Despite the small change in χ2, the populations of the various Ramachandran regions have shifted, with ff03* and ff99SB* having lower and higher populations in the helical region than their respective parents.

A much greater improvement was achieved with respect to the fraction of helix for the longer helix-forming peptide. This is demonstrated in Figure 2, where both revised force fields are much closer to the experimental estimate of fraction of helix than the parent force fields, at the temperature of interest (303 K). Both also correctly predict the trend toward lower helical propensity at the C-terminus. To further illustrate the difference between these force fields, snapshots of the structures obtained at this temperature are shown in Figure 3, selected at random from the trajectories. The almost complete helix in ff03 and lack of helix in ff99SB are evident from Figure 3 A and B, respectively; there is some evidence of β-turn formation in the latter. For both modified force fields, short helices are observed, with the structures being slightly more compact overall for ff99SB* than ff03*.

Figure 3.


Randomly selected conformations of Ac-(AAQAA)3-NH2 at 303 K in (A) ff03, (B) ff99SB, (C) ff03* and (D) ff99SB*.

To illustrate the effect of the modifications, we present in Figure 1 (A) and (B) the (ϕ, ψ) potential of mean force derived from the internal residues of Ala5. Both modifications result in very similar final (ϕ, ψ) free energy surfaces (a comparison with the original force fields is given in SI Figure 5). These results can be compared with a statistical potential derived from observed (ϕ, ψ) frequencies in a library of coil residues drawn from the protein data bank (PDB).48 For a fair comparison with the Ala5 peptide, which forms negligible α-helix (Table 1), in Figure 1 (C) we exclude all residues within secondary structure from this analysis. The positions and shapes of the minima derived from the simulations and the PDB are in good accord; note that the relative “free energies” of the major minima in the PDB reflect the biased frequency of different types of secondary structure in the PDB, and not the intrinsic structural preferences, and so would not be expected to agree with the simulation results. In SI Figure 6, we compare the potential energy surface of alanine dipeptide in the original and modified force fields with that from quantum mechanical (QM) calculations.5,8 The surfaces for the original and modified force fields compare equally well with the QM data.

Validation: NMR data for a Folded Protein

Although our primary aim is to obtain force fields that have properly balanced secondary structure preferences, with a view to simulations of non-native proteins, it is also important to obtain good results for folded proteins, where recent force fields have made large improvements. The relatively small shifts in the Ramachandran potentials of mean force suggest that our modifications should not have a large effect on native state simulations. To confirm this, we ran 30 ns simulations of folded ubiquitin, a protein for which a wealth of experimental NMR data is available in the literature. In Figure 4 A, we report the backbone root mean square deviation from the crystal structure for the various force fields. Note that this excludes the disordered C-terminal tail (residues 72–76). The performance of the modified force fields is comparable to that of the parent force fields, with deviations being less than 1 Å over most of the simulation. We have also calculated Lipari-Szabo generalized order parameters41 S2 for NH bond vectors from the simulation. These parameters reflect the extent of orientational motion of the NH vector on a picosecond-to-nanosecond time scale. We find (Figure 4 B,C) that the order parameters determined from both the original and modified force fields are in excellent agreement with the experimental order parameters fitted to NMR relaxation data.49

Figure 4.

Tests of refined force fields using 30 ns all-atom simulations of the native state of ubiquitin. Panels in (A) show the root-mean-square deviation of backbone heavy atoms (residues 1–71) from the crystal structure32 for each force-field (from top: ff03, ff99SB, ff03* and ff99SB*). Backbone generalized order parameters (S2) are shown in (B) for ff03 (blue) and ff03* (red) and (C) for ff99SB (blue) and ff99SB* (red); experimental order parameters from NMR relaxation measurements49 are shown as black symbols.
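
The generalized order parameters in panels (B) and (C) can be estimated from a trajectory with the standard long-time-limit expression S2 = (1/2)[3 Σij 〈μiμj〉2 − 1], where μ is the unit NH bond vector and i, j run over x, y, z. The following Python sketch is a minimal illustration and not the analysis code used for this work. It assumes that overall rotation has already been removed by superimposing every frame on a reference structure; the function name and input layout are hypothetical.

```python
import math


def order_parameter(nh_vectors):
    """Estimate S^2 for one NH bond from a list of (x, y, z) bond vectors.

    Assumption of this sketch: frames were aligned to a common reference
    beforehand, so only internal motion contributes.
    """
    n_frames = len(nh_vectors)
    if n_frames == 0:
        raise ValueError("need at least one frame")
    # Accumulate the averages <mu_i mu_j> of the normalized bond vector.
    avg = [[0.0] * 3 for _ in range(3)]
    for vec in nh_vectors:
        norm = math.sqrt(sum(c * c for c in vec))
        unit = [c / norm for c in vec]
        for i in range(3):
            for j in range(3):
                avg[i][j] += unit[i] * unit[j] / n_frames
    total = sum(avg[i][j] ** 2 for i in range(3) for j in range(3))
    return 0.5 * (3.0 * total - 1.0)
```

A rigid bond vector gives S2 = 1, and a vector that samples all orientations uniformly gives S2 = 0, which brackets the values plotted for ubiquitin.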

Backbone RDCs are used for a second direct comparison of the simulation results with solution NMR experiments. The RDC data are a natural choice because they are exquisitely sensitive to small variations in geometry and a large set of dipolar couplings for ubiquitin in 10 different alignment media is available in the literature,50 and has already been used to validate the ff99SB force field.51

We calculate the alignment tensor for each medium using only the ordered backbone residues, and assess the deviations from experiment by using the cumulative Q-factor, Qcum, averaged over all alignment media. This procedure is the same as that used in earlier tests of ff99SB, so that the results are directly comparable:51 we find a Qcum of 2.38 for ff99SB, similar to the previous value of 2.22 obtained from longer simulations. The results for all force fields are summarized in Table 3. There is a substantial improvement in agreement with experiment for ff03* over ff03, while ff99SB* is comparable to ff99SB, within the uncertainty. Therefore, it appears that the new force fields are at least as good as the best current force fields at reproducing RDCs.

Validation: scalar couplings for HEWL19

As pointed out in the introduction, folded proteins represent rather a weak test of force-field secondary structure propensities, since the system explores only the narrow free energy basin around the folded structure. Ideally the force-field correction should be transferable to different peptides to predict helix content accurately for different sequences. In practice this will be limited by the accuracy of other factors that determine helix propensity, such as salt bridges. As a simple test we have run simulations of the 19-residue fragment HEWL19, consisting of residues 1–19 of hen egg-white lysozyme. This peptide has a naturally occurring trialanine sequence from residues 9–11, for which scalar couplings have been determined at 300 K.14 We ran replica exchange simulations of HEWL19 with both the original and modified force fields: the data are summarized in Table 4. We find that both the modified force fields show excellent agreement with the experimental data, with χ2 close to 1. For the original force fields, ff03 shows a large discrepancy (χ2 > 3), whilst χ2 for ff99SB is only slightly larger than that for the modified force fields. However, both the population of the α region of the Ramachandran map, and the fraction of helix are much lower for ff99SB than for the modified force fields. The scalar coupling data for this peptide therefore appear to establish a strong upper bound for the amount of helix formed (as in ff03), but are less discriminating between force fields with a lower fraction of helix. For direct comparison, the scalar couplings for the central residue (Ala10) are also listed in Table 4, with the full data set available in SI Table 3.

Table 4.

Scalar couplings of the central residue (Ala 10) of HEWL19 for different force fields. The lower part of the table lists the Ramachandran populations and fraction helix averaged over the three consecutive Ala residues (errors in parentheses), and the χ2 relative to the experimental J-couplings (complete data for Ala 9, 10, and 11 are given in SI Table 3).

                         Expt.   σ      ff03         ff99SB       ff03*        ff99SB*
A10 1JNCα (ψ10)          10.58   0.59   10.04        10.84        10.51        10.27
A10 2JNCα (ψ9)           7.24    0.50   6.48         7.50         7.43         7.06
A10 3JHαC (ϕ10)          1.72    0.38   1.40         2.55         1.55         1.82
A10 3JHNC (ϕ10)          1.33    0.59   0.58         1.10         1.00         0.65
A10 3JHNCβ (ϕ10)         2.19    0.39   3.48         2.23         2.86         3.01
A10 3JHNHα (ϕ10)         5.10    0.91   5.72         7.08         5.95         6.46
A10 3JHNCα (ϕ10; ψ9)     0.46    0.10   0.24         0.52         0.46         0.39

% α+                                    91.6 (1.6)   34.9 (2.3)   60.6 (2.3)   69.4 (2.5)
% β                                     1.8 (0.7)    23.5 (1.3)   15.8 (1.7)   12.9 (1.3)
% ppII                                  6.3 (1.3)    27.9 (1.5)   22.6 (1.3)   15.2 (1.6)
% helix                                 78.6 (2.5)   4.5 (1.0)    35.1 (2.3)   30.9 (2.0)

χ2(J)                                   3.2          1.3          1.0          1.0
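
To make the comparison in Table 4 concrete, the Python sketch below evaluates a reduced χ2 for the Ala10 couplings listed above. It assumes the common definition χ2 = (1/N) Σ (Jsim − Jexpt)2/σ2, which is an assumption of this illustration; the constant and function names are hypothetical. Because only the central residue is used here, the values differ somewhat from the χ2(J) row, which covers Ala 9, 10 and 11.

```python
# Ala10 rows of Table 4: (experiment, sigma, ff03, ff99SB, ff03*, ff99SB*)
TABLE4_ALA10 = [
    (10.58, 0.59, 10.04, 10.84, 10.51, 10.27),
    (7.24, 0.50, 6.48, 7.50, 7.43, 7.06),
    (1.72, 0.38, 1.40, 2.55, 1.55, 1.82),
    (1.33, 0.59, 0.58, 1.10, 1.00, 0.65),
    (2.19, 0.39, 3.48, 2.23, 2.86, 3.01),
    (5.10, 0.91, 5.72, 7.08, 5.95, 6.46),
    (0.46, 0.10, 0.24, 0.52, 0.46, 0.39),
]
FORCE_FIELDS = ("ff03", "ff99SB", "ff03*", "ff99SB*")


def chi2(rows, column):
    """Mean squared deviation from experiment in units of sigma^2."""
    total = 0.0
    for row in rows:
        expt, sigma = row[0], row[1]
        calc = row[2 + column]  # simulated coupling for the chosen force field
        total += ((calc - expt) / sigma) ** 2
    return total / len(rows)


def chi2_by_force_field(rows=TABLE4_ALA10):
    """Return {force field name: chi2} for every force-field column."""
    return {name: chi2(rows, k) for k, name in enumerate(FORCE_FIELDS)}
```

Calling `chi2_by_force_field()` reproduces the qualitative ranking discussed in the text, with ff03 showing by far the largest deviation for this residue.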

Temperature Dependence of Helical Content

Since replica exchange sampling was used for the longer peptide, it is also possible to examine the temperature dependence of the helix content. Figure 5 (A) compares the overall fraction of helix of Ac-(AAQAA)3-NH2 determined from the various simulations to that estimated from NMR using chemical shifts.46 Simulations with all force fields resulted in only a weak dependence of helical population on temperature, while the experiments suggest a much sharper transition, with all helix being lost above ≈ 350 K. A similar weak dependence of helical population on temperature was also observed in a recent simulation study by the Pande group, for their modified force field ff99ϕ.52 A number of simple explanations for the weak temperature dependence in simulations can be suggested: (i) lack of adequate statistical sampling in the simulations; (ii) effects of pressure (our replica exchange simulations are done at constant volume); (iii) poor temperature dependence in the water model; (iv) problems with simulation protocol; (v) imaging artifacts due to small simulation box; and (vi) definition of “helix”. We address each possible cause here in turn:

Figure 5.

Temperature dependence of helix formation in Ac-(AAQAA)3-NH2. (A) Fraction helix as a function of temperature for ff03 (blue squares), ff99SB (red triangles), and the revised force fields ff03* and ff99SB*; experimental estimate from NMR chemical shifts given as magenta circles. (B) Effects of simulation parameters on results: ff03 (blue squares), ff03 run with Amber 10 instead of Gromacs 3.3.1 (blue circles), ff03 with TIP4P-Ew instead of TIP3P (blue triangles), ff03* (black filled squares), ff03* starting from helical conformations in all replicas (black open circles), ff03* with simulation cell with 6.1% larger volume (black triangle up), ff03* with simulation cell with 5.9% smaller volume (black triangle down), ff03* with 45 Å simulation cell (black open squares), ff99SB (red triangle up), ff99SB from helix (red circles); experimental data as in (A). The convergence of the calculated helical fraction is shown by plotting a cumulative fraction helix against time for simulations with ff03* at temperatures of (C) 303 K, (D) 355 K and (E) 396 K. Symbols in (C)–(E) as in (B). Where error bars are not visible, they are smaller than the symbol size.

Statistical Sampling

Recent all-atom molecular dynamics simulations by Sorin and Pande have found that equilibration periods (without replica exchange) for the helix-coil transition may be as long as 30 ns,30 and experimental relaxation times for helix formation are at least 100 ns.53–56 We have addressed the question of statistical convergence for each of ff03* and ff99SB* by running simulations of Ac-(AAQAA)3-NH2 with all replicas starting (1) from random-coil initial conditions (the standard initial conditions, as used for all the other force fields), and (2) from helical conformations. The average fractions of helix resulting from these two very different sets of initial conditions are similar (Figure 5 B), having clearly converged over time (Figure 5 C–E). We find a similarly slow rate of convergence to that seen in the earlier work.30 Note also that the fraction of helix is only weakly dependent on temperature for both sets of initial conditions.
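
The convergence test described here amounts to comparing running averages of the helix fraction from the two sets of initial conditions. A minimal Python sketch follows; the function names, the per-frame input format and the tolerance argument are hypothetical and are not taken from the analysis actually performed.

```python
def cumulative_mean(values):
    """Running average of a time series, one entry per frame."""
    out, running = [], 0.0
    for k, x in enumerate(values, start=1):
        running += x
        out.append(running / k)
    return out


def agree_within(run_a, run_b, tol):
    """True if the final cumulative averages of two runs differ by at most tol."""
    return abs(cumulative_mean(run_a)[-1] - cumulative_mean(run_b)[-1]) <= tol
```

Plotting `cumulative_mean` for the coil-started and helix-started replicas against time gives curves of the kind shown in Figure 5 (C)–(E).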

Effects of Pressure

Since constant pressure simulations in liquid water will be unstable at high temperatures, we have run the replica exchange simulations at constant volume. This was also justified from a study by Paschek et al. in which the results of two-dimensional volume-temperature replica exchange suggested a negligible effect of pressure on total fraction helix.57 We have confirmed the absence of pressure/density effects on the helix-coil equilibrium by running replica exchange of Ac-(AAQAA)3-NH2 with the ff03* force field in a box with volume 5.9% smaller than the original, and one 6.1% larger. The results in both the larger and smaller box converge to the same values as those in the original box (Figure 5 B and C–E). A plot of the fraction helix as a function of density and temperature finds at most a weak density (or pressure) dependence, in agreement with the results of Paschek et al.57 (SI Figure 7).

Water Model

TIP3P58 is the water model with which the Amber force fields were intended to be used.7,9,59,60 However, it is well known that this model does not reproduce well the temperature dependence of the properties of liquid water (e.g., density maximum). A more accurate recent model explicitly developed for use with Ewald-based electrostatics is TIP4P-Ew.61 Simulations of Ac-(AAQAA)3-NH2 with ff03 in TIP4P-Ew (Figure 5 B) indicate that this model clearly destabilizes helical structure; however it does not result in a sharper dependence of the helix-coil transition on temperature.

Force-Field Implementation and Simulation Protocol

We use Gromacs to run simulations with the Amber force fields. In addition to checking single-point energies with simple molecules in both simulation packages, we have also run replica exchange simulations of the same peptide Ac-(AAQAA)3-NH2 with ff03 in Amber with a set of simulation parameters as similar as possible to those used in Gromacs. The helix fractions obtained using Amber and Gromacs are in good agreement (Figure 5 B).

Imaging Artifacts

When simulating unfolded polypeptides with periodic boundary conditions, some weak interactions between the peptide in the primary image and its copies in the immediately neighbouring images are inevitable unless an impractically large simulation box is used. With the standard 35 Å box size used for most of our simulations, we find that the protein interacts with its image (minimum distance to nearest image less than 5 Å) in approximately 5% of the configurations in the high temperature replicas where the peptide is most unfolded. To ensure that this artifact does not influence the temperature dependence of helix formation, we have run additional simulations of Ac-(AAQAA)3-NH2 with ff03* in a 45 Å truncated octahedron box. The results show (Figure 5 B) that imaging artifacts are negligible since an almost identical fraction helix is obtained with both 35 Å and 45 Å box sizes over the temperature range considered.

Definition of Helix

We have adopted a relatively conservative definition for the αh region of (ϕ, ψ) space. We have also considered alternative definitions which define a larger region of the Ramachandran map as αh, or which define helix using helical i, i + 4 hydrogen bonds. The more generous definitions do not alter the fraction helix much and result in a slightly weaker temperature dependence (see SI text and SI Figures 2 and 3). Therefore, the lack of temperature dependence is not a consequence of the specific helix definition employed.

We conclude from the above analyses that the weak temperature dependence of the helix-coil equilibrium in the simulations is unlikely to be caused by statistical sampling, pressure effects, the water model, problems with the simulation protocol, imaging artifacts, or the definition of “helix”. Instead, it points toward more fundamental issues with the force field. To provide a foundation for future improvements, we analyze in the following the energetics of the helix-coil equilibrium by separating it into enthalpic and entropic contributions.

Thermodynamics of the Helix-Coil Transition

Lifson-Roig model for Helix-Coil Transition

We can gain some insight into the thermodynamics of helix formation by considering a simple model for the helix-coil transition. We use the LR theory, which has been shown to be equivalent to the earlier Zimm-Bragg theory,62 at least for residues within α-helices.36,37 Both theories are also closely related to the partition function used in the AGADIR algorithm for helix prediction.63,64 The LR theory describes helix formation in terms of a nucleation parameter v and helix extension parameter w. Specifically, v is the statistical weight of a residue whose backbone (ϕ, ψ) torsion angles lie within the helical region, relative to the “coil”, or non-helical state: effectively, it is an equilibrium constant between these two states. Since v is a small number, the formation of helical structure is initially improbable. However, once sufficient helix has been formed, it can be stabilized by i, i + 4 hydrogen bonding (as well as other interactions). The parameter w represents the equilibrium constant for this favorable conversion of a coil residue to a helical residue at the end of a helix. Here we use common values of v and w for Ala and Gln. We also considered a separate treatment of Ala and Gln, as well as a variant of the LR theory that includes the possibility of helix-capping interactions,65 which are known to be important66–68 (SI text and SI Figures 8–10). These more complex models do describe the data slightly better, even after accounting for the additional parameters required; however, the results presented below do not depend strongly on the choice of model: for example, the fitted w and v when capping is included are almost identical to those when it is not (SI Figure 11). We therefore present here the simplest model.

Using a Bayesian approach, we have determined the optimal LR parameters describing the helix formation with each force field and at each temperature. Even the simple form of the LR model used here provides a reasonable description of the simulation data, in the sense that it reproduces various characteristics of the distribution of helix within the peptide. Figure 6 (A) shows that the simplest LR model fits the total fraction of helix very well at all temperatures. There are small deviations between the number of helical segments predicted by the model and the simulation data (Figure 6 (B)); these can be reduced by modeling Ala and Gln independently (SI text). The absence of capping interactions in the model results in a symmetric predicted distribution of helicity within the peptide (Figure 6 (C)): this misses the strong capping effect of the blocking acetyl group biasing helix formation toward the N-terminus. Inclusion of capping in the model reproduces this effect (SI text). We comment briefly on an expression commonly used to calculate the number of helical segments, ∂ ln Z/∂ ln v12, where v12 is the element in the first row and second column of the LR matrix M. This relation incorrectly counts …chhc… as a helix, and is a valid approximation only if v is very small. In general this expression will overestimate the number of helical segments 〈ns〉 (broken lines in Figure 6 (B)); hence helix-coil parameters (particularly v) obtained by fitting this expression to simulation data will tend to be underestimates.
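
The LR weighting scheme can be illustrated by brute-force enumeration for a short chain, which avoids any dependence on a particular form of the transfer matrix. The Python sketch below is a minimal illustration, not the Bayesian fitting procedure used here. It assumes the homopolymer LR rules without capping: a coil residue has weight 1, a helical residue flanked by helical residues on both sides has weight w, and any other helical residue has weight v, with the chain ends treated as flanked by coil. All function names are hypothetical. The count of all runs of helical residues is included only as a stand-in for the kind of overcounting described above (it treats …chhc… as a helix); it is not the matrix expression itself.

```python
from itertools import product


def lr_weight(config, w, v):
    """Statistical weight of one configuration (tuple of 0 = coil, 1 = helix)."""
    n = len(config)
    weight = 1.0
    for i, state in enumerate(config):
        if state == 0:
            continue  # coil residues carry weight 1
        left = config[i - 1] if i > 0 else 0  # chain ends are flanked by coil
        right = config[i + 1] if i < n - 1 else 0
        weight *= w if (left and right) else v
    return weight


def h_runs(config):
    """Lengths of maximal runs of consecutive helical residues."""
    runs, length = [], 0
    for state in config:
        if state:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    if length:
        runs.append(length)
    return runs


def lr_averages(n, w, v):
    """Enumerate all 2**n configurations and return ensemble averages."""
    z = n_w = n_seg = n_run = 0.0
    for config in product((0, 1), repeat=n):
        weight = lr_weight(config, w, v)
        runs = h_runs(config)
        z += weight
        # A run of r >= 3 helical residues contains r - 2 residues of weight w.
        n_w += weight * sum(r - 2 for r in runs if r >= 3)
        n_seg += weight * sum(1 for r in runs if r >= 3)
        n_run += weight * len(runs)  # also counts runs of length 1 and 2
    return {"Z": z, "frac_w": n_w / (z * n),
            "segments": n_seg / z, "h_runs": n_run / z}


if __name__ == "__main__":
    # 15 residues with illustrative parameter values.
    print(lr_averages(15, w=1.10, v=0.18))
```

For v of order 0.2 the count of all helical runs exceeds the count of segments with at least three consecutive helical residues, which is the sense in which counting short runs as helices overestimates 〈ns〉.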

Figure 6.

Quality of LR fit to simulations of Ac-(AAQAA)3-NH2. (A) The average fraction helix 〈fh〉, with simulation data as symbols and model predictions as solid lines (blue: ff03, red: ff99SB, black: ff03*, orange: ff99SB*). (B) The average number of “helical segments”, 〈ns〉, with simulation data as symbols and model predictions as solid lines; broken lines give the prediction using the approximate expression $\langle n_s\rangle \approx \sum_i \partial \ln Z/\partial \ln v_{12}^{(i)}$. (C) Fraction of helix for each residue in the 303 K replica, symbols and lines as above.

The fitted parameters w(T) and v(T) are plotted as a function of temperature in Figure 7. Near 300 K, the fitted w, 1.10 and 1.00 for ff03* and ff99SB*, respectively, are somewhat too small relative to experimental estimates at this temperature (≈ 1.28; ref 69); when w is independently fitted to Ala and Gln, the Ala results are closer to experiment (w ≈ 1.28 for ff03* from a fit using separate parameters for A and Q and helix capping; see SI text and SI Figure 11 A). The fitted v, 0.18 and 0.21 for ff03* and ff99SB*, respectively, are too large with respect to the experimental estimates of 0.03–0.05;65,69 a similar result was obtained from another simulation study.52 For reference, the corresponding Zimm-Bragg parameters36,37,62 σ = v²/(1+v)⁴ and s = w/(1+v) are σ = 0.017, s = 0.93 for ff03* and σ = 0.020, s = 0.83 for ff99SB*. Thus, although the force fields match the overall fraction of helix in experiment, the higher nucleation parameters (v or σ) relative to experiment indicate that the helix formation in the simulations is less cooperative than inferred experimentally.
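
The conversion between the two parameter sets is a one-line calculation. The Python sketch below takes σ = v²/(1+v)⁴ and s = w/(1+v), the assignment that reproduces the numerical values quoted above from the fitted w and v; the function name is hypothetical.

```python
def zimm_bragg_from_lifson_roig(w, v):
    """Map Lifson-Roig (w, v) onto Zimm-Bragg (sigma, s)."""
    sigma = v ** 2 / (1.0 + v) ** 4  # nucleation parameter
    s = w / (1.0 + v)                # propagation parameter
    return sigma, s


if __name__ == "__main__":
    for name, w, v in (("ff03*", 1.10, 0.18), ("ff99SB*", 1.00, 0.21)):
        sigma, s = zimm_bragg_from_lifson_roig(w, v)
        print(f"{name}: sigma = {sigma:.3f}, s = {s:.2f}")
```

The same mapping reproduces the correspondence between σ and v quoted for the circular dichroism fits, for example σ ≈ 0.004 for v ≈ 0.073.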

Figure 7.

Fitted LR parameters as a function of temperature. (A) The helix extension parameter w, with colors as in Figure 6. Solid circles are obtained from a free fit of w,v to simulations with each force field. Open circles are obtained by fitting only w, with v fixed to the average v obtained from the free fit, over the full temperature range. Data for the ten residue peptide Ac-(AAQAA)2-NH2 using ff03* is given by solid squares. NMR data are in magenta: solid circles, w from Rohl et al.;69 open squares, fit of w to the fraction helix in Ac-(AAQAA)3-NH2, assuming v = 0.04796. (B) The nucleation factor v, colors and symbols as in (A). Solid lines in (A) and (B) are the fit of a thermodynamic model with temperature-independent heat capacity (parameters in Table 5).

The LR extension parameter w closely mirrors the helical propensities of the different force fields, with ff03 and ff99SB having, respectively, the largest and smallest values, and ff03* and ff99SB* having similar w. Like the overall fraction of helix, w(T ) fitted to the simulation data exhibits a smaller temperature dependence than the experimental data. The temperature dependence of w is not suppressed by our assumption of a temperature-dependent v: w(T) is nearly unchanged in a fit with v fixed to its average over temperature in the free fit (open symbols in Figure 7 (A)).

When comparing parameters with experiment, it is important to note that the LR parameters v and w are not entirely independent in the fits. In the simulations, a LR model with the experimental v = 0.05 and a slightly increased w can fully account for the residue-specific fraction helix (SI Figure 12). However, the cooperativity of such a model is too high, as manifested in an average number of helical segments being about half of that seen in the simulations. In experiments probing the average fraction of helix,70,71 such collective information is not available, and v is derived mostly from global fits to fraction-helix data for peptides of different length. In fitting the related Zimm-Bragg model to circular dichroism data from copolypeptides of alanine and ornithine or lysine, Yang et al.72 found that the nucleation parameter was very sensitive to the chosen dependence of circular dichroism (CD) signal on helix length, and could vary from σ = 0.004 (v ≈ 0.073) to σ = 0.02 (v ≈ 0.206) depending on this choice. A further nuance to the interpretation of experiments probing the helix-coil transition is the recent finding by Kennedy et al.27 that the fitted w may depend on the length of the peptide (as we find when comparing the w for Ac-(AAQAA)2-NH2 and Ac-(AAQAA)3-NH2; see Figure 7). These challenges together complicate the comparison of the LR parameters from theory and experiments.

Thermodynamic interpretation of Lifson-Roig parameters

We fit a thermodynamic model to the optimal LR parameters w(T) and v(T), including a temperature-independent heat capacity (Table 5). The fits are indicated by the solid lines in Figure 7. As expected, helix extension is favored by enthalpy ΔH (with the exception of ff99SB for which ΔH > 0) and opposed by entropy ΔS, with ff03* and ff99SB* having ΔH ≈ −0.5 kcal.mol−1 and ΔS ≈ − 1.7 cal.mol−1.K−1. The fitted heat capacity ΔCV ≈ ΔCp is negative (the opposite sign to that for protein folding), in qualitative agreement with a recent experimental estimate of −7.6 cal.mol−1.K−1.73 We also fitted the experimental estimates by Rohl et al.69 for w(T) (setting ΔCp ≡ 0 because of the relatively small temperature range covered by this data). In this case, we obtain ΔH ≈ −1.3 kcal.mol−1 and ΔS ≈ − 3.8 cal.mol−1.K−1. This value of ΔH is similar to estimates of between −0.9 and −1.3 kcal.mol−1 from CD measurements at different temperatures71 and from calorimetric measurements.73,74 Thus, although the free energy for helix extension, − kBT ln w, is similar in simulation and experiment near 300 K, the enthalpic and entropic contributions are apparently about half the experimentally-derived estimates, resulting in an overall smaller temperature dependence.
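
For readers who wish to reproduce the solid lines in Figure 7 from the fitted parameters, the Python sketch below evaluates a constant-heat-capacity model of the form ΔG(T) = ΔH300K − TΔS300K + ΔCp(T − 300 K) − TΔCp ln(T/300 K), with w = exp(−ΔG/RT). It is a minimal illustration rather than the fitting code; the function names are hypothetical, the sign convention ΔG = −RT ln w is assumed from the text, and the gas constant replaces kB because the tabulated quantities are molar.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def delta_g(temp, dh300, ds300, dcp, t_ref=300.0):
    """Free energy in kcal/mol; dh300 in kcal/mol, ds300 and dcp in cal/(mol K)."""
    ds = ds300 / 1000.0  # convert to kcal/(mol K)
    cp = dcp / 1000.0
    return (dh300 - temp * ds
            + cp * (temp - t_ref)
            - temp * cp * math.log(temp / t_ref))


def lr_parameter(temp, dh300, ds300, dcp):
    """LR statistical weight (w or v) implied by the thermodynamic parameters."""
    return math.exp(-delta_g(temp, dh300, ds300, dcp) / (R_KCAL * temp))
```

With the ff03* entries of Table 5 for w (ΔH300K = −0.61, ΔS300K = −1.81, ΔCp = −3.6), `lr_parameter(300.0, -0.61, -1.81, -3.6)` returns a value close to the fitted w near 300 K, and the value decreases on heating.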

Table 5.

Thermodynamic fits to the LR parameters w and v. Errors are given in parentheses and values with large uncertainties are indicated in italics. Note that since the Δ(PV) term is expected to be small,19 we assume ΔA ≈ ΔG, ΔU ≈ ΔH and ΔCv ≈ ΔCp; strictly speaking the simulation data are for the constant volume ensemble. Thus we fit $-k_B T \ln w = \Delta H_{300\,\mathrm{K}} - T\Delta S_{300\,\mathrm{K}} + \Delta C_p\,[T - 300\,\mathrm{K}] - T\Delta C_p \ln[T/300\,\mathrm{K}]$, with a similar relation for v.

Parameter: Force Field   ΔH300K (kcal.mol−1)   ΔS300K (cal.mol−1.K−1)   ΔCp (cal.mol−1.K−1)
w: ff03                  −0.99 (0.02)          −1.37 (0.09)             −5.3 (0.4)
w: ff99SB                +0.23 (0.05)          0.29 (0.17)              −5.7 (0.5)
w: ff03*                 −0.61 (0.02)          −1.81 (0.06)             −3.6 (0.2)
w: ff99SB*               −0.47 (0.03)          −1.55 (0.09)             −3.3 (0.3)

w: Ref69                 −1.28 (0.02)          −3.78 (0.05)

v: ff03                  +0.44 (0.04)          0.35 (0.12)              −8.3 (0.4)
v: ff99SB                +0.90 (0.04)          −1.08 (0.11)             −5.8 (0.3)
v: ff03*                 +0.72 (0.03)          −0.94 (0.08)             −7.5 (0.2)
v: ff99SB*               +0.71 (0.04)          −0.86 (0.11)             −8.0 (0.3)

v: Ref69                                       −6.6

Since v is usually assumed temperature independent in experiment, we initially make this approximation for comparison. From our fitted v (for the optimized ff03* or ff99SB*) of 0.2, and the experimental estimate of 0.036,69 we calculate ΔS to be −3.2 cal.mol−1.K−1 and −6.6 cal.mol−1.K−1, respectively: i.e., assuming there is no enthalpic contribution to v, the entropy loss on forming a helical h backbone conformation outside a helical segment in the simulation models is about half that in experiment. Fitting a more general thermodynamic model results in a significant gain in enthalpy on forming a helical conformation, ≈ 0.7 kcal.mol−1 (details in Table 5). This result differs from the classical view of the helix-coil transition, in which the nucleation barrier (≈ v2 in the LR model) is purely entropic in origin and v is consequently temperature independent. Interestingly, the enthalpic contribution to v is increased by 0.28 kcal.mol−1 by the correction to ff03, and decreased by 0.19 kcal.mol−1 by the correction to ff99SB, similar to the magnitudes of the ψ corrections (Table 2). This suggests that the main difference between the nucleation parameters v of the parent and optimized force fields comes from the enthalpic contributions of the torsional correction, with the entropic contribution relatively unaffected.
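
The two entropies quoted for a temperature-independent v follow directly from ΔS = R ln v, since with no enthalpic contribution −RT ln v = −TΔS. A minimal Python check is given below; the function name is hypothetical.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol K)


def nucleation_entropy(v):
    """Entropy change in cal/(mol K) implied by a purely entropic weight v."""
    return R_CAL * math.log(v)


if __name__ == "__main__":
    for label, v in (("simulation", 0.2), ("experiment", 0.036)):
        print(f"{label}: v = {v}, dS = {nucleation_entropy(v):.1f} cal/(mol K)")
```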

The thermodynamic fits of the helix-coil parameters indicate that while the torsional correction to ψ can approximately reproduce free energies at a given temperature, the enthalpic and entropic contributions to the helix-coil equilibrium are too small, each about half the experimental estimate. This error is reflected in part in the Ramachandran maps. As a rough measure of the conformational entropy lost upon helix formation we have calculated a Shannon entropy over the α+ region of the maps as $-k_B \int_{-160}^{-20} d\phi \int_{-120}^{50} d\psi\; p(\phi,\psi) \ln p(\phi,\psi)$, where p(ϕ, ψ) is the distribution of backbone dihedral angles normalized over the region of integration (i.e., the red squares in Figure 1). Specifically, we calculate the difference in this “entropy” for residues in helical states with and without helical hydrogen bonds. For the PDB,48 ff03* and ff99SB*, these differences are −3.4, −1.8 and −1.4 cal.mol−1.K−1, respectively. By this entropy measure, residues within helices are more structured in the PDB, relative to coil residues in the helical basin, than in the force fields. This structuring is evident in the insets of Figure 1, which show the α+ region for residues within helices: the potential well for residues with helical hydrogen bonds is much narrower in the PDB than in the force fields. A comparable measure of entropy loss upon hydrogen bonding can be obtained by taking the difference between the entropy associated with the LR parameters w (which includes both a c to h conformational change and hydrogen bond formation) and v. This results in a loss of entropy of approximately 3 cal.mol−1.K−1 in experiment, versus only about 0.8–1.0 cal.mol−1.K−1 for the modified force fields, in qualitative agreement with the Shannon entropy estimates.
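
A histogram-based estimate of such a Shannon entropy can be sketched in a few lines of Python. The sketch is illustrative only: the region bounds (−160° ≤ ϕ < −20°, −120° ≤ ψ < 50°), the 10° bin width and the function name are assumptions of this example rather than settings confirmed for the published analysis. The density in each bin is the bin probability divided by the bin area, so the bin-size term cancels when two distributions on the same grid are compared.

```python
import math
from collections import Counter

R_CAL = 1.987  # gas constant in cal/(mol K)


def shannon_entropy(angles, phi_range=(-160.0, -20.0),
                    psi_range=(-120.0, 50.0), bin_deg=10.0):
    """Entropy in cal/(mol K) of (phi, psi) samples falling inside the region."""
    counts = Counter()
    for phi, psi in angles:
        if (phi_range[0] <= phi < phi_range[1]
                and psi_range[0] <= psi < psi_range[1]):
            key = (int((phi - phi_range[0]) // bin_deg),
                   int((psi - psi_range[0]) // bin_deg))
            counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples inside the integration region")
    area = bin_deg * bin_deg
    # -sum P ln(p), with density p = P / area, renormalized over the region.
    s_over_k = -sum((c / total) * math.log(c / (total * area))
                    for c in counts.values())
    return R_CAL * s_over_k
```

The quantity compared in the text would then be the difference between `shannon_entropy` evaluated for helical-basin residues with and without helical hydrogen bonds; a narrower distribution gives a more negative difference.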

Discussion

Our work builds on many years of AMBER force field improvements, many using similar correction terms (consequently, it is important to specify exactly which one is being used). For example, the original AMBER force field intended for condensed phase simulation, ff94,59 had a known propensity for strongly overstabilizing helical structure, prompting two corrective modifications: (i) the ff96 release which changed only the ϕ and ψ torsion parameters,75 but proved to overstabilize extended structure; (ii) the ff94GS variant of García and Sanbonmatsu in which both the ϕ and ψ torsion parameters were set to zero.35 Although ff94GS achieves a better balance, it still retains too much helical structure as assessed by comparison with NMR scalar coupling data.15

The more recent ff99 parameters, also based on ff94, included extensive changes to the torsional potentials.60 While reasonably balanced between α and extended structure, this force field was found to have a large barrier to rotation in ϕ. To address this, Sorin and Pande replaced the ϕ torsional potential of ff99 with that of ff94 to create ff99ϕ.30 In a more extensive optimization, Simmerling and co-workers used quantum chemical calculations on longer alanine peptides in order to optimize both the ϕ and ψ torsions;9 the resulting ff99SB potential gave improved agreement with NMR data for proteins.9,51 The most recent release of the AMBER all-atom force field, ff03, is a major departure from ff94 in that both the charges and torsional potentials have been refitted to high level quantum calculations. In particular, a new approach was applied to deriving the charges in order to make them more compatible with condensed phase simulation.7 Our recent comparison with NMR scalar coupling data for (Ala)5 showed that both ff99SB and ff03 were amongst the force fields showing the best agreement with experimental data.15

By modifying the backbone dihedral potentials of ff03 and ff99SB to match experimental data related to the helix-coil transition, we have obtained force fields that produce the proper balance between average helix and coil populations at ambient conditions. This balance is essential for simulations of unfolded or disordered peptides and protein loops. The transferability of the optimized force fields was demonstrated by reproducing experimental NMR data for peptides not used in parameterization, as well as excellent structural results for a folded protein. However, despite these overall improvements the resulting force fields still exhibit a number of deficiencies that become apparent upon examination of the LR parameters v and w, and their temperature dependence. We find that the force fields underestimate w and overestimate v compared with experiment. As a consequence, the helix-coil transition is less cooperative in the simulations than in experiment. On this basis, we expect that long helices will be more fragmented in the simulations than in experiment, despite the match in average helicity for short helices.

The explanation for the discrepancies between the v,w for experiment and simulation lies in the entropy loss upon helix formation and the compensating enthalpy gain both being too small in magnitude. This thermodynamic analysis of the improved force fields points the way toward further important corrections. The small change in entropy relative to experiment could be explained by a lack of orientational specificity in hydrogen bonding in the force field. For example, it has been shown that although the orientational distribution of side-chain hydrogen bonds in the PDB matches quantum mechanical calculations, the orientational dependence shown by current force fields is too small.76 This is particularly true of the geometry with respect to the hydrogen-bond acceptor, where there is a slight preference for hydrogen bonding in the direction of the “lone pairs”. This preference is also evident in the distribution of hydrogen bonds in small molecule crystal structures;77 inclusion of orientation-dependent hydrogen bonding potentials has produced improvements in NMR structure calculations78 and the prediction and design of protein-protein complexes.79 The additional geometric requirements for hydrogen bonding would be expected to increase the entropic component of w: our analysis of the Shannon entropy in Ramachandran maps is consistent with a larger loss of entropy upon hydrogen-bond formation in the PDB, relative to the simulation. This effect might be addressed in force fields either by the inclusion of a geometrically specific hydrogen-bond energy term80 or the introduction of “off-center” charges in addition to the current “atom-centered” charges.81,82

A second major effect is the neglect of electronic polarization (or induction) effects in current force fields.82 A recent density functional theory study has suggested, for example, that as much as half of the hydrogen-bonding energy in a helix may arise from electron density re-distribution on helix formation.83 Other studies of hydrogen-bonded chains of formamide molecules84 and in long helices85 have led to similar conclusions. An experimental finding that the LR w may depend on helix length is consistent with this cooperativity.27 The increase in enthalpy associated with this effect approximately matches the discrepancy between the enthalpy for the formation of hydrogen bonds in the non-polarizable force fields used here, and the experimental values.

Improved future force fields should, at the minimum, account for the two effects of hydrogen bond geometry and polarization, which appear to account for the observed differences between the simulated and measured thermodynamics of the helix-coil transition. Improvements could be achieved by using multi-center atom charges (or directional potentials) and by adding polarizability,86 at least for the groups involved in backbone hydrogen bonds. When making alterations to the protein hydrogen bonding potential, it will also be necessary to consider the balance of interactions with the water model which may be altered by such changes. To be consistent, it may also be necessary to introduce similar terms for protein-water and water-water interactions.

We stress that, despite these shortcomings, the optimized force fields should still be adequate for reproducing equilibrium properties of both unfolded and folded peptides and proteins near 300 K. The fact that the corrections necessary are around 0.5 kBT per residue shows that the parent force fields ff03 and ff99SB are already of high quality; the modifications are simply a “fine tuning” step. Nonetheless, when summed over all the residues, we find that the correction terms have a substantial effect on the conformational distribution. We would therefore recommend using the modified force fields ff03* and ff99SB* for simulations of weakly structured or unfolded peptides and proteins; we note that initial tests suggest that the refined force fields give an improved description of β-hairpin forming peptides. Future development of more sophisticated force fields will be greatly facilitated by experimental data for weakly structured peptides that can be readily calculated from simulation (e.g., NMR scalar couplings, NOEs), particularly if these are determined at different temperatures.

Supplementary Material

suppText

Table 3.

NMR RDCs for ubiquitin. The agreement between RDCs from 30 ns simulations with each force field and experimental NMR data recorded in 10 different alignment media50 is assessed by the cumulative Q-factor, Qcum.51

Force field        Qcum    Uncertainty
ff03               2.80    0.10
ff99SB             2.30    0.05
ff03*              2.58    0.05
ff99SB*            2.49    0.09
X-ray structure    2.45

Acknowledgements

We would like to thank Alex MacKerell for sharing with us the LMP2/cc-pVQZ//MP2/6-31G* energy surface for alanine dipeptide. Robert Best is supported by a Royal Society University Research Fellowship. This work was supported in part by the intramural research program of the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, and made use of the Biowulf cluster at NIH, the Cambridge HPC facility (Darwin), and local clusters in the Cambridge Chemistry department. We would like to thank Jeetain Mittal and Nick Fawzi for helpful comments on the manuscript.

References

  • 1. Levitt M, Lifson S. J. Mol. Biol. 1969;46:269–279. doi: 10.1016/0022-2836(69)90421-5.
  • 2. Gelin BR, Karplus M. Proc. Natl. Acad. Sci. U.S.A. 1975;72:2002–2006. doi: 10.1073/pnas.72.6.2002.
  • 3. McCammon JA, Gelin BR, Karplus M. Nature. 1977;267:585–590. doi: 10.1038/267585a0.
  • 4. Freddolino PL, Liu F, Gruebele M, Schulten K. Biophys. J. 2008;94:L75–L77. doi: 10.1529/biophysj.108.131565.
  • 5. MacKerell AD Jr. J. Comput. Chem. 2004;25:1584–1604. doi: 10.1002/jcc.20082.
  • 6. Beachy MD, Chasman D, Murphy RB, Halgren TA, Friesner RA. J. Am. Chem. Soc. 1997;119:5908–5920.
  • 7. Duan Y, Wu C, Chowdhury S, Lee MC, Xiong G, Zhang W, Yang R, Cieplak P, Luo R, Lee T, Caldwell J, Wang J, Kollman PA. J. Comput. Chem. 2003;24:1999–2012. doi: 10.1002/jcc.10349.
  • 8. MacKerell AD, Feig M, Brooks CL. J. Comput. Chem. 2004;25:1400–1415. doi: 10.1002/jcc.20065.
  • 9. Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, Simmerling C. Proteins. 2006;65:712–725. doi: 10.1002/prot.21123.
  • 10. Buck M, Bouguet-Bonnet S, Pastor RW, MacKerell AD Jr. Biophys. J. 2006;90:L36–L38. doi: 10.1529/biophysj.105.078154.
  • 11. Showalter SA, Brüschweiler R. J. Chem. Theory Comput. 2007;3:961–975. doi: 10.1021/ct7000045.
  • 12. Kubelka J, Hofrichter J, Eaton WA. Curr. Opin. Struct. Biol. 2004;14:76–88. doi: 10.1016/j.sbi.2004.01.013.
  • 13. Tompa P. Trends Biochem. Sci. 2002;27:527–533. doi: 10.1016/s0968-0004(02)02169-2.
  • 14. Graf J, Nguyen PH, Stock G, Schwalbe H. J. Am. Chem. Soc. 2007;129:1179–1189. doi: 10.1021/ja0660406.
  • 15. Best RB, Buchete N-V, Hummer G. Biophys. J. 2008;95:L07–L09. doi: 10.1529/biophysj.108.132696.
  • 16. Karplus M. J. Chem. Phys. 1959;30:11–15.
  • 17. Wang Z-X, Zhang W, Wu C, Lei H, Cieplak P, Duan Y. J. Comput. Chem. 2006;27:781–790. doi: 10.1002/jcc.20386.
  • 18. Yeh I-C, Hummer G. J. Am. Chem. Soc. 2002;124:6563–6568. doi: 10.1021/ja025789n.
  • 19. Gnanakaran S, García AE. Proteins. 2005;59:773–782. doi: 10.1002/prot.20439.
  • 20. Shi Z, Olson CA, Rose GD, Baldwin RL. Proc. Natl. Acad. Sci. U.S.A. 2002;99:9190–9195. doi: 10.1073/pnas.112193999.
  • 21. Chen K, Liu Z, Bracken WC, Kallenbach NR. Angew. Chem. Int. Ed. 2007;46:9036–9039. doi: 10.1002/anie.200703376.
  • 22. Schweitzer-Stenner R, Measey TJ. Proc. Natl. Acad. Sci. U.S.A. 2007;104:6649–6654. doi: 10.1073/pnas.0700006104.
  • 23. Schweitzer-Stenner R, Gonzales W, Bourne GT, Feng JA, Marshall GR. J. Am. Chem. Soc. 2007;129:13095–13109. doi: 10.1021/ja0738430.
  • 24. Mukhopadhyay P, Zuber G, Beratan DN. Biophys. J. 2008;95:5574–5586. doi: 10.1529/biophysj.108.137596.
  • 25. Marqusee S, Robbins VH, Baldwin RL. Proc. Natl. Acad. Sci. U.S.A. 1989;86:5286–5290. doi: 10.1073/pnas.86.14.5286.
  • 26. Spek EJ, Olson CA, Shi Z, Kallenbach NR. J. Am. Chem. Soc. 1999;121:5571–5572.
  • 27. Kennedy RJ, Walker SM, Kemp DS. J. Am. Chem. Soc. 2005;127:16961–16968. doi: 10.1021/ja054645g.
  • 28. Berendsen HJC, van der Spoel D, van Drunen R. Comp. Phys. Comm. 1995;91:43–56.
  • 29. Lindahl E, Hess B, van der Spoel D. J. Mol. Model. 2001;7:306–317.
  • 30. Sorin EJ, Pande VS. Biophys. J. 2005;88:2472–2493. doi: 10.1529/biophysj.104.051938.
  • 31. Jorgensen WL, Chandrasekhar J, Madura JD. J. Chem. Phys. 1983;79:926–935.
  • 32. Vijay-Kumar S, Bugg CE, Cook WJ. J. Mol. Biol. 1987;194:531–544. doi: 10.1016/0022-2836(87)90679-6.
  • 33. Nosé S, Klein ML. Mol. Phys. 1983;50:1055–1076.
  • 34. Parrinello M, Rahman A. J. Appl. Phys. 1981;52:7182–7190.
  • 35. García AE, Sanbonmatsu KY. Proc. Natl. Acad. Sci. U.S.A. 2002;99:2782–2787. doi: 10.1073/pnas.042496899.
  • 36. Poland D, Scheraga HA. Theory of helix-coil transitions in biopolymers. 1st ed. New York: Academic Press; 1970.
  • 37. Qian H, Schellman JA. J. Phys. Chem. 1992;96:3987–3994.
  • 38. Bax A, Kontaxis G, Tjandra N. Meth. Enzymol. 2001;339:127–174. doi: 10.1016/s0076-6879(01)39313-8.
  • 39. Ottiger M, Bax A. J. Am. Chem. Soc. 1998;120:12334–12341.
  • 40. Losonczi J, Andrec M, Fischer MWF, Prestegard JH. J. Magn. Reson. 1999;138:334–342. doi: 10.1006/jmre.1999.1754.
  • 41. Lipari G, Szabo A. J. Am. Chem. Soc. 1982;104:4546–4559.
  • 42. Henry ER, Szabo A. J. Chem. Phys. 1985;82:4753–4761.
  • 43. Maragakis P, Lindorff-Larsen K, Eastwood MP, Dror RO, Klepeis JL, Arkin IT, Jensen MO, Xu H, Trbovic N, Friesner RA, Palmer AG, Shaw DE. J. Phys. Chem. B. 2008;112:6155–6158. doi: 10.1021/jp077018h.
  • 44. Case DA, Scheurer C, Brüschweiler R. J. Am. Chem. Soc. 2000;122:10390–10397.
  • 45. Best RB, Buchete N-V, Hummer G. Biophys. J. 2008;95:4494. doi: 10.1529/biophysj.108.132696.
  • 46. Shalongo W, Dugad L, Stellwagen E. J. Am. Chem. Soc. 1994;116:8288–8293.
  • 47. Scheraga HA, Vila JA, Ripoll DR. Biophys. Chem. 2002;101:255–265. doi: 10.1016/s0301-4622(02)00175-8.
  • 48. Lovell SC, Davis IW, Arendall WB III, de Bakker PIW, Word JM, Prisant MG, Richardson JS, Richardson DC. Proteins. 2003;50:437–450. doi: 10.1002/prot.10286.
  • 49. Tjandra N, Feller SE, Pastor RW, Bax A. J. Am. Chem. Soc. 1995;117:12562–12566.
  • 50. Hus J-C, Peti W, Griesinger C, Brüschweiler R. J. Am. Chem. Soc. 2003;125:5596–5597. doi: 10.1021/ja029719s.
  • 51. Showalter SA, Brüschweiler R. J. Am. Chem. Soc. 2007;129:4158–4159. doi: 10.1021/ja070658d.
  • 52. Huang X, Bowman GR, Pande VS. J. Chem. Phys. 2008;128:205106. doi: 10.1063/1.2908251.
  • 53. Thompson PA, Eaton WA, Hofrichter J. Biochemistry. 1997;36:9200–9210. doi: 10.1021/bi9704764.
  • 54. Thompson PA, Muñoz V, Jas GS, Henry ER, Eaton WA, Hofrichter J. J. Phys. Chem. B. 2000;104:378–389.
  • 55. Huang CY, Getahun Z, Zhu YJ, Klemke JW, DeGrado WF, Gai F. Proc. Natl. Acad. Sci. U.S.A. 2002;99:2788–2793. doi: 10.1073/pnas.052700099.
  • 56. Mukherjee S, Chowdhury P, Bunagan MR, Gai F. J. Phys. Chem. B. 2008;112:9146–9159. doi: 10.1021/jp801721p.
  • 57. Paschek D, Gnanakaran S, García AE. Proc. Natl. Acad. Sci. U.S.A. 2005;102:6765–6770. doi: 10.1073/pnas.0408527102.
  • 58. Jorgensen WL. J. Am. Chem. Soc. 1981;103:335–340.
  • 59. Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA. J. Am. Chem. Soc. 1995;117:5179–5197.
  • 60. Wang J, Cieplak P, Kollman PA. J. Comput. Chem. 2000;21:1049–1074.
  • 61. Horn HW, Swope WC, Pitera JW, Madura JD, Dick TJ, Hura GL, Head-Gordon T. J. Chem. Phys. 2004;120:9665–9678. doi: 10.1063/1.1683075.
  • 62. Zimm BH, Bragg JK. J. Chem. Phys. 1959;31:526–535.
  • 63. Muñoz V, Serrano L. Nat. Struct. Biol. 1994;1:399–409. doi: 10.1038/nsb0694-399.
  • 64. Muñoz V, Serrano L. Biopolymers. 1997;41:495–509. doi: 10.1002/(SICI)1097-0282(19970415)41:5<495::AID-BIP2>3.0.CO;2-H.
  • 65. Rohl CA, Chakrabartty A, Baldwin RL. Protein Sci. 1996;5:2623–2637. doi: 10.1002/pro.5560051225.
  • 66. Presta LG, Rose GD. Science. 1988;240:1632–1641. doi: 10.1126/science.2837824.
  • 67. Richardson JS, Richardson DC. Science. 1988;240:1648–1652. doi: 10.1126/science.3381086.
  • 68. Serrano L, Fersht AR. Nature. 1989;342:296–299. doi: 10.1038/342296a0.
  • 69. Rohl CA, Baldwin RL. Biochemistry. 1997;36:8435–8442. doi: 10.1021/bi9706677.
  • 70. Rohl CA, Scholtz JM, York EJ, Stewart JM, Baldwin RL. Biochemistry. 1992;31:1263–1269. doi: 10.1021/bi00120a001.
  • 71. Scholtz JM, Qian H, York EJ, Stewart JM, Baldwin RL. Biopolymers. 1991;31:1463–1470. doi: 10.1002/bip.360311304.
  • 72. Yang J, Zhao K, Gong Y, Vologodskii A, Kallenbach NR. J. Am. Chem. Soc. 1998;120:10646–10652.
  • 73. Richardson JM, Makhatadze GI. J. Mol. Biol. 2004;335:1029–1037. doi: 10.1016/j.jmb.2003.11.027.
  • 74. Scholtz JM, Marqusee S, Baldwin RL, York EJ, Stewart JM, Santoro M, Bolen DW. Proc. Natl. Acad. Sci. U.S.A. 1991;88:2854–2858. doi: 10.1073/pnas.88.7.2854.
  • 75. Kollman PA. Acc. Chem. Res. 1996;29:461–469.
  • 76. Morozov AV, Kortemme T, Tsemekhman K, Baker D. Proc. Natl. Acad. Sci. U.S.A. 2004;101:6946–6951. doi: 10.1073/pnas.0307578101.
  • 77. Steiner T. Angew. Chem. Int. Ed. 2002;41:48–76.
  • 78. Grishaev A, Bax A. J. Am. Chem. Soc. 2004;126:7281–7292. doi: 10.1021/ja0319994.
  • 79. Kortemme T, Morozov AV, Baker D. J. Mol. Biol. 2004;326:1239–1259. doi: 10.1016/s0022-2836(03)00021-4.
  • 80. Lii J-H, Allinger NL. J. Comput. Chem. 1998;19:1001–1016.
  • 81. Cieplak P, Caldwell J, Kollman P. J. Comput. Chem. 2001;22:1048–1057.
  • 82. Stone AJ. Science. 2008;321:787–789. doi: 10.1126/science.1158006.
  • 83. Morozov AV, Tsemekhman K, Baker D. J. Phys. Chem. B. 2006;110:4503–4505. doi: 10.1021/jp057161f.
  • 84. Kobko N, Dannenberg JJ. J. Phys. Chem. A. 2003;107:10389–10395.
  • 85. Wieczorek R, Dannenberg JJ. J. Am. Chem. Soc. 2003;125:8124–8129. doi: 10.1021/ja035302q.
  • 86. Ponder JW, Case DA. Adv. Prot. Chem. 2003;66:27–85. doi: 10.1016/s0065-3233(03)66002-x.
