Proteins. 2016 Jul 21;84(10):1490–1516. doi: 10.1002/prot.25094

FF12MC: A revised AMBER forcefield and new protein simulation protocol

Yuan‐Ping Pang
PMCID: PMC5129589  PMID: 27348292

ABSTRACT

Specialized to simulate proteins in molecular dynamics (MD) simulations with explicit solvation, FF12MC combines a new protein simulation protocol, in which all atomic masses are uniformly reduced by tenfold, with a revised AMBER forcefield FF99 that has (i) shortened C–H bonds, (ii) removal of torsions involving a nonperipheral sp³ atom, and (iii) reduced 1–4 interaction scaling factors of torsions ϕ and ψ. This article reports that in multiple, distinct, independent, unrestricted, unbiased, isobaric–isothermal, and classical MD simulations FF12MC can (i) simulate the experimentally observed flipping between left‐ and right‐handed configurations of the C14–C38 disulfide bond of BPTI in solution, (ii) autonomously fold chignolin, CLN025, and Trp‐cage with folding times that agree with the experimental values, (iii) simulate subsequent unfolding and refolding of these miniproteins, and (iv) achieve a robust Z score of 1.33 for refining protein models TMR01, TMR04, and TMR07. By comparison, the latest general‐purpose AMBER forcefield FF14SB locks the C14–C38 bond in the right‐handed configuration in solution under the same protein simulation conditions. Statistical survival analysis shows that FF12MC folds chignolin and CLN025 in isobaric–isothermal MD simulations 2–4 times faster than FF14SB under the same protein simulation conditions. These results suggest that FF12MC may be used for protein simulations to study kinetics and thermodynamics of miniprotein folding as well as protein structure and dynamics. Proteins 2016; 84:1490–1516. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.

Keywords: protein folding, protein dynamics, protein simulation, protein structure refinement, molecular dynamics simulation, force field, chignolin, CLN025, Trp‐cage, BPTI

INTRODUCTION

Used in computer simulations to describe the relationship between a molecular structure and its energy, an additive (viz., nonpolarizable) forcefield is an empirical potential energy function with a set of parameters that is often in the form of Eq. (1).1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 In Eq. (1), $b$, $\theta$, and $\phi$ are the instantaneous bond length, bond angle, and torsion angle; $k_b$ and $b_0$ are constants of the bond potential energy for two atoms separated by one covalent bond; $k_\theta$ and $\theta_0$ are constants of the angle potential energy for two atoms separated by two consecutive covalent bonds; $k_\phi$ and $\delta$ are constants of the torsion potential energy for two atoms separated by three consecutive covalent bonds, and $n$ is the torsion periodicity; $A_{ij}$ and $B_{ij}$ are constants of the van der Waals interaction energy for two intermolecular atoms or for two intramolecular atoms separated by three or more consecutive covalent bonds; $C$ is a constant of the electrostatic interaction energy for the same atom pairs; $q_i$ and $q_j$ are atomic partial charges; and $r_{ij}$ is the distance between atoms $i$ and $j$. The $A_{ij}$ and $B_{ij}$ constants for atoms separated by three consecutive covalent bonds are typically divided by a 1–4 van der Waals interaction scaling factor (termed SCNB in AMBER forcefields15, 16). The $C$ constant for such atoms is likewise divided by a 1–4 electrostatic interaction scaling factor (termed SCEE in AMBER forcefields).

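$$E = \sum_{\text{bonds}} k_b (b - b_0)^2 + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\text{torsions}} k_\phi \left[\cos(n\phi + \delta) + 1\right] + \sum_{\text{nonbonded } i<j} \left( A_{ij} r_{ij}^{-12} - B_{ij} r_{ij}^{-6} + C q_i q_j r_{ij}^{-1} \right) \qquad (1)$$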
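As a minimal sketch of how the four terms of Eq. (1) are evaluated, the Python functions below compute each term for a single bond, angle, torsion, or nonbonded atom pair, including the optional division of the 1–4 van der Waals constants by SCNB and of the 1–4 electrostatic constant by SCEE. The function names are hypothetical, units are left to the caller, and no FF99 or FF12MC parameter values are implied.

```python
import math


def bond_energy(k_b: float, b: float, b0: float) -> float:
    """Harmonic bond term k_b (b - b0)^2 of Eq. (1)."""
    return k_b * (b - b0) ** 2


def angle_energy(k_theta: float, theta: float, theta0: float) -> float:
    """Harmonic angle term k_theta (theta - theta0)^2; angles in radians."""
    return k_theta * (theta - theta0) ** 2


def torsion_energy(k_phi: float, n: int, phi: float, delta: float) -> float:
    """Torsion term k_phi [cos(n*phi + delta) + 1]; angles in radians."""
    return k_phi * (math.cos(n * phi + delta) + 1.0)


def nonbonded_energy(a_ij: float, b_ij: float, c: float, q_i: float, q_j: float,
                     r_ij: float, is_14: bool = False,
                     scnb: float = 1.0, scee: float = 1.0) -> float:
    """A/r^12 - B/r^6 + C q_i q_j / r for one atom pair.

    For a 1-4 pair (atoms three bonds apart), A and B are divided by SCNB
    and C is divided by SCEE before the energy is evaluated.
    """
    if is_14:
        a_ij, b_ij, c = a_ij / scnb, b_ij / scnb, c / scee
    return a_ij / r_ij ** 12 - b_ij / r_ij ** 6 + c * q_i * q_j / r_ij
```

Keeping the 1–4 scaling inside the nonbonded term mirrors the description above: the scaling factors modify the constants, not the functional form.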

Current general‐purpose forcefields are already well refined for various simulations of proteins and other molecules, including folding simulations of a range of miniproteins with implicit or explicit solvation.12, 19, 20, 21, 22, 23 However, autonomous folding of miniproteins using these forcefields in molecular dynamics (MD) simulations with explicit solvation and without biasing the simulation systems21 has been limited to simulations performed on extremely powerful but proprietary special‐purpose supercomputers.23, 24, 25 It is desirable to develop a further‐refined, special‐purpose forcefield that can fold miniproteins with folding times that are both shorter than those obtained with a general‐purpose forcefield and, more importantly, closer to the experimental values. Such a specialized forcefield may enable autonomous folding of fast‐folding miniproteins in simulations with explicit solvation on commercial computers such as Apple Mac Pros and permit these simulations to be performed under isobaric–isothermal (NPT) conditions, which are used in most experimental protein folding studies. It may also enable autonomous folding of slow‐folding miniproteins on the special‐purpose supercomputers. Most importantly, this type of forcefield may improve sampling of nonnative states of a miniprotein in multiple, distinct, independent, unrestricted, unbiased, and classical NPT MD simulations to capture the major folding pathways26 and thereby correctly predict the folding kinetics of the miniprotein. It may also improve simulations of genuine localized disorders of folded globular proteins and refinement of comparative models of monomeric globular proteins by MD simulations.27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44

It has been shown that uniform reduction of the atomic masses of the entire simulation system (both solute and solvent) by tenfold can enhance configurational sampling in NPT MD simulations.45 The masses uniformly reduced by tenfold are hereafter referred to as low masses. The effectiveness of the low‐mass NPT MD simulation technique can be explained as follows.46 To determine the relative configurational sampling efficiencies of two simulations of the same molecule, one with standard masses and another with low masses, the units of distance [l] and energy [m]([l]/[t])² of the low‐mass simulation are kept identical to those of the standard‐mass simulation, noting that energy and temperature have the same unit. This allows the structure and energy of the low‐mass simulation to be compared with those of the standard‐mass simulation. Let superscripts lmt (low‐mass time) and smt (standard‐mass time) label the units of the low‐mass and standard‐mass simulations, respectively. Then

$$[m^{\mathrm{lmt}}] = 0.1\,[m^{\mathrm{smt}}],\qquad [l^{\mathrm{lmt}}] = [l^{\mathrm{smt}}],\qquad [m^{\mathrm{lmt}}]\left(\frac{[l^{\mathrm{lmt}}]}{[t^{\mathrm{lmt}}]}\right)^{2} = [m^{\mathrm{smt}}]\left(\frac{[l^{\mathrm{smt}}]}{[t^{\mathrm{smt}}]}\right)^{2} \;\Longrightarrow\; \sqrt{10}\,[t^{\mathrm{lmt}}] = [t^{\mathrm{smt}}].$$

A conventional MD simulation program takes the timestep size (Δt) in standard‐mass time rather than in low‐mass time. Therefore, the low‐mass MD simulation at Δt = 1.00 fsˢᵐᵗ (viz., √10 fsˡᵐᵗ ≈ 3.16 fsˡᵐᵗ) is theoretically equivalent to the standard‐mass MD simulation at Δt = √10 fsˢᵐᵗ ≈ 3.16 fsˢᵐᵗ, as long as both simulations are carried out for the same number of timesteps and there are no precision issues in performing them. This equivalence of mass scaling and timestep‐size scaling explains why the low‐mass NPT MD simulation at Δt = 1.00 fsˢᵐᵗ can offer better configurational sampling efficacy than the standard‐mass NPT MD simulation at Δt = 1.00 fsˢᵐᵗ or Δt = 2.00 fsˢᵐᵗ. It also explains why the kinetics of the low‐mass simulation can be converted to the kinetics of the standard‐mass simulation simply by scaling the low‐mass time by a factor of √10.
Further, this equivalence explains why there are limitations on the use of the mass reduction technique to improve configurational sampling efficiency. Lengthening the timestep inevitably reduces the integration accuracy of an MD simulation, and the accuracy loss caused by a longer timestep is temperature dependent. Therefore, to avoid serious integration errors, low‐mass NPT MD simulations must be performed in double‐precision floating‐point format, at Δt ≤ 1.00 fsˢᵐᵗ, and at a temperature of ≤340 K.46 Because temperatures of biological systems rarely exceed 340 K and because MD simulations are typically performed in double‐precision floating‐point format, low‐mass NPT MD simulation is a viable configurational sampling enhancement technique for protein simulations at ≤340 K.
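The unit argument above reduces to a few lines of arithmetic. The Python helpers below are a sketch of that arithmetic (not code from the original study; all function names are hypothetical): they compute the time‐unit ratio implied by a uniform mass scale factor, convert a low‐mass timestep or program‐reported time to standard‐mass time, and check the stated limits of double precision, Δt ≤ 1.00 fsˢᵐᵗ, and ≤340 K.

```python
import math


def time_unit_ratio(mass_scale: float) -> float:
    """[t_smt] / [t_lmt] when every mass is multiplied by mass_scale.

    With distance and energy units fixed, m (l/t)^2 is invariant, so the
    time unit scales as sqrt(m): the ratio is 1 / sqrt(mass_scale).
    """
    return 1.0 / math.sqrt(mass_scale)


def equivalent_standard_mass_dt(dt_fs_smt: float, mass_scale: float = 0.1) -> float:
    """Standard-mass timestep (fs^smt) equivalent to a low-mass run at dt_fs_smt."""
    return dt_fs_smt * time_unit_ratio(mass_scale)


def to_standard_mass_time(t_reported: float, mass_scale: float = 0.1) -> float:
    """Scale a time reported by the MD program for a low-mass run by sqrt(10)."""
    return t_reported * time_unit_ratio(mass_scale)


def low_mass_run_is_within_limits(dt_fs_smt: float, temperature_k: float,
                                  double_precision: bool = True) -> bool:
    """Check the stated limits: double precision, dt <= 1.00 fs^smt, T <= 340 K."""
    return double_precision and dt_fs_smt <= 1.00 and temperature_k <= 340.0
```

For example, `equivalent_standard_mass_dt(1.00)` returns about 3.16, which is the 3.16 fsˢᵐᵗ figure quoted above.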

Another study showed that shortening C–H bonds according to the lengths found in high‐resolution cryogenic protein structures can reduce the computing time needed for an MD simulation to capture miniprotein folding.47 This is presumably because the shortened C–H bonds reduce the exaggerated short‐range repulsions caused by the use of the 6–12 Lennard‐Jones potential and a nonpolarizable charge model in an additive forcefield.48 A subsequent study found that increasing or decreasing the SCNBs and/or SCEEs of ϕ and ψ can raise or lower, respectively, the ratio of the α‐helical conformation to the β‐strand conformation.49 This suggests that the secondary‐structure propensities of a forcefield can be adjusted by modifying the SCNBs and/or SCEEs of ϕ and ψ without implementing the four backbone torsions (ϕ, ψ, ϕ', and ψ').

In this context and aiming to simulate proteins in MD simulations with explicit solvation, this author devised an additive forcefield named FF12MCsm that is based on the general‐purpose AMBER forcefield FF99 (Ref. 50) with (i) the aliphatic C–H bonds shortened to 0.98 Å and the aromatic C–H bonds shortened to 0.93 Å, (ii) removal of torsions involving a nonperipheral sp³ atom, and (iii) reduced 1–4 interaction scaling factors of torsions ϕ and ψ (1.00 for SCNB; 1.18 for SCEE). The shortened bond lengths were obtained from a survey of 3709 C–H bonds in cryogenic protein structures with resolutions of 0.62–0.79 Å.47 The reduced scaling factors were obtained by benchmarking FF12MCsm against the experimentally determined mean helix content of Ac‐(AAQAA)₃‐NH₂ (hereafter abbreviated as AAQAA).51 To avoid replacing the nonperipheral‐sp³ torsion parameters with a set of arbitrary and complicated scaling factors, two requirements were used to determine the SCNB and SCEE for torsions ϕ and ψ in FF12MCsm. First, the computed mean α‐helix contents of AAQAA at different temperatures using a reported NPT MD simulation protocol49 had to be close to the experimental values. Second, the SCNB and SCEE had to be close to 1.00; namely, the scaling of ϕ and ψ should be reduced as much as possible. As described in RESULTS AND DISCUSSION, with SCNB reduced from 2.00 in FF99 to 1.00 in FF12MCsm and SCEE reduced from 1.20 in FF99 to 1.18 in FF12MCsm, the computed mean α‐helix contents of AAQAA using FF12MCsm are indeed close to the experimental data. As in the GLYCAM06 forcefield, which removes the 1–4 interaction scaling factors,52 the scaling of the 1–4 van der Waals interactions for ϕ and ψ is completely removed in FF12MCsm, and the scaling of the 1–4 electrostatic interactions for ϕ and ψ is reduced.
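To make the effect of the new ϕ/ψ scaling factors concrete, the sketch below compares FF99's SCNB/SCEE values (2.00/1.20) with FF12MCsm's (1.00/1.18) for a single 1–4 pair. The class, constant, and function names are hypothetical, and the unscaled 1–4 energies are supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scaling14:
    """1-4 scaling factors applied to the phi and psi torsions."""
    scnb: float  # divides the 1-4 van der Waals constants A and B
    scee: float  # divides the 1-4 electrostatic constant C


FF99 = Scaling14(scnb=2.00, scee=1.20)
FF12MCSM = Scaling14(scnb=1.00, scee=1.18)


def scaled_14_terms(vdw: float, elec: float, s: Scaling14) -> tuple[float, float]:
    """Return the (van der Waals, electrostatic) 1-4 energies after scaling.

    vdw and elec are the unscaled 1-4 energies of one atom pair.
    """
    return vdw / s.scnb, elec / s.scee
```

With these values, the 1–4 van der Waals term for ϕ and ψ is twice as large in FF12MCsm as in FF99, whereas the 1–4 electrostatic term grows by only about 1.7% (1.20/1.18 ≈ 1.017).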

As also demonstrated in RESULTS AND DISCUSSION, these modifications, in combination with the mass reduction technique, enabled FF12MCsm to fold miniproteins with folding times that were substantially shorter than those of a general‐purpose forcefield. However, FF12MCsm did not fold the miniproteins with folding times shorter than the experimentally observed folding times, which emphasizes that these modifications were not made to artificially accelerate folding rates to save computing time. Instead, these modifications were made to improve (i) sampling of nonnative states of a miniprotein, (ii) simulation of genuine localized disorders of a folded globular protein, and (iii) refinement of comparative models of a monomeric globular protein.

As reported,46 FF12MCsm is intended for standard‐mass MD simulations with an explicit solvation model at Δt ≤ 3.16 fsˢᵐᵗ and a temperature of ≤340 K without employing the hydrogen mass repartitioning scheme.53, 54, 55 FF12MCsm can also be used for standard‐mass MD simulations at Δt > 3.16 fsˢᵐᵗ and a temperature of >340 K with the hydrogen mass repartitioning scheme. A combination of FF12MCsm with the low‐mass configurational sampling enhancement technique45, 46 is a derivative of FF12MCsm. With all atomic masses uniformly reduced by tenfold, this derivative (hereafter abbreviated as FF12MC) is intended for low‐mass NPT MD simulations of proteins with an explicit solvation model (preferably the TIP3P water model56) at Δt = 1.00 fsˢᵐᵗ and a temperature of ≤340 K.

This article reports an FF12MC evaluation study consisting of 1350 NPT MD simulations at 1 atm and 274–340 K with an aggregated simulation time of 1252.572 μsˢᵐᵗ. Using the general‐purpose AMBER forcefields FF96 (Ref. 57; see RESULTS AND DISCUSSION for the reasons to include this forcefield), FF12SB, and FF14SB (Ref. 16) as references, these simulations were carried out to determine whether, in multiple, distinct, independent, unrestricted, unbiased, and classical NPT MD simulations, FF12MC or FF12MCsm can (i) reproduce the experimental J‐coupling constants of four cationic homopeptides (Ala3, Ala5, Ala7, and Val3)58 and of four folded globular proteins, namely, the third immunoglobulin‐binding domain of protein G (GB3),59, 60 bovine pancreatic trypsin inhibitor (BPTI),61 ubiquitin,62 and lysozyme;63 (ii) reproduce crystallographic B‐factors64 and nuclear magnetic resonance (NMR)‐derived Lipari‐Szabo order parameters65 of GB3, BPTI, ubiquitin, and lysozyme; (iii) simulate the experimentally observed flipping between left‐ and right‐handed configurations for the C14–C38 disulfide bond of BPTI and its mutant;66 (iv) autonomously fold the β‐hairpins of chignolin67 and CLN025 (Ref. 68) and the α‐miniprotein Trp‐cage (the TC10b sequence69) with folding times (τf) in agreement with experimental τf values;70, 71 (v) simulate subsequent unfolding and refolding of these sequences; and (vi) refine TMR01, TMR04, and TMR07, which are comparative models of proteins selected from the first Critical Assessment of protein Structure Prediction model Refinement (CASPR) experiment (http://predictioncenter.org/caspR/; note that subsequent model refinement experiments are called CASP rather than CASPR). Unless otherwise specified, all simulations described below are multiple, distinct, independent, unrestricted, unbiased, and classical NPT MD simulations.

METHODS

MD simulations of peptides, miniproteins, and folded globular proteins

A peptide or a miniprotein in a fully extended backbone conformation (or a globular protein in its folded state) was solvated with TIP3P water56 with or without surrounding counter ions and/or NaCl and then energy‐minimized for 100 cycles of steepest‐descent minimization followed by 900 cycles of conjugate‐gradient minimization to remove close van der Waals contacts using SANDER of AMBER 11 (University of California, San Francisco). The resulting system was heated from 0 K to a temperature of 274–340 K at a rate of 10 K/ps under constant temperature and constant volume, then equilibrated for 10⁶ timesteps under constant temperature and a constant pressure of 1 atm employing isotropic molecule‐based scaling, and finally simulated in 20 or 30 distinct, independent, unrestricted, unbiased, and classical NPT MD simulations using PMEMD of AMBER 11 with a periodic boundary condition at 274–340 K and 1 atm. The fully extended backbone conformations (viz., anti‐parallel β‐strand conformations) of Ala3, Ala5, Ala7, Val3, AAQAA, chignolin, CLN025, and Trp‐cage were generated by MacPyMOL Version 1.5.0 (Schrödinger LLC, Portland, OR). The folded globular protein structures of GB3, BPTI, the BPTI mutant, ubiquitin, and lysozyme were taken from the Protein Data Bank (PDB) structures with IDs 1P7E/1IGD, 5PTI/1PIT, 1QLQ, 1UBQ, and 4LZT, respectively. Four crystallographically determined interior water molecules (WAT111, WAT112, WAT113, and WAT122) were included in the 5PTI structure used as the initial conformation of the simulations. Likewise, five interior water molecules (WAT2017, WAT2023, WAT2025, WAT2072, and WAT2092) were included in the initial 1QLQ structure. CASPR models TMR01, TMR04, and TMR07 were downloaded from http://predictioncenter.org/caspR/. For TMR01, the cis amide bond of Ser70 was manually changed to the trans configuration, and all residues that were not determined in the corresponding crystal structure (PDB ID: 1XE1) were removed.
His28, His33, His44, and His68 of TMR04 were treated as HIE. His20, His51, and His53 of TMR07 were treated as HID. The numbers of TIP3P waters and surrounding ions, initial solvation box sizes, ionizable residues, and computers used for the NPT MD simulations are provided in Supporting Information Table S1. The 30 unique seed numbers for initial velocities of Simulations 1–30 are listed in Supporting Information Table S2. All simulations used (i) a dielectric constant of 1.0, (ii) the Berendsen coupling algorithm,72 (iii) the Particle Mesh Ewald method to calculate electrostatic interactions of two atoms at a separation of >8 Å,73 (iv) Δt = 0.10, 1.00, or 3.16 fsˢᵐᵗ, (v) SHAKE bond‐length constraints applied to all bonds involving hydrogen, (vi) a protocol to save the image closest to the middle of the “primary box” to the restart and trajectory files, (vii) a formatted restart file, (viii) the revised alkali and halide ion parameters,74 (ix) a cutoff of 8.0 Å for nonbonded interactions, (x) atomic masses of the entire simulation system (both solute and solvent) that were either unscaled or reduced uniformly by tenfold, and (xi) default values of all other inputs of the PMEMD module. For the simulations of Ala3, Ala5, Ala7, and Val3, the forcefield parameters of the cationic Ala (ALC) and the cationic Val (VAC), with their amino and carboxylate groups protonated at pH 2, were generated according to a published procedure using both α‐helix and β‐strand conformations for the RESP charge calculation.75 These forcefield parameters are provided in Supporting Information ALC.lib and VAC.lib. The forcefield parameters of FF12MC are available in the Supporting Information of Ref. 46.
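Several of the settings listed above correspond to standard AMBER `&cntrl` variables. The helper below is a hypothetical sketch that assembles such a namelist for one production run; the variable names (`ntb`, `ntp`, `ntt`, `ntc`, `ntf`, `cut`, `iwrap`, `ntxo`, `dt`, etc.) are real sander/PMEMD inputs, but this particular combination is an assumption, not the author's actual input file. Atomic‐mass scaling is not an `&cntrl` option and would have to be applied to the topology separately.

```python
def production_mdin(temp_k: float, dt_fs: float, nstlim: int, ntwx: int) -> str:
    """Return a hypothetical PMEMD &cntrl namelist for one NPT production run.

    dt_fs is the timestep in fs^smt; AMBER expects dt in ps, hence / 1000.
    """
    settings = {
        "imin": 0,          # molecular dynamics, not minimization
        "irest": 1,         # continue from the equilibrated restart file
        "ntx": 5,           # read coordinates and velocities from the restart
        "ntb": 2,           # periodic box at constant pressure (PME by default)
        "ntp": 1,           # isotropic position scaling
        "pres0": 1.01325,   # 1 atm expressed in bar, the unit of pres0
        "ntt": 1,           # Berendsen weak-coupling thermostat
        "temp0": temp_k,    # reference temperature in K
        "dt": round(dt_fs / 1000.0, 6),
        "nstlim": nstlim,   # number of timesteps
        "ntc": 2,           # SHAKE on bonds involving hydrogen
        "ntf": 2,           # skip forces for SHAKE-constrained bonds
        "cut": 8.0,         # nonbonded cutoff in Angstrom
        "iwrap": 1,         # wrap coordinates into the primary box
        "ntxo": 1,          # formatted (ASCII) restart file
        "ntwx": ntwx,       # trajectory-saving interval in timesteps
    }
    body = ",\n".join(f"  {key}={value}" for key, value in settings.items())
    return f"NPT production (hypothetical sketch)\n &cntrl\n{body},\n /\n"
```

For instance, `production_mdin(300.0, 1.0, 10**6, 1000)` yields a namelist with `dt=0.001` and `ntwx=1000`; everything not listed falls back to PMEMD defaults, matching item (xi) above.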

Aggregated native state population

Cα and Cβ root mean square deviation (CαβRMSD) was calculated using PTRAJ of AmberTools 1.5 with root mean square (RMS) fit of all Cα and Cβ atoms to the corresponding atoms in the reference structure without mass weighting. Cα root mean square deviation (CαRMSD) or all‐carbon root mean square deviation (CRMSD) was calculated similarly with RMS fit of all Cα atoms or all carbon atoms to the corresponding atoms in the reference structure, respectively.
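A minimal, dependency‐free sketch of an RMS‐fit RMSD without mass weighting is shown below. It is not PTRAJ's implementation: it assumes the caller has already selected matching atoms (e.g., Cα and Cβ for CαβRMSD), weights every atom equally, and finds the optimal superposition with Horn's quaternion method, using a small Jacobi eigenvalue routine in place of a linear‐algebra library. Function names are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def _centered(xyz: list[Coord]) -> list[Coord]:
    """Translate coordinates so that their unweighted centroid is at the origin."""
    n = len(xyz)
    cx = sum(p[0] for p in xyz) / n
    cy = sum(p[1] for p in xyz) / n
    cz = sum(p[2] for p in xyz) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in xyz]


def _max_eigenvalue(a: list[list[float]], sweeps: int = 50) -> float:
    """Largest eigenvalue of a small symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def fitted_rmsd(mobile: list[Coord], reference: list[Coord]) -> float:
    """RMSD after optimal rigid-body superposition, all atoms weighted equally."""
    if not mobile or len(mobile) != len(reference):
        raise ValueError("coordinate sets must be non-empty and of equal length")
    x, y = _centered(mobile), _centered(reference)
    # Correlation matrix s[i][j] = sum over atoms of x_i * y_j.
    s = [[sum(p[i] * q[j] for p, q in zip(x, y)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's 4x4 matrix; its largest eigenvalue is the maximum of sum(y . R x).
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _max_eigenvalue(key)
    g = sum(v * v for p in x for v in p) + sum(v * v for q in y for v in q)
    return math.sqrt(max(0.0, (g - 2.0 * lam) / len(x)))
```

A rotated and translated copy of a structure gives an RMSD of essentially zero, which is the property that makes the fitted RMSD suitable for comparing snapshots with a fixed native reference.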

In NPT MD simulations, chignolin could fold to its native β‐hairpin with Tyr2 and Trp9 on the same side of the hairpin67 [Fig. 1(A)] and to native‐like β‐hairpins with Tyr2 on one side of the hairpin and Trp9 on the other [Fig. 1(B)].47 Similarly, CLN025 could fold to native‐like β‐hairpins with Tyr1, Trp9, and Tyr10 on one side of the β‐sheet and Tyr2 on the other [Fig. 1(K)] or with Tyr1 and Trp9 on one side and Tyr2 and Tyr10 on the other [Fig. 1(L)] in NPT MD simulations,45 while the native conformations of CLN025 in the NMR and crystal structures have Tyr2 and Trp9 on one side of the β‐sheet and Tyr1 and Tyr10 on the other68 [Fig. 1(F,G)].

Figure 1.


Native and native‐like conformations of chignolin, CLN025, and Trp‐cage (TC10b). A: The NMR structure of chignolin. B: A native‐like conformation of chignolin generated by FF12SB. C: The average chignolin conformation of the largest cluster generated by FF12SB. D: The average chignolin conformation of the largest cluster generated by FF14SB. E: The average chignolin conformation of the largest cluster generated by FF12MC. F: The NMR structure of CLN025. G: The crystal structure of CLN025. H: The average CLN025 conformation of the largest cluster generated by FF12SB. I: The average CLN025 conformation of the largest cluster generated by FF14SB. J: The average CLN025 conformation of the largest cluster generated by FF12MC. K: A native‐like conformation of CLN025 generated by FF12SB. L: Another native‐like conformation of CLN025 generated by FF12SB. M: The NMR structure of the Trp‐cage. N: A native‐like conformation of the Trp‐cage generated by FF12MC. O: The average Trp‐cage conformation of the largest cluster generated by FF12MC.

The smallest CαβRMSD between one of the native‐like chignolin conformations and the chignolin NMR structure is 1.99 Å, whereas the corresponding CαRMSD and CRMSD are 1.58 Å and 3.92 Å, respectively [Fig. 1(B)]. The smallest CαβRMSD between one of the native‐like CLN025 conformations and the CLN025 NMR structure is 2.08 Å, whereas the corresponding CαRMSD and CRMSD are 1.33 Å and 4.71 Å, respectively [Fig. 1(K)]. The smallest CαβRMSD and CRMSD between the native and native‐like conformations of the Trp‐cage (TC10b) are 2.01 Å and 2.08 Å, respectively [Fig. 1(N)]. To distinguish conformations in the native state from those in native‐like states [Fig. 1(B,K,L,N)] or nonnative states, the CαβRMSD cutoff for the native state was set at 1.96 Å in this study. Although the CαβRMSD time series relative to the native conformations showed that AAQAA, chignolin, CLN025, and the Trp‐cage can fold to conformations with CαβRMSDs of ≤1.50 Å (Supporting Information Fig. S1), a cutoff of 1.96 Å rather than 1.50 Å was used because the CαβRMSD between the NMR and crystal structures of CLN025 is 1.95 Å [Fig. 1(G)]; a cutoff of 1.50 Å would exclude the crystal structure, which is commonly considered to be in the native state. The 1.96‐Å cutoff thus lies between this NMR–crystal difference (1.95 Å) and the smallest native‐like CαβRMSD (1.99 Å).

Therefore, the individual native state population of chignolin, CLN025, AAQAA, or the Trp‐cage in one MD simulation was calculated as the number of conformations with CαβRMSDs of ≤1.96 Å divided by the number of all conformations saved at every 10⁵ timesteps. Averaging the individual native state populations of a set of 20 or 30 distinct and independent simulations yielded the aggregated native state population for the set. The standard deviation (SD) and standard error (SE) of the aggregated native state population were calculated according to Eqs. (1) and (2) of Ref. 47, respectively, wherein N is the number of all simulations, Pᵢ is the individual native state population of the ith simulation, and P̄ is the aggregated native state population.
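The population calculation can be sketched as follows. Because Eqs. (1) and (2) of Ref. 47 are not reproduced here, this sketch assumes the usual sample SD (N − 1 denominator) and SE = SD/√N; the function names are hypothetical, and each inner list holds the CαβRMSDs (in Å) of the saved conformations of one simulation.

```python
import math


def individual_native_population(rmsds: list[float], cutoff: float = 1.96) -> float:
    """Fraction of saved conformations of one simulation with RMSD <= cutoff (Angstrom)."""
    return sum(1 for r in rmsds if r <= cutoff) / len(rmsds)


def aggregated_native_population(runs: list[list[float]], cutoff: float = 1.96):
    """Mean of the individual populations over N simulations, with the sample
    SD (N - 1 denominator) and SE = SD / sqrt(N); returns (mean, sd, se)."""
    pops = [individual_native_population(r, cutoff) for r in runs]
    n = len(pops)
    mean = sum(pops) / n
    sd = math.sqrt(sum((p - mean) ** 2 for p in pops) / (n - 1))
    return mean, sd, sd / math.sqrt(n)
```

Averaging per‐simulation populations, rather than pooling all frames, gives each independent simulation equal weight and makes the SE reflect run‐to‐run variability.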

Fractional helicity and α‐helix population of AAQAA

The experimentally determined fractional helicity (or mean helix content) of AAQAA at a specific temperature (in °C) was estimated by averaging component helicities that were obtained according to Eqs. (1) and (2) of Ref. 51 with Tₘ and ΔT values and their SDs taken from Table 1 of Ref. 51. Torsions ϕ and ψ of each residue in AAQAA were computed from 2 × 10⁷ conformations saved at every 10³ timesteps of 20 simulations of AAQAA (1.00 μsˢᵐᵗ or 3.16 μsˢᵐᵗ each) under the simulation conditions described above. The forcefield parameters for the Ala residue with amidation using NH₂ (ALN) were taken from Ref. 49. The computationally determined fractional helicity of AAQAA was calculated from ϕ and ψ as follows: a residue was considered to be in the α‐helical (viz., 3.6₁₃‐helical) conformation if it was one of four consecutive residues whose torsions ϕ and ψ were all within 20° of the reported values for the α‐helix (ϕ of −57° and ψ of −47°).76 The component fractional helicity of a residue in AAQAA was defined as the number of α‐helical conformations of that residue divided by the number of all conformations of AAQAA (viz., 2 × 10⁷). Averaging the component fractional helicities of residues 1–15 yielded the computationally determined fractional helicity of AAQAA. The α‐helix population of AAQAA was calculated from CαβRMSD as follows: cluster analysis of 20,000 conformations from the 20 3.16‐μsˢᵐᵗ simulations of AAQAA using FF12MC identified a full‐α‐helix conformation with hydrogen bonds involving the Ac and NH₂ terminal groups [Fig. 2(A)] as the most popular conformation (Supporting Information Table S3). Using this conformation as the native conformation, CαβRMSDs for all 2 × 10⁷ conformations of AAQAA were then calculated to determine the number of conformations with CαβRMSDs of ≤1.96 Å. Dividing this number by the number of all AAQAA conformations yielded the α‐helix population of AAQAA.
The SDs of the computationally determined fractional helicity and the α‐helix population were calculated using the same method for the SD of the aggregated native state population described above.

Table 1.

Mean Square Deviations and Root Mean Square Deviations Between Experimental and Calculated J‐Coupling Constants of Homopeptides Using Different Parameter Sets of the Karplus Equations

All χ² and RMSD entries are mean ± SE; RMSD values are in Hz.

| Peptide | Parameter set | χ² FF12SB | χ² FF14SB | χ² FF12MC | RMSD FF12SB | RMSD FF14SB | RMSD FF12MC |
|---|---|---|---|---|---|---|---|
| Ala3 | Original | 0.90 ± 0.01 | 0.90 ± 0.01 | 1.34 ± 0.00 | 0.41 ± 0.00 | 0.41 ± 0.00 | 0.50 ± 0.00 |
| | Schmidt | 1.02 ± 0.01 | 1.02 ± 0.01 | 1.17 ± 0.00 | 0.57 ± 0.00 | 0.56 ± 0.00 | 0.45 ± 0.00 |
| | DFT1 | 3.08 ± 0.05 | 3.02 ± 0.04 | 1.97 ± 0.00 | 0.76 ± 0.01 | 0.75 ± 0.00 | 0.58 ± 0.00 |
| | DFT2 | 1.37 ± 0.02 | 1.35 ± 0.02 | 1.31 ± 0.00 | 0.50 ± 0.00 | 0.49 ± 0.00 | 0.55 ± 0.00 |
| Ala5 | Original | 0.85 ± 0.03 | 0.88 ± 0.04 | 1.32 ± 0.00 | 0.35 ± 0.01 | 0.36 ± 0.01 | 0.50 ± 0.00 |
| | Schmidt | 0.92 ± 0.02 | 0.95 ± 0.03 | 1.16 ± 0.00 | 0.47 ± 0.00 | 0.48 ± 0.00 | 0.44 ± 0.00 |
| | DFT1 | 3.04 ± 0.04 | 3.05 ± 0.05 | 2.19 ± 0.01 | 0.70 ± 0.00 | 0.71 ± 0.01 | 0.61 ± 0.00 |
| | DFT2 | 1.33 ± 0.02 | 1.36 ± 0.03 | 1.38 ± 0.00 | 0.46 ± 0.00 | 0.46 ± 0.00 | 0.58 ± 0.00 |
| Ala7 | Original | 0.46 ± 0.02 | 0.52 ± 0.05 | 0.84 ± 0.00 | 0.33 ± 0.01 | 0.34 ± 0.01 | 0.49 ± 0.00 |
| | Schmidt | 0.55 ± 0.02 | 0.60 ± 0.04 | 0.75 ± 0.01 | 0.45 ± 0.00 | 0.46 ± 0.01 | 0.45 ± 0.00 |
| | DFT1 | 2.62 ± 0.04 | 2.70 ± 0.04 | 1.80 ± 0.01 | 0.68 ± 0.01 | 0.69 ± 0.00 | 0.60 ± 0.01 |
| | DFT2 | 0.96 ± 0.02 | 1.03 ± 0.03 | 0.92 ± 0.00 | 0.45 ± 0.00 | 0.46 ± 0.01 | 0.56 ± 0.00 |
| Val3 | Original | 1.71 ± 0.06 | 1.74 ± 0.03 | 0.76 ± 0.00 | 0.78 ± 0.01 | 0.79 ± 0.01 | 0.41 ± 0.00 |
| | Schmidt | 2.22 ± 0.04 | 2.38 ± 0.03 | 0.95 ± 0.00 | 1.01 ± 0.01 | 1.05 ± 0.01 | 0.59 ± 0.00 |
| | DFT1 | 5.27 ± 0.10 | 6.35 ± 0.10 | 2.90 ± 0.01 | 1.13 ± 0.01 | 1.22 ± 0.01 | 0.72 ± 0.01 |
| | DFT2 | 2.84 ± 0.06 | 3.17 ± 0.05 | 1.22 ± 0.01 | 0.85 ± 0.01 | 0.86 ± 0.01 | 0.45 ± 0.01 |
| Overall | All | 1.82 ± 0.01 | 1.94 ± 0.01 | 1.37 ± 0.00 | 0.62 ± 0.00 | 0.63 ± 0.00 | 0.53 ± 0.00 |
| Overall | No DFT1 | 1.26 ± 0.01 | 1.33 ± 0.01 | 1.09 ± 0.00 | 0.55 ± 0.00 | 0.56 ± 0.00 | 0.50 ± 0.00 |

χ², mean square deviation; RMSD, root mean square deviation; SE, standard error; smt, standard‐mass time. The experimental and calculated J‐coupling constants are listed in Tables S8 A–D. The mean and standard error of each χ² or RMSD were obtained from 20 distinct and independent 200‐million–timestep NPT MD simulations of a homopeptide at Δt = 1.00 fsˢᵐᵗ, 300 K, and 1 atm. The overall χ² (or RMSD) of a forcefield was obtained by averaging, with equal weight, all 16 χ² (or RMSD) values of that forcefield or the 12 values that exclude the DFT1 parameter set. The standard error of the overall χ² or RMSD was calculated using the standard method for propagation of errors of precision.

Figure 2.


The three most populated, instantaneous conformations of AAQAA observed in MD simulations using FF12MC. Numbers in red denote hydrogen bond lengths in Å. A: The full–α‐helix conformation showing hydrogen bonds of two terminal protecting groups. B: The α‐and‐π helical conformation showing the side‐chain•main‐chain hydrogen bond of Gln3, the side‐chain•side‐chain hydrogen bond of Gln8 and Gln13, and main‐chain•main‐chain hydrogen bonds in α and π helices. C: The α‐helix conformation showing substantial unfolding of the Ac, Ala1, and NH2 residues.

J‐coupling constant calculation

Using PTRAJ of AmberTools 1.5, torsions ϕ and ψ of each residue in a homopeptide were computed from all conformations saved at every 10³ timesteps of 20 simulations of the peptide with the simulation conditions described above. Similarly, torsions ϕ and ψ of each residue and torsion χ of each non‐glycine residue in a folded globular protein were computed from all conformations saved at every 10⁵ timesteps of 20 simulations of the protein. An instantaneous J‐coupling constant (Jᵢ in Hz) of a residue was calculated according to Supporting Information Eqs. (S1)–(S20) using one of the parameter sets described as follows. The Original parameters of Eqs. (S1)–(S5), (S6), (S7), and (S8) were taken from Refs. 62, 77, 78, and 79, respectively. The Schmidt parameters of Eqs. (S1)–(S5), (S6), (S7), and (S8) were taken from Refs. 77, 78, 80, and 79, respectively. The DFT1 and DFT2 parameters of Eqs. (S1)–(S5), (S6), (S7), and (S8) were taken from Refs. 77, 78, 79, and 81, respectively. The Original and Schmidt parameters of Eqs. (S9)–(S14) were taken from Ref. 82. The Best‐Fit and DFT parameters of Eqs. (S15)–(S20) were taken from Ref. 83. Averaging all instantaneous J‐coupling constants of a residue yielded the J‐coupling constant for that residue.
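As an illustration of the averaging step, the sketch below evaluates a generic Karplus relation, J = A cos²θ + B cosθ + C, for each saved conformation and averages the instantaneous values. For ³J(HN,Hα) the torsion is conventionally offset as θ = ϕ − 60°, which is the default here; the A, B, and C arguments are placeholders, not the Original, Schmidt, DFT1, or DFT2 values given in Eqs. (S1)–(S20), and the function names are hypothetical.

```python
import math


def karplus_j(phi_deg: float, a: float, b: float, c: float,
              offset_deg: float = -60.0) -> float:
    """Instantaneous J = A cos^2(theta) + B cos(theta) + C, theta = phi + offset."""
    theta = math.radians(phi_deg + offset_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


def averaged_j(phis_deg: list[float], a: float, b: float, c: float) -> float:
    """Average the instantaneous J values of one residue over all saved frames."""
    return sum(karplus_j(p, a, b, c) for p in phis_deg) / len(phis_deg)
```

Averaging J rather than ϕ matters because the Karplus relation is nonlinear: the J value of the mean torsion generally differs from the mean J over a fluctuating ensemble.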

The mean square deviation (χ²) between experimental and calculated J‐coupling constants was estimated according to Eq. (S21) with σᵢ values taken from Supporting Information Table S3 of Ref. 84. The mean and SE of a χ² value were obtained from 20 simulations using the same method as for the mean and SE of the aggregated native state population described above. The experimental J‐coupling constants of the four homopeptides were obtained from the supporting information of Ref. 58. The experimental J‐coupling constants of the four folded globular proteins were obtained from the supporting information of Ref. 17 for GB3 and ubiquitin, Ref. 85 for BPTI, and Refs. 63 and 85 for lysozyme. The simulation temperatures of the protein J‐coupling constant calculations were taken from Refs. 59 and 60 for GB3, Ref. 61 for BPTI, Ref. 62 for ubiquitin, and Ref. 63 for lysozyme.

The overall χ² value of a forcefield for peptide J‐coupling constants was obtained by averaging, with equal weight, all 16 χ² values of that forcefield in Table 1 or the 12 χ² values that exclude the DFT1 parameter set. Similarly, the overall χ² value of a forcefield for protein J‐coupling constants was obtained by averaging all four combined or main‐chain χ² values of the forcefield with equal weight. The SE of the overall χ² was calculated according to the standard method for propagation of errors of precision.86
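The sketch below shows one way to compute χ² and the equal‐weight overall value. Eq. (S21) is not reproduced here, so the normalized form (1/N)Σ[(J_calc − J_exp)/σᵢ]² is an assumption, as is the propagated SE of an equal‐weight mean of M independent values, √(Σ SEᵢ²)/M. Function names are hypothetical.

```python
import math


def chi_square(j_exp: list[float], j_calc: list[float], sigma: list[float]) -> float:
    """Mean of squared, sigma-normalized deviations (assumed form of Eq. S21)."""
    terms = [((c - e) / s) ** 2 for e, c, s in zip(j_exp, j_calc, sigma)]
    return sum(terms) / len(terms)


def overall_with_se(values: list[float], ses: list[float]) -> tuple[float, float]:
    """Equal-weight mean of M values and its SE, sqrt(sum SE_i^2) / M,
    assuming the individual errors are independent."""
    m = len(values)
    return sum(values) / m, math.sqrt(sum(se * se for se in ses)) / m
```

As a consistency check, averaging the 16 FF12MC χ² values of Table 1 with equal weight gives about 1.374, which matches the Overall (All) entry of 1.37.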

The Lipari‐Szabo order parameter prediction

Using a two‐step procedure and PTRAJ of AmberTools 1.5, the backbone N–H Lipari‐Szabo order parameter (S²)65 of a folded globular protein was predicted from all conformations saved at every 10³ timesteps of 20 simulations of the protein with the simulation conditions described above and additional conditions described in RESULTS AND DISCUSSION. The first step was to align all saved conformations onto the first saved one using RMS fit of all CA, C, N, and O atoms. The second step was to compute S² using the isotropic reorientational eigenmode dynamics (iRED) analysis method87 implemented in PTRAJ. Although the first step is unnecessary for the iRED analysis method,87 the explicit alignment was done in this study so that these conformations could later be used to compute S² with other analytical methods. PDB IDs 1P7E, 5PTI, 1UBQ, and 4LZT were used in the GB3, BPTI, ubiquitin, and lysozyme simulations to calculate their S² parameters. The temperatures of the simulations for GB3, BPTI, ubiquitin, and lysozyme were set at 297 K, 298 K, 300 K, and 308 K, respectively, according to the temperatures at which the experimental S² parameters were obtained.88, 89, 90, 91 The calculated S² parameters of each protein reported in Supporting Information Table S4 and Figure 3 are the averages of all S² parameters derived from 20 distinct and independent simulations of the protein. The SE of an S² parameter was calculated using the same method as for the SE of an aggregated native state population. The ability of a forcefield to reproduce the experimental S² parameters was assessed by the root mean square deviation (RMSD) between computed and experimental S² parameters. The experimental S² parameters extracted from ¹⁵N spin relaxation data for GB3, ubiquitin, lysozyme, and BPTI were obtained from the respective supporting information or the corresponding authors of Refs. 88, 89, 90, 91. The SE of an RMSD was calculated using the same method as for the SE of an S² value.
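The text notes that the explicit alignment allows S² to be computed later by other methods. One such method, shown below as a sketch, is the equilibrium (long‐time‐limit) formula applied to the aligned N–H bond vectors: S² = 1.5 Σₐᵦ⟨uₐuᵦ⟩² − 0.5 over unit vectors u. This is not the iRED method used in the study, and the function name is hypothetical.

```python
def order_parameter_s2(vectors: list[tuple[float, float, float]]) -> float:
    """S^2 = 1.5 * sum over a, b of <u_a u_b>^2 - 0.5 for one N-H bond,
    where u is the normalized bond vector in each pre-aligned frame."""
    units = []
    for x, y, z in vectors:
        norm = (x * x + y * y + z * z) ** 0.5
        units.append((x / norm, y / norm, z / norm))
    n = len(units)
    total = 0.0
    for a in range(3):
        for b in range(3):
            mean_ab = sum(u[a] * u[b] for u in units) / n
            total += mean_ab ** 2
    return 1.5 * total - 0.5
```

A bond vector that never reorients gives S² = 1, whereas vectors spread evenly along ±x, ±y, and ±z give S² = 0, the two limits of the Lipari‐Szabo order parameter.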

Figure 3.


Experimental and calculated Lipari‐Szabo order parameters of backbone N–H bonds of GB3, BPTI, ubiquitin, and lysozyme. The order parameters were calculated from 20 unbiased, unrestricted, distinct, independent, and 50‐pssmt NPT MD simulations using FF12MCsm or FF14SB with Δt = 0.1 fssmt.

The crystallographic B‐factor prediction

Using a two‐step procedure and PTRAJ of AmberTools 1.5, the crystallographic B‐factors of Cα and Cγ atoms in a folded globular protein were estimated from all conformations saved at every 10³ timesteps of 20 simulations of the protein, run under the simulation conditions described in The Lipari‐Szabo order parameter prediction. The first step was to align all saved conformations onto the first saved conformation by RMS fitting of all CA atoms (for Cα B‐factors) or all CG and CG2 atoms (for Cγ B‐factors) and to obtain an average conformation. The second step was to RMS fit all CA atoms (or all CG and CG2 atoms) of all saved conformations onto the corresponding atoms of the average conformation and then calculate the Cα (or Cγ) B‐factors using the “atomicfluct” command in PTRAJ. PDB IDs 1IGD, 1PIT, 1UBQ, and 4LZT were used in the GB3, BPTI, ubiquitin, and lysozyme simulations, respectively, to calculate their B‐factors. A truncated 1IGD structure (residues 6–61) was used for the GB3 simulations. The simulations for GB3, BPTI, and ubiquitin were done at 297 K, whereas the simulations of lysozyme were performed at 295 K. The calculated B‐factors of each protein reported in Supporting Information Table S5 and Figure 4 are the averages of all B‐factors derived from 20 distinct and independent simulations of the protein. The SE of a B‐factor was calculated using the same method as that for the SE of an S² parameter. The ability of a forcefield to reproduce the B‐factors was measured by the RMSD between computed and experimental B‐factors. The experimental B‐factors of GB3, BPTI, ubiquitin, and lysozyme were taken from the crystal structures of PDB IDs 1IGD, 4PTI, 1UBQ, and 4LZT, respectively. The SE of an RMSD was calculated using the same method as that for the SE of a B‐factor.
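The “atomicfluct” step converts positional fluctuations into B‐factors. A minimal sketch of the standard isotropic relation B = (8π²/3)⟨|r − ⟨r⟩|²⟩ is shown below; it assumes the coordinates have already been superposed as in the two‐step procedure above, and the function name is hypothetical rather than part of PTRAJ.

```python
import numpy as np


def bfactors_from_aligned_coords(coords):
    """Isotropic B-factors (Å^2) from superposed coordinates.

    coords: array of shape (n_frames, n_atoms, 3) in Å, already RMS-fitted
    onto the average conformation (e.g., all CA atoms for Cα B-factors).
    """
    x = np.asarray(coords, dtype=float)
    mean_structure = x.mean(axis=0)                    # (n_atoms, 3)
    sq_disp = ((x - mean_structure) ** 2).sum(axis=2)  # (n_frames, n_atoms)
    msf = sq_disp.mean(axis=0)                         # mean square fluctuation, Å^2
    return (8.0 * np.pi ** 2 / 3.0) * msf
```

For example, an atom that alternates between +0.5 Å and −0.5 Å along x has a mean square fluctuation of 0.25 Å² and hence B ≈ 6.58 Å², while an atom that never moves has B = 0.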

Figure 4.


Experimental and calculated crystallographic Cα B‐factors of GB3, BPTI, ubiquitin, and lysozyme. The B‐factors were calculated from 20 unbiased, unrestricted, distinct, independent, and 50‐pssmt NPT MD simulations using FF12MCsm or FF14SB with Δt = 0.1 fssmt.

Folding time estimation

The folding time (τf) of a peptide or miniprotein was estimated from its mean time to fold from a fully extended backbone conformation to its native conformation (hereafter abbreviated as mean time‐to‐folding) in 20 (for AAQAA and β‐hairpins) or 30 (for the Trp‐cage) distinct and independent NPT MD simulations, using survival analysis methods92 implemented in the R survival package Version 2.38–3 (http://cran.r‐project.org/package=survival). The previously described CαβRMSD cutoff of ≤1.96 Å was used to identify the native conformation. For each simulation, with conformations saved at every 10⁵ timesteps, the first time instant at which CαβRMSD reached ≤1.96 Å was recorded as an individual folding time (IFT; Supporting Information Fig. S1). For a set of simulations in which every simulation captured a folding event, the mean time‐to‐folding was calculated using the Kaplan‐Meier estimator93, 94 [the Surv() and survfit() functions in the R survival package]. If a parametric survival function [the survreg() function in the R survival package] fell mostly within the 95% confidence interval (95% CI) of the Kaplan‐Meier estimate for such a set, the parametric function was then used to calculate the mean time‐to‐folding of that set. If, for a first set of simulations in which every simulation captured a folding event, the mean time‐to‐folding derived from the Kaplan‐Meier estimator was nearly identical to that derived from a parametric survival function, the parametric function was then used to calculate the mean time‐to‐folding of a second set of simulations that was identical to the first except for a change in simulation time or forcefield.
When a parametric survival function was used to calculate the mean time‐to‐folding, not every simulation in a set had to capture a folding event, but more than half of the set had to do so to avoid an overly wide 95% CI.
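The survival analysis above can be sketched in Python as follows. This is not the R survival package: kaplan_meier() and km_mean() implement the product‐limit estimator and the area under its step curve (a restricted mean, which equals the plain average of the IFTs when every simulation folds), and exponential_mean() is one simple parametric choice, an exponential model with right censoring, whose maximum‐likelihood mean is the total observed time divided by the number of folding events. The paper does not state which parametric distribution was used (survreg() defaults to a Weibull model), and all function names and the example IFTs are hypothetical.

```python
import numpy as np


def kaplan_meier(times, folded):
    """Product-limit (Kaplan-Meier) estimate of the fraction still unfolded.

    times:  individual folding time (IFT) of each simulation, or the total
            simulation length if that simulation never folded (censored).
    folded: True where a folding event was captured, False if censored.
    Returns (event_times, survival) at each distinct folding time.
    """
    t = np.asarray(times, dtype=float)
    d = np.asarray(folded, dtype=bool)
    event_times = np.unique(t[d])
    survival, s = [], 1.0
    for ti in event_times:
        at_risk = np.sum(t >= ti)          # simulations not yet folded or censored
        n_events = np.sum((t == ti) & d)   # simulations folding at exactly ti
        s *= 1.0 - n_events / at_risk
        survival.append(s)
    return event_times, np.array(survival)


def km_mean(times, folded):
    """Area under the Kaplan-Meier step curve up to the last folding time."""
    t, s = kaplan_meier(times, folded)
    widths = np.diff(np.concatenate(([0.0], t)))
    heights = np.concatenate(([1.0], s[:-1]))  # survival level on each interval
    return float(np.sum(widths * heights))


def exponential_mean(times, folded):
    """MLE mean time-to-folding for an exponential model with right censoring."""
    t = np.asarray(times, dtype=float)
    n_events = int(np.sum(folded))
    # Mirror the rule above: more than half of the set must capture folding.
    if n_events <= len(t) / 2:
        raise ValueError("more than half of the simulations must fold")
    return float(t.sum() / n_events)


ift = [120.0, 340.0, 95.0, 610.0, 210.0]   # hypothetical IFTs (ns)
print(km_mean(ift, [True] * 5), exponential_mean(ift, [True] * 5))
```

With every simulation folded, both estimates coincide with the plain average of the IFTs; they diverge once some simulations are censored, which is where the parametric model becomes useful.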

CASPR model refinement evaluation and forcefield performance ranking

The average conformation of the largest cluster of a protein model, identified by the cluster analysis described below, was used as the refined model of the protein. This refined model was evaluated with nine quality scores (QSs): the sseRMSD score,37 the CαRMSD score, the GDT‐TS and GDT‐HA scores,95 the GDC‐all score,96 the RPF score,97 the LDDT score,98 the SphereGrinder score,99 and the CAD score.100 The sseRMSD score was calculated using PTRAJ of AmberTools 1.5 by RMS fitting, without mass weighting, the CA, C, N, and O atoms of selected residues in the refined model onto the corresponding atoms in the crystal structure; the selected residues are those defined as secondary structure elements in the crystal structure. The CαRMSD, GDT‐TS, and GDT‐HA scores were calculated using the TM‐score program.101 The GDC‐all score was calculated using the input of “LGA_49605 ‐gdc” at the LGA102 server (http://proteinmodel.org/AS2TS/LGA/lga.html). The RPF score was calculated using the RPF program (for Mac OS X) modified for the assessment of template‐based protein structure predictions of the 10th Critical Assessment of protein Structure Prediction (CASP10).97 This modified program was obtained from Dr. Yuanpen J. Huang of the Gaetano T. Montelione group. The LDDT score was calculated using the LDDT executable (for Mac OS X) downloaded from http://swissmodel.expasy.org/lddt/downloads/. The SphereGrinder score was calculated using the SphereGrinder server (http://spheregrinder.cs.put.poznan.pl). The CAD score was calculated with the all‐atom option for both target and model structures using the CAD score server (http://bioinformatics.ibt.lt/cad‐score/).
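The RMS fits used for the sseRMSD score (and elsewhere in this section) superpose coordinates by the least‐squares rotation. A minimal sketch of an unweighted fit using the Kabsch singular‐value‐decomposition method is shown below; Kabsch is a standard way to compute this rotation but is not necessarily PTRAJ's internal implementation, the function name is hypothetical, and atom selection (e.g., the CA, C, N, and O atoms of secondary structure residues) is assumed to have been done beforehand.

```python
import numpy as np


def kabsch_rmsd(model, reference):
    """RMSD (Å) after optimal unweighted superposition of model onto reference.

    model, reference: arrays of shape (n_atoms, 3) with atoms in matching order.
    Every atom has weight 1 (no mass weighting).
    """
    p = np.asarray(model, dtype=float)
    q = np.asarray(reference, dtype=float)
    p = p - p.mean(axis=0)             # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q                        # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ r.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

A rigidly rotated and translated copy of a structure gives an RMSD of essentially zero, which is a convenient sanity check before applying the function to real models.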

Cluster analysis and data processing

The conformational cluster analysis of a peptide or miniprotein was performed using PTRAJ of AmberTools 1.5 with the average‐linkage algorithm,103 an epsilon of 2.0 Å, and root mean square coordinate deviations calculated on all Cα and Cβ atoms for AAQAA, chignolin, CLN025, and the Trp‐cage. The cluster analysis of a folded globular protein was performed using the same protocol except that the root mean square coordinate deviation was calculated on the Cα atoms of all residues of GB3, BPTI, ubiquitin, and lysozyme, or on the Cα atoms of residues 9–91 for TMR01, residues 7–70 for TMR04, and residues 1–107 for TMR07 (for additional information see Supporting Information Tables S3 and S6). The torsional cluster analyses for BPTI and its mutant were conducted as follows. Using PTRAJ, a set of five consecutive torsions of the C14–C38 bond was calculated from each conformation saved at every 10⁵ timesteps of 20 distinct and independent simulations. The five torsions were defined as (i) :14@N :14@CA :14@CB :14@SG; (ii) :14@CA :14@CB :14@SG :38@SG; (iii) :14@CB :14@SG :38@SG :38@CB; (iv) :14@SG :38@SG :38@CB :38@CA; and (v) :38@SG :38@CB :38@CA :38@N. Each torsion set was then compared with all other sets, using the criterion that two torsion sets are different if any of the five torsions in one set differs by 60° or more from the corresponding torsion in the other set. The occurrence of a cluster was defined as the number of torsion sets in the cluster divided by the total number of torsion sets. No energy minimization was performed on the average conformation of any cluster. Radii of gyration were calculated using PTRAJ of AmberTools 1.5. Smoothed time series of CαβRMSD were generated by PRISM of GraphPad Software (La Jolla, California) using 32 neighbors on each side and a sixth‐order smoothing polynomial.
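The torsional clustering criterion can be sketched as below. The text defines only when two torsion sets differ (any of the five torsions differing by 60° or more); how sets are then grouped is not specified, so this sketch assumes a simple greedy leader scheme in which each set joins the first cluster whose representative it matches on all five torsions and otherwise starts a new cluster. Angle differences are taken on the circle, so −179° and 179° differ by 2°. All names are hypothetical.

```python
import numpy as np


def angular_difference(a, b):
    """Smallest absolute difference(s) between angles in degrees, in [0, 180]."""
    d = (np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) % 360.0
    return np.minimum(d, 360.0 - d)


def cluster_torsion_sets(torsion_sets, cutoff=60.0):
    """Greedy leader clustering of sets of five C14-C38 torsions (degrees).

    Two sets are treated as different if any torsion differs by >= cutoff.
    Returns a list of (representative_set, occurrence) sorted by occurrence,
    where occurrence = cluster size / total number of sets.
    """
    leaders, counts = [], []
    for t in np.asarray(torsion_sets, dtype=float):
        for k, lead in enumerate(leaders):
            if np.all(angular_difference(t, lead) < cutoff):
                counts[k] += 1
                break
        else:  # no matching cluster: t becomes a new representative
            leaders.append(t)
            counts.append(1)
    total = sum(counts)
    order = sorted(range(len(counts)), key=lambda k: counts[k], reverse=True)
    return [(leaders[k], counts[k] / total) for k in order]
```

A leader scheme depends on the order in which sets are visited; an alternative that avoids this is to build the full pairwise "same/different" matrix and cluster it, at quadratic cost in the number of saved conformations.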

RESULTS AND DISCUSSION

Use of different timestep sizes for forcefield evaluation

It was reported that FF14SB was unable to fold CLN025 in 10 500‐nssmt simulations at 277 K and Δt = 1.00 fssmt unless the atomic masses of the entire simulation system (both solute and solvent) were uniformly reduced by tenfold.45 The ability of FF14SB to fold CLN025 in the low‐mass simulations is attributed to the use of a long timestep size (Δt = 3.16 fslmt) in those simulations, which follows from the equivalence of mass scaling and timestep‐size scaling explained in INTRODUCTION. Because of this equivalence, the integration accuracy of a low‐mass simulation at Δt = 1.00 fssmt (viz., a standard‐mass simulation at Δt = 3.16 fssmt) can be assumed to be lower than that of a standard‐mass simulation at Δt = 1.00 fssmt. According to a theoretical analysis53 and a study of 160 submicrosecond or microsecond simulations that autonomously folded β‐hairpins at different Δts and temperatures,46 Δt = 3.16 fslmt for low‐mass simulations (or 3.16 fssmt for standard‐mass simulations) is still below the integration step size that causes fatal integration errors, as long as the simulations are performed at ≤340 K. Given this background, to compare FF12MC with FF12SB/FF14SB, standard‐mass simulations with FF12SB/FF14SB and Δt = 1.00 fssmt were used for peptides and miniproteins, so that the integration accuracy of these simulations was higher than that of the low‐mass simulations with FF12MC and Δt = 3.16 fslmt. Low‐mass simulations with FF14SB and Δt = 3.16 fslmt were used only for proteins or, in limited cases, for miniproteins, for direct comparison with low‐mass simulations with FF12MC and Δt = 3.16 fslmt.

Effect of Δt = 3.16 fssmt on quality of NPT MD simulations at a temperature of ≤340 K

As a measure of the integration accuracy or quality of an MD simulation, $\langle \Delta E^2 \rangle^{1/2} / \langle \Delta KE^2 \rangle^{1/2}$ is the ratio of the root mean square fluctuation of the total energy of the simulation system to the root mean square fluctuation of its kinetic energy; the lower the ratio, the higher the simulation quality.104, 105 Δt = 3.16 fssmt for standard‐mass simulations (or Δt = 3.16 fslmt for low‐mass simulations) is below the limit that causes serious integration errors in an NPT MD simulation whose thermostat keeps the system at a desired temperature (≤340 K) and transfers the energy accumulated from integration errors out of the system.46 Even so, this Δt may still be long enough to compromise the quality of the NPT simulation. To address this concern, the ratios were calculated from all NPT simulations described below to compare the integration accuracy of low‐mass NPT simulations using FF12MC and Δt = 3.16 fslmt with that of standard‐mass NPT simulations using FF12SB/FF14SB and Δt = 1.00 fssmt. The ratios of low‐mass microcanonical (NVE) MD simulations with FF12MC and Δt = 3.16 fslmt were not calculated because FF12MC is intended for low‐mass NPT MD simulations. All MD simulations carried out to validate FF14SB were reported to use Δt = 1.00 or 2.00 fssmt, a cutoff of 8.0 Å for nonbonded interactions, and the Particle Mesh Ewald method to calculate electrostatic interactions between atoms at separations of >8.0 Å.16 If the same protocol were used to calculate nonbonded interactions, and if the ratios of the low‐mass simulations using FF12MC and Δt = 3.16 fslmt were comparable to those of the standard‐mass simulations using FF12SB/FF14SB and Δt = 1.00 fssmt, it would be reasonable to conclude that Δt = 3.16 fssmt (or Δt = 3.16 fslmt) does not compromise the quality of the NPT MD simulations.
Indeed, Supporting Information Table S7 shows that the $\langle \Delta E^2 \rangle^{1/2} / \langle \Delta KE^2 \rangle^{1/2}$ ratios (mean ± SE) of all low‐mass NPT simulations using FF12MC at Δt = 3.16 fslmt range from 0.2405 ± 0.0004 to 0.3685 ± 0.0032, whereas the corresponding ratios of all standard‐mass NPT simulations using FF14SB at Δt = 1.00 fssmt range from 0.4096 ± 0.0007 to 0.5064 ± 0.0009. Further, the ratios range from 0.2984 ± 0.0009 to 0.3501 ± 0.0042 for low‐mass NPT simulations using FF14SBlm (FF14SB with all atomic masses uniformly reduced by tenfold) at Δt = 3.16 fslmt, and from 0.4945 ± 0.0014 to 0.5011 ± 0.0013 for standard‐mass NPT simulations using FF12MCsm at Δt = 1.00 fssmt. These data suggest that the use of Δt = 3.16 fssmt at a temperature of ≤340 K does not compromise the quality of a standard‐mass NPT MD simulation. However, Δt > 3.16 fssmt (such as Δt = 4.00 fssmt) at ≤340 K is not recommended without the hydrogen mass repartitioning scheme,53, 54, 55 because the quality of MD simulations under such conditions has not been thoroughly evaluated.
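The quality ratio can be computed directly from energy time series written during a run. A minimal sketch follows, assuming the total and kinetic energies have already been parsed into arrays sampled at the same time points (parsing of MD output files is not shown); the function name is hypothetical.

```python
import numpy as np


def energy_fluctuation_ratio(total_energy, kinetic_energy):
    """<dE^2>^(1/2) / <dKE^2>^(1/2) from the energy time series of one simulation.

    Lower values mean the total energy fluctuates little relative to the
    natural fluctuation of the kinetic energy, i.e., higher integration accuracy.
    """
    e = np.asarray(total_energy, dtype=float)
    ke = np.asarray(kinetic_energy, dtype=float)
    rms_de = np.sqrt(np.mean((e - e.mean()) ** 2))     # RMS fluctuation of total energy
    rms_dke = np.sqrt(np.mean((ke - ke.mean()) ** 2))  # RMS fluctuation of kinetic energy
    return float(rms_de / rms_dke)
```

One ratio is obtained per simulation; the mean ± SE reported for a set of simulations would then be computed over the per‐simulation ratios.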

Reproducing experimental J‐coupling constants

J‐coupling constants of homopeptides

Although it is debatable whether agreement between experimental and calculated J‐coupling constants indicates the goodness of fit of a forcefield,106 testing the ability of a forcefield to reproduce experimental J‐coupling constants has become part of forcefield evaluation.13, 14, 15, 16, 17 While the experimental J‐coupling constants of the cationic homopeptide Ala5 were used in parameterizing FF12SB and FF14SB,16 no experimental J‐coupling constants of any cationic homopeptide or folded globular protein were used to develop FF12MC. How well FF12MC reproduces experimental J‐coupling constants relative to FF12SB and FF14SB is therefore important to its critical evaluation, because the removal of torsions involving a nonperipheral sp3 atom in FF12MC, a radical difference between FF12MC and general‐purpose AMBER forcefields, may impair this ability. Accordingly, a J‐coupling constant calculation study was carried out to investigate the ability of FF12MC, relative to FF12SB and FF14SB, to reproduce the experimental J‐coupling constants of four cationic homopeptides (Ala3, Ala5, Ala7, and Val3) at pH 2 (Ref. 58). Homopeptide Gly3 was excluded from this study because a limited data set was used in some of the Karplus parameterizations.58

In general, results derived from fewer than 20 simulations are considered unreliable.107, 108 Therefore, in this study 20 distinct and independent simulations at 300 K and 1 atm were carried out for each of the four homopeptides. The calculated main‐chain J‐coupling constants of each peptide (³J(HN,Hα), ³J(HN,C′), ³J(Hα,C′), ³J(C′,C′), ³J(HN,Cβ), ¹J(N,Cα), ²J(N,Cα), and ³J(HN,Cα)) are listed in Supporting Information Table S8. Plots of the mean square deviation (χ²) between experimental and calculated J‐coupling constants against the logarithm of the number of timesteps suggest that the χ² values of all four peptides converged after ten million timesteps for FF12MC, FF12SB, and FF14SB (Supporting Information Fig. S2).
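The text calls χ² a mean square deviation, but the values in Table 2 (for example, GB3 main‐chain χ² = 2.01 with an RMSD of 0.94 Hz, whose square is only 0.88 Hz²) suggest that each deviation is scaled by an uncertainty before averaging. A common convention in forcefield benchmarking, assumed here rather than taken from this study, divides each deviation by the uncertainty σ of the corresponding Karplus equation: χ² = (1/N) Σ (J_calc − J_exp)²/σ². The σ values must be supplied by the user, and the function names are hypothetical.

```python
import numpy as np


def j_coupling_chi2(j_calc, j_exp, sigma):
    """Uncertainty-scaled mean square deviation between J-couplings (dimensionless)."""
    j_calc, j_exp, sigma = (np.asarray(x, dtype=float) for x in (j_calc, j_exp, sigma))
    return float(np.mean(((j_calc - j_exp) / sigma) ** 2))


def j_coupling_rmsd(j_calc, j_exp):
    """Plain root mean square deviation in Hz, without uncertainty scaling."""
    j_calc = np.asarray(j_calc, dtype=float)
    j_exp = np.asarray(j_exp, dtype=float)
    return float(np.sqrt(np.mean((j_calc - j_exp) ** 2)))
```

Scaling by σ keeps couplings with large intrinsic Karplus uncertainty from dominating the score, which is why χ² and RMSD can rank forcefields differently.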

When the main‐chain J‐coupling constants were calculated using the original parameters of the Karplus equations (the Original parameter set in Eqs. S1–S8; Refs. 62, 77, 78, 79), FF12SB and FF14SB reproduced the alanine constants better than FF12MC, whereas FF12MC reproduced the valine constants better than FF12SB and FF14SB (Table 1). Overall, the χ² values (mean ± SE) of FF12MC, FF12SB, and FF14SB are ≤1.34 ± 0.00, ≤1.71 ± 0.06, and ≤1.74 ± 0.03, respectively. The χ² values of FF12SB and FF14SB increased uniformly when alternative parameters of the Karplus equations (the Schmidt, DFT1, and DFT2 parameter sets in Supporting Information Table S9) were used to calculate the J‐coupling constants, whereas those of FF12MC increased uniformly only when the DFT1 parameter set was used.
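Each J‐coupling constant is obtained from a Karplus equation of the form ³J(θ) = A cos²θ + B cos θ + C, where θ is a phase‐shifted backbone or side‐chain torsion, and is averaged over all saved conformations (the average of J over frames, not J of the average angle). The sketch below illustrates this for ³J(HN,Hα) with θ = ϕ − 60° and the coefficients A = 7.09, B = −1.42, and C = 1.55 Hz of Hu and Bax (1997); these are used only for illustration and may not match the Original, Schmidt, DFT1, or DFT2 sets in Supporting Information Table S9. Function names are hypothetical.

```python
import numpy as np


def karplus(theta_deg, a, b, c):
    """Generic Karplus relation J = A cos^2(theta) + B cos(theta) + C (Hz)."""
    cos_t = np.cos(np.deg2rad(np.asarray(theta_deg, dtype=float)))
    return a * cos_t ** 2 + b * cos_t + c


def mean_j_hn_ha(phi_deg):
    """Ensemble-averaged 3J(HN,Ha) over a trajectory of phi angles (degrees).

    Illustrative coefficients (Hu and Bax, 1997); theta = phi - 60 degrees.
    """
    j = karplus(np.asarray(phi_deg, dtype=float) - 60.0, 7.09, -1.42, 1.55)
    return float(np.mean(j))


# A beta-like phi of -120 degrees gives a large coupling (~10 Hz),
# a helical phi of -60 degrees a small one (~4 Hz).
print(mean_j_hn_ha([-120.0]), mean_j_hn_ha([-60.0]))
```

Because the Karplus curve is nonlinear, averaging J over a trajectory that visits both regions gives a different value from evaluating J at the mean ϕ, so the per‐frame average is the quantity to compare with experiment.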

Doubling the simulation time for each of the 20 Val3 simulations using FF14SB did not reduce the χ² values (Supporting Information Table S10). Repeating the Val3 simulations using FF14SB and FF12MC with a cutoff of 9.0 Å for nonbonded interactions and the Particle Mesh Ewald method to calculate electrostatic interactions between atoms at separations of >9.0 Å resulted in χ² values that were statistically identical to those obtained with the cutoff of 8.0 Å (Supporting Information Fig. S2 and Supporting Information Table S10). The χ² values (mean ± SE) of FF14SB for Ala5 in this study (0.88 ± 0.04 for Original; 3.05 ± 0.05 for DFT1; 1.36 ± 0.03 for DFT2; Table 1) are consistent with the corresponding χ² values (0.89 ± 0.04 for Original; 2.71 ± 0.15 for DFT1; 1.22 ± 0.03 for DFT2) reported in Tables 1, 2, 3 of Ref. 16. The overall χ² values (mean ± SE) of FF12MC, FF12SB, and FF14SB are 1.37 ± 0.00, 1.82 ± 0.01, and 1.94 ± 0.01, respectively. These overall χ² values are reduced to 1.09 ± 0.00, 1.26 ± 0.01, and 1.33 ± 0.01, respectively, when the DFT1 dataset is excluded. These results show that FF12MC is on par with FF12SB and FF14SB in reproducing main‐chain J‐coupling constants of the four peptides, despite the removal of torsions involving a nonperipheral sp3 atom in FF12MC.

Table 2.

Mean Square Deviations and Root Mean Square Deviations Between Experimental and Calculated J‐Coupling Constants of Folded Globular Proteins Using the Original Parameters of the Karplus Equations

| Protein | Temp | Type of J | χ² ± SE, FF12MC | χ² ± SE, FF14SB | RMSD ± SE (Hz), FF12MC | RMSD ± SE (Hz), FF14SB |
| --- | --- | --- | --- | --- | --- | --- |
| GB3 | 298 K | Main‐chain | 2.01 ± 0.02 | 1.09 ± 0.02 | 0.94 ± 0.00 | 0.66 ± 0.01 |
| GB3 | 298 K | Side‐chain | 59.0 ± 0.3 | 56.7 ± 0.1 | 2.32 ± 0.00 | 2.25 ± 0.00 |
| GB3 | 298 K | Combined | 18.78 ± 0.08 | 17.43 ± 0.04 | 1.49 ± 0.00 | 1.34 ± 0.00 |
| BPTI | 309 K | Main‐chain | – | – | – | – |
| BPTI | 309 K | Side‐chain | 167.63 ± 0.07 | 159.4 ± 0.4 | 4.01 ± 0.00 | 3.91 ± 0.00 |
| BPTI | 309 K | Combined | 167.63 ± 0.07 | 159.4 ± 0.4 | 4.01 ± 0.00 | 3.91 ± 0.00 |
| Ubiquitin | 303 K | Main‐chain | 4.0 ± 0.1 | 1.04 ± 0.02 | 1.18 ± 0.01 | 0.67 ± 0.01 |
| Ubiquitin | 303 K | Side‐chain | 48.8 ± 0.2 | 36.9 ± 0.2 | 2.15 ± 0.00 | 1.84 ± 0.00 |
| Ubiquitin | 303 K | Combined | 21.3 ± 0.1 | 14.95 ± 0.07 | 1.63 ± 0.01 | 1.26 ± 0.00 |
| Lysozyme | 308 K | Main‐chain | 4.7 ± 0.1 | 1.34 ± 0.01 | 1.98 ± 0.02 | 1.05 ± 0.00 |
| Lysozyme | 308 K | Side‐chain | 149.4 ± 0.5 | 135.8 ± 0.2 | 4.10 ± 0.01 | 3.94 ± 0.00 |
| Lysozyme | 308 K | Combined | 104.4 ± 0.4 | 94.0 ± 0.2 | 3.58 ± 0.01 | 3.32 ± 0.00 |
| Overall | | Combined | 78.0 ± 0.1 | 71.5 ± 0.1 | 2.68 ± 0.00 | 2.46 ± 0.00 |
| Overall | | Main‐chain | 3.57 ± 0.05 | 1.16 ± 0.01 | 1.37 ± 0.01 | 0.79 ± 0.00 |

Temp, temperature; χ², mean square deviation; RMSD, root mean square deviation; SE, standard error; smt, standard‐mass time. The experimental and calculated J‐coupling constants are listed in Supporting Information Table S11. The mean and SE of each χ² or RMSD were obtained from 20 distinct and independent 316‐nssmt NPT MD simulations at Δt = 1.00 fssmt, 1 atm, and the temperature specified in the table. The overall combined χ² or RMSD of a forcefield was obtained by averaging the four combined values of the forcefield with equal weight; the overall main‐chain χ² or RMSD was obtained in the same way from the three available main‐chain values (no main‐chain values are listed for BPTI). The SE of an overall χ² or RMSD was calculated using the standard method for propagation of errors of precision.
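The equal‐weight averaging and error propagation described in the footnote can be checked against Table 2 itself. The sketch below (function name hypothetical) assumes the SEs of the individual proteins are independent, so the SE of an equal‐weight mean of n values is √(Σ SE_i²)/n; applied to the FF12MC values in Table 2, it reproduces the reported overall χ² values.

```python
import math


def equal_weight_mean(values, standard_errors):
    """Equal-weight mean and its propagated SE, assuming independent errors."""
    n = len(values)
    mean = sum(values) / n
    se = math.sqrt(sum(s * s for s in standard_errors)) / n
    return mean, se


# FF12MC, combined chi^2 of GB3, BPTI, ubiquitin, and lysozyme (Table 2)
combined = equal_weight_mean([18.78, 167.63, 21.3, 104.4], [0.08, 0.07, 0.1, 0.4])
# FF12MC, main-chain chi^2: only three proteins have main-chain values
main_chain = equal_weight_mean([2.01, 4.0, 4.7], [0.02, 0.1, 0.1])
print(combined)    # ≈ (78.03, 0.106); reported as 78.0 ± 0.1
print(main_chain)  # ≈ (3.57, 0.048); reported as 3.57 ± 0.05
```

The propagated SE is dominated by the least precise entry (here lysozyme, SE 0.4), which is a general property of adding independent errors in quadrature.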

Table 3.

Radii of Gyration of Experimental and Simulated Structures of Folded Globular Proteins and Related Alpha Carbon Root Mean Square Deviations of Crystal Structures from the Corresponding NMR or Simulated Structures

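| Protein | Structure | Temp (K) | No. of conformers | CαRMSD mean (Å) | CαRMSD SD (Å) | CαRMSD SE (Å) | RadGyr mean (Å) | RadGyr SD (Å) | RadGyr SE (Å) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GB3 | 1IGD (X‐ray) | ambient | 1 | – | – | – | 10.70 | – | – |
| GB3 | 2LUM (NMR) | 298 | 1 × 60 | 0.80 | – | – | 11.03 | 0.06 | 0.01 |
| GB3 | FF12MC | 297 | 20 × 1000 | 0.84 | 0.09 | 0.02 | 10.85 | 0.11 | 0.02 |
| GB3 | FF14SBlm | 297 | 20 × 1000 | 0.89 | 0.09 | 0.02 | 10.97 | 0.11 | 0.02 |
| GB3 | FF12MC | 297 | 20 × 3000 | 0.82 | 0.06 | 0.01 | 10.85 | 0.11 | 0.02 |
| GB3 | FF14SBlm | 297 | 20 × 3000 | 0.86 | 0.05 | 0.01 | 10.97 | 0.11 | 0.02 |
| BPTI | 5PTI (X‐ray) | ambient | 1 | – | – | – | 11.29 | – | – |
| BPTI | 1PIT (NMR) | 309 | 1 × 20 | 1.18 | – | – | 11.37 | 0.07 | 0.02 |
| BPTI | FF12MC | 309 | 20 × 1000 | 1.52 | 0.16 | 0.04 | 11.26 | 0.15 | 0.03 |
| BPTI | FF14SBlm | 309 | 20 × 1000 | 0.89 | 0.15 | 0.03 | 11.48 | 0.10 | 0.02 |
| Ubiquitin | 1UBQ (X‐ray) | ambient | 1 | – | – | – | 11.63 | – | – |
| Ubiquitin | 1D3Z (NMR) | 308 | 1 × 10 | 0.61 | – | – | 11.82 | 0.05 | 0.02 |
| Ubiquitin | FF12MC | 300 | 20 × 1000 | 1.54 | 0.32 | 0.07 | 11.66 | 0.15 | 0.03 |
| Ubiquitin | FF14SBlm | 300 | 20 × 1000 | 1.69 | 0.30 | 0.07 | 11.66 | 0.13 | 0.03 |
| Ubiquitin | FF12MC | 300 | 20 × 3000 | 1.69 | 0.31 | 0.07 | 11.68 | 0.20 | 0.04 |
| Ubiquitin | FF14SBlm | 300 | 20 × 3000 | 1.71 | 0.17 | 0.04 | 11.66 | 0.13 | 0.03 |
| Lysozyme | 4LZT (X‐ray) | 295 | 1 | – | – | – | 14.03 | – | – |
| Lysozyme | 1E8L (NMR) | 308 | 1 × 50 | 1.55 | – | – | 14.13 | 0.06 | 0.01 |
| Lysozyme | FF12MC | 308 | 20 × 1000 | 1.7 | 0.7 | 0.2 | 14.21 | 0.28 | 0.06 |
| Lysozyme | FF14SBlm | 308 | 20 × 1000 | 0.55 | 0.09 | 0.02 | 14.25 | 0.11 | 0.02 |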

CαRMSD, alpha carbon root mean square deviation between a crystal structure and an average NMR structure or an average structure obtained from 20 distinct and independent 316‐nssmt (20 × 1000 conformers; BPTI, lysozyme, and the shorter GB3 and ubiquitin runs) or 948‐nssmt (20 × 3000 conformers; GB3 and ubiquitin) NPT MD simulations; RadGyr, average of all radii of gyration of NMR structures or of instantaneous structures obtained from 20 distinct and independent NPT MD simulations; SD, standard deviation of CαRMSD or RadGyr calculated from 20 distinct and independent NPT MD simulations; SE, standard error of CαRMSD or RadGyr calculated from 20 distinct and independent NPT MD simulations.
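The radii of gyration in Table 3 were calculated with PTRAJ. A minimal sketch of the quantity itself, R_g = √(Σ w_i |r_i − r_c|² / Σ w_i) with r_c the weighted center, is shown below with uniform weights by default; pass atomic masses for a mass‐weighted value (which weighting was used here is not stated). The sketch also computes SE = SD/√n over n independent per‐simulation values, assuming that is how the SE columns were derived; this is consistent with the tabulated numbers (e.g., SD 0.11 Å and SE 0.02 Å for n = 20). Function names are hypothetical.

```python
import numpy as np


def radius_of_gyration(coords, weights=None):
    """Radius of gyration (Å) of one conformation; coords has shape (n_atoms, 3)."""
    x = np.asarray(coords, dtype=float)
    w = np.ones(len(x)) if weights is None else np.asarray(weights, dtype=float)
    center = (w[:, None] * x).sum(axis=0) / w.sum()
    sq_dist = ((x - center) ** 2).sum(axis=1)
    return float(np.sqrt((w * sq_dist).sum() / w.sum()))


def mean_sd_se(values):
    """Mean, sample SD, and SE = SD / sqrt(n) of independent per-simulation values."""
    v = np.asarray(values, dtype=float)
    sd = v.std(ddof=1)
    return float(v.mean()), float(sd), float(sd / np.sqrt(len(v)))
```

Treating each simulation as one independent sample, rather than each saved conformer, avoids understating the SE, because consecutive conformers within one trajectory are strongly correlated.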

J‐coupling constants of folded globular proteins

Before extending the J‐coupling constant calculation from short peptides to folded globular proteins, it is worth noting that proton resonance broadening is substantially greater in the proteins than in the peptides, and cross‐peaks involving these broadened resonances overlap with other peaks. Consequently, some ambiguity in assigning protein J‐coupling constants is inevitable. For example, there are two sets of J‐coupling constants for GB3, a 56‐residue protein with near‐perfect assignments of J‐coupling constants.17, 109 In Ref. 17, ³J(Hα,Hβ2) and ³J(Hα,Hβ3) are 3.99 and 2.13 Hz for Asp22 and 7.15 and 7.92 Hz for Gln35, respectively. In Ref. 109, ³J(Hα,Hβ2) and ³J(Hα,Hβ3) are 2.13 and 3.99 Hz for Asp22 and 7.92 and 7.15 Hz for Gln35, respectively. These discrepancies between datasets independently compiled by two well‐respected groups underscore the challenge of assigning J‐coupling constants without ambiguity. It is also worth noting that experimental J‐coupling constants are averaged on a millisecond timescale,110 whereas MD simulations of folded globular proteins are currently limited to the submicrosecond or microsecond timescale. Despite these challenges, testing the ability of a forcefield to reproduce protein J‐coupling constants has also become part of forcefield evaluation.13, 14, 15, 16, 17 To compare FF12MC with FF14SB, the main‐chain and side‐chain J‐coupling constants of GB3, BPTI, ubiquitin, and lysozyme were calculated as functions of torsions ϕ, ψ, and χ determined from 20 316‐nssmt simulations using each of the two forcefields. All calculated J‐coupling constants of the four proteins are listed in Supporting Information Table S11.

When the Original parameter sets (Supporting Information Table S9) were used to calculate the main‐chain and side‐chain J‐coupling constants, FF12MC and FF14SB reproduced the protein J‐coupling constants with overall χ² values (mean ± SE) of ≤78.0 ± 0.1 and ≤71.5 ± 0.1, respectively, for the combined main‐chain and side‐chain constants of the four proteins, and ≤3.57 ± 0.05 and ≤1.16 ± 0.01, respectively, for the main‐chain constants (Table 2). FF14SB thus performs markedly better than FF12MC in reproducing the main‐chain J‐coupling constants of folded globular proteins, a conclusion also supported by the overall RMSDs between experimental and calculated constants of the four proteins (Table 2). The same conclusion was reached when the other parameter sets (Supporting Information Table S9) were used, although their overall χ² values and RMSDs were larger than those of the Original parameter sets. Adding harmonic motion to the Karplus relation for spin‐spin coupling111 slightly improved the χ² values and RMSDs but led to the same conclusion. These results show that FF14SB outperforms FF12MC in reproducing the J‐coupling constants of the four proteins (Table 2). Given the challenges in reproducing protein J‐coupling constants described above, the relatively poor performance of FF12MC is not by itself sufficient to invalidate FF12MC. Nevertheless, it calls for a further evaluation of the ability of FF12MC to simulate the structure and dynamics of the four folded globular proteins.

Simulating folded globular protein structures

Radius of gyration and CαRMSD from crystal structure

Given the weak performance of FF12MC in reproducing main‐chain protein J‐coupling constants, it was reasonable to suspect that FF12MC might not be able to simulate the structure and dynamics of folded globular proteins. To address this concern, 20 316‐nssmt simulations of GB3 were carried out using FF12MC or FF14SBlm. These simulations used the crystal structure of PDB ID 1IGD as the initial conformation and were performed at 297 K, the temperature at which the NMR study determining the Lipari‐Szabo order parameters was conducted.88 The average of 20,000 conformers of GB3 saved at 100‐pssmt intervals of the 20 simulations using FF12MC has a CαRMSD of 0.84 Å relative to the crystal structure, while the corresponding CαRMSD of FF14SBlm is 0.89 Å (Table 3). The mean, SD, and SE of the radius of gyration of the 20,000 GB3 conformers obtained from the 20 simulations using FF12MC are 10.85 Å, 0.11 Å, and 0.02 Å, respectively, while the corresponding values for FF14SBlm are 10.97 Å, 0.11 Å, and 0.02 Å (Table 3). By comparison, the CαRMSD of the average of 60 NMR conformers is 0.80 Å; the mean, SD, and SE of the radius of gyration of the 60 NMR conformers are 11.03 Å, 0.06 Å, and 0.01 Å, respectively; and the radius of gyration of the crystal structure is 10.70 Å. Extending these simulations from 316 nssmt to 948 nssmt yielded statistically the same results (Table 3). The time series of the radius of gyration for the GB3 conformers derived from the 20 948‐nssmt simulations using FF12MC or FF14SBlm show no signs of unfolding (Supporting Information Fig. S3). Clearly, both FF12MC and FF14SBlm were able to maintain the experimentally determined GB3 structure in the 20 948‐nssmt simulations.

The GB3 simulations were repeated for ubiquitin under the same simulation conditions, except that they were performed at 300 K, the temperature used to spectroscopically determine the Lipari‐Szabo order parameters of ubiquitin.89 The results showed that FF12MC and FF14SBlm were able to maintain the experimentally determined ubiquitin structure in the 20 948‐nssmt simulations (Table 3 and Supporting Information Fig. S3). The GB3 simulations were also repeated for BPTI and lysozyme under the same simulation conditions. However, these simulations were not extended beyond 316 nssmt because, unlike GB3 and ubiquitin, BPTI and lysozyme have multiple disulfide bonds that restrain their folded conformations. These simulations were performed at 309 K and 308 K, the temperatures used in the experimental determination of the Lipari‐Szabo order parameters of BPTI91 and lysozyme,90 respectively. The results also show that FF12MC and FF14SBlm are able to maintain the experimentally determined BPTI and lysozyme structures in the 20 316‐nssmt simulations (Table 3 and Supporting Information Fig. S3). Interestingly, according to the CαRMSDs (Table 3), the backbone conformations of BPTI and lysozyme in the FF14SBlm simulations are more restrained than those in the FF12MC simulations and those of the corresponding NMR structures. Taken together, the data in Table 3 and Supporting Information Figure S3 show that, despite its weakness in reproducing main‐chain J‐coupling constants of GB3, ubiquitin, and lysozyme, FF12MC is able to simulate the experimentally determined conformations of GB3, BPTI, ubiquitin, and lysozyme in submicrosecond NPT MD simulations.

Simulating local motions of folded globular proteins

Genuine localized disorders of BPTI and its mutant

To investigate the ability of FF12MC and FF14SB to simulate experimentally observed localized structural variations, the BPTI simulations were analyzed in light of the report that the C14–C38 disulfide bond of BPTI adopts both left‐ and right‐handed configurations112 in the NMR structure determined at 309 K (PDB ID: 1PIT)61 (Fig. 5). Although C14–C38 of BPTI adopts the right‐handed configuration in three different crystal structures (PDB IDs: 4PTI, 5PTI, and 6PTI),61 the C14–C38 flipping observed in the NMR study was later confirmed by a crystal structure of a BPTI mutant determined at a data‐collection temperature of 290 K (PDB ID: 1QLQ).66 In this crystal structure, C14–C38 has the left‐handed configuration with an occupancy of 0.38 and the right‐handed configuration with an occupancy of 0.62. Further, the coexistence of the two configurations of C14–C38 was also observed in an ultrahigh‐resolution (0.86 Å) crystal structure of the same mutant at a data‐collection temperature of 100 K (PDB ID: 1G6X).113
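Disulfide handedness is commonly assigned from the sign of the central torsion χ3 (Cβ–Sγ–Sγ′–Cβ′), which corresponds to torsion (iii) in the torsional cluster analysis: right‐handed for χ3 > 0 and left‐handed for χ3 < 0. The classification rule used in this study is not spelled out, so this is an assumption. A minimal sketch that computes a dihedral angle from four atom positions (IUPAC sign convention) and the fraction of frames in each configuration, with hypothetical names:

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def handedness_occurrence(chi3_deg):
    """Fractions of frames with right-handed (chi3 > 0) and left-handed (chi3 < 0) C14-C38."""
    chi3 = np.asarray(chi3_deg, dtype=float)
    return float(np.mean(chi3 > 0)), float(np.mean(chi3 < 0))
```

In practice, χ3 would be evaluated with dihedral() on the CB and SG atoms of Cys14 and Cys38 in every saved conformation, and the resulting angles passed to handedness_occurrence().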

Figure 5.


The right‐ and left‐handed configurations of C14–C38 observed in the NMR structure of BPTI and the crystal structure of a BPTI mutant. The PDB IDs of the NMR and crystal structures are 1PIT and 1QLQ, respectively.

For the 20 simulations of BPTI using FF12MC at 309 K, starting from a 1PIT conformer that adopts the right‐handed C14–C38 configuration, torsional cluster analysis showed that the most and second most populated C14–C38 configurations over the first 3.16 nssmt were right‐handed (occurrence of 38%) and left‐handed (occurrence of 31%), respectively (Supporting Information Table S12). This trend persisted when the analysis was repeated over durations extended to 31.6 nssmt and 316 nssmt (Supporting Information Table S12). Repeating the simulations at 290 K for the BPTI mutant, starting from a 1QLQ conformer that adopts the left‐handed C14–C38 configuration, also showed the two most populated C14–C38 configurations over the first 3.16 nssmt to be right‐handed (occurrence of 22%) and left‐handed (occurrence of 16%). Extending these simulations to 31.6 nssmt and 316 nssmt yielded the same results, except that the left‐handed configuration became the most populated over the two longer durations (Supporting Information Table S12).

For FF14SBlm, the 1PIT simulations yielded the right‐handed configuration as the sole configuration over the durations of 3.16 and 31.6 nssmt, and a mix of the right‐handed configuration (occurrence of 98%) and the left‐handed configuration (occurrence of 1%) over the duration of 316 nssmt. The 20 simulations of 1QLQ using FF14SBlm under the same conditions as those for FF12MC showed that the right‐handed configuration was absent over the duration of 3.16 nssmt and present with occurrences of 1% and 2% over the durations of 31.6 and 316 nssmt, respectively (Supporting Information Table S12). These results suggest that FF14SBlm has the ability to lock C14–C38 into the right‐handed configuration observed in the crystal structures of 4PTI, 5PTI, and 6PTI. They also suggest that FF12MC has the ability to simulate the experimentally observed flipping between left‐ and right‐handed configurations for C14–C38 of BPTI and its mutant, presumably owing to the removal of torsions involving a nonperipheral sp3 atom. These distinct abilities prompted the following studies to further compare the ability of the two forcefields to simulate subtle localized structural variations.

The Lipari‐Szabo order parameters

The squared generalized order parameter (viz., the Lipari‐Szabo order parameter, S²) of a protein can be interpreted as a measure of the spatial restriction of an N–H bond in the protein, with S² = 0 indicating the highest degree of motion and S² = 1 indicating no motion.65 The central assumption underlying the order parameter is that the two stochastic processes of global and local motion are separable by at least an order of magnitude: global motions, such as the overall tumbling of a folded globular protein with correlation time τc, occur on timescales from a few to tens of nssmt, whereas local motions, such as those of backbone N–H bonds, occur on timescales of tens to hundreds of pssmt.114 In the context of this timescale of local motions, multiple sets of 20 standard‐mass simulations lasting up to 100 nssmt were performed with FF12MCsm and FF14SB to calculate the Lipari‐Szabo order parameters of main‐chain N–H bonds and compare them with those extracted from ¹⁵N spin relaxation data (S²) of GB3,88 ubiquitin,89 lysozyme,90 and BPTI,91 thereby comparing the ability of the two forcefields to simulate subtle backbone motions of folded globular proteins. In this study, Δt = 1 fssmt was used for simulations lasting 100 nssmt, while Δt = 0.1 fssmt was used for simulations lasting 50–500 pssmt. FF12MCsm/FF14SB with Δt = 0.1 fssmt were used for the short simulations to ensure adequate sampling.
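The separability assumption is what makes the model‐free formalism work. In the standard Lipari–Szabo form, the N–H reorientational correlation function factorizes into overall tumbling and internal motion, C(t) = e^(−t/τc)[S² + (1 − S²)e^(−t/τe)], where τe is an effective internal correlation time, and the corresponding spectral density is J(ω) = (2/5)[S²τc/(1 + ω²τc²) + (1 − S²)τ/(1 + ω²τ²)] with 1/τ = 1/τc + 1/τe. A minimal sketch follows; the function names and the illustrative parameter values are hypothetical.

```python
import numpy as np


def lipari_szabo_correlation(t, s2, tau_c, tau_e):
    """Model-free C(t): overall tumbling times internal motion (same time units)."""
    t = np.asarray(t, dtype=float)
    return np.exp(-t / tau_c) * (s2 + (1.0 - s2) * np.exp(-t / tau_e))


def lipari_szabo_spectral_density(omega, s2, tau_c, tau_e):
    """Model-free J(omega); omega in rad per time unit of tau_c and tau_e."""
    omega = np.asarray(omega, dtype=float)
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)
    return 0.4 * (s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
                  + (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2))


# Illustrative values in ps: S2 = 0.85, tau_c = 4000 (4 ns), tau_e = 50.
# For tau_e << t << tau_c, C(t) approaches the plateau S2 * exp(-t / tau_c).
t = np.array([0.0, 500.0])
print(lipari_szabo_correlation(t, 0.85, 4000.0, 50.0))
```

With τe = 50 ps and τc = 4 ns, the internal term has decayed by t = 500 ps while overall tumbling has barely begun, which is the window in which S² can be read off.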

According to RMSDs between computed and experimental S2 parameters (Supporting Information Table S13), FF12MCsm reproduced the experimental parameters of all four proteins with RMSDs ± SEs ranging from 0.063 ± 0.005 to 0.074 ± 0.002 on the timescale of 50 pssmt. For FF12MCsm, the S2 RMSDs of GB3 are insensitive to simulation time, whereas those of the other proteins vary with simulation time, and FF12MCsm best reproduced the experimental parameters of those proteins on the timescale of 50 pssmt (Supporting Information Table S13). All S2 parameters calculated on the timescale of 50 pssmt are shown in Figure 3, with their SEs listed in Supporting Information Table S4. By comparison, FF14SB reproduced the experimental parameters on the timescale of 50 pssmt with RMSDs ± SEs ranging from 0.050 ± 0.002 to 0.074 ± 0.002, but it best reproduced the experimental data, with RMSDs ± SEs ranging from 0.041 ± 0.003 to 0.061 ± 0.002, on the timescale of 4 nssmt (Supporting Information Table S13). The S2 RMSDs of FF14SB are generally less sensitive to simulation time than those of FF12MCsm (Supporting Information Table S13). Although the S2 simulations using FF12MCsm and FF14SB lasted up to 100 nssmt, neither forcefield gave its best S2 parameters on a timescale close to five times the τc of each of the four proteins. This is partly because the protein stiffness exhibited in the simulations using FF12MCsm or FF14SB differs from that obtained with the forcefield ff99SB_φψ(g24;CS), which led to the recommendation of five times τc for the best S2 estimation.115
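The RMSD ± SE values above can be read as one RMSD per simulation, summarized across the 20 independent simulations. The sketch below assumes exactly that averaging order (the source does not spell it out) and takes SE as SD/√n; `rmsd`, `mean_and_se`, and `rmsd_over_simulations` are hypothetical helper names. The same helpers apply unchanged to the B‐factor comparison that follows.

```python
import math
from statistics import mean, stdev


def rmsd(calculated, experimental):
    """Root-mean-square deviation between paired calculated and experimental values."""
    if len(calculated) != len(experimental):
        raise ValueError("values must be paired residue by residue")
    squared = [(c - e) ** 2 for c, e in zip(calculated, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def mean_and_se(values):
    """Mean and standard error (sample SD / sqrt(n)) across independent simulations."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def rmsd_over_simulations(per_simulation_values, experimental):
    """Compute one RMSD per simulation, then summarize the RMSDs as mean and SE."""
    return mean_and_se([rmsd(calc, experimental) for calc in per_simulation_values])
```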

Because the experimental S2 parameters were extracted from ¹⁵N spin relaxation data on the picosecond timescale, and given the premise that global and local motions are separable by at least an order of magnitude, the results of the nanosecond simulations suggest that FF12MC is on par with FF14SB in reproducing the experimental S2 parameters of GB3, ubiquitin, lysozyme, and BPTI on the timescale of 50 pssmt (Fig. 3), although FF14SB reproduces the experimental values better than FF12MC on the timescale of 4 nssmt, which lies within the range of the τcs (2.0–5.7 nssmt) of the four proteins88, 90, 116, 117 (Supporting Information Table S13). These results also prompted the following confirmatory study of crystallographic B‐factors, which are akin to the S2 parameters.

Crystallographic B‐factors

As a measure of the uncertainty of the atomic mean position, the crystallographic B‐factor of a given atom reflects the displacement of the atom from its mean position in a crystal structure. This displacement attenuates X‐ray scattering and is caused by both the thermal motion of the atom and its static disorder in the crystal lattice.64, 118, 119, 120, 121, 122 Despite the challenge of separating the temporal component (thermal motion) from the spatial component (disorder),123 crystallographic B‐factors can often be used to quantitatively identify less mobile regions of crystal structures, provided the structures were determined without substantial crystal lattice defects, rigid‐body motions, or refinement errors.124, 125 A low B‐factor indicates a small degree of motion, whereas a high B‐factor may imply a large degree of motion.

In this context, to further evaluate the ability of FF12MC relative to FF14SB to simulate subtle thermal motions of a crystalline protein, simulated B‐factors were obtained from atomic positional fluctuations calculated from 20 simulations of a folded globular protein in its solution state on the picosecond timescale using FF12MCsm or FF14SB. Although simulations of proteins in their crystalline states126, 127 can offer more direct comparisons with the experimental data, solution‐state simulations were used in this study because crystalline‐state simulations are more computationally demanding, owing to the larger size and slower convergence127 of the crystalline system. Further, a reported study found FF14SB to be the best at reproducing experimental structural and dynamic properties among the four contemporary forcefields FF99SB, FF14SB, FF14ipq, and CHARMM26.127 A direct comparison of FF12MC with FF14SB in solution‐state simulations can therefore offer insight into the ability of FF12MC to reproduce crystallographic B‐factors.
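For isotropic motion, a simulated B‐factor follows from the mean‐square positional fluctuation through the standard relation B = (8π²/3)⟨|r − ⟨r⟩|²⟩. The sketch below applies that relation per atom, assuming the frames are already superimposed on a reference structure; it illustrates the conversion only and is not the exact protocol of this study. `isotropic_b_factors` is a hypothetical name.

```python
import math


def isotropic_b_factors(trajectory):
    """B = (8*pi^2/3) * <|r - <r>|^2> for each atom.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples in
    angstroms, one per atom, assumed already superimposed on a reference.
    Returns one B-factor (in A^2) per atom.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    b_factors = []
    for a in range(n_atoms):
        # mean position of atom a over all frames
        mean_pos = [sum(frame[a][k] for frame in trajectory) / n_frames for k in range(3)]
        # mean-square positional fluctuation about that mean position
        msf = sum(
            sum((frame[a][k] - mean_pos[k]) ** 2 for k in range(3)) for frame in trajectory
        ) / n_frames
        b_factors.append(8.0 * math.pi ** 2 / 3.0 * msf)
    return b_factors
```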

Accordingly, the simulations for the S2 calculations were repeated at different temperatures. For GB3, BPTI, and ubiquitin, all simulations were performed at an ambient temperature of 297 K because the exact data‐collection temperatures of these proteins had not been reported. The lysozyme simulations were performed at the reported data‐collection temperature of 295 K.128 According to RMSDs between computed and experimental B‐factors (Supporting Information Table S14), FF12MCsm best reproduces the experimental Cα and Cγ B‐factors of all four proteins on the timescale of 50 pssmt, with RMSDs ± SEs ranging from 3.2 ± 0.2 to 8 ± 1 Å² (average RMSD ± SE of 5.1 ± 0.3 Å²) and from 7.8 ± 0.8 to 9.9 ± 0.7 Å² (average RMSD ± SE of 9 ± 2 Å²), respectively. FF14SB also best reproduces the experimental Cα and Cγ B‐factors of all four proteins on the timescale of 50 pssmt, with RMSDs ± SEs ranging from 3.7 ± 0.1 to 9 ± 1 Å² (average RMSD ± SE of 6.2 ± 0.3 Å²) and from 8.5 ± 0.3 to 10.3 ± 0.2 Å² (average RMSD ± SE of 9.1 ± 0.5 Å²), respectively. For both FF12MC and FF14SB, the B‐factor RMSDs of BPTI and ubiquitin vary more with simulation time than those of GB3 and lysozyme (Supporting Information Table S14). All Cα B‐factors calculated on the timescale of 50 pssmt are shown in Figure 4, with their standard errors listed in Supporting Information Table S5.

These results show that FF12MC is on par with FF14SB in reproducing the crystallographic B‐factors of the four proteins (Fig. 4). The results also demonstrate that FF12MC and FF14SB best reproduce the crystallographic B‐factors on the timescale of 50 pssmt. This timescale corroborates the finding that FF12MC best reproduces the experimental S2 parameters on the timescale of 50 pssmt, suggesting that the calculated S2 parameters and B‐factors on the 50‐pssmt timescale from the simulations using FF12MC and FF14SB may capture the true thermal fluctuations of folded globular proteins.

How well FF12MC is trained to fold AAQAA

Given the encouraging results of FF12MC relative to those of FF14SB in all the afore‐described studies except the main‐chain protein J‐coupling constant calculation, this FF12MC evaluation study turned to examining the ability of FF12MC to autonomously fold a short helical peptide AAQAA relative to those of FF12SB and FF14SB. Because AAQAA has been widely used for folding research and forcefield refinement,12, 13, 84, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140 multiple sets of 20 simulations using FF12MC, FF12SB, and FF14SB to fold AAQAA were first carried out to determine how well FF12MC is trained—with a simple adjustment of two backbone scaling factors—to fold AAQAA relative to FF12SB and FF14SB.

Before describing the folding results for AAQAA, it is worth noting that, for at least three reasons, AAQAA is not a typical α‐helical peptide that exists as a mixture of full α‐helix, full coil, and central helices with frayed ends. First, a small percentage of AAQAA was found to intermittently adopt conformations with an α‐helix component at the N‐terminus and a π‐helix component in a region near the C‐terminus,129 where the π helix is the 4.4₁₆‐helix found in 15% of known protein structures.141 Second, a study using a polarizable forcefield revealed a cooperative folding process in which the helical conformation propagates throughout AAQAA once it is nucleated.140 Third, the NMR‐derived residue distribution of helical content of AAQAA51 could not be predicted by the traditional Lifson‐Roig model,142, 143, 144 a statistical mechanical model for predicting the mean helical content (viz., fractional helicity) of a typical α‐helical peptide. Side‐chain–side‐chain and side‐chain–main‐chain interactions had to be added to the traditional Lifson‐Roig model to reproduce the experimentally observed residue distribution of helical content of AAQAA.51

Therefore, no attempt was made to compute the Lifson‐Roig parameters of AAQAA in the present study. As described in METHODS, the fractional helicity of AAQAA was estimated from torsions ϕ and ψ using a simple protocol based on the local α‐helix content of four consecutive residues. Because of the considerable overlap of the ϕ and ψ torsions between α‐ and π‐helices76 and the subjective nature of defining the ϕ and ψ ranges that identify 3₁₀‐, α‐, and π‐helices, no attempt was made to include all components of the three helices in the fractional helicity estimation.
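A torsion‐based helicity estimate of this kind might look like the sketch below. The ϕ/ψ box and the rule that a residue counts as helical when it lies in at least one run of four consecutive α residues are assumptions chosen for illustration only; the actual ranges and rule are given in METHODS, which is not reproduced here, and all function names are hypothetical.

```python
def is_alpha(phi, psi, phi_range=(-100.0, -30.0), psi_range=(-80.0, -5.0)):
    """Hypothetical alpha-helical (phi, psi) box in degrees (not the paper's ranges)."""
    return phi_range[0] <= phi <= phi_range[1] and psi_range[0] <= psi <= psi_range[1]


def frame_helicity(torsions, window=4):
    """Fraction of residues that lie in at least one run of `window` consecutive alpha residues.

    torsions: list of (phi, psi) pairs, one per residue, for a single frame.
    """
    alpha = [is_alpha(phi, psi) for phi, psi in torsions]
    helical = [False] * len(alpha)
    for start in range(len(alpha) - window + 1):
        if all(alpha[start:start + window]):
            # every residue in an all-alpha window is counted as helical
            for k in range(start, start + window):
                helical[k] = True
    return sum(helical) / len(helical)


def fractional_helicity(trajectory, window=4):
    """Mean per-frame helicity over a trajectory of per-residue (phi, psi) lists."""
    return sum(frame_helicity(frame, window) for frame in trajectory) / len(trajectory)
```

Requiring a full run of four residues keeps isolated residues that happen to fall in the α box from being counted as helix.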

To substantiate the torsion‐based protocol, the mean helix content was also estimated from the global α‐helix content of AAQAA according to the CαβRMSD from the most populated helical conformation (see METHODS). This alternative is reasonable as long as the population of the second most populated conformation of AAQAA is substantially smaller than that of the most populated helical conformation. If the fractional helicity derived from the first protocol is only slightly higher than that derived from the second protocol, both protocols are considered reasonable. Indeed, according to the cluster analysis of 20 3.16‐μssmt simulations of AAQAA using FF12MC at 274 K (Supporting Information Table S3), the representative instantaneous conformation of the largest cluster of AAQAA at 274 K is a full α‐helix conformation [Fig. 2(A)] with a population of 41.7%. The second most populated conformation at 274 K has Ala1–Ala5 adopting an α‐helix, Ala2–Ala7 adopting a hybrid between α‐helix and π helix, Ala3–Ala9 adopting a π helix, and Ala6–Ala15 adopting a π helix [Fig. 2(B)]; this conformation has a population of 8.0%. The third most populated conformation at 274 K is a partial α‐helix conformation with frayed residues Ac, Ala1, and NH2 [Fig. 2(C)], with a population of 3.5% (Supporting Information Table S3). The populations of the three most populated conformations decreased to 18.9%, 5.8%, and 3.0% at 300 K and to 14.4%, 5.6%, and 2.1% at 310 K, respectively, but their rank order at 300 K and 310 K is the same as at 274 K (Supporting Information Table S3). These results support both the use of the alternative protocol to estimate the mean helix content of AAQAA and the use of the most populated full α‐helix conformation [Fig. 2(A)] as the native conformation of AAQAA for the autonomous folding study described below.

According to the smoothed time series of CαβRMSD from the native conformation, the aggregated native state populations, and the estimated folding times of AAQAA at different temperatures (Table 4 and Supporting Information Fig. S1A–C), FF12MC, FF12SB, and FF14SB can autonomously fold AAQAA from a fully extended backbone conformation to the native conformation and simulate subsequent unfolding and refolding in all (for FF12MC) or some (for FF12SB and FF14SB) of 20 simulations.
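The aggregated native state population and its SD and SE across independent simulations can be computed as sketched below. The RMSD cutoff is a hypothetical parameter (the source's criterion is in METHODS), the smoothing of the CαβRMSD time series is omitted, and SE is taken as SD/√n; the function names are illustrative only.

```python
import math
from statistics import mean, stdev


def native_population(rmsd_series, cutoff):
    """Fraction of frames whose RMSD from the native conformation is <= cutoff (angstroms)."""
    return sum(1 for r in rmsd_series if r <= cutoff) / len(rmsd_series)


def aggregated_population(simulations, cutoff):
    """Native state population (%) as mean, SD, and SE (= SD / sqrt(n)) over simulations."""
    pops = [100.0 * native_population(series, cutoff) for series in simulations]
    sd = stdev(pops)
    return mean(pops), sd, sd / math.sqrt(len(pops))
```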

Table 4.

Folding of a Helical Peptide, Hairpins, and a Miniprotein Trp‐Cage in Isothermal–Isobaric Molecular Dynamics Simulations at 1 atm

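| Forcefield | Sequence | Temperature (K) | Aggregated simulation time (μssmt) | Native state population, mean (%) | SD | SE | Estimated folding time, mean (nssmt) | LCL | UCL | Event |
|---|---|---|---|---|---|---|---|---|---|---|
| FF12MC | Chignolin | 277 | 20 × 3.16 | 47 | 11 | 2 | 153 | 99 | 237 | 20 |
| FF12MC | Chignolin | 300 | 20 × 3.16 | 33 | 10 | 2 | 79 | 51 | 123 | 20 |
| FF12MC | CLN025 | 277 | 20 × 3.16 | 70 | 15 | 3 | 433 | 279 | 671 | 20 |
| FF12MC | CLN025 | 300 | 20 × 3.16 | 63 | 13 | 3 | 174 | 112 | 270 | 20 |
| FF12MC | AAQAA | 274 | 20 × 3.16 | 41 | 8 | 2 | 189 | 122 | 293 | 20 |
| FF12MC | AAQAA | 300 | 20 × 3.16 | 18 | 3 | 1 | 143 | 92 | 221 | 20 |
| FF12MC | AAQAA | 310 | 20 × 3.16 | 14 | 3 | 1 | 92 | 59 | 142 | 20 |
| FF12MC | Trp‐cage | 280 | 30 × 8.848 | 18 | 8 | 1 | 1998 | 1396 | 2860 | 30 |
| FF12MC | Chignolin | 277 | 20 × 1.00 | 40 | 18 | 4 | 153 | 99 | 237 | 20 |
| FF12MC | CLN025 | 277 | 20 × 1.00 | 41 | 31 | 7 | 446 | 281 | 708 | 18 |
| FF12MC | AAQAA | 274 | 20 × 1.00 | 38 | 13 | 3 | 189 | 122 | 293 | 20 |
| FF12SB | Chignolin | 277 | 20 × 1.00 | 3 | 7 | 2 | 871 | 506 | 1500 | 13 |
| FF12SB | CLN025 | 277 | 20 × 1.00 | 4 | 10 | 2 | | | | 4 |
| FF12SB | AAQAA | 274 | 20 × 1.00 | 2 | 7 | 2 | 1287 | 712 | 2326 | 11 |
| FF12SB | AAQAA | 300 | 20 × 1.00 | 5 | 6 | 1 | 416 | 265 | 651 | 19 |
| FF12SB | AAQAA | 310 | 20 × 1.00 | 4 | 3 | 1 | 250 | 159 | 391 | 19 |
| FF12SBlm | CLN025 | 277 | 20 × 3.16 | 22 | 29 | 6 | 3328 | 1889 | 5863 | 12 |
| FF14SB | Chignolin | 277 | 20 × 1.00 | 19 | 30 | 7 | 550 | 342 | 886 | 17 |
| FF14SB | CLN025 | 277 | 20 × 1.00 | 7 | 14 | 3 | 1012 | 600 | 1708 | 14 |
| FF14SB | AAQAA | 274 | 20 × 1.00 | 4 | 5 | 1 | 1224 | 677 | 2213 | 11 |
| FF14SBlm | CLN025 | 277 | 20 × 3.16 | 38 | 30 | 7 | 1366 | 860 | 2170 | 18 |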

SD, standard deviation; SE, standard error; LCL, lower 95% confidence limit; UCL, upper 95% confidence limit; smt, standard‐mass time; event, the number of simulations that captured a folding event.

For the 20 3.16‐μssmt low‐mass simulations of AAQAA using FF12MC at Δt = 1.00 fssmt, the aggregated native state populations (viz., the α‐helix populations, mean ± SD) are 41 ± 8% at 274 K, 18 ± 3% at 300 K, and 14 ± 3% at 310 K (Table 4). These populations are slightly smaller than the estimated fractional helicities (mean ± SD) of 55 ± 6% at 274 K, 35 ± 3% at 300 K, and 29 ± 4% at 310 K (Table 5). Both the α‐helix populations and the estimated fractional helicities are in reasonable agreement with the experimentally observed fractional helicities (mean ± SD) of 50.6 ± 0.4% at 274 K, 20.8 ± 0.4% at 300 K, and 13.5 ± 0.4% at 310 K (Table 5). The τfs of AAQAA predicted from the 20 simulations using FF12MC are 189 nssmt (95% CI: 122–293 nssmt) at 274 K, 143 nssmt (95% CI: 92–221 nssmt) at 300 K, and 92 nssmt (95% CI: 59–142 nssmt) at 310 K (Table 4). The small SDs and narrow 95% CIs of the 20 simulations relative to the means suggest that the simulations using FF12MC are converged. This convergence is supported by the smoothed time series of CαβRMSD (Supporting Information Fig. S1A), which show that all simulations captured the most populated full α‐helix conformation.

Table 5.

Mean Fractional Helicity of Ac‐(AAQAA)₃‐NH₂ Estimated from NMR Data and MD Simulations

| Temperature (K) | FF14SBᵃ | FF12SBᵃ | FF12MCᵃ | NMRᵇ |
|---|---|---|---|---|
| 274 | 7 ± 6 | 6 ± 6 | 55 ± 6 | 50.6 ± 0.4 |
| 300 | | 9 ± 6 | 35 ± 3 | 20.8 ± 0.4 |
| 310 | | 8 ± 2 | 29 ± 4 | 13.5 ± 0.4 |

Values are mean fractional helicity ± standard deviation (%).

ᵃ Estimated from torsions ϕ and ψ.

ᵇ Estimated from NMR data. Twenty distinct, independent, and one‐billion‐time‐step molecular dynamics simulations were performed for each forcefield at each temperature.

For the 20 1.00‐μssmt standard‐mass simulations of AAQAA using FF12SB at Δt = 1.00 fssmt, the aggregated native state populations (mean ± SD) are 2 ± 7% at 274 K, 5 ± 6% at 300 K, and 4 ± 3% at 310 K (Table 4). The corresponding population for FF14SB is 4 ± 5% at 274 K (Table 4). Very low fractional helicities of AAQAA were also observed in the simulations using FF12SB and FF14SB (Table 5). The τfs of AAQAA predicted from the 20 simulations using FF12SB are 1287 nssmt (95% CI: 712–2326 nssmt) at 274 K, 416 nssmt (95% CI: 265–651 nssmt) at 300 K, and 250 nssmt (95% CI: 159–391 nssmt) at 310 K (Table 4). The τf of AAQAA predicted from the 20 simulations using FF14SB is 1224 nssmt (95% CI: 677–2213 nssmt) at 274 K (Table 4). These large SDs and wide 95% CIs relative to the means show that an aggregated simulation time of 20 μssmt is inadequate for estimating the native state populations, mean helix content, or folding times from the simulations using FF12SB and FF14SB. This conclusion is consistent with the smoothed time series of CαβRMSD, which show that only some of the 20 simulations captured the full α‐helix conformation (Table 4 and Supporting Information Fig. S1B and C).

As explained above, FF12MC was used in the low‐mass simulations at Δt = 1.00 fssmt (viz., Δt = 3.16 fslmt), whereas FF12SB and FF14SB were used in the standard‐mass simulations at Δt = 1.00 fssmt, so that FF12SB and FF14SB were evaluated with higher integration accuracy than that used for FF12MC. Therefore, the results of the 20 1.00‐μssmt standard‐mass simulations using FF12SB or FF14SB should be compared with those of the 20 1.00‐μssmt low‐mass simulations using FF12MC. As listed in Table 4, the aggregated native state population (mean ± SD: 38 ± 13%) of AAQAA at 274 K in the 20 1‐μssmt low‐mass simulations using FF12MC is significantly higher than that (mean ± SD: 2 ± 7% or 4 ± 5%) in the 20 1‐μssmt standard‐mass simulations using FF12SB or FF14SB, respectively. The τf (189 nssmt) from the 20 1‐μssmt low‐mass simulations at 274 K using FF12MC is also significantly shorter than that (1287 or 1224 nssmt) from the 20 1‐μssmt standard‐mass simulations using FF12SB or FF14SB, respectively. These two sets of simulations with equal aggregated simulation times show that FF12SB and FF14SB cannot fold AAQAA as fast as FF12MC. This is hardly surprising because FF12SB and FF14SB were not benchmarked against AAQAA,16 but it shows that, with a simple adjustment of two backbone scaling factors, FF12MC is well trained to autonomously fold AAQAA.

Folding, unfolding, and refolding of β‐hairpins

FF12MC, FF12SB, and FF14SB were then tested for their ability to autonomously fold the β‐hairpins chignolin and CLN025. According to the smoothed time series of CαβRMSD from the native conformations, the aggregated native state populations, and the estimated folding times (Table 4 and Supporting Information Fig. S1D–F), all three forcefields can fold the two β‐hairpins from fully extended backbone conformations to their native conformations and simulate subsequent unfolding (or partial unfolding) and refolding in all (for FF12MC) or some (for FF12SB and FF14SB) of the 20 1.00‐μssmt simulations at Δt = 1.00 fssmt and 277 K. Cluster analysis showed that the average chignolin conformation of the largest cluster identified from the simulations using FF12MC, FF12SB, or FF14SB had a CRMSD from the first model of the NMR structure67 of 1.62, 3.32, or 1.33 Å, respectively [Fig. 1(C–E)]. The CRMSD of the average chignolin conformation of the second largest cluster of FF12SB was 1.54 Å (Supporting Information Table S3). Relative to the NMR structure of CLN025,68 the average CLN025 conformation of the largest cluster of the simulations using FF12MC, FF12SB, or FF14SB had a CRMSD of 1.72, 1.76, or 1.70 Å, respectively [Fig. 1(H–J)]. Relative to the crystal structure of CLN025,68 the CRMSDs increased to 2.68, 2.59, and 2.51 Å, respectively. These CRMSDs show that all three forcefields fold CLN025 in water to conformations that more closely resemble the solution structure than the crystal structure.

Despite the removal of torsions involving a nonperipheral sp³ atom in FF12MC and the use of Δt = 3.16 fslmt for FF12MC and Δt = 1.00 fssmt for FF14SB, the CRMSD between the two average CLN025 conformations of the most populated clusters derived from the simulations using FF14SB and FF12MC was 0.45 Å, whereas the CRMSD between the NMR and crystal structures of CLN025 was 3.15 Å. In addition, the aggregated native state populations (mean ± SD) of chignolin and CLN025 in the 20 3.16‐μssmt low‐mass simulations at 277 K using FF12MC were 47 ± 11% and 70 ± 15%, respectively (Table 4). The corresponding populations decreased to 33 ± 10% and 63 ± 13%, respectively, when the simulation temperature was increased to 300 K (Table 4). These significant differences are consistent with the experimental study showing that CLN025 is more stable than its parent protein chignolin.68 The small SDs of the 20 simulations relative to the means of the native state populations suggest that the simulations using FF12MC are converged. This convergence is supported by the smoothed time series of CαβRMSD (Supporting Information Fig. S1D), which show that all 20 simulations using FF12MC captured the folding of chignolin or CLN025. The τfs of chignolin and CLN025 predicted from the 20 simulations at 277 K using FF12MC were 153 nssmt (95% CI: 99–237 nssmt) and 433 nssmt (95% CI: 279–671 nssmt), respectively (Table 4). Furthermore, the τf of CLN025 estimated from the 20 simulations using FF12MC was 174 nssmt (95% CI: 112–270 nssmt) at 300 K (Table 4). This τf, obtained by using the Kaplan‐Meier estimator without any prior knowledge of the hazard function for the nonnative state population of CLN025, agrees with the experimental study showing that CLN025 folds with a τf of approximately 100 nssmt.70 This agreement suggests that FF12MC might have adequately sampled nonnative states of CLN025 in the 20 simulations at 300 K, which is consistent with the unique ability of FF12MC to simulate the genuine disorder of C14–C38 in BPTI and its mutant.
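The survival analysis behind these folding‐time estimates can be sketched as follows. `kaplan_meier` computes the Kaplan‐Meier estimate of the nonnative (not‐yet‐folded) fraction, treating a simulation that never folds as right‐censored at its full length. `exponential_mean_time` is a swapped‐in summary, not necessarily the author's method: under a constant‐hazard (two‐state) model, the mean folding time is the total observed time divided by the number of folding events d, with 95% limits τ·exp(∓1.96/√d). Both function names are hypothetical.

```python
import math


def kaplan_meier(times, folded):
    """Kaplan-Meier estimate of the nonnative fraction S(t).

    times: time-to-folding for simulations that folded, or the full simulation
    length for simulations that never folded (right-censored).
    folded: True if the simulation captured a folding event.
    Returns a list of (t, S(t)) at each distinct folding time.
    """
    data = sorted(zip(times, folded))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        # group tied times together
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            surv *= 1.0 - events / at_risk
            curve.append((t, surv))
        at_risk -= events + censored
    return curve


def exponential_mean_time(times, folded, z=1.96):
    """Mean folding time under a constant-hazard model, with log-scale confidence limits.

    tau = total observed time / number of folding events d;
    limits = tau * exp(-z / sqrt(d)) and tau * exp(+z / sqrt(d)).
    """
    d = sum(1 for f in folded if f)
    tau = sum(times) / d
    half_width = z / math.sqrt(d)
    return tau, tau * math.exp(-half_width), tau * math.exp(half_width)
```

As a check on the limit formula alone, τ = 153 nssmt with d = 20 gives limits of about 99 and 237 nssmt, the same as the chignolin limits in Table 4; the source, however, does not state how its limits were computed.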

The aggregated native state populations (mean ± SD) of chignolin and CLN025 in the 20 1.00‐μssmt simulations at Δt = 1.00 fssmt and 277 K using FF12SB (or FF14SB) were 3 ± 7% (or 19 ± 30%) and 4 ± 10% (or 7 ± 14%), respectively (Table 4). The τf of chignolin predicted from the 20 simulations using FF12SB was 871 nssmt (95% CI: 506–1500 nssmt) at 277 K (Table 4). The τf of CLN025 at 277 K for FF12SB could not be estimated with confidence because more than half of the 20 simulations did not capture a folding event. The τfs of chignolin and CLN025 predicted from the 20 simulations using FF14SB at 277 K were 550 nssmt (95% CI: 342–886 nssmt) and 1012 nssmt (95% CI: 600–1708 nssmt), respectively (Table 4). These large SDs and wide 95% CIs relative to the means indicate poor convergence of the simulations using FF12SB and FF14SB. This poor convergence is consistent with the number of simulations that captured a folding event (Table 4) and with the smoothed time series of CαβRMSD in Supporting Information Figure S1E and F, which show that some simulations did not capture the native conformations.

As listed in Table 4, the aggregated native state populations (mean ± SE) of chignolin and CLN025 obtained from the 20 1.00‐μssmt low‐mass simulations at 277 K using FF12MC at Δt = 1.00 fssmt were 40 ± 4% and 41 ± 7%, respectively, and the corresponding τfs were 153 nssmt (95% CI: 99–237 nssmt) and 446 nssmt (95% CI: 281–708 nssmt), respectively. By comparison, the aggregated native state populations (mean ± SE) of chignolin and CLN025 obtained from the 20 1.00‐μssmt standard‐mass simulations at 277 K using FF12SB (FF14SB) at Δt = 1.00 fssmt were 3 ± 2% (19 ± 7%) and 4 ± 2% (7 ± 3%), respectively; the τfs of chignolin estimated from these simulations using FF12SB and FF14SB were 871 nssmt (95% CI: 506–1500 nssmt) and 550 nssmt (95% CI: 342–886 nssmt), respectively, and the corresponding τf of CLN025 for FF14SB was 1012 nssmt (95% CI: 600–1708 nssmt).

Further, 20 3.16‐μssmt low‐mass simulations at Δt = 1.00 fssmt and 277 K were performed to fold CLN025 using FF12SBlm or FF14SBlm, where FF12SBlm denotes FF12SB with all atomic masses uniformly reduced tenfold. The resulting populations (mean ± SE) of CLN025 (22 ± 6% for FF12SBlm and 38 ± 7% for FF14SBlm) are still significantly lower than that (70 ± 3%) for FF12MC (Table 4). The resulting τfs of CLN025 at 277 K (3328 nssmt, 95% CI: 1889–5863 nssmt, for FF12SBlm; 1366 nssmt, 95% CI: 860–2170 nssmt, for FF14SBlm) are also substantially longer than that (433 nssmt, 95% CI: 279–671 nssmt) estimated from the 20 3.16‐μssmt low‐mass simulations using FF12MC (Table 4). These results show that FF12MC can indeed fold the two β‐hairpins with folding times that are both shorter than those obtained with FF12SB and FF14SB and closer to the experimental values.

Folding, unfolding, and refolding of an α‐miniprotein

To evaluate the ability of FF12MC to fold an α‐miniprotein that is larger than the β‐hairpins, autonomous folding simulations of a 20‐residue Trp‐cage (the TC10b sequence69) were carried out at 280 K, the temperature at which the NMR structure of TC10b was determined. FF12SB and FF14SB were not included in this computationally demanding study because, as noted above, these forcefields fold chignolin, CLN025, and AAQAA much more slowly than FF12MC. According to the smoothed time series of CαβRMSD from the native conformation, the aggregated native state population, and the estimated folding time (Table 4 and Supporting Information Fig. S1G), FF12MC can fold the Trp‐cage from a fully extended backbone conformation to its native conformation and simulate subsequent unfolding and refolding in all 30 8.848‐μssmt low‐mass simulations at Δt = 1.00 fssmt and 280 K, with (i) an aggregated native state population (mean ± SD) of 18 ± 8% (Table 4), (ii) a τf of 1998 nssmt (95% CI: 1396–2860 nssmt) at 280 K (Table 4), and (iii) a CRMSD of 1.53 Å between the first NMR model and the average conformation of the largest cluster identified from the Trp‐cage simulations [Fig. 1(O)]. More importantly, the τf of 1998 nssmt at 280 K for the Trp‐cage (TC10b), obtained by using the Kaplan‐Meier estimator without any prior knowledge of the hazard function for the nonnative state population of the miniprotein, is consistent with the experimentally determined τf of 1430 nssmt at 300 K.71 Plotting the natural logarithm of the nonnative state population versus time‐to‐folding from nonnative states to the native state of the Trp‐cage reveals a linear relationship with r² of 0.9408 (Fig. 6), indicating an exponential decay of the nonnative state population of the Trp‐cage over simulation time. This exponential decay is in excellent agreement with the experimental observation that Trp‐cage folding follows a two‐state kinetic scheme.71 These results show that FF12MC can fold the Trp‐cage from scratch with high accuracy in the 30 simulations at 280 K. Further, the results demonstrate that FF12MC can capture the two‐state kinetic scheme as the major folding pathway of the Trp‐cage, with an estimated τf that is consistent with the experimental value.
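The linearity test of Fig. 6 amounts to an ordinary least‐squares fit of ln S(t) against time. A minimal sketch, assuming S(t) is the nonnative fraction at each folding time (for example, from a Kaplan‐Meier curve); points with S = 0, which occur once every simulation has folded, are dropped because their logarithm is undefined. For a two‐state process S(t) = exp(−t/τf), so the slope estimates −1/τf. The published fit was done in PRISM 5; this Python version and the name `log_linear_fit` are illustrative only.

```python
import math


def log_linear_fit(times, surviving_fraction):
    """Least-squares fit of ln S(t) = a + b*t; returns (slope b, intercept a, r_squared)."""
    # ln(0) is undefined, so drop points where no nonnative population remains
    pairs = [(t, s) for t, s in zip(times, surviving_fraction) if s > 0]
    xs = [t for t, _ in pairs]
    ys = [math.log(s) for _, s in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared
```

A single‐exponential decay gives r² = 1; departures from two‐state kinetics lower it.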

Figure 6.


Plot of the natural logarithm of the nonnative state population of the Trp‐cage (TC10b) over time‐to‐folding. The individual folding times were taken from the data provided in Supporting Information Figure S1G. The linear regression analysis was performed using the PRISM 5 program.

Refining CASPR models TMR01, TMR04, and TMR07

Consistent with the need to use statistically derived, knowledge‐based potentials for refining comparative models of protein structures,31, 37, 39, 145, 146, 147, 148, 149 the accuracy of physics‐based forcefields, such as those in the form of Eq. (1), has been suggested to be the primary factor limiting simulation‐based refinement of comparative models.41 This inspired a simulation‐based refinement study of comparative models of monomeric globular proteins to compare FF12MC with FF14SB and FF96,57 for their ability to generate conformations that cluster around the native conformation of a test protein. While better refinement can be achieved by performing restricted MD simulations,28, 32, 41, 44 unrestricted and unbiased NPT MD simulations were performed in this study because its objective was to evaluate the effectiveness of a forcefield rather than a refinement protocol.

This refinement study used models TMR01, TMR04, and TMR07 of the first CASPR experiment. Four other models of the experiment were excluded for the following reasons. TMR02 and TMR03 are in the monomeric form, but their crystal structures are in multimeric forms (PDB IDs: 1VM0 and 1VLA). A calcium ion is missing in TMR05 but present in the corresponding crystal structure (PDB ID: 1TVG). TMR06 has a Val1Met mutation and deletion of residues from −8 to 0 relative to the corresponding crystal structure (PDB ID: 1XG8). The refinement studies of TMR01, TMR04, and TMR07 by replica‐exchange MD simulations using a physics‐based forcefield GBSW and a knowledge‐based function RAPDF/HBEM have been reported.32, 37 These studies serve as valuable benchmarks for the present study.

FF96 was chosen because of the insight this early forcefield can offer into how much improvement the AMBER forcefield has made from FF96 to FF14SB and FF12MC over the past two decades. It was chosen also because the refinement of TMR01 made by this author using FF96 and the low‐mass sampling enhancement technique earned a top score (ΔCαRMSD of −2.853 Å) in the CASPR experiment in 2006 (see Supporting Information Note S1 for the CASPR organizers' assessment). To justify its use, FF12MC must perform substantially better in refining TMR01 than FF96lm, wherein FF96lm denotes FF96 with all atomic masses reduced uniformly by tenfold.

In this refinement study, each refined model was assessed by the quality scores (QSs) GDT‐HA,95 GDC‐all,96 RPF,97 and LDDT,98 which were used to assess comparative model refinement in CASP10.97 However, the reported refinement studies32, 37 of TMR01, TMR04, and TMR07 used sseRMSD, CαRMSD, and GDT‐TS, so these three QSs were also included in the present study to facilitate comparison. Whereas sseRMSD, CαRMSD, GDT‐TS, GDT‐HA, and GDC‐all are based on global alignment, RPF and LDDT are based on local alignment; the SphereGrinder score99 and the CAD score100 were therefore included to balance the local‐alignment scores against the global‐alignment scores. Hereafter, RPF9 and LDDT15 denote the RPF and LDDT scores calculated with distance cutoffs of 9.0 Å and 15.0 Å, respectively, and SG2n6 denotes the SphereGrinder score calculated with an all‐atom RMSD cutoff of 2.0 Å and a sphere radius of 6.0 Å. The MolProbity score150 was excluded from the model assessment of CASP10 because a perfect α‐helix prediction can have an excellent MolProbity score even when the experimental structure is a β‐strand.97 Therefore, the MolProbity score was also excluded from this study.
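GDT‐TS and GDT‐HA average, over four distance cutoffs, the fraction of Cα atoms lying within each cutoff of their positions in the reference structure (1, 2, 4, and 8 Å for GDT‐TS; 0.5, 1, 2, and 4 Å for GDT‐HA). The sketch below assumes the per‐residue Cα deviations come from a single, fixed superposition. The LGA program used in CASP searches for superpositions that maximize the count at each cutoff, so this simplified version can only underestimate the official scores; the function names are hypothetical.

```python
def gdt(distances, cutoffs):
    """Mean, over the cutoffs, of the fraction of residues with C-alpha deviation <= cutoff (A)."""
    n = len(distances)
    return sum(sum(1 for d in distances if d <= c) / n for c in cutoffs) / len(cutoffs)


def gdt_ts(distances):
    """GDT-TS on a 0-1 scale, from per-residue C-alpha deviations after superposition."""
    return gdt(distances, (1.0, 2.0, 4.0, 8.0))


def gdt_ha(distances):
    """GDT-HA (high-accuracy variant with tighter cutoffs) on a 0-1 scale."""
    return gdt(distances, (0.5, 1.0, 2.0, 4.0))
```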

As shown in Figure 7, relative to the crystal structure (PDB ID: 1XE1), the CαRMSD, GDT‐HA, and SG2n6 scores of TMR01 are 6.1 Å, 0.593, and 0.495, respectively (Table 6). These poor QSs are mainly due to large conformational differences at the N‐terminus (residues 18–25 of 1XE1) and three loops (residues 35–42 of 1XE1 for Loop 1; residues 59–64 of 1XE1 for Loop 2; residues 89–97 of 1XE1 for Loop 3). Refining TMR01 using 20 316‐nssmt low‐mass simulations at 340 K and Δt = 1.00 fssmt with FF12MC substantially improved the CαRMSD, GDT‐HA, and SG2n6 scores to 1.4 Å, 0.797, and 0.766, respectively (Table 6). The refinement using FF14SBlm and the same simulation conditions of FF12MC improved the CαRMSD, GDT‐HA, and SG2n6 scores to 3.0 Å, 0.717, and 0.663, respectively. Under the same simulation conditions, FF96lm also considerably improved the CαRMSD, GDT‐HA, and SG2n6 scores of TMR01 to 3.9 Å, 0.712, and 0.629, respectively (Table 6). Using the same refinement protocol as the one for TMR01, all three forcefields considerably improved all nine QSs of TMR04 and TMR07 except that FF12MC and FF96lm slightly increased CαRMSD from 2.2 Å to 2.4 Å and 2.7 Å, respectively, for TMR07 (Fig. 7 and Table 6). An increase of CαRMSD to 2.7 Å was also observed in the TMR07 refinement by GBSW (Table 6). The performance differences among the three forcefields for TMR04 and TMR07 are not as large as those for TMR01. This is because the refinement of TMR01 involves a much larger conformational change than those involved in the refinement of TMR04 and TMR07, as indicated by the respective CαRMSDs of Table 6.

Figure 7.


Overlays of three CASPR crystal structures with unrefined and refined models. The Protein Data Bank IDs of the crystal structures of TMR01, TMR04, and TMR07 are 1XE1, 1WHZ, and 1O13, respectively. Each refined model is the average conformation of the largest cluster of 20 unbiased, unrestricted, distinct, independent, and 316‐nssmt NPT MD simulations of a CASPR model at Δt = 1.00 fssmt and 340 K using FF12MC, FF14SBlm, or FF96lm.

Table 6.

Quality Scores for Refining Three CASPR Models by Five Different Forcefields

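| Model | Refinement | sseRMSD (Å) | CαRMSD (Å) | GDT‐TS | GDT‐HA | GDC‐all | RPF9 | LDDT15 | SG2n6 | CAD |
|---|---|---|---|---|---|---|---|---|---|---|
| TMR01 | None | 1.3 | 6.1 | 0.772 | 0.593 | 0.491 | 0.690 | 0.631 | 0.495 | 0.609 |
| TMR01 | RAPDF/HBEM | 0.9 | | | | | | | | |
| TMR01 | GBSW | | 3.9 | 0.835 | | | | | | |
| TMR01 | FF12MC | 0.7 | 1.4 | 0.920 | 0.797 | 0.820 | 0.851 | 0.791 | 0.766 | 0.693 |
| TMR01 | FF14SBlm | 1.1 | 3.0 | 0.849 | 0.717 | 0.689 | 0.805 | 0.750 | 0.663 | 0.656 |
| TMR01 | FF96lm | 1.4 | 3.9 | 0.854 | 0.712 | 0.653 | 0.776 | 0.723 | 0.629 | 0.642 |
| TMR04 | None | 1.8 | 2.2 | 0.743 | 0.543 | 0.637 | 0.667 | 0.603 | 0.300 | 0.626 |
| TMR04 | RAPDF/HBEM | 0.8 | | | | | | | | |
| TMR04 | GBSW | | 1.6 | 0.900 | | | | | | |
| TMR04 | FF12MC | 0.6 | 1.5 | 0.932 | 0.811 | 0.793 | 0.800 | 0.762 | 0.831 | 0.687 |
| TMR04 | FF14SBlm | 0.8 | 1.1 | 0.939 | 0.818 | 0.838 | 0.805 | 0.776 | 0.802 | 0.683 |
| TMR04 | FF96lm | 0.7 | 1.6 | 0.921 | 0.771 | 0.776 | 0.777 | 0.744 | 0.790 | 0.664 |
| TMR07 | None | 1.9 | 2.2 | 0.766 | 0.556 | 0.668 | 0.686 | 0.618 | 0.383 | 0.590 |
| TMR07 | RAPDF/HBEM | 2.1 | | | | | | | | |
| TMR07 | GBSW | | 2.7 | 0.810 | | | | | | |
| TMR07 | FF12MC | 1.5 | 2.4 | 0.846 | 0.680 | 0.732 | 0.815 | 0.762 | 0.777 | 0.694 |
| TMR07 | FF14SBlm | 1.2 | 1.8 | 0.832 | 0.654 | 0.753 | 0.762 | 0.713 | 0.620 | 0.689 |
| TMR07 | FF96lm | 1.6 | 2.7 | 0.872 | 0.710 | 0.719 | 0.793 | 0.765 | 0.724 | 0.711 |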

To rank the performances of FF12MC, FF14SBlm, FF96lm, RAPDF/HBEM, and GBSW in refining TMR01, TMR04, and TMR07, this study used the two standardization protocols (classical and robust Z scores) that were used to rank the model refinement groups in the ninth and tenth CASP experiments.97, 151 For each of the three CASPR models refined by M forcefields, a QS‐specific Z score was calculated for each of the N QSs. Averaging the N QS‐specific Z scores of each model with equal weights gave a model‐specific Z score, and averaging the three model‐specific Z scores of each forcefield with equal weights gave a forcefield‐specific Z score (ZF). The classical, standard‐deviation‐based Z score was calculated according to Eq. (2). To minimize the influence of outliers, the robust Z score, which is based on the median absolute deviation about the median,152 was calculated according to Eq. (3). In both equations, $QS_{i,j}$ is the $i$th QS of forcefield $j$, with $i \in \{1, 2, \ldots, N\}$ and $j \in \{1, 2, \ldots, M\}$; $\mathrm{mean}_i$ and $SD_i$ are the mean and standard deviation of $\{QS_{i,1}, \ldots, QS_{i,M}\}$; $\mathrm{med}(QS_{i,M})$ is the median of $\{QS_{i,1}, QS_{i,2}, \ldots, QS_{i,j}, \ldots, QS_{i,M}\}$; and $\mathrm{med}(|QS_{i,M} - \mathrm{med}(QS_{i,M})|)$ is the median of $\{|QS_{i,1} - \mathrm{med}(QS_{i,M})|, |QS_{i,2} - \mathrm{med}(QS_{i,M})|, \ldots, |QS_{i,j} - \mathrm{med}(QS_{i,M})|, \ldots, |QS_{i,M} - \mathrm{med}(QS_{i,M})|\}$. Missing QSs were assigned a QS‐specific Z score of zero, as was done in the CASP10 model assessment.97

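$$\text{Classical } Z_i = \frac{QS_{i,j} - \mathrm{mean}_i}{SD_i} \tag{2}$$
$$\text{Robust } Z_i = \frac{QS_{i,j} - \mathrm{med}(QS_{i,M})}{1.4826 \times \mathrm{med}\left(\left|QS_{i,M} - \mathrm{med}(QS_{i,M})\right|\right)} \tag{3}$$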
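The standardization in Eqs. (2) and (3), followed by the two equal‐weight averaging steps, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used in this study: the function names and QS keys (e.g. "CaRMSD") are hypothetical; Eq. (2) is assumed to use the sample standard deviation; each QS is standardized only over the forcefields that reported it; and, because the text does not say how lower‐is‐better scores are handled, the RMSD‐based Z scores are assumed to be sign‐flipped so that a positive Z always means a better model.

```python
import statistics

# Assumption: RMSD-type QSs are "lower is better", so their Z scores are
# negated to make a positive Z always mean "better than the group".
LOWER_IS_BETTER = {"sseRMSD", "CaRMSD"}


def qs_z_scores(values, robust=False, lower_is_better=False):
    """QS-specific Z score of each forcefield (Eq. 2 or Eq. 3).

    values maps forcefield name -> QS value, or None if not reported.
    Missing QSs receive Z = 0, as described in the text.
    """
    reported = {ff: v for ff, v in values.items() if v is not None}
    z = {ff: 0.0 for ff in values}
    xs = list(reported.values())
    if len(xs) < 2:
        return z
    if robust:
        center = statistics.median(xs)
        # 1.4826 * MAD approximates the SD for normally distributed data
        scale = 1.4826 * statistics.median([abs(x - center) for x in xs])
    else:
        center = statistics.mean(xs)
        scale = statistics.stdev(xs)  # assumption: sample SD
    if scale == 0:
        return z  # assumption: a fully tied QS carries no ranking information
    sign = -1.0 if lower_is_better else 1.0
    for ff, v in reported.items():
        z[ff] = sign * (v - center) / scale
    return z


def forcefield_z(table, robust=False):
    """Average QS-specific Z per model, then model-specific Z per forcefield.

    table maps model -> QS name -> forcefield -> value (or None).
    """
    model_scores = []
    for qs_map in table.values():
        per_ff = {}
        for qs, values in qs_map.items():
            z = qs_z_scores(values, robust, qs in LOWER_IS_BETTER)
            for ff, zi in z.items():
                per_ff.setdefault(ff, []).append(zi)
        model_scores.append({ff: statistics.mean(zs) for ff, zs in per_ff.items()})
    forcefields = model_scores[0].keys()
    return {ff: statistics.mean([m[ff] for m in model_scores]) for ff in forcefields}
```

Calling `forcefield_z(table, robust=True)` on a nested dictionary built from Table 6 gives equal‐weight averages per forcefield; whether these reproduce Table 7 exactly depends on the conventions assumed above.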

According to the classical and robust Z scores for refining TMR01, TMR04, and TMR07 (Table 7), the best performing forcefields are FF12MC and FF14SBlm, RAPDF/HBEM performs better than GBSW, and the worst performing forcefield is FF96lm. Both FF12MC and FF14SBlm refined TMR01, TMR04, and TMR07 substantially better than FF96lm. FF12MC outperforms RAPDF/HBEM and GBSW according to every QS reported for those two methods in Table 6 (sseRMSD, CαRMSD, and GDT‐TS) and according to both the classical and robust Z scores in Table 7. FF14SBlm also outperforms RAPDF/HBEM and GBSW according to both Z scores (Table 7). FF12MC has a robust Z score of 1.33 and a classical Z score of 0.63, whereas FF14SBlm has classical and robust Z scores of 0.04 (Table 7). For refining the CASPR models TMR01, TMR04, and TMR07, the present study thus shows that AMBER forcefields have improved over the past two decades, from a robust Z score of −0.56 for FF96lm to 0.04 for FF14SBlm and 1.33 for FF12MC. Further, both the robust and classical Z scores suggest that FF12MC generates conformations that cluster around the native conformation of a test protein better than FF14SBlm does, consistent with the unique abilities of FF12MC to simulate the genuine disorder of C14–C38 in BPTI and its mutant and to sample nonnative states of miniproteins, which enables autonomous folding of these miniproteins with folding times close to the experimental values.

Table 7.

Z Scores for Refining TMR01, TMR04, and TMR07 by Five Forcefields

Forcefield Robust ZF Classical ZF
FF12MC 1.33 0.63
FF14SBlm 0.04 0.04
RAPDF/HBEM −0.08 −0.06
GBSW −0.23 −0.20
FF96lm −0.56 −0.41

Using FF12MC for protein simulations and known limitations

Given current computing speeds, it is challenging to predict the folding kinetics of a miniprotein from MD simulations using the already well‐refined, general‐purpose forcefield FF14SB,16 which can fold miniproteins with diverse topologies in MD simulations with implicit solvation.22 It is also challenging to use FF14SB to simulate genuine localized disorders of folded globular proteins and to perform simulation‐based refinement of comparative models of monomeric globular proteins that differ substantially from the native conformations. One proposed strategy for meeting these challenges is to develop a further‐refined, specialized forcefield that can sample nonnative states of a miniprotein and localized motions of a folded globular protein without barriers, such as certain torsions, that are otherwise necessary to achieve agreement between experimental observations and simulations that employ implicit solvation. As exemplified above, this type of special‐purpose forcefield may enable (i) capturing the major folding pathways of a miniprotein and thereby correctly predicting its native conformation and folding kinetics, (ii) predicting genuine localized disorders of folded globular proteins, and (iii) refining comparative models of monomeric globular proteins. The first pursuit of this strategy has culminated in FF12MC.

As a first‐generation forcefield specialized for protein simulations with explicit solvation, FF12MC has the following known weaknesses and limitations. As listed in Table 2, FF12MC cannot reproduce main‐chain J‐coupling constants of folded globular proteins as reliably as FF14SB. Because the atomic masses of FF12MC are uniformly reduced tenfold, all bonds involving hydrogen must always be constrained in NPT MD simulations using FF12MC at Δt = 1.00 fssmt and temperatures of ≤340 K. Because all torsion potentials involving a nonperipheral sp3 atom are set to zero, FF12MC may not be suitable for studying the anomeric effect153 or for calculating the entropy of restricted rotation about a single bond.154 FF12MC should not be used for MD simulations employing PMEMD_CUDA of AMBER 12 or 14 (University of California, San Francisco) unless PMEMD_CUDA is recompiled with a hydrogen mass of 0.1008 Da (viz., sim.massH = 0.1008 in gputypes.cpp). Preliminary studies showed that FF12MC could fold chignolin from a fully extended backbone conformation to its native conformation in NPT MD simulations performed entirely on a graphics‐processing unit (Nvidia GeForce GTX Titan) using PMEMD_CUDA of AMBER 12 with the SPFP or DPDP precision model. With the SPFP model, an NPT MD simulation of chignolin performed entirely on an Nvidia GeForce GTX Titan ran at least 6 times faster than the same simulation performed on 16 Intel Xeon E5‐2660 cores (2.20 GHz). However, until it has been adequately tested with the latest PMEMD_CUDA, FF12MC may not be suitable for simulations on graphics‐processing units; FF12MCsm may be tried instead in MD simulations using PMEMD_CUDA. Also, no study has yet determined whether FF12MC can be used for MD simulations at Δt > 1.00 fssmt by employing the hydrogen mass repartitioning scheme53, 54, 55 without compromising its ability to study folding kinetics. 
Lastly, benchmarking FF12MC against quantum mechanical data of local interatomic interactions and experimental structures of proteins in complex with small molecules is required before the forcefield in its present form can be considered suitable for simulations of a protein in complex with a small molecule.

Nevertheless, FF12MC has the following unique abilities. First, FF12MC can simulate the flipping between left‐ and right‐handed configurations for C14–C38 of BPTI and its mutant in solution, which was observed in the NMR study of BPTI61 and the crystallographic studies of the mutant (Supporting Information Table S12).66, 113 By contrast, FF14SB locks the C14–C38 bond in the right‐handed configuration in solution. Second, FF12MC folds chignolin and CLN025 at 277 K with τfs of 153 and 446 nssmt, respectively, whereas the corresponding τfs of FF14SB are 550 and 1012 nssmt (Table 4). These τfs suggest that FF12MC can fold a miniprotein in an NPT MD simulation with folding times that are statistically 2–4 times shorter than those of FF14SB. Third, refining TMR01 with 20 15.8‐nssmt NPT MD simulations at 340 K and Δt = 1.00 fssmt using FF12MC improved the CαRMSD from 6.1 Å to 2.5 Å and the GDT‐HA from 0.593 to 0.717, whereas refining it with 20 316‐nssmt NPT MD simulations under the same conditions using FF14SBlm improved the CαRMSD from 6.1 Å to 3.0 Å and the GDT‐HA from 0.593 to 0.717 (Supporting Information Fig. S4 and Table S6A and B). These results indicate that FF12MC can improve TMR01 at least 20 times faster than FF14SB when both forcefields are used in low‐mass MD simulations under the same conditions. Fourth, it took ∼175 days for FF12MC to complete one unbiased, unrestricted, 8.85‐μssmt classical NPT MD simulation that can fold a 20‐residue Trp‐cage (TC10b) on a 12‐core Apple Mac Pro with Intel Westmere processors (2.93 GHz). Performing 30 such distinct and independent simulations in parallel identified the two‐state kinetics scheme as the major folding pathway of the Trp‐cage (Fig. 6). 
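The "2–4 times" and "at least 20 times" statements follow directly from the numbers quoted above. A trivial Python check (the variable names are illustrative only):

```python
# Folding times tau_f (ns_smt) at 277 K, as quoted above from Table 4.
TAU_FF12MC = {"chignolin": 153.0, "CLN025": 446.0}
TAU_FF14SB = {"chignolin": 550.0, "CLN025": 1012.0}

# How many times longer FF14SB takes to fold each miniprotein.
fold_ratios = {p: TAU_FF14SB[p] / TAU_FF12MC[p] for p in TAU_FF12MC}

# TMR01: the same GDT-HA (0.717) was reached after 20 x 15.8 ns_smt with
# FF12MC versus 20 x 316 ns_smt with FF14SBlm.
refinement_ratio = (20 * 316.0) / (20 * 15.8)

for peptide, ratio in fold_ratios.items():
    print(f"{peptide}: {ratio:.1f}x")
print(f"TMR01 refinement: {refinement_ratio:.0f}x")
```

This prints 3.6x for chignolin, 2.3x for CLN025, and 20x for the TMR01 refinement.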
Without any prior knowledge of the hazard function for the nonnative state population of the Trp‐cage, the τf of the miniprotein was predicted from the 30 simulations at 280 K to be 1998 nssmt (95% CI: 1396–2860 nssmt) (Table 4). This τf is consistent with the experimentally determined τf of 1430 nssmt at 300 K.71 By comparison, the folding time reported to date for the same Trp‐cage sequence is 14 μssmt, which was estimated (also without any prior knowledge that the Trp‐cage follows a two‐state kinetics scheme) from a pioneering 208‐μssmt canonical MD simulation performed on a one‐of‐a‐kind, extremely powerful special‐purpose supercomputer.155 Similarly, the simulations using FF12MC predicted the τf of CLN025 to be 174 nssmt (95% CI: 112–270 nssmt) at 300 K (Table 4), which is closer to the experimentally estimated value (∼100 nssmt)70 than the folding time (600 nssmt) estimated from a 106‐μssmt canonical MD simulation of CLN025 on the same special‐purpose supercomputer.155 These results suggest that FF12MC can sample nonnative states of miniproteins in 20–30 distinct and independent NPT MD simulations and hence fold miniproteins with folding times that are both shorter than those obtained with a general‐purpose forcefield and closer to the experimental values.
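Under a two‐state kinetics scheme the hazard of leaving the nonnative state is constant, so folding times are exponentially distributed. A minimal sketch of estimating τf and an approximate 95% CI from 20–30 simulations follows, assuming some runs may end without folding (right‐censored). It uses the standard maximum‐likelihood estimate for censored exponential data and a log‐scale Wald interval; this is an illustrative assumption, not necessarily the survival‐analysis procedure behind Table 4, and the function name is hypothetical.

```python
import math


def exponential_folding_time(times, folded, z=1.96):
    """MLE of the mean folding time tau_f under a constant hazard.

    times:  first time each simulation reached the native state, or its
            total length if it never folded (right-censored).
    folded: True if the simulation folded, False if it was censored.
    Returns (tau_f, lower, upper); the approximate CI uses
    SE[log tau_f] = 1 / sqrt(number of folding events).
    """
    if len(times) != len(folded):
        raise ValueError("times and folded must have the same length")
    d = sum(1 for f in folded if f)
    if d == 0:
        raise ValueError("no folding events; tau_f is not estimable")
    tau = sum(times) / d  # total observed time / number of events
    half_width = z / math.sqrt(d)
    return tau, tau * math.exp(-half_width), tau * math.exp(half_width)
```

As a rough check, if all 30 Trp‐cage simulations folded, the interval factor is exp(1.96/√30) ≈ 1.43, and an estimate of 1998 nssmt gives bounds of about 1397 and 2858 nssmt, close to the reported 1396–2860 nssmt.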

These results also suggest that one can predict a priori whether a miniprotein folds according to a two‐state kinetics scheme or another scheme, and at what rate, without knowing its experimental structure. As exemplified by the retrospective predictions of the folding schemes and folding rates of CLN025 and the Trp‐cage described above, such a prospective prediction can begin with using FF12MC to perform 20–30 distinct and independent NPT MD simulations of the miniprotein, yielding 20–30 time series of instantaneous conformations. A cluster analysis of all instantaneous conformations from these 20–30 sets can then define the native conformation of the miniprotein as the average conformation of the largest cluster. A survival analysis using the 20–30 sets of instantaneous conformations and the defined native conformation can then determine the folding rate and scheme by examining the hazard function for the nonnative state population of the miniprotein. In some cases, more distinct and independent simulations may be needed to avoid an overly wide 95% CI.
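One way to examine the hazard of the nonnative state is the Nelson–Aalen estimator of the cumulative hazard H(t), a standard survival‐analysis tool chosen here for illustration; the text does not specify the estimator used. Under a two‐state scheme H(t) should be roughly linear, H(t) ≈ t/τf, whereas systematic curvature would point to a time‐dependent hazard and possibly a different folding scheme. A minimal sketch, with hypothetical function names:

```python
from collections import Counter


def nelson_aalen(times, folded):
    """Nelson-Aalen cumulative hazard of the nonnative state.

    times:  first folding time of each simulation, or its length if censored.
    folded: True if that simulation folded.
    Returns a list of (t, H(t)) at each distinct folding time.
    """
    events = Counter(t for t, f in zip(times, folded) if f)
    exits = Counter(times)  # every simulation leaves the risk set at its time
    at_risk = len(times)
    h_total = 0.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # censored runs at the same t are still counted as at risk
            h_total += d / at_risk
            curve.append((t, h_total))
        at_risk -= exits[t]
    return curve


def constant_hazard_slope(curve):
    """Least-squares slope of H(t) through the origin (estimates 1/tau_f)."""
    num = sum(t * h for t, h in curve)
    den = sum(t * t for t, _ in curve)
    return num / den
```

Plotting `nelson_aalen(...)` against t, together with the line of slope `constant_hazard_slope(curve)`, gives a quick visual test of the constant‐hazard (two‐state) assumption.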

Notwithstanding its weakness in reproducing main‐chain J‐coupling constants of folded globular proteins, these unique abilities suggest that FF12MC may complement FF14SB in kinetic and thermodynamic studies of miniprotein folding and in investigations of protein structure and dynamics, such as (i) estimating the folding rate of a miniprotein by survival analysis of at least 20 simulations, (ii) computationally determining whether the folding of a miniprotein follows a two‐state kinetics scheme or another scheme by examining the hazard function for its nonnative state population, (iii) simulating genuine localized disorders of folded globular proteins, and (iv) refining protein models with large conformational differences from the native conformations.

Supporting information

ACKNOWLEDGMENTS

Yuan‐Ping Pang acknowledges the support of this work from the US Defense Advanced Research Projects Agency (DAAD19‐01–1‐0322), the US Army Medical Research Material Command (W81XWH‐04–2‐0001), the US Army Research Office (DAAD19‐03–1‐0318, W911NF‐09–1‐0095, and W911NF‐16–1‐0264), the US Department of Defense High Performance Computing Modernization Office, and the Mayo Foundation for Medical Education and Research. The author remains indebted to the late Professor Peter A. Kollman for teaching him the minimalist approach to forcefield development during a one‐year sabbatical in the Kollman group in 1994–1995. The author is also indebted to the late Professor Shneior Lifson for a stimulating discussion on forcefield development during the author's visit to the Weizmann Institute of Science, Rehovot, Israel, in 1996. The author thanks four anonymous reviewers for their comments and suggestions. The contents of this article are the sole responsibility of the author and do not necessarily represent the official views of the funders.

REFERENCES

  • 1. Hopfinger AJ, Pearlstein RA. Molecular mechanics force‐field parameterization procedures. J Comput Chem 1984;5:486–499.
  • 2. Bowen JP, Allinger NL. Molecular mechanics: the art and science of parametrization. In: Lipkowitz KB, Boyd DB, editors. Reviews in computational chemistry. Volume 2. New York: VCH; 1991. pp 81–98.
  • 3. Levitt M, Hirshberg M, Sharon R, Daggett V. Potential energy function and parameters for simulations of the molecular dynamics of proteins and nucleic acids in solution. Comput Phys Commun 1995;91:215–231.
  • 4. Kollman PA. Advances and continuing challenges in achieving realistic and predictive simulations of the properties of organic and biological molecules. Accounts Chem Res 1996;29:461–469.
  • 5. Hünenberger PH, van Gunsteren WF. Empirical classical interaction functions for molecular simulation. In: van Gunsteren WF, Weiner PK, Wilkinson AJ, editors. Computer simulations of biomolecular systems. Volume 3. Dordrecht: Kluwer Academic Publishers; 1997. pp 3–82.
  • 6. Ponder JW, Case DA. Force fields for protein simulations. Adv Protein Chem 2003;66:27–85.
  • 7. Gnanakaran S, Garcia AE. Validation of an all‐atom protein force field: from dipeptides to larger peptides. J Phys Chem B 2003;107:12555–12557.
  • 8. Mackerell AD, Jr. Empirical force fields for biological macromolecules: overview and issues. J Comput Chem 2004;25:1584–1604.
  • 9. Krieger E, Darden T, Nabuurs SB, Finkelstein A, Vriend G. Making optimal use of empirical energy functions: force‐field parameterization in crystal space. Proteins 2004;57:678–683.
  • 10. Jorgensen WL, Tirado‐Rives J. Potential energy functions for atomic‐level simulations of water and organic and biomolecular systems. Proc Natl Acad Sci USA 2005;102:6665–6670.
  • 11. Li DW, Bruschweiler R. NMR‐based protein potentials. Angew Chem Int Ed Engl 2010;49:6778–6780.
  • 12. Lindorff‐Larsen K, Maragakis P, Piana S, Eastwood MP, Dror RO, Shaw DE. Systematic validation of protein force fields against experimental data. PLoS One 2012;7:
  • 13. Beauchamp KA, Lin YS, Das R, Pande VS. Are protein force fields getting better? A systematic benchmark on 524 diverse NMR measurements. J Chem Theory Comput 2012;8:1409–1414.
  • 14. Huang J, MacKerell AD. CHARMM36 all‐atom additive protein force field: validation based on comparison to NMR data. J Comput Chem 2013;34:2135–2145.
  • 15. Cerutti DS, Swope WC, Rice JE, Case DA. ff14ipq: a self‐consistent force field for condensed‐phase simulations of proteins. J Chem Theory Comput 2014;10:4515–4534.
  • 16. Maier JA, Martinez C, Kasavajhala K, Wickstrom L, Hauser K, Simmerling C. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J Chem Theory Comput 2015;11:3696–3713.
  • 17. Robertson MJ, Tirado‐Rives J, Jorgensen WL. Improved peptide and protein torsional energetics with the OPLS‐AA force field. J Chem Theory Comput 2015;11:3499–3509.
  • 18. Harder E, Damm W, Maple J, Wu CJ, Reboul M, Xiang JY, Wang LL, Lupyan D, Dahlgren MK, Knight JL, Kaus JW, Cerutti DS, Krilov G, Jorgensen WL, Abel R, Friesner RA. OPLS3: a force field providing broad coverage of drug‐like small molecules and proteins. J Chem Theory Comput 2016;12:281–296.
  • 19. Simmerling C, Strockbine B, Roitberg AE. All‐atom structure prediction and folding simulations of a stable protein. J Am Chem Soc 2002;124:11258–11259.
  • 20. Lei HX, Duan Y. Two‐stage folding of HP‐35 from ab initio simulations. J Mol Biol 2007;370:196–206.
  • 21. Lane TJ, Shukla D, Beauchamp KA, Pande VS. To milliseconds and beyond: challenges in the simulation of protein folding. Curr Opin Struc Biol 2013;23:58–65.
  • 22. Nguyen H, Maier J, Huang H, Perrone V, Simmerling C. Folding simulations for proteins with diverse topologies are accessible in days with physics‐based force field and implicit solvent. J Am Chem Soc 2014;136:13959–13962.
  • 23. Piana S, Klepeis JL, Shaw DE. Assessing the accuracy of physical models used in protein‐folding simulations: quantitative evidence from long molecular dynamics simulations. Curr Opin Struc Biol 2014;24:98–105.
  • 24. Suenaga A, Narumi T, Futatsugi N, Yanai R, Ohno Y, Okimoto N, Taiji M. Folding dynamics of 10‐residue β‐hairpin peptide chignolin. Chem Asian J 2007;2:591–598.
  • 25. Seibert MM, Patriksson A, Hess B, van der Spoel D. Reproducible polypeptide folding and structure prediction using molecular dynamics simulations. J Mol Biol 2005;354:173–183.
  • 26. Fersht AR. On the simulation of protein folding by short time scale molecular dynamics and distributed computing. Proc Natl Acad Sci USA 2002;99:14122–14125.
  • 27. Lee MR, Tsai J, Baker D, Kollman PA. Molecular dynamics in the endgame of protein structure prediction. J Mol Biol 2001;313:417–430.
  • 28. Flohil JA, Vriend G, Berendsen HJC. Completion and refinement of 3‐D homology models with restricted molecular dynamics: application to targets 47, 58, and 111 in the CASP modeling competition and posterior analysis. Proteins 2002;48:593–604.
  • 29. Fan H, Mark AE. Refinement of homology‐based protein structures by molecular dynamics simulation techniques. Protein Sci 2004;13:211–220.
  • 30. Pang Y‐P. Three‐dimensional model of a substrate‐bound SARS chymotrypsin‐like cysteine proteinase predicted by multiple molecular dynamics simulations: catalytic efficiency regulated by substrate binding. Proteins 2004;57:747–757.
  • 31. Zhu J, Xie L, Honig B. Structural refinement of protein segments containing secondary structure elements: Local sampling, knowledge‐based potentials, and clustering. Proteins 2006;65:463–479.
  • 32. Chen JH, Brooks CL. Can molecular dynamics simulations provide high‐resolution refinement of protein structure? Proteins 2007;67:922–930.
  • 33. Lee MS, Olson MA. Assessment of detection and refinement strategies for de novo protein structures using force field and statistical potentials. J Chem Theory Comput 2007;3:312–324.
  • 34. Stumpff‐Kane AW, Maksimiak K, Lee MS, Feig M. Sampling of near‐native protein conformations during protein structure refinement using a coarse‐grained model, normal modes, and molecular dynamics simulations. Proteins 2008;70:1345–1356.
  • 35. Ishitani R, Terada T, Shimizu K. Refinement of comparative models of protein structure by using multicanonical molecular dynamics simulations. Mol Simul 2008;34:327–336.
  • 36. Chopra G, Summa CM, Levitt M. Solvent dramatically affects protein structure refinement. Proc Natl Acad Sci USA 2008;105:20239–20244.
  • 37. Zhu J, Fan H, Periole X, Honig B, Mark AE. Refining homology models by combining replica‐exchange molecular dynamics and statistical potentials. Proteins 2008;72:1171–1188.
  • 38. Kannan S, Zacharias M. Application of biasing‐potential replica‐exchange simulations for loop modeling and refinement of proteins in explicit solvent. Proteins 2010;78:2809–2819.
  • 39. Zhang J, Liang Y, Zhang Y. Atomic‐level protein structure refinement using fragment‐guided molecular dynamics conformation sampling. Structure 2011;19:1784–1795.
  • 40. Olson MA, Chaudhury S, Lee MS. Comparison between self‐guided langevin dynamics and molecular dynamics simulations for structure refinement of protein loop conformations. J Comput Chem 2011;32:3014–3022.
  • 41. Raval A, Piana S, Eastwood MP, Dror RO, Shaw DE. Refinement of protein structure homology models via long, all‐atom molecular dynamics simulations. Proteins 2012;80:2071–2079.
  • 42. Fan H, Periole X, Mark AE. Mimicking the action of folding chaperones by Hamiltonian replica‐exchange molecular dynamics simulations: application in the refinement of de novo models. Proteins 2012;80:1744–1754.
  • 43. Li DW, Bruschweiler R. Dynamic and thermodynamic signatures of native and non‐native protein states with application to the improvement of protein structures. J Chem Theory Comput 2012;8:2531–2539.
  • 44. Mirjalili V, Noyes K, Feig M. Physics‐based protein structure refinement through multiple molecular dynamics trajectories and structure averaging. Proteins 2014;82:196–207.
  • 45. Pang Y‐P. Low‐mass molecular dynamics simulation: a simple and generic technique to enhance configurational sampling. Biochem Biophys Res Commun 2014;452:588–592.
  • 46. Pang Y‐P. Low‐mass molecular dynamics simulation for configurational sampling enhancement: More evidence and theoretical explanation. Biochem Biophys Rep 2015;4:126–133.
  • 47. Pang Y‐P. At least 10% shorter C–H bonds in cryogenic protein crystal structures than in current AMBER forcefields. Biochem Biophys Res Commun 2015;458:352–355.
  • 48. Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM Jr, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J Am Chem Soc 1995;117:5179–5197.
  • 49. Pang Y‐P. Use of 1–4 interaction scaling factors to control the conformational equilibrium between α‐helix and β‐strand. Biochem Biophys Res Commun 2015;457:183–186.
  • 50. Wang JM, Cieplak P, Kollman PA. How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem 2000;21:1049–1074.
  • 51. Shalongo W, Dugad L, Stellwagen E. Distribution of helicity within the model peptide acetyl(AAQAA)3amide. J Am Chem Soc 1994;116:8288–8293.
  • 52. Kirschner KN, Yongye AB, Tschampel SM, Gonzalez‐Outeirino J, Daniels CR, Foley BL, Woods RJ. GLYCAM06: a generalizable biomolecular force field. Carbohydrates. J Comput Chem 2008;29:622–655.
  • 53. Feenstra KA, Hess B, Berendsen HJC. Improving efficiency of large time‐scale molecular dynamics simulations of hydrogen‐rich systems. J Comput Chem 1999;20:786–798.
  • 54. Harvey MJ, Giupponi G, De Fabritiis G. ACEMD: accelerating biomolecular dynamics in the microsecond time scale. J Chem Theory Comput 2009;5:1632–1639.
  • 55. Hopkins CW, Le Grand S, Walker RC, Roitberg AE. Long‐time‐step molecular dynamics through hydrogen mass repartitioning. J Chem Theory Comput 2015;11:1864–1874.
  • 56. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. Comparison of simple potential functions for simulating liquid water. J Chem Phys 1983;79:926–935.
  • 57. Kollman PA, Dixon R, Cornell WD, Fox T, Chipot C, Pohorille A. The development/application of the “minimalist” organic/biochemical molecular mechanic force field using a combination of ab initio calculations and experimental data. In: van Gunsteren WF, Weiner PK, Wilkinson AJ, editors. Computer simulations of biomolecular systems. Volume 3. Dordrecht: Kluwer Academic Publishers; 1997. pp 83–96.
  • 58. Graf J, Nguyen PH, Stock G, Schwalbe H. Structure and dynamics of the homologous series of alanine peptides: a joint molecular dynamics/NMR study. J Am Chem Soc 2007;129:1179–1189.
  • 59. Miclet E, Boisbouvier J, Bax A. Measurement of eight scalar and dipolar couplings for methine‐methylene pairs in proteins and nucleic acids. J Biomol NMR 2005;31:201–216.
  • 60. Vogeli B, Ying JF, Grishaev A, Bax A. Limits on variations in protein backbone dynamics from precise measurements of scalar couplings. J Am Chem Soc 2007;129:9377–9385.
  • 61. Berndt KD, Guntert P, Orbons LPM, Wuthrich K. Determination of a high‐quality nuclear magnetic resonance solution structure of the bovine pancreatic trypsin inhibitor and comparison with three crystal structures. J Mol Biol 1992;227:757–775.
  • 62. Hu JS, Bax A. Determination of φ and χ1 angles in proteins from 13C–13C three‐bond J couplings measured by three‐dimensional heteronuclear NMR. How planar is the peptide bond? J Am Chem Soc 1997;119:6360–6368.
  • 63. Smith LJ, Sutcliffe MJ, Redfield C, Dobson CM. Analysis of φ and χ1 torsion angles for hen lysozyme in solution from 1H NMR spin‐spin coupling constants. Biochemistry 1991;30:986–996.
  • 64. Willis BTM, Pryor AW. Thermal vibrations in crystallography. London: Cambridge University Press; 1975. 296 p.
  • 65. Lipari G, Szabo A. Model‐free approach to the interpretation of nuclear magnetic resonance relaxation in macromolecules. 1. Theory and range of validity. J Am Chem Soc 1982;104:4546–4559.
  • 66. Czapinska H, Otlewski J, Krzywda S, Sheldrick GM, Jaskolski M. High‐resolution structure of bovine pancreatic trypsin inhibitor with altered binding loop sequence. J Mol Biol 2000;295:1237–1249.
  • 67. Honda S, Yamasaki K, Sawada Y, Morii H. 10 residue folded peptide designed by segment statistics. Structure 2004;12:1507–1518.
  • 68. Honda S, Akiba T, Kato YS, Sawada Y, Sekijima M, Ishimura M, Ooishi A, Watanabe H, Odahara T, Harata K. Crystal structure of a ten‐amino acid protein. J Am Chem Soc 2008;130:15327–15331.
  • 69. Barua B, Lin JC, Williams VD, Kummler P, Neidigh JW, Andersen NH. The Trp‐cage: Optimizing the stability of a globular miniprotein. Protein Eng Des Sel 2008;21:171–185.
  • 70. Davis CM, Xiao SF, Raleigh DP, Dyer RB. Raising the speed limit for β‐hairpin formation. J Am Chem Soc 2012;134:14476–14482.
  • 71. Byrne A, Williams DV, Barua B, Hagen SJ, Kier BL, Andersen NH. Folding dynamics and pathways of the trp‐cage miniproteins. Biochemistry 2014;53:6011–6021.
  • 72. Berendsen HJC, Postma JPM, van Gunsteren WF, Di Nola A, Haak JR. Molecular dynamics with coupling to an external bath. J Chem Phys 1984;81:3684–3690.
  • 73. Darden TA, York DM, Pedersen LG. Particle mesh Ewald: An N log(N) method for Ewald sums in large systems. J Chem Phys 1993;98:10089–10092.
  • 74. Joung IS, Cheatham TE. Determination of alkali and halide monovalent ion parameters for use in explicitly solvated biomolecular simulations. J Phys Chem B 2008;112:9020–9041.
  • 75. Cieplak P, Cornell WD, Bayly C, Kollman PA. Application of the multimolecule and multiconformational RESP methodology to biopolymers: charge derivation for DNA, RNA, and proteins. J Comput Chem 1995;16:1357–1377.
  • 76. Creighton TE. Proteins. New York: W. H. Freeman and Company; 1993. 507 p.
  • 77. Wirmer J, Schwalbe H. Angular dependence of 1J(Ni,Cαi) and 2J(Ni,Cα(i‐1)) coupling constants measured in J‐modulated HSQCs. J Biomol NMR 2002;23:47–55.
  • 78. Ding K, Gronenborn AM. Protein backbone 1HN‐13Cα and 15N‐13Cα residual dipolar and J couplings: new constraints for NMR structure determination. J Am Chem Soc 2004;126:6232–6233.
  • 79. Hennig M, Bermel W, Schwalbe H, Griesinger C. Determination of Ψ torsion angle restraints from 3J(Cα,Cα) and 3J(Cα,HN) coupling constants in proteins. J Am Chem Soc 2000;122:6268–6277.
  • 80. Schmidt JM, Blumel M, Lohr F, Ruterjans H. Self‐consistent 3J coupling analysis for the joint calibration of Karplus coefficients and evaluation of torsion angles. J Biomol NMR 1999;14:1–12.
  • 81. Case DA, Scheurer C, Bruschweiler R., Static dynamic effects on vicinal scalar J couplings in proteins and peptides: A MD/DFT analysis. J Am Chem Soc 2000;122:10390–10397. [Google Scholar]
  • 82. Perez C, Lohr F, Ruterjans H, Schmidt JM. Self‐consistent Karplus parametrization of 3J couplings depending on the polypeptide side‐chain torsion Χ1. J Am Chem Soc 2001;123:7081–7093. [DOI] [PubMed] [Google Scholar]
  • 83. Chou JJ, Case DA, Bax A. Insights into the mobility of methyl‐bearing side chains in proteins from 3JCC and 3JCN couplings. J Am Chem Soc 2003;125:8959–8966. [DOI] [PubMed] [Google Scholar]
  • 84. Best RB, Zhu X, Shim J, Lopes PEM, Mittal J, Feig M, MacKerell AD. Optimization of the additive CHARMM all‐atom protein force field targeting improved sampling of the backbone φ, ψ and side‐chain χ1 and χ2 dihedral angles. J Chem Theory Comput 2012;8:3257–3273.
  • 85. Lindorff‐Larsen K, Piana S, Palmo K, Maragakis P, Klepeis JL, Dror RO, Shaw DE. Improved side‐chain torsion potentials for the AMBER ff99SB protein force field. Proteins 2010;78:1950–1958.
  • 86. Andraos J. On the propagation of statistical errors for a function of several variables. J Chem Educ 1996;73:150–154.
  • 87. Prompers JJ, Bruschweiler R. General framework for studying the dynamics of folded and nonfolded proteins by NMR relaxation spectroscopy and MD simulation. J Am Chem Soc 2002;124:4522–4534.
  • 88. Hall JB, Fushman D. Characterization of the overall and local dynamics of a protein with intermediate rotational anisotropy: differentiating between conformational exchange and anisotropic diffusion in the B3 domain of protein G. J Biomol NMR 2003;27:261–275.
  • 89. Tjandra N, Feller SE, Pastor RW, Bax A. Rotational diffusion anisotropy of human ubiquitin from 15N NMR relaxation. J Am Chem Soc 1995;117:12562–12566.
  • 90. Buck M, Boyd J, Redfield C, Mackenzie DA, Jeenes DJ, Archer DB, Dobson CM. Structural determinants of protein dynamics: analysis of 15N NMR relaxation measurements for main‐chain and side‐chain nuclei of hen egg‐white lysozyme. Biochemistry 1995;34:4041–4055.
  • 91. Beeser SA, Oas TG, Goldenberg DP. Determinants of backbone dynamics in native BPTI: cooperative influence of the 14–38 disulfide and the Tyr35 side‐chain. J Mol Biol 1998;284:1581–1596.
  • 92. Therneau TM, Grambsch PM. Modeling survival data: extending the Cox model. New York: Springer‐Verlag; 2000.
  • 93. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958;53:457–481.
  • 94. Rich JT, Neely JG, Paniello RC, Voelker CC, Nussenbaum B, Wang EW. A practical guide to understanding Kaplan‐Meier curves. Otolaryngol Head Neck Surg 2010;143:331–336.
  • 95. Read RJ, Chavali G. Assessment of CASP7 predictions in the high accuracy template‐based modeling category. Proteins 2007;69:27–37.
  • 96. Keedy DA, Williams CJ, Headd JJ, Arendall WB, Chen VB, Kapral GJ, Gillespie RA, Block JN, Zemla A, Richardson DC, Richardson JS. The other 90% of the protein: assessment beyond the Cαs for CASP8 template‐based and high‐accuracy models. Proteins 2009;77:29–49.
  • 97. Huang YJ, Mao BC, Aramini JM, Montelione GT. Assessment of template‐based protein structure predictions in CASP10. Proteins 2014;82:43–56.
  • 98. Mariani V, Biasini M, Barbato A, Schwede T. lDDT: a local superposition‐free score for comparing protein structures and models using distance difference tests. Bioinformatics 2013;29:2722–2728.
  • 99. Kryshtafovych A, Monastyrskyy B, Fidelis K. CASP prediction center infrastructure and evaluation measures in CASP10 and CASP ROLL. Proteins 2014;82:7–13.
  • 100. Olechnovic K, Kulberkyte E, Venclovas C. CAD‐score: A new contact area difference‐based function for evaluation of protein structural models. Proteins 2013;81:149–162.
  • 101. Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins 2004;57:702–710.
  • 102. Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 2003;31:3370–3374.
  • 103. Shao J, Tanner SW, Thompson N, Cheatham TE III. Clustering molecular dynamics trajectories: 1. Characterizing the performance of different clustering algorithms. J Chem Theory Comput 2007;3:2312–2334.
  • 104. van Gunsteren WF, Berendsen HJC. Algorithms for macromolecular dynamics and constraint dynamics. Mol Phys 1977;34:1311–1327.
  • 105. van Gunsteren WF, Mark AE. Validation of molecular dynamics simulation. J Chem Phys 1998;108:6109–6116.
  • 106. Georgoulia PS, Glykos NM. Using J‐coupling constants for force field validation: application to hepta‐alanine. J Phys Chem B 2011;115:15221–15227.
  • 107. Ensign DL, Pande VS. Bayesian single‐exponential kinetics in single‐molecule experiments and simulations. J Phys Chem B 2009;113:12410–12423.
  • 108. Grossfield A, Zuckerman DM. Quantifying uncertainty and sampling quality in biomolecular simulations. Annu Rep Comput Chem 2009;5:23–48.
  • 109. Vogeli B, Olsson S, Riek R, Guntert P. Compiled data set of exact NOE distance limits, residual dipolar couplings and scalar couplings for the protein GB3. Data Brief 2015;5:99–106.
  • 110. Smith LJ, Mark AE, Dobson CM, van Gunsteren WF. Comparison of MD simulations and NMR experiments for hen lysozyme: analysis of local fluctuations, cooperative motions, and global changes. Biochemistry 1995;34:10918–10931.
  • 111. Bruschweiler R, Case DA. Adding harmonic motion to the Karplus relation for spin‐spin coupling. J Am Chem Soc 1994;116:11199–11200.
  • 112. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.
  • 113. Addlagatta A, Krzywda S, Czapinska H, Otlewski J, Jaskolski M. Ultrahigh‐resolution structure of a BPTI mutant. Acta Crystallogr Sect D: Biol Crystallogr 2001;57:649–663.
  • 114. Morin S. A practical guide to protein dynamics from 15N spin relaxation in solution. Prog Nucl Magn Reson Spectrosc 2011;59:245–262.
  • 115. Gu Y, Li DW, Bruschweiler R. NMR order parameter determination from long molecular dynamics trajectories for objective comparison with experiment. J Chem Theory Comput 2014;10:2599–2607.
  • 116. Szyperski T, Luginbuhl P, Otting G, Guntert P, Wuthrich K. Protein dynamics studied by rotating frame 15N spin relaxation times. J Biomol NMR 1993;3:151–164.
  • 117. Tjandra N, Szabo A, Bax A. Protein backbone dynamics and 15N chemical shift anisotropy from quantitative measurement of relaxation interference effects. J Am Chem Soc 1996;118:6986–6991.
  • 118. Debye P. Interference of x rays and heat movement. Ann Phys 1913;43:49–95.
  • 119. Waller I. On the effect of thermal motion on the interference of X‐rays. Z Phys 1923;17:398–408.
  • 120. Kidera A, Go N. Normal mode refinement: crystallographic refinement of protein dynamic structure. I. Theory and test by simulated diffraction data. J Mol Biol 1992;225:457–475.
  • 121. Trueblood KN, Burgi HB, Burzlaff H, Dunitz JD, Gramaccioli CM, Schulz HH, Shmueli U, Abrahams SC. Atomic displacement parameter nomenclature: report of a subcommittee on atomic displacement parameter nomenclature. Acta Crystallogr Sect A 1996;52:770–781.
  • 122. Garcia AE, Krumhansl JA, Frauenfelder H. Variations on a theme by Debye and Waller: from simple crystals to proteins. Proteins 1997;29:153–160.
  • 123. Meinhold L, Smith JC. Fluctuations and correlations in crystalline protein dynamics: a simulation analysis of Staphylococcal nuclease. Biophys J 2005;88:2554–2563.
  • 124. Kuriyan J, Weis WI. Rigid protein motion as a model for crystallographic temperature factors. Proc Natl Acad Sci USA 1991;88:2773–2777.
  • 125. Drenth J. Principles of protein X‐ray crystallography. New York: Springer; 2007.
  • 126. Hu ZQ, Jiang JW. Assessment of biomolecular force fields for molecular dynamics simulations in a protein crystal. J Comput Chem 2010;31:371–380.
  • 127. Janowski PA, Liu C, Deckman J, Case DA. Molecular dynamics simulation of triclinic lysozyme in a crystal lattice. Protein Sci 2016;25:87–102.
  • 128. Walsh MA, Schneider TR, Sieker LC, Dauter Z, Lamzin VS, Wilson KS. Refinement of triclinic hen egg‐white lysozyme at atomic resolution. Acta Crystallogr Sect D: Biol Crystallogr 1998;54:522–546.
  • 129. Shirley WA, Brooks CL. Curious structure in “canonical” alanine based peptides. Proteins 1997;28:59–71.
  • 130. Ferrara P, Apostolakis J, Caflisch A. Thermodynamics and kinetics of folding of two model peptides investigated by molecular dynamics simulations. J Phys Chem B 2000;104:5000–5010.
  • 131. Hassan SA, Mehler EL. A general screened Coulomb potential based implicit solvent model: calculation of secondary structure of small peptides. Int J Quantum Chem 2001;83:193–202.
  • 132. Feig M, MacKerell AD, Brooks CL. Force field influence on the observation of π‐helical protein structures in molecular dynamics simulations. J Phys Chem B 2003;107:2831–2836.
  • 133. Chen JH, Im WP, Brooks CL. Balancing solvation and intramolecular interactions: toward a consistent generalized Born force field. J Am Chem Soc 2006;128:3728–3736.
  • 134. Li XF, Latour RA, Stuart SJ. TIGER2: an improved algorithm for temperature intervals with global exchange of replicas. J Chem Phys 2009;130:174106.
  • 135. Best RB, Hummer G. Optimized molecular dynamics force fields applied to the helix‐coil transition of polypeptides. J Phys Chem B 2009;113:9004–9015.
  • 136. Best RB, Mittal J. Protein simulations with an optimized water model: cooperative helix formation and temperature‐induced unfolded state collapse. J Phys Chem B 2010;114:14916–14923.
  • 137. Best RB, Mittal J, Feig M, MacKerell AD. Inclusion of many‐body effects in the additive CHARMM protein CMAP potential results in enhanced cooperativity of α‐helix and β‐hairpin formation. Biophys J 2012;103:1045–1051.
  • 138. Nerenberg PS, Jo B, So C, Tripathy A, Head‐Gordon T. Optimizing solute‐water van der Waals interactions to reproduce solvation free energies. J Phys Chem B 2012;116:4524–4534.
  • 139. Ioannou F, Leontidis E, Archontis G. Helix formation by alanine‐based peptides in pure water and electrolyte solutions: insights from molecular dynamics simulations. J Phys Chem B 2013;117:9866–9876.
  • 140. Huang J, MacKerell AD. Induction of peptide bond dipoles drives cooperative helix formation in the (AAQAA)3 peptide. Biophys J 2014;107:991–997.
  • 141. Cooley RB, Arp DJ, Karplus PA. Evolutionary origin of a secondary structure: π‐helices as cryptic but widespread insertional variations of α‐helices that enhance protein functionality. J Mol Biol 2010;404:232–246.
  • 142. Lifson S, Roig A. On the theory of helix‐coil transition in polypeptides. J Chem Phys 1961;34:1963–1974.
  • 143. Qian H, Schellman JA. Helix‐coil theories: a comparative study for finite length polypeptides. J Phys Chem 1992;96:3987–3994.
  • 144. Doig AJ. Recent advances in helix‐coil theory. Biophys Chem 2002;101:281–293.
  • 145. Zhang C, Liu S, Zhou YQ. Accurate and efficient loop selections by the DFIRE‐based all‐atom statistical potential. Protein Sci 2004;13:391–399.
  • 146. Misura KMS, Baker D. Progress and challenges in high‐resolution refinement of protein structure models. Proteins 2005;59:15–29.
  • 147. Summa CM, Levitt M. Near‐native structure refinement using in vacuo energy minimization. Proc Natl Acad Sci USA 2007;104:3177–3182.
  • 148. Jagielska A, Wroblewska L, Skolnick J. Protein model refinement using an optimized physics‐based all‐atom force field. Proc Natl Acad Sci USA 2008;105:8268–8273.
  • 149. Lin MS, Head‐Gordon T. Reliable protein structure refinement using a physical energy function. J Comput Chem 2011;32:709–717.
  • 150. Chen VB, Arendall WB, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC. MolProbity: all‐atom structure validation for macromolecular crystallography. Acta Crystallogr Sect D: Biol Crystallogr 2010;66:12–21.
  • 151. MacCallum JL, Perez A, Schnieders MJ, Hua L, Jacobson MP, Dill KA. Assessment of protein structure refinement in CASP9. Proteins 2011;79:74–90.
  • 152. Huber PJ, Ronchetti EM. Robust statistics. Hoboken, New Jersey: Wiley; 2009.
  • 153. Eliel EL. Conformational analysis in heterocyclic systems: recent results and applications. Angew Chem Int Ed Engl 1972;11:739–860.
  • 154. Kemp JD, Pitzer KS. The entropy of ethane and the third law of thermodynamics. Hindered rotation of methyl group. J Am Chem Soc 1937;59:276–279.
  • 155. Lindorff‐Larsen K, Piana S, Dror RO, Shaw DE. How fast‐folding proteins fold. Science 2011;334:517–520.

Supplementary Materials

Supporting Information (four supplementary files)


Articles from Proteins are provided here courtesy of Wiley