Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2019 Jul 5.
Published in final edited form as: J Phys Chem B. 2018 Jun 21;122(26):6673–6689. doi: 10.1021/acs.jpcb.8b02144

Validating Molecular Dynamics Simulations Against Experimental Observables in Light of Underlying Conformational Ensembles

Matthew Carter Childers 1, Valerie Daggett 1,*
PMCID: PMC6420231  NIHMSID: NIHMS992986  PMID: 29864281

Abstract

Far from the static, idealized conformations deposited into structural databases, proteins are highly dynamic molecules that undergo conformational changes on temporal and spatial scales that may span several orders of magnitude. These conformational changes, often intimately connected to the functional roles that proteins play, may be obscured by traditional biophysical techniques. Over the past forty years, molecular dynamics (MD) simulations have complemented these techniques by providing the ‘hidden’ atomistic details that underlie protein dynamics. However, there are limitations of the degree to which molecular simulations accurately and quantitatively describe protein motions. Here we show that although four molecular dynamics simulation packages (AMBER, GROMACS, NAMD and ilmm) reproduced a variety of experimental observables for two different proteins (engrailed homeodomain and RNase H) equally well overall at room temperature, there were subtle differences in the underlying conformational distributions and the extent of conformational sampling obtained. This leads to ambiguity about which results are correct, as experiment cannot always provide the necessary detailed information to distinguish between the underlying conformational ensembles. However, the results with different packages diverged more when considering larger amplitude motion, for example the thermal unfolding process and conformational states sampled, with some packages failing to allow the protein to unfold at high temperature or providing results at odds with experiment. While most differences between MD simulations performed with different packages are attributed to the force fields themselves, there are many other factors that influence the outcome, including the water model, algorithms that constrain motion, how atomic interactions are handled, and the simulation ensemble employed. Here four different MD packages were tested each using best practices as established by the developers, utilizing three different protein force fields and three different water models. Differences between the simulated protein behavior using two different packages but the same force field, as well as two different packages with different force fields but the same water models and approaches to restraining motion, show how other factors can influence the behavior, and it is incorrect to place all the blame for deviations and errors on force fields or to expect improvements in force fields alone to solve such problems.

Graphical Abstract

graphic file with name nihms-992986-f0001.jpg

Introduction

Molecular dynamics (MD) simulations, “virtual molecular microscopes”, employ computational methods to probe the dynamical properties of atomistic systems and proffer insights into molecular behavior. Beginning with the report of a 9.2 picosecond simulation of bovine pancreatic trypsin inhibitor (BPTI) in 1977,1 MD simulations have provided the means to visualize proteins in action and to investigate that paradigmatic relationship between form and function.2,3 When taken in context with experimental results, MD simulations can drive discoveries in protein design,4,5 protein folding,69 and other spheres of protein science.10 However, two factors limit the predictive capabilities of MD: first, lengthy simulations may be required to correctly describe certain dynamical properties (i.e. the sampling problem11) second, insufficient mathematical descriptions of the physical and chemical forces that govern protein dynamics may yield biologically meaningless results (i.e. the accuracy problem12).13 To increase our confidence in the ability of MD simulations to provide meaningful results for arbitrary proteins and peptide systems, it is necessary to benchmark computational results against experimental data.

Improved computational infrastructure,14 software,15,16 and parallelization schemes17,18 allow contemporary simulations to probe increasingly larger systems at timescales approaching those of experiment.1922 However, the requisite simulation times to accurately measure dynamical properties are rarely known a priori; instead, simulations are deemed ‘sufficiently long’ when some observable quantity has ‘converged’. In the context of molecular simulation, Sawle and Ghosh argue that convergence is a misnomer and show that the timescales required to satisfy the most stringent tests of ‘convergence’ or ‘self-consistency’ vary from system to system and are dependent on the method used to assess convergence.23 This behavior is mirrored in the analysis of MD simulations: we have shown that the overall level of insight into the dynamics of a system can be modulated by the type of analysis performed and the level of detail described.24 Thus, how long is ‘sufficiently long’ remains unanswered. A wealth of information can be obtained from simulations that probe native state dynamics as well as conformational changes that result in excursions from the native state. However, for slow dynamical processes like the folding of typical globular proteins and the non-folding of intrinsically disordered proteins,25,26 the requisite timescales remain out of reach for the time being, at least for conventional MD simulations.

Approximations built into the mathematical forms of MD force fields and their associated parameterizations give rise to the accuracy problem. These force fields are empirical and begin with parameters obtained from high-resolution experimental data and quantum mechanical calculations for small molecules, and then they are modified to reproduce different experimental properties or desired behaviors.2736 Over time, modification of these parameters has yielded improved force fields with similar functional forms.37 In addition, it is important to note that while usually the focus, or blame, is on the force field, it is not just the potential energy function and associated parameters that determine the results of MD simulations. Protein dynamics are often more sensitive to the protocols used for integration of the equations of motion, treatment of nonbonded interactions and various unphysical approximations.

The most compelling measure of the accuracy of a force field is its ability to recapitulate and predict experimental observables. However, there are challenges associated with this method of validation.13 Namely, the experimental data used for validation are averages over space and time; the underlying distributions and timescales associated with these averages are often obscured. Consequently, correspondence between simulation and experiment does not necessarily constitute a validation of the conformational ensemble(s) produced by MD, i.e. multiple, and possibly diverse, ensembles may produce averages consistent with experiment. This is underscored by simulations that demonstrate how force fields can produce distinct pathways of the lid-opening mechanism of adenylate kinase that nevertheless sample the crystallographically identified ‘open’ and ‘closed’ conformers.38 Furthermore, extensive simulations of the villin headpiece demonstrated that while MD-derived folding rates and native state structures had good agreement with experiment, the folding pathways and denatured state properties were force-field dependent.39 In addition, experimental observables may be derived using relationships that are functions of molecular conformation and are themselves associated with some degree of error. For example, most chemical shift predictors produce chemical shifts from molecular structures via training against high-resolution structural databases, not solely via calculations from first principles.

Here, we address the extent to which multiple simulations performed for 200 ns each agree with experimental data. Multiple short simulations yield better sampling of protein conformational space than a single simulation with total sampling time equal to the aggregate sampling time of multiple small simulations.40,41 As simulations see increased usage, particularly by those not trained in the method, it is important to place quantitative bounds on the extent to which these simulations agree with experimental data and to understand the limits of their ability to explain experimental findings. Consequently, we have compared how three force fields (AMBER ff99SB-ILDN,42 Levitt et al.,43 and CHARMM3644) used within four MD packages (AMBER,4547 in lucem molecular mechanics (ilmm),48 GROMACS,49 and NAMD50) agree with a diverse set of experimental data for two globular proteins with distinct topologies: the Engrailed homeodomain (EnHD) and Ribonuclease H (RNase H) (Figure 1). The Drosophila engrailed homeodomain has 54 residues arranged into three α-helices (denoted HI-HIII) and constitutes the DNA-binding domain of the larger transcription factor in which it is found. RNase H is an endonuclease α/β protein composed of 155 residues organized into five α-helices (denoted αA – αE) and a single, five-stranded β-sheet (denoted β1 – β5) that hydrolyzes the RNA strand in double stranded RNA-DNA hybrid.58

Figure 1. X-ray crystal structures of the engrailed homeodomain and ribonuclease H.

Figure 1

(a) The simple tertiary structure, small size, and ability to fold autonomously without either disulfide bonds or ligands have made homeodomains the subject of numerous investigations into the mechanisms of protein folding. Here, the crystal structure and sequence of the engrailed homeodomain are shown. On the front view (left), the DNA binding residues in HIII are represented as balls and sticks; on the side view (right), the DNA binding residues on the N-terminus as well as four aromatic residues within the hydrophobic core are shown as balls and sticks. (b) RNase H functions in numerous biological processes including inhibition of replication by removal of R-loops,51 removal of Okazaki fragments,52 synthesis of multicopy single-stranded DNA,53 and removal of misincorporated ribonucleotides.54 Here the crystal structure and sequence of ribonuclease H are shown with the β-sheet colored tan, α-helices cyan, and functional regions burgundy. Four functional regions of RNase H have been identified that are critical for RNase H to bind and hydrolyze RNA-DNA hybrids.5557 These regions include αC and the loop between αC and αD (residues 81 to 101, referred to as the handle region), the loop between β1 and β2 (residues 11 – 22, referred to as the glycine rich loop), the loop between β5 and αE α/β (residues 121–127 referred to as the β5E loop), and the active site, which contains three conserved carboxylate residues (Asp10, Glu48, and Asp70) that coordinate divalent cations that are required for catalysis.

Methods

Molecular Dynamics Simulations

The initial coordinates for simulations of EnHD were obtained from the 2.1 Å resolution X-ray crystal structure solved by Clarke et al. (PDB ID: 1ENH).59 The initial coordinates for simulations of RNase H were obtained from the 1.48 Å resolution crystal structure solved by Katayanagi et al. (PDB ID: 2RN2).58 Crystallographic solvent atoms were removed from these structures and then conventional molecular dynamics simulations were performed using four software package-force field combinations: in lucem molecular mechanics (ilmm)48 with the Levitt et al. force field,43 AMBER with the Amber ff99SB-ILDN force field,42 GROMACS49 with the Amber ff99SB-ILDN force field,42 and NAMD50 with the CHARMM36 force field.44,60,61 The simulations were performed under conditions consistent with those under which the experimental data were obtained. Simulations of EnHD were performed at neutral pH (7.0) at 298 K, and simulations of Rnase H were performed at acidic pH (5.5, histidine residues protonated) at 298 K. All simulations were performed in triplicate for 200 nanoseconds using periodic boundary conditions, explicit water molecules and ‘best practice parameters’, as determined by recent papers in the literature by authors of the software packages [AMBER,6264 ilmm,65 GROMACS,66 and NAMD67] and their associated force fields typically contain many adjustable parameters. Here, we aimed to strike a balance between keeping these parameters consistent and adjusting them only when necessary for specific force fields/ MD package combinations. Within each package/force field combination, simulation methods were kept constant for the two proteins. Simulations of the native state were performed at 298 K and thermal unfolding was simulated at 498 K. Details for the initial preparation of the systems and the 298 K simulations for each force field follows. To promote the inclusion of simulation software parameterizations in ongoing and future force field evaluations, we have included the input files for each of our simulations in the supplementary information. We maintain that the algorithms used and associated input control parameters are as important as the force field per se in determining simulated behavior. This information may also be of use in meta analyses that evaluate force field/ MD software. Also, as there is some overlap in the protein (AMBER and GROMACS) and water (GROMACS and NAMD) force fields used by different programs, the simulations will be referred to throughout by the name of the simulation package used.

AMBER

Simulations were performed with the AMBER14 package and ff99SB-ILDN force field.42 Explicit hydrogen atoms were modeled onto the X-ray structure using the leap module and each protein was solvated with explicit TIP4P-EW68 waters in a periodic, truncated octahedral box that extended 10 Å beyond any protein atom. Each system was then minimized in three stages. First, solvent atoms were minimized for 500 steps of steepest descent minimization followed by 500 steps of conjugate gradient minimization in the presence of 100 kcal mol−1 restraints on protein atoms. Second, solvent atoms and protein hydrogens were minimized for 500 steps of steepest descent minimization followed by 500 steps of conjugate gradient minimization in the presence of 100 kcal mol−1 restraints on protein heavy atoms. Third, all atoms were minimized for 500 steps of steepest descent minimization followed by 500 steps of conjugate gradient minimization in the presence of 25 kcal mol−1 restraints on protein Cα atoms. After minimization, systems were heated to 298 K during 6 successive stages. In each stage, the system temperature was increased by 50 K over 200 ps (25,000 steps) using the canonical NVT (constant number of particles, volume, and temperature) ensemble. (25 kcal mol−1 restraints on protein Cα atoms were present during each stage). After the system temperature reached 298 K, the systems were equilibrated over 7 successive stages. During the first 5 stages, the systems were minimized for 1000 steps (500 steps of steepest descent followed by 500 steps of conjugate gradient minimization) and restraints on protein Cα atoms were decreased from 5 kcal mol−1 to 1 kcal mol-1. Next, the systems were equilibrated using the NVT ensemble for 500,000 steps (1 ns) and then the NPT ensemble for an additional 500,000 steps (1 ns) in the presence of 0.5 kcal mol−1 restraints were present on protein Cα atoms. Production dynamics were then performed using the isobaric-isothermal NPT (constant Number of particles, Pressure, and Temperature) ensemble using a 2 fs time step and coordinates were saved every picosecond for analysis. The SHAKE algorithm was used to constrain the motion of hydrogen-containing bonds.69,70 Long-range electrostatic interactions were calculated using the particle mesh Ewald (PME) method.

GROMACS

Simulations were performed with GROMACS version 5.0.649 and the AMBER ff99SB-ILDN42 force field. Hydrogen atoms were modeled onto the X-ray structure using pdb2gmx prior to solvation with TIP3P71 waters in periodic, cubic boxes that extended 10 Å beyond any protein atom. Solvent molecules were replaced with counter ions until the system was neutralized. Throughout the following stages, a Verlet cutoff scheme72 was employed with a 10 Å cutoff for both electrostatic and van der Waals interactions and LINCS was employed to constrain bonds.73 Electrostatic interactions were calculated using PME. The solvated systems were minimized for 50,000 steps using steepest descent minimization. Systems were then equilibrated over two stages in the presence of positional restraints on protein atoms. First, systems were equilibrated in the NVT ensemble for 50,000 steps followed by equilibration in the NPT ensemble for an additional 50,000 steps. Finally, production dynamics were performed in the NPT ensemble with a 2fs time step and coordinates were saved every picosecond for analysis.

ilmm

Simulations were performed with the in lucem molecular mechanics (ilmm) package and Levitt et al. force field43 using our standard protocols. Explicit hydrogen atoms were modeled onto the X-ray crystal structures prior to steepest descent minimization for 1000 steps. Each protein was solvated with explicit flexible 3-center (F3C)74 water molecules in a periodic, cubic box that extended 10 Å beyond any protein atom, with the solvent density set to the experimental value at 298 K (0.997 g mL−1).75 Solvent atoms were then minimized for 1000 steps and equilibrated for 500 steps (1 ps) prior to additional, separate minimization of the solvent (500 steps) and protein (500 steps) atoms. Conventional molecular dynamics simulations were then performed using the microcanonical NVE (constant Number of particles, Volume, and Energy) with a target temperature of 298 K. The equations of motion were propagated using a 2 fs time step with a 10 Å force-shifted nonbonded cutoff,43,76 and coordinates were saved every picosecond for analysis. In contrast to the other software packages described here, ilmm with the Levitt et al. force field43 subscribes to a molecular-level representation and natural Boltzmann sampling through use of the NVE ensemble rather than trying to control macroscopic variables, such as temperature and pressure, in these microscopic systems. Such temperature and pressure coupling lead to very frequent scaling of the velocities, which in turn provides discontinuous trajectories. This may not be an issue if conformational sampling is desired as opposed to pathways for those conformational changes, but ilmm was developed with the objective of characterizing both ‘kinetic’ pathways and ‘equilibrium’ states. In addition, ilmm does not restrain atomic motion via algorithms such as LINCS and SHAKE nor introduce artificial periodicity into the molecular system via algorithms such as PME. 76

NAMD

Simulations were performed with NAMD version 2.1050 and the CHARMM36 force field44. Hydrogen atoms were modeled onto the X-ray crystal structures using psfgen prior to solvation with TIP3P waters71 in a periodic box that extended 10 Å beyond any protein atom. Next, minimization was performed in two phases. In the first stage, minimization was performed for 20,000 steps with all hydrogen-containing bonds constrained and protein atoms fixed. In the second stage, minimization was performed for 1000 steps with all protein backbone atoms fixed and an additional 1000 steps with no fixed atoms. After minimization, systems were heated to 298 K over 10,000 steps with harmonic restraints on backbone atoms that were gradually decreased from 5.0 to 0 kcal mol-1. After heating, systems were equilibrated for 100,000 steps (200 ps) in the NPT ensemble. Finally, production dynamics were performed in the NVT ensemble using a 2 fs time step. Van der Waals interactions were truncated with a switching potential and coordinates were saved every picosecond for analysis. Electrostatic interactions were calculated via PME summation and SHAKE was used to constrain bonds.

High Temperature Unfolding Simulations

In addition to the native state simulations described above, we also performed high temperature unfolding simulations of EnHD with the same force fields as for the native simulations. The high temperature unfolding protocols were similar to those for the native state simulations, with the following changes. The simulations were performed at 498 K in triplicate for 10 ns, non-bonded cutoffs were reduced by 2 Å, and structures were saved every 0.2 ps for analysis. To keep solvent molecules in the liquid state, the pressure was set to ~26 atm77 for simulations with GROMACS, AMBER, and NAMD; for ilmm, the solvent density was set to the experimental value at 498 K and ~26 atm (0.829 g ml−1).75,77 For AMBER, the two final equilibration phases were reversed, with NPT equilibration (500,000 steps) followed by NVT equilibration (500,000 steps), and the NVT ensemble was employed for production dynamics. For NAMD, the NPT ensemble was employed for production dynamics.

Analysis of MD Data

After production dynamics were completed, all trajectories were converted into an ilmm-compatible format and analyses were performed identically for all trajectories. Unless otherwise specified, 298 K analyses were performed on an ensemble created by pooling all three replicate simulations and the first 20 ns of each simulation were excluded from analysis, yielding 5.4 × 105 structures in each ensemble.

Calculation of Gross Structural Changes and Dynamic Fluctuations

The root mean squared deviations (RMSDs) were calculated by aligning each frame to the crystal structure and fluctuations (RMSFs) were calculated by aligning each frame to the average structure calculated after the 20 ns equilibration period. The Cα atoms for all ‘core’ residues were included in the alignment and subsequent calculations. ‘Core’ residues refer to all residues except for flexible N- and C-terminal residues (EnHD: residues 8–53, RNase H: residues 5–142). Experimental Β-factors were compared with RMSFs via Equation 1, where B = experimental B-factors.78

RMSF=(3B/8π2)1/2 Equation 1

Calculation of NMR chemical shifts.

Simulated chemical shifts were calculated with the SHIFTX2 program.79 Experimental data for EnHD and RNase H were obtained from the Biological Magnetic Resonance Bank (BMRB)80 entries 1553681 and 1657,82 respectively. Chemical shifts were calculated for 1% of the ensemble (one frame every 100 ps). For RNase H, the re-referenced chemical shifts provided by the RefDB were used.83

Calculation of NMR scalar coupling constants

We calculated 3JHα,HN scalar coupling constants from MD simulations using the Karplus relation84 (Equation 2) by taking the average of the coupling constants calculated for each frame in the simulation. The coefficients for the Karplus equation (C0, C1, and C2) were obtained from the literature, and we used 7 different parameterizations.8591 Experimental data for EnHD and RNase H were obtained from the BMRB entries 1553681 and 1657,82 respectively.

3J(θ)=C0+C1cosθ+C2cos2θ Equation 2

Calculation of NOE satisfaction

Experimental nuclear Overhauser effect (NOE) values for EnHD and RNase H were obtained from the BMRB, entries 1553681 and 1657,82 with 654 and 1428 NOEs, respectively. NOEs were classified as satisfied in simulations based on the inequality in Equation 3, where r is the distance between a pair of protons, N is the number of frames, and rUB is either the experimental upper bound restraint distance or 5 Å, whichever is greater.

<r6>rUB Equation 3

Calculation of generalized amide S2 order parameters

Experimental order parameters for EnHD were obtained from BMRB entry 15336.81 Experimental order parameters for RNase H were obtained from the supplemental information of Stafford et al.92 MD-derived NH bond order parameters were calculated using the method described by Wong and Daggett using a 250 ps window for EnHD and a 10,000 ps window for RNase H given its ~9.7 ns tumbling time.93 Final order parameters are reported by averaging the results from the three replicate simulations.

Dihedral Angle Principal Component Analysis

Principal component analysis (PCA) is frequently employed to investigate protein dynamics by systematically reducing the dimensionality of complex motions into simpler components.94 Here, we used PCA to investigate the dynamics of the Gly-rich loop in RNase H. Although Cartesian coordinates are normally employed in PCA calculations, we chose to use dihedral angle PCA of selected residues to investigate dynamics in the Gly-rich loop region of RNase H using the sine and cosine components of the ϕ and ψ dihedral angles of residues 11–22 as input.

Transition State Identification

The protein folding/unfolding transition states were identified from the high temperature unfolding simulations by using a previously established conformational clustering method.9597 Using this method, a pairwise Cα RMSD matrix describes the conformational similarity of all structures in n x n-dimensional space (where n = the number of frames in each simulation). This matrix is then projected into three dimensions such that frames with similar conformations are clustered in the reduced dimensional space. We then choose an ensemble of conformers prior to the onset of a significant conformational change to model the transition state structure.

Transition State Evaluation

To assess how well the putative transition state ensemble chosen from MD simulations agrees with experimental observations, we performed a comparative analysis between MD-derived S values and ϕ-values.6 S values, or structure index values, are semi-quantitative equivalents of experimental ϕ-values98,99 and incorporate secondary and tertiary structure components. S values are defined as S = (S)(S), where S describes the secondary structure component and S describes the tertiary structure component, and provide the fraction of native structure present in the transition state on a per-residue basis. The fraction of native secondary structure, S, for residue i is defined as the fraction of time that the dihedral angles of residues i - 1, i, and i + 1 spend within ± 35° of the values in the X-ray structure. The fraction of native tertiary structure, S, for residue i is defined as the number of tertiary contacts present in the transition state divided by the number of tertiary contacts present in the native state. Tertiary contacts were defined for interactions that heavy atoms in the residue of interest form with others separated by two or more residues. (≤ 5.4 Å for C-C contacts and ≤ 4.6 Å for all others).

Results & Discussion

Native State Sampling

The X-ray and NMR-derived structures deposited in the Protein Data Bank (PDB, www.rcsb.org)100 are averages over space and time, washing over the subtle and varied excursions from the native state frequently taken by globular proteins in solution.101 One of the chief goals of MD simulations is to explore such excursions beyond the native state. Before comparing the simulations against NMR observables, we first examined their conformational sampling relative to the starting native structures. We calculated the Cα RMSDs and RMSFs for EnHD and RNase H by aligning the core residues to either the starting structure (RMSD) or the average MD-ensemble structure (RMSF). For EnHD, the core residues were 8–53 and for RNase H, the core residues were 5–142 (Figure 1). For both proteins in all force fields, the Cα RMSDs reached stable values by 100 ns (Figure S1). Averaging over all three replicate simulations per MD package, we found that EnHD had average Cα RMSDs ranging from 0.6 Å (GROMACS and NAMD) to 0.8 Å (AMBER) to 1.0 Å (ilmm) and that RNase H had average Cα RMSDs ranging from 1.3 Å (GROMACS) to 1.4 Å (NAMD) to 1.5 Å (AMBER) to 2.4 Å (ilmm) (Figure 2, Table 3). Independent of the protein system, GROMACS and NAMD produced narrow Cα RMSD distributions with little variation between replicate simulations, whereas AMBER and ilmm produced broader distributions with more variation between simulation replicates (Figure 2, Table 3).

Figure 2. The distribution of Cα root mean squared deviations for EnHD and RNase H.

Figure 2

Overlaid histograms of Cα RMSDs were constructed for each of the three replicate simulations of EnHD (left) and RNase H (right) for simulations performed with AMBER (orange), GROMACS (green), ilmm (purple), and NAMD (blue) at 298 K.

Table 3.

Average (± standard deviation) Cα RMSDs for EnHD and RNase H native state simulations. The first 20 ns of each simulation was excluded in this calculation.

AMBER GROMACS ilmm NAMD

EnHD Avg. (Å) St. Dev. (Å) Avg. (Å) St. Dev. (Å) Avg. (Å) St. Dev. (Å) Avg. (Å) St. Dev. (Å)
Run 1 0.67 0.10 0.59 0.11 1.07 0.26 0.60 0.09
Run 2 0.95 0.22 0.64 0.14 0.85 0.16 0.56 0.08
Run 3 0.75 0.13 0.58 0.09 1.01 0.16 0.57 0.10
Ensemble 0.79 0.20 0.61 0.12 0.98 0.22 0.57 0.09
RNase H Avg. (Å) St. Dev. (Å) Avg. (Å) St. Dev. (Å) Avg. (Å) St. Dev. (Å) Avg. (Å) St. Dev. (Å)

Run 1 1.57 0.22 1.35 0.14 2.50 0.12 1.44 0.23
Run 2 1.45 0.16 1.37 0.12 2.28 0.26 1.50 0.21
Run 3 1.58 0.24 1.30 0.13 2.35 0.33 1.38 0.12
Ensemble 1.53 0.22 1.34 0.14 2.38 0.27 1.44 0.20

We analyzed the per-residue contributions to the Cα RMSDs and found that for EnHD, all force fields consistently produced some deviation from the native conformation in two regions. The first region was HIII, the C-terminal helix that contains many DNA binding residues. In ilmm and AMBER, HIII frayed at the C-terminus; however, the nature of this local unfolding was force field dependent. In one ilmm simulation, HIII rotated away from the hydrophobic core, exposing W48 to solvent. After sampling this near-native conformation, HIII began to refold (Figure S2). In one AMBER simulation, the C-terminal residues lost helical structure (Figure S2). Partial loss of helical structure was also present for GROMACS and NAMD, but to a lesser extent (Figure S2). The second region with significant deviation from the native conformation was the loop connecting HI and HII. In ilmm, residues 24 and 25 underwent a dihedral transition, reducing the chain length of the HI-HII loop and producing a 1.9 Å Cα RMSD for residue 29. A similar event was observed with AMBER, however, instead of a single flip, multiple residues in the HI-HII loop experienced small dihedral angle shifts, resulting in a 2.7 Å Cα RMSD for residue 29 (Figure S3). The heightened degree of motion is supported by experiment. Stollar et al. found that the HI-HII loop undergoes conformational fluctuations on the μs-ms timescale.102 These initial displacements seen in MD may be precursors to more significant conformational changes.102 For RNase H, the major contributions to the RMSD came from 5 different regions of the protein: the Gly-rich loop (residues 11–22), the β23 loop (residues 28–30), the αA4 loop (residue 59–63), the handle region (residues 81–101), and the β5E loop (residue 121–127) (Figure 1).

We then compared the simulated Cα RMSFs to crystallographic B-factors and found that AMBER and ilmm had substantially lower root-mean-squared error (RMSE) between the simulated and experimental values than GROMACS or NAMD (Table 1). The correlation coefficients were high for all force fields, ranging between 0.82 and 0.87. The lower RMSEs for AMBER and ilmm can be traced to the HII-HIII loop, N-terminal residues, and C-terminal residues (Figure S4). The dynamics in the AMBER and ilmm simulations lead to the lower errors relative to GROMACS and NAMD. Lower correspondence between the B-factors and simulation was observed for RNase H for all MD packages (correlation coefficients all below 0.7), but the associated RMSEs were low (less than 0.3 Å) (Table 2).

Table 1.

Correspondence between simulation and experiment for EnHD

AMBER GROMACS ilmm NAMD

Rb RMSEc Rb RMSEc Rb RMSEc Rb RMSEc

B-factor 0.82 0.65 0.87 1.24 0.83 0.4 0.83 1.1
Cα CSa 0.98 0.83 0.98 0.83 0.96 0.95 0.98 0.75
Cβ CS 0.99 0.66 0.99 0.7 0.99 0.82 0.99 0.71
C CS 0.90 1.69 0.88 1.63 0.91 1.84 0.92 1.57
N CS 0.84 1.02 0.85 1.1 0.82 0.92 0.86 0.86
Hα CS 0.92 0.13 0.92 0.13 0.87 0.17 0.92 0.14
H CS 0.75 0.38 0.77 0.37 0.79 0.35 0.82 0.34
3JHN,Hα 0.68 1.10 0.82 0.70 0.70 0.94 0.83 0.87
S2 0.71 0.13 0.90 0.08 0.78 0.11 0.94 0.07
a.

CS = Chemical shift

b.

R = Pearson’s correlation coefficient

c.

RMSE = root mean squared error

Table 2.

Correspondence between simulation and experiment for RNase H

AMBER GROMACS ilmm NAMD

Rb RMSEc Rb RMSEc Rb RMSEc Rb RMSEc

B-factor 0.67 0.32 0.62 0.38 0.49 0.37 0.68 0.33
Cα CSa 0.98 0.82 0.98 0.78 0.97 1.19 0.99 0.73
N CS 0.92 2.48 0.94 2.32 0.86 3.16 0.93 2.34
Hα CS 0.93 0.26 0.94 0.23 0.86 0.36 0.93 0.25
H CS 0.83 0.42 0.85 0.39 0.77 0.47 0.86 0.37
S2 0.85 0.10 0.90 0.09 0.83 0.11 0.88 0.09
a.

CS = chemical shift

b.

R = Pearson’s correlation coefficient

c.

RMSE = root mean squared error

Comparison with NMR Observables

Next, we assessed the ability of MD simulations to reproduce four types of NMR observables: chemical shifts, nuclear Overhauser effect crosspeaks (NOEs), backbone NH order parameters, and scalar coupling constants.

Chemical Shifts

Chemical shifts report on the local electronic environments of distinct nuclei within proteins. We calculated the chemical shifts from our simulations using SHIFTX2.79 For the calculation, the first 20 ns of each trajectory was excluded and all three replicate simulations were combined into a single ensemble. To determine the level of sampling required to accurately report chemical shifts, we calculated the chemical shifts for run 1 of RNase H performed with AMBER using 1 ps, 10 ps, and 100 ps granularity and confirmed that subsampling the simulation at 100 ps resulted in predicted shifts that were nearly identical for the same calculation run at full granularity (Figure S5). All the MD packages in this study reproduced chemical shifts with errors comparable to those associated with SHIFTX2 (Figure 3, Figures S6–7, Tables 12). For comparison, we also calculated the agreement for the X-ray crystal structures for EnHD and RNase H. Prior to chemical shift calculations, hydrogens were modeled onto the crystal structure using ilmm and the hydrogen atoms were minimized for 1000 steps of steepest descent minimization. For some nuclei, the MD-generated ensembles produced better agreement than the X-ray conformations. For example, the X-ray conformation of K17 in EnHD did not agree well with the experimental data, especially for the N, Cα, Hα, and HN nuclei (Figure S8). In general, MD ensembles produced chemical shifts more consistent with the experimental data than the X-ray structure alone, particularly for the N and HN chemical shifts; however, little improvement was observed for Cα shifts (Figure S8). We investigated the origins of this discrepancy and found that K17 had distinct hydrogen bonding patterns within different force fields (Table S1). The rate of formation of a main chain hydrogen bond between K17 (donor) and A14 (acceptor) may modulate the predicted chemical shift for the amide hydrogen of K17 (Table S1). Wagner et al. and others have demonstrated that hydrogen bond geometry can influence the chemical shifts for HN nuclei and a subtle difference in hydrogen bonding patterns can contribute to enhanced correspondence between MD and experiment.103 Additionally, there were a few instances where MD simulations produced worse agreement with the experimental data than the reference X-ray structure. Overall however, all the MD packages produced strong agreement with experimental chemical shifts, exhibited by the high correlation coefficients and low RMSEs (Figure 3, Tables 12, Figure S6, S7).

Figure 3. Comparison of the correspondence between MD-derived and experimental chemical shifts.

Figure 3

The correlation coefficients (left column) and RMSEs (right column) for the chemical shift correspondence for EnHD (top row) and RNase H (bottom row) are shown, stratified by nucleus type.

Nuclear Overhauser Effect Crosspeaks

Nuclear Overhauser effect crosspeaks (NOEs) relate to interproton distances and provide significant conformational information. We calculated NOEs from our simulations and stratified the results in two ways. First, we analyzed the percentage satisfaction for NOEs as a function of sequence separation: short-range NOEs refer to NOEs arising from residues i → i + 1, i + 2; medium-range NOEs: i → i + 3, i + 4, i + 5; and long-range NOEs: i → i >i+5. Second, we analyzed the number of NOE violations, stratified by violation distance. For comparison, we also calculated the NOE satisfaction for the EnHD and RNase H crystal structures. For crystal structure analyses, we used the crystal structures containing hydrogens, as described above. For comparison, we also calculated the NOE satisfaction for the EnHD NMR ensemble (PDB ID: 2JWT). For EnHD (654 total NOEs), GROMACS, ilmm, and NAMD ensembles had marginally better agreement with the NOE data (96%, 97%, and 96%, respectively) than the crystal structure alone (95%), while AMBER had marginally worse agreement (94%) (Table S2). Deviations from the level of agreement with the crystal structure were small (< 2%) and there was little variation among replicate simulations. In addition, the total number of NOE violations was small. Across all force fields, the mean number of violations was 28 with an average violation distance of 0.625 Å. The number of severe NOE violations (i.e. those with a violation distance > 2 Å) was also small (AMBER: 2, mean distance = 2.6 Å; GROMACS: 2, mean distance = 2.6 Å; ilmm: 1, mean distance = 2.7 Å; NAMD: 4, mean distance = 3.0 Å); however, there were 12 severe violations in the first NAMD replicate (Table S2). For RNase H (1428 total NOEs), the X-ray structure had marginally better agreement with the experimental NOE data than any MD-generated ensemble (X-ray: 98%, AMBER: 97%, GROMACS: 97%, ilmm; 95%, NAMD: 97%) (Table S3). Again, deviations from the level of agreement with the crystal structure were small (< 3.5%) and there was little variation in the agreement among replicate simulations. Across all force fields, the mean number of violations was 47 with an average violation distance of 0.775 Å. There were a number of severe NOE violations (i.e. those with a violation distance > 2 Å): AMBER: 3, mean distance = 3.5 Å; GROMACS: 2, mean distance = 3.5 Å; ilmm: 5, mean distance = 2.9 Å; NAMD: 4, mean distance = 4.3 Å) (Table S3).

Next, we grouped NOEs by the residues with which they were associated and found that several residues had force field-dependent NOE satisfaction. For example, Leu 26 of EnHD, located in the HI-HII loop, is associated with 50 NOEs: 34 were satisfied by all MD simulations, the X-ray structure, and the NMR ensemble; 1 was never satisfied; and 15 had model-dependent satisfaction (Table S4). Of the 15 NOEs with model-dependent satisfaction, there were 4 NOEs satisfied by the X-ray structure and not the NMR ensemble and 10 satisfied by the NMR ensemble and not the X-ray structure (Table S4). The satisfaction of these NOEs was dependent on the rotameric state of Leu 26, which was in the t, g+ conformation in the X-ray structure and the g-, t conformation in the NMR ensemble (Figure 4). In AMBER, GROMACS, and ilmm, Leu 26 alternated between these two conformations; however, simulations performed with NAMD largely retained the X-ray conformation for Leu 26, populating the t, g+ conformation for 97% of the simulation (Figure 4, Table S5). While no one model satisfied all the NOEs associated with Leu 26, AMBER, GROMACS, and ilmm had better agreement with the NOEs (and on par with the NMR ensemble) than the crystal structure or NAMD. In this instance, rotameric exchange was necessary to satisfy the NOEs that were not present in the crystal structure. The χ1 torsional potentials for Ile, Leu, Asn, and Asp were modified in the ff99SB-ILDN force field, which we used in the AMBER and GROMACS simulations. The improvements made in this force field likely contributed to the modelling of L26 in the AMBER and GROMACS simulations; however, control simulations employing the ff99SB force field were not performed, so the degree of improvement cannot be quantified here. Although we found that 3 of the 4 MD packages had better agreement than the crystal structure with respect to the solution behavior of L26, we cannot say which MD-generated ensemble best agreed with the ‘true’ behavior of L26 in solution. We found that the force fields/packages had variable populations and lifetimes of the two primary rotamer conformations (Tables S5, S6). Furthermore, the HI-HII loop undergoes conformational exchange on the μs-ms timescale, indicating that the MD simulations may not have captured the full extent of dynamics associated with the side chain of this residue.102

Figure 4. Conformational heterogeneity in Leu 26.

Figure 4

(a) Leu 26 occupies distinct rotameric conformations in the X-ray structure and NMR ensemble of EnHD. In the X-ray crystal structure (left), L26 occupies the t, g+ conformation while it occupies the g-, t conformation in the NMR ensemble (right). (b) Side chain χ1 / χ2 dihedral angle maps for the different MD packages. The red point denotes the conformation of L26 in the X-ray structure.

Scalar Coupling Constants

Scalar coupling constants can be related to various dihedral angles via the Karplus relation (Equation 2). Here we calculated the 3JHN,Hα coupling constants, which are related to the ϕ dihedral angle, using seven different parameter sets obtained from the literature for the Karplus equation.8591 Table S7 shows that the choice of Karplus parameters affects the level of agreement between simulation and experiment, with the Schmidt et al. and Smith et al. parameter sets consistently producing higher RMSEs. Overall, the Habeck parameter set, which was derived by applying Bayesian regression models to high-resolution data from ubiquitin, produced the best agreement, with correlation coefficients ranging from 0.80 to 0.89 and RMSDs ranging from 0.8 Hz to 0.98 Hz (Table S7, Figure 5). While most residues had excellent agreement with the experimental data, some residues, such as E22, had poor agreement independent of force field or parameter set, and some residues, such as N41, had force-field dependent agreement (Figure S9). The N41 coupling constant was not described well in the X-ray structure, the AMBER ensemble, or the NAMD ensemble; however, both ilmm and GROMACS-derived ensembles had excellent agreement (ilmm error: 0.04 Hz, GROMACS error: 0.5 Hz). In all MD-generated ensembles, N41 sampled two regions of Ramachandran space: one located in the PIIL basin and one located on the boundary between the β and PIIL basins (Figure S9). The ratio of sampling in these two regions dictated the ensemble-averaged value for ϕ and, in turn, the coupling constant. Upon further analysis, we found that, structurally, N41 functions as a dynamic helix cap, with both the backbone carbonyl and side chain carboxamide group forming multiple hydrogen bonds with the N-terminal residues of HIII (Figure S9, Table S8). These data suggest that force-field specific hydrogen bonding patterns for N41 may have contributed to the level of agreement with experimental data; however, it is also possible that intrinsic ϕ/ψ preferences for individual amino acids, which are known to be force-field dependent,104 may have also influenced the agreement for N41. Ultimately, however, it is not possible to assess which MD-generated ensemble produced the best prediction for in-solution behavior, as numerous ϕ/ψ distributions can yield ensembles that satisfy the experimental data. The difficulties in evaluating simulated and experimental coupling constants are exacerbated by the fact that 2–4 ϕ values map to a single coupling constant and use of the Karplus relation itself can introduce error in the form of the three coefficients.

Figure 5. MD simulations reproduce experimentally determined 3JHN,Hα coupling constants for EnHD.

Figure 5

Simulated vs experimental coupling constants are plotted as a function of residue number. Here, the Habeck et al. parameters have been used.

S2 Generalized Order Parameters

NMR-derived generalized S2 order parameters of NH groups report on the local extent of motion of the polypeptide chains. We calculated the backbone order parameters for EnHD and RNase H separately for each simulation and the reported values were averaged over the three replicates. For EnHD, there was good correspondence between the simulated and experimental values, with correlation coefficients ranging from 0.71 – 0.94 and RMSEs ranging from 0.7 – 1.3 (Figure 6, Table 1). Across all force fields, the N-terminal residues and turns had the greatest error. Furthermore, although ilmm had above-average errors for N-terminal residues, it produced better agreement for helical residues, particularly residues 10–14 and 48–51. (Figure 6, Figure S10) There was also good correspondence between simulation and experiment for RNase H (Table 2, Figure 7). The level of agreement for Gly 15 was force field dependent, with ilmm producing the best agreement (Figure 7, Figure S11). Gly 15 is within the Gly-rich loop, which, along with the β5αE loop, coordinates the DNA/RNA hybrid prior to catalysis. We performed dPCA to explore the conformational distributions of the Gly-rich loop. Our analysis was aided by multiple X-ray and NMR structures of RNase H and several homologues. The following structures were included in our analysis: RNase H from E. coli (X-ray, PDB ID: 2RN2), RNase H from E. coli (NMR, PDB ID: 1RCH), RNase H from Thermus thermophilus (X-ray, PDB ID: 1RIL), a stabilized RNase H variant from E. coli (X-ray, PDB ID: 1GOA), and two structures of RNase H D210N from Homo sapiens in complex with DNA/RNA hybrids (X-ray, PDB ID: 2QKB, in complex with a 20-mer DNA/RNA hybrid; X-ray, PDB ID: 2QKK, in complex with a 14-mer DNA/RNA hybrid). The first two principal components described 53% and 12% of the variance within the dataset and Gly15 had strong weights. There were several highly populated regions in PC space (Figure 8). Of these, one corresponded to the unbound conformation of the Gly-rich loop observed in solution (denoted by a blue arrow in Figure 8) and another corresponded to the bound conformation of the loop observed in solution (denoted by the red arrow in Figure 8). Figure 8c breaks down the sampling of PC space by force field and simulation number. Visualization of the PC maps shows that ilmm was the only force field / MD package that sampled both the bound and unbound conformations; furthermore, the unbound solution conformations were the dominant conformers sampled by ilmm. In contrast, the remaining force fields primarily sampled the X-ray conformation and another region that was not observed in the experimental data (far left of Figure 8). Moreover, even with 600 ns of aggregate simulation time, AMBER, GROMACS, and NAMD were unable to achieve the level of sampling seen in ilmm with 300 ns of aggregate simulation time (i.e. ilmm reached this degree of sampling in < 100 ns). While there are dramatic differences in the sampling, there is a limit to the extent to which we can determine the level of agreement between simulation and experiment, as the number, distribution, and interconversion rates of Gly-rich loop conformations cannot be derived from static structures alone.

Figure 6. Correspondence between experimental and MD-derived order parameters for EnHD.

Figure 6

The experimental (black points) and MD-derived order parameters (red points – AMBER, purple points – ilmm, green points – GROMACS, and blue points – NAMD) are plotted as a function of residue number for EnHD. Excellent correspondence was observed for all force fields except at the N-terminus. A larger number of simulations or significantly longer simulations are required for MD to reproduce the order parameters for highly flexible terminal residues that may become trapped in local minima.

Figure 7. Correspondence between experimental and MD-derived order parameters for RNase H.

Figure 7

The experimental (black points) and MD-derived order parameters (red points – AMBER, purple points – ilmm, green points – GROMACS, and blue points – NAMD) are plotted as a function of residue number for RNase H.

Figure 8. Conformational heterogeneity in the Gly-rich loop.

Figure 8

(a) The dPCA landscape for the residues in the Gly-rich loop, constructed using conformations aggregated from the experimental reference structures and MD simulations, maps the conformational heterogeneity in the Gly-rich loop. Black points denote the location of RNase H reference structures (2RN2, 1RIL, 1RCH, 1GOA, 2QKB, 2QK9, and 2QKK) within the dPCA landscape. The blue arrow denotes the region corresponding to the unbound conformation of the Gly-rich loop in solution (b, left). The red arrow denotes the region corresponding to the DNA/RNA bound conformation of the Gly-rich loop in solution (b, right). (c) Conformations sampled by the Gly-rich loop in MD simulations.

Global Comparison with NMR Observables

To assess the overall agreement of the modeled dynamics with experimental observables, we calculated the χ2 statistic using the method of Panelopulos et al.105 χ2 was calculated using Equation 4, where N is the total number of types experimental observables, rmsdMD, Experiment is the RMSD between MD and experiment for a set of observables, and σ is the error associated with each individual measurement. Prior to calculation of the χ2 statistic, we excluded data for residues that were poorly modeled in at least 3 of the 4 force field/software package sets. Data were excluded on a per-protein, per-data-type basis. Residues were considered poorly modeled if the absolute value of the difference between the experimental and MD derived values was greater than a data-type specific cutoff. The cutoffs were set as twice the mean of the absolute value of the difference between the experimental and MD derived values for all data points. (Figure S12) The data associated with EnHD were organized into 8 types of observables: N chemical shifts, Cα chemical shifts, Cβ chemical shifts, C’ chemical shifts, Hα chemical shifts, HN chemical shifts, 3JHN,Hα coupling constants, and backbone NH order parameters. The data associated with RNase H were organized into 5 types of observables: N chemical shifts, Cα chemical shifts, Hα chemical shifts, HN chemical shifts, and backbone NH order parameters. A value of 0.78 was used for the error associated with the coupling constants, as determined by Beauchamp et al.106 A value of 0.1 was used for the error associated with the backbone NH order parameters. Nucleus-specific errors were used for the chemical shift data, based on the rms errors calculated for SHIFTX2: 0.44 ppm (Cα), 0.52 ppm (Cβ), 0.53 ppm (C), 0.12 ppm (Hα), 0.17 ppm (HN), and 1.12 ppm (N). In the case of RNase H, the expected rms error for SHIFTX2 versus the chemical shifts in the RefDB83 entry for 2RN2 were used for N, Cα, Hα, and HN chemical shifts (1.48, 0.79, 0.14, and 0.30 ppm respectively). For EnHD and RNase H, the χ2 values fell within the expected distribution (Figure 9, Table S9).

Figure 9. Global correspondence between simulation and experiment as assessed by the χ2 statistic.

Figure 9

Each plot shows the χ2 distribution (black curves) for the degrees of freedom associated with each comparison (8 degrees of freedom for EnHD, left; 5 degrees of freedom for RNase H, right). The vertical lines denote the χ2 value calculated for each force-field/software package combination (AMBER, red; GROMACS, green; ilmm, purple; NAMD, blue) as well as the expectation value for that distribution (black).

χ2=i=1N(rmsdiMD,Experiment)2σi2 Equation 4

High Temperature Unfolding

High temperature simulations have been used to probe protein folding/unfolding pathways6,7 and to aid in the design of thermostable protein variants.4 In prior studies, putative protein (un)folding transition states have been identified from high temperature simulations for multiple proteins including EnHD,107 c-Myb,107 chymotrypsin inhibitor 2,6 and barnase7,97 and others. We evaluated the ability of the different MD packages and force fields to model known components of the (un)folding pathway of EnHD. First, we evaluated the sampling of the transition state of protein (un)folding by comparing MD-derived S-values to experimentally derived ϕ-values.107 The ϕ- and S-values reflect the degree of structure present in the transition state along the sequence. The S-values were calculated for the putative transition state ensembles identified via conformational clustering (Figure S13), and the averages over the 3 replicates showed decent agreement with the experimentally derived ϕ-values for ilmm (R = 0.70) and moderate correspondence for AMBER (R = 0.48), GROMACS (R = 0.50) and NAMD (R = 0.35) (Table S10). Next, we examined whether any of the unfolding simulations sampled the known EnHD folding intermediate, the structure of which was first predicted computationally in 2000108,109 and later confirmed by NMR in 2005.110 This intermediate structure was observed in simulations performed with ilmm, but not AMBER, GROMACS or NAMD. Finally, we examined whether the unfolding simulations sampled the denatured state after 10 ns at high temperature. Simulations performed with GROMACS and ilmm sampled highly denatured states, followed by AMBER, whereas simulations performed with NAMD retained significant native-like structure over the course of the simulation (Figure 10). The unfolding pathway of EnHD was strongly dependent on the choice of force field and simulation software. Only one force field/software combination, ilmm with the Levitt et al. force field, sampled transition state, intermediate state, denatured state structures, and kinetics of unfolding consistent with the experimental results for this system.108110 While GROMACS sampled somewhat appropriate transition state-like conformations, significant helical structure was lost after passing through the transition state, resulting in increased sampling of denatured state conformers but without sampling the obligatory intermediate structures.109,110 NAMD also sampled transition-state like conformations, but in contrast to GROMACS, the protein never unfolded, preventing sampling of the intermediate and denatured states over the course of the simulations (Figure 10).

Figure 10. Survey of conformations populated during high temperature unfolding of EnHD.

Figure 10

Native state: Five snapshots, extracted from simulation one of the 298 K simulations at 0 ns, 50 ns, 100 ns, 150 ns, and 200 ns, serve as a visual reference of native state sampling. Transition state: For each of the four MD packages, the transition state is represented by three overlaid structures extracted from the simulation that best modeled the experimentally determined transition state. Intermediate state: One intermediate state structure was extracted from the ilmm simulation that best modeled the experimentally determined intermediate state. Model 1 of the experimentally determined intermediate state structure is shown as a transparent ribbon and aligned against the MD-derived intermediate structure. Final structure: To represent the extent of unfolding that occurred during high temperature MD, the most-disrupted final structures from the 498 K replicates are shown.

Summary and Outlook

The ultimate objective of MD simulations is to visualize dynamic behaviors and structural conformations that cannot be described by experimentally derived structures alone. Usually, the inability of MD simulations to produce conformational ensembles that are consistent with experiment is blamed on the force field. However, the force field is not the sole determinant of MD-simulated behavior or accuracy; if it were, the results from the simulations performed with GROMACS and AMBER would be more similar, as the same force field was used with both packages. To this end, we encourage authors of MD studies to include the standard input files used to model and perform any published simulations. This will facilitate meta-analyses and discussions of the impact of MD software-specific parameters on modeled behavior. Given the differences that we observed, ongoing and future validation efforts must account for both the force field parameterizations as well as the methods and approximations used to propagate systems in time without conflating the two. This includes approximations that may introduce artificial periodicity, such as PME (ref. 76 and references therein), or overly constrain the simulated systems, such as LINCS and SHAKE. In addition, the choice of simulation ensemble can play a role in the dynamics, as suggested by the inability of AMBER, GROMACS and NAMD to reproduce the known unfolding behavior of EnHD. We believe that the improved ability of ilmm to sample conformational intermediates lies in the flexibility and conformational changes that are made possible when such approximations are not allowed to constrain or impede molecular motions. In addition, the force fields use different force constants on the dihedral angles; the Levitt et al., force field uses a value of 0 for the barrier to readily allow conformational transitions subject to steric and electrostatic interactions. AMBER and CHARMM force fields use values of 1.13/1.88 kcal/mole and 1.36/1.46 kcal/mole for Φ/Ψ, respectively. These barriers, while relatively small, may also aid in retention of the starting structure such that native states are well maintained, and unfolding, even at high temperature is discouraged. Another possibility lies in the use of microcanonical ensemble in ilmm, which allows for Boltzmann sampling of an isoenergetic surface of the conformational landscape over continuous reaction pathways without frequent scaling of the velocities, which dampen motion.

In cases where dynamic behavior is required, restricted sampling results in worse agreement with experimental data that reflect protein dynamics in solution. We observed several instances of this in our simulations, particularly with NAMD. For example, simulations of EnHD performed with NAMD had the lowest Cα RMSDs relative to the crystal structure, but they were unable to recover NOEs associated with Leu 26 that were satisfied by the other MD packages. In addition, the coupling constant for N41 as predicted by NAMD was nearly identical to the value present in the X-ray structure. However, the X-ray structure is not in agreement with the solution behavior of N41, and ilmm, GROMACS and AMBER (both using the same protein force field but different TIP4P-EW and TIP3P water models, respectively) showed improved agreement with experiment. Finally, after 10 ns of simulation at 498 K, NAMD (using the CHARMM36 force field and TIP3P water model) was unable to produce significantly denatured structures comparable to those produced by the other MD packages and instead all three helices remained intact and packed. This highlights a more general issue. Many force field/MD packages have been developed to maintain the conformation of the starting crystal structure by attenuating or impeding dynamics in a variety of ways, and this is generally viewed as desirable. However, this becomes problematic if one is interested in characterizing larger scale native protein dynamics or protein unfolding.

Another crucial component to improving MD software is to establish more complex means of comparing experimental and computational results in a systematic and quantitative fashion. As we have shown here, when observables were evaluated at a coarse level of detail, these MD packages showed similar agreement with experiment with respect to the native state. However, when individual residues were examined in detail, significant differences were observed in native dynamics. A prominent difference was the behavior of L26 in EnHD. While three of the four MD packages recovered many NOEs missing from the crystal structure, the underlying dynamics of L26 (i.e. the populations and lifetimes of different rotameric states) varied significantly. This observation highlights not the shortcomings of MD, but the limitations of the data used to assess the computational results. That is, the available experimental data for L26, for example, are unable to identify which MD package best models the L26 dynamics. Recent studies examining different force fields (AMBER, CHARMM, and OPLS) and their ability to correctly model the strength of salt bridges111 and main chain propensities112 in simple model systems highlight how consideration of detailed interactions and behavior reveals dramatic differences between these force fields / MD packages. More broadly, however, the experimental data presented here are unable to distinguish, with high confidence, which MD-generated ensemble best approximates native protein behavior in solution. Starker differences were seen for protein unfolding, in which case only one of the four MD packages produced unfolding trajectories consistent with experiment. Moving forward, access to the underlying distributions that give rise to the experimental observables will be necessary to improve the quality of MD and to better detect when a simulation does or does not appropriately model protein dynamics.

The results presented here, along with other recent force field validation efforts, show that contemporary force fields produce models that are in similar agreement with experimental results. Additionally, these results show that certain force fields agree better with certain observables than others. Considering these results, it is not possible to prescribe a ‘best’ model; instead, models should be selected based on the information sought. For example, simulations of intrinsically disordered proteins typically require significant alterations to model parameters to obtain accurate results, here is a case where it is undesirable to constrain conformational sampling. Along those lines, additional conformational sampling, obtained through longer or more numerous conventional MD simulations, may aid in the identification of an ‘optimal’ model, but it is not guaranteed to do so. Pantelopulos et al. found that longer simulation times yielded better agreement with experiment, independent of the force field chosen.107 In an evaluation of order parameter agreement, Bowman found that a larger aggregate sampling time yielded better agreement for side chain methyl group order parameters, but essentially no change in the level of agreement for backbone order parameters.113 That study also concluded that the aggregate simulation time and method used to calculate observables affected the level of agreement more than the force field.113 However, these comparison were all between MD packages that constrain motion, and here we found that flexible molecular representations and simulation protocols that do not artificially restrain motion provide greater sampling of conformational space in shorter periods of time. Nonetheless, increased conformational sampling, whether obtained through longer or more numerous simulations or choice of MD/FF package, should facilitate model selection in cases where the data clearly indicate that observation of the dynamics across longer timescales is necessary.

Conclusions

Our results show that the MD programs and force fields studied here show comparable agreement overall with experimental data for the native state. However, we observed instances where the MD packages generated distinct conformational ensembles that agreed equally well with the experimental data. This underscores the fact that agreement with experimental data is necessary, but not sufficient, to validate atomistic simulations. The four MD package/force field combinations unquestionably produced distinct ensembles. For example, hydrogen bond networks, including both the residues engaged in the networks as well as the frequency of different interactions, were variable across the MD packages. While these differences in dynamics may be small in magnitude, such dynamic modes form the background over which more extensive conformational changes occur. Ultimately, quantitative comparisons between such rapid, small amplitude motions and experimental data should enhance our ability to isolate the ‘True’ ensembles present in solution.

Supplementary Material

SI1
SI2

Acknowledgements

We thank the National Energy Research Scientific Computing Center, supported by the DOE Office of Biological Research, which is supported by the US Department of Energy under contract number DE-AC02-05CH11231, for providing computing time. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. This work was supported by the Bioengineering Cardiovascular Training Grant (NIH/NIBIB T32EB1650) to M.C.C. The authors also wish to thank Alissa Bleem, Ivan Vulovic, and Malte Lange for performing some initial simulations as well as Dr. Clare-Louise Towse for helpful discussions.

Footnotes

Supporting Information

The supporting information included with this manuscript in PDF format includes 10 tables, 13 figures, and input files for MD simulations and is organized into two files. The first file, Supporting Information 1, contains the following figures and tables: hydrogen bond occupancy for residue K17 of EnHD (Table S1); NOE satisfaction for EnHD (Table S2) and RNase H (Table S3); NOE satisfaction (Table S4), rotamer populations (Table S5), and rotamer lifetimes (Table S6) of residue L26 of EnHD; comparison of parameter sets used with the Karplus relation (Table S7), hydrogen bond occupancy for residue N41 of EnHD (Table S8); tabulation of χ2 results by data type (Table S9); S-value analysis for EnHD (Table S10); Cα RMSD as a function of time (Figure S1); conformational variation in EnHD (Figures S2–3); Cα RMSF of EnHD (Figure S4); quantification of the effect of ensemble sub-sampling on chemical shift agreement (Figure S5); comparison of MD-derived and experimental chemical shifts for EnHD (Figure S6) and RNase H (Figure S7); detailed analysis of chemical shift agreement for residue K17 of EnHD (Figure S8); detailed analysis of coupling constant agreement for residue N41 of EnHD (Figure S9); agreement between MD-derived and experimental order parameters for EnHD (Figure S10); detailed comparison between the experimental and MD-derived amide S2 order parameters within the Glycine-rich loop of RNase H (Figure S11); plots of the distribution of errors between MD and experiment for data associated with the χ2 calculations (Figure S12); and an example of the conformational clustering method used to identify transition state ensembles from high temperature unfolding simulations (Figure S13). The second file, Supporting Information 2, contains the input files used to perform the native state MD simulations for AMBER, GROMACS, ilmm, and NAMD as described in this manuscript.

References

  • (1).McCammon JA; Gelin BR; Karplus M Dynamics of Folded Proteins. Nature 1977, 267 (5612), 585–590. [DOI] [PubMed] [Google Scholar]
  • (2).Dodson GG; Lane DP; Verma CS Molecular Simulations of Protein Dynamics: New Windows on Mechanisms in Biology. EMBO Rep. 2008, 9 (2), 144–150. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (3).Vlachakis D; Bencurova E; Papangelopoulos N; Kossida S Current State-of-the-Art Molecular Dynamics Methods and Applications. In Protein Chemistry and Structural Biology; 2014; Vol. 94, pp 269–313. [DOI] [PubMed] [Google Scholar]
  • (4).Childers MC; Daggett V Insights from Molecular Dynamics Simulations for Computational Protein Design. Molecular Systems Design & Engineering 2017, 2 (1), 9–33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (5).Kiss G; Çelebi-Ölçüm N; Moretti R; Baker D; Houk KN Computational Enzyme Design. Angew. Chem. Int. Ed. Engl 2013, 52 (22), 5700–5725. [DOI] [PubMed] [Google Scholar]
  • (6).Daggett V; Li A; Itzhaki LS; Otzen DE; Fersht AR Structure of the Transition State for Folding of a Protein Derived from Experiment and Simulation. J. Mol. Biol 1996, 257 (2), 430–440. [DOI] [PubMed] [Google Scholar]
  • (7).Bond CJ; Wong K-BB; Clarke J; Fersht AR; Daggett V Characterization of Residual Structure in the Thermally Denatured State of Barnase by Simulation and Experiment: Description of the Folding Pathway. Proc. Natl. Acad. Sci. U. S. A 1997, 94 (25), 13409–13413. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (8).Duan Y; Kollman PA Pathways to a Protein Folding Intermediate Observed in a 1-Microsecond Simulation in Aqueous Solution. Science 1998, 282 (5389), 740–744. [DOI] [PubMed] [Google Scholar]
  • (9).Voelz VA; Singh VR; Wedemeyer WJ; Lapidus LJ; Pande VS Unfolded-State Dynamics and Structure of Protein L Characterized by Simulation and Experiment. J. Am. Chem. Soc 2010, 132 (13), 4702–4709. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (10).Lee EH; Hsin J; Sotomayor M; Comellas G; Schulten K Discovery Through the Computational Microscope. Structure 2009, 17 (10), 1295–1306. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (11).Grossfield A; Zuckerman DM Chapter 2 Quantifying Uncertainty and Sampling Quality in Biomolecular Simulations. In Reports in Computational Chemistry; 2009; Vol. 5, pp 23–48. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (12).Lopes PEM; Guvench O; MacKerell AD Current Status of Protein Force Fields for Molecular Dynamics Simulations. In Methods in Molecular Biology (Clifton, N.J.); 2015; Vol. 1215, pp 47–71. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (13).van Gunsteren WF; Daura X; Hansen N; Mark A; Oostenbrink C; Riniker S; Smith L Validation of Molecular Simulation: An Overview of Issues. Angew. Chemie Int. Ed 2018, 57, 884–902 [DOI] [PubMed] [Google Scholar]
  • (14).Shaw DE; Grossman JP; Bank JA; Batson B; Butts JA; Chao JC; Deneroff MM; Dror RO; Even A; Fenton CH; et al. Anton 2: Raising the Bar for Performance and Programmability in a Special-Purpose Molecular Dynamics Supercomputer. In SC14: International Conference for High Performance Computing, Networking, Storage and Analysis; IEEE, 2014; pp 41–53. [Google Scholar]
  • (15).Beberg AL; Ensign DL; Jayachandran G; Khaliq S; Pande VS Folding@home: Lessons from Eight Years of Volunteer Distributed Computing. In 2009 IEEE International Symposium on Parallel & Distributed Processing; IEEE, 2009; pp 1–8. [Google Scholar]
  • (16).Larson SM; Snow CD; Shirts M; Pande VS Folding@Home and Genome@Home: Using Distributed Computing to Tackle Previously Intractable Problems in Computational Biology. arXiv preprint 2009, arXiv:0901.866. [Google Scholar]
  • (17).Voter AF Parallel Replica Method for Dynamics of Infrequent Events. Phys. Rev. B 1998, 57 (22), R13985–R13988. [Google Scholar]
  • (18).Bowers KJ; Dror RO; Shaw DE Zonal Methods for the Parallel Execution of Range-Limited N-Body Simulations. J. Comput. Phys 2007, 221 (1), 303–329. [Google Scholar]
  • (19).Freddolino PL; Arkhipov AS; Larson SB; McPherson A; Schulten K Molecular Dynamics Simulations of the Complete Satellite Tobacco Mosaic Virus. Structure 2006, 14 (3), 437–449. [DOI] [PubMed] [Google Scholar]
  • (20).Shaw DE; Maragakis P; Lindorff-Larsen K; Piana S; Dror RO; Eastwood MP; Bank JA; Jumper JM; Salmon JK; Shan Y; et al. Atomic-Level Characterization of the Structural Dynamics of Proteins. Science 2010, 330 (6002), 341–346. [DOI] [PubMed] [Google Scholar]
  • (21).Lindorff-Larsen K; Piana S; Dror RO; Shaw DE How Fast-Folding Proteins Fold. Science (80). 2011, 334 (6055), 517–520. [DOI] [PubMed] [Google Scholar]
  • (22).Prinz J-H; Wu H; Sarich M; Keller B; Senne M; Held M; Chodera JD; Schütte C; Noé F Markov Models of Molecular Kinetics: Generation and Validation. J. Chem. Phys 2011, 134 (17), 174105. [DOI] [PubMed] [Google Scholar]
  • (23).Sawle L; Ghosh K Convergence of Molecular Dynamics Simulation of Protein Native States: Feasibility vs Self-Consistency Dilemma. J. Chem. Theory Comput 2016, 12 (2), 861–869. [DOI] [PubMed] [Google Scholar]
  • (24).Benson NC; Daggett V A Comparison of Multiscale Methods for the Analysis of Molecular Dynamics Simulations. J. Phys. Chem. B 2012, 116 (29), 8722–8731. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (25).Henriques J; Cragnell C; Skepö M Molecular Dynamics Simulations of Intrinsically Disordered Proteins: Force Field Evaluation and Comparison with Experiment. J Chem Theory Comput 2015, 11 (7), 3420–3431. [DOI] [PubMed] [Google Scholar]
  • (26).Stanley N; Esteban-Martín S; De Fabritiis G Progress in Studying Intrinsically Disordered Proteins with Atomistic Simulations. Prog. Biophys. Mol. Biol 2015, 119 (1), 47–52. [DOI] [PubMed] [Google Scholar]
  • (27).Halgren TA Potential Energy Functions. Curr. Opin. Struct. Biol 1995, 5 (2), 205–210. [DOI] [PubMed] [Google Scholar]
  • (28).Halgren TA Merck Molecular Force Field. I. Basis, Form, Scope, Parameterization, and Performance of MMFF94. J. Comput. Chem 1996, 17 (5–6), 490–519. [Google Scholar]
  • (29).Halgren TA Merck Molecular Force Field. II. MMFF94 van Der Waals and Electrostatic Parameters for Intermolecular Interactions. J. Comput. Chem 1996, 17 (5–6), 520–552. [Google Scholar]
  • (30).Halgren TA Merck Molecular Force Field. III. Molecular Geometries and Vibrational Frequencies for MMFF94. J. Comput. Chem 1996, 17 (5–6), 553–586. [Google Scholar]
  • (31).Halgren TA; Nachbar RB Merck Molecular Force Field. IV. Conformational Energies and Geometries for MMFF94. J. Comput. Chem 1996, 17 (5–6), 587–615. [Google Scholar]
  • (32).Halgren TA Merck Molecular Force Field. V. Extension of MMFF94 Using Experimental Data, Additional Computational Data, and Empirical Rules. J. Comput. Chem 1996, 17 (5–6), 616–641. [Google Scholar]
  • (33).Waldher B; Kuta J; Chen S; Henson N; Clark AE ForceFit: A Code to Fit Classical Force Fields to Quantum Mechanical Potential Energy Surfaces. J. Comput. Chem 2010, 31 (12), 2307–2316. [DOI] [PubMed] [Google Scholar]
  • (34).Weiner SJ; Kollman PA; Case DA; Singh UC; Ghio C; Alagona G; Profeta S; Weiner P A New Force Field for Molecular Mechanical Simulation of Nucleic Acids and Proteins. J. Am. Chem. Soc 1984, 106 (3), 765–784. [Google Scholar]
  • (35).Chen IJ; Yin D; MacKerell AD Combinedab Initio/empirical Approach for Optimization of Lennard-Jones Parameters for Polar-Neutral Compounds. J. Comput. Chem 2002, 23 (2), 199–213. [DOI] [PubMed] [Google Scholar]
  • (36).Ercolessi F; Adams JB Interatomic Potentials from First-Principles Calculations: The Force-Matching Method. Europhys. Lett 1994, 26 (8), 583–588. [Google Scholar]
  • (37).Lindorff-Larsen K; Maragakis P; Piana S; Eastwood MP; Dror RO; Shaw DE Systematic Validation of Protein Force Fields against Experimental Data. PLoS One 2012, 7 (2), e32131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (38).Unan H; Yildirim A; Tekpinar M Opening Mechanism of Adenylate Kinase Can Vary according to Selected Molecular Dynamics Force Field. J Comput Aided Mol Des 2015, 29 (7), 655–665. [DOI] [PubMed] [Google Scholar]
  • (39).Piana S; Lindorff-Larsen K; Shaw DE How Robust Are Protein Folding Simulations with Respect to Force Field Parameterization? Biophys. J 2011, 100 (9), L47–L49. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (40).Kazmirski SL; Daggett V Non-Native Interactions in Protein Folding Intermediates: Molecular Dynamics Simulations of Hen Lysozyme. J. Mol. Biol 1998, 284 (3), 793–806. [DOI] [PubMed] [Google Scholar]
  • (41).Caves LS; Evanseck JD; Karplus M Locally Accessible Conformations of Proteins: Multiple Molecular Dynamics Simulations of Crambin. Protein Sci 1998, 7 (3), 649–666. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (42).Lindorff-Larsen K; Piana S; Palmo K; Maragakis P; Klepeis JL; Dror RO; Shaw DE Improved Side-Chain Torsion Potentials for the Amber ff99SB Protein Force Field. Proteins Struct. Funct. Bioinforma 2010, 78 (8), 1950–1958. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (43).Levitt M; Hirshberg M; Sharon R; Daggett V Potential Energy Function and Parameters for Simulations of the Molecular Dynamics of Proteins and Nucleic Acids in Solution. Comput. Phys. Commun 1995, 91 (1–3), 215–231. [Google Scholar]
  • (44).Huang J; MacKerell AD CHARMM36 All-Atom Additive Protein Force Field: Validation Based on Comparison to NMR Data. J Comput Chem 2013, 34 (25), 2135–2145. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (45).Pearlman DA; Case DA; Caldwell JW; Ross WS; Cheatham TE; DeBolt S; Ferguson D; Seibel G; Kollman P AMBER, a Package of Computer Programs for Applying Molecular Mechanics, Normal Mode Analysis, Molecular Dynamics and Free Energy Calculations to Simulate the Structural and Energetic Properties of Molecules. Comput. Phys. Commun 1995, 91 (1–3), 1–41. [Google Scholar]
  • (46).Case DA; Cheatham TE; Darden T; Gohlke H; Luo R; Merz KM; Onufriev A; Simmerling C; Wang B; Woods RJ The Amber Biomolecular Simulation Programs. J. Comput. Chem. 2005, 26 (16), 1668–1688. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (47).Salomon-Ferrer R; Case DA; Walker RC An Overview of the Amber Biomolecular Simulation Package. Wiley Interdiscip. Rev. Comput. Mol. Sc 2013, 3 (2), 198–210. [Google Scholar]
  • (48).Beck DA; McCully ME; Alonso DOV; Daggett V in Lucem Molecular Mechanics. 2000–2018. University of Washington: Seattle. [Google Scholar]
  • (49).Abraham MJ; Murtola T; Schulz R; Páll S; Smith JC; Hess B; Lindahl E GROMACS: High Performance Molecular Simulations through Multi-Level Parallelism from Laptops to Supercomputers. SoftwareX 2015, 19–25. [Google Scholar]
  • (50).Phillips JC; Braun R; Wang W; Gumbart J; Tajkhorshid E; Villa E; Chipot C; Skeel RD; Kalé L; Schulten K Scalable Molecular Dynamics with NAMD. J. Comput. Chem 2005, 26 (16), 1781–1802. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (51).de Massy B; Fayet O; Kogoma T Multiple Origin Usage for DNA Replication in sdrA(rnh) Mutants of Escherichia Coli K-12. J. Mol. Biol 1984, 178 (2), 227–236. [DOI] [PubMed] [Google Scholar]
  • (52).Kitani T; Yoda K; Ogawa T; Okazaki T Evidence That Discontinuous DNA Replication in Escherichia Coli Is Primed by Approximately 10 to 12 Residues of RNA Starting with a Purine. J. Mol. Biol 1985, 184 (1), 45–52. [DOI] [PubMed] [Google Scholar]
  • (53).Shimamoto T; Shimada M; Inouye M; Inouye S The Role of Ribonuclease H in Multicopy Single-Stranded DNA Synthesis in Retron-Ec73 and Retron-Ec107 of Escherichia Coli. J. Bacteriol 1995, 177 (1), 264–267. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (54).Shen Y; Koh KD; Weiss B; Storici F Mispaired rNMPs in DNA Are Mutagenic and Are Targets of Mismatch Repair and RNases H. Nat. Struct. Mol. Biol 2012, 19 (1), 98–104. [DOI] [PubMed] [Google Scholar]
  • (55).Kanaya S; Katsuda-Nakai C; Ikehara M Importance of the Positive Charge Cluster in Escherichia Coli Ribonuclease HI for the Effective Binding of the Substrate. J. Biol. Chem 1991, 266 (18), 11621–11627. [PubMed] [Google Scholar]
  • (56).Nakamura H; Oda Y; Iwai S; Inoue H; Ohtsuka E; Kanaya S; Kimura S; Katsuda C; Katayanagi K; Morikawa K How Does RNase H Recognize a DNA.RNA Hybrid? Proc. Natl. Acad. Sci. U. S. A 1991, 88 (24), 11535–11539. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (57).Oda Y; Iwai S; Ohtsuka E; Ishikawa M; Ikehara M; Nakamura H; Iwail S; Ohtsuka1 E; Ishikawa M; Ikehara M; et al. Binding of Nucleic Acids to E. Coli RNase HI Observed by NMR and CD Spectroscopy. Nucleic Acids Res 1993, 21 (20), 4690–4695. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (58).Katayanagi K; Miyagawa M; Matsushima M; Ishikawa M; Kanaya S; Nakamura H; Ikehara M; Matsuzaki T; Morikawa K Structural Details of Ribonuclease H from Escherichia Coli as Refined to an Atomic Resolution. J. Mol. Biol 1992, 223 (4), 1029–1052. [DOI] [PubMed] [Google Scholar]
  • (59).Clarke ND; Kissinger CR; Desjarlais J; Gilliland GL; Pabo CO Structural Studies of the Engrailed Homeodomain. Protein Sci 1994, 3 (10), 1779–1787. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (60).MacKerell AD; Bashford D; Bellott M; Dunbrack RL; Evanseck JD; Field MJ; Fischer S; Gao J; Guo H; Ha S; et al. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102 (18), 3586–3616. [DOI] [PubMed] [Google Scholar]
  • (61).Mackerell AD; Feig M; Brooks CL Extending the Treatment of Backbone Energetics in Protein Force Fields: Limitations of Gas-Phase Quantum Mechanics in Reproducing Protein Conformational Distributions in Molecular Dynamics Simulations. J. Comput. Chem 2004, 25 (11), 1400–1415. [DOI] [PubMed] [Google Scholar]
  • (62).Janowski PA; Liu C; Deckman J; Case DA Molecular Dynamics Simulation of Triclinic Lysozyme in a Crystal Lattice. Protein Sci 2016, 25 (1), 87–102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (63).Chaudhuri R; Tang S; Zhao G; Lu H; Case DA; Johnson ME Comparison of SARS and NL63 Papain-like Protease Binding Sites and Binding Site Dynamics: Inhibitor Design Implications. J. Mol. Biol 2011, 414 (2), 272–288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (64).Okal A; Cornillie S; Matissek SJ; Matissek KJ; Cheatham TE; Lim CS Re-Engineered p53 Chimera with Enhanced Homo-Oligomerization That Maintains Tumor Suppressor Activity. Mol. Pharm. 2014, 11 (7), 2442–2452. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (65).Beck DAC; Daggett V Methods for Molecular Dynamics Simulations of Protein Folding/unfolding in Solution. Methods 2004, 34 (1), 112–120. [DOI] [PubMed] [Google Scholar]
  • (66).Kimanius D; Pettersson I; Schluckebier G; Lindahl E; Andersson M SAXS-Guided Metadynamics. J. Chem. Theory Comput 2015, 11 (7), 3491–3498. [DOI] [PubMed] [Google Scholar]
  • (67).Bernardi RC; Cann I; Schulten K Molecular Dynamics Study of Enhanced Man5B Enzymatic Activity. Biotechnol. Biofuels 2014, 7 (1), 83. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (68).Horn HW; Swope WC; Pitera JW; Madura JD; Dick TJ; Hura GL; Head-Gordon T Development of an Improved Four-Site Water Model for Biomolecular Simulations: TIP4P-Ew. J. Chem. Phys 2004, 120 (20), 9665–9678. [DOI] [PubMed] [Google Scholar]
  • (69).Ryckaert J-P; Ciccotti G; Berendsen HJ Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of N-Alkanes. J. Comput. Phys 1977, 23 (3), 327–341. [Google Scholar]
  • (70).Miyamoto S; Kollman PA Settle: An Analytical Version of the SHAKE and RATTLE Algorithm for Rigid Water Models. J. Comput. Chem 1992, 13 (8), 952–962. [Google Scholar]
  • (71).Jorgensen WL; Chandrasekhar J; Madura JD; Impey RW; Klein ML Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys 1983, 79 (2), 926–935. [Google Scholar]
  • (72).Páll S; Hess B A Flexible Algorithm for Calculating Pair Interactions on SIMD Architectures. Comput. Phys. Commun 2013, 184 (12), 2641–2650. [Google Scholar]
  • (73).Hess B; Bekker H; Berendsen HJC; Fraaije JGEM LINCS: A Linear Constraint Solver for Molecular Simulations. J. Comput. Chem 1997, 18 (12), 1463–1472. [Google Scholar]
  • (74).Levitt M; Sharon R; Laidig KE; Daggett V Calibration and Testing of a Water Model for Simulation of the Molecular Dynamics of Proteins and Nucleic Acids in Solution. 1997, 5647 (96), 5051–5061. [Google Scholar]
  • (75).Kell GS Precise Representation of Volume Properties of Water at One Atmosphere. J. Chem. Eng. Data 1967, 12 (1), 66–69. [Google Scholar]
  • (76).Beck DAC; Armen RS; Daggett V Cutoff Size Need Not Strongly Influence Molecular Dynamics Results for Solvated Polypeptides. Biochemistry 2005, 44 (2), 609–616. [DOI] [PubMed] [Google Scholar]
  • (77).Haar L; Gallagher JS; Kell GS; National Standard Reference Data System (U.S.). NBS/NRC Steam Tables : Thermodynamic and Transport Properties and Computer Programs for Vapor and Liquid States of Water in SI Units; Washington, D.C. :, 1984. [Google Scholar]
  • (78).Willis BTM; Bertram TM; Pryor AW Thermal Vibrations in Crystallography; Cambridge University Press, 1975. [Google Scholar]
  • (79).Han B; Liu Y; Ginzinger SW; Wishart DS SHIFTX2: Significantly Improved Protein Chemical Shift Prediction. J. Biomol. NMR 2011, 50 (1), 43–57. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (80).Ulrich EL; Akutsu H; Doreleijers JF; Harano Y; Ioannidis YE; Lin J; Livny M; Mading S; Maziuk D; Miller Z; et al. BioMagResBank. Nucleic Acids Res 2008, 36 (Database issue), D402–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (81).Religa TL Comparison of Multiple Crystal Structures with NMR Data for Engrailed Homeodomain. J. Biomol. NMR 2008, 40 (3), 189–202. [DOI] [PubMed] [Google Scholar]
  • (82).Yamazaki T; Yoshida M; Kanaya S; Nakamura H; Nagayama K Assignments of Backbone 1H, 13C, and 15N Resonances and Secondary Structure of Ribonuclease H from Escherichia Coli by Heteronuclear Three-Dimensional NMR Spectroscopy. Biochemistry 1991, 30 (24), 6036–6047. [DOI] [PubMed] [Google Scholar]
  • (83).Zhang H; Neal S; Wishart D RefDB: a Database of Uniformly Referenced Protein Chemical Shifts. J Biomol NMR 2003, 25, 173–195. [DOI] [PubMed] [Google Scholar]
  • (84).Karplus M Vicinal Proton Coupling in Nuclear Magnetic Resonance. J. Am. Chem. Soc 1963, 85 (18), 2870–2871. [Google Scholar]
  • (85).Habeck M; Rieping W; Nilges M Bayesian Estimation of Karplus Parameters and Torsion Angles from Three-Bond Scalar Couplings Constants. J. Magn. Reson 2005, 177 (1), 160–165. [DOI] [PubMed] [Google Scholar]
  • (86).Ludvigsen S; Andersen K V; Poulsen, F. M. Accurate Measurements of Coupling Constants from Two-Dimensional Nuclear Magnetic Resonance Spectra of Proteins and Determination of Phi-Angles. J. Mol. Biol 1991, 217 (4), 731–736. [DOI] [PubMed] [Google Scholar]
  • (87).Schmidt JM; Blümel M; Löhr F; Rüterjans H Self-Consistent 3J Coupling Analysis for the Joint Calibration of Karplus Coefficients and Evaluation of Torsion Angles. J. Biomol. NMR 1999, 14 (1), 1–12. [DOI] [PubMed] [Google Scholar]
  • (88).Smith LJ; Sutcliffe MJ; Redfield C; Dobson CM Analysis of ϕ and χ1 Torsion Angles for Hen Lysozyme in Solution from Proton NMR Spin-Spin Coupling Constants. Biochemistry 1991, 30 (4), 986–996. [DOI] [PubMed] [Google Scholar]
  • (89).Vuister GW; Bax A Quantitative J Correlation: A New Approach for Measuring Homonuclear Three-Bond JHN,Hα Coupling Constants in 15N-Enriched Proteins. J. Am. Chem. Soc 1993, 115 (17), 7772–7777. [Google Scholar]
  • (90).Hu J-SH; Bax A Determination of φ and χ1 Angles in Proteins from 13C−13C Three-Bond J Couplings Measured by Three-Dimensional Heteronuclear NMR. How Planar Is the Peptide Bond? J. Am. Chem. Soc 1997, 119 (27), 6360–6368. [Google Scholar]
  • (91).Pardi A; Billeter M; Wüthrich K Calibration of the Angular Dependence of the Amide Proton-C Alpha Proton Coupling Constants, 3JHN Alpha, in a Globular Protein. Use of 3JHN Alpha for Identification of Helical Secondary Structure. J. Mol. Biol 1984, 180 (3), 741–751. [DOI] [PubMed] [Google Scholar]
  • (92).Stafford KA; Trbovic N; Butterwick JA; Abel R; Friesner RA; Palmer AG Conformational Preferences Underlying Reduced Activity of a Thermophilic Ribonuclease H. J. Mol. Biol 2015, 427 (4), 853–866. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (93).Wong K-B; Daggett V Barstar Has a Highly Dynamic Hydrophobic Core: Evidence from Molecular Dynamics Simulations and Nuclear Magnetic Resonance Relaxation Data. Biochemistry. 1998, 37 (32), 11182–11192. [DOI] [PubMed] [Google Scholar]
  • (94).Kazmirski S; Li A; Daggett V Analysis Methods for Comparison of Molecular Dynamics Trajectories: Applications to Protein Unfolding Pathways and Denatured Ensembles. J. Mol. Biol 1999, 290, 283–304. [DOI] [PubMed] [Google Scholar]
  • (95).Li A; Daggett V Characterization of the Transition State of Protein Unfolding by Use of Molecular Dynamics: Chymotrypsin Inhibitor 2. Proc. Natl. Acad. Sci. 1994, 91 (22), 10430–10434. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (96).Li A; Daggett V Identification and Characterization of the Unfolding Transition State of Chymotrypsin Inhibitor 2 by Molecular Dynamics Simulations. J. Mol. Biol. 1996, 257 (2), 412–429. [DOI] [PubMed] [Google Scholar]
  • (97).Li A; Daggett V Molecular Dynamics Simulation of the Unfolding of Barnase: Characterization of the Major Intermediate. J. Mol. Biol 1998, 275 (4), 677–694. [DOI] [PubMed] [Google Scholar]
  • (98).Fersht AR Protein Folding and Stability: The Pathway of Folding of Barnase. FEBS Lett. 1993, 325 (1–2), 5–16. [DOI] [PubMed] [Google Scholar]
  • (99).Fersht AR; Bycroft M; Horovitz A; Kellis JT; Matouschek A; Serrano L Pathway and Stability of Protein Folding. Philos. Trans. R. Soc. B Biol. Sci 1991, 332 (1263), 171–176. [DOI] [PubMed] [Google Scholar]
  • (100).Berman HM; Westbrook J; Feng Z; Gilliland G; Bhat TN; Weissig H; Shindyalov IN; Bourne PE The Protein Data Bank. Nucleic Acids Res 2000, 28 (1), 235–242. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (101).Petsko GGA Not Just Your Average Structures. Nat. Struct. Mol. Biol 1996, 3 (7), 565–566. [DOI] [PubMed] [Google Scholar]
  • (102).Stollar EJ; Mayor U; Lovell SC; Federici L; Freund SMV; Fersht AR; Luisi BF Crystal Structures of Engrailed Homeodomain Mutants. J. Biol. Chem 2003, 278 (44), 43699–43708. [DOI] [PubMed] [Google Scholar]
  • (103).Wagner G; Pardi A; Wuethrich K Hydrogen Bond Length and Proton NMR Chemical Shifts in Proteins. J. Am. Chem. Soc. 1983, 105 (18), 5948–5949. [Google Scholar]
  • (104).Vymětal J; Vondrášek J Critical Assessment of Current Force Fields. Short Peptide Test Case. J. Chem. Theory Comput 2013, 9 (1), 441–451. [DOI] [PubMed] [Google Scholar]
  • (105).Pantelopulos GA; Mukherjee S; Voelz VA Microsecond Simulations of Mdm2 and its Complex with p53 Yield Insight into Force Field Accuracy and Conformational Dynamics. Proteins. 2015, 83, 1665–1676. [DOI] [PubMed] [Google Scholar]
  • (106).Beauchamp KA; Lin Y -S.; Das, R.; Pande, V.S. Are Protein Force Fields Getting Better? A Systematic Benchmark on 524 Diverse NMR Measurements. Journal of Chemical Theory and Computation. 2012, 8, 1409–1414. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (107).Gianni S; Guydosh NR; Khan F; Caldas TD; Mayor U; White GWN; DeMarco ML; Daggett V; Fersht AR Unifying Features in Protein-Folding Mechanisms. Proc. Natl. Acad. Sci. U. S. A 2003, 100 (23), 13286–13291. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (108).Mayor U; Johnson CM; Daggett V; Fersht AR Protein Folding and Unfolding in Microseconds to Nanoseconds by Experiment and Simulation. Proc. Natl. Acad. Sci. U. S. A 2000, 97 (25), 13518–13522. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (109).Mayor U; Guydosh NR; Johnson CM; Grossmann JG; Sato S; Jas GS; Freund SMV; Alonso DOV; Daggett V; Fersht AR The Complete Folding Pathway of a Protein from Nanoseconds to Microseconds. Nature 2003, 421: 863–867. [DOI] [PubMed] [Google Scholar]
  • (110).Religa TL; Markson JS; Mayor U; Freund SMV; Fersht AR Solution Structure of a Protein Denatured State and Folding Intermediate. Nature 2005, 437 (7061), 1053–1056. [DOI] [PubMed] [Google Scholar]
  • (111).Debiec KT; Gronenborn AM; Chong LT Evaluating the Strength of Salt Bridges: A Comparison of Current Biomolecular Force Fields. J. Phys. Chem. B 2014, 118, 6561–6569. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (112).Vymetal J; Vondrasek J Critical Assessment of Current Force Fields. Short Peptide Test Case. J. Chem. Theor. Comp 2013, 9, 441–451. [DOI] [PubMed] [Google Scholar]
  • (113).Bowman GR Accurately Modeling Nanosecond Protein Dynamics Requires at Least Microseconds of Simulation. Journal of Computational Chemistry 2016, 37 (6), 558–566. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

SI1
SI2

RESOURCES