Abstract
Factors affecting the accuracy of molecular dynamics (MD) simulations are investigated by comparing generalized order parameters for backbone NH vectors of the B3 immunoglobulin-binding domain of streptococcal protein G (GB3) derived from simulations with values obtained from NMR spin relaxation (Yao L, Grishaev A, Cornilescu G, Bax A. J Am Chem Soc 2010; 132: 4295–309.). Choices for many parameters of the simulations, such as buffer volume, water model, or salt concentration, have only minor influences on the resulting order parameters. In contrast, seemingly minor conformational differences in starting structures, such as orientations of sidechain hydroxyl groups, resulting from applying different protonation algorithms to the same structure, have major effects on backbone dynamics. Some, but not all, of these effects are mitigated by increased sampling in simulations. Most discrepancies between simulated and experimental results occur for residues located at the ends of secondary structures and involve large amplitude nanosecond timescale transitions between distinct conformational substates. These transitions result in autocorrelation functions for bond vector reorientation that do not converge when calculated over individual simulation blocks, typically of length similar to the overall rotational diffusion time. A test for convergence before averaging the order parameters from different blocks results in better agreement between order parameters calculated from different sets of simulations and with NMR-derived order parameters. Thus, MD-derived order parameters are more strongly affected by transitions between conformational substates than by fluctuations within individual substates themselves, while conformational differences in the starting structures affect the frequency and scale of such transitions.
Keywords: protein dynamics, molecular dynamics simulation, B3 domain of protein G, NMR spin relaxation, generalized order parameter, starting structure, autocorrelation function, force field, hydrogen bond
Introduction
Their atomistic detail and high resolution in both space and time make molecular dynamics (MD) simulations an ideal tool for studies of the conformational dynamics of biological molecules, especially in synergy with experimental methods such as NMR. The insights obtained from such joint investigations are necessarily limited by deficiencies in simulation procedures that reduce quantitative agreement with experimental data. Despite efforts made over the last decade, discrepancies persist between different MD simulations of the same system and between MD simulations and NMR.1
The focus of the present work is to identify sources of inaccuracies in MD simulations, both in comparing multiple simulations to each other and in comparing simulations to NMR spin relaxation data. We use the B3 immunoglobulin-binding domain of streptococcal protein G (GB3), a 56 amino acid α/β protein, which has served as a common model system for protein dynamics. As in many other investigations,1–3 the square of the generalized order parameter, S² (henceforth simply called the order parameter) is used to describe the orientational conformational distribution of the backbone amide (NH) bond vectors. Order parameters can be derived experimentally from NMR spin relaxation rate constants or residual dipolar couplings (RDCs).4 A limitation of spin relaxation experiments is their insensitivity to conformational dynamics on timescales similar to or longer than overall rotational tumbling of molecules. In contrast, RDC-derived order parameters are sensitive to motions on a wider range of timescales up to milliseconds, but contributions from processes much slower than MD trajectory lengths are not captured by simulations.2,5
The present work compares three sets of simulations: set A is based on 14 trajectories previously reported,1 with additional trajectories that were produced herein from the same starting structures; set B consists of 16 trajectories recorded using starting structures generated independently from set A; and set C consists of a 1.2 microsecond simulation previously described in the literature.6 Comparisons between these trajectories demonstrate that multiple simulations using a single force field (AMBER ff99SB) can result in discrepancies between the simulated order parameters because the choice of starting structures influences subsequent sampling. Even sets of starting structures that are indistinguishable by backbone RMSD measures can yield notably different dynamical behavior for GB3. For example, such differences arise from a single tyrosine hydroxyl proton with two orientations, owing to different methods of protonation, that are unable to interconvert even during very long simulations. Other parameters of the simulation protocol, including box size or geometry, water model, salt content, or force cutoffs, have at most minor influences on the behavior of the system during simulations.
Sampling-related problems are reduced in longer simulations, but even microsecond simulations are still strongly dependent on the starting point on the conformational energy landscape. Many sampling-related discrepancies between simulations are consequences of nanosecond timescale motions, often related to sidechain rearrangements or breaking of hydrogen bonds (sometimes linked to water invasion), that lead to unconverged NH autocorrelation functions on the timescale of the analyzed simulation blocks, typically chosen to be of order of the rotational correlation time of GB3 for comparisons with NMR spin relaxation data. Applying a threshold to exclude simulation blocks whose autocorrelation functions fail to converge eliminates nearly all differences between simulations with different starting structures and yields order parameters that are in much better agreement with experimentally derived values.
Materials and Methods
MD simulations
Set A starting structures were derived from the 1.1 Å X-ray crystal structure (PDB code 1IGD)7 with the N-terminus altered as described previously1 to recapitulate the construct used for spin relaxation studies8,9 (Δ1–5, T6M, T7Q; mutations performed in PyMOL10). Sidechain conformations were further optimized using PLOP.11 Starting structures for the set B simulations were derived from the X-ray crystal structure (1IGD) in the same fashion, but independently from set A and without PLOP optimization.
All structures, except reruns of completely solvated systems of previously conducted simulations1 (set A), were prepared for simulation in Maestro12 with the Maestro Protein Preparation Wizard, which also added hydrogen atoms to the structures. Maestro was used to solvate the protein with TIP3P (or TIP4P if specified) water molecules13 in cubic boxes of 50, 54, 75 or 90 Å edges (all lengths ±1 Å). This corresponds to minimum buffer layer thicknesses of 1 nm (50 Å or 54 Å depending on orientation of the molecule in the box), 2 nm (75 Å) or 3 nm (90 Å). The system was either neutralized with two sodium ions or an additional 0.15 M NaCl was added as specified. AMBER ff99SB (or ff99SB-ILDN as specified) simulations were conducted with the Desmond MD software package (Academic Release 3)14. Particle-Mesh-Ewald periodic boundary conditions were used with a 9 or 12 Å cutoff for electrostatic interactions as specified. The integrator used time steps of 2 fs. Bonds to hydrogen atoms were constrained using the M-SHAKE algorithm.15 Energy minimization (convergence threshold 1 kcal/mol/Å) and a 1 ns NPT equilibration simulation at 297 K and 1 atm were conducted for each system. Starting structures for production runs were extracted at equally spaced time intervals from the second half of the equilibration runs. Production runs were conducted for 2.4 ns at constant volume and energy conditions (NVE). Coordinates were written out every 1 ps.
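As a quick consistency check on the production-run settings quoted above, the following Python sketch derives the number of integration steps and saved frames per trajectory. The class and field names are hypothetical and chosen only for illustration; they are not part of the Desmond configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionRun:
    """Production-run settings quoted in the text (field names are hypothetical)."""
    length_fs: int = 2_400_000     # 2.4 ns trajectory length
    timestep_fs: int = 2           # integrator time step
    save_interval_fs: int = 1_000  # coordinates written every 1 ps

    @property
    def n_steps(self) -> int:
        # Integer femtoseconds avoid floating-point rounding in the division.
        return self.length_fs // self.timestep_fs

    @property
    def n_frames(self) -> int:
        # Saved frames, not counting a possible initial frame at t = 0.
        return self.length_fs // self.save_interval_fs


run = ProductionRun()
```

Under these settings each production run corresponds to 1.2 million integration steps and 2400 saved frames.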
A summary of differences between the original set A and set B simulations can be found in Table I. Reruns of completely solvated set A simulations were processed with Ptraj from the AMBER9 suite16 and then prepared in Maestro.
Table I. Summary of the differences of the original set A and set B simulations.
Note that the N-terminal alterations were performed independently for set A and set B
| | Set A | Set B |
|---|---|---|
| Initial coordinates | 1IGD | 1IGD |
| Mutations | Δ1–5, T6M, T7Q | Δ1–5, T6M, T7Q |
| Protein preparation (e.g. addition of hydrogens) | Amber | Maestro |
| PLOP optimization11 | Yes | No |
| Water model | TIP4P13 | TIP3P13 |
| Force field | AMBER ff99SB20 | AMBER ff99SB |
| Salt | 2 sodium ions | 0.15 M NaCl |
| Solvent box | Orthorhombic, 1 nm minimal buffer layer | Cubic, 1 nm minimal buffer layer |
| Temperature | 297 K | 297 K |
| Length of simulations | 2.4 ns | 2.4 ns |
| Original number of trajectories | 14 | 16 |
Autocorrelation functions and order parameters
All trajectories were analyzed using VMD.17 The effects of overall tumbling during the simulation were removed by superposing the Cα atoms of each frame by RMSD fit to the first frame of the simulation. Orientational autocorrelation functions for the NH bond vectors were calculated as described previously.1 If no convergence threshold was applied, then order parameters were calculated as described previously.1,18 To judge convergence, the last value of the autocorrelation function (C(1200 ps)) for a given residue and simulation was compared to its mean value in the middle region (300 ps to 900 ps). If the absolute difference was within the threshold (set to 0.005), then the order parameter was set to the mean value of the autocorrelation function in the middle region (300 ps to 900 ps) and included in the averaging over simulation blocks. If the difference was larger than the defined threshold, then these data were excluded from order parameter averages. Simulated order parameters were scaled by ξ = (1.02/1.04)⁶ ≈ 0.89 for comparisons with spin relaxation derived data to account for zero point vibrational motions of the NH bond vectors.19
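The convergence test described above can be sketched in Python as follows. This is a minimal illustration, not the original VMD analysis: it assumes the autocorrelation function takes the common second-order Legendre form C(t) = ⟨P2(μ(τ)·μ(τ+t))⟩ (the text only refers to a previously described procedure), that frames are spaced 1 ps apart, and that the NH vectors come from frames already superposed on the first frame. The function names are hypothetical.

```python
import numpy as np

# Scaling for zero-point vibrational motion of the NH bond, about 0.89.
XI = (1.02 / 1.04) ** 6


def p2_autocorrelation(nh_vectors, max_lag):
    """C(t) = <P2(u(tau) . u(tau + t))>, averaged over all time origins tau.

    nh_vectors: array of shape (n_frames, 3); max_lag must be < n_frames.
    """
    u = np.asarray(nh_vectors, dtype=float)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        cos = np.sum(u[: len(u) - lag] * u[lag:], axis=1)
        acf[lag] = np.mean(1.5 * cos**2 - 0.5)
    return acf


def order_parameter_if_converged(acf, dt_ps=1.0,
                                 window_ps=(300.0, 900.0), threshold=0.005):
    """Return the plateau mean if the last ACF value agrees with it, else None."""
    t = np.arange(len(acf)) * dt_ps
    in_window = (t >= window_ps[0]) & (t <= window_ps[1])
    plateau = acf[in_window].mean()
    if abs(acf[-1] - plateau) > threshold:
        return None  # block excluded from the order parameter average
    return plateau
```

For a 2.4 ns block with 1 ps frame spacing, `p2_autocorrelation(vectors, 1200)` yields C(t) up to 1200 ps, and a retained value would be multiplied by `XI` before comparison with spin relaxation data.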
Results
The initial set B trajectories and the previously published set A trajectories were generated as described in the Methods section and summarized in Table I. The calculated order parameters for both sets were compared with experimental order parameters derived from NMR spin relaxation measurements (Fig. 1).9 As described previously,1 the MD-derived order parameters are underestimated compared to experimentally obtained order parameters, especially in the flexible loop regions and at the termini of the protein. In addition, the set B order parameters had major discrepancies with the set A simulations, mainly within the first two loops of the protein (Fig. 1). In loop 1, the main differences are for residues Gly9 and Gly14. These two residues also show large variances in the order parameters calculated for individual simulations within each set. In loop 2, the main differences are for residues Ala20 and Asp22. These two residues have rather large order parameters in set B that are closer to the experimental data.
Figure 1.
Order parameters for simulations using the set A (red) and set B (blue) starting structures. For comparison, the experimental values are shown as filled green circles9. The biggest discrepancies lie within loop 2 (Ala20 and Asp22), and to some extent at the N-terminus and loops 1 (Gly9) and 3 (residues 37 – 41). Error bars represent standard errors.
As described below, a series of additional simulations were performed to identify sources of differences between the set A and set B simulations and between the MD-derived order parameters and experimental values.
Influence of simulation parameters
Water model, box symmetry, and salt concentration were successively changed to eliminate differences between the simulation protocols used for set A and set B. Fourteen simulations of the set A starting structures using the TIP3P water model yielded order parameters that were indistinguishable from the original values obtained using TIP4P water and the discrepancies with set B simulations were unaltered (results not shown). Nine simulations of set B structures were performed in a 50 Å cubic box (1 nm minimal buffer layer, with the long axis of the protein along the diagonal of the cube) using only two sodium ions to neutralize net charge, to correspond to the original set A protocol (Fig. S1). In addition to expected minor differences in loop 1, the flexibility and variability of the N-terminus and Ala20 are slightly increased, similarly to the previously published results.1
Resolvating five of the set A starting structures according to the solvation protocol used for set B (TIP3P, 1 nm minimal buffer layer, 0.15 M NaCl, cubic box) produced order parameters in agreement with the original set A trajectories (Fig. S2), with a few non-converged low order parameters for Ala20, suggesting a larger transition in those trajectories, and a number of lower order parameters for Asp22. The trajectories having a low order parameter for Ala20 also have a low order parameter at the N-terminus; the correlation between these two sites will be discussed below. The dynamics of Asp22 appears uncorrelated to the movements of Ala20 or the N-terminus.
To study the influence of the water box size, the set B simulations were expanded to include seventeen trajectories for a 75 Å cubic box and five trajectories for a 90 Å cubic box, corresponding to minimal buffer layers of 2 nm and 3 nm, respectively. Order parameters were calculated for all trajectories and compared with the results for the 54 Å cubic box (Fig. S3).
Agreement between order parameters obtained from the three different box sizes is very good. Only two outliers (differences in order parameters > 0.05) were observed: Glycine 9 (difference in order parameters of about 0.09 between 1 nm and 3 nm buffer layer sets) and Glycine 14 (difference in order parameters of about 0.07 between 1 nm and 3 nm buffer layer sets). Two more minor outliers were observed: Glycine 38 and Glycine 41 (to a certain extent affecting the intervening residues as well). All those residues are also outliers in comparison with the experimental data. Note that these are all four of the glycines found in GB3 and that they all lie at loop hinges at the ends of secondary structures (Fig. S3). The peptide bond of Gly14 with Lys13, however, has such variability in the different structures that the β-sheet sometimes extends to Leu12. In the X-ray crystal structure, the sheet also extends to residue 12, but has a clearly visible kink at Gly14. These results suggest that glycine parametrization remains a weakness of current MD force fields.20,21 Because all glycines in GB3 flank secondary structure elements, the problem may arise specifically for glycines in these special positions.
To test whether simulations of larger box sizes were affected by the choice of the electrostatic cutoff, we reran one of the 2 nm simulations of set B, increasing the cutoff from 9 to 12 Å. The results were not significantly different (results not shown).
The parameters for the sidechains of isoleucine, leucine, aspartate, and asparagine in the AMBER ff99SB force field have been reoptimized,6 yielding a new force field termed AMBER ff99SB-ILDN. We conducted fourteen 2.4 ns simulations using the set A starting structures and twenty-six 2.4 ns simulations using the set B starting structures together with the new force field. All simulations were performed using a 50 Å cubic box with two sodium ions to neutralize the system (Fig. S4). Simulations of the set A starting structures with the ff99SB-ILDN force field yield somewhat larger order parameters for Ala20 and the N-terminus, but otherwise the discrepancies with the results from set B remain. Interestingly, flexibilities of loop 1 (residues 9 through 13) and to a minor extent loop 3 (residues 38 through 40) are more pronounced with the new force field for simulations using the set B starting structures. However, the increased flexibility in loop 1 may reflect sampling limitations, because reduced order parameters are not observed using a 1.2 μs trajectory, which was also conducted with the ff99SB-ILDN force field (vide infra).
Transitions in conformational space and sampling
The above investigations suggest that the starting structures are the main cause for the discrepancies in order parameters between the different sets of simulations. Backbone RMSD comparisons of starting structures from sets A and B to each other and to the X-ray crystal structure were less than 1.2 Å in all cases, with no apparent correlation between RMSD and discrepant order parameters (results not shown). One of the main features of the dynamic behavior of Gly9, Gly14, and Ala20 is their apparent bifurcated behavior in all the simulations. The order parameter is either high (corresponding to a converged autocorrelation function) or low (corresponding to an unconverged autocorrelation function) but seldom adopts intermediate values (compare Figures S3 and S7). This behavior indicates that GB3 undergoes conformational changes on nanosecond or longer timescales and simulated results are therefore strongly dependent on sampling in the nanosecond-length trajectories. Asp22 shows a wide spread of order parameters in the set A simulations, but is rigid in all simulations in set B (Figures S3 and S7). Gly9 and Gly14 are seen to undergo transitions in both sets of simulations. Ala20 exhibits large-scale transitions predominantly in the set A simulations, although these occur occasionally in the set B simulations. In general, the set A starting structures populate a region of the energy landscape for which these conformational transitions are more probable. Backbone RMSDs of each frame from the first frame of each trajectory show transition-like behavior for trajectories that exhibit low order parameters for Gly9 and Gly14 (results not shown). The same transition-like behavior is observed for a similar backbone RMSD analysis of loop 1 alone, but only to a minor extent for loop 2 (results not shown). Thus, the conformational dynamics for Asp22 are more variable than for Gly9, Gly14 or Ala20. 
In addition, the occasional transitions of Ala20 are not concerted with transitions of Asp22 and do not affect a large portion of loop 2, in contrast to the effects on loop 1, where a number of interactions are rearranged in concert. In other words, loop 1 seems to make transitions as a unit whereas loop 2 does not (vide infra).
Increased sampling with a 1.2 μs long trajectory
We analyzed a 1.2 μs trajectory of GB3 (kindly provided by D.E. Shaw Research)6 to assess the effects of greatly increasing the simulation time. Order parameters were calculated for 500 blocks of length 2.4 ns (set C), to allow comparison with the order parameters obtained from the sets A and B simulations. The results are shown in Figure 2. A number of residues have large variances in order parameters, namely both termini, loop 1 (residues 9–15), Ala20, Asp40 and Gly41 and to some extent Gly38, Val39 and Phe30. With the exception of Phe30, those residues are all in loops, termini or flank secondary structure elements.
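The block decomposition described above can be illustrated with a short Python sketch. It assumes one saved frame per picosecond for the long trajectory, which the text does not specify, and the function name is hypothetical.

```python
def block_slices(n_frames, frames_per_block):
    """Split a trajectory into consecutive, non-overlapping blocks.

    Trailing frames that do not fill a complete block are dropped.
    """
    n_blocks = n_frames // frames_per_block
    return [
        slice(i * frames_per_block, (i + 1) * frames_per_block)
        for i in range(n_blocks)
    ]


# Assumed frame spacing of 1 ps: 1.2 microseconds -> 1,200,000 frames,
# and a 2.4 ns block -> 2400 frames.
blocks = block_slices(n_frames=1_200_000, frames_per_block=2_400)
```

Each slice can then be used to index an array of NH vectors so that every block is analyzed exactly like one of the independent 2.4 ns trajectories of sets A and B.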
Figure 2.

A) Order parameters for all 500 2.4 ns blocks of the 1.2 μs trajectory (grey lines). The red line and dots indicate the average values; error bars were omitted for clarity. Ala20, Asp22 and Phe30 are marked with a blue, green and red triangle, respectively. Ala20 and Phe30 clearly show bifurcated behavior: the majority of the order parameters are high (corresponding to converged autocorrelation functions) with a number of very low order parameters, which do not affect the average over the large number of blocks.
B) Distribution of order parameters for Ala20, Asp22 and Phe30. Most order parameters are high, but Ala20 and Phe30 exhibit a small number of excursions to very low order parameters (corresponding to unconverged autocorrelation functions). Asp22 on the other hand does not show any outliers.
The results from the 1.2 μs trajectory show that the mean value of the order parameter for Ala20 is almost unaffected by infrequent fluctuations to conformational states with high local flexibility. Additionally, the low order parameters for Ala20 do not coincide with low order parameters for any of the other residues in loop 2. Thus, transitions to other subensembles appear to be sampled too frequently using the starting structures for set A. Phe30 also exhibits infrequent transitions to other conformational states, linked to low order parameters for those transitions. Examination of the trajectory shows that the transitions of Phe30, situated in the middle of the α-helix, are accompanied by a bending of the helix and other movements all along the length of the protein, most notably in the loops and termini (especially loop 1 and the C-terminus). This transition was not sampled in any of the 2.4 ns trajectories (sets A or B). Very strikingly, Asp22 remains constrained for the entire 1.2 μs trajectory.
Long range effects and the influence of sidechain conformations on local transitions
Conformational transitions of Ala20 are very strongly coupled to movements of the N-terminus, as shown by the correlation between order parameters for Gln2 and Ala20 over the 500 blocks derived from the 1.2 μs simulation (Figure 3). Two hydrogen bonds between the backbones of Ala20 and Met1 transiently break, allowing flipping out of the N-terminus and rearrangement of Ala20. A similar correlation is observed between residues in loop 1 and the C-terminus, as a result of fluctuating interactions between backbone and sidechains of residues 55 and 56 with the backbone and sidechains of residues 8 through 11. However, loop 1 also interacts with loop 3, leading to a lower correlation between the motions of loop 1 and the C-terminus.
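One simple way to quantify such a coupling between two residues is a correlation coefficient over the per-block order parameters. The sketch below uses the Pearson coefficient as an assumption; the text does not state which measure of correlation was used, and the function name is hypothetical.

```python
import numpy as np


def block_correlation(s2_residue_a, s2_residue_b):
    """Pearson correlation between per-block order parameters of two residues.

    Both inputs are 1D sequences with one value per simulation block.
    Returns NaN if either series is constant.
    """
    a = np.asarray(s2_residue_a, dtype=float)
    b = np.asarray(s2_residue_b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])
```

A value close to 1 would indicate that blocks with a low order parameter for one residue also tend to show a low order parameter for the other.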
Figure 3.
A) Order parameters for Gln2 (blue solid) and Ala20 (green dashed) are strongly correlated over the 1.2 μs trajectory. B) Ala20 flips when its hydrogen bonds with the N-terminus are broken. The left panel shows the native hydrogen-bonded state. The right panel shows the state with the flipped out N-terminus and broken hydrogen bonds.
Although the overall RMSDs from the crystal structure of the backbone of all the structures in sets A and B are similar, plots of the absolute differences for each Cα along the chain of the GB3 show that the starting structures from set A have a higher variability than structures from set B relative to the X-ray crystal structure (Figure 4A). Backbone RMSDs for set A and set B starting structures from the starting structure of the 1.2 μs simulation show a substantial deviation for set A at the Cα of Val21 in loop 2 as well as at the N-terminus (Fig. 4B). The NH of Asp22 is part of the peptide bond to Val21, suggesting a source of different simulated properties for sets A and B. The sidechain of Val21 has a different conformation in the set A starting structures than in the set B structures or the starting structure of the 1.2 μs trajectory. Additionally, the backbone around Val21 is slightly displaced in set A starting structures, which explains the high Cα deviation of Val21 (Fig. 4C). In the original X-ray crystal structure, the sidechain of Val21 is in the same conformation as in the set B starting structures, but the backbone around Val21 adopts an intermediate position between the two clusters of structures, explaining why the RMSDs of the Val21 Cα atoms from the PDB structure are similar for both sets A and B.
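A per-residue comparison of this kind can be sketched in Python as a least-squares superposition followed by per-atom distances. The example uses the Kabsch algorithm as an assumed stand-in for the fitting routine; the text does not name the algorithm, and the function names are hypothetical.

```python
import numpy as np


def superpose(mobile, reference):
    """Least-squares fit of `mobile` onto `reference` (Kabsch algorithm).

    Both arrays have shape (n_atoms, 3) with atoms in the same order.
    """
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    ref_center = reference.mean(axis=0)
    p = mobile - mobile.mean(axis=0)
    q = reference - ref_center
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + ref_center


def per_atom_deviation(mobile, reference):
    """Distance of each fitted atom from its reference position."""
    fitted = superpose(mobile, reference)
    return np.linalg.norm(fitted - np.asarray(reference, dtype=float), axis=1)
```

Applied to the Cα coordinates of a starting structure and a reference structure, `per_atom_deviation` returns one value per residue, so that local differences are not masked by a small global RMSD.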
Figure 4.

A) Cα RMSDs from the PDB structure averaged over all set A (red) and set B (blue) starting structures. No obvious differences in loop 2 are apparent. Error bars represent standard errors. B) Cα RMSDs from the starting structure of the 1.2 μs trajectory averaged over all set A (red) and set B (blue) starting structures. Clear differences are observed in the N-terminus and loop 2, most prominently at the Cα of Val21. Error bars represent standard errors. C) Position of the sidechain of Val21 in all set A (red spheres) and set B (blue spheres) starting structures. Ribbons represent the backbone of loop 2; sticks are two representative sidechain conformations of Val21 for each set of structures. In the case of set A, the backbone carbonyl of Val21 is pulled toward the hydroxyl group of Tyr3 (red sticks and spheres).
Differences in protonation influence backbone dynamics
As described previously,1 the carbonyl of Val21 forms a non-native hydrogen bond with the hydroxyl group of Tyr3 in set A simulations. Indeed, in the set A starting structures, this hydroxyl group is pointing towards loop 2 and Val21 (Figure 5A), poised to form the hydrogen bond. In set B and in the 1.2 μs trajectory, the hydroxyl group of Tyr3 is pointing away from loop 2. In this conformation the hydroxyl group is hydrogen bonded to a water molecule, and is not available to hydrogen bond with Val21. Although Val21 undergoes sidechain rearrangements along the 1.2 μs trajectory, the hydroxyl group of Tyr3 never rotates to point towards loop 2 and thus never forms a hydrogen bond with Val21 (Fig. 5B). The different orientations of the hydroxyl group arise when preparing the system for simulation by adding hydrogen atoms to the crystal structure. Simulations of the closely related protein GB1 also show increased flexibility of Asp22 and the Tyr3 hydroxyl group is oriented towards Val21 (unpublished results).
Figure 5.
A) The hydroxyl group of Tyr3 points towards the backbone carbonyl of Val21 in all of the set A (red) but away from it in all of the set B (blue) starting structures. B) The hydroxyl group of Tyr3 points away from the backbone carbonyl of Val21 throughout the 1.2 μs trajectory, because it is relatively tightly packed in a hydrophobic environment. The same is true for Tyr45, which does not flip during the 1.2 μs trajectory. Tyr33, which is more exposed than the other two, undergoes occasional complete ring flips along the trajectory.
To test whether the orientation of the hydroxyl group of Tyr3 is sufficient to bifurcate the dynamical behavior during the trajectory, we rotated the hydroxyl group of Tyr3 by approximately 180° to point towards Val21 in a starting structure derived from the set B simulations (which did not exhibit low order parameters for Asp22) and manually rotated the hydroxyl group of Tyr3 to point away from Val21 towards the solvent for a representative set A starting structure. Eight 2.4 ns NVE simulations were run for each system. Figure 6 shows that the behavior is indeed interchanged, demonstrating that the position of the hydroxyl group of Tyr3 at the beginning of the simulation is sufficient to determine the dynamic behavior of Val21/Asp22 during the simulation.
Figure 6.

After flipping the hydroxyl group of Tyr3 away from Val21 in the set A starting structures (A) or towards Val21 in the set B starting structures (B), the behaviors are reversed. Asp22 now undergoes conformational transitions in the set B but not the set A starting structures, showing that the orientation of the hydroxyl group of Tyr3 is sufficient to determine the dynamical properties of Asp22.
To examine whether the hydroxyl orientation of the set B starting structures (hydroxyl pointing away from loop 2) was so strongly favored that no flip would occur in even a 1.2 μs trajectory, we also ran a 1.2 μs simulation with the same structure used to initiate trajectories shown in Fig. 6B (set B starting structure with hydroxyl flipped to correspond to set A orientation). Figure S5 shows that the χ1 and χ2 dihedral angles do not change for Tyr3 throughout both 1.2 μs trajectories, independently of the starting structure. On the other hand, the dihedral angle between the hydroxyl group and the ring changes by 180° at a time point 663 ns into the simulation. Figure S6 shows how clearly the hydroxyl flip separates the trajectory into set A like (before the flip) and set B like (after the flip) behavior. A single event does not allow an estimation of the flip rate or populations of the two orientations, but the energy barrier is not too high to be overcome in trajectories on the μs-ms time scale. For all simulation lengths that can easily be achieved with the currently available computational power, the problem remains: the flip rate of the hydroxyl group (much less the entire ring) is too low to equilibrate within nano- or microseconds. The difference in protonation at the beginning still strongly influences the dynamical behavior of the whole trajectory (see Fig. S6).
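Monitoring a hydroxyl flip of this kind amounts to tracking a dihedral angle along the trajectory. The Python sketch below computes a dihedral from four atomic positions and reports the first frame at which the angle has moved away from its initial value. The choice of the four atoms (for tyrosine, for example CE1, CZ, OH and HH) and the 90° detection threshold are assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def first_flip_index(angles_deg, min_change_deg=90.0):
    """Index of the first frame whose angle differs from frame 0 by more than
    min_change_deg (periodicity taken into account), or None if none does."""
    a = np.asarray(angles_deg, dtype=float)
    delta = np.abs((a - a[0] + 180.0) % 360.0 - 180.0)
    hits = np.nonzero(delta > min_change_deg)[0]
    return int(hits[0]) if hits.size else None
```

Multiplying the returned index by the frame spacing gives the time of the first flip. A real analysis would probably also require the new orientation to persist for some time, to separate a true flip from a transient fluctuation.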
Timescales and understanding the discrepancies
Many of the discrepancies between the different simulations and with the experimental data occur for residues that undergo infrequent large transitions in conformational space, leading to the bifurcated behavior described above (see Fig. S3 and S7). Many of the trajectories that exhibit exceptionally low order parameters for certain residues also have unconverged autocorrelation functions for those residues (Fig. S7). Most of those transitions seem to be infrequent enough not to affect the mean order parameters for a very long simulation or a large sample of short simulations that cover many substates in conformational space. These large infrequent transitions might not contribute to NMR spin relaxation rate constants either because signal averaging over the large number of molecules in solution makes them invisible or because the timescales of these motions are beyond those that can be captured by relaxation studies. Therefore, we decided to add a convergence criterion to the autocorrelation functions when averaging over many simulations, as described in the Materials and Methods section. Figure 7 shows that including a test of convergence for the autocorrelation function increases the order parameters of the main outliers without affecting the remaining residues. This improves the agreement of the different simulations with each other and with NMR-derived order parameters.
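The effect of the convergence criterion on the averages can be illustrated with a small Python sketch. It takes per-block plateau values and a boolean convergence mask as inputs and averages over retained blocks only. Treating the unfiltered result as a plain mean over all blocks is a simplifying assumption (the unfiltered order parameters in the text follow a previously described procedure), and the function name is hypothetical.

```python
import numpy as np


def mean_order_parameters(plateaus, converged=None):
    """Average per-block order parameters for each residue.

    plateaus:  array of shape (n_blocks, n_residues).
    converged: optional boolean array of the same shape; False entries are
               excluded from the average for that residue only.
    Returns (mean per residue, number of blocks retained per residue).
    Residues with no retained block get NaN.
    """
    s2 = np.asarray(plateaus, dtype=float)
    if converged is None:
        keep = np.ones(s2.shape, dtype=bool)
    else:
        keep = np.asarray(converged, dtype=bool)
    counts = keep.sum(axis=0)
    sums = np.where(keep, s2, 0.0).sum(axis=0)
    mean = np.full(s2.shape[1], np.nan)
    np.divide(sums, counts, out=mean, where=counts > 0)
    return mean, counts
```

With three blocks in which one residue has values 0.80, 0.20 and 0.80 and the middle block fails the test, the unfiltered mean is 0.60 while the filtered mean is 0.80. Residues whose blocks all pass are unaffected.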
Figure 7.

A) Simulated order parameters for set A (red) and set B (blue) starting structures are shown in comparison to experimentally determined values (green dots). Discrepancies are indicated with black triangles. B) After setting a convergence threshold for the autocorrelation function all the marked regions become more rigid and now agree much better between the two sets of starting structures as well as between simulations and experiment.
Discussion
An extensive systematic set of simulations of GB3 demonstrates that the choice of the starting structures is more important for the accuracy of the resulting backbone NH order parameters than many other variables, including box size or geometry, water model, salt content, force field or electrostatic cutoff. This result is in agreement with earlier studies on the subject.22,23 Additionally, many of the primary outliers of MD simulations in comparison with experimental data also are outliers when comparing the results of different MD simulations, which suggests that sampling is one of the main limitations when comparing NMR- and MD-derived order parameters. A solution state NMR experiment is per se averaging over a large number of molecules in many different states with many different transient and local behaviors. In contrast, each MD simulation considers a single molecule and current computational limitations cannot guarantee ergodicity.
Significant improvements have been made to AMBER and CHARMM force fields in recent years.6,20,24,25 Many of those corrections focus on backbone torsion potentials and have led to improved agreement of simulations with experimental data.1,26 Herein we showed that sidechain conformations in starting structures strongly influence backbone order parameters derived from MD simulations and that including recently improved sidechain torsion potentials6 can shift the simulated populations to a more native area of conformational space independently of the starting structure.
The results presented here show that outliers are mostly flexible residues often lying in loops or at the termini of the protein. Those outliers undergo large transitions in conformational space more frequently than other residues. This leads to a bifurcated distribution of their order parameters, reflecting converged and unconverged autocorrelation functions on the timescale of the trajectory or the simulation block used for calculation of the order parameters. The starting structure dictates where the system starts to explore conformational space and consequently how representative sampling will be for the native behavior of the protein. Some transitions were sampled preferentially for one set of starting structures, almost independently of the choice of the parameters used to set up or simulate the system.
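The bifurcation can be rationalized with the textbook two-site jump model, which is not an analysis taken from this study. As a sketch, assume an NH vector hops between two orientations separated by an angle θ, with fractional populations p_1 and p_2 within a given block, and neglect fluctuations within each substate (the symbols p_1, p_2 and θ are introduced here for illustration only):

```latex
S^2_{\mathrm{jump}} = \sum_{i,j=1}^{2} p_i \, p_j \, P_2(\cos\theta_{ij})
    = 1 - 3\, p_1 p_2 \sin^2\theta ,
\qquad P_2(x) = \tfrac{1}{2}\left(3x^2 - 1\right)
```

A block that never leaves one substate has p_2 = 0 and S²_jump = 1, whereas a block that splits its time equally between substates 60° apart gives 1 − 3(0.5)(0.5)(0.75) ≈ 0.44. Under this simplified model, whether a block samples the transition at all, rather than the fluctuations within a substate, decides which branch of the distribution it falls into.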
Increased simulation lengths allow better sampling, but even for very long simulations the dependence on the starting structure can be strong. The case of Asp22 illustrates this very well. The position of the hydroxyl group of Tyr3 at the beginning of the simulation is sufficient to confine the protein to one part of conformational space for 2.4 ns as well as for 1.2 μs simulations. Earlier studies on the prediction of hydrogen positions have found the accurate prediction of hydroxyl hydrogen positions to be particularly difficult.27,28
Earlier publications have addressed the sampling problem with methods such as accelerated MD (AMD), high temperature MD, replica exchange MD and a number of other approaches.29–35 One of these studies used AMD to generate starting structures for regular MD simulations of GB3.29 This approach was not able to produce all of the motions observed in our MD simulations (especially motions in loop 2). On the other hand, any approach that involves the use of a variety of starting structures might mean oversampling experimentally insignificant parts of conformational space.
Here we presented an alternative solution to this problem. The main order parameter outliers between different simulations exhibit bifurcated behavior related to the convergence of the respective bond vector orientational autocorrelation functions. The agreement between the different sets of starting structures, as well as between simulations and experimentally derived order parameters, is improved by excluding simulations that fail a test for convergence of the autocorrelation function of any residue from the averaging of the order parameters for that residue. The success of that strategy might indicate that the unconverged movements are too rare to be seen experimentally in the bulk of molecules, occur on timescales inaccessible to NMR spin relaxation experiments, or are erroneously sampled by the simulation and do not occur in the real protein in solution. Thus, observing a motion in an MD simulation but not in a specific NMR experiment does not necessarily mean that the force field is erroneous: the motion simply may not be visible with the specific experimental method. The difference of order parameters resulting from different magnetic field strengths, different chemical shift anisotropies or methods used for deriving the order parameters has been shown to exceed 0.1 for some residues of GB3, which is similar to differences between simulations in different force fields and between simulations and experimental data.1,8,9 More recently, Yao et al. have used site-specific 15N chemical shift anisotropy tensors to improve the calculation of order parameters from NMR experiments.9 Indeed, some areas of the protein, especially the α-helix, now show much better agreement between experimental and MD-derived order parameters (Fig. S8).8,9 This again illustrates that improvements in experimental methods and interpretation of experimental results also are critical for assessing necessary improvements in simulation methods.
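The per-residue exclusion described above can be sketched as a masked average. This is a minimal illustration, assuming the block order parameters and the outcome of the convergence test are already available as arrays; the function name `average_converged` and the numbers in the example are invented for illustration and are not results from this study.

```python
import numpy as np

def average_converged(s2_blocks, converged):
    """Per-residue mean of S2 over blocks, skipping unconverged entries.

    s2_blocks: (n_blocks, n_residues) order parameters, one row per block.
    converged: same-shape booleans, True where the autocorrelation function
               of that residue passed the convergence test in that block.
    Residues without any converged block are returned as NaN.
    """
    s2_blocks = np.asarray(s2_blocks, dtype=float)
    converged = np.asarray(converged, dtype=bool)
    counts = converged.sum(axis=0)
    totals = np.where(converged, s2_blocks, 0.0).sum(axis=0)
    mean = np.full(s2_blocks.shape[1], np.nan)
    np.divide(totals, counts, out=mean, where=counts > 0)
    return mean

# Invented example: three blocks, two residues. The second residue
# undergoes a rare transition in the second block only.
s2 = [[0.85, 0.80],
      [0.84, 0.35],
      [0.86, 0.78]]
ok = [[True, True],
      [True, False],
      [True, True]]

filtered = average_converged(s2, ok)
unfiltered = np.mean(s2, axis=0)
print(filtered, unfiltered)
```

Only the affected residue changes: the outlier block is dropped for the second residue, while the first residue is averaged over all three blocks, mirroring the behavior reported for Figure 7.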
Conclusion
In the present study, we used MD simulations of GB3, a common model system for protein dynamics, for a detailed investigation of discrepancies between different sets of simulations and between simulations and NMR spin relaxation experiments. We compared the square of the generalized order parameter of the backbone NH bond vector derived from MD simulations and NMR spin relaxation measurements. Major discrepancies between different sets of simulations are due mostly to flexible regions of the protein undergoing nanosecond timescale motions corresponding to transitions between subensembles in conformational space. The four glycines of GB3, all situated at the end of secondary structures at the loop hinges, consistently are outliers in the different simulations.
Nanosecond timescale transitions involve movements of flexible regions of the protein, such as the loops and termini, and are often coupled to the breaking or forming of hydrogen bonds. The autocorrelation function of the involved NH bond vector does not converge for simulation trajectories that are not much longer than the timescale of these transitions. Thus, the effect on the average order parameters is significant for numbers of simulations that can be run on current commodity computer clusters, making sampling a predominant determinant for the agreement of different simulations with each other and with order parameters obtained by NMR spin relaxation experiments. Improved agreement with experiment for order parameters averaged over the 1.2 microsecond trajectory supports this conclusion. However, not all discrepancies observed between MD simulations are resolved by increased sampling. The example of the hydroxyl group of Tyr3 illustrates how strong the dependence on the starting structure can be, even for very long simulations. It also highlights the need for care in preparing the structures for simulation, especially when adding hydrogen atoms to X-ray crystal structures, which usually lack them.
The differences between starting structures do not have to be large or obvious to have a noticeable influence on the dynamical behavior of the protein. Earlier studies focused on starting structures with different backbone conformations or starting structures derived from different crystal structures.22,23 Here we showed that seemingly small conformational differences in sidechains resulting from different setup protocols applied to the same crystal structure can influence the backbone order parameters for sites distant in sequence or space and therefore seemingly uncorrelated at first glance. Furthermore, we identified specific molecular interactions responsible for the altered conformational dynamics for different starting structures of GB3.
In summary, this study emphasizes the importance of both increased sampling and good choices of starting structures in MD simulations. Eliminating nanosecond timescale motions when averaging order parameters over all simulations increases agreement between simulations and experiment. However, force field or sampling limitations might not be the only issues in accurately characterizing nanosecond or slower motions, because NMR spin relaxation techniques are largely insensitive to motions in this time regime. Thus, a full understanding of the processes that can be captured by NMR measurements is necessary when judging the accuracy of MD simulations. Identifying and understanding the discrepancies and aberrations between simulations and NMR can provide insights that help develop better force fields or NMR experiments and improve their interpretation.
Supplementary Material
Acknowledgments
We thank D.E. Shaw Research for kindly providing data and discussion. We gratefully acknowledge support from NIH grants GM 50291 (A.G.P.) and GM 40526 (R.A.F.) and an NSF Graduate Research Fellowship (K.A.S.).
References
- 1. Trbovic N, Kim B, Friesner RA, Palmer AG 3rd. Structural analysis of protein dynamics by MD simulations and NMR spin-relaxation. Proteins. 2008;71(2):684–94. doi: 10.1002/prot.21750.
- 2. Lipari G, Szabo A. Model-free approach to the interpretation of nuclear magnetic resonance relaxation in macromolecules. I. Theory and range of validity. J Am Chem Soc. 1982;104:4546–4559.
- 3. Jarymowycz VA, Stone MJ. Fast time scale dynamics of protein backbones: NMR relaxation methods, applications, and functional consequences. Chem Rev. 2006;106(5):1624–71. doi: 10.1021/cr040421p.
- 4. Palmer AG. NMR characterization of the dynamics of biomacromolecules. Chem Rev. 2004 Aug;104(8):3623–40. doi: 10.1021/cr030413t.
- 5. Meiler J, Prompers JJ, Peti W, Griesinger C, Bruschweiler R. Model-free approach to the dynamic interpretation of residual dipolar couplings in globular proteins. J Am Chem Soc. 2001;123(25):6098–107. doi: 10.1021/ja010002z.
- 6. Lindorff-Larsen K, Piana S, Palmo K, Maragakis P, Klepeis JL, Dror RO, Shaw DE. Improved side-chain torsion potentials for the Amber ff99sb protein force field. Proteins. 2010;78(8):1950–8. doi: 10.1002/prot.22711.
- 7. Derrick JP, Wigley DB. The third IgG-binding domain from streptococcal protein G. An analysis by X-ray crystallography of the structure alone and in a complex with Fab. J Mol Biol. 1994;243(5):906–18. doi: 10.1006/jmbi.1994.1691.
- 8. Hall JB, Fushman D. Variability of the 15N chemical shielding tensors in the B3 domain of protein G from 15N relaxation measurements at several fields. Implications for backbone order parameters. J Am Chem Soc. 2006;128(24):7855–70. doi: 10.1021/ja060406x.
- 9. Yao L, Grishaev A, Cornilescu G, Bax A. Site-specific backbone amide (15)N chemical shift anisotropy tensors in a small protein from liquid crystal and cross-correlated relaxation measurements. J Am Chem Soc. 2010;132(12):4295–309. doi: 10.1021/ja910186u.
- 10. The PyMOL Molecular Graphics System. Schrödinger, LLC.
- 11. Jacobson MP, Kaminski GA, Friesner RA, Rapp CS. Force field validation using protein side chain prediction. J Phys Chem B. 2002;106:11673–11680.
- 12. Maestro. Schrödinger, LLC; New York, NY: 2009.
- 13. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. Comparison of simple potential functions for simulating liquid water. J Chem Phys. 1983;79:926–935.
- 14. Bowers KJ, Chow E, Xu H, Dror RO, Eastwood MP, Gregersen BA, Klepeis JL, Kolossvary I, Moraes MA, Sacerdoti FD, Salmon JK, Shan Y, Shaw DE. Scalable Algorithms for Molecular Dynamics Simulations on Commodity Clusters. D. E. Shaw Research, LLC; New York, NY 10036, USA: 2006.
- 15. Kräutler V, van Gunsteren WF, Hünenberger PH. A fast SHAKE algorithm to solve distance constraint equations for small molecules in molecular dynamics simulations. J Comput Chem. 2001;22(5):501–8.
- 16. Case DA, Darden TA, Cheatham TE 3rd, Simmerling CL, Wang J, Duke RE, Luo R, Merz KM, Pearlman DA, Crowley M, Walker RC, Zhang W, Wang B, Hayik S, Roitberg A, Seabra G, Wong KF, Paesani F, Wu X, Brozell S, Tsui V, Gohlke H, Yang L, Tan C, Mongan J, Hornak V, Cui G, Beroza P, Mathews DH, Schafmeister C, Ross WS, Kollman PA. AMBER. Vol. 9. University of California; San Francisco: 2006.
- 17. Humphrey W, Dalke A, Schulten K. VMD-visual molecular dynamics. J Mol Graph. 1996;14:33–38. doi: 10.1016/0263-7855(96)00018-5.
- 18. Chandrasekhar I, Clore GM, Szabo A, Gronenborn AM, Brooks BR. A 500 ps molecular dynamics simulation study of interleukin-1 beta in water. Correlation with nuclear magnetic resonance spectroscopy and crystallography. J Mol Biol. 1992;226:239–250. doi: 10.1016/0022-2836(92)90136-8.
- 19. Case DA. Calculations of NMR dipolar coupling strengths in model peptides. J Biomol NMR. 1999;15(2):95–102. doi: 10.1023/a:1008349812613.
- 20. Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, Simmerling C. Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins. 2006;65:712–725. doi: 10.1002/prot.21123.
- 21. Kührová P, De Simone A, Otyepka M, Best RB. Force-field dependence of chignolin folding and misfolding: comparison with experiment and redesign. Biophys J. 2012;102(8):1897–906. doi: 10.1016/j.bpj.2012.03.024.
- 22. Genheden S, Diehl C, Akke M, Ryde U. Starting-Condition Dependence of Order Parameters Derived from Molecular Dynamics Simulations. J Chem Theory Comput. 2010;6(7):2176–90. doi: 10.1021/ct900696z.
- 23. Koller AN, Schwalbe H, Gohlke H. Starting structure dependence of NMR order parameters derived from MD simulations: implications for judging force-field quality. Biophys J. 2008;95(1):L04–6. doi: 10.1529/biophysj.108.132811.
- 24. Buck M, Bouguet-Bonnet S, Pastor RW, MacKerell AD Jr. Importance of the CMAP correction to the CHARMM22 protein force field: dynamics of hen lysozyme. Biophys J. 2006;90:L36–L38. doi: 10.1529/biophysj.105.078154.
- 25. MacKerell AD Jr, Feig M, Brooks CL 3rd. Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. J Comput Chem. 2004;25(11):1400–15. doi: 10.1002/jcc.20065.
- 26. Best RB, Buchete NV, Hummer G. Are current molecular dynamics force fields too helical? Biophys J. 2008;95(1):L07–9. doi: 10.1529/biophysj.108.132696.
- 27. Forrest LR, Honig B. An assessment of the accuracy of methods for predicting hydrogen positions in protein structures. Proteins. 2005;61(2):296–309. doi: 10.1002/prot.20601.
- 28. Word JM, Lovell SC, LaBean TH, Taylor HC, Zalis ME, Presley BK, Richardson JS, Richardson DC. Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms. J Mol Biol. 1999;285(4):1711–33. doi: 10.1006/jmbi.1998.2400.
- 29. Markwick PR, Bouvignies G, Blackledge M. Exploring multiple timescale motions in protein GB3 using accelerated molecular dynamics and NMR spectroscopy. J Am Chem Soc. 2007;129(15):4724–30. doi: 10.1021/ja0687668.
- 30. Liwo A, Czaplewski C, Ołdziej S, Scheraga HA. Computational techniques for efficient conformational sampling of proteins. Curr Opin Struct Biol. 2008;18(2):134–9. doi: 10.1016/j.sbi.2007.12.001.
- 31. Okamoto Y. Generalized-ensemble algorithms: enhanced sampling techniques for Monte Carlo and molecular dynamics simulations. J Mol Graph Model. 2004;22(5):425–39. doi: 10.1016/j.jmgm.2003.12.009.
- 32. Mitsutake A, Sugita Y, Okamoto Y. Generalized-ensemble algorithms for molecular simulations of biopolymers. Biopolymers. 2001;60(2):96–123. doi: 10.1002/1097-0282(2001)60:2<96::AID-BIP1007>3.0.CO;2-F.
- 33. Chou KC, Carlacci L. Simulated annealing approach to the study of protein structures. Protein Eng. 1991;4(6):661–7. doi: 10.1093/protein/4.6.661.
- 34. Elber R. Long-timescale simulation methods. Curr Opin Struct Biol. 2005 Apr;15(2):151–6. doi: 10.1016/j.sbi.2005.02.004.
- 35. Lei H, Duan Y. Improved sampling methods for molecular simulation. Curr Opin Struct Biol. 2007 Apr;17(2):187–91. doi: 10.1016/j.sbi.2007.03.003.