Abstract

Protein folding is a fascinating and not yet fully understood phenomenon in biology. Molecular dynamics (MD) simulations are an invaluable tool to study conformational changes in atomistic detail, including the folding and unfolding of proteins. However, the accuracy of the conformational ensembles derived from MD simulations inevitably depends on the quality of the underlying force field in combination with the respective water model. Here, we investigate folding, unfolding, and misfolding of fast-folding proteins using two force fields with their recommended water models, i.e., ff14SB with the TIP3P model and ff19SB with the OPC model. To this end, we generated long conventional MD simulations that highlight the strengths and pitfalls of these setups. Using Markov state models, we defined kinetically independent conformational substates and characterized their distinct features as well as their state probabilities. Surprisingly, we found substantial differences in the thermodynamics and kinetics of protein folding depending on the combination of protein force field and water model, originating primarily from the different water models. These results emphasize the importance of carefully choosing the force field and the water model, as together they determine the accuracy of the observed folding dynamics. Thus, the findings support the hypothesis that the water model is at least as important as the force field and hence needs to be considered in future studies of protein dynamics and folding in all areas of biophysics.
Introduction
Proteins are essential macromolecules throughout all kingdoms of life. Owing to their high variability in sequence, they can fold into a panoply of different structures and fulfill diverse roles and functions, e.g., in metabolism,1,2 signal transduction,3,4 immunity,5,6 muscle contraction,7,8 or drug delivery.9,10 Protein-based therapeutics, also known as biologics, are an increasingly important class of drugs, as highlighted in various previous studies.11−14 Hence, understanding protein folding, unfolding, and misfolding is an important ongoing field of research.15−17 It is well-known that most proteins adopt a distinct fold that is evolutionarily well conserved, even more so than their sequence.18 However, it still remains elusive why a specific sequence adopts a certain fold.16,19 The accurate prediction of protein folding and unfolding dynamics is crucial for numerous fields of research, including drug design,20,21 enzyme engineering,22 and understanding the molecular basis of diseases.23,24 After all, certain conformational rearrangements, such as misfolding, are thought to be involved in numerous diseases, such as Alzheimer’s disease, Huntington’s disease, Parkinson’s disease, and type 2 diabetes.25−28 Additionally, refolding events resulting from point mutations have been shown to underlie changes in specificity or even polyreactivity, providing new applications in drug design and engineering.29−32
In recent years, fast-folding proteins have been the focus of various studies.33−35 These proteins provide an excellent basis for studying the principles of protein folding, since their folding events can be captured on the microsecond to millisecond time scale. Furthermore, they exhibit simplified folding dynamics and therefore offer insights into the fundamental interactions and energy landscapes that drive the folding process.
In this study, we focus on the fast-folding protein Chignolin36 and its variant CLN02537 (Figure 1). Both are short peptides of ten amino acids that adopt a β-hairpin as their native fold. They differ only in their terminal residues: the two terminal glycines of Chignolin are replaced by tyrosines in CLN025, which has been reported to increase the stability of the folded state.37,38
Figure 1.

Schematic representation of the investigated systems: Chignolin (PDB 1UAO)37 is shown on the left and CLN025 (PDB 5AWL)39 on the right. The mutated residues in CLN025 (Tyr1 and Tyr10) are highlighted in black. The middle panel shows an overlay of both experimentally determined structures, which were used as starting structures for our simulations.
Besides experimental techniques such as NMR, fluorescence measurements, cDNA display proteolysis, and many more,40−43 molecular dynamics (MD) simulations are an invaluable complement for characterizing folding processes at the atomistic level. However, a simulation is only as good as its input parameters. This fact drives the ongoing development (and improvement) of force fields, water models, and other parameters. The two AMBER44 force fields ff14SB45 and ff19SB46 have shown good agreement with experimental data in protein studies and are therefore, among many other popular force fields, extensively applied in MD simulations.47 While the ff14SB force field includes small empirical adjustments based on NMR data via parametrization with TIP3P,48 ff19SB is solely physics-based and therefore does not depend on a distinct water model. However, its developers recommend using OPC49 (2014), as it reproduces water properties more accurately than older models such as TIP3P (1981). Although the AMBER44 force fields are widely used, other options should also be considered; CHARMM,50,51 GROMOS,52 and OPLS53 in particular are noteworthy alternatives, as reflected in their increasing use. Beyond conventional force fields, polarizable force fields may improve simulation accuracy, but at a significantly increased computational cost.35,54 Here, we only consider non-polarizable force fields.
The water model has a large impact on the simulated protein properties. The TIP3P (transferable intermolecular potential 3-point) water model, developed by Jorgensen et al., approximates water using a three-site representation.48 It has been widely used in protein simulations owing to its simplicity and computational efficiency compared with other 3-point (SPC,55,56 SPC/E,57 OPC3,58 etc.) or 4-point water models (TIP4P,59 TIP4Pew,60 etc.). In recent years, the OPC (optimal point charge) model, a rigid four-site model with three point charges, has gained attention as an alternative solvent model because it describes bulk water more accurately than older water models. Rather than constraining its charge positions to structural features during parametrization, OPC was parametrized by optimizing the charge distribution directly.49 OPC has shown better performance than TIP3P in reproducing experimental thermodynamic properties and water–water hydrogen bonding interactions.49
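To make the difference in charge distribution between the two models concrete, the short Python sketch below computes the dipole moment of a rigid, neutral three- or four-site water model from its geometry and charges. The parameter values (TIP3P: q_H = 0.417 e, r_OH = 0.9572 Å, ∠HOH = 104.52°; OPC: q_H = 0.6791 e, r_OH = 0.8724 Å, ∠HOH = 103.6°, O–M distance 0.1594 Å) are those commonly quoted from the original model publications and should be checked against them before reuse; the function name `dipole_debye` is our own, and quadrupole moments are not computed here.

```python
import numpy as np

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e·Å expressed in debye


def dipole_debye(q_h, r_oh, hoh_deg, d_om=0.0):
    """Dipole moment (D) of a rigid, neutral 3- or 4-site water model.

    The two H sites carry +q_h each; the compensating charge -2*q_h sits on O
    (3-site models, d_om = 0) or on a massless M site displaced by d_om (Å)
    along the HOH bisector (4-site models such as OPC).
    """
    half = np.radians(hoh_deg) / 2.0
    # O at the origin, H atoms in the xy plane, bisector along +y
    h1 = np.array([r_oh * np.sin(half), r_oh * np.cos(half), 0.0])
    h2 = np.array([-r_oh * np.sin(half), r_oh * np.cos(half), 0.0])
    neg_site = np.array([0.0, d_om, 0.0])
    mu = q_h * (h1 + h2) - 2.0 * q_h * neg_site  # in e·Å; origin-independent for neutral molecules
    return float(np.linalg.norm(mu) * E_ANGSTROM_TO_DEBYE)


tip3p = dipole_debye(q_h=0.417, r_oh=0.9572, hoh_deg=104.52)
opc = dipole_debye(q_h=0.6791, r_oh=0.8724, hoh_deg=103.6, d_om=0.1594)
print(f"TIP3P: {tip3p:.2f} D, OPC: {opc:.2f} D")
```

With these parameters, the sketch yields roughly 2.35 D for TIP3P and 2.48 D for OPC, illustrating the larger dipole of OPC.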
Here, we contribute to the ongoing exploration of protein folding dynamics with a detailed comparative analysis of ff14SB/TIP3P and ff19SB/OPC using the fast-folding proteins Chignolin and its mutant CLN025. We highlight strengths and limitations of both force fields in combination with their corresponding water models and compare the resulting folding kinetics and thermodynamics with the available experimental data.
Methods
Simulation Setup
We chose Chignolin (PDB 1UAO)37 and its mutant CLN025 (PDB 5AWL)39 to compare the influence of the force fields (ff14SB and ff19SB) and the water models (TIP3P and OPC, respectively) on folding thermodynamics and kinetics. The structures were prepared in MOE (Molecular Operating Environment)61 using the Protonate 3D tool (all amino acids were in their standard protonation states at pH 7).62 We placed the fast-folding proteins in cubic boxes of TIP3P59 or OPC49 water molecules, respectively, with a minimum distance of 20 Å between the protein and the box boundary,63,64 using the tleap tool from the AmberTools20 package.65 Protein parameters were taken from either the AMBER force field ff14SB46 (for TIP3P water) or ff19SB45 (for OPC water). To neutralize the systems, we used uniform background charges,65−67 and each system was carefully equilibrated using a multistep equilibration protocol.68,69
MD simulations were performed in the NpT ensemble with pmemd.cuda.70 Long-range electrostatic interactions were calculated using the particle-mesh Ewald method. Bonds involving hydrogen atoms were constrained with the SHAKE algorithm, enabling a 2.0 fs time step.71 The pressure was kept constant at 1 bar by weak coupling to an external bath using the Berendsen algorithm.72 The temperature was maintained at 300 K with a Langevin thermostat.73 For each system, two distinct simulations of 6 μs each were performed. These long trajectories were clustered by pairwise (2D) root-mean-square deviation (2D-RMSD) with the hierarchical agglomerative algorithm implemented in cpptraj66 (σ = 1.8 Å). From every cluster representative, we started a new 100 ns simulation to improve the coverage of the conformational space (Table 1).
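The clustering step can be illustrated with a minimal Python sketch using NumPy and SciPy rather than cpptraj: it computes a pairwise RMSD matrix after optimal (Kabsch) superposition, builds an agglomerative tree, and cuts it at a distance cutoff analogous to σ = 1.8 Å. The average-linkage criterion and the choice of representative (the member with the smallest summed RMSD to the rest of its cluster) are assumptions for illustration and need not match cpptraj's defaults; the toy frames are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def kabsch_rmsd(a, b):
    """RMSD (Å) between two (n_atoms, 3) arrays after optimal rigid superposition."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:  # exclude improper rotations (reflections)
        s[-1] = -s[-1]
    e0 = (a ** 2).sum() + (b ** 2).sum()
    return float(np.sqrt(max(e0 - 2.0 * s.sum(), 0.0) / len(a)))


def pairwise_rmsd(frames):
    """Symmetric 2D-RMSD matrix for a list of (n_atoms, 3) frames."""
    n = len(frames)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = kabsch_rmsd(frames[i], frames[j])
    return d


def cluster_frames(frames, cutoff=1.8, method="average"):
    """Agglomerative clustering cut at `cutoff` Å; returns labels and representative frame indices."""
    d = pairwise_rmsd(frames)
    tree = linkage(squareform(d, checks=False), method=method)
    labels = fcluster(tree, t=cutoff, criterion="distance")
    reps = {}
    for lab in np.unique(labels):
        members = np.flatnonzero(labels == lab)
        within = d[np.ix_(members, members)]
        reps[int(lab)] = int(members[np.argmin(within.sum(axis=1))])
    return labels, reps


def random_rotation(rng):
    """Random proper rotation matrix obtained from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


# Toy data: two distinct 10-atom "conformations" (stand-ins for Cα frames),
# each copied five times with small noise and a random rigid rotation.
rng = np.random.default_rng(0)
templates = [rng.normal(scale=5.0, size=(10, 3)) for _ in range(2)]
frames = [
    (templates[k] + rng.normal(scale=0.1, size=(10, 3))) @ random_rotation(rng).T
    for k in (0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
]
labels, reps = cluster_frames(frames, cutoff=1.8)
print(labels, reps)
```

In the workflow described above, each representative would then seed a new short simulation; in practice, the quadratic cost of the full RMSD matrix is usually reduced by clustering a strided subset of frames.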
Table 1. Detailed Description of the Simulated Systems and Aggregated Simulation Times.
| system | force field/water model | total aggregated simulation time |
|---|---|---|
| Chignolin | ff14SB/TIP3P | 61.5 μs |
| Chignolin | ff19SB/OPC | 110.9 μs |
| CLN025 | ff14SB/TIP3P | 35.7 μs |
| CLN025 | ff19SB/OPC | 82.2 μs |
Simulation Analysis
With the obtained trajectories, we performed a time-lagged independent component analysis (tICA) with a lag time of 5 ns using the Python library PyEMMA 2.74 Thermodynamics and kinetics were recovered with Markov state models (MSMs), also built with PyEMMA 2.75 Here, a k-means clustering algorithm76 was used to define microstates, and the PCCA+ algorithm77 was used to coarse-grain the microstates into macrostates. In general, MSMs provide a coarse-grained representation of the kinetically distinct conformational states and enable the reconstruction of the free energy surface. The Chapman–Kolmogorov test78,79 assesses the reliability of the model by checking whether the model, propagated over multiples of the lag time, reproduces the estimates obtained directly at these longer lag times; in addition, the variational approach for Markov processes was employed, accounting for the fraction of states used.80 To build the MSMs, we used the distances between all heavy atoms as input features, defined 100 microstates with k-means clustering, and applied a lag time of 5 ns.
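The core of the MSM workflow can be sketched in plain NumPy, assuming a discretized (microstate) trajectory is already available. This is not PyEMMA's implementation: for brevity, the reversible transition matrix is estimated from symmetrized counts instead of a reversible maximum-likelihood estimator, PCCA+ coarse-graining is omitted, and the lag time is given in frames (how many frames correspond to 5 ns depends on the saving interval). The Chapman–Kolmogorov check compares T(τ)^k with T(kτ) estimated directly from the data; the 3-state chain is synthetic.

```python
import numpy as np


def count_matrix(dtraj, lag, n_states):
    """Count transitions i -> j between frames separated by `lag` steps (sliding window)."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts


def estimate_msm(dtraj, lag, n_states):
    """Reversible transition matrix from symmetrized counts (simpler than a reversible MLE)."""
    counts = count_matrix(dtraj, lag, n_states)
    c_sym = counts + counts.T
    t = c_sym / c_sym.sum(axis=1, keepdims=True)
    pi = c_sym.sum(axis=1) / c_sym.sum()  # stationary distribution of this estimator
    return t, pi


def implied_timescales(t, lag):
    """Implied timescales -lag / ln|lambda_i| of the non-stationary eigenvalues (slowest first)."""
    lam = np.sort(np.abs(np.linalg.eigvals(t)))[::-1]
    return -lag / np.log(lam[1:])


def ck_test(dtraj, lag, n_states, k_max=4):
    """Chapman-Kolmogorov check: max |T(lag)^k - T(k*lag)| for k = 1..k_max."""
    t, _ = estimate_msm(dtraj, lag, n_states)
    deviations = []
    for k in range(1, k_max + 1):
        predicted = np.linalg.matrix_power(t, k)
        estimated, _ = estimate_msm(dtraj, k * lag, n_states)
        deviations.append(float(np.abs(predicted - estimated).max()))
    return deviations


# Toy 3-state Markov chain (symmetric, hence reversible) to exercise the functions.
t_true = np.array([[0.90, 0.08, 0.02],
                   [0.08, 0.90, 0.02],
                   [0.02, 0.02, 0.96]])
rng = np.random.default_rng(1)
cum = np.cumsum(t_true, axis=1)
cum[:, -1] = 1.0  # guard against round-off in the last cumulative bin
n_steps = 100_000
dtraj = np.zeros(n_steps, dtype=int)
u = rng.random(n_steps)
for i in range(1, n_steps):
    dtraj[i] = np.searchsorted(cum[dtraj[i - 1]], u[i])

t_est, pi_est = estimate_msm(dtraj, lag=1, n_states=3)
print(np.round(t_est, 3), np.round(pi_est, 3))
print(implied_timescales(t_est, lag=1), ck_test(dtraj, lag=1, n_states=3))
```

Small CK deviations and implied timescales that are stable across lag times are the usual indicators that the chosen lag time yields an approximately Markovian model.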
To calculate intramolecular contacts of the investigated proteins, we used the GetContacts software.81 The formation and persistence of contacts during a simulation can be visualized with so-called flareplots, in which interacting residues are connected by a line colored according to the frequency of the interaction. To visualize the contacts, we used an in-house Python script, available on GitHub (https://github.com/liedllab/GetContacts_analysis), and the Python library MNE-connectivity (https://github.com/mne-tools/mne-connectivity). We focused our analysis on hydrogen bond interactions and contacts involving aromatic amino acids (π-stacking and T-stacking). Furthermore, secondary structures, RMSDs (of the Cα atoms), and native contacts were calculated with cpptraj,66 the AMBER trajectory analysis program. The experimental structures (Chignolin: PDB 1UAO,37 CLN025: PDB 5AWL39) served as references for both the RMSD and the native contacts. Secondary structure was assigned according to DSSP,82 and error estimates were obtained by block averaging.
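Block averaging can be sketched as follows: the time series (here a hypothetical per-frame DSSP β-strand indicator) is split into contiguous blocks, and the standard error is computed from the scatter of the block means. Because consecutive MD frames are correlated, the estimate is usually inspected as a function of block length and taken from the plateau; the block counts and the synthetic data below are illustrative choices only.

```python
import numpy as np


def block_sem(series, n_blocks):
    """Standard error of the mean from `n_blocks` contiguous, non-overlapping blocks.

    Trailing frames that do not fill a complete block are discarded.
    """
    x = np.asarray(series, dtype=float)
    block_len = len(x) // n_blocks
    if block_len == 0:
        raise ValueError("more blocks than data points")
    means = x[: block_len * n_blocks].reshape(n_blocks, block_len).mean(axis=1)
    return float(means.std(ddof=1) / np.sqrt(n_blocks))


def block_sem_curve(series, block_counts=(100, 50, 20, 10, 5)):
    """SEM for decreasing block counts (i.e., increasing block length).

    For correlated data, the estimate grows with block length and levels off
    once blocks are longer than the correlation time.
    """
    return {n: block_sem(series, n) for n in block_counts}


# Hypothetical per-frame indicator: 1 if a residue is assigned to a β-strand, else 0.
rng = np.random.default_rng(2)
strand = (rng.random(10_000) < 0.6).astype(float)
print(strand.mean(), block_sem_curve(strand))
```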
Grid Inhomogeneous Solvation Theory
Grid inhomogeneous solvation theory (GIST) was used to estimate localized thermodynamic properties of solvation.83−86 GIST is an MD-based approach originating from inhomogeneous solvation theory (IST), which describes the solvation process in terms of localized effects on the solvent around the solute relative to the bulk solvent.87,88 GIST replaces the spatial integrals of IST by discrete sums over a three-dimensional grid, which simplifies the underlying calculations. In this study, we applied GIST to calculate the solvation free energy of five randomly sampled structures from each MSM macrostate to estimate ensemble properties. Simulations (100 ns) were run with a setup similar to the one described above; however, all protein structures were solvated in a solvent box with a side length of 75 Å. During the simulations, the protein heavy atoms were kept rigid by harmonic position restraints of 1000 kcal mol⁻¹ Å⁻². A GPU-accelerated version of the GIST algorithm89,90 was used to calculate thermodynamic properties from the last 80 ns of each trajectory on a 75 × 75 × 75 Å grid with a voxel spacing of 0.5 Å at a temperature of 300 K. Reference densities of 0.0329 and 0.0333 molecules Å⁻³, corresponding to the respective bulk densities, were used for the TIP3P and OPC water models. The solvation free energy of voxel k is calculated as follows, including a recently suggested correction of 0.4 ΔS to account for the missing higher-order entropy terms:90,91
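$$\Delta A_{\mathrm{GIST}}(\mathbf{r}_k) = \Delta E_{\mathrm{GIST}}(\mathbf{r}_k) - T\,\Delta S_{\mathrm{GIST}}(\mathbf{r}_k) + 0.4\,T\,\Delta S_{\mathrm{GIST}}(\mathbf{r}_k) \qquad (1)$$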
Here, the GIST entropy ΔS_GIST(r_k) of a single voxel is derived from the orientational and translational mobility of the water molecules within this voxel, i.e., from the distributions of their orientations and positions sampled during the simulation. The GIST enthalpy ΔE_GIST(r_k) quantifies the average interaction of the water molecules within a single voxel with the water molecules in all other voxels and with the solute, relative to the bulk solvent. For each peptide structure, thermodynamic properties were calculated by integrating over all GIST voxels within 9 Å of the structure, following Kraml et al.92 The ensemble value of the solvation free energy of a macrostate was then obtained as the arithmetic mean of the respective single-structure values.14 Sampling errors were again estimated by block averaging of the restrained GIST simulations.
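The integration and ensemble-averaging steps can be sketched in Python as below. The sketch assumes that per-voxel contributions (e.g., ΔA_GIST in kcal/mol per voxel, i.e., densities already multiplied by the voxel volume) are given as a 3D array in C order, that `origin` is the corner of the grid so that voxel centers lie half a spacing inside it, and that the errors of the five structures per macrostate are independent. The function names are hypothetical and not part of cpptraj, and the random grid and coordinates are placeholders.

```python
import numpy as np
from scipy.spatial import cKDTree


def voxel_centers(origin, spacing, shape):
    """Cartesian centers (Å) of all voxels, in the C order of the value array."""
    axes = [origin[d] + spacing * (np.arange(shape[d]) + 0.5) for d in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])


def integrate_near_solute(values, origin, spacing, solute_xyz, cutoff=9.0):
    """Sum per-voxel contributions whose center lies within `cutoff` Å of any solute atom."""
    centers = voxel_centers(origin, spacing, values.shape)
    dist, _ = cKDTree(solute_xyz).query(centers)  # distance to the nearest solute atom
    return float(values.ravel()[dist <= cutoff].sum())


def ensemble_average(state_values, state_errors):
    """Arithmetic mean over the structures of one state, with propagated independent errors."""
    v = np.asarray(state_values, dtype=float)
    e = np.asarray(state_errors, dtype=float)
    return float(v.mean()), float(np.sqrt((e ** 2).sum()) / len(e))


# Placeholder inputs: random "solute" atoms and random per-voxel values on a 0.5 Å grid.
rng = np.random.default_rng(4)
solute = rng.normal(scale=3.0, size=(80, 3))
grid_vals = rng.normal(scale=0.01, size=(60, 60, 60))
origin = np.array([-15.0, -15.0, -15.0])
print(integrate_near_solute(grid_vals, origin, 0.5, solute, cutoff=9.0))
```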
Results
We elucidate the influence of distinct force fields (ff14SB and ff19SB) and water models (TIP3P and OPC) on the folding characteristics of Chignolin and its mutant CLN025, analyzing both kinetic and thermodynamic properties of our simulations. To ensure a sufficient number of transitions between folded and unfolded states, we aggregated a total of 290 μs of sampling time (Table 1).
Characterization of Conformational States
Figures 2–5 (A) show the free energy surfaces of Chignolin and CLN025 projected onto the first two tICA coordinates. On these landscapes, we show the Markov state models of the folding dynamics, which discretize the free energy surface into three to four kinetically separated conformational states. For both Chignolin (Figures 2 and 3) and CLN025 (Figures 4 and 5), independent of the force field and water model, we found a clear separation of folded and unfolded states. Unfolded state representatives are always shown in green, while folded state conformations are depicted in purple (Figures 2–5). In accordance with the available crystal/NMR structures, the folded states can be further split into the natively folded state (purple) and misfolded states (pink and light pink). Figures 2–5 (D and E) reveal significant differences in the RMSDs and fractions of native contacts (abbreviated as q), underlining the distinct conformations of the defined states. While the natively folded state shows the lowest RMSD (ca. 2 Å) and the highest q with respect to the experimental structures, the unfolded state shows the opposite trend, with high RMSD and low q. The differences between the natively folded and the misfolded states (purple, pink, and light pink), however, are less pronounced.
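Free energy surfaces of this kind are commonly obtained from a histogram of the projected frames via F(x, y) = −k_B T ln p(x, y). The sketch below assumes that the first two tICA coordinates are available as arrays; the optional per-frame weights (e.g., MSM stationary weights) and the bin count are illustrative choices, not necessarily those used for the figures, and the bimodal data are synthetic.

```python
import numpy as np

KB_KCAL = 0.0019872  # Boltzmann constant in kcal mol^-1 K^-1


def free_energy_surface(x, y, bins=50, temperature=300.0, weights=None):
    """F(x, y) = -kB T ln p(x, y) in kcal/mol, shifted so that the global minimum is 0.

    Empty bins are returned as +inf. `weights` can hold per-frame MSM
    stationary weights to correct for non-equilibrium sampling.
    """
    hist, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=weights)
    p = hist / hist.sum()
    with np.errstate(divide="ignore"):
        f = -KB_KCAL * temperature * np.log(p)
    f -= f[np.isfinite(f)].min()
    return f, xedges, yedges


# Synthetic two-basin data standing in for the first two tICA coordinates.
rng = np.random.default_rng(3)
tic1 = np.r_[rng.normal(-1.0, 0.3, 5000), rng.normal(1.0, 0.3, 2000)]
tic2 = rng.normal(0.0, 0.5, tic1.size)
fes, xe, ye = free_energy_surface(tic1, tic2, bins=40)
print(fes[np.isfinite(fes)].max(), np.isfinite(fes).sum())
```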
Figure 2.
Detailed analysis of Chignolin simulated with ff14SB/TIP3P. (A) Population of the macrostates projected onto the free energy landscape. Green is the unfolded state, purple the natively folded state, and light pink the misfolded state. The transition times and MSM validations are shown in the SI. (B) Intramolecular interaction patterns of the respective states. The darker the interaction, the more frequently it occurs in the simulation. The arrows between the states represent the transition times (for further details, see the SI); the larger the arrow, the shorter the transition time. In general, all transition times are on the order of 100 ns. (C) Secondary structure of each state, assigned according to DSSP.82 (D) RMSDs and (E) fractions of native contacts.
Figure 3.
Detailed analysis of Chignolin simulated with ff19SB/OPC. (A) Population of the macrostates projected onto the free energy landscape. Green is the unfolded state, purple the natively folded state, and light pink the misfolded state. The transition times and MSM validations are shown in the SI. (B) Intramolecular interaction patterns of the respective states. The darker the interaction, the more frequently it occurs in the simulation. The arrows between the states represent the transition times (for further details, see the SI); the larger the arrow, the shorter the transition time. In general, the transition times range from the 100 ns scale (gray arrows) to the low microsecond scale (black arrows). (C) Secondary structure of each state, assigned according to DSSP.82 (D) RMSDs and (E) fractions of native contacts.
Figure 4.
Detailed analysis of CLN025 simulated with ff14SB/TIP3P. (A) Population of the macrostates projected onto the free energy landscape. Green is the unfolded state, purple the natively folded state, and pink and light pink the misfolded states. The transition times and MSM validations are shown in the SI. (B) Intramolecular interaction patterns of the respective states. The darker the interaction, the more frequently it occurs in the simulation. The arrows between the states represent the transition times (for further details, see the SI); the larger the arrow, the shorter the transition time. In general, the transition times range from the 100 ns scale (gray arrows) to the low microsecond scale (black arrows). (C) Secondary structure of each state, assigned according to DSSP.82 (D) RMSDs and (E) fractions of native contacts.
Figure 5.
Detailed analysis of CLN025 simulated with ff19SB/OPC. (A) Population of the macrostates projected onto the free energy landscape. Green is the unfolded state, purple the natively folded state, and light pink the misfolded state. The transition times and MSM validations are shown in the SI. (B) Interaction patterns of the respective states. The darker the interaction, the more frequently it occurs in the simulation. The arrows between the states represent the transition times (for further details, see the SI); the larger the arrow, the shorter the transition time. In general, the transition times range from the 100 ns scale (gray arrows) to the low microsecond scale (black arrows). (C) Secondary structure of each state, assigned according to DSSP.82 (D) RMSDs and (E) fractions of native contacts.
Remarkably, the secondary structures of the different folded states (Figures 2–5 (C)) differ clearly: the misfolded state occurring in all simulations (light pink) exhibits a shifted hairpin conformation (extended β-sheet in cyan), in which G7–D3 and T8–Y2 form the β-sheet instead of T8–D3 and W9–Y2 as in the natively folded structure. The second misfolded state, which could only be separated in the CLN025 simulations with ff14SB/TIP3P (Figure 4 (C), pink), is dominated by the turn structure (dark blue) of the central residues (4–7) but lacks the extended β-sheet conformation (only a bridge occurs, colored medium blue). The unfolded state does not adopt any definite secondary structure but shows a low occurrence of all patterns.
A detailed look at the interaction patterns of the states reveals a more complex network in the native fold (Figures 2–5 (B, purple)). The hydrogen bonds indicate a higher number of interaction partners in the β-sheet region than in the misfolded states, which also stabilizes this secondary structure. Nevertheless, the misfolded states also form a noteworthy number of interactions, which are more conserved between specific binding partners. The hydrogen bond partners of residues 1 and 3 appear to be particularly important for distinguishing the native from the misfolded states (Figures 2–5 (B, light pink)). In the native state, these residues form interactions that reflect the symmetry of the structure (G1/Y1–G10/Y10 and D3–T8). The misfolded states lack this symmetry and form shifted hydrogen bonds from G1/Y1 to W9 rather than to G10/Y10; similarly, D3–G7 is present instead of D3–T8. Furthermore, the additional misfolded state (Figure 4B, pink) shows a higher variability of stabilizing interaction partners, in particular more side chain hydrogen bonds. This characterizes the shifted conformation of this state, in which the steric demand of the tyrosines causes distorted chain ends (also visible in the structural representation in Figure 4A).
Effect of Force Field and Water Model on Protein Structure
Even though all simulations cover the same conformational space (Figure S1), the populations of the different conformations and the transition kinetics vary significantly. While the ff14SB/TIP3P simulations show the natively folded state to be the most populated (Figures 2 and 4 (A)), the ff19SB/OPC simulations reveal a population shift toward the unfolded state, which becomes the most populated (Figures 3 and 5 (A)). The transition times are affected as well.
Furthermore, ff14SB/TIP3P shows a stronger bias toward helix formation (around 7% compared to 3% for ff19SB/OPC, for both Chignolin and CLN025, averaged over all residues and the whole simulation; Figures 2–5 (C)). Interestingly, this secondary structure motif is not separable from the unfolded Markov state. The more stable ff14SB/TIP3P simulations also show a higher number of hydrogen bonds on average (14 compared to four in the ff19SB/OPC simulations; see Figure S3).
Structural Comparison of Chignolin and CLN025
The high sequence similarity and the short length of Chignolin and CLN025 render the observed similarity in conformational space unsurprising. However, the additional stabilization by the terminal tyrosines in CLN025, compared with the glycines in Chignolin, is evident in our analyses. We find an increased population of folded conformations for CLN025, independent of the force field/water model combination. At the same time, the misfolded states are destabilized in the CLN025 simulations, resulting in a higher prevalence of the native folded structure. A detailed look at the interaction patterns of the residues forming the folded states (Figures 4 and 5 (B)) reveals the crucial role of the terminal tyrosines in stabilizing the folded conformation. Owing to their aromatic character and π-stacking interactions between these two amino acids, Y2 and W9 form a hydrophobic patch sustaining the β-hairpin. This distinct characteristic is less pronounced in Chignolin, which lacks the terminal aromatic residues.
Solvation Thermodynamics
The differences in the mean thermodynamic properties of the misfolded and unfolded states relative to the folded state (Figure 6) show that, for both force field/water model combinations, the water environment favors the unfolded state the most, followed by the misfolded states, whereas the folded state is the least favorably solvated. This is expected, as a balance must always be struck between peptide–peptide and peptide–water interactions. An unfavorable change in solvation free energy between two states is often counteracted by even more favorable intramolecular interactions, e.g., through the formation of stronger hydrogen bonds than would be possible with water. Indeed, this counterbalance is at the core of protein secondary structure; without it, no fold would be stable in solution.93 Relative to the respective folded states, CLN025 shows larger differences in solvation free energy than Chignolin, resulting in a stronger stabilization of the unfolded and misfolded states by the solvent. This most likely arises from the two additional hydroxy groups of the mutant’s terminal tyrosines. Remarkably, comparing the two water models, OPC favors the unfolded state, and to a lesser degree the misfolded states, more strongly than TIP3P does. Furthermore, the protein–water network (data not shown) reveals more stable interactions in the ff19SB/OPC simulations. In sum, these calculations indicate that OPC overstabilizes the unfolded state compared with TIP3P, in line with the observations from the MSMs. The thermodynamic contributions to the solvation free energy are consistent with this picture: the difference between the water models appears to arise from the enthalpy, which varies noticeably more than the entropy. In terms of dynamics, OPC water therefore seems to behave similarly to TIP3P water but interacts more strongly with the solute.
We find that the entropy of solvation is systematically smaller in OPC; however, the differences between the two water models are small. In contrast, the enthalpies of solvation differ significantly. From this observation, we reason that, relative to bulk solvent–solvent interactions, protein–solvent interactions are stronger in OPC.
Figure 6.

Differences in GIST properties of the misfolded (Mis. 1 and Mis. 2, colored pink and light pink) and unfolded (Unf., colored green) states relative to the folded states for (A) Chignolin (Chig.) and (B) mutant CLN025 (CLN). Left: OPC water model. Right: TIP3P water model. Shown are the differences in the free energy of solvation (ΔA_GIST, no pattern) and its components, the mean solvation enthalpy (ΔE_GIST, dotted) and the solvation entropy (TΔS_GIST, crossed), calculated from eq 1. The error bars give the sampling error within the GIST simulations, as assessed by block averaging of each trajectory and error propagation to the combined ensemble properties.
Discussion
Chignolin and CLN025 are among the most investigated fast-folding proteins for comparing force fields and testing simulation setups.34,35,38,94,95 Owing to their small size and distinct native conformation, they exhibit fast folding kinetics and are hence ideal candidates for examining protein folding. Characterizing the folding dynamics of Chignolin and CLN025 can facilitate the understanding of larger, biologically relevant systems and can consequently inform the design and development of new biotherapeutics. We present a thorough atomistic characterization of the kinetically independent states and their transitions for two force field/water model combinations (ff14SB/TIP3P and ff19SB/OPC).
As previously addressed by Kührová et al., Chignolin not only adopts a natively folded state and unfolded conformations but also forms a misfolded state characterized by parallel shifted β-strands.38 In contrast to prior studies,96 we find similar results for CLN025 as well; moreover, an additional misfolded structure could be characterized. This second misfold arises from steric hindrance by the terminal tyrosines, which explains its absence in Chignolin. The asymmetric character of the first misfolded state, present in both Chignolin and CLN025, has been described before.38 The additional misfolded state forms a more complex network with varying hydrogen bond partners (Y1–T8/W9/Y10 and D3–E5/T6/G7/T8). Although this state lacks the stabilizing backbone interactions needed to form a secondary structure motif, it is still a well-defined conformation with striking side chain hydrogen bonds and aromatic interactions (π-stacking Y2–Y10 and T-stacking W9–Y10). We sample both misfolded conformations in the CLN025 simulations, independent of the force field/water model combination (Figure S2). However, they could only be separated in the ff14SB/TIP3P simulations owing to the higher stability of folded states in this setup.
Previous studies have already highlighted the difference in stability between the two variants, which is also apparent in this study.38,39,96 Owing to the terminal tyrosines, CLN025 forms additional interactions that stabilize the folded conformation. This is also reflected in NMR studies of the proteins, in which Chignolin was folded 60% of the time and CLN025 around 90% (both at 300 K).37,39 It has been previously demonstrated that ff14SB/TIP3P generally overstabilizes secondary structures, especially helices.35,45,97 Yet, comparing the probabilities presented here, we find folded-state probabilities for ff14SB/TIP3P similar to those from the NMR experiments. The folded state of Chignolin is slightly overstabilized (66 ± 13% folded and 26 ± 4% misfolded compared with 60% folded in the experiment), whereas that of CLN025 is very similar (63 ± 1% folded and 24 ± 1% misfolded compared with 90% folded in the experiment). In contrast, the ff19SB/OPC setup yields distinctly different results, independent of the investigated system (Chignolin, 16 ± 1% folded and 5 ± 1% misfolded; CLN025, 31 ± 12% folded and 1.2 ± 0.3% misfolded). This hints at a stronger stabilization of the unfolded states, possibly caused by stronger interactions of the water model with the protein.
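In a simple two-state picture, a folded population p_f maps onto a folding free energy ΔG_fold = −RT ln[p_f/(1 − p_f)]. The sketch below applies this to the experimental folded fractions quoted above; it treats everything that is not folded as unfolded, which is a simplification given the misfolded states discussed here. At 300 K, 60% and 90% folded correspond to about −0.24 and −1.31 kcal/mol, i.e., a difference of roughly 1 kcal/mol.

```python
import numpy as np

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1


def folding_free_energy(p_folded, temperature=300.0):
    """Two-state ΔG_fold = -RT ln(p_f / (1 - p_f)) in kcal/mol (negative: folded favored)."""
    p = np.asarray(p_folded, dtype=float)
    return -R_KCAL * temperature * np.log(p / (1.0 - p))


# Experimental folded fractions quoted in the text (both at 300 K)
for name, p in [("Chignolin", 0.60), ("CLN025", 0.90)]:
    print(f"{name}: ΔG_fold ≈ {folding_free_energy(p):.2f} kcal/mol")
```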
Furthermore, the GIST results highlight that the aforementioned aromatic residues can compensate for the desolvation penalty by forming additional, stronger interactions with water, stabilizing the folded states. The strong increase in GIST enthalpy, i.e., higher interaction energies for OPC, favors protein–water interactions, which might explain the higher population of the unfolded state. Presumably, the larger multipole moments (i.e., dipole and quadrupole) of the OPC water model compared with TIP3P could be responsible for stabilizing unfolded conformations.49 The importance of the water model in determining the thermodynamics and kinetics of protein folding is further emphasized in Figures S4 and S5, which show that simulations of Chignolin with ff14SB/OPC and ff19SB/TIP3P reveal similar trends, i.e., a higher probability of the unfolded state when using the OPC water model. Other authors have also found that OPC, in comparison to TIP3P, in some instances favors extended protein structures.97,98 Nevertheless, further tests of different proteins with the respective force fields and water models are crucial for identifying the strengths and weaknesses of the individual setups. Especially since Chignolin and CLN025 share the same secondary structure, namely, the β-hairpin, it is challenging to extrapolate conclusions from these proteins to larger or structurally different systems. Still, a larger solvent-accessible surface area may lead to a higher solvation free energy if the water model interacts particularly strongly with the solute. Accordingly, the influence might decrease with increasing compactness and size owing to a smaller exposed target area. On the other hand, an overstabilization of unfolded conformations could also be advantageous for exploring a broader conformational space and sampling structurally distinct states in a shorter simulation time.
This is particularly beneficial for larger proteins, where the transition times of large conformational changes can be very long.11,99 Furthermore, ff19SB/OPC, presumably owing to this overstabilization, has been shown to be favorable for intrinsically disordered proteins: while other force field/water model combinations mainly yield overly compact conformations relative to experiment, this setup produces a more reasonable structural ensemble.98
Overall, it is rather difficult to propose a general recipe for simulating protein folding processes, since multiple factors are involved in addition to the critical role of the force field and the water model. One of the obstacles in characterizing protein folding is the time scale on which folding events occur.34,100 To sample a thermodynamically and kinetically meaningful ensemble, numerous transitions between the unfolded and the folded states are necessary, which consequently increases the required simulation time. To accelerate folding and to overcome the time scale limitations of classical molecular dynamics simulations, enhanced sampling techniques (such as hyperdynamics, replica exchange, or metadynamics) and simulations at elevated temperatures can be considered.101−106 However, to reconstruct the thermodynamics and kinetics of folding events from biased trajectories, reweighting schemes need to be applied.107 Thus, the simulation parameters, including the force field and the water model, need to be chosen carefully in accordance with the research question at hand.
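The reweighting step can be illustrated for the simplest case of a static, time-independent bias V(x), for instance a single umbrella-sampling window or a run with a fixed, previously converged bias. Because the biased ensemble samples with a factor exp(−βV), each frame receives the weight w_i ∝ exp(+βV(x_i)), with β = 1/(k_B T). The Python sketch below assumes this static-bias case; time-dependent metadynamics biases, replica exchange, and kinetic reweighting require the more elaborate schemes discussed in the cited literature, and the function name is hypothetical.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def reweighted_population(in_state, bias_kcal, temperature=300.0):
    """Unbiased probability of a state from frames sampled under a static bias.

    in_state    : boolean array, True where frame i belongs to the state of interest
    bias_kcal   : bias energy V(x_i) in kcal/mol that acted on frame i
    temperature : simulation temperature in K

    Frames are weighted by exp(+V/kT), which cancels the exp(-V/kT) factor
    the bias introduced into the sampled distribution.
    """
    beta = 1.0 / (KB_KCAL * temperature)
    log_w = beta * np.asarray(bias_kcal, dtype=float)
    log_w -= log_w.max()  # shift before exponentiating to avoid overflow
    w = np.exp(log_w)
    mask = np.asarray(in_state, dtype=bool)
    return float(w[mask].sum() / w.sum())


if __name__ == "__main__":
    # Toy data: the second (unfolded) frame carried a bias of kT*ln(3),
    # so after reweighting it counts three times as much as the first frame.
    kT = KB_KCAL * 300.0
    folded = np.array([True, False])
    bias = np.array([0.0, kT * np.log(3.0)])
    print(reweighted_population(folded, bias))  # approximately 0.25
```

Working with log-weights and subtracting the maximum keeps the exponentials finite even when the bias spans many kT, which is typical for metadynamics-style runs.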
This choice also plays an important role in the estimation of properties that are relevant for experimental studies. We have recently shown that the choice of the water model in molecular simulations may have a dramatic effect on the thermodynamics of complex conformational transitions, such as the coil–globule transition, which shares similarities with protein folding.108 Such biased ensembles, in turn, affect calculated ensemble properties. For example, in structure refinement protocols, force fields and their respective water models are often used to optimize the final conformations against experimental data.109 One widely used experimental technique for identifying folding events is NMR spectroscopy, which yields interproton distances from the nuclear Overhauser effect (NOE). Here, the spatial proximity of atoms or residues indicates crucial interactions and thus the three-dimensional conformation. Hence, these results can be compared to a native contact analysis, which likewise quantifies the proximity of atoms relative to a reference structure and therefore reports on the folding state. Additionally, NMR J-coupling constants provide information about the backbone dihedrals via the Karplus equation and thus about the conformational arrangement.110−112 This is especially important for CLN025, since the backbone torsions differ between its available NMR and crystal structures.39 In addition to the structural information, amide H-D exchange experiments are available for Chignolin37 and CLN025,39 which identified backbone amide hydrogens that are protected from exchange with the solvent and thus might form stabilizing hydrogen bonds. Therefore, the results from the H-D exchange experiments are suitable for comparison with secondary structure assignments (also combined with the dihedrals derived from the J-couplings) and with interaction analyses.
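The comparisons with NMR data described above can be sketched in a few lines of Python. The block assumes the Karplus coefficients of Vuister and Bax for the ³J(HN,Hα) coupling (A = 6.51 Hz, B = −1.76 Hz, C = 1.60 Hz, with θ = φ − 60°), the common ⟨r⁻⁶⟩^(−1/6) averaging for NOE distances, and a smooth native-contact switching function of the Best, Hummer, and Eaton form (β = 5 Å⁻¹, λ = 1.8). These are illustrative choices, not necessarily the parameters used in this study, and all function names are hypothetical.

```python
import numpy as np

# Assumed Karplus coefficients (Hz) for 3J(HN,HA), Vuister & Bax parametrization
KARPLUS_A, KARPLUS_B, KARPLUS_C = 6.51, -1.76, 1.60


def j_hn_ha(phi_deg):
    """3J(HN,HA) in Hz from the backbone dihedral phi (degrees), theta = phi - 60."""
    theta = np.radians(np.asarray(phi_deg, dtype=float) - 60.0)
    cos_t = np.cos(theta)
    return KARPLUS_A * cos_t**2 + KARPLUS_B * cos_t + KARPLUS_C


def ensemble_j(phi_frames_deg):
    """Ensemble J: average of J over frames, not J of the average phi."""
    return float(np.mean(j_hn_ha(phi_frames_deg)))


def noe_effective_distance(distances):
    """<r^-6>^(-1/6) average of one interproton distance over an ensemble."""
    r = np.asarray(distances, dtype=float)
    return float(np.mean(r**-6) ** (-1.0 / 6.0))


def fraction_native_contacts(coords, ref_coords, pairs, beta=5.0, lam=1.8):
    """Fraction of native contacts Q with a smooth switching function.

    coords, ref_coords : (n_atoms, 3) arrays in Angstrom
    pairs              : (n_pairs, 2) atom indices of native contacts, e.g.,
                         heavy-atom pairs that are close in the reference
    beta is in 1/Angstrom; lam tolerates thermal fluctuations around r0.
    """
    i, j = np.asarray(pairs).T
    r = np.linalg.norm(coords[i] - coords[j], axis=1)
    r0 = np.linalg.norm(ref_coords[i] - ref_coords[j], axis=1)
    return float(np.mean(1.0 / (1.0 + np.exp(beta * (r - lam * r0)))))


if __name__ == "__main__":
    # Helical (-60) vs. extended (-120) phi: small vs. large coupling
    print(j_hn_ha(-60.0), j_hn_ha(-120.0))    # approx. 4.11 and 9.87 Hz
    print(noe_effective_distance([2.0, 6.0]))  # dominated by the short distance
```

Averaging J over frames and averaging r⁻⁶ rather than r both mirror how the ensemble-averaged experimental observables arise, so short-lived close contacts and minor conformers are weighted accordingly.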
In general, various experimental techniques go hand in hand with computational analyses and thus provide valuable points of comparison and validation that guide and drive advances in the field.
Conclusion
We present a comprehensive comparison of the force fields ff14SB and ff19SB with their recommended water models, TIP3P and OPC, respectively, and characterize the folding kinetics and thermodynamics of both Chignolin and CLN025. We describe previously unidentified states in detail with respect to their conformations and intramolecular interaction patterns. The complexity of the state model and of the transitions between states exemplifies the general challenge of understanding protein folding, unfolding, and misfolding.
By outlining their folding pathways in atomistic detail, we were able to differentiate the two systems, Chignolin and CLN025, by their thermodynamic and kinetic behavior in solution. Comparing the two setups, namely, ff14SB/TIP3P and ff19SB/OPC, we find a population shift from folded toward unfolded conformations with ff19SB/OPC. An in-depth characterization of this discrepancy points to a stabilization of the unfolded state by the OPC water model, independent of the force field. Thus, the choice of force field, and especially of the water model, has a significant impact on the resulting conformational ensemble and determines physically relevant properties. To identify the most influential factors and to gain more comprehensive knowledge, further systems still need to be investigated. However, compared to the compact Chignolin, larger systems require more extensive sampling to converge and come with higher statistical uncertainty. Our findings could also have profound implications for drug discovery, where conformational changes less dramatic than folding–unfolding can occur between apo and holo states. In these scenarios, substantially different results could be obtained when assessing the energetic accessibility of ligandable conformations in therapeutically relevant proteins, depending on the force field and water model combination used. To conclude, this study highlights important considerations for obtaining kinetically and thermodynamically reasonable states that agree with experiment.
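A population shift of this kind can be expressed as a folding free energy, ΔG_fold = −k_B T ln(p_folded/p_unfolded), with the populations taken from the stationary distribution of a Markov state model. The numpy sketch below estimates a transition matrix from a discrete trajectory by counting transitions at a lag time and symmetrizing the count matrix, a crude stand-in for the reversible maximum-likelihood estimators implemented in dedicated packages such as PyEMMA. The function names, the 300 K default temperature, and the toy trajectory are assumptions for illustration only.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def transition_matrix(dtraj, n_states, lag):
    """Row-stochastic transition matrix estimated at a given lag (in frames).

    Counts C_ij of transitions i -> j after `lag` frames are symmetrized,
    C + C^T, which enforces detailed balance in a crude way. Every state
    must be visited at least once, otherwise a row sum is zero.
    """
    dtraj = np.asarray(dtraj, dtype=int)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    counts = counts + counts.T
    return counts / counts.sum(axis=1, keepdims=True)


def stationary_distribution(T):
    """Left eigenvector of T for the eigenvalue 1, normalized to sum to one."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()


def folding_free_energy(p_folded, p_unfolded, temperature=300.0):
    """Delta G_fold = -kT ln(p_folded / p_unfolded), in kcal/mol."""
    return -KB_KCAL * temperature * np.log(p_folded / p_unfolded)


if __name__ == "__main__":
    # Toy two-state trajectory: 0 = folded, 1 = unfolded
    dtraj = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
    T = transition_matrix(dtraj, n_states=2, lag=1)
    pi = stationary_distribution(T)
    print(pi, folding_free_energy(pi[0], pi[1]))
```

A negative ΔG_fold indicates a predominantly folded ensemble; a shift toward unfolded conformations, as described above for OPC, raises ΔG_fold toward or above zero.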
Acknowledgments
This work was supported by the Austrian Science Fund (FWF) via Grants P34518 and DOC178B. This work was supported by the Austrian Academy of Sciences APART-MINT postdoctoral fellowship to M.L.F.Q. We acknowledge CHRONOS for awarding us access to Piz Daint at CSCS, Switzerland. We acknowledge EuroHPC Joint Undertaking for awarding us access to Karolina at IT4Innovations, Czech Republic.
Data Availability Statement
The structures used in this manuscript are publicly available in the Protein Data Bank under the PDB codes 1UAO and 5AWL. The trajectories have been made available via Zenodo (10.5281/zenodo.10499332).
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.3c01106.
Principal component analysis (PCA) and time-lagged independent component analysis (tICA) in color, extended hydrogen bond analysis, cross solvation results (ff14SB/OPC and ff19SB/TIP3P) for Chignolin, transition times, population probabilities, and Markov state model quality checks (PDF)
The authors declare no competing financial interest.
References
- Sheng Y.; Chen Y.; Wang L.; Liu G.; Li W.; Tang Y. Effects of Protein Flexibility on the Site of Metabolism Prediction for CYP2A6 Substrates. J. Mol. Graph. Model. 2014, 54, 90–99. 10.1016/j.jmgm.2014.09.005.
- Richard J. P. Protein Flexibility and Stiffness Enable Efficient Enzymatic Catalysis. J. Am. Chem. Soc. 2019, 141 (8), 3320–3331. 10.1021/jacs.8b10836.
- Zhao Q. Protein Flexibility as a Biosignal. Crit. Rev. Eukaryot. Gene Expr. 2010, 20 (2), 157–170. 10.1615/CritRevEukarGeneExpr.v20.i2.60.
- Almagor L.; Avinery R.; Hirsch J. A.; Beck R. Structural Flexibility of CaV1.2 and CaV2.2 I-II Proximal Linker Fragments in Solution. Biophys. J. 2013, 104 (11), 2392–2400. 10.1016/j.bpj.2013.04.034.
- Kaplon H.; Crescioli S.; Chenoweth A.; Visweswaraiah J.; Reichert J. M. Antibodies to Watch in 2023. mAbs 2023, 15 (1), 2153410. 10.1080/19420862.2022.2153410.
- Chiu M. L.; Goulet D. R.; Teplyakov A.; Gilliland G. L. Antibody Structure and Function: The Basis for Engineering Therapeutics. Antibodies Basel Switz. 2019, 8 (4), 55. 10.3390/antib8040055.
- El Ghaleb Y.; Fernández-Quintero M. L.; Monteleone S.; Tuluc P.; Campiglio M.; Liedl K. R.; Flucher B. E. Ion-Pair Interactions between Voltage-Sensing Domain IV and Pore Domain I Regulate CaV1.1 Gating. Biophys. J. 2021, 120 (20), 4429–4441. 10.1016/j.bpj.2021.09.004.
- Tuluc P.; Molenda N.; Schlick B.; Obermair G. J.; Flucher B. E.; Jurkat-Rott K. A CaV1.1 Ca2+ Channel Splice Variant with High Conductance and Voltage-Sensitivity Alters EC Coupling in Developing Skeletal Muscle. Biophys. J. 2009, 96 (1), 35–44. 10.1016/j.bpj.2008.09.027.
- Clarke E.; Stocki P.; Sinclair E. H.; Gauhar A.; Fletcher E. J. R.; Krawczun-Rygmaczewska A.; Duty S.; Walsh F. S.; Doherty P.; Rutkowski J. L. A Single Domain Shark Antibody Targeting the Transferrin Receptor 1 Delivers a TrkB Agonist Antibody to the Brain and Provides Full Neuroprotection in a Mouse Model of Parkinson’s Disease. Pharmaceutics 2022, 14 (7), 1335. 10.3390/pharmaceutics14071335.
- Sehlin D.; Stocki P.; Gustavsson T.; Hultqvist G.; Walsh F. S.; Rutkowski J. L.; Syvänen S. Brain Delivery of Biologics Using a Cross-Species Reactive Transferrin Receptor 1 VNAR Shuttle. FASEB J. Off. Publ. Fed. Am. Soc. Exp. Biol. 2020, 34 (10), 13272–13283. 10.1096/fj.202000610RR.
- Fernández-Quintero M. L.; Georges G.; Varga J. M.; Liedl K. R. Ensembles in Solution as a New Paradigm for Antibody Structure Prediction and Design. mAbs 2021, 13 (1), 1923122. 10.1080/19420862.2021.1923122.
- Fernández-Quintero M. L.; Loeffler J. R.; Kraml J.; Kahler U.; Kamenik A. S.; Liedl K. R. Characterizing the Diversity of the CDR-H3 Loop Conformational Ensembles in Relationship to Antibody Binding Properties. Front. Immunol. 2019, 9, n/a. 10.3389/fimmu.2018.03065.
- Fernández-Quintero M. L.; Kraml J.; Georges G.; Liedl K. R. CDR-H3 Loop Ensemble in Solution – Conformational Selection upon Antibody Binding. mAbs 2019, 11 (6), 1077–1088. 10.1080/19420862.2019.1618676.
- Waibl F.; Fernández-Quintero M. L.; Kamenik A. S.; Kraml J.; Hofer F.; Kettenberger H.; Georges G.; Liedl K. R. Conformational Ensembles of Antibodies Determine Their Hydrophobicity. Biophys. J. 2021, 120 (1), 143–157. 10.1016/j.bpj.2020.11.010.
- Dill K. A.; Ozkan S. B.; Shell M. S.; Weikl T. R. The Protein Folding Problem. Annu. Rev. Biophys. 2008, 37, 289–316. 10.1146/annurev.biophys.37.092707.153558.
- Dill K. A.; MacCallum J. L. The Protein-Folding Problem, 50 Years On. Science 2012, 338 (6110), 1042–1046. 10.1126/science.1219021.
- Geiler K. Protein Folding: The Good, the Bad, and the Ugly. Science in the News. https://sitn.hms.harvard.edu/flash/2010/issue65/ (accessed 2023-07-13).
- Illergård K.; Ardell D. H.; Elofsson A. Structure Is Three to Ten Times More Conserved than Sequence--a Study of Structural Response in Protein Cores. Proteins 2009, 77 (3), 499–508. 10.1002/prot.22458.
- Anfinsen C. B. Principles That Govern the Folding of Protein Chains. Science 1973, 181 (4096), 223–230. 10.1126/science.181.4096.223.
- Fernández-Quintero M. L.; Fischer A.-L. M.; Kokot J.; Waibl F.; Seidler C. A.; Liedl K. R. The Influence of Antibody Humanization on Shark Variable Domain (VNAR) Binding Site Ensembles. Front. Immunol. 2022, 13, n/a. 10.3389/fimmu.2022.953917.
- Fra̧ckiewicz M. The Future of Drug Discovery: AlphaFold’s Impact on Protein Folding. TS2 SPACE. https://ts2.space/en/the-future-of-drug-discovery-alphafolds-impact-on-protein-folding/ (accessed 2023-08-08).
- Fersht A. R.; Matouschek A.; Serrano L. The Folding of an Enzyme. I. Theory of Protein Engineering Analysis of Stability and Pathway of Protein Folding. J. Mol. Biol. 1992, 224 (3), 771–782. 10.1016/0022-2836(92)90561-W.
- Carter P. J.; Lazar G. A. Next Generation Antibody Drugs: Pursuit of the “High-Hanging Fruit.” Nat. Rev. Drug Discovery 2018, 17 (3), 197–223. 10.1038/nrd.2017.227.
- Guthmiller J. J.; Han J.; Utset H. A.; Li L.; Lan L. Y.-L.; Henry C.; Stamper C. T.; McMahon M.; O’Dell G.; Fernández-Quintero M. L.; Freyn A. W.; Amanat F.; Stovicek O.; Gentles L.; Richey S. T.; de la Peña A. T.; Rosado V.; Dugan H. L.; Zheng N.-Y.; Tepora M. E.; Bitar D. J.; Changrob S.; Strohmeier S.; Huang M.; García-Sastre A.; Liedl K. R.; Bloom J. D.; Nachbagauer R.; Palese P.; Krammer F.; Coughlan L.; Ward A. B.; Wilson P. C. Broadly Neutralizing Antibodies Target a Haemagglutinin Anchor Epitope. Nature 2022, 602 (7896), 314–320. 10.1038/s41586-021-04356-8.
- Moreno-Gonzalez I.; Soto C. Misfolded Protein Aggregates: Mechanisms, Structures and Potential for Disease Transmission. Semin. Cell Dev. Biol. 2011, 22 (5), 482–487. 10.1016/j.semcdb.2011.04.002.
- Mukherjee A.; Morales-Scheihing D.; Butler P. C.; Soto C. Type 2 Diabetes as a Protein Misfolding Disease. Trends Mol. Med. 2015, 21 (7), 439–449. 10.1016/j.molmed.2015.04.005.
- Soto C. Protein Misfolding and Disease; Protein Refolding and Therapy. FEBS Lett. 2001, 498 (2), 204–207. 10.1016/S0014-5793(01)02486-3.
- Selkoe D. J. Amyloid β-Protein and the Genetics of Alzheimer’s Disease. J. Biol. Chem. 1996, 271 (31), 18295–18298. 10.1074/jbc.271.31.18295.
- Rappazzo C. G.; Fernández-Quintero M. L.; Mayer A.; Wu N. C.; Greiff V.; Guthmiller J. J. Defining and Studying B Cell Receptor and TCR Interactions. J. Immunol. 2023, 211 (3), 311–322. 10.4049/jimmunol.2300136.
- Bostrom J.; Yu S.-F.; Kan D.; Appleton B. A.; Lee C. V.; Billeci K.; Man W.; Peale F.; Ross S.; Wiesmann C.; Fuh G. Variants of the Antibody Herceptin That Interact with HER2 and VEGF at the Antigen Binding Site. Science 2009, 323 (5921), 1610–1614. 10.1126/science.1165480.
- Dickopf S.; Buldun C.; Vasic V.; Georges G.; Hage C.; Mayer K.; Forster M.; Wessels U.; Stubenrauch K.-G.; Benz J.; Ehler A.; Lauer M. E.; Ringler P.; Kobold S.; Endres S.; Klein C.; Brinkmann U. Prodrug-Activating Chain Exchange (PACE) Converts Targeted Prodrug Derivatives to Functional Bi- or Multispecific Antibodies. Biol. Chem. 2022, 403 (5–6), 495–508. 10.1515/hsz-2021-0401.
- Fernández-Quintero M. L.; Kroell K. B.; Hofer F.; Riccabona J. R.; Liedl K. R. Mutation of Framework Residue H71 Results in Different Antibody Paratope States in Solution. Front. Immunol. 2021, 12, n/a. 10.3389/fimmu.2021.630034.
- Gelman H.; Gruebele M. Fast Protein Folding Kinetics. Q. Rev. Biophys. 2014, 47 (2), 95–142. 10.1017/S003358351400002X.
- Lindorff-Larsen K.; Piana S.; Dror R. O.; Shaw D. E. How Fast-Folding Proteins Fold. Science 2011, 334 (6055), 517–520. 10.1126/science.1208351.
- Kamenik A. S.; Handle P. H.; Hofer F.; Kahler U.; Kraml J.; Liedl K. R. Polarizable and Non-Polarizable Force Fields: Protein Folding, Unfolding, and Misfolding. J. Chem. Phys. 2020, 153 (18), 185102. 10.1063/5.0022135.
- Satoh D.; Shimizu K.; Nakamura S.; Terada T. Folding Free-Energy Landscape of a 10-Residue Mini-Protein, Chignolin. FEBS Lett. 2006, 580 (14), 3422–3426. 10.1016/j.febslet.2006.05.015.
- Honda S.; Yamasaki K.; Sawada Y.; Morii H. 10 Residue Folded Peptide Designed by Segment Statistics. Structure 2004, 12 (8), 1507–1518. 10.1016/j.str.2004.05.022.
- Kührová P.; De Simone A.; Otyepka M.; Best R. B. Force-Field Dependence of Chignolin Folding and Misfolding: Comparison with Experiment and Redesign. Biophys. J. 2012, 102 (8), 1897–1906. 10.1016/j.bpj.2012.03.024.
- Honda S.; Akiba T.; Kato Y. S.; Sawada Y.; Sekijima M.; Ishimura M.; Ooishi A.; Watanabe H.; Odahara T.; Harata K. Crystal Structure of a Ten-Amino Acid Protein. J. Am. Chem. Soc. 2008, 130 (46), 15327–15331. 10.1021/ja8030533.
- Tsuboyama K.; Dauparas J.; Chen J.; Laine E.; Mohseni Behbahani Y.; Weinstein J. J.; Mangan N. M.; Ovchinnikov S.; Rocklin G. J. Mega-Scale Experimental Analysis of Protein Folding Stability in Biology and Design. Nature 2023, 434–444. 10.1038/s41586-023-06328-6.
- Nemtseva E. V.; Gerasimova M. A.; Melnik T. N.; Melnik B. S. Experimental Approach to Study the Effect of Mutations on the Protein Folding Pathway. PLoS One 2019, 14 (1), e0210361. 10.1371/journal.pone.0210361.
- Creighton T. E. Experimental Studies of Protein Folding and Unfolding. Prog. Biophys. Mol. Biol. 1979, 33, 231–297. 10.1016/0079-6107(79)90030-0.
- Rocklin G. J.; Chidyausiku T. M.; Goreshnik I.; Ford A.; Houliston S.; Lemak A.; Carter L.; Ravichandran R.; Mulligan V. K.; Chevalier A.; Arrowsmith C. H.; Baker D. Global Analysis of Protein Folding Using Massively Parallel Design, Synthesis, and Testing. Science 2017, 357 (6347), 168–175. 10.1126/science.aan0693.
- Case D.A.; Aktulga H.M.; Belfon K.; Ben-Shalom I. Y.; Berryman J.T.; Brozell S.R.; Cerutti D.S.; Cheatham T.E. III; Cisneros G.A.; Cruzeiro V.W.D.; Darden T.A.; Forouzesh N.; Giambaşu G.; Giese T.; Gilson M.K.; Gohlke H.; Goetz A.W.; Harris J.; Izadi S.; Izmailov S.A.; Kasavajhala K.; Kaymak M.C.; King E.; Kovalenko A.; Kurtzman T.; Lee T.S.; Li P.; Lin C.; Liu J.; Luchko T.; Luo R.; Machado M.; Man V.; Manathunga M.; Merz K.M.; Miao Y.; Mikhailovskii O.; Monard G.; Nguyen H.; O’Hearn K. A.; Onufriev A.; Pan F.; Pantano S.; Qi R.; Rahnamoun A.; Roe D.R.; Roitberg A.; Sagui C.; Schott-Verdugo S.; Shajan A.; Shen J.; Simmerling C.L.; Skrynnikov N.R.; Smith J.; Swails J.; Walker R.C.; Wang J.; Wang J.; Wei H.; Wu X.; Wu Y.; Xiong Y.; Xue Y.; York D.M.; Zhao S.; Zhu Q.; Kollman P.A. Amber 2023. University of California: San Francisco, 2023.
- Tian C.; Kasavajhala K.; Belfon K. A. A.; Raguette L.; Huang H.; Migues A. N.; Bickel J.; Wang Y.; Pincay J.; Wu Q.; Simmerling C. ff19SB: Amino-Acid-Specific Protein Backbone Parameters Trained against Quantum Mechanics Energy Surfaces in Solution. J. Chem. Theory Comput. 2020, 16 (1), 528–552. 10.1021/acs.jctc.9b00591.
- Maier J. A.; Martinez C.; Kasavajhala K.; Wickstrom L.; Hauser K. E.; Simmerling C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 2015, 11 (8), 3696–3713. 10.1021/acs.jctc.5b00255.
- Ponder J. W.; Case D. A. Force Fields for Protein Simulations. In Advances in Protein Chemistry; Elsevier, 2003; Vol. 66, pp 27–85. 10.1016/S0065-3233(03)66002-X.
- Jorgensen W. L. Quantum and Statistical Mechanical Studies of Liquids. 10. Transferable Intermolecular Potential Functions for Water, Alcohols, and Ethers. Application to Liquid Water. J. Am. Chem. Soc. 1981, 103 (2), 335–340. 10.1021/ja00392a016.
- Izadi S.; Anandakrishnan R.; Onufriev A. V. Building Water Models: A Different Approach. J. Phys. Chem. Lett. 2014, 5 (21), 3863–3871. 10.1021/jz501780a.
- Brooks B. R.; Bruccoleri R. E.; Olafson B. D.; States D. J.; Swaminathan S.; Karplus M. CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations. J. Comput. Chem. 1983, 4 (2), 187–217. 10.1002/jcc.540040211.
- MacKerell A. D. Jr.; Brooks B.; Brooks C. L. III; Nilsson L.; Roux B.; Won Y.; Karplus M. CHARMM: The Energy Function and Its Parameterization. In Encyclopedia of Computational Chemistry; John Wiley & Sons, Ltd., 2002. 10.1002/0470845015.cfa007.
- Schmid N.; Eichenberger A. P.; Choutko A.; Riniker S.; Winger M.; Mark A. E.; Van Gunsteren W. F. Definition and Testing of the GROMOS Force-Field Versions 54A7 and 54B7. Eur. Biophys. J. 2011, 40 (7), 843–856. 10.1007/s00249-011-0700-9.
- Kaminski G. A.; Friesner R. A.; Tirado-Rives J.; Jorgensen W. L. Evaluation and Reparametrization of the OPLS-AA Force Field for Proteins via Comparison with Accurate Quantum Chemical Calculations on Peptides. J. Phys. Chem. B 2001, 105 (28), 6474–6487. 10.1021/jp003919d.
- Lemkul J. A.; Huang J.; Roux B.; MacKerell A. D. An Empirical Polarizable Force Field Based on the Classical Drude Oscillator Model: Development History and Recent Applications. Chem. Rev. 2016, 116 (9), 4983–5013. 10.1021/acs.chemrev.5b00505.
- Berendsen H. J. C.; Postma J. P. M.; Van Gunsteren W. F.; Hermans A. J.; Pullman B. Intermolecular Forces; Reidel: Dordrecht, The Netherlands, 1981.
- Berweger C. D.; Van Gunsteren W. F.; Müller-Plathe F. Force Field Parametrization by Weak Coupling. Re-Engineering SPC Water. Chem. Phys. Lett. 1995, 232 (5–6), 429–436. 10.1016/0009-2614(94)01391-8.
- Berendsen H. J. C.; Grigera J. R.; Straatsma T. P. The Missing Term in Effective Pair Potentials. J. Phys. Chem. 1987, 91 (24), 6269–6271. 10.1021/j100308a038.
- Izadi S.; Onufriev A. V. Accuracy Limit of Rigid 3-Point Water Models. J. Chem. Phys. 2016, 145 (7), 074501. 10.1063/1.4960175.
- Jorgensen W. L.; Chandrasekhar J.; Madura J. D.; Impey R. W.; Klein M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79 (2), 926–935. 10.1063/1.445869.
- Horn H. W.; Swope W. C.; Pitera J. W.; Madura J. D.; Dick T. J.; Hura G. L.; Head-Gordon T. Development of an Improved Four-Site Water Model for Biomolecular Simulations: TIP4P-Ew. J. Chem. Phys. 2004, 120 (20), 9665–9678. 10.1063/1.1683075.
- Chemical Computing Group ULC. Molecular Operating Environment (MOE), 2022.02, 2023. https://www.chemcomp.com/Products.htm.
- Labute P. Protonate3D: Assignment of Ionization States and Hydrogen Coordinates to Macromolecular Structures. Proteins 2009, 75 (1), 187–205. 10.1002/prot.22234.
- El Hage K.; Hédin F.; Gupta P. K.; Meuwly M.; Karplus M. Valid Molecular Dynamics Simulations of Human Hemoglobin Require a Surprisingly Large Box Size. eLife 2018, 7, e35560. 10.7554/eLife.35560.
- Gapsys V.; de Groot B. L. Comment on “Valid Molecular Dynamics Simulations of Human Hemoglobin Require a Surprisingly Large Box Size.” eLife 2019, 8, e44718. 10.7554/eLife.44718.
- Case D.A.; Aktulga H.M.; Belfon K.; Ben-Shalom I. Y.; Berryman J.T.; Brozell S.R.; Cerutti D.S.; Cheatham T.E. III; Cisneros G.A.; Cruzeiro V.W.D.; Darden T.A.; Forouzesh N.; Giambaşu G.; Giese T.; Gilson M.K.; Gohlke H.; Goetz A.W.; Harris J.; Izadi S.; Izmailov S.A.; Kasavajhala K.; Kaymak M.C.; King E.; Kovalenko A.; Kurtzman T.; Lee T.S.; Li P.; Lin C.; Liu J.; Luchko T.; Luo R.; Machado M.; Man V.; Manathunga M.; Merz K.M.; Miao Y.; Mikhailovskii O.; Monard G.; Nguyen H.; O’Hearn K. A.; Onufriev A.; Pan F.; Pantano S.; Qi R.; Rahnamoun A.; Roe D.R.; Roitberg A.; Sagui C.; Schott-Verdugo S.; Shajan A.; Shen J.; Simmerling C.L.; Skrynnikov N.R.; Smith J.; Swails J.; Walker R.C.; Wang J.; Wang J.; Wei H.; Wu X.; Wu Y.; Xiong Y.; Xue Y.; York D.M.; Zhao S.; Zhu Q.; Kollman P.A. Amber 2020. University of California: San Francisco, 2020.
- Roe D. R.; Cheatham T. E. PTRAJ and CPPTRAJ: Software for Processing and Analysis of Molecular Dynamics Trajectory Data. J. Chem. Theory Comput. 2013, 9 (7), 3084–3095. 10.1021/ct400341p.
- Hub J. S.; de Groot B. L.; Grubmüller H.; Groenhof G. Quantifying Artifacts in Ewald Simulations of Inhomogeneous Systems with a Net Charge. J. Chem. Theory Comput. 2014, 10 (1), 381–390. 10.1021/ct400626b.
- Wallnoefer H. G.; Handschuh S.; Liedl K. R.; Fox T. Stabilizing of a Globular Protein by a Highly Complex Water Network: A Molecular Dynamics Simulation Study on Factor Xa. J. Phys. Chem. B 2010, 114 (21), 7405–7412. 10.1021/jp101654g.
- Wallnoefer H. G.; Liedl K. R.; Fox T. A Challenging System: Free Energy Prediction for Factor Xa. J. Comput. Chem. 2011, 32 (8), 1743–1752. 10.1002/jcc.21758.
- Salomon-Ferrer R.; Götz A. W.; Poole D.; Le Grand S.; Walker R. C. Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 2. Explicit Solvent Particle Mesh Ewald. J. Chem. Theory Comput. 2013, 9 (9), 3878–3888. 10.1021/ct400314y.
- Miyamoto S.; Kollman P. A. Settle: An Analytical Version of the SHAKE and RATTLE Algorithm for Rigid Water Models. J. Comput. Chem. 1992, 13 (8), 952–962. 10.1002/jcc.540130805.
- Berendsen H. J. C.; Postma J. P. M.; van Gunsteren W. F.; DiNola A.; Haak J. R. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81 (8), 3684–3690. 10.1063/1.448118.
- Adelman S. A.; Doll J. D. Generalized Langevin Equation Approach for Atom/Solid-surface Scattering: General Formulation for Classical Scattering off Harmonic Solids. J. Chem. Phys. 1976, 64 (6), 2375–2388. 10.1063/1.432526.
- Scherer M. K.; Trendelkamp-Schroer B.; Paul F.; Pérez-Hernández G.; Hoffmann M.; Plattner N.; Wehmeyer C.; Prinz J.-H.; Noé F. PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of Markov Models. J. Chem. Theory Comput. 2015, 11 (11), 5525–5542. 10.1021/acs.jctc.5b00743.
- Chodera J. D.; Noé F. Markov State Models of Biomolecular Conformational Dynamics. Curr. Opin. Struct. Biol. 2014, 25, 135–144. 10.1016/j.sbi.2014.04.002.
- Likas A.; Vlassis N.; Verbeek J. J. The Global K-Means Clustering Algorithm. Pattern Recognit. 2003, 36 (2), 451–461. 10.1016/S0031-3203(02)00060-2.
- Röblitz S.; Weber M. Fuzzy Spectral Clustering by PCCA+: Application to Markov State Models and Data Classification. Adv. Data Anal. Classif. 2013, 7 (2), 147–179. 10.1007/s11634-013-0134-6.
- Karush J. On the Chapman-Kolmogorov Equation. Ann. Math. Stat. 1961, 32 (4), 1333–1337. 10.1214/aoms/1177704871.
- Miroshin R. N. Special Solutions of the Chapman–Kolmogorov Equation for Multidimensional-State Markov Processes with Continuous Time. Vestn. St Petersburg Univ. Math. 2016, 49 (2), 122–129. 10.3103/S1063454116020114.
- Wu H.; Noé F. Variational Approach for Learning Markov Processes from Time Series Data. J. Nonlinear Sci. 2020, 30 (1), 23–66. 10.1007/s00332-019-09567-y.
- Venkatakrishnan A. J.; Fonseca R.; Ma A. K.; Hollingsworth S. A.; Chemparathy A.; Hilger D.; Kooistra A. J.; Ahmari R.; Babu M. M.; Kobilka B. K.; Dror R. O. Uncovering Patterns of Atomic Interactions in Static and Dynamic Structures of Proteins. bioRxiv 2019, 840694. 10.1101/840694.
- Kabsch W.; Sander C. Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 1983, 22 (12), 2577–2637. 10.1002/bip.360221211.
- Nguyen C. N.; Kurtzman Young T.; Gilson M. K. Grid Inhomogeneous Solvation Theory: Hydration Structure and Thermodynamics of the Miniature Receptor Cucurbit[7]Uril. J. Chem. Phys. 2012, 137 (4), 044101. 10.1063/1.4733951.
- Nguyen C. N.; Cruz A.; Gilson M. K.; Kurtzman T. Thermodynamics of Water in an Enzyme Active Site: Grid-Based Hydration Analysis of Coagulation Factor Xa. J. Chem. Theory Comput. 2014, 10 (7), 2769–2780. 10.1021/ct401110x.
- Nguyen C.; Gilson M. K.; Young T. Structure and Thermodynamics of Molecular Hydration via Grid Inhomogeneous Solvation Theory. arXiv 2011, n/a. 10.48550/arXiv.1108.4876.
- Ramsey S.; Nguyen C.; Salomon-Ferrer R.; Walker R. C.; Gilson M. K.; Kurtzman T. Solvation Thermodynamic Mapping of Molecular Surfaces in AmberTools: GIST. J. Comput. Chem. 2016, 37 (21), 2029–2037. 10.1002/jcc.24417.
- Lazaridis T. Inhomogeneous Fluid Approach to Solvation Thermodynamics. 1. Theory. J. Phys. Chem. B 1998, 102 (18), 3531–3541. 10.1021/jp9723574.
- Lazaridis T. Inhomogeneous Fluid Approach to Solvation Thermodynamics. 2. Applications to Simple Fluids. J. Phys. Chem. B 1998, 102 (18), 3542–3550. 10.1021/jp972358w.
- Kraml J.; Kamenik A. S.; Waibl F.; Schauperl M.; Liedl K. R. Solvation Free Energy as a Measure of Hydrophobicity: Application to Serine Protease Binding Interfaces. J. Chem. Theory Comput. 2019, 15 (11), 5872–5882. 10.1021/acs.jctc.9b00742.
- Waibl F.; Kraml J.; Hoerschinger V. J.; Hofer F.; Kamenik A. S.; Fernández-Quintero M. L.; Liedl K. R. Grid Inhomogeneous Solvation Theory for Cross-Solvation in Rigid Solvents. J. Chem. Phys. 2022, 156 (20), 204101. 10.1063/5.0087549.
- Chen L.; Cruz A.; Roe D. R.; Simmonett A. C.; Wickstrom L.; Deng N.; Kurtzman T. Thermodynamic Decomposition of Solvation Free Energies with Particle Mesh Ewald and Long-Range Lennard-Jones Interactions in Grid Inhomogeneous Solvation Theory. J. Chem. Theory Comput. 2021, 17 (5), 2714–2724. 10.1021/acs.jctc.0c01185.
- Kraml J.; Hofer F.; Kamenik A. S.; Waibl F.; Kahler U.; Schauperl M.; Liedl K. R. Solvation Thermodynamics in Different Solvents: Water–Chloroform Partition Coefficients from Grid Inhomogeneous Solvation Theory. J. Chem. Inf. Model. 2020, 60 (8), 3843–3853. 10.1021/acs.jcim.0c00289.
- Levy Y.; Onuchic J. N. Water Mediation in Protein Folding and Molecular Recognition. Annu. Rev. Biophys. Biomol. Struct. 2006, 35 (1), 389–415. 10.1146/annurev.biophys.35.040405.102134.
- Zschau R. L.; Zacharias M. Mechanism of β-Hairpin Formation in AzoChignolin and Chignolin. J. Comput. Chem. 2023, 44 (9), 988–1001. 10.1002/jcc.27059.
- Paissoni C.; Camilloni C. How to Determine Accurate Conformational Ensembles by Metadynamics Metainference: A Chignolin Study Case. Front. Mol. Biosci. 2021, 8, n/a. 10.3389/fmolb.2021.694130.
- Maruyama Y.; Koroku S.; Imai M.; Takeuchi K.; Mitsutake A. Mutation-Induced Change in Chignolin Stability from π-Turn to α-Turn. RSC Adv. 2020, 10 (38), 22797–22808. 10.1039/D0RA01148G.
- Abriata L. A.; Dal Peraro M. Assessment of Transferable Forcefields for Protein Simulations Attests Improved Description of Disordered States and Secondary Structure Propensities, and Hints at Multi-Protein Systems as the next Challenge for Optimization. Comput. Struct. Biotechnol. J. 2021, 19, 2626–2636. 10.1016/j.csbj.2021.04.050.
- Shabane P. S.; Izadi S.; Onufriev A. V. General Purpose Water Model Can Improve Atomistic Simulations of Intrinsically Disordered Proteins. J. Chem. Theory Comput. 2019, 15 (4), 2620–2634. 10.1021/acs.jctc.8b01123.
- Henzler-Wildman K.; Kern D. Dynamic Personalities of Proteins. Nature 2007, 450 (7172), 964–972. 10.1038/nature06522.
- Freddolino P. L.; Harrison C. B.; Liu Y.; Schulten K. Challenges in Protein-Folding Simulations. Nat. Phys. 2010, 6 (10), 751–758. 10.1038/nphys1713.
- Voter A. F. Hyperdynamics: Accelerated Molecular Dynamics of Infrequent Events. Phys. Rev. Lett. 1997, 78 (20), 3908–3911. 10.1103/PhysRevLett.78.3908.
- Laio A.; Parrinello M. Escaping Free-Energy Minima. Proc. Natl. Acad. Sci. U. S. A. 2002, 99 (20), 12562–12566. 10.1073/pnas.202427399.
- Hansmann U. H. E. Parallel Tempering Algorithm for Conformational Studies of Biological Molecules. Chem. Phys. Lett. 1997, 281 (1), 140–150. 10.1016/S0009-2614(97)01198-6.
- Sugita Y.; Okamoto Y. Replica-Exchange Molecular Dynamics Method for Protein Folding. Chem. Phys. Lett. 1999, 314 (1), 141–151. 10.1016/S0009-2614(99)01123-9.
- Torrie G. M.; Valleau J. P. Nonphysical Sampling Distributions in Monte Carlo Free-Energy Estimation: Umbrella Sampling. J. Comput. Phys. 1977, 23 (2), 187–199. 10.1016/0021-9991(77)90121-8.
- Huber T.; Torda A. E.; van Gunsteren W. F. Local Elevation: A Method for Improving the Searching Properties of Molecular Dynamics Simulation. J. Comput. Aided Mol. Des. 1994, 8 (6), 695–708. 10.1007/BF00124016.
- Kamenik A. S.; Linker S. M.; Riniker S. Enhanced Sampling without Borders: On Global Biasing Functions and How to Reweight Them. Phys. Chem. Chem. Phys. 2022, 24 (3), 1225–1236. 10.1039/D1CP04809K.
- Quoika P. K.; Kamenik A. S.; Fernández-Quintero M. L.; Zacharias M.; Liedl K. R. Water Model Determines Thermosensitive and Physicochemical Properties of Poly(N-Isopropylacrylamide) in Molecular Simulations. Front. Mater. 2023, 10, n/a. 10.3389/fmats.2023.1005781. [DOI] [Google Scholar]
- Fuentes G.; van Dijk A. D. J.; Bonvin A. M. J. J.. Nuclear Magnetic Resonance-Based Modeling and Refinement of Protein Three-Dimensional Structures and Their Complexes. In Molecular Modeling of Proteins; Kukol A., Ed.; Methods Molecular Biology; Humana Press: Totowa, NJ, 2008; pp 229–255. 10.1007/978-1-59745-177-2_13. [DOI] [PubMed] [Google Scholar]
- Kaptein R.; Boelens R.; Scheek R. M.; Van Gunsteren W. F. Protein Structures from NMR. Biochemistry 1988, 27 (15), 5389–5395. 10.1021/bi00415a001. [DOI] [PubMed] [Google Scholar]
- Jeener J.; Meier B. H.; Bachmann P.; Ernst R. R. Investigation of Exchange Processes by Two-dimensional NMR Spectroscopy. J. Chem. Phys. 1979, 71 (11), 4546–4553. 10.1063/1.438208. [DOI] [Google Scholar]
- Sapienza P. J.; Lee A. L. Using NMR to Study Fast Dynamics in Proteins: Methods and Applications. Curr. Opin. Pharmacol. 2010, 10 (6), 723–730. 10.1016/j.coph.2010.09.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
The structures used in this manuscript are publicly available from the Protein Data Bank under the accession codes 1UAO and 5AWL. The trajectories have been made available via Zenodo (10.5281/zenodo.10499332).




