Proceedings of the National Academy of Sciences of the United States of America
2021 Jul 29;118(31):e2023856118. doi: 10.1073/pnas.2023856118

Non-Markovian modeling of protein folding

Cihan Ayaz a, Lucas Tepper a, Florian N Brünig a, Julian Kappler b, Jan O Daldrop a, Roland R Netz a,1
PMCID: PMC8346879  PMID: 34326249

Significance

Protein-folding kinetics is often described as Markovian (i.e., memoryless) diffusion in a one-dimensional free energy landscape, governed by an instantaneous friction coefficient that is fitted to reproduce experimental or simulated folding times. For the α-helix forming polypeptide alanine9 and a specific reaction coordinate that consists of the summed native hydrogen-bond lengths, we demonstrate that the friction extracted from molecular dynamics simulations exhibits significant memory with a decay time that is in the nanosecond range and thus, of the same order as the folding and unfolding times. Our non-Markovian modeling not only reproduces the molecular dynamics simulations accurately but also demonstrates that memory friction effects lead to anomalous and drastically accelerated protein kinetics.

Keywords: protein folding, non-Markovian processes, mean first-passage times, generalized Langevin equation, memory effects

Abstract

We extract the folding free energy landscape and the time-dependent friction function, the two ingredients of the generalized Langevin equation (GLE), from explicit-water molecular dynamics (MD) simulations of the α-helix forming polypeptide alanine9 for a one-dimensional reaction coordinate based on the sum of the native H-bond distances. Folding and unfolding times from numerical integration of the GLE agree accurately with MD results, which demonstrates the robustness of our GLE-based non-Markovian model. In contrast, Markovian models do not accurately describe the peptide kinetics and in particular, cannot reproduce the folding and unfolding kinetics simultaneously, even if a spatially dependent friction profile is used. Analysis of the GLE demonstrates that memory effects in the friction significantly speed up peptide folding and unfolding kinetics, as predicted by the Grote–Hynes theory, and are the cause of anomalous diffusion in configuration space. Our methods are applicable to any reaction coordinate and in principle, also to experimental trajectories from single-molecule experiments. Our results demonstrate that a consistent description of protein-folding dynamics must account for memory friction effects.


Biological macromolecular function relies on coupled processes that take place on widely different timescales; this makes the theoretical description of such systems challenging. For proteins, the topic of this paper, folding occurs in the range of microseconds to many minutes or even hours and involves bond vibrations and hydration water motion on subpicosecond times (1–3). In order to enable large-scale simulations as well as meaningful theories, which should concentrate on the essential features of such processes, several methods for the elimination of irrelevant degrees of freedom have been introduced. For the classical dynamics of an interacting many-body system, the rigorous treatment is based on the Liouville equation and employs the projection operator formalism to integrate out all degrees of freedom except one or a few reaction coordinates (4, 5). Instead of $6N$ equations of motion for all positions and momenta of an $N$-particle system, the dynamics is described by a few equations for the observables of interest. This coarse-graining procedure leads from a deterministic Hamiltonian to a stochastic description by the generalized Langevin equation (GLE), which for the case of a one-dimensional coordinate $q(t)$, reads (4–7)

$$m\ddot{q}(t) = -\nabla U[q(t)] - \int_0^t ds\, \Gamma(t-s)\, \dot{q}(s) + F_R(t), \tag{1}$$

where $m$ is the effective mass of the coordinate $q$. The potential of mean force $U(q)$, which for proteins corresponds to the folding free energy landscape, is obtained from the equilibrium probability distribution $\rho(q)$ via $U(q) = -k_BT \ln \rho(q)$, where $k_BT$ is the thermal energy with $k_B$ the Boltzmann constant and $T$ the absolute temperature. The elimination of degrees of freedom introduces non-Markovian effects in terms of the memory function $\Gamma(t)$, which describes time-dependent friction and thereby couples the present dynamics to the past states, and stochastic effects in terms of the random force $F_R(t)$. In equilibrium, the random force $F_R(t)$ is related to $\Gamma(t)$ via the fluctuation–dissipation theorem $\langle F_R(t) F_R(t') \rangle = k_BT\, \Gamma(|t-t'|)$ (7). The derivation of the GLE in Eq. 1 relies on several approximations (8–11). Thus, for a given reaction coordinate that is a nonlinear function of the microscopic coordinates, the validity of Eq. 1 is not guaranteed and needs to be explicitly checked.
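The relation between the equilibrium distribution and the free energy, U(q) = -k_BT ln ρ(q), can be illustrated with a minimal Python sketch that histograms samples of the reaction coordinate. The function name `free_energy_profile`, the bin count, and the default thermal energy (about 2.494 kJ/mol, which assumes a temperature of 300 K) are hypothetical choices for this example and are not taken from the authors' scripts.

```python
import numpy as np

def free_energy_profile(q, n_bins=50, kBT=2.494):
    """Estimate U(q) = -kBT * ln(rho(q)) from samples of a reaction coordinate.

    kBT defaults to about 2.494 kJ/mol, i.e. an ASSUMED temperature of 300 K.
    Returns the centers of the populated bins and U shifted so min(U) = 0.
    """
    rho, edges = np.histogram(q, bins=n_bins, density=True)
    centers = 0.5 * (edges[1:] + edges[:-1])
    populated = rho > 0                      # empty bins would give log(0)
    U = -kBT * np.log(rho[populated])
    return centers[populated], U - U.min()

if __name__ == "__main__":
    # Toy data: a state visited three times as often as another one
    # lies lower in free energy by kBT * ln(3).
    samples = np.array([0.0] * 300 + [1.0] * 100)
    centers, U = free_energy_profile(samples, n_bins=2)
    print(centers, U)
```

Only free energy differences are meaningful here, which is why the profile is shifted to a minimum of zero.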

The folding free energy U(q) can be straightforwardly obtained from simulations; it can also be obtained from single-molecule experiments (12–15). Clearly, there is no guarantee that a given reaction coordinate, which could be an experimental observable such as the distance between two attached fluorophores, is a good reaction coordinate, meaning that it leads to a Markovian description of the folding process. Different reaction coordinates have been proposed for the efficient description of protein-folding simulations (16); schemes to construct reaction coordinates that optimally yield the transition state, which separates unfolded and folded basins of attraction from each other, have been developed (17). As an alternative to continuous reaction coordinates, Markov models describe protein dynamics in terms of a set of metastable states (18, 19), for which full access to the underlying microscopic coordinates is typically needed. These works have in common that descriptions are sought that minimize memory effects, so that stochastic Markovian theory applies. In the opposite direction, various methods were developed to extract the memory function Γ(t) from time series data for a given reaction coordinate (9, 20–25), but the complexities of the GLE, in particular for a nonlinear protein-folding free energy in combination with a numerically determined memory function, prevented predictions of protein-folding times from the GLE, with the notable exception of dialanine (26). This is why in protein-folding theory, the Markovian Langevin equation (LE), where the memory integral is replaced by an instantaneous friction term, is predominantly used. Such a Markovian theory yields many useful insights into protein-folding dynamics and culminated in the comparison of transition-path times and mean folding times (27, 28). However, the success of free energy folding theory on the Markovian level relies partly on the fact that the friction, which determines the prefactor of the Kramers folding time, is normally used as a fitting parameter. Even when the friction is allowed to vary with the reaction coordinate and is extracted from simulations, it is typically computed from folding or reconfiguration times, which by construction, leads to self-consistent predictions of the kinetics (29, 30). In fact, recent experiments revealed significant inconsistencies when comparing directly measured free energy barrier heights with those inferred from transition path and folding times (15), which were suggested to be due to memory effects (31, 32). The same inconsistencies are obtained when the friction of a reaction coordinate is not fitted to folding times but rather, extracted directly from simulation trajectories and used in the framework of Markovian theory, as we demonstrate here.

In our approach, instead of searching for a good reaction coordinate, we employ a standard one-dimensional coordinate that consists of the sum of the separations between native contacts. We use accurate tools for extracting all parameters of the GLE from molecular dynamics (MD) simulations for the helix-forming polypeptide Ala9 in water. The free energy U(q) shows multiple minima separated by low barriers, indicative of the sequential formation of the helix, while the longest decay time of the multiexponential memory function Γ(t) is of the order of the unfolding time. These properties make Ala9 a very sensitive test of kinetic theory. We simulate the resulting GLE by Markovian embedding techniques. By comparison of the MD and GLE results for the mean folding and unfolding times, we demonstrate that the one-dimensional GLE is an accurate and practical tool for the description of protein-folding dynamics. On the other hand, the Markovian version of the overdamped GLE cannot describe the folding and unfolding kinetics of the peptide as long as the friction is not a fitting parameter but rather, taken as extracted from the MD simulations. This stays true even when the friction coefficient is allowed to depend on the reaction coordinate. As predicted by the Grote–Hynes theory, memory typically accelerates barrier crossing, where the acceleration magnitude depends primarily on the ratio of the memory time and the distance between the minimum and the barrier in reaction coordinate space (33–38). This memory-induced speedup of folding and unfolding is found to be accompanied by pronounced anomalous diffusion in reaction coordinate space. Our results are corroborated by a systematic Kramers–Moyal coefficient (KMC) analysis, which shows that higher-order quartic KMCs are nonnegligible and that the linear and quadratic KMCs vanish in the short time limit, as expected in the presence of non-Markovian effects. This implies that the description of protein folding in terms of the Fokker–Planck equation is only valid above a certain timescale that needs to be suitably chosen. We also find that a spurious reaction coordinate–dependent friction profile arises when non-Markovian protein dynamics is described using a Markovian model.

Results and Discussion

MD Simulations and GLE Parameter Extraction.

The effective GLE is constructed from a 10-μs-long MD trajectory for Ala9 in water, which is the simplest polypeptide that forms an α-helix (39) (Methods and SI Appendix, section 1 have details). As a reaction coordinate, we use the summed separations between the H-bond donor nitrogen of residue n and the acceptor oxygen of residue n + 4,

$$q(t) = \frac{1}{3} \sum_{i=2}^{4} \left\lVert \mathbf{r}_i^{N}(t) - \mathbf{r}_{i+4}^{O}(t) \right\rVert, \tag{2}$$

which characterize the α-helical conformation. In the α-helical state, q has a value around 0.3 nm, the mean H-bond length between nitrogen and oxygen. We will also consider the end-to-end distance as an alternative reaction coordinate. The free energies U(q) in Fig. 1A for different simulation lengths demonstrate that the simulation is fully converged after about 6 μs. The free energy displays several metastable states, which are also discernible in the trajectory in Fig. 1B and make this simple polypeptide challenging for theoretical description.

Fig. 1.


(A) The free energy U(q) for the mean hydrogen-bond distance reaction coordinate of Ala9 for different simulation lengths; representative snapshots of the polypeptide backbone in all local minima are shown. The barrier used for the calculation of unfolding and folding times is positioned at qB=0.54 nm. (B) A 200-ns-long segment of the trajectory is shown.

Using a generalization of earlier methods (40), we extract the running integral $G(t) = \int_0^t ds\, \Gamma(s)$ (SI Appendix, section 2 has details), from which the memory function $\Gamma(t)$ is obtained via a numerical derivative and fitted using least-square methods to a multiexponential of the form

$$\Gamma(t) = \sum_{n=1}^{5} \frac{\gamma_n}{\tau_n}\, e^{-t/\tau_n}. \tag{3}$$

The extracted $G(t)$ (gray line) is compared with the corresponding fit (red line) in Fig. 2A; no significant deviations can be discerned. The comparison of the extracted and fitted memory function $\Gamma(t)$ in Fig. 2B reveals oscillations below a picosecond, which are not reproduced by the exponential fit function but also do not play a role for the kinetics, as will be shown below. The fitted memory times $\tau_n$ and friction coefficients $\gamma_n$ are presented in Table 1; the typical reconfiguration time, which can be qualitatively inferred from the trajectory in Fig. 1B, is of the order of the longest decay time $\tau_5 \approx 5$ ns. This means that the reaction coordinate is not particularly good since it exhibits pronounced non-Markovian effects and thus, constitutes a suitable test of our methods.

Fig. 2.


(A) Running integral G(t) over the memory function; Inset shows a lin-log plot. The horizontal dashed line denotes the total friction coefficient γ¯. (B) Memory function Γ(t); Inset includes short times. Gray lines correspond to the numerical data; red lines correspond to the multiexponential fit according to Eq. 3. (C) Mean-square displacement of the reaction coordinate; MD (blue line) and GLE (orange broken line) simulation results agree perfectly and exhibit superdiffusion for times up to 0.1 ps and subdiffusion up to 1 ns. Underdamped (underd.; red line) and overdamped (overd.; green line, underneath the red line) Markovian Langevin simulations agree perfectly with each other but miss the anomalous diffusion.

Table 1.

Fitted memory function parameters from Eq. 3

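n    $\gamma_n$ (u/ps)        $\tau_n$ (ps)
1    $2.2 \times 10^{3}$      0.007
2    $1.2 \times 10^{4}$      4.6
3    $4.2 \times 10^{4}$      40.3
4    $2.4 \times 10^{5}$      399
5    $5.7 \times 10^{4}$      4,970
$\bar{\gamma} = \sum_n \gamma_n$ = $3.5 \times 10^{5}$ u/ps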
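As a rough illustration of Eq. 3, the following Python sketch evaluates the multiexponential memory function and its running integral G(t) in closed form. It reads the entries of Table 1 as powers of ten (for example, 2.2 × 10^3 u/ps for n = 1), which is the reading consistent with the quoted total friction of 3.5 × 10^5 u/ps. The function names are hypothetical and are not taken from the authors' memtools package.

```python
import numpy as np

# Fitted parameters as read from Table 1 (gamma_n in u/ps, tau_n in ps).
GAMMAS = np.array([2.2e3, 1.2e4, 4.2e4, 2.4e5, 5.7e4])
TAUS = np.array([0.007, 4.6, 40.3, 399.0, 4970.0])

def memory_kernel(t, gammas=GAMMAS, taus=TAUS):
    """Gamma(t) = sum_n (gamma_n / tau_n) * exp(-t / tau_n), as in Eq. 3."""
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    return np.sum(gammas / taus * np.exp(-t / taus), axis=1)

def running_integral(t, gammas=GAMMAS, taus=TAUS):
    """G(t) = int_0^t Gamma(s) ds = sum_n gamma_n * (1 - exp(-t / tau_n))."""
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    return np.sum(gammas * (1.0 - np.exp(-t / taus)), axis=1)

if __name__ == "__main__":
    times_ps = np.array([1.0, 1e2, 1e4, 1e6])
    print("G(t) in u/ps:", running_integral(times_ps))
    print("total friction in u/ps:", GAMMAS.sum())
```

For long times G(t) saturates at the sum of the γ_n, which is the plateau marked as the total friction coefficient in Fig. 2A.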

The effective mass follows from the equipartition theorem according to $m = k_BT/\langle \dot{q}^2 \rangle$ and turns out to be independent of $q$ and given by $m = 31.3$ u (SI Appendix, section 3). The motion described by the GLE is expected to become diffusive after the inertial time $\tau_m = m/\bar{\gamma}$, where the total friction coefficient is given by $\bar{\gamma} = \sum_n \gamma_n = 3.5 \times 10^{5}$ u/ps (Table 1). It follows that $\tau_m = 0.1$ fs, even shorter than the MD integration time step; thus, inertial effects are completely negligible. Nevertheless, the acceleration term in Eq. 1 is kept in the GLE simulations, as it stabilizes the numerical integration. In order to estimate the importance of memory effects, the memory times $\tau_n$ are compared with the diffusion timescale $\tau_D = \beta\bar{\gamma}L^2$ (36), which is the characteristic time it takes a free Brownian particle to diffuse over a length $L$ in reaction coordinate space, where $\beta = 1/k_BT$ is the inverse thermal energy. For $L = 0.22$ nm, the distance between the folded minimum at $q = 0.32$ nm and the barrier at $q = 0.54$ nm in Fig. 1, one obtains $\tau_D = 6.8$ ns, which is of the order of the longest memory time $\tau_5$. This places the system in the so-called memory-acceleration regime, where memory effects are relevant and significantly accelerate barrier crossing (36–38).
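The two timescale estimates in this paragraph can be checked with a few lines of Python. The sketch below uses the diffusion time in the form βγ̄L², which is the form that reproduces the quoted 6.8 ns, and it assumes a temperature of 300 K, since the temperature is not stated in this section. Units follow the text (u, nm, ps), for which 1 kJ/mol equals 1 u nm²/ps².

```python
# Unit system: mass in u, length in nm, time in ps, energy in kJ/mol
# (1 kJ/mol = 1 u nm^2 / ps^2).
KB = 0.0083144626        # Boltzmann constant in kJ/(mol K)
T_ASSUMED = 300.0        # K; ASSUMPTION, the temperature is not given here

m = 31.3                 # effective mass in u (from the text)
gamma_bar = 3.5e5        # total friction in u/ps (Table 1)
L = 0.54 - 0.32          # nm, folded minimum to barrier top

kBT = KB * T_ASSUMED

tau_m_fs = m / gamma_bar * 1e3              # inertial time, ps -> fs
tau_D_ns = gamma_bar * L**2 / kBT * 1e-3    # diffusion time, ps -> ns

print(f"tau_m = {tau_m_fs:.2f} fs")
print(f"tau_D = {tau_D_ns:.1f} ns")
```

Under these assumptions the inertial time comes out near 0.1 fs and the diffusion time near 6.8 ns, so the longest memory time of about 5 ns is indeed comparable to the diffusion time.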

Comparison of MD and GLE Simulations.

Numerical integration of the GLE is straightforwardly achieved by Markovian embedding (i.e., by transforming the GLE into a system of linearly coupled LEs) (22) (SI Appendix, section 4).
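To make the idea of Markovian embedding concrete, here is a minimal Python sketch for a multiexponential kernel of the form of Eq. 3. Each exponential is replaced by one overdamped auxiliary variable that is coupled to q by a harmonic spring, which turns the GLE into a set of coupled memoryless Langevin equations. The function name, the simple first-order update, and the zero initial velocity are assumptions made for illustration; this is not the integrator of ref. 22 or of the authors' code.

```python
import numpy as np

def integrate_gle(force, m, gammas, taus, kBT, dt, n_steps, q0=0.0, seed=0):
    """Integrate a GLE with kernel sum_n (gamma_n / tau_n) exp(-t / tau_n).

    Embedding (illustrative): with spring constants k_n = gamma_n / tau_n,
        m dv/dt  = force(q) - sum_n k_n (q - y_n)
        dy_n/dt  = -(y_n - q) / tau_n + sqrt(2 kBT / gamma_n) * xi_n(t)
    where xi_n is unit white noise. Eliminating y_n recovers the memory
    integral and a random force obeying the fluctuation-dissipation theorem.
    """
    rng = np.random.default_rng(seed)
    gammas = np.asarray(gammas, dtype=float)
    taus = np.asarray(taus, dtype=float)
    k = gammas / taus
    q, v = float(q0), 0.0
    # Draw the auxiliary variables from their equilibrium distribution around q.
    y = q + np.sqrt(kBT / k) * rng.standard_normal(k.size)
    noise_amp = np.sqrt(2.0 * kBT * dt / gammas)
    traj = np.empty(n_steps + 1)
    traj[0] = q
    for i in range(n_steps):
        v += dt * (force(q) - np.sum(k * (q - y))) / m   # velocity update
        q += dt * v                                      # position update
        y += -dt * (y - q) / taus + noise_amp * rng.standard_normal(k.size)
        traj[i + 1] = q
    return traj

if __name__ == "__main__":
    # Toy example in reduced units: harmonic potential, one memory time.
    traj = integrate_gle(lambda q: -q, m=1.0, gammas=[1.0], taus=[0.5],
                         kBT=1.0, dt=0.01, n_steps=10000, q0=1.0)
    print("mean of q^2:", np.mean(traj**2))
```

For a production run one would pass the negative gradient of the extracted U(q) as `force` and the Table 1 parameters as `gammas` and `taus`; the time step must then resolve the stiffest spring, which is set by the shortest memory time.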

In Fig. 3A, we show profiles of the mean first-passage time (MFPT) τMFPT(qS,qF) for unfolding (start position qS=qL=0.32nm; solid lines) and folding kinetics (start position qS=qR=0.99 nm; broken lines) as a function of the final position qF. Statistical errors are determined accounting for data correlations (41) (SI Appendix, section 5) and are smaller than the line thickness. MD and GLE simulation results (blue and orange lines, respectively) agree nicely; this demonstrates that GLE-based non-Markovian modeling of protein folding is feasible and accurate. Even first-passage time distributions from GLE and MD simulations agree satisfactorily with each other, as shown in SI Appendix, section 6.
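A simple way to estimate first-passage times from a discretely sampled trajectory is sketched below in Python. The rule used here (start the clock when the trajectory first reaches the start position, stop it when it first reaches the final position, then wait for the next visit to the start position) is one common convention and is an assumption of this example; the precise definition and the error analysis used for Fig. 3A are given in SI Appendix. The function name is hypothetical.

```python
import numpy as np

def first_passage_times(q, dt, q_start, q_final):
    """Collect first-passage times from q_start to q_final along a trajectory.

    Works for both directions: for q_start < q_final (unfolding) the clock
    starts when q <= q_start and stops when q >= q_final; for
    q_start > q_final (folding) the inequalities are mirrored.
    """
    q = np.asarray(q, dtype=float)
    direction = np.sign(q_final - q_start)
    fpts = []
    i_start = None                       # index at which the clock started
    for i, x in enumerate(q):
        if i_start is None:
            if direction * (x - q_start) <= 0:
                i_start = i
        elif direction * (x - q_final) >= 0:
            fpts.append((i - i_start) * dt)
            i_start = None
    return np.array(fpts)

if __name__ == "__main__":
    toy = [0.30, 0.40, 0.60, 0.50, 0.30, 0.35, 0.70]   # positions in nm
    fpts = first_passage_times(toy, dt=1.0, q_start=0.32, q_final=0.54)
    print("first-passage times:", fpts, "mean:", fpts.mean())
```

The MFPT is the mean of the returned array; because successive passage events along one trajectory are correlated, a naive standard error would underestimate the uncertainty.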

Fig. 3.


(A) Comparison of unfolding and folding MFPTs τMFPT(qS,qF) from MD (blue) and GLE (orange) simulations as a function of the final position qF for start positions qS=qL=0.32 nm (solid lines) and qS=qR=0.99 nm (broken lines). The gray curve shows the folding free energy U(q). (B) Dependence of different MFPTs from GLE simulations on the memory time rescaling factor α; the corresponding start and final positions are illustrated in C, Inset. Open and filled circles correspond to open and filled arrows, respectively, in C, Inset. The colored horizontal lines denote corresponding results for the overdamped Markov limit from Eq. 10. (C) Ratios of the MFPTs shown in B. Ratios of reciprocal MFPTs do not depend on α (red, green, and blue lines that connect colored circles); only the ratio of the folding and unfolding times to the barrier top, τMFPT(qR,qB)/τMFPT(qL,qB) (open green and filled red spheres), depends on α.

Beyond reproducing MD results, the GLE is a diagnostic tool that allows us to quantify the importance of memory effects. In order to modulate memory effects in the GLE, we rescale the memory times according to $\tau_n \to \alpha\tau_n$ for $n = 2, 3, 4, 5$ while keeping the memory time $\tau_1$ of the fastest exponential contribution fixed. Since $\tau_1 = 7$ fs is above the simulation time step of 1 fs, this ensures that in the limit $\alpha \to 0$, we obtain a regularized model that, as we will show below, corresponds to the Markovian limit. In Fig. 3B, we show MFPTs between the three positions $q_L = 0.32$ nm, $q_B = 0.54$ nm, and $q_R = 0.99$ nm as a function of the rescaling factor $\alpha$ from GLE simulations. The six different MFPTs are illustrated in Fig. 3C, Inset by filled and open arrows and indicated in Fig. 3B by corresponding filled and open colored spheres. We see that reducing the memory time increases all MFPTs; in other words, memory accelerates barrier crossing (36). As expected, the GLE results approach the overdamped Markov limit, denoted by the horizontal lines in the corresponding color and calculated from the exact expression in Eq. 10 without adjustable parameters, as $\alpha$ tends to zero. Interestingly, for folding to the barrier (open green circles), the MFPT for $\alpha = 1$ and the Markovian limit for $\alpha \to 0$ differ only by a factor of around 2.5. On the other hand, for unfolding to the barrier (filled red circles), the $\alpha \to 0$ and $\alpha = 1$ MFPTs differ by a factor of around nine. This means that even when treating the total friction coefficient $\bar{\gamma}$ as a free parameter, the Markovian overdamped theory (Eq. 10), because it is linear in the friction, can reproduce either the MD folding or unfolding times to the barrier but not both simultaneously. This is not due to inertial effects since the overdamped Markovian theory works perfectly for $\alpha \to 0$, as seen in Fig. 3B. Rather, memory effects influence the times of folding and unfolding to the barrier top differently. This is demonstrated by the plot of MFPT ratios as a function of $\alpha$ in Fig. 3C, where it is seen that the ratio of the folding and unfolding times to the barrier top, $\tau_{\mathrm{MFPT}}(q_R, q_B)/\tau_{\mathrm{MFPT}}(q_L, q_B)$, denoted by open green and filled red spheres, depends sensitively on $\alpha$. In contrast, the ratios of reciprocal MFPTs (i.e., MFPTs with interchanged start and final positions), denoted by red, green, and blue lines with identically colored open and filled circles, do not depend on $\alpha$, which shows that the memory dependence of ratios of MFPTs depends on the precise MFPT definition and by no means indicates a breakdown of detailed balance or the law of mass action. In SI Appendix, section 6, we demonstrate that the memory-induced speedup is even more pronounced for transition-path times compared with folding and unfolding times, in agreement with previous findings (15, 31, 32).

The high accuracy of GLE simulations is furthermore reflected by the good agreement of the mean-square displacement $\langle \Delta q(t)^2 \rangle = \langle (q(t+t') - q(t'))^2 \rangle$ from MD and GLE simulations in Fig. 2C, which exhibits pronounced subdiffusive behavior with an exponent 0.4 for times between 1 ps and 1 ns. Anomalous diffusion is often modeled by fractional theories (31, 42). Fig. 2C shows that it is accurately reproduced by multiexponential memory and that it disappears when memory effects are eliminated, in line with recent theoretical analysis (43). The overall good agreement between MD and GLE simulation results shows that the GLE in the form of Eq. 1 describes the kinetics of Ala9 very accurately. This is not due to our specific choice of reaction coordinate, as demonstrated in SI Appendix, section 7, where we present a similar GLE-based analysis using the Ala9 end-to-end distance as reaction coordinate.
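The mean-square displacement and its local scaling exponent can be estimated from a sampled trajectory along the following lines. This Python sketch averages over all time origins and takes the logarithmic derivative to obtain the exponent (1 for normal diffusion, below 1 for subdiffusion, 2 for ballistic motion). The function names are hypothetical, and the direct loop over lag times is chosen for clarity, not speed.

```python
import numpy as np

def mean_square_displacement(q, lags):
    """MSD(lag) = <(q(t' + lag) - q(t'))^2>, averaged over all origins t'.

    `lags` are positive integers in units of the sampling interval.
    """
    q = np.asarray(q, dtype=float)
    return np.array([np.mean((q[lag:] - q[:-lag]) ** 2) for lag in lags])

def local_exponent(lags, msd):
    """Local scaling exponent d ln(MSD) / d ln(lag)."""
    return np.gradient(np.log(msd), np.log(np.asarray(lags, dtype=float)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    walk = np.cumsum(rng.standard_normal(100_000))   # plain random walk
    lags = np.array([1, 2, 4, 8, 16, 32, 64])
    msd = mean_square_displacement(walk, lags)
    print("local exponents:", local_exponent(lags, msd))
```

Logarithmically spaced lag times, as in the usage example, are a natural choice when the behavior spans several decades in time as in Fig. 2C.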

Reaction Coordinate–Dependent Friction.

We so far demonstrated that the GLE in the form of Eq. 1 reproduces the MD simulation kinetics and that memory effects are significant. We now investigate whether reaction coordinate–dependent friction effects, which are not included in the GLE, are relevant. The Markovian LE that incorporates a friction function γ(q) has been amply used to describe protein-folding dynamics (29, 30, 44). In the underdamped version, it reads

$$m\ddot{q}(t) = -\nabla U(q) - \gamma(q)\, \dot{q}(t) + \sqrt{k_BT\, \gamma(q)}\; \eta(t), \tag{4}$$

which for general U(q), unfortunately is analytically intractable. The overdamped version

$$0 = -\nabla U(q) - \gamma(q)\, \dot{q}(t) - \frac{k_BT}{2\gamma(q)}\, \nabla\gamma(q) + \sqrt{k_BT\, \gamma(q)}\; \eta(t) \tag{5}$$

is much more useful since the MFPTs can be calculated analytically. In these expressions, the random force $\eta(t)$ has vanishing mean, and its correlator is given by $\langle \eta(t)\eta(t') \rangle = 2\delta(t-t')$. For constant friction, the underdamped LE (Eq. 4) can be derived from the GLE (Eq. 1) by a systematic expansion of the integral kernel (SI Appendix, section 8). The overdamped LE (Eq. 5) follows from Eq. 4 by neglecting the inertia term; the term proportional to the gradient $\nabla\gamma(q)$ cancels a spurious drift term and follows by mapping on the Fokker–Planck equation (SI Appendix, section 9) (6). In fact, from the overdamped LE with constant friction, an arbitrary friction profile $\gamma(q)$ can be created by a nonlinear transformation of the reaction coordinate (29, 30) (SI Appendix, section 10); this suggests that spatially dependent friction is related to nonlinearities in the reaction coordinate that are not straightforwardly captured by the projection techniques used to derive the GLE (Eq. 1).

Various methods to extract γ(q) from experimental or simulated trajectories have been proposed; a systematic approach involves the KMCs, which for the overdamped case and for finite lag time Δt, read

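$$D_k(q) = \frac{1}{k!}\, \frac{1}{\Delta t} \left\langle \left[ q(t+\Delta t) - q(t) \right]^k \right\rangle \Big|_{q(t)=q}. \tag{6}$$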

The Fokker–Planck equation for the time-dependent probability distribution $P(q,t)$ in terms of the KMCs follows in the limit $\Delta t \to 0$ as (7)

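$$\frac{\partial P(q,t)}{\partial t} = \sum_{k=1}^{\infty} \left( -\frac{\partial}{\partial q} \right)^{k} D_k(q)\, P(q,t), \tag{7}$$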

and the underdamped case is treated in SI Appendix, section 11. According to Pawula’s theorem, for a Markovian process, all KMCs with $k > 2$ vanish for $\Delta t \to 0$, and Eq. 7 takes the standard form of a second-order partial differential equation (7). For a non-Markovian process [i.e., if the memory function $\Gamma(t)$ in Eq. 1 has a finite range], all KMCs with $k > 1$ vanish for $\Delta t \to 0$, and thus, the stochastic properties of the process cannot be described by a partial differential equation for $P(q,t)$ at all (SI Appendix, section 12).

For the underdamped LE, the relation between the second-order velocity KMC Dvv and the friction profile γUD(q) reads (7)

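$$D_{vv}(q) = \frac{1}{2\Delta t} \left\langle \left( v(t+\Delta t) - v(t) \right)^2 \right\rangle \Big|_{q(t)=q} = \frac{k_BT\, \gamma_{\mathrm{UD}}(q)}{m^2}. \tag{8}$$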

For the overdamped LE, γOD(q) follows from the second-order position KMC Dqq as

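$$D_{qq}(q) = \frac{1}{2\Delta t} \left\langle \left( q(t+\Delta t) - q(t) \right)^2 \right\rangle \Big|_{q(t)=q} = \frac{k_BT}{\gamma_{\mathrm{OD}}(q)} \tag{9}$$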

(SI Appendix, section 9). For the numerical computation of the KMCs, we use kernel-density estimators (45) (SI Appendix, section 13). In Fig. 4A, we show the friction profiles $\gamma_{\mathrm{UD}}(q)$ (circles) and $\gamma_{\mathrm{OD}}(q)$ (lines) computed from the KMCs for different lag times $\Delta t$; a number of points are noteworthy. 1) We find no significant deviations between the friction profiles extracted from MD (solid lines and filled circles) and GLE (broken lines and open circles) trajectories; this reaffirms that the GLE describes the protein dynamics very faithfully. 2) The underdamped and overdamped friction profiles $\gamma_{\mathrm{UD}}(q)$ and $\gamma_{\mathrm{OD}}(q)$ disagree for all lag times $\Delta t$, which very clearly demonstrates an inconsistency in the Markovian description of protein folding. In fact, in the limit $\Delta t \to 0$, both $D_{qq}$ and $D_{vv}$ vanish; thus, $\gamma_{\mathrm{OD}}(q)$ diverges, while $\gamma_{\mathrm{UD}}(q)$ goes to zero (SI Appendix, section 12). 3) While the underdamped friction $\gamma_{\mathrm{UD}}(q)$ never reaches a realistic value close to $\bar{\gamma}$, the overdamped friction $\gamma_{\mathrm{OD}}(q)$ approaches $\bar{\gamma}$ for lag times $\Delta t$ around 1 ns. This shows that lag times of the order of the longest memory time have to be used in order to generate realistic friction values. 4) The friction profiles extracted from the GLE simulations are position dependent, seen most clearly in $\gamma_{\mathrm{OD}}(q)$ for $\Delta t = 1$ ns (purple broken line); this is clearly a spurious effect since the GLE has no position-dependent friction. We conclude that the mapping of a non-Markovian process onto a Markovian LE produces spurious position-dependent friction effects. Presumably, the effective friction of proteins will in general exhibit a dependence on the reaction coordinate, but the extraction of friction profiles would have to account for memory effects in order to avoid spurious effects. The capability of the GLE (Eq. 1) to very accurately reproduce the MD simulation kinetics suggests that for the present case of Ala9, the spatial dependence of friction is negligible.
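The overdamped friction estimate of Eq. 9 can be sketched in Python as follows. Note the simplification: the conditional average at q(t) = q is taken over plain histogram bins here, whereas the paper uses kernel-density estimators (SI Appendix, section 13). The function name and its arguments are hypothetical.

```python
import numpy as np

def overdamped_friction_profile(q, dt, lag, bin_edges, kBT):
    """gamma_OD(q) = kBT / D_qq(q), with D_qq from Eq. 9 at lag time lag * dt.

    Simplification: conditioning on q(t) = q uses histogram bins instead of
    the kernel-density estimators used in the paper. Bins without data, or
    with zero displacement, are returned as NaN.
    """
    q = np.asarray(q, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    start = q[:-lag]                          # positions q(t)
    sq_disp = (q[lag:] - start) ** 2          # (q(t + lag*dt) - q(t))^2
    idx = np.digitize(start, bin_edges) - 1   # bin index of each q(t)
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = idx == b
        if np.any(in_bin):
            d_qq = np.mean(sq_disp[in_bin]) / (2.0 * lag * dt)
            if d_qq > 0:
                gamma[b] = kBT / d_qq
    centers = 0.5 * (bin_edges[1:] + bin_edges[:-1])
    return centers, gamma

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Free Brownian motion in reduced units with kBT = 1 and gamma = 4.
    dt, gamma_true = 0.01, 4.0
    steps = np.sqrt(2.0 * dt / gamma_true) * rng.standard_normal(200_000)
    centers, gamma = overdamped_friction_profile(
        np.cumsum(steps), dt, lag=1, bin_edges=np.linspace(-5, 5, 11), kBT=1.0)
    print(np.round(gamma, 2))
```

For a memoryless process such as the toy example the estimate does not depend on the lag; applied to a trajectory with memory, repeating the call for several values of `lag` would show the lag-time dependence discussed in points 2 and 3 above.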

Fig. 4.


(A) Friction coefficient profiles γ(q) from KMC analysis for different lag times Δt (different colors) for the underdamped (underd.) Langevin model, Eq. 8, from MD (filled circles) and GLE simulations (open circles) and for the overdamped (overd.) Langevin model, Eq. 9, from MD (solid lines) and GLE simulations (broken lines). The gray horizontal line shows the total friction coefficient γ¯ extracted from MD simulations. (B) Friction profiles computed from the MD MFPT profiles in Fig. 3A using Eq. 11. γunf(qF) follows from the unfolding MFPTs for start position qS=0.32 nm, and γfol(qF) follows from folding MFPTs for qS=0.99 nm. The gray horizontal line denotes the friction coefficient γ¯ extracted from MD simulations. The gray curve in the background shows the folding free energy U(q). (C) MFPTs from MD and GLE simulations are compared with overd. Markovian predictions according to Eq. 10 using γunf(qF) and γfol(qF) from B.

An alternative way to determine a friction profile γ(q) in the overdamped limit uses the one-to-one relation between the MFPT profiles in Fig. 3A and γ(q). From the expressions for the folding and unfolding times (Eq. 10), γ(q) follows by inversion according to Eq. 11 (30). In Fig. 4B, we show γunf(qF) and γfol(qF) computed from unfolding and folding MFPTs from MD simulations for start positions qS=qL and qS=qR, respectively. Not surprisingly, the profiles γunf(qF) and γfol(qF) are rather close to γ¯ extracted from the MD simulations, which is shown as a gray horizontal line in Fig. 4B, but differ significantly from each other. This suggests that a single friction profile cannot describe folding and unfolding of Ala9 simultaneously. In fact, the values of γunf(qF) and γfol(qF) go down as qF moves to the respective start positions (i.e., as the folding and unfolding times become shorter). This reflects that memory effects particularly accelerate fast transitions (36–38).

To demonstrate the limitations of the friction profiles in Fig. 4B, we show in Fig. 4C folding and unfolding MFPT profiles that are calculated according to Eq. 10 from γunf(q) (filled circles) and γfol(q) (open circles). By construction, the MFPTs using γunf(q) reproduce the unfolding simulation data, while the MFPTs using γfol(q) reproduce the folding simulation data. In contrast, the MFPTs using γunf(q) fail to reproduce the simulated folding times, and the MFPTs using γfol(q) fail to reproduce the simulated unfolding times, in particular when the folding/unfolding times become smaller than about 10 ns. In contrast, the GLE model (broken lines) reproduces both folding and unfolding MD dynamics (solid lines). This underlines that there is no consistent way of describing the complete folding/unfolding dynamics with a Markovian model.

Conclusions

By extracting the time-dependent friction for the polypeptide Ala9 from explicit-water MD simulations, we demonstrate that the resulting GLE model can be straightforwardly integrated numerically and reproduces the folding and unfolding kinetics of the MD simulations very accurately. Our findings are not restricted to a reaction coordinate based on the summed distances between native H bonds. As we show in SI Appendix, section 7, the same analysis of the Ala9 end-to-end distance leads to similar results. When the memory time in the GLE is decreased while the friction coefficient (i.e., the integral over the memory function) is kept constant, the kinetics of both folding and unfolding events changes significantly. This shows that memory effects are important even for the formation kinetics of a single α-helix.

In contrast, the Markovian LE cannot reproduce the full Ala9 reconfiguration dynamics, even with a fitted friction profile; this follows from the comparison of the folding and unfolding kinetics, which would need to be modeled with different friction profiles in order to reproduce the MD simulation kinetics.

We have mostly used the GLE model as a diagnostic tool to understand and quantify non-Markovian effects; since non-Markovian simulations are rather inexpensive, they can also be used as an efficient tool to simulate the response of proteins to environmental changes (e.g., externally applied forces). In fact, our extraction technique for the memory function can in principle also be applied to trajectories from single-molecule experiments (13–15), which would enable us to perform non-Markovian GLE simulations on experimental systems directly, without the need of atomistic MD simulations. Because of the limited time resolution of typical experimental data, suitable extraction techniques would have to be used (24, 46).

Methods

MD and GLE Simulation Details.

We use the all-atom Amber03 force field (47) with extended simple point-charge (SPC/E) water (48). The cubic simulation box has side lengths of 4.95 nm and contains 4,023 water molecules. The Lennard–Jones interactions are cut off after 1.0 nm. For long-range electrostatic interactions, we use the particle mesh Ewald method (49). The simulation time step is 1 fs, and the total simulation time is 10 μs. All simulations are performed in the NVT ensemble using the Gromacs 2019 MD package (50). Further details are given in SI Appendix, section 1. In the GLE simulations, we used the same time step and simulation time as in the MD simulations. Input files of the MD simulations are available for download at http://dx.doi.org/10.17169/refubium-29935. Our Python scripts for the numerical extraction of the memory kernel, for performing a GLE simulation, and for computing MFPTs can be found on GitHub (https://github.com/lucastepper/memtools).

From MFPTs to Friction Profiles.

The MFPT is defined as the mean time needed to reach the final position qF for the first time when starting from a position qS. For the overdamped LE in Eq. 5, it reads for qS<qF (51),

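$$\tau_{\mathrm{MFPT}}(q_S, q_F) = \beta \int_{q_S}^{q_F} dq'\, e^{\beta U(q')}\, \gamma(q') \int_{q_{\min}}^{q'} dq''\, e^{-\beta U(q'')} \tag{10a}$$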

and for qS>qF,

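$$\tau_{\mathrm{MFPT}}(q_S, q_F) = \beta \int_{q_F}^{q_S} dq'\, e^{\beta U(q')}\, \gamma(q') \int_{q'}^{q_{\max}} dq''\, e^{-\beta U(q'')}. \tag{10b}$$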

Taking the derivative of Eq. 10 w.r.t. qF gives the friction profile γ(qF) as (30)

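$$\gamma_{\mathrm{unf}}(q_F) = k_BT\, \frac{e^{-\beta U(q_F)}}{Z_1}\, \frac{\partial \tau_{\mathrm{MFPT}}}{\partial q_F} \quad \text{for } q_S < q_F, \tag{11a}$$
$$\gamma_{\mathrm{fol}}(q_F) = -k_BT\, \frac{e^{-\beta U(q_F)}}{Z_2}\, \frac{\partial \tau_{\mathrm{MFPT}}}{\partial q_F} \quad \text{for } q_S > q_F, \tag{11b}$$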

where $Z_1 = \int_{q_{\min}}^{q_F} dq\, e^{-\beta U(q)}$ and $Z_2 = \int_{q_F}^{q_{\max}} dq\, e^{-\beta U(q)}$.
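The unfolding branch of these expressions (q_S < q_F) can be evaluated on a grid with simple trapezoidal quadrature, as in the following Python sketch. The first grid point plays the role of the reflecting boundary q_min. The function names are hypothetical, the grid-based quadrature is an illustrative choice, and the folding branch would follow analogously by integrating from q_max downward.

```python
import numpy as np

def cumulative_trapezoid(y, x):
    """Cumulative trapezoidal integral of y(x), starting at zero."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))

def unfolding_mfpt_profile(q, U, gamma, kBT, i_start):
    """MFPT from q[i_start] to every grid point q_F >= q[i_start] (unfolding branch).

    q[0] acts as the reflecting boundary q_min. Returns the MFPT profile
    and the partial partition function Z_1(q) on the same grid.
    """
    beta = 1.0 / kBT
    Z1 = cumulative_trapezoid(np.exp(-beta * U), q)          # inner integral
    outer = cumulative_trapezoid(np.exp(beta * U) * gamma * Z1, q)
    return beta * (outer - outer[i_start]), Z1

def unfolding_friction(q, U, tau, Z1, kBT):
    """Recover gamma(q_F) from an unfolding MFPT profile by differentiation.

    The first grid point is skipped because Z_1 vanishes there; the result
    therefore corresponds to q[1:].
    """
    dtau = np.gradient(tau, q)
    return kBT * np.exp(-U[1:] / kBT) / Z1[1:] * dtau[1:]

if __name__ == "__main__":
    # Flat landscape with constant friction, reduced units:
    # the exact result is tau = beta * gamma * (q_F^2 - q_S^2) / 2.
    q = np.linspace(0.0, 1.0, 1001)
    U = np.zeros_like(q)
    gamma = np.full_like(q, 2.0)
    tau, Z1 = unfolding_mfpt_profile(q, U, gamma, kBT=1.0, i_start=200)
    print("tau(0.2 -> 0.8) =", tau[800])
    print("recovered friction at q = 0.5:", unfolding_friction(q, U, tau, Z1, 1.0)[499])
```

Feeding a simulated MFPT profile into `unfolding_friction` would give the kind of profile labeled γunf in Fig. 4B; numerical differentiation amplifies noise, so in practice the MFPT profile may need smoothing first.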


Acknowledgments

We acknowledge discussions with W. A. Eaton; support by Deutsche Forschungsgemeinschaft Grant CRC 1114 “Scaling Cascades in Complex System,” Project 235221301, Project B03; and support by European Research Council Advanced Grant NoMaMemo Grant 835117. Work was funded in part by the European Research Council under the EU’s Horizon 2020 Program, Grant 740269.

Footnotes

The authors declare no competing interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2023856118/-/DCSupplemental.

Data Availability

Derivations that support the findings of this study are included in SI Appendix. Simulation input files have been deposited in an institutional repository (http://dx.doi.org/10.17169/refubium-29935). Our codes for extracting the memory kernel, running GLE simulations, and computing MFPTs are available on GitHub (https://github.com/lucastepper/memtools).

References

  • 1. Bryngelson J. D., Onuchic J. N., Socci N. D., Wolynes P. G., Funnels, pathways, and the energy landscape of protein folding: A synthesis. Proteins Struct. Funct. Genet. 21, 167–195 (1995).
  • 2. Levy Y., Onuchic J. N., Water mediation in protein folding and molecular recognition. Annu. Rev. Biophys. Biomol. Struct. 35, 389–415 (2006).
  • 3. Dill K. A., MacCallum J. L., The protein-folding problem, 50 years on. Science 338, 1042–1046 (2012).
  • 4. Zwanzig R., Memory effects in irreversible thermodynamics. Phys. Rev. 124, 983–992 (1961).
  • 5. Mori H., Transport, collective motion, and Brownian motion. Prog. Theor. Phys. 33, 423–455 (1965).
  • 6. Van Kampen N. G., Elimination of fast variables. Phys. Rep. 124, 69–160 (1985).
  • 7. Risken H., “Fokker-Planck equation” in The Fokker-Planck Equation: Methods of Solution and Applications, Risken H., Ed. (Springer Series in Synergetics, Springer, Berlin, Germany, 1996), pp. 63–95.
  • 8. Grabert H., Hänggi P., Talkner P., Microdynamics and nonlinear stochastic processes of gross variables. J. Stat. Phys. 22, 537–552 (1980).
  • 9. Lange O. F., Grubmüller H., Collective Langevin dynamics of conformational motions in proteins. J. Chem. Phys. 124, 214903 (2006).
  • 10. Kinjo T., Hyodo S., Equation of motion for coarse-grained simulation based on microscopic description. Phys. Rev. E 75, 051109 (2007).
  • 11. Hijón C., Español P., Vanden-Eijnden E., Delgado-Buscalioni R., Mori–Zwanzig formalism as a practical computational tool. Faraday Discuss. 144, 301–322 (2010).
  • 12. Schuler B., Eaton W. A., Protein folding studied by single-molecule FRET. Curr. Opin. Struct. Biol. 18, 16–26 (2008).
  • 13. Yu H., et al., Energy landscape analysis of native folding of the prion protein yields the diffusion constant, transition path time, and rates. Proc. Natl. Acad. Sci. U.S.A. 109, 14452–14457 (2012).
  • 14. Hinczewski M., Gebhardt J. C. M., Rief M., Thirumalai D., From mechanical folding trajectories to intrinsic energy landscapes of biopolymers. Proc. Natl. Acad. Sci. U.S.A. 110, 4500–4505 (2013).
  • 15. Neupane K., et al., Direct observation of transition paths during the folding of proteins and nucleic acids. Science 352, 239–242 (2016).
  • 16. Hegger R., Stock G., Multidimensional Langevin modeling of biomolecular dynamics. J. Chem. Phys. 130, 034106 (2009).
  • 17. Best R. B., Hummer G., Reaction coordinates and rates from transition paths. Proc. Natl. Acad. Sci. U.S.A. 102, 6732–6737 (2005).
  • 18. Noé F., Horenko I., Schütte C., Smith J. C., Hierarchical analysis of conformational dynamics in biomolecules: Transition networks of metastable states. J. Chem. Phys. 126, 155102 (2007).
  • 19. Chodera J. D., Singhal N., Pande V. S., Dill K. A., Swope W. C., Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics. J. Chem. Phys. 126, 155101 (2007).
  • 20. Straub J. E., Borkovec M., Berne B. J., Calculation of dynamic friction on intramolecular degrees of freedom. J. Phys. Chem. 91, 4995–4998 (1987).
  • 21. Horenko I., Hartmann C., Schütte C., Noé F., Data-based parameter estimation of generalized multidimensional Langevin processes. Phys. Rev. E 76, 016706 (2007).
  • 22. Darve E., Solomon J., Kia A., Computing generalized Langevin equations and generalized Fokker–Planck equations. Proc. Natl. Acad. Sci. U.S.A. 106, 10884–10889 (2009).
  • 23. Gottwald F., Ivanov S. D., Kühn O., Vibrational spectroscopy via the Caldeira-Leggett model with anharmonic system potentials. J. Chem. Phys. 144, 164102 (2016).
  • 24. Jung G., Hanke M., Schmid F., Iterative reconstruction of memory kernels. J. Chem. Theory Comput. 13, 2481–2488 (2017).
  • 25. Daldrop J. O., Kappler J., Brünig F. N., Netz R. R., Butane dihedral angle dynamics in water is dominated by internal friction. Proc. Natl. Acad. Sci. U.S.A. 115, 5169–5174 (2018).
  • 26. Lee H. S., Ahn S. H., Darve E. F., The multi-dimensional generalized Langevin equation for conformational motion of proteins. J. Chem. Phys. 150, 174113 (2019).
  • 27. Chung H. S., McHale K., Louis J. M., Eaton W. A., Single-molecule fluorescence experiments determine protein folding transition path times. Science 335, 981–984 (2012).
  • 28. Chung H. S., Piana-Agostinetti S., Shaw D. E., Eaton W. A., Structural origin of slow diffusion in protein folding. Science 349, 1504–1510 (2015).
  • 29. Best R. B., Hummer G., Coordinate-dependent diffusion in protein folding. Proc. Natl. Acad. Sci. U.S.A. 107, 1088–1093 (2010).
  • 30. Hinczewski M., von Hansen Y., Dzubiella J., Netz R. R., How the diffusivity profile reduces the arbitrariness of protein folding free energies. J. Chem. Phys. 132, 245103 (2010).
  • 31. Satija R., Das A., Makarov D. E., Transition path times reveal memory effects and anomalous diffusion in the dynamics of protein folding. J. Chem. Phys. 147, 152707 (2017).
  • 32. Satija R., Makarov D. E., Generalized Langevin equation as a model for barrier crossing dynamics in biomolecular folding. J. Phys. Chem. B 123, 802–810 (2019).
  • 33. Grote R. F., Hynes J. T., The stable states picture of chemical reactions. II. Rate constants for condensed and gas phase reaction models. J. Chem. Phys. 73, 2715–2732 (1980).
  • 34. Hänggi P., Mojtabai F., Thermally activated escape rate in presence of long-time memory. Phys. Rev. A 26, 1168–1170 (1982).
  • 35. Pollak E., Grabert H., Hänggi P., Theory of activated rate processes for arbitrary frequency dependent friction: Solution of the turnover problem. J. Chem. Phys. 91, 4073–4087 (1989).
  • 36. Kappler J., Daldrop J. O., Brünig F. N., Boehle M. D., Netz R. R., Memory-induced acceleration and slowdown of barrier crossing. J. Chem. Phys. 148, 014903 (2018).
  • 37. Kappler J., Hinrichsen V. B., Netz R. R., Non-Markovian barrier crossing with two-time-scale memory is dominated by the faster memory component. Eur. Phys. J. E 42, 119 (2019).
  • 38. Lavacchi L., Kappler J., Netz R. R., Barrier crossing in the presence of multi-exponential memory functions with unequal friction amplitudes and memory times. Europhys. Lett. 131, 40004 (2020).
  • 39. Jas G. S., Eaton W. A., Hofrichter J., Effect of viscosity on the kinetics of α-helix and β-hairpin formation. J. Phys. Chem. B 105, 261–272 (2001).
  • 40. Kowalik B., et al., Memory-kernel extraction for different molecular solutes in solvents of varying viscosity in confinement. Phys. Rev. E 100, 012126 (2019).
  • 41. Flyvbjerg H., “Error estimates on averages of correlated data” in Advances in Computer Simulation, Kertész J., Kondor I., Eds. (Lecture Notes in Physics, Springer, Berlin, Germany, 1998), pp. 88–103.
  • 42. Metzler R., Jeon J. H., Cherstvy A. G., Barkai E., Anomalous diffusion models and their properties: Non-stationarity, non-ergodicity, and ageing at the centenary of single particle tracking. Phys. Chem. Chem. Phys. 16, 24128–24164 (2014).
  • 43. Mitterwallner B. G., Lavacchi L., Netz R. R., Negative friction memory induces persistent motion. Eur. Phys. J. E 43, 67 (2020).
  • 44. Chahine J., Oliveira R. J., Leite V. B. P., Wang J., Configuration-dependent diffusion can shift the kinetic transition state and barrier height of protein folding. Proc. Natl. Acad. Sci. U.S.A. 104, 14646–14651 (2007).
  • 45. Gorjão L. R., Meirinhos F., kramersmoyal: Kramers–Moyal coefficients for stochastic processes. J. Open Source Softw. 4, 1693 (2019).
  • 46. Mitterwallner B. G., Schreiber C., Daldrop J. O., Rädler, Netz R. R., Non-Markovian data-driven modeling of single-cell motility. Phys. Rev. E 101, 032408 (2020).
  • 47. Duan Y., et al., A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations. J. Comput. Chem. 24, 1999–2012 (2003).
  • 48. Berendsen H. J. C., Grigera J. R., Straatsma T. P., The missing term in effective pair potentials. J. Phys. Chem. 91, 6269–6271 (1987).
  • 49. Darden T., York D., Pedersen L., Particle mesh Ewald: An N log(N) method for Ewald sums in large systems. J. Chem. Phys. 98, 10089–10092 (1993).
  • 50. Abraham M. J., et al., GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1-2, 19–25 (2015).
  • 51. Weiss G. H., “First passage time problems in chemical physics” in Advances in Chemical Physics, I. Prigogine, Ed. (John Wiley & Sons, Ltd, 2007), vol. 13, pp. 1–18.
