Abstract
Memory effects emerge as a fundamental consequence of dimensionality reduction when low-dimensional observables are used to describe the dynamics of complex many-body systems. In the context of molecular dynamics (MD) data analysis, accounting for memory effects using the framework of the generalized Langevin equation (GLE) has proven efficient, accurate, and insightful, particularly when working with high-resolution time series data. However, in experimental systems, high-resolution data are often unavailable, raising the question of how the data resolution affects the estimated GLE parameters. This study demonstrates that direct memory extraction from time series data remains accurate when the discretization time is below the memory time. To obtain memory functions reliably even when the discretization time exceeds the memory time, we introduce a Gaussian Process Optimization (GPO) scheme. This scheme minimizes the deviation between discretized two-point correlation functions of the time series data and of GLE simulations, and it estimates accurate memory kernels as long as the discretization time stays below the longest time scale in the data, typically the barrier crossing time.
Introduction
A fundamental challenge in natural sciences involves the creation of a simplified, yet accurate, representation of complex system dynamics using a low-dimensional coordinate. For instance, in spectroscopy, atomic motions are investigated solely through the polarization induced by an electromagnetic field, resulting in spectra.1 In the case of molecules in fluids, the myriad of interactions with the solvent is often reduced to a one-dimensional diffusion process.2,3 In numerous studies,4−8 the folding of a protein is described by a one-dimensional reaction coordinate. These diverse fields all share the common approach of projecting the complete many-body dynamics of 6N atomic positions and momenta onto a few or even a single reaction coordinate. Starting from the deterministic kinetics of a Hamiltonian system, the projection procedure yields a stochastic description based on the generalized Langevin equation (GLE),9−11 which, in the case of a one-dimensional coordinate x(t) and its corresponding velocity v(t), reads
$$ m\,\dot{v}(t) = -\nabla U(x(t)) - \int_0^t \mathrm{d}s\, \Gamma(t-s)\, v(s) + F_R(t) \quad (1) $$
where m is the effective mass of the coordinate x. The potential of mean force U(x) is directly available from the equilibrium probability distribution ρ(x) via U(x) = −kBT ln ρ(x), where kB is the Boltzmann constant, and T is the absolute temperature. Non-Markovian effects arise as a direct consequence of the dimensionality reduction.12 In the GLE, the memory kernel Γ(t) weights the effect of past velocities on the current acceleration. Stochastic effects, represented by the random force FR(t), are linked to the memory function via the fluctuation–dissipation theorem in equilibrium, ⟨FR(0)FR(t)⟩ = kBTΓ(t). When the relaxation of the environment governing Γ(t) is sufficiently fast, Γ(t) approaches a delta kernel, and the Langevin equation emerges from the GLE. Considerable efforts have been dedicated to identifying suitable reaction coordinates to minimize memory effects and enable a Markovian description of protein folding.4−6,8,12
In recent works, the memory function Γ(t) was extracted from time-series data of proteins of biological relevance, allowing for a non-Markovian description of protein folding kinetics in a nonlinear folding landscape. Memory effects were found to be highly relevant, both in model systems13 and in real proteins.14,15 Multiple methods exist to extract memory functions from MD data. A widely used method is based on Volterra equations, deterministic integro-differential equations that derive from the GLE and allow for the extraction of the memory kernel from time correlation functions.16−19 While Volterra equations offer good accuracy when high-quality time-series data are available, it is unclear whether they remain accurate when the observations of the system are sampled with long discretization times. One recent study used an iterative scheme to approximate the memory kernel by adapting a trial kernel with a heuristic update based on the velocity autocorrelation function.20 Another work parametrized memory kernels by fitting correlation functions to an analytical solution of the GLE.21 In order to include the short and long time scales of the system dynamics, the fit included both the two-point correlation function and its running integral. Both methods share the limitation of not being applicable to a nonlinear potential energy function U(x). A recent paper not suffering from this limitation used a maximum-likelihood model to estimate the GLE parameters that best fit the given MD data.22 In a different work on polymer solutions, star polymers were coarse-grained to single beads interacting via a nonlinear U(x), and a GLE system was set up to mimic the star polymers' kinetics. The simulation parameters of the GLE system were iteratively adjusted using Gaussian Process Optimization (GPO) such that the coarse-grained and MD velocity autocorrelations were most similar.23 The same idea was used to estimate a joint memory kernel over multiple temperatures.24
Here, we consider the effects of temporal discretization, motivated by the fact that data are always discretized. For MD simulations, archived data often contain only the atomic positions at time intervals of hundreds of picoseconds to nanoseconds, as in the case of the data from the Anton supercomputer.25 For experimental data, measurement devices limit the time step of the observations, typically to the microsecond scale.26,27 In a prior publication, discretization effects were examined within the framework of data-driven GLE analysis: the GLE without a potential was solved analytically, and the discretized mean-squared displacement and velocity autocorrelation functions were computed, allowing for the direct fitting of the memory kernel.28 The present work investigates how a GLE with a nonharmonic potential can be parametrized from discretized data by considering a highly nonlinear molecular dynamics test system. The Volterra-based approach is shown to be remarkably resilient to time discretization. Where the Volterra approach ceases to function, we demonstrate that Gaussian Process Optimization is a suitable method to obtain memory kernels from discrete time series data. By matching correlation functions computed from subsampled data, we present a method that deals with discretization effects and extends the GLE analysis to nonlinear data at higher discretizations. The choice of correlation functions involves some flexibility, demonstrating the broad applicability of our approach. For the small alanine homopeptide used as a test system, the Volterra method is suitable for discretization times up to the memory time of about 1 ns. In comparison, the GPO method extends this range to discretization times up to the folding time of 58 ns.
Results and Discussion
We investigate the effect of data discretization starting from a 10-μs-long MD trajectory of alanine nonapeptide (Ala9) in water, which was established as a sensitive test system for non-Markovian effects in our previous work.14 As in our original analysis, the formation of the α-helix in Ala9 is measured by the mean distance between the H-bond acceptor oxygen of residue n and the donor nitrogen of residue n + 4
$$ x(t) = \frac{1}{3} \sum_{n=2}^{4} \left| \mathbf{r}_{\mathrm{N}}^{(n+4)}(t) - \mathbf{r}_{\mathrm{O}}^{(n)}(t) \right| \quad (2) $$
In the α-helical state, x has a value of approximately 0.3 nm, the mean H-bond length between nitrogen and oxygen. The potential of mean force U(x) in Figure 1C displays several metastable states along the folding landscape; Ala9, therefore, is a suitable and nontrivial test system for numerical methods. Figure 1A shows a 450 ns long trajectory. To test how time discretization affects memory extraction, frames of the trajectory are left out to achieve an effective discretization time step Δt. Such a discretized trajectory (orange data points for Δt = 1 ns) is compared in Figure 1B to the time series at full resolution. The potential U(x) is always estimated from a histogram of the entire data set to separate time discretization from effects arising due to the undersampling of the potential (see section I in the Supporting Information).
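The estimation of U(x) from the equilibrium histogram, U(x) = −kBT ln ρ(x), can be sketched in a few lines of Python; the bin count and the kBT value are illustrative choices, not taken from the paper:

```python
import numpy as np

def potential_of_mean_force(x, kBT=2.494, bins=100):
    """Estimate U(x) = -kBT ln rho(x) from a trajectory histogram.

    kBT in kJ/mol (~300 K) and the bin count are illustrative choices.
    """
    rho, edges = np.histogram(x, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = rho > 0                      # avoid log(0) in empty bins
    U = -kBT * np.log(rho[mask])
    U -= U.min()                        # shift so the global minimum is zero
    return centers[mask], U

# usage: x = np.loadtxt("trajectory.dat"); grid, U = potential_of_mean_force(x)
```

Estimating U(x) from the full data set, as done here, separates discretization effects from sampling effects on the potential.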
Figure 1.

I-III Representative snapshots for different values of the mean hydrogen-bond distance reaction coordinate of Ala9, x, defined in eq 2. A Multiple folding and unfolding events occur within a 450 ns trajectory segment. B A single folding event. The orange circles indicate the time series discretized at Δt = 1 ns. C The potential landscape U(x) for Ala9, computed from the trajectory at full resolution. The folded state (I) forms a sharp minimum at x = 0.32 nm. A local minimum is found at x = 0.62 nm (II). The unfolded state forms a broad minimum around x = 1.0 nm (III).
Volterra Equations
To extract memory kernels from time-series data, the GLE in eq 1 is multiplied by v(0) and averaged over time. By using the relation ⟨FR(t)v(0)⟩ = 0,9,10 one obtains the Volterra equation14,19
$$ m\,\dot{C}_{vv}(t) = -C_{\nabla U v}(t) - \int_0^t \mathrm{d}s\, \Gamma(t-s)\, C_{vv}(s) \quad (3) $$
where Cvv(t) is the velocity autocorrelation function and C∇Uv(t) is the correlation between the gradient of the potential and the velocity. By integrating eq 3 from 0 to t, we derive a Volterra equation involving the running integral over the kernel, G(t) = ∫₀ᵗ ds Γ(s), and insert mCvv(0) = C∇Ux(0)14 to obtain
$$ m\,C_{vv}(t) = C_{\nabla U x}(t) - \int_0^t \mathrm{d}s\, G(t-s)\, C_{vv}(s) \quad (4) $$
with C∇Ux(t) being the correlation between the gradient of the potential and the position. Computing the memory kernel directly from eq 3 is possible29,30 but prone to instabilities.17 Extracting G(t) using eq 4 and computing Γ(t) via a numerical derivative improves the numerical stability.17,31 The discretization and solution of eq 4 are discussed in section II of the Supporting Information. A recent study proposed an alternative technique for extracting memory kernels by Taylor expansion of the convolution integral32 (we discuss the potential applicability of this Ansatz to our specific problem in section III in the Supporting Information). We fit Γ(t) extracted from the full-resolution data at Δt = 1 fs using least-squares to a multiexponential of the form
$$ \Gamma(t) = \sum_{i=1}^{5} \frac{\gamma_i}{\tau_i}\, e^{-t/\tau_i} \quad (5) $$
The fitted memory times τi and friction coefficients γi are presented in Table 1. The fitting involves both Γ(t) and G(t), as elaborated in the Methods section, and accurately captures the MD kinetics, similar to our previous work.14
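A minimal sketch of the recursive inversion of eq 4 on a discrete time grid, using the trapezoidal rule for the convolution integral; the paper's exact discretization is given in section II of the Supporting Information, so this is an illustrative variant:

```python
import numpy as np

def extract_G(Cvv, Cgradx, m, dt):
    """Recursive trapezoidal inversion of the Volterra equation (eq 4),
        m*Cvv(t) = Cgradx(t) - int_0^t G(t-s) Cvv(s) ds,
    for the running integral G(t) of the memory kernel.
    Cvv and Cgradx are correlation functions sampled on a grid of spacing dt.
    """
    n_max = len(Cvv)
    G = np.zeros(n_max)                  # G(0) = 0 by definition
    for n in range(1, n_max):
        # trapezoid: int ~ dt * (0.5*G[n]*Cvv[0] + sum_{k=1}^{n-1} G[n-k]*Cvv[k])
        conv = np.dot(G[1:n][::-1], Cvv[1:n]) if n > 1 else 0.0
        G[n] = (Cgradx[n] - m * Cvv[n] - dt * conv) / (0.5 * dt * Cvv[0])
    return G
```

The recursion solves for G(t) one grid point at a time, which is why noise in the input correlations can accumulate; fitting the result to eq 5, as done in the Methods, smooths this out.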
Table 1. Fitted Memory Function Parameters for Δt = 1 fs According to Eq 5.^a

| i | γi [u/ps] | τi [ps] |
|---|---|---|
| 1 | 2.2 · 10³ | 0.007 |
| 2 | 4.4 · 10⁴ | 18 |
| 3 | 2.4 · 10⁵ | 370 |
| 4 | 6.0 · 10⁴ | 4100 |
| 5 | 4.6 · 10³ | 5700 |
| γtot = ∑i γi | 3.5 · 10⁵ | |

^a The characteristic memory time τmem = ∫₀^∞ ds s Γ(s) / ∫₀^∞ ds Γ(s) = 1000 ps.
The fits for Δt > 1 fs are shown in section IV in the Supporting Information.
In order to estimate the impact of the non-Markovian effects on the kinetics, we turn to a heuristic formula for the mean first-passage time τMFP of a particle in a double-well potential in the presence of exponentially decaying memory.13,33,34 Validated by extensive simulations, the heuristic formula accurately describes the non-Markovian effects occurring in the folding of various proteins.15 For a single-exponential memory function, the heuristic formula identifies three different regimes by comparing the memory time τ to the diffusion time scale τD = γtotL²/kBT, which is the time it takes a free Brownian particle to diffuse over a length L in reaction-coordinate space. The first regime is the Markovian limit, where τ ≪ τD and non-Markovian effects are negligible. The second is a non-Markovian regime, τD/100 ≲ τ ≲ 10τD, in which a speed-up of τMFP compared to the Markovian description is observed. The third regime occurs for τ ≳ 10τD, where τMFP is slowed down compared to the Markovian description due to non-Markovian memory effects.
To compute τD, we take L = 0.22 nm, the distance between the folded state at x = 0.32 nm and the barrier at x = 0.54 nm, and the total friction γtot = ∑i γi, and obtain τD = 6.8 ns. The τi values in Table 1 span times from τ1 = 7 fs ≪ τD up to τ5 = 5.7 ns ≈ τD. In a previous work,15 the first moment of the memory kernel, τmem = ∫₀^∞ ds s Γ(s) / ∫₀^∞ ds Γ(s), was proposed as the characteristic time scale of a multiscale memory kernel. For the memory kernel in Table 1, we find τmem = 1 ns, correctly predicting the non-Markovian speed-up of τMFP that a previous study demonstrated for Ala9.14 In this work, we establish τmem as the limit for the discretization time Δt beyond which the Volterra method ceases to produce accurate results.
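Because the kernel moments reduce to sums over the fit parameters for eq 5 (∫₀^∞ Γ ds = ∑γi and ∫₀^∞ s Γ(s) ds = ∑γiτi), the time scales quoted above can be checked directly from Table 1; the kBT value used here is an assumption corresponding to T ≈ 300 K:

```python
import numpy as np

# Fitted kernel parameters from Table 1 (gamma_i in u/ps, tau_i in ps)
gamma = np.array([2.2e3, 4.4e4, 2.4e5, 6.0e4, 4.6e3])
tau = np.array([0.007, 18.0, 370.0, 4100.0, 5700.0])

# For Gamma(t) = sum_i (gamma_i/tau_i) exp(-t/tau_i):
#   int_0^inf Gamma(s) ds   = sum_i gamma_i          (total friction)
#   int_0^inf s Gamma(s) ds = sum_i gamma_i * tau_i  (first moment)
gamma_tot = gamma.sum()
tau_mem = np.sum(gamma * tau) / gamma_tot            # first-moment memory time

kBT = 2.494          # kJ/mol = u nm^2/ps^2, assumed value for T ~ 300 K
L = 0.22             # nm, folded state to barrier
tau_D = gamma_tot * L**2 / kBT                       # diffusion time in ps

print(f"gamma_tot = {gamma_tot:.2e} u/ps")   # ~3.5e5, as in Table 1
print(f"tau_mem   = {tau_mem:.0f} ps")       # ~1 ns
print(f"tau_D     = {tau_D / 1000:.1f} ns")  # ~6.8 ns
```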
In the following, the full-resolution kernel obtained for a time step of Δt = 1 fs serves as a reference for results at higher Δt. Comparing the extracted G(t) with the corresponding fit according to eq 5 (red line) in Figure 2F shows no significant differences in the long-time limit. Figure 2C shows oscillations of the extracted Γ(t) for t < 1 ps, which are discarded by the exponential fit. As we show later, they play no role in the kinetics. For both Γ(t) in Figure 2B and Cvv(t) in Figure 2A, the oscillations disappear for Δt ≥ 0.1 ps, indicating that they are caused by subpicosecond molecular vibrations. Moreover, the value of Γ(t) for t < 1 ps is consistently attenuated as Δt increases, mirroring the trend observed in Cvv(0), as illustrated in the inset of Figure 2A. In contrast, Γ(t) for t > 1 ps in the inset of Figure 2B shows an exponential decay that is well preserved for all Δt < 1 ns. The running integral G(t) in Figure 2D stays mostly unchanged for Δt < 1 ns. This demonstrates that the Volterra extraction scheme is accurate for discretization times below the mean memory time, i.e., for Δt < τmem = 1 ns.
Figure 2.

Memory extraction by inversion of the Volterra eq 4 for different discretization times Δt, using data from MD simulations of Ala9. A Velocity autocorrelation Cvv(t). B Memory kernel Γ(t), obtained by numerical differentiation of G(t). C Multiexponential fit of Γ(t) computed for Δt = 1 fs (gray) compared to the corresponding numerical data (dark red). The fitted parameters are given in Tables 1 and S1. D Running integral G(t) over the memory kernel. E Total friction γtot, computed from the exponential fits of the kernels. The vertical broken gray line indicates τmem = ∫₀^∞ ds s Γ(s) / ∫₀^∞ ds Γ(s) = 1 ns. F Fit of G(t) (gray) computed at Δt = 1 fs compared to the corresponding numerical data (dark red). G Comparison of the mean first-passage times τMFP computed from the MD data (black broken lines) to τMFP obtained from GLE simulations using kernels extracted at different values of Δt (colored lines).
The multiexponential kernel in eq 5 allows for the efficient numerical simulation of the GLE by setting up a Langevin equation in which the reaction coordinate x is coupled harmonically to one overdamped auxiliary variable per exponential component10,35 (see section V in the Supporting Information). Utilizing this simulation technique, Figure 2G compares profiles of the mean first-passage times τMFP originating from both the folded and unfolded states. For Δt ≤ 10 ps, the τMFP values obtained from the GLE simulations (colored lines) closely align with those derived from the MD simulations (black broken lines), demonstrating the precise correspondence between the non-Markovian GLE description and the kinetics observed in the MD simulation. In Figure 2E, we present the long-time limit G(t→∞), representing the total friction coefficient γtot of the system, estimated by summing the individual γi values obtained from the exponential fits. For Δt ≥ 1 ns, we find that G(t) does not show a plateau value in the long-time limit. Consequently, this leads to a notable discrepancy between the τMFP profiles presented in Figure 2G and their MD counterparts for Δt ≥ 1 ns. Combining the information provided in Figure 2D, E, and G, it becomes evident that the extracted profile of G(t), the total friction γtot, and the folding times τMFP all deviate significantly from the MD reference data when the discretization time approaches the memory time τmem. As a result, we conclude that Volterra extraction becomes inadequate when the discretization time exceeds the memory time τmem. In section VI of the Supporting Information, we demonstrate that the failure of the Volterra extraction scheme for large Δt is mostly due to discretization effects in the potential gradient–position correlation function C∇Ux(t).
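The Markovian embedding mentioned above can be sketched as follows; the Euler-type update, the coupling constants k_i = γi/τi, and all example arguments are illustrative assumptions, with the paper's actual scheme given in section V of the Supporting Information:

```python
import numpy as np

def simulate_gle(gamma, tau, m, kBT, grad_U, x0, dt, n_steps, seed=0):
    """Integrate the GLE by Markovian embedding: each exponential kernel
    component Gamma_i(t) = (gamma_i/tau_i) exp(-t/tau_i) is represented by
    one overdamped auxiliary variable y_i coupled harmonically to x with an
    assumed spring constant k_i = gamma_i / tau_i (Euler-type sketch).
    """
    rng = np.random.default_rng(seed)
    k = gamma / tau                       # harmonic coupling constants
    x, v = x0, 0.0
    y = np.full(len(gamma), x0)           # auxiliary variables start at x
    traj = np.empty(n_steps)
    for step in range(n_steps):
        # force on x: potential gradient plus harmonic coupling to the y_i
        F = -grad_U(x) - np.sum(k * (x - y))
        v += dt * F / m
        x += dt * v
        # overdamped auxiliary dynamics with thermal noise of strength
        # sqrt(2 kBT dt / gamma_i), enforcing the fluctuation-dissipation theorem
        noise = rng.normal(size=len(y)) * np.sqrt(2.0 * kBT * dt / gamma)
        y += dt * k * (x - y) / gamma + noise
        traj[step] = x
    return traj
```

Because the auxiliary variables are Gaussian, integrating them out recovers the equilibrium distribution exp(−U(x)/kBT) for x, so the embedding changes the dynamics but not the sampled potential of mean force.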
Gaussian Process Optimization
So far, we have demonstrated that the Volterra equation can be used to extract a consistent memory kernel for a wide range of discretization times up to Δt ≈ τmem. The resulting GLE faithfully captures the underlying kinetics when judged by τMFP for discretizations below the memory time scale τmem but fails when exceeding it. Given that the discretization time may exceed the dominant memory time scale in typical experimental settings, an improved method is clearly desirable. In the following, we describe a scheme that is not based on the Volterra equation and allows the extraction of Γ(t) for Δt that significantly exceeds τmem. For this, we use a matching scheme between the discretized time correlation functions of the MD reference system CMD(nΔt) and of the GLE CGLE(nΔt, θ) via the mean-squared loss
$$ \mathcal{L}(\theta) = \frac{1}{N} \sum_{n=1}^{N} \left[ C_{\mathrm{MD}}(n\Delta t) - C_{\mathrm{GLE}}(n\Delta t, \theta) \right]^2 \quad (6) $$
The type of correlation function will be specified later. The loss is evaluated over N samples, where N is determined based on the decay time of the correlation (see Table S3). In an iterative optimization, the friction and memory time parameters in eq 5, which serve as the GLE parameters θ = (γ1, τ1, ..., γ5, τ5), are updated, and the GLE is integrated using a simulation time step δt chosen small enough that discretization effects in the GLE simulations are negligible. For the sake of comparability, we maintain a constant mass value of m = 31.4 u, derived from the MD data using the equipartition theorem, m = kBT/⟨v²⟩. In fact, the precise value of m has no significant influence on the method's outcome, since it can be accommodated within the kernel. Furthermore, the system's inertial time τm = m/γtot = 0.09 fs is markedly shorter than all other relevant time scales, leading to an overdamped system in which the mass value is irrelevant. To find the best parameter set θ, the choice of the optimizer is crucial. The loss ℒ(θ) defined in eq 6 is inherently noisy due to the stochastic integration of the GLE and possesses, in general, many local minima in a high-dimensional space. Faced with such a task, common gradient-based or simplex methods fail.36,37 Genetic algorithms present a powerful alternative but require many sample evaluations.38−40 Given the computational cost of a converged GLE simulation, we choose Gaussian Process Optimization (GPO)41−43 to minimize ℒ(θ). GPO builds a surrogate model of the true loss that incorporates noise44−46 and allows for a nonlocal search47,48 (see section VII in the Supporting Information). As an active learning technique, it guides the sampling of new parameters, improving optimization efficiency.49−51
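The GPO loop can be sketched with scikit-learn's GaussianProcessRegressor; the kernel choice, the candidate-set maximization of the expected improvement (in place of the paper's L-BFGS-B step), and the sample counts are simplifying assumptions:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

def gpo_minimize(loss, bounds, n_init=20, n_iter=80, seed=0):
    """Minimal Gaussian Process Optimization loop (a sketch; the paper's
    implementation differs in detail). `loss` maps a parameter vector theta
    to a noisy scalar; `bounds` is a sequence of (low, high) pairs.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    dim = len(bounds)
    # initial random design
    X = lo + (hi - lo) * rng.random((n_init, dim))
    y = np.array([loss(th) for th in X])
    # the WhiteKernel term absorbs the stochastic noise of the GLE-based loss
    gp = GaussianProcessRegressor(Matern(nu=2.5) + WhiteKernel(), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)
        # expected improvement, maximized over a random candidate set
        cand = lo + (hi - lo) * rng.random((512, dim))
        mu, sigma = gp.predict(cand, return_std=True)
        best = y.min()
        z = (best - mu) / np.maximum(sigma, 1e-12)
        ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
        theta_new = cand[np.argmax(ei)]
        X = np.vstack([X, theta_new])
        y = np.append(y, loss(theta_new))
    return X[np.argmin(y)], y.min()
```

The surrogate model makes each proposal cheap compared to a converged GLE simulation, which is the motivation for GPO stated above.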
In principle, any correlation function can serve as an optimization target. Figure 2A shows that the velocity autocorrelation function Cvv(t) decays to zero after about 1 ps, while Figure 3B,G shows that Cx̄x̄(t), the autocorrelation of the position x̄(t) = x(t) − ⟨x⟩, decays much more slowly, over about 50 ns. Given this difference in the decay times of the two correlations, we define two losses based on eq 6: ℒv, using Cvv(t), and ℒx, using Cx̄x̄(t), anticipating that the two correlations probe different scales of the dynamics. Furthermore, we define ℒc, a linear combination of ℒv and ℒx, to test whether including both correlations in the loss function improves the quality of the GLE parameters. The weighting parameter α is selected for each Δt to achieve a balanced weighting between the two losses and is tabulated in Table S3 in the Supporting Information. For every GP optimization, 300 different θ values are evaluated via 18-μs-long GLE simulations each. The 10 θ samples with the lowest loss form the basis for the following analysis.

When optimizing the loss function with a discretization of Δt = 2 ps, Figure 3A illustrates that ℒv (blue) accurately replicates the MD reference for Cvv(t), whereas ℒx (orange) exhibits discrepancies. Conversely, in Figure 3B, ℒx perfectly reproduces Cx̄x̄(t), while ℒv struggles to do so. Remarkably, the combined loss function ℒc (green) successfully aligns with both reference correlations simultaneously. To evaluate the quality of the GLE parameters, Figure 3C provides a comparison of the mean first-passage times τMFP between GLE results from the GPO solutions and the MD reference. We calculate τMFP for GPO-based GLE simulations and the MD reference using identical discretizations Δt. Notably, we observe that ℒv fails to align with the MD reference, whereas both ℒx and ℒc exhibit consistency with it. This outcome underscores the insufficiency of Cvv(t) in capturing the slow kinetics of barrier crossing. A comparison of τMFP between ℒx and ℒc reveals a slightly better correspondence to the MD reference for ℒc, signifying that the inclusion of Cvv(t) improves the optimization. Examining the obtained memory kernels in Figure 3D,E, all loss functions yield kernels that largely conform to the exponential fit of the MD reference but exclude the first memory component with a decay time of approximately τ1 ≈ 7 fs. Both ℒx and ℒc correctly identify the plateau of G(t), while ℒv underestimates it, which we identify as the origin of the failure to correctly predict τMFP.

Next, we evaluate the performance of the GPO for discretization times exceeding τmem. In Figure 3F–J, we show the results for Δt = 10 ns, demonstrating that the GPO approach yields similar results for all differently defined loss functions. The discretized Cvv(t), Cx̄x̄(t), and τMFP are in perfect agreement with the MD reference. The kernels agree for all but the shortest times. To confirm that the increased discretization used for the τMFP computation does not introduce any bias into the results, we perform an additional comparison of τMFP computed at the full time resolution of Δt = 2 fs (see Figure S4 in the Supporting Information).

Figure 4 provides a comparison of the performance of the Volterra and GPO approaches across various discretizations. This comparison focuses on the total friction and the folding and unfolding mean first-passage times, as these observables are not included in the GPO optimization process. As shown in the previous section, the applicability of the Volterra method is limited to discretizations below the memory time τmem = 1 ns. Remarkably, the GPO approach surpasses the boundary set by the memory time and estimates folding times with good accuracy for discretizations up to Δt = 40 ns. This limit roughly corresponds to the mean time it takes the system to fold, τMDfold = 58 ns, which is given by the mean first-passage time from the unfolded state at x = 0.98 nm to the folded state at x = 0.32 nm. For the highest discretization time tested, Δt = 240 ns, the GP optimization still finds meaningful folding times, while underestimating the total friction.
Figure 3.

To visualize the Gaussian Process Optimization (GPO), we plot the mean of the observables over the 10 best optimization runs. We compare GPO results using the loss ℒv (blue), based on Cvv(t); ℒx (orange), based on the autocorrelation Cx̄x̄(t) of the position x̄(t) = x(t) − ⟨x⟩; and ℒc (green), a linear combination of ℒv and ℒx. For Δt = 2 ps, we compare the observables A Cvv(t), B Cx̄x̄(t), C τMFP, D Γ̃(t) = Γ(t)/Γ(0), and E G̃(t) = G(t)/G(0) to the MD reference (black broken lines). Equally, for Δt = 10 ns, we show F Cvv(t), G Cx̄x̄(t), H τMFP, I Γ̃(t), and J G̃(t). The kernels in D, E, I, and J, parametrized by eq 5, are plotted as time-continuous functions.
Figure 4.

A The total friction γtot = ∑i γi obtained via the Volterra scheme (orange) is constant for discretizations Δt < 1 ns. For Δt larger than the memory time τmem = 1 ns, it decreases until the extraction fails. Gaussian Process Optimization (GPO, blue) estimates the correct friction for much higher Δt. The horizontal gray line shows γMDtot, the total friction extracted directly from the MD data. B The folding and unfolding mean first-passage times from GLE simulations with kernels extracted at different discretizations, given by the mean time it takes the system to first reach x = 0.98 nm from x = 0.32 nm (unfolding) and the reverse (folding). The MD folding times, τMDfold = 58 ns and τMDunfold = 26 ns, are indicated as horizontal gray lines; τmem = 1 ns and τMDfold are indicated as vertical gray lines. The GPO estimates the correct folding and unfolding times up to Δt ≈ τMDfold, significantly higher than the Volterra scheme.
Conclusions
We investigate the effect that time discretization of the input data has on memory extraction. As a specific example, we consider MD time-series data of the polypeptide Ala9. Computing a memory kernel via inversion of the Volterra eq 4 requires the velocity autocorrelation and the potential gradient–position correlation function. These correlation functions change significantly with increasing time discretization, so that a surrogate kernel is obtained that differs from the full-resolution kernel. Our key finding is that, for discretization times below the characteristic memory time, the Volterra approach yields a kernel that reproduces the kinetics of the MD system. Here, we define the characteristic memory time τmem via the first moment of the memory kernel, taking into account all decay times of the kernel, and find τmem = 1 ns for Ala9. By extracting the memory kernel from MD trajectories at different discretizations, we show that the Volterra approach is able to reproduce the kinetics when the discretization time Δt is below τmem.
To also cover the important regime when Δt > τmem, we introduce a Gaussian Process Optimization (GPO) scheme based on matching discretized time correlation functions of the reference and the GLE system. We test losses based on the velocity and position autocorrelation functions, for which GPO yields memory kernels very similar to the Volterra scheme and is able to reproduce the reaction-coordinate dynamics and the folding times.
We demonstrate the effectiveness of GPO for discretization times up to the folding time of τMDfold = 58 ns, about 50 times higher than the highest discretization for which the Volterra approach is applicable. As elaborated in previous works,13−15 memory can affect the kinetics of protein barrier crossing on time scales far exceeding the memory time, up to the longest time scale of the system. Therefore, the presented GPO approach is expected to extend the applicability of non-Markovian analysis to a wide range of discretized systems not suitable for the Volterra method.
In fact, the GPO analysis is not limited to data from MD simulations but can be used whenever highly discretized experimental data are encountered. The application to data from single-molecule experiments52−54 is a promising avenue for future research.
Methods
The MD simulation data are taken from our previous publication; see ref 14 for details. The MD simulation has a time step of δt = 1 fs, while all GLE simulations use a time step of δt = 2 fs. In the computation of the hb4 coordinate (eq 2), the distances are computed between the oxygens of Ala2, Ala3, and Ala4 and the nitrogens of Ala6, Ala7, and Ala8, where Ala1 is the alanine residue at the N-terminus of Ala9.
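Given arrays of the relevant oxygen and nitrogen positions, the hb4 coordinate of eq 2 reduces to a mean over three O–N distances per frame; the array layout assumed here is hypothetical:

```python
import numpy as np

def hb4_coordinate(pos_O, pos_N):
    """Mean H-bond distance x(t) of eq 2 from position arrays of shape
    (n_frames, 3, 3): the three oxygens (Ala2-Ala4) and the three
    nitrogens (Ala6-Ala8), each as xyz coordinates per frame.
    The (frames, atoms, xyz) layout is an assumption of this sketch.
    """
    # per-frame O-N distances, then the mean over the three pairs
    return np.linalg.norm(pos_N - pos_O, axis=2).mean(axis=1)
```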
All analysis code is written in Python55 or Rust.56 Table S3 shows the weights α for the loss ℒc, which includes both Cvv(t) and Cx̄x̄(t). The memory kernels are fitted using the differential evolution algorithm implemented in the Python package 'scipy'57 by minimizing a mean-squared loss that includes both the kernel and the running integral over the kernel, ℒfit = ℒΓ + αmemℒG, where ℒΓ is the mean-squared loss of the kernel and ℒG is the mean-squared loss of the running integral of the kernel. The resulting kernels and values for αmem are shown in Table S1.
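Such a combined fit can be sketched with scipy's differential_evolution; the bounds, the two-component kernel, and the exact weighting of the combined loss are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from scipy.optimize import differential_evolution

def fit_multiexponential(t, kernel, n_exp=2, alpha_mem=1.0, seed=0):
    """Fit Gamma(t) = sum_i (gamma_i/tau_i) exp(-t/tau_i) to a numerical
    kernel by minimizing a mean-squared loss over the kernel plus a weighted
    mean-squared loss over its running integral G(t) (a sketch; the paper
    uses five components and its own bounds and weights).
    """
    dt = t[1] - t[0]
    # trapezoidal running integral of the numerical kernel
    G_ref = np.concatenate(([0.0], np.cumsum(0.5 * (kernel[1:] + kernel[:-1]) * dt)))

    def model(params):
        gamma, tau = params[:n_exp], params[n_exp:]
        decay = np.exp(-t[None, :] / tau[:, None])
        Gam = np.sum(gamma[:, None] / tau[:, None] * decay, axis=0)
        G = np.sum(gamma[:, None] * (1.0 - decay), axis=0)
        return Gam, G

    def loss(params):
        Gam, G = model(np.asarray(params))
        return np.mean((Gam - kernel)**2) + alpha_mem * np.mean((G - G_ref)**2)

    bounds = [(1e-3, 10.0)] * n_exp + [(1e-2, 10.0)] * n_exp
    res = differential_evolution(loss, bounds, seed=seed, tol=1e-8, maxiter=300)
    return res.x[:n_exp], res.x[n_exp:]
```

Including G(t) in the loss anchors the long-time plateau (the total friction), which a fit of Γ(t) alone tends to miss.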
The GPO is performed using the 'GaussianProcessRegressor' implemented in the Python package 'scikit-learn',58 using 10 optimizer restarts. When computing the loss, the correlation functions are evaluated over a finite number of sample points, N, always beginning with t = 0. The number of sample points N is given in Table S3. To minimize the expected improvement in eq S16 or maximize the standard deviation in eq S17, we use the 'L-BFGS-B' method implemented in 'scipy',57 starting from 200 random samples drawn uniformly over the space of the parameters θ (see Table S2). When performing the analysis of the GPO on the basis of the 10 best runs, the integrations are repeated with a different seed for the random number generator used in the GLE integration, ensuring that the observables are reproduced by different integration runs with the same GLE parameters θ.
Acknowledgments
We gratefully acknowledge support by the Deutsche Forschungsgemeinschaft (DFG) Grant SFB 1449 and the European Research Council (ERC) Advanced Grant NoMaMemo No. 835117.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.3c01289.
Additional derivations, details for numerical implementations, detailed numerical results, and additional figures (PDF)
The authors declare no competing financial interest.
References
- Quaresima V.; Ferrari M. A Mini-Review on Functional Near-Infrared Spectroscopy (fNIRS): Where Do We Stand, and Where Should We Go? Photonics 2019, 6, 87. DOI: 10.3390/photonics6030087.
- Einstein A. Über Die von Der Molekularkinetischen Theorie Der Wärme Geforderte Bewegung von in Ruhenden Flüssigkeiten Suspendierten Teilchen. Ann. Phys. 1905, 322, 549–560. DOI: 10.1002/andp.19053220806.
- Brox Th. A One-Dimensional Diffusion Process in a Wiener Medium. Ann. Probab. 1986, 14, 1206–1218. DOI: 10.1214/aop/1176992363.
- Kitao A.; Go N. Investigating Protein Dynamics in Collective Coordinate Space. Curr. Opin. Struct. Biol. 1999, 9, 164–169. DOI: 10.1016/S0959-440X(99)80023-2.
- Best R. B.; Hummer G. Coordinate-Dependent Diffusion in Protein Folding. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 1088–1093. DOI: 10.1073/pnas.0910390107.
- Ernst M.; Wolf S.; Stock G. Identification and Validation of Reaction Coordinates Describing Protein Functional Motion: Hierarchical Dynamics of T4 Lysozyme. J. Chem. Theory Comput. 2017, 13, 5076–5088. DOI: 10.1021/acs.jctc.7b00571.
- Socci N. D.; Onuchic J. N.; Wolynes P. G. Diffusive Dynamics of the Reaction Coordinate for Protein Folding Funnels. J. Chem. Phys. 1996, 104, 5860–5868. DOI: 10.1063/1.471317.
- Neupane K.; Manuel A. P.; Woodside M. T. Protein Folding Trajectories Can Be Described Quantitatively by One-Dimensional Diffusion over Measured Energy Landscapes. Nat. Phys. 2016, 12, 700–703. DOI: 10.1038/nphys3677.
- Mori H. Transport, Collective Motion, and Brownian Motion. Prog. Theor. Phys. 1965, 33, 423–455. DOI: 10.1143/PTP.33.423.
- Zwanzig R. Nonlinear Generalized Langevin Equations. J. Stat. Phys. 1973, 9, 215–220. DOI: 10.1007/BF01008729.
- Ayaz C.; Scalfi L.; Dalton B. A.; Netz R. R. Generalized Langevin Equation with a Nonlinear Potential of Mean Force and Nonlinear Memory Friction from a Hybrid Projection Scheme. Phys. Rev. E 2022, 105, 054138. DOI: 10.1103/PhysRevE.105.054138.
- Plotkin S. S.; Wolynes P. G. Non-Markovian Configurational Diffusion and Reaction Coordinates for Protein Folding. Phys. Rev. Lett. 1998, 80, 5015–5018. DOI: 10.1103/PhysRevLett.80.5015.
- Kappler J.; Daldrop J. O.; Brünig F. N.; Boehle M. D.; Netz R. R. Memory-Induced Acceleration and Slowdown of Barrier Crossing. J. Chem. Phys. 2018, 148, 014903. DOI: 10.1063/1.4998239.
- Ayaz C.; Tepper L.; Brünig F. N.; Kappler J.; Daldrop J. O.; Netz R. R. Non-Markovian Modeling of Protein Folding. Proc. Natl. Acad. Sci. U. S. A. 2021, 118, e2023856118. DOI: 10.1073/pnas.2023856118.
- Dalton B. A.; Ayaz C.; Kiefer H.; Klimek A.; Tepper L.; Netz R. R. Fast Protein Folding Is Governed by Memory-Dependent Friction. Proc. Natl. Acad. Sci. U. S. A. 2023, 120, e2220068120. DOI: 10.1073/pnas.2220068120.
- Berne B. J.; Harp G. D. Advances in Chemical Physics; John Wiley & Sons, Ltd.: 2007; pp 63–227.
- Lange O. F.; Grubmüller H. Collective Langevin Dynamics of Conformational Motions in Proteins. J. Chem. Phys. 2006, 124, 214903. DOI: 10.1063/1.2199530.
- Deichmann G.; van der Vegt N. F. A. Bottom-up Approach to Represent Dynamic Properties in Coarse-Grained Molecular Simulations. J. Chem. Phys. 2018, 149, 244114. DOI: 10.1063/1.5064369.
- Daldrop J. O.; Kappler J.; Brünig F. N.; Netz R. R. Butane Dihedral Angle Dynamics in Water Is Dominated by Internal Friction. Proc. Natl. Acad. Sci. U. S. A. 2018, 115, 5169–5174. DOI: 10.1073/pnas.1722327115.
- Jung G.; Hanke M.; Schmid F. Iterative Reconstruction of Memory Kernels. J. Chem. Theory Comput. 2017, 13, 2481–2488. DOI: 10.1021/acs.jctc.7b00274.
- Daldrop J. O.; Kowalik B. G.; Netz R. R. External Potential Modifies Friction of Molecular Solutes in Water. Phys. Rev. X 2017, 7, 041065. DOI: 10.1103/PhysRevX.7.041065.
- Vroylandt H.; Goudenège L.; Monmarché P.; Pietrucci F.; Rotenberg B. Likelihood-Based Non-Markovian Models from Molecular Dynamics. Proc. Natl. Acad. Sci. U. S. A. 2022, 119, e2117586119. DOI: 10.1073/pnas.2117586119.
- Wang S.; Ma Z.; Pan W. Data-Driven Coarse-Grained Modeling of Polymers in Solution with Structural and Dynamic Properties Conserved. Soft Matter 2020, 16, 8330–8344. DOI: 10.1039/D0SM01019G.
- Ma Z.; Wang S.; Kim M.; Liu K.; Chen C.-L.; Pan W. Transfer Learning of Memory Kernels for Transferable Coarse-Graining of Polymer Dynamics. Soft Matter 2021, 17, 5864–5877. 10.1039/D1SM00364J. [DOI] [PubMed] [Google Scholar]
- Lindorff-Larsen K.; Piana S.; Dror R. O.; Shaw D. E. How Fast-Folding Proteins Fold. Science 2011, 334, 517–520. 10.1126/science.1208351. [DOI] [PubMed] [Google Scholar]
- Alemany A.; Rey-Serra B.; Frutos S.; Cecconi C.; Ritort F. Mechanical Folding and Unfolding of Protein Barnase at the Single-Molecule Level. Biophys. J. 2016, 110, 63–74. 10.1016/j.bpj.2015.11.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- von Hansen Y.; Mehlich A.; Pelz B.; Rief M.; Netz R. R. Auto- and Cross-Power Spectral Analysis of Dual Trap Optical Tweezer Experiments Using Bayesian Inference. Rev. Sci. Instrum. 2012, 83, 095116 10.1063/1.4753917. [DOI] [PubMed] [Google Scholar]
- Mitterwallner B. G.; Schreiber C.; Daldrop J. O.; Rädler J. O.; Netz R. R. Non-Markovian Data-Driven Modeling of Single-Cell Motility. Phys. Rev. E 2020, 101, 032408 10.1103/PhysRevE.101.032408. [DOI] [PubMed] [Google Scholar]
- Gordon D.; Krishnamurthy V.; Chung S.-H. Generalized Langevin Models of Molecular Dynamics Simulations with Applications to Ion Channels. J. Chem. Phys. 2009, 131, 134102 10.1063/1.3233945. [DOI] [PubMed] [Google Scholar]
- Shin H. K.; Kim C.; Talkner P.; Lee E. K. Brownian Motion from Molecular Dynamics. Chem. Phys. 2010, 375, 316–326. 10.1016/j.chemphys.2010.05.019. [DOI] [Google Scholar]
- Kowalik B.; Daldrop J. O.; Kappler J.; Schulz J. C. F.; Schlaich A.; Netz R. R. Memory-Kernel Extraction for Different Molecular Solutes in Solvents of Varying Viscosity in Confinement. Phys. Rev. E 2019, 100, 012126 10.1103/PhysRevE.100.012126. [DOI] [PubMed] [Google Scholar]
- Cao S.; Qiu Y.; Kalin M. L.; Huang X. Integrative Generalized Master Equation: A Method to Study Long-Timescale Biomolecular Dynamics via the Integrals of Memory Kernels. J. Chem. Phys. 2023, 159, 134106 10.1063/5.0167287. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kappler J.; Hinrichsen V. B.; Netz R. R. Non-Markovian Barrier Crossing with Two-Time-Scale Memory Is Dominated by the Faster Memory Component. Eur. Phys. J. E: Soft Matter Biol. Phys. 2019, 42, 119. 10.1140/epje/i2019-11886-7. [DOI] [PubMed] [Google Scholar]
- Lavacchi L.; Kappler J.; Netz R. R. Barrier Crossing in the Presence of Multi-Exponential Memory Functions with Unequal Friction Amplitudes and Memory Times. EPL 2020, 131, 40004 10.1209/0295-5075/131/40004. [DOI] [Google Scholar]
- Bao J.-D. Numerical Integration of a Non-Markovian Langevin Equation with a Thermal Band-Passing Noise. J. Stat. Phys. 2004, 114, 503–513. 10.1023/B:JOSS.0000003118.62044.b7. [DOI] [Google Scholar]
- Cetin B.; Burdick J.; Barhen J.. Global Descent Replaces Gradient Descent to Avoid Local Minima Problem in Learning with Artificial Neural Networks. IEEE International Conference on Neural Networks; 1993; Vol. 2, pp 836–842.
- Bandler J. Optimization Methods for Computer-Aided Design. IEEE Trans. Microw. Theory Techn. 1969, 17, 533–552. 10.1109/TMTT.1969.1127005. [DOI] [Google Scholar]
- Zeigler B. P.Study of Genetic Direct Search Algorithms for Function Optimization; 1974.
- Fitzpatrick J. M.; Grefenstette J. J. Genetic Algorithms in Noisy Environments. Machine Learning 1988, 3, 101–120. 10.1007/BF00113893. [DOI] [Google Scholar]
- Buche D.; Schraudolph N.; Koumoutsakos P. Accelerating Evolutionary Algorithms with Gaussian Process Fitness Function Models. IEEE T. Syst. Man. Cy. C 2005, 35, 183–194. 10.1109/TSMCC.2004.841917. [DOI] [Google Scholar]
- Williams C. K. I.; Rasmussen C. E.. Gaussian Processes for Machine Learning; MIT Press: 2006. [Google Scholar]
- Gramacy R. B.Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences; Chapman Hall/CRC: Boca Raton, FL, 2020. [Google Scholar]
- Deringer V. L.; Bartók A. P.; Bernstein N.; Wilkins D. M.; Ceriotti M.; Csányi G. Gaussian Process Regression for Materials and Molecules. Chem. Rev. 2021, 121, 10073–10141. 10.1021/acs.chemrev.1c00022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stegle O.; Fallert S. V.; MacKay D. J. C.; Brage S. Gaussian Process Robust Regression for Noisy Heart Rate Data. IEEE Trans. Biomed. Eng. 2008, 55, 2143–2151. 10.1109/TBME.2008.923118. [DOI] [PubMed] [Google Scholar]
- Daemi A.; Alipouri Y.; Huang B. Identification of Robust Gaussian Process Regression with Noisy Input Using EM Algorithm. Chemom. Intell. Lab. Syst. 2019, 191, 1–11. 10.1016/j.chemolab.2019.05.001. [DOI] [Google Scholar]
- Lin M.; Song X.; Qian Q.; Li H.; Sun L.; Zhu S.; Jin R.. Robust Gaussian Process Regression for Real-Time High Precision GPS Signal Enhancement. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2019; pp 2838–2847.
- Kaappa S.; del Río E. G.; Jacobsen K. W. Global Optimization of Atomic Structures with Gradient-Enhanced Gaussian Process Regression. Phys. Rev. B 2021, 103, 174114 10.1103/PhysRevB.103.174114. [DOI] [Google Scholar]
- Nikolaidis P.; Chatzis S. Gaussian Process-Based Bayesian Optimization for Data-Driven Unit Commitment. Int. J. Electr. Power Energy Syst. 2021, 130, 106930 10.1016/j.ijepes.2021.106930. [DOI] [Google Scholar]
- Zhao T.; Zheng Y.; Wu Z. Improving Computational Efficiency of Machine Learning Modeling of Nonlinear Processes Using Sensitivity Analysis and Active Learning. Digital Chemical Engineering 2022, 3, 100027 10.1016/j.dche.2022.100027. [DOI] [Google Scholar]
- Chang J.; Kim J.; Zhang B.-T.; Pitt M. A.; Myung J. I. Data-Driven Experimental Design and Model Development Using Gaussian Process with Active Learning. Cognit. Psychol. 2021, 125, 101360 10.1016/j.cogpsych.2020.101360. [DOI] [PubMed] [Google Scholar]
- Jin S.-S.; Hong J.; Choi H. Gaussian Process-Assisted Active Learning for Autonomous Data Acquisition of Impact Echo. Autom. Constr. 2022, 139, 104269 10.1016/j.autcon.2022.104269. [DOI] [Google Scholar]
- Petrosyan R.; Patra S.; Rezajooei N.; Garen C. R.; Woodside M. T. Unfolded and Intermediate States of PrP Play a Key Role in the Mechanism of Action of an Antiprion Chaperone. Proc. Natl. Acad. Sci. U. S. A. 2021, 118, e2010213118 10.1073/pnas.2010213118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hinczewski M.; Gebhardt J. C. M.; Rief M.; Thirumalai D. From Mechanical Folding Trajectories to Intrinsic Energy Landscapes of Biopolymers. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, 4500–4505. 10.1073/pnas.1214051110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Neupane K.; Foster D. A. N.; Dee D. R.; Yu H.; Wang F.; Woodside M. T. Direct Observation of Transition Paths during the Folding of Proteins and Nucleic Acids. Science 2016, 352, 239–242. 10.1126/science.aad0637. [DOI] [PubMed] [Google Scholar]
- Van Rossum G.; Drake F. L.. Python 3 Reference Manual; CreateSpace: Scotts Valley, CA, 2009. [Google Scholar]
- Matsakis N. D.; Klock II F. S. The rust language. ACM SIGAda Ada Letters 2014, 34, 103–104. 10.1145/2692956.2663188. [DOI] [Google Scholar]
- Virtanen P.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272. 10.1038/s41592-019-0686-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pedregosa F.; Varoquaux G.; Gramfort A.; Michel V.; Thirion B.; Grisel O.; Blondel M.; Prettenhofer P.; Weiss R.; Dubourg V.; et al. Scikit-learn: Machine learning in Python. Journal of machine learning research 2011, 12, 2825–2830. [Google Scholar]