Author manuscript; available in PMC: 2021 Oct 1.
Published in final edited form as: J Comput Phys. 2020 Jun 3;418:109633. doi: 10.1016/j.jcp.2020.109633

Data-driven molecular modeling with the generalized Langevin equation

Francesca Grogan a,*, Huan Lei b,c, Xiantao Li d, Nathan A Baker a,e
PMCID: PMC7494205  NIHMSID: NIHMS1602163  PMID: 32952214

Abstract

The complexity of molecular dynamics simulations necessitates dimension reduction and coarse-graining techniques to enable tractable computation. The generalized Langevin equation (GLE) describes coarse-grained dynamics in reduced dimensions. In spite of playing a crucial role in non-equilibrium dynamics, the memory kernel of the GLE is often ignored because it is difficult to characterize and expensive to solve. To address these issues, we construct a data-driven rational approximation to the GLE. Building upon previous work leveraging the GLE to simulate simple systems, we extend these results to more complex molecules, whose many degrees of freedom and complicated dynamics require approximation methods. We demonstrate the effectiveness of our approximation by testing it against exact methods and comparing observables such as autocorrelation and transition rates.

Keywords: molecular dynamics, generalized Langevin equation, coarse-grained models, dimension reduction, data-driven parametrization

1. Introduction

Molecular dynamics methods simulate atomic trajectories using Newton’s second law of motion. Full atomic-detail molecular dynamics (MD) simulations are often prohibitively expensive due to the complexity and size of the systems under study. Model reduction based on surrogates [1–6] and projection operators [7, 8] is a popular approach for reducing dimension and complexity in a wide range of computational science applications. One such model is the generalized Langevin equation (GLE) [7, 8], which describes the system in terms of collective degrees of freedom and simulates dynamics in terms of coarse-grained collective variables (CVs). The GLE reduces the problem size by only explicitly representing the dynamics of these CVs; the remaining degrees of freedom are described implicitly. GLE-based approaches have been successfully used in a variety of application areas [9–14]. An important component of the GLE is a time-dependent memory kernel that accounts for the implicit degrees of freedom and their impact on the evolution of the explicitly resolved CVs. This memory term plays a crucial role in non-equilibrium dynamics but is often hard to characterize and evaluate, particularly for high-dimensional systems [15–17]. The kernel is sometimes simplified to reduce computational requirements; however, this often renders the model unable to accurately represent system dynamics [18–20].

Ideally, construction of the memory kernel should balance computational cost and accuracy. For theoretical convergence analysis and error bounds for various memory kernel approximations, see [21]. Previous work by the authors [22] introduced a data-driven approach to parameterize the GLE memory kernel via a rational approximation in Laplace space. This modeling ansatz—along with the introduction of appropriate auxiliary variables—transforms the GLE into an extended system driven by white noise, where the second fluctuation-dissipation theorem (FDT) [23] can be satisfied by properly choosing the covariance matrix. Numerical studies on simple systems (a tagged particle in solvent) show that this approach can successfully characterize non-equilibrium dynamics beyond Einstein’s Brownian motion theory and accurately predict observables such as transition rates in a double-well potential. In the data-driven algorithm, modeling accuracy relies on the approximation order of the memory kernel. Data-driven model reduction methods have also been developed by others using a variety of approaches [24–27]. Additionally, Zhu and Venturi have demonstrated the use of polynomial approximations of memory kernels for GLE-like problems [28] as well as a first-principles method for systems with local interactions [29].

In this work, we extend the data-driven parameterization approach [22] to construct a reduced model for the small molecule system of benzyl bromide (BnBr) in explicit water. We recently developed a data-driven approach [5] for uncertainty quantification of the equilibrium properties (e.g., solvation energy) with respect to the non-Gaussian conformation fluctuations using this solvated BnBr system. To quantify the non-equilibrium dynamics, the non-Markovian memory must be accurately constructed. In particular, this system is more complex than the benchmark problems we considered previously [22]: its energy landscape has multiple energetic minima, and both intra- and inter-molecular interactions contribute to the energy-dissipation process. On the other hand, the small size of BnBr allows MD simulations to achieve near-ergodic sampling of its conformational space within a tractable amount of time. The transition rate between the two conformational states can therefore be directly evaluated by MD simulation and compared with the predictions from the reduced model. Recently, similar work has been reported by Lee and co-workers [30], where a reduced model of the molecule alanine dipeptide is constructed by the GLE in terms of two dihedral angles and is then parameterized through time-series expansions. In the current study, we present an alternative approach that constructs the memory kernel in Laplace space based on a modification of our earlier approach; these modifications accommodate the more complex gradient system we study in this work (see Section 2.2 for details). An advantage of our approach is that accuracy can be adaptively tuned by adjusting the order of the memory kernel approximation. We demonstrate the applicability of our GLE method on model reduction for molecules in aqueous environments.

The paper is organized as follows. Section 2 introduces the GLE and presents our methodology for constructing the data-driven reduced-order model. We discuss our simulation setup at the end of Section 2. In Section 3, we present results testing our approximation against exact methods and comparing observables such as autocorrelation functions and transition rates, which show that the exact memory term is well modeled by its data-driven parametrization. We briefly conclude and discuss avenues for future work in Section 4.

2. Methods

We begin in Section 2.1 by introducing the GLE and the CVs used. Section 2.2 discusses two approaches to building a rational approximation to the GLE memory kernel. Section 2.3 presents the representation of the GLE with extended dynamics, with initial and noise conditions detailed in Section 2.4. Finally, we provide simulation setup details in Section 2.5.

2.1. Preliminaries

Before introducing the GLE, we discuss the CVs to be calculated from our BnBr simulations. We perform principal component analysis (PCA) on the BnBr atom positions x(t) : [0,∞) → ℝ^N and velocities ẋ(t) obtained from an MD trajectory, where N is the number of degrees of freedom in the system (usually N = 3n − 6 for n atoms). The covariance matrix C ∈ ℝ^{N×N} is defined as

C = \left\langle \big(x(t) - \langle x(t) \rangle\big)\big(x(t) - \langle x(t) \rangle\big)^T \right\rangle,

where 〈·〉 denotes the ensemble average with respect to the equilibrium distribution of x.

We project the BnBr trajectory onto the principal modes using the eigen-decomposition C = VDV^T to obtain the principal components q(t) : [0,∞) → ℝ^N

q(t) = V^T \big(x(t) - \langle x(t) \rangle\big)

and associated velocities q̇(t). These principal components provide insight into the dynamic behavior of BnBr by highlighting the dominant motions of the molecule. In our study, we use the first principal component as our CV; however, it is possible to generalize our method to multi-dimensional as well as nonlinear CVs.
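As a concrete illustration of this projection, the covariance matrix, its eigen-decomposition, and the principal components can be computed with a few lines of NumPy. The trajectory below is a synthetic stand-in for the sampled BnBr positions, and all function names are ours:

```python
import numpy as np

def principal_components(x):
    """Project a trajectory onto its principal modes.

    x: (T, N) array of coordinates sampled over T frames.
    Returns the PCA eigenvalues (descending) and the projected
    trajectory q, with q[:, 0] the first principal component.
    """
    xc = x - x.mean(axis=0)              # x(t) - <x(t)>
    C = (xc.T @ xc) / len(xc)            # covariance matrix
    D, V = np.linalg.eigh(C)             # eigen-decomposition C = V D V^T
    order = np.argsort(D)[::-1]          # sort modes by variance
    D, V = D[order], V[:, order]
    q = xc @ V                           # q(t) = V^T (x(t) - <x(t)>)
    return D, q

# toy trajectory: one dominant harmonic mode plus small noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 100.0, 5000)
x = np.outer(np.sin(t), [1.0, 0.5, 0.2]) + 0.01 * rng.standard_normal((5000, 3))
D, q = principal_components(x)
```

By construction, the variance of each projected component equals the corresponding eigenvalue, which is a convenient sanity check on the decomposition.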

We next introduce the CV mass matrix to be used in the GLE. Generally, for a nonlinear CV, f(x), the mass matrix M is the diagonal matrix whose elements Mii are given by

M_{ii} = \left( \sum_{j=1}^{n} \frac{1}{\mu_j} \int \left( \frac{\partial f_i}{\partial x_j} \right)^2 \rho(x)\, dx \right)^{-1},

where μi is the mass associated with the i-th atom and ρ is the equilibrium probability density function (PDF). In the case of our linear CV, M can simply be defined using the equipartition theorem:

M \left\langle \dot{q} \dot{q}^T \right\rangle = \beta^{-1} I,

where β = (kBT)−1, kB is the Boltzmann constant, T is the temperature, and I is the N-dimensional identity matrix. For details regarding the equivalence of the two formulæ, see Lange and Grubmüller [31]. We note that the mass matrix may also be approximated as a function of the CV, so that the GLE consists of only coarse-grained terms [30].

Given a mass matrix M, the momentum p(t) : [0,∞) → ℝ^N is defined as

p(t) = M \dot{q}(t)     (1)

and the GLE can be written as

\dot{p} = F(q) - \int_0^t K(t-\tau)\, \dot{q}(\tau)\, d\tau + R(t),     (2)

where F(q) ∈ ℝ^N is the conservative force, K(t) : [0,∞) → ℝ^{N×N} is the time-dependent memory kernel function, and R(t) : [0,∞) → ℝ^N is the random noise, modeled as a stationary Gaussian process with zero mean that satisfies the second FDT:

\left\langle R(t) R(t')^T \right\rangle = \beta^{-1} K(t - t').

2.2. Constructing a rational approximation to the memory kernel

We define the correlation matrices G(t) : [0,∞) → ℝ^{N×N} and H(t) : [0,∞) → ℝ^{N×N} as

G(t) = \left\langle \dot{p}(t)\, q(0)^T \right\rangle - \left\langle F(q(t))\, q(0)^T \right\rangle, \qquad H(t) = \left\langle \dot{q}(t)\, q(0)^T \right\rangle.     (3)

Right-multiplying the GLE (Eq. (2)) by q(0)^T and taking the ensemble average, we obtain

G(t) = -\int_0^t K(t-\tau)\, H(\tau)\, d\tau     (4)

with the assumption 〈R(t)q(0)^T〉 = 0; see [32] for details.

With G(t) and H(t) defined by Eq. (3), we can solve Eq. (4) by transferring this integral equation into frequency space using the Laplace transform [33]:

\hat{G}(\lambda) = \int_0^\infty G(t)\, e^{-t/\lambda}\, dt, \qquad \hat{H}(\lambda) = \int_0^\infty H(t)\, e^{-t/\lambda}\, dt, \qquad \hat{K}(\lambda) = \int_0^\infty K(t)\, e^{-t/\lambda}\, dt,     (5)

such that Eq. (4) becomes

\hat{G}(\lambda) = -\hat{K}(\lambda)\, \hat{H}(\lambda).     (6)

Taking the limit λ → ∞ in Eq. (5) gives

\hat{G}(\infty) = \int_0^\infty G(t)\, dt, \qquad \hat{H}(\infty) = \int_0^\infty H(t)\, dt, \qquad \hat{K}(\infty) = \int_0^\infty K(t)\, dt.     (7)
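The transforms in Eq. (5) can be evaluated by simple quadrature over the sampled correlation functions. The sketch below (our naming; trapezoid-rule quadrature, valid when the correlation has decayed to zero within the sampled window) checks the quadrature against a kernel whose transform is known in closed form:

```python
import numpy as np

def laplace_hat(f, t, lam):
    """Numerically evaluate f_hat(lam) = int_0^inf f(t) exp(-t/lam) dt
    by the trapezoid rule over a sampled function f on the time grid t."""
    y = f * np.exp(-t / lam)
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(t))

# sanity check on a kernel with a known transform:
# f(t) = exp(-a t)  =>  f_hat(lam) = 1 / (a + 1/lam)
a = 2.0
t = np.linspace(0.0, 40.0, 40001)
f = np.exp(-a * t)
fh = laplace_hat(f, t, lam=0.5)
```

The same routine applied entrywise to sampled G(t) and H(t) yields the quantities needed in Eqs. (6)-(7).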

We note that the definitions of G and H differ from our previous work [22], where H was defined as the velocity correlation matrix, i.e., 〈q̇(t)q̇(0)^T〉. However, our previous choice led to numerical instability in the construction of K̂(λ). In particular, the Markovian limit condition requires lim_{λ→∞} Ĝ(λ) = −lim_{λ→∞} K̂(λ)Ĥ(λ). If we choose H(t) = 〈q̇(t)q̇(0)^T〉, we need to evaluate the term lim_{λ→∞} Ĥ(λ) = ∫₀^{+∞} 〈q̇(t)q̇(0)^T〉 dt. For the gradient system considered in the present work, we note that

\lim_{t\to\infty} \left\langle q(t)\, \dot{q}(0)^T \right\rangle = \lim_{t\to\infty} \iint \rho\big(q(t)=q \,\big|\, \dot{q}(0)=v_0\big)\, \rho_0^v(v_0)\, q\, v_0^T\, dq\, dv_0 = \iint \rho_{\mathrm{eq}}^q(q)\, \rho_0^v(v_0)\, q\, v_0^T\, dq\, dv_0 \propto \iint e^{-\beta U(q)}\, e^{-\beta v_0^T M v_0 / 2}\, q\, v_0^T\, dq\, dv_0 = 0,

where ρ(q(t)=q | q̇(0)=v₀) represents the conditional probability density of q(t) given the initial condition q̇(0)=v₀, and we take ρ₀^v(v₀), the probability density function of v₀, to be the equilibrium density. Accordingly, lim_{λ→∞} Ĥ(λ) = 0, which renders the Markovian limit of Eq. (6) ill-conditioned. Using H(t) as defined in Eq. (3) does not result in this ill-conditioning. (We note that for the numerical cases considered in [22], the dynamic equation does not contain the potential term ∇U(q), and we did not encounter this difficulty.)

With Ĝ(λ) and Ĥ(λ) sampled from MD simulations, we construct the memory kernel K̂(λ) in the form

\hat{K}(\lambda) \approx \left( I - \sum_{m=1}^{M} B_m \lambda^m \right)^{-1} \left( \sum_{m=1}^{M} A_m \lambda^m \right),     (8)

where the coefficients A_m, B_m ∈ ℝ^{N×N} are matrices. The highest-order coefficients of an Mth-order expansion can be found through the limit of Eq. (8):

\lim_{\lambda \to \infty} \hat{K}(\lambda) = -B_M^{-1} A_M,     (9)

as K̂(∞) = −Ĝ(∞)Ĥ(∞)^{−1} = −(∫₀^∞ G(t) dt)(∫₀^∞ H(t) dt)^{−1} by taking λ → ∞. Note that K̂(∞) recovers the friction tensor in the Markovian limit; i.e., the Markovian approximation is the zeroth-order GLE approximation

\dot{p} = F(q) - \hat{K}(\infty)\, \dot{q}(t) + R(t),     (10)

where K̂(∞) is the friction tensor and is related to the diffusion tensor D by the Einstein relation K̂(∞) = k_B T D^{−1}.
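The Markovian (zeroth-order) friction tensor can be estimated directly from the sampled correlation matrices, assuming the sign convention G(t) = −∫₀ᵗ K(t−τ)H(τ)dτ of Eq. (4). The sketch below (our naming and toy data) constructs correlations satisfying G(t) = −γH(t) for a constant matrix γ, for which the recovered friction is exactly γ:

```python
import numpy as np

def markovian_friction(G, H, t):
    """Zeroth-order friction from sampled correlations:
    K_hat(inf) = -(int_0^inf G dt) (int_0^inf H dt)^{-1}.
    G, H: (T, N, N) arrays sampled on the time grid t."""
    dt = np.diff(t)[:, None, None]
    Gi = 0.5 * np.sum((G[1:] + G[:-1]) * dt, axis=0)   # trapezoid int G dt
    Hi = 0.5 * np.sum((H[1:] + H[:-1]) * dt, axis=0)   # trapezoid int H dt
    return -Gi @ np.linalg.inv(Hi)

# toy data: G(t) = -gamma H(t) pointwise, so the recovered tensor is gamma
t = np.linspace(0.0, 20.0, 2001)
gamma = np.array([[3.0, 1.0], [0.0, 2.0]])
H = np.exp(-t)[:, None, None] * np.eye(2)          # decaying toy correlation
G = -np.einsum('ij,tjk->tik', gamma, H)
K_inf = markovian_friction(G, H, t)
```

This limit requires Ĥ(∞) to be invertible, which is precisely why the definition of H in Eq. (3) is used instead of the velocity autocorrelation.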

Eq. (9) allows us to solve for either A_M or B_M. To solve for the remaining unknown coefficients, there are two approaches. If high-order derivatives of H(t) and G(t) are available from the data at t = 0, then K̂(λ) can be constructed (semi-analytically) by the first approach described below. Alternatively, as the numerical evaluation of higher-order terms may introduce significant numerical error in the coefficient calculations, K̂(λ) can be constructed numerically by a regression approach using Ĝ(λ) and Ĥ(λ) at interpolation points.

Approach 1. The first approach involves coefficient matching and differentiation. First, we perform a Taylor expansion of K̂(λ):

\hat{K}(\lambda) = \sum_{n=1}^{\infty} \frac{\hat{K}^{(n)}(0)}{n!} \lambda^n.     (11)

Substituting this expression into the left-hand side of Eq. (8) and matching powers of λ, we obtain the formula

\frac{\hat{K}^{(n)}(0)}{n!} = A_n + \sum_{l+m=n} B_l\, \frac{\hat{K}^{(m)}(0)}{m!}.     (12)

We can then determine the terms K̂^{(i)}(0) by differentiating Eq. (6). As an example, we compute the first-order coefficients. In this case, we use Eq. (12) to match coefficients of λ^1:

\hat{K}^{(1)}(0) = A_1,     (13)

where we have used the fact that K̂(0) = 0. To find an expression for the derivative K̂^{(1)}(0), we differentiate Eq. (6) three times and let λ → 0:

\hat{G}^{(3)}(0) = -\left[ \hat{K}(0)\, \hat{H}^{(3)}(0) + 3 \hat{K}^{(1)}(0)\, \hat{H}^{(2)}(0) + 3 \hat{K}^{(2)}(0)\, \hat{H}^{(1)}(0) + \hat{K}^{(3)}(0)\, \hat{H}(0) \right] = -3 \hat{K}^{(1)}(0)\, \hat{H}^{(2)}(0),     (14)

noting that lim_{λ→0} K̂(λ) = lim_{λ→0} Ĥ(λ) = lim_{λ→0} Ĥ^{(1)}(λ) = 0 since q and q̇ are uncorrelated. Integrating Eq. (5) by parts and letting λ → 0 gives [22]

\hat{G}^{(i)}(0) = i!\, G^{(i-1)}(0), \qquad \hat{H}^{(i)}(0) = i!\, H^{(i-1)}(0), \qquad \hat{K}^{(i)}(0) = i!\, K^{(i-1)}(0).     (15)

Combining Eq. (15) with Eq. (14), we arrive at the following expressions for the first-order coefficients:

A_1 = -G^{(2)}(0) \left[ H^{(1)}(0) \right]^{-1},     (16)
B_1 = -A_1\, \hat{K}(\infty)^{-1}.     (17)

Approach 2. The second approach to solve for the unknown coefficients also starts with Eq. (9). However, in this approach, the memory kernel is constructed using regression at discrete values of λ. For an Mth-order approximation, we choose a set of points λ₁, λ₂, …, λ_{2M−1} ∈ (0,∞) and solve for the coefficients such that the approximate memory kernel (Eq. (8)) interpolates the exact memory kernel at the chosen set of points. This results in 2M − 1 nonlinear equations. Together with Eq. (9), these equations comprise a nonlinear system of 2M equations to be solved, which we can express as

\mathcal{F}(\lambda_1, \ldots, \lambda_{2M-1};\, A_1, \ldots, A_M, B_1, \ldots, B_M) = 0.     (18)

Continuing with our first-order example, this approach results in the following ℱ:

\mathcal{F}(\lambda_1; A_1, B_1) = \begin{cases} (I - B_1 \lambda_1)^{-1} (A_1 \lambda_1) - \hat{K}(\lambda_1) \\ B_1 + A_1\, \hat{K}(\infty)^{-1} \end{cases}     (19)

where λ₁ is the user-chosen point and we solve for the unknowns A₁ and B₁. Any nonlinear solver can be employed to solve Eq. (18); we used the default trust-region algorithm available in MATLAB.
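For the scalar first-order case, Eq. (19) can be solved with essentially any root finder. The sketch below substitutes a plain Newton iteration with a finite-difference Jacobian for the trust-region solver used in the paper; the target K̂ values are generated from known coefficients so the recovery can be checked, and all names are ours:

```python
import numpy as np

def residual(x, lam1, K_lam1, K_inf):
    """Scalar first-order version of Eq. (19): both components vanish when
    (A1, B1) interpolate K_hat at lam1 and match the Markovian limit."""
    A1, B1 = x
    return np.array([
        A1 * lam1 / (1.0 - B1 * lam1) - K_lam1,   # interpolation condition
        B1 + A1 / K_inf,                           # Markovian-limit condition
    ])

def solve_first_order(lam1, K_lam1, K_inf, x0=(1.0, -1.0), tol=1e-12):
    """Newton iteration with a finite-difference Jacobian (a stand-in for
    a trust-region solver; adequate for this smooth 2x2 system)."""
    x = np.array(x0, dtype=float)
    for _ in range(100):
        F = residual(x, lam1, K_lam1, K_inf)
        if np.max(np.abs(F)) < tol:
            break
        J = np.empty((2, 2))
        h = 1e-7
        for j in range(2):
            xp = x.copy()
            xp[j] += h
            J[:, j] = (residual(xp, lam1, K_lam1, K_inf) - F) / h
        x = x - np.linalg.solve(J, F)
    return x

# toy target generated from known coefficients A1 = 4, B1 = -2
A1_true, B1_true = 4.0, -2.0
lam1 = 0.5
K_lam1 = A1_true * lam1 / (1.0 - B1_true * lam1)
K_inf = -A1_true / B1_true
A1, B1 = solve_first_order(lam1, K_lam1, K_inf)
```

Recovering the generating coefficients exactly is a useful regression test before applying the solver to noisy MD-derived Ĝ and Ĥ values.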

Our previous work used the first approach since the correlation matrices G and H were defined differently, allowing access to higher-order derivatives at t = 0. In contrast, for the new correlation matrices defined in Eq. (3), this high-order information is no longer available. As such, we use the second approach in the current work. We note that the choice of the interpolation points is somewhat ad hoc: for the present study, we chose points to capture the peak and asymptotic values of K̂(λ). As shown in Figure 3, we see a pronounced peak in K̂(λ), which is indicative of BnBr dynamics fluctuating more prominently at short times (for more on the relationship between K(t) and K̂(λ), see [34]). Thus, we chose points close to λ = 0 so that we could more accurately capture this behavior. This choice of points does affect the quality of the approximation; an example is shown in Section 3.2.1. The optimal choice of the λᵢ requires further investigation.

Figure 3:

Memory kernel in Laplace space from MD simulation versus kernels constructed using data-driven GLE approximations of varying orders. Inset: Close-up of third- and fourth-order approximations capturing the pronounced peak of K̂(λ).

2.3. Representing the GLE with extended dynamics driven by white noise

Once we have determined the coefficients of the rational memory term, we can construct a new approximate GLE system [22]. We illustrate this construction by deriving the extended system in the first-order case. We know from Eq. (8) that the first-order rational approximation for K̂ is given by

\hat{K}(\lambda) \approx (I - B_1 \lambda)^{-1} (A_1 \lambda).

Taking the inverse Laplace transform ℒ^{−1}, we obtain

K(t) = \mathcal{L}^{-1}\{\hat{K}(\lambda)\} \approx A_1 e^{B_1 t}.

Let us define the auxiliary variable d(t) : [0,∞) → ℝ^{MN}, where M is the order of the rational approximation, as

d(t) = -\int_0^t K(t-\tau)\, \dot{q}(\tau)\, d\tau + R(t).

With M = 1 for this derivation, we next define d1(t), the auxiliary variable in the first-order case, as

d_1(t) = -\int_0^t A_1 e^{B_1 (t-\tau)}\, \dot{q}(\tau)\, d\tau + R(t).     (20)

Using the Leibniz integral rule, we can differentiate d1(t):

\dot{d}_1(t) = -A_1 \dot{q}(t) - B_1 \int_0^t A_1 e^{B_1 (t-\tau)}\, \dot{q}(\tau)\, d\tau + \dot{R}(t).

Note that R(t) is assumed to be colored noise, not white noise, and as such is differentiable. As discussed in Section 2.4, R(t) obeys the second FDT. Deferring the details to Section 2.4, the colored noise can be expressed in terms of the initial condition d₁(0) and a simple white-noise term W₁(t):

R(t) = \int_0^t e^{B_1 (t-\tau)}\, W_1(\tau)\, d\tau + e^{B_1 t}\, d_1(0),

which is further discussed in the next section. We remark that B₁ is negative, so the first term of R(t) is well behaved for large t. Using the Leibniz integral rule again, we can write ḋ₁(t):

\dot{d}_1(t) = -A_1 \dot{q}(t) - B_1 \int_0^t A_1 e^{B_1 (t-\tau)}\, \dot{q}(\tau)\, d\tau + W_1(t) + B_1 \int_0^t e^{B_1 (t-\tau)}\, W_1(\tau)\, d\tau + B_1 e^{B_1 t}\, d_1(0).     (21)

Note that

B_1 \int_0^t e^{B_1 (t-\tau)}\, W_1(\tau)\, d\tau + B_1 e^{B_1 t}\, d_1(0) = B_1 R(t), \qquad -B_1 \int_0^t A_1 e^{B_1 (t-\tau)}\, \dot{q}(\tau)\, d\tau = B_1 d_1(t) - B_1 R(t),

and so Eq. (21) can be written as

\dot{d}_1(t) = B_1 d_1(t) - A_1 \dot{q}(t) + W_1(t)

to obtain the first-order approximate GLE system:

\dot{q} = M^{-1} p, \qquad \dot{p} = F(q) + d_1, \qquad \dot{d}_1 = B_1 d_1 - A_1 \dot{q} + W_1.     (22)
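A minimal time-stepping sketch of Eq. (22) for a scalar CV, using a harmonic force as a stand-in for the BnBr mean force and an Euler-Maruyama discretization (the paper does not prescribe a particular integrator); the initial condition and white-noise amplitude follow the scalar form of Eq. (24):

```python
import numpy as np

def simulate_first_order_gle(A1, B1, M, beta, k, steps, dt, seed=0):
    """Euler-Maruyama integration of the first-order extended GLE
    (Eq. (22)) for a scalar CV with harmonic force F(q) = -k q."""
    rng = np.random.default_rng(seed)
    # d1(0) ~ N(0, A1/beta); white-noise variance -2 B1 A1 / (beta dt)
    d1 = np.sqrt(A1 / beta) * rng.standard_normal()
    W = np.sqrt(-2.0 * B1 * A1 / (beta * dt)) * rng.standard_normal(steps)
    q = p = 0.0
    qs = np.empty(steps)
    for i in range(steps):
        qdot = p / M
        q += dt * qdot
        p += dt * (-k * q + d1)                    # p_dot = F(q) + d1
        d1 += dt * (B1 * d1 - A1 * qdot + W[i])    # d1_dot = B1 d1 - A1 qdot + W1
        qs[i] = q
    return qs

qs = simulate_first_order_gle(A1=4.0, B1=-2.0, M=1.0, beta=1.0, k=1.0,
                              steps=200_000, dt=0.01)
# equipartition for the continuous system gives <q^2> = 1/(beta k)
```

Because the noise satisfies the second FDT, the long-time statistics of q should recover the Gibbs distribution of U(q), which provides a simple equilibrium check on the integrator.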

Higher-order approximations are obtained by generalizing the procedure above to obtain

\dot{q} = M^{-1} p, \qquad \dot{p} = F(q) + Z^T d_M, \qquad \dot{d}_M = B d_M - Q Z \dot{q} + W_M,     (23)

where the symmetric positive definite matrix Q ∈ ℝ^{MN×MN} and the matrix Z ∈ ℝ^{MN×N} are determined by matching Eq. (8) with the equation K(t) = Z^T e^{Bt} Q Z. Q is the covariance matrix of the auxiliary vector d_M at equilibrium. The matrix B depends on the order; e.g., a fourth-order approximation has the form

B = \begin{pmatrix} 0 & 0 & 0 & B_4 \\ I & 0 & 0 & B_3 \\ 0 & I & 0 & B_2 \\ 0 & 0 & I & B_1 \end{pmatrix}.

2.4. Initial and noise conditions to satisfy the second FDT

Recall that R(t) in Eq. (2) simulates system noise as a colored noise that must satisfy the second FDT. Through our extended GLE system, we can replace R(t) with a simpler white noise term W(t) and choose the initial and noise conditions for W(t) and d(t) to ensure that the colored noise generated by these extended dynamics also satisfies the second FDT [22]. For the first-order approximation, the initial and noise conditions are

\left\langle d_1(0)\, d_1(0)^T \right\rangle = \beta^{-1} A_1, \qquad \left\langle W_1(t)\, W_1(t')^T \right\rangle = -\beta^{-1} \left( B_1 A_1 + A_1 B_1^T \right) \delta(t - t'),     (24)

and for higher-order approximations, the initial and noise conditions are

\left\langle d_M(0)\, d_M(0)^T \right\rangle = \beta^{-1} Q, \qquad \left\langle W_M(t)\, W_M(t')^T \right\rangle = -\beta^{-1} \left( B Q + Q B^T \right) \delta(t - t').     (25)

For other work on GLEs for systems driven by white noise, see [35] and [36].
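The conditions in Eq. (25) can be realized in practice by Cholesky factorization of the two covariance matrices. The sketch below (our naming, with a toy stable B and symmetric positive definite Q) draws the initial auxiliary variable and one discretized white-noise increment, assuming −(BQ + QB^T) is positive definite:

```python
import numpy as np

def fdt_noise_terms(B, Q, beta, dt, rng):
    """Draw d_M(0) ~ N(0, Q/beta) and one white-noise increment with
    covariance -(B Q + Q B^T)/(beta dt), per Eq. (25)."""
    n = Q.shape[0]
    L_d = np.linalg.cholesky(Q / beta)
    d0 = L_d @ rng.standard_normal(n)
    Sigma = -(B @ Q + Q @ B.T) / beta           # assumed positive definite
    L_w = np.linalg.cholesky(Sigma / dt)
    W = L_w @ rng.standard_normal(n)
    return d0, W

rng = np.random.default_rng(1)
B = np.array([[-2.0, 0.5], [0.0, -3.0]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
# the empirical covariance of many d_M(0) draws should approach Q/beta
draws = np.array([fdt_noise_terms(B, Q, beta=2.0, dt=0.01, rng=rng)[0]
                  for _ in range(10_000)])
cov = draws.T @ draws / len(draws)
```

Checking the empirical covariance of the draws against the target β⁻¹Q is a quick way to validate that the generated extended dynamics will satisfy the second FDT.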

The extended GLE system approximations eliminate costly integration of the exact memory term, which depends on the system history, by replacing it with an extended system of stochastic differential equations. The accuracy of this approximation improves with increasing order, which involves reformulating the matrix B and recalculating the matrices Q and Z. These computations are relatively simple to perform for the low-order approximations needed to model small molecular systems (see Section 3 of this manuscript). Overall, our method provides substantial dimension reduction and a significant increase in computational tractability.

2.5. Simulation setup

Simulations were run using GROMACS [37] and the general AMBER force field [38]. We performed 360 simulations of a single BnBr molecule (Figure 1), which comprises 15 atoms, in a solvent consisting of 1011 water molecules in a (3.14216 nm)³ domain. We used a constant number-volume-temperature (NVT) ensemble with a Nosé-Hoover thermostat [39] at 300 K. Each simulation ran for 10 ns with a time step of 2 fs. The particle-mesh Ewald method [40] was used for long-range electrostatics. All BnBr bond lengths were constrained using the LINCS algorithm [41]; we note that this significantly reduces the dimension of the molecular conformational space. The BnBr positions were stored at every time step.

Figure 1:

The benzyl bromide (BnBr) molecule with carbon atoms shown in gray, hydrogen in white, and bromine in red.

As a post-processing step, translational and rotational degrees of freedom were removed from the trajectory using the GROMACS tool trjconv. Averaging was then done over the 360 total trajectories. We performed PCA on these trajectories and checked for convergence by splitting the post-processed trajectories into equal halves and calculating the PDFs of each half for the first few principal components. The PDFs of the two halves matched well, indicating that the sampling had converged. As the first principal component accounted for 63% of the observed variance, we found it sufficient to use as our single CV for the purposes of illustration in this paper. To physically interpret and visualize the results of PCA, we can generate a porcupine plot showing the motion along an eigenvector. The porcupine plot of the first eigenvector, Figure 2, shows the dominant motions of BnBr, with the direction and length of each “quill” showing the direction and magnitude of motion, respectively. In particular, we can see that the bromomethyl group contributes prominently to the motion of the first mode.

Figure 2:

Porcupine plot of first eigenvector showing dominant motion of benzyl bromide (BnBr).

From this, we constructed the correlation matrices and solved for the unknown rational coefficients for zeroth- to fourth-order approximations as described above. Note that a zeroth-order approximation is simply a Markovian approximation, with the integral term in Eq. (2) simplifying to K̂(∞)q̇(t); see Ma et al. [42] for details.

3. Results

The following section presents results testing our approximation against exact methods. In particular, we assess comparisons using the memory kernel, position autocorrelation, velocity autocorrelation, and mean first-passage time.

3.1. Memory kernel

The PDF ρ(q) of the CV defined in Section 2.1 can be calculated using kernel density estimation on samples from the MD trajectory. This PDF can be used to calculate the free energy U(q)

U(q) = -\beta^{-1} \ln(\rho(q)),     (26)

which, in turn, can be used to calculate the mean force

F(q) = -\nabla U(q).     (27)

With F(q), we are able to sample G(t) and H(t) and construct the Laplace transform of the memory term K̂(λ) based on the numerical approach introduced in Section 2.2.
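The free-energy and mean-force construction of Eqs. (26)-(27) can be sketched with a hand-rolled Gaussian kernel density estimate and finite differences. The samples below are a synthetic stand-in for the CV trajectory, and the bandwidth choice and function names are ours:

```python
import numpy as np

def free_energy_and_force(samples, grid, beta, bandwidth=0.2):
    """Gaussian KDE of rho(q) on a grid, then U(q) = -ln(rho)/beta
    (Eq. (26)) and F(q) = -dU/dq (Eq. (27)) by finite differences."""
    norm = bandwidth * np.sqrt(2.0 * np.pi)
    rho = np.array([np.mean(np.exp(-0.5 * ((g - samples) / bandwidth) ** 2))
                    for g in grid]) / norm
    U = -np.log(rho) / beta
    F = -np.gradient(U, grid)
    return U, F

rng = np.random.default_rng(2)
samples = rng.standard_normal(50_000)      # toy CV samples, rho ~ N(0, 1)
grid = np.linspace(-2.0, 2.0, 401)
U, F = free_energy_and_force(samples, grid, beta=1.0)
# for a standard normal rho, U(q) = q^2/2 + const and F(q) ~ -q
```

Note that the KDE bandwidth slightly broadens the estimated density, so the recovered force is mildly softened relative to the true mean force; this bias shrinks with more samples and a smaller bandwidth.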

The exact K̂(λ) calculated from our simulations shows a pronounced peak near λ = 0.01. Since this peak is indicative of oscillations in K(t) (i.e., oscillations in the time domain), a good approximation of this peak is important for capturing the system dynamics. As shown in Figure 3, the first- and second-order approximations do not reproduce the peak; however, the third- and fourth-order rational functions have enough interpolation points for an accurate model.

3.2. Autocorrelation functions

The PDF tests the equilibrium properties of the approximation. To test dynamic properties, we computed both the position autocorrelation function (PACF) 〈q(t)q(0)^T〉 and the velocity autocorrelation function (VACF) 〈q̇(t)q̇(0)^T〉 and compared the resulting approximate trajectories to data calculated directly from the original MD simulation. The results are shown in Figure 4. The accuracy of the PACF increases with increasing order of the GLE approximation, with all orders performing better than the zeroth-order Markovian approximation. Likewise, the accuracy of the VACF also increases with increasing order of the GLE approximation. Long-time oscillations occur in the VACF of BnBr; we found that this behavior is due to the following:

  • the strong intramolecular covalent-bond interactions and the dominant motions of BnBr;

  • the number of peaks in the FFT of the VACF, which is of comparable order to the number of peaks in the vibrational spectrum of BnBr.

These oscillations make the VACF particularly challenging to approximate; inaccuracies in these autocorrelations may lead to misinterpretation of the underlying nature of the system dynamics. The third- and fourth-order approximations reproduce both the PACF and VACF fairly well. Recall that we applied LINCS constraints to all BnBr bond lengths, which reduced the amount of noise in the VACF of the principal components and likely allowed for easier approximation of the GLE terms.
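For reference, autocorrelation functions such as the PACF and VACF can be estimated efficiently via the Wiener-Khinchin route. The sketch below (our naming) zero-pads the FFT to avoid circular wrap-around and is checked on a pure cosine, whose autocorrelation is known analytically:

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Stationary autocorrelation <x(t) x(t+lag)> for lag = 0..max_lag-1,
    via FFT (Wiener-Khinchin), zero-padded to avoid circular wrap-around."""
    n = len(x)
    xc = x - x.mean()
    s = np.fft.rfft(xc, 2 * n)                   # zero-padded spectrum
    acf = np.fft.irfft(s * np.conj(s))[:max_lag] # lagged sums of products
    return acf / np.arange(n, n - max_lag, -1)   # unbiased 1/(n-lag) norm

dt = 0.01
x = np.cos(2.0 * np.arange(100_000) * dt)        # toy oscillatory signal
acf = autocorrelation(x, max_lag=500)
# for cos(w t), the autocorrelation is ~ 0.5 cos(w * lag * dt)
```

The FFT route costs O(n log n) rather than the O(n · max_lag) of direct summation, which matters for long MD trajectories.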

Figure 4:

PACF (A) and VACF (B) for exact MD data compared to approximate GLE simulations.

3.2.1. Selecting interpolation points

Recall that we construct an order-M rational memory term through regression with user-selected values λ₁, λ₂, …, λ_{2M−1} ∈ (0,∞). Figure 5 compares the VACFs of two third-order approximations constructed using two different sets of interpolation points. We see that the approximation using shorter-time interpolation points more accurately reproduces the VACF. As shown in Figure 3, this increase in accuracy is due to the regression model sufficiently capturing the pronounced peak in K̂(λ), which occurs close to λ = 0.

Figure 5:

Third-order VACF approximation using two different sets of interpolation points λi, i = 1, …, 5.

3.3. Mean first-passage time

Predicting non-equilibrium properties such as the mean first-passage time (MFPT) between states is a challenging test for the GLE approximation, since this statistic from the original BnBr MD simulations was not known or used a priori in the data-driven parametrization of the GLE. From the density ρ, we can calculate the potential of our first principal component, U(q) = −β⁻¹ ln(ρ), which has two wells, as shown in Figure 6A. Denoting the left potential well as state “A” and the right as state “B”, we define the MFPT as the mean time for a particle starting in one state to cross the barrier maximum into the other state. In this example, the maximum occurs at q = 0.075; thus state A is defined as q < 0.075 and state B as q > 0.075. Figure 6B compares the MFPT for all approximation orders with the exact MD trajectory. The Markovian approximation fails to accurately reproduce the MFPT, while the higher-order GLE approximations show significantly better agreement with the MD results.
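A minimal estimator sketch for the MFPT from a scalar CV trajectory: successive crossings of the barrier are timed, and the residence time in each well (from entry to the next crossing) approximates the mean passage time into the other state. The toy trajectory, the barrier location (0.0 instead of the paper's q = 0.075), and all names are ours:

```python
import numpy as np

def mean_first_passage_times(q, q_star, dt):
    """Estimate MFPT(A->B) and MFPT(B->A) from a scalar CV trajectory by
    timing successive crossings of the barrier at q_star."""
    state = q > q_star                          # False: state A, True: state B
    switches = np.flatnonzero(state[1:] != state[:-1]) + 1
    durations = np.diff(switches) * dt          # time between crossings
    # durations alternate between residences in the state entered first
    entered_B_first = bool(state[switches[0]])
    b_times = durations[0::2] if entered_B_first else durations[1::2]
    a_times = durations[1::2] if entered_B_first else durations[0::2]
    return a_times.mean(), b_times.mean()

# toy trajectory alternating between wells with known residence times
dt = 0.1
segs = []
for _ in range(200):
    segs.append(np.full(50, -1.0))              # 5.0 time units in state A
    segs.append(np.full(30, +1.0))              # 3.0 time units in state B
q = np.concatenate(segs)
mfpt_a, mfpt_b = mean_first_passage_times(q, q_star=0.0, dt=dt)
```

On real, noisy trajectories one would typically add a buffer zone around the barrier (or smooth the CV) so that rapid recrossings of the barrier top are not counted as genuine transitions.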

Figure 6:

Mean first-passage time between states in a double-well potential of mean force. A. Double-well potential U(q) calculated from the PDF ρ(q). B. MFPT (ps) from the approximate GLE compared to the exact MD data (red dotted line), shown with a 95% confidence interval.

4. Conclusion

As full MD simulations often require intensive computational resources and long run times to achieve ergodicity, researchers have increasingly relied on reduced-order modeling for simulation. In particular, the GLE has seen a resurgence in popularity, as it provides a convenient description of coarse-grained dynamics. While the exact GLE can significantly reduce problem size and difficulty, its memory kernel depends on the past history of the system and is often hard to characterize and compute. To mitigate this, previous work introduced a data-driven approximation to the GLE. While there is a cost associated with sampling from the exact system, it can be computationally cheaper than running the full MD simulation for very long times. Directly sampling correlation functions of the exact system dynamics, we replaced the memory kernel with a rational approximation and carefully introduced an auxiliary variable and white-noise term to convert the GLE into an extended system that does not depend on the past history. Additionally, accuracy can be adaptively tuned through the chosen order of the rational approximation. The current work extends this data-driven approximation of the GLE [22] to more complex and realistic molecules. Using BnBr as our test case, our comparison of exact MD simulation against the approximation shows that observables are reproduced well using relatively low orders for the rational term.

There are multiple avenues for future work to further develop the modeling capability for complex systems. While we were able to represent BnBr system dynamics with a single CV, accurate construction of the memory term and reduced dynamics remains challenging as system complexity increases, since the condition number of B may become large. Thus, it will be necessary to test for robustness on systems where the CV dimension is greater than one, as is done in [30]. To alleviate such difficulties, we are developing a regularization approach for the GLE approximation by formulating the memory kernel construction as an optimization problem. Furthermore, while we were able to use an unbiased density to compute the force term in this work, computing this term is more difficult in higher-dimensional CV spaces. To ensure adequate sampling of the energy surface, enhanced sampling methods may need to be paired with the data-driven GLE approximation to give robust results.

Acknowledgments

We thank Peiyuan Gao for helpful discussions. This work was performed using resources through Research Computing at Pacific Northwest National Laboratory. FG acknowledges support from the Department of Energy (DOE) Office of Advanced Scientific Computing Research (ASCR) through the ASCR Distinguished Computational Mathematics Postdoc Project under ASCR Project 71268. HL and NAB acknowledge support from NIH grant GM069702.

Acronyms

ASCR

Advanced Scientific Computing Research

BnBr

benzyl bromide

CV

collective variable

DOE

Department of Energy

FDT

fluctuation-dissipation theorem

GLE

generalized Langevin equation

MFPT

mean first-passage time

MD

molecular dynamics

NVT

constant number-volume-temperature

PACF

position autocorrelation function

PCA

principal component analysis

PDF

probability density function

VACF

velocity autocorrelation function

References

  • [1] Gubskaya Anna V., Kholodovych Vladyslav, Knight Doyle, Kohn Joachim, and Welsh William J. Prediction of fibrinogen adsorption for biodegradable polymers: Integration of molecular dynamics and surrogate modeling. Polymer, 48(19):5788–5801, 2007. doi: 10.1016/j.polymer.2007.07.007.
  • [2] Nance James Daniel. Investigating Molecular Dynamics with Sparse Grid Surrogate Models. PhD thesis, North Carolina State University, 2015.
  • [3] Lei H, Yang X, Zheng B, Lin G, and Baker NA. Constructing surrogate models of complex systems with enhanced sparsity: Quantifying the influence of conformational uncertainty in biomolecular solvation. SIAM Multiscale Model. Simul., 13(4):1327–1353, 2015.
  • [4] Razi M, Narayan A, Kirby RM, and Bedrov D. Fast predictive models based on multi-fidelity sampling of properties in molecular dynamics simulations. Computational Materials Science, 152:125–133, 2018. doi: 10.1016/j.commatsci.2018.05.029.
  • [5] Lei H, Li J, Gao P, Stinis P, and Baker NA. A data-driven framework for sparsity-enhanced surrogates with arbitrary mutually dependent randomness. Computer Methods in Applied Mechanics and Engineering, 350:199–227, 2019. ISSN 0045-7825.
  • [6] Zhao Pei, Han Song, Li Xiaoxia, Zhu Tong, Tao Xiaofang, and Guo Li. Comparison of RP-3 pyrolysis reactions between surrogates and 45-component model by ReaxFF molecular dynamics simulations. Energy & Fuels, 33(8):7176–7187, 2019. doi: 10.1021/acs.energyfuels.9b01321.
  • [7] Mori Hazime. Transport, collective motion, and Brownian motion. Progress of Theoretical Physics, 33(3):423–455, 1965. doi: 10.1143/ptp.33.423.
  • [8] Zwanzig Robert. Nonlinear generalized Langevin equations. Journal of Statistical Physics, 9(3):215–220, 1973. doi: 10.1007/bf01008729.
  • [9] Adelman SA. Generalized Langevin equations and many-body problems in chemical dynamics. In Advances in Chemical Physics, pages 143–253. John Wiley & Sons, Inc., 2007. doi: 10.1002/9780470142639.ch2.
  • [10] Turq Pierre, Lantelme Frédéric, and Friedman Harold L. Brownian dynamics: Its application to ionic solutions. The Journal of Chemical Physics, 66(7):3039–3044, 1977. doi: 10.1063/1.434317.
  • [11] Córdoba Andrés, Indei Tsutomu, and Schieber Jay D. Elimination of inertia from a generalized Langevin equation: Applications to microbead rheology modeling and data analysis. Journal of Rheology, 56(1):185–212, 2012. doi: 10.1122/1.3675625.
  • [12] Démery Vincent, Bénichou Olivier, and Jacquin Hugo. Generalized Langevin equations for a driven tracer in dense soft colloids: construction and applications. New Journal of Physics, 16(5):053032, 2014. doi: 10.1088/1367-2630/16/5/053032.
  • [13] Wu Yu-Wen and Yu Hsiu-Yu. Adhesion of a polymer-grafted nanoparticle to cells explored using generalized Langevin dynamics. Soft Matter, 14(48):9910–9922, 2018. doi: 10.1039/c8sm01579a.
  • [14] Ariel Gil and Vanden-Eijnden Eric. Testing transition state theory on Kac-Zwanzig model. Journal of Statistical Physics, 126(1):43–73, 2007.
  • [15] Darve E, Solomon J, and Kia A. Computing generalized Langevin equations and generalized Fokker-Planck equations. Proceedings of the National Academy of Sciences, 106(27):10884–10889, 2009. doi: 10.1073/pnas.0902633106.
  • [16] Li Xiantao. A coarse-grained molecular dynamics model for crystalline solids. International Journal for Numerical Methods in Engineering, 83(8–9):986–997, 2010. doi: 10.1002/nme.2892.
  • [17] Jung Gerhard, Hanke Martin, and Schmid Friederike. Iterative reconstruction of memory kernels. Journal of Chemical Theory and Computation, 13(6):2481–2488, 2017. doi: 10.1021/acs.jctc.7b00274.
  • [18] Guàrdia E and Padró JA. Generalized Langevin dynamics simulation of interacting particles. The Journal of Chemical Physics, 83(4):1917–1920, 1985. doi: 10.1063/1.449379.
  • [19] Lei Huan, Caswell Bruce, and Karniadakis George Em. Direct construction of mesoscopic models from microscopic simulations. Physical Review E, 81(2), 2010. doi: 10.1103/physreve.81.026704.
  • [20].Shugard Mary, Tully John C., and Nitzan Abraham. Dynamics of gas-solid interactions: Calculations of energy transfer and sticking. The Journal of Chemical Physics, 66(6):2534–2544, 1977. doi: 10.1063/1.434249. [DOI] [Google Scholar]
  • [21].Zhu Yuanran, Dominy Jason M., and Venturi Daniele. On the estimation of the Mori-Zwanzig memory integral. Journal of Mathematical Physics, 59(10):103501, 2018. doi: 10.1063/1.5003467. [DOI] [Google Scholar]
  • [22].Lei Huan, Baker Nathan A., and Li Xiantao. Data-driven parameterization of the generalized Langevin equation. Proceedings of the National Academy of Sciences, 113(50):14183–14188, 2016. doi: 10.1073/pnas.1609587113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [23].Kubo R. The fluctuation-dissipation theorem. Reports on Progress in Physics, 29(1):255–284, 1966. doi: 10.1088/0034-4885/29/1/306. [DOI] [Google Scholar]
  • [24].Lin Kevin K and Lu Fei. Data-driven model reduction, Wiener projections, and the Mori-Zwanzig formalism. arXiv preprint arXiv:1908.07725, 2019. [Google Scholar]
  • [25].Lu Fei, Lin Kevin, and Chorin Alexandre. Comparison of continuous and discrete-time data-based modeling for hypoelliptic systems. Communications in Applied Mathematics and Computational Science, 11(2):187–216, 2016. [Google Scholar]
  • [26].Ma Chao, Wang Jianchun, et al. Model reduction with memory and the machine learning of dynamical systems. arXiv preprint arXiv:1808.04258, 2018. [Google Scholar]
  • [27].Russo Antonio, Durán-Olivencia Miguel A, Kevrekidis Ioannis G, and Kalliadasis Serafim. Deep learning as closure for irreversible processes: A data-driven generalized Langevin equation. arXiv preprint arXiv:1903.09562, 2019. [Google Scholar]
  • [28].Zhu Yuanran and Venturi Daniele. Faber approximation of the Mori-Zwanzig equation. Journal of Computational Physics, 372:694–718, 2018. [Google Scholar]
  • [29].Zhu Yuanran and Venturi Daniele. Generalized langevin equations for systems with local interactions. Journal of Statistical Physics, 2020. doi: 10.1007/s10955-020-02499-y. [DOI] [Google Scholar]
  • [30].Lee Hee Sun, Ahn Surl-Hee, and Darve Eric F.. The multi-dimensional generalized Langevin equation for conformational motion of proteins. The Journal of Chemical Physics, 150(17), 2019. doi: 10.1063/1.5055573. [DOI] [PubMed] [Google Scholar]
  • [31].Lange Oliver F. and Grubmüller Helmut. Collective Langevin dynamics of conformational motions in proteins. The Journal of Chemical Physics, 124(21):214903, 2006. doi: 10.1063/1.2199530. [DOI] [PubMed] [Google Scholar]
  • [32].Chen Minxin, Li Xiantao, and Liu Chun. Computation of the memory functions in the generalized Langevin models for collective dynamics of macromolecules. The Journal of Chemical Physics, 141(6):064112, 2014. doi: 10.1063/1.4892412. [DOI] [PubMed] [Google Scholar]
  • [33].Linz P. Numerical methods for Volterra integral equations of the first kind. The Computer Journal, 12(4):393–397, 1969. doi: 10.1093/comjnl/12.4.393. [DOI] [Google Scholar]
  • [34].Davies Brian. Integral Transforms and Their Applications. Springer, 2010. [Google Scholar]
  • [35].Hudson Thomas and Li Xingjie Helen. Coarse-graining of overdamped Langevin dynamics via the Mori-Zwanzig formalism. arXiv e-prints, art. arXiv:1810.08175, 2018. [Google Scholar]
  • [36].Zhu Yuanran and Venturi Daniele. Hypoellipticity and the Mori-Zwanzig formulation of stochastic di erential equations. arXiv e-prints, art. arXiv:2001.04565, 2020. [Google Scholar]
  • [37].Berendsen HJC, van der Spoel D, and van Drunen R. GROMACS: A message-passing parallel molecular dynamics implementation. Computer Physics Communications, 91(1–3):43–56, 1995. doi: 10.1016/0010-4655(95)00042-e. [DOI] [Google Scholar]
  • [38].Wang Junmei, Wolf Romain M., Caldwell James W., Kollman Peter A., and Case David A.. Development and testing of a general AMBER force field. Journal of Computational Chemistry, 25(9):1157–1174, 2004. doi: 10.1002/jcc.20035. [DOI] [PubMed] [Google Scholar]
  • [39].Hoover William G.. Canonical dynamics: equilibrium phase-space distributions. Physical Review A, 31(3):1695–1697, 1985. doi: 10.1103/physreva.31.1695. [DOI] [PubMed] [Google Scholar]
  • [40].Darden Tom, York Darrin, and Pedersen Lee. Particle mesh Ewald: An N · log(N) method for Ewald sums in large systems. The Journal of Chemical Physics, 98(12):10089–10092, 1993. doi: 10.1063/1.464397. [DOI] [Google Scholar]
  • [41].Hess Berk, Bekker Henk, Berendsen Herman J. C., and Fraaije Johannes G. E. M.. LINCS: A linear constraint solver for molecular simulations. Journal of Computational Chemistry, 18(12), 1997. doi: . [DOI] [Google Scholar]
  • [42].Ma Lina, Li Xiantao, and Liu Chun. The derivation and approximation of coarse-grained dynamics from Langevin dynamics. The Journal of Chemical Physics, 145(20):204117, 2016. doi: 10.1063/1.4967936. [DOI] [PubMed] [Google Scholar]
