Abstract
Complex spatiotemporal dynamics of physicochemical processes are often modeled at a microscopic level (through, e.g., atomistic, agent-based, or lattice models) based on first principles. Some of these processes can also be successfully modeled at the macroscopic level using, e.g., partial differential equations (PDEs) describing the evolution of the right few macroscopic observables (e.g., concentration and momentum fields). Deriving good macroscopic descriptions (the so-called “closure problem”) is often a time-consuming process requiring deep understanding/intuition about the system of interest. Recent developments in data science provide alternative ways to effectively extract/learn accurate macroscopic descriptions approximating the underlying microscopic observations. In this paper, we introduce a data-driven framework for the identification of unavailable coarse-scale PDEs from microscopic observations via machine-learning algorithms. Specifically, using Gaussian processes, artificial neural networks, and/or diffusion maps, the proposed framework uncovers the relation between the relevant macroscopic space fields and their time evolution (the right-hand side of the explicitly unavailable macroscopic PDE). Interestingly, several choices equally representative of the data can be discovered. The framework will be illustrated through the data-driven discovery of macroscopic, concentration-level PDEs resulting from a fine-scale, lattice Boltzmann level model of a reaction/transport process. Once the coarse evolution law is identified, it can be simulated to produce long-term macroscopic predictions. Different features (pros as well as cons) of alternative machine-learning algorithms for performing this task (Gaussian processes and artificial neural networks) are presented and discussed.
The behavior of microscopic complex systems is often described in terms of effective, macroscopic governing equations, leading to simple and efficient prediction. Yet, the discovery/derivation of such macroscopic governing equations generally relies on deep understanding and prior knowledge about the system, as well as extensive and time-consuming mathematical justification. Recent developments in data-driven computational approaches suggest alternative ways toward uncovering useful coarse-scale governing equations directly from fine-scale data. Interestingly, even deciding what the “right” coarse-scale variables are may present a significant challenge. In this paper, we introduce and implement a framework for systematically extracting coarse-scale observables from microscopic/fine-scale data and for discovering the underlying governing equations using machine-learning techniques (e.g., Gaussian processes and artificial neural networks) enhanced by feature selection methods. Intrinsic representations of the coarse-scale behavior via manifold learning techniques (in particular, diffusion maps) generating alternative possible forms of the governing equations are also explored and discussed.
I. INTRODUCTION
The successful description of the spatiotemporal evolution of complex systems typically relies on detailed mathematical models operating at a fine scale (e.g., molecular dynamics, agent-based, stochastic, or lattice-based methods). Such microscopic, first-principles models, keeping track of the interactions between huge numbers of microscopic level degrees of freedom, typically lead to prohibitive computational cost for large-scale spatiotemporal simulations.
To address this issue (and since we are typically interested in macroscale features—pressure drops, reaction rates—rather than the position and velocity of each individual molecule), reduced, coarse-scale models are developed and used, leading to significant computational savings in large-scale spatiotemporal simulations.1
Macroscopically, the fine-scale processes may often be successfully modeled using partial differential equations (PDEs) in terms of the right macroscopic observables (“coarse variables”: not molecules and their velocities, say, but rather pressure drops and momentum fields). Deriving the macroscopic PDE that effectively models the microscopic physics (the so-called “closure problem”) requires, however, deep understanding/intuition about the complex system of interest and often extensive mathematical operations; the discovery of macroscopic governing equations is typically a difficult and time-consuming process.
To bypass the first-principles discovery of a macroscopic PDE directly, several data-driven approaches provide ways to effectively determine good coarse observables and the approximate coarse-scale relations between them from simulation data. In the early 1990s, researchers (including our group) employed artificial neural networks for system identification (both lumped and distributed).2–6 Projective time integration in dynamical systems7 and fluid dynamics8,9 also provides a good data-driven approximation of long-time prediction based not on closed-form equations, but rather on a “black box” simulator. Furthermore, the equation-free framework for designing fine-scale computational experiments and systematically processing their results through “coarse time-steppers” has proven its usefulness/computational efficiency in analyzing macroscopic bifurcation diagrams.10,11 The easy availability of huge simulation data sets, and recent developments in the efficient implementation of machine-learning algorithms, has made the revisiting of the identification of nonlinear dynamical systems from simulation time series an attractive—and successful—endeavor. Working with observations at the macroscopic level, hidden macroscopic PDEs can be recovered directly by artificial neural networks6 (see also Ref. 12). Sparse identification of nonlinear dynamics (SINDy)13 as well as Gaussian processes14,15 have also been successfully used, resulting in explicit data-driven PDEs. All these approaches rely on macroscopic observations.
In this paper, we discuss the identification of unavailable coarse-scale PDEs from fine-scale observations through a combination of machine learning and manifold learning algorithms. Specifically, using Gaussian processes, artificial neural networks, and/or diffusion maps, and starting with candidate coarse fields (e.g., densities), our procedure extracts relevant macroscopic features (e.g., coarse derivatives) from the data and then uncovers the relations between these macroscopic features and their time evolution (the right-hand side of the explicitly unavailable macroscopic PDE).
To effectively reduce the input data domain, we employ two feature selection methods: (1) a sensitivity analysis via Automatic Relevance Determination (ARD)16–18 in Gaussian processes and (2) a manifold learning technique, diffusion maps.19 Having selected the relevant macrofeatures in terms of which the evolution can be modeled, we employ two machine-learning algorithms to approximate a “good” right-hand side of the underlying PDEs: (1) Gaussian process regression and (2) artificial neural networks.
Our framework is illustrated through the data-driven discovery of the macroscopic, concentration-level PDE resulting from a fine-scale, lattice Boltzmann (LB) model of a reaction/transport process [the FitzHugh-Nagumo (FHN) process in one spatial dimension]. Long-term macroscopic prediction is enabled by numerical simulation of the coarse-scale PDE identified from the lattice Boltzmann data. Different possible feature combinations (leading to different realizations of the same evolution) will also be discussed.
The remainder of the paper is organized as follows: In Sec. II, we present an overview of our proposed framework and briefly review theoretical concepts of Gaussian process regression, artificial neural networks, and diffusion maps. Two methods for feature selection are also presented. In Sec. III, we describe two simulators at different scales: (1) the FitzHugh-Nagumo model at the macroscale and (2) its lattice Boltzmann realization at the microscale. In Sec. IV, we demonstrate the effectiveness of our proposed framework and discuss the advantages and challenges of different feature selection methods and regression models for performing this task. In Sec. V, we summarize our results and discuss open issues for further development of the data-driven discovery of the underlying coarse PDE from microscopic observations.
II. FRAMEWORK FOR RECOVERING A COARSE-SCALE PDE VIA MACHINE LEARNING
A. Overview
The workflow of our framework for recovering hidden coarse-scale PDEs from microscopic observations is schematically shown in Figs. 1 and 2. Specifically, this framework consists of two subsections: (1) computing coarse-scale observables and (2) identifying coarse-scale PDEs and then numerically simulating them.
To clarify the algorithm, consider a single field (the activator ; later in this paper, we will use two densities, and , for the activator and the inhibitor, respectively). As shown in Fig. 1, we compute the coarse-scale observable (here, the concentration field) through the zeroth moment of the microscopic LB simulation [averaging the particle distribution functions () on a given lattice point, see Sec. III B for more details].
Given the coarse-scale observable, we estimate its time derivative and several of its spatial derivatives (e.g., , , and ), typically using finite difference schemes in time and space as necessary. A PDE of the form is a relation between the time derivative and a number of spatial derivatives; this relation holds at every moment in time and every point in space. For a simple reaction-diffusion equation, say, , the data triplets will in general lie on a two-dimensional manifold in a three-dimensional space, since is a function of and . Knowing that this manifold is two-dimensional suggests (in the spirit of the Whitney and Takens embedding theorems20,21) that any five generic observables suffice to create an embedding—and thus learn , a function on the manifold, as a function of these five observables. One might choose, for example, as observables the values of at any five spatial points at a given time moment, possibly the five points used in a finite difference stencil for estimating spatial derivatives. In the study of time series through delay embeddings, one uses observations on a temporal stencil; it is interesting that here, one might use observations on a spatial stencil—encoding information analogous to spatial derivatives (see Ref. 12). Motivated by this perspective, in order to learn the time derivative , we use an original input data domain including several (say, all up to some order) spatial derivatives. We also consider the selection of a reduced input data domain via two feature selection methods: (1) a sensitivity analysis by automatic relevance determination (ARD) in Gaussian processes16,18,22 and (2) a manifold learning approach, diffusion maps,23,24 with a regression loss (see Sec. IV B in more detail). Then, we consider two different machine-learning methods (Gaussian process regression and artificial neural networks) to learn based on the selected feature input data domain.
After training, simulation of the learned coarse-scale PDE given a coarse initial condition can proceed with any acceptable discretization scheme in time and space (from simple finite differences to, say, high order spectral or finite element methods).
B. Gaussian process regression
One of the two approaches we employ to extract dominant features and uncover the RHS of coarse-scale PDEs is Gaussian process regression. In Gaussian processes, to represent a probability distribution over target functions (here, the time derivative), we assume that our observations are a set of random variables whose finite collections have a multivariate Gaussian distribution with an unknown mean (usually set to zero) and an unknown covariance matrix . This covariance matrix is commonly formulated by a Euclidean distance-based kernel function in the input space, whose hyperparameters are optimized by training data. Here, we employ a radial basis kernel function (RBF), which is the de facto default kernel function in Gaussian process regression, with ARD,17
(1) |
where is a dimensional vector of hyperparameters and is the number of dimensions of the input data domain. The optimal hyperparameter set can be obtained by minimizing a negative log marginal likelihood with the training data set ,
(2) |
where is the number of training data points and and represent the variance of the (Gaussian) observation noise and an identity matrix, respectively.
To find the Gaussian distribution of the function values (here, the time derivative) at test data points, we represent the multivariate Gaussian distribution with the covariance matrix constructed by Eq. (1) as
(3) |
where is a predictive distribution for test data and represents a covariance matrix between training and test data while represents a covariance matrix between test data.
Finally, we represent a Gaussian distribution for time derivatives at the test point in terms of a predictive mean and its variance, through conditioning a multivariate Gaussian distribution as
(4) |
(5) |
and we assign the predictive mean () as the estimated time derivative for the corresponding data point.
C. Artificial neural networks
Next, we consider (artificial, possibly deep) neural networks (ANNs or NNs or DNNs) for identifying the RHS of coarse-scale PDEs. Generally, neural networks consist of an input layer, one or more hidden layers, and an output layer, all comprising several computational neurons, typically fully connected by weights (), biases (), and an activation function []. Macroscopic observables and their spatial derivatives are assigned at the input layer, while the corresponding time derivative is obtained at the output layer (here, we are considering only first order PDEs in time; higher order equations, like the wave equation, involving second derivatives in time can also be accounted for within the framework). In (feedforward) neural networks, a universal approximation theorem25 guarantees that for a single hidden layer with (sufficient) a finite number of neurons, an approximate realization of the target function, , can be found. Here, approximation implies that the target and learned functions are sufficiently close in an appropriately chosen norm (). The approximate form of the target function obtained through the feedforward neural net can be written as
(6) |
The root-mean-square error cost function,
(7) |
typically measures the goodness of the approximation.
In order to obtain a generalizable network, with good performance on the test data set as well as on the training data set (e.g., preventing overfitting), several regularization approaches have been proposed, mostly relying on modifications of the cost function. Foresee and Hagan26 showed that modifying the cost function by adding the regularization term results in a network that will maximize the posterior probability based on Bayes’ rule. We thus trained our network based on a total cost function of the form
(8) |
in which and are network tuning parameters. Here, we employ Bayesian regularized backpropagation for training, which updates weight and bias values through Levenberg-Marquardt optimization;27 we expect that, for our data, comparable results would be obtained through other modern regularization/optimization algorithms.
D. Diffusion maps
Diffusion maps (DMAPs) have successfully been employed for dimensionality reduction and nonlinear manifold learning.23,24,28,29 The diffusion maps algorithm can guarantee, for data lying on a smooth manifold—and at the limit of infinite data—that the eigenvectors of the large normalized kernel matrices constructed from the data converge to the eigenfunctions of the Laplace-Beltrami operator on the manifold on which the data lie. These eigenfunctions can also provide nonlinear parametrizations (i.e., sets of coordinates) for such (Riemannian) manifolds. To approximate the Laplace-Beltrami operator from scattered data points on the manifold, a normalized diffusion kernel matrix between observation (data) points is commonly used,
(9) |
where are real-valued observations and is the kernel width. After that, one obtains a normalized matrix by
(10) |
where is a diagonal matrix whose entry is the sum of corresponding row of . Here, is a tuning parameter: corresponds to the classical normalized graph Laplacian,30,31 while , which takes into account the local data density, yields the Laplace-Beltrami operator;24 in this paper, we set . Then, is calculated simply as
(11) |
where is a diagonal matrix whose entry is the sum of the corresponding row of .
Finally, an embedding of the manifold is constructed by the first few (say, ) nontrivial eigenvectors of ,
(12) |
where corresponds to the number of diffusion steps (here, ) with descending ordered eigenvalues .
Instead of using the Euclidean distance between the data points in the diffusion map algorithm, it has recently been proposed to use a different, kernel-based similarity metric;32 the associated kernel-based embeddings allow for control of the distortion of the resulting embedding manifold; we will return to this and its possible implications in our Conclusions (Sec. V).
E. Feature selection
Describing the coarse-scale spatiotemporal dynamics in the form of a (translationally invariant) PDE involves learning the local field time derivatives as a function of a few, relevant local field spatial derivatives. Starting with a “full” input data domain consisting of all local field values as well as all their coarse-scale spatial derivatives (up to some order), we must extract the few “relevant” spatial derivatives as dominant features of this input data domain. Such feature selection will typically reduce the dimensionality of the input data domain. Among various feature selection methods, we employ two algorithms based on (1) sensitivity analysis via ARD in Gaussian processes16,18,22 and (2) manifold parametrization through input-output-informed diffusion maps.19
First, we employ sensitivity analysis via automatic relevance determination (ARD) in Gaussian processes, which effectively reduces the number of input data dimensions. In Gaussian processes, we obtain the optimal hyperparameter set by minimizing a negative log marginal likelihood [see Eq. (2)]. ARD assigns a different hyperparameter for each input dimension . As can be seen in Eq. (1), a large value of nullifies the difference between target function values along the dimension, allowing us to designate this dimension as “insignificant.” Practically, we select the input dimensions with relatively small to build a reduced input data domain, which can still successfully represent the approximation of the right-hand side on the underlying PDEs.
Alternatively, we employ a manifold learning technique to find the intrinsic representation of the coarse-scale PDE and then examine the relation between these intrinsic coordinates and given input features (spatial field derivatives). Specifically, diffusion maps will provide an intrinsic parametrization of the combined input-output data domain (here, for and for ). Selecting leading intrinsic coordinates (eliminating higher harmonic eigenfunctions), we can then find the lowest-dimensional embedding space for the PDE manifold (the manifold embodying and as a function of the embedding intrinsic coordinates). We then test several combinations of subsets of the input domain coordinates (spatial derivatives) as to their ability to parametrize the discovered intrinsic embedding coordinates. Each set of such inputs, which successfully parametrize the intrinsic embedding coordinates (see Fig. 3), provides us a new possibility of learning a PDE formulation that describes the spatiotemporal dynamics of our observation data set.
In principle, any subset of intrinsic coordinates that successfully parametrize the manifold can be used to learn functions on the manifold and, in particular, and . The success of any particular subset of leading intrinsic coordinates in describing and is confirmed through regression, via a mean-squared-error loss ().
Next, we investigate which set of features of the input domain (which sets of spatial derivatives) can be best used to parametrize the intrinsic embedding (and thus learn the PDE right-hand side). One can find the subset of features from a user-defined dictionary (here, spatial derivatives) to parametrize the intrinsic embedding coordinates through a linear group LASSO.33 In this paper, we examine several combinations of input domain variables and find subsets that can minimally parametrize the intrinsic embedding; this is quantified through a total regression loss () based on a mean-squared error as
(13) |
where represents a GP regression loss for representing the intrinsic coordinate using selected input features and represents the number of intrinsic coordinates we chose.
ARD for Gaussian processes suggests the “best” input domain subset in terms of which we will try and predict and . In the manifold learning context, we may find several different input subsets capable of parametrizing the manifold on which the observed behavior lies. Different minimal parametrizing subsets will lead to different (but, in principle, on the data, equivalent) right-hand sides for the PDE evolution. One expects that some of them will be “better conditioned” (have better Lipschitz constants) than others.
III. DIFFERENT SCALE SIMULATORS FOR ONE-DIMENSIONAL REACTION-DIFFUSION SYSTEMS
A. Macroscale simulator: FitzHugh-Nagumo model
To describe a one-dimensional reaction-diffusion system that involves an activator and an inhibitor , the FitzHugh-Nagumo model consists of two coupled reaction-diffusion partial differential equations,
(14) |
where and are model parameters, represents a kinetic bifurcation parameter, and and represent diffusion coefficients for and , respectively. Here, we set these parameters to , , , , and .11 We discretize a spatial domain on with and a time domain on with , respectively. We impose homogeneous Neumann boundary conditions at both boundaries and solve these equations (for various initial conditions) numerically via the finite element method using the COMSOL Multiphysics® software.34
B. Microscale simulator: The lattice Boltzmann model
We also introduce a lattice Boltzmann model (LBM),35,36 which can be thought of as a mesoscopic numerical scheme for describing spatiotemporal dynamics using finite-difference-type discretizations of Boltzmann-Bhatnagar-Gross-Krook (BGK) equations,37 retaining certain advantages of microscopic particle models. In this paper, the lattice Boltzmann model is our fine-scale “microscopic simulator,” and its results are considered to be “the truth” from which the coarse-scale PDE will be learned.
The time evolution of the particle distribution function on a given lattice can be described by
(15) |
where a superscript indicates the activator and the inhibitor and represents a collision term defined by Bhatnagar-Gross-Krook (BGK),37
(16) |
where represents a BGK relaxation coefficient defined as38
(17) |
To compute our coarse-scale observables and , we employ the D1Q3 model, which uses three distribution functions on the one-dimensional lattice as for each density (totalling 6 distribution functions). Through the zeroth moment (in the velocity directions) of the overall distribution function, finally, we compute the coarse-scale observable and as
(18) |
Based on spatially uniform local diffusion equilibrium, for which is homogeneous in all velocity directions, the weights are chosen all equal to 1/3,
(19) |
Thus, the reaction terms in Eq. (15) are modeled by
(20) |
All model parameters () are the same as the FHN PDE. The corresponding spatiotemporal behavior of these coarse observables and is shown in Figs. 4(a) and 4(c), while the FHN PDE simulation for the same coarse initial conditions is shown in Figs. 4(b) and 4(d).
IV. RESULTS
A. Learning without feature selection
We begin by considering our proposed framework without feature selection, so as to later contrast with the results including feature selection. The data come from the fine-scale lattice Boltzmann simulation. For the parameter values selected, the long-term dynamics of the LB simulation lie, for all practical purposes, on (or very close to) a stable time-periodic solution. To predict the coarse time derivatives and , we collect training data from five different initial conditions near this stable periodic solution (see Fig. 5) with the following LB spatiotemporal discretization—in space, 99 discretized points on with and in time, 451 discretized points on with for each initial condition. Since our data come from the fine-scale LB code, we need to initialize at the fine, LB scale of particle distribution functions (and not just of the concentrations and ). To initialize the particle distribution functions in the lattice Boltzmann model, we apply the equal weights rule, for , , and , motivated by near-equilibrium considerations. We do expect that such initialization features will soon be “forgotten” as higher distribution moments become quickly slaved to the lower (here, the zeroth) moments (see, for example, Ref. 39). To ensure that our results are not affected by the initialization details, we only start collecting training data after relaxation by short time simulation (here, 2000 time steps with or ) (see the Appendix). We estimate the local coarse fields and their (several) spatial and temporal derivatives through finite differences and then apply machine-learning algorithms (here, Gaussian processes as well as neural networks) to learn the time derivatives of the activator and the inhibitor using as input variables the local , and all their spatial derivatives up to and including order two (),
(21) |
Specifically, for the neural network approach, we build two different networks, one for the prediction of the activator and one for the inhibitor. For both the activator and the inhibitor, we set 2 hidden layers each consist of 6 neurons using a hyperbolic tangent sigmoid activation function; as mentioned above, we use Levenberg-Marquardt optimization with a Bayesian regularization (see Sec. II C). Both networks use the mean-squared error as their loss function. For Gaussian processes, we employ a radial basis kernel function with ARD [see Eq. (1)]. Regression results obtained by each of the two methods for the time derivatives in the training data set are shown in Fig. 6. Both methods provide good approximations of the target time derivatives and . Given the test coarse initial condition (black curves in Fig. 5), simulation results with the learned PDE from to with and their normalized absolute differences from the “ground truth” LB simulations are shown in Fig. 7. The order of magnitude of these absolute differences for both models is the same as those between the LB FHN and the explicitly known FHN PDE [see Figs. 4(e) and 4(f)].
B. Learning with feature selection
Now, we consider the possibility of feature selection in an attempt to learn the RHS of coarse-scale PDEs with a minimal number of input domain variables (spatial derivatives). First, we apply the sensitivity analysis via ARD in the case of Gaussian process approximation. The optimal ARD weights () for and are tabulated in Table I. has three relatively small weights for (), and has also three relatively small weights for (). It is interesting to observe that the selected features via ARD are the same as those in the explicitly known FHN PDE [see Eq. (14)]. This shows that ARD can effectively guide in selecting the appropriate dimensionality of the input data domain, resulting here in the same spatial derivative choices as in the explicitly known FHN PDE. Now, we use the reduced input data domain () for and () for to recover the RHS of the coarse-scale PDEs as
(22) |
Regression results of our two methods for the time derivatives are shown in Fig. 8. Results of long time simulation of the learned PDEs by each method, from to , as well as normalized absolute differences from the simulation of the “ground truth” LB are shown in Fig. 9.
TABLE I.
u | ux | uxx | v | vx | vxx | |
---|---|---|---|---|---|---|
ut | 5.28 × 1000 | 4.23 × 1006 | 9.13 × 1002 | 2.13 × 1003 | 5.32 × 1008 | 4.78 × 1007 |
vt | 1.33 × 1002 | 6.69 × 1006 | 1.94 × 1006 | 5.09 × 1002 | 4.20 × 1006 | 1.75 × 1002 |
The two machine-learning methods operating with a reduced input data domain can still provide good approximations of the time derivatives and of the resulting dynamics. The order of magnitude of these absolute differences is effectively the same as the difference of the FHN LB from the explicitly known FHN PDE. It is, therefore, clear that our framework effectively recovers the coarse-scale PDE from fine-scale observation data; the difference is that the right-hand side of the PDE is now given in terms of the ANN right-hand side, or in terms of the observed data and the GP kernel/hyperparameters, rather than the simple algebraic formula of Eq. (14).
Next, we consider an alternative approach for feature selection, via our manifold learning technique, diffusion maps. The best candidate set among different combinations of intrinsic coordinates (varying the number of leading independent intrinsic dimensions and recording the corresponding Gaussian process regression loss) is shown in Table II. Since the three-dimensional intrinsic embedding space exhibits a (tiny) regression loss of order , we choose an input domain for consisting of () as shown in Fig. 10(a). For , by the same token, we choose the three-dimensional embedding space consisting of () as shown in Fig. 10(b). Based on these identified intrinsic embedding spaces, we examined several subsets of input domain features (spatial derivatives) using the total loss of Eq. (13). “Good” subsets of input features (those that result in small regression losses with minimal input dimension) are presented in Table III. Clearly, different choices of such input feature subsets can give comparable total losses; this suggests that we may construct different right-hand sides of the unknown coarse-scale PDE, which are comparably successful in representing the observed dynamics.
TABLE II.
Optimal intrinsic coordinates | Regression loss (L) | |||
---|---|---|---|---|
ut | vt | ut | vt | |
1d | () | () | 4.60 × 10−04 | 7.69 × 10−06 |
2d | () | () | 1.40 × 10−06 | 1.50 × 10−06 |
3d | () | () | 2.18 × 10−08 | 4.74 × 10−08 |
4d | () | () | 1.64 × 10−08 | 5.71 × 10−09 |
TABLE III.
Features | Total loss (LT) | Features | Total loss (LT) | |
---|---|---|---|---|
1d | (u) | 6.51 × 10−05 | (u) | 7.93 × 10−05 |
2d | (u, v) | 1.65 × 10−08 | (u, v) | 1.49 × 10−05 |
3d | (u, uxx, v) | 6.52 × 10−09 | (u, v, vxx) | 3.32 × 10−07 |
(u, ux, v) | 7.39 × 10−09 | (u, ux, vxx) | 6.21 × 10−07 | |
4d | (u, ux, uxx, v) | 2.68 × 10−09 | (u, v, vx, vxx) | 4.47 × 10−09 |
The good candidates for and identified this way, consisting of three input features, are () and (); they are the same as those found from GPs via ARD and also the same as the ones in the explicitly known FHN PDE. Interestingly, another possible alternative candidate set is also identified: () for and () for .
Using these alternative candidate feature sets, we model different “versions” of what, on the data, is effectively the same coarse-scale PDE. The “alternative” version of the PDE can be symbolically written as
(23) |
and the corresponding regression results of the time derivatives are shown in Fig. 11. Specifically, we use the first spatial derivative instead of the second spatial derivative for learning .
As shown in Fig. 12, both models provide good predictions of the “ground truth” LB simulations; we observe, however, that the accuracy of the neural network based predictions is enhanced. These results confirm that, on the data, alternative coarse-scale PDE forms can provide successful macroscopic description.
To further explore this possibility of alternative PDE forms that represent the observed data with qualitatively comparable accuracy, we also explored the efficacy of a third coarse-scale PDE description, in terms of a yet different input feature set: for and for so that the PDE can symbolically be written as
(24) |
The corresponding prediction results of the time derivatives are shown in Fig. 13.
As shown in Fig. 13, both regression methods provide an inaccurate approximation of near ; the order of magnitude of this error is . The long-term prediction results are not as accurate representations of the ground truth LB simulation as the previous two coarse-scale PDE realizations; yet they may still be qualitatively informative. Normalized absolute differences of long-time simulation for both machine-learning methods are shown in Fig. 14. As was the case in the previous alternative PDE realizations, the NN model appears more accurate than the GP one.
To compare our identified coarse-scale PDEs with the explicitly known FHN PDE [see Eq. (14)], we also compare the predictions of our coarse-scale PDEs to those of the FHN PDE via mean normalized absolute differences for the test coarse initial condition followed from to as
(25) |
where is the total number of data points and and represent simulation results of the FHN PDE, respectively. The comparison of these representative simulations of our various coarse-scale PDEs is summarized in Table IV. The differences across our various coarse-scale identified PDEs are of order and below, comparable to the difference between each of them and the FHN PDE. Specifically, “feature selection 1” (Fig. 9), whose variables are the same as those of the explicit FHN PDE, provides the best PDE realization via both the GP and the NN models.
TABLE IV.
MNADu | MNADv | |
---|---|---|
No feature selection with GPs | 1.59 × 10−02 | 1.62 × 10−02 |
No feature selection with NNs | 1.53 × 10−02 | 1.56 × 10−02 |
Feature selection 1 with GPs | 1.58 × 10−02 | 1.62 × 10−02 |
Feature selection 1 with NNs | 1.54 × 10−02 | 1.57 × 10−02 |
Feature selection 2 with GPs | 2.39 × 10−02 | 2.20 × 10−02 |
Feature selection 2 with NNs | 2.00 × 10−02 | 2.11 × 10−02 |
Feature selection 3 with GPs | 3.20 × 10−02 | 3.31 × 10−02 |
Feature selection 3 with NNs | 2.08 × 10−02 | 2.16 × 10−02 |
V. CONCLUSION
In this paper, we demonstrated the data-driven discovery of macroscopic, concentration-level PDEs for reaction/transport processes resulting from fine-scale observations (here, from simulations of a lattice Boltzmann mesoscopic model). Long-term macroscopic prediction is then obtained by simulation of the identified (via machine-learning methods) coarse-scale PDE. We explored the effect of input feature selection capability on the effectiveness of our framework to identify the underlying macroscopic PDEs.
Our framework suggests four different PDEs (one without and three with feature selection), all comparable with the explicit FitzHugh-Nagumo PDE on the data: all of them provide good approximations of sample coarse-scale dynamic trajectories. The FHN PDE terms have a well-established mechanistic physical meaning (reaction and diffusion); it would be interesting to explore if any physical meaning can be ascribed to our alternative parametrizations of the right-hand side of the coarse-scale evolution PDE. Clearly, the identified PDE depends on our choice of observables—for example, our diffusion map embedding coordinates. We plan to explore the use of kernel-based embeddings (as discussed in Ref. 32 mentioned above) as an approach that can control the well-posedness of the embedding and the distortion of the resulting manifold; this will clearly affect the identified PDE, and it will be interesting to study the interplay between differently distorted embedding manifolds and different identified approximate PDEs.
In our framework, we employed finite differences to estimate spatial derivatives in the formulation of the PDE. Instead of numerical spatial derivatives, we may use the values of coarse observables at neighboring points directly to uncover the coarse evolution law. The effect of this alternative embedding for the PDE right-hand side, explored in Ref. 12, on the accuracy of the identified model predictions, is the subject of ongoing research.
We believe that the framework we presented is easily generalizable to multiscale and/or multifidelity data. Here, we worked across a single scale gap and a single fine-scale simulator providing the ground truth. We envision that data fusion tools can be combined with the approach to exploit data at several cascaded scales and taking advantage of simulation data from several heterogeneous simulators.9,40
ACKNOWLEDGMENTS
S. Lee, M. Kooshkbaghi, and I. G. Kevrekidis gratefully acknowledge partial support by the National Institutes of Health (NIH) and Defense Advanced Research Projects Agency (DARPA). Discussions of the authors with Dr. Felix Dietrich are gratefully acknowledged.
APPENDIX: “HEALING” FOR THE LATTICE BOLTZMANN MODEL
An important assumption underlying our work is that the fine-scale model can “close” at a coarse-scale level. In our particular case, this means that, even though our fine-scale Lattice Boltzmann model (LBM) evolves particle distribution functions, one can be predictive at the coarse level of the zeroth moments of these functions and the concentrations and of the activator and the inhibitor. The hypothesis that allows this reduction is that the problem is singularly perturbed in time: higher order moments of these distribution functions become quickly slaved to the “slow,” governing, zeroth order moment fields. Yet, while initializing the FHN PDE only requires spatial profiles of and at the initial time, initializing the full FHN LBM requires initial conditions for all the evolving particle distributions. Constructing such detailed fine-scale initializations consistent with the coarse initial conditions is an important conceptual component of equation-free computation; the term used is “lifting.”7,41
Here, in lifting the coarse-scale observable (a concentration field ) to the microscopic description (particle distribution function ) on each node, we employ an equal weight rule, i.e., the three particle distribution function values at the node are chosen to be
(A1) |
This equal weight choice (the local, spatially uniform diffusive equilibrium distribution) is not, in general, consistent with the (spatially nonuniform and not simply diffusive) macroscopic PDE model (here, the FitzHugh-Nagumo PDEs); yet, we expect that the fine-scale simulation features will become rapidly slaved to the local concentration field.39 To estimate the appropriate slaving/relaxation time, we compare the norm of the density predicted by two differently initialized LBM simulations: the one that lies on the long-term stable limit cycle and the one that results from it by retaining the coarse state, but perturbing the associated fine-scale states according to the local diffusive equilibrium rule [for in Eq. (20)].
To explore the slaving time scale, we trace the norm of the difference between the simulations resulting from these two initializations. This norm is defined as
(A2) |
where and represent the density with equal weights and the density without equal weights (reference solution), respectively. As shown in Fig. 15, after a fast transient oscillation of (for ), the norm decays smoothly until . There is still a small inherent bias (the trajectory will come back to a nearby point along the limit cycle); this does not affect our estimate of the slaving time. We, therefore, chose a relaxation time (or 2000 LB time steps), and we started collecting coarse observations as training data from our various initializations only after .
Note: This paper is part of the Focus Issue, “When Machine Learning Meets Complex Systems: Networks, Chaos and Nonlinear Dynamics.”
REFERENCES
- 1.Noid W., “Perspective: Coarse-grained models for biomolecular systems,” J. Chem. Phys. 139, 090901 (2013). 10.1063/1.4818908 [DOI] [PubMed] [Google Scholar]
- 2.Hudson J., Kube M., Adomaitis R., Kevrekidis I., Lapedes A., and Farber R., “Nonlinear signal processing and system identification: Applications to time series from electrochemical reactions,” Chem. Eng. Sci. 45, 2075–2081 (1990). 10.1016/0009-2509(90)80079-T [DOI] [Google Scholar]
- 3.Krischer K., Rico-Martinez R., Kevrekidis I., Rotermund H., Ertl G., and Hudson J., “Model identification of a spatiotemporally varying catalytic reaction,” AIChE J. 39, 89–98 (1993). 10.1002/aic.690390110 [DOI] [Google Scholar]
- 4.Rico-Martinez R., Anderson J., and Kevrekidis I., “Continuous-time nonlinear signal processing: A neural network based approach for gray box identification,” in Proceedings of IEEE Workshop on Neural Networks for Signal Processing (IEEE, 1994), pp. 596–605.
- 5.Anderson J., Kevrekidis I., and Rico-Martinez R., “A comparison of recurrent training algorithms for time series analysis and system identification,” Comput. Chem. Eng. 20, S751–S756 (1996). 10.1016/0098-1354(96)00133-0 [DOI] [Google Scholar]
- 6.Gonzalez-Garcia R., Rico-Martinez R., and Kevrekidis I., “Identification of distributed parameter systems: A neural net based approach,” Comput. Chem. Eng. 22, S965–S968 (1998). 10.1016/S0098-1354(98)00191-4 [DOI] [Google Scholar]
- 7.Kevrekidis I. G., Gear C. W., Hyman J. M., Kevrekidid P. G., Runborg O., and Theodoropoulos C., “Equation-free, coarse-grained multiscale computation: Enabling microscopic simulators to perform system-level analysis,” Commun. Math. Sci. 1, 715–762 (2003). 10.4310/CMS.2003.v1.n4.a5 [DOI] [Google Scholar]
- 8.Sirisup S., Karniadakis G. E., Xiu D., and Kevrekidis I. G., “Equation-free/Galerkin-free pod-assisted computation of incompressible flows,” J. Comput. Phys. 207, 568–587 (2005). 10.1016/j.jcp.2005.01.024 [DOI] [Google Scholar]
- 9.Lee S., Kevrekidis I. G., and Karniadakis G. E., “A resilient and efficient CFD framework: Statistical learning tools for multi-fidelity and heterogeneous information fusion,” J. Comput. Phys. 344, 516–533 (2017). 10.1016/j.jcp.2017.05.021 [DOI] [Google Scholar]
- 10.Siettos C., Gear C., and Kevrekidis I., “An equation-free approach to agent-based computation: Bifurcation analysis and control of stationary states,” Europhys. Lett. 99, 48007 (2012). 10.1209/0295-5075/99/48007 [DOI] [Google Scholar]
- 11.Theodoropoulos C., Qian Y.-H., and Kevrekidis I. G., ““Coarse” stability and bifurcation analysis using time-steppers: A reaction-diffusion example,” Proc. Natl. Acad. Sci. U.S.A. 97, 9840–9843 (2000). 10.1073/pnas.97.18.9840 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Bar-Sinai Y., Hoyer S., Hickey J., and Brenner M. P., “Learning data-driven discretizations for partial differential equations,” Proc. Natl. Acad. Sci. U.S.A. 116, 15344–15349 (2019). 10.1073/pnas.1814058116 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Rudy S. H., Brunton S. L., Proctor J. L., and Kutz J. N., “Data-driven discovery of partial differential equations,” Sci. Adv. 3, e1602614 (2017). 10.1126/sciadv.1602614 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Raissi M., Perdikaris P., and Karniadakis G. E., “Machine learning of linear differential equations using Gaussian processes,” J. Comput. Phys. 348, 683–693 (2017). 10.1016/j.jcp.2017.07.050 [DOI] [Google Scholar]
- 15.Raissi M. and Karniadakis G. E., “Hidden physics models: Machine learning of nonlinear partial differential equations,” J. Comput. Phys. 357, 125–141 (2018). 10.1016/j.jcp.2017.11.039 [DOI] [Google Scholar]
- 16.Qi Y. A., Minka T. P., Picard R. W., and Ghahramani Z., “Predictive automatic relevance determination by expectation propagation,” in Proceedings of the Twenty-First International Conference on Machine Learning (ACM, 2004), p. 85.
- 17.Rasmussen C. E. and Williams C. K. I., Gaussian Processes for Machine Learning (MIT Press, 2006). [Google Scholar]
- 18.Wipf D. P. and Nagarajan S. S., “A new view of automatic relevance determination,” in Advances in Neural Information Processing Systems (Curran Associates, Inc., 2008), pp. 1625–1632.
- 19.Holiday A., Kooshkbaghi M., Bello-Rivas J. M., Gear C. W., Zagaris A., and Kevrekidis I. G., “Manifold learning for parameter reduction,” J. Comput. Phys. 392, 419–431 (2019). 10.1016/j.jcp.2019.04.015 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Whitney H., “Differentiable manifolds,” Ann. Math. 37, 645–680 (1936). 10.2307/1968482 [DOI] [Google Scholar]
- 21.Takens F., “Detecting strange attractors in turbulence,” in Dynamical Systems and Turbulence, Warwick 1980, edited by D. Rand and L.-S. Young (Springer, Berlin, 1981), pp. 366–381.
- 22.Williams C. K. and Rasmussen C. E., “Gaussian processes for regression,” in Advances in Neural Information Processing Systems (MIT Press, 1996), pp. 514–520.
- 23.Coifman R. R., Lafon S., Lee A. B., Maggioni M., Nadler B., Warner F., and Zucker S. W., “Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps,” Proc. Natl. Acad. Sci. U.S.A. 102, 7426–7431 (2005). 10.1073/pnas.0500334102 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Coifman R. R. and Lafon S., “Diffusion maps,” Appl. Comput. Harmon. Anal. 21, 5–30 (2006). 10.1016/j.acha.2006.04.006 [DOI] [Google Scholar]
- 25.Cybenko G., “Approximation by superpositions of a sigmoidal function,” Math. Control Signals Syst. 2, 303–314 (1989). 10.1007/BF02551274 [DOI] [Google Scholar]
- 26.Foresee F. D. and Hagan M. T., “Gauss-Newton approximation to Bayesian learning,” in Proceedings of International Conference on Neural Networks (ICNN’97) (IEEE, 1997), Vol. 3, pp. 1930–1935.
- 27.Hagan M. T. and Menhaj M. B., “Training feedforward networks with the Marquardt algorithm,” IEEE Trans. Neural Netw. 5, 989–993 (1994). 10.1109/72.329697 [DOI] [PubMed] [Google Scholar]
- 28.Nadler B., Lafon S., Coifman R. R., and Kevrekidis I. G., “Diffusion maps, spectral clustering and reaction coordinates of dynamical systems,” Appl. Comput. Harmon. Anal. 21, 113–127 (2006). 10.1016/j.acha.2005.07.004 [DOI] [Google Scholar]
- 29.Nadler B., Lafon S., Coifman R., and Kevrekidis I. G., “Diffusion maps—A probabilistic interpretation for spectral embedding and clustering algorithms,” in Principal Manifolds for Data Visualization and Dimension Reduction (Springer, 2008), pp. 238–260.
- 30.Belkin M. and Niyogi P., “Laplacian eigenmaps and spectral techniques for embedding and clustering,” in Advances in Neural Information Processing Systems (MIT Press, 2002), pp. 585–591.
- 31.Belkin M. and Niyogi P., “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Comput. 15, 1373–1396 (2003). 10.1162/089976603321780317 [DOI] [Google Scholar]
- 32.Bittracher A., Klus S., Hamzi B., and Schütte C., “A kernel-based method for coarse graining complex dynamical systems,” e-print arXiv:1904.08622 (2019).
- 33.Meila M., Koelle S., and Zhang H., “A regression approach for explaining manifold embedding coordinates,” e-print arXiv:1811.11891.
- 34.COMSOL, COMSOL Multiphysics (COMSOL AB, Stockholm), see www.comsol.com.
- 35.Chen S. and Doolen G. D., “Lattice Boltzmann method for fluid flows,” Annu. Rev. Fluid Mech. 30, 329–364 (1998). 10.1146/annurev.fluid.30.1.329 [DOI] [Google Scholar]
- 36.Succi S., The Lattice Boltzmann Equation: For Fluid Dynamics and Beyond (Oxford University Press, 2001). [Google Scholar]
- 37.Bhatnagar P. L., Gross E. P., and Krook M., “A model for collision processes in gases. I. Small amplitude processes in charged and neutral one-component systems,” Phys. Rev. 94, 511 (1954). 10.1103/PhysRev.94.511 [DOI] [Google Scholar]
- 38.Qian Y. and Orszag S., “Scalings in diffusion-driven reaction : Numerical simulations by lattice BGK models,” J. Stat. Phys. 81, 237–253 (1995). 10.1007/BF02179978 [DOI] [Google Scholar]
- 39.Van Leemput P., Lust K., and Kevrekidis I. G., “Coarse-grained numerical bifurcation analysis of lattice Boltzmann models,” Physica D 210, 58–76 (2005). 10.1016/j.physd.2005.06.033 [DOI] [Google Scholar]
- 40.Lee S., Dietrich F., Karniadakis G. E., and Kevrekidis I. G., “Linking Gaussian process regression with data-driven manifold embeddings for nonlinear data fusion,” Interface Focus 9, 20180083 (2019). 10.1098/rsfs.2018.0083 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Kevrekidis I. G. and Samaey G., “Equation-free multiscale computation: Algorithms and applications,” Annu. Rev. Phys. Chem. 60, 321–344 (2009). 10.1146/annurev.physchem.59.032607.093610 [DOI] [PubMed] [Google Scholar]