Scientific Reports. 2023 Aug 7;13:12787. doi: 10.1038/s41598-023-39418-6

STENCIL-NET for equation-free forecasting from data

Suryanarayana Maddu 1,2,3,9,10, Dominik Sturm 4,5, Bevan L Cheeseman 1,2,3,11, Christian L Müller 6,7,8, Ivo F Sbalzarini 1,2,3,9,
PMCID: PMC10406911  PMID: 37550328

Abstract

We present an artificial neural network architecture, termed STENCIL-NET, for equation-free forecasting of spatiotemporal dynamics from data. STENCIL-NET works by learning a discrete propagator that is able to reproduce the spatiotemporal dynamics of the training data. This data-driven propagator can then be used to forecast or extrapolate dynamics without needing to know a governing equation. STENCIL-NET does not learn a governing equation, nor an approximation to the data themselves. It instead learns a discrete propagator that reproduces the data. It therefore generalizes well to different dynamics and different grid resolutions. By analogy with classic numerical methods, we show that the discrete forecasting operators learned by STENCIL-NET are numerically stable and accurate for data represented on regular Cartesian grids. A once-trained STENCIL-NET model can be used for equation-free forecasting on larger spatial domains and for longer times than it was trained for, as an autonomous predictor of chaotic dynamics, as a coarse-graining method, and as a data-adaptive de-noising method, as we illustrate in numerical experiments. In all tests, STENCIL-NET generalizes better and is computationally more efficient, both in training and inference, than neural network architectures based on local (CNN) or global (FNO) nonlinear convolutions.

Subject terms: Applied mathematics, Computer science, Computational science

Introduction

Often in science and engineering, measurement data from a dynamical process in space and time are available, but a first-principles mathematical model may not be. Numerical simulation methods then cannot be used to predict system behavior. This situation is particularly prevalent in areas such as biology, medicine, environmental science, economics, and finance. There has therefore been much interest in using machine-learning models for data-driven forecasting of space-time dynamics with unknown governing equations.

Recent advancements in data-driven forecasting techniques using artificial neural networks include the ODIL framework1 and a mesh-free variant using graph neural networks2. In addition, the feasibility of data-driven forecasting of chaotic dynamics has been demonstrated using neural networks with recurrent connections3–5. The common goal of these approaches is to learn a discrete propagator of the observed dynamics. This propagator combines the values on finitely many discrete sampling points at a certain time $t$ to explain the values at the next time point $t+\Delta t$. As such, these approaches neither learn a governing equation of the observed dynamics, like sparse regression methods6–8 or PDE-nets9,10 do, nor a data-guided approximation of the solution to a known governing equation, like Physics-Informed Neural Networks do11,12. Instead, they learn discrete rules that explain the dynamics of the data. This makes equation-free forecasting from data feasible for a number of applications, including prediction, extrapolation, coarse-graining, and de-noising.

The usefulness of data-driven forecasting methods, however, hinges on the accuracy and stability of the learned propagator. Ideally, the propagator fulfills the constraints for a valid numerical scheme in the discretization used to represent the data. The learned propagators can then be expected to possess generalization power. This has been accomplished when the governing equation is known13 and for explicit time-integration schemes14,15. Moreover, work on solution- and resolution-specific discretization of nonlinear PDEs has shown that Convolutional Neural Network (CNN) filters are able to generalize to larger spatial solution domains than they have been trained on16. However, an over-complete set of CNN filters was used, which renders their training computationally expensive and data-demanding. It remains a challenge to achieve data-efficient and self-consistent forecasting for general spatio-temporal dynamics with unknown governing equations that generalizes to coarser grids in order to accelerate prediction.

Here, we present the STENCIL-NET architecture for equation-free forecasting of nonlinear and/or chaotic dynamics from spatiotemporal data across different grid resolutions. STENCIL-NET is inspired by works on learning data-adaptive discretizations for nonlinear PDEs, which have been shown to generalize to coarser grids17. This generalization ability derives from the inductive bias gained when a Multi-Layer Perceptron (MLP) architecture is combined with a known consistent time-stepping scheme, such as a Runge-Kutta method18 or Total Variation Diminishing (TVD) methods19,20. On a regular Cartesian grid, it is straightforward to represent the spatial discretization by a neural network architecture. STENCIL-NET relies on sliding a small MLP over the input patches to perform cascaded cross-channel parametric pooling, which enables learning complex features of the dynamics. We illuminate the mathematical relationship between this neural network architecture and ENO/WENO finite differences19,20. This connection to classic numerical methods constrains the propagators learned by STENCIL-NET to be consistent on average in the sense of numerical analysis, i.e., the prediction errors on average decrease for increasing spatial resolution and increasing stencil size. Therefore, STENCIL-NETs can be used to produce predictions beyond the space and time horizons they were trained for, for chaotic dynamics, and for coarser grid resolutions than they were trained for, all of which we show in numerical experiments. We also show that STENCIL-NETs do this better than both Convolutional Neural Networks (CNNs), which are based on local nonlinear convolutions, and Fourier Neural Operators (FNOs), which rely on a combination of global Fourier modes and nonlinear convolutions. As a consequence of their accuracy and stability, STENCIL-NETs extrapolate better to coarser grid resolutions and are computationally more efficient in both training and inference/simulation than CNNs and FNOs with comparable numbers of trainable parameters.

The STENCIL-NET

Consider the following dynamic process in discrete space and time:

$$\frac{\partial u_i}{\partial t} = \mathcal{N}_d\big(u_m(x_i), \Xi, \Delta x\big), \qquad i = 1, \ldots, N_x, \tag{1}$$

where $u_i = u(x_i, t_j)$ are the data at the $N_x$ points in space and $N_t$ points in time, and $\Xi$ are unknown parameters. The subset $u_m(x_i) = \{u(x_j) : x_j \in S_m(x_i)\}$ are the $(2m+1)$ data points within some finite stencil support $S_m$ of radius $m$ around point $x_i$ at time $t$. $\Delta x$ is the grid resolution of the spatial discretization, and $\mathcal{N}_d : \mathbb{R}^{2m+1} \to \mathbb{R}$ is the nonlinear discrete propagator of the data. Integrating Eq. (1) on both sides over one time step $\Delta t$ yields a discrete map from $u(x_i, t)$ to $u(x_i, t+\Delta t)$ as:

$$u_i^{n+1} = u_i^{n} + \int_{t}^{t+\Delta t} \mathcal{N}_d\big(u_m^{n}(x_i), \Xi, \Delta x\big)\, \mathrm{d}\tau, \tag{2}$$

where $u_i^{n+1} = u(x_i, t+\Delta t)$, $u_i^{n} = u(x_i, t)$, and $u_m^{n}(x_i) = \{u(x_j, t) : x_j \in S_m(x_i)\}$. Approximating the integral on the right-hand side by quadrature, we find

$$u_i^{n+1} = \mathcal{T}_d\big(u_i^{n}, \mathcal{N}_d(u_m^{n}(x_i), \Xi, \Delta x), \Delta t\big) + O(\Delta t^{r}), \qquad n = 0, \ldots, N_t - 1, \tag{3}$$

where $N_t$ is the total number of time steps. Here, $\mathcal{T}_d$ is the explicit discrete time integrator with time-step size $\Delta t$. Due to the approximation of the integral by quadrature, the discrete time integrator converges to the continuous-time map in Eq. (2) with temporal convergence rate $r$ as $\Delta t \to 0$. Popular examples of explicit time-integration schemes include forward Euler, Runge-Kutta, and TVD Runge-Kutta methods. In this work, we only consider Runge-Kutta-type methods and their TVD variants20. STENCIL-NET approximates the discrete nonlinear function $\mathcal{N}_d(\cdot)$ with a neural network, leading to:

$$\hat{u}_i^{n+k} = \mathcal{T}_d^{k}\big(\hat{u}_i^{n}, \mathcal{N}_\theta(u_m^{n}(x_i), \Xi), \Delta t\big), \tag{4}$$

where $\mathcal{N}_\theta : \mathbb{R}^{2m+1} \to \mathbb{R}$ are the nonlinear network layers with weights $\theta$. The superscript $k$ in $\mathcal{T}_d^{k}$ denotes that the discrete propagator maps $k$ time steps into the future.
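For concreteness, the following is a minimal Python/NumPy sketch (not the published implementation) of the kind of explicit time integrator $\mathcal{T}_d$ used here: one third-order TVD Runge-Kutta step (Shu-Osher form) advancing the state under a generic spatial operator $\mathcal{N}$. The heat-equation operator `N_heat` and all parameter values are illustrative placeholders.

```python
import numpy as np

def tvd_rk3_step(u, dt, N):
    """One third-order TVD Runge-Kutta (Shu-Osher) step for du/dt = N(u).
    N can be a classic discretization or a learned propagator N_theta."""
    u1 = u + dt * N(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * N(u1))
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * N(u2))

# Toy example: periodic heat equation u_t = D u_xx with a central-difference operator.
L, Nx, D = 2.0 * np.pi, 128, 0.02
dx = L / Nx
x = np.linspace(0.0, L, Nx, endpoint=False)
N_heat = lambda u: D * (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2

u = np.sin(x)
dt = 0.4 * dx**2 / D          # step size within the explicit (diffusive) stability limit
for _ in range(100):
    u = tvd_rk3_step(u, dt, N_heat)
```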

Assuming point-wise uncorrelated noise $\eta$ on the data, i.e., $v_i^{n} = u_i^{n} + \eta_i^{n}$, Eq. (4) can be extended for mapping noisy data from $v_i^{n}$ to $v_i^{n+k}$, as:

$$\hat{v}_i^{n+k} = \mathcal{T}_d^{k}\big(\hat{v}_i^{n} - \hat{\eta}_i^{n}, \mathcal{N}_\theta(v_m^{n}(x_i) - \hat{n}_m^{n}(x_i), \Xi), \Delta t\big) + \hat{\eta}_i^{n+k}, \tag{5}$$

where $v_m^{n}(x_i) = \{v(x_j, t) : x_j \in S_m(x_i)\}$ are the given noisy data and $\hat{n}_m^{n}(x_i) = \{\hat{\eta}(x_j, t) : x_j \in S_m(x_i)\}$ the noise estimates on the stencil $S_m$ centered at point $x_i$ at time $t$.

Neural network architecture

STENCIL-NET uses a single MLP convolutional (mlpconv) unit with $N_l$ fully-connected hidden layers inside to represent the discrete propagator $\mathcal{N}_\theta$ and a discrete Runge-Kutta time integrator for $\mathcal{T}_d^{k}$. This results in the architecture shown in Fig. 1. Sliding the mlpconv unit over the input state-variable vector $u^{n}$ (or $\hat{u}^{n}$ during inference) maps the input on the local stencil patch to discretization features. The computation thus performed by the mlpconv unit is:

$$y_1 = \varsigma\big(W_1 x + b_1\big), \;\ldots,\; y_{N_l} = \varsigma\big(W_{N_l}\, y_{N_l-1} + b_{N_l}\big), \tag{6}$$

where $\theta = \{W_q, b_q\}_{q=1,\ldots,N_l}$ are the trainable weights and biases, respectively, and $\varsigma$ is the (nonlinear) activation function. Usual choices of activation functions include tanh, sigmoid, and ReLU. Sliding the mlpconv unit across the input vector amounts to cascaded cross-channel parametric pooling over a CNN layer21, which allows for trainable interactions across channels for better abstraction of the input data across multiple resolution levels.
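As an illustration of the sliding-MLP computation in Eq. (6), the following is a minimal PyTorch sketch of an mlpconv unit that extracts all periodic $(2m+1)$-point stencil patches of a 1D field and applies the same small MLP to each of them. The class name `MLPConv1d` and the layer sizes are our own choices, not the published implementation.

```python
import torch
import torch.nn as nn

class MLPConv1d(nn.Module):
    """Slide a small MLP over all (2m+1)-point stencil patches of a periodic 1D field."""
    def __init__(self, m=3, hidden=64, n_hidden_layers=3):
        super().__init__()
        self.m = m
        layers, width = [], 2 * m + 1
        for _ in range(n_hidden_layers):
            layers += [nn.Linear(width, hidden), nn.ELU()]
            width = hidden
        layers += [nn.Linear(width, 1)]      # output: N_theta evaluated at each grid point
        self.mlp = nn.Sequential(*layers)

    def forward(self, u):                    # u: (batch, Nx)
        m = self.m
        u_pad = torch.cat([u[:, -m:], u, u[:, :m]], dim=1)   # periodic padding
        patches = u_pad.unfold(1, 2 * m + 1, 1)              # (batch, Nx, 2m+1)
        return self.mlp(patches).squeeze(-1)                 # (batch, Nx)

# usage: net = MLPConv1d(); rhs = net(torch.randn(8, 256))   # rhs.shape == (8, 256)
```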

Figure 1. The STENCIL-NET architecture for equation-free forecasting from data. The mlpconv unit performs parametric pooling by moving a small (stencil size $|S_m|$) MLP network across the input vector $\hat{u}^{n}$ at time step $n$ to generate the feature maps. The features reaching the output layer are the stencil weights of the discrete propagator $\mathcal{N}_\theta$. A Runge-Kutta time integrator is used to evolve the network output over time for $q$ steps forward and backward in time, which is then used to compute the loss.

This is in contrast to a conventional CNN, where higher-level abstraction is achieved by over-complete sets of filters, at the cost of requiring more training data and incurring extra workload on the downstream layers of the network for composing features21. We therefore provide a direct comparison with a CNN architecture where the feature map is generated by convolving the input x followed by a nonlinear activation, i.e.,

$$y_k = \varsigma\big(W_k^{m} x + b_k\big), \tag{7}$$

where $W_k^{m}$ is a circulant (and not dense, as in Eq. (6)) matrix that represents the convolution and depends on the size of the filter $|S_m| = 2m+1$, and $b_k$ is the bias term. For further comparison, we also benchmark the mlpconv STENCIL-NET architecture against the global operator-learning method Fourier Neural Operators (FNO)22. The FNO computation is:

$$y_1 = \varsigma\big(W_1 x + b_1 + \mathcal{F}^{-1}(R_1 \mathcal{F} x)\big), \;\ldots,\; y_{N_l} = \varsigma\big(W_{N_l} y_{N_l-1} + b_{N_l} + \mathcal{F}^{-1}(R_{N_l} \mathcal{F} y_{N_l-1})\big), \tag{8}$$

where $\theta = \{W_q, R_q, b_q\}_{q=1,\ldots,N_l}$ are the trainable weights and biases, and $\varsigma$ is the (nonlinear) activation function. The operators $\mathcal{F}$ and $\mathcal{F}^{-1}$ are the forward and inverse Fourier transforms, respectively.
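For comparison, below is a minimal PyTorch sketch of one FNO-style layer in the spirit of Eq. (8): a local pointwise ($1\times 1$) convolution plus a spectral convolution that mixes channels on the lowest Fourier modes. Channel counts, mode counts, and class names are illustrative assumptions, not the exact configuration benchmarked in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralConv1d(nn.Module):
    """Fourier-space convolution: keep the lowest `modes` frequencies and mix channels there."""
    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes
        self.weights = nn.Parameter(
            (1.0 / channels) * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, x):                               # x: (batch, channels, Nx)
        x_ft = torch.fft.rfft(x)                        # (batch, channels, Nx//2 + 1)
        out_ft = torch.zeros_like(x_ft)
        out_ft[..., :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[..., :self.modes], self.weights)
        return torch.fft.irfft(out_ft, n=x.size(-1))    # back to physical space

class FNOLayer(nn.Module):
    """y = elu(W x + b + F^{-1}(R F x)), cf. Eq. (8); W x + b is a 1x1 convolution."""
    def __init__(self, channels=10, modes=16):
        super().__init__()
        self.spectral = SpectralConv1d(channels, modes)
        self.local = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        return F.elu(self.local(x) + self.spectral(x))
```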

Algorithm 1. The STENCIL-NET training procedure (reproduced as an image in the original article).

Loss function and training

From Eq. (5), we derive a loss function for learning the local nonlinear discrete propagator $\mathcal{N}_\theta$:

$$\mathcal{L}_{\mathrm{MSE}} = \sum_{n=1}^{N_t} \sum_{i=1}^{N_x} \sum_{k=-q}^{q} \gamma_k \left\| v_i^{n+k} - \Big(\mathcal{T}_d^{k}\big(v_i^{n} - \hat{\eta}_i^{n}, \mathcal{N}_\theta(\hat{v}_m^{n}, \Xi), \Delta t\big) + \hat{\eta}_i^{n+k}\Big) \right\|_2^2. \tag{9}$$

The loss compares the forward and backward predictions $\mathcal{T}_d^{k}$ of the STENCIL-NET with the training data $v_i^{n+k}$. It is computed as detailed in Algorithm 1. The positive integer $q$ is the number of Runge-Kutta integration steps considered during optimization, which we refer to as the training time horizon. The scalars $\gamma_k$ are exponentially decaying (over the training time horizon $q$) weights that account for the accumulating prediction error23. In 1D, we treat the noise estimates $\hat{\eta}_i$ as latent variables that enable STENCIL-NET to separate dynamics from noise. In that case, we initialize with estimates obtained from Tikhonov smoothing of the training data (initialization step in Algorithm 1). For higher-dimensional problems, however, learning noise as a latent variable becomes infeasible. We then instead directly learn a smooth (in time) propagator from the noisy data, as demonstrated in Section "STENCIL-NET for learning smooth dynamics from noisy data".

We penalize the noise estimates $\hat{N} = [\hat{n}_m^{n}]_{(m,n)}$ in order to avoid learning the trivial solution $v_i^{n+k} = \hat{v}_i^{n} + \hat{\eta}_i^{n+k}$ of the minimization problem in Eq. (9), where $\mathcal{N}_\theta \equiv 0$ and only the noise accounts for the data. We also impose a penalty on the weights of the network $\{W_i\}_{i=1,\ldots,N_l}$ in order to prevent over-fitting. The total loss then becomes:

$$\mathrm{Loss} = \mathcal{L}_{\mathrm{MSE}} + \lambda_n \|\hat{N}\|_F^2 + \lambda_{wd} \sum_{i=1}^{N_l} \|W_i\|_F^2, \tag{10}$$

where $N_l$ is the number of layers in the single mlpconv unit, $\hat{N} \in \mathbb{R}^{N_x \times N_t}$ is the matrix of point-wise noise estimates, and $\|\cdot\|_F$ is the Frobenius norm of a matrix. We perform a grid search through hyper-parameter space in order to identify values of the penalty parameters. We find the choice $\lambda_n = 10^{-5}$ and $\lambda_{wd} = 10^{-8}$ to work well for all problems considered in this paper. Alternatively, methods like drop-out, early-stopping criteria, and Jacobian regularization can be used to counter the over-fitting problem. As a data-augmentation strategy, we unroll the training data both forward and backward in time when evaluating the loss in Eq. (9). Numerical experiments (not shown) confirm that this training mode on average leads to more stable and more accurate models than training solely forward in time. Regardless, inference is done only forward in time.

Training is done in full-batch mode using an Adam optimizer24 with learning rate lr=0.001. As activation functions, we use Exponential Linear Units (ELU) for their smooth function interpolation abilities. While one could readily use ReLU, Leaky-ReLU, tanh, or adaptive activation functions with learnable hyper-parameters, we find that ELU consistently performs best in the benchmark problems considered below. The complete training procedure is summarized in Algorithm 1. To generate a similar CNN architecture for comparison, we replace the mlpconv unit in Fig. 1 with the local nonlinear convolution map from Eq. (7). Similarly, for comparison with FNO, we replace the mlpconv unit with the feature map from Eq. (8) in order to approximate discrete operators through convolutions in frequency space.
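The following self-contained PyTorch sketch summarizes this training procedure for the 1D case: an mlpconv propagator, TVD-RK3 unrolling over a training time horizon $q$, latent noise estimates, and the penalized loss of Eq. (10) minimized with Adam in full-batch mode. It uses random placeholder data, unrolls only forward in time, and simplifies Algorithm 1; all sizes and names are illustrative assumptions, not the published code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
Nt, Nx, dt, m, q = 200, 64, 0.01, 3, 2
gamma, lam_n, lam_wd = 0.9, 1e-5, 1e-8
v = torch.randn(Nt, Nx)                        # placeholder for the noisy training snapshots

mlp = nn.Sequential(nn.Linear(2 * m + 1, 64), nn.ELU(),
                    nn.Linear(64, 64), nn.ELU(),
                    nn.Linear(64, 64), nn.ELU(),
                    nn.Linear(64, 1))

def n_theta(u):                                # apply the MLP to all periodic stencil patches
    u_pad = torch.cat([u[:, -m:], u, u[:, :m]], dim=1)
    return mlp(u_pad.unfold(1, 2 * m + 1, 1)).squeeze(-1)

def rk3_step(u):                               # third-order TVD Runge-Kutta (Shu-Osher form)
    u1 = u + dt * n_theta(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * n_theta(u1))
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * n_theta(u2))

noise = nn.Parameter(torch.zeros_like(v))      # latent point-wise noise estimates (1D case)
opt = torch.optim.Adam(list(mlp.parameters()) + [noise], lr=1e-3)

for epoch in range(1000):                      # full-batch training
    opt.zero_grad()
    # penalties roughly corresponding to Eq. (10)
    loss = lam_n * noise.pow(2).sum() \
         + lam_wd * sum(p.pow(2).sum() for p in mlp.parameters())
    pred = (v - noise)[:Nt - q]                # de-noised starting snapshots
    for k in range(1, q + 1):                  # unroll q integration steps forward in time
        pred = rk3_step(pred)
        loss = loss + gamma**k * ((v[k:Nt - q + k] - (pred + noise[k:Nt - q + k]))**2).sum()
    loss.backward()
    opt.step()
```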

Forecasting accuracy

In classic numerical analysis, stability and accuracy (connected to consistency via the Lax Equivalence Theorem) play central roles in determining the validity of a discretization scheme. In a data-driven setting, learning a propagator of the values on the grid nodes from a time $t$ to a later time $t+\Delta t$ assumes that the discrete stencil of the propagator is a valid numerical discretization of some (unknown) ground-truth dynamical system. In this section, we therefore rationalize our choice of network architecture by algebraic parallels with known grid-based discretization schemes. Specifically, we analyze the approximation properties of the STENCIL-NET architecture and argue accuracy by analogy with the well-known class of solution-adaptive ENO/WENO finite-difference schemes19,20. As we show below, WENO schemes involve reciprocal functions, absolute values, and "switch statements", and they are rational functions in the stencil values.

MLPs are particularly effective in representing rational functions26. As a consequence, MLPs can efficiently represent WENO-like stencils when approximating the nonlinear discrete spatial propagator $\mathcal{N}_d$. To illustrate this, we compare the best polynomial, rational, and MLP fits to the spike function in Fig. 2. The rational and MLP approximations both closely follow the true spike (black solid line), whereas polynomials fail to capture the "switch statement", i.e., the division operations that are quintessential for resolving sharp functions.

Figure 2. Approximation properties for sharply varying functions. Best rational, polynomial, and MLP fits to the spike function $f(x) = \left(a + \frac{a|x-b|}{x-b}\right)\frac{1}{(x+c)^2}$ with $a=0.5$, $b=0.25$, and $c=0.1$. The rational function fit uses Newman polynomials25 to approximate $|x|$.

This also explains the results shown in Fig. 3, where the WENO and STENCIL-NET (i.e., MLP) methods accurately resolve the advection of a sharp pulse. Remarkably, the STENCIL-NET can advect the pulse on 4× coarser grids than the WENO scheme is able to (Fig. 3D). All polynomial approximations using central-difference or fixed-stencil upwinding schemes fail to faithfully forecast the dynamics. Based on this empirical evidence, we posit that a relationship between the neural network architecture used and a known class of numerical methods is beneficial for equation-free forecasting, in particular in the presence of sharp gradients or multi-scale features. We therefore next analyze the numerical-method equivalence of the STENCIL-NET architecture.

Figure 3. Numerical solutions of advection of a sharp pulse (red dashed line) using different discretization schemes. (A) Data-adaptive fifth-order WENO stencils; (B) second- and fourth-order central finite differences; (C) first- and third-order upwinding schemes; (D) STENCIL-NET on a 4× coarser grid compared with fifth-order WENO on the same grid. In all plots, the advection velocity is $c=2$ in the one-dimensional advection equation $u_t + c u_x = 0$. All plots are at time $t=3$.

Relation with finite-difference stencils

The $l$-th spatial derivative at location $x_i$ on a grid with spacing $\Delta x$ can be approximated with convergence order $r$ using linear convolutions:

$$\left.\frac{\partial^{l} u}{\partial x^{l}}\right|_{x=x_i} = \sum_{x_j \in S_m(x_i)} \xi_j\, u_j + O(\Delta x^{r}), \tag{11}$$

where $u_j = u(x_j)$. The $\xi_j$ are the stencil weights that can be determined by local polynomial interpolation20,27. The stencil of radius $m$ is $S_m(x_i) = \{x_{i-m}, x_{i-m+1}, \ldots, x_i, \ldots, x_{i+m-1}, x_{i+m}\}$ with size $|S_m| = 2m+1$. For a spatial domain of size $L$, the spacing is $\Delta x = L/N_x$, where $N_x$ is the number of grid points discretizing space. The following propositions from9,28 define discrete moment conditions that need to be fulfilled in order for the stencil to be consistent for linear and quadratic operators, respectively:

Proposition 3.1

(Nonlinear convolutions9) Nonlinear terms of order 2, including products of derivatives (e.g., $u u_x$, $u^2$, $u_x u_{xx}$) can be approximated by nonlinear convolution. For $l_1, l_2, r_1, r_2 \in \mathbb{Z}_{\geq 0}$, we can write

$$\left.\frac{\partial^{l_1} u}{\partial x^{l_1}} \frac{\partial^{l_2} u}{\partial x^{l_2}}\right|_{x=x_i} = \sum_{x_j \in S_m(x_i)} \sum_{x_k \in S_m(x_i)} \xi_{jk}\, u_j u_k + O(\Delta x^{r_1 + r_2}), \tag{12}$$

such that the stencil size $|S_m| \geq |l_1 + r_1 - 1|$ and $|S_m| \geq |l_2 + r_2 - 1|$. Eq. (12) is a convolution with a Volterra quadratic form, as described in Ref.29.

The linear convolutions in Eq. (11) and the nonlinear convolutions in Eq. (12) are based on local polynomial interpolation. They are thus useful to discretize smooth solutions arising in problems like reaction-diffusion, fluid flow at low Reynolds numbers, and elliptic PDEs. But they fail to discretize nonlinear flux terms arising, e.g., in problems like advection-dominated flows, Euler equations, multi-phase flows, level-set and Hamilton-Jacobi equations 19. This is because higher-order interpolation near sharp gradients leads to oscillations that do not decay when refining the grid, see Fig. 3B, a fact known as “Gibbs phenomenon”.
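As a concrete illustration of fixed-weight stencils of the type in Eq. (11), the short NumPy sketch below computes finite-difference weights from the polynomial (moment) conditions and applies them as a periodic linear convolution. The function name and example values are our own, not part of the original article.

```python
import numpy as np
from math import factorial

def fd_stencil_weights(offsets, deriv_order):
    """Weights xi_j with sum_j xi_j u(x + offsets[j]*dx) ~ dx**deriv_order * d^k u / dx^k."""
    offsets = np.asarray(offsets, dtype=float)
    A = np.vander(offsets, increasing=True).T       # rows enforce the discrete moment conditions
    b = np.zeros(len(offsets))
    b[deriv_order] = factorial(deriv_order)
    return np.linalg.solve(A, b)

# Example: 5-point stencil for the 2nd derivative -> [-1/12, 4/3, -5/2, 4/3, -1/12]
w = fd_stencil_weights([-2, -1, 0, 1, 2], deriv_order=2)

# Apply as a periodic linear convolution (Eq. (11))
L, Nx = 2.0 * np.pi, 256
dx = L / Nx
x = np.linspace(0.0, L, Nx, endpoint=False)
u = np.sin(x)
u_xx = sum(w[j] * np.roll(u, -(j - 2)) for j in range(5)) / dx**2   # approx. -sin(x)
```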

Moreover, fixed stencil weights computed from moment conditions can fail to capture the direction of information flow in the data ("upwinding"), leading to non-causal and hence unstable predictions30,31. This can be relaxed by biasing the stencil along the direction of information flow or by constructing weights with smooth flux-splitting methods, like Godunov32 and Lax-Friedrichs33 schemes. To counter the issue of spurious oscillations, artificial viscosity34 can also be added at the cost of lower solution accuracy. Alternatively, data-adaptive stencils with ENO (Essentially Non-Oscillatory)19 or WENO (Weighted ENO)20 weights are available, which seem the natural choice for data-driven forecasting since they not only adhere to moment conditions, but also adapt to the data themselves.

The ENO/WENO method for discretely approximating a continuous function $f$ at a point $x_{i\pm 1/2}$ can be written as a linear convolution:

$$\hat{u}_{i\pm 1/2} = \sum_{x_j \in S_m^{\pm}(x_i)} \nu_j^{\pm}\, u(x_j) + O(\Delta x^{r}), \tag{13}$$

where $u_{i\pm 1/2} = u(x_i \pm \Delta x/2)$, and the function values and coefficients on the stencils are stored in $u_m = \{u(x_j) : x_j \in S_m(x_i)\}$ and $\nu(x_j)$, respectively. The stencils $S_m^{\pm}(x_i)$ are $S_m^{\pm}(x_i) = S_m(x_i) \setminus \{x_{i\pm m}\}$, as illustrated in Fig. 4 for the example of $m=3$. Unlike the fixed-weight convolutions in Eqs. (11) and (12), the coefficients $\nu$ in ENO/WENO stencils are computed based on local smoothness features as approximated on smaller sub-stencils, which in turn depend on the function values. This leads to locally data-adaptive stencil weights and allows accurate and consistent approximations even if the data $u_i$ are highly varying.
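To make the structure of such data-adaptive weights explicit, here is a minimal NumPy sketch of the classic fifth-order WENO(-JS) reconstruction of $\hat{u}_{i+1/2}$ on a periodic grid; the nonlinear weights are rational functions of the stencil values, which is the property exploited below. This is textbook WENO20, not STENCIL-NET code, and the implementation details are our own.

```python
import numpy as np

def weno5_reconstruct(u, eps=1e-6):
    """Upwind-biased fifth-order WENO reconstruction of u at the faces x_{i+1/2} (periodic)."""
    um2, um1, u0, up1, up2 = (np.roll(u, 2), np.roll(u, 1), u,
                              np.roll(u, -1), np.roll(u, -2))
    # candidate reconstructions on the three 3-point sub-stencils
    p0 = (2.0 * um2 - 7.0 * um1 + 11.0 * u0) / 6.0
    p1 = (-um1 + 5.0 * u0 + 2.0 * up1) / 6.0
    p2 = (2.0 * u0 + 5.0 * up1 - up2) / 6.0
    # smoothness indicators of the sub-stencils
    b0 = 13.0 / 12.0 * (um2 - 2 * um1 + u0) ** 2 + 0.25 * (um2 - 4 * um1 + 3 * u0) ** 2
    b1 = 13.0 / 12.0 * (um1 - 2 * u0 + up1) ** 2 + 0.25 * (um1 - up1) ** 2
    b2 = 13.0 / 12.0 * (u0 - 2 * up1 + up2) ** 2 + 0.25 * (3 * u0 - 4 * up1 + up2) ** 2
    # nonlinear weights: the divisions make them rational functions of the stencil values
    a0, a1, a2 = 0.1 / (eps + b0) ** 2, 0.6 / (eps + b1) ** 2, 0.3 / (eps + b2) ** 2
    s = a0 + a1 + a2
    return (a0 * p0 + a1 * p1 + a2 * p2) / s
```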

Figure 4. Illustration of data-adaptive stencils. Schematic of the stencil $S_3(x_i)$ centered around the point $x_i$. Here, $S_3(x_i) = S_3^{+} \cup S_3^{-}$.

The key idea behind data-adaptive stencils is to use a nonlinear map to choose the locally smoothest stencil and avoid discontinuities, resulting in smooth and essentially non-oscillatory solutions. The WENO-approximated function values $\hat{u}_{i\pm 1/2}$ from Eq. (13) can be used to approximate spatial variation in the data as follows:

$$\left.\frac{\partial u}{\partial x}\right|_{x=x_i} = \frac{\hat{u}_{i+1/2} - \hat{u}_{i-1/2}}{\Delta x} + O(\Delta x^{r}). \tag{14}$$

In finite-difference methods, the function $\hat{u}$ is a polynomial flux, e.g., $\hat{u}(u) = c_1 u^2 + c_2 u$, whereas in finite-volume methods, $\hat{u}$ is the grid-cell average $\hat{u} = \frac{1}{\Delta x}\int_{x-\Delta x/2}^{x+\Delta x/2} u(\xi)\, \mathrm{d}\xi$.

It is clear from Eqs. (11) and (12) that polynomials (i.e., nonlinear convolutions) cannot approximate division operations. Thus, methods using fixed, data-independent stencil weights or multiplications of filters10 fail to approximate dynamics with sharp variations. However, as shown in Fig. 2, this can be achieved by rational functions. It is straightforward to see that ENO/WENO stencils are rational functions in the stencil values $u_m$. Indeed, from Eq. (13), the stencil weights are computed as $\nu = \frac{g_1(u_m(x_i))}{g_2(u_m(x_i))}$, with a polynomial $g_1 : \mathbb{R}^{2m+1} \to \mathbb{R}$ and a strictly positive polynomial $g_2 : \mathbb{R}^{2m+1} \to \mathbb{R}$20.

In summary, depending on the smoothness of the data $u_i$, the propagator can either be approximated using fixed stencils that are polynomials in the stencil values (Eqs. (11), (12)) or using solution-adaptive stencils that are rational functions in the stencil values (Eqs. (13), (14)). STENCIL-NET can represent either. This leads to the conclusion that any continuous dynamics $\mathcal{N}(\cdot)$, smooth or not, can be approximated by a polynomial or rational function in the local stencil values $u_m(x_i)$, up to any desired order of accuracy $r$ on a Cartesian grid with resolution $\Delta x$, thus:

$$\mathcal{N}_d\big(u_m(x_i), \Xi\big) = \mathcal{N}\big(u(x), \Xi\big) + O(\Delta x^{r}) \quad \text{for } u_m(x_i) = \{u(x_j) : x_j \in S_m(x_i)\}, \tag{15}$$

where $\mathcal{N}_d(\cdot)$ is the discrete propagator.

Numerical experiments

We apply the STENCIL-NET for learning stable and accurate equation-free forecasting operators for a variety of nonlinear dynamics. For all problems discussed here, we use a single mlpconv unit with Nl=3 hidden layers (not counting the input and output layers), and Exponential Linear Unit (ELU) activation functions. Input to the network are the data um on a stencil of radius m=3 in all examples. We present numerical experiments that demonstrate distinct applications of the STENCIL-NET architecture using data from deterministic dynamics, data from chaotic dynamics, and noisy data.

STENCIL-NET for equation-free forecasting

We first demonstrate the capability of STENCIL-NET to extrapolate space-time dynamics beyond the training domain and to different parameter regimes. For this, we consider the forced Burgers equation in one spatial dimension and time. The forced Burgers equation with a nonlinear forcing term can produce rich and sharply varying solutions. Moreover, we choose the forcing term at random in order to explore generalization to different parts of the solution manifold16. The forced Burgers equation in 1D is:

$$\frac{\partial u}{\partial t} + \frac{\partial (u^2)}{\partial x} = D \frac{\partial^2 u}{\partial x^2} + f(x,t) \tag{16}$$

for the unknown function $u(x,t)$ with diffusion constant $D=0.02$. Here, we use the forcing term

$$f(x,t) = \sum_{i=1}^{N} A_i \sin\big(\omega_i t + 2\pi l_i x / L + \phi_i\big) \tag{17}$$

with each parameter drawn independently and uniformly at random from its respective range: $A_i \in [-0.1, 0.1]$, $\omega_i \in [-0.4, 0.4]$, $\phi_i \in [0, 2\pi]$, and $N=20$. The domain size $L$ is set to $2\pi$ (i.e., $x \in [0, 2\pi]$) with periodic boundary conditions and $l_i \in \{2, 3, 4, 5\}$. We use a smooth initial condition $u(x, t{=}0) = \exp(-(x-3)^2)$ and generate data on $N_x = 256$ evenly spaced grid points with fifth-order WENO discretization of the convection term and second-order central differences for the diffusion term. Time integration is performed using a third-order Runge-Kutta method with time-step size $\Delta t$ chosen as large as possible according to the Courant-Friedrichs-Lewy (CFL) condition of this equation. For larger domains, we adjust the range of $l_i$ so as to preserve the wavenumber spectrum of the dynamics, e.g., for $L=8\pi$ we use $l_i \in \{8, 9, \ldots, 40\}$. We use the same spatial resolution $\Delta x$ for all domain sizes, i.e., the total number of grid points $N_x$ grows proportionally to domain size.
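A minimal NumPy sketch of how such a random forcing term of Eq. (17) can be drawn and evaluated on the grid is given below (variable names and the random seed are our own; the data-generating WENO solver itself is not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(0)
L, Nx, N = 2.0 * np.pi, 256, 20
x = np.linspace(0.0, L, Nx, endpoint=False)

# Draw the forcing parameters of Eq. (17) once, uniformly at random
A     = rng.uniform(-0.1, 0.1, N)
omega = rng.uniform(-0.4, 0.4, N)
phi   = rng.uniform(0.0, 2.0 * np.pi, N)
l     = rng.choice([2, 3, 4, 5], N)

def forcing(t):
    """f(x, t) = sum_i A_i sin(omega_i t + 2 pi l_i x / L + phi_i), evaluated on the grid x."""
    return np.sum(A[:, None] * np.sin(omega[:, None] * t
                  + 2.0 * np.pi * l[:, None] * x[None, :] / L + phi[:, None]), axis=0)

u0 = np.exp(-(x - 3.0) ** 2)     # smooth initial condition used in the text
```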

We train STENCIL-NET at different spatial resolutions $\Delta x_c = C\,\Delta x$, where $C$ is the sub-sampling factor. We use sub-sampling factors $C \in \{2, 4, 8\}$ in space. We sub-sample the data by simply removing intermediate grid points. For example, for $C=2$, we only keep spatial grid points with even indices. During training, time integration is done with a step size that satisfies the CFL condition on the sub-sampled mesh, i.e., $\Delta t_c \lesssim (\Delta x_c)^2 / D$, where $\Delta x_c = C\,\Delta x$ and $\Delta t_c$ are the training space and time resolutions, respectively. Training is done in the full spatial domain of the data for times 0 through 40, independently for each resolution. We then test how well STENCIL-NET is able to forecast the solution to times >40. Due to the steep gradients and rapidly varying dynamics of the Burgers equation, this presents a challenging problem for testing STENCIL-NET's generalization capabilities.
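The coarsening of the training data and the corresponding training time step can be summarized by the short sketch below; the factor 0.5 in the diffusive step-size bound is an illustrative choice, and `u_fine` stands in for the fine-grid WENO snapshots.

```python
import numpy as np

C, D = 4, 0.02                               # sub-sampling factor and diffusion constant
Nx_fine, L = 256, 2.0 * np.pi
dx = L / Nx_fine
u_fine = np.random.randn(1000, Nx_fine)      # placeholder for the fine-grid snapshots (Nt, Nx)

u_coarse = u_fine[:, ::C]                    # drop intermediate grid points
dx_c = C * dx                                # coarse grid spacing
dt_c = 0.5 * dx_c ** 2 / D                   # training time step within the diffusive CFL bound
```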

Figure 5A compares the STENCIL-NET prediction on a four-fold downsampled grid (i.e., $C=4$) with the full-resolution training data at the end of the training time interval. Figure 5B compares the learned nonlinear discrete propagator $\mathcal{N}_\theta$ from the STENCIL-NET with the true discrete $\mathcal{N}_d$ of the simulation that generated the data. It is thanks to the good match in this propagator that STENCIL-NET is able to generalize to parameters and domain sizes beyond the training conditions. Indeed, this is what we observe in Fig. 6, where we compare the fifth-order accurate WENO data on a fine grid ($\Delta x = L/N_x$, $N_x = 256$, $L = 2\pi$) with the STENCIL-NET predictions on coarsened grids with $\Delta x_c = C\,\Delta x$ for sub-sampling factors $C \in \{2, 4, 8\}$ and for longer times. STENCIL-NET produces stable and accurate forecasts on all coarser grids for times >40 beyond the training data (dashed black box). The point-wise absolute prediction error (right column) grows when leaving the training time domain, but does not "explode" and is concentrated around the steep gradients in the solution, as expected.

Figure 5. Comparison between STENCIL-NET (4-fold coarsened) and fifth-order WENO for forced Burgers. (A) Comparison between the STENCIL-NET prediction and WENO data over the entire domain at time $t=40$, i.e., at the end of the training time horizon. (B) Comparison of the nonlinear discrete operators learned by STENCIL-NET ($\mathcal{N}_\theta$) and the ground-truth WENO scheme ($\mathcal{N}_d$) at time $t=40$.

Figure 6. Forced Burgers forecasting with STENCIL-NET on coarser grids. (A) Left: fifth-order WENO ground-truth (GT) data with spatial resolution $N_x = 256$. Right: power spectra of the predictions from different architectures (STENCIL-NET, CNN, FNO) compared to ground truth. (B,C,D) Left column: output of STENCIL-NET on 2×, 4×, and 8× coarser grids. The dashed boxes contain the data used for training at each resolution. Right column: point-wise absolute error of the STENCIL-NET prediction compared to the ground-truth (GT) data in (A) beyond the training domain.

We compare the inference accuracy and computational performance of the mlpconv-based STENCIL-NET with a CNN and with the operator-learning architecture FNO. The CNN relies on local nonlinear convolutions, whereas the FNO learns continuous operators using a combination of nonlinear convolutions and global Fourier modes. Both have previously been proposed for dynamics forecasting from data22,35. In all comparisons, we use a CNN with 5 hidden layers of 10 filters each, and a filter radius m=3 that matches the stencil radius of the STENCIL-NET. For the FNO, we use 5 hidden layers with 6 local convolutional filters per layer. We rule out spectral bias36 by verifying that the fully trained networks can represent the Fourier spectra of the ground-truth at the different sub-samplings tested. The results in Fig. 6 show that all three neural-network approximations are able to capture the power spectrum of the true signal. Next, we compare the mean-square errors of the network predictions at different sub-sampling. The results in Table 1 show that while the FNO performs best for low sub-sampling, the STENCIL-NET has the best generalization power to coarse grids. This is consistent with the expectation that FNO achieves superior accuracy when trained with sufficient data22. However, the FNO predictions become increasingly unstable during inference for high sub-sampling.

Table 1.

Prediction Mean Square Error (MSE) comparison between STENCIL-NET, a Convolutional Neural Network (CNN), and a Fourier Neural Operator (FNO) of comparable sizes (i.e., numbers of trainable parameters) for different sub-sampling of the forced Burgers test case after training for 10,000 epochs with q=2.

Prediction MSE 2× 4× 8×
STENCIL-NET (2264 parameters) 0.0419 0.0392 0.0741
CNN (2281 parameters) 0.0411 0.0562 0.0823
FNO (2665 parameters) 0.0351 0.0488 0.0835

All networks are trained on data up to time T=40, while the errors are evaluated for predictions up to time T=160. Best performance is highlighted in bold.

In contrast to the generative network proposed in Ref.14, which used all time frames for training, the parametric pooling on local stencil patches enables STENCIL-NET to consistently extrapolate also to larger spatial domains, as shown in Fig. 7. Furthermore, as shown in Fig. 8, STENCIL-NET is able to generalize to forcing terms (i.e., dynamics) different from the one used to generate the training data. As seen from the point-wise errors, all architectures perform well within the training window. Also, the largest errors always occur near steep gradients or jumps in the solution, as expected. The FNO, however, develops spurious oscillations for long prediction intervals.

Figure 7. STENCIL-NET extrapolation to larger spatial domains and longer times. (A) STENCIL-NET prediction on a 4× coarser grid. (B) Comparison between the ground-truth discrete propagator $\mathcal{N}_d$ (solid) and the STENCIL-NET layer output $\mathcal{N}_\theta$ (dashed) at times 21 (within training data) and 150 (past training data), marked by dashed vertical lines in (A). The STENCIL-NET was trained on the data within the domain marked by the solid rectangle in (A).

Figure 8. STENCIL-NET generalization to different forcing terms without re-training. Each panel (A,B,C) corresponds to forced Burgers dynamics with a different forcing term (see Eq. (17)). Every second row in each panel corresponds to the point-wise errors associated with STENCIL-NET, CNN, and FNO (left to right). STENCIL-NET was trained only once using data from (A) (training domain marked by dashed box). Nevertheless, it was able to accurately predict the qualitatively different dynamics for the other forcing terms, since it learned a discrete propagator of the dynamics rather than the solution values themselves. The plots in the first column show the ground-truth data for the different forcing terms, obtained by a fifth-order WENO scheme. The three subsequent columns show the predictions on 4× coarser grids without additional (re-)training and the corresponding absolute errors (w.r.t. WENO) below each plot. The three columns compare STENCIL-NET with a CNN and a FNO of comparable complexity (see Table 1).

In addition to its improved generalization power, STENCIL-NET is also computationally more efficient than both the CNN and FNO approaches. This is confirmed in the timings reported in Table 2 for training and inference/simulation, respectively. All times were measured on a Nvidia A100-SXM4 GPU for networks with comparable numbers of trainable parameters.

Table 2.

Top: Average (over 1000 epochs) training time per epoch compared between the STENCIL-NET, CNN, and FNO from Table 1 with 4× spatial sub-sampling for different training time horizons q. Bottom: Average (over 1000 time steps) simulation (inference) time per time-step for the forced Burgers test case with different sub-sampling. Best performance is highlighted in bold as measured on a Nvidia A100-SXM4 GPU.

Training time (sec) q=2 q=4 q=6
STENCIL-NET 0.0145 0.0271 0.0398
CNN 0.0350 0.0668 0.0989
FNO 0.0664 0.1296 0.1926
Simulation time (sec) 2× 4× 8×
STENCIL-NET 0.00181 0.00175 0.00172
CNN 0.00346 0.00343 0.00336
FNO 0.00692 0.00685 0.00610

Finally, we analyze the influence of the choice of STENCIL-NET architecture on the results. Figure 9A,B show the effect of the choice of MLP size and of the weight regularization parameter $\lambda_{wd}$ (see Eq. (10)) for different training time horizons $q$. Figure 9C compares the accuracies of predictions on grids of varying resolution and for different sizes of the space and time domains ($L$ and $T$, respectively). Training was always done for $L=2\pi$, $T=40$. In Fig. 9, we also notice that the prediction error on average decreases for increasing stencil complexity (Fig. 9A) and for increasing mesh resolution (Fig. 9C). This is empirical evidence that STENCIL-NET learns a consistent numerical discretization of the underlying dynamical system.

Figure 9. Architecture choice, choice of $\lambda_{wd}$, and generalization power. All models are trained on the spatio-temporal data contained in the dashed box in Fig. 6. The training box encompasses the entire spatial domain of length $L=2\pi$ and a time duration of $T=40$. (A) STENCIL-NET prediction accuracy (measured as mean squared error, MSE, w.r.t. ground truth) for different network architectures. Only hidden layers are counted, not the input and output layers. Each data point corresponds to the average MSE accumulated over all stable seed configurations. (B) Effect of the weight regularization parameter $\lambda_{wd}$ (see Eq. (10)) on the prediction accuracy for different training time horizons $q$. The plots in (B) were produced with a 3-layer, 64-node network architecture, which showed the best performance in (A) and is also used for all other experiments in this paper. (C) Plots illustrating the generalization power of a trained STENCIL-NET to larger domains. Each data point shows the MSE prediction error from the best model run on the grid of the respective resolution (sub-sampling factor) for different domain lengths $L$ and final times $T$. Training was always done on the data for $L=2\pi$ and $T=40$.

In summary, we find that a STENCIL-NET model trained on a small training domain can generalize well to longer times, larger domains, and to dynamics containing different forcing terms $f(x,t)$ in Eq. (17) in a numerically consistent way. This highlights the importance of using a network architecture that mathematically relates to numerically valid discrete operators.

STENCIL-NET as an autonomous predictor of chaotic dynamics

We next analyze how well the generalization power of STENCIL-NET forecasting transfers to inherently chaotic dynamics. Nonlinear dynamical systems, like the Kuramoto-Sivashinsky (KS) model, can describe chaotic spatio-temporal dynamics. The KS equation exhibits varying levels of chaos depending on the bifurcation parameter L (domain size) of the system5. Given their high sensitivity to numerical errors, KS equations are usually solved using spectral methods. However, data-driven models using Recurrent Neural Networks (RNNs)5,31,37 have also shown success in predicting chaotic systems. This is mainly because RNNs are able to capture long-term temporal correlations and implicitly identify the required embedding for forecasting.

We challenge these results using STENCIL-NET for long-term stable prediction of chaotic dynamics. The training data are obtained from spectral solutions of the KS equation for domain size $L=64$. The KS equation for an unknown function $u(x,t)$ in 1D is:

$$\frac{\partial u}{\partial t} + \frac{\partial (u^2)}{\partial x} + \frac{\partial^2 u}{\partial x^2} + \frac{\partial^4 u}{\partial x^4} = 0. \tag{18}$$

The domain $x \in [-32, 32]$ of length $L=64$ has periodic boundary conditions and is discretized by $N_x = 256$ evenly spaced spatial grid points. We use the initial condition

$$u(x, t{=}0) = \sum_{i=1}^{N} A_i \sin\big(2\pi l_i x / L + \phi_i\big). \tag{19}$$

Each parameter is drawn independently and uniformly at random from its respective range: $A_i \in [-0.5, 0.5]$, $\phi_i \in [0, 2\pi]$, and $l_i \in \{1, 2, 3\}$. We use a spectral method to numerically solve the KS equation using the chebfun39 package. Time integration is performed using a modified exponential time-differencing fourth-order Runge-Kutta method40 with step size $\Delta t = 0.05$. For the STENCIL-NET predictions, we use a grid sub-sampling factor of $C=4$ in space, i.e., $N_c = 64$, and we train on data up to time 12.5. The prediction time-step size is chosen as large as possible to satisfy the CFL condition.

The spatio-temporal data from the chaotic KS system are shown in Fig. 10. The STENCIL-NET predictions on a 4× coarser grid diverge from the true data over time (see bottom row of Fig. 10). This is due to the chaotic behavior of the dynamics for domain lengths $L > 22$, which causes any small prediction error to grow exponentially at a rate proportional to the maximum Lyapunov exponent of the system. Despite this fundamental unpredictability of the actual space-time values, STENCIL-NET is able to correctly predict the value of the maximum Lyapunov exponent and the spectral statistics of the system (see Fig. 11). This is evidence that the equation-free STENCIL-NET forecast is consistent in the sense that it has correctly learned the intrinsic ergodic properties of the dynamics that generated the data. In Fig. 12, we show a STENCIL-NET forecast on a 4× coarser grid for a different initial condition obtained from Eq. (19) with a different random seed, and for longer times (8×) than it was trained for. This shows that the statistically consistent propagator learned by STENCIL-NET can also be run as an autonomous predictor of chaotic dynamics beyond the training domain.
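A sketch of how the maximum Lyapunov exponent can be estimated from the divergence of two nearby trajectories, as in Fig. 11 (right), is given below. The callable `step` stands in for one coarse-grid STENCIL-NET time step; the function name, perturbation size, and fitting window are illustrative assumptions, not the procedure used to produce the published figure.

```python
import numpy as np

def max_lyapunov_estimate(step, u0, dt, n_steps, delta0=1e-7, fit_fraction=0.5):
    """Estimate the maximum Lyapunov exponent as the slope of log(distance) vs. time
    for two trajectories started a distance delta0 apart and evolved with `step`."""
    rng = np.random.default_rng(1)
    u = u0.copy()
    w = u0 + delta0 * rng.standard_normal(u0.shape)
    log_dist = []
    for _ in range(n_steps):
        u, w = step(u), step(w)
        log_dist.append(np.log(np.linalg.norm(w - u)))
    t = dt * np.arange(1, n_steps + 1)
    k = int(fit_fraction * n_steps)          # fit only the initial exponential-growth regime
    slope, _ = np.polyfit(t[:k], np.array(log_dist[:k]), 1)
    return slope
```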

Figure 10. Equation-free forecasting of chaotic Kuramoto-Sivashinsky spatio-temporal dynamics. (Top) Spectral solution of the KS equation on a domain of length $L=64$, where the system behaves chaotically. Data within the dashed rectangle are used for training of the STENCIL-NET model. (Middle) STENCIL-NET forecast on a 4× coarser grid for a 4× longer time. (Bottom) Point-wise absolute difference between the ground-truth data in the top row and the STENCIL-NET forecast in the middle row.

Figure 11. Statistical characteristics of the chaotic system. (Left) Comparison of the power spectral density (PSD) of the ground-truth solution and the STENCIL-NET prediction on a 4× coarser grid. (Right) Growth of the distance between nearby trajectories in the STENCIL-NET prediction, characterizing the maximum Lyapunov exponent (slope of the dashed red line), compared to the ground-truth value of 0.08438.

Figure 12. Autonomous prediction of chaotic spatio-temporal dynamics. STENCIL-NET can run as an autonomous predictor of long-term chaotic dynamics with a different initial condition and for longer times than it was trained for (training data in the dashed rectangle in Fig. 10) and on a (here 4×) coarser grid.

STENCIL-NET for learning smooth dynamics from noisy data

The discrete time-stepping constraints force any STENCIL-NET prediction to follow a smooth time trajectory. This property can be exploited for filtering true dynamics from noise. We demonstrate this de-noising capability using numerical solution data from the Korteweg-de Vries (KdV) equation with artificially added noise. The KdV equation for an unknown function $u(x,t)$ in 1D is:

$$\frac{\partial u}{\partial t} + \frac{\partial (u^2)}{\partial x} + \delta \frac{\partial^3 u}{\partial x^3} = 0, \tag{20}$$

where we use $\delta = 0.0025$. We again use a spectral method to generate numerical data from the KdV equation using the chebfun39 package. The domain is $x \in [-1, 1]$ with periodic boundary conditions, and the initial condition is $u(x, t{=}0) = \cos(\pi x)$. The spectral solution is represented on $N_x = 256$ equally spaced grid points discretizing the spatial domain.

We corrupt the data vector $U = [u(x_i, t_j)]_{(i,j)} \in \mathbb{R}^{N_x N_t}$ with element-wise independent additive Gaussian noise:

$$V = U + \eta, \tag{21}$$

where each element of $\eta$ is drawn as $\eta \sim \sigma\, \mathcal{N}(0, \mathrm{std}^2(U))$, where $\mathcal{N}(m, V)$ is the normal distribution with mean $m$ and variance $V$, and $\mathrm{std}(U)$ is the empirical standard deviation of the data $U$, rendering the noise data-dependent. The parameter $\sigma$ is the magnitude of the noise.
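The noise corruption of Eq. (21) amounts to the following short NumPy sketch; the placeholder array `U` stands in for the spectral KdV solution.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((256, 101))    # placeholder for the clean KdV snapshots u(x_i, t_j)
sigma = 0.1                            # noise magnitude used in the text

# element-wise Gaussian noise with standard deviation sigma * std(U), cf. Eq. (21)
V = U + sigma * U.std() * rng.standard_normal(U.shape)
```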

We use σ=0.1 and train a STENCIL-NET over the entire space-time extent of the noisy data with C=4-fold sub-sampling in space and a training time-step size of Δt=0.02. We use a third-order TVD Runge-Kutta method for time integration up to final time T=1 during training. With this configuration, STENCIL-NET is able to learn a stable and accurate propagator for the discretized KdV dynamics from the noisy data V, enabling it to separate data from noise, as shown in Fig. 13.

Figure 13. STENCIL-NET for learning smooth dynamics from noisy data. (A) Korteweg-de Vries data with point-wise data-dependent additive Gaussian noise of $\sigma = 0.1$ used for training. (B) STENCIL-NET prediction after training for 10,000 epochs on a 4× coarser grid using data up to time 1.0 (dashed vertical line). (C) Point-wise absolute error between the STENCIL-NET prediction and the noise-free ground-truth KdV dynamics.

Also for this noisy case, we compare STENCIL-NET with CNN and FNO architectures, as in the Burgers case without noise. The results in Table 3 show that for low sub-sampling factors, the CNN better predicts the ground-truth signal than the STENCIL-NET, but the STENCIL-NET generalizes significantly better to coarser grids. However, we were unable to train a stable STENCIL-NET with 2264 parameters (used on 4× and 8× downsampling) for prediction with 2× downsampling. This difference in parameterization could cause the spectral properties of the neural networks to differ, such that error estimates cannot be compared across resolutions in this case. Like in the forced Burgers case, the STENCIL-NET is computationally more efficient than both the CNN and FNO (Table 4). Taken together, this example shows the potentially powerful capabilities of STENCIL-NET to extract consistent dynamics from noisy data.

Table 3.

Prediction Mean Square Error (MSE) comparison for the Korteweg-de Vries (KdV) test case with different spatial sub-sampling and q=4 after 10,000 epochs of training.

Prediction MSE 2× 4× 8×
STENCIL-NET 0.03276 0.01024 0.02509
CNN 0.01532 0.02174 0.07613
FNO 0.03078 0.04860 0.06978

For 4× and 8× sub-sampling, the STENCIL-NET has 2264 parameters, the CNN 2281, and the FNO 2665. For 2× sub-sampling, the STENCIL-NET has 8897 parameters, the CNN 8761, and the FNO 8681. In all cases, the prediction error is averaged over a time window two times longer than the training domain. MSEs are averaged over all stable configurations from 5 independent repetitions of the experiment with different network initializations.

Best performance is highlighted in bold.

Table 4.

Top: Average (over 1000 epochs) training time per epoch compared between the STENCIL-NET, CNN, and FNO from Table 3 with 4× spatial sub-sampling for different training time horizons q. Bottom: Average (over 50 time steps) simulation/inference time per time-step for the KdV test case with different sub-sampling. Best performance is highlighted in bold as measured on a Nvidia A100-SXM4 GPU.

Training time (sec) q=2 q=4 q=6
STENCIL-NET 0.01395 0.02582 0.03770
CNN 0.03880 0.07572 0.1132
FNO 0.06878 0.1284 0.1902
Simulation time (sec) 2× 4× 8×
STENCIL-NET 0.00120 0.00119 0.00111
CNN 0.00312 0.00307 0.00303
FNO 0.00640 0.00600 0.00559

Conclusions

We have presented the STENCIL-NET architecture for equation-free forecasting from data. STENCIL-NET uses patch-wise parametric pooling and discrete time-integration constraints to learn propagators of the discrete dynamics on multiple resolution levels. The design of the STENCIL-NET architecture rests on a formal connection between MLP convolutional layers, rational functions, and solution-adaptive WENO finite-difference schemes. This renders STENCIL-NET approximations valid numerical discretizations of some latent nonlinear dynamics. The accuracy of the predictions also translates to better generalization power and extrapolation stability to coarser resolutions than both Convolutional Neural Networks (CNNs) and Fourier Neural Operators (FNOs), while being computationally more efficient in both training and inference/simulation. Through spectral analysis, we also found that all tested neural architectures capture the power spectrum of the true dynamics, discounting the effects of spectral bias when checking for consistency. We have thus shown that STENCIL-NET can be used as a fast and accurate forecaster of nonlinear dynamics, for model-free autonomous prediction of chaotic dynamics, and for detecting latent dynamics in noisy data.

STENCIL-NET provides a general template for learning representations conditioned on discretized numerical operators in space and time. It combines the expressive power of neural networks with the inductive bias enforced by numerical time-stepping, leveraging the two for stable and accurate data-driven forecasting. Beyond being a fast and accurate extrapolator or surrogate model, achieving three to four orders of magnitude speedup over traditional numerical solvers for cases where governing equations are known, a STENCIL-NET can be repurposed to learn closure corrections in computational fluid dynamics and in active material models. Along the same lines, a STENCIL-NET can also be repurposed to learn corrections in coarse-grained discretizations using existing numerical methods (e.g., spectral methods or finite-volume methods) in a hybrid machine-learning/simulation workflow. Finally, since STENCIL-NET is able to directly operate on noisy data, as we have shown here, it can also be used to decompose spatiotemporal dynamics from "propagator-free" noise in application areas such as biology, neuroscience, and finance, where physical models of the true dynamics may not be available.

Future work includes extensions of the STENCIL-NET architecture to 2D and 3D problems over time and to delayed coordinates for inferring latent variables. In addition, combined learning of local and global stencils could be explored.

In the interest of reproducibility, we publish our GPU and multi-core CPU Python implementation of STENCIL-NET and also make all of our trained models and raw training data available to users. They are available from https://github.com/mosaic-group/STENCIL-NET.

Acknowledgements

This work was supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) as part of the Cluster of Excellence “Physics of Life” under code EXC-2068, and by the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF) as part of the Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig. The present work was also partly funded by the Center for Advanced Systems Understanding (CASUS), which is financed by Germany’s Federal Ministry of Education and Research (BMBF) and by the Saxon Ministry for Science, Culture and Tourism (SMWK) with tax funds on the basis of the budget approved by the Saxon State Parliament.

Author contributions

S.M.: concept, algorithm and code development, results, results analysis, figures, writing initial draft. D.S.: code implementation, results, figures. B.L.C.: algorithm development, results analysis, approved manuscript. C.L.M.: concept, initial idea, literature, approved manuscript. I.F.S.: initial idea, concept, project supervision, results plan, results analysis, funding acquisition, manuscript editing.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Code availability

The source code, trained models, and data reported in this study are available at: https://github.com/mosaic-group/STENCIL-NET.

Competing interests

The authors declare no competing interests. This study contains no data obtained from human or living samples.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Karnakov, P., Litvinov, S. & Koumoutsakos, P. Optimizing a DIscrete Loss (ODIL) to solve forward and inverse problems for partial differential equations using machine learning tools. arXiv preprint arXiv:2205.04611 (2022).
2. Pilva, P. & Zareei, A. Learning time-dependent PDE solver using message passing graph neural networks. arXiv preprint arXiv:2204.07651 (2022).
3. Pathak, J., Hunt, B., Girvan, M., Lu, Z. & Ott, E. Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach. Phys. Rev. Lett. 120, 024102 (2018). doi:10.1103/PhysRevLett.120.024102.
4. Pathak, J., Lu, Z., Hunt, B. R., Girvan, M. & Ott, E. Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data. Chaos: Interdiscip. J. Nonlinear Sci. 27, 121102 (2017). doi:10.1063/1.5010300.
5. Vlachas, P. R., Byeon, W., Wan, Z. Y., Sapsis, T. P. & Koumoutsakos, P. Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks. Proc. R. Soc. A 474, 20170844 (2018).
6. Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. 113, 3932–3937 (2016). doi:10.1073/pnas.1517384113.
7. Rudy, S. H., Brunton, S. L., Proctor, J. L. & Kutz, J. N. Data-driven discovery of partial differential equations. Sci. Adv. 3, e1602614 (2017). doi:10.1126/sciadv.1602614.
8. Maddu, S., Cheeseman, B. L., Sbalzarini, I. F. & Müller, C. L. Stability selection enables robust learning of differential equations from limited noisy data. Proc. R. Soc. A 478, 20210916 (2022). doi:10.1098/rspa.2021.0916.
9. Long, Z., Lu, Y., Ma, X. & Dong, B. PDE-Net: Learning PDEs from data. arXiv preprint arXiv:1710.09668 (2017).
10. Long, Z., Lu, Y. & Dong, B. PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network. J. Comput. Phys. 399, 108925 (2019). doi:10.1016/j.jcp.2019.108925.
11. Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics informed deep learning (part I): Data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561 (2017).
12. Maddu, S., Sturm, D., Müller, C. L. & Sbalzarini, I. F. Inverse Dirichlet weighting enables reliable training of physics informed neural networks. Mach. Learn.: Sci. Technol. 3, 015026 (2022).
13. Tompson, J., Schlachter, K., Sprechmann, P. & Perlin, K. Accelerating Eulerian fluid simulation with convolutional networks. In Proceedings of the 34th International Conference on Machine Learning 70, 3424–3433 (JMLR.org, 2017).
14. Kim, B. et al. Deep fluids: A generative network for parameterized fluid simulations. In Computer Graphics Forum, vol. 38, 59–70 (Wiley Online Library, 2019).
15. Chen, T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, 6571–6583 (2018).
16. Bar-Sinai, Y., Hoyer, S., Hickey, J. & Brenner, M. P. Learning data-driven discretizations for partial differential equations. Proc. Natl. Acad. Sci. 116, 15344–15349 (2019). doi:10.1073/pnas.1814058116.
17. Mishra, S. A machine learning framework for data driven acceleration of computations of differential equations. arXiv preprint arXiv:1807.09519 (2018).
18. Queiruga, A. F., Erichson, N. B., Taylor, D. & Mahoney, M. W. Continuous-in-depth neural networks. arXiv preprint arXiv:2008.02389 (2020).
19. Osher, S., Fedkiw, R. & Piechor, K. Level set methods and dynamic implicit surfaces. Appl. Mech. Rev. 57, B15 (2004). doi:10.1115/1.1760520.
20. Shu, C.-W. Essentially non-oscillatory and weighted essentially non-oscillatory schemes for hyperbolic conservation laws. In Advanced Numerical Approximation of Nonlinear Hyperbolic Equations, 325–432 (Springer, Berlin, Heidelberg, 1998).
21. Lin, M., Chen, Q. & Yan, S. Network in network. arXiv preprint arXiv:1312.4400 (2013).
22. Li, Z. et al. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895 (2020).
23. Rudy, S. H., Kutz, J. N. & Brunton, S. L. Deep learning of dynamics and signal-noise decomposition with time-stepping constraints. J. Comput. Phys. 396, 483–506 (2019). doi:10.1016/j.jcp.2019.06.056.
24. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
25. Newman, D. J. Rational approximation to |x|. Mich. Math. J. 11, 11–14 (1964). doi:10.1307/mmj/1028999029.
26. Telgarsky, M. Neural networks and rational functions. In Proceedings of the 34th International Conference on Machine Learning 70, 3387–3393 (JMLR.org, 2017).
27. Fornberg, B. Generation of finite difference formulas on arbitrarily spaced grids. Math. Comput. 51, 699–706 (1988). doi:10.1090/S0025-5718-1988-0935077-0.
28. Schrader, B., Reboux, S. & Sbalzarini, I. F. Discretization correction of general integral PSE operators for particle methods. J. Comput. Phys. 229, 4159–4182 (2010). doi:10.1016/j.jcp.2010.02.004.
29. Zoumpourlis, G., Doumanoglou, A., Vretos, N. & Daras, P. Non-linear convolution filters for CNN-based learning. In Proceedings of the IEEE International Conference on Computer Vision, 4761–4769 (2017).
30. Courant, R., Isaacson, E. & Rees, M. On the solution of nonlinear hyperbolic differential equations by finite differences. Commun. Pure Appl. Math. 5, 243–255 (1952). doi:10.1002/cpa.3160050303.
31. Patankar, S. Numerical Heat Transfer and Fluid Flow (Washington, 1980).
32. Godunov, S. K. A difference method for numerical calculation of discontinuous solutions of the equations of hydrodynamics. Mat. Sb. 47(89), 271–306 (1959).
33. Lax, P. D. Weak solutions of nonlinear hyperbolic equations and their numerical computation. Commun. Pure Appl. Math. 7, 159–193 (1954). doi:10.1002/cpa.3160070112.
34. Sod, G. A. Numerical Methods in Fluid Dynamics: Initial and Initial Boundary-Value Problems (Cambridge University Press, 1985).
35. Kovachki, N. B. et al. Neural operator: Learning maps between function spaces. arXiv preprint arXiv:2108.08481 (2021).
36. Rahaman, N. et al. On the spectral bias of neural networks. In International Conference on Machine Learning, 5301–5310 (PMLR, 2019).
37. Vlachas, P. R. et al. Backpropagation algorithms and reservoir computing in recurrent neural networks for the forecasting of complex spatiotemporal dynamics. Neural Netw. 126, 191–217 (2020). doi:10.1016/j.neunet.2020.02.016.
38. Edson, R. A., Bunder, J. E., Mattner, T. W. & Roberts, A. J. Lyapunov exponents of the Kuramoto-Sivashinsky PDE. ANZIAM J. 61, 270–285 (2019).
39. Driscoll, T. A., Hale, N. & Trefethen, L. N. Chebfun Guide (2014).
40. Kassam, A.-K. & Trefethen, L. N. Fourth-order time-stepping for stiff PDEs. SIAM J. Sci. Comput. 26, 1214–1233 (2005). doi:10.1137/S1064827502410633.


Data Availability Statement

The source code, trained models, and data reported in this study are available at: https://github.com/mosaic-group/STENCIL-NET.

