Abstract
We present a novel weak formulation and discretization for discovering governing equations from noisy measurement data. This method of learning differential equations from data fits into a new class of algorithms that replace pointwise derivative approximations with linear transformations and variance reduction techniques. Compared to the standard SINDy algorithm presented in [S. L. Brunton, J. L. Proctor, and J. N. Kutz, Proc. Natl. Acad. Sci. USA, 113 (2016), pp. 3932–3937], our so-called weak SINDy (WSINDy) algorithm allows for reliable model identification from data with large noise (often with ratios greater than 0.1) and reduces the error in the recovered coefficients to enable accurate prediction. Moreover, the coefficient error scales linearly with the noise level, leading to high-accuracy recovery in the low-noise regime. Altogether, WSINDy combines the simplicity and efficiency of the SINDy algorithm with the natural noise reduction of integration, as demonstrated in [H. Schaeffer and S. G. McCalla, Phys. Rev. E, 96 (2017), 023302], to arrive at a robust and accurate method of sparse recovery.
Keywords: data-driven model selection, nonlinear dynamics, sparse recovery, generalized least squares, Galerkin method, adaptive grid
AMS subject classifications: 37M10, 62J99, 62-07, 65R99
1. Problem statement.
Consider a first-order dynamical system in $d$ dimensions of the form

(1.1) $\dot{x}(t) = F(x(t)), \qquad x(0) = x_0 \in \mathbb{R}^d, \qquad t \in [0, T],$

and measurement data given at timepoints $(t_1, \dots, t_K)$ by

$y_k = x(t_k) + \epsilon_k, \qquad k \in [K],$

where throughout we use the bracket notation $[K] := \{1, \dots, K\}$. The variable $\epsilon \in \mathbb{R}^{K \times d}$ represents a matrix of independent and identically distributed measurement noise. The focus of this article is the reconstruction of the dynamics (1.1) from the noisy measurements $Y := (y_1; \dots; y_K) \in \mathbb{R}^{K \times d}$.
The SINDy algorithm (sparse identification of nonlinear dynamics [4]) has been shown to be successful in solving this problem for sparsely represented nonlinear dynamics when noise is small and dynamic scales do not vary across multiple orders of magnitude. This framework assumes that the function $F$ in (1.1) is given componentwise by
(1.2) $F_j(x) = \sum_{i=1}^{J} w^\star_{ij}\, f_i(x), \qquad j \in [d],$

for some known family of functions $(f_i)_{i \in [J]}$ and a sparse weight matrix $W^\star = (w^\star_{ij}) \in \mathbb{R}^{J \times d}$. The problem is then transformed into solving for $W^\star$ by building a data matrix $\Theta(Y)$ given by

$\Theta(Y) := \big( f_1(Y) \;\; f_2(Y) \;\; \cdots \;\; f_J(Y) \big) \in \mathbb{R}^{K \times J},$

so that the candidate functions are directly evaluated at the noisy data. Solving (1.1) for $W^\star$ then reduces to identifying a sparse weight matrix $\widehat{W}$ such that

(1.3) $\dot{Y} \approx \Theta(Y)\, \widehat{W},$

where $\dot{Y}$ is a numerical time derivative of the data $Y$. Sequentially thresholded least squares is then used to arrive at a sparse solution.
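For concreteness, the sequential-thresholding loop admits a compact implementation. The following Python sketch is illustrative only (the function name, arguments, and iteration cap are ours, not from [4]): it alternates an unregularized least squares solve with hard thresholding of small coefficients.

```python
import numpy as np

def stls(Theta, dY, lam, maxit=10):
    """Sequentially thresholded least squares (illustrative sketch).

    Theta : (K, J) library matrix, columns f_i evaluated at the data
    dY    : (K, d) approximate time derivatives of the data
    lam   : threshold below which coefficients are zeroed out
    """
    W = np.linalg.lstsq(Theta, dY, rcond=None)[0]       # initial least squares fit
    for _ in range(maxit):
        small = np.abs(W) < lam                          # candidate terms to prune
        W[small] = 0.0
        for j in range(dY.shape[1]):                     # refit surviving terms per coordinate
            keep = ~small[:, j]
            if keep.any():
                W[keep, j] = np.linalg.lstsq(Theta[:, keep], dY[:, j], rcond=None)[0]
    return W
```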
1.1. Background.
Research into statistically rigorous selection of mathematical models from data can be traced back to Akaike’s seminal work in the 1970s [1, 2]. In the last 20 years, there has been substantial work in this area at the interface between applied mathematics, computer science, and statistics (see [3, 11, 12, 19, 22, 23] for both theory and applications). More recently, the formulation of system discovery problems in terms of a candidate basis of nonlinear functions (1.2) and subsequent discretization (1.3) was introduced in [21] in the context of catastrophe prediction. The authors of [21] used compressed sensing techniques to enforce sparsity. Since then there has been an explosion of interest in the problem of identifying nonlinear dynamical systems from data, with some of the primary techniques being Gaussian process regression [15], deep neural networks [16], Bayesian inference [26, 27], and classical methods from numerical analysis [7, 9, 25]. These techniques have been successfully applied to the discovery of both ordinary and partial differential equations.
The variety of discovery algorithms differ qualitatively in the interpretability of the resulting data-driven dynamical system, the scope and efficiency of the algorithm, and the robustness to noise, scale separation, etc. For instance, a neural-network-based data-driven dynamical system does not easily lend itself to physical interpretation, while the SINDy algorithm identifies governing equations which can be analyzed directly. Moreover, it is well known that the training stage for neural networks and other iterative learning algorithms can be computationally costly. Concerning the scope of an algorithm, several methods have been independently developed to discover models under the assumption of some prior knowledge of the governing equations, notably for low-degree polynomial chaotic systems, cyclic ODEs, interacting particles, and Hamiltonian dynamics [20, 18, 13, 24]. In each of these cases the authors derive probabilistic recovery guarantees depending on the number of available trajectories, the size of the candidate model library, the level of incoherence of the data, and/or the sparsity of the governing equations.
The vast majority of algorithms and recovery guarantees assume that pointwise derivatives of the data either are available or can be reliably computed. This severely limits an algorithm's robustness to noise and hence its applicability to real-world data. Here we relax this assumption and provide rigorous justification for the weak formulation of the dynamics as a means to circumvent this ubiquitous problem in model selection. Building on the SINDy framework, we present the robust discovery algorithm WSINDy (weak SINDy), which operates under the assumption that the time derivative is unavailable and that the only prior knowledge of the governing equations is their inclusion in a large model library. We also focus on the realistic scenario where only a single noisy trajectory of the state variable is available; however, extension to multiple trajectories is of course possible. For simplicity, we restrict numerical experiments to autonomous ODEs for their amenability to analysis. Natural next steps are to explore identification of PDEs and nonautonomous dynamical systems. We note that the use of integral equations for system identification was introduced in [17], where compressed sensing techniques were used to enforce sparsity, and that this technique can be seen as a special case of the method introduced here.
In section 2 we introduce the algorithm with analysis of the resulting error structure. Section 3 contains numerical results showing identification of six ODE systems over a range of noise levels and parameter regimes. In section 4, we provide concluding remarks as well as natural next directions for this line of research. In Appendix A we include a detailed comparison between WSINDy and SINDy as well as further information on the generalized least squares method.
2. WSINDy.
We approach the problem of system identification (1.3) from a nonstandard perspective by utilizing the weak form of the differential equation. Recall that for any smooth test function $\phi$ (absolutely continuous is sufficient) and interval $[a, b] \subset [0, T]$, (1.1) admits the weak formulation

(2.1) $\phi(b)\, x(b) - \phi(a)\, x(a) = \int_a^b \phi'(t)\, x(t)\, dt + \int_a^b \phi(t)\, F(x(t))\, dt.$

With $\phi \equiv 1$, we arrive at the integral equation of the dynamics explored in [17]. If we instead take $\phi$ to be nonconstant and compactly supported in $(a, b)$, we arrive at

(2.2) $-\int_a^b \phi'(t)\, x(t)\, dt = \int_a^b \phi(t)\, F(x(t))\, dt.$
Assuming a representation of the form (1.2), we then define the generalized residual for a given test function $\phi$ by replacing $F$ with a candidate element $\Theta\, w := \sum_i w_i f_i$ from the span of $(f_i)_{i \in [J]}$ and $x$ with the data $y$ as follows:

(2.3) $\mathcal{R}(w; \phi) := \int_a^b \phi'(t)\, y(t)\, dt + \int_a^b \phi(t)\, \Theta(y(t))\, w\, dt.$

Clearly, with $w = w^\star$ (a column of $W^\star$) and $y = x$ we have $\mathcal{R}(w^\star; \phi) = 0$ for all $\phi$ compactly supported in $(a, b)$; however, $Y$ is a discrete set of data, so (2.3) can at best be approximated numerically. Measurement noise then presents a significant barrier to accurate identification of $W^\star$.
2.1. Method overview.
For analogy with traditional Galerkin methods, consider the forward problem of solving a dynamical system such as (1.1) for $x$. The Galerkin approach is to seek a solution $\hat{x}$ represented in a chosen trial basis $(\psi_j)_{j \in [J]}$ such that the residual $\mathcal{R}(\hat{x})$, defined by

$\mathcal{R}(\hat{x}) := \dot{\hat{x}} - F(\hat{x}),$

is minimized over all test functions living in the span of a given test function basis $(\phi_k)_{k \in [N]}$. If the trial and test function bases are known analytically, inner products of the form $\langle \phi_k, \psi_j \rangle$ appearing in the residual can be computed exactly. Thus, the computational error results only from representing the solution in a finite-dimensional function space.
The method we present here can be considered a data-driven Galerkin method of solving (2.2) for $W$, where the trial "basis" is given by the set of gridfunctions $(f_i(Y))_{i \in [J]}$ evaluated at the data and only the test function basis $(\phi_k)_{k \in [N]}$ is known analytically. In this way, inner products appearing in the residual must be approximated numerically, implying that the accuracy of the recovered weights is ultimately limited by the quadrature scheme used to discretize the inner products. Using Lemma 2 below, we show that the correct coefficients may be recovered to effective machine precision accuracy (given by the tolerance of the forward ODE solver) from noise-free trajectories by discretizing (2.2) using the trapezoidal rule and choosing $\phi$ to decay smoothly to zero at the boundaries of its support. Specifically, in this article we demonstrate this fact by choosing test functions from a particular family of unimodal piecewise polynomials defined in (2.6).
Having chosen a quadrature scheme, the next accuracy barrier is presented by measurement noise, which introduces randomness into the residuals $\mathcal{R}(w; \phi_k)$. Numerical integration then couples the residuals for $\phi_j$ and $\phi_k$ whenever $\phi_j$ and $\phi_k$ have overlapping support. In this way, the residual does not have an ideal error structure for ordinary least squares but may be amenable to generalized least squares. Below we analyze the distribution of the residuals to arrive at a generalized least squares approach where an approximate covariance matrix can be computed directly from the test functions. This analysis also suggests that placing test functions near steep gradients in the dynamics may improve recovery; hence we develop a derivative-free method for adaptively clustering test functions near steep gradients.
Remark 1.
The weak formulation of the dynamics introduces a wealth of information: given $K$ timepoints, (2.2) affords $O(K^2)$ residuals, one for each possible support $[a, b] = [t_i, t_j]$ with $i < j$. Of course, one could also assimilate the responses of multiple families of test functions; however, the computational complexity of such an exhaustive approach quickly becomes intractable. We stress that even with large noise, our proposed method identifies the correct nonlinearities with accurate weight recovery while keeping the number of test functions $N$ lower than the number of timepoints $K$.
2.2. Algorithm: WSINDy.
We state here the WSINDy algorithm in full generality. We propose a generalized least squares approach with approximate covariance matrix $\mathcal{C}$. Below we derive a particular choice of $\mathcal{C}$ which utilizes the action of the test functions on the data $Y$. Sequential thresholding on the weight coefficients with thresholding parameter $\lambda$ is used to enforce sparsity, where a $\lambda$ smaller in magnitude than the smallest nonzero true coefficient is necessary for recovery. Lastly, an $\ell^2$-regularization term with coefficient $\gamma$ is included for problems involving rank deficiency. Methods of choosing optimal values of $\lambda$ and $\gamma$ directly from a given dataset do exist, for instance, by selecting the optimal position in a Pareto front [5]; however, this is not the focus of our current study, and thus we select values that work across multiple examples. Specifically, in the experiments below we set $\gamma = 0$ with the exception of the nonlinear pendulum and the five-dimensional linear system, examples which show that regularization can be used to discover dynamics from excessively large libraries. For noise-free data the algorithm is only weakly dependent on $\lambda$, while for noisy data the values of $\lambda$ used are reported with each experiment.
- Construct the matrix of trial gridfunctions $\Theta(Y) = ( f_1(Y) \;\cdots\; f_J(Y) )$.
- Construct integration matrices $V, V' \in \mathbb{R}^{N \times K}$, such that $(V \Theta(Y))_{ki}$ and $(V' Y)_{kj}$ discretize $\int \phi_k\, f_i(y)\, dt$ and $\int \phi'_k\, y_j\, dt$, respectively.
- Compute the Gram matrix $G := V\, \Theta(Y)$ and right-hand side $b := -V' Y$, so that $G \in \mathbb{R}^{N \times J}$ and $b \in \mathbb{R}^{N \times d}$.
- Solve the generalized least squares problem with $\ell^2$-regularization

  $\widehat{W} := \arg\min_{W} \left\| \mathcal{C}^{-1/2} (G W - b) \right\|_2^2 + \gamma^2 \| W \|_2^2,$

  using sequential thresholding with parameter $\lambda$ to enforce sparsity.
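The linear algebra above fits in a short script. The following Python sketch assumes the integration matrices V and Vp (for $\phi_k$ and $\phi'_k$) have already been assembled, uses the approximate covariance $\mathcal{C} = V'(V')^T$ derived in section 2.3.1, and adds a small diagonal jitter before the Cholesky factorization for numerical safety; all names are illustrative.

```python
import numpy as np

def wsindy_solve(V, Vp, Theta, Y, lam=1e-4, gamma=0.0, maxit=10):
    """Sketch of the WSINDy generalized least squares solve with thresholding."""
    G = V @ Theta                                  # Gram matrix, (N, J)
    b = -Vp @ Y                                    # right-hand side, (N, d)
    C = Vp @ Vp.T + 1e-12 * np.eye(V.shape[0])     # approximate covariance + jitter
    L = np.linalg.cholesky(C)                      # C = L L^T
    Gw, bw = np.linalg.solve(L, G), np.linalg.solve(L, b)   # whitened system
    # l2-regularization via row stacking: min ||Gw W - bw||^2 + gamma^2 ||W||^2
    A = np.vstack([Gw, gamma * np.eye(G.shape[1])])
    rhs = np.vstack([bw, np.zeros((G.shape[1], Y.shape[1]))])
    W = np.linalg.lstsq(A, rhs, rcond=None)[0]
    for _ in range(maxit):                         # sequential thresholding
        small = np.abs(W) < lam
        W[small] = 0.0
        for j in range(W.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                W[keep, j] = np.linalg.lstsq(A[:, keep], rhs[:, j], rcond=None)[0]
    return W
```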
With this as our core algorithm, we can now consider a residual analysis (section 2.3) leading to a generalized least squares framework. We can also develop theoretical results related to the test functions (section 2.4), yielding a more thorough understanding of the impact of using uniform (section 2.4.1) and adaptive (section 2.4.2) placement of test functions along the time axis.
2.3. Residual analysis.
Performance of WSINDy is determined by the behavior of the residuals

$r_k(W) := (G W - b)_k, \qquad k \in [N],$

denoted $R(W) := G W - b$ for the entire residual matrix. Here we analyze the residual for autonomous $F$ to highlight key aspects for future analysis, as well as to arrive at an appropriate choice of approximate covariance $\mathcal{C}$. We also provide a heuristic argument in favor of placing test functions near steep gradients in the dynamics.
A key difficulty in recovering the true weights is that for nonlinear systems the residual evaluated at the true weights is biased: $\mathbb{E}[R(W^\star)] \neq 0$. Any minimization of $R$ thus introduces a bias in the recovered weights $\widehat{W}$. Nevertheless, we can understand how different test functions impact the residual by linearizing around the true trajectory $X$ and isolating the dominant error terms:

$R(W) = e_0 + e_\Theta + e_{\dot{y}} + e_{\mathrm{int}} + e_{\mathrm{hot}},$

where $G^\star := V\, \Theta(X)$ denotes the noise-free Gram matrix. The errors manifest in the following ways:

- $e_0 := G^\star (W - W^\star)$ is the misfit between the candidate weights $W$ and the true weights $W^\star$.
- $e_\Theta := V\, (\Theta(Y) - \Theta(X))\, W \approx V\, (\nabla\Theta(X) \cdot \epsilon)\, W$ results from measurement error in the trial gridfunctions.
- $e_{\dot{y}} := V' \epsilon$ results from replacing $x$ with $Y$ in the left-hand side of (2.2).
- $e_{\mathrm{int}}$ is a deterministic integration error.
- $e_{\mathrm{hot}}$ is the remainder term in the truncated Taylor expansion of $\Theta(Y)$ around $X$:

  $e_{\mathrm{hot}} = O(\|\epsilon\|^2).$
Clearly, recovery of $W^\star$ when $\epsilon = 0$ is straightforward: $e_0$ and $e_{\mathrm{int}}$ are the only error terms; thus one only needs to select a quadrature scheme that ensures that the integration error is negligible, and $W^\star$ will be the minimizer. A primary focus of this study is the use of a specific family of piecewise polynomial test functions defined below for which the trapezoidal rule is highly accurate (see Lemma 2). Figure 3.1 demonstrates this fact on noise-free data.
FIG. 3.1.
Noise-free data ($\sigma_{NR} = 0$): plots of the relative coefficient error $E_2$ (defined in (3.2)) vs. the test function degree $p$. V1-V4 indicate different ODE parameters (see Table 2). For the Lorenz system the parameters are fixed, and 40 different initial conditions are sampled from a uniform distribution. In each case, the recovered coefficients rapidly converge to within the accuracy of the ODE solver ($10^{-10}$).
For $\epsilon \neq 0$, accurate recovery of $W^\star$ requires one to choose hyperparameters that emphasize the true misfit term $e_0$ by enforcing that the other error terms are of lower order. We look for a covariance approximation $\mathcal{C}$ and test functions $(\phi_k)_{k \in [N]}$ that approximately enforce $\mathcal{C}^{-1/2} R(W^\star) \sim \mathcal{N}(0, \sigma^2 I)$, justifying the least squares approach. In the next subsection we address the issue of approximating the covariance matrix, providing justification for using $\mathcal{C} := V' (V')^T$. The following subsection provides a heuristic argument for how to reduce corruption from the error terms $e_\Theta$ and $e_{\dot{y}}$ by placing test functions near steep gradients in the data.
2.3.1. Approximate covariance $\mathcal{C}$.
Neglecting the deterministic integration error, which can be made small (see Lemma 2 below), and higher-order noise terms, the residual evaluated at the true weights is approximately

$R(W^\star) \approx V\, (\nabla\Theta(X) \cdot \epsilon)\, W^\star + V' \epsilon,$

where $\approx$ implies that equality holds to leading order in $\epsilon$. Given the variances of the two terms, proportional to $\sigma^2 \|\phi_k\, \nabla F(x)\|_2^2$ and $\sigma^2 \|\phi'_k\|_2^2$, respectively, the true distribution of $R(W^\star)$ depends on $\nabla F(x)$, which is not known a priori. If it holds that $\|\phi'_k\|_2 \gg \|\phi_k\, \nabla F(x)\|_2$, a leading order approximation to the covariance $\Sigma$ of $R(W^\star)$ is

$\Sigma \approx \sigma^2\, V' (V')^T =: \sigma^2\, \mathcal{C},$

using that $\mathrm{Cov}\big( \langle \phi'_j, \epsilon \rangle, \langle \phi'_k, \epsilon \rangle \big) = \sigma^2 \langle \phi'_j, \phi'_k \rangle$. For this reason, we employ localized test functions and adopt the heuristic below.
2.3.2. Adaptive refinement.
Next we show that by localizing $\phi$ around regions where $|\dot{x}|$ is large, we get an approximate cancellation of the error terms $e_\Theta$ and $e_{\dot{y}}$. Consider the one-dimensional case where $k$ is an arbitrary time index and $y_k = x(t_k) + \epsilon_k$ is an observation. When $|\dot{x}(t_k)|$ is large compared to $|\epsilon_k|$, we approximately have

(2.4) $y_k = x(t_k) + \epsilon_k \approx x(t_k + \delta_k)$

for some small $\delta_k$, i.e., the perturbed value lands close to the true trajectory at the shifted time $t_k + \delta_k$. To understand the heuristic behind this approximation, let $t_k + \delta_k$ be the point of intersection between the tangent line to $x$ at $t_k$ and the horizontal line of height $y_k$. Then

$\delta_k = \frac{\epsilon_k}{\dot{x}(t_k)};$

hence $|\dot{x}(t_k)| \gg |\epsilon_k|$ implies that $y_k$ will approximately lie on the true trajectory. As well, regions where $|\dot{x}|$ is small will not yield accurate recovery in the case of noisy data, since perturbations are more likely to exit the relevant region of phase space. If we linearize a trial function $f$ using the approximation (2.4) we get

(2.5) $f(y_k) \approx f(x(t_k)) + \delta_k\, \dot{x}(t_k)\, f'(x(t_k)) = f(x(t_k)) + \delta_k\, \frac{d}{dt} f(x(t_k)).$
Assuming $\phi$ is sufficiently localized around $t_k$ that $\delta \approx \delta_k$ is roughly constant over its support, (2.4) also implies that

$\langle \phi', \epsilon \rangle \approx \delta\, \langle \phi', \dot{x} \rangle,$

hence $e_{\dot{y}} \approx \delta\, \langle \phi', \dot{x} \rangle$, while (2.5) implies

$\langle \phi,\, f(y) - f(x) \rangle \approx \delta\, \Big\langle \phi,\, \frac{d}{dt} f(x) \Big\rangle = -\delta\, \langle \phi', f(x) \rangle,$

having integrated by parts. Collecting the terms together (with $F(x) = \Theta(x)\, w^\star$, so that summing over trial functions replaces $f(x)$ with $\dot{x}$) yields that the residual takes the form

$\mathcal{R}(w^\star; \phi) \approx \delta\, \langle \phi', \dot{x} \rangle - \delta\, \langle \phi', \dot{x} \rangle + O(\delta^2) = O(\delta^2),$

and we see that $e_\Theta$ and $e_{\dot{y}}$ have effectively cancelled. In higher dimensions this interpretation does not appear to be as illuminating, but nevertheless, for any given coordinate $j$, it does hold that terms in the error expansion vanish around points where $|\dot{x}_j|$ is large, precisely because $\delta_k = \epsilon_k / \dot{x}_j(t_k)$ is then small.
2.4. Test function basis.
Here we introduce a test function space $\mathcal{S}$ and quadrature scheme to minimize integration errors and enact the heuristic arguments above, which rely on $\phi$ having fast decay towards the boundaries of its support and being sufficiently localized to ensure $\|\phi'\|_2 \gg \|\phi\, \nabla F\|_2$. We define $\mathcal{S}$ to be the space of unimodal piecewise polynomials of the form

(2.6) $\phi(t) = \begin{cases} C\, (t - a)^p\, (b - t)^q, & t \in [a, b], \\ 0, & \text{otherwise}, \end{cases}$

where $p, q \geq 1$ and $C > 0$. The normalization

$C = \frac{1}{p^p q^q} \left( \frac{p + q}{b - a} \right)^{p + q}$

ensures that $\|\phi\|_\infty = 1$. Functions in $\mathcal{S}$ are nonnegative, unimodal, and compactly supported in $[a, b]$ with $\min(p, q) - 1$ continuous derivatives. Larger $p$ and $q$ imply faster decay towards the endpoints of the support. For $p = q$, we refer to $p$ as the degree of $\phi$.
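Constructing $\phi$ and $\phi'$ on a time grid is straightforward; the Python sketch below is our own helper (with the maximum located at $t^\star = (pb + qa)/(p + q)$, so the normalization $\|\phi\|_\infty = 1$ follows directly) and is reused in the later sketches.

```python
import numpy as np

def test_fun(t, a, b, p, q):
    """phi(t) = C (t-a)^p (b-t)^q on [a,b], zero outside, with max phi = 1.
    Returns phi and its derivative phi' evaluated on the grid t."""
    tstar = (p * b + q * a) / (p + q)                    # location of the maximum
    C = 1.0 / ((tstar - a) ** p * (b - tstar) ** q)      # enforces ||phi||_inf = 1
    phi = np.zeros_like(t, dtype=float)
    dphi = np.zeros_like(t, dtype=float)
    inside = (t > a) & (t < b)
    ti = t[inside]
    phi[inside] = C * (ti - a) ** p * (b - ti) ** q
    # phi'(t) = C (t-a)^(p-1) (b-t)^(q-1) [p(b-t) - q(t-a)]
    dphi[inside] = C * (ti - a) ** (p - 1) * (b - ti) ** (q - 1) \
                     * (p * (b - ti) - q * (ti - a))
    return phi, dphi
```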
To ensure the integration error in approximating inner products is negligible, we rely on the following lemma, which provides a bound on the error in discretizing the weak derivative relation

(2.7) $\int_a^b \phi'(t)\, y(t)\, dt = -\int_a^b \phi(t)\, \dot{y}(t)\, dt$

using the trapezoidal rule for compactly supported $\phi$. Following the lemma we introduce two strategies for choosing the parameters of the test functions $(\phi_k)_{k \in [N]}$.
Lemma 2 (numerical error in weak derivatives).
Let $y$ have continuous derivatives of order $n \geq 1$ on $[a, b]$, and define $f := \phi' y$. If $\phi$ has roots of multiplicity $p$ and $q$ at $t = a$ and $t = b$, respectively, and $m := \min(p, q, n)$, then

(2.8) $\left| \int_a^b \phi'(t)\, y(t)\, dt - \mathrm{T}_{\Delta t}[\phi' y] \right| \leq C_m\, \Delta t^{\, 2\lceil m/2 \rceil},$

where $\mathrm{T}_{\Delta t}$ denotes the composite trapezoidal rule on the grid $t_k = a + k\, \Delta t$. In other words, the composite trapezoidal rule discretizes the weak derivative relation (2.7) to order $\Delta t^{\, 2\lceil m/2 \rceil}$.
Proof.
This is a simple consequence of the Euler-Maclaurin formula. If $f$ is a smooth function, then the following asymptotic expansion holds:

$\mathrm{T}_{\Delta t}[f] - \int_a^b f(t)\, dt \sim \sum_{s \geq 1} \frac{B_{2s}}{(2s)!}\, \Delta t^{2s} \left( f^{(2s-1)}(b) - f^{(2s-1)}(a) \right),$

where the $B_{2s}$ are the Bernoulli numbers. The asymptotic expansion provides corrections to the trapezoidal rule that realize machine precision accuracy up until a certain order, after which terms in the expansion grow and the series diverges [6, Chapter 3]. In our case, $f = \phi' y$, where the root conditions on $\phi$ imply that

$f^{(j)}(a) = f^{(j)}(b) = 0 \quad \text{for all } j \leq \min(p, q) - 2.$

So for $m = \min(p, q)$ odd, we have that the first surviving boundary terms appear at order $\Delta t^{m + 1}$, giving (2.8). For even $m$, the leading term is of order $\Delta t^{m}$, with a slightly different coefficient.
For $\phi \in \mathcal{S}$ with $p = q$, the exact leading order error term in (2.8) is

(2.9) $e(p, \Delta t) = \frac{B_{2\lceil p/2 \rceil}}{(2\lceil p/2 \rceil)!}\, \Delta t^{\, 2\lceil p/2 \rceil} \left( f^{(2\lceil p/2 \rceil - 1)}(b) - f^{(2\lceil p/2 \rceil - 1)}(a) \right),$

which is negligible for a wide range of reasonable $p$ and $\Delta t$ values. The Bernoulli numbers eventually grow factorially, like $B_{2s} \sim 2\, (2s)! / (2\pi)^{2s}$, but for smaller values of $s$ the coefficients in (2.9) are moderate. For instance, at the resolutions used in our experiments the leading error term remains below machine precision over a wide range of degrees (in one representative case, for all $p$ between 7 and 819), eventually growing to the value 0.495352 only for extreme parameter choices. For these reasons, in what follows we choose test functions from $\mathcal{S}$ and discretize all integrals using the trapezoidal rule. Unless otherwise stated, each test function $\phi_k$ satisfies $p_k = q_k$ and so is fully determined by the tuple $(p_k, a_k, b_k)$ indicating its polynomial degree and support. In the next two subsections we propose two different strategies for determining these parameters using the data $Y$.
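The practical upshot of Lemma 2 is easy to verify numerically. In this illustrative Python check (our own construction), both sides of the weak derivative relation (2.7) are discretized with the trapezoidal rule for a degree-16 test function on 201 gridpoints; the two quadratures agree to roughly machine precision.

```python
import numpy as np

# With phi vanishing to high order at the endpoints of its support, the
# trapezoidal discretizations of both sides of (2.7) agree almost exactly.
a, b, p = 0.0, 1.0, 16
t = np.linspace(a, b, 201)
dt = t[1] - t[0]
g = (t - a) ** p * (b - t) ** p
C = 1.0 / g.max()                                  # normalize so max(phi) = 1
phi = C * g
dphi = C * p * (t - a) ** (p - 1) * (b - t) ** (p - 1) * ((b - t) - (t - a))
y = np.sin(5 * t)                                  # smooth test signal
dy = 5 * np.cos(5 * t)
lhs = dt * np.sum(dphi * y)                        # trapezoid; endpoint values vanish
rhs = -dt * np.sum(phi * dy)
print(abs(lhs - rhs) / abs(rhs))                   # ~1e-15
```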
2.4.1. Strategy 1: Uniform grid.
The simplest strategy for choosing a basis of test functions is to place $\phi_1, \dots, \phi_N$ uniformly on the interval $[t_1, t_K]$ with fixed degree $p$ and fixed support size

$L := \frac{b - a}{\Delta t} + 1$

(i.e., $L$ is the number of timepoints that each $\phi_k$ is supported on). The triple $(p, L, \rho)$, where $\rho$ is the shift parameter introduced below, then defines the scheme, where each piece affects the distribution of the residual $R(W^\star)$.
Step 1: Choosing L.
Heuristically, the support size of $\phi$ relates to the Fourier transform of the data. If the support is small compared to the dominant wavemodes in the dynamics, then high-frequency noise will dominate the values of the inner products $\langle \phi, f_i(Y) \rangle$. If it is much larger than the dominant wavemodes, then too much averaging may occur, leading to unresolved dynamics. A natural choice is then to set the support length equal to the period of a known active wavemode¹ $\omega^\star$:

$L\, \Delta t = \frac{2\pi}{\omega^\star}.$

In the noise-free and small-noise experiments below we fix $L$ as reported in Table 2 and leave optimal selection of $L$ based on Fourier analysis to future work.
Step 2: Determining $p$.
In light of the derivation above of the approximate covariance matrix $\mathcal{C}$, we define the parameter

$\hat{r} := \frac{\|\phi'\|_2}{\|\phi\|_2},$

which serves as an estimate for the ratio between the standard deviations of the two dominant error terms $\langle \phi', \epsilon \rangle$ and $\langle \phi\, \nabla F, \epsilon \rangle$ in the residual. Larger $\hat{r}$ indicates better agreement with the approximate covariance matrix $\mathcal{C}$, since $\mathcal{C}$ accounts only for the term $\langle \phi', \epsilon \rangle$. Furthermore, for $\phi \in \mathcal{S}$ with $p = q$ we have the exact formula

$\frac{\|\phi'\|_2}{\|\phi\|_2} = \frac{2}{b - a} \sqrt{\frac{p\, (4p + 1)}{2\, (2p - 1)}},$

obtained by evaluating $\|\phi\|_2$ and $\|\phi'\|_2$ in terms of the gamma function. Given a target ratio $\hat{r}$ and support $[a, b]$, a polynomial degree $p$ may then be selected by inverting this relation.
Step 3: Determining $\rho$.
Next we introduce the shift parameter $\rho$ defined by

$\rho := \phi_k\!\left( \frac{t_{c_k} + t_{c_{k+1}}}{2} \right),$

the height at which two neighboring test functions $\phi_k$ and $\phi_{k+1}$ (centered at $t_{c_k}$ and $t_{c_{k+1}} = t_{c_k} + s\, \Delta t$) intersect, which determines the shift $s$ from $p$ and $L$. In words, $\rho$ measures the amount of overlap between successive test functions. More overlap increases the correlation between rows in the residual and hence leads to larger off-diagonal elements in the covariance matrix $\Sigma$. Larger $\rho$ implies that neighboring functions overlap on more points, with $\rho \to 1$ indicating that $\phi_{k+1} \to \phi_k$; specifically, neighboring test functions overlap on $L - s$ timepoints. In Figures 3.2 and 3.3 we vary the parameters $\hat{r}$ and $\rho$ and observe that results agree with intuition: larger $\hat{r}$ (better agreement with $\mathcal{C}$) and larger $\rho$ (more test functions) lead to better recovery of $W^\star$. We summarize the uniform grid algorithm below.
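In code, this placement strategy amounts to a few lines. The following Python sketch reuses the test_fun helper sketched above and applies trapezoidal weights (a plain $\Delta t$ scaling, since each $\phi_k$ vanishes at the boundary of its support); all names are ours.

```python
import numpy as np

def uniform_test_matrices(t, L, p, s):
    """Integration matrices V, Vp for test functions of degree p placed
    uniformly along the grid t, each supported on L points and shifted by
    s points relative to its neighbor (illustrative sketch)."""
    K, dt = len(t), t[1] - t[0]
    starts = np.arange(0, K - L + 1, s)            # left endpoints of the supports
    V = np.zeros((len(starts), K))
    Vp = np.zeros((len(starts), K))
    for n, k in enumerate(starts):
        a, b = t[k], t[k + L - 1]
        phi, dphi = test_fun(t[k:k + L], a, b, p, p)
        V[n, k:k + L] = dt * phi                   # trapezoid weights; phi(a) = phi(b) = 0
        Vp[n, k:k + L] = dt * dphi
    return V, Vp
```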
FIG. 3.2.
Small-noise regime: dynamic recovery of the Duffing equation. Top: heat map of the average error $E_2$ (left) and sample standard deviation of $E_2$ (right) over 200 instantiations of noise with $\sigma_{NR} = 0.04$ (4% noise) vs. $\hat{r}$ and $\rho$. Bottom: $E_2$ vs. $\sigma_{NR}$ for fixed $\rho$ and various $\hat{r}$. For large enough $\hat{r}$ and $\rho$, the average error is roughly an order of magnitude below $\sigma_{NR}$.
FIG. 3.3.
Small-noise regime: dynamic recovery of the van der Pol oscillator. Top: heat map of the average error $E_2$ (left) and sample standard deviation of $E_2$ (right) over 200 instantiations of noise with $\sigma_{NR} = 0.04$ (4% noise) vs. $\hat{r}$ and $\rho$. Bottom: $E_2$ vs. $\sigma_{NR}$ for fixed $\rho$ and various $\hat{r}$. Similar to the Duffing equation, the average error falls to roughly an order of magnitude below $\sigma_{NR}$, although for van der Pol this regime is reached only for larger $\hat{r}$ and $\rho$.
WSINDy with uniform test function grid:
- Construct the matrix of trial gridfunctions $\Theta(Y)$.
- Construct integration matrices $V, V'$ from the test functions determined by $(p, L, \rho)$ as described above. Compute the Gram matrix $G := V\, \Theta(Y)$ and right-hand side $b := -V' Y$.
- Compute the approximate covariance $\mathcal{C} := V' (V')^T$ and its Cholesky factorization $\mathcal{C} = \mathrm{L} \mathrm{L}^T$.
- Solve the generalized least squares problem with $\ell^2$-regularization

  $\widehat{W} := \arg\min_{W} \left\| \mathrm{L}^{-1} (G W - b) \right\|_2^2 + \gamma^2 \| W \|_2^2,$

  using sequential thresholding with parameter $\lambda$ to enforce sparsity.
2.4.2. Strategy 2: Adaptive grid.
Motivated by the arguments above, we now introduce an algorithm for constructing a test function basis localized near points of large change in the dynamics. This occurs in three steps: (1) construct a weak approximation $\mathbf{v} \approx \dot{x}$ to the derivative of the dynamics, (2) sample centers $(t_{c_k})_{k \in [N]}$ from a cumulative distribution with density proportional to the total variation $|\mathbf{v}|$, and (3) construct test functions $\phi_k$ centered at $t_{c_k}$, using a width-at-half-max parameter to determine the parameters $(p_k, a_k, b_k)$ of each $\phi_k$. Each of these steps is numerically stable and carried out independently along each coordinate of the dynamics. A visual diagram is provided in Figure 2.1.
FIG. 2.1.
Adaptive grid construction used on data from the Duffing equation with 10% noise ($\sigma_{NR} = 0.1$). As desired, the centers $t_{c_k}$ are clustered near steep gradients in the dynamics despite large measurement noise. (The weak derivative approximation $\mathbf{v}$ is plotted in the upper left instead of the data in order to visualize both the gradients and the selected centers.)
Step 1: Weak derivative approximation.
Define $\mathbf{v} := D\, Y$, where the matrix $D$ enacts a linear convolution with the derivative of a chosen test function $\psi \in \mathcal{S}$ of degree $p_1$ and support size $L_1$, so that

$\mathbf{v}_k = -\Delta t \sum_{i} \psi'(t_k - t_i)\, y_i \approx \dot{x}(t_k).$

The parameters $p_1$ and $L_1$ are chosen by the user, with the smallest admissible choices corresponding, up to normalization, to taking a centered finite difference derivative with a 3-point stencil. A smaller degree $p_1$ results in more smoothing and minimizes the corruption from noise while still accurately locating steep gradients in the dynamics. For the examples below we arbitrarily² use a lower-degree test function with small support.
Step 2: Selecting the centers $t_{c_k}$.
Having computed $\mathbf{v}$, define $\mathcal{Y}$ to be the cumulative sum of $|\mathbf{v}|$ normalized so that $\max \mathcal{Y} = 1$. In this way $\mathcal{Y}$ is a valid cumulative distribution function with density proportional to the total variation of the estimated dynamics. We then find the centers by sampling from $\mathcal{Y}$. Let $U := (u_1, \dots, u_N)$ be equally spaced points in $[0, 1]$, with $N$ being the number of test functions; we then define $t_{c_k} := \mathcal{Y}^{-1}(u_k)$, or numerically,

$c_k := \min \{ i \in [K] : \mathcal{Y}_i \geq u_k \}.$

This stage requires the user to select the number of test functions $N$.
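In code, the inverse-CDF sampling above reduces to a cumulative sum and a sorted search; the following Python sketch (our own helper) returns center indices for one coordinate of the dynamics.

```python
import numpy as np

def adaptive_centers(v, N):
    """Sample N test-function center indices from the distribution whose
    density is proportional to |v|, the weak derivative estimate of one
    coordinate of the data (illustrative sketch of section 2.4.2, step 2)."""
    F = np.cumsum(np.abs(v))
    F /= F[-1]                                 # cumulative distribution on grid indices
    U = (np.arange(N) + 0.5) / N               # equally spaced quantiles in (0, 1)
    return np.searchsorted(F, U)               # center indices c_k with F[c_k] >= u_k
```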
Step 3: Construction of test functions $\phi_k$.
Having chosen the location of the centerpoint $t_{c_k}$ for each test function $\phi_k$, we are left to choose the degree $p_k$ of the polynomial and the support $[a_k, b_k]$. The degree is chosen according to the width-at-half-max parameter, which specifies the difference in timepoints between each center $t_{c_k}$ and the nearest point where $\phi_k$ attains half its maximum value, while the support is chosen to be consistent with the centerpoint. This gives a nonlinear system of two equations in two unknowns which can be easily solved (e.g., using fzero in MATLAB). This can be done for one reference test function, with the rest obtained by translation. The optimal value of the width-at-half-max depends on the timescales of the dynamics and can be chosen from the data using the Fourier transform as in the uniform grid case; however, for simplicity we fix its value in the large-noise examples below.
The adaptive grid WSINDy algorithm is summarized as follows:
- Construct the matrix of trial gridfunctions $\Theta(Y)$.
- Construct integration matrices $V, V'$ from the test functions determined by steps 1-3 as described above. Compute the Gram matrix $G := V\, \Theta(Y)$ and right-hand side $b := -V' Y$.
- Compute the approximate covariance $\mathcal{C} := V' (V')^T$ and its Cholesky factorization $\mathcal{C} = \mathrm{L} \mathrm{L}^T$.
- Solve the generalized least squares problem with $\ell^2$-regularization

  $\widehat{W} := \arg\min_{W} \left\| \mathrm{L}^{-1} (G W - b) \right\|_2^2 + \gamma^2 \| W \|_2^2,$

  using sequential thresholding with parameter $\lambda$ to enforce sparsity.
3. Numerical experiments.
We now show that WSINDy is capable of recovering the correct dynamics to high accuracy over a range of noise levels. We examine the systems in Table 1, which exhibit several canonical behaviors, namely growth and decay, nonlinear oscillations, and chaotic dynamics, in dimensions $d \in \{2, 3, 5\}$. To generate true trajectory data we use ode45 in MATLAB with absolute and relative tolerance $10^{-10}$ and collect $K$ samples uniformly³ in time with sampling rate $\Delta t$. The parameters $K$ and $\Delta t$ are chosen to provide a balance between illustrating ODE behaviors and avoiding an overabundance of observations. Gaussian white noise with mean zero and variance $\sigma^2$ is added to the exact trajectories $X \in \mathbb{R}^{K \times d}$, where $\sigma$ is computed by specifying a noise ratio $\sigma_{NR}$ and setting

(3.1) $\sigma := \sigma_{NR}\, \frac{\|X\|_F}{\sqrt{K d}},$

where the Frobenius norm of a matrix $A \in \mathbb{R}^{K \times d}$ is defined by

$\|A\|_F := \Big( \sum_{k=1}^{K} \sum_{j=1}^{d} A_{kj}^2 \Big)^{1/2}.$

The ratio of noise to signal is then approximately equal to the noise ratio: $\|\epsilon\|_F / \|X\|_F \approx \sigma_{NR}$.
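The noise model (3.1) is a one-liner in practice; the sketch below (our own helper) returns both the noisy data and the noise level $\sigma$ implied by a given $\sigma_{NR}$.

```python
import numpy as np

def add_noise(X, sigma_nr, seed=0):
    """Add i.i.d. Gaussian noise at a prescribed noise ratio, following (3.1):
    sigma is chosen so that ||eps||_F / ||X||_F is approximately sigma_nr."""
    K, d = X.shape
    sigma = sigma_nr * np.linalg.norm(X, 'fro') / np.sqrt(K * d)
    eps = sigma * np.random.default_rng(seed).standard_normal((K, d))
    return X + eps, sigma
```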
TABLE 1.
ODEs used in numerical experiments, with the number of timepoints $K$ and the sampling rate $\Delta t$. For Linear 5D, Duffing, van der Pol, and Lotka–Volterra we measure the accuracy in the recovered system as a system parameter varies (see Table 2).

| Name | $K$ | $\Delta t$ |
| --- | --- | --- |
| Linear 5D | 1401 | 0.025 |
| Duffing | 3001 | 0.01 |
| Van der Pol | 3001 | 0.01 |
| Lotka–Volterra | 1001 | 0.01 |
| Nonlinear pendulum | 501 | 0.1 |
| Lorenz | 10001 | 0.001 |
We measure the accuracy of the recovered dynamical system using the relative error in the recovered coefficients,

(3.2) $E_2 := \frac{\| \widehat{W} - W^\star \|_F}{\| W^\star \|_F},$

and the relative error between the noise-free data $X$ and the data-driven dynamics $\widehat{X}$ simulated along the same timepoints:

(3.3) $E_T := \frac{\| \widehat{X} - X \|_F}{\| X \|_F}.$
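Both metrics are direct to compute; a minimal sketch (our own naming) follows.

```python
import numpy as np

def coefficient_error(W_hat, W_true):
    """Relative coefficient error, as in (3.2)."""
    return np.linalg.norm(W_hat - W_true) / np.linalg.norm(W_true)

def trajectory_error(X_hat, X_true):
    """Relative trajectory error, as in (3.3), with X_hat the learned
    system simulated at the same timepoints as the noise-free data X_true."""
    return np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
```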
The collection of ODEs in Table 1 are all first-order autonomous systems; however, they exhibit a diverse range of dynamics. The Linear 5D system (for negative parameter values) and Duffing's equation are both examples of damped oscillators, showing that WSINDy is able to discern whether such motion is governed by linear or nonlinear coupling between variables. For positive parameter values, the Linear 5D system exhibits exponential growth. The van der Pol oscillator, Lotka–Volterra system, and nonlinear pendulum demonstrate that a stable limit cycle with abrupt changes may manifest from vastly different nonlinear mechanisms, which turn out to be identifiable using the weak form. Finally, the Lorenz system exhibits deterministic chaos, and hence the dynamics cover a wide range of Fourier modes, which easily become corrupted with noise.
3.1. Noise-free data.
The goal of the following noise-free experiments is to demonstrate convergence of the recovered weights $\widehat{W}$ to the true weights $W^\star$ to within the accuracy tolerance of the ODE solver (fixed at $10^{-10}$ throughout). In light of Lemma 2, this should occur as the decay rate of the test functions is increased, which for test functions in the class $\mathcal{S}$ (see (2.6)) is realized by increasing the polynomial degree $p$. Hence, over the range of parameter values in Table 2, for each system we test convergence as $p$ increases. We use the uniform grid approach with shift parameter chosen such that the number of test functions $N$ equals the number of trial functions $J$, resulting in square Gram matrices $G$. The support of the basis functions along the timegrid is set to $L$ points as reported in Table 2. The data-driven trial basis includes all monomials in the state variables up to degree 5 as well as the trigonometric terms $\sin(k x_j)$, $\cos(k x_j)$ for $k \in \{1, 2\}$ and $j \in [d]$. We set the regularization parameter to zero ($\gamma = 0$), with the exception of the nonlinear pendulum, where a small positive $\gamma$ is used, and fix the sparsity threshold $\lambda$ across systems. We note that a nonzero $\gamma$ is always necessary to discover the nonlinear pendulum from combined trigonometric and polynomial libraries since $\sin(x)$ is well-approximated by polynomial terms; however, the same is not true for low-order polynomial systems. In the cases considered here, sequential thresholding successfully removes trigonometric library terms for ODE systems with polynomial dynamics despite initially ill-conditioned Gram matrices resulting from combining polynomial and trigonometric terms.
TABLE 2.
Specifications of the parameters used in the simulations illustrated in Figure 3.1. The second column lists the values of the varied system parameter; $L$ is the test function support size in timepoints, $s$ is the shift between neighboring test functions, and $J$ is the number of trial functions.

| ODE | Parameter values | $L$ | $s$ | $J$ |
| --- | --- | --- | --- | --- |
| Linear 5D | (−0.3, −0.2, −0.1, 0.1) | 57 | 5 | 252 |
| Duffing | (0.01, 0.1, 1, 10) | 121 | 99 | 29 |
| Van der Pol | (0.01, 0.1, 1, 10) | 121 | 99 | 29 |
| Lotka–Volterra | (0.005, 0.01, 0.1, 1) | 41 | 33 | 29 |
| Pendulum | — | 21 | 16 | 29 |
| Lorenz | — | 401 | 141 | 68 |
Figure 3.1 shows that in the limit of large $p$, WSINDy recovers the correct weight matrix of each system in Table 1 to an accuracy of $10^{-10}$. For the Linear 5D system, we vary the growth/decay parameter, showing that the system is identifiable to high accuracy despite an excessively large trial library (252 terms). For Duffing's equation and the van der Pol oscillator, the same convergence trend is observed for parameter values spanning several orders of magnitude. Accuracy is slightly worse for the Lotka–Volterra equation at the smallest parameter value, which corresponds to highly infrequent predator-prey interactions and leads to solutions with large amplitudes and gradients. For the nonlinear pendulum, we test that WSINDy is able to identify the nonlinearity $\sin(x)$ for both large and small initial amplitudes, noting that large amplitudes produce strongly nonlinear oscillations, while small amplitudes produce small-angle oscillations where $\sin(x) \approx x$. In addition, for the pendulum we use fewer samples and a larger time step and hence observe a decreased convergence rate. For the Lorenz equations we vary the initial conditions, generating 40 random initial conditions from a region covering the strange attractor, and show convergence over all cases.
3.2. Small-noise regime.
We now turn to the case of low to moderate noise levels, examining small noise ratios $\sigma_{NR}$ for the van der Pol oscillator and Duffing's equation. We examine recovery as the parameters $\hat{r}$ and $\rho$ vary, where $\hat{r} = \|\phi'\|_2 / \|\phi\|_2$ and $\rho$ is the height of intersection of two neighboring test functions $\phi_k$ and $\phi_{k+1}$ (with $\rho \to 0$ indicating nearly disjoint supports and $\rho \to 1$ indicating $\phi_{k+1} \to \phi_k$). Using the analysis from section 2.3, increasing $\hat{r}$ affects the distribution of the residual by magnifying the portion that is linear in the noise. For fixed support size $L$, larger $\hat{r}$ corresponds to a higher polynomial degree $p$. A larger shift parameter $\rho$ corresponds to more test functions (higher $N$) but also to higher correlation between rows in the residual, as neighboring rows become dependent when the supports of $\phi_j$ and $\phi_k$ sufficiently overlap. We again use the uniform grid approach. For each system we generate 200 instantiations of noise and record the coefficient error $E_2$ over the range of $\hat{r}$ and $\rho$ values.
From Figures 3.2 and 3.3 we observe two properties. Firstly, the coefficient error monotonically decreases with increasing $\hat{r}$ and $\rho$; hence accurate recovery requires sufficient overlap between test functions (large enough shift parameter $\rho$) and sufficiently localized test functions that amplify the portion of the residual that is linear in the noise. Secondly, for large enough $\hat{r}$ and $\rho$, the error in the coefficients scales linearly with the noise, with $E_2$ roughly an order of magnitude below $\sigma_{NR}$, i.e., two to three significant digits in the recovered coefficients at the noise levels considered. In Appendix A we show that this second property does not hold for standard SINDy; in particular, the method of differentiation must change depending on the noise level in order to reach a desired accuracy.
3.3. Large-noise regime.
Figures 3.4 to 3.9 show that adaptive placement of test functions (Strategy 2) can be employed to discover dynamics in the large-noise regime with fewer test functions. We test that each system in Table 1 can be discovered under $\sigma_{NR} = 0.1$ (10% noise) from only 250 test functions distributed near steep gradients in the dynamics, which are located using the scheme in section 2.4.2. We set the width-at-half-max of the test functions to a fixed number of timepoints. To exemplify the separation of scales and the severity of the corruption from noise, the noisy data $Y$, true data $X$, and trajectories $\widehat{X}$ of the learned dynamical systems are shown in dynamo view and in phase space (for the planar systems). We extend the simulation time by 50% to show that the data-driven system captures the true limiting behavior. We set $\gamma = 0$ except in the Linear 5D and nonlinear pendulum examples, where a small positive $\gamma$ is used. For the trial basis we use all monomials up to degree 5 in the state variables, and for the pendulum we include the trigonometric terms $\sin(k x_j)$, $\cos(k x_j)$ for $k \in \{1, 2\}$ and $j \in [2]$.
FIG. 3.4.
Large-noise regime: Linear 5D system with damping, $\sigma_{NR} = 0.1$. All correct terms were identified, with small errors in both the recovered weights and the trajectory.
FIG. 3.9.
Large-noise regime: Lorenz system with $\sigma_{NR} = 0.1$. All correct terms were identified; the trajectory error is large, as expected due to the chaotic nature of the solution. Using data up until the divergence time (first 1500 timepoints), the trajectory error is 0.027.
In each case the correct terms are identified with coefficient error roughly an order of magnitude below the noise ratio, in agreement with the trend observed in the small-noise regime. For the Linear 5D, Duffing, and Lotka–Volterra systems (Figures 3.4, 3.5, and 3.7) the data-driven trajectory is indistinguishable from the true data to the eye, with small trajectory error. For the van der Pol oscillator and nonlinear pendulum (Figures 3.6 and 3.8), the learned trajectory follows a limit cycle with an attractor that is indistinguishable from the true data (see phase plane plots); however, an error in the period of oscillation of roughly 0.6% leads to a larger trajectory error. The data-driven trajectory for the Lorenz equation diverges from the true trajectory partway through the observation window (Figure 3.9), which is expected from chaotic dynamics, but still remains close to the Lorenz attractor.
FIG. 3.5.
Large-noise regime: Duffing equation, $\sigma_{NR} = 0.1$. All correct terms were identified, with small errors in both the recovered weights and the trajectory.
FIG. 3.7.
Large-noise regime: Lotka–Volterra system with $\sigma_{NR} = 0.1$. All correct nonzero terms were identified, with small errors in both the recovered weights and the trajectory.
FIG. 3.6.
Large-noise regime: van der Pol oscillator, $\sigma_{NR} = 0.1$. All correct terms were identified. The data-driven trajectory has a slightly shorter oscillation period of 10.14 time units compared to the true 10.2, resulting in an eventual offset from the true data and hence a larger trajectory error. Measured over the time interval [0, 8] the trajectory error is 0.065.
FIG. 3.8.
Large-noise regime: nonlinear pendulum, $\sigma_{NR} = 0.1$. All correct nonzero terms were identified, with small errors in the recovered weights and in the trajectory.
4. Concluding remarks.
We have developed and investigated a data-driven model selection algorithm based on the weak formulation of differential equations. The algorithm utilizes the reformulation of the model selection problem as a sparse regression problem for the weights of a candidate function basis introduced in [21] and generalized in [4] as the SINDy algorithm. Our WSINDy algorithm can be seen as a generalization of the sparse recovery scheme using integral terms found in [17], where dynamics were recovered from noisy data using the integral equation. We have shown that by extending the integral equation to the weak form and using test functions with certain localization and smoothness properties, one may discover the dynamics over a wide range of noise levels, with accuracy scaling favorably with noise: $E_2 = O(\sigma_{NR})$.
A natural line of inquiry is to consider how WSINDy compares with conventional SINDy. There are several notable advantages of WSINDy; in particular, by considering the weak form of the equations, WSINDy completely avoids the approximation of pointwise derivatives which significantly reduces the accuracy of conventional SINDy. When using SINDy, one must choose an appropriate numerical differentiation scheme depending on the noise level (e.g., finite differences are not robust to large noise but work well for small noise). For WSINDy, test functions from the space $\mathcal{S}$ (see section 2.4) together with the trapezoidal rule are effective in both low-noise and high-noise regimes. We demonstrate these observations in Appendix A by comparing WSINDy to SINDy under several numerical differentiation schemes. On the other hand, it may be the case that less data is required by standard SINDy. For the examples shown here, WSINDy works optimally for test functions supported on at least 15 timepoints, while many derivative approximations require fewer consecutive points.
WSINDy also utilizes the linearity of inner products with test functions to estimate the covariance structure of the residual, performing model selection in a generalized least squares framework. This is a much more appropriate setting given that the residuals are neither independent nor identically distributed; however, we note that our implementations in this article employ approximate covariance matrices and could benefit from further refinement and investigation. In Appendix B we show that using generalized least squares with the approximate covariance $\mathcal{C}$ improves some results over ordinary least squares, but not significantly. We leave incorporation of more detailed knowledge of the covariance structure to future work. In addition, generalized least squares could potentially improve traditional model selection algorithms that rely on pointwise derivative estimates by similarly exploiting linear operators. Ultimately, a thorough analysis of the advantages of generalized least squares for model selection deserves further study.
Lastly, the most obvious extensions lie in generalizing the WSINDy method to spatiotemporal datasets. WSINDy as presented here in the context of ODEs is an exciting proof of concept with natural extensions to spatiotemporal and multiresolution settings building upon the extensive results in numerical and functional analysis for weak and variational formulations of physical problems.
Acknowledgments.
Code used in this manuscript is publicly available on GitHub at https://github.com/MathBioCU/WSINDy. The authors would like to thank Prof. Vanja Dukic (University of Colorado at Boulder, Department of Applied Mathematics) and Kadierdan Kaheman (University of Washington at Seattle, Department of Applied Mathematics) for helpful discussions.
Funding:
This research was supported in part by the NSF/NIH Joint DMS/NIGMS Mathematical Biology Initiative grant R01GM126559 and in part by the NSF Computing and Communications Foundations Division grant CCF-1815983. This work also utilized resources from the University of Colorado Boulder Research Computing Group, which is supported by the National Science Foundation (awards ACI-1532235 and ACI-1532236), the University of Colorado Boulder, and Colorado State University.
Appendix A. Comparison between WSINDy and SINDy.
Here we compare WSINDy and SINDy using the van der Pol oscillator, Lotka–Volterra system, and Lorenz equation. For WSINDy we place test functions along the time axis according to the uniform grid strategy. For SINDy, we examine three differentiation methods: total variation regularized derivatives (SINDy-TV), centered second-order finite difference (SINDy-FD-2), and centered fourth-order finite difference (SINDy-FD-4). For SINDy-TV we use default settings and set the regularization parameter equal to the time step.
For each system and noise level we generate 200 independent instantiations of noise and record the average coefficient error (3.2) as well as the average true positivity ratio (TPR) [10]:
(A.1) $\mathrm{TPR} := \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN} + \mathrm{FP}},$

where TP is the number of correctly identified nonzero terms, FP is the number of falsely identified nonzero terms, and FN is the number of terms that are falsely identified as having a coefficient of zero. Since the feasible range of sparsity thresholds depends on the noise level, we adopt the selection methodology in [14] to choose an appropriate value for each instantiation of noise: $\lambda$ is chosen from the set $\{10^{-5}, 10^{-4.9}, \dots, 10^{0}\}$ (i.e., the 51 values from $10^{-5}$ to 1 equally spaced logarithmically) as the minimizer of a loss function $\mathcal{L}(\lambda)$ that balances the fit of the thresholded solution against the number of active terms relative to the size $J$ of the model library, where $\widehat{W}(\lambda)$ is the sequentially thresholded least squares solution for sparsity threshold $\lambda$ (for further details see [14]).
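The scan over candidate thresholds is cheap since each solve is a small linear system. The sketch below is a hedged, generic rendering of this selection loop: solve(lam) stands in for the (W)SINDy solve at threshold lam, and the illustrative loss combines a fit term with the fraction of active library terms; the precise loss used in [14] may differ.

```python
import numpy as np

def select_lambda(solve, J, lambdas=np.logspace(-5, 0, 51)):
    """Pick the threshold minimizing an illustrative fit-plus-sparsity loss.
    `solve(lam)` is assumed to return (W, residual_norm); J is the library size."""
    best_loss, best_lam = np.inf, lambdas[0]
    for lam in lambdas:
        W, res = solve(lam)
        loss = res + np.count_nonzero(W) / J    # fit term + fraction of active terms
        if loss < best_loss:
            best_loss, best_lam = loss, lam
    return best_lam
```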
From Figures A.1, A.2, and A.3 we observe that for small noise, the coefficient error for WSINDy follows the linear trend $E_2 = O(\sigma_{NR})$ observed in the text and that SINDy-FD-4 behaves similarly but with slightly worse accuracy. For larger noise, SINDy diverges in accuracy and in identification of the correct nonzero terms for each differentiation scheme, while WSINDy maintains a TPR of at least 0.8 up to 40% noise for each system. WSINDy thus provides an advantage across the entire noise spectrum examined, all while employing the same weak discretization scheme.
FIG. A.1.
Comparison between WSINDy and SINDy: van der Pol. Clockwise from top left: small-noise TPR (defined in (A.1)), large-noise TPR, large-noise $E_2$ (defined in (3.2)), small-noise $E_2$.
FIG. A.2.
Comparison between WSINDy and SINDy: Lotka–Volterra. Clockwise from top left: small-noise TPR (defined in (A.1)), large-noise TPR, large-noise $E_2$ (defined in (3.2)), small-noise $E_2$.
FIG. A.3.
Comparison between WSINDy and SINDy: Lorenz system. Clockwise from top left: small-noise TPR (defined in (A.1)), large-noise TPR, large-noise $E_2$ (defined in (3.2)), small-noise $E_2$.
Appendix B. Generalized least squares vs. ordinary least squares.
FIG. B.1.
Comparison between WSINDy with GLS and WSINDy with ordinary least squares using the Duffing equation. Results are averaged over 200 instantiations of noise.
Generalized least squares (GLS) aims to account for correlations between the residuals [8]. Given a linear model $b = G w^\star + e$, where $\mathbb{E}[e] = 0$ and $\mathrm{Cov}(e) = \Sigma$, the GLS estimator of the parameters upon observing $b$ is

$\widehat{w}_{\mathrm{GLS}} := (G^T \Sigma^{-1} G)^{-1} G^T \Sigma^{-1}\, b.$

This provides the best linear unbiased estimator of $w^\star$ in the sense that if $\widetilde{w}$ is any other linear unbiased estimator, then $\widehat{w}_{\mathrm{GLS}}$ has lower variance: $\mathrm{Cov}(\widetilde{w}) - \mathrm{Cov}(\widehat{w}_{\mathrm{GLS}})$ is positive semidefinite.
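Numerically, the GLS estimate is best computed by whitening with a Cholesky factor rather than forming $\Sigma^{-1}$ explicitly; a minimal sketch, assuming $\Sigma$ is symmetric positive definite:

```python
import numpy as np

def gls(G, b, Sigma):
    """GLS estimate (G^T Sigma^{-1} G)^{-1} G^T Sigma^{-1} b via whitening:
    with Sigma = L L^T, solve the ordinary least squares problem
    min || L^{-1} (G w - b) ||_2."""
    L = np.linalg.cholesky(Sigma)
    Gw = np.linalg.solve(L, G)            # L^{-1} G
    bw = np.linalg.solve(L, b)            # L^{-1} b
    return np.linalg.lstsq(Gw, bw, rcond=None)[0]
```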
Above we derived an approximate covariance matrix $\mathcal{C}$ to use in the GLS implementation of WSINDy, although the true covariance depends on the underlying unknown dynamical system and hence is unattainable. In addition, since in our case the error $e$ depends on the noise entering through $\Theta(Y)$, the assumption $\mathbb{E}[e] = 0$ is violated. Nevertheless, we find that the large-noise regime does benefit from using GLS over ordinary least squares. Figure B.1 shows that for the Duffing equation, GLS extends the range of noise levels over which the dynamics are reliably recovered, as well as increases the accuracy in the recovered coefficients. This suggests that further improvements can be made with a more refined covariance matrix.
Footnotes
1. Such that the corresponding mode of the data's Fourier transform is not negligible.
2. We find that a lower-degree test function with small support effectively locates steep gradients in noisy trajectories.
3. We leave a detailed study of nonuniform time sampling to future work.
REFERENCES
- [1] H. Akaike, A new look at the statistical model identification, IEEE Trans. Automat. Control, 19 (1974), pp. 716–723, https://doi.org/10.1109/TAC.1974.1100705.
- [2] H. Akaike, On entropy maximization principle, in Applications of Statistics, P. R. Krishnaiah, ed., North-Holland, Amsterdam, 1977, pp. 27–41.
- [3] D. M. Bortz and P. W. Nelson, Model selection and mixed-effects modeling of HIV infection dynamics, Bull. Math. Biol., 68 (2006), pp. 2005–2025, https://doi.org/10.1007/s11538-006-9084-x.
- [4] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci. USA, 113 (2016), pp. 3932–3937.
- [5] A. Cortiella, K.-C. Park, and A. Doostan, Sparse identification of nonlinear dynamical systems via reweighted l1-regularized least squares, Comput. Methods Appl. Mech. Engrg. (2021), 113620.
- [6] G. Dahlquist and Å. Björck, Numerical Methods in Scientific Computing, Volume I, SIAM, Philadelphia, 2008.
- [7] S. H. Kang, W. Liao, and Y. Liu, IDENT: Identifying differential equations with numerical time evolution, J. Sci. Comput., 87 (2021), 1.
- [8] T. Kariya and H. Kurata, Generalized Least Squares, John Wiley & Sons, New York, 2004.
- [9] R. T. Keller and Q. Du, Discovery of dynamics using linear multistep methods, SIAM J. Numer. Anal., 59 (2021), pp. 429–455.
- [10] J. Lagergren, J. T. Nardini, G. M. Lavigne, E. M. Rutter, and K. B. Flores, Learning partial differential equations for biological transport models from noisy spatio-temporal data, Proc. A, 476 (2020), 20190800.
- [11] J. H. Lagergren, J. T. Nardini, G. Michael Lavigne, E. M. Rutter, and K. B. Flores, Learning partial differential equations for biological transport models from noisy spatiotemporal data, Proc. A, 476 (2020), 20190800, https://doi.org/10.1098/rspa.2019.0800.
- [12] G. Lillacci and M. Khammash, Parameter estimation and model selection in computational biology, PLoS Comput. Biol., 6 (2010), e1000696, https://doi.org/10.1371/journal.pcbi.1000696.
- [13] F. Lu, M. Maggioni, and S. Tang, Learning interaction kernels in heterogeneous systems of agents from multiple trajectories, J. Mach. Learn. Res., 22 (2021), pp. 1–67.
- [14] D. A. Messenger and D. M. Bortz, Weak SINDy for Partial Differential Equations, arXiv preprint, arXiv:2007.02848 [math.NA], 2020, https://arxiv.org/abs/2007.02848.
- [15] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Machine learning of linear differential equations using Gaussian processes, J. Comput. Phys., 348 (2017), pp. 683–693.
- [16] S. H. Rudy, J. N. Kutz, and S. L. Brunton, Deep learning of dynamics and signal-noise decomposition with time-stepping constraints, J. Comput. Phys., 396 (2019), pp. 483–506.
- [17] H. Schaeffer and S. G. McCalla, Sparse model selection via integral terms, Phys. Rev. E, 96 (2017), 023302.
- [18] H. Schaeffer, G. Tran, R. Ward, and L. Zhang, Extracting structured dynamical systems using sparse optimization with very few samples, Multiscale Model. Simul., 18 (2020), pp. 1435–1461.
- [19] T. Toni, D. Welch, N. Strelkowa, A. Ipsen, and M. P. Stumpf, Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems, J. R. Soc. Interface, 6 (2009), pp. 187–202, https://doi.org/10.1098/rsif.2008.0172.
- [20] G. Tran and R. Ward, Exact recovery of chaotic systems from highly corrupted data, Multiscale Model. Simul., 15 (2017), pp. 1108–1129.
- [21] W.-X. Wang, R. Yang, Y.-C. Lai, V. Kovanis, and C. Grebogi, Predicting catastrophes in nonlinear dynamical systems by compressive sensing, Phys. Rev. Lett., 106 (2011), 154101.
- [22] D. J. Warne, R. E. Baker, and M. J. Simpson, Using experimental data and information criteria to guide model selection for reaction–diffusion problems in mathematical biology, Bull. Math. Biol., 81 (2019), pp. 1760–1804, https://doi.org/10.1007/s11538-019-00589-x.
- [23] H. Wu and L. Wu, Identification of significant host factors for HIV dynamics modelled by non-linear mixed-effects models, Stat. Med., 21 (2002), pp. 753–771, https://doi.org/10.1002/sim.1015.
- [24] K. Wu, T. Qin, and D. Xiu, Structure-preserving method for reconstructing unknown Hamiltonian systems from trajectory data, SIAM J. Sci. Comput., 42 (2020), pp. A3704–A3729.
- [25] K. Wu and D. Xiu, Numerical aspects for approximating governing equations using data, J. Comput. Phys., 384 (2019), pp. 200–221.
- [26] S. Zhang and G. Lin, Robust data-driven discovery of governing physical laws with error bars, Proc. A, 474 (2018), 20180305.
- [27] S. Zhang and G. Lin, Robust Subsampling-Based Sparse Bayesian Inference to Tackle Four Challenges (Large Noise, Outliers, Data Integration, and Extrapolation) in the Discovery of Physical Laws from Data, arXiv preprint, arXiv:1907.07788 [stat.ML], 2019, https://arxiv.org/abs/1907.07788.