Published in final edited form as: Multiscale Model. Simul., 19 (2021), pp. 1474–1497. doi: 10.1137/20M1343166

WEAK SINDy: GALERKIN-BASED DATA-DRIVEN MODEL SELECTION

DANIEL A. MESSENGER AND DAVID M. BORTZ

Abstract

We present a novel weak formulation and discretization for discovering governing equations from noisy measurement data. This method of learning differential equations from data fits into a new class of algorithms that replace pointwise derivative approximations with linear transformations and variance reduction techniques. Compared to the standard SINDy algorithm presented in [S. L. Brunton, J. L. Proctor, and J. N. Kutz, Proc. Natl. Acad. Sci. USA, 113 (2016), pp. 3932–3937], our so-called weak SINDy (WSINDy) algorithm allows for reliable model identification from data with large noise (often with ratios greater than 0.1) and reduces the error in the recovered coefficients to enable accurate prediction. Moreover, the coefficient error scales linearly with the noise level, leading to high-accuracy recovery in the low-noise regime. Altogether, WSINDy combines the simplicity and efficiency of the SINDy algorithm with the natural noise reduction of integration, as demonstrated in [H. Schaeffer and S. G. McCalla, Phys. Rev. E, 96 (2017), 023302], to arrive at a robust and accurate method of sparse recovery.

Keywords: data-driven model selection, nonlinear dynamics, sparse recovery, generalized least squares, Galerkin method, adaptive grid, 37M10, 62J99, 62-07, 65R99

1. Problem statement.

Consider a first-order dynamical system in D dimensions of the form

$$\frac{d}{dt}x(t) = F(x(t)), \qquad x(0) = x_0 \in \mathbb{R}^D, \quad 0 \le t \le T, \tag{1.1}$$

and measurement data $y \in \mathbb{R}^{M \times D}$ given at $M$ timepoints $\mathbf{t} = (t_1, \ldots, t_M)^T$ by

$$y_{md} = x_d(t_m) + \epsilon_{md}, \qquad m \in [M], \; d \in [D],$$

where throughout we use the bracket notation $[M] := \{1, \ldots, M\}$. The variable $\epsilon \in \mathbb{R}^{M \times D}$ represents a matrix of independent and identically distributed measurement noise. The focus of this article is the reconstruction of the dynamics (1.1) from the measurements $y$.

The SINDy algorithm (sparse identification of nonlinear dynamics [4]) has been shown to be successful in solving this problem for sparsely represented nonlinear dynamics when noise is small and dynamic scales do not vary across multiple orders of magnitude. This framework assumes that the function $F : \mathbb{R}^D \to \mathbb{R}^D$ in (1.1) is given componentwise by

$$F_d(x(t)) = \sum_{j=1}^{J} w^\star_{jd}\, f_j(x(t)) \tag{1.2}$$

for some known family of functions $(f_j)_{j\in[J]}$ and a sparse weight matrix $w^\star \in \mathbb{R}^{J \times D}$. The problem is then transformed into solving for $w^\star$ by building a data matrix $\Theta(y) \in \mathbb{R}^{M \times J}$ given by

$$\Theta(y)_{mj} = f_j(y_m), \qquad y_m := (y_{m1}, \ldots, y_{mD}),$$

so that the candidate functions are directly evaluated at the noisy data. Solving (1.1) for $F$ then reduces to identifying a sparse weight matrix $\hat{w}$ such that

$$\dot{y} \approx \Theta(y)\hat{w}, \tag{1.3}$$

where $\dot{y}$ is the numerical time derivative of the data $y$. Sequential-thresholding least squares is then used to arrive at a sparse solution.
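To make this baseline concrete, the following is a minimal sketch of the sequential-thresholding least squares loop in Python/NumPy; the function name `stls`, the iteration cap, and the per-column refit are our implementation choices rather than details fixed by [4].

```python
import numpy as np

def stls(A, rhs, lam, max_iter=10):
    """Sequentially thresholded least squares: sparse W with rhs ~ A @ W.

    A   : (M, J) library matrix (e.g., Theta(y)).
    rhs : (M, D) right-hand side (e.g., numerical derivatives of y).
    lam : threshold below which coefficients are zeroed.
    """
    W = np.linalg.lstsq(A, rhs, rcond=None)[0]       # dense initial solve
    for _ in range(max_iter):
        small = np.abs(W) < lam                      # terms to prune
        W[small] = 0.0
        for d in range(W.shape[1]):                  # refit surviving terms
            keep = ~small[:, d]
            if keep.any():
                W[keep, d] = np.linalg.lstsq(A[:, keep], rhs[:, d],
                                             rcond=None)[0]
    return W
```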

1.1. Background.

Research into statistically rigorous selection of mathematical models from data can be traced back to Akaike’s seminal work in the 1970s [1, 2]. In the last 20 years, there has been substantial work in this area at the interface between applied mathematics, computer science, and statistics (see [3, 11, 12, 19, 22, 23] for both theory and applications). More recently, the formulation of system discovery problems in terms of a candidate basis of nonlinear functions (1.2) and subsequent discretization (1.3) was introduced in [21] in the context of catastrophe prediction. The authors of [21] used compressed sensing techniques to enforce sparsity. Since then there has been an explosion of interest in the problem of identifying nonlinear dynamical systems from data, with some of the primary techniques being Gaussian process regression [15], deep neural networks [16], Bayesian inference [26, 27], and classical methods from numerical analysis [7, 9, 25]. These techniques have been successfully applied to the discovery of both ordinary and partial differential equations.

The variety of discovery algorithms qualitatively differ in the interpretability of the resulting data-driven dynamical system, the scope and efficiency of the algorithm, and the robustness to noise, scale separation, etc. For instance, a neural network based data-driven dynamical system does not easily lend itself to physical interpretation, while the SINDy algorithm identifies governing equations which can be analyzed directly. Moreover, it is also well-known that the training stage for neural networks and other iterative learning algorithms can be computationally costly. Concerning the scope of an algorithm, several methods have been independently developed to discover models under the assumption of some prior knowledge of the governing equations, notably for low-degree polynomial chaotic systems, cyclic ODEs, interacting particles, and Hamiltonian dynamics [20, 18, 13, 24]. In each of these cases the authors derive probabilistic recovery guarantees depending on the number of available trajectories, the size of the candidate model library, the level of incoherence of the data, and/or the sparsity of the governing equations.

The vast majority of algorithms and recovery guarantees assume that pointwise derivatives of the data either are available or can be reliably computed. This severely limits an algorithm's robustness to noise and hence its applicability to real-world data. Here we relax this assumption and provide rigorous justification for the weak formulation of the dynamics as a means to circumvent this ubiquitous problem in model selection. Building on the SINDy framework, we present the robust discovery algorithm WSINDy (weak SINDy), which operates under the assumption that the time derivative is unavailable and that the only prior knowledge of the governing equations is their inclusion in a large model library. We also focus on the realistic scenario where only a single noisy trajectory of the state variable is available; however, extension to multiple trajectories is of course possible. For simplicity, we restrict numerical experiments to autonomous ODEs for their amenability to analysis. Natural next steps are to explore identification of PDEs and nonautonomous dynamical systems. We note that the use of integral equations for system identification was introduced in [17], where compressed sensing techniques were used to enforce sparsity, and that this technique can be seen as a special case of the method introduced here.

In section 2 we introduce the algorithm with analysis of the resulting error structure. Section 3 contains numerical results showing identification of six ODE systems over a range of noise levels and parameter regimes. In section 4, we provide concluding remarks as well as natural next directions for this line of research. In Appendix A we include a detailed comparison between WSINDy and SINDy, and in Appendix B further information on the generalized least squares method.

2. WSINDy.

We approach the problem of system identification (1.3) from a nonstandard perspective by utilizing the weak form of the differential equation. Recall that for any smooth test function $\phi : \mathbb{R} \to \mathbb{R}$ (absolutely continuous is sufficient) and interval $(a, b) \subset [0, T]$, (1.1) admits the weak formulation

$$\phi(b)x(b) - \phi(a)x(a) - \int_a^b \phi'(u)\, x(u)\, du = \int_a^b \phi(u)\, F(x(u))\, du, \qquad 0 \le a < b \le T. \tag{2.1}$$

With $\phi \equiv 1$, we arrive at the integral equation of the dynamics explored in [17]. If we instead take $\phi$ to be nonconstant and compactly supported in $(a, b)$, we arrive at

$$-\int_a^b \phi'(u)\, x(u)\, du = \int_a^b \phi(u)\, F(x(u))\, du. \tag{2.2}$$

Assuming a representation of the form (1.2), we then define the generalized residual $\mathcal{R}(w; \phi)$ for a given test function $\phi$ by replacing $F$ with a candidate element from the span of $(f_j)_{j\in[J]}$ and $x$ with $y$ as follows:

$$\mathcal{R}(w; \phi) := \int_a^b \left( \phi'(u)\, y(u) + \phi(u) \sum_{j=1}^J w_j f_j(y(u)) \right) du. \tag{2.3}$$

Clearly, with $w = w^\star$ and $y = x(t)$ we have $\mathcal{R}(w^\star; \phi) = 0$ for all $\phi$ compactly supported in $(a, b)$; however, $y$ is a discrete set of data, so (2.3) can at best be approximated numerically. Measurement noise then presents a significant barrier to accurate identification of $w^\star$.

2.1. Method overview.

For analogy with traditional Galerkin methods, consider the forward problem of solving a dynamical system such as (1.1) for $x$. The Galerkin approach is to seek a solution $x$ represented in a chosen trial basis $(f_j)_{j\in[J]}$ such that the residual $\mathcal{R}$, defined by

$$\mathcal{R} = \int \phi(t) \left( \dot{x}(t) - F(x(t)) \right) dt,$$

is minimized over all test functions $\phi$ living in the span of a given test function basis $(\phi_k)_{k\in[K]}$. If the trial and test function bases are known analytically, inner products of the form $\langle f_j, \phi_k \rangle$ appearing in the residual can be computed exactly. Thus, the computational error results only from representing the solution in a finite-dimensional function space.

The method we present here can be considered a data-driven Galerkin method of solving for $F$, where the trial "basis" is given by the set of gridfunctions $(f_j(y))_{j\in[J]}$ evaluated at the data and only the test function basis $(\phi_k)_{k\in[K]}$ is known analytically. In this way, inner products appearing in $\mathcal{R}(w; \phi)$ must be approximated numerically, implying that the accuracy of the recovered weights $\hat{w}$ is ultimately limited by the quadrature scheme used to discretize the inner products. Using Lemma 2 below, we show that the correct coefficients $w^\star$ may be recovered to effective machine precision (given by the tolerance of the forward ODE solver) from noise-free trajectories $y$ by discretizing (2.2) using the trapezoidal rule and choosing $\phi$ to decay smoothly to zero at the boundaries of its support. Specifically, in this article we demonstrate this fact by choosing test functions from a particular family of unimodal piecewise polynomials $\mathcal{S}$ defined in (2.6).

Having chosen a quadrature scheme, the next accuracy barrier is presented by measurement noise, which introduces randomness into the residuals $\mathcal{R}(w; \phi)$. Numerical integration then couples the residuals $\mathcal{R}(w; \phi_1)$ and $\mathcal{R}(w; \phi_2)$ whenever $\phi_1$ and $\phi_2$ have overlapping support. In this way, $\mathcal{R}(w; \phi)$ does not have an ideal error structure for least squares but may be amenable to generalized least squares. Below we analyze the distribution of the residuals $\mathcal{R}(w^\star; \phi)$ to arrive at a generalized least squares approach where an approximate covariance matrix can be computed directly from the test functions. This analysis also suggests that placing test functions near steep gradients in the dynamics may improve recovery; hence we develop a derivative-free method for adaptively clustering test functions near steep gradients.

Remark 1.

The weak formulation of the dynamics introduces a wealth of information: given $M$ timepoints $\mathbf{t} = (t_m)_{m\in[M]}$, (2.2) affords $K = M(M-1)/2$ residuals over all possible supports $(a, b) \in \mathbf{t} \times \mathbf{t}$ with $a < b$. Of course, one could also assimilate the responses of multiple families of test functions $\left( \{\phi^1_k\}_{k\in[K_1]}, \{\phi^2_k\}_{k\in[K_2]}, \ldots \right)$; however, the computational complexity of such an exhaustive approach quickly becomes intractable. We stress that even with large noise, our proposed method identifies the correct nonlinearities with accurate weight recovery while keeping the number of test functions lower than the number of timepoints ($K < M$).

2.2. Algorithm: WSINDy.

We state here the WSINDy algorithm in full generality. We propose a generalized least squares approach with approximate covariance matrix $\Sigma$. Below we derive a particular choice of $\Sigma$ which utilizes the action of the test functions $(\phi_k)_{k\in[K]}$ on the data $y$. Sequential thresholding on the weight coefficients $w$ with thresholding parameter $\lambda$ is used to enforce sparsity, where $\lambda \le \min_{w^\star \ne 0} |w^\star|$ is necessary for recovery. Lastly, an $\ell^2$-regularization term with coefficient $\gamma$ is included for problems involving rank deficiency. Methods of choosing optimal values of $\lambda$ and $\gamma$ directly from a given dataset do exist, for instance, by selecting the optimal position in a Pareto front [5]; however, this is not the focus of our current study, and thus we select values that work across multiple examples. Specifically, in the experiments below we set $\gamma = 0$, with the exception of the nonlinear pendulum and the five-dimensional linear system, examples which show that regularization can be used to discover dynamics from excessively large libraries. For noise-free data the algorithm is only weakly dependent on $\lambda$, and so we use $\lambda = 0.001$, while for noisy data we set $\lambda = \frac{1}{4}\min_{w^\star \ne 0} |w^\star|$.

$\hat{w} = \text{WSINDy}\left(y, \mathbf{t};\, (\phi_k)_{k\in[K]},\, (f_j)_{j\in[J]},\, \Sigma,\, \lambda,\, \gamma\right)$:

  1. Construct the matrix of trial gridfunctions $\Theta(y) = [f_1(y) \,|\, \cdots \,|\, f_J(y)]$.

  2. Construct integration matrices $V$ and $V'$ such that
     $$V_{km} = \Delta t\, \phi_k(t_m), \qquad V'_{km} = \Delta t\, \phi'_k(t_m).$$

  3. Compute the Gram matrix $G = V\Theta(y)$ and right-hand side $b = -V'y$, so that $G_{kj} = \langle \phi_k, f_j(y) \rangle$ and $b_{kd} = -\langle \phi'_k, y_d \rangle$.

  4. Solve the generalized least squares problem with $\ell^2$-regularization,
     $$\hat{w} = \arg\min_{w} \left\{ (Gw - b)^T \Sigma^{-1} (Gw - b) + \gamma^2 \|w\|_2^2 \right\},$$
     using sequential thresholding with parameter $\lambda$ to enforce sparsity.
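As a hedged illustration of steps 1–4, the sketch below assembles the weak-form system and performs the thresholded generalized least squares solve. The function name `wsindy_gls`, the Cholesky whitening, the small diagonal added to keep $\Sigma$ numerically positive definite, and the ridge-by-row-stacking device are our implementation choices; the covariance $\Sigma = V'(V')^T$ is the one derived in section 2.3.1.

```python
import numpy as np

def wsindy_gls(Y, Theta, V, Vp, lam, gamma=0.0, max_iter=10):
    """Weak-form sparse regression, following steps 2-4 above (a sketch).

    Y     : (M, D) noisy trajectory data.
    Theta : (M, J) trial gridfunctions [f_1(y) | ... | f_J(y)].
    V, Vp : (K, M) integration matrices (rows dt*phi_k and dt*phi_k').
    """
    G = V @ Theta                       # G_kj = <phi_k, f_j(y)>
    b = -Vp @ Y                         # b_kd = -<phi_k', y_d>
    Sigma = Vp @ Vp.T + 1e-12 * np.eye(Vp.shape[0])   # approx covariance
    C = np.linalg.cholesky(Sigma)       # Sigma = C C^T
    Gt = np.linalg.solve(C, G)          # whiten: C^{-1} G
    bt = np.linalg.solve(C, b)          # whiten: C^{-1} b
    if gamma > 0:                       # ridge gamma^2 ||w||^2 via stacking
        Gt = np.vstack([Gt, gamma * np.eye(G.shape[1])])
        bt = np.vstack([bt, np.zeros((G.shape[1], Y.shape[1]))])
    W = np.linalg.lstsq(Gt, bt, rcond=None)[0]
    for _ in range(max_iter):           # sequential thresholding on weights
        small = np.abs(W) < lam
        W[small] = 0.0
        for d in range(W.shape[1]):
            keep = ~small[:, d]
            if keep.any():
                W[keep, d] = np.linalg.lstsq(Gt[:, keep], bt[:, d],
                                             rcond=None)[0]
    return W
```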

With this as our core algorithm, we can now consider a residual analysis (section 2.3) leading to a generalized least squares framework. We can also develop theoretical results related to the test functions (section 2.4), yielding a more thorough understanding of the impact of using uniform (section 2.4.1) and adaptive (section 2.4.2) placement of test functions along the time axis.

2.3. Residual analysis.

Performance of WSINDy is determined by the behavior of the residuals

$$\mathcal{R}(w; \phi_k) := (Gw - b)_k \in \mathbb{R}^{1 \times D},$$

denoted $\mathcal{R}(w) \in \mathbb{R}^{K \times D}$ for the entire residual matrix. Here we analyze the residual for autonomous $F$ to highlight key aspects for future analysis, as well as to arrive at an appropriate choice of approximate covariance $\Sigma$. We also provide a heuristic argument in favor of placing test functions near steep gradients in the dynamics.

A key difficulty in recovering the true weights $w^\star$ is that for nonlinear systems the residual evaluated at the true weights is biased: $\mathbb{E}[\mathcal{R}(w^\star)] \ne 0$. Any minimization of $\mathcal{R}$ thus introduces a bias in the recovered weights $\hat{w}$. Nevertheless, we can understand how different test functions impact the residual by linearizing around the true trajectory $x(t)$ and isolating the dominant error terms:

$$\begin{aligned} \mathcal{R}(w; \phi_k) &= \langle \phi_k, \Theta(y)w \rangle + \langle \phi'_k, y \rangle \\ &= \langle \phi_k, \Theta(y)(w - w^\star) \rangle + \langle \phi_k, \Theta(y)w^\star \rangle + \langle \phi'_k, y \rangle \\ &= \langle \phi_k, \Theta(y)(w - w^\star) \rangle + \langle \phi_k, F(y) - F(x) \rangle + \langle \phi'_k, \epsilon \rangle + I_k \\ &= \underbrace{\langle \phi_k, \Theta(y)(w - w^\star) \rangle}_{R_1} + \underbrace{\langle \phi_k, \epsilon\, \nabla F(x) \rangle}_{R_2} + \underbrace{\langle \phi'_k, \epsilon \rangle}_{R_3} + I_k + \mathcal{O}(\epsilon^2), \end{aligned}$$

where $\nabla F(x)_{dd'} = \frac{\partial F_d}{\partial x_{d'}}(x)$. The errors manifest in the following ways:

  • $R_1$ is the misfit between $w$ and $w^\star$.

  • $R_2$ results from measurement error in the trial gridfunctions: $f_j(y) = f_j(x + \epsilon) \ne f_j(x)$.

  • $R_3$ results from replacing $x$ with $y = x + \epsilon$ in the left-hand side of (2.2).

  • $I_k$ is a deterministic integration error.

  • $\mathcal{O}(\epsilon^2)$ is the remainder term in the truncated Taylor expansion of $F(y)$ around $x$:
    $$F(y_m) = F(x(t_m)) + \epsilon_m \nabla F(x(t_m)) + \mathcal{O}(|\epsilon_m|^2).$$

Clearly, recovery of $F$ when $\epsilon = 0$ is straightforward: $R_1$ and $I_k$ are the only error terms; thus one only needs to select a quadrature scheme ensuring that the integration error $I_k$ is negligible, and $\hat{w} = w^\star$ will be the minimizer. A primary focus of this study is the use of a specific family of piecewise polynomial test functions $\mathcal{S}$, defined below, for which the trapezoidal rule is highly accurate (see Lemma 2). Figure 3.1 demonstrates this fact on noise-free data.

FIG. 3.1. Noise-free data ($\sigma_{NR} = 0$): plots of relative coefficient error $E_2(\hat{w})$ (defined in (3.2)) vs. $p$. V1–V4 indicate different ODE parameters (see Table 2). For the Lorenz system the parameters are fixed, and 40 different initial conditions are sampled from a uniform distribution. In each case, the recovered coefficients $\hat{w}$ rapidly converge to within the accuracy of the ODE solver ($10^{-10}$).

For $\epsilon > 0$, accurate recovery of $F$ requires one to choose hyperparameters that expose the true misfit term $R_1$ by enforcing that the other error terms are of lower order. We look for $(\phi_k)_{k\in[K]}$ and $\Sigma = CC^T$ that approximately enforce $C^{-1}\mathcal{R}(w^\star) \sim \mathcal{N}(0, \sigma^2 I)$, justifying the least squares approach. In the next subsection we address the issue of approximating the covariance matrix, providing justification for the choice $\Sigma = V'(V')^T$. The subsection after that provides a heuristic argument for how to reduce corruption from the error terms $R_2$ and $R_3$ by placing test functions near steep gradients in the data.

2.3.1. Approximate covariance Σ.

Neglecting the deterministic integration error, which can be made small (see Lemma 2 below), and higher-order noise terms, the residual evaluated at the true weights is approximately

$$\mathcal{R}(w^\star; \phi_k) \approx R_2 + R_3,$$

where $\mathbb{E}[R_2] = \mathbb{E}[R_3] = (0, \ldots, 0)$ implies that $\mathbb{E}[\mathcal{R}(w^\star)] = 0$ to leading order. Given the variances

$$\mathbb{V}[R_2] = \mathbb{V}\left[ \langle \phi_k, \epsilon\, \nabla F(x) \rangle \right] = \Delta t\, \sigma^2 \left( \big\| \phi_k |\nabla F_1(x)| \big\|_2^2,\; \ldots,\; \big\| \phi_k |\nabla F_D(x)| \big\|_2^2 \right)

and

$$\mathbb{V}[R_3] = \mathbb{V}\left[ \langle \phi'_k, \epsilon \rangle \right] = \Delta t\, \sigma^2 \left( \|\phi'_k\|_2^2,\; \ldots,\; \|\phi'_k\|_2^2 \right),$$

the true distribution of $\mathcal{R}(w^\star)$ depends on $F$, which is not known a priori. If it holds that $\|\phi'_k\|_2 \gg \big\| \phi_k |\nabla F_d(x)| \big\|_2$ for all $d \in [D]$, a leading order approximation to $\mathrm{Cov}(\mathcal{R}(w^\star))$ is

$$\Sigma := V'(V')^T = \sigma^{-2}\, \mathrm{Cov}(R_3),$$

using that $\mathrm{Cov}(R_3)_{ij} = \Delta t\, \sigma^2 \langle \phi'_i, \phi'_j \rangle$ (the unknown scalar $\sigma^2$ is immaterial to the generalized least squares solution). For this reason, we employ localized test functions and adopt the heuristic $\Sigma = V'(V')^T$ below.

2.3.2. Adaptive refinement.

Next we show that by localizing $\phi_k$ around large $|\dot{x}|$, we obtain an approximate cancellation of the error terms $R_2$ and $R_3$. Consider the one-dimensional case ($D = 1$), where $m$ is an arbitrary time index and $y_m = x(t_m) + \epsilon_m$ is an observation. When $|\dot{x}(t_m)|$ is large compared to $\epsilon_m$, we approximately have

$$y_m = x(t_m) + \epsilon_m \approx x(t_m + \delta t) \approx x(t_m) + \delta t\, F(x(t_m)) \tag{2.4}$$

for some small $\delta t$; i.e., the perturbed value $y_m$ lands close to the true trajectory $x$ at the time $t_m + \delta t$. To understand the heuristic behind this approximation, let $t_m + \delta t$ be the point of intersection between the tangent line to $x(t)$ at $t_m$ and the level $x(t_m) + \epsilon_m$. Then

$$\delta t = \frac{\epsilon_m}{\dot{x}(t_m)};$$

hence $|\dot{x}(t_m)| \gg \epsilon_m$ implies that $x(t_m) + \epsilon_m$ will approximately lie on the true trajectory. As well, regions where $|\dot{x}(t_m)|$ is small will not yield accurate recovery in the case of noisy data, since perturbations are more likely to exit the relevant region of phase space. If we linearize $F$ using the approximation (2.4), we get

$$F(y_m) \approx F(x(t_m)) + \delta t\, F'(x(t_m)) F(x(t_m)) = F(x(t_m)) + \delta t\, \ddot{x}(t_m). \tag{2.5}$$

Assuming $\phi_k$ is sufficiently localized around $t_m$, (2.4) also implies that

$$\langle \phi'_k, x \rangle + \underbrace{\langle \phi'_k, \epsilon \rangle}_{R_3} = \langle \phi'_k, y \rangle \approx \langle \phi'_k, x \rangle + \delta t\, \langle \phi'_k, F(x) \rangle;$$

hence $R_3 \approx \delta t\, \langle \phi'_k, F(x) \rangle$, while (2.5) implies

$$\begin{aligned} \langle \phi_k, \Theta(y)w \rangle &= \underbrace{\langle \phi_k, \Theta(y)(w - w^\star) \rangle}_{R_1} + \langle \phi_k, F(y) \rangle \\ &\approx \langle \phi_k, \Theta(y)(w - w^\star) \rangle + \langle \phi_k, F(x) \rangle + \underbrace{\delta t\, \langle \phi_k, \ddot{x} \rangle}_{R_2} \\ &= \langle \phi_k, \Theta(y)(w - w^\star) \rangle + \langle \phi_k, F(x) \rangle - \delta t\, \langle \phi'_k, F(x) \rangle, \end{aligned}$$

having integrated by parts in the last step. Collecting terms, the residual takes the form

$$\mathcal{R}(w; \phi_k) = \langle \phi'_k, y \rangle + \langle \phi_k, \Theta(y)w \rangle \approx \underbrace{\langle \phi_k, \Theta(y)(w - w^\star) \rangle}_{R_1},$$

and we see that $R_2$ and $R_3$ have effectively cancelled. In higher dimensions this interpretation does not appear to be as illuminating, but nevertheless, for any given coordinate $x_d$, it does hold that terms in the error expansion vanish around points $t_m$ where $|\dot{x}_d|$ is large, precisely because $x_d(t_m) + \epsilon_m \approx x_d(t_m + \delta t)$.

2.4. Test function basis $(\phi_k)_{k\in[K]}$.

Here we introduce a test function space $\mathcal{S}$ and a quadrature scheme that minimize integration errors and enact the heuristic arguments above, which rely on $\phi_k$ having fast decay toward its support boundaries and being sufficiently localized to ensure $\|\phi'_k\|_2^2 \gg \|\phi_k\|_2^2$. We define the space $\mathcal{S}$ of unimodal piecewise polynomials of the form

$$\phi(t) = \begin{cases} C\, (t - a)^p (b - t)^q, & t \in [a, b], \\ 0, & \text{otherwise}, \end{cases} \tag{2.6}$$

where $(a, b) \in \mathbf{t} \times \mathbf{t}$ satisfies $a < b$ and $p, q \ge 1$. The normalization

$$C = \frac{1}{p^p q^q} \left( \frac{p + q}{b - a} \right)^{p + q}$$

ensures that $\|\phi\|_\infty = 1$. Functions $\phi \in \mathcal{S}$ are nonnegative, unimodal, and compactly supported in $[0, T]$, with $\min\{p, q\} - 1$ continuous derivatives. Larger $p$ and $q$ imply faster decay towards the endpoints of the support. For $p = q$, we refer to $p$ as the degree of $\phi$.
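For reference, here is a sketch of evaluating a member of $\mathcal{S}$ and its derivative on a time grid; the helper name `test_fn` is ours, and the normalization $C$ is computed in log space only to avoid floating-point overflow at large $p$.

```python
import numpy as np

def test_fn(t, a, b, p, q):
    """Evaluate phi in S (eq. (2.6)) and phi' on the grid t; zero outside (a, b)."""
    # log-space normalization: C = (1 / (p^p q^q)) ((p+q)/(b-a))^(p+q)
    logC = ((p + q) * np.log(p + q) - p * np.log(p) - q * np.log(q)
            - (p + q) * np.log(b - a))
    phi = np.zeros_like(t)
    dphi = np.zeros_like(t)
    # strict inequalities zero phi and phi' at the endpoints (valid for p, q >= 2)
    inside = (t > a) & (t < b)
    s = t[inside]
    phi[inside] = np.exp(logC + p * np.log(s - a) + q * np.log(b - s))
    # logarithmic derivative: phi' = phi * (p/(t-a) - q/(b-t))
    dphi[inside] = phi[inside] * (p / (s - a) - q / (b - s))
    return phi, dphi
```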

To ensure that the integration error in approximating the inner products $\langle f_j, \phi_k \rangle$ is negligible, we rely on the following lemma, which provides a bound on the error in discretizing the weak derivative relation

$$\int \phi'(t) f(t)\, dt = -\int \phi(t) f'(t)\, dt \tag{2.7}$$

using the trapezoidal rule for compactly supported $\phi$. Following the lemma we introduce two strategies for choosing the parameters of the test functions $(\phi_k)_{k\in[K]} \subset \mathcal{S}$.

Lemma 2 (numerical error in weak derivatives).

Let $f, \phi$ have continuous derivatives of order $p$, and define $t_j = a + j\frac{b - a}{N} = a + j\Delta t$. If $\phi$ has roots $\phi(a) = \phi(b) = 0$ of multiplicity $p$, then

$$\frac{\Delta t}{2} \sum_{j=0}^{N-1} \left[ g(t_j) + g(t_{j+1}) \right] = \mathcal{O}(\Delta t^{p+1}), \tag{2.8}$$

where $g(t) = \phi'(t) f(t) + \phi(t) f'(t)$. In other words, the composite trapezoidal rule discretizes the weak derivative relation (2.7) to order $p + 1$.

Proof.

This is a simple consequence of the Euler-Maclaurin formula. If $g : [a, b] \to \mathbb{R}$ is a smooth function, then the following asymptotic expansion holds:

$$\frac{\Delta t}{2} \sum_{j=0}^{N-1} \left[ g(t_j) + g(t_{j+1}) \right] \sim \int_a^b g(t)\, dt + \sum_{k=1}^{\infty} \frac{\Delta t^{2k} B_{2k}}{(2k)!} \left( g^{(2k-1)}(b) - g^{(2k-1)}(a) \right),$$

where the $B_{2k}$ are the Bernoulli numbers. The asymptotic expansion provides corrections to the trapezoidal rule that realize machine precision accuracy up until a certain value of $k$, after which terms in the expansion grow and the series diverges [6, Chapter 3]. In our case, $g(t) = \phi'(t) f(t) + \phi(t) f'(t)$, where the root conditions on $\phi$ imply that

$$\int_a^b g(t)\, dt = 0 \qquad \text{and} \qquad g^{(k)}(b) = g^{(k)}(a) = 0, \quad 0 \le k \le p - 1.$$

So for $p$ odd, we have

$$\frac{\Delta t}{2} \sum_{j=0}^{N-1} \left[ g(t_j) + g(t_{j+1}) \right] \sim \sum_{k=(p+1)/2}^{\infty} \frac{\Delta t^{2k} B_{2k}}{(2k)!} \left( g^{(2k-1)}(b) - g^{(2k-1)}(a) \right) = \frac{B_{p+1}}{(p+1)!} \left( \phi^{(p)}(b) f(b) - \phi^{(p)}(a) f(a) \right) \Delta t^{p+1} + \mathcal{O}(\Delta t^{p+2}).$$

For even $p$, the leading term is $\mathcal{O}(\Delta t^{p+2})$ with a slightly different coefficient.

For $\phi \in \mathcal{S}$ with $p = q$, the exact leading order error term in (2.8) is

$$\frac{2^p B_{p+1}}{p + 1} \left( f(b) - f(a) \right) \Delta t^{p+1}, \tag{2.9}$$

which is negligible for a wide range of reasonable $p$ and $\Delta t$ values. The Bernoulli numbers eventually grow like $p^p$, but for smaller values of $p$ they are moderate. For instance, with $\Delta t = 0.1$ and $f(b) - f(a) = 1$, this error term is $o(1)$ up until $p = 85$, where it takes the value $0.495352$, while for $\Delta t = 0.01$ the error is below machine precision for all $p$ between 7 and 819. For these reasons, in what follows we choose test functions $(\phi_k)_{k\in[K]} \subset \mathcal{S}$ and discretize all integrals using the trapezoidal rule. Unless otherwise stated, each function $\phi_k$ satisfies $p = q$ and so is fully determined by the tuple $(p_k, a_k, b_k)$ indicating its polynomial degree and support. In the next two subsections we propose two different strategies for determining the $\phi_k$ using the data $y$.
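Lemma 2 is straightforward to check numerically. The sketch below reuses `test_fn` from above with the arbitrary smooth choice $f(t) = \sin t$; the printed error should shrink by a factor of roughly $2^{p+1}$ per grid refinement for odd $p$.

```python
import numpy as np

a, b, p = 0.0, 1.0, 3                  # odd p: expected rate O(dt^(p+1))
for N in (50, 100, 200, 400):
    t = np.linspace(a, b, N + 1)
    dt = t[1] - t[0]
    phi, dphi = test_fn(t, a, b, p, p)
    g = dphi * np.sin(t) + phi * np.cos(t)   # g = phi' f + phi f'
    trap = dt * (0.5 * g[0] + g[1:-1].sum() + 0.5 * g[-1])
    print(N, abs(trap))                # exact value is 0; error here ~ dt^4
```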

2.4.1. Strategy 1: Uniform grid.

The simplest strategy for choosing a basis of test functions $(\phi_k)_{k\in[K]} \subset \mathcal{S}$ is to place the $\phi_k$ uniformly on the interval $[0, T]$ with fixed degree $p$ and fixed support size

$$L := \#\{ t_m \in \mathrm{supp}(\phi_k) \}$$

(i.e., $L$ is the number of timepoints in $\mathbf{t}$ on which $\phi_k$ is supported). The triple $(L, p, K)$ then defines the scheme, where each piece affects the distribution of the residual $\mathcal{R}(w^\star)$.

Step 1: Choosing L.

Heuristically, the support size of $\phi_k$ relates to the Fourier transform of the data. If $\mathrm{supp}(\phi_k)$ is small compared to the dominant wavemodes in the dynamics, then high-frequency noise will dominate the values of the inner products $\langle \phi_k, y \rangle$. If $\mathrm{supp}(\phi_k)$ is much larger than the dominant wavemodes, then too much averaging may occur, leading to unresolved dynamics. A natural choice is then to set $L$ equal to the period of a known active wavemode$^1$ $k$:

$$L = \frac{1}{\Delta t} \cdot \frac{2\pi}{2\pi k / T} = \frac{M}{k}.$$

In the noise-free and small-noise experiments below we set $L = \frac{M}{25}$ and leave optimal selection of $L$ based on Fourier analysis to future work.

Step 2: Determining p.

In light of the derivation above of the approximate covariance matrix $\Sigma = V'(V')^T$, we define the parameter $\rho := \|\phi'_k\|_2 / \|\phi_k\|_2$, which serves as an estimate for the ratio $\sqrt{\mathbb{V}[R_3]} / \sqrt{\mathbb{V}[R_2]}$ between the standard deviations of the two dominant error terms $R_3$ and $R_2$ in the residual $\mathcal{R}(w^\star)$. Larger $\rho$ indicates better agreement with the approximate covariance matrix $\Sigma$, since $\Sigma \propto \mathrm{Cov}(R_3)$. Furthermore, for $\phi_k \in \mathcal{S}$ we have the exact formula

$$\rho^2 = \frac{8 p^2}{(b - a)^2} \left( \frac{\Gamma(2p - 1)\, \Gamma\!\left(2p + \frac{3}{2}\right)}{\Gamma\!\left(2p + \frac{1}{2}\right)\, \Gamma(2p + 1)} \right) = \frac{p}{(b - a)^2} \left( \frac{4p + 1}{p - \frac{1}{2}} \right),$$

where $\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, dt$ is the gamma function. Given $\rho^2 \ge (5 + 2\sqrt{6}) / (b - a)^2$, a polynomial degree $p$ may be selected from $\rho$ using the formula

$$p = \frac{1}{8} \left( \left( (b - a)^2 \rho^2 - 1 \right) + \sqrt{ \left( (b - a)^2 \rho^2 - 1 \right)^2 - 8 (b - a)^2 \rho^2 } \right).$$
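In code this inversion is a one-liner (the helper name is ours; in practice the output would be rounded to an integer degree, e.g., `degree_from_rho(4.0, 0.0, 1.0)` returns approximately 3.1):

```python
import numpy as np

def degree_from_rho(rho, a, b):
    """Invert the rho(p) relation above for phi in S with p = q."""
    r = (b - a) ** 2 * rho ** 2
    if r < 5 + 2 * np.sqrt(6):        # below this threshold no real root exists
        raise ValueError("rho too small for the given support")
    return ((r - 1) + np.sqrt((r - 1) ** 2 - 8 * r)) / 8
```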
Step 3: Determining K.

Next we introduce the shift parameter $s \in [0, 1]$, defined by

$$s := \phi_k(t^\ast), \quad \text{where } t^\ast \text{ satisfies } \phi_k(t^\ast) = \phi_{k+1}(t^\ast),$$

which determines $K$ from $p$ and $L$. In words, $s$ is the height of the intersection between $\phi_k$ and $\phi_{k+1}$ and measures the amount of overlap between successive test functions. More overlap increases the correlation between rows of the residual $\mathcal{R}(w^\star)$ and hence leads to larger off-diagonal elements in the covariance matrix $\Sigma$. Larger $s$ implies that neighboring functions overlap on more points, with $s = 1$ indicating that $\phi_k = \phi_{k+1}$. Specifically, neighboring test functions overlap on $L\left(1 - \sqrt{1 - s^{1/p}}\right)$ timepoints. In Figures 3.2 and 3.3 we vary the parameters $\rho$ and $s$ and observe that the results agree with intuition: larger $\rho$ (better agreement with $\Sigma$) and larger $s$ (more test functions) lead to better recovery of $w^\star$. We summarize the uniform grid algorithm below.

FIG. 3.2. Small-noise regime: dynamic recovery of the Duffing equation with $\beta = 1$. Top: heat maps of the $\log_{10}$ average error $E_2(\hat{w})$ (left) and the sample standard deviation of $E_2(\hat{w})$ (right) over 200 instantiations of noise with $\sigma_{NR} = 0.04$ (4% noise), vs. $\rho$ and $s$. Bottom: $E_2(\hat{w})$ vs. $\rho$ for fixed $s = 0.5$ and various $\sigma_{NR}$. For $\rho > 3$ the average error is roughly an order of magnitude below $\sigma_{NR}$.

FIG. 3.3. Small-noise regime: dynamic recovery of the van der Pol oscillator with $\beta = 4$. Top: heat maps of the $\log_{10}$ average error $E_2(\hat{w})$ (left) and the sample standard deviation of $E_2(\hat{w})$ (right) over 200 instantiations of noise with $\sigma_{NR} = 0.04$ (4% noise), vs. $\rho$ and $s$. Bottom: $E_2(\hat{w})$ vs. $\rho$ for fixed $s = 0.5$ and various $\sigma_{NR}$. Similar to the Duffing equation, the average error falls to roughly an order of magnitude below $\sigma_{NR}$, although for van der Pol this regime is reached when $\rho \ge 6$.

$\hat{w} = \text{WSINDy\_UG}\left(y, \mathbf{t};\, (f_j)_{j\in[J]},\, L,\, \rho,\, s,\, \lambda,\, \gamma\right)$:

  1. Construct the matrix of trial gridfunctions $\Theta(y) = [f_1(y) \,|\, \cdots \,|\, f_J(y)]$.

  2. Construct integration matrices $V$ and $V'$ such that
     $$V_{km} = \Delta t\, \phi_k(t_m), \qquad V'_{km} = \Delta t\, \phi'_k(t_m),$$
     with the test functions $(\phi_k)_{k\in[K]}$ determined by $L, \rho, s$ as described above.

  3. Compute the Gram matrix $G = V\Theta(y)$ and right-hand side $b = -V'y$, so that $G_{kj} = \langle \phi_k, f_j(y) \rangle$ and $b_{kd} = -\langle \phi'_k, y_d \rangle$.

  4. Compute the approximate covariance and its Cholesky factorization $\Sigma = V'(V')^T = CC^T$.

  5. Solve the generalized least squares problem with $\ell^2$-regularization,
     $$\hat{w} = \arg\min_{w} \left\{ (Gw - b)^T \Sigma^{-1} (Gw - b) + \gamma^2 \|w\|_2^2 \right\},$$
     using sequential thresholding with parameter $\lambda$ to enforce sparsity.

2.4.2. Strategy 2: Adaptive grid.

Motivated by the arguments above, we now introduce an algorithm for constructing a test function basis localized near points of large change in the dynamics. This occurs in three steps: (1) construct a weak approximation $\mathbf{v} \approx \dot{x}$ to the derivative of the dynamics, (2) sample $K$ points $\mathbf{c}$ from a cumulative distribution $\psi$ with density proportional to the total variation $|\mathbf{v}|$, and (3) construct test functions centered at $\mathbf{c}$, using a width-at-half-max parameter $r_{whm}$ to determine the parameters $(p_k, a_k, b_k)$ of each function $\phi_k$. Each of these steps is numerically stable and is carried out independently along each coordinate of the dynamics. A visual diagram is provided in Figure 2.1.

FIG. 2.1. Adaptive grid construction used on data from the Duffing equation with 10% noise ($\sigma_{NR} = 0.1$). As desired, the centers $\mathbf{c}$ are clustered near steep gradients in the dynamics despite large measurement noise. (Note: $\phi'(t)/10$ is plotted in the upper-left instead of $\phi'(t)$ in order to visualize both $\phi$ and $\phi'$.)

Step 1: Weak derivative approximation.

Define $\mathbf{v} := V'_w y$, where the matrix $V'_w$ enacts a linear convolution with the derivative of a chosen test function $\phi \in \mathcal{S}$ of degree $p_w$ and support size $L_w$, so that

$$v_m = -\langle \phi', y \rangle = \langle \phi, \dot{y} \rangle \approx \dot{y}_m,$$

with the inner products taken over the convolution stencil centered at $t_m$. The parameters $L_w$ and $p_w$ are chosen by the user, with $L_w = 5$ and $p_w = 2$ corresponding to taking a centered finite difference derivative with a 3-point stencil. Smaller $p_w$ results in more smoothing and minimizes the corruption from noise while still accurately locating steep gradients in the dynamics. For the examples below we arbitrarily$^2$ use $p_w = 2$ and $L_w = 17$.

Step 2: Selecting c.

Having computed $\mathbf{v}$, define $\psi$ to be the cumulative sum of $|\mathbf{v}|$, normalized so that $\max \psi = 1$. In this way $\psi$ is a valid cumulative distribution function with density proportional to the total variation of $y$. We then find $\mathbf{c}$ by sampling from $\psi$. Let $U = \left[0, \frac{1}{K}, \frac{2}{K}, \ldots, \frac{K-1}{K}\right]$, with $K$ the number of test functions; we then define $\mathbf{c} = \psi^{-1}(U)$, or, numerically,

$$c_k = \min\{ t \in \mathbf{t} : \psi(t) \ge U_k \}.$$

This stage requires the user to select the number of test functions K.
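A compact sketch of Steps 1 and 2 for a single coordinate follows; the function name, the use of `np.convolve`, and the sign bookkeeping are our assumptions, with the reference kernel `dphi_w` coming from a helper like `test_fn` in section 2.4.

```python
import numpy as np

def adaptive_centers(y, t, K, dphi_w):
    """Sample K test-function centers with density ~ |v| (Steps 1-2, 1D).

    y      : (M,) one coordinate of the noisy data.
    dphi_w : derivative of the reference test function on its odd-length
             stencil, used as a convolution kernel.
    """
    dt = t[1] - t[0]
    # v_m ~ -<phi_w', y> ~ dy/dt at t_m; np.convolve flips the kernel, which
    # for the antisymmetric phi_w' (p = q) only changes the overall sign
    v = -np.convolve(y, dphi_w * dt, mode="same")
    psi = np.cumsum(np.abs(v))
    psi /= psi[-1]                        # CDF with density ~ |v|
    U = np.arange(K) / K                  # levels 0, 1/K, ..., (K-1)/K
    idx = np.searchsorted(psi, U)         # c_k = min{ t : psi(t) >= U_k }
    return t[np.minimum(idx, len(t) - 1)]
```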

Step 3: Construction of test functions (ϕk)k[K].

Having chosen the location $c_k$ of the centerpoint for each test function $\phi_k$, we are left to choose the degree $p_k$ of the polynomial and the support $[a_k, b_k]$. The degree is chosen according to the width-at-half-max parameter $r_{whm}$, which specifies the difference in timepoints between each center $c_k$ and $\arg_t\{\phi_k(t) = 1/2\}$, while the supports are chosen such that $\phi_k(b_k - \Delta t) = 10^{-16}$. This gives a nonlinear system of two equations in two unknowns, which can easily be solved (e.g., using fzero in MATLAB). This need only be done for one reference test function, with the rest obtained by translation. The optimal value of $r_{whm}$ depends on the timescales of the dynamics and can be chosen from the data using the Fourier transform, as in the uniform grid case; however, for simplicity we set $r_{whm} = M/100$ in the large-noise examples below.
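One way to set up that two-equation solve in place of MATLAB's fzero is to eliminate $p$ analytically and bracket the remaining scalar root; the parametrization $\phi(t) = \left(1 - ((t - c)/r)^2\right)^p$, the helper name, and the bracket endpoints below are our assumptions.

```python
import numpy as np
from scipy.optimize import brentq

def degree_and_halfwidth(w, dt):
    """Solve phi(c + w) = 1/2 and phi(b - dt) = 1e-16 for the degree p and
    half-width r = (b - a)/2, with phi(t) = (1 - ((t - c)/r)^2)^p."""
    def p_of_r(r):                 # eliminate p via the half-max condition
        return np.log(0.5) / np.log1p(-(w / r) ** 2)
    def resid(r):                  # remaining support-decay condition
        return p_of_r(r) * np.log1p(-((r - dt) / r) ** 2) - np.log(1e-16)
    r = brentq(resid, 1.05 * w, 100.0 * w)   # heuristic bracket
    return p_of_r(r), r

# e.g., a width-at-half-max of 10 timepoints with dt = 0.01:
p, r = degree_and_halfwidth(10 * 0.01, 0.01)
```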

The adaptive grid WSINDy algorithm is summarized as follows.

$\hat{w} = \text{WSINDy\_AG}\left(y, \mathbf{t};\, (f_j)_{j\in[J]},\, p_w,\, L_w,\, K,\, r_{whm},\, \lambda,\, \gamma\right)$:

  1. Construct the matrix of trial gridfunctions $\Theta(y) = [f_1(y) \,|\, \cdots \,|\, f_J(y)]$.

  2. Construct integration matrices $V$ and $V'$ such that
     $$V_{km} = \Delta t\, \phi_k(t_m), \qquad V'_{km} = \Delta t\, \phi'_k(t_m),$$
     with the test functions $(\phi_k)_{k\in[K]}$ determined by $p_w, L_w, K, r_{whm}$ as described above.

  3. Compute the Gram matrix $G = V\Theta(y)$ and right-hand side $b = -V'y$, so that $G_{kj} = \langle \phi_k, f_j(y) \rangle$ and $b_{kd} = -\langle \phi'_k, y_d \rangle$.

  4. Compute the approximate covariance and its Cholesky factorization $\Sigma = V'(V')^T = CC^T$.

  5. Solve the generalized least squares problem with $\ell^2$-regularization,
     $$\hat{w} = \arg\min_{w} \left\{ (Gw - b)^T \Sigma^{-1} (Gw - b) + \gamma^2 \|w\|_2^2 \right\},$$
     using sequential thresholding with parameter $\lambda$ to enforce sparsity.

3. Numerical experiments.

We now show that WSINDy is capable of recovering the correct dynamics to high accuracy over a range of noise levels. We examine the systems in Table 1, which exhibit several canonical behaviors, namely growth and decay, nonlinear oscillations, and chaotic dynamics, in dimensions $D \in \{2, 3, 5\}$. To generate true trajectory data we use ode45 in MATLAB with absolute and relative tolerances $10^{-10}$ and collect $M$ samples uniformly$^3$ in time with sampling rate $\Delta t$. The parameters $M$ and $\Delta t$ are chosen to provide a balance between illustrating ODE behaviors and avoiding an overabundance of observations. Gaussian white noise with mean zero and variance $\sigma^2$ is added to the exact trajectories, where $\sigma$ is computed by specifying a noise ratio $\sigma_{NR}$ and setting

$$\sigma = \sigma_{NR} \frac{\|x\|_F}{\sqrt{MD}}, \tag{3.1}$$

where the Frobenius norm of a matrix $x \in \mathbb{R}^{M \times D}$ is defined by

$$\|x\|_F := \sqrt{ \sum_{m=1}^{M} \sum_{d=1}^{D} |x_{md}|^2 }.$$

The ratio of noise to signal then satisfies $\|\epsilon\|_F / \|x\|_F \approx \sigma_{NR}$.
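A sketch of this data-generation pipeline for the Duffing system of Table 1 ($\beta = 1$), with scipy's `solve_ivp` standing in for ode45:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Duffing system from Table 1 with beta = 1; M = 3001 samples, dt = 0.01
duffing = lambda t, x: [x[1], -0.2 * x[1] - 0.2 * x[0] - x[0] ** 3]
t = np.linspace(0.0, 30.0, 3001)
x = solve_ivp(duffing, (0.0, 30.0), [0.0, 2.0],
              t_eval=t, rtol=1e-10, atol=1e-10).y.T   # (M, D) clean data

sigma_NR = 0.1
sigma = sigma_NR * np.linalg.norm(x) / np.sqrt(x.size)  # (3.1): ||x||_F/sqrt(MD)
rng = np.random.default_rng(0)
y = x + sigma * rng.standard_normal(x.shape)            # noisy observations
```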

TABLE 1.

ODEs used in numerical experiments. For Linear 5D, Duffing, van der Pol, and Lotka–Volterra we measure the accuracy in the recovered system as the parameter β varies (see Table 2).

| Name | Governing equations | $M$ | $\Delta t$ |
| Linear 5D | $\dot{x}_1 = x_5 + \beta x_1 + x_2$; $\dot{x}_i = x_{i-1} + \beta x_i + x_{i+1}$, $i = 2, 3, 4$; $\dot{x}_5 = x_4 + \beta x_5 + x_1$ | 1401 | 0.025 |
| Duffing | $\dot{x}_1 = x_2$, $\dot{x}_2 = -0.2 x_2 - 0.2 x_1 - \beta x_1^3$ | 3001 | 0.01 |
| Van der Pol | $\dot{x}_1 = x_2$, $\dot{x}_2 = \beta x_2 (1 - x_1^2) - x_1$ | 3001 | 0.01 |
| Lotka–Volterra | $\dot{x}_1 = 3 x_1 - \beta x_1 x_2$, $\dot{x}_2 = \beta x_1 x_2 - 6 x_2$ | 1001 | 0.01 |
| Nonlinear pendulum | $\dot{x}_1 = x_2$, $\dot{x}_2 = -\sin(x_1)$ | 501 | 0.1 |
| Lorenz | $\dot{x}_1 = 10(x_2 - x_1)$, $\dot{x}_2 = x_1(28 - x_3) - x_2$, $\dot{x}_3 = x_1 x_2 - \frac{8}{3} x_3$ | 10001 | 0.001 |

We measure the accuracy of the recovered dynamical system using the relative Frobenius-norm error in the recovered coefficients,

$$E_2(\hat{w}) = \frac{\|\hat{w} - w^\star\|_F}{\|w^\star\|_F}, \tag{3.2}$$

and the relative Frobenius-norm error between the noise-free data $x$ and the data-driven dynamics $x_{dd}$ along the same timepoints:

$$\mathcal{E}_2(x_{dd}) = \frac{\|x_{dd} - x\|_F}{\|x\|_F}. \tag{3.3}$$

The ODEs in Table 1 are all first-order autonomous systems; however, they exhibit a diverse range of dynamics. The Linear 5D system (for $\beta < 0$) and Duffing's equation are both examples of damped oscillators, showing that WSINDy is able to discern whether such motion is governed by linear or nonlinear coupling between variables. For $\beta > 0$, the Linear 5D system exhibits exponential growth. The van der Pol oscillator, Lotka–Volterra system, and nonlinear pendulum demonstrate that a stable limit cycle with abrupt changes may manifest from vastly different nonlinear mechanisms, which turn out to be identifiable using the weak form. Finally, the Lorenz system exhibits deterministic chaos, and hence the dynamics cover a wide range of Fourier modes, which easily become corrupted with noise.

3.1. Noise-free data.

The goal of the following noise-free experiments is to demonstrate convergence of the recovered weights $\hat{w}$ to the true weights $w^\star$ to within the accuracy tolerance of the ODE solver (fixed at $10^{-10}$ throughout). In light of Lemma 2, this should occur as the decay rate of the test functions $(\phi_k)_{k\in[K]}$ is increased, which for test functions in the class $\mathcal{S}$ (see (2.6)) is realized by increasing the polynomial degree $p$. Hence, over the range of parameter values in Table 2, for each system we test convergence as $p$ increases. We use the uniform grid approach with shift parameter $s$ chosen such that the number of test functions equals the number of trial functions ($K = J$), resulting in square Gram matrices $G = V\Theta(y)$. The support of the basis functions along the time grid $\mathbf{t}$ is set to $L = \frac{M}{25}$ points. The data-driven trial basis $(f_j)_{j\in[J]}$ includes all monomials in the state variables up to degree 5, as well as the trigonometric terms $\cos(n y_d)$, $\sin(n y_d)$ for $n = 1, 2$ and $d \in [D]$. We set the regularization parameter to zero ($\gamma = 0$), with the exception of the nonlinear pendulum, where $\gamma = 10^{-8}$, and the sparsity threshold to $\lambda = 0.001$. We note that a nonzero $\gamma$ is always necessary to discover the nonlinear pendulum from combined trigonometric and polynomial libraries, since $\sin(x_1)$ is well approximated by polynomial terms; however, the same is not true for low-order polynomial systems. In the cases considered here, sequential thresholding successfully removes trigonometric library terms for ODE systems with polynomial dynamics despite initially ill-conditioned Gram matrices $G$ resulting from combining polynomial and trigonometric terms.

TABLE 2.

Specifications for the parameters used in the simulations illustrated in Figure 3.1.

| ODE | $\beta$ | $x(0)$ | $L$ | $\Delta L$ | $J (= K)$ |
| Linear 5D | $(-0.3, -0.2, -0.1, 0.1)$ | $(10, 0, 0, 0, 0)^T$ | 57 | 5 | 252 |
| Duffing | $(0.01, 0.1, 1, 10)$ | $(0, 2)^T$ | 121 | 99 | 29 |
| Van der Pol | $(0.01, 0.1, 1, 10)$ | $(0, 1)^T$ | 121 | 99 | 29 |
| Lotka–Volterra | $(0.005, 0.01, 0.1, 1)$ | $(1, 1)^T$ | 41 | 33 | 29 |
| Pendulum | — | $x_2(0) = 0$, $x_1(0) \in \left\{\frac{15}{16}\pi, \frac{10}{16}\pi, \frac{5}{16}\pi, \frac{1}{16}\pi\right\}$ | 21 | 16 | 29 |
| Lorenz | — | $\sim U\left([-15, 15]^2 \times [10, 40]\right)$ | 401 | 141 | 68 |

Figure 3.1 shows that in the limit of large $p$, WSINDy recovers the correct weight matrix $w^\star$ of each system in Table 1 to an accuracy of $\mathcal{O}(10^{-10})$. For the Linear 5D system, we vary the growth/decay parameter, showing that the system is identifiable to high accuracy despite an excessively large trial library (252 terms). For Duffing's equation and the van der Pol oscillator, the same convergence trend is observed for $\beta$ values spanning several orders of magnitude. Accuracy is slightly worse for the Lotka–Volterra equation when $\beta = 0.005$, which corresponds to highly infrequent predator-prey interactions and leads to solutions with large amplitudes and gradients. For the nonlinear pendulum, we test that WSINDy is able to identify the $\sin(x_1)$ nonlinearity for both large and small initial amplitudes, noting that $x_1(0) = \frac{15}{16}\pi \approx \pi$ produces strongly nonlinear oscillations, while $x_1(0) = \frac{1}{16}\pi$ produces small-angle oscillations where $\sin(x_1) \approx x_1$. In addition, for the pendulum we use fewer samples ($M = 501$) and a larger time step ($\Delta t = 0.1$), and hence observe a decreased convergence rate. For the Lorenz equations we vary the initial conditions, generating 40 random initial conditions from a region covering the strange attractor, and show convergence in all cases.

3.2. Small-noise regime.

We now turn to the case of low to moderate noise levels, examining noise ratios $\sigma_{NR}$ in the range $[10^{-5}, 0.04]$ for the van der Pol oscillator and Duffing's equation. We examine $\rho \in [1, 7]$ and $s \in [0.3, 0.95]$, where $\rho := \|\phi'_k\|_2 / \|\phi_k\|_2$ and $s$ is the height of intersection of two neighboring test functions $\phi_k$ and $\phi_{k+1}$ (with $s = 1$ leading to $\phi_k = \phi_{k+1}$ and $s = 0$ indicating $\mathrm{supp}(\phi_k) \cap \mathrm{supp}(\phi_{k+1}) = \emptyset$). Using the analysis from section 2.3, increasing $\rho$ affects the distribution of the residual $\mathcal{R}(w^\star)$ by magnifying the portion $R_3 = \langle \phi'_k, \epsilon \rangle$ that is linear in the noise. For $\phi \in \mathcal{S}$, larger $\rho$ corresponds to a higher polynomial degree $p$, with $\rho \in [1, 7]$ leading to $p \in [2, 98]$. A larger shift parameter $s$ corresponds to more test functions (higher $K$) but also to higher correlation between rows in $G$, as $\langle \phi_k, f_j(y) \rangle \approx \langle \phi_{k+1}, f_j(y) \rangle$ when the supports of $\phi_k$ and $\phi_{k+1}$ sufficiently overlap. Here $s \in [0.3, 0.95]$ corresponds to $K \in [14, 451]$. We again use the uniform grid approach with $\gamma = 0$ and $\lambda = \frac{1}{4}\min_{w^\star_j \ne 0} |w^\star_j|$. For each system we generate 200 instantiations of noise and record the coefficient error over the range of $s$ and $\rho$ values.

From Figures 3.2 and 3.3 we observe two properties. First, the coefficient error $E_2(\hat{w})$ monotonically decreases with increasing $s$ and $\rho$; hence accurate recovery requires sufficient overlap between test functions (a large enough shift parameter $s$) and sufficiently localized test functions, which amplify the portion of the residual that is linear in the noise. Second, for large enough $\rho$ and $s$, the error in the coefficients scales linearly with $\sigma_{NR}$, leading to an accuracy of $E_2(\hat{w}) \approx 0.1\, \sigma_{NR}$, or roughly $-\log_{10}(0.1\, \sigma_{NR})$ significant digits in the recovered coefficients. In Appendix A we show that this second property does not hold for standard SINDy; in particular, the method of differentiation must change depending on the noise level in order to reach a desired accuracy.

3.3. Large-noise regime.

Figures 3.4 to 3.9 show that adaptive placement of test functions (Strategy 2) can be employed to discover dynamics in the large-noise regime with fewer test functions. We show that each system in Table 1 can be discovered under $\sigma_{NR} = 0.1$ (10% noise) from only 250 test functions distributed near steep gradients in $y$, located using the scheme in section 2.4.2 with $p_w = 2$ and $L_w = 17$. We set the width-at-half-max of the test functions to $r_{whm} = M/100$ timepoints. To exemplify the separation of scales and the severity of the corruption from noise, the noisy data $y$, true data $x$, and trajectories $x_{dd}$ of the learned dynamical systems are shown in dynamo view and in phase space (for $D \le 3$). We extend $x_{dd}$ by 50% to show that the data-driven system captures the true limiting behavior. We set the sparsity threshold to $\lambda = \frac{1}{4}\min_{w^\star \ne 0} |w^\star|$ and $\gamma = 0$, except in the Linear 5D and nonlinear pendulum examples, where $\gamma = \sqrt{\sigma_{NR}} \approx 0.32$. For the trial basis we use all monomials up to degree 5 in the state variables, and for the pendulum we include the trigonometric terms $\sin(k y_d), \cos(k y_d)$ for $k = 1, 2$ and $d = 1, 2$.

FIG. 3.4. Large-noise regime: Linear 5D system with damping $\beta = -0.2$. All correct terms were identified, with an error in the weights of $E_2(\hat{w}) = 0.0064$ and a trajectory error of $\mathcal{E}_2(x_{dd}) = 0.013$.

FIG. 3.9. Large-noise regime: Lorenz system with $x_0 = (-8, 7, 27)^T$. All correct terms were identified, with an error in the weights of $E_2(\hat{w}) = 0.0084$ and trajectory error $\mathcal{E}_2(x_{dd}) = 0.56$. The large trajectory error is expected due to the chaotic nature of the solution. Using data up until $t = 1.5$ (the first 1500 timepoints), the trajectory error is 0.027.

In each case the correct terms are identified with coefficient error $E_2(\hat{w}) < 10^{-2}$, in agreement with the trend $E_2(\hat{w}) \approx 0.1\, \sigma_{NR}$ observed in the small-noise regime. For the Linear 5D, Duffing, and Lotka–Volterra systems (Figures 3.4, 3.5, and 3.7), the data-driven trajectory $x_{dd}$ is indistinguishable by eye from the true data, with trajectory error $\mathcal{E}_2(x_{dd}) < 0.02$. For the van der Pol oscillator and nonlinear pendulum (Figures 3.6 and 3.8), $x_{dd}$ follows a limit cycle with an attractor that is indistinguishable from the true data (see the phase plane plots); however, an error in the period of oscillation of roughly 0.6% leads to a larger trajectory error. The data-driven trajectory for the Lorenz equation diverges from the true trajectory around $t = 2.5$ (Figure 3.9), as expected for chaotic dynamics, but still remains close to the Lorenz attractor.

FIG. 3.5. Large-noise regime: Duffing equation, $\beta = 1$. All correct terms were identified, with an error in the weights of $E_2(\hat{w}) = 0.0075$ and a trajectory error of $\mathcal{E}_2(x_{dd}) = 0.014$.

FIG. 3.7. Large-noise regime: Lotka–Volterra system with $\beta = 1$. All correct nonzero terms were identified, with an error in the weights of $E_2(\hat{w}) = 0.0013$ and trajectory error $\mathcal{E}_2(x_{dd}) = 0.0082$.

FIG. 3.6. Large-noise regime: van der Pol oscillator, $\beta = 4$. All correct terms were identified, with coefficient error $E_2(\hat{w}) = 0.0073$ and trajectory error $\mathcal{E}_2(x_{dd}) = 0.32$. The data-driven trajectory $x_{dd}$ has a slightly shorter oscillation period of 10.14 time units compared to the true 10.2, resulting in an eventual offset from the true data $x$ and hence a larger trajectory error. Measured over the time interval $[0, 8]$, the trajectory error is 0.065.

FIG. 3.8. Large-noise regime: nonlinear pendulum with initial conditions $x(0) = (15\pi/16, 0)^T$. All correct nonzero terms were identified, with an error in the weights of $E_2(\hat{w}) = 0.0089$ and a trajectory error of $\mathcal{E}_2(x_{dd}) = 0.076$.

4. Concluding remarks.

We have developed and investigated a data-driven model selection algorithm based on the weak formulation of differential equations. The algorithm utilizes the reformulation of the model selection problem as a sparse regression problem for the weights $w$ of a candidate function basis $(f_j)_{j\in[J]}$, introduced in [21] and generalized in [4] as the SINDy algorithm. Our WSINDy algorithm can be seen as a generalization of the sparse recovery scheme using integral terms found in [17], where dynamics were recovered from noisy data using the integral equation. We have shown that by extending the integral equation to the weak form and using test functions with certain localization and smoothness properties, one may discover the dynamics over a wide range of noise levels, with accuracy scaling favorably with noise: $E_2(\hat{w}) \approx 0.1\, \sigma_{NR}$.

A natural line of inquiry is to consider how WSINDy compares with conventional SINDy. There are several notable advantages of WSINDy; in particular, by considering the weak form of the equations, WSINDy completely avoids the approximation of pointwise derivatives, which significantly reduces the accuracy of conventional SINDy. When using SINDy, one must choose an appropriate numerical differentiation scheme depending on the noise level (e.g., finite differences are not robust to large noise but work well for small noise). For WSINDy, test functions from the space $\mathcal{S}$ (see section 2.4) together with the trapezoidal rule are effective in both low-noise and high-noise regimes. We demonstrate these observations in Appendix A by comparing WSINDy to SINDy under several numerical differentiation schemes. On the other hand, it may be the case that less data is required by standard SINDy: for the examples shown here, WSINDy works optimally for test functions supported on at least 15 timepoints, while many derivative approximations require fewer consecutive points.

WSINDy also utilizes the linearity of inner products with test functions to estimate the covariance structure of the residual, performing model selection in a generalized least squares framework. This is a much more appropriate setting given that the residuals are neither independent nor identically distributed; however, we note that our implementations in this article employ approximate covariance matrices and could benefit from further refinement and investigation. In Appendix B we show that using generalized least squares with the approximate covariance improves some results over ordinary least squares, but not significantly. We leave incorporation of more detailed knowledge of the covariance structure to future work. In addition, generalized least squares could potentially improve traditional model selection algorithms that rely on pointwise derivative estimates by similarly exploiting linear operators. Ultimately, a thorough analysis of the advantages of generalized least squares for model selection deserves further study.

Lastly, the most obvious extensions lie in generalizing the WSINDy method to spatiotemporal datasets. WSINDy as presented here in the context of ODEs is an exciting proof of concept with natural extensions to spatiotemporal and multiresolution settings building upon the extensive results in numerical and functional analysis for weak and variational formulations of physical problems.

Acknowledgments.

Code used in this manuscript is publicly available on GitHub at https://github.com/MathBioCU/WSINDy. The authors would like to thank Prof. Vanja Dukic (University of Colorado at Boulder, Department of Applied Mathematics) and Kadierdan Kaheman (University of Washington at Seattle, Department of Applied Mathematics) for helpful discussions.

Funding:

This research was supported in part by the NSF/NIH Joint DMS/NIGMS Mathematical Biology Initiative grant R01GM126559 and in part by the NSF Computing and Communications Foundations Division grant CCF-1815983. This work also utilized resources from the University of Colorado Boulder Research Computing Group, which is supported by the National Science Foundation (awards ACI-1532235 and ACI-1532236), the University of Colorado Boulder, and Colorado State University.

Appendix A. Comparison between WSINDy and SINDy.

Here we compare WSINDy and SINDy using the van der Pol oscillator, Lotka–Volterra system, and Lorenz equation. For WSINDy we place test functions along the time axis according to the uniform grid strategy. For SINDy, we examine three differentiation methods: total variation regularized derivatives (SINDy-TV), centered second-order finite difference (SINDy-FD-2), and centered fourth-order finite difference (SINDy-FD-4). For SINDy-TV we use default settings and set the regularization parameter equal to the time step.

For each system and noise level we generate 200 independent instantiations of noise and record the average coefficient error $E_2(\hat{w})$ (3.2) as well as the average true positivity ratio (TPR) [10]:

$$\mathrm{TPR}(\hat{w}) = \frac{\mathrm{TP}(\hat{w})}{\mathrm{TP}(\hat{w}) + \mathrm{FP}(\hat{w}) + \mathrm{FN}(\hat{w})}, \tag{A.1}$$

where $\mathrm{TP}(\hat{w})$ is the number of correctly identified nonzero terms, $\mathrm{FP}(\hat{w})$ is the number of falsely identified nonzero terms, and $\mathrm{FN}(\hat{w})$ is the number of terms falsely identified as having a coefficient of zero. Since the feasible range of sparsity thresholds $\lambda$ depends on the noise level, we adopt the selection methodology in [14] to choose an appropriate $\lambda$ for each instantiation of noise: $\lambda$ is chosen from the set $\left\{ 10^{-5 + i/10} : i \in \{0, \ldots, 50\} \right\}$ (i.e., 51 values from $10^{-5}$ to $1$, equally spaced in $\log_{10}$) as the minimizer of the loss function

$$\mathcal{L}(\lambda) = \frac{\| A w^\lambda - A w^0 \|_2}{\| A w^0 \|_2} + \frac{\#\{ j : w^\lambda_j \ne 0 \}}{J},$$

where $A = V\Theta(y)$ for WSINDy and $A = \Theta(y)$ for SINDy; $w^\lambda$ is the sequential-thresholding least squares solution with sparsity threshold $\lambda$, $w^0$ is the corresponding solution with $\lambda = 0$, and $J$ is the number of terms in the model library (for further details see [14]).
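A sketch of this selection loop, reusing the `stls` solver sketched in section 1; the function name and the simple first-minimum tie-breaking are our choices:

```python
import numpy as np

def select_lambda(A, rhs):
    """Scan 51 log-spaced thresholds and return the minimizer of L(lambda).

    A   : G = V Theta(y) for WSINDy, or Theta(y) for SINDy.
    rhs : (rows, D) right-hand side matching A.
    """
    w0 = np.linalg.lstsq(A, rhs, rcond=None)[0]      # lambda = 0 solution
    best_lam, best_loss = None, np.inf
    for lam in 10.0 ** np.linspace(-5, 0, 51):
        w = stls(A, rhs, lam)
        loss = (np.linalg.norm(A @ w - A @ w0) / np.linalg.norm(A @ w0)
                + np.count_nonzero(w) / w.size)      # sparsity penalty
        if loss < best_loss:
            best_lam, best_loss = lam, loss
    return best_lam
```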

From Figures A.1, A.2, and A.3 we observe that for small noise (up to $\sigma_{NR} = 10^{-1}$), the coefficient error for WSINDy follows the linear trend $E_2(\hat{w}) \approx 0.1\, \sigma_{NR}$ observed in the text, and that SINDy-FD-4 behaves similarly but with slightly worse accuracy. For larger noise, SINDy diverges in accuracy and in identification of the correct nonzero terms for each differentiation scheme, while WSINDy maintains a TPR of at least 0.8 up to 40% noise for each system. WSINDy thus provides an advantage across the entire noise spectrum examined, all while employing the same weak discretization scheme.

FIG. A.1. Comparison between WSINDy and SINDy: van der Pol. Clockwise from top left: small-noise $\mathrm{TPR}(\hat{w})$ (defined in (A.1)), large-noise $\mathrm{TPR}(\hat{w})$, large-noise $E_2(\hat{w})$ (defined in (3.2)), small-noise $E_2(\hat{w})$.

FIG. A.2. Comparison between WSINDy and SINDy: Lotka–Volterra. Clockwise from top left: small-noise $\mathrm{TPR}(\hat{w})$ (defined in (A.1)), large-noise $\mathrm{TPR}(\hat{w})$, large-noise $E_2(\hat{w})$ (defined in (3.2)), small-noise $E_2(\hat{w})$.

FIG. A.3. Comparison between WSINDy and SINDy: Lorenz system. Clockwise from top left: small-noise $\mathrm{TPR}(\hat{w})$ (defined in (A.1)), large-noise $\mathrm{TPR}(\hat{w})$, large-noise $E_2(\hat{w})$ (defined in (3.2)), small-noise $E_2(\hat{w})$.

Appendix B. Generalized least squares vs. ordinary least squares.

FIG. B.1. Comparison between WSINDy with GLS and WSINDy with ordinary least squares using the Duffing equation. Results are averaged over 200 instantiations of noise.

Generalized least squares (GLS) aims to account for correlations between the residuals [8]. Given a linear model $y = X\beta + \epsilon$, where $\mathrm{Cov}(\epsilon) = \Sigma$ and $\mathbb{E}[\epsilon \mid X] = 0$, the GLS estimator of the parameters $\beta$ upon observing $\hat{y}$ is

$$\hat{\beta} = \left( X^T \Sigma^{-1} X \right)^{-1} X^T \Sigma^{-1} \hat{y}.$$

This is the best linear unbiased estimator of $\beta$, in the sense that if $\tilde{\beta}$ is any other linear unbiased estimator, then $\mathbb{V}[\hat{\beta}_i] \le \mathbb{V}[\tilde{\beta}_i]$ for $i = 1, \ldots, n$.
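Numerically one avoids forming $\Sigma^{-1}$ explicitly; a minimal sketch via a Cholesky factorization of $\Sigma$ (the helper name is ours):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gls(X, y_hat, Sigma):
    """GLS estimate (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y_hat."""
    Si_X = cho_solve(cho_factor(Sigma), X)          # Sigma^{-1} X
    return np.linalg.solve(X.T @ Si_X, Si_X.T @ y_hat)
```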

Above we derived an approximate covariance matrix $\Sigma \approx V'(V')^T$ to use in the GLS implementation of WSINDy, although the true covariance depends on the underlying unknown dynamical system and hence is unattainable. In addition, since in our case $X = G = V\Theta(y)$ depends on the noise $\epsilon$, the assumption $\mathbb{E}[\epsilon \mid X] = 0$ is violated. Nevertheless, we find that the noise regime $\sigma_{NR} \in [0.01, 0.3]$ does benefit from using GLS over ordinary least squares. Figure B.1 shows that for the Duffing equation, GLS extends the region $\{ \sigma_{NR} : \mathrm{TPR}(\hat{w}) > 0.95 \}$ from $\sigma_{NR} \le 0.05$ to $\sigma_{NR} \le 0.15$, as well as increasing the accuracy of the recovered coefficients. This suggests that further improvements can be made with a more refined covariance matrix.

Footnotes

1. Such that $\mathcal{F}_k(y) := \sum_{m=0}^{M-1} y_m e^{-2\pi i m k / M}$ is not negligible.

2. We find that a lower-degree test function with small support effectively locates steep gradients in noisy trajectories.

3. We leave a detailed study of nonuniform time sampling to future work.

REFERENCES

  • [1] AKAIKE H, A new look at the statistical model identification, IEEE Trans. Automat. Control, 19 (1974), pp. 716–723, doi:10.1109/TAC.1974.1100705.
  • [2] AKAIKE H, On entropy maximization principle, in Applications of Statistics, Krishnaiah PR, ed., North-Holland, Amsterdam, 1977, pp. 27–41.
  • [3] BORTZ DM AND NELSON PW, Model selection and mixed-effects modeling of HIV infection dynamics, Bull. Math. Biol., 68 (2006), pp. 2005–2025, doi:10.1007/s11538-006-9084-x.
  • [4] BRUNTON SL, PROCTOR JL, AND KUTZ JN, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci. USA, 113 (2016), pp. 3932–3937.
  • [5] CORTIELLA A, PARK K-C, AND DOOSTAN A, Sparse identification of nonlinear dynamical systems via reweighted $\ell_1$-regularized least squares, Comput. Methods Appl. Mech. Engrg., (2021), 113620.
  • [6] DAHLQUIST G AND BJÖRCK A, Numerical Methods in Scientific Computing: Volume 1, vol. 103, SIAM, 2008.
  • [7] KANG SH, LIAO W, AND LIU Y, IDENT: Identifying differential equations with numerical time evolution, J. Sci. Comput., 87 (2021), 1.
  • [8] KARIYA T AND KURATA H, Generalized Least Squares, John Wiley & Sons, New York, 2004.
  • [9] KELLER RT AND DU Q, Discovery of dynamics using linear multistep methods, SIAM J. Numer. Anal., 59 (2021), pp. 429–455.
  • [10] LAGERGREN J, NARDINI JT, LAVIGNE GM, RUTTER EM, AND FLORES KB, Learning partial differential equations for biological transport models from noisy spatio-temporal data, Proc. A, 476 (2020), 20190800.
  • [11] LAGERGREN JH, NARDINI JT, MICHAEL LAVIGNE G, RUTTER EM, AND FLORES KB, Learning partial differential equations for biological transport models from noisy spatiotemporal data, Proc. A, 476 (2020), 20190800, doi:10.1098/rspa.2019.0800.
  • [12] LILLACCI G AND KHAMMASH M, Parameter estimation and model selection in computational biology, PLoS Comput. Biol., 6 (2010), e1000696, doi:10.1371/journal.pcbi.1000696.
  • [13] LU F, MAGGIONI M, AND TANG S, Learning interaction kernels in heterogeneous systems of agents from multiple trajectories, J. Mach. Learn. Res., 22 (2021), pp. 1–67.
  • [14] MESSENGER DA AND BORTZ DM, Weak SINDy for partial differential equations, arXiv preprint, arXiv:2007.02848 [math.NA], 2020, https://arxiv.org/abs/2007.02848.
  • [15] RAISSI M, PERDIKARIS P, AND KARNIADAKIS GE, Machine learning of linear differential equations using Gaussian processes, J. Comput. Phys., 348 (2017), pp. 683–693.
  • [16] RUDY SH, KUTZ JN, AND BRUNTON SL, Deep learning of dynamics and signal-noise decomposition with time-stepping constraints, J. Comput. Phys., 396 (2019), pp. 483–506.
  • [17] SCHAEFFER H AND MCCALLA SG, Sparse model selection via integral terms, Phys. Rev. E, 96 (2017), 023302.
  • [18] SCHAEFFER H, TRAN G, WARD R, AND ZHANG L, Extracting structured dynamical systems using sparse optimization with very few samples, Multiscale Model. Simul., 18 (2020), pp. 1435–1461.
  • [19] TONI T, WELCH D, STRELKOWA N, IPSEN A, AND STUMPF MP, Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems, J. R. Soc. Interface, 6 (2009), pp. 187–202, doi:10.1098/rsif.2008.0172.
  • [20] TRAN G AND WARD R, Exact recovery of chaotic systems from highly corrupted data, Multiscale Model. Simul., 15 (2017), pp. 1108–1129.
  • [21] WANG W-X, YANG R, LAI Y-C, KOVANIS V, AND GREBOGI C, Predicting catastrophes in nonlinear dynamical systems by compressive sensing, Phys. Rev. Lett., 106 (2011), 154101.
  • [22] WARNE DJ, BAKER RE, AND SIMPSON MJ, Using experimental data and information criteria to guide model selection for reaction–diffusion problems in mathematical biology, Bull. Math. Biol., 81 (2019), pp. 1760–1804, doi:10.1007/s11538-019-00589-x.
  • [23] WU H AND WU L, Identification of significant host factors for HIV dynamics modelled by non-linear mixed-effects models, Stat. Med., 21 (2002), pp. 753–771, doi:10.1002/sim.1015.
  • [24] WU K, QIN T, AND XIU D, Structure-preserving method for reconstructing unknown Hamiltonian systems from trajectory data, SIAM J. Sci. Comput., 42 (2020), pp. A3704–A3729.
  • [25] WU K AND XIU D, Numerical aspects for approximating governing equations using data, J. Comput. Phys., 384 (2019), pp. 200–221.
  • [26] ZHANG S AND LIN G, Robust data-driven discovery of governing physical laws with error bars, Proc. A, 474 (2018), 20180305.
  • [27] ZHANG S AND LIN G, Robust subsampling-based sparse Bayesian inference to tackle four challenges (large noise, outliers, data integration, and extrapolation) in the discovery of physical laws from data, arXiv preprint, arXiv:1907.07788 [stat.ML], 2019, https://arxiv.org/abs/1907.07788.
