Author manuscript; available in PMC 2023 Jul 20.
Published in final edited form as: Physica D. 2022 Jun 18;439:133406. doi: 10.1016/j.physd.2022.133406

Learning mean-field equations from particle data using WSINDy

Daniel A Messenger 1,*, David M Bortz 1
PMCID: PMC10358825  NIHMSID: NIHMS1912875  PMID: 37476028

Abstract

We develop a weak-form sparse identification method for interacting particle systems (IPS) with the primary goals of reducing computational complexity for large particle number $N$ and offering robustness to either intrinsic or extrinsic noise. In particular, we use concepts from mean-field theory of IPS in combination with the weak-form sparse identification of nonlinear dynamics algorithm (WSINDy) to provide a fast and reliable system identification scheme for recovering the governing stochastic differential equations for an IPS when the number of particles per experiment $N$ is on the order of several thousand and the number of experiments $M$ is less than 100. This is in contrast to existing work showing that system identification for $N$ less than 100 and $M$ on the order of several thousand is feasible using strong-form methods. We prove that under some standard regularity assumptions the scheme converges with rate $O(N^{-1/2})$ in the ordinary least squares setting and we demonstrate the convergence rate numerically on several systems in one and two spatial dimensions. Our examples include a canonical problem from homogenization theory (as a first step towards learning coarse-grained models), the dynamics of an attractive–repulsive swarm, and the IPS description of the parabolic–elliptic Keller–Segel model for chemotaxis. Code is available at https://github.com/MathBioCU/WSINDy_IPS.

Keywords: Data-driven modeling, Interacting particle systems, Weak form, Mean-field limit, Sparse regression

1. Problem statement

Recently there has been considerable interest in the methodology of data-driven discovery for governing equations. Building on the Sparse Identification of Nonlinear Dynamics (SINDy) [1], we developed a weak form version (WSINDy) for ODEs [2] and for PDEs [3]. In this work, we develop a formulation for discovering governing stochastic differential equations (SDEs) for interacting particle systems (IPS). To promote clarity and for reference later in the article, we first state the problem of interest. Subsequently, we will provide a discussion of background concepts and current results in the literature.

Consider a particle system $X_t = (X_t^{(1)}, \ldots, X_t^{(N)}) \in \mathbb{R}^{Nd}$ where, on some fixed time window $t \in [0,T]$, each particle $X_t^{(i)} \in \mathbb{R}^d$ evolves according to the overdamped dynamics

$$dX_t^{(i)} = \left(-\nabla K * \mu_t^N(X_t^{(i)}) - \nabla V(X_t^{(i)})\right)dt + \sigma(X_t^{(i)})\, dB_t^{(i)} \tag{1.1}$$

with initial data $X_0^{(i)}$ each drawn independently from some probability measure $\mu_0 \in \mathcal{P}_p(\mathbb{R}^d)$, where $\mathcal{P}_p(\mathbb{R}^d)$ is the space of probability measures on $\mathbb{R}^d$ with finite $p$th moment. Here, $K$ is the interaction potential defining the pairwise forces between particles, $V$ is the local potential containing all exogenous forces, $\sigma$ is the diffusivity, and $(B_t^{(i)})_{i=1,\ldots,N}$ are independent Brownian motions each adapted to the same filtered probability space $(\Omega, \mathcal{B}, P, (\mathcal{F}_t)_{t\ge 0})$. The empirical measure is defined

$$\mu_t^N \triangleq \frac{1}{N}\sum_{i=1}^N \delta_{X_t^{(i)}},$$

and the convolution $\nabla K * \mu_t^N$ is defined

$$\nabla K * \mu_t^N(x) = \int_{\mathbb{R}^d} \nabla K(x-y)\, d\mu_t^N(y) = \frac{1}{N}\sum_{i=1}^N \nabla K(x - X_t^{(i)}),$$

where we set $\nabla K(0) = 0$ whenever $\nabla K(0)$ is undefined. The recovery problem we wish to solve is the following.

(P) Let $\mathbb{X} = (X_{\mathbf{t}}^{(1)}, \ldots, X_{\mathbf{t}}^{(M)})$ be discrete-time data at $L$ timepoints $\mathbf{t} = (t_1, \ldots, t_L)$ for $M$ i.i.d. trials of the process (1.1) with $K = K^\star$, $V = V^\star$, and $\sigma = \sigma^\star$, and let $\mathbb{Y} = \mathbb{X} + \varepsilon$ be a corrupted dataset. For some fixed compact domain $D \subset \mathbb{R}^d$ containing $\operatorname{supp}(\mathbb{Y})$, and finite-dimensional hypothesis spaces $\mathcal{H}_K \subset L^2(D \ominus D)$, $\mathcal{H}_V \subset L^2(D)$, and $\mathcal{H}_\sigma \subset L^2(D)$, solve

$$(\hat{K}, \hat{V}, \hat{\sigma}) = \operatorname*{argmin}_{K \in \mathcal{H}_K,\, V \in \mathcal{H}_V,\, \sigma \in \mathcal{H}_\sigma} \|K - K^\star\|_{L^2(D \ominus D)} + \|V - V^\star\|_{L^2(D)} + \|\sigma - \sigma^\star\|_{L^2(D)}.$$

The problem (P) is clearly intractable as stated because we do not have access to $K^\star$, $V^\star$, or $\sigma^\star$, and moreover the interactions between these terms render simultaneous identification of them ill-posed. We consider two cases: (i) $\varepsilon \not\equiv 0$ and $\sigma^\star \equiv 0$, corresponding to purely extrinsic noise, and (ii) $\varepsilon \equiv 0$ and $\sigma^\star \not\equiv 0$, corresponding to purely intrinsic noise. The extrinsic noise case is important for many applications, such as cell tracking, where uncertainty is present in the position measurements. In this case we examine $\varepsilon$ representing i.i.d. Gaussian noise with mean zero and covariance $\epsilon^2 I_d$ added to each particle position in $\mathbb{X}$. In the case of purely intrinsic noise, identification of the diffusivity $\sigma^\star$ is required as well as the deterministic forces on each particle as defined by $K^\star$ and $V^\star$. A natural next step is to consider the case with both extrinsic and intrinsic noise; however, the combined noise case is sufficiently nuanced as to render it beyond the scope of the article, and we leave it for future work.

2. Background

Interacting particle systems (IPS) such as (1.1) are used to describe physical and artificial phenomena in a range of fields including astrophysics [4,5], molecular dynamics [6], cellular biology [7-9], and opinion dynamics [10]. In many cases the number of particles $N$ is large, with cell migration experiments often tracking $10^3$–$10^6$ cells and simulations in physics (molecular dynamics, particle-in-cell, etc.) requiring $N$ in the range $10^6$–$10^{12}$. Inference of such systems from particle data thus requires efficient means of computing pairwise forces from $O(N^2)$ interactions at each timestep for multiple candidate interaction potentials $K$. Frequently, so-called mean-field equations at the continuum level are sufficient to describe the evolution of the system; however, in many cases (e.g. chemotaxis in biology [11]) only phenomenological mean-field equations are available. Moreover, it is often unclear how many particles $N$ are needed for a mean-field description to suffice. Many disciplines are now developing machine learning techniques to extract coarse-grained dynamics from high-fidelity simulations (see [12] for a recent review in molecular dynamics). In this work we provide a means for inferring governing mean-field equations from particle data assumed to follow the dynamics (1.1) that is highly efficient for large $N$, and is effective in learning mean-field equations when $N$ is in the range $10^3$–$10^5$.

Inference of the drift and diffusion terms for stochastic differential equations (SDEs) is by now a mature field, with the primary method being maximum-likelihood estimation, which uses Girsanov’s theorem together with the Radon–Nikodym derivative to arrive at a log-likelihood function for regression. See [13, 14] for some early works and [15] for a textbook on this approach. More recently, sparse regression approaches using the Kramers–Moyal expansion have been developed [16-18] and the authors of [19] use sparse regression to learn population level ODEs from agent-based modeling simulations. The authors of [20] also derived a bias-correcting regression framework for inferring the drift and diffusion in underdamped Langevin dynamics, and in [21] a neural network-based algorithm for inferring SDEs was developed.

Only in the last few years have significant strides been made towards parameter inference of interacting particle systems such as (1.1) from data. Apart from some exceptions, such as a Gaussian process regression algorithm recently developed in [22], applications of maximum likelihood theory are by far the most frequently studied. An early but often overlooked work by Kasonga [23] extends the maximum-likelihood approach to inference of the interaction potential K, assuming full availability of the continuous particle trajectories and the diffusivity σ. Two decades later, Bishwal [24] further extended this approach to discrete particle observations in the specific context of linear particle interactions. In both cases, a sequence of finite-dimensional subspaces is used to approximate the interaction function, and convergence is shown as the dimension of the subspace J and number of particles N both approach infinity. More recently, the maximum likelihood approach has been carried out in [25,26] in the case of radial interactions and in [27] in the case of linear particle interactions and single-trajectory data (i.e. one instance of the particle system). The authors of [28] recently developed an online maximum likelihood method for inference of IPS, and in [29] maximum likelihood is applied to parameter estimation in an IPS for pedestrian flow. It should also be noted that parameter estimation for IPS is common in biological sciences, with the most frequently used technique being nonlinear least squares with a cost function comprised of summary statistics [7,30].

Problem (P) is made challenging by the coupled effects of $K$, $V$, and $\sigma$. In each of the previously mentioned algorithms, the assumption is made that $\sigma$ is known and/or that $K$ takes a specific form (radial or linear). In addition, the maximum likelihood-based approach approximates the differential $dX_t^{(i)}$ of particle $i$ using a first-order finite difference, $dX_t^{(i)} \approx X_{t+\Delta t}^{(i)} - X_t^{(i)}$, which is especially ill-suited to problems involving extrinsic noise in the particle positions. Our primary goal is to show that the weak-form sparse regression framework allows for identification of the full model $(K, V, \sigma)$, with significantly reduced computational complexity, when $N$ is on the order of several thousand or more. We use a two-step process: the density of particles is approximated using a density kernel $G$ and then the WSINDy algorithm (weak-form sparse identification of nonlinear dynamics) is applied in the PDE setting [2,3]. WSINDy is a modified version of the original SINDy algorithm [1,31] in which the weak formulation of the dynamics is enforced using a family of test functions, offering reduced computational complexity, high-accuracy recovery in low-noise regimes, and increased robustness to high-noise scenarios. The feasibility of this approach for IPS is grounded in the convergence of IPS to associated mean-field equations. The reduction in computational complexity follows from the reduction in evaluation of candidate potentials (as discussed in Section 4.2), as well as the convolutional nature of the weak-form algorithm.

To the best of our knowledge, we present here the first weak-form sparse regression approach for inference of interacting particle systems; we now review several related approaches that have recently been developed. In [32], the authors learn local hydrodynamic equations from active matter particle systems using the SINDy algorithm in the strong-form PDE setting. In contrast to [32], our approach learns nonlocal equations using the weak form; however, similarly to [32] we perform model selection and inference of parameters using sparse regression at the continuum level. The weak form provides an advantage because no smoothness is required on the particle density (for requisite smoothness the authors of [32] use a Gaussian kernel, which is more expensive to compute than the simple particle binning done here). The authors of [33] developed an integral formulation for inference of plasma physics models from particle-in-cell data using SINDy; however, their method involves first computing strong-form derivatives and then averaging, rather than integration by parts against test functions as done here, and, as in [32], the learned models are local. In [34], the authors apply the maximum likelihood approach in the continuum setting on the underlying nonlocal Fokker–Planck equation and learn directly the nonlocal PDE using strong-form discretizations of the dynamics. While we similarly use the continuum setting for inference (albeit in weak form), our approach differs from [34] in that it is designed for the more realistic setting of discrete-time particle data, rather than pointwise data on the particle density (assumed to be smooth in [34]).

2.1. Contributions

The purpose of the present article is to show that the weak form provides an advantage in speed and accuracy compared with existing inference methods for particle systems when the number of particles is sufficiently large (on the order of several thousand or more). The key points of this article include:

  1. Formulation of a weak-form sparse recovery algorithm for simultaneous identification of the particle interaction force K, local potential V, and diffusivity σ from discrete-time particle data.

  2. Convergence with rate $O(N^{-1/2})$ of the resulting full-rank least-squares solution as the number of particles $N \to \infty$ and the timestep $\Delta t \to 0$.

  3. Numerical illustration of (2) along with robustness to either intrinsic randomness (e.g. Brownian motion) or extrinsic randomness (e.g. additive measurement noise).

2.2. Paper organization

In Section 3 we review results from mean-field theory used to show convergence of the weak-form method. In Section 4 we introduce the WSINDy algorithm applied to interacting particles, including hyperparameter selection, computational complexity, and convergence of the method under suitable assumptions in the limit of large $N$. Section 5 contains numerical examples exhibiting the convergence rates of the previous section and examining the robustness of the algorithm to various sources of corruption, and Section 6 contains a discussion of extensions and future directions. In the Appendix we provide information on the hyperparameters used (A.1), a derivation of the homogenized equation (5.3) (A.2), results and discussion for the case of small $N$ and large $M$ in comparison with [26] (A.3), and proofs of technical lemmas (A.4). Table 1 includes a list of notations used throughout.

Table 1.

Notations used throughout.

Variable | Definition | Domain
$K$ | Pairwise interaction potential | $L^1_{\mathrm{loc}}(\mathbb{R}^d, \mathbb{R})$
$V$ | Local potential | $C(\mathbb{R}^d, \mathbb{R})$
$\sigma$ | Diffusivity | $C(\mathbb{R}^d, \mathbb{R}^{d \times d})$
$N$ | Number of particles per experiment | $\{2, 3, \ldots\}$
$d$ | Dimension of latent space | $\mathbb{N}$
$T$ | Final time | $(0, \infty)$
$(\Omega, \mathcal{B}, P, (\mathcal{F}_t)_{t \ge 0})$ | Filtered probability space |
$(B_t^{(i)})_{i=1}^N$ | Independent Brownian motions on $(\Omega, \mathcal{B}, P, (\mathcal{F}_t)_{t \ge 0})$ | $\mathbb{R}^d$
$X_t^{(i)}$ | $i$th particle in the particle system (1.1) at time $t$ | $\mathbb{R}^d$
$X_t$ | $N$-particle system (1.1) at time $t$ | $\mathbb{R}^{Nd}$
$\mu_t^N$ | Empirical measure of $X_t$ | $\mathcal{P}(\mathbb{R}^d)$
$F_t^N$ | Distribution of $X_t$ | $\mathcal{P}(\mathbb{R}^{Nd})$
$\bar{X}_t$ | Mean-field process (3.2) at time $t$ | $\mathbb{R}^d$
$\mu_t$ | Distribution of $\bar{X}_t$ | $\mathcal{P}(\mathbb{R}^d)$
$\mathbf{t}$ | $L$ discrete timepoints in $[0,T]$ | $[0,T]^L$
$\mathbb{X}$ | Collection of $M$ independent samples of $X_t$ at $\mathbf{t}$ | $\mathbb{R}^{MLNd}$
$\mathbb{Y}$ | Sample of $\mathbb{X}$ corrupted with i.i.d. additive noise | $\mathbb{R}^{MLNd}$
$U_t$ | Approximate density from particle positions | $\mathcal{P}(\mathbb{R}^d)$
$G$ | Density kernel mapping $\mu_t^N$ to $U_t$ | $L^1(\mathbb{R}^d \times \mathbb{R}^d, \mathbb{R})$
$D$ | Spatial support of $U_t$, $t \in [0,T]$ | Compact subset of $\mathbb{R}^d$
$C$ | Discretization of $D$ |
$\mathbf{U}_t$ | Discrete approximate density | $U_t(C)$
$\langle \cdot, \cdot \rangle_h$ | Semi-discrete inner product, trapezoidal rule over $C$ |
$\langle \cdot, \cdot \rangle_{h,\Delta t}$ | Fully-discrete inner product, trapezoidal rule over $C \times \mathbf{t}$ |
$\mathcal{L}_K$ | Library of candidate interaction forces |
$\mathcal{L}_V$ | Library of candidate local forces |
$\mathcal{L}_\sigma$ | Library of candidate diffusivities |
$\mathcal{L}$ | $(\mathcal{L}_K, \mathcal{L}_V, \mathcal{L}_\sigma)$ |
$\Psi$ | Set of $n$ test functions $(\psi_k)_{k=1}^n$ | $C^2(\mathbb{R}^d \times (0,T))$
$\phi_{m,p}(v;\Delta)$ | Test functions used in this work | Eq. (4.4)
$\boldsymbol{\lambda}$ | Set of sparsity thresholds |
$\mathcal{L}(\lambda)$ | Loss function for sparsity thresholds | Eq. (4.6)

3. Review of mean-field theory

Our weak-form approach utilizes the fact that under fairly general assumptions the empirical measure $\mu_t^N$ of the process $X_t$ defined in (1.1) converges weakly to $\mu_t$, the distribution of the associated mean-field process $\bar{X}_t$ defined in (3.2). Specifically, under suitable assumptions on $V$, $K$, $\sigma$, and $\mu_0$, there exists $T > 0$ such that for all $t \in [0,T]$, the mean-field limit

$$\lim_{N \to \infty} \mu_t^N = \mu_t$$

holds in the weak topology of measures, where $\mu_t$ is a weak-measure solution to the mean-field dynamics

$$\partial_t \mu_t = \nabla \cdot (\mu_t\, \nabla K * \mu_t) + \nabla \cdot (\mu_t \nabla V) + \frac{1}{2}\sum_{i,j=1}^d \partial^2_{x_i x_j}\left((\sigma\sigma^T)_{ij}\, \mu_t\right), \qquad \mu_0 \in \mathcal{P}_2(\mathbb{R}^d). \tag{3.1}$$

Eq. (3.1) describes the evolution of the distribution of the McKean–Vlasov process

$$d\bar{X}_t = -\nabla K * \mu_t(\bar{X}_t)\, dt - \nabla V(\bar{X}_t)\, dt + \sigma(\bar{X}_t)\, dB_t. \tag{3.2}$$

This implies that as $N \to \infty$, an initially correlated particle system driven by pairwise interactions becomes uncorrelated, with each particle interacting only with its mean-field distribution $\mu_t$. In particular, the following theorem summarizes several mean-field results taken from the review article [35], with proofs in [36,37].

Theorem ([35-37]). Assume that $\nabla K$ is globally Lipschitz, $V = 0$, and $\sigma(x) = \sigma = \mathrm{const}$. In addition assume that $\mu_0 \in \mathcal{P}_2(\mathbb{R}^d)$. Then for any $T > 0$ and all $t \le T$ it holds that

  1. There exists a unique solution $(\bar{X}_t, \mu_t)$ where $\bar{X}_t$ is a strong solution to (3.2) and $\mu_t$ is a weak-measure solution to (3.1).

  2. For any $\phi \in C_b^1(\mathbb{R}^d)$,
    $$\mathbb{E}\left[\left(\frac{1}{N}\sum_{i=1}^N \phi(X_t^{(i)}) - \int_{\mathbb{R}^d} \phi(x)\, d\mu_t(x)\right)^2\right] \le C\, \frac{\|\phi\|_{C^1}^2}{N} \tag{3.3}$$

    with $C$ depending on $\mathrm{Lip}(\nabla K)$ and $T$.

  3. For any $k \in \mathbb{N}$ and a.e. $t < T$, the $k$-particle marginal
    $$\rho_t^{(k),N}(x_1, \ldots, x_k) \triangleq \int_{\mathbb{R}^{d(N-k)}} F_t^N(x_1, \ldots, x_k, x_{k+1}, \ldots, x_N)\, dx_{k+1} \cdots dx_N$$

    converges weakly to $\mu_t^{\otimes k}$ as $N \to \infty$, where $F_t^N \in \mathcal{P}(\mathbb{R}^{Nd})$ is the distribution of $X_t$.

The previous result immediately extends to the case of V and σ both globally Lipschitz and has been extended to K only locally-Lipschitz in [38], K with Coulomb-type singularity at the origin in [39], and domains with boundaries in [40,41]. Analysis of the model (3.1) continues to evolve in various contexts, including analysis of equilibria [42-44] and connections to deep learning [45]. For our convergence result below we simply assume that K, V, σ and μ0 are such that (i) and (ii) from the above theorem hold.

3.1. Weak form

Despite the $O(N^{-1/2})$ convergence of the empirical measure in the previous theorem, it is unclear at what particle number $N$ the mean-field equations become a suitable framework for inference using particle data, due to the complex variance structure at any finite $N$. A key piece of the present work is to show that the weak form of the mean-field equations does indeed provide a suitable setting when $N$ is at least several thousand. Moreover, since in many cases (3.1) can only be understood in a weak sense, the weak form is the natural framework for identification. We say that $\mu_t$ is a weak solution to (3.1) if for any compactly supported $\psi \in C^2(\mathbb{R}^d \times (0,T))$ it holds that

$$\int_0^T \int_{\mathbb{R}^d} \partial_t \psi(x,t)\, d\mu_t(x)\, dt = \int_0^T \int_{\mathbb{R}^d} \left( \nabla\psi(x,t) \cdot \nabla K * \mu_t(x) + \nabla\psi(x,t) \cdot \nabla V(x) - \frac{1}{2}\mathrm{Tr}\left(\nabla^2\psi(x,t)\, \sigma(x)\sigma^T(x)\right) \right) d\mu_t(x)\, dt, \tag{3.4}$$

where $\nabla^2\psi$ denotes the Hessian of $\psi$ and $\mathrm{Tr}(A)$ is the trace of the matrix $A$. Our method requires discretizing (3.4) for all $\psi \in \Psi$, where $\Psi = (\psi_1, \ldots, \psi_n)$ is a suitable test function basis, and approximating the mean-field distribution $\mu_t$ with a density $U_t$ constructed from discrete particle data at time $t$. We then find $K$, $V$, and $\sigma$ within specified finite-dimensional function spaces.

4. Algorithm

We propose the general Algorithm 4.1 for discovery of mean-field equations from particle data. The inputs are a discrete-time sample $\mathbb{Y}$ containing $M$ experiments each with $N$ particle positions over $L$ timepoints $\mathbf{t} = (t_1, \ldots, t_L)$. The following hyperparameters are defined by the user: (i) a kernel $G$ used to map the empirical measure $\mu_t^N$ to an approximate density $U_t$, (ii) a spatial grid $C$ over which to evaluate the approximate density $\mathbf{U}_t = U_t(C)$, (iii) a library of trial functions $\mathcal{L} = \{\mathcal{L}_K, \mathcal{L}_V, \mathcal{L}_\sigma\} = \{(K_j)_{j=1}^{J_K}, (V_j)_{j=1}^{J_V}, (\sigma_j)_{j=1}^{J_\sigma}\}$, (iv) a basis of test functions $\Psi = (\psi_k)_{k=1}^n$, (v) a quadrature rule over the spatiotemporal grid $(C, \mathbf{t})$ denoted by an inner product $\langle \cdot, \cdot \rangle$, and (vi) sparsity factors $\boldsymbol{\lambda}$ for the modified sequential thresholding least-squares Algorithm 4.2 (MSTLS) reviewed below. We discuss choices of these hyperparameters in Section 4.1, computational complexity of the algorithm in Section 4.2, and convergence of the algorithm in Section 4.3. In Section 4.4 we briefly discuss gaps between theory and practice. Table 1 includes a list of notations used throughout.

Algorithm 4.1 WSINDy for identifying the mean-field Eq. (3.1) from particle data $\mathbb{Y}$:
$$(\hat{w}, \hat{\lambda}) = \mathrm{WSINDy}(\mathbb{Y}, \mathbf{t};\ G, C, \mathcal{L}, \Psi, \langle\cdot,\cdot\rangle, \boldsymbol{\lambda})$$
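Algorithm 4.1 appears as a pseudocode figure in the published article and is not reproduced here. The following Python sketch (the authors' implementation is in MATLAB) outlines its main steps in one spatial dimension under simplifying assumptions: histogram density, trapezoidal quadrature, and experiments pooled into a single density. All function and variable names are ours, not the authors' API; for brevity the final step uses plain least squares, with the MSTLS thresholding sketched separately in Section 4.1.5.

```python
import numpy as np

def wsindy_ips(Y, t, n_bins, lib_dK, lib_dV, lib_S, test_fns):
    """Sketch of Algorithm 4.1 for d = 1: learn the mean-field Eq. (3.1).

    Y        : (M, N, L) particle positions (M experiments, N particles, L times)
    lib_dK   : candidate interaction forces K_j' (callables on displacements)
    lib_dV   : candidate local forces V_j'
    lib_S    : candidate squared diffusivities sigma_j^2
    test_fns : list of tuples (psi_t, psi_x, psi_xx), each an (L, n_bins) array
    """
    # (i)-(ii): histogram density U on an equispaced grid C covering supp(Y)
    edges = np.linspace(Y.min(), Y.max(), n_bins + 1)
    C = 0.5 * (edges[:-1] + edges[1:])
    U = np.stack([np.histogram(Y[:, :, l], edges, density=True)[0]
                  for l in range(Y.shape[2])])        # (L, n_bins), pooled over M
    # (iii)-(v): assemble the weak-form linear system b ~ G w from Eq. (3.4)
    quad = lambda F: np.trapezoid(np.trapezoid(F, C, axis=1), t)  # <.,.>_{h,dt}
    b, G = [], []
    for psi_t, psi_x, psi_xx in test_fns:
        b.append(quad(psi_t * U))
        row = [quad(psi_x * U * conv(dK, U, C)) for dK in lib_dK]   # nonlocal drift
        row += [quad(psi_x * U * dV(C)) for dV in lib_dV]           # local drift
        row += [quad(-0.5 * psi_xx * S(C) * U) for S in lib_S]      # diffusion
        G.append(row)
    # (vi): sparse regression; plain least squares shown here for brevity
    return np.linalg.lstsq(np.array(G), np.array(b), rcond=None)[0]

def conv(dK, U, C):
    """Discrete convolution h * (dK * U_t)(C) at each timepoint (d = 1)."""
    n, h = len(C), C[1] - C[0]
    ker = dK(np.arange(-(n - 1), n) * h)              # dK sampled on C (-) C
    return h * np.array([np.convolve(u, ker)[n - 1:2 * n - 1] for u in U])
```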

4.1. Hyperparameter selection

4.1.1. Quadrature

We assume that the set of gridpoints $C$ in Algorithm 4.1 is chosen from some compact domain $D \subset \mathbb{R}^d$ containing $\operatorname{supp}(\mathbb{Y})$. The choice of $C$ (and $D$) must be made in conjunction with the quadrature scheme, which includes integration in time using the given timepoints $\mathbf{t}$, as well as integration in space. For completeness, the inner products in lines 10, 16, 22, and 27 of Algorithm 4.1 are defined in the continuous setting by

$$\langle f, g \rangle = \int_0^T \int_D f(x,t)\, g(x,t)\, dx\, dt,$$

and the convolution in line 10 is defined by

$$\nabla K_j * U_t(x) = \int_D \nabla K_j(x - y)\, U_t(y)\, dy.$$

In the present work we adopt the scheme used in the application of WSINDy to local PDEs [3], which includes the trapezoidal rule in space and time with test functions $\psi$ compactly supported in $D \times (0,T)$. We take $D$ to be a rectangular domain enclosing $\operatorname{supp}(\mathbb{Y})$ and $C \subset D$ to be equally spaced in order to efficiently evaluate convolution terms. In what follows we denote by $\langle \cdot, \cdot \rangle$ the continuous inner product over $D \times [0,T]$, by $\langle \cdot, \cdot \rangle_h$ the inner product evaluated using the composite trapezoidal rule in space with meshwidth $h$ and Lebesgue integration in time, and by $\langle \cdot, \cdot \rangle_{h,\Delta t}$ the trapezoidal rule in both space and time, with meshwidth $h$ in space and $\Delta t$ in time. With some abuse of notation, $f * g$ will denote the convolution of $f$ and $g$, understood to be discrete or continuous according to the context. Note also that we denote by $U$, $\mu^N$, and $\mu$ the measures over $\mathbb{R}^d \times [0,T]$ defined by $U_t \otimes \Lambda_{[0,T]}$, $\mu_t^N \otimes \Lambda_{[0,T]}$ and $\mu_t \otimes \Lambda_{[0,T]}$, respectively, where $\Lambda_{[0,T]}$ is the Lebesgue measure on $[0,T]$.
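Concretely, for $d = 1$ the fully-discrete inner product $\langle \cdot, \cdot \rangle_{h,\Delta t}$ is simply a composite trapezoidal rule in space and time; a minimal sketch (our helper name, NumPy $\ge$ 2.0 spelling of `trapezoid`):

```python
import numpy as np

def inner_h_dt(F, H, x, t):
    """<f, g>_{h, dt}: composite trapezoidal rule over the grid C x t.
    F, H : samples of f and g, arrays of shape (len(t), len(x))."""
    space = np.trapezoid(F * H, x, axis=1)   # trapezoidal rule over C (meshwidth h)
    return np.trapezoid(space, t)            # then over the timepoints t
```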

4.1.2. Density kernel

Having chosen the domain $D \subset \mathbb{R}^d$ containing the particle data $\mathbb{Y}$, let $P_h = \{B_k\}_k$ be a partition of $D$ ($\cup_k B_k = D$), with $h$ indicating the size of the atoms. For the remainder of the paper we take the $B_k$ to be hypercubes of equal side length $h$ in order to minimize computation time for integration, although this is by no means necessary. For particle positions $X_t$, we define the histogram

$$U_t = \sum_k \frac{1}{|B_k|}\, \mathbb{1}_{B_k}(x) \left( \frac{1}{N} \sum_i \mathbb{1}_{B_k}(X_t^{(i)}) \right) = \int_D G(x,y)\, d\mu_t^N(y). \tag{4.1}$$

Here the density kernel is defined

$$G(x,y) = \sum_k \frac{1}{|B_k|}\, \mathbb{1}_{B_k}(x)\, \mathbb{1}_{B_k}(y),$$

and in this setting the corresponding spatial grid $C = (c_k)_k$ is the set of center-points of the bins $B_k$, from which we define the discrete histogram data $\mathbf{U}_t = U_t(C)$. The discrete histogram $\mathbf{U}_t$ then serves as an approximation to the mean-field distribution $\mu_t$.
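For example, in any dimension the histogram (4.1) can be formed with a standard binning routine; the sketch below (ours, not the authors' MATLAB implementation) returns the discrete density together with the bin centers $C$:

```python
import numpy as np

def histogram_density(X, edges):
    """Histogram approximation (4.1) of the empirical measure at one timepoint.
    X     : (N, d) particle positions;  edges : list of d arrays of bin edges.
    Returns (U, centers), where U is normalized to integrate to one over D."""
    U, _ = np.histogramdd(X, bins=edges, density=True)
    centers = [0.5 * (e[:-1] + e[1:]) for e in edges]
    return U, centers
```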

Pointwise estimation of densities from samples of particles usually requires large numbers of particles to achieve reasonably low variance, and in general the variance grows inversely proportional to the bin width $h$. One benefit of the weak form is that integrating against a histogram $U$ does not suffer from the same increase in variance with small $h$. In particular, we have the following.

Lemma 1. Let $(Y^{(1)}, Y^{(2)}, \ldots)$ be a sequence of $\mathbb{R}^d$-valued random variables such that the empirical measure $\mu^N$ of $Y = (Y^{(1)}, \ldots, Y^{(N)})$ converges weakly to $\mu \in \mathcal{P}_p(\mathbb{R}^d)$ according to

$$\mathbb{E}\left[\left(\langle \psi, \mu^N \rangle - \langle \psi, \mu \rangle\right)^2\right] \le C \|\psi\|_{C^1}^2 N^{-1} \tag{4.2}$$

for all $\psi \in C^1(\mathbb{R}^d)$, where $C$ is a universal constant. Let $U$ be the histogram computed with kernel $G$ using (4.1) with bins of equal sidelength $h$. Then for any $\psi \in C^1(\mathbb{R}^d)$ compactly supported in $D$, we have the mean-squared error bound (for $\tilde{C}$ depending on $C$ and $d$)

$$\mathbb{E}\left[\left(\langle \psi, U \rangle_h - \langle \psi, \mu \rangle\right)^2\right] \le \tilde{C}\, \|\psi\|_{C^1}^2 \left(h^2 + N^{-1}\right).$$

Remark 1. We note that (4.2) follows immediately for $Y^{(i)} \sim \mu$ i.i.d., and also for $Y = X_t$ a solution to (1.1) at time $t$ with mean-field distribution $\mu = \mu_t$ according to (3.3) (for suitable $K$, $V$, and $\sigma$), which is the setting of the current article.

Proof of Lemma 1. First we note that by compact support of $\psi$, the trapezoidal rule can be written

$$\langle \psi, U \rangle_h = \left\langle \psi, \int_{\mathbb{R}^d} G(\cdot, y)\, d\mu^N(y) \right\rangle_h = \langle \psi_C, \mu^N \rangle = \frac{1}{N}\sum_{i=1}^N \psi_C(Y^{(i)}),$$

where the midpoint approximation $\psi_C$ of $\psi$ is given by

$$\psi_C(x) = \sum_{k=1}^K \psi(c_k)\, \mathbb{1}_{B_k}(x). \tag{4.3}$$

Hence we simply split the error and use (4.2):

$$\mathbb{E}\left[\left(\langle \psi, U \rangle_h - \langle \psi, \mu \rangle\right)^2\right] \le 2\,\mathbb{E}\left[\langle \psi_C - \psi, \mu^N \rangle^2\right] + 2\,\mathbb{E}\left[\left(\langle \psi, \mu^N \rangle - \langle \psi, \mu \rangle\right)^2\right] \le \|\psi\|_{C^1}^2 \left(d^2 h^2 + 2C N^{-1}\right). \qquad \square$$

The previous lemma in particular shows that a small bin width $h$ does not negatively impact $\langle \psi, U \rangle_h$ as an estimator of $\langle \psi, \mu \rangle$, in contrast to $U(x)$ as a pointwise estimator of $\mu(x)$. For example, if we assume that $Y$ is sampled from a $C^1$ density $\mu$, it is well known that the mean-square optimal bin width is $h = O(N^{-1/3})$ [46]. Summarizing this result, elementary computation reveals the pointwise bias

$$\mathrm{bias}(U(x)) = \mathbb{E}[U(x)] - \mu(x) = \frac{\mu(B_k)}{|B_k|} - \mu(x) = \mu(\xi) - \mu(x)$$

for some $\xi \in B_k$. Letting $L_k = \max_{x \in B_k} \|\nabla\mu(x)\|$, we have

$$\mathrm{bias}(U(x))^2 \le L_k^2\, 2^{d-1} h^2.$$

For the variance we get

$$\mathrm{Var}(U(x)) = \frac{1}{N}\, \frac{\mu(B_k)\left(1 - \mu(B_k)\right)}{|B_k|^2} = \frac{\mu(\xi)}{N}\left(1 - \mu(B_k)\right) \frac{1}{2^{d-1} h},$$

and hence a bound for the mean-squared error

$$\mathbb{E}\left[(U(x) - \mu(x))^2\right] \le L_k^2\, 2^{d-1} h^2 + \frac{\mu(\xi)}{N}\, \frac{1}{2^{d-1} h}.$$

Minimizing the bound over $h$ we find an approximately optimal bin width

$$h = \left(\frac{\mu(\xi)}{2^{3(d-1)}\, 2 L_k^2}\right)^{1/3} N^{-1/3} = O(N^{-1/3}),$$

which provides an overall pointwise root-mean-squared error of $O(N^{-1/3})$. Hence, not only does the weak form remove the inverse dependence on $h$ in the variance, but fewer particles are needed to accurately approximate integrals of the density $\mu$.
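This contrast is easy to check numerically. The following Monte Carlo sketch (ours; $\mu$ a standard Gaussian and $\psi$ a smooth test function, chosen purely for illustration) shows the weak-form error staying flat as $h \to 0$ while the pointwise error grows:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10_000
Y = rng.standard_normal(N)                      # samples from mu = N(0, 1)
psi = lambda x: np.exp(-x**2)                   # smooth, rapidly decaying test fn
xs = np.linspace(-6, 6, 4001)
mu = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)
exact = np.trapezoid(psi(xs) * mu, xs)          # reference value of <psi, mu>
for bins in [32, 128, 512, 2048]:
    U, edges = np.histogram(Y, bins=bins, range=(-6, 6), density=True)
    c, h = 0.5 * (edges[:-1] + edges[1:]), edges[1] - edges[0]
    weak_err = abs(h * np.sum(psi(c) * U) - exact)               # <psi, U>_h error
    point_err = np.max(np.abs(U - np.exp(-c**2 / 2) / np.sqrt(2 * np.pi)))
    print(f"bins={bins:5d}  weak {weak_err:.1e}  pointwise {point_err:.1e}")
# The weak-form error stays O(h^2 + N^{-1/2}); the pointwise error grows as h -> 0.
```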

4.1.3. Test function basis

For the test functions $\Psi = (\psi_k)_{1 \le k \le n}$ we use the same approach as in the PDE setting [3], namely we fix a reference test function $\psi$ and set

$$\psi_k(x,t) = \psi(x - x_k,\, t - t_k),$$

where $Q \triangleq \{(x_k, t_k)\}_{1 \le k \le n}$ is a fixed set of query points. This, together with a separable representation

$$\psi(x,t) = \phi_1(x_1) \cdots \phi_d(x_d)\, \phi_{d+1}(t),$$

enables construction of the linear system $(G, b)$ using the FFT. We choose the $\phi_j$, $1 \le j \le d+1$, of the form

$$\phi_{m,p}(v; \Delta) \triangleq \max\left(1 - \left(\frac{v}{m\Delta}\right)^2,\ 0\right)^p \tag{4.4}$$

where $m$ is the integer support parameter such that $\phi_{m,p}$ is supported on $2m+1$ points of spacing $\Delta \in \{h, \Delta t\}$, and $p \ge 1$ is the degree. For simplicity we set $\phi_j = \phi_{m_x, p_x}$ for $1 \le j \le d$ and $\phi_{d+1} = \phi_{m_t, p_t}$, so that only the numbers $m_x$, $p_x$, $m_t$, $p_t$ need to be specified.
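For reference, (4.4) and the derivatives required by (3.4) are elementary polynomials; the sketch below (our helper, assuming $p \ge 2$ as needed for the diffusion terms) samples one factor $\phi_{m,p}$ and its first two derivatives on its support:

```python
import numpy as np

def phi_mp(m, p, delta):
    """Piecewise-polynomial factor (4.4) and its first two derivatives,
    sampled on its support grid of 2m+1 points of spacing delta (p >= 2)."""
    v = np.arange(-m, m + 1) * delta
    s = v / (m * delta)                                # rescaled to [-1, 1]
    base = np.maximum(1 - s**2, 0.0)
    phi = base**p
    dphi = -2 * p * s / (m * delta) * base**(p - 1)
    d2phi = (2 * p / (m * delta)**2) * base**(p - 2) * ((2 * p - 1) * s**2 - 1)
    return phi, dphi, d2phi
```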

Since $\phi_{m,p}$ has exactly $p$ weak derivatives, $p_x$ and $p_t$ must be at least as large as the maximum spatial and temporal derivatives appearing in the library $\mathcal{L}$, i.e. $p_x \ge 2$ and $p_t \ge 1$. Larger $p$ results in higher-accuracy enforcement of the weak form (3.4) in low-noise situations (see Lemma 2 of [2] for details); however, the convergence analysis below indicates that smaller $\mathrm{Lip}(\partial^\alpha \psi)$, $|\alpha| \le 2$, may reduce variance. The support parameter $m$ determines the length and time scales of interest and must be chosen small enough to extract relevant scales yet large enough to sufficiently reduce variance.

In [3, Appendix A] the authors developed a changepoint algorithm to choose $m_x$, $m_t$, $p_x$, $p_t$ automatically from the Fourier spectrum of the data $\mathbf{U}$. Here, for each of the three examples in Section 5, we fix $\psi$ across all particle numbers $N$, extrinsic noises $\epsilon$, and intrinsic noises $\sigma$, in order to instead focus on convergence in $N$. To strike a balance between accuracy and small $\mathrm{Lip}(\partial^\alpha\psi)$ we choose $p_x = 5$ and $p_t = 3$ throughout. We used a combination of the changepoint algorithm and manual tuning to arrive at $m_x$ and $m_t$ which work well across all noise levels and numbers of particles examined. Query points $Q$ are taken to be an equally-spaced subgrid of $(C, \mathbf{t})$ with spacings $s_x$ and $s_t$ for the spatial and temporal coordinates. The resulting values $p_x$, $p_t$, $m_x$, $m_t$, $s_x$, and $s_t$ determine the weak discretization scheme and can be found in Appendix A.1 for each example below.

The results in Section 5 appear robust to the choice of $p_x$ and $p_t$. In addition, choosing $m_x$ and $m_t$ specific to each dataset using the changepoint method often improves results. Although automated in the changepoint algorithm, we recommend visualizing the overlap between the Fourier spectra of the data $\mathbf{U}$ and the test function $\psi$ when choosing $m_x$ and $m_t$, in order to directly observe which modes in the data will experience filtering under convolution with $\psi$. In general, there is much flexibility in the choice of $\psi$; optimizing $\psi$ continues to be an active area of research.

4.1.4. Trial function library

The general Algorithm 4.1 does not impose a radial structure on the interaction potential $K$, nor does it assume any prior knowledge that the particle system is in fact interacting. In the examples below, the libraries $\mathcal{L}_K$, $\mathcal{L}_V$, $\mathcal{L}_\sigma$ are composed of monomial and/or trigonometric terms to demonstrate that sparse regression is effective in selecting the correct combination of nonlocal drift, local drift, and diffusion terms. Rank deficiency can result, however, from naive choices of nonlocal and local bases. Consider the kernel $K(x) = \frac{1}{2}\|x\|^2$, which satisfies

$$\nabla K * \mu_t(x) = x - M_1(\mu_t) = \nabla V(x),$$

where $V(x) = \frac{1}{2}\|x - M_1(\mu_t)\|^2$ and $M_1(\mu_t)$ is the first moment of $\mu_t$. Since $M_1(\mu_t)$ is conserved in the model (3.2) posed in free-space, including the same power-law terms in both libraries $\mathcal{L}_K$ and $\mathcal{L}_V$ will lead to rank deficiency. This is easily avoided by incorporating known symmetries of the model (3.2); however, in general we recommend that the user builds the library $\mathcal{L}$ incrementally and monitors the condition number of $G$ while selecting terms.

4.1.5. Sparse regression

As in [3], we enforce sparsity using a modified sequential thresholding least-squares algorithm (MSTLS), included as Algorithm 4.2, where the "modifications" are two-fold. First, we incorporate into the thresholding step the magnitude of the overall term $|w_j|\,\|G_j\|_2$ as well as the coefficient magnitude $|w_j|$, by defining non-uniform lower and upper thresholds

$$L_j^\lambda = \lambda\, \max\left\{1, \frac{\|b\|_2}{\|G_j\|_2}\right\}, \qquad U_j^\lambda = \frac{1}{\lambda}\, \min\left\{1, \frac{\|b\|_2}{\|G_j\|_2}\right\}, \qquad 1 \le j \le J, \tag{4.5}$$

where $J = J_K + J_V + J_\sigma$ is the number of columns in $G$. Second, we perform a grid search over candidate sparsity parameters $\boldsymbol{\lambda}$ and choose the parameter $\hat{\lambda}$ that is the smallest minimizer over $\boldsymbol{\lambda}$ of the cost function

$$\mathcal{L}(\lambda) = \frac{\|G(w^\lambda - w^0)\|_2}{\|G w^0\|_2} + \frac{\|w^\lambda\|_0}{J}, \tag{4.6}$$

where $w^\lambda$ is the output of the sequential thresholding algorithm with non-uniform thresholds (4.5) and $w^0 = G^\dagger b$ is the least-squares solution. The final coefficient vector is then set to $\hat{w} = w^{\hat{\lambda}}$.

We now review some aspects of Algorithm 4.2. Results from [47] on the convergence of STLS carry over to the inner loop of Algorithm 4.2: namely, if $G$ is full-rank, the inner loop terminates in at most $J$ iterations with a resulting coefficient vector $w^\lambda$ that is a local minimizer of the cost function $F(w) = \|Gw - b\|_2^2 + \lambda^2 \|w\|_0$. This implies that the full algorithm terminates in at most $mJ$ least-squares solves (each on a subset of columns of $G$), where $m$ is the number of candidate sparsity parameters in $\boldsymbol{\lambda}$.

When considering recovery of the true weight vector $w^\star$, Theorem 1 implies convergence in particle number $N$ of $\hat{w}$ to $w^\star$ when $G$ is full-rank. The rate of convergence depends implicitly on the condition number of $G$; hence it is recommended that one builds the library $\mathcal{L}$ incrementally, stopping before the condition number $\kappa(G)$ grows too large. If $G$ is rank deficient, classical recovery guarantees from compressive sensing do not necessarily apply, due to high correlations between the columns of $G$ (recall that each column is constructed from the same dataset $\mathbf{U}$). One may employ additional regularization (e.g. Tikhonov regularization as in [31]); however, in general, improving existing sparse regression algorithms for rank-deficient, noisy, and highly-correlated matrices is an active area of research.

Algorithm 4.2 Modified sequential thresholding with automatic threshold selection

The bounds (4.5) enforce a quasi-dominant balance rule, such that $|w_j|\,\|G_j\|_2$ is within $\log_{10}(\lambda)$ orders of magnitude of $\|b\|_2$ and $|w_j|$ is within $\log_{10}(\lambda)$ orders of magnitude of 1 (the coefficient of the time derivative $\partial_t \mu_t$). This is specifically designed to handle poorly-scaled data (see the Burgers and Korteweg–de Vries examples in [3]); however, we leave a more thorough examination of the thresholding requirements necessary for models with multiple scales to future work.

As the sum of two relative errors, minimizers of the cost function $\mathcal{L}$ equally weight the accuracy and sparsity of $w^\lambda$. By choosing $\hat{\lambda}$ to be the smallest minimizer of $\mathcal{L}$ over $\boldsymbol{\lambda}$, we identify the thresholds $\lambda \in \boldsymbol{\lambda}$ such that $\lambda < \hat{\lambda}$ as those resulting in an overfit model. We commonly choose $\boldsymbol{\lambda}$ to be log-equally spaced (e.g. 50 points from $10^{-4}$ to 1), and, starting from a coarse grid, refine $\boldsymbol{\lambda}$ until the minimum of $\mathcal{L}$ is stationary.
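A compact sketch of Algorithm 4.2 under the bounds (4.5) and loss (4.6) is given below (our variable names; the published pseudocode figure is not reproduced here). Iterating over an ascending grid such as `np.logspace(-4, 0, 50)` and keeping only strict improvements selects the smallest minimizer $\hat{\lambda}$:

```python
import numpy as np

def mstls(G, b, lambdas, max_iter=None):
    """Modified sequential thresholding least squares (Algorithm 4.2, sketch).
    Returns (w_hat, lam_hat) at the smallest minimizer of the loss (4.6)."""
    J = G.shape[1]
    max_iter = max_iter or J
    Gnorm = np.linalg.norm(G, axis=0)                    # ||G_j||_2
    bnorm = np.linalg.norm(b)
    w_ls = np.linalg.lstsq(G, b, rcond=None)[0]          # w^0, least squares
    best = (np.inf, None, None)
    for lam in sorted(lambdas):
        L_j = lam * np.maximum(1.0, bnorm / Gnorm)       # lower bounds (4.5)
        U_j = (1.0 / lam) * np.minimum(1.0, bnorm / Gnorm)  # upper bounds (4.5)
        w = w_ls.copy()
        for _ in range(max_iter):
            keep = (np.abs(w) >= L_j) & (np.abs(w) <= U_j)
            w_new = np.zeros(J)
            if keep.any():
                w_new[keep] = np.linalg.lstsq(G[:, keep], b, rcond=None)[0]
            if np.array_equal(w_new != 0, w != 0):       # support stabilized
                w = w_new
                break
            w = w_new
        loss = (np.linalg.norm(G @ (w - w_ls)) / np.linalg.norm(G @ w_ls)
                + np.count_nonzero(w) / J)               # loss (4.6)
        if loss < best[0]:                               # strict "<": smallest lam
            best = (loss, w, lam)
    return best[1], best[2]
```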

4.2. Computational complexity

To compute convolutions against $K$ for each $K \in \mathcal{L}_K$, we first evaluate $(\partial_{x_i} K)_{1 \le i \le d}$ at the grid $C \ominus C$ defined by

$$C \ominus C \triangleq \left\{x \in \mathbb{R}^d : x = (i_1 h, \ldots, i_d h),\ -n_\ell \le i_\ell \le n_\ell\right\},$$

where $h$ is the spacing of $C$ and $n_\ell$, $1 \le \ell \le d$, is the number of points in $C$ along the $\ell$th coordinate. Computing $\partial_{x_i}\mathbf{K} \triangleq \partial_{x_i} K(C \ominus C)$ requires $2^d |C|$ evaluations of $\partial_{x_i} K$, where $|C| = \prod_{\ell=1}^d n_\ell$ is the number of points in $C$. We then use the $d$-dimensional FFT to compute the convolutions

$$\partial_{x_i} K * U_t \approx \partial_{x_i}\mathbf{K} * \mathbf{U}_t(C), \qquad t \in \mathbf{t},$$

where only entries corresponding to particle interactions within $C$ are retained. For $d = 1$ this amounts to $O(|C| \log |C|)$ flops per timestep. For $d = 2$ and higher dimensions, the $d$-dimensional FFT is considerably slower unless one of the arrays is separable. To enforce separability, trial interaction potentials in $\mathcal{L}_K$ can be chosen to be a sum of separable functions,

$$K(x) = \sum_{q=1}^Q k_{1,q}(x_1) \cdots k_{d,q}(x_d), \tag{4.7}$$

in which case only a series of one-dimensional FFTs is needed to compute $\partial_{x_i} K * U_t$, and again the cost is $O(|C| \log |C|)$ per timestep. When $K$ is not separable, a low-rank approximation can be computed from $\partial_{x_i}\mathbf{K}$,

$$\partial_{x_i}\mathbf{K} \approx \sum_{q=1}^Q \sigma_q\, k_{1,q} \otimes \cdots \otimes k_{d,q}, \tag{4.8}$$

which again reduces convolutions to a series of one-dimensional FFTs. For d=2, this is accomplished using the truncated SVD, while for higher dimensions there does not exist a unique best rank-Q tensor approximation, although several efficient algorithms are available to compute a sufficiently accurate decomposition [49-51] (and the field of fast tensor decompositions is advancing rapidly).

We propose to compute convolutions by first computing a low-rank decomposition of xiK using the randomized truncated SVD [52] or a suitable randomized tensor decomposition and then applying the d-dimensional FFT as a series of one-dimensional FFTs. In the examples below we consider only d=1 and d=2, and leave extension to higher dimensions to future work.
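For $d = 2$, this strategy can be sketched as follows (our code, using SciPy's `fftconvolve`; the truncated SVD supplies the rank-$Q$ decomposition (4.8), and each term reduces to one-dimensional FFT passes along each axis):

```python
import numpy as np
from scipy.signal import fftconvolve

def conv_lowrank(dK, U, Q):
    """Approximate (dK * U_t)(C) for d = 2 via a rank-Q truncated SVD of dK,
    as in Eq. (4.8).  dK : samples of a partial derivative of K on the
    difference grid C (-) C;  U : (n1, n2) histogram on C.  Multiply the
    result by the quadrature weight h1*h2 to approximate the integral."""
    W, s, Vt = np.linalg.svd(dK, full_matrices=False)
    out = np.zeros_like(U, dtype=float)
    for q in range(Q):
        k1, k2 = s[q] * W[:, q], Vt[q, :]            # dK ~ sum_q s_q k1_q (x) k2_q
        tmp = fftconvolve(U, k1[:, None], mode='same', axes=0)    # 1D pass in x1
        out += fftconvolve(tmp, k2[None, :], mode='same', axes=1)  # 1D pass in x2
    return out
```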

Using low-rank approximations, the mean-field approach provides a significant reduction in computational complexity compared to direct evaluations of particle trajectories when N is sufficiently large. A particle-level computation of the nonlocal force in weak-form requires evaluating terms of the form

$$\sum_{\ell=1}^L \left( \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N \nabla_x \psi(X_{t_\ell}^{(i)}, t_\ell) \cdot \nabla_x K(X_{t_\ell}^{(i)} - X_{t_\ell}^{(j)}) \right) \Delta t = \left\langle \nabla_x \psi,\ \mu^N \left(\nabla_x K * \mu^N\right) \right\rangle_{h,\Delta t}.$$

For a single candidate interaction potential $K$, a collection of $J$ test functions $\psi$, and $M$ experiments, this amounts to $MLN^2 + MLNJ$ function evaluations in $\mathbb{R}^d$ and $O(MLN^2 J)$ flops. If we use the proposed method, employing the convolutional weak form with a separable reference test function $\psi$ (as in WSINDy for PDEs [3]) and exploiting a rank-$Q$ approximation of $\nabla_x K$ when computing convolutions against interaction potentials, we instead evaluate

$$\left\langle \nabla_x \psi,\ \mathbf{U} \left(\nabla_x \mathbf{K} * \mathbf{U}\right) \right\rangle_{h,\Delta t}$$

using $O(LQ|C| \log |C|)$ flops and only $2^d |C|$ evaluations of $\nabla_x K$, reused at each of the $L$ timepoints. Fig. 1 provides a visualization of the reduction in function evaluations for $L = 100$ timepoints and $M = 10$ experiments over a range of $N$ and $|C|^{1/d}$ (points along each spatial dimension when $C$ is a hypercube) in $d = 2$ and $d = 3$ spatial dimensions. Table 5 in Appendix A.1 lists walltimes for the examples below, showing that with $N = 64{,}000$ particles the full algorithm implemented in MATLAB runs in under 10 s, with all computations in serial on a laptop with an AMD Ryzen 7 pro 4750u processor, and requiring less than 8 GB of RAM. The dependence on $N$ is only through the $O(N)$ computation of the histogram, hence this approach may find applications in physical coarse-graining (e.g. of molecular dynamics or plasma simulations).

Fig. 1.


Factor by which the mean-field evaluation of interaction forces using histograms reduces total function evaluations, as a function of particle number $N$ and average gridpoints per dimension $|C|^{1/d}$, for data with $M = 10$ experiments each with $L = 100$ timepoints. For example, with $d = 2$ spatial dimensions (left) and $N > 2000$ particles, the number of function evaluations is reduced by at least a factor of $10^4$.

4.3. Convergence

We now show that the estimators $\hat{K}$, $\hat{V}$, and $\hat{\sigma}$ of the weak-form method converge with a rate $O(h + N^{-1/2} + \Delta t^\eta)$ when ordinary least squares is used (i.e. $\lambda = 0$) and only $M = 1$ experiment is available. Here $\eta > 0$ is the Hölder exponent of the sample paths of the process $X_t$. We assume that $D$, $C$, $G$, $P_h$ and the resulting histogram $U = (U_t)_{t \le T}$ are as in Section 4.1.2. We make the following assumptions on the true model and resulting linear system throughout this section.

Assumption H. Let $p \ge 1$ be fixed.

  • (H.1) For each $N \ge 2$, $X_t = (X_t^{(1)}, \ldots, X_t^{(N)})$ is a strong solution to (1.1) for $t \in [0,T]$, and for some $\eta > 0$ the sample paths $t \mapsto X_t^{(i)}(\omega)$ are almost-surely $\eta$-Hölder continuous, i.e. for some $C_\eta > 0$,
    $$\|X_t^{(i)}(\omega) - X_s^{(i)}(\omega)\| \le C_\eta |t - s|^\eta, \quad 0 \le s \le t \le T, \quad 1 \le i \le N, \quad \text{for a.e. } \omega \in \Omega.$$
  • (H.2) The initial particle distribution $\mu_0$ satisfies the moment bound
    $$\int_{\mathbb{R}^d} \|x\|^p\, d\mu_0(x) \le M_p < \infty.$$
  • (H.3) $\nabla K$ and $\nabla V$ satisfy for some $C_p > 0$ the growth bound
    $$\|\nabla V(x) - \nabla V(y)\| + \|\nabla K(x) - \nabla K(y)\| \le C_p \|x - y\| \left(1 + \max\{\|x\|, \|y\|\}^{p-1}\right), \quad x, y \in \mathbb{R}^d.$$
  • (H.4) For the same constant $C_p > 0$, it holds that
    $$\|\sigma(x) - \sigma(y)\|_F \le C_p \|x - y\|^{1/2} \left(1 + \max\{\|x\|, \|y\|\}^{\frac{p}{2} - \frac{1}{2}}\right), \quad x, y \in \mathbb{R}^d.$$
  • (H.5) The test functions $(\psi_k)_{1 \le k \le n} \subset C^2(\mathbb{R}^d \times (0,T))$ are compactly supported and, together with the library $\mathcal{L}$, are such that $G$ has full column rank with $\|G^\dagger\|_1 \le C_G$ almost surely for some constant $C_G > 0$.

  • (H.6) The true functions $K^\star$, $V^\star$, and $\sigma^\star$ are in the span of $\mathcal{L}$.

We will now define some notation and state some technical lemmas, with proofs found in Appendix A.4. Define the weak-form operator

$$\mathcal{L}(\rho, \psi, \langle \cdot, \cdot \rangle) \triangleq \left\langle \partial_t \psi - \nabla\psi \cdot \nabla K^\star * \rho - \nabla\psi \cdot \nabla V^\star + \frac{1}{2}\mathrm{Tr}\left(\nabla^2\psi\, \sigma^\star (\sigma^\star)^T\right),\ \rho \right\rangle, \tag{4.9}$$

where $\rho = (\rho_t)_{t \le T}$ is a curve in $\mathcal{P}_p(\mathbb{R}^d)$, $\psi$ is a $C^2$ function compactly supported over $\mathbb{R}^d \times (0,T)$, and $\langle \cdot, \cdot \rangle$ is an inner product over $\mathbb{R}^d \times (0,T)$. If $\rho = (\mu_t)_{t \le T}$ is a weak solution to (3.1) and $\langle \cdot, \cdot \rangle$ is the $L^2$ inner product, then $\mathcal{L}(\rho, \psi, \langle \cdot, \cdot \rangle) = 0$. If instead $\rho = (\mu_t^N)_{t \le T}$, then by Itô's formula $\mathcal{L}(\rho, \psi, \langle \cdot, \cdot \rangle)$ takes the form of an Itô integral, and we have the following:

Lemma 2. Under Assumptions (H.1)-(H.5), there exists a constant $C > 0$ independent of $N$ such that

$$\mathbb{E}\left[\left|\mathcal{L}(\mu^N, \psi, \langle \cdot, \cdot \rangle)\right|\right] \le \frac{C}{\sqrt{N}}.$$

Proof. See Appendix A.4.

With the following lemma, we can relate the histogram $U$ to the empirical measure $\mu^N$ through $\mathcal{L}$ using the inner product $\langle \cdot, \cdot \rangle_h$ defined by trapezoidal-rule integration in space and continuous integration in time.

Lemma 3. Under Assumptions (H.1)-(H.5), for $C$ independent of $N$ and $h$, it holds that

$$\mathbb{E}\left[\left|\mathcal{L}(U, \psi, \langle \cdot, \cdot \rangle_h) - \mathcal{L}(\mu^N, \psi, \langle \cdot, \cdot \rangle)\right|\right] \le C h.$$

Proof. See Appendix A.4.

To incorporate discrete-time effects, we consider the difference between $\mathcal{L}(U, \psi, \langle \cdot, \cdot \rangle_h)$ and $\mathcal{L}(U, \psi, \langle \cdot, \cdot \rangle_{h,\Delta t})$, where we recall that $\langle \cdot, \cdot \rangle_{h,\Delta t}$ denotes trapezoidal-rule integration in space with meshwidth $h$ and in time with sampling rate $\Delta t$.

Lemma 4. Under Assumptions (H.1)-(H.5), for $C$ independent of $N$, $h$, and $\Delta t$, it holds that

$$\mathbb{E}\left[\left|\mathcal{L}(U, \psi, \langle \cdot, \cdot \rangle_h) - \mathcal{L}(U, \psi, \langle \cdot, \cdot \rangle_{h,\Delta t})\right|\right] \le C\left(h + \Delta t^\eta\right).$$

Proof. See Appendix A.4.

The previous estimates directly lead to the following bound on the model coefficients $\hat{w}$:

Theorem 1. Assume that Assumption H holds. Let $\hat{w}$ be the learned model coefficients and $w^\star$ the true model coefficients. For $C$ independent of $N$, $h$, and $\Delta t$ it holds that

$$\mathbb{E}\left[\|\hat{w} - w^\star\|_1\right] \le C\left(h + N^{-1/2} + \Delta t^\eta\right).$$

Proof. Using that $K^\star$, $V^\star$, and $\sigma^\star$ are in the span of $\mathcal{L}$ (H.6), we have that

$$b_k = \langle \partial_t \psi_k, U \rangle_{h,\Delta t} = \mathcal{L}(U, \psi_k, \langle \cdot, \cdot \rangle_{h,\Delta t}) + G_k^T w^\star \eqqcolon \mathcal{L}_k + G_k^T w^\star,$$

where $G_k^T$ is the $k$th row of $G$. From Lemmas 2-4 we have

$$\mathbb{E}[|\mathcal{L}_k|] \le \mathbb{E}\left[\left|\mathcal{L}(U, \psi_k, \langle \cdot, \cdot \rangle_{h,\Delta t}) - \mathcal{L}(U, \psi_k, \langle \cdot, \cdot \rangle_h)\right|\right] + \mathbb{E}\left[\left|\mathcal{L}(U, \psi_k, \langle \cdot, \cdot \rangle_h) - \mathcal{L}(\mu^N, \psi_k, \langle \cdot, \cdot \rangle)\right|\right] + \mathbb{E}\left[\left|\mathcal{L}(\mu^N, \psi_k, \langle \cdot, \cdot \rangle)\right|\right] \le C\left(h + N^{-1/2} + \Delta t^\eta\right).$$

Using that $G$ is full rank, it holds that $\hat{w} = G^\dagger b = G^\dagger \mathcal{L} + w^\star$, where $\mathcal{L} \triangleq (\mathcal{L}_1, \ldots, \mathcal{L}_n)^T$; hence the result follows from the uniform bound on $\|G^\dagger\|_1$ (H.5):

$$\mathbb{E}\left[\|\hat{w} - w^\star\|_1\right] \le \mathbb{E}\left[\|G^\dagger\|_1 \|\mathcal{L}\|_1\right] \le C\, C_G\left(h + N^{-1/2} + \Delta t^\eta\right). \qquad \square$$

Under the assumption (H.6), an immediate corollary is

$$\mathbb{E}\left[\|K^\star - \hat{K}\|_{L^2(D \ominus D)} + \|V^\star - \hat{V}\|_{L^2(D)} + \left\|\,\|\sigma^\star (\sigma^\star)^T - \hat{\sigma}\hat{\sigma}^T\|_F\,\right\|_{L^2(D)}\right] \le C\left(h + N^{-1/2} + \Delta t^\eta\right). \tag{4.10}$$

This follows from

$$\|K^\star - \hat{K}\|_{L^2(D \ominus D)} \le \sum_{j=1}^{J} |w_j^\star - \hat{w}_j|\, \|K_j\|_{L^2(D \ominus D)} \le \left(\sup_j \|K_j\|_{L^2(D \ominus D)}\right) \|w^\star - \hat{w}\|_1,$$

and similarly for $\hat{V}$ and $\hat{\sigma}$. Finally, setting $h = N^{-\alpha}$ for $\alpha > 0$ will ensure convergence as $N \to \infty$ and $\Delta t \to 0$.

4.4. Theory vs. Practice

We now make several remarks about the practical performance of Algorithm 4.1 with respect to the theoretical convergence of Theorem 1.

Remark 2. An important case of Theorem 1 is $\sigma^\star = 0$, in which case $\mu_t^N$ itself is a weak-measure solution to the mean-field Eq. (3.1) and the algorithm returns, for $\eta$-Hölder continuous sample paths, $\mathbb{E}[\|\hat{w} - w^\star\|_1] \le C(h + \Delta t^\eta)$. This partially explains the accuracy observed for purely-extrinsic noise examples in Figs. 5 and 9. We note further that in the absence of noise ($\varepsilon \equiv 0$ and $\sigma \equiv 0$, not included in this work) Algorithm 4.1 recovers systems to high accuracy, similarly to WSINDy applied to local dynamical systems [2,3].

Fig. 5.


Recovery of (3.1) in one spatial dimension for K=KQANR and σ=0 under different levels of observational noise ϵ. Left: relative error in learned interaction kernel K^. Middle: true positivity ratio for full model (3.1). Right: true positivity ratio for drift term.

Fig. 9.


Recovery of (3.1) in two spatial dimensions with K given by (5.5) from deterministic particles (σ=0) with extrinsic noise ϵ.

Remark 3. Algorithm 4.1 in general implements sparse regression, yet Theorem 1 deals with ordinary least squares. Since least squares is a common subroutine of many sparse regression algorithms (including the MSTLS algorithm used here), the result is still relevant to sparse regression. Lastly, the full-rank assumption on $G$ implies that as $N \to \infty$ sequential thresholding reduces to least squares.

Remark 4. Theorem 1 assumes data from a single experiment ($M = 1$), while the examples below show that $M > 1$ experiments improve results. For any fixed $M > 1$, the $N \to \infty$ limit results in convergence; however, the $N$-fixed, $M \to \infty$ limit does not result in convergence, as this does not lead to the mean-field equations. The examples below show that using $M > 1$ has a practical advantage, and in Appendix A.3 we demonstrate that even for small particle systems ($N = 10$) the large-$M$ regime yields satisfactory results.

Remark 5. Many interesting examples have non-Lipschitz $\nabla K$, in particular a lack of smoothness at $x = 0$. If $\mu_t^N$ does not converge to a singular measure as $N \to \infty$, then the bound (A.4) holds for $\nabla K$ with a jump discontinuity at $x = 0$, where an additional $O(h)$ term arises from pairwise interactions within an $O(h)$ distance. The examples below are chosen in part to show that $O(N^{-1/2})$ convergence holds for $\nabla K$ with jumps at the origin.

5. Examples

We now demonstrate the successful identification of several particle systems in one and two spatial dimensions, as well as the $O(N^{-1/2})$ convergence predicted in Theorem 1. In each case we use Algorithm 4.1 to discover a mean-field equation of the form (3.1) from discrete-time particle data. For each dataset we simulate the associated interacting particle system $X_t$ given by (1.1) using the Euler–Maruyama scheme (initial conditions and timestep are given in each example). We assess the ability of WSINDy to select the correct model using the true positivity ratio

$$\mathrm{TPR}(\hat{w}) = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN} + \mathrm{FP}}, \tag{5.1}$$

where TP is the number of correctly identified nonzero coefficients, FN is the number of coefficients falsely identified as zero, and FP is the number of coefficients falsely identified as nonzero [53]. To demonstrate the $O(N^{-1/2})$ convergence given by (4.10), for correctly identified models (i.e. $\mathrm{TPR}(\hat{w}) = 1$) we compute the relative $\ell_2$-errors of the recovered interaction force $\nabla\hat{K}$, local force $\nabla\hat{V}$, and diffusivity $\hat{\sigma}$ over $C \ominus C$ and $C$, respectively, displayed in the plots below. Results are averaged over 100 trials.
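The TPR (5.1) is computed directly from the supports of the learned and true coefficient vectors; a small helper (ours):

```python
import numpy as np

def tpr(w_hat, w_true):
    """True positivity ratio (5.1), comparing learned vs. true supports."""
    s_hat, s_true = w_hat != 0, w_true != 0
    TP = np.sum(s_hat & s_true)     # correctly identified nonzeros
    FN = np.sum(~s_hat & s_true)    # nonzeros missed
    FP = np.sum(s_hat & ~s_true)    # spurious nonzeros
    return TP / (TP + FN + FP)
```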

For the computational grid $C$ we first compute the sample standard deviation $s$ of $\mathbb{Y}$ and choose $D$ to be the rectangular domain extending $3s$ from the mean of $\mathbb{Y}$ in each spatial dimension. We then set $C$ to have 128 points in $x$ and $y$ for $d = 2$ dimensions, and 256 points in $x$ for $d = 1$, noting that these numbers are fairly arbitrary and used to show that the grid need not be too large. We set the sparsity factors $\boldsymbol{\lambda}$ so that $\log_{10}(\boldsymbol{\lambda})$ contains 100 equally spaced points from $-4$ to $0$. More information on the specifications of each example can be found in Appendix A.1. (MATLAB code used to generate examples is available at https://github.com/MathBioCU/WSINDy_IPS.)

5.1. Two-dimensional local model and homogenization

The first system we examine is a local model ($K(x,y) = 0$) defined by the local potential $V(x,y) = x + y$ and diffusivity $\sigma(x,y) = \sqrt{2(1 + 0.95\cos(\omega x)\cos(\omega y))}\, I_2$, where $I_2$ is the identity in $\mathbb{R}^2$. This results in a constant-advection, variable-diffusivity mean-field model

$$\partial_t \mu_t = \partial_x \mu_t + \partial_y \mu_t + \Delta\left[(1 + 0.95\cos(\omega x)\cos(\omega y))\, \mu_t\right]. \tag{5.2}$$

The purpose of this example is three-fold. First, we are interested in the ability of Algorithm 4.1 to correctly identify a local model from a library containing both local and nonlocal terms. Next, we evaluate whether the $O(N^{-1/2})$ convergence is realized. Lastly, we investigate whether for large $\omega$ the weak-form method identifies the associated homogenized equation (derived in Appendix A.2)

$$\partial_t \mu_t = \partial_x \mu_t + \partial_y \mu_t + \bar{\omega}\, \Delta \mu_t, \tag{5.3}$$

where $\bar{\omega}$ is given by the harmonic mean of the diffusivity:

$$\bar{\omega} = \left(\frac{1}{|D|}\int_D \frac{dx\, dy}{1 + 0.95\cos(x)\cos(y)}\right)^{-1}.$$
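For reference, $\bar{\omega}$ is easily evaluated by numerical quadrature; a SciPy sketch over one exact periodic cell (ours; the paper computes the average over the data domain $D$ with MATLAB's integral2, which yields the value $\bar{\omega} \approx 0.62$ reported below):

```python
import numpy as np
from scipy.integrate import dblquad

# Harmonic mean of a(x, y) = 1 + 0.95 cos(x) cos(y) over one cell [0, 2*pi]^2.
val, _ = dblquad(lambda y, x: 1.0 / (1.0 + 0.95 * np.cos(x) * np.cos(y)),
                 0.0, 2.0 * np.pi, 0.0, 2.0 * np.pi)
omega_bar = 1.0 / (val / (4.0 * np.pi**2))
print(omega_bar)  # ~0.61 over an exact cell; ~0.62 when averaged over D
```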

For $\omega \in \{1, 20\}$ we evolve the particles from an initial Gaussian distribution with mean zero and covariance $I_2$ and record particle positions for 100 timesteps with $\Delta t = 0.02$ (subsampled from a simulation with timestep $10^{-4}$). We use a rectangular domain $D$ of approximate sidelength 10 and compute histograms with 128 bins in $x$ and $y$ for a spatial resolution of $\Delta x \approx 0.078$ (see Fig. 2 for solution snapshots), over which $\bar{\omega} \approx 0.62$. For $\omega = 1$ we compare recovered equations with the full model (5.2), while for $\omega = 20$ we compare with (5.3), computing $\bar{\omega}$ over each domain $D$ using MATLAB's integral2. Fig. 3 shows that as the particle number increases, we do in fact recover the desired equations, with $\mathrm{TPR}(\hat{w})$ approaching one as $N$ increases. For $\omega = 1$ we observe $O(N^{-1/2})$ convergence of the local potential $\hat{V}$ and the diffusivity $\hat{\sigma}$. For $\omega = 20$, we observe approximate $O(N^{-1/2})$ convergence of $\hat{V}$, and $\hat{\sigma}$ converging to within 2% of $\sqrt{2\bar{\omega}}$, the homogenized diffusivity (higher accuracy can hardly be expected for $\omega = 20$ since (5.3) is itself an approximation in the limit of infinite $\omega$).

Fig. 2.


Snapshots at times $t = 0.06$ (left) and $t = 100\Delta t = 2$ (right) of histograms computed with 128 bins in $x$ and $y$ from 16,384 particles evolving under (5.2) with $\omega = 1$ (top) and $\omega = 20$ (bottom).

Fig. 3.


Convergence of $\hat{\sigma}$ (left) and $\hat{V}$ (middle) in the relative $\ell_2$ norm for (5.2) with $\omega \in \{1, 20\}$, as well as $\mathrm{TPR}(\hat{w})$ (right). For $\omega = 1$, results are compared to the exact model (5.2), while for $\omega = 20$ results are compared to the homogenized equation (5.3).

5.2. One-dimensional nonlocal model

We simulate the evolution of particle systems under the quadratic attraction/Newtonian repulsion potential

$$K_{QANR}(x) = \frac{1}{2}x^2 - |x| \tag{5.4}$$

with no external potential ($V = 0$). The $|x|$ portion of $K_{QANR}$, leading to a discontinuity in $\nabla K$, is the one-dimensional free-space Green's function for $\Delta$. For $d > 1$, when $|x|$ is replaced by the corresponding Green's function in $d$ dimensions, the distribution of particles evolves under $K_{QANR}$ into the characteristic function of the unit ball in $\mathbb{R}^d$, which has implications for design and control of autonomous systems [54]. We compare three diffusivity profiles: $\sigma(x) = 0$, corresponding to zero intrinsic noise; $\sigma(x) = \sqrt{2(0.1)}$, leading to constant-diffusivity intrinsic noise; and $\sigma(x) = \sqrt{2(0.1)x^2}$, leading to variable-diffusivity intrinsic noise. With zero intrinsic noise ($\sigma(x) = 0$), we examine the effect of extrinsic noise on recovery, and assume uncertainty in the particle positions due to measurement noise at each timestep, $\mathbb{Y} = \mathbb{X} + \varepsilon$, for $\varepsilon \sim \mathcal{N}(0, \epsilon^2 \|\mathbb{X}\|_{\mathrm{RMS}}^2)$ i.i.d. and $\epsilon \in \{0.01, 0.0316, 0.1, 0.316\}$. In this way $\epsilon$ is the noise ratio, such that $\|\varepsilon\|_F / \|\mathbb{X}\|_F \approx \epsilon$ (computed with $\varepsilon$ and $\mathbb{X}$ stretched into column vectors).
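A sketch of the data-generating simulation for this example (our Python translation of Euler–Maruyama applied to (1.1) with $K = K_{QANR}$ and $V = 0$; the Gaussian-mixture component means are illustrative placeholders, since the text specifies only the component standard deviation 0.005):

```python
import numpy as np

def simulate_qanr(N, L, dt, sigma_fn, rng):
    """Euler-Maruyama simulation of (1.1) with K = K_QANR (5.4), V = 0, d = 1.
    grad K(x) = x - sign(x); np.sign(0) = 0 encodes the convention grad K(0) = 0.
    Returns X of shape (L, N). The text records every 10th step of a dt = 0.001
    simulation to obtain data at resolution 0.01."""
    means = rng.choice([-0.5, 0.0, 0.5], size=N)       # illustrative mixture means
    X = np.empty((L, N))
    X[0] = means + 0.005 * rng.standard_normal(N)
    for l in range(L - 1):
        diff = X[l][:, None] - X[l][None, :]           # pairwise x_i - x_j
        drift = -(diff - np.sign(diff)).mean(axis=1)   # -grad K * mu^N at each x_i
        noise = sigma_fn(X[l]) * np.sqrt(dt) * rng.standard_normal(N)
        X[l + 1] = X[l] + dt * drift + noise
    return X

rng = np.random.default_rng(0)
X = simulate_qanr(N=1000, L=100, dt=0.001,
                  sigma_fn=lambda x: np.sqrt(0.2) * np.ones_like(x), rng=rng)
```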

Measurement data consists of 100 timesteps at resolution $\Delta t = 0.01$, coarsened from simulations with timestep 0.001. Initial particle positions are drawn from a mixture of three Gaussians, each with standard deviation 0.005. Histograms are constructed with 256 bins of width $h = 0.0234$. Typical histograms for each noise level are shown in Fig. 4, computed from one experiment with $N = 8000$ particles.

Fig. 4.


Histograms computed with 256 bins of width $h = 0.0234$ from 8000 particles in 1D evolving under $K = K_{QANR}$ (5.4). Top, left to right: $\sigma = 0$, $\sigma(x) = \sqrt{2(0.1)}$, $\sigma(x) = \sqrt{2(0.1)x^2}$. Bottom: deterministic particles with i.i.d. Gaussian noise added to particle positions with resulting noise ratios (left to right) $\epsilon = 0.0316$, 0.1, 0.316.

For the case of extrinsic noise (Fig. 5), we use only one experiment ($M = 1$) and vary the number of particles $N$ and the noise ratio $\epsilon$. We find that recovery is accurate and reliable for $\epsilon \le 0.1$, yielding correct identification of $K_{QANR}$ with less than 1% relative error in at least 98/100 trials. Increasing $N$ from 500 to 8000 leads to minor improvements in accuracy for $\epsilon \le 0.1$, but otherwise has little effect, implying that for low to moderate noise levels the mean-field equations are readily identifiable even from smaller particle systems. For $\epsilon = 10^{-1/2} \approx 0.3162$ (see Fig. 4 (bottom right) for an example histogram), we observe a decrease in $\mathrm{TPR}(\hat{w})$ (Fig. 5, middle panel) resulting from the generic identification of a linear diffusion term $\nu \partial_{xx} u$ with $\nu \approx 0.05$. Using that $\sqrt{2\nu} = \sqrt{2(0.05)} \approx \epsilon$, we can identify this as the best-fit intrinsic noise model. Furthermore, increases in $N$ lead to reliable identification of the drift term, as measured by $\mathrm{TPR}(\hat{w}_{\mathrm{drift}})$ (rightmost panel of Fig. 5), which is the restriction of TPR to the drift terms $\mathcal{L}_K$ and $\mathcal{L}_V$.

For constant diffusivity $\sigma(x) = \sqrt{2(0.1)}$ (Fig. 6), the full model is recovered with less than 3% error in $\nabla\hat{K}$ and $\hat{\sigma}$ in at least 98/100 trials when the total particle count $NM$ is at least 8000, and with errors less than 1% for $NM \ge 16{,}000$. The error trends for $\nabla\hat{K}$ and $\hat{\sigma}$ in this case both strongly agree with the predicted $O(N^{-1/2})$ rate. For non-constant diffusivity $\sigma(x) = \sqrt{2(0.1)x^2}$ (Fig. 7), we also observe robust recovery ($\mathrm{TPR}(\hat{w}) \ge 0.95$) for $NM \ge 8000$ with error trends close to $O(N^{-1/2})$, although the accuracy in $\nabla\hat{K}$ and $\hat{\sigma}$ is diminished due to the strong order-$\Delta t^{1/2}$ convergence of Euler–Maruyama applied to diffusivities $\sigma$ that are unbounded in $x$ [55].

Fig. 6.


Recovery of (3.1) in one spatial dimension for $K = K_{QANR}$ and $\sigma = \sqrt{2(0.1)}$.

Fig. 7.


Recovery of (3.1) in one spatial dimension for $K = K_{QANR}$ and $\sigma(x) = \sqrt{2(0.1)x^2}$.

5.3. Two-dimensional nonlocal model

We now discuss an example of singular interaction in two spatial dimensions using the logarithmic potential

$$K(x) = \frac{1}{2\pi}\log\|x\| \tag{5.5}$$

with constant diffusivity $\sigma(x) = \sigma \in \{0, (4\pi)^{-1/2}\}$. This example corresponds to the parabolic–elliptic Keller–Segel model of chemotaxis, where $\sigma_c \triangleq (4\pi)^{-1/2}$ is the critical diffusivity such that $\sigma > \sigma_c$ leads to diffusion-dominated spreading of particles throughout the domain (vanishing particle density at every point in $\mathbb{R}^2$) and $\sigma < \sigma_c$ leads to aggregation-dominated concentration of the particle density onto the Dirac delta located at the center of mass of the initial particle density [44,56]. For $\sigma = 0$ we examine the effect of additive i.i.d. measurement noise $\varepsilon \sim \mathcal{N}(0, \epsilon^2 \|\mathbb{X}\|_{\mathrm{RMS}}^2)$ for $\epsilon \in \{0.01, 0.0316, 0.1, 0.316, 1\}$.

We simulate the particle system with a cutoff potential

$$K_\delta(x) = \begin{cases} \dfrac{1}{2\pi}\left(\log(\delta) - 1 + \dfrac{\|x\|}{\delta}\right), & \|x\| < \delta, \\[1ex] \dfrac{1}{2\pi}\log\|x\|, & \|x\| \ge \delta, \end{cases} \tag{5.6}$$

with $\delta = 0.01$, so that $K_\delta$ is Lipschitz and $\nabla K_\delta$ has a jump discontinuity at the origin. Initial particle positions are uniformly distributed on a disk of radius 2 and the particle position data consists of 81 timepoints recorded at a resolution $\Delta t = 0.1$, coarsened from a simulation with timestep 0.0025. Histograms are created with 128 × 128 bins in $x$ and $y$ of sidelength $h = 0.0469$ (see Fig. 8 for histogram snapshots over time). We examine $M = 2^0, \ldots, 2^6$ experiments with $N = 2000$ or $N = 4000$ particles.

Fig. 8.


Histograms created from 4000 particles evolving under logarithmic attraction (Eq. (5.5)) with varying noise levels at times (left to right) $t = 4$, $t = 8$, and $t = 12$. Top: $\epsilon = 0.316$, $\sigma = 0$ (extrinsic noise only). Bottom: $\epsilon = 0$, $\sigma = (4\pi)^{-1/2} \approx 0.28$ (intrinsic noise only).

In Fig. 9 we observe a similar trend in the $\sigma = 0$ case as in the 1D nonlocal example, namely that recovery for small $\epsilon$ is robust, with low errors in $\nabla\hat{K}$ (on the order of 0.0032); only in this case the full model is robustly recovered up to $\epsilon = 0.316$. At $\epsilon = 1$, with $N = 4000$ the method frequently identifies a diffusion term $\nu \Delta u$ with $\nu \approx 0.5 = \epsilon^2/2$, and for $N = 2000$ the method occasionally identifies the backwards diffusion equation $\partial_t \mu_t = -\alpha \Delta \mu_t$, $\alpha > 0$. This is easily prevented by enforcing positivity of $\sigma$; however, we leave this and other constraints as an extension for future work.

With diffusivity $\sigma = (4\pi)^{-1/2}$, we obtain $\mathrm{TPR}(\hat{w})$ approximately greater than 0.95 for $NM \ge 16{,}000$ (Fig. 10, right), with an error trend in $\nabla\hat{K}$ following an $O(N^{-1/2})$ rate, and a trend in $\hat{\sigma}$ of roughly $O(N^{-2/3})$. Since convergence in $M$ for any fixed $N$ is not covered by the theorem above, this shows that combining multiple experiments may yield similar accuracy trends for moderately-sized particle systems.

Fig. 10.


Recovery of (3.1) in two spatial dimensions with $K$ given by (5.5) and $\sigma = (4\pi)^{-1/2}$.

6. Discussion

We have developed a weak-form method for sparse identification of governing equations for interacting particle systems using the formalism of mean-field equations. In particular, we have investigated two lines of inquiry: (1) is the mean-field setting applicable for inference from medium-size batches of particles? and (2) can a low-cost, low-regularity density approximation such as a histogram be used to enforce weak-form agreement with the mean-field PDE? We have demonstrated on several examples that the answer to both questions is yes, despite the fact that the mean-field equations are only valid in the limit of infinitely many particles ($N \to \infty$). This framework is suitable for systems of several thousand particles in one and two spatial dimensions, and we have proved convergence in $N$ for the associated least-squares problem using simple histograms as approximate particle densities. In addition, the sparse regression approach allows one to identify the full system, including interaction potential $K$, local potential $V$, and diffusivity $\sigma$.

It was initially unclear whether the mean-field setting could be utilized in weak form for finite particle batches, hence this can be seen as a proof of concept for particle systems with $N$ in the range $10^3$–$10^5$. With convergence in $N$ and low computational complexity, our weak-form approach is well-suited as is for much larger particle systems. In the opposite regime, for small fixed $N$, the authors of [26] show that their maximum likelihood-based method converges as $M \to \infty$ (i.e. in the limit of infinite experiments). While the same convergence does not hold for our weak-form method, the results in Section 5 suggest that in practice, combining $M$ independent experiments each with $N$ particles improves results. Furthermore, we include evidence in Appendix A.3 that even for small $N$, our method correctly identifies the mean-field model when $M$ is large enough, with performance similar to that in [26]. We leave a full investigation of the interplay between $M$ and $N$ to future work.

In the operable regime of $N > 10^3$, there is potential for improvements and extensions in many directions. On the subject of density estimation, histograms are highly efficient, yet they lead to piecewise-constant approximations of $\mu_t$ and hence $O(h)$ errors. Choosing a density kernel $G$ to achieve high-accuracy quadrature without sacrificing the $O(N)$ runtime of histogram computation seems prudent, although one must be cautious about making assumptions on the smoothness of the mean-field distribution $\mu_t$. For instance, in the 1D nonlocal example of Section 5.2, discontinuities develop in $\mu_t$ for the case $\sigma = 0$, hence a histogram approximation is more appropriate than using e.g. a Gaussian kernel.

The computational grid $C$, quadrature method $\langle \cdot, \cdot \rangle_{h,\Delta t}$, and reference test function $\psi$ may also be optimized further or adapted to specific problems. The approach chosen here, with $C$ equally spaced and $\psi$ a separable piecewise polynomial, along with integration using the trapezoidal rule, has several advantages, including high accuracy and fast computation using convolutions. However, this may need adjustment in higher dimensions. It might be advantageous to adapt $C$ to the data $\mathbb{Y}$; however, this may prevent one from evaluating $(G, b)$ using the FFT if a non-uniform grid results, hence increasing the overall computational complexity. One could also use multiple reference test functions $\psi$. The possibilities of varying the test functions (within the smoothness requirements of the library $\mathcal{L}$) have been largely unexplored in weak-form identification methods.

Several theoretical questions remain unanswered, namely model recovery statistics for finite $N$. As a consequence of Theorem 1, as well as convergence results on sequential thresholding [47], we have that $G$ being full-rank and $\mathcal{L}$ containing the true model is sufficient to guarantee convergence $\hat{w} \to w^\star$ as $N \to \infty$ at the rate $O(N^{-1/2})$. Noise, whether extrinsic or intrinsic, for finite $N$ may result in identification of an incorrect model when $G$ is poorly conditioned. The effect is more severe if the true model has a small coefficient, which requires a small threshold $\lambda$, which correspondingly may lead to a non-sparse solution. These are sensitivities of any sparse regression algorithm (see e.g. [57]), and accounting for the effect of noise and poor conditioning is an active area of research in equation discovery.

We also note that several researchers have focused on uniqueness in kernel identifiability [34,58]. This issue does not directly apply to our scenario of identifying the triple $(K, V, \sigma)$. Moreover, in the cases we considered, we do not see any identifiability issues (e.g. rank deficiency) even in the high-noise case with low particle number. Quantifying the transition to identifiability as $N \to \infty$ as a function of the condition number $\kappa(G)$ is an important subject for future work.

For extensions, the example system (5.2) and resulting homogenization motivate further study of effective equations for systems with complex microstructure. In other fields this is described as coarse-graining. A related line of study is inference of 2nd-order particle systems, as explored in [32], which often lead to an infinite hierarchy of mean-field equations. Our weak-form approach may provide a principled method for truncating and closing such hierarchies using particle data. Another extension is to enforce convex constraints in the regression problem, such as lower bounds on the diffusivity, or $K$ with long-range attraction depending on the distribution $\rho^\pi \in \mathcal{P}([0,\infty))$ of pairwise distances (see [26] for further use of $\rho^\pi$). Finally, the framework we have introduced can easily be used to find nonlocal models from continuous solution data (e.g. given $U$ instead of $\mathbb{Y}$), whereby questions of nonlocal representations of models can be investigated.

Lastly, we note that MATLAB code is available at https://github.com/MathBioCU/WSINDy_IPS.

Table 3.

Trial function library for nonlocal 1D example (Section 5.2).

Mean-field term | Trial function library
$\nabla\cdot(U(\nabla K*U))$ | $\partial_x(U(\partial_x(x^m)*U))$, $m\in\{1,2,3,4,5,6,7\}$
$\nabla\cdot(U\nabla V)$ | $\partial_x(U\,x^m)$, $m\in\{0,2,3,4,5,6,7,8\}$
$\frac{1}{2}\sum_{i,j=1}^d\partial_{x_i}\partial_{x_j}((\sigma\sigma^T)_{ij}U)$ | $\partial_{xx}(U\,x^m)$, $m\in\{0,1,2,3,4,5,6,7,8\}$

Table 4.

Trial function library for nonlocal 2D example (Section 5.3). Interaction potentials $[K]_\delta$ indicate cutoff potentials of the form (5.6) with $\delta=0.01$, such that the resulting potential is Lipschitz.

Mean-field term | Trial function library
$\nabla\cdot(U(\nabla K*U))$ | $\nabla\cdot(U(\nabla\|x\|^m*U))$, $m\in\{2,3,4,5,6\}$; $\nabla\cdot(U(\nabla[\|x\|^{1/2}]_\delta*U))$; $\nabla\cdot(U(\nabla[\|x\|(\log\|x\|-1)]_\delta*U))$; $\nabla\cdot(U(\nabla[\log\|x\|]_\delta*U))$
$\nabla\cdot(U\nabla V)$ | $\partial_{x_i}(U\,x_1^m x_2^n)$, $0\le m+n\le5$, $i\in\{1,2\}$
$\frac{1}{2}\sum_{i,j=1}^d\partial_{x_i}\partial_{x_j}((\sigma\sigma^T)_{ij}U)$ | $\Delta(U\cos(mx_1)\cos(nx_2))$, $(m,n)\in\{0,1,2\}^2$

Acknowledgements

This research was supported in part by the NSF Mathematical Biology MODULUS grant 2054085, in part by the NSF/NIH Joint DMS/NIGMS Mathematical Biology Initiative grant R01GM126559, and in part by the NSF Computing and Communications Foundations grant 1815983. This work also utilized resources from the University of Colorado Boulder Research Computing Group, which is supported by the National Science Foundation (awards ACI-1532235 and ACI-1532236), the University of Colorado Boulder, and Colorado State University. The authors would also like to thank Prof. Vanja Dukić (University of Colorado at Boulder, Department of Applied Mathematics) for insightful discussions and helpful suggestions of references.

Appendix

A.1. Specifications for examples

In Tables 2-5 we include hyperparameter specifications and resulting attributes of Algorithm 4.1 applied to the three examples in Section 5. In particular, we report the typical walltime in Table 5, showing that on each example Algorithm 4.1 learns the mean-field equation from a dataset with ~64,000 particles in under 10 s.

Table 2.

Trial function library for local 2D example (Section 5.1).

Mean-field term | Trial function library
$\nabla\cdot(U(\nabla K*U))$ | $\nabla\cdot(U(\nabla\|x\|^m*U))$, $m\in\{1,2,3,4,5,6,7\}$
$\nabla\cdot(U\nabla V)$ | $\partial_{x_i}(U\cos(mx_1)\cos(nx_2))$, $(m,n)\in\{0,1,2,3,4,5\}^2$, $i\in\{1,2\}$
$\frac{1}{2}\sum_{i,j=1}^d\partial_{x_i}\partial_{x_j}((\sigma\sigma^T)_{ij}U)$ | $\Delta(U\cos(mx_1)\cos(nx_2))$, $(m,n)\in\{0,1,2,3,4,5\}^2$

Table 5.

Discretization parameters and general information for examples. The number of nonzeros $\|w^\star\|_0$ in the true weight vector is given for each parameter set examined. Namely, for the local 2D example, $\omega=1$ results in a 4-term model, while the homogenized case $\omega=20$ results in a 3-term model. For the nonlocal 1D example, $\sigma\in\{0,\ \sqrt{2(0.1)},\ \sqrt{2(0.1)}\,x^2\}$ results in 2-term, 3-term, and 5-term models, respectively, and for the nonlocal 2D example $\sigma\in\{0,\ (4\pi)^{-1}\}$ results in 1-term and 2-term models. The norm $\|G\|_1$, condition number $\kappa_2(G)$, and walltime are listed for representative samples with 64,000 total particles.

Example | $(m_x, m_t)$ | $(p_x, p_t)$ | $(s_x, s_t)$ | size($U$) | $(h, \Delta t)$
Local 2D | (31, 16) | (5, 3) | (10, 5) | 128 × 128 × 101 | (0.078, 0.02)
Nonlocal 1D | (29, 8) | (5, 3) | (5, 1) | 256 × 101 | (0.023, 0.01)
Nonlocal 2D | (25, 8) | (5, 3) | (8, 1) | 128 × 128 × 81 | (0.047, 0.1)

Example | $\|w^\star\|_0$ | size($G$) | $\|G\|_1$ | $\kappa_2(G)$ | Walltime
Local 2D | {4, 3} | 686 × 85 | $2.0\times10^3$ | $3.0\times10^7$ | 9.2 s
Nonlocal 1D | {2, 3, 5} | 3400 × 24 | $1.3\times10^5$ | $8.7\times10^8$ | 0.7 s
Nonlocal 2D | {1, 2} | 6500 × 59 | $1.1\times10^4$ | $6.4\times10^6$ | 8.5 s

A.2. Derivation of homogenized equation (5.3)

We briefly provide a derivation of the homogenized equation (5.3) in the static case. Let $\Omega\subset\mathbb{R}^d$ be an open bounded domain with smooth boundary and let $\mathbb{T}^d$ be the $d$-dimensional torus. Let $a(x,y):\Omega\times\mathbb{T}^d\to\mathbb{R}$ be continuous and uniformly bounded below,

$$a(x,y)\ge\alpha>0,\qquad \forall\,(x,y)\in\Omega\times\mathbb{T}^d.$$

Then for any $f\in L^2(\Omega)$, the equation

$$-\Delta\big(a(x,x/\epsilon)\,u_\epsilon(x)\big)=f(x),\qquad u_\epsilon\big|_{\partial\Omega}=0$$

has a unique weak solution $u_\epsilon\in L^2(\Omega)$ given by

$$u_\epsilon(x)=\frac{(Gf)(x)}{a(x,x/\epsilon)},$$

where $G$ is the Green's function for $(-\Delta)^{-1}$ with homogeneous Dirichlet boundary conditions on $\Omega$. By the coercivity of $a$ we have that $\|u_\epsilon\|_{L^2(\Omega)}$ is uniformly bounded in $\epsilon$. By the lemma in [59, Section 2.4], up to a subsequence $\{\epsilon_j\}_{j\in\mathbb{N}}$, there exists a function $u(x,y)$, periodic in its second variable, such that for any continuous function $\phi(x,x/\epsilon)$ we have

$$\lim_{\epsilon\to0}\int_\Omega u_\epsilon(x)\,\phi(x,x/\epsilon)\,dx=\int_\Omega\int_{\mathbb{T}^d}u(x,y)\,\phi(x,y)\,dy\,dx.$$

Setting $\phi(x,y)=\phi(x)$, we see that on the same subsequence, $u_\epsilon\rightharpoonup\int_{\mathbb{T}^d}u(\cdot,y)\,dy$ weakly. Applying the same lemma to the constant sequence $u_\epsilon\equiv1$ and letting $\phi(x,x/\epsilon)=\phi(x)\,a^{-1}(x,x/\epsilon)$, we see that (up to possibly a second subsequence)

$$a^{-1}(x,x/\epsilon)\rightharpoonup\int_{\mathbb{T}^d}\frac{dy}{a(x,y)}.$$

Letting $\bar a(x)\coloneqq\Big(\int_{\mathbb{T}^d}\frac{dy}{a(x,y)}\Big)^{-1}$ and putting together the previous limits, we see that

$$u_\epsilon(x)\rightharpoonup\bar u(x)\coloneqq\int_{\mathbb{T}^d}u(x,y)\,dy=(Gf)(x)\int_{\mathbb{T}^d}\frac{dy}{a(x,y)}=\frac{(Gf)(x)}{\bar a(x)},$$

and hence $\bar u$ solves the homogenized equation

$$-\Delta(\bar a\,\bar u)=f.$$
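Since $u_\epsilon=Gf/a(\cdot,\cdot/\epsilon)$ is explicit, the weak convergence above is easy to check numerically. The following MATLAB sketch (our own illustration; the choices of $a$ and $f$ are arbitrary) compares $u_\epsilon$ against the homogenized profile $Gf/\bar a$ in one dimension:

```matlab
% 1D check (our own illustration; a and f are arbitrary choices): compare
% u_eps = (Gf)/a(x, x/eps) with the homogenized profile (Gf)/abar, where
% abar is the harmonic mean of a(x,.) over the unit torus.
x = linspace(0, 1, 1e5)';
a = @(x, y) 2 + sin(2*pi*y);                 % periodic in the fast variable
abar = 1/mean(1./a(0, linspace(0, 1, 1e4))); % harmonic mean (x-independent here)
Gf = x.*(1 - x)/2;                           % solves -(Gf)'' = 1, Gf(0) = Gf(1) = 0
for ep = [1e-1 2e-2 4e-3]
    % weak (averaged) gap shrinks with ep even though u_eps oscillates
    gap = abs(mean(Gf./a(x, x/ep) - Gf/abar));
    fprintf('eps = %5.3f: |weak gap| = %.2e\n', ep, gap);
end
```

The pointwise difference does not vanish (the oscillations persist), but the averaged gap decays with $\epsilon$, consistent with weak convergence to $\bar u$.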

A.3. Recovery for small N and large M

The related maximum-likelihood approach [26] is shown to be suitable for small $N$ and large $M$, hence a natural line of inquiry is the performance of Algorithm 4.1 in this regime. Theorem 1 does not apply here, and in fact convergence of the algorithm is not expected: letting $U_t^{M,N}=\frac{1}{M}\sum_{m=1}^M U_t^{(m),N}$, where $U_t^{(m),N}$ is the approximate density constructed from experiment $m$ with $N$ particles, we have the weak-measure convergence $U_t^{M,N}\rightharpoonup\rho_t^{(1),N}$ as $M\to\infty$, where $\rho_t^{(1),N}$ is the 1-particle marginal of the distribution of $X_t$ in $\mathbb{R}^{Nd}$. Unlike the mean-field distribution $\mu_t$, $\rho_t^{(1),N}$ is not a weak solution of the mean-field Fokker–Planck equation (3.1); instead we have

$$\partial_t\rho_t^{(1),N}=\frac{N-1}{N}\,\nabla\cdot\Big(\int_{\mathbb{R}^d}\nabla K(x-y)\,\rho_t^{(2),N}(x,y)\,dy\Big)+\nabla\cdot\big(\nabla V\,\rho_t^{(1),N}\big)+\frac12\sum_{i,j=1}^d\partial_{x_i}\partial_{x_j}\big((\sigma\sigma^T)_{ij}\,\rho_t^{(1),N}\big),$$

holding weakly, which depends on the 2-particle marginal $\rho_t^{(2),N}$ [35]. Nevertheless, using the 1D nonlocal example in Section 5.2 with $\sigma=\sqrt{2(0.1)}\approx0.45$, we observe in Fig. 11 (right panel) that our weak-form algorithm correctly identifies the model in more than 96% of trials with just $N=10$ particles per experiment when $M\in[2^{10},2^{12}]$, and that the error in $K$ (left panel) follows an $O(M^{-1/2})$ trend. At $M=4096\approx10^{3.61}$ experiments, the error²² in $K$ is less than 1% and the runtime is approximately 0.9 s. The lack of convergence in $M$ is reflected in the diffusivity (middle panel of Fig. 11), where the error appears to plateau at around 1.7% for $h\approx0.0468$ and at 3.5% for $h\approx0.0234$. The lower resolution (larger binwidth $h$) appears to yield slightly better results, possibly indicating that larger $h$ produces a coarse-graining effect such that $\rho^{(2),N}\approx\rho^{(1),N}\otimes\rho^{(1),N}$ over larger distances, although this effect deserves more thorough study in future work.
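For concreteness, the multi-experiment density $U_t^{M,N}$ used in this test can be formed as in the minimal sketch below (illustrative variable names; each experiment's histogram is computed as in the main text and then averaged):

```matlab
% Sketch (illustrative names): the multi-experiment density U_t^{M,N} at one
% time slice, formed by averaging the M per-experiment histograms.
M = 1024; N = 10;
h = 0.05; edges = -4:h:4;
U = zeros(1, numel(edges)-1);
for m = 1:M
    Xm = randn(1, N);                       % stand-in for experiment m
    U  = U + histcounts(Xm, edges)/(N*h);   % per-experiment density estimate
end
U = U/M;                                    % converges weakly to rho_t^(1),N as M grows
```

Since every experiment has the same $N$, averaging the per-experiment histograms coincides with pooling all $MN$ particles into one histogram.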

A.4. Technical lemmas

We now prove Lemmas 2–4 under Assumption H. First, we record some consequences of Assumption H. (I) The $\eta$-Hölder continuity of sample paths (H.1) implies that for each $t\in[0,T]$,

$$\int_{\mathbb{R}^d}\|x\|^p\,d\mu_t^N=\frac1N\sum_{i=1}^N\|X_t^{(i)}\|^p\le\frac{2^p}{N}\sum_{i=1}^N\|X_0^{(i)}\|^p+C_\eta 2^p t^{p\eta}.$$

Together with the $p$th moment bound on $\mu_0$ (H.2), this implies

$$\mathbb{E}\Big[\sup_{t\le T}\int_{\mathbb{R}^d}\|x\|^p\,d\mu_t^N\Big]\le2^p\big(M_p+C_\eta T^{p\eta}\big),\tag{A.1}$$

independent of $N$. (II) The growth bounds on $K$, $V$, and $\sigma$ (H.3)–(H.4) imply that for some $C>0$,

$$\|\nabla K(x)\|+\|\nabla V(x)\|+\|\sigma(x)(\sigma(x))^T\|_F\le C(1+\|x\|^p),\tag{A.2}$$

where $\|\cdot\|_F$ is the Frobenius norm.

Proof of Lemma 2. Applying Itô's formula to the process $\frac1N\sum_{i=1}^N\psi(X_t^{(i)},t)$, we get that

$$\mathcal{L}(\mu^N,\psi,\langle\cdot,\cdot\rangle)=\frac1N\sum_{i=1}^N\int_0^T\nabla\psi(X_t^{(i)},t)^T\sigma(X_t^{(i)})\,dB_t^{(i)}.$$

Note that each integral on the right-hand side is a local martingale, since (A.2) and (H.5) ensure boundedness of $\nabla\psi(x,t)^T\sigma(x)$ over any compact set in $\mathbb{R}^d$, and hence has mean zero. By independence of the Brownian motions $B_t^{(i)}$, exchangeability of the $X_t^{(i)}$, the moment bound (A.1), and the growth bounds on $\sigma$ (H.4), the Itô isometry gives us

$$\mathbb{E}\big[\mathcal{L}(\mu^N,\psi,\langle\cdot,\cdot\rangle)^2\big]=\frac1N\int_0^T\mathbb{E}_{X\sim\rho_t^{(1)}}\big[\|\nabla\psi(X,t)^T\sigma(X)\|^2\big]\,dt=\frac1N\int_0^T\mathbb{E}\Big[\int_{\mathbb{R}^d}\|\nabla\psi(x,t)^T\sigma(x)\|^2\,d\mu_t^N(x)\Big]\,dt\le\frac{C}{N}\|\nabla\psi\|_{2,\infty}^2\int_0^T\mathbb{E}\Big[1+\int_{\mathbb{R}^d}\|x\|^p\,d\mu_t^N(x)\Big]\,dt\le\tilde C N^{-1},$$

where the constant $\tilde C$ depends on $\psi$, $T$, and the constants appearing in (A.1)–(A.2). The result follows from Jensen's inequality.²³

FIG. 11. Recovery of (3.1) in one spatial dimension with only $N=10$ particles per experiment: relative error in $K$ (left), error in the diffusivity (middle), and rate of correct model identification (right), each as a function of the number of experiments $M$.

Proof of Lemma 3. Using the notation $f_C$ from Lemma 1 to denote the piecewise-constant approximation of a function $f$ over the domain $D$ using the grid $C$, we have

$$\mathcal{L}(U,\psi,\langle\cdot,\cdot\rangle_h)-\mathcal{L}(\mu^N,\psi,\langle\cdot,\cdot\rangle)=\underbrace{\big\langle(\nabla\psi\cdot((\nabla K)_C*\mu^N))_C-\nabla\psi\cdot(\nabla K*\mu^N),\,\mu^N\big\rangle}_{E_{\mathrm{interact}}}+\underbrace{\big\langle(\partial_t\psi)_C-\partial_t\psi,\,\mu^N\big\rangle+\big\langle(\nabla\psi\cdot\nabla V)_C-\nabla\psi\cdot\nabla V,\,\mu^N\big\rangle+\tfrac12\big\langle\mathrm{Tr}(\nabla^2\psi\,\sigma(\sigma)^T)_C-\mathrm{Tr}(\nabla^2\psi\,\sigma(\sigma)^T),\,\mu^N\big\rangle}_{E_{\mathrm{linear}}}.$$

The right-hand side consists of an interaction error $E_{\mathrm{interact}}$ followed by a sum $E_{\mathrm{linear}}$ of terms that are linear in the difference between a locally Lipschitz function and its piecewise-constant approximation. Hence, we can bound $E_{\mathrm{linear}}$ using the smoothness of $\psi$ (H.5), the moment assumptions on $\mu_t^N$ (H.2), and the growth assumptions on $V$ and $\sigma$ (H.3)–(H.4). Specifically, for $x\in B_k$ with center $c_k$, the growth assumptions imply

$$|\nabla\psi(x)\cdot\nabla V(x)-\nabla\psi(c_k)\cdot\nabla V(c_k)|\le Ch\big(\|\nabla\psi\|_{2,\infty}+\mathrm{Lip}(\nabla\psi)\big)(1+\|x\|^p),$$
$$|\mathrm{Tr}(\nabla^2\psi(x)\,\sigma(x)(\sigma(x))^T)-\mathrm{Tr}(\nabla^2\psi(c_k)\,\sigma(c_k)(\sigma(c_k))^T)|\le C'h\big(\|\nabla^2\psi\|_{F,\infty}+\mathrm{Lip}(\nabla^2\psi)\big)(1+\|x\|^p),$$

for $C$ and $C'$ depending on $p$, $d$, and $C_p$, hence

$$E_{\mathrm{linear}}\le C\sup_{|\alpha|\le2}\mathrm{Lip}(\partial^\alpha\psi)\Big(T+\int_0^T\int_{\mathbb{R}^d}\|x\|^p\,d\mu_t^N\,dt\Big)h.\tag{A.3}$$

Similarly, for the interaction error we use that for $x\in B_k$ and $y\in B_j$ with centers $c_k$ and $c_j$, we have

$$|\nabla\psi(c_k)\cdot\nabla K(c_k-c_j)-\nabla\psi(x)\cdot\nabla K(x-y)|\le\|\nabla\psi(c_k)\|\,\|\nabla K(c_k-c_j)-\nabla K(x-y)\|+\|\nabla\psi(c_k)-\nabla\psi(x)\|\,\|\nabla K(x-y)\|\le Ch\big(\|\nabla\psi\|_{2,\infty}+\mathrm{Lip}(\nabla\psi)\big)(1+\|x-y\|^p),$$

with $C$ also depending on $p$, $d$, and $C_p$. From this we have

$$E_{\mathrm{interact}}\le C\Big(T+\int_0^T\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\|x-y\|^p\,d\mu_t^N(y)\,d\mu_t^N(x)\,dt\Big)h.\tag{A.4}$$

The result follows from taking expectations and using the moment bound (A.1), where the final constant $C$ depends on $p$, $d$, $C_p$, $M_p$, $T$, $\eta$, and $\psi$.

Proof of Lemma 4. Again rewriting the spatial trapezoidal-rule integration in the form $\int_{\mathbb{R}^d}\varphi_C(x)\,d\mu_t^N$, we see that

$$\mathcal{L}(U,\psi,\langle\cdot,\cdot\rangle_h)-\mathcal{L}(U,\psi,\langle\cdot,\cdot\rangle_{h,\Delta t})\tag{A.5}$$

reduces to four terms of the form

$$A(\varphi)\coloneqq\frac1N\sum_{i=1}^N\bigg[\Big(\int_0^T\varphi_C(X_t^{(i)})\,dt\Big)-\frac{\Delta t}{2}\sum_{\ell=1}^{L-1}\Big(\varphi_C(X_{t_{\ell+1}}^{(i)})+\varphi_C(X_{t_\ell}^{(i)})\Big)\bigg],$$

for $\varphi\in\{\partial_t\psi,\ \nabla\psi\cdot\nabla V,\ \mathrm{Tr}(\nabla^2\psi\,\sigma(\sigma)^T),\ \nabla\psi\cdot(\nabla K*\mu_t^N)\}$. Similarly to the bounds derived for $\varphi(x)-\varphi_C(x)$ in Lemma 3, the growth bounds on $K$, $V$ and $\sigma$ imply in general that

$$|\varphi(x)-\varphi(y)|\le C\|x-y\|\big(1+\max\{\|x\|,\|y\|\}^p\big).$$

Rewriting the summands in $A(\varphi)$,

$$\int_0^T\varphi_C(X_t^{(i)})\,dt-\frac{\Delta t}{2}\sum_{\ell=1}^{L-1}\big(\varphi_C(X_{t_{\ell+1}}^{(i)})+\varphi_C(X_{t_\ell}^{(i)})\big)=\sum_{\ell=1}^{L-1}\bigg[\underbrace{\int_{t_\ell}^{t_{\ell+1}}\frac{t-t_\ell}{\Delta t}\big(\varphi_C(X_t^{(i)})-\varphi_C(X_{t_{\ell+1}}^{(i)})\big)\,dt}_{I_1}+\underbrace{\int_{t_\ell}^{t_{\ell+1}}\frac{t_{\ell+1}-t}{\Delta t}\big(\varphi_C(X_t^{(i)})-\varphi_C(X_{t_\ell}^{(i)})\big)\,dt}_{I_2}\bigg],$$

and using

$$|\varphi_C(x)-\varphi_C(y)|\le|\varphi(x)-\varphi(c_k)|+|\varphi(x)-\varphi(y)|+|\varphi(y)-\varphi(c_\ell)|\le C(2h+\|x-y\|)\big(1+\max\{\|x\|,\|y\|\}^p\big),$$

where $x\in B_k$ and $y\in B_\ell$, we see that for $I_1$,

$$\Big|\int_{t_\ell}^{t_{\ell+1}}\frac{t-t_\ell}{\Delta t}\big(\varphi_C(X_t^{(i)})-\varphi_C(X_{t_{\ell+1}}^{(i)})\big)\,dt\Big|\le\int_{t_\ell}^{t_{\ell+1}}\frac{t-t_\ell}{\Delta t}\,C\big(2h+\|X_t^{(i)}-X_{t_{\ell+1}}^{(i)}\|\big)\big(1+\max\{\|X_t^{(i)}\|,\|X_{t_{\ell+1}}^{(i)}\|\}^p\big)\,dt\le\int_{t_\ell}^{t_{\ell+1}}\frac{t-t_\ell}{\Delta t}\,C\big(2h+|t_{\ell+1}-t|^\eta\big)\big(1+\max\{\|X_t^{(i)}\|,\|X_{t_{\ell+1}}^{(i)}\|\}^p\big)\,dt.$$

Taking expectations on both sides and using the moment bound (A.1), we get

$$\mathbb{E}\bigg[\Big|\int_{t_\ell}^{t_{\ell+1}}\frac{t-t_\ell}{\Delta t}\big(\varphi_C(X_t^{(i)})-\varphi_C(X_{t_{\ell+1}}^{(i)})\big)\,dt\Big|\bigg]\le C\big(\Delta t\,h+\Delta t^{1+\eta}\big).$$

We get the same bound for $I_2$. Summing over $\ell$ and taking the average in $i$, we then get

$$\mathbb{E}[|A(\varphi)|]\le C\big(h+\Delta t^\eta\big),$$

which implies the desired bound on the difference (A.5).

22

For comparison, in [26, Fig. 4] the error in recovering $K$ using the maximum-likelihood approach on an opinion dynamics example with $M=10^{3.6}$, $N=10$, and $\sigma=0.5$ is approximately $100\times10^{-1.2}\%\approx6.3\%$.

23

$\|f\|_{p,q}$ for vector-valued functions $f:\mathbb{R}^d\to\mathbb{R}^d$ denotes the $L^q$ norm over $x$ of the $\ell^p$ norm of $f(x)$. Also recall that $\rho_t^{(1)}$ is the $X_t^{(1)}$-marginal of the process $X_t\in\mathbb{R}^{dN}$.

Footnotes

CRediT authorship contribution statement

Daniel A. Messenger: Concept for the article, Wrote the first draft, Editing, Performed the mathematical analysis, Wrote software, Selected examples, Ran simulations, Analyzed the data. David M. Bortz: Concept for the article, Editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

1

We define the $p$th moment of a probability measure $\mu$ for $p\ge0$ by $M_p(\mu)\coloneqq\int_{\mathbb{R}^d}\|x\|^p\,d\mu(x)$.

2

The set $D-D$ is defined $D-D\coloneqq\{x-y:(x,y)\in D\times D\}$.

3

By $I_d$ we mean the identity in $\mathbb{R}^d$.

4

We use the notation $t\mapsto\mu_t$ to denote the evolution of probability measures. Subscripts will not be used to denote differentiation.

5

Meaning that for all continuous bounded functions $\phi:\mathbb{R}^d\to\mathbb{R}$, $\int_{\mathbb{R}^d}\phi\,d\mu_t^N(x)\to\int_{\mathbb{R}^d}\phi\,d\mu_t(x)$.

6
For a function $f:\mathbb{R}^d\to Y$, where $Y$ is a metric space with metric $\rho$, we define $\mathrm{Lip}(f)$ by
$$\mathrm{Lip}(f)\coloneqq\sup_{x\ne y\in\mathbb{R}^d}\frac{\rho(f(x),f(y))}{\|x-y\|},$$
where $\|\cdot\|$ denotes the Euclidean norm. We say $f$ is Lipschitz when $\mathrm{Lip}(f)<\infty$. Also, $\|f\|_{C^1}\coloneqq\|f\|_\infty+\sum_{i=1}^d\|\partial_{x_i}f\|_\infty$.
7

The indicator function is defined $\mathbf{1}_A(x)\coloneqq1$ if $x\in A$ and $\mathbf{1}_A(x)\coloneqq0$ if $x\notin A$.

8

In this case (4.2) is the variance of a Monte-Carlo estimator for $\int\psi(x)\,d\mu(x)$.

9

Details of the libraries used in examples can be found in Tables 2-4 in Appendix A.1.

10

This is not true in domains with boundaries, where nonlocalities can be seen to impart mean translation [42].

11

Note that this is feasible because the STLS algorithm terminates in finitely many iterations.

12

The Moore–Penrose inverse $A^\dagger$ is defined for a rank-$r$ matrix $A$ using the reduced SVD $A=U_r\Sigma_rV_r^T$ as $A^\dagger\coloneqq V_r\Sigma_r^{-1}U_r^T$. The subscript $r$ denotes restriction to the first $r$ columns.

13

In particular, correlations result in large mutual coherence, which renders algorithms such as Basis Pursuit, Orthogonal Matching Pursuit, and Hard Thresholding Pursuit useless (see [48, Chapter 5] for details).

14

Note that $C-C$ is simply $C$ shifted to lie in the positive orthant $\{x\in\mathbb{R}^d:x_\ell\ge0,\ \ell=1,\dots,d\}$ and reflected through each coordinate plane $x_\ell=0$. In this way $C-C$ discretizes the set $D-D\coloneqq\{x-y\in\mathbb{R}^d:(x,y)\in D\times D\}$ containing all observed interparticle distances.

15

We neglect the cost of computing the histogram $U$ and evaluating $\psi(C)$, together amounting to an additional $O(NML+|C|)$ flops, as these terms are lower order and reused in each column of $G$ and $b$.

16

For $A\in\mathbb{R}^{d\times d}$ the Frobenius norm is defined $\|A\|_F=\sqrt{\mathrm{Tr}(A^TA)}$.

17

$\|G\|_q$ is the induced matrix $q$-norm of $G$.

18

Note that the opposite convergence holds for the algorithm introduced in [26]: with $N$ fixed, $M\to\infty$ results in recovery of $K$.

19

For example, identification of the true model ($\mathrm{supp}(\widehat{w})=\mathrm{supp}(w^\star)$) results in $\mathrm{TPR}(\widehat{w})=1$, while identification of only half of the correct nonzero terms and no additional falsely identified terms results in $\mathrm{TPR}(\widehat{w})=0.5$.

20

Since the model is local, (5.2) is the Fokker–Planck equation for the distribution of each particle, rather than only in the limit of infinitely many particles.

21

E.g. due to multiple representations of the drift combining both nonlocal and local terms; see Section 4.1.4.

References

[1] Brunton Steven L., Proctor Joshua L., Kutz J. Nathan, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci. 113 (15) (2016) 3932–3937.
[2] Messenger Daniel A., Bortz David M., Weak SINDy: Galerkin-based data-driven model selection, Multiscale Model. Simul. 19 (3) (2021) 1474–1497.
[3] Messenger Daniel A., Bortz David M., Weak SINDy for partial differential equations, J. Comput. Phys. (2021) 110525.
[4] Warren Michael S., Salmon John K., Astrophysical N-body simulations using hierarchical tree data structures, Proc. Supercomput. (1992).
[5] Guo Jiawei, The progress of three astrophysics simulation methods: Monte-Carlo, PIC and MHD, J. Phys. Conf. Ser. 2012 (1) (2021) 012136, IOP Publishing.
[6] Lelievre Tony, Stoltz Gabriel, Partial differential equations and stochastic methods in molecular dynamics, Acta Numer. 25 (2016) 681–880.
[7] Sepúlveda Néstor, Petitjean Laurence, Cochet Olivier, Grasland-Mongrain Erwan, Silberzan Pascal, Hakim Vincent, Collective cell motion in an epithelial sheet can be quantitatively described by a stochastic interacting particle model, PLoS Comput. Biol. 9 (3) (2013) e1002944.
[8] Van Liedekerke Paul, Palm M.M., Jagiella N., Drasdo Dirk, Simulating tissue mechanics with agent-based models: concepts, perspectives and some novel results, Comput. Part. Mech. 2 (4) (2015) 401–444.
[9] Bi Dapeng, Yang Xingbo, Marchetti M. Cristina, Manning M. Lisa, Motility-driven glass and jamming transitions in biological tissues, Phys. Rev. X 6 (2) (2016) 021011.
[10] Blondel Vincent D., Hendrickx Julien M., Tsitsiklis John N., Continuous-time average-preserving opinion dynamics with opinion-dependent communications, SIAM J. Control Optim. 48 (8) (2010) 5214–5240.
[11] Keller Evelyn F., Segel Lee A., Model for chemotaxis, J. Theoret. Biol. 30 (2) (1971) 225–234.
[12] Gkeka Paraskevi, Stoltz Gabriel, Farimani Amir Barati, Belkacemi Zineb, Ceriotti Michele, Chodera John D., Dinner Aaron R., Ferguson Andrew L., Maillet Jean-Bernard, Minoux Hervé, et al., Machine learning force fields and coarse-grained variables in molecular dynamics: application to materials and biological systems, J. Chem. Theory Comput. 16 (8) (2020) 4757–4775.
[13] Bibby Bo Martin, Sørensen Michael, Martingale estimation functions for discretely observed diffusion processes, Bernoulli (1995) 17–39.
[14] Lo Andrew W., Maximum likelihood estimation of generalized Itô processes with discretely sampled data, Econom. Theory 4 (2) (1988) 231–247.
[15] Bishwal Jaya P.N., Parameter Estimation in Stochastic Differential Equations, Springer, 2007.
[16] Boninsegna Lorenzo, Nüske Feliks, Clementi Cecilia, Sparse learning of stochastic dynamical equations, J. Chem. Phys. 148 (24) (2018) 241723.
[17] Callaham Jared L., Loiseau J.-C., Rigas Georgios, Brunton Steven L., Nonlinear stochastic modelling with Langevin regression, Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 477 (2250) (2021) 20210092.
[18] Li Yang, Duan Jinqiao, Extracting governing laws from sample path data of non-Gaussian stochastic dynamical systems, 2021, arXiv preprint arXiv:2107.10127.
[19] Nardini John T., Baker Ruth E., Simpson Matthew J., Flores Kevin B., Learning differential equation models from stochastic agent-based model simulations, J. R. Soc. Interface 18 (176) (2021) 20200987.
[20] Brückner David B., Ronceray Pierre, Broedersz Chase P., Inferring the dynamics of underdamped stochastic systems, Phys. Rev. Lett. 125 (5) (2020) 058103.
[21] Chen Xiaoli, Yang Liu, Duan Jinqiao, Karniadakis George Em, Solving inverse stochastic problems from discrete particle observations using the Fokker–Planck equation and physics-informed neural networks, SIAM J. Sci. Comput. 43 (3) (2021) B811–B830.
[22] Feng Jinchao, Ren Yunxiang, Tang Sui, Data-driven discovery of interacting particle systems using Gaussian processes, 2021, arXiv preprint arXiv:2106.02735.
[23] Kasonga Raphael A., Maximum likelihood theory for large interacting systems, SIAM J. Appl. Math. 50 (3) (1990) 865–875.
[24] Bishwal Jaya Prakash Narayan, et al., Estimation in interacting diffusions: Continuous and discrete sampling, Appl. Math. 2 (9) (2011) 1154–1158.
[25] Bongini Mattia, Fornasier Massimo, Hansen Markus, Maggioni Mauro, Inferring interaction rules from observations of evolutive systems I: The variational approach, Math. Models Methods Appl. Sci. 27 (05) (2017) 909–951.
[26] Lu Fei, Maggioni Mauro, Tang Sui, Learning interaction kernels in stochastic systems of interacting particles from multiple trajectories, Found. Comput. Math. (2021) 1–55.
[27] Chen Xiaohui, Maximum likelihood estimation of potential energy in interacting particle systems from single-trajectory data, Electron. Commun. Probab. 26 (2021) 1–13.
[28] Sharrock Louis, Kantas Nikolas, Parpas Panos, Pavliotis Grigorios A., Parameter estimation for the McKean–Vlasov stochastic differential equation, 2021, arXiv preprint arXiv:2106.13751.
[29] Gomes Susana N., Stuart Andrew M., Wolfram Marie-Therese, Parameter estimation for macroscopic pedestrian dynamics models from microscopic data, SIAM J. Appl. Math. 79 (4) (2019) 1475–1500.
[30] Lukeman Ryan, Li Yue-Xian, Edelstein-Keshet Leah, Inferring individual rules from collective behavior, Proc. Natl. Acad. Sci. 107 (28) (2010) 12576–12580.
[31] Rudy Samuel H., Brunton Steven L., Proctor Joshua L., Kutz J. Nathan, Data-driven discovery of partial differential equations, Sci. Adv. 3 (4) (2017) e1602614.
[32] Supekar Rohit, Song Boya, Hastewell Alasdair, Mietke Alexander, Dunkel Jörn, Learning hydrodynamic equations for active matter from particle simulations and experiments, 2021, arXiv preprint arXiv:2101.06568.
[33] Alves E. Paulo, Fiuza Frederico, Data-driven discovery of reduced plasma physics models from fully-kinetic simulations, 2020, arXiv preprint arXiv:2011.01927.
[34] Lang Quanjun, Lu Fei, Learning interaction kernels in mean-field equations of 1st-order systems of interacting particles, 2020, arXiv preprint arXiv:2010.15694.
[35] Jabin Pierre-Emmanuel, Wang Zhenfu, Mean field limit for stochastic particle systems, in: Active Particles, Volume 1, Springer, 2017, pp. 379–402.
[36] Sznitman Alain-Sol, Topics in propagation of chaos, in: Ecole d'Été de Probabilités de Saint-Flour XIX–1989, Springer, 1991, pp. 165–251.
[37] Méléard Sylvie, Asymptotic behaviour of some interacting particle systems; McKean–Vlasov and Boltzmann models, in: Probabilistic Models for Nonlinear Partial Differential Equations, Springer, 1996, pp. 42–95.
[38] Bolley François, Cañizo José A., Carrillo José A., Stochastic mean-field limit: non-Lipschitz forces and swarming, Math. Models Methods Appl. Sci. 21 (11) (2011) 2179–2210.
[39] Boers Niklas, Pickl Peter, On mean field limits for dynamical systems, J. Stat. Phys. 164 (1) (2016) 1–16.
[40] Fetecau Razvan C., Huang Hui, Sun Weiran, Propagation of chaos for the Keller–Segel equation over bounded domains, J. Differential Equations 266 (4) (2019) 2142–2174.
[41] Fetecau Razvan C., Huang Hui, Messenger Daniel, Sun Weiran, Zero-diffusion limit for aggregation equations over bounded domains, 2018, arXiv preprint arXiv:1809.01763.
[42] Messenger Daniel A., Fetecau Razvan C., Equilibria of an aggregation model with linear diffusion in domains with boundaries, Math. Models Methods Appl. Sci. 30 (04) (2020) 805–845.
[43] Fetecau Razvan C., Kovacic Mitchell, Swarm equilibria in domains with boundaries, SIAM J. Appl. Dyn. Syst. 16 (3) (2017) 1260–1308.
[44] Carrillo J.A., Delgadino M.G., Patacchini F.S., Existence of ground states for aggregation-diffusion equations, Anal. Appl. 17 (03) (2019) 393–423.
[45] Araújo Dyego, Oliveira Roberto I., Yukimura Daniel, A mean-field limit for certain deep neural networks, 2019, arXiv preprint arXiv:1906.00193.
[46] Freedman David, Diaconis Persi, On the histogram as a density estimator: L2 theory, Z. Wahrscheinlichkeitstheor. Verwandte Gebiete 57 (4) (1981) 453–476.
[47] Zhang Linan, Schaeffer Hayden, On the convergence of the SINDy algorithm, Multiscale Model. Simul. 17 (3) (2019) 948–972.
[48] Foucart Simon, Rauhut Holger, A Mathematical Introduction to Compressive Sensing, Birkhäuser Basel, 2013.
[49] Malik Osman Asif, Becker Stephen, Low-rank Tucker decomposition of large tensors using TensorSketch, Adv. Neural Inf. Process. Syst. 31 (2018) 10096–10106.
[50] Sun Yiming, Guo Yang, Luo Charlene, Tropp Joel, Udell Madeleine, Low-rank Tucker approximation of a tensor from streaming data, SIAM J. Math. Data Sci. 2 (4) (2020) 1123–1150.
[51] Jang Jun-Gi, Kang U, D-Tucker: Fast and memory-efficient Tucker decomposition for dense tensors, in: 2020 IEEE 36th International Conference on Data Engineering (ICDE), IEEE, 2020, pp. 1850–1853.
[52] Yu Wenjian, Gu Yu, Li Yaohang, Efficient randomized algorithms for the fixed-precision low-rank matrix approximation, SIAM J. Matrix Anal. Appl. 39 (3) (2018) 1339–1359.
[53] Lagergren John H., Nardini John T., Lavigne G. Michael, Rutter Erica M., Flores Kevin B., Learning partial differential equations for biological transport models from noisy spatio-temporal data, Proc. R. Soc. A 476 (2234) (2020).
[54] Fetecau Razvan C., Huang Yanghong, Kolokolnikov Theodore, Swarm dynamics and equilibria for a nonlocal aggregation model, Nonlinearity 24 (10) (2011) 2681.
[55] Milstein Grigorii Noikhovich, Numerical Integration of Stochastic Differential Equations, Vol. 313, Springer Science & Business Media, 1994.
[56] Dolbeault Jean, Perthame Benoît, Optimal critical mass in the two dimensional Keller–Segel model in $\mathbb{R}^2$, C. R. Math. 339 (9) (2004) 611–616.
[57] Cai T. Tony, Wang Lie, Orthogonal matching pursuit for sparse signal recovery with noise, IEEE Trans. Inform. Theory 57 (7) (2011) 4680–4688.
[58] Li Zhongyang, Lu Fei, Maggioni Mauro, Tang Sui, Zhang Cheng, On the identifiability of interaction functions in systems of interacting particles, Stoch. Processes Appl. 132 (2021) 135–163.
[59] Weinan E, Principles of Multiscale Modeling, Cambridge University Press, 2011.
