2022 Apr 9;2022(1):29. doi: 10.1186/s13662-022-03702-y

Operator compression with deep neural networks

Fabian Kröpfl 1, Roland Maier 2, Daniel Peterseim 1,3
PMCID: PMC9028012  PMID: 35531267

Abstract

This paper studies the compression of partial differential operators using neural networks. We consider a family of operators, parameterized by a potentially high-dimensional space of coefficients that may vary on a large range of scales. Based on the existing methods that compress such a multiscale operator to a finite-dimensional sparse surrogate model on a given target scale, we propose to directly approximate the coefficient-to-surrogate map with a neural network. We emulate local assembly structures of the surrogates and thus only require a moderately sized network that can be trained efficiently in an offline phase. This enables large compression ratios and the online computation of a surrogate based on simple forward passes through the network is substantially accelerated compared to classical numerical upscaling approaches. We apply the abstract framework to a family of prototypical second-order elliptic heterogeneous diffusion operators as a demonstrating example.

Keywords: Deep learning, Neural networks, Numerical homogenization, Model order reduction

Introduction

The remarkable success of machine learning technology, especially deep learning, in classical AI disciplines such as image recognition and natural language processing has led to increased research interest in leveraging the power of these approaches in other science and engineering disciplines in recent years. In the field of numerical modeling and simulation, promising approaches are emerging that try to integrate machine learning algorithms and traditional physics-based approaches, combining the advantages of the data-driven regime with known physics and domain knowledge. In this spirit, many different approaches to approximating solutions of partial differential equations (PDEs) with neural networks have been proposed, for example so-called physics-informed neural networks (PINNs) [60], the deep Galerkin method [63], or the deep Ritz method [19]. It has become evident that the strategy of using neural networks as ansatz functions for the approximation of a PDE's solution is especially advantageous for high-dimensional problems that are outside the reach of classical mesh-based methods [17, 18, 33]. For some classes of PDEs, e.g., Kolmogorov PDEs and semilinear heat equations, it has even been proven that neural networks break the curse of dimensionality [9, 39].

In this spirit, we strongly believe that the strength of neural networks lies in scenarios where one deals with a whole family of PDEs rather than one single equation, for example in the context of so-called parametric PDEs, i.e., settings where a family of partial differential operators parameterized by some coefficient is considered, see, e.g., [62]. This is particularly true for multiscale problems, where one is interested in computing coarse-scale surrogates for problems involving a range of scales that cannot be resolved in a direct numerical simulation.

In this paper, we study the problem of approximating a coefficient-to-surrogate map with a neural network in a very general setting of parameterized PDEs with arbitrarily rough coefficients that may vary on a microscopic scale. In other words, we are not trying to directly approximate the parameter-to-solution map, but rather compress the fine-scale information contained in the continuous operator to a finite-dimensional sparse object that is able to replicate the effective behavior of the solution on a macroscopic scale of interest even in the presence of unresolved oscillations of the underlying coefficient.

The output surrogate models are based on the idea of modern numerical homogenization techniques such as localized orthogonal decomposition [46, 49, 56], gamblets [52], rough polyharmonic splines [53], the multiscale finite element method [21, 38], or the generalized finite element method [7, 20]; see [5] and the references therein for a comprehensive overview. These methods have demonstrated high performance in many relevant applications such as porous media flow or wave scattering in heterogeneous media to mention only a few. In particular, they typically do not require explicit assumptions on the existence of lower-dimensional structures in the underlying family of PDE coefficients and yield sparse system matrices that ensure uniform approximation properties of the resulting surrogate. Moreover, the computation of the system matrices mimics the standard assembly procedure from finite element theory, consisting of the generation of local system matrices and their combination by local-to-global mappings, which is exploited to reduce the size of the network architecture and its complexity considerably.

The possibility of fast computation of the surrogates has high potential for multi-query problems, such as in uncertainty quantification, and for time-dependent or inverse multiscale problems, which require the computation of surrogates for many different a priori unknown coefficients. Although the aforementioned numerical homogenization methods lead to accurate surrogates for the whole class of coefficients, their computation requires the local resolution of all scales, which is a severe limitation when it has to be performed many times for the solution of a multi-query problem. There have been attempts to tackle this problem, but the results so far are only applicable to small perturbation regimes [35, 50] or settings where the parameterization fulfills additional smoothness requirements [3].

To overcome this problem, we propose to learn the whole nonlinear coefficient-to-surrogate map from a training set consisting of pairs of coefficients and their corresponding surrogates with a deep neural network. In other words, we are combining the domain knowledge from numerical homogenization with a data-driven deep learning approach by essentially learning a numerical homogenization method from data. To this end, we propose using an offline-online approach. In the offline phase, the neural network is trained based on data generated with existing numerical homogenization techniques. In the online phase, the compression of previously unseen operators can then be reduced to a simple forward pass of the neural network, which eliminates the computational bottleneck encountered in multi-query settings.

Our method is conceptually different from the existing approaches that try to integrate ideas from homogenization with neural networks. In [6] for example, the authors propose to learn a homogenized PDE from simulation data by linking deep learning with an equation-free multiscale approach. Other papers in the context of uncertainty quantification suggest training a neural network to identify suitable multiscale basis functions for the finite volume method given a porous random medium [13, 54]. In [31], the authors consider the problem of elasticity with history-dependent material properties, where a recurrent deep neural network connects microscopic and macroscopic material parameters. In deep multiscale model learning [65], learning techniques are used to predict the evolution from one time step to another within a given coarse multiscale space. The goal of this approach is to obtain a reasonable coarse operator for the successive approximation of a time-dependent PDE. Furthermore, several experimental and theoretical works on the approximation of the coefficient-to-solution map [11, 28, 30, 43] or other quantities of interest such as the ground state energy in Schrödinger equations [41] by deep neural networks have been published in the context of parametric PDEs.

This paper is structured as follows: in Sect. 2, we introduce and motivate the abstract framework for a very general class of linear differential operators. After that, we study the problem of elliptic homogenization as an example of how to apply the general methodology in practice. In Sect. 4, we conduct numerical experiments that show the feasibility of our ideas developed in the previous two sections. We conclude this work with an outlook on further research questions.

Abstract framework

In this section, we describe the general abstract problem of finding discrete compressed surrogates for a family of differential operators that satisfactorily approximate the original operators on a target scale of interest, given only the underlying coefficients but not a high-resolution representation of the operators. We also elaborate on how to speed up the online computation of these compressed representatives using deep neural networks after an initial offline training phase.

Setting

Let $D \subset \mathbb{R}^d$, $d \in \{1,2,3\}$, be a bounded Lipschitz domain and $H^1_0(D)$ the Sobolev space of $L^2$-functions with weak first derivatives in $L^2(D)$ that vanish on the boundary of $D$. We write $H^{-1}(D)$ for the dual space of $H^1_0(D)$ and $\langle \cdot, \cdot \rangle$ for the duality pairing between $H^{-1}(D)$ and $H^1_0(D)$. Consider a family of linear differential operators

$$\mathcal{L} := \bigl\{\mathcal{L}_A \colon H^1_0(D) \to H^{-1}(D) \;\big|\; A \in \mathcal{A}\bigr\}$$

that is parameterized by some class $\mathcal{A} \subseteq L^\infty(D)$ of admissible coefficients. We emphasize that we do not pose any assumptions on the structure of the coefficients $A \in \mathcal{A}$ such as periodicity or scale separation, and we explicitly allow for arbitrarily rough coefficients that may vary on a continuum of scales up to some microscale $\varepsilon \ll \operatorname{diam}(D)$. We assume that for every $A \in \mathcal{A}$ the associated operator $\mathcal{L}_A$ is symmetric ($\langle \mathcal{L}_A u, v\rangle = \langle u, \mathcal{L}_A v\rangle$), local ($\langle \mathcal{L}_A u, v\rangle = 0$ if $u$ and $v$ have disjoint supports), and bijective.

Bijectivity implies that for any given $A \in \mathcal{A}$ and $f \in H^{-1}(D)$ there exists a unique $u \in H^1_0(D)$ that solves the equation

$$\mathcal{L}_A u = f \tag{2.1}$$

in a weak sense, i.e., the solution satisfies

$$\langle \mathcal{L}_A u, v\rangle = \langle f, v\rangle \quad \text{for all } v \in H^1_0(D). \tag{2.2}$$

For given problem data A and f, we are interested in computing an approximation to u on some target scale in reasonable time.

Discretization

In order to solve this problem computationally, we choose a finite-dimensional subspace $V_h \subset H^1_0(D)$ of dimension $m = \dim(V_h)$. As a standard example, one could take $V_h$ to be a classical finite element space based on some mesh $\mathcal{T}_h$ with characteristic mesh size $h$ and approximate (2.2) with a Galerkin method. However, in the very general setting with $A$ possibly having fine oscillations on a scale that is not resolved by the mesh size $h$, this approach leads to unreliable approximations of $u$. On the other hand, the resolution of these fine-scale features can be prohibitively expensive in terms of computational resources if $\varepsilon$ is very small. Note that resolution here may mean that the actual mesh size is significantly smaller than $\varepsilon$, depending on the oscillations of the coefficient and its regularity [8, 58]. This means that more advanced discretization techniques are required to still obtain reasonable approximations in the unresolved setting. In practice, the challenge is therefore to compress the fine-scale information that is contained in the operator $\mathcal{L}_A$ to a suitable surrogate $\mathcal{S}_A$ on the target scale $h$, i.e., the surrogate $\mathcal{S}_A$ must be chosen in such a way that it is still able to capture the characteristic behavior of the operator $\mathcal{L}_A$ on the scale of interest. Moreover, we require $\mathcal{S}_A$ to be a bijection that maps the space $V_h$ to itself. This ensures that for any $A \in \mathcal{A}$ and $f \in H^{-1}(D)$ we can find a unique $u_h \in V_h$ that weakly solves the discretized equation

$$\mathcal{S}_A u_h = f_h, \tag{2.3}$$

with $f_h = \mathcal{M} f$, where $\mathcal{M}$ is a quadrature-type operator that maps a function in $H^{-1}(D)$ to an appropriate approximation in $V_h$. Problem (2.3) is to be understood as finding $u_h$ that satisfies

$$\langle \mathcal{S}_A u_h, v_h\rangle = \langle f_h, v_h\rangle \quad \text{for all } v_h \in V_h.$$

The choice of the surrogate is obviously highly dependent on the problem at hand, see for example Sect. 3.2 for possible choices in the case of second-order elliptic diffusion operators.

Characterization in terms of a system matrix

We restrict our discussion to choices of surrogates that can be represented by an $m \times m$ system matrix $S_A$, often called the effective system matrix in the following. We assume that $S_A \in \mathbb{R}^{m \times m}$ is of the form $(S_A)_{ij} = \langle \mathcal{S}_A \lambda_j, \lambda_i\rangle$ for a basis $\lambda_1, \dots, \lambda_m$ of $V_h$. Note that the basis should be chosen as localized as possible in order for the resulting system matrix to be sparse. The process of operator compression can then be formalized by a compression operator

$$\mathcal{C} \colon \mathcal{A} \to \mathbb{R}^{m \times m}$$

that maps a given coefficient $A$ to the system matrix $S_A$ representing the compressed surrogate $\mathcal{S}_A$ of the operator $\mathcal{L}_A$. Once $\mathcal{C}$ has been evaluated for given $A \in \mathcal{A}$, the solution to (2.2) can be approximated with a function $u_h \in V_h$ for any right-hand side $f \in H^{-1}(D)$ by solving the linear system $S_A U = F$, where $F \in \mathbb{R}^m$ is the vector with entries $F_i := \langle \mathcal{M} f, \lambda_i\rangle$ and $U \in \mathbb{R}^m$ contains the coefficients of the basis representation $u_h = \sum_{i=1}^m U_i \lambda_i$.

Multi-query scenarios

For many classes of coefficients $\mathcal{A}$ and depending on the choice of the surrogate, evaluating $\mathcal{C}$ requires solving local auxiliary problems, during which the finest scale $\varepsilon$ has to be resolved at some point. While this is acceptable if one wants to compress only a few operators in an offline computation, it becomes a major problem once $\mathcal{C}$ has to be evaluated for many different coefficients $A$ in an online phase, as for example in certain inverse problems, in uncertainty quantification, or in the simulation of evolution equations with time-dependent coefficients. This motivates a data-driven offline–online approach, where the offline phase consists of training a neural network to approximate the compression operator $\mathcal{C}$, such that in the subsequent online phase the evaluation of $\mathcal{C}$ can be replaced with a simple forward pass through the network, thus eliminating the computational bottleneck.

System matrix decomposition

In principle, one could try to directly approximate the global operator $\mathcal{C}$ with a neural network. If the coefficient involves oscillations on some fine scale $\varepsilon$, this would lead to a network architecture with an input layer of size $\mathcal{O}(\varepsilon^{-d})$, an output layer of size $\mathcal{O}(m)$, and possibly hidden layers in between. Particularly for small $\varepsilon$, this leads to very large networks and thus requires a huge number of free parameters and, in turn, extraordinary amounts of training data and storage space in order to preserve good generalization capabilities.

To reduce the necessary size of the network, one can exploit available information on the compression operator $\mathcal{C}$ by means of a certain structure in the resulting effective matrices $S_A$. To this end, we think of $S_A$ as a matrix composed of multiple inflated sub-matrices, i.e.,

$$S_A = \sum_{j \in J} \Phi_j(S_{A,j}), \tag{2.4}$$

where $J$ denotes some given index set and $S_{A,j} \in \mathbb{R}^{s \times t}$, $s, t \leq m$, are (typically dense) local matrices of equal size. The functions

$$\Phi_j \colon \mathbb{R}^{s \times t} \to \mathbb{R}^{m \times m} \tag{2.5}$$

represent local-to-global mappings inspired by classical finite element assembly processes, as further explained below. More precisely, there exist index transformations $\pi_j$ and $\varphi_j$ such that

$$\Phi_j(S)[\pi_j(k), \varphi_j(l)] = S[k, l], \quad 1 \leq k \leq s,\ 1 \leq l \leq t,$$

where $M[i_r, i_c]$ denotes the entry of a matrix $M$ in the $i_r$th row and the $i_c$th column. Note that $\pi_j$ and $\varphi_j$ may also map to zero, indicating that the corresponding entry is disregarded. Let us also emphasize that the mappings $\Phi_j$ (and, in turn, the index transformations $\pi_j$ and $\varphi_j$) are completely independent of the coefficient $A$ and solely depend on the domain $D$ as well as the geometry of an allotted discretization. The precise definitions of the maps $\pi_j$ and $\varphi_j$ as well as the index set $J$ are usually determined in a canonical way by the choice of the computational mesh and the compression operator $\mathcal{C}$. In Sect. 3.4, we present an example of what such mappings may look like.
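To make the role of the index transformations concrete, the inflation $\Phi_j$ can be sketched in a few lines of Python; the matrix sizes and index maps below are toy choices for illustration only, not the ones arising from an actual discretization:

```python
import numpy as np

def inflate(S_local, pi, phi, m):
    # Phi_j: scatter a local s x t matrix into an m x m global matrix;
    # pi[k] / phi[l] give the global row/column index of local entry (k, l)
    S_global = np.zeros((m, m))
    s, t = S_local.shape
    for k in range(s):
        for l in range(t):
            S_global[pi[k], phi[l]] += S_local[k, l]
    return S_global

# two overlapping 2x2 local matrices summed into a 3x3 global matrix, cf. (2.4)
m = 3
S0 = np.ones((2, 2))
S1 = 2 * np.ones((2, 2))
S = inflate(S0, [0, 1], [0, 1], m) + inflate(S1, [1, 2], [1, 2], m)
```

The overlap of the two index patches produces the summed entry in the middle of the global matrix, mirroring the overlapping contributions of a finite element assembly.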

Depending on the compression operator $\mathcal{C}$ and the decomposition (2.4), we can expect that all the local matrices $S_{A,j}$ are created in a similar fashion and only depend on a local sub-sample of the coefficient. This can be understood as a generalization of the assembly process that underlies classical finite element system matrices: these matrices are composed of local system matrices that are computed on each element separately and only require knowledge of the coefficient on the respective element. In that context, the local sub-matrices all have a similar structure, and the mapping by the functions $\Phi_j$ leads to overlapping contributions on the global level. Going back to the abstract setting, we generalize these properties and assume the existence of a lower-dimensional reduced compression operator

$$\mathcal{C}_{\mathrm{red}} \colon \mathbb{R}^r \to \mathbb{R}^{s \times t}, \tag{2.6}$$

such that the contributions $S_{A,j}$ are of the form

$$S_{A,j} = (\mathcal{C}_{\mathrm{red}} \circ R_j)(A), \tag{2.7}$$

where the operators

$$R_j \colon \mathcal{A} \to \mathbb{R}^r \tag{2.8}$$

extract $r$ relevant features of a given global coefficient. In the context of, e.g., finite element matrices, the operators $R_j$ correspond to the restriction of a coefficient to an element-based piecewise constant approximation, and $\mathcal{C}_{\mathrm{red}}$ incorporates the computation of a local system matrix based on such a sub-sample of the coefficient. To achieve a uniform length $r$ of the output of the operators $R_j$, these operators may include artificially introduced zeros depending on the respective geometric configuration (e.g., at the boundary). An example for a quadrilateral mesh in two dimensions is shown in Fig. 1.

Figure 1. Illustration of the extended element neighborhood $N^1(T)$ around a corner element $T \in \mathcal{T}_h$. An asterisk indicates that $A_\varepsilon|_K \in [\alpha, \beta]$, a zero that $A_\varepsilon|_K = 0$ in the respective cell $K$ of the refined mesh $\mathcal{T}_\varepsilon$.
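The zero padding performed by the operators $R_j$ near the boundary can be sketched as follows; the coefficient values, the neighborhood radius, and the array layout are hypothetical choices for illustration:

```python
import numpy as np

def restrict(A, center, radius):
    # R_j: sub-sample a piecewise constant coefficient on the neighborhood
    # of a cell, zero-padded outside the domain so that every feature
    # vector has the same length r = (2 * radius + 1)**2
    n = A.shape[0]
    out = np.zeros((2 * radius + 1, 2 * radius + 1))
    i0, j0 = center
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            i, j = i0 + di, j0 + dj
            if 0 <= i < n and 0 <= j < n:
                out[di + radius, dj + radius] = A[i, j]
    return out.ravel()

A = np.arange(1.0, 17.0).reshape(4, 4)   # toy cell-wise coefficient values
features = restrict(A, (0, 0), 1)        # corner cell: padded with zeros
```

For the corner cell, the entries outside the domain come out as the artificial zeros mentioned above, exactly as indicated in Fig. 1.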

The problem of evaluating $\mathcal{C}$ can now be decomposed into multiple evaluations of the reduced operator $\mathcal{C}_{\mathrm{red}}$ that takes the local information $R_j(A)$ of $A$ and outputs a corresponding local matrix as described in (2.7). In our setting of a coefficient $A$ that is potentially unresolved by the target scale $h$, evaluating $\mathcal{C}_{\mathrm{red}}$ is nontrivial and might become a bottleneck in multi-query scenarios, as already indicated in Sect. 2.4. In such cases, we propose to approximate the operator $\mathcal{C}_{\mathrm{red}}$ with a deep neural network

$$\Psi(\cdot, \theta) \colon \mathbb{R}^r \to \mathbb{R}^{s \times t},$$

where $\theta \in \mathbb{R}^p$ is a set of $p$ trainable parameters of moderate size such that, for given $A \in \mathcal{A}$, the effective system matrix $\mathcal{C}(A) = S_A$ can be efficiently approximated by

$$\hat{S}_A := \sum_{j \in J} \Phi_j\bigl(\Psi(R_j(A), \theta)\bigr),$$

which requires just a single forward pass of the minibatch $(R_j(A))_{j \in J}$ through the network. Note that the approximation $\hat{S}_A$ possesses the same sparsity structure as the matrix $S_A$, since the neural network only yields approximations to the local sub-matrices $S_{A,j}$, whereas the assembly process which determines the sparsity structure of the global matrix is given by the mappings $\Phi_j$, which are independent of the network $\Psi$.
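The online assembly of $\hat{S}_A$ thus amounts to one network evaluation per local patch followed by the coefficient-independent scatter step. A minimal sketch, with a fixed toy function in place of a trained network $\Psi$ and hypothetical patch sizes:

```python
import numpy as np

r, s, t, m = 4, 2, 2, 3

def psi(features):
    # stand-in for the trained network Psi(., theta) with frozen toy weights
    W = np.ones((s * t, r))
    return (W @ features).reshape(s, t)

def assemble_surrogate(feature_batch, index_maps):
    # one forward pass per patch, then the coefficient-independent scatter Phi_j
    S_hat = np.zeros((m, m))
    for features, (pi, phi) in zip(feature_batch, index_maps):
        S_loc = psi(features)
        for k in range(s):
            for l in range(t):
                S_hat[pi[k], phi[l]] += S_loc[k, l]
    return S_hat

batch = [np.ones(r), np.ones(r)]                 # (R_j(A))_{j in J}
maps = [([0, 1], [0, 1]), ([1, 2], [1, 2])]      # index transformations
S_hat = assemble_surrogate(batch, maps)
```

In a real implementation the loop over patches would be replaced by a single batched forward pass, which is exactly what makes the online phase cheap.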

We emphasize that a decomposition of $S_A$ as described in (2.4)–(2.8) does not necessarily require a uniform operator $\mathcal{C}_{\mathrm{red}}$. If multiple reduced operators are required for such a decomposition, the idea of approximating them by one single neural network can still be applied. It is, however, necessary for the network's ability to generalize well beyond the training data that the reduced operators involve at least certain similarities.

Network training

In practice, the neural network $\Psi(\cdot, \theta)$ has to be trained in an offline phase from a set of training examples before it can be used to approximate the mapping $\mathcal{C}_{\mathrm{red}}$. We propose to draw $N$ global coefficients $(A^{(i)})_{i=1}^N$ from $\mathcal{A}$, extract the relevant information $A_j^{(i)} := R_j(A^{(i)})$ from them, and compress it into the corresponding effective matrices $S_{A,j}^{(i)}$ with $\mathcal{C}_{\mathrm{red}}$. This results in a total of $|J| \cdot N$ training samples for the neural network to train on, namely $(A_j^{(i)}, S_{A,j}^{(i)})$, $i = 1, \dots, N$, $j \in J$. In order to learn the parameters of the network, we then minimize the loss functional

$$\mathcal{J}(\theta) = \frac{1}{N|J|} \sum_{i=1}^N \sum_{j \in J} \frac{1}{2}\, \frac{\bigl\|\Psi(A_j^{(i)}, \theta) - S_{A,j}^{(i)}\bigr\|_{\mathbb{R}^{s \times t}}^2}{\bigl\|S_{A,j}^{(i)}\bigr\|_{\mathbb{R}^{s \times t}}^2} \tag{2.9}$$

over the parameter space $\mathbb{R}^p$ using iterative gradient-based optimization on minibatches of the training data. This can be implemented very efficiently within modern deep learning frameworks such as TensorFlow [1], PyTorch [55], or Flux [40], which allow for the automatic differentiation of the loss functional with respect to the network parameters.
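The relative loss (2.9) itself is straightforward to implement; a minimal NumPy version with hypothetical minibatch shapes reads:

```python
import numpy as np

def loss(pred_batch, target_batch):
    # relative squared Frobenius loss (2.9) averaged over a minibatch of
    # local matrices; both arrays have shape (batch, s, t)
    num = np.sum((pred_batch - target_batch) ** 2, axis=(1, 2))
    den = np.sum(target_batch ** 2, axis=(1, 2))
    return 0.5 * np.mean(num / den)

# toy batch: two 2x2 target matrices, predictions still at zero
target = np.ones((2, 2, 2))
pred = np.zeros((2, 2, 2))
J = loss(pred, target)
```

In a deep learning framework one would express the same formula with the framework's tensor operations so that gradients with respect to $\theta$ are obtained by automatic differentiation.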

Full algorithm

After having established all the conceptual pieces, we now put them together and return to the abstract variational problem (2.2) from the beginning of the section. Suppose that we want to solve (2.2) for a large number of given coefficients $A^{(i)}$, $i = 1, \dots, M$, and a given right-hand side $f \in H^{-1}(D)$. For ease of notation, we restrict ourselves to a single right-hand side, which is, however, not strictly required for our approach. The proposed procedure is summarized in Algorithm 1, divided into the offline and online stages of the method.

Algorithm 1. Operator compression with neural network
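The data flow of the online stage can be sketched end to end in a few lines; the stand-in `psi` below returns a fixed local matrix in place of a trained network, and the patch layout is a toy choice, so only the structure of the computation is illustrated:

```python
import numpy as np

m = 4  # toy number of degrees of freedom

def psi(features):
    # stand-in for the trained network: a fixed SPD 2x2 local matrix
    return np.array([[2.0, -1.0], [-1.0, 2.0]])

# online phase, step 1: forward passes and scatter onto overlapping patches
S = np.zeros((m, m))
for j in range(m - 1):
    S[j:j + 2, j:j + 2] += psi(None)

# online phase, step 2: solve the surrogate system S_A U = F
f = np.ones(m)
U = np.linalg.solve(S, f)
```

The expensive offline part (data generation with a numerical homogenization method and network training) is entirely absent from this loop, which is the point of the offline-online splitting.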

Application to elliptic homogenization

In this section, we specifically consider a family of prototypical elliptic diffusion operators as a demonstrating example of how to apply the abstract framework laid down in Sect. 2 in practice.

Setting

From now on, let the domain $D$ be polyhedral. We consider the family of linear second-order diffusion operators

$$\mathcal{L} := \bigl\{-\operatorname{div}(A \nabla \cdot) \colon H^1_0(D) \to H^{-1}(D) \;\big|\; A \in \mathcal{A}\bigr\},$$

parameterized by the following set of admissible coefficients, which possibly encode microstructures:

$$\mathcal{A} := \bigl\{A \in L^\infty(D) \;\big|\; \exists\, 0 < \alpha \leq \beta < \infty \colon \alpha \leq A(x) \leq \beta \text{ for almost all } x \in D\bigr\}. \tag{3.1}$$

For the sake of simplicity, we restrict ourselves to scalar coefficients here. Note, however, that the consideration of matrix-valued coefficients is not an issue from a numerical homogenization viewpoint either. We remark that the family of operators $\mathcal{L}$ fulfills the assumptions of locality and symmetry from the abstract framework. In this setting, the abstract problem (2.1) amounts to solving the following linear elliptic PDE with homogeneous Dirichlet boundary conditions:

$$\begin{cases} -\operatorname{div}(A \nabla u) = f & \text{in } D, \\ u = 0 & \text{on } \partial D, \end{cases}$$

which possesses a unique weak solution $u \in H^1_0(D)$ for every $f \in H^{-1}(D)$ and $A \in \mathcal{A}$. Using integration by parts on the divergence term, the corresponding counterpart to the weak formulation (2.2) can be written as: find $u \in H^1_0(D)$ such that

$$a_A(u, v) := \int_D A \nabla u \cdot \nabla v \, \mathrm{d}x = \langle f, v\rangle \quad \text{for all } v \in H^1_0(D). \tag{3.2}$$

Discretization and compression

Let now $\mathcal{T}_h$ be a Cartesian mesh with characteristic mesh size $h$, and denote by $Q_1(\mathcal{T}_h)$ the corresponding space of piecewise bilinear functions. We consider the conforming finite element space $V_h := Q_1(\mathcal{T}_h) \cap H^1_0(D)$ of dimension $m := \dim(V_h)$. Other types of meshes and finite element spaces could generally be employed as well, but we restrict ourselves to the above choice for the moment. As already mentioned in Sect. 2.2, if the mesh $\mathcal{T}_h$ does not resolve the fine-scale oscillations of $A$, approximating $u$ with a pure finite element ansatz of seeking $u_h \in V_h$ such that

$$a_A(u_h, v_h) = \langle f, v_h\rangle \quad \text{for all } v_h \in V_h$$

will not yield satisfactory results. In a setting where resolving $A$ with the mesh is computationally too demanding, we are therefore interested in suitable choices for a compression operator $\mathcal{C}$. In particular, we want $\mathcal{C}$ to produce effective system matrices on the target scale $h$ that can be used to obtain appropriate approximations on this scale. In the following, we briefly comment on possible choices for this operator that are based on the finite element space $V_h$.

Compression by analytical homogenization

The idea of analytical homogenization is to replace an oscillating $A$ with an appropriate homogenized coefficient $A_{\mathrm{hom}} \in L^\infty(D, \mathbb{R}^{d \times d})$. The mathematical theory of homogenization can treat very general nonperiodic coefficients in the framework of G- or H-convergence [14, 51, 64]. However, apart from being nonconstructive in many cases, homogenization in the classical analytical sense considers a sequence of operators $-\operatorname{div}(A_\varepsilon \nabla \cdot)$ indexed by $\varepsilon > 0$ and aims to characterize the limit as $\varepsilon$ tends to zero. In many realistic applications, such a sequence of models can hardly be identified or may not be available in the first place. Assuming that the necessary requirements on the coefficient $A$ are met, a homogenized coefficient $A_{\mathrm{hom}}$ exists and does not involve oscillations on a fine scale. The coefficient $A_{\mathrm{hom}}$ can then be used in combination with a classical finite element ansatz, since it no longer includes troublesome fine-scale quantities. In practice, the homogenized coefficient cannot be computed easily and needs to be approximated. This is, for instance, done with the heterogeneous multiscale method (HMM) [2, 15, 16], which in the end replaces $A$ with a computable approximation $A_h \in L^\infty(D, \mathbb{R}^{d \times d}_{\mathrm{sym}})$ of $A_{\mathrm{hom}}$ with $(A_h)|_T \in \mathbb{R}^{d \times d}_{\mathrm{sym}}$ for all $T \in \mathcal{T}_h$. With this piecewise constant approximation of $A$, we obtain a possible compression operator $\mathcal{C}$. Given an enumeration $1, \dots, m$ of the inner nodes of $\mathcal{T}_h$ and writing $\lambda_1, \dots, \lambda_m$ for the associated nodal basis of $V_h$, the compressed operator $\mathcal{C}(A)$ can be defined as

$$(\mathcal{C}(A))_{i,j} = (S_A)_{i,j} := \sum_{T \in \mathcal{T}_h} \int_T \bigl((A_h)|_T \nabla \lambda_j\bigr) \cdot \nabla \lambda_i \, \mathrm{d}x. \tag{3.3}$$

That is, one takes the classical finite element stiffness matrix corresponding to the homogenized coefficient Ahom as an effective system matrix. In this case, decomposition (2.4) corresponds to a partition into element-wise stiffness matrices (with constant coefficient, respectively) that are merged with a simple finite element assembly routine.
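In a 1D analogue of this assembly with piecewise linear elements, the stiffness matrix (3.3) for an element-wise constant coefficient $A_h$ can be sketched as follows; the mesh size and coefficient values are arbitrary illustration data:

```python
import numpy as np

n = 4                                   # elements on (0, 1)
h = 1.0 / n
A_h = np.array([1.0, 4.0, 2.0, 1.0])    # toy element-wise constant coefficient

S = np.zeros((n + 1, n + 1))
for T in range(n):
    # local P1 stiffness matrix with constant coefficient (A_h)|_T
    K_loc = (A_h[T] / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    S[T:T + 2, T:T + 2] += K_loc
S = S[1:-1, 1:-1]                       # homogeneous Dirichlet: keep inner nodes
```

This is exactly the element-wise decomposition (2.4) in its simplest form: each local matrix depends only on the coefficient restricted to one element, and the local-to-global maps are plain index shifts.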

We emphasize that approaches based on analytical homogenization – such as (3.3) – are able to provide reasonable approximations on the target scale h but are subject to structural assumptions, in particular scale separation and local periodicity. The goal to overcome these restrictions has led to a new class of numerical methods that are specifically tailored to treating general coefficients with minimal assumptions. These methods are known as numerical homogenization approaches and typically only require a boundedness condition as in (3.1).

Compression by numerical homogenization

The general idea of numerical homogenization methods is to replace the trial space $V_h$ with a suitable multiscale space $\tilde{V}_h$; see for instance the references [5, 7, 21, 38, 46, 52, 53]. One possible construction uses a one-to-one correspondence between $\tilde{V}_h$ and the space $V_h$, which implies that the two spaces possess the same number of degrees of freedom. Typically, the multiscale space is chosen in a problem-adapted way. We indicate this dependence by defining the new space $\tilde{V}_h := P_A V_h$, where the operator $P_A \colon V_h \to H^1_0(D)$ particularly depends on $A$. Therefore, another possible choice of the operator $\mathcal{C}$ leads to the effective matrix $\mathcal{C}(A)$ given by

$$(\mathcal{C}(A))_{i,j} = (S_A)_{i,j} := a_A(P_A \lambda_j, \lambda_i). \tag{3.4}$$

A prominent example of such an approach – and, thus, of the operator $\mathcal{C}$ – is the Petrov–Galerkin version of the localized orthogonal decomposition (LOD) method, which explicitly constructs a suitable operator $P_A$. The LOD was introduced in [46] and works, both theoretically and practically, for very general coefficients. It has also been successfully applied to other problem classes, for instance, wave propagation problems in the context of Helmholtz and Maxwell equations [26, 27, 45, 57, 61], the wave equation [4, 29, 44, 59], eigenvalue problems [47, 48], and time-dependent nonlinear Schrödinger equations [37]. However, it requires a slight deviation from locality. That is, while the classical finite element method and the HMM result in a system matrix that only includes neighbor-to-neighbor communication between the degrees of freedom, the multiscale approach (3.4) moderately increases this communication to effectively incorporate the fine-scale information in $A$ for a broader range of coefficients, which is a common property of modern homogenization techniques. As indicated in [12], this slightly increased communication indeed seems to be necessary to handle very general coefficients.

Since we consider a class $\mathcal{A}$ of arbitrarily rough coefficients, the compression operator (3.4) corresponding to the operator $P_A$ constructed in the LOD method is a suitable choice for our discussion as well as for the numerical experiments of Sect. 4. In the following subsection, we therefore take a closer look at its construction and summarize some main results. Note that we restrict ourselves to an elliptic model problem with homogeneous Dirichlet boundary conditions, but the compression approach can generally be extended to more involved settings such as the ones mentioned above.

Localized orthogonal decomposition

The method is based on a projective quasi-interpolation operator $I_h \colon H^1_0(D) \to V_h$ with the following approximation and stability properties: for an element $T \in \mathcal{T}_h$, we require that

$$h^{-1} \|v - I_h v\|_{L^2(T)} + \|\nabla I_h v\|_{L^2(T)} \leq C \|\nabla v\|_{L^2(N(T))}$$

for all $v \in H^1_0(D)$, where the constant $C$ is independent of $h$, and $N(S) := N^1(S)$ is the neighborhood (of order 1) of $S \subseteq D$ defined by

$$N^1(S) := \bigcup \bigl\{K \in \mathcal{T}_h \;\big|\; \overline{S} \cap \overline{K} \neq \emptyset\bigr\}.$$

For a particular choice of Ih, we refer to [24].

For given $I_h$ with the above properties, we can define the so-called fine-scale space $W$, which contains all functions that are not well captured by the finite element functions in $V_h$. It is defined as the kernel of $I_h$ restricted to $H^1_0(D)$, i.e.,

$$W := \ker I_h|_{H^1_0(D)},$$

and its local version, for any $S \subseteq D$, is given by

$$W(S) := \bigl\{w \in W \;\big|\; \operatorname{supp}(w) \subseteq S\bigr\}.$$

In order to incorporate fine-scale information contained in the coefficient $A$, the idea is now to compute coefficient-dependent local corrections of functions $v_h \in V_h$. To this end, we define the neighborhood of order $\ell$ iteratively by $N^\ell(S) := N(N^{\ell-1}(S))$, $\ell \geq 2$. For any function $v_h \in V_h$, its element corrector $Q_{A,T}^\ell v_h \in W(N^\ell(T))$, $T \in \mathcal{T}_h$, is defined by

$$a_A(Q_{A,T}^\ell v_h, w) = \int_T A \nabla v_h \cdot \nabla w \, \mathrm{d}x \quad \text{for all } w \in W(N^\ell(T)). \tag{3.5}$$

Note that in an implementation, the element correctors $Q_{A,T}^\ell$ have to be computed on a sufficiently fine mesh that resolves the oscillations of the coefficient $A$. Since the algebraic realization of the correctors and guidelines for an efficient implementation of the LOD method in general are beyond the scope of this article, we refer to [23] for details. We emphasize that, by construction, the supports of the correctors $Q_{A,T}^\ell v_h$ are limited to $N^\ell(T)$. The global correction $Q_A^\ell \colon V_h \to W$ then consists of a summation of these local contributions and is given by

$$Q_A^\ell := \sum_{T \in \mathcal{T}_h} Q_{A,T}^\ell.$$

Note that the choice $\ell = \infty$ corresponds to a computation of the element correctors on the entire domain $D$ and leads to the orthogonality property

$$a_A\bigl((1 - Q_A^\infty) v_h, w\bigr) = 0 \quad \text{for all } w \in W, \tag{3.6}$$

which defines an $a_A$-orthogonal splitting $H^1_0(D) = (1 - Q_A^\infty) V_h \oplus W$. This particularly explains the name orthogonal decomposition. The use of localized element corrections is motivated by the decay of the corrections $Q_{A,T}^\ell$ away from the element $T$. This is, for instance, shown in [36, 56] (based on [46]) and reads

$$\bigl\|\nabla (Q_A^\infty - Q_A^\ell) v_h\bigr\|_{L^2(D)} \leq C e^{-c_{\mathrm{dec}} \ell} \|\nabla v_h\|_{L^2(D)}$$

with a constant $c_{\mathrm{dec}}$ which is independent of $h$ and $\ell$.

Motivated by the decomposition (3.6) and the localized approximations in (3.5), we choose $P_A := 1 - Q_A^\ell$ in (3.4). The space $\tilde{V}_h := P_A V_h = (1 - Q_A^\ell) V_h$, which has the same number of degrees of freedom as $V_h$, can then be used as an ansatz space for the discretization of (3.2). Note that the original LOD method introduced in [46] considers a discretization where $\tilde{V}_h$ is also used as test space. We, however, consider the Petrov–Galerkin variant of the method as analyzed in [22], which instead uses the classical finite element space $V_h$ as test space, i.e., we seek $u_h \in V_h$ such that

$$a_A\bigl((1 - Q_A^\ell) u_h, v_h\bigr) = \langle f_h, v_h\rangle \quad \text{for all } v_h \in V_h, \tag{3.7}$$

where $f_h = \mathcal{M} f \in V_h$ is again a suitable approximation of $f \in H^{-1}(D)$. This defines a compressed operator $\mathcal{S}_A$ as in (2.3) that maps $u_h \in V_h$ to $f_h \in V_h$. As it turns out, the Petrov–Galerkin formulation has some computational advantages over the classical method, in particular in terms of memory requirements. For details, we again refer to [23]. The theory in [22] shows that the approximation $u_h$ defined in (3.7) is first-order accurate in $L^2(D)$ provided that $\ell \gtrsim |\log h|$ and, additionally, $f \in L^2(D)$. More precisely, it holds that

$$\bigl\|(-\operatorname{div}(A \nabla \cdot))^{-1} - \mathcal{S}_A^{-1} \mathcal{M}\bigr\|_{L^2(D) \to L^2(D)} = \sup_{f \in L^2(D)} \frac{\bigl\|(-\operatorname{div}(A \nabla \cdot))^{-1} f - \mathcal{S}_A^{-1}(\mathcal{M} f)\bigr\|_{L^2(D)}}{\|f\|_{L^2(D)}} \lesssim h + e^{-c_{\mathrm{dec}} \ell},$$

where $\mathcal{M} f$ denotes the $L^2$-projection of $f$ onto $V_h$. Note that the methodology can actually be applied to more general settings beyond the elliptic case; see for instance [49] for an overview.

System matrix surrogate

We now return to the discussion of the compression operator $\mathcal{C}$ introduced in (3.4) that maps coefficients $A \in \mathcal{A}$ to the effective system matrices

$$(S_A)_{i,j} := a_A\bigl((1 - Q_A^\ell) \lambda_j, \lambda_i\bigr)$$

obtained from the Petrov–Galerkin LOD method. Once $S_A$ has been computed for a given $A$, an approximation $u_h = \sum_{j=1}^m U_j \lambda_j$ can be computed by solving the following linear system for the coefficient vector $U = (U_1, \dots, U_m)^T$:

$$S_A U = F,$$

where $F := (\langle \mathcal{M} f, \lambda_1\rangle, \dots, \langle \mathcal{M} f, \lambda_m\rangle)^T$. Since $u_h$ is equivalently characterized as the solution of (3.7), it captures the effective behavior of the solution to the continuous problem (3.2) on the target scale $h$, as discussed in the previous subsection.

The remainder of this section is dedicated to showing how the abstract decomposition (2.4) translates to the present LOD setting and how it can be implemented in practice. Writing $\mathcal{N}(S)$ for the set of inner mesh nodes on some subdomain $S \subseteq D$ and denoting $N_S = |\mathcal{N}(S)|$, the effective system matrix $S_A$ can be decomposed as

\[
S_A = \sum_{T\in\mathcal{T}_h} \Phi_T(S_{A,T}), \tag{3.8}
\]

where the matrices $S_{A,T}$ are local system matrices of the form

\[
(S_{A,T})_{i,j} = \int_{\mathsf N^\ell(T)} A\,\nabla\bigl((1-\mathcal{Q}_{A,T})\lambda_j\bigr)\cdot\nabla\lambda_i \,\mathrm{d}x, \qquad j \in \mathcal{N}(T),\; i \in \mathcal{N}(\mathsf N^\ell(T)), \tag{3.9}
\]

i.e., they correspond to the interaction of the localized ansatz functions $(1-\mathcal{Q}_{A,T})\lambda_j$ associated with the nodes of the element $T$ with the classical first-order nodal basis functions whose supports overlap with the element neighborhood $\mathsf N^\ell(T)$. This means that $S_{A,T}$ is an $N_{\mathsf N^\ell(T)} \times N_T$ matrix. In practice, the coefficient $A$ in (3.9) is often replaced with an element-wise constant approximation $A_\varepsilon$ on a finer mesh $\mathcal{T}_\varepsilon$ that resolves all the oscillations of $A$ and that we assume to be a uniform refinement of $\mathcal{T}_h$.

As already explained in the abstract framework, the mappings $\Phi_T$ in (3.8) are local-to-global mappings that assemble the contributions $S_{A,T}$ on an element neighborhood into a global matrix. In particular, given an enumeration $1,\dots,N_{\mathsf N^\ell(T)}$ of the nodes in $\mathcal{N}(\mathsf N^\ell(T))$, one considers a mapping $g_T(\cdot)$ that assigns to a given node index $i$ in the element neighborhood $\mathsf N^\ell(T)$ its global node index $g_T(i) \in \{1,\dots,m\}$. This mapping can be represented by an $m \times N_{\mathsf N^\ell(T)}$ sparse matrix with entries

\[
\pi_T[i,j] = \begin{cases} 1, & \text{if } i = g_T(j),\\ 0, & \text{otherwise}. \end{cases}
\]

Analogously, given an enumeration $1,\dots,N_T$ of the nodes in $\mathcal{N}(T)$, there exists a mapping $\tilde g_T(\cdot)$ – represented by an $m \times N_T$ matrix – that assigns to a given node in $\mathcal{N}(T)$ with index $i$ its global representative with index $\tilde g_T(i)$. The corresponding matrix is given by

\[
\varphi_T[i,j] = \begin{cases} 1, & \text{if } i = \tilde g_T(j),\\ 0, & \text{otherwise}. \end{cases}
\]

Using these matrices, decomposition (3.8) reads

\[
S_A = \sum_{T\in\mathcal{T}_h} \pi_T\, S_{A,T}\, \varphi_T^{\mathsf T}, \tag{3.10}
\]

where $\varphi_T^{\mathsf T}$ denotes the transpose of the matrix $\varphi_T$.
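For illustration, the assembly in (3.10) can be sketched in a few lines of NumPy/SciPy, realizing $\pi_T$ and $\varphi_T$ as sparse 0/1 index matrices. This is a minimal sketch under the assumption of a generic node enumeration; the function names (`index_matrix`, `assemble_global`) are ours and not part of the method's reference implementation.

```python
import numpy as np
from scipy.sparse import coo_matrix

def index_matrix(global_indices, m):
    """Sparse 0/1 matrix P of size m x n with P[g, j] = 1 iff g == global_indices[j],
    i.e. the matrix representation of a local-to-global node map."""
    n = len(global_indices)
    return coo_matrix(
        (np.ones(n), (np.asarray(global_indices), np.arange(n))),
        shape=(m, n),
    ).tocsr()

def assemble_global(local_mats, row_maps, col_maps, m):
    """Sum pi_T @ S_{A,T} @ phi_T^T over all elements T, cf. (3.10)."""
    S = coo_matrix((m, m))
    for S_T, rows, cols in zip(local_mats, row_maps, col_maps):
        pi = index_matrix(rows, m)   # test (patch) nodes -> global nodes
        phi = index_matrix(cols, m)  # ansatz (element) nodes -> global nodes
        S = S + pi @ coo_matrix(S_T) @ phi.T
    return S.tocsr()
```

Overlapping contributions from neighboring elements are automatically summed by the sparse matrix addition, exactly as prescribed by the sum over $T \in \mathcal{T}_h$.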

From the definition of the local contributions $S_{A,T}$ introduced in (3.9), it directly follows that $S_{A,T}$ depends only on the restriction of $A$ (respectively $A_\varepsilon$) to the element neighborhood $\mathsf N^\ell(T)$. Let now $\mathcal{T}_\varepsilon(\mathsf N^\ell(T))$ be the restriction of the mesh $\mathcal{T}_\varepsilon$ to $\mathsf N^\ell(T)$, consisting of $r = |\mathcal{T}_\varepsilon(\mathsf N^\ell(T))|$ elements. Enumerating the elements then leads to the following operators that correspond to the abstract reduction operators in (2.8):

\[
R_T\colon \mathcal{A} \to \mathbb{R}^r \tag{3.11}
\]

that map a global coefficient $A$ to a vector that contains the values of $A_\varepsilon$ in the respective cells of $\mathcal{T}_\varepsilon(\mathsf N^\ell(T))$.
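On a uniform quadrilateral mesh, such a reduction operator amounts to slicing a window out of the array of fine-scale cell values and flattening it. The following sketch assumes the fine-scale coefficient is stored as a 2D array of cell values; the function name `reduce_to_patch` and the argument convention are ours, for illustration only.

```python
import numpy as np

def reduce_to_patch(A_fine, coarse_ij, ell, ratio):
    """R_T for a uniform 2D grid: values of the fine-scale coefficient
    on the (2*ell+1)^2 coarse-element neighborhood of coarse element
    (i, j), flattened to a vector.

    A_fine : (n, n) array of cell values of A_eps on the fine mesh T_eps
    ratio  : fine cells per coarse cell per dimension (in the paper 2^3 = 8)
    """
    i, j = coarse_ij
    i0, j0 = (i - ell) * ratio, (j - ell) * ratio
    size = (2 * ell + 1) * ratio
    patch = A_fine[i0:i0 + size, j0:j0 + size]
    return patch.ravel()
```

With the paper's values $\ell = 2$ and a refinement ratio of $2^3 = 8$, each patch contains $(2\cdot 2+1)^2 \cdot 64 = 1600$ fine cells, matching the input dimension $r = 1600$ used later.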

As already mentioned in the abstract section above, we aim for a uniform output size of the operators $R_T$, since their outputs will later on be fed into a neural network with a fixed number of input neurons. In order to achieve this, we artificially extend the domain $D$ and the mesh $\mathcal{T}_h$ by layers of outer elements around the boundary elements of $\mathcal{T}_h$, thus ensuring that the element neighborhood $\mathsf N^\ell(T)$ always consists of the same number of elements, regardless of the location of the central element $T \in \mathcal{T}_h$ relative to the boundary. Further, we extend the piecewise constant coefficient $A_\varepsilon$ by zero on those outer elements. Figure 1 illustrates this procedure for the case $d=2$ and $\ell=1$ for an element $T$ that lies in a corner of the computational domain. In this figure, $\mathcal{T}_h$ is a uniform quadrilateral mesh on the domain $D$ and $\mathcal{T}_\varepsilon$ is obtained from $\mathcal{T}_h$ by a single uniform refinement step. The asterisks indicate cells where the coefficient $A_\varepsilon$ takes a regular value in the interval $[\alpha,\beta]$, whereas in the cells outside of $D$, we set $A_\varepsilon$ to zero.

Note that this enlargement of the mesh $\mathcal{T}_h$ to obtain equally sized element neighborhoods $\mathsf N^\ell(T)$ also introduces artificial mesh nodes that lie outside of $D$ and that are all formally considered as inner nodes for the definition of $N_S = |\mathcal{N}(S)|$ with a subset $S$ of the extended domain. This implies that the local system matrices $S_{A,T}$ of dimension $N_{\mathsf N^\ell(T)} \times N_T$ introduced in (3.9) are all of equal size as well, and the rows of $S_{A,T}$ corresponding to test functions associated with nodes that are attached to outer elements contain only zeros. During the assembly of the local contributions into a global matrix, these zero rows are disregarded (which is also consistent with our definition of the matrices $\pi_T$, $\varphi_T$).
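For a uniform 2D grid, the zero extension described above is a plain padding operation on the array of fine-scale cell values. The following sketch (our own helper, assuming `ell` layers of coarse elements, i.e. `ell * ratio` fine cells, of padding on each side) illustrates it:

```python
import numpy as np

def extend_by_zero(A_fine, ell, ratio):
    """Pad the fine-scale coefficient array with ell layers of coarse
    elements (ell * ratio fine cells) of zeros on every side, so that
    every coarse element has a full (2*ell+1)^2 neighborhood."""
    pad = ell * ratio
    return np.pad(A_fine, pad, mode="constant", constant_values=0.0)
```

After this extension, the windows extracted by the reduction operators all have the same size, and patches touching the boundary of $D$ simply contain zeros in the artificial outer cells.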

Finally, in order to unify the computation of local contributions, we use an abstract mapping $C_{\mathrm{red}}$ with fixed input dimension $r$ and fixed output dimension $N_{\mathsf N^\ell(T)} \times 2^d$ as proposed for the abstract framework in Sect. 2.5. The mapping takes the restriction of $A_\varepsilon$ to an element neighborhood $\mathsf N^\ell(T)$ as input data and outputs the corresponding approximation of a local effective matrix $S_{A,T}$, which will be determined by an underlying neural network $\Psi(\cdot\,;\theta)$.

Numerical experiments

In this section, we demonstrate the feasibility of our proposed approach by performing numerical experiments in the setting of Sect. 3. For all experiments, we consider the two-dimensional computational domain $D = (0,1)^2$, which we discretize with a uniform quadrilateral mesh $\mathcal{T}_h$ of characteristic mesh size $h = 2^{-5}$. The coefficients are allowed to vary on the finer unresolved scale $\varepsilon = 2^{-8}$.

Coefficient family

In order to test our method's ability to deal with coefficients that oscillate across multiple scales, we introduce a hierarchy of meshes $\mathcal{T}_k$, $k = 0,1,\dots,8$, where the initial mesh $\mathcal{T}_0$ consists of only a single element and the subsequent meshes are obtained by uniform refinement, i.e., $\mathcal{T}_k$ is obtained from $\mathcal{T}_{k-1}$ by subdividing each element of $\mathcal{T}_{k-1}$ into four equally sized elements. This implies that the characteristic mesh size of $\mathcal{T}_k$ is given by $2^{-k}$. In the following, we refer to the parameter $k$ as the mesh level. In particular, the computational mesh $\mathcal{T}_h = \mathcal{T}_5$ corresponds to mesh level 5, whereas the coefficients may vary on mesh level 8 and are therefore only resolved by the finest mesh $\mathcal{T}_8$. We thus have a scenario where an information gap of 3 mesh levels has to be bridged. Based on the mesh hierarchy, we now define the coefficient family $\mathcal{A}$ of interest. Let $\mathcal{A}_k$ denote the set of element-wise constant coefficients on $\mathcal{T}_k$ whose values in the respective cells are iid uniformly distributed on the interval $[\alpha,\beta] := [1,5]$, i.e.,

\[
\mathcal{A}_k := \bigl\{ A \in Q^0(\mathcal{T}_k) \;\big|\; A|_T \overset{\mathrm{iid}}{\sim} U([1,5]) \text{ for all } T \in \mathcal{T}_k \bigr\}.
\]

Furthermore, let $\mathcal{A}_{\mathrm{ms}}$ denote the set of coefficients of the form

\[
A = \frac{1}{9}\sum_{k=0}^{8} A_k, \qquad A_k \in \mathcal{A}_k.
\]

These multiscale coefficients are especially interesting, since they vary on all considered scales simultaneously and are therefore, due to their unstructured nature, very hard to treat with classical homogenization techniques. The total set of interest $\mathcal{A}$ is then defined as

\[
\mathcal{A} := \bigcup_{k=0}^{8} \mathcal{A}_k \cup \mathcal{A}_{\mathrm{ms}}.
\]

In the following, we frequently index coefficients sampled from $\mathcal{A}$ by their corresponding level, i.e., we write $A_k$, $k \in \{0,\dots,8,\mathrm{ms}\}$, instead of a plain $A$.
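Sampling from this coefficient family is straightforward to sketch in NumPy: draw iid cell values on the level-$k$ mesh and prolongate them to the finest mesh by repetition. This is a sketch under our own conventions (function names `sample_A_k`, `sample_A_ms` and the fixed seed are hypothetical, not from the paper's code).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_A_k(k, alpha=1.0, beta=5.0, kmax=8):
    """Draw A in A_k: iid U[alpha, beta] cell values on the 2^k x 2^k mesh
    T_k, prolongated (by repetition of cell values) to the finest mesh T_kmax."""
    coarse = rng.uniform(alpha, beta, size=(2**k, 2**k))
    rep = 2**(kmax - k)
    return np.kron(coarse, np.ones((rep, rep)))

def sample_A_ms(kmax=8):
    """Draw a multiscale coefficient A = (1/9) * sum_{k=0}^{8} A_k."""
    return sum(sample_A_k(k, kmax=kmax) for k in range(kmax + 1)) / (kmax + 1)
```

Since each summand takes values in $[1,5]$, so does their average, so the multiscale coefficients remain uniformly elliptic and bounded with the same constants $\alpha = 1$, $\beta = 5$.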

Data generation and preprocessing

In order to train the network, we sample 500 coefficients $A_k^{(i)}$, $i = 1,\dots,500$, from each $\mathcal{A}_k$, $k \in \{0,\dots,8,\mathrm{ms}\}$, where the individual samples $A_k^{(i)}$ on the coarser mesh levels $k = 0,\dots,7$ are prolongated to the finest mesh $\mathcal{T}_8$ in order to achieve a uniform dimension across all scales. The set of all sampled coefficients is subsequently divided into a training, a validation, and a test set according to an 80–10–10 split. In order to achieve an identical distribution in all three sets, the splitting is performed separately on every level (including ms), i.e., for every $k \in \{0,\dots,8,\mathrm{ms}\}$, the first 400 coefficients $A_k^{(i)}$, $i = 1,\dots,400$, are assigned to the training set $\mathcal{D}_{\mathrm{train}}$, the samples with indices $401,\dots,450$ to the validation set $\mathcal{D}_{\mathrm{val}}$, and those with indices $451,\dots,500$ to the test set $\mathcal{D}_{\mathrm{test}}$. Then, we split each sample $A_k^{(i)}$ individually, using the reduction operators $R_T$ introduced in (3.11), into sub-samples $A_{k,T}^{(i)}$ based on element neighborhoods $\mathsf N^\ell(T)$ for $\ell = 2$ that are centered around the elements $T \in \mathcal{T}_h$, also taking into account the artificial extension of the element neighborhoods around the boundary of $D$. Since our target scale of interest is $h = 2^{-5}$ and $\mathcal{T}_h$ is a uniform quadrilateral mesh, this yields 1024 sub-samples $A_{k,T}^{(i)} \in \mathbb{R}^{1600}$ per sample $A_k^{(i)} \in \mathbb{R}^{65{,}536}$. Note that the size of the sub-samples is determined by the construction of the local neighborhoods $\mathsf N^\ell(T)$: each neighborhood consists of $(2\ell+1)^2 = 25$ elements of $\mathcal{T}_h = \mathcal{T}_5$, which corresponds to $64 \cdot 25 = 1600$ elements of the mesh $\mathcal{T}_\varepsilon = \mathcal{T}_8$.
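The per-level 80–10–10 split described above can be sketched as follows (a minimal helper under our own naming; in the actual pipeline, each retained coefficient would additionally be split into its 1024 patch/label pairs):

```python
def split_by_level(samples_per_level, n_train=400, n_val=50):
    """80-10-10 split performed separately on every level so that the
    training, validation, and test sets share the same distribution
    over levels.

    samples_per_level: dict mapping level (0..8 or 'ms') to its list
    of 500 sampled coefficients, in fixed order.
    """
    train, val, test = [], [], []
    for level, samples in samples_per_level.items():
        train += samples[:n_train]
        val += samples[n_train:n_train + n_val]
        test += samples[n_train + n_val:]
    return train, val, test
```

With 10 levels of 500 samples each, this yields 4000 training coefficients, which after patch extraction gives the $4000 \cdot 1024 = 4{,}096{,}000$ training pairs reported below.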

The corresponding “labels”, i.e., the local effective system matrices $S_{A,k,T}^{(i)} \in \mathbb{R}^{36\times 4}$, are then computed with the Petrov–Galerkin LOD according to (3.9) and flattened column-wise to vectors in $\mathbb{R}^{144}$. In total, we obtain $10 \cdot 400 \cdot 1024 = 4{,}096{,}000$ pairs $(A_{k,T}^{(i)}, S_{A,k,T}^{(i)}) \in \mathcal{D}_{\mathrm{train}}$ to train our network with, and 512,000 pairs in each of $\mathcal{D}_{\mathrm{val}}$ and $\mathcal{D}_{\mathrm{test}}$.

Network architecture and training

Given the above dataset, we now aim to fit it with a suitable neural network $\Psi$. As network architecture, we consider a dense feedforward network with a total of eight layers, including the input and output layers. As activation function, we choose the standard ReLU activation $\rho(x) := \max(0,x)$ in the first seven layers and the identity function in the last layer. By convention, the activation function acts component-wise on vectors. The network output is thus of the form

\[
\Psi(x) = W^{(8)}\rho\bigl(W^{(7)}\bigl(\cdots\rho\bigl(W^{(2)}\rho(W^{(1)}x + b^{(1)}) + b^{(2)}\bigr)\cdots\bigr) + b^{(7)}\bigr) + b^{(8)}, \tag{4.1}
\]

where the weight matrices and bias vectors have the following dimensions:

\[
\begin{aligned}
&W^{(1)} \in \mathbb{R}^{1600\times 1600}, && W^{(2)} \in \mathbb{R}^{800\times 1600}, && W^{(3)} \in \mathbb{R}^{800\times 800}, && W^{(4)} \in \mathbb{R}^{400\times 800},\\
&W^{(5)} \in \mathbb{R}^{400\times 400}, && W^{(6)} \in \mathbb{R}^{144\times 400}, && W^{(7)} \in \mathbb{R}^{144\times 144}, && W^{(8)} \in \mathbb{R}^{144\times 144},\\
&b^{(1)} \in \mathbb{R}^{1600}, && b^{(2)} \in \mathbb{R}^{800}, && b^{(3)} \in \mathbb{R}^{800}, && b^{(4)} \in \mathbb{R}^{400},\\
&b^{(5)} \in \mathbb{R}^{400}, && b^{(6)} \in \mathbb{R}^{144}, && b^{(7)} \in \mathbb{R}^{144}, && b^{(8)} \in \mathbb{R}^{144},
\end{aligned}
\]

yielding a total of 5,063,504 trainable parameters. The idea behind this architecture is that in the first six layers, information about the coefficient in the input element neighborhood is gathered and combined by allowing communication between all inputs in the layers with odd indices, whereas in the layers with even indices, this information is repeatedly compressed. That is, every other layer is built such that its input and output dimensions are equal. If the neurons in such a layer are understood as some sort of degrees of freedom in a mesh, this corresponds to communication among all of these degrees of freedom, while the layers in between reduce their number, which can be interpreted as transferring information to a coarser mesh. In the last two layers, this compressed information is assembled into the local effective system matrix. Note that this logarithmic dependence of the number of layers on the number of scales that need to be bridged by the network (two layers per mesh level to be bridged plus two layers to assemble the local effective matrix) yielded the most reliable results in our experiments. Shallower networks had difficulties fitting the complex training set consisting of coefficients varying on different scales, whereas deeper networks were more prone to overfitting the training set. More involved architectures, for example ones that include skip connections between layers as in the classic ResNet [34], are also conceivable; however, this does not seem to be necessary to obtain good results. The key message here is that the coefficient-to-surrogate map can be approximated satisfactorily by a simple feedforward architecture whose size depends only on the scales $\varepsilon$ and $h$, but not on any finer discretization scales. The implementation of the network as well as the training is performed using the library Flux [40] for the open-source scientific computing language Julia [10].
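The architecture (4.1) is easy to reproduce; the following NumPy sketch mirrors the layer widths and the Glorot uniform initialization (the actual experiments use Flux/Julia, so this Python version is for illustration only and not the paper's implementation):

```python
import numpy as np

# Layer widths of the dense feedforward network (4.1): the input patch of
# 1600 coefficient values is alternately mixed (equal in/out width) and
# compressed down to the 144 = 36*4 entries of the local system matrix.
WIDTHS = [1600, 1600, 800, 800, 400, 400, 144, 144, 144]

def init_params(rng):
    """Glorot uniform weights and zero biases for all eight layers."""
    params = []
    for n_in, n_out in zip(WIDTHS[:-1], WIDTHS[1:]):
        limit = np.sqrt(6.0 / (n_in + n_out))
        W = rng.uniform(-limit, limit, size=(n_out, n_in))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(params, x):
    """ReLU in the first seven layers, identity in the last, cf. (4.1)."""
    for W, b in params[:-1]:
        x = np.maximum(W @ x + b, 0.0)
    W, b = params[-1]
    return W @ x + b
```

Summing the weight and bias sizes over all eight layers reproduces the stated total of 5,063,504 trainable parameters.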

After initializing all parameters of the network according to a Glorot uniform distribution [32], network (4.1) is trained on minibatches of 1000 samples for a total of 20 epochs on $\mathcal{D}_{\mathrm{train}}$, using the ADAM optimizer [42] with a step size of $10^{-4}$ for the first 5 epochs before reducing it to $10^{-5}$ for the subsequent 15 epochs. It could be observed that further training led to a stagnation of the validation error, whereas the error on the training set continued to decrease (very slowly but gradually), indicating overfitting of the network. The development of the loss functional $J$ defined in (2.9) over the epochs is shown in Fig. 2. Note that the training and validation losses stay very close to each other during the whole training process, since $\mathcal{D}_{\mathrm{train}}$ and $\mathcal{D}_{\mathrm{val}}$ have the same sample distribution due to our chosen splitting procedure. The development of the loss during training and an average loss of $7.78 \cdot 10^{-5}$ on the test set $\mathcal{D}_{\mathrm{test}}$ indicate that the network has at least learned to approximate the local effective system matrices. In applications, however, we are mostly concerned with how well this translates to the global level, when computing solutions to problem (3.2) using a global system matrix assembled from network outputs. In order to investigate this question, the next three subsections are dedicated to evaluating the performance of the trained network at exactly this task for several coefficients unseen during training. For a given right-hand side $f$ and coefficient $A$, we denote by $u_h$ the solution of (3.7), obtained with the Petrov–Galerkin LOD matrix $S_A$ defined in (3.10), and by $\hat u_h$ the solution obtained by using the neural network approximation of this matrix, i.e.,

\[
\hat S_A := \sum_{T\in\mathcal{T}_h} \pi_T\, \Psi(R_T(A))\, \varphi_T^{\mathsf T}. \tag{4.2}
\]

Figure 2: Development of the loss functional $J$ during 20 epochs of network training

The spectral norm difference $\|S_A - \hat S_A\|_2$, the $L^2$-error $\|u_h - \hat u_h\|_{L^2(D)}$, as well as the visual discrepancy between the two solutions are then considered as measures of the network's global performance. We emphasize once more that computing approximate surrogates via (4.2) is significantly faster than via (3.8) and (3.9). This is due to the fact that no corrector problems of the form (3.5) have to be solved to obtain the surrogate model. As pointed out, the solution of these local problems requires inversion on a very fine discretization scale that is significantly smaller than the scale $\varepsilon$ on which the coefficient varies. In order to compute the system matrix $S_A$, one has to solve $N_{\mathcal{T}_h}$ fine-scale linear systems, where $N_{\mathcal{T}_h}$ denotes the number of elements of $\mathcal{T}_h$. In contrast, the main computational effort of evaluating our trained network consists of $N_L$ matrix-matrix multiplications, where $N_L$ is the number of layers of the network (not taking into account bias vectors and the activation function).

Experiment 1

For our first experiment, we consider an unstructured multiscale coefficient sampled from $\mathcal{A}_{\mathrm{ms}}$ that was not part of the training set and a constant right-hand side $f \equiv 1$. The coefficient (top left), the error $|u_h(x) - \hat u_h(x)|$ (top right), as well as representative cross-sections along $x_1 = 0.5$ (bottom left) and $x_2 = 0.5$ (bottom right) of the two solutions $u_h$ and $\hat u_h$ are shown in Fig. 3. The spectral norm difference $\|S_A - \hat S_A\|_2 \approx 6.58 \cdot 10^{-2}$ and the $L^2$-error $\|u_h - \hat u_h\|_{L^2(D)} \approx 2.13 \cdot 10^{-4}$ confirm the visual impression: the network has successfully learned to produce a system matrix that captures the behavior of the solution on the target scale well.

Figure 3: Results of Experiment 1: unstructured multiscale coefficient sampled from $\mathcal{A}_{\mathrm{ms}}$ (top left), $|u_h(x) - \hat u_h(x)|$ (top right), and comparison of $u_h$ vs. $\hat u_h$ along the cross-sections $x_1 = 0.5$ (bottom left) and $x_2 = 0.5$ (bottom right)

Experiment 2

Next, we test the network's performance on coefficients that are smoother and more regular than the ones it has been trained on. As a demonstrating example, we consider the coefficient $A(x) = 2 + \sin(2\pi x_1)\sin(2\pi x_2)$. The network's input is obtained by evaluating $A$ at the midpoints of the cells of the mesh $\mathcal{T}_\varepsilon$ on the fine unresolved scale $\varepsilon$. In this example, we choose the function $f(x) = x_1\,\chi_{\{x_1 \ge 0.5\}}$ as right-hand side, where $\chi_S$ denotes the characteristic function of a set $S \subseteq D$. The results are shown in Fig. 4. We obtain an even better $L^2$-error of $\|u_h - \hat u_h\|_{L^2(D)} \approx 7.56 \cdot 10^{-5}$ and a spectral norm difference of $\|S_A - \hat S_A\|_2 \approx 3.91 \cdot 10^{-2}$. A comparison of the solutions along the cross-sections shows almost no discernible visual difference between the LOD solution and the approximation obtained using our trained network.

Figure 4: Results of Experiment 2: smooth coefficient (top left), $|u_h(x) - \hat u_h(x)|$ (top right), and comparison of $u_h$ vs. $\hat u_h$ along the cross-sections $x_1 = 0.5$ (bottom left) and $x_2 = 0.5$ (bottom right)

Experiment 3

As a third experiment, we choose another coefficient with a structure the network has not seen during the training phase, this time a less regular one. The coefficient is shown in the top left of Fig. 5. It is composed of a background part (blue region), obtained by sampling uniformly and independently on each element of $\mathcal{T}_\varepsilon$ from the interval $[1,2]$, and several “cracks” (yellow regions), in which the coefficient varies uniformly in $[4,5]$. The right-hand side here is $f(x) = \cos(2\pi x_1)$. A computation of the $L^2$-error $\|u_h - \hat u_h\|_{L^2(D)} \approx 2.72 \cdot 10^{-4}$ shows that the overall error is still moderate; a closer visual inspection of the solutions along the cross-sections, however, reveals more prominent deviations of the neural network approximation from the ground truth. The spectral norm difference $\|S_A - \hat S_A\|_2 \approx 2.81 \cdot 10^{-1}$ is also one order of magnitude larger than in the previous examples. Nevertheless, a few corrective training steps including samples of this nature might suffice to fix this issue. A thorough investigation of this hypothesis, along with the extension to other coefficient classes, is the subject of future work.

Figure 5: Results of Experiment 3: coefficient with cracks (top left), $|u_h(x) - \hat u_h(x)|$ (top right), and comparison of $u_h$ vs. $\hat u_h$ along the cross-sections $x_1 = 0.5$ (bottom left) and $x_2 = 0.5$ (bottom right)

Conclusion and outlook

We proposed an approach for compressing linear differential operators – parameterized by PDE coefficients that may depend on microscopic quantities unresolved by a target discretization scale of interest – into lower-dimensional surrogates, based on a combination of existing model reduction methods with a data-driven deep learning framework. Our method is motivated by the fact that computing the surrogates (represented by effective system matrices) with classical methods is nontrivial and requires significant computational resources in multi-query settings. To overcome this problem, we showed how the compression process can be approximated by a neural network based on training data that can be generated using existing compression approaches. Importantly, we avoid a global approximation by a neural network and instead first decompose the compression map into local contributions, which can then be approximated by a single unified network. As an example, we studied a class of second-order elliptic diffusion operators and showed how to approximate the compression map based on the Petrov–Galerkin formulation of the localized orthogonal decomposition method with a neural network. The proposed ansatz has been numerically validated for a large set of piecewise constant and highly oscillatory multiscale coefficients. Furthermore, the approach has been shown to generalize well, in the sense that a well-trained network is able to produce reasonable results even for classes of coefficients it has not been trained on.

For future research, many questions building on the present work are conceivable. Straightforward extensions would be to consider stochastic settings with differential operators parameterized by random fields, or settings with high contrast. Another question to investigate is to what degree the method can be made robust against changes in geometry, for example, by training the network not only on coefficients sampled on a fixed domain, but rather on reference patches with varying geometries. Mimicking a hierarchical discretization approach, one may also try to directly approximate the inverse operator, which can be represented by a sparse matrix [25]. On a more theoretical level, the approximation properties of neural networks for various existing compression operators could be investigated, along with the question of how many training samples are required to faithfully approximate those operators for a given family of coefficients.

Acknowledgements

Not applicable.

Authors’ contributions

The work emerged from a close collaboration between the authors. All authors read and approved the final manuscript.

Funding

The work of F. Kröpfl and D. Peterseim is part of the project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 865751 - Computational Random Multiscale Problems). R. Maier acknowledges support by the Göran Gustafsson Foundation for Research in Natural Sciences and Medicine. Open Access funding enabled and organized by Projekt DEAL.

Availability of data and materials

The dataset generated and used for the numerical experiments in this work is available from the corresponding author F. Kröpfl upon reasonable request.

Declarations

Competing interests

The authors declare that they have no competing interests.

Contributor Information

Fabian Kröpfl, Email: fabian.kroepfl@uni-a.de.

Roland Maier, Email: roland.maier@uni-jena.de.

Daniel Peterseim, Email: daniel.peterseim@uni-a.de.

References

  • 1.Abadi M., Agarwal A., Barham P., Brevdo E., Chen Z., Citro C., Corrado G.S., Davis A., Dean J., Devin M., Ghemawat S., Goodfellow I., Harp A., Irving G., Isard M., Jia Y., Jozefowicz R., Kaiser L., Kudlur M., Levenberg J., Mané D., Monga R., Moore S., Murray D., Olah C., Schuster M., Shlens J., Steiner B., Sutskever I., Talwar K., Tucker P., Vanhoucke V., Vasudevan V., Viégas F., Vinyals O., Warden P., Wattenberg M., Wicke M., Yu Y., Zheng X. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. [Google Scholar]
  • 2.Abdulle A., E W., Engquist B., Vanden-Eijnden E. The heterogeneous multiscale method. Acta Numer. 2012;21:1–87. doi: 10.1017/S0962492912000025. [DOI] [Google Scholar]
  • 3.Abdulle A., Henning P. A reduced basis localized orthogonal decomposition. J. Comput. Phys. 2015;295:379–401. doi: 10.1016/j.jcp.2015.04.016. [DOI] [Google Scholar]
  • 4.Abdulle A., Henning P. Localized orthogonal decomposition method for the wave equation with a continuum of scales. Math. Comput. 2017;86(304):549–587. doi: 10.1090/mcom/3114. [DOI] [Google Scholar]
  • 5.Altmann R., Henning P., Peterseim D. Numerical homogenization beyond scale separation. Acta Numer. 2021;30:1–86. doi: 10.1017/S0962492921000015. [DOI] [Google Scholar]
  • 6.Arbabi H., Bunder J.E., Samaey G., Roberts A.J., Kevrekidis I.G. Linking machine learning with multiscale numerics: data-driven discovery of homogenized equations. JOM. 2020;72(12):4444–4457. doi: 10.1007/s11837-020-04399-8. [DOI] [Google Scholar]
  • 7.Babuška I., Lipton R. Optimal local approximation spaces for generalized finite element methods with application to multiscale problems. Multiscale Model. Simul. 2011;9(1):373–406. doi: 10.1137/100791051. [DOI] [Google Scholar]
  • 8.Babuška I., Osborn J.E. Can a finite element method perform arbitrarily badly? Math. Comput. 2000;69(230):443–462. doi: 10.1090/S0025-5718-99-01085-6. [DOI] [Google Scholar]
  • 9.Berner J., Dablander M., Grohs P. Numerically solving parametric families of high-dimensional Kolmogorov partial differential equations via deep learning. In: Larochelle H., Ranzato M., Hadsell R., Balcan M.F., Lin H., editors. Advances in Neural Information Processing Systems. Red Hook: Curran Associates; 2020. pp. 16615–16627. [Google Scholar]
  • 10.Bezanson J., Edelman A., Karpinski S., Shah V.B. Julia: a fresh approach to numerical computing. SIAM Rev. 2017;59(1):65–98. doi: 10.1137/141000671. [DOI] [Google Scholar]
  • 11.Bhattacharya K., Hosseini B., Kovachki N.B., Stuart A.M. Model reduction and neural networks for parametric PDEs. SMAI J. Comput. Math. 2021;7:121–157. doi: 10.5802/smai-jcm.74. [DOI] [Google Scholar]
  • 12.Caiazzo A., Maier R., Peterseim D. Reconstruction of quasi-local numerical effective models from low-resolution measurements. J. Sci. Comput. 2020;85(1):10. doi: 10.1007/s10915-020-01304-y. [DOI] [Google Scholar]
  • 13.Chan S., Elsheikh A.H. A machine learning approach for efficient uncertainty quantification using multiscale methods. J. Comput. Phys. 2018;354:493–511. doi: 10.1016/j.jcp.2017.10.034. [DOI] [Google Scholar]
  • 14.De Giorgi E. Sulla convergenza di alcune successioni d’integrali del tipo dell’area. Rend. Mat. 1975;6(8):277–294. [Google Scholar]
  • 15.E W., Engquist B. The heterogeneous multiscale methods. Commun. Math. Sci. 2003;1(1):87–132. doi: 10.4310/CMS.2003.v1.n1.a8. [DOI] [Google Scholar]
  • 16.E W., Engquist B. Multiscale Methods in Science and Engineering. Berlin: Springer; 2005. The heterogeneous multi-scale method for homogenization problems; pp. 89–110. [Google Scholar]
  • 17.E W., Han J., Jentzen A. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 2017;5(4):349–380. doi: 10.1007/s40304-017-0117-6. [DOI] [Google Scholar]
  • 18.E W., Han J., Jentzen A. Algorithms for solving high dimensional PDEs: from nonlinear Monte Carlo to machine learning. Nonlinearity. 2021;35(1):278–310. doi: 10.1088/1361-6544/ac337f. [DOI] [Google Scholar]
  • 19.E W., Yu B. The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 2018;6(1):1–12. doi: 10.1007/s40304-018-0127-z. [DOI] [Google Scholar]
  • 20.Efendiev Y.R., Galvis J., Wu X.-H. Multiscale finite element methods for high-contrast problems using local spectral basis functions. J. Comput. Phys. 2011;230(4):937–955. doi: 10.1016/j.jcp.2010.09.026. [DOI] [Google Scholar]
  • 21.Efendiev Y.R., Hou T.Y. Multiscale Finite Element Methods: Theory and Applications. New York: Springer; 2009. [Google Scholar]
  • 22.Elfverson D., Ginting V., Henning P. On multiscale methods in Petrov-Galerkin formulation. Numer. Math. 2015;131(4):643–682. doi: 10.1007/s00211-015-0703-z. [DOI] [Google Scholar]
  • 23.Engwer C., Henning P., Målqvist A., Peterseim D. Efficient implementation of the localized orthogonal decomposition method. Comput. Methods Appl. Mech. Eng. 2019;350:123–153. doi: 10.1016/j.cma.2019.02.040. [DOI] [Google Scholar]
  • 24.Ern A., Guermond J.-L. Finite element quasi-interpolation and best approximation. ESAIM: Math. Model. Numer. Anal. 2017;51(4):1367–1385. doi: 10.1051/m2an/2016066. [DOI] [Google Scholar]
  • 25.Feischl M., Peterseim D. Sparse compression of expected solution operators. SIAM J. Numer. Anal. 2020;58(6):3144–3164. doi: 10.1137/20M132571X. [DOI] [Google Scholar]
  • 26.Gallistl D., Henning P., Verfürth B. Numerical homogenization of H(curl)-problems. SIAM J. Numer. Anal. 2018;56(3):1570–1596. doi: 10.1137/17M1133932. [DOI] [Google Scholar]
  • 27.Gallistl D., Peterseim D. Stable multiscale Petrov–Galerkin finite element method for high frequency acoustic scattering. Comput. Methods Appl. Mech. Eng. 2015;295:1–17. [Google Scholar]
  • 28.Gao H., Sun L., Wang J.-X. Phygeonet: physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state PDEs on irregular domain. J. Comput. Phys. 2021;428:110079. doi: 10.1016/j.jcp.2020.110079. [DOI] [Google Scholar]
  • 29. Geevers, S., Maier, R.: Fast mass lumped multiscale wave propagation modelling. IMA J. Numer. Anal. (2022) To appear
  • 30. Geist, M., Petersen, P., Raslan, M., Schneider, R., Kutyniok, G.: Numerical solution of the parametric diffusion equation by deep neural networks. J. Sci. Comput. (2022) To appear
  • 31.Ghavamian F., Simone A. Accelerating multiscale finite element simulations of history-dependent materials using a recurrent neural network. Comput. Methods Appl. Mech. Eng. 2019;357:112594. [Google Scholar]
  • 32.Glorot X., Bengio Y. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. 2010. Understanding the difficulty of training deep feedforward neural networks; pp. 249–256. [Google Scholar]
  • 33.Han J., Jentzen A., E W. Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 2018;115(34):8505–8510. doi: 10.1073/pnas.1718942115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.He K., Zhang X., Ren S., Sun J. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. Deep residual learning for image recognition; pp. 770–778. [Google Scholar]
  • 35.Hellman F., Keil T., Målqvist A. Numerical upscaling of perturbed diffusion problems. SIAM J. Sci. Comput. 2020;42(4):A2014–A2036. doi: 10.1137/19M1278211. [DOI] [Google Scholar]
  • 36.Henning P., Peterseim D. Oversampling for the multiscale finite element method. Multiscale Model. Simul. 2013;11(4):1149–1175. doi: 10.1137/120900332. [DOI] [Google Scholar]
  • 37.Henning P., Wärnegård J. Superconvergence of time invariants for the Gross-Pitaevskii equation. Math. Comput. 2022;91(334):509–555. [Google Scholar]
  • 38.Hou T.Y., Wu X.-H. A multiscale finite element method for elliptic problems in composite materials and porous media. J. Comput. Phys. 1997;134(1):169–189. doi: 10.1006/jcph.1997.5682. [DOI] [Google Scholar]
  • 39.Hutzenthaler M., Jentzen A., Kruse T., Nguyen T.A. A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. Ser. Partial Differ. Equ. Appl. 2020;1(2):1–34. [Google Scholar]
  • 40. Innes, M.: Flux: elegant machine learning with Julia. J. Open Sour. Softw. (2018)
  • 41.Khoo Y., Lu J., Ying L. Solving parametric PDE problems with artificial neural networks. Eur. J. Appl. Math. 2021;32(3):421–435. doi: 10.1017/S0956792520000182. [DOI] [Google Scholar]
  • 42. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2014). Preprint arXiv:1412.6980
  • 43. Kutyniok, G., Petersen, P., Raslan, M., Schneider, R.: A theoretical analysis of deep neural networks and parametric PDEs. Constr. Approx. (2022) To appear
  • 44.Maier R., Peterseim D. Explicit computational wave propagation in micro-heterogeneous media. BIT Numer. Math. 2019;59(2):443–462. doi: 10.1007/s10543-018-0735-8. [DOI] [Google Scholar]
  • 45. Maier, R., Verfürth, B.: Multiscale scattering in nonlinear Kerr-type media. Math. Comput. (2022) To appear
  • 46.Målqvist A., Peterseim D. Localization of elliptic multiscale problems. Math. Comput. 2014;83(290):2583–2603. doi: 10.1090/S0025-5718-2014-02868-8. [DOI] [Google Scholar]
  • 47.Målqvist A., Peterseim D. Computation of eigenvalues by numerical upscaling. Numer. Math. 2015;130(2):337–361. doi: 10.1007/s00211-014-0665-6. [DOI] [Google Scholar]
  • 48.Målqvist A., Peterseim D. Generalized finite element methods for quadratic eigenvalue problems. ESAIM: Math. Model. Numer. Anal. 2017;51(1):147–163. doi: 10.1051/m2an/2016019. [DOI] [Google Scholar]
  • 49.Målqvist A., Peterseim D. Numerical Homogenization by Localized Orthogonal Decomposition. Philadelphia: SIAM; 2020. [Google Scholar]
  • 50. Målqvist, A., Verfürth, B.: An offline-online strategy for multiscale problems with random defects. ESAIM: Math. Model. Numer. Anal. (2022) To appear
  • 51.Murat F., Tartar L. Séminaire d’Analyse Fonctionnelle et Numérique de l’Université d’Alger. 1978. H-convergence. [Google Scholar]
  • 52.Owhadi H. Multigrid with rough coefficients and multiresolution operator decomposition from hierarchical information games. SIAM Rev. 2017;59(1):99–149. doi: 10.1137/15M1013894. [DOI] [Google Scholar]
  • 53.Owhadi H., Zhang L., Berlyand L. Polyharmonic homogenization, rough polyharmonic splines and sparse super-localization. ESAIM: Math. Model. Numer. Anal. 2014;48(2):517–552. doi: 10.1051/m2an/2013118. [DOI] [Google Scholar]
  • 54.Padmanabha G.A., Zabaras N. A Bayesian multiscale deep learning framework for flows in random media. Found. Data Sci. 2021;3(2):251–303. doi: 10.3934/fods.2021016. [DOI] [Google Scholar]
  • 55. Paszke A., Gross S., Massa F., Lerer A., Bradbury J., Chanan G., Killeen T., Lin Z., Gimelshein N., Antiga L., Desmaison A., Kopf A., Yang E., DeVito Z., Raison M., Tejani A., Chilamkurthy S., Steiner B., Fang L., Bai J., Chintala S. PyTorch: an imperative style, high-performance deep learning library. In: Wallach H., Larochelle H., Beygelzimer A., d'Alché-Buc F., Fox E., Garnett R., editors. Advances in Neural Information Processing Systems. Red Hook: Curran Associates; 2019. pp. 8024–8035.
  • 56. Peterseim D. Variational multiscale stabilization and the exponential decay of fine-scale correctors. In: Building Bridges: Connections and Challenges in Modern Approaches to Numerical Partial Differential Equations. Cham: Springer; 2016. pp. 341–367.
  • 57. Peterseim D. Eliminating the pollution effect in Helmholtz problems by local subscale correction. Math. Comput. 2017;86(305):1005–1036. doi: 10.1090/mcom/3156.
  • 58. Peterseim D., Sauter S.A. Finite elements for elliptic problems with highly varying, nonperiodic diffusion matrix. Multiscale Model. Simul. 2012;10(3):665–695. doi: 10.1137/10081839X.
  • 59. Peterseim D., Schedensack M. Relaxing the CFL condition for the wave equation on adaptive meshes. J. Sci. Comput. 2017;72(3):1196–1213. doi: 10.1007/s10915-017-0394-y.
  • 60. Raissi M., Perdikaris P., Karniadakis G.E. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019;378:686–707. doi: 10.1016/j.jcp.2018.10.045.
  • 61. Ren X., Hannukainen A., Belahcen A. Homogenization of multiscale Eddy current problem by localized orthogonal decomposition method. IEEE Trans. Magn. 2019;55(9):1–4. doi: 10.1109/TMAG.2019.2917400.
  • 62. Schwab C., Zech J. Deep learning in high dimension: neural network expression rates for generalized polynomial chaos expansions in UQ. Anal. Appl. 2019;17(01):19–55. doi: 10.1142/S0219530518500203.
  • 63. Sirignano J., Spiliopoulos K. DGM: a deep learning algorithm for solving partial differential equations. J. Comput. Phys. 2018;375:1339–1364. doi: 10.1016/j.jcp.2018.08.029.
  • 64. Spagnolo S. Sulla convergenza di soluzioni di equazioni paraboliche ed ellittiche. Ann. Sc. Norm. Super. Pisa, Cl. Sci. 1968;3(22):571–597.
  • 65. Wang Y., Cheung S.W., Chung E.T., Efendiev Y., Wang M. Deep multiscale model learning. J. Comput. Phys. 2020;406:109071. doi: 10.1016/j.jcp.2019.109071.

Associated Data


Data Availability Statement

The datasets generated and used for the numerical experiments in this work are available from the corresponding author, F. Kröpfl, upon reasonable request.


Articles from Advances in Continuous and Discrete Models are provided here courtesy of Springer
