Abstract
In this paper, we present and compare four methods to enforce Dirichlet boundary conditions in Physics-Informed Neural Networks (PINNs) and Variational Physics-Informed Neural Networks (VPINNs). Such conditions are usually imposed by adding penalization terms in the loss function and properly choosing the corresponding scaling coefficients; however, in practice, this requires an expensive tuning phase. We show through several numerical tests that modifying the output of the neural network to exactly match the prescribed values leads to more efficient and accurate solvers. The best results are achieved by exactly enforcing the Dirichlet boundary conditions by means of an approximate distance function. We also show that variationally imposing the Dirichlet boundary conditions via Nitsche's method leads to suboptimal solvers.
MSC: 35A15, 65L10, 65L20, 65K10, 68T05
Keywords: Dirichlet boundary conditions, PINN, VPINN, Deep neural networks, Approximate distance function
1. Introduction
Physics-Informed Neural Networks (PINNs), proposed in [1] after the initial pioneering contributions of Lagaris et al. [2], [3], [4], are rapidly emerging computational methods to solve partial differential equations (PDEs). In its basic formulation, a PINN is a neural network that is trained to minimize the PDE residual on a given set of collocation points in order to compute a corresponding approximate solution. In particular, the fact that the PDE solution is sought in a nonlinear space via a nonlinear optimizer distinguishes PINNs from classical computational methods. This provides PINNs flexibility, since the same code can be used to solve completely different problems by adapting the neural network loss function that is used in the training phase. Moreover, due to the intrinsic nonlinearity and the adaptive architecture of the neural network, PINNs can efficiently solve inverse [5], [6], [7], parametric [8], high-dimensional [9], [10] as well as nonlinear [11] problems. Another important feature characterizing PINNs is that it is possible to combine distinct types of information within the same loss function to readily modify the optimization process. This is useful, for instance, to effortlessly integrate (synthetic or experimental) external data into the training phase to obtain an approximate solution that is computed using both data and physics [12].
In order to improve the original PINN idea, several extensions have been developed. Some of these developments include the Deep Ritz method (DRM) [13], in which the energy functional of a variational problem is minimized; the conservative PINN (cPINN) [14], where the approximate solution is computed by a domain-decomposition approach enforcing flux conservation at the interfaces, as well as its improvement in the extended PINN (XPINN) [15]; and the variational PINN (VPINN) [16], [17], in which the loss function is defined by exploiting the variational structure of the underlying PDE.
Most of the existing PINN approaches enforce the essential (Dirichlet) boundary conditions by means of additional penalization terms that contribute to the loss function, each multiplied by a constant weighting factor. See for instance [18], [19], [20], [21], [22], [23], [24], [25], [26]; note that this list is by no means exhaustive, and we refer to [27], [28], [29] for more detailed overviews of the PINN literature. However, such an approach may lead to poor approximation, and therefore several techniques to improve it have been proposed. In [30] and [31], adaptive scaling parameters are proposed to balance the different terms in the loss functions. In particular, in [30] the parameters are updated during the minimization to maximize the loss function via backpropagation, whereas in [31] a fixed learning rate annealing procedure is adopted. Other alternatives are related to adaptive sampling strategies (e.g., [32], [33], [34]) or to specific techniques such as the Neural Tangent Kernel [35].
Note that although it is possible to automatically tune these scaling parameters during the training, such techniques require more involved implementations and in most cases lead to intrusive methods, since the optimizer has to be modified. Instead, in this paper, we focus on three simple and non-intrusive alternatives to the standard penalty approach for imposing Dirichlet boundary conditions, and we compare the accuracy and efficiency of all four methods. The proposed approaches are tested on standard PINNs and on interpolated VPINNs, which have been proven to be more stable than standard VPINNs [36].
The main contributions of this paper are as follows:
1. We present three non-standard approaches to enforce Dirichlet boundary conditions on PINNs and VPINNs, and discuss their mathematical formulation and their pros and cons. Two of them, based on the use of an approximate distance function, modify the output of the neural network to exactly impose such conditions, whereas the last one enforces them approximately through a weak formulation of the equation.
2. The performance of the distinct approaches to impose Dirichlet boundary conditions is assessed on various test cases. On average, we find that exactly imposing the boundary conditions leads to more efficient and accurate solvers. We also compare the interpolated VPINN to the standard PINN, and observe that the different approaches used to enforce the boundary conditions affect these two models in similar ways.
The structure of the remainder of this paper is as follows. In Section 2, the PINN and VPINN formulations are described: we first describe the neural network architecture in Section 2.1 and then focus on the loss functions that characterize the two models in Section 2.2. Subsequently, in Section 3, we present the four approaches to enforce Dirichlet boundary conditions; three of them can be used with both PINNs and VPINNs, whereas the last one applies only to VPINNs because it relies on the variational formulation. Numerical results are presented in Section 4. In Section 4.1, we first analyze, for a second-order elliptic problem, the convergence rate of the VPINN with respect to mesh refinement. In doing so, we demonstrate that when the neural network is properly trained, identical optimal convergence rates are realized by all approaches only if the PDE solution is simple enough. Otherwise, only enforcing the Dirichlet boundary conditions with Nitsche's method or exactly imposing them via approximate distance functions ensures the theoretical convergence rate. In addition, we compare the behavior of the loss function and of the error while increasing the number of epochs, as well as the behavior of the error when the network architecture is varied. In Section 4.2, we show that it is also possible to efficiently solve second-order parametric nonlinear elliptic problems. Furthermore, in Sections 4.3–4.5, we compare the performance of all approaches on PINNs and VPINNs by solving a linear elasticity problem, a stabilized Eikonal equation over an L-shaped domain, and a convection problem. Finally, in Section 5, we close with our main findings and present a few perspectives for future work.
2. PINNs and interpolated variational PINNs
In this section, we describe the PINN and VPINN that are used in Section 4. In particular, in Section 2.1 the neural network architecture is presented, and the construction of the loss functions is discussed in Section 2.2.
2.1. Neural network description
In this work we compare the efficiency of four approaches to enforce Dirichlet boundary conditions in PINN and VPINN. The main difference between these two numerical models is the training loss function; the architecture of the neural network is the same and is independent of the way the boundary conditions are imposed.
In our numerical experiments we only consider fully-connected feed-forward neural networks with a fixed architecture. Such neural networks can be represented as nonlinear parametric functions that can be evaluated via the following recursive formula:
x^(0) = x,   x^(i) = σ_i(A_i x^(i−1) + b_i),  i = 1, …, L+1,   w(x) = x^(L+1).   (2.1)
In particular, with the notation of (2.1), x is the neural network input vector, w(x) = x^(L+1) is the neural network output vector, the neural network architecture consists of an input layer, L hidden layers and one output layer, A_i and b_i are the matrices and vectors containing the neural network weights, and σ_i is the activation function of the i-th layer, which is applied element-wise to its input vector. We also remark that the i-th layer is said to contain as many neurons as the number of rows of A_i, and that σ_i has to be nonlinear for every hidden layer. Common nonlinear activation functions are the rectified linear unit (ReLU), the hyperbolic tangent and the sigmoid function. In this work, we take the activation of the output layer to be the identity function in order to avoid imposing any constraint on the neural network output.
The weights contained in the matrices A_i and vectors b_i can be logically reorganized in a single vector θ. The goal of the training phase is to find a vector θ* that minimizes the loss function; however, since such a loss function is nonlinear with respect to θ and the corresponding manifold is extremely complicated, we can at best find good local minima.
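For concreteness, the following is a minimal sketch of such a network, assuming a TensorFlow/Keras implementation as used in Section 4; the default layer sizes are illustrative and not prescriptive.

```python
import tensorflow as tf

def make_network(input_dim=2, output_dim=1, hidden_layers=4, neurons=50):
    """Fully-connected feed-forward network w(x) as in (2.1): hyperbolic tangent
    activations in the hidden layers, identity activation in the output layer."""
    inputs = tf.keras.Input(shape=(input_dim,))
    x = inputs
    for _ in range(hidden_layers):
        x = tf.keras.layers.Dense(neurons, activation="tanh")(x)
    outputs = tf.keras.layers.Dense(output_dim, activation=None)(x)  # identity output layer
    return tf.keras.Model(inputs, outputs)

model = make_network()
theta = model.trainable_variables  # the weight vector θ of Section 2.1, split layer by layer
```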
2.2. PINN and interpolated VPINN loss functions
For the sake of simplicity, the loss function for PINN and interpolated VPINN is stated for second-order elliptic boundary-value problems. However, the discussion can be directly generalized to different PDEs, and in Section 4, numerical results associated with other problems are also presented.
Let us consider the model problem:
−∇·(μ∇u) + β·∇u + σu = f  in Ω,   u = g  on Γ_D,   μ ∂u/∂n = ψ  on Γ_N,   (2.2)
where Ω ⊂ R² is a bounded domain whose Lipschitz boundary ∂Ω is partitioned as ∂Ω = Γ_D ∪ Γ_N, with Γ_D ∩ Γ_N = ∅. For the well-posedness of the boundary-value problem we require the coefficients μ, β and σ to satisfy, in the entire domain Ω, μ ≥ μ₀ and σ − ½∇·β ≥ 0 for some strictly positive constant μ₀. Moreover, the data f, g and ψ are assumed to be sufficiently regular. We point out that even if these assumptions ensure the well-posedness of the problem, PINNs and VPINNs often struggle to compute low regularity solutions. We refer to [37] for a recent example of a neural network based model that overcomes this issue.
In order to train a PINN, one introduces a set of collocation points {x_i : i = 1, …, N} ⊂ Ω and evaluates the corresponding equation residuals r_w(x_i), where w is the function represented by the neural network. Such residuals, for problem (2.2), are defined as:
r_w(x_i) = −∇·(μ∇w)(x_i) + β(x_i)·∇w(x_i) + σ(x_i) w(x_i) − f(x_i),  i = 1, …, N.   (2.3)
Since we are interested in a neural network that satisfies the PDE in a discrete sense, the loss function minimized during the PINN training is:
R²_PINN(w) = (1/N) Σ_{i=1}^{N} r_w(x_i)².   (2.4)
In (2.4), when N is sufficiently large and R²_PINN(w) is close to zero, the function represented by the neural network output approximately satisfies the PDE and can thus be considered a good approximation of the exact solution. Other terms are often added to impose the boundary conditions or improve the training, which are discussed in Section 3.
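As an illustration, the following sketch evaluates residuals of type (2.3) and the loss (2.4) with automatic differentiation; constant coefficients are assumed and the convective term is omitted for brevity, so this is not a verbatim reproduction of our implementation.

```python
import tensorflow as tf

def pinn_residual(model, x, f_vals, mu=1.0, sigma=0.0):
    """Collocation residual -mu*Laplacian(w)(x_i) + sigma*w(x_i) - f(x_i),
    assuming constant coefficients and no convective term."""
    with tf.GradientTape() as tape2:
        tape2.watch(x)
        with tf.GradientTape() as tape1:
            tape1.watch(x)
            w = model(x)                      # (N, 1) network values at the collocation points
        grad_w = tape1.gradient(w, x)         # (N, d) gradient of w
    hess = tape2.batch_jacobian(grad_w, x)    # (N, d, d) Hessian of w
    laplace_w = tf.linalg.trace(hess)         # (N,) Laplacian of w
    return -mu * laplace_w + sigma * tf.squeeze(w, -1) - f_vals

def pinn_loss(model, x, f_vals):
    """Loss of type (2.4): average of the squared residuals over the collocation points."""
    return tf.reduce_mean(tf.square(pinn_residual(model, x, f_vals)))

# example usage with the network sketched in Section 2.1
x_colloc = tf.random.uniform((128, 2))
loss = pinn_loss(model, x_colloc, f_vals=tf.ones(128))
```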
Let us now focus on the interpolated VPINN proposed in [36]. We introduce the function spaces U = {v ∈ H¹(Ω) : v = g on Γ_D} and V = {v ∈ H¹(Ω) : v = 0 on Γ_D}, the bilinear form a and the linear form F,
a(w, v) = ∫_Ω ( μ∇w·∇v + (β·∇w) v + σ w v ) dx,   F(v) = ∫_Ω f v dx + ∫_{Γ_N} ψ v ds.
The variational counterpart of problem (2.2) thus reads: Find u ∈ U such that:
a(u, v) = F(v)  for every v ∈ V.   (2.5)
In order to discretize problem (2.5), we use two discrete function spaces. Inspired by the Petrov-Galerkin framework, we denote the discrete trial space by U_H and the discrete test space by V_h. The functions comprising such spaces are generated on two conforming, shape-regular and nested partitions T_H and T_h of Ω, with compatible meshsizes H and h, respectively. Assuming that T_h is the finer mesh, one can claim that h ≤ H and that every element of T_h is contained in an element of T_H.
Denoting by U_H the space of piecewise polynomial functions of order k_int over T_H and by V_h the space of piecewise polynomial functions of order k_test over T_h that vanish on Γ_D, we define the discrete variational problem as: Find u_H ∈ U_H such that:
a(u_H, v_h) = F(v_h)  for every v_h ∈ V_h,   with u_H = g_H on Γ_D,   (2.6)
where g_H is a suitable piecewise polynomial approximation of g. A representation of the spaces V_h and U_H in a one-dimensional domain is provided in Figs. 1a and 1b. An example of a pair of meshes T_h and T_H is shown in Fig. 1c.
Figure 1.
Pair of meshes and corresponding basis functions of a one-dimensional discretization (left) and nested meshes T_h and T_H in a two-dimensional domain (right). (a) Basis functions of V_h. The filled circles (red) are the nodes of the corresponding mesh T_h; (b) Basis functions of U_H. The filled circles (blue) are the vertex nodes that define the elements of the corresponding mesh T_H; and (c) Meshes used in the numerical experiments of Sections 4.3 and 4.4. The blue mesh is T_H, the red one is T_h. All the figures are obtained with q = 3, k_test = 1, k_int = 4.
In order to obtain computable forms a_h and F_h, we introduce elemental quadrature rules of order q and define a_h and F_h as the approximations of a and F computed with such quadrature rules. In [36], under suitable assumptions on the quadrature order and on the test space, an a priori error estimate with respect to mesh refinement has been proved. It is then possible to define the computable variational residuals associated with the basis functions {φ_i : i ∈ I_h} of V_h as:
r_{h,i}(w) = F_h(φ_i) − a_h(w, φ_i),  i ∈ I_h.   (2.7)
Consequently, in order to compute an approximate solution of problem (2.6), one seeks a function w that minimizes the quantity:
R_h²(w) = Σ_{i ∈ I_h} r_{h,i}(w)²,   (2.8)
and satisfies the imposed boundary conditions. We refer to Section 3 for a detailed description of different approaches used to impose Dirichlet boundary conditions. It should be noted that, since in Sections 4.2–4.5 we consider problems other than (2.2), the residuals in (2.7) have to be suitably modified, while the loss function structure defined in (2.8) is maintained.
We are interested in using a neural network to find the minimizer of R_h². We thus denote by I_H an interpolation operator used to map the function w associated with the neural network to its interpolating element I_H w ∈ U_H, and train the neural network to minimize the quantity R_h²(I_H w). We highlight that, in order to construct the function I_H w, the neural network has to be evaluated only at the interpolation points {x_j} of U_H. Then, assuming that {ψ_j} is a Lagrange basis of U_H such that ψ_j(x_k) = δ_{jk} for every interpolation point x_k, it holds:
I_H w = Σ_j w(x_j) ψ_j.   (2.9)
We remark that the approaches proposed in Section 3 can also be used on non-interpolated VPINNs. However, we restrict our analysis to interpolated VPINNs because of their better stability properties (see Fig. 11 and the corresponding discussion).
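The interpolated-VPINN loss can therefore be evaluated by combining the nodal values of the network with matrices assembled once on the meshes. The schematic sketch below illustrates this; the names A, F and x_interp are placeholders for quantities assembled offline following [36], not part of a published API.

```python
import tensorflow as tf

def vpinn_loss(model, x_interp, A, F):
    """Schematic interpolated-VPINN loss. `x_interp` are the interpolation points of U_H,
    `A` is a matrix with entries a_h(psi_j, phi_i) and `F` a vector with entries F_h(phi_i),
    both assembled offline on the meshes following [36]."""
    w_nodes = tf.squeeze(model(x_interp), axis=-1)   # network evaluated only at the interpolation points
    residuals = F - tf.linalg.matvec(A, w_nodes)     # r_{h,i}(I_H w), cf. (2.7) and (2.9)
    return tf.reduce_sum(tf.square(residuals))       # R_h^2 as in (2.8)
```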
Figure 11.
H1 errors using method MB for standard (left) and interpolated VPINNs (right) as a function of the hyperparameters.
3. Mathematical formulation
We compare four methods to impose Dirichlet boundary conditions on PINNs and VPINNs. We do not consider Neumann or Robin boundary conditions since they can be weakly enforced by the trained VPINN due to the chosen variational formulation (their treatment with PINNs is discussed in [38]). We also highlight that method M_D below can be used only with VPINNs because it relies on the variational formulation of the PDE. We analyze the following methods:
- M_A: Incorporation of an additional cost in the loss function that penalizes unsatisfied boundary conditions; this is the standard approach in PINNs and VPINNs because of its simplicity and effectiveness. In fact, it is possible to choose control points {x_i^D : i = 1, …, N_D} on Γ_D and modify the loss functions defined in (2.4) or (2.8) as follows:
R²_PINN(w) + (λ/N_D) Σ_{i=1}^{N_D} ( w(x_i^D) − g(x_i^D) )²   (3.1)
or
R_h²(w) + λ Σ_{i=1}^{N_D} ( w(x_i^D) − g(x_i^D) )²,   (3.2)
where λ > 0 is a model hyperparameter. Note that, on considering the interpolated VPINN and exploiting the solution structure in (2.9), it is possible to ensure the uniqueness of the numerical solution by choosing the control points as the interpolation points belonging to Γ_D. We also highlight that such a method can be easily adapted to impose other types of boundary conditions just by adding suitable terms to (3.1) and (3.2). On the other hand, despite its simplicity, the main drawback of this approach is that it leads to a more complex multi-objective optimization problem (a code sketch of this penalized loss is given at the end of this section).
- M_B: Exactly imposing the Dirichlet boundary conditions as described in [38] and [36]. In this method we add a non-trainable layer B at the end of the neural network to modify its output w according to the rule:
B(w)(x) = g̃(x) + ϕ(x) w(x),   (3.3)
where g̃ is an extension of the function g inside the domain Ω (i.e., g̃ = g on Γ_D) and ϕ is an approximate distance function (ADF) to the boundary Γ_D, i.e., ϕ(x) = 0 if and only if x ∈ Γ_D, and it is positive elsewhere. During the training phase one minimizes the quantity R²_PINN(B(w)) or R_h²(I_H B(w)). For the sake of simplicity, we only consider ADFs for two-dimensional unions of segments, even though the approach generalizes to more complex geometries (a code sketch of the construction is given after this list). Following the derivation of d, t and ϕ in [38], we start by defining d as the signed distance function from x = (x, y) to the line defined by the segment AB of length L with vertices A = (x_A, y_A) and B = (x_B, y_B):
d(x) = [ (x − x_A)(y_B − y_A) − (y − y_A)(x_B − x_A) ] / L.
Then, we denote by c the center of AB and define t as the following trimming function:
t(x) = (1/L) [ (L/2)² − ‖x − c‖² ].
Note that t(x) = 0 defines a circle of center c and radius L/2. Finally, the ADF to AB is defined as
ϕ(x) = sqrt( d(x)² + ( ( sqrt(t(x)² + d(x)⁴) − t(x) ) / 2 )² ).
A graphical representation of d, t and ϕ for an inclined line segment is shown in Figs. 2a, 2b and 2c, respectively. Assuming that Γ_D can be expressed as the union of segments Γ₁, …, Γ_n, then the ADF to Γ_D, normalized up to order m, is defined as:
ϕ = ( Σ_{i=1}^{n} ϕ_i^{−m} )^{−1/m},   (3.4)
where ϕ_i is the ADF to the segment Γ_i (see [39]). We remark that an ADF normalized up to order m is an ADF such that, for every regular point of Γ_D, the following holds:
∂ϕ/∂n = 1,   ∂^k ϕ/∂n^k = 0  for k = 2, …, m,
where n denotes the normal direction to Γ_D. Such a normalization is useful to impose constraints associated with the solution derivatives and to obtain ADFs with about the same order of magnitude in every region of the domain Ω. However, one of the main limitations of this approach with collocation-PINNs is that Δϕ tends to infinity near the vertices of Γ_D (see Appendix A for an example). This phenomenon produces oscillations in the numerical solutions, hence collocation points that are close to such vertices should not be selected. On the other hand, when only first derivatives are present in the weak formulation of second-order problems (as in the present study), then one can choose quadrature points that are very close to the vertices of Γ_D. When a function g̃ is not known analytically, it is possible to construct it using transfinite interpolation. Let g_i be a function that coincides with g on the segment Γ_i; then g̃ can be defined as:
g̃(x) = Σ_{i=1}^{n} w_i(x) g_i(x),
where the weight w_i is defined as an inverse-distance weight built from the segment ADFs (see [38] for the precise formula), so that w_i = 1 on Γ_i and w_i = 0 on the other segments. Note that since Γ_i is a segment, a function g_i can be readily defined at any arbitrary point x just by evaluating g at the orthogonal projection of x onto Γ_i.
- M_C: Exactly imposing the Dirichlet boundary conditions as in M_B, but without normalizing the ADF. Therefore, we consider a different function ϕ in (3.3), namely the product
ϕ = Π_{i=1}^{n} ϕ_i.
This ensures that ϕ and all its derivatives exist and are bounded in the closure of Ω, although ϕ may be very small in regions close to many segments Γ_i.
- M_D: Using Nitsche's method [40]. The goal of this method is to variationally impose the Dirichlet boundary conditions. In doing so, the network architecture is not modified with additional layers (as in M_B and M_C) and a single objective function suffices for network training. To do so, one enlarges the space V_h to contain all piecewise polynomials of order k_test defined on T_h, i.e. including those that do not vanish on Γ_D, and modifies the residuals defined in (2.7) in the following way:
(3.5)
where γ is a positive constant bounded from below by a suitable threshold and I_h is now an enlarged index set corresponding to the enlarged basis. Thanks to the mesh-dependent scaling of the added boundary terms, which magnifies the boundary mismatch when fine meshes are used, the choice of γ is not as important as that of λ in method M_A. This property is confirmed by the numerical results shown in Figure 8, Figure 10. Since there is no ambiguity, we maintain the same symbols V_h, I_h and R_h² introduced in Section 2.2; they always represent the enlarged sets when method M_D is considered. Note that when w satisfies the Dirichlet boundary conditions, the terms added in (3.5) vanish.
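The ADF construction used in M_B and M_C fits in a few lines of code. The sketch below, in Python/NumPy, reproduces the segment ADF and the order-m composition as we use them (following [38]); the segment list is illustrative, and points lying exactly on Γ_D must be excluded before inverting the composed ADF.

```python
import numpy as np

def adf_segment(x, y, A, B):
    """Approximate distance to the segment AB (signed distance d to the line,
    trimming function t, then joining), following the construction of [38]."""
    (x1, y1), (x2, y2) = A, B
    L = np.hypot(x2 - x1, y2 - y1)
    xc, yc = 0.5 * (x1 + x2), 0.5 * (y1 + y2)
    d = ((x - x1) * (y2 - y1) - (y - y1) * (x2 - x1)) / L       # signed distance to the line through A, B
    t = ((L / 2.0) ** 2 - ((x - xc) ** 2 + (y - yc) ** 2)) / L  # trimming function of the segment
    return np.sqrt(d ** 2 + 0.25 * (np.sqrt(t ** 2 + d ** 4) - t) ** 2)

def adf_boundary(x, y, segments, m=1):
    """ADF to a union of segments, normalized up to order m via the composition rule (3.4).
    The evaluation points must not lie exactly on the boundary."""
    phis = np.array([adf_segment(x, y, A, B) for (A, B) in segments])
    return np.sum(phis ** (-float(m)), axis=0) ** (-1.0 / m)

def hard_bc_output(w, g_ext, phi):
    """Modified network output of rule (3.3): g_ext is an extension of the boundary data."""
    return g_ext + phi * w

# illustrative usage on the unit square
square = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((1, 1), (0, 1)), ((0, 1), (0, 0))]
xx, yy = np.meshgrid(np.linspace(0.01, 0.99, 50), np.linspace(0.01, 0.99, 50))
phi = adf_boundary(xx, yy, square, m=1)
```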
Figure 2.
Representation of the signed distance function d(x) to a straight line (left), the trimming function t(x) (middle) and the approximate distance function ϕ(x) to a segment (right).
Figure 8.
Error decay obtained with MA and different values of λ. Forcing term and Dirichlet boundary conditions are set such that the exact solution is (4.3). The theoretical convergence rate is kint. (a) Convergence rates: 3.66 (λ = 103), 2.05 (λ = 1), 0.01 (λ = 10−3). (b) Convergence rates: 5.85 (λ = 103), 4.42 (λ = 1), 2.89 (λ = 10−3). (c) Convergence rates: 3.95 (λ = 103), 3.68 (λ = 1), 3.71 (λ = 10−3).
Figure 10.
Error decay obtained with MD and different values of γ. Forcing term and Dirichlet boundary conditions are set such that the exact solution is (4.3). The theoretical convergence rate is kint. (a) Convergence rates: 4.18 (γ = 0.1), 4.79 (γ = 1), 3.78 (γ = 10). (b) Convergence rates: 6.54 (γ = 0.1), 6.51 (γ = 1), 7.06 (γ = 10). (c) Convergence rates: 4.19 (γ = 0.1), 4.19 (γ = 1), 4.20 (γ = 10).
We point out that method M_A is often referred to as soft boundary condition imposition, whereas M_B and M_C are known as hard boundary condition impositions. Hence, we can treat M_D as a weak boundary condition imposition.
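As an illustration of the soft imposition M_A, the following sketch augments the collocation loss with the boundary penalty of (3.1); it reuses the pinn_residual helper sketched in Section 2.2, and the value of λ is only a placeholder.

```python
import tensorflow as tf

def penalized_pinn_loss(model, x_interior, f_vals, x_bdry, g_bdry, lam=1.0e3):
    """Loss of type (3.1): interior residual plus a lambda-weighted boundary mismatch (method M_A)."""
    r = pinn_residual(model, x_interior, f_vals)       # helper sketched in Section 2.2
    w_bdry = tf.squeeze(model(x_bdry), axis=-1)
    mismatch = w_bdry - g_bdry                         # w(x_i) - g(x_i) at the boundary control points
    return tf.reduce_mean(tf.square(r)) + lam * tf.reduce_mean(tf.square(mismatch))
```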
4. Numerical results
In this section, the methods M_A, M_B, M_C and M_D discussed in Section 3 are analyzed and compared. In each numerical experiment the neural network is a fully-connected feed-forward neural network as described in Section 2.1. The corresponding architecture is composed of 4 hidden layers with 50 neurons in each layer and with the hyperbolic tangent as the activation function, while the output layer is a linear layer with one or two neurons.
In order to properly minimize the loss function we use the first-order ADAM optimizer [41] with an exponentially decaying learning rate; after a prescribed number of epochs, the second-order BFGS method [42] is used until a maximum number of iterations is reached or it is not possible to further improve the objective function (i.e., when two consecutive iterates are identical up to machine precision). When the interpolated VPINN is used, the training set consists of all the interpolation nodes and no regularization is applied, since the interpolation operator already filters out the neural network's high frequencies. Instead, when the PINN is used, the training set contains a set of control points inside the domain Ω and, when M_A is employed, an additional set of control points on the boundary ∂Ω. Moreover, in order to stabilize the PINN, the regularization term
L_reg(θ) = λ_reg ‖θ‖²₂   (4.1)
is added to the loss function, where θ is the vector containing all the neural network weights defined in Section 2.1 and λ_reg > 0 is a regularization coefficient. The value of this parameter has been chosen through several numerical experiments to minimize the norm of the error.
The computer code to perform the numerical experiments is written in Python, while the neural networks and the optimizers are implemented using the open-source Python package Tensorflow [43]. The loss function gradient with respect to the neural network weights and the PINN output gradient with respect to the spatial coordinates are always computed with automatic differentiation that is available in Tensorflow [44]. On the other hand, the VPINN output gradient with respect to its input is computed by means of suitable projection matrices as described in [36].
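A schematic version of this two-stage training loop is sketched below, assuming the TensorFlow model of Section 2.1 and a generic loss closure; the BFGS stage is delegated to SciPy here purely for illustration, and the learning-rate schedule, epoch counts and tolerances are placeholders rather than the values used in our experiments.

```python
import numpy as np
import scipy.optimize
import tensorflow as tf

def train(model, loss_fn, adam_epochs=5000, bfgs_maxiter=5000):
    """Two-stage training: ADAM with an exponentially decaying learning rate, then a
    quasi-Newton (BFGS) refinement. `loss_fn(model)` is any of the losses sketched above."""
    schedule = tf.keras.optimizers.schedules.ExponentialDecay(1e-3, decay_steps=1000, decay_rate=0.9)
    adam = tf.keras.optimizers.Adam(learning_rate=schedule)
    for _ in range(adam_epochs):
        with tf.GradientTape() as tape:
            loss = loss_fn(model)
        grads = tape.gradient(loss, model.trainable_variables)
        adam.apply_gradients(zip(grads, model.trainable_variables))

    shapes = [v.shape.as_list() for v in model.trainable_variables]

    def set_weights(flat):
        flat = tf.cast(flat, model.trainable_variables[0].dtype)
        idx = 0
        for v, s in zip(model.trainable_variables, shapes):
            n = int(np.prod(s))
            v.assign(tf.reshape(flat[idx:idx + n], s))
            idx += n

    def value_and_grad(flat):
        set_weights(flat)
        with tf.GradientTape() as tape:
            loss = loss_fn(model)
        grads = tape.gradient(loss, model.trainable_variables)
        g = tf.concat([tf.reshape(gr, [-1]) for gr in grads], axis=0)
        return float(loss.numpy()), g.numpy().astype(np.float64)

    x0 = tf.concat([tf.reshape(v, [-1]) for v in model.trainable_variables], axis=0)
    res = scipy.optimize.minimize(value_and_grad, x0.numpy().astype(np.float64),
                                  jac=True, method="BFGS", options={"maxiter": bfgs_maxiter})
    set_weights(res.x)  # load the BFGS minimizer back into the network
```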
4.1. Rate of convergence for second-order elliptic problems
We focus on the VPINN model and show that the a priori error estimate proved in [36] for second-order elliptic problems holds even on varying the way in which the boundary conditions are imposed. On letting , we consider problem (2.2) in the domain with the physical parameters
We consider two test cases. In the first one the Dirichlet boundary conditions and forcing term are chosen so that the exact solution is
| (4.2) |
whereas in the second one they are chosen such that the exact solution is more oscillatory. Its expression is:
| (4.3) |
Such a solution is shown in Fig. 3a, whereas an example of the numerical error corresponding to the VPINN in which the Dirichlet boundary conditions are imposed using method M_B is shown in Fig. 3b; it exhibits a rather uniform distribution of the error, which is not localized near the boundaries. We remark that in these numerical tests and in the subsequent ones, the function g̃ used in M_B and M_C is computed via transfinite interpolation.
Figure 3.
Exact solution u (left) and a plot of the absolute error with VPINN and method MB in which the Dirichlet boundary conditions are imposed on every edge of ∂Ω (right).
We vary both the order of the quadrature rule and the degree of the test functions, and train the same model with different meshes, imposing the Dirichlet boundary conditions with the proposed approaches. In Figs. 4a–4c, 5a–5c and 6a–6c, in which the exact solution is the one in (4.2), we observe close agreement with the results shown in [36]. In fact, when the loss is properly minimized, all the approaches perform comparably and the corresponding empirical convergence rates are always close to the theoretical rate k_int. We point out that in [36] we prove that, when the solution is regular enough and a method similar to M_B is used to enforce the boundary conditions, the convergence rate is k_int. Here, instead, we show that the same behavior is observed even if the boundary conditions are enforced in different ways. Note in particular that the choice m = 1 or m = 2 in M_B, and the choice γ = 0.1, 1 or 10 in M_D, yield nearly identical results (see Figure 5, Figure 6).
Figure 4.
Error decay obtained with MA and different values of λ. Forcing term and Dirichlet boundary conditions are set such that the exact solution is (4.2). The theoretical convergence rate is kint. (a) Convergence rates: 4.04 (λ = 103), 3.99 (λ = 1), 3.42 (λ = 10−3). (b) Convergence rates: 6.18 (λ = 103), 6.00 (λ = 1), 5.52 (λ = 10−3). (c) Convergence rates: 4.50 (λ = 103), 4.44 (λ = 1), 5.29 (λ = 10−3).
Figure 5.
Error decay obtained with MB, with different values of m, and MC. Forcing term and Dirichlet boundary conditions are set such that the exact solution is (4.2). The theoretical convergence rate is kint. (a) Convergence rates: 4.05 (MB,m = 1), 4.05 (MB,m = 2), 4.06 (MC). (b) Convergence rates: 6.24 (MB,m = 1), 6.25 (MB,m = 2), 6.25 (MC). (c) Convergence rates: 4.43 (MB,m = 1), 4.43 (MB,m = 2), 4.67 (MC).
Figure 6.
Error decay obtained with MD and different values of γ. Forcing term and Dirichlet boundary conditions are set such that the exact solution is (4.2). The theoretical convergence rate is kint. (a) Convergence rates: 3.98 (γ = 0.1), 3.89 (γ = 1), 4.39 (γ = 10). (b) Convergence rates: 6.45 (γ = 0.1), 5.69 (γ = 1), 6.90 (γ = 10). (c) Convergence rates: 4.46 (γ = 0.1), 4.43 (γ = 1), 4.43 (γ = 10).
We highlight that the different methods, while delivering similar empirical convergence rates with respect to mesh refinement, exhibit very different performance during training. To observe this phenomenon, let us train multiple identical neural networks on the same mesh while imposing the Dirichlet boundary conditions in different ways. Here we only consider quadrature rules of fixed order and piecewise linear test functions. The values of the loss function and of the error during training are presented in Figs. 7a and 7b, respectively. A vertical line separates the epochs where the ADAM optimizer is used from the ones where the BFGS optimizer is used.
Figure 7.
Training loss (left) and H1 error prediction (right) for the VPINN. The first 5000 epochs are performed with a standard ADAM optimizer, the remaining ones with the BFGS optimizer. The exact solution is given in (4.3).
It can be noted that the most efficient method is M_B, as it converges faster and to a more accurate solution, while Nitsche's method M_D is characterized by very fast convergence only when the BFGS optimizer is adopted. Such an optimizer is also crucial to train the VPINN with one of the two remaining methods, whose error does not decrease when the ADAM optimizer is used, whereas the convergence obtained with the other one seems independent of the choice of the optimizer. It is important to remark that all the loss functions are decreasing even when the error is constant. This implies that there exist other sources of error that dominate and that a very small loss function does not ensure a very accurate solution; this phenomenon is also observed in Fig. 3 of [45] and is discussed in greater detail therein.
Note that, if we change the forcing term and Dirichlet boundary conditions to consider the more oscillatory exact solution in (4.3), some approaches do not ensure the theoretical convergence rate (see Figs. 8a–8c, 9a–9c and 10a–10c). In fact, in Fig. 8 it is evident that, in this case, large values of λ are required to properly enforce the Dirichlet boundary conditions. In Fig. 9, instead, we can observe that the VPINN trained with method M_C is often inaccurate and the corresponding error decay is very noisy. The performance of methods M_B and M_D seems independent of the complexity of the forcing term and boundary conditions.
Figure 9.
Error decay obtained with MB, with different values of m, and MC. Forcing term and Dirichlet boundary conditions are set such that the exact solution is (4.3). The theoretical convergence rate is kint. (a) Convergence rates: 3.90 (MB,m = 1), 3.88 (MB,m = 2), 2.36 (MC). (b) Convergence rates: 5.85 (MB,m = 1), 4.42 (MB,m = 2), 2.89 (MC). (c) Convergence rates: 4.60 (MB,m = 1), 4.59 (MB,m = 2), -0.43 (MC).
In order to show that interpolation acts as a stabilization, we fix a mesh and vary the number of layers and neurons of the neural network. The boundary conditions are imposed using method M_B and the exact solution is the one in (4.2); the results are shown in Fig. 11. The number L of hidden layers and the number of neurons in each hidden layer are varied over a range of values. In Fig. 11a we show the performance of a non-interpolated VPINN trained with the regularization in (4.1). It can be noted that the error is high when the neural network is small because of its poor approximation capability, and that it decreases with intermediate values of the two hyperparameters. However, when the neural networks contain more than 100 neurons in each layer, the error increases because of uncontrolled spurious zero-energy modes and the fact that we are looking for good local minima in a very high-dimensional space. On the other hand, when the VPINN is interpolated and the neural network is sufficiently rich, the error is constant and independent of the network dimension (see Fig. 11b). In addition, note that the average accuracy of an interpolated VPINN is better than that of its non-interpolated counterpart.
4.2. Application to nonlinear parametric problem
Let us now extend our analysis to nonlinear and parametric PDEs. Since in the previous section we observed that method M_B performs the best, in this example we do not consider M_C and M_D. We focus on the following problem:
| (4.4) |
It has been observed in [36] that considering constant or variable coefficients does not influence VPINN convergence. Hence, we choose , , and assume that the exact solution is
| (4.5) |
where p is a scalar parameter.
In order to train the VPINN to solve problem (4.4), we minimize the loss obtained by summing, over a finite set P_train of parameter values, the squared variational residuals associated with problem (4.4); when M_A is used, the boundary penalty term of (3.2) is added, whereas no additional term is required when M_B is used. Here r_{h,i}(·; p) denotes the residual obtained using the i-th test function and the parameter value p. In the numerical computations, the VPINN is trained on a small set P_train of parameter values, indicated by dots in Fig. 13.
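A schematic sketch of this parameter sweep is shown below; `residual_vector` is a placeholder for the assembly of the residuals r_{h,i}(·; p) (it is not part of a published API), and the network is assumed to take the parameter p as an additional input.

```python
import tensorflow as tf

def parametric_loss(model, P_train, residual_vector):
    """Parametric VPINN loss: the variational residuals are accumulated over the finite
    training set P_train of parameter values."""
    loss = tf.constant(0.0)
    for p in P_train:
        r = residual_vector(model, p)        # residual vector for the parameter value p
        loss += tf.reduce_sum(tf.square(r))
    return loss
```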
In Fig. 12, we report the behavior of the loss function and the average error:
where the average is taken over the parameter values in P_train. It is noted that the loss functions behave qualitatively similarly (see Fig. 12a). On the other hand, higher values of λ lead to lower errors when M_A is adopted, but the most stable and accurate approach remains M_B.
Figure 12.
(a) Training loss and (b) H1 error prediction for the VPINN. The first 10000 epochs are performed with a standard ADAM optimizer, the remaining ones with the BFGS optimizer. The exact solution is given in (4.5).
Moreover, when the VPINN is trained, it can be evaluated at arbitrary locations in the parameter domain, yielding the error plot shown in Fig. 13. Therein, for each dot and for each point belonging to the solid lines, given the parameter value represented on the horizontal axis, the value on the vertical axis represents the error between the VPINN solution and the exact solution. Note that dots are associated with parameter values that are chosen in P_train during the training, whereas solid lines are the predictions used to assess the accuracy of the models for intermediate values of p. Such lines thus show the error for values of the parameter not used during the training.
Figure 13.
H1 error for different parameter values in problem (4.4).
4.3. Deformation of an elastic body
We consider the deformation of a linear elastic solid in the region , which is subjected to a body force field f and Dirichlet boundary conditions imposed on . The elastostatic boundary-value problem is:
∇·σ(u) + f = 0  in Ω,   (4.6a)
u = g  on Γ_D,   (4.6b)
σ(u) = 2μ ε(u) + λ tr(ε(u)) I,   ε(u) = (∇u + ∇uᵀ)/2.   (4.6c)
In (4.6), σ is the Cauchy stress tensor, ε(u) is the small strain tensor and (4.6c) is the isotropic linear elastic constitutive relation. The Lamé parameters λ and μ are related to the Young modulus E and the Poisson ratio ν via
λ = Eν / [ (1 + ν)(1 − 2ν) ],   μ = E / [ 2(1 + ν) ].
For the numerical experiments, we choose fixed values of E and ν and the following body force field and boundary data:
The variational formulation of problem (4.6a) reads as: Find such that:
where is the natural lifting of the boundary data. Such a formulation is used to compute the quantity in (2.8), where the modified residuals
replace the ones defined in (2.7). For this and the subsequent test cases, we will also provide a comparison with the results obtained by a PINN, in order to give a more complete view of the performance of the methods. The modified residuals required in the PINN loss function are defined as:
Since the exact solution is not known, we produce a very accurate numerical solution for comparison (shown in Figs. 14a and 14b), using the open-source FEM solver FEniCS [46].
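For the PINN, a sketch of the componentwise collocation residual of the momentum balance (4.6a) with the constitutive law (4.6c) is given below; the displacement network is assumed to have two outputs, and the Lamé constants and body-force values are supplied by the user.

```python
import tensorflow as tf

def elasticity_residual(model, x, f_vals, lam, mu):
    """Collocation residual of (4.6a), div(sigma(u)) + f = 0, with the isotropic law
    sigma = lam*tr(eps)*I + 2*mu*eps. `model` maps (N, 2) points to (N, 2) displacements
    and `f_vals` is an (N, 2) tensor of body-force values."""
    with tf.GradientTape(persistent=True) as t2:
        t2.watch(x)
        with tf.GradientTape(persistent=True) as t1:
            t1.watch(x)
            u = model(x)
            u1, u2 = u[:, 0], u[:, 1]
        du1 = t1.gradient(u1, x)                 # (N, 2): derivatives of u_x
        du2 = t1.gradient(u2, x)                 # (N, 2): derivatives of u_y
        eps11, eps22 = du1[:, 0], du2[:, 1]      # small strain components
        eps12 = 0.5 * (du1[:, 1] + du2[:, 0])
        tr_eps = eps11 + eps22
        s11 = lam * tr_eps + 2.0 * mu * eps11    # stress components via (4.6c)
        s22 = lam * tr_eps + 2.0 * mu * eps22
        s12 = 2.0 * mu * eps12
    ds11 = t2.gradient(s11, x)
    ds22 = t2.gradient(s22, x)
    ds12 = t2.gradient(s12, x)
    div1 = ds11[:, 0] + ds12[:, 1]               # (div sigma)_x
    div2 = ds12[:, 0] + ds22[:, 1]               # (div sigma)_y
    return tf.stack([div1, div2], axis=1) + f_vals
```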
Figure 14.
Reference finite element displacement field solution for problem (4.6). The x-component (left) and y-component (right) of uh are shown.
Problem (4.6) is solved by training a VPINN on the mesh shown in Fig. 1c. Then, in order to compare the performance of the PINN and the VPINN, a standard PINN is trained to solve the same problem. In order to verify whether the distribution of the collocation points affects the PINN accuracy, we first train it by choosing as collocation points the interpolation nodes used in the VPINN training, and then we train it with the same number of uniformly distributed collocation points.
For these three models we analyze the error during the neural network training for a fixed training set dimension; we report the results in Figs. 15a–15c. Observing that Figs. 15b and 15c are very similar, we deduce that, in this case, the choice of control points in the PINN training is not strictly related to the efficacy of the different approaches.
Figure 15.
H1 error decay during the neural network training when solving problem (4.6). (a) VPINN error: H1 error of the most accurate solution is 0.020; (b) PINN error: model is trained with collocation points distributed on a Delaunay mesh and the H1 error of the most accurate solution is 0.070; and (c) PINN error: model is trained with collocation points from a uniform distribution and the H1 error of the most accurate solution is 0.047. The legend in (a) also applies to (b) and (c).
It can be observed that method M_B is always the most efficient approach and leads to convergence to more accurate solutions. Exactly imposing the Dirichlet boundary conditions via M_C can be considered a good alternative, since the solutions at convergence obtained with the VPINN and the PINN trained with random control points are very similar to the ones computed using M_B, although the convergence is slower. The most commonly used approach, M_A, is instead dependent on the choice of the non-trainable parameter λ. In this case, large values of λ ensure accurate solutions and acceptably efficient training phases, but the correct values are problem dependent and can often be found only after a potentially expensive tuning. Indeed, choosing the wrong values of λ can ruin the efficiency and the accuracy of the method, as can be observed in Fig. 15 when λ is too small. We also highlight that the performance of method M_D is very similar to that of method M_A when reasonable values of λ are chosen.
4.4. Stabilized Eikonal equation
In this section we consider the stabilized Eikonal equation, which is a nonlinear second-order PDE and reads as:
−ε Δu + |∇u| = f  in Ω,   u = g  on ∂Ω,   (4.7)
where ε is a small positive constant. Note that when f ≡ 1, g = 0 and ε → 0, the exact solution is the distance function to the boundary, and the problem can be efficiently solved by the fast sweeping method [47] or by the fast marching method [48]. In our numerical computations we set f ≡ 1 and g = 0, and we introduce a weak diffusivity with a small ε > 0 to guarantee uniqueness of the solution.
The PINN and VPINN residuals associated with problem (4.7) that extend the residuals in (2.3) and (2.7), respectively, are defined as:
and
We compute the VPINN and PINN numerical solutions as described in Section 4.3 and compute the corresponding errors using a finite element reference solution that is computed on a much finer mesh (see Fig. 16).
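A sketch of the corresponding PINN collocation residual, assuming the stabilized form −εΔu + |∇u| − f written above, is reported below; the value of ε is a placeholder.

```python
import tensorflow as tf

def eikonal_residual(model, x, f_vals, eps=0.01):
    """Collocation residual of the stabilized Eikonal equation: -eps*Laplacian(u) + |grad u| - f."""
    with tf.GradientTape() as t2:
        t2.watch(x)
        with tf.GradientTape() as t1:
            t1.watch(x)
            u = model(x)
        grad_u = t1.gradient(u, x)               # (N, d) gradient of u
    hess = t2.batch_jacobian(grad_u, x)          # (N, d, d) Hessian of u
    laplace_u = tf.linalg.trace(hess)
    return -eps * laplace_u + tf.norm(grad_u, axis=-1) - f_vals
```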
Figure 16.
Reference finite element solution for problem (4.7).
As in Fig. 15, in Fig. 17 we show the decay of the error during the training for the different methods. Again, it can be noted that the most accurate method is always M_B; M_A is a valid alternative provided λ is properly chosen. However, when the value of λ is not suitably chosen, convergence can be completely ruined (see, for instance, the corresponding curves in Figs. 17b and 17c) or a second-order optimizer is required to retain convergence. Moreover, similar convergence issues are present when M_C or M_D are employed.
Figure 17.
H1 error decay during the neural network training when solving problem (4.7). (a) VPINN error: H1 error of the most accurate solution is 0.021; (b) PINN error: model is trained with collocation points distributed on a Delaunay mesh and the H1 error of the most accurate solution is 0.085; and (c) PINN error: model is trained with collocation points from a uniform distribution and the H1 error of the most accurate solution is 0.029. The legend in (a) also applies to (b) and (c).
4.5. One-dimensional convection problem
As a final example, we consider a one-dimensional convection problem on the space-time domain . As discussed in [49], when solving such a hyperbolic PDE with PINN, possible failure modes may arise due to the very complex loss landscape. The model problem reads as:
∂u/∂t + β ∂u/∂x = 0  in Ω × (0, T].   (4.8)
Let us consider the boundary condition and the initial condition . The corresponding exact solution is . We solve problem (4.8) with the convection coefficient .
Given a set of collocation points , and a suitable set of space-time test functions , the PINN and VPINN residuals that are used to train the models are given by
and
respectively. When the boundary conditions are exactly imposed (i.e., when M_B or M_C are used), the approximate solution is constructed as in (3.3), i.e. as g̃ + ϕ w, where g̃ extends the initial and boundary data inside the space-time domain and ϕ is a function that vanishes on the Dirichlet boundary of the space-time domain. Note that, due to the simplicity of the spatial domain, there is no reason to distinguish between the ADFs used in M_B and M_C. Therefore, we just consider the same function ϕ in both approaches.
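For reference, the collocation residual of (4.8) only involves first derivatives; a minimal sketch is shown below, where the ordering of the space-time coordinates and the value of the convection coefficient beta are assumptions of the example.

```python
import tensorflow as tf

def convection_residual(model, xt, beta):
    """Collocation residual of (4.8): du/dt + beta * du/dx.
    `xt` is an (N, 2) tensor of space-time points ordered as (x, t)."""
    with tf.GradientTape() as tape:
        tape.watch(xt)
        u = model(xt)
    grad_u = tape.gradient(u, xt)                # columns: du/dx and du/dt
    return grad_u[:, 1] + beta * grad_u[:, 0]
```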
The numerical results obtained using the different approaches are presented in Fig. 18. In Fig. 18a, problem (4.8) is solved with the VPINN. In this case, M_A is slightly more accurate and efficient than M_B (or M_C, since they coincide here) if λ is chosen properly. However, when the value of λ is not optimal, the solution is significantly less accurate. Once more, method M_D is not competitive with the other approaches. On the other hand, when the PINN is considered, exactly imposing the boundary conditions ensures better accuracy and efficiency than using M_A, regardless of the value of λ (see Figs. 18b and 18c).
Figure 18.
H1 error decay during the neural network training when solving problem (4.8). (a) VPINN error: H1 error of the most accurate solution is 0.077; (b) PINN error: model is trained with collocation points distributed on a Delaunay mesh and the H1 error of the most accurate solution is 0.125; and (c) PINN error: model is trained with collocation points from a uniform distribution and the H1 error of the most accurate solution is 0.051. The legend in (a) also applies to (b) and (c).
5. Conclusions
In this paper, we analyzed the formulation and the performance of four different approaches to enforce Dirichlet boundary conditions in PINNs and VPINNs on arbitrary polygonal domains. In the first approach, which is the most commonly used when training PINNs, the boundary conditions are imposed by means of additional terms in the loss function that penalize the discrepancy between the neural network output and the prescribed boundary conditions. The subsequent two approaches exactly enforce the boundary conditions and differ in the way they modify the model output in order to force it to satisfy the desired conditions. The last approach, which can be used only when the loss function is derived from the weak formulation of the PDE, is based on Nitsche's method and enforces the boundary conditions variationally.
We have shown that M_B and M_D, for the considered second-order elliptic PDEs, always ensure the theoretically predicted convergence rate with respect to mesh refinement, regardless of the value of the involved parameter. Instead, methods M_A and M_C ensure it only if the exact solution is not characterized by an intense oscillatory behavior.
In general, we observed that the most efficient and accurate approach is the one introduced in [38] (method M_B), which is based on the use of a class of approximate distance functions. A variant of this approach (method M_C) leads to suboptimal results and may even ruin the convergence of the method (as in Fig. 17c). Imposing the boundary conditions via an additional cost (method M_A) can be considered a valid alternative, but the choice of the additional penalization parameter is crucial, because wrong values can prevent convergence to the correct solution or dramatically slow down the training. In the proposed numerical experiments we fixed the penalization parameter. As discussed in the Introduction, we highlight that it is possible to tune it during training, but we chose to fix it in order to compare non-intrusive methods with simple implementations. Finally, we observed that Nitsche's method (method M_D) is in some cases similar to M_A with an acceptable value of λ, while in other cases it requires a second-order optimizer to converge to the correct solution.
Among possible extensions of this work, we mention applications to high-dimensional PDEs over complex geometries, where we expect methods M_B and M_C to be even more efficient than their alternatives. In fact, such methods can enforce the correct conditions on each portion of the boundary, whereas methods M_A and M_D are likely to be less robust and efficient.
CRediT authorship contribution statement
S. Berrone; C. Canuto; N. Sukumar: Conceived and designed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data.
M. Pintore: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
CC, SB and MP performed this research in the framework of the Italian MIUR Award “Dipartimenti di Eccellenza 2018-2022” granted to the Department of Mathematical Sciences, Politecnico di Torino (CUP: E11G18000350001). The research leading to this paper has also been partially supported by the SmartData@PoliTO center for Big Data and Machine Learning technologies. SB was supported by the Italian MIUR PRIN Project 201744KLJL-004, CC was supported by the Italian MIUR PRIN Project 201752HKH8-003. CC, SB and MP are members of the Italian INdAM-GNCS research group.
Contributor Information
S. Berrone, Email: stefano.berrone@polito.it.
C. Canuto, Email: claudio.canuto@polito.it.
M. Pintore, Email: moreno.pintore@polito.it.
N. Sukumar, Email: nsukumar@ucdavis.edu.
Appendix A. On the Laplacian of the approximate distance function
In [38], the issue of the blowing-up of the Laplacian of the ADF ϕ in (3.4) is discussed. Herein, we illustrate the same for the simple setting in which Γ_D is composed of two edges that intersect.
Let Ω be the non-negative quadrant and let us consider the (semi-infinite) segments Γ₁ = {(x, 0) : x ≥ 0} and Γ₂ = {(0, y) : y ≥ 0}. For the sake of simplicity, we consider the exact distance functions ϕ₁(x, y) = y and ϕ₂(x, y) = x. Let us compute the ADF of order m = 1 to the boundary Γ_D = Γ₁ ∪ Γ₂. Substituting in (3.4), ϕ can be written as:
ϕ(x, y) = xy / (x + y).
The gradient of ϕ is:
∇ϕ(x, y) = ( y² / (x + y)²,  x² / (x + y)² ).
Note that the gradient is bounded on Ω away from the origin; in particular, along the straight line y = αx one has
∇ϕ = ( α² / (1 + α)²,  1 / (1 + α)² ),
where α is a strictly positive constant, i.e. ϕ is an ADF of order 1 and ∇ϕ is bounded along any straight line intersecting the origin and entering the domain. The Laplacian of ϕ is:
Δϕ(x, y) = −2 (x² + y²) / (x + y)³.
Consider the limit at the origin along the line y = αx, for α > 0:
lim_{x→0⁺} Δϕ(x, αx) = lim_{x→0⁺} −2 (1 + α²) / [ (1 + α)³ x ] = −∞.
Therefore, Δϕ is unbounded at the origin.
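This computation can be verified symbolically; the following snippet, which assumes the order-1 ADF ϕ = xy/(x + y) derived above, reproduces the Laplacian and its blow-up along the line y = kx.

```python
import sympy as sp

x, y, k = sp.symbols("x y k", positive=True)
phi = x * y / (x + y)                            # order-1 ADF of the corner, from (3.4) with m = 1
laplacian = sp.simplify(sp.diff(phi, x, 2) + sp.diff(phi, y, 2))
print(laplacian)                                 # equivalent to -2*(x**2 + y**2)/(x + y)**3
print(sp.limit(laplacian.subs(y, k * x), x, 0, dir="+"))   # -oo: blow-up at the origin along y = k*x
```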
For m = 2, the function ϕ is:
ϕ(x, y) = xy / √(x² + y²).
Its gradient is:
∇ϕ(x, y) = ( y³ / (x² + y²)^{3/2},  x³ / (x² + y²)^{3/2} ),
which in polar coordinates (x = r cos θ, y = r sin θ) is expressed as:
∇ϕ = ( sin³θ,  cos³θ ).
In polar coordinates, we can write
ϕ = r cos θ sin θ = (r/2) sin(2θ).
Similarly,
Δϕ = ∂²ϕ/∂r² + (1/r) ∂ϕ/∂r + (1/r²) ∂²ϕ/∂θ²
holds, which implies:
Δϕ = (1/(2r)) sin(2θ) − (2/r) sin(2θ) = −(3/(2r)) sin(2θ).
Thus, as in the case m = 1, |Δϕ| → ∞ when r → 0 for every direction θ ∈ (0, π/2).
We point out that, for y = 0 and x = 0 respectively, we note that:
∂ϕ/∂y |_{y=0} = 1,  ∂²ϕ/∂y² |_{y=0} = 0   and   ∂ϕ/∂x |_{x=0} = 1,  ∂²ϕ/∂x² |_{x=0} = 0.
Therefore, ϕ is an ADF that is normalized up to order 2.
Data availability statement
Data will be made available on request.
References
1. Raissi M., Perdikaris P., Karniadakis G. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019;378:686–707.
2. Lagaris I.E., Likas A., Fotiadis D.I. Artificial neural network methods in quantum mechanics. Comput. Phys. Commun. 1997;104:1–14.
3. Lagaris I.E., Likas A., Fotiadis D.I. Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 1998;9:987–1000. doi: 10.1109/72.712178.
4. Lagaris I.E., Likas A.C., Papageorgiou D.G. Neural-network methods for boundary value problems with irregular boundaries. IEEE Trans. Neural Netw. 2000;11:1041–1049. doi: 10.1109/72.870037.
5. Chen Y., Lu L., Karniadakis G.E., Negro L.D. Physics-informed neural networks for inverse problems in nano-optics and metamaterials. Opt. Express. 2020;28:11618–11633. doi: 10.1364/OE.384875.
6. Guo Q., Zhao Y., Lu C., Luo J. High-dimensional inverse modeling of hydraulic tomography by physics informed neural network (HT-PINN). J. Hydrol. 2023;616.
7. Mishra S., Molinaro R. Estimates on the generalization error of physics-informed neural networks for approximating a class of inverse problems for PDEs. IMA J. Numer. Anal. 2021.
8. Gao H., Sun L., Wang J.-X. PhyGeoNet: physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state PDEs on irregular domain. J. Comput. Phys. 2021;428.
9. Han J., Jentzen A., Weinan E. Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 2018;115:8505–8510. doi: 10.1073/pnas.1718942115.
10. Lanthaler S., Mishra S., Karniadakis G.E. Error estimates for DeepONets: a deep learning framework in infinite dimensions. Trans. Math. Appl. 2022;6.
11. Jiang X., Wang D., Fan Q., Zhang M., Lu C., Tao Lau A.P. Solving the nonlinear Schrödinger equation in optical fibers using physics-informed neural network. In: 2021 Optical Fiber Communications Conference and Exhibition (OFC); 2021. pp. 1–3.
12. Chen Z., Liu Y., Sun H. Physics-informed learning of governing equations from scarce data. Nat. Commun. 2021;12:1–13. doi: 10.1038/s41467-021-26434-1.
13. Weinan E., Yu B. The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 2018;6:1–12.
14. Jagtap A.D., Kharazmi E., Karniadakis G.E. Conservative physics-informed neural networks on discrete domains for conservation laws: applications to forward and inverse problems. Comput. Methods Appl. Mech. Eng. 2020;365.
15. Jagtap A.D., Karniadakis G.E. Extended physics-informed neural networks (XPINNs): a generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Commun. Comput. Phys. 2020;28:2002–2041.
16. Kharazmi E., Zhang Z., Karniadakis G. VPINNs: variational physics-informed neural networks for solving partial differential equations. arXiv preprint arXiv:1912.00873; 2019.
17. Kharazmi E., Zhang Z., Karniadakis G. hp-VPINNs: variational physics-informed neural networks with domain decomposition. Comput. Methods Appl. Mech. Eng. 2021;374.
18. De Ryck T., Jagtap A., Mishra S. Error estimates for physics-informed neural networks approximating the Navier-Stokes equations. IMA J. Numer. Anal. 2023.
19. De Ryck T., Mishra S. Error analysis for physics informed neural networks (PINNs) approximating Kolmogorov PDEs. Adv. Comput. Math. 2022;48.
20. Demo N., Strazzullo M., Rozza G. An extended physics informed neural network for preliminary analysis of parametric optimal control problems. arXiv preprint arXiv:2110.13530; 2021.
21. Hu R., Lin Q., Raydan A., Tang S. Higher-order error estimates for physics-informed neural networks approximating the primitive equations. arXiv preprint arXiv:2209.11929; 2022.
22. Pu J., Li J., Chen Y. Solving localized wave solutions of the derivative nonlinear Schrödinger equation using an improved PINN method. Nonlinear Dyn. 2021;105:1723–1739.
23. Sirignano J., Spiliopoulos K. DGM: a deep learning algorithm for solving partial differential equations. J. Comput. Phys. 2018;375:1339–1364.
24. Tartakovsky A., Marrero C., Perdikaris P., Tartakovsky G., Barajas-Solano D. Learning parameters and constitutive relationships with physics informed deep neural networks. arXiv preprint arXiv:1808.03398; 2018.
25. Yang L., Meng X., Karniadakis G. B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. J. Comput. Phys. 2021;425.
26. Zhu Y., Zabaras N., Koutsourelakis P., Perdikaris P. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. J. Comput. Phys. 2019;394:56–81.
27. Beck C., Hutzenthaler M., Jentzen A., Kuckuck B. An overview on deep learning-based approximation methods for partial differential equations. Discrete Contin. Dyn. Syst., Ser. B. 2022.
28. Cuomo S., Di Cola V.S., Giampaolo F., Rozza G., Raissi M., Piccialli F. Scientific machine learning through physics-informed neural networks: where we are and what's next. J. Sci. Comput. 2022;92.
29. Lawal Z., Yassin H., Lai D., Che Idris A. Physics-informed neural network (PINN) evolution and beyond: a systematic literature review and bibliometric analysis. Big Data Cogn. Comput. 2022;6.
30. McClenny L.D., Braga-Neto U.M. Self-adaptive physics-informed neural networks. J. Comput. Phys. 2023;474.
31. Wang S., Teng Y., Perdikaris P. Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM J. Sci. Comput. 2021;43:A3055–A3081.
32. Wight C.L., Zhao J. Solving Allen-Cahn and Cahn-Hilliard equations using the adaptive physics informed neural networks. Commun. Comput. Phys. 2021;29:930–954.
33. Tang K., Wan X., Liao Q. Adaptive deep density approximation for Fokker-Planck equations. J. Comput. Phys. 2022;457.
34. Feng X., Zeng L., Zhou T. Solving time dependent Fokker-Planck equations via temporal normalizing flow. arXiv preprint arXiv:2112.14012; 2021.
35. Wang S., Yu X., Perdikaris P. When and why PINNs fail to train: a neural tangent kernel perspective. J. Comput. Phys. 2022;449.
36. Berrone S., Canuto C., Pintore M. Variational physics informed neural networks: the role of quadratures and test functions. J. Sci. Comput. 2022;92:1–27.
37. Taylor J.M., Pardo D., Muga I. A deep Fourier residual method for solving PDEs using neural networks. Comput. Methods Appl. Mech. Eng. 2023;405.
38. Sukumar N., Srivastava A. Exact imposition of boundary conditions with distance functions in physics-informed deep neural networks. Comput. Methods Appl. Mech. Eng. 2022;389.
39. Biswas A., Shapiro V. Approximate distance fields with non-vanishing gradients. Graph. Models. 2004;66:133–159.
40. Nitsche J.A. Über ein Variationsprinzip zur Lösung von Dirichlet-Problemen bei Verwendung von Teilräumen, die keinen Randbedingungen unterworfen sind. Abh. Math. Semin. Univ. Hamb. 1971;36:9–15.
41. Kingma D.P., Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980; 2014.
42. Wright S., Nocedal J. Numerical Optimization, vol. 35; 1999.
43. Abadi M., et al. TensorFlow: large-scale machine learning on heterogeneous systems; 2015. Software available from http://tensorflow.org/.
44. Baydin A.G., Pearlmutter B.A., Radul A.A., Siskind J.M. Automatic differentiation in machine learning: a survey. J. Mach. Learn. Res. 2018;18.
45. Berrone S., Canuto C., Pintore M. Solving PDEs by variational physics-informed neural networks: an a posteriori error analysis. Ann. Univ. Ferrara. 2022;68:575–595.
46. Alnaes M.S., Blechta J., Hake J., Johansson A., Kehlet B., Logg A., Richardson C., Ring J., Rognes M.E., Wells G.N. The FEniCS project version 1.5. Arch. Numer. Softw. 2015;3.
47. Zhao H. A fast sweeping method for Eikonal equations. Math. Comput. 2005;74:603–627.
48. Sethian J.A. Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science, vol. 3. Cambridge University Press; 1999.
49. Krishnapriyan A., Gholami A., Zhe S., Kirby R., Mahoney M.W. Characterizing possible failure modes in physics-informed neural networks. Adv. Neural Inf. Process. Syst. 2021;34:26548–26560.