. 2020 May 22;12139:158–171. doi: 10.1007/978-3-030-50420-5_12

Design of Loss Functions for Solving Inverse Problems Using Deep Learning

Jon Ander Rivera 15,16,, David Pardo 15,16,17, Elisabete Alberdi 15
Editors: Valeria V Krzhizhanovskaya8, Gábor Závodszky9, Michael H Lees10, Jack J Dongarra11, Peter M A Sloot12, Sérgio Brissos13, João Teixeira14
PMCID: PMC7304019

Abstract

Solving inverse problems is a crucial task in several applications that strongly affect our daily lives, including multiple engineering fields, military operations, and energy production. There exist different methods for solving inverse problems, including gradient-based methods, statistics-based methods, and Deep Learning (DL) methods. In this work, we focus on the latter. Specifically, we study the design of proper loss functions for dealing with inverse problems using DL. To do this, we introduce a simple benchmark problem with known analytical solution. Then, we propose multiple loss functions and compare their performance when applied to our benchmark example problem. In addition, we analyze how to improve the approximation of the forward function by: (a) considering a Hermite-type interpolation loss function, and (b) reducing the number of samples for the forward training in the Encoder-Decoder method. Results indicate that a correct design of the loss function is crucial to obtain accurate inversion results.

Keywords: Deep learning, Inverse problems, Neural network

Introduction

Solving inverse problems [17] is of paramount importance to our society. It is essential in, among others, most areas of engineering (see, e.g., [3, 5]), health (see, e.g., [1]), military operations (see, e.g., [4]), and energy production (see, e.g., [11]). In multiple applications, it is necessary to perform this inversion in real time. This is the case, for example, of geosteering operations for enhanced hydrocarbon extraction [2, 10].

Traditional methods for solving inverse problems include gradient-based methods [13, 14] and statistics-based methods (e.g., Bayesian methods [16]). The main limitation of these kinds of methods is that they lack an explicit construction of the pseudo-inverse operator. Instead, they only evaluate the inverse function for a given set of measurements. Thus, for each set of measurements, we need to perform a new inversion process. This may be time consuming.

Deep Learning (DL) seems to be a proper alternative to overcome the aforementioned problem. With DL methods, we explicitly build the pseudo-inverse operator rather than only evaluating it. Recently, the interest in performing inversion using DL techniques has grown exponentially (see, e.g., [9, 15, 18, 19]). However, the design of these methods is still somewhat ad hoc, and it is often difficult to find a comprehensive road map for constructing robust Deep Neural Networks (DNNs) for solving inverse problems.

One major problem when designing DNNs is error control. Several factors may lead to deficient results, including: poor loss function design, inadequate architecture, lack of convergence of the optimizer employed for training, and unsatisfactory database selection. Moreover, it is sometimes elusive to identify the specific cause of poor results. Furthermore, it is often difficult to assess the quality of the results and, in particular, to determine whether they can be improved.

In this work, we take a simple but enlightening approach to elucidate and design certain components of a DL algorithm for solving inverse problems. Our approach consists of selecting a simple inverse benchmark example with known analytical solution. By doing so, we are able to evaluate and quantify the effect of different DL design considerations on the inversion results. Specifically, we focus on analyzing a proper selection of the loss function and how it affects the results. While more complex problems may face additional difficulties, those observed with the considered simple example are common to all inverse problems.

The remainder of this article is organized as follows. Section 2 describes our simple model inverse benchmark problem. Section 3 introduces several possible loss functions. Section 4 shows numerical results. Finally, Sect. 5 summarizes the main findings.

Simple Inverse Benchmark Problem

We consider a benchmark problem with known analytical solution. Let $\mathcal{F}$ be the forward function and $\mathcal{I}$ the pseudo-inverse operator. We want our benchmark problem to have more than one solution, since this is one of the typical features exhibited by inverse problems. For that, we need $\mathcal{F}$ to be non-injective. We select the non-injective function $\mathcal{F}(x) = x^2$, whose pseudo-inverse has two possible solutions: $\mathcal{I}(y) = \pm\sqrt{y}$ (see Fig. 1). The objective is to design a NN that approximates one of the solutions of the inverse problem.
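As a minimal sketch of this setting (the quadratic forward function is the one consistent with the two-branch pseudo-inverse and the output interval [0, 1089] used later; the function names are ours, not the paper's):

```python
import math

# Benchmark forward function F(x) = x**2: non-injective on a symmetric
# domain, so the inverse problem has two valid solutions for every y > 0.
def forward(x):
    return x ** 2

# The two pseudo-inverse branches I(y) = +sqrt(y) and I(y) = -sqrt(y).
def inverse_pos(y):
    return math.sqrt(y)

def inverse_neg(y):
    return -math.sqrt(y)
```

Both branches satisfy $\mathcal{F}(\mathcal{I}(y)) = y$; a NN approximating the pseudo-inverse should recover one of them.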

Fig. 1. Benchmark problem.

Database and Data Rescaling

We consider the domain $[-33, 33]$. In it, we select a set of 1000 equidistant numbers $x_i$. The corresponding dataset of input-output pairs $\{(x_i, \mathcal{F}(x_i))\}$ is computed analytically.

In some cases, we perform a change of coordinates in our output dataset. Let $R$ denote the linear mapping from the range of the original output dataset onto the interval [0, 1]. Instead of approximating the function $\mathcal{F}$, our NN will approximate the function $\mathcal{F}_R$ given by

$\mathcal{F}_R := R \circ \mathcal{F}$.  (1)

In the cases where we perform no rescaling, we select $R = \mathrm{Id}$, where $\mathrm{Id}$ is the identity mapping.
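Such a linear rescaling can be sketched as follows (function and variable names are ours, for illustration only):

```python
def make_rescaling(y_min, y_max):
    """Return the linear map R sending [y_min, y_max] onto [0, 1]."""
    span = y_max - y_min
    def R(y):
        return (y - y_min) / span
    return R

# Example: rescale outputs lying in [0, 1089] (the output interval that
# appears later in the text) onto [0, 1].
R = make_rescaling(0.0, 1089.0)
```

Training on rescaled outputs keeps the NN targets in a well-conditioned range; the identity map corresponds to `make_rescaling(0.0, 1.0)` applied to already-normalized data, or simply to skipping the rescaling step.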

Loss Functions

We consider different loss functions. The objective here is to discern between adequate and poor loss functions for solving the proposed inverse benchmark problem.

We denote by $\mathcal{F}_\varphi$ and $\mathcal{I}_\theta$ the NN approximations of the forward function and the pseudo-inverse operator, respectively. The weights $\varphi$ and $\theta$ are the parameters to be trained (optimized) in the NNs. Each value within the set of weights is a real number.

In a NN, we try to find the weights $\varphi$ and $\theta$ that minimize a given loss function $L$. We express our problem mathematically as

$(\varphi^*, \theta^*) = \arg\min_{\varphi, \theta} L(\varphi, \theta)$.  (2)

Loss Based on the Misfit of the Inverse Data: We first consider the traditional loss function:

$L_{\mathcal{I}}(\theta) = \left\| x - \mathcal{I}_\theta(\mathcal{F}(x)) \right\|$,  (3)

where $\|\cdot\|$ denotes either the $\ell_1$ or the $\ell_2$ norm evaluated over the training dataset.

Theorem 1

The minimization problem (2) with the loss function given by Eq. (3) admits an analytical solution for our benchmark problem in both the $\ell_1$ norm,

$\mathcal{I}^*_{\ell_1} = \arg\min_{\mathcal{I}_\theta} \sum_{x_i \in \mathcal{D}} \left| x_i - \mathcal{I}_\theta(\mathcal{F}(x_i)) \right|$,  (4)

and the $\ell_2$ norm,

$\mathcal{I}^*_{\ell_2} = \arg\min_{\mathcal{I}_\theta} \sum_{x_i \in \mathcal{D}} \left( x_i - \mathcal{I}_\theta(\mathcal{F}(x_i)) \right)^2$.  (5)

These solutions are such that:

  • For $\ell_1$, $\mathcal{I}^*_{\ell_1}(y) = x$, where $x$ is any value in the interval $\left[ -\sqrt{y}, \sqrt{y} \right]$.

  • For $\ell_2$, $\mathcal{I}^*_{\ell_2}(y) = 0$.

Proof

We first focus on the $\ell_1$ norm. We minimize the loss function:

$L_{\mathcal{I}}(\theta) = \sum_{x_i \in \mathcal{D}} \left| x_i - \mathcal{I}_\theta(\mathcal{F}(x_i)) \right|$,  (6)

where $\mathcal{D}$ denotes the training dataset. For the exact pseudo-inverse operator $\mathcal{I}$, each pair of addends of (6) associated with the symmetric samples $x = \pm\sqrt{y}$ can be expressed as follows:

$\left| \sqrt{y} - \mathcal{I}_\theta(y) \right| + \left| -\sqrt{y} - \mathcal{I}_\theta(y) \right| = 2\sqrt{y} \quad \text{for all } \mathcal{I}_\theta(y) \in \left[ -\sqrt{y}, \sqrt{y} \right]$.  (7)

Taking the derivative of Eq. (6) with respect to $\mathcal{I}_\theta(y)$, we see in view of Eq. (7) that the loss function for the exact solution attains its minimum at every point $\mathcal{I}_\theta(y) \in \left[ -\sqrt{y}, \sqrt{y} \right]$.

In the case of the $\ell_2$ norm, for each value of $y$ we want to minimize:

$\left( \sqrt{y} - \mathcal{I}_\theta(y) \right)^2 + \left( -\sqrt{y} - \mathcal{I}_\theta(y) \right)^2$.  (8)

Again, for the exact pseudo-inverse operator $\mathcal{I}$, we can rewrite Eq. (8) as:

$\left( \sqrt{y} - \mathcal{I}_\theta(y) \right)^2 + \left( -\sqrt{y} - \mathcal{I}_\theta(y) \right)^2 = 2y + 2\,\mathcal{I}_\theta(y)^2$.  (9)

Taking the derivative of Eq. (8) with respect to $\mathcal{I}_\theta(y)$ and setting it equal to zero, we obtain:

$\frac{\partial}{\partial\, \mathcal{I}_\theta(y)} \left( 2y + 2\,\mathcal{I}_\theta(y)^2 \right) = 4\,\mathcal{I}_\theta(y) = 0 \;\Longrightarrow\; \mathcal{I}_\theta(y) = 0$.  (10)

Thus, the function is minimized when the approximated value is 0. $\square$

Observation: The problem of Theorem 1 has infinitely many solutions in the $\ell_1$ norm. In the $\ell_2$ norm, the solution is unique; however, it differs from both of the desired exact inverse solutions.
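This observation can be checked numerically. For a single pair of symmetric samples $x = \pm\sqrt{y}$, we scan constant predictions $c$ (a toy stand-in for the value $\mathcal{I}_\theta(y)$ returned by the NN; the helper names below are ours):

```python
def l1_pair_loss(c, r):
    # |r - c| + |-r - c| for the two samples x = +r and x = -r, r = sqrt(y)
    return abs(r - c) + abs(-r - c)

def l2_pair_loss(c, r):
    # (r - c)^2 + (-r - c)^2 for the same pair of samples
    return (r - c) ** 2 + (-r - c) ** 2

r = 3.0  # i.e., y = 9
# l1: flat minimum of value 2*r on the whole interval [-r, r]
# l2: unique minimum at c = 0, of value 2*y
```

The $\ell_1$ loss is constant on $[-r, r]$, so any value in that interval is optimal, while the $\ell_2$ loss equals $2y + 2c^2$ and is minimized only at $c = 0$, exactly as Theorem 1 states.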

Loss Based on the Misfit of the Effect of the Inverse Data: As seen with the previous loss function, it is inadequate to look at the misfit in the inverted space. Rather, it is desirable to search for an inverse solution such that, after applying the forward operator, we recover our original input. Thus, we consider the following modified loss function, where $\mathcal{F}$ corresponds to the analytic forward function:

$L_{\mathcal{F}\circ\mathcal{I}}(\theta) = \left\| y - \mathcal{F}(\mathcal{I}_\theta(y)) \right\|$.  (11)

Unfortunately, the computation of $\mathcal{F}(\mathcal{I}_\theta(y))$ required in $L_{\mathcal{F}\circ\mathcal{I}}$ involves either (a) implementing $\mathcal{F}$ on a GPU, which may be challenging in more complex examples, or (b) calling $\mathcal{F}$ as a CPU function multiple times during the training process. Both options may slow down the training process considerably, up to the point of making it impractical.
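A sketch of this loss in the $\ell_2$ sense, with the analytic forward function passed as a plain callable (names are our own; in realistic settings every call to `forward` is an expensive simulation, which is precisely the limitation described above):

```python
import math

def loss_forward_of_inverse(inv_net, ys, forward):
    # L_{F o I}: misfit between y and F(I_theta(y)), summed over the data.
    # Each term evaluates `forward`, which may be a costly CPU-side solver.
    return sum((y - forward(inv_net(y))) ** 2 for y in ys)

# With the exact branch I(y) = sqrt(y), the loss vanishes.
ys = [0.0, 1.0, 4.0, 9.0]
loss = loss_forward_of_inverse(math.sqrt, ys, lambda x: x ** 2)
```

Note that either pseudo-inverse branch drives this loss to zero, which is why it does not suffer from the branch-averaging pathology of Eq. (3).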

Encoder-Decoder Loss: To overcome the computational problems associated with Eq. (11), we introduce an additional NN, named $\mathcal{F}_\varphi$, to approximate the forward function. Then, we propose the following loss function:

$L_{E}(\varphi, \theta) = \left\| \mathcal{F}_\varphi(x) - y \right\| + \left\| \mathcal{F}_\varphi(\mathcal{I}_\theta(y)) - y \right\|$.  (12)

Two NNs of this type that are trained simultaneously are often referred to as an Encoder-Decoder [6, 12].
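The two terms of Eq. (12) can be sketched with plain callables standing in for the two networks (an illustrative reconstruction in the $\ell_2$ sense, not the paper's implementation):

```python
import math

def encoder_decoder_loss(F_net, I_net, xs, forward):
    ys = [forward(x) for x in xs]
    # Term 1: F_net must fit the sampled forward data (x, y).
    t1 = sum((F_net(x) - y) ** 2 for x, y in zip(xs, ys))
    # Term 2: composing F_net with I_net must reproduce the input y.
    t2 = sum((F_net(I_net(y)) - y) ** 2 for y in ys)
    return t1 + t2
```

With $\mathcal{F}_\varphi(x) = x^2$ and either branch $\mathcal{I}_\theta(y) = \pm\sqrt{y}$, both terms vanish. Note that the second term alone would also vanish for a badly fitted forward network, which is exactly the failure mode studied later with a reduced number of forward samples.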

Two-Steps Loss: It is also possible to train $\mathcal{F}_\varphi$ and $\mathcal{I}_\theta$ separately. By doing so, we diminish the training cost. At the same time, it allows us to analyze the two NNs separately, which may simplify the detection of errors specific to one of the networks. Our loss functions are:

$L_{\mathcal{F}_\varphi}(\varphi) = \left\| \mathcal{F}_\varphi(x) - y \right\|$  (13)

and

$L_{\mathcal{I}_\theta}(\theta) = \left\| \mathcal{F}_{\varphi^*}(\mathcal{I}_\theta(y)) - y \right\|$.  (14)

We first train $\mathcal{F}_\varphi$ using $L_{\mathcal{F}_\varphi}$. Once $\mathcal{F}_\varphi$ is fixed (with weights $\varphi^*$), we train $\mathcal{I}_\theta$ using $L_{\mathcal{I}_\theta}$.
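The one-way coupling can be illustrated with one-parameter "networks" trained by plain gradient descent (a toy sketch: the models $\mathcal{F}_w(x) = w x^2$, $\mathcal{I}_v(y) = v\sqrt{y}$ and all hyperparameters below are our own illustrative choices, not the paper's):

```python
import math

def gradient_descent(grad, w0, lr, steps):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

xs = [i / 10 for i in range(1, 11)]
ys = [x * x for x in xs]

# Step 1: fit F_w(x) = w*x^2 to the data; gradient of sum (w*x^2 - y)^2.
grad_F = lambda w: sum(2 * (w * x * x - y) * x * x for x, y in zip(xs, ys))
w_star = gradient_descent(grad_F, 0.0, lr=0.05, steps=500)

# Step 2: freeze w_star and fit I_v(y) = v*sqrt(y) through the frozen
# forward model; gradient of sum (F_{w*}(I_v(y)) - y)^2.
grad_I = lambda v: sum(4 * w_star * v * y * (w_star * v * v * y - y)
                       for y in ys)
v_star = gradient_descent(grad_I, 0.5, lr=0.05, steps=500)
# v_star approaches +1 or -1, i.e., one branch of the pseudo-inverse.
```

The key point is the sequencing: the inverse model only ever sees the frozen forward model, so errors in step 2 cannot be blamed on (or absorbed by) the forward approximation.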

Numerical Results

We consider two different NNs. The one approximating the forward function has 5 fully connected layers [8] with the ReLU activation function [7]. The one approximating the inverse operator has 11 fully connected layers with the ReLU activation function, which is defined as

$\mathrm{ReLU}(x) = \max(0, x)$.  (15)
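A fully connected network of this kind reduces to repeated affine maps followed by ReLU applications. The following pure-Python sketch of the forward pass is illustrative only (layer shapes and names are ours; the paper does not detail its implementation):

```python
def relu(x):
    return max(0.0, x)

def mlp_forward(x, layers):
    """Fully connected net. `layers` is a list of (weight_matrix, bias_vector)
    pairs; ReLU is applied on hidden layers, the output layer is linear."""
    h = x
    for k, (W, b) in enumerate(layers):
        z = [sum(wij * hj for wij, hj in zip(row, h)) + bi
             for row, bi in zip(W, b)]
        h = z if k == len(layers) - 1 else [relu(v) for v in z]
    return h
```

Stacking 5 or 11 such layers reproduces the two architectures described above.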

These NN architectures are overkill for approximating the simple benchmark problem studied in this work. Moreover, we also obtained results for different NN architectures, leading to identical conclusions, which we omit here for brevity.

Loss Function Analysis

Loss Based on the Misfit of the Inverse Data: We produce two models using the $\ell_1$ and $\ell_2$ norms, respectively. Figure 2 shows the expected disappointing results (see Theorem 1). The approximated NN values (green circles) are far from the true solution (blue line). From an engineering point of view, the recovered solution is worthless. The problem resides in the selection of the loss function.

Fig. 2. Predicted ($\mathcal{I}_\theta$, green circles) vs exact ($\mathcal{I}$, blue line) inverse solutions evaluated over the testing dataset. (Color figure online)

Loss Based on the Misfit of the Effect of the Inverse Data: Figure 3 shows the real values of $y$ (ground truth) vs their predicted pseudo-inverse values. The closer the predicted values are to the blue line, the better the result from the NN. We now observe an excellent match between the exact and approximated solutions. However, as mentioned in Sect. 3, this loss function entails essential limitations when considering complex problems.

Fig. 3. Solution of the pseudo-inverse operator approximated by the NN. (Color figure online)

Encoder-Decoder Loss: Figure 4 shows the results for the $\ell_1$ norm and Fig. 5 for the $\ell_2$ norm. We again recover excellent results, without the limitations of loss function $L_{\mathcal{F}\circ\mathcal{I}}$. Coincidentally, the two norms recover different solution branches of the inverse problem. Note that in this problem, it is possible to prove that the probability of recovering either solution branch is identical.

Fig. 4. Exact vs NN solutions using loss function $L_E$ and the $\ell_1$ norm.

Fig. 5. Exact vs NN solutions using loss function $L_E$ and the $\ell_2$ norm.

Two-Steps Loss: Figures 6 and 7 show the results for the $\ell_1$ and $\ell_2$ norms, respectively. The approximations of the forward function and the pseudo-inverse operator are accurate in both cases.

Fig. 6. Exact vs NN solutions using loss functions $L_{\mathcal{F}_\varphi}$ and $L_{\mathcal{I}_\theta}$ and the $\ell_1$ norm.

Fig. 7. Exact vs NN solutions using loss functions $L_{\mathcal{F}_\varphi}$ and $L_{\mathcal{I}_\theta}$ and the $\ell_2$ norm.

Hermite-Type Loss Functions

We now consider the two-steps loss function and focus only on the forward function approximation given by Eq. (13). This is frequently the most time-consuming part when solving an inverse problem with NNs. In this section, we analyze different strategies for working with a reduced dataset, which entails a dramatic reduction of the computational cost. We consider a dataset of three input-output pairs $\{(x_i, y_i)\}_{i=1}^{3}$.

Figure 8 shows the results for the $\ell_1$ and $\ell_2$ norms. The training data points are accurately approximated; all other points are poorly approximated.

Fig. 8. Results of the NN that approximates the forward function. Red points correspond to the evaluation over the training dataset $x$ and to $\mathcal{F}(x)$. Green points correspond to the evaluation over a testing dataset. (Color figure online)

To improve the approximation, we introduce an additional term into the loss function: we force the NN to approximate the derivatives at each training point. This new loss is:

$L_{H}(\varphi) = \left\| \mathcal{F}_\varphi(x) - y \right\| + \left\| \dfrac{\partial \mathcal{F}_\varphi}{\partial x}(x) - \dfrac{\partial \mathcal{F}}{\partial x}(x) \right\|$.  (16)

From a numerical point of view, the term that approximates the first derivatives can be very useful. If we think of $x$ as a parameter of a Partial Differential Equation (PDE), we can efficiently evaluate the derivatives via the adjoint problem.

Figure 9 shows the results when we use the $\ell_1$ and $\ell_2$ norms for training. For this benchmark problem, we select a small step size $\varepsilon$. Thus, to approximate the derivatives of the NN, we evaluate it at the points $x_i \pm \varepsilon$.
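A sketch of this Hermite-type loss in the $\ell_2$ sense, using central finite differences for the NN derivative (the step size `eps` and all names are hypothetical illustrative choices; the paper's exact values are not reproduced here):

```python
def hermite_loss(F_net, xs, forward, dforward, eps=1e-3):
    # Value misfit at the training points.
    val = sum((F_net(x) - forward(x)) ** 2 for x in xs)
    # Misfit of a central finite-difference derivative of F_net against
    # the exact derivative of F (e.g., available via an adjoint problem).
    der = sum(((F_net(x + eps) - F_net(x - eps)) / (2 * eps) - dforward(x)) ** 2
              for x in xs)
    return val + der
```

A candidate matching both values and slopes at the training points drives both terms to zero; a candidate matching only the values is still penalized by the derivative term.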

Fig. 9. Results of the NN that approximates the forward function. Red points correspond to the evaluation over the training dataset $x$ and to $\mathcal{F}(x)$. Green points correspond to the evaluation over a testing dataset. (Color figure online)

We observe that points near the training points are better approximated via Hermite interpolation, as expected. However, the overall approximation still lacks accuracy and exhibits undesired artifacts due to an insufficient number of training points. Thus, while the use of Hermite interpolation may be highly beneficial, especially in the context of certain PDE problems or when the derivatives are easily accessible, there is still a need for a sufficiently dense database of sampling points. Figure 10 shows the evolution of the terms composing the loss function.

Fig. 10. Evolution of the loss value when we train the NN that approximates $\mathcal{F}$ using Eq. (16) as the loss. "Loss F" corresponds to the first term of Eq. (16), "Loss DER" to the second term, and "Total Loss" to the total value of Eq. (16).

Loss Function with a Reduced Number of Samples for the Forward Training

We now consider an Encoder-Decoder loss function, as described in Eq. (12). The objective is to minimize the number of samples employed to approximate the forward function, since producing such a database is often the most time-consuming part in a large class of inverse problems governed by PDEs.

We employ a dataset of three input-output pairs $\{(x_i, y_i)\}_{i=1}^{3}$ for the first term of Eq. (12) and a dataset of 1000 values of $y$, equidistantly distributed on the interval [0, 1089], for the second term of Eq. (12).

Figure 11 shows the results of the trained NNs. The results are disappointing. The approximated forward function is far from the blue line (the real forward function), especially near zero. Such a poorly constrained forward function leaves excessive freedom in the training of the inverse function, allowing the inverse function to be poorly approximated (with respect to the real inverse function).

Fig. 11. Exact vs NN solutions using loss function $L_E$ and a reduced number of samples for the forward evaluation.

In order to improve the results, we train the NNs adding to Eq. (12) a regularization term that promotes smoothness of $\mathcal{F}_\varphi$, leading to the loss:

$L_{R}(\varphi, \theta) = \left\| \mathcal{F}_\varphi(x) - y \right\| + \left\| \mathcal{F}_\varphi(\mathcal{I}_\theta(y)) - y \right\| + \lambda \left\| \dfrac{\partial^2 \mathcal{F}_\varphi}{\partial x^2}(x) \right\|$.  (17)

We evaluate this regularization term over a dataset of 1000 samples equidistantly distributed on the input domain, and we select a fixed regularization weight $\lambda$.
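The smoothness term can be approximated with a second-order central finite difference of $\mathcal{F}_\varphi$, penalized in the $\ell_2$ sense (the values of `lam` and `eps` below are illustrative choices of ours, not the paper's):

```python
def smoothness_penalty(F_net, xs, lam=1e-3, eps=1e-2):
    total = 0.0
    for x in xs:
        # Central second difference: approximates d^2 F_net / dx^2 at x.
        d2 = (F_net(x + eps) - 2.0 * F_net(x) + F_net(x - eps)) / eps ** 2
        total += d2 ** 2
    return lam * total
```

For an affine network the penalty vanishes, while highly curved or oscillatory candidates are penalized, discouraging the spurious wiggles seen in Fig. 11.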

Figure 12 shows the results of the NN. Now, the forward function is better approximated around zero. Unfortunately, the approximation is still inaccurate, indicating the need for additional sampling points. Figure 13 shows the evolution of the terms composing the loss function. The loss values associated with the first and second terms are minimized, while the loss corresponding to the regularization term remains the largest.

Fig. 12. Exact vs NN solutions using the regularized loss function of Eq. (17) and a reduced number of samples for the forward evaluation.

Fig. 13. Evolution of the loss value for the Encoder-Decoder method trained with the loss function of Eq. (17). "Loss F" corresponds to the first term of Eq. (17), "Loss FI" to the second term, "Loss REG" to the third term, and "Total Loss" to the total value of Eq. (17).

Conclusions

We analyze different loss functions for solving inverse problems. We demonstrate via a simple numerical benchmark problem that some traditional loss functions are inadequate. Moreover, we propose the use of an Encoder-Decoder loss function, which can also be divided into two loss functions with a one-way coupling. This enables us to decompose the original DL problem into two simpler problems.

In addition, we propose adding a Hermite-type interpolation term to the loss function when needed. This may be especially useful in problems governed by PDEs, where the derivative is easily accessible via the adjoint operator. Results indicate that Hermite interpolation provides enhanced accuracy at the training points and in their surroundings. However, we still need a sufficient density of points in our database to obtain acceptable results.

Finally, we evaluate the performance of the Encoder-Decoder loss function with a reduced number of samples for the forward function approximation. We observe that the poorly constrained forward function leaves excessive freedom for the training of the inverse function. To partially alleviate this problem, we incorporate a regularization term. The corresponding results improve, but they still show the need for additional training samples.

Contributor Information

Valeria V. Krzhizhanovskaya, Email: V.Krzhizhanovskaya@uva.nl

Gábor Závodszky, Email: G.Zavodszky@uva.nl.

Michael H. Lees, Email: m.h.lees@uva.nl

Jack J. Dongarra, Email: dongarra@icl.utk.edu

Peter M. A. Sloot, Email: p.m.a.sloot@uva.nl

Sérgio Brissos, Email: sergio.brissos@intellegibilis.com.

João Teixeira, Email: joao.teixeira@intellegibilis.com.

Jon Ander Rivera, Email: riverajonander@gmail.com.

References

  • 1.Albanese RA. Wave propagation inverse problems in medicine and environmental health. In: Chavent G, Sacks P, Papanicolaou G, Symes WW, editors. Inverse Problems in Wave Propagation. New York: Springer; 1997. pp. 1–11. [Google Scholar]
  • 2.Beer, R., et al.: Geosteering and/or reservoir characterization the prowess of new generation LWD tools. 51st Annual Logging Symposium Society of Petrophysicists and Well-Log Analysts (SPWLA) (2010)
  • 3.Bonnet M, Constantinescu A. Inverse problems in elasticity. Inverse Prob. 2005;21(2):R1–R50. doi: 10.1088/0266-5611/21/2/r01. [DOI] [Google Scholar]
  • 4.Broquetas A, Palau J, Jofre L, Cardama A. Spherical wave near-field imaging and radar cross-section measurement. IEEE Trans. Antennas Propag. 1998;46(5):730–735. doi: 10.1109/8.668918. [DOI] [Google Scholar]
  • 5.Burczyński, T., Beluch, W., Dugosz, A., Orantek, P., Nowakowski, M.: Evolutionary methods in inverse problems of engineering mechanics. In: Inverse Problems in Engineering Mechanics II, pp. 553–562. Elsevier Science Ltd., Oxford (2000). 10.1016/B978-008043693-7/50131-8. http://www.sciencedirect.com/science/article/pii/B9780080436937501318
  • 6.Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014)
  • 7.Hara, K., Saito, D., Shouno, H.: Analysis of function of rectified linear unit used in deep learning. In: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2015)
  • 8.Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
  • 9.Jin, Y., Wu, X., Chen, J., Huang, Y.: Using a physics-driven deep neural network to solve inverse problems for LWD azimuthal resistivity measurements, pp. 1–13, June 2019
  • 10.Li, Q., Omeragic, D., Chou, L., Yang, L., Duong, K.: New directional electromagnetic tool for proactive geosteering and accurate formation evaluation while drilling (2005)
  • 11.Liu, G., Zhou, B., Liao, S.: Inverting methods for thermal reservoir evaluation of enhanced geothermal system. Renew. Sustain. Energy Rev. 82, 471–476 (2018). 10.1016/j.rser.2017.09.065. http://www.sciencedirect.com/science/article/pii/S1364032117313175
  • 12.Mao, X.J., Shen, C., Yang, Y.B.: Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections (2016)
  • 13.Neto, A.S., Soeiro, F.: Solution of implicitly formulated inverse heat transfer problems with hybrid methods. In: Computational Fluid and Solid Mechanics 2003, pp. 2369–2372. Elsevier Science Ltd., Oxford (2003). 10.1016/B978-008044046-0.50582-0
  • 14.Oberai AA, Gokhale NH, Feijóo GRF. Solution of inverse problems in elasticity imaging using the adjoint method. Inverse Prob. 2003;19(2):297–313. doi: 10.1088/0266-5611/19/2/304. [DOI] [Google Scholar]
  • 15.Puzyrev V. Deep learning electromagnetic inversion with convolutional neural networks. Geophys. J. Int. 2019;218:817–832. doi: 10.1093/gji/ggz204. [DOI] [Google Scholar]
  • 16.Stuart AM. Inverse problems: a Bayesian perspective. Acta Numerica. 2010;19:451–559. doi: 10.1017/S0962492910000061. [DOI] [Google Scholar]
  • 17.Tarantola A. Inverse Problem Theory and Methods for Model Parameter Estimation. USA: Society for Industrial and Applied Mathematics; 2004. [Google Scholar]
  • 18.Xu, Y., et al.: Schlumberger: Borehole resistivity measurement modeling using machine-learning techniques (2018)
  • 19.Zhu, G., Gao, M., Kong, F., Li, K.: A fast inversion of induction logging data in anisotropic formation based on deep learning. IEEE Geosci. Remote Sens. Lett., 1–5 (2020). 10.1109/LGRS.2019.2961374

Articles from Computational Science – ICCS 2020 are provided here courtesy of Nature Publishing Group
