. 2020 May 22;12139:158–171. doi: 10.1007/978-3-030-50420-5_12

Design of Loss Functions for Solving Inverse Problems Using Deep Learning

Jon Ander Rivera 15,16,, David Pardo 15,16,17, Elisabete Alberdi 15
Editors: Valeria V Krzhizhanovskaya8, Gábor Závodszky9, Michael H Lees10, Jack J Dongarra11, Peter M A Sloot12, Sérgio Brissos13, João Teixeira14
PMCID: PMC7304019

Abstract

Solving inverse problems is a crucial task in several applications that strongly affect our daily lives, including multiple engineering fields, military operations, and energy production. There exist different methods for solving inverse problems, including gradient-based methods, statistics-based methods, and Deep Learning (DL) methods. In this work, we focus on the latter. Specifically, we study the design of proper loss functions for dealing with inverse problems using DL. To do this, we introduce a simple benchmark problem with known analytical solution. Then, we propose multiple loss functions and compare their performance when applied to our benchmark example problem. In addition, we analyze how to improve the approximation of the forward function by: (a) considering a Hermite-type interpolation loss function, and (b) reducing the number of samples for the forward training in the Encoder-Decoder method. Results indicate that a correct design of the loss function is crucial to obtain accurate inversion results.

Keywords: Deep learning, Inverse problems, Neural network

Introduction

Solving inverse problems [17] is of paramount importance to our society. It is essential in, among others, most areas of engineering (see, e.g., [3, 5]), health (see, e.g., [1]), military operations (see, e.g., [4]), and energy production (see, e.g., [11]). In multiple applications, it is necessary to perform this inversion in real time. This is the case, for example, of geosteering operations for enhanced hydrocarbon extraction [2, 10].

Traditional methods for solving inverse problems include gradient-based methods [13, 14] and statistics-based methods (e.g., Bayesian methods [16]). The main limitation of these kinds of methods is that they lack an explicit construction of the pseudo-inverse operator. Instead, they only evaluate the inverse function for a given set of measurements. Thus, for each set of measurements, we need to perform a new inversion process. This may be time consuming.

Deep Learning (DL) seems to be a proper alternative to overcome the aforementioned problem. With DL methods, we explicitly build the pseudo-inverse operator rather than only evaluating it. Recently, the interest in performing inversion using DL techniques has grown exponentially (see, e.g., [9, 15, 18, 19]). However, the design of these methods is still somewhat ad hoc, and it is often difficult to find a comprehensive road map for constructing robust Deep Neural Networks (DNNs) for solving inverse problems.

One major problem when designing DNNs is error control. Several factors may lead to deficient results, including: poor loss function design, inadequate architecture, lack of convergence of the optimizer employed for training, and unsatisfactory database selection. Moreover, it is sometimes elusive to identify the specific cause of poor results. Furthermore, it is often difficult to assess the quality of the results and, in particular, to determine whether they can be improved.

In this work, we take a simple but enlightening approach to elucidate and design certain components of a DL algorithm for solving inverse problems. Our approach consists of selecting a simple inverse benchmark example with known analytical solution. By doing so, we are able to evaluate and quantify the effect of different DL design considerations on the inversion results. Specifically, we focus on analyzing a proper selection of the loss function and how it affects the results. While more complex problems may face additional difficulties, those observed with the considered simple example are common to all inverse problems.

The remainder of this article is organized as follows. Section 2 describes our simple model inverse benchmark problem. Section 3 introduces several possible loss functions. Section 4 shows numerical results. Finally, Sect. 5 summarizes the main findings.

Simple Inverse Benchmark Problem

We consider a benchmark problem with known analytical solution. Let $\mathcal{F}$ be the forward function and $\mathcal{I}$ the pseudo-inverse operator. We want our benchmark problem to have more than one solution, since this is one of the typical features exhibited by inverse problems. For that, we need $\mathcal{F}$ to be non-injective. We select the non-injective function $\mathcal{F}(x) = x^2$, whose pseudo-inverse has two possible solutions: $\mathcal{I}(y) = \pm\sqrt{y}$ (see Fig. 1). The objective is to design a NN that approximates one of the solutions of the inverse problem.
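As a minimal sketch of this setting (the quadratic forward function is the one consistent with the two-branch pseudo-inverse and the output interval [0, 1089] used later; the function names are ours, not the paper's):

```python
import math

# Benchmark forward function F(x) = x**2: non-injective on a symmetric
# domain, so the inverse problem has two valid solutions for every y > 0.
def forward(x):
    return x ** 2

# The two pseudo-inverse branches I(y) = +sqrt(y) and I(y) = -sqrt(y).
def inverse_pos(y):
    return math.sqrt(y)

def inverse_neg(y):
    return -math.sqrt(y)
```

Both branches satisfy $\mathcal{F}(\mathcal{I}(y)) = y$; a NN approximating the pseudo-inverse should recover one of them.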

Fig. 1. Benchmark problem.

Database and Data Rescaling

We consider the domain $[-33, 33]$. In it, we select a set of 1000 equidistant numbers $x_i$. The corresponding dataset of input-output pairs $\{(x_i, \mathcal{F}(x_i))\}$ is computed analytically.

In some cases, we perform a change of coordinates in our output dataset. Let $R$ denote the linear mapping from the range of the original output dataset onto the interval [0, 1]. Instead of approximating the function $\mathcal{F}$, our NN will approximate the function $\mathcal{F}_R$ given by

$\mathcal{F}_R := R \circ \mathcal{F}$.  (1)

In the cases where we perform no rescaling, we select $R = \mathrm{Id}$, where $\mathrm{Id}$ is the identity mapping.
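Such a linear rescaling can be sketched as follows (function and variable names are ours, for illustration only):

```python
def make_rescaling(y_min, y_max):
    """Return the linear map R sending [y_min, y_max] onto [0, 1]."""
    span = y_max - y_min
    def R(y):
        return (y - y_min) / span
    return R

# Example: rescale outputs lying in [0, 1089] (the output interval that
# appears later in the text) onto [0, 1].
R = make_rescaling(0.0, 1089.0)
```

Training on rescaled outputs keeps the NN targets in a well-conditioned range; the identity map corresponds to `make_rescaling(0.0, 1.0)` applied to already-normalized data, or simply to skipping the rescaling step.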

Loss Functions

We consider different loss functions. The objective here is to discern between adequate and poor loss functions for solving the proposed inverse benchmark problem.

We denote by $\mathcal{F}_\varphi$ and $\mathcal{I}_\theta$ the NN approximations of the forward function and the pseudo-inverse operator, respectively. The weights $\varphi$ and $\theta$ are the parameters to be trained (optimized) in the NNs. Each value within the set of weights is a real number.

In a NN, we try to find the weights $\varphi$ and $\theta$ that minimize a given loss function $L$. We express our problem mathematically as

$(\varphi^*, \theta^*) = \arg\min_{\varphi, \theta} L(\varphi, \theta)$.  (2)

Loss Based on the Misfit of the Inverse Data: We first consider the traditional loss function:

$L_{\mathcal{I}}(\theta) = \left\| x - \mathcal{I}_\theta(\mathcal{F}(x)) \right\|$,  (3)

where $\|\cdot\|$ denotes either the $\ell_1$ or the $\ell_2$ norm evaluated over the training dataset.

Theorem 1

The minimization problem (2) with the loss function given by Eq. (3) admits an analytical solution for our benchmark problem in both the $\ell_1$ norm,

$\mathcal{I}^*_{\ell_1} = \arg\min_{\mathcal{I}_\theta} \sum_{x_i \in \mathcal{D}} \left| x_i - \mathcal{I}_\theta(\mathcal{F}(x_i)) \right|$,  (4)

and the $\ell_2$ norm,

$\mathcal{I}^*_{\ell_2} = \arg\min_{\mathcal{I}_\theta} \sum_{x_i \in \mathcal{D}} \left( x_i - \mathcal{I}_\theta(\mathcal{F}(x_i)) \right)^2$.  (5)

These solutions are such that:

  • For $\ell_1$, $\mathcal{I}^*_{\ell_1}(y) = x$, where $x$ is any value in the interval $\left[ -\sqrt{y}, \sqrt{y} \right]$.

  • For $\ell_2$, $\mathcal{I}^*_{\ell_2}(y) = 0$.

Proof

We first focus on the $\ell_1$ norm. We minimize the loss function:

$L_{\mathcal{I}}(\theta) = \sum_{x_i \in \mathcal{D}} \left| x_i - \mathcal{I}_\theta(\mathcal{F}(x_i)) \right|$,  (6)

where $\mathcal{D}$ denotes the training dataset. For the exact pseudo-inverse operator $\mathcal{I}$, each pair of addends of (6) associated with the symmetric samples $x = \pm\sqrt{y}$ can be expressed as follows:

$\left| \sqrt{y} - \mathcal{I}_\theta(y) \right| + \left| -\sqrt{y} - \mathcal{I}_\theta(y) \right| = 2\sqrt{y} \quad \text{for all } \mathcal{I}_\theta(y) \in \left[ -\sqrt{y}, \sqrt{y} \right]$.  (7)

Taking the derivative of Eq. (6) with respect to $\mathcal{I}_\theta(y)$, we see in view of Eq. (7) that the loss function for the exact solution attains its minimum at every point $\mathcal{I}_\theta(y) \in \left[ -\sqrt{y}, \sqrt{y} \right]$.

In the case of the $\ell_2$ norm, for each value of $y$ we want to minimize:

$\left( \sqrt{y} - \mathcal{I}_\theta(y) \right)^2 + \left( -\sqrt{y} - \mathcal{I}_\theta(y) \right)^2$.  (8)

Again, for the exact pseudo-inverse operator $\mathcal{I}$, we can rewrite Eq. (8) as:

$\left( \sqrt{y} - \mathcal{I}_\theta(y) \right)^2 + \left( -\sqrt{y} - \mathcal{I}_\theta(y) \right)^2 = 2y + 2\,\mathcal{I}_\theta(y)^2$.  (9)

Taking the derivative of Eq. (8) with respect to $\mathcal{I}_\theta(y)$ and setting it equal to zero, we obtain:

$\frac{\partial}{\partial\, \mathcal{I}_\theta(y)} \left( 2y + 2\,\mathcal{I}_\theta(y)^2 \right) = 4\,\mathcal{I}_\theta(y) = 0 \;\Longrightarrow\; \mathcal{I}_\theta(y) = 0$.  (10)

Thus, the function is minimized when the approximated value is 0. $\square$

Observation: The problem of Theorem 1 has infinitely many solutions in the $\ell_1$ norm. In the $\ell_2$ norm, the solution is unique; however, it differs from both of the desired exact inverse solutions.
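This observation can be checked numerically. For a single pair of symmetric samples $x = \pm\sqrt{y}$, we scan constant predictions $c$ (a toy stand-in for the value $\mathcal{I}_\theta(y)$ returned by the NN; the helper names below are ours):

```python
def l1_pair_loss(c, r):
    # |r - c| + |-r - c| for the two samples x = +r and x = -r, r = sqrt(y)
    return abs(r - c) + abs(-r - c)

def l2_pair_loss(c, r):
    # (r - c)^2 + (-r - c)^2 for the same pair of samples
    return (r - c) ** 2 + (-r - c) ** 2

r = 3.0  # i.e., y = 9
# l1: flat minimum of value 2*r on the whole interval [-r, r]
# l2: unique minimum at c = 0, of value 2*y
```

The $\ell_1$ loss is constant on $[-r, r]$, so any value in that interval is optimal, while the $\ell_2$ loss equals $2y + 2c^2$ and is minimized only at $c = 0$, exactly as Theorem 1 states.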

Loss Based on the Misfit of the Effect of the Inverse Data: As seen with the previous loss function, it is inadequate to look at the misfit in the inverted space. Rather, it is desirable to search for an inverse solution such that, after applying the forward operator, we recover our original input. Thus, we consider the following modified loss function, where $\mathcal{F}$ corresponds to the analytic forward function:

$L_{\mathcal{F}\circ\mathcal{I}}(\theta) = \left\| y - \mathcal{F}(\mathcal{I}_\theta(y)) \right\|$.  (11)

Unfortunately, the computation of $\mathcal{F}(\mathcal{I}_\theta(y))$ required in $L_{\mathcal{F}\circ\mathcal{I}}$ involves either (a) implementing $\mathcal{F}$ on a GPU, which may be challenging in more complex examples, or (b) calling $\mathcal{F}$ as a CPU function multiple times during the training process. Both options may slow down the training process considerably, up to the point of making it impractical.
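A sketch of this loss in the $\ell_2$ sense, with the analytic forward function passed as a plain callable (names are our own; in realistic settings every call to `forward` is an expensive simulation, which is precisely the limitation described above):

```python
import math

def loss_forward_of_inverse(inv_net, ys, forward):
    # L_{F o I}: misfit between y and F(I_theta(y)), summed over the data.
    # Each term evaluates `forward`, which may be a costly CPU-side solver.
    return sum((y - forward(inv_net(y))) ** 2 for y in ys)

# With the exact branch I(y) = sqrt(y), the loss vanishes.
ys = [0.0, 1.0, 4.0, 9.0]
loss = loss_forward_of_inverse(math.sqrt, ys, lambda x: x ** 2)
```

Note that either pseudo-inverse branch drives this loss to zero, which is why it does not suffer from the branch-averaging pathology of Eq. (3).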

Encoder-Decoder Loss: To overcome the computational problems associated with Eq. (11), we introduce an additional NN, named $\mathcal{F}_\varphi$, to approximate the forward function. Then, we propose the following loss function:

$L_{E}(\varphi, \theta) = \left\| \mathcal{F}_\varphi(x) - y \right\| + \left\| \mathcal{F}_\varphi(\mathcal{I}_\theta(y)) - y \right\|$.  (12)

Two NNs of this type that are trained simultaneously are often referred to as an Encoder-Decoder [6, 12].
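The two terms of Eq. (12) can be sketched with plain callables standing in for the two networks (an illustrative reconstruction in the $\ell_2$ sense, not the paper's implementation):

```python
import math

def encoder_decoder_loss(F_net, I_net, xs, forward):
    ys = [forward(x) for x in xs]
    # Term 1: F_net must fit the sampled forward data (x, y).
    t1 = sum((F_net(x) - y) ** 2 for x, y in zip(xs, ys))
    # Term 2: composing F_net with I_net must reproduce the input y.
    t2 = sum((F_net(I_net(y)) - y) ** 2 for y in ys)
    return t1 + t2
```

With $\mathcal{F}_\varphi(x) = x^2$ and either branch $\mathcal{I}_\theta(y) = \pm\sqrt{y}$, both terms vanish. Note that the second term alone would also vanish for a badly fitted forward network, which is exactly the failure mode studied later with a reduced number of forward samples.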

Two-Steps Loss: It is also possible to train $\mathcal{F}_\varphi$ and $\mathcal{I}_\theta$ separately. By doing so, we diminish the training cost. At the same time, it allows us to analyze the two NNs separately, which may simplify the detection of errors specific to one of the networks. Our loss functions are:

$L_{\mathcal{F}_\varphi}(\varphi) = \left\| \mathcal{F}_\varphi(x) - y \right\|$  (13)

and

$L_{\mathcal{I}_\theta}(\theta) = \left\| \mathcal{F}_{\varphi^*}(\mathcal{I}_\theta(y)) - y \right\|$.  (14)

We first train $\mathcal{F}_\varphi$ using $L_{\mathcal{F}_\varphi}$. Once $\mathcal{F}_\varphi$ is fixed (with weights $\varphi^*$), we train $\mathcal{I}_\theta$ using $L_{\mathcal{I}_\theta}$.
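The one-way coupling can be illustrated with one-parameter "networks" trained by plain gradient descent (a toy sketch: the models $\mathcal{F}_w(x) = w x^2$, $\mathcal{I}_v(y) = v\sqrt{y}$ and all hyperparameters below are our own illustrative choices, not the paper's):

```python
import math

def gradient_descent(grad, w0, lr, steps):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

xs = [i / 10 for i in range(1, 11)]
ys = [x * x for x in xs]

# Step 1: fit F_w(x) = w*x^2 to the data; gradient of sum (w*x^2 - y)^2.
grad_F = lambda w: sum(2 * (w * x * x - y) * x * x for x, y in zip(xs, ys))
w_star = gradient_descent(grad_F, 0.0, lr=0.05, steps=500)

# Step 2: freeze w_star and fit I_v(y) = v*sqrt(y) through the frozen
# forward model; gradient of sum (F_{w*}(I_v(y)) - y)^2.
grad_I = lambda v: sum(4 * w_star * v * y * (w_star * v * v * y - y)
                       for y in ys)
v_star = gradient_descent(grad_I, 0.5, lr=0.05, steps=500)
# v_star approaches +1 or -1, i.e., one branch of the pseudo-inverse.
```

The key point is the sequencing: the inverse model only ever sees the frozen forward model, so errors in step 2 cannot be blamed on (or absorbed by) the forward approximation.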

Numerical Results

We consider two different NNs. The one approximating the forward function has 5 fully connected layers [8] with the ReLU activation function [7]. The one approximating the inverse operator has 11 fully connected layers with the ReLU activation function, which is defined as

$\mathrm{ReLU}(x) = \max(0, x)$.  (15)
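A fully connected network of this kind reduces to repeated affine maps followed by ReLU applications. The following pure-Python sketch of the forward pass is illustrative only (layer shapes and names are ours; the paper does not detail its implementation):

```python
def relu(x):
    return max(0.0, x)

def mlp_forward(x, layers):
    """Fully connected net. `layers` is a list of (weight_matrix, bias_vector)
    pairs; ReLU is applied on hidden layers, the output layer is linear."""
    h = x
    for k, (W, b) in enumerate(layers):
        z = [sum(wij * hj for wij, hj in zip(row, h)) + bi
             for row, bi in zip(W, b)]
        h = z if k == len(layers) - 1 else [relu(v) for v in z]
    return h
```

Stacking 5 or 11 such layers reproduces the two architectures described above.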

These NN architectures are overkill for approximating the simple benchmark problem studied in this work. Moreover, we also obtained results for different NN architectures, leading to identical conclusions, which we omit here for brevity.

Loss Function Analysis

Loss Based on the Misfit of the Inverse Data: We produce two models using the $\ell_1$ and $\ell_2$ norms, respectively. Figure 2 shows the expected disappointing results (see Theorem 1). The approximated NN values (green circles) are far from the true solution (blue line). From an engineering point of view, the recovered solution is worthless. The problem resides in the selection of the loss function.

Fig. 2. Predicted ($\mathcal{I}_\theta$, green circles) vs exact ($\mathcal{I}$, blue line) inverse solutions evaluated over the testing dataset. (Color figure online)

Loss Based on the Misfit of the Effect of the Inverse Data: Figure 3 shows the real values of $y$ (ground truth) vs their predicted pseudo-inverse values. The closer the predicted values are to the blue line, the better the result from the NN. We now observe an excellent match between the exact and approximated solutions. However, as mentioned in Sect. 3, this loss function entails essential limitations when considering complex problems.

Fig. 3. Solution of the pseudo-inverse operator approximated by the NN. (Color figure online)

Encoder-Decoder Loss: Figure 4 shows the results for the $\ell_1$ norm and Fig. 5 for the $\ell_2$ norm. We again recover excellent results, without the limitations of loss function $L_{\mathcal{F}\circ\mathcal{I}}$. Coincidentally, the two norms recover different solution branches of the inverse problem. Note that in this problem, it is possible to prove that the probability of recovering either solution branch is identical.

Fig. 4. Exact vs NN solutions using loss function $L_E$ and the $\ell_1$ norm.

Fig. 5. Exact vs NN solutions using loss function $L_E$ and the $\ell_2$ norm.

Two-Steps Loss: Figures 6 and 7 show the results for the $\ell_1$ and $\ell_2$ norms, respectively. The approximations of the forward function and the pseudo-inverse operator are accurate in both cases.

Fig. 6. Exact vs NN solutions using loss functions $L_{\mathcal{F}_\varphi}$ and $L_{\mathcal{I}_\theta}$ and the $\ell_1$ norm.

Fig. 7. Exact vs NN solutions using loss functions $L_{\mathcal{F}_\varphi}$ and $L_{\mathcal{I}_\theta}$ and the $\ell_2$ norm.

Hermite-Type Loss Functions

We now consider the two-steps loss function and focus only on the forward function approximation given by Eq. (13). This is frequently the most time-consuming part when solving an inverse problem with NNs. In this section, we analyze different strategies for working with a reduced dataset, which entails a dramatic reduction of the computational cost. We consider a dataset of three input-output pairs $\{(x_i, y_i)\}_{i=1}^{3}$.

Figure 8 shows the results for the $\ell_1$ and $\ell_2$ norms. The training data points are accurately approximated; all other points are poorly approximated.

Fig. 8. Results of the NN that approximates the forward function. Red points correspond to the evaluation over the training dataset $x$ and to $\mathcal{F}(x)$. Green points correspond to the evaluation over a testing dataset. (Color figure online)

To improve the approximation, we introduce an additional term into the loss function: we force the NN to approximate the derivatives at each training point. This new loss is:

$L_{H}(\varphi) = \left\| \mathcal{F}_\varphi(x) - y \right\| + \left\| \dfrac{\partial \mathcal{F}_\varphi}{\partial x}(x) - \dfrac{\partial \mathcal{F}}{\partial x}(x) \right\|$.  (16)

From a numerical point of view, the term that approximates the first derivatives can be very useful. If we think of $x$ as a parameter of a Partial Differential Equation (PDE), we can efficiently evaluate the derivatives via the adjoint problem.

Figure 9 shows the results when we use the $\ell_1$ and $\ell_2$ norms for training. For this benchmark problem, we select a small step size $\varepsilon$. Thus, to approximate the derivatives of the NN, we evaluate it at the points $x_i \pm \varepsilon$.
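A sketch of this Hermite-type loss in the $\ell_2$ sense, using central finite differences for the NN derivative (the step size `eps` and all names are hypothetical illustrative choices; the paper's exact values are not reproduced here):

```python
def hermite_loss(F_net, xs, forward, dforward, eps=1e-3):
    # Value misfit at the training points.
    val = sum((F_net(x) - forward(x)) ** 2 for x in xs)
    # Misfit of a central finite-difference derivative of F_net against
    # the exact derivative of F (e.g., available via an adjoint problem).
    der = sum(((F_net(x + eps) - F_net(x - eps)) / (2 * eps) - dforward(x)) ** 2
              for x in xs)
    return val + der
```

A candidate matching both values and slopes at the training points drives both terms to zero; a candidate matching only the values is still penalized by the derivative term.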

Fig. 9. Results of the NN that approximates the forward function. Red points correspond to the evaluation over the training dataset $x$ and to $\mathcal{F}(x)$. Green points correspond to the evaluation over a testing dataset. (Color figure online)

We observe that points near the training points are better approximated via Hermite interpolation, as expected. However, the overall approximation still lacks accuracy and exhibits undesired artifacts due to an insufficient number of training points. Thus, while the use of Hermite interpolation may be highly beneficial, especially in the context of certain PDE problems or when the derivatives are easily accessible, there is still a need for a sufficiently dense database of sampling points. Figure 10 shows the evolution of the terms composing the loss function.

Fig. 10. Evolution of the loss value when we train the NN that approximates $\mathcal{F}$ using Eq. (16) as the loss. "Loss F" corresponds to the first term of Eq. (16), "Loss DER" to the second term, and "Total Loss" to the total value of Eq. (16).

Loss Function with a Reduced Number of Samples for the Forward Training

We now consider an Encoder-Decoder loss function, as described in Eq. (12). The objective is to minimize the number of samples employed to approximate the forward function, since producing such a database is often the most time-consuming part in a large class of inverse problems governed by PDEs.

We employ a dataset of three input-output pairs $\{(x_i, y_i)\}_{i=1}^{3}$ for the first term of Eq. (12) and a dataset of 1000 values of $y$, equidistantly distributed on the interval [0, 1089], for the second term of Eq. (12).

Figure 11 shows the results of the trained NNs. The results are disappointing. The approximated forward function is far from the blue line (the real forward function), especially near zero. Such a poorly constrained forward function leaves excessive freedom in the training of the inverse function, allowing the inverse function to be poorly approximated (with respect to the real inverse function).

Fig. 11. Exact vs NN solutions using loss function $L_E$ and a reduced number of samples for the forward evaluation.

In order to improve the results, we train the NNs adding to Eq. (12) a regularization term that promotes smoothness of $\mathcal{F}_\varphi$, leading to the loss:

$L_{R}(\varphi, \theta) = \left\| \mathcal{F}_\varphi(x) - y \right\| + \left\| \mathcal{F}_\varphi(\mathcal{I}_\theta(y)) - y \right\| + \lambda \left\| \dfrac{\partial^2 \mathcal{F}_\varphi}{\partial x^2}(x) \right\|$.  (17)

We evaluate this regularization term over a dataset of 1000 samples equidistantly distributed on the input domain, and we select a fixed regularization weight $\lambda$.
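The smoothness term can be approximated with a second-order central finite difference of $\mathcal{F}_\varphi$, penalized in the $\ell_2$ sense (the values of `lam` and `eps` below are illustrative choices of ours, not the paper's):

```python
def smoothness_penalty(F_net, xs, lam=1e-3, eps=1e-2):
    total = 0.0
    for x in xs:
        # Central second difference: approximates d^2 F_net / dx^2 at x.
        d2 = (F_net(x + eps) - 2.0 * F_net(x) + F_net(x - eps)) / eps ** 2
        total += d2 ** 2
    return lam * total
```

For an affine network the penalty vanishes, while highly curved or oscillatory candidates are penalized, discouraging the spurious wiggles seen in Fig. 11.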

Figure 12 shows the results of the NN. Now, the forward function is better approximated around zero. Unfortunately, the approximation is still inaccurate, indicating the need for additional sampling points. Figure 13 shows the evolution of the terms composing the loss function. The loss values associated with the first and second terms are minimized, while the loss corresponding to the regularization term remains the largest.

Fig. 12. Exact vs NN solutions using the regularized loss function of Eq. (17) and a reduced number of samples for the forward evaluation.

Fig. 13. Evolution of the loss value for the Encoder-Decoder method trained with the loss function of Eq. (17). "Loss F" corresponds to the first term of Eq. (17), "Loss FI" to the second term, "Loss REG" to the third term, and "Total Loss" to the total value of Eq. (17).

Conclusions

We analyze different loss functions for solving inverse problems. We demonstrate via a simple numerical benchmark problem that some traditional loss functions are inadequate. Moreover, we propose the use of an Encoder-Decoder loss function, which can also be divided into two loss functions with a one-way coupling. This enables us to decompose the original DL problem into two simpler problems.

In addition, we propose adding a Hermite-type interpolation term to the loss function when needed. This may be especially useful in problems governed by PDEs, where the derivative is easily accessible via the adjoint operator. Results indicate that Hermite interpolation provides enhanced accuracy at the training points and in their surroundings. However, we still need a sufficient density of points in our database to obtain acceptable results.

Finally, we evaluate the performance of the Encoder-Decoder loss function with a reduced number of samples for the forward function approximation. We observe that the poorly constrained forward function leaves excessive freedom for the training of the inverse function. To partially alleviate this problem, we incorporate a regularization term. The corresponding results improve, but they still show the need for additional training samples.

Contributor Information

Valeria V. Krzhizhanovskaya, Email: V.Krzhizhanovskaya@uva.nl

Gábor Závodszky, Email: G.Zavodszky@uva.nl.

Michael H. Lees, Email: m.h.lees@uva.nl

Jack J. Dongarra, Email: dongarra@icl.utk.edu

Peter M. A. Sloot, Email: p.m.a.sloot@uva.nl

Sérgio Brissos, Email: sergio.brissos@intellegibilis.com.

João Teixeira, Email: joao.teixeira@intellegibilis.com.

Jon Ander Rivera, Email: riverajonander@gmail.com.

References

  • 1.Albanese RA. Wave propagation inverse problems in medicine and environmental health. In: Chavent G, Sacks P, Papanicolaou G, Symes WW, editors. Inverse Problems in Wave Propagation. New York: Springer; 1997. pp. 1–11. [Google Scholar]
  • 2.Beer, R., et al.: Geosteering and/or reservoir characterization the prowess of new generation LWD tools. 51st Annual Logging Symposium Society of Petrophysicists and Well-Log Analysts (SPWLA) (2010)
  • 3.Bonnet M, Constantinescu A. Inverse problems in elasticity. Inverse Prob. 2005;21(2):R1–R50. doi: 10.1088/0266-5611/21/2/r01. [DOI] [Google Scholar]
  • 4.Broquetas A, Palau J, Jofre L, Cardama A. Spherical wave near-field imaging and radar cross-section measurement. IEEE Trans. Antennas Propag. 1998;46(5):730–735. doi: 10.1109/8.668918. [DOI] [Google Scholar]
  • 5.Burczyński, T., Beluch, W., Dugosz, A., Orantek, P., Nowakowski, M.: Evolutionary methods in inverse problems of engineering mechanics. In: Inverse Problems in Engineering Mechanics II, pp. 553–562. Elsevier Science Ltd., Oxford (2000). 10.1016/B978-008043693-7/50131-8. http://www.sciencedirect.com/science/article/pii/B9780080436937501318
  • 6.Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014)
  • 7.Hara, K., Saito, D., Shouno, H.: Analysis of function of rectified linear unit used in deep learning. In: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2015)
  • 8.Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
  • 9.Jin, Y., Wu, X., Chen, J., Huang, Y.: Using a physics-driven deep neural network to solve inverse problems for LWD azimuthal resistivity measurements, pp. 1–13, June 2019
  • 10.Li, Q., Omeragic, D., Chou, L., Yang, L., Duong, K.: New directional electromagnetic tool for proactive geosteering and accurate formation evaluation while drilling (2005)
  • 11.Liu, G., Zhou, B., Liao, S.: Inverting methods for thermal reservoir evaluation of enhanced geothermal system. Renew. Sustain. Energy Rev. 82, 471–476 (2018). 10.1016/j.rser.2017.09.065. http://www.sciencedirect.com/science/article/pii/S1364032117313175
  • 12.Mao, X.J., Shen, C., Yang, Y.B.: Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections (2016)
  • 13.Neto, A.S., Soeiro, F.: Solution of implicitly formulated inverse heat transfer problems with hybrid methods. In: Computational Fluid and Solid Mechanics 2003, pp. 2369–2372. Elsevier Science Ltd., Oxford (2003). 10.1016/B978-008044046-0.50582-0
  • 14.Oberai AA, Gokhale NH, Feijóo GRF. Solution of inverse problems in elasticity imaging using the adjoint method. Inverse Prob. 2003;19(2):297–313. doi: 10.1088/0266-5611/19/2/304. [DOI] [Google Scholar]
  • 15.Puzyrev V. Deep learning electromagnetic inversion with convolutional neural networks. Geophys. J. Int. 2019;218:817–832. doi: 10.1093/gji/ggz204. [DOI] [Google Scholar]
  • 16.Stuart AM. Inverse problems: a Bayesian perspective. Acta Numerica. 2010;19:451–559. doi: 10.1017/S0962492910000061. [DOI] [Google Scholar]
  • 17.Tarantola A. Inverse Problem Theory and Methods for Model Parameter Estimation. USA: Society for Industrial and Applied Mathematics; 2004. [Google Scholar]
  • 18.Xu, Y., et al.: Schlumberger: Borehole resistivity measurement modeling using machine-learning techniques (2018)
  • 19.Zhu, G., Gao, M., Kong, F., Li, K.: A fast inversion of induction logging data in anisotropic formation based on deep learning. IEEE Geosci. Remote Sens. Lett., 1–5 (2020). 10.1109/LGRS.2019.2961374

Articles from Computational Science – ICCS 2020 are provided here courtesy of Nature Publishing Group
