Published in final edited form as: SIAM J Imaging Sci. 2021 Jan 26;14(1):92–127. doi: 10.1137/20M1338460

On Learned Operator Correction in Inverse Problems*

Sebastian Lunz, Andreas Hauptmann, Tanja Tarvainen, Carola-Bibiane Schönlieb, Simon Arridge

Abstract

We discuss the possibility of learning a data-driven explicit model correction for inverse problems and whether such a model correction can be used within a variational framework to obtain regularized reconstructions. This paper discusses the conceptual difficulty of learning such a forward model correction and proceeds to present a possible solution as a forward-adjoint correction that explicitly corrects in both data and solution spaces. We then derive conditions under which solutions to the variational problem with a learned correction converge to solutions obtained with the correct operator. The proposed approach is evaluated on an application to limited view photoacoustic tomography and compared to the established framework of the Bayesian approximation error method.

Keywords: model correction, inverse problems, operator learning, deep learning, variational methods, photoacoustic tomography

AMS subject classifications: 65K10, 65F22, 94A08, 47A52

1. Introduction

In inverse problems it is usually considered imperative to have an accurate forward model of the underlying physics. Nevertheless, such accurate models can be computationally highly expensive due to possible nonlinearities, large spatial and temporal dimensions, as well as stochasticity. Thus, in many applications approximate models are used in order to speed up reconstruction times and to comply with hardware and cost restrictions. As a consequence the introduced approximation errors need to be taken into account when solving ill-posed inverse problems or a degradation of the reconstruction quality can be expected.

For instance, in classical computerized tomography with a relatively high dose, models based on ray transforms are sufficiently accurate for the reconstruction task, whereas the full physical model would incorporate stochastic X-ray scattering events. Nevertheless, in some cone beam computerized tomography applications the dose is typically relatively low with a large field of view and hence scattering becomes more prevalent [38] and simple models based on the ray transform are not enough to guarantee sufficient image quality. However, as these scattering events are stochastic, accurate models would be too expensive for practical image reconstruction. Therefore, the basic model is used as an approximation with an appropriate correction that accounts for the full physical phenomena [47].

In applications where the forward model is given by the solution of a partial differential equation, model reduction techniques are often used to reduce computational costs [8, 14, 39]. Such reductions lead to known approximation errors in the model and can be corrected for by explicit modeling [4, 23]. Recently, with the possibility of combining deep learning techniques with classical variational methods, approximate models are now also used in the framework of learned image reconstruction [20]. In this case, the approximate model is embedded in an iterative scheme and updates are performed by a convolutional neural network (CNN). Here, model correction is performed implicitly by the network while computing the iterative updates.

In this paper we investigate the possibility of correcting such approximation errors explicitly with data-driven methods, in particular, using a CNN. In what follows, we restrict ourselves to linear inverse problems, with both theory and experiments considering the linear case only. However, we expect many of the challenges and approaches discussed here to be relevant and to give insight into the nonlinear case as well. Let x ∈ X be the unknown quantity of interest we aim to reconstruct from measurements y ∈ Y, where X, Y are Hilbert spaces and x and y fulfil the relation

Ax=y, (1.1)

where A : X → Y is the accurate forward operator modeling the underlying physics sufficiently accurately for any systematic error to be well below the noise level of the acquisition. We assume that the evaluation of the accurate operator A is computationally expensive and we rather want to use an approximate model Ã : X → Y to compute x from y. In doing so, we introduce an inherent approximation error in (1.1) and have

Ãx = ỹ (1.2)

leading to a systematic model error

δy = y − ỹ. (1.3)

Remark 1.1

In general, the range and domain of à might be different than those of A. To simplify the remainder of this paper we assume, unless otherwise stated, that appropriate projections between the range and domain of the approximate operator à as well as the range and domain of the accurate operator A are included in the implementation of Ã, so that expressions such as (1.3) are well-defined.

In this work, we consider corrections for this approximation error via a parameterizable, possibly nonlinear, mapping FΘ : Y → Y, applied as a correction to Ã. This leads to a corrected operator AΘ of the form

AΘ = FΘ ∘ Ã. (1.4)

We aim to choose the correction FΘ such that ideally AΘ(x) ≈ Ax for some x ∈ X of interest. Restricting the corrected operator AΘ to be a composition of the approximate operator à and a parameterizable correction yields various advantages compared to fully parameterizing the corrected operator AΘ : X → Y without utilizing the knowledge of Ã. It avoids having to model the typically global dependencies of A in the learned correction and allows us to employ generic network architectures for FΘ, such as the popular U-Net [34].
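For concreteness, the composition (1.4) can be written in a few lines of Python; this is a minimal sketch and all names below (the matrix A_tilde, the callable F_theta) are hypothetical placeholders rather than the operators used later in the paper.

```python
import numpy as np

# Minimal sketch of the composition (1.4); all names are hypothetical placeholders.
def corrected_forward(x, A_tilde, F_theta):
    """Corrected operator A_Theta(x) = F_Theta(A_tilde x)."""
    return F_theta(A_tilde @ x)

A_tilde = np.eye(8)[::2]        # crude subsampling approximation, X = R^8 -> Y = R^4
F_theta = lambda y: y           # stand-in for a trained correction network
x = np.random.rand(8)
y_corrected = corrected_forward(x, A_tilde, F_theta)
```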

The primary question that we aim to answer is whether such corrected models (1.4) can subsequently be used in variational regularization approaches that find a reconstruction x* as

x* = argmin_{x∈X} ½‖AΘ(x) − y‖²_Y + λR(x) (1.5)

with regularization functional R and associated hyperparameter λ. Apart from investigating the practical performance of (1.5), we will discuss conditions on the model correction that need to be satisfied to guarantee convergence of solutions of (1.5) to the accurate solution as the corrected operator AΘ approaches the accurate operator A. We provide theoretical results, which show that variational regularization strategies can be applied under certain conditions. In particular, as we will discuss in this study, while it is fairly easy to learn a model correction that fulfils (1.4), it cannot readily be guaranteed to yield high-quality reconstructions when used within the variational problem (1.5). This is a conceptual difficulty caused by a possible discrepancy between the ranges of the adjoints of A and Ã, which can be an inherent part of the approximate model; as a consequence, first order methods to solve (1.5) yield undesirable results.

To overcome this restriction, we introduce a forward-adjoint correction that combines an explicit forward model correction with an explicit correction of the adjoint. We will show that such a forward-adjoint correction—if trained sufficiently well—provides a descent direction for a gradient scheme to solve (1.5) for which we can guarantee convergence to a neighborhood of the solution obtained with the accurate operator A.

This work fits into the wider field of learned image reconstruction techniques that have sparked large interest in recent years [5, 22, 25]. In particular, we are motivated by model-based learned iterative reconstruction techniques that have been shown to be highly successful in a variety of application areas [1, 2, 17, 21, 36]. These methods generally mimic iterative gradient descent schemes and demonstrate impressive reconstruction results with often considerable speedups [18], but are mostly empirically motivated and lack convergence guarantees. In contrast, this paper follows a recent development in understanding how deep learning methods can be combined with classical reconstruction algorithms, such as variational techniques, to retain established theoretical results on convergence. Whereas most studies concentrate on learning a regularizer [27, 31, 33, 37], we concentrate here on the operator only and keep a fixed, analytical form for the regularizer. Further related works that consider learned corrections by utilising explicit knowledge of the operator range are [7, 9, 37]. Another line of research examines the incorporation of imperfectly known forward operators into a fully variational model [10, 29], as well as operator perturbations [13, 32]. We note also the connection to the concept of calibration in a Bayesian setting [26].

This paper is organized as follows. In section 2, we introduce the concept of model correction and compare it to previous work in the field. In section 3, we discuss forward corrections and demonstrate their limitations. To overcome these limitations, we introduce the forward-adjoint corrections in section 4, where we also present convergence results for this correction. This is followed by a discussion of computational challenges and the experimental setup in section 5. Finally, in section 6, we demonstrate the performance of the discussed approaches on two data sets for limited view photoacoustic tomography.

Glossary

To improve readability throughout the paper we provide a glossary (see Table 1) with the definition of frequently used notation.

Table 1.

Symbol Description Definition
X Reconstruction space Hilbert space, norm || · ||X, product 〈·,·〉X
Y Measurement space Hilbert space, norm || · ||Y, product 〈·,·〉Y
A Exact forward operator A : X ⟶ Y
à Approximate forward operator à : X ⟶ Y
Parameterizable correction in Y : Y ⟶ Y
Parameterizable correction in X : X ⟶ X
Corrected forward operator : X ⟶ Y, = o Ã
AΦ Corrected adjoint AΦ:YX,AΦ=GΦA˜
Df (t) Fréchet derivative of f at t Df (t) : dom(f) ⟶ rng(f) f (t + δt) = f (t) + Df (t)δt + O(δt2)
R Regularization functional R : X → ℝ+
𝓛 Variational functional with A L(x)=12AxyY2+λR(x)
𝓛Φ Variational functional with LΘ(x)=12AΘ(x)yY2+λR(x)

2. Learning a model correction

As we have motivated above, we only consider an explicit model correction (1.4) in this study and leave the regularization term untouched. Therefore, we will discuss in the following how a model correction using data-driven methods is possible and what the main challenges are.

Before we turn to the discussion of an explicit correction, it is important to make the distinction from an implicit correction in the framework of learned iterative reconstructions. Here, we concentrate on learned gradient schemes [1], which can be formulated by a network ΛΘ that is designed to mimic a gradient descent step. In particular, we train the networks to perform an iterative update, such that

x_{k+1} = ΛΘ(∇_x ½‖Ax_k − y‖²_Y, x_k), (2.1)

where ∇_x ½‖Ax_k − y‖²_Y = A*(Ax_k − y). Now, one could use an approximate model instead of the accurate model and compute an approximate gradient given by Ã*(Ãx_k − y) for the update in (2.1), as proposed in [20]. The network ΛΘ then implicitly corrects the model error to produce the new iterate. The correction and a prior are hence trained simultaneously with the update in (2.1). Such approaches are typically trained using a loss function, like the L2-loss, that measures the distance between the reconstruction and a ground-truth phantom.

On the other hand, in the explicit approach that we pursue here, we aim to learn a correction AΘ that is independent of the regularization used. It can hence be trained using knowledge of the accurate and approximate operator alongside training data in either X or Y, without requiring pairs of measurements and their corresponding ground-truth phantoms. In a scenario where the operators cannot be accessed directly, samples of pairs from the two operators can even be sufficient to fit an explicit operator correction. While implicit methods have been shown to perform well in practice [20], our approach will yield an explicit correction and as such can be used in combination with any regularization functional and builds on the established variational framework. Furthermore, we note that the study of explicit methods also allows one to uncover and investigate some of the fundamental challenges of model correction that might easily be left unexamined in implicit approaches.

Thus, we will concentrate our discussion in the following on how an explicit model correction can be achieved, how the correction of the model à can be parameterized by a neural network, and how this can be incorporated into a variational framework.

2.1. Approximation error method (AEM)

A well-established approach to incorporate model correction into a reconstruction framework, such as (1.5), is given by Bayesian approximation error modeling [23, 24]. Let us briefly recall that in Bayesian inversion we want to determine the posterior distribution of the unknown x given y, and by Bayes’ formula we obtain

p(x|y) = p(y|x) p(x) / p(y). (2.2)

Thus, the posterior distribution is characterized by the likelihood p(y|x) and the chosen prior p(x) on the unknown. Typically, the likelihood p(y|x) is modeled using accurate knowledge of the forward operator A : XY as well as the noise model. In the AEM, the purpose is now to adjust the likelihood by examining the difference between the (accurate) forward operator A and its approximation à of the model (1.1)–(1.2) as

ε = δy = Ax − Ãx. (2.3)

Including an additive model for the measurement noise e, this leads to an observation model

y = Ãx + ε + e. (2.4)

We model the noise e independently of x as Gaussian e ~ N(ηe, Γe), where ηe and Γe are the mean and covariance of the noise. Further, the model error ε is approximated as Gaussian ε ~ N(ηε, Γε) and is modeled independently of noise e and unknown parameters x leading to a Gaussian distributed total error n = ε + e, n ~ N(ηn, Γn), where ηε and ηn are means and Γε and Γn are the covariance matrices of model error and total errors, respectively. This leads to a so-called enhanced error model [23] with a likelihood distribution of the form

p(y|x) ∝ exp(−½‖L_n(Ãx − y + η_n)‖²_Y),

where L_n^T L_n = Γ_n^{−1} is a matrix square root, such as the Cholesky decomposition, of the inverse covariance matrix of the total error. In the case of Gaussian white noise with zero mean and a constant standard deviation σ, this can be written as

p(y|x) ∝ exp(−(1/(2σ))‖L_ε(Ãx − y + η_ε)‖²_Y),

where L_ε^T L_ε = Γ_ε^{−1}. This could be used to motivate writing the variational problem (1.5) in the form

x* = argmin_{x∈X} ½‖L_ε(Ãx − y + η_ε)‖²_Y + λR(x). (2.5)

In order to utilize the approach, the unknown distribution of the model error needs to be approximated. This can be obtained, for example, by simulations [4, 41] as follows. Let {x^i, i = 1,…,N} be a set of samples drawn from a training set. The corresponding samples of the model error are then

ε^i = Ax^i − Ãx^i (2.6)

and the mean and covariance of the model error can be estimated from the samples as

η_ε = (1/N) Σ_{i=1}^N ε^i, (2.7)
Γ_ε = (1/(N−1)) Σ_{i=1}^N ε^i (ε^i)^T − η_ε η_ε^T. (2.8)
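The sample statistics (2.6)–(2.8) and the whitening matrix L_ε appearing in (2.5) can be estimated along the following lines. This is a minimal NumPy sketch, assuming the operators are given as matrices and the training samples as columns of an array; the small diagonal jitter is our addition for numerical stability and not part of the AEM itself.

```python
import numpy as np

def fit_aem_statistics(X_train, A, A_tilde, jitter=1e-6):
    """Estimate the model-error mean (2.7) and covariance (2.8) from samples x^i.

    X_train holds the training samples as columns; A and A_tilde are matrices.
    The diagonal jitter is an added assumption for numerical stability.
    """
    E = A @ X_train - A_tilde @ X_train                 # columns are eps^i = A x^i - Ã x^i, (2.6)
    eta = E.mean(axis=1)                                # eta_eps, (2.7)
    Gamma = np.cov(E, bias=False)                       # sample covariance Gamma_eps, (2.8)
    Gamma_reg = Gamma + jitter * np.eye(Gamma.shape[0])
    L = np.linalg.cholesky(np.linalg.inv(Gamma_reg)).T  # L_eps with L^T L = Gamma_eps^{-1}
    return eta, L

def aem_fidelity(x, y, A_tilde, eta, L):
    """Corrected data fidelity of (2.5): 0.5 * ||L_eps (Ã x - y + eta_eps)||^2."""
    r = A_tilde @ x - y + eta
    return 0.5 * np.linalg.norm(L @ r) ** 2
```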

2.2. Learning a general model correction

The classical Bayesian AEM provides an affine linear correction of the likelihood in (2.5) and by construction is limited to cases where the error between accurate and approximate models (2.3) can be approximated as normally distributed. As this can be too restrictive in certain cases to describe more complicated errors, we will now address a more general concept of learning a nonlinear explicit model correction.

That is, given an accurate underlying forward model A, we aim to find a (partially) learned operator AΘ which we consider as an explicitly corrected approximate model of the form (1.4). To do so, we need to set a notion of distance between A and AΘ in order to assess the quality of the approximation. A seemingly natural notion of distance between two operators would be the supremum norm over elements in X, that is, we consider here

‖A − AΘ‖_{X→Y} := sup_{x∈X: ‖x‖_X=1} ‖Ax − AΘ(x)‖_Y. (2.9)

However, in many relevant applications it is impossible to find a correction of the form AΘ = FΘ ∘ Ã that achieves a low uniform approximation error, making this notion of distance too restrictive. For instance, if we consider the case of a learned a posteriori correction of some approximate model à with a parameterizable mapping FΘ : Y → Y that fulfils (1.4), then the approximate model à can exhibit a nullspace kern(Ã) that is different from that of the accurate operator and, in particular, is potentially much larger. Thus, there may exist one (or several) v ∈ kern(Ã) with Av ≠ 0. Any corrected operator AΘ = FΘ ∘ Ã then exhibits an error in the sense of (2.9) of at least ‖Av‖_Y, as

‖A − AΘ‖_{X→Y} ≥ max{‖Av − FΘ(0)‖_Y, ‖A(−v) − FΘ(0)‖_Y} ≥ min_{y∈Y} max{‖Av − y‖_Y, ‖−Av − y‖_Y} = ‖Av‖_Y,

where in the last equality we have used that the point minimizing the maximum of the distances to two other points is the midpoint of the line segment connecting them. In our case, the midpoint between Av and −Av is always the origin 0, independently of the choice of A and v. In other words, the information in direction v is lost in the approximate model and would need to be recovered subsequently by the correction FΘ. If there are several such nontrivial v ∈ kern(Ã), a uniform correction in the sense of (2.9) becomes increasingly difficult. We will illustrate this difficulty in section 2.2.1 below.

While aiming for a uniform correction is impractical, it can nevertheless be possible to correct the operator à using an a posteriori correction as in (1.4), provided a weaker notion of operator distance is employed. Here, we propose an empirical, learned notion of operator correction that is optimized for a training set of points {x^i, i = 1,…,N}, similar to section 2.1. More precisely, we examine the average deviation of AΘ from A as

(1/N) Σ_{i=1}^N ‖AΘ(x^i) − Ax^i‖_Y (2.10)

in a suitable norm ‖·‖. In this notion, it is sufficient for the operators to be close in the mean for a given training set and hence we call this a statistical or learned correction with respect to the chosen training set. For instance, if a kernel direction v ∈ kern(Ã) is orthogonal to the samples x^i, the information lost in direction v is not crucial for representing the data of interest. Alternatively, the kernel direction v might be highly correlated with another direction w ∉ kern(Ã) in the sense that ⟨x^i, v⟩ ≈ ⟨x^i, w⟩ for all i. Then the result of Av can be inferred from Ãw, even though Ãv = 0.

To conclude this section, we note that in many cases we cannot hope to find a uniform model correction, but that correcting the model error can still be attempted using the notion of a learned correction, quantified by (2.10). This is possible even if the operators A and à exhibit different kernel spaces, as long as the training set {x^i, i = 1,…,N} exhibits sufficient structure to compensate for the loss of information in the approximate model.

Remark 2.1

We consider nonlinear corrections AΘ = FΘ ∘ Ã in this paper even when correcting a linear operator A from a linear approximation Ã, as in our computational examples.

We have three main motivations to do so. First, there are well-established nonlinear network architectures, such as U-Net [34], that are highly powerful and in fact have considerably fewer parameters than a fully parameterized linear map when applied in three dimensions, making the nonlinear approach scalable. Second, when considering nonlinear corrections, a generalization to the context of nonlinear operators will be easier. Finally, and most importantly, while the operators A and à might be linear, the region of interest in image and data space where we need a good correction is highly nonlinear, in the sense that the samples x^i in (2.10) are drawn from a distribution with nonlinear support. This makes nonlinear corrections considerably more powerful in correcting model errors than their linear counterparts.

2.2.1. A toy case: Downsampling

In order to illustrate the challenge of a learned operator correction, we consider a toy case. Here, the accurate forward model A is given by a downsampling operator with an averaging filter, while the approximate model à simply skips every other sample. Concretely, we consider x ∈ ℝ^n, y ∈ ℝ^{n/2} and Ã, A ∈ ℝ^{(n/2)×n}, given by

A = \begin{pmatrix} \frac{1}{2} & \frac{1}{4} & & & & \\ & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} & & \\ & & & \ddots & & \\ & & & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \end{pmatrix} \quad\text{and}\quad \tilde{A} = \begin{pmatrix} 1 & 0 & & & & \\ & 0 & 1 & 0 & & \\ & & & \ddots & & \\ & & & 0 & 1 & 0 \end{pmatrix}. \quad (2.11)

Clearly, both operators have very different kernel spaces: A vanishes on inputs of equal magnitude with alternating sign, whereas à vanishes for every v with v[j] = 0 for odd index j and arbitrary values for even j. In other words, the null space of à is spanned by the unit vectors with even index, kern(Ã) = span{e_j | 0 < j ≤ n, j even}. In fact, by the same argument as above, these v ∈ kern(Ã) with ‖v‖ = 1 are such that the uniform approximation error (2.9) of any correction is at least ‖Av‖_Y ≥ 0.25.

This example exhibits the two features described in the previous section. First, a uniform correction in the sense of (2.9) is impossible due to the different kernel spaces. However, a learned correction in the mean (2.10) is possible on some data {x^i, i = 1,…,N} consisting of piecewise constant functions: on these samples the two operators A and à already coincide everywhere except near jumps, where a weighted average can be employed to correct the approximation error.
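The following NumPy sketch reproduces the toy operators (2.11) for n = 8 (our reconstruction of the banded matrices) and illustrates both the kernel mismatch and the behavior on a piecewise constant sample.

```python
import numpy as np

n = 8
# Accurate operator: averaging downsampling with filter (1/4, 1/2, 1/4), cf. (2.11).
A = np.zeros((n // 2, n))
A[0, :2] = [0.5, 0.25]
for i in range(1, n // 2):
    A[i, 2 * i - 1:2 * i + 2] = [0.25, 0.5, 0.25]
# Approximate operator: simply keep every other sample.
A_tilde = np.zeros((n // 2, n))
for i in range(n // 2):
    A_tilde[i, 2 * i] = 1.0

# An even-indexed unit vector (e_2 in the 1-based notation of the text) lies in kern(Ã) but not kern(A).
v = np.zeros(n)
v[1] = 1.0
print(np.linalg.norm(A_tilde @ v))   # 0.0, so v is in kern(Ã)
print(np.linalg.norm(A @ v))         # ~0.35 >= 0.25, so no uniform correction can recover Av

# On a piecewise constant sample the two operators differ only near the jump
# (and at the truncated boundary row), so a correction in the mean (2.10) is feasible.
x = np.concatenate([np.ones(4), 2 * np.ones(4)])
print(A @ x, A_tilde @ x)
```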

2.3. Solving the variational problem

We now aim to solve an inverse problem given the corrected model AΘ by solving the associated variational problem (1.5). In this context it is natural to require that the solutions of the two minimization problems, involving the corrected operator AΘ and the accurate operator A, are close, that is,

argmin_{x∈X} ½‖AΘ(x) − y‖²_Y + λR(x) ≈ argmin_{x∈X} ½‖Ax − y‖²_Y + λR(x). (2.12)

Note that this formulation is different from the AEM (2.5), where the data fidelity term is given by ‖L_ε(Ãx − y + η_ε)‖²_Y. Solutions to (1.5) are then usually computed by an iterative algorithm. Here we consider first order methods to draw connections to learned iterative schemes [1, 2, 17]. In particular, we consider a classic gradient descent scheme, assuming differentiable R. Then, given an initial guess x_0, we can compute a solution by the following iterative process:

x_{k+1} = x_k − γ_k ∇_x (½‖Ax_k − y‖²_Y + λR(x_k)) (2.13)

with appropriately chosen step size γ_k > 0. When using (2.13) for the corrected operator, it seems natural to ask for gradient consistency of the approximate gradient,

∇_x ‖AΘ(x) − y‖²_Y ≈ ∇_x ‖Ax − y‖²_Y, (2.14)

and hence we can identify

Σ_{i=1}^N ‖∇_x ‖AΘ(x^i) − y^i‖²_Y − ∇_x ‖Ax^i − y^i‖²_Y‖_X (2.15)

as another relevant measure of quality for model corrections within the variational framework when gradient schemes are used to solve (1.5). In the following we will discuss the possibilities of obtaining a correction such that we can guarantee closeness of solutions in the sense of (2.12).

3. Forward model correction

We will now present the possibility of correcting the forward model only and discuss the resulting shortcomings of this approach. More precisely, in a forward model correction, the approximate operator à : X → Y is corrected using a neural network FΘ : Y → Y that is trained to remove artefacts in data space for a given training set. This leads to a corrected operator of the form AΘ = FΘ ∘ Ã.

3.1. The adjoint problem

To solve the minimization problem (1.5) with the learned forward operator using an iterative scheme such as (2.13), we need to compute the gradient of the data fidelity. We recall that the corrected operator is AΘ = FΘ ∘ Ã, where the correction FΘ is given by a nonlinear neural network. Following the chain rule we obtain the gradient

∇_x ½‖AΘ(x) − y‖²_Y = Ã*[DFΘ(Ãx)]*(FΘ(Ãx) − y). (3.1)

Here, we denote by DFΘ(y) the Fréchet derivative of FΘ at y, which is a linear operator Y → Y, whereas the gradient for the correct data fidelity term is simply given by

∇_x ½‖Ax − y‖²_Y = A*(Ax − y).

That means, to satisfy the gradient consistency condition (2.14), we would need

Ã*[DFΘ(Ãx)]*(FΘ(Ãx) − y) ≈ A*(Ax − y). (3.2)

On the other hand, if we train the forward model correction by only requiring consistency in data space, i.e., by minimizing (2.10), we will only ensure consistency of the residuals, FΘ(Ãx) − y ≈ Ax − y, but not full gradient consistency as in (2.14). In order to enforce gradient consistency we need to control the derivative of the network, DFΘ(Ãx), and consequently also need to take the adjoint into consideration when training the forward correction. This could be done by adding an additional penalty term to (2.10) that penalizes the network for exhibiting an adjoint different from A*. For that purpose, let us examine the adjoint of the linearization of the corrected operator AΘ around a point x:

(DAΘ(x))*[y] = Ã*(DFΘ(Ãx))*[y].

With this we can consider the following additional penalty term in the training:

‖(A* − Ã*[DFΘ(Ãx)]*)(r)‖_X, where r = FΘ(Ãx) − y, (3.3)

and choose r to be the residual in data space, FΘ(Ãx) − y, that arises when minimizing the data fidelity term as in (3.1).
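In an automatic differentiation framework, neither the gradient (3.1) nor the penalty (3.3) requires forming DFΘ explicitly; both are vector-Jacobian products. A possible TensorFlow sketch, with the operator wrappers (apply_A_tilde, apply_A_adj) and the network F_theta as hypothetical callables, could look as follows.

```python
import tensorflow as tf

def corrected_data_gradient(x, y, apply_A_tilde, F_theta):
    """Gradient (3.1) of 0.5 * ||F_Theta(Ã x) - y||^2, obtained by automatic differentiation.

    apply_A_tilde applies Ã and F_theta is a (hypothetical) trained correction network;
    autodiff of the corrected data term yields exactly Ã*[DF_Theta(Ãx)]*(F_Theta(Ãx) - y).
    """
    with tf.GradientTape() as tape:
        tape.watch(x)
        residual = F_theta(apply_A_tilde(x)) - y
        fidelity = 0.5 * tf.reduce_sum(residual ** 2)
    return tape.gradient(fidelity, x)

def adjoint_penalty(x, y, apply_A_adj, apply_A_tilde, F_theta):
    """Penalty (3.3): ||(A* - Ã*[DF_Theta(Ãx)]*)(r)||, with r = F_Theta(Ãx) - y."""
    with tf.GradientTape() as tape:
        tape.watch(x)
        y_corr = F_theta(apply_A_tilde(x))
    r = y_corr - y
    # Vector-Jacobian product Ã*[DF_Theta(Ãx)]*(r) via the output_gradients argument.
    vjp = tape.gradient(y_corr, x, output_gradients=r)
    return tf.norm(apply_A_adj(r) - vjp)
```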

However, this solution comes with its own drawback. As we can see in (3.1), the range of the corrected fidelity term’s gradient is limited by the range of the approximate adjoint, rng(Ã*). Thus, we identify the key difficulty here in the difference between the ranges of the accurate and the approximate adjoints, rather than the differences in the forward operators themselves, which links back to the discussion in section 2.2.

Indeed, a correction of the forward operator via composition with a parameterized model FΘ in measurement space is not able to yield gradients close to the gradients of the accurate data term if rng(Ã*) and rng(A*) are too different. This problem is exacerbated if the dimensions of these two spaces differ: we cannot expect to find a correction that satisfies the gradient consistency (3.2), and, related to Remark 1.1, even suitable projections in à would not be sufficient to compensate for this. This observation is made precise in the following theorem.

Theorem 3.1 (unlearnability of a gradient consistent forward model correction)

Let A and à be compact linear operators from X to Y and the solutions

x^argminx12AxyY2, (3.4)
x^Θcriticalpointof12AΘ(x)yY2 (3.5)

be given. If x˜0rng(A˜) and x^rng(A˜)¯, then a gradient descent algorithm for the functional in (3.5), initialized with x˜0, yields a solution such that x^Θx^ for any x^ solving (3.4).

Proof

This follows directly from the update equations for solving (3.5) by

x̃_{k+1} = x̃_k − λ_k Δx̃_k

with

Δx̃_k := ∇_x ½‖AΘ(x̃_k) − y‖²_Y = Ã*[DFΘ(Ãx̃_k)]*(FΘ(Ãx̃_k) − y). (3.6)

If x̃_0 ∈ rng(Ã*) then Δx̃_0 ∈ rng(Ã*), and hence x̃_1 ∈ rng(Ã*). By induction this holds for all k > 0, i.e., x̃_k ∈ rng(Ã*) for all k, and thus any limit point x̂_Θ lies in the closure of the range of Ã*, rng(Ã*)‾. Since x̂ ∉ rng(Ã*)‾, it follows that x̂ ≠ x̂_Θ for any limit point of a gradient descent algorithm for solving (3.5).

Thus, a correction of the forward model that requires only consistency in data space does not in fact ensure consistency of the data term when solving a variational problem. Additionally, according to Theorem 3.1, even including an additional penalty term of the form (3.3) does not solve this problem.

3.1.1. Illustration with the toy case

Going back to the toy case from section 2.2.1, where we considered a downsampling operation, the approximate operator was chosen such that its null space is spanned by the unit vectors with even index. The range of the adjoint can then be characterized by the identity rng(Ã*) = kern(Ã)^⊥ and hence we have rng(Ã*) = span{e_j | 0 < j ≤ n, j odd}. It is now clear that we cannot compute any solution x* ∉ rng(Ã*) by the updates in (3.6) if we initialize them with x̃_0 ∈ rng(Ã*), since all updates are restricted to the range of the adjoint of the approximate operator. This problem is illustrated in Figure 1, where we consider an imaging problem for illustrative purposes and x is vectorized before the operators in (2.11) are applied. Whereas the difference in the forward operator is minimal for this example, the range of the approximate adjoint makes it impossible to recover the phantom without further adjustments after application of the adjoint, which will be addressed in the next section.

Figure 1. Illustration of mapping properties for the toy case.


As we can see, the ranges of the accurate and the approximate adjoint are essentially different. Even if the approximate adjoint Ã* is applied to the ideal data Ax (bottom right), representing a perfect fit of the forward model, the range of the approximate adjoint rng(Ã*) makes it impossible to compute a consistent gradient in (2.14) without further modifications.

4. Forward-adjoint correction

As is evident from the last section, a forward model correction that is computed to minimize (2.10) in data space alone is not sufficient to compute the actual reconstruction in a variational framework. We additionally require consistency in the gradients of the data fidelity term (2.15) which in turn boils down to a condition for a correction on the adjoint of the corrected forward operator in image space, motivated by (3.3). We will refer to such a correction in data and image space as a forward-adjoint correction, as we will learn a correction of the forward operator, as well as a correction of the adjoint (backward).

4.1. Obtaining a forward-adjoint correction

The goal is now to obtain a gradient consistent model correction. To achieve this we propose to learn two networks. That is, we learn a network FΘ that corrects the forward model and another network GΦ that corrects the adjoint, such that we have

AΘ := FΘ ∘ Ã,   AΦ* := GΦ ∘ Ã*.

These corrections are obtained as follows. Given a set of training samples (x^i, Ax^i), we train the forward correction FΘ acting in measurement space Y with the loss

min_Θ Σ_i ‖FΘ(Ãx^i) − Ax^i‖_Y. (4.1)

In an analogous way, we correct the adjoint with the network GΦ acting on image space X with the loss

min_Φ Σ_i ‖GΦ(Ã*r^i) − A*r^i‖_X. (4.2)

Here, we can choose the direction r^i = FΘ(Ãx^i) − y^i as in (3.3) for the adjoint loss. This ensures that the adjoint correction is in fact trained in directions relevant when solving the variational problem.

At evaluation time, the corrected operators can then be used to compute an approximate gradient of the data fidelity term ‖Ax − y‖²_Y. The gradient then takes the form

A*(Ax − y) ≈ (GΦ ∘ Ã*)(FΘ(Ãx) − y). (4.3)
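A possible implementation of the training losses (4.1)–(4.2), with r chosen as in (3.3), and of the corrected gradient (4.3) is sketched below. The operator wrappers and the networks F_theta, G_phi are hypothetical placeholders, and detaching the residual r from the graph is a simplifying assumption on our part.

```python
import tensorflow as tf

def forward_adjoint_losses(x, y, apply_A, apply_A_adj, apply_A_tilde, apply_A_tilde_adj,
                           F_theta, G_phi):
    """Training losses (4.1) and (4.2), with the adjoint loss evaluated in the residual
    direction r = F_Theta(Ã x) - y as in (3.3)."""
    y_corr = F_theta(apply_A_tilde(x))
    forward_loss = tf.norm(y_corr - apply_A(x))
    r = tf.stop_gradient(y_corr - y)                    # detaching r is a simplifying assumption
    adjoint_loss = tf.norm(G_phi(apply_A_tilde_adj(r)) - apply_A_adj(r))
    return forward_loss, adjoint_loss

def corrected_gradient(x, y, apply_A_tilde, apply_A_tilde_adj, F_theta, G_phi):
    """Approximate data-term gradient (4.3): (G_Phi ∘ Ã*)(F_Theta(Ã x) - y) ≈ A*(Ax - y)."""
    return G_phi(apply_A_tilde_adj(F_theta(apply_A_tilde(x)) - y))
```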

Let us note that the separate correction of the adjoint and the forward operator comes with a change of philosophy compared to existing methods for forward operator correction as presented in section 2.1. Instead of trying to fit a single corrected operator AΘ that is already parameterized according to its use within the data fidelity term of a variational problem, we fit a nonlinear corrected operator AΘ whose use within the variational problem requires fitting the gradient of the data term directly. This gradient fit takes the form (4.3). We use the gradient of the data fidelity term to directly obtain the gradient of the variational functional for our corrected operator, allowing us to perform minimization techniques like gradient descent. We take the obtained critical point of these dynamics as the reconstruction. Note that the approximate gradient can no longer be associated with a variational functional for the forward-adjoint method. Instead, the gradient is parameterized directly, without parameterizing the variational functional first.

Remark 4.1

We note that such a separate correction in image and data space can be related to learned primal dual (LPD) methods [2], where the correction is performed implicitly as described in section 2. This explains in part why LPD approaches might be especially suitable for applications with an imperfectly known operator; see also [44].

In the following section we will discuss how these dynamics relate to the original variational problem and we will see that they can in fact take us close to the original reconstruction if both the forward and adjoint are fit sufficiently well.

4.2. Convergence analysis

The purpose of this section is to show that sufficiently small training losses can ensure that gradient descent over (1.5) converges to a neighborhood of the reconstruction x̂ obtained with the accurate operator A. The section relates to the forward-adjoint correction (4.3) and uses the notation of this approach. In the case of the forward-adjoint correction, these loss functions are given by

‖Ax − AΘ(x)‖_Y and ‖(A* − AΦ*)(AΘ(x) − y)‖_X. (4.4)

Let us now consider, for any y ∈ Y, the two functionals

𝓛(x) := ½‖Ax − y‖²_Y + λR(x),   𝓛Θ(x) := ½‖AΘ(x) − y‖²_Y + λR(x)

associated with the variational problem for the reconstruction x from the measurement y. We will show connections between the reconstruction x̂ := argmin_x 𝓛(x) using the accurate operator A and the solutions x̂_Θ ∈ argmin_x 𝓛Θ(x) obtained with our corrected operator AΘ.

When considering the gradient descent dynamics over 𝓛Θ, we do not refer to the actual gradient of 𝓛Θ but instead consider the direct fit to the gradient of the form AΦ*(AΘ(x) − y) + λ∇R(x), as discussed in the last section. In a slight abuse of notation we will nevertheless denote this gradient by ∇𝓛Θ(x) := AΦ*(AΘ(x) − y) + λ∇R(x) to keep the notation easy to read in the remainder of this section. If R is merely subdifferentiable, then ∇R(x) denotes an element of the subgradient of R.

For the remainder of this section, we make the following assumption on the regularization functional R.

Assumption 4.2 (strong convexity)

We assume that the regularization functional R is strongly convex and denote the strong convexity constant by m.

Remark 4.3

Assumption 4.2 in particular holds for R being the Tikhonov regularization functional R(x) = ‖x‖²_X and for the pseudo-Huber functional R(x) = ∫_{[0,1]²} δ[√(1 + (1/δ²)‖∇_t x(t)‖²) − 1] dt for a bounded function x : [0,1]² → ℝ and δ > 0, which we use in the experimental section. For operators A with bounded inverse it is sufficient for the regularization functional to be convex to ensure strong convexity of the resulting variational functional 𝓛. In this case, strong convexity of the regularization functional is not required.

This allows us to use the following two fundamental lemmas on the behavior of 𝓛 near the minimum of the variational functional. As a direct consequence of Assumption 4.2 and the convexity of the data term for linear forward operators, we will from now on assume 𝓛 to be strongly convex.

Lemma 4.4 (proximity to minimizer)

Let 𝓛 be strongly convex. Then for every ε > 0 there is a δ > 0 such that for any x

𝓛(x) − 𝓛(x̂) ≤ δ ⟹ ‖x − x̂‖_X ≤ ε, (4.5)

where x̂ := argmin_x 𝓛(x).

Proof

By the definition of strong convexity we have

𝓛(x) ≥ 𝓛(x̂) + ⟨s_x̂, x − x̂⟩_X + (m/2)‖x − x̂‖²_X,

where s_x̂ ∈ ∂𝓛(x̂) is an element of the subdifferential of 𝓛 at x̂. Using 0 ∈ ∂𝓛(x̂) yields

δ ≥ 𝓛(x) − 𝓛(x̂) ≥ (m/2)‖x − x̂‖²_X,

which proves the claim by setting δ = (m/2)ε².

Lemma 4.5 (lower gradient norm bound)

Let 𝓛 be strongly convex. For every ε > 0 there is a δ > 0 such that for any x

‖x − x̂‖_X > ε ⟹ ‖s‖_X > δ for all s ∈ ∂𝓛(x), (4.6)

where ∂𝓛(x) denotes the subdifferential of 𝓛 at x and 𝓛(x̂) − 𝓛(x) < 0.

Proof

By the definition of strong convexity,

𝓛(x̂) ≥ 𝓛(x) + ⟨s_x, x̂ − x⟩_X + (m/2)‖x − x̂‖²_X,

where again s_x denotes an element in the subdifferential of 𝓛 at x. Then by the Cauchy-Schwarz inequality,

𝓛(x̂) − 𝓛(x) − (m/2)‖x − x̂‖²_X ≥ −‖s_x‖_X ‖x̂ − x‖_X.

Using 𝓛(x̂) − 𝓛(x) < 0 by assumption shows

(m/2)‖x − x̂‖²_X ≤ ‖s_x‖_X ‖x̂ − x‖_X

and, hence, ‖s_x‖_X ≥ (m/2)‖x − x̂‖_X, which proves the result.

Remark 4.6

The assumption of strong convexity is used in the following results via Lemmas 4.4 and 4.5 only. While it is a sufficient condition for these to hold, it is not necessary. In particular, if the variational functional is not strongly convex but is such that Lemmas 4.4 and 4.5 hold true, the following results still apply.

We now turn to showing that a minimizer x̂_Θ of the approximate functional can in fact be computed with a gradient descent scheme and that this minimizer is in fact close to the accurate reconstruction x̂. We begin by extending Lemma 4.5 to include the regularization term. For this purpose, we consider the alignment of the variational gradients including the regularization term,

cos Φ_∇(x) := ⟨∇𝓛(x), ∇𝓛Θ(x)⟩_X / ‖∇𝓛(x)‖²_X. (4.7)

We show how the alignment can be used as the key quantity to guarantee convergence of the approximate dynamics to a neighborhood of the accurate solution. We remark again the abuse of notation ∇𝓛Θ(x) := AΦ*(AΘ(x) − y) + λ∇R(x).

Proposition 4.7 (convergence under alignment constraints)

Assume that outside a neighborhood U of the minimizer x̂ of the exact functional 𝓛 we have

cos Φ_∇(x) > δ_1 > 0

for some δ_1 > 0. Then eventually the gradient descent dynamics over 𝓛Θ will reach the neighborhood U.

Proof

Denote by x_Θ(t) the trajectory of the reconstruction under the gradient flow

∂_t x_Θ(t) = −∇𝓛Θ(x_Θ(t)).

Consider the evaluation of the variational loss 𝓛 that invokes the correct forward operator A. Using the assumed lower bound on the alignment, we can bound

∂_t 𝓛(x_Θ(t)) = ⟨∇𝓛(x_Θ(t)), ∂_t x_Θ(t)⟩_X = −⟨∇𝓛(x_Θ(t)), ∇𝓛Θ(x_Θ(t))⟩_X ≤ −δ_1 ‖∇𝓛(x_Θ(t))‖²_X.

As long as x_Θ(t) has not reached the neighborhood U, by (4.6), we have ‖∇𝓛(x_Θ(t))‖_X > δ_2 for some δ_2 > 0 and hence

∂_t 𝓛(x_Θ(t)) ≤ −δ_1 ‖∇𝓛(x_Θ(t))‖²_X ≤ −δ_1 δ_2² =: c < 0.

The gradient flow dynamics induced by ∇𝓛Θ hence induce a decrease of 𝓛 at a rate that is globally bounded by c outside the neighborhood U around x̂, concluding the proof by Lemma 4.4.

We have shown that even though the corrected operator AΘ is potentially nonlinear, the gradient dynamics induced by ∇𝓛Θ can in fact minimize the variational problem with the accurate operator A, effectively minimizing the associated variational functional 𝓛 and leading us close to the accurate solution x̂. The proposition is based on an assumption about the alignment cos Φ_∇. We will directly track this quantity in our experimental section, making sure the convergence results can be applied to our experimental findings. The training loss, however, is not based on the alignment directly, but rather minimizes a combination of forward and adjoint loss. We have in fact found that this combination of loss functionals is both more interpretable and more stable than directly optimizing the alignment. The following lemma and theorem show that these loss functions in fact control a lower bound on the alignment and hence a sufficiently well-trained correction can also be guaranteed to yield results close to the minimizer x̂ of the variational functional involving the accurate operator A. In this context, a well-trained correction is one that achieves sufficiently low training errors.

Lemma 4.8 (complete gradient alignment bound)

Let 𝓛 and 𝓛Θ be defined as above. We have the lower bound

cos Φ_∇ ≥ 1 − (‖A‖_{X→Y} ‖(A − AΘ)(x)‖_Y + ‖(A* − AΦ*)(AΘ(x) − y)‖_X) / ‖∇𝓛(x)‖_X,

where cos Φ_∇ is defined as in (4.7).

Proof

A straightforward calculation shows

⟨∇𝓛(x), ∇𝓛Θ(x)⟩_X / ‖∇𝓛(x)‖²_X = ⟨∇𝓛(x), ∇𝓛(x)⟩_X / ‖∇𝓛(x)‖²_X + ⟨∇𝓛Θ(x) − ∇𝓛(x), ∇𝓛(x)⟩_X / ‖∇𝓛(x)‖²_X ≥ 1 − ‖∇𝓛Θ(x) − ∇𝓛(x)‖_X / ‖∇𝓛(x)‖_X.

The result follows by using the bound

‖A*(Ax − y) − AΦ*(AΘ(x) − y)‖_X ≤ ‖A‖_{X→Y} ‖(A − AΘ)(x)‖_Y + ‖(A* − AΦ*)(AΘ(x) − y)‖_X,

which itself emerges directly from the triangle inequality applied to the identity

A*(Ax − y) − AΦ*(AΘ(x) − y) = A*(A − AΘ)(x) + (A* − AΦ*)(AΘ(x) − y).

Theorem 4.9 (convergence to a neighborhood of x̂)

Let ε > 0 and pick δ as in (4.6).

Assume both the adjoint and forward operator are fit up to a δ/4-margin, i.e.,

‖A‖_{X→Y} ‖(A − AΘ)(x_n)‖_Y < δ/4,   ‖(A* − AΦ*)(AΘ(x_n) − y)‖_X < δ/4 (4.8)

for all y and all x_n obtained during gradient descent over 𝓛Θ. Then eventually the gradient descent dynamics over 𝓛Θ will reach an ε-neighborhood of the accurate solution x̂.

Proof

We apply Proposition 4.7 with the neighborhood U chosen as the ε-ball around x̂. Using Lemma 4.8, we can bound

cos Φ_∇ ≥ 1 − (‖A‖_{X→Y} ‖(A − AΘ)(x)‖_Y + ‖(A* − AΦ*)(AΘ(x) − y)‖_X) / ‖∇𝓛(x)‖_X ≥ 1 − (δ/4 + δ/4) / ‖∇𝓛(x)‖_X.

As long as ‖x_Θ(t) − x̂‖_X > ε, by (4.6), we have ‖∇𝓛(x)‖_X > δ and hence

cos Φ_∇ ≥ 1 − δ/(2δ) = 1/2 > 0.

We can hence apply Proposition 4.7 to conclude the proof.

Overall, we have thus shown that a sufficiently well-trained nonlinear corrected operator AΘ induces gradient dynamics ∇𝓛Θ that lead close to the accurate solution x̂.

We note that the main assumption in Theorem 4.9 is that the learned operator AΘ has to be sufficiently close to the accurate operator A throughout the minimization trajectory, in the sense of (4.8). While this corresponds directly to the quantities of the loss functions that the approximations AΘ and AΦ* were trained on, it includes any x_n occurring during the gradient descent dynamics. Thus, we will discuss the concept of adding exactly these samples x_n to the training set in the next section, effectively making our training loss function minimize exactly the relevant quantities ‖(A − AΘ)(x_n)‖_Y and ‖(A* − AΦ*)(AΘ(x_n) − y)‖_X.

Remark 4.10

The above Theorem 4.9 makes use of both proximity of the forward operator as well as of the adjoints. While this is necessary to guarantee convergence of the gradient descent dynamics to a neighborhood of the accurate solution, it is not strictly necessary to guarantee proximity of the minimizers of 𝓛Θ and of 𝓛. In fact, in Appendix B we show that under certain assumptions a good forward approximation quality is sufficient to ensure closeness of minimizers, without considering a specific optimization scheme. While this result is interesting from a theoretical viewpoint, Theorem 4.9 is essential for supporting and explaining the experimental results in this study.

5. Computational considerations

In the following we will first address some details on the training procedures and then continue to present the design of experiments to evaluate the performance of the discussed approaches. In particular, as we mentioned above, in order to ensure the convergence in Theorem 4.9, we need to make sure that the forward fit as well as the backward fit in (4.8) are satisfied throughout the minimization process, which makes a special recursive training of the corrections necessary.

5.1. Recursive training

Let us now address how to ideally choose the training sets for the forward-adjoint correction to ensure a good fit of the forward correction FΘ by minimizing (4.1) and of the adjoint correction GΦ by minimizing (4.2). To create the training set, there are two possibilities. Either we are given a set of measurements {y^i, i = 1,…,N} or, alternatively, if we are given a set of samples in image space {x^i, i = 1,…,N}, then we need to create a corresponding set of measurements by applying the accurate model, y^i = Ax^i + e^i, with added noise e^i. Either way, given the set of measurements y^i we need to train FΘ and GΦ on a meaningful starting point for the gradient descent used to solve the variational problem; a natural candidate is the backprojection x_0^i = Ã*y^i.

Training the corrected operators AΘ and AΦ* with the samples {(x_0^i, Ax_0^i)} only yields operator corrections that approximate A and A* well for samples x that are close to backprojections of measurements. However, the purpose of this paper is to learn a correction of the approximate operator à that can be used within the variational problem to obtain a solution close to the one obtained using the accurate operator A. We observe that training AΘ on the backprojections x_0^i = Ã*y^i only is not sufficient to achieve this goal. While this leads to AΘ being a good approximation to A for the first iterates in the gradient descent scheme, the approximation quality tends to deteriorate for later iterates, so that AΘ is no longer a good approximation to A. Such a behavior is in fact what one would heuristically expect, as AΘ has never been trained to match the accurate operator on later iterates.

This connects to the assumptions made in the convergence Theorem 4.9, where we assume low approximation errors for both the forward and the adjoint at all iterates of the gradient descent scheme. We hence need to ensure a uniformly low approximation error at any iterate to be able to guarantee convergence and it is in particular not sufficient to ensure a low approximation error at the initial point of the minimization of the variational problem only.

A natural solution to mitigate this problem is to include later iterates of the variational problem into the training samples for the corrected operator. More precisely, given some weights Θ of the correction operator, denote by {x_n^i} the iterates obtained by following the dynamics

x_{n+1}^i = x_n^i − μ[AΦ*(AΘ(x_n^i) − y^i) + λ∇R(x_n^i)], (5.1)

where μ denotes the step size. We add these samples to the original training set {(x^i, Ax^i)}, i.e., we also train on {(x_n^i, Ax_n^i)} for all n < N_iter and all i. Here N_iter is the maximal number of gradient descent steps we take. This allows us to ensure that the corrections AΘ, as well as AΦ* for the forward-adjoint method, are fit consistently well at any iterate x_n^i of the gradient descent dynamics.

A major drawback of this approach is the additional computational burden it incurs during training. Obtaining the iterates of the minimization of the variational problem requires performing the minimization at training time. To reduce this additional burden, one can make use of the fact that the gradient of the data term for the learned operator correction AΘ has to be computed for two different purposes: first, it is used to perform minimization over the variational functional and, second, to further train AΘ to better match the accurate operator. One can hence perform this computation only once and use it for both purposes. This reduces computational costs particularly when training on every iterate of the minimization over the variational functional, in which case little overhead compared to regular training is incurred.

Additionally, the trajectory (5.1) depends on the network weights Θ. The training samples can hence change during training and convergence is not clear a priori. Empirically, we find that training on the full trajectory (x_n^i, Ax_n^i) for n < N_iter from the beginning tends to be unstable, as this will lead to most training samples differing greatly from both the original training distribution as well as the accurate trajectory we are finally interested in. There are, however, two effective solutions to this problem. First, one could alternatively train on the trajectory obtained when using the accurate operator A, avoiding instabilities in the beginning of training. This, however, could lead to errors accumulating during training. We found that the most effective solution is to have N_iter increase from 1 to some N_max during training. With this approach, we start off by training on the original samples x_0^i only and then add in more samples from the trajectory as training proceeds. We have noticed that once trained on backprojections, adding later iterates to the training set does not change the behavior of the learned correction on backprojections by much. In this sense, one can interpret the latter approach to recursive training as gradually extending the domain on which the correction is valid, without considerably changing the behavior of the correction on the part of the image domain on which it is already valid. This heuristically explains why recursive training can be performed very stably when gradually increasing N_iter.
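A compressed sketch of one recursive training pass, combining the losses (4.1)–(4.2) with the trajectory (5.1), is given below. All names (the ops dictionary of operator wrappers, R_grad, the optimizers) and the hyperparameter values are placeholders, and details such as batching are omitted.

```python
import tensorflow as tf

def recursive_training_step(batch_y, ops, F_theta, G_phi, R_grad,
                            opt_F, opt_G, n_iter, lam=1e-3, mu=0.2):
    """One hypothetical recursive training pass, cf. section 5.1 and (5.1).

    ops is a dict of callables for A, A*, Ã, and Ã*; R_grad returns the gradient of the
    regularizer; n_iter is gradually increased from 1 to N_max over the course of training.
    lam and mu are placeholder values.
    """
    x = 4.0 * ops["A_tilde_adj"](batch_y)          # scaled backprojection x_0 = 4 Ã* y
    for _ in range(n_iter):
        with tf.GradientTape(persistent=True) as tape:
            y_corr = F_theta(ops["A_tilde"](x))
            forward_loss = tf.norm(y_corr - ops["A"](x))        # loss (4.1) on the current iterate
            r = tf.stop_gradient(y_corr - batch_y)
            adj_corr = G_phi(ops["A_tilde_adj"](r))
            adjoint_loss = tf.norm(adj_corr - ops["A_adj"](r))  # loss (4.2) in direction r
        opt_F.apply_gradients(zip(tape.gradient(forward_loss, F_theta.trainable_variables),
                                  F_theta.trainable_variables))
        opt_G.apply_gradients(zip(tape.gradient(adjoint_loss, G_phi.trainable_variables),
                                  G_phi.trainable_variables))
        del tape
        # Gradient descent update (5.1), producing the next training point on the trajectory;
        # the projection onto x >= 0 follows the reconstruction scheme of section 5.2.
        x = tf.nn.relu(x - mu * (tf.stop_gradient(adj_corr) + lam * R_grad(x)))
```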

5.2. Experimental design

For a practical application we consider photoacoustic tomography (PAT) in two dimensions; for more details on PAT see [6] and the discussion in Appendix A. Here, the measurement data are given as a set of time series in a limited view geometry measured with a line detector at the surface, which we visualize as a space-time image in Figure 2. In this limited view scenario, the reconstruction task is already a very challenging inverse problem in itself, even with the accurate operator available; we refer to [30, 45] for details. Here, the accurate model A is given by a pseudospectral time-stepping model [42, 43], whereas the approximate model à is given by regridding and a fast Fourier transform, which neglects the effect of singularities and introduces systematic errors in the forward mapping [11, 28]. In particular, to avoid singularities in the approximate model we threshold incident waves with angles beyond θ_max = 60° from normal incidence, which means that this part of the data is inevitably lost. Nevertheless, the approximate forward model still exhibits strong aliasing artefacts, as can be clearly seen in Figure 2, indicating that this application is an ideal candidate for this study. For more details on the models, we refer to the discussion in Appendix A. We developed the majority of the code in Python using the TensorFlow package, with the k-Wave MATLAB (R2018b) toolbox [42] used for some calculations concerning the accurate operator. We used a single Quadro P6000 GPU to conduct the experiments.

Figure 2.


Illustration of the limited view imaging scenario under consideration. Left: numerical phantom with a line detector (red line). Middle: ideal data from the accurate forward model. Right: data obtained with an approximate model with clearly visible aliasing artefacts.

Model corrections under consideration

We evaluate the forward only method with a gradient penalty term as described in section 3, as well as the forward-adjoint approach as outlined in section 4.1. For both of these methods, we conduct experiments with a model trained on backprojected measurements only and with a model that has been trained using recursive training (section 5.1). As a baseline, we compare to the widely used AEM approach as outlined in section 2.1, a linear approach to model correction. We finally compare to reconstructions obtained with the uncorrected operator as well as to the reconstruction the accurate operator yields. This allows us to assess how well the various correction approaches are able to correct the shortcomings of the uncorrected operator.

Measurement setup

We consider a limited view problem in this study, where measurements are only taken on top of the target with a line detector, as indicated in Figure 2. In particular, we consider an image size of 64 × 64, the measurements are taken with a line detector of the same width as the target, and t = 64 time points, resulting in a measurement space of the same size, i.e., 64 × 64. The detector is modeled as a Fabry-Perot sensor [46] with wide bandwidth and no directivity. Since both image and data space can be represented as a two-dimensional image, it is reasonable to use the same network architecture for both spaces.

Training samples

For the evaluation of the various model correction methods, we utilize two different sets of samples. First, a simple synthetic set of “ball” images, consisting of circles of varying intensity in [0.75, 1], with fixed radius but random location on an empty, zero-intensity background. We employ a total of 4096 ball samples for fitting the correction and an additional 64 for evaluation. An example of a ball image and the corresponding data are illustrated in Figure 2. Second, a realistic vessel set that has been obtained by segmenting vessels from three-dimensional (3D) CT scans to provide realistic phantoms; see [21] for details. For this study, the 3D volumes have been projected to two dimensions by a maximum intensity projection and subsequently cropped to the intended target size; we note that all samples are normalized to [0, 1]. Examples of the obtained vessel phantoms are displayed in Figure 3. We use 2760 unique vessel phantoms for training, augmented by a rotation of 90° for a training set of 5520 samples in total. We evaluate on a separate test set containing 64 samples. All phantoms have a resolution of 64 × 64 and the resolution in data space is the same for both the accurate and the approximate model. The phantoms are used to generate synthetic measurements y^i := Ax^i + e^i by applying the accurate operator A and adding Gaussian white noise at 1% of the maximum value in measurement space.
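A minimal generator for the synthetic ball phantoms could look as follows; the disc radius chosen here is an assumption, as the text only states that it is fixed.

```python
import numpy as np

def make_ball_phantom(n=64, radius=6, seed=None):
    """Synthetic 'ball' phantom: one disc of fixed radius, random centre, intensity in [0.75, 1].

    The radius value is an assumption; the text only states that the radius is fixed.
    """
    rng = np.random.default_rng(seed)
    intensity = rng.uniform(0.75, 1.0)
    cx, cy = rng.integers(radius, n - radius, size=2)
    ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    phantom = np.zeros((n, n))
    phantom[(ii - cx) ** 2 + (jj - cy) ** 2 <= radius ** 2] = intensity
    return phantom

balls_train = np.stack([make_ball_phantom(seed=i) for i in range(4096)])
```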

Figure 3.


Examples from the vessel set used for training of the model correction. The phantoms were obtained from segmented CT scans to provide realistic ground-truth images for photoacoustic imaging of vessel structures.

Training scheme

For every measurement y^i, we compute x_0^i := 4Ã*y^i as an initial reconstruction. We choose to rescale the adjoint Ã*y^i by a factor of 4 as in our measurement setup we typically have ‖Ax‖_Y ≈ ½‖x‖_X and ‖A*y‖_X ≈ ½‖y‖_Y. This is due to the fact that we measure along a line on one side of the object only, hence recording only about half of the emitted energy at the measurement device. The rescaling ensures that the average intensity of the backprojection roughly matches that of both the ground truth and the minimizer of the variational functional. It allows us to keep the norm of the reconstruction approximately stable throughout solving the variational problem (5.5) and hence makes the operator approximations more robust throughout the trajectory of minimizing (5.5).

Given a set of training samples y^i, we then train the forward approximation with the loss term

Σ_i ( ‖FΘ(Ãx_0^i) − Ax_0^i‖_Y [forward loss] + ‖(A* − Ã*[DFΘ(Ãx_0^i)]*)(FΘ(Ãx_0^i) − y^i)‖_X [adjoint loss] ), (5.2)
Σ_i ‖FΘ(Ãx_0^i) − Ax_0^i‖_Y, (5.3)

weighting the forward and adjoint loss in (5.2) equally for the forward only correction with gradient penalty. In the case of a forward-adjoint correction, the forward approximation is trained using the loss (5.3) while the adjoint is trained with the loss

Σ_i ‖(GΦ ∘ Ã* − A*)(FΘ(Ãx_0^i) − y^i)‖_X. (5.4)

Note that the quasi-adjoint of the approximate operator, AΦ* := GΦ ∘ Ã*, as well as the adjoint of the forward approximation in (5.2), is evaluated in the direction r := FΘ(Ãx_0^i) − y^i. This loss is chosen to be consistent with the terms arising during a gradient descent based optimization of (5.5), as shown in the previous sections.

If recursive training is applied, we additionally compute the iterates of a gradient descent scheme on the penalty functional

argmin_x ½‖AΘ(x) − y^i‖²_Y + λR(x). (5.5)

All losses are summed over the iterates x_n^i with n ≥ 0, instead of taking the initial point x_0^i only. To make recursive training stable, the number of recursive steps considered during training is gradually increased to the maximal value, instead of training on the full trajectory from the start, as outlined in section 5.1.

Network details

The networks FΘ and GΦ are built with a U-Net [34] architecture, which has been particularly popular in the image reconstruction community, including applications to PAT [3, 12, 15] and other modalities [16, 19, 22]. We follow the standard architecture with 4 downsampling and the same number of upsampling blocks, each containing two convolutional layers with filters of size 5 × 5. We employ average pooling for the downsampling and transpose convolutions for the upsampling layers. We note that the proposed framework is agnostic to the employed architecture; we expect similar results with other sufficiently expressive network architectures.
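A rough Keras sketch of such a U-Net, with 4 downsampling and 4 upsampling blocks, 5 × 5 filters, average pooling, and transpose convolutions, is given below; the channel counts, activations, and the final 1 × 1 convolution are assumptions not specified in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 5x5 convolutional layers per block; ReLU activations are an assumption.
    x = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 5, padding="same", activation="relu")(x)

def build_unet(size=64, base_filters=32):
    """Rough U-Net sketch with 4 down-/upsampling blocks; channel counts are assumptions."""
    inputs = tf.keras.Input((size, size, 1))
    skips, x = [], inputs
    for level in range(4):
        x = conv_block(x, base_filters * 2 ** level)
        skips.append(x)
        x = layers.AveragePooling2D()(x)                 # average pooling for downsampling
    x = conv_block(x, base_filters * 16)
    for level in reversed(range(4)):
        x = layers.Conv2DTranspose(base_filters * 2 ** level, 5, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skips[level]])      # skip connection
        x = conv_block(x, base_filters * 2 ** level)
    outputs = layers.Conv2D(1, 1, padding="same")(x)
    return tf.keras.Model(inputs, outputs)

F_theta = build_unet()   # the same architecture can be used for G_phi, acting on image space
```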

Solving the variational problem

We employ gradient descent with a fixed step size of 0.2 for all experiments to solve the variational problem (5.5), which, as shown in section 4.2, can lead to a near-optimal reconstruction given sufficient approximation quality. We additionally add a positivity constraint x_n ≥ 0 to the minimization, which we incorporate using projected gradient descent. This means we set the negative part of every iterate to 0, as negative values are nonphysical.

As regularization functional R we choose the pseudo-Huber variation functional

R(x) := Σ_{i,j} δ[√(1 + (1/δ²)[(x[i+1,j] − x[i,j])² + (x[i,j+1] − x[i,j])²]) − 1] (5.6)

to reconstruct x ∈ ℝ^{64×64}. Here x[i,j] denotes the pixel of x at location i along the vertical and j along the horizontal axis. This functional approximates the L2-norm of the gradient of the reconstruction for small values and the L1-norm for large values of the gradient, coinciding with total variation (TV) in the limit δ → 0. The parameter δ specifies the characteristic length at which the behavior of the regularization functional changes from approximating L2 to L1. We chose δ = 0.01 for all experiments. We remark that this functional is strongly convex on all bounded domains for all δ > 0, with the strong convexity constant depending on δ and the diameter of the imaging domain. The latter is in our case specified by the constraint x[i,j] ∈ [0,1].
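A sketch of the discrete pseudo-Huber functional (5.6) and of the projected gradient descent used to solve (5.5) is given below; corrected_gradient stands for an approximate data-term gradient such as (4.3), and the initial point and iteration count are placeholders.

```python
import tensorflow as tf

def pseudo_huber(x, delta=0.01):
    """Discrete pseudo-Huber variation functional (5.6), summed over interior pixels."""
    dx = x[1:, :-1] - x[:-1, :-1]      # vertical differences  x[i+1, j] - x[i, j]
    dy = x[:-1, 1:] - x[:-1, :-1]      # horizontal differences x[i, j+1] - x[i, j]
    return tf.reduce_sum(delta * (tf.sqrt(1.0 + (dx ** 2 + dy ** 2) / delta ** 2) - 1.0))

def reconstruct(y, x0, corrected_gradient, lam, n_steps=500, step=0.2):
    """Projected gradient descent on (5.5); x0 is, e.g., the scaled backprojection 4 Ã* y."""
    x = x0
    for _ in range(n_steps):
        with tf.GradientTape() as tape:
            tape.watch(x)
            reg = lam * pseudo_huber(x)
        grad = corrected_gradient(x, y) + tape.gradient(reg, x)
        x = tf.nn.relu(x - step * grad)       # projection onto the positivity constraint x >= 0
    return x
```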

The regularization parameter λ is tuned for every experiment and baseline individually via a grid search over a logarithmically evenly spaced grid. The best parameter was chosen in terms of the L2 distance to the ground-truth image.

6. Computational results

Synthetic ball phantoms

To evaluate the proposed approaches we solve the variational problem employing the various approaches for model correction for a set of samples generated from a test set that is different from the samples used for fitting the correction. We use the same Huber regularization functional and regularization parameter as discussed in the last paragraph.

First, we investigate the correction accuracy in terms of the alignment of the gradient of the data fidelity term with the accurate gradient A*(Ax_n − y) throughout the minimization of the variational functional in Figure 4. As a notion of alignment we consider

cos Φ_∇(x_n) = ⟨A*(Ax_n − y), (GΦ ∘ Ã*)(FΘ(Ãx_n) − y)⟩_X / (‖A*(Ax_n − y)‖_X ‖(GΦ ∘ Ã*)(FΘ(Ãx_n) − y)‖_X) (6.1)

in the case of the forward-adjoint method. For the forward only and AEM methods, the expression (GΦ ∘ Ã*)(FΘ(Ãx_n) − y) is replaced by the corresponding gradient of the corrected data fidelity term. Equation (6.1) is a slight deviation from (4.7) used in the theory section. This is to ensure good comparability with the baseline AEM and better interpretability by rescaling the alignment with the norm of the approximate gradient. This also makes different choices of regularization parameters more comparable. In the theory section we instead rescale with the norm of the accurate gradient only, making the proofs more straightforward.
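The alignment (6.1) can be tracked with a few lines of code; apply_A, apply_A_adj, and approx_gradient below are hypothetical wrappers for A, A*, and the corrected gradient under consideration.

```python
import tensorflow as tf

def gradient_alignment(x_n, y, apply_A, apply_A_adj, approx_gradient):
    """Cosine alignment (6.1) between the accurate gradient A*(A x_n - y) and a corrected one.

    approx_gradient(x_n, y) stands for, e.g., (G_Phi ∘ Ã*)(F_Theta(Ã x_n) - y);
    apply_A and apply_A_adj are assumed wrappers for A and A*.
    """
    g_acc = apply_A_adj(apply_A(x_n) - y)
    g_app = approx_gradient(x_n, y)
    inner = tf.reduce_sum(g_acc * g_app)
    return inner / (tf.norm(g_acc) * tf.norm(g_app) + 1e-12)   # small epsilon added for safety
```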

Figure 4.

Alignment (6.1) of the approximate gradient with the gradient of the accurate data term A*(Ax_n − y) for each approach on the ball test set of 64 samples. The alignment is recorded over all minimization steps of the associated variational problem. Left (a): full trajectory. Right (b): first 500 steps.

We note that all correction methods apart from the AEM approach start at a high alignment of > 0.8 at the first iterate. However, only the forward-adjoint based methods achieve an alignment of > 0.95 at the first iterate. Forward only approaches, which rely on fitting a correction in measurement space alone, are limited by the range of the adjoint Ã*, as discussed in section 4.

However, the alignment decreases rapidly over the minimization of the variational problem, dropping below 0 for the forward-adjoint method before the 200th iterate. The recursive versions of the forward and forward-adjoint methods, as discussed in section 5.1, are able to mitigate some of this shortcoming. While the alignment between the accurate gradient and the correction also declines throughout the minimization when recursive training is employed, the decline is significantly less steep and occurs at a later stage of the minimization. We also note that the alignment never drops below 0.2 for recursively trained corrections.

The benchmark AEM method is not able to correct the gradient as accurately as any of the discussed methods for the first iterates of the variational problem. However, it does not exhibit a decline of the alignment as drastic as the other methods throughout the minimization process. This can be explained by the lower expressive power of AEM compared to the neural network based corrections: it does not allow the method to fit the accurate gradient as well at early iterates, but it prevents overfitting at later iterates, leaving the method stable throughout the minimization of the variational functional.

The different behaviors of the forward and forward-adjoint methods as well as their recursive counterparts are investigated in Figure 5. In terms of the forward approximation error, recursive training makes the key difference in keeping a low error throughout gradient descent. For the adjoint approximation error, methods based on the forward scheme that fit a single operator are not able to achieve a low error even at the first iterate, due to the fundamental limitations of the method. Forward-adjoint methods, on the other hand, are able to fit the accurate adjoint well at the first iterates, but also suffer from deteriorating approximation quality at later steps.

Figure 5.

Approximation error of the model correction compared to the accurate operator on the ball test set of 64 samples, tracked throughout the first 300 steps of the gradient descent scheme. Left (a): relative error of the forward approximation as defined in (5.3). Right (b): relative error of the adjoint, as defined for the forward only method in (5.2) and for the forward-adjoint method in (5.4).

In Figure 6 we show the evolution of the data term ‖Ax_n − y‖_Y, evaluated using the accurate operator A, in order to test whether the corrections minimize the original variational problem. Both recursive methods minimize the data term quickly and converge stably to their respective minimal values. This empirical observation indicates that the learned corrections lead to a variational energy satisfying the assumptions of Lemma 4.4, which ensures closeness of the minimizers. The forward-adjoint recursive method achieves a lower data loss than its forward only counterpart, consistent with the behavior observed in Figure 4. It is interesting to note that both methods minimize the accurate data term significantly better than the AEM baseline. Without recursive training, neither the forward only nor the forward-adjoint algorithm is able to minimize the accurate data term well.

Figure 6.

True data term ‖Ax_n − y‖_Y evaluated for all methods on the ball test set of 64 samples, tracked throughout the gradient descent scheme.

Finally, we evaluate the model correction in terms of the distance of the reconstruction to the ground-truth image, measured by the relative L2 error shown in Figure 7. All approximation approaches outperform the uncorrected operator in this metric. Both corrections, forward and forward-adjoint, without recursive training lead to a decrease in reconstruction error for the first 300 optimization steps, stagnating or even deteriorating afterwards. This is again consistent with the findings in Figure 4, which show that the gradient generated by these methods no longer aligns with the accurate gradient at this point of the minimization. The recursive counterparts of the forward and forward-adjoint methods produce considerably better results, with the recursive forward-adjoint method generating reconstructions that are nearly of the same quality as those obtained with the accurate operator. The AEM baseline converges more slowly than any of the other methods, but after 4000 gradient descent steps it produces high-quality results on par with the forward recursive method; it is, however, significantly outperformed by the recursive forward-adjoint method.
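
For completeness, the relative L2 error used in Figures 7 and 11 can be computed as below; the normalization by the norm of the ground truth is our assumption of the standard definition.

```python
import numpy as np


def relative_l2_error(x_rec, x_true):
    """Relative L2 reconstruction error with respect to the ground truth."""
    return np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)
```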

Figure 7.

Relative reconstruction error (L2) for all methods on the ball test set of 64 samples, tracked throughout the gradient descent scheme.

For a qualitative evaluation, we show reconstructions obtained with all discussed methods in Figure 8 for two samples with different behavior. In the first example, where the ball is close to the line detector, all methods are able to correct the errors introduced by the approximate operator to some extent. However, both the forward and the forward-adjoint method introduce background artefacts when not trained recursively. These artefacts disappear when recursive training is applied, leading to near-perfect reconstructions. The AEM baseline corrects the approximate operator without introducing background artefacts, but it blurs the edges of the ball, an effect not observed for any of the neural network based corrections under investigation. The second sample is considerably more challenging: the ball is far from the detector and exhibits stronger limited view artefacts, and consequently the approximate operator introduces severe artefacts if uncorrected. For the corrections without recursive training we see again that both approaches, forward and forward-adjoint, introduce background artefacts. For the forward method, these artefacts cannot be suppressed by recursive training, leaving a severe artefact at the boundary of the domain. Only the recursive forward-adjoint method produces a reconstruction that is nearly on par with the reconstruction obtained with the accurate operator and does not exhibit any obvious artefacts. The AEM baseline also introduces background artefacts leaking from the ball, but these are more structured and less severe than those of all other methods apart from the forward-adjoint recursive approach, which gives the best visual results in this setting as well. The visual quality of the reconstructions hence coincides with the quantitative results discussed in Figure 7.

Figure 8.

Reconstructions for the various model correction algorithms for two samples from the ball set. We show the results after 4000 steps of gradient descent; Huber regularization is used. Top (a): phantom close to the detector, which corresponds to an easy setting for limited view PAT. Bottom (b): phantom far from the detector, which corresponds to a very challenging setting.

Figure 9 visualizes the effect of the forward-adjoint recursive approach on the ball images, showing Ax_0, Ãx_0, and A_Θ(x_0) as well as the gradients of the data term for each of the operators A, Ã, and A_Θ. The visualizations are computed for sample (b) of Figure 8. We see that the forward-adjoint approach is indeed able to correct for approximation artefacts both in the forward operator and in its adjoint, leading to a good approximation of the accurate gradient of the data term.

Figure 9.

Estimated measurements and gradients at initialization of the gradient descent scheme for a sample from the ball images.

Vessel phantoms

The results on the vessel phantoms quantitatively match the overall behavior observed on the ball set. The alignment, shown in Figure 10, is again high initially, with forward-adjoint methods achieving higher values than forward only methods. If no recursive training is applied, the alignment declines very quickly. AEM again generates gradients with comparatively low initial alignment that, however, stays relatively steady throughout solving the variational problem. We note that the overall alignment is significantly lower than in the case of the ball samples, reflecting the additional difficulty of the vessel set.

Figure 10.

Alignment (6.1) of the approximate gradient with the gradient of the accurate data term A*(Ax_n − y) for each method on the vessel test set with 64 samples, recorded over the 250 steps of solving the associated variational problem.

The relative error of the reconstructions compared to the ground truth is shown in Figure 11. We again see that both the forward and forward-adjoint methods fail to improve reconstruction quality beyond an early stage of the minimization process if recursive training is omitted. When recursive training is applied, both methods lead to a clear improvement over the uncorrected operator, with the forward-adjoint approach again performing considerably better than the forward only one. On the vessel samples we note, however, a considerably larger gap between the forward-adjoint correction and the accurate operator, caused by the extremely challenging nature of the vessel set. The AEM baseline converges slowly on the vessels, an indication that the estimated covariance matrix is fairly ill-conditioned. We hence additionally report the reconstruction quality at convergence, which we observed after 20000 steps of gradient descent. While this yields a competitive reconstruction, it is still slightly outperformed by the recursive forward-adjoint method. We remark that we applied early stopping for all other methods on the vessel samples.

Figure 11.

Relative reconstruction error (L2) for all methods on the vessel test set with 64 samples, tracked throughout the gradient descent scheme. 250 steps of gradient descent were performed for all methods but AEM, for which 20000 steps were taken.

We present reconstructions for all discussed methods for two samples in Figure 12. For the first sample we note that the vessel structure at the right of the image completely disappears when using the uncorrected approximation. In fact, the corresponding measurement is severely reduced due to the thresholding of incident waves in the approximate model. Hence, no correction method is able to fully recover the vessel structure at the right of the first sample, with AEM, the forward method, and the forward-adjoint method coming closest. For all correction methods we observe a deterioration in reconstruction quality compared to the accurate operator. We note that the recursive forward method seems to lead to striping artefacts. Consistent with the quantitative results in Figure 11, the forward-adjoint recursive reconstructions are of the highest visual quality among the reconstructions using a model correction, leading to sharper results than the AEM baseline and to fewer artefacts than the methods based on the forward only approach or those omitting recursive training. We remark that, to some extent, perceived differences in smoothness can also arise because the regularization parameter was optimized for all methods individually and hence might differ slightly between reconstructions.

Figure 12.

Reconstructions on vessels using the various operator corrections. We show the results after 250 iterations of gradient descent for all methods but AEM, for which 20000 iteration steps were taken. Huber regularization is used.

In this context, we note that the training set, with a total of 2760 samples (5520 with rotations), is fairly small when taking into account the complexity of the vessel structures; see, for instance, the discussion with respect to AEM in [35]. It is hence possible that the remaining gap in reconstruction quality to the accurate operator could be closed further by using a more extensive training set. However, we expect that the gap cannot be closed completely on samples with a complexity comparable to the vessel phantoms, as too much information is lost in the thresholding step of the approximate operator to be recovered even by highly parameterized learned corrections that exploit the structure of the samples. This underlines the necessity of a statistical correction as discussed throughout section 2 to compensate for lost kernel directions in the approximate operator.

Model transfer between vessel and ball phantoms

In this paragraph we investigate how well the operator corrections trained on either the ball or the vessel samples generalize to the other of the two data sets. In particular, we discuss using models trained on balls to reconstruct vessels and vice versa. The aim of these experiments is to obtain a first understanding of how well trained model corrections generalize to new data sets in general, especially if the new set is very different from the training data in terms of image characteristics.

When using models trained on the ball samples to reconstruct vessel images, we notice that the correction is reasonable at the initialization of the variational scheme, yielding corrected gradients. Nevertheless, the correction quality deteriorates rapidly during the gradient descent steps, and the final reconstruction is not satisfactory compared to reconstructions obtained with the uncorrected approximate operator Ã. We hypothesize that the ball data are too distinct from the vessel samples and that their structure is too simple for the learned model to perform reasonably on the much more complicated vessel data. In particular, the learned corrections are potentially fit very tightly to data and measurements induced by the ball phantoms, which do not contain the same level of complexity as the vessel phantoms. Heuristically speaking, the data manifold of the ball samples seems to be too low dimensional to generalize to other data.

On the other hand, when using the forward-adjoint recursive model trained on the vessel samples on the ball samples, we obtain results that are clear improvements over reconstructions obtained with the uncorrected operator and are even comparable to the nonrecursively trained methods on the ball data. We do not, however, match the performance of the forward-adjoint recursive model trained on the ball samples themselves. Figure 13 shows reconstructions of a ball sample for various methods trained on the vessel samples. The reconstructions show a well-localized ball with fairly sharp edges even in the challenging case of the ball located far from the detector plate. The results can be compared to those obtained with methods trained on the ball samples, shown in Figure 8. The visual assessment of reconstruction quality matches the quantitative results in terms of L2 error shown in Table 2.

Figure 13.

Models trained on vessel samples, evaluated on ball samples. From left to right: ground-truth image, reconstruction using the uncorrected operator, reconstruction using a recursive forward-adjoint correction with the same TV parameter as used on vessel data, reconstruction using a recursive forward-adjoint correction with new optimal TV parameter.

Table 2.

Performance of the recursive forward-adjoint correction on ball samples. We evaluate the performance of models trained on vessel samples and compare to models trained on ball samples. Results are reported in terms of the L2 error compared to the ground-truth image.

Method                       Training data   L2 error
Accurate operator            -               0.11
Approximate operator         -               0.55
Forward-adjoint              balls           0.15
For.-adj. (old TV param.)    vessels         0.40
For.-adj. (new TV param.)    vessels         0.35

Finally, we note in both Figure 13 and Table 2 that adapting the regularization parameter λ of the forward-adjoint correction trained on vessel samples to a new optimal value for the ball data yields considerable improvements in performance. This demonstrates one of the main advantages of explicit corrections over their implicit counterparts: separating model correction and regularization allows the regularization parameter to be adapted to the task, independently of the learned model correction.

7. Conclusion

In this paper, we have introduced various approaches to learn a data-driven explicit model correction for inverse problems to be employed within a variational reconstruction framework. We have investigated several strategies to learn such a correction, starting with a simple forward correction for which we pointed out some fundamental limitations. In particular, we observed that this approach is limited by the range of the adjoint of the approximate operator when employed in a gradient descent scheme and is therefore unable to fully correct all modeling errors. To mitigate this, we have proposed a forward-adjoint correction as an alternative approach, overcoming these limitations by fitting an independent adjoint correction.

To ensure a model correction that can be employed throughout the optimization process and avoid overfitting the initial reconstruction, we proposed to augment all methods with a recursive training scheme. For the recursive forward-adjoint correction we provided a theoretical convergence analysis to show that the method approximates the accurate solution when trained to a sufficiently low loss. Finally, we have shown the potential of our approach on the task of limited view PAT, demonstrating our theoretical considerations in practice and showing improved results compared to the commonly used AEM.

For the data chosen, the algorithm can be trained very quickly, requiring 12h for the nonrecursive experiments and around 16h for their recursive counterparts. For images larger than the 64 × 64 format used in this paper, the number of operations scales linearly with the number of pixels and hence quadratically with resolution in 2 dimensions and cubically in 3 dimensions. The actual increase in computational time might scale more slowly than the increase in operations, as a larger number of operations per layer increases the potential for parallelization. The number of network parameters, however, does not necessarily change with resolution. Higher resolutions might make a deeper architecture appropriate, but the increase in weights caused by this would typically be strongly sublinear.

This work is orthogonal to previous attempts at using neural networks to learn operator corrections, which were exclusively focused on implicit model corrections, learning the correction operator and a reconstruction prior simultaneously in an end-to-end trained reconstruction network. While that approach comes with advantages in terms of performance, our explicit model correction allows us to flexibly use any prior model alongside the corrected operator and can be integrated in the well-established framework of variational regularization. Furthermore, our work unveils some of the challenges in model correction that are hidden in implicit schemes. Our findings can inspire the design of novel implicit algorithms and allow for an analysis of implicit correction in future studies. In particular, our observations on the limitations imposed by the range of the adjoint of the approximation motivate corrections in both reconstruction and data space for implicit model correction, as realized, for instance, in algorithms such as LPD [2].

In future work one could apply the proposed method to different fields of application, such as CT. In this application, the accurate model can be obtained by expensive photon-level Monte Carlo simulations, whereas a computationally efficient approximation is given by the widely used ray transform. In general, applications to inverse problems involving nonlinear operators are an interesting direction deserving further study; we refer to a related study exploring first ideas in this direction [40]. A class of very challenging applications are settings where we do not have explicit access to the accurate forward operator, but instead have access to empirical measurements only. Examples of such problems are tomography with slightly incorrect estimated angles or deconvolution problems with errors in the point-spread function. These problems differ from the setting considered in this paper, where explicit access to the accurate operator was given and the approximation was introduced to overcome computational constraints. In particular, the concept of recursive training, as presented here, requires explicit access to the accurate operator and is thus not readily applicable to problems where only empirical measurements are available, making them particularly challenging. We believe that in such settings alternative training regimes will be needed that are not fully supervised and make use of secondary measures, estimating the approximation error from the data itself.

Finally, we mention a possible combination of the proposed approach with AEM techniques. Since the latter, after training, yields a multivariate normal distribution as an estimate of the distribution of model errors, it becomes increasingly unreliable as the non-Gaussianity of the accurate distribution increases. However, after an initial nonlinear correction of the form A_Φ described here, the AEM could be reestimated using such a model. Commensurately, the estimated statistics of the model error from the AEM could be used in place of the simple L2-loss used in the training in (5.2) and (5.3), for example (i.e., the norm implied in the space Y). A possible future research direction could therefore be to iterate these approaches with a view to obtaining a more accurate probabilistic estimate of the eventually remaining model errors.

Supplementary Material

Appendix

Acknowledgments

The authors acknowledge helpful discussions with Jonas Adler, Jari Kaipio, Yury Korolev, Ozan Öktem, and Peter Maass, amongst others.

Funding

The work of the authors was partially supported by the Academy of Finland projects 312123, 312342 (Finnish Centre of Excellence in Inverse Modeling and Imaging, 2018-2025), 334817, 314411, the Jane and Aatos Erkko Foundation, the British Heart Foundation grant NH/18/1/33511, the CMIC-EPSRC platform grant (EP/M020533/1), and the EPSRC-Wellcome grant WT101957. The work of the first author was supported by the EPSRC grant EP/L016516/1 for the University of Cambridge Centre for Doctoral Training, the Cambridge Centre for Analysis, and the Cantab Capital Institute for the Mathematics of Information. The work of the fourth author was supported by the Leverhulme Trust project “Breaking the non-convexity barrier,” the Philip Leverhulme Prize, the EPSRC grants EP/S026045/1, EP/T003553/1, the EPSRC Centre grant EP/N014588/1, the Wellcome Innovator Award RG98755, the RISE projects CHiPS and NoMADS, the Cantab Capital Institute for the Mathematics of Information, and the Alan Turing Institute. The work of the fifth author was supported by the EPSRC grants EP/N022750/1, EP/T000864/1.

Contributor Information

Sebastian Lunz, Email: lunz@math.cam.ac.uk.

Andreas Hauptmann, Email: Andreas.Hauptmann@oulu.fi.

Tanja Tarvainen, Email: tanja.tarvainen@uef.fi.

Carola-Bibiane Schönlieb, Email: cbs31@cam.ac.uk.

Simon Arridge, Email: s.arridge@ucl.ac.uk.

References

  • [1] Adler J, Öktem O. Solving ill-posed inverse problems using iterative deep neural networks. Inverse Problems. 2017;33:124007.
  • [2] Adler J, Öktem O. Learned primal-dual reconstruction. IEEE Trans Med Imaging. 2018;37:1322–1332. doi:10.1109/TMI.2018.2799231.
  • [3] Antholzer S, Haltmeier M, Schwab J. Deep learning for photoacoustic tomography from sparse data. Inverse Probl Sci Eng. 2019;27:987–1005. doi:10.1080/17415977.2018.1518444.
  • [4] Arridge S, Kaipio J, Kolehmainen V, Schweiger M, Somersalo E, Tarvainen T, Vauhkonen M. Approximation errors and model reduction with an application in optical diffusion tomography. Inverse Problems. 2006;22:175–195.
  • [5] Arridge S, Maass P, Öktem O, Schönlieb C-B. Solving inverse problems using data-driven models. Acta Numer. 2019;28:1–174.
  • [6] Beard P. Biomedical photoacoustic imaging. Interface Focus. 2011;1:602–631. doi:10.1098/rsfs.2011.0028.
  • [7] Boink YE, Brune C. Learned SVD: Solving inverse problems via hybrid autoencoding. 2019. Preprint, https://arxiv.org/abs/1912.10840.
  • [8] Borcea L, Druskin V, Mamonov A, Moskow S, Zaslavsky M. Reduced order models for spectral domain inversion: Embedding into the continuous problem and generation of internal data. Inverse Problems. 2020;36:055010.
  • [9] Bubba TA, Kutyniok G, Lassas M, Maerz M, Samek W, Siltanen S, Srinivasan V. Learning the invisible: A hybrid deep learning-shearlet framework for limited angle computed tomography. Inverse Problems. 2019;35:064002.
  • [10] Burger M, Korolev Y, Rasch J. Convergence rates and structure of solutions of inverse problems with imperfect forward models. Inverse Problems. 2019;35:024006.
  • [11] Cox B, Beard P. Fast calculation of pulsed photoacoustic fields in fluids using k-space methods. J Acoust Soc Amer. 2005;117:3616–3627. doi:10.1121/1.1920227.
  • [12] Davoudi N, Deán-Ben XL, Razansky D. Deep learning optoacoustic tomography with sparse data. Nature Mach Intell. 2019;1:453–460.
  • [13] Egger H, Pietschmann J-F, Schlottbom M. Identification of chemotaxis models with volume filling. SIAM J Appl Math. 2015;75:275–288.
  • [14] Freund RW. Model reduction methods based on Krylov subspaces. Acta Numer. 2003;12:267–319.
  • [15] Guan S, Khan A, Sikdar S, Chitnis P. Fully dense UNet for 2D sparse photoacoustic tomography artifact removal. IEEE J Biomed Health Inform. 2020;24:568–576. doi:10.1109/JBHI.2019.2912935.
  • [16] Hamilton SJ, Hauptmann A. Deep D-bar: Real-time electrical impedance tomography imaging with deep neural networks. IEEE Trans Med Imaging. 2018;37:2367–2377. doi:10.1109/TMI.2018.2828303.
  • [17] Hammernik K, Klatzer T, Kobler E, Recht MP, Sodickson DK, Pock T, Knoll F. Learning a variational network for reconstruction of accelerated MRI data. Magn Reson Med. 2018;79:3055–3071. doi:10.1002/mrm.26977.
  • [18] Hauptmann A, Adler J, Arridge S, Öktem O. Multi-scale learned iterative reconstruction. IEEE Trans Comput Imaging, to appear. doi:10.1109/TCI.2020.2990299.
  • [19] Hauptmann A, Arridge S, Lucka F, Muthurangu V, Steeden JA. Real-time cardiovascular MR with spatio-temporal artifact suppression using deep learning-proof of concept in congenital heart disease. Magn Reson Med. 2019;81:1143–1156. doi:10.1002/mrm.27480.
  • [20] Hauptmann A, Cox B, Lucka F, Huynh N, Betcke M, Beard P, Arridge S. Approximate k-space models and deep learning for fast photoacoustic reconstruction. International Workshop on Machine Learning for Medical Image Reconstruction; Springer, Cham, Switzerland; 2018. pp. 103–111.
  • [21] Hauptmann A, Lucka F, Betcke M, Huynh N, Adler J, Cox B, Beard P, Ourselin S, Arridge S. Model based learning for accelerated, limited-view 3D photoacoustic tomography. IEEE Trans Med Imaging. 2018;39:1382–1393. doi:10.1109/TMI.2018.2820382.
  • [22] Jin KH, McCann MT, Froustey E, Unser M. Deep convolutional neural network for inverse problems in imaging. IEEE Trans Image Process. 2017;26:4509–4522. doi:10.1109/TIP.2017.2713099.
  • [23] Kaipio J, Somersalo E. Statistical and Computational Inverse Problems. Appl Math Sci 160. Springer; New York; 2005.
  • [24] Kaipio J, Somersalo E. Statistical inverse problems: Discretization, model reduction and inverse crimes. J Comput Appl Math. 2007;198:493–504.
  • [25] Kang E, Min J, Ye JC. A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction. Med Phys. 2017;44:e360–e375. doi:10.1002/mp.12344.
  • [26] Kennedy MC, O'Hagan A. Bayesian calibration of computer models. J R Stat Soc Ser B Stat Methodol. 2001;63:425–464.
  • [27] Kobler E, Effland A, Kunisch K, Pock T. Total deep variation for linear inverse problems. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE, Piscataway, NJ; pp. 7549–7558.
  • [28] Koestli KP, Frenz M, Bebie H, Weber HP. Temporal backward projection of optoacoustic pressure transients using Fourier transform methods. Phys Med Biol. 2001;46:1863–1872. doi:10.1088/0031-9155/46/7/309.
  • [29] Korolev Y, Lellmann J. Image reconstruction with imperfect forward models and applications in deblurring. SIAM J Imaging Sci. 2018;11:197–218.
  • [30] Kuchment P, Kunyansky L. Mathematics of thermoacoustic tomography. European J Appl Math. 2008;19:191–224.
  • [31] Li H, Schwab J, Antholzer S, Haltmeier M. NETT: Solving inverse problems with deep neural networks. Inverse Problems. 2020;36:065005.
  • [32] Lorz A, Pietschmann J-F, Schlottbom M. Parameter identification in a structured population model. Inverse Problems. 2019;35:095008.
  • [33] Lunz S, Öktem O, Schönlieb C-B. Adversarial regularizers in inverse problems. Advances in Neural Information Processing Systems; Curran Associates, Red Hook, NY; pp. 8507–8516.
  • [34] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer, Cham, Switzerland; 2015. pp. 234–241.
  • [35] Sahlström T, Pulkkinen A, Tick J, Leskinen J, Tarvainen T. Modeling of errors due to uncertainties in ultrasound sensor locations in photoacoustic tomography. IEEE Trans Med Imaging. 2020;39:2140–2150. doi:10.1109/TMI.2020.2966297.
  • [36] Schlemper J, Caballero J, Hajnal JV, Price AN, Rueckert D. A deep cascade of convolutional neural networks for dynamic MR image reconstruction. IEEE Trans Med Imaging. 2017;37:491–503. doi:10.1109/TMI.2017.2760978.
  • [37] Schwab J, Antholzer S, Haltmeier M. Deep null space learning for inverse problems: Convergence analysis and rates. Inverse Problems. 2019;35:025008.
  • [38] Siewerdsen JH, Jaffray DA. Cone-beam computed tomography with a flat-panel imager: Magnitude and effects of X-ray scatter. Med Phys. 2001;28:220–231. doi:10.1118/1.1339879.
  • [39] Smyl D, Liu D. Less is often more: Applied inverse problems using hp-forward models. J Comput Phys. 2019;399:108949.
  • [40] Smyl D, Tallman TN, Black JA, Hauptmann A, Liu D. Learning and correcting non-Gaussian model errors. 2020. Preprint, https://arxiv.org/abs/2005.14592.
  • [41] Tarvainen T, Pulkkinen A, Cox BT, Kaipio JP, Arridge SR. Bayesian image reconstruction in quantitative photoacoustic tomography. IEEE Trans Med Imaging. 2013;32:2287–2298. doi:10.1109/TMI.2013.2280281.
  • [42] Treeby BE, Cox BT. k-Wave: MATLAB toolbox for the simulation and reconstruction of photoacoustic wave fields. J Biomed Opt. 2010;15:021314. doi:10.1117/1.3360308.
  • [43] Treeby BE, Jaros J, Rendell AP, Cox BT. Modeling nonlinear ultrasound propagation in heterogeneous media with power law absorption using a k-space pseudospectral method. J Acoust Soc Amer. 2012;131:4324–4336. doi:10.1121/1.4712021.
  • [44] Vishnevskiy V, Rau R, Goksel O. Deep variational networks with exponential weighting for learning computed tomography. International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer, Cham, Switzerland; 2019. pp. 310–318.
  • [45] Xu Y, Wang LV, Ambartsoumian G, Kuchment P. Reconstructions in limited-view thermoacoustic tomography. Med Phys. 2004;31:724–733. doi:10.1118/1.1644531.
  • [46] Zhang E, Laufer J, Beard P. Backward-mode multiwavelength photoacoustic scanner using a planar Fabry-Perot polymer film ultrasound sensor for high-resolution three-dimensional imaging of biological tissues. Appl Opt. 2008;47:561–577. doi:10.1364/ao.47.000561.
  • [47] Zhu L, Xie Y, Wang J, Xing L. Scatter correction for cone-beam CT in radiation therapy. Med Phys. 2009;36:2258–2268. doi:10.1118/1.3130047.
