An Inexact Newton–Krylov Algorithm for Constrained Diffeomorphic Image Registration

Andreas Mang; George Biros

doi:10.1137/140984002

Author manuscript; available in PMC: 2016 Sep 7.

Published in final edited form as: SIAM J Imaging Sci. 2015 May 5;8(2):1030–1069. doi: 10.1137/140984002

PMCID: PMC5014413 NIHMSID: NIHMS776666 PMID: 27617052

Abstract

We propose numerical algorithms for solving large deformation diffeomorphic image registration problems. We formulate the nonrigid image registration problem as a problem of optimal control. This leads to an infinite-dimensional partial differential equation (PDE) constrained optimization problem. The PDE constraint consists, in its simplest form, of a hyperbolic transport equation for the evolution of the image intensity. The control variable is the velocity field. Tikhonov regularization on the control ensures well-posedness. We consider standard smoothness regularization based on H¹- or H²-seminorms. We augment this regularization scheme with a constraint on the divergence of the velocity field (control variable) rendering the deformation incompressible (Stokes regularization scheme) and thus ensuring that the determinant of the deformation gradient is equal to one, up to the numerical error. We use a Fourier pseudospectral discretization in space and a Chebyshev pseudospectral discretization in time. The latter allows us to reduce the number of unknowns and enables the time-adaptive inversion for nonstationary velocity fields. We use a preconditioned, globalized, matrix-free, inexact Newton–Krylov method for numerical optimization. A parameter continuation is designed to estimate an optimal regularization parameter. Regularity is ensured by controlling the geometric properties of the deformation field. Overall, we arrive at a black-box solver that exploits computational tools that are precisely tailored for solving the optimality system. We study spectral properties of the Hessian, grid convergence, numerical accuracy, computational efficiency, and deformation regularity of our scheme. We compare the designed Newton–Krylov methods with a globalized Picard method (preconditioned gradient descent). We study the influence of a varying number of unknowns in time. 
The reported results demonstrate excellent numerical accuracy, guaranteed local deformation regularity, and computational efficiency with an optional control on local mass conservation. The Newton–Krylov methods clearly outperform the Picard method if high accuracy of the inversion is required. Our method provides equally good results for stationary and nonstationary velocity fields for two-image registration problems.

Keywords: large deformation diffeomorphic image registration, optimal control, PDE constrained optimization, Stokes regularization, Newton–Krylov method, pseudospectral Galerkin method, Stokes solver

1. Introduction and motivation

Image registration has become a key area of research in computer vision and (medical) image analysis [54, 65]. The task is to establish spatial correspondence between two images, m_R : Ω̄ → R, x ↦ m_R(x), and m_T : Ω̄ → R, x ↦ m_T(x), with compact support on some domain Ω := (−π, π)^d ⊂ R^d, via a mapping y : R^d → R^d, x ↦ y(x), such that m_T ∘ y ≈ m_R [22, 27], where ∘ denotes function composition. Here, m_T is referred to as the template image (the image to be registered), m_R is referred to as the reference image (the fixed image), and d ∈ {2, 3} is the data dimensionality. We limit ourselves to nonrigid image registration. The search for y is typically formulated as a variational optimization problem [27, 54],

\min_{y \in \mathcal{Y}} \left\{ \mathcal{J}[y] := \frac{1}{2} \left\| m_R - m_T \circ y \right\|_{L^2(\Omega)}^2 + \frac{\beta}{2} \, \mathcal{S}[y] \right\}.

(1.1)

The proximity between m_R and m_T ∘ y is measured on the basis of an L²-distance (other measures can be considered [54, 65]). The functional 𝒮 in (1.1) is a regularization model that is introduced to overcome ill-posedness. The regularization parameter β > 0 weights the contribution of 𝒮. Various regularization models 𝒮 have been considered (see [13, 14, 19, 22, 25, 26, 28, 42, 63] for examples).

A key requirement in many image registration problems is that the mapping y is a diffeomorphism [4, 14, 23, 68, 69, 70]. This translates into the necessary condition det(∇y) > 0, where ∇y ∈ R^{d×d} is the deformation gradient (also referred to as the Jacobian matrix). An intuitive approach is to explicitly safeguard against nondiffeomorphic mappings y by adding a constraint to (1.1) [39, 48, 61]. Another strategy is to perform the optimization on the manifold of diffeomorphic mappings [2, 3, 4, 44, 52, 53, 69, 70]. The latter models, in general, do not control geometric properties of the deformation field and may result in fields that are close to being nondiffeomorphic. Further, for certain image registration problems, restricting the search space to the manifold of diffeomorphisms does not necessarily guarantee that y is physically meaningful. Some applications may benefit from extending these types of models by introducing additional constraints. One example of such a constraint is incompressibility (i.e., enforcing det(∇y) = 1; see also [10, 16, 52, 62]). Incompressibility is a requirement that might be of interest in medical image computing applications. If required, we can modify the incompressibility constraint to control the deviation of det(∇y) from one. This will be the topic of a follow-up paper, in which we will extend our formulation. Here, we focus on the algorithmic issues of incorporating the incompressibility constraint. Furthermore, we remark that our optimal control formulation can naturally be extended to account for more complex constraints on the velocity field, for example, constraints related to biophysical models (examples of such models can be found in [31, 40, 45, 51, 66]).

In what follows, we outline our method (see section 1.1), summarize the key contributions (see section 1.2), list the limitations of our method (see section 1.3), and relate our work to existing approaches (see section 1.4).

1.1. Outline of the method

We assume the images are compactly supported functions, defined on the open set Ω := (−π, π)^d ⊂ R^d with boundary ∂Ω and closure Ω̄ := Ω ∪ ∂Ω. The deformation is modeled in an Eulerian frame of reference. We introduce a pseudotime variable t > 0 and solve on a unit time horizon for a velocity field v : Ω̄ × [0, 1] → R^d, (x, t) ↦ v(x, t), as follows:

\min_{v \in \mathcal{V}} \left\{ \mathcal{J}[v] := \frac{1}{2} \left\| m_R - m_1 \right\|_{L^2(\Omega)}^2 + \int_0^1 \mathcal{S}[v] \, \mathrm{d}t \quad \text{subject to} \quad \mathcal{C}[m, v] = 0 \right\},

(1.2a)

where

\mathcal{C}[m, v] := \begin{cases} \partial_t m + \nabla m \cdot v & \text{in } \Omega \times (0, 1], \\ m - m_T & \text{in } \Omega \times \{0\}, \\ \gamma \, (\nabla \cdot v) & \text{in } \Omega \times [0, 1], \end{cases}

(1.2b)

with periodic boundary conditions on ∂Ω. The parameter γ ∈ {0, 1} in (1.2b) enables or disables the divergence constraint (third equation) on the control v. In PDE constrained optimization theory, m is referred to as the state variable and v as the control variable. The first equation in (1.2b), in combination with its initial condition (second equation), models the flow of m_T subject to v, where m : Ω̄ × [0, 1] → R, (x, t) ↦ m(x, t), represents the transported intensities of m_T. Accordingly, the final state m₁ := m(·, 1), m₁ : Ω̄ → R, x ↦ m₁(x), corresponds to m_T ∘ y in (1.1). We measure the proximity between the deformed template image m₁ and the reference image m_R in terms of an L²-distance. Once we have found v, we can compute y from v as a postprocessing step (this is also true for the deformation gradient ∇y; see Appendix D for details). The third equation in (1.2b) constrains the divergence of v and guarantees that the flow is incompressible (Stokes flow), i.e., that volume is conserved. This is equivalent to enforcing det(∇y) = 1 (see [33, p. 77ff.]).

We use either an H¹- or an H²-seminorm for the smoothness regularization 𝒮 (resulting in Laplacian or biharmonic vector operators, respectively; see section 3.1). We report results for standard H¹- and H²-regularization (neglecting ∇·v = 0, i.e., γ = 0 in (1.2b)) and for a Stokes regularization scheme (incompressible flow; enforcing ∇ ·v = 0, i.e., γ = 1 in (1.2b)).

In section 4 we will see that the optimality condition for (1.2) is a system of space-time nonlinear multicomponent PDEs for the transported image m, the velocity v, and the adjoint variables for the transport and the divergence condition. Efficiently solving this system is a significant challenge. For example, when we include the incompressibility constraint, the equation for the velocity ends up being a linear Stokes equation.

We solve the first-order optimality conditions using a globalized, matrix-free, preconditioned Newton–Krylov method for the Schur complement of the velocity v (a linearized Stokes problem driven by the image mismatch). We first derive the optimality conditions and then discretize using a pseudospectral discretization in space with a Fourier basis. We use a second-order Runge–Kutta method in time. The preconditioner for the Newton–Krylov schemes is based on the exact spectral inverse of the second variation of 𝒮.

1.2. Contributions

The fundamental contributions are as follows:

We design a numerical scheme for (1.2) with the following key features:
- We use an adjoint-based Newton-type solver.
- We provide a fast Stokes solver.
- We introduce a spectral Galerkin method in time.
- We design a parameter continuation method for automatically selecting the regularization parameter β.
- Our framework guarantees deformation regularity.¹
We provide a numerical study for the designed framework. We compare a globalized Picard method (preconditioned gradient descent) to an inexact Newton–Krylov and a Gauss–Newton–Krylov method. We report results for synthetic and real-world problems. We study spectral properties, grid convergence, and numerical accuracy of the proposed scheme. We study the effects of compressible (plain H¹- and H²-regularization) and incompressible (Stokes regularization) deformation models. We report results for a varying number of unknowns in time (i.e., inverting for stationary and nonstationary velocity fields).

The numerical discretization (pseudospectral) allows for an efficient solution of the Stokes-like equations by eliminating the pressure (i.e., the adjoint variable for the incompressibility constraint ∇ ·v = 0 in (1.2b)).
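To illustrate why a Fourier discretization makes the divergence constraint cheap to handle, the following minimal NumPy sketch applies the Leray (Helmholtz) projection, a standard way of removing the gradient (pressure) part of a periodic vector field by a pointwise operation per wave number. This is an assumption for illustration, not the authors' implementation: the function names `wavenumbers`, `leray_project`, and `spectral_divergence` are hypothetical, the example is restricted to d = 2 on (−π, π)², and the Nyquist wave number is set to zero for first derivatives so that real fields stay real.

```python
import numpy as np


def wavenumbers(n):
    """Integer wave numbers for n nodes on a 2*pi-periodic interval.

    The Nyquist mode is set to zero, a common choice for first derivatives
    that keeps real fields real.
    """
    k = np.fft.fftfreq(n, d=1.0 / n)
    if n % 2 == 0:
        k[n // 2] = 0.0
    return k


def leray_project(v):
    """Project a periodic 2D field v (shape (2, n1, n2)) onto divergence-free fields."""
    _, n1, n2 = v.shape
    k1, k2 = np.meshgrid(wavenumbers(n1), wavenumbers(n2), indexing="ij")
    k_sq = k1**2 + k2**2
    k_sq[k_sq == 0] = 1.0  # avoid 0/0; these modes have k = 0, so k . v_hat = 0 anyway
    v1_hat, v2_hat = np.fft.fft2(v[0]), np.fft.fft2(v[1])
    # Remove the gradient part: v_hat <- v_hat - k (k . v_hat) / |k|^2
    k_dot_v = (k1 * v1_hat + k2 * v2_hat) / k_sq
    w1 = np.real(np.fft.ifft2(v1_hat - k1 * k_dot_v))
    w2 = np.real(np.fft.ifft2(v2_hat - k2 * k_dot_v))
    return np.stack([w1, w2])


def spectral_divergence(v):
    """Spectral divergence of a periodic 2D field, using the same wave numbers."""
    _, n1, n2 = v.shape
    k1, k2 = np.meshgrid(wavenumbers(n1), wavenumbers(n2), indexing="ij")
    div_hat = 1j * k1 * np.fft.fft2(v[0]) + 1j * k2 * np.fft.fft2(v[1])
    return np.real(np.fft.ifft2(div_hat))
```

Applied to a random 32 × 32 field, the projected field has a spectral divergence at round-off level, and projecting it a second time leaves it unchanged. Since the projection is diagonal in Fourier space, its cost is dominated by the FFTs.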

The inf-sup condition for pressure spaces [12, p. 200ff.] is not an issue with our scheme.² In fact, for smooth images, our scheme is spectrally accurate in space and second-order accurate in time. We will see that we can numerically enforce incompressibility up to almost machine precision. Also, our scheme allows for efficient preconditioning of the Hessian: at the cost of a diagonal scaling we obtain a problem with a bounded condition number.

Overall, we demonstrate that the designed framework (i) is efficient and accurate, (ii) features a precise control of the deformation regularity, and (iii) does not require manual tuning of parameters.

1.3. Limitations

The main limitations of our method are as follows:

The considered model assumes a constant illumination of m_R and m_T (a consequence of the transport equation and the L²-distance in (1.2)). Therefore, it is (in its current state) not directly applicable to multimodal registration problems. Nevertheless, let us remark that the L²-distance is commonly used in practice [4, 16, 28, 41, 44, 48, 57, 71].
The efficient use of a Fourier discretization for the PDEs requires periodic boundary conditions. If the images are not periodic, we artificially introduce periodic boundary conditions by mollification and zero padding.

1.4. Related work

Due to the vast body of literature, it is not possible to provide a comprehensive review of numerical methods for nonrigid image registration. Background on image registration formulations and numerics can be found in [27, 54, 65]. We limit the discussion to approaches that (i) model the deformation via a velocity field v, (ii) view image registration as a problem of optimal control, and/or (iii) constrain v to be divergence free (i.e., introduce a mass conservation equation as an additional constraint).

Fluid mechanical models have been introduced [18, 19] to overcome limitations of small deformation models [13, 25, 26]. The work in [18, 19] has been extended in [4, 23, 53, 68] using concepts from differential geometry. This class of approaches is referred to as large deformation diffeomorphic metric mapping (LDDMM). Under the assumption that v is adequately smooth, it is guaranteed that y is a diffeomorphism [23, 68]. The associated smoothness requirements are enforced by the regularization model 𝒮 (typically an H²-norm) [3, 4]. The optimization is performed on the space of nonstationary velocity fields [4]. To reduce the number of unknowns, it has been suggested to perform the optimization either on the space of stationary velocity fields [1, 2, 44] or with respect to an initial momentum that entirely defines the flow of the map y [3, 71, 74].

The idea of parameterizing a diffeomorphism y via a stationary velocity field [1] has also been introduced to the demons registration framework [52, 69, 70]. Here, optimization is performed in a sequential fashion, alternating between updates resulting from the distance measure (forcing term) and the application of a smoothing operator to regularize the problem (typically through Gaussian smoothing [67, 69, 70]). This scheme resembles the Picard scheme we discuss in our paper, but it is unclear how one couples it with line search or trust region techniques.

Approaches that more closely reflect an optimal control PDE constrained optimization formulation (1.2) are described in [10, 16, 41, 48, 49, 57, 62, 71]. The model in [16] is equivalent to (1.2). The model in [62] follows the traditional optical flow formulation [46]. The conceptual difference between our formulation and the latter is that in optical flow, the transport equation constraint appears in the objective (see, e.g., [46, 47, 62]) and is therefore only fulfilled approximately in an L² least squares sense. We treat it as a hard constraint instead. In [57] an optimal mass transport formulation is described, which is based on the Monge-Kantorovich problem. The formulations in [41, 71] do not account for incompressibility. The optical flow approach in [10] treats incompressibility as an L²-penalty. An optimal control formulation for a constant-in-time velocity field was proposed in [48, 49], in which the divergence of the velocity field is penalized along with smoothness constraints.

What sets our work apart are the numerical algorithms and the discretization scheme. Almost all existing efforts on large deformation diffeomorphic image registration that are closely related to our optimal control formulation exclusively use first-order information for numerical optimization [2, 4, 10, 16, 18, 19, 41, 44, 48, 49, 57, 62, 71, 74]. We use second-order information. The only work³ in the context of large deformation diffeomorphic image registration that to our knowledge uses second-order information is [3]. The model in [3] is based on the LDDMM framework [4, 23, 48, 49, 53, 44, 68]. As in [71], the inversion is performed with respect to an initial momentum. No additional constraints on v are considered. Another difference is that we use a Galerkin method in time to reduce the number of unknowns. This allows us to invert for stationary [1, 2, 44, 52, 69, 70] as well as time-varying [4, 10, 16, 41] velocity fields. Nothing changes in our formulation other than the number of unknowns. Furthermore, we globalize our methods with a line search strategy (i.e., we guarantee a sufficient decrease of the objective 𝒥). This is a standard, yet important, ingredient for guaranteeing convergence, which is often not accounted for [2, 16, 17, 41, 52, 69, 70].

Like [10, 16, 48, 49, 52, 62], we consider incompressibility as an optional constraint (see (1.2b)). Operating with divergence-free velocity fields is equivalent to enforcing det(∇y) = 1 up to numerical accuracy (see [33, p. 77ff.]; other formulations for controlling det(∇y) can be found in [14, 36, 37, 39, 50, 55, 58, 60, 61, 64, 73]). Unlike [10, 48, 49], which penalize the divergence of the velocity, we enforce it exactly. We are not arguing that this approach is better per se. The use of penalties is adequate unless one has reasons to insist on an incompressible velocity field. In that case, a penalty method results in ill-conditioning.

Finally, our pseudospectral formulation in space allows us to resolve several numerical difficulties related to the incompressibility constraint. For example, the inf-sup condition for pressure spaces is not an issue with our scheme. Regarding accuracy, for smooth images, our scheme is spectrally accurate in space and second-order accurate in time. We do not have to use different discretization models [16, 62] for solving the individual subsystems of the mixed-type (hyperbolic-elliptic) optimality conditions. Since we use second-order explicit time stepping in combination with Fourier spectral methods, we have at hand a scheme that displays minimal numerical diffusion and does not require flux limiters [10, 16, 41, 71]. As we will see, the conditioning of the Hessian that appears in our Newton–Krylov scheme can be quite bad. Although the literature on preconditioners for PDE constrained optimization problems is quite rich (e.g., [6, 10, 8, 34]), none of these methods directly applies to our formulation. Developing effective preconditioning schemes for our formulation is ongoing work in our group.

2. Outline

In section 3, the mathematical model is developed. The numerical strategies are described in section 4. In particular, we specify (i) the optimality conditions (see section 4.1), (ii) strategies for numerical optimization (see section 4.2), and (iii) implementation details (see section 4.3). Numerical experiments on synthetic and real-world data are reported in section 5. Final remarks can be found in section 6.

3. Continuous problem formulation

We provide a summary of the basic notation in Table 1. The original problem formulation is stated in section 1.1. The only missing building block is the choice of the regularization functional 𝒮 in (1.2a), which we discuss next. Note that we neglect any technicalities with respect to the associated function spaces; we assume that the considered functions are adequately smooth (i.e., sufficiently many derivatives exist and are bounded).

Table 1.

Notation (frequently used acronyms and symbols).

Notation	Description

GN	Gauss–Newton
KKT system	Karush–Kuhn–Tucker system
PCG	preconditioned conjugate gradient method
PDE	partial differential equation
PDE solve	solution of hyperbolic PDEs of optimality systems (4.1) and (4.3)

m_R	reference (fixed) image (m_R : R^d → R)
m_T	template image (image to be registered; m_T: R^d → R)
y	mapping (deformation; y : R^d → R^d)
v	velocity field (control variable; v : R^d × [0, 1] → R^d)
m	state variable (transported image; m : R^d × [0, 1] → R)
m₁	state variable at t = 1 (deformed template image; m₁ : R^d → R)
λ	adjoint variable (transported mismatch; λ : R^d × [0, 1] → R)
f	body force (drives the registration; f : R^d × [0, 1] → R^d)
F₁	deformation gradient (tensor field) at t = 1 (F₁ : R^d → R^{d×d})
𝒥	objective functional
𝒮	regularization functional
𝒜	differential operator (first and second variation of 𝒮)
β	regularization parameter
γ	parameter that enables (γ = 1) or disables (γ = 0) the incompressibility constraint
n_t	number of time points (discretization)
n_c	number of coefficient fields (spectral Galerkin method in time)
n_x	number of grid points (discretization; $n_{x} = {(n_{x}^{1}, \dots, n_{x}^{d})}^{⊤}$ )
g	reduced gradient (first variation of Lagrangian with respect to v)
ℋ	reduced Hessian (second variation of Lagrangian with respect to v)

3.1. Regularization models

In contrast to [10], we do not explicitly enforce continuity in time. We relax the model to L²-integrability in time instead (see (1.2a)). This relaxation still yields a velocity field that varies smoothly in time [16].

Quadratic smoothing regularization models are commonly used in nonrigid image registration [27, 54, 56, 65]. They can be defined as

\mathcal{S}[v] := \frac{\beta}{2} \| v \|_{\mathcal{W}}^2 = \frac{\beta}{2} \langle \mathcal{B}[v], \mathcal{B}[v] \rangle_{L^2(\Omega)},

(3.1)

where ℬ is a differential operator that (together with its dual) defines the function space 𝒲 and β > 0 is a regularization parameter that balances the contribution of 𝒮.

As images are functions of bounded variation, regularity requirements on v ∈ 𝒱, 𝒱 := L²([0, 1]; 𝒲) (i.e., the choice of 𝒲 in (3.1)) have to be considered with care (for an analytical result see [16]). Experimental analysis suggests that an H¹-seminorm is appropriate if incompressibility is considered (i.e., γ = 1 in (1.2b); see also [16]). Thus,

\mathcal{B}[v] = \nabla v.

This choice is motivated by continuum mechanics and yields a viscous model of linear Stokes flow (see section 4.1; Stokes regularization). If we neglect the incompressibility constraint (i.e., γ = 0 in (1.2b)), we use a vectorial Laplacian operator,

\mathcal{B}[v] = \Delta v,

instead. This choice is motivated by the fact that H²-norm-based quadratic regularization is commonly used in large deformation diffeomorphic image registration [4, 41, 44].
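On the periodic domain (−π, π)^d, the quadratic functional (3.1) can be evaluated with the discrete Parseval identity: ℬ = ∇ contributes the Fourier symbol |k|² per velocity component, and ℬ = Δ contributes |k|⁴. The sketch below assumes a nodal grid with the uniform quadrature weight (2π)^d / N, where N is the total number of grid points; the function name `regularization_value` and its `kind` switch are hypothetical.

```python
import numpy as np


def regularization_value(v, beta, kind="H1"):
    """Evaluate S[v] = beta/2 <B[v], B[v]>_{L2} spectrally on (-pi, pi)^d.

    v has shape (d, n_1, ..., n_d); kind="H1" uses B = grad (symbol |k|^2),
    kind="H2" uses B = Laplacian (symbol |k|^4).
    """
    if kind not in ("H1", "H2"):
        raise ValueError("kind must be 'H1' or 'H2'")
    d = v.shape[0]
    shape = v.shape[1:]
    n_total = int(np.prod(shape))
    k = np.meshgrid(*[np.fft.fftfreq(n, d=1.0 / n) for n in shape], indexing="ij")
    k_sq = sum(kk**2 for kk in k)
    symbol = k_sq if kind == "H1" else k_sq**2
    total = 0.0
    for i in range(d):
        vi_hat = np.fft.fftn(v[i])
        total += np.sum(symbol * np.abs(vi_hat) ** 2)
    # Discrete Parseval identity with quadrature weight (2*pi)^d / N per node.
    return 0.5 * beta * (2.0 * np.pi) ** d * total / n_total**2
```

For v = (sin x₁, 0) on (−π, π)², both choices give β/2 · 2π² = βπ², since |∇ sin x₁|² = cos² x₁ and (Δ sin x₁)² = sin² x₁ both integrate to 2π².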

4. Numerics

We describe the numerical methods used to solve (1.2) next. Whenever discretized quantities are considered, a superscript h is added to the continuous variables and operators (e.g., the discretized representation of v is denoted by v^h). Likewise, if we refer to a discrete variable at a particular iteration, we add the iteration index as a subscript (e.g., v^h at iteration k is denoted by $v_{k}^{h}$).

We discretize the data on a nodal grid in space and time. The number of spatial grid points is denoted by $n_{x} : = {(n_{x}^{1}, \dots, n_{x}^{d})}^{⊤} \in N^{d}$ with spatial step size $h_{x} : = (h_{x}^{1}, \dots, h_{x}^{d}) \in R_{> 0}^{d}$ . The number of time points is denoted by n_t ∈ N with step size h_t = 1/n_t, h_t > 0.
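A minimal sketch of such a grid, assuming that the n_x^i nodes in each direction start at −π (under periodicity the node at +π coincides with −π), so that h_x^i = 2π / n_x^i, together with the integer Fourier wave numbers that spectral operators on this grid would use. The helper name `periodic_grid` is hypothetical.

```python
import numpy as np


def periodic_grid(n_x, n_t):
    """Nodal grid on (-pi, pi)^d (periodic) and uniform time steps on [0, 1]."""
    n_x = tuple(int(n) for n in n_x)
    h_x = tuple(2.0 * np.pi / n for n in n_x)  # spatial step sizes h_x^i
    h_t = 1.0 / n_t                            # time step size
    axes = [-np.pi + h * np.arange(n) for n, h in zip(n_x, h_x)]
    x = np.meshgrid(*axes, indexing="ij")      # d coordinate arrays
    # Integer wave numbers in FFT order (0, 1, ..., -1) for the 2*pi period
    k = np.meshgrid(*[np.fft.fftfreq(n, d=1.0 / n) for n in n_x], indexing="ij")
    return x, k, h_x, h_t
```

For n_x = (16, 8) and n_t = 4, this gives h_x = (π/8, π/4), h_t = 0.25, and wave numbers from −8 to 7 along the first axis.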

We use the method of Lagrange multipliers to numerically solve (1.2) with Lagrange multipliers λ : Ω̄ ×[0, 1] → R, (x, t) ↦ λ(x, t) (for the hyperbolic transport equation in (1.2b)), and p : Ω̄ ×[0, 1] → R, (x, t) ↦ p(x, t) (pressure; for the incompressibility constraint in (1.2b)). We use an optimize-then-discretize approach (for a discussion on advantages and disadvantages see [32, p. 57ff.]). The resulting optimality conditions are described next.

4.1. Optimality conditions

Computing variations of the Lagrangian with respect to perturbations of the state (m), adjoint (λ and p), and control (v) variables, respectively, yields the (necessary) first-order optimality (KKT) conditions (in strong form)

\partial_t m + \nabla m \cdot v = 0 \quad \text{in } \Omega \times (0, 1],

(4.1a)

-\partial_t \lambda - \nabla \cdot (v \lambda) = 0 \quad \text{in } \Omega \times [0, 1),

(4.1b)

\gamma \, (\nabla \cdot v) = 0 \quad \text{in } \Omega \times [0, 1],

(4.1c)

g := \beta \mathcal{A}[v] + \gamma \nabla p + f = 0 \quad \text{in } \Omega \times [0, 1],

(4.1d)

subject to the initial and terminal conditions

m = m_T \quad \text{in } \Omega \times \{0\} \quad \text{and} \quad \lambda = -(m_1 - m_R) \quad \text{in } \Omega \times \{1\}

and periodic boundary conditions on ∂Ω; (4.1d) is referred to as the reduced gradient, where f := λ∇m, f : Ω̄ × [0, 1] → R^d, (x, t) ↦ f(x, t), is the applied body force and 𝒜 = ℬ^Hℬ (with ℬ^H the adjoint of ℬ) is, up to the factor β, the Gâteaux derivative of 𝒮. In particular, we have

\mathcal{A}[v] = -\Delta v \quad (H^1\text{-regularization}; \ \gamma = 1 \text{ in } (4.1)),

(4.2a)

\mathcal{A}[v] = \Delta^2 v \quad (H^2\text{-regularization}; \ \gamma = 0 \text{ in } (4.1)),

(4.2b)

respectively. We refer to (4.1a) as the state equation (hyperbolic initial value problem), to (4.1b) as the adjoint equation (hyperbolic final value problem), and to (4.1d) as the control equation (elliptic problem). Like (4.1a), the adjoint equation (4.1b) is a scalar hyperbolic transport equation; it is written in conservation form and flows the mismatch between m_R and m₁ backward in time. If we neglect the incompressibility constraint in (1.2b), γ in (4.1) is set to zero (i.e., (4.1) consists only of (4.1a), (4.1b), and (4.1d)).
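The sketch below solves the state equation (4.1a) forward and the adjoint equation (4.1b) backward in time for d = 1, using a Fourier spectral derivative in space and Heun's method as one possible second-order Runge–Kutta scheme. Several simplifications are assumptions made here for brevity: the velocity is stationary, the grid is the periodic nodal grid on (−π, π), and the helpers `ddx`, `rk2_step`, `solve_state`, and `solve_adjoint` are hypothetical. The adjoint is marched in τ = 1 − t, in which (4.1b) reads ∂_τ λ = ∇·(vλ).

```python
import numpy as np


def ddx(u):
    """Spectral first derivative of a 2*pi-periodic grid function (1D)."""
    n = u.size
    k = np.fft.fftfreq(n, d=1.0 / n)
    if n % 2 == 0:
        k[n // 2] = 0.0  # drop the Nyquist mode for odd derivatives
    return np.real(np.fft.ifft(1j * k * np.fft.fft(u)))


def rk2_step(u, rhs, dt):
    """One step of Heun's method (a second-order Runge-Kutta scheme)."""
    k1 = rhs(u)
    k2 = rhs(u + dt * k1)
    return u + 0.5 * dt * (k1 + k2)


def solve_state(m_T, v, n_t):
    """Forward solve of (4.1a) in 1D: dm/dt + v dm/dx = 0, m(t=0) = m_T."""
    rhs = lambda u: -v * ddx(u)
    m = m_T.copy()
    for _ in range(n_t):
        m = rk2_step(m, rhs, 1.0 / n_t)
    return m


def solve_adjoint(lam_1, v, n_t):
    """Backward solve of (4.1b) in 1D: -dlam/dt - d(v lam)/dx = 0, lam(t=1) = lam_1.

    With tau = 1 - t this becomes dlam/dtau = d(v lam)/dx, marched forward in tau.
    """
    rhs = lambda u: ddx(v * u)
    lam = lam_1.copy()
    for _ in range(n_t):
        lam = rk2_step(lam, rhs, 1.0 / n_t)
    return lam
```

For a constant velocity v = c, the state solution is the shifted template m_T(x − c) at t = 1, and the adjoint at t = 0 is λ₁(x + c). Because the adjoint right-hand side is a spectral derivative, its zero Fourier mode vanishes and the discrete mass Σ λ is conserved up to rounding, mirroring the conservation form of (4.1b).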

Taking second variations of the Lagrangian yields the system

\partial_t \tilde{m} + \nabla \tilde{m} \cdot v + \nabla m \cdot \tilde{v} = 0 \quad \text{in } \Omega \times (0, 1],

(4.3a)

-\partial_t \tilde{\lambda} - \nabla \cdot (v \tilde{\lambda}) - \nabla \cdot (\tilde{v} \lambda) = 0 \quad \text{in } \Omega \times [0, 1),

(4.3b)

\gamma \, (\nabla \cdot \tilde{v}) = 0 \quad \text{in } \Omega \times [0, 1],

(4.3c)

\mathcal{H}[\tilde{v}] := \beta \mathcal{A}[\tilde{v}] + \gamma \nabla \tilde{p} + \tilde{f} = -g \quad \text{in } \Omega \times [0, 1],

(4.3d)

subject to the initial and terminal conditions m̃₀ := m̃(·, 0) = 0 and λ̃₁ := λ̃(·, 1) = −m̃₁, respectively, where m̃₁ := m̃(·, 1) and m̃₀, m̃₁, λ̃₁ : Ω̄ → R, and periodic boundary conditions on ∂Ω. Here, (4.3a), (4.3b), and (4.3d) are referred to as the incremental state, adjoint, and control equations, respectively; the incremental variables are denoted with a tilde. Further, ℋ in (4.3d) is referred to as the reduced Hessian and f̃ := λ̃∇m + λ∇m̃, f̃ : Ω̄ × [0, 1] → R^d, (x, t) ↦ f̃(x, t), is the incremental body force. The operator 𝒜 in (4.3d) represents the second variation of 𝒮 with respect to the control v. We use the same symbol as in (4.1), since the second variation of 𝒮 with respect to v is identical to its first variation (the corresponding vectorial differential operators are given in (4.2a) and (4.2b), respectively).

4.2. Numerical optimization

We discuss strategies for numerical optimization next. We consider second-order Newton–Krylov methods (see section 4.2.1) and a first-order Picard method (see section 4.2.3).

We use a backtracking line search subject to the Armijo condition with search direction s_k ∈ Rⁿ and step size α_k > 0 at (outer) iteration k ∈ N₀ to ensure a sequence of monotonically decreasing objective values 𝒥^h (we use default parameters; see [59, Algorithm 3.1, p. 37]). Note that each evaluation of 𝒥^h requires a forward solve (i.e., the solution of (4.1a) to obtain $m_{1}^{h}$ given some trial solution $v_{k}^{h} \in R^{n}$). Therefore, it is desirable to keep the number of line search steps to a minimum.
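A generic sketch of a backtracking line search with the Armijo sufficient-decrease condition, in the spirit of [59, Algorithm 3.1]. The constants c = 10⁻⁴ and ρ = 0.5 are common choices assumed here; the paper's exact defaults are not stated in this section. The function `armijo_backtracking` is hypothetical, and `J` stands for any callable that evaluates 𝒥^h (in the registration setting, each call costs one forward solve of (4.1a)).

```python
import numpy as np


def armijo_backtracking(J, v, s, g, J_v=None, alpha=1.0, c=1e-4, rho=0.5,
                        max_steps=30):
    """Backtracking line search enforcing the Armijo condition

        J(v + alpha * s) <= J(v) + c * alpha * g^T s.

    Returns the accepted step size and the objective value at the new point.
    """
    if J_v is None:
        J_v = J(v)
    slope = float(np.dot(g, s))
    if slope >= 0.0:
        raise ValueError("s is not a descent direction")
    for _ in range(max_steps):
        J_trial = J(v + alpha * s)  # one forward solve per trial in registration
        if J_trial <= J_v + c * alpha * slope:
            return alpha, J_trial
        alpha *= rho  # shrink the step and try again
    raise RuntimeError("line search failed to find sufficient decrease")
```

On the quadratic 𝒥(v) = ½ vᵀ diag(1, 10) v at v = (1, 1), the full Newton step is accepted with α = 1, whereas the steepest descent direction −g needs three halvings (α = 0.125). This illustrates why good search directions keep the number of forward solves in the line search small.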

4.2.1. Inexact Newton–Krylov method

Applying Newton’s method to (4.1) yields a large KKT system that has to be solved numerically at each outer iteration k. We will refer to the iterative solution of this system as inner iterations.⁴ In reduced space methods, incremental adjoint and state variables are eliminated from the system via block elimination (under the assumption that state and adjoint equations are fulfilled exactly) [8, 9]. We obtain the reduced KKT system

H_k^h \tilde{v}_k^h = -g_k^h, \quad k \in \mathbb{N},

(4.4)

where $H_{k}^{h} \in R^{n \times n}$ corresponds to the reduced Hessian in (4.3d) (i.e., the Schur complement of the full Hessian for the control variable v^h) and ${\tilde{v}}_{k}^{h} \in R^{n}$ to the incremental control variable in (4.3) (which is nothing but the search direction s_k mentioned earlier). Further, the right-hand side $g_{k}^{h} \in R^{n}$ corresponds to the reduced gradient in (4.1d).

The numerical scheme amounts to a sequential solution of the optimality conditions (4.1) and (4.3). Algorithm 1 illustrates a realization of an outer iteration.⁵ Note that we eliminate (4.1c) and (4.3c) from the optimality conditions (see section 4.3.4). The inner iteration (i.e., the solution of (4.4)) is what we discuss next.

Forming or storing ℋ^h in (4.4) is computationally prohibitive. Therefore, it is desirable to use an iterative solver for which ℋ^h does not have to be assembled in practice. Krylov-subspace methods are a popular choice [5, 8, 15, 34], as they only require matrix-vector products. We use a PCG method, which exploits the symmetry of ℋ^h and requires ℋ^h to be positive definite (i.e., ℋ^h ≻ 0); the latter is not guaranteed far away from a (local) minimum (see section 4.2.2 for a discussion).

Solving (4.4) exactly can be prohibitively expensive and might not be justified if an iterate is far from the (true) solution [20]. A common strategy is to perform inexact solves. That is, starting with a large tolerance for the Krylov-subspace method, we successively reduce the tolerance and thereby solve for the search direction more accurately as we approach a (local) minimizer [21, 24]. This can be achieved with the termination criterion

\| H_\iota^h \tilde{v}_\iota^h + g_k^h \|_2 =: \| r_\iota \|_2 \leq \eta_k \| g_k^h \|_2

(4.5)

for the Krylov-subspace method. Here, $η_{k} := \min(0.5, \sqrt{\|g_{k}^{h}\|_{2} / \|g_{0}^{h}\|_{2}})$, η_k ∈ [0, 1), is referred to as a forcing sequence (chosen to obtain superlinear convergence; details can be found in, e.g., [59, p. 165ff.]); ι ∈ N in (4.5) is the iteration index of the inner iteration (i.e., for the iterative solution of (4.4)) at a given outer iteration k.
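A minimal matrix-free sketch of the inner iteration: a textbook PCG loop for (4.4) that only calls user-supplied functions applying ℋ^h and P⁻¹, stopped by the relative residual criterion (4.5) with the forcing term η_k defined above. The names `pcg` and `forcing_term` and the `max_iter` safeguard are hypothetical; the sketch also omits the negative-curvature safeguard discussed in section 4.2.2.

```python
import numpy as np


def forcing_term(g_norm, g0_norm):
    """Forcing sequence eta_k = min(0.5, sqrt(||g_k|| / ||g_0||))."""
    return min(0.5, float(np.sqrt(g_norm / g0_norm)))


def pcg(apply_H, apply_Pinv, g, eta, max_iter=200):
    """Matrix-free PCG for H s = -g, stopped once ||H s + g|| <= eta * ||g||.

    apply_H and apply_Pinv are callables returning H @ p and P^{-1} @ r.
    Returns the (inexact) search direction and the number of iterations.
    """
    s = np.zeros_like(g)
    r = -g.copy()               # residual -g - H s at s = 0
    z = apply_Pinv(r)
    p = z.copy()
    rz = float(np.dot(r, z))
    tol = eta * np.linalg.norm(g)
    for it in range(max_iter):
        if np.linalg.norm(r) <= tol:
            return s, it
        Hp = apply_H(p)
        alpha = rz / float(np.dot(p, Hp))
        s += alpha * p
        r -= alpha * Hp         # keeps r = -g - H s up to round-off
        z = apply_Pinv(r)
        rz_new = float(np.dot(r, z))
        p = z + (rz_new / rz) * p
        rz = rz_new
    return s, max_iter
```

Early in the outer iteration (‖g_k‖ ≈ ‖g_0‖), `forcing_term` returns 0.5 and the search direction is computed only roughly; as ‖g_k‖ / ‖g_0‖ decreases, the tolerance tightens automatically.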

Algorithm 1.

Outer iteration of the designed inexact Newton–Krylov method.

1:	$v_{0}^{h} \leftarrow 0$; compute $m_{0}^{h}$, $\lambda_{0}^{h}$, $\mathcal{J}^{h}(v_{0}^{h})$, and $g_{0}^{h}$; k ← 0
2:	while true do
3:	stop ← (4.9)
4:	if stop break
5:	s_k ← solve (4.4) given $m_{k}^{h}$, $\lambda_{k}^{h}$, $v_{k}^{h}$, and $g_{k}^{h}$	▷ Newton step
6:	α_k ← perform line search on s_k
7:	$v_{k+1}^{h} \leftarrow v_{k}^{h} + \alpha_{k} s_{k}$
8:	$m_{k+1}^{h}(t = 0) \leftarrow m_{T}^{h}$
9:	$m_{k+1}^{h} \leftarrow$ solve (4.1a) forward in time given $v_{k+1}^{h}$	▷ forward solve
10:	$\lambda_{k+1}^{h}(t = 1) \leftarrow m_{R}^{h} - m_{k+1}^{h}(t = 1)$
11:	$\lambda_{k+1}^{h} \leftarrow$ solve (4.1b) backward in time given $v_{k+1}^{h}$ and $m_{k+1}^{h}$	▷ adjoint solve
12:	compute $\mathcal{J}^{h}(v_{k+1}^{h})$ and $g_{k+1}^{h}$ given $m_{k+1}^{h}$, $\lambda_{k+1}^{h}$, and $v_{k+1}^{h}$
13:	k ← k + 1
14:	end while

The course of an inner iteration follows the standard PCG steps (see, e.g., [59, p. 119, Algorithm 5.3]). During each inner iteration ι we have to apply ℋ^h in (4.3d) to a vector. We summarize this matrix-vector product in Algorithm 2. As can be seen, each application of ℋ^h requires an additional forward and adjoint solve (i.e., the solution of the incremental state and adjoint equations (4.3a) and (4.3b), respectively). This is a direct consequence of the block elimination in reduced space methods.

The number of inner iterations essentially depends on the spectrum of the operator ℋ^h. Typically, ℋ^h displays poor conditioning. An optimal preconditioner P ∈ R^{n×n} renders the number of iterations independent of n and β. The design of such a preconditioner is an open area of research [6, 7, 8, 34]. Standard techniques like incomplete factorizations or algebraic multigrid are not applicable, as they require the assembly of ℋ^h in (4.4). Geometric, matrix-free preconditioners are a valid option; this is something we will investigate in the future. Here, we consider a left preconditioner based on the exact spectral inverse of the regularization part of ℋ^h. That is, P := 𝒜^h (implementation details can be found in section 4.3.3). Note that the PCG method only requires the action of P⁻¹ on a vector (i.e., a matrix-free implementation is in place). Since we use a Fourier spectral method, the cost of our preconditioning amounts to a spectral diagonal scaling. We will refer to this algorithm as the Newton-PCG (N-PCG) method.
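Because 𝒜^h is diagonal in the Fourier basis, applying P⁻¹ amounts to dividing each Fourier coefficient by |k|² (H¹, 𝒜 = −Δ) or |k|⁴ (H², 𝒜 = Δ²). The sketch below shows this for a vector field stored as an array of shape (d, n_x^1, …, n_x^d) on (−π, π)^d. How the constant (k = 0) mode, which lies in the null space of 𝒜, is treated is not specified in this section; leaving it unscaled is an assumption made here. The function names `fourier_symbol`, `apply_A`, and `apply_Pinv` are hypothetical.

```python
import numpy as np


def fourier_symbol(shape, kind):
    """Symbol of A on (-pi, pi)^d: |k|^2 for -Laplacian, |k|^4 for the biharmonic."""
    if kind not in ("H1", "H2"):
        raise ValueError("kind must be 'H1' or 'H2'")
    k = np.meshgrid(*[np.fft.fftfreq(n, d=1.0 / n) for n in shape], indexing="ij")
    k_sq = sum(kk**2 for kk in k)
    return k_sq if kind == "H1" else k_sq**2


def apply_A(v, kind="H1"):
    """Apply A componentwise to a vector field v of shape (d, n_1, ..., n_d)."""
    sym = fourier_symbol(v.shape[1:], kind)
    return np.stack([np.real(np.fft.ifftn(sym * np.fft.fftn(vi))) for vi in v])


def apply_Pinv(r, kind="H1"):
    """Apply P^{-1} = A^{-1} as a diagonal scaling in Fourier space."""
    sym = fourier_symbol(r.shape[1:], kind).copy()
    sym[(0,) * sym.ndim] = 1.0  # assumption: leave the constant (k = 0) mode unscaled
    return np.stack([np.real(np.fft.ifftn(np.fft.fftn(ri) / sym)) for ri in r])
```

On zero-mean fields, `apply_Pinv` inverts `apply_A` exactly up to round-off. Scaling P by a positive constant (for example, β) does not change the PCG iterates, so whether P is taken as 𝒜^h or β𝒜^h does not affect the computed search direction.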

Algorithm 2.

Hessian matrix-vector product of the designed inexact Newton–Krylov algorithm at outer iteration k ∈ N. We illustrate the computational steps required for applying ℋ^h in (4.3d) to the PCG search direction at inner iteration index ι ∈ N.

1:	$\tilde{m}_{\iota}^{h}(t = 0) \leftarrow 0$
2:	$\tilde{m}_{\iota}^{h} \leftarrow$ solve (4.3a) forward in time given $m_{k}^{h}$, $v_{k}^{h}$, and $\tilde{v}_{\iota}^{h}$	▷ incremental forward solve
3:	$\tilde{\lambda}_{\iota}^{h}(t = 1) \leftarrow -\tilde{m}_{\iota}^{h}(t = 1)$
4:	$\tilde{\lambda}_{\iota}^{h} \leftarrow$ solve (4.3b) backward in time given $\lambda_{k}^{h}$, $v_{k}^{h}$, and $\tilde{v}_{\iota}^{h}$	▷ incremental adjoint solve
5:	apply $H_{\iota}^{h}$ in (4.3d) to the PCG search direction given $\lambda_{k}^{h}$, $\tilde{\lambda}_{\iota}^{h}$, $m_{k}^{h}$, and $\tilde{m}_{\iota}^{h}$


4.2.2. GN approximation

Even though ℋ^h is, by construction, positive semidefinite (i.e., ℋ^h ⪰ 0) in the vicinity of a (local) minimum, it can be indefinite or singular far away from the solution. Accordingly, the resulting search direction is not guaranteed to be a descent direction. One remedy is to terminate the inner iteration whenever negative curvature occurs [21]. Another approach is to use a quasi-Newton approximation. We instead consider a GN approximation $H_{GN}^{h}$, obtained by dropping certain expressions of ℋ^h, which in turn guarantees that $H_{GN}^{h} ≻ 0$. In particular, we drop all expressions in (4.3) in which λ appears. Accordingly, we obtain the (continuous) system

\partial_{t} \tilde{m} + \nabla \tilde{m} \cdot v + \nabla m \cdot \tilde{v} = 0 in Ω \times (0, 1],

(4.6a)

- \partial_{t} \tilde{λ} - \nabla \cdot (v \tilde{λ}) = 0 in Ω \times [0, 1),

(4.6b)

γ (\nabla \cdot \tilde{v}) = 0 in Ω \times [0, 1],

(4.6c)

H_{GN} [\tilde{v}] := β A [\tilde{v}] + γ \nabla \tilde{p} + \tilde{λ} \nabla m = - g in Ω \times [0, 1] .

(4.6d)

We expect the rate of convergence to drop from quadratic to (super)linear when turning to (4.6). However, if the L²-distance can be driven to zero, we recover fast local convergence close to the true solution v^★, even though the adjoint variable is neglected. This is because (4.1b) models the flow of the mismatch backward in time, such that λ → 0 for v → v^★. We refer to this method as the (inexact) GN-PCG method [8, 9]. We remark that all algorithmic details described in this paper apply to both Newton–Krylov methods.
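The effect of dropping the λ-terms can be illustrated with a scalar least-squares analogy (an assumption for illustration only, not the PDE-constrained operator): for J(x) = ½ r(x)² with the toy residual r(x) = x² − 1, the full Hessian J′J + r r″ becomes negative far from the solution, while the GN approximation J′J, which drops the residual-weighted term just as (4.6) drops the terms involving λ, stays nonnegative. At a zero-residual solution both coincide, which mirrors the argument that λ → 0 for v → v^★. All function names are hypothetical.

```python
def residual(x):
    """Toy residual r(x) = x^2 - 1; zero at the 'true solution' x = 1."""
    return x**2 - 1.0

def jacobian(x):
    return 2.0 * x

def full_hessian(x):
    # d^2/dx^2 [0.5 r(x)^2] = J^T J + r(x) r''(x), with r'' = 2
    return jacobian(x) ** 2 + residual(x) * 2.0

def gn_hessian(x):
    # Gauss-Newton: drop the residual-weighted second-order term
    return jacobian(x) ** 2

def gauss_newton(x, iters=8):
    """Undamped GN iteration x <- x - (J^T J)^{-1} J^T r."""
    for _ in range(iters):
        grad = jacobian(x) * residual(x)
        x = x - grad / gn_hessian(x)
    return x
```

Far from the solution (e.g., x = 0.1) the full Hessian is negative while the GN Hessian is not; since the residual vanishes at x = 1, the GN iteration still converges rapidly there.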

4.2.3. Picard method

We consider a globalized Picard iteration (fixed point iteration) in addition to the described Newton–Krylov methods. Based on (4.1d) we have

v_{k + 1}^{h} = - {(β A^{h})}^{- 1} [γ \nabla^{h} p_{k}^{h} + f_{k}^{h}] .

(4.7)

Since we use Fourier spectral methods, the inversion of 𝒜^h in (4.7) comes at the cost of a diagonal scaling (implementation details can be found in section 4.3.3). Accordingly, this scheme does not require the (iterative) solution of a linear system. However, it potentially results in a larger number of outer iterations until convergence as we expect the optimization problem to be poorly conditioned.

We do not directly use the solution of (4.7) as the new iterate but instead compute a search direction s_k by subtracting the former iterate from the new one, i.e., s_k := v_{k+1}^h − v_k^h with v_{k+1}^h given by (4.7). This in turn allows us to perform a line search on s_k. This scheme can be viewed as a gradient descent in the function space induced by 𝒲 (i.e., a preconditioned gradient descent scheme; see Appendix C).

In contrast to the search directions of Newton methods, s_k is arbitrarily scaled. Therefore, we provide an augmented implementation that tries to estimate a suitable scaling during the course of the optimization. Details can be found in section 4.3.5.
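A minimal sketch of the globalized Picard iteration, assuming the plain H²-setting (γ = 0, so the pressure term in (4.7) vanishes), a dense stand-in for β𝒜^h (inverted with `np.linalg.solve` instead of the spectral diagonal scaling), and a hypothetical linear body force f(v) = Cv − b so that the result can be checked. The function and argument names are illustrative. Since s_k = −(β𝒜^h)⁻¹g_k, where g_k = β𝒜^h v_k + f_k, the direction is a preconditioned negative gradient.

```python
import numpy as np

def picard(beta_A, body_force, objective, v0, n_iter=200):
    """Globalized Picard iteration with a simple backtracking line search."""
    v = v0.copy()
    for _ in range(n_iter):
        v_fixed = -np.linalg.solve(beta_A, body_force(v))  # fixed-point update (4.7), gamma = 0
        s = v_fixed - v                                     # search direction: new minus former iterate
        alpha, J0 = 1.0, objective(v)
        while objective(v + alpha * s) > J0 and alpha > 1e-8:
            alpha *= 0.5                                    # backtrack until J does not increase
        v = v + alpha * s
    return v
```

With the toy objective J(v) = ½vᵀβ𝒜v + ½vᵀCv − bᵀv, the gradient is β𝒜v + f(v), so the iteration should converge to the solution of (β𝒜 + C)v = b.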

4.2.4. Termination criteria

Following [56] (see [29, p. 305 ff.] for a discussion), the termination criteria are given by

\begin{array}{lll} (C1) & \mathcal{J}^{h} (v_{k - 1}^{h}) - \mathcal{J}^{h} (v_{k}^{h}) & < τ_{\mathcal{J}} (1 + \mathcal{J}^{h} (v_{0}^{h})), \\ (C2) & {‖ v_{k - 1}^{h} - v_{k}^{h} ‖}_{\infty} & < \sqrt{τ_{\mathcal{J}}} (1 + {‖ v_{k}^{h} ‖}_{\infty}), \\ (C3) & {‖ g_{k}^{h} ‖}_{\infty} & < \sqrt[3]{τ_{\mathcal{J}}} (1 + \mathcal{J}^{h} (v_{0}^{h})), \\ (C4) & {‖ g_{k}^{h} ‖}_{\infty} & < 10^{3} ε_{mach}, \\ (C5) & k & > n_{opt} . \end{array}

(4.8)

Here, τ_𝒥 > 0 is a user-defined tolerance, ε_mach > 0 is the machine precision, and n_opt ∈ N is the maximal number of outer iterations. The algorithm is terminated if

\big((C1) \land (C2) \land (C3)\big) \lor (C4) \lor (C5),

(4.9)

where ∧ denotes the logical and operator and ∨ the logical or operator.
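The test (4.8)–(4.9) translates directly into a small predicate. The sketch below is illustrative: the function name and argument list are assumptions, the ℓ^∞-norm is taken as the maximum absolute entry of the discrete fields, and the default values τ_𝒥 = 1E–3 and n_opt = 1E6 are those stated in section 4.3.5.

```python
import numpy as np

EPS_MACH = np.finfo(float).eps

def should_terminate(J_prev, J_curr, J0, v_prev, v_curr, g_curr, k,
                     tau_J=1e-3, n_opt=10**6):
    """((C1) and (C2) and (C3)) or (C4) or (C5), following (4.8)-(4.9)."""
    def inf_norm(a):
        return np.max(np.abs(a))
    c1 = J_prev - J_curr < tau_J * (1.0 + J0)
    c2 = inf_norm(v_prev - v_curr) < np.sqrt(tau_J) * (1.0 + inf_norm(v_curr))
    c3 = inf_norm(g_curr) < np.cbrt(tau_J) * (1.0 + J0)
    c4 = inf_norm(g_curr) < 1e3 * EPS_MACH
    c5 = k > n_opt
    return bool((c1 and c2 and c3) or c4 or c5)
```

Note that (C1)–(C3) must hold jointly, whereas (C4) (a gradient at round-off level) or (C5) (iteration budget exhausted) terminates on its own.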

4.3. Algorithmic details

This section provides additional specifics on the implementation. In particular, we describe (i) the numerical discretization (see section 4.3.1), (ii) the parameterization in time (see section 4.3.2), (iii) the inversion of the operator 𝒜^h (see section 4.3.3), (iv) the elimination of p and p̃ (see section 4.3.4), and (v) strategies for the parameter selection (see section 4.3.5).

4.3.1. Numerical discretization

We use a (regular) nodal grid for the discretization in space and time. The problem is defined on the space-time interval Ω × [0, 1], where Ω := (−π, π)^d. Accordingly, the time step size is h_t = 1/n_t, where n_t is the number of time steps. The cell size (pixel or voxel size) $h_{x} := (h_{x}^{1}, \dots, h_{x}^{d}) \in \mathbb{R}_{>0}^{d}$ is given by $h_{x}^{i} = 2 π / n_{x}^{i}$, i = 1, …, d, where $n_{x}^{i}$ is the number of grid points along the ith spatial direction xⁱ.

The derivative operators are discretized via Fourier spectral methods [11]. The time integrator for the forward and adjoint solves is an explicit second-order Runge–Kutta method, which, in connection with Fourier spectral methods, displays minimal numerical diffusion.
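The combination of a Fourier spectral derivative with an explicit second-order Runge–Kutta integrator can be sketched for the one-dimensional transport equation ∂_t m + v ∂_x m = 0 on (−π, π). This is an illustration under stated assumptions: Heun's method is used as one representative RK2 scheme (the text does not specify which RK2 variant is used), the Nyquist mode is zeroed for the first derivative, and the function names are hypothetical.

```python
import numpy as np

def spectral_dx(u):
    """First derivative of a 2*pi-periodic 1D signal via FFT (integer wavenumbers)."""
    n = u.size
    k = np.fft.fftfreq(n, d=1.0 / n)   # 0, 1, ..., n/2 - 1, -n/2, ..., -1
    if n % 2 == 0:
        k[n // 2] = 0.0                # drop the Nyquist mode for the odd-order derivative
    return np.real(np.fft.ifft(1j * k * np.fft.fft(u)))

def transport_rhs(m, v):
    """Right-hand side of d_t m = -v * d_x m."""
    return -v * spectral_dx(m)

def transport_rk2(m0, v, n_t):
    """Integrate d_t m + v d_x m = 0 over t in [0, 1] with Heun's (explicit RK2) method."""
    h_t = 1.0 / n_t
    m = m0.copy()
    for _ in range(n_t):
        k1 = transport_rhs(m, v)
        k2 = transport_rhs(m + h_t * k1, v)
        m = m + 0.5 * h_t * (k1 + k2)
    return m
```

For a constant velocity v = 1 and initial condition sin(x), the exact solution at t = 1 is sin(x − 1); with 64 grid points and 200 time steps the error is dominated by the small second-order time-stepping error.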

Following standard numerical theory for hyperbolic equations, the step size h_t > 0 is bounded from above by $h_{t,\max} := ε_{CFL} / \max(\|v^{h}\|_{\infty} ⊘ h_{x}) > 0$ (Courant–Friedrichs–Lewy (CFL) condition), where $\|v^{h}\|_{\infty} \in \mathbb{R}^{d}$ collects the maximal absolute value of each velocity component. Here, ⊘ denotes the Hadamard (elementwise) division and ε_CFL > 0 is the CFL number. The theoretical bound for h_{t,max} is attained for ε_CFL = 1. We use ε_CFL = 0.2 for all experiments. Since we use a spectral Galerkin method in time (see section 4.3.2), we can adaptively adjust n_t (and therefore h_t) for the forward and adjoint solves as required by the CFL condition.
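A minimal sketch of how the cell size and the CFL bound determine the smallest admissible number of time steps n_t, assuming the velocity is stored as an array of shape (d, n_x¹, …, n_x^d) and that ‖v^h‖_∞ is taken per component, as in the Hadamard division above. The function names are hypothetical.

```python
import numpy as np

def grid_spacing(n_x):
    """h_x^i = 2*pi / n_x^i for the periodic domain (-pi, pi)^d."""
    return 2.0 * np.pi / np.asarray(n_x, dtype=float)

def min_time_steps(v, n_x, eps_cfl=0.2):
    """Smallest n_t such that h_t = 1/n_t <= eps_cfl / max(||v||_inf ./ h_x)."""
    h_x = grid_spacing(n_x)
    speed = np.array([np.max(np.abs(v_i)) for v_i in v])  # componentwise max speed
    rate = np.max(speed / h_x)
    if rate == 0.0:
        return 1                                           # zero velocity: any step is stable
    h_t_max = eps_cfl / rate
    return int(np.ceil(1.0 / h_t_max))
```

For a unit velocity on a 64 × 64 grid this gives n_t = 51, well below the pessimistic choice n_t = 4 max(n_x) = 256 used in section 5.3.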

4.3.2. Spectral Galerkin method

To reduce the number of unknowns, v is expanded in time in terms of basis functions b_l : [0, 1] → R, t ↦ b_l(t), l = 1, …, n_c,

v (x, t) = \sum_{l = 1}^{n_{c}} b_{l} (t) v_{l} (x),

(4.10)

where v_l : R^d → R^d, x ↦ v_l(x), is a coefficient field. The coefficients v_l are the new unknowns of our problem. This reduces the number of unknowns in time from n_t to n_c, where n_c ≪ n_t. Thus, we can invert for a stationary (n_c = 1) or a nonstationary velocity field as required. Our formulation is unchanged; only the number of unknowns differs.
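Evaluating (4.10) at a time point amounts to a weighted sum of the coefficient fields. The sketch below assumes, for illustration, that b_l is the Chebyshev polynomial T_{l−1} composed with the affine map t ↦ 2t − 1 from [0, 1] to [−1, 1]; the normalization actually used is described in Appendix A and may differ. The function name is hypothetical; `chebvander` is NumPy's Chebyshev Vandermonde matrix.

```python
import numpy as np
from numpy.polynomial import chebyshev

def velocity_at(t, coeffs):
    """Evaluate v(x, t) = sum_l b_l(t) v_l(x).

    coeffs has shape (n_c, d, *n_x): one coefficient field v_l per basis function.
    Assumes b_l(t) = T_{l-1}(2t - 1), i.e., Chebyshev polynomials mapped to [0, 1].
    """
    n_c = coeffs.shape[0]
    b = chebyshev.chebvander(np.array([2.0 * t - 1.0]), n_c - 1)[0]  # b_1(t), ..., b_{n_c}(t)
    return np.tensordot(b, coeffs, axes=(0, 0))
```

With n_c = 1 the velocity is stationary (T₀ ≡ 1); adding coefficient fields only enlarges the first axis of `coeffs`, while the forward and adjoint solvers still see a velocity field at every time step.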

We use Chebyshev polynomials as basis functions b_l on account of their excellent approximation properties as well as their orthogonality (see Appendix A for details). The expansion (4.10) solely affects 𝒮 and the (incremental) control equation (i.e., (4.1d) and (4.3d)); v is computed from the coefficient fields v_l, l = 1, …, n_c, during the forward and adjoint solves according to (4.10).

4.3.3. Inversion: Regularization operators

The Picard iteration in (4.7) as well as the preconditioning of (4.4) require the inversion of the differential operator 𝒜^h. Since we use Fourier spectral methods this inversion can be accomplished at the cost of a spectral diagonal scaling. However, 𝒜^h has a nontrivial kernel (which only includes constant functions due to the periodic boundary conditions). We make 𝒜^h invertible by setting the base frequency of the inverse of 𝒜^h (including the scaling by β) to one. This ensures not only invertibility, but also that the constant part of the (incremental) body force f (or f̃, respectively) remains in the kernel of our regularization scheme. This in turn allows us to invert for constant velocity fields.
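A minimal sketch of the spectral inversion of β𝒜^h on (−π, π)², assuming 𝒜 = (−Δ)^power with power = 1 as a stand-in for the H¹-seminorm operator and power = 2 for the H²-seminorm operator (the exact operators are defined in (4.2)). The zero-frequency symbol is set to one, so the constant part of the input passes through unchanged. The function name is hypothetical; vector fields would be treated componentwise.

```python
import numpy as np

def inv_beta_A(f, beta, power=1):
    """Apply (beta * A)^{-1} spectrally to a periodic scalar field f of shape (n1, n2).

    A = (-Laplacian)^power; the k = 0 symbol is set to one to make the operator invertible.
    """
    n1, n2 = f.shape
    k1 = np.fft.fftfreq(n1, d=1.0 / n1)
    k2 = np.fft.fftfreq(n2, d=1.0 / n2)
    K1, K2 = np.meshgrid(k1, k2, indexing="ij")
    symbol = beta * (K1**2 + K2**2) ** power
    symbol[0, 0] = 1.0                       # base frequency: constant mode is left unchanged
    return np.real(np.fft.ifft2(np.fft.fft2(f) / symbol))
```

The cost is two FFTs and one pointwise division, which is why both the Picard update (4.7) and the preconditioner amount to a spectral diagonal scaling.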

4.3.4. Elimination of p and p̃

In our numerical scheme, we eliminate p, and thereby (4.1c), from (4.1). Details on the derivation can be found in Appendix B. We obtain

\tilde{g} := β A [v] + K [f] = 0, \quad K [f] := - \nabla (Δ^{- 1} (\nabla \cdot f)) + f,

(4.11)

to replace (4.1d), where f is the body force as defined in section 4.1 and 𝒜 the first variation of 𝒮 with respect to v (see (4.2a) and (4.2b), respectively). It immediately follows that

\hat{H} [\tilde{v}] := β A [\tilde{v}] + K [\tilde{f}] = - \tilde{g}

(4.12)

replaces (4.3d); f̃ denotes the incremental body force defined in section 4.1.
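In Fourier space, the operator 𝒦 in (4.11) is the Leray projection onto divergence-free fields: since ∇ ↦ ik and Δ⁻¹ ↦ −1/|k|², one obtains 𝒦̂[f̂] = f̂ − k (k·f̂)/|k|², with the k = 0 mode left untouched. The sketch below implements this for a periodic two-dimensional vector field of shape (2, n₁, n₂); the function name is hypothetical.

```python
import numpy as np

def leray_project(f):
    """K[f] = -grad(Delta^{-1}(div f)) + f for a periodic 2D vector field of shape (2, n1, n2)."""
    _, n1, n2 = f.shape
    k1 = np.fft.fftfreq(n1, d=1.0 / n1)
    k2 = np.fft.fftfreq(n2, d=1.0 / n2)
    K = np.stack(np.meshgrid(k1, k2, indexing="ij"))  # wave vectors, shape (2, n1, n2)
    k_sq = np.sum(K**2, axis=0)
    k_sq[0, 0] = 1.0                                  # k = 0: K * (...) vanishes there anyway
    f_hat = np.fft.fft2(f, axes=(1, 2))
    k_dot_f = np.sum(K * f_hat, axis=0)               # k . f_hat (the divergence up to a factor i)
    f_hat = f_hat - K * k_dot_f / k_sq                # remove the gradient (curl-free) part
    return np.real(np.fft.ifft2(f_hat, axes=(1, 2)))
```

A pure gradient field is annihilated, while a divergence-free field passes through unchanged; this is how the incompressibility constraint is enforced without carrying p explicitly.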

4.3.5. Parameter selection

To the extent possible, it is desirable to design a numerical scheme that does not require a selection of parameters (black-box solver). This is challenging for previously unseen data. In general, the user should only be required to decide on the following:

The desired accuracy of the inversion (controlled by the tolerance τ_𝒥; see section 4.2.4).
The desired properties of the mapping y (controlled by ε_θ or ε_F, respectively; see below).
The budget we are willing to assign to the computation (controlled by n_opt; see section 4.2.4).

For the purpose of this numerical study we proceed as follows.

Optimization

We set the maximum number of iterations n_opt (see (4.8)) to 1E6, as we do not want our algorithm to terminate early (i.e., we make sure that we terminate only if we either reach the defined tolerances or no longer observe a decrease in 𝒥^h). For the convergence study in section 5, we instead use the relative change of the ℓ^∞-norm of the gradient g^h as a stopping criterion, as we are interested in studying convergence properties. This enables an unbiased comparison in terms of the work required to solve an optimization problem up to a desired accuracy. In particular, we terminate the optimization once the ℓ^∞-norm of the reduced gradient g^h has been reduced by at least three orders of magnitude relative to its initial value.

Following standard textbook literature [29, 56], we use the stopping criteria in (4.8) for the remainder of the experiments. We set the tolerance to τ_𝒥 = 1E–3. For the experiments performed in this study, we did not observe qualitatively significant differences in the final results when turning to smaller tolerances. We will further elaborate on the required accuracy of the inversion (i.e., the registration quality) in a follow-up paper.

The tolerance of the PCG method is set as discussed in section 4.2.1 (see (4.5)). The maximal number of iterations for the PCG method is set to n (the order of the reduced KKT system in (4.4)). In exact arithmetic, this guarantees that the PCG method converges, since CG terminates after at most n iterations. This choice not only ensures an unbiased study (i.e., we do not terminate early) but also makes sure that we do not miss any issues in the implementation or parameter selection. For all experiments conducted in this study, the PCG method converged after only a fraction of n inner iterations, as confirmed by the reported number of PDE solves.⁶

For all our experiments we initialized the line search with a step size of α_k = 1 (see section 4.2). This is a sensible choice, as search directions obtained from second-order methods are well scaled (i.e., we expect the full step α_k = 1 to be accepted). However, this is not the case for the Picard scheme (i.e., the preconditioned gradient descent). Our implementation therefore features an option to memorize the scaling of s_k for the next outer iteration. That is, we introduce an additional scaling factor α̃_k > 0 that is applied to s_k before entering the line search (initialized with α̃_0 = 1). If the line search reduces the step size (α_k < 1), we downscale α̃_k by α_k; if the full step is accepted (α_k = 1), we upscale α̃_k by a factor of two.
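The step-size memory can be sketched as a wrapper around a backtracking line search. The function name, the halving factor, and the simple-decrease acceptance test (rather than, e.g., an Armijo condition) are assumptions made for illustration; only the update rule for α̃_k follows the description above.

```python
def line_search_with_memory(objective, v, s, alpha_tilde, alpha_min=1e-10):
    """Backtracking on the pre-scaled direction alpha_tilde * s, then update the memory."""
    s = alpha_tilde * s                 # apply the remembered scaling first
    J0 = objective(v)
    alpha = 1.0
    while objective(v + alpha * s) >= J0 and alpha > alpha_min:
        alpha *= 0.5                    # simple decrease test (not an Armijo condition)
    if alpha < 1.0:
        alpha_tilde *= alpha            # line search kicked in: shrink the memory
    else:
        alpha_tilde *= 2.0              # full step accepted: try a larger scaling next time
    return v + alpha * s, alpha_tilde
```

For an underscaled direction the memory grows geometrically, and for an overscaled one it absorbs the backtracking factor, so subsequent Picard steps start closer to an acceptable scaling.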

PDE solver

The number of time steps n_t is bounded from below due to stability requirements (see section 4.3.1). Since we use an expansion in time (see section 4.3.2), it is possible to adaptively adjust n_t, so that numerical stability is attained.

However, we fix n_t for the numerical experiments in section 5.3, as we are interested in studying the convergence behavior with respect to the employed grid size. We set n_t to 4 max(n_x), which is a pessimistic choice. If we still encounter instabilities (as judged by monitoring the CFL condition; see section 4.3.1), s_k is scaled by a factor of 0.5 until numerical stability is attained, before entering the line search. For all numerical experiments conducted in this study, we did not observe any instabilities for the Newton–Krylov methods. For the Picard method, however, we observed instabilities when we did not use the rescaling procedure detailed above. This is because s_k is arbitrarily scaled for first-order methods (as opposed to second-order methods). By introducing the additional scaling parameter α̃_k we could stabilize the Picard method: we did not observe a violation of the CFL condition for any of the conducted experiments (for n_t fixed).
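The safeguard for a fixed n_t can be sketched as repeated halving of s_k until the trial iterate v_k + s_k satisfies the CFL bound of section 4.3.1. The function names, the array layout (d, n_x¹, …), and the cap on the number of halvings are illustrative assumptions.

```python
import numpy as np

def cfl_ok(v, n_t, h_x, eps_cfl=0.2):
    """True if h_t = 1/n_t <= eps_cfl / max_i(||v^i||_inf / h_x^i); v has shape (d, ...)."""
    rate = max(np.max(np.abs(v[i])) / h_x[i] for i in range(len(h_x)))
    return rate == 0.0 or 1.0 / n_t <= eps_cfl / rate

def rescale_for_cfl(v, s, n_t, h_x, eps_cfl=0.2, max_halvings=60):
    """Scale s by 0.5 until the trial iterate v + s is CFL-stable (before the line search)."""
    for _ in range(max_halvings):
        if cfl_ok(v + s, n_t, h_x, eps_cfl):
            return s
        s = 0.5 * s
    raise RuntimeError("could not satisfy the CFL condition")
```

With n_t = 4 · 64 = 256 on a 64 × 64 grid, a direction of magnitude 20 is halved twice, to magnitude 5, before it becomes admissible.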

Regularization

Estimating an optimal value for β is an area of research by itself. A variety of methods has been designed (see, e.g., [72]). A key difficulty is computational complexity. Methods based on the assumption that differences between model output and observed data are associated with random noise (such as generalized cross validation) might not be reliable in the context of nonrigid image registration, since the noise in the images is likely to be highly structured [38]. Another possibility is to estimate the regularization parameter on the basis of the spectral properties of the Hessian (see section 5.3.1). That is, we can estimate the condition number of the unregularized problem during the PCG solves using the Lanczos algorithm (see [30, p. 528]). We can do this very efficiently by initializing the problem with a zero velocity field: given v = 0, applying the Hessian within the PCG is computationally inexpensive, as many terms in the optimality system vanish (see section 4.1). However, the level of regularization depends not only on properties of the data, but also on regularity requirements on y.
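The condition-number estimate only needs Hessian matrix-vector products. The sketch below is a standalone Lanczos iteration with full reorthogonalization (applied twice for numerical robustness) that returns the extreme Ritz values; it is an illustration under assumptions, not the PCG-embedded variant referred to above (there, the tridiagonal matrix is assembled from the CG coefficients). The function name, the random starting vector, and the diagonal test operator are hypothetical.

```python
import numpy as np

def lanczos_extreme_eigs(matvec, n, m=30, seed=0):
    """Estimate the extreme eigenvalues of a symmetric operator given only by matvec."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    for j in range(m):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        for _ in range(2):                          # full reorthogonalization, applied twice
            w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:                         # invariant Krylov subspace found
            m = j + 1
            break
        q = w / beta[j]
    T = np.diag(alpha[:m]) + np.diag(beta[: m - 1], 1) + np.diag(beta[: m - 1], -1)
    ritz = np.linalg.eigvalsh(T)                    # Ritz values, ascending
    return ritz[0], ritz[-1]
```

The ratio of the returned values estimates the condition number; with m ≪ n the largest Ritz value typically converges quickly, while the smallest may require more iterations.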

Another common strategy is to perform a parameter continuation in β (see, e.g., [35, 38]). In [38] it has been suggested to inform the algorithm about the required regularity of a solution on the basis of a lower bound on the L²-distance between the reference and the deformed template image. The choice of such a bound, however, might not be intuitive for practitioners. Further, one is ultimately interested not only in a small residual but also in a bounded determinant of the deformation gradient. Therefore, we propose to inform the algorithm about regularity requirements in terms of a lower bound ε_F ∈ (0, 1) on $\det(F_{1}^{h})$ (i.e., a bound on the tolerable compression of a volume element). If the Stokes regularization scheme (γ = 1 in (1.2b) and 𝒮 is an H¹-seminorm) is considered, bounds on geometric constraints of the deformation of a volume element can be used instead. In particular, we use a lower bound ε_θ > 0 on the acute angle of a grid element. The upper bound on the obtuse angle is given by 2π − ε_θ. Note that it is actually necessary to monitor geometric properties to guarantee a local diffeomorphism; a lower bound on $\det(F_{1}^{h})$ is not sufficient.

Our algorithm proceeds as follows. In the first step, the registration problem is solved for a large value of β (β = 1 in our experiments) so that we underfit the data.⁷ Subsequently, β is reduced by one order of magnitude until we reach ε_F (or ε_θ). From there on, a binary search is performed. The algorithm is terminated if the change in β is below 5% of the value of β for which ε_F (or ε_θ) was breached. We add a lower bound of 1E–6 on β as well as a lower bound of 1E–2 on the relative change of the L²-distance to ensure that we do not perform unnecessary work. We never reached these bounds for the experiments conducted in this study.

Presmoothing

A numerical challenge in image computing is that images are functions of bounded variation, which makes an accurate computation of derivatives more involved. A common approach to ensure numerical stability and to avoid the Gibbs phenomenon is to reduce high-frequency information in the data. We use Gaussian smoothing, parametrized by a user-defined standard deviation σ > 0. We experimentally found σ = 2π/min(n_x) to be adequate for the problems at hand. However, we note that we increased σ by a factor of 2 for one set of experiments in section 5.3.2. We also note that we implemented a grid and scale continuation for the images, which avoids the problem of deciding on σ. We will investigate an automatic selection strategy for σ in a follow-up paper.
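On the periodic domain, Gaussian presmoothing can be carried out as a multiplication in Fourier space by exp(−σ²|k|²/2), the Fourier transform of a Gaussian with standard deviation σ. Whether the smoothing in the paper is implemented spectrally is an assumption of this sketch; the function name is hypothetical. The example uses σ = 2π/min(n_x), i.e., one grid cell.

```python
import numpy as np

def gaussian_smooth(m, sigma):
    """Convolve a periodic 2D image on (-pi, pi)^2 with a Gaussian of std sigma, spectrally."""
    n1, n2 = m.shape
    k1 = np.fft.fftfreq(n1, d=1.0 / n1)
    k2 = np.fft.fftfreq(n2, d=1.0 / n2)
    K1, K2 = np.meshgrid(k1, k2, indexing="ij")
    kernel_hat = np.exp(-0.5 * sigma**2 * (K1**2 + K2**2))  # Fourier transform of the Gaussian
    return np.real(np.fft.ifft2(np.fft.fft2(m) * kernel_hat))
```

Low frequencies are barely affected, while modes near the Nyquist frequency, which are the main source of Gibbs oscillations in the spectral derivatives, are damped most strongly.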

It is important to note that the sensitivity of second-order derivatives to noise in the data is problematic. Therefore, we refrain from applying the N-PCG method to nonsmooth images.

5. Numerical experiments

We report results only in two dimensions. We test the algorithm on real-world and synthetic registration problems (see section 5.1). The measures to analyze the registration results are summarized in section 5.2. We conduct a numerical study (see section 5.3), which includes an analysis of (i) the spectral properties of the Hessian (see section 5.3.1), (ii) grid convergence (see section 5.3.2), and (iii) the effects of varying the number of the unknowns in time (see section 5.3.3). We additionally report results for a fully automatic registration on high-resolution images based on the designed parameter continuation in β (see section 5.4).

5.1. Data

We consider synthetic and real-world registration problems.⁸ These are illustrated in Figure 1. All images have been normalized to an intensity range of [0, 1]. The synthetic problems are constructed by solving the forward problem to create an artificial template image m_T given some image m_R and some velocity field v^★ (sinusoidal images and UT images in Figure 1). Here, v^★ is chosen to live on the manifold of divergence-free velocity fields to provide a testing environment for the Stokes regularization scheme. Further, v^★ is constant in time by construction (i.e., n_c = 1). In particular, we have

Synthetic and real-world registration problems. All images have been normalized to an intensity range of [0, 1]. The registration problems are referred to as sinusoidal images (top row, left), UT images (top row, right), hand images (bottom row, left), and brain images (bottom row, right). Each row displays the reference image *m_R*, the template (deformable) image *m_T*, and a map of their pointwise difference (from left to right as identified by the inset in the images). We provide an illustration of the deformation pattern y (overlaid onto *m_T*) for the synthetic problems. This mapping is computed from v^★ in (5.1) via (D.1).

v^{i, ★} (x^{i}, t) = 0.5 sin (x^{i}) cos (x^{i}) \forall t \in [0, 1], i = 1, \dots, d .

(5.1)

5.2. Measures of performance

We report the number of the (outer) iterations and the number of the hyperbolic PDE solves to assess the computational work load. The latter is a good proxy for the wall clock time. It provides a transparent comparison between the designed first- and second-order methods, given that the number of the hyperbolic PDE solves varies between these methods: Two hyperbolic PDE solves are required during each iteration of the Picard method (we have to solve (4.1a) and (4.1b) to evaluate the reduced gradient in (4.1d); see also Algorithm 1). For the Newton–Krylov methods, we require an additional two hyperbolic PDE solves per inner iteration (we have to solve (4.3a) and (4.3b) to compute the Hessian matrix-vector product given in (4.3d); see also Algorithm 2). Each evaluation of 𝒥^h in (1.2) (i.e., each line search step) requires an additional hyperbolic PDE solve (we have to compute the deformed template image at t = 1 by solving (4.1a)). Note that the solution of the forward and adjoint problems is the key bottleneck of our algorithm. We will report the wall clock times in a follow-up paper, in which we extend the current framework to a three-dimensional implementation. This study focuses on algorithmic features.
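The work accounting described above can be summarized in a small counting function. This is a simplified model under stated assumptions: it charges two solves per outer iteration (state and adjoint for the gradient), two per inner PCG iteration (incremental state and adjoint), and one per objective evaluation in the line search, and it ignores any reuse of already computed states. The function name and arguments are hypothetical.

```python
def count_pde_solves(n_outer, inner_iters, line_search_evals):
    """Simplified n_PDE model.

    n_outer:           number of outer iterations
    inner_iters:       PCG iterations per outer iteration (all zero for Picard)
    line_search_evals: objective evaluations per outer iteration
    """
    return 2 * n_outer + 2 * sum(inner_iters) + sum(line_search_evals)
```

Under this model a Newton–Krylov step costs 2 + 2ι + (line search) solves, so a small number of outer iterations can still mean many PDE solves if the PCG method needs many inner iterations; this is why n_PDE, rather than k^★, is the fairer measure of work.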

We report the relative change of (i) the L²-distance, (ii) the objective functional 𝒥^h, and (iii) the ℓ^∞-norm of the (reduced) gradient g^h to assess the quality of the inversion. We additionally report values for the determinant of the deformation gradient to study local deformation properties. These measures are defined more explicitly in Table 2.

Table 2.

Overview of the quantitative measures that are used to assess the registration performance. We report the number of outer iterations (steps for updating the control variable v^h) and the number of hyperbolic PDE solves (i.e., how often we have to solve one of the hyperbolic PDEs (4.1a), (4.1b), (4.3a), and (4.3b) that appear in the optimality systems) to assess the work load. We report the relative change of the L²-distance, the objective, and the reduced gradient to assess the quality of the inversion. We report values for the determinant of the deformation gradient to assess the regularity of the computed deformation map. We report the relative power spectrum of the coefficients $v_{l}^{h}$ to assess which of the coefficients of the expansion in (4.10) are significant.

| Description | Definition |
|---|---|
| # of required outer iterations | $k^{★}$ |
| # of required hyperbolic PDE solves | $n_{PDE}$ |
| Relative change of L²-distance | ${‖ m_{1}^{h} - m_{R}^{h} ‖}_{2, rel} := {‖ m_{1}^{h} - m_{R}^{h} ‖}_{2}^{2} / {‖ m_{T}^{h} - m_{R}^{h} ‖}_{2}^{2}$ |
| Relative change of objective value | $δ J_{rel}^{h} := J^{h} (v_{k^{★}}^{h}) / J^{h} (v_{0}^{h})$ |
| Relative change of reduced gradient | ${‖ g^{h} ‖}_{\infty, rel} := {‖ g_{k^{★}}^{h} ‖}_{\infty} / {‖ g_{0}^{h} ‖}_{\infty}$ |
| Determinant of deformation gradient | $\det (F_{1}^{h})$ |
| Relative power spectrum of $v_{l}^{h}$ | ${‖ v_{l}^{h} ‖}_{2, rel} := {‖ v_{l}^{h} ‖}_{2} / {‖ \{v_{l'}^{h}\}_{l' = 1}^{n_{c}} ‖}_{2}$ |
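The relative measures in Table 2 are simple ratios of discrete norms. The sketch below assumes that all fields are stored as NumPy arrays on the same grid (so that grid-spacing factors cancel in each ratio), that the coefficient fields v_l^h are stacked along the first axis, and that the norm of the collection {v_{l′}^h} is the ℓ²-norm of the individual norms. The function names are hypothetical.

```python
import numpy as np

def relative_l2_distance(m1, m_R, m_T):
    """||m_1 - m_R||_2^2 / ||m_T - m_R||_2^2 (grid-spacing factors cancel in the ratio)."""
    return np.sum((m1 - m_R) ** 2) / np.sum((m_T - m_R) ** 2)

def relative_gradient(g_k, g_0):
    """||g_k||_inf / ||g_0||_inf."""
    return np.max(np.abs(g_k)) / np.max(np.abs(g_0))

def relative_power_spectrum(coeffs):
    """||v_l||_2 / ||{v_l'}||_2 for coefficient fields stacked as (n_c, d, *n_x)."""
    norms = np.sqrt(np.sum(coeffs.reshape(coeffs.shape[0], -1) ** 2, axis=1))
    return norms / np.linalg.norm(norms)
```

Coefficients with a small relative power contribute little to v, which indicates how many temporal basis functions are actually needed.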

We visually support this quantitative analysis on the basis of snapshots of the registration results. Information on the reconstruction accuracy can be obtained from pointwise maps of the residual difference between m_R and m₁. The deformation regularity and the mass conservation can be assessed via images of the pointwise determinant of the deformation gradient and/or of a deformed grid overlaid onto m₁. Details on how these are obtained and on how to interpret them can be found in Appendix D.

5.3. Numerical study

We study the spectral properties of the Hessian (see section 5.3.1), grid convergence (see section 5.3.2), as well as the influence of an increase in the number of the unknowns in time (see section 5.3.3).

5.3.1. Spectral analysis

Purpose

We study the ill-posedness and the ill-conditioning of the problem at hand. We report spectral properties of ℋ^h. We study the eigenvalues and the eigenvectors with respect to different choices for β. We study the differences between plain H²-regularization and the Stokes regularization scheme (H¹-regularization).

Setup

This study is based on the UT images (the true solution v^{h,★} is divergence free; see section 5.1 for more details on the construction of this synthetic registration problem; n_x = (64, 64)^⊤ and n_c = 1, so that n = 2 · 64 · 64 = 8192). The eigendecomposition V Λ V⁻¹, $V = (ν_{i})_{i = 1}^{n}$, ν_i ∈ Rⁿ, ‖ν_i‖₂ = 1, Λ = diag(Λ₁₁, …, Λ_nn), Λ_ii > 0, is computed at the true solution v^{h,★} to guarantee that ℋ^h ≻ 0. The spectrum is computed for three choices of β: the unregularized problem (β = 0), an empirically determined (moderate) value (β = 1E–3), and a very large value for which the regularization model dominates (β = 1E6).

Results

Figure 2 displays the trend of the absolute value of the eigenvalues Λ_ii, i = 1, …, n, sorted in descending order for β = 0 and in ascending order otherwise. Eigenvalues that drop below machine precision (i.e., 1E–16) are set to 1E–16 (only for visualization purposes). The extremal real parts and the largest absolute imaginary parts of the eigenvalues are summarized in Table 3. Figure 3 shows the spatial variation of the eigenvectors ν_i ∈ Rⁿ that correspond to the eigenvalues Λ_ii, i ∈ {1, 5, 20, 100, 1000}, in Figure 2 for different choices of β and different regularization schemes. We only display the first component $ν_{i}^{1}$ of the coefficient field $ν_{i} := (ν_{i}^{1}, ν_{i}^{2})$; the pattern for the second component is qualitatively similar.

Trend of the absolute value of the eigenvalues Λ_ii, i = 1, …, 8192, of the reduced Hessian ℋ^h for plain H²-regularization (γ = 0; left) and the Stokes regularization scheme (γ = 1; H¹-regularization; right) for β ∈ {0, 1E–3, 1E6} (as indicated in the legend of the plots). We report the trend of the entire set of 8192 eigenvalues. The test problem is the UT images (see section 5.1 for details on the construction of this synthetic registration problem; n_x = (64, 64)^⊤ and n_c = 1). The Hessian is computed at the true solution v^{h,★} to ensure that ℋ^h ≻ 0 (this statement is confirmed by the values reported in Table 3). The eigenvalues (absolute value) are sorted in descending order for the unregularized problem (i.e., for β = 0) and in ascending order otherwise (i.e., for β = 1E–3 and β = 1E6).

Table 3.

Extrema of the eigenvalues Λ_ii, i = 1, …, 8192, of the reduced Hessian reported in Figure 2. We report values for plain H²-regularization (γ = 0; top block) and the Stokes regularization scheme (γ = 1; H¹-regularization; bottom block). We refer to Figure 2 and the text for details on the experimental setup. We report the smallest and the largest real part as well as the largest absolute value of the imaginary part of the eigenvalues Λ_ii with respect to different choices of the regularization parameter β ∈ {0, 1E–3, 1E6}.

H²-regularization (γ = 0)

| β | min_i Re(Λ_ii) | max_i Re(Λ_ii) | max_i ∣Im(Λ_ii)∣ |
|---|---|---|---|
| 0 | −8.35E–15 | 3.72E1 | 3.26E–7 |
| 1E–3 | 2.59E–3 | 4.19E3 | |
| 1E6 | 1.30 | 4.19E12 | |

Stokes regularization (γ = 1)

| β | min_i Re(Λ_ii) | max_i Re(Λ_ii) | max_i ∣Im(Λ_ii)∣ |
|---|---|---|---|
| 0 | −3.74E–13 | 2.48E1 | 5.92E–6 |
| 1E–3 | 1.00E–3 | 2.56E1 | 2.59E–6 |
| 1E6 | 1.29 | 2.05E9 | 1.43E–7 |

Eigenvector plots of the reduced Hessian ℋ^h ∈ R^{n×n}, n = 8192, for β ∈ {0, 1E–3, 1E6} for plain H²-regularization (γ = 0; top) and for the Stokes regularization scheme (γ = 1; H¹-regularization; bottom). The results correspond to the eigenvalue plots reported in Figure 2. We refer to Figure 2 and the text for details on the experimental setup. Each plot shows the spatial variation of the portion of an eigenvector ν_i ∈ Rⁿ associated with the first component of the coefficient field $ν_{l}^{h}$, l = n_c, n_c = 1. The individual plots correspond to the eigenvalues Λ_ii > 0, i = 1, 5, 20, 100, 1000, in Figure 2. The range of the values for $ν_{i}^{1}$ is provided below each plot.

Observations

The most important observations are that (i) the regularization schemes display a very similar behavior (as judged by the clustering of the eigenvalues as well as the spatial variation of the eigenvectors for β = 1E6), (ii) for both regularization schemes, the smoothness of the eigenvectors decreases with a decreasing regularization parameter β and with increasing eigenvalues (for β = 1E–3 and β = 1E6), and (iii) it is less clear how to identify the smooth eigenvectors within the eigenspace of the Stokes regularization scheme.

The eigenvalues Λ_ii, i = 1, …, 8192, drop rapidly for the unregularized problem, almost reaching machine precision for i ≈ 4000 (see Figure 2). This demonstrates the ill-conditioning and ill-posedness of the problem. For the regularized problem, the eigenvalues are bounded away from zero, and increasing β shifts the trend of |Λ_ii| to larger values. The values in Table 3 confirm that ℋ^h ≻ 0 (up to almost machine precision) at the true solution v^{h,★}.

Turning to the eigenvector plots, we can see that, for the unregularized problem, the first eigenvector displays a delta-peak-like structure for both regularization schemes, since there is no local coupling of the spatial information. For the regularized problem, we observe a smooth spatial variation of the eigenvectors associated with large eigenvalues for both regularization schemes. The first eigenvector is almost constant for β = 1E6 (bottom row of each block in Figure 3). The pattern for β = 1E6 is analogous for both schemes, which indicates similarities in their behavior. As the index i increases, the eigenvectors become more oscillatory. We can observe strong differences between the two schemes for a moderate regularization (β = 1E–3; middle row of each block in Figure 3). Also, the Stokes regularization scheme displays a more complex structure for small eigenvalues (i.e., it is difficult to identify where the smoothest eigenvectors are located within the eigenspace).

Conclusion

We conclude that the Hessian operator is singular if we do not include a smoothness regularization model for the control variable. For practical values of the regularization parameter, the Hessian behaves as a compact operator; larger eigenvalues are associated with smooth eigenvectors. It is well known that designing a preconditioner for such operators is challenging.

5.3.2. Convergence study

We study the grid convergence of the considered iterative optimization methods on the basis of synthetic registration problems. We use a fixed setting to prevent bias originating from adaptive changes during the computations; that is, the results are computed on a single resolution level, and no grid, scale, or parameter continuation is applied. The number of time points is fixed to n_t = 4 max(n_x) for all experiments. Further, we use empirically determined values for the regularization parameter β, namely, β ∈ {1E–2, 1E–3}. Since we are interested in the convergence properties of our method, we use the relative change of the ℓ^∞-norm of the reduced gradient g^h as a stopping criterion. This yields a fair comparison between the different optimization methods, as a reduction in the norm of g^h directly reflects how well an optimization problem is solved (i.e., we exploit that g^h = 0 is a necessary condition for a minimizer). We terminate once the ℓ^∞-norm of g^h has been reduced by at least three orders of magnitude. However, since the Picard method tends to converge slowly for tight tolerances on the gradient, we also stop if we detect stagnation in the objective. In particular, we terminate the optimization if the change in the objective is equal to or below 1E–6 for ten consecutive iterations. We solve for a stationary velocity field (i.e., n_c = 1).
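The stopping rule of the convergence study combines a relative gradient reduction with a stagnation test on the objective. A minimal sketch, assuming the history of ‖g^h‖_∞ and of 𝒥^h is kept in lists and that "change in the objective" means the absolute difference between successive values; the function name and defaults are illustrative.

```python
def converged(g_norms, J_history, grad_reduction=1e3, stall_tol=1e-6, stall_window=10):
    """Stop if ||g_k||_inf <= ||g_0||_inf / 1e3, or if the objective stagnates,
    i.e., |J_{j-1} - J_j| <= 1e-6 for the last ten consecutive iterations."""
    if g_norms[-1] <= g_norms[0] / grad_reduction:
        return True
    if len(J_history) > stall_window:
        recent = J_history[-(stall_window + 1):]
        changes = [abs(a - b) for a, b in zip(recent[:-1], recent[1:])]
        return all(c <= stall_tol for c in changes)
    return False
```

The stagnation branch is mainly relevant for the Picard method, which may otherwise take a very large number of iterations to reach the gradient reduction.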

C^∞ registration problem

Purpose

We study the numerical behavior for smooth registration problems. We report results for grid convergence and deformation regularity. We compare the Picard, GN-PCG, and N-PCG methods.

Setup

This experiment is based on the sinusoidal images (see section 5.1 for more details on the construction of this synthetic registration problem). Therefore, m_T, m_R ∈ C^∞(Ω), and v^★ ∈ L²([0, 1]; C^∞(Ω)^d), so that the excellent convergence properties of Fourier spectral methods are expected to pay off. Since the data are smooth, applying the N-PCG method is also unproblematic. We report results for different grid sizes $n_{x} = (n_{x}^{1}, n_{x}^{2})^{⊤}$, $n_{x}^{i} \in \{64, 128, 256\}$, i = 1, 2, with n_t = 4 max(n_x). No presmoothing is applied. We use an experimentally determined value of β = 1E–3 for all experiments. The remaining parameters are chosen as stated in the introduction to this section and in section 4.3.5.

Results

The grid convergence results are summarized in Table 4. Values derived from the deformation gradient $F_{1}^{h}$ are reported in Table 5. Exemplary results for the plain H²-regularization (γ = 0) and the Stokes regularization scheme (γ = 1; H¹-regularization) are displayed in Figure 4. The definitions of the quantitative measures reported in Table 4 and Table 5 can be found in Table 2.

Table 4.

Quantitative analysis of the convergence of the Picard, N-PCG, and GN-PCG methods. The test problem is the sinusoidal images (see section 5.1 for details on the construction of this synthetic registration problem). We compare convergence results for plain H²-regularization (γ = 0; top block) and the Stokes regularization scheme (γ = 1; H¹-regularization; bottom block). We report results for different grid sizes $n_{x} = (n_{x}^{1}, n_{x}^{2})^{⊤}$, $n_{x}^{i} \in \{64, 128, 256\}$, i = 1, 2. We invert for a stationary velocity field (i.e., n_c = 1). We terminate the optimization if the relative change of the ℓ^∞-norm of the reduced gradient g^h is at least three orders of magnitude or if the change in 𝒥^h between 10 successive iterations is below or equal to 1E–6 (i.e., the algorithm stagnates). The regularization parameter is empirically set to β = 1E–3. The number of the (outer) iterations (k^★), the number of the hyperbolic PDE solves (n_PDE), the relative change of (i) the L²-distance (${‖ m_{R}^{h} - m_{1}^{h} ‖}_{2, rel}$), (ii) the objective ($δ J_{rel}^{h}$), and (iii) the (reduced) gradient (${‖ g^{h} ‖}_{\infty, rel}$), as well as the average number of the required line search steps ᾱ are reported. Note that we introduced a memory for the step size into the Picard method (preconditioned gradient descent) to stabilize the optimization (see section 4.3.5 and the description of the results). The definitions for the reported measures are summarized in Table 2.

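| Method | n_x^i | k^★ | n_PDE | ‖m_R^h − m_1^h‖_{2,rel} | δJ^h_rel | ‖g^h‖_{∞,rel} | ᾱ |
|---|---|---|---|---|---|---|---|
| H²-regularization | | | | | | | |
| Picard | 64 | | 420 | 4.78E–3 | 6.05E–2 | 7.86E–3 | 1.72 |
| Picard | 128 | | 414 | 4.69E–3 | 6.04E–2 | 6.48E–3 | 1.82 |
| Picard | 256 | | 414 | 4.68E–3 | 6.03E–2 | 7.42E–3 | 1.75 |
| GN-PCG | 64 | | | 4.63E–3 | 6.05E–2 | 5.57E–5 | 1.00 |
| GN-PCG | 128 | | | 4.59E–3 | 6.04E–2 | 5.60E–4 | 1.00 |
| GN-PCG | 256 | | | 4.58E–3 | 6.03E–2 | 5.10E–4 | 1.00 |
| N-PCG | 64 | | | 4.63E–3 | 6.05E–2 | 1.94E–4 | 1.00 |
| N-PCG | 128 | | | 4.57E–3 | 6.04E–2 | 2.83E–4 | 1.00 |
| N-PCG | 256 | | | 4.57E–3 | 6.03E–2 | 2.86E–4 | 1.00 |
| Stokes regularization | | | | | | | |
| Picard | 64 | | 269 | 6.61E–4 | 1.56E–2 | 2.55E–3 | 1.68 |
| Picard | 128 | | 250 | 5.79E–4 | 1.55E–2 | 3.55E–3 | 1.65 |
| Picard | 256 | | 233 | 5.67E–4 | 1.55E–2 | 2.31E–3 | 1.71 |
| GN-PCG | 64 | | | 5.86E–4 | 1.56E–2 | 7.30E–5 | 1.00 |
| GN-PCG | 128 | | | 4.87E–4 | 1.55E–2 | 8.09E–5 | 1.00 |
| GN-PCG | 256 | | | 4.86E–4 | 1.55E–2 | 8.33E–5 | 1.00 |
| N-PCG | 64 | | | 5.86E–4 | 1.56E–2 | 2.60E–4 | 1.00 |
| N-PCG | 128 | | | 4.87E–4 | 1.55E–2 | 2.69E–4 | 1.00 |
| N-PCG | 256 | | | 4.87E–4 | 1.55E–2 | 2.51E–4 | 1.00 |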

Table 5.

Obtained values for $\det(F_{1}^{h})$ of exemplary registration results of the convergence study reported in Table 4 with respect to different iterative optimization methods (Picard, N-PCG, and GN-PCG). We refer to Table 4 and the text for details on the experimental setup. We report results for plain H²-regularization (γ = 0; top block) and for the Stokes regularization scheme (γ = 1; H¹-regularization; bottom block) for a grid size of n_x = (256, 256)^⊤.

| Method | min(det(F_1^h)) | max(det(F_1^h)) | mean(det(F_1^h)) | std(det(F_1^h)) |
|---|---|---|---|---|
| H²-regularization | | | | |
| Picard | 8.81E–1 | 1.19 | 1.00 | 7.58E–2 |
| GN-PCG | 8.83E–1 | 1.19 | 1.00 | 7.56E–2 |
| N-PCG | 8.83E–1 | 1.19 | 1.00 | 7.54E–2 |
| Stokes regularization | | | | |
| Picard | 1.00 | 1.00 | 1.00 | 4.55E–12 |
| GN-PCG | 1.00 | 1.00 | 1.00 | 4.52E–12 |
| N-PCG | 1.00 | 1.00 | 1.00 | 4.52E–12 |

Figure 4.

Qualitative comparison of exemplary registration results of the convergence study reported in Table 4. In particular, we display the results for the N-PCG method for a grid size of n_x = (256, 256)^⊤. We refer to Table 4 and the text for details on the experimental setup. We report results for plain H²-regularization (γ = 0; images to the left) and the Stokes regularization scheme (γ = 1; H¹-regularization; images to the right). We display the deformed template image m₁, a pointwise map of the residual differences between m_R and m₁ (which appears completely white, as the residual differences are extremely small), as well as a pointwise map of the determinant of the deformation gradient det(F₁) (from left to right as identified by the inset in the images). The values for det(F₁) are reported in Table 5. Information on how to interpret these images can be found in Appendix D.

Observations

The most important observations are that (i) there are significant differences in computational work between the Picard and Newton–Krylov methods with the latter being much more efficient, (ii) the differences between the Newton–Krylov methods are insignificant, (iii) the rate of convergence is independent of the grid resolution, and (iv) the numerical accuracy is almost at the order of machine precision.

The registered images are in excellent agreement, both quantitatively (see Table 4) and qualitatively (see Figure 4). For the considered tolerance (reduction of the ℓ^∞-norm of the reduced gradient by three orders of magnitude), we can reduce the L²-distance by three (compressible deformation) to four (incompressible deformation) orders of magnitude (see Table 4). The search direction of the Newton–Krylov methods is well scaled; no additional line search steps are necessary. The Picard iteration requires 1.57 to 1.71 line search steps on average. Note that we prescale the search direction of the Picard method by an additional parameter α̃_k, which is estimated during the computation (see section 4.3.5 for details); otherwise, the Picard method would require seven to eight line search steps on average. The Picard method stagnated during the computations, which is why the gradient was not reduced by three orders of magnitude for this method. It is, however, in general possible to reduce the gradient accordingly. We report results for the Picard method only until stagnation, since the number of iterations would increase significantly without any real progress.
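The termination rule used in this convergence study (reduce the ℓ^∞-norm of the reduced gradient by three orders of magnitude, or stop once 𝒥^h changes by at most 1E–6 over 10 consecutive iterations) can be sketched as follows. This is a minimal illustration, assuming the histories of ‖g^h‖_∞ and 𝒥^h are available as plain lists and that "change" means the absolute difference between successive objective values; the function name `should_stop` is hypothetical and not taken from the authors' code.

```python
def should_stop(grad_norms, objectives, grad_reduction=1e-3, stag_tol=1e-6, window=10):
    """Hypothetical termination test for an outer optimization loop.

    grad_norms: history of ||g^h||_inf, grad_norms[0] being the initial value.
    objectives: history of J^h, one entry per (outer) iteration.
    Returns (stop, reason).
    """
    # Relative reduction of the reduced gradient by at least three orders of magnitude.
    if grad_norms[-1] <= grad_reduction * grad_norms[0]:
        return True, "gradient reduced"
    # Stagnation: each of the last `window` changes in J^h is at most `stag_tol`.
    if len(objectives) > window:
        tail = objectives[-(window + 1):]
        if all(abs(b - a) <= stag_tol for a, b in zip(tail, tail[1:])):
            return True, "stagnation"
    return False, ""


# Usage: a stagnating first-order method triggers the second criterion.
print(should_stop([1.0] * 12, [2.0] + [1.0] * 11))
```

Checking stagnation over a window rather than a single step avoids stopping on one accidental small change in the objective.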

The Newton–Krylov methods display quick convergence. Only five outer iterations are necessary to reduce the gradient by more than four orders of magnitude. The results demonstrate a significant difference in the computational work between first- and second-order methods for the considered tolerance.

The reconstruction quality improves by approximately one order of magnitude, as judged by the relative change in the L²-distance, when switching from plain H²-regularization (γ = 0) to the Stokes regularization scheme (γ = 1; H¹-regularization). This is expected for two reasons. First, the synthetic problem has been created under the assumption of mass conservation (i.e., ∇ · v^★ = 0). Second, for the same value of β, we expect the H¹-regularization model to have a smaller influence on the solution.

From a theoretical point of view, we expect N-PCG to outperform GN-PCG (quadratic versus superlinear convergence). The reported results, however, demonstrate an almost identical performance. This is because we can drive the residual almost to zero, so that fast local convergence is recovered for the GN-PCG method (see section 4.2.2 for details).
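The small-residual argument can be made explicit for a generic finite-dimensional least-squares analogue of the objective. The sketch below assumes a discretized residual $r(v) = m_1(v) - m_R$ with Jacobian $J_r$ and a symmetric regularization matrix $A$; it is not the reduced-space Hessian of the PDE-constrained problem derived in section 4, but it shows the same mechanism: the term that Gauss–Newton drops is weighted by the residual and therefore vanishes as the residual goes to zero.

```latex
\begin{aligned}
\mathcal{J}(v) &= \tfrac{1}{2}\,\|r(v)\|_2^2 + \tfrac{\beta}{2}\, v^\top A v, \\
\nabla^2 \mathcal{J}(v) &= \underbrace{J_r(v)^\top J_r(v) + \beta A}_{=:\,H_{\mathrm{GN}}(v)}
  + \sum_{i} r_i(v)\, \nabla^2 r_i(v), \\
\big\|\nabla^2 \mathcal{J}(v) - H_{\mathrm{GN}}(v)\big\|
  &\le \sum_i |r_i(v)|\,\|\nabla^2 r_i(v)\|
  \le \|r(v)\|_2 \Big(\sum_i \|\nabla^2 r_i(v)\|^2\Big)^{1/2}
  \longrightarrow 0 \quad \text{as } \|r(v)\|_2 \to 0.
\end{aligned}
```

The last inequality is the Cauchy–Schwarz inequality; near a minimizer with (almost) zero residual, the Gauss–Newton matrix is therefore (almost) the full Hessian.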

The Picard method converges faster for the Stokes regularization scheme. However, the differences between the Picard and the Newton–Krylov methods remain significant, with an approximately four-fold difference in n_PDE. For the Newton–Krylov methods, we observe overall a slight increase in the number of inner iterations when switching from plain H²-regularization to the Stokes regularization scheme. These differences are attributable to a varying relative change in the reduced gradient.⁹

The results reported in Table 5 demonstrate an excellent numerical accuracy for the mass conservation for all numerical schemes. The error in the determinant of the deformation gradient is 𝒪(1E–12), i.e., we achieve an accuracy that is almost at the order of machine precision for a grid resolution of n_x = (256, 256)^⊤ and n_t = 1024.
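To illustrate how det(F₁) can be evaluated with spectral accuracy, the following sketch differentiates a periodic displacement u (with y(x) = x + u(x) and F = I + ∇u) in Fourier space on a 2π-periodic grid. It is not the authors' solver: in the paper, F₁ results from the transport of the deformation gradient along v, whereas here a closed-form, exactly area-preserving map (a composition of two shears, for which det(F) = 1 identically) stands in for the incompressible case, and a single sine displacement stands in for a compressible one. The helper names are hypothetical.

```python
import numpy as np

def spectral_partials(f, length=2 * np.pi):
    """Return (df/dx1, df/dx2) of a periodic 2D field via FFT (axis 0 is x1, axis 1 is x2)."""
    n1, n2 = f.shape
    k1 = 2 * np.pi * np.fft.fftfreq(n1, d=length / n1)  # angular wave numbers
    k2 = 2 * np.pi * np.fft.fftfreq(n2, d=length / n2)
    f_hat = np.fft.fft2(f)
    d1 = np.real(np.fft.ifft2(1j * k1[:, None] * f_hat))
    d2 = np.real(np.fft.ifft2(1j * k2[None, :] * f_hat))
    return d1, d2

def det_deformation_gradient(u1, u2):
    """det(F) for y(x) = x + u(x), i.e., F = I + grad(u), in two dimensions."""
    u1_x1, u1_x2 = spectral_partials(u1)
    u2_x1, u2_x2 = spectral_partials(u2)
    return (1.0 + u1_x1) * (1.0 + u2_x2) - u1_x2 * u2_x1

n = 128
x = 2 * np.pi * np.arange(n) / n
X1, X2 = np.meshgrid(x, x, indexing="ij")

# Area-preserving map: y1 = x1 + a sin(x2), y2 = x2 + b sin(y1), hence det(F) = 1 exactly.
a, b = 0.3, 0.3
det_incompressible = det_deformation_gradient(a * np.sin(X2), b * np.sin(X1 + a * np.sin(X2)))

# Compressible map: y1 = x1 + 0.1 sin(x1), hence det(F) = 1 + 0.1 cos(x1).
det_compressible = det_deformation_gradient(0.1 * np.sin(X1), np.zeros_like(X1))

for name, d in [("incompressible", det_incompressible), ("compressible", det_compressible)]:
    print(name, d.min(), d.max(), d.mean(), d.std())
```

For smooth periodic maps, the deviation of det(F) from one in the incompressible case is at round-off level, which mirrors the 𝒪(1E–12) standard deviations in Table 5.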

Conclusion

We conclude that the Newton–Krylov methods can be used interchangeably. Since N-PCG is more sensitive to noise and discontinuities in the data, we exclusively consider GN-PCG for the remainder of the experiments. Also, if an inversion with high accuracy is required, the Newton–Krylov methods clearly outperform the Picard method (i.e., preconditioned gradient descent).

Images with sharp features

Purpose

We study the grid convergence and deformation regularity for images with sharp features. We compare the Picard and the GN-PCG methods.

Setup

We consider the UT images (see section 5.1 for details on the construction of this synthetic registration problem). We report results for experimentally determined values of β ∈ {1E–2, 1E–3} and different grid resolution levels $n_{x} = (n_{x}^{1}, n_{x}^{2})^{⊤}$, $n_{x}^{i} \in \{64, 128, 256\}$, i = 1, 2, with n_t = 4 max(n_x). The remainder of the parameters are chosen as stated in the introduction of this section and in section 4.3.5. We consider both plain H²-regularization (γ = 0) and the Stokes regularization scheme (γ = 1; H¹-regularization).

For images of size n_x = (256, 256)^⊤ and the Stokes regularization scheme, we observed difficulties in the inversion due to a strong forcing: the sharp features pushed the solver at an early stage toward a solution that was far away from the final minimizer. (Only the number of outer and inner iterations increased; the algorithm still converged to the same solution.) As a remedy, we increased the smoothing by a factor of two. This is not an issue for the practical application of our algorithm, as our framework features a scale continuation as well as a continuation in the regularization parameter; therefore, the user does not have to decide on σ or on β. In addition, we are currently investigating adaptive approaches that automatically detect insufficient smoothness during the course of the optimization to prevent a deterioration of the convergence behavior.
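On a periodic grid, presmoothing of the kind mentioned above can be realized spectrally by multiplying with the Fourier symbol of a Gaussian. The sketch below assumes that σ is the standard deviation measured in the units of a 2π-periodic domain (the paper's own convention for σ, for instance in grid cells, may differ); doubling σ damps the mode with wave number k by the additional factor exp(−3σ²|k|²/2). The helper name `gaussian_smooth` and the test image are hypothetical.

```python
import numpy as np

def gaussian_smooth(img, sigma, length=2 * np.pi):
    """Periodic Gaussian smoothing with standard deviation `sigma` (hypothetical helper)."""
    n1, n2 = img.shape
    k1 = 2 * np.pi * np.fft.fftfreq(n1, d=length / n1)
    k2 = 2 * np.pi * np.fft.fftfreq(n2, d=length / n2)
    # Fourier transform of a unit-mass Gaussian: exp(-sigma^2 |k|^2 / 2).
    symbol = np.exp(-0.5 * sigma**2 * (k1[:, None] ** 2 + k2[None, :] ** 2))
    return np.real(np.fft.ifft2(symbol * np.fft.fft2(img)))

n = 256
x = 2 * np.pi * np.arange(n) / n
X1, X2 = np.meshgrid(x, x, indexing="ij")
img = ((X1 > 2.0) & (X1 < 4.0) & (X2 > 2.0) & (X2 < 4.0)).astype(float)  # sharp edges
sigma = 2 * np.pi / n                               # one grid cell (illustrative choice)
smoothed = gaussian_smooth(img, sigma)
smoothed_more = gaussian_smooth(img, 2 * sigma)     # smoothing increased by a factor of two
```

Because the zero-frequency symbol equals one, the smoothing preserves the mean intensity of the image.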


Results

Table 6 summarizes the results of the convergence study. We illustrate intermediate results with respect to the first 13 (outer) iterations k in Figure 5 (plain H²-regularization; γ = 0). We report the trend of the individual building blocks of 𝒥^h (contribution of the L²-distance and the regularization model 𝒮^h) in Figure 6. We report measures of deformation regularity in Table 7.

Table 6.

Quantitative analysis of the convergence for the Picard and the GN-PCG method. The test problem is the UT images (see section 5.1 for more details on the construction of this synthetic registration problem). We compare convergence results for plain H²-regularization (γ = 0; top block) and the Stokes regularization scheme (γ = 1; bottom block; H¹-regularization) for empirically chosen regularization parameters β ∈ {1E–2, 1E–3}. We report results for different grid sizes $n_{x} = (n_{x}^{1}, n_{x}^{2})^{⊤}$, $n_{x}^{i} \in \{64, 128, 256\}$, i = 1, 2, with n_t = 4 max(n_x). We invert for a stationary velocity field (i.e., n_c = 1). We terminate the optimization if the relative change of the ℓ^∞-norm of the reduced gradient g^h is larger than or equal to three orders of magnitude or if the change in 𝒥^h between 10 successive iterations is below or equal to 1E–6 (i.e., the algorithm stagnates). We report the number of (outer) iterations (k^★), the number of hyperbolic PDE solves (n_PDE), the relative change of (i) the L²-distance ($\|m_{R}^{h} - m_{1}^{h}\|_{2,\mathrm{rel}}$), (ii) the objective ($\delta \mathcal{J}_{\mathrm{rel}}^{h}$), and (iii) the (reduced) gradient ($\|g^{h}\|_{\infty,\mathrm{rel}}$), as well as the average number of line search steps ᾱ. Note that we introduced a memory for the step size into the Picard method to stabilize the optimization (see section 4.3.5 and the description of the results). The definitions for the reported measures can be found in Table 2. This study directly relates to the results for the smooth registration problem (see section 5.3.2, in particular Table 4).

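| Method | n_x^i | k^★ | n_PDE | ‖m_R^h − m_1^h‖_{2,rel} | δJ^h_rel | ‖g^h‖_{∞,rel} | ᾱ |
|---|---|---|---|---|---|---|---|
| H²-regularization, β = 1E–2 | | | | | | | |
| Picard | 64 | 130 | 752 | 1.99E–2 | 1.18E–1 | 2.06E–2 | 1.68 |
| Picard | 128 | 231 | 1589 | 8.29E–3 | 8.47E–2 | 4.03E–2 | 1.72 |
| Picard | 256 | 388 | 3022 | 5.05E–3 | 7.92E–2 | 9.86E–2 | 1.68 |
| GN-PCG | 64 | | 282 | 1.94E–2 | 1.16E–1 | 3.30E–4 | 1.00 |
| GN-PCG | 128 | | 450 | 8.09E–3 | 8.37E–2 | 6.07E–4 | 1.00 |
| GN-PCG | 256 | | 789 | 4.87E–3 | 7.85E–2 | 7.28E–4 | 1.00 |
| H²-regularization, β = 1E–3 | | | | | | | |
| Picard | 64 | 339 | 4410 | 1.84E–3 | 1.54E–2 | 2.46E–2 | 1.56 |
| Picard | 128 | 466 | 9671 | 7.19E–4 | 9.84E–3 | 5.98E–2 | 1.64 |
| Picard | 256 | 632 | 8690 | 5.35E–4 | 8.81E–3 | 1.08E–1 | 1.64 |
| GN-PCG | 64 | | 670 | 1.44E–3 | 1.50E–2 | 2.59E–4 | 1.00 |
| GN-PCG | 128 | | 929 | 4.47E–4 | 9.60E–3 | 4.76E–4 | 1.00 |
| GN-PCG | 256 | | 1744 | 2.45E–4 | 8.55E–3 | 4.04E–4 | 1.00 |
| Stokes regularization, β = 1E–2 | | | | | | | |
| Picard | 64 | | 448 | 2.73E–3 | 3.62E–2 | 7.54E–3 | 1.64 |
| Picard | 128 | 136 | 958 | 8.92E–4 | 2.38E–2 | 1.63E–2 | 1.72 |
| Picard | 256 | 143 | 1004 | 1.09E–3 | 2.53E–2 | 1.75E–2 | 1.68 |
| GN-PCG | 64 | | 313 | 2.72E–3 | 3.59E–2 | 4.70E–4 | 1.00 |
| GN-PCG | 128 | | 437 | 8.88E–4 | 2.37E–2 | 5.59E–4 | 1.00 |
| GN-PCG | 256 | | 514 | 1.09E–3 | 2.52E–2 | 5.49E–4 | 1.00 |
| Stokes regularization, β = 1E–3 | | | | | | | |
| Picard | 64 | 216 | 2823 | 9.99E–4 | 4.69E–3 | 1.34E–2 | 1.56 |
| Picard | 128 | 179 | 5465 | 3.57E–4 | 2.77E–3 | 1.79E–2 | 1.69 |
| Picard | 256 | 175 | 5569 | 5.27E–4 | 3.07E–3 | 2.22E–2 | 1.68 |
| GN-PCG | 64 | | 1162 | 7.57E–4 | 4.55E–3 | 8.23E–4 | 1.00 |
| GN-PCG | 128 | | 1069 | 2.15E–4 | 2.65E–3 | 5.67E–4 | 1.00 |
| GN-PCG | 256 | | 769 | 3.43E–4 | 2.92E–3 | 6.44E–4 | 1.00 |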

Figure 5.

Illustration of the course of the optimization for the Picard (top block) and the GN-PCG (bottom block) methods with respect to the (outer) iteration index k for exemplary results of the convergence study reported in Table 6. We refer to Table 6 and the text for details on the experimental setup. We report results for plain H²-regularization (γ = 0), with an empirically chosen regularization parameter of β = 1E–3 for images of grid size n_x = (256, 256)^⊤. We report results until convergence of the GN-PCG method (k^★ = 13). We display the deformed template m₁ (top row) and a map of the pointwise difference between m_R and m₁ (bottom row) for both iterative optimization methods (as identified by the inset on the right of the images). Information on how to interpret these images can be found in Appendix D.

Figure 6.

Trend of the objective 𝒥^h, the L²-distance, and the regularization model 𝒮^h (logarithmic scale) for the Picard and the GN-PCG method with respect to the (outer) iteration index k for exemplary results of the convergence study reported in Table 6. We refer to Table 6 and the text for details on the experimental setup. The trend of the functionals is plotted for different (empirically determined) choices of β (left column: β = 1E–2; right column: β = 1E–3) and a grid size of n_x = (256, 256)^⊤. We report results for plain H²-regularization (γ = 0; top row) and the Stokes regularization scheme (γ = 1; H¹-regularization; bottom row).

Table 7.

Values for the determinant of the deformation gradient $det (F_{1}^{h})$ for exemplary results of the convergence study reported in Table 6. We report results for the Picard and the GN-PCG method. We refer to Table 6 and the text for details on the experimental setup. We report results for the Stokes regularization scheme (γ = 1; H¹-regularization). The regularization parameter is set to β = 1E–3. The grid size is n_x = (256, 256)^⊤. These results directly relate to those reported for the smooth registration problem (see section 5.3.2, in particular Table 5).

| Method | min(det(F_1^h)) | max(det(F_1^h)) | mean(det(F_1^h)) | std(det(F_1^h)) |
|---|---|---|---|---|
| Picard | 10.00E–1 | 1.00 | 1.00 | 2.03E–5 |
| GN-PCG | 10.00E–1 | 1.00 | 1.00 | 2.29E–5 |

Observations

The most important observations are that (i) the GN-PCG method converges more quickly than the Picard method, (ii) we cannot achieve the same inversion accuracy with the Picard method as with the GN-PCG method, and (iii) the number of (inner and outer) iterations increases and is no longer independent of the resolution level.

The rate of convergence decreases compared to the results reported in the former section (see Table 4). Overall, we require more outer and inner iterations to solve the registration problem.

The residual differences between m_R and m₁ clearly depend on the choice of β (see Table 6 and Figure 6). We achieve a similar reduction in the L²-distance for both the Picard and the GN-PCG methods (two to four orders of magnitude). Compared to the results reported in section 5.3.2, switching from plain H²-regularization to the Stokes regularization scheme has a less pronounced effect on the residual differences.

We cannot guarantee that the gradient can be reduced by three orders of magnitude with the Picard method. Even if we drop the termination condition for stagnation (i.e., the change in 𝒥^h is below or equal to 1E–6 for 10 consecutive iterations), the gradient cannot be reduced by three orders of magnitude for some of the experiments, because the changes in the objective reach the limit of our numerical accuracy (which causes the line search to fail). We do not observe this issue for the GN-PCG method. Further, there are significant differences in terms of computational work: if we do not account for the stagnation of the Picard method, the number of hyperbolic PDE solves is well above 𝒪(1E4). In a practical application, we would clearly terminate the Picard method at an earlier stage, as it no longer makes significant progress; in this part of the study, however, we are interested in the convergence properties. This experiment demonstrates that a high inversion accuracy (i.e., a significant reduction in the gradient) cannot be guaranteed with first-order methods. Note that we have stabilized the Picard method by introducing an additional scaling parameter for the search direction that prevents additional line search steps (see section 4.3.5). Without this scaling, we observe seven to nine line search steps on average for the considered problem (results not included in this study), and the optimization fails at an early stage. The search direction obtained via the GN-PCG method is well scaled; no additional line search steps are necessary.
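The effect of a step-size memory on the number of line search steps can be reproduced on a toy problem. The sketch below uses an assumed scheme, not the estimate of α̃_k described in section 4.3.5: steepest descent with an Armijo backtracking line search, where the first trial step is either fixed to one or taken as twice the previously accepted step (capped at one). The function names and the ill-scaled quadratic test problem are hypothetical.

```python
import numpy as np

def armijo_backtracking(f, x, fx, g, d, alpha0=1.0, c=1e-4, shrink=0.5, max_steps=40):
    """Backtrack from alpha0 until the Armijo condition holds; returns (alpha, n_trials)."""
    slope = float(np.dot(g, d))  # negative for a descent direction
    alpha = alpha0
    for n_trials in range(1, max_steps + 1):
        if f(x + alpha * d) <= fx + c * alpha * slope:
            return alpha, n_trials
        alpha *= shrink
    raise RuntimeError("line search failed")

def gradient_descent(f, grad, x0, n_iter=50, memory=True):
    """Steepest descent; with memory=True the first trial step reuses the last accepted step."""
    x = np.asarray(x0, dtype=float)
    alpha_prev, trials = 1.0, []
    for _ in range(n_iter):
        g = grad(x)
        if np.linalg.norm(g) == 0.0:
            break
        alpha0 = min(1.0, 2.0 * alpha_prev) if memory else 1.0
        alpha, n_trials = armijo_backtracking(f, x, f(x), g, -g, alpha0=alpha0)
        x, alpha_prev = x - alpha * g, alpha
        trials.append(n_trials)
    return x, trials

D = np.array([1.0, 50.0])  # ill-scaled quadratic f(x) = 0.5 * sum(D * x**2)

def f(x):
    return 0.5 * float(np.sum(D * x**2))

def grad(x):
    return D * x

x0 = np.array([1.0, 1.0])
for memory in (False, True):
    _, trials = gradient_descent(f, grad, x0, memory=memory)
    print(f"memory={memory}: average trial steps {np.mean(trials):.2f}")
```

With a fixed initial step of one, every iteration has to backtrack past the stiff direction, whereas reusing the last accepted step typically needs only one or two trials.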

The trends of 𝒥^h, the L²-distance, and 𝒮^h in Figure 6 confirm these observations. The plots illustrate that the Picard and the GN-PCG methods perform very similarly during the first few outer iterations. After about four outer iterations, however, the differences between the methods manifest, in particular with respect to the reduction of the L²-distance. This is consistent with standard numerical optimization theory on the convergence properties of the Picard and the inexact Newton–Krylov methods.

Focusing on the GN-PCG method we can observe that the number of the outer iterations is almost constant across different grid sizes. However, the effectiveness of the spectral preconditioner decreases with an increasing grid size as well as with a reduction of the regularization parameter (as judged by an increase in the number of the inner iterations). This demonstrates that the preconditioner is not optimal. A similar behavior can be observed for the Picard method.¹⁰
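A spectral preconditioner of this kind is applied as a diagonal scaling in Fourier space. The sketch below assumes, for illustration only, that the preconditioner is the inverse of the H²-type regularization operator βΔ² on a 2π-periodic grid (Fourier symbol β|k|⁴), applied componentwise, with the zero frequency, where the symbol vanishes, left unchanged; the exact operator and the treatment of the constant mode are specified in section 4 and may differ. The helper name is hypothetical. If the preconditioner only inverts the regularization part of the Hessian, it is plausible that it becomes less effective as β decreases and the data term dominates.

```python
import numpy as np

def apply_spectral_preconditioner(r, beta, length=2 * np.pi):
    """Apply the inverse of beta * Laplacian^2 to a periodic 2D field (hypothetical helper).

    The biharmonic symbol beta * |k|^4 vanishes at k = 0; that mode is passed through unchanged.
    """
    n1, n2 = r.shape
    k1 = 2 * np.pi * np.fft.fftfreq(n1, d=length / n1)
    k2 = 2 * np.pi * np.fft.fftfreq(n2, d=length / n2)
    k_sq = k1[:, None] ** 2 + k2[None, :] ** 2
    symbol = beta * k_sq**2
    symbol[0, 0] = 1.0  # leave the constant mode untouched
    return np.real(np.fft.ifft2(np.fft.fft2(r) / symbol))

# Usage: one component of a residual with a single mode (|k|^2 = 13) plus a constant.
n = 32
x = 2 * np.pi * np.arange(n) / n
X1, X2 = np.meshgrid(x, x, indexing="ij")
residual = np.sin(2 * X1) * np.cos(3 * X2) + 0.7
preconditioned = apply_spectral_preconditioner(residual, beta=1e-3)
```

Each application costs two FFTs, so the preconditioner is cheap compared to the PDE solves in a Hessian-vector product.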

The numerical accuracy of the incompressibility constraint deteriorates (slightly, but not significantly) as compared to the results reported in the former section. In particular, we obtain a numerical accuracy of 𝒪(1E–5) for the GN-PCG method (see Table 7).

Conclusion

We conclude that the GN-PCG method is less sensitive (it requires no additional scaling of the search direction), provides a better inversion accuracy, and overall displays quicker convergence if a high accuracy of the inversion is required; therefore, it is to be preferred.

5.3.3. Number of unknowns in time

Purpose

It is not immediately evident how the number of the coefficient fields $v_{l}^{h} : R^{d} \to R^{d}$ , l = 1, …, n_c, affects the registration quality. We study the effects of varying n_c on the reconstruction quality and the rate of convergence. We also provide advice on how to decide on n_c.

Setup

We report results for registration problems of varying complexity. The analysis is limited to the GN-PCG method. We consider the hand images (n_x = (128, 128)^⊤) and the brain images (n_x = (200, 200)^⊤). The number of the time steps is fixed to n_t = 2max(n_x). The regularization parameter is empirically set to β = 1E–3 and β = 2E–2, respectively. We consider the full set of stopping conditions in (4.8) with τ_𝒥 = 1E–3, as we no longer compare different methods. The remainder of the parameters are set as stated in section 4.3.5.

One possibility to estimate an adequate number of coefficients for the registration of unseen images m_R and m_T is to compute the relative spectral power (see Table 2) of an individual coefficient field $v_{l}^{h}$ for different choices of n_c. If only a small number of coefficients is necessary to recover the deformation, this energy should decrease rapidly with an increasing l. The problem is stationary for n_c = 1.
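A minimal sketch of this diagnostic follows. It assumes that the coefficient fields are stored as one array of shape (n_c, d, n₁, n₂) and that the relative power is the ℓ²-norm of each v_l^h divided by the largest such norm; Table 2 gives the paper's exact normalization, which may instead divide by the total. The function name and the synthetic fields are hypothetical.

```python
import numpy as np

def relative_power(coeff_fields):
    """Relative l2-norm of each coefficient field v_l, l = 1, ..., n_c.

    coeff_fields: array of shape (n_c, d, n1, n2).
    """
    norms = np.sqrt(np.sum(coeff_fields**2, axis=tuple(range(1, coeff_fields.ndim))))
    return norms / norms.max()

# Synthetic example: coefficient magnitudes that halve with every index l.
rng = np.random.default_rng(0)
base = rng.standard_normal((2, 16, 16))
fields = np.stack([0.5**l * base for l in range(4)])
print(relative_power(fields))
```

A rapid decay of these values with l suggests that a small n_c (in the limit n_c = 1, a stationary field) suffices.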

Results

The trend of the relative ℓ²-norm (i.e., the spectral power) of an individual coefficient field $v_{l}^{h}$ for different choices of n_c ∈ {1, 2, 4, 8, 16} is plotted in Figure 7. Convergence results are reported in Table 8. A qualitative comparison of the registration results for different choices of n_c can be found in Figure 8.

Figure 7.

Relative power spectrum of the individual coefficient fields $v_{l}^{h} : \mathbb{R}^{d} \to \mathbb{R}^{d}$, l = 1, …, n_c, for different choices of n_c used to solve the considered registration problems. We report results for plain H²-regularization. The reported results correspond to Table 8. We refer to Table 8 and the text for details on the experimental setup. We report exemplary results for the brain images (n_x = (200, 200)^⊤; left) and the hand images (n_x = (128, 128)^⊤; right). We choose n_c to be in {1, 2, 4, 8, 16} as indicated in the legend of each plot. The definition of the relative ℓ²-norm (relative power spectrum) of $v_{l}^{h}$ can be found in Table 2.

Table 8.

Comparison of the inversion results for the GN-PCG method for a varying number of the spatial coefficient fields $v_{l}^{h} : R^{d} \to R^{d}$ , l = 1, …, n_c (i.e., we change the number of the unknowns in time). We report results for plain H²-regularization. We consider the hand images (n_x = (128, 128)^⊤; top block) and the brain images (n_x = (200, 200)^⊤; bottom block). We consider the full set of stopping conditions in (4.8) with τ_𝒥 = 1E–3, as we no longer study grid convergence and/or compare different optimization methods. We report the number of the (outer) iterations (k^★), the number of the hyperbolic PDE solves (n_PDE) and the relative change of (i) the L²-distance ( ${‖ m_{R}^{h} - m_{1}^{h} ‖}_{2, rel}$ ), (ii) the objective ( $δ J_{rel}^{h}$ ), and (iii) the (reduced) gradient (||g^h||_∞,_rel) as well as the minimal and maximal values of the determinant of the deformation gradient. The definitions of these measures can be found in Table 2. The number of the coefficient fields n_c used to solve the individual registration problems is chosen to be in {1, 2, 4, 8, 16}.

| n_c | n_PDE | ‖m_R^h − m_1^h‖_{2,rel} | δJ^h_rel | ‖g^h‖_{∞,rel} | min(det(F_1^h)) | max(det(F_1^h)) |
|---|---|---|---|---|---|---|
| Hand images | | | | | | |
| 1 | 279 | 6.65E–2 | 8.52E–2 | 2.27E–2 | 2.14E–1 | 6.60 |
| 2 | 279 | 6.60E–2 | 8.48E–2 | 2.29E–2 | 2.15E–1 | 6.44 |
| 4 | 283 | 6.41E–2 | 8.24E–2 | 2.29E–2 | 2.07E–1 | 6.49 |
| 8 | 277 | 6.41E–2 | 8.24E–2 | 2.51E–2 | 2.08E–1 | 6.46 |
| 16 | 281 | 6.41E–2 | 8.24E–2 | 2.56E–2 | 2.08E–1 | 6.45 |
| Brain images | | | | | | |
| 1 | 669 | 5.49E–1 | 6.68E–1 | 3.71E–2 | 4.93E–2 | 6.47 |
| 2 | 667 | 5.47E–1 | 6.66E–1 | 3.72E–2 | 4.96E–2 | 6.45 |
| 4 | 710 | 5.31E–1 | 6.51E–1 | 3.66E–2 | 4.55E–2 | 7.33 |
| 8 | 710 | 5.30E–1 | 6.51E–1 | 3.52E–2 | 4.53E–2 | 7.37 |
| 16 | 708 | 5.30E–1 | 6.51E–1 | 3.54E–2 | 4.53E–2 | 7.37 |

Figure 8.

Qualitative comparison of exemplary registration results for n_c = 1 (images to the left) and n_c = 16 (images to the right) of the study reported in Table 8. We refer to Table 8 and the text for details on the experimental setup. We report results for the hand images (n_x = (128, 128)^⊤) and the brain images (n_x = (200, 200)^⊤) for plain H²-regularization. We display (for each experiment) the deformed template image m₁, a pointwise map of the absolute difference between m_R and m₁, and a map of the determinant of the deformation gradient det(F₁) (from left to right as identified by the inset in the images). Information on how to interpret these images can be found in Appendix D.

Observations

The most important observation is that we obtain the same results for stationary and time-varying velocity fields for two-image registration problems. Qualitatively, we cannot observe any differences for a varying number of coefficient fields (see Figure 8). This observation is confirmed by the values for the relative reduction in the L²-distance in Table 8. Increasing the number of coefficients slightly reduces the L²-distance; these differences, however, are practically insignificant. In particular, we observe an average relative change in the L²-distance of 6.50E–2 ± 1.20E–3 (hand images, plain H²-regularization) and 5.37E–1 ± 9.75E–3 (brain images, plain H²-regularization). Also, we obtain identical deformation patterns, as judged by careful visual inspection (see Figure 8) and by the variations in the determinant of the deformation gradient. We obtain identical results for the UT images (for plain H²-regularization and the Stokes regularization scheme; results are not included in this study).
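The quoted averages can be recomputed from the rounded entries of Table 8. The sketch assumes that "±" denotes the sample standard deviation over the five choices of n_c; with the rounded table values, this gives 6.50E–2 ± 1.19E–3 and 5.37E–1 ± 9.71E–3, which agrees with the reported values up to the rounding of the table entries.

```python
import statistics

# Relative L2-distances from Table 8 (plain H2-regularization), n_c = 1, 2, 4, 8, 16.
l2_rel = {
    "hand": [6.65e-2, 6.60e-2, 6.41e-2, 6.41e-2, 6.41e-2],
    "brain": [5.49e-1, 5.47e-1, 5.31e-1, 5.30e-1, 5.30e-1],
}

for name, values in l2_rel.items():
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    print(f"{name}: {mean:.2e} +/- {std:.2e}")
# hand: 6.50e-02 +/- 1.19e-03
# brain: 5.37e-01 +/- 9.71e-03
```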

Turning to the required workload, we observe that the differences are also insignificant. The number of outer iterations is almost constant; only the number of inner iterations varies. In particular, we require 7 outer iterations with (on average) ≈280 hyperbolic PDE solves (n_PDE in Table 8; hand images; plain H²-regularization; γ = 0) and 20–21 outer iterations with (on average) ≈693 hyperbolic PDE solves (brain images; plain H²-regularization; γ = 0). However, we have to keep in mind that each application of the reduced Hessian is slightly more expensive and requires more memory as n_c increases; that is, we have to store more coefficient fields $v_{l}^{h}$, to all of which the regularization operator has to be applied. The cost of the forward and adjoint solves (which are the key bottleneck), however, is (almost) the same, since we expand v^h (note that this expansion is not necessary for n_c = 1; see section 4.3.2).
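The expansion of the coefficient fields in time can be sketched as follows, assuming (for illustration) a Chebyshev basis T_{l−1} on the time interval [0, 1] mapped to [−1, 1]; the basis and quadrature actually used are described in section 4.3.2 and may differ in detail. Evaluating v^h at n_t time points is a dense product between an n_t × n_c basis matrix and the n_c stored fields, so storage grows linearly in n_c, while for n_c = 1 the only basis function is constant. The function name is hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev

def expand_velocity(coeff_fields, t):
    """Evaluate v(x, t) = sum_l v_l(x) T_{l-1}(2t - 1) at the time points t (hypothetical helper).

    coeff_fields: array of shape (n_c, ...) holding v_1, ..., v_{n_c}.
    t: 1D array of time points in [0, 1].
    Returns an array of shape (len(t), ...).
    """
    n_c = coeff_fields.shape[0]
    s = 2.0 * np.asarray(t, dtype=float) - 1.0    # map [0, 1] -> [-1, 1]
    basis = chebyshev.chebvander(s, n_c - 1)       # (n_t, n_c): columns T_0, ..., T_{n_c - 1}
    return np.tensordot(basis, coeff_fields, axes=(1, 0))

rng = np.random.default_rng(0)
fields = rng.standard_normal((4, 2, 32, 32))   # n_c = 4 coefficient fields of a 2D velocity
t = np.linspace(0.0, 1.0, 65)                  # illustrative time points
v = expand_velocity(fields, t)                 # shape (65, 2, 32, 32)
```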

The power spectrum of the coefficient fields drops quickly (see Figure 7). This also indicates that only a small number of coefficients is required to obtain an excellent agreement between the images. However, we expect differences to manifest when registering time series of images (multiple time frames); here, we might benefit from being able to invert for a time-varying velocity field.

Conclusion

We conclude that it is sufficient to use stationary velocity fields for two-image registration problems.

5.4. Parameter continuation to estimate β

Purpose

We study the stability and accuracy of the designed parameter continuation method (see section 4.3.5) and the associated control over the properties of the mapping. That is, we study how the quantities of interest (determinant of the deformation gradient and L²-distance) behave during the course of the parameter continuation and how close we actually approach the given bounds.

Setup

The registration problems are solved on images with a grid size of n_x = (512, 512)^⊤. The number of time points is adapted as required by monitoring the CFL condition (see section 4.3.1). We use the full set of stopping conditions (see section 4.2.4) with a tolerance of τ_𝒥 = 1E–3. We consider the hand images and the brain images (see Figure 1). We invert for a stationary velocity field (i.e., n_c = 1). For plain H¹- and H²-regularization (smoothness regularization; γ = 0), we set the lower bound on $\det(F_{1}^{h})$ to 1E–1 (hand images) and 5E–2 (brain images), respectively. For the Stokes regularization scheme (γ = 1; H¹-regularization), we set the bound on the grid angle to ε_θ = π/16 (11.25°). The remainder of the parameters are set as described in section 4.3.5.
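Both kinds of bounds can be checked pointwise on the deformation gradient. The sketch below assumes that F₁ is available as an array of 2 × 2 matrices and, as an illustrative definition, measures the grid angle as the deviation from π/2 of the angle between the deformed cell edges F₁e₁ and F₁e₂; the paper's precise definition of the grid angle may differ. The helper names are hypothetical.

```python
import numpy as np

def det_and_shear(F):
    """F: array of shape (..., 2, 2). Returns det(F) and the deviation of the
    deformed right angle between F e1 and F e2 from pi/2 (assumed 'grid angle')."""
    det = F[..., 0, 0] * F[..., 1, 1] - F[..., 0, 1] * F[..., 1, 0]
    e1, e2 = F[..., :, 0], F[..., :, 1]  # images of the unit cell edges
    cos_theta = np.sum(e1 * e2, axis=-1) / (
        np.linalg.norm(e1, axis=-1) * np.linalg.norm(e2, axis=-1)
    )
    shear = np.abs(np.arccos(np.clip(cos_theta, -1.0, 1.0)) - np.pi / 2)
    return det, shear

def accept(F, eps_F=None, eps_theta=None):
    """Accept a deformation if min det(F) >= eps_F and/or max grid angle <= eps_theta."""
    det, shear = det_and_shear(F)
    if eps_F is not None and np.min(det) < eps_F:
        return False
    if eps_theta is not None and np.max(shear) > eps_theta:
        return False
    return True

# A pure shear keeps det(F) = 1 but distorts the cell angle by pi/8 (22.5 degrees).
F_shear = np.array([[1.0, np.tan(np.pi / 8)], [0.0, 1.0]])
print(accept(F_shear, eps_F=0.05), accept(F_shear, eps_theta=np.pi / 16))
```

The example illustrates why a bound on det(F₁) alone does not exclude strongly sheared cells under the Stokes regularization scheme.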

Results

We report the obtained estimates for β as well as results for the reconstruction quality and deformation regularity in Table 9. We provide an exemplary illustration of the obtained registration results in Figure 9. We report results for the course of the parameter continuation in Figure 10.

Table 9.

Quantitative analysis of the parameter continuation in β. We report results for the hand images and the brain images for different regularization schemes (see Figure 1). The spatial grid size for the images is n_x = (512, 512)^⊤. The number of time points n_t is chosen adaptively (see section 4.3.5 for details). We use the full set of stopping conditions (see section 4.2.4) with a tolerance of τ_𝒥 = 1E–3. We report results for plain H¹- and H²-regularization (γ = 0; top block) as well as for the Stokes regularization scheme (γ = 1; H¹-regularization; bottom block). We invert for a stationary velocity field (i.e., n_c = 1). We report (i) the considered lower bound on the determinant of the deformation gradient (ε_F) or the bound on the grid angle (ε_θ), (ii) the number of required estimation steps, (iii) the minimal value of the determinant of the deformation gradient for the optimal regularization parameter, (iv) the computed optimal value for β (β^★), (v) the minimal change in β (δβ_min), and (vi) the relative change in the L²-distance ($\|m_{R}^{h} - m_{1}^{h}\|_{2,\mathrm{rel}}$).

Smoothness regularization

| Data | 𝒮 | ε_F | steps | min(det(F_1^h)) | β^★ | δβ_min | ‖m_R^h − m_1^h‖_{2,rel} |
|---|---|---|---|---|---|---|---|
| Brain images | H¹ | 5E–2 | | 5.09E–2 | 3.11E–1 | 5.00E–3 | 7.01E–1 |
| Brain images | H² | 5E–2 | | 5.11E–2 | 2.13E–2 | 5.00E–4 | 5.52E–1 |
| Hand images | H¹ | 1E–1 | | 1.13E–1 | 3.39E–2 | 5.00E–4 | 8.74E–2 |
| Hand images | H² | 1E–1 | | 1.05E–1 | 2.69E–4 | 5.00E–6 | 6.83E–2 |

Stokes regularization

| Data | 𝒮 | ε_θ | steps | min(det(F_1^h)) | β^★ | δβ_min | ‖m_R^h − m_1^h‖_{2,rel} |
|---|---|---|---|---|---|---|---|
| Hand images | H¹ | π/16 | | 10.00E–1 | 2.13E–2 | 5.00E–4 | 1.17E–1 |

Figure 9.

Qualitative illustration of exemplary registration results for the parameter continuation in β reported in Table 9. We refer to Table 9 and the text for details on the experimental setup. We report results for the brain images (top row; plain H²-regularization; γ = 0) and the hand images (middle row: plain H²-regularization (γ = 0); bottom row: Stokes regularization scheme (γ = 1; H¹-regularization)). We display the deformed template image m₁, maps of the absolute differences between m_R and m_T and between m_R and m₁, and a map of the determinant of the deformation gradient det(F₁) (from left to right as indicated in the inset in the images). Information on how to interpret these images can be found in Appendix D.

Figure 10.

Exemplary illustration of the course of the parameter continuation in β for the quantitative results reported in Table 9. We refer to Table 9 and the text for details on the experimental setup. We report results for the brain images (top row) and the hand images (bottom row) for plain H²-regularization. For each experiment, we display (from left to right) (i) the trend of the minimal value of the determinant of the deformation gradient (the dashed line indicates the user-defined lower bound on $\det(F_{1}^{h})$), (ii) the trend of the L²-distance (h_d is the grid cell volume and $r := m_{R}^{h} - m_{T}^{h}$, $r \in \mathbb{R}^{\tilde{n}}$, $\tilde{n} = \prod_{i=1}^{2} n_{x}^{i}$), and (iii) the trend of β, all with respect to the parameter continuation step. We indicate our judgment on the results in color: if a result is accepted (i.e., $\min(\det(F_{1}^{h})) \geq ε_{F}$), we plot the marker in green; if a result is rejected (i.e., $\min(\det(F_{1}^{h})) < ε_{F}$), we plot the marker in red. The optimal value is plotted in blue. The plots correspond to the results reported in Table 9.

Observations

The most important observation is that we can precisely control the properties of our mapping without having to manually tune any parameters. We only have to decide on geometric bounds (the smallest tolerable deformation of a grid element or a bound on the shear angle of the grid cell), the decision on which is intuitive for practitioners.

The accuracy of our method (in space) is only limited by the grid resolution (i.e., how many frequencies we can resolve; this statement is confirmed by the experiments conducted in section 5.3.2) as well as the defined bounds on the binary search used to estimate β (see section 4.3.5). Clearly, the desired level of accuracy competes with the computational work load we are willing to invest.
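The accept/reject logic of the continuation and the role of the bounds on the binary search can be illustrated with a small sketch. It assumes that min(det(F₁^h)) increases monotonically with β, that β is first reduced by a constant factor while results are accepted, and that a bisection between the last accepted and the first rejected value stops once their gap falls below δβ_min; these choices, the function `estimate_beta`, and the stand-in `mock_min_det` are hypothetical and need not match section 4.3.5. In practice, each call of `min_det_for` is a full registration, so a smaller δβ_min directly translates into more work.

```python
def estimate_beta(min_det_for, eps_F, beta0=1.0, reduction_factor=10.0, dbeta_min=5e-4,
                  max_steps=50):
    """Hypothetical parameter continuation in beta with a final binary search.

    min_det_for(beta) returns min(det(F_1)) of the registration solved with `beta`;
    a result is accepted if this value is >= eps_F. Returns (beta_star, history).
    """
    history = []
    beta_ok = beta_bad = None
    beta = beta0
    for _ in range(max_steps):
        accepted = min_det_for(beta) >= eps_F
        history.append((beta, accepted))
        if accepted:
            beta_ok = beta
        else:
            beta_bad = beta
        if beta_ok is None:
            raise ValueError("the initial beta already violates the bound")
        if beta_bad is None:
            beta = beta / reduction_factor      # still admissible: keep reducing beta
        elif beta_ok - beta_bad <= dbeta_min:
            return beta_ok, history             # bracket is tight enough
        else:
            beta = 0.5 * (beta_ok + beta_bad)   # bisect between accepted and rejected
    return beta_ok, history


def mock_min_det(beta):
    """Stand-in for a full registration: min det(F_1) increases monotonically with beta."""
    return beta / (beta + 3e-3)


beta_star, history = estimate_beta(mock_min_det, eps_F=0.5)
print(beta_star, len(history))
```

The returned β^★ is always an accepted value, so the bound on det(F₁) is never violated; δβ_min only controls how close β^★ comes to the smallest admissible value.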

For plain H¹- and H²-regularization, we achieve an excellent agreement between m_R and m₁ (see Figure 9) with a reduction of the L²-distance by approximately half an order of magnitude for the brain images and 1.5 orders of magnitude for the hand images (see Table 9). The discrepancy between the lower bound ε_F and $\min(\det(F_{1}^{h}))$ for the obtained optimal value of β is small. In particular, it is bounded from above by an absolute difference of, e.g., 1.13E–3 for the brain images (H²-regularization) and 5.10E–3 for the hand images (H²-regularization). These values are well above the attainable accuracy reported in section 5.3.2.

For the Stokes regularization scheme, we can qualitatively (see Figure 9) and quantitatively (see Table 9) observe that enforcing incompressibility up to numerical accuracy is too strong a prior for the considered problem. However, the key observation and intention of this experiment is to demonstrate that we attain a deformation that is very well behaved (with $\det(F_{1}^{h}) = 1$). A direct comparison to the result obtained for the compressible case (plain H²-regularization) reveals that the latter mapping is diffeomorphic but displays a large variation in the magnitude of the determinant of the deformation gradient (see the leftmost image in the middle and bottom rows as well as the corresponding maps of $\det(F_{1}^{h})$ in Figure 9). If we further decrease the bound on $\det(F_{1}^{h})$, we will lose control and generate a mapping that is locally close to being nondiffeomorphic. We again emphasize that the intention of this work is the study of algorithmic properties. We will address the practical benefit of exploiting a model of (near-)incompressible flow in a follow-up paper and refer to [10, 16, 52, 62] for potential applications. This exemplary result on real-world data demonstrates that it might be beneficial to consider a relaxation of the incompressibility constraint in order to reduce the mismatch between the considered images while maintaining as much control on the deformation regularity as possible.

In future work, we will focus on improving the computational efficiency of the estimation of β. We have tested combining the parameter continuation with a grid continuation but did not observe significant improvements. We will also investigate providing a coarse estimate of β from the spectral properties of the Hessian and performing the parameter continuation from there (see section 4.3.5 for additional comments).

Conclusion

We conclude that the designed framework is highly accurate, is stable, guarantees deformation regularity (assuming that the user-defined tolerance is sufficiently bounded away from irregularities), and does not require any additional tuning of parameters. The user merely has to provide a lower bound on an acceptable volume change or a bound on the distortion of a volume element (shear angle), the decision on which is intuitive for practitioners.

6. Conclusions

We have presented numerical methods for large deformation diffeomorphic nonrigid image registration that (i) operate in a black-box fashion, (ii) introduce novel algorithmic features (including a second-order Newton–Krylov method, spectral preconditioning, an efficient solver for Stokes problems, and a spectral Galerkin method in time), (iii) are stable and efficient, and (iv) guarantee deformation regularity with explicit control on the quality of the deformation.

In addition, we have conducted a detailed numerical study to demonstrate computational performance and numerical behavior on synthetic and real-world problems. The most important observations of our study are the following:

The Newton–Krylov methods outperform the globalized Picard method (see section 5.3.2) as we increase the image size and the registration fidelity.
We can enforce incompressibility with high accuracy. The numerical accuracy (in space) is only limited by the resolution of the data (see section 5.3.2).
We can compute deformations that are guaranteed to be regular (i.e., a local diffeomorphism) up to user specifications. Controlling the magnitude of det(F₁) is not sufficient, as volume elements might still collapse (deformation field with strong shear). Therefore, we introduced a parameter continuation in β that can be interfaced not only with lower bounds on the determinant of the deformation gradient but also with bounds on the geometric properties of the grid cells (in particular, the shear angle of a grid cell; see section 5.4).
The experiments reported in this study demonstrate that it is adequate to limit the inversion to stationary velocity fields when considering two-image registration problems. This observation is in accordance with results reported for other classes of large deformation image registration algorithms [1, 3, 44, 52, 69, 70]. We have additionally provided advice on how to decide on the number of unknowns in time (see section 5.3.3).

The control equation for the velocity is a space-time nonlinear elliptic system. However, the main cost in our formulation is the solution of transport problems to compute the image transformation and the adjoint variables. That is, we require two hyperbolic PDE solves for computing the gradient (which essentially corresponds to one Picard iteration or one outer iteration of a Newton–Krylov method; see Algorithm 1) and an additional two hyperbolic PDE solves for evaluating the incremental control equation (Hessian matrix-vector product in Newton–Krylov methods; see Algorithm 2) in each inner iteration of the Krylov-subspace method. Because we use a pseudospectral discretization in space, the elliptic solve in the Picard iteration amounts (for quadratic regularization models) to nothing more than a spectral diagonal scaling. For the Newton–Krylov methods, we have to solve a linear system using an iterative solver. The Picard scheme has a lower cost per iteration but requires more iterations than the Newton–Krylov scheme.
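To illustrate why the elliptic solve in the Picard iteration reduces to a diagonal scaling, the following minimal NumPy sketch applies (β𝒜)⁻¹ to a periodic two-dimensional vector field for 𝒜 = −Δ (H¹) or 𝒜 = Δ² (H²). The domain [0, 2π)², the helper names, and the choice to set the constant (k = 0) mode to zero, since it lies in the kernel of 𝒜, are assumptions of this sketch and not necessarily how the authors' implementation treats them.

```python
import numpy as np


def angular_wavenumbers(n, length=2 * np.pi):
    """Angular wavenumbers of an n-point periodic grid on [0, length)."""
    return 2 * np.pi * np.fft.fftfreq(n, d=length / n)


def apply_inverse_regularization(f, beta, order=1, length=2 * np.pi):
    """Apply (beta * A)^{-1} componentwise to a periodic vector field f of shape (d, n1, n2).

    order=1: A = -Laplacian (H1 seminorm), Fourier symbol |k|^2.
    order=2: A = Laplacian^2 (H2 seminorm), Fourier symbol |k|^4.
    Because A is diagonal in the Fourier basis, the "elliptic solve" is a
    pointwise division of the Fourier coefficients.
    """
    _, n1, n2 = f.shape
    k1 = angular_wavenumbers(n1, length)[:, None]
    k2 = angular_wavenumbers(n2, length)[None, :]
    symbol = (k1**2 + k2**2) ** order
    scale = np.zeros_like(symbol)
    nonzero = symbol > 0
    # The k = 0 (constant) mode lies in the kernel of A; this sketch sets it to zero.
    scale[nonzero] = 1.0 / (beta * symbol[nonzero])
    f_hat = np.fft.fft2(f, axes=(-2, -1))
    return np.real(np.fft.ifft2(f_hat * scale, axes=(-2, -1)))
```

For f = (sin 2x₁, 0), the only active wavenumbers satisfy |k|² = 4, so the H¹ result is sin(2x₁)/(4β) and the H² result is sin(2x₁)/(16β); no iterative solver is involved.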

Our results demonstrate that there is a significant difference in stability, computational work, and accuracy between the Picard and Newton–Krylov methods, especially when we require a high accuracy of the inversion. If we require only an inaccurate solution or use a strong regularization, the differences between Picard and Newton–Krylov methods are less pronounced. Better preconditioning of the Hessian would make the Newton–Krylov approach preferable across the spectrum of accuracy requirements. The Newton–Krylov approach is not significantly more complex, since we essentially use the same numerical tools that have been used for the solution of the first-order optimality conditions. Also, the individual building blocks ((incremental) forcing term, regularization operator 𝒜, and projection operator 𝒦) that appear in the first- and second-order optimality systems are very similar. Therefore, the difference between solving the first- and the second-order optimality conditions essentially amounts to interfacing a Krylov-subspace method to solve the saddle point problem.

By formulating the nonrigid image registration as a problem of optimal control, we target the design of a generic, biophysically constrained framework for large deformation diffeomorphic image registration. Further, there are many applications that do require incompressible or near-incompressible deformations, for example, in medical image analysis. Our framework provides such a technology.

We report results only in two dimensions, but nothing in our formulation and numerical approximation is specific to the two-dimensional case. The next steps will be its extension to three dimensions and to problems involving time sequences of images. For such cases, we expect to have to invert for a nonstationary velocity field. In addition, we aim at designing a framework that allows for a relaxation of the incompressibility constraint, as we observed in this study that incompressibility might be too strong a prior. We also observed (results not included in this study) that the use of an incompressibility constraint can promote shear. In a follow-up paper, we will target this problem by introducing a novel continuum mechanical model that allows us to control the shear inside the deformation field.

Table 4.

Acknowledgments

We would like to thank Amir Gholaminejad, Georg Stadler, and Bryan Quaife for helpful discussions and comments. We would like to thank Florian Tramnitzke for his initiative work on this project. Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the AFOSR or the NSF.

Appendix A. Expansion in time: Derivation

This section summarizes modifications of the regularization operator as well as the (incremental) control equation on account of the expansion in time (see section 4.3.2). Inserting (4.10) into (3.1) yields

\int_{0}^{1} \mathcal{S}[v] \, dt = \frac{\beta}{2} \sum_{l=1}^{n_c} \sum_{l'=1}^{n_c} c_{ll'} \int_{\Omega} \langle \mathcal{B}[v_{l}], \mathcal{B}[v_{l'}] \rangle \, dx, \quad \text{where } c_{ll'} := \int_{0}^{1} b_{l} \, b_{l'} \, dt \quad \forall\, l, l' \in \mathbb{N}.

(A.1)

Taking first and second variations with respect to the lth expansion coefficient v_l yields the control equation

\beta \sum_{l'=1}^{n_c} c_{ll'} \, \mathcal{A}[v_{l'}] + \int_{0}^{1} b_{l} \, \mathcal{K}[f] \, dt = 0, \quad l = 1, \dots, n_c,

(A.2)

and the incremental control equation

H_{l}[\tilde{v}_{l}] := \beta \sum_{l'=1}^{n_c} c_{ll'} \, \mathcal{A}[\tilde{v}_{l'}] + \int_{0}^{1} b_{l} \, \mathcal{K}[\tilde{f}] \, dt, \quad l = 1, \dots, n_c,

(A.3)

respectively. Accordingly, the operators ℬ and 𝒜 simply act on v_l instead of v. The definitions of these operators can be found in section 4.1.

We use a global basis on the unit time horizon for the expansion (see section 4.3.2). We use Chebyshev polynomials as basis functions in (4.10) on account of their excellent approximation properties as well as their orthogonality. The latter property considerably reduces the computational complexity, since c_{ll′} = 0 for all l′ ≠ l and c_{ll} = 1 (see (A.1), (A.2), and (A.3)). To avoid Runge's phenomenon (see, e.g., [11, p. 82ff.]), Chebyshev–Gauss–Lobatto nodes are used.
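A minimal NumPy sketch of the time-discretization ingredients named above: the Chebyshev–Gauss–Lobatto nodes mapped to the unit time horizon, and the evaluation of v(x, t) = Σ_l b_l(t) v_l(x) from expansion coefficients. The concrete choice b_l(t) = T_{l−1}(2t − 1), i.e., Chebyshev polynomials shifted from [−1, 1] to [0, 1], and the function names are assumptions of this sketch; the normalization used in (4.10) may differ.

```python
import numpy as np
from numpy.polynomial import chebyshev


def cgl_nodes_unit_interval(n):
    """Return the n + 1 Chebyshev-Gauss-Lobatto nodes cos(j*pi/n), j = 0..n, mapped to [0, 1]."""
    x = np.cos(np.pi * np.arange(n + 1) / n)  # nodes on [-1, 1], from 1 down to -1
    return np.sort(0.5 * (x + 1.0))           # affine map to [0, 1], ascending order


def time_basis(t, n_c):
    """Evaluate b_l(t) = T_{l-1}(2t - 1) for l = 1..n_c; result has shape (n_c, len(t))."""
    s = 2.0 * np.atleast_1d(np.asarray(t, dtype=float)) - 1.0
    # chebval with the unit coefficient vector e_l evaluates the single polynomial T_l.
    return np.stack([chebyshev.chebval(s, np.eye(n_c)[l]) for l in range(n_c)])


def velocity_at_times(t, coeffs):
    """v(., t) = sum_l b_l(t) v_l for coefficient fields coeffs of shape (n_c, ...)."""
    b = time_basis(t, coeffs.shape[0])             # (n_c, n_t)
    return np.tensordot(b.T, coeffs, axes=(1, 0))  # (n_t, ...)
```

With n_c = 1 only b_1 = T_0 = 1 remains, so the expansion reduces to a stationary velocity field.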

Appendix B. Incompressibility constraint: Elimination

Here, we derive the elimination of p and p̃ from the optimality systems for the Stokes regularization scheme. We only consider a quadratic H¹-regularization for the velocity v; the same line of argument applies to the H²-regularization model.

Applying the divergence to (4.1d) results in¹¹

-\nabla \cdot (\beta \Delta v) + \Delta p + \nabla \cdot f = 0 \quad \text{in } \Omega \times [0, 1].

Under the optimality assumption ∇ · v = 0, the first term vanishes, since, by the definition of the vectorial Laplacian, ∇ · (Δv) = Δ(∇ · v) = 0; hence p = −Δ⁻¹(∇ · f). Inserting this expression into (4.1d) projects v onto the manifold of divergence-free velocity fields and thereby eliminates (4.1c) (assuming that the initial v is divergence free). Accordingly, we obtain the control equation (reduced gradient)

-\beta \Delta v + \mathcal{K}[f] = 0, \quad \text{where } \mathcal{K}[f] := -\nabla\big(\Delta^{-1}(\nabla \cdot f)\big) + f,

(B.1)

to replace (4.1d). This expression is equivalent to (4.11) in the case when an H¹-regularization model is used (i.e., 𝒜 = −Δ). As stated above, the derivation also holds for the H²-regularization operator. We only have to replace −βΔv by βΔ²v. Computing the second variations of the weak form of the eliminated system yields the incremental control equation

\beta \mathcal{A}[\tilde{v}] + \mathcal{K}[\tilde{f}] = -g,

where g denotes the reduced gradient, i.e., the left-hand side of (B.1).

Appendix C. Relation to LDDMM

In this section we relate our work to [41] and thereby to approaches based on LDDMM [3, 4, 23, 53, 68]. Since the work in [4, 41] is based on first-order information, we only consider the reduced gradient in (4.1d) (setting γ = 0). In weak form we have

\int_{0}^{1} \langle g, \tilde{v} \rangle_{L^{2}(\Omega)} \, dt = \int_{0}^{1} \langle \beta (\mathcal{B}\mathcal{B}^{H})[v] + f, \tilde{v} \rangle_{L^{2}(\Omega)} \, dt = \int_{0}^{1} \langle \beta v + (\mathcal{B}\mathcal{B}^{H})^{-1}[f], \tilde{v} \rangle_{\mathcal{W}} \, dt.

The expression βv + (ℬℬ^H)⁻¹[f] = β(v + (βℬℬ^H)⁻¹[f]) is, up to the scalar factor β, exactly the gradient in the function space 𝒲 that has been used in [41]. This expression yields the preconditioned gradient descent scheme

v_{k+1}^{h} = v_{k}^{h} - \alpha_{k} \big( (\beta \mathcal{B}^{h} \mathcal{B}^{h,H})^{-1}[f_{k}^{h}] + v_{k}^{h} \big),

where −(βℬ^hℬ^{h,H})⁻¹[f^h] is nothing but a Picard iterate (see (4.7)). Subtracting $v_{k}^{h}$ from this iterate translates it into an update. This is exactly the formulation we have used in this work (see section 4.2.3), so that the considered first-order method is equivalent to the solver used in [41] (under the assumption that α_k = 1, i.e., if we neglect the line search). Accordingly, the same line of argument used in [41] to relate their work to LDDMM applies to our numerical framework.
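The equivalence can be checked with a small NumPy sketch of a preconditioned gradient step built from the 𝒲-gradient βv + (ℬℬ^H)⁻¹[f] above (divided by β), i.e., v_{k+1} = v_k − α_k(v_k + (βℬℬ^H)⁻¹[f_k]). It assumes ℬℬ^H = −Δ on a periodic grid over [0, 2π)² and treats the body force f_k as given; in the actual method f_k is recomputed from the state and adjoint equations at every iterate. All function names are hypothetical.

```python
import numpy as np


def inverse_beta_laplacian(f, beta):
    """(beta * (-Laplacian))^{-1} applied componentwise on [0, 2*pi)^2; constant mode set to zero."""
    _, n1, n2 = f.shape
    k1 = np.fft.fftfreq(n1, d=1.0 / n1)[:, None]  # integer wavenumbers
    k2 = np.fft.fftfreq(n2, d=1.0 / n2)[None, :]
    ksq = k1**2 + k2**2
    scale = np.zeros_like(ksq)
    scale[ksq > 0] = 1.0 / (beta * ksq[ksq > 0])
    f_hat = np.fft.fft2(f, axes=(-2, -1))
    return np.real(np.fft.ifft2(f_hat * scale, axes=(-2, -1)))


def picard_iterate(f, beta):
    """Solution of beta * A[v] + f = 0 for A = -Laplacian, i.e. v = -(beta A)^{-1} f."""
    return -inverse_beta_laplacian(f, beta)


def preconditioned_gradient_step(v, f, beta, alpha):
    """v + alpha * s with s = picard_iterate - v, i.e. v - alpha * (v + (beta A)^{-1} f)."""
    search_direction = picard_iterate(f, beta) - v
    return v + alpha * search_direction
```

With α_k = 1 the step returns the Picard iterate regardless of v_k, which is the sense in which the first-order method coincides with the solver in [41] when the line search is neglected.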

Appendix D. Measures of deformation regularity

D.1. Deformation map

To visualize the deformation pattern, y has to be inferred from v. This can be done by solving

\partial_{t} u + (\nabla u) v = v \quad \text{in } \Omega \times (0, 1], \qquad u = 0 \quad \text{in } \Omega \times \{0\},

(D.1)

with periodic boundary conditions on ∂Ω. Here, u : Ω̄ × [0, 1] → R^d, (x, t) ↦ u(x, t), is a displacement field and y := x − u₁, y : Ω̄ → R^d, where u₁ := u(·, t = 1), u₁ : Ω̄ → R^d, x ↦ u₁(x).
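For a stationary velocity, (D.1) can be integrated with a pseudospectral derivative in space and the classical fourth-order Runge–Kutta method in time. The NumPy sketch below does this in one dimension on [0, 2π) for readability; the time integrator, the number of steps, the grid, and the function names are assumptions of the sketch, not the scheme used in the paper. Since w := x − u satisfies ∂_t w + v ∂_x w = 0 with w(x, 0) = x, the value y = x − u₁ is the foot of the characteristic through x, which provides a closed-form check for v = a sin x.

```python
import numpy as np


def spectral_derivative(u, length=2 * np.pi):
    """Pseudospectral d/dx of a periodic 1D array; the Nyquist mode is dropped for even n."""
    n = u.size
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)
    if n % 2 == 0:
        k[n // 2] = 0.0
    return np.real(np.fft.ifft(1j * k * np.fft.fft(u)))


def displacement_at_final_time(v, n_steps=100, length=2 * np.pi):
    """Integrate du/dt = v - (du/dx) v with u(x, 0) = 0 over t in [0, 1] (1D sketch of (D.1)).

    v is a stationary periodic velocity sampled on the grid; returns u_1 = u(., 1).
    Time stepping uses the classical fourth-order Runge-Kutta method.
    """
    def rhs(u):
        return v - spectral_derivative(u, length) * v

    u = np.zeros_like(v, dtype=float)
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        s1 = rhs(u)
        s2 = rhs(u + 0.5 * dt * s1)
        s3 = rhs(u + 0.5 * dt * s2)
        s4 = rhs(u + dt * s3)
        u = u + (dt / 6.0) * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
    return u
```

The deformation map then follows pointwise as y = x − u₁ on the grid; for v = a sin x its exact value is 2 arctan(e^{−a} tan(x/2)).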

Visualization

As can be seen in the visualization of the deformed grids, the mapping y actually corresponds to the inverse of the deformation map applied to an image. This reflects the fact that our model is formulated in an Eulerian frame of reference. Note that all images reported are high-resolution vector graphics; zooming in on the digital version of the paper reveals local properties of the deformation map.

D.2. Deformation gradient

It is well known from calculus that the determinant of the Jacobian matrix det(∇y) can be used to assess invertibility of y as well as local volume change, provided that y ∈ C²(Ω)^d. In the framework of continuum mechanics, we can obtain this information from the deformation tensor field F : Ω̄ × [0, 1] → R^{d×d}, where F is related to v by

\partial_{t} F + (v \cdot \nabla) F = (\nabla v) F \quad \text{in } \Omega \times (0, 1], \qquad F = I \quad \text{in } \Omega \times \{0\},

(D.2)

with periodic boundary conditions on ∂Ω. Here, I = diag(1, …, 1) ∈ R^{d×d}; det(F₁) is equivalent to det(∇y), where F₁ := F(·, t = 1), F₁ : Ω̄ → R^{d×d}, x ↦ F₁(x).
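Since det(F₁) coincides with det(∇y), the determinant can also be evaluated directly from u₁ via ∇y = I − ∇u₁. The NumPy sketch below does this with pseudospectral derivatives on a periodic two-dimensional grid over [0, 2π)². It also reports, as a hypothetical shear measure, the deviation from 90° of the angle between the images of the two grid directions (the columns of ∇y); the paper's exact definition of the shear angle may differ. The example field u₁ = (a sin x₂, 0) shows why det(F₁) alone is not sufficient: it is volume preserving everywhere, while grid cells are sheared by up to arctan(a).

```python
import numpy as np


def spectral_gradient(u, length=2 * np.pi):
    """Return (du/dx1, du/dx2) of a periodic scalar field u of shape (n1, n2)."""
    n1, n2 = u.shape
    k1 = 2 * np.pi * np.fft.fftfreq(n1, d=length / n1)[:, None]
    k2 = 2 * np.pi * np.fft.fftfreq(n2, d=length / n2)[None, :]
    u_hat = np.fft.fft2(u)
    return (np.real(np.fft.ifft2(1j * k1 * u_hat)),
            np.real(np.fft.ifft2(1j * k2 * u_hat)))


def grad_y(u1, length=2 * np.pi):
    """J[i, j] = d y_i / d x_j for y = x - u1, with u1 of shape (2, n1, n2)."""
    J = np.empty((2, 2) + u1.shape[1:])
    for i in range(2):
        du_dx1, du_dx2 = spectral_gradient(u1[i], length)
        J[i, 0] = float(i == 0) - du_dx1
        J[i, 1] = float(i == 1) - du_dx2
    return J


def det_and_shear_deg(u1, length=2 * np.pi):
    """Pointwise det(grad y) and a hypothetical shear angle (degrees) of the deformed grid cells."""
    J = grad_y(u1, length)
    det = J[0, 0] * J[1, 1] - J[0, 1] * J[1, 0]
    c1, c2 = J[:, 0], J[:, 1]  # images of the grid directions e1 and e2
    cos_angle = np.sum(c1 * c2, axis=0) / (
        np.linalg.norm(c1, axis=0) * np.linalg.norm(c2, axis=0))
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return det, np.degrees(np.abs(np.pi / 2 - angle))
```

For a = 0.5 the determinant equals one at every grid point, whereas the largest shear deviation is arctan(0.5), roughly 26.6°.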

Visualization

We limit the color map for the display of det(F₁) to [0, 2]. In particular, the color map ranges from black (compression: det(F₁) ∈ (0, 1); black corresponds to values of 0 or below (due to clipping), which represents a singularity or the loss of mass, respectively) to orange (mass conservation: det(F₁) = 1) to white (expansion: det(F₁) > 1; white represents values of 2 or greater (due to clipping)).

Footnotes

This work was supported by AFOSR grants FA9550-12-10484 and FA9550-11-10339; by NSF grants CCF-1337393 and OCI-1029022; by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under awards DE-SC0010518, DE-SC0009286, and DE-FG02-08ER2585; and by NIH grant 10042242.

Note that controlling the magnitude of det(∇y) is not sufficient to guarantee that y is locally well behaved. Therefore, our framework features geometric constraints that guarantee a well-behaved, locally diffeomorphic mapping y. In particular, we control the shear angle of the cells within the deformed grid during the parameter continuation in β.

The inf-sup condition is an important requirement when solving Stokes-like problems via the finite element method (see [12, p. 200ff.]; examples for a finite element discretization of similar problem formulations can be found in [16, 62]). Satisfying the inf-sup condition ensures that the finite element solution exists and is stable and optimal. Essentially, we require two different finite element spaces for the discretization of the pressure and the velocity. The inf-sup condition is key for the decision on an adequate pair of spaces. For our scheme, we can use the same basis (Fourier) for the discretization of the pressure and the velocity. Also, it is very efficient since we can eliminate the pressure from the optimality system (see section 4.3.4).

After the submission of our work, another contribution on second-order numerical optimization for LDDMM appeared [43].

⁴

As opposed to the steps for updating $v_{k}^{h}$, which we refer to as outer iterations; see Algorithm 1.

⁵

Note that the scheme in Algorithm 1 also applies to the Picard method (see section 4.2.3). The only difference is the way we compute the search direction s_k in line 5.

⁶

“PDE solve” refers to the solution of one of the hyperbolic PDEs that appear in the optimality system (4.1) and (4.3).

⁷

Note that for large β the optimization problem is almost quadratic, so that Newton–Krylov methods converge quickly.

⁸

The hand images are taken from [56].

⁹

Note that the tolerance of the Krylov-subspace method and therefore the number of the inner iterations depends on the gradient (see (4.5)).
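As an illustration of how such a gradient-dependent tolerance can look, the sketch below implements a common superlinear forcing sequence in the spirit of [24], η_k = min(η_max, √(‖g_k‖/‖g_0‖)). Whether this matches (4.5) exactly is an assumption, η_max = 0.5 is a hypothetical default, and the function names are illustrative. The inner Krylov solve would stop once ‖H s_k + g_k‖ ≤ η_k ‖g_k‖.

```python
import math


def krylov_relative_tolerance(grad_norm, grad_norm_initial, eta_max=0.5):
    """Relative tolerance eta_k = min(eta_max, sqrt(||g_k|| / ||g_0||)) for the inner Krylov solve."""
    return min(eta_max, math.sqrt(grad_norm / grad_norm_initial))


def inner_solve_converged(residual_norm, grad_norm, grad_norm_initial, eta_max=0.5):
    """Stop the inner iterations once ||H s + g|| <= eta_k * ||g||."""
    eta = krylov_relative_tolerance(grad_norm, grad_norm_initial, eta_max)
    return residual_norm <= eta * grad_norm
```

As the gradient norm decreases, η_k shrinks and the inner solves become more accurate, which is why the number of inner iterations depends on the gradient.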

¹⁰

Note that the Picard method is a gradient descent scheme in the function space induced by 𝒲. We can interpret the inverse of 𝒜^h as a preconditioner acting on the body force f. This operator is exactly the spectral preconditioner we use for the Newton–Krylov methods, which explains the similar behavior.

¹¹

Since we discuss the implementation of the incompressibility constraint we set γ = 1.

AMS subject classifications. 68U10, 49J20, 35Q93, 65K10, 76D55, 90C20

Contributor Information

Andreas Mang, Email: andreas@ices.utexas.edu.

George Biros, Email: gbiros@acm.org.

References

1. Arsigny V, Commowick O, Pennec X, Ayache N. A Log-Euclidean framework for statistics on diffeomorphisms. Springer; New York: 2006. Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Comput. Sci 4190; pp. 924–931.
2. Ashburner J. A fast diffeomorphic image registration algorithm. NeuroImage. 2007;38:95–113. doi: 10.1016/j.neuroimage.2007.07.007.
3. Ashburner J, Friston KJ. Diffeomorphic registration using geodesic shooting and Gauss-Newton optimisation. NeuroImage. 2011;55:954–967. doi: 10.1016/j.neuroimage.2010.12.049.
4. Beg MF, Miller MI, Trouvé A, Younes L. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int J Comput Vis. 2005;61:139–157.
5. Benzi M, Golub GH, Liesen J. Numerical solution of saddle point problems. Acta Numer. 2005;14:1–137.
6. Benzi M, Haber E, Taralli L. A preconditioning technique for a class of PDE-constrained optimization problems. Adv Comput Math. 2011;35:149–173.
7. Biros G, Doǧan G. A multilevel algorithm for inverse problems with PDE constraints. Inverse Problems. 2008;24.
8. Biros G, Ghattas O. Parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization. Part I: The Krylov–Schur solver. SIAM J Sci Comput. 2005;27:687–713.
9. Biros G, Ghattas O. Parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization. Part II: The Lagrange–Newton solver and its application to optimal control of steady viscous flows. SIAM J Sci Comput. 2005;27:714–739.
10. Borzì A, Ito K, Kunisch K. Optimal control formulation for determining optical flow. SIAM J Sci Comput. 2002;24:818–847.
11. Boyd JP. Chebyshev and Fourier Spectral Methods. Dover; Mineola, NY: 2000.
12. Brezzi F, Fortin M, editors. Mixed and Hybrid Finite Element Methods. Springer; New York: 1991.
13. Broit C. Optimal Registration of Deformed Images. PhD thesis. Computer and Information Science, University of Pennsylvania; Philadelphia: 1981.
14. Burger M, Modersitzki J, Ruthotto L. A hyperelastic regularization energy for image registration. SIAM J Sci Comput. 2013;35:B132–B148.
15. Byrd RH, Curtis FE, Nocedal J. An inexact SQP method for equality constrained optimization. SIAM J Optim. 2008;19:351–369.
16. Chen K, Lorenz DA. Image sequence interpolation using optimal control. J Math Imaging Vision. 2011;41:222–238.
17. Chen K, Lorenz DA. Image sequence interpolation based on optical flow, segmentation and optimal control. IEEE Trans Image Process. 2012;21:1020–1030. doi: 10.1109/TIP.2011.2179305.
18. Christensen GE, Rabbitt RD, Miller MI. 3D brain mapping using a deformable neuroanatomy. Phys Med Biol. 1994;39:609–618. doi: 10.1088/0031-9155/39/3/022.
19. Christensen GE, Rabbitt RD, Miller MI. Deformable templates using large deformation kinematics. IEEE Trans Image Process. 1996;5:1435–1447. doi: 10.1109/83.536892.
20. Dembo RS, Eisenstat SC, Steihaug T. Inexact Newton methods. SIAM J Numer Anal. 1982;19:400–408.
21. Dembo RS, Steihaug T. Truncated-Newton algorithms for large-scale unconstrained optimization. Math Program. 1983;26:190–212.
22. Droske M, Rumpf M. A variational approach to non-rigid morphological registration. SIAM J Appl Math. 2003;64:668–687.
23. Dupuis P, Grenander U, Miller MI. Variational problems on flows of diffeomorphisms for image matching. Quart Appl Math. 1998;56:587–600.
24. Eisenstat SC, Walker HF. Choosing the forcing terms in an inexact Newton method. SIAM J Sci Comput. 1996;17:16–32.
25. Fischer B, Modersitzki J. Fast diffusion registration. Contemp Math. 2002;313:117–129.
26. Fischer B, Modersitzki J. Curvature based image registration. J Math Imaging Vision. 2003;18:81–85.
27. Fischer B, Modersitzki J. Ill-posed medicine—an introduction to image registration. Inverse Problems. 2008;24:1–16.
28. Frohn-Schauf C, Henn S, Witsch K. Multigrid based total variation image registration. Comput Vis Sci. 2008;11:101–113.
29. Gill PE, Murray W, Wright MH. Practical Optimization. Academic Press; Waltham, MA: 1981.
30. Golub GH, Van Loan CF. Matrix Computations. 3rd ed. Johns Hopkins University Press; Baltimore: 1996.
31. Gooya A, Pohl KM, Bilello M, Cirillo L, Biros G, Melhem ER, Davatzikos C. GLISTR: Glioma image segmentation and registration. IEEE Trans Med Imaging. 2013;31:1941–1954. doi: 10.1109/TMI.2012.2210558.
32. Gunzburger MD. Perspectives in Flow Control and Optimization. SIAM; Philadelphia: 2003.
33. Gurtin ME. An Introduction to Continuum Mechanics. Math Sci Engrg 158. Academic Press; New York: 1981.
34. Haber E, Ascher UM. Preconditioned all-at-once methods for large, sparse parameter estimation problems. Inverse Problems. 2001;17:1847–1864.
35. Haber E, Ascher UM, Oldenburg D. On optimization techniques for solving nonlinear inverse problems. Inverse Problems. 2000;16:1263–1280.
36. Haber E, Horesh R, Modersitzki J. Numerical optimization for constrained image registration. Numer Linear Algebra Appl. 2010;17:343–359.
37. Haber E, Modersitzki J. Numerical methods for volume preserving image registration. Inverse Problems. 2004;20:1621–1638.
38. Haber E, Modersitzki J. A multilevel method for image registration. SIAM J Sci Comput. 2006;27:1594–1607.
39. Haber E, Modersitzki J. Image registration with guaranteed displacement regularity. Int J Comput Vis. 2007;71:361–372.
40. Han L, Hipwell JH, Eiben B, Barratt D, Modat M, Ourselin S, Hawkes DJ. A nonlinear biomechanical model based registration method for aligning prone and supine MR breast images. IEEE Trans Med Imaging. 2014;33:682–694. doi: 10.1109/TMI.2013.2294539.
41. Hart GL, Zach C, Niethammer M. An optimal control approach for deformable registration. Proceedings of CVPR, IEEE; 2009; pp. 9–16.
42. Henn S. A multigrid method for a fourth-order diffusion equation with application to image processing. SIAM J Sci Comput. 2005;27:831–849.
43. Hernandez M. Gauss-Newton inspired preconditioned optimization in large deformation diffeomorphic metric mapping. Phys Med Biol. 2014;59:6085–6115. doi: 10.1088/0031-9155/59/20/6085.
44. Hernandez M, Bossa MN, Olmos S. Registration of anatomical images using paths of diffeomorphisms parameterized with stationary vector field flows. Int J Comput Vis. 2009;85:291–306.
45. Hogea C, Davatzikos C, Biros G. Brain-tumor interaction biophysical models for medical image registration. SIAM J Sci Comput. 2008;30:3050–3072.
46. Horn BKP, Schunck BG. Determining optical flow. Artificial Intelligence. 1981;17:185–203.
47. Kalmoun EM, Garrido L, Caselles V. Line search multilevel optimization as computational methods for dense optical flow. SIAM J Imaging Sci. 2011;4:695–722.
48. Lee E, Gunzburger M. An optimal control formulation of an image registration problem. J Math Imaging Vision. 2010;36:69–80.
49. Lee E, Gunzburger M. Analysis of finite element discretization of an optimal control formulation of the image registration problem. SIAM J Numer Anal. 2011;49:1321–1349.
50. Loeckx D, Maes F, Vandermeulen D, Suetens P. Nonrigid image registration using free-form deformations with a local rigidity constraint. Springer; New York: 2004. Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Comput. Sci 3216; pp. 639–646.
51. Mang A, Toma A, Schuetz TA, Becker S, Eckey T, Mohr C, Petersen D, Buzug TM. Biophysical modeling of brain tumor progression: From unconditionally stable explicit time integration to an inverse problem with parabolic PDE constraints for model calibration. Med Phys. 2012;39:4444–4459. doi: 10.1118/1.4722749.
52. Mansi T, Pennec X, Sermesant M, Delingette H, Ayache N. iLogDemons: A Demons-based registration algorithm for tracking incompressible elastic biological tissues. Int J Comput Vis. 2011;92:92–111.
53. Miller MI. Computational anatomy: Shape, growth and atrophy comparison via diffeomorphisms. NeuroImage. 2004;23:S19–S33. doi: 10.1016/j.neuroimage.2004.07.021.
54. Modersitzki J. Numerical Methods for Image Registration. Oxford University Press; New York: 2004.
55. Modersitzki J. FLIRT with rigidity—image registration with a local non-rigidity penalty. Int J Comput Vis. 2008;76:153–163.
56. Modersitzki J. FAIR: Flexible Algorithms for Image Registration. SIAM; Philadelphia: 2009.
57. Museyko O, Stiglmayr M, Klamroth K, Leugering G. On the application of the Monge-Kantorovich problem to image registration. SIAM J Imaging Sci. 2009;2:1068–1097.
58. Nielsen M, Johansen P, Jackson AD, Lautrup B. Brownian warps: A least committed prior for non-rigid registration. Springer; New York: 2002. Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Comput. Sci 2489; pp. 557–564.
59. Nocedal J, Wright SJ. Numerical Optimization. Springer; New York: 2006.
60. Pennec X, Stefanescu R, Arsigny V, Fillard P, Ayache N. Riemannian elasticity: A statistical regularization framework for non-linear registration. Springer; New York: 2005. Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Comput. Sci 3750; pp. 943–950.
61. Rohlfing T, Maurer CR, Bluemke DA, Jacobs MA. Volume-preserving nonrigid registration of MR breast images using free-form deformation with an incompressibility constraint. IEEE Trans Med Imaging. 2003;22:730–741. doi: 10.1109/TMI.2003.814791.
62. Ruhnau P, Schnörr C. Optical Stokes flow estimation: An imaging-based control approach. Exp Fluids. 2007;42:61–78.
63. Rumpf M, Wirth B. A nonlinear elastic shape averaging approach. SIAM J Imaging Sci. 2009;2:800–833.
64. Sdika M. A fast nonrigid image registration with constraints on the Jacobian using large scale constrained optimization. IEEE Trans Med Imaging. 2008;27:271–281. doi: 10.1109/TMI.2007.905820.
65. Sotiras A, Davatzikos C, Paragios N. Deformable medical image registration: A survey. IEEE Trans Med Imaging. 2013;32:1153–1190. doi: 10.1109/TMI.2013.2265603.
66. Sundar H, Davatzikos C, Biros G. Biomechanically constrained 4D estimation of myocardial motion. Springer; New York: 2009. Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Comput. Sci 5762; pp. 257–265.
67. Thirion JP. Image matching as a diffusion process: An analogy with Maxwell's demons. Med Image Anal. 1998;2:243–260. doi: 10.1016/s1361-8415(98)80022-4.
68. Trouvé A. Diffeomorphism groups and pattern matching in image analysis. Int J Comput Vis. 1998;28:213–221.
69. Vercauteren T, Pennec X, Perchant A, Ayache N. Symmetric log-domain diffeomorphic registration: A demons-based approach. Springer; New York: 2008. Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Comput. Sci 5241; pp. 754–761.
70. Vercauteren T, Pennec X, Perchant A, Ayache N. Diffeomorphic demons: Efficient non-parametric image registration. NeuroImage. 2009;45:S61–S72. doi: 10.1016/j.neuroimage.2008.10.040.
71. Vialard FX, Risser L, Rueckert D, Cotter CJ. Diffeomorphic 3D image registration via geodesic shooting using an efficient adjoint calculation. Int J Comput Vis. 2012;97:229–241.
72. Vogel CR. Computational Methods for Inverse Problems. SIAM; Philadelphia: 2002.
73. Yanovsky I, Thompson PM, Osher S, Leow AD. Topology preserving log-unbiased nonlinear image registration: Theory and implementation. Proceedings of CVPR, IEEE; 2007; pp. 1–8.
74. Younes L. Jacobi fields in groups of diffeomorphisms and applications. Quart Appl Math. 2007;65:113–134.

[R1] 1.Arsigny V, Commowick O, Pennec X, Ayache N. A Log-Euclidean framework for statistics on diffeomorphisms. Springer; New York: 2006. Medical Image Computing and Computer-Assisted Intervention, Lect. Notes in Comput. Sci 4190; pp. 924–931. [DOI] [PubMed] [Google Scholar]

[R2] 2.Ashburner J. A fast diffeomorphic image registration algorithm. Neuro Image. 2007;38:95–113. doi: 10.1016/j.neuroimage.2007.07.007. [DOI] [PubMed] [Google Scholar]

[R3] 3.Ashburner J, Friston KJ. Diffeomorphic registration using geodesic shooting and Gauss-Newton optimisation. Neuro Image. 2011;55:954–967. doi: 10.1016/j.neuroimage.2010.12.049. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] 4.Beg MF, Miller MI, Trouvé A, Younes L. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int J Comput Vis. 2005;61:139–157. [Google Scholar]

[R5] 5.Benzi M, Golub GH, Liesen J. Numerical solution of saddle point problems. Acta Numer. 2005;14:1–137. [Google Scholar]

[R6] 6.Benzi M, Haber E, Taralli L. A preconditioning technique for a class of PDE-constrained optimization problems. Adv Comput Math. 2011;35:149–173. [Google Scholar]

[R7] 7.Biros G, Doǧan G. A multilevel algorithm for inverse problems with PDE constraints. Inverse Problems. 2008;24 [Google Scholar]

[R8] 8.Biros G, Ghattas O. Parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization. Part I. The Krylov–Schur Solver, SIAM. J Sci Comput. 2005;27:687–713. [Google Scholar]

[R9] 9.Biros G, Ghattas O. Parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization. Part II. The Lagrange–Newton Solver and its application to Optimal Control of Steady Viscous Flows, SIAM. J Sci Comput. 2005;27:714–739. [Google Scholar]

[R10] 10.Borzì A, Ito K, Kunisch K. Optimal control formulation for determining optical flow. SIAMJ Sci Comput. 2002;24:818–847. [Google Scholar]

[R11] 11.Boyd JP. Chebyshev and Fourier Spectral Methods. Dover; Mineola, NY: 2000. [Google Scholar]

[R12] 12.Brezzi F, Fortin M, editors. Mixed and Hybrid Finite Element Methods. Springer; New York: 1991. [Google Scholar]

[R13] 13.Broit C. PhD thesis. Computer and Information Science, University of Pennsylvania; Philadelphia: 1981. Optimal Registration of Deformed Images. [Google Scholar]

[R14] 14.Burger M, Modersitzki J, Ruthotto L. A hyperelastic regularization energy for image registration. SIAM J Sci Comput. 2013;35:B132–B148. [Google Scholar]

[R15] 15.Byrd RH, Curtis FE, Nocedal J. An inexact SQP method for equality constrained optimization. SIAM J Optim. 2008;19:351–369. [Google Scholar]

[R16] 16.Chen K, Lorenz DA. Image sequence interpolation using optimal control. J Math Imaging Vision. 2011;41:222–238. [Google Scholar]

[R17] 17.Chen K, Lorenz DA. Image sequence interpolation based on optical flow, segmentation and optimal control. IEEE Trans Image Process. 2012;21:1020–1030. doi: 10.1109/TIP.2011.2179305. [DOI] [PubMed] [Google Scholar]

[R18] 18.Christensen GE, Rabbitt RD, Miller MI. 3D brain mapping using a deformable neuroanatomy. Phys Med Biol. 1994;39:609–618. doi: 10.1088/0031-9155/39/3/022. [DOI] [PubMed] [Google Scholar]

[R19] 19.Christensen GE, Rabbitt RD, Miller MI. Deformable templates using large deformation kinematics. IEEE Trans Image Process. 1996;5:1435–1447. doi: 10.1109/83.536892. [DOI] [PubMed] [Google Scholar]

[R20] 20. Dembo RS, Eisenstat SC, Steihaug T. Inexact Newton methods. SIAM J Numer Anal. 1982;19:400–408.

[R21] 21. Dembo RS, Steihaug T. Truncated-Newton algorithms for large-scale unconstrained optimization. Math Program. 1983;26:190–212.

[R22] 22. Droske M, Rumpf M. A variational approach to non-rigid morphological registration. SIAM J Appl Math. 2003;64:668–687.

[R23] 23. Dupuis P, Grenander U, Miller MI. Variational problems on flows of diffeomorphisms for image matching. Quart Appl Math. 1998;56:587–600.

[R24] 24. Eisenstat SC, Walker HF. Choosing the forcing terms in an inexact Newton method. SIAM J Sci Comput. 1996;17:16–32.

[R25] 25. Fischer B, Modersitzki J. Fast diffusion registration. Contemp Math. 2002;313:117–129.

[R26] 26. Fischer B, Modersitzki J. Curvature based image registration. J Math Imaging Vision. 2003;18:81–85.

[R27] 27. Fischer B, Modersitzki J. Ill-posed medicine—an introduction to image registration. Inverse Problems. 2008;24:1–16.

[R28] 28. Frohn-Schauf C, Henn S, Witsch K. Multigrid based total variation image registration. Comput Vis Sci. 2008;11:101–113.

[R29] 29. Gill PE, Murray W, Wright MH. Practical Optimization. Academic Press; Waltham, MA: 1981.

[R30] 30. Golub GH, Van Loan CF. Matrix Computations. 3rd ed. Johns Hopkins University Press; Baltimore: 1996.

[R31] 31. Gooya A, Pohl KM, Bilello M, Cirillo L, Biros G, Melhem ER, Davatzikos C. GLISTR: Glioma image segmentation and registration. IEEE Trans Med Imaging. 2013;31:1941–1954. doi: 10.1109/TMI.2012.2210558.

[R32] 32. Gunzburger MD. Perspectives in Flow Control and Optimization. SIAM; Philadelphia: 2003.

[R33] 33. Gurtin ME. An Introduction to Continuum Mechanics. Math Sci Engrg 158. Academic Press; New York: 1981.

[R34] 34. Haber E, Ascher UM. Preconditioned all-at-once methods for large, sparse parameter estimation problems. Inverse Problems. 2001;17:1847–1864.

[R35] 35. Haber E, Ascher UM, Oldenburg D. On optimization techniques for solving nonlinear inverse problems. Inverse Problems. 2000;16:1263–1280.

[R36] 36. Haber E, Horesh R, Modersitzki J. Numerical optimization for constrained image registration. Numer Linear Algebra Appl. 2010;17:343–359.

[R37] 37. Haber E, Modersitzki J. Numerical methods for volume preserving image registration. Inverse Problems. 2004;20:1621–1638.

[R38] 38. Haber E, Modersitzki J. A multilevel method for image registration. SIAM J Sci Comput. 2006;27:1594–1607.

[R39] 39. Haber E, Modersitzki J. Image registration with guaranteed displacement regularity. Int J Comput Vis. 2007;71:361–372.

[R40] 40. Han L, Hipwell JH, Eiben B, Barratt D, Modat M, Ourselin S, Hawkes DJ. A nonlinear biomechanical model based registration method for aligning prone and supine MR breast images. IEEE Trans Med Imaging. 2014;33:682–694. doi: 10.1109/TMI.2013.2294539.

[R41] 41. Hart GL, Zach C, Niethammer M. An optimal control approach for deformable registration. Proceedings of CVPR, IEEE; 2009; pp. 9–16.

[R42] 42. Henn S. A multigrid method for a fourth-order diffusion equation with application to image processing. SIAM J Sci Comput. 2005;27:831–849.

[R43] 43. Hernandez M. Gauss-Newton inspired preconditioned optimization in large deformation diffeomorphic metric mapping. Phys Med Biol. 2014;59:6085–6115. doi: 10.1088/0031-9155/59/20/6085.

[R44] 44. Hernandez M, Bossa MN, Olmos S. Registration of anatomical images using paths of diffeomorphisms parameterized with stationary vector field flows. Int J Comput Vis. 2009;85:291–306.

[R45] 45. Hogea C, Davatzikos C, Biros G. Brain-tumor interaction biophysical models for medical image registration. SIAM J Sci Comput. 2008;30:3050–3072.

[R46] 46. Horn BKP, Schunck BG. Determining optical flow. Artificial Intelligence. 1981;17:185–203.

[R47] 47. Kalmoun EM, Garrido L, Caselles V. Line search multilevel optimization as computational methods for dense optical flow. SIAM J Imaging Sci. 2011;4:695–722.

[R48] 48. Lee E, Gunzburger M. An optimal control formulation of an image registration problem. J Math Imaging Vision. 2010;36:69–80.

[R49] 49. Lee E, Gunzburger M. Analysis of finite element discretization of an optimal control formulation of the image registration problem. SIAM J Numer Anal. 2011;49:1321–1349.

[R50] 50. Loeckx D, Maes F, Vandermeulen D, Suetens P. Nonrigid image registration using free-form deformations with a local rigidity constraint. Springer; New York: 2004. Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Comput. Sci 3216; pp. 639–646.

[R51] 51. Mang A, Toma A, Schuetz TA, Becker S, Eckey T, Mohr C, Petersen D, Buzug TM. Biophysical modeling of brain tumor progression: From unconditionally stable explicit time integration to an inverse problem with parabolic PDE constraints for model calibration. Med Phys. 2012;39:4444–4459. doi: 10.1118/1.4722749.

[R52] 52. Mansi T, Pennec X, Sermesant M, Delingette H, Ayache N. iLogDemons: A Demons-based registration algorithm for tracking incompressible elastic biological tissues. Int J Comput Vis. 2011;92:92–111.

[R53] 53. Miller MI. Computational anatomy: Shape, growth and atrophy comparison via diffeomorphisms. NeuroImage. 2004;23:S19–S33. doi: 10.1016/j.neuroimage.2004.07.021.

[R54] 54. Modersitzki J. Numerical Methods for Image Registration. Oxford University Press; New York: 2004.

[R55] 55. Modersitzki J. FLIRT with rigidity—image registration with a local non-rigidity penalty. Int J Comput Vis. 2008;76:153–163.

[R56] 56. Modersitzki J. FAIR: Flexible Algorithms for Image Registration. SIAM; Philadelphia: 2009.

[R57] 57. Museyko O, Stiglmayr M, Klamroth K, Leugering G. On the application of the Monge-Kantorovich problem to image registration. SIAM J Imaging Sci. 2009;2:1068–1097.

[R58] 58. Nielsen M, Johansen P, Jackson AD, Lautrup B. Brownian warps: A least committed prior for non-rigid registration. Springer; New York: 2002. Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Comput. Sci 2489; pp. 557–564.

[R59] 59. Nocedal J, Wright SJ. Numerical Optimization. Springer; New York: 2006.

[R60] 60. Pennec X, Stefanescu R, Arsigny V, Fillard P, Ayache N. Riemannian elasticity: A statistical regularization framework for non-linear registration. Springer; New York: 2005. Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Comput. Sci 3750; pp. 943–950.

[R61] 61. Rohlfing T, Maurer CR, Bluemke DA, Jacobs MA. Volume-preserving nonrigid registration of MR breast images using free-form deformation with an incompressibility constraint. IEEE Trans Med Imaging. 2003;22:730–741. doi: 10.1109/TMI.2003.814791.

[R62] 62. Ruhnau P, Schnörr C. Optical Stokes flow estimation: An imaging-based control approach. Exp Fluids. 2007;42:61–78.

[R63] 63. Rumpf M, Wirth B. A nonlinear elastic shape averaging approach. SIAM J Imaging Sci. 2009;2:800–833.

[R64] 64. Sdika M. A fast nonrigid image registration with constraints on the Jacobian using large scale constrained optimization. IEEE Trans Med Imaging. 2008;27:271–281. doi: 10.1109/TMI.2007.905820.

[R65] 65. Sotiras A, Davatzikos C, Paragios N. Deformable medical image registration: A survey. IEEE Trans Med Imaging. 2013;32:1153–1190. doi: 10.1109/TMI.2013.2265603.

[R66] 66. Sundar H, Davatzikos C, Biros G. Biomechanically constrained 4D estimation of myocardial motion. Springer; New York: 2009. Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Comput. Sci 5762; pp. 257–265.

[R67] 67. Thirion JP. Image matching as a diffusion process: An analogy with Maxwell’s demons. Med Image Anal. 1998;2:243–260. doi: 10.1016/s1361-8415(98)80022-4.

[R68] 68. Trouvé A. Diffeomorphism groups and pattern matching in image analysis. Int J Comput Vis. 1998;28:213–221.

[R69] 69. Vercauteren T, Pennec X, Perchant A, Ayache N. Symmetric log-domain diffeomorphic registration: A demons-based approach. Springer; New York: 2008. Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Comput. Sci 5241; pp. 754–761.

[R70] 70. Vercauteren T, Pennec X, Perchant A, Ayache N. Diffeomorphic demons: Efficient non-parametric image registration. NeuroImage. 2009;45:S61–S72. doi: 10.1016/j.neuroimage.2008.10.040.

[R71] 71. Vialard FX, Risser L, Rueckert D, Cotter CJ. Diffeomorphic 3D image registration via geodesic shooting using an efficient adjoint calculation. Int J Comput Vis. 2012;97:229–241.

[R72] 72. Vogel CR. Computational Methods for Inverse Problems. SIAM; Philadelphia: 2002.

[R73] 73. Yanovsky I, Thompson PM, Osher S, Leow AD. Topology preserving log-unbiased nonlinear image registration: Theory and implementation. Proceedings CVPR, IEEE; 2007; pp. 1–8.

[R74] 74. Younes L. Jacobi fields in groups of diffeomorphisms and applications. Quart Appl Math. 2007;65:113–134.

