Author manuscript; available in PMC: 2018 Nov 21.
Published in final edited form as: SIAM J Sci Comput. 2017 Nov 21;39(6):B1064–B1101. doi: 10.1137/16M1070475

A SEMI-LAGRANGIAN TWO-LEVEL PRECONDITIONED NEWTON-KRYLOV SOLVER FOR CONSTRAINED DIFFEOMORPHIC IMAGE REGISTRATION

Andreas Mang *, George Biros *
PMCID: PMC5731678  NIHMSID: NIHMS923830  PMID: 29255342

Abstract

We propose an efficient numerical algorithm for the solution of diffeomorphic image registration problems. We use a variational formulation constrained by a partial differential equation (PDE), where the constraint is a scalar transport equation.

We use a pseudospectral discretization in space and a second-order accurate semi-Lagrangian time-stepping scheme for the transport equations. We solve for a stationary velocity field using a preconditioned, globalized, matrix-free Newton-Krylov scheme. We propose and test a two-level Hessian preconditioner. We consider two strategies for inverting the preconditioner on the coarse grid: a nested preconditioned conjugate gradient method (exact solve) and a nested Chebyshev iterative method (inexact solve) with a fixed number of iterations.

We test the performance of our solver in different synthetic and real-world two-dimensional application scenarios. We study grid convergence and computational efficiency of our new scheme. We compare the performance of our solver against our initial implementation that uses the same spatial discretization but a standard, explicit, second-order Runge-Kutta scheme for the numerical time integration of the transport equations and a single-level preconditioner. Our improved scheme delivers significant speedups over our original implementation. As a highlight, we observe a 20× speedup for a two-dimensional, real-world multi-subject medical image registration problem.

Keywords: Newton, Krylov method, semi-Lagrangian formulation, KKT preconditioners, constrained diffeomorphic image registration, stationary velocity field registration, optimal control, PDE constrained optimization

AMS subject classifications: 68U10, 49J20, 35Q93, 65K10, 65F08, 76D55

1. Introduction

Image registration finds numerous applications in image analysis and computer vision [45,67]. Image registration establishes meaningful spatial correspondence between two images $m_R:\bar\Omega\to\mathbb{R}$ (the “reference image”) and $m_T:\bar\Omega\to\mathbb{R}$ (the “template image”) of a scene such that the deformed template image $m_T\circ y$ becomes similar to $m_R$, i.e., $m_T\circ y \approx m_R$ [61]; the images are defined on an open set $\Omega\subset\mathbb{R}^d$, $d\in\{2,3\}$, with closure $\bar\Omega := \Omega\cup\partial\Omega$ and boundary $\partial\Omega$, $\circ$ denotes the composition of two functions, and $y:\bar\Omega\to\bar\Omega$ is the sought-after deformation map. There exist various approaches to image registration; we refer to [33,61,67] for a lucid overview.

Image registration is typically formulated as a variational optimization problem with an objective functional that consists of a data fidelity term and a Tikhonov-type regularization norm [3]; the unregularized problem is ill-posed. Here, we follow up on our preceding work on constrained diffeomorphic image registration [58, 59]. In diffeomorphic image registration we require that the map y is a diffeomorphism, i.e., y is a bijection, continuously differentiable, and has a continuously differentiable inverse. Formally, we require that $\det\nabla y \neq 0$, $\nabla y\in\mathbb{R}^{d\times d}$, for all $x\in\Omega$ and, under the assumption that y is orientation preserving, $\det\nabla y > 0$ for all $x\in\Omega$.
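The orientation-preserving condition $\det\nabla y > 0$ can be checked pointwise on a grid. The sketch below is a minimal illustration only: it assumes a hypothetical periodic displacement parameterization $y(x) = x + u(x)$ on a uniform 2D grid of $(-\pi,\pi)^2$ and uses second-order central differences; the function names are illustrative and are not taken from the authors' code.

```python
import numpy as np


def jacobian_determinant_2d(u1, u2, h):
    """det(grad y) for the hypothetical map y(x) = x + u(x) on a periodic 2D grid.

    u1, u2: displacement components on an (n1, n2) grid (axis 0 ~ x1, axis 1 ~ x2).
    h: grid spacing (h1, h2). Derivatives use periodic central differences.
    """
    def d(f, axis):
        # (f[j+1] - f[j-1]) / (2 h) with periodic wrap-around
        return (np.roll(f, -1, axis=axis) - np.roll(f, 1, axis=axis)) / (2.0 * h[axis])

    # grad y = I + grad u
    a11 = 1.0 + d(u1, 0)
    a12 = d(u1, 1)
    a21 = d(u2, 0)
    a22 = 1.0 + d(u2, 1)
    return a11 * a22 - a12 * a21


def is_orientation_preserving(u1, u2, h):
    """True if det(grad y) > 0 at every grid point."""
    return bool(np.all(jacobian_determinant_2d(u1, u2, h) > 0.0))
```

A zero displacement gives $\det\nabla y = 1$ everywhere, whereas a large displacement such as $u_1 = 2\sin x_1$ folds the grid near $x_1 = \pm\pi$ and is flagged as not orientation preserving.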

Different approaches to guarantee a diffeomorphic y have appeared in the past. One approach is to penalize det ∇y as done in [22, 30, 42, 43, 65]. Another approach is to change the formulation; instead of inverting directly for the deformation map y, we invert for its velocity $v = \partial_t y$. If v is sufficiently smooth, it can be guaranteed that the resulting y is a diffeomorphism [10,31,70]. In our formulation, we augment this type of smoothness regularization by constraints on the divergence of v [58, 59]. For instance, for ∇ · v = 0 the flow becomes incompressible. This is equivalent to enforcing det ∇y = 1 [40, pages 77ff.].

Velocity field formulations for diffeomorphic image registration can be distinguished between approaches that invert for a time-dependent v [10, 31, 46, 58] and approaches that invert for a stationary v [48, 57, 59]. We invert for a stationary v. We formulate the diffeomorphic image registration problem as a PDE constrained optimization problem, where the constraint is a transport equation for the scalar field $m:\bar\Omega\times[0,1]\to\mathbb{R}$ (the image intensities). Due to ill-conditioning, non-convexity, large problem size, infinite-dimensional structure, and the need for adjoint operators, such problems are challenging to solve. We use a reduced space Newton–Krylov method [58]. In reduced space methods we eliminate the state variables (in our case the transported image) and iterate in the control variable space (in our case the velocity space). Newton methods typically display faster convergence than gradient-descent methods (see [58]). Using a Newton method, however, requires solving linear systems with the reduced space Hessian, which, upon discretization, is a large, dense, and ill-conditioned operator. Efficient preconditioning is critical for making our solver effective across a wide spectrum of image resolutions, regularization weights, and inversion tolerances. Standard preconditioning techniques like incomplete factorization cannot be applied, since we do not have access to the matrix entries (they are too expensive to compute). Instead, we present a matrix-free, two-level preconditioner for the reduced space Hessian that significantly improves performance. Another computational challenge of our formulation is that the reduced space formulation requires the exact solution of two hyperbolic transport equations (the state and adjoint equations of our problem) every time we evaluate the reduced gradient or apply the reduced space Hessian operator. We introduce a semi-Lagrangian formulation to further speed up our solver.

1.1. Outline of the Method

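We are given two functions $m_R:\bar\Omega\to\mathbb{R}$ (fixed image) and $m_T:\bar\Omega\to\mathbb{R}$ (deformable image) compactly supported on an open set $\Omega := (-\pi,\pi)^d$, $d\in\{2,3\}$, with boundary $\partial\Omega$ and closure $\bar\Omega := \Omega\cup\partial\Omega$. We solve for a stationary velocity field $v\in\mathcal{U}$ and a mass source $w\in\mathcal{W}$ as follows [59]: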

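$$\min_{m,v,w}\ \frac{1}{2}\|m_R - m_1\|^2_{L^2(\Omega)} + \frac{\beta_v}{2}\|v\|^2_{\mathcal{V}} + \frac{\beta_w}{2}\|w\|^2_{\mathcal{W}} \tag{1a}$$

subject to

$$\partial_t m + \nabla m\cdot v = 0 \quad\text{in } \Omega\times(0,1], \tag{1b}$$
$$m = m_T \quad\text{in } \Omega\times\{0\}, \tag{1c}$$
$$\nabla\cdot v = w \quad\text{in } \Omega, \tag{1d}$$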

and periodic boundary conditions on $\partial\Omega$. In our formulation $m_1(x) := m(x, t = 1)$, i.e., the solution of the hyperbolic transport equation (1b) with initial condition (1c), is equivalent to $m_T\circ y$; the deformation map y can be computed from v in a post-processing step (see, e.g., [58, 59]). The weights βv > 0 and βw > 0 control the regularity of v.

The regularization norm for v not only alleviates issues related to the ill-posedness of our problem but also, if chosen appropriately, ensures the existence of a diffeomorphism y parameterized by v. The constraint in (1d) allows us to control volume change; setting w = 0 results in an incompressible diffeomorphism y, i.e., the Jacobian determinant det ∇y is fixed to one for all x ∈ Ω. The deformation map y is no longer incompressible if we allow w to deviate from zero (this formulation was originally introduced in [59]; a similar formulation can be found in [19]). We can control this deviation with βw; the regularization norm for w acts like a penalty on ∇ · v. We specify and discuss the choices for the spaces $\mathcal{U}$, $\mathcal{V}$, and $\mathcal{W}$ in more detail in §2 and §A.

We use the method of Lagrange multipliers to solve (1). We first formally derive the optimality conditions and then discretize using a pseudospectral discretization in space with a Fourier basis (i.e., we use an optimize-then-discretize approach; see §3). We solve for the first-order optimality conditions using a globalized, matrix-free, preconditioned, inexact Newton–Krylov algorithm for the velocity field v (see [58] for details). The hyperbolic transport equations are solved via a semi-Lagrangian method.
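Each Newton step requires solving a linear system with the reduced space Hessian, which is only available through matvecs. A minimal sketch of a matrix-free preconditioned conjugate gradient (PCG) solver follows; it assumes that the operator and the preconditioner are supplied as callables (`apply_h` and `apply_pinv` are hypothetical names) and uses a relative-residual stopping rule. It is an illustration of the technique, not the authors' implementation.

```python
import numpy as np


def pcg(apply_h, rhs, apply_pinv, rtol=1e-6, maxiter=200):
    """Matrix-free preconditioned CG for H x = rhs with H symmetric positive definite.

    apply_h(x): returns H x (e.g., a Hessian matvec built from PDE solves).
    apply_pinv(r): returns P^{-1} r (must be symmetric positive definite).
    Stops when ||r_k|| <= rtol * ||rhs||; returns (x, iterations).
    """
    x = np.zeros_like(rhs)
    bnorm = np.linalg.norm(rhs)
    if bnorm == 0.0:
        return x, 0
    r = rhs.copy()
    z = apply_pinv(r)
    p = z.copy()
    rz = np.dot(r, z)
    for it in range(1, maxiter + 1):
        hp = apply_h(p)                 # the only access to H: one matvec per iteration
        alpha = rz / np.dot(p, hp)
        x += alpha * p
        r -= alpha * hp
        if np.linalg.norm(r) <= rtol * bnorm:
            return x, it
        z = apply_pinv(r)
        rz_new = np.dot(r, z)
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter
```

Because only matvecs are needed, `apply_h` could wrap the PDE solves of a Hessian matvec; in an inexact Newton method, `rtol` would typically be tied to the current gradient norm.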

1.2. Contributions

Our Newton–Krylov scheme was originally described in [58], in which we compared it to a gradient-descent approach in the Sobolev space induced by the regularization operator (this approach is used, e.g., in [46]). As expected, the gradient-descent approach is extremely slow and not competitive with (Gauss–)Newton schemes. In [59] we introduced and studied different regularization functionals, and compared the performance of our method against existing approaches for diffeomorphic image registration, in particular the Demons family of algorithms [71, 72]. Here we extend our preceding work in the following ways:

  • We propose a semi-Lagrangian formulation for our entire optimality system, i.e., the state, adjoint, and incremental state and adjoint equations. We compare it with a stabilized Runge–Kutta method, which we also introduce here; we show that the semi-Lagrangian scheme has excellent stability properties.

  • We introduce an improved preconditioner for the reduced Hessian system. It is a two-level preconditioner that uses spectral restriction and prolongation operators and a Chebyshev stationary iterative method for an approximate coarse grid solve.

  • We provide an experimental study of the performance of our improved numerical scheme based on synthetic and real-world problems. We study self-convergence, grid convergence, numerical accuracy, convergence as a function of the regularization parameter, and the time to solution. We account for different constraints and regularization norms.
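The spectral restriction and prolongation operators named in the second item can be realized by truncating or zero-padding Fourier coefficients. A minimal 2D sketch, assuming grid nodes $x_j = -\pi + jh$, NumPy's unnormalized FFT convention, and that the coarse Nyquist mode is discarded (our choice for this illustration); the function names are hypothetical:

```python
import numpy as np


def _mode_indices(n_fine, n_coarse):
    """Index pairs (coarse, fine) of the Fourier modes shared by both grids.

    The coarse Nyquist mode is dropped so that restriction and prolongation
    act on the same set of wavenumbers |k| < n_coarse / 2.
    """
    kc = np.rint(np.fft.fftfreq(n_coarse, d=1.0 / n_coarse)).astype(int)
    keep = np.abs(kc) < n_coarse / 2
    return np.nonzero(keep)[0], np.mod(kc[keep], n_fine)


def restrict(uf, nc):
    """Spectral restriction: truncate the Fourier series of uf to the coarse grid nc."""
    nf = uf.shape
    (c0, f0), (c1, f1) = _mode_indices(nf[0], nc[0]), _mode_indices(nf[1], nc[1])
    uhat_c = np.zeros(nc, dtype=complex)
    # numpy's unnormalized FFT scales coefficients by the number of grid points
    uhat_c[np.ix_(c0, c1)] = np.fft.fft2(uf)[np.ix_(f0, f1)] * (nc[0] * nc[1]) / (nf[0] * nf[1])
    return np.real(np.fft.ifft2(uhat_c))


def prolong(uc, nf):
    """Spectral prolongation: zero-pad the Fourier series of uc to the fine grid nf."""
    nc = uc.shape
    (c0, f0), (c1, f1) = _mode_indices(nf[0], nc[0]), _mode_indices(nf[1], nc[1])
    uhat_f = np.zeros(nf, dtype=complex)
    uhat_f[np.ix_(f0, f1)] = np.fft.fft2(uc)[np.ix_(c0, c1)] * (nf[0] * nf[1]) / (nc[0] * nc[1])
    return np.real(np.fft.ifft2(uhat_f))
```

For band-limited fields, prolongation exactly inverts restriction, while modes above the coarse-grid cutoff are removed by the restriction.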

Taken together, the new algorithm results in order-of-magnitude speedups over the state of the art. For example, for a magnetic resonance image of a brain with $512^2$ resolution, the new scheme is 18× faster (see Tab. 8 in §4) than the scheme described in [59].

Table 8.

Convergence results for different strategies to precondition the reduced space KKT system. We report results for two preconditioners: our original preconditioner based on the regularization operator (PREG) and the proposed, nested preconditioner (P2L). We solve the reduced space KKT system via a PCG method with a tolerance of 1E−6. For the nested preconditioner, we use two different solvers to invert the preconditioner: a PCG method with a tolerance that is 1E−1 times the tolerance of the PCG method used to solve the reduced space KKT system (i.e., a tolerance of 1E−7) and a CHEB method with a fixed number of 10 iterations. We consider a compressible diffeomorphism with an H2-regularization model. We report results for different images (SMOOTH A, BRAIN, and HAND), for different regularization weights βv, and for varying grid sizes $n_x$ (grid convergence; number of unknowns $n = 2n_{x_1}n_{x_2}$). We solve the reduced space KKT system at the true solution $v_h$; the velocity field $v_h$ corresponds to the test problem SMOOTH A. We use the RK2A method with a CFL number of 0.2 for the regularization preconditioner and the SL scheme with a CFL number of 5 for the nested preconditioner. We report (i) the number of PCG iterations until convergence, (ii) the time spent on the Hessian matvecs (in seconds), (iii) the percentage of that time spent on inverting the preconditioner (if applicable), and (iv) the speedup compared to our original preconditioner (regularization preconditioner in combination with the RK2A scheme).

SMOOTH A HAND BRAIN

n βv P PDE solver PC solver run iter time % PC speedup run iter time % PC speedup run iter time % PC speedup
8192 1E−1 PREG RK2A(0.2) #1 4 2.67 #2 19 7.12 #3 21 8.39
P2L SL(5) PCG(1E−1) #4 2 3.08 64.04% 8.66E−1 #5 7 1.67E+1 90.42% 4.27E−1 #6 7 1.66E+1 91.15% 5.05E−1
P2L SL(5) CHEB(10) #7 2 2.36 55.81% 1.13 #8 6 6.19 75.56% 1.15 #9 7 6.44 75.69% 1.30
1E−2 PREG RK2A(0.2) #10 6 3.14 #11 47 1.59E+1 #12 53 1.76E+1
P2L SL(5) PCG(1E−1) #13 2 4.31 73.56% 7.29E−1 #14 8 4.60E+1 95.95% 3.47E−1 #15 8 4.39E+1 96.10% 3.99E−1
P2L SL(5) CHEB(10) #16 3 3.19 62.15% 9.83E−1 #17 8 8.26 79.30% 1.93 #18 7 6.98 76.15% 2.52
1E−3 PREG RK2A(0.2) #19 16 6.11 #20 138 4.26E+1 #21 161 4.96E+1
P2L SL(5) PCG(1E−1) #22 2 7.77 84.55% 7.86E−1 #23 10 1.66E+2 98.69% 2.56E−1 #24 11 2.24E+2 98.98% 2.22E−1
P2L SL(5) CHEB(10) #25 6 5.93 75.31% 1.03 #26 24 2.16E+1 83.93% 1.97 #27 22 2.04E+1 82.94% 2.43

32768 1E−1 PREG RK2A(0.2) #28 4 1.89E+1 #29 22 4.31E+1 #30 26 5.93E+1
P2L SL(5) PCG(1E−1) #31 2 3.53 47.19% 5.37 #32 7 2.73E+1 85.64% 1.58 #33 7 2.95E+1 88.16% 2.01
P2L SL(5) CHEB(10) #34 2 4.16 50.82% 4.56 #35 6 9.70 64.92% 4.44 #36 6 9.04 71.28% 6.56
1E−2 PREG RK2A(0.2) #37 6 2.35E+1 #38 54 1.07E+2 #39 74 1.56E+2
P2L SL(5) PCG(1E−1) #40 3 7.14 64.44% 3.29 #41 7 6.50E+1 94.88% 1.65 #42 7 7.24E+1 96.04% 2.16
P2L SL(5) CHEB(10) #43 3 5.16 55.89% 4.55 #44 9 1.26E+1 69.26% 8.51 #45 10 1.44E+1 69.92% 1.09E+1
1E−3 PREG RK2A(0.2) #46 16 3.85E+1 #47 160 3.45E+2 #48 224 5.47E+2
P2L SL(5) PCG(1E−1) #49 3 1.27E+1 77.88% 3.04 #50 9 2.53E+2 98.27% 1.36 #51 10 3.49E+2 98.82% 1.57
P2L SL(5) CHEB(10) #52 6 8.51 67.56% 4.52 #53 27 3.92E+1 73.91% 8.80 #54 31 4.10E+1 76.74% 1.34E+1

131072 1E−1 PREG RK2A(0.2) #55 4 6.62E+1 #56 25 2.71E+2 #57 33 3.52E+2
P2L SL(5) PCG(1E−1) #58 2 1.24E+1 31.60% 5.33 #59 6 7.17E+1 78.58% 3.77 #60 6 1.02E+2 82.62% 3.46
P2L SL(5) CHEB(10) #61 2 1.33E+1 46.49% 4.97 #62 5 2.58E+1 53.36% 1.05E+1 #63 5 3.07E+1 56.96% 1.15E+1
1E−2 PREG RK2A(0.2) #64 6 8.38E+1 #65 63 6.61E+2 #66 92 9.42E+2
P2L SL(5) PCG(1E−1) #67 2 1.56E+1 39.61% 5.37 #68 7 2.17E+2 92.20% 3.05 #69 7 2.51E+2 94.07% 3.76
P2L SL(5) CHEB(10) #70 3 2.06E+1 47.60% 4.06 #71 11 5.75E+1 57.39% 1.15E+1 #72 12 5.86E+1 59.85% 1.61E+1
1E−3 PREG RK2A(0.2) #73 16 1.66E+2 #74 188 1.74E+3 #75 279 2.29E+3
P2L SL(5) PCG(1E−1) #76 3 3.08E+1 75.16% 5.39 #77 9 6.73E+2 97.64% 2.58 #78 8 9.54E+2 98.19% 2.40
P2L SL(5) CHEB(10) #79 6 3.04E+1 57.77% 5.46 #80 33 1.37E+2 65.15% 1.27E+1 #81 38 1.74E+2 63.03% 1.32E+1

524288 1E−1 PREG RK2A(0.2) #82 4 5.40E+2 #83 25 1.97E+3 #84 37 2.68E+3
P2L SL(5) PCG(1E−1) #85 2 6.20E+1 29.01% 8.71 #86 5 3.54E+2 77.07% 5.58 #87 5 4.25E+2 83.31% 6.31
P2L SL(5) CHEB(10) #88 2 7.15E+1 35.60% 7.55 #89 4 1.21E+2 46.50% 1.63E+1 #90 5 1.60E+2 50.11% 1.68E+1
1E−2 PREG RK2A(0.2) #91 6 6.49E+2 #92 67 4.61E+3 #93 103 7.12E+3
P2L SL(5) PCG(1E−1) #94 2 7.12E+1 36.97% 9.11 #95 6 8.72E+2 90.68% 5.29 #96 6 1.20E+3 93.45% 5.92
P2L SL(5) CHEB(10) #97 3 8.31E+1 38.18% 7.81 #98 11 2.96E+2 55.59% 1.56E+1 #99 14 3.51E+2 56.00% 2.03E+1
1E−3 PREG RK2A(0.2) #100 16 1.31E+3 #101 196 1.31E+4 #102 310 2.10E+4
P2L SL(5) PCG(1E−1) #103 2 1.27E+2 59.84% 1.03E+1 #104 7 2.93E+3 96.85% 4.47 #105 7 4.58E+3 97.99% 4.58
P2L SL(5) CHEB(10) #106 6 1.68E+2 49.40% 7.77 #107 35 8.89E+2 60.45% 1.47E+1 #108 46 1.17E+3 61.08% 1.80E+1

1.3. Limitations and Unresolved Issues

Several limitations and unresolved issues remain. We assume similar intensity statistics for mR and mT. This is a common assumption in many deformable registration algorithms. For multimodal registration problems we have to replace the squared L2-distance in (1a) with a more involved distance measure; examples can be found in [61, 67]. We present results only for d = 2. Nothing in our formulation and numerical approximation is specific to the two-dimensional case. In this work we discuss improvements of the algorithm used in our preceding work [58,59] en route to an effective three-dimensional solver. Once this three-dimensional solver is available, we will extend the study presented in [58] by providing a detailed comparison of our method against diffeomorphic image registration approaches of other groups in terms of efficiency and inversion accuracy.

1.4. Related Work

The body of literature on diffeomorphic image registration, numerical optimization in optimal control, preconditioning of KKT systems, and the effective solution of hyperbolic transport equations is extensive. We limit the discussion to work that is most relevant to ours.

1.4.1. Diffeomorphic Image Registration

Lucid overviews for image registration can be found in [33, 62,67]. Related work on velocity field based diffeomorphic image registration is discussed in [5,6,10,19,46, 55,58,59] and references therein. Related optimal control formulations for image registration are described in [9, 13, 19, 25, 55, 58, 59, 64, 66, 73]. Most work on velocity-based diffeomorphic registration considers first-order information for numerical optimization (see, e.g., [10,19,23,25,46,55,73]), with the exceptions of our own work [58, 59] and [6, 13, 47, 66]; only [13, 66] discuss preconditioning strategies (see also below). The application of a Newton–Krylov solver for incompressible and near-incompressible formulations (with an additional control on a mass-source term) for diffeomorphic image registration is, to the best of our knowledge, exclusive to our group [58,59].

1.4.2. PDE Constrained Optimization

There exists a huge body of literature on the numerical solution of PDE constrained optimization problems. The numerical implementation of an efficient solver is, in many cases, tailored to the nature of the control problem, e.g., by accounting for the type and structure of the PDE constraints; see for instance [1, 15] (elliptic), [2, 37, 60, 69] (parabolic), or [13, 19, 55] (hyperbolic). We refer to [14, 21, 39, 49, 51] for an overview of theoretical and algorithmic developments in optimal control and PDE constrained optimization. A survey on strategies for preconditioning saddle point problems can be found in [12]. We refer to [20] for an overview of multigrid methods for optimal control problems.

Our preconditioner can be viewed as a simplified two-level multigrid V-cycle with a smoother based on the inverse regularization operator and an inexact coarse grid solve. We note that more sophisticated multigrid preconditioners for the reduced Hessian exist [2,19]. Multigrid approaches have been considered in [19] for optical flow and in [13, 66] for the Monge–Kantorovich functional. The work of [19] is the most pertinent to our problem. It is a space-time multigrid method for the full KKT conditions, and its time discretization scheme is CFL restricted and, thus, very expensive. The effectiveness of the smoother depends on the regularization functional; it is unclear how to generalize it to incompressible velocities. Our scheme is simpler to implement, supports general regularizations, and is compatible with our semi-Lagrangian time discretization. The preconditioner in [13] is a block triangular preconditioner based on a perturbed representation of the GN approximation of the full space KKT system. A similar preconditioner that operates on the reduced space Hessian is described in [66]. In some sense, we do not approximate the structure of our Hessian operator; we invert an exact representation. We amortize the associated costs as follows: (i) we solve for the action of the inverse inexactly, and (ii) we invert the operator on a coarser grid.
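The inexact coarse grid solve with a fixed number of Chebyshev iterations (CHEB in Table 8) applies a fixed polynomial of the operator to the right-hand side; with a zero initial guess, the map from right-hand side to output is therefore linear, which matters when it is used inside an outer PCG. A minimal sketch of the classical three-term Chebyshev recurrence, assuming a symmetric positive definite operator with known (or estimated) eigenvalue bounds `lmin` and `lmax`; how such bounds are obtained is not specified here, and the function name is hypothetical:

```python
import numpy as np


def chebyshev(apply_a, b, lmin, lmax, iters=10):
    """Fixed-iteration Chebyshev solve of A x = b, starting from x = 0.

    Assumes A is symmetric positive definite with spectrum in [lmin, lmax],
    0 < lmin < lmax. The result is p(A) b for a fixed polynomial p, so the map
    b -> x is linear (a property needed when it serves as a preconditioner).
    """
    theta = 0.5 * (lmax + lmin)   # center of the spectral interval
    delta = 0.5 * (lmax - lmin)   # half-width of the spectral interval
    sigma = theta / delta
    x = np.zeros_like(b)
    r = b.copy()                  # residual for x = 0
    rho = 1.0 / sigma
    d = r / theta
    for _ in range(iters):
        x = x + d
        r = r - apply_a(d)        # one operator application per iteration
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x
```

Unlike CG, the iteration needs no inner products, and its cost is fixed in advance by `iters`; the price is that the eigenvalue bounds must be supplied.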

1.4.3. The Semi-Lagrangian Method

We refer to [32] for a summary of solvers for advection-dominated systems. Example implementations for the solution of hyperbolic transport equations that have been considered in the work cited above are implicit Lax–Friedrichs schemes [13, 66], explicit high-order total variation diminishing schemes [19,25,46], or explicit, pseudospectral (in space) RK2 schemes [58,59]. These schemes suffer from numerical diffusion, CFL time step restrictions, or both. We use a high-order, unconditionally stable semi-Lagrangian formulation. Semi-Lagrangian methods are well established and were first considered in numerical weather prediction [68]. The use of semi-Lagrangian schemes is not new in the context of diffeomorphic image registration. However, such schemes have only been used to solve for the deformation map and/or the forward problem [10, 23, 25, 48] and for the adjoint problem in the context of approximate gradient-descent methods.
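The property the semi-Lagrangian method exploits is that the solution of a pure transport equation is constant along characteristics, so each time step reduces to tracing departure points and interpolating; the step size is not tied to a CFL condition. A minimal 1D sketch, assuming a stationary velocity, RK2 backtracking of the characteristics, and (unlike the cubic splines used in this paper) linear periodic interpolation via `np.interp`; `sl_transport_1d` is a hypothetical name:

```python
import numpy as np


def sl_transport_1d(m0, v, nt):
    """Semi-Lagrangian solve of dm/dt + v dm/dx = 0 on (-pi, pi), periodic, t in [0, 1].

    Illustrative assumptions: stationary velocity v sampled on the grid
    x_j = -pi + j h, RK2 backtracking of characteristics, and linear periodic
    interpolation (np.interp) in place of cubic splines.
    """
    n = m0.size
    period = 2.0 * np.pi
    x = -np.pi + period * np.arange(n) / n
    ht = 1.0 / nt
    # Departure points: integrate dX/dt = v(X) backward over one step (Heun/RK2).
    # For a stationary v they are the same for every time step.
    v_star = np.interp(x - ht * v, x, v, period=period)
    x_dep = x - 0.5 * ht * (v + v_star)
    m = m0.copy()
    for _ in range(nt):
        # m is constant along characteristics: m(x, t + ht) = m(X(t), t)
        m = np.interp(x_dep, x, m, period=period)
    return m
```

With 256 grid points, unit velocity, and four time steps, this sketch runs at a CFL number of about 10, well beyond the restriction of the explicit RK2 schemes discussed above, and still reproduces $\sin(x-1)$ from $\sin x$ up to the interpolation error.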

1.5. Organization and Notation

We summarize our notation in Tab. 1. We summarize the optimal control formulation for diffeomorphic image registration in §2. We describe the solver in §3. We provide the optimality system and the Newton step in §3.1. We describe the discretization in §3.2. The schemes for integrating the hyperbolic PDEs that appear in our formulation are discussed in §3.3. We describe our Newton–Krylov solver in §3.4; this includes a discussion of the preconditioners for the solution of the reduced space KKT system. We provide numerical experiments in §4. We conclude with §5.

Table 1.

Commonly used notation and symbols.

Symbol/Notation Description
CFL Courant–Friedrichs–Lewy (condition)
FFT Fast Fourier Transform
GN Gauss–Newton
KKT Karush–Kuhn–Tucker (system)
matvec (Hessian) matrix-vector product
PCG Preconditioned Conjugate Gradient (method)
PCG(ε) PCG, where ε > 0 indicates the used tolerance
PDE partial differential equation
PDE solve solution of a hyperbolic transport equation
RK2 2nd order Runge–Kutta (method)
RK2(c) RK2 method, where c indicates the employed CFL number
RK2A RK2 scheme based on an antisymmetric form
RK2A(c) RK2A method, where c indicates the employed CFL number
SL semi-Lagrangian (method)
SL(c) SL method, where c indicates the employed CFL number

d spatial dimensionality; typically d ∈ {2, 3}
Ω spatial domain; $\Omega := (-\pi,\pi)^d\subset\mathbb{R}^d$ with boundary $\partial\Omega$ and closure $\bar\Omega := \Omega\cup\partial\Omega$
x spatial coordinate; $x := (x_1,\ldots,x_d)\in\mathbb{R}^d$
mR reference image; $m_R:\bar\Omega\to\mathbb{R}$
mT template image; $m_T:\bar\Omega\to\mathbb{R}$
m state variable (transported intensities); $m:\bar\Omega\times[0,1]\to\mathbb{R}$
m1 deformed template image (state variable at t = 1); $m_1:\bar\Omega\to\mathbb{R}$
λ adjoint variable (transport equation); $\lambda:\bar\Omega\times[0,1]\to\mathbb{R}$
p adjoint variable (incompressibility constraint); $p:\bar\Omega\to\mathbb{R}$
v control variable (stationary velocity field); $v:\bar\Omega\to\mathbb{R}^d$
w control variable (mass source); $w:\bar\Omega\to\mathbb{R}$
b body force; $b:\bar\Omega\to\mathbb{R}^d$
$\mathcal{H}$ (reduced) Hessian
g (reduced) gradient
y Eulerian (pullback) deformation map
F deformation gradient at t = 1 (computed from v); $F:\bar\Omega\to\mathbb{R}^{d\times d}$; $F := (\nabla y)^{-1}$
βv regularization parameter for the control v
βw regularization parameter for the control w
$\mathcal{A}$ regularization operator (variation of regularization model acting on v)
$\partial_i$ partial derivative with respect to $x_i$, $i = 1,\ldots,d$
$\partial_t$ partial derivative with respect to time
$D_t$ Lagrangian derivative
∇ gradient operator (acts on scalar and vector fields)
Δ Laplacian operator (acts on scalar and vector fields)
∇· divergence operator (acts on vector and 2nd order tensor fields)
$\langle\cdot,\cdot\rangle_{L^2(X)}$ $L^2$ inner product on X

2. Optimal Control Formulation

We consider a PDE constrained formulation, where the constraints consist of a scalar transport equation for the image intensities. We solve for a stationary velocity field $v\in\mathcal{U}$ and a mass source $w\in\mathcal{W}$ as follows [59]:

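$$\min_{m,v,w}\ \mathcal{J}[v,w] = \frac{1}{2}\|m_1 - m_R\|^2_{L^2(\Omega)} + \frac{\beta_v}{2}\|v\|^2_{\mathcal{V}} + \frac{\beta_w}{2}\|w\|^2_{\mathcal{W}} \tag{2a}$$

subject to

$$\partial_t m + \nabla m\cdot v = 0 \quad\text{in } \Omega\times(0,1], \tag{2b}$$
$$m = m_T \quad\text{in } \Omega\times\{0\}, \tag{2c}$$
$$\nabla\cdot v = w \quad\text{in } \Omega \tag{2d}$$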

and periodic boundary conditions on $\partial\Omega$. We measure the distance between the reference image $m_R$ and the deformed template image $m_1$ via a squared $L^2$-distance. The contributions of the regularization models for v and w are controlled by the weights βv > 0 and βw > 0, respectively. We consider an $H^1$-regularization norm for w, i.e.,

$$\|w\|^2_{H^1(\Omega)} := \int_\Omega \nabla w\cdot\nabla w + w^2\,dx. \tag{3}$$

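We consider three quadratic regularization models for v: an $H^1$-, an $H^2$-, and an $H^3$-seminorm:

$$|v|^2_{H^1(\Omega)^d} := \int_\Omega \nabla v:\nabla v\,dx,\quad |v|^2_{H^2(\Omega)^d} := \int_\Omega \Delta v\cdot\Delta v\,dx,\quad\text{and}\quad |v|^2_{H^3(\Omega)^d} := \int_\Omega \nabla\Delta v:\nabla\Delta v\,dx. \tag{4}$$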

The use of an H1-seminorm is motivated by related work in computational fluid dynamics; we will see that the first-order variations of our formulation result in a system that reflects a linear Stokes model under the assumption that we enforce ∇ · v = 0 [25, 58, 59, 64]. We use an H2-seminorm if we neglect the incompressibility constraint (2d). This establishes a connection to related formulations for diffeomorphic image registration [10,46,48]; an H2-norm (or its approximation via its Green's function, a Gaussian kernel) is the model of choice in many algorithms [10].
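On the periodic domain $(-\pi,\pi)^d$ the seminorms in (4) are diagonal in the Fourier basis: with Fourier coefficients $\hat v_k$ of the trigonometric interpolant, Parseval's identity gives $|v|^2_{H^s} = (2\pi)^d\sum_k |k|^{2s}|\hat v_k|^2$ for s = 1, 2, 3. A minimal sketch that evaluates this with FFTs, assuming a vector field stored as an array of shape (d, n1, ..., nd) on a uniform grid; `seminorm_sq` is a hypothetical name:

```python
import numpy as np


def seminorm_sq(v, order):
    """|v|^2 in the H^order seminorm (order = 1, 2, 3) on the periodic box (-pi, pi)^d.

    v has shape (d, n1, ..., nd); each component is treated as a trigonometric
    interpolant, so by Parseval the integral becomes a weighted sum of |coefficients|^2.
    """
    nx = v.shape[1:]
    ks = np.meshgrid(*[np.fft.fftfreq(n, d=1.0 / n) for n in nx], indexing="ij")
    ksq = sum(k**2 for k in ks)
    weight = ksq**order  # |k|^2, |k|^4, |k|^6 for the H1, H2, H3 seminorms in (4)
    npts = np.prod(nx)
    total = 0.0
    for comp in v:
        coeffs = np.fft.fftn(comp) / npts  # Fourier coefficients of the interpolant
        total += np.sum(weight * np.abs(coeffs) ** 2)
    return (2.0 * np.pi) ** len(nx) * total
```

For the single mode $v = (\sin 2x_1, 0)$ in 2D, the three seminorms evaluate to $8\pi^2$, $32\pi^2$, and $128\pi^2$, illustrating how the higher-order models penalize high frequencies more strongly.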

Remark 1

The norm on w acts like a penalty on ∇ · v. In fact, we can eliminate (2d) from (2) by inserting ∇ · v for w into the regularization norm in (2a). If we neglect the incompressibility constraint (2d), the space $\mathcal{U}$ for v is given by the Sobolev space $\mathcal{V}$ (this formulation is used, e.g., in [46] for a non-stationary velocity with H2-regularity in space and L2-regularity in time). If we set w in (2d) to zero, the computed velocity will be in the space of divergence-free velocity fields with Sobolev regularity in space, as defined by $\mathcal{V}$ (examples for this formulation can be found in [25, 58, 59, 64]). For a non-zero w we additionally require that the divergence of v is in $\mathcal{W}$. An equivalent formulation is presented, e.g., in [18,19]. The authors use H1-regularity for v, stipulate L2-regularity for its divergence, and prove existence of the state and adjoint variables for smooth images [19]. In particular, they provide existence results for a unique, H1-regular solution of the forward problem under the assumption of H1-regularity for the template image. The same regularity requirements hold true for the adjoint equation. In our formulation, we not only require v to be an H1-function, but also that its divergence is in H1 (according to (3)). Another approach to impose regularity on v is to control not only its divergence but also its curl (see, e.g., [4,55]). We provide additional remarks in §A.

3. Numerics and Solver

In what follows, we describe our numerical solver for computing a discrete approximation to the continuous problem. We use a globalized, preconditioned, inexact, reduced space¹ (Gauss–)Newton–Krylov method. Our scheme is described in detail in [58]. We briefly recapitulate the key ideas and main building blocks.

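We use the (formal) Lagrangian method [56] to solve (2); the Lagrangian functional $\mathcal{L}$ is given by

$$\mathcal{L}[\phi] := \mathcal{J}[v,w] + \int_0^1 \langle \partial_t m + v\cdot\nabla m, \lambda\rangle_{L^2(\Omega)}\,dt + \langle m_0 - m_T, \nu\rangle_{L^2(\Omega)} - \langle \nabla\cdot v - w, p\rangle_{L^2(\Omega)} \tag{5}$$

with $\phi := (m, \lambda, p, w, v)$ and Lagrange multipliers $\lambda:\bar\Omega\times[0,1]\to\mathbb{R}$ for the hyperbolic transport equation (2b), $\nu:\bar\Omega\to\mathbb{R}$ for the initial condition (2c), and $p:\bar\Omega\to\mathbb{R}$ for the incompressibility constraint (2d) (we neglect the periodic boundary conditions for simplicity). The Lagrange multiplier functions inherit the boundary conditions of the forward operator.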

Remark 2

We can consider two numerical strategies to tackle (2). We can either use an optimize-then-discretize approach or a discretize-then-optimize approach. We choose the former, i.e., we compute variations of the continuous problem and then discretize the optimality system. In general, this approach does not guarantee that the discretization of the gradient is consistent with the discretized objective. Further, it is not guaranteed that the discretized forward and adjoint operators are transposes of one another. Likewise, it is not guaranteed that the discretized Hessian is a symmetric operator. We report numerical experiments to quantify these errors; we will see that they are below the tolerances we target for the inversion. By using a discretize-then-optimize approach one can (by construction) guarantee that the derived operators are consistent. However, it is, e.g., not guaranteed that the forward and adjoint operators (in the transposed sense) yield the same numerical accuracy (see, e.g., [29, 44]). We refer, e.g., to [21,39] for additional remarks on the discretization of optimization and control problems.

3.1. Optimality Conditions and Newton Step

From Lagrange multiplier theory we know that, for an admissible solution to (2), the variations of $\mathcal{L}$ in (5) with respect to the state, adjoint, and control variables $\phi$ must vanish. We present the steps necessary to evaluate the reduced gradient and the Hessian matvec. The associated PDE operators are derived using the calculus of variations and Green's identities. We will see that the optimality conditions of our problem form a system of PDEs. This system needs to be solved to find a solution of (2). We will only present the strong form of our reduced space formulation.² Note that we also eliminate the incompressibility constraint from the optimality system (see [58, 59] for details; we comment on this in more detail in §A); we only iterate on the reduced space for the velocity field v. The control/decision equation (reduced gradient) for our problem is given by

$$g(v) := \beta_v\mathcal{A}[v] + \mathcal{K}[b] = \beta_v\mathcal{A}[v] + \mathcal{K}\Big[\int_0^1 \lambda\nabla m\,dt\Big] \tag{6}$$

with (pseudo-)differential operators $\mathcal{A}$ (regularization) and $\mathcal{K}$ (projection); the definitions are given below. Formally, we require $g(v^\star) = 0$ for an admissible solution $v^\star$ to (2). We can compute this minimizer iteratively using g in a gradient descent scheme. To evaluate g we need to find the space-time fields m and λ given a candidate v. We can compute m by solving the state equation (primal)

$$\partial_t m + v\cdot\nabla m = 0 \quad\text{in } \Omega\times(0,1], \tag{7a}$$
$$m = m_T \quad\text{in } \Omega\times\{0\}, \tag{7b}$$

with periodic boundary conditions on $\partial\Omega$, forward in time. Once we have found m at t = 1, we can compute λ by solving the adjoint or costate equation (dual)

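$$-\partial_t\lambda - \nabla\cdot(\lambda v) = 0 \quad\text{in } \Omega\times[0,1), \tag{8a}$$
$$\lambda = -(m - m_R) \quad\text{in } \Omega\times\{1\}, \tag{8b}$$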

with periodic boundary conditions on $\partial\Omega$, backward in time; note that for ∇ · v = 0, (8a) also becomes a transport equation.

What is missing to complete the picture for g is a specification of the operators A and K. The differential operator A in (6) corresponds to the first variation of the seminorms in (4). We have

$$\mathcal{A}[v] = -\Delta v,\quad \mathcal{A}[v] = \Delta^2 v,\quad\text{and}\quad \mathcal{A}[v] = -\Delta^3 v \tag{9}$$

for the H1, H2, and H3 case, respectively, resulting in an elliptic, biharmonic, or triharmonic integro-differential control equation for v. The pseudo-differential operator $\mathcal{K}$ in (6) originates from the elimination of p and (1d). For instance, if we set w = 0, we obtain the Leray operator $\mathcal{K}[b] := -\nabla\Delta^{-1}\nabla\cdot b + b$; for non-zero w this operator becomes more complicated (see [58, 59] for details on the derivation of this operator). Combining the state (primal) equation (7), the adjoint equation (8), and the control equation (6) provides the formal optimality conditions (see §A).
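On the periodic domain both operators in (6) act as Fourier multipliers: the symbols of $-\Delta$, $\Delta^2$, and $-\Delta^3$ are $|k|^2$, $|k|^4$, and $|k|^6$, and the Leray operator maps each mode to $\hat b_k - k(k\cdot\hat b_k)/|k|^2$. A minimal sketch under these assumptions (well-resolved fields with no energy in the Nyquist modes, stored as arrays of shape (d, n1, ..., nd); the function names are hypothetical):

```python
import numpy as np


def _wavenumbers(nx):
    """Integer wavenumber grids for a 2*pi-periodic box, one array per axis."""
    return np.meshgrid(*[np.fft.fftfreq(n, d=1.0 / n) for n in nx], indexing="ij")


def apply_reg(v, order):
    """A[v] = -Lap v, Lap^2 v, -Lap^3 v for order = 1, 2, 3, applied componentwise.

    The Laplacian has symbol -|k|^2, so each of these operators equals (-Lap)^order
    and has symbol |k|^(2 * order).
    """
    ksq = sum(k**2 for k in _wavenumbers(v.shape[1:]))
    symbol = ksq**order
    return np.stack([np.real(np.fft.ifftn(symbol * np.fft.fftn(c))) for c in v])


def leray(b):
    """K[b] = b - grad Lap^{-1} div b, i.e. b_hat - k (k . b_hat) / |k|^2 per mode."""
    ks = _wavenumbers(b.shape[1:])
    ksq = sum(k**2 for k in ks)
    ksq[(0,) * len(ks)] = 1.0  # k = 0 carries no divergence; avoid 0/0
    bhat = [np.fft.fftn(c) for c in b]
    kdotb = sum(k * bh for k, bh in zip(ks, bhat))
    return np.stack([np.real(np.fft.ifftn(bh - k * kdotb / ksq)) for k, bh in zip(ks, bhat)])
```

As a sanity check, the projection annihilates gradient fields and leaves divergence-free fields unchanged, which is what enforcing ∇ · v = 0 through $\mathcal{K}$ requires.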

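A common strategy to compute a minimizer for (2) is to use the negative of the gradient preconditioned by the inverse regularization operator,

$$\tilde v := (\beta_v\mathcal{A})^{-1}g(v) = v + (\beta_v\mathcal{A})^{-1}\mathcal{K}\Big[\int_0^1 \lambda\nabla m\,dt\Big],$$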

as a search direction (see, e.g., [46]). We opt for a (Gauss–)Newton–Krylov method instead, due to its superior rate of convergence (see [58] for a comparison). Formally, this requires second variations of $\mathcal{L}$. The expression for the action of the reduced space Hessian $\mathcal{H}$ on a vector $\tilde v$ (i.e., the incremental control/decision equation) is given by

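$$\mathcal{H}[\tilde v](v) := \beta_v\mathcal{A}[\tilde v] + \mathcal{K}[\tilde b] = \beta_v\mathcal{A}[\tilde v] + \mathcal{K}\Big[\int_0^1 \tilde\lambda\nabla m + \lambda\nabla\tilde m\,dt\Big]. \tag{10}$$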

The operators $\mathcal{A}$ and $\mathcal{K}$ are as defined above. As for the reduced gradient g in (6), we need to find two space-time fields, $\tilde m$ and $\tilde\lambda$. We can find the incremental state variable $\tilde m$ by solving

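$$\partial_t\tilde m + v\cdot\nabla\tilde m + \tilde v\cdot\nabla m = 0 \quad\text{in } \Omega\times(0,1], \tag{11a}$$
$$\tilde m = 0 \quad\text{in } \Omega\times\{0\}, \tag{11b}$$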

with periodic boundary conditions on $\partial\Omega$, forward in time. Once we have found $\tilde m$, we can compute the incremental adjoint variable $\tilde\lambda$ by solving

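$$-\partial_t\tilde\lambda - \nabla\cdot(\tilde\lambda v + \lambda\tilde v) = 0 \quad\text{in } \Omega\times[0,1), \tag{12a}$$
$$\tilde\lambda = -\tilde m \quad\text{in } \Omega\times\{1\}, \tag{12b}$$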

with periodic boundary conditions on $\partial\Omega$, backward in time. Thus, each time we apply the Hessian to a vector, we have to solve two PDEs: (11a) and (12a).

3.2. Discretization

We subdivide the time interval [0, 1] into $n_t\in\mathbb{N}$ uniform steps $t_j$, $j = 0,\ldots,n_t$, of size $h_t = 1/n_t$. We discretize $\Omega := (-\pi,\pi)^d$ via a regular grid with cell size $h_x = (h_{x_1},\ldots,h_{x_d})\in\mathbb{R}^d_{>0}$, $h_{x_i} = 2\pi/n_{x_i}$, $n_x = (n_{x_1},\ldots,n_{x_d})\in\mathbb{N}^d$; we use a pseudospectral discretization with a Fourier basis. We discretize the integral operators based on a midpoint rule. We use cubic splines as basis functions for our interpolation model.
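The pseudospectral building block, a derivative computed by FFT, can be sketched as follows. This is an illustration under stated assumptions: grid nodes $x_j = -\pi + j h_{x_i}$ (the node placement is not stated in this section), NumPy's FFT, and the common convention of dropping the Nyquist mode so that derivatives of real fields stay real; `make_grid` and `spectral_gradient` are hypothetical names.

```python
import numpy as np


def make_grid(nx):
    """Regular grid on (-pi, pi)^d with spacing h_i = 2*pi/nx_i.

    Assumed node placement: x_j = -pi + j*h_i, j = 0, ..., nx_i - 1.
    """
    axes = [-np.pi + 2.0 * np.pi * np.arange(n) / n for n in nx]
    return np.meshgrid(*axes, indexing="ij")


def spectral_gradient(u):
    """Pseudospectral gradient of a real, periodic scalar field u (any dimension)."""
    uhat = np.fft.fftn(u)
    grad = []
    for axis, n in enumerate(u.shape):
        k = np.fft.fftfreq(n, d=1.0 / n)  # integer wavenumbers for a 2*pi-periodic domain
        if n % 2 == 0:
            k[n // 2] = 0.0               # drop the Nyquist mode so the result stays real
        shape = [1] * u.ndim
        shape[axis] = n
        grad.append(np.real(np.fft.ifftn(1j * k.reshape(shape) * uhat)))
    return grad
```

For smooth, band-limited fields the result is exact up to round-off, which is the spectral accuracy the transport solvers rely on; each partial derivative costs one forward and one inverse FFT in this naive form.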

3.3. Numerical Time Integration

An efficient, accurate, and stable time integration of the hyperbolic PDEs that appear in our optimality system is critical for our solver to be effective. Each evaluation of the objective functional J in (2a) requires the solution of (7a) (forward in time). The evaluation of the reduced gradient g in (6) requires an additional solution of (8a) (backward in time). Applying the reduced space Hessian H (Hessian matvec) in (10) necessitates the solution of (11a) (forward in time) and (12a) (backward in time).

3.3.1. Second order Runge-Kutta Schemes

In our original work [58, 59] we solved the transport equations based on an RK2 scheme (in particular, Heun's method). This method, in combination with a pseudospectral discretization in space, offers highly accurate solutions, minimal numerical diffusion, and spectral convergence for smooth problems, at the cost of having to use a rather small time step due to its conditional stability; the time step size $h_t$ has to be chosen according to considerations of stability rather than accuracy. This scheme can become unstable even if we adhere to the stability condition (see §4.1 for examples). One strategy to stabilize our solver is to rewrite the transport equations in antisymmetric form [35,53]. Here we extend this stable scheme to the adjoint problem and the Hessian operator. We do so by deriving the antisymmetric form of the forward operator and then formally computing its variations. This is relatively straightforward, but we have not seen it in the literature related to inverse transport problems. We present the associated PDE operators in §B. We refer to this solver as the RK2A scheme. The discretization in antisymmetric form evidently requires more work (see §B). We provide estimates in terms of the number of FFTs we have to perform in Tab. 12 in §C.
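To illustrate the difference between the two RK2 variants, here is a 1D periodic sketch of Heun's method with pseudospectral derivatives. It assumes the advective form $\partial_t m = -v\,\partial_x m$ and one common antisymmetric rewriting, $v\,\partial_x m = \tfrac12\big(v\,\partial_x m + \partial_x(vm) - m\,\partial_x v\big)$, which is an identity in the continuum. The operators actually used in the paper are given in §B and may differ; all names are hypothetical.

```python
import numpy as np


def ddx(u):
    """Pseudospectral x-derivative of a real, 2*pi-periodic 1D signal (Nyquist mode dropped)."""
    n = u.size
    k = np.fft.fftfreq(n, d=1.0 / n)
    if n % 2 == 0:
        k[n // 2] = 0.0
    return np.real(np.fft.ifft(1j * k * np.fft.fft(u)))


def rhs_standard(m, v):
    """Advective form: dm/dt = -v dm/dx."""
    return -v * ddx(m)


def rhs_antisymmetric(m, v):
    """Assumed antisymmetric rewriting: dm/dt = -(1/2)(v dm/dx + d(v m)/dx - m dv/dx).

    Identical to -v dm/dx in the continuum; costs more FFTs per evaluation.
    """
    return -0.5 * (v * ddx(m) + ddx(v * m) - m * ddx(v))


def heun(m0, v, nt, rhs):
    """RK2 (Heun's method) for dm/dt = rhs(m, v) on t in [0, 1] with nt uniform steps."""
    ht = 1.0 / nt
    m = m0.copy()
    for _ in range(nt):
        k1 = rhs(m, v)
        k2 = rhs(m + ht * k1, v)
        m = m + 0.5 * ht * (k1 + k2)
    return m
```

The split part $\tfrac12(v\,\partial_x + \partial_x v)$ is skew-adjoint with respect to the discrete inner product when the spectral derivative is used, which is the usual motivation for this form; the extra term $-\tfrac12 m\,\partial_x v$ vanishes for divergence-free velocities. The higher FFT count per right-hand side evaluation is consistent with the cost comparison in Tab. 12.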

Table 12.

Computational complexity of our solver for the compressible case. We report this complexity as a function of the number of FFTs and interpolation steps (IPs) within the key building blocks of our solver. We provide these counts for (i) the hyperbolic transport equations that appear in the optimality system (state equation (7a): SE; adjoint equation (8a): AE; incremental state equation (11a): incSE; incremental adjoint equation (12a): incAE), (ii) the evaluation of the objective $J^h$ in (2a), (iii) the evaluation of the gradient $g^h$ in (6), and (iv) the Hessian matvec in (10). We report numbers for the full Newton case (FN) and the Gauss–Newton approximation (GN). The costs for evaluating the objective include the costs for the forward solve. We assign the costs of the adjoint solve to the evaluation of the gradient. The costs for the Hessian matvec include the solution of the incSE and incAE.

| Compressible case | RK2: FFTs | RK2: IPs | RK2A: FFTs | RK2A: IPs | SL: FFTs | SL: IPs |
|---|---|---|---|---|---|---|
| SE | $2(d+1)n_t$ | 0 | $2(d+1) + 4(d+1)n_t$ | 0 | 0 | $d + n_t$ |
| AE | $2(d+1)n_t$ | 0 | $2(d+1) + 4(d+1)n_t$ | 0 | $d + 1$ | $d + n_t + 1$ |
| incSE | $4(d+1)n_t$ | 0 | $4(d+1) + 6(d+1)n_t$ | 0 | $(d+1)n_t$ | $d + (d+1)n_t$ |
| incAE (FN) | $2(d+1)n_t$ | 0 | $4(d+1) + 6(d+1)n_t$ | 0 | $(d+1)n_t$ | $2n_t + 1$ |
| incAE (GN) | $2(d+1)n_t$ | 0 | $2(d+1) + 4(d+1)n_t$ | 0 | $d + 1$ | $n_t + 1$ |
| $J^h$ | $2d + 2(d+1)n_t$ | 0 | $2d + 2(d+1) + 4(d+1)n_t$ | 0 | $2d$ | $d + n_t$ |
| $g^h$ | $2d + 3(d+1)n_t$ | 0 | $2d + 2(d+1) + 7(d+1)n_t$ | 0 | $2d + (d+1)(n_t+1)$ | $d + n_t + 1$ |
| matvec (FN) | $2d + 8(d+1)n_t$ | 0 | $2d + 8(d+1) + 18(d+1)n_t$ | 0 | $2d + 4(d+1)n_t$ | $d + (d+3)n_t + 1$ |
| matvec (GN) | $2d + 7(d+1)n_t$ | 0 | $2d + 6(d+1) + 13(d+1)n_t$ | 0 | $2d + (d+1)(2n_t+1)$ | $d + (d+2)n_t + 1$ |

3.3.2. Semi-Lagrangian Formulation

Next, we describe our semi-Lagrangian formulation. To be able to apply the semi-Lagrangian method to the transport equations appearing in our optimality system, we have to reformulate them. Using the identity $\nabla \cdot (u v) = u \nabla \cdot v + \nabla u \cdot v$ for an arbitrary scalar function $u : \bar{\Omega} \to \mathbb{R}$, we obtain

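$$\partial_t m + v \cdot \nabla m = 0 \quad \text{in } \Omega \times (0, 1], \tag{13a}$$
$$-\partial_t \lambda - v \cdot \nabla \lambda - \lambda \nabla \cdot v = 0 \quad \text{in } \Omega \times [0, 1), \tag{13b}$$
$$\partial_t \tilde{m} + v \cdot \nabla \tilde{m} + \tilde{v} \cdot \nabla m = 0 \quad \text{in } \Omega \times (0, 1], \tag{13c}$$
$$-\partial_t \tilde{\lambda} - v \cdot \nabla \tilde{\lambda} - \tilde{\lambda} \nabla \cdot v - \nabla \cdot (\lambda \tilde{v}) = 0 \quad \text{in } \Omega \times [0, 1). \tag{13d}$$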

These equations are all of the general form $d_t u = \partial_t u + v \cdot \nabla u = f(u, v)$, where $u : \bar{\Omega} \times [0, 1] \to \mathbb{R}$ is some arbitrary scalar function and $d_t := \partial_t + v \cdot \nabla$. If the Lagrangian derivative vanishes, i.e., $d_t u = 0$, then $u$ is constant along the characteristics $X : [\tau_0, \tau_1] \to \mathbb{R}^d$ of the flow, where $[\tau_0, \tau_1] \subseteq [0, 1]$. We can compute $X$ by solving the ODE

$$\frac{\mathrm{d}}{\mathrm{d}t} X(t) = v(X(t)) \quad \text{in } (\tau_0, \tau_1], \tag{14a}$$
$$X(t) = x \quad \text{for } t = \tau_0. \tag{14b}$$

The solution of (14) requires the knowledge of the velocity field $v$ at points that do not coincide with the computational grid; we have to interpolate $v$ in space.³

The idea of pure Lagrangian schemes is to solve $d_t u = f(u, v)$ along the characteristic lines (14). The key advantage of these methods is that they are essentially unconditionally stable [68]; i.e., the time step $h_t$ may be chosen according to accuracy considerations rather than stability considerations.⁴ On the downside, the solution no longer lives on a regular grid; the grid changes over time and eventually might become highly irregular. Semi-Lagrangian methods can be viewed as a hybrid between Lagrangian and Eulerian methods; they combine the advantages of both: they operate on a regular grid and are unconditionally stable.

The semi-Lagrangian scheme involves two steps. For each time step $t^j$, we first compute the departure point $X_D := X(t^{j-1})$ of a fluid parcel by solving the characteristic equation (14) backward in time, with initial condition $X(t^j) = x$.⁵ We revert to a uniform grid by interpolation. The second step is to compute the transported quantity along the characteristic $X$. The accuracy of the semi-Lagrangian method is sensitive to the time integrator used to solve (14) as well as to the interpolation scheme used to evaluate quantities at the departure points $X_D$. We discuss the individual building blocks of our solver next.

Tracing the Characteristic

For each time step $t^j$ of the integration of a given transport equation, we have to trace the characteristic $X$ backward in time in an interval $[t^{j-1}, t^j] \subset [0, 1]$. Since we invert for a stationary velocity field $v$, we have to trace $X$ (i.e., compute the departure points $X_D$) only once per Newton iteration and can reuse the result for all time steps.⁶ We use an explicit RK2 scheme (Heun's method) to do so [68]. We illustrate the computation of the characteristic in Fig. 1. Each evaluation of the right-hand side of (14) requires interpolation.
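A minimal 1D sketch of this backward Heun step is shown below. To keep it short, off-grid velocity values are obtained by periodic *linear* interpolation (`np.interp` with `period=2*np.pi`), which is a simplification of the cubic spline model used in the text; the function name is hypothetical. Because $v$ is stationary, the returned departure points can be computed once and reused for all $n_t$ time steps.

```python
import numpy as np


def departure_points(x, v, ht):
    """Trace dX/dt = v(X) backward over one time step of size ht with Heun's method (1D sketch).

    x  : nodes of a regular periodic grid on [-pi, pi)
    v  : stationary velocity sampled at x
    Off-grid velocity values use periodic linear interpolation (simplification of cubic splines).
    """
    x_star = x - ht * v                                     # Euler predictor, backward in time
    v_star = np.interp(x_star, x, v, period=2.0 * np.pi)    # v at the predicted (off-grid) points
    return x - 0.5 * ht * (v + v_star)                      # trapezoidal (Heun) corrector
```

Expanding the corrector gives $X_D \approx x - h_t v + \tfrac{h_t^2}{2} v\,\partial_x v$, which matches the Taylor expansion of the exact backward characteristic to second order.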

Fig. 1.


Tracing the characteristic $X$ in a semi-Lagrangian scheme. We start with a regular grid $\Omega^h$ (dark orange points on the right) consisting of coordinates $x$ at time point $t^j$. We assume we have already computed the intermediate solution $u^h$ of a given transport equation at time point $t^{j-1}$; we know the input data on the regular grid at time point $t^{j-1}$ (the regular grid nodes are illustrated in light gray). In a first step, we trace back the characteristic $X$ by solving (14) backward in time subject to the initial condition $X(t^j) = x$. Once we have found the characteristic (black line in the figure on the left), we can, in a second step, assign the value of $u^h$ at $t^{j-1}$ at the departure point $X_D$ (dark orange point on the left) to $x$ at $t^j$ based on some interpolation model. We illustrate the grid of departure points in light orange and the original grid in gray (left figure).

Interpolation

We use a cubic spline interpolation model to evaluate the transported quantities along the characteristic $X$. We pad the imaging data to account for the periodic boundary conditions. The size of the padding zone is computed at every iteration based on the maximal displacement between the original grid nodes and the departure points $X_D$; we also account for the support of the basis functions of the interpolation model.⁷
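The sketch below shows one way to realize a C²-continuous periodic cubic spline interpolant in NumPy, under our own assumptions. The spline coefficients are obtained with an FFT-based prefilter (on a periodic grid the interpolation system with stencil [1, 4, 1]/6 is circulant), and wrap-around indexing modulo the grid size stands in for the explicit padding zone described above; both impose the same periodic extension, but the padding-based implementation in the text may be organized differently. All names are hypothetical.

```python
import numpy as np


def spline_coefficients(f):
    """Periodic cubic B-spline coefficients c with (c[i-1] + 4 c[i] + c[i+1]) / 6 = f[i] per axis."""
    c = np.asarray(f, dtype=float)
    for axis, n in enumerate(c.shape):
        k = np.arange(n)
        symbol = (4.0 + 2.0 * np.cos(2.0 * np.pi * k / n)) / 6.0   # Fourier symbol of [1, 4, 1]/6
        shape = [1] * c.ndim
        shape[axis] = n
        c = np.real(np.fft.ifft(np.fft.fft(c, axis=axis) / symbol.reshape(shape), axis=axis))
    return c


def bspline_weights(t):
    """Cubic B-spline weights for node offsets -1, 0, 1, 2 at fractional position t in [0, 1)."""
    return [(1 - t) ** 3 / 6,
            (3 * t**3 - 6 * t**2 + 4) / 6,
            (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
            t**3 / 6]


def interp_periodic(c, h, x0, p1, p2):
    """Evaluate the 2D periodic cubic spline with coefficients c at the points (p1, p2).

    Grid nodes are x0 + i*h[axis]; indices wrap modulo the grid size (periodic extension).
    """
    n1, n2 = c.shape
    s1, s2 = (p1 - x0) / h[0], (p2 - x0) / h[1]
    i1, i2 = np.floor(s1).astype(int), np.floor(s2).astype(int)
    w1, w2 = bspline_weights(s1 - i1), bspline_weights(s2 - i2)
    out = np.zeros(np.shape(s1))
    for a in range(4):
        for b in range(4):
            out += w1[a] * w2[b] * c[(i1 + a - 1) % n1, (i2 + b - 1) % n2]
    return out
```

The prefilter is computed once per field; each evaluation then touches a 4 × 4 neighborhood of coefficients, which is why the cost per interpolation step is much higher than a single pointwise operation.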

Transport

To transport the quantity of interest we have to solve equations of the form

$$d_t u(X(t), t) = f\big(u(X(t), t), v(X(t))\big) \quad \text{in } [t^{j-1}, t^j] \tag{15}$$

along the characteristic $X$. We use an explicit RK2 scheme to numerically solve (15). Since $u$ is needed along the characteristic $X$, we have to interpolate $u$ at the computed departure points $X_D$.
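As an illustration, the following 1D sketch performs one such step with Heun's method along the characteristic for the continuity-type right-hand side $f(u, v) = -u\,\nabla \cdot v$. This choice is our assumption for the example; it mirrors the structure of (13b) once time is reversed and $v$ is replaced by $-v$. Periodic linear interpolation again stands in for the cubic spline model, and the function name is hypothetical.

```python
import numpy as np


def sl_transport_step(u, g, x, x_dep, ht):
    """One semi-Lagrangian step of d_t u = -u * g along the characteristic (1D sketch).

    u     : values on the regular periodic grid x at t^{j-1}
    g     : div v sampled on x (stationary, like v)
    x_dep : departure points X_D (computed once per velocity field)
    """
    period = 2.0 * np.pi
    u_dep = np.interp(x_dep, x, u, period=period)   # u(X_D, t^{j-1})
    g_dep = np.interp(x_dep, x, g, period=period)   # div v at X_D
    f_dep = -u_dep * g_dep                          # right-hand side at the departure point
    u_star = u_dep + ht * f_dep                     # Euler predictor at the arrival (grid) point
    f_arr = -u_star * g                             # right-hand side at the arrival point
    return u_dep + 0.5 * ht * (f_dep + f_arr)       # Heun corrector
```

With $f = 0$ the step reduces to pure interpolation at the departure points; a nonzero source only adds a second-order ODE update along each characteristic.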

3.4. Numerical Optimization

We use a globalized, inexact, matrix-free (Gauss–)Newton–Krylov method for numerical optimization. Our solver has been described and tested in [58]. In what follows, we briefly revisit this solver and then design a nested, two-level preconditioner for the reduced space optimality conditions.

3.4.1. Newton–Krylov Solver

The Newton step for updating $v_k^h \in \mathbb{R}^n$, $n = d \prod_{i=1}^{d} n_x^i$, is in general format given by

$$H^h \tilde{v}_k^h = -g_k^h, \qquad v_{k+1}^h = v_k^h + \alpha_k \tilde{v}_k^h, \tag{16}$$

where $H^h \in \mathbb{R}^{n,n}$ is the reduced space Hessian operator, $\tilde{v}_k^h \in \mathbb{R}^n$ the search direction, and $g_k^h \in \mathbb{R}^n$ the reduced gradient.⁸ We globalize our iterative scheme on the basis of a backtracking line search subject to the Armijo–Goldstein condition with step size $\alpha_k > 0$ at iteration $k \in \mathbb{N}$ (see, e.g., [63, page 37]). We keep iterating until the relative change of the gradient $\|g_k^h\|_{\mathrm{rel}} := \|g_k^h\| / \|g_0^h\|$ is smaller than or equal to 1E−2 or $\|g_k^h\|$ ≤ 1E−5 (other stopping conditions can be used; see, e.g., [38, pages 305 ff.]). We refer to the steps necessary for updating $v_k^h$ as outer iterations and to the steps necessary for "inverting the reduced Hessian" in (16) (i.e., the steps necessary to solve for the search direction $\tilde{v}_k^h$) as inner iterations (see [58, 59] for more details).
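The outer loop can be sketched as follows on a small, explicit test problem. This is a hedged illustration, not the authors' solver: the dense `np.linalg.solve` call stands in for the inexact, matrix-free PCG solve, only the sufficient-decrease (Armijo) part of the Armijo–Goldstein condition is checked, and the constant `c = 1e-4` and the function name are our assumptions. The stopping tests mirror the relative (1E−2) and absolute (1E−5) gradient criteria above.

```python
import numpy as np


def newton_armijo(J, grad, hess, v0, rtol=1e-2, atol=1e-5, max_iter=50, c=1e-4):
    """Globalized Newton method with backtracking line search (Armijo sufficient decrease).

    Stops when ||g_k|| <= rtol * ||g_0|| or ||g_k|| <= atol. Returns (v, iterations).
    """
    v = np.asarray(v0, dtype=float).copy()
    g = grad(v)
    g0_norm = np.linalg.norm(g)
    for k in range(max_iter):
        g_norm = np.linalg.norm(g)
        if g_norm <= rtol * g0_norm or g_norm <= atol:
            return v, k
        s = np.linalg.solve(hess(v), -g)           # search direction from H s = -g
        alpha, J_v, slope = 1.0, J(v), g @ s
        while J(v + alpha * s) > J_v + c * alpha * slope and alpha > 1e-10:
            alpha *= 0.5                           # backtrack until sufficient decrease holds
        v = v + alpha * s
        g = grad(v)
    return v, max_iter
```

Usage on a strictly convex toy objective: `newton_armijo(J, grad, hess, np.array([1.0, -2.0, 0.5]))` with `J(v) = sum(exp(v) - v) + v·v/2`.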

We use a Krylov iterative solver to compute $\tilde{v}_k^h$. To evaluate the reduced gradient $g_k^h \in \mathbb{R}^n$ on the right-hand side of (16) (see (6)), we have to solve (7a) forward in time and (8a) backward in time for a given iterate $v_k^h$.⁹ Once we have found the gradient, we can solve (16). The reduced space Hessian in (16) is a large, dense, ill-conditioned operator. Solving this system is a significant challenge; we use a PCG method [50]. Indefiniteness of $H^h$ can be avoided by using a GN approximation to the true Hessian¹⁰ or by terminating the PCG solve in case negative curvature occurs. By using a GN approximation, we sacrifice speed of convergence: quadratic convergence drops to superlinear convergence; we locally recover quadratic convergence as $\lambda^h$ tends to zero.

An important property of Krylov subspace methods is that we do not have to store or form the reduced space Hessian $H^h$; we merely need an expression for the action of $H^h$ on a vector, which is exactly what (10) provides. Each application of $H^h$ (i.e., each PCG iteration) requires the solution of (11a) and (12a), which results in high computational costs. We use inexact solves (see [63, pages 165ff.] and references therein) to reduce these costs. Another key ingredient to keep the number of PCG iterations small is an effective preconditioner, which we discuss next.
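The matrix-free property can be illustrated with a minimal PCG sketch that only receives callables for the action of $H^h$ and of $P^{-1}$. This is a generic textbook PCG, not the authors' code; on negative curvature it simply returns the current iterate, and how the actual solver proceeds in that case is not specified here. Names are hypothetical.

```python
import numpy as np


def pcg(matvec, b, apply_pinv, tol=1e-6, max_iter=200):
    """Preconditioned CG using only the action of H (matvec) and of P^{-1} (apply_pinv).

    Stops when ||r|| <= tol * ||b|| or when negative curvature p^T H p <= 0 is detected.
    Returns (x, number of iterations).
    """
    x = np.zeros_like(b, dtype=float)
    r = np.array(b, dtype=float)
    z = apply_pinv(r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for it in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        Hp = matvec(p)                 # one Hessian matvec (two PDE solves in the real solver)
        curv = p @ Hp
        if curv <= 0.0:                # negative curvature: terminate early
            return x, it
        alpha = rz / curv
        x += alpha * p
        r -= alpha * Hp
        z = apply_pinv(r)              # one application of the preconditioner
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter
```

Each iteration costs exactly one `matvec` and one `apply_pinv`, so the total cost is governed by the iteration count, which is what the preconditioner aims to reduce.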

3.4.2. Preconditioner

The design of an optimal preconditioner for KKT systems arising in large-scale inverse problems is an active area of research [13, 15–17, 41].¹¹ Standard techniques, like incomplete factorizations, are not applicable, as they require assembling $H^h$. We provide two matrix-free strategies below.

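Given the reduced gradient $g$, the Newton step in the reduced space is given by

$$\mathcal{H}[v](\tilde{v}) = \beta_v \mathcal{A}[\tilde{v}] + \mathcal{K}\Big[\int_0^1 \tilde{\lambda} \nabla m + \lambda \nabla \tilde{m} \,\mathrm{d}t\Big] = \beta_v \mathcal{A}[\tilde{v}] + \mathcal{Q}[\tilde{v}] = -g. \tag{17}$$

We have introduced the operator $\mathcal{Q}[\tilde{v}] := \mathcal{Q}[\tilde{\lambda}, \tilde{m}, \lambda, m](\tilde{v}, v)$ in (17) for notational convenience and to better illustrate its dependence on $\tilde{v}$; the incremental state and adjoint variables, $\tilde{m}$ and $\tilde{\lambda}$, are functions of $\tilde{v}$ through (11a) and (12a), respectively.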

We use a left preconditioner $P^{-1}$; our solver sees the system $P^{-1} H^h \tilde{v}_k^h = -P^{-1} g_k^h$. Ideally, the preconditioned matrix has a much better spectral condition number and/or eigenvalues that are clustered around one. An ideal preconditioner has vanishing costs for its construction and application and at the same time represents an excellent approximation to the Hessian operator $H^h$, so that $P^{-1} H^h \approx I_n$ [11]. These are, in general, competing goals. Since we use a PCG method to iteratively solve (17), we only require the action of $P^{-1}$ on a vector.

Regularization Preconditioner

In our original work [58, 59], we used a preconditioner that is based on the exact, spectral inverse of the regularization operator $A^h$, i.e.,

$$P_{\mathrm{REG}} = \beta_v A^h = \beta_v W \Gamma W^{-1}, \qquad P_{\mathrm{REG}} \in \mathbb{R}^{n,n}, \tag{18}$$

where $W^{-1} = I_d \otimes \hat{W} \in \mathbb{C}^{n,n}$, $I_d = \operatorname{diag}(1, \dots, 1) \in \mathbb{R}^{d,d}$, $\hat{W}$ is a DFT matrix, and $\Gamma = I_d \otimes \hat{\Gamma} \in \mathbb{R}^{n,n}$ are the spectral weights for the Laplacian, biharmonic, or triharmonic differential operators in (9). The operator $A^h$ has a non-trivial kernel; to be able to invert this operator analytically, we replace the zero entries in $\Gamma$ by one. If we apply $P_{\mathrm{REG}}^{-1}$ to the reduced Hessian in (17), the system we are effectively solving is a low-rank, compact perturbation of the identity:

$$\tilde{v}^h + (\beta_v A^h)^{-1} Q^h[\tilde{v}^h] = \big(I_n + (\beta_v A^h)^{-1} Q^h\big) \tilde{v}^h. \tag{19}$$

Notice that the operator $P_{\mathrm{REG}}^{-1}$ acts as a smoother on $Q^h$. Applying and inverting this preconditioner has negligible computational cost (due to our pseudospectral discretization). This preconditioner becomes ineffective for small regularization parameters $\beta_v$ and a high inversion accuracy (i.e., small tolerances for the relative reduction of the reduced gradient; see, e.g., [59]).
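A minimal NumPy sketch of applying $P_{\mathrm{REG}}$ and its inverse for d = 2 follows. It assumes spectral weights $|k|^{2\,\mathrm{order}}$ for $(-\Delta)^{\mathrm{order}}$ on a $2\pi$-periodic grid (order 1, 2, 3 for the Laplacian, biharmonic, and triharmonic case) and applies the same scalar operator to each velocity component, as $\Gamma = I_d \otimes \hat{\Gamma}$ suggests; the exact weights and scaling in (9) may differ. Function names are hypothetical.

```python
import numpy as np


def reg_weights(n, order=1):
    """Spectral weights of (-Laplacian)^order on a 2*pi-periodic 2D grid; zeros replaced by one."""
    k1 = np.fft.fftfreq(n[0], d=1.0 / n[0])[:, None]
    k2 = np.fft.fftfreq(n[1], d=1.0 / n[1])[None, :]
    gamma = (k1**2 + k2**2) ** order
    gamma[gamma == 0] = 1.0            # kernel of A: replace zero weights by one to allow inversion
    return gamma


def apply_preg(v, beta, gamma):
    """P_REG v = beta * W Gamma W^{-1} v, applied componentwise to v of shape (d, n1, n2)."""
    return np.stack([beta * np.real(np.fft.ifft2(gamma * np.fft.fft2(vi))) for vi in v])


def apply_preg_inv(r, beta, gamma):
    """P_REG^{-1} r: divide the Fourier coefficients of each component by beta * Gamma."""
    return np.stack([np.real(np.fft.ifft2(np.fft.fft2(ri) / (beta * gamma))) for ri in r])
```

Each application costs one forward and one inverse FFT per component, and the inverse damps a mode of wavenumber $k$ by $1/(\beta_v |k|^{2\,\mathrm{order}})$, which is the smoothing effect mentioned above.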

Nested Preconditioner

We use a coarse grid correction by an inexact solve to provide an improved preconditioner. This corresponds to a simplified two-level multigrid V-cycle, where the smoother is the inverse of the regularization operator and the coarse grid solve is inexact. We introduce spectral restriction and prolongation operators to change from the fine to the coarse grid and vice versa. The action of the preconditioner (i.e., of the reduced space Hessian in (17)) is computed on the coarse grid. This preconditioner only operates on the low frequency modes due to the restriction to a coarser grid. In our implementation, we treat the high and the low frequency components separately; we apply the nested preconditioner to the low frequency modes and leave the high frequency modes untouched. We separate the frequency components by applying an ideal low- and high-pass filter to the vector the preconditioner is applied to. As we will see below, we actually treat the high frequency components with a smoother that is based on the inverse of our regularization operator; i.e., the Hessian to be preconditioned does not correspond to the reduced space Hessian in (17) but to the preconditioned Hessian in (19). We refer to this preconditioner as $P_{2L}$.
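The spectral transfer operators and the ideal low-/high-pass split can be sketched as follows for square 2D grids with even sizes. This is a hedged illustration under our own conventions: the coarse Nyquist modes are dropped, so the low-pass part keeps exactly the modes with $|k_i| < n_c/2$, which makes prolongation after restriction coincide with the low-pass filter. Function names are hypothetical.

```python
import numpy as np


def restrict(u, nc):
    """Spectral restriction of a real periodic n x n field to an nc x nc grid (even nc <= n)."""
    n = u.shape[0]
    uh = np.fft.fftshift(np.fft.fft2(u)) / n**2     # normalized, zero mode in the center
    o = (n - nc) // 2
    uc = uh[o:o + nc, o:o + nc].copy()
    uc[0, :] = 0.0                                   # drop the coarse Nyquist row and column
    uc[:, 0] = 0.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(uc))) * nc**2


def prolong(uc, n):
    """Spectral prolongation (zero padding in Fourier space) of an nc x nc field to n x n."""
    nc = uc.shape[0]
    uh = np.fft.fftshift(np.fft.fft2(uc)) / nc**2
    uh[0, :] = 0.0
    uh[:, 0] = 0.0
    o = (n - nc) // 2
    uf = np.zeros((n, n), dtype=complex)
    uf[o:o + nc, o:o + nc] = uh
    return np.real(np.fft.ifft2(np.fft.ifftshift(uf))) * n**2


def split(u, nc):
    """Ideal low-/high-pass split: the low part keeps the modes with |k_i| < nc/2 on both axes."""
    n = u.shape[0]
    k = np.abs(np.fft.fftfreq(n, d=1.0 / n))
    mask = (k[:, None] < nc // 2) & (k[None, :] < nc // 2)
    low = np.real(np.fft.ifft2(np.fft.fft2(u) * mask))
    return low, u - low
```

In a two-level application, the low part would be restricted, handed to the coarse-grid solve, and prolonged back, while the high part is handled separately.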

The effectiveness of this scheme is dictated by the computational costs associated with the inversion of the (coarse grid) Hessian operator $H^{2h} \in \mathbb{R}^{n/2, n/2}$. One strategy for applying this preconditioner is to compute the action of the inverse of $H^{2h}$ using a nested PCG method. From the theory of Krylov subspace methods, we know that we have to solve for the action of this inverse with a tolerance that is smaller than the one we use to solve (17) for the outer PCG method not to break down (exact solve; we refer to this approach as PCG(ε), where ε ∈ (0, 1) is the scaling for the tolerance used to solve (17)). This increased accuracy may lead to impractical computational costs, especially since each application of $H^{2h}$ requires two PDE solves, and we expect this preconditioner to have a conditioning very similar to that of $H^h$. Another strategy is to solve the system inexactly. This requires the use of flexible Krylov subspace methods or a Chebyshev (CHEB) method (see, e.g., [8, pages 179ff.]) with a fixed number of iterations (we refer to this strategy as CHEB(k), where k is the number of iterations). This makes the work spent on inverting the preconditioner constant, but the inexactness might lead to a less effective preconditioner. Another bottleneck is the fact that the CHEB method requires estimates of the spectral properties of the operator we try to invert; estimating the eigenvalues is expensive and can lead to excessive computational costs. We provide implementation details next, some of which are intended to speed up the formation and application of our nested preconditioner.

  • Spectral Preconditioning of $H^h$: Since the application of the inverse of the regularization operator $A^h$ comes at almost no cost, we decided to use the spectrally preconditioned Hessian operator in (19) within our two-level scheme, with a small technical modification. The left preconditioned Hessian in (19) is not symmetric. We can either opt for Krylov methods that do not require the operator we try to invert to be symmetric, or we employ a spectral split preconditioner. We opt for the latter approach to be able to use a PCG method, owing to its efficiency. The split preconditioned system is given by
    $$\big(I_n + (\beta_v A^h)^{-1/2} Q^h (\beta_v A^h)^{-1/2}\big) s = -(\beta_v A^h)^{-1/2} g^h,$$
    where $s := (\beta_v A^h)^{1/2} \tilde{v}^h$. Notice that the inverse of the regularization operator can be viewed as a smoother, which establishes a connection of our scheme to more sophisticated multigrid strategies [2, 19].
  • Eigenvalue Estimates for the CHEB Method: The computational costs for estimating eigenvalues of $P_{2L}$ are significant. Our assumption is that we have to estimate the extremal eigenvalues only once for the registration of a given set of images (we will experimentally verify this assumption; see §4.2.1); if we change the regularization parameter, we simply have to scale the estimated eigenvalues. Notice that we can efficiently estimate the eigenvalues for a zero velocity field, since many of the terms in the optimality system drop out. We compute an estimate for the largest eigenvalue $e_{\max}$ based on an implicitly restarted Lanczos algorithm. We approximate the smallest eigenvalue analytically under the assumption that $Q^h$ is a low-rank operator of order $\mathcal{O}(1)$: $e_{\min} \approx \min\big(I_n + (\beta_v \Gamma)^{-1}\big)$.

  • Hyperbolic PDE Solves: Each matvec with $H^{2h}$ requires the solution of (11a) and (12a) on the coarse grid. We exclusively consider the semi-Lagrangian formulation to speed up the computation. In general, we assume that we do not need high-accuracy solutions for our preconditioner. This might even be true for the PDE solves within each Hessian matvec.¹²

  • Restriction/Prolongation: We use spectral restriction and prolongation operators. We do not apply an additional smoothing step after or before we restrict the data to the coarser grid. We actually observed that applying an additional Gaussian smoothing with a standard deviation of 2hx (i.e., one grid point on the coarser grid) significantly deteriorates the performance of our preconditioner for small grid sizes (e.g., 64 × 64). A more detailed study on how the choices for the restriction and prolongation operators affect the performance of our solver with respect to changes in the regularity of the underlying objects remains for future work.

  • Filters: We use simple cut-off filters before applying the restriction and prolongation operators, with a cut-off frequency of half the frequency that can be represented on the finer grid.
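The CHEB(k) strategy can be illustrated with the following matrix-free sketch of the classical Chebyshev iteration for a symmetric positive definite operator whose eigenvalues are assumed to lie in $[e_{\min}, e_{\max}]$. It is not taken from the authors' code; as a simple stand-in for the implicitly restarted Lanczos method named above, $e_{\max}$ is estimated by power iteration here. Names are hypothetical.

```python
import numpy as np


def chebyshev(matvec, b, emin, emax, k):
    """k steps of Chebyshev iteration for H x = b (zero initial guess, eigenvalues in [emin, emax])."""
    theta = 0.5 * (emax + emin)        # center of the eigenvalue interval
    delta = 0.5 * (emax - emin)        # half-width of the interval
    sigma = theta / delta
    x = np.zeros_like(b, dtype=float)
    r = np.array(b, dtype=float)
    rho = 1.0 / sigma
    d = r / theta
    for _ in range(k):
        x = x + d
        r = r - matvec(d)              # one matvec per iteration, no inner products
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x


def estimate_emax(matvec, n, iters=100, seed=0):
    """Power-iteration estimate of the largest eigenvalue (Rayleigh quotient of the last iterate)."""
    x = np.random.default_rng(seed).standard_normal(n)
    for _ in range(iters):
        y = matvec(x)
        x = y / np.linalg.norm(y)
    return x @ matvec(x)
```

Because k is fixed and the initial guess is zero, the result is a fixed polynomial in the operator applied to b, so the work per preconditioner application is constant; this is the usual argument for why such a solve can be used inside an outer PCG loop.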

3.5. Implementation Details and Parameter Settings

Here, we briefly summarize some of the implementation details and parameter choices.

  • Image Data: Our solver cannot handle images with discontinuities. We ensure that the images are adequately smooth by applying a Gaussian smoothing kernel with an empirically selected standard deviation of one grid point in each spatial direction. We normalize the intensities of the images to [0, 1] prior to registration.

  • PDE Solves: We use a CFL number of 0.2 for the explicit RK2 schemes; we observed instabilities for some of the test cases for a CFL number of 0.5. The semi-Lagrangian method is unconditionally stable; we test different CFL numbers.

  • Restriction/Prolongation: We use spectral prolongation and restriction operators within our preconditioner (more implementation details for our preconditioner can be found in §3.4.2). We do not perform any other grid, scale, or parameter continuation to speed up our computations.

  • Interpolation: We consider a C2-continuous cubic spline interpolation model. We extend our data periodically to account for the boundary conditions.

  • Regularization: Since we study the behavior of our solver as a function of the regularization parameters, we set their values empirically. For practical applications, we have designed a strategy that allows us to probe for an ideal regularization parameter; we perform a parameter continuation that is based on a binary search and considers bounds on the determinant of the deformation gradient as a criterion; see [58, 59].

  • Globalization: We use a backtracking line search subject to the Armijo–Goldstein condition to globalize our Newton–Krylov scheme (see, e.g., [63, page 37]).

  • Stopping Criteria: We terminate the inversion if the relative change of the gradient is smaller than or equal to 1E−2 or if $\|g_k^h\|$ ≤ 1E−5 (other stopping conditions can be used; see, e.g., [38, pages 305 ff.]).

  • Hessian: We use a GN approximation to the reduced space Hessian Hh to avoid indefiniteness. This corresponds to dropping all expressions with λ in (10) and (12a) (see [58] for more details); we recover quadratic convergence for λ → 0.

  • KKT solve: If not noted otherwise, we will solve the reduced space KKT system in (16) inexactly, with a forcing sequence that assumes quadratic convergence (see [63, pages 165ff.] and references therein); we use a PCG method to iteratively solve (16).

  • PC solve: We compute the action of the inverse of the 2-level preconditioner either exactly using a nested PCG method or inexactly based on a nested CHEB method with a fixed number of iterations.
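The image preprocessing listed above (Gaussian smoothing with a standard deviation of one grid point, then normalization to [0, 1]) can be sketched as follows. Assumptions we make: the Gaussian is applied as a periodic convolution in Fourier space (multiplying coefficients by $\exp(-\sigma^2 k^2 / 2)$ with $\sigma = h_x^i$), and smoothing precedes normalization; the text does not state the order. The function name is hypothetical, and a constant image would cause a division by zero.

```python
import numpy as np


def preprocess(img):
    """Periodic Gaussian smoothing (std = one grid point per axis) followed by rescaling to [0, 1]."""
    n1, n2 = img.shape
    h1, h2 = 2.0 * np.pi / n1, 2.0 * np.pi / n2
    k1 = np.fft.fftfreq(n1, d=1.0 / n1)[:, None]
    k2 = np.fft.fftfreq(n2, d=1.0 / n2)[None, :]
    kernel = np.exp(-0.5 * ((h1 * k1) ** 2 + (h2 * k2) ** 2))   # Fourier transform of the Gaussian
    smooth = np.real(np.fft.ifft2(kernel * np.fft.fft2(img)))
    lo, hi = smooth.min(), smooth.max()
    return (smooth - lo) / (hi - lo)
```

Applied to a binary square, the sharp edge becomes a transition spread over a few grid points, which is the kind of regularity the transport solvers above rely on.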

4. Numerical Experiments

We report numerical experiments next. The error of our discrete approximation to the control problem depends on the smoothness of the solution, the smoothness of the data, and the numerical errors/order-of-accuracy of our scheme. We perform a detailed numerical study to quantify these errors experimentally. We start with a comparison of the numerical schemes for solving the hyperbolic PDEs that appear in the optimality system and the Newton step (see §4.1). The second set of experiments analyzes the effectiveness of our schemes for preconditioning the reduced space KKT system (see §4.2).

All experiments are carried out for d = 2 using Matlab R2013a on a Linux cluster with Intel Xeon X5650 Westmere EP 6-core processors at 2.67GHz with 24GB DDR3-1333 memory. We illustrate the synthetic and real world data used for the experiments in Fig. 2 and Fig. 3, respectively.13

Fig. 2.


Synthetic test problems. From left to right: reference image $m_R$; template image $m_T$; $v_1$ component of the velocity field; and $v_2$ component of the velocity field. The intensity values of the images are in [0, 1]. The magnitude of the velocity field is in [−0.5, 0.5] (top row) and [−1, 1] (bottom row), respectively.

Fig. 3.


Registration problems. Top left: UT images (synthetic problem); top right: HAND images [3, 62]; bottom left: HEART images; bottom right: BRAIN images [26]. The intensity values for these images are normalized to [0, 1]. We provide (from left to right for each set of images) the reference image $m_R$, the template image $m_T$, and the residual differences between these images prior to registration.

4.1. Hyperbolic PDE solver

We study the performance of the time integrators for the hyperbolic transport equations. We only consider the problems SMOOTH A and SMOOTH B in Fig. 2 as these are constructed to be initially resolved on the considered grids Ωh. This allows us to study grid convergence without mixing in any additional problems due to potential sharp transitions in the intensity values of the image data. We will see that these simple test cases can already break standard numerical schemes.

4.1.1. Self-Convergence: State and Adjoint Equation

Purpose

To study the numerical stability and accuracy of the considered schemes for integrating the hyperbolic transport equations that appear in our optimality system.

Setup

We study the self-convergence of the considered numerical time integrators. We consider the RK2 scheme (pseudospectral discretization in space), the stabilized RK2A scheme (pseudospectral discretization in space), and the SL method (cubic interpolation combined with a pseudospectral discretization; see §3.3 for details). We test these schemes for the synthetic problems SMOOTH A and SMOOTH B in Fig. 2. We consider the state and the adjoint equation. We compute the relative $\ell^2$-error between the solution of the transport equations (state equation (7a) and adjoint equation (8a)) obtained on a spatial grid of size $n_x$ and the solution obtained on a spatial grid of size $\tilde{n}_x = 2 n_x$. We compute this error in the Fourier domain; formally, the error is given by

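$$\delta u^h_{\mathrm{rel}} := \big\| M\big[[W^{-1} u^h]_{n_x}\big] - [W^{-1} u^h]_{\tilde{n}_x} \big\|_2 \,\big/\, \big\| [W^{-1} u^h]_{\tilde{n}_x} \big\|_2$$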

for a given numerical solution $u^h$. Here, $[\,\cdot\,]_n$ indicates that the data is represented on a grid of size $n$; $M$ is a prolongation operator that maps the data from a grid of size $n_x$ to a grid of size $\tilde{n}_x$; and $W^{-1}$ represents the forward Fourier operator. We use a CFL number of 0.2 to compute the number of time steps $n_t$ for the RK2 and the RK2A method. For the SL method, we use the CFL numbers 0.2, 1, and 5. We expect the error to tend to zero for an increasing number of discretization points.
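Both ingredients of this setup can be sketched in a few lines. The rule $n_t = \lceil v_{\max} / (c\, h_x) \rceil$ with $h_x = 2\pi / n_x^i$ is our assumption about how a CFL number $c$ translates into a number of time steps; with the assumed $v_{\max} = 0.5$ (the top-row velocity range quoted in Fig. 2) it reproduces the $n_t$ values listed for SMOOTH A in Table 2 for $c \in \{0.2, 1, 5\}$. The error function implements $\delta u^h_{\mathrm{rel}}$ with $M$ realized as zero padding of normalized Fourier coefficients (coarse Nyquist modes dropped). Names are hypothetical.

```python
import numpy as np


def time_steps(vmax, nx, cfl):
    """Number of time steps n_t (h_t = 1/n_t) such that vmax * h_t / h_x <= cfl (assumed rule)."""
    hx = 2.0 * np.pi / nx
    return int(np.ceil(vmax / (cfl * hx)))


def fourier_coefficients(u):
    """Normalized 2D Fourier coefficients, fftshifted so that the zero mode is centered."""
    return np.fft.fftshift(np.fft.fft2(u)) / u.size


def self_convergence_error(u_coarse, u_fine):
    """Relative l2 error between the spectrally prolonged coarse and the fine solution."""
    nc, nf = u_coarse.shape[0], u_fine.shape[0]
    c = fourier_coefficients(u_coarse)
    c[0, :] = 0.0                       # drop the coarse Nyquist modes
    c[:, 0] = 0.0
    prolonged = np.zeros((nf, nf), dtype=complex)
    o = (nf - nc) // 2
    prolonged[o:o + nc, o:o + nc] = c   # M: zero padding onto the fine grid
    f = fourier_coefficients(u_fine)
    return np.linalg.norm(prolonged - f) / np.linalg.norm(f)
```

For a band-limited field sampled on both grids, the error is at round-off level; for numerical solutions it measures how much the coarse and fine results still differ.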

Results

We report results for the self-convergence of our numerical schemes in Tab. 2. We illustrate a subset of these results in Fig. 4.

Table 2.

Self-convergence for the RK2, the RK2A, and the SL method for the numerical integration of the state equation (see (7a); results reported in the top block) and the adjoint equation (see (8a); results reported in the bottom block). We report the relative $\ell^2$-error $\delta u^h_{\mathrm{rel}}$ between solutions for the state ($u^h = m_1^h$) and the adjoint ($u^h = \lambda_0^h$) equation computed on a grid of size $n_x$ and a grid of size $2n_x$. We use a CFL number of 0.2 for the RK2 and the RK2A method, and CFL numbers of 0.2, 1, and 5 for the SL method; we provide the associated number of time points $n_t$. We report errors for different grid sizes and test problems (within each block, top: SMOOTH A; bottom: SMOOTH B; see Fig. 2).

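| Eq. | Problem | $n_x^i$ | RK2(0.2) run | $n_t$ | $\delta u^h_{\mathrm{rel}}$ | time | RK2A(0.2) run | $n_t$ | $\delta u^h_{\mathrm{rel}}$ | time | SL(0.2) run | $n_t$ | $\delta u^h_{\mathrm{rel}}$ | time | SL(1) run | $n_t$ | $\delta u^h_{\mathrm{rel}}$ | time | SL(5) run | $n_t$ | $\delta u^h_{\mathrm{rel}}$ | time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| State | A | 64 | #1 | 26 | 1.25E−5 | 3.25E−1 | #2 | 26 | 1.25E−5 | 6.64E−1 | #3 | 26 | 3.79E−6 | 8.53E−1 | #4 | 6 | 4.93E−5 | 2.70E−1 | #5 | 2 | 3.19E−4 | 5.88E−2 |
| State | A | 128 | #6 | 51 | 3.28E−6 | 1.63 | #7 | 51 | 3.28E−6 | 3.20 | #8 | 51 | 8.63E−7 | 3.85 | #9 | 11 | 1.52E−5 | 1.32 | #10 | 3 | 1.69E−4 | 2.41E−1 |
| State | A | 256 | #11 | 102 | 8.19E−7 | 1.07E+1 | #12 | 102 | 8.19E−7 | 2.04E+1 | #13 | 102 | 2.00E−7 | 3.04E+1 | #14 | 21 | 4.26E−6 | 6.81 | #15 | 5 | 6.75E−5 | 1.56 |
| State | A | 512 | #16 | 204 | 2.05E−7 | 8.44E+1 | #17 | 204 | 2.05E−7 | 1.61E+2 | #18 | 204 | 4.81E−8 | 2.53E+2 | #19 | 41 | 1.14E−6 | 5.09E+1 | #20 | 9 | 2.22E−5 | 1.14E+1 |
| State | B | 64 | #21 | 102 | 2.36E−1 | 9.20E−1 | #22 | 102 | 7.67E−2 | 2.38 | #23 | 102 | 9.93E−2 | 3.33 | #24 | 21 | 8.64E−2 | 6.78E−1 | #25 | 5 | 7.35E−2 | 2.24E−1 |
| State | B | 128 | #26 | 204 | 2.81E−2 | 6.44 | #27 | 204 | 9.83E−3 | 1.26E+1 | #28 | 204 | 1.06E−2 | 1.77E+1 | #29 | 41 | 9.67E−3 | 4.60 | #30 | 9 | 9.81E−3 | 1.15 |
| State | B | 256 | #31 | 408 | *** | 4.23E+1 | #32 | 408 | 1.20E−4 | 7.86E+1 | #33 | 408 | 4.03E−4 | 1.16E+2 | #34 | 82 | 3.67E−4 | 2.51E+1 | #35 | 17 | 1.19E−3 | 6.94 |
| State | B | 512 | #36 | 815 | *** | 3.43E+2 | #37 | 815 | 3.88E−6 | 6.52E+2 | #38 | 815 | 1.88E−5 | 1.06E+3 | #39 | 163 | 2.74E−5 | 1.96E+2 | #40 | 33 | 3.10E−4 | 4.23E+1 |
| Adjoint | A | 64 | #41 | 26 | 1.28E−4 | 2.14E−1 | #42 | 26 | 1.28E−4 | 6.60E−1 | #43 | 26 | 8.33E−5 | 7.33E−1 | #44 | 6 | 1.39E−3 | 3.05E−1 | #45 | 2 | 8.36E−3 | 1.64E−1 |
| Adjoint | A | 128 | #46 | 51 | 3.40E−5 | 1.36 | #47 | 51 | 3.40E−5 | 3.42 | #48 | 51 | 2.25E−5 | 6.04 | #49 | 11 | 4.47E−4 | 1.30 | #50 | 3 | 4.61E−3 | 5.30E−1 |
| Adjoint | A | 256 | #51 | 102 | 8.52E−6 | 9.29 | #52 | 102 | 8.52E−6 | 2.36E+1 | #53 | 102 | 5.66E−6 | 3.58E+1 | #54 | 21 | 1.28E−4 | 8.24 | #55 | 5 | 1.93E−3 | 2.08 |
| Adjoint | A | 512 | #56 | 204 | 2.13E−6 | 7.09E+1 | #57 | 204 | 2.13E−6 | 1.63E+2 | #58 | 204 | 1.42E−6 | 2.92E+2 | #59 | 41 | 3.46E−5 | 6.37E+1 | #60 | 9 | 6.54E−4 | 1.56E+1 |
| Adjoint | B | 64 | #61 | 102 | 4.04E−1 | 8.21E−1 | #62 | 102 | 4.19E−1 | 2.23 | #63 | 102 | 6.07E−1 | 3.27 | #64 | 21 | 5.30E−1 | 7.79E−1 | #65 | 5 | 4.34E−1 | 2.82E−1 |
| Adjoint | B | 128 | #66 | 204 | 5.72E−2 | 6.31 | #67 | 204 | 5.94E−2 | 1.14E+1 | #68 | 204 | 6.56E−2 | 1.80E+1 | #69 | 41 | 6.06E−2 | 3.39 | #70 | 9 | 5.69E−2 | 1.12 |
| Adjoint | B | 256 | #71 | 408 | *** | 4.20E+1 | #72 | 408 | 7.31E−4 | 7.78E+1 | #73 | 408 | 2.37E−3 | 1.46E+2 | #74 | 82 | 2.04E−3 | 2.31E+1 | #75 | 17 | 3.92E−3 | 4.81 |
| Adjoint | B | 512 | #76 | 815 | *** | 2.81E+2 | #77 | 815 | 1.17E−5 | 6.42E+2 | #78 | 815 | 1.08E−4 | 1.19E+3 | #79 | 163 | 1.13E−4 | 2.29E+2 | #80 | 33 | 1.01E−3 | 5.03E+1 |

*** indicates that the solver became unstable (not due to a violation of the CFL condition; see text for details). We also report the time to solution (in seconds).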

Fig. 4.


Self-convergence for the forward solver. We illustrate solutions of the forward problem (state equation; see (7a)) for the synthetic test problems in Fig. 2 (top row: SMOOTH A; bottom rows: SMOOTH B). We report results for different grid sizes $n_x = (n_x^1, n_x^2)$. We use the same number of time steps (CFL number of 0.2) for all PDE solvers.

Observations

The most important observations are that (i) our SL scheme delivers an accuracy on the order of that of the RK2A and the RK2 scheme with a speedup of one order of magnitude,¹⁴ and (ii) our standard RK2 scheme can become unstable if we combine it with a spectral discretization, even for smooth initial data and a smooth velocity field (run #31, run #36, run #71, and run #76 in Tab. 2). This instability is a consequence of the absence of numerical diffusion; it is completely unrelated to the CFL condition. The RK2A and the SL method remain stable across all considered test cases with a similar performance. The rates of convergence for the RK2 and the RK2A scheme are excellent; we expect second-order convergence in time and spectral convergence in space (this has been verified; results not reported here). The self-convergence for the SL(0.2) method is on the order of, but overall slightly better than, the one observed for the RK2 and the RK2A scheme. The self-convergence error increases by roughly one order of magnitude if we increase the CFL number for the SL method to 1, and again if we increase it to 5. Switching from test problem SMOOTH A to SMOOTH B, the self-convergence deteriorates for all schemes. We cannot fully resolve the problem SMOOTH B if we solve the equations on a spatial grid with less than 128 nodes along each spatial direction; the errors range between O(1E−1) and O(1E−2) (run #21 through run #25 for the state equation and run #61 through run #70 for the adjoint equation; see also Fig. 4). Notice that we can fully resolve the initial data and the velocity field for smaller grid sizes.

We can also observe that we lose about one order of magnitude in accuracy if we switch from the state to the adjoint equation, even for the mild case SMOOTH A. This observation is consistent across all solvers. This demonstrates that the adjoint equation is, in general, more difficult to solve than the state equation. This can be attributed to the fact that the adjoint equation is a transport equation for the residual; the residual has, in general, less regularity than the original images (see also [73]).

As for the time to solution, the SL(0.2) scheme delivers a performance on the order of that of the RK2A(0.2) scheme (slightly worse). We have to switch to a CFL number of 1 to be competitive with the RK2(0.2) scheme. For a CFL number of 5, the SL scheme outperforms the RK2 and RK2A scheme by about one order of magnitude in terms of time to solution. Intuitively, one would expect the SL method to deliver a much more pronounced speedup due to its unconditional stability. The discrepancy arises because we essentially replace a large number of highly optimized FFT operations with cubic spline interpolation operations. We report estimates for the computational complexity in terms of FFTs and IPs in Tab. 12 in §C. We can see in Tab. 13 in §C that the differences in CPU time between these two operations are significant.

Table 13.

Wall clock times for applying one FFT or one cubic spline interpolation step with respect to different grid sizes. The timings are obtained for the fftn and interp2 functions in Matlab R2013a on a Linux cluster with Intel Xeon X5650 Westmere EP 6-core processors at 2.67GHz with 24GB DDR3-1333 memory.

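| $n_x^i$ | FFT | IP | factor (IP/FFT) |
|---|---|---|---|
| 16 | 3.03E−5 | 4.98E−3 | 1.64E+2 |
| 32 | 3.79E−5 | 2.43E−3 | 6.40E+1 |
| 64 | 1.21E−4 | 2.90E−3 | 2.40E+1 |
| 128 | 3.22E−4 | 5.14E−3 | 1.59E+1 |
| 256 | 8.86E−4 | 2.24E−2 | 2.53E+1 |
| 512 | 3.83E−3 | 1.42E−1 | 3.70E+1 |
| 1024 | 1.30E−2 | 6.07E−1 | 4.69E+1 |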
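The trade-off described above can be made explicit with a crude cost model that multiplies the state-equation operation counts of Table 12 by the per-operation timings of Table 13 for $n_x^i = 256$ (assumed to be in seconds). The model ignores all work except FFTs and interpolations, and it uses our reading of Table 12 that the SL state solve needs no FFTs; it is meant only to show how fewer, more expensive steps trade off against many cheap ones, not to reproduce the measured times in Table 2. Function names are hypothetical.

```python
# Operation counts for one state-equation (SE) solve, transcribed from Table 12.
# Returns (number of FFTs, number of interpolation steps).
def se_counts(scheme, d, nt):
    if scheme == "RK2":
        return 2 * (d + 1) * nt, 0
    if scheme == "RK2A":
        return 2 * (d + 1) + 4 * (d + 1) * nt, 0
    if scheme == "SL":
        return 0, d + nt
    raise ValueError(f"unknown scheme: {scheme}")


# Per-operation wall-clock times for n_x^i = 256, transcribed from Table 13 (assumed seconds).
T_FFT, T_IP = 8.86e-4, 2.24e-2


def modeled_time(scheme, d, nt):
    """Crude cost model: counts FFTs and IPs only and ignores all other work."""
    n_fft, n_ip = se_counts(scheme, d, nt)
    return n_fft * T_FFT + n_ip * T_IP


# n_t for SMOOTH A at n_x^i = 256 (Table 2): 102 steps for CFL 0.2 and 5 steps for CFL 5.
for scheme, nt in [("RK2", 102), ("RK2A", 102), ("SL", 102), ("SL", 5)]:
    print(f"{scheme:5s} n_t={nt:4d}  modeled time {modeled_time(scheme, 2, nt):.3f}")
```

At equal $n_t$ the SL solve is modeled as more expensive than RK2, because each interpolation costs roughly 25 FFTs at this grid size; only the much smaller $n_t$ permitted by a large CFL number turns this into a speedup.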
Conclusions

We cannot guarantee convergence to a valid solution if we use a standard RK2 scheme in combination with a spectral discretization, even for smooth initial data; we have to use more sophisticated schemes. We provide two alternatives: a stabilized RK2 scheme (RK2A) and an SL scheme (see §3.3 for details). Both schemes remained stable across all experiments. The SL scheme delivers an accuracy very similar to that of the RK2A scheme with a speedup of one order of magnitude, even for our non-optimized implementation.¹⁵

4.1.2. Convergence to RK2A

Purpose

To assess (i) the convergence of the SL method to the solution of the RK2A scheme and thereby (ii) the numerical errors that might affect the overall convergence of our Newton–Krylov solver.

Setup

We assess the convergence of the SL method to a solution computed on the basis of the RK2A scheme for the state and the adjoint equation, (7a) and (8a), respectively. Based on our past experiments (see [58, 59]), we assume that the solution of the RK2A scheme is a silver standard. We compute the reference solution on a grid of size $n_x = (512, 512)$ with a CFL number of 0.2 (RK2A(0.2)). As in the former experiment, we compute the discrepancy between the numerical solutions in the Fourier domain; i.e., we report relative errors $\delta m^h_{1,\mathrm{rel}}$ and $\delta \lambda^h_{0,\mathrm{rel}}$, where $\tilde{n}_x = (512, 512)$ for the RK2A(0.2) reference solution. We report results for different discretization levels (varying number of grid points $n_x$ and time steps $n_t$; the CFL numbers for the SL method are 0.2, 1, 2, 5, and 10). As a reference, we also compute errors for the RK2A scheme for a CFL number of 0.2. We also compute convergence errors for the gradient; the setup is the same as for the experiment for the adjoint and state equation.

Results

We report the relative error between the solution computed based on our SL formulation and the RK2A scheme in Tab. 3. The error estimates for the reduced gradient can be found in Tab. 4.

Table 3.

Convergence of the SL method to a reference solution computed via the RK2A scheme. We compute the reference solution on a grid of size nx = (512, 512) using the RK2A scheme with a CFL number of 0.2. We report results for varying discretization sizes. We report the CFL number c, the associated number of time steps nt, and the relative 2-error between the solution for the SL scheme and the reference solution computed via the RK2A scheme. We also report errors for the RK2A method as a reference (self convergence). We report results for the state equation (two blocks on the left; see (7a)) and the adjoint equation (two blocks on the right; see (8a)). We consider the test problems SMOOTH A and SMOOTH B in Fig. 2 for the velocity field and to set up the initial and terminal conditions, respectively.

| $n_{x_i}$ | c | State, SMOOTH A: $n_t$ | SL | RK2A | State, SMOOTH B: $n_t$ | SL | RK2A | Adjoint, SMOOTH A: $n_t$ | SL | RK2A | Adjoint, SMOOTH B: $n_t$ | SL | RK2A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 64 | 10 | 2 | #1 5.84E−4 | | 3 | #2 8.11E−2 | | 2 | #3 1.57E−2 | | 3 | #4 4.18E−1 | |
| | 5 | 2 | #5 5.84E−4 | | 5 | #6 7.58E−2 | | 2 | #7 1.57E−2 | | 5 | #8 4.41E−1 | |
| | 2 | 3 | #9 2.67E−4 | | 11 | #10 7.99E−2 | | 3 | #11 7.44E−3 | | 11 | #12 4.90E−1 | |
| | 1 | 6 | #13 7.04E−5 | | 21 | #14 8.57E−2 | | 6 | #15 2.01E−3 | | 21 | #16 5.31E−1 | |
| | 0.2 | 26 | #17 5.10E−6 | #18 1.66E−5 | 102 | #19 9.81E−2 | #20 7.66E−2 | 26 | #21 1.14E−4 | #22 1.71E−4 | 102 | #23 6.06E−1 | #24 4.19E−1 |
| 128 | 10 | 2 | #25 5.84E−4 | | 5 | #26 1.78E−2 | | 2 | #27 1.57E−2 | | 5 | #28 7.10E−2 | |
| | 5 | 3 | #29 2.67E−4 | | 9 | #30 1.08E−2 | | 3 | #31 7.43E−3 | | 9 | #32 5.91E−2 | |
| | 2 | 6 | #33 6.90E−5 | | 21 | #34 9.55E−3 | | 6 | #35 2.01E−3 | | 21 | #36 5.90E−2 | |
| | 1 | 11 | #37 2.11E−5 | | 41 | #38 9.84E−3 | | 11 | #39 6.22E−4 | | 41 | #40 6.14E−2 | |
| | 0.2 | 51 | #41 1.33E−6 | #42 4.10E−6 | 204 | #43 1.07E−2 | #44 9.83E−3 | 51 | #45 3.08E−5 | #46 4.25E−5 | 204 | #47 6.65E−2 | #48 5.94E−2 |
| 256 | 10 | 3 | #49 2.67E−4 | | 9 | #50 5.06E−3 | | 3 | #51 7.43E−3 | | 9 | #52 1.55E−2 | |
| | 5 | 5 | #53 9.86E−5 | | 17 | #54 1.60E−3 | | 5 | #55 2.84E−3 | | 17 | #56 5.21E−3 | |
| | 2 | 11 | #57 2.10E−5 | | 41 | #58 5.02E−4 | | 11 | #59 6.21E−4 | | 41 | #60 2.09E−3 | |
| | 1 | 21 | #61 5.96E−6 | | 82 | #62 3.93E−4 | | 21 | #63 1.75E−4 | | 82 | #64 2.12E−3 | |
| | 0.2 | 102 | #65 4.92E−7 | #66 8.19E−7 | 408 | #67 4.21E−4 | #68 1.20E−4 | 102 | #69 8.61E−6 | #70 8.52E−6 | 408 | #71 2.47E−3 | #72 7.31E−4 |
| 512 | 10 | 5 | #73 9.86E−5 | | 17 | #74 1.47E−3 | | 5 | #75 2.84E−3 | | 17 | #76 4.60E−3 | |
| | 5 | 9 | #77 3.11E−5 | | 33 | #78 4.15E−4 | | 9 | #79 9.18E−4 | | 33 | #80 1.35E−3 | |
| | 2 | 21 | #81 5.94E−6 | | 82 | #82 8.61E−5 | | 21 | #83 1.75E−4 | | 82 | #84 2.85E−4 | |
| | 1 | 41 | #85 1.71E−6 | | 163 | #86 3.59E−5 | | 41 | #87 4.70E−5 | | 163 | #88 1.29E−4 | |
| | 0.2 | 204 | #89 3.19E−7 | #90 0 | 815 | #91 2.09E−5 | #92 0 | 204 | #93 3.73E−6 | #94 0 | 815 | #95 1.05E−4 | #96 0 |

Table 4.

Convergence of the reduced gradient computed via the SL method to the gradient computed via the RK2A method. We evaluate the reference gradient on a grid of size nx = (512, 512) via the RK2A method with a CFL number of 0.2. For the SL method, the reduced gradient is computed on grids of size nx = (256, 256) and nx = (512, 512) with a varying number of time steps nt. We report the CFL number c, the associated number of time steps nt, the relative ℓ2-error between numerical approximations to the reduced gradient $g^h$, and the wall-clock time for the evaluation of $g^h$. We use the test problems SMOOTH A and SMOOTH B in Fig. 2 as input data. As a reference, we also provide relative errors for the RK2A scheme.

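| $n_{x_i}$ | c | SMOOTH A: run | $n_t$ | SL error | SL time | RK2A error | RK2A time | SMOOTH B: run | $n_t$ | SL error | SL time | RK2A error | RK2A time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 256 | 10 | #1 | 3 | 2.54E−3 | 4.51E−1 | | | #2 | 9 | 2.28E−2 | 1.29E0 | | |
| | 5 | #3 | 5 | 9.08E−4 | 1.02E0 | | | #4 | 17 | 2.22E−2 | 2.19E0 | | |
| | 2 | #5 | 11 | 2.19E−4 | 1.77E0 | | | #6 | 41 | 2.21E−2 | 3.87E0 | | |
| | 1 | #7 | 21 | 1.31E−4 | 2.59E0 | | | #8 | 82 | 2.20E−2 | 9.55E0 | | |
| | 0.2 | #9 | 102 | 1.21E−4 | 1.17E+1 | 1.21E−4 | 8.57E0 | #10 | 408 | 2.20E−2 | 3.00E+1 | 2.19E−2 | 2.74E+1 |
| 512 | 10 | #11 | 5 | 9.00E−4 | 3.66E0 | | | #12 | 17 | 1.42E−3 | 7.59E0 | | |
| | 5 | #13 | 9 | 2.74E−4 | 3.74E0 | | | #14 | 33 | 3.94E−4 | 1.19E+1 | | |
| | 2 | #15 | 21 | 4.93E−5 | 8.72E0 | | | #16 | 82 | 7.89E−5 | 2.86E+1 | | |
| | 1 | #17 | 41 | 1.23E−5 | 1.49E+1 | | | #18 | 163 | 3.60E−5 | 6.13E+1 | | |
| | 0.2 | #19 | 204 | 4.87E−7 | 7.32E+1 | 0 | 4.96E+1 | #20 | 815 | 2.90E−5 | 2.65E+2 | 0 | 2.04E+2 |
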
Observations

The most important observation is that the SL scheme converges to the RK2A(0.2) reference solution at a rate similar to that of the RK2A scheme itself. For a CFL number of 0.2, the SL scheme delivers an equivalent or even better rate of convergence than the RK2A scheme. We lose one to two digits if we switch to higher CFL numbers; this loss in accuracy might still be acceptable for our Newton–Krylov solver to converge to almost identical solutions, something we investigate below. As in the former experiment, the errors for the adjoint equation are overall about one order of magnitude larger than those for the state equation; this observation holds for both the RK2A scheme and the SL scheme (see, for instance, run #41 and run #45, and run #42 and run #46 in Tab. 3).

Conclusions

Our SL scheme behaves very similarly to the RK2A scheme, with the benefit of an orders-of-magnitude reduction in computational workload due to its unconditional stability. We expect significant savings, especially for evaluating the Hessian, since the accuracy requirements for the Hessian and its preconditioner are less stringent than those for the reduced gradient for our Newton–Krylov solver to still converge. If high-accuracy solutions are required, we can simply increase the number of time points to match the accuracy of the RK2A scheme, at the expense of an increase in CPU time.
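To illustrate why the SL scheme admits large CFL numbers, the following sketch advects a scalar field on a periodic 1D grid. It assumes a stationary velocity, departure points from an explicit midpoint (RK2) back-trace, and linear interpolation via `numpy.interp` with its `period` argument. These choices, the 1D setting, and the name `sl_advect` are simplifications for illustration; the sketch does not reproduce the 2D setting or the accuracy of our scheme. The number of time steps follows from the CFL number alone, and since every update is an interpolation (a convex combination of grid values here), the iteration stays bounded for any CFL number.

```python
import numpy as np

def sl_advect(m0, v, T, cfl, L=2 * np.pi):
    """Solve dm/dt + v dm/dx = 0 on a periodic 1D grid with a semi-Lagrangian
    scheme (stationary velocity v, final time T). The time step is chosen
    from the CFL number; no stability bound restricts it."""
    n = m0.size
    h = L / n
    x = h * np.arange(n)
    nt = int(np.ceil(T * np.max(np.abs(v)) / (cfl * h)))
    dt = T / nt
    # Departure points via an explicit midpoint (RK2) back-trace of the
    # characteristics; computed once because v does not depend on time.
    v_mid = np.interp(x - 0.5 * dt * v, x, v, period=L)
    x_dep = x - dt * v_mid
    m = m0.copy()
    for _ in range(nt):
        # Linear interpolation at the departure points (periodic wrap-around).
        m = np.interp(x_dep, x, m, period=L)
    return m, nt

n = 256
x = 2 * np.pi * np.arange(n) / n
m_sl, nt = sl_advect(np.sin(x), np.ones(n), T=1.0, cfl=5.0)
err = np.max(np.abs(m_sl - np.sin(x - 1.0)))   # exact solution is a shift
```

With n = 256 and T = 1, a CFL number of 5 requires 9 time steps, compared with 204 for a CFL number of 0.2; the linear interpolation introduces some numerical diffusion, which reflects the accuracy trade-off discussed above.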

4.1.3. Adjoint Error

Purpose

To assess the numerical errors of the discretized forward and adjoint operators.

Setup

We solve the state equation (7a) and the adjoint equation (8a) on a grid of size nx = (256, 256) for a varying number of time points nt. We use the problem SMOOTH A in Fig. 2 to set up the equations. We report the relative error between the discretized forward operator $C^h$ and the discretized adjoint operator $(C^h)^{*}$, $\delta_{\text{ADJ}} := |\langle C^h m_0^h, C^h m_0^h\rangle - \langle (C^h)^{*} C^h m_0^h, m_0^h\rangle| \,/\, |\langle C^h m_0^h, C^h m_0^h\rangle|$. The continuous forward operator is self-adjoint, i.e., $C = C^{*}$; the error should tend to zero if our numerical scheme preserves this property.¹⁶
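The adjoint test can be sketched in a few lines. The following Python/NumPy example assumes a single periodic 1D semi-Lagrangian step with linear interpolation as a stand-in for $C^h$; the names `sl_matrix` and `adjoint_error` are hypothetical. Using the exact matrix transpose as the adjoint gives $\delta_{\text{ADJ}}$ at round-off level, whereas a deliberately naive "adjoint" obtained by running the same SL step with velocity $-v$ is not the discrete transpose of $C^h$, and the test detects the mismatch.

```python
import numpy as np

def sl_matrix(v, dt, h):
    """Matrix of one periodic 1D semi-Lagrangian step with linear interpolation:
    row i interpolates the grid function at the departure point x_i - dt*v_i."""
    n = v.size
    C = np.zeros((n, n))
    for i in range(n):
        s = i - dt * v[i] / h            # departure point in grid units
        j = int(np.floor(s))
        w = s - j                        # weight of the right neighbour
        C[i, j % n] += 1.0 - w
        C[i, (j + 1) % n] += w
    return C

def adjoint_error(C, C_adj, m):
    """delta_ADJ = |<Cm, Cm> - <C_adj C m, m>| / |<Cm, Cm>|."""
    Cm = C @ m
    ref = Cm @ Cm
    return abs(ref - (C_adj @ Cm) @ m) / abs(ref)

n = 128
h = 2 * np.pi / n
x = h * np.arange(n)
v = 1.0 + 0.5 * np.sin(x)                # variable velocity, max |v| = 1.5
dt = 2.0 * h / np.max(np.abs(v))         # CFL number of 2
C = sl_matrix(v, dt, h)
m = 1.0 + np.cos(x)

err_transpose = adjoint_error(C, C.T, m)               # exact discrete adjoint
err_naive = adjoint_error(C, sl_matrix(-v, dt, h), m)  # SL step run "backwards"
```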

Results

We report the relative adjoint errors in Tab. 5.

Table 5.

Relative adjoint error $\delta_{\text{ADJ}}$ (see text for details) for a grid size of nx = (256, 256) and a varying number of time steps nt. We consider the test problem SMOOTH A in Fig. 2. We report the CFL number c, the associated number of time steps nt, and the relative errors for the SL and the RK2A scheme.

| c | $n_t$ | SL | RK2A |
|---|---|---|---|
| 10 | 3 | 3.28E−3 | |
| 5 | 5 | 1.26E−3 | |
| 2 | 11 | 2.75E−4 | |
| 1 | 21 | 7.72E−5 | |
| 0.2 | 102 | 3.30E−6 | 1.24E−16 |

Observations

The most important observation is that the RK2A scheme is self-adjoint (up to machine precision), whereas the error for the SL method ranges between O(1E−6) and O(1E−3) as a function of nt. If we solve the problem with a CFL number of 2 or smaller, the adjoint error is at or below the accuracy with which we typically solve the inverse problem in practical applications (a relative change of the gradient of 1E−2 or 1E−3 and an absolute tolerance of 1E−5 for the ℓ∞-norm of the reduced gradient).

Conclusions

Our SL scheme is not self-adjoint. The numerical errors are, however, acceptable for the tolerances we use in practical applications, even for moderate CFL numbers. If we intend to solve the problem with higher accuracy, we might have to either use a larger number of time steps or switch to the RK2A scheme to guarantee convergence. We note that we have observed neither convergence failures nor the need for additional line search steps in our solver, even for a CFL number of 10.

4.2. Preconditioner

Next, we analyze the performance of our preconditioners (see §3.4).

4.2.1. Eigenvalue Estimation

We need to estimate the extremal eigenvalues of P2L if we use the CHEB method to compute the action of its inverse. This estimation requires a significant amount of computational work if we have to do it frequently (about 30 matvecs for the estimation of $e_{\max}$). We estimate the smallest eigenvalue based on an analytical approximation and the largest eigenvalue numerically (see §3.4.2 for details).

Purpose

To assess if the estimates for the largest eigenvalue vary significantly during the course of an inverse solve.

Setup

We solve the inverse problem for different sets of images; we consider the UT, HAND, HEART, and BRAIN images in Fig. 3. We terminate the inversion if the gradient is reduced by three orders of magnitude or if $\|g_k^h\|_\infty$ ≤ 1E−5, k = 0, 1, 2, …. We estimate the largest eigenvalue every time the preconditioner is applied. We consider a compressible diffeomorphism (H2-regularization). The solution is computed using a GN approximation. We report results for different regularization weights βv.

Results

We summarize the estimates for the largest eigenvalue $e_{\max}$ in Tab. 6.

Table 6.

Estimates for the largest eigenvalue $e_{\max}$ of P2L during the course of the inversion. We limit this experiment to a compressible diffeomorphism (H2-regularization). We report results for the UT (256 × 256), the HAND (128 × 128), the HEART (192 × 192), and the BRAIN (256 × 300) images (see Fig. 3). We terminate the inversion if the gradient is reduced by at least three orders of magnitude or if the ℓ∞-norm of the reduced gradient is smaller than or equal to 1E−5. We consider different regularization weights βv. We estimate $e_{\max}$ every time we apply the preconditioner P2L. We report the initial estimate $e_{\max,0}$ (zero velocity field) and the min, max, and mean values of the estimates computed during the course of the entire inversion.

| Images | run | βv | $e_{\max,0}$ | min | max | mean |
|---|---|---|---|---|---|---|
| UT | #1 | 1E−1 | 1.07E+2 | 9.26E+1 | 1.07E+2 | 9.34E+1 |
| | #2 | 1E−2 | 1.06E+3 | 8.19E+2 | 1.06E+3 | 8.59E+2 |
| | #3 | 1E−3 | 1.06E+4 | 7.88E+3 | 1.06E+4 | 8.36E+3 |
| HAND | #4 | 1E−1 | 2.73E+1 | 2.50E+1 | 2.73E+1 | 2.54E+1 |
| | #5 | 1E−2 | 2.64E+2 | 2.20E+2 | 2.64E+2 | 2.24E+2 |
| | #6 | 1E−3 | 2.63E+3 | 2.13E+3 | 2.63E+3 | 2.16E+3 |
| HEART | #7 | 1E−1 | 4.22E+1 | 4.22E+1 | 4.23E+1 | 4.23E+1 |
| | #8 | 1E−2 | 4.13E+2 | 4.13E+2 | 4.14E+2 | 4.14E+2 |
| | #9 | 1E−3 | 4.13E+3 | 4.12E+3 | 4.13E+3 | 4.13E+3 |
| BRAIN | #10 | 1E−1 | 2.97E+1 | 2.92E+1 | 2.97E+1 | 2.94E+1 |
| | #11 | 1E−2 | 2.88E+2 | 2.82E+2 | 2.90E+2 | 2.89E+2 |
| | #12 | 1E−3 | 2.87E+3 | 2.81E+3 | 2.87E+3 | 2.83E+3 |

Observations

The most important observation is that the estimates for the largest eigenvalue do not vary significantly during the course of the iterations for most of the considered test cases. We have verified this for different reference and template images and, as such, for varying velocity fields. Our results suggest that we may only have to estimate the eigenvalues once, for the initial guess (a zero velocity field). The cost of applying the Hessian for a zero velocity field is small, since several expressions in (10), (11a), and (12a) drop out or are constant. Our results also suggest that the changes in the eigenvalues depend on the magnitude of the velocity field v, i.e., on the amount of expected deformation between the images. For the HEART images, we have only subtle residual differences and a small deformation, and the estimated eigenvalues are almost constant. For the HAND and the UT images, the deformations and the residual differences are larger, and the changes in the estimates for the largest eigenvalue are more pronounced. Another important observation is that the most significant changes occur during the first few outer iterations. Once we are close to the solution of our problem, the eigenvalues are almost constant.

Overall, these results suggest that we can limit the estimation of the eigenvalues to the first iteration or, if we observe a deterioration in the performance of our preconditioner, re-estimate the eigenvalues and keep them fixed again for the subsequent solves. We also observe that we have to estimate the eigenvalues only once for a given set of images; changes in the regularization parameter can simply be accounted for by rescaling these estimates (in Tab. 6, $e_{\max,0}$ grows by about a factor of ten each time βv is reduced by a factor of ten). This is in accordance with our theoretical understanding of how changes in the regularization parameter affect the spectrum of the Hessian operator.
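A minimal Python/NumPy sketch of the power iteration and of the rescaling argument follows. It assumes, for illustration only, that the preconditioned operator has the form $I + \beta_v^{-1} K$ with $K$ symmetric positive semidefinite and independent of βv; under this assumption, an estimate for one βv can be mapped to another without new matvecs. The functions `estimate_emax` and `rescale_emax` and the model matrix are hypothetical; the 30 iterations mirror the matvec count mentioned above.

```python
import numpy as np

def estimate_emax(apply_A, n, iters=30, seed=0):
    """Power iteration for the largest eigenvalue of a symmetric positive
    (semi)definite operator that is available only through matvecs."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    lam = 0.0
    for _ in range(iters):
        y = apply_A(x)
        lam = x @ y                      # Rayleigh quotient estimate
        x = y / np.linalg.norm(y)
    return lam

def rescale_emax(emax_old, beta_old, beta_new):
    """Rescale an estimate for an operator of the assumed form I + K/beta."""
    return 1.0 + (beta_old / beta_new) * (emax_old - 1.0)

# Hypothetical model operator I + K/beta with a fixed SPD matrix K.
rng = np.random.default_rng(1)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
d = np.concatenate(([10.0], np.linspace(0.0, 2.0, n - 1)))
K = (Q * d) @ Q.T                        # Q diag(d) Q^T

def make_op(beta):
    return lambda x: x + (K @ x) / beta

emax_1 = estimate_emax(make_op(1e-1), n)              # 1 + 10/0.1 = 101
emax_2_rescaled = rescale_emax(emax_1, 1e-1, 1e-2)    # 1001, no new matvecs
emax_2_direct = estimate_emax(make_op(1e-2), n)
```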

Conclusions

The estimates for the largest eigenvalue do not vary significantly during the course of the inversion for the considered test problems. We can estimate the eigenvalues efficiently during the first iteration (zero initial guess) and potentially use this estimate throughout the entire inversion.

4.2.2. Convergence: KKT Solve

Purpose

To assess the rate of convergence of the KKT solve for the different schemes to precondition the reduced space Hessian.

Setup

We consider three sets of images: the test problem SMOOTH A in Fig. 2, and the BRAIN and the HAND images in Fig. 3. We solve the forward problem to set up a synthetic test problem based on the velocity field $v^h$ of problem SMOOTH A, i.e., we transport $m_R$ to obtain a synthetic template image $m_T$. We consider a GN approximation to $H^h$. We study three schemes to precondition the KKT system: (i) the regularization preconditioner PREG, (ii) the nested preconditioner P2L with the action of its inverse computed by a PCG method, and (iii) the nested preconditioner P2L with the action of its inverse computed by a CHEB method. If we use a PCG method to invert the preconditioner, we have to use a higher accuracy than the one we use to solve the KKT system. We increase the accuracy by one order of magnitude; we refer to this solver as PCG(1E−1). For the CHEB method, we can use a fixed number of iterations; we have tested 5, 10, and 20 iterations and observed an overall good performance for 10 iterations. We refer to this strategy as CHEB(10).

We perform two experiments. In the first experiment, we use a true solution $v_\star^h = 0.5\, v^h$ and apply the Hessian operator to generate a synthetic right-hand side $b^h$. We solve the KKT system $H^h \tilde{v}^h = b^h$ with a zero initial guess for $\tilde{v}^h$ using a PCG method with a tolerance of 1E−12. We compute the (relative) ℓ2-norm of the difference between $\tilde{v}^h$ and $v_\star^h$ to assess if our schemes converge to the true solution with the same accuracy. We set up the KKT system based on the test problem SMOOTH A in Fig. 2.
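The manufactured-solution test can be sketched as follows, assuming a small dense SPD matrix as a stand-in for the GN Hessian $H^h$; the function `pcg` and the test matrix are hypothetical and not our implementation. The solver only needs a function that applies the operator, mirroring the matrix-free setting. With a relative residual tolerance of 1E−12, the relative error with respect to the true solution is bounded by the condition number times the tolerance.

```python
import numpy as np

def pcg(apply_H, b, apply_Pinv=None, tol=1e-12, maxit=1000):
    """Preconditioned conjugate gradient method with zero initial guess.
    Stops when ||r_k||_2 <= tol * ||r_0||_2."""
    if apply_Pinv is None:
        apply_Pinv = lambda r: r          # no preconditioning
    x = np.zeros_like(b)
    r = b.copy()                          # r_0 = b - H x_0 with x_0 = 0
    z = apply_Pinv(r)
    p = z.copy()
    rz = r @ z
    r0 = np.linalg.norm(r)
    for k in range(maxit):
        if np.linalg.norm(r) <= tol * r0:
            return x, k
        Hp = apply_H(p)
        alpha = rz / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        z = apply_Pinv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit

# Manufactured problem: SPD stand-in for H^h with eigenvalues in [1, 100].
rng = np.random.default_rng(0)
n = 60
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
H = (Q * np.linspace(1.0, 100.0, n)) @ Q.T
v_true = 0.5 * rng.standard_normal(n)
b = H @ v_true                            # synthetic right-hand side
v_tilde, iters = pcg(lambda u: H @ u, b)
abs_err = np.linalg.norm(v_tilde - v_true)
rel_err = abs_err / np.linalg.norm(v_true)
```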

For the second experiment, we evaluate the reduced gradient $g^h$ at the true solution and solve the system $H^h \tilde{v}^h = g^h$ with a zero initial guess for $\tilde{v}^h$. We solve the system using a PCG method with a tolerance of 1E−6. We consider a compressible diffeomorphism (H2-regularization). We use the RK2A scheme with a CFL number of 0.2 for the regularization preconditioner and the SL scheme with a CFL number of five for the two-level preconditioner. We report results for the test problems SMOOTH A, BRAIN, and HAND. We consider different spatial resolution levels (grid convergence) and different choices for the regularization parameter βv. An ideal preconditioner is mesh-independent and delivers the same rate of convergence irrespective of the choice of the regularization weight.

Results

We summarize the results for the first experiment in Tab. 7 and the results for the second experiment in Tab. 8. We illustrate the convergence for a subset of the results reported in Tab. 8 in Fig. 5.

Table 7.

Error between the true solution $v_\star^h$ and the numerical solution $\tilde{v}^h$ of the KKT system for different schemes to precondition the reduced space Hessian. We report the absolute and the relative ℓ2-error between $\tilde{v}^h$ and $v_\star^h$. We report results for different preconditioners (PREG, P2L), different choices for the PDE solver (SL(c) for different CFL numbers c, and RK2A(0.2)), and different choices for the method to compute the action of the inverse of the preconditioner (CHEB(10) and PCG(1E−1)). We solve for $\tilde{v}^h$ using a PCG method with a tolerance of 1E−12. We use the test problem SMOOTH A in Fig. 2 to set up the problem. We solve the system on a grid of size 256 × 256.

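| run | PC | PDE solver | PC solver | $\delta_2$ | $\delta_{2,\text{rel}}$ |
|---|---|---|---|---|---|
| #1 | PREG | RK2A(0.2) | | 7.49E−13 | 4.14E−14 |
| #2 | P2L | SL(0.2) | CHEB(10) | 8.17E−14 | 4.51E−15 |
| #3 | | SL(1) | PCG(1E−1) | 1.63E−12 | 9.03E−14 |
| #4 | | SL(1) | CHEB(10) | 1.23E−13 | 6.77E−15 |
| #5 | | SL(2) | PCG(1E−1) | 1.14E−12 | 6.30E−14 |
| #6 | | SL(2) | CHEB(10) | 1.89E−13 | 1.05E−14 |
| #7 | | SL(5) | PCG(1E−1) | 4.49E−12 | 2.48E−13 |
| #8 | | SL(5) | CHEB(10) | 5.22E−13 | 2.88E−14 |
| #9 | | SL(10) | PCG(1E−1) | 2.06E−12 | 1.14E−13 |
| #10 | | SL(10) | CHEB(10) | 2.06E−12 | 1.14E−13 |
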
Fig. 5.

Convergence results for different strategies to precondition the reduced space KKT system. We report exemplary trends of the relative residual $\|r_k\|_{\text{rel}} := \|r_k\|_2 / \|r_0\|_2$ with respect to the iteration number k. We report results for different images (top row: BRAIN; bottom row: HAND; grid size (256, 256)) with respect to varying regularization weights βv (left column: βv = 1E−1; middle column: βv = 1E−2; right column: βv = 1E−3). We solve the system at the true solution available for the considered synthetic test problems. We use an H2-regularization model (compressible diffeomorphism). We use a PCG method with a tolerance of 1E−6 to solve this system. We report results for the regularization preconditioner PREG (red curve) and the nested preconditioner P2L. We use two solvers to invert the preconditioner: PCG(1E−1) (blue curve) and CHEB(10) (green curve). The results correspond to those reported in Tab. 8.

Observations

The most important observation is that the nested preconditioner P2L is very effective; it allows us to significantly reduce the number of iterations, especially for small regularization parameters. On average, our new scheme is about one order of magnitude faster, with a peak speedup of more than 20x.

The results in Tab. 7 demonstrate that all our schemes converge to the true solution with an error of the order of the tolerance used to invert the KKT system, i.e., O(1E−12), or smaller. The solver does not seem to be sensitive to the CFL number used for the SL method.

The results in Fig. 5 suggest that there are dramatic differences in the performance of our preconditioners; the number of iterations reduces significantly for P2L. For instance, we can reduce the number of iterations from 310 (run #102) to 7 (run #105). The differences in time to solution, however, are less pronounced. With our spectral discretization it is in general extremely challenging to design a preconditioner that is more effective than PREG, given its negligible construction and application costs: inverting and applying PREG amounts to a diagonal scaling in the spectral domain (see §3.4.2 for details). The regularization preconditioner is effective for smooth problems and large regularization parameters βv (see the first column of Fig. 5 and, e.g., run #29 (HAND images) or run #30 (BRAIN images) in Tab. 8). This preconditioner becomes less effective as we decrease βv (see also [58, 59]). For example, the number of iterations increases from 33 to 92 to 279 if we reduce βv from 1E−1 to 1E−2, and finally to 1E−3 (run #57, run #66, and run #75 in Tab. 8, respectively). We can reduce the number of iterations by more than one order of magnitude if we use the nested preconditioner P2L. If we use a PCG method with a tolerance of 1E−7 to compute the action of the inverse of P2L, the preconditioner is almost ideal, i.e., the number of iterations is independent of βv and the grid size nx (see, e.g., run #14, run #41, run #68, and run #95 in Tab. 8). The tight tolerance (in our case one order of magnitude smaller than the tolerance we use to solve the reduced space KKT system) for computing the action of the inverse of P2L results in significant application costs. Despite this increase in application costs, we can, already with the present two-dimensional prototype implementation, reduce the time to solution for most of the test problems (see, e.g., run #67 or run #95 in Tab. 8). The SL scheme contributes significantly to this reduction.
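To illustrate why PREG is so cheap, the sketch below applies and inverts a biharmonic regularization operator $\beta_v \Delta^2$ on a 2π-periodic square grid by diagonal scaling in Fourier space, as one would for an H2-type regularization. The operator form, the handling of the zero mode (left unscaled, since $|k|^4 = 0$ there), the scalar field (a velocity would be treated componentwise), and all function names are assumptions for illustration; see §3.4.2 for the preconditioner actually used.

```python
import numpy as np

def wavenumbers(n):
    """|k|^2 for integer wavenumbers on a 2*pi-periodic n x n grid (FFT order)."""
    k = np.fft.fftfreq(n, d=1.0 / n)
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    return k1**2 + k2**2

def apply_reg(u, beta):
    """Apply beta * Delta^2 (biharmonic) to a periodic field via the FFT."""
    sym = beta * wavenumbers(u.shape[0])**2    # beta * |k|^4
    return np.real(np.fft.ifft2(sym * np.fft.fft2(u)))

def apply_preg_inv(r, beta):
    """Invert beta * Delta^2 by diagonal scaling in Fourier space.
    The zero mode (|k|^4 = 0) is left unscaled: a hypothetical choice."""
    sym = beta * wavenumbers(r.shape[0])**2
    sym[0, 0] = 1.0
    return np.real(np.fft.ifft2(np.fft.fft2(r) / sym))

n, beta = 64, 1e-2
x = 2 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
u = np.sin(X) * np.cos(2 * Y)                  # |k|^2 = 5 for every active mode
```

Both operations cost two FFTs and a pointwise scaling, which is why a more elaborate preconditioner must cut the iteration count substantially before it pays off in time to solution.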
We can further reduce the CPU time if we replace the PCG method for computing the action of the inverse of P2L by a CHEB method with a fixed number of iterations. The effectiveness of this scheme is almost independent of the grid size nx (compare, e.g., run #17, run #44, run #71, and run #98 in Tab. 8). Also, since we use a fixed number of iterations, the percentage of CPU time spent on applying the preconditioner remains almost constant for fixed βv. The nested preconditioner becomes less effective if we reduce βv from 1E−2 to 1E−3; the number of iterations increases, which in turn makes the speedup less pronounced (see, e.g., run #98 vs. run #107 or run #99 vs. run #108 in Tab. 8). Although the speedup varies from case to case, P2L in combination with CHEB(10) outperforms our original scheme in all experiments.
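The CHEB strategy can be sketched with the classical three-term Chebyshev recurrence, assuming the spectrum of the operator lies in a known interval [e_min, e_max]. The sketch uses a diagonal SPD test matrix as a hypothetical stand-in for P2L, and the function name `chebyshev` is illustrative. Because the number of iterations is fixed, the map b ↦ x is a fixed polynomial in the operator, hence a fixed linear (and symmetric) operator, which is what makes it usable as a preconditioner inside PCG; its cost per application is also fixed.

```python
import numpy as np

def chebyshev(apply_A, b, emin, emax, iters=10):
    """Chebyshev iteration with a fixed number of steps for A x = b, where A
    is SPD with spectrum contained in [emin, emax]; zero initial guess."""
    theta = 0.5 * (emax + emin)           # centre of the spectral interval
    delta = 0.5 * (emax - emin)           # half-width of the spectral interval
    sigma = theta / delta
    rho = 1.0 / sigma
    x = np.zeros_like(b)
    r = b.copy()
    d = r / theta
    for _ in range(iters):
        x = x + d
        r = r - apply_A(d)                # residual update, r = b - A x
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x

rng = np.random.default_rng(0)
lam = np.linspace(1.0, 100.0, 200)        # hypothetical spectrum in [1, 100]
apply_A = lambda u: lam * u               # diagonal SPD stand-in
b1, b2 = rng.standard_normal(200), rng.standard_normal(200)
x1 = chebyshev(apply_A, b1, 1.0, 100.0)
rel_res = np.linalg.norm(b1 - apply_A(x1)) / np.linalg.norm(b1)
```

For this interval, ten steps guarantee a residual reduction by a factor of about 0.26; an underestimated e_max can make the iteration diverge, which is why the eigenvalue estimates matter.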

Conclusions

Our nested preconditioner allows us to reduce the number of iterations by more than one order of magnitude and the time to solution by up to a factor of more than 20. We expect these differences to be more pronounced for an optimized three-dimensional implementation, something we will investigate in a follow-up paper.

4.3. Inverse Solve

Purpose

To study the rate of convergence of our scheme for the entire inverse solve.

Setup

We consider different test images (HAND, HEART, BRAIN, UT) to study the performance of our numerical scheme. We terminate our solver if the gradient is reduced by two orders of magnitude or if the norm of $g_k^h$, k = 1, 2, …, is equal to or smaller than 1E−5. We consider the regularization preconditioner with an RK2A(0.2) PDE solver and the two-level preconditioner with an SL(5) PDE solver. We estimate the eigenvalues for the CHEB method only in the first iteration (zero velocity field). We report results for compressible, near-incompressible, and incompressible diffeomorphisms, accounting for different regularization norms (H1-seminorm, H2-seminorm, and H3-seminorm). We study convergence (number of outer iterations and Hessian matvecs) as a function of the grid size, constraints, regularization parameter, and regularization norm. We choose the regularization weights empirically (based on experience from our former work [58, 59]). We report (i) the relative change of the reduced gradient, (ii) the relative change of the residual between $m_R^h$ and $m_1^h$, (iii) the number of outer iterations, (iv) the number of Hessian matvecs, (v) the time to solution, and (vi) the obtained speedup compared to our original scheme.
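The termination test can be written as a small helper. The sketch assumes that the relative test compares the ℓ∞-norm of the current reduced gradient with that of the initial gradient and that gradients are stored as NumPy arrays; `should_stop` and its defaults (1E−2 for a reduction by two orders of magnitude, 1E−5 absolute) are illustrative, not our implementation. Flattening the array first makes `numpy.linalg.norm` compute the vector ∞-norm rather than a matrix norm.

```python
import numpy as np

def should_stop(g_k, g_0, rtol=1e-2, atol=1e-5):
    """Outer (Newton) termination test: stop once the l_inf-norm of the
    reduced gradient has dropped by `rtol` relative to the initial gradient,
    or falls below `atol`."""
    gk = np.linalg.norm(np.ravel(g_k), ord=np.inf)
    g0 = np.linalg.norm(np.ravel(g_0), ord=np.inf)
    return bool(gk <= rtol * g0 or gk <= atol)

g0 = np.array([1.0, -2.0])
```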

Results

We report results for a compressible diffeomorphism (H2-regularization norm) in Tab. 9. We study grid convergence and convergence with respect to different regularization weights βv. We report results for an incompressible diffeomorphism in Tab. 10 accounting for different regularization norms (H1-seminorm; H2-seminorm; and H3-seminorm). We report results for a near-incompressible diffeomorphism in Tab. 11.

Table 9.

Convergence results for the inversion using our formulation for a compressible diffeomorphism (H2-regularization). We report results for registering different sets of images using our original preconditioner (PREG; RK2A scheme with a CFL number of 0.2) and the proposed preconditioner (P2L; SL scheme with a CFL number of 5 and 10; CHEB method with a fixed number of 10 iterations). We report results for different registration problems: HAND (grid sizes: 128 × 128; 256 × 256; and 512 × 512), HEART (grid size 192 × 192), and BRAIN (grid size: 256 × 300); see Fig. 3. We study convergence as a function of the grid size (HAND images; number of unknowns $n = 2 n_{x_1} n_{x_2}$) and as a function of the regularization parameter βv (HAND, HEART, and BRAIN images). We terminate the inversion if the ℓ∞-norm of the reduced gradient is reduced by at least two orders of magnitude or if the ℓ∞-norm of the gradient is smaller than or equal to 1E−5. We report (i) the relative change of the reduced gradient $\|g\|_{\text{rel}}$, (ii) the relative change of the residual $\|r\|_{\text{rel}}$ ($L^2$-distance between $m_R$ and $m_1$), (iii) the number of outer iterations, (iv) the number of Hessian matvecs, (v) the time to solution, and (vi) the speedup compared to our original scheme.

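| Images | n | βv | run | P | PDE solver | PC solver | $\|g\|_{\text{rel}}$ | $\|r\|_{\text{rel}}$ | iter | matvecs | time | speedup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAND | 32768 | 1.00E−1 | #1 | PREG | RK2A(0.2) | | 4.85E−3 | 2.42E−1 | 8 | 58 | 9.45E+1 | |
| | | | #2 | P2L | SL(5) | CHEB(10) | 6.82E−3 | 2.42E−1 | 8 | 21 | 1.72E+1 | 5.50 |
| | | 1.00E−2 | #3 | PREG | RK2A(0.2) | | 8.39E−3 | 1.00E−1 | 8 | 97 | 1.55E+2 | |
| | | | #4 | P2L | SL(5) | CHEB(10) | 5.42E−3 | 9.99E−2 | 9 | 30 | 2.23E+1 | 6.95 |
| | | 1.00E−3 | #5 | PREG | RK2A(0.2) | | 8.59E−3 | 6.48E−2 | 11 | 401 | 9.89E+2 | |
| | | | #6 | P2L | SL(5) | CHEB(10) | 8.61E−3 | 6.50E−2 | 11 | 67 | 6.63E+1 | 1.49E+1 |
| | 131072 | 1.00E−1 | #7 | PREG | RK2A(0.2) | | 9.04E−3 | 3.32E−1 | 12 | 113 | 7.71E+2 | |
| | | | #8 | P2L | SL(5) | CHEB(10) | 7.27E−3 | 3.32E−1 | 13 | 39 | 9.31E+1 | 8.28 |
| | | 1.00E−2 | #9 | PREG | RK2A(0.2) | | 9.67E−3 | 2.00E−1 | 11 | 159 | 1.17E+3 | |
| | | | #10 | P2L | SL(5) | CHEB(10) | 3.52E−3 | 1.99E−1 | 14 | 60 | 1.72E+2 | 6.81 |
| | | 1.00E−3 | #11 | PREG | RK2A(0.2) | | 9.59E−3 | 1.57E−1 | 17 | 758 | 1.06E+4 | |
| | | | #12 | P2L | SL(5) | CHEB(10) | 8.44E−3 | 1.57E−1 | 18 | 150 | 6.00E+2 | 1.76E+1 |
| | 524288 | 1.00E−1 | #13 | PREG | RK2A(0.2) | | 1.06E−2 | 3.40E−1 | 14 | 134 | 6.38E+3 | |
| | | | #14 | P2L | SL(5) | CHEB(10) | 9.83E−3 | 3.40E−1 | 15 | 46 | 4.81E+2 | 1.33E+1 |
| | | 1.00E−2 | #15 | PREG | RK2A(0.2) | | 1.01E−2 | 2.11E−1 | 13 | 208 | 1.22E+4 | |
| | | | #16 | P2L | SL(5) | CHEB(10) | 1.09E−2 | 2.11E−1 | 16 | 65 | 9.72E+2 | 1.25E+1 |
| | | 1.00E−3 | #17 | PREG | RK2A(0.2) | | 1.11E−2 | 1.65E−1 | 19 | 853 | 8.43E+4 | |
| | | | #18 | P2L | SL(5) | CHEB(10) | 1.13E−2 | 1.65E−1 | 23 | 171 | 3.78E+3 | 2.23E+1 |
| HEART | 73728 | 1.00E−2 | #19 | PREG | RK2A(0.2) | | 9.10E−3 | 8.01E−1 | 20 | 473 | 5.46E+2 | |
| | | | #20 | P2L | SL(5) | CHEB(10) | 9.49E−3 | 8.00E−1 | 20 | 105 | 1.63E+2 | 3.36 |
| | | 1.00E−3 | #21 | PREG | RK2A(0.2) | | 9.30E−3 | 5.09E−1 | 31 | 1659 | 3.77E+3 | |
| | | | #22 | P2L | SL(5) | CHEB(10) | 9.92E−3 | 5.09E−1 | 31 | 410 | 7.52E+2 | 5.02 |
| | | 1.00E−4 | #23 | PREG | RK2A(0.2) | | 9.81E−3 | 2.98E−1 | 81 | 14455 | 6.17E+4 | |
| | | | #24 | P2L | SL(5) | CHEB(10) | 8.86E−3 | 2.96E−1 | 76 | 2865 | 6.45E+3 | 9.57 |
| BRAIN | 153600 | 1.00E−1 | #25 | PREG | RK2A(0.2) | | 9.05E−3 | 4.82E−1 | 21 | 269 | 1.75E+3 | |
| | | | #26 | P2L | SL(5) | CHEB(10) | 9.16E−3 | 4.82E−1 | 21 | 70 | 1.69E+2 | 1.04E+1 |
| | | | #27 | P2L | SL(10) | CHEB(10) | 9.22E−3 | 4.82E−1 | 21 | 70 | 1.50E+2 | 1.17E+1 |
| | | 1.00E−2 | #28 | PREG | RK2A(0.2) | | 8.75E−3 | 3.21E−1 | 74 | 2645 | 2.60E+4 | |
| | | | #29 | P2L | SL(5) | CHEB(10) | 9.29E−3 | 3.21E−1 | 79 | 619 | 2.09E+3 | 1.25E+1 |
| | | | #30 | P2L | SL(10) | CHEB(10) | 8.99E−3 | 3.22E−1 | 80 | 624 | 1.91E+3 | 1.37E+1 |
| | | 1.00E−3 | #31 | PREG | RK2A(0.2) | | 9.00E−3 | 2.05E−1 | 110 | 10306 | 1.21E+5 | |
| | | | #32 | P2L | SL(5) | CHEB(10) | 9.54E−3 | 2.05E−1 | 74 | 1156 | 5.15E+3 | 2.35E+1 |
| | | | #33 | P2L | SL(10) | CHEB(10) | 9.21E−3 | 2.05E−1 | 76 | 1239 | 4.53E+3 | 2.67E+1 |
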
Table 10.

Convergence results for the inversion using our formulation for a fully incompressible diffeomorphism (linear Stokes regularization). We report results for registering the UT images (see Fig. 3; grid size: 256 × 256) using our original preconditioner (PREG; RK2A scheme with a CFL number of 0.2) and the proposed preconditioner (P2L; SL scheme with a CFL number of 5; CHEB method with a fixed number of 10 iterations). We consider different regularization norms: an H1-seminorm, an H2-seminorm, and an H3-seminorm (from top to bottom). We terminate the inversion if the ℓ∞-norm of the reduced gradient $g_k^h$, k = 1, 2, …, is reduced by at least two orders of magnitude or if the ℓ∞-norm of $g_k^h$ is smaller than or equal to 1E−5. We report (i) the relative change of the reduced gradient $\|g\|_{\text{rel}}$, (ii) the relative change of the residual $\|r\|_{\text{rel}}$ ($L^2$-distance between $m_R$ and $m_1$), (iii) the number of outer iterations, (iv) the number of Hessian matvecs, (v) the time to solution, and (vi) the speedup compared to our original scheme.

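| run | norm | P | PDE solver | PC solver | $\|g\|_{\text{rel}}$ | $\|r\|_{\text{rel}}$ | iter | matvecs | time | speedup |
|---|---|---|---|---|---|---|---|---|---|---|
| #1 | H1 | PREG | RK2A(0.2) | | 8.04E−3 | 1.40E−2 | 12 | 137 | 1.20E+3 | |
| #2 | | P2L | SL(5) | CHEB(10) | 6.40E−3 | 1.38E−2 | 13 | 43 | 8.88E+1 | 1.35E+1 |
| #3 | H2 | PREG | RK2A(0.2) | | 9.05E−3 | 1.90E−1 | 17 | 177 | 1.38E+3 | |
| #4 | | P2L | SL(5) | CHEB(10) | 8.81E−3 | 1.90E−1 | 16 | 53 | 1.06E+2 | 1.30E+1 |
| #5 | H3 | PREG | RK2A(0.2) | | 9.14E−3 | 6.24E−1 | 34 | 402 | 2.60E+3 | |
| #6 | | P2L | SL(5) | CHEB(10) | 9.30E−3 | 6.24E−1 | 36 | 111 | 2.22E+2 | 1.17E+1 |
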
Table 11.

Convergence results for the inversion using our formulation for a near-incompressible diffeomorphism (linear Stokes regularization). We consider the HAND images in Fig. 3. We report results for our original preconditioner (PREG; RK2A scheme with a CFL number of 0.2) and the proposed preconditioner (P2L; SL scheme with a CFL number of 5; CHEB method with a fixed number of 10 iterations). We terminate the inversion if the ℓ∞-norm of the reduced gradient $g_k^h$, k = 1, 2, …, is reduced by at least two orders of magnitude or if the ℓ∞-norm of $g_k^h$ is smaller than or equal to 1E−5. We report (i) the relative change of the reduced gradient $\|g\|_{\text{rel}}$, (ii) the relative change of the residual $\|r\|_{\text{rel}}$ ($L^2$-distance between $m_R$ and $m_1$), (iii) the number of outer iterations, (iv) the number of Hessian matvecs, (v) the time to solution, and (vi) the speedup compared to our original scheme.

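| βv | βw | run | P | PDE solver | PC solver | $\|g\|_{\text{rel}}$ | $\|r\|_{\text{rel}}$ | iter | matvecs | time | speedup |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.00E−1 | 1.00E−3 | #1 | PREG | RK2A(0.2) | | 9.20E−3 | 1.55E−1 | 10 | 119 | 1.76E+2 | |
| | | #2 | P2L | SL(5) | CHEB(10) | 5.55E−3 | 1.57E−1 | 10 | 29 | 2.40E+1 | 7.32 |
| | 1.00E−4 | #3 | PREG | RK2A(0.2) | | 9.49E−3 | 1.48E−1 | 9 | 99 | 1.63E+2 | |
| | | #4 | P2L | SL(5) | CHEB(10) | 9.02E−3 | 1.50E−1 | 9 | 25 | 1.82E+1 | 8.98 |
| 1.00E−2 | 1.00E−3 | #5 | PREG | RK2A(0.2) | | 9.14E−3 | 6.56E−2 | 14 | 731 | 1.50E+3 | |
| | | #6 | P2L | SL(5) | CHEB(10) | 9.57E−3 | 6.60E−2 | 13 | 60 | 7.05E+1 | 2.13E+1 |
| | 1.00E−4 | #7 | PREG | RK2A(0.2) | | 8.52E−3 | 5.31E−2 | 13 | 513 | 1.09E+3 | |
| | | #8 | P2L | SL(5) | CHEB(10) | 9.58E−3 | 5.37E−2 | 13 | 60 | 6.28E+1 | 1.74E+1 |
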

Observations

The most important observation is that our solver remains effective for the entire inversion irrespective of the regularization weights, norms, and grid size.

The average speedup compared to the stabilized version of our original solver is about 10x (see, e.g., run #1 through run #6 in Tab. 10), with a peak speedup of more than 20x (see, e.g., run #17 vs. run #18 and run #31 vs. run #33 in Tab. 9, or run #5 vs. run #6 in Tab. 11). For example, we can reduce the time to solution from about 3 hours to 10 minutes for a 256 × 256 image (run #11 vs. run #12 in Tab. 9). We can also infer that the reduced accuracy in time does not significantly affect the overall rate of convergence of our solver; the number of outer iterations remains almost constant.
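The averages quoted above can be checked against the wall-clock times in Tab. 9. The Python snippet below recomputes the speedup (time for PREG with RK2A(0.2) divided by the time for P2L with SL(5) and CHEB(10)) for the fifteen paired runs; since the text does not state which mean is meant, it computes both the arithmetic and the geometric mean.

```python
import math

# (PREG time, P2L SL(5)+CHEB(10) time, reported speedup) for paired runs in Tab. 9.
runs = {
    "#1/#2": (9.45e1, 1.72e1, 5.50),
    "#3/#4": (1.55e2, 2.23e1, 6.95),
    "#5/#6": (9.89e2, 6.63e1, 14.9),
    "#7/#8": (7.71e2, 9.31e1, 8.28),
    "#9/#10": (1.17e3, 1.72e2, 6.81),
    "#11/#12": (1.06e4, 6.00e2, 17.6),
    "#13/#14": (6.38e3, 4.81e2, 13.3),
    "#15/#16": (1.22e4, 9.72e2, 12.5),
    "#17/#18": (8.43e4, 3.78e3, 22.3),
    "#19/#20": (5.46e2, 1.63e2, 3.36),
    "#21/#22": (3.77e3, 7.52e2, 5.02),
    "#23/#24": (6.17e4, 6.45e3, 9.57),
    "#25/#26": (1.75e3, 1.69e2, 10.4),
    "#28/#29": (2.60e4, 2.09e3, 12.5),
    "#31/#32": (1.21e5, 5.15e3, 23.5),
}

speedups = {key: t_reg / t_2l for key, (t_reg, t_2l, _) in runs.items()}
arith_mean = sum(speedups.values()) / len(speedups)
geo_mean = math.exp(sum(math.log(s) for s in speedups.values()) / len(speedups))
```

The recomputed ratios agree with the reported speedups up to the rounding of the tabulated times; the arithmetic mean is about 11.5, the geometric mean about 10.0, and the largest ratio (about 23.5, run #31 vs. run #32) exceeds 20.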

Potential options to further improve our scheme are a re-estimation of the eigenvalues during the solution process once we have made significant progress (i.e., once the velocity field has changed drastically), and an increased accuracy for the evaluation of the gradient and the objective. Currently, we use the same accuracy for evaluating the Hessian and the gradient. We might be able to further improve the overall accuracy, and possibly the convergence, if we use a more accurate SL scheme when evaluating the reduced gradient.

Conclusions

Our experiments suggest that our improved solver remains effective irrespective of the regularization norm, regularization weight, or grid size. We achieve good performance for our compressible, incompressible, and near-incompressible formulations for constrained diffeomorphic image registration. We obtain a speedup of about 10x, with a peak speedup of more than 20x, compared to the stabilized version of our original solver.

5. Conclusions

With this paper we follow up on our former work on constrained diffeomorphic image registration [58, 59]. We have provided an improved numerical scheme to efficiently solve the registration problem. Our solver features a semi-Lagrangian formulation, which—combined with a two-level preconditioner for the reduced space KKT system—provides a one order of magnitude speedup compared to our original solver [58,59].

We originally described our Newton–Krylov solver in [58]; this includes a comparison against a first-order gradient descent scheme still predominantly used in many diffeomorphic registration algorithms that operate on velocity fields; see, e.g., [10, 46, 73].¹⁷ We have extended our original formulation [58] for constrained diffeomorphic image registration in [59]. The work in [59] features a preliminary study of registration quality as a function of regularization norms and weights. The present work focuses on numerical aspects of our solver. Our contributions are:

  • The implementation of an unconditionally stable SL scheme for constrained diffeomorphic image registration.

  • The implementation of a two-level preconditioner for the system in (16).

  • A detailed numerical study of our new, improved solver.

We perform numerical tests on various synthetic and real-world datasets to study (i) the convergence behavior of our forward solver, (ii) the adjoint errors of our schemes, (iii) the grid convergence of our preconditioner, (iv) the sensitivity of our preconditioner to changes in terms of the regularization parameters, and (v) the overall convergence of our solver with respect to different choices for the preconditioner, forward solver, regularization norms, and constraints. We found that

  • Our original solver (spectral discretization in combination with an RK2 scheme; [58, 59]) can become unstable, even for smooth problems (see, e.g., run #36 in Tab. 2).

  • Our new solver (spectral discretization in combination with an RK2A scheme and our SL scheme) remains stable for all considered test cases.

  • The SL scheme results in an order of magnitude speedup compared to the RK2A scheme due to its unconditional stability, at the price of a reduction in numerical accuracy (see, e.g., run #17 vs. run #20 in Tab. 2 or run #53 vs. run #66 in Tab. 3; we lose two digits of accuracy by increasing the CFL number from 0.2 to 5, i.e., by switching from RK2A(0.2) to SL(5)). Our numerical study suggests that this reduction in accuracy is not critical for the overall performance of our Newton–Krylov solver.

  • Our new scheme reduces the number of inner iterations (i.e., for the solution of the reduced space KKT system) by more than one order of magnitude (e.g., 8 vs. 279 iterations (see run #78 vs. run #75 in Tab. 8) or 7 vs. 310 iterations (see run #105 vs. run #102 in Tab. 8); see also Fig. 5). More importantly, we observe a speedup of 10x on average and of more than 20x at peak (see, e.g., run #93 vs. run #99 in Tab. 8 for an individual solve of the KKT system, and run #31 vs. run #33 in Tab. 9, or run #5 vs. run #6 in Tab. 11 for the entire inversion).

Our algorithm can be used in applications other than medical imaging, such as weather prediction and ocean physics (for tracking Lagrangian tracers in the oceans) [52] or the reconstruction of porous media flows [34]. Although our method is highly optimized for regular grids with periodic boundary conditions, many aspects of our algorithm carry over to other settings. In terms of runtime, our current Matlab prototype is not yet competitive with efficient, highly optimized implementations for diffeomorphic image registration; even with the speedup achieved here, it remains slower than the (highly optimized, multi-core) implementation of the algorithm presented in [72] (see footnote 18). We expect this to change with the three-dimensional implementation of our solver, which will feature highly optimized computational kernels for multi-core platforms as well as efficient grid, scale, and parameter continuation schemes to further reduce the time-to-solution. We are currently extending the study in [59] by comparing our solver, in the three-dimensional setting, to state-of-the-art implementations of other groups (e.g., [7, 72, 73]) in terms of time-to-solution, registration quality, and inversion accuracy. We will also investigate other formulations for large deformation diffeomorphic image registration, such as the map-based approach in [46] or the inversion for an initial momentum in [73].

Appendix A. Optimality Conditions

We can derive the optimality conditions of our problem by computing the first variation of L with respect to the state, adjoint, and control variables and applying integration by parts. Our derivation is formal only. In general, we have to specify the regularity of the underlying objects to ensure the existence of an optimal solution; the choices we make for the function spaces of the velocity and the images are not independent. We discuss this in more detail in Remark 3 below. If we assume that the objective functional J and the PDE constraints are continuously differentiable and that a regularity condition on the constraints holds (see, e.g., [51]), the following system holds at a solution ϕ* := (m*, λ*, p*, w*, v*) of problem (1):

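$$
\begin{aligned}
\partial_t m + \nabla m \cdot v &= 0 &&\text{in } \Omega\times(0,1], &&\text{(20a)}\\
m &= m_T &&\text{in } \Omega\times\{0\}, &&\text{(20b)}\\
-\partial_t \lambda - \nabla\cdot(v\lambda) &= 0 &&\text{in } \Omega\times[0,1), &&\text{(20c)}\\
\lambda &= m_R - m &&\text{in } \Omega\times\{1\}, &&\text{(20d)}\\
\nabla\cdot v &= w &&\text{in } \Omega, &&\text{(20e)}\\
\beta_v \mathcal{A}[v] + \nabla p + b &= 0 &&\text{in } \Omega, &&\text{(20f)}\\
\beta_w \mathcal{B}[w] + p &= 0 &&\text{in } \Omega, &&\text{(20g)}
\end{aligned}
$$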

with periodic boundary conditions on ∂Ω. We refer to (20a) with initial condition (20b), to (20c) with final condition (20d), and to (20f) and (20g) as the state (variation of L with respect to λ), adjoint (variation of L with respect to m), and control (variation of L with respect to v and w) equations, respectively. The differential operator A in (20f) corresponds to the first variation of the Hᵏ-regularization norms in (4) (see (9)). The operator B in (20g) corresponds to the first variation of the regularization operator for w.
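To illustrate how the state equation (20a)–(20b), the adjoint equation (20c)–(20d), and the body force b in (20f) fit together, the following is a minimal one-dimensional sketch in Python/NumPy. It is not our Matlab implementation and rests on several simplifying assumptions: a stationary velocity on [0, 2π) with periodic boundary conditions, an L² data term ½‖m(1) − m_R‖² without regularization, no divergence constraint (so w and p do not appear), and plain explicit RK2 (Heun) time stepping with spectral differentiation. All function names are illustrative. The sketch compares the adjoint-based directional derivative ∫ b ṽ dx, with b = ∫₀¹ λ ∂ₓm dt, against a centered finite difference of the objective.

```python
import numpy as np

def spectral_dx(f):
    """d/dx of a periodic grid function on [0, 2*pi) via FFT (Nyquist mode dropped)."""
    n = f.size
    k = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers
    if n % 2 == 0:
        k[n // 2] = 0.0
    return np.real(np.fft.ifft(1j * k * np.fft.fft(f)))

def heun(rhs, u, dt):
    """One explicit RK2 (Heun) step for du/dt = rhs(u)."""
    k1 = rhs(u)
    k2 = rhs(u + dt * k1)
    return u + 0.5 * dt * (k1 + k2)

def solve_state(v, m_T, nt):
    """1D analogue of (20a)-(20b): dm/dt + v dm/dx = 0, m(0) = m_T; all time levels."""
    ms = [m_T]
    for _ in range(nt):
        ms.append(heun(lambda m: -v * spectral_dx(m), ms[-1], 1.0 / nt))
    return ms

def solve_adjoint(v, lam_1, nt):
    """1D analogue of (20c)-(20d): -dlam/dt - d(v lam)/dx = 0, backward from t = 1."""
    lams = [lam_1]
    for _ in range(nt):
        # in reversed time s = 1 - t the equation reads dlam/ds = d(v lam)/dx
        lams.append(heun(lambda lam: spectral_dx(v * lam), lams[-1], 1.0 / nt))
    return lams[::-1]                            # reorder so that lams[j] ~ lambda(t_j)

def objective(v, m_T, m_R, nt, h):
    """Data term 0.5 * ||m(1) - m_R||^2 (no regularization)."""
    m_1 = solve_state(v, m_T, nt)[-1]
    return 0.5 * h * np.sum((m_1 - m_R) ** 2)

def body_force(v, m_T, m_R, nt):
    """b = int_0^1 lam * dm/dx dt, trapezoidal rule over the stored time levels."""
    ms = solve_state(v, m_T, nt)
    lams = solve_adjoint(v, m_R - ms[-1], nt)    # final condition as in (20d)
    f = np.array([lam * spectral_dx(m) for lam, m in zip(lams, ms)])
    return (np.sum(f, axis=0) - 0.5 * (f[0] + f[-1])) / nt

n, nt = 64, 200
h = 2.0 * np.pi / n
x = np.arange(n) * h
m_T, m_R = np.exp(np.sin(x)), np.exp(np.sin(x - 0.3))
v = 0.5 + 0.2 * np.cos(x)
v_dir = 1.0 + 0.5 * np.sin(x)                    # perturbation direction (tilde v)

adj = h * np.sum(body_force(v, m_T, m_R, nt) * v_dir)
eps = 1e-5
fd = (objective(v + eps * v_dir, m_T, m_R, nt, h)
      - objective(v - eps * v_dir, m_T, m_R, nt, h)) / (2.0 * eps)
rel_err = abs(fd - adj) / abs(fd)
print(adj, fd, rel_err)
```

Because this gradient follows an optimize-then-discretize approach, the two directional derivatives agree only up to the time discretization error, not to machine precision (compare footnote 16).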

We can completely eliminate the control variable w, the adjoint variable p, the control equation (20g), and the constraint (20e) from (20) by simple algebraic manipulations. This is straightforward for w = 0 (see [58]) and slightly more involved for nonzero w (see [59]). This elimination introduces the pseudo-differential operator K in (6); we refer to our preceding work for more details [58, 59]. We thus arrive at the optimality conditions presented in §3.1. Computing variations of the weak form of the optimality conditions in §3.1 (which include the operator K arising from the elimination of w and p) yields the PDE operators for the Newton step.

Remark 3

The derivation of the optimality conditions (20) is formal only. Existence and uniqueness proofs for an optimal solution of (20) are beyond the scope of the present paper. To ensure the well-posedness of the forward problem, the differentiability of the objective functional and the constraints, and, ultimately, the existence and uniqueness of an optimal solution of the control problem, the variables in our control formulation have to meet certain regularity requirements; that is, we have to specify appropriate function spaces for the input images mₗ, l ∈ {R, T}, and the velocity field v. Several authors have addressed these theoretical requirements in the context of related (optimal control) formulations; see, e.g., [9, 10, 19, 24, 25, 27, 28, 55, 73, 74]. These include results for formulations that model images as functions of bounded variation [25, 74] or as functions of Sobolev regularity [10, 19, 73]. These works consider H¹ [19, 24, 27, 28], H² [10, 55, 73], and H³ [25] regularization models for v, accounting for incompressible [24, 25] or near-incompressible [19, 27] velocities. It has also been suggested to establish adequate regularity by introducing a diffusion operator into the transport problem [9, 54].

In our formulation, we model images as compactly supported, smooth functions; we use appropriate mollification and Gaussian smoothing to meet these requirements. Numerically, we control the smoothness of the velocity by adjusting the weights of the regularization operator for v so that we obtain a diffeomorphic map (up to numerical accuracy). Our experimental results suggest that we converge stably to a locally optimal solution with our formulation. However, we are not aware of a theoretical proof that H¹-regularity for v and its divergence is sufficient to guarantee the existence of an optimal solution of our control problem in the theoretical limit. We also note that, in our numerical experiments, we observed instabilities if we stipulate H¹-regularity for v only, without controlling its divergence. Instead of (or in addition to) directly controlling the smoothness of v, we can also control the curl of v, which adds regularity to our solution [4, 55]. For instance, adding an H¹-regularization model for the curl of v will ensure that v is an H²-function. A rigorous proof remains open for future work. Finally, if our formulation for near-incompressible diffeomorphisms does not meet the theoretical requirements, we can change the regularization operators; all the derivations and algorithmic features presented here will still apply.

Appendix B. Stabilized RK2 Scheme

Here, we present the derivation of the stabilized RK2 scheme introduced in §3.3 for the transport equations that appear in our optimality system. We refer to [35, 53] for a general discussion of this scheme. We start by deriving the antisymmetric form of the forward problem in (7a). Adding and subtracting the term ½ m∇ · v and using the identity ∇ · (mv) = ∇m · v + m∇ · v, we obtain

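$$\partial_t m + \tfrac{1}{2}\bigl(\nabla m\cdot v + \nabla\cdot(mv) - m\nabla\cdot v\bigr) = 0 \quad \text{in } \Omega\times(0,1],$$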

with periodic boundary conditions on ∂Ω. We use this antisymmetric form as the forward operator. Computing first and second variations of this model results in

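$$
\begin{aligned}
-\partial_t \lambda - \tfrac{1}{2}\bigl(\nabla\cdot(\lambda v) + \nabla\lambda\cdot v + \lambda\nabla\cdot v\bigr) &= 0 &&\text{in } \Omega\times[0,1),\\
\partial_t \tilde m + \tfrac{1}{2}\bigl(\nabla\tilde m\cdot v - \tilde m\nabla\cdot v + \nabla\cdot(\tilde m v + m\tilde v) + \nabla m\cdot\tilde v - m\nabla\cdot\tilde v\bigr) &= 0 &&\text{in } \Omega\times(0,1],\\
-\partial_t \tilde\lambda - \tfrac{1}{2}\bigl(\nabla\tilde\lambda\cdot v + \tilde\lambda\nabla\cdot v + \nabla\cdot(\tilde\lambda v + \lambda\tilde v) + \nabla\lambda\cdot\tilde v + \lambda\nabla\cdot\tilde v\bigr) &= 0 &&\text{in } \Omega\times[0,1),
\end{aligned}
$$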

for the adjoint, incremental state, and incremental adjoint equations, respectively (note that some terms in these equations drop out in the incompressible case, i.e., for ∇ · v = 0). Similarly, we obtain the integro-differential operators

$$b = \int_0^1 \tfrac{1}{2}\bigl(\lambda\nabla m - m\nabla\lambda + \nabla(\lambda m)\bigr)\,\mathrm{d}t$$

and

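$$\tilde b = \int_0^1 \tfrac{1}{2}\bigl(\tilde\lambda\nabla m - m\nabla\tilde\lambda + \nabla(\tilde\lambda m) + \lambda\nabla\tilde m - \tilde m\nabla\lambda + \nabla(\lambda\tilde m)\bigr)\,\mathrm{d}t$$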

for the reduced gradient in (6) and the Hessian matvec in (10), respectively.
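One way to see why the antisymmetric form helps is to inspect the discrete operators. The following sketch is a one-dimensional, periodic illustration in Python/NumPy, not our implementation. It assumes a dense Fourier differentiation matrix D on [0, 2π) with the Nyquist mode removed (which makes D real and skew-symmetric), and all names are illustrative. The sketch assembles the standard form ∇m · v (here V D) and the antisymmetric form ½(V D + D V) − ½ diag(D v). The part ½(V D + D V) is skew-symmetric by construction, so it conserves the discrete L² norm of m, whereas V D is not skew-symmetric. The remaining term −½ m∇ · v vanishes for divergence-free v. For smooth, well-resolved data the two forms agree up to aliasing error.

```python
import numpy as np

def fourier_diff_matrix(n):
    """Dense spectral d/dx matrix on n equispaced points of [0, 2*pi) (periodic)."""
    k = np.fft.fftfreq(n, d=1.0 / n)          # integer wavenumbers
    if n % 2 == 0:
        k[n // 2] = 0.0                       # drop Nyquist mode: keeps D real and skew
    eye = np.eye(n)
    return np.real(np.fft.ifft(1j * k[:, None] * np.fft.fft(eye, axis=0), axis=0))

def forward_operators(v):
    """Discrete standard and antisymmetric forms of grad(m) . v in 1D."""
    D = fourier_diff_matrix(v.size)
    V = np.diag(v)
    naive = V @ D                             # grad(m) . v
    skew = 0.5 * (V @ D + D @ V)              # 0.5 * (grad(m) . v + div(m v))
    antisym = skew - 0.5 * np.diag(D @ v)     # ... - 0.5 * m div(v)
    return naive, skew, antisym

n = 64
x = 2.0 * np.pi * np.arange(n) / n
v = 1.0 + 0.5 * np.sin(x)
m = np.exp(np.cos(x))
naive, skew, antisym = forward_operators(v)

print(np.linalg.norm(skew + skew.T))            # roundoff level: skew part is skew-symmetric
print(np.linalg.norm(naive + naive.T))          # not small: the standard form is not
print(m @ skew @ m)                             # roundoff level: skew part conserves ||m||^2
print(np.linalg.norm(antisym @ m - naive @ m))  # small for smooth, well-resolved data
```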

Appendix C. Computational Complexity

We report the computational complexity in terms of the number of FFTs we have to compute in Tab. 12. A comparison of the timings for a single FFT and a single cubic spline interpolation step for different grid sizes can be found in Tab. 13. We compute the characteristic X for the forward and the adjoint problems only once per iteration. When we evaluate the objective and the gradient, we have to solve the state and the adjoint equations; this is when we compute the characteristics, and we do not recompute them during the incremental solves. We therefore assign these costs to the solution of the state and adjoint equations.
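The reuse of the characteristic described above can be sketched for a one-dimensional, periodic problem. This is an illustrative Python/NumPy sketch under several assumptions. The velocity is stationary, and the departure points X are computed once with an explicit RK2 (Heun) step. In place of the cubic spline interpolation (Matlab's interp2) used in our solver, the sketch applies cubic Lagrange interpolation on the periodic grid. All function names are hypothetical.

```python
import numpy as np

def periodic_cubic_interp(f, xq, length):
    """Cubic Lagrange interpolation of periodic, uniformly sampled values f at points xq."""
    n = f.size
    h = length / n
    s = np.mod(xq, length) / h               # fractional grid index in [0, n]
    i = np.floor(s).astype(int)
    t = s - i                                # local coordinate in [0, 1)
    # Lagrange basis weights for the four nodes i-1, i, i+1, i+2
    w0 = -t * (t - 1.0) * (t - 2.0) / 6.0
    w1 = (t + 1.0) * (t - 1.0) * (t - 2.0) / 2.0
    w2 = -(t + 1.0) * t * (t - 2.0) / 2.0
    w3 = (t + 1.0) * t * (t - 1.0) / 6.0
    return (w0 * f[(i - 1) % n] + w1 * f[i % n]
            + w2 * f[(i + 1) % n] + w3 * f[(i + 2) % n])

def characteristic_foot(v, x, dt, length):
    """Departure points X of the backward characteristics for a stationary v (RK2/Heun)."""
    x_star = x - dt * v                                       # Euler predictor
    return x - 0.5 * dt * (v + periodic_cubic_interp(v, x_star, length))

def sl_transport(m0, v, nt, length=2.0 * np.pi):
    """Solve dm/dt + v dm/dx = 0 on t in [0, 1] with nt semi-Lagrangian steps."""
    n = m0.size
    x = np.arange(n) * length / n
    X = characteristic_foot(v, x, 1.0 / nt, length)          # computed once, reused every step
    m = m0.copy()
    for _ in range(nt):
        m = periodic_cubic_interp(m, X, length)               # m(x, t + dt) = m(X, t)
    return m

n = 128
x = np.arange(n) * 2.0 * np.pi / n
m_T = np.exp(np.sin(x))
exact = np.exp(np.sin(x - 1.0))              # exact solution at t = 1 for v = 1
err_4 = np.max(np.abs(sl_transport(m_T, np.ones(n), nt=4) - exact))   # CFL about 5.1
err_1 = np.max(np.abs(sl_transport(m_T, np.ones(n), nt=1) - exact))   # CFL about 20
print(err_4, err_1)
```

With a constant velocity the departure points are exact, so the remaining error is purely interpolation error; the scheme stays stable even at CFL numbers well above one. For a non-stationary v the characteristic would have to be recomputed, with interpolation in both time and space.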

Footnotes

1

By reduced space we mean that we only iterate on the reduced space of the velocity v; we assume that the state and adjoint equations are fulfilled exactly. This is different from all-at-once or full-space approaches, in which one iterates on all unknown variables simultaneously (see §3.1 and §A).

2

We refer to [16,17] for more details on reduced-space methods.

3

Note that the scheme becomes more complicated if v is non-stationary, since we then have to interpolate in both time and space.

4

For rapidly varying velocity fields, instabilities may still occur.

5

The direction of time integration depends on the transport equation. For simplicity, we will limit the description of the semi-Lagrangian method to transport equations that are solved forward in time. Notice that (13) also contains equations that have to be solved backward in time.

6

In total, we need to compute two characteristics: one for the forward (state or primal) equations, defined forward in time, and one for the backward (adjoint or dual) equations, defined backward in time.

7

We have tested a more accurate implementation that applies the interpolation step on a grid with half the cell size hₓ to reduce the interpolation error as well as the numerical diffusion. We prolong and restrict the data in the Fourier domain. The gain in numerical accuracy (one to two digits) did not justify the significant increase in CPU time.

8

The Hessian matvec is given by (10); the expression for the reduced gradient is given in (6).

9

In general, we associate the cost of solving (7a) with the evaluation of the objective in (2a), i.e., with the line search.

10

This corresponds to dropping all terms with λ in (10) and (12a) (see [58,59]).

11

We study the spectral properties of Hₕ for the compressible and incompressible cases in [58].

12

We have also tested an approximation of the forcing term obtained by dropping all second-order terms of the RK2 scheme for numerically integrating (11a) and (12a). Since we observed instabilities in the RK2 schemes, and given the effectiveness of the semi-Lagrangian method (see §4.1), we do not report results for this preconditioner.

13

The HAND images in Fig. 3 are taken from [62]. The BRAIN images in Fig. 3 are taken from the “Nonrigid Image Registration Evaluation Project” (NIREP) available at http://nirep.org (data sets na01 and na02) [26].

14

We have also tested an implementation of the SL method that delivers more accurate solutions (less numerical diffusion) at the expense of a significant increase in time to solution. In this scheme, we upsampled the data to a grid of size 2nₓ whenever we had to interpolate. The associated gain in accuracy did not justify the increase in computational cost.

15

Matlab’s FFT library is based on the highly optimized FFTW library (see http://www.fftw.org; [36]). For interpolation, we use Matlab’s built-in interp2 routine (Matlab R2013a; we report timings in Tab. 13 in §C). We expect an additional speedup from an optimized, three-dimensional C++ implementation, something we will investigate in future work.

16

Our solver is based on an optimize-then-discretize approach (see §3). We cannot guarantee that the properties of the continuous operators of our constrained formulation and its variations are preserved after discretization. In a discretize-then-optimize approach, the discretization is differentiated, which results in consistent operators; we refer to [39, pages 57ff.] for a more detailed discussion of the pros and cons; we also discuss this in §3.

17

The control equation in (6) corresponds to the reduced L² gradient, i.e., the variation of the Lagrangian L in (5) with respect to v; in our gradient descent scheme in [58], we use the gradient in the Sobolev space induced by the regularization operator; see also [10, 46].

18

We provide a more detailed study in [59]. There, we show that our new formulation can outperform existing approaches for diffeomorphic image registration in terms of registration quality.

References

  • 1. Adavani SS, Biros G. Fast algorithms for source identification problems with elliptic PDE constraints. SIAM Journal on Imaging Sciences. 2008;3:791–808.
  • 2. Adavani SS, Biros G. Multigrid algorithms for inverse problems with linear parabolic PDE constraints. SIAM Journal on Scientific Computing. 2008;31:369–397.
  • 3. Amit Y. A nonlinear variational problem for image matching. SIAM Journal on Scientific Computing. 1994;15:207–224.
  • 4. Amrouche C, Bernardi C, Dauge M, Girault V. Vector potentials in three-dimensional nonsmooth domains. Mathematical Methods in the Applied Sciences. 1998;21:823–864.
  • 5. Ashburner J. A fast diffeomorphic image registration algorithm. NeuroImage. 2007;38:95–113. doi: 10.1016/j.neuroimage.2007.07.007.
  • 6. Ashburner J, Friston KJ. Diffeomorphic registration using geodesic shooting and Gauss-Newton optimisation. NeuroImage. 2011;55:954–967. doi: 10.1016/j.neuroimage.2010.12.049.
  • 7. Avants BB, Epstein CL, Grossman M, Gee JC. Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis. 2008;12:26–41. doi: 10.1016/j.media.2007.06.004.
  • 8. Axelsson O. Iterative solution methods. Cambridge University Press; 1996.
  • 9. Barbu V, Marinoschi G. An optimal control approach to the optical flow problem. Systems & Control Letters. 2016;87:1–9.
  • 10. Beg MF, Miller MI, Trouvé A, Younes L. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. International Journal of Computer Vision. 2005;61:139–157.
  • 11. Benzi M. Preconditioning techniques for large linear systems: A survey. Journal of Computational Physics. 2002;182:418–477.
  • 12. Benzi M, Golub GH, Liesen J. Numerical solution of saddle point problems. Acta Numerica. 2005;14:1–137.
  • 13. Benzi M, Haber E, Taralli L. A preconditioning technique for a class of PDE-constrained optimization problems. Advances in Computational Mathematics. 2011;35:149–173.
  • 14. Biegler LT, Ghattas O, Heinkenschloss M, van Bloemen Waanders B. Large-scale PDE-constrained optimization. Springer; 2003.
  • 15. Biros G, Doğan G. A multilevel algorithm for inverse problems with elliptic PDE constraints. Inverse Problems. 2008;24.
  • 16. Biros G, Ghattas O. Parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization—Part I: The Krylov-Schur solver. SIAM Journal on Scientific Computing. 2005;27:687–713.
  • 17. Biros G, Ghattas O. Parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization—Part II: The Lagrange-Newton solver and its application to optimal control of steady viscous flows. SIAM Journal on Scientific Computing. 2005;27:714–739.
  • 18. Borzì A, Ito K, Kunisch K. An optimal control approach to optical flow computation. International Journal for Numerical Methods in Fluids. 2002;40:231–240.
  • 19. Borzì A, Ito K, Kunisch K. Optimal control formulation for determining optical flow. SIAM Journal on Scientific Computing. 2002;24:818–847.
  • 20. Borzì A, Schulz V. Multigrid methods for PDE optimization. SIAM Review. 2009;51:361–395.
  • 21. Borzì A, Schulz V. Computational optimization of systems governed by partial differential equations. SIAM; Philadelphia, Pennsylvania, US: 2012.
  • 22. Burger M, Modersitzki J, Ruthotto L. A hyperelastic regularization energy for image registration. SIAM Journal on Scientific Computing. 2013;35:B132–B148.
  • 23. Cao Y, Miller MI, Winslow RL, Younes L. Large deformation diffeomorphic metric mapping of vector fields. IEEE Transactions on Medical Imaging. 2005;24:1216–1230. doi: 10.1109/tmi.2005.853923.
  • 24. Chen K. Optimal control based image sequence interpolation. PhD thesis. University of Bremen; 2011.
  • 25. Chen K, Lorenz DA. Image sequence interpolation using optimal control. Journal of Mathematical Imaging and Vision. 2011;41:222–238.
  • 26. Christensen GE, Geng X, Kuhl JG, Bruss J, Grabowski TJ, Pirwani IA, Vannier MW, Allen JS, Damasio H. Introduction to the non-rigid image registration evaluation project. Proc Biomedical Image Registration. 2006;LNCS 4057:128–135.
  • 27. Crippa G. The flow associated to weakly differentiable vector fields. PhD thesis. University of Zürich; 2007.
  • 28. DiPerna RJ, Lions PL. Ordinary differential equations, transport theory and Sobolev spaces. Inventiones Mathematicae. 1989;98:511–547.
  • 29. Dontchev AL, Hager WW, Veliov VM. Second-order Runge–Kutta approximations in control constrained optimal control. SIAM Journal on Numerical Analysis. 2000;38:202–226.
  • 30. Droske M, Rumpf M. A variational approach to non-rigid morphological registration. SIAM Journal on Applied Mathematics. 2003;64:668–687.
  • 31. Dupuis P, Grenander U, Miller MI. Variational problems on flows of diffeomorphisms for image matching. Quarterly of Applied Mathematics. 1998;56:587–600.
  • 32. Ewing RE, Wang H. A summary of numerical methods for time-dependent advection-dominated partial differential equations. Journal of Computational and Applied Mathematics. 2001;128:423–445.
  • 33. Fischer B, Modersitzki J. Ill-posed medicine – an introduction to image registration. Inverse Problems. 2008;24:1–16.
  • 34. Fohring J, Haber E, Ruthotto L. Geophysical imaging for fluid flow in porous media. SIAM Journal on Scientific Computing. 2014;36:S218–S236.
  • 35. Fornberg B. On a Fourier method for the integration of hyperbolic equations. SIAM Journal on Numerical Analysis. 1975;12:509–527.
  • 36. Frigo M, Johnson SG. The design and implementation of FFTW3. Proceedings of the IEEE. 2005;93:216–231.
  • 37. Gholami A, Mang A, Biros G. An inverse problem formulation for parameter estimation of a reaction-diffusion model of low grade gliomas. Journal of Mathematical Biology. 2016;72:409–433. doi: 10.1007/s00285-015-0888-x.
  • 38. Gill PE, Murray W, Wright MH. Practical optimization. Academic Press; Waltham, Massachusetts, US: 1981.
  • 39. Gunzburger MD. Perspectives in flow control and optimization. SIAM; Philadelphia, Pennsylvania, US: 2003.
  • 40. Gurtin ME. An introduction to continuum mechanics. Academic Press; 1981. (vol 158 of Mathematics in Science and Engineering).
  • 41. Haber E, Ascher UM. Preconditioned all-at-once methods for large, sparse parameter estimation problems. Inverse Problems. 2001;17:1847–1864.
  • 42. Haber E, Modersitzki J. Numerical methods for volume preserving image registration. Inverse Problems. 2004;20:1621–1638.
  • 43. Haber E, Modersitzki J. Image registration with guaranteed displacement regularity. International Journal of Computer Vision. 2007;71:361–372.
  • 44. Hager WW. Runge-Kutta methods in optimal control and the transformed adjoint system. Numerische Mathematik. 2000;87:247–282.
  • 45. Hajnal JV, Hill DLG, Hawkes DJ. Medical Image Registration. CRC Press; Boca Raton, Florida, US: 2001.
  • 46. Hart GL, Zach C, Niethammer M. An optimal control approach for deformable registration. Proc IEEE Conference on Computer Vision and Pattern Recognition. 2009:9–16. doi: 10.1109/CVPR.2009.5206565.
  • 47. Hernandez M. Gauss-Newton inspired preconditioned optimization in large deformation diffeomorphic metric mapping. Physics in Medicine and Biology. 2014;59:6085–6115. doi: 10.1088/0031-9155/59/20/6085.
  • 48. Hernandez M, Bossa MN, Olmos S. Registration of anatomical images using paths of diffeomorphisms parameterized with stationary vector field flows. International Journal of Computer Vision. 2009;85:291–306.
  • 49. Herzog R, Kunisch K. Algorithms for PDE-constrained optimization. GAMM Mitteilungen. 2010;33:163–176.
  • 50. Hestenes MR, Stiefel E. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards. 1952;49:409–436.
  • 51. Hinze M, Pinnau R, Ulbrich M, Ulbrich S. Optimization with PDE constraints. Springer; Berlin, DE: 2009.
  • 52. Kalnay E. Atmospheric modeling, data assimilation and predictability. Oxford University Press; 2002.
  • 53. Kreiss HO, Oliger J. Comparison of accurate methods for the integration of hyperbolic equations. Tellus. 1972;24:199–215.
  • 54. Kunisch K, Lu X. Optimal control for multi-phase fluid Stokes problems. Nonlinear Analysis. 2011;74:585–599.
  • 55. Lee E, Gunzburger M. An optimal control formulation of an image registration problem. Journal of Mathematical Imaging and Vision. 2010;36:69–80.
  • 56. Lions JL. Optimal control of systems governed by partial differential equations. Springer; 1971.
  • 57. Lorenzi M, Pennec X. Geodesics, parallel transport and one-parameter subgroups for diffeomorphic image registration. International Journal of Computer Vision. 2013;105:111–127.
  • 58. Mang A, Biros G. An inexact Newton–Krylov algorithm for constrained diffeomorphic image registration. SIAM Journal on Imaging Sciences. 2015;8:1030–1069. doi: 10.1137/140984002.
  • 59. Mang A, Biros G. Constrained H¹-regularization schemes for diffeomorphic image registration. SIAM Journal on Imaging Sciences. 2016;9:1154–1194. doi: 10.1137/15M1010919.
  • 60. Mang A, Toma A, Schuetz TA, Becker S, Eckey T, Mohr C, Petersen D, Buzug TM. Biophysical modeling of brain tumor progression: from unconditionally stable explicit time integration to an inverse problem with parabolic PDE constraints for model calibration. Medical Physics. 2012;39:4444–4459. doi: 10.1118/1.4722749.
  • 61. Modersitzki J. Numerical methods for image registration. Oxford University Press; New York: 2004.
  • 62. Modersitzki J. FAIR: Flexible algorithms for image registration. SIAM; Philadelphia, Pennsylvania, US: 2009.
  • 63. Nocedal J, Wright SJ. Numerical Optimization. Springer; New York, New York, US: 2006.
  • 64. Ruhnau P, Schnörr C. Optical Stokes flow estimation: An imaging-based control approach. Experiments in Fluids. 2007;42:61–78.
  • 65. Sdika M. A fast nonrigid image registration with constraints on the Jacobian using large scale constrained optimization. IEEE Transactions on Medical Imaging. 2008;27:271–281. doi: 10.1109/TMI.2007.905820.
  • 66. Simoncini V. Reduced order solution of structured linear systems arising in certain PDE-constrained optimization problems. Computational Optimization and Applications. 2012;53:591–617.
  • 67. Sotiras A, Davatzikos C, Paragios N. Deformable medical image registration: A survey. IEEE Transactions on Medical Imaging. 2013;32:1153–1190. doi: 10.1109/TMI.2013.2265603.
  • 68. Staniforth A, Côté J. Semi-Lagrangian integration schemes for atmospheric models—A review. Monthly Weather Review. 1991;119:2206–2223.
  • 69. Stoll M, Breiten T. A low-rank in time approach to PDE-constrained optimization. SIAM Journal on Scientific Computing. 2015;37:B1–B29.
  • 70. Trouvé A. Diffeomorphism groups and pattern matching in image analysis. International Journal of Computer Vision. 1998;28:213–221.
  • 71. Vercauteren T, Pennec X, Perchant A, Ayache N. Symmetric log-domain diffeomorphic registration: A demons-based approach. Proc Medical Image Computing and Computer-Assisted Intervention. 2008;LNCS 5241:754–761. doi: 10.1007/978-3-540-85988-8_90.
  • 72. Vercauteren T, Pennec X, Perchant A, Ayache N. Diffeomorphic demons: Efficient non-parametric image registration. NeuroImage. 2009;45:S61–S72. doi: 10.1016/j.neuroimage.2008.10.040.
  • 73. Vialard FX, Risser L, Rueckert D, Cotter CJ. Diffeomorphic 3D image registration via geodesic shooting using an efficient adjoint calculation. International Journal of Computer Vision. 2012;97:229–241.
  • 74. Vialard FX, Santambrogio F. Extension to BV functions of the large deformation diffeomorphism matching approach. Comptes Rendus de l’Académie des Sciences. 2009;347:27–32.
