Abstract
Nonlinear registration is critical to many aspects of Neuroimaging research. It facilitates averaging and comparisons across multiple subjects, as well as reporting of data in a common anatomical frame of reference. It is, however, a fundamentally ill-posed problem, with many possible solutions which minimise a given dissimilarity metric equally well. We present a regularisation method capable of selectively driving solutions towards those which would be considered anatomically plausible by penalising unlikely lineal, areal and volumetric deformations. This penalty is symmetric in the sense that geometric expansions and contractions are penalised equally, which encourages inverse-consistency. We demonstrate that this method is able to significantly reduce local volume changes and shape distortions compared to state-of-the-art elastic (FNIRT) and plastic (ANTs) registration frameworks. Crucially, this is achieved whilst simultaneously matching or exceeding the registration quality of these methods, as measured by overlap scores of labelled cortical regions. Extensive leveraging of GPU parallelisation has allowed us to solve this highly computationally intensive optimisation problem while maintaining reasonable run times of under half an hour.
Keywords: Nonlinear registration, Elastic deformation, Regularisation, GPU Parallelisation, Anatomical plausibility, Diffeomorphism
1. Introduction
Nonlinear registration is commonly used in neuroimaging to deform images of individual brains into some common space (normalising both size and shape). Often the registration is based on structural, e.g., T1-weighted, images. The resulting warp is subsequently applied to both structural images as well as images depicting some function of interest such as BOLD or diffusion-derived connectivity. The rationale behind this is typically to facilitate statistical analysis of those functional data across subjects and populations.
An important distinction in this context is between volume- and surface-based methods. The former attempt to find the inter-subject mappings in the original (3D) space. The latter, in contrast, are typically only concerned with the cortex and attempt to inflate the folded 2D-manifold (the cortex) onto a sphere before finding the (2D) inter-subject mapping on that sphere (e.g., Fischl (2012); Robinson et al. (2014)). In this paper we present a novel volume based method.
Existing volume based methods can broadly be categorised based on:
Construction of warps: As part of the iterative process a warp can be updated by adding an update, or by warping the previous warps by an update. The former is often referred to as an elastic or a small deformation framework, as opposed to a plastic or large deformation framework for the latter (Miller et al., 2003). Note that for infinitesimally small displacements, composition of warps can be extended to the concept of integrating a velocity field (Beg et al., 2005).
Regularisation of warps: Warps can be regularised, i.e., have smoothness imposed on them, by adding a penalty term to the costfunction, or by explicitly smoothing the updates.
Similarity measure: This is a scalar measure that assesses the similarity/difference between the images and whose maximisation/mini-misation drives the registration.
The present paper is concerned mainly with the second of these, the regularisation, though we will touch upon the construction of warps and the effect of similarity metric choice as well. A thorough review of these categorisations can be found in Sotiras et al. (2013).
Historically, for the small deformation framework, one important aspect of the regularisation was to try to ensure that the warps stayed one-to-one and onto i.e., that each point in image A mapped to a unique point in image B, and that each point in image B could be reached from image A. This also means that the mapping is invertible. This was often attempted by adding a penalty term to the cost-function such that one simultaneously minimised the difference between the images (dissimilarity) and some function of the warps (regularisation). That function would often be borrowed from mechanics, such as bending energy (Bookstein, 1997), membrane energy (Amit et al., 1991) or linear-elastic energy (Miller et al., 1993) (whose formulations are summarised in Ashburner and Friston (1999)). However, none of those functions had any specifically biological relevance to the problem of mapping individual brains to each other. Moreover, they do not explicitly guarantee invertibility. So it was common practice to either ignore the issue of invertibility, or to empirically calibrate the relative weights of the image difference and the warp penalty so that the resulting warps were (almost) always invertible. But that typically meant a quite high weight for the regularisation term which resulted in an inability to properly model large deformations.
This limitation led to the development of the large deformation framework, where warps are updated by warping (resampling) the previous warp by an update warp (equivalent to a composition of multiple small warps). If both the previous and the update warps are invertible, this guarantees that the composed warp is also invertible. Therefore, as long as one ensures that each update warp is invertible the end result will be too. Furthermore, certain methods (such as LDDMM (Beg et al., 2005) and Geodesic Shooting (Ashburner and Friston, 2011)) seek solutions where the overall set of composed warps (or integrated velocity field) are as “smooth” (according to some criterion) as possible, rather than greedily smoothing at each step. The result is then that these methods are able to model large deformations while remaining invertible by taking many small update steps, leading to better matching than could be achieved within the small deformation framework.
However, invertibility is not the be-all and end-all for regularising warps. It does restrict the space of allowed warps, but some form of regularisation is still needed for choosing the optimal warp within that space. Moreover, it is no more clear in the large than in the small deformation framework how exactly that regularisation of the warps/ velocity fields should be performed. Even when maintaining invertibility one can, and frequently does, still obtain warps that correspond to implausibly large local changes in volume and/or shape (Ashburner and Ridgway, 2013; Mang and Biros, 2016; Coalson et al., 2018). For a more detailed discussion of some of the tools which address invertibility, see Section 4.2.
In this paper we suggest using a small deformation framework together with a penalty function that explicitly penalises changes in both volume and shape. Furthermore, this penalty approaches infinity as relative volumes approach zero (which would break invertibility), meaning that it can be given a small weight in order to allow large deformations and still guarantee diffeomorphic warps. Hence, it addresses the same issue of invertibility as LDDMM based methods as well as the issue of how to find a particular set of warps within that space. It should be stressed at this point that the penalty we propose is not the same, or even particularly similar to, previously suggested penalties based on some function of the Jacobian determinant (see for example Rohlfing et al. (2003) or Leow et al. (2007)). We should also be clear that even though we refer to this as an elastic deformation, the prior is not based on any linear elastic model with the ensuing small deformation assumption. We will refer to our penalty as the Symmetric Prior for the Regularisation of Elastic Deformations (SPRED). The mathematical formulation and theoretical basis of the penalty are explained in Section 2.3. It has been used in the past (Ashburner et al., 1999, 2000), but the computational cost is such that at the time it was not practically feasible. With the advent of general-purpose computing on graphics processing units (GPGPUs) that is no longer the case, and we describe our implementation using NVIDIA’s CUDA framework (NVIDIA, 2019) of a registration algorithm based on that penalty function in Section 2.4.
The rationale behind the suggested penalty function is twofold.
To minimise geometric distortions, i.e. changes in both shape and volume, and to ensure the warps are one-to-one and onto.
To be symmetric in the sense that a geometric (lineal, areal or volumetric) change by a factor of 2 is penalised the same as one by a factor of 0.5.
The first desideratum comes from the notion that brain tissue implements function, be that function integration of signal in the grey matter or transmission of signal in the white matter. The implementation of that function will necessarily occupy some physical space and it would seem unlikely that a given function can be implemented in a very small fraction of space in one brain compared to in another. A penalty should therefore favour no volume changes at all. It should also rapidly increase as the relative volumes become unrealistic, approaching infinity as the volume changes get close to breaking the one-to-one and onto criteria, thereby never allowing negative relative volumes. But it should also prevent unrealistic shape changes. Given what we know about the structure of both grey and white matter it is hard to picture a situation where a function implemented in a 1 × 1 × 1 mm3 cube in one brain might occupy a 0.01 × 10 × 10 mm3 sheet or a 0.1 × 0.1 × 100 mm3 stick in another. Note that this can all be reduced to penalising lineal changes since that would automatically also penalise shape and volume changes.
The second property, symmetry, comes from the desire that a registration algorithm should be inverse consistent, i.e., that the transform that is obtained from registering image A to image B is the inverse of that obtained from registering B to A (Christensen, 1999). A necessary, but not sufficient, condition for this is a prior that is symmetric in terms of penalising an expansion from A to B equally to the corresponding contraction from B to A.
2. Methods
2.1. Registration framework
We now broadly describe the registration framework within which our SPRED penalty is implemented in order that specific details of the implementation are more readily understandable.
SPRED forms part of the development of our new MultiModal Registration Framework (MMORF) tool. MMORF is a volumetric registration tool whose underlying transformation model is a 3D cubic B-spline parametrised free-form deformation. The parameter space over which the optimisation is performed therefore represents the spline co-efficients of three displacement fields (x, y and z-warps). A mean-squares (MSQ) similarity (or rather dissimilarity) metric is the data-consistency term optimised during registration. Regularisation penalties are calculated in the reference image space. A multi-smoothing, multi-resolution approach is employed to encourage convergence towards a globally optimal solution. From the outset MMORF has been designed to leverage GPU parallelisation, which is a largely why SPRED is computationally tractable within this framework.
The preferred optimisation strategy is the Levenberg variant of Gauss–Newton (Levenberg, 1944; Press et al., 2007). In this scheme, the Gauss-Newton Hessian H GN is replaced with H L, where:
| (1) |
λLis used to ensure that the update step always leads to a decrease in the cost function. If an update would successfully reduce the cost function λLis decreased, otherwise it is increased and the update step re-evaluated. Additionally, the same strategy is used to ensure no update step is ever taken which would lead to non-diffeomorphic warps. This is achieved by checking that the resulting Jacobian determinants are all positive before accepting any update step, and modifying λL accordingly. Due to memory constraints, the Hessian can only be stored on the GPU until a warp resolution of 5 mm isotropic. Beyond this resolution our implementation offers a choice between two optimisation methods that do not require explicit representation of the Hessian. One is the Scaled Conjugate Gradient (Moller, 1993) method, which is related to other quasi-Newton methods and which uses the history of cost-function changes from earlier iterations to calculate a step-length along the next direction, thereby eliminating the need for line minimisations. The other is the Majorise-Minimisation (MM) method (Hunter and Lange, 2004) which replaces the Hessian with a diagonal matrix where the ith value on the diagonal is the sum of absolute values of the elements in the ith column of the Gauss-Newton Hessian.
2.2. Intuition
In the following section we will detail how to calculate the suggested penalty, and why it is an ideal, albeit challenging, candidate for Single Instruction Multiple Data (SIMD) style parallelisation on a GPU (NVIDIA, 2019). For the estimation of the warp parameters we use the Gauss-Newton method, so we will also need to calculate the gradient of the penalty with respect to the spline coefficients of the warps and a Gauss-Newton approximation to the Hessian.
We attempt now to give an intuitive outline of the technical/mathe-matical formalism which follows in Section 2.3. At each voxel one can calculate a Jacobian matrix using the values of the surrounding warp B-spline coefficients. The singular values of this matrix represent the orthogonal scalings of that voxel due to the warp. The SPRED penalty (described in Section 2.3.2) is a function of these singular values, and we can therefore calculate each voxel’s contribution to the total penalty. Those calculations are a perfect match for SIMD parallelisation with one computational thread per voxel, where each thread performs identical calculations but on different data. The individual voxel contributions are subsequently summed (a reduction in GPU parlance) to obtain the total penalty for the warp.
In order to calculate the gradient of the penalty, one needs to apply the chain rule, which says that:
| (2) |
Remembering that for example g might be a vector-valued function of a vector-valued argument x which would make a vector and a matrix.
In our case the contribution to an element of the gradient from a given voxel is a scalar valued function of the vector of singular values of the local Jacobian matrix. These in turn are each a function of the vector of elements of the Jacobian matrix. Finally, these elements are functions of the vector of spline-coefficients whose support includes that voxel. The first two factors are again calculated using one thread per voxel, while the third factor uses one thread per spline-coefficient. Similarly each element of the total gradient is calculated by one thread per spline-coefficient.
To understand how the Hessian matrix is calculated consider a matrix A where each row corresponds to a voxel and each column to a warp spline-coefficient, and where A ij is the rate of change of the penalty contribution from voxel i with respect to coefficient j. The Gauss-Newton approximation to the Hessian then becomes A T A. However, for practical reasons, A is never explicitly calculated or stored and instead the Hessian H is calculated directly. In this instance we use one thread per non-zero element of H for the final calculation, but there are a number of preceding steps which employ the same one thread per voxel approach as in the gradient calculations.
2.3. Theory
We now formalise the concepts introduced in Section 2.2 in terms of the mathematics required for incorporating the SPRED penalty into the optimisation strategy of our registration framework.
2.3.1. Jacobian matrix
Consider the generic problem of registering some 3D moving image g to a reference image f, where both are defined on . Let x, y and z be the 3 orthogonal directions in . We define a transformation on the domain of f with parameters , such that g() becomes our transformed moving image. At each point (x,y,z):
| (3) |
| (4) |
| (5) |
Where is a displacement field. If defines a continuous, differentiable function on , then we may calculate a local Jacobian matrix J at any point in f, with:
| (6) |
| (7) |
Where is the partial derivative of the i component of in the j direction.
JThen describes how the image is locally scaled and rotated by the action of . These actions can be decoupled by applying singular value decomposition (SVD) to J, such that:
| (8) |
U and V are unitary matrices describing the rotational effect of J, and will not be considered further as they do not geometrically distort the image in any way. S is a diagonal matrix containing the singular values and represents the orthogonal scaling effect of J. It is the individual elements si of S which will be penalised in our regularisation, and we will refer to them as the Jacobian Singular Values (JSVs).
2.3.2. Penalty function
The penalty is based on the prior belief that the JSVs are drawn from a lognormal distribution. Such a penalty meets the biological plausibility arguments set out in Section 1 as:
Si = 1 is most likely
Si = 0 and Si = ∞ are infinitely unlikely
Si = a and Si = are equally likely
Now, we define C() as the penalty (or cost) associated with :
| (9) |
| (10) |
Where
C = total cost/penalty
= transformation parameters
N = total voxels in image
cn = cost/penalty at voxel n
v = voxel volume
J n = J n() = Jacobian matrix at voxel n
si = si() = i th singular value of J n
From Equation (10) we recognise that the log2 si term represents our lognormal prior on the JSVs. However this prior is based on the distribution of singular values only in f. Therefore we have also included a term v(1+|J n|) to ensure that our penalty is truly symmetric in g by accounting for the total volume in both images being penalised.
In practice, computation of the logarithm in Equation (9) (and its associated effect on the gradient and Hessian) may add a significant computational overhead (Ashburner et al., 2000). We therefore apply the following approximation (derived in Appendix A):
| (11) |
Recognising that:
| (12) |
and that:
| (13) |
where λi(A T A) denotes the ith eigenvalue of A T A and where si(A) denotes the ith singular value of A, we may then write:
| (14) |
| (15) |
An important benefit of this approximation is that our cost function is now an explicit function of only the elements of J, and therefore we never need to calculate the actual SVD of J.
2.3.3. Transformation model
The previous sections have used a generic transformation , but at this point it becomes necessary to define the actual transformation model used. As in Equation (5), we utilise a displacement field where each of the three directional components (dx, dy and dz) in is composed of a basis set of uniformly spaced, 3D, cubic B-splines (B x, B y and B z). Our transformation parameters are then defined to be the coefficients of each of the B-splines in our basis set. If M B-splines are required to fully cover the N voxels in our reference image f, then we require 3M parameters to fully describe (i.e. M parameters for each displacement direction).
2.3.4. Gradient and Hessian
The penalty function as defined in Section 2.3.2 will be included into a Gauss-Newton style optimisation framework, and therefore we require the gradient and the Gauss-Newton approximation to the Hessian of Equation (14). We note that for a valid application of Gauss-Newton (i.e., for being able to use the Gauss-Newton approximation to the Hessian) the function being minimised must be of the form (x)2, which Equation (15) is not (Chen, 2011). However, by making the substitution:
| (16) |
it becomes of that form. This means that:
| (17) |
and therefore that the Gauss-Newton Hessian approximation becomes:
| (18) |
i.e., it introduces a factor compared to how one would “normally”calculate the Gauss-Newton Hessian.
The gradient is a 3M × 1 column vector (where M is the number of B-splines used to represent one directional component of the warp-field), whose mth element is of the form:
| (19) |
Where :
Vm = {voxels in f where B — spline m has support}
where n ∈ Vm means that the summation is over all voxels n for which the B-spline m has support, and where is a vectorised version of the Jacobian at the nth voxel and is a vector with the three singular values of that Jacobian.
Similarly the Gauss-Newton Hessian is a 3M x 3M matrix whose jk th element is:
| (20) |
Where :
Vj = {voxels in f where B — spline j has support}
Vk = {voxels in f where B — spline k has support}
where n ∈ {Vj ⋂Vk} denotes summation over all voxels n for which both B-spline j and k have support. It should be noted that although 3M × 3M can be a very large number, especially for high warp–resolutions/small knot-spacings, the vast majority of elements are zero because most spline combinations jk have no overlap in support.
We will demonstrate the practicalities of actually calculating these entities in the sections which follow.
2.4. Implementation
We now present how we achieved computational tractability of the SPRED penalty, its gradient, and GN Hessian.
2.4.1. Parallelising calculation of gradient and Hessian
As with all but the simplest optimisation strategies, calculating the value of the SPRED penalty itself is not the computationally costly part of this algorithm. Rather, it is the corresponding gradient and Hessian calculations which are massively computationally complex. See, for example, Equation (19), where we have multiple applications of the chain rule to vector valued functions of vectors, leading to intermediate matrices. These intermediates then need to be multiplied together, further increasing the complexity. It is easy to see that this complexity is compounded even further in Equation (20) when calculating the Hessian.
If these calculations were all performed sequentially the problem would become computationally intractable, and the suggested form of regularisation would be a mathematical nicety with no practical impact.
However we will demonstrate how the problem can be framed in terms of the application of multiple massively parallelised SIMD operations, rendering it realistic to use.
As an example, let us consider how one might parallelise the gradient calculation in Equation (19). Firstly we note that the part:
| (21) |
depends only on entities dependent on which voxel n is currently being considered. Each of these 1 × 9 vectors can therefore be calculated and stored independently by one GPU thread. We represent these precalculated vectors as an N× 9 matrix (where N is the total number of voxels in image f) which we denote where the nth row is:
| (22) |
The columns of can be thought of as partial derivative images and can be visualised as demonstrated in Appendix B. Note also that in these calculations one can use the approximation given in Equation (15) to avoid explicit computation of the JSVs.
The next part:
| (23) |
is “constant” in the sense that its elements are given by the spatial derivatives of the B-spline basis-functions and can be calculated (analytically) once and for all for a given knot-spacing. If we denote a column-vectorised version of a single 3D B-spline basis–function (whose dimensions are given by the knot-spacing) by B we can create a matrix containing all the elements we need as:
| (24) |
where B (*) denotes the spline basis-function spatially differentiated in the ith direction and NS denotes the total size of the 3D spline (determined by the knot-spacing).
Finally each element VCm of the gradient can be calculated by independent threads as:
| (25) |
Where :
Vm = {voxels in f where B — spline m has support}
Lm = {elements of J n where B — spline m has an effect}
k ∈ Lmdenotes that k will select the three columns of that contribute to the mth element of the gradient. To be concrete, if 1 ≤ m ≤ M the element pertains to the x-component of the warp-field and k will select the columns of corresponding to the first row of J (see equation 6), and if M < m ≤ 2M it will select the columns of corresponding to the second row etc. The ‘ superscripts on n’ and k’ indicate transformed versions of n and k such that n’ takes into account the offset of the mth spline into the image, and k’ will select the corresponding column (regardless of row) in J.
Equation (25) can also serve as an illustration of some of the challenges of SIMD parallelisation. We have chosen a strategy where each element of ∇C is calculated by one thread. That choice means that we avoid having to synchronise writes to ∇C or having to perform reductions, both of which reduce occupancy and speed. The downside of that choice is that the different threads within a warp will access global memory in a non-coalesced fashion. This is because each thread will access a 3D sub-volume in each of the nine “partial derivative images” defined in Equation (22), where these sub-volumes are offset by an integer multiple of the knot-spacing in one or more dimensions. In the supplementary material there are timings and more detailed examples of the challenges of parallelising the proposed regulariser.
So, in summary the calculation of the gradient can be divided into three parts:
One which is separable over voxels (or more generally ‘‘sample points’’) where the calculations for one voxel can be performed and stored independently by one GPU thread.
One which is ‘‘constant’’ and can be performed once and for all and stored where the results can be accessed by all threads.
One which is separable over warp parameters (spline coefficients) where calculations for one warp parameter can be performed and stored independently by one GPU thread.
Following this line of reasoning, we may similarly divide the Hessian calculation of Equation (20) into portions which are constant, separable over samples, and separable over warp parameters.
As it is this process which is central to how using the SPRED penalty is made tractable, we provide an intuitive understanding of what implementing the parallelisation of Equation (19) actually looks like in practice by considering a simple 2D example. This can be found in Appendix B.
2.5. Testing
As assessing the performance of a registration tool is non-trivial, we here try to strike a balance between considering some measure of data-consistency, and some measure of anatomical plausibility. It is known that image similarity is a poor choice for measuring registration accuracy as it is just a proxy for what we are truly interested in, i.e., alignment of common anatomical structures (Rohlfing, 2012). Image-wide tissue maps are not a good option either, as they are global in nature and only really test tissue classification rather than anatomical alignment (Rohlfing, 2012). We therefore choose the overlap of manually segmented cortical regions as our data-consistency measure, which is an established method by which to assess the quality of registration (Klein et al., 2009).
The focus of this work is not primarily on assessing our method’s ability to maximise data consistency. However, if a set of warps was unable to produce high quality overlap results then it would not be of interest irrespective of how plausible the deformation is. High registration accuracy therefore simply allows us to evaluate the anatomical plausibility of the warps in a meaningful way.
For any method, regardless of the specifics of the regulariser, there is a trade-off between the image similarity term and regularisation term.
That trade-off is determined by their relative weights, something that is typically calibrated empirically (however see Simpson et al. (2012) for an attempt to determine it probabilistically). Therefore, in order to compare two methods in terms of plausible warps it is crucial to calibrate both methods so that they achieve (close to) identical overlap scores.
We ensured a high registration accuracy by using ANTs (Avants et al. (2008)) with similar settings to those used in Klein et al. (2009) as a yardstick for registration accuracy. To make sure warp comparisons were meaningful we calibrated the other methods to yield very similar overlap scores. If we could not achieve that for a method we did not include that in the comparison of warps.
We use two main indices for assessing what we consider to be anatomical plausibility. One is the Jacobian determinant that quantifies local volume change. We consider both the histogram of Jacobian-determinants within the brain, which provides information about how aggressively the warp is squeezing or expanding the volume on average, and how the Jacobian-determinants are spatially distributed, i.e., whether the deformation appears to be spatially sensible.
The second index is the cube-volume aspect ratio (CVAR, Smith and Wormald (1998)) which quantifies shape changes. In three dimensions it is the cube root of the ratio of the volume of the smallest possible enclosing cube to the actual volume. Similarly to the Jacobian determinant above we consider both the histograms and the spatial distributions of the CVAR.
2.5.1. RegiStration ToolS
In order to make a meaningful and relevant evaluation of the effect of the SPRED penalty on registration performance, four volumetric registration tools were included in this comparison, namely:
FLIRT ( Jenkinson and Smith, 2001 ) We chose to include a linear registration method in order to provide a yardstick by which to compare the overlap scores, and in particular the differences in overlap, of the other methods.
ANTs (Avants et al., 2008, 2014) ANTs has been shown to perform well in a direct comparison with other methods (Klein et al. (2009); Ou et al. (2014)). A method performing significantly worse would not be of any practical interest, so it provides a minimum accuracy bar for the other methods in the comparison. We ran ANTs both with cross correlation (ANTs-CC) (Avants et al. (2011)) and mean sum of squares (ANTs-MSQ) image similarity metrics. In both cases the greedy SyN transformation model was used.
FNIRT (Andersson et al., 2007) We included FNIRT because it uses the same image similarity and warp representation as MMORF, but employs a different strategy to ensure invertibility (Karacali and Davatzikos, 2004) and a different regulariser (bending energy).
MMORF Implements the regulariser (SPRED) that we propose in the present paper. In order to demonstrate just how different to a Jacobian Determinant regulariser (for example Rohlfing et al. ((2003) or Leow et al. ((2007)) SPRED is, we also implemented a regulariser based on the sum of squares of log-determinants of the Jacobians in the exact same framework.
Whilst there are myriad other tools that may also have been included in this comparison (e.g., SPM Unified Segmentation (Ashburner and Friston, 2005), DARTEL (Ashburner, 2007), elastix (Klein et al., 2010), MIRTK (Rueckert et al., 1999; Schnabel et al., 2001), Diffeomorphic Demons (Vercauteren et al., 2009), ART (Ardekani et al., 2005)), the chosen methods represent a broad enough overview of the types of tools available for the purposes of characterising the performance of MMORF.
2.5.2. Registration parameter selection
An important aspect when comparing registration algorithms is ensuring that the user-selectable parameters are as optimal as possible (Klein et al., 2009).
FNIRT was run with an optimised strategy proposed by the tool’s author. This consisted of a multi–resolution, multi-smoothing level registration, to a final knot spacing of 1 mm isotropic, and is explained in detail in Andersson et al. (2019).
ANTs-CC was run using the antsRegistrationSyn.sh script, but with the gradient step and smoothing values changed to match those supplied by the tool’s author for use in previously published comparative studies (Klein et al., 2009). ANTs-MSQ was run with hand tuned parameters. Both versions of ANTs employed a multi-resolution, multi-smoothing level approach, to a final resolution of 1 mm isotropic.
MMORF was run using parameters that were shown to perform well during the development of the tool. Again, a multi-resolution, multismoothing level approach was used, but with a final knot spacing of 1.25 mm.
Note that whilst significant effort has been made to ensure that each tool performs as well as possible, the overarching goal was not to definitively classify which tool performed best in terms of overlap scores. But rather to investigate how the SPRED penalty affects the nature of the deformations when compared to similar performing tools. In other words, the main goal was to calibrate their respective performances in terms of overlap accuracy so that they were not significantly different. Further details regarding parameter selection can be found in Appendix C.
2.5.3. Test Dataset
We have chosen to use the Non-rigid Image Registration Evaluation Project (NIREP) dataset (Christensen et al., 2006) in testing the registration methods. This dataset consists of the T1-weighted scans of 16 healthy subjects, eight male (average age 32.5 years) and eight female (average age 29.8 years). Each subject’s scan has been brain extracted, bias corrected, and 32 cortical regions (16 left hemisphere, 16 right) have been expertly hand-segmented. The segmented regions provide a ground truth for anatomical correspondence of a finer granularity than simple tissue maps.
2.5.4. Evaluation strategy
The registration tools were evaluated using pairwise registrations of each combination of the 16 subjects (240 warps in total).
Each of the 32 segmented cortical regions were transformed through these warps and various overlap metrics calculated, namely: Jaccard Coefficient, Dice Coefficient, Specificity, Sensitivity. The distributions of these overlaps in each region were then compared between registration methods.
Additionally, distributions of Jacobian determinants and of cubevolume aspect ratios (CVAR, see Appendix D for a definition) were calculated in order to gain some insight into how aggressive each method is in terms of volume and shape changes. These were inspected and compared as histograms, spatial maps and by summary statistics such as min, max, mean, 5th and 95th percentiles. The comparisons were performed only for the methods for which we achieved overlap scores that equalled those of ANTs-CC.
3. Results
3.1. Overlap scores
Combined left/right hemisphere overlap scores are shown in Fig. 1. All overlap measures showed similar trends, and therefore we focus only on the Jaccard Coefficient as our measure of choice. FLIRT, as expected, achieves the lowest levels of overlap, with mean overlaps varying depending on the region being considered. The change in overlap between FLIRT and the nonlinear methods also provides a useful yardstick against which to gauge the differences between those methods. All of the nonlinear methods, again as expected, improve the degree of overlap considerably. ANTs-MSQ has the lowest average overlap of the nonlinear methods. FNIRT has the second lowest overlap scores, although in certain areas it is outperformed by ANTs-MSQ. Note that the FNIRT results are significantly better than in previously reported comparisons (e.g. Ou et al. (2014) who used the parameters from Klein et al. (2009), which were unsuited for skull stripped data), but are in line with those from the tool’s author (Andersson et al., 2019). ANTs-CC and MMORF perform the best and equally well on average, but one or the other performs better in individual areas. These observations are supported when one considers the overall distributions of overlap scores in Fig. 2 which display the same trends.
Fig. 1. Comparative Jaccard coefficients.
Jaccard coefficients of 16 regions (32 original cortical segmentations combined left/right) for each of the 4 nonlinear and 1 linear registration methods. The FLIRT result provides a baseline for evaluating the improvement in overlap between linear and nonlinear methods. Overall ANTs-CC and MMORF perform similarly. In some areas ANTs-CC produces better results (e.g., cingulate gyrus), and in some area MMORF performs better (e.g., inferior parietal lobule). In all cases, both methods outperformed ANTs-MSQ and, to a lesser extent, FNIRT.
Fig. 2. Comparative Jaccard coefficients.
Distribution of Jaccard coefficients combined across all regions and subject pairs for each of the 4 nonlinear and 1 linear registration methods. This confirms the trends observed in Fig. 1, with MMORF and ANTs-CC performing best, followed by FNIRT and then ANTs-MSQ. The MMORF and ANTs-CC distributions are remarkably similar, despite the slight region-to-region differences in performance.
It is perhaps surprising that the two versions of ANTs produce such different results, given that they differ only in their choice of similarity metric. However, this is in line with what the creators of ANTs report themselves (Avants et al. (2011)) on the LPBA40 data set, so we believe this to be an accurate representation of their respective performances.
Because the best overlap performance we could achieve with ANTs-MSQ was significantly worse than that of MMORF and ANTs-CC we did not consider it meaningful to include it further in the comparison of warp metrics (Jacobian determinant and CVAR).
We did include FNIRT in one of the warp-comparisons, even though it did perform slightly worse than ANTs-CC and MMORF. That was because the difference was considerably less than for ANTs-MSQ, and also because it used the same image similarity metric and warp-construction as MMORF, differing only with respect to the strategy for ensuring invertibility and regularisation.
Our main interest was in the comparisons between ANTs-CC, being a very accurate and widely used method, and MMORF. The fact that we managed to calibrate the two methods so that they have almost identical overlap scores means that the warp distributions can be compared in a fair and meaningful way.
The joint distribution of overlap scores for MMORF and ANTs-CC for all regions and warps shown in Fig. 3 confirms that the two methods are on average very comparable.
Fig. 3. Jaccard coefficients.
Joint histogram of Jaccard coefficients across all regions and warps for MMORF and ANTs-CC. Note the tight, linear relationship, supporting the observation from Fig. 1 that MMORF and ANTs-CC have very comparable overall performance across the NIREP dataset.
3.2. Jacobian Determinant distributions
Fig. 4 shows the Jacobian determinant distribution of the warps generated with MMORF, ANTs-CC and FNIRT for a randomly selected image pair. We show both the non-log and log distributions since they highlight slightly different behaviours. It should be noted that although these warps represent a single pair of subjects, we could have chosen any of the 240 warps and the relationship would remain essentially the same.
Fig. 4. Jacobian determinant distributions.
Jacobian determinant and log-Jacobian determinant distribution of a randomly selected registration pair from the NIREP dataset. Only values within the brain itself have been considered. As desired, all three methods produce Jacobian determinants with few to no negative values. There are distinct differences in appearance though. FNIRT is the most different, with a visibly heavier negative log-Jacobian tail. This is an effect of projecting the Bending-Energy regularised registration onto a field without negative Jacobian determinants. ANTs-CC displays the next widest log-Jacobian distribution, but with more evenly weighted tails.
The Jacobian determinant distribution for FNIRT has a shape that is quite distinct from that of MMORF and ANTs-CC. It is seen very clearly in the non-log distribution where instead of tapering off smoothly towards zero there is an almost linear drop. This is most likely due to FNIRT projecting its warps onto the nearest B-Spline field which has no negative Jacobian determinants (Karacali and Davatzikos, 2004). The shapes of the distributions for MMORF and ANTs-CC are much more similar, but with ANTs-CC having a notably greater dispersion.
Of the three methods that had the most similar overlap scores it is clear that FNIRT causes substantially greater volume distortions, in addition to having slightly lower overlap scores. We therefore drop FNIRT from further analysis at this stage and focus on MMORF and ANTs-CC.
From the log distribution, the range of the 5th to 95th percentile for each method can be calculated. This was done for every warp, and the results are summarised in Fig. 5. It can be seen that both the mean and the dispersion of the ranges are considerably greater for ANTs-CC than for MMORF.Fig. 6 plots for each of the 240 registrations the log 5th–95th range for ANTs-CC against that of MMORF. In all but 4 cases the percentile range is smaller for MMORF.
Fig. 5. Log-Jacobian determinant range distributions.
The 5th to 95th percentile is used as a summary measure for the log-Jacobian determinant distributions in Fig. 4. The distribution of this percentile range over all 240 registrations is shown for the two best performing methods. ANTs-CC displays a significantly higher range on average, as well as a greater dispersion of ranges when compared to MMORF.
Fig. 6. Log-Jacobian determinant 5th to 95th percentile range.
Scatter plot of MMORF and ANTs-CC log-Jacobian determinant 5th to 95th percentile ranges. The key feature here is that ANTs-CC produces systematically wider ranges compared to MMORF, with only 4 of the 240 registrations favouring ANTs-CC. This is despite ANTs-CC and MMORF producing comparable Jaccard coefficients overall, as shown in Fig. 3.
3.3. CVAR distributions
Unlike the Jacobian determinant, the mean (across sample points within the brain) of the CVAR is a meaningful statistic. We therefore use that as our summary measure of shape distortions for a given warp. Fig. 7 demonstrates that the mean is both lower on average, and has smaller dispersion for MMORF compared to ANTs-CC.
Fig. 7. Mean CVAR distributions.
Distribution of mean CVAR within the brain over all 240 registrations is shown for the two best performing methods. ANTs-CC displays a significantly higher mean CVAR on average, as well as greater variance when compared to MMORF.
3.4. Spatial maps
In the previous section we showed that of the methods with comparable overlap scores ANTs-CC caused substantially more volume distortions than MMORF. In order to better understand the source of that difference we look at the spatial maps of the Jacobian determinants and of the CVAR. Fig. 8 shows maps of Jacobian determinants and Fig. 9 of CVAR for the same randomly selected pair of subjects as Fig. 4. There are clear visual differences between the results of the two methods. In areas where the T1-weighted signal intensity has strong contrast and carries information about the tissue type, such as along the cortex, both methods produce similar looking maps with varying amounts of expansion and compression. Where they differ in appearance is predominantly within areas displaying a relatively flat T1-weighted signal profile, such as in white matter. Here MMORF produces smoothly changing values with magnitudes close to 1. ANTs-CC by contrast displays highly oscillatory behaviour, with a higher proportion of values deviating away from 1.
Fig. 8. Jacobian determinant spatial maps.
Spatial distribution of Jacobian determinants for MMORF and ANTs. Note the distinct visual differences within the white matter for ANTs-CC. The SPRED penalty has resulted in smooth changes within this region, whilst ANTs-CC has produced high frequency variations. As a T1 image contains little information within the white matter, the anatomical plausibility of those rapid variations is not clear. By maintaining smoothness within the white matter, MMORF produces a result that is more anatomically believable given the information present within the images being registered. Importantly, the irregularity introduced by ANTs-CC is not necessary for achieving high overlap scores, as Figs. 1 and 3 show. Finally, whilst this may not have any deleterious effect on overlap scores of grey matter regions, the effect of transforming a modality rich in white matter information through such a warp could be significant.
Fig. 9. CVAR spatial maps.
Spatial distribution showing the Cube-Volume Aspect Ratio (CVAR) for MMORF and ANTs-CC. CVAR is a measure of shape change which extends the concept of aspect ratio to arbitrary dimensions, and is described in detail in Appendix D. Values above 1 correspond to increasing deviations from the original shape. Note that the appearance for MMORF and ANTs-CC is very similar to that in Fig. 8.
As a T1-weighted volume contains minimal information within the white matter, there is little reason to believe that the deformation should be anything other than smooth within these regions. Clearly this has not detrimentally affected the performance of ANTs-CC in terms of achieving high cortical overlap scores. However, if this warp was used to transform a modality rich in white-matter information (such as diffusion tensor imaging) the results could potentially be very deleterious.
By contrast, the MMORF result is more balanced. In areas where there is a large amount of information to drive the deformation (tissue boundaries) the warps are correspondingly larger and more variable, whereas areas with minimal information are left relatively unchanged.
When looking at Fig. 9 it should be noted that the particular (randomly selected) registration we chose for displaying spatial maps happens to be one of 65 (out of 240) for which the mean CVAR was greater for MMORF than for ANTs-CC.
4. Discussion
We have implemented and investigated a previously suggested method for regularising warps by penalising volume and shape distortions (Ashburner and Friston (1999) and Ashburner et al. (2000)). The regulariser has some very attractive properties, but is computationally very expensive. This meant that the implementation in the original paper used a voxel-by-voxel optimising scheme rather than a global optimiser, which can potentially cause order effects and convergence to a non-optimal minimum. We implemented the regulariser on a GPU, using a B-spline basis to represent the warps and a Gauss-Newton optimiser, resulting in run-times of under half an hour for a full multi-level registration. This has allowed us to run a large number of registrations and compare the results to one of the best performing algorithms in popular use (Avants et al. (2008); Klein et al. (2009)). We have been able to show that it performs as well as ANTs in terms of overlap scores, and that it achieves that with significantly less volume and shape distortions.
4.1. Relationship to earlier work
Pennec et al. (2005) suggested a very similar penalty function based on the concept of a Riemannian manifold for the allowed (diffeomorphic) warps. Other regularisers (Burger et al. (2013)) with similarities to that investigated by us are based on models for hyperelastic materials. These are models that describe the stress-strain relationships of materials, for example rubber, which are not well described by linear elastic models. These models have also been used as regularisation for correcting susceptibility artefacts in diffusion-weighted MRI (Ruthotto et al. (2012)). Similar hyperelastic models have additionally been used to model cortical growth in the developing brain (Knutsen et al., 2010) and for regularisation of surface based cortical registration (Robinson et al., 2018).
It should be noted that neither the present work, nor that in Ashburner et al. (2000), is explicitly based on hyperelastic models. Nor do we believe that hyperelastic models are necessarily meaningful descriptive generative models for explaining anatomical differences between subjects. Any such model (see for example Van Essen, 1997) is unlikely to be simple enough to be described by a few material constants. The reasoning behind our regularisation function is purely empirical, based on fulfilling our criteria for a useful function and on proving to produce smooth and plausible warps in parts of images with little or no anatomical information.
4.2. Beyond enforcing diffeomorphic warps
It is widely agreed that invertibility is a desirable property in a nonlinear spatial transform. There are two principle ways in which this can be enforced.
Warp Construction: Any warp that is a composition (integration in the limit) of diffeomorphic warps is itself diffeomorphic. Hence, methods that estimate warps as compositions of many small updates will by construction ensure that the end result is invertible.
Warp Penalisation: A highly nonlinear penalty function that goes to infinity as the local Jacobian determinant approaches zero will allow large deformations while maintaining invertibility. It should be noted that functions such as membrane energy or bending energy do not fall into this category.
For completeness both of these approaches will be discussed in more detail below. However, diffeomorphic warps are not the be all and end all. A warp can be invertible and yet highly unrealistic in that it causes very big volume and/or shape distortions. For this reason most methods that enforce invertibility using one of the methods mentioned above will combine that with one or more “traditional” regularisers that enforce smoothness. Ashburner and Ridgway (2013) have shown very convincingly that even within the space of diffeomorphic warps one will obtain very different solutions depending on the exact details and weights of the additional regularisers.
Hence, what we aim to achieve with our choice of regulariser goes much beyond just ensuring diffeomorphic warps. The aim is to find the maximally plausible, in terms of volume and shape distortions, of all possible warps within the space of diffeomorphisms. And to do this with a single penalty function (and a single weight) that achieves the joint objective of ensuring invertibility and simultaneously minimising volume and shape distortions.
We recognise that a judicious choice of forms and weights of additional penalty functions within inherently diffeomorphic frameworks, such as LDDMM and viscous fluid based methods, could potentially find an equally advantageous solution. But it is nevertheless the case that when comparing our regulariser to a state-of-the-art diffeomorphic method we were able to obtain invertible warps with equally good overlap scores and significantly less volume and shape distortions. Furthermore, there is no intrinsic superiority of inherently diffeomorphic methods over any other method that also guarantees invertibility, but achieves that with equally good, or better, registration accuracy.
4.2.1. Diffeomorphism by Warp construction
Methods such as ANTs fall into the category of inherently diffeo-morphic warps. In particular, ANTs is an example of a greedy approximation of the general LDDMM method. LDDMM methods seek to find a time varying velocity field which minimises a metric based on total path length in the space of diffeomorphisms. Originally introduced by Beg et al. (2005), these methods guarantee that the total deformation remains diffeomorphic when the velocity field is integrated over sufficiently small timesteps. A downside to these methods is the large number of parameters which need to be estimated at each timepoint. Tools such as DARTEL (Ashburner, 2007) seek to overcome this by instead parametrising a stationary velocity field, thereby greatly reducing the parameter space, but at the expense of potentially larger path lengths. Subsequently, Geodesic Shooting (Ashburner and Friston, 2011) based methods have been able to reformulate the original LDDMM problem such that instead of estimating the entire sequence of time-varying velocity fields, only the initial velocity need be estimated, thereby greatly reducing both the parameter space and convergence time of the optimisation.
Whilst all of these methods are capable of ensuring warps remain diffeomorphic, they differ from our method in that the prior on which they are based is that deformations follow the shortest path-lengths, rather than deformations conserving the shape of underlying anatomy. As such, diffeomorphism is an intrinsic rather than a controlled property of the transformation model. It is certainly not impossible to include explicit regularisation of shape changes in these models, however it is not the norm.
4.2.2. Diffeomorphism by Warp Penalisation
An alternative method of enforcing diffeomorphism is by using a penalty function that goes to infinity as the local Jacobian determinant approaches zero.
Most commonly, the Jacobian determinant itself is used as a hard constraint that bounds the allowed range (Haber and Modersitzki, 2004; Sdika, 2008; 2013; Haber et al., 2010), or fixes it to a particular value (Mansi et al., 2011), or a smoothly varying function of the Jacobian determinant is used as a soft constraint (Borzì et al., 2003; Yanovsky et al., 2007; Leow et al., 2007; Heyde et al., 2016; Mang et al., 2018). It should be noted that when modelling the warps as a velocity field the local divergence can be used as a proxy for the Jacobian determinant.
The crucial difference between our penalty and those based on a function of the Jacobian determinant is that the latter only penalises volume distortions. Not only does that mean that shape changes are not penalised, in practice such a function will result in a method where very large shape changes are used to circumvent the volume change limitations. In Appendix E we show the results obtained with a Jacobian determinant penalty function where the weight was calibrated so as to yield the same overlap score as our SPRED penalty. It can be seen that the Jacobian determinant penalty yields equally good overlap scores (Fig. E.1 and E.2) and results in warps that are dif-feomorphic, with a very narrow range of Jacobian determinants (Fig. E.3 and E.5). However, this has been achieved by completely ignoring shape changes, and has resulted in much larger CVARs (Fig. E.4 and E.6). Looking at a randomly selected warp (Fig. E.7) it is very clear that the Jacobian determinant penalty has resulted in lots of gratuitous distortions.
This is the reason that many groups have combined a hard or soft constraint on volume changes with an additional regulariser (see for example Haber et al. (2010), Yanovsky et al. (2007), Leow et al. (2007) or Mang et al. (2018)). While that may work well, it means that an additional weight factor needs to be empirically determined.
There is another group of algorithms (Loeckx et al., 2004; Ruan et al., 2006; Staring et al., 2007; Modersitzki, 2008; Mang and Biros, 2016) that use functions that penalise deviations from local rigidity (i.e. anything beyond local translation and rotation). In that respect they have similarities to our penalty, but they have mostly been used to enforce near-total rigidity in selected parts of images containing a mix of rigid tissue (e.g., bone) and non-rigid tissue (e.g., muscle).
Within this category of registration methods, Loeckx et al. (2004) and Mang and Biros (2016) are probably the most closely related to ours of which we are aware. Mang and Biros (2016) acknowledge the issue of excessive shape changes and seek to combat it by explicitly penalising the shear of the velocity field. In Appendix A we show the relationship between our SPRED penalty and the local rigidity penalty of Loeckx et al. (2004) . We present the results of using this penalty in n
Appendix E, with the weighting calibrated so as to yield the same overlap score as SPRED. It can be seen that the local rigidity penalty yields equally good overlap scores (Fig. E.1 and E.2), but it is unable to preserve diffeomorphism. Whilst the CVAR values are mostly well controlled (Fig. E4 and E6), looking at a randomly selected warp (Fig. E.7) demonstrates that topology is not being preserved in the cortex.
4.3. Other paralleled algorithms
It is increasingly common for registration tools to employ some form of parallelisation, and it is worth contextualising our approach in comparison to some of these methods.
Some of the Insight Segmentation and Registration Toolkit (Yoo et al., 2002) methods use CPU multithreading on a single device to accelerate the portions of their code which solve the linear system of equations necessary to compute update steps. This is a simple method to implement, but gains tend to be modest.
A recently described (Mang et al., 2018) parallelised version of Mang and Biros, 2016 utilises distributed-memory parallelism on multiple CPU nodes. It uses a Gauss-Newton optimisation strategy, and solves for a stationary velocity field transformation. The primary focus of the parallelism is in the inversion, rather than the calculation, of the Hessian.
Possibly the most closely related method to ours is NiftyReg (Modat et al., 2010) which employs a B–spline transformation and GPU parallelisation. NiftyReg differs in using a first order optimisation strategy and therefore does not require calculation of the Hessian. Reported performance improvements are very similar to ours, indicating comparable levels of code optimisation. Their impressive sub-minute runtimes are aided by a computationally simpler bending energy regularisation metric. Therefore a direct comparison of runtimes to our method using SPRED is not meaningful.
As discussed in Eklund et al., 2013, most applications of GPU parallelisation to medical image registration focus on speeding up existing methods. However, an interesting alternative is the development of more advanced registration methods which might otherwise have been rejected on the basis of computational complexity.
4.4. Registration accuracy
As we have previously stated, absolute registration accuracy is not the primary focus of this work. Instead it stands as a reference point for our discussion of anatomical plausibility, in that only two methods which are comparable in terms of registration accuracy can be meaningfully differentiated based on anatomical plausibility.
Based on the overlap scores in Section 3.1 we see that MMORF and ANTs-CC are clearly the best performing tools in terms of registration accuracy. Additionally there is very little to differentiate between their performance as a whole. We note that the results for ANTs-CC are slightly better than those which have been previously reported (Ou et al., 2014) for the same validation data set. We therefore believe that the way we have used ANTs-CC has yielded a close to optimal performance on this dataset.
4.5. Anatomical plausibility
Based on the summary 5th to 95th percentile log-Jacobian determinant range metric, MMORF is significantly more anatomically plausible than ANTs-CC. Fig. 6 supports this argument, by showing that this is true for the vast majority of the registrations. In other words, for a given registration accuracy one is almost guaranteed to have smaller volumetric changes when using MMORF over ANTs-CC. Another way of framing this result is to say that larger volume distortions are not a necessary trade-off for achieving high registration accuracy.
A striking feature of the Jacobian-range (Fig. 8) and CVAR maps (Fig. 9) are the seemingly gratuitous warps in white matter. This could potentially be a particular problem if one intends to use the structural registration to transform diffusion data. We believe this to be a strong case for extending the notion of anatomical plausibility beyond that of simply maintaining diffeomorphism and for building that notion into the regularisation.
Why this behaviour is observed in ANTs-CC is an interesting question in and of itself. At first this might be thought to be a case of insufficient regularisation, however the deformations generated by ANTs-MSQ (data not shown) did not display this same behaviour. Therefore we must posit that this is due to the somewhat scale-free nature of the CC metric. In other words, the gradient of CC does not depend on the amount of signal contrast present, only on the local correlation of that signal. What this means in practice is that regularisation on the level of the gradient cannot alter this behaviour, rather regularisation of the metric itself would be required. How to achieve this is beyond the scope of this work, however it highlights the importance of taking a considered approach to evaluating the believability of a deformation.
4.6. Limitations of the current work
The aim of the present paper is to introduce the SPRED penalty and to show that it can be used to yield large deformation, diffeomorphic and anatomically plausible warps. It is not yet a finished registration package that deals with differences in contrast, receive bias-field or B1 in-homogeneities. These will be the subject of future work. Neither have we applied it to data where very large deformations are needed, such as when registering images of severely atrophied brains or inter-species registration. Finally, whilst our regularisation penalty is symmetric, our similarity measure is not. Thus, for the overall algorithm to be truly symmetric we would ideally either modulate our similarity measure by (1+|J|) (Tagare et al., 2009), or simultaneously register both images to a mid-space (Avants et al., 2008).
4.7. Performance summary
The SPRED penalty allows MMORF to overcome the oft-quoted limitations of small deformation frameworks and achieve levels of registration accuracy comparable to state-of-the-art large deformation tools such as ANTs-CC. MMORF achieves this whilst at the same time producing warps which are consistently less aggressive in terms of both volume and shape distortions. Finally, given the information present in a structural scan, the spatial distribution of the log-Jacobian determinants MMORF produces appear more plausible compared to ANTs-CC, which produces variations that are not obviously necessary.
Overall we conclude that SPRED as implemented within MMORF produces warps which can be used with a high level of confidence. MMORF’s registration accuracy is on par with state-of-the-art volumetric methods, and therefore no sacrifice need be made in terms of maximising data consistency.
5. Conclusion and outlook
We have demonstrated that our SPRED penalty is capable of matching the registration accuracy (as measured using overlap scores of manually segmented cortical regions) of the most well established large-deformation (ANTs-CC) framework available today. Additionally, we find that the results from using the SPRED penalty are consistently more anatomically plausible both in terms of Jacobian determinants and CVARs. We leverage GPU parallelisation in order to make optimisation of the penalty computationally tractable, allowing this method to be of practical benefit as part of a useable registration framework.
We have shown that using a regularisation function such as this can overcome the shortcomings of elastic small deformation frameworks, yielding the best of both worlds: large deformation, diffeomorphic warps with minimal distortions and high registration accuracy. Future work will focus on extending MMORF to include the simultaneous registration of scalar and tensor modalities (see, e.g., Irfanoglu et al. (2016)). Furthermore we wish to investigate the effect of spatially varying the weighting of the SPRED penalty based on prior information regarding tissue variability (e.g., allowing larger deformations within the ventricles).
Supplementary Material
Acknowledgements
The Wellcome Centre for Integrative Neuroimaging is supported by core funding from the Wellcome Trust [grant number 203139/Z/16/Z]. The Wellcome Centre for Human Neuroimaging is supported by core funding from the Wellcome Trust [grant number 203147/Z/16/Z].
This work was supported by funding from the Engineering and Physical Sciences Research Council (EPSRC) and Medical Research Council (MRC) [grant number EP/L016052/1]; and the NVIDIA Corporation GPU grant program.
FJL is supported by The Oppenheimer Memorial Trust, St Catherine’s College Oxford and The Rotary Foundation.
JLRA is supported by a Wellcome Trust Strategic Award [grant number 098369/Z/12/Z]; and the National Institutes of Health (NIH) Human Connectome Project [grant numbers 1U01MH109589-01 and 1U01AG052564–01].
The authors wish to thank all of the reviewers for giving of their time. We extend particular thanks to Reviewer 3 for providing the local rigidity interpretation of our penalty.
Footnotes
Declaration of competing interest
We have no conflicts of interest to declare.
Contributor Information
John Ashburner, Email: j.ashburner@ucl.ac.uk.
Stephen M. Smith, Email: steve@fmrib.ox.ac.uk.
Jesper L.R. Andersson, Email: jesper.andersson@ndcn.ox. ac.uk.
References
- Amit Y, Grenander U, Piccioni M. Structural image restoration through deformable templates. J Am Stat Assoc. 1991;86(414):376–387. [Google Scholar]
- Andersson JLR, Jenkinson M, Smith S. Non-linear registration, aka spatial normalisation. FMRIB Technical Report. 2007:TR07JA2 [Google Scholar]
- Andersson JLR, Smith SM, Jenkinson M. High Resolution Nonlinear Registration with Simultaneous Modelling of Intensities. bioRxiv. 2019:646802. URL http://biorxiv.org/content/early/2019/05/22/646802.abstract. [Google Scholar]
- Ardekani BA, Guckemus S, Bachman A, Hoptman MJ, Wojtaszek M, Nierenberg J. Quantitative comparison of algorithms for inter-subject registration of 3D volumetric brain MRI scans. J Neurosci Methods. 2005;142(1):67–76. doi: 10.1016/j.jneumeth.2004.07.014. [DOI] [PubMed] [Google Scholar]
- Ashburner J. A fast diffeomorphic image registration algorithm. Neuroimage. 2007;38(1):95–113. doi: 10.1016/j.neuroimage.2007.07.007. [DOI] [PubMed] [Google Scholar]
- Ashburner J, Friston KJ. Nonlinear spatial normalization using basis functions. Hum Brain Mapp. 1999;7(4):254–266. doi: 10.1002/(SICI)1097-0193(1999)7:4<254::AID-HBM4>3.0.CO;2-G. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ashburner J, Friston KJ. Unified segmentation. Neuroimage. 2005;26(3):839–851. doi: 10.1016/j.neuroimage.2005.02.018. [DOI] [PubMed] [Google Scholar]
- Ashburner J, Friston KJ. Diffeomorphic registration using geodesic shooting and Gauss-Newton optimisation. Neuroimage. 2011 Apr;55(3):954–967. doi: 10.1016/j.neuroimage.2010.12.049. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ashburner J, Ridgway GR. Symmetric diffeomorphic modeling of longitudinal structural MRI. Front Neurosci. 2013;6(197):1–19. doi: 10.3389/fnins.2012.00197. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ashburner J, Andersson JL, Friston KJ. High-dimensional image registration using symmetric priors. Neuroimage. 1999;9(6 Pt 1):619–628. doi: 10.1006/nimg.1999.0437. [DOI] [PubMed] [Google Scholar]
- Ashburner J, Andersson JL, Friston KJ. Image registration using a symmetric prior-in three dimensions. Hum Brain Mapp. 2000 Apr;9(4):212–225. doi: 10.1002/(SICI)1097-0193(200004)9:4<212::AID-HBM3>3.0.CO;2-#. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Avants BB, Epstein CL, Grossman M, Gee JC. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med Image Anal. 2008 Feb;12(1):26–41. doi: 10.1016/j.media.2007.06.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Avants BB, Tustison NJ, Song G, Cook PA, Klein A, Gee JC. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage. 2011;54(3):2033–2044. doi: 10.1016/j.neuroimage.2010.09.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Avants BB, Tustison NJ, Stauffer M, Song G, Wu B, Gee JC. The Insight ToolKit image registration framework. Front Neuroinf. 2014;8(44):1–13. doi: 10.3389/fninf.2014.00044. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Beg MF, Miller MI, Trouvé A, Younes L. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int J Comput Vis. 2005;61(2):139–157. [Google Scholar]
- Bookstein FL. Quadratic variation of deformations. In: Duncan J, Gindi G, editors. Information Processing in Medical Imaging LNCS, 1230. Springer; Berlin, Heidelberg: 1997. pp. 15–28. [Google Scholar]
- Borzì A, Ito K, Kunisch K. Optimal control formulation for determining optical flow. SIAM J Sci Comput. 2003;24(3):818–847. [Google Scholar]
- Burger M, Modersitzki J, Ruthotto L. A hyperelastic regularization energy for image registration. SIAM J Sci Comput. 2013;35(1):B132–B148. [Google Scholar]
- Chen P. Hessian matrix vs. Gauss-Newton Hessian matrix. Siam. 2011;49(4):1417–1435. [Google Scholar]
- Christensen GE. Consistent linear-elastic transformations for image matching. In: Kuba A, Sáamal M, Todd-Pokropek A, editors. Information Processing in Medical Imaging. Springer; Berlin, Heidelberg: 1999. pp. 224–237. (LNCS, 1613). [Google Scholar]
- Christensen GE, Geng X, Kuhl JG, Bruss J, Grabowski TJ, Pirwani IA, Vannier MW, Allen JS, Damasio H. Introduction to the non-rigid image registration evaluation Project (NIREP) In: Pluim JPW, Likar B, et al., editors. Biomedical Image Registration, WBIR. Springer; Berlin, Heidelberg: 2006. pp. 128–135. (LNCS, 4057). [Google Scholar]
- Coalson TS, Van Essen DC, Glasser MF. The impact of traditional neuroimaging methods on the spatial localization of cortical areas. Proc Natl Acad Sci Unit States Am. 2018;115(27):E6356–E6365. doi: 10.1073/pnas.1801582115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eklund A, Dufort P, Forsberg D, LaConte SM. Medical image processing on the GPU - past, present and future. Med Image Anal. 2013;17(8):1073–1094. doi: 10.1016/j.media.2013.05.008. [DOI] [PubMed] [Google Scholar]
- Fischl B. FreeSurfer. NeuroImage. 2012;62(2):774–781. doi: 10.1016/j.neuroimage.2012.01.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Haber E, Horesh R, Modersitzki J. Numerical Optimization for Constrained Image Registration. Numerical Linear Algebra with Applications 00. 2010 [Google Scholar]
- Haber E, Modersitzki J. Numerical methods for volume preserving image registration. Inverse Probl. 2004;20(5):1621–1638. [Google Scholar]
- Heyde B, Alessandrini M, Hermans J, Barbosa D, Claus P, D’hooge J. Anatomical image registration using volume conservation to assess cardiac deformation from 3D ultrasound recordings. IEEE Trans Med Imag. 2016;35(2):501–511. doi: 10.1109/TMI.2015.2479556. [DOI] [PubMed] [Google Scholar]
- Hunter DR, Lange K. A tutorial on MM algorithms. Am Statistician. 2004;58(1):30–37. [Google Scholar]
- Irfanoglu MO, Nayak A, Jenkins J, Hutchinson EB, Sadeghi N, Thomas CP, Pierpaoli C. DR-TAMAS: diffeomorphic registration for tensor accurate alignment of anatomical structures. Neuroimage. 2016;132:439–454. doi: 10.1016/j.neuroimage.2016.02.066. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jenkinson M, Smith S. A global optimisation method for robust affine registration of brain images. Med Image Anal. 2001;5(2):143–156. doi: 10.1016/s1361-8415(01)00036-6. [DOI] [PubMed] [Google Scholar]
- Klein S, Staring M, Murphy K, Viergever M, Pluim J. Elastix: a toolbox for intensity-based medical image registration. IEEE Trans Med Imag. 2010;29(1):196–205. doi: 10.1109/TMI.2009.2035616. [DOI] [PubMed] [Google Scholar]
- Knutsen AK, Chang YV, Grimm CM, Phan L, Taber LA, Bayly PV. A new method to measure cortical growth in the developing brain. J Biomech Eng. 2010;132(10):101004. doi: 10.1115/1.4002430. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Karacali B, Davatzikos C. Estimating topology preserving and smooth displacement fields. IEEE Trans Med Imag. 2004;23(7):868–880. doi: 10.1109/TMI.2004.827963. [DOI] [PubMed] [Google Scholar]
- Klein A, Andersson J, Ardekani BA, Ashburner J, Avants B, Chiang MC, Christensen GE, Collins DL, Gee J, Hellier P, Song JH, et al. Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration. Neuroimage. 2009;46(3):786–802. doi: 10.1016/j.neuroimage.2008.12.037. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Leow AD, Yanovsky I, Chiang M-C, Lee AD, Klunder AD, Lu A, Becker JT, Davis SW, Toga AW, Thompson PM. Statistical properties of jacobian maps and the realization of unbiased large-deformation nonlinear image registration. IEEE Trans Med Imag. 2007;26(6):822–832. doi: 10.1109/TMI.2007.892646. [DOI] [PubMed] [Google Scholar]
- Levenberg K. A method for the solution of certain non-linear problems in least squares. Q Appl Math. 1944;2(2):164–168. [Google Scholar]
- Loeckx D, Maes F, Vandermeulen D, Suetens P. Nonrigid image registration using free-form deformations with a local rigidity constraint. In: Barillot C, Haynor DR, Hellier P, editors. Medical Image Computing and Computer-Assisted Intervention. Springer; Berlin, Heidelberg: 2004. pp. 639–646. (LNCS, 3216). [Google Scholar]
- Mang A, Biros G. Constrained H1-Regularization schemes for diffeomorphic image registration. SIAM J Imag Sci. 2016;9(3):1154–1194. doi: 10.1137/15M1010919. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mang A, Gholami A, Davatzikos C, Biros G. CLAIRE: a distributed-memory solver for constrained large deformation diffeomorphic image registration. 2018 doi: 10.1137/18m1207818. URL http://arxiv.org/abs/1808.04487. [DOI] [PMC free article] [PubMed]
- Mansi T, Pennec X, Sermesant M, Delingette H, Ayache N. iLogDemons: a demons-based registration algorithm for tracking incompressible elastic biological tissues. Int J Comput Vis. 2011;92(1):92–111. [Google Scholar]
- Miller MI, Christensen GE, Amitt Y, Grenander ULF. Mathematical textbook of deformable neuroanatomies. Proc Natl Acad Sci U S A. 1993;90(24):11944–11948. doi: 10.1073/pnas.90.24.11944. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Miller M, Banerjee A, Christensen G, Joshi S, Khaneja N, Grenander U, Matejic L. Statistical methods in computational anatomy. Stat Methods Med Res. 2003;6(3):267–299. doi: 10.1177/096228029700600305. [DOI] [PubMed] [Google Scholar]
- Modat M, Ridgway GR, Taylor ZA, Lehmann M, Barnes J, Hawkes DJ, Fox NC, Ourselin S. Fast free-form deformation using graphics processing units. Comput Methods Progr Biomed. 2010;98(3):278–284. doi: 10.1016/j.cmpb.2009.09.002. [DOI] [PubMed] [Google Scholar]
- Modersitzki J. FLIRT with rigidity-image registration with a local non-rigidity penalty. Int J Comput Vis. 2008;76(2):153–163. [Google Scholar]
- Moller MF. A scaled conjugate gradient algorithm for fast supervised learning. Neural Network. 1993;6(4):525–533. [Google Scholar]
- Nvidia. CUDA C programming guide. 2019 URL https://docs.nvidia.com/cuda/cuda-c-programming-guide/
- Ou Y, Akbari H, Bilello M, Da X, Davatzikos C. Comparative evaluation of registration algorithms in different brain databases with varying difficulty: results and insights. IEEE Trans Med Imag. 2014;33(10):2039–2065. doi: 10.1109/TMI.2014.2330355. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pennec X, Stefanescu R, Arsigny V, Fillard P, Ayache N. Riemannian elasticity: a statistical regularization framework for non-linear registration. In: Duncan JS, Gerig G, editors. Medical Image Computing and Computer-Assisted Intervention. Springer; Berlin, Heidelberg: 2005. pp. 943–950. (LNCS). [DOI] [PubMed] [Google Scholar]
- Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes : the Art of Scientific Computing. Cambridge University Press; 2007. [Google Scholar]
- Robinson EC, Garcia K, Glasser MF, Chen Z, Coalson TS, Makropoulos A, Bozek J, Wright R, Schuh A, Webster M, Hutter J, et al. Multimodal surface matching with higher-order smoothness constraints. Neuroimage. 2018;167:453–465. doi: 10.1016/j.neuroimage.2017.10.037. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Robinson EC, Jbabdi S, Glasser MF, Andersson J, Burgess GC, Harms MP, Smith SM, Van Essen DC, Jenkinson M. MSM: a new flexible framework for Multimodal Surface Matching. Neuroimage. 2014;100:414–426. doi: 10.1016/j.neuroimage.2014.05.069. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rohlfing T. Image similarity and tissue overlaps as surrogates for image registration accuracy: widely used but unreliable. IEEE Trans Med Imag. 2012;31(2):153–163. doi: 10.1109/TMI.2011.2163944. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rohlfing T, Maurer CR, Bluemke DA, Jacobs MA. Volume-preserving nonrigid registration of MR breast images using free-form deformation with an incompressibility constraint. IEEE Trans Med Imag. 2003;22(6):730–741. doi: 10.1109/TMI.2003.814791. [DOI] [PubMed] [Google Scholar]
- Ruan D, Fessler JA, Roberson M, Balter J, Kessler M. Nonrigid registration using regularization that accomodates local tissue rigidity. In: Reinhardt JM, Pluim JPW, editors. Medical Imaging 2006: Image Processing, 6144. SPIE; 2006. pp. 346–354. [Google Scholar]
- Rueckert D, Sonoda LI, Hayes C, Hill DLG, Leach MO, Hawkes DJ. Nonrigid registration using free-form deformations: application to breast MR images. IEEE Trans Med Imag. 1999;18(8):712–721. doi: 10.1109/42.796284. [DOI] [PubMed] [Google Scholar]
- Ruthotto L, Kugel H, Olesch J, Fischer B, Modersitzki J, Burger M, Wolters CH. Diffeomorphic susceptibility artifact correction of diffusion-weighted magnetic resonance images. Phys Med Biol. 2012;57(18):5715–5731. doi: 10.1088/0031-9155/57/18/5715. [DOI] [PubMed] [Google Scholar]
- Schnabel JA, Rueckert D, Quist M, Blackall JM, Castellano-Smith AD, Hartkens T, Penney GP, Hall WA, Liu H, Truwit CL, Gerritsen FA, et al. A generic framework for non-rigid registration based on non-uniform multi-level free-form deformations. In: Niessen WJ, Viergever MA, editors. Medical Image Computing and Computer-Assisted Intervention. Springer; Berlin, Heidelberg: 2001. pp. 573–581. (LNCS, 2208). [Google Scholar]
- Sdika M. A fast nonrigid image registration with constraints on the jacobian using large scale constrained optimization. IEEE Trans Med Imag. 2008;27(2):271–281. doi: 10.1109/TMI.2007.905820. [DOI] [PubMed] [Google Scholar]
- Sdika M. A sharp sufficient condition for B-spline vector field invertibility. Application to diffeomorphic registration and interslice interpolation. SIAM J Imag Sci. 2013;6(4):2236–2257. [Google Scholar]
- Simpson IJ, Schnabel JA, Groves AR, Andersson JL, Woolrich MW. Probabilistic inference of regularisation in non-rigid registration. Neuroimage. 2012;59(3):2438–2451. doi: 10.1016/j.neuroimage.2011.09.002. [DOI] [PubMed] [Google Scholar]
- Smith W, Wormald N. Geometric separator theorems and applications; Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280); 1998. pp. 232–243. [Google Scholar]
- Sotiras A, Davatzikos C, Paragios N. Deformable medical image registration: a survey. IEEE Trans Med Imag. 2013;32(7):1153–1190. doi: 10.1109/TMI.2013.2265603. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Staring M, Klein S, Pluim JP. A rigidity penalty term for nonrigid registration. Med Phys. 2007;34(11):4098–4108. doi: 10.1118/1.2776236. [DOI] [PubMed] [Google Scholar]
- Tagare HD, Groisser D, Skrinjar O. Symmetric non-rigid registration: a geometric theory and some numerical techniques. J Math Imag Vis. 2009;34(1):61–88. [Google Scholar]
- Van Essen DC. A tension-based theory of morphogenesis and compact wiring in the central nervous system. Nature. 1997;385(6614):313–318. doi: 10.1038/385313a0. [DOI] [PubMed] [Google Scholar]
- Vercauteren T, Pennec X, Perchant A, Ayache N. Diffeomorphic demons: efficient non-parametric image registration. Neuroimage. 2009;45(1 Suppl l):S61–S72. doi: 10.1016/j.neuroimage.2008.10.040. [DOI] [PubMed] [Google Scholar]
- Yanovsky I, Thompson PM, Osher S, Leow AD. Topology preserving log unbiased nonlinear image registration: theory and implementation; Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; 2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yoo TS, Ackerman MJ, Lorensen WE, Schroeder W, Chalana V, Aylward S, Metaxas D, Whitaker R. Engineering and algorithm design for an image processing API: a technical report on ITK - the Insight Toolkit. Stud Health Technol Inf. 2002;85:586–592. [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.









