Published in final edited form as: Neuroimage. 2016 Sep 7;143:235–249. doi: 10.1016/j.neuroimage.2016.09.011

Fast and Sequence-Adaptive Whole-Brain Segmentation Using Parametric Bayesian Modeling

Oula Puonti, Juan Eugenio Iglesias, Koen Van Leemput

Abstract

Quantitative analysis of magnetic resonance imaging (MRI) scans of the brain requires accurate automated segmentation of anatomical structures. A desirable feature for such segmentation methods is to be robust against changes in acquisition platform and imaging protocol. In this paper we validate the performance of a segmentation algorithm designed to meet these requirements, building upon generative parametric models previously used in tissue classification. The method is tested on four different datasets acquired with different scanners, field strengths and pulse sequences, demonstrating comparable accuracy to state-of-the-art methods on T1-weighted scans while being one to two orders of magnitude faster. The proposed algorithm is also shown to be robust against small training datasets, and readily handles images with different MRI contrast as well as multi-contrast data.

Keywords: MRI, Segmentation, Atlases, Parametric models, Bayesian modeling

1. Introduction

So-called whole-brain segmentation techniques aim to automatically label a multitude of cortical and subcortical regions from brain MRI scans. Recent years have seen tremendous advances in this field, enabling, for the first time, fine-grained comparisons of regional brain morphometry between large groups of subjects. Current state-of-the-art whole-brain segmentation algorithms are typically based on supervised models of image appearance in T1-weighted scans, in which the relationship between intensities and neuroanatomical labels is learned from a set of manually annotated training images.

This approach suffers from two fundamental limitations. First, segmentation performance often degrades when the algorithms are applied to T1-weighted data acquired on different scanner platforms or using different imaging sequences, due to subtle changes in the obtained image contrast (Han and Fischl, 2007; Roy et al., 2013). And second, the exclusive focus on only T1-weighted images hinders the ultimate translation of whole-brain segmentation techniques into clinical practice, where they hold great potential to support personalized treatment of patients suffering from brain diseases. This is because clinical imaging uses additional MRI contrast mechanisms to show clinically relevant information, including T2-weighted or fluid attenuated inversion recovery (FLAIR) images that are much more sensitive to certain pathologies than T1-weighted scans (e.g., white matter lesions or brain tumors). Although incorporating models of lesions into whole-brain segmentation techniques is an open problem in itself, a first necessary step towards bringing these techniques into clinical practice is to make them capable of handling the multi-contrast images that are acquired in standard clinical routine.

In this article, we present and validate the performance of a fast, sequence-independent whole-brain segmentation algorithm. The method, which is based on a mesh-based computational atlas combined with a Gaussian appearance model, yields segmentation accuracies comparable to the state of the art; automatically adapts to different MRI contrasts (even if multimodal); requires only a small amount of training data; and achieves computational times comparable to those of the fastest algorithms in the field (Zikic et al., 2014; Ta et al., 2014).

1.1. Current state of the art in whole-brain segmentation

Early methods for the segmentation of brain structures often relied on parametric models, in which the available training data were summarized in relevant statistics that were subsequently used to inform the segmentation of previously unseen subjects. Because many distinct brain structures have similar intensity characteristics in MRI, these methods were typically built around detailed probabilistic models of the expected shape and relative positioning of different brain regions, using surface-based (Kelemen et al., 1998; Pizer et al., 2003; Patenaude et al., 2011; Cootes et al., 1998) or volumetric (Fischl et al., 2002; Pohl et al., 2006b) models. These anatomical models were then combined with supervised models of appearance to encode the typical intensity characteristics of the relevant structures in the training data, often using Gaussian models for either the intensity of individual voxels (Fischl et al., 2002; Pohl et al., 2006b) or for entire regional intensity profiles (Kelemen et al., 1998; Pizer et al., 2003; Patenaude et al., 2011; Cootes et al., 1998). The segmentation problem was then formulated in a Bayesian setting, in which segmentations were sought that satisfy both the shape and appearance constraints.

More recently, non-parametric methods have gained increasing attention in the field of whole-brain segmentation, mostly in the form of multi-atlas label fusion (Rohlfing et al., 2004a; Heckemann et al., 2006; Isgum et al., 2009; Artaechevarria et al., 2009; Sabuncu et al., 2010; Rohlfing et al., 2004b; Wang et al., 2013; Coupé et al., 2011; Rousseau et al., 2011; Tong et al., 2013; Wu et al., 2014; Asman and Landman, 2013; Zikic et al., 2014; Iglesias and Sabuncu, 2015). In these methods, each of the manually annotated training scans is first deformed onto the target image using an image registration algorithm. Then, the resulting deformation fields are used to warp the manual annotations, which are subsequently fused into a final consensus segmentation. Although early methods used a simple majority voting rule (Rohlfing et al., 2004a; Heckemann et al., 2006), recent developments have concentrated on exploiting local intensity information to guide the atlas fusion process. This is particularly helpful in cortical areas, for which accurate inter-subject registration is challenging (Sabuncu et al., 2010; Ledig et al., 2012b). Label fusion methods have been shown to yield very accurate whole-brain segmentations (Landman and Warfield, 2012), but their accuracy comes at the expense of a high computational cost as a result of the multiple non-linear registrations that are required. Efforts to alleviate this issue include a local search using entire image patches, such that much faster linear registrations can be used (Coupé et al., 2011; Ta et al., 2014), as well as using rich contextual features so that only a single non-linear warp is needed (Zikic et al., 2014).

1.2. Existing methods that handle changes in MRI contrast

With the exception of simple majority voting (Rohlfing et al., 2004a; Heckemann et al., 2006), all the methods reviewed above use supervised intensity models, in the sense that they explicitly exploit the specific image contrast properties of the dataset used for training. This poses limitations on their ability to segment images that were acquired with different scanners or imaging sequences than the training scans.

A generic way of making such methods work across imaging platforms is histogram matching (also known as intensity normalization), in which the intensity profiles of new images are altered so as to resemble those of the images used for training (Nyúl et al., 2000; Roy et al., 2013). However, histogram matching can only be used when the training and target data have been acquired with the same type of MRI sequence (e.g., T1-weighted), and it does not completely cancel the negative effects that intensity mismatches have on segmentation accuracy (Roy et al., 2013).

Another approach is to have the training dataset include images that are representative of all the scanners and protocols that are expected to be encountered in practice. However, this approach quickly becomes impractical due to the large number of possible combinations of MRI hardware and acquisition parameters. The situation is exacerbated for clinical data, due to the lack of standardized protocols to acquire multi-contrast MRI data across clinical imaging centers.

In contrast synthesis (Roy et al., 2013), the original scan is not directly segmented, but rather used to generate a new scan with the desired intensity profile, which is then segmented instead. The premise of this technique is that a database of scans acquired with both the source and target contrast is available, so that the relationship between the two can be learned (Iglesias et al., 2013a; Roy et al., 2013). This approach makes it unnecessary to manually annotate additional training data for each new set-up that is considered – a considerable advantage given that a manual whole-brain segmentation often takes several days per scan (Fischl et al., 2002). However, it still requires that additional example subjects are scanned with both the source and target scanner and protocol, which is not always practical.

Finally, a more fundamental way to address the problem is to perform whole-brain segmentation in the space of intrinsic MRI tissue parameters (Fischl et al., 2004b). However, this requires the usage of specific MRI sequences for which a physical forward model is available, which are not widely implemented on MRI scanning platforms, and particularly not on clinical systems.

1.3. Contribution: validation of a fast, sequence-adaptive whole-brain segmentation algorithm

In contrast to the aforementioned approaches to whole-brain segmentation, which rely on supervised models of the specific intensity profiles seen in the training data, in this paper we validate an unsupervised approach that automatically learns appropriate intensity models from the images being analyzed. At the core of the method is an intensity clustering algorithm (a Gaussian mixture model) that derives its independence of specific image contrast properties by simply grouping together voxels with similar intensities. This approach is well-established for the purpose of tissue classification (aimed at extracting the white matter, gray matter and cerebrospinal fluid) where it is typically augmented with models of MRI imaging artifacts (Wells et al., 1996a; Van Leemput et al., 1999a; Ashburner and Friston, 2005) and spatial models such as probabilistic atlases (Ashburner and Friston, 1997; Van Leemput et al., 1999a; Ashburner and Friston, 2005) or Markov random fields (Van Leemput et al., 1999b; Zhang et al., 2001).

Here we validate a method for whole-brain segmentation that is rooted in this type of approach, building on prior work from our group including a proof-of-concept demonstration in whole-brain segmentation (Van Leemput, 2009), as well as the automated segmentation methods for hippocampal subfields (Iglesias et al., 2015a) and subregions of the brainstem (Iglesias et al., 2015b) that are distributed with the FreeSurfer software package (Fischl et al., 2002). The method we validate here uses a mesh-based probabilistic atlas to provide whole-brain segmentation accuracy at the level of the state of the art, both within and across scanner platforms and pulse sequences. Unlike many other techniques, the method does not need any preprocessing such as skull stripping, bias field correction or intensity normalization. Furthermore, because the method is parametric, only a single non-linear registration (of the atlas to the target image) is required, yielding a very fast overall computational footprint.

An early version of this work, with a preliminary validation, was presented in (Puonti et al., 2013). The current article adds a more detailed explanation of our modeling approach, quantitative comparisons with additional state-of-the-art label fusion algorithms, and more extensive experiments – particularly regarding test-retest reliability, segmentation of multi-contrast and non-T1-contrast data, and the sensitivity of the method to the size of the training dataset.

2. Modeling framework

Let $\mathbf{D} = (\mathbf{d}_1, \ldots, \mathbf{d}_I)$ denote a matrix collecting the intensities in a multi-contrast brain MRI scan with $I$ voxels, where the vector $\mathbf{d}_i = (d_i^1, \ldots, d_i^N)^T$ contains the intensities in voxel $i$ for each of the available $N$ contrasts. Furthermore, let $\mathbf{l} = (l_1, \ldots, l_I)$ be the corresponding segmentation, where $l_i \in \{1, \ldots, K\}$ denotes which of the $K$ possible segmentation labels is assigned to voxel $i$.

In order to estimate l from D, i.e., to compute automated segmentations, we use a generative modeling approach: a forward probabilistic model of MRI images is defined, and subsequently “inverted” to obtain the segmentation. The model consists of two parts: a prior and a likelihood. The prior is a probability distribution over segmentations p(l) that encodes prior knowledge on human neuroanatomy. The likelihood is a probability distribution over image intensities that is conditioned on the segmentation p(D|l), which models the imaging process through which a certain segmentation yields the observed MRI scan. This type of model is generative because it provides a mechanism to generate data through the forward model: in our case, we could generate a random brain MRI scan by first sampling the prior to obtain a segmentation, and then sampling the likelihood conditioned on the resulting segmentation.
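To make the forward model concrete, the following toy sketch samples from it. It is written in Python/NumPy purely for illustration (the paper's actual implementation is in Matlab and C++), and all sizes and parameter values below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy prior: per-voxel label probabilities for K = 3 labels over I voxels.
I, K = 1000, 3
prior = rng.dirichlet(alpha=np.ones(K), size=I)          # rows are p(l_i = k)

# Step 1: sample a segmentation l from the prior.
labels = np.array([rng.choice(K, p=prior[i]) for i in range(I)])

# Step 2: sample intensities d from the likelihood, here one Gaussian
# per label with made-up means and standard deviations.
means = np.array([30.0, 80.0, 150.0])
sigmas = np.array([5.0, 8.0, 10.0])
d = rng.normal(loc=means[labels], scale=sigmas[labels])
```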

Within this framework, the posterior distribution of image segmentations given an input brain MRI scan is given by Bayes’ rule:

$$p(\mathbf{l} \mid \mathbf{D}) \propto p(\mathbf{D} \mid \mathbf{l})\, p(\mathbf{l}). \tag{1}$$

Maximizing Eq. 1 with respect to l then yields the maximum a posteriori (MAP) estimate of the segmentation.

In the rest of this Section, we will describe in depth the prior (Section 2.1) and likelihood (Section 2.2); we will propose an inference algorithm to approximately maximize Eq. 1 (Section 2.3); and finally we will describe the details of the implementation of this algorithm (Section 2.4).

2.1. Prior

For the prior p(l) we use a generalization of the probabilistic brain atlases often used in brain MRI segmentation (Ashburner and Friston, 1997; Van Leemput et al., 1999b,a, 2001; Zijdenbos et al., 2002; Fischl et al., 2002; Ashburner and Friston, 2005; Prastawa et al., 2005; Pohl et al., 2006b; D’Agostino et al., 2006; Awate et al., 2006; Pohl et al., 2007). This model, detailed in (Van Leemput, 2009), is based on a deformable tetrahedral mesh, the properties of which are learned automatically from a set of manual example segmentations made on MRI scans of training subjects. Each of the vertices of the mesh has an associated set of label probabilities specifying how frequently each of the K labels occurs at the vertex. The resolution of the mesh is locally adaptive, being sparse in large uniform regions and dense around the structure borders. This automatically introduces a locally varying amount of spatial blurring in the resulting atlas, aiming to avoid over-fitting of the model to the available training samples (Van Leemput, 2009). During training, the topology of the mesh and the position of its vertices in atlas space (henceforth “reference position”) are computed along with the label probabilities in a non-linear, group-wise registration of the labeled training data. An example of the resulting probabilistic brain atlas, computed from manual parcellations in 20 subjects, is displayed in its reference position in Figure 1; note the irregularity in the shapes and sizes of the tetrahedra.

Figure 1:

Left: T1-weighted scan from the training data. Center: corresponding manual segmentation. Right: atlas mesh built from 20 randomly selected subjects from the training data.

The positions of the mesh nodes x can change according to their prior distribution p(x):

$$p(\mathbf{x}) \propto \exp\left(-\beta \sum_{t=1}^{T} \phi_t(\mathbf{x}, \mathbf{x}_{\mathrm{ref}})\right) \tag{2}$$

where T and xref denote the number of tetrahedra and the reference position of the mesh, respectively; ϕt(x, xref) is a penalty for deforming tetrahedron t from its reference to its actual position; and β > 0 is a scalar that controls the global stiffness of the mesh. We use the penalty term proposed in (Ashburner et al., 2000), which goes to infinity when the Jacobian determinant of the deformation approaches zero. This choice prevents the mesh from tearing or folding onto itself, thus preserving its topology.
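The exact penalty of (Ashburner et al., 2000) is not reproduced here, but its key property is easy to illustrate. The stand-in below is a hypothetical Python/NumPy sketch, not the form used in the paper: it is merely zero for a volume-preserving deformation and diverges as the Jacobian determinant approaches zero, which is the behavior the text relies on.

```python
import numpy as np

def fold_preventing_penalty(J):
    """Toy stand-in for the per-tetrahedron penalty phi_t.

    NOT the exact form of (Ashburner et al., 2000); it only mimics the key
    property used in the paper: zero cost for a volume-preserving
    deformation (det J = 1), and a cost that diverges as the Jacobian
    determinant approaches zero, so an optimizer can never collapse or
    fold a tetrahedron.
    """
    detJ = np.linalg.det(J)
    if detJ <= 0.0:
        return np.inf                      # folded or degenerate: forbidden
    return detJ + 1.0 / detJ - 2.0

# A mild compression is cheap; a near-collapse is prohibitively expensive.
print(fold_preventing_penalty(np.diag([0.9, 1.0, 1.0])))    # ~0.011
print(fold_preventing_penalty(np.diag([0.01, 1.0, 1.0])))   # ~98.0
```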

Given a deformed mesh with node positions x, the probability pi(k|x) of observing label k at a voxel i is obtained by barycentric interpolation of the label probabilities at the vertices of the tetrahedron containing the voxel. Moreover, we assume conditional independence of the labels of the different voxels given the mesh node positions, such that

$$p(\mathbf{l} \mid \mathbf{x}) = \prod_{i=1}^{I} p_i(l_i \mid \mathbf{x}). \tag{3}$$
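To make the interpolation step concrete, here is a minimal Python/NumPy sketch of barycentric interpolation of vertex label probabilities inside a single tetrahedron (all names and values are hypothetical):

```python
import numpy as np

def barycentric_weights(v, p):
    """Barycentric coordinates of point p inside a tetrahedron.

    v: (4, 3) array of vertex positions, p: (3,) query point.
    Solves the 4x4 linear system [v^T; 1] w = [p; 1]; the weights sum to 1.
    """
    A = np.vstack([v.T, np.ones(4)])
    b = np.append(p, 1.0)
    return np.linalg.solve(A, b)

def interpolate_label_probs(v, vertex_probs, p):
    """p_i(k | x): weighted average of the K label probabilities stored
    at the tetrahedron's four vertices (vertex_probs: (4, K))."""
    w = barycentric_weights(v, p)
    return w @ vertex_probs                   # (K,) probabilities, sum to 1

# Toy example: unit tetrahedron, K = 2 labels.
v = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
probs = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5]])
print(interpolate_label_probs(v, probs, np.array([0.25, 0.25, 0.25])))  # [0.5 0.5]
```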

The expression for the prior distribution over segmentations is finally:

$$p(\mathbf{l}) = \int_{\mathbf{x}} p(\mathbf{l} \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}. \tag{4}$$

2.2. Likelihood

The likelihood p(D|l) models the relationship between segmentation labels and image intensities. For this purpose, we associate a mixture of Gaussian distributions with each label (Ashburner and Friston, 2005), and assume that the bias field imaging artifact typically seen in MRI can be modeled as a multiplicative and spatially smooth effect (Wells et al., 1996a). For computational reasons, we use log-transformed image intensities in D, and model the bias field as a linear combination of spatially smooth basis functions that is added to the local voxel intensities (Van Leemput et al., 1999a).

Specifically, letting θ denote all bias field and Gaussian mixture parameters, with uniform prior p(θ) ∝ 1, the likelihood is defined by

$$p(\mathbf{D} \mid \mathbf{l}) = \int_{\boldsymbol{\theta}} p(\mathbf{D} \mid \mathbf{l}, \boldsymbol{\theta})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta}, \tag{5}$$

where

$$p(\mathbf{D} \mid \mathbf{l}, \boldsymbol{\theta}) = \prod_{i=1}^{I} p_i(\mathbf{d}_i \mid l_i, \boldsymbol{\theta}), \tag{6}$$
$$p_i(\mathbf{d} \mid k, \boldsymbol{\theta}) = \sum_{g=1}^{G_k} w_{k,g}\, \mathcal{N}(\mathbf{d} - \mathbf{C}\boldsymbol{\phi}_i \mid \boldsymbol{\mu}_{k,g}, \boldsymbol{\Sigma}_{k,g}),$$

and

$$\mathcal{N}(\mathbf{d} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{\sqrt{(2\pi)^N \lvert\boldsymbol{\Sigma}\rvert}} \exp\left(-\frac{1}{2}(\mathbf{d} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{d} - \boldsymbol{\mu})\right).$$

Here, Gk is the number of Gaussian distributions in the mixture associated with label k; and μk,g, Σk,g, and wk,g are the mean, covariance matrix, and weight of component g ∈ {1, …, Gk} in the mixture model of label k (satisfying wk,g ≥ 0 and ∑g wk,g = 1). Furthermore,

$$\mathbf{C} = \begin{pmatrix} \mathbf{c}_1^T \\ \vdots \\ \mathbf{c}_N^T \end{pmatrix}, \qquad \mathbf{c}_n = \begin{pmatrix} c_{n,1} \\ \vdots \\ c_{n,P} \end{pmatrix} \qquad \text{and} \qquad \boldsymbol{\phi}_i = \begin{pmatrix} \phi_i^1 \\ \vdots \\ \phi_i^P \end{pmatrix},$$

where $P$ denotes the number of bias field basis functions, $\phi_i^p$ is basis function $p$ evaluated at voxel $i$, and $\mathbf{c}_n$ holds the bias field coefficients for MRI contrast $n$.
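As an illustration of Eq. 6, the following Python/NumPy sketch evaluates $p_i(\mathbf{d}_i \mid k, \boldsymbol{\theta})$ for a single voxel by subtracting the bias field value $\mathbf{C}\boldsymbol{\phi}_i$ and evaluating the Gaussian mixture. All names and the toy parameter values are hypothetical:

```python
import numpy as np

def voxel_likelihood(d_i, phi_i, C, weights, means, covs):
    """p_i(d_i | k, theta): Gaussian mixture of label k evaluated at the
    bias-corrected intensities of voxel i (cf. Eq. 6).

    d_i:   (N,) log-transformed intensities of the N contrasts
    phi_i: (P,) bias field basis functions evaluated at voxel i
    C:     (N, P) bias field coefficients, one row per contrast
    weights, means, covs: the G_k mixture parameters of label k
    """
    residual = d_i - C @ phi_i                 # remove the bias field
    total, N = 0.0, d_i.size
    for w, mu, Sigma in zip(weights, means, covs):
        diff = residual - mu
        norm = np.sqrt((2.0 * np.pi) ** N * np.linalg.det(Sigma))
        total += w * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm
    return total

# Toy usage: N = 2 contrasts, P = 3 basis functions, G_k = 2 components.
d_i, phi_i, C = np.array([4.1, 3.7]), np.array([1.0, 0.2, -0.1]), np.zeros((2, 3))
w = [0.6, 0.4]
mu = [np.array([4.0, 3.5]), np.array([4.5, 4.0])]
Sig = [np.eye(2) * 0.1, np.eye(2) * 0.2]
print(voxel_likelihood(d_i, phi_i, C, w, mu, Sig))
```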

The entire forward model is summarized in Table 1.

Table 1:

Equations for the forward probabilistic model of MRI brain scans

$\mathbf{x} \sim p(\mathbf{x})$ (Eq. 2)
$\mathbf{l} \sim p(\mathbf{l} \mid \mathbf{x})$ (Eq. 3)
$\boldsymbol{\theta} \sim p(\boldsymbol{\theta}) \propto 1$
$\mathbf{D} \sim p(\mathbf{D} \mid \mathbf{l}, \boldsymbol{\theta})$ (Eq. 6)

2.3. Inference

Using the model described above, the MAP segmentation for a given MRI scan is obtained by maximizing Eq. 1 with respect to l:

$$\hat{\mathbf{l}} = \arg\max_{\mathbf{l}}\, p(\mathbf{l} \mid \mathbf{D}) = \arg\max_{\mathbf{l}}\, p(\mathbf{D} \mid \mathbf{l})\, p(\mathbf{l}), \tag{7}$$

which is intractable due to the integrals over the parameters x and θ that appear in the expressions for p(l) (Eq. 4) and p(D|l) (Eq. 5), respectively. This difficulty can be side-stepped if the posterior distribution of the model parameters in light of the data is heavily peaked around its mode:

$$p(\mathbf{x}, \boldsymbol{\theta} \mid \mathbf{D}) \approx \delta(\mathbf{x} - \hat{\mathbf{x}},\, \boldsymbol{\theta} - \hat{\boldsymbol{\theta}}),$$

where δ(·) is Dirac’s delta and the point estimates {x^, θ^} are given by:

$$\{\hat{\mathbf{x}}, \hat{\boldsymbol{\theta}}\} = \arg\max_{\{\mathbf{x}, \boldsymbol{\theta}\}}\, p(\mathbf{x}, \boldsymbol{\theta} \mid \mathbf{D}). \tag{8}$$

In that scenario, we can approximate:

$$p(\mathbf{l} \mid \mathbf{D}) = \int_{\mathbf{x}}\int_{\boldsymbol{\theta}} p(\mathbf{l} \mid \mathbf{D}, \mathbf{x}, \boldsymbol{\theta})\, p(\mathbf{x}, \boldsymbol{\theta} \mid \mathbf{D})\, d\mathbf{x}\, d\boldsymbol{\theta} \approx p(\mathbf{l} \mid \mathbf{D}, \hat{\mathbf{x}}, \hat{\boldsymbol{\theta}}), \tag{9}$$

which no longer involves intractable integrals. The resulting inference algorithm then involves two distinct phases, detailed below: first, computing the point estimates by maximizing Eq. 8; and subsequently computing the segmentation by maximizing Eq. 9 with respect to l.

Computation of point estimates.

Applying Bayes’ rule to Eq. 8, we obtain:

$$p(\mathbf{x}, \boldsymbol{\theta} \mid \mathbf{D}) \propto p(\mathbf{D} \mid \mathbf{x}, \boldsymbol{\theta})\, p(\mathbf{x})\, p(\boldsymbol{\theta}) \propto \left(\sum_{\mathbf{l}} p(\mathbf{D} \mid \mathbf{l}, \boldsymbol{\theta})\, p(\mathbf{l} \mid \mathbf{x})\right) p(\mathbf{x}) = \left[\prod_{i=1}^{I}\left(\sum_{k=1}^{K} p_i(\mathbf{d}_i \mid k, \boldsymbol{\theta})\, p_i(k \mid \mathbf{x})\right)\right] p(\mathbf{x}).$$

Taking the logarithm, we can rewrite the problem as the maximization of the following objective function:

$$\{\hat{\mathbf{x}}, \hat{\boldsymbol{\theta}}\} = \arg\max_{\{\mathbf{x}, \boldsymbol{\theta}\}} \left[\sum_{i=1}^{I} \log\left(\sum_{k=1}^{K} p_i(\mathbf{d}_i \mid k, \boldsymbol{\theta})\, p_i(k \mid \mathbf{x})\right) + \log p(\mathbf{x})\right]. \tag{10}$$

We solve this problem with a coordinate ascent scheme, in which the mesh node positions x and likelihood parameters θ are iteratively updated, by alternately optimizing one while keeping the other fixed.

To optimize the mesh node positions x with fixed θ, we use a standard conjugate gradient optimizer (Shewchuk, 1994). To optimize the likelihood parameters θ with fixed x, we use a generalized expectation-maximization (GEM) algorithm (Dempster et al., 1977) similar to the one proposed in (Van Leemput et al., 1999a). In particular, the GEM optimization involves iteratively computing the following soft assignments of each voxel to each of the Gaussian distributions, based on the current parameter estimates:

$$q_i^{k,g} = \frac{w_{k,g}\, \mathcal{N}(\mathbf{d}_i - \mathbf{C}\boldsymbol{\phi}_i \mid \boldsymbol{\mu}_{k,g}, \boldsymbol{\Sigma}_{k,g})\, p_i(k \mid \mathbf{x})}{\sum_{k'=1}^{K} p_i(\mathbf{d}_i \mid k', \boldsymbol{\theta})\, p_i(k' \mid \mathbf{x})}, \tag{11}$$

and subsequently updating the parameters accordingly:

$$\boldsymbol{\mu}_{k,g} \leftarrow \frac{\sum_{i=1}^{I} q_i^{k,g}\,(\mathbf{d}_i - \mathbf{C}\boldsymbol{\phi}_i)}{\sum_{i=1}^{I} q_i^{k,g}}, \qquad w_{k,g} \leftarrow \frac{\sum_{i=1}^{I} q_i^{k,g}}{\sum_{i=1}^{I}\sum_{g'=1}^{G_k} q_i^{k,g'}},$$
$$\boldsymbol{\Sigma}_{k,g} \leftarrow \frac{\sum_{i=1}^{I} q_i^{k,g}\,(\mathbf{d}_i - \boldsymbol{\mu}_{k,g} - \mathbf{C}\boldsymbol{\phi}_i)(\mathbf{d}_i - \boldsymbol{\mu}_{k,g} - \mathbf{C}\boldsymbol{\phi}_i)^T}{\sum_{i=1}^{I} q_i^{k,g}},$$
$$\begin{pmatrix} \mathbf{c}_1 \\ \vdots \\ \mathbf{c}_N \end{pmatrix} \leftarrow \begin{pmatrix} \mathbf{A}^T\mathbf{S}^{1,1}\mathbf{A} & \cdots & \mathbf{A}^T\mathbf{S}^{1,N}\mathbf{A} \\ \vdots & \ddots & \vdots \\ \mathbf{A}^T\mathbf{S}^{N,1}\mathbf{A} & \cdots & \mathbf{A}^T\mathbf{S}^{N,N}\mathbf{A} \end{pmatrix}^{-1} \begin{pmatrix} \mathbf{A}^T(\mathbf{S}^{1,1}\mathbf{r}^{1,1} + \cdots + \mathbf{S}^{1,N}\mathbf{r}^{1,N}) \\ \vdots \\ \mathbf{A}^T(\mathbf{S}^{N,1}\mathbf{r}^{N,1} + \cdots + \mathbf{S}^{N,N}\mathbf{r}^{N,N}) \end{pmatrix},$$

where

$$\mathbf{A} = \begin{pmatrix} \phi_1^1 & \cdots & \phi_1^P \\ \vdots & \ddots & \vdots \\ \phi_I^1 & \cdots & \phi_I^P \end{pmatrix}, \qquad \mathbf{S}^{m,n} = \mathrm{diag}\left(s_i^{m,n}\right)$$

and $\mathbf{r}^{m,n} = (r_1^{m,n}, \ldots, r_I^{m,n})^T$, with

$$s_i^{m,n} = \sum_{k=1}^{K}\sum_{g=1}^{G_k} s_{i,k,g}^{m,n}, \qquad s_{i,k,g}^{m,n} = q_i^{k,g}\,\left(\boldsymbol{\Sigma}_{k,g}^{-1}\right)_{m,n},$$
$$r_i^{m,n} = d_i^n - \frac{\sum_{k=1}^{K}\sum_{g=1}^{G_k} s_{i,k,g}^{m,n}\,(\boldsymbol{\mu}_{k,g})_n}{\sum_{k=1}^{K}\sum_{g=1}^{G_k} s_{i,k,g}^{m,n}}.$$

It can be shown that this process is guaranteed to increase the objective function of Eq. (10) with respect to θ in each GEM iteration (Dempster et al., 1977; Van Leemput et al., 1999a).
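For intuition, the following deliberately simplified Python/NumPy sketch performs one GEM iteration for a single-contrast scan, with the bias field omitted and one Gaussian per label (so the mixture weights are trivially 1 and their update is skipped). It is a toy illustration of Eq. 11 and the parameter updates above, not the paper's Matlab/C++ implementation:

```python
import numpy as np

def gem_step(d, prior, means, variances):
    """One simplified GEM iteration: single contrast, no bias field, and
    one Gaussian per label (so mixture weights are trivially 1).

    d: (I,) bias-corrected log-intensities; prior: (I, K) values p_i(k|x);
    means, variances: (K,) current Gaussian parameters per label.
    """
    # E-step (cf. Eq. 11): soft assignments q of voxels to labels.
    lik = np.exp(-0.5 * (d[:, None] - means) ** 2 / variances) \
          / np.sqrt(2.0 * np.pi * variances)                    # (I, K)
    q = lik * prior
    q /= q.sum(axis=1, keepdims=True)
    # M-step: closed-form updates that cannot decrease the objective (Eq. 10).
    Nk = q.sum(axis=0)
    means = (q * d[:, None]).sum(axis=0) / Nk
    variances = (q * (d[:, None] - means) ** 2).sum(axis=0) / Nk
    return q, means, variances

# Toy usage: two labels with a flat spatial prior.
rng = np.random.default_rng(0)
d = np.concatenate([rng.normal(-2, 1, 400), rng.normal(2, 1, 600)])
prior = np.full((1000, 2), 0.5)
q, means, variances = gem_step(d, prior, np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
```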

Computation of the final segmentation.

Given the point estimates of the model parameters, the conditional posterior distribution of the segmentation l factorizes over voxels:

$$p(\mathbf{l} \mid \mathbf{D}, \hat{\mathbf{x}}, \hat{\boldsymbol{\theta}}) = \prod_{i=1}^{I} p_i(l_i \mid \mathbf{d}_i, \hat{\mathbf{x}}, \hat{\boldsymbol{\theta}}), \qquad p_i(k \mid \mathbf{d}_i, \hat{\mathbf{x}}, \hat{\boldsymbol{\theta}}) = \sum_{g=1}^{G_k} q_i^{k,g}.$$

The optimal segmentation for each voxel is therefore given by:

$$\hat{l}_i = \arg\max_{k} \sum_{g=1}^{G_k} q_i^{k,g}.$$
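In code, given the summed responsibilities as an (I × K) matrix, this final step is a single argmax per voxel; the values below are made up for illustration:

```python
import numpy as np

# Toy posterior label probabilities for I = 4 voxels and K = 3 labels,
# i.e. the responsibilities q_i^{k,g} already summed over the components g.
q = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4],
              [0.5, 0.4, 0.1]])

l_hat = np.argmax(q, axis=1)    # MAP label per voxel: array([0, 1, 2, 0])
```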

2.4. Implementation

In practice, we have found that modeling substructures with similar intensity properties (e.g., all white matter structures) with the same Gaussian mixture model improves the robustness of the algorithm while giving faster execution times. Letting f denote a set of structures that share the same mixture model, this is accomplished by altering the GEM update equations for the Gaussian mixture parameters as follows:

$$\boldsymbol{\mu}_{k,g} \leftarrow \frac{\sum_{i=1}^{I} q_i^{f,g}\,(\mathbf{d}_i - \mathbf{C}\boldsymbol{\phi}_i)}{\sum_{i=1}^{I} q_i^{f,g}}, \quad \forall k \in f,$$
$$w_{k,g} \leftarrow \frac{\sum_{i=1}^{I} q_i^{f,g}}{\sum_{i=1}^{I}\sum_{g'=1}^{G_f} q_i^{f,g'}}, \quad \forall k \in f,$$
$$\boldsymbol{\Sigma}_{k,g} \leftarrow \frac{\sum_{i=1}^{I} q_i^{f,g}\,(\mathbf{d}_i - \boldsymbol{\mu}_{k,g} - \mathbf{C}\boldsymbol{\phi}_i)(\mathbf{d}_i - \boldsymbol{\mu}_{k,g} - \mathbf{C}\boldsymbol{\phi}_i)^T}{\sum_{i=1}^{I} q_i^{f,g}}, \quad \forall k \in f,$$

where

$$q_i^{f,g} = \sum_{k \in f} q_i^{k,g}.$$

The details of which structures share the same mixture models will be given in Section 3.3.

To initialize the algorithm, we first affinely align the atlas to the target image using the registration method described in (D’Agostino et al., 2004), which uses atlas probabilities – rather than an intensity template – to drive the registration process. After the initial registration we mask out non-brain tissues by excluding voxels that have a prior probability lower than 0.01 of belonging to any of the brain structures.
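For illustration, the masking rule could look as follows on a toy prior array in which label 0 represents non-brain tissue (a hypothetical Python/NumPy sketch, not the actual implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy prior probabilities p_i(k|x) for K labels at I voxels after the
# affine registration; label 0 stands for non-brain tissue here.
I, K = 1000, 5
prior = rng.dirichlet(np.ones(K), size=I)

# Exclude voxels whose prior probability of belonging to any of the
# brain structures (labels 1..K-1) is below 0.01.
brain_prob = prior[:, 1:].sum(axis=1)
mask = brain_prob >= 0.01          # True for voxels kept in the analysis
```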

The image intensities are then log-transformed to accommodate the additive bias field that is employed (cf. Section 2.2). For the bias field modeling, we use the lowest frequency components of the 3D discrete cosine transform (DCT) as basis functions (for the number of components see Section 3.3).
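The following sketch shows one way such a separable 3D DCT basis could be constructed (hypothetical helper functions in Python/NumPy; the DCT-II convention is assumed here, as the text does not specify the exact variant):

```python
import numpy as np

def dct_basis_1d(length, n_components):
    """Lowest-frequency 1D DCT-II basis functions, one per column."""
    x = (np.arange(length) + 0.5) / length
    return np.stack([np.cos(np.pi * p * x) for p in range(n_components)], axis=1)

def dct_basis_3d(shape, n_per_dim=5):
    """Separable 3D DCT basis: n_per_dim**3 smooth volumes (125 for 5)."""
    bx, by, bz = (dct_basis_1d(s, n_per_dim) for s in shape)
    basis = np.einsum('ip,jq,kr->ijkpqr', bx, by, bz)
    return basis.reshape(*shape, n_per_dim ** 3)   # one column per basis function

phi = dct_basis_3d((32, 32, 32))                   # shape (32, 32, 32, 125)
```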

The subsequent optimization is done at two resolution levels. In the first level, the atlas probabilities are smoothed using a Gaussian kernel with a standard deviation of 2.0 mm in order to fit large scale mesh deformations. No smoothing is used in the second level, which refines the registration on a smaller scale.

The stopping criteria for the different components of the algorithm are as follows: the likelihood parameters θ are updated until the relative change in the objective function (Eq. 10) falls under 10⁻⁵; the mesh node positions are updated until the maximum deformation across vertices falls under 10⁻³ mm; and the GEM and conjugate gradient optimizers are iteratively interleaved until the change in the objective function falls under 10⁻⁶.

The algorithm is implemented in Matlab except for the computationally demanding optimization of the mesh node positions, which is implemented in C++, and which involves computing the mesh node deformation prior p(x) (Eq. 2), the interpolated prior probabilities p(l|x) (Eq. 3) and the gradient of the objective function (Eq. 10) with respect to the mesh node positions.

3. Experiments

In this section, we first describe the brain MRI datasets used in this study (Section 3.1). Then, we outline four methods that our algorithm is benchmarked against (Section 3.2). Next, we detail how the free parameters of each method are set (Section 3.3). Finally, we describe the setups for four different experiments in which the different methods are tested (Section 3.4).

3.1. MRI data

In the experiments, we use five different sets of scans: one exclusively for training the segmentation methods, and the other four for testing the performance on unseen data. For training, we use a dataset of 39 T1-weighted MRI scans and corresponding expert segmentations. The expert segmentations were obtained using a validated semi-automated protocol developed at the Center for Morphometric Analysis (CMA), MGH, Boston (Caviness Jr et al., 1989; Caviness et al., 1996; Kennedy et al., 1989). All raters had to pass tests measuring intra- and inter-rater reliability before they were allowed to perform segmentations. The resulting training data consists of 28 healthy subjects and 11 subjects with questionable or probable Alzheimer’s disease with ages ranging from under 30 years old to over 60 years old (Sabuncu et al., 2010). The scans were acquired on a 1.5T Siemens Vision scanner using an MPRAGE sequence with parameters: TR=9.7ms, TE=4ms, TI=20ms, flip angle = 10° and voxel size = 1.0 × 1.0 × 1.5mm3 (128 sagittal slices), where the scan parameters were empirically optimized for gray-white matter contrast (Buckner et al., 2004). This is the same dataset used for training in the publicly available software package FreeSurfer (Fischl et al., 2002). An example scan and a corresponding manual segmentation are shown in Figure 1.

For testing, we use four different datasets acquired on scanners from different manufacturers, with different field strengths and pulse sequences. For three of the datasets, including a total of 35 subjects, we have access to expert manual segmentations, enabling quantitative comparisons of automated segmentation accuracy. All these manual segmentations were performed using the same protocol as was used for the training data. The fourth test dataset consists of 40 subjects scanned at two time points; it does not have expert segmentations but will be used to assess test-retest reliability instead. Below we provide details on each of these four test datasets.

The first test dataset consists of T1-weighted scans of 13 individuals with age and disease status matching those of the training dataset, acquired on a 1.5T Siemens Sonata scanner with the same sequence and parameters as the training data (Han and Fischl, 2007). Given the similarity with the training data (vendor, field strength, pulse sequence), we will refer to this dataset as the “intra-scanner dataset”. An example scan and a corresponding manual segmentation are shown in Figure 2.

Figure 2:

On the left an example slice from the intra-scanner dataset and on the right a corresponding manual segmentation.

The second test dataset consists of T1-weighted scans of 14 individuals with age and disease status matching those of the training dataset, acquired on a 1.5T GE Signa Scanner using an SPGR sequence with parameters: TR = 35 ms, TE = 5 ms, flip angle = 45° and voxel size = 0.9375 × 0.9375 × 1.5mm3 (124 coronal slices) (Han and Fischl, 2007). This dataset will be referred to as the “cross-scanner dataset”. An example scan and a corresponding manual segmentation are shown in Figure 3.

Figure 3:

On the left an example slice from the cross-scanner dataset and on the right a corresponding manual segmentation.

The third test dataset consists of multi-echo FLASH scans from 8 healthy subjects acquired on a 1.5T Siemens Sonata scanner. The acquisition parameters were: TR = 20 ms, minimum TE, flip angle = 3°, 5°, 20° and 30°, and voxel size = 1.0mm3 isotropic (Fischl et al., 2004b; Iglesias et al., 2012). The different flip angles correspond to different contrast properties, with the smallest angle having contrast similar to proton density (PD) weighting and the largest one having a contrast similar to T1-weighting. These data will be referred to as the “multi-echo dataset”. A sample slice from this dataset, with flip angles 30° and 3°, is shown in Figure 4.

Figure 4:

An example of the T1- (flip angle = 30°) and PD-weighted (flip angle = 3°) scans of the same subject from the multi-echo dataset.

The fourth and final test dataset consists of 40 healthy subjects scanned at two different time points at different facilities, with scan intervals ranging from two days to six months, amounting to a total of 80 T1- and T2-weighted scans for the whole dataset (Holmes et al., 2012). The scans were all acquired with 3T Siemens Tim Trio scanners using identical multi-echo MPRAGE sequences for the T1 and 3D T2-SPACE sequences for the T2, with voxel size = 1.2 × 1.2 × 1.2mm3. Note that the acquisition protocol was highly optimized for speed, with a total acquisition time for both scans of under 5 minutes. This dataset will be referred to as the “test-retest dataset”. One of the scans had to be excluded because of motion artifacts. Moreover, some of the T2-weighted scans have minor artifacts not present in the T1-weighted scans; these scans were nevertheless included in the experiments. Manual segmentations were not available for this dataset; however, these scans are still useful in test-retest experiments quantifying the differences between the two time points. Ideally, as all the subjects are healthy, the biological variations should be small and the segmentations at the two time points should be nearly identical. An example of the T1- and T2-weighted scans is shown in Figure 5.

Figure 5:

An example of the T1- and T2-weighted scans of the same subject from the test-retest dataset.

3.2. Benchmark methods

In order to gauge the performance of the proposed algorithm with respect to the state of the art in brain MRI segmentation, we compare its performance against four representative methods:

  • BrainFuse (Sabuncu et al., 2010) is a multi-atlas segmentation method that uses an intensity-based label fusion approach to merge a set of propagated training labelings into a final segmentation of a target scan. More specifically, it assumes a generative model in which the joint intensity-label space is modeled with a Parzen density estimator (using a logOdds-based kernel (Pohl et al., 2006a) for the labels, and a Gaussian kernel for the intensities); with an optional Markov random field prior enforcing spatial consistency. Segmentation is carried out through Bayesian inference, effectively giving more weight to atlases that have locally similar intensities to the target scan. In the publicly available implementation, the Markov random field prior is not included – however it does not yield a significant increase in segmentation accuracy (Sabuncu et al., 2010). For computing the registrations between the training and target subjects, BrainFuse employs asymmetric bidirectional registrations based on an efficient Demons-style algorithm that uses a one parameter sub-group of diffeomorphisms combined with a sum of squared intensity differences (SSD) similarity measure (Sabuncu et al., 2010). The freely available implementation of BrainFuse is optimized to work with data that has been preprocessed (skull-stripped, bias field corrected, intensity normalized and re-sampled to a 1mm3 grid) with FreeSurfer. In the experiments, we follow these preprocessing requirements. The free parameters of the registration method are set to the values reported in (Sabuncu et al., 2010), where the authors cross-validated the parameter values on the same training dataset that we use in this study.

  • PICSL MALF (Wang et al., 2013) assumes that the segmentation errors of the propagated training labelings can be correlated, as opposed to BrainFuse, in which independence of the errors of the different labelings is assumed. PICSL MALF formulates a weighted voting problem in terms of trying to minimize the expectation of the labeling error, i.e., the error between the fused labels and the true segmentation in every voxel. To achieve this, it approximates the expected pairwise joint label differences between the training scans and the target scan using intensity similarity information. The intensity similarities are computed within a patch around each voxel. The patch intensities are normalized to have zero mean and a constant norm, making the similarity measure robust against linear intensity changes, which is often enough to correct for small differences in MRI contrast. Moreover, PICSL MALF also performs a local search to find the training image voxel that is patch-wise most similar to the corresponding target image voxel. This can be interpreted as an additional refinement of the pre-computed pairwise registrations. For computing the initial pair-wise registrations between the training and target subjects, PICSL MALF uses ANTs/SyN (Avants et al., 2008), which is a diffeomorphic registration algorithm. We follow the implementation details of the PICSL MALF version that won the MICCAI 2012 Grand Challenge on Multi-Atlas Labeling (Landman and Warfield, 2012). Specifically, for computing the pair-wise registrations, we use the cross-correlation (CC) similarity metric, which adapts naturally to situations where locally varying intensities occur (Avants et al., 2008); and we set the registration parameters to the values reported in (Landman and Warfield, 2012). The authors use no specific preprocessing steps such as bias field correction; however, the ANTs/SyN registration algorithm has been shown to be robust to quite severe bias field effects when the CC similarity metric is used (Avants et al., 2008). We note that the PICSL MALF software also provides a post-processing procedure to correct systematic segmentation errors based on corrective learning (Wang et al., 2011); however, since this is an independent module that is equally applicable to the other benchmark methods, it was not used in this study.

  • FreeSurfer (Fischl et al., 2002) is based on a statistical atlas of neuroanatomy, along with an intensity atlas in which a Gaussian distribution is associated with each voxel and class. The parameters of these Gaussians are estimated in a supervised fashion from training data. The model is completed by a Markov random field model that ensures spatial smoothness of the segmentation, which is computed as the MAP estimate in a Bayesian framework. We note that FreeSurfer was trained on the same training data that we are using in this study, which makes direct comparison with our approach and the multi-atlas methods feasible.

  • Majority Voting (Rohlfing et al., 2004a; Heckemann et al., 2006) is a simple multi-atlas segmentation method, where the propagated training labelings are fused into a final segmentation by picking, in each voxel, the most frequent label across the propagated labelings. We include this method as a reference against which we can compare the performance of the more sophisticated label fusion approaches. For our implementation of majority voting, we use the same pair-wise registrations as for PICSL MALF.

These benchmark methods cover a wide spectrum of modern brain MRI segmentation algorithms. Majority voting, BrainFuse and PICSL MALF represent multi-atlas segmentation, which is arguably the most popular segmentation paradigm at the moment. Moreover, they are non-parametric methods, whereas our method and FreeSurfer represent parametric approaches.

3.3. Cross-validation experiments on training data for parameter tuning

The free parameters of the different methods are determined using the 39-subject training dataset as follows:

Proposed Algorithm.

We use 20 randomly picked subjects out of the available 39 to build our probabilistic atlas. Only 20 subjects are chosen because the atlas building process is computationally very expensive (several weeks for an atlas built from 20 subjects), and the results show that segmentation performance does not increase further when more subjects are added (see Section 4.3). The remaining 19 subjects are used to find suitable values for the free parameters in our algorithm: the global stiffness of the mesh β, the number of bias field basis functions P, the groups of structures f that share the same GMM parameters, and the number of mixture components associated with each structure group. The parameters are tuned based on a visual inspection of the automatic segmentations in the 19 training subjects. The chosen values for the mesh stiffness and number of bias field basis functions are: β = 0.1 and 5 basis functions per dimension, amounting to a total of P = 5³ = 125 basis functions in 3D. The choice of which sets of structures share the Gaussian mixture parameters, as well as the number of Gaussians for each mixture, is summarized in Table 2.

Table 2:

Details of the parameter sharing between structure classes. The groups of structures that share their Gaussian mixture parameters are shown in the first column, and the corresponding number of Gaussians in the mixture in the second column.

Structures with shared parameters | Number of Gaussians
Non-brain tissues | 3
L/R Cerebral White Matter (WM), L/R Cerebellum White Matter (CWM), Brain Stem (BS), L/R Ventral Diencephalon, Optic Chiasm | 2
L/R Cerebral Cortex (CT), L/R Cerebellum Cortex (CCT), L/R Caudate (CA), L/R Hippocampus (HP), L/R Amygdala (AM), L/R Accumbens Area | 3
L/R Lateral Ventricle (LV), L/R Inferior Lateral Ventricle, 3rd Ventricle, Cerebro-Spinal Fluid (CSF), 5th Ventricle, 4th Ventricle, Vessel, L/R Choroid Plexus | 3
L/R Thalamus (TH) | 2
L/R Putamen (PU) | 2
L/R Pallidum (PA) | 2

BrainFuse.

We use the optimal parameters listed in the original publication (Sabuncu et al., 2010); this choice is appropriate because the authors cross-validate the parameter values on the same training dataset as used in this study.

PICSL MALF.

For this method we need to determine the optimal values for the patch radius over which the intensity similarity is calculated; a constant controlling the inverse distance function which maps the intensity difference to the joint error; and the size of the local search window (Wang et al., 2013). For this purpose, we randomly select 10 subjects as test data and use the remaining 29 subjects as training data, and perform a cross-validation grid search using similarity patch radii of rp = [1, 2, 3], local search radii of rs = [0, 1, 2, 3] and inverse mapping constants of β = [0.5, 1, 1.5, 3, 6]. As a measure of goodness we use the mean Dice overlap score (which is the main performance metric used in the experiments below) over the structures listed in Section 3.4 below. The resulting optimal values are: rp = 1, rs = 2 and β = 3.
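For reference, the Dice overlap between two segmentations, the main performance metric in all the experiments below, can be computed as in this minimal Python/NumPy sketch (the label maps are made up):

```python
import numpy as np

def dice_score(seg_a, seg_b, label):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for one label in two segmentations."""
    a, b = (seg_a == label), (seg_b == label)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom > 0 else np.nan

# Toy example on two small label maps.
auto = np.array([0, 1, 1, 2, 2, 2])
manual = np.array([0, 1, 2, 2, 2, 2])
print(dice_score(auto, manual, label=2))   # 2*3 / (3 + 4) ≈ 0.857
```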

FreeSurfer.

We use the standard processing pipeline with default parameters. No cross-validation needs to be performed as FreeSurfer is trained on the same training dataset (using all 39 subjects) we use in this study.

Majority Voting.

Given the pre-computed registrations, majority voting has no parameters to tune.

3.4. Experimental setup

We perform a comprehensive evaluation consisting of four sets of experiments:

  1. In a first experiment, we use models trained on the training dataset to segment the scans from the intra-scanner and the cross-scanner datasets, comparing each method’s segmentations with the corresponding manual annotations using the Dice overlap score. This experiment enables us not only to compare the performance of the different methods, but also to assess how much their performance degrades when the image intensity properties of the training and test datasets are not matched.

  2. In a second experiment, we evaluate the computational efficiency of the various methods on the intra- and cross-scanner datasets. We compute the running time of the different algorithms on a cluster where each node has two quad-core Xeon 5472 3.0 GHz CPUs and 32GB of RAM; we only use one core in the experiments in order to make fair comparisons, even though all the algorithms can potentially be parallelized. We also record the execution time of a multi-threaded implementation of our method, using 8 cores on a computer with eight dual-core 3.4 GHz CPUs and 64GB of RAM. This setup represents a realistic scenario that enables us to compare the running time of our algorithm with those reported by other studies in the literature.

  3. In a third experiment, we study the effect of the number of training subjects on the segmentation performance. To achieve accurate segmentations, a representative training set is needed to capture all the structural variation one might see within the subjects to be segmented (Aljabar et al., 2009). However, some algorithms require less training data than others to approach their asymptotic performance, which represents a saving in manual labeling effort. We therefore randomly pick 5 sets of 5, 10 and 15 subjects from the training data, and re-evaluate the segmentation performance of the proposed method, BrainFuse, PICSL MALF and majority voting on the intra- and cross-scanner datasets.

  4. In a final experiment, we evaluate the ability of the proposed algorithm to segment non-T1-contrast and multi-contrast MR scans using the multi-echo and the test-retest datasets. Given a training set consisting only of T1-weighted scans, using multi-contrast or non-T1-contrast information is out of reach for the four specific benchmark methods we compare against in this article, although we note that several multi-atlas label fusion techniques exist that could potentially be used in this context (cf. discussion in Section 5). For the multi-echo dataset we first run the proposed method using only the T1-weighted images (i.e., flip angle 30°), then only the PD-weighted images (i.e., flip angle 3°), and finally using both the T1- and PD-weighted images simultaneously. The resulting automated segmentations are then compared to the expert segmentations using Dice scores. For the test-retest dataset, we first segment the two time points using only the T1-weighted images, and subsequently using both T1- and T2-weighted images together. Because no manual segmentations are available for this dataset, we use absolute symmetrized percent change (ASPC) (Reuter et al., 2012) to quantify the differences in the automatic segmentations between the two time points. This metric is defined as the absolute value of the difference in volume, normalized by the mean volume:
    $$\mathrm{ASPC} = \frac{2\,\lvert V_2 - V_1 \rvert}{V_1 + V_2},$$
    where $V_1$, $V_2$ are the volumes at the two time points. Ideally this number should be small, as the subjects are all healthy and the intervals between the scans are short. A minimal computation of this metric is sketched after this list.
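The ASPC metric is straightforward to compute; a minimal Python sketch with made-up volumes:

```python
def aspc(v1, v2):
    """Absolute symmetrized percent change (Reuter et al., 2012)."""
    return 2.0 * abs(v2 - v1) / (v1 + v2)

# Example with made-up hippocampal volumes (mm^3) at two time points:
print(aspc(4200.0, 4130.0))   # ~0.0168, i.e. about a 1.7% volume change
```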

We report the Dice scores and the ASPC on a representative subset of 23 relevant structures that is also used in other studies (e.g., (Fischl et al., 2002; Sabuncu et al., 2010)): left and right cerebral white matter (WM), cerebellum white matter (CWM), cerebral cortex (CT), cerebellum cortex (CCT), lateral ventricle (LV), hippocampus (HP), thalamus (TH), putamen (PU), pallidum (PA), caudate (CA), amygdala (AM) and brain stem (BS). We will refer to these structures as the “regions of interest” (ROIs); note that for clarity of presentation we report the average Dice score of the left and right hemisphere for all structures except for the brain stem.

4. Results

4.1. Intra-scanner and cross-scanner segmentation performance

The Dice scores between the manual and automated segmentations of the ROIs, obtained using the different methods, are shown for the intra-scanner dataset in Figure 6 (top). Table 3 (first column) summarizes the scores averaged over the ROIs and subjects, and reports statistically significant differences between the methods. The significance testing was done using paired, two-sided t-tests, by stacking the individual Dice scores in each ROI and subject for a given method. Corresponding scores and significant differences for each ROI separately are reported in Supplementary Material, Table 1. All of the methods perform well on the intra-scanner dataset, which was expected, as the contrast properties of the training data are identical to those of this dataset. The multi-atlas segmentation methods achieve the highest mean scores, with PICSL MALF being the best method for this dataset. Majority voting also obtains a very high mean score despite its simple fusion strategy. This is likely due to the accurate ANTs/SyN registration framework, which has been shown to perform very well on intra-scanner data (Klein et al., 2009). We note that each of the benchmark methods is specifically trained for this type of data, whereas the proposed method is not.

Figure 6:

The Dice scores of the different methods for the intra-scanner (top) and cross-scanner (bottom) data. The proposed method = green, BrainFuse = blue, PICSL MALF = magenta, FreeSurfer = red and Majority Voting=black. Additional results, obtained by preprocessing the input data using the FreeSurfer pipeline, are also shown (filled boxes with broken lines). On each box, the central horizontal line is the median, the circle is the mean, and the edges of the box are the 25th and 75th percentiles. Data points falling outside of the range covered by scaling the box four times are considered outliers, and are plotted individually. The whiskers extend to the most extreme data points that are not considered outliers. See Section 3.4 for the acronyms.

Table 3:

Mean Dice scores of the different methods over the ROIs for the intra-scanner (first column) and cross-scanner (third column) datasets. Additional results, obtained by preprocessing the input data using the FreeSurfer pipeline, are also shown (in italics, second and fourth columns). The superscript lists the methods that obtain significantly lower scores compared to a given method. The significance was tested using a paired two-sided t-test with a 5% significance level.

Method | Intra-scanner, Native | Intra-scanner, Preprocessed | Cross-scanner, Native | Cross-scanner, Preprocessed
Proposed (P) | 0.863 ^FS | 0.865 ^FS | 0.807 ^BF,PM,FS,MV | 0.806 ^BF,FS
BrainFuse (BF) | 0.868 ^FS | 0.868 ^FS | 0.744 ^MV | 0.744
PICSL MALF (PM) | 0.895 ^P,BF,FS,MV | 0.895 ^P,BF,FS,MV | 0.760 ^BF,MV | 0.822 ^P,BF,FS,MV
FreeSurfer (FS) | 0.853 | 0.853 | 0.799 ^BF,PM,MV | 0.799 ^BF
Majority Voting (MV) | 0.883 ^P,BF,FS | 0.885 ^P,BF,FS | 0.698 | 0.808 ^BF,FS

For the cross-scanner data, where the contrast properties of the target data are different from the training data, the ROI Dice scores are shown in Figure 6 (bottom) and the mean scores over the ROIs and subjects in Table 3 (third column). Corresponding scores and significant differences for each ROI separately are reported in Supplementary Material, Table 2. Compared to the intra-scanner dataset, the overall segmentation accuracy of all methods decreases, which is likely due to the lower intrinsic image contrast of the SPGR pulse sequence as noted in (Han and Fischl, 2007) and as also visible from Figure 3. In this dataset, the proposed method achieves the highest mean score, demonstrating its robustness against changes in contrast. Although FreeSurfer explicitly encodes the contrast properties of the training scans, it performs relatively well on this un-matched data; this can be explained by its in-built renormalization procedure for T1 acquisitions, which applies a multi-linear atlas-image registration and a histogram matching step to update the class-conditional densities for each structure (Han and Fischl, 2007). In contrast, the label fusion methods, which directly rely on the image intensities of the training data in their registration and fusion steps, are clearly affected by the changes in the MRI contrast. The pair-wise registrations in particular are more challenging for this dataset, leading to misregistrations that are the principal error source in multi-atlas segmentation.

The segmentation accuracy of BrainFuse, which uses FreeSurfer-preprocessed images, varies between different structures. In general it performs well on some of the larger structures (LV, BS), whereas performance degrades on some of the smaller structures (AM, HP, PA). This is likely explained by the choice of registration algorithm and especially the SSD similarity measure, which is not invariant against small intensity changes. Although the PICSL MALF and majority voting methods use the more robust CC similarity measure in the ANTs/SyN registration framework, there are some subjects in the cross-scanner dataset for which computing the registrations without preprocessing is very difficult, resulting in the segmentation outliers shown in Figure 6. We note that although majority voting does not rely on intensity information when fusing the labels, its usage of CC in the registration step indirectly assumes that the training and target scans have similar properties (linear local intensity transformation). Compared to PICSL MALF, its simple fusion rule makes majority voting much more dependent on the quality of the pair-wise registrations, as the effect of poorly registered subjects cannot be downplayed.

In order to further analyze the relative performance of the various methods without the influence of outlier subjects in majority voting and PICSL MALF, we performed an additional, post hoc analysis with the explicit aim of avoiding ANTs/SyN registration failures in the cross-scanner data. For this purpose, we re-ran majority voting and PICSL MALF, as well as the proposed method, on data that had been preprocessed with the FreeSurfer pipeline (which includes skull-stripping, bias field correction, intensity normalization, and re-sampling to a 1mm3 grid), although we note that this preprocessing is not part of the default implementations of these algorithms.

The resulting average Dice scores for the intra- and the cross-scanner data are shown (in italics) in the second and the fourth column of Table 3 – note that BrainFuse already depends on FreeSurfer so that all five methods effectively use the same preprocessing pipeline in this scenario. The Dice scores obtained this way for each ROI individually are also displayed in Figure 6. Comparing the results obtained with and without FreeSurfer preprocessing, it can be seen that the additional preprocessing effectively avoids ANTs/SyN registration failures in the cross-scanner data, resulting in a strong performance for majority voting with only a relatively minor improvement for the more advanced label fusion of PICSL MALF, which obtains the strongest overall segmentation accuracy. Unlike in the intra-scanner data, however, majority voting no longer outperforms the proposed method in the cross-scanner data even though all its pair-wise registrations are successful. It can also be seen that the proposed method does not benefit from FreeSurfer preprocessing in either the intra-scanner or the cross-scanner data, an indirect demonstration of its intrinsic bias field correction and skull stripping performance.

Finally, Tables 3 and 4 of the Supplementary Material list the Dice scores and significant differences for each ROI separately for the intra- and cross-scanner data preprocessed with FreeSurfer. It can be seen that, although after preprocessing PICSL MALF outperforms other techniques in seven structures on the cross-scanner data, the proposed technique remains the best method for four other structures, especially those in cortical areas (WM, CT, CWM).

A limitation of the comparisons presented in this Section is that the proposed method uses an atlas built from 20 randomly selected subjects, potentially introducing a bias when comparing to benchmark methods that use all 39 subjects without selection as training set. However, as shown in Section 4.3, PICSL MALF, BrainFuse and majority voting all benefit from using all the available training subjects compared to random subsets of various sizes, whereas the proposed method saturates around 10 subjects with very little further gains from larger training sets.

4.2. Running time

The approximate mean computation time for a single scan using the different methods is shown in Table 4. The proposed method is approximately 7 times faster than FreeSurfer, 12 times faster than BrainFuse and 100 times faster than PICSL MALF and majority voting.

Table 4:

Mean computational time for the different methods (single core). For label fusion methods the computation times for registration (Reg.) and label fusion (Fusion) are listed separately.

Mean time per subject (single core)
Method | Reg. | Fusion | Full time
BrainFuse | 16h | 1h | 17h
Majority voting | 143.9h | 0.1h | 144h
PICSL MALF | 143.9h | 3.8h | 147.7h
FreeSurfer | - | - | 9.5h
Proposed | - | - | 1.4h

In general, the parametric methods (i.e., FreeSurfer and the proposed method) are significantly faster than the label fusion approaches. This is because only a single non-linear registration is needed, as opposed to the multiple pair-wise registrations used in the non-parametric methods. Moreover, in PICSL MALF the local search is especially time consuming with large search windows. Compared with FreeSurfer, which is also parametric, our method is faster due to the sparse encoding of the mesh prior. Encoding this sparsity is computationally expensive, but needs to be done only once (in an offline fashion). Furthermore, in the proposed approach, no special pre- or postprocessing of the target scans is needed.

In its multi-threaded setup, the proposed method has an execution time of 23.5 minutes per scan on average. The fastest whole-brain segmentation method to our knowledge is presented in (Zikic et al., 2014) with execution times in the range of 5 to 13 minutes; however this method is not designed to handle image contrast differences.

4.3. Effect of the number of training subjects

Figure 7 shows the effect on each method’s Dice scores, averaged across all ROIs, of training on randomly selected subsets of the entire training pool, both for the intra-scanner and the cross-scanner datasets. In order to compare the different methods’ performance without the influence of gross registration failures in majority voting and PICSL MALF, results obtained after preprocessing the data with FreeSurfer are also provided. The figure shows that adding more training subjects generally yields more accurate segmentations for all methods, but that the proposed method reaches its maximum performance faster than the multi-atlas methods: already with 10 training subjects, the segmentation accuracy of the proposed method is above 99% of its maximal performance in all experiments, regardless of the specific subjects included in the training set. This is especially useful for populations where expert segmentations are expensive or difficult to obtain, such as infants. The relative insensitivity of the proposed method to the specific subjects included in the training set is likely due to the atlas construction process, which explicitly avoids over-fitting to the training data (Van Leemput, 2009), yielding sparser tetrahedral meshes (and therefore blurrier probabilistic atlases) when fewer training subjects are available. This effect is illustrated in Table 5, where the average number of mesh vertices for the 5, 10 and 15 training subject groups are reported.

Figure 7:

Mean Dice scores over the ROIs for the intra-scanner (left) and the cross-scanner (right) data when the different methods are trained using randomly picked subsets of only 5, 10 and 15 training subjects. The error bars correspond to the lowest and highest obtained mean Dice score across the random subsets. The score obtained when all subjects in the training pool are used is also shown for reference (fourth bar of each method). The proposed method (P) is shown in green, BrainFuse (BF) in blue, PICSL MALF (PM) in magenta and majority voting (MV) in black. Additional results, obtained by preprocessing the input data using the FreeSurfer pipeline, are also shown (filled bars with broken lines).

Table 5:

Average number of vertices in the proposed atlas mesh for different numbers of training subjects.

Number of subjects Average number of vertices
5 33,606
10 44,614
15 51,258

The effect of FreeSurfer preprocessing appears to be minimal for the proposed method across the different training set sizes, showing a similar performance in both the intra-scanner and the cross-scanner data compared to when no preprocessing is applied. In contrast, preprocessing is crucial for both majority voting and PICSL MALF in the cross-scanner setting, as ANTs/SyN registration failures otherwise severely compromise segmentation performance. Compared to the other multi-atlas methods working on the same (i.e., preprocessed) data, as well as the proposed method (with or without preprocessing), BrainFuse appears to be much more sensitive to small training datasets, both in terms of the average Dice scores that it obtains as well as its sensitivity to the specific random subjects that are used for training.

4.4. Multi-contrast performance

Figure 8 shows the Dice scores of the proposed method on the multi-echo dataset for various combinations of single-contrast (T1-weighted only or PD-weighted only) and multi-contrast (T1- and PD-weighted simultaneously) input data. The results are very similar for T1-weighted only and multi-contrast input data, whereas using the PD-weighted contrast alone often yields reduced performance. This indicates that the PD-weighted contrast adds little useful information to the T1-weighted scan when healthy brains are segmented. Example segmentations of the multi-echo dataset using T1-weighted only and multi-contrast scans are shown in Figure 9.
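To make concrete how a Gaussian appearance model accommodates such multi-contrast input, the following is a minimal sketch in which co-registered contrasts are stacked into one feature vector per voxel and evaluated under a multivariate Gaussian. The class parameters shown are invented placeholders; in the model validated here they are re-estimated from each target scan rather than fixed in advance.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Stack co-registered T1- and PD-weighted volumes into one feature vector per
# voxel; a single-contrast model is simply the one-dimensional special case.
t1 = np.random.rand(64, 64, 64)  # placeholder intensities
pd = np.random.rand(64, 64, 64)
features = np.stack([t1.ravel(), pd.ravel()], axis=1)  # shape (n_voxels, 2)

# Invented white-matter mean and covariance; re-estimating such parameters
# from the image itself is what makes the segmentation contrast-adaptive.
mu_wm = np.array([0.8, 0.4])
cov_wm = np.array([[0.01, 0.002],
                   [0.002, 0.01]])

# Class-conditional likelihood of every voxel under the white-matter Gaussian.
lik_wm = multivariate_normal.pdf(features, mean=mu_wm, cov=cov_wm)
```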

Figure 8:

Dice scores for the multi-echo dataset. Performance on T1-weighted data is shown in dark green, on PD-weighted data in orange, and on multi-contrast input data in light green. The box plots are drawn in the same way as explained in Figure 6.

Figure 9:

Top row: target scans, T1-weighted on the left and PD-weighted on the right. Bottom row: automatic segmentation using only the T1-weighted scan on the left, automatic segmentation using both scans on the right.

The volume differences between the two time points in the 39 subjects of the T1/T2 test-retest dataset are shown in Figure 10. In general, the differences are small and similar for single-contrast (T1 only) and multi-contrast (both T1 and T2) segmentations, with median ASPC values in the 1–2% range. There are some larger differences – especially in the thalamus and pallidum – when using multi-contrast data. This appears to be mostly due to imaging artifacts in the T2 scans, an example of which is shown in Figure 11. We note that this dataset has the lowest resolution of all the datasets we tested the method on, and is therefore the most affected by partial volume segmentation errors.
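For reference, a minimal sketch of the volume comparison follows, assuming ASPC denotes the absolute symmetrized percent change between the two time-point volume estimates, as is standard in longitudinal studies (cf. Reuter et al., 2012); the structure volumes below are invented for illustration.

```python
def aspc(v1, v2):
    """Absolute symmetrized percent change between two volume estimates."""
    return 200.0 * abs(v1 - v2) / (v1 + v2)

# Invented example: one structure's volumes (mm^3) at the two time points.
print(f"ASPC: {aspc(7850.0, 7940.0):.2f}%")  # -> ASPC: 1.14%
```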

Figure 10:

The ASPC scores for the test-retest dataset. Volume differences between the time points are shown in light green for multi-contrast input data and in dark green for T1-weighted data only. The box plots are drawn in the same way as explained in Figure 6. The outlier marked by an arrow is the one shown in Figure 11.

Figure 11:

An example of the outlier subject marked by the arrow in Figure 10. Top row, from left to right: a T1-weighted scan with no visible artifacts; a T2-weighted scan with a line-like artifact in the pallidum and thalamus area, marked by red arrows; and an automated segmentation of the pallidum and thalamus showing the segmentation error caused by the artifact. The bottom row shows zoomed views of the affected area, highlighting vertical lines in the T2 scan that cause jagged borders in the automatic segmentation, resulting in a poor ASPC score for this subject.

In order to put the ASPC test-retest results of Figure 10 in perspective, we also report the ASPC scores of the benchmark methods when applied to the T1-weighted scans of the two time points. Because of the heavy computational burden of some of the methods (e.g., PICSL MALF occupies a CPU core for more than six days per scan, cf. Table 4), we only report results on 10 randomly chosen subjects (20 scans in total) out of the available 39. The benchmark methods’ ASPC scores are shown in Figure 12, along with those obtained with the proposed method on the same subjects (both T1-only and multi-contrast). The figure shows that the proposed method, PICSL MALF and majority voting perform most reliably across the time points, while BrainFuse and FreeSurfer show more variance in their segmentations. As discussed before, the weaker performance of BrainFuse compared to the other label fusion methods is likely due to a combination of the chosen registration framework and the sub-optimal similarity measure used for the registrations; the reasons for FreeSurfer’s weaker performance are not immediately clear. On the selected 10 subjects we did not observe problems with the pair-wise ANTs/SyN registrations, leading to robust performance of the PICSL MALF and majority voting methods in this experiment.

Figure 12:

The ASPC scores of the different methods for 10 randomly chosen subjects from the test-retest dataset. The performance of the proposed method is shown in dark green when using only T1-weighted data and in light green when using both T1- and T2-weighted scans; BrainFuse is shown in blue, PICSL MALF in magenta, FreeSurfer in red and majority voting in black. The box plots are drawn in the same way as explained in Figure 6.

5. Discussion and conclusion

In this paper we have validated a whole-brain segmentation method that builds upon the parametric, unsupervised intensity clustering models commonly used in tissue classification. We have demonstrated that this type of model is capable of achieving state-of-the-art segmentation performance while being very fast, adaptive to changes in tissue contrast, and able to handle multi-contrast data. We emphasize that the exact same algorithm was used for all datasets in this paper, without any parameter retuning or configuration changes, demonstrating the robustness of the approach.

Our experiments indicate that, in the general cross-scanner scenario, the proposed method yields robust segmentation performance on par with the very best competitors, while being orders of magnitude faster and not requiring any form of preprocessing. The method is outperformed in accuracy only when the image intensities of the training and test data are perfectly matched; however, we believe this scenario will seldom occur in practice, because manual whole-brain segmentation is so time-consuming (e.g., taking hundreds of days for the training data used in this paper) that the available training data will rarely be acquired on the exact same imaging system as the images being segmented.

Since the method we have validated here combines Gaussian mixture modeling with MRI bias field correction and probabilistic atlas deformation, it is closely related to the unified segmentation framework described in (Ashburner and Friston, 2005); however, only basic tissue classification on T1-weighted images was attempted in that work. A related method based on fuzzy c-means clustering and a topological atlas was described in (Bazin and Pham, 2008), but it only segmented a handful of structures and relied on pre-defined centroid initializations for each type of MRI sequence the method was expected to encounter.
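To illustrate the core of this family of parametric models, the following is a minimal sketch of one EM iteration for a Gaussian mixture with spatially varying atlas priors. The bias field and atlas deformation updates of the full model are omitted, and the function is our own illustrative construction rather than the authors’ implementation.

```python
import numpy as np

def em_step(intensities, priors, means, variances):
    """One EM iteration for a Gaussian mixture with voxelwise atlas priors.

    intensities: (n_voxels,) bias-corrected scalar intensities
    priors:      (n_voxels, n_classes) deformed atlas probabilities
    means, variances: (n_classes,) current Gaussian parameters
    """
    # E-step: posterior class probabilities at each voxel (Bayes' rule).
    lik = np.exp(-0.5 * (intensities[:, None] - means) ** 2 / variances)
    lik /= np.sqrt(2.0 * np.pi * variances)
    post = priors * lik
    post /= post.sum(axis=1, keepdims=True)
    # M-step: re-estimate Gaussian parameters from the soft assignments;
    # this re-estimation is what adapts the model to the contrast at hand.
    w = post.sum(axis=0)
    means = (post * intensities[:, None]).sum(axis=0) / w
    variances = (post * (intensities[:, None] - means) ** 2).sum(axis=0) / w
    return post, means, variances
```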

An early attempt at whole-brain segmentation using a deformable probabilistic atlas combined with unsupervised intensity clustering was described in (Babalola et al., 2009); however, the atlas registration was performed independently of the segmentation process, using relatively coarse deformations, and the resulting segmentation performance was found to trail that of label fusion methods. Subsequent methods showing better performance (Ledig et al., 2012a, 2015; Makropoulos et al., 2014; Iglesias et al., 2013b; Tang et al., 2013) have used the non-parametric paradigm instead, where a probabilistic atlas is computed in the space of the target scan, i.e., after warping each of the training scans onto the target image using pairwise registration. We note that well-known majority voting methods (Rohlfing et al., 2004a; Heckemann et al., 2006) using mutual information (Maes et al., 1997; Wells et al., 1996b; Studholme et al., 1999) as the registration criterion also implicitly combine the non-parametric paradigm (multi-atlas label fusion) with unsupervised intensity clustering, since mutual information-based registration can be understood as jointly estimating registration parameters and class-conditional densities (Roche et al., 2000). In general, however, such non-parametric approaches are computationally much more expensive than the parametric method we evaluated here.
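As a brief illustration of the last point, mutual information can be estimated from a joint intensity histogram, which acts as a non-parametric estimate of the class-conditional densities that the registration implicitly fits. This is a generic sketch, not the registration code used by any of the benchmarked methods.

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Mutual information between two images from their joint histogram."""
    pxy, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy /= pxy.sum()                     # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)  # marginal of the first image
    py = pxy.sum(axis=0, keepdims=True)  # marginal of the second image
    nz = pxy > 0                         # avoid log(0) on empty bins
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```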

In the current paper, we only analyzed images of healthy subjects, and our experiments on multi-contrast images showed no benefit in terms of segmentation accuracy compared to when only T1-weighted scans are used. However, the ability to seamlessly handle multi-contrast data becomes essential when analyzing diseased populations, since many brain lesions are much better visualized in T2-weighted and FLAIR scans than in T1-weighted contrast. In future work we will therefore include models of pathologies in the proposed framework, enabling simultaneous whole-brain segmentation and pathology detection (Puonti and Van Leemput, 2016).

The proposed method has been evaluated on a set of structures in which the cerebral cortex was considered a single structure, without attempting to further parcellate it into neuroanatomical subregions. However, we note that the volumetric white matter segmentations generated by the method can be used to build and label cortical surface models using FreeSurfer (Dale et al., 1999; Fischl et al., 2004a). Exploring this direction remains future work.

The segmentation software used in this paper, including the source code, the sparse probabilistic atlases and the code to build such atlases from training data, will be made publicly available.

Supplementary Material


6. Acknowledgments

This research was supported by the NIH NCRR (P41-RR14075, 1S10RR023043), NIBIB (R01EB013565), the Lundbeck foundation and financial contributions from the Technical University of Denmark. JEI acknowledges financial support from the Gipuzkoako Foru Aldundia (Fellows Gipuzkoa Program), as well as from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 654911.

Footnotes


1

Note that the distinction between parametric vs. non-parametric methods here only refers to the overall segmentation approach that is taken – the pair-wise registrations in non-parametric segmentation methods can still be either parametric (e.g., B-splines, Rueckert et al., 1999) or non-parametric (e.g., Demons, Thirion, 1998).

6

Dice = 2|l_A ∩ l_M| / (|l_A| + |l_M|), where l_A and l_M are the automatic and manual segmentations, respectively, and | · | denotes the cardinality of a set.
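A minimal sketch of this overlap measure for one structure, assuming integer-valued label maps (the function name and arguments are our own illustrative choices):

```python
import numpy as np

def dice(auto_seg, manual_seg, label):
    """Dice overlap for one structure between automatic and manual label maps."""
    a = auto_seg == label    # binary mask of the structure in the automatic map
    m = manual_seg == label  # binary mask of the structure in the manual map
    return 2.0 * np.logical_and(a, m).sum() / (a.sum() + m.sum())
```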

References

  1. Aljabar P, Heckemann AR, Hammers A, Hajnal VJ, Rueckert D, 2009. Multi-atlas based segmentation of brain images: Atlas selection and its effect on accuracy. NeuroImage 46, 726–738. [DOI] [PubMed] [Google Scholar]
  2. Artaechevarria X, Muñoz-Barrutia A, Ortiz-de Solórzano C, 2009. Combination strategies in multi-atlas image segmentation: Application to brain MR data. IEEE Transactions on Medical Imaging 28, 1266–1277. [DOI] [PubMed] [Google Scholar]
  3. Ashburner J, Andersson RLJ, Friston JK, 2000. Image registration using a symmetric prior–in three dimensions. Human Brain Mapping 9, 212–225. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Ashburner J, Friston JK, 1997. Multimodal image coregistration and partitioning – a unified framework. NeuroImage 6, 209–217. [DOI] [PubMed] [Google Scholar]
  5. Ashburner J, Friston JK, 2005. Unified segmentation. NeuroImage 26, 839–851. [DOI] [PubMed] [Google Scholar]
  6. Asman JA, Landman AB, 2013. Non-local statistical label fusion for multi-atlas segmentation. Medical Image Analysis 17, 194–208. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Avants BB, Epstein CL, Grossman M, Gee JC, 2008. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis 12, 26–41. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Awate SP, Tasdizen T, Whitaker RT, Foster N, 2006. Adaptive Markov modeling for mutual-information-based, unsupervised MRI brain-tissue classification. Medical Image Analysis 10, 726–739. [DOI] [PubMed] [Google Scholar]
  9. Babalola KO, Patenaude B, Aljabar P, Schnabel J, Kennedy D, Crum W, Smith S, Cootes T, Jenkinson M, Rueckert D, 2009. An evaluation of four automatic methods of segmenting the subcortical structures in the brain. NeuroImage 47, 1435–1447. [DOI] [PubMed] [Google Scholar]
  10. Bazin PL, Pham DL, 2008. Homeomorphic brain image segmentation with topological and statistical atlases. Medical Image Analysis 12, 616–625. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Buckner R, Head D, Parker J, Fotenos A, Marcus D, Morris J, Snyder A, 2004. A unified approach for morphometric and functional data analysis in young, old, and demented adults using automated atlas-based head size normalization: Reliability and validation against manual measurement of total intracranial volume. NeuroImage 23, 724–738. [DOI] [PubMed] [Google Scholar]
  12. Caviness VS, Meyer J, Makris N, Kennedy DN, 1996. MRI-Based Topographic Parcellation of Human Neocortex: An Anatomically Specified Method with Estimate of Reliability. Journal of Cognitive Neuroscience 8, 566–587. [DOI] [PubMed] [Google Scholar]
  13. Caviness V Jr, Filipek P, Kennedy D, 1989. Magnetic resonance technology in human brain science: blueprint for a program based upon morphometry. Brain & Development 11, 1–13. [DOI] [PubMed] [Google Scholar]
  14. Cootes FT, Edwards JG, Taylor JC, 1998. Active appearance models. Proceedings of the 5th European Conference on Computer Vision-Volume II, 484–498. [Google Scholar]
  15. Coupé P, Manjón VJ, Fonov V, Pruessner J, Robles M, Collins LD, 2011. Patch-based segmentation using expert priors: Application to hippocampus and ventricle segmentation. NeuroImage 54, 940–954. [DOI] [PubMed] [Google Scholar]
  16. D’Agostino E, Maes F, Vandermeulen D, Suetens P, 2004. Non-rigid atlas-to-image registration by minimization of class-conditional image entropy. Medical Image Computing and Computer-Assisted Intervention–MICCAI 2004, 745–753. [Google Scholar]
  17. D’Agostino E, Maes F, Vandermeulen D, Suetens P, 2006. A unified framework for atlas based brain image segmentation and registration, in: Biomedical Image Registration. volume 4057, pp. 136–143. [Google Scholar]
  18. Dale MA, Fischl B, Sereno IM, 1999. Cortical surface-based analysis I: Segmentation and surface reconstruction. NeuroImage 9, 179–194. [DOI] [PubMed] [Google Scholar]
  19. Dempster AP, Laird NM, Rubin DB, 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal Of The Royal Statistical Society, Series B 39, 1–38. [Google Scholar]
  20. Fischl B, van der Kouwe A, Destrieux C, Halgren E, Ségonne F, Salat HD, Busa E, Seidman JL, Goldstein J, Kennedy D, Caviness V, Makris N, Rosen B, Dale MA, 2004a. Automatically parcellating the human cerebral cortex. Cerebral Cortex 14, 11–22. [DOI] [PubMed] [Google Scholar]
  21. Fischl B, Salat DH, van der Kouwe AJW, Makris N, Ségonne F, Quinn BT, Dale AM, 2004b. Sequence-independent segmentation of magnetic resonance images. NeuroImage 23, S69–S84. [DOI] [PubMed] [Google Scholar]
  22. Fischl B, Salat HD, Busa E, Albert M, Dieterich M, Haselgrove C, van der Kouwe A, Killiany R, Kennedy D, Klaveness S, Montillo A, Makris N, Rosen B, Dale MA, 2002. Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain. Neuron 33, 341–355. [DOI] [PubMed] [Google Scholar]
  23. Han X, Fischl B, 2007. Atlas renormalization for improved brain MR image segmentation across scanner platforms. IEEE Transactions on Medical Imaging 26, 479–486. [DOI] [PubMed] [Google Scholar]
  24. Heckemann RA, Hajnal JV, Aljabar P, Rueckert D, Hammers A, 2006. Automatic anatomical brain MRI segmentation combining label propagation and decision fusion. NeuroImage 33, 115–126. [DOI] [PubMed] [Google Scholar]
  25. Holmes AJ, Lee PH, Hollinshead MO, Bakst L, Roffman JL, Smoller JW, Buckner RL, 2012. Individual differences in amygdala-medial prefrontal anatomy link negative affect, impaired social functioning, and polygenic depression risk. The Journal of Neuroscience 32, 18087–18100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Iglesias JE, Augustinack JC, Nguyen K, Player CM, Player A, Wright M, Roy N, Frosch MP, McKee AC, Wald LL, Fischl B, Van Leemput K, 2015a. A computational atlas of the hippocampal formation using ex vivo, ultra-high resolution MRI: Application to adaptive segmentation of in vivo MRI. NeuroImage 115, 117–137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Iglesias JE, Konukoglu E, Zikic D, Glocker B, Van Leemput K, Fischl B, 2013a. Is synthesizing MRI contrast useful for intermodality analysis?, in: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013, pp. 631–638. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Iglesias JE, Sabuncu MR, 2015. Multi-atlas segmentation of biomedical images: A survey. Medical Image Analysis 24, 205–219. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Iglesias JE, Sabuncu RM, Van Leemput K, 2012. A generative model for multi-atlas segmentation across modalities, in: 9th IEEE International Symposium on Biomedical Imaging (ISBI), 2012, pp. 888–891. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Iglesias JE, Sabuncu RM, Van Leemput K, 2013b. A unified framework for cross-modality multi-atlas segmentation of brain MRI. Medical Image Analysis 17, 1181–1191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Iglesias JE, Van Leemput K, Bhatt P, Casillas C, Dutt S, Schuff N, Truran-Sacrey D, Boxer A, Fischl B, 2015b. Bayesian segmentation of brainstem structures in MRI. NeuroImage 113, 184–195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Isgum I, Staring M, Rutten A, Prokop M, Viergever MA, van Ginneken B, 2009. Multi-atlas-based segmentation with local decision fusion – application to cardiac and aortic segmentation in CT scans. IEEE Transactions on Medical Imaging 28, 1000–1010. [DOI] [PubMed] [Google Scholar]
  33. Kelemen A, Székely G, Gerig G, 1998. Three-dimensional model-based segmentation of brain MRI, in: Workshop on Biomedical Image Analysis, pp. 4–13. [Google Scholar]
  34. Kennedy D, Filipek P, Caviness V, 1989. Anatomic segmentation and volumetric calculations in nuclear magnetic resonance imaging. IEEE Transactions on Medical Imaging 8, 1–7. [DOI] [PubMed] [Google Scholar]
  35. Klein A, Andersson J, Ardekani AB, Ashburner J, Avants B, Chiang MC, Christensen EG, Collins LD, Gee J, Hellier P, Song HJ, Jenkinson M, Lepage C, Rueckert D, Thompson P, Vercauteren T, Woods PR, Mann JJ, Parsey VR, 2009. Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration. NeuroImage 46, 786–802. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Landman AB, Warfield KS, 2012. MICCAI 2012 workshop on multi-atlas labeling, in: 15th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2012), pp. 91–95. [Google Scholar]
  37. Ledig C, Heckemann AR, Aljabar P, Wolz R, Hajnal VJ, Hammers A, Rueckert D, 2012a. Segmentation of MRI brain scans using MALP-EM, in: MICCAI 2012 Grand Challenge and Workshop on Multi-Atlas Labeling, pp. 79–82. [Google Scholar]
  38. Ledig C, Heckemann RA, Hammers A, Lopez JC, Newcombe VFJ, Makropoulos A, Lötjönen J, Menon DK, Rueckert D, 2015. Robust whole-brain segmentation: Application to traumatic brain injury. Medical Image Analysis 21, 40–58. [DOI] [PubMed] [Google Scholar]
  39. Ledig C, Wolz R, Aljabar P, Lötjönen J, Heckemann RA, Hammers A, Rueckert D, 2012b. Multi-class brain segmentation using atlas propagation and EM-based refinement, in: 9th IEEE International Symposium on Biomedical Imaging (ISBI), 2012, pp. 896–899. [Google Scholar]
  40. Maes F, Collignon A, Vandermeulen D, Marchal G, Suetens P, 1997. Multimodality image registration by maximization of mutual information. IEEE Transactions on Medical Imaging 16, 187–198. [DOI] [PubMed] [Google Scholar]
  41. Makropoulos A, Gousias IS, Ledig C, Aljabar P, Serag A, Hajnal JV, Edwards AD, Counsell SJ, Rueckert D, 2014. Automatic whole brain MRI segmentation of the developing neonatal brain. IEEE Transactions on Medical Imaging 33, 1818–1831. [DOI] [PubMed] [Google Scholar]
  42. Nyúl LG, Udupa JK, Zhang X, 2000. New variants of a method of MRI scale standardization. IEEE Transactions on Medical Imaging 19, 143–150. [DOI] [PubMed] [Google Scholar]
  43. Patenaude B, Smith MS, Kennedy ND, Jenkinson M, 2011. A Bayesian model of shape and appearance for subcortical brain segmentation. NeuroImage 56, 907–922. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Pizer MS, Fletcher TP, Sarang J, Thall A, Chen ZJ, Fridman Y, Fritsch SD, Gash GA, Glotzer MJ, Jiroutek RM, Lu C, Muller EK, Tracton G, Yushkevich P, Chaney LE, 2003. Deformable M-reps for 3D medical image segmentation. International Journal of Computer Vision 55, 85–106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Pohl K, Fisher J, Shenton M, McCarley R, Grimson W, Kikinis R, Wells W, 2006a. Logarithm odds maps for shape representation, in: Larsen R, Nielsen M, Sporring J (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2006. Springer Berlin Heidelberg. volume 4191 of Lecture Notes in Computer Science, pp. 955–963. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Pohl KM, Bouix S, Nakamura M, Rohlfing T, McCarley RW, Kikinis R, Grimson WEL, Shenton ME, Wells WM, 2007. A hierarchical algorithm for MR brain image parcellation. IEEE Transactions on Medical Imaging 26, 1201–1212. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Pohl MK, Fisher J, Grimson LEW, Kikinis R, Wells MW, 2006b. A Bayesian model for joint segmentation and registration. NeuroImage 31, 228–239. [DOI] [PubMed] [Google Scholar]
  48. Prastawa M, Gerig G, Lin W, Gilmore JH, 2005. Automatic segmentation of MR images of the developing newborn brain. Medical Image Analysis 9, 457–466. [DOI] [PubMed] [Google Scholar]
  49. Puonti O, Iglesias JE, Van Leemput K, 2013. Fast, sequence adaptive parcellation of brain MR using parametric models, in: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013, pp. 727–734. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Puonti O, Van Leemput K, 2016. Simultaneous whole-brain segmentation and white matter lesion detection using contrast-adaptive probabilistic models, in: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. Springer International Publishing. volume 9556 of Lecture Notes in Computer Science, pp. 9–20. [Google Scholar]
  51. Reuter M, Schmansky JN, Rosas DJ, Fischl B, 2012. Within-subject template estimation for unbiased longitudinal image analysis. NeuroImage 61, 1402–1418. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Roche A, Malandain G, Ayache N, 2000. Unifying maximum likelihood approaches in medical image registration. International Journal of Imaging Systems and Technology 11, 71–80. [Google Scholar]
  53. Rohlfing T, Brandt R, Menzel R, Maurer RC Jr., 2004a. Evaluation of atlas selection strategies for atlas-based image segmentation with application to confocal microscopy images of bee brains. NeuroImage 21, 1428–1442. [DOI] [PubMed] [Google Scholar]
  54. Rohlfing T, Russakoff BD, Maurer RC Jr., 2004b. Performance-based classifier combination in atlas-based image segmentation using expectation-maximization parameter estimation. IEEE Transactions on Medical Imaging 23, 983–994. [DOI] [PubMed] [Google Scholar]
  55. Rousseau F, Habas AP, Studholme C, 2011. A supervised patch-based approach for human brain labeling. IEEE Transactions on Medical Imaging 30, 1852–1862. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Roy S, Carass A, Prince J, 2013. Magnetic resonance image example-based contrast synthesis. IEEE Transactions on Medical Imaging 32, 2348–2363. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Rueckert D, Sonoda LI, Hayes C, Hill DL, Leach MO, Hawkes DJ, 1999. Nonrigid registration using free-form deformations: application to breast MR images. IEEE Transactions on Medical Imaging 18, 712–721. [DOI] [PubMed] [Google Scholar]
  58. Sabuncu MR, Yeo TTB, Van Leemput K, Fischl B, Golland P, 2010. A generative model for image segmentation based on label fusion. IEEE Transactions on Medical Imaging 29, 1714–1729. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Shewchuk JR, 1994. An introduction to the conjugate gradient method without the agonizing pain. Technical Report. School of Computer Science, Carnegie Mellon University. [Google Scholar]
  60. Studholme C, Hill DL, Hawkes DJ, 1999. An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition 32, 71–86. [Google Scholar]
  61. Ta VT, Giraud R, Collins DL, Coupé P, 2014. Optimized patchmatch for near real time and accurate label fusion, in: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, pp. 105–112. [DOI] [PubMed] [Google Scholar]
  62. Tang X, Oishi K, Faria AV, Hillis AE, Albert MS, Mori S, Miller MI, 2013. Bayesian parameter estimation and segmentation in the multi-atlas random orbit model. PloS one 8, e65591. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Thirion JP, 1998. Image matching as a diffusion process: an analogy with Maxwell’s demons. Medical Image Analysis 2, 243–260. [DOI] [PubMed] [Google Scholar]
  64. Tong T, Wolz R, Coupé P, Hajnal VJ, Rueckert D, 2013. Segmentation of MR images via discriminative dictionary learning and sparse coding: Application to hippocampus labeling. NeuroImage 76, 11–23. [DOI] [PubMed] [Google Scholar]
  65. Van Leemput K, 2009. Encoding probabilistic brain atlases using Bayesian inference. IEEE Transactions on Medical Imaging 28, 822–837. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Van Leemput K, Maes F, Vandermeulen D, Suetens P, 1999a. Automated model-based bias field correction of MR images of the brain. IEEE Transactions on Medical Imaging 18, 897–908. [DOI] [PubMed] [Google Scholar]
  67. Van Leemput K, Maes F, Vandermeulen D, Suetens P, 1999b. Automated model-based tissue classification of MR images of the brain. IEEE Transactions on Medical Imaging 18, 885–896. [DOI] [PubMed] [Google Scholar]
  68. Van Leemput K, Maes F, Vandermeulen D, Suetens P, 2001. Automated segmentation of multiple sclerosis lesions by model outlier detection. IEEE Transactions on Medical Imaging 20, 677–688. [DOI] [PubMed] [Google Scholar]
  69. Wang H, Das SR, Suh JW, Altinay M, Pluta J, Craige C, Avants B, Yushkevich PA, 2011. A learning-based wrapper method to correct systematic errors in automatic image segmentation: consistently improved performance in hippocampus, cortex and brain segmentation. NeuroImage 55, 968–985. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Wang H, Suh WJ, Das RS, Pluta J, Craige C, Yushkevich AP, 2013. Multi-atlas segmentation with joint label fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 611–623. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Wells MW III, Grimson LEW, Kikinis R, Jolesz AF, 1996a. Adaptive segmentation of MRI data. IEEE Transactions on Medical Imaging 15, 429–442. [DOI] [PubMed] [Google Scholar]
  72. Wells WM, Viola P, Atsumi H, Nakajima S, Kikinis R, 1996b. Multi-modal volume registration by maximization of mutual information. Medical Image Analysis 1, 35–51. [DOI] [PubMed] [Google Scholar]
  73. Wu G, Wang Q, Zhang D, Nie F, Huang H, Shen D, 2014. A generative probability model of joint label fusion for multi-atlas based brain segmentation. Medical Image Analysis 18, 881–890. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Zhang Y, Brady M, Smith S, 2001. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging 20, 45–57. [DOI] [PubMed] [Google Scholar]
  75. Zijdenbos AP, Forghani R, Evans AC, 2002. Automatic “pipeline” analysis of 3-D MRI data for clinical trials: Application to multiple sclerosis. IEEE Transactions on Medical Imaging 21, 1280–1291. [DOI] [PubMed] [Google Scholar]
  76. Zikic D, Glocker B, Criminisi A, 2014. Encoding atlases by randomized classification forests for efficient multi-atlas label propagation. Medical Image Analysis 18, 1262–1273. [DOI] [PubMed] [Google Scholar]
