Author manuscript; available in PMC 2021 Feb 3. Published in final edited form as: Neuroimage. 2020 Oct 22;225:117471. doi: 10.1016/j.neuroimage.2020.117471

A contrast-adaptive method for simultaneous whole-brain and lesion segmentation in multiple sclerosis

Stefano Cerri a,b,*, Oula Puonti b, Dominik S Meier c, Jens Wuerfel c, Mark Mühlau d, Hartwig R Siebner b,e,f, Koen Van Leemput a,g
PMCID: PMC7856304  NIHMSID: NIHMS1659126  PMID: 33099007

Abstract

Here we present a method for the simultaneous segmentation of white matter lesions and normal-appearing neuroanatomical structures from multi-contrast brain MRI scans of multiple sclerosis patients. The method integrates a novel model for white matter lesions into a previously validated generative model for whole-brain segmentation. By using separate models for the shape of anatomical structures and their appearance in MRI, the algorithm can adapt to data acquired with different scanners and imaging protocols without retraining. We validate the method using four disparate datasets, showing robust performance in white matter lesion segmentation while simultaneously segmenting dozens of other brain structures. We further demonstrate that the contrast-adaptive method can also be safely applied to MRI scans of healthy controls, and replicate previously documented atrophy patterns in deep gray matter structures in MS. The algorithm is publicly available as part of the open-source neuroimaging package FreeSurfer.

Keywords: Lesion segmentation, Multiple sclerosis, Whole-brain segmentation, Generative model

1. Introduction

Multiple sclerosis (MS) is the most frequent chronic inflammatory autoimmune disorder of the central nervous system, causing progressive damage and disability. The disease affects nearly half a million Americans and 2.5 million individuals world-wide (Goldenberg, 2012; Rosati, 2001), generating more than $10 billion in annual healthcare spending in the United States alone (Adelman et al., 2013).

The ability to diagnose MS and track its progression has been greatly enhanced by magnetic resonance imaging (MRI), which can detect characteristic brain lesions in white and gray matter (Bakshi et al., 2008; Blystad et al., 2015; García-Lorenzo et al., 2013; Lövblad et al., 2010). Lesion detection by MRI is up to an order of magnitude more sensitive to disease activity than clinical assessment (Filippi et al., 2006). The prevalence and dynamics of white matter lesions are thus used clinically to diagnose MS (Thompson et al., 2018), to define disease stages, and to determine the efficacy of a therapeutic regimen (Sormani, 2013). MRI is also an unparalleled tool for characterizing brain atrophy, which occurs at a faster rate in patients with MS compared to healthy controls (Azevedo et al., 2018; Barkhof et al., 2009) and, especially in deep gray matter structures and the cerebral cortex, has been shown to correlate with measures of disability (Geurts et al., 2012).

Although manual labeling remains the most accurate way of delineating white matter lesions in MS (Commowick et al., 2018), this approach is very cumbersome and in itself prone to considerable intra- and inter-rater disagreement (Zijdenbos et al., 1998). Furthermore, manually labeling various normal-appearing brain structures to assess atrophy is simply too time-consuming to be practically feasible. Therefore, there is a clear need for automated tools that can reliably and efficiently characterize the morphometry of white matter lesions, various neuroanatomical structures, and their changes over time directly from in vivo MRI. Such tools are of great potential value for diagnosing disease, tracking progression, and evaluating treatment. They can also help in obtaining a better understanding of underlying disease mechanisms, and facilitate more efficient testing in clinical trials. Ultimately, automated software tools may help clinicians to prospectively identify which patients are at highest risk of future disability accrual, leading to better counseling of patients and better overall clinical outcomes.

Despite decades of methodological development (cf. García-Lorenzo et al., 2013 or Danelakis et al., 2018), currently available computational tools for analyzing MRI scans of MS patients remain limited in a number of important ways:

  • Poor generalizability: Existing tools are often developed and tested on very specific imaging protocols, and may not be able to work on data that is acquired differently. Especially with the strong surge of supervised learning in recent years, where the relationship between image appearance and segmentation labels in training scans is directly and statically encoded, the segmentation performance of many state-of-the-art algorithms will degrade substantially when applied to data from different scanners and acquisition protocols (García-Lorenzo et al., 2013; Valverde et al., 2019), severely limiting their usefulness in practice.

  • Dearth of available software: Despite the very large number of proposed methods, most algorithms are only developed and tested in-house, and very few tools are made publicly available (Griffanti et al., 2016; Schmidt et al., 2012; Shiee et al., 2010; Valverde et al., 2017). In order to ensure that computational methods make a real practical impact, they must be accompanied by software implementations that work robustly across a wide array of image acquisitions; that are made publicly available; and that are open-sourced, rigorously tested and comprehensively documented.

  • Limitations in assessing atrophy: There is a lack of dedicated tools for characterizing brain atrophy patterns in MS: many existing methods characterize only aggregate measures such as global brain or gray matter volume (Smeets et al., 2016; Smith et al., 2002) rather than individual brain structures, or require that lesions are pre-segmented so that their MRI intensities can be replaced with placeholder values to avoid biased atrophy measures (Azevedo et al., 2018; Battaglini et al., 2012; Ceccarelli et al., 2012; Chard et al., 2010; Gelineau-Morel et al., 2012; Sdika and Pelletier, 2009) (so-called lesion filling).

In order to address these limitations, we describe a new open-source software tool for simultaneously segmenting white matter lesions and 41 neuroanatomical structures from MRI scans of MS patients. An example segmentation produced by this tool is shown in Fig. 1. By performing lesion segmentation in the full context of whole-brain modeling, the method obviates the need to segment lesions and assess atrophy in two separate processing phases, as currently required in lesion filling approaches. The method works robustly across a wide range of imaging hardware and protocols by completely decoupling computational models of anatomy from models of the imaging process, thereby sidestepping the intrinsic generalization difficulties of supervised methods such as convolutional neural networks. Our software implementation is freely available as part of the FreeSurfer neuroimaging analysis package (Fischl, 2012).

Fig. 1.

Segmentation of white matter lesions and 41 different brain structures from the proposed method on T1w-FLAIR input. From left to right: sagittal, coronal, axial view. From top to bottom: T1w, FLAIR, automatic segmentation.

To the best of our knowledge, only two other methods have been developed for joint whole-brain and white matter lesion segmentation in MS. Shiee et al. (2010) model lesions as an extra tissue class in an unsupervised whole-brain segmentation method (Bazin and Pham, 2008), removing false positive detections of lesions using a combination of topological constraints and hand-crafted rules implementing various intensity- and distance-based heuristics. However, the method segments only a small set of neuroanatomical structures (10), and validation of this aspect was limited to a simulated MRI scan of a single subject. McKinley et al. (2019) use a cascade of two convolutional neural networks, with the first one skull-stripping individual image modalities and the second one generating the actual segmentation. However, the whole-brain segmentation performance of this method was only evaluated on a few structures (7). Furthermore, as a supervised method, its applicability to data that differs substantially from its training data will necessarily be limited.

A preliminary version of this work was presented in Puonti and Van Leemput (2016). Compared to this earlier work, the current article employs more advanced models for the shape and appearance of white matter lesions, and includes a more thorough validation of the segmentation performance of the proposed method, including an evaluation of the whole-brain segmentation component and comparisons with human inter-rater variability.

2. Contrast-adaptive whole-brain segmentation

We build upon a method for whole-brain segmentation called Sequence Adaptive Multimodal SEGmentation (SAMSEG) that we previously developed (Puonti et al., 2016), and that we propose to extend with the capability to handle white matter lesions. SAMSEG robustly segments 41 structures from head MRI scans without any form of preprocessing or prior assumptions on the scanning platform or the number and type of pulse sequences used. Since we build heavily on this method for the remainder of the paper, we briefly outline its main characteristics here.

SAMSEG is based on a generative approach, in which a forward probabilistic model is inverted to obtain automated segmentations. Let D = (d1 , …, dI) denote a matrix collecting the intensities in a multi-contrast brain MR scan with I voxels, where the vector di = (di1, …, diN)^T contains the intensities in voxel i for each of the available N contrasts. Furthermore, let l = (l1 , …, lI)^T be the corresponding labels, where li ∈ {1, …, K} denotes one of the K possible segmentation labels assigned to voxel i. SAMSEG estimates a segmentation l from MRI data D by using a generative model, illustrated in black in Fig. 2. According to this model, l is sampled from a segmentation prior p(l|θl), after which D is obtained by sampling from a likelihood function p(D|l, θd), where θl and θd are model parameters with priors p(θl) and p(θd). Segmentation then consists of inferring the unknown l from the observed D under this model. In the following, we summarize the segmentation prior and the likelihood used in SAMSEG, as well as the way the resulting model is used to obtain automated segmentations.

Fig. 2.

Graphical model of the proposed method. In black the existing contrast-adaptive whole-brain segmentation method SAMSEG (without lesion modeling), in blue the proposed additional components to also model white matter lesions. Shading indicates observed variables. The plate indicates I repetitions of the included variables, where I is the number of voxels.

2.1. Segmentation prior

To model the spatial configuration of various neuroanatomical structures, we use a deformable probabilistic atlas as detailed in Puonti et al. (2016). In short, the atlas is based on a tetrahedral mesh, where the parameters θl are the spatial positions of the mesh’s vertices, and p(θl) is a topology-preserving deformation prior that prevents the mesh from tearing or folding (Ashburner et al., 2000). The model assumes conditional independence of the labels between voxels for a given deformation:

p(\mathbf{l} \mid \boldsymbol{\theta}_l) = \prod_{i=1}^{I} p(l_i \mid \boldsymbol{\theta}_l),

and computes the probability of observing label k at voxel i as

p(l_i = k \mid \boldsymbol{\theta}_l) = \sum_{j=1}^{J} \alpha_{jk} \, \psi_j^i(\boldsymbol{\theta}_l), \quad (1)

where αjk are label probabilities defined at the J vertices of the mesh, and ψji(θl) denotes a spatially compact, piecewise-linear interpolation basis function attached to the jth vertex and evaluated at the ith voxel (Van Leemput, 2009).

The topology of the mesh, the mode of the deformation prior p(θl), and the label probabilities αjk can be learned automatically from a set of segmentations provided as training data (Van Leemput, 2009). This involves an iterative process that combines a mesh simplification operation with a group-wise nonrigid registration step to warp the atlas to each of the training subjects, and an Expectation Maximization (EM) algorithm (Dempster et al., 1977) to estimate the label probabilities αjk in the mesh vertices. The result is a sparse mesh that encodes high-dimensional atlas deformations through a compact set of vertex displacements. As described in Puonti et al. (2016), the atlas used in SAMSEG was derived from manual whole-brain segmentations of 20 subjects, representing a mix of healthy individuals and subjects with questionable or probable Alzheimer’s disease.
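To make the interpolation in Eq. (1) concrete, the sketch below computes the prior label probabilities for a handful of voxels with NumPy. The arrays `alpha` (vertex label probabilities) and `psi` (interpolation weights of each voxel with respect to each vertex) are illustrative stand-ins, not the atlas format or API used in FreeSurfer.

```python
import numpy as np

# Sketch of Eq. (1): the prior probability of label k at voxel i is obtained by
# interpolating the label probabilities alpha_jk stored at the mesh vertices with
# the basis functions psi evaluated at that voxel. `alpha` is (J vertices x K labels),
# `psi` is (I voxels x J vertices); both are illustrative stand-ins.

def label_prior(alpha, psi):
    """Return an (I x K) array of prior probabilities p(l_i = k | theta_l)."""
    prior = psi @ alpha                               # interpolate vertex probabilities
    return prior / prior.sum(axis=1, keepdims=True)   # guard against numerical drift

# Toy example: J = 4 vertices, K = 3 labels, I = 2 voxels.
alpha = np.array([[0.90, 0.05, 0.05],
                  [0.20, 0.70, 0.10],
                  [0.10, 0.10, 0.80],
                  [0.30, 0.30, 0.40]])
psi = np.array([[0.5, 0.5, 0.0, 0.0],                 # each row sums to one
                [0.0, 0.2, 0.3, 0.5]])
print(label_prior(alpha, psi))
```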

2.2. Likelihood function

For the likelihood function we use a Gaussian model for each of the K different structures. We assume that the bias field artifact can be modelled as a multiplicative and spatially smooth effect (Wells et al., 1996). For computational reasons, we use log-transformed image intensities in D, and model the bias field as a linear combination of spatially smooth basis functions that is added to the local voxel intensities (Van Leemput et al., 1999). Letting θd collect all bias field parameters and Gaussian means and variances, the likelihood is defined as

p(\mathbf{D} \mid \mathbf{l}, \boldsymbol{\theta}_d) = \prod_{i=1}^{I} p(\mathbf{d}_i \mid l_i, \boldsymbol{\theta}_d),
p(\mathbf{d}_i \mid l_i = k, \boldsymbol{\theta}_d) = \mathcal{N}(\mathbf{d}_i \mid \boldsymbol{\mu}_k + \mathbf{C} \boldsymbol{\phi}_i, \boldsymbol{\Sigma}_k),
\mathbf{C} = \begin{pmatrix} \mathbf{c}_1^T \\ \vdots \\ \mathbf{c}_N^T \end{pmatrix}, \quad \mathbf{c}_n = \begin{pmatrix} c_{n,1} \\ \vdots \\ c_{n,P} \end{pmatrix}, \quad \boldsymbol{\phi}_i = \begin{pmatrix} \phi_1^i \\ \vdots \\ \phi_P^i \end{pmatrix},

where P denotes the number of bias field basis functions, ϕpi is the basis function p evaluated at voxel i, and cn holds the bias field coefficients for MRI contrast n. We use a flat prior for the parameters of the likelihood: p(θd) ∝ 1.
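The following minimal sketch evaluates the Gaussian likelihood of one voxel under this model, with the bias field entering as an additive shift of the class mean in the log-intensity domain. All names and toy parameter values are illustrative assumptions, not the SAMSEG implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch of the likelihood: the log-transformed intensities of voxel i are Gaussian
# with a structure-specific mean mu_k shifted by the additive bias field C @ phi_i.

def voxel_likelihood(d_i, phi_i, mu_k, Sigma_k, C):
    """p(d_i | l_i = k, theta_d) for a single voxel."""
    bias = C @ phi_i                                  # per-contrast bias value at voxel i
    return multivariate_normal.pdf(d_i, mean=mu_k + bias, cov=Sigma_k)

# Toy example with N = 2 contrasts and P = 3 bias field basis functions.
d_i = np.array([4.1, 5.3])                            # log-intensities of one voxel
phi_i = np.array([1.0, 0.2, -0.1])                    # basis functions evaluated at voxel i
mu_k = np.array([4.0, 5.0])                           # class mean for label k
Sigma_k = np.diag([0.05, 0.08])                       # class covariance for label k
C = np.array([[0.02, -0.01, 0.03],                    # bias field coefficients
              [0.01, 0.02, -0.02]])                   # (one row per contrast)
print(voxel_likelihood(d_i, phi_i, mu_k, Sigma_k, C))
```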

2.3. Segmentation

For a given MRI scan D, segmentation proceeds by computing a point estimate of the unknown model parameters θ = {θd, θl}:

\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} \, p(\boldsymbol{\theta} \mid \mathbf{D}),

which effectively fits the model to the data. Details of this procedure are given in Appendix A. Once θ^ is found, the corresponding maximum a posteriori (MAP) segmentation

\hat{\mathbf{l}} = \arg\max_{\mathbf{l}} \, p(\mathbf{l} \mid \mathbf{D}, \hat{\boldsymbol{\theta}})

is obtained by assigning each voxel to the label with the highest probability, i.e., l^i = argmax_k w^i,k, where 0 ≤ w^i,k ≤ 1 are probabilistic label assignments

w_{i,k} = \frac{\mathcal{N}(\mathbf{d}_i \mid \boldsymbol{\mu}_k + \mathbf{C}\boldsymbol{\phi}_i, \boldsymbol{\Sigma}_k) \, p(l_i = k \mid \boldsymbol{\theta}_l)}{\sum_{k'=1}^{K} \mathcal{N}(\mathbf{d}_i \mid \boldsymbol{\mu}_{k'} + \mathbf{C}\boldsymbol{\phi}_i, \boldsymbol{\Sigma}_{k'}) \, p(l_i = k' \mid \boldsymbol{\theta}_l)} \quad (2)

evaluated at the estimated parameters θ^. It is worth emphasizing that, since the class means and variances {μk, Σk} are estimated from each target scan individually, the model automatically adapts to each scan’s specific intensity characteristics – a property that we demonstrated experimentally on several data sets acquired with different imaging protocols, scanners and field strengths in Puonti et al. (2016).
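As a small illustration of Eq. (2), the sketch below turns per-voxel likelihoods and atlas priors into the soft assignments w_i,k and the corresponding hard MAP labels. The (I × K) arrays `likelihoods` and `priors` are assumed to have been computed as in the previous sketches; this is not the actual SAMSEG code.

```python
import numpy as np

# Sketch of Eq. (2): soft label assignments are the per-voxel posteriors obtained by
# multiplying likelihoods with atlas priors and normalizing over the K labels.

def soft_assignments(likelihoods, priors):
    unnormalized = likelihoods * priors
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)

def map_labels(likelihoods, priors):
    """Hard MAP segmentation: each voxel gets the label with the largest w_{i,k}."""
    return np.argmax(soft_assignments(likelihoods, priors), axis=1)
```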

Our implementation of this method, written in Python with the exception of C++ parts for the computationally demanding optimization of the atlas mesh deformation, is available as part of the open-source package FreeSurfer. It segments MRI brain scans without any form of preprocessing such as skull stripping or bias field correction, taking around 10 minutes to process one subject on a state-of-the-art computer (measured on a machine with an Intel 12-core i7-8700K processor). As explained in Puonti et al. (2016), in our implementation we make use of the fact that many neuroanatomical structures share the same intensity characteristics in MRI to reduce the number of free parameters in the model (e.g., all white matter structures share the same Gaussian mean μk and variance Σk, as do most gray matter structures). Furthermore, for some structures (e.g., non-brain tissue) we use Gaussian mixture models instead of a single Gaussian. In addition to using full covariance matrices Σk, our implementation also supports diagonal covariances, which is currently selected as the default behavior.

3. Modeling lesions

In order to make SAMSEG capable of additionally segmenting white matter lesions, we augment its generative model by introducing a binary lesion map z = (z1 , … , zI)T, where zi ∈ {0, 1} indicates the presence of a lesion in voxel i. The augmented model is depicted in Fig. 2, where the blue parts indicate the additional components compared to the original SAMSEG method. The complete model consists of a joint (i.e., over both l and z simultaneously) segmentation prior p(l, z|h, θl), where h is a new latent variable that helps constrain the shape of lesions, as well as a joint likelihood p(D|l, z, θd, θles), where θles are new parameters that govern their appearance. In the following, we summarize the segmentation prior and the likelihood used in the augmented model, as well as the way the resulting model is used to obtain automated segmentations.

3.1. Segmentation prior

We use a joint segmentation prior of the form

p(\mathbf{l}, \mathbf{z} \mid \mathbf{h}, \boldsymbol{\theta}_l) = p(\mathbf{z} \mid \mathbf{h}, \boldsymbol{\theta}_l) \, p(\mathbf{l} \mid \boldsymbol{\theta}_l),

where p(l|θl) is the deformable atlas model defined in Section 2.1, and

p(\mathbf{z} \mid \mathbf{h}, \boldsymbol{\theta}_l) = \prod_{i=1}^{I} p(z_i \mid \mathbf{h}, \boldsymbol{\theta}_l)

is a factorized model where the probability that a voxel is part of a lesion is given by:

p(z_i = 1 \mid \mathbf{h}, \boldsymbol{\theta}_l) = f_i(\mathbf{h}) \, \rho_i(\boldsymbol{\theta}_l).

Here 0 ≤ fi(h) ≤ 1 aims to enforce shape constraints on lesions, whereas 0 ≤ ρi(θl) ≤ 1 takes into account a voxel’s spatial location within its neuroanatomical context. Below we provide more details on both these components of the model.

3.1.1. Modeling lesion shapes

In order to model lesion shapes, we use a variational autoencoder (Kingma and Welling, 2013; Rezende et al., 2014) according to which lesion segmentation maps z are generated in a two-step process: An unobserved, low-dimensional code h is first sampled from a spherical Gaussian distribution p(h)=N(h|0,I), and subsequently “decoded” into z by sampling from a factorized Bernoulli model:

p_{\boldsymbol{\omega}}(\mathbf{z} \mid \mathbf{h}) = \prod_{i=1}^{I} f_i(\mathbf{h})^{z_i} \left(1 - f_i(\mathbf{h})\right)^{1 - z_i}.

Here fi(h) are the outputs of a “decoder” convolutional neural network (CNN) with filter weights ω, which parameterize the model.

Given a training data set in the form of N binary segmentation maps D = {z^(n)}_{n=1}^N, suitable network parameters ω can in principle be estimated by maximizing the log-probability assigned to the data by the model:

\log p_{\boldsymbol{\omega}}(\mathcal{D}) = \sum_{\mathbf{z} \in \mathcal{D}} \log p_{\boldsymbol{\omega}}(\mathbf{z}), \quad \text{where} \quad p_{\boldsymbol{\omega}}(\mathbf{z}) = \int_{\mathbf{h}} p_{\boldsymbol{\omega}}(\mathbf{z} \mid \mathbf{h}) \, p(\mathbf{h}) \, d\mathbf{h}.

However, because the integral over the latent codes makes this intractable, we use amortized variational inference in the form of stochastic gradient variational Bayes (Kingma and Welling, 2013; Rezende et al., 2014). In particular, we introduce an approximate posterior

q_{\boldsymbol{\upsilon}}(\mathbf{h} \mid \mathbf{z}) = \mathcal{N}\!\left(\mathbf{h} \mid \boldsymbol{\mu}_{\boldsymbol{\upsilon}}(\mathbf{z}), \operatorname{diag}\!\left(\boldsymbol{\sigma}_{\boldsymbol{\upsilon}}^2(\mathbf{z})\right)\right),

where the functions μυ(z) and συ(z) are implemented as an “encoder” CNN parameterized by υ. The variational parameters υ are then learned jointly with the model parameters ω by maximizing a variational lower bound ∑_{z∈D} Lω,υ(z) ≤ log pω(D) using stochastic gradient descent, where

\mathcal{L}_{\boldsymbol{\omega},\boldsymbol{\upsilon}}(\mathbf{z}) = -D_{\mathrm{KL}}\!\left(q_{\boldsymbol{\upsilon}}(\mathbf{h} \mid \mathbf{z}) \,\|\, p(\mathbf{h})\right) + \mathbb{E}_{q_{\boldsymbol{\upsilon}}(\mathbf{h} \mid \mathbf{z})}\!\left[\log p_{\boldsymbol{\omega}}(\mathbf{z} \mid \mathbf{h})\right]. \quad (3)

The first term is the negative Kullback–Leibler divergence between the approximate posterior and the prior, which can be evaluated analytically. The expectation in the last term is approximated using Monte Carlo sampling, using a change of variables (known as the “reparameterization trick”) to reduce the variance in the computation of the gradient with respect to υ (Kingma and Welling, 2013; Rezende et al., 2014).

Our training data set D was derived from manual lesion segmentations in 212 MS subjects, obtained from the University Hospital of Basel, Switzerland. The segmentations were all affinely registered and resampled to a 1 mm isotropic grid of size 197×233×189. In order to reduce the risk of overfitting to the training data, we augmented each segmentation in the training data set by applying rotations of ±10 degrees around each axis, obtaining a total of 1484 segmentations. The architecture for our encoder and decoder networks is detailed in Fig. 3. We trained the model for 1000 epochs with a mini-batch size of 10 using the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 1e-4. We approximated the expectation in the variational lower bound of Eq. (3) by using a single Monte Carlo sample in each step.
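For readers unfamiliar with this training objective, the sketch below evaluates the per-example lower bound of Eq. (3) for one binary lesion map: the KL term is computed analytically for a diagonal Gaussian posterior, and the expected log-likelihood is approximated with a single reparameterized Monte Carlo sample. The toy linear `decoder` is a stand-in for the CNN of Fig. 3; this is a conceptual NumPy illustration rather than the TensorFlow training code.

```python
import numpy as np

# Sketch of the per-example variational lower bound of Eq. (3) for one binary lesion map z.

def elbo(z, mu, sigma, decoder, rng):
    kl = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))   # analytic KL term
    h = mu + sigma * rng.standard_normal(mu.shape)                    # reparameterization trick
    f = decoder(h)                                                    # voxelwise lesion probabilities
    log_lik = np.sum(z * np.log(f) + (1 - z) * np.log(1 - f))         # Bernoulli log-likelihood
    return -kl + log_lik

# Toy example: 5-dimensional latent code, 10-voxel "segmentation map".
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(10, 5))
decoder = lambda h: 1.0 / (1.0 + np.exp(-(W @ h)))                    # sigmoid outputs in (0, 1)
z = rng.integers(0, 2, size=10)
print(elbo(z, mu=np.zeros(5), sigma=np.ones(5), decoder=decoder, rng=rng))
```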

Fig. 3.

Lesion shape model architecture consisting of two symmetrical convolutional neural networks: (a) decoder network and (b) encoder network. The decoder network generates lesion segmentations from a low-dimensional code. Its architecture has ReLU activation functions (f(x) = max(0, x)) and batch normalization (Ioffe and Szegedy, 2015) between each deconvolution layer, with the last layer having a sigmoid activation function, ensuring 0 ≤ fi (h) ≤ 1. The encoder network encodes lesion segmentations into a latent code. The main differences compared to the decoder network are the use of convolutional layers instead of deconvolutional layers and, to encode the mean and variance parameters, the last layer has been split in two, with no activation function for the mean and a softplus activation function (f(x) = ln(1 + ex)) for the variance.

3.1.2. Modeling the spatial location of lesions

In order to encode the spatially varying frequency of occurrence of lesions across the brain, we model the probability of finding a lesion in voxel i, based on its location alone, as

\rho_i(\boldsymbol{\theta}_l) = \sum_{j=1}^{J} \beta_j \, \psi_j^i(\boldsymbol{\theta}_l),

where lesion probabilities 0 ≤ βj ≤ 1 defined in the vertices of the SAMSEG atlas mesh are interpolated at the voxel location. This effectively defines a lesion probability map that deforms in conjunction with the SAMSEG atlas to match the neuroanatomy in each image being segmented, allowing the model to impose contextual constraints on where lesions are expected to be found.

We estimated the parameters βj by running SAMSEG on MRI scans (T1-weighted (T1w) and FLAIR) of 54 MS subjects in whom lesions had been manually annotated (data from the University Hospital of Basel, Switzerland), and recording the estimated atlas deformations. The parameters βj were then computed from the manual lesion segmentations by applying the same technique we used to estimate the αjk parameters in the SAMSEG atlas training phase (cf. Section 2.1).

3.2. Likelihood function

For the likelihood, which links joint segmentations {l, z} to intensities D, we use the same model as SAMSEG in voxels that do not contain lesion (zi = 0), but draw intensities in lesions (zi = 1) from a separate Gaussian with parameters θles = {μles, Σles}:

p(\mathbf{D} \mid \mathbf{l}, \mathbf{z}, \boldsymbol{\theta}_d, \boldsymbol{\theta}_{\mathrm{les}}) = \prod_{i=1}^{I} p(\mathbf{d}_i \mid l_i, z_i, \boldsymbol{\theta}_d, \boldsymbol{\theta}_{\mathrm{les}}),

where

p(\mathbf{d}_i \mid l_i = k, z_i, \boldsymbol{\theta}_d, \boldsymbol{\theta}_{\mathrm{les}}) = \begin{cases} \mathcal{N}(\mathbf{d}_i \mid \boldsymbol{\mu}_{\mathrm{les}} + \mathbf{C}\boldsymbol{\phi}_i, \boldsymbol{\Sigma}_{\mathrm{les}}) & \text{if } z_i = 1, \\ \mathcal{N}(\mathbf{d}_i \mid \boldsymbol{\mu}_k + \mathbf{C}\boldsymbol{\phi}_i, \boldsymbol{\Sigma}_k) & \text{otherwise.} \end{cases}

In order to constrain the values that the lesion intensity parameters θles can take, we make them conditional on the remaining intensity parameters using a normal-inverse-Wishart distribution:

p(\boldsymbol{\theta}_{\mathrm{les}} \mid \boldsymbol{\theta}_d) = \mathcal{N}\!\left(\boldsymbol{\mu}_{\mathrm{les}} \mid \boldsymbol{\mu}_{\mathrm{WM}}, \nu^{-1}\boldsymbol{\Sigma}_{\mathrm{les}}\right) \, \mathrm{IW}\!\left(\boldsymbol{\Sigma}_{\mathrm{les}} \mid \kappa\nu\boldsymbol{\Sigma}_{\mathrm{WM}}, \nu - N - 2\right). \quad (4)

Here the subscript “WM” denotes the white matter Gaussian and κ > 1 and ν ≥ 0 are hyperparameters in the model.

This choice of model is motivated by the fact that the normal-inverse-Wishart distribution is a conjugate prior for the parameters of a Gaussian distribution: Eq. (4) can be interpreted as providing ν “pseudo-voxels” with empirical mean μWM and variance κΣWM in scenarios where the lesion intensity parameters μles and Σles need to be estimated from data. In the absence of any such pseudo-voxels (ν = 0), Eq. (4) reduces to a flat prior on θles and lesions are modeled as a completely independent class. Although such models have been used in the literature (Guttmann et al., 1999; Kikinis et al., 1999; Shiee et al., 2010; Sudre et al., 2015), their robustness may suffer when applied to subjects with no or very few lesions, such as controls or patients with early disease, since there is essentially no data to estimate the lesion intensity parameters from. In the other extreme case, the number of pseudo-voxels can be set to such a high value (ν → ∞) that the intensity parameters of the lesions are fully determined by those of WM. This effectively replaces the Gaussian intensity model for WM in SAMSEG by a distribution with longer tails, in the form of a mixture of two Gaussians with identical means (μles = μWM) but variances that differ by a constant factor (Σles = κΣWM vs. ΣWM). In this scenario, MS lesions are detected as model outliers in a method using robust model parameter estimation (Huber, 1981), another technique that has also frequently been used in the literature (Aït-Ali et al., 2005; Bricq et al., 2008; García-Lorenzo et al., 2011; Liu et al., 2009; Prastawa and Gerig, 2008; Rousseau et al., 2008; Van Leemput et al., 2001).

Based on pilot experiments on a variety of datasets (distinct from the ones used in the results section), we found that good results are obtained by using an intermediate value of ν = 500 pseudo-voxels for 1 mm3 isotropic scans, together with a scaling factor κ = 50. In order to adapt to different image resolutions, ν is scaled inversely proportionally with the voxel size in our implementation. We will visually demonstrate the role of these hyperparameters in constraining the lesion intensity parameters in Section 5.1.
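The “pseudo-voxel” reading of Eq. (4) can be illustrated with the following sketch, in which the lesion mean and covariance are estimated from the putative lesion voxels augmented with ν pseudo-voxels centered on the white matter Gaussian. This is a conceptual approximation of the regularized updates detailed in Appendix B, not the exact implementation; all names are illustrative.

```python
import numpy as np

# Illustrative "pseudo-voxel" reading of Eq. (4): the lesion Gaussian is estimated from
# the putative lesion voxels D_les (n x N array of log-intensities) augmented with nu
# pseudo-voxels of mean mu_wm and covariance kappa * Sigma_wm.

def regularized_lesion_gaussian(D_les, mu_wm, Sigma_wm, nu=500, kappa=50):
    n = D_les.shape[0]                                    # number of putative lesion voxels
    if n == 0:                                            # no lesion voxels: fall back to the prior
        return mu_wm.copy(), kappa * Sigma_wm
    data_mean = D_les.mean(axis=0)
    data_cov = np.cov(D_les, rowvar=False, bias=True)
    mu_les = (nu * mu_wm + n * data_mean) / (nu + n)      # pseudo-voxels pull the mean towards WM
    Sigma_les = (nu * kappa * Sigma_wm + n * data_cov) / (nu + n)
    return mu_les, Sigma_les
```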

3.3. Segmentation

As in the original SAMSEG method, segmentation proceeds by first obtaining point estimates θ^ that fit the model to the data, and then inferring the corresponding segmentation posterior:

p(l,z|D,θ^),

which is now jointly over l and z simultaneously. Unlike in SAMSEG, however, both steps are made intractable by the presence of the new variables θles and h in the model. In order to side-step this difficulty, we obtain θ^ through a joint optimization over both θ and θles:

\{\hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\theta}}_{\mathrm{les}}\} = \arg\max_{\{\boldsymbol{\theta}, \boldsymbol{\theta}_{\mathrm{les}}\}} \, p(\boldsymbol{\theta}, \boldsymbol{\theta}_{\mathrm{les}} \mid \mathbf{D})

in a simplified model in which the constraints on lesion shape have been removed, by clamping all decoder network outputs fi(h) to value 1. This simplification is defensible since the aim here is merely to find appropriate model parameters, rather than highly accurate lesion segmentations. By doing so, the latent code h is effectively removed from the model and the optimization simplifies into the one used in the original SAMSEG method, with only minor modifications due to the prior p(θles|θd). Details are provided in Appendix B.

Once parameter estimates θ^ are available, we compute segmentations using the factorization

p(\mathbf{l}, \mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}) = p(\mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}) \, p(\mathbf{l} \mid \mathbf{z}, \mathbf{D}, \hat{\boldsymbol{\theta}}),

first estimating z from p(z|D,θ^) (Step 1 below), and then plugging this into p(l|z,D,θ^) to estimate l (Step 2):

Step 1: Evaluating p(z|D,θ^) involves marginalizing over both h and θles, which we approximate by drawing S Monte Carlo samples {h^(s), θles^(s)}_{s=1}^S from p(h,θles|D,θ^):

p(\mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}) = \int_{\mathbf{h}} \int_{\boldsymbol{\theta}_{\mathrm{les}}} p(\mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{h}, \boldsymbol{\theta}_{\mathrm{les}}) \, p(\mathbf{h}, \boldsymbol{\theta}_{\mathrm{les}} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}) \, d\mathbf{h} \, d\boldsymbol{\theta}_{\mathrm{les}} \simeq \frac{1}{S} \sum_{s=1}^{S} p(\mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{h}^{(s)}, \boldsymbol{\theta}_{\mathrm{les}}^{(s)}).

This allows us to estimate the probability of lesion occurrence in each voxel, which we then compare with a user-specified threshold value γ

p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}}) \geq \gamma

to obtain the final lesion segmentation z^i. Details on how we approximate p(zi=1|di,θ^) using Monte Carlo sampling are provided in Appendix C.

Step 2: Voxels that are not assigned to lesion (z^i = 0) in the previous step are finally assigned to the neuroanatomical structure with the highest probability p(li=k|zi=0,di,θ^), which simply involves computing l^i = argmax_k w^i,k with w^i,k defined in Eq. (2).

In agreement with other work (Aït-Ali et al., 2005; García-Lorenzo et al., 2011; Jain et al., 2015; Prastawa and Gerig, 2008; Shiee et al., 2010; Van Leemput et al., 2001), we have found that using known prior information regarding the expected intensity profile of MS lesions in various MRI contrasts can help reduce the number of false positive detections. Therefore, we prevent some voxels from being assigned to lesion (i.e., forcing z^i = 0) based on their intensities in relation to the estimated intensity parameters {μ^k, Σ^k}_{k=1}^K: In our current implementation only voxels with an intensity higher than the mean of the gray matter Gaussian in FLAIR and/or T2 (if these modalities are present) are considered candidate lesions.
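Putting Steps 1 and 2 and the intensity constraint together, a minimal sketch of the final segmentation stage might look as follows. It assumes the per-sample lesion posteriors, the soft structure assignments of Eq. (2), and the estimated gray matter mean in FLAIR are already available; the function and variable names (and the lesion code -1) are illustrative, not the FreeSurfer interface.

```python
import numpy as np

# Sketch of the final segmentation stage. `lesion_samples` is an (S x I) array of
# per-sample lesion posteriors, `w` the (I x K) soft assignments of Eq. (2), `flair`
# the (I,) FLAIR intensities and `gm_mean_flair` the estimated gray matter mean in FLAIR.

def final_segmentation(lesion_samples, w, flair, gm_mean_flair, gamma=0.5):
    lesion_prob = lesion_samples.mean(axis=0)             # Step 1: Monte Carlo average
    candidate = flair > gm_mean_flair                     # intensity constraint on candidates
    lesion_mask = (lesion_prob >= gamma) & candidate      # threshold with user-specified gamma
    labels = np.argmax(w, axis=1)                         # Step 2: most probable structure
    return np.where(lesion_mask, -1, labels), lesion_prob
```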

Since estimating p(zi=1|di,θ^) involves repeatedly invoking the decoder and encoder networks of the lesion shape model, as detailed in Appendix C, we implemented the proposed method as an add-on to SAMSEG in Python using the Tensorflow library (Abadi et al., 2015). Estimating θ^ has the same computational complexity as running SAMSEG (i.e., taking approximately 10 minutes on a state-of-the-art machine with an Intel 12-core i7-8700K CPU), while the Monte Carlo sampling takes an additional 5 minutes on a GeForce GTX 1060 graphics card, bringing the total computation time to around 15 minutes per subject.

4. Evaluation datasets and benchmark methods

In this section, we describe four datasets that we will use for the experiments in this paper, including two taken from public challenges. We also outline two relevant methods for MS lesion segmentation that the proposed method is compared to in detail, as well as the metrics and measures used in our experiments.

4.1. Datasets

In order to test the proposed method and demonstrate its contrast-adaptiveness, we conducted experiments on four datasets acquired with different scanner platforms, field strengths, acquisition protocols and image resolution:

  • MSSeg: This dataset is the publicly available training set of the MS lesion segmentation challenge that was held in conjunction with the MICCAI 2016 conference (Commowick et al., 2018). It consists of 15 MS cases from three different scanners, all acquired using a harmonized imaging protocol (Cotton et al., 2015). For each patient a 3D T1w sequence, a contrast-enhanced (T1c) sequence, an axial dual PD-T2-weighted (T2w) sequence and a 3D fluid attenuation inversion recovery (FLAIR) sequence were acquired. Each subject’s lesions were delineated by seven different raters on the FLAIR scan and, if necessary, corrected using the T2w scan. These delineated images were then fused to create a consensus lesion segmentation for each subject. Both raw images and pre-processed images (pre-processing steps: denoising, rigid registration, brain extraction and bias field correction – see Commowick et al. (2018) for details) were made available by the challenge organizers. In our experiments we used the pre-processed data, which required only minor modifications in our software to remove non-brain tissues from the model. We note that the original challenge also included a separate set of 38 test subjects, but at the time of writing this data is no longer available.

  • Trio: This dataset consists of 40 MS cases acquired on a Siemens Trio 3T scanner at the Danish Research Center of Magnetic Resonance (DRCMR). For each patient, a 3D T1w sequence, a T2w sequence and a FLAIR sequence were acquired. Ground truth lesion segmentations were automatically delineated on the FLAIR images using Jim software, and then checked and, if necessary, corrected by an expert rater at DRCMR using the T2w and MPRAGE images.

  • Achieva: This dataset consists of 50 MS cases and 25 healthy controls acquired on a Philips Achieva 3T scanner at DRCMR. After a visual inspection of the images, we decided to remove 2 healthy controls from the dataset as they presented marked gray matter atrophy and white matter hyperintensities. For each patient, a 3D T1w sequence, a T2w sequence and a FLAIR sequence were acquired. Ground truth lesion segmentations were delineated using the same protocol as the one used for the Trio dataset.

  • ISBI: This dataset is the publicly available test set of the MS lesion segmentation challenge that was held at the 2015 International Symposium on Biomedical Imaging (Carass et al., 2017). It consists of 14 longitudinal MS cases, with 4 to 6 time points each, separated by approximately one year. Images were acquired on a Philips 3T scanner. For each patient, a 3D T1w sequence, a T2w sequence, a PDw sequence and a FLAIR sequence were acquired. Images were first preprocessed (inhomogeneity correction, skull stripping, dura stripping, and a second inhomogeneity correction – see Carass et al. (2017) for details), and then registered to a 1 mm MNI template. Each subject’s lesions were delineated by two different raters on the FLAIR scan, and, if necessary, corrected using the other contrasts. As part of the challenge, a training dataset of 5 additional longitudinal MS cases is also available, with the same scanner, imaging protocols and delineation procedure as the test dataset.

A summary of the datasets, with scanner type, image modalities and voxel resolution details, can be found in Table 1. For each subject all the contrasts were co-registered and resampled to the FLAIR scan for MSSeg, and to the T1w scan for Trio, Achieva and ISBI. This is the only preprocessing step required by the proposed method.

Table 1.

Summary of the datasets used in our experiments.

Dataset   Scanner               Modality   Voxel resolution [mm]      Subjects
MSSeg     Philips Ingenia 3T    3D FLAIR   0.74×0.74×0.7              5
                                3D T1w     0.74×0.74×0.85
                                3D T1c     0.74×0.74×0.85
                                2D T2w     0.45×0.45×3
                                2D PD      0.45×0.45×3

          Siemens Aera 1.5T     3D FLAIR   1.03×1.03×1.25             5
                                3D T1w     1.08×1.08×0.9
                                3D T1c     1.08×1.08×0.9
                                2D T2w     0.72×0.72×4 (Gap: 1.2)
                                2D PD      0.72×0.72×4 (Gap: 1.2)

          Siemens Verio 3T      3D FLAIR   0.5×0.5×1.1                5
                                3D T1w     1×1×1
                                3D T1c     1×1×1
                                2D T2w     0.69×0.69×3
                                2D PD      0.69×0.69×3

Trio      Siemens Trio 3T       2D FLAIR   0.7×0.7×4                  40
                                3D T1w     1×1×1
                                2D T2w     0.7×0.7×4

Achieva   Philips Achieva 3T    3D FLAIR   1×1×1                      73
                                3D T1w     0.85×0.85×0.85
                                3D T2w     0.85×0.85×0.85

ISBI      Philips 3T            2D FLAIR   0.82×0.82×2.2              14
                                3D T1w     0.82×0.82×1.17
                                2D T2w     0.82×0.82×2.2
                                2D PDw     0.82×0.82×2.2

4.2. Benchmark methods for lesion segmentation

In order to evaluate the lesion segmentation component of the proposed method in detail, we compared it to two publicly available and widely used algorithms for MS lesion segmentation:

  • LST-lga (Schmidt et al., 2012): This lesion growth algorithm starts by segmenting a T1w image into three main tissue classes (CSF, GM and WM) using SPM12, and combines the resulting segmentation with co-registered FLAIR intensities to calculate a lesion belief map. A pre-chosen initial threshold κ is then used to create an initial binary lesion map, which is subsequently grown along voxels that appear hyperintense in the FLAIR image. We set κ to its recommended default value of 0.3, which was also used in previous studies (Mühlau et al., 2013; Rissanen et al., 2014).

  • NicMsLesions (Valverde et al., 2017, 2019): This deep learning method is based on a cascade of two 3D convolutional neural networks, where the first one reveals possible candidate lesion voxels, and the second one reduces the number of false positive outcomes. Both networks were trained by the authors of the method on T1w and FLAIR scans coming from a publicly available training dataset of the MS lesion segmentation challenge held in conjunction with the MICCAI 2008 conference (Styner et al., 2008) (20 cases) and the MSSeg dataset (15 cases). This method was one of the top performers on the test dataset of the MICCAI 2016 challenge (Commowick et al., 2018), and one of the few methods for which an implementation is publicly available.

We note that both these benchmark methods are specifically targeting T1w-FLAIR input, whereas the proposed method is not tuned to any particular combination of input modalities.

Although we only compared our method in detail to these two benchmarks, many more good methods for MS lesion segmentation exist. We refer the reader to the MSSeg paper (Commowick et al., 2018), the ISBI challenge paper (Carass et al., 2017) and the ISBI challenge website to compare the reported performance with that of other methods.

4.3. Metrics and measures

In order to evaluate the influence of varying the input modalities on the segmentation performance of the proposed method, and to assess segmentation accuracy with respect to that of other methods and human raters, we used a combination of segmentation volume estimates, Pearson correlation coefficients between such estimates and reference values, and Dice scores. Volumes were computed by counting the number of voxels assigned to a specific structure and converting into mm3 , whereas Dice coefficients were computed as

\mathrm{Dice}_{X,Y} = \frac{2\,|X \cap Y|}{|X| + |Y|},

where X and Y denote segmentation masks, and | · | counts the number of voxels in a mask.
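As a small illustration, the sketch below computes structure volumes from voxel counts and the Dice overlap between two binary masks; variable names are illustrative.

```python
import numpy as np

# Sketch of the evaluation measures: structure volume from voxel counts and Dice
# overlap between two binary masks.

def volume_ml(mask, voxel_size_mm):
    """Volume in millilitres: voxel count times voxel volume (mm^3 -> ml)."""
    return mask.sum() * float(np.prod(voxel_size_mm)) / 1000.0

def dice(x, y):
    x, y = x.astype(bool), y.astype(bool)
    denom = x.sum() + y.sum()
    return 2.0 * np.logical_and(x, y).sum() / denom if denom > 0 else 1.0
```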

The proposed method and both benchmark algorithms produce a probabilistic lesion map that needs to be thresholded to obtain a final lesion segmentation. This requires an appropriate threshold value to be set for this purpose (variable γ in the proposed method). In order to ensure an objective comparison between the methods, we used a leave-one-out cross-validation strategy in which the threshold for each test image was set to the value that maximizes the average Dice overlap with manual segmentations in all the other images of the same dataset. For the reported performance of the methods on the ISBI dataset, the thresholds were tuned on the 5 training subjects that are part of the challenge instead.
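A sketch of this leave-one-out threshold selection is shown below, assuming lists of probabilistic lesion maps and manual masks for one dataset and a `dice` function as in the previous sketch; the candidate threshold grid is an arbitrary illustrative choice.

```python
import numpy as np

# Sketch of the leave-one-out threshold selection: for each subject, the lesion
# threshold gamma is set to the value that maximizes the mean Dice score over all
# other subjects of the same dataset.

def tune_thresholds(prob_maps, manual_masks, dice, candidates=np.linspace(0.05, 0.95, 19)):
    thresholds = []
    for i in range(len(prob_maps)):
        others = [j for j in range(len(prob_maps)) if j != i]
        mean_dice = [np.mean([dice(prob_maps[j] >= g, manual_masks[j]) for j in others])
                     for g in candidates]
        thresholds.append(float(candidates[int(np.argmax(mean_dice))]))
    return thresholds
```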

5. Results

In this section, we first illustrate the effect of the various components of our model. We then evaluate how the proposed model adapts to different input modalities and acquisition platforms. Subsequently we compare the lesion segmentation performance of our model against that of the two benchmark methods, relate it to human inter-rater variability, and analyze its performance on the ISBI challenge data. Finally, we perform an indirect validation of the whole-brain segmentation component of the method.

Throughout the section we use boxplots to show some of the results. In these plots, the median is indicated by a horizontal line, plotted inside boxes that extend from the first to the third quartile values of the data. The range of the data is indicated by whiskers extending from the boxes, with outliers represented by circles.

5.1. Illustration of the method

In order to illustrate the effect of the various components of the method, here we analyze its behaviour when segmenting T1w-FLAIR scans of two MS subjects – one with a low and one with a high lesion load. Fig. 4 shows, in addition to the input data and the final lesion probability estimate p(zi=1|di,θ^), also an intermediate lesion probability obtained with the simplified model used to estimate θ^, i.e., before the FLAIR-based intensity constraints and the lesion shape constraints are applied. From these images we can see that the lesion shape model and the intensity constraints help remove false positive detections and enforce more realistic shapes of lesions, especially for the case with low lesion load.

Fig. 4.

Illustration of how intensity constraints and the lesion shape model help reduce false positive lesion detections in the method. Top row: a subject with a low lesion load; Bottom row: a subject with a high lesion load. From left to right: T1w and FLAIR input; intermediate lesion probability obtained with the simplified model used to estimate θ^; mask of candidate voxels based on intensity alone (intensity higher than the mean gray matter intensity in FLAIR); and final lesion probability estimate p(zi=1|di,θ^) produced by the method.

Fig. 5 analyzes the effect of the prior p(θles|θd) on the lesion intensity parameters θles for the two subjects shown in Fig. 4. When the lesion load is high, the prior does not have a strong influence, leaving the lesion Gaussian “free” to fit the data. However, when the lesion load is low, the lesion Gaussian is constrained to retain a wide variance and a mean close to the mean of WM, effectively turning the model into an outlier detection method for WM lesions. This behavior is important in cases when few lesions are present in the images, ensuring the method works robustly even when only limited data is available to estimate the lesion Gaussian parameters.

Fig. 5.

Illustration of the effect of the prior p(θles|θd) on the lesion intensity parameters, both in the case of a lesion load that is low (left, corresponding to the subject in the top row of Fig. 4) and high (right, corresponding to the subject in the bottom row of Fig. 4). The illustration is from the Monte Carlo sampling phase of the method: In each case, the value of the parameters of the lesion Gaussian is taken as the average over the Monte Carlo samples {θles(s)}s=1S, and the points represent the resulting lesion posterior estimate p(zi=1|di,θ^) in each voxel.

In order to analyze the effect of the lesion shape prior, we compared the lesion segmentation performance of the proposed method with that obtained when the shape prior was intentionally removed from the model (i.e., all the decoder network outputs fi(h) clamped to value 1). For a fair comparison, the lesion threshold value γ was re-tuned to maximize performance for the method without shape prior, in the way described in Section 4.3. Table 2 summarizes the results across the MSSeg, Trio and Achieva datasets, for different ranges of lesion load. In addition to Dice scores, the table also reports results for precision and recall, defined as

\mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \frac{TP}{TP + FN},

where TP, FP and FN count the true positive, false positive and false negative voxels compared to the manual segmentation. The results indicate that, although performance is unchanged for high lesion loads, for which segmentation is generally easier (Commowick et al., 2018), the lesion shape prior clearly improves segmentations in subjects with small and medium lesion loads.
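For completeness, a minimal sketch of these two measures computed from binary masks (illustrative helper only):

```python
import numpy as np

# Sketch of precision and recall from two binary masks, counting true positive,
# false positive and false negative voxels against the manual segmentation.

def precision_recall(pred, manual):
    pred, manual = pred.astype(bool), manual.astype(bool)
    tp = np.logical_and(pred, manual).sum()
    fp = np.logical_and(pred, ~manual).sum()
    fn = np.logical_and(~pred, manual).sum()
    return tp / (tp + fp), tp / (tp + fn)
```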

Table 2.

Comparison in terms of lesion segmentation performance between the proposed method and a method where the lesion shape model was intentionally removed. Results are expressed in terms of mean ±standard deviation of Dice overlap, precision and recall for different ranges of lesion load. Lesion segmentations were computed across three different datasets (MSSeg, Trio and Achieva) on T1w-FLAIR input.

Lesion load [ml]   Dice                            Precision                       Recall
                   Shape model     No shape model  Shape model     No shape model  Shape model     No shape model
(0, 2]             0.42 (±0.10)    0.38 (±0.10)    0.32 (±0.12)    0.24 (±0.07)    0.28 (±0.09)    0.24 (±0.07)
(2, 10]            0.50 (±0.13)    0.47 (±0.13)    0.37 (±0.13)    0.33 (±0.11)    0.34 (±0.12)    0.32 (±0.12)
(10, −)            0.70 (±0.11)    0.70 (±0.11)    0.62 (±0.20)    0.62 (±0.20)    0.55 (±0.12)    0.55 (±0.13)

(0, −)             0.57 (±0.16)    0.55 (±0.17)    0.46 (±0.20)    0.43 (±0.20)    0.42 (±0.16)    0.40 (±0.16)

In order to demonstrate that the model also works robustly in control subjects (with no lesions at all), and can therefore be safely applied in studies comparing MS subjects with controls, we further segmented T1w-FLAIR scans of the Achieva dataset, and computed the total volume of the lesions in each subject. The results are shown in Fig. 6; the volumes were 8.95±9.18 ml for MS subjects vs. 0.98±0.77 ml for controls. Although the average lesion volume for controls was not exactly zero, a visual inspection revealed that this was due to some controls having WM hyperintensities that were segmented by the method as MS lesions, which we find acceptable.

Fig. 6.

Difference between healthy controls (HC) and MS subjects in lesion volume, as detected by the proposed method on the Achieva dataset (23 HC subjects, 50 MS subjects, T1w-FLAIR input). Lines indicate means across subjects.

5.2. Scanner and contrast adaptive segmentations

In order to demonstrate the ability of our method to adapt to different types and combinations of MRI sequences acquired with different scanners, we show the method’s segmentation results along with the manual segmentations for a representative subset of combinations for one subject in the MSSeg (consensus as manual segmentation), the Trio and the Achieva datasets in Fig. 7. It is not feasible to show all possible combinations. For instance, mixing the 5 contrasts in the MSSeg dataset alone already yields 31 possible multi-contrast combinations. Nonetheless, it is clear that the model is indeed able to adapt to the specific contrast properties of its input scans. A visual inspection of its whole-brain segmentation component seems to indicate that the method benefits from having access to the T1w contrast for best performance. This is especially clear when only the FLAIR contrast is provided, as this visually degrades the segmentation of the white-gray boundaries in the cortical regions due to the low contrast between white and gray matter in FLAIR.

Fig. 7.

Contrast-adaptiveness of the proposed method to different combinations of input modalities. Segmentations are shown for one subject of the MSSeg (top row), the Trio (middle row) and the Achieva MS (bottom row) dataset. For each subject the top row shows slices of the data and the manual lesion annotation; the middle row shows the lesion probability map and Dice score computed by the proposed method for specific input combinations; and the bottom row shows the corresponding complete segmentations produced by the method. Enlarged figures for each subject are available in the Supplementary Material, Figs. 1–3.

When comparing the lesion probability maps produced by the method visually with the corresponding manual lesion segmentations, it seems that the method benefits from having access to the FLAIR contrast for the best lesion segmentation performance. This is confirmed by a quantitative analysis shown in Fig. 8, which plots the Dice overlap scores for each of the seven input combinations that all our three datasets have in common, namely T1w, T2w, FLAIR, T1w-T2w, T1w-FLAIR, T2w-FLAIR, and T1w-T2w-FLAIR. Although the inclusion of additional contrasts does not hurt lesion segmentation performance, across all three datasets the best results are obtained whenever the FLAIR contrast is included as input to the model. This finding is perhaps not surprising, given that the manual delineations were all primarily based on the FLAIR image.

Fig. 8.

Lesion segmentation performance of the proposed method in terms of Dice overlap with manual raters on three different datasets when different input contrasts are used (T1w, T2w, FLAIR, T1w-T2w, T1w-FLAIR, T2w-FLAIR, T1w-T2w-FLAIR). From left to right: Dice scores on MSSeg, Trio and Achieva MS data.

Considering both the whole-brain and lesion segmentation performance together, we conclude that the combination T1w-FLAIR is well-suited for obtaining good results with the proposed method, although it will also accept other and/or additional contrasts beyond T1w and FLAIR.

5.3. Lesion segmentation

In order to compare the lesion segmentation performance of our model against that of the two benchmark methods, and relate it to human inter-rater variability, we here present a number of results based on the T1w-FLAIR input combination (which is the combination required by the benchmark methods). We also analyze the lesion segmentation performance of our method on the public ISBI challenge data.

5.3.1. Comparison with benchmark lesion segmentation methods

Fig. 9 shows automatic segmentations of two randomly selected subjects from the MSSeg, the Trio and the Achieva datasets, both for our method and for the two benchmark methods LST-lga and NicMSLesions, along with the corresponding manual segmentations (consensus manual segmentations for MSSeg). Visually, all three methods perform similarly on the Achieva MS data, but some of the results for NicMSLesions appear to be inferior to those obtained with the other two methods on MSSeg and Trio data. This qualitative observation is confirmed by the quantitative analysis shown in Fig. 10, where the three methods’ Dice overlap scores are compared on each dataset: similar performances are obtained for all methods on the Achieva data, but NicMSLesions trails the other two methods on MSSeg and Trio data. Especially for MSSeg data this is a surprising result, since NicMSLesions was trained on this specific dataset, i.e., the subjects used for testing were part of the training data of this method, potentially biasing the results in favor of NicMSLesions. Based on Dice scores, the proposed method outperforms LST-lga on MSSeg data, although there are no statistically significant differences between the two methods on the other datasets.

Fig. 9.

Visual comparison of lesion probability maps on three different datasets for the proposed method and two state-of-the-art lesion segmentation methods (LST-lga and NicMsLesions) on T1w-FLAIR input. (Top) Two subjects from the MSSeg dataset; (Middle) Two subjects from the Trio dataset; (Bottom) Two subjects from the Achieva dataset. For each subject the top row shows slices of the data and the manual annotation while the bottom row shows the lesion probability maps for our model, LST-lga and NicMsLesions.

Fig. 10.

Lesion segmentation performance in terms of Dice overlap with manual raters for the proposed method and two benchmark methods (LST-lga and NicMsLesions) on T1w-FLAIR input. Statistically significant differences between two methods, computed with a two-tailed paired t-test, are indicated by asterisks (“***” for p-value < 0.001, “**” for p-value < 0.01 and “*” for p-value < 0.05). From left to right: results on the MSSeg, the Trio and the Achieva dataset.

5.3.2. Results on the ISBI data

We also evaluated the performance of the proposed method on the ISBI challenge data, obtaining a mean Dice score of 0.58 when T1w-FLAIR input is used. This score is comparable to the ones we obtained on the other three datasets analyzed in this paper (cf. Fig. 10) – MSSeg: 0.65, Trio: 0.58 and Achieva: 0.54. A few example segmentation results on the ISBI data are available in the Supplementary Material, Fig. 4.

The ISBI challenge website ranks submissions according to an overall lesion segmentation performance score that takes into account Dice overlap, volume correlation, surface distance, and a few other metrics (see Carass et al., 2017 for details). A score of 100 indicates perfect correspondence, while 90 is meant to correspond to human inter-rater performance (Carass et al., 2017; Styner et al., 2008). We obtained a score of 87.87, which places us around half-way in the ranking of the original challenge (Carass et al., 2017), although we note that the website currently lists methods with a much higher score.

In order to relate the performance of our method to the one obtained with the two benchmark methods, we also attempted to run LST-lga and NicMSLesions on this dataset. However, the preprocessing applied to the ISBI challenge data proved problematic for LST-lga, and we were not able to get any results with this method. Results for NicMSLesions in terms of Dice overlap are shown in Fig. 11, together with those obtained with the proposed method. It is clear that NicMSLesions suffers strongly from the domain shift between its training data and the ISBI data, a fact that was already reported in Valverde et al. (2019). For completeness, Fig. 11 also includes results for NicMSLesions when its network was updated on the ISBI training data as described in Valverde et al. (2019): different subsets of network parameters were retrained on the baseline scan of each of the five ISBI training subjects, and the combination that performed best on all 21 training images was retained. From the figure it can be seen that this partially retrained network has comparable performance to the proposed model, although the latter attains this performance without any retraining.

Fig. 11.

Lesion segmentation performance in terms of Dice overlap with manual raters on the ISBI dataset for the proposed method, NicMsLesions, and NicMsLesions with partial retraining (see text for details). Statistically significant differences between two methods, computed with a two-tailed paired t-test, are indicated by asterisks (“***” indicates p-value < 0.001).

5.3.3. Inter-rater variability

To evaluate the proposed method’s lesion segmentation performance in the context of human inter-rater variability, we took advantage of the availability of lesion segmentations by seven different raters in the MSSeg dataset. Table 3 shows the lesion segmentation performance in terms of average Dice overlap between each pair of the seven raters, and between each rater and the proposed method. On average, our method achieves a Dice overlap score of 0.57, which is slightly below the mean human raters’ range of [0.59, 0.69]. We note that this result is in line with those obtained in the MSSeg challenge (Commowick et al., 2018).

Table 3.

Comparison of lesion segmentation performance in terms of average Dice score between each pair of the seven raters of the MSSeg dataset, and between each rater and the proposed method (T1w-FLAIR input).

       R1     R2     R3     R4     R5     R6     R7     Ours
R1     –      0.68   0.59   0.70   0.75   0.59   0.59   0.54
R2     0.68   –      0.59   0.71   0.72   0.60   0.57   0.56
R3     0.59   0.59   –      0.57   0.59   0.60   0.63   0.60
R4     0.70   0.71   0.57   –      0.90   0.57   0.54   0.53
R5     0.75   0.72   0.59   0.90   –      0.59   0.57   0.55
R6     0.59   0.60   0.60   0.57   0.59   –      0.61   0.57
R7     0.59   0.57   0.63   0.54   0.57   0.61   –      0.60

Avg    0.65   0.64   0.60   0.66   0.69   0.60   0.59   0.57

5.4. Whole-brain segmentation

Since no ground truth segmentations are available for a direct evaluation of the whole-brain segmentation component of our method, we performed an indirect validation, evaluating its potential for replacing lesion filling approaches that rely on manually annotated lesions, as well as its ability to replicate known atrophy patterns in MS. The results concentrate on the following 25 main neuroanatomical regions, segmented from T1w-FLAIR scans: left and right cerebral white matter, cerebellum white matter, cerebral cortex, cerebellum cortex, lateral ventricle, hippocampus, thalamus, putamen, pallidum, caudate, amygdala, nucleus accumbens and brain stem. To avoid cluttering, the quantitative results for left and right structures are averaged. We note that lesion segmentations are not merged into any of these brain structures (i.e., leaving “holes” in white matter), so that the results reflect performance only for the normal-appearing parts of structures.

5.4.1. Comparison with lesion filling

It is well-known that white matter lesions can severely interfere with the quantification of normal-appearing structures when standard brain MRI segmentation techniques are used (Battaglini et al., 2012; Ceccarelli et al., 2012; Chard et al., 2010; Gelineau-Morel et al., 2012; Nakamura and Fisher, 2009; Vrenken et al., 2013). A common strategy is therefore to use a lesion-filling (Chard et al., 2010; Sdika and Pelletier, 2009) procedure, in which lesions are first manually segmented, their original voxel intensities are replaced with normal-appearing white matter intensities, and standard tools are then used to segment the resulting, preprocessed images. Using such a procedure with SAMSEG would yield whole-brain segmentations that can serve as “silver standard” benchmarks against which the results of the proposed method (which works directly on the original scans) can be compared. In practice, however, we have noticed that replacing lesion intensities, which is typically done in T1w only, did not work well in FLAIR in our experiments. Therefore, rather than explicitly replacing intensities, we obtained silver standard segmentations by simply masking out lesions during the SAMSEG processing, effectively ignoring lesion voxels during the model fitting.

We wished to interpret segmentation vs. silver standard discrepancies within the context of the human inter-rater variability associated with manually segmenting lesions. Therefore, we performed experiments on the MSSeg dataset, repeatedly re-computing the silver standard using each of the seven raters’ manual lesion annotations in turn. The results are shown in Tables 4 and 5 for Pearson correlation coefficients between estimated volumes and Dice segmentation overlap scores, respectively. Each line in these tables corresponds to one structure, showing the average consistency between the silver standard of each rater compared to that of the six other raters, as well as the average consistency between the proposed method’s segmentation and the silver standards of all raters. The results indicate that, in terms of Pearson correlation coefficient, the performance of our method falls within the range of inter-rater variability, albeit narrowly (average value 0.988 vs. inter-rater range [0.988, 0.992]). In terms of Dice scores, however, the method slightly underperforms compared to the inter-rater variability (average value 0.971 vs. inter-rater range [0.978, 0.980]).

Table 4.

Average Pearson correlation coefficients of brain structure volume estimates between the silver standard of each rater compared to that of the six other raters in the MSSeg dataset, as well as the average consistency between the proposed method's segmentation and the silver standards of all raters (T1w-FLAIR input). Each line shows an average across raters for a specific brain structure.

R1 R2 R3 R4 R5 R6 R7 Ours

Cerebral White Matter 0.992 0.992 0.991 0.993 0.993 0.993 0.987 0.989
Cerebellum White Matter 0.994 0.997 0.997 0.996 0.997 0.997 0.997 0.989
Cerebral Cortex 0.997 0.999 0.999 0.999 0.999 0.999 0.999 0.997
Cerebellum Cortex 0.999 0.999 0.997 0.999 0.997 0.999 0.999 0.999
Lateral Ventricles 0.996 0.995 0.996 0.997 0.998 0.994 0.996 0.992
Hippocampus 0.982 0.989 0.987 0.979 0.977 0.979 0.984 0.981
Thalamus 0.998 0.997 0.998 0.998 0.998 0.997 0.997 0.996
Putamen 0.999 0.999 0.999 0.999 0.999 0.999 0.999 0.996
Pallidum 0.988 0.993 0.993 0.994 0.993 0.994 0.990 0.989
Caudate 0.994 0.993 0.987 0.990 0.995 0.989 0.993 0.985
Amygdala 0.953 0.967 0.970 0.973 0.941 0.957 0.972 0.963
Accumbens 0.985 0.987 0.987 0.966 0.989 0.953 0.988 0.971
Brain Stem 0.991 0.994 0.990 0.992 0.992 0.988 0.992 0.989

Average 0.990 0.992 0.992 0.990 0.990 0.988 0.992 0.988
Table 5.

Same as Table 4, but with Dice segmentation overlap scores. Each line shows an average across raters – similar to the last row of Table 3 – for a specific brain structure.

R1 R2 R3 R4 R5 R6 R7 Ours

Cerebral White Matter 0.982 0.982 0.982 0.983 0.983 0.982 0.981 0.978
Cerebellum White Matter 0.987 0.987 0.987 0.987 0.988 0.987 0.987 0.983
Cerebral Cortex 0.989 0.990 0.989 0.990 0.989 0.989 0.989 0.986
Cerebellum Cortex 0.996 0.996 0.995 0.996 0.996 0.995 0.995 0.994
Lateral Ventricles 0.972 0.970 0.972 0.974 0.976 0.971 0.971 0.954
Hippocampus 0.975 0.972 0.971 0.973 0.974 0.972 0.972 0.965
Thalamus 0.980 0.981 0.981 0.982 0.981 0.982 0.981 0.975
Putamen 0.987 0.987 0.988 0.988 0.988 0.988 0.987 0.980
Pallidum 0.985 0.985 0.986 0.986 0.986 0.986 0.985 0.978
Caudate 0.961 0.957 0.956 0.961 0.964 0.957 0.954 0.937
Amygdala 0.973 0.972 0.972 0.973 0.972 0.972 0.972 0.967
Accumbens 0.957 0.958 0.960 0.960 0.960 0.943 0.960 0.945
Brain Stem 0.987 0.986 0.984 0.986 0.986 0.986 0.986 0.983

Average 0.979 0.979 0.979 0.980 0.980 0.978 0.978 0.971

5.4.2. Detecting atrophy patterns in MS

In a final analysis, we assessed whether previously reported volume reductions in specific brain structures in MS can be detected automatically with the proposed method. To this end, we segmented the 23 controls and the 50 MS subjects of the Achieva dataset, and compared the volumes of various structures between the two groups. Volumes were normalized for age, gender and total intracranial volume by regressing these covariates out with a general linear model. The intracranial volume used for the normalization was computed by summing the volumes of all the structures, as segmented by the method, within the intracranial vault. The results are shown in Fig. 12. Although not all volumes showed significant differences between groups, well-established differences were replicated: in particular, we found decreased volumes of cerebral white matter, cerebral cortex, thalamus and caudate (Azevedo et al., 2018; Chard et al., 2002; Houtchens et al., 2007), as well as an increased volume of the lateral ventricles (Zivadinov et al., 2016).
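For clarity, the covariate adjustment and group comparison can be sketched as follows (Python with numpy/scipy; a simplified illustration of the general linear model adjustment, not the exact analysis code):

```python
import numpy as np
from scipy import stats

def adjust_volumes(volumes, age, sex, tiv):
    """Regress age, sex and total intracranial volume (TIV) out of a structure's
    volumes with an ordinary least-squares fit; the residuals are the
    covariate-adjusted volumes compared between groups."""
    X = np.column_stack([np.ones_like(volumes), age, sex, tiv])  # GLM design matrix
    beta, *_ = np.linalg.lstsq(X, volumes, rcond=None)
    return volumes - X @ beta

# Welch's t-test (unequal variances) between healthy controls and MS subjects;
# hc and ms are hypothetical arrays of adjusted volumes for one structure.
def group_difference(hc, ms):
    return stats.ttest_ind(hc, ms, equal_var=False)  # returns (t statistic, p-value)
```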

Fig. 12.

Differences between healthy controls (HC) and MS subjects in normalized volume estimates of various neuroanatomical structures, as detected by the proposed method on the Achieva dataset (23 HC subjects, 50 MS subjects, T1w-FLAIR input). Statistically significant differences between the two groups, computed with Welch’s t-test, are indicated by asterisks (“**” for p-value < 0.01 and “*” for p-value < 0.05).

6. Discussion and conclusion

In this paper, we have proposed a method for the simultaneous segmentation of white matter lesions and normal-appearing neuroanatomical structures from multi-contrast brain MRI scans of MS patients. The method integrates a novel model for white matter lesions into a previously validated generative model for whole-brain segmentation. By using separate models for the shape of anatomical structures and their appearance in MRI, the algorithm is able to adapt to data acquired with different scanners and imaging protocols without needing to be retrained. We validated the method using four disparate datasets, showing robust performance in white matter lesion segmentation while simultaneously segmenting dozens of other brain structures. We further demonstrated that it can also be safely applied to MRI scans of healthy controls, and replicate previously documented atrophy patterns in deep gray matter structures in MS. The proposed algorithm is publicly available as part of the open-source neuroimaging package FreeSurfer.

By performing both whole-brain and white matter lesion segmentation at the same time, the method we propose aims to supplant the two-stage “lesion filling” procedure that is commonly used in morphometric studies in MS, in which lesions segmented in a first step are used to avoid biasing a subsequent analysis of normal-appearing structures with software tools developed for healthy brain scans. In order to evaluate whether our method is successful in this regard, we compared its whole-brain segmentation performance against the results obtained when lesions are segmented a priori by seven different human raters instead of automatically by the method itself. Our results show that the volumes of various neuroanatomical structures obtained when lesions are segmented automatically fall within the range of inter-rater variability, indicating that the proposed method may be used instead of lesion filling with manual lesion segmentations in large volumetric studies of brain atrophy in MS. When detailed spatial overlap is analyzed, however, we found that the automatic segmentation does not fully reach the performance obtained with human lesion annotation as measured by Dice overlap.

Like many other methods for MS lesion segmentation, the method proposed here produces a spatial map indicating, for each voxel, its probability of belonging to a lesion, which can then be thresholded to obtain a final lesion segmentation. Although in our experience good results can be obtained by using the same threshold value across datasets (e.g., γ = 0.5), changing this value allows one to adjust the trade-off between false positive and false negative lesion detections. Since some MRI sequences and scanners depict lesions with higher contrast than others, and because there is often considerable disagreement between human experts regarding the exact extent of lesions (Zijdenbos et al., 1998), our implementation exposes this threshold value as an optional, tunable parameter to the end-user. Suitable threshold values can be found by visually inspecting the lesion segmentations of a few cases or, in large-scale studies, by using cross-validation, as we did in our experiments.
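A minimal sketch of this thresholding step is given below (Python with nibabel; the file names and the parameter name gamma are illustrative):

```python
import nibabel as nib
import numpy as np

def threshold_lesion_map(prob_path, out_path, gamma=0.5):
    """Binarize a voxel-wise lesion probability map at threshold gamma.
    Lowering gamma trades false negatives for false positives, and vice versa."""
    prob = nib.load(prob_path)
    lesion_mask = (prob.get_fdata() >= gamma).astype(np.uint8)
    nib.save(nib.Nifti1Image(lesion_mask, prob.affine), out_path)

# Example usage with hypothetical file names:
# threshold_lesion_map("lesion_probability_map.nii.gz", "lesion_mask.nii.gz", gamma=0.5)
```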

By providing the ability to robustly and efficiently segment multi-contrast scans of MS patients across a wide range of imaging equipment and protocols, the software tool presented here may help facilitate large cohort studies aiming to elucidate the morphological and temporal dynamics underlying disease progression and accumulation of disability in MS. Furthermore, in current clinical practice, high-resolution multi-contrast images, which can be used to increase the accuracy of lesion segmentation, represent a significantly increased reading burden for the neuroradiologist, and are hence frequently not acquired. The emergence of robust, multi-contrast segmentation tools such as ours may help break the link between the resolution and number of contrasts of the acquired data and the human time needed to evaluate it, thus potentially increasing the accuracy of the resulting measures.

The ability of the proposed method to automatically tailor its appearance models to specific datasets makes it very flexible, allowing it to seamlessly take advantage of novel, potentially more sensitive and specific MRI acquisitions as they are developed. Although not extensively tested, the proposed method should make it possible, with minimal adjustments, to segment data acquired with advanced research sequences such as MP2RAGE (Marques et al., 2010), DIR (Redpath and Smith, 1994), FLAIR2 (Wiggermann et al., 2016) or T2* (Anderson et al., 2001), both at conventional and at ultra-high magnetic field strengths. We are currently pursuing several extensions of the proposed method, including the ability to subsequently create cortical surfaces and parcellations in FreeSurfer, as well as a dedicated version for longitudinal data (Cerri et al., 2020).


Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 765148, as well as from the National Institute Of Neurological Disorders and Stroke under project number R01NS112161. Hartwig R. Siebner holds a 5-year professorship in precision medicine at the Faculty of Health Sciences and Medicine, University of Copenhagen, which is sponsored by the Lundbeck Foundation (Grant Nr. R186-2015-2138). Mark Mühlau was supported by the German Research Foundation (Priority Program SPP2177, Radiomics: Next Generation of Biomedical Imaging) – project number 428223038.

Appendix A. Parameter optimization in SAMSEG

We here describe how we perform the optimization of p(θ|D) with respect to θ in the original SAMSEG model. We follow a coordinate ascent approach, in which a limited-memory BFGS optimization of θ_l is interleaved with a generalized EM (GEM) optimization of the remaining parameters θ_d. The GEM algorithm was derived in (Van Leemput et al., 1999) based on (Wells et al., 1996), and is repeated here for the sake of completeness. It iteratively constructs a tight lower bound to the objective function by computing the soft label assignments ω_{i,k} based on the current estimate of θ_d (Eq. (2)), and subsequently improves the lower bound (and therefore the objective function) using the following set of analytical update equations for these parameters:

$$\boldsymbol{\mu}_k \leftarrow \mathbf{m}_k \quad\text{and}\quad \boldsymbol{\Sigma}_k \leftarrow \mathbf{V}_k, \qquad
\begin{pmatrix} \mathbf{c}_1 \\ \vdots \\ \mathbf{c}_N \end{pmatrix} \leftarrow
\begin{pmatrix}
\mathbf{A}^T \mathbf{S}_{1,1} \mathbf{A} & \cdots & \mathbf{A}^T \mathbf{S}_{1,N} \mathbf{A} \\
\vdots & \ddots & \vdots \\
\mathbf{A}^T \mathbf{S}_{N,1} \mathbf{A} & \cdots & \mathbf{A}^T \mathbf{S}_{N,N} \mathbf{A}
\end{pmatrix}^{-1}
\begin{pmatrix}
\mathbf{A}^T \bigl( \sum_{n=1}^{N} \mathbf{S}_{1,n}\, \mathbf{r}^{1,n} \bigr) \\
\vdots \\
\mathbf{A}^T \bigl( \sum_{n=1}^{N} \mathbf{S}_{N,n}\, \mathbf{r}^{N,n} \bigr)
\end{pmatrix},$$

where

$$\mathbf{m}_k = \frac{\sum_{i=1}^{I} \omega_{i,k}\,(\mathbf{d}_i - \mathbf{C}\boldsymbol{\phi}_i)}{N_k} \quad\text{with}\quad N_k = \sum_{i=1}^{I}\omega_{i,k},$$

$$\mathbf{V}_k = \frac{\sum_{i=1}^{I} \omega_{i,k}\,(\mathbf{d}_i - \mathbf{C}\boldsymbol{\phi}_i - \mathbf{m}_k)(\mathbf{d}_i - \mathbf{C}\boldsymbol{\phi}_i - \mathbf{m}_k)^T}{N_k},$$

$$\mathbf{A} = \begin{pmatrix} \phi_1^1 & \cdots & \phi_P^1 \\ \vdots & \ddots & \vdots \\ \phi_1^I & \cdots & \phi_P^I \end{pmatrix}, \qquad
\mathbf{S}_{m,n} = \mathrm{diag}\bigl(s_i^{m,n}\bigr), \qquad
\mathbf{r}^{m,n} = \bigl(r_1^{m,n}, \ldots, r_I^{m,n}\bigr)^T$$

and

$$s_i^{m,n} = \sum_{k=1}^{K} s_{i,k}^{m,n}, \qquad
s_{i,k}^{m,n} = \omega_{i,k}\,\bigl(\boldsymbol{\Sigma}_k^{-1}\bigr)_{m,n}, \qquad
r_i^{m,n} = d_i^{\,n} - \frac{\sum_{k=1}^{K} s_{i,k}^{m,n}\,(\boldsymbol{\mu}_k)_n}{\sum_{k=1}^{K} s_{i,k}^{m,n}}.$$
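To make the structure of these updates concrete, a small numpy sketch of the m_k, N_k and V_k computations is given below (array shapes and variable names are assumptions, this is not the FreeSurfer implementation, and the bias field coefficient update is omitted):

```python
import numpy as np

def gaussian_updates(d, phi, C, omega):
    """Closed-form class mean and covariance updates weighted by soft assignments.
    d:     (I, N) voxel intensities (I voxels, N contrasts)
    phi:   (I, P) bias field basis functions evaluated at each voxel
    C:     (N, P) current bias field coefficients
    omega: (I, K) soft label assignments for K classes"""
    corrected = d - phi @ C.T                  # d_i - C * phi_i for every voxel
    N_k = omega.sum(axis=0)                    # effective number of voxels per class
    m = (omega.T @ corrected) / N_k[:, None]   # class means m_k, shape (K, N)
    V = []
    for k in range(omega.shape[1]):
        diff = corrected - m[k]
        V.append((omega[:, k, None] * diff).T @ diff / N_k[k])
    return m, np.stack(V)                      # covariances V_k, shape (K, N, N)
```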

Appendix B. Parameter optimization

Here we describe how we perform the optimization of p(θ, θles|D) with respect to θ and θles in the augmented model of Sec. 3 with the decoder outputs fi(h) all clamped to value 1. In that case, the model can be reformulated in the same form as the original SAMSEG model, so that the same optimization strategy can be used. In particular, lesions can be considered to form an extra class (with index K + 1) in a SAMSEG model with K + 1 labels, provided that the mesh vertex label probabilities

$$\tilde{\alpha}_j^k =
\begin{cases}
\beta_j & \text{if } k = K+1 \text{ (lesion)}, \\
\alpha_j^k\,(1 - \beta_j) & \text{otherwise}
\end{cases}$$

are used instead of the original $\alpha_j^k$’s in the atlas interpolation model of Eq. (1).
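A minimal sketch of how the original label probabilities can be augmented with a lesion class in this way (numpy; array names are illustrative):

```python
import numpy as np

def augment_label_probabilities(alphas, betas):
    """Append a lesion class (index K+1) to the mesh-node label probabilities:
    each node j gets lesion probability beta_j, while its original K label
    probabilities are rescaled by (1 - beta_j) so that they still sum to one.
    alphas: (J, K) array, betas: (J,) array with values in [0, 1]."""
    rescaled = alphas * (1.0 - betas)[:, None]
    return np.concatenate([rescaled, betas[:, None]], axis=1)  # shape (J, K + 1)
```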

The optimization described in Appendix A does require one modification because of the prior p(θles|θd) binding the means and variances of the WM and lesion classes together. The following altered update equations for these parameters guarantee that the EM lower bound, and therefore the objective function, is improved in each iteration of the GEM algorithm:

$$\boldsymbol{\mu}_{WM} \leftarrow \Bigl( N_{WM}\,\mathbf{I} + \tfrac{v\,N_{WM}}{v + N_{WM}}\,\boldsymbol{\Sigma}_{WM}\boldsymbol{\Sigma}_{les}^{-1} \Bigr)^{-1}
\Bigl( N_{WM}\,\mathbf{m}_{WM} + \tfrac{v\,N_{WM}}{v + N_{WM}}\,\boldsymbol{\Sigma}_{WM}\boldsymbol{\Sigma}_{les}^{-1}\,\mathbf{m}_{les} \Bigr),$$

$$\boldsymbol{\Sigma}_{WM} \leftarrow \frac{N_{WM}\,\mathbf{V}_{WM} + \boldsymbol{\Sigma}_{les}\,\boldsymbol{\Sigma}_{WM}^{-1}\,\boldsymbol{\Psi}_{les}}{N_{WM} + N_{les} + N + 2},$$

$$\boldsymbol{\mu}_{les} \leftarrow \frac{N_{les}\,\mathbf{m}_{les} + v\,\boldsymbol{\mu}_{WM}}{N_{les} + v},$$

$$\boldsymbol{\Sigma}_{les} \leftarrow \frac{\boldsymbol{\Psi}_{les} + v\,\kappa\,\boldsymbol{\Sigma}_{WM}}{N_{les} + v},$$

where

$$\boldsymbol{\Psi}_{les} = \frac{N_{les}\,v}{N_{les} + v}\,(\mathbf{m}_{les} - \boldsymbol{\mu}_{WM})(\mathbf{m}_{les} - \boldsymbol{\mu}_{WM})^T + N_{les}\,\mathbf{V}_{les}.$$
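The lesion-class updates above can be transcribed directly into numpy as follows (a sketch under the reconstruction given here; inputs are assumed to carry the meanings defined in the text):

```python
import numpy as np

def lesion_class_updates(m_les, V_les, N_les, mu_wm, Sigma_wm, v, kappa):
    """MAP-style updates for the lesion class: the hyperparameters v and kappa
    shrink the lesion mean towards the white matter mean and the lesion
    covariance towards the (scaled) white matter covariance."""
    diff = m_les - mu_wm
    Psi_les = (N_les * v / (N_les + v)) * np.outer(diff, diff) + N_les * V_les
    mu_les = (N_les * m_les + v * mu_wm) / (N_les + v)
    Sigma_les = (Psi_les + v * kappa * Sigma_wm) / (N_les + v)
    return mu_les, Sigma_les, Psi_les
```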

Appendix C. Estimating lesion probabilities

We here describe how we approximate p(z_i = 1 | d_i, θ̂) using Monte Carlo sampling. We use a Markov chain Monte Carlo (MCMC) approach to sample triplets {θ_les^(s), z^(s), h^(s)} from the distribution p(θ_les, z, h | D, θ̂): starting from an initial lesion segmentation z^(0) obtained from the parameter estimation procedure described in Appendix B, we use a blocked Gibbs sampler in which each variable is updated conditioned on the others:

$$\boldsymbol{\Sigma}_{les}^{(s+1)} \sim p(\boldsymbol{\Sigma}_{les} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{z}^{(s)}) = \mathrm{IW}\bigl(\boldsymbol{\Sigma}_{les} \,\big|\, \boldsymbol{\Psi}_{les}^{(s)} + v\,\kappa\,\hat{\boldsymbol{\Sigma}}_{WM},\; N_{les}^{(s)} + v - N - 2\bigr),$$

$$\boldsymbol{\mu}_{les}^{(s+1)} \sim p(\boldsymbol{\mu}_{les} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{z}^{(s)}, \boldsymbol{\Sigma}_{les}^{(s+1)}) = \mathcal{N}\Bigl(\boldsymbol{\mu}_{les} \,\Big|\, \frac{N_{les}^{(s)}\,\mathbf{m}_{les}^{(s)} + v\,\hat{\boldsymbol{\mu}}_{WM}}{N_{les}^{(s)} + v},\; \frac{\boldsymbol{\Sigma}_{les}^{(s+1)}}{N_{les}^{(s)} + v}\Bigr),$$

$$\mathbf{h}^{(s+1)} \sim p(\mathbf{h} \mid \mathbf{z}^{(s)}) \simeq \mathcal{N}\bigl(\mathbf{h} \,\big|\, \boldsymbol{\mu}_{\upsilon}(\mathbf{z}^{(s)}),\; \mathrm{diag}\bigl(\boldsymbol{\sigma}_{\upsilon}^{2}(\mathbf{z}^{(s)})\bigr)\bigr),$$

$$\mathbf{z}^{(s+1)} \sim p(\mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{h}^{(s+1)}, \boldsymbol{\theta}_{les}^{(s+1)}) = \prod_{i=1}^{I} p\bigl(z_i \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}}, \mathbf{h}^{(s+1)}, \boldsymbol{\theta}_{les}^{(s+1)}\bigr),$$

where we use the encoder variational approximation obtained during the training of the lesion shape model (see Sec. 3.1.2) to sample from h in the next-to-last step, and

$$p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}}, \mathbf{h}, \boldsymbol{\theta}_{les}) =
\frac{\mathcal{N}\bigl(\mathbf{d}_i \mid \boldsymbol{\mu}_{les} + \mathbf{C}\boldsymbol{\phi}_i,\, \boldsymbol{\Sigma}_{les}\bigr)\, f_i(\mathbf{h})\, \rho_i(\hat{\boldsymbol{\theta}}_l)}
{\sum_{l_i=1}^{K} \sum_{z_i=0}^{1} p(\mathbf{d}_i \mid l_i, z_i, \hat{\boldsymbol{\theta}}, \boldsymbol{\theta}_{les})\, p(z_i \mid \hat{\boldsymbol{\theta}}_l, \mathbf{h})\, p(l_i \mid \hat{\boldsymbol{\theta}}_l)}$$

in the last step. In these equations, the variables N_les^(s), m_les^(s), V_les^(s) and Ψ_les^(s) are as defined before, but computed using the voxel assignments ω_{i,les} = z_i^(s). Once S samples are obtained, we approximate p(z_i = 1 | d_i, θ̂) as

$$p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}}) \;\simeq\; \frac{1}{S} \sum_{s=1}^{S} p\bigl(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}}, \mathbf{h}^{(s)}, \boldsymbol{\theta}_{les}^{(s)}\bigr).$$

In our implementation, we use S = 50 samples, obtained after discarding the first 50 sweeps of the sampler (so-called “burn-in” phase). The algorithm repeatedly invokes the decoder and encoder networks of the lesion shape model described in Sec. 3.1.2. Since this shape model was trained in a specific isotropic space, the algorithm requires transitioning between this training space and subject space using an affine transformation. This is accomplished by resampling the input and output of the encoder and decoder, respectively, using linear interpolation.
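A schematic Python version of this Monte Carlo averaging is sketched below (the callable gibbs_sweep is hypothetical; it stands for one full pass of the blocked Gibbs sampler above and returns the per-voxel lesion probabilities of the last equation):

```python
import numpy as np

def approximate_lesion_posterior(gibbs_sweep, num_samples=50, burn_in=50):
    """Average per-voxel lesion probabilities over MCMC samples.
    gibbs_sweep() is a hypothetical callable performing one sweep of the blocked
    Gibbs sampler and returning an array of p(z_i = 1 | d_i, theta_hat, h, theta_les)
    values; the first burn_in sweeps are discarded."""
    for _ in range(burn_in):
        gibbs_sweep()                          # burn-in sweeps are discarded
    total = None
    for _ in range(num_samples):
        p = gibbs_sweep()
        total = p if total is None else total + p
    return total / num_samples                 # approximate posterior lesion map
```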

Footnotes

Declaration of Competing Interest

Hartwig R. Siebner has received honoraria as speaker from Sanofi Genzyme, Denmark and Novartis, Denmark, as consultant from Sanofi Genzyme, Denmark and as senior editor (NeuroImage) from Elsevier Publishers, Amsterdam, The Netherlands. He has received royalties as book editor from Springer Publishers, Stuttgart, Germany.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at 10.1016/j.neuroimage.2020.117471

1. Although selectively fusing several automatic methods has recently been shown to approach human performance (Carass et al., 2020).

References

  1. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mane D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viegas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X, 2015. Tensorflow: large-scale machine learning on heterogeneous distributed systems. 1603.04467.
  2. Adelman G, Rane SG, Villa KF, 2013. The cost burden of multiple sclerosis in the United States: a systematic review of the literature. J. Med. Econ 16 (5), 639–647. [DOI] [PubMed] [Google Scholar]
  3. Aït-Ali LS, Prima S, Hellier P, Carsin B, Edan G, Barillot C, 2005. STREM: A robust multidimensional parametric method to segment MS lesions in MRI In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3749, pp. 409–416. [DOI] [PubMed] [Google Scholar]
  4. Anderson L, Holden S, Davis B, Prescott E, Charrier C , Bunce N, Firmin D, Wonke B, Porter J, Walker J, Pennell D, 2001. Cardiovascular T2-star (T2* ) magnetic resonance for the early diagnosis of myocardial iron overload. Euro. Heart J. 22 (23), 2171–2179. [DOI] [PubMed] [Google Scholar]
  5. Ashburner J, Andersson JL, Friston KJ, 2000. Image registration using a symmetric prior – in three dimensions. Hum. Brain Mapp. 9 (4), 212–225. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Azevedo CJ, Cen SY, Khadka S, Liu S, Kornak J, Shi Y, Zheng L, Hauser SL, Pelletier D, 2018. Thalamic atrophy in multiple sclerosis: a magnetic resonance imaging marker of neurodegeneration throughout disease. Ann. Neurol. 83 (2), 223–234. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Bakshi R, Thompson AJ, Rocca MA, Pelletier D, Dousset V, Barkhof F, Inglese M, Guttmann CR, Horsfield MA, Filippi M, 2008. MRI in multiple sclerosis: current status and future prospects. Lancet Neurol. 7 (7), 615–625. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Barkhof F, Calabresi PA, Miller DH, Reingold SC, 2009. Imaging outcomes for neuroprotection and repair in multiple sclerosis trials. Nat. Rev. Neurol. 5 (5), 256–266. [DOI] [PubMed] [Google Scholar]
  9. Battaglini M, Jenkinson M, De Stefano N, 2012. Evaluating and reducing the impact of white matter lesions on brain volume measurements. Hum. Brain Mapp. 33 (9), 2062–2071. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Bazin P-L , Pham DL, 2008. Homeomorphic brain image segmentation with topological and statistical atlases. Med. Image Anal. 12 (5), 616–625 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Blystad I, Håkansson I, Tisell A, Ernerudh J, Smedby Ö, Lundberg P, Larsson E-M, 2015. Quantitative MRI for analysis of active multiple sclerosis lesions without gadolinium-based contrast agent. Am. J. Neuroradiol. 37 (1), 94–100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Bricq S, Collet C, Armspach JP, 2008. Lesions detection on 3D brain MRI using trimmmed likelihood estimator and probabilistic atlas. In: Proceedings of the 2008 Fifth IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Proceedings, ISBI, pp. 93–96. [Google Scholar]
  13. Carass A, Roy S, Gherman A, Reinhold JC, Jesson A, Arbel T, Maier O, Handels H, Ghafoorian M, Platel B, Birenbaum A, Greenspan H, Pham DL , Crainiceanu CM, Calabresi PA, Prince JL, Roncal WR, Shinohara RT, Oguz I , 2020. Evaluating white matter lesion segmentations with refined Sørensen-Dice analysis. Sci. Rep. 10, 1–19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Carass A, Roy S, Jog A, Cuzzocreo JL, Magrath E, Gherman A, Button J, Nguyen J, Prados F, Sudre CH, Cardoso MJ, Cawley N, Ciccarelli O, Wheeler-Kingshott CAM, Ourselin S, Catanese L, Deshpande H, Maurel P, Commowick O, Barillot C, Tomas-Fernandez X , Warfield SK, Vaidya S, Chunduru A, Muthuganapathy R, Krishnamurthi G, Jesson A, Arbel T, Maier O, Handels H, Iheme LO, Unay D, Jain S, Sima DM, Smeets D, Ghafoorian M, Platel B, Birenbaum A, Greenspan H, Bazin P-L, Calabresi PA, Crainiceanu CM , Ellingsen LM, Reich DS, Prince JL, Pham DL, 2017. Longitudinal multiple sclerosis lesion segmentation: resource & challenge HHS public access. NeuroImage 148, 77–102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Ceccarelli A, Jackson J, Tauhid S, Arora A, Gorky J, Dell’Oglio E, Bakshi A, Chitnis T, Khoury SJ, Weiner HL, et al. , 2012. The impact of lesion in-painting and registration methods on voxel-based morphometry in detecting regional cerebral gray matter atrophy in multiple sclerosis. Am. J. Neuroradiol. 33 (8), 1579–1585. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Cerri S, Hoopes A, Greve DN, Mühlau M, Van Leemput K, 2020. A longitudinal method for simultaneous whole-brain and lesion segmentation in multiple sclerosis. In: Proceedings of the Third International Workshop in Machine Learning in Clinical Neuroimaging (accepted). [Google Scholar]
  17. Chard DT, Griffin CM, Parker GJM, Kapoor R, Thompson AJ, Miller DH, 2002. Brain atrophy in clinically early relapsing-remitting multiple sclerosis. Brain 125 (2), 327–337. [DOI] [PubMed] [Google Scholar]
  18. Chard DT, Jackson JS, Miller DH, Wheeler-Kingshott CA, 2010. Reducing the impact of white matter lesions on automated measures of brain gray and white matter volumes. J. Magn. Resonanc. Imaging 32 (1), 223–228. [DOI] [PubMed] [Google Scholar]
  19. Commowick O, Istace A, Kain M, Laurent B, Leray F, Simon M, Pop SC, Girard P, Améli R, Ferré J-C, Kerbrat A, Tourdias T, Cervenansky F, Glatard T, Beaumont J, Doyle S , Forbes F, Knight J, Khademi A, Mahbod A, Wang C, Mckinley R, Wagner F, Muschelli J, Sweeney E, Roura E, Lladó X, Santos MM, Santos WP, Silva-Filho AG, Tomas-Fernandez X, Urien H, Bloch I, Valverde S , Cabezas M, Vera-Olmos FJ, Malpica N, Guttmann C, Vukusic S, Edan G, Dojat M, Styner M, Warfield SK, Cotton F, Barillot C, 2018. Objective evaluation of multiple sclerosis lesion segmentation using a data management and processing infrastructure. Sci. Rep. 8 (13650). [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Cotton F, Kremer S, Hannoun S, Vukusic S , Dousset V, Roxana A, René A, Jean-Paul A, Bertrand A, Christian B, Isabelle B, Fabrice B, Claire B, Giovanni C, Frédéric C, Mikael C, Olivier C, François C, Jérôme DS, Vincent D, Françoise DD, Gilles E, Jean-Christophe F, Damien G, Tristan G, Sylvie G, Justine G, Rémy G, Charles G, Salem H, Fabrice H, Alexandre K, Stéphane K, Pierre L, de Champfleur Nicolas M, Jean-Philippe R, Jean-Amédée R, Dominique SM, Julien S, Bruno S, Ayman T, Thomas T, Sandra V, 2015. OFSEP, a nationwide cohort of people with multiple sclerosis: consensus minimal MRI protocol 42, 133–140. [DOI] [PubMed] [Google Scholar]
  21. Danelakis A, Theoharis T, Verganelakis DA , 2018. Survey of automated multiple sclerosis lesion segmentation techniques on magnetic resonance imaging. Comput. Med. Imaging Graph. 70, 83–100. [DOI] [PubMed] [Google Scholar]
  22. Dempster AP, Laird NM, Rubin DB, 1977. Maximum likelihood from incomplete data via the em algorithm. J. R. Stat. Soci. Ser. B (Methodol.) 39 (1), 1–38. [Google Scholar]
  23. Filippi M, Rocca M, Arnold D, Bakshi R, Barkhof F, De Stefano N , Fazekas F, Frohman E, Wolinsky J, 2006. EFNS guidelines on the use of neuroimaging in the management of multiple sclerosis. Eur. J. Neurol. 13 (4), 313–325 . [DOI] [PubMed] [Google Scholar]
  24. Fischl B, 2012. FreeSurfer. NeuroImage 62 (2), 774–781. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. García-Lorenzo D, Francis S, Narayanan S, Arnold DL, Collins DL, 2013. Review of automatic segmentation methods of multiple sclerosis white matter lesions on conventional magnetic resonance imaging. Med. Image Anal. 17 (1), 1–18. [DOI] [PubMed] [Google Scholar]
  26. García-Lorenzo D, Prima S, Arnold DL, Collins DL, Barillot C, 2011. Trimmed-likelihood estimation for focal lesions and tissue segmentation in multisequence MRI for multiple sclerosis. IEEE Trans. Med. Imaging 30 (8), 1455–1467. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Gelineau-Morel R, Tomassini V, Jenkinson M, Johansen-Berg H, Matthews PM, Palace J, 2012. The effect of hypointense white matter lesions on automated gray matter segmentation in multiple sclerosis. Hum. Brain Mapp. 33 (12), 2802–2814. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Geurts JJ, Calabrese M, Fisher E, Rudick RA, 2012. Measurement and clinical effect of grey matter pathology in multiple sclerosis. Lancet Neurol. 11 (12), 1082–1092. [DOI] [PubMed] [Google Scholar]
  29. Goldenberg MM, 2012. Multiple sclerosis review. Pharm. Therap. 37 (3), 175. [PMC free article] [PubMed] [Google Scholar]
  30. Griffanti L, Zamboni G, Khan A, Li L, Bonifacio G, Sundaresan V, Schulz UG, Kuker W, Battaglini M, Rothwell PM, Jenkinson M, 2016. BIANCA (Brain Intensity AbNormality Classification Algorithm): a new tool for automated segmentation of white matter hyperintensities. NeuroImage 141 (1), 191–205. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Guttmann CR, Kikinis R, Anderson MC, Jakab M, Warfield SK, Killiany RJ, Weiner HL, Jolesz FA, 1999. Quantitative follow-up of patients with multiple sclerosis using MRI: Reproducibility. J. Magn. Resonanc. Imaging 9 (4), 509–518. [DOI] [PubMed] [Google Scholar]
  32. Houtchens MK, Benedict RH, Killiany R, Sharma J, Jaisani Z, Singh B, Weinstock-Guttman B, Guttmann CR, Bakshi R, 2007. Thalamic atrophy and cognition in multiple sclerosis. Neurology 69 (12), 1213–1223. [DOI] [PubMed] [Google Scholar]
  33. Huber PJ, 1981. Robust Statistics. John Wiley and Sons, New York. [Google Scholar]
  34. Ioffe S, Szegedy C, 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the Thirty-second International Conference on International Conference on Machine Learning, 37, pp. 448–456. [Google Scholar]
  35. Jain S, Sima DM, Ribbens A, Cambron M, Maertens A, Hecke WV, Mey JD, Barkhof F, Steenwijk MD, Daams M, Maes F, Huffel SV, Vrenken H, Smeets D, 2015. Automatic segmentation and volumetry of multiple sclerosis brain lesions from MR images. NeuroImage: Clin. 8, 367–375. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Kikinis R , Guttmann CR, Metcalf D, Wells WM, Ettinger GJ, Weiner HL, Jolesz FA, 1999. Quantitative follow-up of patients with multiple sclerosis using MRI: Technical aspects. J. Magn. Resonanc. Imaging 9 (4), 519–530. [DOI] [PubMed] [Google Scholar]
  37. Kingma DP, Ba J, 2014. Adam: a method for stochastic optimization. 1412.6980.
  38. Kingma DP, Welling M, 2013. Auto-encoding variational Bayes. 1312.6114.
  39. Liu J, Smith CD, Chebrolu H, 2009. Automatic multiple sclerosis detection based on integrated square estimation. In: Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 31–38. [Google Scholar]
  40. Lövblad K-O, Anzalone N, Dörfler A, Essig M, Hurwitz B, Kappos L, Lee S-K, Filippi M, 2010. MR imaging in multiple sclerosis: review and recommendations for current practice. Am. J. Neuroradiol. 31 (6), 983–989. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Marques JP , Kober T, Krueger G, der Zwaag W.v., de Moortele P-FV, Gruetter R , 2010. MP2RAGE, a self bias-field corrected sequence for improved segmentation and T1-mapping at high field. NeuroImage 49 (2), 1271–1281 . [DOI] [PubMed] [Google Scholar]
  42. McKinley R, Wepfer R, Aschwanden F, Grunder L, Muri R, Rummel C, Verma R, Weisstanner C, Reyes M, Salmen A, Chan A, Wagner F, Wiest R, 2019. Simultaneous lesion and neuroanatomy segmentation in multiple sclerosis using deep neural networks. 1901.07419. [DOI] [PMC free article] [PubMed]
  43. Mühlau M, Buck D, Förschler A, Boucard CC, Arsic M, Schmidt P, Gaser C, Berthele A, Hoshi M, Jochim A, Kronsbein H , Zimmer C, Hemmer B, Ilg R, 2013. White-matter lesions drive deep gray-matter atrophy in early multiple sclerosis: support from structural MRI. Multiple Scleros. J. 19 (11), 1485–1492. [DOI] [PubMed] [Google Scholar]
  44. Nakamura K, Fisher E, 2009. Segmentation of brain magnetic resonance images for measurement of gray matter atrophy in multiple sclerosis patients. Neuroimage 44 (3), 769–776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Prastawa M, Gerig G, 2008. Automatic MS lesion segmentation by outlier detection and information theoretic region partitioning. MIDAS J.. [Google Scholar]
  46. Puonti O , Iglesias JE, Van Leemput K, 2016. Fast and sequence-adaptive whole-brain segmentation using parametric Bayesian modeling. NeuroImage 143, 235–249. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Puonti O, Van Leemput K, 2016. Simultaneous whole-brain segmentation and white matter lesion detection using contrast-adaptive probabilistic models In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9556, pp. 9–20. [Google Scholar]
  48. Redpath TW, Smith FW, 1994. Technical note: use of a double inversion recovery pulse sequence to image selectively grey or white brain matter. Br. J. Radiol. 67 (804), 1258–1263. [DOI] [PubMed] [Google Scholar]
  49. Rezende DJ, Mohamed S, Wierstra D, 2014. Stochastic backpropagation and approximate inference in deep generative models. In: Proceedings of the 31st International Conference on International Conference on Machine Learning, 32, pp. 1278–1286. [Google Scholar]
  50. Rissanen E, Tuisku J, Rokka J , Paavilainen T, Parkkola R, Rinne JO, Airas L, 2014. In vivo detection of diffuse inflammation in secondary progressive multiple sclerosis using PET imaging and the radioligand11C-PK11195. J. Nucl. Med. 55 (6), 939–944. [DOI] [PubMed] [Google Scholar]
  51. Rosati G, 2001. The prevalence of multiple sclerosis in the world: an update. Neurol. Sci. 22 (2), 117–139. [DOI] [PubMed] [Google Scholar]
  52. Rousseau F, Blanc F, De Sèze J, Rumbach L, Armspach JP, 2008. An a contrario approach for outliers segmentation: application to multiple sclerosis in MRI. In: Proceedings of the 2008 Fifth IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 9–12 . [Google Scholar]
  53. Schmidt P, Gaser C, Arsic M, Buck D, Förschler A, Berthele A, Hoshi M, Ilg R, Schmid VJ, Zimmer C, Hemmer B, Mühlau M. 2012. An automated tool for detection of FLAIR-hyperintense white-matter lesions in Multiple Sclerosis. NeuroImage 59 (4), 3774–3783. [DOI] [PubMed] [Google Scholar]
  54. Sdika M, Pelletier D, 2009. Nonrigid registration of multiple sclerosis brain images using lesion inpainting for morphometry or lesion mapping. Hum. Brain Mapp. 30 (4), 1060–1067. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Shiee N, Bazin P-L, Ozturk A, Reich DS, Calabresi PA, Pham DL, 2010. A topology-preserving approach to the segmentation of brain images with multiple sclerosis lesions. NeuroImage 49 (2), 1524–1535. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Smeets D, Ribbens A, Sima DM, Cambron M, Horakova D, Jain S, Maertens A, Van Vlierberghe E, Terzopoulos V, Van Binst A-M, Vaneckova M, Krasensky J, Uher T, Seidl Z, De Keyser J, Nagels G, De Mey J, Havrdova E, Van Hecke W, 2016. Reliable measurements of brain atrophy in individual patients with multiple sclerosis. Brain Behav. 6 (9), e00518. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Smith SM, Zhang Y, Jenkinson M, Chen J, Matthews P, Federico A, De Stefano N. 2002. Accurate, robust, and automated longitudinal and cross-sectional brain change analysis. NeuroImage 17 (1), 479–489. [DOI] [PubMed] [Google Scholar]
  58. Sormani MP, Bruzzi P, 2013. MRI lesions as a surrogate for relapses in multiple sclerosis: a meta-analysis of randomised trials. Lancet Neurol. 12 (7), 669–676. [DOI] [PubMed] [Google Scholar]
  59. Styner M, Lee J, Chin B , Chin M, Commowick O, Tran H-H, Jewells V, Warfield S. 2008. 3D segmentation in the clinic: a grand challenge II: MS lesion segmentation. MIDAS J. 1–6. [Google Scholar]
  60. Sudre CH , Cardoso MJ, Bouvy WH , Biessels GJ, Barnes J, Ourselin S, 2015. Bayesian model selection for pathological neuroimaging data applied to white matter lesion segmentation. IEEE Trans. Med. Imaging 34 (10), 2079–2102. [DOI] [PubMed] [Google Scholar]
  61. Thompson AJ, Banwell BL, Barkhof F, Carroll WM, Coetzee T, Comi G, Correale J, Fazekas F, Filippi M, Freedman MS, Fujihara K, Galetta SL, Hartung HP, Kappos L, Lublin FD, Marrie RA, Miller AE, Miller DH, Montalban X, Mowry EM, Sorensen PS, Tintoré M, Traboulsee AL, Trojano M, Uitdehaag BM, Vukusic S, Waubant E, Weinshenker BG, Reingold SC, Cohen JA , 2018. Diagnosis of multiple sclerosis: 2017 revisions of the McDonald criteria. Lancet Neurol. 17 (2), 162–173. [DOI] [PubMed] [Google Scholar]
  62. Valverde S, Cabezas M, Roura E, González-Villà S, Pareto D, Vilanova JC, Ramió-Torrentà L, Rovira Á, Oliver A, Lladó X, 2017. Improving automated multiple sclerosis lesion segmentation with a cascaded 3D convolutional neural network approach. NeuroImage 155, 159–168. doi: 10.1016/j.neuroimage.2017.04.034. [DOI] [PubMed] [Google Scholar]
  63. Valverde S, Salem M, Cabezas M, Pareto D, Vilanova JC, Ramió-Torrentà L. Rovira Á, Salvi J, Oliver A, Lladó X. 2019. One-shot domain adaptation in multiple sclerosis lesion segmentation using convolutional neural networks. NeuroImage: Clin. 21, 101638. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Van Leemput K, 2009. Encoding probabilistic brain atlases using Bayesian inference. IEEE Trans. Med. Imaging 28 (6), 822–837. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Van Leemput K, Maes F, Vandermeulen D, Colchester A, Suetens P, 2001. Automated segmentation of multiple sclerosis lesions by model outlier detection.. IEEE Trans. Med. Imaging 20 (8), 677–688. [DOI] [PubMed] [Google Scholar]
  66. Van Leemput K, Maes F, Vandermeulen D, Suetens P , 1999. Automated model-based bias field correction of MR images of the brain. IEEE Trans. Med. Imaging 18 (10), 885–896. [DOI] [PubMed] [Google Scholar]
  67. Vrenken H, Jenkinson M, Horsfield M, Battaglini M, Van Schijndel R, Rostrup E, Geurts J, Fisher E, Zijdenbos A, Ashburner J, et al. , 2013. Recommendations to improve imaging and analysis of brain lesion load and atrophy in longitudinal studies of multiple sclerosis. J. Neurol. 260 (10), 2458–2471. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Wells WM, Grimson WE, Kikinis R, Jolesz FA, 1996. Adaptive segmentation of MRI data. IEEE Trans. Med. Imaging 15 (4), 429–442. [DOI] [PubMed] [Google Scholar]
  69. Wiggermann V , Hernandez-Torres E, Traboulsee A, Li DK, Rauscher A, 2016. FLAIR2: a combination of FLAIR and T2 for improved MS lesion detection. Am. J. Neuroradiol. 37 (2), 259–265 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Zijdenbos A, Forghani R, Evans A. 1998. Automatic quantification of MS lesions in 3D MRI brain data sets: validation of INSECT In: Proceedings of Medical Image Computing and Computer-Assisted Intervention – MICCAI’ 98 Springer, pp. 439–448. [Google Scholar]
  71. Zivadinov R, Uher T, Hagemeier J, Vaneckova M, Ramasamy DP, Tyblova M, Bergsland N, Seidl Z, Dwyer MG, Krasensky J, Havrdova E, Horakova D, 2016. A serial 10-year follow-up study of brain atrophy and disability progression in RRMS patients. Multiple Scleros. J. 22 (13), 1709–1718. [DOI] [PubMed] [Google Scholar]
