Human Brain Mapping. 2016 Oct 11;38(2):599–616. doi: 10.1002/hbm.23432

Simultaneous total intracranial volume and posterior fossa volume estimation using multi‐atlas label fusion

Yuankai Huo 1, Andrew J Asman 1, Andrew J Plassard 2, Bennett A Landman 1,2,3,4,5
PMCID: PMC5225112  NIHMSID: NIHMS821395  PMID: 27726243

Abstract

Total intracranial volume (TICV) is an essential covariate in brain volumetric analyses. The prevalent brain imaging software packages provide automatic TICV estimates. FreeSurfer and FSL estimate TICV using a scaling factor, while SPM12 accumulates the probabilities of brain tissues. None of the three provides an explicit skull/CSF boundary (SCB), since it is challenging to distinguish these dark structures in a T1‐weighted image. However, an explicit SCB not only leads to a natural way of obtaining TICV (i.e., counting voxels inside the skull) but also allows sub‐definitions of TICV, for example, the posterior fossa volume (PFV). In this article, we propose to use multi‐atlas label fusion to obtain TICV and PFV simultaneously. The main contributions are: (1) TICV and PFV are simultaneously obtained with an explicit SCB from a single T1‐weighted image. (2) TICV and PFV labels are added to the widely used BrainCOLOR atlases. (3) A detailed mathematical derivation of non‐local spatial STAPLE (NLSS) label fusion is presented. As the skull is clearly distinguishable in CT images, we use a semi‐manual procedure to obtain atlases with TICV and PFV labels from 20 subjects who have both an MR and a CT scan. The proposed method provides simultaneous TICV and PFV estimation while achieving more accurate TICV estimation than FreeSurfer, FSL, SPM12, and the previously proposed STAPLE‐based approach. The newly developed TICV and PFV labels for the OASIS BrainCOLOR atlases provide acceptable performance, which enables simultaneous TICV and PFV estimation during whole brain segmentation. The NLSS method and the new atlases have been made freely available. Hum Brain Mapp 38:599–616, 2017. © 2016 Wiley Periodicals, Inc.

Keywords: intracranial volume, TICV, posterior fossa, multi‐atlas segmentation, non‐local spatial staple, NLSS, label fusion

INTRODUCTION

Total intracranial volume (TICV), the volume within the cranium, is the total volume of gray matter (GM), white matter (WM), cerebrospinal fluid (CSF), and meninges [Mathalon et al., 1993]. In volumetric analyses, many inter‐subject differences can be explained by differences in head size [Barnes et al., 2010]. To reduce variability, TICV has been widely used as a covariate in regional and whole brain volumetric analyses [Barnes et al., 2010; Farias et al., 2012; Nordenskjold et al., 2013; Peelle et al., 2012; Perlaki et al., 2014; Westman et al., 2013; Whitwell et al., 2001]. Compared with whole brain volume (WBV) [Smith, 2002], TICV is often preferred since it provides an estimate of premorbid brain size [Davis and Wright, 1977; Perneczky et al., 2010].

Manual delineation of the cranial vault is the gold standard for measuring TICV from magnetic resonance (MR) images [Whitwell et al., 2001]. However, this labor‐intensive and time‐consuming procedure is impractical for large cohorts. As a result, automatic TICV estimation methods are appealing. One family of methods directly applies automatic skull‐stripping techniques to TICV estimation for particular imaging modalities. In some MR modalities [e.g., T2‐weighted (T2w) and proton density (PD)], the skull is dark while the CSF is bright. Therefore, the brighter CSF and brain tissues can be segmented from the darker skull by skull‐stripping, and the total volume of the CSF and brain tissues is used as TICV. For instance, the brain extraction tool (BET) and the brain surface extractor (BSE) achieved accurate TICV estimation using PD images [Hartley et al., 2006]. However, both skull and CSF are dark in other modalities [e.g., T1‐weighted (T1w)], in which skull‐stripping techniques typically yield less accurate TICV estimates because of the low contrast between CSF and skull. To derive accurate TICV estimates from such MR modalities, a number of approaches have been developed and evaluated [Aguilar et al., 2015; Ananth et al., 2014; Ashburner and Friston, 2005; Buckner et al., 2004; Driscoll et al., 2009; Hansen et al., 2015; Keihaninejad et al., 2010; Lemieux et al., 2003; Pengas et al., 2009; Smith et al., 2004]. Among these methods, three of the most prevalent are integrated in FreeSurfer (FS) [Dale et al., 1999], FMRIB Software Library (FSL) [Smith et al., 2004], and Statistical Parametric Mapping (SPM12). In FreeSurfer, the estimated TIV (eTIV) tool estimates TICV from the affine transformation between the target image and a template [Buckner et al., 2004]. The idea is that TICV is correlated with the determinant of the transformation matrix (called the “scaling factor”) that aligns the target image with the template.
SIENAX, part of FSL, also provides a volumetric scaling factor as a normalization for head size [Fein et al., 2004]. This scaling factor is the determinant of the scaling matrix from affine registration, which rescales the target image's skull to the template's skull [Smith et al., 2002]. Therefore, neither FreeSurfer nor FSL provides explicit skull/CSF boundaries (SCB) when estimating TICV. SPM provides two different approaches for TICV estimation (e.g., as implemented in SPM5 and SPM8). The first approach, called the reverse brain mask (RBM) method, non‐rigidly registers a TICV mask from template space to individual space [Boyes et al., 2006; Keihaninejad et al., 2010]. The second approach accumulates the tissue probabilities of GM, WM, and CSF in standard space using the “New Segment” toolbox [Ashburner and Friston, 2005; Weiskopf et al., 2011]. The first approach provides a TICV mask in individual space, whereas the second produces more accurate TICV estimates [Ridgway et al., 2011]. More recently, SPM12 introduced a “Tissue Volumes” toolbox, which combines the advantages of the two previous approaches in a unified framework [Malone et al., 2015]. As a result, SPM12 achieves superior TICV estimates compared with previous SPM versions [Malone et al., 2015]. However, SPM12 provides the TICV value and the related SCB in standard space rather than in individual space, so additional user effort is required to obtain a consistent TICV value and SCB in individual space.

FreeSurfer, FSL, and SPM12 are three of the most well‐validated and widely accepted TICV estimation software packages. However, none of them estimates TICV by counting the voxels inside the skull (or SCB), which is a natural way of calculating TICV. The reason is that it is difficult to obtain adequate intensity contrast between skull and CSF in MR T1‐weighted (T1w) images (assuming that the thickness of the dura is negligible). To obtain the SCB, multispectral MR data [e.g., T2‐weighted (T2w), proton density (PD)], which show the skull more clearly, have been combined with T1w images in TICV estimation [Hansen et al., 2015; Keihaninejad et al., 2010; Pengas et al., 2009; Whitwell et al., 2001]. However, it is still essential to measure TICV with an explicit SCB from a single T1w image, since: (1) T2w and PD images are not available in all datasets, whereas T1w images are a commonly available structural MR sequence; (2) TICV estimation with an SCB not only leads to a natural way of obtaining TICV (counting voxels inside the skull) but also allows us to calculate sub‐region volumes, for example, the posterior fossa volume (PFV), which is essential in investigating cerebellum development [e.g., Badie et al., 1995; Nyland and Krogness, 1978; Sgouros et al., 2006].

TICV estimation using STAPLE label fusion [Warfield et al., 2004] has been proposed to derive the SCB from a single T1w image [Schaerer et al., 2012]. However, the STAPLE label fusion algorithm has known limitations [Van Leemput and Sabuncu, 2014], which have led to extensions of STAPLE [Akhondi‐Asl et al., 2014; Asman and Landman, 2011, 2012, 2013, 2014; Commowick and Warfield, 2010; Landman et al., 2012; Rohlfing et al., 2003a, 2003b; Shen et al., 2015]. Recently, an improved method called Non‐local Spatial STAPLE (NLSS) label fusion, a combination of Spatial STAPLE [Asman and Landman, 2012] and Non‐local STAPLE (NLS) [Asman and Landman, 2013], has shown advantages over STAPLE, Spatial STAPLE, and NLS in brain segmentation [Asman et al., 2015; Asman and Landman, 2014; Huo et al., 2015, 2016a, 2016b], optic nerve segmentation [Harrigan et al., 2014, 2015, 2016; Panda et al., 2014], and spinal cord segmentation [Asman et al., 2014]. Therefore, using NLSS for TICV estimation is promising, as it takes into account both spatially varying performance and non‐local intensity correspondence. Although the NLSS method has been successfully applied in different applications, its mathematical derivation has not yet been published, which hinders other researchers seeking to implement and use NLSS.

In this article, we propose to use the NLSS approach to estimate TICV and PFV simultaneously from a single MR T1w image. The main contributions of this work are: (1) TICV and PFV are simultaneously obtained with an explicit SCB. (2) We develop TICV and PFV labels for 45 images of the widely used OASIS dataset under the BrainCOLOR protocol [Klein et al., 2010; Landman and Warfield, 2012] and make a subset freely available online (https://masi.vuse.vanderbilt.edu/index.php/TICV_BC2atlases). (3) This is the first journal presentation of the NLSS method with a detailed mathematical derivation. In the multi‐atlas segmentation framework, pairs of T1w images and TICV labels (atlases) are essential [Iglesias and Sabuncu, 2015]. Normally, atlases are obtained by labor‐intensive manual tracing. However, since the skull has a much higher Hounsfield unit (HU) value than brain tissues [Feeman, 2010], we speed up atlas generation with a semi‐manual strategy that obtains TICV and PFV labels from a dataset of 20 paired MR and CT images. Then, the TICV and PFV labels are propagated to the BrainCOLOR atlases [Klein et al., 2010; Landman and Warfield, 2012] by deploying NLSS multi‐atlas segmentation. In leave‐one‐out evaluations and reproducibility analyses, the NLSS TICV estimation method demonstrates advantages over FreeSurfer, FSL, SPM12, and a previously proposed STAPLE TICV estimation approach. The new TICV and PFV labels in the OASIS BrainCOLOR atlases provide acceptable performance, which enables simultaneous whole brain segmentation and TICV and PFV estimation without additional time‐consuming non‐rigid registrations. Moreover, the NLSS tool is publicly available as open source software through the JIST software package (http://www.nitrc.org/projects/jist/) [Li et al., 2012; Lucas et al., 2010].

THEORY

The derivation of NLSS closely follows Spatial STAPLE [Asman and Landman, 2012] and NLS [Asman and Landman, 2013], which use the Expectation‐Maximization (EM) framework [Dempster et al., 1977; McLachlan and Krishnan, 2007]. Most of the derivations of STAPLE (section “STAPLE”), Spatial STAPLE (section “Spatial STAPLE”), and Non‐Local STAPLE (section “Non‐local STAPLE”) are left to the original works and are only summarized here. The notation follows STAPLE [Warfield et al., 2004].

Problem Definition

A target gray‐level image with $N$ voxels is represented as $I \in \mathbb{R}^{N\times 1}$. The corresponding latent true segmentation of the target image is $T \in \{0,1,\ldots,L-1\}^{N\times 1}$, where $\{0,1,\ldots,L-1\}$ is the set of $L$ possible labels for a given voxel $i$ ($i \in \{1,2,\ldots,N\}$). Since $T$ is unknown, the target labels are estimated using $R$ atlases (pairs of intensity images and label maps) with intensity values $A \in \mathbb{R}^{N\times R}$ and label decisions $D \in \{0,1,\ldots,L-1\}^{N\times R}$. In the STAPLE family of approaches, label fusion is treated as probabilistic estimation of the hidden true segmentation based on the performance of multiple atlases. The performance parameter $\theta_{js's}$ is the probability that the observed label is $s'$ given that the true label is $s$ for atlas $j$ ($j \in \{1,2,\ldots,R\}$). All $\theta_{js's}$ together form the matrix of performance parameters $\theta \in [0,1]^{R\times L\times L}$, where $[0,1]$ indicates that each $\theta_{js's}$ satisfies $0 \le \theta_{js's} \le 1$.

STAPLE

The full derivation of the STAPLE algorithm is available in [Warfield et al., 2004]. Briefly, the goal of STAPLE is to select the performance parameters θ, such that they maximize the complete log‐likelihood

$$\hat{\theta} = \arg\max_{\theta} \ln f(D, T \mid \theta) \qquad (1)$$

corresponding to the observed atlas decisions $D$ and the unobserved latent true labels $T$. Since $T$ is not available, the performance parameters are estimated within the EM framework. In the E‐step, the weight variables $W^{(k)} \in [0,1]^{L\times N}$ are derived from $\theta^{(k)}$, where $W_{si}^{(k)} \equiv f(T_i = s \mid D, \theta^{(k)})$ is the probability that the true label of voxel $i$ is $s$ at iteration $k$. Applying Bayes' rule and the assumed conditional independence between atlases, $W^{(k)}$ for a particular voxel and label is given by

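$$W_{si}^{(k)} = \frac{f(T_i = s)\prod_j f(D_{ij} = s' \mid T_i = s, \theta_j^{(k)})}{\sum_n f(T_i = n)\prod_j f(D_{ij} = s' \mid T_i = n, \theta_j^{(k)})} \qquad (2)$$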

where $f(T_i = s)$ is the prior probability that $s$ is the true label at voxel $i$ (discussed in section “Non‐local Spatial STAPLE”), $s' = D_{ij}$ is the label observed by atlas $j$, the sum over $n$ runs over all labels, and the product over $j$ runs over all atlases. Writing $\theta_{js's}^{(k)}$ as shorthand for $f(D_{ij} = s' \mid T_i = s, \theta_j^{(k)})$, Eq. (2) can be rewritten as

$$W_{si}^{(k)} = \frac{f(T_i = s)\prod_j \theta_{js's}^{(k)}}{\sum_n f(T_i = n)\prod_j \theta_{js'n}^{(k)}} \qquad (3)$$

The denominator is the partition function that enforces $\sum_s W_{si}^{(k)} = 1$.
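To make Eq. (3) concrete, the following pure-Python sketch computes the E-step posterior for a single voxel. It is an illustration under stated assumptions, not the authors' JIST implementation: the function name `staple_e_step` and the nested-list layout `theta[j][s_obs][s_true]` (standing for $\theta_{js's}$) are hypothetical choices made for readability.

```python
from math import prod


def staple_e_step(decisions, theta, prior):
    """STAPLE E-step (Eq. 3) for a single voxel i (illustrative sketch).

    decisions: list of observed labels D_ij, one per atlas j.
    theta: theta[j][s_obs][s_true] = f(D_ij = s_obs | T_i = s_true)  (assumed layout).
    prior: prior[s] = f(T_i = s).
    Returns W[s] = f(T_i = s | D, theta), normalized to sum to 1.
    """
    num_labels = len(prior)
    # Numerator of Eq. (3): prior times the product of per-atlas likelihoods.
    unnorm = [
        prior[s] * prod(theta[j][d][s] for j, d in enumerate(decisions))
        for s in range(num_labels)
    ]
    z = sum(unnorm)  # partition function (denominator of Eq. 3)
    return [u / z for u in unnorm]
```

With three atlases sharing the same performance matrix, two votes for label 1 outweigh one vote for label 0, and the returned posterior sums to 1 because of the partition function.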

Following the derivation of [Warfield et al., 2004], the M‐step updates the performance parameters at iteration $k+1$ as

$$\theta_j^{(k+1)} = \arg\max_{\theta_j} \sum_i E\left[\ln f(D_{ij} \mid T_i, \theta_j) \mid D, \theta_j^{(k)}\right] \qquad (4)$$

which can be solved as

$$\theta_{js's}^{(k+1)} = \frac{\sum_{i : D_{ij} = s'} W_{si}^{(k)}}{\sum_i W_{si}^{(k)}} \qquad (5)$$

where $\theta_{js's} \ge 0$ and $\sum_{s'} \theta_{js's} = 1$. The algorithm alternates between estimating the posterior of the true labels in the E‐step and updating the performance parameters in the M‐step.
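A matching sketch of the M-step in Eq. (5) for a single atlas follows; again the name `staple_m_step` and the data layout are hypothetical. Each column $s$ of the returned matrix is the fraction of posterior mass for true label $s$ that falls on voxels where the atlas reported $s'$, so each column sums to 1 as required.

```python
def staple_m_step(decisions, weights, num_labels):
    """STAPLE M-step (Eq. 5) for one atlas j (illustrative sketch).

    decisions: decisions[i] = D_ij, the label atlas j assigns to voxel i.
    weights: weights[i][s] = W_si from the E-step.
    Returns theta[s_obs][s_true] for this atlas.
    """
    theta = [[0.0] * num_labels for _ in range(num_labels)]
    for s in range(num_labels):
        denom = sum(w[s] for w in weights)  # sum_i W_si
        for s_obs in range(num_labels):
            # Sum W_si over the voxels where this atlas reported s_obs.
            numer = sum(w[s] for d, w in zip(decisions, weights) if d == s_obs)
            # Sketch choice: an unused true label gets a uniform column.
            theta[s_obs][s] = numer / denom if denom > 0 else 1.0 / num_labels
    return theta
```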

Spatial STAPLE

Spatial STAPLE (SS) is an extension of STAPLE in which the performance parameters $\theta$ are estimated at each voxel [Asman and Landman, 2012]. The parameters are $\theta \in [0,1]^{R\times N\times L\times L}$, i.e., performance parameters defined voxel‐wise instead of globally. As a result, the E‐step of Spatial STAPLE is given by

$$W_{si}^{(k)} = \frac{f(T_i = s)\prod_j \theta_{jis's}^{(k)}}{\sum_n f(T_i = n)\prod_j \theta_{jis'n}^{(k)}} \qquad (6)$$

which incorporates the spatially varying performance; $\theta_{jis's}^{(k)}$ is shorthand for $f(D_{ij} = s' \mid T_i = s, \theta_{ji}^{(k)})$. The M‐step follows the derivation of STAPLE, but since the number of degrees of freedom is a factor of $N$ higher than in STAPLE, two extensions are included to handle the increased complexity. First, the performance parameters are pooled over small regions instead of being estimated strictly voxel‐wise. Following [Asman and Landman, 2012], this is implemented by defining spatial pooling regions $B$, where $B_i$ denotes the pooling region (bin) that contains voxel $i$. Second, the performance estimate is augmented by a non‐parametric prior $\theta_j^{(0)}$, following Asman and Landman [2012] and Landman et al. [2012], which improves the stability of the performance parameters. Thus, the M‐step is given by

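$$\theta_{jis's}^{(k+1)} = \frac{\lambda_{ijs}\,\theta_{js's}^{(0)} + \sum_{i \in B_i : D_{ij} = s'} W_{si}^{(k)}}{\lambda_{ijs}\sum_{s'}\theta_{js's}^{(0)} + \sum_{i \in B_i} W_{si}^{(k)}} \qquad (7)$$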

where $\sum_{s'} \theta_{jis's} = 1$ and $\lambda_{ijs}$ is a weighting parameter that depends on the size of the pooling region $B_i$ and balances the prior against the updated probability. We derive $\lambda_{ijs}$ using the same definition as [Asman and Landman, 2012].
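The prior-augmented, region-pooled update of Eq. (7) can be sketched for one atlas and one pooling region as below. As a simplifying assumption, `lam` is a single scalar rather than the paper's $\lambda_{ijs}$, whose definition follows Asman and Landman [2012] and is not reproduced here; the function name is hypothetical.

```python
def spatial_staple_m_step(decisions, weights, prior_theta, lam):
    """Prior-augmented, pooled M-step (Eq. 7) for one atlas j and one region B_i.

    decisions: D_ij for the voxels in the pooling region.
    weights: weights[v][s] = W_s for the same voxels (from the E-step).
    prior_theta: prior_theta[s_obs][s_true] = whole-image prior theta_js's^(0).
    lam: ASSUMPTION: one scalar standing in for lambda_ijs; should be > 0.
    """
    num_labels = len(prior_theta)
    theta = [[0.0] * num_labels for _ in range(num_labels)]
    for s in range(num_labels):
        prior_mass = lam * sum(prior_theta[r][s] for r in range(num_labels))
        denom = prior_mass + sum(w[s] for w in weights)
        for s_obs in range(num_labels):
            # Prior pseudo-count plus pooled posterior mass where the atlas said s_obs.
            numer = lam * prior_theta[s_obs][s] + sum(
                w[s] for d, w in zip(decisions, weights) if d == s_obs
            )
            theta[s_obs][s] = numer / denom
    return theta
```

When a pooling region contains no posterior mass for some true label, that column falls back to the whole-image prior, which is the stabilizing effect described above.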

Non‐Local STAPLE

Non‐local STAPLE (NLS) incorporates the image intensities of both the atlas images $A \in \mathbb{R}^{N\times R}$ and the target image $I \in \mathbb{R}^{N\times 1}$ into the STAPLE framework through patch‐based non‐local correspondence [Asman and Landman, 2013]. Patch‐based non‐local correspondence was originally introduced to account for registration inaccuracy [Coupé et al., 2011]. NLS incorporates it into the STAPLE framework as follows. The E‐step is given by

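$$W_{si}^{(k)} = \frac{f(T_i = s)\prod_j \sum_{i' \in N_i} \theta_{js's}^{(k)}\,\alpha_{jii'}}{\sum_n f(T_i = n)\prod_j \sum_{i' \in N_i} \theta_{js'n}^{(k)}\,\alpha_{jii'}} \qquad (8)$$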

Here, $s' = D_{i'j}$ is the label of atlas $j$ at voxel $i'$, $N_i$ is a search neighborhood around voxel $i$, and $\alpha_{jii'}$ is the non‐local weight between voxel $i$ in the target image and voxel $i'$ in the $j$th atlas, for $i' \in N_i$. $\alpha_{jii'}$ is given by

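$$\alpha_{jii'} = \frac{1}{Z_\alpha}\exp\left(-\frac{\lVert A_j(\mathcal{P}(i')) - I(\mathcal{P}(i)) \rVert_2^2}{2\sigma_i^2}\right)\exp\left(-\frac{\lVert E_{ii'} \rVert_2^2}{2\sigma_d^2}\right) \qquad (9)$$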

where $\mathcal{P}(\cdot)$ denotes the set of intensities within the patch neighborhood of a voxel. In this definition, $\lVert A_j(\mathcal{P}(i')) - I(\mathcal{P}(i)) \rVert_2^2$ is the squared $L_2$ distance between the atlas patch centered at $i'$ and the target patch centered at $i$, $\lVert E_{ii'} \rVert_2$ is the Euclidean distance in physical space between $i$ and $i'$, $\sigma_i$ and $\sigma_d$ are the standard deviations of the intensity and distance weights, respectively, and $Z_\alpha$ normalizes $\alpha$ to be a valid probability distribution for each atlas and target voxel. The M‐step of Non‐local STAPLE is

$$\theta_{js's}^{(k+1)} = \frac{\sum_i \sum_{i' \in N_i : D_{i'j} = s'} \alpha_{jii'}\,W_{si}^{(k)}}{\sum_i W_{si}^{(k)}} \qquad (10)$$

which follows the original M‐step of STAPLE while incorporating non‐local correspondence.
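Eq. (9) can be illustrated with the sketch below, which computes $\alpha_{jii'}$ for one target voxel over a list of candidate atlas voxels. Patches are assumed to be flattened lists of intensities normalized to a comparable range (an assumption: with $\sigma_i = 0.1$, raw scanner intensities would make every weight underflow), and the function name is hypothetical.

```python
from math import exp


def nonlocal_weights(target_patch, target_pos, candidates, sigma_i=0.1, sigma_d=1.5):
    """Non-local weights alpha_jii' (Eq. 9) for one target voxel i and one atlas j.

    target_patch: flattened intensities of the target patch around voxel i.
    target_pos: physical coordinates of voxel i.
    candidates: list of (atlas_patch, atlas_pos), one per voxel i' in N_i.
    ASSUMPTION: intensities are normalized so that sigma_i = 0.1 is meaningful.
    """
    raw = []
    for patch, pos in candidates:
        # Intensity term: squared L2 distance between atlas and target patches.
        d_int = sum((a - t) ** 2 for a, t in zip(patch, target_patch))
        # Spatial term: squared Euclidean distance between voxel positions.
        d_sp = sum((p - q) ** 2 for p, q in zip(pos, target_pos))
        raw.append(exp(-d_int / (2 * sigma_i ** 2)) * exp(-d_sp / (2 * sigma_d ** 2)))
    z = sum(raw)  # Z_alpha: normalizes the weights over the search neighborhood
    if z == 0.0:
        # Every weight underflowed; fall back to uniform weights (sketch choice).
        return [1.0 / len(raw)] * len(raw)
    return [r / z for r in raw]
```

An atlas voxel with an identical patch at the same position receives the largest weight, and the weights sum to 1 over the search neighborhood.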

Non‐Local Spatial STAPLE

The Non‐local Spatial STAPLE (NLSS) algorithm follows directly from the derivations of Spatial STAPLE and Non‐local STAPLE. The NLSS algorithm defines the following performance level function

$$f(D, A \mid T, I, \theta) \qquad (11)$$

In the NLSS algorithm, θ is spatially varying as in Spatial STAPLE and non‐local correspondence is used to account for registration errors.

NLSS E‐step

The E‐step of NLSS is similar to the E‐step of STAPLE. First, Bayes' rule is applied as

$$W_{si}^{(k)} = \frac{f(T_i = s)\, f(D, A \mid T_i = s, I, \theta^{(k)})}{\sum_n f(T_i = n)\, f(D, A \mid T_i = n, I, \theta^{(k)})} \qquad (12)$$

Following the expansions of sections “Spatial STAPLE” and “Non‐Local STAPLE,” this becomes

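$$W_{si}^{(k)} = \frac{f(T_i = s)\prod_j \sum_{i' \in N_i} \theta_{jis's}^{(k)}\,\alpha_{jii'}}{\sum_n f(T_i = n)\prod_j \sum_{i' \in N_i} \theta_{jis'n}^{(k)}\,\alpha_{jii'}} \qquad (13)$$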

This derivation incorporates both the spatially varying performance parameters derived in Spatial STAPLE and the non‐local correspondence derived in Non‐local STAPLE.
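A per-voxel sketch of the NLSS E-step in Eq. (13) follows, under a hypothetical data layout: for each atlas, a list of `(label, alpha)` pairs over the search neighborhood, plus the local performance parameters $\theta_{jis's}$ as `theta_local[j][s_obs][s_true]`. When every neighborhood holds a single voxel with weight 1, the expression reduces to the Spatial STAPLE E-step of Eq. (6).

```python
from math import prod


def nlss_e_step(neighborhoods, theta_local, prior):
    """NLSS E-step (Eq. 13) for one target voxel i (illustrative sketch).

    neighborhoods[j]: list of (label, alpha) pairs, one per voxel i' in N_i of
        atlas j, where label = D_i'j and alpha = alpha_jii'.
    theta_local[j][s_obs][s_true]: spatially varying performance theta_jis's.
    prior[s]: f(T_i = s).
    """
    unnorm = []
    for s in range(len(prior)):
        # For each atlas: alpha-weighted sum of likelihoods over the search neighborhood.
        per_atlas = (
            sum(theta_local[j][d][s] * a for d, a in nbhd)
            for j, nbhd in enumerate(neighborhoods)
        )
        unnorm.append(prior[s] * prod(per_atlas))
    z = sum(unnorm)  # partition function
    return [u / z for u in unnorm]
```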

NLSS M‐step

In the M‐step of NLSS, the previously calculated $W_{si}^{(k)}$ is used to update $\theta_{ji}^{(k+1)}$ by maximizing the expected log‐likelihood function as

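$$\begin{aligned}\theta_{ji}^{(k+1)} &= \arg\max_{\theta_{ji}} \sum_{i \in B_i} E\left[\ln f(D_j, A_j \mid T_i, I_i, \theta_{ji}) \mid D, A, I, \theta^{(k)}\right]\\ &= \arg\max_{\theta_{ji}} \sum_{i \in B_i}\sum_s W_{si}^{(k)} \ln f(D_j, A_j \mid T_i = s, I_i, \theta_{ji})\\ &= \arg\max_{\theta_{ji}} \sum_{i \in B_i}\sum_s W_{si}^{(k)} \ln \sum_{i' \in N_i : D_{i'j} = s'} \theta_{jis's}\,\alpha_{jii'}\end{aligned} \qquad (14)$$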

Using a Lagrange multiplier $\lambda$ [Bellman, 1956] with the constraint $\sum_{s'} \theta_{jis's} = 1$ and setting the derivative to zero gives

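$$0 = \frac{\partial}{\partial \theta_{jin'n}}\left[\sum_{i \in B_i}\sum_s W_{si}^{(k)} \ln \sum_{i' \in N_i : D_{i'j} = s'} \theta_{jis's}\,\alpha_{jii'} + \lambda\left(\sum_{s'} \theta_{jis's}^{(k+1)} - 1\right)\right] \qquad (15)$$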

This equation can be solved as

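$$\theta_{jis's}^{(k+1)} = \frac{\sum_{i \in B_i}\sum_{i' \in N_i : D_{i'j} = s'} \alpha_{jii'}\,W_{si}^{(k)}}{\sum_{i \in B_i} W_{si}^{(k)}} \qquad (16)$$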

As in Spatial STAPLE, the same whole‐image implicit prior $\theta_{js's}^{(0)}$ is introduced for computational and stability reasons [Asman and Landman, 2012]. The prior can be derived by a number of approaches (e.g., STAPLE [Warfield et al., 2004], Majority Vote, Locally Weighted Vote [Sabuncu et al., 2010]). In our NLSS implementation, Majority Vote ignoring “consensus voxels” (i.e., voxels where all raters agree) is the default method [Rohlfing et al., 2004]; consensus voxels are excluded when constructing the performance level parameters. The final, stabilized version of Eq. (16) is then

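$$\theta_{jis's}^{(k+1)} = \frac{\lambda_{ijs}\,\theta_{js's}^{(0)} + \sum_{i \in B_i}\sum_{i' \in N_i : D_{i'j} = s'} \alpha_{jii'}\,W_{si}^{(k)}}{\lambda_{ijs}\sum_{s'}\theta_{js's}^{(0)} + \sum_{i \in B_i} W_{si}^{(k)}} \qquad (17)$$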

where $\lambda_{ijs}$ is a weighting parameter that depends on the size of the pooling region $B_i$ and balances the prior against the updated probability. We derive $\lambda_{ijs}$ and $B_i$ in the same way as [Asman and Landman, 2012].

Note that Eq. (16) is the theoretical M‐step of the EM framework, while Eq. (17) is an approximate maximizer adopted for computational and stability reasons. Both are implemented in the publicly available open‐source code, and users can switch between them by controlling $\lambda_{ijs}$ (setting $\lambda_{ijs} = 0$ recovers Eq. (16)). In practice, Eq. (17) typically performs better than Eq. (16), so it is the default setting in the NLSS open‐source code.
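The relationship between Eqs. (16) and (17) can be sketched as one function in which `lam` plays the role of $\lambda_{ijs}$ (simplified to a scalar; the function name and data layout are hypothetical, and this is not the JIST code). With `lam = 0` it evaluates Eq. (16); with `lam > 0` it evaluates Eq. (17). One deviation is flagged in the code: if the region carries no posterior mass for a label, the sketch returns the prior instead of dividing by zero.

```python
def nlss_m_step(region, prior_theta, lam, num_labels):
    """NLSS M-step for one atlas j and one pooling region B_i (sketch).

    region: list over voxels in B_i of (weights, neighborhood), where
        weights[s] = W_s and neighborhood is a list of (D_i'j, alpha_jii')
        pairs over i' in N_i.
    prior_theta[s_obs][s_true]: whole-image prior theta_js's^(0).
    lam: scalar stand-in for lambda_ijs; 0 gives Eq. (16), > 0 gives Eq. (17).
    """
    theta = [[0.0] * num_labels for _ in range(num_labels)]
    for s in range(num_labels):
        denom = lam * sum(prior_theta[r][s] for r in range(num_labels))
        denom += sum(w[s] for w, _ in region)
        for s_obs in range(num_labels):
            numer = lam * prior_theta[s_obs][s]
            for w, nbhd in region:
                # alpha-weighted evidence that atlas j reported s_obs near this voxel
                numer += w[s] * sum(a for d, a in nbhd if d == s_obs)
            # Deviation (sketch only): fall back to the prior if denom is zero.
            theta[s_obs][s] = numer / denom if denom > 0 else prior_theta[s_obs][s]
    return theta
```

Because the weights $\alpha_{jii'}$ sum to 1 over each search neighborhood, every column of the result sums to 1 for both settings.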

Initialization, parameters, and detection of convergence

The voxelwise prior $f(T_i = s)$ in NLSS is initialized using the weak log‐odds majority vote [Sabuncu et al., 2010]. The performance parameters are typically initialized by assuming that each atlas has high performance:

$$\theta_{jis's} = \begin{cases} 0.95, & \text{if } s' = s\\ \dfrac{0.05}{L-1}, & \text{otherwise}\end{cases} \qquad (18)$$
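Eq. (18) amounts to the following initialization for each atlas (and, in NLSS, each voxel). The helper name `initial_theta` is hypothetical; `num_labels` is $L$, which is 3 for the semi-manual TICV/PFV atlases in this work (posterior fossa, cerebrum, and background).

```python
def initial_theta(num_labels, diag=0.95):
    """Initial performance matrix of Eq. (18), indexed theta[s_obs][s_true]."""
    # Off-diagonal mass is spread evenly so each column sums to 1.
    off = (1.0 - diag) / (num_labels - 1)
    return [[diag if s_obs == s else off for s in range(num_labels)]
            for s_obs in range(num_labels)]
```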

The search neighborhood $N_i$ and the patch neighborhood $\mathcal{P}(\cdot)$ are the two key parameters of the non‐local search model. The sensitivity of NLSS to these two parameters is evaluated in Supporting Information Section S‐1 (Supporting Information Fig. S1). In all presented experiments, the search neighborhood was set to a 7 × 7 × 7 voxel window centered at the target voxel, and the patch neighborhood was empirically set to 3 × 3 × 3 voxels. The two standard deviation parameters $\sigma_i$ and $\sigma_d$ in Eq. (9) were empirically set to 0.1 and 1.5, respectively. The algorithm iterates until the trace of the difference of the confusion matrices between iterations is small, typically less than $10^{-4}$.
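The stopping rule can be sketched as below. The text does not specify how the trace difference is aggregated over atlases or voxels, so this sketch ASSUMES the sum over atlases of the absolute trace of $\theta_j^{(k+1)} - \theta_j^{(k)}$; the helper name `has_converged` is hypothetical.

```python
def has_converged(theta_old, theta_new, tol=1e-4):
    """Stopping-rule sketch based on confusion-matrix traces between iterations.

    theta_old, theta_new: lists over atlases of L x L matrices theta[s_obs][s_true].
    ASSUMPTION: change = sum over atlases of |trace(new - old)|; the exact
    aggregation used by the NLSS code may differ.
    """
    change = 0.0
    for old, new in zip(theta_old, theta_new):
        change += abs(sum(new[s][s] - old[s][s] for s in range(len(new))))
    return change < tol
```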

METHOD

This section first introduces a semi‐manual method to establish atlases with TICV and PFV labels (section “Semi‐Manual Segmentations and Semi‐Manual Atlases”). Second, the multi‐atlas segmentation framework using NLSS label fusion is demonstrated (section “NLSS Multi‐atlas framework”). Third, the procedure of generating TICV and PFV labels for the BrainCOLOR (BC) atlases is introduced (section “TICV and PFV labels for OASIS BrainCOLOR atlases”). Last, the statistical analysis methods used in this work are introduced (section “Statistical Analysis”).

Semi‐Manual Segmentations and Semi‐Manual Atlases

We start with automatic skull labeling on CT images, then obtain TICV labels (voxels inside the skull), and finally propagate the labels to MR images using rigid registration. The procedure for automatically generating the TICV atlases (Fig. 1) is inspired by recent work [Aguilar et al., 2015]. Briefly, each CT image is aligned to the MR image using rigid registration [Ourselin et al., 2001] (Fig. 1a). Then, skull masks are obtained from the CT images by selecting voxels with values greater than 300 HU [Sjolund et al., 2014] (Fig. 1b). Next, a 3D morphological closing (a dilation followed by an erosion) followed by neck removal [Segonne et al., 2004] is applied to the skull mask to obtain the binary skull label. The closing fills the holes in the skull, and the inner side of the filled skull provides the SCB (Fig. 1c).
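The threshold-and-closing step can be sketched in pure Python as follows. For brevity the CT volume is ASSUMED to be a dict mapping integer voxel coordinates to HU values and the structuring element a 3 × 3 × 3 cube; registration, neck removal, and the TGDM step are not shown, and all names are hypothetical. A production pipeline would use an image-processing library instead.

```python
from itertools import product

# 26-connected structuring element: a 3 x 3 x 3 cube of offsets (center included).
CUBE = list(product((-1, 0, 1), repeat=3))


def dilate(voxels):
    """Binary dilation of a set of (x, y, z) voxels by the cube."""
    return {(x + dx, y + dy, z + dz) for (x, y, z) in voxels for dx, dy, dz in CUBE}


def erode(voxels):
    """Binary erosion: keep voxels whose whole cube neighborhood is in the set."""
    return {v for v in voxels
            if all((v[0] + dx, v[1] + dy, v[2] + dz) in voxels for dx, dy, dz in CUBE)}


def skull_mask(ct, threshold_hu=300):
    """Threshold a CT volume (dict of (x, y, z) -> HU) and apply a 3D closing."""
    bone = {v for v, hu in ct.items() if hu > threshold_hu}
    # Closing = dilation followed by erosion; it fills small holes in the skull.
    return erode(dilate(bone))
```

For instance, a single low-HU voxel fully enclosed by bone is included in the closed mask, which is the hole-filling behavior the pipeline relies on.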

Figure 1.


Semi‐manual pipeline for establishing atlases. First, the TICV label is obtained by applying a threshold, morphological operations, and the level set method to CT images. Then, the TICV label is propagated to MR image space, and the reference PFV label is obtained by merging the TICV label with the automatic whole brain segmentation. Finally, the semi‐manual atlases are obtained by manually refining the reference labels. [Color figure can be viewed at http://wileyonlinelibrary.com]

The TICV segmentation is the region inside the SCB. However, the SCB is not a closed surface (e.g., at the foramen magnum in the occipital bone). Such openings make it difficult to derive the TICV segmentation using morphological operations alone. To handle the openings automatically, a Topology‐preserving Geometric Deformable Model (TGDM) [Han et al., 2003] with a gradient vector flow (GVF) field [Xu and Prince, 1998] is employed. The Standard Geometric Deformable Model (SGDM) has been widely used in image segmentation due to its parameterization independence and ease of implementation. However, the topological flexibility of SGDM is not always desirable in medical image segmentation, especially when the number of components is known and must be preserved. Based on anatomical prior knowledge, the TICV segmentation should contain only one component (one contour surface). Therefore, the TGDM framework is employed to preserve this topology. In its implementation, the level set contour of TGDM is moved by the GVF field [Xu and Prince, 1998]. The advantage of the GVF field is that it pushes the contour toward the skull and exerts close to zero force at the openings. We also apply a curvature force [Han et al., 2003] to keep the surface smooth at the openings. Using TGDM, the non‐skull voxels inside the zero level set are labeled as the TICV segmentation, which has a smooth boundary at the openings. By copying the labels from the registered CT images voxel by voxel, we obtain skull and TICV labels on MR images (Fig. 1d).

Then, we label the posterior fossa within the TICV labels. Instead of complete manual delineation, a rough automatic segmentation is provided as the reference labels to accelerate the procedure. Briefly, we start with an NLSS multi‐atlas segmentation to obtain the whole brain segmentation (133 labels) of each MR image under the BrainCOLOR protocol [Klein et al., 2010; Landman and Warfield, 2012] (Fig. 1f). Then, we group the cerebrum regions (above the tentorium cerebelli) together, which excludes the CSF and tissues of the posterior fossa (cerebellum and brainstem) (Fig. 1g). A morphological closing is conducted to obtain the reference labels (Fig. 1h and 1j), which indicate the rough location of the posterior fossa. Finally, an experienced graduate student manually refines the reference labels to correct inaccurate voxels and obtain the final PFV labels (Fig. 1j). Using this semi‐manual pipeline, we obtain 20 atlases consisting of T1w images and labels (posterior fossa, cerebrum, and background). The TICV is the sum of the posterior fossa and cerebrum volumes.

NLSS Multi‐Atlas Framework

We use a canonical multi‐atlas segmentation framework consisting of registration, atlas selection, label propagation, and label fusion [Iglesias and Sabuncu, 2015]. Briefly, the target image is first corrected by N4 bias field correction [Tustison et al., 2010] and then affinely registered [Ourselin et al., 2001] to the MNI305 atlas [Evans et al., 1993]. In practice, 10–20 atlases are sufficient to achieve accurate whole brain segmentation [Aljabar et al., 2009]. Empirically, if more than 15 atlases are available, the 15 atlases with the smallest Euclidean distance to the target image on a PCA manifold are chosen [Asman et al., 2015]. The selected atlases are then non‐rigidly registered to the target image [Avants et al., 2008]. For non‐rigid registration, we use symmetric image normalization (SyN) with a cross‐correlation similarity metric, a convergence threshold of $10^{-9}$, and a convergence window size of 15, as provided by the Advanced Normalization Tools (ANTs) software [Avants et al., 2008]. Finally, the proposed NLSS label fusion combines the labels propagated from each atlas to the target image. After multi‐atlas labeling, each voxel is assigned to one of the labels.
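The atlas selection rule can be sketched as a nearest-neighbor query. The sketch ASSUMES each image has already been projected onto the PCA manifold (the projection itself is not shown), and `select_atlases` is a hypothetical name.

```python
from math import dist


def select_atlases(target_feat, atlas_feats, k=15):
    """Indices of the k atlases closest to the target on a precomputed PCA manifold.

    target_feat: PCA coordinates of the target image (assumed precomputed).
    atlas_feats: list of PCA coordinates, one per candidate atlas.
    If no more than k atlases are available, all of them are used.
    """
    if len(atlas_feats) <= k:
        return list(range(len(atlas_feats)))
    order = sorted(range(len(atlas_feats)),
                   key=lambda j: dist(target_feat, atlas_feats[j]))
    return order[:k]
```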

TICV and PFV Labels for OASIS BrainCOLOR Atlases

Using the semi‐manual strategy described in section “Semi‐manual Segmentations and Semi‐manual Atlases,” researchers can construct semi‐manual atlases from their own data. However, paired MR and CT images are not typically available, particularly when TICV and PFV labels and a whole brain segmentation (e.g., the 133 labels of the BrainCOLOR protocol) are desired simultaneously. Therefore, we propagate the TICV and PFV labels from the semi‐manual atlases to the BrainCOLOR atlases [Klein et al., 2010; Landman and Warfield, 2012], which consist of 45 OASIS images [Marcus et al., 2007]. We have made a subset of the new BrainCOLOR atlases freely available online for the community.

Briefly, the semi‐manual atlases (Fig. 2b) are used to segment 45 OASIS T1w images with NLSS multi‐atlas segmentation (Fig. 2c). The resulting TICV and PFV labels for the OASIS dataset are referred to as the BrainCOLOR1 (BC1) atlases. Next, the BrainCOLOR2 (BC2) atlases are derived by combining the TICV and PFV labels with the 133 original labels of the BrainCOLOR atlases. Note that where the original manual labels conflict with the TICV or PFV definition in the BC1 atlases, we keep the original labels in BC2. Finally, the BrainCOLOR3 (BC3) atlases are obtained by merging the TICV and PFV labels in the BC2 atlases.
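The BC2 conflict rule can be sketched voxel-wise as below. The text does not spell out how conflicts are detected, so this sketch ASSUMES that any non-background original BrainCOLOR label wins and that only voxels left as background receive the BC1 label; `merge_labels` and the label values in the example are hypothetical.

```python
def merge_labels(original, bc1, background=0):
    """Voxel-wise merge in which the original manual labels take precedence.

    original: BrainCOLOR labels per voxel (background = not labeled).
    bc1: automatically derived TICV/PFV labels per voxel.
    ASSUMPTION: a non-background original label always wins a conflict.
    """
    return [o if o != background else b for o, b in zip(original, bc1)]
```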

Figure 2.


BC1, BC2, and BC3 atlases are obtained by adding TICV and PFV labels. (a) 20 paired MR‐CT images are used to generate (b) semi‐manual atlases. Then NLSS multi‐atlas segmentation is conducted on (c) the T1w images of the 45 OASIS subjects in the BrainCOLOR (BC) atlases to obtain TICV and PFV labels. (d) These first automatic segmentation results are referred to as the BC1 atlases. (e) Then the original 133 labels from BC are merged with the BC1 atlases, keeping the BC labels when conflicts occur. The merged BC2 atlases contain 136 labels, including the TICV, PFV, and BC labels. (f) The 136 labels are merged back to 4 labels to resolve conflicts and form the BC3 atlases. A subset of the BC2 atlases has been made freely available online for other researchers. We compare the performance of the BC1, BC2, and BC3 atlases as well as the semi‐manual atlases in section “Data and Results.” [Color figure can be viewed at http://wileyonlinelibrary.com]

Statistical Analysis

In this article, we conduct several types of volumetric analyses comparing FreeSurfer, FSL, SPM12, and the multi‐atlas approaches. To evaluate the volumetric similarity between the automatic methods and the semi‐manual segmentations, the absolute volume similarity (ASIM) (a ratio from 0 to 1; higher is better) is employed:

$$\mathrm{ASIM} = 1 - \frac{|V_1 - V_2|}{0.5\,(V_1 + V_2)} \qquad (19)$$
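Eq. (19) translates directly into code; the function name and the example volumes below are illustrative only.

```python
def asim(v1, v2):
    """Absolute volume similarity of Eq. (19); 1 means identical volumes."""
    return 1.0 - abs(v1 - v2) / (0.5 * (v1 + v2))
```

For example, volumes of 1400 and 1600 (in any common unit) give ASIM = 1 - 200/1500, roughly 0.867.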

However, the ASIM only compares the similarity of volume sizes without reflecting the spatial information especially the accuracy of SCB. For instance, the segmentations that have similar amounts of volume may have large differences in spatial appearance and location. Therefore, the widely used Dice coefficient (Dice) is employed as:

$$\mathrm{Dice} = \frac{2\,|A \cap B|}{|A| + |B|} \qquad (20)$$

where A and B represent any two binary volumes to be compared and $|\cdot|$ denotes the volume of a region. Dice evaluates the overlap between regions A and B, taking both volumetric and spatial information into account. Moreover, the mean surface distance (MSD) between A and B is employed to measure the average distance between the surfaces of the binary volumes.
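The Dice coefficient of Eq. (20) can be sketched on binary volumes represented as sets of voxel coordinates (an assumed representation; the function name is illustrative). It also illustrates the point made about ASIM: two masks of equal size with no overlap have identical volumes, and thus ASIM = 1, but Dice = 0.

```python
def dice(a, b):
    """Dice coefficient of Eq. (20) for binary volumes given as sets of voxels."""
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))
```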

Reproducibility is another important aspect of evaluating TICV estimation. In this article, we assess reproducibility using a test–retest strategy, which compares the TICV and PFV measurements between two consecutive scans of the same subject. To capture this difference, the absolute volume difference (ADIFF) (a ratio from 0 to 1; lower is better) is used:

$$\mathrm{ADIFF} = \frac{|V_1 - V_2|}{0.5\,(V_1 + V_2)} \qquad (21)$$
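Eq. (21) is equally direct; by construction, ADIFF equals 1 minus ASIM for the same pair of volumes. The function name and values are illustrative.

```python
def adiff(v1, v2):
    """Absolute volume difference of Eq. (21); 0 means perfect test-retest agreement."""
    return abs(v1 - v2) / (0.5 * (v1 + v2))
```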

After obtaining these metrics, the Wilcoxon signed rank test [Wilcoxon, 1945] is used for statistical analyses. All claims of statistical significance in this article are based on the Wilcoxon signed rank test with P < 0.05.

DATA AND RESULTS

Accuracy Test

Twenty subjects with both MR and CT images from a deep‐brain stimulation (DBS) project were used to evaluate the accuracy of TICV and PFV estimation. The MR images were 3D T1w volumes of 256 × 256 × 190 voxels at 1 × 1 × 1 mm resolution. The CT images were acquired with pixel size = 0.49 mm, slice thickness = 0.625 mm, and FOV = 250 × 250 × 190 mm. From these paired MR‐CT images, 20 semi‐manual atlases (MR T1w images and labels) were generated using the semi‐manual method described in section “Semi‐manual Segmentations and Semi‐manual Atlases.” Note that the CT images were used only to generate the semi‐manual atlases, not in the evaluations.

First, FreeSurfer (FS), FSL, and SPM12 were applied to the 20 T1w MR images to estimate TICV. Then, the NLSS multi‐atlas framework was applied to the same dataset using a leave‐one‐out strategy. In each leave‐one‐out test, the other 19 atlases were used as candidate atlases, which ensured independence from the test image. The linear relationship between the estimated TICV and the true TICV (semi‐manual atlases) was evaluated by linear regression (Fig. 3). The coefficient of determination R² indicates the strength of the linear relationship between measurements; a higher R² indicates stronger linearity. NLSS TICV estimation achieved the largest R² (R² = 0.970) with respect to the semi‐manual segmentations, while FSL had the lowest R². NLSS TICV estimation also had R² = 0.942 with FreeSurfer and R² = 0.956 with SPM12. The box plot (Fig. 3b) shows the ASIM scores of the different methods compared with the semi‐manual segmentation. NLSS TICV had significantly higher ASIM scores than FreeSurfer and SPM12. The ASIM score for FSL is not shown since FSL provides only scaling factors rather than volumes.
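For reference, the R² of a simple least-squares line can be computed as below; for simple linear regression with an intercept it equals the squared Pearson correlation. This is a generic sketch with a hypothetical function name, not the authors' analysis code.

```python
def r_squared(x, y):
    """Coefficient of determination R^2 of an ordinary least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    return sxy * sxy / (sxx * syy)
```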

Figure 3.

(a) Scatter plots comparing FreeSurfer, FSL, SPM12, and NLSS on TICV estimation. In the first column, the automatic methods are compared with semi‐manual segmentations by plotting the TICV volumes with a red line of best fit, and the NLSS method using semi‐manual atlases achieves the largest R² = 0.970. The remaining columns show scatter plots between automatic methods, where the NLSS method still achieves large R² values compared with FreeSurfer, FSL, and SPM12. (b) Box plot of ASIM values of FreeSurfer, SPM12, and NLSS against semi‐manual segmentations. The proposed NLSS (“Ref.”) method achieves significantly higher (“*”) ASIM scores than FreeSurfer and SPM12. Since FSL only provides scaling factors rather than TICV volumes, it has no units in (a) and is not shown in (b). [Color figure can be viewed at http://wileyonlinelibrary.com]

Second, NLSS TICV estimation was compared with the previously proposed STAPLE TICV estimation [Schaerer et al., 2012]. For a more complete analysis, we also compared NLSS with other label fusion approaches using the semi‐manual atlases (Tables 1 and 2). These approaches include majority vote (MV), Spatial STAPLE, NLS, and joint label fusion (JLF). JLF [Wang et al., 2013] is a state‐of‐the‐art label fusion method that uses non‐local intensity similarity. In each leave‐one‐out analysis, the BC1, BC2, and BC3 atlases (on 45 OASIS images) were also generated from the 19 semi‐manual atlases, using the method in section “TICV and PFV labels for OASIS BrainCOLOR atlases.” These intermediate atlases were then applied to the target image. Their accuracy was compared with that of the semi‐manual atlases using the same NLSS multi‐atlas framework.
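Majority vote (MV), the simplest of the fusion baselines listed above, assigns each voxel the label chosen by most registered atlases. The sketch below assumes the atlas label maps have already been warped to the target and flattened to equal‐length integer lists. The tie‐breaking rule (smallest label wins) is an arbitrary choice for illustration and is not necessarily what the compared implementations do. The other methods (STAPLE, Spatial STAPLE, NLS, JLF, and NLSS) weight atlases by estimated performance or local intensity similarity rather than counting every vote equally.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(atlas_labels: Sequence[Sequence[int]]) -> List[int]:
    """Fuse registered atlas label maps voxel by voxel with an unweighted vote.

    atlas_labels[a][v] is the label that atlas a assigns to voxel v after
    registration to the target. Ties go to the smallest label (arbitrary).
    """
    n_voxels = len(atlas_labels[0])
    if any(len(labels) != n_voxels for labels in atlas_labels):
        raise ValueError("all atlas label maps must have the same length")
    fused = []
    for v in range(n_voxels):
        votes = Counter(labels[v] for labels in atlas_labels)
        best = max(votes.values())
        fused.append(min(label for label, count in votes.items() if count == best))
    return fused
```

The following usage example assumes a hypothetical encoding of 0 for background, 1 for TICV, and 2 for PFV. Calling `majority_vote([[0, 1, 2, 1], [0, 1, 1, 1], [1, 1, 2, 0]])` returns `[0, 1, 2, 1]`.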

Table 1.

Accuracy test results of TICV

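| Metric | FS | FSL | SPM12 | MV | STAPLE | SS | NLS | JLF | NLSS | NLSS | NLSS | NLSS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Atlases | None | None | None | Semi‐manual | Semi‐manual | Semi‐manual | Semi‐manual | Semi‐manual | Semi‐manual | BC1 | BC2 | BC3 |
| Corr.: Pearson | 0.954 | 0.923 | 0.953 | 0.959 | 0.957 | 0.960 | 0.969 | 0.985 | 0.985 | 0.965 | 0.963 | 0.964 |
| Corr.: ICC | 0.836 | N/A | 0.916 | 0.961 | 0.936 | 0.957 | 0.964 | 0.985 | 0.985 | 0.942 | 0.964 | 0.907 |
| ASIM μ | 0.941 | N/A | 0.964 | 0.976 | 0.971 | 0.977 | 0.979 | 0.986 | 0.986 | 0.972 | 0.978 | 0.961 |
| ASIM σ | 0.032 | N/A | 0.023 | 0.022 | 0.0285 | 0.024 | 0.024 | 0.014 | 0.015 | 0.026 | 0.020 | 0.028 |
| Dice μ | N/A | N/A | N/A | 0.977 | 0.975 | 0.977 | 0.979 | 0.983 | 0.983 | 0.975 | 0.975 | 0.970 |
| Dice σ | N/A | N/A | N/A | 0.008 | 0.01 | 0.008 | 0.008 | 0.005 | 0.006 | 0.008 | 0.006 | 0.009 |
| MSD μ (mm) | N/A | N/A | N/A | 0.968 | 1.058 | 0.984 | 0.888 | 0.725 | 0.743 | 1.106 | 1.112 | 1.245 |
| MSD σ (mm) | N/A | N/A | N/A | 0.268 | 0.374 | 0.301 | 0.294 | 0.184 | 0.197 | 0.306 | 0.244 | 0.326 |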

“Corr.” means correlation analyses. “N/A” means the values are not available for two reasons. First, FSL only provides scaling factors rather than TICV volumes. Second, FreeSurfer (FS) and SPM12 do not generate hard TICV segmentations in individual space during standard default processing. Note: “SS” is Spatial STAPLE, “μ” is the mean, and “σ” is the standard deviation.

Table 2.

Accuracy test results of PFV

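| Metric | MV | STAPLE | SS | NLS | JLF | NLSS | NLSS | NLSS | NLSS |
|---|---|---|---|---|---|---|---|---|---|
| Atlases | Semi‐manual | Semi‐manual | Semi‐manual | Semi‐manual | Semi‐manual | Semi‐manual | BC1 | BC2 | BC3 |
| Corr.: Pearson | 0.947 | 0.934 | 0.949 | 0.963 | 0.979 | 0.976 | 0.958 | 0.958 | 0.958 |
| Corr.: ICC | 0.944 | 0.818 | 0.945 | 0.953 | 0.971 | 0.975 | 0.919 | 0.951 | 0.888 |
| ASIM μ | 0.975 | 0.940 | 0.973 | 0.974 | 0.982 | 0.984 | 0.963 | 0.973 | 0.953 |
| ASIM σ | 0.023 | 0.029 | 0.021 | 0.018 | 0.017 | 0.016 | 0.02 | 0.018 | 0.02 |
| Dice μ | 0.960 | 0.951 | 0.959 | 0.964 | 0.968 | 0.968 | 0.955 | 0.954 | 0.954 |
| Dice σ | 0.008 | 0.011 | 0.007 | 0.007 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 |
| MSD μ (mm) | 0.847 | 1.011 | 0.858 | 0.767 | 0.689 | 0.675 | 0.946 | 0.933 | 0.969 |
| MSD σ (mm) | 0.15 | 0.214 | 0.14 | 0.126 | 0.120 | 0.107 | 0.121 | 0.118 | 0.132 |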

Please see Table 1 for the descriptions of abbreviations.

Table 1 shows four metrics for evaluating the accuracy of the TICV measurement approaches:

(1) Intraclass correlation (ICC) and Pearson correlation were used to measure the correlation between each method and the semi‐manual segmentations. The two‐way random, single‐measure model was used for ICC [Shrout and Fleiss, 1979].
(2) ASIM values were used to assess the accuracy of volumetric TICV estimation.
(3) Dice similarity coefficients were used to take spatial information into account in addition to ASIM.
(4) MSD values were derived to measure the average surface distance between binary segmentations.

In Table 1, the multi‐atlas segmentation family (MV, STAPLE, SS, NLS, JLF, and NLSS) obtained higher correlation coefficients than the prevalent FreeSurfer, FSL, and SPM12 approaches. The multi‐atlas approaches also achieved a higher mean ASIM, and their ASIM standard deviation (std) was smaller than that of FreeSurfer.

Within the multi‐atlas family, when using the same semi‐manual atlases, NLSS TICV estimation achieved higher correlation coefficients, mean ASIM, and mean Dice than the previously proposed STAPLE TICV estimation. NLSS also had a smaller mean MSD and a lower standard deviation than STAPLE. NLSS estimation was significantly superior to MV, Spatial STAPLE, and NLS on both TICV (Table 1) and PFV (Table 2). NLSS and JLF had advantages on PFV and TICV, respectively. However, the differences between NLSS and JLF were not statistically significant.

When comparing different atlases, the BC1, BC2, and BC3 atlases performed worse than the semi‐manual atlases on correlation coefficients, mean ASIM, mean Dice, and mean MSD. Nevertheless, the correlation coefficients and mean ASIM values obtained with the BC1, BC2, and BC3 atlases were still higher than those of FreeSurfer, FSL, and SPM12, with one exception. For BC3, the ICC (0.907) and mean ASIM (0.961) were slightly below those of SPM12 (0.916 and 0.964).
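Two of the metrics above have standard closed forms, and a short sketch makes them concrete. The Dice coefficient of binary segmentations A and B is 2|A ∩ B| / (|A| + |B|). The two‐way random, single‐measure ICC of Shrout and Fleiss [1979], ICC(2,1), comes from a two‐way ANOVA of an n‐subject by k‐method table:

ICC(2,1) = (MSR − MSE) / (MSR + (k − 1)·MSE + k·(MSC − MSE)/n).

In this formula, MSR, MSC, and MSE are the mean squares for subjects (rows), methods (columns), and error. Representing masks as sets of voxel indices is an assumption made for brevity. The code is illustrative and is not the authors' evaluation software.

```python
from typing import Sequence, Set, Tuple

Voxel = Tuple[int, int, int]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A|+|B|) of two binary masks."""
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def icc_2_1(table: Sequence[Sequence[float]]) -> float:
    """Two-way random, single-measure ICC(2,1) [Shrout and Fleiss, 1979].

    table[i][j] is the measurement of subject i by method j, e.g. j = 0 for
    semi-manual TICV and j = 1 for an automatic estimate.
    """
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                # between-subject mean square
    msc = ss_cols / (k - 1)                # between-method mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

On Shrout and Fleiss's worked example (6 targets rated by 4 judges), `icc_2_1` gives about 0.29, which matches their reported ICC(2,1).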

Figures 4, 5, and 6 show the box plots and the results of the Wilcoxon signed rank test. In each figure, the NLSS method using semi‐manual atlases (marked as reference “Ref.”) was compared with the other approaches and atlases. If the difference was statistically significant, we marked the method with a “*” symbol. Otherwise, we marked it as not significant (“N.S.”). Figure 4 shows the ASIM values, which consider only volumetric results, for both TICV and PFV segmentations. For TICV estimation, the ASIM of NLSS (semi‐manual atlases) was significantly higher than that of FreeSurfer, SPM12, STAPLE, Spatial STAPLE, and NLS. For PFV estimation, the ASIM of NLSS (semi‐manual atlases) was significantly higher than that of STAPLE, Spatial STAPLE, and NLS. The differences between NLSS and JLF were not statistically significant. Using the same NLSS method with different atlases, the semi‐manual atlases performed significantly better than the BC1, BC2, and BC3 atlases in both TICV and PFV volumetric estimation.
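The paired comparisons described above use the Wilcoxon signed rank test. The sketch below is a stdlib‐only version that computes an exact two‐sided p‐value by enumerating the sign distribution of the ranks with dynamic programming. Zero differences are dropped. Tied absolute differences receive average ranks, which are doubled internally so that all sums stay integers. It is an illustrative implementation, not the software the authors used. For real analyses, a library routine such as `scipy.stats.wilcoxon` would be the usual choice.

```python
from typing import List, Sequence, Tuple


def _doubled_average_ranks(values: Sequence[float]) -> List[int]:
    """Return 2 * (average 1-based rank) of each value, so tied ranks stay integral."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks2 = [0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j share ranks (i+1)..(j+1); twice their mean is i+j+2.
        for pos in range(i, j + 1):
            ranks2[order[pos]] = i + j + 2
        i = j + 1
    return ranks2


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Paired Wilcoxon signed rank test; returns (W+, exact two-sided p-value)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    if not diffs:
        return 0.0, 1.0
    ranks2 = _doubled_average_ranks([abs(d) for d in diffs])
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total = sum(ranks2)
    # counts[s]: number of sign assignments whose positive (doubled) ranks sum to s.
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    n_assignments = 2 ** len(diffs)
    lower = sum(counts[: w_plus2 + 1])
    upper = sum(counts[w_plus2:])
    p = min(1.0, 2.0 * min(lower, upper) / n_assignments)
    return w_plus2 / 2.0, p
```

As a usage example, consider 10 paired scores where the reference method wins every time with distinct margins. In that case W+ = 55 and the exact two‐sided p‐value is 2/2¹⁰ ≈ 0.002.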

Figure 4.

Box plots and statistical results on volume accuracy. The statistical analyses compared the proposed NLSS TICV estimation using semi‐manual atlases (marked as reference “Ref.”) with other approaches or different atlases. If the difference was statistically significant, we marked the other method with a “*” symbol. Otherwise, we marked it as “N.S.” [Color figure can be viewed at http://wileyonlinelibrary.com]

Figure 5.

Box plots and statistical results on Dice coefficients. The statistical analyses compared the proposed NLSS TICV estimation using semi‐manual atlases (marked as reference “Ref.”) with other approaches or different atlases. If the difference was statistically significant, we marked the other method with a “*” symbol. Otherwise, we marked it as “N.S.” [Color figure can be viewed at http://wileyonlinelibrary.com]

Figure 6.

Box plots and statistical results on mean surface distance (MSD). The statistical analyses compared the proposed NLSS TICV estimation using semi‐manual atlases (marked as reference “Ref.”) with other approaches or different atlases. If the difference was statistically significant, we marked the other method with a “*” symbol. Otherwise, we marked it as “N.S.” [Color figure can be viewed at http://wileyonlinelibrary.com]

It is also important to consider how the improved accuracy translates into benefits for clinical research. We evaluated the statistical power of detecting a group difference between two simulated clinical cohorts using a two‐sample t‐test at a significance level of 0.05. The power analyses are shown in Supporting Information section S‐2 (Supporting Information Fig. S2).
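A rough sketch can show why lower measurement error may raise power. It uses the normal approximation to the two‐sided, two‐sample t‐test at α = 0.05. TICV‐related measurement error is modeled as extra variance added to the between‐subject variance. This is a hypothetical model, and its parameters (`effect`, `sd_between`, `sd_error`, `n_per_group`) are made up for illustration. It is not the simulation reported in Supporting Information section S‐2.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect: float, sd_between: float, sd_error: float,
                 n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided two-sample test.

    The observed standard deviation combines true between-subject variability
    with (hypothetical) measurement error contributed by the TICV estimate.
    """
    z = NormalDist()
    sd = sqrt(sd_between ** 2 + sd_error ** 2)
    shift = effect / sd * sqrt(n_per_group / 2.0)  # noncentrality of the test statistic
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    return z.cdf(shift - crit) + z.cdf(-shift - crit)
```

With a standardized effect of 0.5, no measurement error, and 64 subjects per group, the approximation gives roughly 0.81 power. Adding measurement error with `sd_error=0.5` lowers it.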

Figure 5 uses the Dice similarity coefficient as the metric, which takes both volumetric and spatial information into account. Since TICV and PFV segmentations are not provided by the default processing in FreeSurfer, FSL, and SPM12, we conducted statistical analyses only within the multi‐atlas family. For both TICV and PFV segmentations, NLSS using semi‐manual atlases achieved significantly higher Dice values than MV, STAPLE, Spatial STAPLE, and NLS. The semi‐manual atlases also achieved significantly higher Dice values than the BC1, BC2, and BC3 atlases. Figure 6 shows the statistical analyses on MSD. Again, NLSS using semi‐manual atlases had a smaller MSD than MV, STAPLE, Spatial STAPLE, and NLS. The differences between NLSS and JLF in Figures 5 and 6 are not statistically significant. To visually check the findings in Figures 5 and 6, Figure 7 shows the qualitative performance of different methods on the same subject. The surfaces of the semi‐manual segmentations, which were used as the reference, are shown as red contours. Positive error (estimate larger than reference) appears as green and purple areas outside the contours, while negative error (estimate smaller than reference) is colored white.
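Mean surface distance can be defined in several ways. This sketch assumes one common symmetric variant, which averages the distance from each boundary voxel of one mask to the nearest boundary voxel of the other, in both directions, in millimetres. Masks are represented as sets of voxel indices. Boundary voxels are assumed to be those with at least one 6‐connected background neighbour. The brute‐force nearest‐neighbour search is only suitable for small toy masks, and practical implementations typically use distance transforms instead.

```python
import math
from typing import Set, Tuple

Voxel = Tuple[int, int, int]
_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def boundary_voxels(mask: Set[Voxel]) -> Set[Voxel]:
    """Foreground voxels with at least one 6-connected background neighbour."""
    return {
        (i, j, k) for (i, j, k) in mask
        if any((i + di, j + dj, k + dk) not in mask for di, dj, dk in _NEIGHBOURS)
    }


def mean_surface_distance(a: Set[Voxel], b: Set[Voxel],
                          spacing: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Symmetric mean surface distance (mm) between two binary masks (brute force)."""
    sa, sb = boundary_voxels(a), boundary_voxels(b)
    if not sa or not sb:
        raise ValueError("both masks must be non-empty")

    def dist(p: Voxel, q: Voxel) -> float:
        # Euclidean distance in mm, scaling each axis by its voxel size.
        return math.sqrt(sum(((pi - qi) * s) ** 2 for pi, qi, s in zip(p, q, spacing)))

    def mean_nearest(src: Set[Voxel], dst: Set[Voxel]) -> float:
        return sum(min(dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (mean_nearest(sa, sb) + mean_nearest(sb, sa))
```

As a usage example, take a 3 × 3 × 3 voxel cube and shift it by one voxel along one axis. Its 26 boundary voxels give a symmetric MSD of 10/26 ≈ 0.38 mm at 1 mm isotropic spacing.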

Figure 7.

Qualitative results comparing multi‐atlas segmentation methods with semi‐manual segmentation. The red contours represent the spatial location of the semi‐manual segmentation. The white color indicates the negative error, in which the estimated segmentation is smaller than the semi‐manual reference. The green and purple color outside the red contours indicates the positive error, in which the estimated segmentation is larger than reference. [Color figure can be viewed at http://wileyonlinelibrary.com]

Reproducibility Test

We used the publicly available Kirby21 dataset (https://www.nitrc.org/projects/multimodal), which consists of scan‐rescan images of 21 subjects [Landman et al., 2011]. Each subject had two scans with multispectral MR data (e.g., MPRAGE, FLAIR, DTI). In this reproducibility test, we used the 42 T1w MPRAGE images (1 × 1 × 1.2 mm resolution over an FOV of 240 × 204 × 256 mm). Ideally, the TICV and PFV estimates from the two scans of the same subject should be close to each other.

Figure 8 shows the reproducibility of the different methods on the same 21 pairs of scan‐rescan T1w images. We used the ADIFF metric, which expresses the volume difference between the two scans as a proportion of the total volume. For both TICV and PFV estimation, all methods achieved small ADIFF values (mostly smaller than 2%).

Figure 8.

Volumetric reproducibility analysis of different approaches on scan‐rescan T1w images. For all methods, the inconsistency of TICV estimation between the two scans of the same subject is less than 2%. The statistical analyses compared the proposed NLSS TICV estimation using semi‐manual atlases (marked as reference “Ref.”) with other approaches or different atlases. If the difference was statistically significant, we marked the other method with a “*” symbol. Otherwise, we marked it as “N.S.” [Color figure can be viewed at http://wileyonlinelibrary.com]

CONCLUSION AND DISCUSSION

This article proposes a simultaneous TICV and PFV estimation framework using multi‐atlas label fusion. Using the NLSS multi‐atlas framework, we obtain accurate TICV and PFV estimates simultaneously, with an explicit boundary between skull and CSF. The mathematical derivation of NLSS is provided. The performance of the proposed method was compared with the prevalent FreeSurfer, FSL, and SPM12 methods and with the previously proposed STAPLE‐based TICV estimation. For a more complete analysis, NLSS was also compared with MV, Spatial STAPLE, NLS, and JLF.

Compared with FreeSurfer, FSL, and SPM12, the proposed NLSS approach achieves significantly superior performance in TICV estimation, with higher correlation coefficients and mean ASIM (Table 1, Fig. 3). Compared with other label fusion methods (Figs. 4, 5, 6): (1) NLSS achieves statistically better performance in simultaneous TICV and PFV estimation than the previously proposed STAPLE method [Schaerer et al., 2012]. (2) NLSS achieves statistically superior performance to MV, Spatial STAPLE, and NLS. (3) For ASIM, Dice, and MSD, the differences between NLSS and JLF are not statistically significant, which means that NLSS and JLF are comparably accurate in TICV and PFV estimation. In Tables 1 and 2, JLF has overall better measurements in TICV estimation, while NLSS has better measurements in PFV estimation. In Figure 8, all methods achieve high reproducibility (ADIFF mostly below 2%). JLF achieves a statistically smaller ADIFF than NLSS on TICV estimation. Considering all results, JLF is superior for TICV while NLSS is superior for PFV when conducting simultaneous TICV and PFV estimation.

Accuracy and reproducibility are the two essential aspects when evaluating the performance of TICV estimation. The high reproducibility achieved by FreeSurfer, FSL, and SPM12 demonstrates that the affine registration and tissue segmentation used in these three methods are reproducible. The multi‐atlas approaches show superior accuracy together with high reproducibility, which indicates that they do not compromise on reproducibility while providing more accurate estimates. The multi‐atlas labeling approaches not only provide more accurate TICV estimation but also estimate PFV simultaneously, which is not available in FreeSurfer, SPM12, and FSL. PFV is essential in investigating clinical conditions of the cerebellum [Badie et al., 1995; Nyland and Krogness, 1978; Sgouros et al., 2006]. In Supporting Information section S‐2, we show that the improved accuracy of TICV estimation translates into greater statistical power on simulated clinical cohorts. Future work will investigate the relationship between the accuracy of TICV estimation and the power of detecting differences in empirical datasets. For instance, we could evaluate the statistical power of detecting differences in particular metrics (corrected by TICV) between patients and controls using different TICV estimation methods.

We provide new TICV and PFV labels on the widely used 45 OASIS images using the BrainCOLOR protocol. The new atlases enable simultaneous BrainCOLOR, TICV, and PFV segmentation from a single set of time‐consuming non‐rigid registrations. To evaluate the performance of the new BC1, BC2, and BC3 atlases, we compared them with the semi‐manual atlases using the same NLSS framework. Compared with directly using the semi‐manual atlases, these intermediate atlases reduced mean Dice by less than 0.02 and mean ASIM by at most about 0.03. They also increased mean MSD by about 0.5 mm or less (Tables 1 and 2). Even so, the BC1, BC2, and BC3 atlases still generally outperform FreeSurfer, FSL, and SPM12 (Table 1). Since the BC2 atlases include the original BrainCOLOR labels, we provide a subset of the BC2 atlases freely online for other researchers (https://masi.vuse.vanderbilt.edu/index.php/TICV_BC2atlases). The T1w MR images of the same OASIS images used for the BC2 atlases are available via subscription from Neuromorphometrics Inc. (http://www.neuromorphometrics.com/). A subset of them is freely available from the MICCAI 2012 Grand Challenge and Workshop on Multi‐Atlas Labeling [Landman and Warfield, 2012] (https://masi.vuse.vanderbilt.edu/workshop2012/).

The semi‐manual atlas generation method may be applied to other datasets if paired MR and CT images are available. Rigid registration was used to align the CT and MR images in this study, so registration performance might be affected by large neck or jaw movements in either modality. In such cases, applying a brain mask (masking out the neck and jaw) before registration would address the movement issue. The proposed NLSS multi‐atlas segmentation framework is flexible in terms of incorporating other regions of interest during TICV estimation. For example, multi‐atlas labeling has recently been used to label the skull on CT‐MRI datasets [Torrado‐Carvajal et al., 2016]. For TICV and posterior fossa estimation, we were only interested in the accuracy of the inner skull boundary, so we did not seek to fully characterize the cranium. However, it would be interesting to provide TICV, PFV, and skull labels simultaneously in the future. TICV estimation using multi‐atlas segmentation is computationally more expensive than using FreeSurfer, FSL, and SPM12, because multiple non‐rigid registrations (≈1.5 hours per registration) are conducted for each target image. However, the total running time can be reduced by running these independent registrations in parallel. Moreover, the computed registrations can be reused for other purposes (e.g., segmenting other brain structures, morphometry, manifold learning).
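Because the atlas‐to‐target registrations are independent, they parallelize trivially. The sketch below dispatches them with Python's `concurrent.futures`. `register_atlas` is a hypothetical stand‐in. In practice it would launch an external registration tool, for example via `subprocess.run`, and return the path of the warped labels. Threads are therefore sufficient, because the heavy computation would run in child processes. Consider 19 candidate atlases at ≈1.5 hours each, assuming enough cores and memory. Wall‐clock time would fall from roughly 28.5 hours sequentially to roughly 1.5 hours × ⌈19 / workers⌉.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def register_atlas(atlas_id: str, target_id: str) -> str:
    """Hypothetical placeholder for one non-rigid atlas-to-target registration.

    A real version would call an external registration program (e.g. with
    subprocess.run) and return the path of the warped atlas labels.
    """
    return f"{atlas_id}_to_{target_id}_labels"


def register_all(atlas_ids: List[str], target_id: str, workers: int = 4) -> List[str]:
    """Run all independent registrations concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda a: register_atlas(a, target_id), atlas_ids))
```

`Executor.map` returns results in input order, so the warped label maps can be passed directly to a fusion step in the same order as the atlas list.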

Supporting information

ACKNOWLEDGMENT

This research was conducted with support from the Intramural Research Program, National Institute on Aging, NIH. This study was also supported in part using the resources of the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University, Nashville, TN. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The authors have no conflict of interest to declare.

REFERENCES

  1. Aguilar C, Edholm K, Simmons A, Cavallin L, Muller S, Skoog I, Larsson EM, Axelsson R, Wahlund LO, Westman E (2015): Automated CT‐based segmentation and quantification of total intracranial volume. Eur Radiol 25:3151–3160.
  2. Akhondi‐Asl A, Hoyte L, Lockhart ME, Warfield SK (2014): A logarithmic opinion pool based STAPLE algorithm for the fusion of segmentations with associated reliability weights. IEEE Trans Med Imaging 33:1997–2009.
  3. Aljabar P, Heckemann RA, Hammers A, Hajnal JV, Rueckert D (2009): Multi‐atlas based segmentation of brain images: Atlas selection and its effect on accuracy. Neuroimage 46:726–738.
  4. Ananth H, Popescu I, Critchley HD, Good CD, Frackowiak RS, Dolan RJ (2014): Cortical and subcortical gray matter abnormalities in schizophrenia determined through structural magnetic resonance imaging with optimized volumetric voxel‐based morphometry. Am J Psychiatry 159:1497–1505.
  5. Ashburner J, Friston KJ (2005): Unified segmentation. Neuroimage 26:839–851.
  6. Asman AJ, Landman B (2011): Robust statistical label fusion through consensus level, labeler accuracy, and truth estimation (COLLATE). IEEE Trans Med Imaging 30:1779–1794.
  7. Asman AJ, Landman BA (2012): Formulating spatially varying performance in the statistical fusion framework. IEEE Trans Med Imaging 31:1326–1336.
  8. Asman AJ, Landman BA (2013): Non‐local statistical label fusion for multi‐atlas segmentation. Med Image Anal 17:194–208.
  9. Asman AJ, Landman BA (2014): Hierarchical performance estimation in the statistical label fusion framework. Med Image Anal 18:1070–1081.
  10. Asman AJ, Bryan FW, Smith SA, Reich DS, Landman BA (2014): Groupwise multi‐atlas segmentation of the spinal cord's internal structure. Med Image Anal 18:460–471.
  11. Asman AJ, Huo Y, Plassard AJ, Landman BA (2015): Multi‐atlas learner fusion: An efficient segmentation approach for large‐scale data. Med Image Anal 26:82–91.
  12. Avants BB, Epstein CL, Grossman M, Gee JC (2008): Symmetric diffeomorphic image registration with cross‐correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Med Image Anal 12:26–41.
  13. Badie B, Mendoza D, Batzdorf U (1995): Posterior fossa volume and response to suboccipital decompression in patients with Chiari I malformation. Neurosurgery 37:214–218.
  14. Barnes J, Ridgway GR, Bartlett J, Henley SM, Lehmann M, Hobbs N, Clarkson MJ, MacManus DG, Ourselin S, Fox NC (2010): Head size, age and gender adjustment in MRI studies: A necessary nuisance? Neuroimage 53:1244–1255.
  15. Bellman R (1956): Dynamic programming and Lagrange multipliers. Proc Natl Acad Sci U S A 42:767–769.
  16. Boyes RG, Rueckert D, Aljabar P, Whitwell J, Schott JM, Hill DL, Fox NC (2006): Cerebral atrophy measurements using Jacobian integration: Comparison with the boundary shift integral. NeuroImage 32:159–169.
  17. Buckner RL, Head D, Parker J, Fotenos AF, Marcus D, Morris JC, Snyder AZ (2004): A unified approach for morphometric and functional data analysis in young, old, and demented adults using automated atlas‐based head size normalization: Reliability and validation against manual measurement of total intracranial volume. NeuroImage 23:724–738.
  18. Commowick O, Warfield SK (2010): Incorporating priors on expert performance parameters for segmentation validation and label fusion: A maximum a posteriori STAPLE. In: Medical Image Computing and Computer‐Assisted Intervention–MICCAI 2010. Springer. pp 25–32.
  19. Coupé P, Manjón JV, Fonov V, Pruessner J, Robles M, Collins DL (2011): Patch‐based segmentation using expert priors: Application to hippocampus and ventricle segmentation. Neuroimage 54:940–954.
  20. Dale AM, Fischl B, Sereno MI (1999): Cortical surface‐based analysis. I. Segmentation and surface reconstruction. NeuroImage 9:179–194.
  21. Davis P, Wright E (1977): A new method for measuring cranial cavity volume and its application to the assessment of cerebral atrophy at autopsy. Neuropathol Appl Neurobiol 3:341–358.
  22. Dempster AP, Laird NM, Rubin DB (1977): Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38.
  23. Driscoll I, Davatzikos C, An Y, Wu X, Shen D, Kraut M, Resnick SM (2009): Longitudinal pattern of regional brain volume change differentiates normal aging from MCI. Neurology 72:1906–1913.
  24. Evans AC, Collins DL, Mills S, Brown E, Kelly R, Peters TM (1993): 3D statistical neuroanatomical models from 305 MRI volumes. In: IEEE Nuclear Science Symposium and Medical Imaging Conference. pp 1813–1817.
  25. Farias ST, Mungas D, Reed B, Carmichael O, Beckett L, Harvey D, Olichney J, Simmons A, DeCarli C (2012): Maximal brain size remains an important predictor of cognition in old age, independent of current brain pathology. Neurobiol Aging 33:1758–1768.
  26. Feeman TG (2010): The Mathematics of Medical Imaging: A Beginner's Guide. New York: Springer Science & Business Media.
  27. Fein G, Di Sclafani V, Taylor C, Moon K, Barakos J, Tran H, Landman B, Shumway R (2004): Controlling for premorbid brain size in imaging studies: T1‐derived cranium scaling factor vs. T2‐derived intracranial vault volume. Psychiatry Res 131:169–176.
  28. Han X, Xu C, Prince JL (2003): A topology preserving level set method for geometric deformable models. IEEE Trans Pattern Anal Mach Intell 25:755–768.
  29. Hansen T, Brezova V, Eikenes L, Håberg A, Vangberg T (2015): How does the accuracy of intracranial volume measurements affect normalized brain volumes? Sample size estimates based on 966 subjects from the HUNT MRI cohort. Am J Neuroradiol 36:1450–1456.
  30. Harrigan RL, Panda S, Asman AJ, Nelson KM, Chaganti S, DeLisi MP, Yvernault BC, Smith SA, Galloway RL, Mawn LA (2014): Robust optic nerve segmentation on clinically acquired CT. In: Proc SPIE Int Soc Opt Eng 9034:90341G.
  31. Harrigan RL, Plassard AJ, Mawn LA, Galloway RL, Smith SA, Landman BA (2015): Constructing a statistical atlas of the radii of the optic nerve and cerebrospinal fluid sheath in young healthy adults. In: International Society for Optics and Photonics. pp 941303–9413037.
  32. Harrigan RL, Smith AK, Mawn LA, Smith SA, Landman BA (2016): Short term reproducibility of a high contrast 3‐D isotropic optic nerve imaging sequence in healthy controls. In: International Society for Optics and Photonics. pp 97831L–97831L8.
  33. Hartley SW, Scher AI, Korf ES, White LR, Launer LJ (2006): Analysis and validation of automated skull stripping tools: A validation study based on 296 MR images from the Honolulu Asia aging study. NeuroImage 30:1179–1186.
  34. Huo Y, Swett K, Resnick SM, Cutting LE, Landman BA (2015): Data‐driven probabilistic atlases capture whole‐brain individual variation. In: Barillot C, Dojat M, Kennedy D, Niessen W, editors. Proceedings of the 1st MICCAI 2015 Workshop on Management and Processing of Images for Population Imaging (MICCAI‐MAPPING 2015). p 7.
  35. Huo Y, Carass A, Resnick SM, Pham DL, Prince JL, Landman BA (2016a): Combining Multi‐atlas Segmentation with Brain Surface Estimation. In: Proceedings of SPIE–the International Society for Optical Engineering, 9784.
  36. Huo Y, Plassard AJ, Carass A, Resnick SM, Pham DL, Prince JL, Landman BA (2016b): Consistent cortical reconstruction and multi‐atlas brain segmentation. NeuroImage.
  37. Iglesias JE, Sabuncu MR (2015): Multi‐atlas segmentation of biomedical images: A survey. Med Image Anal 24:205–219.
  38. Keihaninejad S, Heckemann RA, Fagiolo G, Symms MR, Hajnal JV, Hammers A (2010): A robust method to estimate the intracranial volume across MRI field strengths (1.5T and 3T). Neuroimage 50:1427–1437.
  39. Klein A, Dal Canton T, Ghosh SS, Landman B, Lee J, Worth A (2010): Open labels: Online feedback for a public resource of manually labeled brain images. In: 16th Annual Meeting for the Organization of Human Brain Mapping.
  40. Landman B, Warfield S (2012): MICCAI 2012 workshop on multi‐atlas labeling.
  41. Landman BA, Huang AJ, Gifford A, Vikram DS, Lim IA, Farrell JA, Bogovic JA, Hua J, Chen M, Jarso S, Smith SA, Joel S, Mori S, Pekar JJ, Barker PB, Prince JL, van Zijl PC (2011): Multi‐parametric neuroimaging reproducibility: A 3‐T resource study. Neuroimage 54:2854–2866.
  42. Landman BA, Asman AJ, Scoggins AG, Bogovic JA, Xing F, Prince JL (2012): Robust statistical fusion of image labels. IEEE Trans Med Imaging 31:512–522.
  43. Lemieux L, Hammers A, Mackinnon T, Liu RS (2003): Automatic segmentation of the brain and intracranial cerebrospinal fluid in T1‐weighted volume MRI scans of the head, and its application to serial cerebral and intracranial volumetry. Magn Reson Med 49:872–884.
  44. Li B, Bryan F, Landman BA (2012): Next Generation of the Java Image Science Toolkit (JIST): Visualization and Validation. Insight J 2012:1–16.
  45. Lucas BC, Bogovic JA, Carass A, Bazin PL, Prince JL, Pham DL, Landman BA (2010): The Java Image Science Toolkit (JIST) for rapid prototyping and publishing of neuroimaging software. Neuroinformatics 8:5–17.
  46. Malone IB, Leung KK, Clegg S, Barnes J, Whitwell JL, Ashburner J, Fox NC, Ridgway GR (2015): Accurate automatic estimation of total intracranial volume: A nuisance variable with less nuisance. Neuroimage 104:366–372.
  47. Marcus DS, Wang TH, Parker J, Csernansky JG, Morris JC, Buckner RL (2007): Open Access Series of Imaging Studies (OASIS): cross‐sectional MRI data in young, middle aged, nondemented, and demented older adults. J Cogn Neurosci 19:1498–1507.
  48. Mathalon DH, Sullivan EV, Rawles JM, Pfefferbaum A (1993): Correction for head size in brain‐imaging measurements. Psychiatry Res 50:121–139.
  49. McLachlan G, Krishnan T (2007): The EM Algorithm and Extensions. New York: John Wiley & Sons.
  50. Nordenskjold R, Malmberg F, Larsson EM, Simmons A, Brooks SJ, Lind L, Ahlstrom H, Johansson L, Kullberg J (2013): Intracranial volume estimated with commonly used methods could introduce bias in studies including brain volume measurements. Neuroimage 83:355–360.
  51. Nyland H, Krogness KG (1978): Size of posterior fossa in Chiari type 1 malformation in adults. Acta Neurochir 40:233–242.
  52. Ourselin S, Roche A, Subsol G, Pennec X, Ayache N (2001): Reconstructing a 3D structure from serial histological sections. Image Vision Comput 19:25–31.
  53. Panda S, Asman AJ, Delisi MP, Mawn LA, Galloway RL, Landman BA (2014): Robust Optic Nerve Segmentation on Clinically Acquired CT. In: Proceedings of SPIE–the International Society for Optical Engineering, 9034:90341G.
  54. Peelle JE, Cusack R, Henson RNA (2012): Adjusting for global effects in voxel‐based morphometry: Gray matter decline in normal aging. Neuroimage 60:1503–1516.
  55. Pengas G, Pereira JM, Williams GB, Nestor PJ (2009): Comparative reliability of total intracranial volume estimation methods and the influence of atrophy in a longitudinal semantic dementia cohort. J Neuroimaging 19:37–46.
  56. Perlaki G, Orsi G, Plozer E, Altbacker A, Darnai G, Nagy SA, Horvath R, Toth A, Doczi T, Kovacs N (2014): Are there any gender differences in the hippocampus volume after head‐size correction? A volumetric and voxel‐based morphometric study. Neurosci Lett 570:119–123.
  57. Perneczky R, Wagenpfeil S, Lunetta KL, Cupples LA, Green RC, Decarli C, Farrer LA, Kurz A (2010): Head circumference, atrophy, and cognition: Implications for brain reserve in Alzheimer disease. Neurology 75:137–142.
  58. Ridgway G, Barnes J, Pepple T, Fox N (2011): Estimation of total intracranial volume; a comparison of methods. Alzheimer's Dement 7:S62–S63.
  59. Rohlfing T, Russakoff DB, Maurer CR (2003a): Expectation maximization strategies for multi‐atlas multi‐label segmentation. In: Proceedings of the information processing in medical imaging conference, 18, pp 210–221.
  60. Rohlfing T, Russakoff DB, Maurer CR Jr (2003b): Extraction and application of expert priors to combine multiple segmentations of human brain tissue. In: Medical Image Computing and Computer‐Assisted Intervention–MICCAI 2003. Springer. pp 578–585.
  61. Rohlfing T, Russakoff DB, Maurer CR Jr (2004): Performance‐based classifier combination in atlas‐based image segmentation using expectation‐maximization parameter estimation. IEEE Trans Med Imaging 23:983–994. [DOI] [PubMed] [Google Scholar]
  62. Sabuncu MR, Yeo BT, Van Leemput K, Fischl B, Golland P (2010): A generative model for image segmentation based on label fusion. IEEE Trans Med Imaging 29:1714–1729. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Schaerer J, Belaroussi B, Bonnand F, Roche F, Bracoud L, Yu HJ, Pachai C (2012): Accurate intracranial cavity volume estimation using multiatlas segmentation. Alzheimer's Dement 8:P272. [Google Scholar]
  64. Segonne F, Dale AM, Busa E, Glessner M, Salat D, Hahn HK, Fischl B (2004): A hybrid approach to the skull stripping problem in MRI. NeuroImage 22:1060–1075. [DOI] [PubMed] [Google Scholar]
  65. Sgouros S, Kountouri M, Natarajan K (2006): Posterior fossa volume in children with Chiari malformation Type I. J Neurosurg 105:101–106. [DOI] [PubMed] [Google Scholar]
  66. Shen D, Zhang D, Young A, Parvin B (2015): Editorial: Machine learning and data mining in medical imaging. IEEE J Biomed Health Inform 19:1587–1588.
  67. Shrout PE, Fleiss JL (1979): Intraclass correlations: Uses in assessing rater reliability. Psychol Bull 86:420–428.
  68. Sjolund J, Jarlideni AE, Andersson M, Knutsson H, Nordstrom H (2014): Skull segmentation in MRI by a support vector machine combining local and global features. IEEE. pp 3274–3279.
  69. Smith SM (2002): Fast robust automated brain extraction. Hum Brain Mapp 17:143–155.
  70. Smith SM, Zhang Y, Jenkinson M, Chen J, Matthews PM, Federico A, De Stefano N (2002): Accurate, robust, and automated longitudinal and cross‐sectional brain change analysis. Neuroimage 17:479–489.
  71. Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TEJ, Johansen‐Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney DE, Niazy RK, Saunders J, Vickers J, Zhang YY, De Stefano N, Brady JM, Matthews PM (2004): Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23:S208–S219.
  72. Torrado‐Carvajal A, Herraiz JL, Hernandez‐Tamames JA, San Jose‐Estepar R, Eryaman Y, Rozenholc Y, Adalsteinsson E, Wald LL, Malpica N (2016): Multi‐atlas and label fusion approach for patient‐specific MRI based skull estimation. Magn Reson Med 75:1797–1807.
  73. Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, Gee JC (2010): N4ITK: Improved N3 bias correction. IEEE Trans Med Imaging 29:1310–1320.
  74. Van Leemput K, Sabuncu MR (2014): A cautionary analysis of STAPLE using direct inference of segmentation truth. New York: Springer. pp 398–406.
  75. Warfield SK, Zou KH, Wells WM (2004): Simultaneous truth and performance level estimation (STAPLE): An algorithm for the validation of image segmentation. IEEE Trans Med Imaging 23:903–921.
  76. Weiskopf N, Lutti A, Helms G, Novak M, Ashburner J, Hutton C (2011): Unified segmentation based correction of R1 brain maps for RF transmit field inhomogeneities (UNICORT). Neuroimage 54:2116–2124.
  77. Westman E, Aguilar C, Muehlboeck JS, Simmons A (2013): Regional magnetic resonance imaging measures for multivariate analysis in Alzheimer's disease and mild cognitive impairment. Brain Topogr 26:9–23.
  78. Whitwell JL, Crum WR, Watt HC, Fox NC (2001): Normalization of cerebral volumes by use of intracranial volume: Implications for longitudinal quantitative MR imaging. Am J Neuroradiol 22:1483–1489.
  79. Wilcoxon F (1945): Individual comparisons by ranking methods. Biometr Bull 80–83.
  80. Xu C, Prince JL (1998): Snakes, shapes, and gradient vector flow. IEEE Trans Image Process 7:359–369.

Supplementary Materials

Supporting Information


Articles from Human Brain Mapping are provided here courtesy of Wiley
