Abstract
Accurate segmentation of the brain into gray matter, white matter, and cerebrospinal fluid using magnetic resonance (MR) imaging is critical for the visualization and quantification of brain anatomy. Compared to 3T MR images, 7T MR images exhibit higher tissue contrast, which enables more accurate tissue delineation for training segmentation models. In this paper, we propose a cascaded nested network (CaNes-Net) for the segmentation of 3T brain MR images, trained with tissue labels delineated on the corresponding 7T images. We first train a nested network (Nes-Net) for a rough segmentation. A second Nes-Net then uses tissue-specific geodesic distance maps as contextual information to refine the segmentation. This process is iterated to build CaNes-Net as a cascade of Nes-Net modules that gradually refines the segmentation. To alleviate the misalignment between 3T and corresponding 7T MR images, we incorporate a correlation coefficient map into the loss so that well-aligned voxels play a more important role in supervising the training process. We compared CaNes-Net with the SPM and FSL tools, as well as four deep learning models, on 18 adult subjects and on the ADNI dataset. Our results indicate that CaNes-Net reduces segmentation errors caused by the misalignment and substantially improves segmentation accuracy over the competing methods.
Keywords: Brain segmentation, Cascaded nested network, Deep learning, Magnetic resonance imaging
1. Introduction
Segmentation of brain magnetic resonance (MR) images into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) plays an important role in visualizing and measuring brain anatomy in clinical applications and neuroscientific research [1,2]. Since manual segmentation is time-consuming, error-prone, subjective, poorly reproducible, and infeasible on large-scale datasets, automated brain MR image segmentation methods are in urgent demand. This task, however, is challenging due to the low soft-tissue contrast and the large variation of brain anatomy across subjects.
Nowadays, deep convolutional neural networks (DCNNs) have become the most prevalent and successful tools for brain MR image segmentation [3–5]. Compared with traditional methods, DCNNs have a distinct advantage in learning the image representation and voxel labels in a unified framework, freeing users from troublesome handcrafted feature extraction and classifier design. However, as a supervised solution, a DCNN requires a large number of training images, each equipped with a voxel-level dense annotation. More importantly, the performance of a DCNN relies heavily on the quality of those annotations.
Brain tissue annotations using 1.5T or 3T MR images can be error-prone and unreliable due to the relatively poor tissue contrast. 7T MRI has a significantly better signal-to-noise ratio (SNR), higher contrast, and more anatomical details than 1.5T and 3T MRI [6], as shown in Fig. 1. However, due to the cost, 7T MR scanners are not available in most hospitals and research centers. To leverage the advantage of 7T scanners [7], Deng et al. [8] trained a random forest-based method for 3T MR image segmentation using the ground truth derived from 7T MR images. Specifically, they generated the annotations based on the images acquired using a 7T scanner, and then transferred these high-quality annotations from 7T images to corresponding 3T images using rigid registration. Although effective, this method suffers from potential misalignment [9–11] between 3T and 7T images caused by imaging distortions, which may result in inaccurate supervision. In addition, this method is not end-to-end and needs hand-crafted features, which may limit its representation capability.
Fig. 1.

Comparison between 3T and 7T T1-weighted MR images. Manual annotation of the GM, WM, and CSF based on the 7T image is shown on the right.
In this paper, we propose a cascaded nested network (CaNes-Net) for 3T brain MR image segmentation, trained using tissue labels derived from 7T brain MR images. Construction of this model starts with training a nested network (Nes-Net) for rough segmentation. Then, based on the intermediate segmentation results, we construct geodesic distance maps for the three tissue types, respectively, and concatenate these maps with the 3T image to train another Nes-Net for fine segmentation. Construction of geodesic distance maps and training of additional Nes-Nets are performed iteratively until convergence. This strategy is also known as auto-context [12], which iteratively concatenates the input and the output of the previous model as the input of the next model to obtain a segmentation closer to the ground truth. To overcome misalignment, we define the correlation coefficient map of the aligned 3T and 7T images and incorporate it into the loss function. We evaluated the proposed CaNes-Net on a dataset of 18 subjects with paired 3T and 7T MR images and on the ADNI dataset with 799 brain MR scans.
The main contribution of this work is three-fold. First, we propose the correlation coefficient map to measure the quality of the alignment between 3T and 7T MR images and incorporate it into a weighted loss function, allowing well-aligned voxels to play a more important role than misaligned voxels in supervising the training. Second, we design CaNes-Net as an iterative coarse-to-fine segmentation process, in which the refinement is guided by geodesic distance maps constructed from the segmentation result of the previous step. Third, our CaNes-Net achieves substantially improved segmentation accuracy over the state of the art.
The pilot data of this research were presented at ISBI 2020 [13]. This paper reports a substantially extended solution with improved segmentation performance. The major extensions include: (1) we provide a more comprehensive review of the research on brain MR image segmentation; (2) we supplement an ablation study to validate the contribution of each component of CaNes-Net; (3) we discuss the number of Nes-Nets required in the cascaded architecture; (4) we show the sensitivity of our model in detecting group differences between subjects with Alzheimer's disease (AD) and normal controls (NC), AD vs. mild cognitive impairment (MCI), and MCI vs. NC on ADNI; (5) we add an experiment on the hippocampus to show the usefulness of our model; and (6) we perform an experiment on longitudinal brain MR images of subjects aged 70 to 75 to demonstrate the superiority of our model over the commonly used FSL package in detecting significant differences at various time points for AD, MCI, and NC.
2. Related work
Traditional brain MR image segmentation methods are usually based on atlases, statistical models, or deformable models [14–16]. Atlas-based methods [17,18] propagate labels from an atlas image to an individual image by registration, using the atlas as a reference to guide the segmentation of a target image. These methods are simple and effective medical image segmentation techniques. Statistical models, such as the Gaussian mixture model (GMM), are usually combined with the Markov random field (MRF) model [19] or the hidden MRF (HMRF) model [20] to incorporate spatial constraints into the segmentation process, and can be solved by the expectation-maximization (EM) algorithm [21,22] or bio-inspired optimization algorithms. Deformable models convert the problem of segmenting an image into that of evolving a boundary curve by minimizing an associated energy function. Level-set methods [23,24] perform numerical computations to evolve curves or surfaces without having to parameterize these objects, and they also make it easy to follow shapes with changing topology. However, these traditional algorithms rely on hand-designed features and have a limited capacity to represent complicated brain tissue structures and their changes.
Recently, DCNNs have demonstrated state-of-the-art performance on many image segmentation tasks [25–29]. Such success has prompted the application of DCNNs to brain MR image segmentation [3]. Dolz et al. [30] proposed a 3D fully convolutional neural network with small kernels and a deeper architecture for subcortical brain structure segmentation in MRI. Chen et al. [31] presented OctopusNet for multi-modal medical image segmentation. Nie et al. [32] proposed a convolution and concatenate 3D fully convolutional network (CC-3D-FCN) for infant brain image segmentation by using a transformation and a fusion module. Jog et al. [33] proposed a pulse sequence adaptive convolutional neural network (PSACNN) for whole-brain MRI segmentation, with an augmentation scheme that builds approximate forward models of pulse sequences to generate a wide range of synthetic training examples. In our previous work [5], we proposed a multi-model, multi-size, and multi-view deep neural network (M3Net) to segment brain MR images on a slice-by-slice basis. In addition, Roy et al. [34] proposed QuickNAT, a fully convolutional and densely connected neural network for whole-brain segmentation that processes an MRI brain scan in 20 s. Coupé et al. [4] introduced AssemblyNet, which ensembles a large number of CNNs for whole-brain segmentation. Henschel et al. [35] proposed FastSurferCNN, a deep learning framework for whole-brain segmentation based on QuickNAT that incorporates competitive dense blocks and spatial information aggregation. Despite their success, the performance of these deep learning techniques relies heavily on the quality of the annotations of the training data, which is limited not only by the difficulty of manual segmentation but also by the intrinsic quality of brain MR scans.
Meanwhile, acquiring additional 7T MR images is a promising way to provide high-quality annotations for 3T images [8], but the collection of 7T scans is hindered by their high cost, so a suitable method is needed to use the precious 7T MR scans efficiently in the segmentation of 3T MR scans.
3. Datasets
This study was approved by the Institutional Review Board (IRB) of the University of North Carolina at Chapel Hill, and signed consent forms were obtained from all subjects. We recruited 18 volunteers with ages of 30 ± 8 years. MR images were acquired using a 3T Siemens Trio scanner with a 3D MP-RAGE sequence and a Siemens Magnetom 7T whole-body MR scanner with a 3D MP2RAGE sequence. The 3T and 7T MR images have resolutions of 0.8594 × 0.8594 × 0.999 mm³ and 0.80 × 0.80 × 0.799 mm³, respectively. We applied intensity inhomogeneity correction [36], skull stripping [37], cerebellum removal, and histogram matching [38] to each scan. Each 3T MR scan was then aligned to the corresponding 7T scan using FLIRT [39,40] with a 9-DOF transformation. To generate reliable manual segmentations, we adopted FAST [20] to obtain an initial segmentation of the 7T images, and an experienced expert then performed manual corrections via ITK-SNAP [41].
The ADNI dataset used in this study consists of 799 brain MR scans downloaded from the Alzheimer's Disease Neuroimaging Initiative (ADNI-1) [42] database. These scans were acquired from 199 subjects with Alzheimer's disease (AD), 374 subjects with mild cognitive impairment (MCI), and 226 normal controls (NC) using either 1.5T or 3.0T scanners.
4. Method
The proposed cascaded nested network (CaNes-Net) contains multiple nested networks (Nes-Nets) and two key components, i.e., the correlation coefficient map and the geodesic distance maps. Side blocks are added to each Nes-Net to produce side outputs for deep supervision. Since this design is inspired by holistically-nested edge detection (HED) [43], we use the term "nested" to name each of our sub-networks. The first Nes-Net takes 3T brain MR images as input and produces rough segmentations. The other Nes-Nets take the concatenation of the geodesic distance maps and the 3T MR images as input and produce refined segmentation results. Each Nes-Net uses the correlation-coefficient-weighted cross-entropy loss to alleviate the impact of the misalignment between 7T and 3T MR images. The framework of CaNes-Net is presented in Fig. 2. We now describe the details of this model.
Fig. 2.

Diagram of the proposed CaNes-Net with multiple Nes-Nets. The purple number is f, which determines the magnification, in deconvolution layer of side block. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
4.1. Correlation coefficient map
Ideally, after aligning a 3T MR image to the 7T MR image acquired from the same subject, the voxel labels of the 7T image can be directly mapped to the aligned 3T image for training a brain MR image segmentation model. However, although both images were acquired from the same subject, the residual misalignment after the affine transformation can be as large as 1 to 2 voxels according to manual checking by different raters. Such misalignment is particularly problematic for the segmentation of, for example, the cortical GM, which is a thin layer with a thickness of only 2–5 voxels. A typical example is shown in Fig. 3, where the boundaries of GM, WM, and CSF are clear in the 7T MR image but quite blurry in the aligned 3T MR image. The inconsistent voxel labels caused by misalignment become apparent when overlaying the boundaries derived from the 7T image onto the aligned 3T image.
Fig. 3.

From left to right: 3T image, 7T image, ground truth derived from 7T image, and 3T image overlaid by 7T ground truth.
To measure such misalignment, we define the correlation coefficient map using the following Pearson correlation:

$$\mathrm{CC}(v)=\frac{1}{N}\sum_{i=1}^{N}\frac{\bigl(P_{3T}(i)-\mu_{3T}\bigr)\bigl(P_{7T}(i)-\mu_{7T}\bigr)}{\sigma_{3T}\,\sigma_{7T}} \tag{1}$$

where $P_{3T}$ and $P_{7T}$ represent two image patches centered at voxel $v$ in the aligned 3T and 7T MR images, respectively, $N$ is the number of voxels in a patch, and $\mu_{3T}$, $\mu_{7T}$ and $\sigma_{3T}$, $\sigma_{7T}$ are the means and standard deviations of the corresponding patches. The image patches are selected from the 3T and 7T MR images with a patch size of 5 × 5 × 5 voxels and a stride of 1 × 1 × 1 to calculate the correlation coefficient for each voxel $v$. A large value in the correlation coefficient map indicates a proper alignment, while a small value indicates a misalignment.
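As a concrete illustration, the correlation coefficient map of Eq. (1) can be computed with a straightforward (unoptimized) NumPy sketch; the function name and the plain triple loop are ours, not taken from the original implementation:

```python
import numpy as np

def correlation_coefficient_map(img_3t, img_7t, patch=5):
    """Per-voxel Pearson correlation between patches of the aligned
    3T and 7T volumes (Eq. (1)). Border voxels whose patch would fall
    outside the volume are left at zero."""
    r = patch // 2
    cc = np.zeros_like(img_3t, dtype=np.float64)
    D, H, W = img_3t.shape
    for z in range(r, D - r):
        for y in range(r, H - r):
            for x in range(r, W - r):
                p3 = img_3t[z-r:z+r+1, y-r:y+r+1, x-r:x+r+1].ravel()
                p7 = img_7t[z-r:z+r+1, y-r:y+r+1, x-r:x+r+1].ravel()
                s3, s7 = p3.std(), p7.std()
                if s3 > 0 and s7 > 0:
                    cc[z, y, x] = np.mean(
                        (p3 - p3.mean()) * (p7 - p7.mean())) / (s3 * s7)
    return cc
```

A perfectly aligned (identical) pair yields a value of 1 at every interior voxel, and an anti-correlated pair yields −1, matching the interpretation given above.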
4.2. Geodesic distance maps
Based on the intermediate segmentation results, we calculate geodesic distance maps for the three tissue types, respectively, to encode the spatial relationship of each voxel to each class. Given an aligned 3T image $I$ and the corresponding segmentation result produced by the first trained Nes-Net, we construct binary masks for the three tissue types, denoted by $M_c$. The unsigned geodesic distance [44] from a voxel $v$ to the boundary of the mask $M_c$ on an intensity image $I$ is defined as

$$D(v, M_c, I)=\min_{p\,\in\,\mathcal{P}(v, M_c)}\int_{0}^{1}\sqrt{\lVert p'(s)\rVert^{2}+\gamma^{2}\bigl(\nabla I(p(s))\cdot u(s)\bigr)^{2}}\,\mathrm{d}s \tag{2}$$

where $\mathcal{P}(v, M_c)$ is the set of all paths over the image connecting the voxel $v$ to $M_c$, $p$ is one such path parameterized by $s\in[0,1]$, $p'(s)$ is the spatial derivative with respect to $s$, and $u(s)=p'(s)/\lVert p'(s)\rVert$ is the unit vector tangent to the direction of the path. The parameter $\gamma$ controls the contribution of the image gradient $\nabla I$.
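Eq. (2) is continuous; in practice the geodesic distance is computed on the discrete voxel grid. The sketch below approximates it in 2D with a Dijkstra search whose step cost adds the Euclidean step length and γ times the intensity difference along the step; this additive form is a common discrete approximation and is not necessarily the exact scheme of [44]:

```python
import heapq
import numpy as np

def geodesic_distance(image, mask, gamma=0.2):
    """Approximate unsigned geodesic distance from every pixel to a
    binary mask: Dijkstra over the 4-connected grid, where stepping
    between neighbours costs the step length plus gamma times the
    intensity difference (a discrete stand-in for Eq. (2))."""
    H, W = image.shape
    dist = np.full((H, W), np.inf)
    heap = []
    for y, x in zip(*np.nonzero(mask)):   # seed: mask pixels at distance 0
        dist[y, x] = 0.0
        heapq.heappush(heap, (0.0, y, x))
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y, x]:
            continue                       # stale heap entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W:
                step = 1.0 + gamma * abs(image[ny, nx] - image[y, x])
                if d + step < dist[ny, nx]:
                    dist[ny, nx] = d + step
                    heapq.heappush(heap, (d + step, ny, nx))
    return dist
```

On a uniform image the result reduces to the ordinary (4-connected) spatial distance, while intensity edges inflate the distance, which is exactly why the maps carry tissue-boundary context.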
4.3. Structure of CaNes-Net
The proposed CaNes-Net is a cascade of four Nes-Nets. Each Nes-Net has an encoder-decoder structure that contains multiple core blocks, side blocks, and an additional fusion layer (see Fig. 2). A core block consists of two 3 × 3 × 3 convolutional layers, each followed by ReLU activation and batch normalization. A side block produces a side output for deep supervision [43]. It consists of a core block, a deconvolutional layer with kernels of size 2f × 2f × 2f, where f (the purple number in Fig. 2) determines the magnification, and a convolutional layer with 4 kernels of size 1 × 1 × 1. Each Nes-Net has six side output layers. The fusion layer combines the multiple side outputs into a unified one using a 3D convolutional layer with four 1 × 1 × 1 filters.
The first Nes-Net takes 3T MR images as its input and produces rough segmentation results. Each of the subsequent three Nes-Nets takes the concatenation of the 3T MR images and the corresponding geodesic distance maps as input, aiming to use spatial contextual information to guide the training process. Since the geodesic distance maps are calculated from the segmentation results produced by the previous Nes-Net, this cascaded design refines the segmentation results gradually and repeatedly. Note that, although only four Nes-Nets are used in this study as a trade-off between accuracy and efficiency, the design is generic and a network with more Nes-Nets can be built if necessary.
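The cascade described above can be sketched as a short inference loop. Here `nets` and `geodesic_maps` are hypothetical callables standing in for the trained Nes-Nets and the map construction of Section 4.2; the real networks are 3D Keras models, not shown:

```python
import numpy as np

def cascade_segment(image, nets, geodesic_maps):
    """CaNes-Net inference sketch: the first network sees only the
    image; each later network sees the image concatenated (along the
    channel axis) with tissue-wise geodesic distance maps built from
    the previous output."""
    seg = nets[0](image[np.newaxis])           # rough segmentation
    for net in nets[1:]:
        gdm = geodesic_maps(image, seg)        # one map per tissue class
        seg = net(np.concatenate([image[np.newaxis], gdm], axis=0))
    return seg
```

The loop makes the auto-context structure explicit: each stage's input channels are the image plus the spatial context distilled from the previous stage's output.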
4.4. Loss function
To alleviate the misalignment issue, we encourage well-aligned voxels to play a more important role than misaligned voxels in supervising the segmentation process by incorporating the correlation coefficient map into the cross-entropy loss:

$$\ell(\theta)=-\sum_{v}\mathrm{CC}(v)\sum_{c=1}^{C}y_{v,c}\,\log P_{v,c}(\theta) \tag{3}$$

where $\theta$ denotes the parameters in all layers, $\mathrm{CC}$ is the correlation coefficient map, $P_{v,c}(\theta)$ represents the probability of voxel $v$ belonging to the $c$th class, and $y_{v,c}$ is the corresponding ground truth.
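A minimal NumPy sketch of this weighted loss, assuming flattened per-voxel probabilities and integer labels (names and layout are ours, for illustration only):

```python
import numpy as np

def cc_weighted_cross_entropy(probs, labels, cc_map, eps=1e-12):
    """Correlation-weighted cross-entropy in the spirit of Eq. (3):
    the per-voxel cross-entropy is scaled by the correlation
    coefficient map, so poorly aligned voxels contribute less.
    probs:  (C, N) predicted class probabilities per voxel
    labels: (N,)   integer ground-truth class per voxel
    cc_map: (N,)   correlation coefficients, clipped to be >= 0"""
    w = np.clip(cc_map, 0.0, None)
    ce = -np.log(probs[labels, np.arange(labels.size)] + eps)
    return float(np.sum(w * ce))
```

With a constant weight of 1 this reduces to the ordinary cross-entropy; with a weight of 0 a (mis-aligned) voxel is ignored entirely.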
Let the weights of the side output layers be denoted by $W=(w^{(1)},\ldots,w^{(S)})$, where $w^{(s)}$ represents the weights of the $s$th side output layer. The loss function for the $s$th side output is $\ell_{\text{side}}^{(s)}(\theta, w^{(s)})$. Thus, the weighted loss function for all side outputs is defined as

$$\mathcal{L}_{\text{side}}(\theta, W)=\sum_{s=1}^{S}\alpha_{s}\,\ell_{\text{side}}^{(s)}(\theta, w^{(s)}) \tag{4}$$

where $\alpha_{s}$ is the loss weight of the $s$th side output. The loss function for the fusion layer is

$$\mathcal{L}_{\text{fuse}}(\theta, W, h)=-\sum_{v}\mathrm{CC}(v)\sum_{c=1}^{C}y_{v,c}\,\log P^{\text{fuse}}_{v,c}(\theta, W, h) \tag{5}$$

where $h$ denotes the weights of the fusion layer and $P^{\text{fuse}}_{v,c}$ is the probability produced by the fusion layer. Thus, the total loss for each Nes-Net is the weighted sum of the above two loss terms:

$$\mathcal{L}(\theta, W, h)=\mathcal{L}_{\text{fuse}}(\theta, W, h)+\lambda\,\mathcal{L}_{\text{side}}(\theta, W) \tag{6}$$

where $\lambda$ is a weighting factor. The loss function is the same for all Nes-Nets.
4.5. Implementation details
We trained all networks on a platform with Keras 2.0.6 and an NVIDIA TITAN X GPU (12 GB). We adopted the Adam optimizer and set the batch size to 32, the maximum epoch number to 10,000, and the initial learning rate to 0.0001, which was divided by 10 after 5000 iterations. For the other hyper-parameters, we set the weighting parameter $\lambda$ in Eq. (6) to 1 and the contribution factor $\gamma$ in Eq. (2) to 0.2; the loss weights $\alpha_s$ for the side outputs in Eq. (4) were determined empirically. The segmentation was performed on a patch-by-patch basis with a patch size of 32 × 32 × 32 voxels and a stride of 9 × 9 × 9 in both training and testing for all Nes-Nets. The final likelihood map for each brain MR volume was generated by averaging the likelihood maps at overlapping positions. We split off 20% of the training data as validation data, and these parameters were determined empirically using cross-validation on the validation data. In the proposed CaNes-Net, the first Nes-Net is trained from scratch, and each subsequent Nes-Net is initialized with the weights of the previously trained Nes-Net.
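The patch-by-patch inference with overlap averaging described above can be sketched as follows; `predict` is a hypothetical callable standing in for a trained Nes-Net that maps a patch to per-class likelihoods (the sketch does not pad, so it assumes the stride tiles the volume):

```python
import numpy as np

def patchwise_predict(volume, predict, patch=32, stride=9, n_classes=4):
    """Slide a patch window with the given stride, accumulate class
    likelihoods and visit counts, then divide to average the
    likelihood maps at overlapping positions."""
    D, H, W = volume.shape
    acc = np.zeros((n_classes, D, H, W))
    cnt = np.zeros((D, H, W))
    for z in range(0, max(D - patch, 0) + 1, stride):
        for y in range(0, max(H - patch, 0) + 1, stride):
            for x in range(0, max(W - patch, 0) + 1, stride):
                p = volume[z:z+patch, y:y+patch, x:x+patch]
                acc[:, z:z+patch, y:y+patch, x:x+patch] += predict(p)
                cnt[z:z+patch, y:y+patch, x:x+patch] += 1
    return acc / np.maximum(cnt, 1)   # averaged likelihood map
```

Averaging the overlapping likelihoods, rather than stitching hard labels, smooths seams between adjacent patches.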
5. Experiments and results
5.1. Comparison to existing segmentation methods
The proposed CaNes-Net was compared to two commonly used software packages, i.e., SPM [45] and FSL [46], and to four deep learning segmentation methods, i.e., 3D U-Net [47], 3D attention U-Net [48], 3D uResNet [49], and CC-3D-FCN [32], on the dataset of 18 subjects using leave-one-out cross-validation. To ensure a fair comparison, we applied the correlation coefficient map to all learning-based methods. During testing, the correlation coefficient map was used as a weight to evaluate the segmentation results. Fig. 4 shows a typical example of an aligned 3T brain MR image, the ground truth derived from the 7T image, and the segmentation results obtained by the eight methods. We used the Dice similarity coefficient (DSC) and the Hausdorff distance (HD) to evaluate the segmentation performance quantitatively. The mean and standard deviation of DSC and HD on the 18 subjects are presented in Table 1. It shows that our CaNes-Net achieves the highest average DSC and the lowest average HD in the delineation of each of the three major brain tissue types. Compared to U-Net, the second-best solution, our CaNes-Net improves the average DSC by 1.40%, 1.33%, and 2.02% and decreases the average HD by 0.04 mm, 0.26 mm, and 0.15 mm in the segmentation of GM, WM, and CSF, respectively. We also performed a paired t-test to compare the DSC of CaNes-Net with that of U-Net; for GM, WM, and CSF, the p-values are 0.0019, 0.0013, and 0.00015, respectively. Therefore, we conclude that the proposed CaNes-Net is statistically better than the best competing model. The visual and quantitative results indicate that our CaNes-Net substantially outperforms SPM, FSL, and the four deep learning approaches.
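For reference, the two metrics can be computed on binary tissue masks as follows; this is a brute-force sketch (voxel units, all-pairs distances), whereas evaluation pipelines typically use optimized implementations:

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def hausdorff(a, b):
    """Symmetric Hausdorff distance (in voxels) between the point
    sets of two binary masks, by exhaustive pairwise distances."""
    pa = np.argwhere(a)
    pb = np.argwhere(b)
    d = np.sqrt(((pa[:, None, :] - pb[None, :, :]) ** 2).sum(-1))
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

DSC rewards volumetric overlap, while HD penalizes the single worst boundary error, which is why the two are reported together.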
Fig. 4.

Comparison of segmentation results produced by eight methods on one subject. Note that CaNes-Net is a cascade of four Nes-Nets.
Table 1.
Mean ± standard deviation of DSC (%) and HD (mm) achieved by eight segmentation methods on the aligned T1-weighted MR scan acquired on 18 subjects.
| Method | GM DSC | GM HD | WM DSC | WM HD | CSF DSC | CSF HD |
|---|---|---|---|---|---|---|
| FSL | 78.33 ± 1.89 | 2.60 ± 0.89 | 85.77 ± 2.22 | 2.20 ± 0.38 | 63.30 ± 6.20 | 3.40 ± 0.83 |
| SPM | 76.12 ± 2.80 | 2.75 ± 0.76 | 80.58 ± 2.28 | 2.26 ± 0.46 | 66.11 ± 9.78 | 3.25 ± 0.87 |
| U-Net | 88.21 ± 1.95 | 1.18 ± 0.13 | 93.78 ± 1.52 | 1.38 ± 0.15 | 78.82 ± 3.22 | 1.62 ± 0.24 |
| uResNet | 86.23 ± 2.77 | 1.14 ± 0.14 | 92.40 ± 3.05 | 1.60 ± 0.25 | 77.48 ± 3.96 | 1.84 ± 0.57 |
| Attention U-Net | 86.36 ± 2.86 | 1.59 ± 0.17 | 93.18 ± 1.34 | 1.26 ± 0.42 | 77.74 ± 4.04 | 1.78 ± 0.44 |
| CC-3D-FCN | 86.78 ± 1.48 | 1.31 ± 0.37 | 92.94 ± 1.24 | 1.42 ± 0.10 | 77.77 ± 2.23 | 2.18 ± 0.39 |
| Nes-Net | 89.20 ± 0.97 | 1.18 ± 0.10 | 94.72 ± 0.23 | 1.16 ± 0.07 | 80.39 ± 2.61 | 1.60 ± 0.26 |
| CaNes-Net | 89.61 ± 0.88 | 1.14 ± 0.05 | 95.11 ± 0.59 | 1.12 ± 0.06 | 80.84 ± 2.42 | 1.47 ± 0.29 |
5.2. Effectiveness of correlation coefficient map
Fig. 5 shows an example of the difference map between the ground truth derived from a 7T MR image and the segmentation result of CaNes-Net with or without using the correlation coefficient map. It reveals that using the correlation coefficient map dramatically reduces the number of mis-segmented voxels.
Fig. 5.

Difference map and enlarged region between ground truth and segmentation result produced by CaNes-Net with / without correlation coefficient (CC) map. Segmentation errors are highlighted by white pixels.
5.3. Effectiveness of iterative refinement
The last two rows of Table 1 show the mean and standard deviation of DSC and HD achieved by Nes-Net (i.e., without iterative refinement) and by the proposed CaNes-Net. They reveal that using the cascaded design (with multiple Nes-Nets) to iteratively refine the segmentation results is effective: it improves the average DSC by 0.41%, 0.39%, and 0.45%, and decreases the average HD by 0.04 mm, 0.04 mm, and 0.13 mm in the segmentation of GM, WM, and CSF, respectively.
5.4. Ablation study
To demonstrate the contribution of each component of CaNes-Net, we performed an ablation study on the 18 subjects. We constructed four simplified variants, which use (a) Nes-Net without the correlation coefficient map, (b) Nes-Net, (c) two Nes-Nets without geodesic distance maps, and (d) two Nes-Nets with geodesic distance maps. The mean and standard deviation of DSC (%) on the 18 subjects are given in Table 2. It shows that (1) using the correlation coefficient map improves the performance of Nes-Net; (2) using two Nes-Nets, with or without geodesic distance maps, outperforms using one; (3) incorporating the geodesic distance maps into the input of the Nes-Net leads to a further performance gain; and (4) CaNes-Net performs best among all variants. We also performed paired t-tests to compare the DSC obtained by each pair of methods. The p-values are listed in Table 3. The majority of the p-values are smaller than 0.05, which reveals that the performance gain contributed by each component is statistically significant. Therefore, each component is an indispensable part of CaNes-Net.
Table 2.
Mean ± standard deviation of DSC (%) achieved by the proposed model and four simplified versions on 18 subjects.
| Algorithms | WM | GM | CSF |
|---|---|---|---|
| (a) Nes-Net without CC map | 93.55 ± 1.06 | 88.37 ± 1.39 | 78.68 ± 3.06 |
| (b) Nes-Net | 94.72 ± 0.23 | 89.20 ± 0.97 | 80.39 ± 2.61 |
| (c) Two Nes-Nets without geodesic distance maps | 94.82 ± 0.65 | 89.38 ± 0.91 | 80.46 ± 2.77 |
| (d) Two Nes-Nets with geodesic distance maps | 94.91 ± 0.42 | 89.50 ± 1.04 | 80.57 ± 2.46 |
| (e) CaNes-Net (four Nes-Nets) | 95.11 ± 0.59 | 89.61 ± 0.88 | 80.84 ± 2.42 |
Table 3.
Paired t-test for DSC obtained by different methods on GM, WM, and CSF.
| Methods | GM | WM | CSF |
|---|---|---|---|
| (a) & (b) | 1.65e−3 | 2.75e−3 | 1.91e−4 |
| (b) & (c) | 3.79e−2 | 3.15e−2 | 6.55e−2 |
| (c) & (d) | 1.65e−2 | 1.97e−2 | 1.95e−2 |
| (d) & (e) | 9.15e−3 | 1.35e−2 | 7.65e−3 |
5.5. Number of Nes-Nets
The number of Nes-Nets was determined via leave-one-out cross-validation on all training subjects. Fig. 6 shows the impact of the number of Nes-Nets on the segmentation accuracy. It reveals that increasing this number beyond four will not substantially improve the segmentation performance. We therefore used four Nes-Nets in our CaNes-Net.
Fig. 6.

Plots of DSC versus the number of Nes-Nets when applying our CaNes-Net to the segmentation of GM, WM, and CSF.
5.6. Training and inference time
Since our method is a cascaded structure that consists of several sub-networks, we report the training time and the average inference time of each step in Table 4. The first Nes-Net needs more training time since it is trained from scratch, whereas the other three Nes-Nets need much less training time owing to good initialization from the previously trained Nes-Net. During inference, computing full pairwise geodesic distances is very expensive, leading to a relatively high time cost. In future work, we will apply multi-task learning to directly estimate the geodesic distance maps and thus avoid computing geodesic distances explicitly. If successful, the average inference time could be reduced to less than 100 s.
Table 4.
Training and average inference time cost of each step in CaNes-Net.
| Step | First Nes-Net | Other 3 Nes-Nets | Geodesic distance | Total |
|---|---|---|---|---|
| Training | 8.59 h | 1.52 h | 6.46 h | 16.57 h |
| Inference | 17.8 s | 58.4 s | 23 min | 24 min |
5.7. Evaluation on tree-like structures
The geodesic distance could be sensitive to noise and small topology changes. To evaluate the impact of using tissue-specific geodesic distance maps as contextual information to refine the segmentation, we visualized six segmentation results on tree-like structures in Fig. 7. It shows that the segmented structures are topologically correct, which indicates that our method is robust.
Fig. 7.

Segmentation results of our method on six tree-like structures. The first row shows the brain MR images, and the second row shows the segmentation results.
5.8. Evaluation on ADNI
We further evaluated the proposed CaNes-Net model on the ADNI dataset. Since U-Net achieves better performance than the other deep learning methods (see Table 1), we compared the proposed algorithm with U-Net and with the commonly used FSL package on this dataset. Qualitative comparison was performed since no ground-truth voxel labels are available for ADNI. Fig. 8 shows a transverse, a sagittal, and a coronal slice from an aligned 3T MR scan and the segmentation results produced by U-Net, FSL, and our CaNes-Net. To highlight the differences among these results, we enlarged the region in the orange box and displayed it beneath each subfigure. The performance of both CaNes-Net and U-Net degrades because they were trained on 3T data of young subjects but applied to 1.5T scans of elderly subjects with Alzheimer's disease. From the enlarged views, it is apparent that U-Net and FSL over-segment GM and CSF. Fig. 9 shows the renderings of GM/CSF and GM/WM surfaces generated from the segmentation results produced by U-Net, FSL, and our CaNes-Net. It reveals that the proposed method leads to more meaningful and accurate cortical surfaces with correct topology, in contrast to the large topological errors (holes and cracks) in the results of U-Net and FSL. These results indicate that the proposed CaNes-Net model produces more accurate segmentations than U-Net and FSL.
Fig. 8.

Segmentation results produced by three methods on an ADNI study. From top to bottom are transverse, sagittal and coronal slices, respectively. The orange box is enlarged and displayed beneath each subfigure.
Fig. 9.

Renderings of GM/WM and GM/CSF surfaces generated from the segmentation results produced by U-Net, FSL, and our CaNes-Net. The orange box in each image is enlarged and displayed to the right of the image.
In addition, we demonstrated the advantage of the proposed CaNes-Net in terms of sensitivity in detecting group differences between AD and NC, AD and MCI, as well as between MCI and NC, in subjects aged 70 to 75. Since FSL achieves performance comparable with U-Net in Figs. 8 and 9, in the following we mainly compare the proposed method with FSL. We constructed cortical surfaces [50] using the segmentation results produced by FSL and CaNes-Net. Through cortical surface parcellation [51], we divided the cerebral cortex of each hemisphere into 35 regions of interest (ROIs). We then compared the mean cortical thickness and the total surface area of each ROI using the two-sample t-test between AD vs. NC, AD vs. MCI, and MCI vs. NC, as shown in Figs. 10 and 11. The p-value maps were binarized using the threshold of p = 0.05 after correction for multiple comparisons. In Fig. 10, FSL shows no significant difference between AD vs. MCI or MCI vs. NC and few differences between AD and NC; in contrast, CaNes-Net shows the most significant differences between AD and NC, and more significant differences between AD vs. MCI than between MCI vs. NC. Fig. 11 shows similar results. Comparison of Figs. 10 and 11 shows that the t-test using the total surface area of each ROI yields more significant differences than that using the mean cortical thickness.
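The per-ROI group comparison above can be sketched as follows. The paper does not name the multiple-comparison correction, so a Bonferroni correction is assumed here purely for illustration; `roi_group_differences` and the array layout are ours:

```python
import numpy as np
from scipy import stats

def roi_group_differences(group1, group2, alpha=0.05):
    """Two-sample t-test per cortical ROI with a Bonferroni-corrected
    threshold, mirroring the analysis in the text: rows are subjects,
    columns the 35 ROIs of one hemisphere. Returns a binarized
    significance map (True = significant ROI)."""
    n_rois = group1.shape[1]
    _, p = stats.ttest_ind(group1, group2, axis=0)
    return p < alpha / n_rois   # binarized, corrected p-value map
```

Binarizing the corrected p-values at 0.05 produces exactly the kind of significance maps rendered in Figs. 10 and 11.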
Fig. 10.

Group differences for cortical thickness given by FSL and CaNes-Net on the ADNI dataset. P-values were binarized to show significant regions (p < 0.05 after correction for multiple comparisons).
Fig. 11.

Group differences for surface area given by FSL and CaNes-Net on the ADNI dataset. P-values were binarized to show significant regions (p < 0.05 after correction for multiple comparisons).
To evaluate the sensitivity of the methods, we designed another experiment on the hippocampus, a typical AD-related brain region. We registered the segmentation results produced by FSL and by our CaNes-Net to the Montreal Neurological Institute (MNI) segmented template using the HAMMER registration [52]. Then, we computed regional analysis of volumes examined in normalized space (Ravens) maps from the resultant deformation field, preserving the WM volume changes. We used a Gaussian kernel with an isotropic 8 mm full width at half maximum (FWHM) to smooth the Ravens maps before statistical analysis. We calculated the mean Ravens value in the hippocampal region for each subject and applied the two-sample t-test to compare the groups. For FSL, there is no statistically significant difference between AD and NC. However, when using the Ravens values derived from the proposed CaNes-Net, significant differences between the groups are achieved. Here the negative t value indicates that AD patients have relatively lower Ravens values than NC, reflecting smaller hippocampal volumes. These results indicate that the proposed method is more sensitive in identifying hippocampal atrophy.
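The smoothing step above specifies the Gaussian kernel by its FWHM rather than its standard deviation; the two are related by σ = FWHM / (2√(2 ln 2)) ≈ FWHM / 2.355. A minimal sketch of that conversion (the function name and the isotropic-voxel assumption are ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_ravens(ravens, fwhm_mm=8.0, voxel_mm=1.0):
    """Smooth a Ravens map with a Gaussian kernel specified by its
    full width at half maximum, converting FWHM (mm) to a sigma in
    voxel units before filtering."""
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm
    return gaussian_filter(ravens, sigma=sigma)
```

For the 8 mm FWHM used here, the corresponding sigma is roughly 3.4 mm.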
In addition, we performed an experiment on longitudinal brain MR images of subjects aged 70 to 75. We constructed cortical surfaces using the segmentation results from FSL and CaNes-Net at different time points. At each vertex, the cortical thickness was computed from the constructed inner and outer surfaces. We then applied cortical surface parcellation to divide the cerebral cortex of each hemisphere into 35 ROIs. Figs. 12–14 show the change of mean cortical thickness for each ROI on 20 subjects from the AD, MCI, and NC groups at different time points, respectively. The p-value maps were binarized using the same threshold of p = 0.05 after correction for multiple comparisons. CaNes-Net shows more significant differences at different time points for AD, MCI, and NC than FSL.
Fig. 12.

Longitudinal differences in cortical thickness given by FSL and CaNes-Net on AD studies from ADNI at different time points. P-values were binarized to show significant regions after correction for multiple comparisons. BL is the baseline.
Fig. 14.

Longitudinal differences in cortical thickness given by FSL and CaNes-Net on NC studies from ADNI at different time points. P-values were binarized to show significant regions after correction for multiple comparisons. BL is the baseline.
6. Discussion and conclusion
This paper presents the CaNes-Net model for 3T brain MR image segmentation, trained by leveraging reliable tissue labels derived from 7T MR images. Evaluation on a dataset of 18 subjects and on the ADNI dataset demonstrates that CaNes-Net substantially outperforms the SPM and FSL packages as well as four state-of-the-art deep learning models. Our results also indicate that (1) using the constructed correlation coefficient map to weight the loss function and (2) using the cascaded design to iteratively refine the segmentation both effectively improve segmentation accuracy. For future work, employing multi-task learning to predict distance maps and sharing weights between the two networks may reduce the cost of computing distance maps and allow the two closely related networks to benefit from each other; we will use distance maps calculated from the ground truth as supervision to train such a network. Moreover, in addition to using distance maps as input, we will explore using them as attention information to guide the segmentation task, which may further improve the performance of our model. In this work, 7T images are used only to provide tissue annotations; in the future, we will jointly learn reconstruction, registration, and segmentation networks to make full use of 7T images, which may enable the model to benefit from additional 3T data without paired 7T images or manual annotations. Finally, our patch-based strategy is a bottom-up solution that may overlook fine details; we will incorporate a top-down reasoning strategy to gather explicit texture information and region discontinuities, which may further highlight these regions and make the method more robust.
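Contribution (1), the correlation-coefficient-weighted loss, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function name is hypothetical, a voxel-wise cross-entropy is assumed as the base loss, and `cc_map` is assumed to hold values in [0, 1] that are high where the 3T and 7T images are well aligned.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, cc_map, eps=1e-8):
    """Voxel-wise cross-entropy weighted by a correlation coefficient map,
    so that well-aligned 3T/7T voxels contribute more to the training signal.

    probs:  (C, ...) softmax probabilities over C tissue classes
    labels: (...)    integer tissue labels in [0, C)
    cc_map: (...)    weights in [0, 1], high where 3T and 7T align well"""
    # Probability assigned to the ground-truth class at each voxel.
    picked = np.take_along_axis(probs, labels[None, ...], axis=0)[0]
    loss = -cc_map * np.log(picked + eps)
    # Normalize by the total weight so misaligned voxels are down-weighted
    # without shrinking the overall loss magnitude.
    return loss.sum() / (cc_map.sum() + eps)

# Toy example: 2 classes, 4 voxels.
probs = np.array([[0.9, 0.1, 0.5, 0.5],
                  [0.1, 0.9, 0.5, 0.5]])
labels = np.array([0, 1, 0, 1])
loss_all = weighted_cross_entropy(probs, labels, np.ones(4))
loss_aligned = weighted_cross_entropy(probs, labels,
                                      np.array([1.0, 1.0, 0.0, 0.0]))
```

Setting the weights of the two uncertain voxels to zero removes their contribution entirely, so `loss_aligned` is driven only by the confidently aligned voxels; in practice the continuous correlation coefficients interpolate between these extremes.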
Fig. 13.

Longitudinal differences in cortical thickness given by FSL and CaNes-Net on MCI studies from ADNI at different time points. P-values were binarized to show significant regions after correction for multiple comparisons. BL is the baseline.
Acknowledgments
J. Wei and Y. Xia were partially supported by the National Natural Science Foundation of China under Grants 62171377, 61771397, and the CAAI-Huawei MindSpore Open Fund under Grants CAAIXSJLJJ-2020-005B. P.-T. Yap was supported by NIH grant EB006733.
Biographies
Dr. Zhengwang Wu is a postdoctoral researcher at the University of North Carolina at Chapel Hill. He received his Ph.D. degree in pattern recognition and intelligent systems from Xi'an Jiaotong University, China. His research focuses on computer vision, machine learning, and pattern recognition methods, with applications to medical image segmentation, classification, and visualization.
Prof. Li Wang is a tenure-track Assistant Professor at the University of North Carolina at Chapel Hill. He received his Ph.D. from Nanjing University of Science and Technology in June 2010. His research interests include image segmentation, image registration, cortical surface analysis, and machine learning, with applications to normal early brain development and disorders.
Dr. Toan Duc Bui is currently a senior research scientist at VinAI Research, Vietnam. He was a Research Scholar at the University of North Carolina (UNC) at Chapel Hill, working on medical imaging. He is a co-organizer of iSeg-2019, and has served as a reviewer for several top-tier conferences and journals, such as MICCAI and EMBS.
Dr. Liangqiong Qu received her Ph.D. degree in pattern recognition and intelligent systems from the Chinese Academy of Sciences, China, in 2018. She is currently a postdoctoral researcher in the School of Medicine at Stanford University. Her research interests include computer vision and medical imaging.
Prof. Pew-Thian Yap is an Associate Professor with the Department of Radiology at the University of North Carolina at Chapel Hill. He leads a wide range of research projects spanning image acquisition, reconstruction, quality control, processing, and analysis, with applications to neuroscience, radiomics, and surgical planning.
Prof. Yong Xia received the B.E., M.E., and Ph.D. degrees in computer science and technology from Northwestern Polytechnical University (NPU) in 2001, 2004, and 2007, respectively. He is currently a Professor at the School of Computer Science and Engineering, NPU. His research interests include medical image analysis, computer-aided diagnosis, pattern recognition, machine learning, and data mining.
Prof. Gang Li is an Associate Professor at the University of North Carolina at Chapel Hill. He received his Ph.D. from Northwestern Polytechnical University in 2010. His research interests include medical image segmentation, registration, cortical surface analysis, pattern recognition, and machine learning, with applications to early brain development, aging, and disorders.
Dinggang Shen is a Professor and a Founding Dean of the School of Biomedical Engineering, ShanghaiTech University, Shanghai, China, and also a Co-CEO of United Imaging Intelligence (UII), Shanghai. He is a Fellow of IEEE, The American Institute for Medical and Biological Engineering (AIMBE), The International Association for Pattern Recognition (IAPR), and The Medical Image Computing and Computer Assisted Intervention (MICCAI) Society. His research interests include medical image analysis, computer vision, and pattern recognition. He has published more than 1380 peer-reviewed papers in international journals and conference proceedings, with an H-index of 117 and 55,000+ citations. He serves as an Editor-in-Chief for Frontiers in Radiology, as well as an editorial board member for eight international journals. He served on the Board of Directors of the MICCAI Society in 2012–2015, and was General Chair for MICCAI 2019.
Footnotes
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
This work was completed when Jie Wei was a visiting student at the University of North Carolina at Chapel Hill.
References
- [1] Wright R, Kyriakopoulou V, Ledig C, Rutherford MA, Hajnal JV, Rueckert D, Aljabar P, Automatic quantification of normal cortical folding patterns from fetal brain MRI, NeuroImage 91 (2) (2014) 21–32.
- [2] Wang L, Yap PT, Wang F, Wu Z, Meng Y, Dong P, Kim J, Shi F, Rekik I, et al., Computational neuroanatomy of baby brains: a review, NeuroImage 185 (2019) 906–925.
- [3] Wang L, Nie D, Li G, Puybareau E, Dolz J, Zhang Q, Wang F, Xia J, Wu Z, Chen J, et al., Benchmark on automatic six-month-old infant brain segmentation algorithms: the iSeg-2017 challenge, IEEE Trans. Med. Imaging 38 (9) (2019) 2219–2230.
- [4] Coupé P, Mansencal B, Clément M, Giraud R, Senneville B, Ta V, Lepetit V, Manjon J, AssemblyNet: a large ensemble of CNNs for 3D whole brain MRI segmentation, NeuroImage 219 (2020) 117026.
- [5] Wei J, Xia Y, Zhang Y, M3Net: a multi-model, multi-size, and multi-view deep neural network for brain magnetic resonance image segmentation, Pattern Recognit. 91 (2019) 366–378.
- [6] Braun J, Guo J, Lützkendorf R, Stadler J, Papazoglou S, Hirsch S, Sack I, Bernarding J, High-resolution mechanical imaging of the human brain by three-dimensional multifrequency magnetic resonance elastography at 7T, NeuroImage 90 (8) (2013) 308–314.
- [7] Fan Y, Rao H, Hurt H, Giannetta J, Korczykowski M, Shera D, Avants BB, Gee JC, Wang J, Shen D, Multivariate examination of brain abnormality using both structural and functional MRI, NeuroImage 36 (4) (2007) 1189–1199.
- [8] Deng M, Yu R, Wang L, Shi F, Yap PT, Shen D, Alzheimer's Disease Neuroimaging Initiative, Learning-based 3T brain MRI segmentation with guidance from 7T MRI labeling, Med. Phys. 43 (12) (2016) 6588–6597.
- [9] Jia H, Wu G, Wang Q, Shen D, ABSORB: atlas building by self-organized registration and bundling, NeuroImage 51 (3) (2010) 1057–1070.
- [10] Luan H, Qi F, Xue Z, Chen L, Shen D, Multimodality image registration by maximization of quantitative-qualitative measure of mutual information, Pattern Recognit. 41 (1) (2008) 285–298.
- [11] Cao X, Yang J, Zhang J, Nie D, Kim M, Wang Q, Shen D, Deformable image registration based on similarity-steered CNN regression, in: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, 2017, pp. 300–308.
- [12] Tu Z, Bai X, Auto-context and its application to high-level vision tasks and 3D brain image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 32 (10) (2010) 1744–1757.
- [13] Wei J, Bui DT, Wu Z, Wang L, Xia Y, Li G, Shen D, 7T guided 3T brain tissue segmentation using cascaded nested network, in: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), IEEE, 2020, pp. 140–143.
- [14] Sun L, Shao W, Wang M, Zhang D, Liu M, High-order feature learning for multi-atlas based label fusion: application to brain segmentation with MRI, IEEE Trans. Image Process. 29 (2020) 2702–2713.
- [15] Mahata N, Kahali S, Adhikari SK, Sing JK, Local contextual information and Gaussian function induced fuzzy clustering algorithm for brain MR image segmentation and intensity inhomogeneity estimation, Appl. Soft Comput. 68 (2018) 586–596.
- [16] Singh P, Huang Y, Lee T, A novel ambiguous set theory to represent uncertainty and its application to brain MR image segmentation, in: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC), 2019, pp. 2460–2465.
- [17] Aljabar P, Heckemann RA, Hammers A, Hajnal JV, Rueckert D, Multi-atlas based segmentation of brain images: atlas selection and its effect on accuracy, NeuroImage 46 (3) (2009) 726–738.
- [18] Wu G, Wang Q, Zhang D, Nie F, Huang H, Shen D, A generative probability model of joint label fusion for multi-atlas based brain segmentation, Med. Image Anal. 18 (6) (2014) 881–890.
- [19] Held K, Kops ER, Krause BJ, Wells WM III, Kikinis R, Muller-Gartner H-W, Markov random field segmentation of brain MR images, IEEE Trans. Med. Imaging 16 (6) (1997) 878–886.
- [20] Zhang Y, Brady M, Smith S, Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm, IEEE Trans. Med. Imaging 20 (1) (2001) 45–57.
- [21] Van Leemput K, Maes F, Vandermeulen D, Suetens P, Automated model-based tissue classification of MR images of the brain, IEEE Trans. Med. Imaging 18 (10) (2002) 897–908.
- [22] Van Leemput K, Maes F, Vandermeulen D, Suetens P, A unifying framework for partial volume segmentation of brain MR images, IEEE Trans. Med. Imaging 22 (1) (2003) 105–119.
- [23] Huang R, Ding Z, Gatenby C, Metaxas D, Gore J, A variational level set approach to segmentation and bias correction of images with intensity inhomogeneity, in: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, 2008, pp. 1083–1091.
- [24] Wang L, Shi F, Lin W, Gilmore JH, Shen D, Automatic segmentation of neonatal images using convex optimization and coupled level sets, NeuroImage 58 (3) (2011) 805–817.
- [25] Xie Y, Zhang J, Xia Y, Shen C, A mutual bootstrapping model for automated skin lesion segmentation and classification, IEEE Trans. Med. Imaging 39 (7) (2020) 2482–2493.
- [26] Li Z, Xia Y, Deep reinforcement learning for weakly-supervised lymph node segmentation in CT images, IEEE J. Biomed. Health Inform. 25 (3) (2021) 774–783.
- [27] Jia H, Xia Y, Song Y, Zhang D, Huang H, Zhang Y, Cai W, 3D APA-Net: 3D adversarial pyramid anisotropic convolutional network for prostate segmentation in MR images, IEEE Trans. Med. Imaging 39 (2) (2020) 447–457.
- [28] Zhang Y, Wu J, Liu Y, Chen Y, Wu EX, Tang X, MI-UNet: multi-inputs UNet incorporating brain parcellation for stroke lesion segmentation from T1-weighted magnetic resonance images, IEEE J. Biomed. Health Inform. 25 (2) (2021) 526–535.
- [29] Dong P, Guo Y, Gao Y, Liang P, Shi Y, Wu G, Multi-atlas segmentation of anatomical brain structures using hierarchical hypergraph learning, IEEE Trans. Neural Netw. Learn. Syst. 31 (8) (2020) 3061–3072.
- [30] Dolz J, Desrosiers C, Ayed IB, 3D fully convolutional networks for subcortical segmentation in MRI: a large-scale study, NeuroImage 170 (2017) 456–470.
- [31] Chen Y, Chen J, Wei D, Li Y, Zheng Y, in: International Workshop on Multi-scale Multimodal Medical Imaging, Springer, 2019, pp. 17–25.
- [32] Nie D, Wang L, Adeli E, Lao C, Lin W, Shen D, 3-D fully convolutional networks for multimodal isointense infant brain image segmentation, IEEE Trans. Syst., Man, Cybern. 49 (3) (2019) 1123–1136.
- [33] Jog A, Hoopes A, Greve DN, Van Leemput K, Fischl B, PSACNN: pulse sequence adaptive fast whole brain segmentation, NeuroImage 199 (2019) 553–569.
- [34] Roy AG, Conjeti S, Navab N, Wachinger C, QuickNAT: a fully convolutional network for quick and accurate segmentation of neuroanatomy, NeuroImage 186 (2019) 713–727.
- [35] Henschel L, Conjeti S, Estrada S, Diers K, Fischl B, Reuter M, FastSurfer—A fast and accurate deep learning based neuroimaging pipeline, NeuroImage 219 (2020) 117012.
- [36] Sled JG, Zijdenbos AP, Evans AC, A nonparametric method for automatic correction of intensity nonuniformity in MRI data, IEEE Trans. Med. Imaging 17 (1) (1998) 87–97.
- [37] Shi F, Wang L, Dai Y, Gilmore JH, Lin W, Shen D, LABEL: pediatric brain extraction using learning-based meta-algorithm, NeuroImage 62 (3) (2012) 1975–1986.
- [38] Yoo TS, Ackerman MJ, Lorensen WE, Schroeder W, Chalana V, Aylward S, Metaxas D, Whitaker R, Engineering and algorithm design for an image processing API: a technical report on ITK - the insight toolkit, Stud. Health Technol. Inform. 85 (2002) 586–592.
- [39] Jenkinson M, Smith SM, A global optimisation method for robust affine registration of brain images, Med. Image Anal. 5 (2) (2001) 143–156.
- [40] Jenkinson M, Bannister P, Brady M, Smith S, Improved optimization for the robust and accurate linear registration and motion correction of brain images, NeuroImage 17 (2) (2002) 825–841.
- [41] Yushkevich PA, Piven J, Hazlett HC, Smith RG, Ho S, Gee JC, Gerig G, User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability, NeuroImage 31 (3) (2006) 1116–1128.
- [42] Jack CR, Bernstein MA, Fox NC, Thompson P, Weiner MW, et al., The Alzheimer's disease neuroimaging initiative (ADNI): MRI methods, J. Magn. Reson. Imaging 27 (4) (2010) 685–691.
- [43] Xie S, Tu Z, Holistically-nested edge detection, Int. J. Comput. Vis. 125 (5) (2017) 3–18.
- [44] Criminisi A, Sharp T, Blake A, GeoS: geodesic image segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 5302, 2008, pp. 99–112.
- [45] Dale AM, Liu AK, Fischl B, Buckner RL, Belliveau JW, Lewine JD, Halgren E, Dynamic statistical parametric mapping: combining fMRI and MEG for high-resolution imaging of cortical activity, Neuron 26 (1) (2000) 55–67.
- [46] Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TEJ, Johansen-Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney D, et al., Advances in functional and structural MR image analysis and implementation as FSL, NeuroImage 23 (1) (2004) 208–219.
- [47] Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O, 3D U-Net: learning dense volumetric segmentation from sparse annotation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2016, pp. 424–432.
- [48] Oktay O, Schlemper J, Folgoc LL, Lee MCH, Heinrich MP, Misawa K, Mori K, Mcdonagh S, Hammerla N, Kainz B, et al., Attention U-Net: learning where to look for the pancreas, arXiv preprint arXiv:1804.03999 (2018).
- [49] Guerrero R, Qin C, Oktay O, Bowles CT, Chen L, Joules R, Wolz R, Valdés-Hernández MDC, Dickie DA, Wardlaw JM, et al., White matter hyperintensity and stroke lesion segmentation and differentiation using convolutional neural networks, NeuroImage 17 (2018) 918–934.
- [50] Li G, Nie J, Wu G, Wang Y, Shen D, Consistent reconstruction of cortical surfaces from longitudinal brain MR images, NeuroImage 59 (4) (2012) 3805–3820.
- [51] Fischl B, FreeSurfer, NeuroImage 62 (2) (2012) 774–781.
- [52] Shen D, Davatzikos C, HAMMER: hierarchical attribute matching mechanism for elastic registration, IEEE Trans. Med. Imaging 21 (11) (2002) 1421.
