Author manuscript; available in PMC: 2020 Jul 10.
Published in final edited form as: IEEE Access. 2020 May 29;8:101550–101568. doi: 10.1109/access.2020.2998537

Deep Cerebellar Nuclei Segmentation via Semi-Supervised Deep Context-Aware Learning from 7T Diffusion MRI

Jinyoung Kim 1, Remi Patriat 1, Jordan Kaplan 1, Oren Solomon 1, Noam Harel 1,2
PMCID: PMC7351101  NIHMSID: NIHMS1602087  PMID: 32656051

Abstract

Deep cerebellar nuclei are a key structure of the cerebellum that are involved in processing motor and sensory information. It is thus a crucial step to accurately segment deep cerebellar nuclei for the understanding of the cerebellum system and its utility in deep brain stimulation treatment. However, it is challenging to clearly visualize such small nuclei under standard clinical magnetic resonance imaging (MRI) protocols and therefore precise segmentation is not feasible. Recent advances in 7 Tesla (T) MRI technology and great potential of deep neural networks facilitate automatic patient-specific segmentation. In this paper, we propose a novel deep learning framework (referred to as DCN-Net) for fast, accurate, and robust patient-specific segmentation of deep cerebellar dentate and interposed nuclei on 7T diffusion MRI. DCN-Net effectively encodes contextual information on the patch images without consecutive pooling operations and adding complexity via proposed dilated dense blocks. During the end-to-end training, label probabilities of dentate and interposed nuclei are independently learned with a hybrid loss, handling highly imbalanced data. Finally, we utilize self-training strategies to cope with the problem of limited labeled data. To this end, auxiliary dentate and interposed nuclei labels are created on unlabeled data by using DCN-Net trained on manual labels. We validate the proposed framework using 7T B0 MRIs from 60 subjects. Experimental results demonstrate that DCN-Net provides better segmentation than atlas-based deep cerebellar nuclei segmentation tools and other state-of-the-art deep neural networks in terms of accuracy and consistency. We further prove the effectiveness of the proposed components within DCN-Net in dentate and interposed nuclei segmentation.

Keywords: 7T diffusion MRI, deep cerebellar nuclei, deep neural networks, self-training, segmentation

I. INTRODUCTION

The cerebellum is crucially involved not only in complex motor, cognitive and linguistic tasks [1] but also in emotional and perceptual processing [2], [3]. Within the cerebellar system, the deep cerebellar nuclei (DCN) integrate signals from the cerebellar cortex and relay them to the cerebral cortex or brainstem. Therefore, the DCN play a central role in forming a feedback loop between the cerebellar cortex and the cerebral cortex [3]. The DCN are divided into three parts: fastigial, interposed (globose and emboliform sub-nuclei), and dentate nuclei [4].

There have been on-going efforts to investigate the functional role of the DCN in neurological disorders and treatment [4]–[7]. Also, it has been reported that deep brain stimulation (DBS) therapy of the dentate nucleus is effective for poststroke motor impairments, tremor, and cerebellar ataxia [6]–[8]. Clear visualization of the DCN is thus a pre-requisite for such neuroimaging studies or neuro-modulation planning [4]. Moreover, automatic segmentation facilitates subsequent analysis in terms of consistency and efficiency. Although there have been studies to visualize the DCN with different magnetic resonance imaging (MRI) protocols [4], [9] or provide cerebellum atlases on common templates [10], [11], it is not trivial to precisely localize such small nuclei on subject-specific standard clinical MRIs. Indeed, little work has been done to automatically segment the DCN [11], [12]. Diedrichsen et al. [11] generate a probabilistic atlas (including the DCN) on a cerebellum template (SUIT) [10] and normalize the cerebellum anatomy of a specific subject onto the SUIT atlas space. The inverse of the estimated deformation field is then applied to the SUIT atlas to segment the cerebellum in the subject space. Ye et al. [12] utilize a geometric deformable model with a tractography initialization. However, only dentate segmentation results are provided, on a very small set of 3 Tesla (T) diffusion MRIs. More recently, Carass et al. [13] compared state-of-the-art segmentation methods that participated in the Cerebellum Parcellation Challenge of MICCAI 2017. However, the primary interest of these methods was lobes, vermis, lobules, and their subcomponents in a hierarchical level of the cerebellum.

With the advent of ultrahigh-field MR technologies, 7T MR imaging allows clear visualization of anatomical structures due to its superior contrast and resolution [14]–[16]. More recently, a 7T MRI system (the Magnetom Terra, Siemens Medical Solutions) received 510(k) clearance for clinical use from the Food and Drug Administration (FDA). A number of studies leverage 7T MRIs for functional imaging studies of the DCN [17], [18]. Diedrichsen et al. [17] provide a high quality atlas of the DCN and improve the normalization process using multiple contrast 7T MRIs. Thurling et al. [18] use 7T functional MRIs to study the activation of the dentate nucleus in a verb generation task. The visual benefits of 7T MRI also facilitate segmentation of the DCN, which appear as hypo-intense or hyper-intense regions. Manual delineation of the DCN is, however, time-consuming and requires expert anatomical knowledge that is subjective and thus prone to intra- and inter-rater variability. 7T MR atlas-based segmentation automates the procedure, but does not adequately take into account inter-subject variability and oftentimes entails additional refinement steps. Therefore, a fast, accurate, and consistent method is required to segment the DCN on the subject-specific image in a fully automatic way.

During the last decade, convolutional neural networks (CNN) have re-gained much attention due to state-of-the-art performance in computer vision and image processing tasks with increasingly available training data and computational power [19]. However, since they were typically designed for classification tasks on 2D images, their use for volumetric segmentation in medical images has been limited due to low efficiency.

Recently, fully convolutional networks (FCN) have handled the issue by considering a whole network as a large convolution filter, trained in an end-to-end manner [20]. Given images with arbitrary size, this enables dense inference in a single step, and thus redundant convolutions and pooling operations can be avoided. Therefore, the FCN and its variants have proven their effectiveness in various medical image segmentation tasks [21]–[23]. Among such architectures, U-Net [23] is the most notable approach. Its key feature, the skip connection, allows the network to prevent loss of contextual information at multiple image scales, and thus it has shown great potential for semantic segmentation. Schlemper et al. [24] extend U-Net by exploiting an attention mechanism to learn where to focus for medical image segmentation. However, such FCNs usually have millions of parameters and require a large amount of labeled data to avoid over-fitting [25], [26]. More recently, densely connected convolutional neural networks [27] have been developed to address those challenges and have also been incorporated into the FCN framework (FC-DenseNet) [25]. Dense connection enables efficient gradient propagation, deep supervision, and reuse of features [26], [27]. The number of parameters to be optimized in the network is therefore dramatically reduced without harming its performance, which makes it applicable in clinical scenarios with limited labeled data.

In this paper, we propose a novel deep learning framework for fast, accurate, and robust patient-specific segmentation of the DCN (named DCN-Net). We use B0 images from 7T diffusion weighted imaging (DWI) to segment the DCN since the small nuclei appear as hypo-intense regions in these images; furthermore, DWI is increasingly becoming part of the standard protocol in the clinical workflow [28], [29]. We solely focus on dentate and interposed nuclei due to the lack of visible contrast of the fastigial nucleus in the B0 images. Fig. 1 visualizes the 7T DWI B0 image around the dentate and interposed nuclei. Since automatic segmentation mostly relies on the image appearance, it is not an easy task to simultaneously segment such small and adjacent structures, especially with low-contrast, isointense, and fuzzy borders as displayed in Fig. 1. This is of significant importance in clinical scenarios where high quality data is not always available.

Fig. 1.


Dentate and interposed nuclei hypo-intense region in selected planes of axial, coronal, and sagittal views on the 7T B0 MRI of a specific subject (top: two selected planes (red) in the whole brain image, middle: ROI and dentate (blue)/interposed (light blue) contours on superior (S), anterior (A), and left (L) planes of corresponding views, bottom: ROI and dentate (blue) contours on inferior (I), posterior (P), and right (R) planes).

We herein address common yet important issues in deep learning-based segmentation of small, adjacent, imbalanced, and isointense structures - capturing contextual information on small patch images, handling highly imbalanced class labels, and overcoming the problem of limited labeled data.

3D patches (sub-volumes) based processing within the FCN is typically considered to reduce the memory burden and to significantly increase the number of training samples, especially in volumetric medical image segmentation [30]. However, learning features in very deep networks using such patches may not preserve local details of small dentate and interposed nuclei (whose volumes are approximately 630 mm³ and 50 mm³, respectively) due to consecutive pooling operations. Several networks address this issue by employing a dilated convolution which adjusts the field of view of convolutional filters, thereby capturing the contextual information without max-pooling [31]–[35]. Chen et al. [32] use the dilated convolution with different rates in parallel to encode multi-scale contextual information. It is later extended in [35] by adding a decoder to recover boundary details. Also, Gu et al. [33] introduce a dense atrous convolution block to extract high level features.

The data imbalance problem (i.e., a large volume difference between dentate and interposed nuclei in this work) is a longstanding challenge in the machine learning community. Training of deep neural networks on highly imbalanced labels may converge to local minima, resulting in suboptimal inference. Efforts are still underway to handle such class imbalance of isointense labels in the medical image domain. Hashemi et al. [36] propose an exclusive multi-label multi-class training strategy for infant brain tissue segmentation. Also, Milletari et al. [37] use the Dice coefficient as an objective function for training of the FCN to address the imbalance between foreground and background. Furthermore, other similarity loss functions have been introduced for detecting multiple sclerosis lesions in highly imbalanced data [38] and reducing the Hausdorff distance in segmentation [39].

Creating high quality labels on clinical data requires human expertise and thus access to labeled data is very limited. There have been several strategies to deal with a limited number of labeled data in semi-supervised ways [40], [41]. Roy et al. [40] employ a popular brain segmentation tool to obtain auxiliary labels for pre-training. The pre-trained model is then fine-tuned on the limited manual labels. Radosavovic et al. [41] introduce a data distillation approach that generates extra labels by ensembling a teacher model's predictions on multiple transformations of unlabeled data and then re-trains a student model on the generated labels.

Our main contribution is threefold: 1) we introduce dilated dense blocks with exponentially increasing dilation rates to encode multi-scale contextual information without consecutive max-pooling and additional complexity in the encoding path. The new encoding path is integrated into a decoder of FC-DenseNet (hereafter, FC-Dense ContextNet). 2) We propose to independently learn label probabilities of dentate and interposed nuclei to handle a class imbalance problem with the multi-class hybrid asymmetric loss function. We incorporate an attention score map as a regularization term and an overlap penalty for avoiding overlaps between dentate and interposed nuclei into the total loss function. 3) We exploit self-training strategies to overcome the problem of limited labeled data, which allow the proposed network to utilize auxiliary labels for improving training. To the best of our knowledge, this is the first work to segment simultaneously deep cerebellar dentate and interposed nuclei using a deep neural network on the patient-specific MRI data.

The rest of the paper is organized as follows. We first detail aforementioned contributions in Section II. In Section III, we briefly describe the experimental setup. We then present and discuss segmentation results and carry out ablation study to demonstrate the effectiveness of the proposed components, followed by presenting limitations and future directions in Section IV. Finally, we conclude this work in Section V.

II. METHODS

In this section, we first extend FC-DenseNet to effectively encode contextual information at different scales. Also, we explain how to mitigate the class imbalance problem of dentate and interposed nuclei in the network followed by presenting the proposed loss function. Finally, self-training strategies are presented for handling limited labeled data. To formulate the training problem, let us denote the training set by $T = \{(U_i, L_i),\ i = 1, \dots, n\}$, where $U_i = \{u_{ij},\ j = 1, \dots, m\} \in \mathbb{R}^m$ is the input image patch and $L_i = \{l_{ij},\ j = 1, \dots, m\} \in \mathbb{R}^m$, $l_{ij} \in c$, is the ground truth label patch. $n$ is the number of training samples, $m$ is the number of voxels in the image patch, and $c$ is a label class index. We use region of interest (ROI) images around the DCN from whole brain images for efficient training. During the inference, the ROI on the test image is efficiently localized using the anatomical similarity between training images and test image (see Fig. 2). Overlapping 3D patches (sub-volumes) on the ROI are utilized as input of the model due to memory constraints and a limited number of training data. An overview of our proposed DCN segmentation framework (DCN-Net) is presented in Fig. 3. Each proposed component is detailed next.

Fig. 2.


Efficient ROI localization based on the anatomical similarity during the inference.

Fig. 3.


Overview of the proposed framework for dentate and interposed nuclei segmentation. (a) Overall scheme of DCN-Net. It consists of deep context-aware feature encoder, decoder of FC-DenseNet, independent label probability estimation, and attention regularization. (b) Description of convolution layers, transition layers, dense block, dilated dense block, and the proposed attention module.

A. DEEP CONTEXT-AWARE FEATURE LEARNING

FC-DenseNet [25] has been successfully applied in many segmentation tasks [25], [26], [38]. However, it still requires consecutive max-pooling to encode features on a larger receptive field, resulting in loss of detailed boundary information, especially that is critical in small structures.

A dilated convolution layer [31] adjusts the size of the receptive field by using a sparse convolutional kernel and thus can be exploited in the network to address the above problem without adding complexity. Inspired by [42], we propose to use dilated convolutional layers with exponentially increasing dilation rates in each dense block of the encoder (referred to as a dilated dense block). As shown in Fig. 3-(a), the number of convolutional layers (L) in the first, second, and third dilated dense block is 3, 4, and 5, respectively. Dilation rates (1, 1, 2), (1, 1, 2, 4), and (1, 1, 2, 4, 8) are applied to the convolutional layers of the three dilated dense blocks. Such growing dilation rates allow us to avoid the gridding effect [42] that arises in cascaded dilated convolutions with the same rate. Moreover, each dilated dense block aggregates contextual information at different scales, similar to the atrous spatial pyramid pooling [32]. Also, the number of channels in convolutional layers of each dilated dense block grows by 8.
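To make the effect of the growing dilation rates concrete, the sketch below computes the largest receptive field reached by each dilated dense block. It assumes stride-1 convolutions with a kernel size of 3 per axis, which the text does not state; `receptive_field` and `BLOCK_RATES` are illustrative names, not part of the authors' implementation.

```python
def receptive_field(dilation_rates, kernel_size=3):
    """Receptive field (voxels per axis) of stacked stride-1 dilated convolutions.

    kernel_size=3 is an assumption; the text does not give the kernel size.
    """
    field = 1
    for rate in dilation_rates:
        # A dilated kernel spans (kernel_size - 1) * rate + 1 voxels,
        # so each layer widens the field by (kernel_size - 1) * rate.
        field += (kernel_size - 1) * rate
    return field


# Dilation rates of the three dilated dense blocks, as listed in the text.
BLOCK_RATES = [(1, 1, 2), (1, 1, 2, 4), (1, 1, 2, 4, 8)]

for rates in BLOCK_RATES:
    print(rates, "->", receptive_field(rates), "voxels per axis")
```

Under these assumptions the third block reaches 33 voxels per axis, whereas five undilated layers would reach only 11, which illustrates how context can be enlarged without pooling. Because the layers are densely connected, the block output also contains features with the smaller intermediate receptive fields.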

We replace transition down blocks and dense blocks in the encoding path of FC-DenseNet with only dilated dense blocks as illustrated in Fig. 4-(a). Also, in order to build a deeper network without a memory burden for 3D input patches, we add max-pooling operations in the skip-connections. Intermediate feature maps at different scales are then concatenated into the decoder path. This architecture enables the network to keep the rich semantic information while going deeper into the network. Finally, we incorporate multi-scale (pyramid) input patches into max-pooled feature maps in the skip-connection (see also Fig. 3-(a)). This strategy facilitates learning of locality aware features by recovering information lost by max-pooling, thereby further improving the segmentation performance [43]. We use the network described herein as a backbone (named FC-Dense ContextNet).

Fig. 4.


Comparison of proposed architectures with existing models. (a) An encoder of FC-Dense ContextNet (with the proposed dilated dense blocks) replaces the existing encoder of FC-DenseNet for deep context-aware learning. (b) An independent label probability estimation with segmentation losses ($L_D$ and $L_I$) for dentate and interposed nuclei, respectively, and an overlap loss ($L_O$) is proposed to handle multi-label dependency during the training via the existing joint label probability estimation with a multi-class segmentation loss for background, dentate, and interposed nuclei ($L_{D+I}$).

B. INDEPENDENT LABEL PROBABILITY ESTIMATION

Jointly learning representations of highly imbalanced class labels with similar intensity on the image might cause sub-optimal label prediction since it relies mostly on prevalence labels (e.g., in this work the ratio for the average number of voxels of dentate and interposed nuclei is 12.6:1).

To reduce the bias in training with imbalanced dentate and interposed nuclei labels, we propose an independent single-label multi-class training strategy which separately learns dentate and interposed nuclei label probabilities in a single network (see Fig. 4-(b)). Specifically, label probabilities of background/dentate $p_{ijc}^{D}$ and background/interposed $p_{ijc}^{I}$, respectively, for the training sample $i$, the voxel $j$, and the label class index $c$ are independently estimated using softmax activation at the final layer of the network:

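$$p_{ijc}^{D} = P(l_{ij}^{D} = c \mid U_i, \Theta),\quad c \in \{1{:}\ \text{background},\ 2{:}\ \text{dentate}\} \quad \text{and} \quad p_{ijc}^{I} = P(l_{ij}^{I} = c \mid U_i, \Theta),\quad c \in \{1{:}\ \text{background},\ 2{:}\ \text{interposed nuclei}\}, \tag{1}$$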

where $l_{ij}^{D}$ is a ground truth background/dentate label and $l_{ij}^{I}$ is a ground truth background/interposed label for a voxel $j$, given the image patch $U_i$ of a training sample $i$ and network weights $\Theta$. The ground truth label maps $L_i^{D}$ and $L_i^{I}$ are one-hot encoded:

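$$g_{ijc}^{D} = \begin{cases} 1, & \text{if } l_{ij}^{D} = c \\ 0, & \text{otherwise} \end{cases} \quad \text{and} \quad g_{ijc}^{I} = \begin{cases} 1, & \text{if } l_{ij}^{I} = c \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$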
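The independent estimation can be illustrated with a small NumPy sketch that derives the two one-hot maps of Eq. (2) from a single label patch and applies a separate softmax to each output. The integer coding of the label patch (0 background, 1 dentate, 2 interposed) and all function names are assumptions made for illustration; the actual network heads are not shown.

```python
import numpy as np


def split_labels(label_patch, dentate_id=1, interposed_id=2):
    """Turn one integer label patch into two binary one-hot maps [g_D, g_I].

    The integer ids are hypothetical; the text does not specify a coding.
    """
    maps = []
    for structure_id in (dentate_id, interposed_id):
        foreground = label_patch == structure_id
        # Channel 0 is background (class 1 in the text), channel 1 the nucleus (class 2).
        one_hot = np.stack([~foreground, foreground], axis=-1).astype(np.float32)
        maps.append(one_hot)
    return maps


def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def independent_probabilities(logits_dentate, logits_interposed):
    """Separate background/dentate and background/interposed probabilities."""
    return softmax(logits_dentate), softmax(logits_interposed)
```

Because the two softmax outputs are normalized separately, a voxel can receive a high foreground probability in both maps, which is why the text later adds an overlap penalty.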

The Dice coefficient loss has been widely used for medical image segmentation [37]. However, since it equally weighs false positive and false negative, resulting in segmentation with low recall, it may not be suitable for highly imbalanced small objects [38]. In this study, we introduce a multi-class hybrid asymmetric loss which handles such a class imbalance problem of small objects by weighing false negative for higher recall and moreover focuses more on low probability classes (hard examples).

Tversky loss ($TL$) [44] generalizes the Dice coefficient loss by balancing false negative and false positive. Given the estimated label probability map $p_{ijc}$ and the one-hot encoded ground truth label map $g_{ijc}$, the loss function is defined as:

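$$TL_c = 1 - \frac{1}{n}\sum_{i}\frac{\sum_{j} p_{ijc}\, g_{ijc}}{\sum_{j} p_{ijc}\, g_{ijc} + \alpha \sum_{j} p_{ij1}\, g_{ij2} + \beta \sum_{j} p_{ij2}\, g_{ij1}}, \tag{3}$$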

where $\alpha$ and $\beta$ control the influence of false positives and false negatives. In this work, we set $\alpha$ and $\beta$, respectively, as 0.3 and 0.7 to improve recall, weighing false negatives. This may lead to higher performance and generalization for segmentation of imbalanced small structures [44]. Also, Focal loss ($FL$) [45] extends the cross entropy loss by adding a coefficient $(g_{ijc} - p_{ijc})^2$ to attend to the lower probability class during the training:

$$FL_c = -\frac{1}{nm}\sum_{i}\sum_{j} (g_{ijc} - p_{ijc})^2 \log(p_{ijc}). \tag{4}$$

Finally, the proposed hybrid segmentation loss for dentate and interposed nuclei, respectively, combines multi-class $TL$ and $FL$:

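$$L_D(T;\Theta) = \sum_{c} w_c\left(\pi_T\, TL_c^{D} + \pi_F\, FL_c^{D}\right) \quad \text{and} \quad L_I(T;\Theta) = \sum_{c} w_c\left(\pi_T\, TL_c^{I} + \pi_F\, FL_c^{I}\right). \tag{5}$$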

The class weight $w_c$ is computed from a volume ratio of dentate or interposed nuclei and background. $\pi_T$ and $\pi_F$ are weights for $TL$ and $FL$, respectively, and equally set to 0.5.
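A minimal NumPy sketch of the hybrid loss is given below. It is not the authors' implementation: it follows the prose reading that α weights false positives and β weights false negatives for each class, which is an assumption about how the index convention of Eq. (3) maps to error types. The `EPS` guard and the example class weights are also additions that do not appear in the text.

```python
import numpy as np

EPS = 1e-7  # numerical guard; not part of the equations in the text


def tversky_loss(p, g, alpha=0.3, beta=0.7):
    """Per-class Tversky loss for probabilities p and one-hot labels g, shape (n, m, C)."""
    tp = (p * g).sum(axis=1)          # soft true positives per sample and class
    fp = (p * (1.0 - g)).sum(axis=1)  # soft false positives (weighted by alpha)
    fn = ((1.0 - p) * g).sum(axis=1)  # soft false negatives (weighted by beta)
    index = (tp + EPS) / (tp + alpha * fp + beta * fn + EPS)
    return 1.0 - index.mean(axis=0)   # average over the n samples, shape (C,)


def focal_loss(p, g):
    """Per-class focal term of Eq. (4): -(g - p)^2 * log(p), averaged over n * m."""
    n, m, _ = p.shape
    return -(((g - p) ** 2) * np.log(p + EPS)).sum(axis=(0, 1)) / (n * m)


def hybrid_loss(p, g, class_weights, pi_t=0.5, pi_f=0.5):
    """Weighted sum over classes of the combined Tversky and focal terms (Eq. 5)."""
    per_class = pi_t * tversky_loss(p, g) + pi_f * focal_loss(p, g)
    return float((np.asarray(class_weights) * per_class).sum())
```

With β larger than α, a missed foreground voxel lowers the Tversky index more than a spurious one of the same size, which is the recall-oriented behavior the text describes. The same function would be evaluated once on the background/dentate output and once on the background/interposed output.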

While Hashemi et al. [36] use a sigmoid function on cerebrospinal fluid and white matter labels, respectively, for whole brain tissue segmentation (multi-label multi-class problem), we estimate probabilities of dentate or interposed nuclei and background together using a softmax activation function (single-label multi-class problem).

C. ATTENTION REGULARIZATION AND OVERLAP PENALTY

We incorporate an attention regularization term and overlap penalty into a total loss function to accelerate convergence by focusing more on relevant regions and further improve the label prediction without uncertainty on the border between dentate and interposed nuclei.

Motivated by Schlemper et al. [24] and Jetley et al. [46], we exploit an attention mechanism to effectively leverage the salient features in the network. We introduce an attention module which is incorporated into the skip-connection. As illustrated in Fig. 3-(b), the attention module calculates an attention score map to highlight meaningful regions and suppress feature responses irrelevant to segmentation. The number of channels of an intermediate feature map from the dilated dense block in the encoder is first changed to the number of channels of a gating signal (coarse feature) from the dense block in the decoder using a 1 × 1 × 1 convolution layer. The gating signal is then up-sampled to the dimension of the feature map. The subsequent channel-wise operations (average pooling, max pooling, and squeeze) determine where to attend by learning the spatial dependency. Outputs of the channel-wise operations are concatenated, followed by exponential linear transformation and a 1 × 1 × 1 convolution layer. The attention score map is finally obtained using a sigmoid function. The attention module outputs both the attention score map and the intermediate feature map scaled by the attention score map.

The final attention score map ($a$) concatenates attention score maps from the attention module in the skip-connections followed by a 1 × 1 × 1 convolution layer (Fig. 3-(a)). The softmax probability, given $a_i$ and $\Theta$ for the training sample $i$, the voxel $j$, and the class label index $c$ corresponds to:

$$p_{ijc}^{A} = P(l_{ij}^{D+I} = c \mid a_i, \Theta),\quad c \in \{1{:}\ \text{background},\ 2{:}\ \text{dentate and interposed nuclei}\}, \tag{6}$$

where $l_{ij}^{D+I}$ is a ground truth background/dentate and interposed nuclei label. The ground truth label guides the final attention score map during the training by minimizing the attention loss as a regularization term which encourages attention to relevant features (Fig. 3-(a)). We utilize a categorical cross-entropy function for the attention loss:

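$$L_A(T;\Theta) = -\frac{1}{nm}\sum_{i}\sum_{c}\sum_{j} g_{ij}^{D+I}\log(p_{ijc}^{A}),$$

where the one-hot coded map

$$g_{ij}^{D+I} = \begin{cases} 1, & \text{if } l_{ij}^{D+I} = c \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$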

The independent label probability learning effectively mitigates a class imbalance problem in segmenting dentate and interposed nuclei. However, since dentate and interposed nuclei are in the same vicinity and independently estimated structures are not mutually exclusive, there may often be overlapping regions between the segmented dentate and interposed nuclei. We therefore propose to impose an overlap penalty during training. To this end, we incorporate the Dice coefficient (DC) [47] between estimated dentate and interposed nuclei labels as the overlap loss into a total loss function:

$$L_O(T;\Theta) = \frac{1}{n}\sum_{i}\frac{2\sum_{j} p_{ij2}^{D}\, p_{ij2}^{I}}{\sum_{j}\left(p_{ij2}^{D} + p_{ij2}^{I}\right)}. \tag{8}$$

Finally, the total loss function in our proposed DCN-Net can be defined as:

$$L_{total}(T;\Theta) = \lambda_D L_D + \lambda_I L_I + \lambda_A L_A + \lambda_O L_O, \tag{9}$$

where $\lambda_D$, $\lambda_I$, $\lambda_A$, and $\lambda_O$ are weight values for the dentate and interposed nuclei segmentation losses, the attention loss, and the overlap loss. The weights are empirically set to 1.0, 1.0, 0.5, and 0.1, respectively, in this work.
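The overlap penalty and the weighted total can be sketched in a few lines of NumPy. The sketch assumes the foreground probabilities of each nucleus are flattened to shape (n, m); the function names and the small `eps` guard are illustrative additions rather than details from the text.

```python
import numpy as np


def overlap_loss(p_dentate, p_interposed, eps=1e-7):
    """Soft Dice between the two foreground probability maps, shape (n, m).

    Minimizing this value discourages voxels from being claimed by both nuclei.
    """
    intersection = (p_dentate * p_interposed).sum(axis=1)
    total = (p_dentate + p_interposed).sum(axis=1)
    return float((2.0 * intersection / (total + eps)).mean())


def total_loss(l_d, l_i, l_a, l_o, weights=(1.0, 1.0, 0.5, 0.1)):
    """Weighted sum of the four loss terms with the weights reported in the text."""
    lam_d, lam_i, lam_a, lam_o = weights
    return lam_d * l_d + lam_i * l_i + lam_a * l_a + lam_o * l_o
```

Unlike a Dice segmentation loss, the overlap term is used directly rather than as one minus the coefficient, since here a larger Dice value means more undesired overlap.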

D. SEMI-SUPERVISED LEARNING

Collecting a large amount of clinical data is crucial for effective training of deep learning models driven by supervised learning. Data augmentation is a popular way to increase training data by adding spatial variation of data [33]. Also, patch (or sub-volumes) based processing in medical imaging is typically considered to significantly increase the number of training samples, while reducing the memory burden [30]. Even though a vast amount of unlabeled data is given, creating high quality labels is a core task to train a model. However, it requires anatomical expertise and is labor-intensive.

Inspired by [40] and [41], we utilize two self-training strategies that use predictions on unlabeled data to refine a model, handling data limitation as shown in Fig. 5. Our database contains 29 labeled data (pairs of 7T diffusion-weighted B0 MRIs and corresponding manual labels) and 31 unlabeled 7T B0 MRIs.

Fig. 5.


The self-training strategies by (a) pre-training on predicted labels and fine-tuning on manual labels and using (b) an expanded pool of training data with ensemble of labels created by model distillation. K = 31 (without failures) and N = 5 in this work.

First, given 31 unlabeled 7T B0 MRIs, corresponding dentate and interposed nuclei labels are predicted using initially trained DCN-Net on 29 manually labeled data. K pseudo labels are selected via visual inspection. The DCN-Net is pretrained on the pseudo labeled data and is then fine-tuned on 29 manually labeled data (Fig. 5-(a)).

While Roy et al. [40] use a different automatic segmentation tool to obtain pseudo labels, we predict the labels using our own model trained with manually labeled data and use them to update the same model. The pre-training encourages the model to have a good initialization by leveraging pseudo labels from a wide variety of unlabeled data. Fine-tuning of the pre-trained model helps reduce the discrepancy between the limited number of manual labels and the pseudo labels.

We also exploit knowledge distillation [48] based on randomness of the model initialization as illustrated in Fig. 5-(b). We train the DCN-Net (randomly initialized) N-times on manual labels and then predict dentate and interposed nuclei labels on 31 unlabeled 7T B0 MRIs using N trained models. N predicted labels on each unlabeled data are fused via a majority voting to generate an auxiliary label (N = 5 in this work). We finally re-train the DCN-Net on a union set of 31 auxiliary labels and 29 manual labels. Ensembling predictions of transformed data in a single model can be more efficient than fusing predictions from multiple trained models [41], but in the medical image domain, since such data transformation might add biases on the shape of anatomical structures, we leverage the learned knowledge in randomly initialized models.
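The label fusion step can be sketched as a voxel-wise majority vote over the N predicted label volumes. The sketch assumes each prediction is stored as an integer label volume (for example 0 background, 1 dentate, 2 interposed), which is a hypothetical coding; the text does not describe how ties are resolved, so the tie rule below is also an assumption.

```python
import numpy as np


def majority_vote(predictions, num_classes):
    """Fuse N integer label volumes of equal shape by voxel-wise majority voting."""
    stacked = np.stack(predictions, axis=0)  # shape (N, ...)
    # Count, for every voxel, how many models voted for each class.
    votes = np.stack(
        [(stacked == c).sum(axis=0) for c in range(num_classes)], axis=0
    )
    # argmax returns the lowest class index on ties (an assumed tie rule).
    return votes.argmax(axis=0)
```

With N = 5 and two classes per independent output, ties cannot occur; they only become possible if the two nuclei are merged into one multi-class label volume before voting.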

III. EXPERIMENTAL SETUP

We performed extensive experiments to validate our proposed model in deep cerebellar dentate and interposed nuclei segmentation. In this section, we present datasets, preprocessing, details about implementation and training, and evaluation metrics used for validation.

A. DATA AND PRE-PROCESSING

The 7T diffusion-weighted MRIs (B0) of 60 subjects were used in this work under approval of the Institutional Review Board at the University of Minnesota. Preprocessing steps included correction of motion, susceptibility, and eddy current distortions using FSL [49] and are detailed in [50]. The voxel size of the B0 image is 1.25 × 1.25 × 1.25 mm³.

29 B0 images were randomly chosen for validation. Of the deep cerebellar nuclei, the fastigial nuclei could not be manually segmented due to low contrast in the image. We thus segmented the deep cerebellar dentate and interposed nuclei, which are visible on the B0 image. Dentate and interposed nuclei were manually labeled on the images, carefully cross-validated by three anatomical experts, and served as ground truth [51]. The remaining 31 B0 images, for which no ground truth labels exist, were used to create extra labels for semi-supervised learning (as described in Section II-D).

We set bounding box regions around the dentate and interposed nuclei labels on whole brain images to facilitate training. For the ROI localization on a test image during the inference, a reference training image is chosen by measuring a similarity score (e.g., mutual information) [52] between training images and the test image. The reference image is linearly registered onto the test image and the corresponding dentate and interposed nuclei mask is then transformed onto the test image (see Fig. 2). Given the spatial similarity of brain anatomy, this is considered more suitable than applying different architectures (e.g., Mask R-CNN [53]) to localize the bounding box around small target structures.
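The reference selection step can be illustrated with a histogram-based mutual information score. This is only a sketch of one possible similarity measure: it assumes all images have already been resampled to a common grid, uses an arbitrary bin count of 32, and leaves out the linear registration and mask transformation entirely. The function names are hypothetical.

```python
import numpy as np


def mutual_information(image_a, image_b, bins=32):
    """Mutual information (in nats) from the joint intensity histogram of two images."""
    joint, _, _ = np.histogram2d(image_a.ravel(), image_b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal distribution of image_a
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal distribution of image_b
    nonzero = p_ab > 0                     # skip empty bins to avoid log(0)
    ratio = p_ab[nonzero] / (p_a @ p_b)[nonzero]
    return float((p_ab[nonzero] * np.log(ratio)).sum())


def select_reference(training_images, test_image):
    """Index of the training image with the highest similarity to the test image."""
    scores = [mutual_information(img, test_image) for img in training_images]
    return int(np.argmax(scores))
```

The selected training image would then be registered to the test image so that its nuclei mask can define the bounding box, as described above.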

B. IMPLEMENTATION AND TRAINING DETAILS

We implemented the proposed DCN-Net using Keras [54] library package on top of Tensorflow [55]. We compared DCN-Net with the existing DCN segmentation tools (SUIT [11]) and publicly available state-of-the-art deep neural networks (DNNs) – DeepMedic [21], LiviaNet [22], U-Net [23], Attention U-Net [24], FC-DenseNet [25], DeepLab v3+ [35] (with FC-DenseNet as a backbone), and CE-Net [33].

To this end, we utilized the SUIT toolbox available in SPM12 [56] and adopted original implementations of the networks. 2D architectures are extended to 3D for volumetric segmentation of dentate and interposed nuclei.

For fair comparison, we trained models under the same environment. The networks were initialized using the approach proposed in [57]. Random seeds in kernels were fixed during the training to avoid biases from different initialization. The parameters were optimized with Adam [58] with a learning rate of 0.001. The proposed multi-class hybrid asymmetric loss was applied in the other networks to eliminate the effect of different loss functions on segmentation performance. The size of minibatches is 8 and the number of epochs is 50. Training is stopped early after 20 epochs without improvement on the validation set, and we then keep the model weights with the minimal validation loss to avoid overfitting.

3D patches (sub-volumes) based processing within each network is considered to reduce the memory burden and to significantly increase the number of training samples. The size of input patches is 32 × 32 × 32 (the effect of different patch sizes in segmentation is compared later in Section IV-D). Predicted output patches are the same size as input patches. The step size for patch extraction is 5 × 5 × 5 (patch step size 9 × 9 × 9 is also used for investigating the effect of the number of training samples in segmentation (Section IV-C)). Training of networks was done with NVIDIA Tesla K40 GPU.
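The relation between step size and the number of training patches can be sketched as follows. The ROI shape used in the usage note is hypothetical, since the text does not report ROI dimensions, and the sketch assumes that only patches lying fully inside the ROI are kept (no padding).

```python
from itertools import product


def patch_starts(length, patch=32, step=5):
    """Start indices of patches along one axis; patches must fit inside the ROI."""
    if length < patch:
        return []
    return list(range(0, length - patch + 1, step))


def patch_origins(roi_shape, patch=32, step=5):
    """All 3D patch origins for an ROI of the given shape."""
    axes = [patch_starts(n, patch, step) for n in roi_shape]
    return list(product(*axes))
```

For an assumed ROI of 64 × 48 × 40 voxels, `len(patch_origins((64, 48, 40), step=5))` gives 56 patches, while `step=9` gives 8, which shows how strongly the step size controls the number of training samples per subject.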

The source code and relevant materials will be available from the corresponding author upon reasonable request.

C. EVALUATION METRICS

DC [47], center of mass distance (CMD), mean surface distance (MSD) between ground truth and segmented results of each method, and volumes were computed for quantitative analysis. Those metrics are considered highly relevant to accurately represent any small anatomical structures [52].
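Two of these metrics, DC and CMD, can be sketched directly on binary masks. The default voxel size follows the 1.25 mm isotropic resolution reported for the B0 images; the function names and the convention for two empty masks are assumptions, and MSD is omitted because it additionally requires surface extraction.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    # Two empty masks are treated as a perfect match (a convention, not from the text).
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0


def center_of_mass_distance(mask_a, mask_b, voxel_size=(1.25, 1.25, 1.25)):
    """Euclidean distance in mm between the centers of mass of two non-empty masks."""
    com_a = np.argwhere(mask_a).mean(axis=0)
    com_b = np.argwhere(mask_b).mean(axis=0)
    return float(np.linalg.norm((com_a - com_b) * np.asarray(voxel_size)))
```

As a quick check, two 2 × 2 × 2 cubes shifted by one voxel along one axis share half of their voxels, so DC is 0.5 and CMD is 1.25 mm.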

For statistical analysis of each measure, a paired t-test was performed on single comparisons. A one-way analysis of variance (ANOVA) and Tukey’s honest significance post-hoc test were conducted for multiple comparisons.

Five-fold cross-validation on the 29 labeled data sets was used for evaluation. 20% of the training data was used as the validation set. Thus, we trained the models on five combinations of training and test data, in which each combination consists of 19 or 20 training sets, 4 validation sets, and 5 or 6 test sets. Note that, since we utilized overlapping 3D patches (e.g., 32 × 32 × 32 size and 5 × 5 × 5 step) as input to the models, the number of training samples (patch images) used in the model was approximately 3,500 in each fold (which is discussed in Section IV-C).
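The split sizes quoted above can be reproduced with a short sketch. It assumes contiguous folds without shuffling and rounds the 20% validation share down; both are illustrative choices, since the paper does not state how subjects were assigned. The function name is hypothetical.

```python
def five_fold_splits(n_subjects=29, n_folds=5, val_fraction=0.2):
    """Return a list of (train, val, test) subject-index lists, one per fold."""
    subjects = list(range(n_subjects))
    base, extra = divmod(n_subjects, n_folds)
    splits = []
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional test subject.
        size = base + (1 if fold < extra else 0)
        test = subjects[start:start + size]
        rest = subjects[:start] + subjects[start + size:]
        n_val = int(len(rest) * val_fraction)  # rounded down (assumption)
        val, train = rest[:n_val], rest[n_val:]
        splits.append((train, val, test))
        start += size
    return splits


split_sizes = [(len(tr), len(va), len(te)) for tr, va, te in five_fold_splits()]
```

With 29 subjects this gives four folds of 19 training, 4 validation, and 6 test subjects and one fold of 20, 4, and 5, matching the counts stated in the text.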

IV. RESULTS AND DISCUSSION

In this section, we present quantitative and qualitative segmentation results of our proposed model in comparison with the existing DCN segmentation tool and state-of-the-art deep learning architectures. We investigate the effect of the number of training samples on the segmentation performance of each deep learning model. The influence of the image patch size on segmentation within the proposed DCN-Net is also explored. Moreover, we provide an in-depth study of the impact of each component of our proposed model, followed by a discussion of limitations and future directions.

A. QUANTITATIVE COMPARISON WITH THE EXISTING TOOL

Of the cerebellum parcellation methods presented in [13], SUIT [11] is able to localize the DCN on subject-specific data using a probabilistic atlas of the cerebellum. SUIT also provides a tool specific to studying the dentate for a group of subjects, but instead of segmenting the dentate, it aims at optimizing the normalization of the subject data to an atlas space by using user-provided dentate masks [17]. Since that tool serves a different purpose from this study, we compared segmentation results obtained by using SUIT (with DARTEL normalization [59]) with segmentation based on the proposed DCN-Net (without self-training strategies).

As shown in Fig. 6, DCN-Net significantly outperformed SUIT in every metric for dentate and interposed nuclei segmentation (p<0.001). SUIT segmentation was also much less consistent. This large variance is attributed to uncertainty in the registration process, which might be influenced by anatomical variability across populations, discrepancies in magnetic field strength, contrast, or resolution between the atlas template and subject-specific data, and the robustness of normalization techniques [52]. This suggests that atlas-based segmentation requires additional refinement steps. Further, the inference time of SUIT (~30 min) was much longer than that of DCN-Net (~0.5 min).

Fig. 6.


Box plots of (a) dentate and (b) interposed nuclei segmentation based on SUIT and DCN-Net. * and **, respectively, indicate p<0.05 and p<0.001. See Table S-I of the supplementary material for average values and deviation.

B. QUANTITATIVE COMPARISON WITH STATE-OF-THE-ART DNNS

Dentate and interposed nuclei segmentation results obtained by using U-Net [23], Attention U-Net [24], FC-DenseNet [25], DeepLab v3+ [35], CE-Net [33], and DCN-Net (without self-training strategies) are quantitatively compared in Fig. 7. Note that training DeepMedic [21] and LiviaNet [22] failed to converge due to local minima in spite of using the proposed multi-class hybrid asymmetric loss.

Fig. 7.


Box plots of (a) dentate and (b) interposed nuclei segmentation based on state-of-the-art DNNs and DCN-Net with different patch step sizes. Statistical significance is marked as * and **, respectively, for p<0.05 and p<0.001 (blue for 5 × 5 × 5 patch step and red for 9 × 9 × 9 patch step). See Table S-II and S-III of the supplementary material for average values and deviation.

DCN-Net produced significantly better dentate segmentation results than any of the state-of-the-art networks (p<0.05 in MSD and DC). In interposed segmentation, DCN-Net also showed lower average errors than the state-of-the-art methods, although the differences were mostly not statistically significant (p>0.05).

Overall, average errors and deviation in interposed segmentation were larger than those in dentate segmentation. This might be caused by the smaller size of interposed nuclei [60]. On the other hand, volume of interposed segmentation was closer to the manual label than dentate segmentation. Volume difference between dentate segmentations and the manual label was significant (p<0.05).

Table I gives the number of trainable parameters in each network. Models based on densely connected networks (FC-DenseNet, DeepLab v3+, and DCN-Net) have far fewer parameters because their architecture does not require re-learning redundant feature maps, which is important for reducing overfitting during training and results in better segmentation [25]. Note that DCN-Net has a number of parameters comparable to that of FC-DenseNet while maintaining the size of input patches without max-pooling in the encoder, owing to the proposed dilated dense blocks, which allows for capturing contextual information.

TABLE I.

The number of trainable parameters for deep neural networks

| Network | U-Net | Attention U-Net | FC-DenseNet | DeepLab v3+ | CE-Net | DCN-Net |
|---|---|---|---|---|---|---|
| Number of trainable parameters | 5,605k | 5,862k | 742k | 1,280k | 2,742k | 890k |

C. THE EFFECT OF THE NUMBER OF TRAINING SAMPLES

To investigate the effect of the number of training samples on segmentation, we compared segmentation results obtained by using models trained with patches extracted at step sizes of 9 × 9 × 9 (e.g., 765 training samples) and 5 × 5 × 5 (e.g., 3,453 training samples). As shown in Fig. 7, training models with the 9 × 9 × 9 step size worsened segmentation performance. The larger patch step size (i.e., fewer training samples) affected dentate segmentation more than interposed segmentation. In particular, while Attention U-Net, FC-DenseNet, DeepLab v3+, and CE-Net with fewer training samples yielded significantly worse dentate segmentation (p<0.05 in DC), DCN-Net and U-Net produced comparable results with both patch step sizes.

The deterioration in interposed segmentation for networks trained with fewer training samples was not statistically significant (p>0.05). With fewer training samples, DCN-Net still achieved significantly better dentate segmentation than the state-of-the-art methods (p<0.05 in MSD and DC). In interposed segmentation, DCN-Net also produced lower average errors than the state-of-the-art networks. In particular, DCN-Net significantly outperformed U-Net and CE-Net (p<0.05 in MSD and DC) and FC-DenseNet (p<0.05 in DC).

We observed that the performance of DCN-Net trained with fewer training samples is comparable to that of state-of-the-art models trained with more training samples in both dentate and interposed nuclei segmentation. This suggests that DCN-Net is robust to a smaller number of training samples.

D. THE EFFECT OF THE 3D PATCH SIZE

We conducted an additional experiment to investigate the effect of different image patch sizes on the segmentation of dentate and interposed nuclei. Patch-based processing is preferred to significantly increase the number of training samples and to handle memory issues, especially for volumetric medical image segmentation. For example, there are only 29 training images in our case, but 3,453 training samples (on the ROI images) are available when using a 32 × 32 × 32 patch size and a 5 × 5 × 5 step size. Because larger patches may impose a memory burden during training while smaller patches are insufficient to encode contextual information, it is important to select a proper image patch size for the network.

Segmentation results obtained by using our proposed DCN-Net trained with 16 × 16 × 16, 24 × 24 × 24, 32 × 32 × 32, and 40 × 40 × 40 patch sizes with a 5 × 5 × 5 step size are quantitatively compared in Fig. 8. Using the larger patch sizes (32 × 32 × 32 and 40 × 40 × 40) produced significantly better dentate segmentation than using the smaller patch sizes (p<0.05 in DC). This might be because a larger patch size allows the model to encode more contextual information around the dentate region. For interposed nuclei segmentation, the patch size of 32 × 32 × 32 showed a slightly better DC value than the other patch sizes. Interestingly, the patch size of 40 × 40 × 40 showed the worst DC value and thus might not be suitable for encoding features of such a small region. Further, it resulted in high computational overhead, so training took much longer than with smaller patch sizes.

Fig. 8.


Box plots of (a) dentate and (b) interposed nuclei segmentation based on DCN-Net with different patch sizes. Statistical significance is marked as * and **, respectively, for p<0.05 and p<0.001. See Table S-IV of the supplementary material for average values and deviation.

The patch size of 32 × 32 × 32 is considered an optimal choice for better dentate and interposed nuclei segmentation (in terms of DC value) in this work.

E. QUALITATIVE ANALYSIS

Figures 9 and 10 visualize dentate and interposed nuclei segmentation on the 7T B0 MRI of a specific subject (see also a visual example in Fig. 1).

Fig. 9.


Visual comparison of dentate and interposed nuclei segmentation results on a low contrast 7T B0 MRI of a specific subject. The first column shows ground truth dentate and interposed nuclei and volumetric segmentations obtained by SUIT (DARTEL), U-Net, and Attention U-Net. The last three columns are corresponding contours on two selected planes of Fig. 1 along with measures (average CMD, MSD, DC, and volume (VOL) in both sides). Red is the segmented dentate, green is the segmented interposed, blue is the ground truth dentate, and light blue is the ground truth interposed.

Fig. 10.


Visual comparison of dentate and interposed nuclei segmentation results on a low contrast 7T B0 MRI of a specific subject. The first column shows dentate and interposed nuclei volumetric segmentations obtained by FC-DenseNet, DeepLab v3+, CE-Net, and the proposed DCN-Net. The last three columns are corresponding contours on two selected planes of Fig. 1 along with measures (average CMD, MSD, DC, and volume (VOL) in both sides). Red is the segmented dentate, green is the segmented interposed, blue is the ground truth dentate, and light blue is the ground truth interposed.

SUIT produced over-segmentation around the boundary of dentate and interposed nuclei. An incorrect registration between a low contrast test image and atlas template in SUIT might cause such a large systematic error in segmentation. State-of-the-art DNNs yielded better segmentation results than SUIT, but some artifacts were still observed on the region with similar intensity level, especially in the superior part (see the axial view of each method in Figures 9 and 10). Overall, segmentation results obtained by using DCN-Net were visually and quantitatively closer to the ground truth than segmentation based on state-of-the-art DNNs. This also exemplifies the robustness of our proposed DCN-Net to a low contrast image.

F. ABLATION STUDY

We carried out an in-depth study on the impact of each component in DCN-Net for dentate and interposed nuclei segmentation. To this end, we performed segmentation based on DenseNet with the proposed dilated dense blocks (Dilated DenseNet), FC-DenseNet, FC-Dense ContextNet, and DCN-Net with and without self-training strategies. Segmentation results are quantitatively compared in Fig. 11. The effectiveness of proposed components is further studied next.

Fig. 11.


Box plots of (a) dentate and (b) interposed nuclei segmentation for ablation study. Statistical significance is marked as * and **, respectively, for p<0.05 and p< 0.001. See Table S-V of the supplementary material for average values and deviation.

1). DILATED DENSE BLOCKS IN THE ENCODER AND DECODER STYLE:

FC-Dense ContextNet (backbone) shares dilated dense blocks in the encoder with Dilated DenseNet and has the decoder to recover high-resolution features as FC-DenseNet does. FC-Dense ContextNet significantly outperformed Dilated DenseNet in dentate segmentation (p<0.05 in CMD and p<0.001 in MSD and DC) and also yielded significantly better interposed segmentation results (p<0.05 in DC), proving the effectiveness of the decoder in segmentation.

Moreover, FC-Dense ContextNet yielded significantly better dentate segmentation than FC-DenseNet (p<0.05 in DC) and also showed slightly more accurate interposed segmentation by exploiting dilated dense blocks in the encoder. To intuitively explain the effect of the proposed dilated dense blocks in FC-Dense ContextNet, we provide intermediate feature maps at each scale obtained from the encoders of FC-Dense ContextNet and FC-DenseNet, respectively, using a 2D image example in Fig. 12. Compared with the dentate label of the input image, dilated dense blocks followed by max-pooling in FC-Dense ContextNet produced clearer feature maps than max-pooling followed by dense blocks in FC-DenseNet (see F2 and F3 in Fig. 12). This exemplifies the benefit of the dilated dense blocks in FC-Dense ContextNet, which extract features without a max-pooling operation preceding the convolutional layers. Feature maps of the interposed nuclei in both networks, however, were not clear due to their small size (they appear in only a few voxels). To address this, multi-scale (pyramid) input patches are incorporated into the feature maps after max-pooling in the skip-connection of the proposed DCN-Net, so that the loss of detail in the interposed nuclei can be recovered in the decoder (see also the deep context-aware feature encoding block in Fig. 3-(a)).

Fig. 12.


Comparison of intermediate feature maps in encoders of FC-Dense ContextNet and FC-DenseNet. (a) Feature maps in each resolution (F1, F2, and F3) are extracted from skip-connection in the encoder of FC-Dense ContextNet (top) and FC-DenseNet (bottom) and convolution with 1 × 1 filter size is performed to visualize one channel feature map for simplicity. One slice image that contains dentate and interposed nuclei and has the same size as image patch (32 × 32) is used for the experiment. The input image and intermediate feature maps in the encoder of (b) FC-Dense ContextNet and (c) FC-DenseNet are compared. Feature maps F2 and F3 are scaled for visual comparison in each resolution.

In this study, we can clearly see that dilated dense blocks boost the feature representation power in the network, especially with higher impact of a decoder on segmentation.
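One way to see why dilated convolutions capture context without pooling is to compare receptive fields. The sketch below uses the standard formula for stacked stride-1 convolutions. The dilation rates (1, 2, 4) and kernel size 3 are assumptions chosen for illustration and are not taken from the DCN-Net configuration.

```python
def receptive_field(dilations, kernel_size=3):
    """Receptive field (per axis) of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Three plain 3-wide convolutions versus three dilated ones (assumed rates).
plain_field = receptive_field([1, 1, 1])
dilated_field = receptive_field([1, 2, 4])
```

With the same number of layers and parameters, the dilated stack sees a region about twice as wide per axis, while the feature map keeps the full input resolution.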

2). MULTI-CLASS HYBRID ASYMMETRIC LOSS FUNCTION:

Training the backbone network (FC-Dense ContextNet) with the multi-class DC loss did not converge, while training the network with the proposed multi-class hybrid asymmetric loss converged very fast (see Fig. S-1 of the supplementary material for the training and validation loss curves over the number of epochs). Consequently, the model trained with the DC loss failed to produce dentate and interposed nuclei segmentation results. This suggests that the DC loss may not be suitable for segmentation of highly imbalanced small organs despite its popularity in medical image segmentation. The proposed multi-class hybrid asymmetric loss, however, was able to handle such a class imbalance problem by adjusting the weights of false negatives and false positives to achieve higher recall and by encouraging training to focus more on hard examples (low probability classes).
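The two ingredients mentioned above, asymmetric weighting of false negatives versus false positives and a focus on hard examples, can be illustrated with a generic sketch. This is not the paper's loss definition, which is given in an earlier section. It combines a Tversky-style overlap term with a focal term for a single binary class, and the weights `alpha`, `beta`, `gamma`, and `mix` are assumed values.

```python
import math


def tversky_loss(probs, targets, alpha=0.3, beta=0.7, eps=1e-7):
    """Overlap loss that weights false negatives (beta) above false positives (alpha)."""
    tp = sum(p * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Cross-entropy scaled by (1 - p_t)^gamma so easy voxels contribute little."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p
        p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def hybrid_loss(probs, targets, mix=0.5):
    """Weighted sum of the two terms (the mixing weight is an assumption)."""
    return mix * tversky_loss(probs, targets) + (1.0 - mix) * focal_loss(probs, targets)


# One missed foreground voxel versus one spurious foreground voxel.
loss_false_negative = tversky_loss([1.0, 0.0, 0.0], [1, 1, 0])
loss_false_positive = tversky_loss([1.0, 1.0, 0.0], [1, 0, 0])
```

With `beta > alpha`, a missed foreground voxel costs more than a spurious one, which pushes the model toward higher recall on small structures.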

3). INDEPENDENT LABEL PROBABILITY ESTIMATION:

Independently estimating label probabilities yielded better segmentation results than jointly estimating label probabilities in FC-Dense ContextNet in terms of accuracy and consistency, as summarized in Table II. We observed that it was more effective in dentate segmentation (improvements of 13.3% in CMD and 2.4% in DC), with statistical significance (p<0.001 in MSD and DC), than in interposed segmentation (2.3% in CMD and 0.5% in DC). This supports the view that this strategy helps the network avoid biases, especially in the larger label, induced by the interdependency of highly imbalanced class labels during multi-label probability estimation.
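The difference between joint and independent estimation can be sketched at the level of a single voxel. The code assumes that joint estimation corresponds to a softmax over (background, dentate, interposed) and independent estimation to one sigmoid per nucleus. This is a common way to realize the two schemes and is an assumption here, and the logit values are made up.

```python
import math


def softmax(logits):
    """Joint estimation: class probabilities compete and sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Independent estimation: each label gets its own probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical logits for one voxel: (background, dentate, interposed).
weak_dentate = [0.0, 2.0, 1.0]
strong_dentate = [0.0, 4.0, 1.0]

# Joint: raising the dentate logit lowers the interposed probability.
joint_interposed_weak = softmax(weak_dentate)[2]
joint_interposed_strong = softmax(strong_dentate)[2]

# Independent: the interposed probability ignores the dentate logit.
indep_interposed_weak = sigmoid(weak_dentate[2])
indep_interposed_strong = sigmoid(strong_dentate[2])
```

Under the softmax, a confident dentate response suppresses the interposed probability at the same voxel, which is one way the larger label can bias the smaller one. With separate sigmoids the two estimates do not interact.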

TABLE II.

Quantitative comparison between joint and independent label probability estimation in FC-Dense ContextNet for dentate and interposed nuclei segmentation

| Method | Dentate CMD (mm) | Dentate MSD (mm2) | Dentate DC | Dentate volume (mm3) | Interposed CMD (mm) | Interposed MSD (mm2) | Interposed DC | Interposed volume (mm3) |
|---|---|---|---|---|---|---|---|---|
| Joint label estimation | 0.562±0.40 | 0.422±0.14 | 0.855±0.06 | 758±195 (668±159) | 1.141±0.88 | 0.488±0.25 | 0.669±0.14 | 55±18 (53±24) |
| Independent label estimation | 0.482±0.29 | 0.375±0.10 | 0.875±0.04 | 739±167 (668±159) | 1.115±0.89 | 0.489±0.31 | 0.673±0.14 | 59±21 (53±24) |

* Bold indicates p<0.001 for paired t-tests with joint label estimation. Values in parentheses are ground truth volumes.

4). ATTENTION REGULARIZATION AND OVERLAP PENALTY:

We investigated the effectiveness of attention and overlap losses in FC-Dense ContextNet with independent label probability estimation.

Table III summarizes each measure in dentate and interposed nuclei segmentation. While the attention regularization term slightly improved interposed segmentation, it was not effective for dentate segmentation. This might be because the attention loss helps the detection of smaller organs by suppressing irrelevant features.

TABLE III.

Quantitative comparison of FC-Dense ContextNet (independent label probability estimation) with attention loss and overlap loss in dentate and interposed nuclei segmentation

| Method | Dentate CMD (mm) | Dentate MSD (mm2) | Dentate DC | Dentate volume (mm3) | Interposed CMD (mm) | Interposed MSD (mm2) | Interposed DC | Interposed volume (mm3) |
|---|---|---|---|---|---|---|---|---|
| FC-Dense ContextNet (independent label estimation) | 0.482±0.29 | 0.375±0.10 | 0.875±0.04 | 739±167 (668±159) | 1.115±0.89 | 0.489±0.31 | 0.673±0.14 | 59±21 (53±24) |
| + attention loss | 0.549±0.42 | 0.392±0.08 | 0.867±0.04 | 770±165 (668±159) | 1.040±0.75 | 0.472±0.24 | 0.680±0.12 | 59±23 (53±24) |
| + overlap loss | 0.555±0.37 | 0.377±0.10 | 0.874±0.04 | 735±172 (668±159) | 1.047±0.82 | 0.467±0.21 | 0.687±0.13 | 58±21 (53±24) |
| + attention loss and overlap loss (DCN-Net) | 0.514±0.35 | 0.380±0.11 | 0.873±0.05 | 736±165 (668±159) | 1.085±0.92 | 0.514±0.35 | 0.682±0.16 | 53±18 (53±24) |

* Bold indicates p<0.001 for paired t-tests with FC-Dense ContextNet. Values in parentheses are ground truth volumes.

Fig. 13 visually compares dentate and interposed nuclei segmentation with and without the overlap loss to validate its effectiveness. We observed that the overlap penalty in the loss function reduces overlaps between dentate and interposed nuclei that might be caused by separately estimating label probabilities.

Fig. 13.


A visual example on the axial plane that represents the effectiveness of overlap loss. Dentate (red) and interposed nuclei (green) of left (top) and right (bottom) are segmented using FC-Dense ContextNet (independent label estimation) without and with the overlap penalty.

Together, the attention map and the overlap penalty in the proposed FC-Dense ContextNet (DCN-Net) slightly improved interposed segmentation (2.7% in CMD and 1.5% in DC) while reducing overlaps between dentate and interposed nuclei. An interesting observation is that the overlap constraint compensated for a slight deterioration in dentate segmentation obtained by FC-Dense ContextNet with only the attention loss (see also the volume difference between dentate segmentations when adding the overlap loss in Table III).
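An overlap penalty of this kind can be illustrated with a very small sketch. The form below, the mean voxel-wise product of the two probability maps, is an assumed stand-in and not the paper's definition, and the function name is hypothetical.

```python
def overlap_penalty(p_dentate, p_interposed):
    """Mean product of the two probability maps (flattened to lists).

    The value is zero when no voxel is claimed by both labels and grows
    as both probabilities become high at the same voxels.
    """
    if len(p_dentate) != len(p_interposed):
        raise ValueError("probability maps must have the same number of voxels")
    products = [pd * pi for pd, pi in zip(p_dentate, p_interposed)]
    return sum(products) / len(products)


# Four voxels: disjoint predictions versus predictions sharing one voxel.
penalty_disjoint = overlap_penalty([1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0])
penalty_shared = overlap_penalty([1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0])
```

Such a term is only needed when the labels are estimated independently, since a softmax already prevents two classes from both having high probability at one voxel.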

5). SELF-TRAINING STRATEGIES:

We validated the two self-training strategies described in Fig. 5 for effectively training the proposed DCN-Net on a limited number of labeled data. As shown in Table IV, while the self-training strategies slightly improved interposed segmentation, they were not effective for dentate segmentation. Similar to the impact of the attention loss, it might be difficult to further improve dentate segmentation as it approaches the theoretically optimal performance bound simulated by its volume [60]. For interposed segmentation, expanding the pool of training data (auxiliary and manual labels) via model distillation was more effective than pre-training with predicted labels followed by fine-tuning with manual labels (e.g., 0.701±0.16 and 0.686±0.14, respectively, in DC). Interestingly, pre-training with pseudo dentate and interposed nuclei labels (segmented by using DCN-Net trained on manual labels) achieved results comparable to those of DCN-Net trained on only manual labels. This indicates that the pseudo labels were well segmented by DCN-Net, which facilitated semi-supervised learning.
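The data flow of the two strategies can be summarized in a schematic sketch. All names here are hypothetical, and `toy_train` and `toy_predict` are stand-ins that only record which data each training stage sees. The details of Fig. 5, such as how the distillation model is configured, are not reproduced.

```python
def make_pseudo_labels(predict, unlabeled_images):
    """Label each unlabeled image with the prediction of a trained model."""
    return [(image, predict(image)) for image in unlabeled_images]


def self_training_pretrain_finetune(train, manual, pseudo):
    """Strategy #1: pre-train on pseudo labels, then fine-tune on manual labels."""
    model = train(None, pseudo)
    return train(model, manual)


def self_training_pooled(train, manual, pseudo):
    """Strategy #2 (simplified): train once on manual and pseudo labels pooled."""
    return train(None, manual + pseudo)


def toy_train(model, dataset):
    """Stand-in for training: the 'model' is the list of dataset sizes seen."""
    history = list(model) if model else []
    history.append(len(dataset))
    return history


def toy_predict(image):
    """Stand-in for a model trained on manual labels."""
    return "pseudo-label for " + image


manual_data = [("s1", "m1"), ("s2", "m2"), ("s3", "m3")]
pseudo_data = make_pseudo_labels(toy_predict, ["u1", "u2"])
stages_strategy_1 = self_training_pretrain_finetune(toy_train, manual_data, pseudo_data)
stages_strategy_2 = self_training_pooled(toy_train, manual_data, pseudo_data)
```

The first strategy uses the pseudo labels only as an initialization, whereas the second lets them contribute alongside the manual labels in a single training run.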

TABLE IV.

Quantitative comparison of DCN-Net with Self-Training strategies in dentate and interposed nuclei segmentation.

| Method | Dentate CMD (mm) | Dentate MSD (mm2) | Dentate DC | Dentate volume (mm3) | Interposed CMD (mm) | Interposed MSD (mm2) | Interposed DC | Interposed volume (mm3) |
|---|---|---|---|---|---|---|---|---|
| Manual labels | 0.514±0.35 | 0.380±0.11 | 0.873±0.05 | 736±165 (668±159) | 1.085±0.92 | 0.514±0.35 | 0.682±0.16 | 53±18 (53±24) |
| Self-training strategy #1 (pre-training) | 0.613±0.46 | 0.373±0.12 | 0.872±0.05 | 713±155 (668±159) | 1.029±0.76 | 0.486±0.22 | 0.676±0.15 | 52±16 (53±24) |
| Self-training strategy #1 (fine-tune) | 0.529±0.42 | 0.381±0.12 | 0.874±0.05 | 731±165 (668±159) | 1.003±0.78 | 0.424±0.24 | 0.686±0.14 | 67±24 (53±24) |
| Self-training strategy #2 (model distillation) | 0.552±0.44 | 0.390±0.13 | 0.868±0.05 | 778±185 (668±159) | 1.005±0.94 | 0.468±0.43 | 0.701±0.16 | 60±23 (53±24) |

* Bold indicates p<0.05 for paired t-tests with DCN-Net (trained on manual labels). Values in parentheses are ground truth volumes.

Additionally, we trained U-Net with the self-training strategies and performed dentate and interposed nuclei segmentation to validate the effectiveness of self-training in a different network. As summarized in Table V, the self-training strategies significantly improved the interposed segmentation performance of U-Net (p<0.05 in DC), which thus became comparable to that of DCN-Net with and without self-training strategies, while they deteriorated dentate segmentation. Unlike pre-training of DCN-Net, pseudo dentate labels (obtained by using U-Net trained on manual labels) impeded pre-training of U-Net. Dentate segmentation obtained by using the pre-trained U-Net was significantly worse than that from U-Net trained on only manual labels (p<0.05 in CMD, MSD, and DC). This indicates that the quality of the pseudo dentate labels was not good enough for self-training and also explains the deterioration in dentate segmentation obtained by using the self-trained U-Net. Interposed segmentation obtained by using U-Net pre-trained on pseudo labels, on the contrary, was improved and slightly better than that obtained by using DCN-Net pre-trained on pseudo labels (0.688±0.17 vs. 0.676±0.15 in DC). These results suggest that high quality labels segmented by using DCN-Net trained on manual labels facilitate self-training of DCN-Net, thereby further improving segmentation results, especially in interposed nuclei.

TABLE V.

Quantitative comparison of U-Net with Self-Training strategies in dentate and interposed nuclei segmentation.

| Method | Dentate CMD (mm) | Dentate MSD (mm2) | Dentate DC | Dentate volume (mm3) | Interposed CMD (mm) | Interposed MSD (mm2) | Interposed DC | Interposed volume (mm3) |
|---|---|---|---|---|---|---|---|---|
| Manual labels | 0.616±0.53 | 0.431±0.13 | 0.853±0.06 | 724±184 (668±159) | 1.148±0.95 | 0.574±0.43 | 0.659±0.15 | 49±18 (53±24) |
| Self-training strategy #1 (pre-training) | 1.45±1.25 | 0.770±0.18 | 0.698±0.12 | 1216±267 (668±159) | 1.000±1.01 | 0.598±0.60 | 0.688±0.17 | 42±14 (53±24) |
| Self-training strategy #1 (fine-tune) | 0.668±0.56 | 0.462±0.17 | 0.837±0.08 | 793±213 (668±159) | 0.977±0.78 | 0.511±0.36 | 0.687±0.15 | 51±19 (53±24) |
| Self-training strategy #2 (model distillation) | 0.684±0.61 | 0.452±0.13 | 0.843±0.06 | 806±190 (668±159) | 1.02±0.93 | 0.52±0.38 | 0.693±0.17 | 48±20 (53±24) |

* Bold indicates p<0.05 for paired t-tests with U-Net (trained on manual labels). Values in parentheses are ground truth volumes.

G. LIMITATION AND FUTURE WORKS

We have validated DCN-Net only on our own dataset, obtained with unique acquisition protocols, since public datasets with DCN ground truth labels are not currently available. To evaluate the generalizability and robustness of DCN-Net, unseen cases from large-scale data across centers should be explored and the segmentation results need to be comprehensively analyzed.

Dentate and interposed nuclei were segmented in this study using the B0 contrast from DWI, where they appear as hypo-intense regions. DWI is becoming part of the standard-of-care imaging protocols in clinical practice. Further, tractography based on DWI may allow for studying connections of the whole cerebellar system [61]. As demonstrated in many studies [4], [9]–[11], [17], [62], various structural MR imaging modalities also allow us to directly identify the DCN. Unfortunately, most of the other 7T image modalities in our database (e.g., susceptibility-weighted imaging and T2-weighted imaging) were optimized to cover the basal ganglia region needed for deep brain stimulation studies and thus did not contain the cerebellum. The recently introduced quantitative susceptibility mapping (QSM) also provides clear visualization of sub-cortical brain structures due to higher levels of brain iron that induce susceptibility contrast [63]. However, 1) QSM is not part of standard clinical imaging protocols and 2) it requires additional post-processing steps, with uncertainty in susceptibility estimates under different acquisition protocols that still needs to be extensively investigated before clinical use [64].

While we have considered the B0 diffusion MRI to visualize and segment the DCN in this study for these reasons, it should be noted that the proposed model, owing to its feature representation power, can be exploited to segment the DCN on other clinically available MRI modalities where the nuclei are fairly visible, given sufficient training data. Furthermore, to find more suitable MRI contrasts for segmentation, the effectiveness of such MRI modalities in segmentation within DCN-Net can be evaluated. Also, learning an optimal combination of different contrasts that leverages complementary information in the images should improve segmentation.

7T MR imaging enables direct identification of many anatomical structures thanks to its superior contrast and signal-to-noise ratio (SNR). Recently, the clinical use of 7T MRI has been approved by the FDA. However, there are still a limited number of 7T MRI machines in current practice due to the significant hardware cost. It may therefore be necessary to segment the DCN on standard clinical 3T MRI. Unfortunately, standard clinical MRI systems do not provide clear visualization of the DCN due to poor SNR and contrast and thus often require post-processing steps for enhancement (e.g., QSM). The quality of manual segmentation for ground truth labels on clinical 3T MRIs may not be sufficient for training or evaluation of the model. A future direction would be to transfer the 7T knowledge learned within DCN-Net to segment the DCN on 3T MRIs (similar to 7T guided 3T MRI segmentation [52]).

The scalability of the proposed DCN-Net could allow for its wide use in practice. However, it was not fully assessed in this work due to the limited access to large-scale expert-labeled data. It is thus crucial to collect and label a large amount of MRI data with different magnetic fields, contrasts, and pathologies across various centers, as discussed above. As more 7T systems come online and more labeled data become available, the scalability of the proposed method on such large-scale datasets could be tested adequately.

Although the proposed DCN-Net was specialized for segmenting dentate and interposed nuclei, the issues handled in this paper (contextual information, class imbalance of small structures, and limited labeled data) are highly relevant to many other applications in medical imaging. Further validation of DCN-Net on publicly available datasets to demonstrate its applicability to different segmentation tasks (e.g., brain tissue or subcortical structures) remains future work.

Finally, this work may allow neurosurgeons and clinicians to facilitate neuroanatomical studies of the DCN and/or dentate nucleus DBS planning by providing fast, accurate, and robust patient-specific deep cerebellar dentate and interposed nuclei segmentation. Particularly, dentate nucleus DBS treatment has been effective for post-stroke motor recovery [6]–[8]. Precise DBS lead placement within such a small nucleus is critical for maximizing benefits and minimizing side effects in the treatment [8]. A patient-specific volumetric dentate model provided by the proposed method may support the correct localization of the DBS lead. Moreover, such capabilities might lead to development of a simulation method for automatic DBS parameter optimization. The clinical feasibility of the proposed model in dentate nucleus DBS treatment needs to be further proven by retrospectively evaluating the postoperative electrode active contact locations on the dentate nucleus segmentation produced by our model.


Acknowledgments

This work was supported in part by R01-NS081118, R01-NS113746, P50-NS098573, P30-NS076408 and P41-EB027061.

Biographies


JINYOUNG KIM received the B.S. and M.S. degrees in electrical and computer engineering from Hanyang University, South Korea, in 2004 and 2008, respectively, the M.S. degree in electrical and computer engineering from the University of Minnesota, Twin-Cities, MN, USA, in 2012, and the Ph.D. degree in electrical and computer engineering from Duke University, Durham, NC, USA, in 2015.

He was a Research Scientist at Surgical Information Sciences, Inc., Minneapolis, MN, USA, from 2016 to 2018, where he developed a clinical image-guided targeting and planning system for subthalamic nucleus-deep brain stimulation surgery (based on doctoral studies and FDA 510(k) cleared). He is currently a Research Fellow with the Center for Magnetic Resonance Research at the University of Minnesota, Twin-Cities, MN, USA. His research interests lie in computer vision, machine learning (deep learning), medical image computing with applications to healthcare and medicine, particularly in image-guided surgical planning and clinical decision support.

Dr. Kim has been a member of the Medical Image Computing and Computer Assisted Intervention society.


RÉMI PATRIAT received a B.A. degree in Physics and Film Studies at the University of Minnesota, Morris in 2010, an M.S. degree in Medical Physics from the University of Wisconsin, Madison in 2012 and a PhD in Medical Physics from the same institution in 2015.

From 2015 to 2017, he was a Postdoctoral Researcher with the Center for Magnetic Resonance Research at the University of Minnesota. He was then promoted to Research Associate in 2017 and Assistant Professor in 2019. For the past decade, he has been involved in medical imaging in humans and more specifically the use of structural and functional magnetic resonance imaging (fMRI) in research and clinical settings. His current research focuses on the improvement of deep brain stimulation (DBS) therapy for movement disorders by using high-resolution MRI at 7 Tesla (T).

Dr. Patriat has been a member of the Organization for Human Brain Mapping and the International Society for Magnetic Resonance in Medicine for many years. Awards and honors include the ISMRM Magna Cum Laude Merit Award, the Abbott Award in Physics, and other academic recognitions.


JORDAN KAPLAN is an undergraduate student at Stanford University who anticipates earning a B.Sc. degree in Human Biology upon graduation in 2022. He has been a research assistant with the Center for Magnetic Resonance Research at the University of Minnesota since 2017.


OREN SOLOMON received the B.S. degree in electrical engineering from Ben-Gurion University of the Negev, Beersheba, Israel, in 2008, the M.S. degree (cum laude) in electrical engineering from Tel Aviv University, Tel Aviv, Israel, in 2012, and the Ph.D. degree in electrical engineering from the Technion Israel Institute of Technology, Haifa, Israel in 2019.

He is currently a post-doctoral associate with the Center for Magnetic Resonance Research at the University of Minnesota, Twin-Cities, MN, USA. His research focus is on the development of algorithmic tools for biomedical imaging in ultrasound and magnetic resonance.


NOAM HAREL received the B.Sc. degree in biology from Tel Aviv University, Israel and the M.Sc. and Ph.D. degrees in physiology and neuroscience for mapping auditory areas using optical imaging techniques from the University of Toronto, Canada, in 1996 and 2000, respectively.

For his postdoctoral training, he moved to the Center for Magnetic Resonance Research (CMRR), University of Minnesota, where his research focused on the development of methods for high-resolution MRI and functional MRI (fMRI) applications using high magnetic fields (7T and 10.5T). In particular, he developed fMRI capabilities for mapping columnar and laminar organization in the cerebral cortex in both human and animal models.

Prof. Harel’s current research focuses on the development and integration of 7T MRI data into deep brain stimulation (DBS) surgical navigation in particular and brain surgery in general. He is currently a Professor in the Departments of Radiology and Neurosurgery, University of Minnesota, and is the PI of the Imaging Core of the Udall Center for Excellence in Parkinson’s Disease.

