Author manuscript; available in PMC: 2026 Mar 1.
Published in final edited form as: Comput Med Imaging Graph. 2025 Jan 4;120:102489. doi: 10.1016/j.compmedimag.2024.102489

DDEvENet: Evidence-based Ensemble Learning for Uncertainty-aware Brain Parcellation Using Diffusion MRI

Chenjun Li 1,*, Dian Yang 1,*, Shun Yao 2, Shuyue Wang 3, Ye Wu 4, Le Zhang 1, Qiannuo Li 5, Kang Ik Kevin Cho 6, Johanna Seitz-Holland 6, Lipeng Ning 6, Jon Haitz Legarreta 6, Yogesh Rathi 6, Carl-Fredrik Westin 6, Lauren J O’Donnell 6, Nir A Sochen 7, Ofer Pasternak 6, Fan Zhang 1
PMCID: PMC11792617  NIHMSID: NIHMS2046916  PMID: 39787735

Abstract

In this study, we developed an Evidential Ensemble Neural Network based on Deep learning and Diffusion MRI, namely DDEvENet, for anatomical brain parcellation. The key innovation of DDEvENet is the design of an evidential deep learning framework to quantify predictive uncertainty at each voxel during a single inference. To do so, we design an evidence-based ensemble learning framework for uncertainty-aware parcellation that leverages the multiple parameters derived from diffusion MRI (dMRI). Using DDEvENet, we obtained accurate parcellation and uncertainty estimates across different datasets from healthy and clinical populations and with different imaging acquisitions. The overall network includes five parallel subnetworks, each dedicated to learning the FreeSurfer parcellation from one diffusion MRI parameter. An evidence-based ensemble methodology is then proposed to fuse the individual outputs. We perform experimental evaluations on large-scale datasets from multiple imaging sources, including high-quality diffusion MRI data from healthy adults and clinical diffusion MRI data from participants with various brain diseases (schizophrenia, bipolar disorder, attention-deficit/hyperactivity disorder, Parkinson’s disease, cerebral small vessel disease, and neurosurgical patients with brain tumors). Compared to several state-of-the-art methods, our experimental results demonstrate highly improved parcellation accuracy across the multiple testing datasets despite the differences in dMRI acquisition protocols and health conditions. Furthermore, thanks to the uncertainty estimation, our DDEvENet approach demonstrates a good ability to detect abnormal brain regions in patients with lesions, consistent with expert-drawn results, enhancing the interpretability and reliability of the segmentation results.

Keywords: Diffusion MRI, Uncertainty Estimation, Brain Parcellation, Deep Learning

1. Introduction

Parcellation of cortical and subcortical brain regions is a vital step in neuroimaging analysis for mapping the structural and functional regions of the brain (Cloutman & Lambon Ralph, 2012; Ji et al., 2019). Diffusion MRI (dMRI) (Basser & Pierpaoli, 2011) is an advanced MRI technique that characterizes tissue microstructure and uniquely enables in vivo mapping of the brain white matter connections (Zhang et al., 2022). Parcellation is an essential step for many computational dMRI applications, such as fiber tract identification (Wassermann et al., 2016), structural connectome construction (Sporns et al., 2005), and clinical applications, such as characterizing abnormal regional microstructure across brain disorders (Seitz et al., 2018). Most approaches for dMRI parcellation compute the parcellation from anatomical MRI (T1- or T2-weighted) data and then register it to the dMRI space. However, the inter-modality registration presents difficulties due to image distortions (Albi et al., 2018; Jones & Cercignani, 2010; Wu et al., 2008), and the low resolution of dMRI data (Malinsky et al., 2013), often leading to inaccuracies in the resulting brain parcellation. Additionally, these methods cannot be applied without collecting accompanying anatomical MRI data.

Advanced techniques have been developed for direct brain parcellation from dMRI data using deep learning, eliminating the need for inter-modality registration and thus enhancing parcellation performance (Ciritsis et al., 2018; Liu et al., 2007; Theaud et al., 2022; Wen et al., 2013; Yap et al., 2015; Zhang et al., 2021, 2023), with improved accuracy and efficiency (Billot et al., 2023; Theaud et al., 2022; Zhang et al., 2023). Early work using deep learning typically employs encoder-decoder architecture with convolutional neural networks (CNNs) for efficient feature extraction and segmentation (Theaud et al., 2022; Zhang et al., 2021). More recent methods (Hayat & Aramvith, 2024; Z. Li et al., 2023) have considered using data fusion techniques with attention mechanisms to leverage more contextual information and improve accuracy. However, an important limitation of these advanced methods is that they were trained and tested on data that are closely matched in terms of subject characteristics and imaging acquisition protocols. Their effectiveness diminishes when applied to out-of-distribution data, i.e., data that is different from the training data. As a result, the generalizability of the existing dMRI parcellation methods to data from different sources remains a challenge. In general, there are several major sources of variability that lead to uncertainty and poorer generalizability in deep learning model predictions in dMRI parcellation. First, it is well known that variations across acquisition sites, arising from different MRI scanners and/or acquisition protocols can result in significant biases in dMRI data (Karayumak et al., 2019; Y. Li et al., 2024; Tax et al., 2019). Site variability can result in poor generalizability to data from different centers/sites/scanners. Second, the parcellation performance can be compounded by differences in spatial resolution, leading to varying overlaps or partial volumes between anatomical structures (Guevara et al., 2020). 
Consequently, the certainty in the parcellation prediction in the boundary between regions or tissue types diminishes. Third, current parcellation methods are primarily designed and/or trained on the brains of healthy individuals with normal structural appearances. As a result, they can be erroneous when applied to brains with abnormalities such as lesions, brain tumors, and white matter hyperintensity (WMH). Parcellation of out-of-distribution data (e.g., Figure 1 with a dataset containing a glioma) is challenging not just for deep learning approaches but also for traditional parcellation tools such as FreeSurfer (Fischl, 2012).

Figure 1:


Example of out-of-distribution parcellation. (a) shows the input image data with an apparent abnormal lesion region (glioma) within the red rectangle, which makes this an out-of-distribution scan. (b) shows the parcellation result using the widely used FreeSurfer software (Fischl, 2012), where the lesion boundary is mislabeled as being part of the cortex. (c) shows the parcellation result using FastSurfer (Henschel et al., 2020), a recently proposed deep-learning approach. While the parcellation is visually improved, there is still apparent mislabeling in the glioma (shown with yellow arrows). (d) shows the uncertainty estimation using our proposed DDEvENet method, which outlines tissue boundaries as well as abnormal brain regions and is utilized to improve parcellation.

Including uncertainty estimation techniques in the learning process is an important way to improve generalizability of deep neural networks (Abdar et al., 2021; Kendall & Gal, 2017; Zou et al., 2023), and is especially useful to their application on out-of-distribution data. In the context of anatomical brain parcellation, the ability to estimate uncertainty in the model’s predictions can help pinpoint potential areas in which the model is not accurate, and offer extra insights into anatomical variations that cause the uncertainty. In turn, the estimation of uncertainty can be included in the training process to reduce mis-labeling in areas of high uncertainty. For example, the estimation of uncertainty for the glioma case (Figure 1) highlights lesion regions as having high uncertainty. In the literature, one widely used approach for uncertainty estimation is based on Bayesian neural networks (BNNs) (Springenberg et al., 2016) that calculate a probability distribution over network weights to model parameter uncertainty in each voxel. However, BNNs suffer from high computational complexity because of the difficulty of efficiently computing posterior distributions over the model parameters. Monte Carlo Dropout (Gal & Ghahramani, 2016) is a method to obtain uncertainty estimates by performing dropout at inference time and running the model multiple times. It introduces randomness into the model to let it behave like an ensemble of networks but requires careful and complicated adjustment of NN structures, which may in turn affect model performance. The Deep Ensemble approach (Lakshminarayanan et al., 2017) attempts to overcome overfitting and improves predictive performance by training multiple deep models and aggregating their predictions. 
dMRI data is multi-dimensional and orientation-dependent (Johansen-Berg & Behrens, 2013; Shi et al., 2012; Wee et al., 2011), from which multiple parameters (e.g., fractional anisotropy; FA, and mean diffusivity; MD) can be derived to describe multiple different microstructural properties. In this case, the dMRI data is inherently suitable for the application of ensemble learning techniques (Lella et al., 2021; Winzeck et al., 2019), that can leverage information across different features to improve parcellation results. Yet, utilizing these ensembles necessitates model optimization for each network in the ensemble, demanding significant amounts of computational resources.

Evidential deep learning (H. Li et al., 2023; Sensoy et al., 2018; Zou et al., 2022), rooted in the principles of subjective logic theory (Jøsang, 2018), offers a simple yet effective tool for uncertainty estimation. Importantly, this relatively new approach provides improved results, especially for the classification of out-of-distribution samples (Sensoy et al., 2018). So far, evidential deep learning has been used for few-class segmentation such as binary tumor segmentation (H. Li et al., 2023; Zou et al., 2022), but it holds great potential for multi-class segmentation (e.g., parcellation of the brain image into tens of regions) because of its high efficiency and ease of implementation. Subjective logic theory extends traditional probability theory by introducing a notion of uncertainty that integrates evidence and belief (Jøsang, 2018). Unlike standard logic, where propositions are dichotomously considered true or false, or probabilistic logic, which assigns probabilities in the range [0, 1] to all arguments, evidential deep learning explicitly acknowledges a probability for uncertainty (Sensoy et al., 2018). In this framework, the network is trained to simultaneously optimize data fit while maximizing evidence, hence minimizing uncertainty.

In this study, we propose the Evidence-based Ensemble Neural Network DDEvENet, a novel uncertainty-aware deep learning method for anatomical brain parcellation of cortical and subcortical regions directly from dMRI data. The novelty of our approach lies primarily in the application of evidential deep learning combined with ensemble methods for uncertainty-aware brain parcellation. Our method is built on established neural network architectures and techniques but tailored to consider the multi-parameter property of dMRI data, enhancing both the accuracy and interpretability of parcellation results. The key innovation of DDEvENet is its utilization of evidential deep learning to quantify uncertainty at each voxel during a single inference. As a result, even though DDEvENet is trained using data from healthy individuals scanned with a single MRI protocol, it is able to provide accurate parcellation across different datasets with different imaging acquisitions, and from different populations, including individuals with brain disorders. In addition, DDEvENet combines uncertainty estimation across channels: it leverages multiple dMRI parameters to perform fine-scale parcellation, followed by an evidence-based ensemble fusion of the outputs from each dMRI parameter for the final parcellation prediction and uncertainty estimation. Given that different diffusion parameters describe tissue properties from varying perspectives, incorporating uncertainty estimation can enhance prediction results by accounting for the differences in outputs from each parameter. We perform experimental evaluations on large-scale datasets from multiple imaging sources, including high-quality dMRI images from healthy adults and clinical dMRI data from participants with various brain disorders (including schizophrenia, bipolar disorder, attention-deficit/hyperactivity disorder, Parkinson’s disease, cerebral small vessel disease, and neurosurgical patients with brain tumors). 
Our results indicate not only improved parcellation accuracy but also the ability to detect abnormal brain regions, suggesting DDEvENet’s robustness and generalization capabilities in both research and clinical settings.

2. Methods

The goal of the proposed DDEvENet method is to compute an anatomical parcellation directly from the input dMRI data while providing an uncertainty map highlighting image voxels with low prediction confidence (see an overview of the method in Figure 2). We first compute five diffusion parameter maps from the diffusion-weighted images (DWIs), describing different microstructural properties of the brain (Section 2.1). Then, for each input parameter, an evidential learning subnetwork is used to perform parameter-specific parcellation prediction and uncertainty estimation (Section 2.2). Finally, evidence-based ensemble learning is performed to fuse the five parameter-specific subnetworks and compute the final parcellation and uncertainty map (Section 2.3).

Figure 2:


DDEvENet overview. Five parameter maps calculated from the diffusion-weighted images (DWIs) are used as input to train the corresponding subnetworks, and the FreeSurfer-based parcellation is used as the ground truth. Incorporating evidential loss, the subnetworks produce voxel-wise evidence that can be further parameterized as Dirichlet distributions and output evidence-based uncertainty. The evidence-based uncertainty is used as a criterion for ensemble prediction maps, and the final uncertainty heatmap is calculated using the entropy of the averaged evidence.

2.1. Network Architecture

The overall network architecture has two major components: a backbone network that is based on the DDParcel method (Zhang et al., 2023) and an uncertainty estimation network (Sections 2.2 and 2.3). In brief, the backbone network comprises five subnetworks, where each subnetwork takes an individual dMRI parameter as the input and is trained with the T1w-based FreeSurfer (FS) parcellation to predict brain parcellation from that diffusion parameter map. The network input includes five parameter maps derived from the diffusion tensor imaging (DTI) model (Pierpaoli & Basser, 1996): FA, which quantifies the variability of diffusion across different orientations; MD, which quantifies the overall magnitude of diffusion; and E1, E2, and E3, which are the three eigenvalues of the diffusion tensor. Each subnetwork uses the FastSurferCNN (Henschel et al., 2020) architecture, which is designed to perform T1w-based FS parcellation using deep learning. Because different diffusion parameters describe the tissue properties from varying perspectives, the resulting parcellations differ from each other, and evidential deep learning is used to estimate the uncertainty of the outputs from the different subnetworks. To do so, the final classifier module with 1×1 convolution and softmax activation used in DDParcel is replaced with a softplus layer to enable a later integration of the evidence learning module that is described in detail below.
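As a brief illustration of how the five input maps relate to one another, the sketch below computes FA and MD for a single voxel from the three tensor eigenvalues using the standard DTI definitions. The function name `dti_scalars` is hypothetical and is not taken from SlicerDMRI or the DDParcel code; the zero-tensor convention is likewise an assumption of this sketch.

```python
import math


def dti_scalars(e1, e2, e3):
    """Return (FA, MD) for one voxel from the three diffusion tensor eigenvalues."""
    # Mean diffusivity: average of the eigenvalues.
    md = (e1 + e2 + e3) / 3.0
    denom = e1 * e1 + e2 * e2 + e3 * e3
    if denom == 0.0:
        # Assumption of this sketch: an all-zero tensor is assigned FA = 0.
        return 0.0, md
    # Fractional anisotropy: normalized spread of the eigenvalues, in [0, 1].
    num = (e1 - e2) ** 2 + (e2 - e3) ** 2 + (e3 - e1) ** 2
    fa = math.sqrt(0.5 * num / denom)
    return fa, md
```

Isotropic diffusion (three equal eigenvalues) gives FA = 0, while diffusion along a single axis gives FA = 1, which is why FA and MD carry complementary information to the raw eigenvalue maps.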

2.2. Evidential Deep Learning

In the proposed DDEvENet, to enable the use of evidential learning, the output layer of each subnetwork is the evidence learning module (Sensoy et al., 2018; Zou et al., 2022) rather than the classifier module from FastSurferCNN. Instead of applying a softmax to the outputs of the network, the outputs are interpreted as the evidence $e_m^{q,c}$ with which subnetwork $q$ assigns voxel $m$ to class $c$. The evidence learning module, as described in (Sensoy et al., 2018), transforms the evidence into beliefs and uncertainty. Beliefs $p_m^{q,c}$ are defined as:

$$p_m^{q,c} = \frac{e_m^{q,c}}{S_m} \qquad (1)$$

where $S_m = \sum_{n=1}^{N} \left(e_m^{q,n} + 1\right)$. For subnetwork $q$ at voxel $m$, the beliefs $p_m^{q,n}$ over the $N$ labels and the uncertainty $u_m^q$ sum to 1 (Sensoy et al., 2018):

$$\sum_{n=1}^{N} p_m^{q,n} + u_m^q = 1 \qquad (2)$$

The classification uncertainty of subnetwork $q$ at voxel $m$ can then be obtained directly as:

$$u_m^q = \frac{S_m - \sum_{n=1}^{N} e_m^{q,n}}{S_m} = \frac{N}{S_m} \qquad (3)$$
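Equations (1) to (3) can be illustrated for a single voxel with a few lines of plain Python. This is a minimal sketch rather than the authors' implementation; the function name `beliefs_and_uncertainty` and the example evidence values are hypothetical.

```python
def beliefs_and_uncertainty(evidence):
    """Map non-negative evidence for N classes at one voxel to (beliefs, uncertainty)."""
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    n_classes = len(evidence)
    # Dirichlet strength S_m = sum_n (e_n + 1).
    s = sum(evidence) + n_classes
    beliefs = [e / s for e in evidence]   # Eq. (1)
    uncertainty = n_classes / s           # Eq. (3)
    return beliefs, uncertainty


# Strong evidence for the first of three classes.
b, u = beliefs_and_uncertainty([8.0, 1.0, 1.0])
# No evidence at all: every belief is 0 and the uncertainty is 1.
b0, u0 = beliefs_and_uncertainty([0.0, 0.0, 0.0])
```

In both cases the beliefs and the uncertainty sum to 1, as required by Eq. (2), and the uncertainty shrinks as the total evidence grows.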

The evidence and model parameters are learned by minimizing the overall loss:

$$L = L_{Dice} + \lambda L_{EDL} \qquad (4)$$

Here, $L_{Dice}$ is the standard Dice loss function (Sudre et al., 2017) computed against the ground truth T1w-based FS parcellation, which ensures stable parcellation across all FS regions of the entire brain. $L_{EDL}$ is the evidential loss function, which enables the network to learn the evidence $e_m^{q,n}$. The values of $e$ are non-negative, ranging from zero to infinity, with larger values indicating higher confidence in the prediction. Specifically, $L_{EDL}$ is given as

$$L_{EDL} = L_{rce} + \lambda_{kl} L_{KL} \qquad (5)$$

Here, $L_{rce}$ is a revised cross-entropy loss that adds Bayes risk to the conventional cross-entropy term to regulate the evidential learning process (Sensoy et al., 2018), and $L_{KL}$ is the Kullback-Leibler (KL) divergence that ensures incorrect labels produce less evidence (Sensoy et al., 2018). Specifically, the computation of $L_{rce}$ and $L_{KL}$ begins with using the evidence as the parameters of a Dirichlet distribution with $\alpha_m^{q,n} = e_m^{q,n} + 1$, leading to the loss function (Sensoy et al., 2018), where the subnetwork index $q$ is omitted for brevity:

$$L_{rce} = \sum_{n=1}^{N} y_m^n \left(\psi(S_m) - \psi(\alpha_m^n)\right) \qquad (6)$$

where $y_m^n$ is one if $n$ is the ground truth class in voxel $m$ and zero otherwise, and $\psi(\cdot)$ is the digamma function. $L_{KL}$ is the Kullback-Leibler (KL) divergence, which is calculated as:

$$L_{KL} = \log\left(\frac{\Gamma\left(\sum_{n=1}^{N} \tilde{\alpha}_m^n\right)}{\Gamma(N) \prod_{n=1}^{N} \Gamma\left(\tilde{\alpha}_m^n\right)}\right) + \sum_{n=1}^{N} \left(\tilde{\alpha}_m^n - 1\right)\left[\psi\left(\tilde{\alpha}_m^n\right) - \psi\left(\sum_{n=1}^{N} \tilde{\alpha}_m^n\right)\right] \qquad (7)$$

where $\tilde{\alpha}_m^n = y_m^n + (1 - y_m^n)\,\alpha_m^n$. $\lambda$ and $\lambda_{kl}$ are parameters that scale the contributions of the three loss terms.
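To make the evidential loss of Eqs. (5) to (7) concrete, the sketch below evaluates it for a single voxel using only the Python standard library. It is not the TBraTS or DDEvENet code: the helper names `digamma` and `evidential_loss` are hypothetical, the digamma function is approximated with a recurrence plus an asymptotic series (the standard library does not provide one), and the Dice term of Eq. (4) is omitted. The default `lambda_kl=0.4` follows the value reported in Section 2.4.

```python
import math


def digamma(x):
    """Approximate the digamma function for x > 0 (recurrence + asymptotic series)."""
    result = 0.0
    while x < 6.0:            # shift x upward using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))
    return result


def evidential_loss(evidence, one_hot, lambda_kl=0.4):
    """Return (L_EDL, L_rce, L_KL) for one voxel, following Eqs. (5)-(7)."""
    n = len(evidence)
    alpha = [e + 1.0 for e in evidence]          # Dirichlet parameters
    s = sum(alpha)
    # Eq. (6): revised cross-entropy term.
    l_rce = sum(y * (digamma(s) - digamma(a)) for y, a in zip(one_hot, alpha))
    # Drop the evidence of the true class so that only misleading evidence is penalized.
    alpha_t = [y + (1.0 - y) * a for y, a in zip(one_hot, alpha)]
    s_t = sum(alpha_t)
    # Eq. (7): KL divergence from Dir(alpha_t) to the uniform Dirichlet.
    l_kl = (math.lgamma(s_t) - math.lgamma(n)
            - sum(math.lgamma(a) for a in alpha_t)
            + sum((a - 1.0) * (digamma(a) - digamma(s_t)) for a in alpha_t))
    return l_rce + lambda_kl * l_kl, l_rce, l_kl
```

With evidence placed only on the true class, the KL term vanishes and the cross-entropy term decreases as evidence grows; evidence placed on a wrong class makes the KL term positive. In a full training loop these per-voxel values would presumably be averaged over voxels and combined with the Dice loss as in Eq. (4).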

2.3. Evidence-based Ensemble Learning for Uncertainty Estimation

Leveraging the outputs obtained from the five sub-networks, the entropy based on the averaged beliefs is used for the final uncertainty estimation.

Specifically, each subnetwork produces a set of beliefs $p_m^{q,c}$ and an associated uncertainty $u_m^q$ for each voxel $m$, where $q$ indexes the subnetwork corresponding to each dMRI parameter, and $c$ indexes the class labels. To combine the information from the multiple subnetworks, we first compute an average belief across all subnetworks for each class at each voxel:

$$p_m^c = \frac{1}{M} \sum_{q=1}^{M} p_m^{q,c} \qquad (8)$$

where $M$ is the number of subnetworks (five in this work).

Next, we compute the entropy of the averaged belief distribution at each voxel to obtain the final uncertainty estimation:

$$u_m = -\sum_{c=1}^{N} p_m^c \log p_m^c \qquad (9)$$

where $N$ is the total number of class labels. The entropy $u_m$ quantifies the uncertainty in the final prediction at voxel $m$, with higher entropy indicating higher uncertainty.
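Equations (8) and (9) can be sketched for one voxel as follows. The function name `ensemble_uncertainty` is hypothetical, and treating a zero belief as contributing zero to the sum is an assumption of this sketch (the usual 0 log 0 = 0 convention).

```python
import math


def ensemble_uncertainty(beliefs_per_subnetwork):
    """Average beliefs over M subnetworks (Eq. 8) and return (mean beliefs, entropy) (Eq. 9)."""
    m = len(beliefs_per_subnetwork)
    n = len(beliefs_per_subnetwork[0])
    mean = [sum(b[c] for b in beliefs_per_subnetwork) / m for c in range(n)]
    # Zero beliefs are skipped, i.e. 0 * log(0) is treated as 0.
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


# Two subnetworks that agree on class 0 versus two that disagree.
_, agree = ensemble_uncertainty([[0.9, 0.0], [0.9, 0.0]])
_, disagree = ensemble_uncertainty([[0.5, 0.0], [0.0, 0.5]])
```

Here `disagree` is larger than `agree`, matching the intended reading that voxels where the subnetworks spread their belief over several classes receive a higher final uncertainty.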

The entropy of classification probabilities is usually considered an effective estimate of uncertainty (Gawlikowski et al., 2023; Luo et al., 2019). Here we use the entropy of the beliefs to calculate the uncertainty, as beliefs and probabilities have similar ranges and implications. The subnetwork-specific uncertainty is not directly related to the final uncertainty but shows how uncertain a subnetwork or a type of input is in general (see Figures 4 and 5, where some individual subnetwork uncertainty outputs are visibly brighter and clearer than others), which also provides valuable information.

Figure 4:


Visualization of uncertainty estimations on a randomly selected patient scan with WMH. The left part shows the input and the evidence-based uncertainty from three of the subnetworks with a comparison to the manually segmented WMH mask. The right side shows the final parcellation and uncertainty estimation.

Figure 5:


Visualization of uncertainty estimations on another randomly selected patient scan with BT. This scan fails to run properly on the FS software due to the existence of the tumor region. The left part shows the input and the evidence-based uncertainty from three of the subnetworks. The final output with a comparison to the manually segmented mask is provided.

Finally, the evidence-based ensemble classification is performed based on the outputs of the five subnetworks. For each voxel, the subnetwork prediction with the minimum subnetwork uncertainty is adopted as the final prediction, described as:

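$$Q_m = \underset{q \in \{FA,\, MD,\, E1,\, E2,\, E3\}}{\arg\min}\; u_m^q \qquad (10)$$

$$Class_m = \underset{y \in \{1, 2, 3, \ldots, N-1, N\}}{\arg\max}\; p_m^{Q_m, y} \qquad (11)$$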

This means that for each voxel, we rely on the subnetwork that is most confident (i.e., has the lowest uncertainty) in its prediction at that location. The rationale behind this choice is that different dMRI parameters may provide more reliable information in different regions due to their sensitivity to different microstructural properties. By selecting the prediction from the subnetwork with the lowest uncertainty, we aim to leverage the most trustworthy information available at each voxel.
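The selection rule of Eqs. (10) and (11) can be sketched for a single voxel as below. The function name `ensemble_label` and the belief and uncertainty values are hypothetical and chosen only for illustration.

```python
def ensemble_label(beliefs, uncertainties):
    """Pick the least uncertain subnetwork (Eq. 10) and return its top class (Eq. 11)."""
    # Eq. (10): subnetwork with the minimum evidence-based uncertainty.
    best = min(uncertainties, key=uncertainties.get)
    class_beliefs = beliefs[best]
    # Eq. (11): class with the largest belief in that subnetwork.
    index = max(range(len(class_beliefs)), key=class_beliefs.__getitem__)
    return best, index + 1  # classes are numbered 1..N in Eq. (11)


# Illustrative values for one voxel and three classes.
beliefs = {
    "FA": [0.70, 0.10, 0.05],
    "MD": [0.20, 0.30, 0.10],
    "E1": [0.10, 0.80, 0.05],
    "E2": [0.30, 0.30, 0.10],
    "E3": [0.25, 0.25, 0.25],
}
uncertainties = {"FA": 0.15, "MD": 0.40, "E1": 0.05, "E2": 0.30, "E3": 0.25}
subnetwork, label = ensemble_label(beliefs, uncertainties)
```

In this example the E1 subnetwork has the lowest uncertainty, so its preferred class (class 2) becomes the final label even though the FA subnetwork favors class 1.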

Entropy-based uncertainty complements the final segmentation by providing a measure of confidence in the subnetwork predictions, which can be used to inform further processing or decision-making. For instance, regions with high entropy may correspond to areas where the model is uncertain due to factors such as image noise, anatomical variability, or pathology. By identifying these regions or modalities, clinicians or researchers can focus their attention on areas that may require manual review or additional imaging.

This approach is straightforward yet effective: it leverages the evidence and uncertainty that the model learns from each of the dMRI parameters without requiring complicated training. Since each dMRI parameter reveals different brain microstructure, it is also reasonable to treat each subnetwork as an individual source of evidence rather than concatenating the subnetworks into another trainable neural network.

2.4. Implementation and Parameter Settings

Our method is implemented using PyTorch 2.0 (Paszke et al., 2019) and trained on a server equipped with NVIDIA RTX3090 GPUs. Each subnetwork is developed based on FastSurferCNN (https://github.com/Deep-MI/FastSurfer) and DDParcel (https://github.com/zhangfanmark/DDParcel). Code from Zou et al. (2022) is also used for the calculation of the evidential loss (https://github.com/Cocofeat/TBraTS). We adopted configurations such as the learning rate and number of input channels recommended by the FastSurferCNN paper, considering these values as a starting point for our model. We employed Adam as the optimizer (Kingma, 2014) because of its effectiveness in handling sparse gradients on noisy problems. A learning rate decay strategy was implemented to enhance training dynamics, with an initial rate of 0.01, then decreased by 95% every five epochs over 200 epochs. The models were iteratively updated using a batch size of 8, which balances memory constraints against the benefits of mini-batch training. For the overall loss, the parameters λ and λkl are empirically set to 0.7 and 0.4, respectively, to balance the model’s focus between parcellation performance and quality of uncertainty estimation. To improve generalization, we applied data augmentation during training. Specifically, we performed random rotations (±15 degrees), scaling (90% to 110%), and horizontal flipping on the input images. These augmentations were applied on-the-fly during training. The use of data augmentation resulted in a 2% increase in the Dice coefficient on the validation set, indicating enhanced model robustness.
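The learning rate schedule described above has the shape of a step decay, which can be sketched without any framework as follows. The function name `step_decay_lr` is hypothetical, and the decay factor is deliberately left as an explicit argument with no default, since how the reported percentage maps onto a multiplicative factor is an assumption the reader must make. In PyTorch, a schedule of this shape is typically expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=factor)`; whether the authors used that class is not stated.

```python
def step_decay_lr(epoch, factor, initial_lr=0.01, step_size=5):
    """Learning rate at a 0-based epoch index under step decay.

    `factor` is the multiplicative change applied every `step_size` epochs;
    its value is an assumption and is not fixed by this sketch.
    """
    return initial_lr * factor ** (epoch // step_size)


# Learning rates at the start of each five-epoch block for a hypothetical factor.
schedule = [step_decay_lr(epoch, factor=0.95) for epoch in range(0, 200, 5)]
```

The list `schedule` holds 40 values, one per five-epoch block of the 200 training epochs, starting at the initial rate of 0.01.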

3. Experimental Evaluation

3.1. Experimental Datasets

We evaluate the proposed DDEvENet method using dMRI datasets from multiple independently acquired populations (see Table 1), including: (1) 350 young healthy adults (28.1 ± 3.2 years old, 179 females and 171 males) in the Human Connectome Project (HCP) database (Glasser et al., 2013, 2016); (2) 50 young adults with diverse psychiatric conditions (37.6 ± 9.2 years old, 20 females and 30 males), consisting of 10 with schizophrenia, 10 with bipolar disorder, 10 with attention-deficit/hyperactivity disorder, and 20 healthy controls, from the Consortium for Neuropsychiatric Phenomics (CNP) database (Poldrack et al., 2016); (3) 50 elderly adults (62.8 ± 7.1 years old, 25 females and 25 males), consisting of 25 healthy controls and 25 patients diagnosed with Parkinson’s disease, from the Parkinson’s Progression Markers Initiative (PPMI) database; (4) 11 cerebral small vessel disease patients (62.6 ± 19.8 years old, 2 females and 9 males) with visible white matter hyperintensity (WMH) from the Second Affiliated Hospital of Zhejiang University School of Medicine, China; (5) 22 neurosurgical patients (48.1 ± 23.1 years old, 9 females and 13 males) diagnosed with brain tumors (BT) from the First Affiliated Hospital of Sun Yat-sen University, China. In the rest of the paper, we refer to these datasets as the HCP, CNP, PPMI, WMH, and BT datasets. Usage of the in-house WMH and BT datasets was approved by the local ethics committees at the Second Affiliated Hospital of Zhejiang University School of Medicine and the First Affiliated Hospital of Sun Yat-sen University, respectively.

Table 1:

Demographic information and diffusion MRI acquisition details of the datasets under study (a)

| Dataset | # Subjects | Age (years) | Gender | Health Condition | dMRI data |
|---|---|---|---|---|---|
| HCP | 350 | 28.1 ± 3.2 | 179 F, 171 M | 350 healthy | b = 1000 s/mm2, 108 directions, TE/TR = 89.5/5520 ms, resolution = 1.25 × 1.25 × 1.25 mm3 |
| CNP | 50 | 37.6 ± 9.2 | 20 F, 30 M | 20 healthy, 10 BP, 10 SZ, 10 ADHD | b = 1000 s/mm2, 64 directions, TE/TR = 93/9000 ms, resolution = 2 × 2 × 2 mm3 |
| PPMI | 50 | 62.8 ± 7.1 | 25 F, 25 M | 25 healthy, 25 PD | b = 1000 s/mm2, 64 directions, TE/TR = 88/7600 ms, resolution = 2 × 2 × 2 mm3 |
| WMH | 11 | 62.6 ± 19.8 | 2 F, 9 M | 11 WMH | b = 1000 s/mm2, 30 directions, TE/TR = 80.8/8000 ms, resolution = 2 × 2 × 2 mm3 |
| BT | 22 | 48.1 ± 23.1 | 9 F, 13 M | 22 BT | b = 1000 s/mm2, 64 directions, TE/TR = 79/22000 ms, resolution = 2 × 2 × 2 mm3 |
| Total | 483 | | | | |

(a) Abbreviations: Dataset: HCP - Human Connectome Project; CNP - Consortium for Neuropsychiatric Phenomics; PPMI - Parkinson’s Progression Markers Initiative; WMH - white matter hyperintensity; BT - Brain Tumor. Gender: F - female; M - male. Health Condition: BP - bipolar disorder; SZ - schizophrenia; ADHD - attention-deficit/hyperactivity disorder; PD - Parkinson’s disease.

In our study, the high-quality HCP datasets are used to train the brain parcellation model (n=200), as well as for model validation (n=100) and testing (n=50). The CNP, PPMI, WMH, and BT datasets, which are dMRI data collected through clinical acquisition protocols for studies focused on clinical applications, are utilized to evaluate the generalization capabilities of the trained model across various populations, acquisition protocols, and scanners.

3.1.1. MRI Acquisition and Preprocessing

Table 1 gives an overview of the diffusion image acquisitions of the datasets under study. These dMRI datasets were scanned with different diffusion imaging protocols, as follows. (1) The HCP data were acquired with a high-quality image acquisition protocol using a customized 3T Connectome Siemens Skyra scanner. The acquisition parameters were TE = 89.5 ms, TR = 5520 ms, and voxel size = 1.25 × 1.25 × 1.25 mm3. A total of 288 images were acquired for each subject, including 18 baseline images and 270 diffusion-weighted images evenly distributed at three shells of b = 1000/2000/3000 s/mm2. (2) The CNP data were acquired using a 3T Siemens TrioTim scanner. For dMRI data, the acquisition parameters were TE = 93 ms, TR = 9000 ms, and voxel size = 2 × 2 × 2 mm3. A total of 65 volumes were acquired for each subject, consisting of 1 baseline image and 64 diffusion-weighted images at b = 1000 s/mm2. For anatomical T1w data, the acquisition parameters were TE = 2.26 ms, TR = 1900 ms, and voxel size = 1 × 1 × 1 mm3. (3) The PPMI data were acquired using a 3T Siemens Trio scanner. The dMRI acquisition parameters were TE = 88 ms, TR = 7600 ms, and voxel size = 2 × 2 × 2 mm3. A total of 65 volumes were acquired for each subject, including 1 baseline image with b = 0 s/mm2 and 64 volumes at b = 1000 s/mm2. (4) The WMH data were acquired using a 3T GE Healthcare MR750 scanner. For dMRI data, the acquisition parameters were TE = 80.8 ms, TR = 8000 ms, and voxel size = 2 × 2 × 2 mm3. The dMRI data were acquired with 30 non-collinear diffusion sensitization directions using a b-value of 1000 s/mm2. Additionally, 5 volumes were obtained with no diffusion weighting (b-value = 0 s/mm2). The other dMRI parameters included a flip angle of 90 degrees and a slice thickness of 2 mm with no inter-slice gap. (5) The BT data were acquired using a 3T Siemens Prisma scanner. The acquisition parameters for the dMRI data were TE = 79 ms, TR = 22,000 ms, and voxel size = 2 × 2 × 2 mm3. A total of 65 volumes were acquired for each subject, including 1 baseline image with b = 0 s/mm2 and 64 volumes with b = 1000 s/mm2.

For HCP, the provided dMRI data was processed following the HCP minimal preprocessing pipeline (Glasser et al., 2013), including brain masking, motion correction, eddy current correction, echo-planar imaging (EPI) distortion correction, and rigid registration to the MNI space. For CNP, PPMI, WMH and BT, the dMRI data was processed as described in our previous study using a well-established pipeline (Zhang et al., 2018) (https://github.com/pnlbwh/pnlpipe), including eddy current-induced distortion correction, motion correction, and EPI distortion correction. Input diffusion parameters are computed using SlicerDMRI (Norton et al., 2017; Zhang et al., 2020). T1w-based FS parcellation is computed for each testing subject and used as ground truth for quantitative evaluation. In the BT dataset, the FreeSurfer software fails to produce a parcellation for two subjects because of abnormal brain structures. Therefore, a subset of the BT dataset containing 20 subjects is used for quantitative comparison, and the remaining 2 subjects are used for visual assessment of the uncertainty estimation results.

3.2. Experimental Design

We perform three experimental evaluations. First, we conduct an ablation study to assess the effectiveness of different ensemble criteria within our parcellation framework (Section 3.2.1). Second, we compare the proposed method to several state-of-the-art dMRI parcellation methods (Section 3.2.2). Third, we evaluate the versatility and robustness of our method on multiple dMRI datasets with varying imaging conditions (Section 3.2.3).

3.2.1. Ablation Study

First, the effectiveness of the ensemble criteria and backbone network is evaluated in an ablation study that compares the following methods: (1) the probability-based method, (2) the entropy-based method, and (3) the evidence-based method (proposed). Specifically, the probability-based method averages prediction probabilities derived from the original softmax activation layers. The entropy-based method calculates the entropy of the output probabilities and selects the label with minimal entropy as the final prediction for each voxel. The evidence-based method (proposed) employs the evidence learning module and conducts ensemble segmentation as described in Section 2.3. For the former two methods, we use the default loss settings recommended in the original papers to train the subnetworks. For the evidence-based method, we also incorporate the evidential deep learning loss. We test these three ensemble criteria on three backbone networks: FastSurfer, nnU-Net, and Swin UNETR. The evaluation is performed using the HCP testing data. For each compared method and each HCP testing subject, the Dice score, recall, and intersection over union (IoU) for each FS region are computed between the prediction and the ground truth parcellation, and then the average score across all regions is obtained.
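The two baseline ensemble criteria can be contrasted on a single voxel with the sketch below. The function names are hypothetical, and the entropy-based variant reflects one reading of the description above, namely that the subnetwork whose softmax output has the lowest entropy supplies the label; the compared implementations may differ in detail.

```python
import math


def probability_ensemble(probs_per_subnetwork):
    """Average softmax probabilities across subnetworks; return the argmax class index."""
    m = len(probs_per_subnetwork)
    n = len(probs_per_subnetwork[0])
    mean = [sum(p[c] for p in probs_per_subnetwork) / m for c in range(n)]
    return max(range(n), key=mean.__getitem__)


def entropy_ensemble(probs_per_subnetwork):
    """Return the argmax class index of the subnetwork with the lowest output entropy."""
    def entropy(p):
        return -sum(x * math.log(x) for x in p if x > 0.0)
    best = min(probs_per_subnetwork, key=entropy)
    return max(range(len(best)), key=best.__getitem__)


# One confident subnetwork against three moderately confident ones (illustrative values).
probs = [[0.90, 0.05, 0.05], [0.2, 0.5, 0.3], [0.2, 0.5, 0.3], [0.2, 0.5, 0.3]]
by_probability = probability_ensemble(probs)
by_entropy = entropy_ensemble(probs)
```

For these values the two criteria disagree: averaging favors class index 1, while the minimum-entropy rule follows the single confident subnetwork and returns class index 0. This is the kind of difference the ablation study is designed to expose.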

3.2.2. Comparison to State-of-the-art Methods

We then compare the proposed DDEvENet with several state-of-the-art methods in the literature, including FastSurfer (Henschel et al., 2020), Swin UNETR (Hatamizadeh et al., 2021), nnU-Net (Isensee et al., 2021), and DDParcel (Zhang et al., 2023). The details of each compared method are as follows. (1) FastSurfer is a method designed for predicting FS parcellations from structural MRI scans that provides efficient and accurate cortical reconstruction. To extend its applicability to dMRI data, we adapt FastSurfer to accept an individual dMRI parameter as input (FA is used as it yields the best parcellation performance across all the diffusion maps), enabling direct comparison with other methods. (2) Swin UNETR, initially designed for brain tumor segmentation in MRI images, reformulates image segmentation as a sequence-to-sequence prediction problem in which multi-modal input data is projected into a 1D sequence of embeddings and fed to a hierarchical Swin Transformer encoder. This design makes Swin UNETR well suited to multi-modal input data. In our study, we train a brain parcellation model with Swin UNETR using the same input images as our proposed method and evaluate its performance in predicting FS parcellations. (3) nnU-Net is known for its adaptability and robust performance in medical image segmentation tasks. Building on the U-Net architecture, nnU-Net automatically configures a customized segmentation pipeline based on dataset characteristics and available hardware resources. In our study, we train a brain parcellation model with nnU-Net using the same multi-channel input images as our proposed method, and generate FS parcellations for comparative evaluation. (4) DDParcel is a recently proposed brain parcellation method that directly uses dMRI data. 
Unlike conventional approaches that require inter-modality registration, DDParcel circumvents this need, thus minimizing parcellation errors arising from distortion artifacts and low dMRI image resolution. Designed specifically for dMRI data with its unique multi-parameter characteristics, DDParcel uses a multi-level fusion strategy to harness information from the various dMRI parameters as inputs for training. Here, the comparison is performed using the HCP testing dataset. Evaluation metrics, including Dice score, recall, and IoU, are computed for each FS region, and statistical analyses are performed to assess significant differences among the methods. This evaluation compares the methods on a common footing and highlights the strengths and limitations of each for FS parcellation.
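As a minimal sketch of the region-wise metrics, the snippet below computes Dice, recall, and IoU per label and averages them across regions. The function name, the flat label arrays, and the choice to skip regions absent from the ground truth are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def region_metrics(pred, truth, labels):
    """Return mean Dice, recall, and IoU over the given region labels."""
    dice, recall, iou = [], [], []
    for lab in labels:
        p, t = pred == lab, truth == lab
        if t.sum() == 0:
            continue  # assumption: skip regions that are absent from the ground truth
        tp = np.logical_and(p, t).sum()
        dice.append(2 * tp / (p.sum() + t.sum()))
        recall.append(tp / t.sum())
        iou.append(tp / np.logical_or(p, t).sum())
    return float(np.mean(dice)), float(np.mean(recall)), float(np.mean(iou))

# Toy label maps with two regions (hypothetical values); one voxel is mislabeled.
truth = np.array([1, 1, 1, 1, 2, 2, 2, 2])
pred = np.array([1, 1, 1, 2, 2, 2, 2, 2])

mean_dice, mean_recall, mean_iou = region_metrics(pred, truth, labels=[1, 2])
```

For 3D volumes the same function applies unchanged, since the boolean comparisons and sums operate on arrays of any shape.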

3.2.3. Comparison of Parcellation Performance on Different Datasets

Furthermore, we evaluate DDEvENet on multiple datasets (HCP, CNP, PPMI, WMH, and BT) to demonstrate the generalizability and robustness of our approach under varying imaging conditions. For each dataset, we first quantitatively compare our method with each of the state-of-the-art methods (see Section 3.2.2). The FreeSurfer-based parcellation is used as ground truth, and the Dice score between the predicted and ground truth parcellation maps is computed. It is worth noting that the FS parcellation failed for two subjects from the BT dataset due to the presence of pathological structures. For these two subjects, no ground truth is available for calculating the parcellation metrics. Therefore, a subset of the BT dataset containing 20 subjects is used to compute the metrics. Following that, visual comparisons of parcellation performance are provided to illustrate the practical effectiveness and reliability of our segmentation results. Finally, we evaluate the effectiveness of uncertainty estimation on the WMH and BT datasets by comparing the manually segmented lesion regions with the uncertainty heatmap. Through these experiments, we aim to establish the proposed method’s effectiveness in accuracy, generalizability, and clinical utility.

4. Experimental Results

4.1. Ablation Study

Table 2 presents the ablation study results, showing that the evidence-based ensemble consistently outperforms the others in terms of Dice scores, recall, and IoU. The consistent superiority of the evidence-based ensemble indicates that it can more effectively capture target structures and preserve spatial relationships through its nuanced integration of voxel-level uncertainties. When considering the backbone networks, it is important to note the impact they have on overall performance. FastSurferCNN stands out for notably improving performance, suggesting its compatibility with ensemble methods. This compatibility implies that the ensemble approach leverages the strengths of FastSurferCNN effectively. On the other hand, nnU-Net exhibits minimal variation across different ensemble criteria. This could be attributed to its self-configured architectures, which are inherently optimized for performance. Consequently, it shows limited gains from further ensemble enhancements.

Table 2:

Ablation study with comparison to different ensemble criteria and backbone networks (a)

| Backbone Network | Probability-based Dice | Probability-based Recall | Probability-based IoU | Entropy-based Dice | Entropy-based Recall | Entropy-based IoU | Evidence-based Dice | Evidence-based Recall | Evidence-based IoU |
|---|---|---|---|---|---|---|---|---|---|
| nnU-Net | 0.751 | 0.791 | 0.661 | 0.755 | 0.790 | 0.668 | 0.758 | 0.794 | 0.673 |
| Swin UNETR | 0.735 | 0.783 | 0.644 | 0.746 | 0.788 | 0.660 | 0.761 | 0.786 | 0.665 |
| FastSurferCNN | 0.746 | 0.780 | 0.669 | 0.764 | 0.792 | 0.672 | 0.789 | 0.804 | 0.682 |

(a) The table shows the performance of different backbone networks across various ensemble criteria.

4.2. Comparison to State-of-the-art Methods

Table 3 compares our proposed DDEvENet method with the FastSurfer, Swin UNETR, nnU-Net, and DDParcel methods. Our proposed method obtains the highest Dice score, followed by DDParcel, nnU-Net, FastSurfer, and Swin UNETR. Our proposed method also achieves the highest mean recall and mean IoU among all methods evaluated. A one-way repeated measures analysis of variance (ANOVA) indicates statistically significant differences among the methods for Dice (F = 3.998, p = 0.007), Recall (F = 5.243, p = 0.002), and IoU (F = 21.512, p = 0.001).
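For readers unfamiliar with the test, the F statistic of a one-way repeated measures ANOVA can be sketched as below, treating subjects as the repeated factor and methods as the within-subject factor. The function name and the toy scores are hypothetical, and the snippet is not the authors' analysis code; a p-value would follow from the F distribution with the returned degrees of freedom.

```python
import numpy as np

def repeated_measures_anova(scores):
    """One-way repeated measures ANOVA.

    scores: array of shape (n_subjects, n_methods), one score per subject and method.
    Returns the F statistic and its (numerator, denominator) degrees of freedom.
    """
    n, k = scores.shape
    grand = scores.mean()
    ss_methods = n * ((scores.mean(axis=0) - grand) ** 2).sum()   # between-method variation
    ss_subjects = k * ((scores.mean(axis=1) - grand) ** 2).sum()  # between-subject variation
    ss_total = ((scores - grand) ** 2).sum()
    ss_error = ss_total - ss_methods - ss_subjects                # residual after removing both
    df_methods, df_error = k - 1, (k - 1) * (n - 1)
    f_stat = (ss_methods / df_methods) / (ss_error / df_error)
    return float(f_stat), (df_methods, df_error)

# Toy scores: 3 subjects (rows) evaluated with 3 methods (columns); hypothetical values.
scores = np.array([
    [1.0, 2.0, 4.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 7.0],
])

f_stat, dof = repeated_measures_anova(scores)
```

Removing the between-subject variation from the error term is what distinguishes this test from an ordinary one-way ANOVA, and it is appropriate here because every method is evaluated on the same testing subjects.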

Table 3:

Parcellation performance in comparison with state-of-the-art methods (a)

| Model | Dice Score | Mean Recall | Mean IoU |
|---|---|---|---|
| FastSurfer | 0.739 ± 0.071 | 0.792 ± 0.019 | 0.646 ± 0.011 |
| Swin UNETR | 0.731 ± 0.092 | 0.784 ± 0.015 | 0.633 ± 0.006 |
| nnU-Net | 0.749 ± 0.084 | 0.765 ± 0.011 | 0.663 ± 0.009 |
| DDParcel | 0.770 ± 0.073 | 0.793 ± 0.015 | 0.670 ± 0.013 |
| DDEvENet | 0.789 ± 0.061 | 0.804 ± 0.018 | 0.682 ± 0.010 |

(a) For the three single-image input networks (FastSurfer, Swin UNETR, and nnU-Net), the network inputs are FA images, which produce the best parcellation results. Our proposed method and DDParcel allow inputs of multiple images and thus five and four dMRI parameters are used, respectively.

4.3. Comparison of Parcellation Performance on Different Datasets

We first examine the quantitative parcellation performance on the different datasets. As shown in Table 4, our proposed method achieves the highest mean Dice score on every dataset. Statistical testing shows significant differences in Dice scores among the methods for the HCP dataset (F = 5.3564, p = 0.0006), PPMI dataset (F = 8.7191, p < 0.0001), BT dataset (F = 3.1838, p = 0.0168), and WMH dataset (F = 3.6952, p = 0.0077). For the CNP dataset, the differences among the methods are not statistically significant (F = 1.4002, p = 0.2398), although DDEvENet still achieves the highest Dice score.

Table 4:

Parcellation Dice Score on HCP, CNP, PPMI, BT, WMH datasets in comparison with state-of-the-art methods (a)

| Dataset | FastSurfer | Swin UNETR | nnU-Net | DDParcel | DDEvENet |
|---|---|---|---|---|---|
| HCP | 0.739 ± 0.071 | 0.731 ± 0.092 | 0.749 ± 0.084 | 0.770 ± 0.073 | 0.789 ± 0.061 |
| CNP | 0.653 ± 0.084 | 0.625 ± 0.102 | 0.673 ± 0.075 | 0.670 ± 0.094 | 0.693 ± 0.081 |
| PPMI | 0.669 ± 0.062 | 0.619 ± 0.087 | 0.680 ± 0.101 | 0.702 ± 0.074 | 0.709 ± 0.063 |
| BT | 0.694 ± 0.077 | 0.685 ± 0.115 | 0.692 ± 0.081 | 0.713 ± 0.069 | 0.748 ± 0.053 |
| WMH | 0.701 ± 0.095 | 0.671 ± 0.108 | 0.694 ± 0.105 | 0.747 ± 0.085 | 0.760 ± 0.092 |

(a) The table compares the parcellation Dice scores across different datasets using various state-of-the-art methods.

Figures 2 and 3 provide a visual comparison of the FS parcellation across the different methods in the HCP, CNP, and PPMI datasets. We can observe that our method generates a visually smoother segmentation that is more consistent with the tissue boundaries appearing on the input image.

Figure 3:

Visualization of parcellation results on randomly selected PPMI (left) and CNP (right) scans. The green and yellow arrows identify examples of tissue boundaries for easier comparison across parcellation labelmaps.

Figures 4 and 5 present a visualization of uncertainty estimations on randomly selected patient scans from the WMH and BT datasets. The left part displays the input alongside the evidence-based uncertainty from three subnetworks. These heatmaps give a coarse view of the uncertainty distribution, where high values typically occur at tissue boundaries and in pathological regions. Additionally, the final output is compared to the manually segmented lesion masks (white matter hyperintensity region or tumor). We find that our network produces reasonable parcellations for unseen patient scans, and the final output uncertainty heatmaps clearly highlight the abnormal regions. Notably, the BT input shown in Figure 5 is one of the two subjects for whom FreeSurfer-based parcellation fails due to the presence of a large tumor region. In contrast, our method successfully performs parcellation while identifying the lesion regions as regions of high uncertainty. More specifically, unlike the compared state-of-the-art methods (nnU-Net and Swin UNETR), DDEvENet keeps the parcellation of the gray matter accurate in the presence of a tumor or lesion. In all the compared methods, the lesion itself is not labeled as a lesion because a lesion label is not available in the training data. Nevertheless, DDEvENet outputs an uncertainty map that can be used to identify abnormal lesion regions, demonstrating its potential for automatic lesion segmentation.
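The comparison above is visual. One hypothetical way to put a number on the agreement between an uncertainty heatmap and a manual lesion mask is sketched below. The function name, the threshold of 0.5, and the toy values are assumptions for illustration only and are not part of the reported evaluation.

```python
import numpy as np

def lesion_uncertainty_summary(uncertainty, lesion_mask, threshold=0.5):
    """Compare an uncertainty map with a binary lesion mask.

    Returns the mean uncertainty inside the lesion, the mean uncertainty outside it,
    and the Dice overlap between the thresholded uncertainty map and the lesion mask.
    Assumes the mask contains both lesion and non-lesion voxels.
    """
    lesion = lesion_mask.astype(bool)
    high = uncertainty >= threshold  # hypothetical cut-off for "high uncertainty"
    overlap = np.logical_and(high, lesion).sum()
    dice = 2 * overlap / (high.sum() + lesion.sum())
    return float(uncertainty[lesion].mean()), float(uncertainty[~lesion].mean()), float(dice)

# Toy example with six voxels; the first three belong to the lesion (hypothetical values).
uncertainty = np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.6])
lesion_mask = np.array([1, 1, 1, 0, 0, 0])

inside, outside, overlap_dice = lesion_uncertainty_summary(uncertainty, lesion_mask)
```

A clearly higher mean uncertainty inside the mask than outside it would be consistent with the visual impression described above, although tissue boundaries also carry high uncertainty and lower the overlap score.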

5. Discussion

In this work, we introduced DDEvENet, an advanced Evidence-based Ensemble Neural Network for brain parcellation and uncertainty estimation using dMRI data. DDEvENet incorporates evidential deep learning with ensemble techniques to mitigate challenges such as handling out-of-distribution data, a common issue with conventional segmentation models. Our comprehensive evaluation demonstrated DDEvENet’s superior performance in accuracy and robustness across diverse datasets, including those with pathological brain scans, underscoring its potential for significant impact in both research and clinical settings.

We showed that DDEvENet improved brain parcellation results both quantitatively and visually. Compared to the baseline and state-of-the-art FS parcellations, DDEvENet’s results showed smoother tissue boundaries that better aligned with actual brain structures and also improved quantitative parcellation metrics. One of the main reasons is that DDEvENet is an ensemble approach that can utilize complementary information about brain microstructures extracted from the multiple dMRI parameters. Another reason is that DDEvENet uses evidential learning to estimate uncertainty in the model’s predictions. This can help pinpoint potential areas in which the model is not accurate, identifying regions that may require further examination or refinement. For example, uncertainty heatmaps can reveal anatomical variations and inconsistencies in the data, highlighting areas where tissue boundaries are ambiguous or where partial volume effects are most pronounced due to the low resolution of dMRI data.

Based on subjective logic, evidential deep learning offers an effective way of identifying out-of-distribution regions and cases. This evidence-based approach leverages subjective logic theory to manage the inherent uncertainties and complex information that dMRI offers (Jones & Cercignani, 2010; Le Bihan, 2003). Unlike conventional models, evidential deep learning allows DDEvENet to express varying degrees of belief, accommodating the subtle and intricate diffusion patterns found in brain tissues. The overall uncertainty can then be derived from those beliefs, so that patterns unseen in the training data are highlighted. The resulting uncertainty heatmaps are valuable for distinguishing reliable areas and cases from those that appear to be out-of-distribution or need further investigation. Moreover, when different input parameters produce varied outcomes, the uncertainty heatmaps from each subnetwork are also useful. They show how much specific types or settings of inputs can tell us about brain structures, thereby supporting well-informed decisions and ongoing refinements.
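To make the link between beliefs and uncertainty concrete, the sketch below follows the standard subjective-logic formulation used in evidential deep learning (Sensoy et al., 2018): with non-negative evidence e_k for each of K classes, the Dirichlet strength is S = Σ(e_k + 1), the belief mass is b_k = e_k / S, and the uncertainty mass is u = K / S, so that the beliefs and u sum to one. Whether Section 2.3 of this paper uses exactly this form is an assumption, and the function name and toy evidence values are hypothetical.

```python
import numpy as np

def evidence_to_opinion(evidence):
    """Convert per-class evidence into belief masses and an uncertainty mass.

    evidence: non-negative array of shape (n_classes, n_voxels).
    """
    n_classes = evidence.shape[0]
    alpha = evidence + 1.0             # Dirichlet parameters
    strength = alpha.sum(axis=0)       # Dirichlet strength S per voxel
    belief = evidence / strength       # b_k = e_k / S
    uncertainty = n_classes / strength # u = K / S
    return belief, uncertainty

# Toy evidence for 3 classes at 2 voxels (hypothetical values):
# voxel 0 has strong evidence for class 0, voxel 1 has almost no evidence at all.
evidence = np.array([
    [18.0, 0.2],
    [1.0, 0.1],
    [1.0, 0.2],
])

belief, uncertainty = evidence_to_opinion(evidence)
```

A voxel with little total evidence, as might occur for tissue patterns absent from the training data, receives an uncertainty mass close to one regardless of which class has the largest share.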

DDEvENet’s robustness and generalizability were demonstrated on large testing datasets from multiple imaging sources and different populations. In the datasets from healthy controls and from patients without apparently abnormal brain regions (i.e., the CNP and PPMI datasets), our method consistently outperformed the compared state-of-the-art deep learning methods. This strong generalization is essential for practical applications, ensuring reliable performance across diverse populations and varying imaging conditions. Ablation studies confirmed that the specific backbone structures and ensemble criteria used in DDEvENet’s subnetworks contribute to its superior performance. When applied to out-of-distribution datasets from patients with abnormal brain lesions, DDEvENet provided voxel-level uncertainty estimation, allowing for more precise identification of problematic areas in MRI scans. This capability can be important for clinical diagnostics. Future work could further explore the correlation between uncertainty estimates and specific pathologies, such as identifying patterns of uncertainty that correspond to particular lesion types or stages of disease progression. Such studies could enhance the interpretability of uncertainty maps, providing deeper insights into underlying anatomical or pathological features and supporting more targeted clinical interventions.

Furthermore, DDEvENet is designed to be a fast and compact tool for both parcellation and uncertainty estimation. The evidential component is integrated into the network structure, enabling uncertainty estimation without the need for retraining the model. On an RTX3090 GPU, DDEvENet completes both tasks in approximately 2 minutes. This efficiency makes DDEvENet particularly useful for guiding clinical assessments and ensuring reliable interpretations in a timely manner.

Potential limitations of the present study, including suggested future work to address limitations, are as follows. First, an interesting direction for future work is to integrate other imaging modalities such as T1-weighted and functional MRI into our framework. These modalities provide complementary structural and functional information that can further enhance the accuracy and robustness of uncertainty estimation. Multi-modal fusion strategies could be developed to leverage the strengths of each modality for a more comprehensive understanding of brain structure and function. Second, while our approach already offers a large computational advantage over the traditional registration-based approach, we acknowledge the potential for further optimization, particularly in the context of real-time processing for applications such as surgical planning and intraoperative guidance. Real-time applications require high speed and accuracy, which can be addressed by techniques like model compression, pruning, and hardware acceleration (e.g., FPGA or GPU optimizations) to enhance responsiveness in clinical settings. Lightweight architectures for resource-constrained environments could expand the use of our method to portable devices, such as bedside tools. Additionally, adaptive inference strategies, such as coarse-to-fine frameworks or real-time uncertainty estimation, could improve efficiency and provide critical feedback in time-sensitive tasks. These refinements would help extend our model’s utility to practical, real-world clinical applications. Third, recent advancements in self-supervised learning (Gui et al., 2024; Huang et al., 2023) and neural architecture search (NAS) (Kang et al., 2023; Qin et al., 2023) have shown promise in medical imaging applications. Self-supervised learning could enable DDEvENet to leverage unlabeled data for pre-training, potentially improving performance on limited datasets. 
NAS could help in discovering more efficient network architectures tailored to our specific task, possibly reducing computational overhead. Incorporating these approaches in future work may enhance both the efficiency and generalizability of DDEvENet. Finally, we tested our method on populations with a wide age range including children, young adults, and elderly adults (see Supplementary Figure A). However, our evaluation was also limited to a certain age range and did not extend to very young ages (e.g., babies and neonates). It is well known that the myelination of white matter in neonates is essentially different from the populations under study, which may produce parcellation errors in local brain regions (see Supplementary Figure B). Therefore, the curation of training data that reflects the anatomy of the specific populations might be needed to further improve the generalizability of DDEvENet across different age groups and developmental stages.

6. Conclusion

In conclusion, DDEvENet offers a robust and efficient solution for brain parcellation and uncertainty estimation using diffusion MRI data. By combining evidential deep learning with ensemble strategies, DDEvENet addresses key challenges in conventional segmentation models, providing superior accuracy and robustness across diverse datasets. The model’s integration of multiple dMRI parameters and evidence-based ensemble methods enhances its performance, making it a valuable tool in both research and clinical contexts.

Highlights.

  • Introduced the DDEvENet model, leveraging an evidential deep learning framework to quantify predictive uncertainty in dMRI brain parcellation.

  • Implemented an evidence-based ensemble method that enhances the reliability of segmentation by integrating multiple uncertainty estimates.

  • Developed a novel uncertainty heatmap to identify regions with low confidence, potentially correlating with pathological areas.

  • Demonstrated improved segmentation accuracy and interpretability in brain parcellation tasks compared to conventional methods.

Acknowledgements

This work is in part supported by the National Key R&D Program of China (No. 2023YFE0118600), the National Natural Science Foundation of China (No. 62371107) and the National Institutes of Health (R01MH108574, P41EB015902, R01MH125860, R01MH119222, R01MH132610, R01NS125781, K99MH131850).

Footnotes

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Data and Code Availability

The data used in this project include the public HCP (www.humanconnectome.org), CNP (https://openfmri.org/dataset/ds000030), and PPMI (http://www.ppmi-info.org) datasets. The raw imaging data of the WMH and BT datasets are not publicly available because public availability would compromise participant confidentiality and privacy, but the derived diffusion MRI parameter maps will be made available upon request. The code and trained model will be made publicly available at: https://github.com/chenjun-li/DDEvENet.

References

  1. Abdar M, Pourpanah F, Hussain S, Rezazadegan D, Liu L, Ghavamzadeh M, Fieguth P, Cao X, Khosravi A, Acharya UR, et al. (2021). A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Information fusion, 76, 243–297.
  2. Albi A, Meola A, Zhang F, Kahali P, Rigolo L, Tax CM, Ciris PA, Essayed WI, Unadkat P, Norton I, et al. (2018). Image registration to compensate for epi distortion in patients with brain tumors: An evaluation of tract-specific effects. Journal of Neuroimaging, 28(2), 173–182.
  3. Basser PJ, & Pierpaoli C (2011). Microstructural and physiological features of tissues elucidated by quantitative-diffusion-tensor mri. Journal of Magnetic Resonance, 213(2), 560–570.
  4. Billah T, Cetin Karayumak S, Bouix S, & Rathi Y (2019). Multi-site diffusion mri harmonization. Zenodo.
  5. Billot B, Greve DN, Puonti O, Thielscher A, Van Leemput K, Fischl B, Dalca AV, Iglesias JE, et al. (2023). Synthseg: Segmentation of brain mri scans of any contrast and resolution without retraining. Medical image analysis, 86, 102789.
  6. Cetin Karayumak S, Kubicki M, & Rathi Y (2018). Harmonizing diffusion mri data across magnetic field strengths. Medical Image Computing and Computer Assisted Intervention, 116–124.
  7. Ciritsis A, Boss A, & Rossi C (2018). Automated pixel-wise brain tissue segmentation of diffusion-weighted images via machine learning. NMR in Biomedicine, 31(7), e3931.
  8. Cloutman LL, & Lambon Ralph MA (2012). Connectivity-based structural and functional parcellation of the human cortex using diffusion imaging and tractography. Frontiers in Neuroanatomy, 6.
  9. Fischl B (2012). Freesurfer. Neuroimage, 62(2), 774–781.
  10. Gal Y, & Ghahramani Z (2016). Dropout as a bayesian approximation: Representing model uncertainty in deep learning. international conference on machine learning, 1050–1059.
  11. Gawlikowski J, Tassi CRN, Ali M, Lee J, Humt M, Feng J, Kruspe A, Triebel R, Jung P, Roscher R, et al. (2023). A survey of uncertainty in deep neural networks. Artificial Intelligence Review, 56(Suppl 1), 1513–1589.
  12. Glasser MF, Coalson TS, Robinson EC, Hacker CD, Harwell J, Yacoub E, Ugurbil K, Andersson J, Beckmann CF, Jenkinson M, et al. (2016). A multi-modal parcellation of human cerebral cortex. Nature, 536(7615), 171–178.
  13. Glasser MF, Sotiropoulos SN, Wilson JA, Coalson TS, Fischl B, Andersson JL, Xu J, Jbabdi S, Webster M, Polimeni JR, et al. (2013). The minimal preprocessing pipelines for the human connectome project. Neuroimage, 80, 105–124.
  14. Guevara M, Guevara P, Román C, & Mangin J-F (2020). Superficial white matter: A review on the dmri analysis methods and applications. NeuroImage, 212, 116673.
  15. Gui J, Chen T, Zhang J, Cao Q, Sun Z, Luo H, & Tao D (2024). A survey on self-supervised learning: Algorithms, applications, and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  16. Hatamizadeh A, Nath V, Tang Y, Yang D, Roth HR, & Xu D (2021). Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images. International MICCAI brainlesion workshop, 272–284.
  17. Hayat M, & Aramvith S (2024). Transformer’s role in brain mri: A scoping review. IEEE Access.
  18. Henschel L, Conjeti S, Estrada S, Diers K, Fischl B, & Reuter M (2020). Fastsurfer-a fast and accurate deep learning based neuroimaging pipeline. NeuroImage, 219, 117012.
  19. Huang S-C, Pareek A, Jensen M, Lungren MP, Yeung S, & Chaudhari AS (2023). Self-supervised learning for medical image classification: A systematic review and implementation guidelines. NPJ Digital Medicine, 6(1), 74.
  20. Isensee F, Jaeger PF, Kohl SA, Petersen J, & Maier-Hein KH (2021). Nnu-net: A self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203–211.
  21. Ji JL, Spronk M, Kulkarni K, Repovš G, Anticevic A, & Cole MW (2019). Mapping the human brain’s cortical-subcortical functional network organization. NeuroImage, 185, 35–57.
  22. Johansen-Berg H, & Behrens TE (2013). Diffusion mri: From quantitative measurement to in vivo neuroanatomy. Academic Press.
  23. Jones DK, & Cercignani M (2010). Twenty-five pitfalls in the analysis of diffusion mri data. NMR in Biomedicine, 23(7), 803–820.
  24. Jøsang A (2018). Subjective logic: A formalism for reasoning under uncertainty. Springer Publishing Company, Incorporated.
  25. Kang J-S, Kang J, Kim J-J, Jeon K-W, Chung H-J, & Park B-H (2023). Neural architecture search survey: A computer vision perspective. Sensors, 23(3), 1713.
  26. Karayumak SC, Bouix S, Ning L, James A, Crow T, Shenton M, Kubicki M, & Rathi Y (2019). Retrospective harmonization of multi-site diffusion mri data acquired with different acquisition parameters. Neuroimage, 184, 180–200.
  27. Kendall A, & Gal Y (2017). What uncertainties do we need in bayesian deep learning for computer vision? Advances in neural information processing systems, 30.
  28. Kingma DP (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  29. Lakshminarayanan B, Pritzel A, & Blundell C (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in neural information processing systems, 30.
  30. Le Bihan D (2003). Looking into the functional architecture of the brain with diffusion mri. Nature reviews neuroscience, 4(6), 469–480.
  31. Lella E, Pazienza A, Lofu D, Anglani R, & Vitulano F (2021). An ensemble learning approach based on diffusion tensor imaging measures for alzheimer’s disease classification. Electronics, 10(3), 249.
  32. Li H, Nan Y, Del Ser J, & Yang G (2023). Region-based evidential deep learning to quantify uncertainty and improve robustness of brain tumor segmentation. Neural Computing and Applications, 35(30), 22071–22085.
  33. Li Y, Zhang W, Wu Y, Yin L, Zhu C, Chen Y, Cetin-Karayumak S, Cho KIK, Zekelman LR, Rushmore J, et al. (2024). A diffusion mri tractography atlas for concurrent white matter mapping across eastern and western populations. Scientific Data, 11(1), 787.
  34. Li Z, Zhang C, Zhang Y, Wang X, Ma X, Zhang H, & Wu S (2023). Can: Context-assisted full attention network for brain tissue segmentation. Medical Image Analysis, 85, 102710.
  35. Liu T, Li H, Wong K, Tarokh A, Guo L, & Wong ST (2007). Brain tissue segmentation based on dti data. NeuroImage, 38(1), 114–123.
  36. Luo J, Sedghi A, Popuri K, Cobzas D, Zhang M, Preiswerk F, Toews M, Golby A, Sugiyama M, Wells WM, et al. (2019). On the applicability of registration uncertainty. Medical Image Computing and Computer Assisted Intervention, 410–419.
  37. Malinsky M, Peter R, Hodneland E, Lundervold AJ, Lundervold A, & Jan J (2013). Registration of fa and t1-weighted mri data of healthy human brain based on template matching and normalized cross-correlation. Journal of digital imaging, 26, 774–785.
  38. Mirzaalian H, Ning L, Savadjiev P, Pasternak O, Bouix S, Michailovich O, Grant G, Marx CE, Morey RA, Flashman LA, et al. (2016). Inter-site and inter-scanner diffusion mri data harmonization. NeuroImage, 135, 311–323.
  39. Ning L, Bonet-Carne E, Grussu F, Sepehrband F, Kaden E, Veraart J, Blumberg SB, Khoo CS, Palombo M, Kokkinos I, et al. (2020). Cross-scanner and cross-protocol multi-shell diffusion mri data harmonization: Algorithms and results. Neuroimage, 221, 117128.
  40. Norton I, Essayed WI, Zhang F, Pujol S, Yarmarkovich A, Golby AJ, Kindlmann G, Wassermann D, Estepar RSJ, Rathi Y, et al. (2017). Slicerdmri: Open source diffusion mri software for brain cancer research. Cancer research, 77(21), e101–e103.
  41. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, et al. (2019). Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32.
  42. Pierpaoli C, & Basser PJ (1996). Toward a quantitative assessment of diffusion anisotropy. Magnetic resonance in Medicine, 36(6), 893–906.
  43. Poldrack RA, Congdon E, Triplett W, Gorgolewski K, Karlsgodt K, Mumford J, Sabb F, Freimer N, London E, Cannon T, et al. (2016). A phenome-wide examination of neural and cognitive function. Scientific data, 3(1), 1–12.
  44. Qin S, Zhang Z, Jiang Y, Cui S, Cheng S, & Li Z (2023). Ng-nas: Node growth neural architecture search for 3d medical image segmentation. Computerized Medical Imaging and Graphics, 108, 102268.
  45. Seitz J, Rathi Y, Lyall A, Pasternak O, Del Re EC, Niznikiewicz M, Nestor P, Seidman LJ, Petryshen TL, Mesholam-Gately RI, et al. (2018). Alteration of gray matter microstructure in schizophrenia. Brain imaging and behavior, 12, 54–63.
  46. Sensoy M, Kaplan L, & Kandemir M (2018). Evidential deep learning to quantify classification uncertainty. Advances in neural information processing systems, 31.
  47. Shi F, Yap P-T, Gao W, Lin W, Gilmore JH, & Shen D (2012). Altered structural connectivity in neonates at genetic risk for schizophrenia: A combined study using morphological and white matter networks. Neuroimage, 62(3), 1622–1633.
  48. Sporns O, Tononi G, & Kotter R (2005). The human connectome: A structural description of the human brain. PLoS computational biology, 1(4), e42.
  49. Springenberg JT, Klein A, Falkner S, & Hutter F (2016). Bayesian optimization with robust bayesian neural networks. Advances in neural information processing systems, 29.
  50. Sudre CH, Li W, Vercauteren T, Ourselin S, & Jorge Cardoso M (2017). Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 240–248.
  51. Tax CM, Grussu F, Kaden E, Ning L, Rudrapatna U, Evans CJ, St-Jean S, Leemans A, Koppers S, Merhof D, et al. (2019). Cross-scanner and cross-protocol diffusion mri data harmonisation: A benchmark database and evaluation of algorithms. NeuroImage, 195, 285–299.
  52. Theaud G, Edde M, Dumont M, Zotti C, Zucchelli M, Deslauriers-Gauthier S, Deriche R, Jodoin P-M, & Descoteaux M (2022). Doris: A diffusion mri-based 10 tissue class deep learning segmentation algorithm tailored to improve anatomically-constrained tractography. Frontiers in Neuroimaging, 1, 917806.
  53. Vollmar C, O’Muircheartaigh J, Barker GJ, Symms MR, Thompson P, Kumari V, Duncan JS, Richardson MP, & Koepp MJ (2010). Identical, but not the same: Intra-site and inter-site reproducibility of fractional anisotropy measures on two 3.0 t scanners. Neuroimage, 51(4), 1384–1394.
  54. Wassermann D, Makris N, Rathi MK, Shenton Y.and, Kubicki R,M, & Westin CF. (2016). The white matter query language: A novel approach for describing human white matter anatomy. Brain structure & function, 221(9), 4705–4721. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Wee C-Y, Yap P-T, Li W, Denny K, Browndyke JN, Potter GG, Welsh-Bohmer KA, Wang L, & Shen D (2011). Enriched white matter connectivity networks for accurate identification of mci patients. Neuroimage, 54(3), 1812–1822. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Wen Y, He L, von Deneen KM, & Lu Y (2013). Brain tissue classification based on dti using an improved fuzzy c-means algorithm with spatial constraints. Magnetic Resonance Imaging, 31(9), 1623–1630. [DOI] [PubMed] [Google Scholar]
  57. Winzeck S, Mocking SJ, Bezerra R, Bouts MJ, McIntosh EC, Diwan I, Garg P, Chutinet A, Kimberly WT, Copen WA, et al. (2019). Ensemble of convolutional neural networks improves automated segmentation of acute ischemic lesions using multiparametric diffusion-weighted mri. American Journal of Neuroradiology, 40(6), 938–945. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Wu M, Chang L-C, Walker L, Lemaitre H, Barnett AS, Marenco S, & Pierpaoli C (2008). Comparison of epi distortion correction methods in diffusion tensor mri using a novel framework. Medical Image Computing and Computer-Assisted Intervention, 321–329. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Yap P-T, Zhang Y, & Shen D (2015). Brain tissue segmentation based on diffusion mri using l0 sparse-group representation classification. Medical Image Computing and Computer-Assisted Intervention, 132–139. [PMC free article] [PubMed] [Google Scholar]
  60. Zhang F, Breger A, Cho KIK, Ning L, Westin C-F, O’Donnell LJ, & Pasternak O (2021). Deep learning based segmentation of brain tissue from diffusion mri. Neuroimage, 233, 117934. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Zhang F, Cho KIK, Seitz-Holland J, Ning L, Legarreta JH, Rathi Y, Westin C-F, O’Donnell LJ, & Pasternak O (2023). Ddparcel: Deep learning anatomical brain parcellation from diffusion mri. IEEE Transactions on Medical Imaging. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Zhang F, Daducci A, He Y, Schiavi S, Seguin C, Smith RE, Yeh C-H, Zhao T, & O’Donnell LJ (2022). Quantitative mapping of the brain’s structural connectivity using diffusion mri tractography: A review. NeuroImage, 249, 118870. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Zhang F, Noh T, Juvekar P, Frisken S, Rigolo L, Norton I, Kapur T, Pujol S, Wells W, Yarmarkovich A, Kindlmann G, Wassermann D, San Jose Estepar R, Rathi Y, Kikinis R, Johnson H, Westin C-F, Pieper S, Golby A, & O’Donnell L (2020). SlicerDMRI: Diffusion MRI and tractography research software for brain cancer surgery planning and visualization. JCO Clin. Can. Info, 4, 299–309. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Zhang F, Wu Y, Norton I, Rigolo L, Rathi Y, Makris N, & O’Donnell LJ (2018). An anatomically curated fiber clustering white matter atlas for consistent white matter tract parcellation across the lifespan. Neuroimage, 179, 429–447. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Zou K, Chen Z, Yuan X, Shen X, Wang M, & Fu H (2023). A review of uncertainty estimation and its application in medical imaging. Meta-Radiology, 100003. [Google Scholar]
  66. Zou K, Yuan X, Shen X, Wang M, & Fu H (2022). Tbrats: Trusted brain tumor segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, 503–513. [Google Scholar]

Data Availability Statement

The data used in this project include the public HCP (www.humanconnectome.org), CNP (https://openfmri.org/dataset/ds000030), and PPMI (http://www.ppmi-info.org) datasets. The raw imaging data of the WMH and BT datasets are not publicly available because releasing them would compromise participant confidentiality and privacy, but the derived diffusion MRI parameter maps will be made available upon request. The code and trained model will be made publicly available at: https://github.com/chenjun-li/DDEvENet.
