Author manuscript; available in PMC: 2025 Mar 1.
Published in final edited form as: Med Phys. 2023 Sep 11;51(3):1931–1943. doi: 10.1002/mp.16695

Quantifying U-Net Uncertainty in Multi-Parametric MRI-based Glioma Segmentation by Spherical Image Projection

Zhenyu Yang 1,2,3, Kyle Lafata 1,4,5, Eugene Vaios 1, Zongsheng Hu 6,7, Trey Mullikin 1, Fang-Fang Yin 1,2, Chunhao Wang 1,*
PMCID: PMC10925552  NIHMSID: NIHMS1929162  PMID: 37696029

Abstract

Background:

Uncertainty quantification in deep learning is an important research topic. For medical image segmentation, the uncertainty measurements are usually reported as the likelihood that each pixel belongs to the predicted segmentation region. In potential clinical applications, the uncertainty result reflects the algorithm’s robustness and supports the confidence and trust of the segmentation result when the ground-truth result is absent. For commonly studied deep learning models, novel methods for quantifying segmentation uncertainty are in demand.

Purpose:

To develop a U-Net segmentation uncertainty quantification method based on spherical image projection of multi-parametric MRI (MP-MRI) in glioma segmentation.

Methods:

The projection of planar MRI data onto a spherical surface is equivalent to a nonlinear image transformation that retains global anatomical information. By incorporating this image transformation process in our proposed spherical projection-based U-Net (SPU-Net) segmentation model design, multiple independent segmentation predictions can be obtained from a single MRI. The final segmentation is the average of all available results, and the variation can be visualized as a pixel-wise uncertainty map. An uncertainty score was introduced to evaluate and compare the performance of uncertainty measurements.

The proposed SPU-Net model was implemented on the basis of 369 glioma patients with MP-MRI scans (T1, T1-Ce, T2, and FLAIR). Three SPU-Net models were trained to segment enhancing tumor (ET), tumor core (TC), and whole tumor (WT), respectively. The SPU-Net model was compared with (1) the classic U-Net model with test-time augmentation (TTA) and (2) linear scaling-based U-Net (LSU-Net) segmentation models in terms of both segmentation accuracy (Dice coefficient, sensitivity, specificity, and accuracy) and segmentation uncertainty (uncertainty map and uncertainty score).

Results:

The developed SPU-Net model successfully achieved low uncertainty for correct segmentation predictions (e.g., tumor interior or healthy tissue interior) and high uncertainty for incorrect results (e.g., tumor boundaries). This model could allow the identification of missed tumor targets or segmentation errors in U-Net. Quantitatively, the SPU-Net model achieved the highest uncertainty scores for three segmentation targets (ET/TC/WT): 0.826/0.848/0.936, compared to 0.784/0.643/0.872 using the U-Net with TTA and 0.743/0.702/0.876 with the LSU-Net (scaling factor = 2). The SPU-Net also achieved statistically significantly higher Dice coefficients, underscoring the improved segmentation accuracy.

Conclusion:

The SPU-Net model offers a powerful tool to quantify glioma segmentation uncertainty while improving segmentation accuracy. The proposed method can be generalized to other medical image-related deep-learning applications for uncertainty evaluation.

Keywords: Segmentation uncertainty, spherical projection, deep learning, glioma segmentation

1. Introduction

Automatic image segmentation is a key research topic in medical imaging analysis1-3. Driven by recent developments in algorithms and increased computational power, deep learning has become the major vehicle for improved medical image segmentation2,4,5. When translating research and development into real-world clinical applications, the robustness of deep neural network (DNN) predictions must be studied before incorporating it into patient care6-8. Classic neural networks are limited by their inability to deliver reliable uncertainty estimation and suffer from over- or under-confidence7. Specifically, in medical image segmentation, current DNNs learn from training cases (i.e., paired image data and segmentation ground truths derived from manual delineation) and make segmentation predictions using test cases. However, pixel-wise uncertainty, defined as the likelihood that each pixel belongs in the target segmentation region, is unavailable. Without these uncertainty measurements, predictions may provide a false impression of certainty9. Additionally, DNNs are prone to overfitting due to frequently underpowered medical image datasets, thus underscoring the importance of uncertainty estimation for robustness assessment (i.e., model calibration) for deep learning models6,10,11.

To overcome these issues, researchers are actively working to understand and quantify uncertainty in DNN prediction12. In general, solutions fall under two approaches: measuring DNN model-related uncertainty (i.e., epistemic uncertainty) and measuring data-related uncertainty (i.e., aleatoric uncertainty)13. A representative technique for model-related uncertainty evaluation uses Bayesian neural networks12,14, which replace each single-valued weight in the DNN with a probability distribution to produce a probabilistic prediction. The variation in model parameters can be translated to the variation in segmentation predictions, yielding an error margin for each pixel. Monte Carlo dropout15,16 represents an alternative technique. Here, based on the traditional DNN architecture, the prediction uncertainty is estimated by repeating the segmentation prediction multiple times with a certain number of neurons randomly switched off (i.e., dropped out) in each iteration. Neither of these estimation techniques has been well investigated in medical image segmentation due to their complexity and computational costs. In contrast, data-related uncertainty estimation has been more extensively investigated. Medical image data sources are often inhomogeneous (e.g., high intra- and inter-observer variation, image noise effect, and image acquisition protocol variation)17 with significant real-world data variability (e.g., unknown/rare data entries18,19). These limitations have implications for DNN prediction uncertainties20. Test-time augmentation21,22 is commonly used to determine image segmentation uncertainty by evaluating different augmentation combinations (e.g., rotation, scaling, flipping, adding noise). However, a standard data augmentation strategy for uncertainty evaluation has yet to be established, and this trial-and-error method frequently suffers from poor efficiency.

Given these limitations with previously investigated methodologies, we aimed to develop a novel method to quantify data-related uncertainty of U-Net23 using multi-parametric MRI (MP-MRI)-based brain glioma segmentation. Our approach was inspired by the image processing techniques used in spherical cameras. The planar MRI images were projected onto a pre-defined spherical surface with multiple projection centers. Our hypothesis was that the variation in segmentation results using these projections with different projection centers could reflect the image-related segmentation uncertainty. We utilized this approach as an equivalence to nonlinear image transformation and combined it with U-Net to quantify the uncertainty of glioma segmentation. The effectiveness of this method was evaluated through the comparison studies presented in this work.

2. Materials and Methods

A. Image Data

The Brain Tumor Segmentation (BraTS) Challenge 2020 dataset was employed in this work24. This dataset includes 369 subjects with either low-grade glioma or glioblastoma. Each subject has four standard MR sequences as an MP-MRI protocol: Fluid Attenuated Inversion Recovery (FLAIR), T1-weighted (T1), contrast-enhanced T1-weighted (T1-Ce), and T2-weighted (T2) sequences. The ground-truth (GT) tumor segmentation, contoured by experienced neurosurgeons, includes three overlapping tumor targets as binarized masks: the enhancing tumor (ET), the tumor core (TC), and the whole tumor (WT). Figure 1 illustrates an example of 4 MR sequences with the corresponding ground-truth segmentations. The BraTS challenge 2020 dataset also includes the following pre-processing steps: (1) co-registration to the same anatomical template, (2) interpolation to the same resolution 1 × 1 × 1 mm, and (3) skull-stripping. To facilitate image projection operations, we also extended the in-plane matrix size of the BraTS image from 240 × 240 to 256 × 256 by padding zeros around the edges.
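The zero-padding step from 240 × 240 to 256 × 256 can be illustrated with a minimal NumPy sketch. The function name `pad_to_square` is hypothetical, and splitting the padding evenly between opposite edges (8 pixels per side) is an assumption; the text only states that zeros were padded around the edges.

```python
import numpy as np

def pad_to_square(slice_2d, target=256):
    """Zero-pad a 2D slice to target x target (assumed symmetric padding)."""
    h, w = slice_2d.shape
    pad_h, pad_w = target - h, target - w
    top, left = pad_h // 2, pad_w // 2
    # Remaining padding goes to the bottom/right so the output is exactly target x target.
    return np.pad(
        slice_2d,
        ((top, pad_h - top), (left, pad_w - left)),
        mode="constant",
        constant_values=0,
    )

# A stand-in 240 x 240 slice filled with ones.
padded = pad_to_square(np.ones((240, 240), dtype=np.float32))
```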

Figure 1.

An example of 4 MR images (i.e., FLAIR, T1, T1-Ce, T2) and the corresponding ground-truth segmentations (i.e., ET, TC, and WT) from the BraTS challenge 2020 dataset.

B. Segmentation Model Design

B.1. Spherical Projection

Inspired by spherical camera image processing25, we projected the planar images onto a pre-defined spherical surface as part of an image processing step. As illustrated in Figure 2(A), the spherical image projection causes an inhomogeneous scaling over each sub-region of the original image: local image details near the image center are magnified while preserving the field-of-view (FOV) that renders global anatomy26. We hypothesize that the spherically projected MR images could be used to quantify segmentation uncertainty. Segmentation variation arising from multiple spherical projections derived from an original image reflects internal uncertainty in the segmentation results.

Figure 2.

(A) Schematic diagram of a spherical image projection. (B) Forward projection: projecting a planar image (green) onto a spherical surface as a spherical image (blue). (C) Backward projection: projecting a spherical image (blue) onto a Cartesian plane as a planar image (green).

Mathematically, we refer to the projection from a planar image to a spherical surface as the forward projection. Assume the sphere in Figure 2(A) has a center O and radius r, and P is the center of the planar image. The lateral view of the forward projection is shown in Figure 2(B). The planar image is placed perpendicular to line OP at projection distance d. The spherical image can be obtained by projecting each pixel within the planar image onto the pre-defined sphere along its radius. The lateral view of the backward projection, i.e., the inverse operation of the forward projection, is shown in Figure 2(C). Given a planar image with a size of h×h, the relationship of h, d, and r jointly determines the projection geometry. Without loss of generality, we let the sphere be a unit sphere (r=1) and set h=0.5. As such, the projection geometry, for both forward projection and backward projection, is governed solely by the projection distance d.
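The point-wise geometry described above can be sketched as follows. This is an illustration under stated assumptions, not the in-house projection algorithm: the sphere center O is placed at the origin, the plane lies at z = d, and the helper names (`plane_points`, `forward_project`, `backward_project`) are hypothetical. The sketch covers only the coordinate mapping; how the spherical image is resampled onto a pixel grid (the interpolation step) is not specified here.

```python
import numpy as np

def plane_points(h=0.5, d=0.3, n=256):
    """3D coordinates of the n x n pixel centers of an h x h plane at distance d from O."""
    coords = (np.arange(n) + 0.5) / n * h - h / 2.0
    x, y = np.meshgrid(coords, coords, indexing="xy")
    return np.stack([x, y, np.full_like(x, d)], axis=-1)

def forward_project(points, r=1.0):
    """Move each plane point along its ray from O until it lies on the sphere of radius r."""
    return r * points / np.linalg.norm(points, axis=-1, keepdims=True)

def backward_project(sphere_points, d=0.3):
    """Inverse mapping: rescale each sphere point along its ray back to the plane z = d."""
    return sphere_points * (d / sphere_points[..., 2:3])

plane = plane_points()
sphere = forward_project(plane)
restored = backward_project(sphere)
```

In exact arithmetic the two mappings are inverses of each other; the image quality loss discussed in the text comes from interpolation on a finite pixel grid, which this sketch does not model.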

To quantify the locoregional scaling effect of the spherical projection, a 16 × 16 lattice, as shown in Figure 2(A), was created with image dimensions matching the BraTS dataset (i.e., 256 × 256 matrix size). This lattice image I was first forward-projected as I' and then backward-projected as I". The locoregional scaling factor fSP was defined as the area ratio of each square tile in I' to the corresponding tile in I:

$$f_{SP} = \frac{\text{area of each square tile in } I'}{\text{area of each square tile in } I \; (= 16 \times 16)} \tag{1}$$
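As a worked example of Equation (1), the area ratio of a single tile can be computed from the tile's corner coordinates. This is a sketch under an assumption: the projected tile is approximated as a polygon with straight edges (shoelace formula), whereas a projected tile generally has curved edges. The function names are hypothetical.

```python
import numpy as np

def polygon_area(vertices):
    """Area of a simple polygon from its ordered (x, y) vertices (shoelace formula)."""
    v = np.asarray(vertices, dtype=float)
    x, y = v[:, 0], v[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def scaling_factor(projected_tile_vertices, tile_side=16):
    """Equation (1): projected tile area divided by the original 16 x 16 tile area."""
    return polygon_area(projected_tile_vertices) / (tile_side * tile_side)

# A tile whose sides grow from 16 to 24 pixels is magnified 1.5x per side,
# i.e. 576 / 256 = 2.25x in area.
example = scaling_factor([(0, 0), (24, 0), (24, 24), (0, 24)])
```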

Implementing spherical projection on images with a finite resolution requires interpolation calculations; consequently, the paired forward and backward projection operations degrade the quality of I" (in reference to I). In this work, we employed the structural similarity index (SSIM) to determine the optimal projection distance setting: the SSIM was calculated within the central 8 × 8 lattice region of I and I", and projection distance d with the highest SSIM value (i.e., minimal image quality degradation) was adopted in the following MP-MRI processing.

B.2 Segmentation Model Design

The overall design of the spherical projection-based U-Net (SPU-Net) model is summarized in Figure 3. A U-Net DNN, an architecture that has been widely implemented for image segmentation tasks27,28, was first constructed with the encoding and decoding parts shown in Figure 3(B). The encoding part consists of repeated convolutional layers, each followed by a max pooling operation. This hierarchical operation encodes the image into multiple levels of feature representations. The decoding part includes repeated up-convolutional layers, and the resulting deep image features are concatenated with the corresponding features from the encoding part. This design projects discriminative features onto the image space to obtain a pixel-wise classification. A final 1 × 1 convolutional layer with a sigmoid operation produces the segmentation prediction as a probability distribution: each pixel value, in the range of 0 to 1, is the probability that the pixel belongs to the target segmentation. Because spherically projected images were processed as nonlinear image transforms within a Cartesian grid, we coded the SPU-Net architecture within the same grid to streamline its implementation.

Figure 3.

(A) The overall design of the proposed spherical projection-based U-Net (SPU-Net) model. Given a 2D MRI image X, the forward projection is performed to generate a set of spherical images XS, and the corresponding segmentation MS can be obtained from XS by U-Net. The backward projection is subsequently performed to project MS onto the Cartesian plane as M. Finally, the segmentation uncertainty U and binarized segmentation mask Z can be obtained by analyzing M. (B) U-Net architecture.

We denote the original input 4-channel MP-MRI as X (dimension = 256 × 256 × 4), with the center shown as a green dot. Given a projection distance d, a single forward projection causes an inhomogeneous scaling to each sub-region. To achieve a uniform transformation effect across X, multiple forward projections can be performed with different projection centers. As shown by the white dots in Figure 3(A), the projection centers are set to be evenly distributed across X, while zero padding is adopted to maintain the FOV size during the forward projection. Let $n \in N = \{1, 2, 3, \ldots, K\}$, where K is the total number of available projection centers. The SPU-Net model for a given MR image X can be summarized as follows:

  1. By performing the forward projection with K different image centers, a set of K 4-channel forward-projected images $X_S = \{X_S^n\}_{n \in N}$ can be obtained as a K×256×256×4 tensor.

  2. A set of K probability distribution maps $M_S = \{M_S^n\}_{n \in N}$ is generated as the output by U-Net using $X_S$ on the spherical surface.

  3. After backward projection, a set of K probability distribution maps $M = \{M^n\}_{n \in N}$ is obtained from $M_S$ on the Cartesian plane.
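The three steps above can be sketched as a small driver function that tracks the tensor shapes. This is only a structural illustration: `forward`, `unet`, and `backward` are hypothetical callables supplied by the caller, and the stand-ins used in the demo (identity projections and a sigmoid of the channel mean) are placeholders, not the projection algorithm or the trained network.

```python
import numpy as np

def spu_net_probabilities(x, centers, forward, unet, backward):
    """Return the K back-projected probability maps M for one 4-channel slice x."""
    # Step 1: K forward-projected copies of x, shape (K, H, W, 4).
    xs = np.stack([forward(x, c) for c in centers])
    # Step 2: one probability map per projected image, shape (K, H, W).
    ms = np.stack([unet(img) for img in xs])
    # Step 3: project each map back onto the Cartesian plane, shape (K, H, W).
    return np.stack([backward(m, c) for m, c in zip(ms, centers)])

# Demo with placeholder components on a small all-zero image.
demo_x = np.zeros((16, 16, 4))
demo_centers = [(4, 4), (4, 12), (12, 4)]
demo_m = spu_net_probabilities(
    demo_x,
    demo_centers,
    forward=lambda img, c: img,                                   # placeholder projection
    unet=lambda img: 1.0 / (1.0 + np.exp(-img.mean(axis=-1))),   # placeholder network
    backward=lambda m, c: m,                                      # placeholder projection
)
```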

For each pixel (i,j) within X, K independent segmentation predictions can be found within M, i.e., $M_{(i,j)} = \{M_{(i,j)}^{1}, M_{(i,j)}^{2}, M_{(i,j)}^{3}, \ldots, M_{(i,j)}^{K}\}$. To reach the final segmentation result as a binarized mask Z, a voting scheme is designed:

$$Z_{(i,j)} = \begin{cases} 1, & \text{if } \dfrac{\sum_{n=1}^{K} M_{(i,j)}^{n}}{K} > 0.5 \\ 0, & \text{otherwise} \end{cases}, \tag{2}$$

where Z(i,j) is the value of pixel (i,j) in Z. The uncertainty U can be quantified by the entropy of the segmentation predictions10. Suppose there are T unique values in M(i,j), and the frequency of the t-th unique value is $\hat{p}_{(i,j)}^{t}$. As such, the uncertainty can be approximated as:

$$U_{(i,j)} = -\sum_{t=1}^{T} \hat{p}_{(i,j)}^{t} \ln\left(\hat{p}_{(i,j)}^{t}\right) \tag{3}$$

where U(i,j) is the uncertainty at pixel (i,j) in U. Note that M(i,j) contains continuous numbers between 0 and 1. Here, we normalize and discretize M(i,j) to the range of 0-100 for the ease of entropy computation10,29.
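Equations (2) and (3) can be sketched together in NumPy. The function name `vote_and_entropy` is hypothetical, and discretizing by rounding each probability times 100 to the nearest integer is one assumed reading of "normalize and discretize to the range of 0-100".

```python
import numpy as np

def vote_and_entropy(m, levels=100):
    """m: array of shape (K, H, W) holding K probability maps in [0, 1].

    Returns the binarized mask Z (Equation 2) and the entropy map U (Equation 3).
    """
    k = m.shape[0]
    # Equation (2): average the K predictions per pixel and threshold at 0.5.
    z = (m.mean(axis=0) > 0.5).astype(np.uint8)
    # Assumed discretization: map [0, 1] to integer levels 0..levels.
    q = np.rint(m * levels).astype(np.int64)
    u = np.zeros(m.shape[1:], dtype=np.float64)
    for level in range(levels + 1):
        # Relative frequency of this discrete value among the K predictions.
        p = (q == level).sum(axis=0) / k
        nz = p > 0
        u[nz] -= p[nz] * np.log(p[nz])
    return z, u

# Three of four predictions say "tumor" at the single pixel of this toy example.
toy = np.array([1.0, 1.0, 1.0, 0.0]).reshape(4, 1, 1)
toy_z, toy_u = vote_and_entropy(toy)
```

When all K predictions agree at a pixel, only one discrete value occurs and the entropy is zero; disagreement spreads the frequencies over several values and raises the entropy.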

Based on the pixel-wise segmentation uncertainty result, a quantitative evaluation metric is needed to evaluate the overall image segmentation uncertainty. This metric is expected to be high when (1) U(i,j) is low in the pixel where the segmentation result Z(i,j) is correct (i.e., high confidence in correct results), and (2) U(i,j) is high in the pixel where Z(i,j) is incorrect (i.e., reasonable doubt in wrong results). In this work, we adopted the uncertainty score defined in the BraTS segmentation uncertainty challenge 2020 (QU-BraTS challenge 2020)29. Specifically, the obtained U is first linearly normalized to 0-100, and an uncertainty threshold τ is swept over 100 predetermined integers, $\tau = 1, 2, \ldots, 100$. At each threshold τ, all pixels with uncertainty value $U_{(i,j)} > \tau$ are marked as “uncertain”, and the associated segmentation results Z(i,j) are filtered out and not considered in the subsequent calculations. The remaining segmentation results in Z are compared with the corresponding ground truth based on the Dice similarity index. At each τ, the total numbers of true positive pixels and true negative pixels among the remaining results are also obtained as $TP_{\tau}$ and $TN_{\tau}$, respectively. The ratio of filtered true positive (FTP) pixels at threshold τ is defined as $\mathrm{FTP} = (TP_{100} - TP_{\tau}) / TP_{100}$, where $TP_{100}$ is the number of true positive pixels in the unfiltered Z (i.e., τ=100). The ratio of filtered true negative (FTN) pixels is defined in a similar manner. As such, three curves and their corresponding areas under the curve (AUC) can be obtained:

  1. AUC1: the area under the curve of the Dice similarity index versus τ. High AUC1 indicates that segmentation predictions are accurate in regions of low uncertainty (i.e., high confidence for the correct results).

  2. AUC2: the area under the curve of FTP versus τ. (1 – AUC2) penalizes the low confidence in the true positive predictions.

  3. AUC3: the area under the curve of FTN versus τ. (1 – AUC3) penalizes the low confidence in the true negative predictions.

Collectively, the final uncertainty score is defined as:

$$\text{Score} = \frac{\text{AUC}_1 + (1 - \text{AUC}_2) + (1 - \text{AUC}_3)}{3}. \tag{4}$$

The uncertainty score (1) rewards high confidence in correct predictions and low confidence in incorrect predictions, and (2) penalizes low confidence for pixels with correct predictions29.
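The score in Equation (4) can be sketched as below. Several details are assumptions made for illustration rather than the official challenge implementation: pixels are kept when their normalized uncertainty is at most τ, each AUC is approximated by the mean of its curve over the 100 thresholds (so each lies in [0, 1]), and the Dice index of an empty prediction against an empty ground truth is taken as 1.

```python
import numpy as np

def dice(pred, gt):
    """Dice index of two boolean arrays; empty-vs-empty is treated as 1 (assumption)."""
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * np.sum(pred & gt) / denom

def uncertainty_score(z, gt, u):
    """Equation (4) for a binarized mask z, ground truth gt, and uncertainty map u."""
    z, gt = np.asarray(z, bool), np.asarray(gt, bool)
    u = np.asarray(u, float)
    # Linearly normalize U to 0-100 (a constant map is mapped to all zeros).
    span = u.max() - u.min()
    u100 = np.zeros_like(u) if span == 0 else 100.0 * (u - u.min()) / span
    tp_all = np.sum(z & gt)      # TP_100: true positives with nothing filtered
    tn_all = np.sum(~z & ~gt)    # TN_100: true negatives with nothing filtered
    dice_curve, ftp_curve, ftn_curve = [], [], []
    for tau in range(1, 101):
        keep = u100 <= tau       # pixels above the threshold are filtered out
        zk, gk = z[keep], gt[keep]
        dice_curve.append(dice(zk, gk))
        ftp_curve.append((tp_all - np.sum(zk & gk)) / tp_all if tp_all else 0.0)
        ftn_curve.append((tn_all - np.sum(~zk & ~gk)) / tn_all if tn_all else 0.0)
    # Assumed AUC normalization: mean of each curve over the 100 thresholds.
    auc1, auc2, auc3 = np.mean(dice_curve), np.mean(ftp_curve), np.mean(ftn_curve)
    return float((auc1 + (1.0 - auc2) + (1.0 - auc3)) / 3.0)

# Toy case: one false positive at index 1.
toy_z = np.array([1, 1, 0, 0])
toy_gt = np.array([1, 0, 0, 0])
score_good = uncertainty_score(toy_z, toy_gt, np.array([0.0, 1.0, 0.0, 0.0]))  # doubt on the error
score_bad = uncertainty_score(toy_z, toy_gt, np.array([1.0, 0.0, 0.0, 0.0]))   # doubt on a correct pixel
```

In the toy case, placing the uncertainty on the erroneous pixel yields a higher score than placing it on a correctly segmented pixel, which is the behavior the metric is designed to reward.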

C. Comparison Study

In the adopted BraTS 2020 dataset with 369 subjects, axial 2D images from the four MR sequences were used as 4-channel 2D samples. The sample usage for training and independent test follows the 8:2 ratio in the patient assignment, and five-fold cross-validation within the training set was employed. In the proposed SPU-Net model shown in Figure 3, the projection geometry was determined by the optimal d selection in the study of B.1, and the projection centers in Figure 3(A) were spaced at 8-pixel intervals (hence K = 32 × 32 = 1024). While a single model can be used to simultaneously segment ET, TC, and WT, the inter-correlation between the three targets may limit uncertainty quantification. Therefore, we developed three independent SPU-Net models for ET, TC, and WT segmentation, respectively. During the training, the loss function was binary cross-entropy, and the Adam optimizer with an initial learning rate of 10^-3 was adopted. The segmentation accuracy was evaluated by the sensitivity, specificity, accuracy, and Dice similarity index. The segmentation uncertainty was evaluated by the uncertainty score in Equation (4).
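The count K = 1024 follows from placing one projection center every 8 pixels along each axis of the 256 × 256 grid (256 / 8 = 32 per axis). A minimal sketch is given below; the function name is hypothetical, and placing each center in the middle of its 8-pixel interval is an assumption, since the exact offsets are not stated.

```python
import numpy as np

def projection_centers(size=256, step=8):
    """Return a (K, 2) array of (row, col) centers on a regular grid."""
    ticks = np.arange(step // 2, size, step)  # assumed offset: 4, 12, ..., 252
    rows, cols = np.meshgrid(ticks, ticks, indexing="ij")
    return np.stack([rows.ravel(), cols.ravel()], axis=1)

centers = projection_centers()
```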

In the comparison study, two additional segmentation models were studied:

  1. Classic U-Net model with test-time augmentation (TTA). Specifically, the TTA protocol includes image rotation, flipping, scaling, and noise addition. The augmentation is repeated K=1024 times to match our SPU-Net model.

  2. Linear scaling-based U-Net (LSU-Net) model. Since our spherical projection operation is equivalent to an inhomogeneous image scaling, it is worth comparing the homogeneous linear upscaling effect with our spherical projection design. The model design is summarized in Figure 4: linear upscaling was employed to magnify the locoregional image content at K different image centers, and cropping was subsequently employed to maintain the input tensor dimension as K×256×256×4. The rest of the design followed the SPU-Net model, and the final segmentation result with uncertainty was evaluated following Equations (2)-(3). The linear scaling factor fLS was set close to fSP near the image center in the spherical projection design (as in Figure 2(A)).
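The linear upscaling and cropping used by LSU-Net can be sketched as a single resampling step. This is an illustration under assumptions: the magnification keeps the chosen center fixed in place, sampling uses nearest-neighbor interpolation, and the function name is hypothetical. The interpolation scheme and crop placement actually used are not stated in the text.

```python
import numpy as np

def linear_upscale_crop(image, center, factor):
    """Magnify `image` by `factor` about `center` = (row, col), keeping the input shape."""
    h, w = image.shape[:2]
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Inverse mapping: each output pixel reads the input at a point pulled toward the center.
    src_r = np.rint(center[0] + (rows - center[0]) / factor).astype(int)
    src_c = np.rint(center[1] + (cols - center[1]) / factor).astype(int)
    # Output pixels whose source falls outside the image stay zero.
    valid = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    out = np.zeros_like(image)
    out[valid] = image[src_r[valid], src_c[valid]]
    return out

grid = np.arange(64).reshape(8, 8)
zoomed = linear_upscale_crop(grid, center=(4, 4), factor=2)
```

Because the output keeps the input shape, content that is magnified past the image border is discarded, which corresponds to the cropping step; unlike the spherical projection, the peripheral field of view is lost.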

Figure 4.

The linear scaling-based U-Net (LSU-Net) model for comparison purposes. The workflow follows SPU-Net shown in Figure 3.

In these two models, the training settings, including training/test set assignment, loss function, and initial learning rate, were kept the same as the proposed SPU-Net model. The achieved segmentation accuracy (sensitivity, specificity, accuracy, and Dice similarity index) and uncertainty score (in Equation (4)) were compared by the Wilcoxon signed-rank test. The statistical significance level was set at 0.05.
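The four accuracy metrics named above follow the standard confusion-matrix definitions, sketched here for a pair of binary masks. The function name is hypothetical, and returning NaN for an undefined ratio (zero denominator) is an assumption.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Sensitivity, specificity, accuracy, and Dice index of two binary masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    tp = int(np.sum(pred & gt))
    tn = int(np.sum(~pred & ~gt))
    fp = int(np.sum(pred & ~gt))
    fn = int(np.sum(~pred & gt))

    def ratio(num, den):
        # Undefined ratios are reported as NaN (assumption).
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
    }

metrics = segmentation_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

For the paired comparison between models, per-case metric values from two models could be passed to a Wilcoxon signed-rank routine such as `scipy.stats.wilcoxon`; that call is not shown here to keep the sketch dependency-free.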

3. Results

Figure 5(A) shows the SSIM as a function of projection distance d. As shown, the paired forward and backward projections at d=0.3 produce minimal image quality degradation near the image center, and thus d=0.3 was adopted in our design. Figure 5(B) illustrates the created 16 × 16 lattice image I, its forward-projected image I', and the corresponding backward-projected image I" at d=0.3. Our in-house forward projection algorithm upscaled the lattice size near the image center while preserving the whole lattice structure. The backward projection successfully restored the overall lattice integrity from I'. The locoregional scaling effect of the spherical projection at d=0.3 was quantified in Figure 5(C), i.e., the area ratio of each square tile in I' and I. The factor fSP near the image center ranged from 2 to 3 (as marked by the red box); therefore, we studied linear upscaling factors fLS=2 and fLS=3 in the LSU-Net model for comparison purposes.

Figure 5.

(A) SSIM (between the central 8 × 8 lattices of I and I") as a function of the projection distance d. (B) The created 16 × 16 lattice image I, its forward-projected image I', and the corresponding backward-projected image I" at d=0.3. (C) The locoregional scaling effect of the spherical projection at d=0.3 (i.e., the area ratio of each square tile in I' and I at d=0.3).

Figure 6 shows an example of ET segmentation from the classic U-Net model, the LSU-Net model, and the proposed SPU-Net model. The original MRIs with ground truth segmentation are shown in the left panel. The binarized segmentation masks with pixel-wise segmentation uncertainty are rendered in the right panel. The segmentation mask from the SPU-Net model demonstrates the highest visual consistency with the ground truth, and the classic U-Net model shows good visual consistency with some discrepancies near the tumor rim. For both fLS=2 and fLS=3, the LSU-Net models show low sensitivity in the tumor region and falsely identify the contrast-enhanced blood vessels on the right side as tumors. Among the four models, the LSU-Net (fLS=3) model shows the largest visual disagreement with the ground truth. The SPU-Net model’s uncertainty is (1) low in the tumor interior and normal tissue interior, and (2) high on the segmentation mask’s boundary. The missing tumor region in the mask (as marked by the red arrows) is appropriately indicated in the uncertainty map U. These results highlight the potential segmentation variation in a way that follows human operators' perception, which is consistent with our expectations: minimal uncertainty in the correct segmentation region and high uncertainty in the incorrect segmentations. In contrast, the uncertainty from the LSU-Net model (for both fLS=2 and fLS=3) fails to render a definite pattern. Although blurred contours can be observed, the numerical values are insufficient to highlight the boundary. The segmentation uncertainty from the classic U-Net model is better than the LSU-Net model results, but several correct segmentation predictions (e.g., tumor interior) are marked with high uncertainty. Figures 7 and 8 illustrate the TC and WT segmentations, respectively. The superior SPU-Net results seen in Figure 6 are similarly observed for these targets.

Figure 6.

An example of ET segmentation from the classic U-Net model, the LSU-Net (fLS=2) model, the LSU-Net (fLS=3) model, and the SPU-Net model. The binarized segmentation mask and pixel-wise segmentation uncertainty are demonstrated for each model.

Figure 7.

An example of TC segmentation (binarized segmentation results and the corresponding segmentation uncertainty).

Figure 8.

An example of WT segmentation (binarized segmentation results and the corresponding segmentation uncertainty).

Figure 9(A)-(C) shows the curves of the Dice similarity index, FTP, and FTN as a function of 100-τ for three segmentation targets (i.e., ET, TC, and WT), respectively. The blue, yellow, red, and green curves correspond to the classic U-Net, LSU-Net (fLS=2), LSU-Net (fLS=3), and SPU-Net models, respectively. Each curve is represented as a shaded plot, where the solid line represents the mean value for all test cases and the shaded area indicates the standard deviation. As illustrated, the SPU-Net model has a higher Dice curve for all three segmentation targets compared to the other models, producing the highest mean AUC1. Dice curves of the LSU-Net (fLS=2) model are close to the classic U-Net model for three segmentation targets, while LSU-Net (fLS=3) yields the lowest results. Note that the Dice index of the SPU-Net model is very high even when τ is very small. This finding suggests that the SPU-Net model produces an accurate segmentation prediction with very high confidence (i.e., very low uncertainty). As τ increases (i.e., taking more “uncertain” pixels into consideration), the Dice curve has a decreasing slope for ET and TC segmentation and a steady slope for WT segmentation. These results suggest that segmentation prediction can be less accurate for those “uncertain” pixels, which is consistent with our expectations. For the FTP results, the SPU-Net model achieves the lowest mean AUC2, which indicates that true positive predictions are obtained with high confidence. Based on the observation of the FTP curve, all models show a similar shape with different AUC2 results. As τ approaches 100, the fewest true positive predictions are filtered out in our SPU-Net model for all segmentation targets. For the FTN results, all models achieved a very steady slope with a low mean AUC3, indicating that true negative predictions are determined with high confidence.

Figure 9.

(A)-(C) The curve of the Dice similarity index, FTP, and FTN as a function of 100-τ for ET, TC, and WT segmentation, respectively.

Table I summarizes the segmentation accuracy evaluations and uncertainty scores from five-fold cross-validation (mean ± standard deviation) from the four models. In terms of segmentation accuracy, the proposed SPU-Net model achieved the highest mean Dice similarity index for all three targets. In contrast, the LSU-Net model did not show a robust improvement compared to the classic U-Net model. With respect to uncertainty scores, the proposed SPU-Net model achieved a significantly improved score. Though the LSU-Net (fLS=2) model achieved higher scores across all three segmentations compared to LSU-Net (fLS=3), LSU-Net did not consistently outperform the classic U-Net model.

Table I.

Five-fold cross-validation segmentation results and uncertainty score (mean ± standard deviation) for three segmentation targets from the classic U-Net model, the LSU-Net model, and the SPU-Net model.

| Target | Model | Accuracy | Sensitivity | Specificity | Dice | Uncertainty Score |
|---|---|---|---|---|---|---|
| ET | U-Net (TTA) | 0.9757±0.1393 | 0.7977±0.1898 | 0.9785±0.1402 | 0.8031±0.1974* | 0.7837±0.1546* |
| ET | LSU-Net (fLS=2) | 0.9923±0.0512 | 0.7952±0.1654 | 0.9952±0.0514 | 0.8104±0.1667* | 0.7431±0.1586* |
| ET | LSU-Net (fLS=3) | 0.9906±0.0561 | 0.7554±0.1664 | 0.9944±0.0563 | 0.7807±0.1734* | 0.7170±0.1488* |
| ET | SPU-Net | 0.9874±0.0988 | 0.8862±0.1347 | 0.9887±0.0997 | 0.8820±0.1478 | 0.8262±0.1643 |
| TC | U-Net (TTA) | 0.8542±0.3241 | 0.7318±0.3081 | 0.8613±0.3419 | 0.6433±0.3380* | 0.6428±0.2614* |
| TC | LSU-Net (fLS=2) | 0.9422±0.1856 | 0.7027±0.2922 | 0.9546±0.1952 | 0.7068±0.2908* | 0.7015±0.2302* |
| TC | LSU-Net (fLS=3) | 0.9384±0.1833 | 0.6521±0.2949 | 0.9515±0.1973 | 0.6455±0.2899* | 0.6565±0.2174* |
| TC | SPU-Net | 0.9458±0.1975 | 0.8168±0.2280 | 0.9530±0.2036 | 0.7966±0.2514 | 0.8479±0.1487 |
| WT | U-Net (TTA) | 0.9892±0.0244 | 0.8838±0.1202 | 0.9958±0.0237 | 0.8821±0.0889* | 0.8715±0.0962* |
| WT | LSU-Net (fLS=2) | 0.9871±0.0111 | 0.8703±0.0995 | 0.9944±0.0069 | 0.8858±0.0849* | 0.8757±0.0637* |
| WT | LSU-Net (fLS=3) | 0.9811±0.0166 | 0.8268±0.1136 | 0.9910±0.0120 | 0.8413±0.1039* | 0.8362±0.0672* |
| WT | SPU-Net | 0.9928±0.0175 | 0.9303±0.0746 | 0.9905±0.0169 | 0.9365±0.0603 | 0.9359±0.0495 |

“*”: statistically significant result compared to the SPU-Net model.

4. Discussion

We developed a novel U-Net segmentation uncertainty quantification method using spherical projection for glioma segmentation using MP-MRI sequences obtained from a large cohort of patients in the BraTS database. A key innovation in this work is the ability to project planar MR images onto a spherical surface as part of a proposed U-Net segmentation workflow, which is equivalent to a nonlinear image transformation. In a group of projections with different centers across the entire FOV, fine structures are magnified with varying scales, resulting in non-linear transform effects. Multiple independent segmentation predictions can then be obtained by U-Net from a single MR image. Our hypothesis is that the variation in segmentation predictions resulting from spherical transformation may mirror the uncertainty present in the model, such that a high degree of consistency in the image predictions across multiple transformations corresponds to a high level of confidence. Pixel-wise segmentation uncertainty can be obtained and visualized as an anatomical image. The uncertainty estimation improves the interpretability of a binarized segmentation mask. Figure 6 depicts how this model accurately highlights missed tumor regions and potential segmentation errors with high uncertainty. In addition, our design is different from the attention mechanisms, which refer to special model designs that guide a model to concentrate on relevant portions of the input data30. Most research integrating deep learning with visual attention mechanisms in contemporary CNN models employs masking techniques to pinpoint critical channel-wise or spatial-wise image features via another layer, i.e., attention modules, which have trainable weights31. Our spherical projection technique instead manipulates the input image data and therefore differs from the attention mechanism.

Image segmentation accuracy and uncertainty quantification are two separate yet related topics: for an image segmentation model, the segmentation accuracy is the absolute difference between the model’s output and the ground-truth results when evaluating the model with an independent test set; on the other hand, the segmentation uncertainty aims to quantify the confidence of the model when generating output. Previous studies demonstrated that high uncertainty regions near tumor boundaries reflect intra- and inter-observer variation that occurs with manual delineation32. Figure 9 shows that the correct predictions (for both true positive and true negative results) can be obtained with high degrees of confidence. These findings, along with the uncertainty score, suggest that the SPU-Net model outperforms the classic U-Net and LSU-Net models with regard to uncertainty quantification. In the proposed SPU-Net model, each sub-region within the original MRI is transformed into a series of projected images with varied magnifications. The U-Net model predicts the segmentation (for a given sub-region) under diverse appearances, which may better reflect internal ambiguity in the segmentation results. In contrast, the TTA in the classic U-Net model applies a global transformation to the entire image without variations of locoregional image details, and its uncertainty results were inferior to those from the proposed SPU-Net model. While the linear upscaling design in the LSU-Net models may enhance the locoregional image details, it discards the peripheral image features outside the magnified region; consequently, the locoregional transformation, even after repetition for full FOV coverage, is insufficient for uncertainty quantification.

The BraTS 2020 challenge29,33 has been extensively studied, with numerous deep learning architectures reporting state-of-the-art segmentation accuracy. Despite its use of the classical U-Net structure with a smaller number of parameters, our proposed SPU-Net model achieved an improved or comparable Dice coefficient, demonstrating the effectiveness of our spherical projection technique. In terms of segmentation uncertainty, the QU-BraTS 2020 challenge29 benchmarked 15 different uncertainty quantification methods, with uncertainty scores ranging from 0.5828 to 0.8885, 0.5989 to 0.9135, and 0.6312 to 0.9429 for ET, TC, and WT segmentation, respectively. Our SPU-Net achieved uncertainty scores comparable to those of the top 5 benchmarked models in each segmentation task. Note, however, that the SPU-Net results and the aforementioned state-of-the-art results come from different independent test sets, so a direct comparison is not possible. Furthermore, our spherical projection technique can be easily integrated into other deep-learning segmentation models. As shown in Figure S-1 in the Supplementary Materials, two other models with the proposed spherical projection design were investigated, namely SPU-Net++ (i.e., U-Net++34 with spherical projection images) and SP-MH-UNet (i.e., MH-UNet35 with spherical projection images). We observed that both models with the proposed spherical projection design exhibited substantial increases in the uncertainty score, suggesting enhanced uncertainty quantification in all three segmentation tasks. Additionally, both models achieved improved or similar segmentation accuracies (measured by the Dice coefficient) compared to their original versions. These results suggest that our proposed spherical projection method can serve as a model-independent technique for quantifying segmentation uncertainty, while also potentially improving segmentation accuracy.

In general, DNN models fundamentally rely on sufficient, homogeneous, annotated data36. Due to limitations of currently available image datasets, the development of an accurate segmentation tool for clinical application faces significant hurdles37. Better triaging of simple and complex cases between humans and computer algorithms could bridge gaps in current DNN models38: DNN models could be leveraged to analyze simple cases with high confidence and reliability, while difficult cases with high observable uncertainty should be referred to experienced radiologists. In this study, our SPU-Net model was shown to be a powerful tool for guiding the clinical review of segmentations with high ambiguity. By replacing the U-Net with other DNN designs (such as Bayesian neural networks or networks with Monte Carlo dropout), our SPU-Net model can be combined with other existing uncertainty quantification methods for uncertainty characterization. Furthermore, the proposed spherical projection-based image transformation can be generalized to other medical image-related deep learning applications for uncertainty evaluation (e.g., classification, regression, etc.).
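The triaging idea above can be illustrated with a short sketch. It is only a hypothetical example of how a pixel-wise uncertainty map might be reduced to a case-level decision; this study does not define such a rule. The function name `triage_case`, the use of the mean uncertainty inside the predicted region as the case-level score, and the `review_threshold` value of 0.2 are all assumptions made for illustration.

```python
import numpy as np

def triage_case(uncertainty_map, mask, review_threshold=0.2):
    """Return ('auto' or 'review', score) for one segmented case.

    Assumption: the case-level score is the mean pixel-wise uncertainty
    inside the predicted region (or over the whole image if the predicted
    region is empty). The threshold is hypothetical.
    """
    uncertainty_map = np.asarray(uncertainty_map, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    region = uncertainty_map[mask] if mask.any() else uncertainty_map.ravel()
    score = float(region.mean())
    decision = "review" if score > review_threshold else "auto"
    return decision, score

# Two toy cases sharing the same predicted mask (top row).
toy_mask = [[1, 1], [0, 0]]
confident = triage_case([[0.0, 0.05], [0.02, 0.0]], toy_mask)
ambiguous = triage_case([[0.4, 0.5], [0.1, 0.0]], toy_mask)
print(confident, ambiguous)
```

In practice, any such threshold would need to be calibrated against expert review on an independent dataset before being used to route cases.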

In addition to uncertainty quantification, the SPU-Net model improves glioma segmentation accuracy. Similar to data augmentation strategies in deep learning, the spherical projection provides a large data sample size with diverse appearances, which can be used to improve model robustness and generalizability. Previous studies reported that small tumors/organs with fine structures and complex boundaries are challenging for the classic U-Net39. This is evident in our results in Figure 6. One explanation for this limitation is that the U-Net adopts a fixed receptive field for its convolution kernels40. Typically, a large receptive field ignores small structures, whereas a small receptive field may extract redundant image features. Gliomas can manifest in a wide variety of sizes, shapes, and locations across the brain41, and a single convolutional kernel may thus be insufficient for every instance42,43. Previous research has shown that combining convolutional kernels of different sizes benefits segmentation. Our proposed spherical projection model addresses this challenge by magnifying the locoregional image content at different scales, effectively altering the receptive fields relative to the entire brain without modifying the deep neural network structure. By combining the results from multiple receptive fields, the final segmentation mask is more accurate, even for complex tumor boundaries, as illustrated in Figure 6. Another interesting finding is that the LSU-Net (fLS=3) model consistently underperforms compared to the LSU-Net (fLS=2) model. As fLS increases, more peripheral image content outside the upscaled region is lost; in contrast, the spherical projection preserves the global anatomical information within the FOV (as in Figure 2(A)), which improves segmentation accuracy.

As a feasibility study, the current spherical projection was designed and implemented using a pre-defined spherical surface. Given a projection distance d, the image after the forward or backward projection is fully determined. The optimal d was subsequently investigated by measuring the degradation in image quality following a paired forward and backward projection. Given that both the segmentation accuracy and uncertainty were evaluated in the original Cartesian plane, the results in Figure 5(A) suggest that the adopted d=0.3 is appropriate for the segmentation task. Although other d values can be specified to obtain different uncertainty results, the projection calculations may limit the segmentation accuracy. Additionally, the number of projections k can be arbitrarily selected. In this work, we spaced the projection centers at 8-pixel intervals, i.e., k=1024. Based on Figure 5(C), this choice magnifies each sub-region in the original MRI by a factor fSP greater than 2, which is expected to provide sufficient projections for uncertainty quantification while balancing the computational cost. Generalizing the proposed method to other applications may result in a different optimal k, and the deep neural network architecture may require further optimization to support the increased computational demands at higher k.
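The relationship between the center spacing and k can be checked with a short sketch. The in-plane grid size of 256 x 256 pixels is an assumption: this section does not state the image dimensions, and 256 is simply the size for which an 8-pixel spacing gives 32 x 32 = 1024 centers. The function name `projection_centers` and the placement of the first center at pixel 0 are likewise hypothetical.

```python
def projection_centers(height, width, spacing=8):
    """List (row, col) projection centers on a regular grid.

    Assumption: centers start at pixel 0 and repeat every `spacing` pixels.
    """
    return [
        (row, col)
        for row in range(0, height, spacing)
        for col in range(0, width, spacing)
    ]

# Assumed 256 x 256 grid: 32 centers per axis, 32 * 32 centers in total.
centers = projection_centers(256, 256, spacing=8)
k = len(centers)
print(k)  # 1024

# Halving the spacing quadruples the number of projections to evaluate.
k_dense = len(projection_centers(256, 256, spacing=4))
print(k_dense)  # 4096
```

Because k grows with the inverse square of the spacing, a denser grid of centers quickly increases the number of U-Net inferences per image, which is the computational trade-off noted above.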

5. Conclusion

In this work, we developed a segmentation uncertainty quantification method based on spherical projection for U-Net. Using a large database of multi-parametric MRI-based glioma segmentations, the developed technique achieved high segmentation accuracy and successfully highlighted missed tumor regions and potential segmentation errors. The presented methodology can be generalized to other medical image-related deep-learning applications for uncertainty evaluation.

Supplementary Material

Supinfo

Funding Statement:

This work is partially supported by NIH CA014236

Footnotes

Conflict of Interest: None associated with this work

References

  • 1. Pham DL, Xu C, Prince JL. A Survey of Current Methods in Medical Image Segmentation. Image Segmentation.:27.
  • 2. Henry T, Carre A, Lerousseau M, et al. Brain tumor segmentation with self-ensembled, deeply-supervised 3D U-net neural networks: a BraTS 2020 challenge solution. ArXiv201101045 Cs Eess. Published online November 27, 2020. Accessed April 27, 2022. http://arxiv.org/abs/2011.01045
  • 3. Işın A, Direkoğlu C, Şah M. Review of MRI-based Brain Tumor Image Segmentation Using Deep Learning Methods. Procedia Comput Sci. 2016;102:317–324. doi: 10.1016/j.procs.2016.09.407
  • 4. Hesamian MH, Jia W, He X, Kennedy P. Deep learning techniques for medical image segmentation: achievements and challenges. J Digit Imaging. 2019;32(4):582–596.
  • 5. Liu Z, Tong L, Chen L, et al. Deep learning based brain tumor segmentation: a survey. Complex Intell Syst. 2023;9(1):1001–1026.
  • 6. Abdar M, Pourpanah F, Hussain S, et al. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Inf Fusion. 2021;76:243–297.
  • 7. Gawlikowski J, Tassi CRN, Ali M, et al. A Survey of Uncertainty in Deep Neural Networks. Published online January 18, 2022. Accessed July 25, 2022. http://arxiv.org/abs/2107.03342
  • 8. Yang Z, Hu Z, Ji H, et al. A Neural Ordinary Differential Equation Model for Visualizing Deep Neural Network Behaviors in Multi-Parametric MRI based Glioma Segmentation. ArXiv Prepr ArXiv220300628. Published online 2022.
  • 9. Sanchez T, Caramiaux B, Thiel P, Mackay WE. Deep Learning Uncertainty in Machine Teaching. In: 27th International Conference on Intelligent User Interfaces. ACM; 2022:173–190. doi: 10.1145/3490099.3511117
  • 10. Ghoshal B, Tucker A, Sanghera B, Lup Wong W. Estimating uncertainty in deep learning for reporting confidence to clinicians in medical image segmentation and diseases detection. Comput Intell. 2021;37(2):701–734.
  • 11. Jungo A, Meier R, Ermis E, et al. On the effect of inter-observer variability for a reliable estimation of uncertainty of medical image segmentation. In: Springer; 2018:682–690.
  • 12. Kwon Y, Won JH, Kim BJ, Paik MC. Uncertainty quantification using Bayesian neural networks in classification: Application to biomedical image segmentation. Comput Stat Data Anal. 2020;142:106816.
  • 13. Der Kiureghian A, Ditlevsen O. Aleatory or epistemic? Does it matter? Struct Saf. 2009;31(2):105–112.
  • 14. Dechesne C, Lassalle P, Lefèvre S. Bayesian U-Net: Estimating Uncertainty in Semantic Segmentation of Earth Observation Images. Remote Sens. 2021;13(19):3836. doi: 10.3390/rs13193836
  • 15. Gal Y, Ghahramani Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: PMLR; 2016:1050–1059.
  • 16. Camarasa R, Bos D, Hendrikse J, et al. Quantitative comparison of monte-carlo dropout uncertainty measures for multi-class segmentation. In: Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and Graphs in Biomedical Image Analysis. Springer; 2020:32–41.
  • 17. Angelini ED, Clatz O, Mandonnet E, Konukoglu E, Capelle L, Duffau H. Glioma dynamics and computational models: a review of segmentation, registration, and in silico growth algorithms and their clinical applications. Curr Med Imaging. 2007;3(4):262–276.
  • 18. Mundt M, Pliushch I, Majumder S, Ramesh V. Open Set Recognition Through Deep Neural Network Uncertainty: Does Out-of-Distribution Detection Require Generative Classifiers? In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE; 2019:753–757. doi: 10.1109/ICCVW.2019.00098
  • 19. Liang S, Li Y, Srikant R. Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks. Published online August 30, 2020. Accessed July 25, 2022. http://arxiv.org/abs/1706.02690
  • 20. Shorten C, Khoshgoftaar TM. A survey on Image Data Augmentation for Deep Learning. J Big Data. 2019;6(1):60. doi: 10.1186/s40537-019-0197-0
  • 21. Wang G, Li W, Aertsen M, Deprest J, Ourselin S, Vercauteren T. Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks. Neurocomputing. 2019;338:34–45.
  • 22. Wang G, Li W, Ourselin S, Vercauteren T. Automatic brain tumor segmentation using convolutional neural networks with test-time augmentation. In: Springer; 2018:61–72.
  • 23. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: Springer; 2015:234–241.
  • 24. Menze BH, Jakab A, Bauer S, et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans Med Imaging. 2015;34(10):1993–2024. doi: 10.1109/TMI.2014.2377694
  • 25. Li S. Full-view spherical image camera. In: Vol 4. IEEE; 2006:386–390.
  • 26. Zhang C, He S, Liwicki S. A Spherical Approach to Planar Semantic Segmentation. In: ; 2020.
  • 27. Siddique N, Paheding S, Elkin CP, Devabhaktuni V. U-net and its variants for medical image segmentation: A review of theory and applications. IEEE Access. 2021;9:82031–82057.
  • 28. Du G, Cao X, Liang J, Chen X, Zhan Y. Medical image segmentation based on u-net: A review. J Imaging Sci Technol. 2020;64:1–12.
  • 29. Mehta R, Filos A, Baid U, et al. QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation -- Analysis of Ranking Metrics and Benchmarking Results. ArXiv211210074 Cs Eess. Published online December 19, 2021. Accessed April 28, 2022. http://arxiv.org/abs/2112.10074
  • 30. Vo T. Attention! Stay Focus! Published online April 16, 2021. Accessed September 14, 2022. http://arxiv.org/abs/2104.07925
  • 31. Oktay O, Schlemper J, Folgoc LL, et al. Attention U-Net: Learning Where to Look for the Pancreas. :10.
  • 32. Czolbe S, Arnavaz K, Krause O, Feragen A. Is segmentation uncertainty useful? In: Springer; 2021:715–726.
  • 33. Henry T, Carre A. Top 10 BraTS 2020 challenge solution: Brain tumor segmentation with self-ensembled, deeply-supervised 3D-Unet like neural networks. :13.
  • 34. Zhou Z, Rahman Siddiquee MM, Tajbakhsh N, Liang J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In: Stoyanov D, Taylor Z, Carneiro G, et al., eds. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Vol 11045. Lecture Notes in Computer Science. Springer International Publishing; 2018:3–11. doi: 10.1007/978-3-030-00889-5_1
  • 35. Ahmad P, Jin H, Alroobaea R, et al. MH UNet: A multi-scale hierarchical based architecture for medical image segmentation. IEEE Access. 2021;9:148384–148408.
  • 36. Ding Y, Liu J, Xu X, et al. Uncertainty-aware training of neural networks for selective medical image segmentation. In: PMLR; 2020:156–173.
  • 37. Joskowicz L, Cohen D, Caplan N, Sosna J. Inter-observer variability of manual contour delineation of structures in CT. Eur Radiol. 2019;29(3):1391–1399.
  • 38. Nair T, Precup D, Arnold DL, Arbel T. Exploring uncertainty measures in deep networks for multiple sclerosis lesion detection and segmentation. Med Image Anal. 2020;59:101557.
  • 39. Ibtehaz N, Rahman MS. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020;121:74–87.
  • 40. Luo W, Li Y, Urtasun R, Zemel R. Understanding the effective receptive field in deep convolutional neural networks. Adv Neural Inf Process Syst. 2016;29.
  • 41. Zhao X, Wu Y, Song G, Li Z, Zhang Y, Fan Y. A deep learning model integrating FCNNs and CRFs for brain tumor segmentation. Med Image Anal. 2018;43:98–111.
  • 42. Peng C, Zhang X, Yu G, Luo G, Sun J. Large kernel matters--improve semantic segmentation by global convolutional network. In: ; 2017:4353–4361.
  • 43. Shen X, Wang C, Li X, et al. Rf-net: An end-to-end image matching network based on receptive field. In: ; 2019:8132–8140.
