Author manuscript; available in PMC: 2021 Dec 1.
Published in final edited form as: IEEE Trans Med Imaging. 2020 Nov 30;39(12):3868–3878. doi: 10.1109/TMI.2020.3006437

TABLE II.

Calibration quality and segmentation performance of baselines trained with cross-entropy loss (LCE), compared with models trained with Dice loss (LDSC) and with models calibrated by ensembling (EN, M=50) or MC dropout (MCDO). Boldface indicates the best result for each application (model) and that the difference is statistically significant.

Calibration Quality: NLL, Brier, ECE%. Segmentation Performance: Average Dice Score (95% CI) for Segments I*, II*, and III*.

| Application (Model) | NLL (95% CI) | Brier (95% CI) | ECE% (95% CI) | Segment I* | Segment II* | Segment III* |
|---|---|---|---|---|---|---|
| Brain (LCE) | 0.52 (0.16–1.66) | 0.23 (0.08–0.62) | 8.11 (1.54–26.23) | 0.37 (0.00–0.84) | 0.47 (0.07–0.82) | 0.58 (0.03–0.87) |
| Brain (MCDO LCE) | 0.81 (0.16–2.62) | 0.36 (0.08–0.92) | 13.41 (0.80–43.26) | 0.34 (0.00–0.81) | 0.34 (0.03–0.76) | 0.54 (0.02–0.86) |
| Brain (EN LCE) | 0.29 (0.11–0.71) | 0.15 (0.05–0.40) | 3.28 (0.52–10.06) | 0.49 (0.00–0.92) | 0.59 (0.11–0.86) | 0.68 (0.04–0.91) |
| Brain (LDSC) | 0.62 (0.17–2.70) | 0.23 (0.06–0.55) | 13.20 (2.60–33.55) | 0.45 (0.00–0.89) | 0.60 (0.10–0.90) | 0.67 (0.07–0.91) |
| Brain (MCDO LDSC) | 1.14 (0.28–4.04) | 0.18 (0.06–0.49) | 8.96 (2.41–23.87) | 0.43 (0.00–0.88) | 0.58 (0.08–0.89) | 0.64 (0.03–0.91) |
| Brain (EN LDSC) | 0.31 (0.16–0.78) | 0.14 (0.08–0.35) | 3.71 (0.94–15.27) | 0.51 (0.00–0.93) | 0.66 (0.11–0.91) | 0.74 (0.16–0.92) |
| Heart (LCE) | 0.36 (0.16–1.18) | 0.17 (0.09–0.41) | 5.75 (1.42–17.99) | 0.77 (0.17–0.91) | 0.73 (0.45–0.86) | 0.91 (0.63–0.97) |
| Heart (MCDO LCE) | 0.36 (0.17–1.10) | 0.17 (0.09–0.41) | 5.70 (1.39–17.93) | 0.78 (0.27–0.90) | 0.73 (0.47–0.86) | 0.92 (0.64–0.97) |
| Heart (EN LCE) | 0.23 (0.13–0.58) | 0.13 (0.07–0.30) | 2.51 (0.58–10.15) | 0.81 (0.18–0.93) | 0.77 (0.56–0.88) | 0.93 (0.79–0.97) |
| Heart (LDSC) | 0.62 (0.17–2.70) | 0.23 (0.06–0.55) | 13.20 (2.60–33.55) | 0.84 (0.14–0.96) | 0.81 (0.49–0.90) | 0.92 (0.64–0.97) |
| Heart (MCDO LDSC) | 0.41 (0.17–1.51) | 0.45 (0.11–0.81) | 36.79 (6.17–70.58) | 0.84 (0.12–0.96) | 0.78 (0.04–0.89) | 0.91 (0.61–0.97) |
| Heart (EN LDSC) | 0.31 (0.16–0.78) | 0.14 (0.08–0.35) | 3.71 (0.94–15.27) | 0.87 (0.12–0.96) | 0.83 (0.59–0.91) | 0.93 (0.71–0.98) |
| Prostate (LCE) | 0.40 (0.22–0.79) | 0.25 (0.13–0.47) | 8.08 (1.60–25.50) | 0.83 (0.62–0.91) | | |
| Prostate (MCDO LCE) | 0.30 (0.14–0.69) | 0.16 (0.08–0.30) | 5.23 (0.70–12.75) | 0.77 (0.49–0.89) | | |
| Prostate (EN LCE) | 0.16 (0.13–0.25) | 0.09 (0.06–0.16) | 4.12 (1.92–7.04) | 0.87 (0.68–0.92) | | |
| Prostate (LDSC) | 0.74 (0.31–1.60) | 0.11 (0.06–0.27) | 5.72 (3.20–12.57) | 0.88 (0.72–0.93) | | |
| Prostate (MCDO LDSC) | 0.48 (0.22–1.03) | 0.11 (0.07–0.25) | 5.23 (2.75–11.60) | 0.86 (0.67–0.93) | | |
| Prostate (EN LDSC) | 0.15 (0.07–0.25) | 0.07 (0.04–0.14) | 2.02 (0.48–3.89) | 0.90 (0.76–0.95) | | |

The calibration quality metrics presented here are calculated within bounding boxes. For whole-volume results, see Table I of the Supplementary Material.
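For readers unfamiliar with the three calibration columns, the following is a minimal Python sketch of how NLL, Brier score, and ECE% are commonly computed from per-voxel class probabilities, and how an ensemble prediction can be formed by averaging member outputs. It is not the authors' implementation: the function names, the 10-bin equal-width ECE, the multi-class Brier convention (squared error summed over classes), and the flattened (N, C) input layout are all assumptions made for illustration, and the cropping to bounding boxes is assumed to have happened before these functions are called.

```python
import numpy as np

def nll(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true class.
    probs: (N, C) predicted probabilities; labels: (N,) integer classes."""
    p_true = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))

def brier(probs, labels):
    """Mean over voxels of the squared error summed over classes
    (one common convention; the normalization is an assumption)."""
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

def ece_percent(probs, labels, n_bins=10):
    """Expected calibration error in percent, using equal-width
    confidence bins (the bin count of 10 is an assumption)."""
    conf = probs.max(axis=1)                      # confidence of the predicted class
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap            # weight by the fraction of voxels in the bin
    return float(100.0 * ece)

def ensemble_mean(member_probs):
    """Average the predictions of M models: (M, N, C) -> (N, C)."""
    return np.mean(member_probs, axis=0)
```

Lower values are better for all three metrics. Averaging member probabilities, as in `ensemble_mean`, is the usual way to form an ensemble prediction before scoring it with the same functions.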

A comparison of the Hausdorff distances of the different models is provided in Table II of the Supplementary Material.

* For the brain application, segments I, II, and III correspond to non-enhancing tumor, edema, and enhancing tumor, respectively. For the heart application, segments I, II, and III correspond to the right ventricle, the myocardium, and the left ventricle, respectively. For the prostate application, segment I corresponds to the prostate gland.
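
The per-segment Dice scores in the table can be illustrated with a short Python sketch. This is a generic Dice computation on integer label maps, not the authors' evaluation code: the function name `dice_score`, the integer label encoding, and the choice to return NaN when a segment is absent from both prediction and ground truth are assumptions for illustration.

```python
import numpy as np

def dice_score(pred, truth, label):
    """Dice overlap for one segment label between two integer label maps
    of the same shape: 2 * |P and T| / (|P| + |T|)."""
    p = (pred == label)
    t = (truth == label)
    denom = p.sum() + t.sum()
    if denom == 0:
        # Segment absent from both maps; returning NaN is an assumed convention.
        return float("nan")
    return float(2.0 * np.logical_and(p, t).sum() / denom)

# Hypothetical 2D label maps: 0 = background, 1 and 2 = two segments.
pred = np.array([[0, 1, 1], [2, 2, 0]])
truth = np.array([[0, 1, 0], [2, 2, 2]])
scores = {label: dice_score(pred, truth, label) for label in (1, 2)}
```

Averaging such per-case scores over a test set gives one value per segment, which is the quantity the segmentation columns summarize.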