Author manuscript; available in PMC: 2023 Sep 27.

Published in final edited form as: Eur Radiol. 2023 Apr 12;33(9):6582–6591. doi: 10.1007/s00330-023-09583-3

Table 2.

Pair-wise comparison of segmentation model DSCs using bootstrap resampling

Models compared [DSC (95% CI)]		p value
Full image segmentation model comparisons
U-Net baseline [0.768 (0.753–0.781)]	U-Net after self-refinement [0.798 (0.784–0.810)]	< 0.001
Mask R-CNN baseline [0.831 (0.816–0.846)]	Mask R-CNN after self-refinement [0.871 (0.854–0.886)]	< 0.001
HRNet baseline [0.838 (0.823–0.854)]	HRNet after self-refinement [0.873 (0.858–0.889)]	< 0.001
Full image vs. Hybrid segmentation model comparison
HRNet after self-refinement [0.873 (0.858–0.889)]	Mask R-CNN hybrid [0.884 (0.868–0.899)]	< 0.001

Pair-wise comparison of segmentation model DSCs using the bootstrap resampling technique. All baseline models were trained using the Train_Otsu dataset, and each is compared with the peak model of the same architecture obtained from self-refinement.

The best-performing full image segmentation model (HRNet after self-refinement) is compared with the best-performing hybrid method (Mask R-CNN hybrid). The image patch segmentation model used in the hybrid Mask R-CNN method was trained with the Train_Final-patch dataset, which was derived from Train_Final.

For each comparison, the DSC of the better-performing model is in bold text. Bolded p values indicate p < 0.05.

CI, confidence interval; DSC, Dice similarity coefficient
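The table reports bootstrap-based confidence intervals and p values but does not spell out the resampling procedure. The following is a minimal sketch of one common way to make such a comparison, assuming per-image DSCs are available for both models on the same test images, a paired percentile bootstrap over images, and a two-sided p value taken from the sign of the resampled mean difference. The function names `dice`, `percentile`, and `paired_bootstrap`, the default of 2000 resamples, and the handling of empty masks are illustrative assumptions, not details taken from the study.

```python
import random


def dice(pred, truth):
    """Dice similarity coefficient of two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


def percentile(sorted_vals, q):
    """Percentile q (0-100) of an already sorted list, linearly interpolated."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def paired_bootstrap(dsc_a, dsc_b, n_boot=2000, seed=0):
    """Compare mean DSCs of two models scored on the same images.

    dsc_a[i] and dsc_b[i] must refer to the same test image. Images are
    resampled with replacement, and both models are evaluated on each
    resample so that the pairing is preserved.
    """
    n = len(dsc_a)
    if n == 0 or n != len(dsc_b):
        raise ValueError("need two non-empty DSC lists of equal length")
    rng = random.Random(seed)
    means_a, means_b, diffs = [], [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(dsc_a[i] for i in idx) / n
        mean_b = sum(dsc_b[i] for i in idx) / n
        means_a.append(mean_a)
        means_b.append(mean_b)
        diffs.append(mean_b - mean_a)
    # Two-sided p value: twice the smaller tail fraction around zero, capped at 1.
    frac_le = sum(1 for d in diffs if d <= 0) / n_boot
    frac_ge = sum(1 for d in diffs if d >= 0) / n_boot
    p_value = min(1.0, 2.0 * min(frac_le, frac_ge))
    means_a.sort()
    means_b.sort()
    diffs.sort()
    return {
        "ci_a": (percentile(means_a, 2.5), percentile(means_a, 97.5)),
        "ci_b": (percentile(means_b, 2.5), percentile(means_b, 97.5)),
        "ci_diff": (percentile(diffs, 2.5), percentile(diffs, 97.5)),
        "p_value": p_value,
    }
```

Under these assumptions, calling `paired_bootstrap(baseline_dscs, refined_dscs)` on two equal-length lists of per-image DSCs returns 95% percentile intervals for each model and for their difference. A returned p value of 0 only means that no resample crossed zero, so it would be reported as smaller than 1/`n_boot` (for example, < 0.001 with 1000 or more resamples) rather than as exactly zero.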