Author manuscript; available in PMC: 2020 May 5.
Published in final edited form as: J Magn Reson Imaging. 2019 May 2;51(1):175–182. doi: 10.1002/jmri.26766

TABLE 4.

P-values Comparing Subgroups Using Wilcoxon Rank Sum Test

A: Voxel-by-voxel detection accuracy using ROC statistics*
Subgroups    AUC      Sensitivity   Specificity
G1 vs. G2    0.0131   0.0017        0.0496
G1 vs. G3    0.0024   0.0024        0.0038
G2 vs. G3    0.4282   0.6794        0.0421

B: Detection and segmentation accuracy at an optimal probability threshold**
Subgroups    Dice     Recall        Precision
G1 vs. G2    1.0000   0.5816        0.2557
G1 vs. G3    0.0629   0.1131        0.2557
G2 vs. G3    0.0915   0.0230        0.8633

C: Lesion-by-lesion detection accuracy at an optimal probability threshold**
Subgroups    Sensitivity   FP (no size limit)   FP (10 mm³ size limit)
G1 vs. G2    0.0069        0.0158               0.0829
G1 vs. G3    0.0002        0.0139               0.6352
G2 vs. G3    0.3952        0.7031               0.1178

G1 = subgroup having 1–3 metastases; G2 = subgroup having 4–10 metastases; G3 = subgroup having >10 metastases. Significant P-values are highlighted in bold. All P-values were measured using the Wilcoxon rank sum test.
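As an illustration of how a pairwise P-value of this kind can be obtained, the sketch below implements a two-sided Wilcoxon rank sum test with the normal approximation and average ranks for ties. The table does not state which software or which variant (exact or approximate, with or without tie correction) the authors used, so the function names and the per-patient values here are hypothetical and serve only to show the mechanics.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, no tie correction).

    Returns (z, p). This is an assumed variant, not necessarily the one
    used to produce the table.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail probability
    return z, p


# Hypothetical per-patient sensitivities for two subgroups (not from the paper).
g1 = [0.55, 0.60, 0.62, 0.70, 0.71]
g2 = [0.80, 0.82, 0.85, 0.88, 0.90]
z, p = rank_sum_test(g1, g2)
print(f"z = {z:.3f}, p = {p:.4f}")
```

For small subgroups an exact test may be preferable to the normal approximation used here.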

* Sensitivity and specificity were determined using the maximum value of Youden’s index.

** The metrics were estimated using an optimal probability threshold of 0.93, as determined from the development set.
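The first footnote refers to Youden’s index, J = sensitivity + specificity − 1, maximized over candidate thresholds. A minimal sketch of that selection is shown below; the function name, the convention that a score at or above the threshold counts as positive, and the toy scores are assumptions made for illustration and are not taken from the paper.

```python
def youden_optimal_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    scores: predicted probabilities; labels: 1 for lesion, 0 for background.
    A score >= threshold is treated as a positive prediction (assumed convention).
    """
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < t and l == 0)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1                   # Youden's index at this threshold
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical voxel scores and ground-truth labels (not from the paper).
scores = [0.10, 0.20, 0.30, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 1, 1, 1]
threshold, sens, spec = youden_optimal_threshold(scores, labels)
print(threshold, sens, spec)
```

This brute-force scan is quadratic in the number of samples; for voxel-level data a sorted cumulative-count approach would be more practical.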