Author manuscript; available in PMC: 2020 May 5.

Published in final edited form as: J Magn Reson Imaging. 2019 May 2;51(1):175–182. doi: 10.1002/jmri.26766

TABLE 4.

P-values Comparing Subgroups Using Wilcoxon Rank Sum Test

A: Voxel-by-voxel detection accuracy using ROC statistics^*
Subgroups	AUC	Sensitivity	Specificity
G1 vs. G2	0.0131	0.0017	0.0496
G1 vs. G3	0.0024	0.0024	0.0038
G2 vs. G3	0.4282	0.6794	0.0421
B: Detection and segmentation accuracy at an optimal probability threshold^**
Subgroups	Dice	Recall	Precision
G1 vs. G2	1.0000	0.5816	0.2557
G1 vs. G3	0.0629	0.1131	0.2557
G2 vs. G3	0.0915	0.0230	0.8633
C: Lesion-by-lesion detection accuracy at an optimal probability threshold^**
Subgroups	Sensitivity	FP (no size limit)	FP (10 mm³ size limit)
G1 vs. G2	0.0069	0.0158	0.0829
G1 vs. G3	0.0002	0.0139	0.6352
G2 vs. G3	0.3952	0.7031	0.1178

G1 = subgroup having 1–3 metastases; G2 = subgroup having 4–10 metastases; G3 = subgroup having >10 metastases. Significant P-values are highlighted in bold. All P-values were measured using the Wilcoxon rank sum test.
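For readers unfamiliar with the test named above, the following is a minimal sketch of a two-sided Wilcoxon rank sum test. It assumes the normal approximation with a tie correction and no continuity correction. The source does not state which variant (exact or approximate) or which software was used, so values from this sketch need not match the table. The function name `rank_sum_test` and the sample values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (w, z, p), where w is the sum of the ranks of sample x.
    Illustrative sketch only; not the authors' implementation.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda t: t[0])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n + 1) / 2.0
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        # All observations are identical: no evidence of a difference.
        return w, 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return w, z, p


# Hypothetical per-patient metric values for two subgroups.
g_a = [1, 2, 3]
g_b = [4, 5, 6]
w, z, p = rank_sum_test(g_a, g_b)
print(w, round(z, 3), round(p, 4))
```

With samples this small an exact test would normally be preferred; the approximation is used here only to keep the sketch short.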

^* Sensitivity and specificity were determined using the maximum value of Youden’s index.

^** The metrics were estimated using an optimal probability threshold of 0.93, as determined from the development set.
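The footnote on panel A refers to Youden’s index, J = sensitivity + specificity − 1. The operating point is the threshold on the ROC curve at which J is largest. The following is a minimal sketch of that selection, assuming binary labels (1 = metastasis voxel, 0 = background) and per-voxel probabilities. The function name `youden_optimal` and the toy data are hypothetical and are not taken from the study.

```python
def youden_optimal(labels, scores):
    """Return (J, threshold, sensitivity, specificity) maximizing Youden's J.

    A sample is called positive when its score is >= threshold.
    Illustrative sketch only; not the authors' implementation.
    """
    pos = sum(1 for label in labels if label == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    best = None
    # Every distinct score is a candidate threshold.
    for thr in sorted(set(scores)):
        tp = sum(1 for label, s in zip(labels, scores)
                 if label == 1 and s >= thr)
        tn = sum(1 for label, s in zip(labels, scores)
                 if label == 0 and s < thr)
        sens = tp / pos
        spec = tn / neg
        j = sens + spec - 1
        # Strict comparison keeps the lowest threshold when J is tied.
        if best is None or j > best[0]:
            best = (j, thr, sens, spec)
    return best


# Hypothetical toy data: four background voxels, three metastasis voxels.
labels = [0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.8, 0.9]
j, thr, sens, spec = youden_optimal(labels, scores)
print(j, thr, sens, spec)  # 0.75 0.5 1.0 0.75
```

This criterion is separate from the fixed probability threshold of 0.93 used for panels B and C, which the footnote says was determined from the development set.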