TABLE 4.
P-values Comparing Subgroups Using the Wilcoxon Rank Sum Test
A: Voxel-by-voxel detection accuracy using ROC statistics* | | | |
---|---|---|---|
Subgroups | AUC | Sensitivity | Specificity |
G1 vs. G2 | 0.0131 | 0.0017 | 0.0496 |
G1 vs. G3 | 0.0024 | 0.0024 | 0.0038 |
G2 vs. G3 | 0.4282 | 0.6794 | 0.0421 |
B: Detection and segmentation accuracy at an optimal probability threshold** | | | |
Subgroups | Dice | Recall | Precision |
G1 vs. G2 | 1.0000 | 0.5816 | 0.2557 |
G1 vs. G3 | 0.0629 | 0.1131 | 0.2557 |
G2 vs. G3 | 0.0915 | 0.0230 | 0.8633 |
C: Lesion-by-lesion detection accuracy at an optimal probability threshold** | | | |
Subgroups | Sensitivity | FP (no size limit) | FP (10 mm³ size limit) |
G1 vs. G2 | 0.0069 | 0.0158 | 0.0829 |
G1 vs. G3 | 0.0002 | 0.0139 | 0.6352 |
G2 vs. G3 | 0.3952 | 0.7031 | 0.1178 |
G1 = subgroup with 1–3 metastases; G2 = subgroup with 4–10 metastases; G3 = subgroup with >10 metastases. Significant P-values are highlighted in bold. All P-values were computed using the Wilcoxon rank sum test.
*Sensitivity and specificity were determined at the maximum value of Youden’s index.
**The metrics were estimated using an optimal probability threshold of 0.93, as determined from the development set.
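
For readers who want to reproduce a subgroup comparison of this kind, the following is a minimal sketch of a two-sided Wilcoxon rank sum test in plain Python. It is not the authors' code. It uses the large-sample normal approximation with no tie or continuity correction, which is an assumption, since the table does not say whether an exact or approximate test was used. The function names `midranks` and `rank_sum_test` and the per-patient values in the usage example are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Assumption: no tie correction and no continuity correction.
    Returns the z statistic and the two-sided P-value.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])                        # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2            # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal tail area
    return z, p


# Hypothetical per-patient metric values for two subgroups (not study data).
g1_values = [0.98, 0.97, 0.99, 0.96, 0.95]
g3_values = [0.90, 0.92, 0.89, 0.94, 0.91]
z_stat, p_value = rank_sum_test(g1_values, g3_values)
print(f"z = {z_stat:.3f}, P = {p_value:.4f}")
```

With subgroups as small as some clinical cohorts, an exact test or a tie-corrected variant may give noticeably different P-values than this approximation, so the sketch should be read as illustrating the procedure rather than reproducing the tabulated numbers.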