2022 Aug 17;50(1):67–79. doi: 10.1007/s00259-022-05927-1

Table 3.

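Comparison of patient-level, lesion-level, and voxel-level results for the automated model and observer 2, both measured against the observer 1 segmentations. PPV, positive predictive value; NPV, negative predictive value; DSC, Dice similarity coefficient; SD, standard deviation.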

Task	Metric	Automated model	Observer 2
Patient-level classification	Accuracy (%)	100 (28/28)	92.9 (26/28)
	Sensitivity (%)	100 (20/20)	100 (20/20)
	PPV (%)	100 (20/20)	90.9 (20/22)
	Specificity (%)	100 (8/8)	75 (6/8)
	NPV (%)	100 (8/8)	100 (6/6)
Lesion-level detection	PPV (%)	95.5 (63/66)	91.7 (66/72)
	Sensitivity (%)	68.5 (63/92)	71.7 (66/92)
	F1 score (%)	79.7	80.5
Lesion sub-group detection
Local prostate	Sensitivity (%)	100 (15/15)	93.3 (14/15)
Regional nodal	Sensitivity (%)	42.1 (8/19)	73.7 (14/19)
Distant nodal	Sensitivity (%)	62.8 (27/43)	62.8 (27/43)
Osseous	Sensitivity (%)	92.9 (13/14)	78.6 (11/14)
Visceral	Sensitivity (%)	0 (0/1)	0 (0/1)
Voxel-level segmentation	DSC (%, mean ± SD)	49.3 ± 18.9	33.1 ± 18.2
	Sensitivity (%, mean ± SD)	47.7 ± 28.4	62.6 ± 37.7
	PPV (%, mean ± SD)	67.9 ± 21.6	31.7 ± 24.3
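The percentages in the table follow from the standard confusion-matrix definitions. The Python sketch below recomputes them as a check. It is illustrative only, and the helper names (`pct`, `patient_metrics`, `lesion_f1`, `voxel_metrics`, `mean_sd`) are hypothetical rather than taken from the study. The patient-level counts for observer 2 (20 TP, 2 FP, 6 TN, 0 FN) are read off the table's fractions, and 26/28 works out to 92.9%. The lesion-level F1 uses the identity 2·PPV·Sens/(PPV+Sens) = 2·TP/(detected + reference), which assumes the same TP count on both sides, as the table reports (63 and 66). The voxel-level part uses toy masks rather than study data. The table does not state whether mean ± SD is taken over lesions or patients, or whether the SD is sample or population, so `mean_sd` assumes the sample SD.

```python
"""Recompute Table 3-style metrics from counts, plus a voxel-level overlap sketch."""
from statistics import mean, stdev
from typing import Sequence


def pct(num: int, den: int) -> float:
    """Return num/den as a percentage rounded to one decimal, matching the table."""
    return round(100.0 * num / den, 1)


def patient_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Patient-level metrics from a 2x2 confusion matrix (positive = disease present)."""
    return {
        "accuracy": pct(tp + tn, tp + fp + tn + fn),
        "sensitivity": pct(tp, tp + fn),
        "ppv": pct(tp, tp + fp),
        "specificity": pct(tn, tn + fp),
        "npv": pct(tn, tn + fn),
    }


def lesion_f1(tp: int, n_detected: int, n_reference: int) -> float:
    """F1 = 2*PPV*Sens/(PPV+Sens), which simplifies to 2*TP/(detected + reference)."""
    return pct(2 * tp, n_detected + n_reference)


def voxel_metrics(pred: Sequence[bool], ref: Sequence[bool]) -> dict[str, float]:
    """Unrounded DSC, sensitivity and PPV (in %) for one pair of flattened binary masks.

    Raises ZeroDivisionError if either mask is empty (handling of empty masks is
    not specified in the table).
    """
    tp = sum(1 for p, r in zip(pred, ref) if p and r)  # voxels marked in both masks
    n_pred = sum(map(bool, pred))
    n_ref = sum(map(bool, ref))
    return {
        "dsc": 100.0 * 2 * tp / (n_pred + n_ref),
        "sensitivity": 100.0 * tp / n_ref,
        "ppv": 100.0 * tp / n_pred,
    }


def mean_sd(values: Sequence[float]) -> tuple[float, float]:
    """Mean and sample SD rounded to one decimal (sample SD is an assumption)."""
    return round(mean(values), 1), round(stdev(values), 1)


if __name__ == "__main__":
    # Observer 2, patient level: 20 TP, 2 FP, 6 TN, 0 FN (counts from the table).
    print(patient_metrics(tp=20, fp=2, tn=6, fn=0))  # accuracy 92.9 = 26/28
    # Lesion-level F1: automated model (63 TP, 66 detected) and observer 2
    # (66 TP, 72 detected), both against 92 reference lesions.
    print(lesion_f1(63, 66, 92), lesion_f1(66, 72, 92))  # 79.7 80.5
    # Toy masks only, not study data.
    print(voxel_metrics([True, True, False, False], [True, True, True, True]))
```

Rounding happens only at the reporting step (`pct`, `mean_sd`), and `voxel_metrics` stays unrounded. This keeps per-case values from picking up rounding error before they are averaged into a mean ± SD.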