Table 3.
Comparison of patient-level, lesion-level, and voxel-level results for the automated model and observer 2, both measured against the observer 1 segmentations
| Task | Metric | Automated model | Observer 2 |
|---|---|---|---|
| Patient-level classification | Accuracy (%) | 100 (28/28) | 92.9 (26/28) |
| | Sensitivity (%) | 100 (20/20) | 100 (20/20) |
| | PPV (%) | 100 (20/20) | 90.9 (20/22) |
| | Specificity (%) | 100 (8/8) | 75 (6/8) |
| | NPV (%) | 100 (8/8) | 100 (6/6) |
| Lesion-level detection | PPV (%) | 95.5 (63/66) | 91.7 (66/72) |
| | Sensitivity (%) | 68.5 (63/92) | 71.7 (66/92) |
| | F1 score (%) | 79.7 | 80.5 |
| Lesion sub-groups detection | | | |
| Local prostate | Sensitivity (%) | 100 (15/15) | 93.3 (14/15) |
| Regional nodal | Sensitivity (%) | 42.1 (8/19) | 73.7 (14/19) |
| Distant nodal | Sensitivity (%) | 62.8 (27/43) | 62.8 (27/43) |
| Osseous | Sensitivity (%) | 92.9 (13/14) | 78.6 (11/14) |
| Visceral | Sensitivity (%) | 0 (0/1) | 0 (0/1) |
| Voxel-level segmentation | DSC (mean ± SD) | 49.3 ± 18.9 | 33.1 ± 18.2 |
| | Sensitivity (mean ± SD) | 47.7 ± 28.4 | 62.6 ± 37.7 |
| | PPV (mean ± SD) | 67.9 ± 21.6 | 31.7 ± 24.3 |
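
The lesion-level percentages can be recomputed from the counts given in parentheses. The sketch below assumes that the F1 score is the harmonic mean of PPV and sensitivity, which the table does not state explicitly. The helper names `pct` and `f1_from_counts` are illustrative and not part of the study.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100.0 * numerator / denominator, 1)


def f1_from_counts(tp: int, n_predicted: int, n_reference: int) -> float:
    """F1 (%) from counts, assuming F1 = 2*PPV*Sens / (PPV + Sens).

    With PPV = tp/n_predicted and Sens = tp/n_reference, this simplifies
    to 2*tp / (n_predicted + n_reference).
    """
    return round(100.0 * 2 * tp / (n_predicted + n_reference), 1)


# Lesion-level counts as listed in the table (reference: observer 1, 92 lesions).
model = {"tp": 63, "n_predicted": 66, "n_reference": 92}
observer2 = {"tp": 66, "n_predicted": 72, "n_reference": 92}

for name, c in (("Automated model", model), ("Observer 2", observer2)):
    ppv = pct(c["tp"], c["n_predicted"])
    sens = pct(c["tp"], c["n_reference"])
    f1 = f1_from_counts(c["tp"], c["n_predicted"], c["n_reference"])
    print(f"{name}: PPV={ppv}, sensitivity={sens}, F1={f1}")
```

Under this assumption the recomputed values agree with the PPV, sensitivity, and F1 rows reported for lesion-level detection.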