Average and range of inter-expert variability in manual prostate segmentation across the whole dataset and the test dataset. Values are the mean ± standard deviation of the differences observed between two experts for each of our metrics. Because both segmentation labels in each pairwise comparison came from expert observers, there is no reference segmentation for calculating MAD or signed ΔV; the bilateral MAD (MADb) and the absolute volume difference (|ΔV|) are therefore reported in this table. Npat and NImg are the number of patients and the number of test images, respectively.