2023 Sep 26;13:16153. doi: 10.1038/s41598-023-42961-x

Table 2.

Comparison of the model to the test experts (neuroradiologists B and C) on the test sets.

| Category | Metric¹ | Expert B: inter-expert² (B to A) | Expert B: model-expert² (B to model) | Expert B: p-value³ for non-inferiority | Expert C: inter-expert² (C to A) | Expert C: model-expert² (C to model) | Expert C: p-value³ for non-inferiority | Expert A: model-expert² (A to model) |
|---|---|---|---|---|---|---|---|---|
| Volume | VS | 0.66 ± 0.1 | 0.81 ± 0.1 | p<0.001 | 0.64 ± 0.3 | 0.51 ± 0.3 | p<0.01 | 0.67 ± 0.14 |
| Volume | AVD [ml] | 8.40 ± 5.25 | 7.11 ± 4.81 | Non-sig | 7.28 ± 4.96 | 5.99 ± 2.24 | p<0.05 | 7.43 ± 4.31 |
| Overlap | Dice | 0.47 ± 0.16 | 0.56 ± 0.18 | p<0.0001 | 0.25 ± 0.15 | 0.36 ± 0.15 | p<0.0001 | 0.47 ± 0.13 |
| Overlap | Precision | 0.49 ± 0.26 | 0.52 ± 0.18 | p<0.0001 | 0.64 ± 0.16 | 0.77 ± 0.15 | p<0.001 | 0.58 ± 0.26 |
| Overlap | Recall | 0.59 ± 0.18 | 0.73 ± 0.16 | p<0.0001 | 0.17 ± 0.15 | 0.26 ± 0.14 | p<0.0001 | 0.52 ± 0.15 |
| Distance | HD 95 [mm] | 15.89 ± 5.02 | 12.39 ± 3.78 | Non-sig | 21.97 ± 7.36 | 18.13 ± 7.03 | Non-sig | 18.04 ± 9.21 |
| Distance | SDT 5 mm | 0.54 ± 0.09 | 0.63 ± 0.16 | p<0.0001 | 0.31 ± 0.14 | 0.31 ± 0.18 | p<0.0001 | 0.46 ± 0.09 |

¹ VS: volumetric similarity; AVD: absolute volume difference; HD 95: Hausdorff distance, 95th percentile; SDT: surface Dice at a tolerance of 5 mm.
² Median ± 95% CI (bootstrapped).
³ p-values of a one-sided Wilcoxon signed-rank test.
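
To make the volume and overlap metrics in the table concrete, the following minimal sketch computes Dice, precision, recall, VS, and AVD for a pair of binary masks. It assumes the commonly used definitions (Dice = 2TP / (2TP + FP + FN), VS = 1 − |V_ref − V_pred| / (V_ref + V_pred)); the source does not state the exact formulas or implementation used in the paper, so the function names, the flattened-list mask representation, and the `voxel_volume_ml` parameter are illustrative assumptions only.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(reference: Sequence[int],
                     prediction: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives voxel by voxel."""
    if len(reference) != len(prediction):
        raise ValueError("masks must contain the same number of voxels")
    pairs = list(zip(reference, prediction))
    tp = sum(1 for r, p in pairs if r and p)
    fp = sum(1 for r, p in pairs if not r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    return tp, fp, fn


def segmentation_metrics(reference: Sequence[int],
                         prediction: Sequence[int],
                         voxel_volume_ml: float) -> Dict[str, float]:
    """Volume and overlap metrics for two flattened binary masks.

    Assumed (standard) definitions, not confirmed by the source:
      dice      = 2TP / (2TP + FP + FN)
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
      vs        = 1 - |V_ref - V_pred| / (V_ref + V_pred)
      avd_ml    = |V_ref - V_pred| * voxel volume in ml
    """
    tp, fp, fn = confusion_counts(reference, prediction)
    ref_voxels = tp + fn      # segmented voxels in the reference mask
    pred_voxels = tp + fp     # segmented voxels in the compared mask
    total = ref_voxels + pred_voxels
    return {
        # Two empty masks are treated as perfect agreement (a convention).
        "dice": 2 * tp / total if total else 1.0,
        "precision": tp / pred_voxels if pred_voxels else 0.0,
        "recall": tp / ref_voxels if ref_voxels else 0.0,
        "vs": 1 - abs(ref_voxels - pred_voxels) / total if total else 1.0,
        "avd_ml": abs(ref_voxels - pred_voxels) * voxel_volume_ml,
    }


if __name__ == "__main__":
    # Toy example: 8 voxels of 1 mm^3 each (0.001 ml per voxel).
    reference = [1, 1, 1, 1, 0, 0, 0, 0]
    prediction = [0, 1, 1, 1, 1, 1, 0, 0]
    for name, value in segmentation_metrics(reference, prediction, 0.001).items():
        print(f"{name}: {value:.3f}")
```

In this toy case the masks share three voxels, so the overlap metrics (Dice 0.667, precision 0.600, recall 0.750) are noticeably lower than the volumetric similarity (0.889). This illustrates why the table reports volume and overlap metrics separately: two segmentations can agree on lesion size while disagreeing on location.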