Table 2.
Comparison of the model with the test experts (neuroradiologists B and C) on the test sets.
| Category | Metric | Expert B: inter-expert (B to A) | Expert B: model-expert (B to model) | Expert B: p-value for non-inferiority | Expert C: inter-expert (C to A) | Expert C: model-expert (C to model) | Expert C: p-value for non-inferiority | Expert A: model-expert (A to model) |
|---|---|---|---|---|---|---|---|---|
| Volume | VS | 0.66 ± 0.1 | 0.81 ± 0.1 | p<0.001 | 0.64 ± 0.3 | 0.51 ± 0.3 | p<0.01 | 0.67 ± 0.14 |
| Volume | AVD [ml] | 8.40 ± 5.25 | 7.11 ± 4.81 | Non-sig | 7.28 ± 4.96 | 5.99 ± 2.24 | p<0.05 | 7.43 ± 4.31 |
| Overlap | Dice | 0.47 ± 0.16 | 0.56 ± 0.18 | p<0.0001 | 0.25 ± 0.15 | 0.36 ± 0.15 | p<0.0001 | 0.47 ± 0.13 |
| Overlap | Precision | 0.49 ± 0.26 | 0.52 ± 0.18 | p<0.0001 | 0.64 ± 0.16 | 0.77 ± 0.15 | p<0.001 | 0.58 ± 0.26 |
| Overlap | Recall | 0.59 ± 0.18 | 0.73 ± 0.16 | p<0.0001 | 0.17 ± 0.15 | 0.26 ± 0.14 | p<0.0001 | 0.52 ± 0.15 |
| Distance | HD 95 [mm] | 15.89 ± 5.02 | 12.39 ± 3.78 | Non-sig | 21.97 ± 7.36 | 18.13 ± 7.03 | Non-sig | 18.04 ± 9.21 |
| Distance | SDT 5 mm | 0.54 ± 0.09 | 0.63 ± 0.16 | p<0.0001 | 0.31 ± 0.14 | 0.31 ± 0.18 | p<0.0001 | 0.46 ± 0.09 |
VS volumetric similarity, AVD absolute volume difference, HD 95 Hausdorff distance (95th percentile), SDT surface Dice at a tolerance of 5 mm, Non-sig not significant. Values are median ± 95% CI (bootstrapped). p-values are from a one-sided Wilcoxon signed-rank test.
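As an illustration of how the volume and overlap metrics in the table are commonly defined, the following is a minimal sketch for two binary segmentation masks. It assumes the widely used voxel-count definitions (Dice = 2TP / (2TP + FP + FN), VS = 1 − |FN − FP| / (2TP + FP + FN)); the source does not state its formulas, so these may differ from the ones actually used. The function name `segmentation_metrics`, the flattened-mask input, and the default voxel volume of 0.001 ml (1 mm³ voxels) are hypothetical choices made for this example only.

```python
from typing import Dict, Sequence


def segmentation_metrics(
    reference: Sequence[int],
    prediction: Sequence[int],
    voxel_volume_ml: float = 0.001,  # assumption: 1 mm^3 voxels
) -> Dict[str, float]:
    """Volume and overlap metrics for two flattened binary masks (0/1 per voxel)."""
    if len(reference) != len(prediction):
        raise ValueError("masks must have the same number of voxels")

    # Voxel-wise confusion counts, treating `reference` as ground truth.
    tp = sum(1 for r, p in zip(reference, prediction) if r and p)
    fp = sum(1 for r, p in zip(reference, prediction) if not r and p)
    fn = sum(1 for r, p in zip(reference, prediction) if r and not p)

    # 2TP + FP + FN equals |reference| + |prediction| in voxels.
    denom = 2 * tp + fp + fn

    return {
        # Two empty masks are treated as perfect agreement (assumed convention).
        "dice": 2 * tp / denom if denom else 1.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # |FN - FP| is the absolute difference of the two segmented volumes.
        "vs": 1 - abs(fn - fp) / denom if denom else 1.0,
        "avd_ml": abs(fn - fp) * voxel_volume_ml,
    }


# Toy example: 8 voxels, reference segments 4, prediction segments 3.
reference_mask = [1, 1, 1, 1, 0, 0, 0, 0]
predicted_mask = [1, 1, 0, 0, 1, 0, 0, 0]
metrics = segmentation_metrics(reference_mask, predicted_mask)
print(metrics)
```

In this toy case the two masks share 2 voxels, so Dice is 4/7 while VS is 6/7: VS compares segmented volumes only and ignores where they lie, which is why VS can exceed Dice for the same pair of segmentations, as it does in every column of the table.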