Table 4.
Comparison of model performance with that of subspecialists and junior radiologists on uncropped images from the external test set, overall and stratified by age group. Difference in κ is each rater's κ minus the model's κ. For Cohen's κ and categorical accuracy, 95% confidence intervals (CIs) were generated from 10,000 bootstrap samples; p-values were calculated with permutation tests of 10,000 iterations.
| Subgroup | Reader | Accuracy | Cohen's κ (95% CI) | Difference in κ from model (95% CI) | p-value |
|---|---|---|---|---|---|
| Total (n = 291) | Model | 73·4% | 0·560 (0·481, 0·639) | | |
| | Rater 1 | 69·3% | 0·483 (0·394, 0·567) | -0·077 (-0·180, 0·021) | 0·14 |
| | Rater 2 | 73·4% | 0·553 (0·468, 0·634) | -0·007 (-0·112, 0·096) | 0·89 |
| | Rater 3 | 73·1% | 0·555 (0·472, 0·633) | -0·005 (-0·115, 0·103) | 0·93 |
| | Rater 4 | 67·9% | 0·430 (0·340, 0·519) | -0·130 (-0·240, -0·020) | 0·02 |
| | Rater 5 | 63·4% | 0·367 (0·285, 0·449) | -0·193 (-0·293, -0·093) | 0·0005 |
| Age <10 (n = 97) | Model | 74·2% | 0·383 (0·210, 0·542) | | |
| | Rater 1 | 79·4% | 0·478 (0·278, 0·655) | 0·095 (-0·128, 0·314) | 0·41 |
| | Rater 2 | 79·4% | 0·515 (0·334, 0·678) | 0·132 (-0·080, 0·343) | 0·23 |
| | Rater 3 | 79·4% | 0·535 (0·367, 0·695) | 0·152 (-0·080, 0·393) | 0·25 |
| | Rater 4 | 80·4% | 0·448 (0·239, 0·637) | 0·065 (-0·177, 0·314) | 0·61 |
| | Rater 5 | 69·1% | 0·229 (0·064, 0·390) | -0·154 (-0·341, 0·017) | 0·11 |
| Age 10–24 (n = 97) | Model | 77·3% | 0·630 (0·498, 0·755) | | |
| | Rater 1 | 70·1% | 0·496 (0·336, 0·640) | -0·134 (-0·311, 0·038) | 0·13 |
| | Rater 2 | 72·2% | 0·538 (0·392, 0·676) | -0·092 (-0·261, 0·075) | 0·28 |
| | Rater 3 | 77·3% | 0·618 (0·473, 0·749) | -0·012 (-0·183, 0·156) | 0·88 |
| | Rater 4 | 69·1% | 0·450 (0·291, 0·596) | -0·180 (-0·352, -0·011) | 0·045 |
| | Rater 5 | 52·6% | 0·217 (0·085, 0·354) | -0·413 (-0·576, -0·246) | <1·0 × 10⁻⁶ |
| Age >24 (n = 97) | Model | 68·8% | 0·514 (0·366, 0·648) | | |
| | Rater 1 | 58·3% | 0·386 (0·250, 0·521) | -0·128 (-0·304, 0·047) | 0·15 |
| | Rater 2 | 68·8% | 0·526 (0·385, 0·660) | 0·012 (-0·178, 0·200) | 0·89 |
| | Rater 3 | 62·5% | 0·413 (0·263, 0·556) | -0·101 (-0·294, 0·093) | 0·31 |
| | Rater 4 | 54·2% | 0·282 (0·132, 0·429) | -0·232 (-0·426, -0·033) | 0·025 |
| | Rater 5 | 68·8% | 0·479 (0·345, 0·608) | -0·035 (-0·198, 0·137) | 0·71 |
Raters 1 and 2 are subspecialists; raters 3–5 are junior radiologists.
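The caption names three standard procedures: Cohen's κ, bootstrap confidence intervals, and permutation tests for the difference in κ. The following is a minimal plain-Python sketch of one common way to implement them, not the authors' code. It assumes that κ is unweighted and measures agreement between each reader and the reference standard, that CIs use the simple percentile bootstrap over cases, and that the permutation test is paired (model and rater read the same images, so their labels are swapped case by case). All function names and the simulated data are hypothetical.

```python
import math
import random
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: sum over labels of the product of marginal proportions.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when both use one single label
    return (observed - expected) / (1.0 - expected)


def bootstrap_kappa_ci(y_true, y_pred, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for kappa, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        k = cohen_kappa([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if not math.isnan(k):  # skip degenerate single-label resamples
            stats.append(k)
    stats.sort()
    lower = stats[int(alpha / 2 * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


def paired_permutation_test(y_true, model_pred, rater_pred, n_perm=10_000, seed=0):
    """Two-sided paired permutation test for kappa(rater) - kappa(model).

    Under the null hypothesis the two readers are exchangeable, so on each
    iteration the model and rater labels are swapped case by case with
    probability 0.5 and the kappa difference is recomputed.
    """
    rng = random.Random(seed)
    observed = cohen_kappa(y_true, rater_pred) - cohen_kappa(y_true, model_pred)
    extreme = 0
    for _ in range(n_perm):
        perm_model, perm_rater = [], []
        for m, r in zip(model_pred, rater_pred):
            if rng.random() < 0.5:
                m, r = r, m
            perm_model.append(m)
            perm_rater.append(r)
        diff = cohen_kappa(y_true, perm_rater) - cohen_kappa(y_true, perm_model)
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: 97 cases, 3 categories (not the study data).
    rng = random.Random(42)
    truth = [rng.randrange(3) for _ in range(97)]
    # Each simulated reader copies the reference label with some probability.
    model = [t if rng.random() < 0.75 else rng.randrange(3) for t in truth]
    rater = [t if rng.random() < 0.65 else rng.randrange(3) for t in truth]
    k_model = cohen_kappa(truth, model)
    # 2,000 iterations keep the demo fast; the caption reports 10,000.
    lo, hi = bootstrap_kappa_ci(truth, model, n_boot=2_000)
    diff, p = paired_permutation_test(truth, model, rater, n_perm=2_000)
    print(f"model kappa {k_model:.3f} ({lo:.3f}, {hi:.3f})")
    print(f"rater - model difference {diff:.3f}, p = {p:.4f}")
```

The sign convention (rater minus model) matches the "Difference in κ" column. The add-one correction keeps the p-value from being exactly zero, which also means that B iterations cannot resolve p-values below 1/(B + 1).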