Table 2.
| Metric / Group | Test Set 1A: Before Training | p-Value | Test Set 1B: After Training | p-Value | Test Set 2: Follow-Up | p-Value |
|---|---|---|---|---|---|---|
| Sensitivity (95% CI) | | | | | | |
| Unassisted gastroenterologists | 57.78% (43.33–72.23) | 0.002 | 85.56% (77.38–93.73) | 0.076 | 71.11% (53.60–88.63) | 0.025 |
| AI-assisted gastroenterologists | 84.44% (76.97–92.91) | | 94.44% (86.26–100) | | 91.11% (82.64–99.58) | |
| Specificity (95% CI) | | | | | | |
| Unassisted gastroenterologists | 63.33% (41.32–85.35) | 0.668 | 71.11% (43.24–98.97) | 0.652 | 70.0% (50.34–89.66) | 0.631 |
| AI-assisted gastroenterologists | 68.89% (45.20–92.58) | | 65.56% (52.72–78.39) | | 74.44% (62.93–86.50) | |
| Accuracy (95% CI) | | | | | | |
| Unassisted gastroenterologists | 60.56% (47.58–73.54) | 0.033 | 78.33% (67.11–89.56) | 0.765 | 70.56% (57.96–83.15) | 0.050 |
| AI-assisted gastroenterologists | 76.67% (66.05–87.28) | | 80.0% (71.72–88.28) | | 82.78% (76.36–89.20) | |
P-values are from separate comparisons of sensitivity, specificity, and accuracy between unassisted gastroenterologists (N = 6) and AI-assisted gastroenterologists (N = 6) at each testing time point (before training, directly after training, and at follow-up).
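In every cell of Table 2, the reported accuracy equals the mean of the corresponding sensitivity and specificity, to within rounding. This is what one would expect if each test set contained equal numbers of positive and negative cases. The source does not state the case mix, so treat that as an assumption. The sketch below only checks the published point estimates for internal consistency. The helper names (`metrics`, `accuracy_mismatches`) and the confusion-matrix counts in the usage example are hypothetical and used for illustration only.

```python
"""Consistency check for Table 2: accuracy vs. sensitivity and specificity.

Assumption (not stated in the source): each test set has equal numbers of
positive and negative cases, so accuracy = (sensitivity + specificity) / 2.
"""


def metrics(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Point estimates (%) copied from Table 2: (sensitivity, specificity, accuracy).
TABLE_2 = {
    ("Unassisted", "Before training"): (57.78, 63.33, 60.56),
    ("Unassisted", "After training"): (85.56, 71.11, 78.33),
    ("Unassisted", "Follow-up"): (71.11, 70.0, 70.56),
    ("AI-assisted", "Before training"): (84.44, 68.89, 76.67),
    ("AI-assisted", "After training"): (94.44, 65.56, 80.0),
    ("AI-assisted", "Follow-up"): (91.11, 74.44, 82.78),
}


def accuracy_mismatches(table, tol=0.01):
    """Return the cells whose accuracy is not the mean of sensitivity and specificity.

    tol absorbs the two-decimal rounding of the published percentages.
    """
    return [
        key
        for key, (sens, spec, acc) in table.items()
        if abs((sens + spec) / 2 - acc) > tol
    ]


if __name__ == "__main__":
    # Hypothetical balanced set: 10 positive and 10 negative cases.
    print(metrics(tp=8, fn=2, tn=6, fp=4))  # (0.8, 0.6, 0.7)
    print(accuracy_mismatches(TABLE_2))     # []
```

If the numbers of positive cases P and negative cases N differed, accuracy would instead be the case-weighted average (P·sensitivity + N·specificity)/(P + N). In that situation the simple-mean check above would not be expected to hold.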