Table 3.
Comparison of predictive accuracies between the AI model and endoscopists in the test set
Prediction target | AI model, overall (n/N) | AI model, video (n/N) | Experts, overall, mean (95% CI) | Experts, video, mean | Novices, overall, mean (95% CI) | Novices, video, mean |
---|---|---|---|---|---|---|
Undifferentiated histology | ||||||
Accuracy (%) | 92.7 (102/110) | 100 (10/10) | 71.6 (68.6–74.6)* | 71.1 | 58.1 (53.4–62.8)** | 48.8 |
Sensitivity (%) | 87.3 (48/55) | 100 (5/5) | 85.8 (79.5–92.1) | 88.5 | 60.8 (53.1–68.5) | 43.1 |
Specificity (%) | 98.2 (54/55) | 100 (5/5) | 65.9 (63.2–68.6) | 65.8 | 57.4 (53.1–61.7) | 48.0 |
PPV (%) | 98.0 (48/49) | 100 (5/5) | 52.9 (46.8–59.2) | 51.1 | 53.3 (42.5–64.1) | 45.0 |
NPV (%) | 88.5 (54/61) | 100 (5/5) | 90.3 (84.6–95.4) | 91.1 | 62.9 (49.0–76.8) | 52.5 |
Submucosal invasion | ||||||
Accuracy (%) | 87.3 (96/110) | 100 (10/10) | 72.6 (70.6–74.6)* | 77.8 | 63.9 (56.4–71.4)** | 58.8 |
Sensitivity (%) | 83.3 (45/54) | 100 (4/4) | 70.9 (65.9–75.9) | 86.2 | 60.8 (53.1–68.5) | 78.5 |
Specificity (%) | 91.1 (51/56) | 100 (6/6) | 76.5 (73.4–79.6) | 70.2 | 73.0 (61.6–84.4) | 51.8 |
PPV (%) | 90.0 (45/50) | 100 (4/4) | 77.6 (71.4–83.8) | 80.6 | 77.2 (64.0–90.4) | 75.0 |
NPV (%) | 85.0 (51/60) | 100 (6/6) | 67.9 (58.8–77.0) | 75.9 | 51.0 (38.6–63.5) | 47.9 |
Lymphovascular invasion | ||||||
Accuracy (%) | 76.4 (84/110) | 100 (10/10) | 69.7 (59.6–79.8) | 72.2 | 64.8 (61.4–68.2) | 62.5 |
Sensitivity (%) | 30.0 (6/20) | 100 (1/1) | 33.7 (26.7–40.7) | 33.3 | 28.9 (20.6–37.2) | 62.5 |
Specificity (%) | 86.7 (78/90) | 100 (9/9) | 86.7 (85.2–88.2) | 76.5 | 87.5 (83.2–92.7) | 62.5 |
PPV (%) | 33.3 (6/18) | 100 (1/1) | 46.9 (31.5–62.3) | 8.7 | 40.0 (16.9–63.1) | 3.8 |
NPV (%) | 84.8 (78/92) | 100 (9/9) | 68.4 (56.7–73.7) | 91.8 | 68.1 (48.8–87.4) | 96.0 |
Lymph node metastasis | ||||||
Accuracy (%) | 87.7 (71/81) | 62.5 (5/8) | 72.3 (65.8–78.8)* | 68.1 | 67.7 (49.8–85.6) | 65.6 |
Sensitivity (%) | 41.7 (5/12) | 33.3 (1/3) | 17.6 (11.5–23.7) | 22.2 | 14.8 (5.2–24.4) | 29.2 |
Specificity (%) | 95.7 (66/69) | 80.0 (4/5) | 85.3 (72.3–98.4) | 95.6 | 83.4 (79.6–87.2) | 87.5 |
PPV (%) | 62.5 (5/8) | 50.0 (1/2) | 19.7 (3.9–35.5) | 46.3 | 10.3 (2.6–18.0) | 43.8 |
NPV (%) | 90.4 (66/73) | 66.7 (4/6) | 81.8 (72.7–90.9) | 67.9 | 76.2 (58–95) | 70.5 |
AI artificial intelligence, CI confidence interval, NPV negative predictive value, PPV positive predictive value, n number of correct answers, N number of questions
* P < 0.05, when accuracy was compared with that of the AI system using McNemar's test
** P < 0.05, when the mean accuracy was compared with that of the experts using the Mann–Whitney U test
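
The n/N counts reported for the AI model fix the underlying confusion matrix, so its percentages can be re-derived from them. The following is a minimal sketch in Python, assuming the undifferentiated-histology counts implied by the overall column (48 true positives, 7 false negatives, 54 true negatives, 1 false positive, from 48/55, 54/55, 48/49 and 54/61). The function name `diagnostic_metrics` is illustrative and does not come from the source.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity, PPV and NPV as percentages."""

    def pct(numerator, denominator):
        # Percentages are rounded to one decimal place, as in the table.
        return round(100.0 * numerator / denominator, 1)

    return {
        "accuracy": pct(tp + tn, tp + fn + tn + fp),
        "sensitivity": pct(tp, tp + fn),   # correct among true positives
        "specificity": pct(tn, tn + fp),   # correct among true negatives
        "ppv": pct(tp, tp + fp),           # correct among positive calls
        "npv": pct(tn, tn + fn),           # correct among negative calls
    }


# Counts implied by the AI model's overall column for undifferentiated histology.
histology = diagnostic_metrics(tp=48, fn=7, tn=54, fp=1)
print(histology)
# {'accuracy': 92.7, 'sensitivity': 87.3, 'specificity': 98.2, 'ppv': 98.0, 'npv': 88.5}
```

The same function can be applied to the other prediction targets by reading the four counts off their n/N entries, which gives a quick consistency check between each fraction and its stated percentage.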