2024 Jul 2;27(5):1088–1099. doi: 10.1007/s10120-024-01524-3

Table 3.

Comparison of predictive accuracies between AI model and endoscopists in the test set

| Prediction target | AI model, overall (n/N) | AI model, video (n/N) | Experts, overall, mean (95% CI) | Experts, video, mean | Novices, overall, mean (95% CI) | Novices, video, mean |
|---|---|---|---|---|---|---|
| Undifferentiated histology | | | | | | |
| Accuracy (%) | 92.7 (102/110) | 100 (10/10) | 71.6 (68.6–74.6)* | 71.1 | 58.1 (53.4–62.8)** | 48.8 |
| Sensitivity (%) | 87.3 (48/55) | 100 (5/5) | 85.8 (79.5–92.1) | 88.5 | 60.8 (53.1–68.5) | 43.1 |
| Specificity (%) | 98.2 (54/55) | 100 (5/5) | 65.9 (63.2–68.6) | 65.8 | 57.4 (53.1–61.7) | 48.0 |
| PPV (%) | 98.0 (48/49) | 100 (5/5) | 52.9 (46.8–59.2) | 51.1 | 53.3 (42.5–64.1) | 45.0 |
| NPV (%) | 88.5 (54/61) | 100 (5/5) | 90.3 (84.6–95.4) | 91.1 | 62.9 (49.0–76.8) | 52.5 |
| Submucosal invasion | | | | | | |
| Accuracy (%) | 87.3 (96/110) | 100 (10/10) | 72.6 (70.6–74.6)* | 77.8 | 63.9 (56.4–71.4)** | 58.8 |
| Sensitivity (%) | 83.3 (45/54) | 100 (4/4) | 70.9 (65.9–75.9) | 86.2 | 60.8 (53.1–68.5) | 78.5 |
| Specificity (%) | 91.1 (51/56) | 100 (6/6) | 76.5 (73.4–79.6) | 70.2 | 73.0 (61.6–84.4) | 51.8 |
| PPV (%) | 90.0 (45/50) | 100 (4/4) | 77.6 (71.4–83.8) | 80.6 | 77.2 (64.0–90.4) | 75.0 |
| NPV (%) | 85.0 (51/60) | 100 (6/6) | 67.9 (58.8–77.0) | 75.9 | 51.0 (38.6–63.5) | 47.9 |
| Lymphovascular invasion | | | | | | |
| Accuracy (%) | 76.4 (84/110) | 100 (10/10) | 69.7 (59.6–79.8) | 72.2 | 64.8 (61.4–68.2) | 62.5 |
| Sensitivity (%) | 30.0 (6/20) | 100 (1/1) | 33.7 (26.7–40.7) | 33.3 | 28.9 (20.6–37.2) | 62.5 |
| Specificity (%) | 86.7 (78/90) | 100 (9/9) | 86.7 (85.2–88.2) | 76.5 | 87.5 (83.2–92.7) | 62.5 |
| PPV (%) | 33.3 (6/18) | 100 (1/1) | 46.9 (31.5–62.3) | 8.7 | 40.0 (16.9–63.1) | 3.8 |
| NPV (%) | 84.8 (78/92) | 100 (9/9) | 68.4 (56.7–73.7) | 91.8 | 68.1 (48.8–87.4) | 96.0 |
| Lymph node metastasis | | | | | | |
| Accuracy (%) | 87.7 (71/81) | 62.5 (5/8) | 72.3 (65.8–78.8)* | 68.1 | 67.7 (49.8–85.6) | 65.6 |
| Sensitivity (%) | 41.7 (5/12) | 33.3 (1/3) | 17.6 (11.5–23.7) | 22.2 | 14.8 (5.2–24.4) | 29.2 |
| Specificity (%) | 95.4 (66/69) | 80.0 (4/5) | 85.3 (72.3–98.4) | 95.6 | 83.4 (79.6–87.2) | 87.5 |
| PPV (%) | 62.5 (5/8) | 50 (1/2) | 19.7 (3.9–35.5) | 46.3 | 10.3 (2.6–18.0) | 43.8 |
| NPV (%) | 90.4 (66/73) | 66.7 (4/6) | 81.8 (72.7–90.9) | 67.9 | 76.2 (58–95) | 70.5 |

AI: artificial intelligence; CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value; n: number of correct answers; N: number of questions
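The n/N cells in the AI model columns are enough to recompute each percentage. The sketch below is an illustration only, not the authors' code: it assumes the usual definitions of the five metrics, and the function name `diagnostic_metrics` is hypothetical. The counts are read off the undifferentiated histology rows for the overall test set (48 true positives, 7 false negatives, 54 true negatives, 1 false positive).

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return the five table metrics as percentages rounded to one decimal.

    tp, fn, tn, fp are counts of true positives, false negatives,
    true negatives and false positives (hypothetical helper, not from the paper).
    """
    total = tp + fn + tn + fp
    raw = {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
    return {name: round(100 * value, 1) for name, value in raw.items()}


# AI model, overall test set, undifferentiated histology:
# sensitivity 48/55, specificity 54/55, PPV 48/49, NPV 54/61, accuracy 102/110.
histology = diagnostic_metrics(tp=48, fn=7, tn=54, fp=1)
print(histology)
```

The same function can be applied to any other AI model row by reading the four counts from its n/N cells.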

* P < 0.05, when accuracy was compared with that of the AI system using McNemar’s test
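McNemar's test compares two raters on the same questions and depends only on the discordant pairs. The table does not report those pair counts, nor whether the exact or the chi-square form of the test was used, so the following is a minimal sketch of the exact (binomial) form with made-up counts. The function name `mcnemar_exact` and the values 10 and 2 are assumptions for illustration.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: questions the first rater got right and the second got wrong
    c: questions the second rater got right and the first got wrong
    Under the null hypothesis, min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, NOT taken from the paper.
p_value = mcnemar_exact(b=10, c=2)
print(f"exact McNemar p = {p_value:.4f}")
```

Questions that both raters answered correctly, or both answered incorrectly, do not enter the calculation.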

** P < 0.05, when the mean accuracy was compared with that of the experts using the Mann–Whitney U test
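The Mann–Whitney U test compares two independent groups by rank, which suits a comparison of per-reader accuracies between experts and novices. Per-reader accuracies are not given in the table, so the sketch below uses invented values and an exact permutation version of the test that is practical only for small groups. The function names and all numbers are assumptions, and the authors may have used a different implementation.

```python
from itertools import combinations


def u_statistic(x, y):
    """Count pairs (a, b) with a from x and b from y where a > b; ties count half."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


def mann_whitney_exact(x, y):
    """Return (U, two-sided exact p-value) by enumerating all group relabelings."""
    pooled = list(x) + list(y)
    n1, n = len(x), len(pooled)
    mean_u = n1 * len(y) / 2  # expected U under the null hypothesis
    u_obs = u_statistic(x, y)
    observed_dev = abs(u_obs - mean_u)
    hits = total = 0
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Count relabelings at least as extreme as the observed split.
        if abs(u_statistic(gx, gy) - mean_u) >= observed_dev - 1e-12:
            hits += 1
    return u_obs, hits / total


# Hypothetical per-reader accuracies (%), NOT taken from the paper.
experts = [70.0, 72.0, 75.0, 71.0]
novices = [55.0, 60.0, 58.0, 62.0]
u, p = mann_whitney_exact(experts, novices)
print(f"U = {u}, exact p = {p:.4f}")
```

For larger groups the enumeration grows quickly, and a normal approximation or a statistics library would be the usual choice.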