Table 2.
Diagnostic accuracy of GPT-3.5 and GPT-4 for detecting AIS, LVO stroke, and cerebral hemorrhage.
| Measure | GPT-3.5 | GPT-4 | P value (a) |
|---|---|---|---|
| AIS | | | 0.015 |
| Sensitivity, % (95% CI) | 94.3 (91.5–96.3) | 93.5 (90.5–95.6) | |
| Specificity, % (95% CI) | 24.4 (14.2–38.7) | 55.6 (41.2–69.1) | |
| Accuracy, % (95% CI) | 86.5 (82.8–89.5) | 89.3 (85.8–91.9) | |
| PPV, % (95% CI) | 90.8 (87.4–93.3) | 94.3 (91.4–96.3) | |
| NPV, % (95% CI) | 35.5 (21.1–53.1) | 52.1 (38.3–65.5) | |
| PLR (95% CI) | 1.25 (0.93–1.67) | 2.10 (0.73–2.73) | |
| AUC (95% CI) | 0.59 (0.50–0.69) | 0.75 (0.65–0.84) | |
| LVO stroke | | | <0.001 |
| Sensitivity, % (95% CI) | 81.2 (71.6–88.1) | 77.7 (67.7–85.2) | |
| Specificity, % (95% CI) | 38.1 (32.9–43.6) | 64.1 (58.7–69.2) | |
| Accuracy, % (95% CI) | 47.3 (42.4–52.2) | 67.0 (62.3–71.4) | |
| PPV, % (95% CI) | 26.1 (21.2–31.8) | 36.9 (30.2–44.1) | |
| NPV, % (95% CI) | 88.2 (81.8–92.6) | 91.4 (87.0–94.4) | |
| PLR (95% CI) | 1.31 (0.90–1.92) | 2.16 (1.50–3.12) | |
| AUC (95% CI) | 0.60 (0.53–0.66) | 0.71 (0.65–0.77) | |
| Cerebral hemorrhage | | | 0.031 |
| Sensitivity, % (95% CI) | 21.2 (-4.0–46.5) | 45.5 (41.7–49.3) | |
| Specificity, % (95% CI) | 95.6 (91.6–99.5) | 94.1 (92.5–95.7) | |
| Accuracy, % (95% CI) | 89.2 (86.1–92.4) | 90.0 (88.7–91.3) | |
| PPV, % (95% CI) | 29.3 (1.1–57.5) | 40.5 (30.8–50.2) | |
| NPV, % (95% CI) | 93.1 (87.2–99.0) | 95.1 (93.5–96.7) | |
| PLR (95% CI) | 1.01 (0.471–2.10) | 7.59 (5.23–11.06) | |
| AUC (95% CI) | 0.58 (0.47–0.70) | 0.69 (0.59–0.81) | |
Abbreviations: AIS = acute ischemic stroke; AUC = area under the receiver operating characteristic curve; CI = confidence interval; LVO stroke = large vessel occlusion stroke; NPV = negative predictive value; PLR = positive likelihood ratio; PPV = positive predictive value.
(a) The P value reflects the significance of the difference between GPT-3.5 and GPT-4 in their proportions of correct and incorrect predictions, as determined by the McNemar test.
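For readers who want to see how the measures in Table 2 relate to one another, the following is a minimal sketch using the standard definitions from a 2×2 confusion matrix, plus an exact (binomial) form of the McNemar test. The function names (`diagnostic_metrics`, `plr_from_percent`, `mcnemar_exact`) are hypothetical, and the source does not state whether the exact or the chi-square version of the McNemar test was used, so the exact version here is an assumption. The only table values used are the reported AIS sensitivity and specificity for GPT-3.5.

```python
from math import comb


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic measures from 2x2 confusion-matrix counts."""
    sens = tp / (tp + fn)  # true positive rate
    spec = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "plr": sens / (1 - spec),  # positive likelihood ratio
    }


def plr_from_percent(sens_pct: float, spec_pct: float) -> float:
    """PLR = sensitivity / (1 - specificity), with inputs given in percent."""
    return (sens_pct / 100) / (1 - spec_pct / 100)


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value.

    b: cases model A classified correctly and model B incorrectly
    c: cases model A classified incorrectly and model B correctly
    Only these discordant pairs enter the test; under the null
    hypothesis each one is equally likely to fall in b or c.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Reported AIS sensitivity and specificity for GPT-3.5 (Table 2)
print(f"{plr_from_percent(94.3, 24.4):.2f}")  # prints 1.25
```

The per-case discordant counts needed for `mcnemar_exact` are not reported in the table, so the P values shown there cannot be recomputed from the table alone.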