2023 Nov 2;9:e47532. doi: 10.2196/47532

Table 1.

Diagnostic accuracy and triage accuracy of GPT-4 and physicians.

Accuracy                    GPT-4, n (%; 95% CI [a])    Consensus of 3 physicians, n (%; 95% CI)    P value [b]

Diagnosis
  Overall (n=45)            44 (97.8; 88.2-99.9)        41 (91.1; 79-98)                            .38
  Self-care (n=15)          15 (100; 78.2-100)          14 (93.3; 68.1-99.8)                        .99
  Nonemergent care (n=15)   15 (100; 78.2-100)          15 (100; 78.2-100)                          .99
  Emergent care (n=15)      14 (93.3; 68.1-99.8)        12 (80.0; 51.9-95.7)                        .13

Triage
  Overall (n=45)            30 (66.7; 51.0-80.0)        30 (66.7; 51.0-80.0)                        .99
  Self-care (n=15)          2 (13.3; 1.7-40.5)          6 (40.0; 16.3-67.7)                         .22
  Nonemergent care (n=15)   15 (100; 78.2-100)          11 (73.3; 44.9-92.2)                        .13
  Emergent care (n=15)      13 (86.7; 59.5-98.3)        13 (86.7; 59.5-98.3)                        .99

[a] CIs were calculated using the Clopper-Pearson method and are reported as percentages.

[b] The performance of GPT-4 and that of the physicians were compared using the McNemar test.
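
The two methods named in the footnotes can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function names (`clopper_pearson`, `mcnemar_exact`) are hypothetical, the Clopper-Pearson bounds are found by simple bisection on the binomial tail probabilities, and the McNemar function assumes the exact (binomial) variant, since the table does not say which variant was used. The McNemar test also needs the paired discordant counts (cases where only one rater was correct), which the table does not report, so any counts passed to it here are placeholders.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_increasing(g, iters=100):
    """Bisection on [0, 1] for a function g that increases through zero."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for k successes out of n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    if k == 0:
        lower = 0.0
    else:
        lower = _root_of_increasing(
            lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
        )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    if k == n:
        upper = 1.0
    else:
        upper = _root_of_increasing(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


def mcnemar_exact(b, c):
    """Exact two-sided McNemar P value from the two discordant counts.

    b: pairs where only the first rater was correct (hypothetical input)
    c: pairs where only the second rater was correct (hypothetical input)
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Counts taken from the table: GPT-4 overall diagnosis, 44 of 45 correct.
    lo, hi = clopper_pearson(44, 45)
    print(f"44/45: {100 * lo:.1f}-{100 * hi:.1f}")
    # Placeholder discordant counts, NOT taken from the table.
    print(f"McNemar exact P: {mcnemar_exact(4, 1):.3f}")
```

For 44 of 45 correct, the interval from this sketch should come out at roughly 88.2 to 99.9, in line with the first row of the table. The McNemar call only demonstrates the mechanics; reproducing the reported P values would require the per-vignette paired results.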