2023 Nov 2;9:e47532. doi: 10.2196/47532

Table 2.

Comparison of the diagnostic and triage accuracy of GPT-4 with and without racial and ethnic conditions. All CIs were calculated using the Clopper-Pearson method and are reported as percentages.

Each cell shows correct answers, n (%; 95% CI).

| Accuracy | Without racial and ethnic conditions | With conditions: Black | With conditions: White | With conditions: Asian | With conditions: Hispanic |
|---|---|---|---|---|---|
| Diagnosis | | | | | |
| Overall (n=45) | 44 (97.8; 88.2-99.9) | 45 (100; 92.1-100)a | 45 (100; 92.1-100)a | 45 (100; 92.1-100)a | 45 (100; 92.1-100)a |
| Emergent care (n=15) | 15 (100; 78.2-100) | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a |
| Nonemergent care (n=15) | 15 (100; 78.2-100) | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a |
| Self-care (n=15) | 14 (93.3; 68.1-99.8) | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a |
| Triage | | | | | |
| Overall (n=45) | 30 (66.7; 51.0-80.0) | 28 (62.2; 46.5-76.2)b | 30 (66.7; 51.0-80.0)a | 30 (66.7; 51.0-80.0)a | 28 (62.2; 46.5-76.2)c |
| Emergent care (n=15) | 13 (86.7; 59.5-98.3) | 11 (73.3; 44.9-92.2)b | 12 (80.0; 51.9-95.7)a | 15 (100; 78.2-100)b | 15 (100; 78.2-100)b |
| Nonemergent care (n=15) | 15 (100; 78.2-100) | 15 (100; 78.2-100)a | 14 (93.3; 68.1-99.8)a | 14 (93.3; 68.1-99.8)a | 12 (80.0; 51.9-95.7)d |
| Self-care (n=15) | 2 (13.3; 1.7-40.5) | 2 (13.3; 1.7-40.5)a | 4 (26.7; 7.8-55.1)b | 1 (6.7; 0.2-31.9)a | 1 (6.7; 0.2-31.9)a |

a P=.99.

b P=.5.

c P=.69.

d P=.25.
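The caption names the Clopper-Pearson method for the CIs. The footnoted P values do not say which test produced them. As a minimal sketch using only the Python standard library, the code below computes an exact Clopper-Pearson interval by bisecting the binomial tail probabilities. It also computes an exact (binomial) McNemar test on paired correct/incorrect answers. The McNemar test is an assumption, not something the table states. The helper names `binom_cdf`, `clopper_pearson` and `exact_mcnemar` are hypothetical illustrations, not the authors' code.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Root of a monotone f on [lo, hi] whose sign differs at the two ends."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided 1 - alpha CI for x successes in n trials."""
    # Lower bound solves P(X >= x | p) = alpha / 2; it is 0 when x == 0.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha / 2; it is 1 when x == n.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


def exact_mcnemar(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the two discordant-pair counts.

    ASSUMPTION: the table does not name its test; this is one plausible choice.
    """
    n = b + c
    if n == 0:
        return 1.0
    # Under H0, each discordant pair falls either way with probability 1/2.
    return min(1.0, 2 * binom_cdf(min(b, c), n, 0.5))


for x, n in [(44, 45), (45, 45), (15, 15)]:
    lo, hi = clopper_pearson(x, n)
    print(f"{x}/{n}: {100 * x / n:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f})")
# 44/45: 97.8% (95% CI 88.2-99.9)
# 45/45: 100.0% (95% CI 92.1-100.0)
# 15/15: 100.0% (95% CI 78.2-100.0)

print(exact_mcnemar(0, 3))  # 0.25
print(exact_mcnemar(0, 2))  # 0.5
```

Nonemergent triage was 15 of 15 correct without conditions and 12 of 15 with the Hispanic condition. All three discordant pairs therefore run the same way, and `exact_mcnemar(0, 3)` gives 0.25, matching footnote d. Emergent triage with the Asian condition (15/15 vs 13/15) gives 2 one-directional pairs and 0.5, matching footnote b. This agreement is consistent with an exact McNemar test but does not confirm which test the authors used.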