Table 2.
Comparison of diagnostic and triage accuracy of GPT-4 with racial and ethnic conditions. All the CIs were calculated using the Clopper-Pearson method and are reported in percentages.
Accuracy | Correct answers without racial and ethnic conditions, n (%; 95% CI) | Correct answers with racial and ethnic conditions, n (%; 95% CI) | | | |
| | Black | White | Asian | Hispanic |
Diagnosis | | | | | |
Overall (n=45) | 44 (97.8; 88.2-99.9) | 45 (100; 92.1-100)a | 45 (100; 92.1-100)a | 45 (100; 92.1-100)a | 45 (100; 92.1-100)a |
Emergent care (n=15) | 15 (100; 78.2-100) | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a |
Nonemergent care (n=15) | 15 (100; 78.2-100) | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a |
Self-care (n=15) | 14 (93.3; 68.1-99.8) | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a | 15 (100; 78.2-100)a |
Triage | | | | | |
Overall (n=45) | 30 (66.7; 51.0-80.0) | 28 (62.2; 46.5-76.2)b | 30 (66.7; 51.0-80.0)a | 30 (66.7; 51.0-80.0)a | 28 (62.2; 46.5-76.2)c |
Emergent care (n=15) | 13 (86.7; 59.5-98.3) | 11 (73.3; 44.9-92.2)b | 12 (80.0; 51.9-95.7)a | 15 (100; 78.2-100)b | 15 (100; 78.2-100)b |
Nonemergent care (n=15) | 15 (100; 78.2-100) | 15 (100; 78.2-100)a | 14 (93.3; 68.1-99.8)a | 14 (93.3; 68.1-99.8)a | 12 (80.0; 51.9-95.7)d |
Self-care (n=15) | 2 (13.3; 1.7-40.5) | 2 (13.3; 1.7-40.5)a | 4 (26.7; 7.8-55.1)b | 1 (6.7; 0.2-31.9)a | 1 (6.7; 0.2-31.9)a |
a P value=.99.
b P value=.5.
c P value=.69.
d P value=.25.
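
The caption states that the CIs use the Clopper-Pearson (exact binomial) method. As an illustration only, the sketch below shows one way such an interval can be computed from a count of correct answers and a sample size. It is not the authors' code: the function names `binom_cdf` and `clopper_pearson` are hypothetical, and the bounds are found by simple bisection on the binomial tail probabilities rather than by any particular statistics package.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) CI for a proportion with k successes in n trials.

    Hypothetical helper for illustration; assumes 0 <= k <= n and n > 0.
    """

    def solve(f, target: float) -> float:
        # f is increasing in p on [0, 1]; bisect for the p where f(p) = target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) equals alpha / 2 (0 when k = 0).
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2 (1 when k = n).
    upper = 1.0 if k == n else solve(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    low, high = clopper_pearson(44, 45)
    # Rounds to 88.2 and 99.9, the interval reported for overall diagnosis
    # without racial and ethnic conditions.
    print(round(low * 100, 1), round(high * 100, 1))
```

When every answer is correct (for example 15 of 15), the upper bound is fixed at 100% and only the lower bound is informative, which is why those rows all share the same interval.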