2023 Nov 2;9:e47532. doi: 10.2196/47532

Table 1.

Diagnostic accuracy and triage accuracy of GPT-4 and physicians.

Accuracy	GPT-4, n (%; 95% CI^a)	Consensus of 3 physicians, n (%; 95% CI)	P value^b
Diagnosis
  Overall (n=45)	44 (97.8; 88.2-99.9)	41 (91.1; 79-98)	.38
  Self-care (n=15)	15 (100; 78.2-100)	14 (93.3; 68.1-99.8)	.99
  Nonemergent care (n=15)	15 (100; 78.2-100)	15 (100; 78.2-100)	.99
  Emergent care (n=15)	14 (93.3; 68.1-99.8)	12 (80.0; 51.9-95.7)	.13
Triage
  Overall (n=45)	30 (66.7; 51.0-80.0)	30 (66.7; 51.0-80.0)	.99
  Self-care (n=15)	2 (13.3; 1.7-40.5)	6 (40.0; 16.3-67.7)	.22
  Nonemergent care (n=15)	15 (100; 78.2-100)	11 (73.3; 44.9-92.2)	.13
  Emergent care (n=15)	13 (86.7; 59.5-98.3)	13 (86.7; 59.5-98.3)	.99

^aCIs were calculated using the Clopper-Pearson method and are reported as percentages.

^bThe performance of GPT-4 and that of physicians were compared using the McNemar test.
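
For readers who want to see how the two methods named in the footnotes work, the following is a minimal Python sketch using only the standard library. It is not the authors' code. The function names (`binom_cdf`, `clopper_pearson`, `mcnemar_exact`) are hypothetical, and the exact (binomial) form of the McNemar test is an assumption, since the footnote does not say which variant was used. The table also does not report the discordant pair counts that the McNemar test needs, so the counts in the usage lines are purely illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for the proportion k/n, returned as fractions."""

    def solve(tail, increasing):
        # Bisection on [0, 1] for the p at which tail(p) == alpha / 2.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if (tail(mid) < alpha / 2) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: p such that P(X >= k | p) = alpha / 2 (increasing in p).
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p), True)
    # Upper bound: p such that P(X <= k | p) = alpha / 2 (decreasing in p).
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), False)
    return lower, upper


def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value from the two discordant pair counts.

    b: pairs where only the first rater was correct
    c: pairs where only the second rater was correct
    Under the null hypothesis, b ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


# Interval for 15 correct out of 15 cases, printed as percentages.
low, high = clopper_pearson(15, 15)
print(f"{100 * low:.1f}-{100 * high:.1f}")

# Hypothetical discordant counts (NOT taken from the table).
print(mcnemar_exact(4, 1))  # 0.375
```

Bisection is used here only to avoid a dependency on a statistics library. The same bounds can also be written as quantiles of a beta distribution, which is how statistical packages usually compute them.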