2023 Dec 6;9:e52202. doi: 10.2196/52202

Table 2.

Comparison of the scores achieved by GPT-4 and Japanese medical residents across various clinical fields (N=137).

Field	Questions, n (%)	Examinees, % (95% CI)	GPT-4, % (95% CI)	Difference (GPT-4 minus examinees), percentage points	P value
General practice	19 (13.9)	71.8 (61.0-82.6)	63.2 (41.5-84.8)	–8.6	.40
Internal medicine	48 (35.0)	55.2 (49.4-60.9)	81.3 (70.2-92.3)	26.1	<.001^a
Surgery	9 (6.6)	57.6 (41.3-74.0)	77.8 (50.6-105)	20.2	.22
Pediatrics	12 (8.8)	55.1 (39.6-70.5)	66.7 (40.0-93.3)	11.6	.42
Obstetrics and gynecology	15 (10.9)	49.1 (38.8-59.4)	80.0 (59.6-100)	30.9	.02^a
Emergency	19 (13.9)	48.1 (37.7-58.5)	57.9 (35.7-80.1)	9.8	.39
Psychiatry	15 (10.9)	53.8 (40.4-67.2)	46.7 (21.4-71.9)	–7.1	.58

^aStatistically significant.
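
The table does not say how the 95% CIs for GPT-4 were computed. As a plausibility check only, the sketch below assumes a normal-approximation (Wald) interval for a proportion, p ± 1.96·sqrt(p(1−p)/n), and applies it to correct-answer counts inferred from the reported percentages (for example, 7 of 9 for Surgery). Both the interval method and the counts are assumptions, not values stated in the source.

```python
import math


def wald_ci(correct, total, z=1.96):
    """Normal-approximation (Wald) interval for a proportion, returned as fractions."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width


# Hypothetical counts inferred from the table's percentages; not given in the source.
examples = {"General practice": (12, 19), "Surgery": (7, 9)}

for field, (correct, total) in examples.items():
    low, high = wald_ci(correct, total)
    share = 100 * correct / total
    print(f"{field}: {share:.1f}% ({100 * low:.1f} to {100 * high:.1f})")
```

Under these assumptions the Surgery upper bound lands just above 100%, which would account for the 105 reported in the table: a Wald interval is not constrained to the 0-100% range when the number of questions is small. An interval such as Wilson's stays within range, at the cost of no longer matching the reported figures.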