2023 Dec 6;9:e52202. doi: 10.2196/52202

Table 2.

Comparison of the scores achieved by GPT-4 and Japanese medical residents across various clinical fields (N=137).

| Fields | Questions, n (%) | Examinees, % (95% CI) | GPT-4, % (95% CI) | Differences | P value |
| --- | --- | --- | --- | --- | --- |
| General practice | 19 (13.9) | 71.8 (61.0-82.6) | 63.2 (41.5-84.8) | –8.6 | .40 |
| Internal medicine | 48 (35.0) | 55.2 (49.4-60.9) | 81.3 (70.2-92.3) | 26.1 | <.001a |
| Surgery | 9 (6.6) | 57.6 (41.3-74.0) | 77.8 (50.6-105) | 20.2 | .22 |
| Pediatrics | 12 (8.8) | 55.1 (39.6-70.5) | 66.7 (40.0-93.3) | 11.6 | .42 |
| Obstetrics and gynecology | 15 (10.9) | 49.1 (38.8-59.4) | 80.0 (59.6-100) | 30.9 | .02a |
| Emergency | 19 (13.8) | 48.1 (37.7-58.5) | 57.9 (35.7-80.1) | 9.8 | .39 |
| Psychiatry | 15 (10.9) | 53.8 (40.4-67.2) | 46.7 (21.4-71.9) | –7.1 | .58 |

a Statistically significant.
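
The GPT-4 percentages and intervals in Table 2 can be approximately reproduced with a normal-approximation (Wald) interval for a proportion. The sketch below is illustrative only: the table does not state which interval method was used, and the per-field counts of correct answers (for example 7 of 9 for surgery) are assumptions inferred from the reported percentages, not values given in the source. The function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(correct, total, z=1.96):
    """Return (percent, lower, upper) using a normal-approximation (Wald) interval.

    The bounds are not clipped to [0, 100], so a small sample with a high
    score can yield an upper bound above 100.
    """
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * (p - half_width), 100 * (p + half_width)


# (field, number of questions, ASSUMED number answered correctly by GPT-4)
# The correct counts are inferred from the reported percentages.
assumed_rows = [
    ("General practice", 19, 12),
    ("Internal medicine", 48, 39),
    ("Surgery", 9, 7),
    ("Pediatrics", 12, 8),
    ("Emergency", 19, 11),
    ("Psychiatry", 15, 7),
]

for field, total, correct in assumed_rows:
    pct, lower, upper = wald_ci(correct, total)
    print(f"{field}: {pct:.1f} ({lower:.1f}-{upper:.1f})")
```

Because the Wald interval is unbounded, it would explain an upper limit above 100% such as the one reported for surgery; an interval method bounded to 0-100% (for example the Wilson score interval) would not produce that value.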