2023 Jun 29;9:e48002. doi: 10.2196/48002

Table 3.

Comparison of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 in the Japanese Medical Licensing Examination (JMLE) by difficulty level^a.

Difficulty level	Question (n=254), n (%)	Examinee correct response rate^b (%)	GPT-3.5 correct response rate (%; 95% CI)	GPT-4 correct response rate (%; 95% CI)	P value
Easy	82 (32.3)	98.7	69.5 (59.3-79.7)	87.8 (80.6-95.0)	.001
Normal	112 (44.1)	90.2	46.4 (37.0-55.8)	77.7 (69.8-85.5)	<.001
Hard	60 (23.6)	56.3	33.3 (21.1-45.6)	73.3 (61.8-84.8)	<.001

^aDifficulty level was classified according to the percentage of correct responses reported by medu4 [16], Japan’s leading preparatory school for the JMLE: easy, ≥97%; normal, 80% to 96.9%; and hard, ≤79.9%.

^bThe correct response rates of examinees were obtained from the 117th JMLE, as announced by the Ministry of Health, Labour and Welfare [15].
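
The difficulty bands in footnote a can be sketched as a small lookup. This is a minimal illustration, not the authors' or medu4's actual procedure: the function name `classify_difficulty` is hypothetical, and the boundary handling (97.0% counts as easy, anything below 80% counts as hard) is an assumption that rates are reported to one decimal place.

```python
def classify_difficulty(correct_rate_pct: float) -> str:
    """Map a correct response rate (in percent) to a difficulty label.

    Boundary handling is an assumption: 97.0 is treated as "easy" and
    any value below 80.0 as "hard".
    """
    if not 0.0 <= correct_rate_pct <= 100.0:
        raise ValueError("correct response rate must be between 0 and 100")
    if correct_rate_pct >= 97.0:
        return "easy"
    if correct_rate_pct >= 80.0:
        return "normal"
    return "hard"


# Illustration only: the per-level examinee rates from the table fall
# inside the bands they are listed under.
for rate in (98.7, 90.2, 56.3):
    print(rate, classify_difficulty(rate))
```

Note that the table's examinee rates are per-level aggregates from the Ministry of Health, Labour and Welfare, whereas the classification itself used per-question rates from medu4, so the loop above is only a sanity check on the band edges.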