Table 3.
Comparison of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 performance on the Japanese Medical Licensing Examination (JMLE), by difficulty levelᵃ.
Difficulty level | Questions (n=254), n (%) | Examinee correct response rateᵇ (%) | GPT-3.5 correct response rate (%; 95% CI) | GPT-4 correct response rate (%; 95% CI) | P value |
Easy | 82 (32.3) | 98.7 | 69.5 (59.3-79.7) | 87.8 (80.6-95.0) | .001 |
Normal | 112 (44.1) | 90.2 | 46.2 (37.0-55.8) | 77.7 (69.8-85.5) | <.001 |
Hard | 60 (23.6) | 56.3 | 33.3 (21.1-45.6) | 73.3 (61.8-84.8) | <.001 |
ᵃDifficulty level was classified by the percentage of correct responses provided by medu4 [16], Japan’s leading preparatory school for the JMLE: easy, >97%; normal, 80% to 96.9%; and hard, <79.9%.
ᵇThe correct response rates of examinees were obtained from the 117th JMLE, as announced by the Ministry of Health, Labour and Welfare [15].
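The table does not state how its 95% CIs were computed. As a hedged sketch, the Python below assumes a normal-approximation (Wald) interval and uses correct-answer counts back-calculated from the reported percentages (for example, 69.5% of 82 easy questions is 57 correct). The function name `wald_ci` and the `ROWS` mapping are illustrative assumptions, not part of the study. The GPT-3.5 Normal cell is left out because 46.2% of 112 questions is not a whole number of questions.

```python
import math
from statistics import NormalDist


def wald_ci(correct: int, total: int, level: float = 0.95) -> tuple[float, float, float]:
    """Percent correct with a normal-approximation (Wald) confidence interval.

    Returns (rate, lower, upper) in percent, each rounded to one decimal place.
    This is an assumed method; the source table does not name its CI method.
    """
    p = correct / total
    # Two-sided critical value from the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p * (1 - p) / total)
    return (
        round(100 * p, 1),
        round(100 * (p - half_width), 1),
        round(100 * (p + half_width), 1),
    )


# Hypothetical counts back-calculated from the reported percentages and
# question totals (82 easy, 112 normal, 60 hard).
ROWS = {
    ("Easy", "GPT-3.5"): (57, 82),
    ("Easy", "GPT-4"): (72, 82),
    ("Normal", "GPT-4"): (87, 112),
    ("Hard", "GPT-3.5"): (20, 60),
    ("Hard", "GPT-4"): (44, 60),
}

if __name__ == "__main__":
    for (difficulty, model), (k, n) in ROWS.items():
        rate, lo, hi = wald_ci(k, n)
        print(f"{difficulty:<6} {model:<8} {rate:5.1f} ({lo:.1f}-{hi:.1f})")
```

Under these assumptions, the point estimates match the table and the Wald bounds land within about 0.3 percentage points of the reported ones. The Wald intervals are consistently slightly narrower, so the authors may have used a different interval method (for instance, a t-based one); treat this as an approximation rather than a reproduction.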