Table 1.
Comparison of correct response rates of GPT-3.5 and GPT-4 (GPT: Generative Pre-trained Transformer) for essential knowledge questions and other question categories in the Japanese Medical Licensing Examination (JMLE).
| Question category | Questions (n=254), n (%) | Examinee correct response rate^a (%) | GPT-3.5 correct response rate (%; 95% CI) | GPT-4 correct response rate (%; 95% CI) | P value |
| --- | --- | --- | --- | --- | --- |
| All questions | 254 (100) | 84.9 | 50.8 (44.6-57.0) | 79.9 (75.0-84.9) | <.001 |
| Essential knowledge | 78 (30.7) | 89.2 | 55.1 (43.8-66.4) | 87.2 (79.6-94.8) | <.001 |
| General clinical | 105 (41.3) | 83.1 | 43.8 (34.2-53.5) | 73.3 (64.7-81.9) | <.001 |
| Specific disease | 71 (28.0) | 83 | 56.3 (44.5-68.2) | 81.7 (72.5-90.9) | <.001 |
^a The correct response rates of examinees were obtained from the results of the 117th JMLE, as announced by the Ministry of Health, Labour and Welfare [15].
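The table does not state how the 95% CIs were computed. As a rough illustration only, the sketch below assumes a normal-approximation (Wald) interval for a proportion and applies it to the rounded rates and question counts reported above. The function name `wald_ci_percent` and the choice of interval are assumptions, not the authors' stated method, so the bounds can differ slightly from the published ones.

```python
import math


def wald_ci_percent(rate_percent, n, z=1.96):
    """Approximate 95% CI (in percent) for a proportion.

    Assumption: a normal-approximation (Wald) interval; the source
    table does not say which interval method was actually used.
    """
    p = rate_percent / 100.0
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return 100.0 * (p - half_width), 100.0 * (p + half_width)


# (number of questions, GPT-3.5 rate %, GPT-4 rate %) copied from Table 1
rows = {
    "All questions": (254, 50.8, 79.9),
    "Essential knowledge": (78, 55.1, 87.2),
    "General clinical": (105, 43.8, 73.3),
    "Specific disease": (71, 56.3, 81.7),
}

for category, (n, gpt35, gpt4) in rows.items():
    lo35, hi35 = wald_ci_percent(gpt35, n)
    lo4, hi4 = wald_ci_percent(gpt4, n)
    print(f"{category}: GPT-3.5 {lo35:.1f}-{hi35:.1f}, GPT-4 {lo4:.1f}-{hi4:.1f}")
```

Because the inputs are rates already rounded to one decimal place, and the interval method is assumed, this sketch is suited to sanity-checking the width of the reported intervals rather than reproducing them exactly.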