Table 2.
Comparison of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 by question type in the Japanese Medical Licensing Examination (JMLE).
Question type | Questions (n=254), n (%) | Examinee correct response rate^a (%) | GPT-3.5 correct response rate (%; 95% CI) | GPT-4 correct response rate (%; 95% CI) | P value |
General | 134 (52.7) | 84 | 51.5 (42.9-60.0) | 79.1 (72.1-86.1) | <.001 |
Clinical | 98 (38.6) | 85.3 | 50 (39.9-60.1) | 79.6 (71.5-87.7) | <.001 |
Clinical sentence | 22 (8.7) | 88.8 | 50 (27.3-72.7) | 86.3 (70.8-102) | .005 |
^a The correct response rates of examinees were obtained from the 117th JMLE, as announced by the Ministry of Health, Labour and Welfare [15].
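
The table does not state how the 95% CIs were computed. As a rough illustration of why an interval for a small group such as the clinical sentence questions (n=22) can extend past 100%, the sketch below compares a normal-approximation (Wald) interval with a Wilson score interval. The count of 19 correct answers out of 22 is an assumption inferred from the reported 86.3%, the function names are hypothetical, and neither method is claimed to be the one the authors used.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) CI for a proportion, in percent.

    The bounds are not clipped, so they can fall outside 0-100.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return 100 * (p - half_width), 100 * (p + half_width)


def wilson_ci(successes, n, z=1.96):
    """Wilson score CI for a proportion, in percent.

    By construction the bounds stay within 0-100.
    """
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return 100 * (centre - half_width), 100 * (centre + half_width)


# Assumed count: 19 of 22 correct (inferred from the reported 86.3%, not given in the table).
wald_low, wald_high = wald_ci(19, 22)
wilson_low, wilson_high = wilson_ci(19, 22)

print(f"Wald:   {wald_low:.1f}-{wald_high:.1f}")      # upper bound exceeds 100
print(f"Wilson: {wilson_low:.1f}-{wilson_high:.1f}")  # upper bound stays below 100
```

With a proportion near 1 and a small n, the symmetric Wald interval overshoots 100%, whereas the Wilson interval is asymmetric and remains a valid percentage range.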