Table 1.
| LLM, response | NBME[a]-Free-Step1 (n=87), n (%) | NBME-Free-Step2 (n=102), n (%) | AMBOSS-Step1 (n=100), n (%) | AMBOSS-Step2 (n=100), n (%) |
|---|---|---|---|---|
| ChatGPT[b] | | | | |
| Correct | 56 (64.4) | 59 (57.8) | 44 (44) | 42 (42) |
| Incorrect | 31 (35.6) | 43 (42.2) | 56 (56) | 58 (58) |
| InstructGPT | | | | |
| Correct | 45 (51.7) | 54 (52.9) | 36 (36) | 35 (35) |
| Incorrect | 42 (48.3) | 48 (47.1) | 64 (64) | 65 (65) |
| GPT-3 | | | | |
| Correct | 22 (25.3) | 19 (18.6) | 20 (20) | 17 (17) |
| Incorrect | 65 (74.7) | 83 (81.4) | 80 (80) | 83 (83) |

[a] NBME: National Board of Medical Examiners.
[b] ChatGPT: Chat Generative Pre-trained Transformer.
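
Each percentage in Table 1 is the count divided by the number of questions in that dataset, rounded to one decimal place. A minimal sketch that recomputes the accuracy figures from the counts follows. The names `TOTALS`, `CORRECT`, `pct`, and `accuracy_table` are illustrative assumptions and not part of the original analysis; only the numbers come from the table.

```python
# Number of questions per dataset, from the column headers of Table 1.
TOTALS = {
    "NBME-Free-Step1": 87,
    "NBME-Free-Step2": 102,
    "AMBOSS-Step1": 100,
    "AMBOSS-Step2": 100,
}

# Correct-answer counts per model, from the "Correct" rows of Table 1,
# listed in the same dataset order as TOTALS.
CORRECT = {
    "ChatGPT": [56, 59, 44, 42],
    "InstructGPT": [45, 54, 36, 35],
    "GPT-3": [22, 19, 20, 17],
}


def pct(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def accuracy_table():
    """Map each model to {dataset: percent correct}."""
    datasets = list(TOTALS)
    return {
        model: {
            name: pct(count, TOTALS[name])
            for name, count in zip(datasets, counts)
        }
        for model, counts in CORRECT.items()
    }


if __name__ == "__main__":
    for model, row in accuracy_table().items():
        print(model, row)
```

The "Incorrect" rows can be checked the same way, since each incorrect count is the dataset total minus the correct count.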