2024 Mar 1;5(3):100943. doi: 10.1016/j.patter.2024.100943

Table 1.

Answering accuracy of leading models against human performance on USMLE (test), MedMCQA (validation), and PubMedQA (test) datasets

Model                    Date   USMLE   MedMCQA   PubMedQA
Codex 5-shot CoT^a       2022   60.2    59.7      78.2
Llama 2 5-shot CoT^a     2023   62.5    53.6      -
Fine-tuned SOTA          2022   50.3    52.9      78.2
GPT-4                    2023   86.1    73.7      81.2
MedPalm v.2              2023   86.5    72.3      77.4
Human (passing score)    -      60.0    50.0      -
Human (expert score)     -      87.0    90.0      78.0

Find an overview of our results in supplemental information section A.

^a Our best methods.