2024 Mar 1;5(3):100943. doi: 10.1016/j.patter.2024.100943

Table 1.

Answering accuracy (%) of leading models compared with human performance on the USMLE (test), MedMCQA (validation), and PubMedQA (test) datasets

Model	Date	USMLE	MedMCQA	PubMedQA
Codex 5-shot CoT^a	2022	60.2	59.7	78.2
Llama 2 5-shot CoT^a	2023	62.5	53.6	–
Fine-tuned SOTA	2022	50.3	52.9	78.2
GPT-4	2023	86.1	73.7	81.2
MedPalm v.2	2023	86.5	72.3	77.4
Human (passing score)	–	60.0	50.0	–
Human (expert score)	–	87.0	90.0	78.0

Find an overview of our results in supplemental information section A.

^a Our best methods.
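As a reading aid, the comparison in Table 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the study's code: the values are transcribed from the table above, and the names `MODELS`, `HUMAN_PASSING`, `HUMAN_EXPERT`, `margin`, and `clears` are hypothetical helpers introduced here. Missing table entries are represented as `None` and skipped in comparisons.

```python
# Accuracy values (%) transcribed from Table 1, in the order
# (USMLE, MedMCQA, PubMedQA). None marks an entry the table leaves blank.
DATASETS = ("USMLE", "MedMCQA", "PubMedQA")

MODELS = {
    "Codex 5-shot CoT": (60.2, 59.7, 78.2),
    "Llama 2 5-shot CoT": (62.5, 53.6, None),
    "Fine-tuned SOTA": (50.3, 52.9, 78.2),
    "GPT-4": (86.1, 73.7, 81.2),
    "MedPalm v.2": (86.5, 72.3, 77.4),
}
HUMAN_PASSING = (60.0, 50.0, None)
HUMAN_EXPERT = (87.0, 90.0, 78.0)


def margin(scores, reference):
    """Per-dataset difference (model minus reference), rounded to one decimal.

    Returns None for a dataset where either value is missing.
    """
    return tuple(
        None if s is None or r is None else round(s - r, 1)
        for s, r in zip(scores, reference)
    )


def clears(scores, reference):
    """True if the model meets or exceeds the reference on every dataset
    where both values are available."""
    return all(
        s >= r for s, r in zip(scores, reference)
        if s is not None and r is not None
    )


if __name__ == "__main__":
    for name, scores in MODELS.items():
        gaps = dict(zip(DATASETS, margin(scores, HUMAN_EXPERT)))
        print(f"{name}: clears passing score = {clears(scores, HUMAN_PASSING)}, "
              f"margin vs. expert = {gaps}")
```

Under this transcription, every row except Fine-tuned SOTA meets the human passing score on both USMLE and MedMCQA; PubMedQA has no passing score listed, so it is excluded from that check.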