2024 Mar 1;5(3):100943. doi: 10.1016/j.patter.2024.100943

Figure 7.

Comparing open-source LLMs against the closed-source Codex on the MedQA-USMLE benchmark (τ=0.9, up to k=100 samples)

We report answering accuracy, model calibration, and answering bias.
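
To make the three reported quantities concrete, the following is a minimal sketch of how they could be computed from k sampled answers per question. It is not the authors' evaluation code. The majority-vote aggregation, the use of vote share as confidence, the expected calibration error (ECE) with equal-width bins, the definition of bias as predicted-minus-gold label frequency, and the four-option label set are all assumptions made for illustration; the function names are hypothetical.

```python
from collections import Counter


def majority_vote(samples):
    """Return (answer, confidence) for one question.

    Assumption: confidence is the vote share of the most frequent answer.
    """
    counts = Counter(samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


def evaluate(sampled_answers, gold, options=("A", "B", "C", "D"), n_bins=10):
    """Compute accuracy, ECE, and per-option answering bias.

    sampled_answers: list of lists, the sampled answer labels per question.
    gold: list of correct labels, one per question.
    options: assumed label set (hypothetical; adjust to the benchmark variant).
    """
    preds = [majority_vote(s) for s in sampled_answers]
    n = len(gold)

    # Accuracy: fraction of questions whose majority answer matches the gold label.
    accuracy = sum(p == g for (p, _), g in zip(preds, gold)) / n

    # ECE: weighted mean gap between confidence and accuracy within each bin.
    bins = [[] for _ in range(n_bins)]
    for (p, conf), g in zip(preds, gold):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, p == g))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            avg_acc = sum(ok for _, ok in b) / len(b)
            ece += len(b) / n * abs(avg_conf - avg_acc)

    # Bias: how much more (or less) often each option is predicted than it is correct.
    pred_freq = Counter(p for p, _ in preds)
    gold_freq = Counter(gold)
    bias = {o: (pred_freq[o] - gold_freq[o]) / n for o in options}

    return accuracy, ece, bias


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    samples = [["A", "A", "B", "A"], ["B", "B", "B", "B"], ["C", "C", "C", "D"]]
    gold = ["A", "B", "D"]
    print(evaluate(samples, gold))
```

In this sketch a perfectly calibrated model has an ECE of 0, and an unbiased model has a bias of 0 for every option; a positive bias for an option means the model selects it more often than it is correct.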