Skip to main content
. Author manuscript; available in PMC: 2024 Oct 15.
Published in final edited form as: Nat Med. 2024 Feb 27;30(4):1134–1142. doi: 10.1038/s41591-024-02855-5

Fig. 3 |. Identifying the best model/method.

Fig. 3 |

a, Impact of domain-specific fine-tuning. Alpaca versus Med-Alpaca. Each data point corresponds to one experimental configuration, and the dashed lines denote equal performance. b, Comparison of adaptation strategies. One in-context example (ICL) versus QLoRA across all open-source models on the Open-i radiology report dataset. c, Effect of context length for ICL. MEDCON scores versus number of in-context examples across models and datasets. We also included the best QLoRA fine-tuned model (FLAN-T5) as a horizontal dashed line for valid datasets. d, Head-to-head model comparison. Win percentages of each head-to-head model combination, where red/blue intensities highlight the degree to which models on the vertical axis outperform models on the horizontal axis.