2023 Jul 12;620(7972):172–180. doi: 10.1038/s41586-023-06291-2

Fig. 5. Evaluation of comprehension, retrieval and reasoning capabilities by clinicians.

a,b, Evaluation of correctness (a) and incorrectness (b) of reading comprehension, recall of knowledge and reasoning steps. The results indicate a gap between Flan-PaLM and clinicians, and show that Med-PaLM is able to substantially reduce the gap. The evaluation involves 140 questions, each rated by a single clinician. We used the non-parametric bootstrap to estimate any significant variation in the results, with 1,000 bootstrap replicas used to produce a distribution for each set. We used the 95% bootstrap percentile interval to assess variations.
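The bootstrap procedure described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes each of the 140 ratings is coded as 1 (property present) or 0 (absent), that the statistic of interest is the proportion of 1s, and that the percentile interval uses a simple nearest-rank convention. The function name `bootstrap_percentile_interval` and the example ratings are hypothetical and do not come from the paper.

```python
import random


def bootstrap_percentile_interval(ratings, n_replicas=1000, alpha=0.05, seed=0):
    """Return (lower, upper) bounds of a bootstrap percentile interval
    for the mean of `ratings` (assumed here to be 0/1 clinician ratings)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(ratings)

    # Draw n_replicas resamples of size n with replacement and record
    # the mean of each one.
    means = []
    for _ in range(n_replicas):
        resample = rng.choices(ratings, k=n)
        means.append(sum(resample) / n)
    means.sort()

    # Nearest-rank percentile bounds (an assumed convention), clamped to
    # valid indices of the sorted list of replica means.
    lo_idx = max(0, round((alpha / 2) * n_replicas))
    hi_idx = min(n_replicas - 1, round((1 - alpha / 2) * n_replicas) - 1)
    return means[lo_idx], means[hi_idx]


# Made-up example data: 140 ratings, 120 of them positive.
# These numbers are illustrative only and are NOT results from the paper.
example_ratings = [1] * 120 + [0] * 20
lower, upper = bootstrap_percentile_interval(example_ratings)
print(f"point estimate: {sum(example_ratings) / len(example_ratings):.3f}")
print(f"95% bootstrap percentile interval: [{lower:.3f}, {upper:.3f}]")
```

With 1,000 replicas and a 95% level, the bounds are read off near the 25th and 975th sorted replica means. Running the same procedure separately for each answer set (for example, once per model and once for clinicians) would give one interval per set, which is how the caption's "a distribution for each set" is interpreted here.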