Author manuscript; available in PMC: 2024 Oct 15.
Published in final edited form as: Nat Med. 2024 Feb 27;30(4):1134–1142. doi: 10.1038/s41591-024-02855-5

Fig. 4 | Clinical reader study.

a, Study design comparing summaries from the best model versus those of medical experts on three attributes: completeness, correctness and conciseness. b, Results. Highlight colors correspond to a value’s location on the color spectrum. Asterisks (*) denote statistical significance by a one-sided Wilcoxon signed-rank test, P < 0.001. c, Distribution of reader scores for each summarization task across attributes. Horizontal axes denote reader preference as measured by a five-point Likert scale. Vertical axes denote frequency count, with 1,500 total cases for each plot. d, Extent and likelihood of possible harm caused by choosing summaries from the medical expert (pink) or best model (purple) over the other. e, Reader study user interface.