Author manuscript; available in PMC: 2024 Oct 15.
Published in final edited form as: Nat Med. 2024 Feb 27;30(4):1134–1142. doi: 10.1038/s41591-024-02855-5

Extended Data Table 3 | Individual reader scores

| Task | Reader | Completeness | Correctness | Conciseness |
| --- | --- | --- | --- | --- |
| Radiology reports | 1 | 3.5 ± 5.6 | 1.7 ± 3.6 | 1.2 ± 4.8 |
| | 2 | 3.6 ± 6.6 | 2.5 ± 4.7 | −0.3 ± 5.4 |
| | 3 | 0.8 ± 2.9 | 0.6 ± 3.2 | −1.7 ± 3.0 |
| | 4 | 4.7 ± 4.7 | 2.9 ± 3.9 | 1.2 ± 3.8 |
| | 5 | 1.4 ± 4.0 | 0.6 ± 2.2 | −0.6 ± 3.4 |
| | Pooled | 2.8 ± 5.1 * | 1.7 ± 3.7 * | 0.0 ± 4.3 |
| | ICC | 0.45 | 0.58 | 0.48 |
| Patient questions | 1 | 1.7 ± 7.2 | 0.6 ± 3.4 | 0.3 ± 3.4 |
| | 2 | 1.0 ± 5.6 | −0.1 ± 3.6 | 0.1 ± 3.6 |
| | 3 | 2.3 ± 7.2 | 2.0 ± 5.3 | 2.2 ± 5.9 |
| | 4 | 1.9 ± 6.7 | 0.0 ± 0.0 | 0.0 ± 0.0 |
| | 5 | 0.9 ± 5.7 | 0.4 ± 3.6 | 0.4 ± 3.6 |
| | Pooled | 1.6 ± 6.5 * | 0.6 ± 3.7 * | 0.6 ± 3.9 * |
| | ICC | 0.67 | 0.31 | 0.21 |
| Progress notes | 1 | 3.4 ± 7.5 | 0.5 ± 2.5 | 0.1 ± 4.5 |
| | 2 | 2.3 ± 6.5 | 0.6 ± 4.4 | 0.4 ± 4.2 |
| | 3 | 2.7 ± 6.3 | 1.0 ± 4.4 | 0.9 ± 3.7 |
| | 4 | 2.5 ± 7.2 | 0.5 ± 6.8 | 1.7 ± 6.9 |
| | 5 | 2.0 ± 6.8 | −0.8 ± 4.5 | −0.1 ± 1.2 |
| | Pooled | 2.6 ± 6.9 * | 0.4 ± 4.8 | 0.6 ± 4.5 * |
| | ICC | 0.77 | 0.74 | 0.42 |
| Overall | Pooled | 2.3 ± 5.8 * | 0.8 ± 3.7 * | 0.4 ± 4.0 * |
| | ICC | 0.63 | 0.56 | 0.38 |

Reader study results evaluating completeness, correctness and conciseness (columns) across individual readers and pooled across readers. Scores are on the range [−10, 10], where positive scores denote that the best model is preferred to the medical expert. Asterisks (*) on pooled rows denote statistical significance by a one-sided Wilcoxon signed-rank test, P < 0.001. Intra-class correlation (ICC) values across readers are on a range of [−1, 1] where −1, 0 and +1 correspond to negative, no and positive correlations, respectively.
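To make the two statistics named in the caption concrete, the following is a minimal, standard-library-only sketch, not the authors' analysis code. It assumes each reader score is a signed preference in [−10, 10], tests whether scores tend to be positive with an exact one-sided Wilcoxon signed-rank test (zeros dropped, tied magnitudes given average ranks), and computes one common ICC form, the two-way random-effects, absolute-agreement, single-rater ICC(2,1). The caption does not state which ICC form was used, so that choice is an assumption, as are the function names and the sample data.

```python
def signed_rank_one_sided(scores):
    """Exact one-sided Wilcoxon signed-rank test (H1: scores tend to be > 0).

    Returns (w_plus, p_value). Zero scores are dropped and tied magnitudes
    share their average rank. Ranks are stored doubled so they stay integers.
    """
    nonzero = [s for s in scores if s != 0]
    if not nonzero:
        raise ValueError("all scores are zero; the test is undefined")
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    doubled = [0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]])):
            j += 1
        # Positions i..j hold ranks i+1..j+1; twice their average is i+j+2.
        for pos in range(i, j + 1):
            doubled[order[pos]] = i + j + 2
        i = j + 1
    w_plus2 = sum(r for r, s in zip(doubled, nonzero) if s > 0)
    # counts[t] = number of sign assignments whose doubled positive-rank
    # sum equals t (subset-sum dynamic programme over the ranks).
    total = sum(doubled)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in doubled:
        for t in range(total, r - 1, -1):
            counts[t] += counts[t - r]
    p_value = sum(counts[w_plus2:]) / 2 ** len(nonzero)
    return w_plus2 / 2, p_value


def icc_two_way(ratings):
    """ICC(2,1) for a table with one row per rated item, one column per reader.

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical data for illustration only; these are not values from the study.
hypothetical_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
w_plus, p_value = signed_rank_one_sided(hypothetical_scores)
print(w_plus, p_value)

hypothetical_ratings = [[1, 2, 1], [4, 5, 3], [8, 9, 9], [-2, 0, -1]]
print(icc_two_way(hypothetical_ratings))
```

The exact test enumerates sign assignments with a dynamic programme, which is practical for small samples. For sample sizes typical of a reader study, a library routine with a normal approximation would be the more usual choice.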