Extended Data Table 3
| Task | Reader | Completeness | Correctness | Conciseness |
|---|---|---|---|---|
| Radiology reports | 1 | 3.5 ± 5.6 | 1.7 ± 3.6 | 1.2 ± 4.8 |
| | 2 | 3.6 ± 6.6 | 2.5 ± 4.7 | −0.3 ± 5.4 |
| | 3 | 0.8 ± 2.9 | 0.6 ± 3.2 | −1.7 ± 3.0 |
| | 4 | 4.7 ± 4.7 | 2.9 ± 3.9 | 1.2 ± 3.8 |
| | 5 | 1.4 ± 4.0 | 0.6 ± 2.2 | −0.6 ± 3.4 |
| | Pooled | 2.8 ± 5.1 * | 1.7 ± 3.7 * | 0.0 ± 4.3 |
| | ICC | 0.45 | 0.58 | 0.48 |
| Patient questions | 1 | 1.7 ± 7.2 | 0.6 ± 3.4 | 0.3 ± 3.4 |
| | 2 | 1.0 ± 5.6 | −0.1 ± 3.6 | 0.1 ± 3.6 |
| | 3 | 2.3 ± 7.2 | 2.0 ± 5.3 | 2.2 ± 5.9 |
| | 4 | 1.9 ± 6.7 | 0.0 ± 0.0 | 0.0 ± 0.0 |
| | 5 | 0.9 ± 5.7 | 0.4 ± 3.6 | 0.4 ± 3.6 |
| | Pooled | 1.6 ± 6.5 * | 0.6 ± 3.7 * | 0.6 ± 3.9 * |
| | ICC | 0.67 | 0.31 | 0.21 |
| Progress notes | 1 | 3.4 ± 7.5 | 0.5 ± 2.5 | 0.1 ± 4.5 |
| | 2 | 2.3 ± 6.5 | 0.6 ± 4.4 | 0.4 ± 4.2 |
| | 3 | 2.7 ± 6.3 | 1.0 ± 4.4 | 0.9 ± 3.7 |
| | 4 | 2.5 ± 7.2 | 0.5 ± 6.8 | 1.7 ± 6.9 |
| | 5 | 2.0 ± 6.8 | −0.8 ± 4.5 | −0.1 ± 1.2 |
| | Pooled | 2.6 ± 6.9 * | 0.4 ± 4.8 | 0.6 ± 4.5 * |
| | ICC | 0.77 | 0.74 | 0.42 |
| Overall | Pooled | 2.3 ± 5.8 * | 0.8 ± 3.7 * | 0.4 ± 4.0 * |
| | ICC | 0.63 | 0.56 | 0.38 |
Reader study results for completeness, correctness and conciseness (columns), reported for individual readers and pooled across readers. Scores lie in the range [−10, 10], where positive scores denote that the best model is preferred to the medical expert. Asterisks (*) on pooled rows denote statistical significance by a one-sided Wilcoxon signed-rank test, P < 0.001. Intra-class correlation (ICC) values across readers lie in the range [−1, 1], where −1, 0 and +1 correspond to negative, no and positive correlation, respectively.
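
The caption names a one-sided Wilcoxon signed-rank test but does not say how it was computed. As a rough illustration only, the sketch below implements an exact version of that test for a small list of preference scores on the [−10, 10] scale. The function name `wilcoxon_one_sided`, the handling of zeros and ties, and the example scores are all assumptions made for this sketch, not details from the study. For realistic sample sizes a library routine such as `scipy.stats.wilcoxon(scores, alternative="greater")` would be the more practical choice.

```python
from itertools import product


def wilcoxon_one_sided(scores):
    """Exact one-sided Wilcoxon signed-rank test of H1: scores tend to exceed 0.

    Returns (w_plus, p_value). Zero scores are dropped and tied absolute
    values share their average rank (assumed conventions, not the study's).
    Full enumeration is only practical for small samples.
    """
    nonzero = [s for s in scores if s != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("all scores are zero")

    # Rank the absolute values, averaging ranks across ties.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Test statistic: sum of ranks belonging to positive scores.
    w_plus = sum(r for r, s in zip(ranks, nonzero) if s > 0)

    # Under H0 every sign pattern is equally likely; count those whose
    # positive-rank sum is at least as large as the observed one.
    hits = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, keep in zip(ranks, signs) if keep) >= w_plus
    )
    return w_plus, hits / 2 ** n


if __name__ == "__main__":
    # Hypothetical reader scores; positive means the model was preferred.
    example_scores = [3, -1, 2, 0, 4, 5, 1, -2]
    w, p = wilcoxon_one_sided(example_scores)
    print(f"W+ = {w}, one-sided p = {p:.4f}")
```

With only a handful of scores the smallest achievable p-value is 1/2^n, so a threshold such as P < 0.001 needs at least ten non-zero scores. In this sketch a pooled row would simply pass more scores to the same function.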