2023 Aug 28;30(12):1995–2003. doi: 10.1093/jamia/ocad177

Table 2.

ROUGE recall scores (R-1, R-2, R-L), mean word count (WC) with standard deviation (SD), and word and sentence error rates (ER), as measured by a dependency arc entailment (DAE) model, for our proposed models when comparing automated summaries with physician-written summaries.

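| Summarization task | Model | R-1 | R-2 | R-L | Word count mean (±SD) | Word-ER | Sent-ER |
|---|---|---|---|---|---|---|---|
| History of present illness (HPI) | Baseline: Textrank | 43.30 | 28.94 | 38.40 | 53 (±9) | 0.3 | 0.7 |
| History of present illness (HPI) | BART | 61.67 | 53.12 | 59.69 | 43 (±9) | 6.3 | 42.4 |
| History of present illness (HPI) | BART constrained | 61.02 | 52.78 | 59.05 | 44 (±11) | 5.7 | 40.6 |
| Daily narrative | Baseline: Textrank | 9.55 | 1.32 | 8.93 | 20 (±3) | 31.0 | 42.2 |
| Daily narrative | BART | 46.59 | 35.03 | 43.95 | 10 (±7) | 9.9 | 28.8 |
| Daily narrative | BART constrained | 46.42 | 34.77 | 43.75 | 11 (±13) | 7.2 | 26.7 |
| Discharge summary hospital course | Baseline: Textrank | 15.48 | 4.18 | 8.51 | 511 (±451) | | |
| Discharge summary hospital course | Day-to-day | 37.10 | 14.44 | 19.64 | 444 (±374) | | |
| Discharge summary hospital course | Day-to-day constrained | 35.97 | 13.76 | 18.83 | 421 (±365) | | |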

For the HPI and daily narrative tasks, the sentences within each summary are compared individually; the hospital course is compared as a full summary.

Hospital charts averaged 81.5k words, which is too long for the DAE model, so no error rates are reported for the hospital course task.
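
To make the R-1, R-2, and R-L columns concrete, the following is a minimal sketch of how ROUGE recall can be computed. It is not the evaluation code used in the study: it assumes lowercased whitespace tokenization with no stemming, and the function names and example sentences are hypothetical. Recall here is the fraction of the physician-written (reference) summary that is recovered by the automated (candidate) summary; the table's values appear to be on a 0 to 100 scale, whereas this sketch returns fractions between 0 and 1.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(reference, candidate, n=1):
    """ROUGE-N recall: overlapping n-grams divided by reference n-grams."""
    # Assumption: simple lowercased whitespace tokenization, no stemming.
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's match count by its count in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(reference, candidate):
    """ROUGE-L recall: LCS length divided by reference length."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return lcs_length(ref, cand) / len(ref) if ref else 0.0


# Hypothetical example sentences, not taken from the study data.
reference = "patient admitted with chest pain and shortness of breath"
candidate = "patient admitted with chest pain"

r1 = rouge_n_recall(reference, candidate, n=1)  # 5 of 9 reference unigrams
r2 = rouge_n_recall(reference, candidate, n=2)  # 4 of 8 reference bigrams
rl = rouge_l_recall(reference, candidate)       # LCS of 5 over 9 tokens
print(f"R-1={r1:.4f} R-2={r2:.4f} R-L={rl:.4f}")
```

Because these are recall scores, a longer candidate summary can only raise or maintain them, which is one reason the word count column is reported alongside.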