. 2023 Aug 28;30(12):1995–2003. doi: 10.1093/jamia/ocad177

Table 2.

ROUGE recall scores (R-1, R-2, R-L), word count (WC) mean and standard deviation (SD), and word-level and sentence-level error rates (ER) estimated by a dependency arc entailment (DAE) model for our proposed models, comparing automated summaries with physician-written summaries. Lower error rates are better ($↓$).

Summarization task	R-1	R-2	R-L	Word count mean (±SD)	Word-ER $↓$	Sent-ER $↓$
History of present illness (HPI)
Baseline: Textrank	43.30	28.94	38.40	53 (±9)	0.3	0.7
BART	61.67	53.12	59.69	43 (±9)	6.3	42.4
BART constrained	61.02	52.78	59.05	44 (±11)	5.7	40.6
Daily narrative
Baseline: Textrank	9.55	1.32	8.93	20 (±3)	31.0	42.2
BART	46.59	35.03	43.95	10 (±7)	9.9	28.8
BART constrained	46.42	34.77	43.75	11 (±13)	7.2	26.7
Discharge summary hospital course
Baseline: Textrank	15.48	4.18	8.51	511 (±451)	—	—
Day-to-day	37.10	14.44	19.64	444 (±374)	—	—
Day-to-day constrained	35.97	13.76	18.83	421 (±365)	—	—

For the HPI and daily narrative tasks, sentences within each summarization section are compared individually; the hospital course is compared as a full summary.

Hospital charts averaged 81.5k words, which is too long an input for the DAE model, so no DAE error rates are reported for the discharge summary hospital course rows.
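As a rough illustration of how the ROUGE recall columns (R-1, R-2, R-L) can be read, the sketch below computes n-gram recall and longest-common-subsequence recall between a reference (physician-written) summary and a candidate (automated) summary. It is not the authors' evaluation code: it assumes lowercased whitespace tokenization with no stemming, and the function names are hypothetical. Published scores are typically produced with a standard ROUGE package and reported on a 0 to 100 scale, so values from this sketch would be multiplied by 100 to be comparable in form.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(reference, candidate, n=1):
    """Fraction of reference n-grams that also appear in the candidate.

    Assumption: simple lowercased whitespace tokenization, no stemming.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((ref & cand).values())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(reference, candidate):
    """LCS length divided by the number of reference tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref:
        return 0.0
    return lcs_length(ref, cand) / len(ref)


if __name__ == "__main__":
    # Made-up sentences for illustration only, not data from the study.
    reference = "patient admitted with chest pain"
    candidate = "patient with chest pain admitted"
    print(rouge_n_recall(reference, candidate, 1))
    print(rouge_n_recall(reference, candidate, 2))
    print(rouge_l_recall(reference, candidate))
```

Because these are recall scores, they are normalized by the length of the reference summary, so a longer candidate is not penalized for extra words. This is one reason the table reports word counts alongside the ROUGE columns.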