2024 Oct 3;26:e60601. doi: 10.2196/60601

Table 6.

Evaluation for the multidocument summarization task.

	MEDIQA-AnS (p)			MEDIQA-AnS (s)		
Model	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-1	ROUGE-2	ROUGE-L
TextRank^a	29.88	10.23	17.01	43.77	26.80	30.52
BART	24.56^b	7.56^b	17.18^b	32.32^b	15.42	24.03^b
Pegasus	17.44	5.36	13.44	19.54	7.46	14.93
PRIMERA	16.66	4.89	12.68	21.78	9.77	16.85
BioBART	23.16	7.47	16.47	30.87	15.91^b	23.66

^aTextRank is only used as a reference for extractive summarization, so its scores are not compared with those of generative models.

^bThe superior score within the same data set.