2024 Oct 3;26:e60601. doi: 10.2196/60601

Table 6.

Evaluation for the multidocument summarization task.


Model        MEDIQA-AnS (p)                MEDIQA-AnS (s)
             ROUGE-1   ROUGE-2   ROUGE-L   ROUGE-1   ROUGE-2   ROUGE-L
TextRank^a   29.88     10.23     17.01     43.77     26.80     30.52
BART         24.56^b   7.56^b    17.18^b   32.32^b   15.42     24.03^b
Pegasus      17.44     5.36      13.44     19.54     7.46      14.93
PRIMERA      16.66     4.89      12.68     21.78     9.77      16.85
BioBART      23.16     7.47      16.47     30.87     15.91^b   23.66

^a TextRank is used only as a reference for extractive summarization, so its scores are not compared with those of the generative models.

^b The best score among the generative models within the same data set.