Table 6. Evaluation for the multi-document summarization task.
| Model | MEDIQA-AnS (p) ROUGE-1 | MEDIQA-AnS (p) ROUGE-2 | MEDIQA-AnS (p) ROUGE-L | MEDIQA-AnS (s) ROUGE-1 | MEDIQA-AnS (s) ROUGE-2 | MEDIQA-AnS (s) ROUGE-L |
|---|---|---|---|---|---|---|
| TextRank^a | 29.88 | 10.23 | 17.01 | 43.77 | 26.80 | 30.52 |
| BART | 24.56^b | 7.56^b | 17.18^b | 32.32^b | 15.42 | 24.03^b |
| Pegasus | 17.44 | 5.36 | 13.44 | 19.54 | 7.46 | 14.93 |
| PRIMERA | 16.66 | 4.89 | 12.68 | 21.78 | 9.77 | 16.85 |
| BioBART | 23.16 | 7.47 | 16.47 | 30.87 | 15.91^b | 23.66 |
^a TextRank is included only as an extractive-summarization reference, so its scores are not compared with those of the generative models.
^b Best score among the generative models for the given data set and metric.
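The footnote markers can be checked mechanically. The sketch below assumes that "superior" means the highest score among the generative models in each data set and metric column, with TextRank excluded as footnote a states. The column labels, the `EXTRACTIVE_REFERENCE` set, and the `best_generative` function are hypothetical names introduced only for illustration; the scores are copied from Table 6.

```python
# Scores copied from Table 6, ordered as (p) ROUGE-1/2/L then (s) ROUGE-1/2/L.
# Column labels below are hypothetical shorthand, not names from the paper.
COLUMNS = [
    "p/ROUGE-1", "p/ROUGE-2", "p/ROUGE-L",
    "s/ROUGE-1", "s/ROUGE-2", "s/ROUGE-L",
]
SCORES = {
    "TextRank": [29.88, 10.23, 17.01, 43.77, 26.80, 30.52],
    "BART":     [24.56, 7.56, 17.18, 32.32, 15.42, 24.03],
    "Pegasus":  [17.44, 5.36, 13.44, 19.54, 7.46, 14.93],
    "PRIMERA":  [16.66, 4.89, 12.68, 21.78, 9.77, 16.85],
    "BioBART":  [23.16, 7.47, 16.47, 30.87, 15.91, 23.66],
}
# Assumption: TextRank is the only model excluded from the comparison (footnote a).
EXTRACTIVE_REFERENCE = {"TextRank"}


def best_generative(scores, exclude):
    """Return {column: model} naming the top-scoring non-excluded model per column."""
    generative = {m: s for m, s in scores.items() if m not in exclude}
    best = {}
    for i, col in enumerate(COLUMNS):
        # max() runs immediately, so the lambda sees the current value of i.
        best[col] = max(generative, key=lambda m: generative[m][i])
    return best


if __name__ == "__main__":
    for col, model in best_generative(SCORES, EXTRACTIVE_REFERENCE).items():
        print(col, model)
```

Under this reading, the computed winners match the ^b markers in the table: BART in five columns and BioBART in (s) ROUGE-2.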