Author manuscript; available in PMC: 2024 Apr 30.
Published in final edited form as: Proc Conf Assoc Comput Linguist Meet. 2023 Jul;2023:10520–10542. doi: 10.18653/v1/2023.acl-long.587

Table 5:

Benchmarking PRIMERA and LongT5 models for relevance and faithfulness after initial fine-tuning (FT). R1, R2, and BS-Ref stand for ROUGE-1 F1, ROUGE-2 F1, and BERTScore F1 with respect to the reference, respectively. Fact., Bart., and BS-Src stand for FactScore, BARTScore, and BERTScore F1 with respect to the source. Metrics are defined in §4.1 and §4.2.

Model         Clinical                  Chemical                  Biomedical
Relevance     R1      R2      BS-Ref    R1      R2      BS-Ref    R1      R2      BS-Ref
PRIMERA       25.15   9.39    83.81     45.47   16.31   86.24     48.01   20.83   86.25
LongT5        24.22   8.57    83.15     42.51   14.46   85.74     44.32   17.91   85.02
Faithfulness  Fact.   Bart.   BS-Src    Fact.   Bart.   BS-Src    Fact.   Bart.   BS-Src
PRIMERA       53.29   −2.92   83.33     85.96   −6.29   88.89     86.91   −3.77   88.54
LongT5        53.71   −2.88   82.84     83.25   −6.36   88.70     83.62   −3.89   88.31
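The R1 and R2 columns report ROUGE-1 and ROUGE-2 F1, which score unigram and bigram overlap between a generated summary and its reference. The exact configuration used for Table 5 is described in §4.1 and is not reproduced here. The following is only a minimal sketch of ROUGE-N F1. It assumes lowercased whitespace tokenization with no stemming or punctuation handling, so it will not reproduce the table's scores exactly. The function names `ngrams` and `rouge_n_f1` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """Simplified ROUGE-N F1 between a candidate and a reference string.

    Assumption: lowercased whitespace tokenization, no stemming, so scores
    will differ somewhat from official ROUGE implementations.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most min(count in cand, count in ref) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = "the patient was discharged home in stable condition"
    cand = "patient discharged home in stable condition"
    # Scaled by 100, matching how Table 5 reports F1 values.
    print(round(100 * rouge_n_f1(cand, ref, n=1), 2))  # 85.71
    print(round(100 * rouge_n_f1(cand, ref, n=2), 2))  # 66.67
```

In the example, all 6 candidate unigrams appear in the 8-token reference (precision 1.0, recall 0.75). Only 4 of the 5 candidate bigrams match the reference's 7 bigrams, so R2 falls well below R1. The table shows the same pattern, with R2 consistently far lower than R1.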