Author manuscript; available in PMC: 2024 Apr 30.

Published in final edited form as: Proc Conf Assoc Comput Linguist Meet. 2023 Jul;2023:10520–10542. doi: 10.18653/v1/2023.acl-long.587

Table 5:

Benchmarking PRIMERA and LongT5 models after initial fine-tuning (FT) on relevance and faithfulness. R1, R2, and BS-Ref denote ROUGE-1 F1, ROUGE-2 F1, and BERTScore F1 computed against the reference, respectively. Fact., Bart., and BS-Src denote FactScore, BARTScore, and BERTScore F1 computed against the source. Metrics are defined in §4.1 and §4.2.

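Relevance metrics:

| Model | Clinical R1 | Clinical R2 | Clinical BS-Ref | Chemical R1 | Chemical R2 | Chemical BS-Ref | Biomedical R1 | Biomedical R2 | Biomedical BS-Ref |
|---|---|---|---|---|---|---|---|---|---|
| PRIMERA | 25.15 | 9.39 | 83.81 | 45.47 | 16.31 | 86.24 | 48.01 | 20.83 | 86.25 |
| LongT5 | 24.22 | 8.57 | 83.15 | 42.51 | 14.46 | 85.74 | 44.32 | 17.91 | 85.02 |

Faithfulness metrics:

| Model | Clinical Fact. | Clinical Bart. | Clinical BS-Src | Chemical Fact. | Chemical Bart. | Chemical BS-Src | Biomedical Fact. | Biomedical Bart. | Biomedical BS-Src |
|---|---|---|---|---|---|---|---|---|---|
| PRIMERA | 53.29 | −2.92 | 83.33 | 85.96 | −6.29 | 88.89 | 86.91 | −3.77 | 88.54 |
| LongT5 | 53.71 | −2.88 | 82.84 | 83.25 | −6.36 | 88.70 | 83.62 | −3.89 | 88.31 |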
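For readers unfamiliar with the R1 and R2 columns, the sketch below illustrates how a ROUGE-N F1 score is formed from clipped n-gram overlap between a candidate summary and a reference. It is a simplified illustration under stated assumptions (lowercased whitespace tokenization, no stemming), not the scoring code used for this table. The widely used `rouge-score` package adds its own tokenization and optional stemming, so its numbers will differ somewhat. The function names `ngrams` and `rouge_n_f1` are hypothetical. The table appears to report these F1 values on a 0 to 100 scale.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 (hypothetical helper, illustration only).

    Assumption: lowercased whitespace tokenization with no stemming or
    stopword handling, so scores will not exactly match official tooling.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter '&' keeps the minimum count of each shared n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    cand = "the patient was discharged home"
    ref = "the patient was sent home"
    print(round(rouge_n_f1(cand, ref, n=1), 4))  # 4 of 5 unigrams shared
    print(round(rouge_n_f1(cand, ref, n=2), 4))  # 2 of 4 bigrams shared
```

Here 4 of 5 unigrams overlap, giving ROUGE-1 F1 of 0.8 (80.0 on the table's scale). Only 2 of 4 bigrams overlap, giving ROUGE-2 F1 of 0.5. This is why R2 values sit well below R1 values throughout the table.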