Author manuscript; available in PMC: 2024 May 20.

Published in final edited form as: Proc Conf Assoc Comput Linguist Meet. 2023 Jul;2023:2680–2697. doi: 10.18653/v1/2023.acl-long.151

Table 8:

ROUGE-F1 scores of the top-ranked GPT-3.5 summaries on a random 1k-example subset of the CNN/DailyMail test set. Single is a single-candidate baseline (analogous to Top Beam in Table 2). The other methods each generate 16 candidates, which are then re-ranked with BRIO.

| Candidate Method | R1 | R2 | RL |
|---|---|---|---|
| Single | 40.84 | 17.30 | 37.07 |
| Temperature Sampling | 42.51 | 19.17 | 38.73 |
| Nucleus Sampling | 42.43 | 19.06 | 38.65 |
| PGA (ours) | 43.56 | 20.11 | 39.95 |
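The evaluation protocol in the caption is generate-then-re-rank: each method proposes candidates, a re-ranker picks one, and ROUGE-F1 is computed between that pick and the reference summary. The Python sketch below illustrates only this selection-and-scoring step, under stated assumptions. `score` is a hypothetical placeholder for BRIO and does not reflect its actual interface. `rouge1_f1` is a simplified ROUGE-1 (lowercased whitespace tokens, no stemming), so its values will not match standard ROUGE implementations. The toy example uses 3 candidates rather than 16.

```python
from collections import Counter
from typing import Callable, Sequence


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 (assumption: lowercase whitespace tokens, no stemming)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped unigram overlap: a token counts at most min(count_cand, count_ref) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def select_top_candidate(
    source: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
) -> str:
    """Return the candidate that the re-ranker scores highest for this source.

    `score` is a hypothetical stand-in for a trained re-ranker such as BRIO;
    it sees only the source and a candidate, never the reference.
    """
    return max(candidates, key=lambda cand: score(source, cand))


def toy_score(source: str, candidate: str) -> float:
    """Illustrative scorer only (not BRIO): unigram overlap with the source."""
    return rouge1_f1(candidate, source)


if __name__ == "__main__":
    source = "the cat sat on the mat while the dog slept"
    reference = "the cat sat on the mat"
    candidates = ["a dog slept", "the cat sat on a mat", "the mat"]

    best = select_top_candidate(source, candidates, toy_score)
    # The reference is used only here, to evaluate the selected candidate.
    print(best, round(rouge1_f1(best, reference), 3))  # prints: the cat sat on a mat 0.833
```

Keeping the scorer as a function argument makes the point of the table explicit: the re-ranker is held fixed, so differences between rows come from the candidate pool each method supplies.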