Author manuscript; available in PMC: 2024 May 20.
Published in final edited form as: Proc Conf Assoc Comput Linguist Meet. 2023 Jul;2023:2680–2697. doi: 10.18653/v1/2023.acl-long.151

Table 8:

ROUGE-F1 scores for the top-ranked GPT-3.5 summaries on a random 1k subset of the CNN/DailyMail test set. Single is a single-candidate baseline (similar to Top Beam in Table 2). Each of the other methods produces 16 candidates, which are then re-ranked with BRIO.

Candidate Method        R1      R2      RL
Single                  40.84   17.30   37.07
Temperature Sampling    42.51   19.17   38.73
Nucleus Sampling        42.43   19.06   38.65
PGA (ours)              43.56   20.11   39.95
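
The generate-then-re-rank setup described in the caption can be illustrated with a minimal sketch: a pool of candidate summaries is scored, and only the top-ranked one is kept for evaluation. Everything named here is an assumption for illustration. `select_top_candidate` and `toy_overlap_score` are hypothetical helpers, and the word-overlap scorer is only a stand-in for a trained re-ranker such as BRIO, not its actual scoring function.

```python
from typing import Callable, Sequence


def toy_overlap_score(source: str, candidate: str) -> float:
    """Hypothetical stand-in scorer (NOT BRIO): fraction of the candidate's
    unique words that also occur in the source document."""
    source_words = set(source.lower().split())
    candidate_words = set(candidate.lower().split())
    if not candidate_words:
        return 0.0
    return len(candidate_words & source_words) / len(candidate_words)


def select_top_candidate(
    candidates: Sequence[str], score: Callable[[str], float]
) -> str:
    """Return the highest-scoring candidate; ties go to the earliest one."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


# Toy example with 3 candidates (the table's setting uses 16 per document).
source = "the cat sat on the mat while the dog slept"
candidates = [
    "a bird flew away",
    "the cat sat on the mat",
    "the dog barked loudly",
]
best = select_top_candidate(
    candidates, lambda c: toy_overlap_score(source, c)
)
print(best)
```

In this sketch, the Single row would correspond to a pool of one candidate, where selection has no effect; the other rows differ only in how the pool of candidates is generated before re-ranking.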