Author manuscript; available in PMC: 2024 Apr 30.
Published in final edited form as: Proc Conf Assoc Comput Linguist Meet. 2023 Jul;2023:10520–10542. doi: 10.18653/v1/2023.acl-long.587

Table 7:

PRIMERA models calibrated to improve faithfulness. Contrast sets for calibration are formed from the generation methods in §4.2. REL stands for RelAgg (from §4.1). FAITH stands for FaithAgg (from §4.2).

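| Selection Type | Selection Strategy | Clinical REL | Clinical FAITH | Chemical REL | Chemical FAITH | Biomedical REL | Biomedical FAITH | Dataset Avg. REL | Dataset Avg. FAITH |
|---|---|---|---|---|---|---|---|---|---|
| Random | | −.264 | .133 | −.054 | .085 | .005 | .165 | −.104 | .128 |
| Quality | Average | −.293 | .160 | −.065 | .037 | .010 | .169 | −.116 | .122 |
| Margin Based | Max | −.326 | .313 | −.139 | .011 | −.033 | .018 | −.166 | .114 |
| Margin Based | Min | −.083 | .297 | −.109 | .112 | −.030 | .039 | −.074 | .149 |
| Diversity Based | Max | .002 | .290 | −.124 | .043 | −.052 | .029 | −.058 | .121 |
| Diversity Based | Min | −.039 | .315 | −.040 | .101 | −.043 | .093 | −.041 | .170 |
| Likelihood Based | Easy | .043 | .177 | −.058 | .002 | −.024 | .071 | −.013 | .083 |
| Likelihood Based | Hard | .071 | .174 | −.233 | .215 | .013 | .147 | −.050 | .179 |
| Spurious | Max Extract. Gap | .044 | .278 | .058 | .046 | −.051 | .067 | .017 | .131 |
| Avg. Across Strategies | | −.094 | .237 | −.085 | .072 | −.023 | .089 | −.067 | .133 |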
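The caption does not say how the summary cells are computed, but the numbers are consistent with simple arithmetic means: each Dataset Avg. cell matches the mean of the three per-dataset scores in its row, and each Avg. Across Strategies cell matches the mean of the nine rows above it (Random included), to within rounding. The sketch below recomputes both from the table values as a sanity check. It is a hedged reconstruction, not the authors' code: the shorthand row labels (e.g. "Margin Max"), the helper names, and the 0.001 rounding tolerance are all assumptions made for illustration.

```python
from statistics import mean

# Index of each metric inside a (REL, FAITH) pair.
REL, FAITH = 0, 1
DATASETS = ("Clinical", "Chemical", "Biomedical")

# Per-dataset (REL, FAITH) scores copied from Table 7, ordered Clinical, Chemical, Biomedical.
# Row labels are hypothetical shorthand joining Selection Type and Selection Strategy.
TABLE = {
    "Random": [(-0.264, 0.133), (-0.054, 0.085), (0.005, 0.165)],
    "Quality Average": [(-0.293, 0.160), (-0.065, 0.037), (0.010, 0.169)],
    "Margin Max": [(-0.326, 0.313), (-0.139, 0.011), (-0.033, 0.018)],
    "Margin Min": [(-0.083, 0.297), (-0.109, 0.112), (-0.030, 0.039)],
    "Diversity Max": [(0.002, 0.290), (-0.124, 0.043), (-0.052, 0.029)],
    "Diversity Min": [(-0.039, 0.315), (-0.040, 0.101), (-0.043, 0.093)],
    "Likelihood Easy": [(0.043, 0.177), (-0.058, 0.002), (-0.024, 0.071)],
    "Likelihood Hard": [(0.071, 0.174), (-0.233, 0.215), (0.013, 0.147)],
    "Spurious Max Extract. Gap": [(0.044, 0.278), (0.058, 0.046), (-0.051, 0.067)],
}

# Reported "Dataset Avg." column for each row, as (REL, FAITH).
REPORTED_DATASET_AVG = {
    "Random": (-0.104, 0.128),
    "Quality Average": (-0.116, 0.122),
    "Margin Max": (-0.166, 0.114),
    "Margin Min": (-0.074, 0.149),
    "Diversity Max": (-0.058, 0.121),
    "Diversity Min": (-0.041, 0.170),
    "Likelihood Easy": (-0.013, 0.083),
    "Likelihood Hard": (-0.050, 0.179),
    "Spurious Max Extract. Gap": (0.017, 0.131),
}

# Reported "Avg. Across Strategies" row, one (REL, FAITH) pair per dataset.
REPORTED_ACROSS = [(-0.094, 0.237), (-0.085, 0.072), (-0.023, 0.089)]


def dataset_avg(strategy):
    """Mean over the three datasets for one row, returned as (REL, FAITH)."""
    scores = TABLE[strategy]
    return tuple(mean(pair[m] for pair in scores) for m in (REL, FAITH))


def across_strategies():
    """Mean over all nine rows (Random included) for each dataset, as (REL, FAITH)."""
    return [
        tuple(mean(TABLE[s][d][m] for s in TABLE) for m in (REL, FAITH))
        for d in range(len(DATASETS))
    ]


def close(a, b, tol=1e-3):
    """True if two (REL, FAITH) pairs agree within an assumed rounding tolerance."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


if __name__ == "__main__":
    # Check the per-row dataset averages against the reported column.
    for s in TABLE:
        assert close(dataset_avg(s), REPORTED_DATASET_AVG[s]), s
    # Check the per-dataset averages against the reported summary row.
    for d, pair in enumerate(across_strategies()):
        assert close(pair, REPORTED_ACROSS[d]), DATASETS[d]
    # Row with the highest dataset-averaged FAITH score.
    best = max(TABLE, key=lambda s: dataset_avg(s)[FAITH])
    print(best)  # → Likelihood Hard
```

Run as a script, all assertions pass and the row with the highest dataset-averaged FAITH score is printed. Swapping `FAITH` for `REL` in the `max` key selects "Spurious Max Extract. Gap" instead, the only row with a positive dataset-averaged REL score.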