Author manuscript; available in PMC: 2024 Sep 30.
Published in final edited form as: Proc Mach Learn Res. 2023 Aug;219:2–30.

Table 9:

Pearson correlation with human annotations for each metric on its own (Single), alongside the average correlation of the metric ensembles that include that metric (Avg In Ensemble). The last row reports the correlation of the best-performing metric ensemble (Coverage, BARTScore, Distilled).

Metric            Pearson Correlation
                  Single    Avg In Ensemble
----------------------------------------------
Coverage (Cov)    .457      .544

BARTScore         .539      .550
CTC               .507      .546
Entailment        .453      .539
BERTScore         .482      .535
Reviser           .324      .528
FactScore         .444      .536

Distilled         .564      .556

Best Ensemble     N/A       .583
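
The two columns and the final row of Table 9 can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that an ensemble score is the mean of its members' z-scored values and that ensembles contain at least two metrics, neither of which is stated in this excerpt. The metric names `a`, `b`, `c` and all numbers are toy values invented for illustration and have no relation to the results in the table.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def zscore(x):
    """Standardize a sequence (assumes it is not constant)."""
    n = len(x)
    m = sum(x) / n
    sd = sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / sd for a in x]


def ensemble_score(scores, names):
    """ASSUMPTION: an ensemble is the mean of its z-scored member metrics."""
    cols = [zscore(scores[name]) for name in names]
    return [sum(vals) / len(vals) for vals in zip(*cols)]


def correlation_table(scores, human, min_size=2):
    """Return (single, avg_in_ensemble, best_ensemble, best_correlation)."""
    names = sorted(scores)
    # "Single": each metric correlated with the human annotations.
    single = {name: pearson(scores[name], human) for name in names}
    # Correlation of every ensemble with at least `min_size` members.
    ens_corr = {}
    for k in range(min_size, len(names) + 1):
        for combo in combinations(names, k):
            ens_corr[combo] = pearson(ensemble_score(scores, combo), human)
    # "Avg In Ensemble": mean over the ensembles that contain the metric.
    avg_in = {}
    for name in names:
        members = [c for combo, c in ens_corr.items() if name in combo]
        avg_in[name] = sum(members) / len(members)
    best = max(ens_corr, key=ens_corr.get)
    return single, avg_in, best, ens_corr[best]


# Toy data (hypothetical): five summaries, three metrics, one human score each.
toy_human = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_scores = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 1.0, 4.0, 3.0, 5.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
}
single, avg_in, best, best_corr = correlation_table(toy_scores, toy_human)
for name in sorted(toy_scores):
    print(f"{name}: single={single[name]:.3f} avg_in_ensemble={avg_in[name]:.3f}")
print("best ensemble:", best, f"{best_corr:.3f}")
```

Under these assumptions, a metric with a weak single correlation can still show a high ensemble average when the other members of its ensembles are strong, which is one way to read the gap between the two columns.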