Table 4.
Overview of evaluation metrics for rationale quality.
| Approach | Desiderata | Representative paper(s) |
|---|---|---|
| Proxy-based | Plausibility | Paranjape et al., 2020; Guerreiro and Martins, 2021; Jang and Lukasiewicz, 2021; Chan A. et al., 2022; Atanasova et al., 2024 |
| | Faithfulness | Carton et al., 2020; DeYoung et al., 2020; Zhang et al., 2021a; Chan A. et al., 2022 |
| | Simulatability | Hase et al., 2020 |
| | Consistency | Atanasova et al., 2024 |
| | Robustness | Chen H. et al., 2022; Ross et al., 2022 |
| Human-grounded | Understandability | Ehsan et al., 2019; Lertvittayakumjorn and Toni, 2019; Hase and Bansal, 2020; Jain et al., 2020 |
| | Relatability | Ehsan et al., 2019; Lertvittayakumjorn and Toni, 2019; Hase and Bansal, 2020 |