Table A1.
Performance of different rationalization approaches on the MovieReviews, SST, and FEVER datasets. Suff, sufficiency; Comp, comprehensiveness; MT, multi-task.
| Dataset | Rationale | Approach | F1 | Suff | Comp | References |
|---|---|---|---|---|---|---|
| MovieReviews | Extractive | Pipeline | 0.77 | 0.88 | 0.10 | Atanasova et al., 2024 |
| MovieReviews | Extractive | Pipeline | 0.84 | 0.89 | 0.09 | Guerreiro and Martins, 2021 |
| MovieReviews | Extractive | Pipeline | 0.91 | 0.95 | 0.12 | Chan A. et al., 2022 |
| MovieReviews | Extractive | MT Unsupervised | 0.91 | 0.93 | 0.11 | Lei et al., 2016 |
| MovieReviews | Extractive | MT Unsupervised | 0.94 | 0.92 | 0.12 | Paranjape et al., 2020 |
| MovieReviews | Extractive | MT Unsupervised | 0.90 | 0.91 | 0.15 | Carton et al., 2020 |
| MovieReviews | Extractive | MT Supervised | 0.92 | 0.93 | 0.14 | Lei et al., 2016 |
| MovieReviews | Extractive | MT Supervised | 0.96 | 0.91 | 0.16 | DeYoung et al., 2020 |
| MovieReviews | Abstractive | MT Text-to-Text | 0.97 | 0.89 | 0.11 | Narang et al., 2020 |
| SST | Extractive | Pipeline | 0.80 | 0.75 | 0.11 | Guerreiro and Martins, 2021 |
| SST | Extractive | Pipeline | 0.93 | 0.89 | 0.11 | Chan A. et al., 2022 |
| SST | Extractive | MT Unsupervised | 0.92 | 0.95 | 0.15 | Carton et al., 2020 |
| SST | Abstractive | Generative Pipelined | 0.90 | 0.79 | 0.07 | Zhao and Vydiswaran, 2021 |
| FEVER | Extractive | MT Unsupervised | 0.71 | 0.85 | 0.05 | DeYoung et al., 2020 |
| FEVER | Extractive | Pipeline | 0.70 | 0.89 | 0.07 | Guerreiro and Martins, 2021 |
| FEVER | Extractive | MT Unsupervised | 0.82 | 0.85 | 0.15 | Carton et al., 2020 |
| FEVER | Extractive | MT Supervised | 0.85 | 0.87 | 0.14 | DeYoung et al., 2020 |
| FEVER | Extractive | MT Supervised | 0.87 | 0.87 | 0.16 | DeYoung et al., 2020 |
| FEVER | Abstractive | Generative MT | 0.84 | 0.87 | 0.11 | Zhou et al., 2020 |