Table 3:
Comparison of the explanations obtained by LIME against three types of annotated textual cues (Trigger, LoST indicators, and Consequences), measured with ROUGE and BLEU scores. The highest values for ATS (All Test Samples), TP (True Positives), and TN (True Negatives) are set in bold black, bold italic, and bold underlined type, respectively. Higher (↑) values for LoST indicators and lower (↓) values for Trigger and Consequences indicate a better classifier.
| Model | Evaluation | Trigger ATS (↓) | Trigger TP (↓) | Trigger TN (↓) | LoST Indicators ATS (↑) | LoST Indicators TP (↑) | LoST Indicators TN (↑) | Consequences ATS (↓) | Consequences TP (↓) | Consequences TN (↓) |
|---|---|---|---|---|---|---|---|---|---|---|
| BERT | ROUGE | 0.0756 | 0.0705 | 0.0868 | 0.2773 | 0.3128 | 0.1985 | 0.0421 | 0.0442 | 0.0375 |
| BERT | BLEU | 0.0519 | 0.0446 | 0.0681 | 0.1656 | 0.1760 | 0.1426 | 0.0315 | 0.0349 | 0.0241 |
| ALBERT | ROUGE | 0.0987 | 0.0973 | 0.1014 | 0.2201 | 0.2424 | 0.1777 | 0.0359 | 0.0427 | 0.0229 |
| ALBERT | BLEU | 0.0657 | 0.0595 | 0.0774 | 0.1363 | 0.1367 | 0.1356 | 0.0271 | 0.0318 | 0.0179 |
| DistilBERT | ROUGE | 0.0826 | 0.0837 | 0.0809 | 0.2471 | 0.2738 | 0.2051 | 0.0351 | 0.0273 | 0.0474 |
| DistilBERT | BLEU | 0.0569 | 0.0564 | 0.0579 | 0.1451 | 0.1412 | 0.1513 | 0.0260 | 0.0219 | 0.0324 |
| DeBERTa | ROUGE | 0.0687 | 0.0645 | 0.0790 | 0.2461 | 0.2570 | 0.2193 | 0.0468 | 0.0465 | 0.0476 |
| DeBERTa | BLEU | 0.0459 | 0.0389 | 0.0634 | 0.1586 | 0.1513 | 0.1724 | 0.0346 | 0.0341 | 0.0359 |
| ClinicalBERT | ROUGE | 0.0728 | 0.0729 | 0.0726 | 0.2045 | 0.2304 | 0.1639 | 0.0369 | 0.0359 | 0.0387 |
| ClinicalBERT | BLEU | 0.0463 | 0.0451 | 0.0485 | 0.1260 | 0.1272 | 0.1241 | 0.0286 | 0.0288 | 0.0284 |
| PsychBERT | ROUGE | 0.0946 | 0.0729 | 0.1272 | 0.2310 | 0.2304 | 0.1949 | 0.0307 | 0.0359 | 0.0415 |
| PsychBERT | BLEU | 0.0648 | 0.0451 | 0.0860 | 0.1336 | 0.1272 | 0.1414 | 0.0226 | 0.0288 | 0.0325 |
| MentalBERT | ROUGE | 0.0820 | 0.0733 | 0.1047 | 0.2614 | 0.2821 | 0.2076 | 0.0160 | 0.0126 | 0.0248 |
| MentalBERT | BLEU | 0.0464 | 0.0348 | 0.0768 | 0.1544 | 0.1521 | 0.1604 | 0.0107 | 0.0091 | 0.0147 |
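
To make the scores in Table 3 easier to interpret, the following is a minimal sketch of how the overlap between a LIME explanation and an annotated cue could be scored. It assumes unigram-level ROUGE (ROUGE-1 F-measure), unigram BLEU with a brevity penalty, and lowercase whitespace tokenization; the exact ROUGE/BLEU variants and tokenizer behind the table are not stated here, and the function names and example strings are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (assumed; the actual tokenizer is not stated)."""
    return text.lower().split()


def clipped_overlap(candidate, reference):
    """Count unigrams shared by both token lists, clipped by the reference counts."""
    cand, ref = Counter(candidate), Counter(reference)
    return sum(min(count, ref[tok]) for tok, count in cand.items())


def rouge1_f1(explanation, cue):
    """ROUGE-1 F-measure between explanation words and an annotated cue (assumed variant)."""
    cand, ref = tokenize(explanation), tokenize(cue)
    if not cand or not ref:
        return 0.0
    overlap = clipped_overlap(cand, ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def bleu1(explanation, cue):
    """Unigram BLEU with brevity penalty (assumed variant; no higher-order n-grams)."""
    cand, ref = tokenize(explanation), tokenize(cue)
    if not cand or not ref:
        return 0.0
    precision = clipped_overlap(cand, ref) / len(cand)
    # Brevity penalty: penalize explanations shorter than the reference cue.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision


# Hypothetical example: top LIME words vs. an annotated LoST indicator span.
lime_words = "worthless failure alone"
lost_cue = "i feel worthless"
print(rouge1_f1(lime_words, lost_cue), bleu1(lime_words, lost_cue))
```

Under these assumptions, a per-column value such as those in the table would be obtained by averaging per-sample scores over the relevant subset (ATS, TP, or TN). A high overlap with LoST indicators and a low overlap with Trigger or Consequences spans would then suggest that the classifier attends to the intended cues.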