Table 8.
Balanced Accuracy Cross-Dataset Performance Summary of Mental-Alpaca Finetuning on Single Dataset.
| Test Dataset | Dreaddit | DepSeverity | DepSeverity | SDCNL | CSSRS-Suicide | CSSRS-Suicide |
|---|---|---|---|---|---|---|
| Finetune Dataset | Task #1 | Task #2 | Task #3 | Task #4 | Task #5 | Task #6 |
| Dreaddit | | ↑ 0.720 | ↑ 0.623 | ↓ 0.474 | ↑ 0.720 | ↓ 0.156 |
| DepSeverity | ↑ 0.618 | | | 0.493 | ↑ 0.753 | ↓ 0.156 |
| SDCNL | ↓ 0.468 | ↓ 0.461 | ↑ 0.623 | | ↑ 0.573 | ↓ 0.156 |
| CSSRS-Suicide | ↓ 0.500 | ↓ 0.500 | ↑ 0.622 | ↑ 0.500 | | |
| Reference: | | | | | | |
| Alpaca (zero-shot) | 0.593 | 0.522 | 0.431 | 0.493 | 0.518 | 0.232 |
| Mental-Alpaca | 0.816 | 0.775 | 0.746 | 0.724 | 0.730 | 0.403 |

Cells where the model is finetuned and tested on the same dataset are left blank. The bottom rows report related Alpaca versions for reference. ↑/↓ mark cross-dataset results that are better/worse than the zero-shot version of Alpaca; a result equal to the zero-shot score carries no marker.
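The ↑/↓ markers follow from a per-task comparison against the zero-shot reference row. The sketch below reproduces that comparison in Python. It assumes each finetune row is read with its same-dataset cells skipped, and the names `zero_shot`, `cross`, `marker`, and `annotate` are illustrative rather than taken from the paper.

```python
# Zero-shot reference scores per task, copied from the reference row of the table.
zero_shot = {1: 0.593, 2: 0.522, 3: 0.431, 4: 0.493, 5: 0.518, 6: 0.232}

# Cross-dataset balanced accuracy per finetune dataset.
# Assumption: same-dataset cells are skipped, so each row only lists other tasks.
cross = {
    "Dreaddit": {2: 0.720, 3: 0.623, 4: 0.474, 5: 0.720, 6: 0.156},
    "DepSeverity": {1: 0.618, 4: 0.493, 5: 0.753, 6: 0.156},
    "SDCNL": {1: 0.468, 2: 0.461, 3: 0.623, 5: 0.573, 6: 0.156},
    "CSSRS-Suicide": {1: 0.500, 2: 0.500, 3: 0.622, 4: 0.500},
}


def marker(score, reference):
    """Return the arrow for a score relative to the zero-shot reference."""
    if score > reference:
        return "↑"
    if score < reference:
        return "↓"
    return ""  # equal to the reference: no marker


def annotate(cross_scores, reference_scores):
    """Build the marked table cells, keyed by finetune dataset and task number."""
    return {
        dataset: {
            task: f"{marker(score, reference_scores[task])} {score:.3f}".strip()
            for task, score in row.items()
        }
        for dataset, row in cross_scores.items()
    }


if __name__ == "__main__":
    for dataset, row in annotate(cross, zero_shot).items():
        print(dataset, row)
```

Under this reading, every marker produced by the comparison agrees with the marker printed in the table, including the unmarked 0.493 for DepSeverity on Task #4.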