Table 8.
Balanced Accuracy Cross-Dataset Performance Summary of Mental-Alpaca Finetuning on Single Dataset.
Test Dataset | Dreaddit | DepSeverity | DepSeverity | SDCNL | CSSRS-Suicide | CSSRS-Suicide |
---|---|---|---|---|---|---|
Finetune Dataset | Task #1 | Task #2 | Task #3 | Task #4 | Task #5 | Task #6 |
Dreaddit | | ↑ 0.720 | ↑ 0.623 | ↓ 0.474 | ↑ 0.720 | ↓ 0.156 |
DepSeverity | ↑ 0.618 | | | 0.493 | ↑ 0.753 | ↓ 0.156 |
SDCNL | ↓ 0.468 | ↓ 0.461 | ↑ 0.623 | | ↑ 0.573 | ↓ 0.156 |
CSSRS-Suicide | ↓ 0.500 | ↓ 0.500 | ↑ 0.622 | ↑ 0.500 | | |
Reference: | | | | | | |
Alpaca (zero-shot) | 0.593 | 0.522 | 0.431 | 0.493 | 0.518 | 0.232 |
Mental-Alpaca | 0.816 | 0.775 | 0.746 | 0.724 | 0.730 | 0.403 |
Diagonal cells indicate the results of the model finetuned and tested on the same dataset. The bottom rows report related Alpaca versions for reference. ↑/↓ mark results with better/worse cross-dataset performance than the zero-shot version.
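The ↑/↓ marking rule in the caption can be illustrated with a minimal sketch. It assumes the first reference row is the zero-shot baseline and that the five values in the SDCNL finetuning row map to Tasks #1, #2, #3, #5, and #6 (Task #4 being the same-dataset cell); the helper name `mark` is hypothetical and not from the source.

```python
def mark(score: float, zero_shot: float) -> str:
    """Prefix a score with ↑ or ↓ relative to the zero-shot baseline."""
    if score > zero_shot:
        return f"↑ {score:.3f}"
    if score < zero_shot:
        return f"↓ {score:.3f}"
    return f"{score:.3f}"  # equal to the baseline: no arrow


# Zero-shot reference row from the table, keyed by task number.
zero_shot = {1: 0.593, 2: 0.522, 3: 0.431, 4: 0.493, 5: 0.518, 6: 0.232}

# SDCNL finetuning row; the task alignment is an assumption (see lead-in).
sdcnl = {1: 0.468, 2: 0.461, 3: 0.623, 5: 0.573, 6: 0.156}

marked = {task: mark(score, zero_shot[task]) for task, score in sdcnl.items()}
for task, cell in marked.items():
    print(f"Task #{task}: {cell}")
```

Under these assumptions the computed arrows agree with those printed in the SDCNL row, and a score equal to the baseline (such as 0.493 on Task #4 in the DepSeverity row) receives no arrow.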