Author manuscript; available in PMC: 2025 Feb 8.
Published in final edited form as: Proc ACM Interact Mob Wearable Ubiquitous Technol. 2024 Mar 6;8(1):31. doi: 10.1145/3643540

Table 8.

Balanced Accuracy Cross-Dataset Performance Summary of Mental-Alpaca Finetuning on Single Dataset.

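| Finetune Dataset (rows) / Test Dataset (columns) | Dreaddit, Task #1 | DepSeverity, Task #2 | DepSeverity, Task #3 | SDCNL, Task #4 | CSSRS-Suicide, Task #5 | CSSRS-Suicide, Task #6 |
| --- | --- | --- | --- | --- | --- | --- |
| Dreaddit | 0.823 | 0.720 | 0.623 | 0.474 | 0.720 | 0.156 |
| DepSeverity | 0.618 | 0.733 | 0.769 | 0.493 | 0.753 | 0.156 |
| SDCNL | 0.468 | 0.461 | 0.623 | 0.730 | 0.573 | 0.156 |
| CSSRS-Suicide | 0.500 | 0.500 | 0.622 | 0.500 | 0.753 | 0.578 |
| Reference: | | | | | | |
| AlpacaZS | 0.593 | 0.522 | 0.431 | 0.493 | 0.518 | 0.232 |
| Mental-Alpaca | 0.816 | 0.775 | 0.746 | 0.724 | 0.730 | 0.403 |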

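Entries whose finetuning and test datasets match (for example, Dreaddit on Task #1) are the results of a model finetuned and tested on the same dataset; the other entries in the top four rows are cross-dataset results. The bottom rows give related Alpaca versions for reference. A cross-dataset entry is better or worse than the zero-shot version, AlpacaZS, when it is above or below the AlpacaZS value in the same column.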
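The comparison against AlpacaZS described in the table note can be reproduced directly from the values above. The following is a minimal sketch, not part of the original paper: the numbers are copied from Table 8 as it appears here, the task-to-dataset mapping is read from the table header, and the names `TASK_DATASET`, `FINETUNED`, `ALPACA_ZS`, and `compare_to_zero_shot` are hypothetical. Reporting ties as "equal" is an assumption, since the note only distinguishes better from worse.

```python
# Values copied from Table 8 as shown above; list positions are Task #1..#6.
# The task-to-dataset mapping is read from the table header (an assumption
# of this sketch, not something stated in the note).
TASK_DATASET = [
    "Dreaddit", "DepSeverity", "DepSeverity",
    "SDCNL", "CSSRS-Suicide", "CSSRS-Suicide",
]

FINETUNED = {
    "Dreaddit":      [0.823, 0.720, 0.623, 0.474, 0.720, 0.156],
    "DepSeverity":   [0.618, 0.733, 0.769, 0.493, 0.753, 0.156],
    "SDCNL":         [0.468, 0.461, 0.623, 0.730, 0.573, 0.156],
    "CSSRS-Suicide": [0.500, 0.500, 0.622, 0.500, 0.753, 0.578],
}

ALPACA_ZS = [0.593, 0.522, 0.431, 0.493, 0.518, 0.232]


def compare_to_zero_shot(finetuned, zero_shot, task_dataset):
    """Label each cross-dataset entry relative to the zero-shot baseline.

    Hypothetical helper: entries where the finetuning dataset equals the
    test dataset are skipped, because they are not cross-dataset results.
    """
    labels = {}
    for finetune_ds, scores in finetuned.items():
        for i, (score, base) in enumerate(zip(scores, zero_shot)):
            if task_dataset[i] == finetune_ds:
                continue  # same-dataset result
            if score > base:
                label = "better"
            elif score < base:
                label = "worse"
            else:
                label = "equal"  # tie with AlpacaZS (assumed category)
            labels[(finetune_ds, f"Task #{i + 1}")] = label
    return labels


labels = compare_to_zero_shot(FINETUNED, ALPACA_ZS, TASK_DATASET)
for (finetune_ds, task), label in labels.items():
    print(f"{finetune_ds:>13} -> {task}: {label}")
```

Keeping the baseline as a separate list makes it straightforward to swap in the Mental-Alpaca row instead and repeat the same comparison.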