Table 9.
Depressed users most strongly misclassified in each variation of the preemptive depression identification experimentᵃ.
| | One depression user per control user (1:1) | One depression user per 3 control users (1:3) | One depression user per 5 control users (1:5) | One depression user per 10 control users (1:10) |
| --- | --- | --- | --- | --- |
| Classifier | MentalBERT LMᵇ | MentalBERT LM | MentalRoBERTa LM | SVMᶜ using word embeddings |
| User | d13 | d38 | d13 | d57 |
| Control probability | 0.93 | 0.94 | 0.99 | 0.98 |
| Sum of post lengths in words | 1696 | 1888 | 1696 | 55,897 |
| Topic | | | | |
| Chief TF-IDFᵈ features | | | | |
| Depressed vocabulary counts | | | | |
| people | 1 | 1 | 1 | 64 |
| know | 6 | 0 | 6 | 93 |
| thing | 3 | 0 | 3 | 35 |
| feel | 2 | 2 | 2 | 10 |
| time | 5 | 8 | 5 | 99 |
| woman | 1 | 0 | 1 | 7 |
| go | 3 | 0 | 3 | 54 |
| want | 3 | 1 | 3 | 71 |
| life | 2 | 0 | 2 | 28 |
| relationship | 0 | 0 | 0 | 2 |
| Control vocabulary counts | | | | |
| game | 0 | 1 | 0 | 9 |
| trade | 0 | 0 | 0 | 2 |
| key | 0 | 0 | 0 | 4 |
| team | 2 | 3 | 2 | 4 |
| play | 0 | 1 | 0 | 35 |
| player | 0 | 0 | 0 | 8 |
| shiny | 0 | 0 | 0 | 0 |
| hatch | 0 | 0 | 0 | 0 |
| thank | 1 | 1 | 0 | 15 |
| add | 0 | 2 | 0 | 14 |
ᵃ Lexical properties of those users’ posts are provided.
ᵇ LM: language model.
ᶜ SVM: support vector machine.
ᵈ TF-IDF: term frequency–inverse document frequency.
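The word total and the vocabulary counts reported per user could be tallied along the following lines. This is only a minimal sketch: the two term lists are copied from the table rows, but the tokenizer, the exact-match rule, the function names, and the sample posts are all assumptions made for illustration. The terms in the table look lemmatized (for example "go", "woman"), and this sketch does no lemmatization, so it would undercount inflected forms.

```python
import re
from collections import Counter

# Term lists copied from the table rows.
DEPRESSED_VOCAB = ["people", "know", "thing", "feel", "time",
                   "woman", "go", "want", "life", "relationship"]
CONTROL_VOCAB = ["game", "trade", "key", "team", "play",
                 "player", "shiny", "hatch", "thank", "add"]


def tokenize(text):
    """Hypothetical tokenizer: lowercase, keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_profile(posts):
    """Return the total word count and per-term counts for one user's posts."""
    tokens = [tok for post in posts for tok in tokenize(post)]
    counts = Counter(tokens)
    return {
        "total_words": len(tokens),
        "depressed": {w: counts[w] for w in DEPRESSED_VOCAB},
        "control": {w: counts[w] for w in CONTROL_VOCAB},
    }


# Made-up posts, not taken from the data set.
posts = [
    "I feel like people never know what I want.",
    "We play the game as a team, thank you!",
]
profile = lexical_profile(posts)
print(profile["total_words"])
print(profile["depressed"])
print(profile["control"])
```

Applying a lemmatizer to the tokens before counting would bring the sketch closer to counts over base forms such as those listed in the table.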