Table 2. Correlations between the SWL scores predicted by random forest models trained on different feature sets and self-reported SWL.
Feature set | r | p value | RMSE |
---|---|---|---|
Baseline (1) | 0.001 | 0.97 | 1.37 |
LIWC (13) | 0.29 | 1.3e-15 | 1.32 |
selected LDA (117) | 0.33 | < 2.2e-16 | 1.32 |
selected LDA + sentiment (120) | 0.34 | < 2.2e-16 | 1.31 |
selected LDA + selected LIWC + sentiment (133) | 0.36 | < 2.2e-16 | 1.30 |
The baseline model uses the median of the self-reported SWL, with variation, as its only feature. Root mean square error (RMSE) should be read against the range of SWL scores in the full dataset, 1.2 to 6.8. Numbers in parentheses in the 'Feature set' column give the number of features in each set.
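As a reading aid, the two main quantities in the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names `pearson_r` and `rmse` and the toy scores are assumptions made up for the example, and the p value is omitted (a library routine such as `scipy.stats.pearsonr` would report it alongside r). The last line shows one possible way to relate an RMSE of 1.30 to the 1.2 to 6.8 score range.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def rmse(predicted, observed):
    """Root mean square error between predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Toy scores invented for illustration; they are NOT data from the study.
observed = [2.0, 3.4, 4.6, 5.2, 6.0]   # hypothetical self-reported SWL
predicted = [3.1, 3.9, 4.2, 4.4, 4.9]  # hypothetical model predictions

r = pearson_r(predicted, observed)
err = rmse(predicted, observed)

# Range of SWL scores stated in the table note (1.2 to 6.8).
swl_range = 6.8 - 1.2

print(f"r = {r:.2f}, RMSE = {err:.2f}")
# One way to read the best model's RMSE of 1.30 against the score range.
print(f"1.30 / range = {1.30 / swl_range:.3f}")
```

Under this reading, an RMSE of 1.30 corresponds to a little under a quarter of the observed score range, which helps explain why the differences between feature sets in the RMSE column look small.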