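Table 1. Performance of the baseline classifiers and the RL agent (at several values of T) on the held-out test splits.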
Model | AUC | F1-Score | Sensitivity | Specificity |
---|---|---|---|---|
SVM w/ LIWC | 0.712 (0.612–0.811) | 0.631 (0.500–0.761) | 0.680 (0.476–0.886) | 0.744 (0.563–0.922) |
Supervised DL w/ LIWC | 0.689 (0.560–0.818) | 0.182 (0.055–0.370) | 0.300 (0.010–0.758) | 0.767 (0.364–0.970) |
SVM w/ SKP | 0.797 (0.719–0.879) | 0.719 (0.591–0.846) | 0.654 (0.473–0.835) | 0.939 (0.855–1.0) |
Supervised DL w/ SKP | 0.811 (0.715–0.907) | 0.642 (0.469–0.813) | 0.600 (0.366–0.833) | 0.911 (0.838–0.984) |
RL (T = 5) | 0.633 (0.535–0.703) | 0.486 (0.288–0.680) | 0.459 (0.280–0.630) | 0.811 (0.661–0.936) |
RL (T = 10) | 0.741 (0.631–0.852) | 0.590 (0.352–0.829) | 0.560 (0.309–0.811) | 0.922 (0.823–0.969) |
RL (T = 15) | 0.721 (0.618–0.827) | 0.595 (0.399–0.790) | 0.500 (0.327–0.713) | 0.922 (0.856–0.987) |
RL (T = 20) | 0.809 (0.706–0.914) | 0.726 (0.551–0.901) | 0.620 (0.413–0.827) | 0.988 (0.953–1.0) |
RL (T = 30) | 0.853 (0.796–0.914) | 0.801 (0.733–0.880) | 0.818 (0.678–0.958) | 0.898 (0.828–0.969) |
RL (T = 35) | 0.859 (0.787–0.952) | 0.808 (0.735–0.883) | 0.818 (0.677–0.958) | 0.911 (0.839–1.0) |
Difference | 0.0616 (−0.049 to 0.172) | 0.089 (−0.078 to 0.259) | 0.163 (−0.083 to 0.410) | −0.040 (−0.130 to 0.050) |
Abbreviations: Values in parentheses are confidence intervals (CIs) for each metric. SVM denotes a support vector machine classifier, Supervised DL denotes a two-layer feed-forward neural network classifier, and RL denotes the reinforcement learning agent. For the feature representation of the corpus, LIWC (Linguistic Inquiry and Word Count) denotes the original word-level features used in Asgari et al. [8], and SKP denotes a 4,800-dimensional Skip-Thought vector embedding of each conversational turn. A dialogue-level summary for each user is obtained by averaging the turn-level vectors across all of that user's turns. We evaluate the performance of our RL agent across 10 stratified shuffle splits, each using 65% of the data for training and 35% for testing.
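For reference, the four metrics reported in Table 1 can be computed as below. This is a minimal sketch, not the authors' evaluation code: the function names are hypothetical, labels are assumed to be 0/1 with 1 as the positive class, and AUC is computed with the pairwise (Mann–Whitney) formulation, which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, TN, FN), assuming 0/1 labels with 1 as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True-positive rate: share of actual positives that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """True-negative rate: share of actual negatives correctly rejected."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and sensitivity (recall)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the number of test examples, which is acceptable for a test split of a few dozen users; a rank-based formulation scales better for larger sets.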
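The dialogue summary described in the note (averaging turn-level vectors per user) amounts to mean pooling. The sketch below assumes each user's turns have already been encoded into a `(n_turns, 4800)` array by a Skip-Thought encoder that is not shown; `dialogue_summary` and `build_feature_matrix` are hypothetical helper names.

```python
import numpy as np

EMBED_DIM = 4800  # Skip-Thought vector size stated in the table note


def dialogue_summary(turn_embeddings):
    """Mean-pool a (n_turns, dim) array of turn vectors into one (dim,) vector."""
    turns = np.asarray(turn_embeddings, dtype=np.float64)
    if turns.ndim != 2 or turns.shape[0] == 0:
        raise ValueError("expected a non-empty (n_turns, dim) array")
    return turns.mean(axis=0)


def build_feature_matrix(dialogues):
    """Stack one averaged vector per user into an (n_users, dim) design matrix."""
    return np.vstack([dialogue_summary(d) for d in dialogues])
```

Averaging makes every user's representation the same size regardless of how many turns they contributed, so the resulting matrix can be fed directly to a fixed-input classifier such as an SVM or a feed-forward network.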
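The evaluation protocol in the note (10 stratified shuffle splits, 65% train and 35% test) can be sketched as follows. The note does not say how the intervals were computed, so `mean_and_interval` assumes mean ± 1.96 standard deviations of the per-split scores, clipped to [0, 1], as one plausible choice. The helper names and the seed are hypothetical, and the per-class rounding used here may differ slightly from scikit-learn's `StratifiedShuffleSplit` allocation.

```python
import random
import statistics
from collections import defaultdict


def stratified_shuffle_splits(labels, n_splits=10, test_size=0.35, seed=0):
    """Yield (train_idx, test_idx); each split samples test_size of every class at random."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(n_splits):
        train, test = [], []
        for idx in by_class.values():
            idx = idx[:]  # copy so the stored class lists are not reordered
            rng.shuffle(idx)
            n_test = round(len(idx) * test_size)
            test.extend(idx[:n_test])
            train.extend(idx[n_test:])
        yield sorted(train), sorted(test)


def mean_and_interval(scores, z=1.96):
    """Mean over splits with an assumed normal-approximation interval, clipped to [0, 1].

    Requires at least two per-split scores (statistics.stdev needs n >= 2).
    """
    m = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return m, max(0.0, m - z * sd), min(1.0, m + z * sd)
```

In use, a classifier would be trained on each `train_idx`, scored on the matching `test_idx`, and the 10 per-split values of each metric passed to `mean_and_interval`. Because splits are drawn independently, test sets may overlap across splits, unlike k-fold cross-validation.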