Table 2.
Model (Turns) | AUC | F1-Score | Sensitivity | Specificity |
---|---|---|---|---|
SVM (T = 5) | 0.493 (0.439–0.547) | 0.169 (0.061–0.275) | 0.120 (0.046–0.193) | 0.860 (0.776–0.950) |
SVM (T = 10) | 0.550 (0.479–0.620) | 0.275 (0.113–0.428) | 0.200 (0.083–0.319) | 0.900 (0.820–0.970) |
SVM (T = 20) | 0.624 (0.563–0.685) | 0.405 (0.232–0.578) | 0.360 (0.171–0.548) | 0.888 (0.789–0.989) |
SVM (T = 30) | 0.633 (0.557–0.707) | 0.424 (0.247–0.601) | 0.320 (0.187–0.458) | 0.944 (0.882–1.0) |
SVM (T = 35) | 0.714 (0.627–0.801) | 0.576 (0.420–0.732) | 0.440 (0.277–0.602) | 0.968 (0.944–1.0) |
Supervised DL (T = 5) | 0.497 (0.392–0.603) | 0.104 (0.015–0.182) | 0.111 (0.091–0.129) | 0.880 (0.812–0.980) |
Supervised DL (T = 10) | 0.527 (0.459–0.594) | 0.278 (0.123–0.433) | 0.200 (0.088–0.316) | 0.933 (0.856–1.0) |
Supervised DL (T = 20) | 0.673 (0.588–0.758) | 0.399 (0.212–0.583) | 0.320 (0.139–0.500) | 0.945 (0.888–1.0) |
Supervised DL (T = 30) | 0.720 (0.643–0.796) | 0.477 (0.317–0.638) | 0.360 (0.228–0.491) | 0.955 (0.914–0.996) |
Supervised DL (T = 35) | 0.780 (0.695–0.864) | 0.490 (0.327–0.654) | 0.366 (0.229–0.500) | 0.966 (0.928–1.0) |
RL (T = 5) | 0.633 (0.535–0.703) | 0.486 (0.288–0.680) | 0.459 (0.280–0.630) | 0.811 (0.661–0.936) |
RL (T = 10) | 0.741 (0.631–0.852) | 0.590 (0.352–0.829) | 0.560 (0.309–0.811) | 0.922 (0.823–0.969) |
RL (T = 20) | 0.809 (0.706–0.914) | 0.726 (0.551–0.901) | 0.620 (0.413–0.827) | 0.988 (0.953–1.0) |
RL (T = 30) | 0.853 (0.796–0.914) | 0.801 (0.733–0.880) | 0.818 (0.678–0.958) | 0.898 (0.828–0.969) |
RL (T = 35) | 0.859 (0.787–0.952) | 0.808 (0.735–0.883) | 0.818 (0.677–0.958) | 0.911 (0.839–1.0) |
Abbreviations: SVM denotes a support vector machine classifier, Supervised DL denotes a 2-layer feedforward neural network, and RL denotes the reinforcement learning agent. In all three cases, conversations were truncated at various turn lengths (T), and each classifier was evaluated on the truncated conversations to obtain the AUC, F1, sensitivity, and specificity scores. Confidence intervals were obtained from 10 randomized shuffle splits for all experiments.
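For readers who want to see how scores of this kind can be computed, the following is a minimal sketch and not the authors' evaluation code. It assumes binary labels (1 = positive class), hard 0/1 predictions for sensitivity, specificity, and F1, and real-valued scores for the AUC. The source does not state how the confidence intervals were derived from the 10 splits, so the normal-approximation interval (mean ± 1.96 standard errors, clipped to [0, 1]) is an assumption, and the function names `binary_metrics`, `auc`, and `mean_with_interval` are hypothetical.

```python
import math
from statistics import mean, stdev


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and F1 from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against empty classes so the sketch never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_with_interval(values, z=1.96):
    """Mean of per-split scores with an assumed normal-approximation interval.

    Assumption: interval = mean +/- z * standard error, clipped to [0, 1].
    """
    m = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, max(0.0, m - half_width), min(1.0, m + half_width)
```

In use, one would compute each metric once per shuffle split at a given turn cutoff T, collect the 10 per-split values, and pass them to `mean_with_interval` to get a point estimate with lower and upper bounds in the same layout as the table cells.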