Table 7.
Performance comparison of a benchmark model [61] and the proposed convolutional neural network (CNN) model for depression detection and Patient Health Questionnaire-9 score prediction on audio data sets in a text-dependent setting (read-dependent speech mode) and a text-independent setting (spontaneous mode; N=318).
| Models | Text-dependent setting (mean of 10-fold) | | | | Text-independent setting (single fold) | | | |
|---|---|---|---|---|---|---|---|---|
| | ACC<sup>a</sup> (%) | F1-score<sup>b</sup> (%) | CCC<sup>c</sup> | RMSE<sup>d</sup> | ACC (%) | F1-score (%) | CCC | RMSE |
| Proposed CNN model | 78.14 | 77.27 | 0.28 | 9.21 | 56.82 | 37.84 | 0.287 | 5.53 |
| GCNN-LSTM<sup>e</sup> [61] | 51.65 | 50.90 | 0.43 | 8.10 | 58.57 | 39.78 | 0.497 | 5.70 |
<sup>a</sup>ACC: accuracy.
<sup>b</sup>F1-score: the harmonic mean of precision and recall.
<sup>c</sup>CCC: concordance correlation coefficient.
<sup>d</sup>RMSE: root mean square error.
<sup>e</sup>GCNN-LSTM: gated convolutional neural network-long short-term memory.
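
For readers who want to reproduce the four metrics reported in the table, the following is a minimal Python sketch of their standard definitions. It is not the authors' evaluation code. The function names, the choice of a binary F1-score for the positive (depressed) class, the use of population (divide-by-n) variances in the CCC, and the toy values in the usage note are all assumptions made for illustration.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the `positive` class.

    Assumption: the table may instead use a class-weighted F1; this is the
    plain binary form.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def ccc(y_true, y_pred):
    """Concordance correlation coefficient (population variances, divide by n).

    Undefined (division by zero) when both inputs are constant and equal.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)
```

As a usage note with made-up scores: `ccc([1, 2, 3], [2, 3, 4])` evaluates to 4/7 even though the two series are perfectly correlated, because the CCC also penalizes the constant offset between predicted and true scores. This is one reason a model can have a lower RMSE and yet a different CCC ranking, as seen across the rows of the table.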