2023 Jul 10;13:11155. doi: 10.1038/s41598-023-35184-7

Table 2.

Depression detection and severity estimation performance, in terms of F1 (F1(D) and F1(H)), Balanced Accuracy (BAc.) and RMSE, on DAIC-WOZ and Vocal Mind datasets.

          Acoustic features alone      Speaker embeddings alone     Acoustic and speaker embeddings combined

Dataset 1: DAIC
          COVAREP                      ECAPA                        (ECAPA, COVAREP)
Model     F1(D)  F1(H)  BAc.   RMSE    F1(D)  F1(H)  BAc.   RMSE    F1(D)  F1(H)  BAc.   RMSE
MK-CNN    0.35   0.70   0.52   7.39    0.43   0.78   0.60   6.35    0.45   0.79   0.61   6.21
LSTM      0.32   0.70   0.51   7.41    0.46   0.79   0.61   6.31    0.47   0.80   0.63   6.19

          OpenSMILE                    ECAPA                        (ECAPA, OpenSMILE)
MK-CNN    0.37   0.74   0.55   6.87    0.43   0.78   0.61   6.35    0.49   0.81   0.65   6.08
LSTM      0.39   0.73   0.56   6.82    0.46   0.79   0.63   6.31    0.50   0.83   0.66   6.01

Dataset 2: VM
          COVAREP                      ECAPA                        (ECAPA, COVAREP)
Model     F1(D)  F1(H)  BAc.   RMSE    F1(D)  F1(H)  BAc.   RMSE    F1(D)  F1(H)  BAc.   RMSE
MK-CNN    0.30   0.68   0.49   7.61    0.32   0.80   0.55   6.64    0.34   0.80   0.57   6.55
LSTM      0.32   0.67   0.50   7.63    0.34   0.81   0.57   6.62    0.37   0.81   0.60   6.51

          OpenSMILE                    ECAPA                        (ECAPA, OpenSMILE)
MK-CNN    0.32   0.74   0.53   6.96    0.32   0.80   0.56   6.64    0.41   0.81   0.61   6.41
LSTM      0.34   0.75   0.54   6.94    0.34   0.81   0.57   6.62    0.43   0.84   0.64   6.28

F1(D) and F1(H) are the F1 scores for the depressed and healthy classes, respectively. COVAREP and OpenSMILE are acoustic features. Results were obtained using ECAPA-TDNN x-vectors (ECAPA), COVAREP and OpenSMILE features on the DAIC-WOZ (DAIC) and Vocal Mind (VM) datasets. For the results that combine acoustic and speaker embeddings ((ECAPA, COVAREP) and (ECAPA, OpenSMILE)), MK-CNN and LSTM refer to CE models with MK-CNN and LSTM blocks, respectively.

Bold values indicate best results in each comparison group.
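
For readers unfamiliar with the four metrics reported in the table, the following is a minimal sketch of how per-class F1, balanced accuracy and RMSE are conventionally computed. It is not the authors' evaluation code. The label encoding (1 for depressed, 0 for healthy), the function names and the toy labels and severity scores are all assumptions made for illustration.

```python
import math

def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        n_cls = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / n_cls)
    return sum(recalls) / len(recalls)

def rmse(s_true, s_pred):
    """Root mean square error between true and predicted severity scores."""
    squared = [(a - b) ** 2 for a, b in zip(s_true, s_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical toy data: assumed encoding 1 = depressed, 0 = healthy.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]

# Hypothetical severity scores (not taken from either dataset).
s_true = [12, 15, 3, 4, 2, 8]
s_pred = [15, 12, 6, 1, 5, 5]

print("F1(D):", f1_for_class(y_true, y_pred, 1))       # 0.5
print("F1(H):", f1_for_class(y_true, y_pred, 0))       # 0.75
print("BAc.: ", balanced_accuracy(y_true, y_pred))     # 0.625
print("RMSE: ", rmse(s_true, s_pred))                  # 3.0
```

Reporting F1 separately for each class, alongside balanced accuracy, keeps the minority (depressed) class visible when the classes are imbalanced, which plain accuracy would obscure.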