Table 3.
Performance comparison of the proposed approach with SOTA approaches. CEl refers to models with an LSTM block.
Dataset | DAIC-WOZ | DAIC-WOZ | DAIC-WOZ | DAIC-WOZ | Vocal mind | Vocal mind | Vocal mind | Vocal mind |
---|---|---|---|---|---|---|---|---|
Approach | | | BAc. | RMSE | | | BAc. | RMSE |
Sequence | 0.32 | 0.70 | 0.51 | 7.41 | 0.32 | 0.67 | 0.49 | 7.63 |
eGeMAPS | 0.32 | 0.71 | 0.52 | 7.05 | 0.27 | 0.74 | 0.50 | 7.22 |
FVTC-MFCC | 0.37 | 0.79 | 0.58 | 6.41 | 0.30 | 0.77 | 0.54 | 6.85 |
FVTC-FMT | 0.39 | 0.79 | 0.59 | 6.37 | 0.34 | 0.76 | 0.55 | 6.82 |
MK-CNN (COVAREP) | 0.35 | 0.70 | 0.52 | 7.39 | 0.30 | 0.68 | 0.49 | 7.61 |
LSTM (OpenSMILE) | 0.39 | 0.73 | 0.56 | 6.82 | 0.34 | 0.75 | 0.55 | 6.94 |
MK-CNN (ECAPA-TDNN) | 0.43 | 0.78 | 0.60 | 6.35 | 0.32 | 0.80 | 0.56 | 6.64 |
LSTM (ECAPA-TDNN) | 0.46 | 0.79 | 0.63 | 6.31 | 0.34 | 0.81 | 0.57 | 6.62 |
CE (ECAPA, COVAREP) | 0.47 | 0.80 | 0.64 | 6.19 | 0.37 | 0.81 | 0.59 | 6.51 |
CE (ECAPA, OpenSMILE) | 0.51 | 0.83 | 0.66 | 6.01 | 0.43 | 0.84 | 0.64 | 6.28 |