Author manuscript; available in PMC: 2022 Nov 4.
Published in final edited form as: Interspeech. 2022 Sep;2022:3338–3342. doi: 10.21437/interspeech.2022-10798

Table 1:

F1-scores for DepAudioNet on the DAIC-WOZ dataset for three input features, with and without adversarial speaker disentanglement. F1-Avg is the average of the F1-scores for the non-depressed (F1-ND) and depressed (F1-D) classes. +Adv denotes adversarial training. The λ values used for disentanglement are given in parentheses.

| Input Feature | +Adv | F1-Avg | F1-ND | F1-D |
|---|---|---|---|---|
| Mel-spec | No | 0.619 | 0.706 | 0.533 |
| Mel-spec | Yes (λ = 5e-6) | 0.646 | 0.732 | 0.560 |
| Raw-audio | No | 0.646 | 0.779 | 0.512 |
| Raw-audio | Yes (λ = 1e-6) | 0.660 | 0.726 | 0.594 |
| Wav2vec2.0 | No | 0.686 | 0.804 | 0.567 |
| Wav2vec2.0 | Yes (λ = 1e-3) | 0.692 | 0.808 | 0.576 |
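
The relationship stated in the caption (F1-Avg as the mean of F1-ND and F1-D) can be checked against the reported numbers with a short sketch. The figures are copied from Table 1; the names `ROWS`, `f1_avg`, `max_rounding_gap`, and `adversarial_gain` are hypothetical helpers introduced here for illustration and are not part of the paper's code. Because the table reports values rounded to three decimals, the recomputed average is only expected to match to within about 0.001.

```python
# Rows copied from Table 1: (input feature, adversarial training?, F1-Avg, F1-ND, F1-D)
ROWS = [
    ("Mel-spec", False, 0.619, 0.706, 0.533),
    ("Mel-spec", True, 0.646, 0.732, 0.560),
    ("Raw-audio", False, 0.646, 0.779, 0.512),
    ("Raw-audio", True, 0.660, 0.726, 0.594),
    ("Wav2vec2.0", False, 0.686, 0.804, 0.567),
    ("Wav2vec2.0", True, 0.692, 0.808, 0.576),
]


def f1_avg(f1_nd, f1_d):
    """Unweighted mean of the two per-class F1-scores, as defined in the caption."""
    return (f1_nd + f1_d) / 2


def max_rounding_gap(rows):
    """Largest absolute difference between the reported and recomputed F1-Avg."""
    return max(abs(f1_avg(nd, d) - avg) for _, _, avg, nd, d in rows)


def adversarial_gain(rows):
    """Change in reported F1-Avg when adversarial training is added, per feature."""
    baseline = {feat: avg for feat, adv, avg, _, _ in rows if not adv}
    return {
        feat: round(avg - baseline[feat], 3)
        for feat, adv, avg, _, _ in rows
        if adv
    }


if __name__ == "__main__":
    print("max gap between reported and recomputed F1-Avg:", max_rounding_gap(ROWS))
    for feat, gain in adversarial_gain(ROWS).items():
        print(f"{feat}: F1-Avg change with +Adv = {gain:+.3f}")
```

Read this way, the reported F1-Avg rises with adversarial training for all three input features, while F1-ND falls for raw audio even as F1-D improves.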