Author manuscript; available in PMC: 2022 Nov 4.
Published in final edited form as: Interspeech. 2022 Sep;2022:3338–3342. doi: 10.21437/interspeech.2022-10798

Table 2:

F1-scores for DepAudioNet using the CONVERGE dataset for three features, with and without adversarial speaker disentanglement. ‘+Adv’ denotes adversarial training.

| Input Feature | +Adv | F1-Avg | F1-ND | F1-D |
|---|---|---|---|---|
| Mel-spec | No | 0.879 | 0.890 | 0.868 |
| Mel-spec | Yes (λ = 5e-5) | 0.890 | 0.903 | 0.877 |
| Raw-audio | No | 0.829 | 0.832 | 0.826 |
| Raw-audio | Yes (λ = 2e-4) | 0.857 | 0.870 | 0.844 |
| XLSR-Mandarin | No | 0.912 | 0.921 | 0.903 |
| XLSR-Mandarin | Yes (λ = 2e-4) | 0.915 | 0.925 | 0.906 |
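
The relationship between the three F1 columns can be checked with a short sketch. It assumes that F1-Avg is the unweighted mean of F1-ND and F1-D. The table does not state this, but the reported values agree with it to within rounding. The names `ROWS`, `mean_class_f1`, and `adversarial_gains` are hypothetical helpers introduced for illustration only; the numbers are copied from Table 2.

```python
# Rows copied from Table 2: (input feature, adversarial training?, F1-Avg, F1-ND, F1-D)
ROWS = [
    ("Mel-spec", False, 0.879, 0.890, 0.868),
    ("Mel-spec", True, 0.890, 0.903, 0.877),
    ("Raw-audio", False, 0.829, 0.832, 0.826),
    ("Raw-audio", True, 0.857, 0.870, 0.844),
    ("XLSR-Mandarin", False, 0.912, 0.921, 0.903),
    ("XLSR-Mandarin", True, 0.915, 0.925, 0.906),
]


def mean_class_f1(f1_nd, f1_d):
    """Unweighted mean of the two per-class F1-scores (assumed definition of F1-Avg)."""
    return (f1_nd + f1_d) / 2


def adversarial_gains(rows):
    """Map each input feature to F1-Avg with '+Adv' minus F1-Avg without it."""
    baseline = {feat: avg for feat, adv, avg, _, _ in rows if not adv}
    return {feat: round(avg - baseline[feat], 3) for feat, adv, avg, _, _ in rows if adv}


if __name__ == "__main__":
    # Each reported F1-Avg should match the mean of F1-ND and F1-D up to rounding.
    for feat, adv, avg, nd, d in ROWS:
        assert abs(mean_class_f1(nd, d) - avg) <= 1e-3, (feat, adv)
    # Absolute change in F1-Avg attributable to adversarial training, per feature.
    for feat, gain in adversarial_gains(ROWS).items():
        print(f"{feat}: {gain:+.3f}")
```

Under this assumption, the change in F1-Avg with adversarial training is largest for raw audio and smallest for XLSR-Mandarin, which already has the highest baseline.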