Table 2.

F1-AVG scores (MV) with and without score-level fusion with the Word2vec text model. Results are shown for the top two audio-only models, together with their DeID scores, which illustrate the privacy-preserving property of USSD.

Audio Model	Disentanglement	Audio-only	Word2vec Fusion (Text-only)	DeID (Audio-only)

Raw-Audio ECAPA-TDNN	ADV	0.790	0.860 (0.762)	22.32%
ComparE16 LSTM-only	USSD	0.776	0.830 (0.762)	92.87%
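
As a rough illustration of what score-level fusion of an audio model and a text model can look like, the sketch below combines per-sample scores from the two models by a weighted average and then thresholds the result. The fusion rule, the equal weighting, the threshold, and the names `fuse_scores` and `decide` are all assumptions made for illustration; the table does not state how the fusion was actually computed, and the scores used here are invented, not taken from the experiments.

```python
def fuse_scores(audio_scores, text_scores, audio_weight=0.5):
    """Weighted average of per-sample scores from two models.

    Assumption: both lists hold probabilities in [0, 1] for the positive
    class, aligned sample by sample. The equal default weight is a
    placeholder, not a value reported in the source.
    """
    if len(audio_scores) != len(text_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    text_weight = 1.0 - audio_weight
    return [audio_weight * a + text_weight * t
            for a, t in zip(audio_scores, text_scores)]


def decide(scores, threshold=0.5):
    """Turn fused scores into binary labels (1 = positive class)."""
    return [1 if s >= threshold else 0 for s in scores]


# Invented example scores for three samples (hypothetical values).
audio = [0.9, 0.4, 0.2]
text = [0.5, 0.7, 0.1]

fused = fuse_scores(audio, text)
labels = decide(fused)
print(labels)
```

In this toy case the second sample is rejected by the audio scores alone (0.4) but accepted after fusion, which is the kind of complementary effect that fusing modalities is meant to exploit.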