. 2023 Jul 10;13:11155. doi: 10.1038/s41598-023-35184-7

Table 2.

Depression detection and severity estimation performance in terms of $F_{1}$ scores ($F_{1} (D)$ and $F_{1} (H)$), balanced accuracy (BAc.), and RMSE on the DAIC-WOZ and Vocal Mind datasets.

		Acoustic features alone				Speaker embeddings alone				Acoustic features and speaker embeddings combined
Dataset1: DAIC	Model	COVAREP				ECAPA				(ECAPA, COVAREP)
	Model	$F_{1} (D)$	$F_{1} (H)$	BAc.	RMSE	$F_{1} (D)$	$F_{1} (H)$	BAc.	RMSE	$F_{1} (D)$	$F_{1} (H)$	BAc.	RMSE
	MK-CNN	0.35	0.70	0.52	7.39	0.43	0.78	0.60	6.35	0.45	0.79	0.61	6.21
	LSTM	0.32	0.70	0.51	7.41	0.46	0.79	0.61	6.31	0.47	0.80	0.63	6.19
		OpenSMILE				ECAPA				(ECAPA, OpenSMILE)
	MK-CNN	0.37	0.74	0.55	6.87	0.43	0.78	0.61	6.35	0.49	0.81	0.65	6.08
	LSTM	0.39	0.73	0.56	6.82	0.46	0.79	0.63	6.31	0.50	0.83	0.66	6.01
Dataset2: VM	Model	COVAREP				ECAPA				(ECAPA, COVAREP)
	Model	$F_{1} (D)$	$F_{1} (H)$	BAc.	RMSE	$F_{1} (D)$	$F_{1} (H)$	BAc.	RMSE	$F_{1} (D)$	$F_{1} (H)$	BAc.	RMSE
	MK-CNN	0.30	0.68	0.49	7.61	0.32	0.80	0.55	6.64	0.34	0.80	0.57	6.55
	LSTM	0.32	0.67	0.50	7.63	0.34	0.81	0.57	6.62	0.37	0.81	0.60	6.51
		OpenSMILE				ECAPA				(ECAPA, OpenSMILE)
	MK-CNN	0.32	0.74	0.53	6.96	0.32	0.80	0.56	6.64	0.41	0.81	0.61	6.41
	LSTM	0.34	0.75	0.54	6.94	0.34	0.81	0.57	6.62	0.43	0.84	0.64	6.28

$F_{1} (D)$ and $F_{1} (H)$ are the $F_{1}$ scores for the depressed and healthy classes, respectively. COVAREP and OpenSMILE are acoustic features. Results were obtained using ECAPA-TDNN x-vectors (ECAPA), COVAREP, and OpenSMILE features on the DAIC-WOZ (DAIC) and Vocal Mind (VM) datasets. For the results that combine acoustic features and speaker embeddings, (ECAPA, COVAREP) and (ECAPA, OpenSMILE), MK-CNN and LSTM refer to CE models with MK-CNN and LSTM blocks, respectively.

Bold values indicate best results in each comparison group.
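
For readers who want to reproduce the kind of metrics reported in Table 2, the following is a minimal sketch of how per-class $F_{1}$, balanced accuracy, and RMSE are commonly computed. It assumes binary labels where 1 marks the depressed class and 0 the healthy class, balanced accuracy defined as the mean of per-class recall, and numeric severity scores. The function names and the sample data are hypothetical and are not taken from the paper.

```python
import math


def f1_for_class(y_true, y_pred, cls):
    """F1 score with `cls` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def rmse(s_true, s_pred):
    """Root mean squared error between true and predicted severity scores."""
    squared = [(a - b) ** 2 for a, b in zip(s_true, s_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical labels: 1 = depressed (D), 0 = healthy (H).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
# Hypothetical severity scores for the regression metric.
s_true = [10, 4, 15, 2]
s_pred = [7, 8, 15, 2]

print(f"F1(D) = {f1_for_class(y_true, y_pred, 1):.2f}")   # F1(D) = 0.40
print(f"F1(H) = {f1_for_class(y_true, y_pred, 0):.2f}")   # F1(H) = 0.73
print(f"BAc.  = {balanced_accuracy(y_true, y_pred):.2f}")  # BAc.  = 0.57
print(f"RMSE  = {rmse(s_true, s_pred):.2f}")               # RMSE  = 2.50
```

Reporting $F_{1}$ separately for each class, together with balanced accuracy, keeps the minority (depressed) class visible when the two classes are imbalanced, which a single overall accuracy figure would tend to hide.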