2021 Nov 18;21(22):7665. doi: 10.3390/s21227665

Table 1.

Quantitative evaluation of the different strategies on speech emotion recognition. The best model, shown in bold in the original table, is the fine-tuned CNN-14 on Mel spectrograms without VAD.

TL Strategy	Inputs	Models	With VAD (InaSpeech)	Accuracy (%) ± 95% CI
-	-	Human perception [18]	-	67.00
-	-	ZeroR	-	13.33 ± 2.06
Feature Extraction	Deep-Spectrum embs. from fc7 of AlexNet	SVC	No	43.32 ± 2.56
Feature Extraction	Deep-Spectrum embs. from fc7 of AlexNet	SVC	Yes	45.80 ± 2.57
Feature Extraction	PANNs embs. from CNN-14	SVC	No	39.73 ± 2.53
Feature Extraction	PANNs embs. from CNN-14	SVC	Yes	37.22 ± 2.50
Fine Tuning	Mel spectrograms	AlexNet	No	60.72 ± 2.52
Fine Tuning	Mel spectrograms	AlexNet	Yes	61.67 ± 2.51
Fine Tuning	Mel spectrograms	CNN-14	No	76.58 ± 2.18
Fine Tuning	Mel spectrograms	CNN-14	Yes	75.25 ± 2.23
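
The "± 95% CI" column can be read with a short sketch such as the one below. It assumes the intervals are normal-approximation (Wald) intervals for a proportion, which this table does not state, and the sample count `N_HYPOTHETICAL` is a placeholder rather than the size of the evaluation set. The helper names `wald_half_width` and `intervals_overlap` are illustrative and do not come from the paper. The overlap check uses only the accuracies and half-widths reported in the table.

```python
import math


def wald_half_width(accuracy_pct: float, n_samples: int, z: float = 1.96) -> float:
    """Half-width, in percentage points, of a normal-approximation CI.

    Assumption: the interval is accuracy ± z * sqrt(p * (1 - p) / n),
    with z = 1.96 for a 95% confidence level.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0.0 <= accuracy_pct <= 100.0:
        raise ValueError("accuracy_pct must lie in [0, 100]")
    p = accuracy_pct / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n_samples)


def intervals_overlap(acc_a: float, hw_a: float, acc_b: float, hw_b: float) -> bool:
    """True when [acc_a ± hw_a] and [acc_b ± hw_b] share at least one point."""
    return abs(acc_a - acc_b) <= hw_a + hw_b


# Placeholder sample count, NOT taken from the table.
N_HYPOTHETICAL = 1000
example_half_width = wald_half_width(50.0, N_HYPOTHETICAL)

# Values copied from the table: fine-tuned CNN-14 without and with VAD.
cnn14_vad_overlap = intervals_overlap(76.58, 2.18, 75.25, 2.23)

# Values copied from the table: fine-tuned CNN-14 (no VAD) vs. AlexNet (with VAD).
cnn14_vs_alexnet_overlap = intervals_overlap(76.58, 2.18, 61.67, 2.51)

print(f"half-width at 50% accuracy, n={N_HYPOTHETICAL}: {example_half_width:.2f}")
print("CNN-14 with/without VAD intervals overlap:", cnn14_vad_overlap)
print("CNN-14 vs. AlexNet intervals overlap:", cnn14_vs_alexnet_overlap)
```

Under these assumptions, the two CNN-14 rows have overlapping intervals, so the table alone does not separate the VAD and no-VAD variants, whereas the fine-tuned CNN-14 and AlexNet intervals do not overlap. Overlap of intervals is only a rough heuristic and is not a substitute for a formal significance test.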