Table A2. Model architectures associated with the rates reported in Table 3.
TL Strategy | Inputs | Models | With VAD (InaSpeech) | Model Architecture
---|---|---|---|---
- | - | Human perception | - | -
- | - | ZeroR | - | -
Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Max. voting | No | -
Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Max. voting | Yes | -
Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Sequential (bi-LSTM) | No | 1 layer bi-LSTM with 150 neurons + 1 attention layer
Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Sequential (bi-LSTM) | Yes | 1 layer bi-LSTM with 150 neurons + 1 attention layer
Feature Extraction (from pre-trained STN on AffectNet) | fc50 | Sequential (bi-LSTM) | No | 2 layers bi-LSTM with 50 neurons + 1 attention layer
Feature Extraction (from pre-trained STN on AffectNet) | fc50 | Sequential (bi-LSTM) | Yes | 1 layer bi-LSTM with 150 neurons + 1 attention layer
Feature Extraction (from pre-trained STN on AffectNet) | flatten-810 | Sequential (bi-LSTM) | No | 2 layers bi-LSTM with 200 neurons + 2 attention layers
Feature Extraction (from pre-trained STN on AffectNet) | flatten-810 | Sequential (bi-LSTM) | Yes | 1 layer bi-LSTM with 150 neurons + 1 attention layer
Fine-Tuning on RAVDESS | posteriors (8 classes) | Max. voting | No | -
Fine-Tuning on RAVDESS | posteriors (8 classes) | Max. voting | Yes | -
Fine-Tuning on RAVDESS | posteriors (8 classes) | Sequential (bi-LSTM) | No | 2 layers bi-LSTM with 50 neurons + 2 attention layers
Fine-Tuning on RAVDESS | posteriors (8 classes) | Sequential (bi-LSTM) | Yes | 1 layer bi-LSTM with 25 neurons + 1 attention layer
Fine-Tuning on RAVDESS | fc50 | Sequential (bi-LSTM) | No | 1 layer bi-LSTM with 150 neurons + 1 attention layer
Fine-Tuning on RAVDESS | fc50 | Sequential (bi-LSTM) | Yes | 2 layers bi-LSTM with 150 neurons + 2 attention layers
Fine-Tuning on RAVDESS | flatten-810 | Sequential (bi-LSTM) | No | 2 layers bi-LSTM with 150 neurons + 2 attention layers
Fine-Tuning on RAVDESS | flatten-810 | Sequential (bi-LSTM) | Yes | 2 layers bi-LSTM with 300 neurons + 2 attention layers
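For readers who want to relate the architecture descriptions above to code, the sketch below shows one member of the sequential family listed in the table: a 1 layer bi-LSTM with 150 neurons followed by a single attention pooling layer. It is a minimal illustration only; the input dimension (frame-level posteriors for 7 classes), the number of output classes, and the additive form of the attention layer are assumptions not specified by the table.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Sketch of '1 layer bi-LSTM with 150 neurons + 1 attention layer'.

    Assumptions (for illustration only): inputs are sequences of 7-class
    frame-level posteriors, outputs are 8 utterance-level emotion logits,
    and the attention layer is a simple additive score over time steps.
    """

    def __init__(self, input_dim=7, hidden=150, n_classes=8):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden, num_layers=1,
                              batch_first=True, bidirectional=True)
        # Attention pooling: score each time step, normalise, weighted sum.
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (batch, time, input_dim)
        h, _ = self.bilstm(x)                   # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # (batch, time, 1)
        ctx = (w * h).sum(dim=1)                # (batch, 2*hidden)
        return self.out(ctx)                    # utterance-level logits

# Example: a batch of 4 sequences, each with 20 frames of 7-class posteriors.
logits = BiLSTMAttention()(torch.randn(4, 20, 7))
print(logits.shape)  # torch.Size([4, 8])
```

The 2-layer variants in the table would correspond to `num_layers=2` (or stacked LSTM blocks with an attention layer after each), with the hidden size set to the listed number of neurons.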