Sensors. 2021 Nov 18;21(22):7665. doi: 10.3390/s21227665

Table A2.

Model architectures associated with the rates reported in Table 3.

| TL Strategy | Inputs | Model | With VAD (InaSpeech) | Model Architecture |
|---|---|---|---|---|
| - | - | Human perception | - | - |
| - | - | ZeroR | - | - |
| Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Max. voting | No | - |
| | | | Yes | - |
| | | Sequential (bi-LSTM) | No | 1-layer bi-LSTM with 150 neurons + 1 attention layer |
| | | | Yes | 1-layer bi-LSTM with 150 neurons + 1 attention layer |
| | fc50 | Sequential (bi-LSTM) | No | 2-layer bi-LSTM with 50 neurons + 1 attention layer |
| | | | Yes | 1-layer bi-LSTM with 150 neurons + 1 attention layer |
| | flatten-810 | Sequential (bi-LSTM) | No | 2-layer bi-LSTM with 200 neurons + 2 attention layers |
| | | | Yes | 1-layer bi-LSTM with 150 neurons + 1 attention layer |
| Fine-Tuning on RAVDESS | posteriors (8 classes) | Max. voting | No | - |
| | | | Yes | - |
| | | Sequential (bi-LSTM) | No | 2-layer bi-LSTM with 50 neurons + 2 attention layers |
| | | | Yes | 1-layer bi-LSTM with 25 neurons + 1 attention layer |
| | fc50 | Sequential (bi-LSTM) | No | 1-layer bi-LSTM with 150 neurons + 1 attention layer |
| | | | Yes | 2-layer bi-LSTM with 150 neurons + 2 attention layers |
| | flatten-810 | Sequential (bi-LSTM) | No | 2-layer bi-LSTM with 150 neurons + 2 attention layers |
| | | | Yes | 2-layer bi-LSTM with 300 neurons + 2 attention layers |
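To make the table's two classifier families concrete, below is a minimal PyTorch sketch of the "Sequential (bi-LSTM)" models (stacked bi-LSTM layers followed by attention pooling and a class-logit output) together with the "Max. voting" aggregation over per-frame predictions. Only the hyperparameters (input dimensionality, hidden units, layer counts, class counts) come from the table; the class and function names (`BiLSTMAttention`, `max_vote`), the additive form of the attention layer, and all other details are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Sketch of a 'Sequential (bi-LSTM)' row from Table A2: stacked
    bi-LSTM layers, one additive attention pooling layer over time,
    and a linear output producing class logits. Attention form and
    naming are illustrative assumptions."""

    def __init__(self, input_size=7, hidden_size=150, num_layers=1, num_classes=8):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        # Attention layer: scores each time step of the bi-LSTM output.
        self.attn = nn.Linear(2 * hidden_size, 1)
        self.out = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):                          # x: (batch, time, features)
        h, _ = self.lstm(x)                        # (batch, time, 2*hidden)
        scores = self.attn(torch.tanh(h))          # (batch, time, 1)
        weights = torch.softmax(scores, dim=1)     # normalize over time
        context = (weights * h).sum(dim=1)         # (batch, 2*hidden)
        return self.out(context)                   # class logits

def max_vote(frame_posteriors):
    """'Max. voting' rows: label a clip with the class most often
    predicted across its frames. frame_posteriors: (time, classes)."""
    frame_labels = frame_posteriors.argmax(dim=-1)     # (time,)
    return torch.bincount(frame_labels).argmax().item()

# Example matching the FE row 'posteriors (7 classes), no VAD':
# sequences of 7-dim STN posteriors classified into 8 emotions.
model = BiLSTMAttention(input_size=7, hidden_size=150, num_layers=1, num_classes=8)
logits = model(torch.randn(4, 100, 7))   # 4 clips, 100 frames each
```

Swapping `num_layers`, `hidden_size`, and `input_size` reproduces the other rows, e.g. `num_layers=2, hidden_size=300` for the fine-tuned flatten-810 model with VAD (the table's second attention layer would score each bi-LSTM stage separately, a detail this single-pooling sketch simplifies).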