Table A2. Model architectures associated with the rates reported in Table 3.
TL Strategy | Inputs | Models | With VAD (InaSpeech) | Model Architecture
---|---|---|---|---
- | - | Human perception | - | -
- | - | ZeroR | - | -
Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Max. voting | No | -
Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Max. voting | Yes | -
Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Sequential (bi-LSTM) | No | 1 layer bi-LSTM with 150 neurons + 1 attention layer
Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Sequential (bi-LSTM) | Yes | 1 layer bi-LSTM with 150 neurons + 1 attention layer
Feature Extraction (from pre-trained STN on AffectNet) | fc50 | Sequential (bi-LSTM) | No | 2 layers bi-LSTM with 50 neurons + 1 attention layer
Feature Extraction (from pre-trained STN on AffectNet) | fc50 | Sequential (bi-LSTM) | Yes | 1 layer bi-LSTM with 150 neurons + 1 attention layer
Feature Extraction (from pre-trained STN on AffectNet) | flatten-810 | Sequential (bi-LSTM) | No | 2 layers bi-LSTM with 200 neurons + 2 attention layers
Feature Extraction (from pre-trained STN on AffectNet) | flatten-810 | Sequential (bi-LSTM) | Yes | 1 layer bi-LSTM with 150 neurons + 1 attention layer
Fine-Tuning on RAVDESS | posteriors (8 classes) | Max. voting | No | -
Fine-Tuning on RAVDESS | posteriors (8 classes) | Max. voting | Yes | -
Fine-Tuning on RAVDESS | posteriors (8 classes) | Sequential (bi-LSTM) | No | 2 layers bi-LSTM with 50 neurons + 2 attention layers
Fine-Tuning on RAVDESS | posteriors (8 classes) | Sequential (bi-LSTM) | Yes | 1 layer bi-LSTM with 25 neurons + 1 attention layer
Fine-Tuning on RAVDESS | fc50 | Sequential (bi-LSTM) | No | 1 layer bi-LSTM with 150 neurons + 1 attention layer
Fine-Tuning on RAVDESS | fc50 | Sequential (bi-LSTM) | Yes | 2 layers bi-LSTM with 150 neurons + 2 attention layers
Fine-Tuning on RAVDESS | flatten-810 | Sequential (bi-LSTM) | No | 2 layers bi-LSTM with 150 neurons + 2 attention layers
Fine-Tuning on RAVDESS | flatten-810 | Sequential (bi-LSTM) | Yes | 2 layers bi-LSTM with 300 neurons + 2 attention layers
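For readers who want to relate the architecture descriptions above to code, the sketch below shows one member of the sequential family listed in the table: a 1 layer bi-LSTM with 150 neurons followed by a single attention pooling layer. It is a minimal illustration only; the input dimension (frame-level posteriors for 7 classes), the number of output classes, and the additive form of the attention layer are assumptions not specified by the table.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Sketch of '1 layer bi-LSTM with 150 neurons + 1 attention layer'.

    Assumptions (for illustration only): inputs are sequences of 7-class
    frame-level posteriors, outputs are 8 utterance-level emotion logits,
    and the attention layer is a simple additive score over time steps.
    """

    def __init__(self, input_dim=7, hidden=150, n_classes=8):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden, num_layers=1,
                              batch_first=True, bidirectional=True)
        # Attention pooling: score each time step, normalise, weighted sum.
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (batch, time, input_dim)
        h, _ = self.bilstm(x)                   # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # (batch, time, 1)
        ctx = (w * h).sum(dim=1)                # (batch, 2*hidden)
        return self.out(ctx)                    # utterance-level logits

# Example: a batch of 4 sequences, each with 20 frames of 7-class posteriors.
logits = BiLSTMAttention()(torch.randn(4, 20, 7))
print(logits.shape)  # torch.Size([4, 8])
```

The 2-layer variants in the table would correspond to `num_layers=2` (or stacked LSTM blocks with an attention layer after each), with the hidden size set to the listed number of neurons.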