2021 Nov 18;21(22):7665. doi: 10.3390/s21227665

Table 3.

Quantitative evaluation of the different strategies for the facial emotion recognizer. Results are given at the video level. All results are reported on eight emotions, except those marked with (*), which are reported on seven emotions after collapsing the ‘Neutral’ and ‘Calm’ emotions into a single class. The best model is shown in bold.

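| TL Strategy | Inputs | Models | With VAD (InaSpeech) | Accuracy ± 95% CI (%) |
|---|---|---|---|---|
| - | - | Human perception [18] | - | 75.00 |
| - | - | ZeroR | - | 13.33 ± 2.06 |
| Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Max. voting | No | 30.49* ± 2.38 |
| Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Max. voting | Yes | 30.35* ± 2.37 |
| Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Sequential (bi-LSTM) | No | 38.87 ± 2.52 |
| Feature Extraction (from pre-trained STN on AffectNet) | posteriors (7 classes) | Sequential (bi-LSTM) | Yes | 39.75 ± 2.53 |
| Feature Extraction (from pre-trained STN on AffectNet) | fc50 | Sequential (bi-LSTM) | No | 50.40 ± 2.58 |
| Feature Extraction (from pre-trained STN on AffectNet) | fc50 | Sequential (bi-LSTM) | Yes | 48.77 ± 2.58 |
| Feature Extraction (from pre-trained STN on AffectNet) | flatten-810 | Sequential (bi-LSTM) | No | 53.85 ± 2.57 |
| Feature Extraction (from pre-trained STN on AffectNet) | flatten-810 | Sequential (bi-LSTM) | Yes | 51.70 ± 2.58 |
| Fine-Tuning on RAVDESS | posteriors (8 classes) | Max. voting | No | 54.20 ± 2.56 |
| Fine-Tuning on RAVDESS | posteriors (8 classes) | Max. voting | Yes | 55.07 ± 2.56 |
| Fine-Tuning on RAVDESS | posteriors (8 classes) | Sequential (bi-LSTM) | No | 55.82 ± 2.56 |
| Fine-Tuning on RAVDESS | posteriors (8 classes) | Sequential (bi-LSTM) | Yes | 56.87 ± 2.56 |
| Fine-Tuning on RAVDESS | fc50 | Sequential (bi-LSTM) | No | 46.48 ± 2.58 |
| Fine-Tuning on RAVDESS | fc50 | Sequential (bi-LSTM) | Yes | 46.13 ± 2.57 |
| Fine-Tuning on RAVDESS | flatten-810 | Sequential (bi-LSTM) | No | 54.14 ± 2.57 |
| Fine-Tuning on RAVDESS | flatten-810 | Sequential (bi-LSTM) | Yes | **57.08 ± 2.56** |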
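
The "± 95% CI" values in Table 3 can be read as half-widths of a confidence interval around each accuracy. The table does not state how they were computed, so the following is only a minimal sketch of one common choice, the normal-approximation (Wald) interval for a binomial proportion. The function name `accuracy_ci_halfwidth` and the sample size `N_VIDEOS = 1440` are assumptions made for illustration and are not taken from the table.

```python
import math


def accuracy_ci_halfwidth(accuracy_pct: float, n_samples: int, z: float = 1.96) -> float:
    """Half-width (in percentage points) of a normal-approximation CI.

    accuracy_pct: accuracy in percent (e.g. 57.08)
    n_samples:    number of evaluated videos (assumed, not given in the table)
    z:            standard normal quantile; 1.96 corresponds to a 95% interval
    """
    p = accuracy_pct / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n_samples)


# Assumption: 1440 evaluated videos; the table itself gives no sample size.
N_VIDEOS = 1440

for acc in (30.49, 57.08):
    half_width = accuracy_ci_halfwidth(acc, N_VIDEOS)
    print(f"{acc:.2f} ± {half_width:.2f}")
```

Under these assumptions the half-widths come out close to several of the tabulated values, but not every row needs to follow this formula (the ZeroR row, for instance, reports a wider interval), so the sketch should be treated as an aid for reading the table rather than as the authors' procedure.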