2021 Dec 14;21(24):8356. doi: 10.3390/s21248356

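Table 4. Performance of our models using both visual and audio features. FS denotes feature selection; MSE, mean squared error (lower is better); PCC, Pearson correlation coefficient (higher is better); a dash (-) means no result is reported for that dataset.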

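Columns 2-5: Extended COGNIMUSE (intended emotion). Columns 6-9: Global EIMT16 (expected emotion).

| Model (video and audio) | COGNIMUSE arousal MSE | COGNIMUSE arousal PCC | COGNIMUSE valence MSE | COGNIMUSE valence PCC | EIMT16 arousal MSE | EIMT16 arousal PCC | EIMT16 valence MSE | EIMT16 valence PCC |
|---|---|---|---|---|---|---|---|---|
| Feature AAN | 0.124 | 0.630 | 0.178 | 0.572 | 0.742 | 0.503 | 0.185 | 0.467 |
| Temporal AAN | 0.153 | 0.551 | 0.238 | 0.319 | 0.854 | 0.210 | 0.218 | 0.415 |
| Mixed AAN | 0.217 | 0.251 | 0.285 | 0.270 | 1.556 | 0.318 | 0.234 | 0.341 |
| 2FC-layer model | 0.293 | 0.228 | 0.284 | 0.217 | 0.989 | 0.500 | 0.276 | 0.372 |
| 2-layer LSTM model | 0.247 | 0.083 | 0.301 | 0.092 | 2.222 | 0.254 | 0.303 | 0.208 |
| Sivaprasad et al. [15] (audio and video, FS) | 0.08 | 0.84 | 0.21 | 0.50 | - | - | - | - |
| Yi et al. [18] | - | - | - | - | 1.173 | 0.446 | 0.198 | 0.399 |
| Chen et al. [17] | - | - | - | - | 1.479 | 0.467 | 0.201 | 0.419 |
| Liu et al. [16] | - | - | - | - | 1.182 | 0.212 | 0.236 | 0.379 |
| Guo et al. [69] | - | - | - | - | 0.543 | 0.459 | 0.209 | 0.326 |
| Yi et al. [45] | - | - | - | - | 0.542 | 0.522 | 0.193 | 0.468 |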
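The table ranks models by MSE and PCC. The following is a minimal sketch of how these two metrics are commonly computed for continuous arousal/valence predictions, plus a helper that picks the best entry in one column of Table 4. It assumes predictions and ground-truth scores are already aligned one-to-one; the exact evaluation protocol (e.g., per-clip vs. per-frame averaging) is not given here, and the names `mse`, `pcc` and `best` are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Dict, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between aligned ground-truth and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pcc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between aligned score sequences."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Covariance and standard deviations share the 1/n factor, so it cancels.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    if std_t == 0 or std_p == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / (std_t * std_p)


def best(column: Dict[str, float], lower_is_better: bool) -> str:
    """Return the model with the best score: lowest for MSE, highest for PCC."""
    pick = min if lower_is_better else max
    return pick(column, key=column.get)


# Values copied from Table 4 (Global EIMT16, arousal MSE column).
eimt16_arousal_mse = {
    "Feature AAN": 0.742, "Temporal AAN": 0.854, "Mixed AAN": 1.556,
    "2FC-layer model": 0.989, "2-layer LSTM model": 2.222,
    "Yi et al. [18]": 1.173, "Chen et al. [17]": 1.479,
    "Liu et al. [16]": 1.182, "Guo et al. [69]": 0.543, "Yi et al. [45]": 0.542,
}

if __name__ == "__main__":
    print(mse([0.1, 0.4, 0.35, 0.8], [0.2, 0.3, 0.4, 0.7]))  # approx. 0.008125
    print(pcc([1, 2, 3], [2, 4, 6]))                          # approx. 1.0
    print(best(eimt16_arousal_mse, lower_is_better=True))     # Yi et al. [45]
```

Because MSE penalizes absolute error while PCC only measures how well predictions track the trend of the ground truth, a model can rank differently on the two metrics, as the Mixed AAN and 2FC-layer rows on EIMT16 arousal illustrate.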