2021 Dec 14;21(24):8356. doi: 10.3390/s21248356

Table 4.

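Performance of our models using both visual and audio features. MSE: mean squared error (lower is better); PCC: Pearson correlation coefficient (higher is better); FS: feature selection; -: no result reported.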

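Extended COGNIMUSE is evaluated against intended emotion; Global EIMT16 is evaluated against expected emotion.

| Models (video and audio) | Ext. COGNIMUSE arousal MSE | Ext. COGNIMUSE arousal PCC | Ext. COGNIMUSE valence MSE | Ext. COGNIMUSE valence PCC | Global EIMT16 arousal MSE | Global EIMT16 arousal PCC | Global EIMT16 valence MSE | Global EIMT16 valence PCC |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Feature AAN | 0.124 | 0.630 | 0.178 | 0.572 | 0.742 | 0.503 | 0.185 | 0.467 |
| Temporal AAN | 0.153 | 0.551 | 0.238 | 0.319 | 0.854 | 0.210 | 0.218 | 0.415 |
| Mixed AAN | 0.217 | 0.251 | 0.285 | 0.270 | 1.556 | 0.318 | 0.234 | 0.341 |
| 2FC-layer model | 0.293 | 0.228 | 0.284 | 0.217 | 0.989 | 0.500 | 0.276 | 0.372 |
| 2-layer LSTM model | 0.247 | 0.083 | 0.301 | 0.092 | 2.222 | 0.254 | 0.303 | 0.208 |
| Sivaprasad et al. [15] (audio and video, FS) | 0.08 | 0.84 | 0.21 | 0.50 | - | - | - | - |
| Yi et al. [18] | - | - | - | - | 1.173 | 0.446 | 0.198 | 0.399 |
| Chen et al. [17] | - | - | - | - | 1.479 | 0.467 | 0.201 | 0.419 |
| Liu et al. [16] | - | - | - | - | 1.182 | 0.212 | 0.236 | 0.379 |
| Guo et al. [69] | - | - | - | - | 0.543 | 0.459 | 0.209 | 0.326 |
| Yi et al. [45] | - | - | - | - | 0.542 | 0.522 | 0.193 | 0.468 |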
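The two metrics reported in Table 4 can be illustrated with a minimal sketch that scores a predicted arousal or valence sequence against its annotations. This is only an illustration of the standard MSE and Pearson correlation definitions: the excerpt does not state the exact evaluation protocol (for example, whether scores are computed per clip or over concatenated sequences, or whether values are normalized first), so the functions `mse` and `pcc` and the sample values below are hypothetical.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predictions and annotations (lower is better)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pcc(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation coefficient in [-1, 1] (higher is better)."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    # Covariance term and the two standard-deviation terms (the 1/n factors cancel).
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in target))
    if sd_p == 0 or sd_t == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / (sd_p * sd_t)


# Hypothetical per-segment arousal predictions and annotations (not from the paper).
predicted = [0.1, 0.4, 0.35, 0.8]
annotated = [0.0, 0.5, 0.3, 0.9]
print(f"MSE = {mse(predicted, annotated):.6f}, PCC = {pcc(predicted, annotated):.4f}")
```

The two metrics capture different things: MSE penalizes absolute deviations, while PCC is unaffected by a constant offset or a positive rescaling of the predictions. This is why a model's MSE ranking and PCC ranking need not agree, as happens for several rows of the table.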