Table 2.
The performance of the proposed models using only audio features.
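Models (Only Audio) | Extended COGNIMUSE (Intended Emotion): Arousal MSE | Extended COGNIMUSE (Intended Emotion): Arousal PCC | Extended COGNIMUSE (Intended Emotion): Valence MSE | Extended COGNIMUSE (Intended Emotion): Valence PCC | Global EIMT16 (Expected Emotion): Arousal MSE | Global EIMT16 (Expected Emotion): Arousal PCC | Global EIMT16 (Expected Emotion): Valence MSE | Global EIMT16 (Expected Emotion): Valence PCC |
---|---|---|---|---|---|---|---|---|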
Feature AAN | 0.125 | 0.621 | 0.185 | 0.543 | 1.111 | 0.397 | 0.209 | 0.327 |
Temporal AAN | 0.162 | 0.472 | 0.247 | 0.254 | 1.159 | 0.185 | 0.225 | 0.285 |
Mixed AAN | 0.219 | 0.204 | 0.269 | 0.160 | 1.650 | 0.290 | 0.235 | 0.314 |
2FC-layer model | 0.299 | 0.203 | 0.299 | 0.173 | 1.533 | 0.395 | 0.368 | 0.318 |
2-layer LSTM model | 0.266 | 0.091 | 0.310 | 0.080 | 2.311 | 0.262 | 0.348 | 0.210 |
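
As a reading aid for the two metrics reported above, the following is a minimal sketch of how they are commonly computed, assuming that MSE denotes the mean squared error and PCC the Pearson correlation coefficient between predicted and annotated scores. The function names `mse` and `pcc` are hypothetical, and the exact evaluation protocol behind the table (for example, how scores are aggregated over clips) is not reproduced here.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences (lower is better)."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pcc(pred, target):
    """Pearson correlation coefficient in [-1, 1] (higher is better).

    Returns NaN when either sequence is constant, since the
    correlation is undefined in that case.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        return float("nan")
    return cov / math.sqrt(var_p * var_t)
```

The two metrics are complementary: MSE penalizes the absolute distance between prediction and annotation, while PCC only measures whether the two move together, so a model can have a low PCC with a moderate MSE or the reverse.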