2021 Aug 12;21(16):5452. doi: 10.3390/s21165452
AVER Audio-Video Emotion Recognition
CNN Convolutional Neural Network
CREMA-D [33] Crowd-sourced Emotional Multimodal Actors Dataset
DNN Deep Neural Network
FC Fully Connected layer
HCI Human–Computer Interaction
IEMOCAP [30] Interactive Emotional Dyadic Motion Capture dataset
LSTM Long Short-Term Memory
MRPN Multi-modal Residual Perceptron Network
RAVDESS [34] The Ryerson Audio–Visual Database of Emotional Speech and Song
RP Residual Perceptron
SAC Sequence Aggregation Component
SOTA State-of-the-Art solution
STFT Short-Time Fourier Transform
SVM Support Vector Machine
ViT [38] Vision Transformer