| FER | Facial Emotion Recognition |
| SER | Speech Emotion Recognition |
| RAVDESS | The Ryerson Audio-Visual Database of Emotional Speech and Song |
| ST | Spatial Transformer |
| CNN | Convolutional Neural Network |
| MTCNN | Multi-task Cascaded Convolutional Networks |
| Bi-LSTM | Bidirectional Long Short-Term Memory network |
| GAN | Generative Adversarial Networks |
| embs | embeddings |
| fc | fully-connected |
| SVC | Support Vector Classification |
| VAD | Voice Activity Detector |
| TL | Transfer Learning |
| CI | Confidence Interval |
| CV | Cross-Validation |