| MER | Music Emotion Recognition |
| DNN | Deep Neural Network |
| CNN | Convolutional Neural Network |
| GMM | Gaussian Mixture Model |
| SVM | Support Vector Machine |
| CLR | Calibrated Label Ranking |
| MVMM | Music Video Multi-Modal |
| C3D | Convolutional 3-Dimensional |
| CAL500 | Computer Audition Lab 500-song |
| DEAP120 | Database for Emotion Analysis using Physiological signals with 120 samples |
| MMTM | Multimodal Transfer Module |
| SE | Squeeze-and-Excitation |
| MFCC | Mel-Frequency Cepstral Coefficient |
| ROC | Receiver Operating Characteristic |
| AUC | Area Under the Curve |
| GAP | Global Average Pooling |
| FFT | Fast Fourier Transform |
| T-F | Time-Frequency |
| 2/3D | 2-/3-Dimensional |
| DRB2DSC | Dense Residual Block 2D Standard Convolution |
| DRB3DSC | Dense Residual Block 3D Standard Convolution |
| DRB2DSCFC | Dense Residual Block 2D with Separable Channel and Filter Convolution |
| DRB3DSCFC | Dense Residual Block 3D with Separable Channel and Filter Convolution |