| HCI | Human–computer interaction |
| SER | Speech emotion recognition |
| MFCC | Mel-frequency cepstral coefficient |
| LLD | Low-level descriptors |
| CNN | Convolutional neural network |
| DSCNN | Deep stride convolutional neural network |
| CL | Connected layers |
| FC | Fully connected layers |
| SDFA | Salient discriminative features analysis |
| STFT | Short-time Fourier transform |
| FFT | Fast Fourier transform |
| IEMOCAP | Interactive emotional dyadic motion capture |
| RAVDESS | Ryerson audio-visual database of emotional speech and song |