Transformers for Urban Sound Classification—A Comprehensive Performance Evaluation

2022 Nov 16;22(22):8874. doi: 10.3390/s22228874

AAML	Additive Angular Margin Loss
AST	Audio Spectrogram Transformer
AUC	Area Under the Receiver Operating Characteristic (ROC) Curve
BERT	Bidirectional Encoder Representations from Transformers
CENS	Chroma Energy Normalized Statistics
CNN	Convolutional Neural Networks
CQT	Constant-Q Transform
CRNN	Convolutional Recurrent Neural Networks
DCNN	Deep Convolutional Neural Networks
DenseNet	Dense Convolutional Network
DL	Deep Learning
DNN	Deep Neural Network
ESC	Environmental Sound Classification
FN	False Negative
FP	False Positive
GPU	Graphics Processing Unit
LSTM	Long Short-Term Memory
M2M-AST	Many-to-Many Audio Spectrogram Transformer
MFCC	Mel-Frequency Cepstral Coefficients
NLP	Natural Language Processing
pp	percentage points
ResNet	Residual Neural Network
RNN	Recurrent Neural Networks
STFT	Short-Time Fourier Transform
TFCNN	Temporal-Frequency attention-based Convolutional Neural Network
TP	True Positive
VATT	Video–Audio–Text Transformer
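The TP, FP, and FN counts defined above are the raw ingredients of standard per-class classification metrics. The sketch below assumes the usual definitions of precision, recall, and F1 plus an unweighted (macro) average over classes; the glossary itself does not say which metrics the evaluation reports, and the function names are hypothetical.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Per-class precision, recall, and F1 from raw counts.

    Assumes the standard definitions; a metric whose denominator is zero
    is reported as 0.0 instead of raising ZeroDivisionError.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def macro_f1(per_class_counts: list[tuple[int, int, int]]) -> float:
    """Unweighted mean of per-class F1 over (tp, fp, fn) triples."""
    scores = [precision_recall_f1(tp, fp, fn)[2] for tp, fp, fn in per_class_counts]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical counts for a single class: 8 correct detections,
    # 2 false alarms, 4 missed clips.
    print(precision_recall_f1(8, 2, 4))
```

Note that none of these metrics needs true negatives, which is why F1 equals 2·TP / (2·TP + FP + FN) for each class.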