2023 Apr 10:1–9. Online ahead of print. doi: 10.1007/s11760-023-02559-2

Fig. 2.

Model architecture. The 257-dimensional 2D audio spectrogram is fed into four residual blocks. The features extracted by the ResNet are then passed to a BiLSTM with attention. Finally, two FC layers and an average pooling layer produce the final prediction.
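
The data flow in the caption can be illustrated with a small shape trace. This is only a sketch under stated assumptions, not the authors' implementation: the caption gives the 257-bin input, the four residual blocks, the BiLSTM with attention, the two FC layers, and the average pooling, but not the layer sizes. The channel widths, the frequency stride of 2, the LSTM and FC hidden sizes, the choice to keep one output per frame after attention, and the names `Shape`, `residual_block`, and `trace_shapes` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    channels: int
    freq: int
    frames: int


def residual_block(s: Shape, out_channels: int, freq_stride: int) -> Shape:
    # Assumption: each block downsamples only the frequency axis, like a
    # strided convolution with "same" padding (ceil division), and keeps
    # the number of time frames unchanged.
    return Shape(out_channels, -(-s.freq // freq_stride), s.frames)


def trace_shapes(frames, freq_bins=257, channels=(16, 32, 64, 128),
                 lstm_hidden=128, fc_hidden=64):
    """Return (stage name, shape) pairs for one spectrogram.

    Only freq_bins=257 and the order of stages come from the caption;
    every other default is a hypothetical value chosen for illustration.
    """
    steps = []
    s = Shape(1, freq_bins, frames)
    steps.append(("input spectrogram", (s.channels, s.freq, s.frames)))

    # Four residual blocks (ResNet feature extractor).
    for i, c in enumerate(channels, start=1):
        s = residual_block(s, c, freq_stride=2)
        steps.append((f"residual block {i}", (s.channels, s.freq, s.frames)))

    # Merge channel and frequency axes so each frame is one feature vector.
    steps.append(("flatten per frame", (s.frames, s.channels * s.freq)))

    # A bidirectional LSTM concatenates forward and backward states.
    # Assumption: the attention layer keeps one vector per frame.
    steps.append(("BiLSTM + attention", (s.frames, 2 * lstm_hidden)))

    # Two fully connected layers give one score per frame.
    steps.append(("FC 1", (s.frames, fc_hidden)))
    steps.append(("FC 2", (s.frames, 1)))

    # Averaging the per-frame scores yields the final prediction.
    steps.append(("average pooling over frames", (1,)))
    return steps


if __name__ == "__main__":
    for name, shape in trace_shapes(frames=100):
        print(f"{name:30s} {shape}")
```

Under these assumptions the frequency axis shrinks from 257 to 17 bins across the four blocks while the time axis is preserved, so the BiLSTM still sees one feature vector per frame and the final average runs over frame-level scores.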