Table 3.
Method | Audio Representation | Accuracy |
---|---|---|
SVM [9] | feature vector † | 77.4% |
SVM [29] | feature vector † | 48.81% |
Random Forest [29] | feature vector † | 56.07% |
2D CNN [7] | Spectrograms | 73.6% |
VGG-16 | all spectrograms * | 49.20% |
VGG-16 | k selected spectrograms ⋄ | 45.11% |
Proposed 3D CNN | all spectrograms * | 80.41% |
Proposed 3D CNN | k selected spectrograms ⋄ | 81.05% |
†: A feature vector of commonly used audio features, such as those in Table 1. *: All generated frames/spectrograms of one audio sample are used. ⋄: Only k (k = 9) frames/spectrograms of one audio sample are used.