2019 May 8;21(5):479. doi: 10.3390/e21050479

Table 3.

Comparison of recognition rates among different methods for the SAVEE dataset.

Method	Audio Representation	Accuracy
SVM [9]	feature vector ^†	77.4%
SVM [29]	feature vector ^†	48.81%
Random Forest [29]	feature vector ^†	56.07%
2D CNN [7]	Spectrograms	73.6%
VGG-16 $^{(1)}$	all spectrograms ^*	49.20%
VGG-16 $^{(2)}$	k selected spectrograms ^⋄	45.11%
Proposed 3D CNN $^{(1)}$	all spectrograms ^*	80.41%
Proposed 3D CNN $^{(2)}$	k selected spectrograms ^⋄	81.05%

^†: A feature vector of commonly known audio features, such as those listed in Table 1. ^*: All generated frames/spectrograms of one audio recording are used. ^⋄: Only k (k = 9) frames/spectrograms of one audio recording are used.