CochCNN9 Word | Convolutional architecture for word recognition | Cochleagram | Word label (794) | Word-Speaker-Noise dataset [25]
CochCNN9 Speaker | Convolutional architecture for speaker recognition | Cochleagram | Speaker label (433) | Word-Speaker-Noise dataset [25]
CochCNN9 AudioSet | Convolutional architecture for auditory event recognition (AudioSet) | Cochleagram | AudioSet label (517) | Word-Speaker-Noise dataset [25]
CochCNN9 MultiTask | Convolutional architecture for word recognition, speaker recognition, and auditory event recognition (AudioSet) | Cochleagram | Three output layers: Word label (794), Speaker label (433), AudioSet label (517) | Word-Speaker-Noise dataset [25]
CochCNN9 Genre | Convolutional architecture for music genre classification | Cochleagram | Genre label (41) | Genre task using Million Song Dataset [155]
CochResNet50 Word | Convolutional architecture for word recognition | Cochleagram | Word label (794) | Word-Speaker-Noise dataset [25]
CochResNet50 Speaker | Convolutional architecture for speaker recognition | Cochleagram | Speaker label (433) | Word-Speaker-Noise dataset [25]
CochResNet50 AudioSet | Convolutional architecture for auditory event recognition (AudioSet) | Cochleagram | AudioSet label (517) | Word-Speaker-Noise dataset [25]
CochResNet50 MultiTask | Convolutional architecture for word recognition, speaker recognition, and auditory event recognition (AudioSet) | Cochleagram | Three output layers: Word label (794), Speaker label (433), AudioSet label (517) | Word-Speaker-Noise dataset [25]
CochResNet50 Genre | Convolutional architecture for music genre classification | Cochleagram | Genre label (41) | Genre task using Million Song Dataset [155]
SpectroTemporal | Linear filterbank with spectral and temporal modulations [45] | Cochleagram | Spectrotemporal embedding space | (None)