2023 Dec 13;21(12):e3002366. doi: 10.1371/journal.pbio.3002366

Table 2. In-house model overview.

| Model name | Brief description | Model input | Model output | Training dataset |
|---|---|---|---|---|
| CochCNN9 Word | Convolutional architecture for word recognition | Cochleagram | Word label (794) | Word-Speaker-Noise dataset [25] |
| CochCNN9 Speaker | Convolutional architecture for speaker recognition | Cochleagram | Speaker label (433) | Word-Speaker-Noise dataset [25] |
| CochCNN9 AudioSet | Convolutional architecture for auditory event recognition (AudioSet) | Cochleagram | AudioSet label (517) | Word-Speaker-Noise dataset [25] |
| CochCNN9 MultiTask | Convolutional architecture for word recognition, speaker recognition, and auditory event recognition (AudioSet) | Cochleagram | Three output layers: Word label (794), Speaker label (433), AudioSet label (517) | Word-Speaker-Noise dataset [25] |
| CochCNN9 Genre | Convolutional architecture for music genre classification | Cochleagram | Genre label (41) | Genre task using Million Song Dataset [155] |
| CochResNet50 Word | Convolutional architecture for word recognition | Cochleagram | Word label (794) | Word-Speaker-Noise dataset [25] |
| CochResNet50 Speaker | Convolutional architecture for speaker recognition | Cochleagram | Speaker label (433) | Word-Speaker-Noise dataset [25] |
| CochResNet50 AudioSet | Convolutional architecture for auditory event recognition (AudioSet) | Cochleagram | AudioSet label (517) | Word-Speaker-Noise dataset [25] |
| CochResNet50 MultiTask | Convolutional architecture for word recognition, speaker recognition, and auditory event recognition (AudioSet) | Cochleagram | Three output layers: Word label (794), Speaker label (433), AudioSet label (517) | Word-Speaker-Noise dataset [25] |
| CochResNet50 Genre | Convolutional architecture for music genre classification | Cochleagram | Genre label (41) | Genre task using Million Song Dataset [155] |
| SpectroTemporal | Linear filterbank with spectral and temporal modulations [45] | Cochleagram | Spectrotemporal embedding space | (None) |
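
The "three output layers" of the MultiTask models can be illustrated with a minimal sketch in which one feature vector is read out by three separate linear heads. Only the output sizes (794, 433, 517) come from Table 2. The shared feature vector, its dimensionality, the random weights, and all function names are assumptions made for illustration. This is not the published implementation.

```python
import random

# Output sizes are taken from Table 2; everything else is a hypothetical illustration.
HEAD_SIZES = {"word": 794, "speaker": 433, "audioset": 517}
FEATURE_DIM = 8  # assumed size of the shared representation, chosen only for this demo


def make_head(n_classes, feature_dim, rng):
    """Return a randomly initialised weight matrix (n_classes x feature_dim)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(feature_dim)] for _ in range(n_classes)]


def apply_head(weights, features):
    """Linear readout: one logit per class."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def multitask_forward(features, heads):
    """Feed the same feature vector to every task-specific head."""
    return {task: apply_head(weights, features) for task, weights in heads.items()}


rng = random.Random(0)
heads = {task: make_head(n, FEATURE_DIM, rng) for task, n in HEAD_SIZES.items()}

# Stand-in for the output of the convolutional layers applied to a cochleagram.
shared_features = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
logits = multitask_forward(shared_features, heads)

for task, values in logits.items():
    print(task, len(values))  # word 794, speaker 433, audioset 517
```

Each head yields one logit per class for its task, so a single forward pass produces a word, a speaker, and an AudioSet prediction from the same input.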