2023 Dec 13;21(12):e3002366. doi: 10.1371/journal.pbio.3002366

Table 16. Training details for CochCNN9 and CochResNet50 models trained on clean speech.

Num Classes is inclusive of the “null” label, although no “null” examples were included when training clean models. Accuracy is reported on clean speech for the training task.

| Model Name | Batch Size | Initial Learning Rate | Num Classes | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|---|---|
| CochCNN9 Word | 128 | 0.01 | 794 | 82.311% | 94.150% |
| CochCNN9 WordClean | 128 | 0.01 | 794 | 84.365% | 95.078% |
| CochCNN9 Speaker | 128 | 0.01 | 433 | 99.799% | 99.990% |
| CochCNN9 SpeakerClean* | 128 | 0.01 | 433 | 99.905% | 99.998% |
| CochResNet50 Word | 256 | 0.1 | 794 | 94.212% | 98.993% |
| CochResNet50 WordClean | 256 | 0.1 | 794 | 93.998% | 98.662% |
| CochResNet50 Speaker | 256 | 0.1 | 433 | 99.973% | 100.000% |
| CochResNet50 SpeakerClean | 256 | 0.1 | 433 | 99.988% | 100.000% |

*This model was additionally trained with gradient clipping (maximum L2 norm = 1.0) and a learning rate warm-up over the first 500 batches of training (learning rate = <initial learning rate> / (500 - i), where i is the batch number).
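The warm-up rule and clipping threshold in the footnote can be illustrated with a minimal Python sketch. This is not the authors' training code. The function names are hypothetical, the batch number i is assumed to be 0-based, and the rate is assumed to stay at its initial value once warm-up ends (any later schedule is not described here). The clipping helper rescales a flat list of gradient values by their global L2 norm, which is one common reading of "max l2 norm = 1.0".

```python
import math

WARMUP_BATCHES = 500  # from the footnote: warm-up covers the first 500 batches
MAX_L2_NORM = 1.0     # from the footnote: gradient clipping threshold


def warmup_learning_rate(initial_lr, batch_index, warmup_batches=WARMUP_BATCHES):
    """Return the learning rate for one batch (batch_index assumed 0-based).

    During warm-up the footnote's formula is applied:
        lr = initial_lr / (warmup_batches - batch_index)
    Afterwards the initial rate is returned unchanged (an assumption).
    """
    if batch_index < warmup_batches:
        return initial_lr / (warmup_batches - batch_index)
    return initial_lr


def clip_by_global_l2_norm(gradients, max_norm=MAX_L2_NORM):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in gradients))
    if total_norm <= max_norm:
        return list(gradients)  # already within the limit, return a copy
    scale = max_norm / total_norm
    return [g * scale for g in gradients]


if __name__ == "__main__":
    # Initial learning rate 0.01, as listed for the CochCNN9 models in the table.
    for i in (0, 250, 499, 500):
        print(i, warmup_learning_rate(0.01, i))
    print(clip_by_global_l2_norm([3.0, 4.0]))
```

Under these assumptions the rate starts at 1/500 of the initial value on the first batch and reaches the full initial value on batch 499, so most of the increase happens in the last few warm-up batches.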