Table III.
Summary of the significant state-of-the-art DNN speech recognition models
| Architecture | Dataset | Error rate |
|---|---|---|
| RNN [126] - FIT, Czech Republic, Johns Hopkins University, 2011 | Penn Corpus (natural language modeling) | 123* |
| Autoencoder/DBN [127] - Collaboration, 2012 | English Broadcast News Speech Corpora (spoken word recognition) | 15.5%** |
| LSTM [129] - Google, 2014 | Google Voice Search Task (spoken word recognition) | 10.7%** |
| Deep LSTM [130] - National Chiao Tung University, 2016 | CHiME 3 Challenge (spoken word recognition) | 8.1%** |
| CNN-BLSTM [131] - Microsoft, 2017 | Switchboard (spoken word recognition) | 5.1% |
| Attention (LAS) & LSTM [132] - Google, 2018 | In-house Google dictation (spoken word recognition) | 4.1% |
| Attention & LSTM with pretraining [133] - Collaboration, 2018 | LibriSpeech (spoken word recognition) | 3.54% |
(*perplexity: size of model needed for optimal next-word prediction with 10K classes; **word error rate)
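
The two metrics named in the footnote can be illustrated with a minimal sketch. The table does not spell out how either was computed, so the code below assumes the conventional definitions: word error rate as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and perplexity as the exponential of the average negative log-probability assigned to each word. The function names `word_error_rate` and `perplexity` and the example sentences are hypothetical and do not come from the cited papers.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def perplexity(word_probabilities: list[float]) -> float:
    """exp of the mean negative log-probability of the observed words."""
    if not word_probabilities:
        raise ValueError("need at least one probability")
    mean_neg_log = -sum(math.log(p) for p in word_probabilities) / len(word_probabilities)
    return math.exp(mean_neg_log)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # A model that spreads probability uniformly over 10K words.
    print(perplexity([1 / 10_000] * 5))
```

Under these assumed definitions, a model that guesses uniformly over a 10K-word vocabulary has a perplexity of about 10,000, so the value of 123 in the first row corresponds to far less uncertainty per word than a uniform guess.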