Author manuscript; available in PMC: 2021 Dec 5.
Published in final edited form as: Neurocomputing (Amst). 2020 Jul 26;417:302–321. doi: 10.1016/j.neucom.2020.07.053

Table III.

Summary of the significant state-of-the-art DNN speech recognition models

Architecture | Dataset | Error rate
RNN [126] - FIT, Czech Republic, Johns Hopkins University, 2011 | Penn Corpus (natural language modeling) | 123*
Autoencoder/DBN [127] - Collaboration, 2012 | English Broadcast News Speech Corpora (spoken word recognition) | 15.5%**
LSTM [129] - Google, 2014 | Google Voice Search Task (spoken word recognition) | 10.7%**
Deep LSTM [130] - National Chiao Tung University, 2016 | CHiME 3 Challenge (spoken word recognition) | 8.1%**
CNN-BLSTM [131] - Microsoft, 2017 | Switchboard (spoken word recognition) | 5.1%
Attention (LAS) & LSTM [132] - Google, 2018 | In-house Google dictation (spoken word recognition) | 4.1%
Attention & LSTM with pretraining [133] - Collaboration, 2018 | LibriSpeech (spoken word recognition) | 3.54%

(*perplexity: size of model needed for optimal next word prediction with 10K classes; **word error rate)
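For readers unfamiliar with the word error rate reported in the table, the following is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the cited works may use different text normalization and scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, this ratio can exceed 1 when the hypothesis is much longer than the reference.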