2020 Dec 3;22(12):1367. doi: 10.3390/e22121367

Table 1.

Existing lipreading methods using deep learning. Traditional lipreading has two key components: lip feature extraction (front-end) and feature recognition (back-end). There are also end-to-end systems that use deep learning to obtain state-of-the-art performance. For each system, we report the main database, the recognition task tested, and the best reported recognition rate.

| Year | Reference | Front-End Method | Back-End Method | Database | Recognition Task | Rec. Rate (%) |
|------|-----------|------------------|-----------------|----------|------------------|---------------|
| 2016 | Assael et al. [15] | 3D-CNN | Bi-GRU | GRID | Sentences | 95.20 |
| 2016 | Chung and Zisserman [22] | VGG-M | LSTM | OuluVS2 | Phrases | 31.90 |
| | | SyncNet | LSTM | OuluVS2 | Phrases | 94.10 |
| 2016 | Chung and Zisserman [23] | CNN | | LRW | Words | 61.10 |
| | | CNN | | OuluVS | Phrases | 91.40 |
| | | CNN | | OuluVS2 | Phrases | 93.20 |
| 2016 | Wand et al. [19] | Eigenlips | SVM | GRID | Phrases | 69.50 |
| | | HOG | SVM | GRID | Phrases | 71.20 |
| | | Feed-forward | LSTM | GRID | Phrases | 79.50 |
| 2017 | Chung and Zisserman [24] | CNN | LSTM + attention | OuluVS2 | Phrases | 91.10 |
| | | CNN | LSTM + attention | MV-LRS | Sentences | 43.60 |
| 2017 | Chung et al. [16] | CNN | LSTM + attention | LRW | Words | 76.20 |
| | | CNN | LSTM + attention | GRID | Phrases | 97.00 |
| | | CNN | LSTM + attention | LRS | Sentences | 49.80 |
| 2017 | Petridis et al. [25] | Autoencoder | Bi-LSTM | OuluVS2 | Phrases | 94.70 |
| 2017 | Stafylakis and Tzimiropoulos [26] | 3D-CNN + ResNet | Bi-LSTM | LRW | Words | 83.00 |
| 2018 | Fung and Mak [9] | 3D-CNN | Bi-LSTM | OuluVS2 | Phrases | 87.60 |
| 2018 | Petridis et al. [10] | 3D-CNN + ResNet | Bi-GRU | LRW | Words | 82.00 |
| 2018 | Wand et al. [20] | Feed-forward | LSTM | GRID | Phrases | 84.70 |
| 2018 | Xu et al. [21] | 3D-CNN + highway | Bi-GRU + attention | GRID | Phrases | 97.10 |
| 2019 | Weng [27] | Two-stream 3D-CNN | Bi-GRU | LRW | Words | 82.07 |