2020 Dec 3;22(12):1367. doi: 10.3390/e22121367

Table 1.

Existing lipreading methods using deep learning. Traditional lipreading has two key components: lip feature extraction (front-end) and feature recognition (back-end). There are also end-to-end systems that use deep learning to obtain state-of-the-art performance. For each system, we report the main database, the recognition task tested, and the best reported recognition rate.

Year	Reference	Method: Front-End	Method: Back-End	Database	Recognition Task	Rec. Rate (%)
2016	Assael et al. [15]	3D-CNN	Bi-GRU	GRID	Sentences	95.20
2016	Chung and Zisserman [22]	VGG-M	LSTM	OuluVS2	Phrases	31.90
		SyncNet	LSTM	OuluVS2	Phrases	94.10
2016	Chung and Zisserman [23]	CNN		LRW	Words	61.10
		CNN		OuluVS	Phrases	91.40
		CNN		OuluVS2	Phrases	93.20
2016	Wand et al. [19]	Eigenlips	SVM	GRID	Phrases	69.50
		HOG	SVM	GRID	Phrases	71.20
		Feed-forward	LSTM	GRID	Phrases	79.50
2017	Chung and Zisserman [24]	CNN	LSTM + attention	OuluVS2	Phrases	91.10
		CNN	LSTM + attention	MV-LRS	Sentences	43.60
2017	Chung et al. [16]	CNN	LSTM + attention	LRW	Words	76.20
		CNN	LSTM + attention	GRID	Phrases	97.00
		CNN	LSTM + attention	LRS	Sentences	49.80
2017	Petridis et al. [25]	Autoencoder	Bi-LSTM	OuluVS2	Phrases	94.70
2017	Stafylakis and Tzimiropoulos [26]	3D-CNN + ResNet	Bi-LSTM	LRW	Words	83.00
2018	Fung and Mak [9]	3D-CNN	Bi-LSTM	OuluVS2	Phrases	87.60
2018	Petridis et al. [10]	3D-CNN + ResNet	Bi-GRU	LRW	Words	82.00
2018	Wand et al. [20]	Feed-forward	LSTM	GRID	Phrases	84.70
2018	Xu et al. [21]	3D-CNN + highway	Bi-GRU + attention	GRID	Phrases	97.10
2019	Weng [27]	Two-Stream 3D-CNN	Bi-GRU	LRW	Words	82.07