2022 May 12;22(10):3683. doi: 10.3390/s22103683

Table 8.

CER (%) and WER (%) results of different E2E ASR models trained on the Uzbek language speech corpus. The impact of the language model (LM), speed perturbation (SP), and spectral augmentation (SA) is also reported.

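| Model | LM | SP | SA | Valid CER | Valid WER | Test CER | Test WER |
|---|---|---|---|---|---|---|---|
| E2E-LSTM | × | × | × | 13.8 | 43.1 | 14.0 | 44.0 |
| | ✓ | × | × | 14.9 | 30.0 | 14.3 | 31.4 |
| | ✓ | ✓ | × | 13.7 | 27.6 | 14.4 | 30.6 |
| | ✓ | ✓ | ✓ | 12.6 | 24.9 | 12.0 | 27.0 |
| DNN-HMM | × | × | × | 12.8 | 34.7 | 10.2 | 32.1 |
| | ✓ | × | × | 10.3 | 20.5 | 8.6 | 24.9 |
| | ✓ | ✓ | × | 6.9 | 18.8 | 7.5 | 23.5 |
| | ✓ | ✓ | ✓ | 6.9 | 19.9 | 8.1 | 24.9 |
| RNN-CTC | × | × | × | 13.3 | 35.8 | 9.7 | 32.3 |
| | ✓ | × | × | 12.2 | 27.2 | 9.1 | 24.3 |
| | ✓ | ✓ | × | 10.9 | 25.1 | 8.7 | 23.9 |
| | ✓ | ✓ | ✓ | 8.3 | 24.7 | 7.9 | 22.3 |
| E2E-Transformer | × | × | × | 12.3 | 35.2 | 9.4 | 31.6 |
| | ✓ | × | × | 11.7 | 25.7 | 8.7 | 23.9 |
| | ✓ | ✓ | × | 10.7 | 23.9 | 8.4 | 23.0 |
| | ✓ | ✓ | ✓ | 9.9 | 21.4 | 7.6 | 21.0 |
| E2E-Conformer | × | × | × | 12.7 | 37.6 | 10.7 | 35.1 |
| | ✓ | × | × | 11.5 | 27.5 | 9.7 | 26.3 |
| | ✓ | ✓ | × | 9.2 | 21.7 | 7.5 | 21.2 |
| | ✓ | ✓ | ✓ | 7.8 | 18.1 | 5.8 | 17.4 |
| E2E-T (CTC + Attention) | × | × | × | 12.1 | 33.2 | 9.8 | 30.3 |
| | ✓ | × | × | 9.6 | 19.4 | 7.9 | 22.7 |
| | ✓ | ✓ | × | 6.4 | 17.9 | 7.4 | 20.3 |
| | ✓ | ✓ | ✓ | 5.7 | 15.2 | 5.41 | 14.3 |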
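
For readers unfamiliar with the two metrics in Table 8, the following is a minimal sketch of how CER and WER are conventionally computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, taken over characters for CER and over whitespace-separated words for WER. This section does not state which scoring tool the authors used, so the function names (`edit_distance`, `error_rate`, `wer`, `cer`), the choice to count spaces as characters, and the example sentence are all assumptions made for illustration; the sentence is not taken from the corpus.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference unit
                curr[j - 1] + 1,     # insertion of a hypothesis unit
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance as a percentage of the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate (%), splitting on whitespace."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate (%); spaces count as characters (an assumption)."""
    return error_rate(list(reference), list(hypothesis))


if __name__ == "__main__":
    # Illustrative pair only: the hypothesis drops one apostrophe.
    ref = "men kitob o'qiyman"
    hyp = "men kitob oqiyman"
    print(f"WER = {wer(ref, hyp):.2f}%")  # 1 wrong word out of 3
    print(f"CER = {cer(ref, hyp):.2f}%")  # 1 wrong character out of 18
```

A single missing character makes one of three words wrong but only one of eighteen characters wrong, so WER for the same output is much higher than CER. This is consistent with the gap between the CER and WER columns in the table.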