Author manuscript; available in PMC: 2021 Sep 1.
Published in final edited form as: Comput Speech Lang. 2020 Feb 18;63:101077. doi: 10.1016/j.csl.2020.101077

Table 2: Baseline results of ASR trained only on children’s speech (91 hours).

Model                                            WER

GMM-HMM Monophone                                54.53%
GMM-HMM Triphone                                 36.96%
GMM-HMM Triphone LDA+MLLT                        32.79%
GMM-HMM Triphone LDA+MLLT+SAT                    24.55%
GMM-HMM Triphone LDA+MLLT+SAT + VTLN             25.66%

Hybrid DNN-HMM                                   35.97%
Hybrid DNN-HMM + VTLN                            32.72%
Hybrid DNN-HMM + LDA+MLLT+SAT                    21.31%
Hybrid DNN-HMM + LDA+MLLT+SAT + VTLN             21.82%
Hybrid DNN-HMM + online i-vector (speaker)       28.03%
Hybrid DNN-HMM + online i-vector (utterance)     26.59%
Hybrid DNN-HMM + offline i-vector (utterance)    25.53%
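
For readers comparing the rows above, the following is a minimal sketch of how a word error rate (WER) is conventionally computed and how two WER values can be compared. It assumes the standard definition, (substitutions + deletions + insertions) divided by the number of reference words; the scoring tool actually used for this table is not stated here. The function names `word_error_rate` and `relative_reduction` are hypothetical and chosen for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER under the standard definition (an assumption, see lead-in):
    word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy sentences, not taken from the corpus.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Two rows from the table: GMM-HMM LDA+MLLT+SAT vs. the hybrid
    # DNN-HMM trained on the same features.
    print(relative_reduction(24.55, 21.31))
```

The relative reduction is often more informative than the absolute difference when the systems being compared start from very different baselines.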