2022 May 12;22(10):3683. doi: 10.3390/s22103683

Figure 5.

A network combining connectionist temporal classification (CTC) and attention-based voice recognition. A shared encoder transforms the input x into the encoder vector h. The CTC model and the attention model both take h as input and are trained jointly on this common encoder. The probability distribution over the output y is generated from both the CTC and the attention branches.
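
The caption does not say how the CTC and attention distributions are combined. One common choice in hybrid CTC/attention systems is a weighted sum of the two log-probabilities for each candidate output y. The sketch below illustrates that idea only; the function names, the `ctc_weight` parameter and its default of 0.3, and the scores are all hypothetical and are not taken from the figure or the article.

```python
import math


def joint_log_prob(log_p_ctc: float, log_p_att: float, ctc_weight: float = 0.3) -> float:
    """Interpolate the CTC and attention log-probabilities of one hypothesis.

    ctc_weight is an assumed tuning parameter in [0, 1]; the caption does not
    specify a value or even that interpolation is used.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * log_p_ctc + (1.0 - ctc_weight) * log_p_att


def pick_best(hypotheses: dict, ctc_weight: float = 0.3) -> str:
    """Return the hypothesis text with the highest interpolated score.

    hypotheses maps each candidate output y to (log_p_ctc, log_p_att).
    """
    return max(
        hypotheses,
        key=lambda y: joint_log_prob(*hypotheses[y], ctc_weight),
    )


# Made-up branch scores for two candidate transcriptions (illustration only).
candidates = {
    "hello world": (math.log(0.5), math.log(0.3)),
    "hollow world": (math.log(0.2), math.log(0.4)),
}

best = pick_best(candidates)
print(best)
```

With `ctc_weight=0.0` only the attention score is used and with `ctc_weight=1.0` only the CTC score, so the weight controls how much each branch influences the final choice.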