Skip to main content
. 2022 May 12;22(10):3683. doi: 10.3390/s22103683

Figure 3.

Figure 3

ASR architecture based on E2E transformers. During training, the encoder is fed a sequence of feature frames. In addition, the encoder output is passed to the decoder for the masked target transcription. The decoder provides a prediction of the masked transcription.