2022 Jul 19;22(14):5381. doi: 10.3390/s22145381
Algorithm 1: Two-Step Joint Optimization Procedure with Auxiliary ASR Loss.
Require: $m$, batch size; $n$, number of iterations; $\alpha_1$, auxiliary ASR loss weight; $\alpha_2$, loss weight ratio between speech enhancement and ASR; $\theta_{SE}$, learnable speech enhancement parameters; $\theta_{ASR}$, learnable ASR parameters.
1: for $1, \ldots, n$ do
2:    Mini-batch of $m$ clean speech frames $\{s^{(1)}, \ldots, s^{(m)}\}$ and corresponding acoustic features $\{a^{(1)}, \ldots, a^{(m)}\}$
3:    Mini-batch of $m$ enhanced speech frames $\{\hat{s}^{(1)}, \ldots, \hat{s}^{(m)}\}$ and corresponding acoustic features $\{\hat{a}^{(1)}, \ldots, \hat{a}^{(m)}\}$
4:    Mini-batch of $m$ linguistic units $\{y^{(1)}, \ldots, y^{(m)}\}$
5:    First-Step Processing: Update the speech enhancement parameters using SI-SNR and auxiliary ASR loss
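      $\nabla_{\theta_{SE}} \frac{1}{m} \sum_{i=1}^{m} \left( \alpha_1 \cdot \mathcal{L}_{\text{SI-SNR}}(s^{(i)}, \hat{s}^{(i)}) + (1 - \alpha_1) \cdot \mathcal{L}_{\text{aux}}(a^{(i)}, \hat{a}^{(i)}) \right)$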
6:    Second-Step Processing: Update both speech enhancement and ASR parameters using SI-SNR and ASR loss
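      $\nabla_{\theta_{SE}, \theta_{ASR}} \frac{1}{m} \sum_{i=1}^{m} \left( \alpha_2 \cdot \mathcal{L}_{\text{SI-SNR}}(s^{(i)}, \hat{s}^{(i)}) + (1 - \alpha_2) \cdot \mathcal{L}_{\text{ASR}}(a^{(i)}, y^{(i)}) \right)$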
7: end for
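
The two-step loop of Algorithm 1 can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the enhancer (`enhance`), the feature extractor (`features`), the logistic-regression "recognizer" (`asr_loss`), and the use of mean squared error as the auxiliary loss are all hypothetical stand-ins, and central-difference gradients (`num_grad`) replace backpropagation. The SI-SNR loss is taken as the negative SI-SNR in dB, which is an assumed sign convention. In the second step the sketch feeds the features of the enhanced frame to the ASR loss so that its gradient can reach $\theta_{SE}$; the listing itself writes $a^{(i)}$ there, so this is an interpretation.

```python
import math

N = 32  # samples per toy frame

def tone(freq, amp=1.0):
    """One frame of a sinusoid with an integer number of cycles."""
    return [amp * math.sin(2 * math.pi * freq * t / N) for t in range(N)]

def si_snr_loss(clean, est, eps=1e-8):
    """Negative SI-SNR in dB, so lower is better (assumed sign convention)."""
    mc, me = sum(clean) / len(clean), sum(est) / len(est)
    s = [v - mc for v in clean]
    e = [v - me for v in est]
    scale = sum(a * b for a, b in zip(s, e)) / (sum(a * a for a in s) + eps)
    target = [scale * a for a in s]             # projection of est onto clean
    noise = [b - t for b, t in zip(e, target)]  # everything that is not target
    ratio = (sum(t * t for t in target) + eps) / (sum(r * r for r in noise) + eps)
    return -10.0 * math.log10(ratio)

def features(frame, bands=4):
    """Hypothetical acoustic features: log energy of each quarter of the frame."""
    n = len(frame) // bands
    return [math.log(sum(v * v for v in frame[k * n:(k + 1) * n]) + 1e-6)
            for k in range(bands)]

def mse(a, b):
    """Placeholder auxiliary loss between clean and enhanced features."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def enhance(theta_se, noisy, interference):
    """Toy enhancer: subtract a learned multiple of a known interference."""
    return [x - theta_se[0] * d for x, d in zip(noisy, interference)]

def asr_loss(theta_asr, feats, label):
    """Toy recognizer: logistic regression with cross-entropy on a 0/1 label."""
    z = theta_asr[-1] + sum(w * f for w, f in zip(theta_asr, feats))
    p = min(max(0.5 * (1.0 + math.tanh(0.5 * z)), 1e-7), 1.0 - 1e-7)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def step1_loss(theta_se, batch, alpha1):
    """Step 5 objective: alpha1 * SI-SNR loss + (1 - alpha1) * auxiliary loss."""
    total = 0.0
    for clean, noisy, interference, _ in batch:
        est = enhance(theta_se, noisy, interference)
        total += (alpha1 * si_snr_loss(clean, est)
                  + (1 - alpha1) * mse(features(clean), features(est)))
    return total / len(batch)

def step2_loss(theta_se, theta_asr, batch, alpha2):
    """Step 6 objective: alpha2 * SI-SNR loss + (1 - alpha2) * ASR loss."""
    total = 0.0
    for clean, noisy, interference, label in batch:
        est = enhance(theta_se, noisy, interference)
        total += (alpha2 * si_snr_loss(clean, est)
                  + (1 - alpha2) * asr_loss(theta_asr, features(est), label))
    return total / len(batch)

def num_grad(f, params, h=1e-5):
    """Central-difference gradient; stands in for backpropagation."""
    grad = []
    for j in range(len(params)):
        up = params[:j] + [params[j] + h] + params[j + 1:]
        dn = params[:j] + [params[j] - h] + params[j + 1:]
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad

def sgd(params, grad, lr):
    return [p - lr * g for p, g in zip(params, grad)]

def train(batches, n, alpha1=0.5, alpha2=0.5, lr=0.02):
    theta_se, theta_asr = [0.0], [0.0] * 5  # 4 feature weights plus a bias
    for it in range(n):
        batch = batches[it % len(batches)]
        # First step: only the enhancement parameters move.
        g = num_grad(lambda p: step1_loss(p, batch, alpha1), theta_se)
        theta_se = sgd(theta_se, g, lr)
        # Second step: enhancement and ASR parameters move together.
        k = len(theta_se)
        joint = theta_se + theta_asr
        g = num_grad(lambda p: step2_loss(p[:k], p[k:], batch, alpha2), joint)
        joint = sgd(joint, g, lr)
        theta_se, theta_asr = joint[:k], joint[k:]
    return theta_se, theta_asr

def make_batch():
    """Four synthetic frames: loud tones are labeled 1, quiet tones 0."""
    interference, residual = tone(7), tone(11, 0.5)
    batch = []
    for freq, amp, label in [(3, 1.0, 1), (3, 0.5, 0), (2, 1.0, 1), (2, 0.5, 0)]:
        clean = tone(freq, amp)
        noisy = [s + 0.8 * d + r for s, d, r in zip(clean, interference, residual)]
        batch.append((clean, noisy, interference, label))
    return batch

batch = make_batch()
theta_se, theta_asr = train([batch], n=40)
print("theta_SE:", theta_se, "step-1 loss:", step1_loss(theta_se, batch, 0.5))
```

In this toy setup the noisy frame contains the interference scaled by 0.8, so the single enhancement parameter should drift toward that value as the SI-SNR term is minimized. In a real system both models would be neural networks and both updates would be taken by an optimizer over backpropagated gradients; only the alternation of the two objectives is meant to carry over.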