Author manuscript; available in PMC: 2021 Mar 18.

Published in final edited form as: IEEE/ACM Trans Audio Speech Lang Process. 2019 Aug 12;27(11):1839–1848. doi: 10.1109/taslp.2019.2934319

TABLE I.

Calculation of the estimated target spectrogram in different two-stage networks. $G^{(1)}(\cdot)$ and $G^{(2)}(\cdot)$ denote the first- and second-stage DNNs. $μ_{o_{1}}$ and $σ_{o_{1}}$ denote the normalization parameters for the output of the first stage, and $μ_{o_{2}}$ and $σ_{o_{2}}$ those for the output of the second stage.

Combination	DNN formula
Mapping+Mapping	$\| \hat{S}_{1} \| = \exp (G^{(2)} ([\frac{G^{(1)} (\bar{F} (m)) - μ_{o_{1}}}{σ_{o_{1}}}, \bar{F} (m)]) \times σ_{o_{2}} + μ_{o_{2}})$
Mapping+Masking	$\| \hat{S}_{1} \| = G^{(2)} ([\frac{G^{(1)} (\bar{F} (m)) - μ_{o_{1}}}{σ_{o_{1}}}, \bar{F} (m)]) \times \| Y \|$
Masking+Mapping	$\| \hat{S}_{1} \| = \exp (G^{(2)} ([\frac{\log (G^{(1)} (\bar{F} (m)) \times \| Y \|) - μ_{o_{1}}}{σ_{o_{1}}}, \bar{F} (m)]) \times σ_{o_{2}} + μ_{o_{2}})$
Masking+Masking	$\| \hat{S}_{1} \| = G^{(2)} ([\frac{\log (G^{(1)} (\bar{F} (m)) \times \| Y \|) - μ_{o_{1}}}{σ_{o_{1}}}, \bar{F} (m)]) \times \| Y \|$
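
The four formulas in Table I share one structure, which the following minimal NumPy sketch makes explicit for a single frame. It is an illustration under stated assumptions, not the authors' implementation: the function name `two_stage_estimate`, the callables `g1` and `g2` (standing in for the trained first- and second-stage DNNs), and the argument names are all hypothetical, and the sketch assumes that $\bar{F}(m)$ is a one-dimensional feature vector and that $\| Y \|$ is the noisy magnitude spectrum of the same frame.

```python
import numpy as np


def two_stage_estimate(first, second, g1, g2, feat, mag_y,
                       mu1, sigma1, mu2=0.0, sigma2=1.0):
    """Estimate the target magnitude for one frame, following Table I.

    first, second: "mapping" or "masking" (type of each stage).
    g1, g2: hypothetical callables standing in for the two DNNs.
    feat: input feature vector for the frame (assumed 1-D).
    mag_y: noisy magnitude spectrum for the frame (assumed 1-D).
    mu1, sigma1: normalization parameters of the first-stage output.
    mu2, sigma2: normalization parameters of the second-stage output
                 (only used when the second stage is a mapping).
    """
    out1 = g1(feat)
    if first == "masking":
        # A first-stage mask is applied to |Y| and taken to the log domain.
        out1 = np.log(out1 * mag_y)

    # Normalize the first-stage output and append the original features.
    stage2_in = np.concatenate([(out1 - mu1) / sigma1, feat])
    out2 = g2(stage2_in)

    if second == "mapping":
        # Undo the output normalization, then leave the log domain.
        return np.exp(out2 * sigma2 + mu2)
    # A second-stage mask is applied to the noisy magnitude.
    return out2 * mag_y
```

Writing the table this way shows that the first-stage type only changes how the second-stage input is formed, while the second-stage type only changes how the final output is converted back to a magnitude.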