2017 Jun 8;141(6):4230–4239. doi: 10.1121/1.4984271

FIG. 2.

(Color online) Diagram of the proposed deep neural network (DNN)-based speech separation framework. A two-talker mixture first undergoes feature extraction. A DNN is trained on these features to estimate the ideal ratio mask (IRM) for the male target talker as well as the IRM for the female interfering talker. The estimated IRM for the target talker is multiplied pointwise with the magnitude spectrogram of the two-talker mixture, which yields the estimated magnitude spectrogram of the target speech. Finally, an overlap-add method is used to resynthesize the target speech signal from the estimated magnitude spectrogram and the mixture phase.
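The masking and resynthesis stages described in the caption can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the frame length, hop size, and Hann window are assumptions, the function names (`stft`, `overlap_add`, `ratio_mask`, `separate_target`) are hypothetical, and `ratio_mask` computes a mask directly from known source signals using one common square-root energy-ratio definition as a stand-in for the DNN's estimate, which the caption does not specify.

```python
import numpy as np

FRAME_LEN = 512   # assumed frame length in samples
HOP = 256         # assumed hop size (50% overlap)


def _window():
    # Periodic Hann window; at 50% overlap its shifted copies sum to 1.
    return np.hanning(FRAME_LEN + 1)[:-1]


def stft(x):
    """Windowed short-time Fourier transform, shape (n_frames, n_bins)."""
    win = _window()
    n_frames = 1 + (len(x) - FRAME_LEN) // HOP
    frames = np.stack(
        [x[i * HOP: i * HOP + FRAME_LEN] * win for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def overlap_add(spec):
    """Resynthesize a waveform from a complex spectrogram by overlap-add."""
    win = _window()
    frames = np.fft.irfft(spec, n=FRAME_LEN, axis=1)
    n_frames = frames.shape[0]
    out = np.zeros((n_frames - 1) * HOP + FRAME_LEN)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        seg = slice(i * HOP, i * HOP + FRAME_LEN)
        out[seg] += frames[i]
        norm[seg] += win
    # Divide by the summed window so overlapping frames are not double-counted.
    return out / np.maximum(norm, 1e-8)


def ratio_mask(target, interferer, eps=1e-12):
    """Assumed mask definition: sqrt of target energy over total energy.

    Stand-in for the DNN output; it needs the clean sources, which a real
    system does not have at test time.
    """
    s = np.abs(stft(target)) ** 2
    n = np.abs(stft(interferer)) ** 2
    return np.sqrt(s / (s + n + eps))


def separate_target(mixture, target_mask):
    """Apply a mask to the mixture magnitude and reuse the mixture phase."""
    mix_spec = stft(mixture)
    est_mag = target_mask * np.abs(mix_spec)             # pointwise multiply
    est_spec = est_mag * np.exp(1j * np.angle(mix_spec)) # mixture phase
    return overlap_add(est_spec)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(8192) / fs
    target = np.sin(2 * np.pi * 440 * t)       # toy "target talker"
    interferer = np.sin(2 * np.pi * 2000 * t)  # toy "interfering talker"
    mixture = target + interferer
    estimate = separate_target(mixture, ratio_mask(target, interferer))
    print("mixture error :", np.mean((mixture - target) ** 2))
    print("estimate error:", np.mean((estimate - target) ** 2))
```

In a full system the array passed as `target_mask` would come from the trained network rather than from `ratio_mask`. Only the magnitude is modified, so any phase distortion caused by the interfering talker carries over into the resynthesized signal.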