Table 1. Comparison of speech separation accuracy of ODAN with other methods for separating two-speaker mixtures (WSJ0-mix2 dataset) and three-speaker mixtures (WSJ0-mix3 dataset).
| Number of Speakers | Model | Causal | SI-SNRi (dB) | SDRi (dB) | PESQ | ESTOI |
| --- | --- | --- | --- | --- | --- | --- |
| Two speakers | Original mixture | – | 0 | 0 | 2.02 | 0.56 |
| | DAN-LSTM (11) | No | 9.1 | 9.5 | 2.73 | 0.77 |
| | uPIT-LSTM (15) | Yes | – | 7.0 | – | – |
| | ODAN | Yes | 9.0 | 9.4 | 2.70 | 0.77 |
| Three speakers | Original mixture | – | 0 | 0 | 1.66 | 0.39 |
| | DAN-LSTM (11) | No | 7.0 | 7.4 | 2.13 | 0.56 |
| | uPIT-BLSTM (15) | No | – | 7.4 | – | – |
| | DPCL++ (50) | No | 7.1 | – | – | – |
| | ODAN | Yes | 6.7 | 7.2 | 2.03 | 0.55 |
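
The SI-SNRi and SDRi columns report improvement over the unprocessed mixture, which is why the "Original mixture" rows read 0 for both. As an illustration, the following is a minimal sketch of how an SI-SNR improvement can be computed, assuming the commonly used definition of scale-invariant SNR (zero-mean signals, with the estimate projected onto the clean reference). The function names `si_snr` and `si_snr_improvement` and the small `eps` guard are assumptions made for this sketch; the evaluation code behind the table is not given here.

```python
import math


def si_snr(estimate, reference, eps=1e-12):
    """Scale-invariant SNR (dB) of an estimate against a clean reference.

    Both arguments are equal-length sequences of samples. `eps` is an
    illustrative guard against division by zero, not part of the definition.
    """
    n = len(reference)
    # Remove the mean so a DC offset does not affect the measure.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference: this is the "target" part.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    # Whatever is left after the projection is treated as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: SI-SNR of the separated signal minus SI-SNR of the mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

Passing the mixture itself as the estimate gives an improvement of exactly 0 dB, matching the "Original mixture" rows. Rescaling the estimate leaves the score essentially unchanged, which is the point of the scale-invariant variant.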