Table III.
ESTOI, PESQ, and ΔSDR scores for speaker separation in anechoic conditions. Single-stage networks are trained with anechoic data. Boldface highlights the best result in each condition.
| Network | Method | ESTOI (%), TIR −12 dB | ESTOI (%), TIR −6 dB | ESTOI (%), Average | PESQ, TIR −12 dB | PESQ, TIR −6 dB | PESQ, Average | ΔSDR (dB), TIR −12 dB | ΔSDR (dB), TIR −6 dB | ΔSDR (dB), Average |
|---|---|---|---|---|---|---|---|---|---|---|
| Unprocessed | | 24.6 | 36.4 | 30.5 | 1.35 | 1.58 | 1.46 | | | |
| DFN | Mapping | 62.4 | 73.6 | 68.0 | 2.35 | 2.65 | 2.50 | 14.15 | 10.77 | 12.46 |
| DFN | Masking | 63.7 | 76.2 | 69.9 | 2.40 | 2.75 | 2.57 | 15.83 | 12.94 | 14.28 |
| LSTM | Mapping | 67.9 | 77.3 | 72.6 | 2.47 | 2.76 | 2.61 | 14.60 | 11.14 | 12.87 |
| LSTM | Masking | 69.1 | 79.7 | 74.4 | 2.52 | 2.86 | 2.69 | 16.26 | 13.32 | 14.79 |
| BLSTM | Mapping | 71.6 | 79.9 | 75.7 | 2.61 | 2.87 | 2.74 | 15.24 | 11.70 | 13.47 |
| BLSTM | Masking | **72.0** | **81.5** | **76.7** | **2.62** | **2.94** | **2.78** | **16.77** | **13.58** | **15.17** |