Table 5: Transfer learning results, 91 hours (DNN: hybrid DNN-HMM with offline i-vectors; AV: acoustic variability modeling; PV: pronunciation variability modeling).
| Model | AV | PV | Configuration | WER |
|---|---|---|---|---|
| DNN Children | ✗ | ✗ | Baseline | 25.53% |
| DNN Adult | ✗ | ✗ | Baseline | 39.32% |
| DNN Children + Adult | ✗ | ✗ | - | 20.35% |
| DNN TL | ✗ | ✓ | 1 layer | 26.97% |
| DNN TL | ✓ | ✗ | 1 layer | 24.26% |
| DNN TL | ✓ | ✓ | 1 layer each | 19.63% |
| DNN TL | ✓ | ✓ | disjoint 1 layer each | 20.01% |
| DNN TL | ✓ | ✓ | 2 layers each | 17.80% |
| DNN TL | ✓ | ✓ | disjoint 2 layers each | 18.74% |
| DNN TL | - | - | all layers | 17.80% |
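To make the size of the gains easier to compare, the absolute WERs in Table 5 can be turned into relative WER reductions using the usual formula (WER_base − WER_sys) / WER_base. The sketch below uses only values taken from the table. The row keys and the helper `relative_wer_reduction` are hypothetical names introduced here for illustration, and the choice of baselines is an assumption, not part of the original analysis.

```python
# Relative WER reduction computed from values reported in Table 5.
# Row keys are hypothetical shorthand introduced for this sketch.
WER = {
    "children_baseline": 25.53,
    "adult_baseline": 39.32,
    "children_plus_adult": 20.35,
    "tl_av_pv_2_layers_each": 17.80,
}


def relative_wer_reduction(baseline: float, system: float) -> float:
    """Return the relative WER reduction of `system` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline - system) / baseline


if __name__ == "__main__":
    base = WER["children_baseline"]
    for name in ("children_plus_adult", "tl_av_pv_2_layers_each"):
        rel = relative_wer_reduction(base, WER[name])
        print(f"{name}: {rel:.2f}% relative reduction vs. children baseline")
```

Under this reading, 2-layer AV+PV transfer learning (17.80%) corresponds to roughly a 30% relative reduction over the children-only baseline (25.53%) and roughly 12.5% over pooled children + adult training (20.35%).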