Table 2:
Performance of ASR models on the ADReSS and WLS test sets, reported as word error rate (WER) with character error rate (CER) in parentheses. The best performance for each ASR model and generation method pair is in bold.
| Model | ADReSS test set, WER (CER) | WLS test set, WER (CER) |
|---|---|---|
| Pre-trained, best-path decoding | | |
| wav2vec2-base-960h | 0.559 (0.357) | 0.541 (0.318) |
| wav2vec2-large-960h | 0.493 (0.292) | 0.471 (0.269) |
| wav2vec2-large-960h-lv60 | 0.443 (0.252) | 0.412 (0.240) |
| wav2vec2-large-960h-lv60-self | 0.422 (0.258) | 0.390 (0.235) |
| hubert-large-ls960-ft | 0.415 (0.228) | 0.322 (0.210) |
| Domain-adapted, best-path decoding | | |
| wav2vec2-base-960h | 0.438 (0.299) | 0.366 (0.227) |
| wav2vec2-large-960h | 0.427 (0.266) | 0.358 (0.210) |
| wav2vec2-large-960h-lv60 | 0.364 (0.254) | 0.277 (0.176) |
| wav2vec2-large-960h-lv60-self | 0.354 (0.234) | 0.264 (0.159) |
| hubert-large-ls960-ft | 0.332 (0.210) | 0.306 (0.153) |
| Pre-trained, beam search decoding | | |
| wav2vec2-base-960h | 0.469 (0.335) | 0.435 (0.293) |
| wav2vec2-large-960h | 0.403 (0.267) | 0.380 (0.243) |
| wav2vec2-large-960h-lv60 | 0.341 (0.223) | 0.323 (0.215) |
| wav2vec2-large-960h-lv60-self | 0.340 (0.245) | 0.319 (0.224) |
| hubert-large-ls960-ft | 0.318 (0.225) | 0.305 (0.196) |
| Domain-adapted, beam search decoding | | |
| wav2vec2-base-960h | 0.380 (0.290) | 0.303 (0.220) |
| wav2vec2-large-960h | 0.361 (0.253) | 0.301 (0.205) |
| wav2vec2-large-960h-lv60 | 0.321 (0.250) | 0.235 (0.174) |
| wav2vec2-large-960h-lv60-self | 0.310 (0.226) | 0.220 (0.154) |
| hubert-large-ls960-ft | 0.285 (0.205) | 0.210 (0.145) |
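
For readers unfamiliar with the two metrics in Table 2, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, at the word and character level respectively. It is not the evaluation code behind the table. The function names, the whitespace tokenization, the choice to count spaces as characters in CER, and the example sentences are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from reference prefix to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()  # assumption: plain whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    # Assumption: spaces are counted as characters.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical transcripts, not taken from ADReSS or WLS.
reference = "the boy is on the stool"
hypothesis = "the boy is in the stool"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because a single wrong character makes the whole word count as an error, WER is typically higher than CER for the same transcript, which matches the pattern in every row of the table.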