Table 1.
Comparison of system configurations with respect to audio normalization, feature set, and feature normalization strategy (infant-wise: normalization applied separately to the vocalizations of each infant; global: normalization over all instances of the dataset; see also the last paragraph of the “Feature-based representation” section), evaluated by the unweighted average recall (UAR) achieved in the binary vocalization classification paradigm RTT versus TD
| Audio normalization | Feature set | Feature normalization | UAR (RTT vs. TD) |
|---|---|---|---|
| – | ComParE | infant-wise | .356 |
| – | ComParE | global | .399 |
| – | eGeMAPS | infant-wise | .832 |
| – | eGeMAPS | global | .674 |
| ✓ | ComParE | infant-wise | .372 |
| ✓ | ComParE | global | .281 |
| ✓ | eGeMAPS | infant-wise | .879 |
| ✓ | eGeMAPS | global | .535 |
UAR values are rounded to three decimal places
ComParE: Computational Paralinguistics ChallengE (feature set); eGeMAPS: extended Geneva Minimalistic Acoustic Parameter Set; RTT: Rett syndrome; TD: typical development; ✓: applied; –: not applied
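
The two feature normalization strategies and the UAR metric compared in Table 1 can be illustrated with a minimal sketch. It rests on assumptions not stated in this table: z-score standardization is used as the normalization (the actual method is described in the “Feature-based representation” section), and the function names, the `infant_ids` array, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def zscore(x):
    """Standardize each feature column to zero mean and unit variance.

    Assumption: z-scoring stands in for whatever normalization the study used.
    """
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # leave constant features at zero instead of dividing by zero
    return (x - x.mean(axis=0)) / std


def normalize_global(features):
    """Global strategy: statistics are computed over all instances of the dataset."""
    return zscore(features)


def normalize_infant_wise(features, infant_ids):
    """Infant-wise strategy: statistics are computed separately per infant."""
    features = np.asarray(features, dtype=float)
    infant_ids = np.asarray(infant_ids)
    out = np.empty_like(features)
    for infant in np.unique(infant_ids):
        mask = infant_ids == infant
        out[mask] = zscore(features[mask])
    return out


def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of the per-class recalls, so each class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Toy data (hypothetical): four vocalizations, two features, two infants.
features = np.array([[1.0, 10.0], [3.0, 10.0], [10.0, 20.0], [14.0, 20.0]])
infant_ids = np.array(["a", "a", "b", "b"])

global_norm = normalize_global(features)
infant_norm = normalize_infant_wise(features, infant_ids)

# Class 0 recall is 3/4, class 1 recall is 1/2, so the UAR is their mean.
print(unweighted_average_recall([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0]))  # 0.625
```

In this sketch, infant-wise normalization removes per-infant offsets in each feature, whereas global normalization preserves them. Because UAR averages recall over classes rather than over instances, a binary classifier that always predicts the same class obtains a UAR of .5 regardless of class balance.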