Table 5.
Selection of most informative features of the multimodal classifier.
| Feature | χ² | P | Hotspots | Non-hotspots |
|---|---|---|---|---|
| Nee nee nee (no no no)<sup>a</sup> | 23.347 | 0.127 | **4** | 1 |
| Angst euh euh (fear uh uh)<sup>a</sup> | 23.060 | 0.129 | **2** | 0 |
| War euh war (were uh were)<sup>a</sup> | 22.071 | 0.137 | 0 | **5** |
| Category ‘Disgust’<sup>b</sup> | 21.840 | 0.139 | **0.97** | 0.44 |
| Category ‘Death’<sup>c</sup> | 21.099 | 0.146 | **0.23** | 0.04 |
| Pijn helemal nik (pain absolutely nothing)<sup>a</sup> | 20.692 | 0.150 | **2** | 0 |
| Weg vlucht euh (away flight uh)<sup>a</sup> | 20.692 | 0.150 | **2** | 0 |
| Zeg euh euh (say uh uh)<sup>a</sup> | 20.408 | 0.153 | 0 | **11** |
| Emotional expressions<sup>d</sup> | 18.663 | 0.172 | **8.02** | 1.71 |
| Category ‘Negative emotions’<sup>c</sup> | 17.905 | 0.181 | **2.30** | 1.22 |
| Category ‘Interrogative pronoun’<sup>e</sup> | 17.879 | 0.181 | 0.00 | **0.04** |
| Category ‘Anger’<sup>c</sup> | 17.803 | 0.182 | **0.46** | 0.19 |
| Absolute word count (word tokens)<sup>e</sup> | 17.498 | 0.186 | 245.85 | **521.12** |
| Bang dod gan (afraid to die)<sup>a</sup> | 17.443 | 0.187 | **3** | 0 |
| Category ‘Sadness’<sup>c,*</sup> | 17.192 | 0.190 | **0.69** | 0.23 |
| Euh soort euh (uh sort uh)<sup>a</sup> | 17.138 | 0.190 | 0 | **4** |
| Zeg euh kom (say uh come)<sup>a</sup> | 17.003 | 0.192 | **2** | 0 |
| Ging ging wer (went went again)<sup>a</sup> | 16.500 | 0.199 | 0 | **2** |
| Category ‘Anxiety’<sup>c</sup> | 16.249 | 0.202 | **0.77** | 0.37 |
| Category ‘Sadness’<sup>b,*</sup> | 15.045 | 0.220 | **1.80** | 1.05 |
| Number of voiced units<sup>f</sup> | 15.043 | 0.220 | 7.58 | **72.05** |
| Category ‘Eating’<sup>c</sup> | 15.038 | 0.220 | 0.05 | **0.17** |
| Number of silent units<sup>f</sup> | 14.569 | 0.227 | 5.79 | **66.95** |
| Total duration of speech<sup>f</sup> | 14.543 | 0.228 | 8.68 | **47.72** |
| Category ‘Swear words’<sup>c</sup> | 14.388 | 0.230 | **0.12** | 0.02 |

Twenty-five of the 50 most informative features, based on χ² ranking. The first column shows a selection of highly ranked features. N-grams are in Dutch and stemmed, so they may look misspelled (e.g. ‘dood’ is stemmed to ‘dod’, and ‘gaan’ to ‘gan’); unstemmed English translations are given in parentheses. The last two columns show occurrence counts (N-grams) and means (all other features) for each class. Values for the class with the highest occurrence are in boldface.

<sup>*</sup>Sadness is listed twice: the first is the LIWC category and the second is the NRC emotion.
<sup>a</sup>N-gram of 3 consecutive words.
<sup>b</sup>Emotion feature extracted using the NRC emotion lexicon.
<sup>c</sup>LIWC feature extracted using the LIWC dictionary.
<sup>d</sup>Emotional expressions extracted using a custom tagger.
<sup>e</sup>Text statistic extracted using Python’s TextStat package.
<sup>f</sup>Speech feature extracted using Praat.
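
To illustrate how 3-word N-gram features might be ranked by a χ² statistic, the following is a minimal sketch and not the pipeline used to produce this table. It assumes a count-based χ² score (observed class-wise N-gram counts against counts expected from the class proportions), which is one common formulation; the table does not state which implementation was used. The helper names `trigrams` and `chi2_scores` are hypothetical, and the four toy documents are invented for the example and are not study data.

```python
from collections import Counter


def trigrams(tokens):
    """Return every run of 3 consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def chi2_scores(docs, labels):
    """Score each trigram with a count-based chi-square statistic.

    observed[c] = total count of the trigram in documents of class c
    expected[c] = (share of documents in class c) * total trigram count
    This is an assumed formulation, not necessarily the one behind Table 5.
    """
    classes = sorted(set(labels))
    class_share = {c: labels.count(c) / len(docs) for c in classes}

    # Accumulate trigram counts separately for each class.
    observed = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        observed[label].update(trigrams(tokens))

    vocab = set().union(*observed.values())
    scores = {}
    for gram in vocab:
        total = sum(observed[c][gram] for c in classes)
        scores[gram] = sum(
            (observed[c][gram] - class_share[c] * total) ** 2
            / (class_share[c] * total)
            for c in classes
        )
    return scores


# Invented toy input: already tokenized and stemmed, two documents per class.
docs = [
    "nee nee nee nee".split(),
    "bang dod gan".split(),
    "zeg euh euh zeg euh euh".split(),
    "zeg euh euh".split(),
]
labels = ["hotspot", "hotspot", "non-hotspot", "non-hotspot"]

scores = chi2_scores(docs, labels)
# Trigrams ordered from most to least informative under this score.
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy input, trigrams that occur more often and in only one class receive higher scores, which is the sense in which a χ² ranking surfaces class-specific N-grams. Turning a score into a P value would additionally require the χ² distribution, which this sketch leaves out.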