Table 5.
Confusion matrices of speech emotion recognition (4 classes) with LittleBeats™ and smartphone audio data.
| | LittleBeats™ Data | | | | Smartphone Data | | | |
|---|---|---|---|---|---|---|---|---|
| Ground Truth Labels | NEU | HAP | SAD | ANG | NEU | HAP | SAD | ANG |
| Neutral (NEU) | 22 (0.786) | 2 (0.071) | 2 (0.071) | 2 (0.071) | 17 (0.607) | 1 (0.036) | 6 (0.214) | 4 (0.143) |
| Happy (HAP) | 0 (0) | 24 (0.649) | 2 (0.054) | 11 (0.297) | 1 (0.027) | 25 (0.676) | 2 (0.054) | 9 (0.243) |
| Sad (SAD) | 6 (0.154) | 8 (0.205) | 25 (0.641) | 0 (0) | 6 (0.158) | 4 (0.105) | 27 (0.711) | 1 (0.026) |
| Angry (ANG) | 0 (0) | 6 (0.158) | 4 (0.105) | 28 (0.737) | 1 (0.026) | 12 (0.316) | 3 (0.079) | 22 (0.579) |
Note: Rows represent the ground truth labels, and columns represent the predicted labels. Values in parentheses are the proportions of each ground truth label that were predicted as neutral, happy, sad, and angry, respectively, so the proportions in each row of a device's matrix sum to 1.
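
As an illustration of how the parenthesized proportions relate to the raw counts, the following minimal sketch row-normalizes the LittleBeats™ counts from Table 5. The names `LABELS`, `littlebeats_counts`, and `row_proportions` are hypothetical and chosen only for this example; this is not the authors' analysis code.

```python
# Counts copied from the LittleBeats(TM) half of Table 5.
# Rows are ground truth labels, columns are predicted labels (NEU, HAP, SAD, ANG).
LABELS = ["NEU", "HAP", "SAD", "ANG"]
littlebeats_counts = [
    [22, 2, 2, 2],    # ground truth: Neutral
    [0, 24, 2, 11],   # ground truth: Happy
    [6, 8, 25, 0],    # ground truth: Sad
    [0, 6, 4, 28],    # ground truth: Angry
]


def row_proportions(counts):
    """Divide every cell by its row total (assumes no row sums to zero)."""
    return [[cell / sum(row) for cell in row] for row in counts]


props = row_proportions(littlebeats_counts)

# Print each ground truth label with its rounded row of proportions.
for label, row in zip(LABELS, props):
    print(label, [round(p, 3) for p in row])
```

Under this normalization, each diagonal entry is the fraction of a class that was labeled correctly (per-class recall), for example 22 of 28 neutral samples.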