Table 4.
The most stable features for predicting BvA and alogia from vocal acoustics.
Feature name | How feature is computed | What feature means |
---|---|---|
Alogia | ||
Unvoiced Segment Length: SD (StddevUnvoicedSegmentLength) | Standard deviation of unvoiced segment lengths | Captures the variability in pause length. This is potentially related to articulation rate and speech production, and is conceptually critical to alogia. |
Blunted vocal affect | ||
Mel-Frequency Cepstral Coefficient 2: SD (mfcc2_sma3_stddevNorm) | Computed as a spectrum of transformed frequency values over time | Captures variability over time in the global signature of the signal spectrum, using a short-term frequency representation on a nonlinear mel scale of frequency. It broadly reflects global changes in the vocal tract and is critical for speech recognition in humans and in automated systems. MFCC2 reflects finer spectral details than MFCC1. |
Harmonic Difference: H1 – A3 (logRelF0-H1-A3_sma3nz_amean) | Mean ratio of the energy of the first F0 harmonic (H1) to the energy of the highest harmonic in the third formant range (A3) | Ratio of the energy of the first F0 harmonic to that of the highest harmonic in the third formant range, generated from the vocal folds as opposed to the vocal tract. A measure of “spectral tilt” (i.e., the tendency for higher frequencies to have less energy), associated with breathy voice in men and with a lack of “creaky voice”. |
Both blunted vocal affect and alogia | ||
Second Formant: M (F2frequency_sma3nz_amean) | Average of formant 2 frequency values | Captures spectral shaping of the vocal signal, computed as the average second-formant frequency arising from vowel shaping. The second formant typically reflects tongue body movement from front to back. |
Acoustic features determined to be most stable using stability selection.
BvA: blunted vocal affect.
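
As an illustration of the alogia feature in the table, the following minimal sketch computes the standard deviation of unvoiced segment lengths from a frame-level voicing sequence. It is not the toolkit implementation behind `StddevUnvoicedSegmentLength`. The function names, the boolean `voiced_flags` input, the 10 ms `frame_step_s` default, and the use of the population standard deviation are all assumptions made for this example.

```python
from itertools import groupby
from statistics import pstdev


def unvoiced_segment_lengths(voiced_flags, frame_step_s=0.01):
    """Return the duration (seconds) of each run of consecutive unvoiced frames.

    voiced_flags: iterable of booleans, True where a frame is voiced
                  (hypothetical input; any voicing detector could supply it).
    frame_step_s: assumed time step between frames, in seconds.
    """
    return [
        sum(1 for _ in run) * frame_step_s
        for is_voiced, run in groupby(voiced_flags)
        if not is_voiced
    ]


def sd_unvoiced_segment_length(voiced_flags, frame_step_s=0.01):
    """Population SD of unvoiced segment lengths; 0.0 if there are none.

    Whether a population or sample SD is appropriate is an assumption here.
    """
    lengths = unvoiced_segment_lengths(voiced_flags, frame_step_s)
    return pstdev(lengths) if lengths else 0.0


if __name__ == "__main__":
    # Toy voicing track: two pauses of 2 and 4 frames between voiced stretches.
    flags = [True, True, False, False, True, False, False, False, False, True]
    print(unvoiced_segment_lengths(flags))
    print(sd_unvoiced_segment_length(flags))
```

Under these assumptions, a speaker whose pauses are all of similar length yields a value near zero, whereas a mix of short and long pauses yields a larger value.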