Source Features
Jitter [%] |
Deviations in individual consecutive f0 period lengths, indicating irregular closure and asymmetric vocal-fold vibrations. |
Shimmer [%] |
Difference in the peak amplitudes of consecutive f0 periods, indicating irregularities in voice intensity. |
Tremor [Hz] |
Frequency of the strongest low-frequency component modulating the fundamental frequency within a specified analysis range. |
Harmonics-to-noise ratio (HNR) [dB] |
Ratio of the energy in the harmonic (periodic) components to the energy in the noise components, indirectly correlating with perceived aspiration. |
Frequency disturbance ratio (FDR) [%] |
Relative mean frequency disturbance computed over groups of five consecutive periods (five-point average). |
Amplitude disturbance ratio (ADR) [%] |
Relative mean amplitude value over a set of windows. |
Quasi-open quotient (QOQ) |
Proportion of the glottal cycle during which the vocal folds are open, often reduced in functional dysphonia. |
Normalized amplitude quotient (NAQ) |
Ratio between peak-to-peak pulse amplitude and the negative peak of the differentiated flow glottogram, normalized with respect to the period time. |
Peak slope |
Slope of the regression line that is fit to log10 of the maxima of each frame. |
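
As a rough illustration of the perturbation measures listed above, the sketch below computes a local jitter and shimmer value and a five-point disturbance ratio from per-cycle measurements. It assumes the glottal cycles have already been detected, so the inputs are plain lists of period lengths and peak amplitudes. The function names, the sample values, and the reading of FDR as a deviation from a five-point moving average are assumptions made for this example, not definitions taken from the table.

```python
from statistics import mean


def relative_perturbation(values):
    """Mean absolute difference of consecutive values, in % of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


def five_point_perturbation(values):
    """Mean absolute deviation of each value from its five-point moving
    average, in % of the mean value (an assumed reading of FDR)."""
    if len(values) < 5:
        raise ValueError("need at least five cycles")
    devs = [abs(values[i] - mean(values[i - 2:i + 3]))
            for i in range(2, len(values) - 2)]
    return 100.0 * mean(devs) / mean(values)


# Hypothetical per-cycle measurements, for illustration only.
periods_s = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100, 0.0103]
peak_amps = [0.80, 0.78, 0.82, 0.79, 0.81, 0.80]

jitter_pct = relative_perturbation(periods_s)    # applied to period lengths
shimmer_pct = relative_perturbation(peak_amps)   # applied to peak amplitudes
fdr_pct = five_point_perturbation(periods_s)
print(f"jitter={jitter_pct:.2f}%  shimmer={shimmer_pct:.2f}%  FDR={fdr_pct:.2f}%")
```

Jitter and shimmer share one helper here because both are described as cycle-to-cycle differences; only the input series changes.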
Filter Features
F1 mean [Hz] |
Mean frequency of the first formant, i.e., the first spectral peak of voiced utterances produced by a resonance of the human vocal tract. |
F2 mean [Hz] |
Mean frequency of the second formant, i.e., the second spectral peak of voiced utterances produced by a resonance of the human vocal tract. |
F1 variability [Hz] |
Measures of dispersion of F1 (variance, standard deviation). |
F2 variability [Hz] |
Measures of dispersion of F2 (variance, standard deviation). |
F1 range [Hz] |
Difference between the lowest and highest F1 values. |
Vowel space |
Two-dimensional F1-F2 space spanned by the vowels /a/, /i/, and /u/. |
Linear predictive coding (LPC) coefficients |
Coefficients of a linear model that predicts the next sample of the audio signal from a weighted sum of previous samples. |
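
Two of the filter features above lend themselves to a short worked sketch. The first function summarises the vowel space as the area of the triangle formed by the (F1, F2) points of /a/, /i/, and /u/; using the area as the summary is an assumption of this example, since the table only names the 2D space. The second estimates LPC coefficients with the autocorrelation method and the Levinson-Durbin recursion, one common way to fit such a predictor and not necessarily the one used by any particular toolkit. All function names and formant values are hypothetical.

```python
def vowel_triangle_area(a, i, u):
    """Area of the triangle spanned by three (F1, F2) points (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = a, i, u
    return 0.5 * abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))


def autocorrelation(x, max_lag):
    """Raw autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n-k]."""
    r = autocorrelation(x, order)
    if r[0] == 0:
        raise ValueError("signal has zero energy")
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]                # prediction error energy
    for m in range(1, order + 1):
        # Reflection coefficient for model order m.
        k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


# Hypothetical mean formants in Hz, as (F1, F2) pairs.
area_hz2 = vowel_triangle_area((750.0, 1300.0), (300.0, 2300.0), (320.0, 800.0))

# A decaying exponential is perfectly predicted by one coefficient (0.9).
signal = [0.9 ** n for n in range(200)]
coeffs = lpc(signal, order=2)
print(area_hz2, coeffs)
```

The pure-Python autocorrelation is quadratic in the frame length, which is acceptable for short frames but would normally be replaced by an FFT-based or library routine.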
Spectral Features
Mel-frequency cepstral coefficients (MFCCs) |
Coefficients derived by computing a spectrum of the log-magnitude Mel-spectrum of the audio segment. |
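
The "spectrum of the log-magnitude Mel-spectrum" in the MFCC description is usually computed with a discrete cosine transform. The sketch below covers only that final step and assumes the Mel filterbank energies of one frame are already available. The function names, the default of 13 coefficients, the energy floor, and the particular Mel formula are assumptions chosen for illustration; implementations differ in all of these.

```python
import math


def hz_to_mel(freq_hz):
    """One commonly used Mel-scale formula (several variants exist)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def dct_ii(values):
    """Unnormalised type-II discrete cosine transform."""
    n = len(values)
    return [sum(v * math.cos(math.pi / n * (i + 0.5) * k)
                for i, v in enumerate(values))
            for k in range(n)]


def mfcc_from_mel_energies(mel_energies, n_coeffs=13, floor=1e-10):
    """Log-compress the Mel band energies, then keep the first DCT coefficients."""
    log_mel = [math.log(max(e, floor)) for e in mel_energies]  # floor avoids log(0)
    return dct_ii(log_mel)[:n_coeffs]


# Hypothetical Mel band energies for a single frame.
frame_energies = [0.9, 1.4, 2.1, 1.7, 0.8, 0.5, 0.3, 0.2]
print(mfcc_from_mel_energies(frame_energies, n_coeffs=4))
```

A flat Mel spectrum yields zeros for every coefficient except the first, which is why the higher coefficients are often read as describing spectral shape, not overall level.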
Prosodic Features
f0 mean [Hz] |
Fundamental frequency, perceived as pitch (mean, median). |
f0 variability [Hz] |
Measures of dispersion of f0 (variance, standard deviation). |
f0 range [Hz] |
Difference between the lowest and highest f0 values. |
Intensity [dB] |
Acoustic intensity in decibels relative to a reference value. |
Intensity variability [dB] |
Measures of dispersion of intensity (variance, standard deviation). |
Energy velocity |
Mean-squared central difference across frames, possibly correlating with motor coordination. |
Maximum phonation time [s] |
Maximum time during which phonation of a vowel is sustained. |
Speech rate |
Number of speech units per second over the duration of the speech sample (including pauses). |
Articulation rate |
Number of speech units per second over the duration of the speech sample (excluding pauses). |
Time talking [s] |
Sum of the duration of all speech segments. |
Utterance duration mean [s] |
Mean duration of utterances. |
Pause duration mean [s] |
Mean duration of pauses. |
Pause variability [s] |
Measures of dispersion of pause duration (variance, standard deviation). |
Pause total [s] |
Total duration of pauses. |
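
To make the relationship between the prosodic measures concrete, the sketch below derives several of them from an f0 contour, a list of frame energies, and a list of speech segments. It assumes that unvoiced frames are coded as 0 Hz, that segments are sorted, non-overlapping (start, end) pairs in seconds, and that the sample duration runs from the first segment's start to the last segment's end. These conventions, the function names, and the use of the population standard deviation are choices made for this example only.

```python
from statistics import mean, median, pstdev


def f0_summary(f0_hz):
    """Mean, median, standard deviation, and range over voiced frames."""
    voiced = [f for f in f0_hz if f > 0]   # assumption: 0 marks unvoiced frames
    return {"mean": mean(voiced), "median": median(voiced),
            "sd": pstdev(voiced), "range": max(voiced) - min(voiced)}


def energy_velocity(frame_energies):
    """Mean-squared central difference of the energy across frames."""
    diffs = [(frame_energies[t + 1] - frame_energies[t - 1]) / 2.0
             for t in range(1, len(frame_energies) - 1)]
    return mean([d * d for d in diffs])


def timing_summary(speech_segments, n_units):
    """Rate and pause measures from (start_s, end_s) speech segments."""
    durations = [end - start for start, end in speech_segments]
    pauses = [nxt_start - end for (_, end), (nxt_start, _)
              in zip(speech_segments, speech_segments[1:])]
    time_talking = sum(durations)
    pause_total = sum(pauses)
    return {
        "time_talking": time_talking,
        "pause_total": pause_total,
        "speech_rate": n_units / (time_talking + pause_total),  # pauses included
        "articulation_rate": n_units / time_talking,            # pauses excluded
        "utterance_mean": mean(durations),
        "pause_mean": mean(pauses) if pauses else 0.0,
        "pause_sd": pstdev(pauses) if pauses else 0.0,
    }


# Hypothetical segmentation: three utterances containing 10 syllables in total.
segments = [(0.0, 1.0), (1.5, 2.5), (3.5, 4.0)]
timing = timing_summary(segments, n_units=10)
print(timing)
print(f0_summary([0.0, 100.0, 200.0, 0.0, 300.0]))
```

With these inputs the articulation rate exceeds the speech rate, because the same number of units is divided by the talking time alone instead of the talking time plus pauses.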