2024 Aug 23;13(17):4997. doi: 10.3390/jcm13174997

Table 1.

Acoustic Features.

| Feature | Description |
| --- | --- |
| **Source Features** | |
| Jitter [%] | Deviations in individual consecutive f0 period lengths, indicating irregular closure and asymmetric vocal-fold vibrations. |
| Shimmer [%] | Difference in the peak amplitudes of consecutive f0 periods, indicating irregularities in voice intensity. |
| Tremor [Hz] | Frequency of the most intense low-frequency component modulating the fundamental frequency within a specified analysis range. |
| Harmonics-to-noise ratio (HNR) [dB] | Ratio between harmonic (periodic) and noise components, indirectly correlating with perceived aspiration. |
| Frequency disturbance ratio (FDR) [%] | Relative mean frequency disturbance computed over five consecutive periods (five-point average). |
| Amplitude disturbance ratio (ADR) [%] | Relative mean amplitude value over a set of windows. |
| Quasi-open quotient (QOQ) | Ratio of the vocal folds’ open time to the glottal cycle duration, often reduced in functional dysphonia. |
| Normalized amplitude quotient (NAQ) | Ratio between peak-to-peak pulse amplitude and the negative peak of the differentiated flow glottogram, normalized with respect to the period time. |
| Peak slope | Slope of the regression line that is fit to log10 of the maxima of each frame. |
| **Filter Features** | |
| F1 mean [Hz] | First peak in the spectrum of voiced utterances resulting from a resonance of the human vocal tract. |
| F2 mean [Hz] | Second peak in the spectrum of voiced utterances resulting from a resonance of the human vocal tract. |
| F1 variability [Hz] | Measures of dispersion of F1 (variance, standard deviation). |
| F2 variability [Hz] | Measures of dispersion of F2 (variance, standard deviation). |
| F1 range [Hz] | Difference between the lowest and highest F1 values. |
| Vowel space | Two-dimensional F1-F2 space spanned by the vowels /a/, /i/, /u/. |
| Linear predictive coding (LPC) coefficients | Coefficients that predict the next sample of the audio signal from previous samples. |
| **Spectral Features** | |
| Mel-frequency cepstral coefficients (MFCCs) | Coefficients derived by computing a spectrum of the log-magnitude Mel-spectrum of the audio segment. |
| **Prosodic Features** | |
| f0 mean [Hz] | Fundamental frequency, perceived as pitch (mean, median). |
| f0 variability [Hz] | Measures of dispersion of f0 (variance, standard deviation). |
| f0 range [Hz] | Difference between the lowest and highest f0 values. |
| Intensity [dB] | Acoustic intensity in decibels relative to a reference value. |
| Intensity variability [dB] | Measures of dispersion of intensity (variance, standard deviation). |
| Energy velocity | Mean-squared central difference across frames, possibly correlating with motor coordination. |
| Maximum phonation time [s] | Maximum time during which phonation of a vowel is sustained. |
| Speech rate | Number of speech units per second over the duration of the speech sample (including pauses). |
| Articulation rate | Number of speech units per second over the duration of the speech sample (excluding pauses). |
| Time talking [s] | Sum of the duration of all speech segments. |
| Utterance duration mean [s] | Mean duration of utterances. |
| Pause duration mean [s] | Mean duration of pauses. |
| Pause variability [s] | Measures of dispersion of pause duration (variance, standard deviation). |
| Pause total [s] | Total duration of pauses. |
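
The table gives verbal definitions only and does not fix exact formulas. As a minimal sketch, a few of the measures could be computed as shown below. It assumes the commonly used "local" variants of jitter and shimmer (mean absolute difference between consecutive values, relative to the mean value). It also assumes that everything outside the detected speech segments counts as pause. The function names, the input format, and the choice of speech unit (for example syllables or words) are hypothetical and not taken from the article.

```python
from statistics import mean


def local_perturbation_percent(values):
    """Mean absolute difference between consecutive values, relative to
    the mean value, in percent (assumed 'local' perturbation formula)."""
    if len(values) < 2:
        raise ValueError("need at least two consecutive values")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


def jitter_percent(period_lengths):
    """Jitter [%] from consecutive f0 period lengths (any time unit)."""
    return local_perturbation_percent(period_lengths)


def shimmer_percent(peak_amplitudes):
    """Shimmer [%] from peak amplitudes of consecutive f0 periods."""
    return local_perturbation_percent(peak_amplitudes)


def timing_measures(speech_segments, n_units, total_duration):
    """Timing features from (start, end) speech segments in seconds.

    Assumption: all time not covered by a speech segment is pause.
    n_units is the number of speech units counted in the sample.
    """
    time_talking = sum(end - start for start, end in speech_segments)
    if time_talking <= 0 or total_duration <= 0:
        raise ValueError("durations must be positive")
    return {
        "time_talking_s": time_talking,
        "pause_total_s": total_duration - time_talking,
        # Speech rate includes pauses in the denominator ...
        "speech_rate": n_units / total_duration,
        # ... articulation rate excludes them.
        "articulation_rate": n_units / time_talking,
    }


if __name__ == "__main__":
    # Illustrative values only, not data from the article.
    print(jitter_percent([10.0, 12.0, 10.0, 12.0]))
    print(timing_measures([(0.0, 2.0), (3.0, 5.0)], n_units=12, total_duration=6.0))
```

Because speech rate divides by the full sample duration and articulation rate only by the time spent talking, articulation rate is never lower than speech rate for the same sample. The gap between the two reflects how much of the sample consists of pauses.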