2022 Apr 11;82(4):4787–4820. doi: 10.1007/s11042-022-12315-2

Table 6.

Audio features

| Feature name | Description | Statistical features | No. of extracted features |
|---|---|---|---|
| Pitch | An approximation of the quasi-periodic rate of vibrations per speech cycle. | Mean, median, standard deviation, minimum, mode, maximum, kurtosis, root mean square, skewness | 9 |
| Intensity | A measure of the perceived loudness. | Mean, median, standard deviation, minimum, mode, maximum, kurtosis, root mean square, skewness | 9 |
| Formants [F1, F2, F3, F4] | The resonant frequencies of the vocal tract. F1 is the formant with the lowest frequency band, F2 the second, and so on, with successive formants occurring at intervals of about 1000 Hz. | Mean, median, standard deviation, minimum, maximum, kurtosis, mode, root mean square, skewness | 36 |
| Pulses | A fundamental, audible, and steady beat in the voice. | Count, mean, standard deviation, variance | 4 |
| Amplitude | The size of the oscillations of the vocal folds as they vibrate during speech. | Minimum, maximum, mean, root mean square | 4 |
| Mean absolute jitter | The average absolute difference between consecutive vocal periods, divided by the mean vocal period. | Mean | 1 |
| Jitter (local, absolute) | The average absolute difference between consecutive periods, in seconds. | Mean | 1 |
| Relative average perturbation jitter | It measures the effects of long-term pitch changes such as a slow rise or fall in pitch. It is calculated as the average absolute difference between a period and the average of it and its 2 neighbours, divided by the mean period. | Mean | 1 |
| 5-point period perturbation jitter | The average absolute difference between a period and the average of it and its 4 closest neighbours, divided by the mean period. | Mean | 1 |
| Mean absolute differences jitter | The average absolute difference between consecutive differences between consecutive periods, divided by the mean period. | Mean | 1 |
| Shimmer | The short-term (cycle-to-cycle) fluctuations in the amplitude of the waveform, which reflect inherent resistance/noise in the voice biosignal. | Mean | 1 |
| Mean shimmer | The average absolute difference between the amplitudes of consecutive periods, divided by the average amplitude. | Mean | 1 |
| Mean shimmer (dB) | The average absolute difference between the base-10 logarithms of the amplitudes of consecutive periods, multiplied by 20. | Mean | 1 |
| 3-point amplitude perturbation quotient shimmer | The average absolute difference between the amplitude of a vocal period and the average of the amplitudes of its neighbours, divided by the average amplitude. | Mean | 1 |
| 5-point amplitude perturbation quotient shimmer | The average absolute difference between the amplitude of a vocal period and the average of the amplitudes of it and its 4 closest neighbours, divided by the average amplitude. | Mean | 1 |
| 11-point amplitude perturbation quotient shimmer | The average absolute difference between the amplitude of a vocal period and the average of the amplitudes of it and its 10 closest neighbours, divided by the average amplitude. | Mean | 1 |
| Mean absolute differences shimmer | The average absolute difference between consecutive differences between the amplitudes of consecutive periods. | Mean | 1 |
| Harmonicity of the voiced parts only | A measure of the repeating patterns in voiced speech signals. | Mean | 1 |
| Mean autocorrelation | A measure of the repeating patterns in the speech signal. | Mean | 1 |
| Mean harmonics-to-noise ratio | The ratio of the periodic (harmonic) component of the speech signal to its additive noise component. | Mean | 1 |
| Mean noise-to-harmonics ratio | The ratio of the additive noise component of the speech signal to its periodic (harmonic) component. | Mean | 1 |
| Fraction of locally unvoiced frames | The fraction of pitch frames analysed as unvoiced (75 Hz pitch floor) in a speech biosignal of a specified length. | Mean | 1 |
| Number of voice breaks | The number of distances between consecutive vocal pulses that are longer than 1.25 divided by the pitch floor. Hence, if the pitch floor is 75 Hz, all inter-pulse intervals longer than 16.6667 ms are regarded as voice breaks. | Count | 1 |
| Degree of voice breaks | The total duration of the breaks between the voiced parts of the speech signal. | Mean | 1 |
| Total energy | The total energy of a vocal signal in air. | Mean | 1 |
| Mean power | The mean power of a speech signal in air. | Mean | 1 |
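
The jitter, shimmer, and voice-break definitions in Table 6 can be illustrated with a minimal Python sketch. It is not the authors' extraction pipeline: it assumes the glottal period durations (in seconds) and per-period peak amplitudes have already been extracted, and every function name below is hypothetical, chosen only for illustration.

```python
import math
from statistics import fmean


def mean_abs_consecutive_diff(values):
    """Average absolute difference between consecutive values."""
    return fmean([abs(b - a) for a, b in zip(values, values[1:])])


def jitter_local_absolute(periods):
    """Jitter (local, absolute): result is in seconds."""
    return mean_abs_consecutive_diff(periods)


def jitter_local(periods):
    """Average absolute consecutive difference divided by the mean period."""
    return mean_abs_consecutive_diff(periods) / fmean(periods)


def perturbation_quotient(values, points):
    """Average |x_i - mean of the `points`-wide window centred on x_i|,
    divided by the overall mean. With periods: points=3 gives RAP and
    points=5 gives PPQ5. With amplitudes: points=3, 5, 11 give the
    amplitude perturbation quotients as worded in the table."""
    half = points // 2
    deviations = [
        abs(values[i] - fmean(values[i - half:i + half + 1]))
        for i in range(half, len(values) - half)
    ]
    return fmean(deviations) / fmean(values)


def jitter_ddp(periods):
    """Average absolute difference between consecutive differences
    between consecutive periods, divided by the mean period."""
    diffs = [b - a for a, b in zip(periods, periods[1:])]
    return mean_abs_consecutive_diff(diffs) / fmean(periods)


def shimmer_local(amplitudes):
    """Average absolute consecutive amplitude difference divided by the mean amplitude."""
    return mean_abs_consecutive_diff(amplitudes) / fmean(amplitudes)


def shimmer_db(amplitudes):
    """Average of |20 * log10(A[i+1] / A[i])| over consecutive periods."""
    return fmean(
        [abs(20.0 * math.log10(b / a)) for a, b in zip(amplitudes, amplitudes[1:])]
    )


def voice_break_threshold(pitch_floor_hz):
    """Inter-pulse intervals longer than this (in seconds) count as voice breaks."""
    return 1.25 / pitch_floor_hz


if __name__ == "__main__":
    # Made-up example values, not data from the paper.
    periods = [0.010, 0.011, 0.010, 0.012, 0.010]
    amplitudes = [0.50, 0.55, 0.48, 0.52, 0.50]
    print("jitter (local):", jitter_local(periods))
    print("RAP:", perturbation_quotient(periods, 3))
    print("shimmer (dB):", shimmer_db(amplitudes))
    print("voice-break threshold (ms):", 1000 * voice_break_threshold(75.0))
```

Under these definitions the mean absolute differences jitter is exactly three times the relative average perturbation, which makes a convenient sanity check, and a 75 Hz pitch floor reproduces the 16.6667 ms voice-break threshold quoted in the table.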