Table 6.
Audio features
Feature name | Description | Statistical features extracted | No. of features |
---|---|---|---|
Pitch | It is an approximation of the quasi-periodic rate of vibrations per speech cycle. | mean, median, standard deviation, minimum, maximum, mode, kurtosis, root mean square, skewness | 9 |
Intensity | It is the measure of the perceived loudness. | mean, median, standard deviation, minimum, maximum, mode, kurtosis, root mean square, skewness | 9 |
Formants [F1, F2, F3, F4] | They indicate the resonating frequencies of the vocal tract. The formant with the lowest frequency is F1, followed by F2, and so on; successive formants occur at intervals of approximately 1000 Hz. | mean, median, standard deviation, minimum, maximum, mode, kurtosis, root mean square, skewness | 36 |
Pulses | A fundamental, audible, and steady beat in the voice. | Count, Mean, standard deviation, variance | 4 |
Amplitude | It is the size of the oscillations of the vocal folds due to vibrations caused by speech biosignal. | minimum, maximum, mean, Root mean square | 4 |
Mean Absolute jitter | It is the average absolute difference between consecutive vocal periods, divided by the mean vocal period. | Mean | 1 |
Jitter (local, absolute) | The average absolute difference between consecutive periods, in seconds. | Mean | 1 |
Relative average perturbation jitter | It measures the effects of long-term pitch changes such as a slow rise or fall in pitch. It is calculated as the average absolute difference between a period and the average of it and its 2 neighbours, divided by the mean period. | Mean | 1 |
5-point period perturbation Jitter | It is calculated using the average absolute difference between a period and the average of it and its 4 closest neighbours, divided by the mean period. | Mean | 1 |
Mean absolute differences Jitter | It is the average absolute difference between consecutive differences between consecutive periods, divided by the mean period. | Mean | 1 |
Shimmer | It describes the small short-term (cycle-to-cycle) fluctuations in the amplitude of the waveform, which reflect inherent resistance/noise in the voice biosignal. | Mean | 1 |
Mean Shimmer | Average absolute difference between the amplitudes of consecutive periods, divided by the average amplitude. | Mean | 1 |
Mean Shimmer dB | Average absolute difference between the base-10 logarithms of the amplitudes of consecutive periods, multiplied by 20. | Mean | 1 |
3-point Amplitude Perturbation Quotient Shimmer | It is calculated as the average absolute difference between the amplitude of a vocal period and the average of the amplitudes of its neighbours, divided by the average amplitude. | Mean | 1 |
5-point Amplitude Perturbation Quotient Shimmer | It is the average absolute difference between the amplitude of a vocal period and the average of the amplitudes of it and its 4 closest neighbours, divided by the average amplitude. | Mean | 1 |
11-point Amplitude Perturbation Quotient Shimmer | It is the average absolute difference between the amplitude of a vocal period and the average of the amplitudes of it and its 10 closest neighbours, divided by the average amplitude. | Mean | 1 |
Mean absolute differences shimmer | Average absolute difference between consecutive differences between the amplitudes of consecutive periods. | Mean | 1 |
Harmonicity of the voiced parts only | It is used for measuring the repeating patterns in voiced speech signals. | Mean | 1 |
Mean autocorrelation | It is used for measuring the repeating patterns in the speech signal. | Mean | 1 |
Mean harmonics-to-noise ratio | It is a measure which gives the relationship between the periodic and additive noise components of the speech signal. | Mean | 1 |
Mean noise-to-harmonics ratio | It is a measure which gives the relationship between the periodic and additive noise components of the speech signal. | Mean | 1 |
Fraction of locally unvoiced frames | It is the fraction of pitch frames analysed as unvoiced (with a pitch floor of 75 Hz) in a speech biosignal of a specified length. | Mean | 1 |
Number of voice breaks | The number of distances between consecutive vocal pulses that are longer than 1.25 divided by the pitch floor. Hence, if the pitch floor is 75 Hz, all inter-pulse intervals longer than 16.6667 ms are regarded as voice breaks. | Count | 1 |
Degree of voice breaks | This measure is the total duration of breaks between the voiced parts of the speech signal. | Mean | 1 |
Total energy | Total energy of a vocal signal in air. | Mean | 1 |
Mean power | The mean power of a speech signal in air. | Mean | 1 |
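
The jitter definitions and the voice-break rule in the table can be illustrated with a minimal Python sketch. It is not the extraction pipeline used for Table 6. The function names (`jitter_measures`, `voice_break_threshold`, `count_voice_breaks`) are hypothetical, and the inputs are assumed to be a list of consecutive vocal period durations in seconds and a list of pulse times in seconds.

```python
def _mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def jitter_measures(periods):
    """Jitter statistics from consecutive vocal period durations (seconds).

    Hypothetical helper: follows the wording of the table, not a specific tool.
    """
    n = len(periods)
    if n < 5:
        raise ValueError("need at least 5 periods for the 5-point measure")
    mean_period = _mean(periods)

    # Jitter (local, absolute): mean |T[i] - T[i-1]|, in seconds.
    local_abs = _mean([abs(periods[i] - periods[i - 1]) for i in range(1, n)])

    # RAP: period versus the average of it and its 2 neighbours.
    rap_abs = _mean([
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3)
        for i in range(1, n - 1)
    ])

    # PPQ5: period versus the average of it and its 4 closest neighbours.
    ppq5_abs = _mean([
        abs(periods[i] - sum(periods[i - 2:i + 3]) / 5)
        for i in range(2, n - 2)
    ])

    # DDP: differences between consecutive differences of consecutive periods.
    signed = [periods[i] - periods[i - 1] for i in range(1, n)]
    ddp_abs = _mean([abs(signed[i] - signed[i - 1]) for i in range(1, len(signed))])

    return {
        "local_absolute": local_abs,         # seconds
        "local": local_abs / mean_period,    # dimensionless ratio
        "rap": rap_abs / mean_period,
        "ppq5": ppq5_abs / mean_period,
        "ddp": ddp_abs / mean_period,
    }


def voice_break_threshold(pitch_floor_hz):
    """Longest inter-pulse interval (seconds) not counted as a voice break."""
    return 1.25 / pitch_floor_hz


def count_voice_breaks(pulse_times, pitch_floor_hz=75.0):
    """Count inter-pulse intervals longer than 1.25 / pitch floor."""
    limit = voice_break_threshold(pitch_floor_hz)
    intervals = [b - a for a, b in zip(pulse_times, pulse_times[1:])]
    return sum(1 for interval in intervals if interval > limit)


if __name__ == "__main__":
    example_periods = [0.010, 0.011, 0.010, 0.011, 0.010, 0.011]
    print(jitter_measures(example_periods))
    print(voice_break_threshold(75.0) * 1000, "ms")
```

With a pitch floor of 75 Hz the threshold is 1.25 / 75 s, which matches the 16.6667 ms quoted in the table. Under the same definitions, the shimmer rows (mean shimmer, 3-point and 5-point APQ, mean absolute differences) could be sketched analogously by passing per-period amplitudes instead of period durations.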