Peak width (PW) |
The width of the peak of the smoothed power spectrum, P. The peak was identified in the 0–200 Hz range, and its width was measured at 75% of the peak height (Fig. 2a) |
Spectrum slope (SL) |
The slope of the linear regression line fitted to the spectrum P on logarithmic axes. The power spectrum, when plotted in dB as 20 log10(P/Pmin) with Pmin = 5 × 10⁻⁵, was previously shown to decrease exponentially with frequency for content above 75 Hz [19]. SL is measured in dB/octave, where an octave is the interval over which the frequency doubles (Fig. 2a inset) |
Power of regression line (PLN) |
The power of the area under the regression line |
Power ratio (PR) |
The power ratio is defined as $\mathrm{PR} = 1 - \lvert 1 - E_{\mathrm{spectrum}}/E_{\mathrm{regression}} \rvert$, where $E_{\mathrm{spectrum}}$ is the area under the logarithmic spectrum and $E_{\mathrm{regression}}$ is the area under the regression line. These areas are computed using the trapezoidal integration method. A power ratio value close to 1 means that the logarithmic spectrum follows the regression line closely [19] |
Mel-frequency cepstral coefficient (MFCC) |
Mel-frequency cepstral coefficients encode information about the peak energies or resonances of a sound signal and are indirectly related to the impulse response of the system used to produce the sound. In our study, we can consider the chest as a solid system and the resulting MFCCs as indicators of its impulse response. As the lung sound signal is recorded after traveling through various chest chambers, different chest formations are expected to yield variations in the MFCCs (see Online Supplement A). MFCC sequences can be calculated using filters centered at various frequencies. For the current study, three coefficients (MFCC1, MFCC2, MFCC3) were retained for each subject by averaging over all short-time extracted MFCCs, corresponding to filters centered at frequencies {56, 116, 181} Hz, respectively |
Spectral shape (scales) |
Scales estimate how broad or narrow the spectral profile is. These spectral modulations reflect how content varies along the frequency axis and were computed from the auditory spectrogram, which models the cochlear representation of sounds and was calculated over an 8 ms window. The auditory spectrogram was filtered using 31 logarithmically spaced, Gabor-shaped seed filters, varying from wideband to narrowband: 0–8 cycles/octave (c/o) [26, 27]. The response, produced for each scale and time index, was averaged over time to yield the scale profile. Low scale values (<1 c/o) corresponded to a very smooth spectral profile with peaks that spread over more than 1 octave; high scale values corresponded to a peaky spectrum with more than one peak-to-trough cycle per octave. Figure 2c shows a schematic representation |
Temporal modulations (rates) |
Rates capture how fast or slow the frequency contents change with time and in which phase (direction), positive or negative. These temporal modulations were calculated from the auditory spectrogram using 23 exponential filters with velocities spanning [0, 64] Hz in both directions [26, 27]. Rates were computed for each frequency band of the spectrogram, and results were averaged to yield one rate profile per subject. Figure 2e shows a schematic representation |
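
The three spectrum-based features in the table (PW, SL and PR) can be illustrated with a minimal Python sketch. It is not the study's implementation. The function names, the regression range starting at 75 Hz, the clipping of the spectrum at Pmin, the use of log2(frequency) as the integration axis, and the grid-resolution width measurement are all assumptions made for illustration; only the 0–200 Hz peak range, the 75% height, Pmin, and the PR formula come from the table.

```python
import numpy as np

P_MIN = 5e-5  # reference power Pmin from the table


def peak_width(freqs, power, f_max=200.0, rel_height=0.75):
    """Width (Hz) of the largest peak at or below f_max, at rel_height of its height.

    Assumption: the width is read off the frequency grid without interpolation.
    """
    band = freqs <= f_max
    f, p = freqs[band], power[band]
    k = int(np.argmax(p))
    level = rel_height * p[k]
    lo = hi = k
    # Walk outward from the peak while the spectrum stays at or above the level.
    while lo > 0 and p[lo - 1] >= level:
        lo -= 1
    while hi < len(p) - 1 and p[hi + 1] >= level:
        hi += 1
    return float(f[hi] - f[lo])


def _trapezoid(y, x):
    """Trapezoidal integration of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def slope_and_power_ratio(freqs, power, f_min=75.0, p_min=P_MIN):
    """Return (SL in dB/octave, PR) for the spectrum at or above f_min.

    Assumptions: the regression uses only frequencies >= f_min, and both
    areas are integrated over log2(frequency), i.e. in octaves.
    """
    band = freqs >= f_min
    octaves = np.log2(freqs[band])
    # Assumption: clip at p_min so that the dB spectrum is never negative.
    level_db = 20.0 * np.log10(np.maximum(power[band], p_min) / p_min)
    slope, intercept = np.polyfit(octaves, level_db, 1)
    fit_db = slope * octaves + intercept
    e_spectrum = _trapezoid(level_db, octaves)
    e_regression = _trapezoid(fit_db, octaves)
    power_ratio = 1.0 - abs(1.0 - e_spectrum / e_regression)
    return float(slope), power_ratio
```

Both functions expect a one-dimensional frequency axis in Hz and a smoothed power spectrum of the same length, both as NumPy arrays. A spectrum that is exactly linear in dB versus log frequency gives a PR of 1 (up to rounding), which matches the interpretation given for PR in the table.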