Abstract
Multi-band dynamic range (MBDR) compression is a key part of the signal processing chain in hearing aid devices (HADs). The operating speed of the MBDR compressor plays an important role in preserving the quality and intelligibility of the output signal. A traditional fast-acting compressor preserves the audible cues in quiet speech but, in the presence of surrounding noise, it can degrade sound quality by introducing pumping and breathing effects. Alternatively, a slow-acting compressor maintains the temporal cues and listening comfort but may provide inadequate gain for soft inputs that come right after loud inputs. HADs may operate in varying acoustic environments; therefore, a fixed compression speed might degrade hearing aid performance. In this study, we propose a frequency-domain (FFT-based) nine-band adaptive MBDR compressor that uses spectral flux as a measure of the change in input level to adapt the speed of the compressor in each band. The gain, threshold and compression ratio of the compressor in the nine bands are adjusted based on the audiogram of the hearing-impaired patient. The proposed method is implemented on a smartphone. Objective and subjective test results show the performance of the proposed method relative to fixed-speed compression approaches.
1. INTRODUCTION
A dynamic range compressor (DRC) is a variable-gain amplifier in which loud sounds above a predefined threshold level are compressed and softer sounds are given adequate gain, in different frequency bands. This operation can be viewed as loudness equalization: it reduces the ratio between the highest and lowest intensities in every frequency band of a signal. DRC is commonly used in music production, sound reinforcement systems, hearing aids, amplifiers and broadcasting.1 In this paper, we focus on DRC in the hearing aid application. At present, hearing aids are the most common solution for sensorineural hearing loss.2 A hearing-impaired (HI) person has difficulty hearing or understanding sounds that fall below their hearing threshold, while loud sounds can still become uncomfortable. This reduced range of audibility is known as the residual dynamic range.3 Hearing aids employ DRC to make optimal use of the residual dynamic range of the HI person. Because DRC is responsible for mapping the wide range of input sounds into this residual dynamic range, it is a crucial component of a hearing aid. A DRC is defined by parameters such as compression ratio, compression threshold, attack time, release time, gain and knee width. These parameters are chosen based on an individual's listening preferences and hearing threshold levels, which are usually obtained from the individual's audiogram. Setting these parameters on a hearing aid is called "hearing aid fitting". Because each individual's needs and listening preferences differ, hearing aid fitting can be a fairly involved process.4
A well-fitted hearing aid aims to provide a near-normal listening experience and better speech recognition for people with hearing loss. The purpose of DRC is to provide prescribed, frequency-dependent amplification of the input signal. After amplification, the DRC compresses (reduces the gain of) sounds that exceed the preset threshold level in any frequency band, because HI listeners may have trouble hearing or understanding soft sounds while their perception of loud sounds is often similar to that of a normal-hearing person. The DRC therefore produces an output that is audible to the HI person without being distorted or exceeding the loudness discomfort level. Because DRC is a non-linear operation, care is needed to maintain the quality and intelligibility of the input signal. The gain to be inserted and the compression ratio are decided based on the input level and the hearing threshold of the HI person. A compressor that divides the input level range into multiple regions by using multiple threshold values is referred to as a wide dynamic range compressor (WDRC);5,6 a hearing aid with WDRC applies a different compression ratio above each threshold level. The hearing thresholds of a HI person are also not identical across the frequency spectrum, so the gain applied in each frequency band of the input signal needs to be different. A compressor that divides the input signal into multiple bands is referred to as a multi-band dynamic range (MBDR) compressor.7 In other words, the amplification gain depends on the input level and the hearing threshold in each frequency band. These gains are determined by a hearing aid fitting strategy; audiologists use popular strategies such as Desired Sensation Level (DSL), NAL-NL2 and the half-gain rule for gain computation.8,9
Parameters such as compression threshold, compression ratio and gain ensure the audibility and comfort of sound, whereas the attack and release times of the compressor play a vital role in maintaining the intelligibility and quality of the input signal.10,11 When the input level changes abruptly, the compressor takes time to adjust its gain, and sudden, abrupt gain changes can create temporal artifacts and an unpleasant listening experience. In designing a DRC for hearing aids, the attack and release times depend primarily on user preference and on the environment in which the hearing aid is used. Attack time is generally kept short (5 to 30 ms) so that the gain is reduced soon after the signal overshoots the threshold level. Release time, however, has an important influence on listener preference, because the choice of a shorter or longer release time determines the speed of the compressor. A fast-acting compressor has a short release time (50 to 100 ms) and a slow-acting compressor has a long release time (800 ms to 2 s). Fast-acting compression preserves the audible cues in quiet speech, but in the presence of surrounding noise it can degrade sound quality by introducing pumping and breathing effects.7,12 Alternatively, slow-acting compression maintains the temporal cues and listening comfort but may provide inadequate gain for soft inputs that come immediately after loud inputs.13 Therefore, a fixed compression speed might degrade hearing aid performance. Some compression designs incorporate an adaptive or variable release time, that is, a release time that changes with the duration or nature of the input or with the surrounding sound environment.14-16
In this study, we propose an FFT-based multi-band adaptive dynamic range compressor. The input signal is divided into nine unequal bands in the frequency domain, with center frequencies of 0.25, 0.5, 1, 1.5, 2, 3, 4, 6 and 8 kHz, matching those of a typical audiogram. The gain for each band, given the user's hearing threshold and the input level, is calculated with the DSL-v5 strategy, while the release time is adapted based on the spectral flux of the input signal, which serves as a measure of signal transients. Our specific focus is the effect of adaptive time constants on compression performance. Fast-acting, slow-acting and the proposed adaptive compression are compared using objective measures and a subjective evaluation with normal-hearing participants, to assess the quality and intelligibility of the output signal. Additionally, the proposed method is implemented on a smartphone for real-time use. The ubiquitous smartphone, as a cost-effective standalone platform that can carry all the signal processing functions of a hearing aid, is a viable assistive listening device.17-19
The rest of the paper is organized as follows. Section 2 gives background information on compression parameters and the effect of release time. The proposed method is described in Section 3. Section 4 briefly describes the real-time implementation on a smartphone running the iOS operating system. Section 5 compares the performance of different compression speeds, and Section 6 concludes the paper.
2. BACKGROUND AND MOTIVATION
Compression is typically applied using a side-chain in a feed-forward topology,14 as shown in Figure 3. Careful parameter tuning is important for optimal DRC performance in hearing aids. The compression parameters and their effects in hearing aid use are discussed below.
A. COMPRESSOR PARAMETERS
Compression Threshold (T): The level above which compression (gain reduction) is applied. T is typically set around 65 dB SPL. A WDRC scheme can have multiple threshold points, with a different compression ratio above each threshold level.
Compression Ratio (R): Determines the amount of gain reduction applied once the input signal crosses the threshold: above T, an increase of R dB at the input produces only a 1 dB increase at the output. For hearing aid applications, R is generally kept below 4; high values of R can make the sound unpleasant and unclear.20 The output-input relation (in dB) can be given by
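$$ y = \begin{cases} x, & x \le T \\ T + \dfrac{x - T}{R}, & x > T \end{cases} \qquad (1) $$
where x and y are the input and output levels in dB and T is the compression threshold.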
Knee Width (W): The knee is the point at the threshold level where the input-output curve changes slope. This transition can be smoothed so that the onset of compression is less noticeable; the knee width is the width (in dB) of this smooth transition region.
Attack Time (τA): The time taken, after the input signal rises above the threshold level, for the output to settle within 3 dB of its final compressed value. Attack time is kept short (5 to 20 ms) so that the user is not exposed for long to sounds above the discomfort level. Figure 1a shows the impact of different attack times on the signal.
Release Time (τR): The time taken, after the input signal falls below the threshold level, for the output to recover to within 4 dB of its uncompressed value. Release time typically ranges from 50 ms to 2 s; its effects are discussed further in the next subsection. Figure 1b shows the impact of different release times on the signal.
Makeup Gain (M): The compressor equalizes loudness across the signal by compressing loud sounds, which lowers the overall level. Makeup gain acts as a volume control and is generally set to raise the overall volume of the signal.
Number of Bands: Since the degree of hearing loss is not the same across the frequency spectrum, different amounts of gain must be provided in different frequency bands. The number of bands in modern digital hearing aids varies from 3 to 16. Here it is set to nine, because the audiogram provides hearing thresholds at nine frequencies: 0.25, 0.5, 1, 1.5, 2, 3, 4, 6 and 8 kHz, the standard octave and inter-octave audiometric frequencies. We use these nine bands as band center frequencies throughout this work.
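As a worked example of how T, R and M interact, the sketch below evaluates the hard-knee static curve of Eq. (1) (output equals input below T, and T + (x − T)/R above it) for the values later used in Section 5 (T = 65 dB SPL, R = 2, M = 0 dB). It is a minimal broadband sketch under stated simplifications: it ignores the knee width W and any per-band gain, and the function name is hypothetical.

```python
def static_curve_db(x_db: float, threshold_db: float = 65.0, ratio: float = 2.0,
                    makeup_db: float = 0.0) -> float:
    """Hard-knee static input-output curve (levels in dB), plus make-up gain M."""
    if x_db <= threshold_db:
        y_db = x_db  # below threshold: unity slope, no gain reduction
    else:
        y_db = threshold_db + (x_db - threshold_db) / ratio  # above threshold: slope 1/R
    return y_db + makeup_db


if __name__ == "__main__":
    for level in (50.0, 65.0, 85.0, 105.0):
        out = static_curve_db(level)
        print(f"in {level:5.1f} dB SPL -> out {out:5.1f} dB SPL, "
              f"gain reduction {level - out:4.1f} dB")
```

With these settings a 105 dB SPL input leaves the curve at 85 dB SPL: every 2 dB above threshold becomes 1 dB at the output, while levels at or below 65 dB SPL pass unchanged.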
B. EFFECT OF COMPRESSOR SPEED
The speed of compression has a strong impact on the quality and intelligibility of the output signal.10 How quickly the gain changes is critically important in determining the effectiveness of the compressor and its associated artifacts. Attack and release times define how quickly the compressor reacts once the signal crosses the threshold, and the speed of the compressor is defined mainly by the release time. Figure 2 shows the effects of attack time and release time on an analytical test signal. Attack time is usually kept short to suppress the signal immediately after it rises above the compression threshold. Based on the release time, compressors can be divided into two broad classes: the "fast-acting compressor" (FAC) and the "slow-acting compressor" (SAC). A typical FAC has a release time of around 50 to 100 ms, and a typical SAC around 800 ms to 2 s. Many studies have discussed the advantages and disadvantages of fast-acting and slow-acting compressors.10,21-23
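To make the 3 dB / 4 dB definitions of attack and release time concrete, the sketch below simulates a first-order (one-pole) gain smoother on a 20 dB gain step and measures how long the smoothed gain takes to settle within 3 dB (attack) or 4 dB (release) of its target. The one-pole smoother updated at a 16 kHz rate is an assumption for illustration, not the paper's smoothing filter. For such a smoother the measured time is about τ·ln(ΔG/tolerance), so it differs from the time constant τ itself.

```python
import math


def one_pole_coeff(tau_s: float, update_rate_hz: float) -> float:
    """Coefficient of a first-order smoother with time constant tau_s (seconds)."""
    return math.exp(-1.0 / (tau_s * update_rate_hz))


def settling_time_s(start_db: float, target_db: float, tol_db: float,
                    tau_s: float, fs: float = 16000.0) -> float:
    """Time (s) until a one-pole smoother driven from start_db toward target_db
    gets within tol_db of the target (e.g. 3 dB for attack, 4 dB for release)."""
    alpha = one_pole_coeff(tau_s, fs)
    g = start_db
    n = 0
    while abs(g - target_db) > tol_db:
        g = alpha * g + (1.0 - alpha) * target_db  # exponential approach, never overshoots
        n += 1
    return n / fs


if __name__ == "__main__":
    # Attack: 20 dB gain reduction with a 10 ms time constant, 3 dB criterion.
    print("attack  tau=10 ms :", round(1000 * settling_time_s(0.0, -20.0, 3.0, 0.010), 1), "ms")
    # Release: recovery from 20 dB gain reduction, 4 dB criterion, fast vs slow.
    for tau_r in (0.100, 0.800):
        t = settling_time_s(-20.0, 0.0, 4.0, tau_r)
        print(f"release tau={tau_r * 1000:.0f} ms:", round(1000 * t, 1), "ms")
```

Under this assumption, a 10 ms attack time constant yields a measured attack time of roughly 19 ms, and release time constants of 100 ms and 800 ms yield roughly 0.16 s and 1.3 s of recovery after a 20 dB gain reduction.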
FAC aims to make the hearing-impaired person's perception of loudness more like that of a normal listener. A system with short recovery times (5 to 200 ms) is also called a "syllabic compressor", because its gain changes on a time scale comparable to the duration of individual syllables in speech. FAC with a high compression ratio reduces temporal cues and is problematic for people with cognitive issues.15 As a result, high compression ratios (> 3) are avoided in these systems, as they have been shown to have adverse effects on speech intelligibility.23,24 Another drawback is that ambient noise becomes more perceptible during pauses. If the gain changes significantly during speech or during pauses, objectionable "breathing" and "pumping" noises may be heard. Therefore, at low SNR, slow-acting compression is preferred.16 Spatial cues, which are important for locating sound sources, are also reduced. Another study suggests that people with lower cognitive ability may have more difficulty with fast-acting compression.25 To reduce these problems, the release time should be longer.
The fitting philosophy for SAC is to deliver comfortable loudness and a natural overall sound in the environment. SAC preserves the temporal structure of the signal, which FAC does not. One disadvantage is that loudness perception is not restored to "normal". Figure 2d shows that the gain drops to a low value right after an intense sound, so the hearing aid effectively goes "dead" for a while. This can make weak sounds that rapidly follow intense sounds inaudible.21,22 In contrast, Figure 2c shows that a fast-acting compressor restores this audibility. Another disadvantage of SAC is that when sound comes from multiple sources with different intensities, the lower-intensity source can be difficult to understand.
Audiologists fit hearing aids for users at the clinic. Currently available hearing aids do not necessarily fall clearly into the fast or slow category; the release time can also be set to an intermediate value. However, any fixed time constant may not give optimal performance across listening situations. A variable release time compressor employs fast-acting and slow-acting compressors in series to vary the release time.25 An SNR-aware compression strategy has also been proposed that selects between FAC and SAC based on the SNR of the input signal.16 In our proposed method, spectral flux is used to track the transient nature of the incoming signal and to set the release time of the compressor automatically. The following section presents the details of the proposed method.
3. PROPOSED METHOD
This section describes the proposed frequency-based multi-band adaptive dynamic range compression. The input signal is processed in 20 ms frames with 50% overlap and a Hamming window. Let N be the number of samples in one frame and Fs the sampling rate. The windowed input frame is transformed into a K-point single-sided frequency spectrum using the FFT. Figure 3 shows the block diagram of the proposed approach. Converting from the linear domain to the logarithmic (dB) domain yields
$$ X_L(n,k) = 20\log_{10}\lvert X(n,k)\rvert \qquad (2) $$
where X(n,k) is the FFT of the windowed input, k denotes the frequency bin and n the frame index. Equation (2) represents the level (energy in dB) of each frequency bin, which can be categorized as soft, moderate or loud. The K bins are grouped into nine bands with center frequencies of 0.25, 0.5, 1, 1.5, 2, 3, 4, 6 and 8 kHz. Each of the nine bands can then be amplified based on the user's hearing threshold and the input level in that band. The gain A(k) for each band is calculated following the DSL-v5 hearing aid fitting procedure. Gain insertion can be written as follows:
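$$ X_G(n,k) = X_L(n,k) + A(k) \qquad (3) $$
All bins in the same band receive the same gain. After gain insertion, the signal passes through the compression function, which keeps the amplified signal within a comfortable loudness range. This operation can be applied in the frequency domain as
$$ F = \begin{cases} X_G, & 2(X_G - T) < -W \\ X_G + \left(\dfrac{1}{R} - 1\right)\dfrac{\left(X_G - T + W/2\right)^2}{2W}, & 2\lvert X_G - T\rvert \le W \\ T + \dfrac{X_G - T}{R}, & 2(X_G - T) > W \end{cases} \qquad (4) $$
where XG is the amplified level in dB, and T, R and W are the compression threshold, compression ratio and knee width, respectively. The notation (n, k) is dropped for simplicity.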
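The per-bin processing of gain insertion and compression can be sketched as follows. Several details are assumptions, not taken from the paper: the band edges are placed at the geometric mean of adjacent center frequencies (the paper does not list them), fs = 16 kHz with a hypothetical 512-point FFT, the per-band insertion gains A are passed in directly instead of being computed with DSL-v5, and the compression function uses the common quadratic soft-knee curve. The returned value is the total gain W_G = F − X_L consumed by the smoothing stage.

```python
import math

CENTERS_HZ = [250, 500, 1000, 1500, 2000, 3000, 4000, 6000, 8000]


def band_edges(centers, fs):
    """Hypothetical band edges: geometric mean of neighbouring centres, plus 0 Hz and Nyquist."""
    inner = [math.sqrt(lo * hi) for lo, hi in zip(centers[:-1], centers[1:])]
    return [0.0] + inner + [fs / 2.0]


def bin_to_band(n_fft, fs, centers=CENTERS_HZ):
    """Band index (0..8) for each single-sided FFT bin k = 0..n_fft/2."""
    edges = band_edges(centers, fs)
    mapping = []
    for k in range(n_fft // 2 + 1):
        f = k * fs / n_fft
        b = 0
        while b < len(centers) - 1 and f >= edges[b + 1]:
            b += 1
        mapping.append(b)
    return mapping


def soft_knee_db(x_db, T, R, W):
    """Static compression curve with a quadratic soft knee of width W dB."""
    if 2.0 * (x_db - T) < -W:
        return x_db  # below the knee: no compression
    if W > 0 and 2.0 * abs(x_db - T) <= W:
        return x_db + (1.0 / R - 1.0) * (x_db - T + W / 2.0) ** 2 / (2.0 * W)  # inside the knee
    return T + (x_db - T) / R  # above the knee: slope 1/R


def total_gain_db(x_l_db, band, insertion_gain_db, T=65.0, R=2.0, W=20.0):
    """W_G = F - X_L for one bin: add the band's insertion gain, then compress."""
    x_g = x_l_db + insertion_gain_db[band]  # gain insertion
    f = soft_knee_db(x_g, T, R, W)          # compression function
    return f - x_l_db


if __name__ == "__main__":
    fs, n_fft = 16000, 512                          # hypothetical FFT size
    bands = bin_to_band(n_fft, fs)
    gains = [10, 10, 15, 20, 20, 25, 25, 30, 30]    # hypothetical insertion gains A (dB)
    k = 32                                          # bin at 1000 Hz
    print(bands[k], total_gain_db(40.0, bands[k], gains), total_gain_db(70.0, bands[k], gains))
```

In this example a soft 40 dB bin in the 1 kHz band receives its full 15 dB insertion gain, whereas a 70 dB bin in the same band is amplified to 85 dB and then compressed, so its net gain falls to 5 dB.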
After the compression function, gain smoothing is applied. Smoothing is necessary because sudden gain changes can create audible artifacts; this is the stage at which the attack and release times come into play. Gain smoothing is applied as
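$$ G_S(n) = \begin{cases} \alpha_A\, G_S(n-1) + (1-\alpha_A)\, W_G(n), & W_G(n) < G_S(n-1) \\ \alpha_R\, G_S(n-1) + (1-\alpha_R)\, W_G(n), & \text{otherwise} \end{cases} \qquad (5) $$
where WG = F − XL is the gain (in dB) produced by the gain insertion and compression stages, GS is the smoothed gain, and αA and αR are the attack and release smoothing coefficients, derived from the attack time τA and release time τR (given in milliseconds).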
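A minimal sketch of attack/release gain smoothing for a single bin is shown below. It assumes the smoother is updated once per 10 ms hop (20 ms frames with 50% overlap) and that each coefficient is α = exp(−hop/τ); the paper does not spell out its coefficient formulas here, so this mapping is an assumption. A falling gain (the input got louder) uses the attack coefficient and a rising gain uses the release coefficient. Comparing τR = 100 ms with 800 ms illustrates the slow compressor's gain remaining depressed after a loud segment, the effect described for Figure 2d.

```python
import math


def smoothing_coeff(tau_ms: float, hop_ms: float = 10.0) -> float:
    """One-pole coefficient for a smoother updated once per frame hop (assumed exp(-hop/tau))."""
    return math.exp(-hop_ms / tau_ms)


def smooth_gain(w_g_db, tau_a_ms=10.0, tau_r_ms=100.0, hop_ms=10.0, g0_db=0.0):
    """Attack/release smoothing of a per-frame gain track W_G (dB) for one bin or band."""
    a_att = smoothing_coeff(tau_a_ms, hop_ms)
    a_rel = smoothing_coeff(tau_r_ms, hop_ms)
    g = g0_db
    smoothed = []
    for target in w_g_db:
        alpha = a_att if target < g else a_rel  # falling gain -> attack, rising gain -> release
        g = alpha * g + (1.0 - alpha) * target
        smoothed.append(g)
    return smoothed


if __name__ == "__main__":
    # 100 ms of loud input needing 20 dB gain reduction, then soft input needing none.
    w_g = [-20.0] * 10 + [0.0] * 30
    fast = smooth_gain(w_g, tau_r_ms=100.0)
    slow = smooth_gain(w_g, tau_r_ms=800.0)
    # Smoothed gain 100 ms after the loud segment ends (frame index 19).
    print(f"fast: {fast[19]:.1f} dB, slow: {slow[19]:.1f} dB")
```

One hundred milliseconds after the loud segment, the fast setting has recovered to roughly −7.4 dB while the slow setting is still near −17.6 dB, so a soft sound arriving then would receive much less gain from the slow compressor.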
Additionally, we adapt the release time based on the spectral flux: the more transient the signal, the higher its spectral flux and the shorter the release time needed. The release time is therefore updated as
(6) |
where τRmin is the minimum desired release time, SFsmooth is the smoothed version of the spectral flux SF(n), γR is a constant controlling how strongly the release time depends on the spectral flux, and ϵ is a small constant that prevents the denominator from becoming zero. The maximum value of τR can be limited to 2 s.
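The exact forms of the release-time update and of the spectral flux and its smoothing (Eqs. (6) to (8)) are not assumed here; the sketch below uses stand-in forms that only follow the behaviour described in the text. It uses a half-wave-rectified spectral flux (a common definition), one-pole smoothing of the flux with a hypothetical factor β, and a hypothetical mapping τR = τRmin + γR/(SFsmooth + ε) clamped to a maximum, so that the release time shrinks toward τRmin as the flux grows. The limits of 100 ms and 800 ms mirror the fast and slow release times used in Section 5; γR = 50 and β = 0.9 are arbitrary illustration values.

```python
def spectral_flux(prev_mag, cur_mag):
    """Half-wave-rectified spectral flux: sum of magnitude increases between frames (assumed form)."""
    return sum(max(c - p, 0.0) for p, c in zip(prev_mag, cur_mag))


def release_time_ms(sf_smooth, tau_r_min_ms=100.0, tau_r_max_ms=800.0,
                    gamma_r=50.0, eps=1e-6):
    """Hypothetical mapping: large flux -> release near tau_r_min, stationary input -> tau_r_max."""
    tau = tau_r_min_ms + gamma_r / (sf_smooth + eps)  # eps keeps the denominator non-zero
    return min(tau, tau_r_max_ms)


def release_track(mag_frames, beta=0.9, **mapping_kwargs):
    """Per-frame release times (ms) from a sequence of magnitude spectra."""
    sf_smooth = 0.0
    prev = mag_frames[0]
    taus = []
    for cur in mag_frames:
        sf = spectral_flux(prev, cur)
        sf_smooth = beta * sf_smooth + (1.0 - beta) * sf  # one-pole smoothing of SF(n)
        taus.append(release_time_ms(sf_smooth, **mapping_kwargs))
        prev = cur
    return taus


if __name__ == "__main__":
    steady = [[1.0, 1.0, 1.0]] * 5  # stationary input: zero flux
    onset = [[5.0, 5.0, 5.0]]       # sudden level increase in every bin
    print([round(t, 1) for t in release_track(steady + onset)])
```

For the stationary frames the release time sits at the 800 ms cap; the onset frame raises the smoothed flux and pulls the release time down to about 142 ms.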
Figure 4 illustrates the adaptation of the release time to the input signal and the resulting compression. SF(n), the spectral flux at the nth frame, is given by
(7) |
(8) |
Because the spectral flux can change sharply from frame to frame, it is smoothed so that the release-time adaptation is not noticeable to the user; Equation (8) gives the smoothed value SFsmooth(n). After the smoothing stage of the compressor, the frequency-domain output frame Y(n) is generated by shaping the frequency-domain input frame with a gain factor computed in the side chain, as shown in Figure 3. Thus,
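$$ Y(n,k) = X(n,k)\cdot 10^{\left(G_S(n,k) + M\right)/20} \qquad (9) $$
where GS(n,k) is the smoothed gain in dB from Eq. (5) and M is the make-up gain, which controls the overall volume. The time-domain output frame y(n) is obtained from Y(n,k) via the inverse short-time Fourier transform (STFT).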
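The output synthesis step can be sketched in pure Python, with a small naive DFT standing in for the FFT. The sketch assumes a periodic Hamming window applied at analysis only: at 50% overlap such windows sum to the constant 1.08, so overlap-added frames are divided by 1.08 to restore the original scale. For brevity the per-bin gain is a fixed function of the bin index, whereas in the proposed method the smoothed gain changes from frame to frame; all function names are hypothetical.

```python
import cmath
import math


def periodic_hamming(n):
    """Periodic Hamming window; copies shifted by n/2 sum to exactly 1.08."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def process(x, frame_len, gain_db_of_bin, makeup_db=0.0):
    """Analysis window -> DFT -> Y = X * 10^((G + M)/20) -> inverse DFT -> overlap-add."""
    hop = frame_len // 2
    win = periodic_hamming(frame_len)
    y = [0.0] * len(x)
    for start in range(0, len(x) - frame_len + 1, hop):
        spec = dft([x[start + i] * win[i] for i in range(frame_len)])
        out_spec = []
        for k, xk in enumerate(spec):
            kk = min(k, frame_len - k)  # mirror bins so the gain stays conjugate-symmetric
            out_spec.append(xk * 10.0 ** ((gain_db_of_bin(kk) + makeup_db) / 20.0))
        for i, v in enumerate(idft(out_spec)):
            y[start + i] += v / 1.08  # undo the constant window-overlap sum
    return y


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
    y = process(x, 16, lambda k: 0.0)  # 0 dB everywhere: interior samples are unchanged
    print(max(abs(a - b) for a, b in zip(x[8:56], y[8:56])))
```

With a 0 dB gain the interior samples (those covered by two overlapping frames) are reconstructed to within rounding error; the first and last half-frames are covered by only one window and are not.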
4. SMARTPHONE IMPLEMENTATION
The proposed method was implemented in real time on an iPhone running iOS 13. The bottom microphone captures the incoming signal. The real-time capture and playback framework was developed in C++ in the Xcode IDE, using the Core Audio library. All processing is frame-based, with 20 ms input frames at a 16 kHz sampling rate, and each output frame is computed with the method described in Section 3. The output can be played back to the HI person's ear through wireless earbuds (AirPods) or wired earphones. Figure 5 shows the graphical user interface of the application, which allows the user to enter hearing thresholds (audiogram data) for each ear and to select the fitting strategy for gain calculation. The user can also select the compression speed or set it to a specific value. This easy-to-use interface gives users the freedom to tune their compression parameters for the best listening experience.
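Real-time audio callbacks generally deliver buffers whose size need not match the analysis frame. A minimal sketch of buffering such callbacks into 20 ms frames with a 10 ms hop at 16 kHz (320-sample frames, 160-sample hop) is shown below, in Python for consistency with the other examples. The class name and the 256-sample callback size are hypothetical, and this is not the Core Audio code used in the application.

```python
from collections import deque
from itertools import islice


class FrameAssembler:
    """Collects variable-size input buffers and emits overlapping analysis frames."""

    def __init__(self, fs=16000, frame_ms=20, overlap=0.5):
        self.frame_len = int(fs * frame_ms / 1000)        # 320 samples at 16 kHz
        self.hop = int(self.frame_len * (1.0 - overlap))  # 160 samples = 10 ms
        self.buf = deque()

    def push(self, samples):
        """Append newly captured samples; return every complete frame now available."""
        self.buf.extend(samples)
        frames = []
        while len(self.buf) >= self.frame_len:
            frames.append(list(islice(self.buf, self.frame_len)))
            for _ in range(self.hop):  # advance by one hop
                self.buf.popleft()
        return frames


if __name__ == "__main__":
    fa = FrameAssembler()
    signal = list(range(512))
    for chunk_start in range(0, len(signal), 256):  # hypothetical 256-sample callbacks
        frames = fa.push(signal[chunk_start:chunk_start + 256])
        print(len(frames), [f[0] for f in frames])
```

The first 256-sample callback produces no frame; the second completes two overlapping frames starting at samples 0 and 160, leaving the remainder buffered for the next callback.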
5. EXPERIMENT AND EVALUATION
A clean speech signal was generated from the HINT database by concatenating six sentences selected for their differing intensities, as shown in Figure 4. The compression parameters were T = 65 dB SPL, R = 2, W = 20 dB, M = 0 dB and τA = 10 ms. Three cases were compared in the objective and subjective evaluations: fast-acting, slow-acting and adaptive compression. All three were frequency-based multi-band compressors with band center frequencies of 0.25, 0.5, 1, 1.5, 2, 3, 4, 6 and 8 kHz; only the release time differed. The fast-acting and slow-acting compressors used release times of 100 ms and 800 ms, respectively, while for adaptive compression the release time varied between these two values according to (6). Figure 6 compares the objective measures. The perceptual evaluation of speech quality (PESQ) and the hearing aid speech quality index (HASQI) were used to measure speech quality, and the coherence speech intelligibility index (CSII) and the hearing aid speech perception index (HASPI) were used to measure speech intelligibility. Adaptive compression achieved the best performance on all of these measures. For the subjective evaluation, 10 normal-hearing participants rated the three methods with a mean opinion score (MOS) from 1 to 5, 1 being the poorest and 5 excellent quality. Figure 7a presents the MOS when compression was applied to clean speech, and Figure 7b presents the MOS when machinery noise was mixed with the clean speech at 10 dB SNR before compression. The fast-acting compressor performed worse in the presence of background noise.
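The paper does not state how the 10 dB SNR mixture was produced. A common approach, assumed here, scales the noise so that the ratio of mean speech power to mean noise power over the whole utterance equals the target SNR. The minimal sketch below uses synthetic signals as stand-ins for the HINT speech and the machinery noise; the function name is hypothetical.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` (same length as `speech`) so that 10*log10(P_speech / P_noise) = snr_db,
    and return the mixture together with the scale factor that was applied."""
    if len(noise) != len(speech):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(v * v for v in noise) / len(noise)
    scale = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + scale * v for s, v in zip(speech, noise)], scale


if __name__ == "__main__":
    rng = random.Random(0)
    speech = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]  # stand-in "speech"
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]                        # stand-in noise
    mixed, scale = mix_at_snr(speech, noise, 10.0)
    p_s = sum(s * s for s in speech) / len(speech)
    p_n = sum((scale * v) ** 2 for v in noise) / len(noise)
    print(f"achieved SNR: {10 * math.log10(p_s / p_n):.2f} dB")
```

A global power ratio is the simplest choice; other conventions, such as measuring speech power only over active speech segments, would give a different noise scale for the same nominal SNR.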
6. CONCLUSION
A frequency-based multi-band adaptive dynamic range compression technique was proposed. The input signal was transformed into the frequency domain and divided into nine bands. The gain for each band was calculated from the input intensity level and the hearing threshold of a hearing-impaired person using the DSL-v5 fitting strategy. The effect of release time on compressor performance was examined, and, to enhance performance, the release time was adapted based on spectral flux. The proposed compressor was implemented in real time on a smartphone (iPhone). The application lets users enter their hearing thresholds, set the compression parameters, and change the compression speed and fitting strategy according to their listening preferences. Objective and subjective evaluations showed a performance improvement with the proposed adaptive compression method.
Supplementary Material
ACKNOWLEDGMENTS
This work was supported by the National Institute on Deafness and Other Communication Disorders (NIDCD) of the National Institutes of Health (NIH) under Award 5R01DC015430-04. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
REFERENCES
1. Reese David, Gross Lynne, and Gross Brian. Audio production worktext: concepts, techniques, and equipment. Routledge, 2012.
2. National Institute on Deafness and Other Communication Disorders (NIDCD). https://www.nidcd.nih.gov/health/statistics/quick-statistics-hearing. Accessed: 2020-01-03.
3. Banerjee Shilpi. The compression handbook, 2011.
4. National Research Council (US). Hearing loss: determining eligibility for social security benefits. 2004.
5. Kuk Francis K. Theoretical and practical considerations in compression hearing aids. Trends in Amplification, 1(1):5–39, 1996.
6. Davies-Venn Evelyn, Souza Pamela, Brennan Marc, and Stecker G Christopher. Effects of audibility and multichannel wide dynamic range compression on consonant recognition for listeners with severe hearing loss. Ear and Hearing, 30(5):494, 2009.
7. Schneider Todd and Brennan Robert. A multichannel compression strategy for a digital hearing aid. In 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 411–414. IEEE, 1997.
8. Polonenko Melissa J, Scollie Susan D, Moodie Sheila, Seewald Richard C, Laurnagaray Diana, Shantz Juliane, and Richards Andrea. Fit to targets, preferred listening levels, and self-reported outcomes for the DSL v5.0a hearing aid prescription for adults. International Journal of Audiology, 49(8):550–560, 2010.
9. Keidser Gitte, Dillon Harvey, Flax Matthew, Ching Teresa, and Brewer Scott. The NAL-NL2 prescription procedure. Audiology Research, 1(1), 2011.
10. Moore Brian CJ. The choice of compression speed in hearing aids: Theoretical and practical considerations and the role of individual differences. Trends in Amplification, 12(2):103–112, 2008.
11. Rallapalli Varsha H and Alexander Joshua M. Effects of noise and reverberation on speech recognition with variants of a multichannel adaptive dynamic range compression scheme. International Journal of Audiology, pages 1–9, 2019.
12. Wiinberg Alan, Jepsen Morten Løve, Epp Bastian, and Dau Torsten. Effects of hearing loss and fast-acting compression on amplitude modulation perception and speech intelligibility. Ear and Hearing, 40(1):45–54, 2019.
13. Stone Michael A and Moore Brian CJ. Side effects of fast-acting dynamic range compression that affect intelligibility in a competing speech task. The Journal of the Acoustical Society of America, 116(4):2311–2323, 2004.
14. Giannoulis Dimitrios, Massberg Michael, and Reiss Joshua D. Parameter automation in a dynamic range compressor. Journal of the Audio Engineering Society, 61(10):716–726, 2013.
15. Kuk Francis, Schmidt Erik, Jessen Anders Holm, and Sonne M. New technology for effortless hearing: A "unique" perspective. Hearing Review, 22(11):32–36, 2015.
16. May Tobias, Kowalewski Borys, and Dau Torsten. Signal-to-noise-ratio-aware dynamic range compression in hearing aids. Trends in Hearing, 22:2331216518790903, 2018.
17. Hao Yiya, Zou Ziyan, and Panahi Issa M. A robust smartphone based multi-channel dynamic-range audio compressor for hearing aids. The Journal of the Acoustical Society of America, 143(3):1961–1961, 2018.
18. Küçük Abdullah, Ganguly Anshuman, Hao Yiya, and Panahi Issa MS. Real-time convolutional neural network-based speech source localization on smartphone. IEEE Access, 7:169969–169978, 2019.
19. Bhat Gautam S, Shankar Nikhil, Reddy Chandan KA, and Panahi Issa MS. A real-time convolutional neural network based speech enhancement for hearing impaired listeners using smartphone. IEEE Access, 7:78421–78433, 2019.
20. Boike Kumiko T and Souza Pamela E. Effect of compression ratio on speech recognition and speech-quality ratings with wide dynamic range compression amplification. Journal of Speech, Language, and Hearing Research, 43(2):456–468, 2000.
21. Cox Robyn M and Xu Jingjing. Short and long compression release times: speech understanding, real-world preferences, and association with cognitive ability. Journal of the American Academy of Audiology, 21(2):121–138, 2010.
22. Pittman Andrea L, Pederson Ashley J, and Rash Madalyn A. Effects of fast, slow, and adaptive amplitude compression on children's and adults' perception of meaningful acoustic information. Journal of the American Academy of Audiology, 25(9):834–847, 2014.
23. Rosengard Peninah S, Payton Karen L, and Braida Louis D. Effect of slow-acting wide dynamic range compression on measures of intelligibility and ratings of speech quality in simulated-loss listeners. Journal of Speech, Language, and Hearing Research, 2005.
24. Souza Pamela E, Jenstad Lorienne M, and Boike Kumiko T. Measuring the acoustic effects of compression amplification on speech in noise. The Journal of the Acoustical Society of America, 119(1):41–44, 2006.
25. Kuk Francis, Slugocki Chris, Korhonen Petri, Seper Eric, and Hau Ole. Evaluation of the efficacy of a dual variable speed compressor over a single fixed speed compressor. Journal of the American Academy of Audiology, 2019.