Abstract
OBJECTIVE
Cochlear implant (CI) signal processing degrades the spectral components of speech. This requires CI users to rely primarily on temporal cues, specifically amplitude modulations within the temporal envelope, to recognize speech. Auditory temporal processing ability for envelope modulations worsens with advancing age, which may put older CI users at a disadvantage compared to younger users. In order to evaluate how potential age-related limitations in processing temporal envelope modulations impact spectrally degraded sentence recognition, noise-vocoded sentences were presented to younger and older normal-hearing listeners in quiet. Envelope modulation rates were varied from 10 to 500 Hz by adjusting the low-pass filter cut-off frequency (LPF). The goal of this study was to evaluate whether age impacts recognition of noise-vocoded speech and whether any age-related limitation exists for a specific range of envelope modulation rates.
DESIGN
Noise-vocoded sentence recognition in quiet was measured as a function of the number of spectral channels (4, 6, 8, and 12 channels) and LPF (10, 20, 50, 75, 150, 375, and 500 Hz) in 15 younger normal-hearing listeners and 15 older near-normal-hearing listeners. Hearing thresholds and working memory were assessed to determine the extent to which these factors were related to recognition of noise-vocoded sentences.
RESULTS
Younger listeners achieved significantly higher sentence recognition scores than older listeners overall. Performance improved in both groups as the number of spectral channels and LPF increased. As the number of spectral channels increased, the differences in sentence recognition scores between groups decreased. A spectral-temporal trade-off was observed in both groups in which performance in the 8- and 12-channel conditions plateaued with lower-frequency amplitude modulations compared to the 4- and 6-channel conditions. There was no interaction between age group and LPF, suggesting that both groups obtained similar improvements in performance with increasing LPF. The lack of an interaction between age and LPF may be due to the nature of the task of recognizing sentences in quiet. Audiometric thresholds were the only significant predictor of vocoded sentence recognition. Although performance on the working memory task declined with advancing age, working memory scores did not predict sentence recognition.
CONCLUSION
Younger listeners outperformed older listeners for recognizing noise-vocoded sentences in quiet. The negative impact of age was reduced when ample spectral information was available. Age-related limitations for recognizing vocoded sentences were not affected by the temporal envelope modulation rate of the signal, but instead appear to be related to a generalized task limitation or to reduced audibility of the signal.
INTRODUCTION
For many individuals with relatively severe degrees of hearing loss, traditional hearing aids do not provide sufficient speech recognition ability. In these cases, cochlear implants (CIs) can partially restore hearing ability and provide superior speech recognition benefits compared to hearing aids. CIs are considered a safe and effective treatment for hearing loss for adults of all ages (Luntz et al. 2015), but it is unclear if older CI users ultimately achieve comparable post-implantation performance to younger CI users (Blamey et al. 2013; Sladen and Zappler 2015). The issue of potential age-related CI performance limitations is of increasing importance as the population of CI candidates over the age of 65 years continues to grow each year (Lin et al. 2012).
CIs are considered the most successful neural prosthetic device to date (Wilson and Dorman 2008), with many post-lingually deafened adult users obtaining near-perfect open-set speech recognition scores in quiet (Gifford et al. 2008; Holden et al. 2013). Significant improvements in post-implantation speech recognition in noise are also noted for CI users, but performance in noise remains relatively poor compared to normal-hearing (NH) listeners (Fu and Nogaki 2005). The signals received by CI users are severely degraded representations of the acoustic speech signal, with spectral information in particular being heavily reduced. Thus, speech recognition performance with a CI depends, in part, on the user’s ability to process the available temporal envelope information within spectrally degraded signals. Additional factors that correlate with speech recognition performance with a CI include etiology of deafness, age at onset of hearing loss, neural survival, and electrode positioning within the cochlea (Blamey et al. 2013).
The relative contributions of spectral and temporal cues can be described in terms of a spectral-temporal trade-off, such that excellent speech recognition is possible given severely limited spectral cues (e.g., vocoded speech presented in quiet) as long as temporal cues are available (Shannon et al. 1995). Many studies that evaluated this trade-off tested young adult NH listeners and presented vocoded speech, which functions similarly to the sound processing that occurs in present-day CIs (Loizou 2006). Vocoded speech mimics the spectral degradation involved in CI speech processing by dividing the acoustic speech signal into frequency bands using a series of band-pass filters. The temporal envelopes of each frequency band are extracted and used to modulate a carrier signal, typically narrow bands of noise (noise vocoding) or sine tones (sine vocoding). The modulated bands are then summed to create the final acoustic signal composed of relatively preserved temporal envelope cues and limited spectral cues.
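The processing chain just described can be captured in a short sketch. The following Python implementation is illustrative only; the study used custom MATLAB software, and the filter orders, channel spacing, and RMS normalization here are simplifying assumptions rather than a claim about any particular study’s implementation:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_channels=8, env_cutoff_hz=150.0,
                 f_lo=200.0, f_hi=4000.0):
    """Illustrative noise vocoder: band-pass analysis, envelope
    extraction, envelope-modulated noise carriers, summation."""
    # Logarithmically spaced channel edges between f_lo and f_hi
    edges = f_lo * (f_hi / f_lo) ** (np.arange(n_channels + 1) / n_channels)
    sos_env = butter(3, env_cutoff_hz, btype='low', fs=fs, output='sos')
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos_band = butter(3, [lo, hi], btype='bandpass', fs=fs, output='sos')
        band = sosfiltfilt(sos_band, speech)          # zero-phase band-pass
        env = np.abs(hilbert(band))                   # Hilbert envelope
        env = sosfiltfilt(sos_env, env)               # limit modulation rate
        env = np.maximum(env, 0.0)                    # clip filter undershoot
        carrier = np.random.randn(len(speech))        # broadband noise carrier
        out += sosfiltfilt(sos_band, env * carrier)   # restrict noise to band
    # Match the overall RMS of the input (a simplifying assumption)
    return out * np.sqrt(np.mean(speech ** 2) / np.mean(out ** 2))
```

With few channels and a low envelope cutoff, the output retains only coarse spectral shape and slow amplitude fluctuations, mirroring the most degraded conditions used in such studies.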
Spectral-temporal trade-offs have been demonstrated for the recognition of various vocoded speech stimuli. Xu et al. (2005) systematically varied the number of spectral channels from 1 to 16 channels and the low-pass filter cut-off frequency (LPF) of the envelope extractor from 1 to 512 Hz. Results showed that in order to achieve a particular level of phoneme recognition performance, the number of channels that were required increased as the LPF decreased. With a single spectral channel, increasing the LPF up to 512 Hz resulted in improved consonant recognition performance. With multiple spectral channels (3-12 channels), consonant recognition improved with increasing the LPF up to 16 Hz. A similar spectral-temporal trade-off has also been observed for gender identification (Fu et al. 2004), sentence recognition (Shannon et al. 1995; Shannon et al. 1998; Stilp and Goupell 2015), and phoneme recognition in the presence of background noise (Xu and Zheng 2007).
While the relationship between temporal and spectral representation for vocoded speech has been well documented, little is understood about how these cues interact with the age of the listener. In particular, advancing age may negatively impact speech recognition outcomes in CI users (e.g., Blamey et al. 2013). While some studies demonstrate comparable performance between younger and older CI users (Chatelin et al. 2004; Leung et al. 2005; Pasanisi et al. 2003), more recent studies demonstrate that older CI users have lower speech recognition scores compared to younger CI users on a variety of speech recognition measures (Friedland et al. 2010; Lundin et al. 2013; Sladen and Zappler 2015). The discrepancies between studies investigating the effect of age may be due to differences in speech recognition materials (e.g., single words vs sentences) and/or differences in the age cutoff at which participants were considered either “younger” or “older.”
One consequence of normal auditory aging is a decline in auditory temporal processing abilities. Older listeners, independent of their hearing thresholds, demonstrate deficits on speech and non-speech measures of temporal processing, including gap detection (Snell and Frisina 2000), duration discrimination (Fitzgibbons and Gordon-Salant 1994; 1995), time-compressed speech recognition (Fitzgibbons and Gordon-Salant 1996; Gordon-Salant and Fitzgibbons 1993), and envelope modulation processing (Grose et al. 2009; He et al. 2008; Leigh-Paffenroth and Fowler 2006; Purcell et al. 2004).
Envelope modulation processing is evaluated behaviorally in controlled psychoacoustic experiments by measuring temporal modulation transfer functions (TMTFs) – modulation detection thresholds (MDTs) measured as a function of modulation frequency. In psychoacoustic experiments, amplitude modulations are imposed on non-speech signals, typically wide- or narrow-band noises, at varied modulation depths. TMTFs reveal the low-pass filter characteristics of the auditory system for detecting high-frequency modulation rates. He et al. (2008) obtained TMTFs from younger and older NH listeners. Results for higher modulation rates, where amplitude-modulation detection was dependent on temporal cues, showed that older listeners had significantly higher (i.e., poorer) detection thresholds, which suggested an age-related decline in the encoding of temporal envelope modulations. Using electrophysiological techniques, Leigh-Paffenroth and Fowler (2006) measured the auditory steady state response (ASSR) in younger and older NH listeners. The group of older listeners demonstrated reduced phase locking at the highest tested modulation rate of 90 Hz. Similarly, Grose et al. (2009) measured the ASSR in younger and older listeners and found age-related deficits in temporal envelope processing, but only at the relatively high modulation rate of 128 Hz. Taken together, these studies suggest that older listeners show deficits for processing envelope modulations at relatively high rates. Given that CI users must rely primarily on temporal envelope information to recognize speech, and that CI speech processing algorithms can deliver envelope modulations up to approximately 400 Hz, younger CI users may be able to utilize those higher-rate envelope modulations more effectively than older CI users. Therefore, the presence of age-related temporal envelope processing deficits may put older CI users at a disadvantage compared to younger users for recognizing speech.
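For concreteness, the stimulus in a typical TMTF experiment is a noise carrier with sinusoidal amplitude modulation imposed at a chosen rate and depth; a minimal sketch follows (the specific parameter values are illustrative, not those used in the cited studies):

```python
import numpy as np

def sam_noise(fs=44100, dur=0.5, mod_rate_hz=128.0, depth_db=-10.0):
    """Sinusoidally amplitude-modulated (SAM) noise:
    x(t) = n(t) * [1 + m * sin(2*pi*fm*t)], with depth 20*log10(m) dB.
    A modulation detection threshold (MDT) is the smallest depth at
    which a listener can distinguish this from unmodulated noise."""
    t = np.arange(int(fs * dur)) / fs
    m = 10.0 ** (depth_db / 20.0)        # linear modulation index, 0 < m <= 1
    carrier = np.random.randn(len(t))    # wideband noise carrier
    return carrier * (1.0 + m * np.sin(2.0 * np.pi * mod_rate_hz * t))
```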
Studies that investigated the effects of aging on vocoded speech recognition demonstrated reduced performance in older listeners compared to younger listeners. Schvartz et al. (2008) measured vocoded phoneme recognition in younger, middle-aged, and older NH listeners. The number of frequency channels and the amount of frequency-to-place mismatch (to simulate a shallow insertion of the electrode array) were varied, resulting in conditions with differing levels of spectral distortion. When stimuli were severely degraded by limited channels and a greater frequency shift, younger listeners had better phoneme recognition than middle-aged and older listeners. Age of the listener and cognitive ability in the domain of working memory were the primary predictors of vowel recognition performance. Sheldon et al. (2008) measured the number of spectral channels younger and older listeners needed to achieve 50% accuracy in vocoded word recognition. When presented with randomized blocks of different channel conditions, older listeners required approximately eight spectral channels, whereas younger listeners needed only six spectral channels to reach target performance. Schvartz and Chatterjee (2012) measured fundamental frequency (gender) identification in younger and older NH listeners by varying the temporal envelope modulation rate of vocoded stimuli. Results showed that in conditions with reduced spectral cues, younger listeners had better fundamental frequency identification when higher-rate temporal envelope information was presented (>100 Hz), while older listeners did not show this improvement. This suggested that the older listeners were unable to utilize the higher-rate modulations to improve performance on the identification task.
Age-related changes in cognitive status have also been proposed as contributing to CI performance in older adults. Advancing age is typically associated with declines in working memory ability (Baddeley 2012; Daneman and Carpenter 1980), selective attention (Humes et al. 2006), and cognitive processing speed (Park et al. 1996). Word and sentence recognition performance has been shown to correlate with working memory ability in CI users (Lyxell et al. 1996; Moberly et al. 2017; Tao et al. 2014). Similar correlations have been identified between working memory ability and recognition scores for vocoded speech (Schvartz et al. 2008) and for speech degraded by the presence of background noise (Wingfield and Tun 2001). Additionally, the Ease of Language Understanding (ELU) model (Rönnberg et al. 2013) postulates that listeners must rely on explicit processing (i.e., working memory) in order to resolve highly degraded incoming speech signals, such as vocoded speech. Individuals with high working memory have the cognitive resources to identify the degraded speech signal from semantic long-term memory, whereas individuals with low working memory do not. Poorer working memory in older listeners may therefore be one source of their poorer recognition of vocoded speech. Accordingly, in order to investigate the effect of age on vocoded-speech recognition, it is advantageous to also investigate the potential contribution of working memory ability to recognition scores.
In summary, CIs may not yield comparable post-implantation speech recognition outcomes in older users compared to younger users because of age-related declines in temporal envelope modulation processing and, possibly, declines in working memory. The degree to which older listeners process and make use of temporal envelope information for the recognition of spectrally degraded sentences is unknown. The primary purpose of this study was to evaluate whether chronological age impacts speech recognition performance when spectral cues are reduced and whether there is an interaction between age and the temporal envelope modulation rate of the speech stimuli. A negative effect of age overall, with no interaction between age and envelope modulation rate, would suggest a general age-related limitation for recognizing vocoded speech stimuli, potentially resulting from age-related changes in working memory. The presence of a significant interaction between age and envelope modulation rate, however, would suggest that the potential age-related limitation for recognizing vocoded speech may be due to underlying temporal envelope modulation processing deficits.
This study tested the hypothesis that listeners in both age groups would rely more on temporal envelope cues as spectral cues were reduced, but that older listeners would not benefit from increasing envelope modulation rates to the same degree as younger listeners. Figure 1 shows two hypothetical performance functions for speech recognition scores as a function of increasing temporal envelope modulation rate in younger and older listeners. The top panel (A) illustrates an interaction between age and temporal envelope modulation rate, which would suggest an age-related “rate limitation” for processing higher-rate envelope modulations. This result would suggest that age-related auditory temporal envelope processing deficits could limit older listeners’ ability to take advantage of higher-rate modulations. The bottom panel (B) illustrates an alternative result in which there is no interaction between age and envelope modulation rate, which would suggest a potential age-related “task limitation,” or a general age-related deficit for recognizing vocoded speech independent of temporal modulation rate. An age-related “task limitation” could potentially be driven by slight differences in hearing sensitivity between age groups or by cognitive factors that are correlated with chronological age, specifically working memory ability (e.g., Schvartz et al. 2008). It is predicted that older listeners will obtain lower scores on a working memory task compared to younger listeners. It is also hypothesized that if the data suggest a generalized age-related “task limitation,” these working memory scores will be the strongest predictor of vocoded-speech recognition.
Figure 1.
Hypothetical performance functions for younger normal-hearing (YNH) and older near-normal-hearing (ONH) listeners for recognizing noise-vocoded sentences as a function of low-pass filter cutoff frequency.
For the current study, younger listeners with NH and older listeners with near-normal hearing were selected to investigate the effect of age on the recognition of vocoded sentences presented with varying amounts of spectral and temporal information. Listeners in both age groups were required to have normal or near-normal hearing from 250-4000 Hz in order to isolate the impact of chronological age while minimizing the contribution of hearing loss. The results are expected to provide a possible explanation for why older listeners do not perform as well as younger listeners on vocoded-speech recognition tasks. The results also have the potential to shed light on the possible contribution of auditory temporal envelope processing ability and cognition to CI outcomes.
MATERIALS AND METHODS
Listeners
Fifteen younger NH (YNH) listeners aged 18-22 years (mean=20.5, SD=1.3) and 15 older near-NH (ONH) listeners aged 60-78 years (mean=68.8, SD=5.5) participated in this study. Normal hearing was defined as pure-tone air-conduction thresholds ≤25 dB HL at octave frequencies from 250-4000 Hz in the test ear. All listeners had thresholds ≤25 dB HL at all frequencies, with the exception of five ONH listeners who each had a threshold above 25 dB HL at only one octave frequency: a single threshold of 30 dB HL at 4 kHz in four listeners and at 1 kHz in one listener. Therefore, the ONH group was considered to have near-NH. Pure-tone thresholds were obtained in a sound-attenuating booth over supra-aural headphones. Audiometric threshold data for both groups are shown in Figure 2. All listeners passed a general cognitive screening using the Montreal Cognitive Assessment (MoCA; Nasreddine et al. 2005). For this study, a score of ≥22 on the MoCA was considered a “pass” (Cecato et al. 2016; Goupell et al. 2017; Smith and Pichora-Fuller 2015).
Figure 2.
Audiometric data for the YNH group (open circles) and the ONH group (filled circles). Error bars represent ±1 standard deviation.
Stimuli
Stimuli consisted of the Institute of Electrical and Electronic Engineers (IEEE) sentences (Rothauser et al. 1969), which are low-context, phonetically balanced sentences, each containing five keywords. The stimuli were recorded by a female speaking with a General American dialect and digitized at a 44.1-kHz sampling rate. Vocoding was accomplished using custom-made MATLAB software (The Mathworks, Natick, MA, USA). The lower and upper frequency boundaries of the vocoded stimuli were 200 and 4000 Hz, respectively. Each sentence was forward-backward bandpass filtered into 4, 6, 8, or 12 logarithmically spaced frequency channels using third-order Butterworth bandpass filters, resulting in an attenuation rate of −36 dB/octave. The temporal envelope was extracted from each channel using the Hilbert transform followed by forward-backward third-order low-pass filtering (LPF) with cut-off frequencies of 10, 20, 50, 75, 150, 375, and 500 Hz (−36 dB/octave attenuation rate). The extracted temporal envelope was used to modulate a broadband noise, which was then filtered to correspond to the center frequency and appropriate bandwidth of each channel. The individual frequency channels were summed to generate the final vocoded sentence stimuli. Table 1 shows the frequency boundaries and bandwidths of the individual channels for all vocoded stimuli. The maximum modulation rate that can pass through each band-pass filter is also shown for each channel. Narrower bandwidths, which occur for a greater number of channels, limit the information-carrying capacity of each channel and decrease the maximum modulation frequency per channel.
Table 1.
Signal characteristics for each channel condition: frequency ranges, bandwidths, and the maximum modulation rate that could be passed through each bandwidth. The modulation frequencies that can be passed through a band-pass filter are limited by the bandwidth of that channel.
| Channels = 4 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Frequency range (Hz) | 200–423 | 423–894 | 894–1891 | 1891–4000 |
| Bandwidth (Hz) | 223 | 471 | 997 | 2109 |
| Max mod. rate (Hz) | 111 | 236 | 499 | 1054 |

| Channels = 6 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Frequency range (Hz) | 200–330 | 330–543 | 543–894 | 894–1474 | 1474–2428 | 2428–4000 |
| Bandwidth (Hz) | 130 | 213 | 352 | 579 | 954 | 1572 |
| Max mod. rate (Hz) | 65 | 107 | 176 | 290 | 477 | 786 |

| Channels = 8 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Frequency range (Hz) | 200–291 | 291–423 | 423–615 | 615–894 | 894–1301 | 1301–1891 | 1891–2751 | 2751–4000 |
| Bandwidth (Hz) | 91 | 132 | 192 | 279 | 407 | 590 | 860 | 1249 |
| Max mod. rate (Hz) | 46 | 66 | 96 | 140 | 204 | 295 | 430 | 625 |

| Channels = 12 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Frequency range (Hz) | 200–257 | 257–330 | 330–423 | 423–543 | 543–697 | 697–894 | 894–1148 | 1148–1474 | 1474–1891 | 1891–2428 | 2428–3116 | 3116–4000 |
| Bandwidth (Hz) | 57 | 73 | 93 | 120 | 154 | 197 | 254 | 326 | 417 | 537 | 688 | 884 |
| Max mod. rate (Hz) | 29 | 37 | 47 | 60 | 77 | 99 | 127 | 163 | 209 | 269 | 344 | 442 |
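The regularities in Table 1 can be verified directly: the channel edges are logarithmically spaced between 200 and 4000 Hz, and the listed maximum modulation rate for each channel equals half its bandwidth. A short sketch reproduces the tabled values (assuming that half-bandwidth convention, which the table entries follow):

```python
import numpy as np

def channel_layout(n_channels, f_lo=200.0, f_hi=4000.0):
    """Log-spaced channel edges: f_lo * (f_hi/f_lo)**(k/N) for k = 0..N."""
    edges = f_lo * (f_hi / f_lo) ** (np.arange(n_channels + 1) / n_channels)
    bandwidths = np.diff(edges)
    max_mod_rates = bandwidths / 2.0   # half-bandwidth limit, per Table 1
    return edges, bandwidths, max_mod_rates

for n in (4, 6, 8, 12):
    edges, bw, mm = channel_layout(n)
    print(n, np.round(edges), np.round(bw), np.round(mm))
# For n = 4: edges 200, 423, 894, 1891, 4000 Hz; bandwidths 223, 471,
# 997, 2109 Hz; max rates 111, 236, 499, 1054 Hz, matching Table 1.
```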
Procedure
Listeners were seated comfortably in a sound-attenuating booth (Industrial Acoustics Inc., Bronx, NY) and were presented with vocoded stimuli monaurally at 70 dB-A using a soundcard (UA-25 EX, Edirol/Roland Corp., Los Angeles, CA) and amplifier (D-75A, Crown Audio, Elkhart, IN) through circumaural headphones (Sennheiser HD650, Hanover, Germany). Stimuli were presented to each listener’s right ear, except for one ONH subject whose right ear did not meet the threshold criteria; for this subject, stimuli were presented to the left ear. The presentation level of the stimuli was calibrated using a sound level meter (Brüel & Kjær Sound & Vibration Measurement A/S, Naerum, Denmark) prior to testing. The calibration tone was a 1000-Hz pure tone that was energy equated to the long-term root-mean-square (RMS) level of the speech stimuli. Listeners were instructed to listen to each vocoded sentence and to repeat as much of the sentence as possible. Guessing was encouraged, and responses were scored in real time by a single examiner. Loose keyword scoring was used, meaning that a keyword was scored as “correct” if the participant repeated the correct root of the keyword; for example, additions and/or omissions of a final /s/ were still considered correct. The examiner was seated at a computer screen within the test booth, with the listener facing away from the screen. The computer screen displayed the sentences presented to the listener. The examiner selected which of the five keywords were correctly identified and manually initiated the presentation of the next sentence. In this manner, testing was self-paced by the listener. Listeners were given as much time as needed to repeat a sentence. No feedback was provided.
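The calibration step amounts to generating a 1000-Hz tone whose RMS equals the long-term RMS of the speech materials, so that a sound level meter reading on the steady tone reflects the speech presentation level. A minimal sketch (function and variable names are illustrative, not from the study):

```python
import numpy as np

def calibration_tone(speech_stimuli, fs=44100, dur_s=2.0, freq_hz=1000.0):
    """Return a 1-kHz tone scaled so that its RMS matches the long-term
    RMS of the concatenated speech stimuli (list of 1-D sample arrays)."""
    speech = np.concatenate(speech_stimuli)
    target_rms = np.sqrt(np.mean(speech ** 2))
    t = np.arange(int(fs * dur_s)) / fs
    tone = np.sin(2.0 * np.pi * freq_hz * t)   # unit amplitude, RMS = 1/sqrt(2)
    return tone * target_rms / np.sqrt(np.mean(tone ** 2))
```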
The experimental conditions comprised every channel × LPF combination: four channel conditions (4, 6, 8, and 12 channels) and seven LPF conditions (10, 20, 50, 75, 150, 375, and 500 Hz), for a total of 28 conditions. Twenty-four sentences were presented for each condition, resulting in 120 scored keywords per condition. Sentences were presented in a completely randomized order, such that the channel × LPF condition differed from trial to trial. Listeners were familiarized with vocoded speech prior to testing during a short training period. The training stimuli consisted of 20 IEEE sentences that were vocoded using eight channels and a 150-Hz LPF. Testing took approximately two hours to complete.
Cognitive Measures
The List Sorting Working Memory Test from the NIH Toolbox Cognitive Test Battery (Gershon et al. 2013) iPad application was administered to all listeners. Listeners viewed a sequence of pictures illustrating animals or food items varying in size and, after the final picture, were asked to repeat back the names of all the objects in size order, from smallest to largest. An auditory presentation of the word corresponding to each picture occurred simultaneously during the sequence. The first portion of the test required listeners to hold the names of all objects in the sequence in memory, reorder those objects by ascending size, and repeat the new order back to the experimenter. If all objects within a sequence were repeated back in the correct size order, the response was scored as “correct” and the software initiated the next sequence, which included one additional object (up to seven objects in a single sequence). This process continued until the subject failed to correctly report all objects in the correct order on two sequential trials or correctly reported all objects in the longest, seven-item sequence. The second portion of the test required an additional level of ordering: listeners were asked to first report only the food items within the sequence in ascending size order, and then to report the animals in ascending size order. This portion of the test was terminated in the same fashion as the first portion. The final age-corrected standard score was calculated by the software and was used for statistical analyses.
RESULTS
Figure 3 shows sentence recognition performance for each channel condition as a function of LPF cut-off for both groups. For all conditions, the YNH group achieved higher scores than the ONH group. Sentence recognition scores, converted to rationalized arcsine units (RAUs) (Studebaker 1985), were analyzed using a split-plot factorial analysis of variance (ANOVA) with two within-subject factors [number of channels (four levels: 4, 6, 8, and 12 channels) and LPF (seven levels: 10, 20, 50, 75, 150, 375, and 500 Hz)] and one between-subjects factor (age group). Mauchly’s test indicated that the assumption of sphericity had been violated for number of channels [χ2(5)=16.9, p=0.005] and for the channel × LPF interaction [χ2(170)=236.7, p=0.001]. Therefore, the Greenhouse-Geisser correction was used to interpret results when appropriate. Results showed a significant main effect of age [F(1,28)=10.3, p=0.003, ηp²=0.27], indicating that the YNH group significantly outperformed the ONH group on average. There were also significant main effects of channel [F(2.1,58.4)=1976.9, p<0.001, ηp²=0.98] and LPF [F(6,168)=285.5, p<0.001, ηp²=0.91]. There were significant two-way interactions of age × channel [F(2.1,58.4)=7.2, p=0.001, ηp²=0.21] and channel × LPF [F(9.8,273.0)=7.3, p<0.001, ηp²=0.21]. The two-way interaction of age × LPF and the three-way interaction of age × channel × LPF were not significant.
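The RAU conversion used here is Studebaker’s (1985) rationalized arcsine transform, which stabilizes the variance of proportion-correct scores near the floor and ceiling; a sketch:

```python
import numpy as np

def rau(n_correct, n_total):
    """Rationalized arcsine units (Studebaker 1985).
    n_correct: keywords repeated correctly; n_total: keywords presented
    (120 per condition in this study). Returns a score that behaves
    roughly like percent correct but with stabilized variance."""
    theta = (np.arcsin(np.sqrt(n_correct / (n_total + 1.0))) +
             np.arcsin(np.sqrt((n_correct + 1.0) / (n_total + 1.0))))
    return (146.0 / np.pi) * theta - 23.0
```

For example, rau(60, 120) returns approximately 50, matching the percent-correct scale near the middle of the range.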
Figure 3.
Average sentence recognition scores in RAUs for each channel condition plotted as a function of LPF for each group. Each channel condition appears in a separate panel. Data for the YNH group are plotted with open symbols. Data for the ONH group are plotted with filled symbols. Error bars represent ±1 standard deviation.
The two-way interaction of age × channel is highlighted in Figure 4. To further explore this interaction, four independent samples t-tests (with Bonferroni corrections) were performed comparing sentence recognition scores between groups for each channel condition. Scores converted to RAUs were calculated for each channel condition collapsed across LPF. Results revealed that the YNH group significantly outperformed the ONH group for 4 channels [t(28)=5.3, p<0.001] and 6 channels [t(28)=3.2, p=0.003], but not for 8 or 12 channels (p>0.0125). The mean difference in sentence recognition scores between groups decreased as the number of channels increased, from 13.0 RAU with 4 channels to only 4.7 RAU for 12 channels. This result suggests that age has a stronger effect on sentence recognition for signals with severely limited spectral information.
Figure 4.
Sentence recognition scores in RAUs in each channel condition averaged across all LPFs for the YNH group (white bars) and the ONH group (black bars). Error bars represent ±1 standard deviation.
Figure 5 shows sentence recognition scores for each channel condition plotted as a function of LPF. In order to investigate the two-way interaction of channel × LPF, the asymptote of the performance function for each channel condition was evaluated using planned Helmert contrasts, which compare performance at each LPF to the mean performance of all subsequent (higher-frequency) LPF levels. Using this method, the LPF that elicited asymptotic performance in each channel condition was identified. Because there was no significant age × LPF interaction, the Helmert contrasts were performed for each channel condition as a function of LPF collapsed across both age groups. The asymptotic LPF conditions for each channel condition are designated with filled symbols in Figure 5. In the 4-channel condition, listeners reached asymptotic performance at 150 Hz. In the 6-channel condition, asymptotic performance was achieved at 75 Hz. In the 8- and 12-channel conditions, sentence recognition performance asymptoted at 50 Hz. In summary, as the number of channels increased, the LPF cut-off needed for listeners to reach asymptotic performance decreased. This is consistent with a spectral-temporal trade-off for vocoded speech recognition.
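Conceptually, each Helmert contrast compares one LPF level against the pooled mean of all higher levels, and the lowest level at which the contrast is no longer significant marks the asymptote. The following simplified sketch expresses that logic with paired t-tests (the actual analysis used planned contrasts within the ANOVA framework, so this is an approximation, not the study’s procedure):

```python
import numpy as np
from scipy.stats import ttest_rel

LPFS = (10, 20, 50, 75, 150, 375, 500)  # Hz

def asymptote_lpf(scores, alpha=0.05):
    """scores: (n_listeners, n_lpfs) RAU matrix for one channel condition.
    Returns the lowest LPF whose scores do not differ significantly from
    the mean of all higher-LPF scores (i.e., asymptotic performance)."""
    for i in range(len(LPFS) - 1):
        higher_mean = scores[:, i + 1:].mean(axis=1)  # pool higher levels
        _, p = ttest_rel(scores[:, i], higher_mean)
        if p >= alpha:                                # no further improvement
            return LPFS[i]
    return LPFS[-1]
```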
Figure 5.
Sentence recognition scores in RAUs plotted as a function of LPF averaged across both age groups. Filled symbols indicate the LPF where asymptotic performance was achieved for each channel condition. Error bars represent ±1 standard deviation.
Sentence recognition performance increased as LPF increased in all channel conditions. There was no significant age × LPF interaction, which suggests that these data are most representative of the “task limitation” hypothesis plotted in Figure 1B, rather than the “rate limitation” hypothesis. Therefore, it appears that decreased vocoded sentence recognition in older listeners may not be a result of age-related deficits specifically for processing envelope modulations. One factor that could contribute to a task limitation for recognizing vocoded sentences is cognitive ability, specifically in the domain of working memory. An independent samples t-test on the List Sorting age-corrected standard scores did not show a significant difference between groups (t[28] = −0.7, p = 0.464), which is unsurprising given that the age-corrected scores already account for typical age-related declines in this cognitive domain. Age-corrected scores for the YNH group ranged from 92-139 with a mean of 109.6. Age-corrected scores for the ONH group ranged from 75-133 with a mean of 105.8. The uncorrected standard scores, however, showed a significant difference between age groups. Scores for the YNH group ranged from 90-136 with a mean of 111.3. Scores in the ONH group ranged from 86-120 with a mean of 101.5. The ONH group scored significantly more poorly than the YNH group (t[28] = 2.7, p = 0.013).
The nature of the age-related limitation for recognizing vocoded sentences was evaluated using forward-selection multiple regression, which assessed the contribution of age, hearing thresholds, and List Sorting score to the average speech recognition score in each channel condition collapsed across LPF. Prior to regression analysis, all possible predictor variables were entered into a bivariate correlation analysis to assess multicollinearity and to determine which predictor variables should be included in the regression analysis (see supplemental digital content for the correlation table). The possible predictor variables were age, List Sorting age-corrected standard score, hearing thresholds (dB HL) at all frequencies tested (0.25, 0.5, 1, 2, and 4 kHz), and the high-frequency pure-tone average (HFPTA), reflecting the average threshold at 1, 2, and 4 kHz. Following the correlation analysis, five predictor variables were selected for entry into the regression analyses: age, List Sorting score, 1-kHz threshold, 2-kHz threshold, and 4-kHz threshold. Four forward-selection multiple regression analyses were then performed on sentence recognition scores (one per channel condition), each with these five possible predictor variables. As shown in Table 2, a single predictor variable was identified as the significant predictor of sentence recognition scores in the 4- and 6-channel conditions: the 4-kHz threshold. Two predictor variables were identified for the 8- and 12-channel conditions: the 2-kHz threshold and the 1-kHz threshold. Thus, hearing sensitivity at one or more frequencies accounted for 22-51% of the variance in sentence recognition scores across channel conditions. The cognitive measure of working memory did not contribute significantly to the variance in sentence recognition scores in any of the four channel conditions.
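Forward selection adds one predictor at a time, at each step entering the candidate that most improves the model and stopping when no remaining candidate meets the entry criterion. A minimal sketch with statsmodels (the entry criterion and data layout are assumptions of this illustration, not the study’s software):

```python
import numpy as np
import statsmodels.api as sm

def forward_select(X, y, names, p_enter=0.05):
    """Greedy forward selection for OLS regression.
    X: (n_subjects, n_predictors) array; y: outcome (e.g., RAU scores);
    names: predictor labels. Returns labels in order of entry."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        candidates = []
        for j in remaining:
            fit = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
            candidates.append((fit.pvalues[-1], j))   # p-value of new term
        best_p, best_j = min(candidates)
        if best_p >= p_enter:
            break                                     # no candidate qualifies
        selected.append(best_j)
        remaining.remove(best_j)
    return [names[j] for j in selected]

# Candidate predictors per the text: age, List Sorting score, and
# thresholds at 1, 2, and 4 kHz.
```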
Table 2.
Results of forward-selection multiple regression analysis.
| Condition | Model | R² | ΔR² | FΔ | FΔ sig. | F | df | sig. | Predictor(s) | β | sig. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 channels | 1 | 0.51 | – | – | – | 29.2 | 1,28 | <0.001 | 4 kHz | −0.715 | <0.001 |
| 6 channels | 1 | 0.37 | – | – | – | 16.4 | 1,28 | <0.001 | 4 kHz | −0.608 | <0.001 |
| 8 channels | 1 | 0.31 | – | – | – | 12.5 | 1,28 | 0.001 | 2 kHz | −0.556 | 0.001 |
| 8 channels | 2 | 0.41 | 0.11 | 4.76 | 0.038 | 9.5 | 2,27 | 0.001 | 2 kHz | −0.851 | <0.001 |
| | | | | | | | | | 1 kHz | 0.437 | 0.038 |
| 12 channels | 1 | 0.22 | – | – | – | 7.8 | 1,28 | 0.009 | 2 kHz | −0.466 | 0.009 |
| 12 channels | 2 | 0.43 | 0.22 | 10.34 | 0.003 | 10.4 | 2,27 | <0.001 | 2 kHz | −0.893 | <0.001 |
| | | | | | | | | | 1 kHz | 0.632 | 0.003 |
DISCUSSION
The primary purpose of this study was to evaluate if listener age impacted sentence recognition performance in quiet when spectral cues were reduced (using noise vocoding) and if there was an interaction between age and the temporal envelope modulation rate of the speech stimuli. Results showed that YNH listeners outperformed ONH listeners, but only for the most spectrally degraded conditions (4 and 6 channels). As the spectral information of the speech stimuli increased with increasing number of channels, the LPF that was required to reach maximum performance decreased. This pattern reflected a spectral-temporal trade-off for noise vocoded sentence recognition in quiet for both listener groups. Although there was a significant effect of age group, there was no interaction between age group and LPF condition. This result suggested that there was no difference in the ONH listeners’ ability to utilize envelope modulations between 10-500 Hz compared to the YNH listeners for recognizing vocoded sentences in quiet. This result, plotted in Figure 3, most closely supports the “task limitation” hypothesis illustrated in Figure 1B. Vocoded speech recognition performance was significantly related to listeners’ audiometric thresholds and not to working memory ability.
Age-related differences in sentence recognition performance were dependent on the number of vocoded channels. Figure 4 highlights the effect of age on sentence recognition in each channel condition. The sentence recognition differences between age groups decreased as the number of channels increased. The effect of age was largest and only significant for the more difficult vocoding conditions, with a 13.0 RAU reduction in sentence recognition scores for the ONH group with 4 channels, but only a 4.7 RAU difference with 12 channels. This result is consistent with previous studies that revealed age differences in speech recognition scores for more degraded signals, such as reverberant or time-compressed speech (Gordon-Salant and Fitzgibbons 1993; Wingfield et al. 1985).
The shape of the performance functions for the different vocoding conditions was consistent with a spectral-temporal trade-off. Figure 5 shows performance functions averaged across both age groups for each channel condition. The lowest envelope cutoff frequency at which listeners reached asymptotic performance is represented by a filled symbol. In the 4- and 6-channel conditions, performance asymptoted at 150 and 75 Hz, respectively. Performance in the 8- and 12-channel conditions asymptoted when the LPF reached 50 Hz. This is consistent with a spectral-temporal trade-off pattern. When spectral information is sparse, as in the 4-channel condition, improvements in performance can be achieved by increasing temporal information to include modulation rates up to 150 Hz. Conversely, when spectral information is more robust, as in the 8- and 12-channel conditions, increasing temporal information to include modulations above 50 Hz does not significantly improve performance. Another factor that could limit the benefit a listener receives from higher-frequency modulation rates with a large number of channels is the information-carrying capacity of each channel. As the number of channels increases, the bandwidth of each individual channel decreases. Narrow bandwidths proportionally limit the modulation frequencies that are passed through the filter. Table 1 shows that in the 12-channel condition, only the upper channels (channels 8-12) had the capacity to transmit modulation frequencies higher than 150 Hz. However, all but the lowest-frequency channel in the 4-channel condition could convey modulations above 150 Hz. Additionally, the upper two channels in the 4-channel condition had the bandwidth capacity to carry modulations at approximately 500 Hz and above, which may have contributed to the improvements in sentence recognition up to the 150-Hz LPF condition.
The results generally support the task limitation hypothesis (Figure 1B), defined earlier as a general age-related deficit for recognizing vocoded sentences independent of temporal modulation rate. There was neither an age × LPF interaction nor an age × channel × LPF interaction. Thus, age-related deficits in vocoded sentence recognition in quiet were not affected by the temporal envelope modulation rate. It should be noted, however, that the temporal degradation produced by reducing the envelope modulation rate is only one of many forms of temporal degradation. Other forms of temporal distortion (e.g., temporal jitter or time compression) might have interacted with listener age, so it cannot be ruled out that age-related temporal processing deficits contribute to relatively poor vocoded sentence recognition in older listeners. An age-related task limitation could suggest the contribution of more global, higher-level declines that affect ONH listeners’ performance, such as declines in cognitive processes. However, despite an age-related difference in scores on the working memory task used in this study, the working memory scores did not significantly predict vocoded sentence recognition performance. Note that the only cognitive domain evaluated in this study was working memory. It is possible that other age-related cognitive declines (e.g., in inhibition and/or cognitive processing speed) may contribute to older listeners’ ability to perform this task.
Results reported by Schvartz and Chatterjee (2012) are somewhat conflicting with results of the current study. Schvartz and Chatterjee (2012) measured fundamental frequency identification for vowel tokens by younger and older NH listeners as a function of number of channels (1, 4, 8, 16, and 32 channels) and LPF (20, 50, 100, 200, and 400 Hz). The results of that study revealed that older listeners’ identification performance did not improve to the same degree as that of younger listeners when presented with increasing LPF. Younger listeners’ performance increased when the LPF was increased from 50 to 400 Hz, while older listeners’ performance did not. Therefore, older listeners were not able to utilize envelope modulations between 50-400 Hz as effectively as younger listeners, which suggested an age-related rate limitation for fundamental frequency identification. Results of the current study are more consistent with a task-related limitation for older listeners’ vocoded sentence recognition, rather than a temporal modulation rate limitation. The difference in findings could be a result of the inherent differences between frequency identification tasks and sentence recognition tasks.
Age-related temporal processing deficits have also been demonstrated in older listeners identifying vocoded speech tokens. Goupell et al. (2017) presented a continuum of single-word vocoded stimuli that varied in the silence duration between phonemes. Younger and older NH listeners selected whether each token was perceived as “dish” or “ditch.” The number of spectral channels and the low-pass filter cut-off frequency (50 or 400 Hz) were varied. Results revealed a spectral-temporal trade-off for identifying the speech tokens in both groups, but the older listeners required significantly longer silence durations than the younger listeners to change their percept from “dish” to “ditch.” In addition, when spectral and temporal information were reduced, older listeners’ performance was more negatively impacted than that of younger listeners. This result is consistent with the interaction of age and channel found in the current study: when spectral cues were greatly reduced, as in the 4- and 6-channel conditions, the performance differences between age groups were larger. However, the current study did not reveal any significant age × LPF interactions as were shown in Goupell et al. (2017). This likely occurred because of the nature of the stimuli and the task. Goupell et al. (2017) showed age × LPF interactions for specific silence durations along the continuum, confirming the presence of age-related temporal processing deficits for longer durations of silence (above 40 msec), but not for shorter durations. It is possible that the artificial manipulation of silence duration within single words highlighted the temporal processing limitations in the older group. In contrast, the current study used sentences presented in quiet, which contain some contextual cues. Older listeners benefit more from context than younger listeners (Pichora-Fuller et al. 1995); therefore, the stimuli chosen for the current study (i.e., sentences presented in quiet) may not have revealed potential underlying deficits for processing envelope modulations.
As shown in Table 2, the strongest predictors of listeners’ performance in the current study were hearing sensitivity at 1, 2, and 4 kHz. When hearing thresholds were added to the model as predictor variables, chronological age did not significantly contribute to the variance in sentence recognition scores. On average, the difference in hearing thresholds between groups was 8.6 dB at 1 kHz, 11.3 dB at 2 kHz, and 20.6 dB at 4 kHz. Based on the Speech Intelligibility Index (SII; ANSI 1997) and recent re-calculations of band importance functions for sentences (Healy et al. 2013), the 1/3-octave bands between 1-4 kHz, particularly in the region of 2 kHz, are of considerable importance for recognition of unprocessed sentences. Thus, the correlation between listeners’ hearing sensitivity and vocoded sentence recognition performance is consistent with the importance of the spectral region between 1-4 kHz as captured by the SII. Although listeners in the ONH group had audiometrically normal hearing up to 4 kHz (with the exception of five listeners who each had one 30 dB HL threshold at a single frequency), their average thresholds were elevated by 10.9 dB compared to listeners in the YNH group. A reduction in signal audibility could potentially reduce the salience of the amplitude modulations and temporal cues that are required for accurate vocoded sentence recognition. Although the speech stimuli were presented at 70 dB-A in an effort to reduce potential differences in signal audibility between groups, it is possible that slight differences in signal audibility present a potential confound in the current study. However, there were no significant differences in vocoded speech recognition between groups in the 8- and 12-channel conditions; age differences emerged only in the 4- and 6-channel conditions. As all channel conditions were equally audible to each listener, this result suggests that a separate variable related to age, apart from audiometric thresholds, contributed to the vocoded speech recognition scores. Such an interpretation would be in line with Dubno et al. (1984), who found that both mild hearing loss and increasing age contributed to worse speech understanding in noise even though performance was matched in quiet conditions. In other words, their control condition in quiet allowed the researchers to show that audibility was not an issue for the stimuli, which is analogous to comparing vocoded speech across different numbers of channels in the current study. Given recent research suggesting the existence of suprathreshold hearing deficits [i.e., synaptopathy or hidden hearing loss; Kujawa and Liberman (2006)], even if hearing thresholds had been equivalent between age groups, thresholds alone may not capture suprathreshold losses at the auditory periphery that could contribute to age-related deficits in speech processing in background noise or in vocoded speech recognition.
The pattern of sentence recognition scores in the 4-channel condition parallels the pattern seen for modulation detection thresholds (MDTs). When measured in TMTF experiments using wideband noise carriers, MDTs show low-pass filter characteristics, with thresholds increasing for modulation frequencies above 50 Hz and little to no detection ability above approximately 300 Hz (Viemeister 1979). In the current study, modulations up to 150 Hz improved sentence recognition performance when spectral information was sparse. The presence of periodicity cues, which are introduced with modulation frequencies above 50 Hz, may have contributed to this improvement. Rosen (1992) observed that temporal fluctuations between 50-500 Hz provide segmental information to cue voicing and manner, prosodic information to cue intonation and stress, as well as timing information for marking syntactic units.
The results of the current study may have indirect implications for the growing population of older CI users. Although there was an overall effect of age, with the YNH group outperforming the ONH group, there was also an interaction between age and channel, indicating that these age differences were reduced when more spectral information was provided in the 8- and 12-channel conditions. This may highlight the importance of spectral cues for recognizing CI-processed speech, particularly for older users. At the same time, the results suggested that age-related differences in vocoded sentence recognition were likely related to differences in hearing sensitivity and/or to another age-related factor that has yet to be determined. The differences in hearing thresholds across age groups are a limitation of the current study; thus, the implications of these results for actual CI users are presently limited.
If future studies show that older age, independent of hearing loss, is associated with poorer temporal processing and, in turn, poorer vocoded sentence recognition, then counseling older CI candidates regarding the effects of age on auditory system function may assist in establishing appropriate and reasonable expectations. In light of reports suggesting that post-implantation rehabilitation and training can improve speech recognition in adult CI users (Barlow et al. 2016; Fu and Galvin 2007; Plant et al. 2015), aural rehabilitation programs targeted at improving speech recognition skills may help to mitigate potential age-related deficits in older CI users.
CONCLUSION
Results suggested that audiometric thresholds, rather than working memory ability or age-related temporal envelope processing limitations, were the strongest predictor of vocoded speech recognition. ONH listeners had poorer vocoded sentence recognition than YNH listeners, but only in the most spectrally degraded conditions. A spectral-temporal trade-off was observed in both groups: as spectral information decreased, the LPF needed for listeners to reach asymptotic performance increased. There was no difference between older and younger listeners in the ability to utilize envelope modulations between 10-500 Hz for recognizing vocoded sentences in quiet. Audiometric thresholds at 1, 2, and 4 kHz were the only significant predictors of sentence recognition performance; on average, thresholds were 10.9 dB worse in the older group than in the younger group. These results suggest that older listeners were at a disadvantage for recognizing noise-vocoded sentences in quiet primarily because of differences in audiometric thresholds between age groups.
Supplementary Material
ACKNOWLEDGMENTS
This work was supported by NIH Grants R01-AG051603 (M.J.G.), R37-AG09191 (S.G.S.), F32-DC016478 (M.J.S.), and NIH Institutional Research Grant T32 DC000046E (S.G.S. – Co-PI with Catherine Carr). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We thank the University of Maryland’s College of Behavioral & Social Sciences (BSOS) Dean’s Office for their support, and Sasha Pletnikova, Allison Heuber, Adelia Witt, Hannah Johnson, and Shelby Creelman for help in collecting and analyzing these data.
This work was also supported by a seed grant from the University of Maryland-College Park College of Behavioral and Social Sciences. We have no other conflicts of interest. This work met all requirements for ethical research put forth by the IRB of the University of Maryland.
References:
- ANSI (1997). S3.5 (R2007), American National Standard Methods for the Calculation of the Speech Intelligibility Index. New York: Acoustical Society of America.
- Baddeley A (2012). Working memory: Theories, models, and controversies. Annu Rev Psychol, 63, 1–29.
- Barlow N, Purdy SC, Sharma M, et al. (2016). The effect of short-term auditory training on speech in noise perception and cortical auditory evoked potentials in adults with cochlear implants. Semin Hear, 37, 84–98.
- Blamey P, Artieres F, Baskent D, et al. (2013). Factors affecting auditory performance of postlinguistically deaf adults using cochlear implants: An update with 2251 patients. Audiol Neurotol, 18, 36–47.
- Cecato JF, Martinelli JE, Izbicki R, et al. (2016). A subtest analysis of the Montreal Cognitive Assessment (MoCA): Which subtests can best discriminate between healthy controls, mild cognitive impairment and Alzheimer’s disease? Int Psychogeriatr, 28, 825–832.
- Chatelin V, Kim EJ, Driscoll C, et al. (2004). Cochlear implant outcomes in the elderly. Otol Neurotol, 25, 298–301.
- Daneman M, Carpenter PA (1980). Individual differences in working memory and reading. J Verbal Learning Verbal Behav, 19, 450–466.
- Dubno JR, Dirks DD, Morgan DE (1984). Effects of age and mild hearing loss on speech recognition in noise. J Acoust Soc Am, 76, 87–96.
- Fitzgibbons PJ, Gordon-Salant S (1994). Age effects on measures of auditory duration discrimination. J Speech Hear Res, 37, 662–670.
- Fitzgibbons PJ, Gordon-Salant S (1995). Age effects on duration discrimination with simple and complex stimuli. J Acoust Soc Am, 98, 3140–3145.
- Fitzgibbons PJ, Gordon-Salant S (1996). Auditory temporal processing in elderly listeners. J Am Acad Audiol, 7, 183–189.
- Friedland DR, Runge-Samuelson C, Baig H, et al. (2010). Case-control analysis of cochlear implant performance in elderly patients. Arch Otolaryngol Head Neck Surg, 136, 432–438.
- Fu Q-J, Chinchilla S, Galvin JJ (2004). The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users. J Assoc Res Otolaryngol, 5, 253–260.
- Fu Q-J, Galvin JJ (2007). Perceptual learning and auditory training in cochlear implant recipients. Trends Amplif, 11, 193–205.
- Fu Q-J, Nogaki G (2005). Noise susceptibility of cochlear implant users: The role of spectral resolution and smearing. J Assoc Res Otolaryngol, 6, 19–27.
- Gershon RC, Wagster MV, Hendrie HC, et al. (2013). NIH Toolbox for assessment of neurological and behavioral function. Neurology, 80, S2–S6.
- Gifford RH, Shallop JK, Peterson AM (2008). Speech recognition materials and ceiling effects: Considerations for cochlear implant programs. Audiol Neurotol, 13, 193–205.
- Gordon-Salant S, Fitzgibbons PJ (1993). Temporal factors and speech recognition performance in young and elderly listeners. J Speech Hear Res, 36, 1276–1285.
- Goupell MJ, Gaskins CR, Shader MJ, et al. (2017). Age-related differences in the processing of temporal envelope and spectral cues in a speech segment. Ear Hear, 38, e335–e342.
- Grose JH, Mamo SK, Hall JW 3rd (2009). Age effects in temporal envelope processing: Speech unmasking and auditory steady state responses. Ear Hear, 30, 568–575.
- He N-J, Mills JH, Ahlstrom JB, et al. (2008). Age-related differences in the temporal modulation transfer function with pure-tone carriers. J Acoust Soc Am, 124, 3841–3849.
- Healy EW, Yoho SE, Apoux F (2013). Band importance for sentences and words reexamined. J Acoust Soc Am, 133, 463–473.
- Holden LK, Finley CC, Firszt JB, et al. (2013). Factors affecting open-set word recognition in adults with cochlear implants. Ear Hear, 34, 342–360.
- Humes LE, Lee JH, Coughlin MP (2006). Auditory measures of selective and divided attention in young and older adults using single-talker competition. J Acoust Soc Am, 120, 2926–2937.
- Kujawa SG, Liberman MC (2006). Acceleration of age-related hearing loss by early noise exposure: Evidence of a misspent youth. J Neurosci, 26, 2115–2123.
- Leigh-Paffenroth E, Fowler CG (2006). Amplitude-modulated auditory steady-state responses in younger and older listeners. J Am Acad Audiol, 17, 582–597.
- Leung J, Wang NY, Yeagle JD, et al. (2005). Predictive models for cochlear implantation in elderly candidates. Arch Otolaryngol Head Neck Surg, 131, 1049–1054.
- Lin FR, Chien WW, Li L, et al. (2012). Cochlear implantation in older adults. Medicine, 91, 229.
- Loizou P (2006). Speech processing in vocoder-centric cochlear implants. In Moller A (Ed.), Cochlear and Brainstem Implants (pp. 109–143). Basel, Switzerland: Karger.
- Lundin K, Näsvall A, Köbler S, et al. (2013). Cochlear implantation in the elderly. Cochlear Implants Int, 14, 92–97.
- Luntz M, Yehudai N, Most T, et al. (2015). Cochlear implantation in elderly individuals: Insights based on a retrospective evaluation. Harefuah, 154, 761–765.
- Lyxell B, Andersson J, Arlinger S, et al. (1996). Verbal information-processing capabilities and cochlear implants: Implications for preoperative predictors of speech understanding. J Deaf Stud Deaf Educ, 1, 190–201.
- Moberly AC, Houston DM, Harris MS, et al. (2017). Verbal working memory and inhibition-concentration in adults with cochlear implants. Laryngoscope Investig Otolaryngol, 2, 254–261.
- Nasreddine ZS, Phillips NA, Bédirian V, et al. (2005). The Montreal Cognitive Assessment, MoCA: A brief screening tool for mild cognitive impairment. J Am Geriatr Soc, 53, 695–699.
- Park DC, Smith AD, Lautenschlager G, et al. (1996). Mediators of long-term memory performance across the life span. Psychol Aging, 11, 621.
- Pasanisi E, Bacciu A, Vincenti V, et al. (2003). Speech recognition in elderly cochlear implant recipients. Clin Otolaryngol, 28, 154–157.
- Pichora-Fuller MK, Schneider BA, Daneman M (1995). How young and old adults listen to and remember speech in noise. J Acoust Soc Am, 97, 593–608.
- Plant G, Bernstein CM, Levitt H (2015). Optimizing performance in adult cochlear implant users through clinician directed auditory training. Semin Hear, 36, 296–310.
- Purcell DW, John SM, Schneider BA, et al. (2004). Human temporal auditory acuity as assessed by envelope following responses. J Acoust Soc Am, 116, 3581.
- Rönnberg J, Lunner T, Zekveld A, et al. (2013). The Ease of Language Understanding (ELU) model: Theoretical, empirical, and clinical advances. Front Syst Neurosci, 7, 1–17.
- Rosen S (1992). Temporal information in speech: Acoustic, auditory and linguistic aspects. Philos Trans R Soc Lond B, 336, 367–373.
- Rothauser E, Chapman W, Guttman N, et al. (1969). IEEE recommended practice for speech quality measurements. IEEE Trans Audio Electroacoust, 17, 225–246.
- Schvartz KC, Chatterjee M (2012). Gender identification in younger and older adults: Use of spectral and temporal cues in noise-vocoded speech. Ear Hear, 33, 411–420.
- Schvartz KC, Chatterjee M, Gordon-Salant S (2008). Recognition of spectrally degraded phonemes by younger, middle-aged, and older normal-hearing listeners. J Acoust Soc Am, 124, 3972–3988.
- Shannon RV, Zeng F-G, Kamath V, et al. (1995). Speech recognition with primarily temporal cues. Science, 270, 303–304.
- Shannon RV, Zeng F-G, Wygonski J (1998). Speech recognition with altered spectral distribution of envelope cues. J Acoust Soc Am, 104, 2467–2476.
- Sheldon S, Pichora-Fuller MK, Schneider BA (2008). Effect of age, presentation method, and learning on identification of noise-vocoded words. J Acoust Soc Am, 123, 476–488.
- Sladen DP, Zappler A (2015). Older and younger adult cochlear implant users: Speech recognition in quiet and noise, quality of life, and music perception. Am J Audiol, 24, 31–39.
- Smith SL, Pichora-Fuller MK (2015). Associations between speech understanding and auditory and visual tests of verbal working memory: Effects of linguistic complexity, task, age, and hearing loss. Front Psychol, 6, 1394.
- Snell KB, Frisina DR (2000). Relationships among age-related differences in gap detection and word recognition. J Acoust Soc Am, 107, 1615–1626.
- Stilp CE, Goupell MJ (2015). Spectral and temporal resolutions of information-bearing acoustic changes for understanding vocoded sentences. J Acoust Soc Am, 137, 844–855.
- Studebaker GA (1985). A “rationalized” arcsine transform. J Speech Hear Res, 28, 455–462.
- Tao D, Deng R, Jiang Y, et al. (2014). Contribution of auditory working memory to speech understanding in Mandarin-speaking cochlear implant users. PLoS One, 9, e99096.
- Viemeister NF (1979). Temporal modulation transfer functions based upon modulation thresholds. J Acoust Soc Am, 66, 1364–1380.
- Wilson BS, Dorman MF (2008). Interfacing sensors with the nervous system: Lessons from the development and success of the cochlear implant. IEEE Sensors J, 8, 131.
- Wingfield A, Poon LW, Lombardi L, et al. (1985). Speed of processing in normal aging: Effects of speech rate, linguistic structure, and processing time. J Gerontol, 40, 579–585.
- Wingfield A, Tun PA (2001). Spoken language comprehension in older adults: Interactions between sensory and cognitive change in normal aging. Semin Hear, 22, 287–302.
- Xu L, Thompson CS, Pfingst BE (2005). Relative contributions of spectral and temporal cues for phoneme recognition. J Acoust Soc Am, 117, 3255–3267.
- Xu L, Zheng Y (2007). Spectral and temporal cues for phoneme recognition in noise. J Acoust Soc Am, 122, 1758.