Loudness Context Effects in Normal-Hearing Listeners and Cochlear-Implant Users

Ningyuan Wang; Heather A Kreft; Andrew J Oxenham

doi:10.1007/s10162-015-0523-y

2015 Jun 4;16(4):535–545.

PMCID: PMC4488167 PMID: 26040213

Abstract

Context effects in loudness have been observed in normal auditory perception and may reflect a general gain control of the auditory system. However, little is known about such effects in cochlear-implant (CI) users. Discovering whether and how CI users experience loudness context effects should help us better understand the underlying mechanisms. In the present study, we examined the effects of a long-duration (1-s) intense precursor on the loudness relations between shorter-duration (200-ms) target and comparison stimuli. The precursor and target were separated by a silent gap of 50 ms, and the target and comparison were separated by a silent gap of 2 s. For normal-hearing listeners, the stimuli were narrowband noises; for CI users, all stimuli were delivered as pulse trains directly to the implant. Significant changes in loudness were observed in normal-hearing listeners, in line with earlier studies. The CI users also experienced some loudness changes but, in contrast to the results from normal-hearing listeners, the effect did not increase with increasing level difference between precursor and target. A “dual-process” hypothesis, used to explain earlier data from normal-hearing listeners, may provide an account of the present data by assuming that one of the two mechanisms, involving “induced loudness reduction,” was absent or reduced in CI users.

Keywords: auditory context effects, loudness recalibration, cochlear implants, loudness enhancement

Introduction

Our perception of a stimulus or event is dependent in large part on the context in which it is presented. Much has been learned about perceptual processing through the study of context effects and their neural correlates. In auditory perception, judgments of the loudness of a sound can be affected by the intensity relation between that target sound and sounds that precede it. Early studies showed that when an intense auditory stimulus precedes a weaker one, the loudness of the weaker stimulus can be judged to have increased by as much as 30 dB, whereas when the preceding signal, or precursor, is less intense than the following target signal, the loudness of the target decreases somewhat from its loudness in isolation (Galambos et al. 1972; Elmasian and Galambos 1975; Elmasian et al. 1975). This phenomenon, known as loudness enhancement or decrement, was thought to reflect a general principle of intensity coding and gain control in the auditory system.

These early studies generally involved a three-tone paradigm, with a conditioning (or precursor) tone, followed by a target tone and then a comparison tone, which subjects adjusted in level to match the loudness of the target tone. All three tones were presented at the same frequency. Manipulations of the presentation ear revealed that loudness enhancement was strongest when all tones were presented to the same ear (binaural or monaural presentations). In a dichotic situation (with the precursor and target presented to opposite ears), less, but still significant, loudness enhancement was observed (Elmasian and Galambos 1975). In contrast, loudness decrement effects seemed relatively insensitive to ear of presentation (Elmasian et al. 1980), suggesting that loudness enhancement may involve some monaural, possibly peripheral, processing components, whereas loudness decrement may primarily involve more central sites. A finding that raised fundamental questions concerning the peripheral nature of loudness enhancement was that enhancement (and decrement) could also be observed when the conditioning tone was presented after the target tone in time (Elmasian et al. 1980).

Loudness enhancement and decrement are considered “assimilative” effects, in that the loudness of the target is drawn towards that of the conditioner (and presumably vice versa). Other studies of loudness context effects have reported the opposite, namely that an intense precursor tone can reduce the loudness of a subsequent tone that is presented at a lower level. In contrast to loudness enhancement, this “loudness recalibration” (e.g., Marks 1994; Mapes-Riordan and Yost 1999) or “induced loudness reduction” (e.g., Scharf et al. 2002) seems to be a relatively long-lasting effect. It is generally observed when the precursor and target are at the same frequency, but the comparison tone is presented at a frequency that is remote from that of the precursor and target. As with loudness enhancement, the effect can be relatively large, ranging from about 10 to 20 dB, depending on the measurement method and stimulus parameters used. Interestingly, maximum loudness recalibration is not obtained directly after the precursor, but instead builds up to reach a maximum at a delay of around 1 s, and is still observable at a delay of 3 s (Mapes-Riordan and Yost 1999).

As proposed by Scharf et al. (2002), and supported by Arieh and Marks (2003a), the build-up and relatively long time constants associated with loudness recalibration suggest a possible reinterpretation of the earlier loudness enhancement studies, where all three tones were presented at the same frequency. In particular, it may be that the comparison tone is reduced in loudness by the precursor rather than the target tone being increased in loudness. To investigate this issue, Oberfeld (2007) used a four-tone task, with the first three tones (precursor, target, and comparison) at the same frequency and a fourth tone at a remote frequency. He asked listeners to compare the loudness of the original comparison (third) tone with that of the fourth tone. Oberfeld’s results suggest that both enhancement and adaptation contribute to loudness recalibration. Results from his study support an earlier hypothesis of Arieh and Marks (2003a) that loudness recalibration reflects a dual-process mechanism. On the one hand, when an intense auditory signal (precursor) precedes a weaker one (target) by a short gap (less than 100 ms), the loudness of the following signal can be enhanced (Elmasian and Galambos 1975; Marks 1988); on the other hand, when the time interval between precursor and target (close in frequency) exceeds 200 ms, the loudness of the target signal will be reduced, perhaps due to adaptation (Arieh and Marks 2003a). These properties of loudness recalibration could be explained by the interaction between a fast-onset and fast-decay enhancement process and a fast-onset but slower-decay adaptation process (Oberfeld 2007).

There are many potential sources of both enhancement and adaptation along the auditory pathways, and few attempts have been made to constrain the locus or nature of these sources. One of the potential sources of an adaptation-like process is the medial olivocochlear (MOC) efferent system, which acts to reduce both the gain and frequency selectivity of the basilar membrane response to sound, by affecting the action of the outer hair cells (Nieder et al. 2003; Guinan 2006; Jennings et al. 2009). As such, an MOC-based effect could, in principle, help explain why loudness effects transfer only partially across the ears; MOC effects are activated bilaterally but are strongest for ipsilateral activation (Guinan 2006). Although the time constants associated with the MOC fast effect are not thought to extend to several seconds, the slow effect of MOC may at least contribute to loudness changes (Cooper and Guinan 2003).

In this study, we investigated context effects on loudness using both normal-hearing listeners and cochlear-implant (CI) users with a three-tone paradigm similar to that used in early loudness enhancement studies. We use loudness context effect (LCE) as a relatively neutral term to avoid any assumption regarding whether the effect reflects an enhancement of the target or adaptation of the comparison (or both). For the CI users, the stimuli were presented as high-rate pulse trains to single electrodes of the CIs. In the normal-hearing listeners, narrowband noises were used (rather than tones) to better simulate the spread of excitation produced by single-electrode stimulation in CIs (e.g., Bingabr et al. 2008). In addition, we varied the frequency (or electrode) of the precursor relative to that of the target and comparison stimuli. The rationale was that if two different mechanisms are responsible for the time course of LCE, then the two mechanisms might have different frequency selectivity. The comparison of normal-hearing listeners and CI users allowed us to test the role of the MOC efferent system. Because MOC efferent activation affects cochlear gain, it requires an intact cochlea. Therefore, any portion of the effect due to MOC efferent effects should not be observed in CI users. Thus, if CI users show some LCE, we could conclude that LCE cannot be due solely to MOC activation (although it may still play some role). As a result, investigating LCE in CI users may provide us with important information about the potential underlying mechanisms. Some researchers have suggested that the cochlear gain changes induced by the MOC efferent system may be important for speech perception in noise (e.g., Guinan 2010; Garinis et al. 2011; Clark et al. 2012; de Boer et al. 2012; Mishra and Lutman 2014). Therefore, any differences in the results between normal-hearing listeners and CI users may provide guidance for future CI signal processing systems to restore normal context effects for auditory and speech perception.

Experiment 1: Loudness Context Effects in Normal-Hearing Listeners

Methods

Subjects

Seven listeners (two males, five females) participated in this experiment and were compensated for their time. Their ages ranged from 18 to 63 years (mean age 26.1 years; only one subject older than 45). All listeners had normal hearing, as defined by audiometric thresholds below 20 dB HL at octave frequencies between 0.25 and 8 kHz. All participants provided written informed consent, and all protocols were approved by the Institutional Review Board of the University of Minnesota.

Stimuli

A schematic diagram of the stimuli used in this experiment is shown in Figure 1A. Each trial consisted of three sounds: a precursor, a target, and a comparison. The temporal properties of the stimuli remained constant for the entire experiment. The total duration of the precursor was 1 s, and the total durations of both the target and the comparison were 200 ms. The precursor and target were separated by a silent gap of 50 ms, which was sufficient to trigger both loudness enhancement and induced loudness reduction (ILR) effects according to Arieh and Marks (2003a), and the target and comparison were separated by a silent gap of 2 s. All the stimuli were gated on and off with 10-ms raised-cosine ramps. All the stimuli were narrowband noises, created by filtering a Gaussian white noise with a second-order IIR peaking filter in the time domain, with slopes of either 24 or 96 dB/octave. The use of bandpass noise was intended to simulate the spread of current produced by CIs, and the different slopes were intended to simulate different degrees of current spread produced by monopolar and bipolar stimulation modes. The 24 dB/octave slopes were chosen to be within the range provided by Bingabr et al. (2008) to simulate monopolar stimulation (although shallower slopes have also been assumed; see Oxenham and Kreft 2014); the 96 dB/octave slopes were chosen to be in the range of the values provided by Bingabr et al. (2008) to simulate bipolar stimulation.
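The timing of a single trial and the 10-ms raised-cosine gating can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the sampling rate is the value quoted in this section, and the noise filtering itself is omitted.

```python
import math

FS = 22500  # sampling rate in Hz, as quoted in this section

def samples(duration_s):
    """Number of samples in a segment of the given duration."""
    return round(duration_s * FS)

def raised_cosine_gate(n_samples, ramp_samples):
    """Envelope that rises from 0 to 1 and falls back with raised-cosine ramps."""
    env = [1.0] * n_samples
    for i in range(ramp_samples):
        w = 0.5 * (1.0 - math.cos(math.pi * i / ramp_samples))
        env[i] = w                   # onset ramp
        env[n_samples - 1 - i] = w   # mirrored offset ramp
    return env

def trial_timeline():
    """Onset and offset times (s) of the precursor, target, and comparison."""
    durations = [("precursor", 1.0), ("target", 0.2), ("comparison", 0.2)]
    gaps = [0.05, 2.0]  # precursor-target gap, target-comparison gap
    t = 0.0
    timeline = {}
    for k, (name, dur) in enumerate(durations):
        timeline[name] = (t, t + dur)
        t += dur
        if k < len(gaps):
            t += gaps[k]
    return timeline

if __name__ == "__main__":
    target_env = raised_cosine_gate(samples(0.2), samples(0.010))
    print(len(target_env), trial_timeline())
```

Under these assumptions, the target begins 1.05 s after precursor onset and the comparison begins 3.25 s after precursor onset, so one trial spans about 3.45 s.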

Fig. 1 — Schematic diagrams of stimuli used in Experiments 1 and 2. Panel **(A)** shows the stimuli for Experiment 1, where the precursor was presented at one of five center frequencies of 455, 762, 1278, 2142, or 3590 Hz, selected from the standard Advanced Bionics 16-channel map, corresponding to the center frequencies of electrodes E2, E5, E8, E11, or E14, respectively, and the target and comparison stimuli had a center frequency of 1278 Hz, corresponding to electrode 8 of the standard CI map. Panel **(B)** shows the stimuli from Experiment 2, where a pulse train was delivered directly to those selected electrodes (E2, E5, E8, E11, or E14) of the CI via a clinical research platform.

The level of the target was always 60 dB sound pressure level (SPL). A precursor level of 70 dB SPL was tested in conjunction with filter slopes of both 24 and 96 dB/octave. The 10 dB level difference between the precursor and target was selected because it was deemed large enough to produce some effect, based on previous studies (Elmasian and Galambos 1975; Elmasian et al. 1980), but not so large as to make a comparison with CI users difficult, based on their more limited dynamic range (Hong et al. 2003). The center frequency of the precursor within each block was selected from one of five values (455, 762, 1278, 2142, or 3590 Hz), approximately logarithmically spaced around the center frequency of the target and comparison, which was always 1278 Hz. The spacing between adjacent center frequencies corresponds to 3.5 to 4.5 equivalent rectangular bandwidths (ERBs) of the auditory filters (Glasberg and Moore 1990). The levels of the precursor and target remained constant within each block. The level of the comparison varied between trials within a specific range centered around the target level, from 57 to 63 dB SPL in 1-dB steps.
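The ERB spacing of the five precursor center frequencies can be checked with the ERB-number scale of Glasberg and Moore (1990), in which the ERB number is 21.4 log10(4.37F + 1) with F in kHz. The sketch below is only a worked check; the function names are illustrative.

```python
import math

CENTER_FREQS_HZ = [455, 762, 1278, 2142, 3590]

def erb_number(freq_hz):
    """ERB-number (Glasberg and Moore 1990): 21.4 * log10(4.37 * F + 1), F in kHz."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)

def adjacent_erb_spacings(freqs_hz):
    """Distance in ERBs between each pair of neighboring center frequencies."""
    numbers = [erb_number(f) for f in freqs_hz]
    return [hi - lo for lo, hi in zip(numbers, numbers[1:])]

if __name__ == "__main__":
    spacings = adjacent_erb_spacings(CENTER_FREQS_HZ)
    for lo, hi, d in zip(CENTER_FREQS_HZ, CENTER_FREQS_HZ[1:], spacings):
        print(f"{lo} -> {hi} Hz: {d:.2f} ERBs")
```

With this formula the four spacings come out at roughly 3.4 to 4.4 ERBs and grow with frequency, which is close to the range quoted above.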

Additional data were collected with an 85-dB SPL precursor and a 60-dB SPL target, with filter slopes of 96 dB/octave and only one precursor center frequency of 1278 Hz, corresponding to the center frequency of the target and comparison. The comparison level range was from 55 to 65 dB SPL, in 2-dB steps. A larger step size was used with the higher precursor level, because a larger effect was expected, based on previous literature (Elmasian and Galambos 1975).

The stimuli were generated digitally and played out diotically from a LynxStudio L22 24-bit soundcard at a sampling rate of 22.5 kHz via Sennheiser HD650 headphones to listeners seated in a double-walled sound-attenuating chamber.

Procedure

A training session was run prior to the actual experiment, involving the target and comparison sounds, but no precursor. Listeners were instructed to respond to the question, “Which sound is louder?” via virtual buttons on the computer display. As in the actual experiment, the target was always 60 dB SPL. The comparison was presented at one of six levels: 57, 58, 59, 61, 62, and 63 dB SPL. Each level was presented 10 times, resulting in 60 trials per training block. Feedback was provided throughout the training session. Listeners were required to reach 80 % correct to proceed to the actual experiment. All of the participants achieved this level of performance within two blocks of training.

In the actual experiment, listeners were asked to ignore the precursor (if present) and to again judge which of the two short sounds (the target and the comparison) was louder. A reference condition with no precursor (similar to the training condition) was also included. Each precursor condition was repeated five times in random order within each of three sessions. The first session involved the 70-dB SPL precursor at one of five center frequencies with the 24-dB/octave filter slopes; the second session involved the 70-dB SPL precursor at one of five center frequencies with the 96-dB/octave filter slopes; the third session involved the 85-dB SPL precursor at only a single center frequency with the 96-dB/octave filter slopes. In the first and second sessions, each block comprised one precursor frequency (or no precursor) with seven comparison levels, repeated 10 times in random order, making a total of 70 trials per block. Each session contained 30 blocks (five repetitions for each of the six precursor conditions, with trials in a new random order in each block), for a total of 50 repetitions of each stimulus per subject. In the final session, with the 85-dB precursor, six comparison levels were each repeated 10 times, for a total of 60 trials per block. A total of 10 blocks of trials were presented per subject in the last session (reference and on-frequency condition, five times for each condition), for a total of 50 repetitions of each stimulus per subject. No feedback was provided in the test sessions. The whole experiment lasted about 6 to 8 h, divided into 2-h sessions.
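The trial counts given above follow directly from the block structure. The sketch below rebuilds a randomized block and the per-session totals; the function names and the use of Python's random module are illustrative assumptions, not a description of the original experiment software.

```python
import random

SESSION_1_2_LEVELS = list(range(57, 64))    # 57 to 63 dB SPL in 1-dB steps
SESSION_3_LEVELS = list(range(55, 66, 2))   # 55 to 65 dB SPL in 2-dB steps

def make_block(comparison_levels, repeats=10, rng=None):
    """One block: every comparison level repeated `repeats` times, shuffled."""
    rng = rng or random.Random()
    trials = [lvl for lvl in comparison_levels for _ in range(repeats)]
    rng.shuffle(trials)
    return trials

def session_summary(comparison_levels, n_conditions, blocks_per_condition=5, repeats=10):
    """Return (trials per block, blocks per session, repetitions per stimulus)."""
    trials_per_block = len(comparison_levels) * repeats
    n_blocks = n_conditions * blocks_per_condition
    reps_per_stimulus = blocks_per_condition * repeats
    return trials_per_block, n_blocks, reps_per_stimulus

if __name__ == "__main__":
    # Sessions 1 and 2: five precursor frequencies plus the no-precursor reference.
    print(session_summary(SESSION_1_2_LEVELS, n_conditions=6))
    # Session 3: on-frequency precursor plus the no-precursor reference.
    print(session_summary(SESSION_3_LEVELS, n_conditions=2))
```

The two calls reproduce the figures in the text: 70 trials per block over 30 blocks in the first two sessions, 60 trials per block over 10 blocks in the last session, and 50 repetitions of each stimulus in every case.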

Results and Discussion

The mean results are presented in Figure 2. In each panel, the proportion of trials in which the comparison was judged to be louder than the target is plotted as a function of the comparison level. Figures 2A and 2B show the results with a 70-dB SPL precursor, with data from the 24 and 96 dB/octave filter slopes, respectively. Different symbols represent the different precursor center frequencies, as shown in the legend. Figure 2D shows the data using the precursor level of 85 dB SPL and filter slopes of 96 dB/octave. Figure 2C replots the on-frequency-precursor and no-precursor conditions from Figure 2B for ease of comparison.

Fig. 2 — Results from normal-hearing listeners. The proportion of trials (%) in which the comparison was judged louder than the target is plotted as a function of the comparison level (dB SPL). The target level was always 60 dB SPL. Panels **(A)** and **(B)** show results using a precursor level of 70 dB SPL with filter slopes of 24 and 96 dB/octave, respectively. Panel **(C)** replots the on-frequency and no precursor conditions from panel **(B)** for ease of comparison with Panel **(D)**, which shows data using a precursor level of 85 dB SPL and filter slopes of 96 dB/octave. Error bars represent 1 s.e. of the mean.

Consider first the conditions with no precursor (filled circles). In all three conditions, the point of subjective equality (PSE), i.e., the level at which the comparison was judged louder than the target 50 % of the time, was reached at a comparison level between 58 and 60 dB SPL. In other words, the two stimuli were judged equally loud when the target was 0–2 dB higher in level than the comparison. Perceptual biases of this kind have occurred in other loudness comparison studies, although the direction of the bias does not appear to be always consistent. For instance, in Elmasian et al. (1980), for baseline conditions, the 50-dB target alone was matched with a comparison tone level of around 52 dB, whereas the 70-dB target alone was matched with a comparison tone level nearer 66 dB SPL.

Consider next the effect of adding a precursor. In general, the addition of a precursor resulted in the target being judged louder (and/or the comparison being judged quieter), as shown by the fact that the filled circles (precursor absent) lie above the other symbols in all conditions. Moreover, the on-frequency precursor produced the largest effects, as shown by the fact that the open circles generally fall below all the other symbols. In general, the effect of the precursor diminished with increasing spectral distance between the precursor and target. This trend is particularly apparent in the case of the 24 dB/octave slopes, where the progression from no difference to a large difference in center frequency was more systematic; in the condition with 96 dB/octave slopes, the on-frequency precursor produced the largest effect, but all other precursor conditions produced similarly small effects.

Finally, consider the effect of precursor level. As expected from previous studies (Elmasian and Galambos 1975; Mapes-Riordan and Yost 1999), the overall effect (difference between no precursor and on-frequency precursor) seems greater with the higher-level than with the lower-level precursor (compare Fig. 2C and 2D).

Probit analysis was used to fit each of the curves shown in Fig. 2 for each subject individually. The fitted curves from each subject and each condition were then used to calculate the comparison level at the PSE. A level higher than 60 dB SPL implies that the comparison required a higher level than the target to be judged equally loud.
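As an illustration of how a PSE can be derived from such data, the sketch below fits a cumulative-Gaussian (probit) psychometric function to the number of "comparison louder" responses at each comparison level and returns its mean as the PSE. The fitting routine is not specified in the text, so the maximum-likelihood grid refinement used here is a stand-in, and the response counts in the usage example are invented for demonstration.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cumulative distribution function

def neg_log_likelihood(mu, sigma, levels, n_louder, n_trials):
    """Binomial negative log-likelihood of P(louder) = Phi((level - mu) / sigma)."""
    nll = 0.0
    for x, k, n in zip(levels, n_louder, n_trials):
        p = _PHI((x - mu) / sigma)
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep log() finite
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll

def fit_probit_pse(levels, n_louder, n_trials):
    """Return (pse, sigma) found by repeatedly refining a grid around the best fit."""
    mu_lo, mu_hi = min(levels) - 5.0, max(levels) + 5.0
    sg_lo, sg_hi = 0.1, 20.0
    best = None
    for _ in range(6):  # each pass shrinks the search ranges by a factor of 5
        mus = [mu_lo + i * (mu_hi - mu_lo) / 40 for i in range(41)]
        sgs = [sg_lo + j * (sg_hi - sg_lo) / 40 for j in range(41)]
        best = min((neg_log_likelihood(m, s, levels, n_louder, n_trials), m, s)
                   for m in mus for s in sgs)
        _, m, s = best
        dm, ds = (mu_hi - mu_lo) / 10, (sg_hi - sg_lo) / 10
        mu_lo, mu_hi = m - dm, m + dm
        sg_lo, sg_hi = max(1e-3, s - ds), s + ds
    return best[1], best[2]

if __name__ == "__main__":
    levels = [57, 58, 59, 60, 61, 62, 63]      # comparison levels, dB SPL
    n_louder = [8, 15, 25, 35, 42, 47, 49]     # hypothetical counts, not study data
    pse, sigma = fit_probit_pse(levels, n_louder, [50] * 7)
    print(f"PSE = {pse:.2f} dB SPL, sigma = {sigma:.2f} dB")
```

The hypothetical counts were chosen to lie near a function with a mean of 59 dB SPL, so the fitted PSE falls just below the 60-dB target level, the direction of bias reported for the no-precursor conditions.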

To confirm the statistical significance of the trends described above, within-subjects analyses of variance (ANOVA) were carried out with Huynh-Feldt corrections for lack of sphericity applied where appropriate, using the fitted PSEs as the dependent variable. In the first analysis considering just the conditions with the 70-dB precursor, the factors were filter slope (24 or 96 dB/oct) and precursor (6 levels—5 frequencies or no precursor). Significant main effects were observed for both precursor [F(5,30) = 8.86; p = 0.001] and filter slope [F(1,6) = 6.94; p = 0.039]. There was also a significant interaction between filter slope and precursor type [F(5,30) = 3.24; p = 0.019]. A planned comparison found a significant difference between PSE in the no-precursor condition and the PSE in the on-frequency condition [F(1,6) = 14.5; p = 0.009]. In addition, when the no-precursor condition was removed, contrast analysis revealed a quadratic trend for precursor frequency [F(1,6) = 23.4; p = 0.003]. These two findings indicate that the precursor affected loudness judgments and that the effect appeared to be frequency selective, with the effect decreasing with increasing frequency distance between the precursor and the target frequency. Although the effect of filter slope and its interaction with precursor frequency reached significance, the effects appear small and not easily interpretable.

To assess the effect of precursor level, the difference in PSE between the no-precursor condition and the on-frequency precursor condition was calculated from the data from session 2 (70 dB SPL precursor) and session 3 (85 dB SPL precursor). These differences, which represent the effect of the precursor on the loudness comparison, or LCE (in dB), were subjected to a paired-samples (within-subjects) t test. As illustrated in Figure 3A, the difference in LCEs, which were 1.52 and 5.74 dB for the 70- and 85-dB precursor, respectively, was significant [t(6) = 5.08, p = 0.002].
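The LCE and the paired comparison described above reduce to simple arithmetic on the per-subject PSEs. The sketch below takes the LCE as the on-frequency PSE minus the no-precursor PSE (so that the reported values are positive) and computes the paired-samples t statistic. The function names are illustrative, the PSE values in the usage example are invented, and the p value is not computed.

```python
import math
from statistics import mean, stdev

def loudness_context_effect(pse_on_frequency, pse_no_precursor):
    """LCE in dB: shift in PSE caused by the on-frequency precursor."""
    return pse_on_frequency - pse_no_precursor

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x versus y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

if __name__ == "__main__":
    # Hypothetical PSEs (dB SPL) for three subjects, for demonstration only.
    lce_70 = [loudness_context_effect(on, off)
              for on, off in [(60.5, 59.4), (61.0, 59.2), (60.2, 59.0)]]
    lce_85 = [loudness_context_effect(on, off)
              for on, off in [(63.8, 58.6), (64.5, 58.1), (63.0, 58.4)]]
    t, df = paired_t(lce_85, lce_70)
    print(f"t({df}) = {t:.2f}")
```

With seven subjects, as in the experiment, the same function yields six degrees of freedom, matching the t(6) reported above.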

Fig. 3 — Derived PSEs for the individual subjects. Panel **(A)** shows PSEs for the normal-hearing subjects, and Panel **(B)** shows the results from CI users. Different symbols denote different subjects in the two panels, but there is no relationship between the symbols across the two panels. The symbols for the CI users are indicated in Table 1. The levels of precursor and target (precursor/target) are shown on the x-axis. The results from the no-precursor baseline conditions are shown as *red unfilled symbols*, and those from the on-frequency precursor condition are shown as *blue unfilled symbols*. The *horizontal bars* indicate the mean of each condition.

One puzzling aspect of the data is that the larger LCE with the higher-level precursor is not just due to the higher PSE in the precursor condition but seems to be also due to the lower PSE in the no-precursor condition. It is not clear why the no-precursor PSE was lower in the session that tested the higher-level precursor. It is conceivable that having blocks with the higher-level precursor interspersed with the no-precursor blocks led to an “over-compensation” of responses in the no-precursor blocks, in order for subjects to keep the overall number of “louder” and “quieter” responses more equal, when averaged over the session. However, the effect was relatively small, and further study would be needed to test this speculation.

In summary, significant LCE was observed in normal-hearing listeners. The effect exhibited frequency selectivity: it was greatest when the precursor was at the same frequency as the target and decreased with increasing spectral distance between the precursor and the target. The effect was also level-dependent, as it was greater for the 85-dB precursor than for the 70-dB precursor. Although the effect of filter slope reached statistical significance when all conditions were included, the overall amount of LCE and the effect of frequency separation between precursor and target were similar for both filter slopes tested.