Temporal weighting of binaural information at low frequencies: Discrimination of dynamic interaural time and level differences

Anna C Diedesch; G Christopher Stecker

doi:10.1121/1.4922327

2015 Jul 8;138(1):125–133. doi: 10.1121/1.4922327

PMCID: PMC4499054 PMID: 26233013

Abstract

The importance of sound onsets in binaural hearing has been addressed in many studies, particularly at high frequencies, where the onset of the envelope may carry much of the useful binaural information. Some studies suggest that sound onsets might play a similar role in the processing of binaural cues [e.g., fine-structure interaural time differences (ITD)] at low frequencies. This study measured listeners' sensitivity to ITD and interaural level differences (ILD) present in early (i.e., onset) and late parts of 80-ms pure tones of 250-, 500-, and 1000-Hz frequency. Following previous studies, tones carried static interaural cues or dynamic cues that peaked at sound onset and diminished to zero at sound offset or vice versa. Although better thresholds were observed in static than dynamic conditions overall, ITD discrimination was especially impaired, regardless of frequency, when cues were not available at sound onset. Results for ILD followed a similar pattern at 1000 Hz; at lower frequencies, ILD thresholds did not differ significantly between dynamic-cue conditions. The results support the “onset” hypothesis of Houtgast and Plomp [(1968). J. Acoust. Soc. Am. 44, 807–812] for ITD discrimination, but not necessarily ILD discrimination, in low-frequency pure tones.

I. INTRODUCTION

Auditory spatial information is conveyed by many different acoustic features (i.e., spatial cues) of sounds. Ideally, listeners would make use of, and appropriately weight, all available cues. In real rooms, for example, sound onsets carry reliable spatial cues while later sound can be distorted by echoes and reverberation. In that case, an appropriate strategy is to weight the spatial cues in a temporally nonuniform manner that emphasizes the early arriving sound. In fact, the auditory system does exactly this, as has been thoroughly established in the literature on the Franssen effect (Franssen, 1960; Hartmann and Rakerd, 1989; Yost et al., 1997) and precedence effects (Wallach et al., 1949; Yost and Soderquist, 1984; Zurek, 1987; Litovsky et al., 1999; Brown et al., 2015). When sounds carry spatial cues that are constant over time, however, emphasis of the onset cues provides no advantage; instead, listeners ought to benefit from weighting spatial cues in a temporally uniform manner. The results of several studies nevertheless suggest otherwise. Most recently, Stecker and Bibee (2014) reported nonuniform temporal weighting of static interaural time differences (ITD) in 500 Hz pure tones presented over headphones. That result was consistent with previous reports of nonuniform temporal weighting of ITD in bands of low-frequency noise (Houtgast and Plomp, 1968) and modulated high-frequency sounds (Hafter and Dye, 1983). In all three studies, ITD thresholds were measured as a function of sound duration. Despite differences in the sounds employed, all three studies found that ITD thresholds improved with duration, but at a shallower rate than would be expected if listeners used all portions of the sound equally. This finding led Houtgast and Plomp (1968, p. 811), later echoed by Hafter and Dye (1983), to conclude, “the onset of the signal contributes much more to [the accuracy of] the lateral position perceived than the ongoing part.”

Stecker and Brown (2010) and Stecker and Bibee (2014) tested that hypothesis more directly by comparing lateralization thresholds for sounds that differed in the availability of ITD at sound onset. In their approach, sounds carried ITD that was either constant throughout the duration (condition “RR”) or changed over time so that either the onset (in condition “0R”) or offset (“R0”) was diotic. Stecker and Brown (2010) presented high-rate trains of filtered clicks centered at 4000 Hz, similar to stimuli employed by Hafter and Dye (1983);¹ Stecker and Bibee (2014) adapted these methods to test pure tones at 500 Hz, the same frequency region tested by Houtgast and Plomp (1968). Both Stecker and Brown (2010) and Stecker and Bibee (2014) reported significant threshold elevations in condition 0R (zero ITD at onset) compared to conditions in which the ITD cue was available near sound onset. That result parallels a recent observation by Dietz et al. (2013) of greater weighting of fine-structure ITD during the rising envelope (i.e., the “onset”), rather than the peak, of each modulation period in a sinusoidally amplitude-modulated 500 Hz tone. Here, we adopt the approach of Stecker and Brown (2010) to measure temporal weighting across a range of pure-tone frequencies for both ITD and for interaural level differences (ILD).

It is noteworthy that Houtgast and Plomp (1968), Hafter and Dye (1983), and Stecker and Bibee (2014) all found remarkably similar results despite the wide range of stimulus types (noises, click trains, and tones, respectively) and frequencies (500–4000 Hz) tested. As mentioned above, all three calculated improvements in lateralization accuracy with signal duration, and expressed these in the form of log(threshold)-vs-log(duration) slopes. For uniform temporal weighting, that slope should be −0.5, corresponding to $1/\sqrt{\text{duration}}$. Obtained values were significantly shallower but quantitatively similar across studies, ranging from −0.23 to −0.25 for Houtgast and Plomp's two listeners, from −0.08 to −0.33 (mean −0.22) for Hafter and Dye (1983), and from −0.09 to −0.27 (mean −0.18) for Stecker and Bibee (2014). These remarkable similarities suggest that a common mechanism underlies temporal weighting of ITD, regardless of frequency. The literature on precedence effects, on the other hand, does suggest some differences across frequency. One important consideration is the interaction between successive stimuli on the basilar membrane (i.e., “ringing” of peripheral filters, see Tollin, 1998; Stecker, 2014). Because the temporal properties of the auditory peripheral response vary with cochlear place, one might expect larger effects at lower frequencies due to longer ringing. That expectation is roughly consistent with stronger precedence effects for stimuli with low-frequency early components and high-frequency late components than vice versa (Divenyi, 1992; Shinn-Cunningham et al., 1995), and with the observations from Whitmer (2004) of stronger Franssen effects for low-frequency (250–500 Hz) than high-frequency (2000–4000 Hz) tones. Some studies have suggested additional effects of frequency, for example, stronger Franssen effects for mid-frequency (1000–1500 Hz) than higher- or lower-frequency tones (Yost et al., 1997). To investigate potential differences across frequency, the experiments of the current study tested pure tones varying from 250 to 1000 Hz.
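The −0.5 slope expected under uniform weighting can be motivated with a short derivation. This is a sketch under a stated assumption, not the cited authors' own formulation: the listener is assumed to average N independent, equally noisy "looks" at the binaural cue, with N proportional to duration D. The symbols σ (internal noise), θ (threshold), and k (a constant) are introduced here for illustration only.

```latex
\sigma_N = \frac{\sigma_1}{\sqrt{N}}, \qquad N \propto D
\;\;\Longrightarrow\;\;
\theta(D) = k\,D^{-1/2}
\;\;\Longrightarrow\;\;
\log\theta = \log k - \tfrac{1}{2}\,\log D .
```

Under this assumption, doubling the duration would lower threshold by a factor of 2^(−0.5) ≈ 0.71. A slope near −0.2, as in the studies cited above, corresponds to a factor of only 2^(−0.2) ≈ 0.87 per doubling.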

In contrast to the remarkable similarities in temporal weighting of ITD across studies, results regarding the temporal weighting of ILD have been more equivocal. Some studies have reported similar onset dominance for ILD and ITD (Zurek, 1980; Hafter et al., 1983); others have suggested key differences, e.g., stronger precedence effects for ITD than for ILD cues (Krumbholz and Nobbe, 2002; Saberi et al., 2004; Brown and Stecker, 2013). Hafter et al. (1983) measured ILD thresholds as a function of duration and showed identical effects for ILD as they did for ITD (Hafter and Dye, 1983). However, Stecker and Brown (2010) found no evidence for specific onset weighting in ILD, despite using stimuli (4000-Hz click trains) similar to those of Hafter et al. (1983). Results of that and a follow-up study which compared the weighting of ILD carried by onsets, offsets, and interior portions of sounds (Stecker and Brown, 2012) suggest that ILD cues near sound offset are strongly weighted, unlike the case for ITD, where the onset dominates more completely. Here, we investigate whether similar differences between ITD and ILD exist for low-frequency pure tones.

II. EXPERIMENT 1: DISCRIMINATION OF DYNAMIC INTERAURAL TIME DIFFERENCES

A. Methods

Data collection took place in the Department of Speech and Hearing Sciences at the University of Washington. All procedures, including recruitment, consenting, and testing of human subjects, were in accordance with the guidelines of the University of Washington Human Subjects Division and were reviewed and approved by the cognizant Institutional Review Board.

1. Participants

Ten listeners, eight female, aged 20 to 40 (mean 26.4, SD 6.7) participated in this experiment. Four of the listeners had participated in a similar experiment (Stecker and Bibee, 2014). One of those was employed in the lab (listener 0510). Another of the listeners was the first author (0507), who did not participate in the previous study. Other listeners were naive to the purpose of the study. A hearing screening conducted prior to data collection confirmed normal pure tone thresholds (<15 dB hearing level) at octave frequencies between 250 and 8000 Hz for all listeners. Participants other than lab personnel were monetarily compensated for their time.

2. Stimuli

Stimuli were pure tones of 80 ms duration presented at 60 dB sound pressure level (SPL) over closed circumaural earphones (Stax 4070). Tone frequency was 250, 500, or 1000 Hz; sound levels were calibrated to 60 dB SPL at each frequency using a head and torso simulator (Brüel & Kjær 4100D, Nærum, Denmark). Stimuli were gated on and off by diotic raised cosine ramps of 20-ms duration to minimize envelope cues (see Fig. 1). Reference stimuli were completely diotic (ITD = 0) whereas target stimuli carried a right-leading fine-structure ITD cue, Δt. ITD was applied in one of three configurations. In condition RR, ITD was static and equal to Δt throughout the duration of the tone. In condition R0, the ITD changed over time, diminishing linearly from Δt at sound onset to 0 μs at sound offset. The reverse was true in condition 0R, in which ITD grew from 0 μs at sound onset to Δt at sound offset. Dynamic ITD values in these conditions were accomplished by introducing interaural frequency differences (<3.2 Hz, see below) while controlling the starting and ending phases in each ear. To reduce the reliability of any frequency cue this introduced, overall frequency was roved by ±10% (e.g., ±25 Hz at 250 Hz) between stimulus intervals. Intensity was roved by ±5 dB.
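The three ITD configurations can be illustrated with a minimal Python sketch. It is not the authors' stimulus code: the sample rate, the choice to hold the left ear fixed and shift only the right-ear phase, and all function names are assumptions made for illustration, and the frequency and intensity roving is omitted. A linear ITD ramp of Δt over duration T at frequency f is equivalent to an interaural frequency difference of f·Δt/T, which is what the last helper computes.

```python
import math

def raised_cosine_envelope(n, fs, ramp_s):
    """Diotic gate: raised-cosine onset and offset ramps, flat in between."""
    n_ramp = int(round(ramp_s * fs))
    env = [1.0] * n
    for i in range(n_ramp):
        g = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        env[i] = g
        env[n - 1 - i] = g
    return env

def itd_at(t, dur_s, delta_t_s, condition):
    """Instantaneous right-leading ITD (seconds) at time t."""
    if condition == "RR":          # static cue
        return delta_t_s
    if condition == "R0":          # cue at onset, diotic at offset
        return delta_t_s * (1.0 - t / dur_s)
    if condition == "0R":          # diotic at onset, cue at offset
        return delta_t_s * (t / dur_s)
    raise ValueError("condition must be 'RR', 'R0', or '0R'")

def make_target(freq_hz, delta_t_s, condition,
                fs=48000, dur_s=0.080, ramp_s=0.020):
    """Return (left, right) sample lists; fs is an assumed sample rate."""
    n = int(round(dur_s * fs))
    env = raised_cosine_envelope(n, fs, ramp_s)
    left, right = [], []
    for i in range(n):
        t = i / fs
        itd = itd_at(t, dur_s, delta_t_s, condition)
        left.append(env[i] * math.sin(2 * math.pi * freq_hz * t))
        # Right ear leads: its fine structure is advanced by the current ITD.
        right.append(env[i] * math.sin(2 * math.pi * freq_hz * (t + itd)))
    return left, right

def interaural_freq_diff(freq_hz, delta_t_s, dur_s=0.080):
    """Frequency difference (Hz) implied by a linear ITD ramp of delta_t_s."""
    return freq_hz * delta_t_s / dur_s
```

For example, `interaural_freq_diff(500, 500e-6)` gives 3.125 Hz, which is consistent with the stated bound of <3.2 Hz for a 500-μs ITD at 500 Hz.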

FIG. 1. — (Color online) Illustration of stimuli employed in experiment 1 (not to scale). In both experiments, listeners identified target intervals containing 80-ms pure tones with diotic envelopes and non-zero ITD (experiment 1) or ILD (experiment 2). Panels illustrate waveforms at each ear (line shading) with ITD imposed. Binaural differences were presented in three conditions: (left) The “RR” condition presented a constant ITD or ILD cue lateralized toward the right ear. In the dynamic-cue conditions “R0” and “0R,” ITD or ILD changed over time. (Center) In condition “R0,” the cue was largest at sound onset and diminished over time, while (right) in condition “0R,” the cue was largest at sound offset. Note that the waveforms in dynamic-cue conditions were diotic at sound offset (in condition R0) or sound onset (in condition 0R). To the extent that listeners rely primarily on binaural information present early in the sound, binaural-cue thresholds should be elevated in condition 0R relative to other conditions.

3. Procedure

Participants were tested in a double-walled sound-attenuating chamber (IAC, New York, NY). At the beginning of each test block, a continuous diotic 500-Hz pure tone was presented over the headphones. Listeners were instructed to adjust the headphone placement until the sound appeared centered in the head. Subsequently, experimental stimuli were presented in a four-interval, two-alternative forced-choice (4I2AFC) task, with the target occurring in either the second or third interval. Other intervals contained diotic reference tones. Responses to target stimuli were made on a four-button response box with a light-emitting diode (LED) located above each response button (TDT RBOX). LEDs blinked sequentially to mark the four intervals presented on each trial. On each trial, listeners pressed one of the buttons to indicate which interval contained the right-leading target. Feedback was provided to the listeners on each trial. Throughout the experiment, participants were in control of the pace of testing and were given breaks at least every 30 min or as needed.

ITD thresholds for each test block were estimated using a two-down one-up adaptive procedure tracking Δt at 71% correct in two interleaved but independent tracks (Levitt, 1971). Initial values of Δt depended on frequency condition and were set to 600 μs, 500 μs, or 250 μs, at 250, 500, and 1000 Hz, respectively. Δt was adjusted by a scaling factor of 0.2 for the first 4 of 12 reversals and 0.05 thereafter. ITD threshold was estimated as the geometric mean of Δt during the final eight reversals recorded in each track.
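A single adaptive track of this kind can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: the text does not say exactly how the "scaling factor" was applied, so the sketch assumes Δt is divided or multiplied by (1 + factor) on each step, and the function names and the `respond` callback (a stand-in for the listener) are hypothetical. Only one track is shown; the interleaving of two independent tracks is omitted.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def run_track(respond, start, n_reversals=12, n_coarse=4,
              coarse=0.2, fine=0.05, n_average=8):
    """Two-down one-up track on delta-t.

    respond(dt) -> True if the simulated listener answers correctly.
    ASSUMPTION: steps are multiplicative, dt / (1 + factor) downward and
    dt * (1 + factor) upward; the source only names a 'scaling factor'.
    """
    dt = start
    correct_run = 0
    last_direction = 0            # -1 = down, +1 = up, 0 = no step yet
    reversals = []
    while len(reversals) < n_reversals:
        # Coarse steps until the first n_coarse reversals, fine steps after.
        factor = 1.0 + (coarse if len(reversals) < n_coarse else fine)
        if respond(dt):
            correct_run += 1
            if correct_run < 2:
                continue          # need two consecutive correct to step down
            correct_run = 0
            direction = -1
        else:
            correct_run = 0
            direction = +1        # one incorrect response steps up
        if last_direction and direction != last_direction:
            reversals.append(dt)  # record delta-t at each change of direction
        last_direction = direction
        dt = dt / factor if direction < 0 else dt * factor
    # Threshold: geometric mean of delta-t over the final reversals.
    return geometric_mean(reversals[-n_average:]), reversals
```

With a deterministic stand-in listener such as `lambda dt: dt > 100` and a 500-μs starting value, the track descends in coarse steps, then oscillates in fine steps around 100 μs, and the returned estimate falls close to that value. A real listener responds probabilistically, in which case the rule converges on the level yielding about 71% correct (Levitt, 1971).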

Prior to the start of the randomized experimental conditions, participants were presented with written and verbal instructions, followed by at least four practice blocks (eight threshold measurements) in condition RR at 250, 500, and 1000 Hz. Additional verbal instructions, given after the first practice block, instructed listeners to attend only to the lateralization of the sounds and to ignore fluctuations in pitch and intensity. Experimental test blocks followed, each presenting one combination of the three conditions (RR, R0, and 0R) and three test frequencies (250, 500, 1000 Hz). After each combination was tested once (in random order), the entire set of nine combinations was repeated three additional times, for a total of four test blocks (eight interleaved tracks) per combination. In cases where an adaptive track failed to behave asymptotically, the data were eliminated and the test block was repeated. Test blocks lasted approximately 8 to 10 min, and data were collected in 2-hour test sessions. Each listener completed approximately 8 to 10 such sessions.

4. Analysis

Data were analyzed following the approach of Stecker and Bibee (2014), who describe that approach in more detail. Briefly, geometric mean thresholds for individual listeners were computed across test blocks in each experimental condition (see Fig. 2). For group-level analyses, each such threshold was normalized via division by the threshold obtained in condition RR for the corresponding participant and frequency. Group-average normalized thresholds were then computed across listeners (Fig. 3). Bootstrapped 95% confidence intervals were computed by 1000-fold resampling of threshold estimates across test blocks for individual listeners (Efron and Tibshirani, 1986), and second-level resampling of those data across listeners for group data (Stecker and Bibee, 2014).
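The first-level analysis for one listener at one frequency can be sketched as below. This is a hedged illustration rather than the published analysis code: the percentile method for the confidence interval, the fixed random seed, the function names, and the threshold values in the usage note are all assumptions, and the second-level resampling across listeners is not shown.

```python
import math
import random

def geometric_mean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalize_to_rr(thresholds):
    """Divide each condition's geometric-mean threshold by the RR value.

    thresholds: dict mapping condition name to a list of per-track
    threshold estimates for one listener at one frequency.
    """
    rr = geometric_mean(thresholds["RR"])
    return {cond: geometric_mean(vals) / rr
            for cond, vals in thresholds.items()}

def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the geometric mean (assumed method)."""
    rng = random.Random(seed)
    stats = sorted(
        geometric_mean(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(n_boot)
    )
    lo = stats[int(math.floor(alpha / 2 * n_boot))]
    hi = stats[int(math.ceil((1 - alpha / 2) * n_boot)) - 1]
    return lo, hi
```

Called on hypothetical data such as `{"RR": [55, 62, 58, 60], "R0": [80, 88, 79, 90], "0R": [120, 140, 131, 138]}`, `normalize_to_rr` returns exactly 1 for RR and values above 1 for the dynamic-cue conditions, which is the form plotted in Fig. 3.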

FIG. 2. — Individual ITD thresholds from experiment 1. In each panel, symbols plot ITD thresholds (vertical axis) for conditions RR (black squares), R0 (rightward-pointing triangles), and 0R (leftward-pointing triangles) against tone frequency (horizontal axis). Error bars indicate bootstrapped 95% confidence intervals. Light gray lines plot two times the RR thresholds for comparison. Individual listeners' data are represented in separate panels. In most cases, ITD thresholds improved with increasing frequency from 250 to 1000 Hz, and with the availability of ITD information at sound onset (best in condition RR and worst in condition 0R).

FIG. 3. — Group-mean thresholds from experiment 1. Symbols plot group-mean normalized ITD thresholds (vertical axis) against frequency (horizontal axis) in conditions RR (squares), R0 (rightward-pointing triangles), and 0R (leftward-pointing triangles). Means were computed following normalization to each listener's RR threshold at the corresponding frequency. Error bars represent bootstrapped 95% confidence intervals.

To test the “onset” hypothesis described by Houtgast and Plomp (1968) and Stecker and Bibee (2014), we computed ratios of threshold Δt obtained in conditions 0R and R0 (Fig. 4). Bootstrapped 95% confidence intervals were computed by resampling the threshold estimates in each condition and computing the corresponding ratio for each bootstrapped sample. Similarly at the group level, confidence intervals were computed by second-level resampling of the group-average thresholds, prior to calculating the 0R/R0 threshold ratio for each bootstrapped sample. The proportion of such ratio values falling at or below 1 (i.e., the p-value of a one-tailed confidence interval) served as the test of whether 0R thresholds significantly exceeded R0 thresholds at the group level.
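The ratio test can be sketched in Python as follows. It is a simplified, single-listener illustration under stated assumptions: the function names, the fixed seed, and the example values are hypothetical, the confidence interval uses the percentile method, and the second-level resampling across listeners used for the group estimate is not reproduced.

```python
import math
import random

def geometric_mean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def bootstrap_ratio(thr_0r, thr_r0, n_boot=1000, seed=0):
    """Bootstrap the 0R/R0 threshold ratio.

    Returns (point_estimate, (ci_low, ci_high), p_one_tailed), where
    p_one_tailed is the proportion of bootstrapped ratios at or below 1.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        # Resample each condition independently, with replacement.
        num = geometric_mean(rng.choices(thr_0r, k=len(thr_0r)))
        den = geometric_mean(rng.choices(thr_r0, k=len(thr_r0)))
        ratios.append(num / den)
    ratios.sort()
    p_one_tailed = sum(r <= 1.0 for r in ratios) / n_boot
    ci = (ratios[int(0.025 * n_boot)], ratios[int(0.975 * n_boot) - 1])
    point = geometric_mean(thr_0r) / geometric_mean(thr_r0)
    return point, ci, p_one_tailed
```

With 1000 resamples, the smallest nonzero proportion is 0.001, so a result in which no bootstrapped ratio falls at or below 1 is conventionally reported as p < 0.001.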

FIG. 4. — Bars plot the ratio of 0R to R0 ITD thresholds obtained in experiment 1 for each individual (horizontal axis), and for the group average (far right). Bar shading indicates tone frequency. Error bars represent bootstrapped 95% confidence intervals. Consistent with Figs. 2 and 3, most cases indicate higher thresholds in condition 0R than in condition R0 (i.e., ratio >1).

Finally, the effects of frequency were assessed using a factorial repeated measures analysis of variance (ANOVA) on normalized thresholds, with factors of participant, condition (RR, R0, 0R), and frequency (250, 500, 1000 Hz).

B. Results

Individual listeners' ITD thresholds for the nine stimulus conditions (symbols) are plotted against frequency in Fig. 2. Error bars indicate 95% bootstrapped confidence intervals. Gray lines in each panel plot double the threshold obtained in condition RR (squares), for comparison to dynamic-cue thresholds obtained in conditions R0 (rightward-pointing triangles) and 0R (leftward-pointing triangles). Four general observations can be made from these data: First, individual listeners varied in overall sensitivity—with thresholds ranging from 29 to 105 μs in condition RR at 500 Hz, for example—but demonstrated similar patterns of threshold variation across conditions. Second, thresholds tended to decrease with increasing frequency. Third, in almost every case, the highest thresholds were obtained in condition 0R. That result is consistent with the “onset” hypothesis, since that stimulus carries zero ITD at sound onset. Fourth, and less consistent with that hypothesis, better thresholds were obtained in condition RR than R0, suggesting that listeners did benefit from consistent ITD information late in the stimulus, when available. Frequency- and condition-dependent differences are further corroborated in Table I (experiment 1), which displays means and standard deviations of non-normalized threshold data across listeners.

TABLE I.

Group-mean threshold values, computed without normalization, for experiment 1 (ITD in μs, left) and experiment 2 (ILD in dB, right). Means and standard deviations (in parentheses) are given for each combination of frequency (columns) and condition (rows).

	Experiment 1: ITD Thresholds (μs)			Experiment 2: ILD Thresholds (dB)
	250 Hz	500 Hz	1000 Hz	250 Hz	500 Hz	1000 Hz
RR	140.5 (69.6)	59.4 (27.0)	53.5 (23.0)	2.9 (1.5)	2.9 (1.4)	4.2 (2.0)
R0	178.9 (53.8)	84.1 (23.3)	79.2 (33.8)	4.9 (2.7)	4.6 (1.5)	5.7 (1.5)
0R	237.6 (71.7)	132.4 (44.8)	111.9 (39.1)	5.1 (2.5)	5.3 (1.9)	6.9 (2.1)

Figure 3 plots group-average thresholds, normalized to condition RR at each frequency (squares). Triangles plot values for stimulus conditions R0 and 0R, as in Fig. 2. Error bars represent bootstrapped 95% confidence intervals. Similarly to the individual data, higher thresholds were observed in condition 0R than R0, both of which significantly exceeded RR thresholds (i.e., normalized thresholds >1). The overall effect of frequency (Fig. 2) cannot be seen, because data were normalized separately at each frequency.

Factorial repeated-measures ANOVA indicated no significant effect of frequency [F(2,18) = 1.90, p = 0.18] and no significant frequency-by-condition interaction [F(2,18) = 1.06, p = 0.37] on normalized threshold values in conditions R0 and 0R. Thus, it appears that the cross-frequency differences apparent in Fig. 2 were independent of stimulus-condition effects, and were removed by the frequency-specific threshold normalization. ANOVA did reveal a significant main effect of stimulus condition (R0, 0R) [F(1,9) = 89.51, p < 0.05], consistent with the trends apparent in Figs. 2 and 3.

The “onset” hypothesis for ITD discrimination can be meaningfully addressed by directly comparing thresholds obtained in conditions R0 and 0R. For a given Δt, the long-term average ITD is equal across the two conditions, which differ only in the temporal configuration of the cue. According to the “onset” hypothesis, listeners should be significantly more sensitive to ITD in condition R0 than 0R. To compare these two conditions, the threshold ratio Δt_0R/Δt_R0 was calculated for each listener and for the group average, all of which are plotted in Fig. 4. Although ratios varied across conditions and listeners, they were consistently ≥1. Average ratio values were greater than 1 overall: 1.3 (p < 0.005), 1.6 (p < 0.001), and 1.4 (p < 0.001) at 250, 500, and 1000 Hz, respectively. These results strongly support the “onset” hypothesis for ITD across this frequency range.
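As a rough consistency check, the same ratio can be computed from the unnormalized group means in Table I. This is not the statistic reported above, which was derived from bootstrapped group-average thresholds, so exact agreement is not expected; the variable and function names are illustrative.

```python
# Group-mean ITD thresholds (microseconds) copied from Table I, experiment 1.
itd_means_us = {
    250:  {"R0": 178.9, "0R": 237.6},
    500:  {"R0": 84.1,  "0R": 132.4},
    1000: {"R0": 79.2,  "0R": 111.9},
}

def ratio_of_means(table):
    """0R/R0 ratio of group-mean thresholds at each frequency."""
    return {freq: row["0R"] / row["R0"] for freq, row in table.items()}

ratios = ratio_of_means(itd_means_us)
for freq, ratio in ratios.items():
    print(f"{freq} Hz: 0R/R0 = {ratio:.2f}")
```

The ratios of means come to about 1.33, 1.57, and 1.41, which round to the reported values of 1.3, 1.6, and 1.4.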