Temporal weighting functions for interaural time and level differences. III. Temporal weighting for lateral position judgments

G Christopher Stecker; Jennifer D Ostreicher; Andrew D Brown

doi:10.1121/1.4812857

2013 Aug;134(2):1242–1252


PMCID: PMC3745506 PMID: 23927122

Abstract

Temporal variation in listeners' sensitivity to interaural time and level differences (ITD and ILD) was assessed using the temporal weighting function (TWF) paradigm [Stecker and Hafter (2002). J. Acoust. Soc. Am. 112, 1046–1057] in the context of sound-source lateralization. Brief Gabor click trains were presented over headphones with overall ITD and/or ILD ranging ±500 μs ITD and/or ±5 dB ILD across trials; values for individual clicks within each train varied by an additional ±100 μs or ±2 dB to allow TWF calculation by multiple regression. In separate conditions, TWFs were measured for (i) ITD alone, (ii) ILD alone, (iii) ITD and ILD covarying (“in agreement”), and (iv) ITD and ILD varying independently across clicks. Consistent with past studies that measured TWF for binaural discrimination, TWFs demonstrated high weight on the first click for stimuli with short interclick interval (ICI = 2 ms), but flatter weighting for longer ICI (5–10 ms). Some conditions additionally demonstrated greater weight for clicks near the offset than near the middle of the train [Stecker and Hafter (2009). J. Acoust. Soc. Am. 125, 3914–3924]. The latter result was observed only when stimuli carried ILD, and appeared more reliably for 5 ms than for 2 or 10 ms ICI.

INTRODUCTION

To accurately localize sound sources, listeners must be sensitive to a variety of auditory spatial cues [including interaural time differences (ITD) and interaural level differences (ILD)] and capable of integrating spatial information across those cues and over the duration of a sound. For example, binaural discrimination of simple sounds with static cues improves with increasing duration. For an ideal listener with equal access to information carried in the beginning, middle, and end of a sound, the improvement can be understood statistically as a consequence of integrating multiple independent and equally weighted samples of the binaural information. However, an accumulating body of evidence suggests that real listeners do not weight the binaural cues carried in different temporal portions of a brief sound equally; instead, temporal weighting of binaural information varies over a sound's duration. Evidence of this uneven weighting includes sub-optimal improvement in binaural discrimination with stimulus duration (Tobias and Zerlin, 1959; Houtgast and Plomp, 1968; Yost et al., 1971; McFadden and Sharpley, 1972; Ricard and Hafter, 1973; Nuetzel and Hafter, 1976; McFadden and Moffitt, 1977; Hafter and Dye, 1983) and non-uniform temporal weighting functions (TWFs) for binaural discrimination (Hafter and Buell, 1990; Saberi, 1996; Brown and Stecker, 2010). Overall, those studies suggest greater sensitivity to binaural cues carried by sound onsets than by later segments. For modulated high-frequency sounds, such as filtered click trains (Hafter and Dye, 1983), the degree of this “onset dominance” is greatest for high modulation rates or short interclick intervals (ICI). Somewhat in contrast to the results of headphone studies cited above, Stecker and Hafter (2002) found that TWFs for sounds presented over loudspeakers in the free field emphasized both sound onsets and offsets. 
Stecker and Hafter (2009) demonstrated the offset effect (termed “upweighting”) to be a monotonic increase in weights toward the end of the train (i.e., the effects were not confined to the offset click) consistent with “leaky” temporal integration of auditory spatial cues (cf. Tobias and Zerlin, 1959).

It remains unclear which situations give rise to upweighting of late-arriving sound. Because the effect was observed in a task that required listeners to point to free-field sounds varying in azimuth (Stecker and Hafter, 2002) and elevation (Macpherson and Wagner, 2008), but not for a task involving ITD discrimination (Saberi, 1996), Stecker and Hafter (2009) suggested two alternative hypotheses: that the effect might be limited, first, to the processing of non-ITD cues available in the free field or, second, to “open-loop” judgments of sound location, as in a pointing task.¹ With respect to the first hypothesis, Brown and Stecker (2010) found generally weaker onset dominance for ILD than ITD “discrimination,” but no evidence of upweighting for either cue, suggesting that the difference in cues does not, by itself, explain the difference in results. The current study addresses the second hypothesis: that upweighting manifests primarily in open-loop judgments of sound-source locations, as a consequence of the different memory demands, spatial representations, or stimulus configurations involved in localization vs discrimination tasks. Here, TWFs are measured for open-loop lateralization, a task similar to the pointing task of Stecker and Hafter (2002), but for sounds carrying ITD and/or ILD presented over headphones.

EXPERIMENT 1: INTERAURAL TIME AND LEVEL IN AGREEMENT

Stecker and Hafter (2002) asked listeners to localize filtered click trains presented in the free field. In their experiment, individual clicks within each train were randomly distributed across a group of loudspeakers spanning 11°–22° azimuth. TWFs, computed by regressing listeners' localization judgments onto the individual-click locations, revealed both ICI-dependent onset dominance and upweighting of late-arriving clicks. Experiment 1 of the current study aimed to replicate Stecker and Hafter's (2002) experiment under headphones. Specifically, sounds were presented with ITD and ILD varying in a correlated manner and over a range of values similar to those experienced by listeners in the free field.

The pointing technique of Stecker and Hafter (2002) involved orienting to sounds in egocentric space, as did that of Macpherson and Wagner (2008). Stecker and Hafter (2009) identified two aspects of such tasks that could be relevant to upweighting. First, open-loop tasks require that spatial locations be stored and maintained in memory prior to the response. Second, orientation tasks require the generation of an explicitly spatial response in an egocentric reference frame mapped to that of the stimulus. Here, two different open-loop lateralization tasks were tested: head-turning, which necessarily involved explicit orientation in egocentric space (as in the free-field task), and visual scaling via touchscreen response, which did not.

Methods

All procedures, including recruitment, consenting, and testing of human subjects, followed the guidelines of the University of Washington Human Subjects Division and were reviewed and approved by the cognizant Institutional Review Board.

Subjects

Nine subjects (five female) participated in this experiment. One was the second author and another was a research assistant employed in the lab; the remainder were paid subjects naive to the purpose of the experiment. All subjects reported normal hearing and demonstrated pure-tone detection thresholds <15 dB hearing level (HL) at octave frequencies spanning 250–8000 Hz.

Stimuli

Stimuli were trains of Gabor clicks (Gaussian-windowed tone bursts). Each click consisted of a 4 kHz cosine multiplied by a Gaussian temporal envelope with σ = 221 μs, truncated at a total duration of 2 ms. The resulting spectral bandwidth was also Gaussian, with σ = 750 Hz (half-maximal bandwidth ≈ 1.8 kHz). Trains of 16 clicks were synthesized at 48.828 kHz (Tucker-Davis Technologies RX6, Alachua, FL) and presented via headphones (Sennheiser HD 485, Hannover, Germany) at 70 dB peak-equivalent sound pressure level (approximately 65–74 dBA, depending on condition). Click trains were presented with a peak-to-peak ICI equal to 2, 5, or 10 ms. Thus, the total stimulus duration was 32, 77, or 152 ms. ITD and ILD were applied to the stimuli as follows: on each trial, a “base” ITD value was selected from the set {−500, −300, −100, +100, +300, +500 μs}. The base ILD on each trial was set to be in accordance with the base ITD using a trading ratio of 100 μs/dB (e.g., +3 dB for a +300 μs ITD), roughly corresponding to the average trading ratio observed experimentally for such stimuli (Stecker, 2010). Individual clicks within each train were presented at the base ITD and base ILD, plus an additional random perturbation drawn from a uniform distribution spanning ±100 μs and ±2 dB. Perturbations were independent across clicks in a train, but perfectly correlated between ITD and ILD.
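The stimulus parameters above can be illustrated with a minimal sketch. It is not the authors' synthesis code: the function names are hypothetical, the envelope is assumed to be centered within the 2-ms window, and the correlated ITD/ILD perturbation is assumed to come from a single uniform draw per click.

```python
import math
import random

FS_HZ = 48828.0                 # sampling rate reported in the text
N_CLICKS = 16
TRADING_RATIO_US_PER_DB = 100.0


def gabor_click(fs_hz=FS_HZ, carrier_hz=4000.0, sigma_s=221e-6, dur_s=2e-3):
    """Return one Gaussian-windowed cosine click truncated to dur_s."""
    n = int(round(dur_s * fs_hz))
    centre = (n - 1) / 2.0      # assumption: envelope peak at window centre
    click = []
    for k in range(n):
        t = (k - centre) / fs_hz
        envelope = math.exp(-0.5 * (t / sigma_s) ** 2)
        click.append(envelope * math.cos(2.0 * math.pi * carrier_hz * t))
    return click


def train_duration_ms(ici_ms, n_clicks=N_CLICKS, click_ms=2.0):
    """Duration from the start of the first click to the end of the last."""
    return (n_clicks - 1) * ici_ms + click_ms


def per_click_cues(base_itd_us, rng, n_clicks=N_CLICKS):
    """Per-click (ITD in microseconds, ILD in dB) for one trial.

    Assumption: one uniform draw u in [-1, 1] per click scales both
    perturbations, so ITD and ILD jitter are perfectly correlated.
    """
    base_ild_db = base_itd_us / TRADING_RATIO_US_PER_DB
    cues = []
    for _ in range(n_clicks):
        u = rng.uniform(-1.0, 1.0)
        cues.append((base_itd_us + 100.0 * u, base_ild_db + 2.0 * u))
    return cues
```

With 15 intervals between 16 clicks plus one 2-ms click, `train_duration_ms` reproduces the stated durations of 32, 77, and 152 ms.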

Procedure

Testing took place in a double-walled sound-attenuating chamber (IAC, Bronx, NY). Subjects were seated in a swivel chair facing an 80-cm (diagonal) touch-sensitive display (elo Touchsystems 3200L, Tyco Electronics, Bermuda) at a distance of 50 cm. The position and orientation of the listener's head were monitored using an electromagnetic position-tracking system (Polhemus Fastrak, Colchester, VT). The system's transmit coil was affixed to the upper headband of the headphones and the receive coil was suspended in a wooden frame ∼10 cm directly above the listener. At the start of each 90-trial run, a steady 4000 Hz pure tone was delivered from both earphones, and subjects were instructed to adjust the earphone placement to obtain a clearly centered acoustic image. Next, listeners were instructed to sit upright and face directly forward, and to initiate the run by button press or touchscreen response. The head position recorded upon this initiation signal defined a “home” position for each run. Listeners were required to orient within ±5° of home position azimuth and elevation before the start of each trial. Text symbols delivered at eye level and in the center of the display indicated directional deviations from home position. After holding home position for one second, the symbols disappeared and a single auditory stimulus was presented following an additional 1 s delay.

Two different open-loop response measures were employed. The first utilized a head-turn measure previously described by Stecker (2010). Following presentation of a single stimulus, the listener was instructed to rotate her head in the direction of the perceived sound location, by an amount corresponding to the magnitude of the image's lateral position, and then to indicate the response by pressing a hand-held button. The listener was instructed to indicate the leftmost image if multiple images were perceived, or the leftmost extent of a broad image. Although listeners did not spontaneously report having heard multiple images in any condition, the instruction was included to ensure that analyses were not biased by listeners' decisions about which image to identify (leftmost responding simply flattens TWFs in such cases). Head position was recorded at the time of each button press, and its azimuth defined the lateralization response on each trial. Although the stimuli employed in this study were expected to produce images within, and not external to, the head, a majority of listeners spontaneously described the task as “turning to face the direction of sound” and none suggested any awareness of the artificiality of turning one's head in the direction of something inside one's head. Regardless, the analytical procedure requires only that responses be systematically correlated to the degree of lateralization experienced by the listeners, and in this regard we do not consider the presence or absence of external perception to be of any major consequence.

The second response measure utilized the touchscreen. Listeners were presented with a 55-cm horizontal bar, 2 cm in height, positioned at eye level on the touchscreen display. The bar spanned approximately 50° visual angle. Following each stimulus presentation, subjects were instructed to make an eye movement in the perceived direction of the sound (i.e., to look at a particular location on the bar) without moving the head, and then (while maintaining head position) to touch the foveated point using either hand. Listeners were instructed beforehand that the horizontal dimension of the bar should be used to indicate the degree of leftward or rightward laterality, with the edges of the bar corresponding to “fully left” and “fully right” and the center of the bar indicating a centered image. The horizontal position of the response within the bar defined the lateralization response on each trial. As in the head-turn condition, listeners were instructed to identify the leftmost image if multiple or diffuse images were perceived.

In both methods, subjects were instructed to return to the home position following each response, and prepare for the next trial. Each run consisted of 90 trials (15 trials per base ITD/ILD value), and subjects completed 8 runs for each combination of ICI (randomized across sets of 4 runs, within which ICI was fixed) and response measure (fixed within sets of 12 runs). Testing order was counterbalanced across listeners and arranged so that each subject completed four runs of each ICI/response combination before proceeding to the next condition.

Analysis of TWFs

Response data were transformed to ranks (i.e., ranked according to lateral position) within each run prior to the estimation of TWFs using multiple linear regression. Rank-transformation served two purposes. First, it normalized response data across runs and across listeners to avoid effects of bias (e.g., tending to respond further to one side or the other) or range (e.g., failing to utilize the full response scale) differing across listeners. Second, rank-transformation reduces the effects of nonlinearities in response data (e.g., expansion due to listeners avoiding responses close to midline) and ensures a uniform distribution of the response data. Visual inspection of pre-transformed data revealed occasional differences in bias, range, and linearity of responses across subjects and runs, but otherwise approximately uniform response distributions. Thus, it appears unlikely that, in this case, rank-transformation would have significantly altered the degree of response dependency on ITD and ILD. TWFs calculated from non-transformed data were quantitatively similar to, though more variable than, those described here using rank-transformed data.²
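As an illustration of the within-run rank transformation, the following minimal sketch converts one run's responses to ranks. The function name is hypothetical, and the handling of ties (tied responses share their average rank) is an assumption, since the text does not state how ties were resolved.

```python
def rank_transform(responses):
    """Return 1-based ranks for one run; ties share their average rank.

    The tie rule is an assumption, not taken from the text.
    """
    order = sorted(range(len(responses)), key=lambda i: responses[i])
    ranks = [0.0] * len(responses)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted response is tied.
        while (end + 1 < len(order)
               and responses[order[end + 1]] == responses[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks
```

Because ranks depend only on the ordering of responses, any constant bias or monotonic rescaling of a listener's responses within a run leaves the transformed data unchanged, which is the normalization property described above.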

Perceptual weights for each of the 16 clicks in a train were estimated using multiple linear regression of the rank-transformed response, θ_R, onto the binaural cues applied to individual clicks, θ_i:

\hat{\theta}_{R} = \sum_{i = 1}^{16} \beta_{i} \theta_{i} + k . \qquad (1)
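The regression of Eq. (1) could be computed with any ordinary least-squares routine. The sketch below is one dependency-free way to do so, solving the normal equations by Gauss-Jordan elimination; the function name and data layout are hypothetical, and this is not the authors' analysis code.

```python
def fit_click_weights(cues, responses):
    """Ordinary least-squares fit of response = sum_i beta_i * cue_i + k.

    cues: list of trials, each a list of per-click cue values.
    responses: one (rank-transformed) response per trial.
    Returns (betas, k).
    """
    n_pred = len(cues[0]) + 1                  # clicks plus intercept
    rows = [list(trial) + [1.0] for trial in cues]
    # Normal equations (X'X) b = X'y as an augmented matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n_pred)]
         + [sum(r[i] * y for r, y in zip(rows, responses))]
         for i in range(n_pred)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n_pred):
        pivot = max(range(col, n_pred), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n_pred):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    solution = [a[i][n_pred] / a[i][i] for i in range(n_pred)]
    return solution[:-1], solution[-1]
```

In practice a numerically more robust solver (for example, one based on QR decomposition) would be preferable for 16 predictors; the normal-equation form is used here only to keep the example self-contained.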

For comparison across subjects and conditions, regression coefficients, β_i, were then normalized so that their absolute values summed to 1 over the 16-click stimulus duration:³

w_{i} = \frac{\beta_{i}}{\sum_{j = 1}^{16} | \beta_{j} |} . \qquad (2)
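The normalization of Eq. (2) amounts to dividing each coefficient by the sum of absolute coefficients. A minimal sketch follows (the function name is hypothetical):

```python
def normalize_weights(betas):
    """Scale regression coefficients so their absolute values sum to 1."""
    total = sum(abs(b) for b in betas)
    if total == 0:
        raise ValueError("all coefficients are zero; weights are undefined")
    return [b / total for b in betas]
```

Note that the sign of each coefficient is preserved, so a negative coefficient yields a negative normalized weight, and 16 equal coefficients yield a weight of 1/16 for every click.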

The normalized weights, w_i, indicate each click's relative influence on the listener's response, and typically vary from 0 (indicating no linear relationship between click location and response) to 1 (indicating a perfect linear relationship). Negative values may also be obtained, but generally reflect variation around zero rather than significant negative effects on the response. Plots of w_i vs click number comprise the TWFs and indicate how click effectiveness varies over the stimulus duration. TWFs were estimated separately for each combination of listener, ICI, and response method, with each analysis combining data across all eight runs for the combination in question. Statistical confidence intervals were computed at the 95% confidence level for each normalized weight, using a 1000-fold bootstrap procedure (Efron and Tibshirani, 1986). For each combination of subject and condition, individual trials were resampled with replacement 1000 times. Normalized weights were computed for each bootstrapped sample according to Eqs. (1) and (2). Distributions of bootstrapped w_i were approximately normal; the standard deviation across bootstrapped samples was used to estimate the standard error of w_i for calculation of 95% confidence intervals on w_i for individual TWFs.
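The trial-resampling bootstrap can be sketched as follows. The names are hypothetical: `weights_fn` stands for whatever routine maps a list of trials to a list of normalized weights, and the multiplier 1.96 is the usual normal-approximation value for a 95% interval, assumed here because the text reports approximately normal bootstrap distributions.

```python
import random
import statistics


def bootstrap_ci(trials, weights_fn, n_boot=1000, seed=0):
    """Normal-approximation 95% confidence interval for each weight.

    trials: list of per-trial records (any type weights_fn accepts).
    weights_fn: hypothetical callable returning a list of weights.
    Returns (estimate, [(lower, upper), ...]).
    """
    rng = random.Random(seed)
    estimate = weights_fn(trials)
    samples = []
    for _ in range(n_boot):
        # Resample trials with replacement, keeping the sample size.
        resampled = [rng.choice(trials) for _ in trials]
        samples.append(weights_fn(resampled))
    intervals = []
    for i, w in enumerate(estimate):
        se = statistics.stdev([s[i] for s in samples])
        intervals.append((w - 1.96 * se, w + 1.96 * se))
    return estimate, intervals
```

Resampling whole trials keeps each response paired with the per-click cues that produced it, so the dependence between cues and responses is preserved in every bootstrapped sample.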

Group-average TWFs, as plotted in Fig. 1, were computed by taking the mean across subjects for each click weight; 95% confidence intervals were computed by bootstrapping the individual TWFs as described above and computing a group-mean TWF on each iteration. Statistical confidence intervals were based on the resulting distribution of group-mean weights across 1000 bootstrapped samples.
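A minimal sketch of the group-level procedure is given below. It resamples each subject's trials, recomputes that subject's TWF, and averages across subjects on every iteration. The names are hypothetical, and the use of percentile limits for the interval is an assumption; the text states only that intervals were based on the distribution of bootstrapped group means.

```python
import random


def group_mean_ci(subject_trials, weights_fn, n_boot=1000, seed=0):
    """95% interval on the across-subject mean of each weight.

    subject_trials: one list of trials per subject.
    weights_fn: hypothetical callable mapping trials to a list of weights.
    Percentile limits are an assumption, not taken from the text.
    """
    rng = random.Random(seed)
    boot_means = []
    for _ in range(n_boot):
        twfs = []
        for trials in subject_trials:
            resampled = [rng.choice(trials) for _ in trials]
            twfs.append(weights_fn(resampled))
        n_sub = len(twfs)
        boot_means.append([sum(col) / n_sub for col in zip(*twfs)])
    intervals = []
    for column in zip(*boot_means):
        ordered = sorted(column)
        lower = ordered[int(0.025 * (n_boot - 1))]
        upper = ordered[int(0.975 * (n_boot - 1))]
        intervals.append((lower, upper))
    return intervals
```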

Fig. 1. Subject-averaged TWFs for ITD and ILD in agreement. Left: head-turn response, right: touchscreen response. In each panel, normalized weights (y-axis) are plotted for each click in a train, as a function of the temporal order of the clicks (x-axis). Symbols plot the mean of normalized weights across subjects; error bars indicate bootstrapped 95% confidence intervals on mean weights. The dashed horizontal line in each panel indicates the value that would obtain if all clicks were equally weighted (1/16), while the solid line indicates zero. Top to bottom, panels plot TWFs for 2, 5, and 10 ms ICI, respectively.

For an ideal observer, each TWF would reflect uniform weighting of all clicks in a train, as all clicks would be equally informative for the task. That is, if listeners' responses made optimal use of binaural information carried by all clicks in a train, normalized TWFs would be flat, with a value of 1/16 for each click. For reference, that value is plotted as a dashed line in Figs. 1, 2, and 5–8.

Fig. 2. Individual-subject TWFs for ITD and ILD in agreement. Left: head-turn response, right: touchscreen response. As in Fig. 1, panels plot normalized click weight (y-axis) as a function of the temporal order of clicks in a train (x-axis). Symbols indicate TWFs for individual subjects. Asterisks (*) mark clicks where a statistically significant proportion of subjects (p < 0.05) demonstrated significantly non-zero weights on an individual basis (also p < 0.05). Other conventions (dashed line at 1/16, solid line at 0, ICI = 2, 5, 10 ms from top to bottom panels, respectively) as in Fig. 1.

Fig. 5. Subject-averaged TWFs for ITD (left) and ILD (right) tested separately (Experiment 2). Formatting as in Fig. 1.

Fig. 6. Individual-subject TWFs for ITD (left) and ILD (right) tested separately (Experiment 2). Formatting as in Fig. 2.

Fig. 7. Subject-averaged TWFs for ITD (left) and ILD (right) presented together but with independent per-click variation (Experiment 3). Formatting as in Fig. 1.

Fig. 8. Individual-subject TWFs for ITD (left) and ILD (right) measured simultaneously (Experiment 3). Formatting as in Fig. 2.

Measures of non-uniformity in TWFs

Following Stecker and Hafter (2009), two measures were defined to estimate the degree to which TWFs departed from uniformity. Both are ratios adapted from the “average ratio” (AR), originally defined by Saberi (1996) as the ratio of onset click weight to the average of post-onset click weights. As did Stecker and Hafter (2009), we redefined AR as the ratio of onset or offset weight to the mean of intermediate weights (i.e., the mean excluding onset and offset clicks):

\mathrm{AR}_{\mathrm{onset}} = \frac{w_{1}}{\sum_{i = 2}^{N - 1} w_{i} / (N - 2)} \qquad (3)

\mathrm{AR}_{\mathrm{offset}} = \frac{w_{N}}{\sum_{i = 2}^{N - 1} w_{i} / (N - 2)} , \qquad (4)

where N indicates the total number of clicks in each train (16 for all experiments described here). AR_onset describes the degree to which onset clicks dominated listeners' judgments (i.e., “onset dominance”); similarly, AR_offset indicates the relative influence of offset clicks (a measure of “upweighting”⁴).
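Both ratios can be computed directly from a normalized TWF, as in the minimal sketch below (the function name is hypothetical). The example weights in the usage note are invented for illustration and are not data from the study.

```python
def average_ratios(weights):
    """Return (AR_onset, AR_offset) for one TWF.

    Each ratio divides the first or last weight by the mean of the
    interior weights, i.e., clicks 2 through N-1.
    """
    n = len(weights)
    if n < 3:
        raise ValueError("need at least one interior click")
    interior_mean = sum(weights[1:-1]) / (n - 2)
    return weights[0] / interior_mean, weights[-1] / interior_mean
```

For a flat TWF both ratios equal 1. For an invented TWF with an onset weight of 0.30, an offset weight of 0.14, and 14 interior weights of 0.04 each, the interior mean is 0.04, giving AR_onset = 7.5 and AR_offset = 3.5.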

Results

TWFs, averaged across subjects, are plotted in Fig. 1 for each combination of response measure (left vs right panels) and ICI (top to bottom). Consistent with past studies (Saberi, 1996; Stecker and Hafter, 2002, 2009; Brown and Stecker, 2010), onset clicks received significantly higher weight than did later clicks when the ICI was short (2 ms). That effect was reduced for longer values of ICI. At 5 ms ICI, the largest weights were found for both onset and offset clicks, consistent with the upweighting reported by Stecker and Hafter (2002, 2009), which was similarly found to be greatest in that range of ICI. No evidence for upweighting was observed at 2 ms ICI. Both effects were reduced or absent in the generally flatter TWFs measured at 10 ms ICI, which more closely approximated the equal-weighting value of 1/16 (dashed line).

Individual-subject TWFs are plotted in Fig. 2. Subjects were unanimous in assigning high weights to click 1 at 2 ms ICI. A similar pattern is seen in the weights applied to offset clicks at 5 ms ICI; weights on click 16 were significantly greater than 0 (one-tailed p < 0.05) on an individual basis for 8 of 9 subjects in the head-turn condition and 9 of 9 subjects in the touchscreen condition. Asterisks (*) mark those clicks along with any others with significantly non-zero weights in a statistically significant proportion of subjects (i.e., at least six, p < 0.05). Those patterns of onset dominance at 2 ms ICI and upweighting at 5 ms ICI are further illustrated in the leftmost bars (“HT” and “TS”) plotted in Figs. 3 and 4; both the median AR values across subjects and the proportion of subjects exhibiting AR > 1 indicate onset dominance (AR_onset > 1; Fig. 3) at 2 ms ICI and upweighting (AR_offset > 1; Fig. 4) at 5 ms ICI, in both response conditions. Significant proportions of subjects also exhibited AR_onset > 1 at 5 ms ICI in the touchscreen condition and at 10 ms ICI in the head-turn condition, although the group median AR_onset in these conditions failed to reach significance due to the modest size of effects for most subjects (see Fig. 2).

Fig. 3. Onset weights dominate lateralization at short ICI. (A) Across-subject median ratio of onset weight to interior weight (AR_onset) is plotted against experimental condition. From left to right, conditions are Experiment 1 head-turn (HT) response; Experiment 1 touchscreen (TS) response; Experiment 2 ITD condition; Experiment 2 ILD condition; Experiment 3 ITD weights; Experiment 3 ILD weights. Bar shading indicates ICI of 2 ms (black), 5 ms (gray), or 10 ms (white). Error bars plot bootstrapped 95% confidence intervals. (B) Proportion of subjects demonstrating onset dominance (AR_onset > 1). Asterisks (*) indicate statistical significance of the proportion (p < 0.05) via sign test.

Fig. 4. Elevated offset weights were observed especially for ILD at 5 ms ICI. (A) Across-subject median ratio of offset weight to interior weight (AR_offset) is plotted against experimental condition. Conditions and formatting as in Fig. 3. (B) Proportion of subjects demonstrating upweighting (AR_offset > 1). Asterisks (*) indicate statistical significance of the proportion (p < 0.05) via sign test.
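The sign test applied to the proportion of subjects with AR > 1 can be sketched as a binomial tail probability. The function name is hypothetical, and the one-sided form with a chance level of 0.5 is an assumption; the text does not state whether the test was one- or two-sided.

```python
from math import comb


def sign_test_p(n_above, n_subjects):
    """One-sided P(X >= n_above) for X ~ Binomial(n_subjects, 0.5).

    Assumes a one-sided test; the source does not specify sidedness.
    """
    tail = sum(comb(n_subjects, k) for k in range(n_above, n_subjects + 1))
    return tail / 2 ** n_subjects
```

Under these assumptions, with nine subjects, eight or more with AR > 1 gives p = 10/512 ≈ 0.020, whereas seven or more gives p = 46/512 ≈ 0.090.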

Finally, comparing the TWFs measured using the head-turn and touchscreen response measures in Figs. 1 and 2 suggests a close correspondence, with similar degrees of onset dominance and upweighting across the two methods. The median AR_onset across subjects was 8.8 for the head-turn and 8.2 for the touchscreen measures at 2 ms ICI. Corresponding values were 2.0 and 1.9 at 5 ms ICI, and 1.7 and 2.1 at 10 ms ICI. AR_offset was also similar across head-turn and touchscreen conditions: AR_offset = 2.0 and 2.5, respectively, at 5 ms ICI. No significant differences in AR were observed between head-turn and touchscreen conditions. All estimates of AR were contained within the 95% confidence interval obtained under the other procedure at the corresponding ICI. Note that the apparent reduction in AR_offset for 2 ms ICI in the touchscreen task resulted from near-zero (and, for some subjects, negative) weights applied to click 16; that value did not differ significantly from 1.

The results of Experiment 1 support the key features of TWFs described by Stecker and Hafter (2002, 2009), namely onset dominance at 2 ms ICI and upweighting at 5 ms ICI. That similarity, together with the failure of Brown and Stecker (2010) to observe upweighting for ITD or ILD discrimination, suggests that upweighting may reflect aspects of open-loop localization or lateralization tasks, at least for stimuli carrying both ITD and ILD cues. Furthermore, the precise nature of the task (orientation via head-turning or scaling via touchscreen response) appeared to have no influence on the results. To determine whether the nature of the cue plays an additional role, the second experiment aimed to measure TWFs separately for ITD and ILD using the touchscreen task employed in Experiment 1.