The Journal of the Acoustical Society of America. 2016 Jul 7;140(1):EL14–EL19. doi: 10.1121/1.4954870

Sound source localization identification accuracy: Level and duration dependencies

William A. Yost
PMCID: PMC5848824  PMID: 27475204

Abstract

Sound source localization accuracy for noises was measured for sources in the front azimuthal open field, mainly as a function of overall noise level and duration. An identification procedure was used in which listeners identified which loudspeaker presented a sound. Noises were filtered and differed in bandwidth and center frequency. Sound source localization accuracy depended on the bandwidth of the stimuli, and for the narrow bandwidths, accuracy depended on the filter's center frequency. Sound source localization accuracy did not depend on overall level or duration.

1. Introduction

Sound source localization accuracy for noise stimuli presented in a sound-deadened room in the front azimuthal field, as measured in an identification task (listeners indicate which loudspeaker presented a sound), was shown by Yost and Zhong (2014) to increase with increasing bandwidth, with asymptotic (best) performance obtained when the bandwidth was between one and two octaves. For narrower bandwidths (<1 octave), localization accuracy was best for low-frequency noise (125–500 Hz), worse for mid-frequency noise (1000–4000 Hz), and intermediate for high-frequency noise (2000–8000 Hz). Yost et al. (2013) had earlier shown that sound source localization accuracy for two-octave or wider noise bursts did not depend on the center frequency of the noise [a result replicated by Yost and Zhong (2014)]. Results for both studies were obtained for localizing sound sources in the front azimuth hemifield, where interaural time (ITD) and level (ILD) differences appear to be the dominant cues for sound source localization (Shub et al., 2008). The results from both papers are consistent with the data from Stevens and Newman (1936), who investigated sound source localization accuracy for similar conditions. These appear to be the only parametric investigations of sound source localization in the front azimuthal open or near open field. In these papers the overall level of the noise and its duration were kept constant. While duration and overall level have been studied in lateralization tasks when sounds are presented over headphones, there are almost no parametric studies of sound source localization accuracy as a function of overall level or duration in the front azimuthal open field for noise stimuli. Interaural discrimination thresholds, especially ITD thresholds, are duration-dependent for headphone-delivered stimuli in lateralization tasks (see Yost and Hafter, 1987 and Stecker, 2014 for reviews). But Vliegen and Van Opstal (2004) did not find an effect of noise duration on azimuth localization accuracy in an open field. While different studies of sound source localization have used different overall levels, the many differences among these studies make it impossible to estimate the degree to which overall level would affect sound source localization accuracy in the front azimuthal field. Moreover, there does not seem to be a parametric study of the effect of overall level on sound source localization accuracy in the front azimuthal plane.

To better determine how duration and overall level affect sound source localization accuracy in the front azimuthal field, this paper investigated sound source localization accuracy for some of the stimulus conditions studied by Yost and Zhong (2014), but using the procedure of Yost et al. (2013). Because stimulus bandwidth and, for narrow bandwidths, filter center frequency affect sound source localization accuracy (see Yost and Zhong, 2014), the effects of duration and overall level were investigated for different noise bandwidths and filter center frequencies.

2. Experiment I: Overall noise level

Sound source localization accuracy using an identification task was measured as a function of overall noise level [25–85 dB sound pressure level (SPL)] for 2-octave wide bands of noise and 1/10th-octave wide bands of noise when the center frequencies were 250, 2000, and 4000 Hz. Measurements were also made for a broadband noise (125–8000 Hz). All measurements were made in a near open field in the front azimuthal hemifield.

2.1. Methods

Listeners. Twelve young (19–32 years of age, 9 females) listeners who reported normal hearing were used as subjects. All of the procedures reported in this paper were approved by the Arizona State University Institutional Review Board for the Protection of Human Subjects (IRB).

Stimuli. Noise bursts of 200-ms duration, shaped with cosine-squared, 20-ms rise-fall times and presented at 25, 45, 65, and 85 dB SPL in overall level, were used as the main stimuli. The noise bursts were filtered by three-pole Butterworth filters (implemented in MATLAB). The two-octave wide set of conditions comprised four frequency conditions: low frequency [center frequency (CF) of 250 Hz], mid frequency (CF of 2000 Hz), high frequency (CF of 4000 Hz), and broadband (125–8000 Hz). Localization accuracy was also measured for 1/10th octave wide noise bursts at the same three CFs (250, 2000, and 4000 Hz). These stimulus conditions are identical to those used by Yost and Zhong (2014).
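The band edges and gating described above can be sketched as follows. This is a minimal illustration, not the author's MATLAB code: it assumes that each band is centered geometrically (in octaves) on its CF, and the function names `band_edges` and `cosine_squared_gate` are hypothetical. The Butterworth filtering itself is not reproduced here.

```python
import math

def band_edges(cf_hz, width_octaves):
    """Return (low, high) band edges in Hz for a band of the given width
    in octaves, assuming the band is centered geometrically on cf_hz."""
    half = width_octaves / 2.0
    return cf_hz * 2.0 ** (-half), cf_hz * 2.0 ** half

def cosine_squared_gate(duration_ms, ramp_ms, fs=44100):
    """Amplitude envelope with cosine-squared onset and offset ramps.
    Assumes the burst is at least twice as long as one ramp."""
    n_total = round(duration_ms * fs / 1000.0)
    n_ramp = round(ramp_ms * fs / 1000.0)
    env = [1.0] * n_total
    for i in range(n_ramp):
        # sin^2 rises smoothly from 0 toward 1 (a raised-cosine ramp)
        gain = math.sin(0.5 * math.pi * i / n_ramp) ** 2
        env[i] = gain                 # onset ramp
        env[n_total - 1 - i] = gain   # mirrored offset ramp
    return env

# Two-octave bands at the three CFs used in experiment I
for cf in (250, 2000, 4000):
    low, high = band_edges(cf, 2.0)
    print(cf, round(low), round(high))

# Envelope for a 200-ms burst with 20-ms ramps at 44 100 samples/s
envelope = cosine_squared_gate(200, 20)
print(len(envelope))  # 8820
```

Under the geometric-centering assumption, the two-octave bands come out as 125–500, 1000–4000, and 2000–8000 Hz, which matches the frequency ranges quoted in the Introduction.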

Listening environment. Details of the listening environment have been described in Yost et al. (2015), Yost and Zhong (2014), and Yost et al. (2013). In general, the listening room is 10′ × 15′ and lined on all six surfaces with acoustic foam, providing a room with a 102-ms wideband reverberation time (RT60) and an ambient noise level of 31 dBA. Eleven of the possible 24 loudspeakers (Boston Acoustics Soundware 100 loudspeakers at 15° spacing, 75° left of center to 75° right of center) presented sounds. The loudspeakers were located on a five-foot azimuth circle with the listener in the middle, and they produced sound at pinna height. All sounds were generated at 44 100 samples/s and presented via a 24-channel sound system (Echo Gina, Santa Barbara, CA).

Procedure. The procedure was identical to that used in Yost et al. (2013). Listeners identified which of thirteen loudspeakers [from 90° left (loudspeaker #13) to 90° right (loudspeaker #1)] presented sound, although only eleven loudspeakers actually presented sound. Listeners did not know that the loudspeakers at each end of the array (at ±90°) did not produce sound. Sound source localization accuracy was first measured for the two-octave wide noise bursts. The order of presentation among the sixteen different sounds (four levels by four filter conditions) and eleven loudspeaker locations was randomized with each stimulus/loudspeaker combination being presented equally often.

Then, sound source localization accuracy was measured using the same procedure and listeners for the 1/10th octave wide noise bursts. The order of presentation among the twelve different sounds (four levels by three filter conditions) and eleven loudspeaker locations was randomized with each stimulus/loudspeaker condition being presented equally often.

Each individual sound (sixteen sounds for the two-octave wide conditions and twelve sounds for the 1/10th octave conditions) was presented 20 times from each of the eleven loudspeaker locations. No feedback was provided. At the beginning of each presentation, listeners were instructed to look at a red dot on the center loudspeaker (#7) throughout the stimulus presentation to keep their heads facing forward. Listeners' head position was closely monitored, and listeners rarely needed to be admonished for moving their heads. As mentioned in previous work (Yost et al., 2013; Yost and Zhong, 2014), a 200-ms stimulus is too short for the head to move to the location of distal sound sources while the sound is being presented.
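The trial structure described in this procedure can be sketched as below. This is only an illustration under stated assumptions: the sign convention (positive azimuth to the left), the function names, and the use of a single shuffled list are hypothetical, since the text does not say how trials were blocked or how the randomization was implemented.

```python
import random

SPACING_DEG = 15   # angular spacing between adjacent loudspeakers
CENTER = 7         # loudspeaker #7 is straight ahead (0 degrees)

def azimuth_deg(speaker):
    """Azimuth of a numbered loudspeaker (#1..#13).
    Assumed convention: positive values are to the listener's left."""
    return (speaker - CENTER) * SPACING_DEG

def build_trials(n_sounds, reps=20, active=range(2, 13), seed=None):
    """Shuffled list of (sound_index, speaker) pairs in which every
    sound/loudspeaker combination occurs exactly `reps` times."""
    trials = [(sound, speaker)
              for sound in range(n_sounds)
              for speaker in active
              for _ in range(reps)]
    random.Random(seed).shuffle(trials)
    return trials

two_octave = build_trials(16, seed=1)    # 4 levels x 4 filter conditions
tenth_octave = build_trials(12, seed=1)  # 4 levels x 3 filter conditions
print(len(two_octave), len(tenth_octave))  # 3520 2640
print(azimuth_deg(1), azimuth_deg(7), azimuth_deg(13))  # -90 0 90
```

Only loudspeakers #2 through #12 appear as sources here, while #1 and #13 remain available as response choices, mirroring the design in which the end loudspeakers never presented sound.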

2.2. Results and discussion

Figure 1 displays the data of experiment I in terms of mean (twelve listeners) rms error in degrees as a function of overall stimulus level in dB SPL, with the different functions representing different CF conditions. The data for the two-octave wide bands of noise are on the left and those for the 1/10th octave wide bands are on the right. The error bars are ± one standard deviation.
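For readers unfamiliar with the measure, a generic rms error in degrees can be computed as sketched below. This is the textbook definition applied to target and response azimuths. The exact formula used in these experiments follows Yost et al. (2013) and is not reproduced in this paper, so the function name and the toy data are assumptions for illustration only.

```python
import math

def rms_error_deg(targets_deg, responses_deg):
    """Root-mean-square difference, in degrees, between the azimuth of the
    loudspeaker that presented each sound and the azimuth of the
    loudspeaker the listener identified."""
    if not targets_deg or len(targets_deg) != len(responses_deg):
        raise ValueError("need equal-length, non-empty lists")
    squared = [(r - t) ** 2 for t, r in zip(targets_deg, responses_deg)]
    return math.sqrt(sum(squared) / len(squared))

# Toy data (hypothetical): two correct responses, two off by one loudspeaker
targets = [0, 0, 15, 15]
responses = [0, 15, 15, 30]
print(round(rms_error_deg(targets, responses), 2))  # 10.61
```

Because the loudspeakers are 15° apart, a single one-loudspeaker error contributes 15° to the squared-error sum, so rms errors well below 15° imply that most responses were exactly correct.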

Fig. 1.

(Color online) Mean (12 listeners) rms errors in degrees as a function of overall level in dB SPL for the filtered conditions. Left figure: Data for 2-octave wide noise bands (CFs of 250, 2000, and 4000 Hz) and a broadband (BB) condition. Right figure: Data for the 1/10th octave filter conditions (at the three CFs). Error bars are ± one standard deviation.

Two two-way repeated-measures analyses of variance (ANOVAs) were performed, one for the two-octave wide conditions and one for the 1/10th octave conditions. The focus of experiment I is on the effect of overall sound level, given that previous work (Yost and Zhong, 2014) has already shown a large and statistically significant difference in rms error between two-octave and 1/10th octave bandwidth noise stimuli (for all three CFs), and the results of this study as a function of bandwidth are very similar to those of Yost and Zhong (2014). For each ANOVA the two main variables were CF and overall level. For the two-octave wide noise bands (left panel of Fig. 1), neither the CF main effect [F(2,22) = 2.8] nor the overall level main effect [F(3,33) = 1.9] was statistically significant at a 0.05 level of significance, and there was no statistically significant interaction [F(6,66) = 1.2]. For the 1/10th octave wide noise bands (right panel of Fig. 1), the CF main effect was statistically significant [F(2,22) = 9.8] at a 0.05 level of significance, but neither the overall level main effect [F(2,22) = 2.1] nor the interaction [F(4,44) = 1.6] was statistically significant.

Thus, rms error in a sound source localization identification task does not appear to depend on overall level over a 60-dB range for two-octave and 1/10th octave wide noise bursts. The CF of the noise did statistically affect rms errors for the 1/10th octave wide noise bursts, but not for the two-octave wide noise bursts, as shown previously (Yost and Zhong, 2014 and Yost et al., 2013). And, as shown in earlier studies (e.g., Yost and Zhong, 2014), rms errors are greater for 1/10th octave wide noise bursts than for two-octave wide noise bursts.

If ITD processing is used primarily to determine sound source localization accuracy for low-frequency sounds and ILD processing primarily for high-frequency sounds, then these results suggest that overall sound level (over the range from 25 to 85 dB SPL) does not differentially affect the use of ITD and/or ILD cues in determining sound source localization azimuth accuracy for filtered noises in the open field.

Overall level was measured in sound pressure level (SPL, re 20 μPa), meaning that the noises probably differed in loudness as a function of changes in filter center frequency and/or noise bandwidth. Loudness level would probably change very little for 250-Hz to 4000-Hz center-frequency noise bursts that are as wide as two octaves. But a 1/10th octave noise is softer than a two-octave wide noise at the same overall level, and there is a small loudness/overall level interaction between center frequency and noise bandwidth (see Florentine, 2011). However, these changes in loudness are small in comparison to the 60-dB range of overall level used in the experiment. Thus, it is highly unlikely that presenting sounds at different loudness levels rather than at different SPLs would change the result that overall level (loudness) does not appear to affect sound source localization accuracy in the front azimuthal field over a range of overall levels that are well above threshold and that are not uncomfortably loud.
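To give a sense of how large the tested level range is, the standard definition of SPL (re 20 μPa) can be inverted to obtain rms sound pressure. The sketch below is a worked illustration only, and the function name is hypothetical.

```python
P_REF_PA = 20e-6  # reference pressure for dB SPL, 20 micropascals

def spl_to_pascal(level_db_spl):
    """rms sound pressure in pascals for a level in dB SPL (re 20 uPa)."""
    return P_REF_PA * 10 ** (level_db_spl / 20.0)

lowest = spl_to_pascal(25)    # lowest level in experiment I
highest = spl_to_pascal(85)   # highest level in experiment I
print(round(highest / lowest))  # 1000
```

A 60-dB range therefore corresponds to a 1000-fold change in sound pressure (a million-fold change in intensity), which is why the comparatively small loudness differences across bandwidths and CFs are unlikely to matter here.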

3. Experiment II: Noise duration

Sound source localization accuracy was measured as a function of duration for filtered noise bursts using the same procedures described in experiment I.

3.1. Method

Listeners. Twelve young (19–37 years of age, 8 females) listeners who reported normal hearing were used as subjects. None of the listeners in experiment II participated in experiment I.

Stimuli. Noise bursts of 60 dB SPL of overall level at durations of 25, 150, and 450 ms were used. All noise bursts were shaped with cosine-squared, 20-ms rise-fall times. The noise bursts were filtered exactly as in experiment I.

Listening Environment and Procedures. Same as in experiment I.

3.2. Results and discussion

Figure 2 displays the data of experiment II as mean (twelve listeners) rms error in degrees as a function of stimulus duration in ms, with the different functions representing different CF conditions. The data for the two-octave wide bands of noise are on the left and those for the 1/10th octave wide bands are on the right. The error bars are ± one standard deviation.

Fig. 2.

(Color online) Mean (12 listeners) rms errors in degrees as a function of stimulus duration in ms for the filtered conditions. Left figure: Data for 2-octave wide noise bands (CFs of 250, 2000, and 4000 Hz) and a BB condition. Right figure: Data for the 1/10th octave filter conditions (at three CFs). Error bars are ± one standard deviation.

Two two-way repeated-measures ANOVAs were performed, one for the two-octave wide noise conditions and one for the 1/10th octave noise conditions (as in experiment I). For each ANOVA the two main variables were CF and stimulus duration. For the two-octave wide noise bands (left panel of Fig. 2), neither the CF main effect [F(3,33) = 2.8] nor the stimulus duration main effect [F(2,22) = 2.1] was statistically significant at a 0.05 level of significance, and there was no significant interaction [F(6,66) = 2.7]. For the 1/10th octave wide noise bands (right panel of Fig. 2), the CF main effect was statistically significant [F(2,22) = 8.1] at a 0.05 level of significance, but neither the stimulus duration main effect [F(2,22) = 2.0] nor the interaction [F(4,44) = 2.1] was statistically significant.

Thus, rms error in a sound source localization identification task does not appear to depend on duration for two-octave and 1/10th octave wide noise bursts. The CF of the noises did statistically affect rms errors for the 1/10th octave wide noise bursts, but not the two-octave wide noise bursts, as shown previously (Yost and Zhong, 2014) and in experiment I. And, as shown in previous studies (e.g., Yost and Zhong, 2014) and in experiment I, rms errors are greater for 1/10th octave wide noise bursts than for two-octave wide noise bursts.

In all conditions, rms errors were lower when the sources were in front of the listeners (e.g., 0° azimuth) than when they were at the sides (e.g., 75° azimuth). In Fig. 3 the rms errors for each of the eleven loudspeaker locations were averaged across listeners, levels, and durations for the three filtered conditions for the noise stimuli and plotted as a function of loudspeaker location (#2: −75° right, #7: 0°, #12: 75° left). The two-octave wide data are shown on the left and the 1/10th octave data on the right of Fig. 3. Listeners did occasionally identify loudspeakers #1 and #13 at the ends of the azimuth array as presenting a sound. There were a total of 31 responses identifying loudspeaker #1 and 26 responses identifying loudspeaker #13 aggregated across all conditions, as identifying loudspeakers #1 and #13 did not depend on which conditions were tested. Recall that no sound was presented from loudspeakers #1 and #13 to avoid edge effects as described by Hartmann et al. (1998). The results shown in Fig. 3 are consistent with those obtained in previous studies (Yost et al., 2013; Yost and Zhong, 2014) using similar stimuli and procedures. These results are also consistent with the literature (e.g., Mills, 1972) that indicates higher acuity when the source is in front of a listener rather than off to the side. The data also suggest that listeners did not move their heads to try to gain an advantage by having the sound sources near the front of the head where acuity is highest (Mills, 1972).
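A per-location breakdown like the one plotted in Fig. 3, together with the tally of responses to the silent end loudspeakers, could be computed along the following lines. This is a hedged sketch: the function names and the toy trials are hypothetical, loudspeaker numbers are converted to degrees using the 15° spacing, and the averaging across listeners used for the figure is not shown.

```python
import math
from collections import defaultdict

def rms_by_location(trials, spacing_deg=15):
    """rms error in degrees for each source loudspeaker.
    `trials` holds (target_speaker, response_speaker) number pairs."""
    squared = defaultdict(list)
    for target, response in trials:
        squared[target].append(((response - target) * spacing_deg) ** 2)
    return {spk: math.sqrt(sum(vals) / len(vals))
            for spk, vals in sorted(squared.items())}

def end_responses(trials, ends=(1, 13)):
    """Count how often each silent end loudspeaker was chosen."""
    return {end: sum(1 for _, resp in trials if resp == end) for end in ends}

# Toy data (hypothetical), not taken from the experiments
toy = [(7, 7), (7, 7), (12, 12), (12, 13), (2, 1), (2, 2)]
print({spk: round(err, 2) for spk, err in rms_by_location(toy).items()})
print(end_responses(toy))  # {1: 1, 13: 1}
```

Keeping #1 and #13 as valid responses lets errors at the outermost sources (#2 and #12) fall on either side of the target, which is the edge-effect concern the design addresses.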

Fig. 3.

(Color online) Mean (24 listeners) rms errors in degrees as a function of the eleven loudspeaker locations (2: −75° right, 7: 0° center, 12: 75° left). Left figure: Data for the 2-octave wide filter conditions (CFs of 250, 2000, and 4000 Hz). Right figure: Data for the 1/10th octave filter conditions (CFs of 250, 2000, and 4000 Hz). Data are averaged over duration, overall level, and listeners. Error bars are ± one standard deviation.

4. Overall discussion

There are very few data involving sound source localization in the front half of the azimuth plane in an open field when duration and overall level have been systematically varied. Blauert (1997) reports that almost no studies have investigated the effect of duration on sound source localization. Dubrovsky and Chernyak (1971) could not find evidence for an effect of duration on sound source localization performance. There does not appear to be a parametric study of the effects of overall level on sound source localization accuracy in the front azimuthal field. While different studies of sound source localization are run at different overall levels, other differences among the studies make comparing the results based on level differences difficult, if not impossible. Vliegen and Van Opstal (2004) did investigate sound source localization of noise bursts as a function of duration and overall level. The primary goal of their paper involved measuring elevation gain (the relationship between the perceived elevation of sound sources and the actual elevation of the sources). However, they also measured azimuth gain. Their data indicated no effect of duration on azimuthal sound source localization accuracy over the range of 3 to 30 ms, and a very small effect of overall level for only a few listeners over a range of 33 to 73 dB SPL.

The effect of duration and level on lateralization of tones of different frequencies was studied by Yost (1981). In this study the lateral position of tones determined by ITDs and by ILDs did not depend on tonal duration (20, 100, and 500 ms) or overall level (30, 50, and 70 dB SPL) at the various frequencies tested (200, 500, 750, 1000, and 1500 Hz for ITDs, and 200, 500, 1000, 2000, and 5000 Hz for ILDs).

The data of the current paper appear to agree with the few data in the literature regarding sound source localization accuracy in the azimuth plane in an open field and for lateralization judgments of the lateral position of perceived images for headphone generated sounds. That is, sound source localization accuracy and sound lateralization position performance do not appear to depend on overall stimulus level and duration. This is not the case for measures of lateral discrimination of changes in ITDs and ILDs. Yost and Hafter (1987) reviewed the literature pertaining to measures of ITD and ILD discrimination thresholds for tones, noises, and transients as a function of stimulus duration and the number of transients in a click train. Most of the data pertained to ITD discrimination measures in which ITD discrimination thresholds decrease with increasing duration, often in a linear manner on log-log coordinates. This is consistent with an integration of binaural information over time as explained by Yost and Hafter (1987) and in the articles cited in their review. The current lateralization literature confirms the hypothesis of binaural information integration over time for ITD and ILD discrimination in lateralization tasks for a variety of simple and complex stimuli, although the exact form of the information integration is still being investigated (e.g., see Stecker, 2014).
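A threshold that falls linearly with duration on log-log coordinates is a power law, threshold = k · T^(−a), whose slope on log-log axes is −a. The sketch below only illustrates that relationship; the constants `k` and `a` are arbitrary placeholders and are not values reported by Yost and Hafter (1987) or by any study cited here.

```python
import math

def threshold(duration_ms, k=100.0, a=0.5):
    """Hypothetical power-law discrimination threshold.
    k and a are illustrative placeholders, not measured values."""
    return k * duration_ms ** (-a)

def loglog_slope(d1_ms, d2_ms, func=threshold):
    """Slope of func between two durations on log-log coordinates."""
    rise = math.log10(func(d2_ms)) - math.log10(func(d1_ms))
    run = math.log10(d2_ms) - math.log10(d1_ms)
    return rise / run

# The slope is the same over any pair of durations: a straight line
print(round(loglog_slope(10, 100), 6), round(loglog_slope(25, 450), 6))
```

With these placeholder constants, quadrupling the duration halves the threshold, which is the kind of improvement with duration that the localization rms errors in the present experiments do not show.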

While ILD and, especially, ITD discrimination thresholds clearly decrease with increasing stimulus duration, sound source localization accuracy in an open field does not appear to change with stimulus duration. The two types of measures, interaural discrimination involving headphone-delivered stimuli and sound source localization accuracy in an open field, are clearly different measures. It is difficult to measure lateralization accuracy with headphone-delivered stimuli in the same way sound source localization accuracy is measured. And it is not possible to measure only ITD or ILD discrimination in an open field. Thus, a direct comparison of interaural (ITD or ILD) discrimination between localization and lateralization tasks would be a challenging undertaking. It seems logical to assume that if duration affects the ability to discriminate a change in interaural time (ITD) or level (ILD), then duration should affect sound source localization accuracy, assuming sound source localization accuracy depends on ITD and ILD processing, as is the basic tenet of the duplex theory of sound source localization in the azimuth plane (Rayleigh, 1907, and Stevens and Newman, 1936). However, the data collected to date do not support this assumption. We can only suggest that the differences between the two types of measurements be carefully examined to further understand the role of stimulus duration in spatial hearing.

As has been acknowledged in previous work using localization identification paradigms (Hartmann et al., 1998; Yost et al., 2013; Yost and Zhong, 2014), identification tasks may overestimate sound source localization acuity compared to other sound source localization tasks in which listeners do not have visual or other non-acoustic information about the possible location of the sound sources whose location they are judging. Thus, rms errors may be greater when such localization tasks are used, but there does not appear to be any reason why overall level or stimulus duration would affect sound source localization accuracy differently depending on the type of task used to measure it.

In summary, while stimulus bandwidth and, for narrow bandwidth stimuli, frequency region may have a significant influence on sound source localization accuracy (see Yost et al., 2013 and Yost and Zhong, 2014), it does not appear that overall level or stimulus duration affects sound source localization accuracy. The current data involved only filtered noise bursts, the overall levels and durations were limited to specific ranges, and the sound source locations were limited to the front azimuthal field. So it is not certain that the results would generalize to other stimuli and spatial configurations.

Acknowledgments

This research was supported by a grant from the Air Force Office of Scientific Research (AFOSR). The assistance of Dr. Xuan Zhong and input from Dr. Yi Zhou and Dr. Torben Pastore are much appreciated. Some of these data were presented at the ASA/ICA meeting in Montreal, Canada, in 2013. The assistance of Anbar Najam, Lab Coordinator, is appreciated.

References and links

  • 1. Blauert, J. (1997). Spatial Hearing (MIT Press, Cambridge, MA).
  • 2. Dubrovsky, N. A., and Chernyak, R. R. (1971). “The size and the localization of noise images of different durations of noise,” 7th International Congress on Acoustics, Budapest, 25H4.
  • 3. Florentine, M., Popper, A. N., and Fay, R. R. (2011). Loudness (Springer, New York).
  • 4. Hartmann, W. M., Rakerd, B., Joseph, B., and Gaalaas, J. B. (1998). “On the source-identification method,” J. Acoust. Soc. Am. 104, 3546–3557. doi: 10.1121/1.423936
  • 5. Mills, A. W. (1972). “Auditory localization,” in Foundations of Modern Auditory Theory, edited by J. V. Tobias (Academic, New York), Vol. 2, Chap. 8.
  • 6. Rayleigh, J. W. Strutt (1907). “On our perception of sound direction,” Philos. Mag. 136, 56–464.
  • 7. Shub, D. E., Carr, S. P., Kong, Y., and Colburn, H. S. (2008). “Discrimination and identification of azimuth using spectral shape,” J. Acoust. Soc. Am. 124, 3132–3141. doi: 10.1121/1.2981634
  • 8. Stecker, G. C. (2014). “Temporal weighting functions for interaural time and level differences: Effects of carrier frequency,” J. Acoust. Soc. Am. 136, 3221–3331. doi: 10.1121/1.4900827
  • 9. Stevens, S. S., and Newman, E. B. (1936). “The localization of actual sources of sound,” Am. J. Psychol. 48, 297–306. doi: 10.2307/1415748
  • 10. Vliegen, J., and Van Opstal, A. J. (2004). “The influence of duration and level on human sound localization,” J. Acoust. Soc. Am. 115, 1705–1713. doi: 10.1121/1.1687423
  • 11. Yost, W. A. (1981). “Lateral position of sinusoids presented with interaural intensive and temporal differences,” J. Acoust. Soc. Am. 70, 397–409. doi: 10.1121/1.386775
  • 12. Yost, W. A., and Hafter, E. (1987). “Lateralization of simple stimuli,” in Directional Hearing, edited by W. A. Yost and G. Gourevitch (Springer-Verlag, Berlin), Chap. 3.
  • 13. Yost, W. A., Loiselle, L., Dorman, M., Brown, C., and Burns, J. (2013). “Sound source localization of filtered noises by listeners with normal hearing: A statistical analysis,” J. Acoust. Soc. Am. 133, 2876–2882. doi: 10.1121/1.4799803
  • 14. Yost, W. A., and Zhong, X. (2014). “Sound source localization identification accuracy: Bandwidth dependencies,” J. Acoust. Soc. Am. 136, 2737–2746. doi: 10.1121/1.4898045
  • 15. Yost, W. A., Zhong, X., and Najam, A. (2015). “Judging sound rotation when listeners and sound rotate: Sound source localization is a multisensory process,” J. Acoust. Soc. Am. 138, 3293–3308. doi: 10.1121/1.4935091
