Journal of Otology
. 2026 Feb 6;21(1):27–34. doi: 10.26599/JOTO.2026.9540049

Above, Below, and Beyond: Distinct Vertical‑Plane Localization Profiles in Normal Hearing Listeners

Harshada Mali 1,*, Nisha KV 1
PMCID: PMC12945657  PMID: 41766842

Abstract

Background and Objectives: The perception of sound in the vertical plane supports spatial hearing by enabling listeners to detect sources located above and below. Sounds originating from both the front and back elevations along the mid-sagittal plane further contribute to a three-dimensional auditory experience. This study aimed to characterize the variability in vertical sound localization abilities among normal-hearing (NH) individuals using spatialized audio.

Materials and Methods: Fifty-one NH participants (aged 18 to 35 years) completed three vertical localization tasks under headphones as part of a single-group, within-subject experimental study. These comprised two plane-identification tasks, (1) top-bottom localization and (2) front-back localization, and one discrimination task in the front plane. Hierarchical cluster analysis (HCA) was employed to identify distinct patterns in spatial localization profiles specific to the vertical-median plane. Fisher's discriminant function analysis (FDA) was used to validate the accuracy of the HCA and to estimate classification error.

Results: HCA revealed three distinct listener clusters: (1) cluster 1 with good performance across all three tasks, (2) cluster 2 with selective impairment in top-bottom identification, and (3) cluster 3 with selective deficits in front-back identification. FDA validated group membership of the clusters identified by the HCA, with a prediction accuracy of 98%.

Conclusions: Individuals with clinically normal hearing exhibited three distinct vertical localization profiles: uniform performers, those impaired in top-bottom identification, and those impaired in front-back identification. These profiles may be linked to the interplay between acoustic and non-acoustic perceptual factors.

Keywords: Vertical-plane sound localization, Spatial Hearing, Localization accuracy, Virtual Assessment, Discrimination.

1. Introduction

Vertical localization refers to the ability to determine the elevation of sound sources in an auditory field (Risoud et al., 2018). Auditory skills can be influenced by various factors, including age (Otte et al., 2013), sensory deficits (Lewald, 2002), and environmental conditions (Getzmann, 2003). While differences in the time of arrival and intensity of sounds between the two ears provide critical information for localizing sound sources along the horizontal plane (Middlebrooks, 2015), the spectral cues created by the pinna and torso help to determine the vertical position of a sound (Macpherson and Sabin, 2013). Spatial abilities have been studied extensively in the horizontal plane (Nisha and Kumar, 2017; Risoud et al., 2018) and the distance dimension (Sitdikov et al., 2023), but relatively few studies have investigated vertical-plane localization (Jiang et al., 2019; Xie et al., 2023). Vertical localization plays a vital role in three-dimensional auditory perception (Middlebrooks, 2015). However, the underlying mechanisms remain poorly understood, limiting both theoretical insights and the development of spatial audio technologies that rely on elevation cues.

Research on sound source elevation perception encompasses a wide range of methods and outcome measures. The methods most commonly used to study localization in the vertical plane involve paradigms in the free field (Talagala et al., 2014) and in the closed field (Rajguru et al., 2022). In either of these paradigms, localization abilities are estimated using one of two protocols: (i) absolute localization/identification, where the listener must locate a sound source directly without any relative reference, and (ii) discrimination, where two sound sources must be distinguished in the auditory signal, either simultaneously or sequentially (Carlini et al., 2024). In absolute localization, a listener-centered approach is used, whereas in discrimination paradigms, a stimulus-centered approach is used. While tasks such as localization accuracy or angular error estimation are used to assess identification ability, paradigms such as minimum audible angle, minimum audible distance, and spatial bisection are used to evaluate spatial discrimination (Finocchietti et al., 2019). The current study employs both listener- and stimulus-centered approaches, with spatial tasks involving absolute identification and discrimination. In absolute localization paradigms, listeners are required to determine the position of a sound relative to their own head or body, making the task inherently listener-centered. In contrast, discrimination paradigms involve the sequential presentation of two sounds that differ or are similar in virtual elevation. Here, participants judge whether the sounds originate from the same or different elevations, making the task stimulus-centered, because it emphasizes the relationship between stimuli rather than their absolute positions relative to the listener. The considerable variability in the methodology and outcome measures in the literature on vertical localization has hindered efforts to establish a clear profile of vertical spatial perception. 
A discrimination paradigm in the free field revealed that the minimum audible angle in the vertical plane was 3.65° in normal-hearing young adults (Grantham et al., 2003). In contrast, Lewald (2002) assessed vertical localization using an angular error estimation paradigm in 10 sighted NH individuals (18 to 40 years) and six congenitally blind NH individuals (18-30 years). For free-field sound sources positioned at an elevation of 31° above and below, the absolute errors ranged from 4.5° to 8.3° in sighted participants and from 6.6° to 40.7° in blind participants, highlighting substantial individual variability.

Recently, researchers have employed virtual auditory displays (VADs) and ambisonics technology to reproduce real-world spatial orientation under headphones (Marmel et al., 2018; Rajguru et al., 2022). However, the findings of these studies have remained inconsistent. Using the identification paradigm in the free field, Otte et al. (2013) reported a mean error of 7.6° for sounds ranging between -55° and +55° elevation in NH young adults (24.9 ± 4.9 years). More recently, Rajguru et al. (2022) documented substantially larger errors of 18° in elevation, using a similar absolute identification paradigm for sources in the vertical plane. Drawing clear inferences about vertical localization abilities in NH listeners remains challenging because of methodological variability across existing studies, which often employ diverse tasks, stimuli, and outcome measures. To date, no studies have systematically examined how individuals vary in their ability to localize sounds in the vertical plane, especially when different types of listening tasks are used. To address these gaps, the present study assessed the vertical localization abilities of NH young adults using two types of tasks: sound source identification, which measures the ability to judge the absolute location of a sound, and discrimination, which measures the ability to detect small differences in elevation between sounds. This provides a profile of how vertical spatial cues are processed under each task by different individuals.

Although angular error provides a detailed measure of the overall resolution in terms of localization accuracy, it can obscure direction-specific trends by averaging across all spatial locations. Consequently, it may not reveal localized performance patterns, such as a listener consistently misjudging sounds from a specific plane. To address this, we used plane-specific accuracy metrics to capture directional biases and individual variability more precisely.

2. Methods

2.1. Research design

A single-group, within-subject design with purposive sampling was used, in which all participants completed three vertical localization tasks (Orlikoff et al., 2014).

2.2. Participants

Fifty-one adults with clinically normal hearing sensitivity (pure-tone average of 0.5, 1, 2, and 4 kHz < 15 dB HL; Clark, 1981), aged 18-35 years (mean: 25.03; SD: 3.49; females and males), participated in the study. Participants were students and working staff from the institution, all of whom were recruited through voluntary participation. The study did not include individuals with middle ear disorders, sensorineural hearing loss, neurological conditions, or cognitive deficits. Individuals with more than two years of musical training were excluded, as musical experience is known to enhance spatial hearing abilities (Paromov et al., 2025).

2.3. Informed consent and ethical approval

Ethical committee approval was obtained from the Institutional Review Board (No. SD/IRB/M.1-20-2024-25 dated 23rd December, 2024). All participants provided written informed consent prior to inclusion in the study.

2.4. Instrumentation

The spatialized stimuli for all tests were routed through a MOTU MicroBook IIc audio interface (MOTU Inc., Cambridge, Massachusetts, USA) connected to a personal laptop (Dell G15, 12th Gen Intel Core i5-12500H, 2.50 GHz, 16 GB RAM, Windows 11). In all tests, spatialized stimuli were presented through Sennheiser HD 569 headphones (Wedemark, Germany) at a presentation level of 45 dB SPL. This was considered an important step because localization accuracy in the vertical median plane decreases as the intensity increases above 45 dB SPL (Marmel et al., 2018). All test stimuli were calibrated using a sound-level meter (Brüel & Kjær model 2270) connected to a microphone placed in the ear of the manikin (Knowles Electronics Manikin for Acoustic Research, KEMAR, type 45 BA; G.R.A.S. Sound & Vibration, Holte, Denmark).

2.5. Generation of Spatial Stimuli for Vertical Plane Localization

Three experimental paradigms were developed for this study. All spatialized auditory stimuli were generated and processed using two virtual rendering platforms: MATLAB R2024a (The MathWorks, Natick, MA) and Sound Lab (slab3d) version 6.7.3 (NASA, USA, 2012) (Miller, 2012). Slab3d was used specifically to generate stimuli for the top-bottom identification task, providing head-related spatial cues through its built-in rendering environment. The use of slab3d for spatial experiments has been validated in recent literature on the virtual rendering of sound sources (Sampath et al., 2023). For the front-back identification and vertical discrimination tasks, spatialization was implemented using MathWorks' ambisonics binaural decoding toolbox (MathWorks Inc., 2022). Ambisonics rendering has been advocated for vertical localization experiments because it enables spatial audio simulations spanning elevations from 0° to 180° (front to back in the median plane) (Wiggins and Wiggins, 2017). Slab3d, in contrast, is limited to elevations of ±90° (top to bottom), making it unsuitable for assessing front-back localization across the entire vertical range. The MATLAB-based ambisonics toolbox employs non-individualized ambisonics HRTFs developed by the Institute of Sound and Vibration Research (ISVR) (University of Southampton, UK). These HRTFs are derived from spherical harmonic decomposition and support accurate spatial cue reproduction in elevation and azimuth under headphones (Schillebeeckx et al., 2001; Wiggins and Wiggins, 2017). White noise bursts of 1000 ms duration were generated at a sampling rate of 48 kHz, and spatial filtering was applied separately to each channel using frequency-domain FIR filtering. The reference SOFA file included in the toolbox was used for HRTF filtering. Custom MATLAB scripts allowed the input of specific spatial coordinates, and the final stimuli were saved as ".wav" files. All stimuli were calibrated to 45 dB SPL.
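As a rough illustration of this stimulus pipeline, the sketch below generates a 1000 ms white-noise burst at 48 kHz and applies per-channel frequency-domain FIR filtering. The two impulse responses are random placeholders, not the ISVR HRTFs used in the study, and the script names nothing from the actual toolbox.

```python
import numpy as np
from scipy.signal import fftconvolve

FS = 48_000                 # sampling rate (Hz), as in the study
DUR = 1.0                   # burst duration (s)

rng = np.random.default_rng(42)
noise = rng.standard_normal(int(FS * DUR))       # broadband white-noise burst

# Placeholder FIR "HRTFs" for the left/right channels (illustrative only;
# the study loaded ISVR ambisonics HRTFs from a reference SOFA file).
hrir_left = rng.standard_normal(256) * np.hanning(256)
hrir_right = rng.standard_normal(256) * np.hanning(256)

# Frequency-domain FIR filtering applied separately to each channel
left = fftconvolve(noise, hrir_left, mode="full")
right = fftconvolve(noise, hrir_right, mode="full")
stereo = np.stack([left, right], axis=1)

# Normalize to avoid clipping before writing the stimulus to a .wav file
stereo /= np.max(np.abs(stereo))
```

In practice the filtered stereo array would then be written out at 48 kHz and level-calibrated to 45 dB SPL, as described above.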

3. Procedure

3.1. Virtual Auditory Space Identification- Vertical Plane (VASI-V)

Figure 1 shows a schematic representation of the elevations and orientations of the planes used in the VASI-V. The left and right panels depict the two subtests: (a) the front-back identification paradigm and (b) the top-bottom identification paradigm. For the front-back plane identification task, virtual sound sources were generated at five locations spanning three planes: front (10° and 45°), top (90°), and back (135° and 170°) (Fig. 1A). For the top-bottom plane identification task, virtual sound sources were generated at five elevations: +90° and +45° (top), 0° (front), and -45° and -90° (bottom) (Fig. 1B).

Figure 1.

Figure 1

Schematic representation of the VASI-V paradigm. Upper panels: Graphical user interfaces (GUIs) of the two identification tasks used to record participant responses on (left) front-back plane identification and (right) top-bottom (vertical) plane identification. Lower panels: Spectral profiles of the sound stimuli used in the front–back plane identification test (left) and the top–bottom (vertical) plane identification test (right)

Familiarization and test phases. Both subtests included a familiarization phase comprising stimulus and task familiarization subphases. In the stimulus familiarization subphase, participants were allowed to explore and become acquainted with the spatial characteristics of each virtual source. They were encouraged to interact with the graphical user interface (GUI) by clicking on the source locations with the mouse; after each click, the sound corresponding to that location was played (Fig. 1). In the task familiarization subphase, the stimulus was first played, and participants registered their responses via a mouse click on the location that emitted the sound. No more than 10 practice trials per location were included in this subphase. Feedback was provided after each response to reinforce learning. Participants were also given the option to repeat the stimulus familiarization subphase if they felt additional exposure was needed. Neither subphase was scored. The test phase was administered immediately after familiarization.

In the test phase, each of the five simulated elevations was presented 10 times, resulting in 50 randomized trials. Participants were required to identify the perceived elevation of each sound with a mouse click on the same GUI, without feedback (Fig. 1). An interval of 1000 ms was maintained between each response and the subsequent trial. Trials were logged to an Excel sheet containing the order of stimuli presented and the corresponding responses. Each subtest lasted approximately 15-20 minutes.

Methodological Considerations for Scoring. The midline elevations (0°, 90°) were included in the testing across paradigms to maintain perceptual symmetry and reduce positional bias, although they were excluded from scoring to avoid confounding from non-target planes. Midline spatial positions such as 90° in the front-back plane and 0° in the top-bottom plane were excluded from the cluster analysis because of their reduced directional specificity. These positions are often associated with ambiguous auditory cues (Wightman and Kistler, 1999).

Scoring framework. The scoring system employs a planewise approach that tolerates in-plane errors while requiring strict accuracy for cross-plane discrimination. Only the responses corresponding to the target planes were included in the final analyses. In the front–back plane identification task, elevations corresponding to the front (10°, 45°) and back (135°, 170°) planes were used for planewise scoring, whereas in the top-bottom plane identification task, only the top (+45°, +90°) and bottom (-45°, -90°) elevations were included.

The participants received credit for correct responses when identifying sounds from front elevations (10° and 45°) or back elevations (135° and 170°), even when they confused one front angle with another within the same plane. For example, if a sound presented at 10° was identified as 45°, it was scored as correct because both angles represented the front plane. However, any confusion between the front and back planes resulted in errors, such as identifying a 10° front sound as a 135° back sound. The top-bottom plane identification task employed a similar in-plane tolerance approach. Only elevations representing the top plane (45° and 90°) and bottom plane (-45°, -90°) were included in the analysis.

Each correct identification of the sound source elevation within the target plane was awarded a score of 1, and incorrect responses were scored as zero. The participants could interchange angles within the same plane without loss of score; however, cross-plane errors between the top and bottom positions were scored as “incorrect”, i.e., zero. The final score for each subtest represented the sum of the correctly identified target-plane stimuli, with a possible range of 0-40. These scores were used as indices of the vertical plane identification accuracy in the respective spatial domains.
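A minimal sketch of this planewise scoring rule follows. The dictionaries and the `planewise_score` function are illustrative names, not the study's actual scripts; midline elevations map to `None` and are skipped, in-plane confusions score 1, and cross-plane confusions score 0.

```python
# Elevation (degrees) -> plane label; None marks midline positions
# that are presented but excluded from scoring.
FRONT_BACK = {10: "front", 45: "front", 90: None, 135: "back", 170: "back"}
TOP_BOTTOM = {90: "top", 45: "top", 0: None, -45: "bottom", -90: "bottom"}

def planewise_score(trials, plane_map):
    """trials: iterable of (presented, responded) elevations in degrees.
    In-plane confusions count as correct; cross-plane confusions score 0."""
    score = 0
    for presented, responded in trials:
        target_plane = plane_map.get(presented)
        if target_plane is None:          # midline trial: not scored
            continue
        if plane_map.get(responded) == target_plane:
            score += 1
    return score

# A 10° sound identified as 45° is correct (both front);
# identified as 135° it is a cross-plane error.
assert planewise_score([(10, 45)], FRONT_BACK) == 1
assert planewise_score([(10, 135)], FRONT_BACK) == 0
```

With 10 trials at each of the four target elevations, summing over the 40 scored trials reproduces the 0-40 range described above.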

3.2. Virtual Auditory Space Discrimination- Vertical Plane (VASD-V).

For the discrimination task, seven stimuli were generated in increments of 15° from 0° to 90°, as depicted in the right panel of Figure 2. The aim was to assess discrimination ability at a finer resolution (15°); a pair of stimuli, varying or similar in elevation, was presented in each trial. Participants judged whether each presented pair was "same" or "different" using a two-alternative forced-choice method. The GUI (Figure 2, left panel) consisted of one play button and two response buttons, "Same" and "Different". The experiment included five catch trials for each identical pair (5 × 7 = 35) and ten test trials for each nonidentical pair (10 × 21 = 210), yielding a total of 245 trials. Before the test, two identical pairs and five nonidentical pairs were randomly presented for task familiarization. Scoring commenced only if the participant correctly responded "same" to all catch trials. Only the nonidentical trials were used for further analysis. Each correct response was scored as 1 and each incorrect response as 0. The overall discrimination accuracy at each elevation was computed.
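The trial structure follows directly from the seven elevations: pairing each elevation with itself gives the identical (catch) pairs, and the unordered combinations give the nonidentical (test) pairs. A minimal sketch:

```python
from itertools import combinations

elevations = list(range(0, 91, 15))            # 0° to 90° in 15° steps -> 7 elevations
identical_pairs = [(e, e) for e in elevations] # catch-trial pairs
different_pairs = list(combinations(elevations, 2))  # test-trial pairs

print(len(identical_pairs))   # 7 identical pairs
print(len(different_pairs))   # 21 nonidentical pairs
```

Repeating each pair type the stated number of times per pair then yields the full trial list.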

Figure 2.

Figure 2

Left panel: Graphical user interface of the discrimination task used in the Virtual Auditory Space Discrimination - Vertical Plane (VASD-V) test. Participants click 'Play' to hear a pair of virtual sound stimuli presented along the vertical plane and then indicate whether the sounds originated from the 'Same' or 'Different' perceived locations by selecting the corresponding button. Right panel: Visual representation of the vertical elevations used in the discrimination task.

Figure 3.

Figure 3

Hierarchical cluster analysis dendrogram of 51 participants generated using Ward's method and squared Euclidean distance. The vertical axis represents individual participants, while the horizontal axis indicates linkage distance, a measure of dissimilarity between clusters. Shorter linkage distances suggest greater similarity between participants or clusters, whereas longer distances indicate greater dissimilarity. Branches joined at higher linkage distances represent clusters that are more distinct from one another.

3.3. Validation

All spatial tests were validated by five experts using a structured questionnaire assessing construct relevance, clarity, ecological validity, and feasibility. The content validity index (CVI) was calculated, with all items showing strong agreement (CVI > 0.80; see Appendix).

3.4. Statistical analysis

Data were analyzed using IBM SPSS Statistics version 26 (IBM Corp., Armonk, NY, USA). The Shapiro-Wilk test of normality was administered. Owing to the differing maximum scores of the three test paradigms (40 each for front-back and top-bottom identification and 210 for discrimination), all scores were standardized to Z scores to enable meaningful comparisons across variables and to normalize the data. Hierarchical cluster analysis (HCA) was conducted using Ward's method (Ward, 1963) with squared Euclidean distance as the dissimilarity metric to explore patterns in participant performance across the three test paradigms. The resulting agglomeration schedule and dendrogram were used to determine the optimal number of clusters. The HCA was then rerun with the number of clusters specified to produce a final solution, assigning each participant to a cluster group. To validate and interpret the cluster assignments, a multivariate analysis of variance (MANOVA) was conducted with cluster group as the independent variable and the scores from the three paradigms as the dependent variables. Fisher discriminant function analysis (FDA) was performed to confirm the accuracy of the predicted group membership.
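The standardization and clustering steps described above can be sketched as follows. The participant scores here are simulated placeholders (the real data are not reproduced), and SciPy's Ward linkage, which operates on Euclidean distances but minimizes the same within-cluster variance criterion, stands in for the SPSS implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# Simulated placeholder scores: rows = 51 participants,
# columns = front-back, top-bottom, discrimination accuracy.
rng = np.random.default_rng(1)
scores = rng.normal(loc=[24.0, 23.0, 130.0], scale=[3.0, 4.0, 8.0], size=(51, 3))

z = zscore(scores, axis=0)          # standardize each measure to Z scores
Z = linkage(z, method="ward")       # Ward's minimum-variance agglomeration
clusters = fcluster(Z, t=3, criterion="maxclust")  # cut to a three-cluster solution
```

The agglomeration heights in `Z` play the role of the linkage distances inspected on the dendrogram, and `clusters` assigns each participant to one of the three groups.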

4. Results

The Shapiro-Wilk test revealed that the data were normally distributed (p > 0.05). HCA was conducted to explore the underlying patterns of spatial performance. Visual inspection of the resulting dendrogram (Figure 3) revealed a clear separation in linkage distances within the 10-15 unit range, supporting a three-cluster solution. To increase the stability and precision of cluster membership, a k-means clustering algorithm was subsequently applied, initialized with equally spaced centroids based on the HCA output. The algorithm converged after five iterations, resulting in a stable classification of participants into three distinct clusters.

Cluster analysis yielded three distinct spatial accuracy profiles on the basis of participants’ front-back and top-bottom performance along with discrimination scores, as depicted in Figure 4. Although all three measures were included in the analysis, the resulting cluster distinctions were driven primarily by differences in identification performance, as discrimination scores varied relatively less across the groups. Multivariate analysis of variance (MANOVA) confirmed significant differences across clusters on the basis of identification. The descriptive statistics for each cluster are presented in Table 1. Cluster 1 demonstrated high accuracy across all three measures. Cluster 2 was characterized by strong front-back identification but relatively poor top-bottom accuracy, whereas Cluster 3 exhibited the opposite pattern, with good top-bottom performance but reduced front-back accuracy. Cluster 1 also showed better discrimination accuracy than Clusters 2 and 3 did.

Figure 4.

Figure 4

3D scatter plot illustrating spatial hearing accuracy across the three clusters. Each point represents an individual’s performance across three dimensions: discrimination accuracy (X-axis), top-bottom accuracy (Y-axis), and front-back accuracy.

Table 1. Descriptive statistics (mean ± SD) for the three clusters on front–back accuracy, top–bottom accuracy, and discrimination accuracy. Significant group differences were observed across all measures, and Tukey's post hoc tests revealed multiple significant pairwise differences, highlighting distinct spatial accuracy profiles. The effect sizes (ηp²) indicate large effects for all variables.

Variable   C1 (n = 18)     C2 (n = 17)     C3 (n = 16)     Test of significance                      Tukey post hoc
FBA        26.44 (1.42)    26.23 (2.46)    19.06 (1.87)    F(2,48) = 75.78, p < 0.001, ηp² = 0.76    C1 vs C3: p < 0.001; C2 vs C3: p < 0.001
TBA        27.61 (2.65)    18.17 (2.78)    24.5 (4.51)     F(2,48) = 34.95, p < 0.001, ηp² = 0.59    C1 vs C2: p < 0.001; C1 vs C3: p < 0.05; C2 vs C3: p < 0.001
DA         137.33 (7.51)   127.70 (7.33)   126.93 (7.16)   F(2,48) = 10.83, p < 0.001, ηp² = 0.31    C1 vs C2: p < 0.01; C1 vs C3: p < 0.001
Note: FBA = front–back accuracy; TBA = top–bottom accuracy; DA = discrimination accuracy. Values are mean (SD).

The Fisher discriminant analysis validated the results of the cluster analysis: the predicted group memberships closely matched the original cluster assignments derived using Ward's method. Two canonical discriminant functions (DF1 and DF2) were extracted, and both contributed significantly to distinguishing the clusters. Table 2 presents the results of the FDA test for all functions.

Table 2. Wilks’ lambda (λ) and chi-square (χ2) tests for the significance of canonical discriminant functions. The table shows the results of multivariate tests assessing the significance of each discriminant function in separating the groups.

Discriminant Function(s) λ χ² df p
Discriminant function 1 (DF1) 0.07 121.03 6 <0.001
Discriminant function 2 (DF2) 0.41 42.31 2 <0.001

The standardized canonical discriminant function equations were as follows:

DF1 = 0.84 × FBA + 0.02 × TBA + 0.23 × DA

DF2 = -0.27 × FBA + 0.99 × TBA + 0.39 × DA

(Where FBA represents front-back accuracy and TBA and DA represent top-bottom and discrimination accuracies, respectively.)

Front-back accuracy (canonical weight: 0.84) was the primary contributor to DF1, whereas top-bottom accuracy (canonical weight: 0.99) was the predominant contributor to DF2. The cluster centroids plotted in Figure 5 clearly separate the three spatial performance clusters, suggesting that the two functions capture distinct dimensions of spatial ability. The error rate was computed by comparing the casewise statistics of individuals' discriminant function scores against their clusterwise classification, as shown in Table 3. An overall classification accuracy of approximately 98% was noted, indicating clear segregation of the groups on the basis of the weights obtained.
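Applying the standardized canonical weights reported above to a participant's z-scored accuracies yields the two discriminant scores. The input values below are illustrative only; classification then proceeds by comparing each participant's (DF1, DF2) point to the group centroids.

```python
# Standardized canonical discriminant functions from the text:
# DF1 is dominated by front-back accuracy (FBA),
# DF2 by top-bottom accuracy (TBA); DA = discrimination accuracy.
def discriminant_scores(fba_z, tba_z, da_z):
    df1 = 0.84 * fba_z + 0.02 * tba_z + 0.23 * da_z
    df2 = -0.27 * fba_z + 0.99 * tba_z + 0.39 * da_z
    return df1, df2

# Illustrative participant: good front-back, weak top-bottom performance.
df1, df2 = discriminant_scores(1.0, -0.5, 0.2)
print(round(df1, 3), round(df2, 3))   # 0.876 -0.687
```

The sign pattern matches the interpretation above: a strong FBA z-score pushes DF1 up, while a weak TBA z-score pulls DF2 down.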

Figure 5.

Figure 5

Canonical discriminant functions (functions 1 and 2) represent the distribution of the participant clusters. Each point represents an individual case. Clear separation among clusters confirms the discriminant validity of the spatial performance profiles: clusters C2 and C3 are separated by DF2, whereas C1 and C2 are separated by DF1.

Table 3. Classification accuracy of the discriminant function analysis: predicted group membership versus the original (Ward's) cluster membership. Percentages are tabulated with participant counts in parentheses.

Ward's membership          Fisher DFA predicted membership                Classification error
(original classification)  C1            C2            C3
C1                         94.4% (17)    5.6% (1)      0% (0)             5.6% (1)
C2                         0% (0)        100% (17)     0% (0)             0% (0)
C3                         0% (0)        0% (0)        100% (16)          0% (0)
Overall error: 1.9%
Note: C1, C2, and C3 denote clusters 1, 2, and 3, respectively.
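The overall error reported in Table 3 can be reproduced from the confusion counts; a minimal sketch:

```python
import numpy as np

# Confusion matrix from Table 3: rows = Ward's clusters, columns = FDA predictions.
confusion = np.array([[17, 1, 0],
                      [0, 17, 0],
                      [0, 0, 16]])

# One participant misclassified out of 51: error = 1/51, about 1.9%,
# i.e., an overall classification accuracy of roughly 98%.
overall_error = 1 - confusion.trace() / confusion.sum()
```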

5. Discussion

The present study identified three distinct spatial performance profiles among NH listeners (Figure 5), highlighting the heterogeneous nature of vertical-plane spatial auditory processing. The clustering results indicate that individual listeners vary in their ability to utilize spatial cues in the vertical plane, resulting in distinct patterns of localization accuracy. Specifically, participants in Cluster 1 exhibited consistently high performance across all paradigms (front-back accuracy, top-bottom accuracy, and discrimination accuracy). In contrast, Clusters 2 and 3 revealed notable deficits in identification: Cluster 2 demonstrated preserved front-back accuracy but impaired top-bottom performance, whereas Cluster 3 showed strong top-bottom accuracy alongside reduced front-back identification. These findings provide behavioral support for listener-specific computational models of sagittal-plane (vertical) sound localization (Baumgartner et al., 2014), which suggest that localization performance arises from the comparison of incoming spectral cues (HRTFs) with internal templates shaped by nonacoustic factors. In this framework, systematic variability in spatial accuracy is better explained by differences in template uncertainty than by HRTF variability alone: listeners with lower uncertainty exhibit more precise spatial judgments. The clusters derived in the present study can be mapped onto this computational model (Baumgartner et al., 2014) (Table 4).

Table 4. Mapping clusters to spectral cue fidelity. For clusters, refer to Figure 3.

Cluster        Behavioral profile (current study)                    Baumgartner model mapping (Baumgartner et al., 2014)                          Inferred spectral cue fidelity
C1 (35.29%)    Equally accurate on top-bottom and front-back         Strong spectral-template matching + low uncertainty → sharp localization      High-fidelity cues across the full spectrum
               identification tasks
C2 (33.33%)    Normal front-back; poor top-bottom identification     Fails top-bottom tasks when the corresponding spectral-template matching      Degraded top-bottom elevation-band cues (~6-11 kHz)
                                                                     is unreliable                                                                 (Hebrank and Wright, 1974)
C3 (31.37%)    Normal top-bottom; many front-back confusions         Fails front-back tasks when the corresponding spectral-template matching      Degraded cues for front-back distinction (likely mid/high
                                                                     is unreliable                                                                 frequencies) (Hebrank and Wright, 1974; Middlebrooks and Green, 1991)

These cognitive-perceptual factors likely interact to produce the spatial error patterns observed across clusters. The discriminant analysis further supported the differentiation between clusters, with front-back and top-bottom accuracy loading strongly on distinct discriminant functions (Table 2). Importantly, discriminant function analysis revealed that identification and discrimination tasks contributed unequally across spatial axes, supporting the notion that spatial hearing involves distinct processing mechanisms for different spatial judgments.

Vertical localization depends heavily on spectral cues shaped by the pinna, and the spectral envelope patterns of the virtual stimuli used in the identification paradigms (Figure 1) show distinct bands of energy for both front-back and top-bottom elevations (Hofman et al., 1998). While front-back localization is primarily cued by spectral energy differences in the mid- to high-frequency range (6-10 kHz) owing to pinna filtering effects (Hebrank and Wright, 1974; Macpherson and Middlebrooks, 2002), top-bottom identification relies more heavily on spectral notches and peaks in the high-frequency band above 10 kHz (Middlebrooks and Green, 1991). It may be hypothesized that differential use or weighting of these cues underlies the performance variability observed across listener clusters (Lladó et al., 2025). Additional literature indicates that spectral-notch weighting plays a key role in elevation perception, with listeners relying on the stability of pinna-induced spectral features across frequency bands (Middlebrooks and Green, 1991; Hofman et al., 1998). Studies on dynamic cue use show that head movements markedly improve front-back and elevation accuracy by providing time-varying spectral and binaural information (Wightman and Kistler, 1999). Moreover, work comparing natural and virtual elevation localization demonstrates that non-individualized HRTFs reduce vertical accuracy and increase reversals, emphasizing the importance of individualized spectral cues (Wenzel et al., 1993). Understanding the normative subtypes of spatial performance can provide a baseline against which clinical populations, such as those with hearing loss or neurological disorders, can be compared.

Impaired vertical sound localization can lead to difficulty in detecting overhead warnings, voices from elevated positions, or spatially locating threats in multilevel environments, potentially compromising safety and communication (Mendonça et al., 2013). It can also reduce immersion and interaction accuracy in virtual or augmented reality systems, in which accurate spatial hearing is essential (Bronkhorst, 2015). Such deficits may go unnoticed in routine hearing tests but still impact daily functioning in spatially complex settings (Spagnol, 2015). Finally, the observed differential spatial profiles have implications for assessment and rehabilitation. Listeners with preserved top-bottom accuracy but impaired front-back plane identification (Cluster 3) may benefit from training paradigms that emphasize enhancing spectral cue utilization or resolving front-back confusion (Majdak et al., 2013). A limitation of this study is the relatively small sample size within each cluster, which may restrict the generalizability of the identified spatial performance profiles. Additionally, the study employed generalized rather than individualized HRTFs, which may have introduced spectral mismatches; the results may therefore not fully reflect the precision of localization in real-world, listener-specific environments, which also reduces ecological validity. The stimulus set consisted of white noise bursts, chosen for the broadband spectral content necessary for elevation-related cues; greater stimulus diversity may reveal further variability. The planewise categorization approach used in this study targets cross-plane perceptual confusion but does not capture finer within-plane angular identification. Future studies will incorporate adaptive psychophysical procedures to obtain threshold-level measures.
Furthermore, the stability of these spatial performance profiles over time remains unknown; future work should examine whether they are modifiable through experience or auditory training, as well as assess potential gender biases. Expanding the task battery to include dynamic or ecologically valid spatial scenarios, such as moving sound sources, reverberant spaces, and audiovisual integration, may provide a more comprehensive understanding of vertical spatial hearing in everyday contexts.
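The cluster-recovery step underlying the profiles discussed above can be sketched as follows: hierarchical agglomerative clustering with Ward's minimum-variance linkage (Ward, 1963) applied to per-listener task accuracies, cut at three clusters. The scores below are synthetic and illustrative only, shaped to mimic the three reported profiles; they are not the study's data, and the 17-per-group split is an assumption made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic proportion-correct scores for 51 listeners on three tasks:
# columns = [top-bottom ID, front-back ID, front-plane discrimination].
rng = np.random.default_rng(1)
uniform = rng.normal([0.90, 0.90, 0.90], 0.03, size=(17, 3))  # cluster 1
tb_poor = rng.normal([0.55, 0.88, 0.88], 0.03, size=(17, 3))  # cluster 2
fb_poor = rng.normal([0.88, 0.55, 0.88], 0.03, size=(17, 3))  # cluster 3
scores = np.vstack([uniform, tb_poor, fb_poor])

# Ward's minimum-variance linkage, cut to yield three clusters.
Z = linkage(scores, method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")
print(np.bincount(labels)[1:])  # listeners per recovered cluster
```

In the study itself, the cluster memberships obtained this way were subsequently validated with Fisher's Discriminant Function Analysis; with well-separated profiles like the synthetic ones here, the cut recovers the three groups exactly.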

6. Summary

This study demonstrated that spatial hearing performance, even among clinically NH individuals, can be segmented into three robust and distinct localization profiles, reflecting meaningful variability in spatial processing abilities. These profiles corresponded to (1) high accuracy across all spatialized audio tasks, (2) impaired top–bottom identification, and (3) impaired front–back identification. The emergence of these groups confirms that spatial hearing is not a uniform ability, but rather a multidimensional construct in which different spatial dimensions can be selectively preserved or compromised. Variability in performance across profiles appears to stem from differences in spectral cue efficiency and internal template uncertainty, with distinct frequency ranges differentially influencing front-back versus top-bottom perception. Collectively, these findings provide valuable normative benchmarks for clinical assessment and have practical implications for designing personalized auditory training programs and optimizing spatial audio technologies.

7. Conclusion

This study revealed that spatial hearing performance, even among clinically NH individuals, can be categorized into distinct profiles on the basis of front-back and top-bottom localization tasks. The identification of three robust clusters demonstrates that spatial processing is not uniform across listeners and supports the notion that different spatial dimensions can be independently preserved or impaired. These findings underscore the importance of adopting a multidimensional framework to assess spatial hearing ability. Moreover, the profiles established here provide task- and HRTF-specific normative reference scores that can support future research involving children, individuals with hearing impairment, and those with CAPD.

Acknowledgments

Conflict of interest

The authors declare no conflicts of interest related to this work.

Data availability

The datasets obtained and analyzed during the current study are not publicly available but are available from the corresponding author upon reasonable request.

Ethical approval and Informed consent

Ethical committee approval was obtained from the Institutional Review Board (No. SD/IRB/M.1-20-2024-25 dated 23rd December, 2024). All participants provided written informed consent prior to inclusion in the study.

References

  1. Baumgartner, R., Majdak, P., Laback, B. Modeling sound-source localization in sagittal planes for human listeners. J. Acoust. Soc. Am. 2014;136(2):791–802. doi: 10.1121/1.4887447. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Bronkhorst, A.W. The cocktail-party problem revisited: early processing and selection of multi-talker speech. Atten. Percept. Psychophys. 2015;77(5):1465–1487. doi: 10.3758/s13414-015-0882-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Carlini, A., Bordeau, C., Ambard, M. Auditory localization: a comprehensive practical review. Front. Psychol. 2024;15:1408073. doi: 10.3389/fpsyg.2024.1408073. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Clark, J.G. Uses and abuses of hearing loss classification. ASHA. 1981;23(7):493–500. [PubMed] [Google Scholar]
  5. Finocchietti, S., Cappagli, G., Giammari, G., et al. Test–retest reliability of BSP, a battery of tests for assessing spatial cognition in visually impaired children. PLoS One. 2019;14(4):e0212006. doi: 10.1371/journal.pone.0212006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Getzmann, S. The influence of the acoustic context on vertical sound localization in the median plane. Percept. Psychophys. 2003;65(7):1045–1057. doi: 10.3758/BF03194833. [DOI] [PubMed] [Google Scholar]
  7. Grantham, D.W., Hornsby, B.W.Y., Erpenbeck, E.A. Auditory spatial resolution in horizontal, vertical, and diagonal planes. J. Acoust. Soc. Am. 2003;114(2):1009–1022. doi: 10.1121/1.1590970. [DOI] [PubMed] [Google Scholar]
  8. Hebrank, J., Wright, D. Spectral cues used in the localization of sound sources on the median plane. J. Acoust. Soc. Am. 1974;56(6):1829–1834. doi: 10.1121/1.1903520. [DOI] [PubMed] [Google Scholar]
  9. Hofman, P.M., Van Riswick, J.G.A., Van Opstal, A.J. Relearning sound localization with new ears. Nat. Neurosci. 1998;1(5):417–421. doi: 10.1038/1633. [DOI] [PubMed] [Google Scholar]
  10. Jiang, J.L., Xie, B.S., Mai, H., et al. The role of dynamic cue in auditory vertical localisation. Appl. Acoust. 2019;146:398–408. doi: 10.1016/j.apacoust.2018.12.002. [DOI] [Google Scholar]
  11. Miller, J.D. SLAB 3D [Software]. Available at: https://slab3d.sourceforge.net/. Accessed May 5, 2024.
  12. Lewald, J. Vertical sound localization in blind humans. Neuropsychologia. 2002;40(12):1868–1872. doi: 10.1016/S0028-3932(02)00071-4. [DOI] [PubMed] [Google Scholar]
  13. Lladó, P., Majdak, P., Barumerli, R., et al. Spectral weighting of monaural cues for auditory localization in sagittal planes. Trends Hear. 2025;29:23312165251317027. doi: 10.1177/23312165251317027. [DOI] [PMC free article] [PubMed]
  14. Macpherson, E.A., Middlebrooks, J.C. Listener weighting of cues for lateral angle: The duplex theory of sound localization revisited. J. Acoust. Soc. Am. 2002;111(5):2219–2236. doi: 10.1121/1.1471898. [DOI] [PubMed] [Google Scholar]
  15. Macpherson, E.A., Sabin, A.T. Vertical-plane sound localization with distorted spectral cues. Hear Res. 2013;306:76–92. doi: 10.1016/j.heares.2013.09.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Majdak, P., Walder, T., Laback, B. Effect of long-term training on sound localization performance with spectrally warped and band-limited head-related transfer functions. J. Acoust. Soc. Am. 2013;134(3):2148–2159. doi: 10.1121/1.4816543. [DOI] [PubMed] [Google Scholar]
  17. Marmel, F., Marrufo-Pérez, M.I., Heeren, J., et al. Effect of sound level on virtual and free-field localization of brief sounds in the anterior median plane. Hear Res. 2018;365:28–35. doi: 10.1016/j.heares.2018.06.004. [DOI] [PubMed] [Google Scholar]
  18. MathWorks Inc. Ambisonic binaural decoding (Audio Toolbox documentation). 2022. Available at: https://in.mathworks.com/help/audio/ug/ambisonic-binaural-decoding.html. Accessed May 5, 2024.
  19. Mendonça, C., Campos, G., Dias, P., et al. Learning auditory space: Generalization and long-term effects. PLoS One. 2013;8(10):e77900. doi: 10.1371/journal.pone.0077900. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Middlebrooks, J.C., Green, D.M. Sound localization by human listeners. Annu. Rev. Psychol. 1991;42(1):135–159. doi: 10.1146/annurev.ps.42.020191.001031. [DOI] [PubMed] [Google Scholar]
  21. Middlebrooks, J.C. Sound localization. Handbook Clin. Neurol. 2015;129:99–116. doi: 10.1016/B978-0-444-62630-1.00006-8. [DOI] [PubMed] [Google Scholar]
  22. Nisha, K.V., Kumar, A.U. Virtual auditory space training-induced changes of auditory spatial processing in listeners with normal hearing. J. Int. Adv. Otol. 2017;13(1):118–127. doi: 10.5152/iao.2017.3477. [DOI] [PubMed] [Google Scholar]
  23. Orlikoff, R., Schiavetti, N., Metz, D. Evaluating Research in Communication Disorders. 7th ed. London: Pearson Education; 2014.
  24. Otte, R.J., Agterberg, M.J.H., Van Wanrooij, M.M., et al. Age-related hearing loss and ear morphology affect vertical but not horizontal sound-localization performance. J. Assoc. Res. Otolaryngol. 2013;14(2):261–273. doi: 10.1007/s10162-012-0367-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Paromov, D., Augereau, T.M., Maheu, M., et al. Musical training shapes spatial cognition. Cortex. 2025;193:49–56. doi: 10.1016/j.cortex.2025.10.002. [DOI] [PubMed] [Google Scholar]
  26. Rajguru, C., Brianza, G., Memoli, G. Sound localization in web-based 3D environments. Sci. Rep. 2022;12(1):12107. doi: 10.1038/s41598-022-15931-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Risoud, M., Hanson, J.N., Gauvrit, F., et al. Sound source localization. Eur. Ann. Otorhinolaryngol. Head Neck Dis. 2018;135(4):259–264. doi: 10.1016/j.anorl.2018.04.009. [DOI] [PubMed] [Google Scholar]
  28. Sampath, S., Aisha, S., Neelamegarajan, D., et al. Comparison of a free-field and a closed-field sound source identification paradigms in assessing spatial acuity in adults with normal hearing sensitivity. J. Audiol. Otol. 2023;27(4):219–226. doi: 10.7874/jao.2023.00024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Schillebeeckx, P., Paterson-Stephens, I., Wiggins, B. Using Matlab/Simulink as an implementation tool for multichannel surround sound. In: Proc. 19th Int. AES Conf.: Surround Sound (Paper 10059). Munich, Germany; 2001. Available at: https://www.aes.org/e-lib/browse.cfm?elib=10059
  30. Sitdikov, V.M., Gvozdeva, A.P., Andreeva, I.G. A quick method for determining the relative minimum audible distance using sound images. Atten. Percept. Psychophys. 2023;85(8):2718–2730. doi: 10.3758/s13414-023-02663-y. [DOI] [PubMed] [Google Scholar]
  31. Spagnol, S. On distance dependence of pinna spectral patterns in head-related transfer functions. J. Acoust. Soc. Am. 2015;137(1):EL58–EL64. doi: 10.1121/1.4903919. [DOI] [PubMed] [Google Scholar]
  32. Talagala, D.S., Zhang, W., Abhayapala, T.D., et al. Binaural sound source localization using the frequency diversity of the head-related transfer function. J. Acoust. Soc. Am. 2014;135(3):1207–1217. doi: 10.1121/1.4864304. [DOI] [PubMed] [Google Scholar]
  33. Ward, J.H. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 1963;58(301):236–244. doi: 10.1080/01621459.1963.10500845. [DOI] [Google Scholar]
  34. Wenzel, E.M., Arruda, M., Kistler, D.J., et al. Localization using nonindividualized head-related transfer functions. J. Acoust. Soc. Am. 1993;94(1):111–123. doi: 10.1121/1.407089. [DOI] [PubMed] [Google Scholar]
  35. Wiggins, B.J. Analysis of binaural cue matching using ambisonics to binaural decoding techniques. In: Proceedings of the 4th International Conference on Spatial Audio. Graz, Austria; 2017.
  36. Wightman, F.L., Kistler, D.J. Resolution of front–back ambiguity in spatial hearing by listener and source movement. J. Acoust. Soc. Am. 1999;105:2841–2853. doi: 10.1121/1.426899. [DOI] [PubMed] [Google Scholar]
  37. Xie, B.S., Liu, L.L., Jiang, J.L., et al. Auditory vertical localization in the median plane with conflicting dynamic interaural time difference and other elevation cues. J. Acoust. Soc. Am. 2023;154(3):1770–1786. doi: 10.1121/10.0020909. [DOI] [PubMed] [Google Scholar]


