Abstract
Bilateral cochlear-implant (CI) users struggle to understand speech in noisy environments despite receiving some spatial-hearing benefits. One potential solution is to provide acoustic beamforming. A headphone-based experiment was conducted to compare speech understanding under natural CI listening conditions and for two non-adaptive beamformers, one single beam and one binaural, called “triple beam,” which provides an improved signal-to-noise ratio (beamforming benefit) and usable spatial cues by reintroducing interaural level differences. Speech reception thresholds (SRTs) for speech-on-speech masking were measured with target speech presented in front and two maskers in co-located or narrow/wide separations. Numerosity judgments and sound-localization performance also were measured. Natural spatial cues, single-beam, and triple-beam conditions were compared. For CI listeners, there was a negligible change in SRTs when comparing co-located to separated maskers for natural listening conditions. In contrast, there were 4.9- and 16.9-dB improvements in SRTs for single beam and 3.5- and 12.3-dB improvements for triple beam (narrow and wide separations). Similar results were found for normal-hearing listeners presented with vocoded stimuli. Single beam improved speech-on-speech masking performance but yielded poor sound localization. Triple beam improved speech-on-speech masking performance, albeit less than the single beam, and sound localization. Thus, triple beam was the most versatile across multiple spatial-hearing domains.
I. INTRODUCTION
One of the main challenges facing cochlear-implant (CI) listeners is the difficulty they experience when attempting to understand target speech in environments with multiple competing talkers or other types of background noise/interference (e.g., Loizou et al., 2009; Litovsky et al., 2012; Goupell et al., 2016; Litovsky et al., 2017). In contrast, normal-hearing (NH) listeners effectively exploit spatial cues [interaural time differences (ITDs) and interaural level differences (ILDs)] that foster the perceived separation of sound sources, which reduces the unwanted masking caused by non-target sound sources (Bronkhorst and Plomp, 1988). Consequently, NH listeners typically demonstrate a larger advantage than CI listeners for understanding speech when masking sound sources originate from different locations than that of the target talker. In other words, NH listeners demonstrate a greater spatial release from masking (SRM; the difference in threshold for co-located and spatially separated target and masker sources, discussed further below) than CI listeners (Schleich et al., 2004; Buss et al., 2008; van Hoesel et al., 2008; Loizou et al., 2009; Bernstein et al., 2016; Goupell et al., 2016).
Fitting CI users with bilateral devices has become increasingly common because it may restore some spatial-perception abilities. Despite the gains found with this approach, there remain fundamental issues that prevent CI users from achieving larger binaural benefits. Sound processing by CIs usually results in the loss of acoustic temporal-fine-structure information, in particular low-frequency ITDs, which provide critical cues for precise sound localization for NH listeners (Wightman and Kistler, 1992) and for most hearing-impaired listeners (but non-CI; e.g., Dubno et al., 2002). Furthermore, CI listeners have demonstrated limited ability to exploit ITDs even when acoustic temporal-fine-structure information is encoded by low-rate pulse trains in current devices (Zirn et al., 2016; Ausili et al., 2020). Therefore, bilateral CI listeners primarily use ILDs for sound localization (e.g., Seeber and Fastl, 2008; Aronoff et al., 2010). Other factors can diminish the salience of binaural cues in bilateral CI listeners, such as interaural place-of-stimulation mismatches (e.g., Hu and Dietz, 2015) and channel interactions from multi-electrode stimulation (Kan and Litovsky, 2015; Egger et al., 2016). Although there have been several previous studies and processor development efforts aimed at mitigating these problems and improving spatial hearing in bilateral CI listeners, a substantial deficit remains. To bring the sound-localization abilities and speech-understanding performance in multiple-source “cocktail party” listening situations [see reviews in Middlebrooks and Simon (2017)] of bilateral CI listeners to levels more comparable to NH listeners (e.g., Schleich et al., 2004; Bernstein et al., 2016; Goupell et al., 2016), advances in the effectiveness of bilateral CI approaches are necessary.
One approach to enhance speech understanding under masked conditions is to use front-end (i.e., before the input to the CI electrode arrays) signal-processing algorithms that implement noise reduction or other types of signal enhancement, rather than concentrating on improving the fidelity of binaural cues provided by the CIs. Such approaches aim to remove the masking sounds prior to reception by the listener. Acoustic beamforming is an example of such an approach that has found some success in enhancing speech understanding in noise for both hearing-aid (Valente et al., 1995; Valente et al., 2000) and CI users (Chung et al., 2006; Chung and Zeng, 2009; Baumgartel et al., 2015a; Baumgartel et al., 2015b; Dieudonné and Francart, 2018; Williges et al., 2018). Acoustic beamformers selectively amplify signals originating from a specified direction (typically from the azimuth the listener is facing) and attenuate signals originating from other azimuths. Thus, if the target source is in front of the listener at the focus of the beamformer and the maskers originate from non-frontal azimuths, then the signal-to-noise ratio (SNR) [or in the case of speech-on-speech masking, target-to-masker ratio (TMR)] is improved by an amount that depends on a variety of factors including the (typically) frequency-dependent directional response. A simple type of beamformer is a directional microphone with a fixed polar pattern of amplification. This type of directional microphone can consist of two or more microphones oriented front-to-back. The signals received then undergo a filter-and-sum algorithm that provides a directional response. The greater the number of microphones used, the greater the response directionality that can be achieved (e.g., Stadler and Rabinowitz, 1993; Desloge et al., 1997; Greenberg et al., 2003). Some beamforming methods have utilized adaptive patterns of amplification. 
Adaptive beamformers change their polar patterns based on the incoming signal to steer “nulls” (i.e., attenuation) in response to the locations of the interfering sound sources. Under appropriate conditions, adaptive beamformers have been found to provide better speech understanding in noise than omnidirectional amplification both for hearing-aid (Ricketts and Henry, 2002; Valente et al., 2006) and CI users (Spriet et al., 2007; Dorman et al., 2018), although the benefits provided by adaptive methods have limitations too, such as computational complexity (cf. Greenberg et al., 2003).
One limitation of many fixed and adaptive beamformers is that they usually assume the target source location is in front. In reality, there are many instances when the target source location is not in a frontal direction, and the target of interest can also change location rapidly, particularly in a multi-talker scenario. With these non-steering beamformers, the listener's only option to change the beamformer direction is to turn the head toward the target source, a strategy that may be socially awkward or too slow to follow the switching of speakers in a conversation (e.g., Roverud et al., 2018; see also Hendrikse et al., 2019). Steering beamformers, which allow the beamformer directionality to be steered away from the frontal azimuth and toward the target source location, have been explored with electroencephalogram attention decoding (e.g., Fuglsang et al., 2017; Aroudi and Doclo, 2020) and a visually guided hearing aid (VGHA), where an eye-tracker steers a beamformer based on the eye-gaze direction of the user (Kidd et al., 2013; Best et al., 2017b; Roverud et al., 2018). The VGHA thus allows a hearing-impaired (or, potentially, a CI) listener to select and focus on a talker of their choosing in a sound field comprising multiple talkers. The benefits of the beamformer component of the VGHA have been described for static (Kidd et al., 2015) and dynamic situations (Best et al., 2017b; Roverud et al., 2018; Hládek et al., 2019).
Although the beamformer used by a VGHA provides a high degree of spatial selectivity and typically yields large improvements in speech reception thresholds (SRTs) for both speech and noise maskers (e.g., Kidd et al., 2015), it also delivers only a single-beam output so that the stimulus is presented monotically or diotically to the listener. This means that the TMR advantage provided by the beamformer comes at the cost of greatly reduced sound-localization performance and diminished spatial awareness. For many hearing-impaired listeners, a hybrid natural-beamformer strategy that uses a combination of beamforming in the higher frequencies and natural head-induced head-related transfer functions (HRTFs) in the lower frequencies produced both large improvements in SRTs and preserved good localization performance of broadband and low-frequency sounds (Kidd et al., 2015; Best et al., 2017b). The benefit of this hybrid beamformer strategy, however, depends on the listener being able to exploit low-frequency ITDs, an ability that is severely compromised in bilateral CI users (Litovsky et al., 2012; Churchill et al., 2014).
An alternative to this hybrid beamformer strategy that does not depend on low-frequency ITDs has been proposed recently (Jennings and Kidd, 2018; Kidd et al., 2020). This new method, called “triple beam,” uses three spatially tuned beams of amplification designated left, center, and right. The center beam is presented diotically and is equivalent to what is used in the single beamformer (hereafter called “beam”),1 which has been used in the earlier VGHA studies (e.g., Favrot et al., 2013; Kidd et al., 2013; Kidd et al., 2015). The center beam—absent any eye-gaze steering—is focused at 0° azimuth relative to the axis of the head (i.e., directly in front of the listener). The left and right beams are routed only to the left and right ears, respectively, and may be set at any angle from 0° to –90° (left beam) or from 0° to +90° (right). Thus, the input to the left ear is the superposition of the waveforms from the center beam and the left beam, while the input to the right ear is the superposition of the waveforms from the center beam and the right beam. The design goal of the triple-beam algorithm is to take advantage of the benefits of beamforming for speech understanding while also providing improved localization ability. Note that the triple-beam approach is somewhat counterintuitive in that the side beams may actually increase the input levels of the non-target/masker sources (i.e., decrease TMR), depending on the masker locations and orientation of the side beams.
Other novel binaural beamformers with binaural cue preservation and/or amplification have been suggested recently (e.g., Aroudi and Doclo, 2020). Using an approach somewhat similar to that used in the triple beam, Dieudonné and Francart (2018) reported using two beams directed to the sides (no diotic center beam as in triple beam) and introduced ILDs to frequencies <1500 Hz. Using this two-beam approach, they found that SRTs improved for NH listeners presented with bimodal CI simulations by 15.7 dB with noise presented to the CI ear and 7.6 dB with noise presented to the HA ear. Localization errors likewise were improved. Williges et al. (2018) used a steering beamformer and artificially applied ILDs based on estimated source direction in bilateral CI users and showed improvements to spatial-hearing abilities. Given these promising new approaches, we speculated that the triple-beam algorithm could provide a benefit for bilateral CI listeners by exploiting exaggerated ILDs, the primary sound-localization cue for this population, thus providing a greater sense of spatial hearing than would be possible otherwise.
The first goal of the current study was to determine the beamforming benefits for bilateral CI listeners in a spatial speech-on-speech unmasking task using a single beam and the new triple beam. The expectation was that both beamforming approaches would improve speech understanding in “cocktail party” listening conditions in bilateral CI listeners. We were also interested in whether exaggerated ILDs might maintain or improve speech understanding using triple beam by enhancing perceptual segregation of sound sources in conditions where TMR did not improve or even could decrease compared to the single beam (cf. Kidd et al., 2020). Thus, a highly informational masking situation was chosen. Some previous studies have indicated that exaggerated spatial cues could improve SRTs in bilateral CI listeners (Brown, 2014; Bernstein et al., 2016). It is unclear, however, how beneficial spatial information is for speech understanding with interferer sounds for bilateral CI listeners; a majority of the benefit is well described by improvements in TMR (Loizou et al., 2009; Dieudonné and Francart, 2019). A second goal was to determine whether the triple-beam approach specifically would improve sound-source localization. We hypothesized that the triple-beam approach could maintain speech understanding by compensating for decreased TMR with exaggerated spatial cues while also enhancing the ability of these listeners to localize sound sources. Preserving both abilities could provide an advancement in amplification/remediation strategies for bilateral CI users. We also obtained judgments from bilateral CI listeners about the number of sound sources that were present in the sound field in addition to where they were located. The rationale for making those measurements was that a complete understanding of the benefit provided by various algorithms for communication in CI listeners requires consideration of multiple functional aspects of hearing. 
NH listeners also were tested under channel-vocoded (i.e., a CI simulation; Shannon et al., 1995) stimulus conditions for comparison to the patterns of performance observed for the bilateral CI listeners and under natural listening conditions (which have implications for benefits realized by hearing-aid users).
II. EXPERIMENT I: SPEECH-ON-SPEECH MASKING IN CI LISTENERS
The purpose of this experiment was to compare SRTs in speech-on-speech masking situations using three listening algorithms: natural presentation, a single-beam algorithm (beam), or a hybrid beamforming algorithm with three beams (triple beam).
A. Methods
1. Listeners and equipment
Ten bilateral CI listeners (four females, 25–90 yr, average = 57.3 yr) were tested in this experiment. Listener information is presented in Table I; listeners S1–S10 participated in experiment I. All listeners had at least 1 yr of experience using bilateral CIs. Most of the bilateral CI listeners used their everyday programs on their personal sound processors, which were located behind the ears, and microphones were above the pinnae. S5's typical Cochlear sound processors were CP950s (where the microphone is not located above the pinna, but near the magnet that is behind the pinna). She was tested here, instead, with a pair of standard behind-the-ear CP910s dedicated for research purposes that also were adjusted to her typical settings. The listeners were instructed to select the program with a nearly omnidirectional microphone setting and without SCAN (an automatic scene classifying algorithm) if possible. Two of the ten CI listeners, S1 and S4, were tested with SCAN on because they had no alternative on their processors.
TABLE I.
CI listener demographic information and hearing history.
| Code | Gender | Age (yr) | Implant use duration, left (yr) | Implant use duration, right (yr) | Sound processor, left | Sound processor, right |
|---|---|---|---|---|---|---|
| S1 | M | 72 | 15 | 3 | CP920 | CP920 |
| S2 | F | 60 | 9 | 9 | CP910 | CP910 |
| S3 | M | 31 | 9 | 11 | Freedom | Freedom |
| S4 | M | 60 | 8 | 3 | CP920 | CP920 |
| S5 | F | 64 | 1 | 7 | CP910 | CP910 |
| S6 | F | 65 | 5 | 7 | CP910 | CP910 |
| S7 | M | 78 | 4 | 7 | CP910 | CP910 |
| S8 | F | 28 | 26 | 8 | CP920 | CP920 |
| S9 | M | 90 | 7 | 15 | CP920 | CP920 |
| S10 | M | 25 | 7 | 7 | CP920 | CP920 |
| S11 | F | 66 | 9 | 8 | CP910 | CP910 |
| S12 | F | 74 | 11 | 14 | CP910 | CP920 |
| S13 | F | 77 | 13 | 2 | Naída CI Q90 | Naída CI Q90 |
| S14 | F | 45 | 10 | 8 | CP920 | CP920 |
A personal computer ran the experiment in MATLAB (MathWorks, Natick, MA). Stimuli were delivered by a sound card (UA-25 EX; Edirol/Roland Corp., Los Angeles, CA) and amplifier (D-75A; Crown Audio, Elkhart, IN) to circumaural headphones (HD650; Sennheiser, Wedemark, Germany). Testing was performed in a double-walled sound-attenuating booth (IAC, North Aurora, IL).
Testing was performed at the University of Maryland (College Park, MD). The Institutional Review Board at the University of Maryland approved this research protocol. Informed consent was obtained from listeners before testing.
2. Stimuli
Stimuli were five-word sentences consisting of a name, verb, number, adjective, and object from a laboratory-designed matrix-style corpus (Kidd et al., 2008). There were eight monosyllabic words per category, yielding a total of 40 words. The listener was presented with one target female talker and two masker female talkers. On each trial, three talkers were randomly chosen from a set of eight young-adult female talkers. All talkers spoke with neutral inflection. The duration of the individual words in this corpus was not the same; therefore, while words in each sentence stream occurred at approximately the same time, they were not precisely time aligned. The talkers and words chosen for both the target and maskers were mutually exclusive for each trial. Stimuli were processed in MATLAB to simulate talkers at different spatial locations, with or without beamforming. They were presented through a pair of circumaural headphones that were placed over the sound processors (Grantham et al., 2008; Goupell et al., 2018).
3. Beamformer characteristics
Measured HRTFs for the beamformers were used to virtually place the target at 0° (in front) and the masking talkers at either 0° (co-located) or symmetrically spaced around the target at locations of ±30° or ±90°. The HRTFs for the algorithms and spacings were created by recording impulse responses in a mildly reverberant sound field laboratory using 16 omnidirectional microphones mounted in four front-to-back rows of four microphones distributed laterally across the top of the head on a flexible headband/circuit board placed on the KEMAR manikin head [Fig. 6 in Kidd (2017); Fig. 2 in Roverud et al. (2018)]. This setup has been used exclusively for investigational purposes (cf. Kidd, 2017); such a beamformer would not be realizable in typical commercial devices and is distinct from other studies that have evaluated approaches that could be implemented in current devices (Adiloglu et al., 2015; Baumgartel et al., 2015b; Dieudonné and Francart, 2018). The signals from the 16 microphones were filtered/delayed-and-summed to create a single-beam beamformer aimed in any specified direction within a range of angles about the head. Beam consisted of a single beam that always pointed toward the front of the listener. Roverud et al. (2018) and Kidd et al. (2020) describe the recording and signal processing for beam and KEMAR in more detail. The triple beam was a combination of three beamformers, which was a simple summation with equal weight on each channel (i.e., the left channel was presented the input from the summed left and center beams) (Kidd et al., 2020). The triple beam had the ability to provide non-zero ILDs for sound sources, depending on the location, but no ITDs were provided.
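The equal-weight combination described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`triple_beam_mix`, `ild_db`, `rms_db`) and the per-beam gains in the usage example are hypothetical, and each beam output is modeled as a scaled copy of one source waveform rather than as the output of the measured 16-microphone filters.

```python
import math

def rms_db(samples):
    """Root-mean-square level of a sample sequence, in dB re 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)

def triple_beam_mix(left_beam, center_beam, right_beam):
    """Equal-weight sum: each ear receives the center beam plus its own side beam."""
    left_ear = [c + l for c, l in zip(center_beam, left_beam)]
    right_ear = [c + r for c, r in zip(center_beam, right_beam)]
    return left_ear, right_ear

def ild_db(left_ear, right_ear):
    """Interaural level difference, right minus left, in dB."""
    return rms_db(right_ear) - rms_db(left_ear)

# Hypothetical source located to the right of the listener: the right beam
# passes it at full level, the center beam attenuates it, and the left beam
# attenuates it strongly. These gains are illustrative, not measured values.
source = [math.sin(2.0 * math.pi * k / 16.0) for k in range(64)]
left_beam = [0.1 * s for s in source]
center_beam = [0.5 * s for s in source]
right_beam = [1.0 * s for s in source]

left_ear, right_ear = triple_beam_mix(left_beam, center_beam, right_beam)
example_ild = ild_db(left_ear, right_ear)  # positive: louder at the right ear
```

Because both ears share the same diotic center beam and the side beams carry no interaural delay, the only interaural difference produced by this sketch is a level difference, in line with the statement that the triple beam provides ILDs but no ITDs.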
Figures 1(A) and 1(B) provide an illustration of the broadband spatial response patterns of the triple-beam algorithm. The input stimulus was a broadband noise that initially had the same long-term average spectrum as a female speaker in the corpus that was used in the procedure. Then it was bandpass filtered between 200 and 8000 Hz using a fourth-order forward-backward Butterworth filter, the purpose being to approximately match the frequency range of the CI listeners participating in this experiment and the NH listeners presented vocoded speech participating in experiment II. Slight asymmetries in attenuation were due to irregularities in the room and placement of the microphone array during the recording of the impulse responses used to compute the filters. One beam was aimed directly in front of the listener at 0° (i.e., toward the target source) and created a single-beam signal that was routed to both ears (equivalent to the beam condition). The two other beams (called the left- and right-side beams) were oriented symmetrically at fixed azimuths of ±40°, and the output of each beam, after combination with the output of the center beam, was routed monaurally to the proximal ear. As can be seen in Fig. 1(C), this configuration of beams produces exaggerated ILDs relative to natural listening conditions from 0 to ±60° (Kidd et al., 2020).
FIG. 1.
(Color online) Illustrations of the patterns of attenuation [(A) and (B)] and ILDs (C) for the triple beam (TRIBEAM) algorithm based on measured impulse responses of the microphone array. In (A), each line represents a beam oriented in a different direction: solid (left ear only) is aimed at –40°, dashed (both ears) is aimed at 0° (directly in front of the listener), and dashed-dotted (right ear only) is aimed at +40°. The y axis represents attenuation relative to the signal with no beam processing (0-dB attenuation). (B) is similar except the attenuation is that for the summed center and side beams; in other words, it is the attenuation experienced by each ear when presented the triple-beam algorithm. (C) demonstrates the ILD (right – left attenuation in dB) from the KEMAR algorithm (solid) and the triple-beam algorithm (dashed). The beam algorithm is omitted because the diotic presentation produces 0-dB ILD for all source azimuths. (D) shows the TMR (solid lines) as well as the corresponding iSNR (dashed lines; see text for a description) as a function of masker angle for symmetrically placed maskers presented at the same level as a target placed at 0°. Symbols on the lines indicate TMR/iSNR levels corresponding to masker angles tested in experiments I and II.
As a result of the ILDs and directional microphone characteristics, improvements in the TMR can be achieved, depending on the spatial positions of target and maskers. We therefore calculated the intelligibility-weighted signal-to-noise ratio (iSNR) to estimate such improvements (Greenberg et al., 1993; Baumgartel et al., 2015a). The iSNR was calculated by filtering the signal (Gaussian noise convolved with the long-term average spectrum of the female BUG corpus used in this study) into six bands using fourth-order Butterworth filters and weighting and summing the SNRs for the resulting bands according to the octave-band speech intelligibility index standard (ANSI, 2017). The reported iSNR was the average of both ears. Because iSNR calculations were mostly independent of TMR, we provide the iSNR with target and maskers presented at the same level in Fig. 1(D). In general, the beam and triple-beam configurations show improvements in iSNR as the maskers are moved away from the target located at the midline to more lateral azimuths, consistent with the goal of beamforming and the opposite of what occurs for the natural listening (KEMAR) condition for these spatial configurations.
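The weighting-and-summing step of the iSNR calculation can be illustrated as follows. This is a minimal sketch, not the analysis code used in the study: the function name `isnr_db` is hypothetical, the band filtering is assumed to have been done already, and the band weights in the usage example are placeholders rather than the band-importance values of the speech intelligibility index standard, which should be taken from the standard itself.

```python
def isnr_db(band_snrs_left, band_snrs_right, band_weights):
    """Importance-weighted sum of per-band SNRs (dB), averaged over both ears."""
    if not (len(band_snrs_left) == len(band_snrs_right) == len(band_weights)):
        raise ValueError("each ear needs one SNR per band weight")
    total = sum(band_weights)
    weights = [w / total for w in band_weights]  # normalize so weights sum to 1

    def one_ear(band_snrs):
        return sum(w * snr for w, snr in zip(weights, band_snrs))

    return 0.5 * (one_ear(band_snrs_left) + one_ear(band_snrs_right))

# Six bands, as in the text. Weights and SNRs below are placeholders.
placeholder_weights = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
snrs_left = [0.0, 0.0, 6.0, 6.0, 0.0, 0.0]    # dB, hypothetical
snrs_right = [0.0, 0.0, 6.0, 6.0, 0.0, 0.0]   # dB, hypothetical
example_isnr = isnr_db(snrs_left, snrs_right, placeholder_weights)
```

In this example the two middle bands carry half of the total weight, so a 6-dB SNR confined to those bands yields an iSNR of about 3 dB.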
4. Procedure
Listeners were asked to identify the words spoken by a target talker in the presence of two competing masker talkers. The target talker was identified by the first word in the sentence (the name “Sue”; see below). Listeners were tested using three listening algorithms: natural presentation (KEMAR), a single-beamforming algorithm (beam), or a hybrid beamforming algorithm with three beams (triple beam). Impulse responses for KEMAR were recorded in the same room used to record the impulse responses for beam and triple beam, using loudspeakers and in-ear microphones from a KEMAR manikin 1.5 m away (G.R.A.S., Holte, Denmark), not behind-the-ear microphones common to CI sound processors [see detailed description in Kidd (2017)]. Use of the in-ear microphone provided easier comparison to previous studies and results in larger ILDs compared to BTE microphones (Mayo and Goupell, 2020); the resultant ILDs are presented in Fig. 1(C).
To set the stimulus levels, the target was placed at 0° azimuth and initially presented at a sound pressure level (SPL) of 65 dB. Following this initial presentation level, sample words were presented to the CI listener in order to achieve an interaurally loudness-balanced presentation. Monaurally presented speech stimulus tokens were presented to the listener, and an experimenter adjusted a single ear to a comfortable level based on listener report. After each ear was set at a monaural comfortable level, interaurally loudness-balanced values were determined by playing sample speech stimulus tokens sequentially across the ears. Again, an experimenter adjusted the sound levels while maintaining a comfortable overall loudness. Finally, the stimuli were presented dichotically to both ears, and the experimenter made any final adjustments. Only the loudness was balanced, not the spatial location (Fitzgerald et al., 2015; Baumgärtel et al., 2017). During testing, this pretest-determined loudness-balancing value for individual CI listeners was applied to stimuli in addition to any attenuation provided by spatial and microphone processing. Four listeners needed no loudness adjustment, while the other six listeners needed an interaural adjustment of 4 dB or less.
To minimize the effect of automatic gain control of the CIs, all stimuli were presented at or below the loudness-balanced levels. Thus, when the TMR was positive, the target level was held constant near 65 dB SPL while the masker levels were reduced, and when the TMR was negative the maskers were held constant near 65 dB SPL while the target level was reduced (Goupell et al., 2016). This procedure allowed for the appropriate TMR while minimizing the potential confound introduced by stimuli that became uncomfortably loud. The scaling for target and maskers was applied before spatial processing. TMR in this study refers to the ratio of sound level of the target to the individual masker levels, not to the overall level of the two maskers combined.
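The level rule in the preceding paragraph amounts to a simple piecewise assignment, sketched below. The function name `presentation_levels` and the exact 65-dB reference are assumptions for illustration; in the experiment the reference was only "near" 65 dB SPL, and the listener-specific loudness-balancing adjustment was applied in addition.

```python
def presentation_levels(tmr_db, reference_db=65.0):
    """Return (target_level, per_masker_level) in dB SPL for a requested TMR.

    Whichever signal would otherwise exceed the reference is held at the
    reference, and the other signal is reduced to reach the requested TMR.
    """
    if tmr_db >= 0:
        # Positive TMR: target held at the reference, each masker reduced.
        return reference_db, reference_db - tmr_db
    # Negative TMR: each masker held at the reference, target reduced.
    return reference_db + tmr_db, reference_db

levels_at_plus_6 = presentation_levels(6.0)    # (65.0, 59.0)
levels_at_minus_6 = presentation_levels(-6.0)  # (59.0, 65.0)
```

In both cases neither the target nor an individual masker is presented above the reference level, which is the property the procedure relies on to limit the influence of automatic gain control.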
The listeners were tested using a one-up one-down adaptive procedure that estimated the 50% correct point on the psychometric function. The level corresponding to the 50% correct point was taken as an estimate of the SRT specified as the TMR in dB. On each trial, the listener was simultaneously presented with one target sentence and two masker sentences. The first target word was always “Sue.” The listener selected the four other target words in sequence from an array of all 32 words available from the speech matrix (i.e., excluding the name category, which was not scored). The responses were registered by using a mouse to select the printed words from a graphical user interface displaying the word matrix on a monitor. Correctly indicating at least three of the four target words was deemed a correct response. Listeners were told that they could expect the target sentence from the front.
The experimental runs were blocked by microphone condition (KEMAR, beam, and triple beam) and spatial condition (maskers co-located at 0°, spatially separated at ±30°, or spatially separated at ±90°). CI listeners were tested on nine (3 × 3) combinations in total. Each condition combination was tested in one block consisting of four adaptive tracks. For each trial within the block, a track was randomly chosen from any of the unfinished tracks until all four tracks were completed. For each adaptive track, the TMR started at 12 dB. The initial adaptive step size was 6 dB. The adaptive step size was reduced to 3 dB after three reversals. Each track ended after at least 20 trials and a minimum of nine reversals had occurred. The TMRs of the last six reversals were averaged to obtain the SRT for an individual track. The final SRT for a condition combination for each listener was the average of the four individual track SRTs. The order of condition combinations was randomized for each listener. The experiment took a total of approximately 4 h for each individual CI listener. Listeners were given breaks and tested for more than 1 day if needed.
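The tracking rules described above can be summarized in a short simulation. This is a sketch under assumptions, not the experimental code: `run_track`, `condition_srt`, and the `respond` callback (which stands in for a listener and returns whether a trial was scored correct, i.e., at least three of four target words identified) are hypothetical names, and the reversal is recorded at the TMR of the trial on which the track changed direction, which is one common convention that the text does not specify.

```python
def run_track(respond, start_tmr=12.0, big_step=6.0, small_step=3.0,
              reversals_before_small=3, min_trials=20, min_reversals=9,
              n_average=6):
    """One-up one-down track; returns (track SRT in dB TMR, trials run)."""
    tmr = start_tmr
    step = big_step
    last_direction = 0      # -1: TMR was lowered, +1: raised, 0: no trial yet
    reversals = []          # TMRs at which the track changed direction
    trials = 0
    while trials < min_trials or len(reversals) < min_reversals:
        correct = respond(tmr)
        trials += 1
        direction = -1 if correct else 1    # correct -> harder, wrong -> easier
        if last_direction != 0 and direction != last_direction:
            reversals.append(tmr)
            if len(reversals) >= reversals_before_small:
                step = small_step
        last_direction = direction
        tmr += direction * step
    return sum(reversals[-n_average:]) / n_average, trials

def condition_srt(track_srts):
    """Final SRT for one condition: mean of its individual track SRTs."""
    return sum(track_srts) / len(track_srts)

# Idealized deterministic listener: correct whenever the TMR exceeds 0 dB.
srt, n_trials = run_track(lambda tmr: tmr > 0.0)
# srt == 1.5 and n_trials == 20 for this idealized listener
```

For the idealized listener the track settles into alternating between 0 and 3 dB, so the mean of the last six reversals falls midway between them. A real listener responds probabilistically, which is why four tracks per condition were averaged.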
B. Results and discussion
The SRTs for individual listeners are shown in Fig. 2. Generally, large differences were observed across listeners. For example, listeners S8 and S10 had relatively low SRTs (e.g., one threshold each less than –10-dB TMR), while listeners S1 and S9 had relatively high SRTs (e.g., multiple thresholds greater than 10-dB TMR). The patterns of SRTs, however, were relatively consistent across listeners, supporting a group-level analysis.
FIG. 2.
Masked SRTs (in dB) are plotted along the ordinate for each individual CI listener as a function of the masker separation from the target along the abscissa.
Figure 3 shows the group average results for the CI listeners. A two-way repeated-measures (RM) analysis of variance (ANOVA) was performed on the SRTs with factors of masker angle (three levels: 0°, ±30°, ±90°) and microphone condition (three levels: KEMAR, beam, and triple beam). There was a significant main effect of masker angle [F(2,18) = 155.2, p < 0.0001, $\eta_p^2$ = 0.95] and of microphone condition [F(2,18) = 239.1, p < 0.0001, $\eta_p^2$ = 0.96]. The masker angle × microphone interaction also was significant [F(4,36) = 257.0, p < 0.0001, $\eta_p^2$ = 0.89].
FIG. 3.
Group average SRTs (A) for CI listeners as a function of masker separation in azimuth. Error bars represent ±1 standard deviation. (B) shows iSNRs vs SRTs. Each point corresponds to a certain microphone condition and masker angle. The dashed line shows a linear regression of the data.
To analyze this interaction, 36 two-tailed paired t-tests assuming equal variance were performed post hoc, and the p values are reported with Bonferroni correction. For the co-located condition (masker angle = 0°), there were no significant differences in SRTs between microphone conditions [KEMAR vs beam, KEMAR vs triple beam, and beam vs triple beam (p > 0.05 for all three comparisons)]. At masker angles = ±30°, SRTs were significantly lower for beam compared to KEMAR (p < 0.01). The SRT for triple beam was not significantly different from the SRTs for KEMAR or beam (p > 0.05 for both comparisons). At masker angles = ±90°, SRTs for KEMAR were significantly higher than SRTs for beam and triple beam (p < 0.0001 for both), and SRTs for triple beam were significantly higher than SRTs for beam (p < 0.005).
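As a worked illustration of the correction, the sketch below shows how a count of 36 comparisons arises if all pairs among the nine condition combinations are compared (an assumption about how the count was reached, since the text does not state it) and how a Bonferroni adjustment scales each p value. The function name `bonferroni` and the example p values are hypothetical and are not results from this study.

```python
import math

def bonferroni(p_values, n_comparisons):
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    return [min(1.0, p * n_comparisons) for p in p_values]

n_conditions = 9                            # 3 microphone conditions x 3 masker angles
n_pairs = math.comb(n_conditions, 2)        # number of distinct pairs: 36

hypothetical_p = [0.001, 0.05]              # illustrative uncorrected p values
adjusted_p = bonferroni(hypothetical_p, n_pairs)
```

With 36 comparisons, an uncorrected p of 0.001 becomes roughly 0.036 and remains below 0.05, whereas an uncorrected p of 0.05 is capped at 1 and is no longer significant.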
For the KEMAR condition, there were no significant differences in SRTs between different angles (p > 0.05 for all three comparisons). For the beam condition, the SRT for the masker angle = 0° was significantly higher than the SRT at ±30° (p < 0.01) and ±90° (p < 0.0001), and the SRT for the masker angle = ±30° was significantly higher than the SRT at ±90° (p < 0.0001). For the triple-beam condition, the SRT for the masker angle = 0° was significantly higher than the SRT at ±30° (p < 0.05) and ±90° (p < 0.0005), and the SRT for the masker angle = ±30° was significantly higher than the SRT at ±90° (p < 0.0005). Therefore, the two-way interaction was a result of an increasingly larger decrease in SRTs as a function of masker angle across the three beamforming conditions: KEMAR (no change), triple beam (improvement), and beam (the most improvement).
The average SRTs obtained in the nine condition combinations were compared to the iSNR [Fig. 3(B)]. A linear regression was highly significant (p < 0.0001) and explained 90% of the variance in the data points. Therefore, performance on this task was well described by the changes in iSNR, including the better SRTs for the beam compared to the triple beam, which is consistent with other studies in bilateral CI listeners (Baumgartel et al., 2015a). This also suggests that the triple-beam algorithm provided minimal additional binaural benefit from any perceived spatial separation, contrary to our hypothesis.
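The regression of SRT on iSNR can be sketched as an ordinary least-squares fit with its coefficient of determination. The function name `linear_fit` and the nine data points below are hypothetical placeholders, not the measured values from Fig. 3(B).

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical iSNR (dB) and SRT (dB TMR) values for nine conditions.
isnr_db = [0.0, 0.0, 0.0, 1.0, 6.0, 4.0, 2.0, 18.0, 13.0]
srt_db = [2.0, 1.5, 2.5, 1.0, -3.0, -1.5, 0.5, -15.0, -10.0]

slope, intercept, r_squared = linear_fit(isnr_db, srt_db)
print(f"slope = {slope:.2f} dB/dB, R^2 = {r_squared:.2f}")
```

A slope near -1 dB/dB would indicate that each decibel of iSNR improvement lowers the SRT by roughly one decibel, which is the pattern expected if performance is driven mainly by TMR.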
In summary, the bilateral CI listeners benefited greatly from beamforming compared to natural spatial cues. Specifically, there was minimal improvement in SRTs when the target and maskers had spatial separation for the KEMAR conditions, while both beam and triple beam produced large improvements in SRTs. Furthermore, the improvement in SRTs for the beam condition was significantly greater than that for the triple-beam condition, suggesting the improvements in SRTs were driven primarily by improving the TMR rather than providing binaural unmasking from improved perceived spatial separation of the three sources. This result is consistent with the highly significant correlation between SRTs and iSNR.
III. EXPERIMENT II: SPEECH-ON-SPEECH MASKING IN NH LISTENERS
A second experiment was performed with NH listeners using stimuli that were processed through a vocoder under conditions that paralleled those tested with the CI listeners. Vocoding degrades the spectrotemporal representation of the stimulus relative to natural presentation in a manner that is similar to the sound processing performed by actual CIs (Loizou, 2006). An advantage of this type of CI simulation with NH listeners is that it allows certain binaural cues to be manipulated in ways that can control some of the stimulus properties that underlie the spatial-hearing abilities of CI listeners. For example, diminishing the interaural correlation of the vocoded signals may blur perceptual sound-source information and thus affect spatial-hearing abilities such as localization and source segregation (Jones et al., 2014; Swaminathan et al., 2016; Goupell and Stakhovskaya, 2018; Baltzell et al., 2020). In this experiment, two types of carriers were used, interaurally uncorrelated noise carriers that were intended to convey relatively poor spatial-location information (i.e., perceptually diffuse sound sources) and correlated noise carriers that were intended to convey better spatial-location information (i.e., perceptually compact sound sources) (Jones et al., 2014). We hypothesized that the NH listeners presented with the vocoded stimuli would produce poorer SRTs than with the non-vocoded stimuli because of diminished spatial-hearing abilities. We also hypothesized that uncorrelated noise carriers would diminish the usefulness of perceived spatial differences for performing the speech-on-speech masking task. The relevant question then was how different beamforming strategies would benefit listeners under these different interaural vocoding schemes and how that compared to the performance by these same listeners under matched natural (non-vocoded) listening.
A. Methods
1. Listeners and equipment
Six NH listeners (20–24 yr, average = 22 yr) were tested with both unprocessed and vocoded acoustic speech. No attempt was made to age match the NH listeners to the CI listeners for practical reasons (ease of subject recruitment) and because our unpublished preliminary work to date has not indicated any strong interaction between age and the different beamforming algorithms tested here. The NH listeners had pure-tone audiometric thresholds of <20 dB hearing level at octave frequencies from 250 to 8000 Hz (with the exception of one listener who had hearing thresholds at 250 Hz of 20 and 30 dB for the left and right ear, respectively). The listeners were compensated for their participation in this study except for two listeners who were members of the lab (including the first author).
Two listeners were tested at the University of Maryland who had extensive experience with vocoded speech. Four listeners were tested at Boston University who had limited vocoded speech experience. The Institutional Review Boards at the two institutions approved this research protocol. Informed consent was obtained from listeners before testing.
The testing setup at the University of Maryland was the same as in experiment I. At Boston University, a personal computer ran the experiment in MATLAB. Stimuli were delivered by a sound card (HDSP 9632; RME, Millcreek, UT) to circumaural headphones (HD280 Pro; Sennheiser, Wedemark, Germany). Testing was performed in a double-walled sound-attenuating booth (IAC, North Aurora, IL). The SRTs for the listeners at the University of Maryland were significantly better than for the listeners at Boston University (two-tailed two-sample t-test, p = 0.0499), which may have been a result of the differences in previous exposure to vocoded speech.
2. Stimuli and procedure
The stimuli were the same as those used in experiment I. NH listeners were tested using the same microphone conditions, target and masker spacings, and adaptive procedures described in experiment I. The one difference was that the experimental procedure for NH listeners also contained the additional variable of stimulus processing or vocoding. There were three vocoding conditions: unprocessed/natural (called NAT), vocoded with correlated noise carriers (called COR), and vocoded with uncorrelated noise carriers (called UNC). Tests were blocked by microphone, spatial condition, and vocoding condition. In total, NH listeners listened to 27 combinations (3 × 3 × 3). The total experiment took approximately 12 h per listener.
Vocoding generally followed previous implementations (Shannon et al., 1995; Loizou, 2006). The first step was to bandpass filter the stimuli into eight logarithmic and contiguous frequency channels from 300 to 8500 Hz using fourth-order Butterworth filters. The envelopes were then extracted using a Hilbert transform and low-pass filtered at a 400-Hz cutoff frequency with a second-order Butterworth filter. The envelopes were used to modulate narrowband noise carriers that were created using the same fourth-order Butterworth bandpass filters during the frequency decomposition, each of which was either interaurally correlated (identical waveforms) or uncorrelated (statistically independent waveforms). The channel outputs were summed, and the resulting left and right signals were normalized to the root mean square (rms) value of their corresponding pre-vocoded signals. Thus, before TMR scaling was applied, all vocoded signals would be presented at a level of 65 dB SPL, similar to what was done in experiment I. Vocoding was applied after spatial processing and TMR scaling to conform to the order of processing that occurs in bilateral CIs.
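The processing chain described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The sampling rate, the use of white Gaussian noise for the carriers, causal `sosfilt` filtering, and the reading of "fourth-order" as a second-order prototype doubled by the bandpass transform are all assumptions. The function names `rms` and `vocode_binaural` are hypothetical.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfilt


def rms(x):
    return np.sqrt(np.mean(np.square(x)))


def vocode_binaural(left, right, fs, correlated, n_channels=8,
                    f_lo=300.0, f_hi=8500.0, env_cutoff=400.0, seed=0):
    """Noise-vocode a left/right pair (equal-length 1-D arrays)."""
    rng = np.random.default_rng(seed)
    # Contiguous, logarithmically spaced band edges from f_lo to f_hi.
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    # Second-order low-pass filter for envelope smoothing.
    env_sos = butter(2, env_cutoff, btype="lowpass", fs=fs, output="sos")
    out_l = np.zeros(len(left))
    out_r = np.zeros(len(right))
    for lo, hi in zip(edges[:-1], edges[1:]):
        # butter(2, ..., "bandpass") gives a fourth-order bandpass (assumption).
        band_sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
        noise_l = rng.standard_normal(len(left))
        # Correlated: identical carrier at both ears; uncorrelated: independent.
        noise_r = noise_l if correlated else rng.standard_normal(len(right))
        for sig, noise, out in ((left, noise_l, out_l), (right, noise_r, out_r)):
            band = sosfilt(band_sos, sig)
            env = sosfilt(env_sos, np.abs(hilbert(band)))
            env = np.maximum(env, 0.0)  # smoothing can undershoot below zero
            out += env * sosfilt(band_sos, noise)  # in-place channel sum
    # Match each ear's rms to its pre-vocoded signal.
    out_l *= rms(left) / rms(out_l)
    out_r *= rms(right) / rms(out_r)
    return out_l, out_r
```

In this sketch the inputs are assumed to be the already spatialized and TMR-scaled ear signals, matching the stated order of processing (spatial processing first, vocoding last).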
B. Results and discussion
The top row of Fig. 4 contains the group average SRTs for the NH listeners across the three vocoding conditions, which were analyzed with a three-way RM ANOVA using the factors of masker angle (three levels: 0, ±30°, ±90°), microphone (three levels: KEMAR, beam, and triple beam), and stimulus processing (three levels: NAT, COR, and UNC). The assumption of sphericity was not always supported, and a Greenhouse–Geisser correction was applied that reduced the degrees of freedom in computing p values where appropriate. There was a significant main effect of masker angle [F(2,10) = 74.0, p < 0.001, ηp² = 0.94], where average SRTs were 3.4, –5.8, and –12.0 dB for 0°, ±30°, and ±90°, respectively. There was also a significant two-way masker angle × processing interaction [F(4,20) = 49.1, p < 0.001, ηp² = 0.91], a significant two-way masker angle × microphone interaction [F(4,20) = 50.1, p < 0.001, ηp² = 0.91], a significant two-way microphone × processing interaction [F(4,20) = 11.7, p < 0.01, ηp² = 0.70], and a significant three-way masker angle × microphone × processing interaction [F(2.5,12.3) = 5.4, p < 0.05, ηp² = 0.52].
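The effect sizes listed alongside each F statistic appear consistent with partial eta squared, which can be recovered from F and its degrees of freedom as F·df_effect / (F·df_effect + df_error). The sketch below assumes that interpretation and checks it against the values reported in this paragraph. The function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported effect size) from the three-way ANOVA.
reported = [
    ("masker angle", 74.0, 2, 10, 0.94),
    ("masker angle x processing", 49.1, 4, 20, 0.91),
    ("masker angle x microphone", 50.1, 4, 20, 0.91),
    ("microphone x processing", 11.7, 4, 20, 0.70),
    ("three-way interaction", 5.4, 2.5, 12.3, 0.52),
]

for label, f_value, df1, df2, value in reported:
    computed = partial_eta_squared(f_value, df1, df2)
    print(f"{label}: computed {computed:.2f}, reported {value:.2f}")
```

The same relation gives a quick consistency check for any F statistic and effect size reported together.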
FIG. 4.
Group average SRTs [(A)–(C)] for the NH listeners. The columns display the results from the three types of processing: natural (NAT), vocoded with correlated noise carriers (COR), and vocoded with uncorrelated noise carriers (UNC) plotted in the left, middle, and right columns, respectively. Error bars represent ±1 standard deviation. Within each panel, the different microphone conditions (KEMAR, beam, and triple beam) are indicated by different shading of bars (see legend). The lower row plots iSNRs vs SRTs [(D)–(F)] for NH listeners. Columns are also organized by processing type as in (A)–(C). Trend lines with R² and p values are shown.
With respect to the three-way interaction, we note that the two vocoded conditions produced similar SRTs across all conditions and differed substantially from the SRTs for the unprocessed conditions. A separate three-way RM ANOVA using the factors of masker angle (three levels: 0, ±30°, ±90°), microphone (three levels: KEMAR, beam, and triple beam), and stimulus processing (only two levels: COR and UNC) confirmed that the main effect of stimulus processing and any interaction with stimulus processing was not significant (p > 0.05 for all). Also, the SRTs for the vocoded conditions appeared similar to those found for the CI listeners in Fig. 3 (an explicit comparison between the SRTs for the CI listeners and the NH listeners presented vocoded stimuli is provided in Sec. VI A). Therefore, the significant three-way interaction was a result of a fundamentally different pattern of SRTs between the unprocessed (NAT) and vocoded (COR and UNC) conditions.
The unprocessed (NAT) SRTs [Fig. 4(A)] showed that there were relatively large improvements in SRTs of >20 dB, even when comparing the smaller ±30° to the 0° masker separation. A two-way RM ANOVA was performed on the NAT SRTs with factors of masker angle (three levels: 0, ±30°, ±90°) and microphone condition (three levels: KEMAR, beam, and triple beam). There was a significant main effect of masker angle [F(2,10) = 141.6, p < 0.0001, ηp² = 0.97] and of microphone condition [F(2,10) = 13.6, p < 0.005, ηp² = 0.73]. The masker angle × microphone interaction was also significant [F(4,20) = 257.0, p < 0.0005, ηp² = 0.63].
To analyze this interaction, 36 two-tailed paired t-tests assuming equal variance were performed post hoc, and the p values are reported with Bonferroni correction. For the co-located condition (masker angle = 0°), there were no significant differences in SRTs between microphone conditions [KEMAR vs beam, KEMAR vs triple beam, and beam vs triple beam (p > 0.05 for all three comparisons)]. At masker angles = ±30°, there were also no significant differences in SRTs between microphone conditions (p > 0.05 for all three comparisons). At masker angles = ±90°, there were also no significant differences in SRTs between microphone conditions (p > 0.05 for all three comparisons).
For the KEMAR condition, the SRT for masker angle = 0° was significantly higher than the SRT for ±30° (p < 0.0001) and ±90° (p < 0.0001), and the SRT for masker angle = ±30° was not significantly different from the SRT for ±90° (p > 0.05). For the beam condition, the SRT for the masker angle = 0° was not significantly different from the SRT at ±30° (p > 0.05), the SRT for the masker angle = 0° was significantly higher than the SRT at ±90° (p < 0.01), and the SRT for the masker angle = ±30° was not significantly different from the SRT at ±90° (p > 0.05). For the triple-beam condition, the SRT for the masker angle = 0° was significantly higher than the SRT at ±30° (p < 0.005) and ±90° (p < 0.0005), and the SRT for the masker angle = ±30° was not significantly different from the SRT at ±90° (p > 0.05). Therefore, the two-way interaction was partially a result of different decreases in SRTs as a function of masker angle across the three beamforming conditions.
The SRTs for the conditions in Fig. 4(A) correspond well with those measured in Kidd et al. (2020), who tested NH listeners under the same NAT conditions using similar methods. Kidd et al. (2020) also found relatively low SRTs, which they partially attributed to the use of an adaptive tracking rule in which a correct response for a trial was registered when only three of the four target words were identified correctly. Perceptual segregation cues could have also contributed to the relatively low SRTs in the current study, such as using loudness differences between talkers (i.e., listening to a quieter talker in the mixture; Brungart, 2001). For the NAT conditions at ±30° and ±90° masker location, it is also interesting to note that the SRT for the KEMAR condition was not significantly different from the SRT obtained from the single-beam and triple-beam conditions. Our interpretation is that this occurred because altering/removing the natural binaural cues with beamforming for NH listeners when presented natural signals reduced perceptual separation, which mattered more than the change in TMR.
To investigate the effects of TMR on these data, we performed an iSNR analysis similar to what was done in experiment I, which is shown in Figs. 4(D)–4(F). The iSNR is significantly correlated with the NAT SRTs in Fig. 4(D), explaining 67% of the variance in the points. In contrast, iSNR was not significantly correlated with the COR SRTs or the UNC SRTs. The lack of correlation may be due partially to the smaller range of SRTs that the conditions produced in Figs. 4(E) and 4(F) compared to Fig. 4(D). Comparing these data to those for the CI listeners in Fig. 3(B), who showed a much stronger relationship between iSNR and SRT, supports the notion that the NH listeners had better access to other non-TMR-based benefits, such as better perceived sound separation, which the iSNR metric would not capture. These results and interpretations motivated a second set of localization experiments (Secs. IV and V) and more formal comparison of data across subject groups (Sec. VI).
The relatively small change in SRT as a function of masker angle for the vocoded conditions compared to the unprocessed (NAT) condition occurred because of the removal of the temporal fine structure and, with it, access to low-frequency ITDs (Swaminathan et al., 2016; Baltzell et al., 2020). One notable difference in the stimuli of the current study compared to those studies is that, to perform a better comparison to actual CI listeners, we applied vocoding after application of our HRTF, which resulted in a negligible effect of noise carrier correlation. Had the processing order been reversed, we would have expected access to low-frequency ITDs from the noise carriers for the correlated condition and thus a larger SRM and a larger effect of carrier correlation. Garadat et al. (2009) showed that the effect of HRTF-vocoder processing order on SRTs was small to negligible for cases where there was an asymmetrical masker configuration (front target vs 90° left or right interferer); this appears to also apply to cases with symmetrical maskers, as was the case in the current study [see also, e.g., Hu et al. (2018)].
In summary, as expected, NH listeners performed best for NAT stimuli when compared to the two cases where the stimuli were vocoded. With respect to the different microphone algorithms, however, when the stimuli were vocoded (COR and UNC conditions), the NH listeners obtained lower SRTs for both the single-beam and triple-beam conditions with spatially separated maskers than were found for KEMAR. The SRTs for the vocoded conditions were not well explained by the iSNR, which may be a result of their better access to spatial cues than the CI listeners in experiment I.
IV. EXPERIMENT III: NUMEROSITY JUDGMENTS AND SOUND LOCALIZATION IN CI LISTENERS
The limitations on speech understanding for NH and hearing-impaired listeners in the types of speech-on-speech masking conditions tested here are largely a consequence of informational masking, as indicated by studies that have specifically attempted to separate the contributions of energetic and informational components (e.g., Arbogast et al., 2002; Brungart et al., 2006; Kidd et al., 2016; Kidd et al., 2019). In contrast, previous work in CI listeners and the results from experiments I and II suggest that energetic masking (i.e., TMR) best explains the results. One factor underlying informational masking is high rates of explicit masker confusion because of the poor segregation of sound sources (although other factors matter as well; cf. Kidd and Colburn, 2017). This observation leads to questions related to how well bilateral CI listeners and the NH listeners under the vocoded stimulus conditions could perceive the distinct sound sources and were able to determine where the sound sources were located. Specifically, to what extent does the independent processing of CI/vocoded sound input to the two ears diminish the ability to segregate and localize sound sources in the environment? A related question is whether the two beamforming strategies examined in the earlier experiments—which provided improved SRTs—also improve localization and sound-source segregation/determination and to what extent the results from experiments I and II are related to source segregation/localization. Although fully addressing these issues lies beyond the scope of the present study, this experiment was undertaken in an attempt to obtain a rough estimate of the perceptual organization of sound sources in the complex environment used in this study.
A. Methods
1. Listeners and equipment
Five CI listeners (S2 and S11–S14; see Table I; ages = 45–77 yr, average = 64.4 yr) were given a virtual sound-localization task (i.e., stimuli were spatialized through headphones using HRTFs as in experiments I and II). CI listener S2 participated in the masked SRT measurements of experiment I; the other four CI listeners only performed the numerosity/sound-localization measurements of experiment III. Listener S13 had Advanced Bionics devices, while S11, S12, and S14 used Cochlear Ltd. devices. Data were collected at the University of Maryland. The testing setup was the same as in experiment I. CI listeners used their everyday clinical sound processors and performed a loudness-balancing procedure similar to the one used in experiment I. The stimuli were presented via the same circumaural headphones and in the same double-walled sound-attenuating booths as in experiment I. The research protocol was approved by the Institutional Review Board. Informed consent was obtained from listeners before testing.
2. Stimuli and procedures
The stimuli were sentences that were composed and processed in the same manner as described in experiment I. A single sentence or three concurrent sentences from different talkers were presented to the listeners. The sounds were presented from virtual locations of 0°, ±30°, and ±90°. The three-source conditions mimicked those tested in the earlier speech-on-speech masking conditions; namely, the three sources were co-located (all at 0°) or spatially separated [(–30°, 0°, +30°) or (–90°, 0°, +90°)]. The stimuli were processed through the same three microphone conditions that were tested before: KEMAR, beam, and triple beam. The CI listeners were presented only with the unprocessed stimuli.
The listeners were asked to mark the number of talkers and the azimuthal location of each talker on a computer response interface. This interface consisted of an illustration of the top of the head displayed in the middle of a semicircle on a computer screen. The semicircle was centered on the nose of the illustrated head, so that it formed an arc from –90° on the left to +90° on the right. Listeners clicked a button on the screen using a mouse to initiate a trial. After the stimulus presentation, the listener could choose to repeat the stimulus or indicate one, two, or three perceived sound-source location(s) by clicking on the corresponding location(s) on the semicircle. Listeners were allowed to respond with one, two, or three talker locations (note that no two-talker condition was present) at any angle or angles between –90° and +90°. After selecting at least one location, listeners were allowed to move on to the next trial or continue marking additional sound locations. Multiple marks were allowed at a given location to indicate talkers perceived as co-located. Response feedback was not provided. The data consisted of the counts of the number of sources (i.e., numerosity judgments) and the judgments of perceived source azimuths (i.e., localization judgments).
Each condition combination was tested once in a block of trials with the order of conditions randomized. In each block, the CI listeners were tested with 8 source configurations × 3 microphone conditions resulting in 24 trials per block. The CI listeners performed 10 blocks of trials and 240 total trials (8 location configurations × 3 microphone conditions × 10 blocks), which were completed in about 1 h per listener. All three stimulus types were presented at 65 dB SPL (equivalent to TMR = 0 in experiments I and II) before microphone filtering, and no level roving was used during the experiment.
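The block structure described above can be sketched as a randomized list of trials. The eight source configurations are taken here to be the five single-source azimuths plus the three three-source arrangements, which is an inference from the stimulus description. The names `make_block`, `SINGLE_SOURCE`, and `THREE_SOURCE` are hypothetical.

```python
import random

# Five one-source azimuths and three three-source arrangements (degrees).
SINGLE_SOURCE = [(-90,), (-30,), (0,), (30,), (90,)]
THREE_SOURCE = [(0, 0, 0), (-30, 0, 30), (-90, 0, 90)]
MICROPHONES = ["KEMAR", "beam", "triple beam"]


def make_block(rng, processing=("NAT",)):
    """One block: every configuration x microphone x processing once, shuffled."""
    trials = [
        (sources, mic, proc)
        for sources in SINGLE_SOURCE + THREE_SOURCE
        for mic in MICROPHONES
        for proc in processing
    ]
    rng.shuffle(trials)
    return trials


rng = random.Random(0)
ci_blocks = [make_block(rng) for _ in range(10)]
print(len(ci_blocks[0]), sum(len(block) for block in ci_blocks))
```

With a single processing condition this yields 24 trials per block and 240 trials over 10 blocks, matching the counts given for the CI listeners.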
Before each session, listeners were allowed to listen to a non-interactive demo where the sound was moved sequentially from –90° to +90° in 15° steps using NAT stimuli and KEMAR microphone condition, with the corresponding location marked on the semicircle. Listeners could repeat this demonstration as many times as they wanted prior to starting the session.
B. Results and discussion
1. Numerosity judgments
Figures 5(A)–5(F) display the numerosity judgments for the one-source conditions. Inspection of these data indicates that CI listeners S11, S12, and S14 reliably reported one source (>80%) when one source was presented under all of the different microphone conditions. In contrast, CI listener S13 reported one or two sources relatively equally often, while S2 reported one, two, or three sources roughly equally. The average percentage of responses across conditions was 2%, 47%, and 51% for one, two, or three sources, respectively. Therefore, some CI listeners indicated there was more than one talker even for the one-source condition, but had difficulty in determining whether it was two or three talkers.
FIG. 5.
The top row [(A)–(F)] shows numerosity judgments for one-source stimuli (i.e., percentage of trials in which a single source was perceived as one, two, or three sources) for the three microphone conditions for CI listeners. (A)–(E) show results of individual CI listeners, and (F) shows the group average. The bottom row [(G)–(I)] shows group average numerosity judgments for three-source conditions for different masker spatial configurations (maskers at 0°, ±30°, and ±90°). Error bars show ±1 standard deviation.
For the three-source conditions (bottom row of Fig. 5), the average percentage of responses across conditions was 5%, 50%, and 45% for one, two, and three sources, respectively. Therefore, two talkers was the most common response for the CI listeners.
It should be noted that, both in the initial design of the task and in our ultimate interpretation, it was clear that numerosity can be a difficult, uncertain, and variable judgment for bilateral CI listeners. The removal of the temporal fine structure, the perceptual consequences of the independent processing of the two CIs (such as having two devices that are not bilaterally synchronized in time or different neural encoding across ears), or mismatches in the tonotopic locations of electrode pairs across ears likely created blurred or split sound images and/or the poor perceptual segregation of talkers for degraded signals [see Kan and Litovsky (2015) for a review]. Responding that two images were present for a single source could reflect a lack of binaural fusion of the sound inputs (Reiss et al., 2018; Kan et al., 2019).
2. Sound localization
Sound-localization judgments from five CI listeners were obtained under each microphone condition for a single source at angles spanning the range from –90° to +90°. The average response angle from the five individuals and the group average responses are shown in Fig. 6. Note that these values reflect the location judgments only for the data in which there was one stimulus presented and the listener correctly identified that one source was present.
FIG. 6.
The first five panels of the top row show the average response angle as a function of source angle for the individual one-source responses from five CI listeners. The rightmost panel of the top row shows the average across listeners. Accurate responses are indicated by the diagonal line (dashed). The bottom row contains the average absolute error in the localization judgments (vertical distance from the diagonal) for the same five individual CI listeners. The rightmost panel in the bottom row shows the average across listeners. Error bars represent ±1 standard deviation of the group average in both the top and bottom rows. The rms error is displayed in text in the bottom rightmost panel.
The top row of Fig. 6 shows the absolute localization judgments plotted as a function of the azimuthal source angle. The diagonal line indicates accurate responses. First, large differences were apparent across listeners. For KEMAR and triple beam, the responses were qualitatively similar, while the beam responses often were markedly different by comparison. The poor localization under the beam condition was expected, given the absence of binaural cues from the single-beamformer output. However, other cues are available in the beam condition. Sounds outside the focus of the beam are perceived as softer due to the attenuation of the beamformer. Furthermore, the frequency-specific nature of the beam processing means that the timbre of sounds varies with location; in other words, the high frequencies are attenuated more by the beamformer than the low frequencies, causing the speech to be increasingly low-pass filtered as it is moved away from 0°. This effect on localization has been noted in past studies using the single beam [see Kidd (2017) for a review]. The extent to which the CI listeners could exploit such spectral cues is unclear (Goupell et al., 2008; Winn and Litovsky, 2015). The data show large and consistent errors in the judgments of CI listeners at ±90°, and in some cases it appeared that the listener chose one side or the other and always assigned that location to either of the ±90° presentations (e.g., S2 and S11), which might be a result of large ILDs or intracranial lateralization bias occurring as the overall level changes (Goupell et al., 2013; Fitzgerald et al., 2015; Stakhovskaya and Goupell, 2017). These patterns also are clear from the lower row of plots in Fig. 6, which displays the error in localization judgments both for the individual listeners and for the group. Note the cases where the error is very high for one extreme angle but very low on the opposite side.
The relatively good localization performance for the triple beam is a result of the ILDs that are reintroduced with this microphone condition [see Fig. 1(C)]. Thus, lateralization in this situation was roughly proportional to ILD, as suggested by Kelvasa and Dietz (2015).
A two-way RM ANOVA with factors of angle (five levels: 0, ±30°, ±90°) and microphone (three levels: KEMAR, beam, and triple beam) was performed on the group average errors shown in the right panel of the lower row of panels in Fig. 6. The assumption of sphericity was not always supported, and a Greenhouse–Geisser correction was applied that reduced the degrees of freedom in computing p values where appropriate. The main effect of source angle was not significant [F(1.2,4.8) = 3.36, p > 0.05, ηp² = 0.46], but the main effect of microphone condition was significant [F(2,8) = 26.3, p < 0.0001, ηp² = 0.88]. Post hoc analyses indicated that both KEMAR and triple beam yielded lower errors than did beam (p < 0.01 and p < 0.005, respectively), while KEMAR and triple beam were not significantly different (p > 0.05). The source angle × microphone interaction was not significant [F(8,32) = 2.09, p > 0.05, ηp² = 0.34].
Figure 6 also reports the rms localization errors for the different conditions to summarize the data and facilitate comparisons for the single-source NAT condition to the existing literature (e.g., Aronoff et al., 2010; Kerber and Seeber, 2012; Jones et al., 2014). Average rms errors of 20°–40° are not uncommon in other studies, with the largest errors often occurring at relatively large angles near ±90°. Given that we only had five source locations, two of which were ±90°, a relatively large percentage of our trials had large error, which explains why our rms error of 42.7° is at the high end of the range reported in the literature.
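To make the point about the ±90° locations concrete, the sketch below computes rms and mean absolute localization error for one set of single-source trials. The response values are hypothetical and were chosen to mimic a listener who assigns both ±90° sources to the same side. The function names are also hypothetical.

```python
import math


def rms_error(responses, targets):
    """Root-mean-square difference between response and source azimuths (deg)."""
    squared = [(r - t) ** 2 for r, t in zip(responses, targets)]
    return math.sqrt(sum(squared) / len(squared))


def mean_abs_error(responses, targets):
    """Mean absolute difference between response and source azimuths (deg)."""
    return sum(abs(r - t) for r, t in zip(responses, targets)) / len(targets)


targets = [-90, -30, 0, 30, 90]
# Hypothetical listener who hears both +/-90 degree sources on the right.
responses = [90, -20, 5, 40, 90]

print(f"rms error = {rms_error(responses, targets):.1f} deg")
print(f"mean absolute error = {mean_abs_error(responses, targets):.1f} deg")
```

Because errors are squared before averaging, a single 180° side reversal dominates the rms value even when the remaining locations are judged accurately, which is why a design with two of five locations at ±90° tends to inflate the rms error.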
In summary, the CI listeners' localization performance was much better with the triple-beam algorithm than with the single beam. The localization error with triple beam was comparable to that found with KEMAR.
V. EXPERIMENT IV: NUMEROSITY JUDGMENTS AND SOUND LOCALIZATION IN NH LISTENERS
A. Methods
Four NH listeners (ages = 21–23 yr) were given the same virtual sound-localization task as in experiment III. All four NH listeners had also participated in the masked SRT measurements of experiment II. Data for the NH listeners were collected at Boston University. The stimuli were presented via the same testing setup, circumaural headphones, and double-walled sound-attenuating booth as in experiment II. The research protocol was approved by the Institutional Review Board at Boston University. Informed consent was obtained from listeners before testing.
The procedure was the same as in experiment III. The one change in conditions was that the NH listeners were presented with the unprocessed and vocoded stimuli (both COR and UNC). Each condition combination was tested once in a block of trials with the order of conditions randomized. The NH listeners had 72 trials per block, since there were also three speech processing conditions. Therefore, NH listeners performed 720 trials (8 location configurations × 3 microphone conditions × 3 vocoder conditions × 10 blocks), which were completed in about 3 h per listener. All three stimulus types were presented at 65 dB SPL (equivalent to TMR = 0 in experiments I and II) before microphone filtering, and no level roving was used during the experiment.
B. Results and discussion
1. Numerosity judgments
For the one-source conditions (leftmost column of Fig. 7), the NH listeners consistently reported one source for both the NAT and COR conditions. They reported slightly fewer one-source responses for the UNC carrier but still did so on more than 80% of the single-source presentations. Also, the microphone condition and stimulus vocoding conditions apparently had little effect on the number of sources reported.
FIG. 7.
The leftmost column [(A)–(C)] shows group average numerosity judgments for one-source stimuli (i.e., percentage of trials in which a single source was perceived as one, two, or three sources) for the three microphone conditions for the NH listeners. The three right columns [(D)–(L)] show group average numerosity judgments for three-source conditions for different masker spatial configurations (maskers at 0°, ±30°, and ±90°) and microphone conditions (ordered in the same way as the single-source panels). The top row presents the results for unprocessed stimuli (NAT), the middle row presents results for vocoded stimuli with interaurally correlated noise carriers (COR), and the bottom row presents vocoded stimuli with interaurally uncorrelated noise carriers (UNC). Error bars show ±1 standard deviation.
For the three-source conditions (rightmost three columns of Fig. 7), the average percentage of responses across conditions was 11, 73, and 16% for one, two, and three sources, respectively. Therefore, for the NH listeners, two talkers was the most common response. There was only one exception when there were more three-source responses, which occurred for the KEMAR NAT ±90° condition. Also, the triple-beam NAT ±90° condition had a relatively large percentage of three-source responses (42%) compared to the other conditions. This result can be explained by the fact that the KEMAR NAT ±90° condition provided the largest spatial differences between talkers and thus should be the easiest in which to perceive three spatially distinct sources. In contrast, beam, which removes spatial cues altogether, resulted in fewer correct numerosity judgments than KEMAR and triple beam in the ±90° conditions. Similarly, COR and UNC, which degrade spatial cues, produced a worse correspondence between the number of actual and perceived sound sources when compared to the NAT condition. There are exceptions where UNC produced more three-source responses than NAT, especially in conditions with few or no spatial cues, such as 0° and beam. However, this could be because the uncorrelated carriers created a spatially diffuse sound image.
2. Sound localization
With respect to the NH listeners tested under the three stimulus processing conditions (NAT, COR, and UNC), the group average results are plotted in Fig. 8 in the same manner as the group average CI data in Fig. 6.
FIG. 8.
The top row of panels [(A)–(C)] shows the group average response angle as a function of source angle for the individual one-source responses from four NH listeners for the three stimulus types: unprocessed (NAT), vocoded with interaurally correlated noise carriers (COR), and vocoded with interaurally uncorrelated noise carriers (UNC). Accurate responses are indicated by the diagonal line (dashed). The bottom row [(D)–(F)] contains the average absolute error in the localization judgments (vertical distance from the diagonal). Error bars represent ±1 standard deviation of the group average. The rms error is displayed in text in the bottom row.
The upper row of Fig. 8 shows the group average sound-localization responses (ordinate) as a function of the single-source location (azimuth) for the NH listeners. A three-way RM ANOVA was conducted on the localization error results (bottom row) with factors of angle (five levels: 0, ±30°, ±90°), microphone (three levels: KEMAR, beam, and triple beam), and stimulus processing (three levels: NAT, COR, and UNC). There was a significant effect of angle [F(1.4,4.2) = 27.4; p < 0.005, ηp² = 0.90], as well as a significant effect of microphone [F(1.01,3.03) = 114.1; p < 0.005, ηp² = 0.97]. Post hoc tests showed that both KEMAR and triple beam differed from beam (p < 0.005 and p < 0.01, respectively), but there was no difference between KEMAR and triple beam (p > 0.05). The main effect of stimulus processing was not significant [F(2,6) = 3.48; p > 0.05, ηp² = 0.54]. The angle × microphone interaction was significant [F(2.1,6.2) = 33.2; p < 0.0001, ηp² = 0.92], which was a result of the relatively large response errors for the beam condition at ±90° compared to the other microphone conditions (Fig. 8). None of the other two- or three-way interactions were significant (p > 0.05 for all).
The localization results and errors for the NAT condition correspond well with the results of Kidd et al. (2020), who tested NH listeners under similar conditions. As noted in Kidd et al. (2020), NAT KEMAR errors can be attributed to the tendency to compress responses toward the mean value of a range of values. Furthermore, the use of non-individualized HRTF-convolved stimuli presented through headphones would be expected to cause larger rms errors than free-field presentation because of a poor match with a listener's actual interaural differences and limited source externalization (Middlebrooks, 1999; Best et al., 2020).
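The two error measures shown in Fig. 8 (average absolute error and rms error) can be computed from paired source and response angles. The following Python sketch is illustrative only; the function names and the response values are hypothetical, chosen to mimic responses compressed toward the center of the range, and are not data from this study.

```python
import math

def mean_absolute_error(source_deg, response_deg):
    """Average absolute distance (deg) between response and source angle."""
    errors = [abs(r - s) for s, r in zip(source_deg, response_deg)]
    return sum(errors) / len(errors)

def rms_error(source_deg, response_deg):
    """Root-mean-square localization error (deg)."""
    squared = [(r - s) ** 2 for s, r in zip(source_deg, response_deg)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical average responses, compressed toward 0 deg at the widest angles.
sources = [-90, -30, 0, 30, 90]
responses = [-60, -30, 0, 30, 60]

mae = mean_absolute_error(sources, responses)
rmse = rms_error(sources, responses)
```

Because the rms error squares each deviation, the two compressed responses at ±90° raise it more than they raise the average absolute error.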
VI. COMPARISON OF CI-NH PERFORMANCE
In terms of comparing the results from the CI and NH subjects, the NH vocoded conditions were the most appropriate to use for the comparison because the vocoder is intended as a type of CI simulation. Since there were no significant differences between the COR and UNC vocoder conditions, we used the UNC vocoder for the comparison.
A. SRTs
Figures 9(A)–9(C) (left column) compare SRTs from the CI listeners with those from the NH listeners tested under the UNC condition.³ A three-way mixed ANOVA with factors masker angle (three levels: 0, ±30°, ±90°), group (two levels: CI and NH), and microphone condition (three levels: KEMAR, beam, and triple beam) revealed that there was a main effect of group [F(1,14) = 7.46, p < 0.05, ηp² = 0.35], a significant masker angle × group interaction [F(2,28) = 4.28, p < 0.05, ηp² = 0.23], and a significant masker angle × group × microphone interaction [F(4,56) = 3.07, p < 0.05, ηp² = 0.18], but the group × microphone interaction was not significant (p > 0.05). While the pattern of performance across microphone conditions was similar between the groups, there were differences suggesting the unsurprising conclusion that vocoding is not a perfect simulation of real CI processing. The analysis also showed significant main effects of masker angle [F(2,28) = 319.7, p < 0.0001, ηp² = 0.96] and microphone [F(2,28) = 201.0, p < 0.0001, ηp² = 0.94], as well as a significant masker angle × microphone interaction [F(4,56) = 109.3, p < 0.0001, ηp² = 0.89].
FIG. 9.
A comparison of group average performance for CI listeners and NH listeners tested under the vocoded uncorrelated carrier (UNC) condition. The left column of (A)–(C) shows a comparison of SRTs from experiments I and II. The center column of (D)–(F) contains the percentage of trials in which the listeners reported one, two, or three sources when a single source was presented from experiments III and IV. The right column of (G)–(I) shows a comparison of the group average absolute localization error from experiments III and IV. The three different microphone conditions are depicted: KEMAR [(A), (D), and (G)], beam [(B), (E), and (H)], and triple beam [(C), (F), and (I)] in rows from top to bottom, respectively. Error bars represent ±1 standard deviation.
The comparison in Figs. 9(A)–9(C) provides some additional insight into the results described in experiments I and II. The similarity in performance across microphone conditions for both groups indicates that when the spectrotemporal and spatial cues are degraded, the ability to focus on and extract information from the target location and use ITDs for binaural unmasking also diminishes, perhaps to the point that little perceived spatial separation occurs, while the benefit of beamforming correspondingly increases, and spatial benefits are derived primarily by TMR improvements. This general trend is qualitatively similar to that found for listeners with sensorineural hearing loss, where the benefit of beamforming increases as natural binaural/spatial abilities decrease (Kidd et al., 2015).
B. Numerosity judgments
Figures 9(D)–9(F) (middle column) show that, based on the group averages, there was a qualitatively similar pattern in the numerosity judgments of the one-source presentations when comparing the responses of CI listeners to the responses of the NH listeners tested with vocoded stimuli using an interaurally uncorrelated carrier. However, quantitatively, there were some notable differences between groups in that the NH listeners reported more one-source responses and had fewer three-source responses than did the CI listeners. The CI listeners S11, S12, and S14 performed the task more like the NH listeners than did the CI listeners S2 and S13 (Fig. 5 compared to Fig. 7).
Across all microphone conditions, three talkers were reported more often for the CI listeners than for the NH listeners. This occurred despite the fact that all of the NH listeners had performed experiment II but only one CI listener had performed experiment I. This might indicate that the impression of split images is particularly strong for CI listeners (even more so than NH UNC) as opposed to indicating a superior ability to differentiate talkers.
C. Localization
It is interesting to note the high qualitative similarity of the patterns of localization error between the CI listeners and the NH listeners presented vocoded stimuli. This comparison is illustrated in Figs. 9(G)–9(I) (right column). Of the three measurements in which these groups were compared, the localization error appeared to yield mostly similar patterns of results, including the main effect of microphone condition.
VII. GENERAL DISCUSSION
The purpose of these experiments was to examine the potential benefits of a novel hybrid beamformer, triple beam (Fig. 1), relative to a single beam on the tasks of understanding masked speech and counting/localizing sound sources for bilateral CI listeners. The triple beam is intended to provide the benefits of beamforming and improved SNRs, while also providing ILD-based localization cues to facilitate perceived separation of sound sources. Although understanding speech in the presence of interferer sounds remains a challenging task for CI listeners, our findings support the proposition that beamforming algorithms can significantly mitigate a portion of these difficulties, at least in laboratory settings. Furthermore, the triple-beam approach provided encouraging indications that sound-source localization may be maintained for bilateral CI listeners, at least in some of the limited set of conditions that were tested.
The first main conclusion of this study is that CI listeners received substantial benefit from both beamformer algorithms tested here in speech-on-speech masking situations (Fig. 3). This conclusion was supported by the finding that, on average, SRTs were relatively high for the bilateral CI listeners under natural listening conditions (i.e., KEMAR HRTFs), while relatively low SRTs were found for both beamformer conditions. The fact that the KEMAR condition provided no improvement to SRTs for the bilateral CI listeners reinforces how little benefit CI users receive from natural spatial cues in speech unmasking scenarios, particularly with symmetrically placed maskers. Average SRM for the bilateral CI listeners with two symmetrical maskers was within the range of 0 to –2 dB. This result was consistent with studies in which the average SRM was also between +3 and –2 dB with symmetrically spaced speech maskers (Misurelli and Litovsky, 2012, 2015; Rana et al., 2017; Hu et al., 2018). CI listeners, however, do experience more SRM under asymmetrical masker configurations (e.g., van Hoesel and Tyler, 2003; Schleich et al., 2004; Loizou et al., 2009; Goupell et al., 2018), which is driven mostly by a monaural better-ear/acoustic head-shadow effect (Kayser et al., 2009; Mayo and Goupell, 2020). Because of head shadow, one of the ears receives a better TMR than the other, yielding an advantage over co-located presentation. In fact, many of these studies found that most of the SRM observed in their experiments was due to the head-shadow effect with minimal contribution from other binaural factors. For example, in Schleich et al. (2004), which used only one noise masker at +90° or –90°, there was a 6.8 dB head-shadow benefit and only a 0.9 dB binaural interaction (i.e., squelch or binaural unmasking) benefit [however, see recent work on disentangling the contributions of binaural benefits in Dieudonné and Francart (2019, 2020)]. 
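As an illustration of how SRM follows from SRTs (the co-located threshold minus the spatially separated threshold), a minimal Python sketch is given here. The function name and the SRT values are hypothetical and are not data from this study.

```python
def spatial_release_from_masking(srt_colocated_db, srt_separated_db):
    """SRM in dB; positive values mean that separation lowered (improved) the SRT."""
    return srt_colocated_db - srt_separated_db

# Hypothetical SRTs (dB) for one microphone condition, keyed by masker angle (deg).
hypothetical_srts = {0: 4.0, 30: 3.5, 90: 5.0}

# SRM for each separated configuration relative to the co-located (0 deg) case.
srm_by_angle = {
    angle: spatial_release_from_masking(hypothetical_srts[0], srt)
    for angle, srt in hypothetical_srts.items()
    if angle != 0
}
```

A negative SRM, as in the hypothetical ±90° case, means the threshold was higher with separated maskers than with co-located maskers.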
In the present study, the long-term average “better-ear” effect was limited because the maskers were located symmetrically around the listener. Short-term glimpsing of the target speech might still be possible (Brungart and Iyer, 2012; Best et al., 2017a), although glimpsing opportunities for the three sentences presented with the similar sentence structure could be less likely than more natural speech conditions. Furthermore, glimpsing appears to be greatly diminished in CI listeners (Hu et al., 2018). The poor performance for the KEMAR condition reinforces the dominant role that the TMR plays in CI speech understanding and the negligible role of binaural interaction in many situations (Baumgartel et al., 2015b; Dieudonné and Francart, 2019).
The possible mechanisms underlying the benefits obtained by the CI listeners are perhaps best understood by considering the NH results as the spatial cues are degraded from the unprocessed condition to the vocoded conditions (Figs. 4 and 9). Vocoding was used to diminish the availability of the spatial cues to the NH listeners. The NH listening conditions in which the stimuli are vocoded, particularly when using interaurally uncorrelated noise carriers, should also disrupt the relationship between the temporal fine structure of the waveforms reaching the two ears and theoretically provide the closest analog to CI listening that was tested here. Therefore, in moving from the NAT to the vocoded conditions, we expected that the spatial performance of the NH listeners would decrease, and the listening experience would resemble that of CI listeners more closely. This expectation was mostly confirmed by the correspondence between the NH-UNC and CI results with respect to SRTs [Figs. 9(A)–9(C)]. For both NH-UNC and CI listeners, the beamforming microphone conditions (beam and triple beam) resulted in significantly better SRTs than the KEMAR conditions (Figs. 3, 4, and 9). The similarity in performance also occurred for the numerosity judgments and localization performance [Figs. 9(D)–9(I)]. For both groups, there was more accurate single-source sound localization when listening through triple beam and KEMAR compared to the single beam. Judgments of numerosity, as defined and measured here, were on average similar between these groups; however, some individual CI listeners showed some notably different response patterns (Fig. 5). Finally, there was no significant difference between the COR and UNC conditions in experiments II and IV, so we cannot conclude that one makes for a better CI simulation.
The second main conclusion is that the amplification of off-axis masker sources by the side beams associated with triple beam resulted in diminished SRT improvements relative to beam in the speech-on-speech masking experiments for the CI listeners and NH listeners presented vocoded stimuli. Because triple beam mixes each side beam with the front beam for each ear, the level of the masking speech can increase in both ears for the symmetric masker configuration. Figure 1(D) shows that iSNR improves most rapidly for the single-beam condition as a function of masker angle; iSNR also improves for triple beam, but not as rapidly. These results are consistent with the prominent role that TMR plays in speech understanding in noise or competing speech for CI listeners and their diminished ability to use binaural cues. It should be noted that the triple-beam algorithm was designed to foster the perceptual segregation of sources and, therefore, was expected to potentially yield the largest advantages in performance. However, the SRTs were very well explained by changes in iSNR [Fig. 3(B)] for the CI listeners. The results of the numerosity judgments and the sound-localization performance (Figs. 5 and 8) did not provide convincing evidence that source segregation was strongly enhanced for these symmetrical masker conditions for the CI listeners and NH listeners presented vocoded stimuli.
It is important to note that only one configuration of the triple beam was tested in this study: side beams at ±40° with a 1:1 sound level ratio between the gain of the target beam and that of each side beam. However, the side beams could be oriented toward different angles and/or the sound level ratio of gain adjusted. Thus, it is possible that other configurations could provide better results than found here or that the selection of triple-beam parameters could be adapted to specific acoustic environments (Kidd et al., 2020).
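The mixing of a front beam with one side beam per ear can be sketched as follows. This is a minimal illustration and not the implementation used in the study: it assumes that the three beam outputs are already available as lists of samples, that the left and right side beams are routed to the left and right ears, respectively, and that mixing is a simple sum. The function names, the `side_gain_db` parameter, and the toy beam responses are hypothetical.

```python
import math

def triple_beam_mix(front, left_side, right_side, side_gain_db=0.0):
    """Mix the front beam with one side beam per ear.

    side_gain_db = 0 gives a 1:1 ratio between front- and side-beam gain.
    """
    g = 10.0 ** (side_gain_db / 20.0)
    left_ear = [f + g * s for f, s in zip(front, left_side)]
    right_ear = [f + g * s for f, s in zip(front, right_side)]
    return left_ear, right_ear

def rms(x):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def ild_db(left_ear, right_ear):
    """Interaural level difference in dB (positive means right ear is higher)."""
    return 20.0 * math.log10(rms(right_ear) / rms(left_ear))

# Toy source signal; the beam attenuations below are assumed, not measured.
s = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(1000)]

# Source on the right: strong in the right side beam, weak in the other beams.
left_ear, right_ear = triple_beam_mix(
    front=[0.1 * v for v in s],
    left_side=[0.1 * v for v in s],
    right_side=[1.0 * v for v in s],
)
ild_right_source = ild_db(left_ear, right_ear)

# Source in front: strong in the front beam, equally weak in both side beams.
left_ear, right_ear = triple_beam_mix(
    front=[1.0 * v for v in s],
    left_side=[0.1 * v for v in s],
    right_side=[0.1 * v for v in s],
)
ild_front_source = ild_db(left_ear, right_ear)
```

Under these assumptions, the off-axis source produces a large ILD while the frontal source produces none. Changing `side_gain_db` away from 0 corresponds to adjusting the level ratio between the target beam and the side beams.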
Additionally, although the triple-beam configuration did not perform as well as the single beam in speech-on-speech masking conditions for the bilateral CI listeners, it outperformed the single beam for the single-source sound-localization task, and performed similarly to the KEMAR condition (Figs. 5 and 9). In a real-world situation with multiple talkers, binaural cues are not only important for unmasking speech but also for locating talkers. Identifying off-axis target locations can be a challenge for a single beam, and this problem is diminished for the triple beam. The way a beamformer is steered (e.g., head turns or eye gaze) can be crucial in determining success in communication and for being able to rapidly and accurately follow source transitions (e.g., turn taking in conversation) (Best et al., 2017b; Roverud et al., 2018). From the combination of the results of single-source sound-localization and numerosity tests performed in these laboratory conditions, however, it cannot currently be determined whether triple beam provides improved localization benefits for situations with multiple simultaneous sound sources. If a localization benefit can be found in those situations, triple beam, in conjunction with the eye-gaze steering of a VGHA (as described in the Introduction), potentially could improve the ability of bilateral CI listeners to follow target source transitions while retaining some sense of spatial awareness/localization of sources surrounding the target source. Best et al. (2017c) demonstrated benefits of eye-gaze steering of a hybrid beamformer for both NH and hearing-impaired listeners in a dynamic speech task. Similar benefits for eye-gaze steering of triple beam may also occur for CI users because better localization results in a larger benefit of audio-visual speech understanding in noise (van Hoesel, 2015).
While we have shown advantages for certain situations, such as improved speech understanding for a target in front of the listener and for single-source sound localization, there are other situations that could be considered. For example, speech understanding for targets that are not in front is also a potential advantage of the triple-beam approach, as it intends to provide acoustic information in multiple directions. Therefore, different target directions and whether the listener has a priori knowledge of the listening direction are important future directions for this type of research (Best et al., 2017b; Roverud et al., 2018). It was particularly challenging for CI listeners and NH listeners presented vocoded speech to identify the number of talkers and their locations in situations with more than one source (see Figs. 5 and 7). This caused some difficulties with interpreting the sound-localization data (e.g., Weller et al., 2016) and determining whether the triple beam provided additional spatial-hearing benefits beyond TMR benefits. Therefore, another important direction for future research on the benefits of the triple-beam approach would be to more thoroughly investigate perceptual source segregation and sound-localization improvements in CI users with multiple sound sources. If perceptual segregation can be improved, additional benefits beyond TMR benefits may be realized with the triple-beam approach.
Finally, it is possible to extrapolate these results to broader populations. Hearing-impaired listeners do not demonstrate as much benefit from beamforming algorithms as CI listeners (Dietz and McAlpine, 2015), but moderately hearing-impaired and NH listeners benefit similarly from beamformers (Völker et al., 2015). The NH-NAT results [Figs. 4(A) and 4(D)] suggest that the triple beam provides similar or superior SRT advantages compared to KEMAR and the single-beam conditions. This benefit should occur for any acoustic-hearing listener, where there is some access to ITDs in low-frequency temporal fine structure and the perceived separation between sound sources is better than for CI listeners.
VIII. CONCLUSIONS
Bilateral CI listeners demonstrated relatively poor SRTs when only natural binaural cues were available under speech-on-speech masking conditions with symmetrically placed maskers. A single beamformer improved SRTs for spatially separated sources. A hybrid beamforming algorithm, triple beam, also was shown to improve SRTs. These benefits from triple beam were slightly less than those from the single beam (Fig. 3), and the data were best described by TMR improvements. The triple-beam algorithm, however, significantly enhanced single-source sound-localization performance, unlike the single beam (Figs. 5 and 9).
NH listeners readily achieved negative SRTs through the KEMAR HRTFs (Fig. 4). SRTs substantially increased with vocoding, and there was a relatively large SRT improvement for the single beam and triple beam compared to KEMAR for spatially separated sources. The results from NH listeners in vocoded conditions revealed a pattern of results similar to the bilateral CI listeners (Fig. 9). These results suggested that, as the NH listeners' spatial-hearing abilities and access to temporal-fine-structure cues were reduced to a level that more closely approximated that of a CI listener, the need for beamforming (and exaggerated ILDs associated with triple beam) became more important.
In conclusion, both the single beam and triple beam produced substantial benefits for speech-on-speech masking in bilateral CI listeners, while triple beam alone provided decent sound-localization abilities for single sound sources, albeit at some cost to optimal speech-on-speech masking. Targets and maskers were from a matrix sentence corpus, which has the potential for large amounts of confusability and informational masking, and contributed to the relatively large beamforming benefits reported here. Because of the limited range of conditions tested here for triple beam (e.g., only one type of masker, one side beam angular orientation, and one gain setting), there remains considerable room for further investigation of the merits and optimization of this approach.
ACKNOWLEDGMENTS
We would like to thank Danielle Zukerman and Ginny Alexander for help collecting and analyzing data for this project and Paul Mayo for technical help setting up the experiments. Thanks to Kristina Milvae, Anna Tinnemore, Danielle Zukerman, and Angelo Molina for helpful comments on a previous version of this paper. Christine Mason provided valuable feedback on the project as well as assistance with data collection, presentation, and analysis. Research reported in this publication was supported by the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under Grant Nos. R01 DC014948 (M.J.G.) and R01 DC013286 (G.K.) and by Air Force Office of Scientific Research (AFOSR) Grant No. FA9550-16-1-0372 (G.K.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the United States Air Force.
Portions of this work were presented in “Benefits from different types of acoustic beamforming in bilateral cochlear-implant listener,” 177th Meeting of the Acoustical Society of America, Louisville, KY, USA, May 2019.
Footnotes
1. Beam is used in this paper to refer to a specific beamforming microphone condition used in this study as well as in studies by Kidd et al. (2013, 2015). It does not refer to the BEAM® technology of Cochlear Ltd.
2. There was one exception, a NH listener who first completed tests consisting of one track and was then tested using three interleaved tracks, resulting in data for four tracks total per condition.
3. Because there was no significant difference between the COR and UNC carrier conditions, either would have produced a similar comparison with the CI data.
References
- 1. Adiloglu, K., Kayser, H., Baumgartel, R. M., Rennebeck, S., Dietz, M., and Hohmann, V. (2015). “A binaural steering beamformer system for enhancing a moving speech source,” Trends Hear. 19, 1–13. doi:10.1177/2331216515618903
- 2. ANSI (2017). ANSI/ASA S3.5-1997. Methods for Calculation of the Speech Intelligibility Index (American National Standards Institute, New York).
- 3. Arbogast, T. L., Mason, C. R., and Kidd, G., Jr. (2002). “The effect of spatial separation on informational and energetic masking of speech,” J. Acoust. Soc. Am. 112, 2086–2098. doi:10.1121/1.1510141
- 4. Aronoff, J. M., Yoon, Y. S., Freed, D. J., Vermiglio, A. J., Pal, I., and Soli, S. D. (2010). “The use of interaural time and level difference cues by bilateral cochlear implant users,” J. Acoust. Soc. Am. 127, EL87–EL92. doi:10.1121/1.3298451
- 5. Aroudi, A., and Doclo, S. (2020). “Cognitive-driven binaural beamforming using EEG-based auditory attention decoding,” IEEE/ACM Trans. Audio Speech Lang. Process. 28, 862–875. doi:10.1109/TASLP.2020.2969779
- 6. Ausili, S. A., Agterberg, M. J. H., Engel, A., Voelter, C., Thomas, J. P., Brill, S., Snik, A. F. M., Dazert, S., Van Opstal, A. J., and Mylanus, E. A. M. (2020). “Spatial hearing by bilateral cochlear implant users with temporal fine-structure processing,” Front. Neurol. 11, article 915, 1–13. doi:10.3389/fneur.2020.00915
- 7. Baltzell, L. S., Swaminathan, J., Cho, A. Y., Lavandier, M., and Best, V. (2020). “Binaural sensitivity and release from speech-on-speech masking in listeners with and without hearing loss,” J. Acoust. Soc. Am. 147, 1546–1561. doi:10.1121/10.0000812
- 9. Baumgartel, R. M., Hu, H., Krawczyk-Becker, M., Marquardt, D., Herzke, T., Coleman, G., Adiloglu, K., Bomke, K., Plotz, K., Gerkmann, T., Doclo, S., Kollmeier, B., Hohmann, V., and Dietz, M. (2015a). “Comparing binaural pre-processing strategies II: Speech intelligibility of bilateral cochlear implant users,” Trends Hear. 19, 1–18. doi:10.1177/2331216515617917
- 10. Baumgartel, R. M., Krawczyk-Becker, M., Marquardt, D., Völker, C., Hu, H., Herzke, T., Coleman, G., Adiloglu, K., Ernst, S. M., Gerkmann, T., Doclo, S., Kollmeier, B., Hohmann, V., and Dietz, M. (2015b). “Comparing binaural pre-processing strategies I: Instrumental evaluation,” Trends Hear. 19, 1–16. doi:10.1177/2331216515617916
- 8. Baumgärtel, R. M., Hu, H., Kollmeier, B., and Dietz, M. (2017). “Extent of lateralization at large interaural time differences in simulated electric hearing and bilateral cochlear implant users,” J. Acoust. Soc. Am. 141, 2338. doi:10.1121/1.4979114
- 11. Bernstein, J. G. W., Goupell, M. J., Schuchman, G., Rivera, A., and Brungart, D. S. (2016). “Having two ears facilitates the perceptual separation of concurrent talkers for bilateral and single-sided deaf cochlear implantees,” Ear Hear. 37, 289–302. doi:10.1097/AUD.0000000000000284
- 12. Best, V., Baumgartner, R., Lavandier, M., Majdak, P., and Kopco, N. (2020). “Sound externalization: A review of recent research,” Trends Hear. 24, 1–14. doi:10.1177/2331216520948390
- 13. Best, V., Mason, C. R., Swaminathan, J., Roverud, E., and Kidd, G., Jr. (2017a). “Use of a glimpsing model to understand the performance of listeners with and without hearing loss in spatialized speech mixtures,” J. Acoust. Soc. Am. 141, 81–91. doi:10.1121/1.4973620
- 14. Best, V., Roverud, E., Mason, C. R., and Kidd, G., Jr. (2017b). “Examination of a hybrid beamformer that preserves auditory spatial cues,” J. Acoust. Soc. Am. 142, EL369–EL374. doi:10.1121/1.5007279
- 15. Best, V., Roverud, E., Streeter, T., Mason, C. R., and Kidd, G., Jr. (2017c). “The benefit of a visually guided beamformer in a dynamic speech task,” Trends Hear. 21, 1–11. doi:10.1177/2331216517722304
- 16. Bronkhorst, A. W., and Plomp, R. (1988). “The effect of head-induced interaural time and level differences on speech intelligibility in noise,” J. Acoust. Soc. Am. 83, 1508–1516. doi:10.1121/1.395906
- 17. Brown, C. A. (2014). “Binaural enhancement for bilateral cochlear implant users,” Ear Hear. 35, 580–584. doi:10.1097/AUD.0000000000000044
- 18. Brungart, D. S. (2001). “Informational and energetic masking effects in the perception of two simultaneous talkers,” J. Acoust. Soc. Am. 109, 1101–1109. doi:10.1121/1.1345696
- 19. Brungart, D. S., Chang, P. S., Simpson, B. D., and Wang, D. (2006). “Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation,” J. Acoust. Soc. Am. 120, 4007–4018. doi:10.1121/1.2363929
- 20. Brungart, D. S., and Iyer, N. (2012). “Better-ear glimpsing efficiency with symmetrically-placed interfering talkers,” J. Acoust. Soc. Am. 132, 2545–2556. doi:10.1121/1.4747005
- 21. Buss, E., Pillsbury, H. C., Buchman, C. A., Pillsbury, C. H., Clark, M. S., Haynes, D. S., Labadie, R. F., Amberg, S., Roland, P. S., Kruger, P., Novak, M. A., Wirth, J. A., Black, J. M., Peters, R., Lake, J., Wackym, P. A., Firszt, J. B., Wilson, B. S., Lawson, D. T., Schatzer, R., D'Haese, P. S., and Barco, A. L. (2008). “Multicenter U.S. bilateral MED-EL cochlear implantation study: Speech perception over the first year of use,” Ear Hear. 29, 20–32. doi:10.1097/AUD.0b013e31815d7467
- 22. Chung, K., and Zeng, F. G. (2009). “Using hearing aid adaptive directional microphones to enhance cochlear implant performance,” Hear. Res. 250, 27–37. doi:10.1016/j.heares.2009.01.005
- 23. Chung, K., Zeng, F. G., and Acker, K. N. (2006). “Effects of directional microphone and adaptive multichannel noise reduction algorithm on cochlear implant performance,” J. Acoust. Soc. Am. 120, 2216–2227. doi:10.1121/1.2258500
- 24. Churchill, T., Kan, A., Goupell, M. J., and Litovsky, R. Y. (2014). “Spatial hearing benefits demonstrated with presentation of acoustic temporal fine structure cues in bilateral cochlear implant listeners,” J. Acoust. Soc. Am. 136, 1246–1256. doi:10.1121/1.4892764
- 25. Desloge, J. G., Rabinowitz, W. M., and Zurek, P. M. (1997). “Microphone-array hearing aids with binaural output. I. Fixed-processing systems,” IEEE Trans. Speech Audio Process. 5, 529–542. doi:10.1109/89.641298
- 26. Dietz, M., and McAlpine, D. (2015). “Advancing binaural cochlear implant technology,” Trends Hear. 19, 1–4. doi:10.1177/2331216515623374
- 27. Dieudonné, B., and Francart, T. (2018). “Head shadow enhancement with low-frequency beamforming improves sound localization and speech perception for simulated bimodal listeners,” Hear. Res. 363, 78–84. doi:10.1016/j.heares.2018.03.007
- 28. Dieudonné, B., and Francart, T. (2019). “Redundant information is sometimes more beneficial than spatial information to understand speech in noise,” Ear Hear. 40, 545–554. doi:10.1097/AUD.0000000000000660
- 29. Dieudonné, B., and Francart, T. (2020). “Speech understanding with bimodal stimulation is determined by monaural signal to noise ratios: No binaural cue processing involved,” Ear Hear. 41, 1158–1171. doi:10.1097/AUD.0000000000000834
- 30. Dorman, M. F., Natale, S., and Loiselle, L. (2018). “Speech understanding and sound source localization by cochlear implant listeners using a pinna-effect imitating microphone and an adaptive beamformer,” J. Am. Acad. Audiol. 29, 197–205. doi:10.3766/jaaa.16126
- 31. Dubno, J. R., Ahlstrom, J. B., and Horwitz, A. R. (2002). “Spectral contributions to the benefit from spatial separation of speech and noise,” J. Speech Lang. Hear. Res. 45, 1297–1310. doi:10.1044/1092-4388(2002/104)
- 32. Egger, K., Majdak, P., and Laback, B. (2016). “Channel interaction and current level affect across-electrode integration of interaural time differences in bilateral cochlear-implant listeners,” J. Assoc. Res. Otolaryngol. 17, 55–67. doi:10.1007/s10162-015-0542-8
- 33. Favrot, S. , Mason, C. R. , Streeter, T. , Desloge, J. , and Kidd, G. J. (2013). “ Performance of a highly directional microphone array in a reverberant environment,” in Proceedings of the International. Conference on Acoustics, June 7, Montreal, Canada, pp. 1–8. [Google Scholar]
- 34. Fitzgerald, M. B. , Kan, A. , and Goupell, M. J. (2015). “ Bilateral loudness balancing and distorted spatial perception in recipients of bilateral cochlear implants,” Ear Hear. 36, e225–e236. 10.1097/AUD.0000000000000174 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Fuglsang, S. A. , Dau, T. , and Hjortkjær, J. (2017). “ Noise-robust cortical tracking of attended speech in real-world acoustic scenes,” Neuroimage 156, 435–444. 10.1016/j.neuroimage.2017.04.026 [DOI] [PubMed] [Google Scholar]
- 36. Garadat, S. N. , Litovsky, R. Y. , Yu, G. , and Zeng, F. G. (2009). “ Role of binaural hearing in speech intelligibility and spatial release from masking using vocoded speech,” J. Acoust. Soc. Am. 126, 2522–2535. 10.1121/1.3238242 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Goupell, M. J., Kan, A., and Litovsky, R. Y. (2013). “Typical mapping procedures can produce non-centered auditory images in bilateral cochlear-implant users,” J. Acoust. Soc. Am. 133, EL101–EL107. 10.1121/1.4776772
- 38. Goupell, M. J., Kan, A., and Litovsky, R. Y. (2016). “Spatial attention in bilateral cochlear-implant users,” J. Acoust. Soc. Am. 140, 1652–1662. 10.1121/1.4962378
- 39. Goupell, M. J., Laback, B., Majdak, P., and Baumgartner, W.-D. (2008). “Current-level discrimination and spectral profile analysis in multi-channel electrical stimulation,” J. Acoust. Soc. Am. 124, 3142–3157. 10.1121/1.2981638
- 40. Goupell, M. J., and Stakhovskaya, O. A. (2018). “Across-frequency processing of interaural time and level differences in perceived lateralization,” Acta Acust. United Acust. 104, 758–761. 10.3813/AAA.919217
- 41. Goupell, M. J., Stakhovskaya, O. A., and Bernstein, J. G. W. (2018). “Contralateral interference caused by binaurally presented competing speech in adult bilateral cochlear-implant users,” Ear Hear. 39, 110–123. 10.1097/AUD.0000000000000470
- 42. Grantham, D. W., Ashmead, D. H., Ricketts, T. A., Haynes, D. S., and Labadie, R. F. (2008). “Interaural time and level difference thresholds for acoustically presented signals in post-lingually deafened adults fitted with bilateral cochlear implants using CIS+ processing,” Ear Hear. 29, 33–44. 10.1097/AUD.0b013e31815d636f
- 43. Greenberg, J. E., Desloge, J. G., and Zurek, P. M. (2003). “Evaluation of array-processing algorithms for a headband hearing aid,” J. Acoust. Soc. Am. 113, 1646–1657. 10.1121/1.1536624
- 44. Greenberg, J. E., Peterson, P. M., and Zurek, P. M. (1993). “Intelligibility-weighted measures of speech-to-interference ratio and speech system performance,” J. Acoust. Soc. Am. 94, 3009–3010. 10.1121/1.407334
- 45. Hendrikse, M. M. E., Llorach, G., Hohmann, V., and Grimm, G. (2019). “Movement and gaze behavior in virtual audiovisual listening environments resembling everyday life,” Trends Hear. 23, 1–29. 10.1177/2331216519872362
- 46. Hládek, L., Porr, B., Naylor, G., Lunner, T., and Owen Brimijoin, W. (2019). “On the interaction of head and gaze control with acoustic beam width of a simulated beamformer in a two-talker scenario,” Trends Hear. 23, 1–12. 10.1177/2331216519876795
- 47. Hu, H., and Dietz, M. (2015). “Comparison of interaural electrode pairing methods for bilateral cochlear implants,” Trends Hear. 19, 1–22. 10.1177/2331216515617143
- 48. Hu, H., Dietz, M., Williges, B., and Ewert, S. D. (2018). “Better-ear glimpsing with symmetrically-placed interferers in bilateral cochlear implant users,” J. Acoust. Soc. Am. 143, 2128–2141. 10.1121/1.5030918
- 49. Jennings, T., and Kidd, G. (2018). “A visually guided beamformer to aid listening in complex acoustic environments,” Proc. Mtgs. Acoust. 33, 050005.
- 50. Jones, H., Kan, A., and Litovsky, R. Y. (2014). “Comparing sound localization deficits in bilateral cochlear-implant users and vocoder simulations with normal-hearing listeners,” Trends Hear. 18, 1–16. 10.1177/2331216514554574
- 51. Kan, A., Goupell, M. J., and Litovsky, R. Y. (2019). “Effect of channel separation and interaural mismatch on fusion and lateralization in normal-hearing and cochlear-implant listeners,” J. Acoust. Soc. Am. 146, 1448–1463. 10.1121/1.5123464
- 52. Kan, A., and Litovsky, R. Y. (2015). “Binaural hearing with electrical stimulation,” Hear. Res. 322, 127–137. 10.1016/j.heares.2014.08.005
- 53. Kayser, H., Ewert, S. D., Anemüller, J., Rohdenburg, T., Hohmann, V., and Kollmeier, B. (2009). “Database of multichannel in-ear and behind-the-ear head-related and binaural room impulse responses,” EURASIP J. Adv. Signal Process. 2009, 1–10. 10.1155/2009/298605
- 54. Kelvasa, D., and Dietz, M. (2015). “Auditory model-based sound direction estimation with bilateral cochlear implants,” Trends Hear. 19, 1–16. 10.1177/2331216515616378
- 55. Kerber, S., and Seeber, B. U. (2012). “Sound localization in noise by normal-hearing listeners and cochlear implant users,” Ear Hear. 33, 445–457. 10.1097/AUD.0b013e318257607b
- 56. Kidd, G., Jr. (2017). “Enhancing auditory selective attention using a visually guided hearing aid,” J. Speech Lang. Hear. Res. 60, 3027–3038. 10.1044/2017_JSLHR-H-17-0071
- 57. Kidd, G., Jr., Best, V., and Mason, C. R. (2008). “Listening to every other word: Examining the strength of linkage variables in forming streams of speech,” J. Acoust. Soc. Am. 124, 3793–3802. 10.1121/1.2998980
- 58. Kidd, G., Jr., and Colburn, H. S. (2017). “Informational masking in speech recognition,” in The Auditory System at the Cocktail Party, edited by J. C. Middlebrooks, J. Z. Simon, A. N. Popper, and R. R. Fay (Springer International Publishing, Cham, Switzerland), pp. 75–110.
- 59. Kidd, G., Jr., Favrot, S., Desloge, J. G., Streeter, T. M., and Mason, C. R. (2013). “Design and preliminary testing of a visually guided hearing aid,” J. Acoust. Soc. Am. 133, EL202–EL207. 10.1121/1.4791710
- 60. Kidd, G., Jr., Jennings, T. R., and Byrne, A. J. (2020). “Enhancing the perceptual segregation and localization of sound sources with a triple beamformer,” J. Acoust. Soc. Am. 148, 3598–3611. 10.1121/10.0002779
- 61. Kidd, G., Jr., Mason, C. R., Best, V., Roverud, E., Swaminathan, J., Jennings, T., Clayton, K., and Colburn, H. S. (2019). “Determining the energetic and informational components of speech-on-speech masking in listeners with sensorineural hearing loss,” J. Acoust. Soc. Am. 145, 440. 10.1121/1.5087555
- 62. Kidd, G., Jr., Mason, C. R., Best, V., and Swaminathan, J. (2015). “Benefits of acoustic beamforming for solving the cocktail party problem,” Trends Hear. 19, 1–15. 10.1177/2331216515593385
- 63. Kidd, G., Jr., Mason, C. R., Swaminathan, J., Roverud, E., Clayton, K. K., and Best, V. (2016). “Determining the energetic and informational components of speech-on-speech masking,” J. Acoust. Soc. Am. 140, 132–144. 10.1121/1.4954748
- 64. Litovsky, R. Y., Goupell, M. J., Godar, S., Grieco-Calub, T., Jones, G. L., Garadat, S. N., Agrawal, S., Kan, A., Todd, A., Hess, C., and Misurelli, S. (2012). “Studies on bilateral cochlear implants at the University of Wisconsin's Binaural Hearing and Speech Laboratory,” J. Am. Acad. Audiol. 23, 476–494. 10.3766/jaaa.23.6.9
- 65. Litovsky, R. Y., Goupell, M. J., Misurelli, S. M., and Kan, A. (2017). “Hearing with cochlear implants and hearing aids in complex auditory scenes,” in The Auditory System at the Cocktail Party, edited by J. C. Middlebrooks, J. Z. Simon, A. N. Popper, and R. R. Fay (Springer International Publishing, Cham, Switzerland), pp. 261–291.
- 66. Loizou, P. C. (2006). “Speech processing in vocoder-centric cochlear implants,” in Cochlear and Brainstem Implants, edited by A. Moller (Karger, Basel, Switzerland), pp. 109–143.
- 67. Loizou, P. C., Hu, Y., Litovsky, R., Yu, G., Peters, R., Lake, J., and Roland, P. (2009). “Speech recognition by bilateral cochlear implant users in a cocktail-party setting,” J. Acoust. Soc. Am. 125, 372–383. 10.1121/1.3036175
- 68. Mayo, P. G., and Goupell, M. J. (2020). “Acoustic factors affecting interaural level differences for cochlear-implant users,” J. Acoust. Soc. Am. 147, EL357–EL362. 10.1121/10.0001088
- 69. Middlebrooks, J. C. (1999). “Individual differences in external-ear transfer functions reduced by scaling in frequency,” J. Acoust. Soc. Am. 106, 1480–1492. 10.1121/1.427176
- 70. Middlebrooks, J. C., and Simon, J. Z. (2017). The Auditory System at the Cocktail Party (Springer, Cham, Switzerland).
- 71. Misurelli, S. M., and Litovsky, R. Y. (2012). “Spatial release from masking in children with normal hearing and with bilateral cochlear implants: Effect of interferer asymmetry,” J. Acoust. Soc. Am. 132, 380–391. 10.1121/1.4725760
- 72. Misurelli, S. M., and Litovsky, R. Y. (2015). “Spatial release from masking in children with bilateral cochlear implants and with normal hearing: Effect of target-interferer similarity,” J. Acoust. Soc. Am. 138, 319–331. 10.1121/1.4922777
- 73. Rana, B., Buchholz, J. M., Morgan, C., Sharma, M., Weller, T., Konganda, S. A., Shirai, K., and Kawano, A. (2017). “Bilateral versus unilateral cochlear implantation in adult listeners: Speech-on-speech masking and multitalker localization,” Trends Hear. 21, 1–15. 10.1177/2331216517722106
- 74. Reiss, L. A. J., Fowler, J. R., Hartling, C. L., and Oh, Y. (2018). “Binaural pitch fusion in bilateral cochlear implant users,” Ear Hear. 39, 390–397. 10.1097/AUD.0000000000000497
- 75. Ricketts, T., and Henry, P. (2002). “Evaluation of an adaptive, directional-microphone hearing aid,” Int. J. Audiol. 41, 100–112. 10.3109/14992020209090400
- 76. Roverud, E., Best, V., Mason, C. R., Streeter, T., and Kidd, G., Jr. (2018). “Evaluating the performance of a visually guided hearing aid using a dynamic auditory-visual word congruence task,” Ear Hear. 39, 756–769. 10.1097/AUD.0000000000000532
- 77. Schleich, P., Nopp, P., and D'Haese, P. (2004). “Head shadow, squelch, and summation effects in bilateral users of the MED-EL COMBI 40/40+ cochlear implant,” Ear Hear. 25, 197–204. 10.1097/01.AUD.0000130792.43315.97
- 78. Seeber, B. U., and Fastl, H. (2008). “Localization cues with bilateral cochlear implants,” J. Acoust. Soc. Am. 123, 1030–1042. 10.1121/1.2821965
- 79. Shannon, R. V., Zeng, F. G., Kamath, V., Wygonski, J., and Ekelid, M. (1995). “Speech recognition with primarily temporal cues,” Science 270, 303–304. 10.1126/science.270.5234.303
- 80. Spriet, A., Van Deun, L., Eftaxiadis, K., Laneau, J., Moonen, M., van Dijk, B., van Wieringen, A., and Wouters, J. (2007). “Speech understanding in background noise with the two-microphone adaptive beamformer BEAM in the Nucleus Freedom Cochlear Implant System,” Ear Hear. 28, 62–72. 10.1097/01.aud.0000252470.54246.54
- 81. Stadler, R. W., and Rabinowitz, W. M. (1993). “On the potential of fixed arrays for hearing aids,” J. Acoust. Soc. Am. 94, 1332–1342. 10.1121/1.408161
- 82. Stakhovskaya, O. A., and Goupell, M. J. (2017). “Lateralization of interaural level differences with multiple electrode stimulation in bilateral cochlear-implant listeners,” Ear Hear. 38, e22–e38. 10.1097/AUD.0000000000000360
- 83. Swaminathan, J., Mason, C. R., Streeter, T. M., Best, V., Roverud, E., and Kidd, G., Jr. (2016). “Role of binaural temporal fine structure and envelope cues in cocktail-party listening,” J. Neurosci. 36, 8250–8257. 10.1523/JNEUROSCI.4421-15.2016
- 84. Valente, M., Fabry, D. A., and Potts, L. G. (1995). “Recognition of speech in noise with hearing aids using dual microphones,” J. Am. Acad. Audiol. 6, 440–449.
- 85. Valente, M., Mispagel, K. M., Tchorz, J., and Fabry, D. (2006). “Effect of type of noise and loudspeaker array on the performance of omnidirectional and directional microphones,” J. Am. Acad. Audiol. 17, 398–412. 10.3766/jaaa.17.6.3
- 86. Valente, M., Schuchman, G., Potts, L. G., and Beck, L. B. (2000). “Performance of dual-microphone in-the-ear hearing aids,” J. Am. Acad. Audiol. 11, 181–189.
- 87. van Hoesel, R. J. (2015). “Audio-visual speech intelligibility benefits with bilateral cochlear implants when talker location varies,” J. Assoc. Res. Otolaryngol. 16, 309–315. 10.1007/s10162-014-0503-7
- 88. van Hoesel, R. J. M., Böhm, M., Pesch, J., Vandali, A., Battmer, R. D., and Lenarz, T. (2008). “Binaural speech unmasking and localization in noise with bilateral cochlear implants using envelope and fine-timing based strategies,” J. Acoust. Soc. Am. 123, 2249–2263. 10.1121/1.2875229
- 89. van Hoesel, R. J. M., and Tyler, R. S. (2003). “Speech perception, localization, and lateralization with bilateral cochlear implants,” J. Acoust. Soc. Am. 113, 1617–1630. 10.1121/1.1539520
- 90. Völker, C., Warzybok, A., and Ernst, S. M. (2015). “Comparing binaural pre-processing strategies III: Speech intelligibility of normal-hearing and hearing-impaired listeners,” Trends Hear. 19, 1–18. 10.1177/2331216515618609
- 91. Weller, T., Best, V., Buchholz, J. M., and Young, T. (2016). “A method for assessing auditory spatial analysis in reverberant multitalker environments,” J. Am. Acad. Audiol. 27, 601–611. 10.3766/jaaa.15109
- 92. Wightman, F. L., and Kistler, D. J. (1992). “The dominant role of low-frequency interaural time differences in sound localization,” J. Acoust. Soc. Am. 91, 1648–1661. 10.1121/1.402445
- 93. Williges, B., Jurgens, T., Hu, H., and Dietz, M. (2018). “Coherent coding of enhanced interaural cues improves sound localization in noise with bilateral cochlear implants,” Trends Hear. 22, 1–18. 10.1177/2331216518781746
- 94. Winn, M. B., and Litovsky, R. Y. (2015). “Using speech sounds to test functional spectral resolution in listeners with cochlear implants,” J. Acoust. Soc. Am. 137, 1430–1442. 10.1121/1.4908308
- 95. Zirn, S., Arndt, S., Aschendorff, A., Laszig, R., and Wesarg, T. (2016). “Perception of interaural phase differences with envelope and fine structure coding strategies in bilateral cochlear implant users,” Trends Hear. 20, 1–12. 10.1177/2331216516665608









