Speech recognition as a function of the number of channels for Mid-Scala electrode array recipients

Katelyn A Berg; Jack H Noble; Benoit M Dawant; Robert T Dwyer; Robert F Labadie; René H Gifford

doi:10.1121/10.0012163

2022 Jul 1;152(1):67–79.


PMCID: PMC9984239 PMID: 35931512

Abstract

This study investigated the number of channels needed for maximum speech understanding and sound quality in 15 adult cochlear implant (CI) recipients with Advanced Bionics (AB) Mid-Scala electrode arrays completely within scala tympani. In experiment I, CI programs used a continuous interleaved sampling (CIS)-based strategy and 4–16 active electrodes. In experiment II, CI programs used an n-of-m strategy featuring 16 active electrodes with either 8- or 12-maxima. Speech understanding and sound quality measures were assessed. For CIS programs, participants demonstrated performance gains using up to 4–10 electrodes on speech measures and sound quality ratings. For n-of-m programs, there was no significant effect of maxima, suggesting that 8-maxima is sufficient for this sample's maximum performance and sound quality. These results are largely consistent with previous studies using straight electrode arrays [e.g., Fishman, Shannon, and Slattery (1997). J. Speech Lang. Hear. Res. 40, 1201–1215; Friesen, Shannon, Baskent, and Wang (2001). J. Acoust. Soc. Am. 110, 1150–1163; Shannon, Cruz, and Galvin (2011). Audiol. Neurotol. 16, 113–123; Berg, Noble, Dawant, Dwyer, Labadie, and Gifford (2020). J. Acoust. Soc. Am. 147, 3646–3656] and in contrast with recent studies examining Cochlear precurved electrode arrays [e.g., Croghan, Duran, and Smith (2017). J. Acoust. Soc. Am. 142, EL537–EL543; Berg, Noble, Dawant, Dwyer, Labadie, and Gifford (2019b). J. Acoust. Soc. Am. 145, 1556–1564], which found continuous improvements up to 16 independent channels. These findings suggest that Mid-Scala electrode array recipients demonstrate channel independence similar to that of straight electrode array recipients rather than that of recipients of another manufacturer's precurved electrode arrays.

I. INTRODUCTION

For individuals with moderate sloping to profound sensorineural hearing loss, cochlear implants (CIs) enable auditory and speech perception by directly stimulating the auditory nerve through an electrode array surgically placed into the cochlea. CIs convey frequency tonotopically, using different electrode contacts along the array to mimic the tonotopic organization of an unimpaired cochlea: in general, lower frequencies are assigned to more apical electrodes and higher frequencies to more basal electrodes, although the number of available electrode contacts is manufacturer dependent. Substantial overlap in the electrical fields of adjacent electrode contacts is termed channel interaction (e.g., Shannon, 1983). Channel interaction is unavoidable in intracochlear stimulation because the electrode contacts rest in a highly conductive fluid and are relatively far from the spiral ganglia in the modiolus; this interaction contributes to poor spectral resolution (e.g., Jones et al., 2013; Won et al., 2014).

Greater channel independence is inferred when performance improves with an increasing number of active electrodes (Friesen et al., 2001). However, the original studies of channel independence in adult CI recipients showed speech recognition reaching asymptote with 4–8 channels, suggesting that recipients could not take advantage of additional channels. For example, a group of 11 postlingually deafened adults with Cochlear Nucleus N22 straight electrode arrays (Sydney, Australia) demonstrated asymptotic speech recognition with 5 channels for consonants, 4 channels for topic-related sentences, and 8 channels for vowels and monosyllables (Fishman et al., 1997). Similarly, a group of ten Cochlear Nucleus N22 recipients and nine Advanced Bionics (AB) Clarion device recipients (Valencia, CA) demonstrated asymptotic speech recognition in quiet and in noise with eight channels for vowels and consonants and marginally significant improvements between 7 and 10 channels for monosyllables and sentences (Friesen et al., 2001). Later, Shannon et al. (2011) replicated this seminal report in a group of seven AB CII straight electrode array recipients, finding no further gains beyond eight channels for any speech recognition measure (vowels, consonants, monosyllables, and sentences) or for subjective sound quality.

Since these initial studies, the field has seen several significant changes, including (a) expansion of CI criteria with many recipients now having significant residual hearing before and after surgery and potentially better underlying neural populations to stimulate (e.g., Carlson et al., 2011; Leigh et al., 2016; Holder et al., 2018); (b) improved surgical techniques with less traumatic electrode array insertion using slimmer, more flexible electrode arrays (e.g., Wanna et al., 2014); (c) advances in signal processing strategies (e.g., Skinner et al., 2002; Riss et al., 2014; Reynolds and Gifford, 2019); (d) improvement in postoperative computerized tomography (CT) imaging coupled with advancement in image processing, allowing specification of postoperative electrode array position for scalar location and distance to the modiolus (e.g., Noble et al., 2012); and (e) recognition that good CI performers often demonstrate ceiling effects on speech recognition measures in quiet (e.g., Gifford et al., 2008; Dunn et al., 2020).

In light of these surgical, technological, and methodological advancements, several studies have reexamined the long-standing contention that no more than 8–10 independent channels are needed to maximize CI recipients' performance. First, a group of ten adult CI recipients with Cochlear devices, nine with precurved electrode arrays and one with a straight electrode array, demonstrated significantly higher speech recognition scores with 22 versus 12 active electrodes (Croghan et al., 2017). Importantly, however, Croghan et al. (2017) used a closed-set matrix sentence test rather than open-set speech recognition materials; thus, participants selected responses from trained stimuli rather than recognizing novel speech. Second, a group of eight adult CI recipients with Cochlear devices and precurved electrode arrays demonstrated significant improvements in sentence recognition in quiet and in speech recognition thresholds in noise [dB signal-to-noise ratio (SNR)] with 20 versus 8 active electrodes (Schvartz-Leyzac et al., 2017). Importantly, both of these studies used 8-maxima in an n-of-m strategy irrespective of the number of active electrodes. This distinction matters because stimulation was limited to the 8 channels with the greatest spectral energy within each stimulation cycle, even with 20 electrodes active across the array (i.e., 8-of-20). Further, neither study verified electrode array location via imaging for recipients with precurved arrays, an electrode design associated with scala tympani-scala vestibuli (ST-SV) translocation in up to 42% of cases (Wanna et al., 2014). Translocated electrode arrays are known to cause more trauma to intracochlear structures during insertion and often lead to below-average audiologic outcomes, including lower speech recognition scores and hearing preservation rates (e.g., Finley et al., 2008; O'Connell et al., 2016; Shaul et al., 2018).
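To make the n-of-m distinction concrete, the sketch below selects, for a single stimulation cycle, the n channels with the largest envelope values and silences the rest (e.g., 8-of-20). It is a minimal illustration under stated assumptions, not any manufacturer's implementation: per-channel energies are assumed to be already computed, the function names are hypothetical, and ties are broken toward the lower channel index.

```python
from __future__ import annotations

from typing import Sequence


def select_maxima(channel_energies: Sequence[float], n_maxima: int) -> list[int]:
    """Return the indices of the n_maxima channels with the largest energy.

    Hypothetical helper. Python's sort is stable (also with reverse=True),
    so equal energies are resolved toward the lower channel index.
    Indices are returned in ascending channel order.
    """
    if not 0 < n_maxima <= len(channel_energies):
        raise ValueError("n_maxima must be between 1 and the number of channels")
    ranked = sorted(range(len(channel_energies)),
                    key=lambda i: channel_energies[i], reverse=True)
    return sorted(ranked[:n_maxima])


def n_of_m_cycle(channel_energies: Sequence[float], n_maxima: int) -> list[float]:
    """Keep the selected maxima for this cycle and set every other channel to 0."""
    keep = set(select_maxima(channel_energies, n_maxima))
    return [e if i in keep else 0.0 for i, e in enumerate(channel_energies)]


if __name__ == "__main__":
    # 20 active channels, 8 maxima per cycle (an "8-of-20" configuration).
    energies = [0.1 * ((7 * i) % 20) for i in range(20)]
    print(select_maxima(energies, 8))  # [2, 5, 8, 11, 14, 16, 17, 19]
```

Only 8 of the 20 channels carry stimulation in any one cycle, which is why adding active electrodes under a fixed 8-maxima setting is not equivalent to stimulating more channels per cycle.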

In three recent studies, our group revisited auditory perception and sound quality as a function of the number of channels, taking electrode type and imaging-verified electrode placement into account. First, we investigated a group of 11 Cochlear recipients with a precurved electrode array verified to be entirely within scala tympani (ST; Berg et al., 2019a). Participants demonstrated significant performance improvements with 22 active electrodes and 16-maxima using an n-of-m strategy (i.e., 16-of-22) versus programs with 4–10 active electrodes using the CIS strategy (Berg et al., 2019b). In the CIS conditions, we observed significantly higher performance with 16 versus 8 active electrodes for monosyllable recognition and for sentence recognition in noise at +5 dB SNR. We hypothesized that this improvement results from the smaller electrode-to-modiolus distances afforded by precurved electrode arrays, because electrodes closer to the modiolus require less charge to reach upper stimulation levels (e.g., Saunders et al., 2002; Cohen, 2009; Davis et al., 2016). Electrode-to-modiolus distance and charge are essential factors because lower charge has been shown to yield less channel interaction (e.g., Chatterjee and Shannon, 1998). Studies have also shown a significant relationship between smaller electrode-to-modiolus distances and both better spectral resolution and higher word recognition scores (Berg et al., 2019b; Chakravorti et al., 2019; Perkins et al., 2021).

In a second study with 18 Cochlear recipients, we investigated the effect of electrode placement on the number of channels required for optimal speech recognition. Imaging verified that 11 participants had straight electrode arrays completely within ST and that 7 participants had precurved electrode arrays that were translocated at least partially into scala vestibuli (Berg et al., 2020). Both the recipients with straight electrode arrays in ST and those with translocated precurved electrode arrays demonstrated asymptotic speech recognition scores with 8–10 active electrodes, consistent with previous literature (e.g., Fishman et al., 1997; Friesen et al., 2001; Shannon et al., 2011) and in contrast to our work with precurved electrode arrays in ST. Given these results, we hypothesized that straight and translocated precurved electrode arrays afford less channel independence because lateral wall and/or scala vestibuli (SV) placement is farther from the target auditory nerve endings in the modiolus.

In a third study, we investigated ten postlingually deafened adult CI recipients, verified by imaging to have arrays entirely in ST, with either the MED-EL FLEX28 or STANDARD electrode array (Innsbruck, Austria); these are the longest straight electrode arrays (28–31.5 mm) and feature the largest interelectrode distances (2.1–2.4 mm; Berg et al., 2021). At the group level, these recipients did not demonstrate statistically significant improvements in speech recognition beyond eight channels. However, nine of the ten participants showed clinically significant improvements (≥10%; Bierer et al., 2016) on at least one measure with more than eight channels. Participants also showed noticeable improvements in sound quality ratings with up to 12 channels at the group level. These findings suggest that a greater interelectrode distance could partially compensate for the adverse effects on channel independence typically associated with straight arrays, which feature larger electrode-to-modiolus distances than precurved arrays.

In our previous investigation of channel independence in Cochlear precurved electrode array recipients, we systematically increased the number of channels and maxima to understand their effects on speech recognition (Berg et al., 2019a). The results showed performance gains with up to 16 channels using a CIS strategy. We also observed significant speech recognition improvements with 16- versus 8-maxima with 22 channels using an n-of-m strategy. From these results, we hypothesized that increasing the number of spectral peaks selected in each stimulation cycle could provide additional usable spectral information, which could help explain why the performance gains were greatest on highly spectrally dependent tasks (i.e., speech in noise). Therefore, in the current study, we also investigated the effect of increasing the number of maxima on performance in recipients of AB precurved (Mid-Scala) electrode arrays to allow comparison with the results of our previous investigation with Cochlear precurved electrode arrays.

Across manufacturers, there are two broad categories of commercially available electrode designs: straight electrode arrays that follow the lateral wall and precurved electrode arrays that are designed to sit close to the modiolus (Dhanasingh and Jolly, 2017). The AB Mid-Scala electrode array is precurved but is designed to be less tightly wrapped around the modiolus than other precurved arrays and to rest in the middle of ST, with a targeted insertion depth of 420 deg (Boyle, 2016). The Mid-Scala electrode array is 18.5 mm long in total, has an active length of 15 mm, and features 0.975-mm spacing between electrode contacts, measured from the midpoint of one contact to the midpoint of its neighbor. Importantly, no prior studies have systematically investigated channel independence in a group of Mid-Scala recipients.

The current study was needed to determine whether Mid-Scala electrode array recipients are afforded greater channel independence than straight electrode array recipients because of the array's precurved design (Croghan et al., 2017; Schvartz-Leyzac et al., 2017; Berg et al., 2019a) or whether mid-ST placement limits Mid-Scala recipients to 8–10 channels, as seen in previous studies with straight electrode array recipients (Fishman et al., 1997; Friesen et al., 2001; Shannon et al., 2011; Berg et al., 2020, 2021). Thus, the goals of this study were to investigate the number of channels (experiment I) and maxima (experiment II) available to AB recipients with a Mid-Scala electrode array placed entirely within ST.

For experiment I, we hypothesized that AB Mid-Scala recipients would perform similarly to CI recipients with Cochlear precurved electrode arrays in ST because of their close mean electrode-to-modiolus distance and, therefore, might demonstrate more than the 8–10 independent channels reported with previous-generation devices. In addition, we hypothesized that AB Mid-Scala recipients would reach performance asymptotes with fewer than 16 independent channels because the Mid-Scala electrode array is designed to sit in the middle of ST, less tightly wrapped around the modiolus than Cochlear precurved electrode arrays. For experiment II, we hypothesized that AB Mid-Scala recipients would perform similarly to recipients with Cochlear precurved electrode arrays in ST because of their close mean electrode-to-modiolus distance, which could allow them to obtain additional performance gains with more than 8-maxima. As in experiment I, we also hypothesized that AB Mid-Scala recipients in experiment II would reach performance asymptotes with fewer than 16-maxima because the Mid-Scala electrode array is not designed to rest as close to the modiolus as Cochlear precurved electrode arrays.

II. EXPERIMENT I

A. Study participants

All 15 participants were postlingually deafened adult CI recipients with AB CI systems: nine had 90K Advantage CI systems, four had Ultra CI systems, and two had Ultra 3D CI systems. All six Ultra and Ultra 3D recipients had the Version 1 device that was recalled by AB in March 2020; however, all of the study participants showed stable clinical performance in the months leading up to study participation. Therefore, we believe that the problems associated with Version 1 failures (i.e., declines in speech recognition, poor aided detection thresholds, and absent electrically evoked compound action potentials) did not affect the study results. Ultra participants 12 and 14 have since undergone revision surgery and been reimplanted with Version 2 devices; however, signs of failure with the Version 1 devices used in this study did not begin until at least one year after participation. Additionally, following revision with their Version 2 devices, both participants returned to within test-retest variability of their peak speech recognition performance with their Version 1 devices but did not exceed it. All of the participants had all 16 electrodes within ST, as confirmed by postoperative CT scans and image analysis (e.g., Noble et al., 2012, 2014). Electrode-to-modiolus distance was calculated for each electrode contact during imaging analyses, and the mean across contacts ($\bar{m}$) was computed for each participant. Inclusion criteria required at least 6 months of CI experience, use of the full electric bandwidth (i.e., no acoustic stimulation), and at least 13 active electrodes in the clinical program. Table I provides demographic information for all of the participants.

TABLE I.

The participant demographics, including age in years, biological sex, ear tested, and preoperative and postoperative pure tone average (PTA, in dB HL) for 250, 500, 1000, 2000, and 4000 Hz, are displayed in the top half of the table. PTA data for participant 9 were unavailable because this participant was implanted and followed clinically at an outside center. The pulse duration (μs) in the clinical program, channel stimulation rate (pps), number of active electrodes in the clinical program, mean electrode-to-modiolus distance ($\bar{m}$) in millimeters (mm), and CI experience in months are displayed in the bottom half of the table. Electrodes deactivated in the clinical program are listed in parentheses. The $\bar{m}$ values include all 16 electrode contacts. Asterisks (“*”) in the “Participant” column denote participants who completed experiments I and II; all of the other participants completed experiment I only. N/A, not applicable.

Participant demographics
Participant	Age in years	Sex	Ear tested	Preoperative PTA (dB HL)	Postoperative PTA (dB HL)
1	42	M	L	99	102
2	69	M	R	65	114
3	39	F	L	81	105
4	76	M	L	78	109
5	63	F	R	98	104
6	57	M	L	65	116
7	42	F	L	89	92
8	77	M	L	106	116
9*	46	F	R	N/A	N/A
10*	80	M	R	91	107
11*	55	F	R	69	93
12*	33	F	R	76	78
13*	80	M	L	86	117
14*	66	M	L	100	115
15*	26	F	L	83	114
Mean	56.7 years	7 F/8 M	9 L/6 R	85 dB HL	106 dB HL
CI device demographics
Participant	Clinical pulse duration (μs)	Channel stimulation rate (pps)	Number of active electrodes in clinical program (deactivated electrodes)	Mean electrode-to-modiolus distance ($\bar{m}$, mm)	CI experience in months
1	41.3	1614	16	0.43	38
2	41.3	1614	14 (15 and 16)	0.58	29
3	30.5	2184	14 (15 and 16)	0.32	45
4	18.9	3535	16	0.46	54
5	25.1	2652	16	0.47	16
6	19.8	3375	16	0.57	10
7	25.1	2652	16	0.93	34
8	40.4	1650	16	0.29	11
9*	44.9	1485	16	0.51	46
10*	23.3	2855	16	0.45	6
11*	18.9	3535	14 (15 and 16)	0.58	17
12*	43.1	1547	13 (14–16)	0.59	17
13*	23.3	2855	14 (15 and 16)	0.35	44
14*	20.7	3228	16	0.51	6
15*	23.3	2855	15 (16)	0.22	41
Mean	29.3 μs	2509 pps	N/A	0.48 mm	27.5 months


B. Methods

Experiments I and II were conducted under Institutional Review Board (IRB)-approved protocols at Vanderbilt University and Vanderbilt University Medical Center. To establish baseline performance, all of the participants were tested with their clinical program settings with 16 active electrodes and the Optima-S speech coding strategy prior to beginning experimentation; Optima-S was the clinical default strategy used by all of the participants at the time of enrollment. Most current CI processing strategies are based on continuous interleaved sampling (CIS; Wilson, 1991), which extracts the temporal envelope from the output of the bandpass filter associated with each electrode and then uses that channel-specific envelope to modulate a pulse train. Because the stimulation rate is held constant across channels in CIS-based strategies, frequency-specific information is conveyed via the place of stimulation. HiRes is the version of high-rate CIS available with AB's current devices. AB also offers newer speech coding strategies that use current steering to manipulate spectral coding (Fidelity-120 and Optima). For all three strategies (HiRes, Fidelity-120, and Optima), AB offers the option to choose between sequential (S) and paired (P) stimulation. Optima-S is the current default clinical strategy because of demonstrated performance improvements (Reynolds and Gifford, 2019). Figure 1 displays the specific electrodes activated to create the spatially selective programs.
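The CIS principle described above (envelope extraction per channel, followed by non-overlapping pulses at a fixed per-channel rate) can be sketched as follows. This is a simplified illustration, not AB's HiRes implementation: band-pass filtering is assumed to have been done upstream, the envelope detector (full-wave rectification plus a one-pole low-pass filter with an assumed 200-Hz cutoff) is a generic choice, and all function names and parameter values are hypothetical.

```python
import math
from typing import List, Tuple


def envelope(band_signal: List[float], fs: float, cutoff_hz: float = 200.0) -> List[float]:
    """Full-wave rectify one band-passed channel and smooth it with a one-pole low-pass."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for x in band_signal:
        state += alpha * (abs(x) - state)
        env.append(state)
    return env


def interleaved_pulses(envelopes: List[List[float]], fs: float, rate_pps: float,
                       phase_us: float) -> List[Tuple[float, int, float]]:
    """Return (time_s, channel, amplitude) triples for a CIS-style pulse schedule.

    Every channel is pulsed at the same rate (rate_pps). Within each frame the
    channels fire one after another, offset by one biphasic pulse (two phases),
    so no two pulses overlap in time.
    """
    n_ch = len(envelopes)
    n_samples = len(envelopes[0])
    period = 1.0 / rate_pps
    pulse_s = 2.0 * phase_us * 1e-6  # biphasic pulse = two phases
    if n_ch * pulse_s > period:
        raise ValueError("pulses cannot be interleaved at this rate and pulse duration")
    n_frames = int(n_samples * rate_pps / fs)
    pulses = []
    for k in range(n_frames):
        for ch in range(n_ch):
            t = k * period + ch * pulse_s
            idx = min(int(t * fs), n_samples - 1)  # envelope sample at pulse onset
            pulses.append((t, ch, envelopes[ch][idx]))
    return pulses


if __name__ == "__main__":
    fs = 16000.0
    # Four hypothetical band-passed channels: 0.1-s tones at arbitrary frequencies.
    bands = [[math.sin(2 * math.pi * f * n / fs) for n in range(1600)]
             for f in (500.0, 1000.0, 2000.0, 4000.0)]
    envs = [envelope(b, fs) for b in bands]
    pulses = interleaved_pulses(envs, fs, rate_pps=1000.0, phase_us=25.0)
    print(len(pulses))  # 4 channels x 100 frames = 400
```

The per-channel rate is identical for every channel, so only the place of stimulation (which channel fires) distinguishes frequency regions, as the paragraph above describes.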

FIG. 1. — The channel deactivation methods and associated frequency allocations for all of the conditions. In cases for which the participant had an electrode(s) deactivated in their clinical program, we chose to activate the closest available electrode to maintain the greatest spatial separation between activated electrodes. For example, if electrode 15 elicited a nonauditory percept (e.g., facial stimulation), electrode 16 would be activated instead for the 8-, 10-, and 12-channel conditions.

Study programs were created using the BEPS+ (Bionic Ear Programming System plus) software provided by AB (2014). All of the study programs used the HiRes speech coding strategy. Each participant's pulse duration, taken from their clinical program, was held constant across all of the study programs (shown in Table I). The spatially selective programs were based on the deactivation methods of Friesen et al. (2001); however, this was not a direct replication because Friesen et al. maintained the participants' everyday frequency allocation map across all of the conditions. Instead, the present study, as well as our previous study investigating musical sound quality as a function of the number of channels (Berg et al., 2019b), automatically reallocated the frequency map based on the number of active electrodes, as would occur in a clinical manipulation. In other words, although the input frequency range was held constant across the experimental conditions (238–8054 Hz), reducing the number of active electrodes meant that each active electrode was allocated a broader frequency band than in the clinical frequency allocation.
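The reallocation described above can be illustrated by dividing the fixed 238–8054 Hz input range among the active electrodes. This sketch assumes logarithmically spaced, contiguous bands purely for illustration; AB's actual frequency tables are not reproduced here, and the function names are hypothetical. What it shows is that the same input range spread over fewer channels yields wider bands per channel.

```python
import math


def log_band_edges(n_channels: int, f_low: float = 238.0, f_high: float = 8054.0) -> list:
    """Return n_channels + 1 edges (Hz) splitting [f_low, f_high] into log-spaced bands.

    Assumption: log spacing is an illustrative stand-in for the real frequency table.
    """
    if n_channels < 1:
        raise ValueError("need at least one channel")
    ratio = f_high / f_low
    return [f_low * ratio ** (k / n_channels) for k in range(n_channels + 1)]


def octaves_per_band(n_channels: int, f_low: float = 238.0, f_high: float = 8054.0) -> float:
    """Width of each band in octaves under the same log-spacing assumption."""
    return math.log2(f_high / f_low) / n_channels


if __name__ == "__main__":
    for n in (4, 8, 10, 12, 16):
        edges = log_band_edges(n)
        print(n, [round(f) for f in edges[:3]], round(octaves_per_band(n), 2))
```

The full range spans about 5.08 octaves, so under this assumption each band is roughly 1.27 octaves wide with 4 channels but roughly 0.32 octaves with 16.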

All 15 participants used the same research Harmony sound processor during study participation, which was required for BEPS+ compatibility. The T-mics on all research Harmony sound processors were verified with a listening check by the research assistant prior to the start of each study session and with CI-aided sound field audiometric thresholds in the range of 20–30 dB hearing level (HL) from 250 through 6000 Hz. Electrodes deactivated in a participant's clinical program remained deactivated during the study. Specifically, participant 15 had one electrode (E16) deactivated clinically, participants 2, 3, 11, and 13 had two electrodes (E15 and E16) deactivated clinically, and participant 12 had three electrodes (E14–E16) deactivated clinically (see Table I). Upper stimulation levels were globally adjusted to achieve equal loudness across all of the study programs using a loudness scaling chart. Loudness ratings were completed in “live speech mode,” and all of the study programs were compared with the participant's clinical program. Threshold levels were not adjusted from the participant's clinical program; however, as described above, CI-aided detection thresholds were verified prior to experimentation.

The order of channel conditions and assessment measures was counterbalanced. All of the testing was completed acutely; that is, no acclimatization period was provided for the different channel conditions, which is consistent with past studies (e.g., Fishman et al., 1997; Friesen et al., 2001; Shannon et al., 2011; Croghan et al., 2017; Schvartz-Leyzac et al., 2017). Testing was partially blinded: participants did not know which program they were listening with, but the same research assistant programmed and tested all of the participants and was aware of which program each participant was using during testing.
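The text does not specify how condition order was counterbalanced. One common approach, shown here only as an illustrative assumption, is a balanced Latin square (Williams design), in which each condition appears equally often in each serial position and each condition immediately precedes every other condition equally often. The labels below are the five channel conditions; the function name is hypothetical.

```python
from typing import List, Sequence


def balanced_latin_square(conditions: Sequence) -> List[list]:
    """Williams-design orders, one row per presentation order.

    The first row follows the index pattern 0, 1, n-1, 2, n-2, ...; each later
    row adds 1 (mod n). For an odd number of conditions, the mirrored rows are
    appended so that every ordered pair of neighbors occurs equally often.
    """
    n = len(conditions)
    first = [0]
    for i in range(1, n):
        first.append((i + 1) // 2 if i % 2 == 1 else n - i // 2)
    rows = [[(x + r) % n for x in first] for r in range(n)]
    if n % 2 == 1:
        rows += [row[::-1] for row in rows]
    return [[conditions[i] for i in row] for row in rows]


if __name__ == "__main__":
    # Number of active channels in each experiment I condition.
    for order in balanced_latin_square([4, 8, 10, 12, 16]):
        print(order)
```

With five conditions the design needs ten orders (the square plus its mirror image); with an even number of conditions a single square suffices.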

The five experimental CI programs and the participant's clinical program were tested with a loudspeaker at 0-deg azimuth, 1 m from the seated participant, in a single-walled sound booth using Consonant-Nucleus-Consonant (CNC) words, AzBio sentences in quiet and in noise at +5 dB SNR (20-talker babble), and vowels (closed set). For each channel condition, participants completed one 50-word list of CNC words and two 20-sentence lists of AzBio sentences, one in quiet and one at +5 dB SNR. Speech recognition lists were randomized and were not repeated for any given participant. Subjective sound quality judgments were made on a visually presented ten-point scale (1 = very poor; 10 = very good), on which the participant rated the overall sound quality of the CNC word list, the AzBio sentences in quiet, and the AzBio sentences at +5 dB SNR for each condition. Each sound quality rating was obtained immediately following the corresponding speech recognition measure; for example, participants completed a list of CNC words and then judged the sound quality of that list. Participants were allowed to rate the sound quality between two numbers; for example, if a participant judged the sound quality to be between a “3” and a “4,” a rating of 3.5 was recorded. Vowel stimuli consisted of 13 synthetic vowels in /bVt/ format (“bait, Bart, bat, beet, Bert, bet, bit, bite, boat, boot, bought, bout, but”). Vowel duration was held constant across tokens (90 ms) so that vowel length could not serve as a cue. Target stimuli were presented at a calibrated level of 60 dB sound pressure level (SPL).

C. Results

A series of univariate general linear models (GLMs) was run with the number of channels as a fixed factor, mean electrode-to-modiolus distance ($\bar{m}$) as a random factor, and speech/auditory perception scores and sound quality ratings as the dependent variables. To minimize the influence of floor and ceiling effects, CNC and AzBio sentence recognition scores were converted from percent correct to rationalized arcsine units (RAU; Studebaker, 1985) prior to all of the analyses. To mitigate inter-rater variability, sound quality ratings for CNC words and AzBio sentences in quiet and in noise were converted to z-scores prior to analysis in Statistical Package for the Social Sciences (SPSS) version 27 using the standard formula $z = (x - \mu)/\sigma$, where $x$ is the individual's score, $\mu$ is the group mean, and $\sigma$ is the group standard deviation. Post hoc comparisons used a Holm-Sidak statistic to adjust for multiple comparisons. For all of the Pearson correlations, a bias adjustment was applied, and 95% confidence intervals are reported. A summary of the results is displayed in Table II and described in further detail below.
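The two score transformations named above can be written out as follows. The RAU function implements Studebaker's (1985) arcsine transform for X items correct out of N; the z-score follows the formula given in the text. Assumptions: scores are entered as counts (e.g., words correct out of 50 for a CNC list), the group standard deviation is taken as the sample standard deviation (n − 1), which is assumed rather than confirmed to match SPSS, and the function names and example ratings are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def rau(correct: int, total: int) -> float:
    """Rationalized arcsine units (Studebaker, 1985) for `correct` of `total` items."""
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))  # radians
    return (146.0 / math.pi) * theta - 23.0


def z_scores(ratings: Sequence[float]) -> List[float]:
    """z = (x - mean) / SD, using the sample SD (n - 1) as an assumption."""
    mu = statistics.mean(ratings)
    sigma = statistics.stdev(ratings)
    return [(x - mu) / sigma for x in ratings]


if __name__ == "__main__":
    print(round(rau(25, 50), 2))  # 50% correct maps to 50 RAU
    print(rau(50, 50) > 100.0)    # a perfect score is stretched above 100 RAU
    # Hypothetical sound quality ratings on the 1-10 scale.
    print([round(z, 2) for z in z_scores([3.0, 6.5, 7.0, 7.5, 8.0])])
```

RAU values stay close to percent correct in the middle of the range but stretch the extremes (below 0 and above 100), which is what reduces the compression of scores near floor and ceiling.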

TABLE II.

The GLM results for speech recognition performance and sound quality ratings for the channel conditions. Post hoc comparisons used a Holm-Sidak statistic to adjust for multiple comparisons. Degrees of freedom (d.f.).

Channels
Test	d.f.	F ratio	p	Post hoc (p < 0.05)
CNC words RAU	4, 56	49.31	<0.001	4 < 8,10,12,16
CNC sound quality z-score	4, 57	24.43	<0.001
AzBio quiet RAU	4, 57	36.64	<0.001	4 < 8,10,12,16
AzBio quiet sound quality z-score	4, 57	16.98	<0.001
AzBio noise RAU	4, 52	19.00	<0.001	4 < 8,10,12,16
AzBio noise sound quality z-score	4, 40	7.47	<0.001
Vowels percent correct	4, 55	22.65	<0.001	4 < 10,12,16
$\bar{m}$
Test	d.f.	F ratio	p	Post hoc (p < 0.05)
CNC words RAU	13, 52	11.54	<0.001
CNC sound quality z-score	13, 52	9.21	<0.001
AzBio quiet RAU	13, 52	27.45	<0.001
AzBio quiet sound quality z-score	13, 52	10.35	<0.001
AzBio noise RAU	13, 52	22.18	<0.001
AzBio noise sound quality z-score	13, 52	17.46	<0.001
Vowels percent correct	13, 52	6.86	<0.001
Channels × $\bar{m}$
Test	d.f.	F ratio	p	Post hoc (p < 0.05)
CNC words RAU	52, 5	0.48	0.918
CNC sound quality z-score	52, 5	0.30	0.989
AzBio quiet RAU	52, 5	0.39	0.961
AzBio quiet sound quality z-score	52, 5	0.27	0.995
AzBio noise RAU	52, 5	1.00	0.571
AzBio noise sound quality z-score	52, 5	0.08	0.999
Vowels percent correct	52, 5	0.75	0.736


Mean CNC word, AzBio sentence (in quiet and in noise), and vowel recognition scores are displayed in percent correct in Fig. 2. Mean sound quality ratings for CNC words and AzBio sentences in quiet and in noise are shown in Fig. 3. Error bars represent ±1 standard error of the mean (SEM). The horizontal lines across the 16-channel condition represent the mean scores for the participants with their clinical program (16 channels, Optima-S speech coding strategy). The clinical program condition was included for comparison purposes only and, thus, was not included in the statistical analyses.

FIG. 2. — The individual speech recognition outcomes in percent correct for the 15 participants across all of the channel conditions for CNC words (A), AzBio sentences in quiet (B) and in noise (C), and vowels (D). The mean for each channel condition is bold and error bars are ±1 SEM. The horizontal dotted lines across each measure represent mean scores for the 15 participants with their clinical program using 16 channels and the Optima-S speech coding strategy.

FIG. 3. — The individual sound quality ratings for the 15 participants across all of the channel conditions for CNC words (A), AzBio sentences in quiet (B), and AzBio sentences in noise (C). The mean for each channel condition is bold and error bars are ±1 SEM. The horizontal dotted lines across each measure represent mean scores for the 15 participants with their clinical program using 16 channels and the Optima-S speech coding strategy.

D. CNC word recognition and sound quality

The mean CNC word recognition was 25%, 60%, 70%, 71%, and 69% for the 4-, 8-, 10-, 12-, and 16-channel conditions, respectively. For CNC word recognition, there was a significant main effect of the number of channels [F_(4,56) = 49.31, p < 0.001, $η_{p}^{2}$ = 0.78] and a significant main effect of $\bar{m}$ [F_(13,52) = 11.54, p < 0.001, $η_{p}^{2}$ = 0.74] but no significant interaction between the number of channels and $\bar{m}$ [F_(52,5) = 0.48, p = 0.918, $η_{p}^{2}$ = 0.83]. For the main effect of the number of channels, post hoc pair-wise comparisons revealed significant performance differences between 4 and 8 channels (p = 0.017), 4 and 10 channels (p = 0.005), 4 and 12 channels (p = 0.004), and 4 and 16 channels (p = 0.005). No other pair-wise comparisons were statistically significant. For the main effect of $\bar{m}$, a post hoc Pearson correlation revealed no significant relationship between CNC word recognition and $\bar{m}$ (r = 0.02, p = 0.874, 95% CI [−0.21, 0.24]).
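The partial eta squared values reported with each F test can be recovered from the F ratio and its degrees of freedom via the identity $η_{p}^{2} = (F \cdot df_{effect}) / (F \cdot df_{effect} + df_{error})$. The short check below applies it to the three CNC word recognition tests reported in the paragraph above; the function name is hypothetical, and small discrepancies for other tests would be expected from rounding of the published F ratios.

```python
def partial_eta_squared(f_ratio: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


if __name__ == "__main__":
    # F ratios and d.f. reported for CNC word recognition.
    tests = {
        "channels": (49.31, 4, 56),
        "m_bar": (11.54, 13, 52),
        "channels x m_bar": (0.48, 52, 5),
    }
    for name, (f, df1, df2) in tests.items():
        print(name, round(partial_eta_squared(f, df1, df2), 2))
    # channels 0.78, m_bar 0.74, channels x m_bar 0.83 (matching the text)
```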

The mean CNC sound quality ratings were 2.5, 6.1, 6.9, 6.8, and 6.8 for the 4-, 8-, 10-, 12-, and 16-channel conditions, respectively. For CNC sound quality ratings, there was a significant main effect of the number of channels [F_(4,57) = 24.43, p < 0.001, $η_{p}^{2}$ = 0.63] and a significant main effect of $\bar{m}$ [F_(13,52) = 9.21, p < 0.001, $η_{p}^{2}$ = 0.70] but no significant interaction between the number of channels and $\bar{m}$ [F_(52,5) = 0.30, p = 0.989, $η_{p}^{2}$ = 0.76]. For the main effect of the number of channels, post hoc pair-wise comparisons revealed no significant differences in ratings beyond 4 channels (p > 0.05 for all comparisons). For the main effect of $\bar{m}$, a post hoc Pearson correlation revealed no significant relationship between CNC sound quality ratings and $\bar{m}$ (r = −0.04, p = 0.746, 95% CI [−0.26, 0.19]).

E. AzBio sentence recognition in quiet and sound quality

The mean AzBio sentence recognition in quiet was 41%, 72%, 74%, 76%, and 78% for the 4-, 8-, 10-, 12-, and 16-channel conditions, respectively. For AzBio sentence recognition in quiet, there was a significant main effect of the number of channels [F_(4,57) = 36.64, p < 0.001, $η_{p}^{2}$ = 0.72] and a significant main effect of $\bar{m}$ [F_(13,52) = 27.45, p < 0.001, $η_{p}^{2}$ = 0.87] but no significant interaction between the number of channels and $\bar{m}$ [F_(52,5) = 0.40, p = 0.961, $η_{p}^{2}$ = 0.80]. For the main effect of the number of channels, post hoc pair-wise comparisons revealed significant performance differences between 4 and 8 channels (p = 0.032), 4 and 10 channels (p = 0.019), 4 and 12 channels (p = 0.016), and 4 and 16 channels (p = 0.011). No other pair-wise comparisons were statistically significant. For the main effect of $\bar{m}$, a post hoc Pearson correlation revealed no significant relationship between AzBio sentence recognition in quiet and $\bar{m}$ (r = 0.13, p = 0.271, 95% CI [−0.10, 0.35]).
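The Holm-Sidak adjustment used for the post hoc comparisons is a step-down procedure: the m p-values are sorted from smallest to largest, the i-th smallest is adjusted as 1 − (1 − p)^(m − i + 1), and adjusted values are forced to be non-decreasing. The sketch below is a generic version of that procedure with a hypothetical function name; the exact family of comparisons and the software implementation used in the study are not described here, and the p-values in the usage line are made up.

```python
from typing import List, Sequence


def holm_sidak(p_values: Sequence[float]) -> List[float]:
    """Holm-Sidak step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        adj = 1.0 - (1.0 - p_values[i]) ** (m - rank)
        running_max = max(running_max, adj)  # enforce monotonicity
        adjusted[i] = min(running_max, 1.0)
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for a family of 4 comparisons.
    print([round(p, 4) for p in holm_sidak([0.004, 0.03, 0.01, 0.20])])
    # [0.0159, 0.0591, 0.0297, 0.2]
```

Compared with a single-step Sidak correction, the step-down version applies the full correction only to the smallest p-value, which makes it somewhat more powerful while still controlling the familywise error rate.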

The mean AzBio sound quality ratings in quiet were 3.1, 5.8, 6.4, 6.7, and 7.1 for the 4-, 8-, 10-, 12-, and 16-channel conditions, respectively. For AzBio sound quality ratings in quiet, there was a significant main effect of the number of channels [$F_{(4,57)}$ = 16.98, p < 0.001, $η_{p}^{2}$ = 0.54] and a significant main effect of $\bar{m}$ [$F_{(13,52)}$ = 10.35, p < 0.001, $η_{p}^{2}$ = 0.72] but no significant interaction between the number of channels and $\bar{m}$ [$F_{(52,5)}$ = 0.27, p = 0.995, $η_{p}^{2}$ = 0.73]. For the main effect of the number of channels, post hoc pairwise comparisons revealed no significant differences in ratings beyond 4 channels (p > 0.05 for all comparisons). For the main effect of $\bar{m}$, a post hoc Pearson correlation revealed no significant relationship between AzBio sound quality ratings in quiet and $\bar{m}$ (r = −0.03, p = 0.832, 95% CI = −0.25 to 0.20).

F. AzBio sentence recognition at +5 dB SNR and sound quality

The mean AzBio sentence recognition scores in noise were 4%, 25%, 26%, 29%, and 28% for the 4-, 8-, 10-, 12-, and 16-channel conditions, respectively. For AzBio sentence recognition in noise, there was a significant main effect of the number of channels [$F_{(4,55)}$ = 19.00, p < 0.001, $η_{p}^{2}$ = 0.58] and a significant main effect of $\bar{m}$ [$F_{(13,52)}$ = 22.18, p < 0.001, $η_{p}^{2}$ = 0.85] but no significant interaction between the number of channels and $\bar{m}$ [$F_{(52,5)}$ = 1.00, p = 0.571, $η_{p}^{2}$ = 0.91]. For the main effect of the number of channels, post hoc pairwise comparisons revealed significant performance differences between 4 and 8 channels (p = 0.016), 4 and 10 channels (p = 0.009), 4 and 12 channels (p = 0.007), and 4 and 16 channels (p = 0.011). No other pairwise comparisons were statistically significant. For the main effect of $\bar{m}$, a post hoc Pearson correlation revealed no significant relationship between AzBio sentence recognition in noise and $\bar{m}$ (r = 0.11, p = 0.347, 95% CI = −0.12 to 0.33).

The mean AzBio sound quality ratings in noise were 1.1, 2.7, 3.4, 3.3, and 3.7 for the 4-, 8-, 10-, 12-, and 16-channel conditions, respectively. For AzBio sound quality ratings in noise, there was a significant main effect of the number of channels [$F_{(4,40)}$ = 7.47, p < 0.001, $η_{p}^{2}$ = 0.43] and a significant main effect of $\bar{m}$ [$F_{(13,52)}$ = 17.46, p < 0.001, $η_{p}^{2}$ = 0.81] but no significant interaction between the number of channels and $\bar{m}$ [$F_{(52,5)}$ = 0.08, p = 0.999, $η_{p}^{2}$ = 0.44]. For the main effect of the number of channels, post hoc pairwise comparisons revealed no significant differences in ratings beyond 4 channels (p > 0.05 for all comparisons). For the main effect of $\bar{m}$, a post hoc Pearson correlation revealed no significant relationship between AzBio sound quality ratings in noise and $\bar{m}$ (r = −0.04, p = 0.713, 95% CI = −0.27 to 0.19).

G. Vowel recognition

The mean vowel recognition scores were 20%, 40%, 43%, 47%, and 52% for the 4-, 8-, 10-, 12-, and 16-channel conditions, respectively. For vowel recognition, there was a significant main effect of the number of channels [$F_{(4,55)}$ = 22.65, p < 0.001, $η_{p}^{2}$ = 0.62] and a significant main effect of $\bar{m}$ [$F_{(13,52)}$ = 6.86, p < 0.001, $η_{p}^{2}$ = 0.63] but no significant interaction between the number of channels and $\bar{m}$ [$F_{(52,5)}$ = 0.75, p = 0.736, $η_{p}^{2}$ = 0.89]. For the main effect of the number of channels, post hoc pairwise comparisons revealed significant performance differences between 4 and 10 channels (p = 0.028), 4 and 12 channels (p = 0.013), and 4 and 16 channels (p = 0.006). No other pairwise comparisons were statistically significant. For the main effect of $\bar{m}$, a post hoc Pearson correlation revealed no significant relationship between vowel recognition and $\bar{m}$ (r = 0.10, p = 0.400, 95% CI = −0.13 to 0.32).

H. Discussion

AB recipients with Mid-Scala electrode arrays verified to be completely within ST did not demonstrate speech recognition improvements beyond 4–8 spatially selective channels, with the exception of vowel recognition, for which asymptotic performance was reached with ten channels. For sound quality measures, participants did not demonstrate significant improvements in subjective sound quality ratings beyond four channels. These results replicate the findings of previous studies using straight electrode arrays (e.g., Fishman et al., 1997; Friesen et al., 2001; Shannon et al., 2011) and suggest that precurved electrode arrays placed farther from the modiolus may behave more like straight electrode arrays located closer to the lateral wall. Consistent with this supposition, these 15 Mid-Scala recipients demonstrated asymptotic speech understanding similar to that observed for recipients with Cochlear straight electrode arrays in ST (Berg et al., 2020). Moreover, electrode-to-modiolus distance did not have a significant relationship with performance for these AB recipients on any measure. These findings contrast with our recent work investigating Cochlear precurved electrode arrays, which found continuous performance improvements with up to 16-of-22 channels (Berg et al., 2019a). Interestingly, these AB recipients and the sample of full-ST Cochlear precurved electrode array recipients from our previous investigation had similar means (0.48 and 0.49 mm, respectively), medians (0.47 and 0.44 mm, respectively), and ranges (0.22–0.93 and 0.20–0.80 mm, respectively) of electrode-to-modiolus distance. We postulate that differences in either channel stimulation rate or angular insertion depth of the electrode array may drive the observed differences in channel interaction between Cochlear and AB precurved devices. AB devices are designed to stimulate at the highest possible rate, resulting in much shorter inter-pulse intervals than Cochlear devices, which use a fixed, comparatively lower rate of 900 Hz. Further, because of their shorter array length, AB arrays are designed for shallower insertion depths than Cochlear arrays, which contributes to differences in the distribution of electrode contacts across the array between the two manufacturers.
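The link between stimulation rate and inter-pulse interval is simple arithmetic: the interval between successive pulses on one channel is the reciprocal of the per-channel rate, and when channels are stimulated one after another within each cycle, the time available per pulse shrinks further by the number of channels. The sketch below assumes the 900 Hz figure is a per-channel rate in pulses per second; the 1800 pps value and the 8-channel count are purely illustrative and are not AB or Cochlear specifications. Function names are hypothetical.

```python
def per_channel_period_ms(rate_pps: float) -> float:
    """Interval between successive pulses on a single channel, in milliseconds."""
    if rate_pps <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / rate_pps


def interleaved_slot_ms(rate_pps: float, n_channels: int) -> float:
    """Time available per pulse when n_channels are stimulated one after another
    (non-overlapping) within each cycle, each channel at rate_pps."""
    if n_channels <= 0:
        raise ValueError("need at least one channel")
    return per_channel_period_ms(rate_pps) / n_channels


if __name__ == "__main__":
    # 900 pps per channel is taken from the text; 1800 pps is a hypothetical higher rate.
    for label, rate in [("900 pps", 900.0), ("1800 pps (hypothetical)", 1800.0)]:
        print(f"{label}: {per_channel_period_ms(rate):.3f} ms per channel, "
              f"{interleaved_slot_ms(rate, 8):.3f} ms per slot with 8 channels")
```

At 900 pps the per-channel interval is about 1.111 ms; doubling the rate halves both the per-channel interval and the per-slot time.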

III. EXPERIMENT II

AB recipients with well-placed precurved electrode arrays demonstrated asymptotic performance with 4–10 channels in experiment I, in contrast to our previous findings for Cochlear recipients with well-placed precurved electrode arrays, whose performance continued to improve with up to 16-of-22 channels. Therefore, the purpose of experiment II was to investigate the number of maxima needed by AB recipients when an n-of-m strategy is employed. We hypothesized that AB Mid-Scala recipients would reach asymptotic performance with fewer than 16 maxima because the Mid-Scala electrode array is not designed to rest as close to the modiolus as Cochlear precurved electrode arrays.
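In an n-of-m strategy, each analysis frame stimulates only the n channels (maxima) with the largest envelope amplitudes out of m available channels; experiment II compares n = 8 and n = 12 with m = 16 active electrodes. The sketch below shows generic peak picking for a single frame under that description. It is not AB's processing code, and the function name and envelope values are hypothetical.

```python
from typing import List, Sequence


def select_maxima(envelopes: Sequence[float], n: int) -> List[int]:
    """Generic n-of-m peak picking for one analysis frame.

    Returns the indices of the n channels (out of m = len(envelopes)) with the
    largest envelope amplitudes, in ascending channel order. Ties keep the
    lower channel index because Python's sort is stable, even with reverse=True.
    """
    m = len(envelopes)
    if not 0 < n <= m:
        raise ValueError("n must satisfy 0 < n <= number of channels")
    ranked = sorted(range(m), key=lambda ch: envelopes[ch], reverse=True)
    return sorted(ranked[:n])


if __name__ == "__main__":
    # Hypothetical envelope amplitudes for one frame across 16 channels.
    frame = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7, 0.3, 0.6,
             0.5, 0.05, 0.95, 0.15, 0.85, 0.25, 0.75, 0.35]
    print("8 maxima: ", select_maxima(frame, 8))
    print("12 maxima:", select_maxima(frame, 12))
```

For this made-up frame, 8 maxima select channels [1, 3, 5, 7, 8, 10, 12, 14], and 12 maxima add channels 2, 6, 13, and 15; the channels not selected in a frame receive no stimulation in that frame.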