The Journal of the Acoustical Society of America. 2013 Jul 15;134(2):EL244–EL250. doi: 10.1121/1.4814945

When intelligibilities of paired speech bands do not behave the way they are supposed to

Richard M Warren 1,a), James A Bashford Jr 1, Peter W Lenz 1
PMCID: PMC3732309  PMID: 23927232

Abstract

Two rectangular 1/3-octave passbands were derived from different spectral regions of everyday sentences, with the intelligibility of one band approximately twice that of the other. Both passbands were then filtered to produce a series of narrower rectangular passbands. Each of the original 1/3-octave passbands in turn served as the fixed-bandwidth “pedestal” and was paired with each of the series of narrower passbands of the other band. Remarkably, dual-band intelligibilities were the same regardless of which band served as pedestal, so the summed bandwidths determined intelligibility: The summed intelligibilities were irrelevant. Implications of this paradoxical “summed bandwidth rule” are discussed.

Introduction

Comprehension of speech requires the combining of information present at different spectral locations. One way to measure this integration is to employ a pair of passbands with different center frequencies and compare the intelligibility of each band when heard alone with the intelligibility of the bands when heard together. In order to obtain direct measurements of this cross-frequency integration, it is necessary to eliminate out-of-band contributions by using extremely steep transition bands.1 A previous study employed six one-octave rectangular passbands of sentences, which together spanned the speech spectrum from 0.25 to 8 kHz. Intelligibility scores were obtained for the individual rectangular passbands and for each of their 15 possible pairings.2 It was found that whether the bands were contiguous or had center frequencies separated by 2, 3, or 4 octaves, the intelligibility of the dual bands was always hyperadditive, that is, it was always greater than the sum of the scores for the individual components when heard alone. The extent of hyperadditivity occurring with paired 1-octave passbands was modest, always being less than twice the sum of the individual components. Much greater hyperadditivities were obtained for sentences that were reduced to narrower passbands: For example, when two 1/3-octave rectangular passbands centered at 1 and 3 kHz were heard both singly and together, the paired intelligibility was five times greater than the sum of the individual components.3 These observations suggest that rules governing the integration of cross-frequency speech information may vary, depending upon the width of passbands. The present study examined this possibility in more detail, using a “pedestal” procedure in which one of the two rectangular bands (the pedestal) had a fixed bandwidth, and the other had a series of progressively narrower bandwidths.
The center frequencies for the bands were selected such that when heard alone at the pedestal bandwidth, one of the bands had an intelligibility approximately twice that of the other. Hence, the dual-band intelligibilities would be expected to differ when the roles of the pedestal and the variable band were reversed. As we shall see, the results obtained were quite surprising.

Methods

Subjects

A total of 152 listeners participated in the study: Two groups of 28 listeners both in experiment 1 and in experiment 2, and two groups of 20 listeners in experiment 3. Differences in numbers of subjects across experiments were determined in part by differences in balancing constraints and the number of conditions employed. Listeners were undergraduate college students who were paid for their participation. They ranged in age from 18 to 29 and were native monolingual English speakers who had normal bilateral hearing, as measured by pure tone thresholds of 20 dB hearing level (HL) or better at octave frequencies from 250 to 8000 Hz.

Stimulus preparation

Experiments 1 and 2 employed passbands derived from the 100 (10 lists of 10) CID “everyday” sentences4 (e.g., “I'd like some ice cream with my pie.”), which contain 500 keywords (50 keywords per list) that are used for scoring. Stimuli employed in experiment 3 were derived from the low-probability (LP) sentences (e.g., “Mary could not discuss the tack.”) of the 8 equivalent forms of the revised (R-) SPIN sentences.5 The sentence-final keywords of the LP stimuli were used for scoring. All sentences were recorded by a male speaker of American English having no evident regional accent, with an average voicing frequency of approximately 100 Hz. The digital recordings (44.1 kHz sampling) were then transduced using a Sennheiser HD 280 Pro headphone. The slow-peak levels of the sentences were matched to within ±0.2 dBA using a flat-plate coupler in conjunction with a Brüel & Kjaer 2230 sound-level meter. In all experiments, bandpass filtering was performed by successive low- and high-pass filtering of the sentences using the MATLAB function fir1 with order n = 10 000, which produced slopes of approximately 4 dB/Hz. The resulting passbands were then equated again to ±0.2 dBA sound pressure level (SPL). A second series of successive low- and high-pass filtering, again with n = 10 000, was performed at the 50-dB downpoints of the passbands' transition bands (as measured using Metric Halo's Spectrafoo) to ensure that stopband attenuation was complete.
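The band edges implied by a bandwidth given in semitones can be sketched as follows. This is a minimal illustration rather than the authors' filtering code: it assumes each rectangular passband is centered geometrically (on a log-frequency axis) at its nominal center frequency, which the text does not state, and the function names are hypothetical. It also converts the stated slope of roughly 4 dB/Hz into the width of the transition region down to the 50-dB downpoint.

```python
def band_edges(center_hz, width_semitones):
    """Return (low, high) band edges in Hz.

    Assumption (not stated in the text): the band is centered geometrically,
    so each edge lies half the width, in semitones, from the center.
    12 semitones = 1 octave.
    """
    half_width_octaves = width_semitones / 24.0
    low = center_hz * 2.0 ** (-half_width_octaves)
    high = center_hz * 2.0 ** half_width_octaves
    return low, high


def transition_width_hz(drop_db, slope_db_per_hz):
    """Width in Hz over which a slope that is linear in Hz falls by drop_db."""
    return drop_db / slope_db_per_hz


if __name__ == "__main__":
    for fc in (1000.0, 3000.0):
        lo, hi = band_edges(fc, 4)  # 4 semitones = 1/3 octave
        print(f"{fc:.0f} Hz center, 4 semitones: {lo:.1f} to {hi:.1f} Hz")
    print(transition_width_hz(50.0, 4.0), "Hz from passband edge to 50-dB downpoint")
```

Under these assumptions the transition regions are only a few hertz wide, which is why the bands can be treated as effectively rectangular.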

In each of the three experiments, speech stimuli were divided into a series of blocks, with individual subjects receiving a separate block of sentences for each experimental condition, plus an additional practice block of sentences before the first experimental condition was presented. Experiments 1 and 2 each employed one practice and six experimental blocks. Since the 100 CID sentences used in those experiments have varying numbers of keywords, they were parsed into seven blocks composed of 13 to 15 sentences having from 70 to 73 keywords for use in the scoring of intelligibility. Experiment 3 employed one practice block of sentences and nine blocks for the experimental conditions. Hence, the 200 LP SPIN sentences used in that experiment were parsed into ten blocks of 20, and the 20 sentence-final keywords in each block were used to compute percent intelligibility. In all experiments, the listeners' assignment of sentence blocks to practice and experimental conditions was pseudo-randomized, with the restriction that each block occurred in each serial order of presentation an equal number of times across listeners (four times in experiments 1 and 2, and twice in experiment 3).
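The balancing restriction described above can be illustrated with a short sketch. The text says assignment was pseudo-randomized subject to the restriction; the cyclic Latin square below is a swapped-in deterministic construction that satisfies the same restriction, not the authors' procedure, and all function names are hypothetical.

```python
def cyclic_latin_square(n_blocks):
    """Row r lists the sentence block used at each serial position for ordering r."""
    return [[(r + pos) % n_blocks for pos in range(n_blocks)]
            for r in range(n_blocks)]


def assign_orders(n_listeners, n_blocks):
    """Give each listener one row of the square, cycling through the rows."""
    if n_listeners % n_blocks != 0:
        raise ValueError("number of listeners must be a multiple of the number of blocks")
    square = cyclic_latin_square(n_blocks)
    return [square[i % n_blocks] for i in range(n_listeners)]


def position_counts(orders, n_blocks):
    """counts[block][position] = how many listeners heard that block at that position."""
    counts = [[0] * n_blocks for _ in range(n_blocks)]
    for order in orders:
        for pos, block in enumerate(order):
            counts[block][pos] += 1
    return counts


if __name__ == "__main__":
    # Experiments 1 and 2: 28 listeners per group, 7 blocks (1 practice + 6 conditions).
    print(position_counts(assign_orders(28, 7), 7)[0])
    # Experiment 3: 20 listeners per group, 10 blocks (1 practice + 9 conditions).
    print(position_counts(assign_orders(20, 10), 10)[0])
```

With 28 listeners and 7 blocks every block lands in every serial position four times, and with 20 listeners and 10 blocks twice, matching the counts given in the text.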

Sound files were mixed using MATLAB to produce the various dual-band stimuli needed for each experiment. These stimuli were then transferred to an audio compact disk and subsequently presented to listeners using a Marantz Model PMD 320 CD player, with output amplified by a Mackie Microseries 2-VLZ audio mixer and delivered diotically through Sennheiser HD 280 Pro headphones. In all experiments, the amplitude of each of the individual speech bands was set at a slow-peak amplitude of 60 dBA as measured by a Brüel & Kjaer model 2230 sound-level meter.

General procedure

Listeners who passed an audiometric screening (20 dB HL or better at all octave frequencies for each ear) were tested individually while seated in a large double-wall audiometric chamber with the experimenter. Listeners in each experiment first received a practice block and then a formal block of sentences composed of the dual bands having matching maximum widths (4 semitones in experiments 1 and 2, and 12 semitones in experiment 3). In subsequent blocks the width of one band (experiments 1 and 3) or of both bands (experiment 2) was systematically reduced until the final block of sentences, in which only the single pedestal band remained. Listeners were instructed to repeat the sentences as accurately as possible and to guess if unsure. The experimenter noted which keywords were repeated correctly and recorded the proportion of correct responses that occurred for each of the sentence sets. Testing occurred in a single session lasting about 30 min. Since preliminary observations indicated that some of the results of this study would be inexplicable by current theories, multiple tests of replicability both within and across experiments were built into the experimental design.

Experiment 1: Intelligibility of dual band stimuli with 4-semitone pedestals

This experiment used the rectangular bands of the CID everyday sentences centered at 1 kHz and at 3 kHz. Separate groups of listeners heard each of these frequency bands serving as a 4-semitone pedestal. Each pedestal was paired with a second band having variable bandwidths presented in the order 4, 3, 2, 1, and 0.5 semitones. This was followed by the 0-semitone condition, which measured the intelligibility of each of the pedestals when heard alone. Before the formal conditions, there was a separate practice stimulus employing the paired 1-kHz and 3-kHz bands at 4 semitones to familiarize listeners with dual-band speech. A separate group of 28 subjects was employed for each of the two pedestal conditions. In experiment 1a, the pedestal had a center frequency of 1 kHz, and in experiment 1b, the center frequency was 3 kHz. The conditions employed in experiment 1b were identical to those of experiment 1a, including the same sentences in the same order: The only difference was the interchange of the pedestal and variable bands.

Results for experiment 1

Figure 1 shows the equivalence of dual-band intelligibility scores obtained in experiments 1a and 1b. It can be seen that, although it is distinctly counter-intuitive, the intelligibility for each of the paired matched bandwidths was the same regardless of which band served as the pedestal. This occurred despite the finding that the intelligibility of the 3-kHz band when heard alone (15%) was more than twice that of the 1-kHz band (6%). This difference in the stand-alone intelligibility scores was highly significant [F(1,54) = 10.0, p < 0.003].

Figure 1.

Mean intelligibility and standard error limits for the 4-semitone pedestals accompanied by a second passband with variable bandwidths. For further information see text for experiment 1.

Thus, experiment 1 found that the summed intelligibilities of the component bands had no measurable effect upon their dual-band intelligibility for the four variable bandwidths. Rather, intelligibility appears to be determined by the summed width of the paired bands. If this empirical “summed bandwidth rule” is valid, then it should apply when the same summed bandwidth is achieved using dual matching bandwidths. Experiment 2 was designed to provide this additional test of the summed bandwidth rule.

In experiments 1a and 1b the dual-band conditions employing 4 semitones plus 2 semitones (sum of 6 semitones) each had an intelligibility of 60% regardless of whether the 1-kHz or 3-kHz band served as the pedestal. If the summed bandwidth rule is valid, a score of approximately 60% should also be found when the 1- and 3-kHz passbands employed in experiment 1 each have the same bandwidth of 3 semitones.
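The logic of this test can be made concrete by grouping the dual-band conditions by their summed bandwidth. The sketch below is illustrative only: the bandwidths are those stated in the text (a 4-semitone pedestal with variable widths of 4, 3, 2, 1, and 0.5 semitones, and matched pairs at the same five widths), while the function name and data layout are hypothetical.

```python
from collections import defaultdict

PEDESTAL = 4                # semitones, pedestal width in experiment 1
WIDTHS = [4, 3, 2, 1, 0.5]  # semitones, variable or matched widths from the text


def conditions_by_total_bandwidth():
    """Map summed bandwidth -> list of (experiment, width at 1 kHz, width at 3 kHz)."""
    totals = defaultdict(list)
    for w in WIDTHS:
        totals[PEDESTAL + w].append(("1a", PEDESTAL, w))  # 1-kHz band is the pedestal
        totals[w + PEDESTAL].append(("1b", w, PEDESTAL))  # 3-kHz band is the pedestal
        totals[2 * w].append(("2", w, w))                 # matched widths, no pedestal
    return dict(totals)


if __name__ == "__main__":
    table = conditions_by_total_bandwidth()
    for total in sorted(table, reverse=True):
        print(f"{total:>4} semitones: {table[total]}")
```

Only two summed bandwidths, 8 and 6 semitones, occur in both the pedestal design and the matched design, so the 6-semitone total (4+2, 2+4, and 3+3) is the one that allows a cross-experiment comparison with unequal component widths.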

Experiment 2: Testing the validity of the summed bandwidth rule for narrow rectangular passbands

This experiment employed different pairings of the narrow rectangular passbands of the everyday sentences employed in experiment 1. Conditions were kept as close as possible to those employed in the previous experiment except for the crucial difference that there was no pedestal, and the paired 1- and 3-kHz passbands each had identical bandwidths. None of the participants had taken part in experiment 1. The separate groups of 28 subjects in experiments 2a and 2b heard the dual matched bandwidths in the order 4, 3, 2, 1, and 0.5 semitones. Listeners in each group also received an individual practice block with bands matched at 4 semitones, as in experiment 1. The only difference in the procedures employed in experiments 2a and 2b was that, following presentation of the matched 0.5-semitone bands, the 4-semitone 1-kHz passband was heard alone in experiment 2a, and the 4-semitone 3-kHz passband was heard alone in experiment 2b. By duplicating the paired presentation of experiment 2a with experiment 2b, it is possible to examine replicability of results when fairly large independent groups of 28 individuals are employed.

Results for experiment 2

As shown in Fig. 2, the separate groups in experiments 2a and 2b receiving the paired 3-semitone condition (total bandwidth of 6 semitones) had almost identical intelligibility scores of 60% and 61%. These scores closely matched the 60% intelligibility obtained in experiment 1 for both groups receiving the summed bandwidth of 6 semitones (4+2 or 2+4 semitones), providing independent support for the validity of the summed bandwidth rule while extending it to paired passbands with matched bandwidths.

Figure 2.

Mean intelligibility and standard error limits for the dual passbands with variable matching bandwidths. For further information see text for experiment 2.

There are other examples demonstrating replicability of the results obtained in experiments 1 and 2. Although not shown along with the matched bandwidths of Fig. 2, when the 1- and 3-kHz passbands were each heard alone at 4 semitones (the last condition in experiments 2a and 2b), their intelligibilities were 5% and 13%, respectively; these values were in good agreement with the corresponding scores of 6% and 15% found in experiment 1. As in experiment 1, the difference in intelligibility for the two 4-semitone pedestals in the stand-alone condition was highly significant [F(1,54) = 21.9, p < 0.001]. Also, both experiments included a condition in which the 1-kHz and 3-kHz bands were heard together at the same bandwidth of 4 semitones. Here again there was close agreement in the results obtained: The intelligibility was 75% for the two groups in experiment 1, and averaged 76.5% for the listeners in experiment 2.

Experiment 3: Intelligibility of dual band stimuli with 12-semitone pedestals

Preliminary experiments indicated that when paired CID everyday sentences were used with 12-semitone (one-octave) pedestals, some of the narrower accompanying bandwidths produced ceiling intelligibilities of 95% or better. For this reason, sentences with a lower intelligibility than those used in experiments 1 and 2 were employed. Experiment 3 used the subset of the LP SPIN sentences.5 Published intelligibility data are available for these sentences, obtained with six 1-octave rectangular passbands having center frequencies ranging from 0.25 to 8 kHz when heard alone and in all 15 possible pairings.2 The two 1-octave passbands centered at 1 and 4 kHz were reported to have intelligibilities of 39% and 17%, respectively (summed intelligibility 56%), and when heard together their score was 83%. These bands were synthesized and used as pedestals in experiment 3. Each of these 12-semitone pedestals in turn was accompanied by variable bandwidths in the following order: 12, 10, 8, 6, 4, 2, 1, 0.5, and 0 semitones. Before the formal conditions, there was a practice block of sentences consisting of matching 12-semitone bands to familiarize the listener with the sound of dual-band speech. The final 0-semitone condition provided the intelligibility scores for each of the 12-semitone pedestals when presented alone. Separate groups of 20 subjects were employed for each of the two pedestal conditions. Experiment 3a employed the 1-kHz pedestal, and experiment 3b employed the 4-kHz pedestal.

Results for experiment 3

Figure 3 shows that when heard alone, the 1-kHz pedestal band had an intelligibility approximately twice that of the 4-kHz pedestal (42% vs 18%). The summed intelligibility of the individual passbands was 60%. However, when the one-octave bands were heard together, their intelligibility was only 80%. This extent of hyperadditivity was much less than that found in experiments 1 and 2. Unlike the results obtained in experiment 1, for each of the mismatched bandwidth conditions the score was higher when the passband with the greater intelligibility was the pedestal. This is consistent with the conventional expectation that the pair with the higher aggregate intelligibility should have the higher score. Hence, experiment 3 indicates that the summed bandwidth rule, which applied in experiments 1 and 2 when all bandwidths were 4 semitones or less, does not apply when pedestal bandwidths are increased to 12 semitones.

Figure 3.

Mean intelligibility and standard error limits for the 12-semitone pedestals accompanied by a second passband with variable bandwidths. For further information see text for experiment 3.
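The contrast in hyperadditivity between the narrow and wide bands can be checked with simple arithmetic. The sketch below uses only the percent-correct scores reported in the text for the matched-bandwidth conditions; expressing hyperadditivity as the ratio of the dual-band score to the sum of the single-band scores is one convenient measure chosen here for illustration, and the function name is hypothetical.

```python
def hyperadditivity_ratio(alone_a, alone_b, together):
    """Dual-band score divided by the sum of the two single-band scores (all in %)."""
    return together / (alone_a + alone_b)


# (band A alone, band B alone, both together), percent correct, as reported in the text.
REPORTED = {
    "exp 1, 4 + 4 semitones (1 and 3 kHz)": (6, 15, 75),
    "exp 2, 4 + 4 semitones (1 and 3 kHz)": (5, 13, 76.5),
    "exp 3, 12 + 12 semitones (1 and 4 kHz)": (42, 18, 80),
}

if __name__ == "__main__":
    for label, (a, b, both) in REPORTED.items():
        print(f"{label}: ratio = {hyperadditivity_ratio(a, b, both):.2f}")
```

A ratio above 1 indicates hyperadditivity. The 1/3-octave pairs give ratios well above 3, whereas the one-octave pair gives a ratio only modestly above 1.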

Summary and discussion

Why does the summed bandwidth rule apply when passbands are 1/3-octave? And why does it not apply when passbands are one-octave? A clue may be found in a recent study demonstrating dramatic effects of narrow bandwidths upon speech intelligibility.6 In that study, everyday sentences were reduced to 16 rectangular passbands having center frequencies ranging from 0.25 to 8 kHz placed at 1/3-octave intervals. Four arrays were employed having uniform very narrow rectangular bandwidths ranging from 0.5% to 4% of the passbands' center frequencies. When each of the passbands had a width of 2%, the intelligibility of the array was 80%. When these speech bands modulated an array of sixteen noise bands with the same center frequencies and bandwidths, intelligibility decreased dramatically, as anticipated, from 80% to 23%. These noise-vocoded bands followed the envelope fluctuation, but vocoding eliminated the fine structure within the envelopes. Of considerable interest to the present study (and perhaps to speech perception in general) was the finding that when each of the noise bands modulated by the 2% bandwidth speech was not confined to the boundaries of the speech band but was expanded (smeared) up to 1/3-octave, intelligibility of the vocoded speech increased dramatically from 23% to 83%. This indicates that the extent of the envelope bandwidth can be a potent contributor to intelligibility, even when fine structure is absent within the noise-vocoded envelope.
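To give a sense of scale for the band array described above, the following sketch lists 16 center frequencies at 1/3-octave intervals starting at 0.25 kHz and converts a bandwidth given as a percentage of center frequency into semitones. It is a rough illustration under stated assumptions: the percentage bandwidth is assumed to be centered arithmetically on the center frequency (the cited study may define it differently), and the function names are hypothetical.

```python
import math


def third_octave_centers(first_hz=250.0, count=16):
    """Center frequencies spaced at 1/3-octave intervals, starting at first_hz."""
    return [first_hz * 2.0 ** (k / 3.0) for k in range(count)]


def percent_width_to_semitones(percent):
    """Width in semitones of a band spanning center*(1 - p/200) to center*(1 + p/200).

    Assumption: the band is centered arithmetically on the center frequency.
    """
    low = 1.0 - percent / 200.0
    high = 1.0 + percent / 200.0
    return 12.0 * math.log2(high / low)


if __name__ == "__main__":
    centers = third_octave_centers()
    print(f"{len(centers)} bands from {centers[0]:.0f} to {centers[-1]:.0f} Hz")
    for p in (0.5, 1.0, 2.0, 4.0):
        print(f"{p}% of center frequency = {percent_width_to_semitones(p):.2f} semitones")
```

Under this assumption a 2% band is roughly a third of a semitone wide, an order of magnitude narrower than the 4-semitone (1/3-octave) spacing between adjacent bands.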

It seems that the passbands with bandwidths of 1/3-octave or less may play a special role in determining speech intelligibility. Some results obtained in the present study do not seem explicable by current theories and models dealing with speech intelligibility. The apparently paradoxical findings in experiments 1 and 2 are clear: When the paired passbands had mismatched bandwidths of 1/3-octave or less, then the intelligibilities were based upon their total bandwidths (the summed bandwidth rule), with differences in their aggregate intelligibilities found to be irrelevant. However, when the paired passbands in experiment 3 had mismatched bandwidths that included 1-octave passbands, then the results obtained were quite different: The summed bandwidth rule no longer applied, and the dual-band intelligibilities increased when the sum of the bandwidths was kept constant and the summed intelligibilities of passbands were increased. In other words, the low probability one-octave sentences behave the way they are supposed to, despite the lower extent of contextual information concerning their keywords.

It is hoped that further information concerning the properties of speech passbands with bandwidths of 1/3-octave or less will lead to an enhanced understanding of speech mechanisms, so that the current findings may no longer seem paradoxical, but be part of a more comprehensive model.

Acknowledgments

The project described was supported by Award Number R01DC000208 from the National Institute on Deafness and Other Communication Disorders. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Deafness and Other Communication Disorders or the National Institutes of Health.

References and links

  1. Warren R. M., Bashford J. A., Jr., and Lenz P. W., “Intelligibility of bandpass filtered speech: Steepness of slopes required to eliminate transition band contributions,” J. Acoust. Soc. Am. 115, 1292–1295 (2004). doi: 10.1121/1.1646404
  2. Warren R. M., Bashford J. A., Jr., and Lenz P. W., “Intelligibilities of 1-octave rectangular bands spanning the speech spectrum when heard separately and paired,” J. Acoust. Soc. Am. 118, 3261–3266 (2005). doi: 10.1121/1.2047228
  3. Warren R. M., Bashford J. A., Jr., and Lenz P. W., “Intelligibility of dual rectangular speech bands: Implications of observations concerning amplitude mismatch and asynchrony,” Speech Comm. 40, 551–558 (2003). doi: 10.1016/S0167-6393(02)00178-4
  4. Silverman S. R. and Hirsh I. J., “Problems related to the use of speech in clinical audiometry,” Ann. Otol. Rhinol. Laryngol. 64, 1234–1244 (1955).
  5. Bilger R. C., Nuetzel J. M., Rabinowitz W. M., and Rzeczkowski C., “Standardization of a test of speech perception in noise,” J. Speech Hear. Res. 27, 32–48 (1984).
  6. Bashford J. A., Jr., Warren R. M., and Lenz P. W., “When noise vocoding can improve the intelligibility of sub-critical band speech,” Proc. Meet. Acoust. 9, 060001 (2010).
