Effects of fundamental frequency and vocal-tract length cues on sentence segregation by listeners with hearing loss

Carol L Mackersie; James Dewey; Lesli A Guthrie

doi:10.1121/1.3605548

2011 Aug;130(2):1006–1019.


PMCID: PMC3190663 PMID: 21877813

Abstract

The purpose was to determine the effect of hearing loss on the ability to separate competing talkers using talker differences in fundamental frequency (F0) and apparent vocal-tract length (VTL). Performance of 13 adults with hearing loss and 6 adults with normal hearing was measured using the Coordinate Response Measure. For listeners with hearing loss, the speech was amplified and filtered according to the NAL-RP hearing aid prescription. Target-to-competition ratios varied from 0 to 9 dB. The target sentence was randomly assigned to the higher or lower values of F0 or VTL on each trial. Performance improved for F0 differences up to 9 and 6 semitones for people with normal hearing and hearing loss, respectively, but only when the target talker had the higher F0. Recognition for the lower F0 target improved when trial-to-trial uncertainty was removed (9-semitone condition). Scores improved with increasing differences in VTL for the normal-hearing group. On average, hearing-impaired listeners did not benefit from VTL cues, but substantial inter-subject variability was observed. The amount of benefit from VTL cues was related to the average hearing loss in the 1–3-kHz region when the target talker had the shorter VTL.

INTRODUCTION

The reduction in speech understanding in the presence of competing backgrounds depends, in part, on the nature of the background. When the competition consists of spectrally dense noise or unintelligible babble, the reduction is driven largely by energetic masking caused by the spectral and temporal overlap of the target and competing signals (Assmann and Summerfield, 2004). A second form of masking, known as “informational masking,” may occur as a result of perceptual confusion caused by signal or masker uncertainty or by linguistic interference (Durlach et al., 2003; Schneider et al., 2007; Watson, 2005). When the competing signal consists of a single intelligible talker, the dominant source of interference is informational rather than energetic masking (Brungart, 2001; Brungart et al., 2006).

Acoustic differences between talkers can substantially enhance the recognition of one talker in the presence of a single competing talker, presumably by reducing informational masking (Bregman, 1990; Schneider et al., 2007). It is easier, for example, to recognize a talker when the competing talker has a different gender than when the competing talker has the same gender (Brungart, 2001; Brungart et al., 2001). The two primary differences between talkers of different gender, fundamental frequency (F0) and vocal-tract length (VTL), have been shown to provide robust acoustic cues for perceptual segregation (Darwin et al., 2003; Vestergaard et al., 2009; Vestergaard and Patterson, 2009). The robustness of these cues for people with cochlear hearing loss is less clear, however. The use of F0 and VTL segregation cues by listeners with hearing loss was the focus of the present study.

Fundamental frequency cues

The F0 and harmonically related components of a talker’s speech convey pitch information and contribute to perceived gender, age and size (Smith and Patterson, 2005). Improvement in intelligibility with increasing talker differences in F0 has been observed for both double-vowel stimuli and sentences (Assmann, 1999; Bird and Darwin, 1998; Brokx and Nooteboom, 1982; Darwin et al., 2003). For listeners with normal hearing, improvement in sentence recognition occurs for F0 differences up to 10 to 12 semitones (Bird and Darwin, 1998; Darwin et al., 2003; Drullman and Bronkhorst, 2004).

There is evidence that some listeners with hearing loss are unable to take full advantage of F0 differences between target and competing speech. In a series of studies examining the effectiveness of F0 cues in double-vowel perception, Arehart and her colleagues consistently demonstrated that individuals with hearing loss were less able to use F0 cues than were listeners with normal hearing (Arehart, 1998; Arehart et al., 1997, 2005). Differences between normal-hearing and hearing-impaired listeners have also been observed for sentence materials. Summers and Leek (1998) reported that benefit from increasing F0 differences between two sentences was similar for normal-hearing and hearing-impaired listeners for small F0 differences (2 semitones), but some hearing-impaired listeners were unable to take advantage of a larger F0 difference (4 semitones).

The reduction in F0 benefit among hearing-impaired listeners parallels the impaired pitch discrimination observed for other complex stimuli (Arehart, 1994; Bernstein and Oxenham, 2006; Leek and Summers, 2001; Moore and Peters, 1992). Similarly, hearing-impaired listeners have greater difficulty than do normal-hearing listeners in perceptual segregation of competing melodies and sequences of pure tones or harmonic complexes (Grimault et al., 2000; Grose and Hall, 1996; Mackersie et al., 2001; Rose and Moore, 1997, 2000).

The reduction in the ability of listeners with hearing loss to use talker differences in F0 may result from weaker-than-normal pitch perception caused by poor resolution of harmonics or by a reduced ability to use periodicity information. In addition, cognitive and other age-related factors may be involved, as older listeners tend to show less benefit from F0-difference cues than do younger listeners (Rossi-Katz and Arehart, 2009; Summers and Leek, 1998). Inadequate audibility of high-frequency harmonics may also weaken auditory grouping based on harmonic structure. Low-frequency harmonics have generally been shown to dominate the perception of pitch (Dai, 2000; Moore et al., 1985), but listeners appear to use information across a broader range of frequencies to facilitate the perceptual grouping of harmonics in both vowels (Culling and Darwin, 1993; Rossi-Katz and Arehart, 2005) and sentences (Bird and Darwin, 1998). Bird and Darwin, for example, determined that listeners with normal hearing used both high- and low-frequency information in sentences when the talker differences in F0 were large (five semitones or more), but not when they were small (two semitones or less). It is, therefore, possible that hearing-impaired listeners’ limited access to high-frequency information partially explains the plateau in performance observed for larger F0 differences by Summers and Leek (1998).

In studies examining specific talker segregation cues, individualized amplification and frequency shaping have rarely been used when testing listeners with hearing loss. Arehart et al. (1998) reported that amplification and low-pass filtering of stimuli in a double-vowel experiment did not increase the benefit from F0 differences over that observed for unfiltered stimuli. It is possible, however, that a single combination of filtering and amplification did not optimize the response for all listeners. In a later study in which individualized amplification was used, the benefit from F0 differences in vowels was similar for listeners with and without hearing loss (Rossi-Katz and Arehart, 2005). Given the weak connection between double-vowel and sentence recognition (Summers and Leek, 1998), however, it is unknown to what extent these results would apply to sentences.

Vocal-tract length cues

Vocal-tract length affects the average spectral envelope and formant frequencies of a talker’s speech. Perception of talker size depends on VTL cues (Ives et al., 2005; Smith and Patterson, 2005). In addition, VTL cues have been shown to influence judgments of talker age and gender (Smith and Patterson, 2005). For listeners with normal hearing, increasing the difference between the average spectral envelope of a target and competing sentence results in an improvement in speech recognition (Darwin et al., 2003; Darwin and Hukin, 2000a, 2000b; Vestergaard et al., 2009). Darwin and his colleagues reported improvements of more than 20 percentage points as the VTL ratios were increased from 1.0 to 1.38. Also, the combined effects of F0 and VTL cues resulted in greater improvement than either cue alone.

The relative contribution of F0 and VTL cues was examined by Vestergaard et al. (2009), who co-varied VTL and glottal pulse rate (the physical correlate of F0) over a wide range. They matched the amplitude envelopes of target and competing syllables to prevent listeners from listening in the dips of the competing sounds’ envelopes. To equate performance across the two manipulations, the change in VTL had to be 1.6 times as large as the change in glottal pulse rate. A similar trading relationship was found with simulated spatial separations of up to 8° azimuth (Ives et al., 2010). These findings suggest that, although both pitch and VTL cues contribute substantially to talker segregation for normal-hearing listeners, pitch is the more prominent cue.

The extent to which listeners with hearing loss can use vocal-tract length cues for talker segregation is unknown. If this ability requires access to at least the first two formants, however, then the reduced high-frequency audibility and/or frequency resolution typically associated with sensorineural hearing loss may affect performance, even with appropriate amplification.

The primary purpose of the present study was to determine how well listeners with cochlear hearing loss can use F0 and apparent VTL cues to segregate two competing sentences when provided with individualized amplification. The talker cues were manipulated independently.

A secondary purpose was to evaluate the effects of hearing loss on the relative influence of energetic versus informational masking by comparing sentence recognition in the presence of a competing sentence and in the presence of amplitude-modulated (AM) noise. Using Coordinate Response Measure (CRM) sentences with limited linguistic cues, Brungart (2001a) showed that recognition by normal-hearing listeners was substantially poorer in the presence of a single competing sentence than in the presence of more spectrally dense AM noise. Brungart interpreted the poorer performance with a single competing talker as evidence that performance in this condition was limited primarily by informational masking. However, energetic masking may have a greater influence on the performance of listeners with hearing loss because of their impaired frequency and temporal resolution. In the present study, the amplitude envelopes of individual CRM competing sentences were extracted and used to modulate noise, so that energetic masking could be estimated for AM noise matched to the envelope of each competing sentence. It is important to note, however, that the auditory representations of the AM-noise and speech envelopes would not be the same in every frequency band: the auditory representation of the speech envelope would vary across frequency, whereas that of the AM-noise envelope would not.

Listeners with hearing loss were provided with individualized amplification that mimicked the frequency response of a hearing aid. Although the frequency response of a typical hearing aid does not entirely compensate for the loss of audibility, the use of speech filtered to mimic a hearing-aid response may provide a clearer picture of the accessibility of talker difference cues for typical hearing aid users. Sentences were filtered and amplified to approximate the frequency response prescribed by the NAL-RP hearing aid prescription (Byrne and Cotton, 1988), which is designed to equalize loudness across mid-range frequencies.

GENERAL METHOD

Listeners

Thirteen adults with sensorineural hearing loss and six adults with normal hearing participated in the study. Table I shows the pure-tone audiometric thresholds and monosyllabic word-recognition scores in quiet for the listeners’ test ears.

Table 1.

Age in years, monosyllabic word-recognition scores (WS%), pure-tone averages (PTA) in dB HL, and test-ear pure-tone thresholds from 250 to 8000 Hz (in dB HL) for listeners with hearing loss (first 13) and normal hearing. NR, no response.

ID	Age (yrs)	WS%	PTA (dB HL)	250 Hz	500 Hz	1000 Hz	1500 Hz	2000 Hz	3000 Hz	4000 Hz	6000 Hz	8000 Hz
HL01	61	68	48	25	30	45	60	70	70	70	75	80
HL02	76	77	30	15	15	30	40	45	55	60	75	85
HL03	59	89	40	20	25	35	50	60	65	70	70	75
HL04	67	83	47	25	30	50	50	60	65	65	70	90
HL05	45	73	52	20	25	55	75	75	70	65	85	NR
HL06	59	83	47	25	40	50	45	50	60	55	50	45
HL07	61	87	63	40	60	60	55	70	60	65	70	70
HL08	61	90	38	30	35	40	40	40	45	50	55	60
HL09	68	78	43	35	30	45	50	55	65	70	90	NR
HL10	62	97	22	-5	-5	10	40	60	65	80	80	55
HL11	63	73	45	10	15	40	80	80	85	95	NR	NR
HL12	59	76	60	55	60	60	60	60	65	70	70	70
HL13	55	77	45	15	25	45	55	65	90	95	NR	NR
NH01	68	100	8	0	5	10	10	5	10	5	25	25
NH02	50	100	6	0	0	10	10	5	10	5	0	0
NH03	53	100	14	10	15	20	15	5	15	25	25	25
NH04	52	100	4	0	0	0	5	10	10	5	20	15
NH05	25	100	6	5	10	5	10	0	10	15	10	10
NH06	59	100	15	20	20	15	15	10	20	25	20	20


The mean three-frequency average threshold (0.5, 1, and 2 kHz) of listeners with hearing loss was 46 dB hearing level and the mean monosyllabic word recognition score in quiet was 81%. Listeners had normal middle-ear admittance and air-bone gaps of 10 dB or less between 500 and 4000 Hz, indicating that there was no conductive involvement. Acoustic reflex findings for listeners with sensorineural hearing loss were within the 90% confidence intervals for cochlear hearing loss (Silman and Gelfand, 1981), suggesting that there was no retrocochlear involvement. The etiologies of the hearing losses varied: six listeners reported a history of noise exposure, one reported a family history of hearing loss, and two had suspected vascular etiologies. The remaining four had unknown etiologies.

Listeners with normal hearing ranged in age from 25 to 69 yr (mean: 48 yr), whereas listeners with hearing loss ranged in age from 45 to 76 yr (mean: 61 yr). All listeners were native speakers of American English and were screened for cognitive disabilities using the Mini-Mental State Examination (Folstein et al., 1975); all scored 29 or 30 out of 30 possible points. The test ear was randomly selected for listeners with bilaterally symmetrical hearing. For the one listener with asymmetrical hearing, the better ear was selected; this ear most closely corresponded to an average hearing loss of 50 dB hearing level.

Materials and processing

Amplitude-modulated (AM) noise

Amplitude-modulated noise stimuli were created by extracting the amplitude envelope from each competing sentence and applying it to a sample of random noise. The envelopes were limited to frequencies of 60 Hz and below. The random noise was filtered to match the average spectrum of five sentences with no F0 or VTL shift. Two infinite-impulse-response (IIR) low-pass filters were used for this spectral shaping: one with a cut-off frequency of 625 Hz and a slope of 11 dB/octave, and one with a cut-off frequency of 1250 Hz and a slope of 14 dB/octave. The noise filtered with the 625-Hz cut-off was combined with the noise filtered with the 1250-Hz cut-off. The result was a set of amplitude-modulated noise samples, each with an amplitude envelope that matched that of the competing sentence from which it was created. The rms level of each noise sample was matched to that of the competing sentences.
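The AM-noise construction can be sketched in Python with NumPy and SciPy, as below. This is a minimal illustration under stated assumptions, not the authors’ processing chain: the text gives only the 60-Hz envelope limit and the two filter cut-offs and slopes, so the Hilbert-envelope method, the 4th-order Butterworth smoothing filter, the second-order Butterworth low-pass filters (about 12 dB/octave) standing in for the reported 11 and 14 dB/octave slopes, summing the two filtered noises, per-sentence rms matching, and the function name `make_am_noise` are all assumptions.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfilt, sosfiltfilt


def rms(x):
    """Root-mean-square amplitude of a signal."""
    return np.sqrt(np.mean(np.square(x)))


def make_am_noise(competitor, fs, rng=None):
    """Noise carrying the amplitude envelope of `competitor` (hypothetical sketch)."""
    rng = np.random.default_rng() if rng is None else rng

    # 1. Envelope: magnitude of the analytic signal, then a zero-phase
    #    low-pass at 60 Hz (assumed method; the text gives only the 60-Hz limit).
    env_sos = butter(4, 60.0, btype="low", fs=fs, output="sos")
    envelope = sosfiltfilt(env_sos, np.abs(hilbert(competitor)))
    envelope = np.maximum(envelope, 0.0)  # filtering can dip slightly below zero

    # 2. Spectrally shaped carrier: one white-noise sample passed through two
    #    low-pass filters (625-Hz and 1250-Hz cut-offs) and summed. Second-order
    #    Butterworth sections (~12 dB/octave) approximate the reported slopes.
    noise = rng.standard_normal(len(competitor))
    lp_625 = butter(2, 625.0, btype="low", fs=fs, output="sos")
    lp_1250 = butter(2, 1250.0, btype="low", fs=fs, output="sos")
    carrier = sosfilt(lp_625, noise) + sosfilt(lp_1250, noise)

    # 3. Impose the envelope, then match the competing sentence's rms level.
    am_noise = carrier * envelope
    return am_noise * (rms(competitor) / rms(am_noise))


# Usage with a synthetic stand-in for a competing sentence: a 500-Hz tone
# whose amplitude is modulated at 4 Hz, sampled at 40 kHz.
fs = 40_000
t = np.arange(fs) / fs
competitor = (1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
am = make_am_noise(competitor, fs, rng=np.random.default_rng(0))
print(len(am), float(rms(am)), float(rms(competitor)))
```

Filtering the carrier before imposing the envelope is one reading of the description; the text does not specify the order of these steps.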

Speech materials

Speech materials were drawn from the Coordinate Response Measure (CRM) speech corpus developed for multi-talker research (Bolia et al., 2000). Sentences in the corpus are constructed from key words consisting of call signs (“Baron,” “Charlie,” “Eagle,” etc.), colors (“red,” “white,” “green,” and “blue”), and numbers (1–8). These key words are combined with the carrier words “ready,” “go to,” and “now” to form sentences (e.g., “Ready” + “Charlie” + “go to” + “red” + “one” + “now”). The listener’s task is to repeat the key words (color and number) associated with the call sign “Baron” while ignoring the competing sentence. Six of the eight call signs were used (“Baron,” “Charlie,” “Ringo,” “Hopper,” “Tiger,” “Eagle”).

A single male talker (Talker 0) from the Coordinate Response Measure (CRM) speech corpus was used for both the target and competing sentences. Talker differences were created by manipulating the characteristics of the talker’s voice to produce changes in F0 and apparent vocal tract length, as described below.

Speech Processing

Speech manipulations were made with the pitch-synchronous overlap-add (PSOLA) algorithm (Moulines and Charpentier, 1990) as implemented in the Praat software package (Boersma and Weenink, 2006). Fundamental frequency and VTL were manipulated independently. The F0s were shifted by −3, 0, +3, and +6 semitones; this processing leaves formant frequencies and speaking rate unchanged.
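Semitone shifts map onto frequency ratios logarithmically, with 12 semitones per octave. The short sketch below converts between the two; the 100-Hz starting F0 is a hypothetical value chosen for illustration, not the F0 of the CRM talker, and the function names are illustrative.

```python
import math


def semitones_to_ratio(semitones):
    """Frequency ratio for a shift of `semitones` (12 semitones = 1 octave)."""
    return 2.0 ** (semitones / 12.0)


def interval_in_semitones(f_high, f_low):
    """Interval between two frequencies, in semitones."""
    return 12.0 * math.log2(f_high / f_low)


# Hypothetical 100-Hz starting F0, used only to illustrate the arithmetic.
base_f0 = 100.0
for shift in (-3, 0, 3, 6):  # the F0 shifts applied in this study
    print(f"{shift:+d} semitones -> {base_f0 * semitones_to_ratio(shift):.1f} Hz")
```

By the same arithmetic, the largest talker difference used here (−3 vs. +6 semitones, 9 semitones in all) corresponds to an F0 ratio of about 1.68.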

The spectral envelope was scaled to produce changes in apparent VTL; for convenience, these spectral-envelope manipulations are referred to as changes in VTL throughout the paper. Following the method described by Darwin et al. (2003), spectral envelopes were scaled by proportional factors of 0.84, 0.92, 0.96, 1.00, 1.04, 1.08, and 1.16. Values below 1.0 correspond to VTLs shorter than the original, and values above 1.0 to longer VTLs. Based on the findings of Ives et al. (2005), who reported size-discrimination performance of 75% or better for similar shifts in syllables, even the smallest VTL shift would be expected to produce a perceptually salient change in perceived talker size. Each desired proportional change (0.84, 0.92, etc.) was used as a scaling factor (sf) to shift F0 and duration (F0 × sf; duration × 1/sf). The stimuli were then resampled to a sampling rate of 40 kHz × sf (the original rate being 40 kHz) and stored for playback at the original 40-kHz rate. The net effect was to shift the spectral envelope while retaining the original duration and F0.
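The factor bookkeeping behind this procedure can be checked with a few lines of Python. The sketch tracks only the multiplicative effect of each step on F0, duration, and formant frequencies (no audio is processed), assuming that resampling preserves the waveform and that playing samples back r times faster multiplies every frequency by r and divides the duration by r. `Scaling` and `vtl_shift_factors` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Scaling:
    """Multiplicative factors applied to F0, duration, and formant frequencies."""
    f0: float = 1.0
    duration: float = 1.0
    formants: float = 1.0


def vtl_shift_factors(sf, fs=40_000):
    """Net effect of the PSOLA + resample + playback steps for scaling factor sf."""
    s = Scaling()
    # Step 1 (PSOLA): F0 x sf and duration x 1/sf; formants are untouched.
    s.f0 *= sf
    s.duration /= sf
    # Step 2: resample to fs * sf. The waveform is unchanged; it is simply
    # represented with fs * sf samples per second.
    new_fs = fs * sf
    # Step 3: play the samples at the original rate fs, i.e. fs / new_fs = 1/sf
    # times faster. Every frequency is scaled by 1/sf and the duration by sf.
    speed = fs / new_fs
    s.f0 *= speed
    s.formants *= speed
    s.duration /= speed
    return s


for sf in (0.84, 0.92, 0.96, 1.04, 1.08, 1.16):
    s = vtl_shift_factors(sf)
    print(f"sf={sf:.2f}: F0 x{s.f0:.3f}, duration x{s.duration:.3f}, "
          f"formants x{s.formants:.3f}, apparent VTL x{1 / s.formants:.2f}")
```

For every sf, F0 and duration return to their original values while formant frequencies are multiplied by 1/sf. Because formant frequencies vary inversely with VTL, the apparent VTL is scaled by sf, consistent with values below 1.0 denoting shorter vocal tracts.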

The sentences were combined to produce sentence pairs with talker differences in F0 or VTL. The F0 differences and VTL ratios used are shown in Table II, with the processing applied to each of the two sentences in a pair (F0 shift or proportional change in the spectral envelope) shown in parentheses. Talker differences in F0 ranged from 0 to 9 semitones. The ratio of the spectral-envelope shifts of the two sentences in each pair (the VTL ratio) ranged from 1.00 (no shift) to 1.38.

Table 2.

Fundamental frequency and vocal-tract length (VTL) manipulations used to produce talker differences. The talker difference (in semitones or VTL ratio) is shown at the left of each column. The shifts for the individual sentences comprising the pairs are shown in parentheses.

F0 difference in semitones (individual shifts)	VTL ratio (individual proportional shifts)
0 (0, 0)	1.00 (1.00, 1.00)
3 (−3, 0)	1.08 (0.96, 1.04)
6 (−3, +3)	1.16 (0.92, 1.08)
9 (−3, +6)	1.38 (0.84, 1.16)
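Because semitone shifts are logarithmic and spectral-envelope shifts are proportional, the talker difference for each pair is a subtraction in the first case and a ratio in the second. The sketch below, using the per-sentence values in Table II, makes that explicit; the variable names are illustrative.

```python
# Per-sentence manipulations for each pair, transcribed from Table II.
f0_pairs = [(0, 0), (-3, 0), (-3, 3), (-3, 6)]  # semitone shifts
vtl_pairs = [(1.00, 1.00), (0.96, 1.04), (0.92, 1.08), (0.84, 1.16)]  # proportional

# Semitone shifts are logarithmic, so the talker difference is a subtraction.
f0_differences = [high - low for low, high in f0_pairs]

# Proportional shifts are multiplicative, so the talker difference is a ratio.
vtl_ratios = [longer / shorter for shorter, longer in vtl_pairs]

for d, r in zip(f0_differences, vtl_ratios):
    print(f"F0 difference {d} semitones, VTL ratio {r:.3f}")
```

The computed VTL ratios are close to the nominal ratios listed in the table.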


Procedures

All stimuli were routed from a computer to a Tucker-Davis Technologies RX8 24-bit multi-I/O processor and presented monaurally through an Etymotic ER4 insert earphone. Listeners were tested in a sound-attenuated booth.

Presentation levels and stimulus filtering parameters were chosen to achieve the following goals: (1) comfortable loudness for all listeners and (2) frequency shaping for listeners with hearing loss that approximates a typical hearing aid. Unfiltered stimuli were presented to normal-hearing listeners at an average level of 71 dB sound pressure level, which corresponded to a comfortable loudness level for all listeners (see description of loudness testing below).

Presentation levels for listeners with hearing loss were set in a three-stage process (described below): (1) amplification and frequency shaping, (2) verification of the output level needed to match the prescribed NAL-RP targets adequately, and (3) verification and adjustment of the overall output to the listener’s most comfortable loudness level.

Frequency shaping/amplification and verification

For listeners with hearing loss, stimuli were individually amplified and filtered to approximate the frequency shaping prescribed by the National Acoustic Laboratories’ revised, profound (NAL-RP) hearing aid formula (Byrne and Cotton, 1988). A separate filter was created for each listener. Amplification and filtering of the speech stimuli were based on the targets for a 65 dB speech-shaped noise signal.
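For orientation, the core NAL-R insertion-gain rule, which NAL-RP extends, can be sketched as IG(f) = X + 0.31·H(f) + k(f), with X = 0.05·[H(500) + H(1000) + H(2000)], where H is the threshold in dB HL. This is an illustration only and not the prescription as applied in the study: it omits the NAL-RP corrections for severe and profound losses and the conversion from insertion gain to real-ear SPL targets for a 65-dB speech-shaped noise, and the constants k(f) are the values commonly published for NAL-R rather than values given in this paper. The function name `nal_r_insertion_gain` is hypothetical.

```python
# Frequency-dependent constants k(f) in dB, as commonly published for NAL-R.
NAL_R_K = {250: -17, 500: -8, 750: -3, 1000: 1, 1500: 1,
           2000: -1, 3000: -2, 4000: -2, 6000: -2}


def nal_r_insertion_gain(thresholds):
    """Sketch of NAL-R insertion gain (dB) from thresholds in dB HL.

    `thresholds` maps frequency (Hz) -> threshold (dB HL) and must include
    500, 1000, and 2000 Hz. NAL-RP's severe/profound corrections are NOT
    applied, and negative gains are returned unchanged.
    """
    x = 0.05 * (thresholds[500] + thresholds[1000] + thresholds[2000])
    return {f: x + 0.31 * h + NAL_R_K[f]
            for f, h in thresholds.items() if f in NAL_R_K}


# Test-ear thresholds of listener HL01 from Table I, used purely as input data.
hl01 = {250: 25, 500: 30, 1000: 45, 1500: 60, 2000: 70,
        3000: 70, 4000: 70, 6000: 75}
for f, g in nal_r_insertion_gain(hl01).items():
    print(f"{f:>5} Hz: {g:5.1f} dB")
```

For HL01 this gives, for example, about 22 dB of insertion gain at 1000 Hz.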

Individual real-ear-to-coupler differences (RECDs) were measured with the Frye Fonix 7000 using the same insert earphone used for subsequent speech-recognition testing. The RECD was used to convert real-ear sound pressure level targets to 2-cc coupler targets.
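The RECD is the difference between the SPL a signal produces in the real ear and the SPL it produces in a 2-cc coupler, so a coupler target is the real-ear target minus the RECD at each frequency. A minimal sketch with hypothetical numbers (not data from this study); `coupler_targets` is an illustrative name.

```python
def coupler_targets(real_ear_targets, recd):
    """2-cc coupler targets (dB SPL) = real-ear targets (dB SPL) - RECD (dB).

    Both arguments map frequency (Hz) -> dB; only shared frequencies are used.
    """
    return {f: real_ear_targets[f] - recd[f]
            for f in real_ear_targets if f in recd}


# Hypothetical values, for illustration only.
real_ear = {500: 72.0, 1000: 78.0, 2000: 80.0, 4000: 74.0}
recd = {500: 5.0, 1000: 7.0, 2000: 9.0, 4000: 12.0}
print(coupler_targets(real_ear, recd))
```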

Frequency responses were verified by comparing the spectrum of the amplified/filtered speech-shaped noise to the 2-cc target values. The amplified/filtered noise was played from the test system (Tucker-Davis Technologies RX8), delivered to the Etymotic ER4 insert earphone attached to a 2-cc coupler, and measured with the Frye Fonix 7000 analysis system. The spectrum of the coupler response was then compared to the target spectrum; hearing aid microphone effects were excluded from the calculations. Filters were redesigned as needed until the measured values were as close as possible to the target values. Figure 1 shows the mean target and measured 2-cc coupler outputs (dB sound pressure level in 1/3-octave bands) for the 13 hearing-impaired listeners.

Figure 1. Mean target and measured 2-cc coupler outputs (in dB sound pressure level) for the listeners with hearing loss, measured in 1/3-octave bands.

Loudness ratings

Categorical loudness ratings were obtained to verify that the amplifier and filter settings produced an output corresponding to a comfortable loudness level. Starting with a sentence presented 4 or 6 dB below the level corresponding to the prescribed settings, listeners rated the loudness on a scale of 2–8: 2, “Very soft”; 3, “Soft”; 4, “Comfortable, but slightly soft”; 5, “Comfortable”; 6, “Comfortable, but loud”; 7, “Loud, but OK”; and 8, “Uncomfortably loud.” The presentation level was increased in 3-dB steps until the listener reported a loudness rating above 5. The procedure was repeated until two ratings of “comfortable” were obtained at the same level; typically, only one repetition was needed. The final presentation level for the speech stimuli was the level at which listeners consistently reported a loudness rating of 5. On average, the prescribed and preferred levels were within 1.2 dB of each other. Normal-hearing listeners also completed loudness ratings to ensure that the 71 dB sound pressure level stimuli fell within the “comfortable” range. This process was necessary to minimize possible confounding effects of inappropriate amplification.
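The ascending procedure can be sketched as a small simulation. The function `find_comfortable_level`, the simulated listener’s ratings, and the safety limits (`max_above_db`, `max_runs`) are all hypothetical; the sketch starts 6 dB below the prescribed level (the text allows 4 or 6 dB) and ascends in 3-dB steps.

```python
from collections import Counter


def find_comfortable_level(rate, prescribed_db, start_offset_db=6.0,
                           step_db=3.0, max_above_db=15.0, max_runs=5):
    """Sketch of the ascending categorical-loudness procedure.

    `rate(level_db)` is a hypothetical callback returning a 2-8 rating.
    Returns the first level rated 5 ("Comfortable") on two separate runs,
    or None if that does not happen within `max_runs` runs.
    """
    comfortable = Counter()
    for _ in range(max_runs):
        level = prescribed_db - start_offset_db
        # Ascend in fixed steps until the rating exceeds 5 (or a safety limit).
        while level <= prescribed_db + max_above_db:
            rating = rate(level)
            if rating == 5:
                comfortable[level] += 1
                if comfortable[level] >= 2:
                    return level
            if rating > 5:
                break
            level += step_db
    return None


def simulated_listener(level_db):
    """Hypothetical listener: rates 68 dB up to (not including) 72 dB as 5."""
    if level_db < 68:
        return 4
    if level_db < 72:
        return 5
    return 6


print(find_comfortable_level(simulated_listener, prescribed_db=70.0))
```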

Statistical analyses

Recognition scores (percentage correct) were converted to rationalized arcsine units (RAU) for statistical analyses to stabilize the error variance (Studebaker, 1985). For all competing-talker conditions, repeated-measures analyses of variance were used to test for differences in means for the factors of interest. Partial eta-squared (η²_p) was used as a measure of effect size. Greenhouse-Geisser corrections were applied whenever the sphericity assumption was violated. Newman-Keuls post hoc tests were used for pairwise comparisons following significant interactions and/or main effects. A probability of .05 was used as the criterion for statistical significance.
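The rationalized arcsine transform can be written as a short function. The sketch below uses the standard form of Studebaker’s (1985) transform for X correct out of N items, θ = arcsin√(X/(N+1)) + arcsin√((X+1)/(N+1)) and RAU = (146/π)θ − 23, and assumes N = 20 key words per TCR block (10 sentences × 2 key words), matching the test blocks described below. The function name `rau` is illustrative.

```python
import math


def rau(correct, n_items):
    """Rationalized arcsine units (Studebaker, 1985) for `correct` of `n_items`."""
    theta = (math.asin(math.sqrt(correct / (n_items + 1)))
             + math.asin(math.sqrt((correct + 1) / (n_items + 1))))
    return (146.0 / math.pi) * theta - 23.0


# Assumed block size: 10 sentences x 2 key words (color, number) = 20 items.
for k in (0, 10, 18, 20):
    print(f"{k}/20 correct ({100 * k / 20:.0f}%) -> {rau(k, 20):.1f} RAU")
```

A score of 10 of 20 maps to 50 RAU, while the end points are stretched below 0 and above 100 RAU, reflecting the variance-stabilizing arcsine transform.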

EXPERIMENT 1: TALKER SEGREGATION WITH RANDOMIZATION OF TARGET-TALKER CHARACTERISTICS

Method

Recognition tests were administered across three test sessions of approximately 60 min each. Performance in quiet was measured during the first session. Performance was measured in the presence of both AM noise and competing sentences during sessions two and three, as described below.

Performance in quiet

To ensure that the F0 and VTL manipulations did not affect the intelligibility of speech in quiet, recognition of sentences was evaluated under all F0 and VTL processing conditions without competition. Listeners were given five practice blocks, each consisting of 10 sentences (20 key words) to orient them to the test procedure. Practice conditions consisted of the sentences with no shift in F0 or VTL and the sentences with the largest and smallest changes in F0 and VTL.

Practice segments were followed by separate test blocks, each containing ten sentences. The conditions with F0 and VTL shifts were presented in random order for each listener. The F0 and VTL reference conditions (no shift) were tested at the beginning or end of the session, counterbalanced across listeners. Performance was measured as the percentage of correctly repeated items (colors and numbers).

Performance in the presence of a competing talker or noise

Practice.

To familiarize listeners with the competing-sentence task, initial training was provided for sentences with and without F0 and VTL changes. During the initial training, the target-to-competition ratio (TCR) was decreased from +20 to 0 dB in 5-dB steps across a series of presentations for each condition. Re-instruction and additional practice were given until listeners were familiar with the task and their performance had stabilized. Following this initial training, two practice blocks were administered at a fixed TCR of 0 dB.

Test conditions.

The target sentence (call sign: “Baron”) was combined with either a single competing sentence or AM noise. Only the sentences with no shift in F0 or VTL were combined with the AM noise; sentences in all F0- and VTL-shifted conditions were combined with competing speech. The F0 and VTL conditions were completed in separate test sessions, with test order counterbalanced within each of the two listener groups. The AM-noise conditions were randomly assigned to either the beginning or the end of each session.

Sentences were presented at four TCRs (0, 3, 6, and 9 dB). Each condition (VTL, F0, or AM noise) was completed in a single 40-sentence block (10 sentences for each TCR). Easier TCRs (9 or 6 dB) were always presented first and were alternated with the more difficult TCRs (0 or 3 dB). The TCR sequence was randomly assigned to each listener. All TCRs for a given processing condition were completed before beginning a new processing condition.
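One way to generate a TCR order that meets these constraints is sketched below: the two easier TCRs occupy the first and third positions and the two harder TCRs the second and fourth, each pair in random order. This is an assumption about how the randomization could be implemented, not a description of the authors’ software; `tcr_sequence` is a hypothetical name.

```python
import random


def tcr_sequence(rng):
    """One listener's TCR order (dB): easy and hard TCRs alternate, easy first."""
    easy = [9, 6]
    hard = [0, 3]
    rng.shuffle(easy)
    rng.shuffle(hard)
    return [easy[0], hard[0], easy[1], hard[1]]


rng = random.Random(2011)  # seeded only to make the example reproducible
print(tcr_sequence(rng))
```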

During conditions in which there were talker differences, the target and competing talkers were randomly assigned to the higher or lower F0 or VTL values within each block of 10 sentences. The high and low values were equally distributed across the blocks of sentences presented at each TCR.
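A balanced random assignment of the target to the higher or lower value can be sketched as a shuffled list. The sketch assumes an even split (five high and five low targets) within each 10-sentence block, which is one way to satisfy the equal distribution described above; `assign_target_values` is a hypothetical name.

```python
import random


def assign_target_values(n_trials=10, rng=None):
    """Randomly assign the target to the "high" or "low" F0/VTL value.

    Assumes an even n_trials: half the trials get each value, in random
    order, so high and low targets are balanced within the block.
    """
    rng = random.Random() if rng is None else rng
    labels = ["high"] * (n_trials // 2) + ["low"] * (n_trials // 2)
    rng.shuffle(labels)
    return labels


print(assign_target_values(10, random.Random(7)))
```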

As in the training session, listeners were asked to repeat the color and number spoken by the talker with the call sign “Baron” following each presentation. The percentage of correctly repeated items (colors and numbers) was recorded.

RESULTS

Single-sentence performance

For all listeners with normal hearing and 10 of 13 listeners with hearing loss, there were no errors when listening to sentences without competition. For listeners with hearing loss, the mean scores were 100, 99.2, 99.6, and 99.6% for F0 shifts of −3, 0, +3, and +6 semitones, respectively. Mean scores for the VTL conditions were 98.5, 98.9, 99.2, 98.6, 99.2, and 99.6% for VTL values of 0.84, 0.96, 1.00, 1.04, 1.08, and 1.16, respectively. Skewed distributions precluded the use of parametric statistics, so separate Friedman analyses of variance were completed for the F0 and VTL data of listeners with hearing loss. There was no significant effect of processing for either the F0 (χ²(3) = 1.40, p = 0.70) or the VTL conditions (χ²(6) = 3.17, p = 0.78). These analyses provide no evidence that the F0 or VTL processing had a substantial effect on speech intelligibility in quiet.

Effect of amplitude-modulated noise and a competing talker

Table III shows mean recognition scores for sentences tested in the presence of AM noise and single competing sentences. Scores represent the means for sentences with no shift in VTL or F0. Mean scores in AM noise were close to 100% for all TCRs. All but two listeners with hearing loss scored 95% or higher under all AM noise conditions. Scores in the presence of a single competing sentence were substantially lower than scores in AM noise for both listener groups. This finding suggests that, even for listeners with hearing loss, informational masking was the dominant factor underlying performance in the competing talker conditions for the TCRs used in this study.

Table 3.

Mean recognition scores (% correct) for sentences without an F0 or VTL shift at TCRs of 0, 3, 6, and 9 dB for two masker types: amplitude-modulated (AM) noise and single sentences. NH, normal hearing; HL, hearing loss. Standard deviations are shown in parentheses.

TCR (dB)	NH, AM noise	NH, single sentence	HL, AM noise	HL, single sentence
0	99.6 (1.1)	55.8 (13.4)	95.0 (8.6)	49.4 (7.9)
3	100 (0)	62.2 (13.9)	96.9 (5.5)	71.1 (12.7)
6	100 (0)	87.5 (14.6)	98.5 (2.6)	83.2 (10.0)
9	100 (0)	98.7 (2.2)	98.8 (2.9)	93.2 (7.5)
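Following the logic of the design, the gap between the AM-noise and single-sentence scores in Table III gives a rough index of the masking that the speech competitor adds beyond the envelope-matched noise. The sketch below computes that gap from the table’s means; treating it as a measure of informational masking is an interpretive assumption, subject to the caveat that AM noise does not reproduce the band-by-band envelopes of speech. `speech_masker_penalty` is an illustrative name.

```python
# Mean scores (% correct) from Table III, indexed by TCR in dB:
# (NH AM noise, NH single sentence, HL AM noise, HL single sentence)
table3 = {
    0: (99.6, 55.8, 95.0, 49.4),
    3: (100.0, 62.2, 96.9, 71.1),
    6: (100.0, 87.5, 98.5, 83.2),
    9: (100.0, 98.7, 98.8, 93.2),
}


def speech_masker_penalty(table):
    """AM-noise score minus single-sentence score per TCR, as (NH, HL)."""
    return {tcr: (nh_am - nh_sent, hl_am - hl_sent)
            for tcr, (nh_am, nh_sent, hl_am, hl_sent) in table.items()}


for tcr, (nh, hl) in speech_masker_penalty(table3).items():
    print(f"TCR {tcr} dB: NH {nh:4.1f} points, HL {hl:4.1f} points")
```

The gap is largest at 0 dB TCR for both groups (about 44 points for normal hearing and 46 for hearing loss) and shrinks as the TCR increases.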


Effects of talker differences in F0

Performance for high- and low-F0 target talkers.

Mean scores for high- and low-F0 target talkers are shown in Figs. 2 and 3, respectively. When the target talker had the higher F0 (Fig. 2), recognition improved with increasing F0 difference for both groups, mainly at the lower TCRs. At the lower TCRs, listeners with normal hearing benefited more from a 9-semitone F0 difference than did listeners with hearing loss. At the higher TCRs, there was less room for improvement because performance approached ceiling level.

Figure 2. Mean scores for trials in which the target was assigned to the higher F0 value. Data for different TCRs are shown in separate panels. In this and subsequent figures, error bars indicate ±1 standard error.

Figure 3. Mean scores for trials in which the target was assigned to the lower F0 value.

When the target talker had the lower F0 (Fig. 3), there was no evidence of benefit from increasing F0 difference for either group. At the lowest TCR, mean performance of listeners with hearing loss decreased with increasing difference in F0.

Performance for the high- and low-F0 target talkers was analyzed separately. Each repeated-measures analysis of variance used hearing status as a between-subjects factor and F0 difference (0, 3, 6, and 9 semitones) and TCR (0 and +3 dB) as within-subjects factors. TCRs of +6 and +9 dB were excluded because, at these higher TCRs, more than half of the listeners approached ceiling-level performance (>90%) in the reference condition.

Analysis of variance: High-F0 target.

A summary of the analysis of variance results is shown in Table IV. There were significant interactions between F0 difference and hearing status and between TCR and hearing status. Post hoc testing confirmed a significant improvement in recognition from 6 to 9 semitones for listeners with normal hearing (p = 0.004), but not for listeners with hearing loss (p = 0.92). For both groups, scores for F0 differences of 6 and 9 semitones were significantly higher than scores for an F0 difference of 0 semitones. The mean score for normal-hearing listeners was significantly higher than that for listeners with hearing loss (p = 0.006) for a difference of 9 semitones. The mean score for listeners with hearing loss was poorer than that for normal-hearing listeners at the lower TCR, but was similar at the higher TCR.

Table IV.

Analysis of variance results for high- and low-F0 targets. Significant p values (p < 0.05) are marked with an asterisk.

Effect	df	F	p	η²_p
High-F0 target
Hearing	(1,17)	4.08	0.059	0.193
F0 difference	(3,51)	17.21	<0.001*	0.503
TCR	(1,17)	43.82	<0.001*	0.720
F0 diff × Hearing	(3,51)	3.05	0.037*	0.152
F0 diff × TCR	(3,51)	2.86	0.064	0.144
TCR × Hearing	(1,17)	4.86	0.042*	0.222
F0 diff × TCR × Hearing	(3,51)	0.41	0.748	0.023
Low-F0 target
Hearing	(1,17)	2.52	0.131	0.129
F0 difference	(3,51)	1.63	0.194	0.087
TCR	(1,17)	34.98	<0.001*	0.673
F0 diff × Hearing	(3,51)	2.25	0.093	0.117
F0 diff × TCR	(3,51)	0.85	0.471	0.048
TCR × Hearing	(1,17)	10.35	0.005*	0.378
F0 diff × TCR × Hearing	(3,51)	1.20	0.319	0.066
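The η²_p column can be recovered from the F ratios and degrees of freedom. Because F = (SS_effect/df_effect)/(SS_error/df_error) and η²_p = SS_effect/(SS_effect + SS_error), it follows that η²_p = F·df_effect/(F·df_effect + df_error). The short check below applies that identity to several rows of Table IV; the rows are transcribed from the table, and the helper name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (effect, F, df_effect, df_error, reported partial eta squared), from Table IV.
ROWS = [
    ("High F0: Hearing", 4.08, 1, 17, 0.193),
    ("High F0: F0 difference", 17.21, 3, 51, 0.503),
    ("High F0: TCR", 43.82, 1, 17, 0.720),
    ("High F0: F0 diff x Hearing", 3.05, 3, 51, 0.152),
    ("High F0: TCR x Hearing", 4.86, 1, 17, 0.222),
    ("Low F0: TCR", 34.98, 1, 17, 0.673),
    ("Low F0: TCR x Hearing", 10.35, 1, 17, 0.378),
]

for name, f_value, df1, df2, reported in ROWS:
    computed = partial_eta_squared(f_value, df1, df2)
    print(f"{name:28s} computed {computed:.3f}  reported {reported:.3f}")
```

Each computed value lies within 0.001 of the reported one; the remaining differences in the third decimal reflect rounding of the published F values.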


Analysis of variance: Low-F0 target.

There was no significant main effect of F0 difference and no significant interaction involving F0 difference, confirming that F0 difference did not enhance recognition for the low-F0 target. As in the previous analysis, the significant interaction between TCR and hearing status reflected larger group differences at the lower TCR. Apart from the main effect of TCR, no other main effects or interactions were significant.

Effects of talker differences in apparent vocal-tract length

Mean scores for the high- and low-VTL targets were 77.1% and 78.0%, respectively. A repeated-measures analysis of variance revealed neither a significant difference between the high and low target values (F(1,17) = 0.58, p = 0.45) nor any significant interaction between VTL target and any other factor. Therefore, scores for the high and low VTL targets were averaged for subsequent analyses.

Mean scores are shown in Fig. 4 as a function of VTL ratio. On average, recognition by normal-hearing listeners improved with increasing VTL ratio. This effect was absent for the group with hearing loss.
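To make the VTL ratios concrete, consider an idealized uniform tube closed at one end, a textbook approximation to the vocal tract rather than the stimulus-processing method used here (which this section does not describe). Its resonances fall at F_n = (2n − 1)c/(4L), so lengthening the tract by a factor r lowers every formant by the same factor r. The sketch assumes a hypothetical 17.5-cm reference tract and c ≈ 350 m/s; the function names are likewise hypothetical.

```python
SPEED_OF_SOUND_CM_S = 35_000.0  # approximate speed of sound in warm, humid air (cm/s)


def tube_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_formants + 1)]


def scale_formants(formants_hz, vtl_ratio):
    """Lengthening the tract by `vtl_ratio` divides every formant by that factor."""
    return [f / vtl_ratio for f in formants_hz]


reference = tube_formants(17.5)  # hypothetical 17.5-cm tract: 500, 1500, 2500 Hz
for ratio in (1.0, 1.08, 1.16, 1.38):
    print(ratio, [round(f) for f in scale_formants(reference, ratio)])
```

Under this approximation, a VTL ratio of 1.38 moves a 1500-Hz formant to about 1087 Hz, whereas a ratio of 1.08 moves it only to about 1389 Hz.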

Fig. 4. Mean scores under conditions in which talker differences in VTL varied from 1.0 to 1.38. Data were averaged across the high and low target-talker values.

A repeated-measures analysis of variance with hearing status as a between-subjects factor and VTL ratio and TCR (0 and +3 dB) as within-subjects factors indicated significant interactions between VTL ratio and hearing status (F(3,51) = 12.29, p < 0.0001, η²_p = 0.42) and between TCR and hearing status (F(1,17) = 4.62, p = 0.046, η²_p = 0.214). Post hoc tests for the normal-hearing group confirmed significant improvements in scores relative to the no-shift condition for VTL ratios of 1.16 (p = 0.02) and 1.38 (p = 0.0002), but not for a VTL ratio of 1.08 (p = 0.65). In contrast, the group with hearing loss showed no significant improvement for any VTL ratio. These data provide no evidence that listeners with hearing loss can use differences in vocal-tract length to separate two competing talkers, even with appropriate amplification.

Summary

Both groups of listeners benefited from talker differences in F0 when the target had the higher F0, but not when it had the lower F0. Recognition scores were significantly better for the higher-F0 target than for the lower-F0 target. This striking asymmetry suggests that listeners had difficulty focusing on the lower-pitched talker in the presence of higher-F0 competition. Scores for the higher- and lower-VTL targets were similar. Listeners with normal hearing benefited from VTL cues, but listeners with hearing loss did not. Recall that in both the F0 and VTL segments of the experiment, the target was randomly assigned to the higher or lower value on each trial. It is possible that the demands of monitoring and switching attention limited performance under some conditions. This possibility was evaluated in Experiment 2.

EXPERIMENT 2: ELIMINATION OF TRIAL-TO-TRIAL UNCERTAINTY

Experiment 2 was conducted to examine the possible influence of trial-to-trial target uncertainty on the results of Experiment 1. The goals were (1) to determine whether target uncertainty affected the benefit from talker differences observed in Experiment 1 and (2) to quantify the contribution of target uncertainty to the high-low asymmetry observed for the F0 conditions in Experiment 1. Testing was limited to the most extreme F0 and VTL differences, because any change in benefit would most likely occur for larger talker differences. Only the more difficult TCRs were used, because performance at the easier TCRs approached ceiling in Experiment 1.
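The uncertainty that Experiment 2 removes can be made concrete. When the target is assigned to the higher or lower value independently and with equal probability on every trial, consecutive trials differ about half the time, so over N trials listeners face roughly (N − 1)/2 switches. The sketch below contrasts such a randomized schedule with a fixed-target block. The block arrangement, trial count, seed, and function names are hypothetical, since this section does not describe how Experiment 2 presented its targets.

```python
import random


def randomized_schedule(n_trials, rng):
    """Target assigned to the higher or lower value at random on every trial."""
    return [rng.choice(("higher", "lower")) for _ in range(n_trials)]


def blocked_schedule(n_trials, target):
    """Hypothetical fixed-target block: the target value never changes."""
    return [target] * n_trials


def switch_count(schedule):
    """Number of trials whose target value differs from the previous trial."""
    return sum(1 for prev, cur in zip(schedule, schedule[1:]) if prev != cur)


rng = random.Random(2011)  # fixed seed so the sketch is reproducible
mixed = randomized_schedule(40, rng)
fixed = blocked_schedule(40, "lower")
print("randomized:", switch_count(mixed), "switches")
print("blocked:", switch_count(fixed), "switches")
```

In the blocked schedule the listener always knows which talker to follow, which is the condition under which any cost of monitoring and switching attention should disappear.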