Abstract
Sixty normally-hearing listeners, ages 5 to 61 years, participated in a monaural speech understanding task designed to assess the impact of a single-talker speech masker presented to the opposite ear. The speech targets were masked by ipsilateral speech-spectrum noise. Masker level was fixed and target level was varied to estimate psychometric functions. The target∕masker ratio that led to 51% correct performance in this task was taken as the baseline threshold. The impact of a modulated speech-spectrum noise, a male talker, or a female talker presented at a fixed level to the contralateral ear was quantified by the change in the baseline threshold and was assumed to reflect informational masking. The modulated-noise masker produced no informational masking across the entire age range. Speech maskers produced as much as 20 dB of informational masking for children aged 5–8 years and only 4 dB for adults. In contrast with previous studies using ipsilateral speech maskers, the male and female contralateral speech maskers produced comparable informational masking. Analyses of the developmental rate of change for informational masking and of the patterns of individual differences suggest that the informational masking produced by contralateral and ipsilateral maskers may be mediated by different mechanisms or processes.
INTRODUCTION
In a multi-source acoustical environment, selectively attending to one sound source and ignoring the others represents a significant challenge to the human auditory system. The acoustical waveforms from all sources mix linearly before they reach the ears, so it is up to the brain to parse the auditory scene and extract the attended source. For normally hearing adults this process seems effortless and automatic unless the environment is extremely noisy. However, for children and for individuals with hearing impairment, even a small amount of certain kinds of noise can interfere significantly with the scene analysis and selective attention processes. This is especially troublesome for young children who typically rely on auditory input for language development.
Research on the mechanisms and processes that subserve auditory scene analysis and selective attention has yet to produce a detailed understanding of what brain mechanisms are involved and how they work. However, there is a substantial body of evidence from behavioral studies on the stimulus parameters that favor successful auditory selective attention and on those that create the most interference. Much of this evidence has come from studies of auditory masking, in which one or more sounds constituting the “masker” interfere with the detection or recognition of a target sound, often called the “signal.” The amount of masking is estimated by calculating the difference in performance (in a detection or recognition task) between conditions in which the masker is either present or absent.
Bandpass filter models of the auditory periphery (Moore and Glasberg, 1987) have been very successful at predicting the amount of masking produced by a wide variety of maskers, mostly steady-state noises and tone complexes. The amount of masking predicted by the bandpass filter models is based on the masker energy in a frequency region surrounding the signal; thus, it is called “energetic masking.”
Recent research has shown that, when masker components are random from presentation to presentation or when there is a high degree of similarity between signal and masker (e.g., when both are speech), a qualitatively and quantitatively different kind of masking is produced. This masking, produced by masker uncertainty or signal-masker similarity, is called “informational masking” (Pollack, 1975). One of the most dramatic demonstrations of the qualitative and quantitative differences between energetic and informational masking is found in the research of Neff and colleagues (Neff, 1995; Neff and Dethlefs, 1995; Neff and Green, 1987), who reported as much as 30 dB of excess masking (above that predicted by bandpass filter models) of a sinusoidal signal by a random multi-tone complex. Moreover, unlike typical results with energetic maskers, Neff and Dethlefs (1995) also reported large individual differences in susceptibility to informational masking. These findings have been replicated and modeled by Oh and Lutfi (1998) and many others.
Informational masking of speech targets has been studied extensively using a paradigm known as the Coordinate Response Measure, or CRM (Brungart, 2001b; Brungart et al., 2006; Brungart and Simpson, 2002b; Brungart and Simpson, 2004; Brungart and Simpson, 2007; Brungart et al., 2001). In CRM experiments, a listener attends to a target speech signal which is masked by one or more speech messages that are similar in structure to the target. The maskers are either in the same voice as the target, a different voice of the same gender as the target, or a voice of a different gender than the target. They are presented either in the same ear as the target, in the other ear, or in both ears. The results from the CRM studies constitute a valuable catalog of informational masking effects, from which three major findings emerge: (1) no informational masking is produced by a noise that has both a temporal envelope and long-term spectrum similar to a speech masker; (2) much more informational masking is produced by a masker in the same voice as the target, less by a masker of the same gender but different voice, and much less by a masker of a different gender; (3) a masker presented to the non-target ear produces less informational masking than one presented to the target ear. The effects reported in the Brungart studies have been replicated by others, using both the CRM and other paradigms (Balakrishnan and Freyman, 2008; Freyman et al., 2001; Freyman et al., 2007; Freyman et al., 1999; Helfer and Freyman, 2008; Humes et al., 2006; Wightman and Kistler, 2005; Wightman et al., 2006).
Informational masking almost certainly plays a role in everyday speech communication, such as in environments that include one or more speech masker sounds. A school classroom is an example of such a noisy environment, one in which speech communication is extremely important. For this reason, recent research in our laboratory has focused on informational masking in children. Experiments have used tonal signals and maskers (Oh et al., 2001; Wightman et al., 2003) and speech targets and maskers (Wightman and Kistler, 2005; Wightman et al., 2006). The results suggest that children are considerably more susceptible to informational masking than adults, but that the main factors influencing the amount of informational masking (uncertainty and target-masker similarity) are the same in children as in adults. The speech experiments used the CRM paradigm and our results from adult listeners are comparable to those obtained by Brungart and Simpson (2002b, 2004, 2005).
One factor that can complicate interpretation of the results obtained from speech informational masking studies is the fact that a speech masker can produce both energetic and informational masking, thus confounding measures of the amount of informational masking alone. For example, the CRM task, which measures recognition of a speech message in the presence of a simultaneous speech masker, must include some energetic masking if the speech target and masker overlap spectrally and temporally. In research reported by Brungart and colleagues (Brungart, 2001b; Brungart et al., 2006; Brungart and Simpson, 2002b; Brungart et al., 2001), the estimated energetic masking component was small so it was ignored. However, it is possible that it cannot be ignored in studies of young children, who typically demonstrate elevated energetic masking (Elliott, 1979; Fallon et al., 2000; Fallon et al., 2002; Hall et al., 2002; Stuart, 2008).
A masking paradigm in which the target and masker are presented to different ears represents an effective way to isolate informational from energetic masking. It is highly unlikely that a speech masker presented to one ear has any measurable energetic masking effect on the recognition of a target presented to the other ear. Thus, any masking produced by the contralateral masker should be entirely informational. A previous study with tonal targets and maskers (Wightman et al., 2003) focused on conditions in which targets were presented to one ear and maskers were presented to the opposite ear. Adults showed no masking at all in these conditions (target thresholds were the same as with no masker present), but children showed substantial masking, which was interpreted as entirely informational. In a comparable experiment with tonal targets and maskers, Hall et al. (2005) came to the same conclusions. One of the motivations of the current experiment, in which the speech target and speech masker were presented to opposite ears, was to follow up and extend the previous studies, with the aim of tracking the developmental changes in the informational masking produced by a contralateral speech masker.
In an earlier study of speech on speech masking from our laboratory (Wightman and Kistler, 2005), a contralateral masker was combined with an ipsilateral masker, in a paradigm modeled after one designed by Brungart and Simpson (2002b). For adult listeners, in both the Brungart and Simpson study and our own, the added contralateral speech masker had a modest degrading effect on performance, especially at target∕masker (T∕M) ratios less than 0 dB. A contralateral noise masker added to the ipsilateral speech masker had no effect. Similar results have been reported by Kidd et al. (2003) in a study using tonal targets and maskers and in a recent study of speech masking reported by Brungart and Simpson (2007), which examined the effect of target-masker similarity in the ipsilateral and contralateral stimuli. For the children in our study, although the amount of ipsilateral informational masking was much greater than for the adults, the effect of the added contralateral speech masker appeared to be about the same as for the adults.
Measuring the impact of a contralateral speech masker that is combined with an ipsilateral speech masker is complicated because many of the psychometric functions from the ipsilateral masker condition are non-monotonic. This non-monotonicity, which has been previously reported (Brungart and Simpson, 2002b; Wightman and Kistler, 2005), is thought to result from the listener adopting a strategy of “listening to the softer voice” when the target is slightly less intense than the masker. Thus, performance at T∕M ratios as low as −10 dB is equal to or even better than performance at a 0 dB T∕M ratio. Because the non-monotonicity disappears when a contralateral masker is added, the impact of the contralateral masker cannot be described as a simple shift of the psychometric function toward poorer performance. The fact that the non-monotonicity does not appear in the ipsilateral-only psychometric functions of young children further complicates comparison of the effect of a contralateral masker across age groups.
The research reported here addresses both the age-dependency of the informational masking produced by a contralateral speech masker and individual differences in the amount of informational masking. The absence of an ipsilateral speech masker eliminates the non-monotonicity of the psychometric functions and isolates any informational masking obtained to that produced by the contralateral masker. In order to establish a common baseline from which to estimate the contralateral informational masking contribution, we measured speech recognition performance in the presence of an ipsilateral speech spectrum noise, which is assumed to be a purely energetic masker. The levels of the ipsilateral masker and the contralateral masker (when present) were held constant, and the level of the target was varied to map out a psychometric function. The threshold target∕masker ratio (51% correct) with no contralateral masker provided an estimate of baseline performance. The impact of adding a contralateral masker is expressed in terms of the resulting dB shift in the target∕masker threshold. This paradigm is identical to that used in one of the conditions studied by Brungart and Simpson (2002b, Fig. 4).
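The threshold-shift measure described above can be written compactly. The symbols below are notation introduced here for illustration, not the authors' own: $\theta$ denotes the target∕masker ratio (in dB) at 51% correct, and $\mathrm{IM}$ the estimated informational masking.

```latex
\mathrm{IM} = \theta_{\text{contra}} - \theta_{\text{base}} \quad \text{(dB)}
```

As a purely hypothetical example, a listener with a baseline threshold of $-8$ dB and a threshold of $-3$ dB with a contralateral speech masker would show $\mathrm{IM} = 5$ dB.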
The main advantage of the simple dichotic listening paradigm for studying age effects and individual differences is that the amount of informational masking for each listener is expressed in terms of the dB shift at a constant performance level (threshold, or 51% correct). This assumes that the ipsilateral energetic masking and the contralateral informational masking simply add in dB, an assumption that is common in studies of informational masking and is predicted in models of tonal informational masking (Kidd et al., 2005; Lutfi, 1990). In many previous studies the amount of informational masking is inferred because ceiling effects prohibit a measure of performance without the informational masker. For example, in studies with a target message and a masker message presented to the same ear, performance with the masker absent is close to perfect (Brungart and Simpson, 2002b; Kidd et al., 2005). This complicates interpretation of either individual differences in informational masking or age effects because of the fact that ceiling performance levels may obscure important differences (especially between adults and children) in task demands or difficulty. Thus, even if two listeners achieve 100% correct performance in quiet, one listener may appear to be less resistant than another to informational masking because the task in quiet was more difficult for that listener. In the paradigm used here, if the task demands or difficulty in the baseline (no informational masking) condition are greater for young children, the baseline (energetic masking) threshold data will reveal that. The shift in threshold caused by the contralateral masker will then provide a measure of informational masking that is less contaminated by age-dependent task demands.
Few previous studies of informational masking with speech targets have specifically addressed the issue of individual differences in normally hearing adults or children (Leech et al., 2007). This is surprising given that studies with tonal stimuli have emphasized individual differences (Lutfi et al., 2003; Neff and Dethlefs, 1995). In fact, as mentioned earlier, one of the defining characteristics of informational, as contrasted with energetic, masking is individual differences. Our previous work (Lutfi et al., 2003; Wightman et al., 2003) has suggested that individual differences in informational masking may be larger in children than in adults. Thus, the second aim of the current study was to test a sample of children and adults large enough to permit explicit analyses of individual differences across a wide age range. The hope was that comparison of the patterns of individual differences with a contralateral masker (in this study) and with an ipsilateral masker (in previous studies) would reveal aspects of the processing strategies used by listeners in the two conditions and thus, might inform theories of selective attention and its development.
METHODS
Listeners
Twenty-four adults, ages 19–61 years, and 36 children, ages 5–16 years, participated in the experiment. Six additional adults did not return after the first session so their data are not included here. All children who were initially recruited completed the experiment. Participants were recruited by advertisements on the University of Louisville electronic bulletin board and through ads posted on bulletin boards in public and private schools. Participants were paid $8∕hr. All participants were required to pass a 20 dB HL audiometric screening at octave frequencies 250–8000 Hz and have normal middle ear function as confirmed by routine tympanometry. Participants who were reported to have attentional problems or a diagnosis of ADD or ADHD were excluded.
Stimuli
The speech targets and maskers used here were the same as those used in the “Coordinate Response Measure” (CRM) experiments reported by Brungart and colleagues (Brungart, 2001a, 2001b; Brungart et al., 2006; Brungart and Simpson, 2002b; Brungart and Simpson, 2004; Brungart and Simpson, 2007; Brungart et al., 2005) and in those previously conducted in our laboratory (Wightman and Kistler, 2005; Wightman et al., 2006). The CRM involves a task in which listeners are asked to attend to a spoken target message of the form, “Ready call sign, go to color number now.” and to respond by indicating the color and number included in the target message. The CRM corpus contains 2048 digitally recorded messages by eight talkers using eight call signs, four colors and eight numbers (Bolia et al., 2000). In this experiment the target “call sign” was always “Baron” and the target talker was male talker #0. The target color and number were chosen randomly on each trial, from a set of four colors (“red,” “blue,” “green,” and “white”) and eight numbers (1–8). Thus, chance performance was approximately 3% (1∕32). The speech masker messages had exactly the same form as the target but a different talker, call sign, color, and number, randomly selected on each trial. Target and masker messages were temporally aligned at the beginning and were roughly the same total duration.
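The corpus size and chance level quoted above follow directly from the counts given in the text. The short check below is only an illustrative sketch; the variable names are ours and are not part of the CRM corpus or its software.

```python
from itertools import product

# Counts taken from the text; names are illustrative only.
N_TALKERS = 8
N_CALL_SIGNS = 8
COLORS = ["red", "blue", "green", "white"]
NUMBERS = list(range(1, 9))  # 1-8

# Every talker recorded every call sign / color / number combination.
corpus_size = N_TALKERS * N_CALL_SIGNS * len(COLORS) * len(NUMBERS)

# A response is one color-number pair; a pure guess picks one at random.
response_set = list(product(COLORS, NUMBERS))
chance = 1 / len(response_set)

print(corpus_size)             # 2048
print(len(response_set))       # 32
print(round(100 * chance, 1))  # 3.1
```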
Four stimulus conditions were tested in this experiment. All conditions involved trials in which a single target message and a speech spectrum noise were presented to the listener’s right ear. The speech spectrum noise was a 3.2 s Gaussian noise filtered by the long-term average spectrum of all of the CRM messages. The onset of the noise was 500 ms prior to the onset of the speech and the offset was roughly 500 ms after the offset of the speech. In the monaural control condition, no stimulus was presented to the left ear. In the male and female contralateral masker conditions, the masker phrase was presented to the left ear. In the male masker condition, the masker talker was randomly chosen from the three remaining male talkers and in the female masker condition, from the four female talkers. A second control condition involved presentation of a modulated speech spectrum noise to the left ear with no added speech masker. The noise was amplitude modulated by the envelope of a different female CRM speech message on each trial.
The overall level of the ipsilateral speech spectrum noise was fixed at 55 dB SPL. The contralateral speech masker or modulated noise masker was presented at the same fixed level. The level of the target was varied randomly from trial to trial in 5 dB steps in order to allow estimation of a complete psychometric function (performance from near chance to 100% correct) for each listener in each condition. Depending on the listener and the condition, the target∕masker ratio varied from −20 to +20 dB. The relatively low level for the ipsilateral noise masker was chosen because some young children required signal levels more than 20 dB above the noise level to reach 100% correct performance. The peak factor in the CRM corpus was close to 20 dB, and we chose to limit the maximum sound level to 95 dB SPL.
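The level arithmetic in this paragraph can be checked with a few lines. This is a sketch under one assumption that the text does not state explicitly: that the peak level of a target is its overall level plus the roughly 20 dB peak factor. The constant and function names are ours.

```python
MASKER_LEVEL_DB_SPL = 55  # fixed ipsilateral noise level (from the text)
PEAK_FACTOR_DB = 20       # approximate CRM peak factor (from the text)
MAX_PEAK_DB_SPL = 95      # stated limit on the maximum sound level

def target_level(tm_ratio_db):
    """Overall target level (dB SPL) for a target/masker ratio in dB."""
    return MASKER_LEVEL_DB_SPL + tm_ratio_db

tm_ratios = list(range(-20, 25, 5))  # -20 ... +20 dB in 5 dB steps
levels = [target_level(r) for r in tm_ratios]

# Assumption of this sketch: peak level = overall level + peak factor.
peak_levels = [lvl + PEAK_FACTOR_DB for lvl in levels]

print(levels)                               # [35, 40, 45, 50, 55, 60, 65, 70, 75]
print(max(peak_levels) <= MAX_PEAK_DB_SPL)  # True
```

Under that assumption, the highest target∕masker ratio (+20 dB) gives a 75 dB SPL target whose peaks reach the 95 dB SPL limit, which is consistent with the choice of a relatively low 55 dB SPL masker.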
All stimuli were digitally generated and converted to analog form (44 100 Hz sample rate) via the 24-bit D∕A converters on a two-channel PC soundcard (CardDeluxe). Targets were attenuated and mixed digitally from trial to trial. The stimuli were presented to listeners using calibrated Beyer DT990-Pro headphones. Listeners were tested in a double-walled sound booth (Acoustic Systems) in which the ambient sound level was less than 20 dBA.
Procedure
The CRM task requires listeners to attend to the target talker (identified in the phrase “ready Baron”), ignoring the masker talker (e.g., “ready Tango”), if present, and respond with the color and number indicated in the target talker sentence. For example, a correct response to the sentence “Ready Baron, go to blue 3 now” would include the color “blue” and the numeral “3.” Responses were given by the listener clicking a computer mouse on the appropriately colored and numbered box on a computer screen. The response boxes were arranged in four colored panels laid out 2×2, each containing eight numbered boxes in a 3×3 grid with the middle space vacant. After a response was entered, the listener clicked on another box in the center of the screen to present the next trial. Feedback regarding the accuracy of the response was not provided.
At the beginning of the first session, listeners were presented a block of 30 trials with no maskers in order to familiarize them with the task. During this familiarization the level of the target message was varied randomly over a 20 dB range (35–55 dB SPL) in 5 dB steps. This procedure verified that the target would be audible with no noise or speech masker for the lowest levels to be tested. Performance was required to be at or near 100% at each level in order for the listener to proceed. All listeners passed this test.
Testing was designed to estimate a full psychometric function on each listener in each condition. Ideally, at the lowest target level tested performance would be very low (near chance) and at the highest level would be near perfect (100% correct). Each function contained a minimum of five target levels. One or more practice blocks of 30 or 60 trials were completed in each condition to establish the appropriate levels. During the practice phase, the four stimulus conditions were tested in a fixed order: (1) monaural, (2) modulated noise masker, (3) female speech masker, and (4) male speech masker. For the test phase, the four conditions were presented in random order. For each condition, 240–300 test trials were completed, resulting in at least 48 trials for five of the levels tested. Adult listeners and children over the age of 6 years were tested in blocks of 60 trials. The 5-year-old children were tested in 30-trial blocks. Sessions typically lasted 1.5–2 h. Adults required two or three sessions to complete the experiment and children required three to six sessions to finish.
Data analysis
After verifying the absence of practice effects with split-half analyses, all the data from each individual listener in each condition were combined. Psychometric functions were estimated from these data using the procedures described by Wichmann and Hill (2001a, 2001b). Three parameters were estimated for each function: threshold (target∕masker ratio corresponding to 51% correct), upper asymptote and slope. For each parameter, a bootstrapped estimate of the 95% confidence limits was also generated. The data presented here consist mainly of the threshold estimates and confidence limits for individual listeners in each condition.
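To make the three parameters concrete, the sketch below evaluates a psychometric function of the general form used in that framework (a sigmoid scaled between a guess rate and an upper asymptote) and inverts it at the 51% criterion. The logistic shape, the function names, and the parameter values are assumptions chosen for illustration; the text does not specify the sigmoid that was fitted, and no fitting or bootstrapping is attempted here.

```python
import math

GUESS_RATE = 1 / 32  # chance level for 4 colors x 8 numbers (from the text)

def psychometric(tm_db, midpoint, slope, lapse):
    """Proportion correct at a target/masker ratio (dB).

    A logistic core is scaled between the guess rate and the upper
    asymptote (1 - lapse). The logistic form is an assumption of this
    sketch, not a statement of the authors' choice.
    """
    core = 1.0 / (1.0 + math.exp(-slope * (tm_db - midpoint)))
    return GUESS_RATE + (1.0 - GUESS_RATE - lapse) * core

def threshold(midpoint, slope, lapse, criterion=0.51):
    """Target/masker ratio (dB) at which the function equals `criterion`."""
    core = (criterion - GUESS_RATE) / (1.0 - GUESS_RATE - lapse)
    if not 0.0 < core < 1.0:
        raise ValueError("criterion lies outside the range of the function")
    return midpoint + math.log(core / (1.0 - core)) / slope

# Hypothetical parameter values, for illustration only.
t = threshold(midpoint=-8.0, slope=0.5, lapse=0.02)
print(round(psychometric(t, -8.0, 0.5, 0.02), 2))  # 0.51
```

With a guess rate near 3%, the 51% criterion falls close to the midpoint between chance and perfect performance, so the threshold is near the steepest part of the function.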
Developmental trends and individual differences were assessed by fitting exponential decay functions [$y = ae^{bx} + c$] to relate the threshold data to age (e.g., Hartley et al., 2000; Schneider et al., 1989). The function parameters estimate the extent of decline (a), the rate of decline (b) and the asymptotic threshold (c). In order that the parameter estimating extent of decline could easily be interpreted relative to age 5 years (the age of the youngest listener tested in this study), 5 was subtracted from the actual age when fitting the functions. The deviations from fitted functions (residuals) were used to assess individual differences. Additionally, individual differences obtained in this study were compared to the individual differences from previous studies that involved an ipsilateral speech masker (Wightman and Kistler, 2005; Wightman et al., 2006). Because of the non-monotonicity of the psychometric functions obtained in those studies, each listener’s percent-correct performance at a target∕masker ratio of 0 dB was used instead of threshold T∕M ratios to examine individual differences.
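One simple way to fit a function of this form and obtain residuals is sketched below. It is not the authors' procedure, which the text does not describe: here the rate parameter b is found by grid search, and for each candidate b the parameters a and c are obtained by ordinary linear least squares. The data are synthetic and noise-free, generated from made-up parameter values, and all names are hypothetical.

```python
import math

def fit_exponential(ages, thresholds, b_grid):
    """Least-squares fit of y = a*exp(b*x) + c, with x = age - 5.

    For a fixed b the model is linear in a and c, so those are solved in
    closed form; the b giving the smallest squared error is kept.
    """
    xs = [age - 5.0 for age in ages]
    n = len(xs)
    y_mean = sum(thresholds) / n
    best = None
    for b in b_grid:
        z = [math.exp(b * x) for x in xs]
        z_mean = sum(z) / n
        szz = sum((zi - z_mean) ** 2 for zi in z)
        if szz == 0.0:
            continue  # b = 0 would make a and c indistinguishable
        szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, thresholds))
        a = szy / szz
        c = y_mean - a * z_mean
        sse = sum((yi - (a * zi + c)) ** 2 for zi, yi in zip(z, thresholds))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c

def residuals(ages, thresholds, a, b, c):
    """Observed minus fitted threshold for each listener."""
    return [y - (a * math.exp(b * (age - 5.0)) + c)
            for age, y in zip(ages, thresholds)]

# Synthetic, noise-free data from made-up parameters (not the study's data).
ages = [5, 6, 7, 8, 10, 12, 16, 25, 40, 61]
true_a, true_b, true_c = 5.0, -0.25, -9.0
ys = [true_a * math.exp(true_b * (age - 5)) + true_c for age in ages]

b_grid = [-k / 100 for k in range(1, 101)]  # -0.01 ... -1.00
a, b, c = fit_exponential(ages, ys, b_grid)
res = residuals(ages, ys, a, b, c)
print(round(a, 2), round(b, 2), round(c, 2))  # 5.0 -0.25 -9.0
```

Because 5 is subtracted from age, the fitted value at age 5 is a + c, so a reads directly as the decline from the youngest listeners to the asymptote c.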
RESULTS
A primary concern when conducting auditory research on children is the quality of the data. Children are notoriously inattentive in ways that may not be obvious, so contamination of the data with these non-sensory effects can be an issue. Although not all non-sensory factors can be controlled, one kind of attention can be monitored by examining a listener’s performance when conditions suggest it should be near perfect. This is one of the reasons why full psychometric functions are estimated. If the upper asymptote of the function is not near 100%, one obvious conclusion is that the listener was not paying attention on a certain proportion of the trials (Wightman and Allen, 1992). Fortunately, in this study, nearly all of the children had psychometric functions with upper asymptotes above 95% correct, as verified by the fitting procedure (Wichmann and Hill, 2001a, 2001b). The only exceptions were the speech masker conditions of one 7-year-old and several of the 5-year-olds. Other measures of data quality might include the size of the 95% confidence limits on the threshold estimates. The 95% confidence limits are naturally dependent on the slope of the psychometric function. Although the children produced psychometric functions with more shallow slopes in some of the conditions, there was no evidence of increases in confidence limits beyond that expected to result from the shallow psychometric functions. We conclude that the data obtained from the children in this study are of the same quality as the data from the adults.
Figures 1 and 2 show representative data from eight adults (Fig. 1) and from eight children (Fig. 2) in the four conditions of the experiment. The figures show data from two listeners (rows) at each of four ages (columns). These specific children and adults were chosen for display because their performance reflected the large individual differences in thresholds obtained at each age. Note that, for these listeners, the functions from the monaural control condition (no contralateral masker) and the condition in which a modulated noise was used as a contralateral masker are virtually identical. Note also that the rightward shift of the functions in the male and female contralateral speech masker conditions, reflecting informational masking, is highly listener dependent, but greater for the children than for the adults. Consider, for example, the two listeners aged 48 years (Fig. 1). The listener in the top panel showed no informational masking and the listener in the bottom panel showed about 5 dB of informational masking. A similar pattern can be observed in the data of the two children aged 11 years (Fig. 2), although the amount of informational masking for one child (bottom panel) was over 10 dB. Finally, note that for all of the listeners shown there is no difference between the functions from the male masker condition and the female masker condition.
Figure 3 shows the thresholds estimated from the psychometric functions of all listeners in all four conditions. The data are plotted on a log-age axis to emphasize the large change in performance at the younger ages. The fitted exponential functions relating threshold to age, the estimated parameters, and $R^2$ values are also shown. Note that the exponential functions fit to the data in each of the four conditions accounted for more than 63% of the total variance. Several conclusions seem warranted given this display. First, the contralateral modulated noise masker had no effect; thresholds were the same in this condition as in the monaural control condition, as confirmed by the nearly identical parameters of the exponential function fits. Second, consistent with previous reports, thresholds in the baseline and the contralateral noise conditions were slightly elevated in children under the age of 10 years, with the youngest children demonstrating about a 5 dB elevation. Function fits in the two conditions predicted a 4.6 dB decrease in threshold with asymptotic values of −8.6 dB in the monaural and the modulated noise conditions. Third, there was no average difference in thresholds obtained with the male and female contralateral maskers, although for some children the difference was substantial. For each of the 60 listeners, comparisons between the male and female masker conditions were made using the 95% confidence intervals for the threshold estimates. Two of the children showed more informational masking with the female masker while six children had more informational masking with the male masker. None of the adult thresholds were statistically different. The exponential function fits in the male and female masker conditions also produced nearly identical parameters.
The similarity of thresholds with both male and female contralateral maskers implies no release from informational masking with a masker talker of different gender than the target talker, a result that is inconsistent with the general findings when both masker and target are in the same ear (Brungart, 2001b; Wightman and Kistler, 2005). Fourth, there was an estimated 13 dB decrease in informational masking over the childhood years as indicated by the differences in the a parameters of the fitted functions between the male∕female conditions and the monaural control condition.
Figure 4 shows the differences between individual thresholds and the estimates derived from the exponential fits (residuals) in the three contralateral conditions. Individual differences were relatively small in the contralateral noise condition and were greater in the female and male masker conditions. Although the effect is not large, it is clear that individual differences are largest in the children between 7 and 12 years of age and smallest in the adults. This result is generally consistent with results from our previous studies of informational masking in children (Wightman et al., 2003; Wightman and Kistler, 2005; Wightman et al., 2006).
In speech informational masking experiments, analysis of the errors made by listeners can sometimes be revealing about the nature of the masking involved, or, stated differently, about the strategies used by listeners to segregate the target from the masker. For example, in the CRM experiments reported by Brungart and Simpson (2002a) and in our previous CRM experiments (Wightman and Kistler, 2005; Wightman et al., 2006), an analysis showed that many of the errors contained elements of the masker phrase(s). In other words, when the listener failed to name the correct color and number, either the color or the number or both that were named came from the masker. These intrusion errors suggest that listeners were processing both messages and were failing to identify or to report which was the correct one (in other words, a failure of the segregation strategy). However, because errors occurred mainly when the target∕masker ratio was negative, the fact that the responses included masker components might simply reflect the listeners’ choice of the louder of the two messages (choice of an inappropriate segregation strategy). Unfortunately, the data offer only limited leverage on this distinction. Although the overall pattern of errors in the current study is quite similar to that reported in previous studies, an analysis of these errors is not included here.
DISCUSSION
The results reported here are generally consistent with our previous reports on informational masking in children (Wightman et al., 2003; Wightman and Kistler, 2005; Wightman et al., 2006), and they reinforce our conclusion that informational masking is greater in children than in adults. The fact that this age effect lasts until the teenage years indirectly supports a connection between informational masking and the mechanisms and processes of attention. The process of focusing on a target message while attempting to ignore distracting messages is obviously a kind of selective attention. Several behavioral and electrophysiological studies suggest continued development of certain aspects of selective attention well into adolescence (Berman and Friedman, 1995; Coch et al., 2005; Doyle, 1973; Gomes et al., 2000; Ridderinkhof and van der Stelt, 2000; Zukier and Hagen, 1978). Also, given the relatively rapid average change in the impact of informational masking between the ages of six and 12 years (Fig. 3), it is perhaps not surprising that individual variability is highest in this age range (Fig. 4). This result is readily explained by individual differences in the development of attention skills (Doyle, 1973; Gomes et al., 2000) and is seen in all of our studies of informational masking in children (Wightman et al., 2003; Wightman and Kistler, 2005; Wightman et al., 2006).
Modern theories of attention (e.g., Pashler, 1998) argue that attention involves both bottom-up sensory processes and top-down cognitive processes. In the case of auditory attention, the bottom-up sensory processes might include those that are used to attend to one ear or the other, and the top-down processes might include those that mediate source segregation on the basis of linguistic or phonological parameters. In this context, the development of auditory attention can be viewed as involving differential development of both bottom-up and top-down processes.
The role of bottom-up attentional processes has been examined in studies of event-related brain potentials in adults (Woldorff et al., 1993). The results of these studies can be interpreted as reflecting the adult ability to attend selectively to one or the other ear in a dichotic listening paradigm. This ability is also called “early selection” (Broadbent, 1958) and is an important feature of some modern theories of attention (Ridderinkhof and van der Stelt, 2000). Early selection is usually thought to be mediated by physical properties of the stimulus, and, in the case of dichotic auditory stimuli, those properties would include the ear that is stimulated. It is possible that this early selection process is less well developed in children. However, the fact that our results reveal no additional masking with a noise stimulus in the non-target ear does suggest that children can differentiate between the two ears, depending on the stimulus properties, and thus that some components of their “early selection” mechanisms are intact.
Indirect evidence for the development of the top-down processes comes from studies of the role of the prefrontal cortex. It has been suggested that attending to a specific stimulus and filtering out competing stimuli involves prefrontal mediation (for a review see Miller and Cohen, 2001). That the prefrontal cortex may not be fully developed until late adolescence (Casey et al., 2005) might explain our findings that increased informational masking persists until at least 12 years of age.
Unlike most previous experiments on informational masking of speech, the only source of informational masking in this experiment was presented to the ear contralateral to the target. One previous study of speech informational masking (Brungart and Simpson, 2002b) included a condition nearly identical to the male contralateral masker condition of our experiment. In that study the male target talker was masked by speech-spectrum noise, and the target threshold was at a T∕M ratio of about −7 dB, very close to the average value for adults (−8.7 dB) obtained in the current study (Fig. 3). When a speech masker (male talker) was presented to the contralateral ear, Brungart and Simpson reported a threshold T∕M of about −4 dB, which was also very close to our own result of −4.5 dB (Fig. 3).
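The comparison above can be made concrete with a small calculation. The sketch below treats informational masking as the shift in threshold T∕M ratio when the contralateral speech masker is added, which is how the abstract defines it. The function name is hypothetical, and the input values are only the approximate adult thresholds quoted in the preceding paragraph, not a reanalysis of either data set.

```python
# Illustrative sketch: the function name is hypothetical and the inputs are
# the approximate adult thresholds quoted in the text (dB T/M ratio).

def informational_masking_db(baseline_db: float, with_masker_db: float) -> float:
    """Return the threshold shift (dB) produced by adding the contralateral masker."""
    return with_masker_db - baseline_db

# Current study, adults: noise-only baseline vs. male contralateral masker.
current_study = informational_masking_db(-8.7, -4.5)

# Brungart and Simpson (2002b): approximate values ("about -7" and "about -4").
brungart_simpson = informational_masking_db(-7.0, -4.0)

print(f"Current study (adults): {current_study:.1f} dB")
print(f"Brungart and Simpson (2002b): {brungart_simpson:.1f} dB")
```

Under these inputs the two studies give threshold shifts of roughly 4 dB and 3 dB, respectively, which agrees with the adult value of about 4 dB given in the abstract.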
The most notable differences between the results of this study and others are related to the fact that decreasing the similarity of the masker to the target (i.e., by changing from a male masker to a female masker) led to very little release from informational masking. For most children and adults, there was no difference between thresholds in the two conditions. In most informational masking experiments, with target and masker presented to the same ear, decreasing the similarity of the target and the masker(s) leads to substantial improvements in performance. Our own previous experiment with an ipsilateral female masker (Wightman and Kistler, 2005) produced about an 8 dB improvement for adult listeners, although the non-monotonicity of the psychometric functions complicated the assessment of the exact amount of masking release. At a T∕M ratio of 0 dB, the adult listeners in the previous study improved from 70% correct with a male masker to about 95% with a female masker. This is comparable to the improvement of 25 percentage points, from 60% correct to 85%, reported by Brungart (2001b) at the same T∕M ratio.
The lack of masking release with a female masker suggests that masking with a contralateral speech masker involves somewhat different attentional∕masking processes than masking with an ipsilateral masker. Thus, since ipsilateral and contralateral maskers may be processed differently, the data argue against any model of informational masking in which information from the two ears is simply summed (Brungart and Simpson, 2007). The data are also inconsistent with Treisman’s filter theory of attention (Treisman, 1964), which suggests that the signal in the unattended ear is attenuated and then combined with the signal in the target ear before further processing. Such a model would predict no difference in the patterns of intrusions in the ipsilateral and contralateral masker conditions. However, the data from the current experiment provide support for some aspects of Brungart and Simpson’s “integrated strategy model” (Brungart and Simpson, 2007). This model suggests that, in some conditions, the strategy that listeners use to segregate the target from the ipsilateral masker is also used to process the contralateral masker. In our case the ipsilateral masker was noise so the listener would be expected to adopt some kind of “energetic” strategy, given that masking was entirely energetic. If the same strategy were applied to the contralateral speech masker, it seems reasonable to expect no difference in masking effectiveness between same-sex and opposite-sex maskers.
The current measures of individual differences are different from what has been previously described in studies of speech informational masking, although there are few data with which they can be easily compared. In general, individual differences obtained here are smaller than those previously reported in studies involving maskers and targets in the same ear (especially in adults), and the age at which the individual differences are greatest is younger (Wightman and Kistler, 2005; Wightman et al., 2006). This suggests that the attentional processes tapped in the current and previous experiments may involve different mechanisms∕strategies that develop at different rates.
The hypothesis that different attentional strategies or mechanisms may be involved in the ipsilateral and contralateral masker conditions is supported by the results reported by Leech et al. (2007). In that study, listeners were asked to identify a key word in sentences of varying semantic complexity. Continuous, unrelated distracting speech was presented either to the same ear as the target sentence or to the opposite ear at a T∕M ratio of 0 dB. The results from 61 adults and 348 children (aged 5–18 years), in the form of percent correct as a function of inverse-transformed age, were well fit by linear functions. The slopes of these functions were significantly different in the ipsilateral and the contralateral ear conditions and suggested that performance in the ipsilateral condition reached adult levels later than performance in the contralateral condition. Figure 5 shows a comparable analysis of the results from the current study (contralateral male condition, percent correct at T∕M ratio of 0 dB) and the results from 90 children and adults (ages 5–62) in the ipsilateral (male) masker condition (percent correct at T∕M ratio of 0 dB) from two of our previous studies (Wightman and Kistler, 2005; Wightman et al., 2006) and from unpublished data collected in our laboratory. For individuals who participated in more than one study, only data from the first study were included. Because the data represented percent correct scores, logistic functions were used to fit smooth curves to the data. The general forms of the current functions relating performance to age are very similar to those from the Leech et al. (2007) study, in spite of large differences in procedure and stimuli. The fact that the 95% confidence intervals of the function slopes for the ipsilateral data ([0.047, 0.061]) and for the contralateral data ([0.130, 0.178]) do not overlap indicates a significantly slower rate of change for the ipsilateral masker.
This suggests a longer developmental course for performance in the ipsilateral masker condition than in the contralateral masker condition. Combined with the observation that, with a contralateral masker talker, gender does not appear to be a salient segregation cue, the results of the current study indicate that target-masker segregation is mediated by different processes when the masker is in the same ear as the target than when it is in the opposite ear. This result could have important implications for interpretation of the results of experiments on the release from informational masking produced by spatial separation of target and masker (Arbogast et al., 2002; Brungart and Simpson, 2002a; Freyman et al., 1999; Litovsky, 2005; Noble and Perrett, 2002).
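The slope comparison in the preceding paragraphs can be sketched as a simple interval check. The code below is a minimal illustration, not the fitting procedure used for Figure 5. The two-parameter logistic form, the `midpoint_years` parameter, and both function names are assumptions, because the text does not give the parametrization or the age scale of the fitted functions. Only the two 95% confidence intervals are taken from the text.

```python
import math

def logistic(age_years: float, slope: float, midpoint_years: float) -> float:
    """Assumed two-parameter logistic: proportion correct as a function of age."""
    return 1.0 / (1.0 + math.exp(-slope * (age_years - midpoint_years)))

def intervals_overlap(a: tuple, b: tuple) -> bool:
    """Return True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# 95% confidence intervals for the fitted slopes, as reported in the text.
ipsi_slope_ci = (0.047, 0.061)
contra_slope_ci = (0.130, 0.178)

# Non-overlap is the criterion the text uses for a reliable slope difference.
slopes_differ = not intervals_overlap(ipsi_slope_ci, contra_slope_ci)
print("Slope intervals overlap:", not slopes_differ)

# A shallower slope means performance changes more slowly with age.
ipsilateral_slower = ipsi_slope_ci[1] < contra_slope_ci[0]
print("Ipsilateral slope is reliably shallower:", ipsilateral_slower)
```

Checking for non-overlap of two 95% intervals is a conservative criterion: two estimates can differ significantly even when their intervals overlap slightly, so non-overlap is sufficient but not necessary for a reliable difference.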
CONCLUSIONS
The experiment reported here measured informational masking in a speech task in a sample of 60 children and adults using a paradigm that allowed isolation of energetic and informational masking by presenting speech targets and maskers to opposite ears. Consistent with previous results, informational masking was larger in children than in adults. A novel finding was that the informational masking produced by male and female masker talkers was about the same. This finding stands in sharp contrast with the results of ipsilateral speech masking studies, which report a large release from informational masking for female maskers and male targets. The pattern of individual differences also differed from that in previous ipsilateral speech informational masking studies in that the largest individual differences were observed in the younger children and not in the early teen-aged children. These two results were interpreted to suggest that the informational masking produced by ipsilateral and contralateral maskers may be mediated by different mechanisms or processes.
ACKNOWLEDGMENTS
The authors gratefully acknowledge the assistance of Ms. Laricia Longworth-Reed in many facets of the research reported here. Ann M. Rothpletz provided helpful comments on several versions of the manuscript. Finally, great appreciation is expressed to the parents and children who participated in the research. Financial support was provided by a grant (FLW, PI) from the NIH (NICHD), Grant No. R01-HD023333.
Portions of this work were presented as a poster at the 30th Midwinter Meeting of the Association for Research in Otolaryngology, Denver, CO, February 2007.
References
- Arbogast, T. L., Mason, C. R., and Kidd, G., Jr. (2002). “The effect of spatial separation on informational and energetic masking of speech,” J. Acoust. Soc. Am. 112, 2086–2098. doi:10.1121/1.1510141
- Balakrishnan, U., and Freyman, R. L. (2008). “Speech detection in spatial and nonspatial speech maskers,” J. Acoust. Soc. Am. 123, 2680–2691. doi:10.1121/1.2902176
- Berman, S., and Friedman, D. (1995). “The development of selective attention as reflected by event-related brain potentials,” J. Exp. Child Psychol. 59, 1–31. doi:10.1006/jecp.1995.1001
- Bolia, R. S., Nelson, W. T., Ericson, M. A., and Simpson, B. D. (2000). “A speech corpus for multitalker communications research,” J. Acoust. Soc. Am. 107, 1065–1066. doi:10.1121/1.428288
- Broadbent, D. E. (1958). Perception and Communication (Pergamon, New York). doi:10.1037/10037-000
- Brungart, D. S. (2001a). “Evaluation of speech intelligibility with the coordinate response measure,” J. Acoust. Soc. Am. 109, 2276–2279. doi:10.1121/1.1357812
- Brungart, D. S. (2001b). “Informational and energetic masking effects in the perception of two simultaneous talkers,” J. Acoust. Soc. Am. 109, 1101–1109. doi:10.1121/1.1345696
- Brungart, D. S., Chang, P. S., Simpson, B. D., and Wang, D. (2006). “Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation,” J. Acoust. Soc. Am. 120, 4007–4018. doi:10.1121/1.2363929
- Brungart, D. S., and Simpson, B. D. (2002a). “The effects of spatial separation in distance on the informational and energetic masking of a nearby speech signal,” J. Acoust. Soc. Am. 112, 664–676. doi:10.1121/1.1490592
- Brungart, D. S., and Simpson, B. D. (2002b). “Within-ear and across-ear interference in a cocktail-party listening task,” J. Acoust. Soc. Am. 112, 2985–2995. doi:10.1121/1.1512703
- Brungart, D. S., and Simpson, B. D. (2004). “Within-ear and across-ear interference in a dichotic cocktail party listening task: Effects of masker uncertainty,” J. Acoust. Soc. Am. 115, 301–310. doi:10.1121/1.1628683
- Brungart, D. S., and Simpson, B. D. (2005). “Interference from audio distracters during speechreading,” J. Acoust. Soc. Am. 118, 3889–3902. doi:10.1121/1.2126932
- Brungart, D. S., and Simpson, B. D. (2007). “Effect of target-masker similarity on across-ear interference in a dichotic cocktail-party listening task,” J. Acoust. Soc. Am. 122, 1724–1734. doi:10.1121/1.2756797
- Brungart, D. S., Simpson, B. D., Darwin, C. J., Arbogast, T. L., and Kidd, G. J. (2005). “Across-ear interference from parametrically degraded synthetic speech signals in a dichotic cocktail-party listening task,” J. Acoust. Soc. Am. 117, 292–304. doi:10.1121/1.1835509
- Brungart, D. S., Simpson, B. D., Ericson, M. A., and Scott, K. R. (2001). “Informational and energetic masking effects in the perception of multiple simultaneous talkers,” J. Acoust. Soc. Am. 110, 2527–2538. doi:10.1121/1.1408946
- Casey, B. J., Tottenham, N., Liston, C., and Durston, S. (2005). “Imaging the developing brain: What have we learned about cognitive development?,” Trends Cogn. Sci. 9, 104–110. doi:10.1016/j.tics.2005.01.011
- Coch, D., Sanders, L. D., and Neville, H. J. (2005). “An event-related potential study of selective auditory attention in children and adults,” J. Cogn. Neurosci. 17, 605–622. doi:10.1162/0898929053467631
- Doyle, A. B. (1973). “Listening to distraction: A developmental study of selective attention,” J. Exp. Child Psychol. 15, 100–115. doi:10.1016/0022-0965(73)90134-3
- Elliott, L. L. (1979). “Performance of children aged 9 to 17 years on a test of speech intelligibility in noise using sentence material with controlled word predictability,” J. Acoust. Soc. Am. 66, 651–653. doi:10.1121/1.383691
- Fallon, M., Trehub, S. E., and Schneider, B. A. (2000). “Children’s perception of speech in multitalker babble,” J. Acoust. Soc. Am. 108, 3023–3029. doi:10.1121/1.1323233
- Fallon, M., Trehub, S. E., and Schneider, B. A. (2002). “Children’s use of semantic cues in degraded listening environments,” J. Acoust. Soc. Am. 111, 2242–2249. doi:10.1121/1.1466873
- Freyman, R. L., Balakrishnan, U., and Helfer, K. S. (2001). “Spatial release from informational masking in speech recognition,” J. Acoust. Soc. Am. 109, 2112–2122. doi:10.1121/1.1354984
- Freyman, R. L., Helfer, K. S., and Balakrishnan, U. (2007). “Variability and uncertainty in masking by competing speech,” J. Acoust. Soc. Am. 121, 1040–1046. doi:10.1121/1.2427117
- Freyman, R. L., Helfer, K. S., McCall, D. D., and Clifton, R. K. (1999). “The role of perceived spatial separation in the unmasking of speech,” J. Acoust. Soc. Am. 106, 3578–3588. doi:10.1121/1.428211
- Gomes, H., Molholm, S., Christodoulou, C., Ritter, W., and Cowan, N. (2000). “The development of auditory attention in children,” Front. Biosci. 5, d108–d120. doi:10.2741/Gomes
- Hall, J. W., III, Buss, E., and Grose, J. H. (2005). “Informational masking release in children and adults,” J. Acoust. Soc. Am. 118, 1605–1613. doi:10.1121/1.1992675
- Hall, J. W., III, Grose, J. H., Buss, E., and Dev, M. B. (2002). “Spondee recognition in a two-talker masker and a speech-shaped noise masker in adults and children,” Ear Hear. 23, 159–165. doi:10.1097/00003446-200204000-00008
- Hartley, D. E., Wright, B. A., Hogan, S. C., and Moore, D. R. (2000). “Age-related improvements in auditory backward and simultaneous masking in 6- to 10-year-old children,” J. Speech Lang. Hear. Res. 43, 1402–1415.
- Helfer, K. S., and Freyman, R. L. (2008). “Aging and speech-on-speech masking,” Ear Hear. 29, 87–98.
- Humes, L. E., Lee, J. H., and Coughlin, M. P. (2006). “Auditory measures of selective and divided attention in young and older adults using single-talker competition,” J. Acoust. Soc. Am. 120, 2926–2937. doi:10.1121/1.2354070
- Kidd, G., Jr., Mason, C. R., Arbogast, T. L., Brungart, D. S., and Simpson, B. D. (2003). “Informational masking caused by contralateral stimulation,” J. Acoust. Soc. Am. 113, 1594–1603. doi:10.1121/1.1547440
- Kidd, G., Jr., Mason, C. R., and Gallun, F. J. (2005). “Combining energetic and informational masking for speech identification,” J. Acoust. Soc. Am. 118, 982–992. doi:10.1121/1.1953167
- Leech, R., Aydelott, J., Symons, G., Carnevale, J., and Dick, F. (2007). “The development of sentence interpretation: Effects of perceptual, attentional and semantic interference,” Dev. Sci. 10, 794–813. doi:10.1111/j.1467-7687.2007.00628.x
- Litovsky, R. Y. (2005). “Speech intelligibility and spatial release from masking in young children,” J. Acoust. Soc. Am. 117, 3091–3099. doi:10.1121/1.1873913
- Lutfi, R. A. (1990). “How much masking is informational masking?,” J. Acoust. Soc. Am. 88, 2607–2610. doi:10.1121/1.399980
- Lutfi, R. A., Kistler, D. J., Oh, E. L., Wightman, F. L., and Callahan, M. R. (2003). “One factor underlies individual differences in auditory informational masking within and across age groups,” Percept. Psychophys. 65, 396–406.
- Miller, E. K., and Cohen, J. D. (2001). “An integrative theory of prefrontal cortex function,” Annu. Rev. Neurosci. 24, 167. doi:10.1146/annurev.neuro.24.1.167
- Moore, B. C., and Glasberg, B. R. (1987). “Formulae describing frequency selectivity as a function of frequency and level, and their use in calculating excitation patterns,” Hear. Res. 28, 209–225. doi:10.1016/0378-5955(87)90050-5
- Neff, D. L. (1995). “Signal properties that reduce masking by simultaneous, random-frequency maskers,” J. Acoust. Soc. Am. 98, 1909–1920. doi:10.1121/1.414458
- Neff, D. L., and Dethlefs, T. M. (1995). “Individual differences in simultaneous masking with random-frequency, multicomponent maskers,” J. Acoust. Soc. Am. 98, 125–134. doi:10.1121/1.413748
- Neff, D. L., and Green, D. M. (1987). “Masking produced by spectral uncertainty with multicomponent maskers,” Percept. Psychophys. 41, 409–415.
- Noble, W., and Perrett, S. (2002). “Hearing speech against spatially separate competing speech versus competing noise,” Percept. Psychophys. 64, 1325–1336.
- Oh, E. L., and Lutfi, R. A. (1998). “Nonmonotonicity of informational masking,” J. Acoust. Soc. Am. 104, 3489–3499. doi:10.1121/1.423932
- Oh, E. L., Wightman, F., and Lutfi, R. A. (2001). “Children’s detection of pure-tone signals with random multitone maskers,” J. Acoust. Soc. Am. 109, 2888–2895. doi:10.1121/1.1371764
- Pashler, H. E. (1998). The Psychology of Attention (MIT, Cambridge, MA).
- Pollack, I. (1975). “Auditory informational masking,” J. Acoust. Soc. Am. 57, S5. doi:10.1121/1.1995329
- Ridderinkhof, K. R., and van der Stelt, O. (2000). “Attention and selection in the growing child: Views derived from developmental psychophysiology,” Biol. Psychol. 54, 55–106. doi:10.1016/S0301-0511(00)00053-3
- Schneider, B. A., Trehub, S. E., Morrongiello, B. A., and Thorpe, L. A. (1989). “Developmental changes in masked thresholds,” J. Acoust. Soc. Am. 86, 1733–1742. doi:10.1121/1.398604
- Stuart, A. (2008). “Reception thresholds for sentences in quiet, continuous noise, and interrupted noise in school-age children,” J. Am. Acad. Audiol. 19, 135–146. doi:10.3766/jaaa.19.2.4
- Treisman, A. M. (1964). “Effect of irrelevant material on the efficiency of selective listening,” Am. J. Psychol. 77, 533–546. doi:10.2307/1420765
- Wichmann, F. A., and Hill, N. J. (2001a). “The psychometric function: I. Fitting, sampling, and goodness of fit,” Percept. Psychophys. 63, 1293–1313.
- Wichmann, F. A., and Hill, N. J. (2001b). “The psychometric function: II. Bootstrap-based confidence intervals and sampling,” Percept. Psychophys. 63, 1314–1329.
- Wightman, F. L., and Allen, P. (1992). “Individual differences in auditory capability among preschool children,” in Developmental Psychoacoustics, edited by L. A. Werner and E. W. Rubel (American Psychological Association, Washington, DC), pp. 113–133. doi:10.1037/10119-004
- Wightman, F. L., Callahan, M. R., Lutfi, R. A., Kistler, D. J., and Oh, E. (2003). “Children’s detection of pure-tone signals: Informational masking with contralateral maskers,” J. Acoust. Soc. Am. 113, 3297–3305. doi:10.1121/1.1570443
- Wightman, F. L., and Kistler, D. J. (2005). “Informational masking of speech in children: Effects of ipsilateral and contralateral distracters,” J. Acoust. Soc. Am. 118, 3164–3176. doi:10.1121/1.2082567
- Wightman, F. L., Kistler, D. J., and Brungart, D. S. (2006). “Informational masking of speech in children: Auditory-visual integration,” J. Acoust. Soc. Am. 119, 3940–3949. doi:10.1121/1.2195121
- Woldorff, M. G., Gallen, C. C., Hampson, S. A., Hillyard, S. A., Pantev, C., Sobel, D., and Bloom, F. E. (1993). “Modulation of early sensory processing in human auditory cortex during auditory selective attention,” Proc. Natl. Acad. Sci. U.S.A. 90, 8722–8726. doi:10.1073/pnas.90.18.8722
- Zukier, H., and Hagen, J. W. (1978). “The development of selective attention under distracting conditions,” Child Dev. 49, 870–873. doi:10.2307/1128259