Abstract
Previous studies in budgerigars (Melopsittacus undulatus) have indicated that they experience attention capture in a qualitatively similar way to humans. Here, we apply a similar objective auditory streaming paradigm, using modified budgerigar vocalizations instead of ABAB-… patterned pure tones in the sound sequences. The birds were trained to respond to deviants in the target stream while ignoring the distractors in the background stream. The background distractor could vary among five different categories and two different sequential positions, while the target deviants could randomly appear at five different sequential positions and vary among two different categories. We found that unpredictable background distractors could deteriorate birds’ sensitivity to the target deviants. Compared to conditions where the background distractor appeared right before the target deviant, the attention capture effect decayed in conditions where the background distractor appeared earlier. In contrast to results from the same paradigm using pure tones, the results here are evidence for a faster recovery from attention capture using modified vocalization segments. We found that the temporally modulated background distractor captured birds’ attention more and deteriorated birds’ performance more than other categories of background distractor, as the temporally modulated target deviant enabled the birds to focus their attention toward the temporal modulation dimension. However, different from humans, birds have lower tolerances for suppressing the distractors from the same feature dimensions as the targets, which is evidenced by higher false alarm rates for the temporally modulated distractor than other distractors from different feature dimensions.
Keywords: Feature search, Vocalization, Budgerigars, Psychoacoustics
INTRODUCTION
Humans and other animal species that rely on acoustic signals in social communication might be confronted by challenging acoustic surroundings, where multiple sounds overlapping in time and frequency could arrive at the auditory receptors simultaneously. Some of these might include sounds that are unrelated to or could involuntarily disrupt the understanding of the sound target. Therefore, it is essential for the auditory system to evolve a mechanism to handle unexpected distractors or irrelevant sounds in everyday sound perception, which involves top-down directed and bottom-up derived attentional processes (Salmi et al. 2009; Huang and Elhilali 2020). Depending on listeners’ expectations and task demands, top-down attention could facilitate the encoding of sound targets while simultaneously inhibiting the encoding of irrelevant sounds (Shamma et al. 2011; Lakatos et al. 2013). Bottom-up attention is an involuntary process which maintains a system’s vigilance to potentially important changes in the environment (Corbetta and Shulman 2002; Kincade et al. 2005). fMRI studies suggest that the top-down and bottom-up attentional processes tend to activate mostly overlapping cortical networks (Salmi et al. 2009; Alho et al. 2015).
Attention capture is a bottom-up attentional process which causes an automatic attentional switch to a distractor. It has been demonstrated that the violation of cognitive expectedness, and not the novelty of a distractor, is the main driver of attention capture (Parmentier et al. 2011). Also, the attention capture process is not completely involuntary; its effects on target perception can be influenced by top-down processes (Luck et al. 2021). For example, specific foreknowledge or predictability of an imminent distractor could eliminate its interference on auditory perception (Sussman et al. 2003; Röer et al. 2015). Higher cognitive load in the task could reduce the event-related brain response that indicates the attention capture process (Berti and Schröger 2003; Bidet-Caulet et al. 2015). In addition, task irrelevant feature dimensions could modulate the attention capture effect. Task irrelevant features occurring within the distractor would disrupt auditory perception, but would facilitate perception if they occur within the target (Dalton and Lavie 2004). Distractors that cue a task irrelevant feature dimension in a following target could alleviate the attention capture effect brought by the distractor (Sussman et al. 2003). Finally, temporal relationships between the target and distractor could affect the attention capture process. The attention capture effect disappeared as the onset asynchrony between distractor and target grew longer (Schröger 1996; Gaeta et al. 2001). Visual or auditory distractors could lead to a short-lived facilitation effect when they appear right before or at the same time as the visual target (Folk et al. 1992; Folk and Remington 1998; Theeuwes 2005; SanMiguel et al. 2010). In short, the task settings could dramatically affect the attention capture effect in perception.
The process of auditory systems grouping dynamic sound features of sound mixtures into meaningful auditory objects is called auditory streaming (Bregman 1994). In both humans and other animal species, attention has been ubiquitously demonstrated to affect the target segregation process in auditory streaming (Carlyon et al. 2001; Snyder et al. 2006; Sussman et al. 2007; Cai and Dent 2020). In humans, the auditory streaming process can be reset by an abrupt change in the stimulus or brief switch of attention (Carlyon et al. 2001; Cusack et al. 2004; Thompson et al. 2011). Birds experience attention capture in a similar way as humans, where unexpected pure tone distractors in the background stream deteriorate the build-up process in the target stream, while predictable pure tone distractors mitigate the interference caused by attention capture (Cai and Dent 2020).
The neural infrastructure of the ascending auditory pathway has been evidenced to be optimized to process conspecific sounds (Fecteau et al. 2004; Schnupp et al. 2006; Theunissen and Shaevitz 2006; Recanzone 2008). Animal models have been used to unveil the effect that attention has on the neural representations of sounds, where pure-tone synthesized stimuli were exclusively used (Fritz et al. 2007a, b, 2010; Yin et al. 2014; Slee and David 2015). However, while streaming studies are common in animals such as birds (Dent and Bee 2018; Cai et al. 2018), few studies have investigated the attention capture effect with complex sounds. Moreover, attention capture has mostly been studied in vision in humans; fewer experiments have addressed how different categories of auditory distractors and temporal relationships between the target and distractor affect the attention capture process in animals. The present study investigates these two issues in behaving birds by using an objective auditory streaming paradigm with modified conspecific vocalizations.
METHODS
Experimental Animals
Seven adult budgerigars (five males and two females) participated in the experiment. The birds were either purchased from local pet suppliers or bred in the vivarium. The birds were individually housed and had free access to water. The vivarium was kept on a 12-h day/night cycle at the University at Buffalo, SUNY. Birds were maintained at 90–95% of their free-feeding body weights for the duration of the experiment. The birds were tested in two daily sessions, with each session lasting 45–60 min, 5–7 days a week.
Ethical Note
All procedures were approved by the University at Buffalo, SUNY’s Institutional Animal Care and Use Committee (IACUC) and were in accordance with the Guide for Care and Use of Laboratory Animals.
Stimuli
The sound stimulus consisted of a target stream and a background stream, both of which included eight repetitions of modulated contact call segments recorded from the same species by another laboratory (Tu et al. 2011). The target stream always preceded the background stream by 100 ms as a cue for the birds’ attention. The background stream was tailored to end simultaneously with the target stream. The standard segments in the target and background streams were generated by band passing the same contact call between 3100–4800 Hz and 100–1800 Hz, respectively (Figs. 1 and 2), which formed a 9.5-semitone frequency gap between the target and background streams. Previous studies have indicated that birds can generate a segregated perception when frequency separation between the target and background stream is larger than 6–8 semitones (Bregman 1978; Itatani and Klump 2017; Cai et al. 2018). In the target stream, the target deviant was either a different contact call (randomly selected from 402 segments recorded from the same bird) band passed between 3100 and 4800 Hz like the target standard segment, or a version of the target standard segment with the temporal modulation above 5 Hz or 100 Hz removed as described in Elliott and Theunissen (2009), shown in Fig. 1. The background distractor was one of the following five sound segments: (1) TM distractor, removing temporal modulation above 5 Hz from the background standard segment; (2) FM distractor, removing spectral modulation above 0.5 cycle/kHz from the background standard segment; (3) “Alarm” distractor, band passing an alarm call between 100 and 1800 Hz; (4) “LongHarm” distractor, band passing a long harmonic call between 100 and 1800 Hz; or (5) “WhiteNoise” distractor, band passing a white noise burst between 100 and 1800 Hz, as shown in Fig. 2.
All background segments, the target standard, and the temporally modulated target deviant were 158 ms, while the contact call target deviant could vary between 140 and 160 ms. The silent intervals between adjacent elements in the target or background stream were 50 ms. All segments were normalized to have the same RMS (root mean square) and were delivered at the same intensity (78 dB SPL).
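The timing described above can be made concrete with a short calculation. The following is a minimal sketch, not the authors' stimulus code: it assumes every segment before the one of interest lasts 158 ms, that background onsets are simply the target onsets shifted by 100 ms, and it ignores the tailoring that made the two streams end together. The function names are hypothetical. It also computes the edge-to-edge frequency gap between the top of the background band (1800 Hz) and the bottom of the target band (3100 Hz).

```python
import math

SEGMENT_MS = 158         # duration of standard segments (from the text)
GAP_MS = 50              # silent interval between adjacent segments
BACKGROUND_LAG_MS = 100  # target stream leads the background stream


def onset_ms(position, lag_ms=0):
    """Onset of the segment at a 1-based position.

    Assumption: all earlier segments last SEGMENT_MS and are separated
    by GAP_MS of silence.
    """
    return lag_ms + (position - 1) * (SEGMENT_MS + GAP_MS)


def semitones(f_low_hz, f_high_hz):
    """Frequency separation between two frequencies in semitones."""
    return 12 * math.log2(f_high_hz / f_low_hz)


target_7 = onset_ms(7)                  # target deviant at the 7th position
bg_3 = onset_ms(3, BACKGROUND_LAG_MS)   # distractor at the 3rd position
bg_6 = onset_ms(6, BACKGROUND_LAG_MS)   # distractor at the 6th position

lead_early = target_7 - bg_3  # onset-to-onset lead of the early distractor
lead_late = target_7 - bg_6   # onset-to-onset lead of the late distractor
gap = semitones(1800, 3100)   # gap between the two pass bands

print(lead_early, lead_late, round(gap, 1))
```

Under these assumptions the early distractor leads a 7th-position target by roughly 700 ms, while the late distractor leads it by roughly 100 ms. The edge-to-edge gap comes out slightly below the 9.5 semitones reported in the text, which may reflect filter characteristics not modelled here.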
Fig. 1.
Spectrograms of the target standard segment, the two temporally modulated (TM) target deviants with cut-off frequencies at 5 Hz and 100 Hz, and three samples of contact call deviants. All segments were limited between 3100 and 4800 Hz in frequency
Fig. 2.
Spectrograms of the background standard segments and the 5 different distractors. The cut-off frequencies of the temporally (TM) and spectrally (FM) modulated distractors were 5 Hz and 0.5 cycle/kHz, respectively. All segments were limited between 100 and 1800 Hz in frequency
The modulated deviants were generated by removing the high rates of spectral or temporal modulations from the corresponding standard call segments. The logarithm spectrogram of the target or background standard segments was obtained with Gaussian windows, as used in Singh and Theunissen (2003). Then, the two-dimensional fast Fourier transform (2D FFT) was calculated on the logarithm spectrogram. Spectral and temporal modulations were disentangled into separate terms in the transformed logarithm spectrogram, where the amplitudes of temporal or spectral modulations above the cut-off rates were set to zero. Finally, the waveforms of modulated deviants were formed by inverting the modulated spectrograms (Elliott and Theunissen 2009).
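The modulation-domain filtering step can be sketched as follows. This is an illustrative sketch only, assuming a log spectrogram is already available as a 2-D array (rows are frequency channels, columns are time frames) with a known frame rate; the function name and the synthetic input are hypothetical, and the final inversion from filtered spectrogram back to a waveform is not shown. Spectral modulations could be filtered analogously along the frequency axis, with rates expressed in cycles/kHz.

```python
import numpy as np


def lowpass_temporal_modulation(log_spec, frame_rate_hz, cutoff_hz):
    """Zero temporal modulations above cutoff_hz in a log spectrogram.

    log_spec: 2-D array, rows = frequency channels, columns = time frames.
    frame_rate_hz: spectrogram frames per second (assumed known).
    """
    mod = np.fft.fft2(log_spec)
    # Temporal modulation rate (Hz) associated with each column of the 2D FFT.
    temporal_rates = np.fft.fftfreq(log_spec.shape[1], d=1.0 / frame_rate_hz)
    # The mask is symmetric in positive and negative rates, so the inverse
    # transform stays real up to numerical error.
    mod[:, np.abs(temporal_rates) > cutoff_hz] = 0
    return np.real(np.fft.ifft2(mod))


# Demo: a synthetic "log spectrogram" with 2 Hz and 50 Hz temporal modulations.
frame_rate_hz = 1000.0
t = np.arange(1000) / frame_rate_hz
slow = np.sin(2 * np.pi * 2 * t)
fast = np.sin(2 * np.pi * 50 * t)
log_spec = np.tile(slow + fast, (4, 1))

filtered = lowpass_temporal_modulation(log_spec, frame_rate_hz, cutoff_hz=5.0)
```

In the demo, a 5 Hz cut-off keeps the slow 2 Hz modulation and removes the 50 Hz modulation, which mirrors the logic of the 5 Hz temporally modulated deviant.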
In each session, 70% of all trials were testing trials with a target deviant, as shown in Fig. 3. Among these, 60–70% were with band passed contact call deviants (Fig. 1), which could randomly appear at one of the five deviant positions (3rd to 7th) in the target stream at equal probability; the remaining were with temporally modulated contact call deviants, which only appeared at the 7th deviant position in the target stream. The background distractor could appear at either the 3rd or the 6th positions in the background stream in about 85% of the testing trials; the remaining testing trials had no background distractor. In each session, the remaining 30% of trials were sham trials, which had no target deviants in the target streams but always had background distractors in the background streams. By using the aforementioned stimuli combination, the birds were trained to ignore background distractors and only respond when they discriminated a target deviant.
Fig. 3.
Schematic spectrogram of a testing stimulus. Ts and BGs represent the target and background standard segments, respectively. Td represents the target deviant, which could be a band passed contact call appearing at one of the 5 deviant positions (3rd to 7th, black and white boxes), or a modulated Ts, which only appeared at the 7th position (white box). BGd represents the background distractor (one of the distractors in Fig. 2) which could randomly appear at the 3rd or 6th position (pink boxes). The target stream preceded the background stream by 100 ms, and the silent interval between the two adjacent segments was 50 ms
Behavioral Apparatus and Procedure
All birds were trained in an objective auditory streaming paradigm using an operant conditioning procedure. The experimental apparatus was described in detail in Cai et al. (2018). The birds were trained to peck the left key to initiate a sound sequence and discriminate a deviant appearing in the target stream by pecking the right key within 800 ms of the deviant onset. Responding to the deviant within 800 ms was counted as a “hit” trial and the birds were reinforced by 1.2 s access to millet. Responding before the target deviant onset would terminate the sound immediately and was followed by a 6-s blackout of the house light, during which the birds could not initiate a trial, no data were recorded, and the trial was repeated. Responses after 800 ms were counted as a “miss,” and no reinforcements or punishments were given after miss trials. For sham trials, no deviant appeared in the target stream. The response window onsets started at the same time as the sham stimuli onset, and offsets were extended to match the latest possible response window offset of testing trials. This was designed to prevent the birds from responding right after the sound offset to obtain some chance of reward while entirely avoiding blackouts. Hence, the birds needed to withhold their response throughout the presentation of the sham stimulus. Responding during the sham trial was counted as a “false alarm,” which terminated the sound stimulus immediately, and the birds were punished by a 6-s blackout of the house light. No response during a sham trial was counted as “correct rejection,” and the birds were reinforced by millet at a probability of 50% for correctly rejected sham trials.
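The response contingencies above can be summarized as a small decision function. This is a sketch under stated assumptions, not the authors' control software: the function name and arguments are hypothetical, times are measured from stimulus onset, a response of `None` means no peck occurred within the trial's response window, and treating a response at exactly 800 ms as a hit is an assumption.

```python
from typing import Optional

RESPONSE_WINDOW_MS = 800  # from the text: respond within 800 ms of deviant onset


def classify_trial(is_sham: bool,
                   response_ms: Optional[float],
                   deviant_onset_ms: Optional[float] = None) -> str:
    """Classify one trial outcome.

    deviant_onset_ms must be provided for testing (non-sham) trials.
    """
    if is_sham:
        # Any response during a sham trial is a false alarm.
        return "correct rejection" if response_ms is None else "false alarm"
    if response_ms is None:
        return "miss"
    if response_ms < deviant_onset_ms:
        # Trial is aborted, followed by a blackout, and repeated; no data recorded.
        return "early response"
    if response_ms - deviant_onset_ms <= RESPONSE_WINDOW_MS:
        return "hit"
    return "miss"


print(classify_trial(False, 1500, 1248))  # hit
print(classify_trial(True, 600))          # false alarm
```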
Statistical Analysis
Each bird’s overall hit rate and false alarm rates were calculated for each session. Sessions with overall false alarm rates lower than 20% and overall hit rates higher than 70% were used for statistical analysis. For each combination of background distractor category and position, the last 60 sham trials, the last 20 trials of the two modulated target deviants at the 7th position, and the last 9 trials of the band passed contact call deviants appearing at the 4th and the 7th positions in the target stream were collected for statistical analysis. Birds’ sensitivity to each target deviant was calculated at different background distractor conditions (d′ = Z_HIT − Z_FA) separately, where Z_HIT and Z_FA are the Z scores of a target deviant hit rate and false alarm rate obtained at the same background distractor condition. Four two-way rmANOVAs (2 background distractor positions × 5 background distractor categories) were conducted on the false alarm rates and birds’ sensitivities to the three target deviants separately. Another two-way rmANOVA (5 background distractor categories × 2 target deviant positions) was conducted on the sensitivities to the contact call deviants appearing at the 4th and the 7th positions with background distractors appearing right before them. For all rmANOVAs, sphericity assumptions were met according to Mauchly’s test (p > 0.05). Normality assumptions were met according to a Shapiro–Wilk test for each combination of background distractor category and position (p > 0.05). All analyses were performed in the R statistical computing environment (R version 4.0.3).
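The sensitivity index and the session inclusion criterion can be illustrated with a few lines of Python. This is a minimal sketch rather than the authors' R analysis: the function names are hypothetical, and the text does not say how hit or false alarm rates of exactly 0 or 1 were handled, so no correction is applied here (such rates raise an error in `inv_cdf`).

```python
from statistics import NormalDist


def z_score(rate: float) -> float:
    """Inverse of the standard normal CDF; rate must lie strictly in (0, 1)."""
    return NormalDist().inv_cdf(rate)


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity index: d' = Z(hit rate) - Z(false alarm rate)."""
    return z_score(hit_rate) - z_score(fa_rate)


def session_included(hit_rate: float, fa_rate: float) -> bool:
    """Inclusion criterion from the text: FA rate below 20%, hit rate above 70%."""
    return fa_rate < 0.20 and hit_rate > 0.70


# Hypothetical rates, for illustration only.
print(round(d_prime(0.80, 0.10), 2))
print(session_included(0.80, 0.10))
```

Equal hit and false alarm rates give d′ = 0, and d′ grows as the two rates move apart.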
RESULTS
The present experiment followed up a previous study that tested the attention capture effect in birds performing an auditory streaming task (Cai and Dent 2020). Here, complex stimuli edited from budgerigar calls were used instead of simple pure tones. The experimental paradigm was designed to train birds to only respond to target deviants while holding back their responses to distractors appearing in the competing background stream. False alarm rates for each combination of background position and category were calculated separately. Birds’ sensitivities to temporally modulated or band passed contact call deviants were calculated for statistical analysis. Our results reveal changes in false alarm rates and differences in target detection sensitivity across distractor type and distractor location conditions, which suggests that many factors are involved in attention capture in birds.
False alarms differed across trials with different background distractor conditions (Fig. 4a). A two-way rmANOVA (2 background distractor positions × 5 background distractor categories) was conducted on the Z scores of the false alarm rates. We found a significant main effect of background distractor categories (F(4, 24) = 6.49, p = 0.001) and background distractor positions (F(1, 6) = 18.65, p = 0.005). The interaction between the background distractor position and background distractor category was not significant (F(4, 24) = 0.41, p = 0.8). Bonferroni post hoc tests indicated that the birds’ false alarm rates for the temporally modulated background distractor were significantly higher than for the frequency modulated call (p = 0.02), band passed alarm call (p = 0.002), and band passed white noise background distractors (p = 0.03). No other pairwise comparisons were significant (p > 0.05). False alarm rates were higher for trials where the background distractor was located in the third position than when it was in the sixth position. The higher false alarm rate to the temporally modulated background distractor is evidence that the birds had a more difficult time ignoring the temporally modulated distractor appearing in the background stream, even though all background distractors were delivered at the same sound intensity level.
Fig. 4.
Bean plot results for different background distractor conditions: false alarm Z scores (a), d′ sensitivities to the temporally modulated 5 Hz (b), 100 Hz (c), and band passed contact call target deviants (d) appearing at the 7th position in the target stream, as shown in Fig. 3. In each bean, the horizontal colored bar represents the mean across subjects, circles represent data for individual birds, and the bold vertical bar represents the SD of the results for that condition. The contour of each bean is a kernel density estimate of the data in that condition. FM, TM, A, LH, and WN represent frequency modulated distractor, temporally modulated distractor, band passed alarm calls, band passed long harmonic calls, and band passed white noise background distractors, as displayed in Fig. 2. The numbers in the horizontal tick labels represent the position of the distractor in the background stream
Sensitivity also differed across targets with different background distractor types and locations when the target deviants had a temporal modulation cut-off of 5 Hz (Fig. 4b). A two-way rmANOVA (2 background distractor positions × 5 background distractor categories) was conducted on the calculated sensitivities for this condition. We found a significant main effect of background distractor categories (F(4, 24) = 2.94, p = 0.04) and background distractor positions (F(1, 6) = 41.51, p = 0.0007). The interaction between the background distractor position and background distractor category was not significant (F(4, 24) = 0.33, p = 0.86). Bonferroni post hoc tests showed no significant pairwise comparisons between different background distractor categories. When the background distractor appeared at the 6th position in the background stream, which is right before the target deviant, it elicited more attention capture, leading to lower target deviant sensitivity.
Sensitivity was much lower, and did not differ across conditions, when the target deviant had a temporal modulation cut-off frequency of 100 Hz (Fig. 4c). For this condition, a two-way rmANOVA (2 background distractor positions × 5 background distractor categories) was conducted on the calculated sensitivities. The main effects of background distractor categories (F(4, 24) = 0.26, p = 0.90) and background distractor positions (F(1, 6) = 0.95, p = 0.37) were not significant, nor was the interaction between the background distractor position and background distractor category (F(4, 24) = 2.07, p = 0.12). The lack of any effect for this target deviant might have been caused by a floor effect, as shown by the high similarity of spectrograms between this target and the target standard in Fig. 1. Also, birds’ discrimination performance on this target was barely above chance (overall hit rates were 34 ± 3%). Hence, the attention capture effect failed to degrade birds’ performance further.
When the band passed contact call deviant appeared at the 7th position in the target stream, sensitivity again differed (Fig. 4d). A two-way rmANOVA (2 background distractor positions × 5 background distractor categories) was conducted on the calculated sensitivities for this experimental condition. We found a significant main effect of background distractor categories (F(4, 24) = 6.09, p = 0.002) and background distractor positions (F(1, 6) = 8.88, p = 0.03). The interaction was not significant (F(4, 24) = 0.88, p = 0.49). Bonferroni post hoc tests indicated that the temporally modulated background distractors generally caused more of an attention capture effect, which led to significantly lower target deviant sensitivity than the frequency modulated (p = 0.04) and band passed alarm call (p = 0.003) conditions. Also, background distractors appearing right before the target deviants (position 6) decreased the birds’ sensitivity more than when they were earlier in the stream (position 3).
To compare the band passed contact call deviants appearing at the 4th and the 7th positions with background distractors appearing right before them, we conducted a two-way rmANOVA (2 target deviant positions × 5 background distractor categories) on birds’ sensitivities in these conditions. We found a significant effect of background distractor categories (F(4, 24) = 5.12, p = 0.004), while the effect of target positions was not significant (F(1, 6) = 2.08, p = 0.2). Bonferroni post hoc tests indicated that the temporally modulated background distractor led to worse sensitivity to the target than the band passed alarm call background distractor (p = 0.0005). Thus, although attention capture differs across background distractor types, it is affected similarly in the two conditions where background distractors appear right before the target deviants (as shown in Fig. 5).
Fig. 5.
Bean plot sensitivities to band passed contact call target deviants appearing at the 4th and 7th positions, with the background distractors appearing right before the two target deviant positions. FM3-CC4, TM3-CC4, A3-CC4, LH3-CC4, WN3-CC4 represent band passed contact call target deviants appearing at the 4th position in the target stream, and the background distractors appearing at the 3rd position in the background stream. FM6-CC7, TM6-CC7, A6-CC7, LH6-CC7, WN6-CC7 represent band passed contact call target deviants appearing at the 7th position in the target stream, and the background distractors appearing at the 6th position in the background stream. All other information is the same as in Fig. 4
We did not observe any effect of target deviant position on the birds’ performance on the contact call target deviant in trials without background distractors (F(4, 24) = 1.7, p = 0.18), as shown in Fig. 6, although psychophysical studies in both humans and budgerigars have demonstrated the build-up effect in an auditory streaming task (Bregman 1978; Cai et al. 2018). Here, the absence of the build-up effect might be ascribed to a ceiling effect (as shown in Fig. 6). Also, previous studies in both humans and budgerigars have indicated that any abrupt disruption of attention would reset the streaming process and deteriorate the build-up process (Thompson et al. 2011; Cai and Dent 2020). Consequently, the unpredictable background distractors across 85% of the trials in the task could also have impaired the build-up process.
Fig. 6.
Mean hit rates to band passed contact call target deviants as a function of their position in the target stream. Data were from trials where no background distractor appeared. Different color circles represent data from different birds at each position (x coordinates were offset for visibility), and asterisks represent the mean value at each position
DISCUSSION
In the present study, we modified natural vocalizations recorded from budgerigars and used these complex signals in an objective auditory streaming task to study the attention capture effect. The attention capture effect was gauged by measuring birds’ performance in a discrimination task. Birds were trained to discriminate a deviant in a target stream while ignoring the distractor appearing in the competing background stream. The unpredictable position of the target deviant across trials trained the birds to hold their attention throughout the presentation of the sound stimulus. The occurrence of the background distractor encouraged the birds to pay selective attention to the target stream to perform the task. We found that, similar to a study using pure tones (Cai and Dent 2020), attention capture also occurs when using modified natural vocalizations. Specifically, different categories of background distractor affected birds’ performance differently. Distractors with the same temporal modulation features as the target deviant tended to capture the birds’ attention more and degrade birds’ performance more than the other distractors. Also, the attention capture effect decayed over time, as the distractor appearing right before the target affected the birds’ performance more than the distractor appearing about 700 ms before the target.
We found significant effects of background distractor position on the birds’ sensitivity to target deviants, which reached significance for both the 5 Hz temporally modulated and the band passed contact call target deviants. Specifically, distractors appearing right before the target deviant captured birds’ attention more than those appearing 700 ms earlier, which indicated some degree of recovery from attention capture during the 700 ms interval. In humans, both the physiological and psychophysical markers of attention capture disappeared when the temporal interval between the preceding distractor and target deviant reached 560 ms (Schröger 1996). Hence, similar to humans, the attention capture effect decayed over time in birds. However, a previous study in the same species using the same paradigm with pure tones (Cai and Dent 2020) indicated a longer attention capture effect. It is likely that the difference was caused by the different stimuli used in the two studies. In humans, visual distractors with different biological function or peripheral physiological correlates capture attention differently (Carretié et al. 2004, 2011; Carretié 2014), and attention capture elicited by environmental sounds or pure tones also leads to different recovery times (Escera et al. 2001). In addition, attention capture related brain responses in humans have been demonstrated to increase as the magnitude of the auditory change increased (Yago et al. 2001), and birds have shown worse performance with more salient distractors (Cai and Dent 2020). Hence, it is also possible that the shorter recovery time happens here because the relative perceptual salience between the background distractor and background standards is smaller than that used in the previous study.
In addition, we found that the temporally modulated background distractor tended to capture birds’ attention in the task more than other distractors, at least for the contact call target deviants but not the temporally modulated target deviants. Visual studies in humans have found that distractors that share a feature dimension with a following target deteriorate the visual target search more (Folk et al. 1992; Harris et al. 2015). This deterioration effect is not suppressible even when the shared feature in the distractor has predictable properties (Weichselbaum and Ansorge 2018). However, we did not find a more severe attention capture effect brought by the temporally modulated distractor than the other distractors when discriminating temporally modulated target deviants. The temporal modulation envelope of the 5 Hz temporally modulated background distractor was more similar to that of the 5 Hz temporally modulated target deviant than the other distractors (as shown in Fig. 7). Analysis of the false alarm rates also indicated that the birds had more difficulty suppressing the 5 Hz temporally modulated background distractor than the other background distractors during sham trials. This strongly suggests that the birds selectively focused their attention search to task-relevant feature dimensions in the task. Hence, it is likely that when the target deviant was 5 Hz temporally modulated, the 5 Hz temporally modulated background distractor right before it was less salient compared to the other background distractor types. Therefore, it failed to cause a larger attention capture effect. However, when the target deviants were band passed contact calls, the 5 Hz temporally modulated background distractor “popped” into the birds’ attention. In this case, the feature dimension of the distractor was similar to the task-related feature. Hence, the birds’ attention was captured by the temporally modulated background distractor more than the other distractors.
Fig. 7.
Temporal envelopes of sound segments used to generate the stimuli. The plots titled with “Target Standard,” “TM 5 Hz,” and “TM 100 Hz” represent the temporal envelopes of the target standard, and the 5 Hz and 100 Hz temporally modulated target deviants, respectively (as shown in Fig. 1). The remaining plots display the temporal envelopes of sound segments used in the background stream, as shown in Fig. 2
One caveat of the present study is that the interaction effects between the background distractor category and position were not significant in any condition. One potential reason was that the effect of position was confounded by the different times where background distractors appeared relative to stimulus onsets, and background distractors appearing later might lead to higher false alarm rates. However, it is not likely that this can completely explain the results without considering the attention capture effect. First, we found no difference in birds’ sensitivities to band passed contact calls appearing at the 4th and 7th target positions with background distractors appearing right before them. If the confound of position on false alarm rates was the main driving factor toward the position effect observed here, we should have observed a lower sensitivity to deviants appearing at the 7th than at the 4th position with the background distractors right before them. Also, Cai and Dent (2020) found that birds trained with the same paradigm did not show significantly different false alarm rates between conditions without background distractors and conditions with unpredictable background distractors.
In summary, the present study investigated the attention capture effect in birds in an objective auditory streaming paradigm, for the first time using modified complex vocalizations as stimuli. Birds’ auditory attention was captured by an unpredicted distractor appearing in the task-irrelevant background stream, and birds showed faster recovery from the interference in the complex stimuli context than they did in a simple pure tone stimuli context (Cai and Dent 2020). Similar to humans, birds can focus their attention to a specific feature dimension of the sound target. Birds do have a lower tolerance to the salience of distractors than humans, however.
Acknowledgements
Thank you to Faiza Hafeez for her help with data collection. This work was supported by the Mark Diamond Research Fund SU-19-02 to HC.
Funding
The research was funded by the Mark Diamond Research Fund SU-19-02 to HC.
Data availability
Data and material used in the study are available on request from the corresponding author.
Code Availability
Code related to the study is available on request from the corresponding author.
Declarations
Ethics Approval
All experiments were approved by the University at Buffalo, SUNY’s Institutional Animal Care and Use Committee (IACUC) and were in accordance with the Guide for Care and Use of Laboratory Animals.
Conflict of Interest
The authors declare no competing interests.
References
- Alho K, Salmi J, Koistinen S, Salonen O, Rinne T. Top-down controlled and bottom-up triggered orienting of auditory attention to pitch activate overlapping brain networks. Brain Res. 2015;1626:136–145. doi: 10.1016/j.brainres.2014.12.050.
- Berti S, Schröger E. Working memory controls involuntary attention switching: evidence from an auditory distraction paradigm. Eur J Neurosci. 2003;17:1119–1122. doi: 10.1046/j.1460-9568.2003.02527.x.
- Bidet-Caulet A, Bottemanne L, Fonteneau C, Giard M-H, Bertrand O. Brain dynamics of distractibility: interaction between top-down and bottom-up mechanisms of auditory attention. Brain Topogr. 2015;28:423–436. doi: 10.1007/s10548-014-0354-x.
- Bregman AS. Auditory streaming is cumulative. J Exp Psychol Hum Percept Perform. 1978;4:380–387. doi: 10.1037//0096-1523.4.3.380.
- Bregman AS. Auditory scene analysis: the perceptual organization of sound. Cambridge, Massachusetts; London: MIT Press; 1994.
- Cai H, Dent ML. Attention capture in birds performing an auditory streaming task. PLoS ONE. 2020;15:e0235420. doi: 10.1371/journal.pone.0235420.
- Cai H, Screven LA, Dent ML. Behavioral measurements of auditory streaming and build-up by budgerigars (Melopsittacus undulatus). J Acoust Soc Am. 2018;144:1508–1516. doi: 10.1121/1.5054297.
- Carlyon RP, Cusack R, Foxton JM, Robertson IH. Effects of attention and unilateral neglect on auditory stream segregation. J Exp Psychol Hum Percept Perform. 2001;27:115. doi: 10.1037//0096-1523.27.1.115.
- Carretié L. Exogenous (automatic) attention to emotional stimuli: a review. Cogn Affect Behav Neurosci. 2014;14:1228–1258. doi: 10.3758/s13415-014-0270-2.
- Carretié L, Ruiz-Padial E, López-Martín S, Albert J. Decomposing unpleasantness: differential exogenous attention to disgusting and fearful stimuli. Biol Psychol. 2011;86:247–253. doi: 10.1016/j.biopsycho.2010.12.005.
- Carretié L, Hinojosa JA, Martín-Loeches M, Mercado F, Tapia M. Automatic attention to emotional stimuli: neural correlates. Hum Brain Mapp. 2004;22:290–299. doi: 10.1002/hbm.20037.
- Corbetta M, Shulman GL. Control of goal-directed and stimulus-driven attention in the brain. Nat Rev Neurosci. 2002;3:201–215. doi: 10.1038/nrn755.
- Cusack R, Deeks J, Aikman G, Carlyon RP. Effects of location, frequency region, and time course of selective attention on auditory scene analysis. J Exp Psychol Hum Percept Perform. 2004;30:643–656. doi: 10.1037/0096-1523.30.4.643.
- Dalton P, Lavie N. Auditory attentional capture: effects of singleton distractor sounds. J Exp Psychol Hum Percept Perform. 2004;30:180–193. doi: 10.1037/0096-1523.30.1.180.
- Dent ML, Bee MA. Principles of auditory object formation by nonhuman animals. In: Slabbekoorn H, Dooling RJ, Popper AN, Fay RR, editors. Effects of anthropogenic noise on animals. New York: Springer; 2018. pp. 47–82.
- Elliott TM, Theunissen FE. The modulation transfer function for speech intelligibility. PLOS Comput Biol. 2009;5:e1000302. doi: 10.1371/journal.pcbi.1000302.
- Escera C, Yago E, Alho K. Electrical responses reveal the temporal dynamics of brain events during involuntary attention switching. Eur J Neurosci. 2001;14:877–883. doi: 10.1046/j.0953-816x.2001.01707.x.
- Fecteau S, Armony JL, Joanette Y, Belin P. Is voice processing species-specific in human auditory cortex? An fMRI study. Neuroimage. 2004;23:840–848. doi: 10.1016/j.neuroimage.2004.09.019.
- Folk CL, Remington R. Selectivity in distraction by irrelevant featural singletons: evidence for two forms of attentional capture. J Exp Psychol Hum Percept Perform. 1998;24:847–858. doi: 10.1037//0096-1523.24.3.847.
- Folk CL, Remington RW, Johnston JC. Involuntary covert orienting is contingent on attentional control settings. J Exp Psychol Hum Percept Perform. 1992;18:1030–1044. doi: 10.1037/0096-1523.18.4.1030.
- Fritz JB, Elhilali M, Shamma SA. Adaptive changes in cortical receptive fields induced by attention to complex sounds. J Neurophysiol. 2007;98:2337–2346. doi: 10.1152/jn.00552.2007.
- Fritz JB, Elhilali M, David SV, Shamma SA. Does attention play a role in dynamic receptive field adaptation to changing acoustic salience in A1? Hear Res. 2007;229:186–203. doi: 10.1016/j.heares.2007.01.009.
- Fritz JB, David SV, Radtke-Schuller S, Yin P, Shamma SA. Adaptive, behaviorally gated, persistent encoding of task-relevant auditory information in ferret frontal cortex. Nat Neurosci. 2010;13:1011. doi: 10.1038/nn.2598.
- Gaeta H, Friedman D, Ritter W, Cheng J. An event-related potential evaluation of involuntary attentional shifts in young and older adults. Psychol Aging. 2001;16:55. doi: 10.1037/0882-7974.16.1.55.
- Harris AM, Becker SI, Remington RW. Capture by colour: evidence for dimension-specific singleton capture. Atten Percept Psychophys. 2015;77:2305–2321. doi: 10.3758/s13414-015-0927-0.
- Huang N, Elhilali M. Push-pull competition between bottom-up and top-down auditory attention to natural soundscapes. eLife. 2020;9:e52984. doi: 10.7554/eLife.52984.
- Itatani N, Klump GM. Animal models for auditory streaming. Phil Trans R Soc B. 2017;372:20160112. doi: 10.1098/rstb.2016.0112.
- Kincade JM, Abrams RA, Astafiev SV, Shulman GL, Corbetta M. An event-related functional magnetic resonance imaging study of voluntary and stimulus-driven orienting of attention. J Neurosci. 2005;25:4593–4604. doi: 10.1523/JNEUROSCI.0236-05.2005.
- Lakatos P, Musacchia G, O’Connell MN, Falchier AY, Javitt DC, Schroeder CE. The spectrotemporal filter mechanism of auditory selective attention. Neuron. 2013;77:750–761. doi: 10.1016/j.neuron.2012.11.034.
- Luck SJ, Gaspelin N, Folk CL, Remington RW, Theeuwes J. Progress toward resolving the attentional capture debate. Vis Cogn. 2021;29:1–21. doi: 10.1080/13506285.2020.1848949.
- Parmentier FBR, Elsley JV, Andrés P, Barceló F. Why are auditory novels distracting? Contrasting the roles of novelty, violation of expectation and stimulus change. Cognition. 2011;119:374–380. doi: 10.1016/j.cognition.2011.02.001.
- Recanzone GH. Representation of con-specific vocalizations in the core and belt areas of the auditory cortex in the alert macaque monkey. J Neurosci. 2008;28:13184–13193. doi: 10.1523/JNEUROSCI.3619-08.2008.
- Röer JP, Bell R, Buchner A. Specific foreknowledge reduces auditory distraction by irrelevant speech. J Exp Psychol Hum Percept Perform. 2015;41:692–702. doi: 10.1037/xhp0000028.
- Salmi J, Rinne T, Koistinen S, Salonen O, Alho K. Brain networks of bottom-up triggered and top-down controlled shifting of auditory attention. Brain Res. 2009;1286:155–164. doi: 10.1016/j.brainres.2009.06.083.
- SanMiguel I, Linden D, Escera C. Attention capture by novel sounds: distraction versus facilitation. Eur J Cogn Psychol. 2010;22:481–515. doi: 10.1080/09541440902930994.
- Schnupp JWH, Hall TM, Kokelaar RF, Ahmed B. Plasticity of temporal pattern codes for vocalization stimuli in primary auditory cortex. J Neurosci. 2006;26:4785–4795. doi: 10.1523/JNEUROSCI.4330-05.2006.
- Schröger E. A neural mechanism for involuntary attention shifts to changes in auditory stimulation. J Cogn Neurosci. 1996;8:527–539. doi: 10.1162/jocn.1996.8.6.527.
- Shamma SA, Elhilali M, Micheyl C. Temporal coherence and attention in auditory scene analysis. Trends Neurosci. 2011;34:114–123. doi: 10.1016/j.tins.2010.11.002.
- Singh NC, Theunissen FE. Modulation spectra of natural sounds and ethological theories of auditory processing. J Acoust Soc Am. 2003;114:3394–3411. doi: 10.1121/1.1624067.
- Slee SJ, David SV. Rapid task-related plasticity of spectrotemporal receptive fields in the auditory midbrain. J Neurosci. 2015;35:13090–13102. doi: 10.1523/JNEUROSCI.1671-15.2015.
- Snyder JS, Alain C, Picton TW. Effects of attention on neuroelectric correlates of auditory stream segregation. J Cogn Neurosci. 2006;18:1–13. doi: 10.1162/089892906775250021.
- Sussman E, Winkler I, Schröger E. Top-down control over involuntary attention switching in the auditory modality. Psychon Bull Rev. 2003;10:630–637. doi: 10.3758/bf03196525.
- Sussman ES, Horváth J, Winkler I, Orr M. The role of attention in the formation of auditory streams. Percept Psychophys. 2007;69:136–152. doi: 10.3758/bf03194460.
- Theeuwes J. Irrelevant singletons capture attention. In: Itti L, Rees G, Tsotsos JK, editors. Neurobiology of attention. Burlington: Academic Press; 2005. pp. 418–424.
- Theunissen FE, Shaevitz SS. Auditory processing of vocal sounds in birds. Curr Opin Neurobiol. 2006;16:400–407. doi: 10.1016/j.conb.2006.07.003.
- Thompson SK, Carlyon RP, Cusack R. An objective measurement of the build-up of auditory streaming and of its modulation by attention. J Exp Psychol Hum Percept Perform. 2011;37:1253–1262. doi: 10.1037/a0021925.
- Tu H-W, Smith EW, Dooling RJ. Acoustic and perceptual categories of vocal elements in the warble song of budgerigars (Melopsittacus undulatus). J Comp Psychol. 2011;125:420. doi: 10.1037/a0024396.
- Weichselbaum H, Ansorge U. Bottom-up attention capture with distractor and target singletons defined in the same (color) dimension is not a matter of feature uncertainty. Atten Percept Psychophys. 2018;80:1350–1361. doi: 10.3758/s13414-018-1538-3.
- Yago E, Corral MJ, Escera C. Activation of brain mechanisms of attention switching as a function of auditory frequency change. NeuroReport. 2001;12:4093–4097. doi: 10.1097/00001756-200112210-00046.
- Yin P, Fritz JB, Shamma SA. Rapid spectrotemporal plasticity in primary auditory cortex during behavior. J Neurosci. 2014;34:4396–4408. doi: 10.1523/JNEUROSCI.2799-13.2014.