Scientific Reports. 2021 Jan 13;11:916. doi: 10.1038/s41598-020-79641-z

Discrimination of natural acoustic variation in vocal signals

Adam R Fishbein 1,2,#, Nora H Prior 1,2,#, Jane A Brown 1, Gregory F Ball 1,2, Robert J Dooling 1,2
PMCID: PMC7807010  PMID: 33441711

Abstract

Studies of acoustic communication often focus on the categories and units of vocalizations, but subtle variation also occurs in how these signals are uttered. In human speech, it is not only phonemes and words that carry information but also the timbre, intonation, and stress of how speech sounds are delivered (often referred to as “paralinguistic content”). In non-human animals, variation across utterances of vocal signals also carries behaviorally relevant information across taxa. However, the discriminability of these cues has been rarely tested in a psychophysical paradigm. Here, we focus on acoustic communication in the zebra finch (Taeniopygia guttata), a songbird species in which the male produces a single stereotyped motif repeatedly in song bouts. These motif renditions, like the song repetitions of many birds, sound very similar to the casual human listener. In this study, we show that zebra finches can easily discriminate between the renditions, even at the level of single song syllables, much as humans can discriminate renditions of speech sounds. These results support the notion that sensitivity to fine acoustic details may be a primary channel of information in zebra finch song, as well as a shared, foundational property of vocal communication systems across species.

Subject terms: Animal behaviour, Auditory system, Social behaviour, Social neuroscience

Introduction

The parallels between acoustic communication in humans and non-human animals have fascinated casual observers as well as scientific researchers at least as far back as Aristotle1,2. This has motivated widespread investigation into how complex “information” and “meaning” are encoded in vocal signals3–6. These studies in both humans and non-human animals (hereafter “animals”) have been dominated by a search for linguistic content: identifying basic units and categories of vocalizations (such as words in humans and call types in non-human primates) and describing how these units may be combined into meaningful sequences7–13. But comparing the linguistic capabilities of humans and animals is difficult because these processes are largely internal, and animals appear to lack anything comparable to human semantics and syntax14–18.

In contrast to the linguistic domain, both humans and animals can communicate complex information through subtle variation in acoustic features of the “voice” (e.g. pitch, intensity, timbre, and intonation)19–21. Between individuals, this acoustic variation can encode information such as individual and group identity22–27. Within individuals, this acoustic variation can carry socially-relevant information such as emotional or motivational state28–35. For example, humans can perceptually categorize at least eight, and perhaps more than twelve, affective states from variation in the voice alone36,37. This shared ability of humans and animals to extract information from such fine acoustic variation is likely an important shared biological foundation of acoustic communication38.

Among acoustic communication systems, birdsong has long been a dominant model of human speech and vocal learning, and considerable research has been dedicated to describing the significance of variation in rhythm, timing, and the number and order of elements in birdsong within and across species2,39–41. Birdsong also exhibits subtle acoustic variation across renditions and contexts (e.g.42–44), but the significance of this information in communication has until recently gone largely unexplored. We now know that birds are acutely sensitive to variation in acoustic fine structure—the rapid fluctuations in amplitude and frequency within the envelope of a sound waveform45–47. Studies with natural song and calls confirm that songbirds use the acoustic structure of vocal signals for individual recognition25,27. Additionally, songbirds are exquisitely sensitive to manipulations of individual syllables, while some species, at least, are relatively insensitive to changes in syllable order27,48, suggesting that fine acoustic variation, such as across renditions of song syllables, may be a primary carrier of information.

In the current study, we used a psychophysical paradigm to ask whether birds can perceive naturally occurring variation in their vocal signals, that is, variation among renditions that conventional spectrographic analyses would typically judge to be the same. The zebra finch (Taeniopygia guttata) is particularly well suited for asking this question since males learn a single highly stereotyped motif, composed of 3–8 harmonic syllables in a fixed sequence, which they repeat multiple times in a song bout49. Here, we tested how well zebra finches can discriminate different renditions of song syllables from a given male. As a check on whether this ability reflects a general or specialized perceptual process, we also tested zebra finches on a task involving discriminating different renditions of human vowels from the same talker. Finally, for comparison, we tested the ability of human participants to discriminate renditions of zebra finch motifs and human vowels.

Results

Zebra finches can discriminate renditions of song motifs and syllables

Our first aim was to determine if zebra finches can discriminate motifs produced in the same context and from the same or adjacent song bouts. Figure 1A identifies distinct renditions of motifs and syllables within a single male’s song bout. Because every male’s song is quite different, we used song from three different males. We created five stimulus sets per male motif to be used in psychoacoustic experiments. Three examples of these five stimulus sets are shown in the bottom panel of Fig. 1A. These are composed of near-identical renditions of whole motifs and motifs where a single syllable was replaced with one from a different rendition. The acoustic similarity between renditions was assessed using the percent similarity score in Sound Analysis Pro 2011 (SAP), which uses feature-based metrics of Euclidean distances50. By this measure, renditions of whole motifs were between 91 and 99% similar.
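
As a rough illustration of the logic behind such a feature-based similarity score (this is not SAP's actual algorithm, which compares feature distributions across short time windows; the feature values and function names below are illustrative), acoustic features can be z-scored and the pairwise Euclidean distance between renditions expressed as a percentage:

```r
# Toy illustration of a feature-based similarity score (NOT SAP's actual algorithm):
# z-score each acoustic feature across renditions, then express the pairwise Euclidean
# distance as a percentage of the largest observed distance.
feature_similarity <- function(features) {
  z <- scale(as.matrix(features))            # rows = renditions, columns = acoustic features
  d <- as.matrix(dist(z))                    # pairwise Euclidean distances
  100 * (1 - d / max(d))                     # 100% = identical, 0% = most dissimilar pair
}

# Hypothetical feature values for three renditions of one syllable
feats <- data.frame(duration  = c(0.112, 0.115, 0.110),   # s
                    mean_freq = c(3120, 3150, 3095),      # Hz
                    entropy   = c(-2.1, -2.0, -2.3))      # Wiener entropy (log scale)
round(feature_similarity(feats), 1)
```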

Figure 1.

(A) Time waveform (top) and spectrogram (bottom) for a natural zebra finch song bout (made in Sonic Visualiser51). The blue lines denote distinct motif renditions, each composed of the same four syllables (visible as discrete notes in the time waveform). The panel below indicates how individual syllables were identified within a single motif rendition. A single motif (e.g. rendition 1) was used as a repeating background (ABCD, with fixed inter-syllable silences), and targets (sounds different from the background that birds were tasked with discriminating) consisted of different renditions of all syllables or of one of the syllables. The right panel shows exemplar stimulus sets used to test discrimination ability among renditions of motifs and syllables. Birds were tested on five stimulus sets per male’s motif (three of which are shown), for each of three males. For the whole motif rendition set, the background was composed of each syllable from one motif rendition and each of the seven targets was composed of the corresponding syllables from a different rendition (stimulus set: whole motif renditions). For the other four stimulus sets, the background was again composed of each syllable from a motif rendition, and each target contained a different rendition of a single syllable and the same rendition as the background for the other syllables (stimulus sets: syllable A renditions and syllable B renditions). Target syllables differing from the background are green and bolded. (B) Schematic of the behavioral operant task. While listening to a repeating background sound, a bird can initiate a trial by pressing the observation key. After a 2–7 s random interval, another peck on the observation key results in the presentation of a target sound (a different rendition) alternated with the background sound. Each target is presented twice within a 3 s response window, during which a peck on the report key is scored as a hit. (C) Average performance (corrected percent correct (PC*) for discriminating target stimuli, mean ± SEM, N = 4) on discrimination of motif and syllable renditions from three male zebra finch motifs (ZF Motif 1–3). The authors thank Shelby Lawson for the drawing of the zebra finch in (B).

The discrimination ability of four zebra finches (two males and two females) was tested using an operant psychoacoustic paradigm (Fig. 1B). Overall, zebra finches were easily able to discriminate all motif and syllable renditions for each of the three males' motifs (ZF Motif 1–3) (Fig. 1C). While performance was quite good overall, performance on the syllable rendition tests differed statistically depending on motif and syllable (lmer model: motif * syllable interaction, χ2 = 16.20, df = 6, p = 0.013)—this is particularly evident when looking at performance on syllables C and D (Fig. 1C). This raises the question of why certain syllables may be more discriminable than others.

Zebra finch performance is influenced by syllable features

Zebra finch song syllables can be categorized into distinct syllable types (7–12 types) using acoustic features that are standard in many sound analysis programs49,52,53. In order to investigate how performance was related to syllable features, we quantified the following acoustic measures for each syllable from all three males' motifs used as background stimuli: duration, fundamental frequency, mean frequency, peak frequency, goodness of pitch (referred to here as “harmonicity”), frequency modulation, amplitude modulation, entropy, spectral continuity over time, and spectral continuity over frequency.

We used linear regression models to ask what features of the background syllables explained differences in performance on discriminating syllable renditions (Table 1). Overall, performance was related to mean frequency, duration, and harmonicity of the background syllables, such that higher frequency syllables tended to be more easily discriminated and, depending on the motif, syllables longer in duration and more harmonic were more easily discriminated. Figure 2 shows which syllable renditions were easiest and hardest for each motif. Note that while the spectrogram and time waveform are useful for visualizing the acoustic differences between types of syllables, they do not capture the rendition-to-rendition acoustic variation that zebra finches are so easily able to discriminate. Still, these results provide insight into what types of syllables might be particularly good sources for behaviorally relevant information in rendition-to-rendition variation.

Table 1.

Results of linear regression models asking what types of syllables were most easily discriminated.

What features of syllables explain performance differences in syllable rendition discrimination?
                      χ2      Adj. p-value   Marginal R2
Fixed effect
  FundFreq            3.77    0.224          0.152
  MeanFreq            9.27    0.023*         0.294
  PeakFreq            4.13    0.224          0.170
  FreqMod             1.40    0.630          0.091
  AmpMod              2.95    0.322          0.192
  Entropy             1.65    0.598          0.101
  Contt               0.03    0.993          0.032
  Contf               0.67    0.993          0.175
Interaction
  Duration:Motif      12.81   0.023*         0.262
  Harmonicity:Motif   14.56   0.021*         0.265

Each model consisted of mean PC* scores on each task other than the whole motif renditions (4 syllable rendition tasks for each of the 3 motifs) for each subject as the response variable (37 observations); the fixed effects were an acoustic feature of the relevant background syllable (e.g. duration), motif set (3 levels), and the interaction between motif set and acoustic feature; and the random effects were subject and task (12 levels) to account for the repeated measures design of the experiment. The formulas for the models were as follows: PC* ~ Feature*Motif + (1|Subject) + (1|Task). Chi-square (χ2) values (comparing the full model against a model that includes all other terms) are given for the fixed effects involving an acoustic feature, unless the interaction between acoustic feature and motif set was significant, in which case that is reported. P-values for chi-square tests were adjusted using the Benjamini–Hochberg false discovery rate procedure. Marginal R2 is given for the fixed effects of each model. *p < 0.05.

FundFreq fundamental frequency, MeanFreq mean frequency, PeakFreq peak frequency, FreqMod frequency modulation, AmpMod amplitude modulation, Contt spectral continuity over time, Contf spectral continuity over frequency.

Figure 2.

Time waveform (top) and spectrogram (bottom) of each background stimulus for each of the three males' motifs used in this experiment (made in Sonic Visualiser51). For each motif, the syllables are labeled (A–D) based on their position in the motif. For each motif, the syllable rendition task that was easiest across birds is highlighted in blue and the hardest is indicated in red. For ZF Motif 2, the spectrograms and time waveforms for the easiest and hardest syllable rendition targets are depicted below the background stimulus. Note that while the spectrogram and time waveform are useful for visualizing the acoustic differences between types of syllables, they do not capture the rendition-to-rendition acoustic variation that zebra finches are so easily able to discriminate.

Zebra finches can also discriminate natural variation in human vowels

As a test of whether zebra finch discrimination of natural variation is specific to their own vocalizations or reflects a more general capacity, we also tested the birds on renditions of natural human vowels (/a/, /i/, and /u/) (Fig. 3A). Just as with the conspecific vocal stimuli, zebra finches easily discriminated between renditions of spoken human vowels, showing that this perceptual sensitivity to acoustic details is not specific to conspecific signals (Fig. 3B).

Figure 3.

(A) Time waveform (top) and spectrogram (bottom) of a single human speaker producing the sustained vowels: /a/ /i/ /u/, as in “father”, “eat”, and “goose” respectively (made in Sonic Visualiser51). To prepare vowels for psychoacoustic tests, the middle section of a sustained vowel was extracted and given a 5 ms cosine rise/fall time. (B) Performance (PC*, mean ± SEM, N = 3) on discriminating the different renditions of extracted vowels from three human speakers.

Human discrimination of natural acoustic variation

For comparison, we also tested three human participants on the different rendition stimulus sets for one male’s motif (ZF Motif 2) and one speaker’s vowels (Fig. 4A). Human participants easily discriminated among renditions of spoken vowels (Fig. 4B). The ability of human participants to discriminate among renditions of song motifs and syllables was more mixed. Humans performed very well at discriminating whole motif renditions and syllable A; however, their mean performance was below 50% PC* for the other syllables (Fig. 4C).

Figure 4.

(A) Description of human psychophysical testing. Subjects were stationed at the same computers which controlled the bird operant tasks, outfitted with Sennheiser HD280PRO headphones, and given a response panel with two keys. Subjects were told they would be listening to a repeating background sound, during which they should press the observation key to cause a change in the background and to press the report key whenever they heard a change. (B) Performance (PC*, mean ± SEM, N = 3) on discriminating the different renditions of vowels from a single human speaker. (C) Performance (PC*, mean ± SEM, N = 3) on discriminating the different renditions of zebra finch song.

Discussion

Perhaps because the linguistic content (phonology, syntax, semantics) of human speech is so unique, much of the research comparing acoustic communication systems in humans and non-human animals has focused on finding parallels to these components of human language. However, the non-verbal acoustic features of vocalizations are also rich sources of socially-relevant information across taxa. Here, we show that zebra finches are easily able to discriminate renditions of syllables and motifs of multiple males' songs. Furthermore, zebra finches are also easily able to discriminate between renditions of human vowels from multiple speakers. Additionally, we show that human subjects, tested with the same behavioral paradigm, perform very well at discriminating vowel renditions and whole motif renditions, though less consistently at discriminating single-syllable renditions. These results support the notion that the perception of subtle acoustic variation in the utterances of vocal signals is a fundamental aspect of acoustic communication across species.

A historical reliance on spectrographic analysis of birdsong, based on visual representation of song in sonograms, has biased observers away from the potential importance of subtle acoustic details like fine structure, which are not easily discerned in sonograms54. Indeed, the acoustic variation that humans and birds discriminated here is not evident in the spectrogram or time waveform. Instead, researchers have often assumed that important information is contained in the sequential patterns of birdsong (perhaps reflecting an assumption of linguistic content). However, experiments both in the field and lab show that sequence may not matter much, at least in some species16,48. By contrast, previous work has shown that birds, compared to humans and other mammals, have superior auditory temporal resolution and excel in the ability to discriminate changes in acoustic fine structure, or rapid fluctuations in frequency and amplitude, of both synthetic and natural complex signals46,55–59. For zebra finches particularly, there is evidence that some of the smallest differences in acoustic fine structure found in their vocal signals may encode information about sex, call type, and individual identity47. This exquisite sensitivity to syllable details contrasts with the birds' relative insensitivity to changes in syllable sequence in a motif27,48.

In the current experiment employing natural complex stimuli, we cannot determine whether the birds attended more to the faster changes in amplitude and frequency associated with fine structure or slower envelope cues. However, based on the abilities described in the previous paragraph, we might predict that zebra finches can discriminate across renditions of song syllables and human vowels based on variation in acoustic fine structure alone. Here, we used traditional acoustic measures in SAP to explain variation in performance across motifs and syllables. While these measures most certainly do not capture all the relevant acoustic features of the syllables, we provide evidence that syllables higher in mean frequency and, depending on the motif, syllables longer in duration and more harmonic may be particularly rich in behaviorally relevant information. In order to identify more precisely the perceptual mechanisms underlying the discrimination of natural renditions, further research using a wider set of stimuli, including experimentally manipulated sounds, would be needed to disentangle envelope and acoustic fine structure cues.

Regardless of the perceptual mechanisms, here we provide strong evidence that zebra finches can perceive some of the smallest acoustic variation present in their song. The more challenging question is whether and to what extent this variation is behaviorally relevant across social contexts. Certainly, there are already several lines of evidence that fine-grained variation in the acoustic structure of zebra finch calls conveys significant information (e.g. motivational state and breeding condition32,60–62), and this may be true for song as well27,44. Furthermore, zebra finch song is used for courtship and pair maintenance, so rendition variability could convey properties of the sender such as mate quality, hormonal condition, and motivational state. Zebra finch song is also modulated by social context and can be classified as “directed” (female-directed) or “undirected”. The acoustic differences between these contexts have been shown to be important for mate choice63. Both directed (which we use in this study, see “Methods” section) and undirected song are composed of the same stereotyped motif, but directed song is faster, longer, contains more introductory notes, and has increased stereotypy at the level of the syllable, motif, and whole song63. Combined, these lines of research highlight the potential importance of these subtle acoustic features, including acoustic fine structure, for communication. Specific manipulation of acoustic fine structure in studies of natural behavior would be pivotal in testing the idea that it is a primary carrier of information in song.

Human participants in our study easily discriminated variation in human vowels and some of the song syllables. While humans are exquisitely primed to extract linguistic content64, a long line of research clearly shows that they are also sensitive to paralinguistic acoustic features of the voice in both speech and non-speech vocalizations37,65. In a recent study, Spierings and ten Cate (2014) tested zebra finches and humans on categorizing speech sounds based on prosodic (pitch, duration, and amplitude) or sequence (xxyy vs xyxy) cues66. When responding to test stimuli where subjects could use both cues, zebra finches always used prosodic cues more than sequence cues, while human participants used both. Thus, there is strong converging evidence that zebra finch song perception primarily focuses on acoustic details akin to the paralinguistic features of human speech.

Our current study adds to a growing body of research illustrating the parallels between the non-verbal content of human speech and acoustic communication in animals. Even though zebra finch song seems to lack linguistic content, the emotion and meaning contained in the acoustic fine structure of song could very well exceed that of human speech. Our study focused primarily on birds, but subtle acoustic variation in the “voice,” within categories and units of vocal signals, is well-documented across a range of species, including anurans (reviewed in67), such as in the territorial calls of Central American tree frogs68, and numerous species of mammals and birds. Altogether, there are many lines of evidence which would suggest that acoustic communication of affective state is a shared, foundational property of vocal communication systems19,65,69.

Methods

Subjects

Adult male and female zebra finches (> 120 days old) were used for these experiments. For the psychophysical experiments, five zebra finches (three males and two females) in total were used. Three zebra finches (one male and two females) were tested on all 15 stimulus sets for the zebra finch motif and syllable rendition experiments. An additional male zebra finch was tested on two of the stimulus sets. Three zebra finches (one male and two females) were tested on all 9 stimulus sets for the vowel rendition experiments. Two of the birds (two females) were tested on both the zebra finch song and vowel experiments. During the experiments, subjects were housed in individual cages on a light cycle of 8L:16D. Birds were mildly food deprived at about 90–95% of their free feeding weight to ensure they were motivated to engage in the psychophysical testing. White hulled millet was used as a food reward in the testing apparatus and birds received an additional portion of pellet or mixed seed at the end of the day. Birds also had access to grit and, occasionally, vegetables, fruit, or hard-boiled egg. Additionally, three human subjects were tested using the same psychophysical paradigm.

Preparation of zebra finch stimuli

We recorded directed song from three zebra finches in a foam-covered room. Recordings were made using a tie-clip microphone (AKG C417) and a Zoom F8 multitrack field recorder (sampling rate of 44.1 kHz). Songs were viewed in Adobe Audition (v. 2015.2), and motifs were selected that did not have competing background noise (e.g. wing fluffs, cage noises, and female calls) (Fig. 1). Using Adobe Audition, motifs were high-pass filtered with a cutoff frequency of 350 Hz. Consecutive motif renditions were taken when possible, on the assumption that this would maximize the similarity in acoustic fine structure of syllables. Eight renditions of individual syllables were then extracted from eight motif renditions for further preparation as psychophysical stimuli in these experiments. The same eight motif renditions were used for each syllable type, and extracted syllables were given identifiers based on the syllable type (position in the motif, A–D) and motif rendition (1–8). Thus, for three zebra finch songs, we had syllables A1–A8, B1–B8, etc.
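
The filtering was done in Adobe Audition; for readers who wish to reproduce this preprocessing step offline, a minimal R sketch of an equivalent 350 Hz high-pass filter is shown below (the file name and fourth-order Butterworth design are illustrative assumptions, not part of the original workflow).

```r
# Sketch of the 350 Hz high-pass filtering step (done in Adobe Audition for the
# actual stimuli); the file name and filter order are illustrative.
library(tuneR)    # readWave / writeWave / normalize
library(signal)   # butter / filtfilt

rec <- readWave("male1_song_bout.wav")            # hypothetical recording at 44.1 kHz
fs  <- rec@samp.rate
bf  <- butter(4, 350 / (fs / 2), type = "high")   # cutoff frequency normalized to Nyquist
x   <- filtfilt(bf, as.numeric(rec@left))         # zero-phase high-pass filtering
hp  <- normalize(Wave(left = x, samp.rate = fs, bit = 16), unit = "16")
writeWave(hp, "male1_song_bout_hp350.wav")
```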

After individual syllable renditions were extracted, motif stimuli were generated in MATLAB (MathWorks, Natick, MA). Stimulus motifs were created, making two adjustments in order for the stimuli to be appropriate for psychophysical testing. First, inter-syllable silences were fixed for each stimulus motif so that birds could not use differences in inter-syllable silences as a cue. These inter-syllable silences were based on the naturally occurring silences for a single rendition of that male’s motif. Second, each extracted syllable was given a 5 ms cosine rise/fall time. Consistent rise/fall times are necessary to preserve the acoustic features of syllables following inter-syllable intervals of complete silence.
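
The stimuli were prepared in MATLAB; a minimal R sketch of the 5 ms cosine rise/fall applied to an extracted syllable might look like the following (the stand-in syllable and variable names are illustrative).

```r
# Minimal sketch of the 5 ms cosine rise/fall applied to each extracted syllable
# (the actual stimuli were prepared in MATLAB; the stand-in syllable is illustrative).
apply_cos_ramp <- function(x, fs, ramp_ms = 5) {
  n    <- round(fs * ramp_ms / 1000)                     # ramp length in samples
  ramp <- 0.5 * (1 - cos(pi * (0:(n - 1)) / (n - 1)))    # raised cosine from 0 to 1
  x[1:n] <- x[1:n] * ramp                                # fade in
  idx <- (length(x) - n + 1):length(x)
  x[idx] <- x[idx] * rev(ramp)                           # fade out
  x
}

fs  <- 44100
syl <- sin(2 * pi * 600 * seq(0, 0.1, by = 1 / fs))      # stand-in 100 ms "syllable"
syl_ramped <- apply_cos_ramp(syl, fs)
```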

In the psychophysical discrimination experiments, the repeating background stimulus was a motif with syllables in the natural order. For the whole motif rendition set, the background was composed of each syllable from one motif rendition (A1, B1, C1, D1) and each of the seven targets was composed of the syllables from a different rendition (e.g. A2, B2, C2, D2). For the other four stimulus sets, the background was again composed of each syllable from one motif rendition (A1, B1, C1, D1), and for each of the seven targets a single syllable was substituted from a different rendition (e.g. A2, B1, C1, D1; A3, B1, C1, D1) (Fig. 1).
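
Given the naming scheme above (syllable types A–D, renditions 1–8), assembling a background or target motif reduces to concatenating ramped syllables with fixed inter-syllable silences. The sketch below illustrates this logic in R; the toy syllables, gap durations, and function names are illustrative assumptions rather than the authors' MATLAB code.

```r
# Sketch of how background and target motifs can be assembled from ramped syllables
# with fixed inter-syllable silences (toy syllables, gap durations, names illustrative).
fs <- 44100
syllables <- lapply(setNames(c(500, 700, 900, 1100), c("A", "B", "C", "D")), function(f) {
  lapply(1:8, function(r) sin(2 * pi * (f + 10 * r) * seq(0, 0.1, by = 1 / fs)))  # 8 toy renditions
})

build_motif <- function(renditions, syllables, silences_s, fs) {
  # renditions: named vector such as c(A = 1, B = 1, C = 1, D = 1)
  types <- names(renditions)
  out <- numeric(0)
  for (i in seq_along(types)) {
    out <- c(out, syllables[[types[i]]][[renditions[i]]])
    if (i < length(types))                               # fixed silence between syllables
      out <- c(out, numeric(round(silences_s[i] * fs)))
  }
  out
}

silences <- c(0.045, 0.050, 0.040)   # hypothetical fixed gaps (s) taken from one rendition

background  <- build_motif(c(A = 1, B = 1, C = 1, D = 1), syllables, silences, fs)  # A1 B1 C1 D1
target_A2   <- build_motif(c(A = 2, B = 1, C = 1, D = 1), syllables, silences, fs)  # syllable A swapped
target_all2 <- build_motif(c(A = 2, B = 2, C = 2, D = 2), syllables, silences, fs)  # whole-motif rendition
```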

Description of acoustic features

We analyzed the acoustic features of the motif renditions using Sound Analysis Pro 2011 (SAP)50. We quantified the acoustic similarity between each target rendition and the background using the percent similarity score in SAP, which uses feature-based metrics of Euclidean distances. In addition, we used SAP to describe key features of syllables (i.e. duration, fundamental frequency, mean frequency, peak frequency, goodness of pitch, frequency modulation, amplitude modulation, entropy, spectral continuity over time, and spectral continuity over frequency) by using the feature statistics to generate averages of the above features from the onset to the offset of each syllable. Goodness of pitch (referred to as “harmonicity” throughout the manuscript) is an estimate of how periodic the sound is, and values are higher when sounds are more harmonic and less noisy. Entropy is based on Wiener entropy values and estimates the noisiness or randomness of the sound. Spectral continuity measures continuity of frequency contours across time windows (whether spectral slopes are continuous). Spectral continuity over time values are high when the contours are long and spectral continuity over frequency values are high when the frequency range of the contours is high.
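
SAP computes these features over short analysis windows and averages them across each syllable. As a rough illustration of two of the definitions (power-weighted mean frequency and Wiener entropy), the R sketch below computes simplified whole-syllable approximations from a single power spectrum; it is not SAP's implementation, and the stand-in syllable is illustrative.

```r
# Simplified whole-syllable approximations of two SAP-style features (illustrative only;
# SAP computes its features per short analysis window and then averages across the syllable).
hann <- function(n) 0.5 * (1 - cos(2 * pi * (0:(n - 1)) / (n - 1)))

spectral_features <- function(x, fs) {
  n    <- length(x)
  spec <- abs(fft(x * hann(n)))[1:(n %/% 2)]^2 + 1e-12        # one-sided power spectrum
  f    <- (0:(n %/% 2 - 1)) * fs / n                          # frequency axis (Hz)
  c(mean_freq      = sum(f * spec) / sum(spec),               # power-weighted mean frequency
    wiener_entropy = log(exp(mean(log(spec))) / mean(spec)))  # geometric/arithmetic mean ratio (<= 0)
}

fs  <- 44100
syl <- sin(2 * pi * 3000 * seq(0, 0.1, by = 1 / fs))          # stand-in harmonic syllable
spectral_features(syl, fs)
```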

Preparation of vowel stimuli

Recordings were made of three human male speakers producing the sustained vowels /a/ /i/ /u/. Speakers were instructed to utter each instance of the vowel as consistently as possible. Recordings from one speaker were made in a foam-covered room with a tie-clip microphone (AKG C417) and a Zoom F8 multitrack field recorder at a sampling rate of 44.1 kHz. Recordings from the other two speakers were made in an acoustically treated room with an AKG 414 ULS condenser microphone through an Apogee Duet interface into Ableton Live at 44.1 kHz. A 150 ms section of the sustained vowel was extracted from the middle of each utterance in Adobe Audition and given a 5 ms cosine rise/fall time. Birds were additionally tested on two sets of stimuli created in the same way but with a 100 ms section of the vowel extracted from the middle of each utterance. A stimulus duration of 100 ms was chosen because this is similar to the average duration of zebra finch song syllables53.

Apparatus

As described previously, e.g.48,56,57, birds were trained and tested in a wire cage (23 × 25 × 16 cm) anchored inside of a sound-attenuated chamber (Industrial Acoustics Company, Bronx, NY, model IAC-3) lined with acoustic foam. Two response keys, each consisting of an LED attached to a microswitch, were mounted to the wall of the cage directly in front of a perch. Millet was delivered through activation of a solenoid. Stimuli were stored as wav files on an Intel Core 2 Duo computer (Mid Atlantic Data Systems, Gaithersburg, MD), which controlled the psychoacoustic experiments. The computer operated a Tucker Davis Technologies (TDT) System 3 module (Alachua, FL) that sent signals through a Crown D-75 amplifier (Elkhart, IN) and to an Orb full range point source speaker (Model Mod 1, Orb Audio, Sherman Oaks, CA), which was placed 40 cm above the bird’s head when standing on the perch. All stimuli were resampled to 24,414 Hz as required for the TDT system.

Psychophysical task

Birds were trained through operant conditioning to perform a psychophysical discrimination task. The training and testing procedures have been described in detail previously, e.g.48,56,57. Pure tones were used in training birds on the task, and individuals were tested for months to years with this psychophysical task on a variety of stimuli. Subjects were not previously tested on the rendition stimuli used in these experiments. The discrimination task proceeded as follows: the birds listened to a repeating background sound and pecked the left LED (the observation key) to initiate a trial. This first peck on the left LED initiated a random interval of 2–7 s. Following this random interval, another peck on the observation key resulted in the presentation of a target stimulus. If the bird pecked the right LED (the report key) within 3 s following the presentation of a target stimulus, this was considered a “hit” and the bird received positive reinforcement (2 s access to millet from a food hopper) (Fig. 1B). Birds generally performed 100 trials in a session consisting of ten 10-trial blocks. Three of the trials within a 10-trial block were sham trials in which the background sound was inserted instead of a target, providing a means of assessing false alarm rate. If the bird pressed the report key during a sham trial (considered a “false alarm”) or outside of the response window, it received a mild punishment in which the house lights were turned off for a short blackout period, set between 1 and 14 s at the start of the session depending on the response proclivities of each individual bird. If a bird performed with a high false alarm rate in a session, then the blackout time was set higher in the subsequent session. All stimuli were normalized to be played at 65 dBA, measured with an SPL meter (BK Precision model 732) at approximately the location of the bird's head when positioned in front of the observation key. Motifs were presented at a rate of 1/s so that there was always about a 300 ms interval between the end of one motif and the beginning of another. Thus, each bird had the opportunity to hear the target alternated with the background twice during the response window.
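
For concreteness, the trial structure described above (ten 10-trial blocks per 100-trial session, three sham trials per block, targets drawn from the seven alternative renditions) can be sketched as follows; the randomization details are illustrative assumptions.

```r
# Sketch of the session structure: ten 10-trial blocks, three sham trials per block,
# target trials drawn from the seven alternative renditions (randomization illustrative).
build_session <- function(n_blocks = 10, trials_per_block = 10,
                          shams_per_block = 3, n_targets = 7) {
  do.call(rbind, lapply(seq_len(n_blocks), function(b) {
    is_sham <- sample(c(rep(TRUE, shams_per_block),
                        rep(FALSE, trials_per_block - shams_per_block)))
    data.frame(block  = b,
               trial  = seq_len(trials_per_block),
               type   = ifelse(is_sham, "sham", "target"),
               target = ifelse(is_sham, NA, sample(n_targets, trials_per_block, replace = TRUE)))
  }))
}

session <- build_session()
table(session$type)   # 30 sham and 70 target trials per 100-trial session
```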

For each stimulus set, birds were tested in 100-trial sessions until their performance stabilized over 200 trials with a false alarm rate (# of sham trials resulting in a false alarm/total # of sham trials) below 20% for each 100-trial block and a difference in hit rate (# of target trials resulting in a hit/total # of target trials) of less than 15% between blocks. It took birds two to four 100-trial blocks to achieve stable performance. Overall, the average false alarm rate across birds was very low (mean ± SEM: 4.12% ± 2.32%).

However, two separate individuals were not able to meet the false alarm criterion for one task each (one bird on Motif 1 Syllable C Renditions and a different bird on Motif 2 Syllable D Renditions). In these two cases, the birds were tested on additional 100-trial blocks until their false alarm rate stabilized (28% in one case and 36% in the other). In these two instances, the birds also had low hit rates on those tasks (26% and 57%, respectively). As these birds met the false alarm criterion on all other tasks, the high false alarm rate was indicative of their difficulty in discriminating among renditions of those particular stimuli.

Human testing

We also tested humans on the same stimuli used to test the birds, with a similar psychophysical procedure. Human subjects were recruited from staff and students in the lab, and informed consent was obtained from all participants. Participants had no prior experience with these stimuli. The human testing procedure was modeled after the procedure used with the birds. Subjects were stationed at the same computers which controlled the bird operant tasks, outfitted with Sennheiser HD280PRO headphones, and given a response panel with two keys. Subjects were told they would be listening to a repeating background sound, during which they should press the observation key to effect a change in the background and press the report key whenever they heard a change. Subjects were tested on a subset of the stimulus sets: a single bird's motif (motif 2) and a single speaker's vowels (speaker 2). Humans completed one 100-trial session on each task.

Analysis

Performance (hits/misses/false alarms) on each task was summarized in 100-trial blocks for each individual and pooled together to calculate an averaged hit rate for each target and false alarm rate for each 100-trial block. The 200 trials that met the criterion were averaged and used for analysis. We used corrected percent correct (PC*) as a performance measure in order to minimize effects of different false alarm rates on each task70,71. Hit rates and false alarm rates were used to derive PC*:

PC* = 100 × (hit rate − false alarm rate) / (100 − false alarm rate)
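
As a minimal sketch of this correction in R (with illustrative counts for one 100-trial block of 70 target and 30 sham trials):

```r
# Corrected percent correct (PC*) from hit and false alarm rates, both expressed in percent.
pc_star <- function(hit_rate, fa_rate) 100 * (hit_rate - fa_rate) / (100 - fa_rate)

# Illustrative counts from one 100-trial block (70 target trials, 30 sham trials)
hit_rate <- 100 * 64 / 70      # 64 hits
fa_rate  <- 100 * 2 / 30       # 2 false alarms
round(pc_star(hit_rate, fa_rate), 1)
```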

We conducted a linear mixed-effects model using the function lmer from the lme4 package72 in R (v.3.6.3, R Foundation for Statistical Computing)73 to test for differences in performance across motifs and syllables for each task other than the whole motif renditions (12 tasks total). In this model, PC* for each subject on each task was the response variable (37 observations), the fixed effects were motif (3 levels), syllable (4 levels), and the interaction between motif and syllable, and the random effect was subject. The formula for the model was as follows: PC* ~ Motif*Syllable + (1|Subject). We also used linear mixed-effects models to ask what features of background syllables explained differences in performance. Each model consisted of PC* for each subject on each task (other than the whole motif renditions) as the response variable (37 observations), the fixed effects were an acoustic feature (e.g. duration) of the relevant background syllable, motif set (3 levels), and the interaction between motif set and acoustic feature, and the random effects were subject and task (12 levels). The formulas for the models were as follows: PC* ~ Feature*Motif + (1|Subject) + (1|Task). The function r.squaredGLMM from the MuMIn package74 was used to calculate marginal R2 for the fixed effects of each model. The function Anova from the car package75 was used to perform type 2 Wald chi-square tests, providing a chi-square (χ2) value and p-value for the fixed effects involving an acoustic feature in each regression model (comparing the full model against a model that includes all other terms). P-values were adjusted for multiple testing using the Benjamini–Hochberg false discovery rate procedure.
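
A minimal R sketch of this analysis pipeline is shown below. The simulated data frame, column names, and feature subset are illustrative assumptions standing in for the real data; the model formulas and functions match those described above.

```r
# Sketch of the mixed-model analysis; the simulated data frame stands in for the real data.
library(lme4)    # lmer
library(car)     # Anova: type 2 Wald chi-square tests
library(MuMIn)   # r.squaredGLMM: marginal R2 for fixed effects

set.seed(1)
perf <- expand.grid(Subject  = paste0("bird", 1:4),
                    Motif    = paste0("ZF", 1:3),
                    Syllable = c("A", "B", "C", "D"))
perf$Task     <- interaction(perf$Motif, perf$Syllable)        # 12 syllable-rendition tasks
perf$PCstar   <- pmin(100, 80 + rnorm(nrow(perf), sd = 10))    # stand-in PC* scores
perf$Duration <- rep(runif(12, 0.05, 0.20), each = 4)          # one feature value per background syllable
perf$MeanFreq <- rep(runif(12, 1500, 5000), each = 4)

# Differences in performance across motifs and syllables
m_perf <- lmer(PCstar ~ Motif * Syllable + (1 | Subject), data = perf)
Anova(m_perf, type = 2)

# One model per acoustic feature: PC* ~ Feature*Motif + (1|Subject) + (1|Task)
feature_p <- sapply(c("Duration", "MeanFreq"), function(feat) {
  f <- reformulate(c(paste0(feat, " * Motif"), "(1 | Subject)", "(1 | Task)"),
                   response = "PCstar")
  m <- lmer(f, data = perf)
  # r.squaredGLMM(m) gives the marginal R2 reported for each model
  Anova(m, type = 2)[paste0(feat, ":Motif"), "Pr(>Chisq)"]
})
p.adjust(feature_p, method = "BH")   # Benjamini-Hochberg false discovery rate adjustment
```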

Ethics

Animal procedures were approved by the University of Maryland Animal Care and Use Committee (protocol number: 1191420). These procedures followed the Animal Behavior Society (ABS) and Acoustical Society of America (ASA) guidelines for the use of animals in research. The procedures for human participant work were approved by the University of Maryland Institutional Review Board (protocol number: 1361480). These procedures followed the ASA guidelines for the use of human participants in research, and informed consent was obtained from all participants.

Acknowledgements

Thank you to the entire Ball/Dooling lab for feedback on early stages of the project and for help with animal husbandry. Thank you to Mattson Ogg for sharing recordings of human vowel renditions. For feedback on the experiments as well as versions of the manuscript, thank you to Savannah Clough, Matthew D. Taves, Benjamin A. Sandkam, William J. Idsardi, Juan Uriagereka, and two anonymous reviewers.

Author contributions

A.F. and N.P. equally contributed to writing the main manuscript text. All authors reviewed the manuscript.

Funding

This work was funded by a T32 training grant from the National Institutes of Health to N.H.P. and A.F. (NIDCD T32-DC00046), an F31 Grant to A.F. (NIDCD F31-DC017884), and a National Science Foundation award (under Grant No.1449815) to A.F.

Data availability

Data will be made available upon reasonable request.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Adam R. Fishbein and Nora H. Prior.

References

  • 1.Peck AL. History of Animals. London: Loeb Classical Library; 1984. [Google Scholar]
  • 2.Doupe AJ, Kuhl PK. Birdsong and human speech: Common themes and mechanisms. Annu. Rev. Neurosci. 1999;22:567–631. doi: 10.1146/annurev.neuro.22.1.567. [DOI] [PubMed] [Google Scholar]
  • 3.Smith WJ. Message, meaning, and context in ethology. Am. Nat. 1965;99:405–409. doi: 10.1086/282382. [DOI] [Google Scholar]
  • 4.Kolodny O, Edelman S. The evolution of the capacity for language: The ecological context and adaptive value of a process of cognitive hijacking. Philos. Trans. R. Soc. B Biol. Sci. 2018;373:20170052. doi: 10.1098/rstb.2017.0052. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Edelman S. Language and other complex behaviors: Unifying characteristics, computational models, neural mechanisms. Lang. Sci. 2017;62:91–123. doi: 10.1016/j.langsci.2017.04.003. [DOI] [Google Scholar]
  • 6.Seyfarth RM, Cheney DL. The origin of meaning in animal signals. Anim. Behav. 2017;124:339–346. doi: 10.1016/j.anbehav.2016.05.020. [DOI] [Google Scholar]
  • 7.Berwick RC, Okanoya K, Beckers GJL, Bolhuis JJ. Songs to syntax: The linguistics of birdsong. Trends Cogn. Sci. 2011;15:113–121. doi: 10.1016/j.tics.2011.01.002. [DOI] [PubMed] [Google Scholar]
  • 8.ten Cate C. The comparative study of grammar learning mechanisms: Birds as models. Curr. Opin. Behav. Sci. 2018;21:13–18. doi: 10.1016/j.cobeha.2017.11.008. [DOI] [Google Scholar]
  • 9.Manser MB. Psychological Mechanisms in Animal Communication. New York: Springer; 2016. pp. 223–249. [Google Scholar]
  • 10.Suzuki TN, Wheatcroft D, Griesser M. The syntax–semantics interface in animal vocal communication. Philos. Trans. R. Soc. B. 2020;375:20180405. doi: 10.1098/rstb.2018.0405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Coye C, Ouattara K, Arlet ME, Lemasson A, Zuberbühler K. Flexible use of simple and combined calls in female Campbell's monkeys. Anim. Behav. 2018;141:171–181. doi: 10.1016/j.anbehav.2018.05.014. [DOI] [Google Scholar]
  • 12.Bradbury JW, Vehrencamp SL. Principles of Animal Communication. Sunderland: Sinauer Associates; 1998. [Google Scholar]
  • 13.Kershenbaum A, et al. Acoustic sequences in non-human animals: A tutorial review and prospectus. Biol. Rev. 2016;91:13–52. doi: 10.1111/brv.12160. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Jackendoff R, Pinker S. The nature of the language faculty and its implications for evolution of language (Reply to Fitch, Hauser, and Chomsky) Cognition. 2005;97:211–225. doi: 10.1016/j.cognition.2005.04.006. [DOI] [Google Scholar]
  • 15.Berwick RC, Chomsky N. Birdsong, Speech, and Language: Exploring the Evolution of Mind and Brain. Cambridge: MIT Press; 2013. [Google Scholar]
  • 16.Fishbein AR, Idsardi WJ, Ball GF, Dooling RJ. Sound sequences in birdsong: How much do birds really care? Philos. Trans. R. Soc. B. 2019;375:20190044. doi: 10.1098/rstb.2019.0044. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Hauser MD, Chomsky N, Fitch WT. The faculty of language: What is it, who has it, and how did it evolve? Science. 2002;298:1569–1579. doi: 10.1126/science.298.5598.1569. [DOI] [PubMed] [Google Scholar]
  • 18.Fitch WT. The Evolution of Language. Cambridge: Cambridge University Press; 2010. [Google Scholar]
  • 19.Hinde RA. Non-verbal Communication. Cambridge: Cambridge University Press; 1972. [Google Scholar]
  • 20.Papoušek M, Jürgens U, Papoušek H. Nonverbal Vocal Communication: Comparative and Developmental Approaches. Cambridge: Cambridge University Press; 1992. [Google Scholar]
  • 21.Manstead A, Oatley K. Nonverbal Vocal Communication: Comparative and Developmental Approaches. Cambridge: Cambridge University Press; 1992. [Google Scholar]
  • 22.Tibbetts EA, Dale J. Individual recognition: It is good to be different. Trends Ecol. Evol. 2007;22:529–537. doi: 10.1016/j.tree.2007.09.001. [DOI] [PubMed] [Google Scholar]
  • 23.Bachorowski J-A, Owren MJ. Vocal expression of emotion: Acoustic properties of speech are associated with emotional intensity and context. Psychol. Sci. 1995;6:219–224. doi: 10.1111/j.1467-9280.1995.tb00596.x. [DOI] [Google Scholar]
  • 24.Bachorowski J-A. Vocal expression and perception of emotion. Curr. Direct. Psychol. Sci. 1999;8:53–57. doi: 10.1111/1467-8721.00013. [DOI] [Google Scholar]
  • 25.Elie JE, Theunissen FE. Zebra finches identify individuals using vocal signatures unique to each call type. Nat. Commun. 2018;9:1–11. doi: 10.1038/s41467-018-06394-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Keenan S, et al. Enduring voice recognition in bonobos. Sci. Rep. 2016;6:1–8. doi: 10.1038/srep22046. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Geberzahn N, Derégnaucourt S. Individual vocal recognition in zebra finches relies on song syllable structure rather than song syllable order. J. Exp. Biol. 2020;223:220087. doi: 10.1242/jeb.220087. [DOI] [PubMed] [Google Scholar]
  • 28.Kramer E. Judgment of personal characteristics and emotions from nonverbal properties of speech. Psychol. Bull. 1963;60:408. doi: 10.1037/h0044890. [DOI] [PubMed] [Google Scholar]
  • 29.Banse R, Scherer KR. Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol. 1996;70:614. doi: 10.1037/0022-3514.70.3.614. [DOI] [PubMed] [Google Scholar]
  • 30.Gussenhoven, C. Intonation and biology. In Liber Amicorum Bernard Bichakjian (Festschrift for Bernard Bichakjian), 59–82 (2002).
  • 31.Jacobson JL, Boersma DC, Fields RB, Olson KL. Paralinguistic features of adult speech to infants and small children. Child Dev. 1983;54:436–442. doi: 10.2307/1129704. [DOI] [Google Scholar]
  • 32.Perez EC, et al. The acoustic expression of stress in a songbird: Does corticosterone drive isolation-induced modifications of zebra finch calls? Horm. Behav. 2012;61:573–581. doi: 10.1016/j.yhbeh.2012.02.004. [DOI] [PubMed] [Google Scholar]
  • 33.Perez EC, et al. Physiological resonance between mates through calls as possible evidence of empathic processes in songbirds. Horm. Behav. 2015;75:130–141. doi: 10.1016/j.yhbeh.2015.09.002. [DOI] [PubMed] [Google Scholar]
  • 34.Briefer EF. Vocal contagion of emotions in non-human animals. Proc. R. Soc. B Biol. Sci. 2018;285:20172783. doi: 10.1098/rspb.2017.2783. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Briefer EF, et al. Perception of emotional valence in horse whinnies. Front. Zool. 2017;14:8. doi: 10.1186/s12983-017-0193-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Cordaro DT, Keltner D, Tshering S, Wangchuk D, Flynn LM. The voice conveys emotion in ten globalized cultures and one remote village in Bhutan. Emotion. 2016;16:117. doi: 10.1037/emo0000100. [DOI] [PubMed] [Google Scholar]
  • 37.Simon-Thomas ER, Keltner DJ, Sauter D, Sinicropi-Yao L, Abramson A. The voice conveys specific emotions: Evidence from vocal burst displays. Emotion. 2009;9:838. doi: 10.1037/a0017810. [DOI] [PubMed] [Google Scholar]
  • 38.Mol C, Chen A, Kager RWJ, ter Haar SM. Prosody in birdsong: A review and perspective. Neurosci. Biobehav. Rev. 2017;81:167–180. doi: 10.1016/j.neubiorev.2017.02.016. [DOI] [PubMed] [Google Scholar]
  • 39.Tchernichovski O, Marcus G. Vocal learning beyond imitation: Mechanisms of adaptive vocal development in songbirds and human infants. Curr. Opin. Neurobiol. 2014;28:42–47. doi: 10.1016/j.conb.2014.06.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Mooney R. Neurobiology of song learning. Curr. Opin. Neurobiol. 2009;19:654–660. doi: 10.1016/j.conb.2009.10.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Brainard MS, Doupe AJ. Translating birdsong: Songbirds as a model for basic and applied medical research. Annu. Rev. Neurosci. 2013;36:489–517. doi: 10.1146/annurev-neuro-060909-152826. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Helekar SA, Marsh S, Viswanath NS, Rosenfield DB. Acoustic pattern variations in the female-directed birdsongs of a colony of laboratory-bred zebra finches. Behav. Proc. 2000;49:99–110. doi: 10.1016/S0376-6357(00)00081-4. [DOI] [PubMed] [Google Scholar]
  • 43.Rose EM, et al. Female song in eastern bluebirds varies in acoustic structure according to social context. Behav. Ecol. Sociobiol. 2020;74:1–7. doi: 10.1007/s00265-020-2824-3. [DOI] [Google Scholar]
  • 44.Prior NH, Fernandez MSA, Soula HA, Vignal C. Water restriction influences intra-pair vocal behavior and the acoustic structure of vocalisations in the opportunistically breeding zebra finch (Taeniopygia guttata) Behav. Proc. 2019;162:147–156. doi: 10.1016/j.beproc.2019.02.007. [DOI] [PubMed] [Google Scholar]
  • 45.Moore BC. Auditory Processing of Temporal Fine Structure: Effects of Age and Hearing Loss. Singapore: World Scientific; 2014. [Google Scholar]
  • 46.Dooling RJ, Prior NH. Do we hear what birds hear in birdsong? Anim. Behav. 2017;124:283–289. doi: 10.1016/j.anbehav.2016.10.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Prior NH, Smith E, Lawson S, Ball GF, Dooling RJ. Acoustic fine structure may encode biologically relevant information for zebra finches. Sci. Rep. 2018;8:6212. doi: 10.1038/s41598-018-24307-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Lawson SL, Fishbein AR, Prior NH, Ball GF, Dooling RJ. Relative salience of syllable structure and syllable order in zebra finch song. Anim. Cogn. 2018;21:467–480. doi: 10.1007/s10071-018-1182-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Zann RA. The Zebra Finch: A Synthesis of Field and Laboratory Studies. Oxford: Oxford University Press; 1996. [Google Scholar]
  • 50.Tchernichovski O, Nottebohm F, Ho CE, Pesaran B, Mitra PP. A procedure for an automated measurement of song similarity. Anim. Behav. 2000;59:1167–1176. doi: 10.1006/anbe.1999.1416. [DOI] [PubMed] [Google Scholar]
  • 51.Cannam, C., Landone, C. & Sandler, M. Sonic Visualiser: An open source application for viewing, analysing, and annotating music audio files. In Proc. 18th ACM International Conference on Multimedia, 1467–1468 (2010).
  • 52.James LS, Sakata JT. Learning biases underlie “universals” in avian vocal sequencing. Curr. Biol. 2017;27:3676–3682. doi: 10.1016/j.cub.2017.10.019. [DOI] [PubMed] [Google Scholar]
  • 53.Lachlan RF, Van Heijningen CA, Ter Haar SM, Ten Cate C. Zebra finch song phonology and syntactical structure across populations and continents—A computational comparison. Front. Psychol. 2016;7:980. doi: 10.3389/fpsyg.2016.00980. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Marler P. Nature's Music. Amsterdam: Elsevier; 2004. pp. 1–38. [Google Scholar]
  • 55.Dooling RJ, Leek MR, Gleich O, Dent ML. Auditory temporal resolution in birds: Discrimination of harmonic complexes. J. Acoust. Soc. Am. 2002;112:748–759. doi: 10.1121/1.1494447. [DOI] [PubMed] [Google Scholar]
  • 56.Fishbein AR, Lawson SL, Dooling RJ, Ball GF. How canaries listen to their song: Species-specific shape of auditory perception. J. Acoust. Soc. Am. 2019;145:562–574. doi: 10.1121/1.5087692. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Vernaleo BA, Dooling RJ. Relative salience of envelope and fine structure cues in zebra finch song. J. Acoust. Soc. Am. 2011;129:3373–3383. doi: 10.1121/1.3560121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Lohr B, Dooling RJ, Bartone S. The discrimination of temporal fine structure in call-like harmonic sounds by birds. J. Comp. Psychol. 2006;120:239. doi: 10.1037/0735-7036.120.3.239. [DOI] [PubMed] [Google Scholar]
  • 59.Oetjen, H., Bovee, S., Steenken, F., Koppl, C., Klump, G. M. In ARO (San Jose, 2020).
  • 60.Perez EC, et al. Corticosterone triggers high-pitched nestlings’ begging calls and affects parental behavior in the wild zebra finch. Behav. Ecol. 2016;27:1665. [Google Scholar]
  • 61.Boucaud I, Perez EC, Ramos LS, Griffith SC, Vignal C. Acoustic communication in zebra finches signals when mates will take turns with parental duties. Behav. Ecol. 2017;28:645–656. doi: 10.1093/beheco/arw189. [DOI] [Google Scholar]
  • 62.Boucaud I, Mariette M, Villain A, Vignal C. Vocal negotiation over parental care? Partners adjust their time spent incubating based on their acoustic communication at the nest. Biol. J. Linnean Soc. 2015;117:322–336. doi: 10.1111/bij.12705. [DOI] [Google Scholar]
  • 63.Woolley SC, Doupe AJ. Social context-induced song variation affects female behavior and gene expression. PLoS Biol. 2008;6:e62. doi: 10.1371/journal.pbio.0060062. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Järvinen-Pasley A, Pasley J, Heaton P. Is the linguistic content of speech less salient than its perceptual features in autism? J. Autism Dev. Disord. 2008;38:239–248. doi: 10.1007/s10803-007-0386-0. [DOI] [PubMed] [Google Scholar]
  • 65.Anikin A, Bååth R, Persson T. Human non-linguistic vocal repertoire: Call types and their meaning. J. Nonverb. Behav. 2018;42:53–80. doi: 10.1007/s10919-017-0267-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Spierings MJ, ten Cate C. Zebra finches are sensitive to prosodic features of human speech. Proc. R. Soc. B Biol. Sci. 2014;281:20140480. doi: 10.1098/rspb.2014.0480. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Toledo LF, et al. The anuran calling repertoire in the light of social context. Acta Ethol. 2015;18:87–99. doi: 10.1007/s10211-014-0194-4. [DOI] [Google Scholar]
  • 68.Wells KD, Schwartz JJ. Vocal communication in a neotropical treefrog, Hyla ebraccata: Advertisement calls. Anim. Behav. 1984;32:405–420. doi: 10.1016/S0003-3472(84)80277-8. [DOI] [Google Scholar]
  • 69.Filippi P. Emotional and interactional prosody across animal communication systems: A comparative approach to the emergence of language. Front. Psychol. 2016;7:1393. doi: 10.3389/fpsyg.2016.01393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Klump GM, Dooling RJ, Fay R, Stebbins WC. Methods in Comparative Psychoacoustics. Basel: Birkhäuser; 2013. [Google Scholar]
  • 71.Gescheider GA. Psychophysics: Method, Theory, and Application. Mahwah: Lawrence Erlbaum; 1985. [Google Scholar]
  • 72.Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823 (2014).
  • 73.R: A Language and Environment for Statistical Computing v.3.6.3. (R Foundation for Statistical Computing, Vienna, Austria, 2020). https://www.R-project.org.
  • 74.Bartoń, K. MuMIn: Multi-Model Inference. R package version 1.43.6. https://CRAN.R-project.org/package=MuMIn (2019).
  • 75.Fox, J. & Weisberg, S. An {R} Companion to Applied Regression, 2nd edn. http://socserv.socsci.mcmaster.ca/jfox/Books/Companion (Sage, Thousand Oaks, CA, 2011).
