Abstract
Here we show how a migratory songbird, the chipping sparrow (Spizella passerina), achieves prompt and precise vocal imitation. Juvenile chipping sparrow males develop five to seven potential precursor songs; the normal development of these songs requires intact hearing but not imitation from external models. The potential precursor songs conform with general species-typical song parameters but differ from the song of wild, adult territorial males. As chipping sparrow males return from migration to start their first breeding season, they settle close to an older adult. The young male then stops producing all but one of its precursor songs, retaining the one that most resembles that of its neighbor. This single song then becomes more variable and, in a matter of days, is altered to closely match the neighbor's song. This elegant solution ensures species specificity and promptness of imitation.
Keywords: auditory feedback, chipping sparrows, precursor song, sensitive period, vocal learning
It is thought that birds that imitate their song first memorize an external model and then use auditory feedback to gradually modify their vocal output until it matches the memorized model (1). This developmental process can take weeks or months in juveniles because the point of departure can bear little semblance to the final imitation (2–5). In many seasonal, migratory songbirds, however, vocal learning often has to be mastered in a shorter period. For example, the song of a yearling migratory male often closely matches that of an adult close neighbor first encountered when the yearling migrant arrives at the spring breeding grounds (6–10). How such fast and precise song matching is achieved remains controversial (11–13). One way to do it would be for a juvenile to imitate a diversity of external models during its hatching year and then during the following spring retain only the imitation that best matches the song of a neighbor with whom it interacts (8, 11, 14, 15). However, songs acquired during the hatching year might not come close to the song of breeding-season neighbors. Here we describe a learning program that can achieve a fast yet precise imitation of the song of a neighbor first encountered during a young male's first breeding season. This program develops a set of “potential precursor songs” that can be altered later in life to precisely match an external model.
We encountered this program while studying the song ontogeny of a seasonal, migratory songbird, the chipping sparrow (Spizella passerina). Adult male sparrows sing only one song type during the breeding season; each song type consists of the repetition of a single syllable (10) (Fig. 1). In nature, there are ≈25–30 different syllable types [supporting information (SI) Fig. 7] distributed across the geographic range of the species (10). The very simple but diverse song of the chipping sparrow provides a great model to examine the detailed process of song ontogeny. A previous field study suggested that the simple song is most likely acquired by precise imitation of the song of an immediate adult neighbor at the breeding grounds during the juvenile's hatching summer or the following spring (10).
Results and Discussion
Song ontogeny in juvenile chipping sparrows starts at ≈1 month of age with soft and variable vocalizations referred to as “subsong” (16–18) (Fig. 2). Early in the following spring (7–9 months old), subsong gradually transforms into “plastic song,” with the appearance of recognizable song syllables and syntax. During this stage juveniles develop a repertoire of several juvenile “precursor” song types (Fig. 2). These multiple song types are delivered serially, each repeated several times before switching to a different one. We call them precursor songs because, as we shall see, in the normal course of events one of them will be modified to become the crystallized song or to match a tutor song.
To determine whether these multiple precursor song types were imitated from adult tutors, juveniles (n = 6 males) were exposed to a single adult tutor for 3 weeks during their first summer only (1.5–2 months of age). During the following spring no tutor was provided, and each juvenile produced five to six recognizable song types (Fig. 3). Four of these birds imitated the tutor song in one of their multiple song types (SI Fig. 8A); the rest of the song types did not match the song of the tutors or any wild-type adult song (Fig. 4). During early spring, at the plastic song stage (early March), the daily amount of time each juvenile spent singing increased significantly (Fig. 5A). Initially, the various song types were each given equal singing time, but within a few weeks one of these song types was produced much more frequently than the rest (Fig. 5B). This song type then crystallized as breeding approached (late April to early May), and the rest of the song types disappeared (SI Fig. 8A). The crystallized song was not necessarily the song copied from the tutor.
To examine whether the multiple precursor songs of juveniles could be produced without tutors, juveniles (n = 3 males) were visually and acoustically isolated from other birds. Each isolate produced five to six different song types in the spring (Fig. 3); these untutored multiple precursor songs were no different from uncopied songs produced by birds in the previous, “live-tutored” group [multivariate ANOVA (MANOVA), Wilks' λ = 0.917, χ2 = 185.54, P = 0.35]. The isolated birds also settled on a single adult song type (Fig. 3 and SI Fig. 8B). This crystallized song type was a modification of one of the songs in the juvenile multisong repertoire, and it did not match any of the wild-type adult songs recorded from free-ranging individuals in reproductive condition (Mann–Whitney U test, P < 0.0001).
We further tested whether auditory feedback is required to produce the multiple precursor songs by deafening juveniles at 18–28 days of age (n = 4 males). Deaf birds produced only one to three different song types during the plastic song stage. The juvenile multisong repertoire of deaf birds was significantly different from that of hearing birds (MANOVA, Wilks' λ = 0.65, χ2 = 873.18, P = 0.009) (Fig. 3). Eventually, as in hearing birds, deaf birds culled all juvenile song types except one, which they modified and retained as adult song (SI Fig. 8C).
To test whether these precursor song repertoires occurred also in nature, we trapped juvenile sparrows (n = 2) shortly before migration (late September; 2–4 months of age) and housed each one in a sound-proof chamber without hearing or interacting with other birds. Each of them produced five to six precursor song types during the following spring (Fig. 3 and SI Fig. 8D). These precursor songs were very similar to those of the hand-reared, hearing birds and did not closely resemble adult wild-type song. Multisong repertoires occur also in wild adult males, although we have never heard them in breeding, territorial individuals. We established this in two ways. (i) In nature, territorial adults produce only a single song type during the breeding season. However, when they (five adults, >2 years old) were captured and treated with testosterone, so they would sing a lot and could be used as live tutors, and housed singly, each produced multiple precursor song types (Fig. 3 and SI Fig. 8D), and only one of these songs was the one used during the breeding season. (ii) Another group of four adults was caught at the end of the breeding season and housed indoors. Early in the following spring they produced a multisong repertoire that, in addition to their adult song, included four to six other precursor songs. In both cases, the precursor song types produced by the adults were much like those of juveniles.
The multiple precursor song repertoire developed in juvenile sparrows is different from the wild-type adult songs used by individuals in breeding condition based on several lines of evidence. (i) The juvenile songs of multisong repertoires, except the one copied from the tutor, did not closely match any of the wild-type adult songs (Fig. 4). When we compared the song similarity between the songs from the multisong repertoires of juveniles (from tutored, isolated, and deaf groups) and each of 30 wild-type adult songs (SI Fig. 7), the highest similarity scores ranged from 0% to 36%. When the same comparison was done for songs copied from a tutor, the similarity scores ranged from 81% to 93% (n = 7 tutored juveniles) (Fig. 4). (ii) Although the acoustic space of adult wild-type songs fell within the acoustic space defined by juvenile multisong repertoires (Figs. 2 and 3), the space of the former was significantly smaller than that of the latter (MANOVA, Wilks' λ = 0.6, χ2 = 1,367.72, P < 0.0001, one-tailed). (iii) When songs drawn from juvenile or adult song samples were played to free-living territorial adults (n = 18 males), these adults responded more aggressively (or differently) toward the speaker broadcasting wild-type adult song than toward the speaker broadcasting songs from juvenile multisong repertoires. This effect persisted even when we took a single syllable from a song in the multisong repertoire of a juvenile and iterated it as in normal adult song so that intrasong syllable variability was similar in the playbacks that used juvenile or adult material (MANOVA, F = 24.5, P = 0.002) (see SI Table 1). Overall, these results suggest that the adult and juvenile songs represent different acoustic signals.
The acoustic “space” defined by multisong repertoires was very similar in socially tutored (n = 6 birds), socially isolated (n = 3), wild-caught juveniles (n = 2) and wild-caught adults (n = 5) (MANOVA with four groups; χ2 = 132.6, P = 0.583) (Fig. 4) but differed significantly in the early-deafened birds (see above). Visual inspection of the multisong repertoire revealed that birds in the first four groups shared several song syllable types, including a high-pitched whistle (Fig. 3 and SI Fig. 8, song type a), a syllable reminiscent of “contact calls” (Fig. 3, type b), and one similar to “begging calls” (Fig. 3, type c).
Why do juvenile chipping sparrows develop a diverse song repertoire that is not acquired by imitation, that is sung profusely early in the spring, and that is not used during the breeding season? We speculated that one or more of these juvenile songs could be modified to match adult song types in the spring. To test this idea, seven juvenile males were not exposed to an adult tutor until the following spring, when each juvenile was tutored for 5 days. Soon after tutor exposure, one or two of the multiple song types in three of the tutored birds became unstable and were then modified to match that of the tutor; the other syllables remained more stable but stopped being produced over a period of 1–2 weeks. Within 5 days after first exposure to the tutor, a recognizable approximation to the tutor's song was already in place, with a stronger, stable match by day 10 (Fig. 6). Cluster analysis, discriminant function analysis, and song similarity tests (Fig. 6 and SI Fig. 9) revealed that the songs (i.e., syllables) used for modification in the three birds that imitated the external model were the ones that, before exposure to that tutor, had the closest acoustic distance to the tutor song syllable. Of the four remaining birds, one copied the song of another juvenile in an adjacent cage, which retained one of its juvenile songs, and the other two also retained one of their juvenile songs without matching their tutor (SI Fig. 9). The four birds that copied an external model also showed greater similarity between one of their own songs and the tutor song type (similarity scores 30–43%) than was the case for the four birds that did not imitate their adult tutor (similarity scores of 11–25%; Mann–Whitney U test, P = 0.03).
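The group comparison above can be illustrated with a minimal sketch of a Mann–Whitney U test with an exact two-sided P value obtained by enumerating all group assignments. The scores below are hypothetical placeholders chosen only to lie within the reported ranges (30–43% and 11–25%); they are not the measured data, and the function names are our own. Whether the original analysis used an exact or an approximate P value is not stated in the text.

```python
from itertools import combinations

def u_statistic(x, y):
    """U for sample x: number of (x, y) pairs with x > y; ties count one half."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)

def exact_two_sided_p(x, y):
    """Exact two-sided P: share of relabelings at least as extreme as observed."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    center = n1 * n2 / 2  # expected U under the null hypothesis
    observed = abs(u_statistic(x, y) - center)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(gx, gy) - center) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical similarity scores (percent), NOT the published measurements.
copiers = [30, 35, 40, 43]
non_copiers = [11, 15, 20, 25]

u = u_statistic(copiers, non_copiers)
p = exact_two_sided_p(copiers, non_copiers)
print(u, round(p, 3))
```

With two groups of four and no overlap between them, only 2 of the 70 possible assignments are as extreme as the observed one, so the exact two-sided P is 2/70 ≈ 0.029.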
Adult chipping sparrows use only a single song type during the breeding season yet develop as juveniles multiple songs. We have referred to these juvenile songs as potential precursor songs because (i) none of these songs closely matches any wild-type adult song, (ii) they are not used during the breeding season, yet (iii) any one of them, including those consisting of repetitions of syllables reminiscent of begging calls, contact calls, or high-pitched whistles, can be later modified (Fig. 3 and SI Fig. 8) to match an external model (Fig. 6). The development of a normal repertoire of multiple potential precursor songs does not rely on imitation from external models, but it does require auditory feedback and can incorporate adult songs that were first encountered and memorized during the hatching year.
The acoustic space spanned by the repertoire of potential precursor songs may provide a species-specific constraint on what songs can be imitated. The acoustic space occupied by all of the precursor songs was similar among individuals of different experimental groups. In addition, early developed vocalizations, such as begging call-like and contact call-like sounds, seemed to be incorporated into the precursor song repertoires of most juveniles, suggesting that earlier vocal experience influences subsequent vocal development. The long, high-pitched whistles that occurred in potential precursor songs were acoustically more distant from wild-type adult songs; these whistles were absent in deaf birds, and their pattern of delivery was different from that of other precursor songs (SI Fig. 9), suggesting that long, high-pitched whistles have a different role in song ontogeny.
Our observations and interpretation differ from those of earlier studies. Marler (11) suggested that either of three alternative approaches could yield good imitations of tutor song. These approaches were (i) learning by instruction, (ii) learning based on selection, and (iii) instruction followed by selection. The “instruction” learning model refers to the early memorization of an external sound that then is used to guide vocal ontogeny until that sound is matched (imitated). The “selection” model refers to the idea that innate, selective auditory mechanisms (“templates”) help select what sounds will be imitated. These templates, according to Marler (11), can be “latent” or “preactive.” Whereas latent templates are activated by early exposure to conspecific sounds, preactive templates rely on auditory feedback to guide vocal ontogeny in the absence of external models. In addition, Marler and Sherman (19) also speculated that species' typical, innate, motor mechanisms played a role in guiding the acquisition of learned song, as shown by the fact that early-deafened individuals of different species developed songs that had some species-typical features. In the third model, instruction by early exposure to external models results in the imitation of several songs; however, only a subset of these imitations—those that matched the song of neighbors—is subsequently retained, whereas the others are selectively discarded.
In chipping sparrows, the development of five to seven potential precursor songs requires intact hearing and presumably is guided by auditory feedback; by this criterion, precursor songs are learned. Yet, even after these learned songs have been mastered, vocal ontogeny is still open to “instruction” from an external source, as when one of the precursor songs is modified to match a model. We believe that this kind of vocal ontogeny sheds light on the riddle that so interested Marler (11) that the number of different sounds in the repertoire of a species that learns its song is often relatively small, as is the case for the phonemes in our own speech. This conservativeness may stem from an innate program that results from the interplay of motor and perceptual predispositions yet leaves room for instruction from external sources. It is fortuitous that chipping sparrows so clearly shed light on this relation between information initially acquired from internal and external sources that is at the center of all vocal learning. In them self-learning and learning from others occur during separate developmental stages in a way that seems well suited to the breeding ecology, such that a first-year male, just returning from its winter quarters, can settle in a territory and promptly imitate the song of a neighbor that sings a song that, perhaps, the young bird had never heard before.
Materials and Methods
Experimental Subjects.
Twenty male chipping sparrows aged 5–10 days were collected over a period of 3 years. These juveniles were hand-raised to independence at 30–36 days of age and were assigned to four experimental groups.
Group 1: Socially tutored group (n = 6 males).
Juveniles at 1.5–2 months of age were housed singly in sound-proof chambers and presented with a male, adult, chipping sparrow tutor; the door of the chamber was open so that the “pupil” could see only its “tutor” in a cage just in front of its chamber and was able to interact with it visually and vocally. Adult tutors were presented for 3 weeks during the hatching summer, and each tutor produced 7–92 songs per day. No tutor was provided during the following spring. The door of the sound-proof chamber was closed after 3 weeks of tutor exposure. Sounds were then continuously recorded for 15 h per day (0500–2000 hours), 7 days a week, until the song was crystallized (May to June at the age of 11–12 months).
Group 2: Social isolation group (n = 3 males).
In this group, juveniles of the same age as in group 1 were housed singly in sound-proof chambers. No song tutor was provided to birds in this group, and they did not see or hear other chipping sparrows. Sounds were continuously recorded as described previously.
Group 3: Deafened group (n = 4 males).
Juveniles were deafened at ≈18–28 days of age. Both cochleae were removed, as described by Konishi (1). The deaf birds were each housed in a sound-proof chamber.
Group 4.
The other seven juvenile males were used for testing song imitation in the spring. Each bird was housed in a separate cage and could see or hear other juveniles in the same room. During the following spring, an adult tutor was placed next to each juvenile for 5 days, a condition that, we hoped, would approximate that in the wild when a juvenile settles next to a singing adult.
In addition, two juveniles were caught in early fall (late September; ≈2–4 months of age) just before migration. Each bird was immediately housed in a sound-proof chamber so that it could not hear or interact with other birds. No adult tutor was presented to these birds, nor did they receive testosterone implants.
Nine adult sparrows were captured in their territories and kept in the laboratory. The age of these birds was not known but at the time of the initial recordings was at least 2 years. During the following spring or summer, four of these adults were implanted with testosterone and kept in a sound-proof chamber for song recording.
Testosterone Implant.
An incision was made in the skin on the back of each bird, and a 5-mm silastic tube (0.76-mm inner and 1.65-mm outer diameters; Dow–Corning) filled with crystalline testosterone (Sigma) and sealed with silastic medical adhesive (Dow–Corning) was implanted subdermally. The implants were removed 1 month later. The other five birds were not implanted, and their song development was continuously recorded during the following spring.
Playback Experiment.
Playbacks of adult song were prepared by randomly choosing an adult song and repeating it at the typical singing rate (five to seven songs per minute) found in wild, territorial adults (12). Nine playbacks were prepared, each using the adult song of a different wild adult male (20). The playback segments were made by using sound analysis software Raven 1.2 (Cornell University). To prepare playbacks of potential precursor songs, we used different types of precursor song (with begging-call-like syllables, contact-call-like syllables, and long, high-pitched, whistled syllables) from four different males; each of these songs had a duration of 2–3 sec and was used to make a 3-min digital segment with the same singing rate as that found in territorial adults. In addition, a single syllable from a precursor song type was iterated as in normal song to reduce the variability in syllable delivery found in juvenile song types. Altogether, 12 playbacks of precursor song types were prepared. The playback experiments were conducted at the Rockefeller University Field Research Center and at the nearby Institute for Ecosystem Studies. Each subject (n = 18 territorial males with 27 trials) heard a playback of precursor song or wild-type adult song in the morning (0900–1100 hours). We placed the playback speaker (SME AFS-A70), connected to a Sony MZNH1 hi-minidisc recorder, near the center of each focal male's territory. The volume of playbacks was set to match the volume of diurnal adult wild-type, territorial song (≈75–80 dB). During each playback trial (3 min of song playback and 7 min of silence), only one (adult or precursor) song type was played to the subject. Behavioral data included closest distance to speaker, time latency to produce first vocalization, number of vocalizations, and time period spent within 10 m of speaker. We used principal-components analysis to analyze these variables with SPSS 14.0 (SPSS).
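The timing of one playback trial can be laid out as a short sketch. It assumes evenly spaced song onsets, which the text does not specify; the function names and the default values of 6 songs per minute and 2.5 sec per song are illustrative choices within the reported ranges (five to seven songs per minute, 2–3 sec per song).

```python
def playback_onsets(segment_s=180.0, songs_per_min=6.0, song_s=2.5):
    """Evenly spaced song onsets (sec) that fit inside one playback segment.

    Even spacing is an assumption; only the singing rate is given in the text.
    """
    interval = 60.0 / songs_per_min
    if song_s > interval:
        raise ValueError("song is longer than the inter-onset interval")
    onsets = []
    k = 0
    while k * interval + song_s <= segment_s:
        onsets.append(k * interval)
        k += 1
    return onsets

def trial_duration_s(playback_s=180.0, silence_s=420.0):
    """One trial: 3 min of song playback followed by 7 min of silence."""
    return playback_s + silence_s

onsets = playback_onsets()
print(len(onsets), onsets[-1], trial_duration_s())
```

At 6 songs per minute this yields 18 songs in the 3-min segment, and each trial lasts 600 sec in total.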
Sound Analysis and Statistical Analysis.
Song imitation was defined as instances in which the song of a juvenile reached a similarity index of 72% or better when compared with the tutor song. This measure of similarity uses Euclidean distances across six sound features: pitch, FM, AM, Wiener entropy, duration, and spectral continuity (for more details, see ref. 21). Measures of similarity index were obtained by taking each of 300 syllables of a same kind from the pupil's adult song and comparing them with 300 syllables of a kind from the tutor. For these comparisons we used Sound Analysis Pro (SAP) (21), with the minimum silent gap separating syllables adjusted to 5 ms. The justification of 72% as the criterion for imitation was based on the fact that, when two blind judges looking at sonograms of pupil and tutor songs (not identified) were asked to identify which pupils copied which tutors, these positive identifications carried similarity scores of 72% or higher, whereas comparisons with songs that in their judgment were not copies yielded scores of 32% or less.
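The decision rule above can be sketched as follows. This is not the SAP similarity algorithm (see ref. 21 for that); it only illustrates a Euclidean distance across the six named features and the application of the 72% criterion. The feature keys, the per-feature scale values, and the use of the mean to summarize pairwise scores are all assumptions made for illustration.

```python
import math
from statistics import mean

# The six sound features named in the text; the key names are our own.
FEATURES = ("pitch", "fm", "am", "entropy", "duration", "continuity")
IMITATION_THRESHOLD = 72.0  # percent, the criterion stated in the text

def scaled_distance(a, b, scale):
    """Euclidean distance across the six features after per-feature scaling.

    `scale` is a hypothetical dict of per-feature spreads (e.g., standard
    deviations) so that features in different units contribute comparably.
    """
    return math.sqrt(sum(((a[f] - b[f]) / scale[f]) ** 2 for f in FEATURES))

def is_imitation(pairwise_scores, threshold=IMITATION_THRESHOLD):
    """Apply the criterion to pupil-vs-tutor similarity scores (percent).

    Summarizing the pairwise scores by their mean is an assumption.
    """
    return mean(pairwise_scores) >= threshold

# Hypothetical feature vectors for two syllables.
unit_scale = {f: 1.0 for f in FEATURES}
syllable = {f: 0.0 for f in FEATURES}
shifted = dict(syllable, pitch=3.0, duration=4.0)

print(scaled_distance(syllable, shifted, unit_scale))  # 5.0
print(is_imitation([81.0, 93.0]), is_imitation([0.0, 36.0]))
```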
Cluster Analysis.
We used cluster analysis to recognize song clusters. Recorded songs were first segmented into syllables by using SAP. We used statistical software from SAS (JMP 5.1; SAS Institute) and SPSS to classify syllables for cluster analysis. We used hierarchical clustering analysis with equal weighting on each variable and with a dendrogram output. Based on the clustering dendrogram, we then used the squared Euclidean distance to determine the distance between clusters, that is, between groups of syllables (22); the greater the Euclidean distance between two syllables, the more dissimilar the two syllables. We combined two independent ways to determine the number of clusters. (i) The approximate number of clusters was determined from the sudden jump in Euclidean distance between groups (SPSS user manual). (ii) Two blind judges visually inspected the sonograms to classify the number of clusters. In the cases where these two methods did not agree, we reexamined each of the six song features of the ambiguous clusters and then used discriminant function analysis (see below) to determine whether these acoustic features differed significantly enough for the syllables to be classified as two clusters.
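The "sudden jump" criterion in (i) can be sketched in a few lines. The sketch assumes average linkage on squared Euclidean distances, because the text does not state which linkage rule was used, and it is not the SPSS or JMP implementation; the function names and the toy two-feature points are hypothetical.

```python
from itertools import combinations

def sq_euclidean(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def average_linkage(points, c1, c2):
    """Mean squared distance over all pairs drawn from two clusters."""
    total = sum(sq_euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def merge_heights(points):
    """Agglomerate until one cluster remains; return each merge distance."""
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        (a, b), d = min(
            (((i, j), average_linkage(points, clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        heights.append(d)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return heights

def estimate_cluster_count(points):
    """Stop merging just before the largest jump in merge distance."""
    heights = merge_heights(points)
    if len(heights) < 2:
        return 1
    jumps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    k = max(range(len(jumps)), key=jumps.__getitem__)
    return len(points) - (k + 1)

# Toy data: two tight groups of hypothetical syllables in a 2-feature space.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(estimate_cluster_count(toy))  # 2
```

In practice the six acoustic features would replace the two toy dimensions, and ambiguous cases would still need the visual and discriminant checks described above.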
Discriminant Function Analysis and MANOVA.
After different song type clusters were identified by cluster analysis, we used discriminant function analysis (SPSS 14.0 and JMP5.1) to compare the potential precursor songs and wild-type adult songs, with a measure of acoustic separation between the two. We also used this analysis to separate populations of song types and to decide which of the potential precursor syllables was acoustically closer to the tutor song. For discriminant function analysis, we used the same six acoustic features as independent variables that we had used for cluster analysis. MANOVA was used to test the significance of the discriminant functions obtained. In the statistical output, Wilks' λ was used to determine whether the discriminant model as a whole was significant, or what song features were significant when discriminating between two or more groups, information that we could then use to compare with the similarity scores obtained by SAP.
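As a minimal sketch of the statistic named above, Wilks' λ can be computed as the ratio det(W)/det(T), where W is the pooled within-group scatter matrix and T is the total scatter matrix; values near 0 indicate well-separated groups and values near 1 indicate no separation. The χ2 value uses Bartlett's approximation, χ2 = −[N − 1 − (p + g)/2] ln λ with p(g − 1) degrees of freedom; that the statistical packages reported exactly this approximation is an assumption. All function names and the toy data are hypothetical.

```python
import math

def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter_matrix(rows, center):
    """Sum of outer products of deviations from `center`."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - c for x, c in zip(row, center)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def wilks_lambda(groups):
    """det(W) / det(T) for a list of groups of feature vectors."""
    all_rows = [row for g in groups for row in g]
    total = scatter_matrix(all_rows, mean_vector(all_rows))
    p = len(total)
    within = [[0.0] * p for _ in range(p)]
    for g in groups:
        s = scatter_matrix(g, mean_vector(g))
        for i in range(p):
            for j in range(p):
                within[i][j] += s[i][j]
    return determinant(within) / determinant(total)

def bartlett_chi2(lam, n_total, n_features, n_groups):
    """Bartlett's chi-square approximation and its degrees of freedom."""
    chi2 = -(n_total - 1 - (n_features + n_groups) / 2) * math.log(lam)
    return chi2, n_features * (n_groups - 1)

# Toy data: two hypothetical groups in a 2-feature space.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
far = [(x + 10, y + 10) for x, y in square]
lam = wilks_lambda([square, far])
print(round(lam, 4), bartlett_chi2(lam, 8, 2, 2))
```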
ACKNOWLEDGMENTS.
We thank Tim Gardner for providing his sound recording software and for his comments on the manuscript. Peter Marler, Mark Konishi, Mike Beecher, Heather Williams, and Michale Fee read our manuscript and made helpful suggestions. We thank Debbie Gardner and Dania Vallinova for helping hand-rear juvenile birds, and we thank Daun Jackson, Sharon Sepe, and Helen Ecklund for birdkeeping. The Institute of Ecosystem Studies provided the land for some of the field work. This work was supported by National Institutes of Mental Health Grant MH 18343 (to F.N.) and a Li Memorial Fellowship (to W.-c.L.).
Footnotes
The authors declare no conflict of interest.
This article contains supporting information online at www.pnas.org/cgi/content/full/0710067104/DC1.
References
- 1. Konishi M. Zeit Tierpsychol. 1965;22:770–783.
- 2. Immelmann K. In: Bird Vocalization. Hinde RA, editor. London: Cambridge Univ Press; 1969. pp. 61–74.
- 3. Tchernichovski O, Mitra PP, Lints T, Nottebohm F. Science. 2001;291:2564–2569. doi: 10.1126/science.1058522.
- 4. Marler P. In: Imprinting and Cortical Plasticity. Rauschecker JP, Marler P, editors. New York: Wiley; 1987. pp. 99–135.
- 5. Baptista LF, Petrinovich L. Anim Behav. 1984;32:172–181.
- 6. Kroodsma DE. Zeit Tierpsychol. 1974;22:770–783.
- 7. Payne RB. Anim Behav. 1981;29:688–697.
- 8. Nelson DA. Behav Ecol Sociobiol. 1992;30:415–424.
- 9. Bell DA, Trail PW, Baptista LF. Anim Behav. 1998;55:939–956. doi: 10.1006/anbe.1997.0644.
- 10. Liu W-C, Kroodsma D. Condor. 2006;108:509–517.
- 11. Marler P. J Neurobiol. 1997;33:501–516.
- 12. Nelson DA. In: Social Influences on Vocal Development. Snowdon CT, Hausberger M, editors. Cambridge, UK: Cambridge Univ Press; 1997. pp. 7–22.
- 13. Kroodsma DE. In: Nature's Music. Marler P, Slabbekoorn H, editors. San Diego: Elsevier Academic; 2004. pp. 108–131.
- 14. Nelson DA, Marler P. Proc Natl Acad Sci USA. 1994;91:10498–10501. doi: 10.1073/pnas.91.22.10498.
- 15. Nordby JC, Campbell SE, Beecher MD. Anim Behav. 2007 doi: 10.1006/anbe.1999.1304. in press.
- 16. Thorpe WH. Ibis. 1955;97:247–251.
- 17. Marler P, Peters S. In: Acoustic Communication in Birds. Kroodsma DE, Miller EH, editors. Vol 2. New York: Academic; 1982. pp. 135–147.
- 18. Nottebohm F. J Exp Zool. 1972;179:35–50.
- 19. Marler P, Sherman V. J Neurosci. 1983;3:517–531. doi: 10.1523/JNEUROSCI.03-03-00517.1983.
- 20. Kroodsma DE. Anim Behav. 1990;40:1138–1150.
- 21. Tchernichovski O, Nottebohm F, Ho CE, Pesaran B, Mitra PP. Anim Behav. 2000;59:1167–1176. doi: 10.1006/anbe.1999.1416.
- 22. Sokal RR, Rohlf FJ. Biometry. 3rd Ed. New York: Freeman; 1995.