Abstract
Neural sequences are a fundamental feature of brain dynamics underlying diverse behaviors, but the mechanisms by which they develop during learning remain unknown. Songbirds learn vocalizations composed of syllables; in adult birds, each syllable is produced by a different sequence of action potential bursts in the premotor cortical area HVC. Here we carried out recordings of large populations of HVC neurons in singing juvenile birds throughout learning to examine the emergence of neural sequences. Early in vocal development, HVC neurons begin producing rhythmic bursts, temporally locked to a ‘prototype’ syllable. Different neurons are active at different latencies relative to syllable onset to form a continuous sequence. Through development, as new syllables emerge from the prototype syllable, initially highly overlapping burst sequences become increasingly distinct. We propose a mechanistic model in which multiple neural sequences can emerge from the growth and splitting of a common precursor sequence.
Introduction
Sequences of neural activity have been observed during various behaviors, including navigation1–4, short-term memory5–7, decision making8,9, and complex movements10,11, suggesting that neural sequences are a fundamental form of brain dynamics12,13. However, the circuit mechanisms underlying the generation of neural sequences and their development during learning are not well understood.
The songbird is a good model system to address such questions because the song produced by adults is learned during development14–18. Furthermore, adult song is associated with neural sequences in nucleus HVC (used as a proper name)19–24, a premotor cortical area necessary for the production of stereotyped adult song25–30. Most projection neurons in HVC generate a brief burst of spikes at one specific time in the song motif and different neurons are active at different times in the song19–24,30; thus, distinct syllable types are produced by largely non-overlapping neural sequences in HVC. Here we ask how these different neural sequences are constructed during vocal development.
Zebra finches acquire their stereotyped song through a gradual learning process14,31. Young birds initially produce a highly variable ‘subsong’31, akin to human babbling15. Birds then enter the protosyllable stage as they begin to incorporate syllables of a characteristic ~100 ms duration32–35. This is followed by the gradual emergence of multiple syllable types32,33,36, and a final ‘motif’ stage in which syllables are produced in a reliable sequence. While HVC activity is not required for subsong27,34,35, it is required for song components in all later stages, including protosyllables, emerging syllable types, and adult song25–28,34,35.
Developmental progression of HVC activity
To elucidate the mechanisms by which neural sequences in HVC develop, we recorded from populations of HVC projection neurons in juvenile and adult birds (n=1,150 neurons, 35 birds; Extended Data Fig. 1a). At all stages of vocal development, HVC projection neurons generated brief bursts of spikes during singing (Fig. 1a–c, Extended Data Fig. 1b–c). In the subsong stage (n=12 birds; defined by an exponential distribution of syllable durations, prior to the emergence of protosyllables) roughly half of the neurons generated bursts not temporally locked to syllable onsets (Extended Data Fig. 1d), while the other half produced bursts that tended to occur at a particular latency relative to subsong syllable onsets (Fig. 1a, Extended Data Fig. 1e–i; 19/39 neurons exhibited syllable locking). The fraction of neurons locked to syllable onsets increased gradually and significantly throughout vocal development (Fig. 1f; correlation with song stage: r=0.22, P<10⁻¹⁰; see Methods) until, in adult birds, virtually every projection neuron generated bursts precisely locked to syllables, as previously described19–24.
Song development is characterized by a gradual change in song rhythm33,37,38. The subsong stage, with little evidence of rhythmic song structure, ends with the emergence of a rhythmically produced protosyllable (5–10 Hz)32–35. This is followed by a subsequent increase in the period between repetitions of the same sound, attributable to the addition of new song syllables33. HVC exhibited parallel changes in rhythmicity. In the subsong stage, most projection neurons did not burst rhythmically (Fig. 1a, f; 3/39 neurons were rhythmic). In the protosyllable stage, roughly half of the projection neurons generated rhythmic bursts (5–10 Hz) (Fig. 1b,f; 70/135 neurons were rhythmic; period 169 ± 6.4 ms, mean ± s.e.m.). Such bursts were typically locked to rhythmic protosyllables, but were also commonly observed during less rhythmic portions of the song, particularly early in the protosyllable stage (Extended Data Fig. 2a–d). On average, both the fraction of rhythmic HVC neurons and the period of the HVC burst rhythm gradually increased during the emergence of new syllable types and the formation of the song motif (Fig. 1f,g; correlation between song stage and fraction of rhythmic neurons: r=0.28, P<10⁻¹⁰; correlation between song stage and period of burst rhythm: r=0.57, P<10⁻¹⁰).
A significant fraction of projection neurons (285/1118 neurons) in juvenile birds generated bursts related to song bouts—defined as epochs of continuous singing bounded by periods of silence (see Methods). Bout-related neurons generated brief bursts of spikes immediately prior to bout onset (‘bout-onset’ neurons; 137/285 neurons) or after bout offset (98/285 neurons) (Fig. 1d,e, Extended Data Fig. 2e–l; an additional 50/285 neurons were active both before and after bouts).
Growth of a neural protosequence
We next asked how the activity of HVC projection neurons is coordinated across the neural population during protosyllables. Multiple recordings in the same bird revealed that different neurons were active at different times with respect to protosyllable onsets (Fig. 2a,b; Extended Data Fig. 1n, 9k; n=3 birds, 54 neurons), with latencies spanning the duration of the protosyllable and the intervening gap (>90% burst coverage; Extended Data Fig. 2t). These findings suggest that protosyllables are generated by a rhythmic protosequence: a repeating motor program composed of a continuous sequence of bursts in HVC.
We next examined the developmental emergence of this rhythmic protosequence. In the subsong stage (Fig. 2c; n=19 neurons, 12 birds), bursts had a significantly earlier distribution of latencies compared to the broader distribution of burst latencies in the protosyllable stage (n=104 neurons, 13 birds; P=0.02; 63% vs. 43% of bursts prior to syllable onset in subsong stage and protosyllable stage, respectively). Even though the range of latencies was narrower in subsong birds, different neurons recorded in the same bird were locked to syllable onsets at different latencies (Extended Data Fig. 1f–i). This suggests the existence of transient sequential activity, initiated just prior to syllable onset, but decaying within a few tens of milliseconds. This sequential activity appears to grow during the protosyllable stage to form longer sequences that can persist for more than a hundred milliseconds, throughout the duration of the protosyllable (Fig. 2b,c).
Sequence splitting during syllable formation
We next wondered how distinct sequences in HVC, each corresponding to a distinct adult syllable type, emerge during vocal learning. Here we hypothesize that new syllable types can emerge by the gradual splitting of a single protosequence. In this view, we imagine that the neural sequences underlying newly emerging syllable types would initially be largely overlapping, with neurons shared across the emerging syllables. Splitting would be associated with an increasing number of neurons selective for a particular emerging syllable type, and a decreasing fraction of shared neurons.
To test this hypothesis, we recorded from HVC projection neurons (n=769) in 6 juvenile birds while they acquired multiple syllable types. As a first example, we will describe changes in the HVC population activity in a bird (n=375 projection neurons; Bird 1) that developed two acoustically distinct syllable types (labeled β and γ) over the course of several days (Fig. 3a,b; β and γ eventually form adult syllables B and C, respectively). During the protosyllable stage (56–59 dph), the majority of projection neurons participated in a rhythmic protosequence (Extended Data Fig. 1n; n=14/16 neurons; e.g. Fig. 3c). After the emergence of syllable types β and γ (62–72 dph), many neurons were selectively active only during β or during γ, but not both (Fig. 3d,f; of 105 neurons active during either β or γ, 41 were β-specific and 42 were γ-specific). The bursts of these syllable-specific neurons exhibited a wide range of latencies, with spiking activity of neurons in each group spanning the entire duration of each syllable (Fig. 3g). Notably, we also observed a substantial population of neurons that were significantly active during both β and γ (n=22 ‘shared’ neurons; Fig. 3e–g). Simultaneous recordings revealed the co-occurrence, in different neurons, of shared and specific firing patterns (Fig. 3f, Extended Data Fig. 3a,b).
Shared neurons exhibited a number of striking characteristics. These neurons burst rhythmically with the same inter-burst interval as neurons recorded in the protosyllable stage (Fig. 3e, f; Extended Data Fig. 3f–j). Shared neurons were active, as a population, at a wide range of latencies within emerging syllables (Fig. 3g), and crucially, for a given shared neuron, the bursts during β occurred at a similar latency as the bursts during γ (Fig. 3g, Extended Data Fig. 4a–d). Thus, shared neurons generated the same continuous burst sequence during both β and γ. This shared sequence occurred even at times when there was a significant acoustic difference between the shared syllables (Extended Data Fig. 5). We also found that the fraction of shared neurons later in development (81–112 dph) was significantly lower compared to the earlier recordings (Fig. 3h; 10 shared and 90 specific neurons; P=0.03). Thus, the refinement of β and γ into the adult syllables B and C coincides with a decrease in the fraction of shared neurons, producing a gradual splitting of these representations into increasingly non-overlapping ‘daughter’ neural sequences.
The tendency of the song in Bird 1 to alternate between syllables β and γ means that syllable-specific neurons had an inter-burst interval, and thus a period, that was twice as long as that observed in the earlier protosyllable stage (Fig. 3c–f, Extended Data Fig. 3f–j). Therefore, the increase in the period of neural activity through skipping or alternating cycles of an underlying rhythm appears to be a basis of the increase in song period during vocal learning33.
Although our key findings are described above for Bird 1, a similar pattern of HVC coding by shared and specific neurons was seen in a total of 6 birds for which recordings were made during the emergence of multiple syllable types (Birds 1–6; 185 shared neurons and 496 specific neurons for 8 syllable pairs analyzed). Across three birds in which neurons were also recorded in later song stages, there was a significant decrease in the fraction of shared neurons during syllable development (n=5 syllable pairs; P=3×10⁻⁶; Birds 1, 2, 4). Neurons exhibiting an increased burst period by skipping cycles of an underlying rhythm were observed in 4 of the 6 birds (Birds 1, 3, 4, 6).
Splitting in other learning strategies
Behavioral studies have shown that new syllable types can emerge using several distinct developmental strategies32,33,36,39,40. The bird described above (Bird 1) used the ‘serial repetition’ strategy32 and ‘sound differentiation in situ’33 to develop two new syllables by alternating increasingly different variants of the protosyllable. Alternatively, birds can acquire multiple syllables simultaneously to form an entire motif (‘motif strategy’)32, or form new syllables at bout edges (onset or offset)39,40. We wondered if the splitting of neural sequences underlies these other strategies.
Neural recordings were obtained in three birds (Birds 1, 2, 5) that exhibited bout-onset syllable formation. We focus here on Bird 2, in which projection neurons were recorded throughout song development (57–84 dph). Tracking of syllable structure (Extended Data Fig. 6) revealed that syllables A and B of the adult song derived from a common, rhythmically repeated protosyllable (labeled α; Fig. 4a,b), and that syllable B arose from the first repetition of α at bout onset (Fig. 4c,d). This bout-onset syllable emerged as a distinct syllable type (labeled β) by fusion of this first α with a brief vocal element ε at bout onset (Fig. 4c,d).
To examine the neural mechanisms underlying the emergence of the new syllable β at bout onsets, we analyzed the firing patterns of 125 HVC projection neurons. Before the emergence of syllable β, the majority of recorded projection neurons participated in a rhythmic protosequence (Fig. 2b; n=28/35 neurons; 57–64 dph). A different subset of neurons was active at bout onsets (Fig. 4c; 4 of 35 neurons). After the emergence of β at bout onsets, roughly half of projection neurons generated bursts during both syllables α and β (65–72 dph; Fig. 4d,e; n=22 ‘shared’ neurons; 21 ‘specific’ neurons). These shared neurons produced nearly identical sequences during these two syllables (Fig. 4h, Extended Data Fig. 4c). Later in song development (73–84 dph), we observed a larger fraction of syllable-specific neurons (Fig. 4f,g,i; n=28 ‘specific’ neurons), and a correspondingly smaller fraction of shared neurons (4 ‘shared’ neurons; P=5×10⁻⁴), consistent with a gradual splitting of the protosequence into increasingly non-overlapping ‘daughter’ sequences. Evidence for sequence splitting during bout-onset differentiation was also observed in Birds 1 and 5 (Extended Data Fig. 7).
Note that the bout-onset differentiation in Bird 1 occurred after the earlier emergence of the syllables β and γ (Fig. 3), suggesting that new syllables may emerge in a hierarchical process—that is, by the splitting of sequences that are themselves the daughters of an earlier splitting process (Extended Data Fig. 7).
We examined whether neural sequence splitting also underlies the ‘motif strategy’ of song learning in two birds (Birds 3, 4; Extended Data Fig. 8, 9). In both birds, neural recordings showed the existence of rhythmically bursting neurons in the protosyllable stage (Extended Data Fig. 8e, 9e,f). After the emergence of multiple syllable types, every syllable in the emerging motifs had at least one neuron that was shared with another syllable at similar latencies (Extended Data Fig. 8f–j, 9g–o), consistent with the view that all of these syllables arose from the simultaneous splitting of a common protosequence.
Mechanistic Model and Discussion
Here, we propose a mechanistic model of learning in the HVC network to describe how sequences emerge during song development. This model is based on the idea that sequential bursting results from the propagation of activity through a continuous synaptically-connected chain of neurons within HVC21,41–47. It also captures non-uniformities such as increased burst density at syllable onsets, formulated in a perspective of HVC function emphasizing vocal gestures22.
Modeling studies have shown that a combination of two synaptic plasticity rules—spike-timing dependent plasticity (STDP) and heterosynaptic competition—can transform a randomly connected network into a feedforward synaptically-connected chain that generates sparse sequential activity43,44. We hypothesize that the same mechanisms can lead to the formation of a single chain that generates a rhythmic protosyllable, followed by the splitting of this chain into multiple daughter chains for different syllable types. To test this hypothesis, we constructed a simple network of binary units representing HVC projection neurons44.
The model neurons are initially connected with random excitatory weights, representing the subsong stage. We hypothesize that a subset of HVC neurons receives an external input at syllable onsets and serves as a seed from which chains grow during later learning stages43,45. Before learning, activation of these seed neurons produced a transiently propagating sequence of network activity that decayed rapidly (within tens of milliseconds; Fig. 5a).
In the next stage, the network is trained to produce a single protosyllable by activating seed neurons rhythmically (100 ms period). The connections are modified according to the learning rules described above43,44. As a result, connections were strengthened along the population of neurons sequentially activated after syllable onsets, resulting in the growth of a feedforward synaptically-connected chain that supported stable propagation of activity (Fig. 5b).
We found that this single chain could be induced to split into two daughter chains by dividing the seed neurons into two groups activated on alternate cycles of the rhythm (Fig. 5c,d, Supplementary Video 1). Local inhibition48 and synaptic competition were also increased (see Methods). During the splitting process, we observed neurons specific to each of the emerging syllable types, as well as shared neurons that were active at the same latencies in both syllable types (Fig. 5c). Just as observed in our data, the distribution of burst latencies in the model continued to broaden (Fig. 5e), and the fraction of shared neurons decreased during development (Fig. 5c,d). The average period of rhythmic bursting in model neurons increased during chain splitting as neurons became ‘specific’ for one emerging syllable type and began to participate only on alternate cycles of the protosyllable rhythm (Fig. 5d, Extended Data Fig. 10g,h).
Other strategies for syllable formation
Our model can reproduce other strategies by which birds learn new syllable types. We implemented bout-onset differentiation in the model by also including a population of seed neurons activated at bout onsets (cf. Fig. 1d, 4c; Extended Data Fig. 10a). This caused the protosyllable chain to split in such a way that one daughter chain was reliably activated only at bout onsets, while the other daughter chain was active only on subsequent syllables (Extended Data Fig. 10a–d, Supplementary Video 2). Our model was also able to simulate the simultaneous emergence of a three-syllable motif (‘motif strategy’) by dividing the seed neurons into three subpopulations (Extended Data Fig. 10e–h).
Our data and modeling support the possibility of syllable formation by mechanisms other than sequence splitting. For example, in several birds, a short vocal element emerged at bout onsets that did not appear to differentiate acoustically from the protosyllable (and thus was not bout-onset differentiation; e.g. ‘E’ in Bird 1, Extended Data Fig. 7a; or ‘C’ in Bird 2, Extended Data Fig. 6a,b). We found that by using different learning parameters, our model allows bout-onset seed neurons to induce the formation of a new syllable chain at bout onset, rather than inducing bout-onset differentiation (Extended Data Fig. 10i–k).
In summary, our model of learning in a simple sequence-generating network captures transformations that underlie the formation of new syllable types via a diverse set of learning strategies.
Why sequence splitting?
The process of splitting a prototype neural sequence allows learned components of a prototype motor program to be reused in each of the daughter motor programs. For example, one of the earliest aspects of vocal learning is the coordination between singing and breathing35, specifically, the alternation between vocalized expiration and non-vocalized inspiration typical of adult song49. The protosequence in HVC would allow the bird to learn the appropriate coordination of respiratory and vocal musculature. Duplication of the protosequence through splitting would result in two ‘functional’ daughter sequences, each already capable of proper vocal/respiratory coordination, and each suitable as a substrate for rapid learning of a new syllable type.
This proposed mechanism resembles a process thought to underlie the evolution of novel gene functions: gene duplication followed by divergence through independent mutations50. Similarly, for the acquisition of complex behaviors, the duplication of neural sequences by splitting, followed by independent differentiation through learning, may provide a mechanism for constructing complex motor programs.
Full Methods
Animals
We used juvenile male zebra finches (Taeniopygia guttata) 44–112 days post hatch (dph) singing undirected song (n=32 birds). Animals were not divided into experimental groups; thus, randomization and blinding were not necessary. No statistical methods were used to predetermine sample size. Birds were obtained from the Massachusetts Institute of Technology zebra finch breeding facility (Cambridge, Massachusetts). The care and experimental manipulation of the animals were carried out in accordance with guidelines of the National Institutes of Health and were reviewed and approved by the Massachusetts Institute of Technology Committee on Animal Care.
All the juvenile birds were raised by their parents in individual breeding cages until 38 ± 5.2 dph (mean ± s.d.) when they were removed and were singly housed in custom-made sound isolation chambers (maintained on a 12:12 hour day-night schedule). In a subset of the birds (Bird 1, 2, 4), additional tutoring was carried out after removal from the breeding cages to facilitate song imitation. This was done by playback of the tutor song through a speaker (20 bouts per day). Additional tutoring was done for 12 days for Bird 1, 7 days for Bird 2, and 18 days for Bird 4. Bird identification key: Bird 1, to3965; Bird 2, to3779; Bird 3, to3017; Bird 4, to5640; Bird 5, to3396; Bird 6, to2309; Bird 7, to3412; Bird 8, to3567; Bird 9, to2462; Bird 10, to2331; Bird 11, to2427; Bird 12, to3352.
To compare the activity of HVC projection neurons in juvenile birds with that of adult birds, we also included neurons recorded in adults (>120 dph, n=3 birds) which included a reanalysis of previously published HVC recordings performed in adult male zebra finches singing directed song20.
Song recordings
Songs were recorded with Sound Analysis Pro51 or custom-written MATLAB software (A. Andalman), which was configured to ensure triggering of recordings on all quiet vocalizations of juvenile birds27. The vertical axis range for all spectrograms is 500–8000 Hz.
Classification of song stages
We classified each day of juvenile singing into four song stages: subsong stage, protosyllable stage, multi-syllable stage, and motif stage (Extended Data Fig. 1a). Subsong stage (48 ± 4 dph, median ± inter-quartile range) is defined as having a syllable duration distribution well-fit by an exponential distribution34,35, with an upper limit for the Lilliefors goodness-of-fit statistic of 6. Following the subsong stage, birds enter the protosyllable stage (58 ± 10 dph, median ± i.q.r.) characterized by the presence of syllables with consistent timing reflected in a peak in the distribution of syllable durations32–35. The onset of the protosyllable stage was defined here as the first day on which the syllable duration distribution deviated from an exponential distribution (Lilliefors goodness-of-fit statistic greater than 6). Following the protosyllable stage, birds transition to the multi-syllable stage (62 ± 12 dph, median ± i.q.r.) in which multiple distinct syllable types are visible in the song spectrogram and as multiple clusters in a scatter plot of syllable features52 (e.g. Fig. 3a, b; 62 dph). The motif stage (73 ± 21 dph, median ± i.q.r.) was defined by the production of a sequence of syllables in a relatively fixed order31. Finally, songs recorded in birds older than 120 dph were assigned to the adult stage. A slightly older cutoff than the typical definition of adulthood in zebra finches (~90 dph)14 was used, because some of our birds in the 90–120 dph range continued to undergo some small developmental changes, as has been reported31.
Syllable segmentation and bout extraction
Syllable segmentation of the juvenile song was done based on the song power in a spectral band between 1–4 kHz, as described previously27,34,35. In a few cases, cutoff frequencies of the band-pass filters were adjusted to avoid the inclusion of high-frequency inspiratory sounds35,53. Introductory notes were removed manually to avoid including HVC neurons that are rhythmically active during these elements54. Song bouts were defined as a sequence of syllables separated by gaps less than 300 ms35. Bout onset was defined as the onset of the first syllable in the bout, and bout offset was defined as the offset of the last syllable in the bout.
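The bout definition above can be illustrated with a minimal sketch. It assumes syllables have already been segmented into (onset, offset) pairs in seconds; the function names `extract_bouts` and `bout_edges` are hypothetical and are not taken from the original analysis code.

```python
from typing import List, Tuple

Syllable = Tuple[float, float]  # (onset_s, offset_s); this data layout is an assumption


def extract_bouts(syllables: List[Syllable], max_gap_s: float = 0.300) -> List[List[Syllable]]:
    """Group syllables into bouts; a silent gap of max_gap_s or longer starts a new bout."""
    bouts: List[List[Syllable]] = []
    for syl in sorted(syllables):  # order by onset time
        previous_offset = bouts[-1][-1][1] if bouts else None
        if previous_offset is not None and syl[0] - previous_offset < max_gap_s:
            bouts[-1].append(syl)  # gap shorter than 300 ms: same bout
        else:
            bouts.append([syl])  # first syllable, or gap too long: new bout
    return bouts


def bout_edges(bout: List[Syllable]) -> Tuple[float, float]:
    """Bout onset is the onset of the first syllable; bout offset is the offset of the last."""
    return bout[0][0], bout[-1][1]
```

For example, syllables at 0.0–0.1 s and 0.15–0.25 s fall into one bout, while a syllable starting at 0.7 s opens a new one because the preceding gap is 450 ms.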
Syllable segmentation based on the song rhythmicity (‘phase segmentation’)
For Bird 3 (‘motif strategy’), it was difficult to segment syllables consistently using previous methods based on setting a threshold on the sound amplitude27,34,35. To overcome this limitation, we segmented syllables based on the phase of the rhythmicity in the song (‘phase segmentation’). The song rhythm spectrum, defined as the spectrum of the sound amplitude during singing38, exhibited a peak around 9 Hz (Extended Data Fig. 8c). To estimate the instantaneous phase of this rhythm, we first band-pass filtered the sound amplitude (Extended Data Fig. 8c, d; second-order IIR resonator filter with peak at 9 Hz and −3 dB half-bandwidth of 3 Hz; MATLAB command iirpeak). The band-pass filtered signal was then processed using the Hilbert transform (MATLAB command hilbert) to compute the instantaneous amplitude and phase (Extended Data Fig. 8d). Next, we set a threshold on this instantaneous amplitude to find the rhythmic part of the song. Finally, within this rhythmic part, song was segmented by detecting threshold crossings of the instantaneous phase (Extended Data Fig. 8d, bottom). Phase segments that contained no sounds or contained calls were manually removed. Similarly, phase segmentation (band-pass filter with peak at 10 Hz and half-bandwidth of 3 Hz) was used to segment the song during the protosyllable stage for Bird 4 (Extended Data Fig. 9a, e, f). Note that this method is best suited for segmenting songs that are rhythmic, but in which syllable boundaries are not strongly rhythmic. This appeared to be typical of birds employing the ‘motif strategy’32.
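The phase-segmentation steps can be sketched in a few lines of standard-library Python. This is an illustration under several assumptions, not the original MATLAB pipeline: the band-pass stage uses a generic second-order biquad (the widely used "audio EQ cookbook" band-pass form) as a stand-in for `iirpeak`; the 6 Hz bandwidth is one reading of the stated 3 Hz half-bandwidth; the analytic signal is computed with a direct O(n²) DFT suitable only for short demo signals; and the amplitude and phase thresholds are arbitrary placeholder values. All function names are hypothetical.

```python
import cmath
import math


def bandpass_biquad(x, fs, f0, bw):
    """Second-order band-pass with unity gain at f0 (stand-in for MATLAB's iirpeak)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * (f0 / bw))  # Q = f0 / bandwidth
    b0, b2 = alpha, -alpha  # b1 is zero for this filter
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def analytic_signal(x):
    """Analytic signal via a direct DFT: keep DC/Nyquist, double positive, zero negative frequencies."""
    n = len(x)
    out = [0j] * n
    for k in range(n):
        if k == 0 or 2 * k == n:
            gain = 1.0
        elif 2 * k < n:
            gain = 2.0
        else:
            continue  # negative frequencies are discarded
        coeff = gain / n * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for t in range(n):
            out[t] += coeff * cmath.exp(2j * math.pi * k * t / n)
    return out


def phase_segment_onsets(amplitude, fs, f0=9.0, bw=6.0, amp_thresh=0.1, phase_thresh=0.0):
    """Times (s) where the instantaneous phase crosses phase_thresh upward in rhythmic song."""
    z = analytic_signal(bandpass_biquad(amplitude, fs, f0, bw))
    onsets = []
    for i in range(1, len(z)):
        p0, p1 = cmath.phase(z[i - 1]), cmath.phase(z[i])
        is_rhythmic = abs(z[i]) >= amp_thresh  # instantaneous-amplitude threshold
        if is_rhythmic and p0 < phase_thresh <= p1 and (p1 - p0) < math.pi:
            onsets.append(i / fs)
    return onsets
```

Applied to a sound-amplitude envelope that oscillates at 9 Hz, this yields roughly one segment boundary per rhythm cycle (about every 111 ms). A practical implementation would use an FFT-based Hilbert transform rather than the quadratic-time DFT shown here.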
Syllable classification and labeling
Protosyllables were defined by their characteristic durations as has been described previously34,35. In short, to identify the protosyllables, we first subtracted the best-fit exponential distribution (using 200–400 ms) from the syllable duration distribution, and fitted a Gaussian distribution to this residual. Protosyllables were defined as syllables having durations within two standard deviations from the mean of this Gaussian distribution. We labeled protosyllables using the Greek letter ‘α’ in all our birds for consistency.
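A minimal sketch of this procedure follows, operating on a pre-computed histogram of syllable durations. Several details are assumptions made for illustration only: the exponential is fitted by log-linear least squares over the 200–400 ms bins, negative residuals are clipped to zero, and the Gaussian parameters are estimated from the weighted mean and standard deviation of the residual rather than by a nonlinear fit. The function name `protosyllable_window` is hypothetical.

```python
import math


def protosyllable_window(centers_ms, counts, tail_ms=(200.0, 400.0), n_sd=2.0):
    """Return (low_ms, high_ms, mean_ms, sd_ms) of the protosyllable duration window."""
    # 1) Fit an exponential to the tail of the distribution (log-linear least squares; assumed method).
    tail = [(c, math.log(n)) for c, n in zip(centers_ms, counts)
            if tail_ms[0] <= c <= tail_ms[1] and n > 0]
    if len(tail) < 2:
        raise ValueError("need at least two non-empty bins in the tail range")
    mean_x = sum(c for c, _ in tail) / len(tail)
    mean_y = sum(v for _, v in tail) / len(tail)
    slope = (sum((c - mean_x) * (v - mean_y) for c, v in tail)
             / sum((c - mean_x) ** 2 for c, _ in tail))
    intercept = mean_y - slope * mean_x

    # 2) Subtract the fitted exponential; clip negative residuals (assumption).
    residual = [max(0.0, n - math.exp(intercept + slope * c))
                for c, n in zip(centers_ms, counts)]
    total = sum(residual)
    if total <= 0.0:
        raise ValueError("no residual peak above the exponential fit")

    # 3) Moment-based Gaussian estimate of the residual peak (stand-in for a Gaussian fit).
    mean = sum(r * c for r, c in zip(residual, centers_ms)) / total
    sd = math.sqrt(sum(r * (c - mean) ** 2 for r, c in zip(residual, centers_ms)) / total)

    # 4) Protosyllables are syllables with durations within n_sd standard deviations of the mean.
    return mean - n_sd * sd, mean + n_sd * sd, mean, sd
```

A syllable would then be labeled as a protosyllable if its duration lies between the first two returned values. With noisy real histograms, the moment-based estimate is sensitive to residual noise far from the peak, so a proper Gaussian fit is likely preferable in practice.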
To label the emerging syllables in the juvenile song, we used the Greek letters β, γ, δ, and ε. In contrast, to label the syllables in the adult motif, we used the capital letters of the Latin alphabet A, B, C, etc. For birds in which the song learning trajectory was tracked developmentally, we labeled the syllables such that the correspondence between the juvenile syllables and adult syllables is straightforward: for example, α becomes A, β becomes B, γ becomes C, δ becomes D, and ε becomes E. Note that this labeling scheme leads to a slightly unconventional labeling of adult song in the sense that a motif can have letters in a reverse order (e.g. CBA in Fig. 4f, g; Extended Data Fig. 6a), or a motif might not have a syllable A (e.g. EDCB in Extended Data Fig. 7a).
Syllable labeling was done manually by visual inspection of the song spectrogram; this was done blind with respect to the neural activity. The existence of multiple distinct syllable types was confirmed by calculating the syllable duration and acoustic features commonly used to analyze birdsong syllables51,55, and visualizing the clusters for each syllable in a two-dimensional space52 (Fig. 3b, Extended Data Fig. 8b, 9d). In some cases, syllable order was used as an additional indicator of syllable identity (e.g. Extended Data Fig. 7a, 70 dph; Extended Data Fig. 8a, 51 dph; Extended Data Fig. 9a, 59 dph).
In Bird 1, syllables β and γ were labeled manually by visual inspection of the song spectrogram (Fig. 3a). Since characterizing shared neurons and specific neurons depends on the reliable labeling of syllables, we took a conservative approach and only labeled syllables that were clearly identifiable and did not label the syllables that were ambiguous (fraction of syllables labeled as β or γ during 62–66 dph: 70 ± 5.5%, mean ± s.d.). We then estimated the error rate of our labeling procedure by plotting the labeled syllables (n=200 syllables per type on each day) in a two-dimensional space of syllable duration and mean pitch goodness (Fig. 3b), and obtained a decision boundary using linear discriminant analysis. We used the mismatch between manual labeling and feature-based labeling to estimate the error rate for syllables β and γ. The error rate during the first five days of syllable differentiation (62–66 dph), when the labeling was most difficult, was only 1.1% on average (range: 0.25–3.0%).
For the second round of differentiation in Bird 1, syllable order was used to assist in the labeling of syllables in early stages when syllables ‘B’ and ‘D’ were not easily distinguishable based on acoustic differences. Because these syllables underwent bout-onset differentiation, the first β after bout onset was labeled ‘D’; later renditions of β in the bout were labeled ‘B’ (Extended Data Fig. 7a).
In Bird 2, several emerging syllables could be easily distinguished based on syllable durations (Extended Data Fig. 6d). Specifically, syllables whose durations were 110–160 ms, and 180–250 ms were defined as α and β, respectively. Syllables that were 10–75 ms in duration were labeled γ if they were followed by a β, and labeled ε otherwise.
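The duration-based rules for Bird 2 amount to a small lookup, sketched below. The sketch assumes that the duration ranges are inclusive at both ends, that "followed by" refers to the next syllable in the input list, and that syllables outside all ranges stay unlabeled; the function name is hypothetical.

```python
from typing import List, Optional


def label_bird2_syllables(durations_ms: List[float]) -> List[Optional[str]]:
    """Label syllables in temporal order using the duration ranges given for Bird 2."""

    def is_beta(d: float) -> bool:
        return 180.0 <= d <= 250.0

    labels: List[Optional[str]] = []
    for i, d in enumerate(durations_ms):
        if 110.0 <= d <= 160.0:
            labels.append("alpha")
        elif is_beta(d):
            labels.append("beta")
        elif 10.0 <= d <= 75.0:
            # Short elements are gamma if the next syllable is a beta, epsilon otherwise.
            followed_by_beta = i + 1 < len(durations_ms) and is_beta(durations_ms[i + 1])
            labels.append("gamma" if followed_by_beta else "epsilon")
        else:
            labels.append(None)  # duration falls in none of the defined ranges
    return labels
```

For instance, a 50 ms element followed by a 200 ms syllable is labeled γ, whereas a 40 ms element followed by a 130 ms syllable is labeled ε.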
Chronic neural recordings
Single-unit recordings of HVC projection neurons during singing were carried out using a motorized microdrive described previously56,57. Single units were confirmed by the existence of a refractory period in the inter-spike interval (ISI) distribution (Extended Data Fig. 1b). Neurons that were active only during distance calls and not during singing20 were excluded from the analysis. In addition, neurons recorded for less than 5 seconds of singing were excluded since the short recording duration did not allow us to reliably quantify the activity pattern of these neurons.
Antidromic identification of HVC projection neurons was carried out with a bipolar stimulating electrode implanted in RA and Area X (single pulse of 200 µs every 1 second; current amplitude: 50–500 µA)19,20,57–59. A subset of antidromically-identified projection neurons was further validated with collision testing19,20,57–59. A subset of single units were identified as putative projection neurons based on sparse bursting, but could not be antidromically identified because they did not respond to antidromic stimulation or were lost before antidromic identification could be carried out (211/1150 neurons). These neurons were included in the data set as unidentified HVC projection neurons (HVCp).
Analysis of neural activity
Spikes were sorted offline using custom MATLAB software (D. Aronov).
Definition of bursts
HVC projection neurons exhibited bursts of action potentials during singing (Fig. 1a–c). The bursting nature of these neurons was evident in the inter-spike interval (ISI) distribution during singing, which exhibited two peaks with an inter-peak minimum near 30 ms. ISIs shorter than 30 ms correspond to ISIs within bursts, and ISIs longer than 30 ms correspond to ISIs between bursts (Extended Data Fig. 1b). We defined a ‘burst’ as a continuous group of spikes separated by intervals of 30 ms or less. Thus, by definition, bursts are separated from other spikes by intervals greater than 30 ms. Note that single spikes separated by more than 30 ms from both the preceding spike and the following spike were also counted as bursts. Burst time was defined as the center of mass of all the spikes within the burst. Burst width was defined as the interval between the first and the last spike in a burst (Extended Data Fig. 1c, top). Firing rate during a burst was defined as the reciprocal of the mean inter-spike interval in the burst (Extended Data Fig. 1c, bottom). For the calculation of burst width and firing rate during bursts, bursts composed of a single spike were excluded.
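The burst definition translates directly into code. The following is a minimal sketch assuming spike times are given in milliseconds and that the center of mass weights every spike equally (so it equals the mean spike time); the function names are hypothetical.

```python
from typing import Dict, List, Optional


def find_bursts(spike_times_ms: List[float], max_isi_ms: float = 30.0) -> List[List[float]]:
    """Group spikes into bursts: consecutive spikes at most max_isi_ms apart share a burst."""
    bursts: List[List[float]] = []
    for t in sorted(spike_times_ms):
        if bursts and t - bursts[-1][-1] <= max_isi_ms:
            bursts[-1].append(t)
        else:
            bursts.append([t])  # isolated single spikes also count as bursts
    return bursts


def burst_stats(burst: List[float]) -> Dict[str, Optional[float]]:
    """Burst time (mean spike time), width, and within-burst firing rate."""
    time_ms = sum(burst) / len(burst)
    width_ms: Optional[float] = None
    rate_hz: Optional[float] = None
    if len(burst) >= 2:  # single-spike bursts are excluded from width and rate
        width_ms = burst[-1] - burst[0]
        if width_ms > 0.0:
            mean_isi_ms = width_ms / (len(burst) - 1)
            rate_hz = 1000.0 / mean_isi_ms  # reciprocal of the mean ISI, in Hz
    return {"time_ms": time_ms, "width_ms": width_ms, "rate_hz": rate_hz}
```

For example, spikes at 0, 2, and 4 ms form one burst with a burst time of 2 ms, a width of 4 ms, and a firing rate of 500 Hz, while a lone spike at 100 ms forms a single-spike burst with no width or rate.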
Syllable-related neural activity
To analyze the temporal relation between neural activity and song syllables, we aligned the spike times to syllable onsets and constructed a rate histogram (1 ms bin, smoothed over 20 bins; range: ±0.5 s from syllable onsets). The peak in this rate histogram was found between 50 ms before syllable onset and 200 ms after syllable onset. To test the significance of this peak, surrogate histograms were created by adding different random time shifts to the spike times on each trial60. Random time shifts were drawn from a uniform distribution over ±0.5 s. The peak of each surrogate histogram was recorded, and this shuffling procedure was repeated 1,000 times; P-values were obtained as the fraction of surrogate peaks that were larger than the peak of the real data, and P<0.05 was considered significant60. To visualize the population activity associated with protosyllables, we constructed a population raster plot by choosing 20 protosyllable renditions for which each neuron was most active, and by plotting different neurons in different colors (Fig. 2b, Extended Data Fig. 1n, 9k). For all the other population raster plots associated with identified syllables, 20 random renditions were chosen for display. For all the population raster plots, syllable duration from each rendition was linearly time-warped to the mean duration of the syllable. Spike times were warped by the same factor.
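The time-shift surrogate test for the histogram peak can be sketched as follows. This is an illustration under stated assumptions: spikes that are shifted outside the ±0.5 s range are simply dropped (the text does not say whether shifts wrap around), the smoothing is a plain boxcar, and all names and the example data are hypothetical.

```python
import math
import random

T_MIN, T_MAX, BIN_S = -0.5, 0.5, 0.001  # histogram range and bin width (s)


def rate_histogram(trials, smooth_bins=20):
    """Trial-averaged rate (Hz) in 1 ms bins, smoothed with a boxcar."""
    n_bins = int(round((T_MAX - T_MIN) / BIN_S))
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            k = math.floor((t - T_MIN) / BIN_S)
            if 0 <= k < n_bins:  # assumption: out-of-range spikes are dropped
                counts[k] += 1
    rate = [c / (len(trials) * BIN_S) for c in counts]
    half = smooth_bins // 2
    smoothed = []
    for k in range(n_bins):
        window = rate[max(0, k - half):k + half]
        smoothed.append(sum(window) / len(window))
    return smoothed


def peak_in_window(rate, lo=-0.05, hi=0.2):
    """Largest smoothed rate between 50 ms before and 200 ms after onset."""
    a = int(round((lo - T_MIN) / BIN_S))
    b = int(round((hi - T_MIN) / BIN_S))
    return max(rate[a:b])


def shuffle_p_value(trials, n_shuffles=1000, max_shift=0.5, seed=0):
    """Fraction of surrogates whose peak exceeds the peak of the real data."""
    rng = random.Random(seed)
    real_peak = peak_in_window(rate_histogram(trials))
    n_larger = 0
    for _ in range(n_shuffles):
        surrogate = []
        for spikes in trials:
            shift = rng.uniform(-max_shift, max_shift)  # one shift per trial
            surrogate.append([t + shift for t in spikes])
        if peak_in_window(rate_histogram(surrogate)) > real_peak:
            n_larger += 1
    return n_larger / n_shuffles


# Hypothetical syllable-locked neuron: a 3-spike burst near 50 ms on each trial.
rng = random.Random(1)
trials = [[0.05 + rng.gauss(0, 0.002) + 0.003 * k for k in range(3)]
          for _ in range(30)]
p = shuffle_p_value(trials, n_shuffles=200)
```

Because each trial receives its own shift, the surrogate preserves the spike statistics within a trial while destroying alignment to syllable onset across trials.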
Bout-related neural activity
A subset of HVC projection neurons exhibited bout-related activity: bursting before bout onsets and/or after bout offsets (Fig. 1d, e, Extended Data Fig. 2e–l). To quantify the pre-bout activity, we generated histograms aligned to bout onsets (Extended Data Fig. 2f, g) and found a peak in the histogram in a 300 ms window prior to bout onset. We considered a neuron to be exhibiting ‘pre-bout activity’ if the size of this peak was significant (P<0.05) compared to peaks obtained from the surrogate histograms (identical to the procedure described above in Syllable-related neural activity). To eliminate the possibility of including syllable-related activity as bout-related activity, we did not consider a neuron to be exhibiting pre-bout activity if the neuron showed a peak in the bout-onset aligned histogram and a peak at a similar latency (less than 25 ms apart) for the syllable-onset aligned histogram. We considered a neuron to be exhibiting ‘post-bout activity’ if there was a significant peak in the bout-offset aligned histogram (Extended Data Fig. 2j,k) in a 300 ms window after bout-offset.
Quantification of the rhythmic neural activity
To quantify the rhythmic neural activity of HVC projection neurons, we used four different methods: inter-burst interval, spike-train autocorrelation, spectrum of the spike train, and cepstrum of the spike train. Only spikes that were produced during singing (i.e. between the onset of the first syllable and the offset of the last syllable in the bout) were used for the calculation of these measures. (1) Inter-burst interval. Intervals between burst times were calculated and the peak between 80–1000 ms was found. (2) Spike-train autocorrelation. To quantify the second-order statistics of the firing pattern of HVC neurons, the spike-train autocorrelation, expressed as a conditional firing rate61, was calculated, and the peak between 80–1000 ms was found. The width of the center peak indicates the width of bursts, and multiple side lobes with regular intervals indicate rhythmic bursting. (3) Spectrum of the spike train. Rhythmicity of the single-unit activity was also quantified in the frequency domain using multi-taper spectral analysis of spike trains treated as point processes62. We used the Chronux software to calculate the spectrum of the spike trains63,64. First, bouts of singing were segmented into non-overlapping analysis windows 1.5 s long, and then the spectrum for each window was calculated using multi-taper spectral analysis with time-bandwidth product NW = 3/2 and number of tapers K = 2. To obtain the mean spectrum for a given neuron, spectra calculated from all the analysis windows were averaged. Finally, we found the peak in the mean spectrum within the range 2–15 Hz. (4) Cepstrum of the spike train. HVC projection neurons often exhibited rhythmic bursts with precise inter-burst intervals (Fig. 1b, c). Thus, the spectrum of the spike train tended to have multiple peaks at multiples of the fundamental frequency. To represent these burst trains that have regular intervals in a more compact way, we calculated the cepstrum (a technique commonly used in speech processing to extract the period of glottal pulses) of the spike train, defined as the inverse Fourier transform of the log spectrum65, and found the peak in the cepstrum between 80–1000 ms.
To assess the significance of the peaks in these four measures, we compared the distribution of peak amplitude obtained from the real data with that of the surrogate data obtained by shuffling the burst times. For this shuffling procedure, we first identified all the bursts during a bout of singing as described above. We then randomly placed bursts sequentially in an interval that has the same duration as the song bout; when spikes from two bursts were closer than 30 ms, we repeated the random placement until they were spaced by more than 30 ms. Note that this randomization procedure only shuffles the burst times and preserves both the number of bursts and the ISIs within bursts. Then, all four metrics listed above were calculated by applying the same method to these surrogate spike trains. This shuffling was repeated (1,000 times for the IBI and autocorrelation, 100 times for the spectrum and cepstrum) and the P-values of the peak were calculated as the fraction of surrogate spike trains whose peak was larger than the peak obtained from the real data. A neuron was considered to exhibit ‘rhythmic’ bursting if it had significant peaks in at least two of the four metrics. The period of the rhythm was defined as the location of the largest peak of the spike-train autocorrelation between 80–1000 ms.
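A minimal sketch of the burst-shuffling step is given below. It assumes bursts are supplied as lists of spike times in seconds; `shuffle_bursts`, its retry limit and the example bursts are hypothetical.

```python
import random


def shuffle_bursts(bursts, bout_duration, min_gap=0.030, seed=None, max_tries=10000):
    """Place each burst at a random time within the bout.

    Within-burst ISIs and the number of bursts are preserved. A placement
    is redrawn until the burst is more than min_gap away from every burst
    that has already been placed.
    """
    rng = random.Random(seed)
    placed = []      # (start, end) of bursts placed so far
    surrogate = []
    for burst in bursts:
        offsets = [t - burst[0] for t in burst]
        width = offsets[-1]
        for _ in range(max_tries):
            start = rng.uniform(0.0, bout_duration - width)
            end = start + width
            if all(start - e > min_gap or s - end > min_gap for s, e in placed):
                break
        else:
            raise RuntimeError("could not place burst; bout too crowded")
        placed.append((start, end))
        surrogate.append([start + o for o in offsets])
    surrogate.sort(key=lambda b: b[0])
    return surrogate


# Hypothetical bout of 1 s containing four bursts.
bursts = [[0.100, 0.104, 0.108], [0.300, 0.303], [0.500],
          [0.700, 0.705, 0.709, 0.712]]
surrogate = shuffle_bursts(bursts, bout_duration=1.0, seed=0)
```

Each rhythmicity measure would then be recomputed on `surrogate` to build the null distribution of peak amplitudes.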
Quantification of the probabilistic neural activity during the protosyllable stage (Extended Data Fig. 2p)
Although many HVC projection neurons recorded in the juvenile bird exhibited rhythmic bursts, these neurons did not burst reliably on every cycle of the rhythm, but instead participated probabilistically (Fig. 2a). To quantify the degree of participation, we first extracted the protosyllables based on syllable duration (see Syllable classification and labeling above) and examined the fraction of protosyllables in which at least one spike occurred (time window from 30 ms prior to protosyllable onset to 10 ms after protosyllable offset). The fraction of protosyllables in which the neuron was active was obtained for all the HVC projection neurons recorded during the protosyllable stage that showed significant rhythmic bursting (Extended Data Fig. 2p).
Analysis of simultaneously recorded pairs of neurons (Extended Data Fig. 2q, r)
To test whether probabilistic bursting of neurons in the protosyllable stage is coordinated across many neurons, we analyzed the correlation between pairs of simultaneously recorded neurons (Fig. 2a, bottom). This analysis was restricted to pairs of neurons that were rhythmically bursting (n=11 pairs, 3 birds). Bursting activity of each neuron was converted to a binary string corresponding to its participation in each protosyllable (for the definition of protosyllables, see Syllable classification and labeling above). The activity of a neuron was assigned a ‘1’ for a protosyllable if the neuron exhibited activity in a time-window between 30 ms prior to protosyllable onset to 10 ms after protosyllable offset, and ‘0’ if it did not. Only activity during protosyllables was analyzed to avoid including the highly variable subsong syllables, which are likely generated by circuits outside HVC27,34. For simultaneously recorded pairs of neurons, this procedure resulted in two binary strings corresponding to the protosyllable-related activity of each neuron. We then calculated the coefficient of determination r2 by taking the square of the Pearson’s correlation coefficient r between the two binary strings calculated for each neuron in the pair. The distribution of coefficient of determination is shown in Extended Data Fig. 2q (median r2=0.072, 11 pairs).
We also carried out a mutual information analysis to quantify whether the activity of one neuron was predictive of the set of protosyllables for which the other neuron was active. Using the same binary representation described above, we calculated the joint probability distribution describing the four possible states of activity (neither neuron spikes, neuron A spikes, neuron B spikes, both neurons spike). The mutual information was computed from this joint distribution (Extended Data Fig. 2r, median mutual information=0.056 bits, 11 pairs).
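Both pairwise measures can be computed directly from the two binary participation strings. The sketch below uses hypothetical function names and toy strings; treating a constant string as r2 = 0 is an assumption, since the correlation is undefined in that case.

```python
import math


def r_squared(a, b):
    """Square of the Pearson correlation between two binary strings."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # assumption: undefined correlation is reported as 0
    return cov * cov / (va * vb)


def mutual_information_bits(a, b):
    """Mutual information (bits) from the joint distribution of four states."""
    n = len(a)
    joint, pa, pb = {}, {}, {}
    for x, y in zip(a, b):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        pa[x] = pa.get(x, 0) + 1
        pb[y] = pb.get(y, 0) + 1
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi


# Toy participation strings (1 = active during that protosyllable).
same = [1, 0, 1, 0, 1, 0, 1, 0]
indep_a, indep_b = [1, 1, 0, 0], [1, 0, 1, 0]
r2_same, mi_same = r_squared(same, same), mutual_information_bits(same, same)
r2_indep = r_squared(indep_a, indep_b)
mi_indep = mutual_information_bits(indep_a, indep_b)
```

Identical strings give the maximum values (r2 = 1, 1 bit), whereas the independent pair gives 0 for both measures, which is the regime the low medians reported above are close to.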
Both the correlation and mutual information were extremely low, suggesting that different projection neurons participated on relatively independent sets of protosyllables. These findings suggest that individual projection neurons participate probabilistically and largely independently in an ongoing rhythmic protosequence within HVC.
Analysis of coverage by HVC projection neuron bursts (Extended Data Fig. 2s, t)
We wondered whether projection neuron bursts effectively span the entire duration of juvenile song syllables, or whether bursts are highly localized to specific times, leaving other times in the syllable unrepresented22. It is clear from the syllable-aligned raster plots that some syllables were completely covered by bursts (e.g. Fig. 3h, syllable ‘C’), while other syllables showed some gaps in the burst coverage (e.g. Fig. 4i, syllable ‘A’). To further quantify this aspect of the HVC representation during singing, we analyzed the fraction of time within the syllables of juvenile birds that was ‘covered’ by the recorded projection neuron bursts (‘covered fraction’). This analysis was restricted to syllables with more than 10 associated bursts.
We first determined the region of the song syllable covered by each HVC projection neuron burst. We generated a histogram of syllable-onset- or syllable-offset-aligned spike times recorded from a single neuron over every recorded rendition of the song syllable. Candidate burst events were initially identified by smoothing the histogram (9 ms sliding square window, 1 ms steps), and setting a threshold to define a window in which to analyze burst spikes (2 Hz threshold for protosyllable stage birds; 10 Hz threshold for older juveniles). To eliminate low-probability spike events, we only considered bursts for which spiking activity (at least one spike) occurred in the candidate burst window on at least 25% of the renditions for that syllable. Bursts were included only if they occurred between 30 ms prior to syllable onset and 10 ms after syllable offset.
For candidate bursts that met these criteria, all spikes occurring in the burst window were considered as contributing to that burst. Based on earlier measurements of postsynaptic currents and potentials of HVC and RA neurons66, each HVC spike in the burst window was conservatively assumed to exert a postsynaptic effect lasting no more than 5 ms. Thus, each spike in the dataset was replaced with a 5 ms postsynaptic square pulse (beginning at the spike time). We considered a region of the syllable to be ‘covered’ by this burst if at least three of these post-synaptic pulses overlapped at that time within the burst, across renditions of the syllable. This procedure yielded a small ‘patch’ of time covered by the burst. The patches associated with each different neuron were combined with a logical ‘OR’ operation to determine the total coverage time of the syllable (again in a window from 30 ms prior to syllable onset to 10 ms after syllable offset). The covered time was divided by the duration of the syllable window to determine the covered fraction. Only syllables that had more than 10 neurons bursting within the syllable window were analyzed. This criterion excluded syllables from Bird 3 (shown in Extended Data Fig. 8), from which relatively few neurons were recorded.
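The coverage calculation can be sketched on a 1 ms grid. The discretization, the function names and the example spike times are assumptions made for illustration; each entry of `bursts` holds the spike times (ms, relative to syllable onset) of one neuron's burst pooled across renditions.

```python
import math


def burst_patch(spike_times_ms, win_start, win_end, pulse_ms=5, min_overlap=3):
    """Boolean mask of the times covered by one burst.

    Each spike becomes a square pulse starting at the spike time; a time
    bin is covered when at least min_overlap pulses overlap there.
    """
    n = win_end - win_start
    depth = [0] * n
    for t in spike_times_ms:
        first = math.floor(t) - win_start
        for k in range(first, first + pulse_ms):
            if 0 <= k < n:
                depth[k] += 1
    return [d >= min_overlap for d in depth]


def covered_fraction(bursts, syllable_ms, pre_ms=30, post_ms=10):
    """Fraction of the syllable window covered by the OR of all patches."""
    win_start, win_end = -pre_ms, syllable_ms + post_ms
    covered = [False] * (win_end - win_start)
    for spikes in bursts:
        patch = burst_patch(spikes, win_start, win_end)
        covered = [c or p for c, p in zip(covered, patch)]
    return sum(covered) / len(covered)


# Hypothetical 60 ms syllable with bursts from two neurons.
bursts = [[0, 1, 2, 3], [40, 40, 40]]
fraction = covered_fraction(bursts, syllable_ms=60)
```

Here the first burst covers 4 ms (2 to 5 ms after onset) and the second covers 5 ms, so 9 of the 100 bins in the window from −30 ms to 70 ms are covered.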
While most syllables had nearly complete burst coverage (>90%), one syllable had coverage of only 73% (Extended Data Fig. 2t), which could potentially be due to the relatively smaller number of neurons recorded in this bird. Thus, we asked whether the measured coverage is consistent with sparse sampling of the recorded bursts from a large number of uniformly placed bursts. To simulate this, we calculated the covered fraction for 1,000 surrogate datasets in which the ‘covered patches’ for each burst were randomly shuffled within the syllable. A random offset was added to the time of each patch, and a circular shift was used, allowing the patches to wrap around the edges of the syllable window. The distribution of covered fractions was determined over all shuffled surrogate datasets, and the 2.5–97.5 percentiles (95% confidence interval) of this distribution were determined (shown as vertical gray bars in Extended Data Fig. 2t).
Shared and specific neurons
To examine whether a given HVC projection neuron was active during multiple syllable types (‘shared’ neuron) or was active only during a specific syllable type (‘specific’ neuron), we first constructed a syllable-onset aligned histogram (1 ms bin, smoothed over 20 bins) for each syllable type. Spike times were linearly time warped67 to the mean duration of that syllable to reduce the trial-to-trial variability in the spike timing associated with the variation in the syllable duration. Next, we found the peak in the firing rate histogram in the interval between 30 ms before syllable onset and 10 ms after syllable offset. We visually inspected the syllable-aligned histograms, and adjusted the interval if necessary to avoid the same burst being detected twice (i.e. being associated with an offset of one syllable and an onset of the next syllable). The significance of this peak was determined by comparing it with the peak size obtained from the shuffled histogram using the same method described above (Syllable-related neural activity).
We defined ‘shared’ and ‘specific’ neurons in the context of a particular syllable differentiation process (e.g. β and γ from Bird 1 in Fig. 3; α and β from Bird 2 in Fig. 4; B and D from Bird 1 in Extended Data Fig. 7). ‘Specific’ neurons were defined as neurons that had a significant peak in the syllable-aligned histogram for only one syllable type, whereas ‘shared’ neurons were defined as neurons that had significant peaks for both syllable types. We took a conservative approach and only considered a neuron to be shared if the peak was significant for both syllable types. However, some neurons classified as specific had weak activity for the other syllable that did not reach significance (e.g. Extended Data Fig. 6f). In other words, we believe this method likely underestimated the fraction of neurons with shared activity.
Our method likely underestimated the incidence of shared neurons for another reason as well. Specifically, we defined shared and specific neurons in the context of a particular pair of syllables undergoing differentiation. For example, in a bird that exhibited hierarchical differentiation (Bird 1; Extended Data Fig. 7), we saw examples of neurons that were B-specific when considering B-C differentiation but shared when considering B-D differentiation. Thus, when considering all the syllables in the motif, our definition of shared and specific neuron based on syllable pairs will underestimate the fraction of shared neurons and overestimate the fraction of specific neurons.
Quantification of the similarity of latencies in shared neurons (Extended Data Fig. 4a–d, Extended Data Fig. 8i, j)
To test whether shared neurons were active at similar latencies for multiple syllable types, we first calculated the latency of the peak in the syllable onset- or offset-aligned histograms. We then plotted the latency of the peak for one syllable against that of another syllable (Extended Data Fig. 4a–d). When a shared neuron was active for three or more syllables, the two syllables associated with the two highest firing rates were chosen. To quantify whether shared neurons were active at similar latencies for two syllable types, we calculated the Pearson’s correlation coefficient r between the two latencies, and the P-value under the null hypothesis that r=0.
For the bird whose song was segmented based on the phase of the rhythm (Bird 3, Extended Data Fig. 8), we asked whether bursts of shared neurons during different syllables occurred at similar phases in the rhythm. To quantify the phase of the neural activity, we first detected the burst times during singing, and for each burst, we assigned an instantaneous phase extracted from the song using the Hilbert transform (see the section on phase segmentation above). Then, the mean phase of all the bursts produced during a particular syllable type was calculated (φi where i = 1, 2, …, 5 indicates syllables). Finally, the two syllable types were chosen for which the neuron participated most reliably, and the difference between the mean phases for these two syllables (|Δφ| = |φm − φn|, where m and n are syllable indices) was obtained (Extended Data Fig. 8i). We tested the significance of this value by comparing the value of |Δφ| against that obtained from the shuffled data, in which the pairing of phases was randomized across all shared neurons (Extended Data Fig. 8j; 1,000 shuffles). P-values were obtained as the fraction of shuffles for which |Δφ| of the surrogate data was smaller than that of the real data, and P<0.05 was considered significant.
Quantification of the activity level difference in shared neurons (Extended Data Fig. 4i, j)
To quantify the difference in the activity level for multiple syllable types in the shared neurons, we calculated the ‘bias’ defined as follows:
where ri is the peak firing rate in the syllable-aligned histogram for syllable i. Bias of 0 indicates equal activity level for both syllable types, whereas bias of 1 indicates exclusive activity for only one of the syllable types (Extended Data Fig. 4j).
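As a sketch, one normalized difference that matches the two endpoints stated above (0 for equal peak rates, 1 for activity during only one syllable type) is shown below. It is an assumed form rather than a confirmed definition, with r1 and r2 denoting the peak firing rates for the two syllable types being compared.

```latex
\mathrm{bias} = \frac{\lvert r_1 - r_2 \rvert}{r_1 + r_2}
```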
Analysis of acoustic features associated with bursts of shared neurons (Extended Data Fig. 5)
We wondered if the bursts of shared neurons were associated with different acoustic signals in the shared syllables at the time of the bursts. (An alternative possibility is that shared neurons burst only at times within the emerging syllable types when the acoustic signals are identical.) An example of a neuron analyzed here is shown in Extended Data Fig. 5a (from the same data shown in Fig. 3e). This neuron bursts just after the onset of both syllables β and γ. We analyzed the acoustic differences in a 0–50 ms analysis window after the burst time, but were most interested in acoustic differences in a narrower premotor window (10–40 ms), as this corresponds to the premotor latency for which one expects HVC neurons to exert an effect on vocal output29,58,68.
For each neuron analyzed, all syllables in which the neuron generated a burst were identified. The analysis was carried out for every syllable rendition on which the neuron burst, and was restricted to only those syllables. Syllables had previously been labeled by type (i.e. β and γ). We first directly visualized the spectral differences between the two syllable types using a sparse contour representation69,70, which is suitable for constructing an ‘average’ spectrogram. The analysis was carried out on the sound signal extracted from a 50 ms window after each burst. In many cases, this spectral representation revealed consistent differences between the different syllable types in this analysis window (Extended Data Fig. 5b, c).
One complication is that some of the shared neurons burst prior to syllable onsets or immediately before syllable offsets such that the 10–40 ms window after the bursts was obscured by silent gaps (9 of 24 HVCRA neurons and 59 of 120 HVCX neurons were obscured). These neurons were excluded from the analysis of acoustic difference.
We further quantified differences in the acoustic signals by extracting time varying acoustic and spectral features in a window 0–50 ms after burst time (see subsection Definition of bursts). We used 8 acoustic features previously established to analyze birdsongs (Wiener entropy, spectral center of gravity, spectral width, pitch, pitch goodness, sound amplitude, amplitude modulation, frequency modulation)51,55. The 8-dimensional vector of features was calculated in 1 ms steps over the 50 ms analysis window (Extended Data Fig. 5d, e).
Because each syllable was labeled, we could determine whether the feature trajectories were significantly different for syllables labeled β and those labeled γ, and make this determination at every time step in the analysis window (Extended Data Fig. 5d, e; s.e.m. indicated by shaded region around mean trajectory). Rather than quantify the difference in these trajectories one feature at a time, we used Fisher’s discriminant analysis71 to project the 8-dimensional acoustic feature vector onto a single dimension that gives maximum separability between the two syllable types. The projected direction is determined independently at each time point, and the feature vectors of all syllable renditions are projected, at each time point, to yield a distribution of projected samples. For most neurons, the different syllable types produce visibly different distributions of projected samples (Extended Data Fig. 5f) indicating distinct acoustic structure. The separability of the distributions (in one dimension) of projected samples for different syllable types was quantified using the d-prime metric (d’), corresponding to the distance between the means of the distributions, normalized by the pooled variance70:
Because the features evolve in time, this analysis is carried out independently at each 1 ms step in the 50 ms analysis window, and the d’ was plotted as a function of time (Extended Data Fig. 5g). Statistical significance of the d’ trajectory was assessed by randomizing the syllable labels and rerunning the d’ analysis on shuffled datasets (N=1,000 shuffles). For each randomization, the peak value of d’ in the 10–40 ms premotor window was recorded; the significance threshold was set as the 95th percentile of the distribution of these peak values. A shared neuron was determined to have a significant acoustic difference between the shared syllables only if the d’ trajectory remained above this significance threshold for the entire premotor window of 10–40 ms after the burst. Note that, in the simulated data, none of the 1,000 surrogate runs generated a d’ trajectory that met this stringent criterion.
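The d’ trajectory and its label-shuffle threshold can be sketched as follows. Several points are assumptions: d’ is computed with the common convention of dividing the difference of means by the square root of the average of the two sample variances, the one-dimensional projected values are taken as given (the Fisher projection is not recomputed for each shuffle here), and all names and the synthetic data are hypothetical.

```python
import math
import random


def d_prime(a, b):
    """|mean difference| divided by the pooled standard deviation (assumed form)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    pooled_sd = math.sqrt((va + vb) / 2)
    return abs(ma - mb) / pooled_sd if pooled_sd > 0 else 0.0


def d_prime_window(group_a, group_b, lo=10, hi=40):
    """d' at each 1 ms step of the premotor window (rows are renditions)."""
    return [d_prime([r[t] for r in group_a], [r[t] for r in group_b])
            for t in range(lo, hi + 1)]


def premotor_significance(group_a, group_b, n_shuffles=1000, seed=0):
    """Return (threshold, significant) from shuffling the syllable labels."""
    rng = random.Random(seed)
    real = d_prime_window(group_a, group_b)
    pooled = group_a + group_b
    peaks = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # randomize which rendition carries which label
        fake_a, fake_b = pooled[:len(group_a)], pooled[len(group_a):]
        peaks.append(max(d_prime_window(fake_a, fake_b)))
    peaks.sort()
    threshold = peaks[int(0.95 * (n_shuffles - 1))]  # approx. 95th percentile
    return threshold, all(v > threshold for v in real)


# Synthetic projected features: 20 renditions per syllable type, 51 time steps.
rng = random.Random(2)
beta = [[rng.gauss(0.0, 1.0) for _ in range(51)] for _ in range(20)]
gamma = [[rng.gauss(5.0, 1.0) for _ in range(51)] for _ in range(20)]
threshold, significant = premotor_significance(beta, gamma, n_shuffles=200)
```

Requiring the real trajectory to exceed a threshold built from shuffled peak values at every step of the window is what makes the criterion stringent.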
Statistics
Results are expressed as the mean ± s.d. or s.e.m. as indicated. For χ2 tests, if the contingency table included a cell that has an expected frequency less than 5, Fisher’s exact test was used72. All tests were two-sided, and P<0.05 was considered significant. Bonferroni correction was used to account for multiple comparisons.
Figure 1(f) The statistical significance of developmental changes in the fraction of HVC neurons that were syllable-aligned was assessed in two different ways: 1) Each stage was compared with the adult stage using the χ2 test followed by a post-hoc pairwise test. 2) To quantify the developmental trend in the fraction of syllable-locked neurons, we calculated Pearson’s correlation coefficient r between the binary value for each neuron (0, unlocked; 1, locked) and song stage (subsong: 1, protosyllable: 2, multi-syllables: 3, motif: 4, adult: 5). The P-value was calculated under the null hypothesis that r=0. The significance of the developmental trend for rhythmic bursting was calculated similarly. Similar results were obtained for correlation between these metrics and the age at which each neuron was recorded, rather than song stage.
Figure 1(g) The statistical significance of developmental changes in the period of the HVC rhythm was also assessed in two different ways: 1) Each song stage was compared with the adult stage using the Kruskal-Wallis test followed by a post-hoc pairwise test. 2) To quantify the developmental trend in the period of the HVC rhythm, we calculated Pearson’s correlation coefficient r between burst period and song stage. Similar results were obtained for correlation between burst period and the age at which each neuron was recorded.
Figure 2(c) Wilcoxon rank-sum test was used to test whether the median of the syllable-onset aligned latency distribution was different between subsong and protosyllable stages.
Figure 3(g, h) and 4(h, i) To test whether the fraction of shared neurons differed between early and late stages of syllable differentiation, we used the χ2 test on a 2 × 2 contingency table (shared/specific, early/late). Significance across all birds: To calculate whether the fraction of shared neurons differed between early and late stages of syllable differentiation over all birds (n=5 syllable pairs in 3 birds), we used the Cochran-Mantel-Haenszel test for repeated tests of independence73.
Extended Data Fig. 1(a) To quantify the relation between song stage and age, we calculated Spearman’s rank correlation coefficient ρ and the P-value under the null hypothesis that ρ=0. (c) We computed the statistical significance of developmental changes in burst width (top) and firing rate during bursts (bottom) by using Kruskal-Wallis test followed by a post-hoc pairwise test to compare each stage with the adult stage.
Extended Data Fig. 2(m–o) To test whether fraction of syllable-locked neurons (m), fraction of rhythmic neurons (n), and period of HVC rhythm (panel o) significantly differed between HVCRA and HVCX, we used χ2 test for all the pairwise comparisons with Bonferroni correction for multiple comparisons.
Extended Data Fig. 4(a–d) To quantify the relation between latencies of bursts associated with shared neurons, we calculated the Pearson’s correlation coefficient r together with the P-value under the null hypothesis that r=0.
Extended Data Fig. 5(m, n) To test whether the mean d’ metric was different between HVCRA and HVCX, we used the Wilcoxon rank-sum test. Only neurons with d’ trajectories that were significant (continuously from 10–40 ms) were included in this comparison.
Modeling
Binary neuron model
Code used to simulate the model is available as Supplementary Information. To illustrate a potential mechanism of chain splitting, we chose to implement the model as simply as possible. We modeled neurons as binary units and simulated their activity in discrete time steps44; at each time step (10 ms), the i-th neuron either bursts (xi = 1) or is silent (xi = 0).
Network architecture
A network of 100 binary neurons is recurrently connected in an all-to-all manner, with Wij representing the synaptic strength from presynaptic neuron j to postsynaptic neuron i. Self-excitation is prevented by setting Wii = 0 for all i at all times44. Synaptic weights are initialized from a random uniform distribution such that each neuron receives, on average, its maximum total input. During learning, the strength of each synapse is constrained to be within the interval [0, wmax], while the total incoming and outgoing weights of each neuron are both constrained by the “soft bound” Wmax = m * wmax, where m represents a target number of saturated synapses per neuron44 (see Synaptic plasticity rule section for details).
Network dynamics
The activity of each neuron in the network was determined in two steps: calculating the net feedforward input that comes from the previous time step, and determining whether that input is enough to overcome the recurrent inhibition in the current time step.
First, the net feedforward input to the i-th neuron at time step t, $A_i^{net}(t)$, was calculated by summing the excitation, feedforward inhibition, neural adaptation, and external inputs:
$$A_i^{net}(t) = \left[ A_i^{E}(t) - A^{I_{ff}}(t) - A_i^{adapt}(t) + B_i(t) - \theta_i \right]_+$$
where $[z]_+$ indicates a rectification (equal to z if z>0 and 0 otherwise). $A_i^{E}(t) = \sum_j W_{ij}\, x_j(t-1)$ is the excitatory input from network activity on the previous time step. $A^{I_{ff}}(t) = \beta \sum_j x_j(t-1)$ is a feedforward inhibitory input44, where β sets the strength of this feedforward inhibition. $A_i^{adapt}(t)$ is an adaptation term44 that scales with α, the strength of adaptation, and with $y_i$, a low-pass filtered record of recent activity in $x_i$ with time constant τadapt = 40 ms. $B_i(t)$ is the external input to neuron i at time t. For seed neurons, this term consists of training inputs (see section on Seed neurons). For non-seed neurons, it consists of random inputs with probability pin = 0.01 in each time step and size Wmax/10. Finally, $\theta_i$ is a threshold term used to reduce the excitability of seed neurons, making them less responsive to recurrent input than are other neurons in the network. For seed neurons, θi = 10 and for non-seed neurons, θi = 0. Including this term improves robustness of the training procedure by eliminating occasional situations in which seed neuron activity may be dominated by recurrent rather than external inputs. In these cases, external inputs may fail to exert proper control of network activity.
Second, we determined whether the i-th neuron will burst or not at time step t by examining whether the net feedforward input exceeds the recurrent inhibition $A^{I_{rec}}(t)$. We implemented recurrent inhibition by estimating the total activity of the network at time t, scaling it by the parameter γ, which sets the strength of the recurrent inhibition, and feeding it back to all the neurons. We assume that this recurrent inhibition operates on a fast time scale48 (i.e. faster than the duration of a burst). Thus, the final output of the i-th neuron at time t becomes:
$$x_i(t) = \Theta\left[ A_i^{net}(t) - A^{I_{rec}}(t) \right]$$
where Θ [z] is the Heaviside step function (equal to 1 if z > 0 and 0 otherwise). To induce splitting, γ was gradually stepped up to γsplit following a sigmoid with time constant τγ and inflection point t0:
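One time step of these dynamics can be sketched in a few lines. This is an illustration only, not the simulation code provided as Supplementary Information. Three details are assumptions: the adaptation variable is updated with a first-order discrete low-pass filter, the adaptation term is α times that variable, and the total network activity used for recurrent inhibition is taken as the sum of the rectified net inputs.

```python
def step(W, x_prev, y_prev, B, theta, alpha, beta, gamma,
         tau_adapt=40.0, dt=10.0):
    """Advance the binary network by one 10 ms time step.

    W[i][j] is the weight from presynaptic j to postsynaptic i.
    Returns the new binary activity x and adaptation variable y.
    """
    n = len(x_prev)
    total_prev = sum(x_prev)
    # Assumed discrete low-pass filter of recent activity.
    y = [yp + (dt / tau_adapt) * (xp - yp) for yp, xp in zip(y_prev, x_prev)]
    a_net = []
    for i in range(n):
        excitation = sum(W[i][j] * x_prev[j] for j in range(n))
        z = (excitation - beta * total_prev - alpha * y[i]
             + B[i] - theta[i])
        a_net.append(max(z, 0.0))  # rectification [z]+
    inhibition = gamma * sum(a_net)  # assumed estimate of total activity
    x = [1 if a > inhibition else 0 for a in a_net]  # Heaviside step
    return x, y


# Three-neuron toy chain: neuron 0 fired on the previous step and drives neuron 1.
W = [[0.0, 0.0, 0.0],
     [5.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
x, y = step(W, x_prev=[1, 0, 0], y_prev=[0.0, 0.0, 0.0],
            B=[0.0, 0.0, 0.0], theta=[0.0, 0.0, 0.0],
            alpha=30, beta=0.115, gamma=0.01)
```

In the toy example, adaptation silences neuron 0 on the next step while neuron 1 receives enough excitation to overcome both forms of inhibition, so activity propagates one link down the chain.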
Seed neurons
A subset of neurons was designated as seed neurons, which received external training inputs used to shape network activity during learning43,45. The external training inputs activate seed neurons at syllable onsets, reflecting the observed onset-related bursts of HVC neurons during the subsong stage (Fig. 1a). The pattern of these inputs was adjusted in different stages of learning, and each strategy of syllable learning was implemented by different patterns of seed neuron training inputs.
Alternating differentiation (Fig. 5a–e)
Ten neurons were designated as seed neurons and received strong external input (Wmax) to drive network activity. In the subsong stage, seed neurons were driven (by external inputs) synchronously and randomly with probability 0.1 in each time step corresponding to the random occurrence of syllable onsets in subsong27,34. This was done only to visualize network activity; no learning was implemented at the subsong stage. During the protosyllable stage, seed neurons were driven synchronously and rhythmically with a period T = 100 ms. The protosyllable stage consisted of 500 iterations of 10 pulses each. To initiate chain splitting, the seed neurons were divided into two groups and each group was driven on alternate cycles. The splitting stage consisted of 2,000 iterations of 5 pulses in each group of seed neurons (1 second total).
Motif strategy (Extended Data Fig. 10e–h)
This was implemented in a similar manner as alternating differentiation, except that 9 seed neurons were used, and for the splitting stage, seed neurons were divided into 3 groups of 3 neurons, each driven on every third cycle.
Bout-onset differentiation (Extended Data Fig. 10a–d)
Seed neurons were divided into two groups: 5 bout-onset seed neurons and 5 protosyllable seed neurons. At all learning stages, external inputs were organized into bouts consisting of four separate input pulses: Bout-onset seed neurons were driven at the beginning of each bout. Then, 30 ms later, protosyllable seed neurons were driven three times with an interval of T = 100 ms. In the protosyllable stage, inputs to all seed neurons were of strength Wmax. In the splitting stage, the input to protosyllable seed neurons was decreased to Wmax/10. This allowed neurons in the bout-onset chain to suppress, through fast recurrent inhibition, the activity of protosyllable seed neurons during bout-onset syllables.
Each iteration of the simulation was 5 seconds long, consisting of 10 bouts, as described directly above, with random inter-bout intervals. The protosyllable stage consisted of 100 iterations, and the splitting stage consisted of 500 iterations.
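The bout structure of the external inputs can be sketched as below. The pulse timing within a bout (one bout-onset pulse, then three protosyllable pulses starting 30 ms later at 100 ms intervals) follows the description above. The way bouts are placed within the 5-second iteration is an assumption: the distribution of inter-bout intervals is not specified here, so this sketch simply jitters each bout uniformly within its own 500 ms slot. The names `bout_events` and `iteration_events` are hypothetical.

```python
import random


def bout_events(bout_start_ms, n_protosyllables=3, onset_delay_ms=30, period_ms=100):
    """Return (time_ms, seed_group) input pulses for one bout."""
    events = [(bout_start_ms, "bout_onset")]
    for i in range(n_protosyllables):
        t = bout_start_ms + onset_delay_ms + i * period_ms
        events.append((t, "protosyllable"))
    return events


def iteration_events(n_bouts=10, duration_ms=5000, seed=None):
    """Return all input pulses for one iteration of n_bouts bouts.

    Assumption: each bout is jittered uniformly inside its own slot of
    duration_ms / n_bouts, which yields random, non-overlapping bouts.
    """
    rng = random.Random(seed)
    slot_ms = duration_ms // n_bouts
    bout_span_ms = bout_events(0)[-1][0]   # first pulse to last pulse (230 ms)
    events = []
    for b in range(n_bouts):
        start = b * slot_ms + rng.randrange(slot_ms - bout_span_ms)
        events.extend(bout_events(start))
    return events
```

Each iteration then contains 40 pulses in total: 10 to the bout-onset seed neurons and 30 to the protosyllable seed neurons.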
Bout-onset syllable formation (Extended Data Fig. 10i–k)
Input to seed neurons was set high (2.5 × Wmax) and maintained at this high level throughout development. This prevented protosyllable seed neurons from being inhibited by neurons in the bout-onset chain. Furthermore, strong external input to the protosyllable seed neurons terminated activity in the bout-onset chain through fast recurrent inhibition, thus preventing the further growth of the bout-onset chain that occurs in bout-onset differentiation.
As in bout-onset differentiation, each iteration of the simulation was 5 seconds long, consisting of 10 bouts with random inter-bout intervals. The protosyllable stage consisted of 100 iterations, and the splitting stage consisted of 500 iterations.
Synaptic plasticity rule
As in previous models43,44, we hypothesized two plasticity rules in our model: Hebbian spike-timing-dependent plasticity (STDP) to drive sequence formation74,75, and heterosynaptic long-term depression (hLTD) to introduce competition between synapses of a given neuron43,44. STDP is governed by the antisymmetric plasticity rule with a short temporal window (one burst duration):
where the constant η sets the learning rate. hLTD limits the total strength of weights for neuron i, and the summed weight limit rule for incoming weights is given by:
and for outgoing weights from neuron j:
At each time step, total change in synapse weight is given by the combination of STDP and hLTD:
where ε sets the relative strength of hLTD.
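One plasticity update combining the two rules can be sketched as below. The functional forms are assumptions chosen to match the verbal description (an antisymmetric STDP term spanning one time step, and an hLTD term that penalizes summed incoming and outgoing weights above a limit), in the spirit of the earlier models43,44; they should not be read as the exact equations of this model. The names `plasticity_step`, `total_limit`, and `w_cap`, the binary activity vectors, and the clipping of weights to the range [0, `w_cap`] are likewise hypothetical.

```python
def plasticity_step(W, x_prev, x_now, eta, eps, total_limit, w_cap):
    """Apply one assumed STDP + hLTD update; W[i][j] is the weight from j to i.

    x_prev, x_now: 0/1 burst indicators at steps t-1 and t.
    eta: learning rate; eps: relative strength of hLTD.
    total_limit: assumed cap on a neuron's summed weights.
    w_cap: assumed cap on any single weight.
    """
    n = len(W)
    # Antisymmetric STDP: potentiate j->i when j bursts one step before i,
    # depress it when the order is reversed.
    d_stdp = [[eta * (x_now[i] * x_prev[j] - x_prev[i] * x_now[j])
               for j in range(n)] for i in range(n)]
    # hLTD: amount by which summed incoming (row) or outgoing (column)
    # weights would exceed the limit after the STDP change.
    in_excess = [max(0.0, sum(W[i][k] + d_stdp[i][k] for k in range(n)) - total_limit)
                 for i in range(n)]
    out_excess = [max(0.0, sum(W[k][j] + d_stdp[k][j] for k in range(n)) - total_limit)
                  for j in range(n)]
    # Combine both terms and keep each weight within [0, w_cap].
    return [[min(w_cap, max(0.0, W[i][j] + d_stdp[i][j]
                            - eps * (in_excess[i] + out_excess[j])))
             for j in range(n)] for i in range(n)]
```

In this sketch, a synapse from a neuron that bursts one step before its target grows by η, while synapses onto or from a neuron whose summed weights exceed the limit are all reduced together, which is what introduces competition among them.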
Model parameters: subsong (Fig. 5a)
In our implementation of the subsong stage, there was no learning. Subsong model parameters were: β = 0.115, α = 30, η = 0, ε = 0, γ = 0.01.
Model parameters: alternating differentiation (Fig. 5b–d)
After subsong, learning progressed in two stages: the protosyllable stage and the splitting stage. Parameters that remained constant over development were: β = 0.115, α = 30, η = 0.025, ε = 0.2. To induce chain splitting, wmax was increased from 1 to 2, m was decreased from 10 to 5, and γ was increased from 0.01 to 0.18 following a sigmoid with time constant τγ = 200 iterations and inflection point t0 = 500 iterations into the splitting stage. No change in parameters occurred prior to the chain-splitting stage.
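The sigmoidal change in γ during the splitting stage can be sketched as below. The logistic form is an assumption; the text specifies only the start and end values, the time constant τγ, and the inflection point t0. The function name `sigmoid_ramp` is hypothetical.

```python
import math


def sigmoid_ramp(iteration, start, end, t0, tau):
    """Assumed logistic ramp from `start` to `end`.

    iteration: number of iterations into the splitting stage.
    t0: inflection point (iterations); tau: time constant (iterations).
    """
    return start + (end - start) / (1.0 + math.exp(-(iteration - t0) / tau))


# Alternating differentiation: gamma goes from 0.01 to 0.18,
# with tau = 200 iterations and t0 = 500 iterations.
gamma_by_iteration = [sigmoid_ramp(i, 0.01, 0.18, t0=500, tau=200)
                      for i in range(2000)]
```

Under this assumed form, γ reaches the midpoint of its range (0.095) at the inflection point. It also sits slightly above 0.01 at the first iteration of the splitting stage, because a logistic curve only approaches its limits asymptotically.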
Model parameters: bout-onset differentiation (Extended Data Fig. 10a–d)
Parameters that remained constant over development were: β = 0.13, α = 30, η = 0.05, ε = 0.14. To induce chain splitting, wmax was increased from 1 to 2, m was decreased from 5 to 2.5, and γ was increased from 0.01 to 0.04 following a sigmoid with time constant τγ = 200 iterations and inflection point t0 = 250 iterations into the splitting stage.
Model parameters: motif strategy (Extended Data Fig. 10e–h)
Parameters that remained constant over development were: β = 0.115, α = 30, η = 0.025, ε = 0.2. To induce chain splitting, wmax was increased from 1 to 2, m was decreased from 9 to 3, and γ was increased from 0.01 to 0.18 following a sigmoid with time constant τγ = 200 iterations and inflection point t0 = 500 iterations into the splitting stage.
Model parameters: formation of a new syllable at bout onset (Extended Data Fig. 10i–k)
Parameters that remained constant over development were: β = 0.13, α = 30, η = 0.05, ε = 0.15. To induce chain splitting, wmax was increased from 1 to 2, m was decreased from 5 to 2.5, and γ was increased from 0.01 to 0.05 following a sigmoid with time constant τγ = 200 iterations and inflection point t0 = 250 iterations into the splitting stage.
Shared and specific neurons
Neurons were classified as participating in a syllable type if the syllable onset-aligned histogram exhibited a peak that passed a threshold criterion. The criteria were chosen to include neurons where the histogram peak exceeded 90% of surrogate histogram peaks. Surrogate histograms were generated by placing one burst at a random latency in each syllable. (For example, in the protosyllable stage, the above criterion was found to be equivalent to having 5 bursts at the same latency in a bout of 10 protosyllables.) During the splitting phase, neurons were classified as shared if they participated in both syllable types, and specific if they participated in only one syllable type.
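The participation criterion can be sketched as a small Monte Carlo test. This is an illustration under stated assumptions: the number of latency bins, the number of surrogates, the use of a strict inequality, and the names `surrogate_peaks` and `participates` are not taken from the text.

```python
import random


def surrogate_peaks(n_syllables, n_latency_bins, n_surrogates=1000, seed=0):
    """Peak counts of surrogate histograms.

    Each surrogate places one burst at a random latency bin in each
    syllable, then records the height of the tallest histogram bin.
    """
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_surrogates):
        counts = [0] * n_latency_bins
        for _ in range(n_syllables):
            counts[rng.randrange(n_latency_bins)] += 1
        peaks.append(max(counts))
    return peaks


def participates(observed_counts, peaks, fraction=0.9):
    """True if the observed histogram peak exceeds `fraction` of surrogate peaks."""
    peak = max(observed_counts)
    n_exceeded = sum(1 for s in peaks if peak > s)
    return n_exceeded >= fraction * len(peaks)


def classify(participates_a, participates_b):
    """Label a neuron from its participation in two syllable types."""
    if participates_a and participates_b:
        return "shared"
    if participates_a or participates_b:
        return "specific"
    return "excluded"
```

A neuron that bursts at the same latency in every syllable passes the criterion, whereas one whose bursts are spread evenly across latencies does not.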
Visualizing network activity
We visualized network activity in two ways: network diagrams, and raster plots of population activity (e.g. Fig. 5a–d top and bottom panels, respectively). In both cases, we only included neurons that participated in at least one of the syllable types (see Shared and specific neurons above for participation criteria).
Network diagrams
Neurons are sorted along the x-axis based on their relative latencies. Neurons are sorted along the y-axis based on the relative strength of their synaptic input from specific neurons (or seed neurons) of each type (red or blue). Lines between neurons correspond to feedforward synaptic weights, and darker lines indicate stronger synaptic weights. For clarity of plotting, only the strongest six outgoing and strongest nine incoming weights are plotted for each neuron.
Population raster plots
Neurons are sorted from top to bottom according to their latency. Groups of seed neurons are indicated by magenta arrows. Shared neurons are plotted at the top and specific neurons are plotted below. As for network diagrams, neurons that did not reliably participate in at least one syllable type were excluded.
Further details for Figure 5(a–d)
Panels show network diagrams and raster plots at four different stages: (a) subsong stage (before learning), (b) end of protosyllable stage (iteration 500), (c) early chain-splitting stage (iteration 992), (d) late chain-splitting stage (iteration 2,500).
Further details for Extended Data Fig. 10(a–d)
Panels show four different stages: (a) early protosyllable stage (iteration 5), (b) late protosyllable stage (iteration 100), (c) early chain-splitting stage (iteration 130), (d) late chain-splitting stage (iteration 600).
Acknowledgments
We thank M. Wilson, J. Kornfeld, M. Jazayeri, S. Seung, N. Ji, and M. Stetner for comments on the manuscript. Funding to M.S.F. was provided by the NIH (grant # R01DC009183) and by the Mathers Foundation, to T.S.O. by the Nakajima Foundation and Schoemaker Fellowship, to E.L.M. by the Department of Defense (DoD) through the National Defense Science & Engineering Graduate Fellowship (NDSEG) Program, and to H.L.P. by the National Science Foundation (NSF) Graduate Research Fellowship Program (#DGE-114747) and the NSF Integrative Graduate Education and Research Traineeship (#0801700).
Footnotes
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
Author contributions
The study was conceived and designed by T.S.O. and M.S.F. Experimental data were collected by T.S.O. Data were analyzed by T.S.O. and M.S.F. with contributions from G.F.L. The modeling study was performed by E.L.M. and H.L.P. in collaboration with T.S.O. and M.S.F. All five authors contributed to writing the manuscript.
The authors declare no competing financial interest.
Code availability
Code used to simulate the model is available as Supplementary Information.
References
- 1. Wikenheiser AM, Redish AD. Hippocampal theta sequences reflect current goals. Nat Neurosci. 2015. doi: 10.1038/nn.3909.
- 2. Pfeiffer BE, Foster DJ. Hippocampal place-cell sequences depict future paths to remembered goals. Nature. 2013;497:74–79. doi: 10.1038/nature12112.
- 3. Dragoi G, Tonegawa S. Preplay of future place cell sequences by hippocampal cellular assemblies. Nature. 2011;469:397–401. doi: 10.1038/nature09633.
- 4. Davidson TJ, Kloosterman F, Wilson MA. Hippocampal replay of extended experience. Neuron. 2009;63:497–507. doi: 10.1016/j.neuron.2009.07.027.
- 5. Fujisawa S, Amarasingham A, Harrison MT, Buzsaki G. Behavior-dependent short-term assembly dynamics in the medial prefrontal cortex. Nat Neurosci. 2008;11:823–833. doi: 10.1038/nn.2134.
- 6. Pastalkova E, Itskov V, Amarasingham A, Buzsaki G. Internally generated cell assembly sequences in the rat hippocampus. Science. 2008;321:1322–1327. doi: 10.1126/science.1159775.
- 7. Eichenbaum H. Time cells in the hippocampus: a new dimension for mapping memories. Nat Rev Neurosci. 2014;15:732–744. doi: 10.1038/nrn3827.
- 8. Harvey CD, Coen P, Tank DW. Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature. 2012;484:62–68. doi: 10.1038/nature10918.
- 9. Murakami M, Vicente MI, Costa GM, Mainen ZF. Neural antecedents of self-initiated actions in secondary motor cortex. Nat Neurosci. 2014;17:1574–1582. doi: 10.1038/nn.3826.
- 10. Peters AJ, Chen SX, Komiyama T. Emergence of reproducible spatiotemporal activity during motor learning. Nature. 2014;510:263–267. doi: 10.1038/nature13235.
- 11. Tanji J. Sequential organization of multiple movements: involvement of cortical motor areas. Annu Rev Neurosci. 2001;24:631–651. doi: 10.1146/annurev.neuro.24.1.631.
- 12. Buzsaki G. Neural syntax: cell assemblies, synapsembles, and readers. Neuron. 2010;68:362–385. doi: 10.1016/j.neuron.2010.09.023.
- 13. Vogels TP, Rajan K, Abbott LF. Neural network dynamics. Annu Rev Neurosci. 2005;28:357–376. doi: 10.1146/annurev.neuro.28.061604.135637.
- 14. Immelmann K. In: Bird Vocalizations. Hinde RA, editor. Cambridge University Press; 1969. pp. 61–74.
- 15. Doupe AJ, Kuhl PK. Birdsong and human speech: common themes and mechanisms. Annu Rev Neurosci. 1999;22:567–631. doi: 10.1146/annurev.neuro.22.1.567.
- 16. Mooney R. Neural mechanisms for learned birdsong. Learn Mem. 2009;16:655–669. doi: 10.1101/lm.1065209.
- 17. Konishi M. Birdsong: from behavior to neuron. Annu Rev Neurosci. 1985;8:125–170. doi: 10.1146/annurev.ne.08.030185.001013.
- 18. Brainard MS, Doupe AJ. Translating birdsong: songbirds as a model for basic and applied medical research. Annu Rev Neurosci. 2013;36:489–517. doi: 10.1146/annurev-neuro-060909-152826.
- 19. Hahnloser RH, Kozhevnikov AA, Fee MS. An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature. 2002;419:65–70. doi: 10.1038/nature00974.
- 20. Kozhevnikov AA, Fee MS. Singing-related activity of identified HVC neurons in the zebra finch. J Neurophysiol. 2007;97:4271–4283. doi: 10.1152/jn.00952.2006.
- 21. Long MA, Jin DZ, Fee MS. Support for a synaptic chain model of neuronal sequence generation. Nature. 2010;468:394–399. doi: 10.1038/nature09514.
- 22. Amador A, Perl YS, Mindlin GB, Margoliash D. Elemental gesture dynamics are encoded by song premotor cortical neurons. Nature. 2013;495:59–64. doi: 10.1038/nature11967.
- 23. Fujimoto H, Hasegawa T, Watanabe D. Neural coding of syntactic structure in learned vocalizations in the songbird. J Neurosci. 2011;31:10023–10033. doi: 10.1523/JNEUROSCI.1606-11.2011.
- 24. Prather JF, Peters S, Nowicki S, Mooney R. Precise auditory-vocal mirroring in neurons for learned vocal communication. Nature. 2008;451:305–310. doi: 10.1038/nature06492.
- 25. Nottebohm F, Stokes TM, Leonard CM. Central Control of Song in Canary, Serinus-Canarius. J Comp Neurol. 1976;165:457–486. doi: 10.1002/cne.901650405.
- 26. Long MA, Fee MS. Using temperature to analyse temporal dynamics in the songbird motor pathway. Nature. 2008;456:189–194. doi: 10.1038/nature07448.
- 27. Aronov D, Andalman AS, Fee MS. A specialized forebrain circuit for vocal babbling in the juvenile songbird. Science. 2008;320:630–634. doi: 10.1126/science.1155140.
- 28. Simpson HB, Vicario DS. Brain pathways for learned and unlearned vocalizations differ in zebra finches. J Neurosci. 1990;10:1541–1556. doi: 10.1523/JNEUROSCI.10-05-01541.1990.
- 29. Ali F, et al. The basal ganglia is necessary for learning spectral, but not temporal, features of birdsong. Neuron. 2013;80:494–506. doi: 10.1016/j.neuron.2013.07.049.
- 30. Vallentin D, Long MA. Motor origin of precise synaptic inputs onto forebrain neurons driving a skilled behavior. J Neurosci. 2015;35:299–307. doi: 10.1523/JNEUROSCI.3698-14.2015.
- 31. Zann RA. The Zebra Finch: A Synthesis of Field and Laboratory Studies. Oxford University Press; 1996.
- 32. Liu WC, Gardner TJ, Nottebohm F. Juvenile zebra finches can use multiple strategies to learn the same song. Proc Natl Acad Sci U S A. 2004;101:18177–18182. doi: 10.1073/pnas.0408065101.
- 33. Tchernichovski O, Mitra PP, Lints T, Nottebohm F. Dynamics of the vocal imitation process: how a zebra finch learns its song. Science. 2001;291:2564–2569. doi: 10.1126/science.1058522.
- 34. Aronov D, Veit L, Goldberg JH, Fee MS. Two distinct modes of forebrain circuit dynamics underlie temporal patterning in the vocalizations of young songbirds. J Neurosci. 2011;31:16353–16368. doi: 10.1523/JNEUROSCI.3009-11.2011.
- 35. Veit L, Aronov D, Fee MS. Learning to breathe and sing: development of respiratory-vocal coordination in young songbirds. J Neurophysiol. 2011;106:1747–1765. doi: 10.1152/jn.00247.2011.
- 36. Tchernichovski O, Mitra PP. Towards quantification of vocal imitation in the zebra finch. J Comp Physiol A Neuroethol Sens Neural Behav Physiol. 2002;188:867–878. doi: 10.1007/s00359-002-0352-4.
- 37. Glaze CM, Troyer TW. Development of temporal structure in zebra finch song. J Neurophysiol. 2013;109:1025–1035. doi: 10.1152/jn.00578.2012.
- 38. Saar S, Mitra PP. A technique for characterizing the development of rhythms in bird song. PLoS One. 2008;3:e1461. doi: 10.1371/journal.pone.0001461.
- 39. Lipkind D, et al. Stepwise acquisition of vocal combinatorial capacity in songbirds and human infants. Nature. 2013;498:104–108. doi: 10.1038/nature12173.
- 40. Lipkind D, Tchernichovski O. Quantification of developmental birdsong learning from the subsyllabic scale to cultural evolution. Proc Natl Acad Sci U S A. 2011;108(Suppl 3):15572–15579. doi: 10.1073/pnas.1012941108.
- 41. Jin DZ, Ramazanoglu FM, Seung HS. Intrinsic bursting enhances the robustness of a neural network model of sequence generation by avian brain area HVC. J Comput Neurosci. 2007;23:283–299. doi: 10.1007/s10827-007-0032-z.
- 42. Li M, Greenside H. Stable propagation of a burst through a one-dimensional homogeneous excitatory chain model of songbird nucleus HVC. Phys Rev E Stat Nonlin Soft Matter Phys. 2006;74:011918. doi: 10.1103/PhysRevE.74.011918.
- 43. Jun JK, Jin DZ. Development of neural circuitry for precise temporal sequences through spontaneous activity, axon remodeling, and synaptic plasticity. PLoS One. 2007;2:e723. doi: 10.1371/journal.pone.0000723.
- 44. Fiete IR, Senn W, Wang CZ, Hahnloser RH. Spike-time-dependent plasticity and heterosynaptic competition organize networks to produce long scale-free sequences of neural activity. Neuron. 2010;65:563–576. doi: 10.1016/j.neuron.2010.02.003.
- 45. Buonomano DV. A learning rule for the emergence of stable dynamics and timing in recurrent networks. J Neurophysiol. 2005;94:2275–2283. doi: 10.1152/jn.01250.2004.
- 46. Gibb L, Gentner TQ, Abarbanel HD. Inhibition and recurrent excitation in a computational model of sparse bursting in song nucleus HVC. J Neurophysiol. 2009;102:1748–1762. doi: 10.1152/jn.00670.2007.
- 47. Bertram R, Daou A, Hyson RL, Johnson F, Wu W. Two neural streams, one voice: pathways for theme and variation in the songbird brain. Neuroscience. 2014;277:806–817. doi: 10.1016/j.neuroscience.2014.07.061.
- 48. Kosche G, Vallentin D, Long MA. Interplay of Inhibition and Excitation Shapes a Premotor Neural Sequence. J Neurosci. 2015;35:1217–1227. doi: 10.1523/JNEUROSCI.4346-14.2015.
- 49. Goller F, Cooper BG. Peripheral motor dynamics of song production in the zebra finch. Ann N Y Acad Sci. 2004;1016:130–152. doi: 10.1196/annals.1298.009.
- 50. Ohno S. Evolution by Gene Duplication. Springer-Verlag; 1970.
- 51. Tchernichovski O, Nottebohm F, Ho CE, Pesaran B, Mitra PP. A procedure for an automated measurement of song similarity. Anim Behav. 2000;59:1167–1176. doi: 10.1006/anbe.1999.1416.
- 52. Tchernichovski O, Lints TJ, Deregnaucourt S, Cimenser A, Mitra PP. Studying the song development process: rationale and methods. Ann N Y Acad Sci. 2004;1016:348–363. doi: 10.1196/annals.1298.031.
- 53. Goller F, Daley MA. Novel motor gestures for phonation during inspiration enhance the acoustic complexity of birdsong. Proc Biol Sci. 2001;268:2301–2305. doi: 10.1098/rspb.2001.1805.
- 54. Rajan R, Doupe AJ. Behavioral and neural signatures of readiness to initiate a learned motor sequence. Curr Biol. 2013;23:87–93. doi: 10.1016/j.cub.2012.11.040.
- 55. Mandelblat-Cerf Y, Fee MS. An automated procedure for evaluating song imitation. PLoS One. 2014;9:e96484. doi: 10.1371/journal.pone.0096484.
- 56. Fee MS, Leonardo A. Miniature motorized microdrive and commutator system for chronic neural recording in small animals. J Neurosci Methods. 2001;112:83–94. doi: 10.1016/s0165-0270(01)00426-5.
- 57. Okubo TS, Mackevicius EL, Fee MS. In Vivo Recording of Single-Unit Activity during Singing in Zebra Finches. Cold Spring Harb Protoc. 2014. doi: 10.1101/pdb.prot084624.
- 58. Fee MS, Kozhevnikov AA, Hahnloser RH. Neural mechanisms of vocal sequence generation in the songbird. Ann N Y Acad Sci. 2004;1016:153–170. doi: 10.1196/annals.1298.022.
- 59. Hahnloser RH, Kozhevnikov AA, Fee MS. Sleep-related neural activity in a premotor and a basal-ganglia pathway of the songbird. J Neurophysiol. 2006;96:794–812. doi: 10.1152/jn.01064.2005.
- 60. Goldberg JH, Fee MS. A cortical motor nucleus drives the basal ganglia-recipient thalamus in singing birds. Nat Neurosci. 2012;15:620–627. doi: 10.1038/nn.3047.
- 61. Rieke F. Spikes: Exploring the Neural Code. MIT Press; 1997.
- 62. Jarvis MR, Mitra PP. Sampling properties of the spectrum and coherency of sequences of action potentials. Neural Comput. 2001;13:717–749. doi: 10.1162/089976601300014312.
- 63. Bokil H, Andrews P, Kulkarni JE, Mehta S, Mitra PP. Chronux: a platform for analyzing neural signals. J Neurosci Methods. 2010;192:146–151. doi: 10.1016/j.jneumeth.2010.06.020.
- 64. Mitra P, Bokil H. Observed Brain Dynamics. Oxford University Press; 2008.
- 65. Oppenheim AV, Schafer RW. From frequency to quefrency: A history of the Cepstrum. IEEE Signal Proc Mag. 2004;21:95–106.
- 66. Garst-Orozco J, Babadi B, Olveczky BP. A neural circuit mechanism for regulating vocal variability during song learning in zebra finches. Elife. 2014;4:e03697. doi: 10.7554/eLife.03697.
- 67. Leonardo A, Fee MS. Ensemble coding of vocal control in birdsong. J Neurosci. 2005;25:652–661. doi: 10.1523/JNEUROSCI.3036-04.2005.
- 68. Ashmore RC, Wild JM, Schmidt MF. Brainstem and forebrain contributions to the generation of learned motor behaviors for song. J Neurosci. 2005;25:8543–8554. doi: 10.1523/JNEUROSCI.1668-05.2005.
- 69. Lim Y, Shinn-Cunningham B, Gardner TJ. Sparse Contour Representations of Sound. IEEE Signal Proc Let. 2012;19:684–687.
- 70. Markowitz JE, Ivie E, Kligler L, Gardner TJ. Long-range order in canary song. PLoS Comput Biol. 2013;9:e1003052. doi: 10.1371/journal.pcbi.1003052.
- 71. Duda RO, Hart PE, Stork DG. Pattern Classification. 2nd ed. Wiley; 2001.
- 72. Kanji GK. 100 Statistical Tests. 3rd ed. Sage Publications; 2006.
- 73. McDonald JH. Handbook of Biological Statistics. 3rd ed. Sparky House Publishing; 2014.
- 74. Abbott LF, Blum KI. Functional significance of long-term potentiation for sequence learning and prediction. Cereb Cortex. 1996;6:406–416. doi: 10.1093/cercor/6.3.406.
- 75. Dan Y, Poo MM. Spike timing-dependent plasticity: from synapse to perception. Physiol Rev. 2006;86:1033–1048. doi: 10.1152/physrev.00030.2005.
- 76. Fee MS, Goldberg JH. A hypothesis for basal ganglia-dependent reinforcement learning in the songbird. Neuroscience. 2011;198:152–170. doi: 10.1016/j.neuroscience.2011.09.069.
- 77. Fiete IR, Hahnloser RH, Fee MS, Seung HS. Temporal sparseness of the premotor drive is important for rapid learning in a neural network model of birdsong. J Neurophysiol. 2004;92:2274–2282. doi: 10.1152/jn.01133.2003.
- 78. Charlesworth JD, Tumer EC, Warren TL, Brainard MS. Learning the microstructure of successful behavior. Nat Neurosci. 2011;14:373–380. doi: 10.1038/nn.2748.
- 79. Ravbar P, Lipkind D, Parra LC, Tchernichovski O. Vocal exploration is locally regulated during song learning. J Neurosci. 2012;32:3422–3432. doi: 10.1523/JNEUROSCI.3740-11.2012.
- 80. Walton C, Pariser E, Nottebohm F. The zebra finch paradox: song is little changed, but number of neurons doubles. J Neurosci. 2012;32:761–774. doi: 10.1523/JNEUROSCI.3434-11.2012.