Abstract
How are brain circuits constructed to achieve complex goals? The brains of young songbirds develop motor circuits that achieve the goal of imitating a specific tutor song to which they are exposed. Here, we set out to examine how song-generating circuits may be influenced early in song learning by a cortical region (NIf) at the interface between auditory and motor systems. Single-unit recordings reveal that, during juvenile babbling, NIf neurons burst at syllable onsets, with some neurons exhibiting selectivity for particular emerging syllable types. When juvenile birds listen to their tutor, NIf neurons are also activated at tutor syllable onsets, and are often selective for particular syllable types. We examine a simple computational model in which tutor exposure imprints the correct number of syllable patterns as ensembles in an interconnected NIf network. These ensembles are then reactivated during singing to train a set of syllable sequences in the motor network.
Subject terms: Network models, Learning and memory, Birdsong
Young songbirds learn to imitate their parents’ songs. Here, the authors find that, in baby birds, neurons in a brain region at the interface of auditory and motor circuits signal the onsets of song syllables during both tutoring and babbling, suggesting a specific neural mechanism for vocal imitation.
Introduction
Unlike motor circuits for innate behaviors, which are built by genetically specified developmental programs, motor circuits for complex learned behaviors must be built by a combination of genes and experience. Many of our most complex and expressive behavioral repertoires, from speech to music to cooking to sports, are learned through observing others, and through practice of simpler elements. For example, in the process of mastering a musical piece, musicians repeatedly practice very short musical elements. The process of learning often involves breaking a complex sequence of actions down into simple discrete pieces which are practiced individually and then assembled later. Such behavioral elements, or movement chunks, can be flexibly composed in the brain to produce complex motor sequences1–3. This powerful strategy for generating new behaviors supports the creation of a rich variety of actions, even from relatively few primitive building blocks. We were interested in how the brain makes building blocks for a new behavioral repertoire.
Songbirds are an excellent model system to address this question. Songbirds learn their vocalizations by imitating songs of tutors they heard as juveniles. They learn in two stages: a sensory stage where they memorize the tutor song, and a sensorimotor stage where they practice vocalizing4,5. The memory of the tutor song, called the song template, is sufficient to guide imitation5, even if restricted to a single 2-h window of tutor exposure6. Song learning exhibits convergent behavioral, circuit-level, and genetic parallels with human speech learning7–17, and also has parallels with other forms of mammalian motor learning18.
Like other complex motor behaviors such as speech, songs are divided into discrete simpler chunks, syllables19. After initial random babbling, a repeatable protosyllable emerges, which then differentiates into multiple daughter syllables4. As the song matures, syllables undergo a process of gradual refinement. The neural mechanisms of adult song production and refinement are fairly well understood, and rely, like mammalian motor learning, on a network of cortical, basal ganglia, and thalamic brain areas13,16,18. Together, this network of brain areas is thought to explore new song variations20–23; evaluate which variations sound good24–27; employ reinforcement learning to bias future song towards variants that sound better28–30; and ultimately execute precisely timed neural sequences to generate each song syllable31–39.
What neural signals might help build the sequences that precisely control adult song syllables? Song timing is thought to be controlled by HVC31,32,35,36, where neurons generate brief bursts in precisely timed sequences that span every moment in the song38,39. These precise sequences appear to emerge gradually over development, first with the growth of protosequences from syllable onsets, then the splitting of protosequences into daughter sequences40. Several existing computational models of HVC development rely on a population of syllable-onset-related training neurons that seed developing chains in HVC40,41. In these models, HVC receives brief bursts of external input, and the HVC network organizes into chains of sequentially active neurons that are triggered by inputs to training/seed neurons.
Based on a large body of previous work, we hypothesized that seed neurons in HVC may be driven by a signal from the auditory system, and that this could train the HVC network to form syllable-sized chains42. At a computational level, we view song imitation as learning a generative model of the tutor song. Exposure to a tutor song could imprint the desired collection of syllable-sized chunks in the auditory system43,44, which through interaction with HVC could then create an appropriate latent representation of song timing in the form of HVC sequences. More specifically, we proposed that auditory cortex may directly influence sequence formation in HVC by appropriately activating seed neurons during development. Several experimental results are consistent with this view: tutor exposure produces rapid overnight changes in the song motor system, including dramatic alterations of song features4, spontaneous activity45, and spine growth and stabilization46. Despite the apparent importance of auditory inputs to HVC during learning, these inputs have little role in adult singing, and indeed are gated off in adult birds47,48.
Several lines of evidence led us to focus on nucleus interface (NIf), a higher-order sensorimotor cortical area that projects to HVC and has been implicated in song learning. Like mammalian association cortex, NIf is a central part of an interconnected network of auditory and motor cortical areas49–54 (Fig. 1a). Specifically, it has been suggested that NIf is analogous to Spt, a region in the human speech processing circuit located at the interface of auditory and motor functions53,55, where neural activity is locked to onsets of words and phrases56. NIf appears to play a key role during vocal development: lesions of NIf have minimal effect on adult zebra finch song57, but inactivation of NIf in young juvenile birds causes loss of emerging spectral and temporal song structure58, and inactivating NIf while a juvenile bird is being tutored interferes with tutor imitation59. Furthermore, NIf neurons that project to HVC have been observed to burst at syllable onset60, and NIf lesions affect syllable ordering61, consistent with a role for NIf in chunking song into syllables.
To test our hypothesis that NIf provides a training input to seed neurons in HVC, we record from neurons in NIf in freely behaving juvenile zebra finches (Fig. 1b) while they sing and while they listen to a tutor song. We find that NIf exhibits bursts of activity near syllable onsets in juvenile birds, as has previously been reported60; we additionally find that NIf neurons exhibit syllable-specific activity, tending to burst more at the onsets of some syllables than others. In addition, our recordings during tutoring reveal that NIf also fires at syllable onsets, chunking the song into syllables in both singing and listening contexts. Thus, activity in NIf, or upstream of NIf, may serve to align emerging auditory and motor representations of song, allowing a stable reference frame for song learning. We develop a mechanistic neural network model, in which synaptic learning rules lead the NIf network to form a distinct ensemble for each tutor syllable. In a combined NIf/HVC model, these NIf ensembles are sufficient to train downstream premotor sequences in HVC. Finally, in light of recent optogenetics results62, we model how NIf may play a role in specifying the durations of song syllables.
Results
Singing-related activity of NIf neurons in juvenile birds
We set out to characterize the neural activity patterns of NIf neurons in singing juvenile birds (89 single-unit recordings from 13 birds ages 44–92 dph, see “Methods” section). We recorded 29 antidromically identified neurons, as well as 60 neurons that were within the borders of NIf as determined by antidromic hash, but which did not meet our criteria for identified projectors. Some of these non-identified neurons could be HVC projectors that were not stimulated, while others could be local NIf interneurons. We found that many identified HVC-projecting NIf (NIfHVC) neurons generated bursts that were strongly locked to syllable onsets (Fig. 2a–d). A population average over all projection neurons revealed a robust peak in the average firing rate 19 ms prior to syllable onset (Fig. 2e, top; p < 2 × 10⁻⁶, see “Methods” section). A more detailed analysis reveals that 66% (19/29) of the projection neurons exhibited a significant firing rate peak within a window 50 ms before to +25 ms after syllable onsets (p < 0.05, with Bonferroni correction for 29 comparisons). Some neurons produced narrow bursts aligned immediately prior to syllable onsets (Fig. 2a, b, and e), and others produced wider bursts with longer premotor latencies (Fig. 2c–e), and still others burst shortly after syllable onset (Fig. 2e). This range of latencies in peak firing rates relative to syllable onset extended from 41 ms before syllable onset to 25 ms after (Fig. 2e). Across the song-locked NIfHVC neurons, the median latency of the peak was 16 ms prior to syllable onset. Note that most neurons that exhibited a significant peak at syllable onset also exhibited flanking regions of lower-than-average firing rates, a pattern apparent in the population average (Fig. 2e, top).
Turning now to the population of neurons not identified as HVC projectors, the average PSTH of these neurons during singing revealed a prominent peak at 15 ms after syllable onset (p < 2 × 10⁻⁶, Supplementary Fig. 2). A detailed analysis reveals that 35/60 individual neurons exhibited a significant peak in firing rate in the −100 to +25 ms range. Of these, 19 neurons had a firing rate peak in a narrow window 0–25 ms after syllable onset, while 16 neurons had a broader range of peak times prior to syllable onset, similar to the identified projection neurons. Overall, this somewhat bimodal distribution of properties among non-identified neurons might be consistent with the idea that this population comprises two distinct neuron types. The latter of these might correspond to HVC-projectors that were not stimulated in our antidromic identification protocol.
Earlier models of sequence formation in HVC have proposed that external inputs at subsong syllable onsets drive seed neurons in HVC that initiate the growth and splitting of synaptic chains within the HVC network40. Indeed, during subsong, many HVC projection neurons fire reliably at syllable onset40. To test the idea that such external input arises from NIf, we recorded six NIf neurons (including four NIfHVC neurons) during subsong. We found NIf neurons burst at syllable onset even at this early stage of song development (Fig. 3a–c).
Consistent with the possibility that NIf neurons function to activate HVC chains at syllable onsets, we found that the distribution of latencies of NIf neurons is much earlier and tighter than that of HVC neurons. The latencies of individual NIfHVC neurons are clustered prior to syllable onsets at all stages of song learning, with 80%, 79%, and 83% of bursts occurring prior to syllable onset in subsong, protosyllable, and multisyllable stages, respectively (Fig. 3c). In contrast, the distribution of HVC burst latencies appears to change over development in a way consistent with chains growing from syllable onsets40, such that in adult birds HVC bursts occur fairly uniformly throughout the song motif38,39. These results are consistent with a model in which HVC sequences grow from syllable onsets, triggered by inputs from NIf (Fig. 3e). We note that, while the peaks of the PSTHs of individual NIf neurons occur primarily at syllable onset, individual NIf neurons have also been observed to fire at moments other than syllable onset, for example, at transition points within multi-part syllables60. This observation is consistent with the possibility that some syllables may be formed by combining multiple protosyllables40.
We wanted to ensure that the observed narrow distribution of NIf latencies was not an artifact of biased spatial sampling in NIf. NIf is an elongated nucleus that sits along the mesopallial lamina (Supplementary Fig. 3, inset). Recent work has shown that projections to HVC display non-uniform topology, and lesions to different parts of HVC may result in different types of song deficits63–65. Using post-hoc histology, it was possible to estimate the anatomical position within NIf of some of our recording sites, and we found that our recordings were sampled from the entire extent of NIf. Notably, no significant correlation between response latency and the anatomical position of the recording site was observed (R² = 0.15, Supplementary Fig. 3).
We were curious whether NIf onset-related activity during singing is selective for specific emerging syllable types, or whether the activity is equivalent for all syllable types. In an intermediate stage of song learning, multiple syllable types emerge from a common protosyllable, and multiple HVC sequences appear to emerge from a common protosequence4,40. We find that some NIfHVC neurons burst preferentially at the onsets of certain syllable types (Fig. 4). Of the song-locked putative HVC projection neurons we recorded in juveniles with multiple syllable types, most had significantly different firing rates for different syllables (6/8 pass ANOVA at 0.05 significance level with Bonferroni correction). These results are consistent with a potential role for NIf in shaping the emergence of syllable-specific sequences in HVC.
NIf activity during tutoring
It has previously been proposed that auditory experience of the tutor song can directly drive the formation of song sequences in HVC42,66. To investigate the potential role of NIf in mediating this hypothesized process, we recorded from NIf in juvenile birds during early tutor exposure (103 single-unit recordings from 14 birds between the ages of 41 and 67 dph, including 29 putative HVC-projectors). Approximately half of these neurons produced clear bursts of activity during presentation of the tutor song (Fig. 5). The population average syllable-aligned PSTH exhibited a significant peak near syllable onset with a latency of 14 ms post onset (Fig. 5g; p < 2 × 10⁻⁶). The onset-related peak was observed separately in both the population of NIfHVC projection neurons (peak latency 2 ms; p < 1 × 10⁻⁴) and in the population of non-identified neurons (peak latency 15 ms; p < 2 × 10⁻⁶).
A more detailed analysis of individual neurons revealed that 51 of all 103 neurons (including 8 of 29 projection neurons) exhibited significantly elevated firing rates in a window between 0 and 25 ms after tutor syllable onset (Fig. 5h, i; p < 0.05, Bonferroni corrected). Among individual neurons with significant responses, the median latency was 14 ms after syllable onset. Interestingly, we also observed a small subset of neurons that produced a dip in firing rate in the 0–25 ms window after syllable onset (6/103 neurons, including 1/29 projectors; p < 0.05, Bonferroni corrected for 103 comparisons). For these neurons, the average firing rate change in this window was −7 Hz, compared to the +22 Hz firing rate change for the 51 neurons exhibiting an increase. Thus, as a population, the predominant modulation in NIf was a brief increase in firing rate at syllable onsets. As in the singing data, no significant correlation was observed between latency and the position of the recording site. Of the neurons we were able to record during both singing and tutoring, some were exclusively singing-locked (12/24 neurons, including 7/10 projectors), some were both singing-locked and tutor-locked (8/24 neurons, including 1/10 projectors), a few were exclusively tutor-locked (2/24 neurons, including 1/10 projectors), and the remaining 2/24 neurons did not show changes in firing rate related to singing or tutoring. Of the eight neurons that were locked at syllable onsets during both singing and tutoring, we observed a range of singing latencies from −6 to +18 ms, with a mean of 9.6 ms, and a range of tutoring latencies from +6 to +45 ms, with a mean of +20.3 ms. For the one HVC-projecting NIf neuron, the singing latency was −2 ms, and the tutoring latency was +26 ms.
We wondered whether NIf is active at syllable offsets as well as syllable onsets. For most syllables in the tutor song, it is difficult to decouple syllable offset responses from onset responses, because the offset of most syllables is followed by the onset of the next syllable in the song. Therefore, to analyze syllable offsets, we aligned NIf activity to the offsets of the final syllable in the tutor song bout. As a population, NIf neurons exhibited no significant modulation (either increase or decrease) following last-syllable offset (Supplementary Fig. 4A). Only 2/103 neurons (neither of which was a projector) exhibited a significant increase in spiking activity in a 0–25 ms window following last-syllable offset (p < 0.05, Bonferroni correction for 103 comparisons, Supplementary Fig. 4B, C). Interestingly, both of these neurons also exhibited robust syllable-onset responses.
The hypothesis that NIf may translate the tutor song into an appropriate collection of HVC sequences suggests that NIf activity during tutoring may be syllable selective. Notably, of the putative projection neurons that had significantly elevated responses to tutor syllable onsets, most (6 of 8 neurons) also had different firing rates for different syllable types (Fig. 6; ANOVA at p < 0.05 with Bonferroni correction). For these selective neurons, firing rate variances across syllable types were 13 times larger than within syllable types (F-test statistic was on average 13). These findings are consistent with the idea that during tutoring, NIf exhibits strongly syllable-related and syllable-specific activity.
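The F statistic quoted above is the ratio of between-syllable to within-syllable variance in firing rate. The following is a minimal sketch of that ratio for a one-way ANOVA, assuming per-rendition firing rates have already been grouped by syllable type. The function name and the example rates are hypothetical, and the conversion of F to a p-value (which requires the F distribution) is omitted.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)                       # number of syllable types
    n = sum(len(g) for g in groups)       # total number of syllable renditions
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variability of the per-syllable mean rates around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variability of individual renditions around their own syllable mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical firing rates (Hz), one inner list per syllable type
rates_by_syllable = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f_stat = one_way_anova_f(rates_by_syllable)
```

A large F, as in this toy data, indicates that firing rate differs far more between syllable types than between renditions of the same type.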
Mechanistic model and discussion
The recordings described above hint at how the brain may translate an auditory memory of the tutor song into motor commands to generate a precise imitation. Here, we formalize these ideas into a broader computational framework that includes both NIf and the downstream premotor area HVC. We build on previous models of HVC40 by replacing a hard-coded training input with a model of NIf that learns the structure of the tutor song and autonomously generates and transmits a training signal to HVC. First, we present a computational model of NIf (Fig. 7a, b): the key idea is that exposure to a tutoring input imprints a pattern of connectivity within NIf that defines an ensemble of neurons for each syllable in the tutor song. These ensembles are then re-activated later during singing. The activity of the model during the tutoring and singing stages captures features of our NIf recordings. We next combine the NIf model with a model of HVC, and describe how the NIf network learns an outline of song structure in the auditory domain and translates this structure to the model HVC network to guide learning in the motor domain. Finally, we use the combined NIf/HVC model to demonstrate a potential mechanism by which NIf may control the duration of syllable sequences in HVC, consistent with recent optogenetics findings that stimulation of NIf at different rhythms can imprint different syllable durations62. Note that this captures only the very earliest stages of learning, and does not yet include a mechanism to read out song errors, which is thought to occur separately18,26.
We set out to design a model NIf network that generates a distinct neural ensemble for each syllable in the tutor song, then reactivates those same ensembles autonomously during singing. To achieve this, we took inspiration from existing models of neural ensemble formation67,68, and used learning rules that both associate neurons together in clusters, and also compete clusters against each other (details below). Inputs to the network consist of two classes of syllable-related inputs (Fig. 7a). The first is a syllable-specific auditory input that is only active during tutoring and consists of a different random pattern of activity for each syllable in the tutor song. The second input is a syllable onset signal active during both tutoring and singing. During tutoring, this signal, derived elsewhere in the auditory system, potentiates NIf at syllable onset. During singing, this onset signal corresponds to an efference copy of preparatory motor commands originating in brain areas responsible for the initiation of early babbling syllables. The role of this efference copy input is to reactivate syllable-specific NIf populations during singing. Together, these two inputs lead to the formation of syllable-specific ensembles during tutoring (Fig. 7d, f, g) and to the reactivation of these ensembles during singing (Fig. 7e, h), when they serve as seed inputs to drive chain formation in HVC.
Before describing the model further, we briefly elaborate on evidence that syllable onset activity in NIf in early song vocalizations corresponds to an efference copy of motor commands originating elsewhere (Fig. 7a). One line of evidence arises from studies suggesting that subsong babbling is driven by a cortical nucleus, the lateral magnocellular nucleus of the anterior nidopallium (LMAN), rather than by HVC, the nucleus that drives adult song. While both HVC and LMAN exhibit strong syllable onset activity in subsong, lesions of LMAN completely abolish these early vocalizations, while lesions of HVC have little effect40,69. LMAN is thought to activate subsong syllable onsets through a pathway to RA and then to brainstem vocal and respiratory centers. An efference copy of these onset signals could reach NIf via a feedback pathway from midbrain vocalization and respiratory centers through the thalamic nucleus Uvaeformis (Uva)70, a pathway thought to provide interhemispheric synchronization71,72, coordinate respiration34,73,74, and control the sequential generation of syllables in a song motif75,76 or sequential structure within syllables77. These ideas are further supported by the recent observation that Uva exhibits strong syllable onset-related signals during singing in adult birds78. Notably, efference copy signals could also reach HVC directly from Uva, or from Uva through another higher-order auditory area (CM), or via Area X and A1179, pathways that could also, in principle, play a role in sequence formation in HVC.
How do we construct a model such that, during the tutoring phase, auditory inputs give rise to the formation of syllable-specific ensembles? In the simplest case, auditory inputs for each syllable would activate non-overlapping populations of NIf neurons, which would then form independent ensembles for each syllable through Hebbian learning. In reality, auditory inputs for each syllable do overlap with each other, so the main challenge is to form independent ensembles in NIf in spite of overlapping inputs. There are many different ways to accomplish this goal67,68; in our implementation, we take inspiration from previous work demonstrating that anti-Hebbian learning can decorrelate input patterns67,80. Thus we initiate the tutoring phase with a brief episode of anti-Hebbian learning during which synapses between co-active neurons are made more negative. This ensures that auditory inputs for different syllables activate non-overlapping ensembles of neurons in the later tutoring stage. After the episode of anti-Hebbian learning, tutoring continues with a Hopfield-like Hebbian learning rule, which burns each non-overlapping population into the network as an independent, recurrently connected ensemble for each syllable (Fig. 7e). The full progression of network connectivity over time is shown in Supplementary Movie 1.
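The second, Hebbian stage can be pictured with a toy attractor network. The sketch below assumes the anti-Hebbian decorrelation step has already produced non-overlapping binary patterns, and it is not the authors' network: the function names, the binary threshold units, the uniform `inhibition` term, and all parameter values are hypothetical choices made for illustration. It imprints each pattern with a Hopfield-like Hebbian rule, then shows that a partial cue recovers the full ensemble.

```python
def hebbian_imprint(patterns, eta=1.0):
    """Strengthen synapses between neurons that are co-active in a pattern."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:                      # no self-connections
                    w[i][j] += eta * p[i] * p[j]
    return w


def recall(w, cue, inhibition=0.3, steps=5):
    """Synchronous threshold updates with uniform (assumed) global inhibition."""
    n = len(w)
    x = list(cue)
    for _ in range(steps):
        total = sum(x)
        x = [1 if sum(w[i][j] * x[j] for j in range(n)) - inhibition * total > 0
             else 0
             for i in range(n)]
    return x


# Three hypothetical non-overlapping syllable ensembles over nine neurons
patterns = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
]
weights = hebbian_imprint(patterns)
# Cue only two of the three neurons in the first ensemble
completed = recall(weights, [1, 1, 0, 0, 0, 0, 0, 0, 0])
```

In this toy setting each stored pattern is a fixed point of the dynamics, which is the sense in which an ensemble is "burned in" as an independent, recurrently connected unit.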
In our model, these syllable-specific ensembles are reactivated during singing as attractors of the NIf network by the rhythmically driven syllable onset signal. Because the onset signal has no syllable-specificity, the specific ensemble reactivated is determined by either noise or initial conditions. In our implementation, we have used an adaptation term (see “Methods” section) that prevents the same ensemble from being repeatedly activated on subsequent cycles of the onset signal. Thus, the model network cycles reliably through the stored syllable ensembles. While we did not set out to explicitly model how the syllable order is represented or controlled by the songbird brain, it is plausible that weak, slow feed-forward connections between ensembles in NIf could bias the song to reproduce syllable orderings presented during tutoring81.
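The effect of adaptation on replay can be illustrated with a winner-take-all abstraction. This is a sketch under stated assumptions rather than the adaptation term described in the Methods: each ensemble is reduced to a single unit, the non-specific onset signal is represented by independent uniform noise, and `bump`, `decay`, and the function name are hypothetical. With these values the ensemble that has just fired is always more suppressed than any other, so it cannot win two cycles in a row.

```python
import random


def replay_order(n_ensembles, n_cycles, bump=3.0, decay=0.5, seed=0):
    """Return the ensemble index reactivated on each onset cycle."""
    rng = random.Random(seed)
    adaptation = [0.0] * n_ensembles
    order = []
    for _ in range(n_cycles):
        # The onset signal is not syllable-specific: noise minus adaptation
        # decides which stored ensemble wins this cycle.
        drive = [rng.random() - a for a in adaptation]
        winner = max(range(n_ensembles), key=drive.__getitem__)
        order.append(winner)
        # Adaptation decays everywhere, then builds up in the winner.
        adaptation = [a * decay for a in adaptation]
        adaptation[winner] += bump
    return order


order = replay_order(n_ensembles=4, n_cycles=50)
```

Weaker adaptation (a smaller `bump` relative to the noise) would allow immediate repeats, which is one way to see why the choice of adaptation parameters shapes which ensembles are replayed.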
The result of the NIf model is a network that organizes into recurrently connected ensembles corresponding to the different syllables presented during the tutoring phase. A numerical analysis of network performance reveals that the network can successfully form ensembles for each tutor syllable and robustly reproduce the correct number of independent syllables during singing. For example, when tutored with four syllables, the network both formed the correct number of syllable-specific ensembles during the tutoring phase and reproduced those syllables during the singing phase on 98/100 random initializations. The network was also largely successful when trained on three (81/100 successful runs) or five (79/100) syllables.
Previous models of HVC sequence formation incorporated syllable-specific training neurons as an input to HVC40,42, consistent with the activity we subsequently observed in NIf (and report here). To further examine this hypothesized interaction, we connected the NIf model to the earlier model of HVC40 and tested whether the replay of model NIf ensembles could train HVC to produce a unique sequence for each syllable in the tutor song. In the combined NIf/HVC model, activity in NIf was able to train HVC to assemble new sequences for each syllable in the tutor song (Fig. 8a). More specifically, in the combined model, HVC sequences grew in length and exhibited syllable-specific differentiation of activity, just as observed in HVC neurons recorded in developing birds. The combined model demonstrates explicitly how the formation of different ensembles in NIf can drive chain splitting in HVC such that different syllables in the song are represented by different chains in HVC.
Our combined NIf/HVC model also suggests a potential mechanism by which NIf may control the duration of song syllables: the HVC network learns sequence durations that are strongly influenced by the period of activity imposed through NIf (Fig. 8b, c). This possibility was recently demonstrated experimentally by Zhao et al., who found that patterned optogenetic stimulation of NIf in young birds affected the durations of learned song syllables. For example, stimulating NIf at a fast 10 Hz rhythm led birds to sing shorter syllables62, while stimulation of NIf with longer pulses (300 ms) at irregular intervals led to the formation of pathologically long syllables with a wide range of durations (from 150 ms to almost 1 s). We found that our combined NIf/HVC model reproduces these findings: rhythmic NIf activation led to the formation of precisely timed syllables with a duration equal to the period of the stimulation, while irregular stimulation at long intervals led to highly variable durations of HVC network activity, corresponding to long syllables ranging from 150 ms up to a second in duration (Fig. 8b–d). Thus, in the combined NIf/HVC model, syllable durations appear to be controlled by a combination of NIf inputs and dynamics within the HVC network. NIf may activate HVC, which then ‘reverberates’ for a period of time until it is either reset by the next NIf input, or the reverberation in HVC dies out, as determined by its intrinsic recurrent excitation and inhibition.
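The reset-or-die-out idea in the last sentence can be reduced to a one-line rule. The sketch below is a deliberately simplified abstraction, not the combined NIf/HVC model: it ignores learning entirely and assumes a single fixed reverberation lifetime, and the function name, the 1000 ms lifetime, and the pulse times are all hypothetical.

```python
def syllable_durations(nif_pulse_times_ms, reverberation_ms=1000.0):
    """Duration of HVC activity following each NIf pulse (except the last).

    Activity started by a pulse ends either when the next pulse resets it
    or when the reverberation dies out, whichever comes first.
    """
    return [min(nxt - start, reverberation_ms)
            for start, nxt in zip(nif_pulse_times_ms, nif_pulse_times_ms[1:])]


# Rhythmic 10 Hz drive: every syllable lasts one stimulation period (100 ms)
rhythmic = syllable_durations([0.0, 100.0, 200.0, 300.0])

# Irregular drive at long intervals: durations vary and saturate at the
# assumed reverberation lifetime
irregular = syllable_durations([0.0, 150.0, 900.0, 2500.0])
```

Under this rule, rhythmic input yields uniform durations equal to the input period, whereas irregular input yields a spread of durations bounded above by the network's intrinsic dynamics.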
Interestingly, the NIf network frequently displayed other behaviors besides strict imitation of the tutor patterns. These other behaviors are reminiscent of natural variations described in zebra finch song learning (Fig. 9). For example, while zebra finches often imitate their tutor, approximately half of birds generate song variations by deleting a syllable, improvising a new syllable, or duplicating an individual tutor syllable to produce two distinct syllable variants25,82,83. Similarly, during singing, our model network sometimes fails to activate a tutored ensemble (deletion, Fig. 9a), or sometimes activates a novel ensemble comprising neurons that were not activated together during tutoring (improvisation, Fig. 9b). Finally, during tutoring, the model sometimes forms two distinct ensembles activated by the same tutor syllable, both of which are then reactivated during singing, resulting in the duplication of that tutor syllable (Fig. 9c). The relative prevalence of these variations depends on the choice of model parameters. For example, the adaptation time-constant during either tutoring or singing has a particularly prominent effect on these phenomena. Altogether, the model qualitatively recapitulates some surprising and unexplained aspects of vocal imitation in juvenile songbirds.
Our findings provide additional insight into potential mechanisms driving the mirroring of activity between sensory observation and motor action. The onset-related signals observed in NIf during both singing and tutoring give insights into how mirror neuron activity may arise66,84,85. In our view, the mirror activity in NIf is a signature of a sensory system that encodes a memory of a sensory event (the tutor song) subsequently used to train a motor system. The mirroring activity in the post-learning adult results from the vestigial neural architecture subserving the earlier learning process.
Together, our NIf recordings and modeling suggest potential neural mechanisms for breaking a tutor song into simple vocal-motor units that can be used to build a motor program that aligns with the auditory memory. By aligning activity to syllable onsets, NIf could chunk songs into syllables, each of which is accessible as a discrete unit during later motor learning and production. This view is consistent with the observation that NIf is necessary in some birds for flexible syllable sequencing61. In our model, simple neural plasticity rules produce an appropriate training input that reflects the structure of a tutor song. Building a motor program in this way, shaped by input from a sensory system, could create the precise alignment between a sensory memory and motor representations necessary for subsequent motor learning and evaluation. In the broadest view, this work elucidates neural mechanisms by which the brain, through sensory experience of the environment, can build motor programs appropriate to interact with that environment.
Methods
Subjects
Electrophysiological recordings were carried out in 20 juvenile zebra finches (Taeniopygia guttata) from the MIT zebra finch breeding facility (Cambridge, MA). Birds were implanted with motorized microdrives between 37 and 42 days post hatch. We injected tracer (fluorescently labeled dextran or cholera toxin) in HVC to visualize NIf in post-hoc histology. After recovery from surgery, birds were tutored using playback from a speaker with an adult bird present. Some birds (isolate birds) were raised by female birds, so this tutoring was their only exposure to any tutor song. Other birds were raised by both parents, and we played their own father’s song during tutoring. We did not observe a significant difference between these groups of birds, so we combined the two groups in our analyses (17/21 song-locked in isolate birds with latency −21 ± 18 ms relative to syllable onset; 37/68 song-locked in non-isolate birds with latency −1 ± 27 ms relative to syllable onset; 27/48 tutor-locked in isolate birds with latency 15 ± 7 ms relative to syllable onset; 24/55 tutor-locked in non-isolate birds with latency 10 ± 29 ms relative to syllable onset). Once the birds started to sing, we also recorded during singing, and continued recording until either recording quality degraded or the bird’s song developed a stable motif. Animal care and experiments were carried out in accordance with NIH guidelines, and reviewed and approved by the Massachusetts Institute of Technology Committee on Animal Care.
Neural recordings in freely behaving zebra finches
We recorded 168 single units in 20 juvenile birds during tutoring and singing. The electrophysiological methods that were used are described in detail in previous publications86,87. Some units were antidromically identified as HVC-projectors using a stimulating electrode in HVC. Based on previous studies88, neurons were identified as putative HVC-projectors if they had <100 μs latency jitter from HVC stimulation. Neurons were further identified as collision-tested HVC-projectors if they passed the collision test (triggering HVC stimulation on spontaneous spikes blocks stimulation-evoked spikes, see Supplementary Fig. 1).
Analysis of song data
Songs were recorded with custom MATLAB software (A. Andalman), which was configured to trigger recordings of all quiet vocalizations of juvenile birds. Songs were segmented into syllables using custom MATLAB software40,69,74.
Syllable classification was performed using custom software89 (based on ref. 90), which visualizes acoustic trajectories in a low-dimensional space using t-distributed stochastic neighbor embedding (t-SNE)91. Syllable classification was confirmed by inspecting spectrograms of syllable renditions40. Syllable labeling was done without reference to neural activity.
Analysis of neural data
Many of our results involve assessing whether and how neural activity is aligned to syllable onsets. Spikes were sorted offline using custom MATLAB software (D. Aronov). In order to analyze the alignment of neural activity to syllable onsets, we calculated onset-aligned rate histograms (1 ms bins, smoothed over 20 bins). The latency of each unit was calculated as the time of the onset-aligned PSTH maximum within a region from 100 ms before syllable onset to 100 ms after syllable onset40.
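The onset-aligned histogram and latency measure described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB analysis code: the function names are hypothetical, and the smoothing kernel is assumed to be a flat (boxcar) window of 20 bins, which the text does not specify.

```python
def onset_aligned_psth(spike_times_ms, onset_times_ms,
                       half_window_ms=100, smooth_bins=20):
    """Onset-aligned firing rate in 1 ms bins, boxcar-smoothed (assumed kernel).

    Returns (lags_ms, rate_hz), where lags_ms[k] is the left edge of bin k
    relative to syllable onset.
    """
    n_bins = 2 * half_window_ms
    counts = [0] * n_bins
    for onset in onset_times_ms:
        for t in spike_times_ms:
            lag = t - onset
            if -half_window_ms <= lag < half_window_ms:
                counts[int(lag + half_window_ms)] += 1
    # Spikes per 1 ms bin per syllable, converted to Hz.
    n_syll = max(len(onset_times_ms), 1)
    rate = [1000.0 * c / n_syll for c in counts]
    # Boxcar smoothing; the window is truncated at the edges.
    half = smooth_bins // 2
    smoothed = []
    for k in range(n_bins):
        lo, hi = max(0, k - half), min(n_bins, k + half)
        smoothed.append(sum(rate[lo:hi]) / (hi - lo))
    lags = [k - half_window_ms for k in range(n_bins)]
    return lags, smoothed


def peak_latency_ms(lags, rate):
    """Lag of the PSTH maximum (first maximum if there are ties)."""
    return lags[max(range(len(rate)), key=rate.__getitem__)]
```

With a flat kernel, a tightly locked burst produces a plateau rather than a single peak, so the reported latency is only resolved to within the smoothing window in this sketch.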
In order to assess whether a neuron exhibited a significant modulation in response to syllables, we compared its PSTH to a temporally shifted control. Control rasters were generated by circularly shifting the timing of spikes by a different random amount (uniform from −100 to 100 ms) for each row (syllable) of the raster. In order to estimate p-values, the true PSTH was compared to PSTHs calculated from random control rasters. p-values were Bonferroni corrected for the number of neurons being tested. In the singing data, the PSTH peak was compared to peaks from random control PSTHs. In the tutoring data, where latencies were clustered in a narrow window 25 ms following syllable onset, the PSTH in this window was compared to control PSTHs in this window.
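A minimal sketch of this shifted-control test is given below. It assumes a raster stored as one list of integer spike bins (1 ms, within a 200 ms window) per syllable, compares unsmoothed PSTH peaks for brevity, and uses the common (exceedances + 1)/(controls + 1) estimator; these choices and all function names are illustrative assumptions rather than the published implementation.

```python
import random


def psth_peak(raster, n_bins):
    """Maximum bin count of the PSTH summed over raster rows."""
    counts = [0] * n_bins
    for row in raster:
        for b in row:
            counts[b] += 1
    return max(counts)


def shift_control_p_value(raster, n_bins=200, max_shift=100,
                          n_controls=1000, seed=0):
    """Estimate a p-value for the PSTH peak using circularly shifted rasters."""
    rng = random.Random(seed)
    true_peak = psth_peak(raster, n_bins)
    exceed = 0
    for _ in range(n_controls):
        control = []
        for row in raster:
            # One random shift per row (syllable), applied circularly.
            shift = rng.randint(-max_shift, max_shift)
            control.append([(b + shift) % n_bins for b in row])
        if psth_peak(control, n_bins) >= true_peak:
            exceed += 1
    return (exceed + 1) / (n_controls + 1)


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)
```

For example, `bonferroni(shift_control_p_value(raster), n_neurons)` would give the corrected value for one unit, where `n_neurons` is the number of units tested.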
Population analyses combined syllable-aligned PSTHs across neurons. First, the PSTH of each neuron was normalized (mean subtracted; divided by standard deviation). Then, an average PSTH was calculated across the population. In order to assess the significance of the peak in the population response, a control set of PSTHs was generated by shifting the PSTH of each individual neuron by a different random amount (uniform between −100 and 100 ms) in time. Plots show shaded regions at the p = 0.01 level. Random controls are also used to assess the p-value of the peak population response. In several cases, the peak exceeded all 5 × 10^6 random controls; here p-values are stated as p < 2 × 10^−6.
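The normalization and averaging steps can be illustrated with a short sketch. The function names are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption, since the text does not say which was used.

```python
from statistics import mean, pstdev


def zscore(psth):
    """Subtract the mean and divide by the standard deviation."""
    m, s = mean(psth), pstdev(psth)
    if s == 0:
        raise ValueError("flat PSTH cannot be normalized")
    return [(x - m) / s for x in psth]


def population_average(psths):
    """Average of z-scored PSTHs; all PSTHs must share the same time bins."""
    normed = [zscore(p) for p in psths]
    return [mean(col) for col in zip(*normed)]
```

Normalizing first means that a neuron with a high firing rate does not dominate the population average: two neurons with the same response shape but different rates contribute equally.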
In order to analyze whether neurons were selective for particular syllable types, we performed ANOVA analyses comparing spike counts of the neuron across different syllable types. Spike counts were calculated in a window from 50 ms before syllable onset to 20 ms after syllable onset.
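As an illustration of this selectivity analysis, the sketch below counts spikes in the stated window and computes a one-way ANOVA F statistic across syllable types. It is a hedged example with hypothetical function names; converting F to a p-value requires the F distribution (for instance from a statistics library), which is omitted here.

```python
def spike_count(spike_times_ms, onset_ms, pre_ms=50, post_ms=20):
    """Number of spikes from pre_ms before onset to post_ms after onset."""
    return sum(1 for t in spike_times_ms
               if onset_ms - pre_ms <= t < onset_ms + post_ms)


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA.

    groups: one list of spike counts per syllable type.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```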
Computational model
MATLAB code used to simulate the model is available in Supplementary Code 1 in the supplementary information. Neurons were modeled with activity as a threshold-linear function of membrane potential, with membrane potential capped at 0.5 to prevent runaway. Before each syllable presentation, membrane potential was reset to 0. Activity was simulated in continuous time, using MATLAB’s ode45 (which uses a Runge–Kutta method) with a dt of 1 ms.
A network of 100 of these neurons is recurrently connected in an all-to-all manner, with Wij representing the synaptic strength from neuron j to neuron i. Self-excitation is prevented by setting Wii = 0 for all i at all times. In addition to recurrent input, each neuron receives a high dimensional feed-forward input. Input patterns are passed through an input weight matrix, with fixed synaptic weights drawn from an asymmetric distribution spanning positive and negative values. The distribution was constructed as a de-meaned log-normal distribution (standard deviation 0.25) chosen so there would be many weak inhibitory synapses and a few strong excitatory synapses. For example model weight matrices, see Supplementary Fig. 5.
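One way to draw input weights with the qualitative shape described above is sketched here. The parameterization is an assumption: samples are drawn from a log-normal with unit log-scale, the sample mean is subtracted, and the result is rescaled so that its standard deviation is 0.25. The text does not say whether 0.25 refers to the final weights or to the underlying normal, so this should be read as one plausible interpretation, with a hypothetical function name.

```python
import random
from statistics import mean, pstdev


def demeaned_lognormal_weights(n_out=100, n_in=100, target_sd=0.25, seed=0):
    """Input weight matrix (n_out x n_in) from a de-meaned log-normal."""
    rng = random.Random(seed)
    raw = [rng.lognormvariate(0.0, 1.0) for _ in range(n_out * n_in)]
    m = mean(raw)
    centered = [x - m for x in raw]            # remove the mean
    scale = target_sd / pstdev(centered)       # assumed rescaling to SD 0.25
    flat = [x * scale for x in centered]
    return [flat[r * n_in:(r + 1) * n_in] for r in range(n_out)]
```

Because the log-normal is right-skewed, subtracting the mean leaves most weights slightly negative and a minority strongly positive, matching the description of many weak inhibitory and a few strong excitatory synapses.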
There are two types of input to the network: first, a syllable-specific auditory input that is only active during the tutoring context, and second, a syllable onset signal that is active during both tutoring and singing. The first type of inputs were random (uniformly distributed between 0 and 1) sparse patterns in 100-dimensional space. Sparsity was achieved by forcing some proportion (80% unless otherwise specified) of the input elements to be 0. The second type of inputs were the same on every syllable type, and meant to represent a non-specific syllable onset signal to NIf. Each pattern is presented for 30 ms (30 timesteps), followed by 70 ms of silence. These patterns are presented sequentially for 20 cycles during a tutoring phase, during which the recurrent weights are updated. Then, during the singing phase, the full input patterns are replaced with the onset signal alone, repeated for 20 cycles. No learning occurs during the singing phase.
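The input patterns and presentation schedule described above can be sketched as follows. Function names are hypothetical, and the schedule is represented simply as a list with one entry per 1 ms timestep (the index of the pattern being presented, or None during silence).

```python
import random


def make_sparse_pattern(dim=100, zero_fraction=0.8, rng=None):
    """Uniform(0, 1) pattern with a fixed proportion of elements set to 0."""
    rng = rng or random.Random()
    pattern = [rng.random() for _ in range(dim)]
    for idx in rng.sample(range(dim), round(dim * zero_fraction)):
        pattern[idx] = 0.0
    return pattern


def presentation_schedule(n_patterns, n_cycles=20, on_ms=30, off_ms=70):
    """Pattern index (or None for silence) at each 1 ms timestep."""
    schedule = []
    for _ in range(n_cycles):
        for p in range(n_patterns):
            schedule.extend([p] * on_ms)       # pattern on for 30 ms
            schedule.extend([None] * off_ms)   # then 70 ms of silence
    return schedule
```

During the singing phase, the same schedule would apply with the syllable-specific pattern replaced by the shared onset signal.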
During the tutoring stage, these recurrent weights change according to a Hopfield-like learning rule, where Wij is incremented by some Δ if neurons i and j are both active, and Wij is decremented by the same Δ if only one of neurons i and j is active. If both neurons are inactive, the synapse strength does not change. Each weight Wij is capped to lie between −1 and +1. For the anti-Hebbian learning, Δ = 0.05, and for the Hopfield-like stage, Δ = 0.01. Synapses are updated every time step.
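A single time step of the Hopfield-like rule can be sketched as below. The sketch assumes that "active" is supplied as a boolean per neuron (for example, activity above zero, which is our assumption) and that the diagonal is left at zero, consistent with the absence of self-excitation; the function name is hypothetical.

```python
def hopfield_like_update(W, active, delta=0.01, w_min=-1.0, w_max=1.0):
    """Apply one step of the Hopfield-like rule to W in place.

    W[i][j] is the weight from neuron j to neuron i; active[i] is a bool.
    """
    n = len(active)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-excitation: W[i][i] stays at 0
            if active[i] and active[j]:
                W[i][j] += delta          # both active: strengthen
            elif active[i] or active[j]:
                W[i][j] -= delta          # exactly one active: weaken
            # both inactive: unchanged
            W[i][j] = max(w_min, min(w_max, W[i][j]))
    return W
```

Repeated co-activation drives weights within an ensemble toward +1 and weights between an ensemble and the remaining neurons toward −1, where the cap stops further change.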
During the first presentation of each syllable during tutoring, the recurrent weights undergo a phase of anti-Hebbian learning. That is, synapses between two co-active neurons are made more negative by an amount proportional to the activity in each neuron. Specifically, $\Delta W = -\eta_{AH}\, Y^{+} (Y^{+})^{T}$, where $\eta_{AH}$ is the anti-Hebbian learning rate (set to 0.05 unless stated otherwise), $Y^{+}$ is the positive part of the membrane potential and $(Y^{+})^{T}$ is the transpose of the positive part of the membrane potential. This is simply an outer product of the positive part of the membrane potential with itself, weighted by a learning rate. After each syllable is presented once, this anti-Hebbian learning ceases and is replaced by the Hopfield-like learning rule described above.
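The anti-Hebbian step, as described in words above (an outer product of the positive part of the membrane potential with itself, scaled by the learning rate and subtracted from the weights), can be sketched as follows. Keeping the diagonal at zero and clipping to [−1, 1] are carried over from the other parts of the model description as assumptions; the function name is hypothetical.

```python
def anti_hebbian_update(W, Y, eta=0.05, w_min=-1.0, w_max=1.0):
    """Make weights between co-active neurons more negative, in place.

    W[i][j] is the weight from neuron j to neuron i; Y is the vector of
    membrane potentials.
    """
    y_pos = [max(y, 0.0) for y in Y]  # positive part of membrane potential
    n = len(Y)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumed: no self-connections
            new_w = W[i][j] - eta * y_pos[i] * y_pos[j]
            W[i][j] = max(w_min, min(w_max, new_w))
    return W
```

Neurons that are driven together by the first presentation of a syllable thus come to inhibit one another, which tends to decorrelate the responses to different syllables before the Hopfield-like rule takes over.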
The network dynamics proceed according to the equation below:
[Equation 1]
Here, Y(t) is a vector containing the membrane potential for each neuron, Y+(t) is the positive part of Y(t) (i.e., it is Y(t) where Y(t) is positive and 0 otherwise), τ is the membrane time constant (equal to 10 ms in all simulations), W+ and W− are the positive and negative parts of the recurrent weight matrix, B(t) is a vector containing the inputs to the network, and WB is the input weight matrix. A(t) is the neural activity, which is a capped, threshold-linear function of Y(t). α is the intracellular adaptation, which evolves (with time constant τi = 125 ms and steady-state coefficient ϵ = 10) according to
[Equation 2]
Finally, Σ is a vector containing the normalization constant for each neuron. Each neuron is scaled by the average amount of feedforward input that it receives; accordingly, Σ is calculated from the input weight matrix and the input patterns Bi, where Bi is the input pattern for syllable i, and K is the number of syllables. The main function of this term is to generate clusters of relatively uniform sizes, and it also contributes to decorrelating different input patterns.
Most simulations were done with 100 neurons, but we wanted to check whether performance was similar for different network sizes. We found that the number of necessary training episodes is consistent across different network sizes (100, 500, and 1000 neurons, the order of magnitude of the total number of neurons in NIf, estimated from ref. 65). The network learns quickly, from few training examples, consistent with birds’ ability to learn from very limited exposure to a tutor song.
t-SNE embedding plots
Plots for Fig. 7 and the supplemental movies were created by performing t-SNE on the NIf model’s recurrent weight matrix. To maximize similarity between embeddings across time, each run of t-SNE was given as initial conditions the embedding of the weight matrix during singing. MATLAB’s tsne function was used. The only non-default parameter was exaggeration, which was set to 2.5.
The movie of the running network was generated using the dynamic t-SNE algorithm from ref. 92, in order to preserve consistency of t-SNE embeddings across different frames.
HVC model
The HVC model was identical to that used in ref. 40, where the methods are described in full. In the section where rhythmic vs. non-rhythmic stimulation was compared, the model was driven with two different stimulus types. The rhythmic stimulation involved driving the model with trials composed of four pulses separated by 10 timesteps (100 ms). The non-rhythmic stimulation involved driving the model with a single pulse. Inter-trial intervals were distributed according to a Poisson distribution with λ = 50 timesteps. The minimum inter-trial interval was 27 timesteps, and there were 7200 trials total. Trial structure was identical in rhythmic and non-rhythmic cases.
Syllable length analysis
At the end of these 7200 trials (see “HVC model” section above), the model was stimulated with a single pulse and allowed to run until activity ceased. The length of time for which the network was active following this stimulation was read out as the syllable length. This single-pulse stimulation was repeated a total of 10 times for each run of the model, so that each run of the model produced 10 syllable length readings. This was done because the random noise in the network resulted in some variability in syllable length for each model run.
Combined NIf/HVC model
The combined model was constructed by driving the HVC model (described above and in ref. 40) with inputs defined by the outputs of the NIf model (described above). To convert NIf model outputs into HVC model inputs, the following steps were taken. First, the NIf network was configured to learn two patterns (this is achieved by simply changing a parameter corresponding to the number of patterns to learn). This resulted in “singing phase” activity of the NIf network corresponding to two patterns. Then, since one timestep in the HVC model corresponds to 10 timesteps in the NIf model, this NIf output was downsampled across time by a factor of 10. Additionally, the NIf network (typically 100 neurons) is meant to drive a population of seed neurons (typically 10) in the HVC model. So, we also downsampled across neurons by a factor of 10. Finally, since the HVC model is driven by onsets, we filtered the NIf network output so that only activity onsets, rather than the entire activity, were non-zero.
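The conversion steps listed above can be sketched as follows. The sketch assumes that downsampling is done by averaging non-overlapping blocks of 10 (in time and across neurons) and that an onset is a transition from zero to positive activity; neither detail is specified in the text, and the function names are hypothetical.

```python
def block_mean(values, factor):
    """Average non-overlapping blocks of `factor` consecutive values."""
    return [sum(values[k:k + factor]) / factor
            for k in range(0, len(values) - factor + 1, factor)]


def nif_to_hvc_input(activity, time_factor=10, neuron_factor=10):
    """Convert NIf activity (neurons x timesteps) to HVC seed-neuron input."""
    # 1. Downsample across time (10 NIf timesteps per HVC timestep).
    down_t = [block_mean(row, time_factor) for row in activity]
    # 2. Downsample across neurons (groups of 10 NIf neurons per seed neuron).
    seeds = []
    for g in range(len(down_t) // neuron_factor):
        group = down_t[g * neuron_factor:(g + 1) * neuron_factor]
        seeds.append([sum(col) / neuron_factor for col in zip(*group)])
    # 3. Keep only onsets: non-zero where activity rises from zero.
    onsets = []
    for row in seeds:
        out = [0.0] * len(row)
        for t, x in enumerate(row):
            prev = row[t - 1] if t > 0 else 0.0
            if x > 0 and prev == 0:
                out[t] = x
        onsets.append(out)
    return onsets
```

For a 100-neuron NIf network simulated for 100 ms, this returns a 10 × 10 array in which each seed neuron receives a single non-zero entry at the start of each burst of its NIf group.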
In cases where the HVC model was stimulated with differing rhythms (i.e., every 5 or 15 timesteps), as in Fig. 8b, artificial stimulation of NIf-to-HVC neurons62 was simulated by directly driving the HVC model with the frequencies described.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
This work was supported by a grant from the Simons Collaboration for the Global Brain, the National Institutes of Health (NIH) [grant number R01 DC009183] and the G. Harold & Leila Y. Mathers Charitable Foundation. E.L.M. received support through the NDSEG Fellowship program. Thanks to Andrew Bahle, Galen Lynch, Nader Nikbakht, Hannah Wirtshafter, and Leenoy Meshulam for comments on the manuscript.
Author contributions
Conceptualization, E.L.M., M.T.L.H., and M.S.F.; Methodology, E.L.M., M.T.L.H., and M.S.F.; Software, E.L.M., and M.T.L.H.; Formal analysis, E.L.M., M.T.L.H., and M.S.F.; Investigation, E.L.M.; Writing—original draft, E.L.M., M.T.L.H., and M.S.F.; Writing—review & editing, E.L.M., M.T.L.H., and M.S.F.; Supervision, M.S.F.; Funding acquisition, M.S.F.
Data availability
Data (raw electrophysiology and audio data, metadata, and annotations), as well as MATLAB code to generate data figures, are publicly available on the CRCNS data sharing platform (10.6080/K0GQ6W0K)93.
Code availability
Analysis code and code to generate the data figures are posted alongside the raw data on the CRCNS data sharing platform (10.6080/K0GQ6W0K)93. MATLAB code to run the model and generate figures related to the model is provided as supplementary material.
Competing interests
The authors declare no competing interests.
Footnotes
Peer review information: Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Supplementary information
Supplementary information is available for this paper at 10.1038/s41467-020-18732-x.
References
- 1.Graybiel AM. The basal ganglia and chunking of action repertoires. Neurobiol. Learn. Mem. 1998;70:119–136. doi: 10.1006/nlme.1998.3843. [DOI] [PubMed] [Google Scholar]
- 2.Nachev P, Kennard C, Husain M. Functional role of the supplementary and pre-supplementary motor areas. Nat. Rev. Neurosci. 2008;9:856–69. doi: 10.1038/nrn2478. [DOI] [PubMed] [Google Scholar]
- 3.Smith KS, Graybiel AM. Investigating habits: strategies, technologies and models. Front. Behav. Neurosci. 2014;8:39. doi: 10.3389/fnbeh.2014.00039. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Tchernichovski O, Mitra PP, Lints T, Nottebohm F. Dynamics of the vocal imitation process: how a zebra finch learns its song. Science. 2001;291:2564–2569. doi: 10.1126/science.1058522. [DOI] [PubMed] [Google Scholar]
- 5.Zann, R. A. The Zebra Finch: A Synthesis of Field and Laboratory Studies (Oxford University Press, 1996).
- 6.Deshpande, M., Pirlepesov, F. & Lints, T. Rapid encoding of an internal model for imitative learning. Proc. R. Soc. B281, 20132630 (2014). [DOI] [PMC free article] [PubMed]
- 7.Doupe AJ, Kuhl PK. Birdsong and human speech: common themes and mechanisms. Annu. Rev. Neurosci. 1999;22:567–631. doi: 10.1146/annurev.neuro.22.1.567. [DOI] [PubMed] [Google Scholar]
- 8.Petkov CI, Jarvis ED. Birds, primates, and spoken language origins: behavioral phenotypes and neurobiological substrates. Front. Evol. Neurosci. 2012;4:12. doi: 10.3389/fnevo.2012.00012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Teramitsu I, Kudo LC, London SE, Geschwind DH, White SA. Parallel FoxP1 and FoxP2 expression in songbird and human brain predicts functional interaction. J. Neurosci. 2004;24:3152–63. doi: 10.1523/JNEUROSCI.5589-03.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.White SA. Learning to communicate. Curr. Opin. Neurobiol. 2001;11:510–520. doi: 10.1016/S0959-4388(00)00242-7. [DOI] [PubMed] [Google Scholar]
- 11.Dugas-Ford J, Rowell JJ, Ragsdale CW. Cell-type homologies and the origins of the neocortex. Proc. Natl. Acad. Sci. USA. 2012;109:16974–9. doi: 10.1073/pnas.1204773109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Pfenning AR, et al. Convergent transcriptional specializations in the brains of humans and song-learning birds. Science. 2014;346:1256846. doi: 10.1126/science.1256846. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Jarvis ED, et al. Avian brains and a new understanding of vertebrate brain evolution. Nat. Rev. Neurosci. 2005;6:151–159. doi: 10.1038/nrn1606. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Karten HJ. Homology and evolutionary origins of the neocortex. Brain Behav. Evol. 1991;38:264–72. doi: 10.1159/000114393. [DOI] [PubMed] [Google Scholar]
- 15.Karten HJ. Neocortical evolution: neuronal circuits arise independently of lamination. Curr. Biol. 2013;23:R12–R15. doi: 10.1016/j.cub.2012.11.013. [DOI] [PubMed] [Google Scholar]
- 16.Reiner A, et al. Revised nomenclature for avian telencephalon and some related brainstem nuclei. J. Comp. Neurol. 2004;473:377–414. doi: 10.1002/cne.20118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Wang Y, Brzozowska-Prechtl A, Karten HJ. Laminar and columnar auditory cortex in avian brain. Proc. Natl Acad. Sci. USA. 2010;107:12676–12681. doi: 10.1073/pnas.1006645107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Fee MS, Goldberg JH. A hypothesis for basal ganglia-dependent reinforcement learning in the songbird. Neuroscience. 2011;198:152–70. doi: 10.1016/j.neuroscience.2011.09.069. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Cynx J. Experimental determination of a unit of song production in the zebra finch (Taeniopygia guttata) J. Comp. Psychol. 1990;104:3–10. doi: 10.1037/0735-7036.104.1.3. [DOI] [PubMed] [Google Scholar]
- 20.Kao MH, Brainard MS. Lesions of an avian basal ganglia circuit prevent context-dependent changes to song variability. J. Neurophysiol. 2006;96:1441–1455. doi: 10.1152/jn.01138.2005. [DOI] [PubMed] [Google Scholar]
- 21.Ölveczky BP, Andalman AS, Fee MS. Vocal experimentation in the juvenile songbird requires a basal ganglia circuit. PLoS Biol. 2005;3:e153. doi: 10.1371/journal.pbio.0030153. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Ölveczky BP, Otchy TM, Goldberg JH, Aronov D, Fee MS. Changes in the neural control of a complex motor sequence during learning. J. Neurophysiol. 2011;106:386–397. doi: 10.1152/jn.00018.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Stepanek L, Doupe AJ. Activity in a cortical-basal ganglia circuit for song is required for social context-dependent vocal variability. J. Neurophysiol. 2010;104:2474–2486. doi: 10.1152/jn.00977.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Keller GB, Hahnloser RHR. Neural processing of auditory feedback during vocal practice in a songbird. Nature. 2009;457:187–190. doi: 10.1038/nature07467. [DOI] [PubMed] [Google Scholar]
- 25.Mandelblat-Cerf, Y., Las, L., Denisenko, N. & Fee, M. S. A role for descending auditory cortical projections in songbird vocal learning. eLife3, e02152 (2014). [DOI] [PMC free article] [PubMed]
- 26.Gadagkar V, et al. Dopamine neurons encode performance error in singing birds. Science. 2016;354:1278–1282. doi: 10.1126/science.aah6837. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Chen, R. et al. Songbird ventral basal ganglia sends performance error signals to dopaminergic midbrain. Neuron103, 266–276.e4 (2019). [DOI] [PMC free article] [PubMed]
- 28.Andalman AS, Fee MS. A basal ganglia-forebrain circuit in the songbird biases motor output to avoid vocal errors. Proc. Natl Acad. Sci. USA. 2009;106:12518–12523. doi: 10.1073/pnas.0903214106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Warren TL, Tumer EC, Charlesworth JD, Brainard MS. Mechanisms and time course of vocal learning and consolidation in the adult songbird. J. Neurophysiol. 2011;106:1806–1821. doi: 10.1152/jn.00311.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Tesileanu, T., Ölveczky, B. & Balasubramanian, V. Rules and mechanisms for efficient two-stage learning in neural circuits. eLife6, e20944 (2017). [DOI] [PMC free article] [PubMed]
- 31.Vu ET, Mazurek ME, Kuo YC. Identification of a forebrain motor programming network for the learned song of zebra finches. J. Neurosci. 1994;14:6924–6934. doi: 10.1523/JNEUROSCI.14-11-06924.1994. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Yu AC, Margoliash D. Temporal hierarchical control of singing in birds. Science. 1996;273:1871–1875. doi: 10.1126/science.273.5283.1871. [DOI] [PubMed] [Google Scholar]
- 33.Fee MS, Kozhevnikov AA, RichardHahnloser HR. Neural mechanisms of vocal sequence generation in the songbird. Ann. N. Y. Acad. Sci. 2004;1016:153–170. doi: 10.1196/annals.1298.022. [DOI] [PubMed] [Google Scholar]
- 34.Srivastava KH, et al. Motor control by precisely timed spike patterns. Proc. Natl Acad. Sci. USA. 2017;114:1171–1176. doi: 10.1073/pnas.1611734114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Hahnloser RHR, Kozhevnikov AA, Fee MS. An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature. 2002;419:65–70. doi: 10.1038/nature00974. [DOI] [PubMed] [Google Scholar]
- 36.Long MA, Fee MS. Using temperature to analyse temporal dynamics in the songbird motor pathway. Nature. 2008;456:189–194. doi: 10.1038/nature07448. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Long MA, Jin DZ, Fee MS. Support for a synaptic chain model of neuronal sequence generation. Nature. 2010;468:394–399. doi: 10.1038/nature09514. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Picardo MA, et al. Population-level representation of a temporal sequence underlying song production in the zebra finch. Neuron. 2016;90:866–876. doi: 10.1016/j.neuron.2016.02.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Lynch GF, Okubo TS, Hanuschkin A, Hahnloser RHR, Fee MS. Rhythmic continuous-time coding in the songbird analog of vocal motor cortex. Neuron. 2016;90:877–892. doi: 10.1016/j.neuron.2016.04.021. [DOI] [PubMed] [Google Scholar]
- 40.Okubo TS, Mackevicius EL, Payne HL, Lynch GF, Fee MS. Growth and splitting of neural sequences in songbird vocal development. Nature. 2015;528:352–357. doi: 10.1038/nature15741. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Jun JK, Jin DZ. Development of neural circuitry for precise temporal sequences through spontaneous activity, axon remodeling, and synaptic plasticity. PLoS ONE. 2007;2:e723. doi: 10.1371/journal.pone.0000723. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Mackevicius EL, Fee MS. Building a state space for song learning. Curr. Opin. Neurobiol. 2018;49:59–68. doi: 10.1016/j.conb.2017.12.001. [DOI] [PubMed] [Google Scholar]
- 43.London SE, Clayton DF. Functional identification of sensory mechanisms required for developmental song learning. Nat. Neurosci. 2008;11:579–586. doi: 10.1038/nn.2103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Yanagihara S, Yazaki-Sugiyama Y. Auditory experience-dependent cortical circuit shaping for memory formation in bird song learning. Nat. Commun. 2016;7:11946. doi: 10.1038/ncomms11946. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Shank SS, Margoliash D. Sleep and sensorimotor integration during early vocal learning in a songbird. Nature. 2009;458:73–77. doi: 10.1038/nature07615. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Roberts TF, Tschida KA, Klein ME, Mooney R. Rapid spine stabilization and synaptic enhancement at the onset of behavioural learning. Nature. 2010;463:948–952. doi: 10.1038/nature08759. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Hamaguchi K, Tschida KA, Yoon I, Donald BR, Mooney R. Auditory synapses to song premotor neurons are gated off during vocalization in zebra finches. eLife. 2014;3:e01833. doi: 10.7554/eLife.01833. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Vallentin D, Long MA. Motor origin of precise synaptic inputs onto forebrain neurons driving a skilled behavior. J. Neurosci. 2015;35:299–307. doi: 10.1523/JNEUROSCI.3698-14.2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Akutagawa E, Konishi M. New brain pathways found in the vocal control system of a songbird. J. Comp. Neurol. 2010;518:3086–3100. doi: 10.1002/cne.22383. [DOI] [PubMed] [Google Scholar]
- 50.Bauer EE, Coleman MJ, Roberts TF, Roy A, Prather JF. A synaptic basis for auditory–vocal integration in the songbird. J. Neurosci. 2008;28:1509–22. doi: 10.1523/JNEUROSCI.3838-07.2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Cardin JA, Schmidt MF. Auditory responses in multiple sensorimotor song system nuclei are co-modulated by behavioral state. J. Neurophysiol. 2004;91:2148–2163. doi: 10.1152/jn.00918.2003. [DOI] [PubMed] [Google Scholar]
- 52.Cardin JA, Schmidt MF. Noradrenergic inputs mediate state dependence of auditory responses in the avian song system. J. Neurosci. 2004;24:7745–53. doi: 10.1523/JNEUROSCI.1951-04.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Lewandowski B, Vyssotski A, Hahnloser RHR, Schmidt M. At the interface of the auditory and vocal motor systems: NIf and its role in vocal processing, production and learning. J. Physiol. 2013;107:178–92. doi: 10.1016/j.jphysparis.2013.04.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Calabrese A, Woolley SMN. Coding principles of the canonical cortical microcircuit in the avian brain. Proc. Natl Acad. Sci. USA. 2015;112:3517–3522. doi: 10.1073/pnas.1408545112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Hickok G, Buchsbaum B, Humphries C, Muftuler T. Auditory–motor interaction revealed by fMRI: speech, music, and working memory in area spt. J. Cogn. Neurosci. 2003;15:673–682. doi: 10.1162/089892903322307393. [DOI] [PubMed] [Google Scholar]
- 56.Hamilton LS, Edwards E, Chang EF. A spatial map of onset and sustained responses to speech in the human superior temporal gyrus. Curr. Biol. 2018;28:1860–1871. doi: 10.1016/j.cub.2018.04.033. [DOI] [PubMed] [Google Scholar]
- 57.Otchy TM, et al. Acute off-target effects of neural circuit manipulations. Nature. 2015;528:358–363. doi: 10.1038/nature16442. [DOI] [PubMed] [Google Scholar]
- 58.Naie K, Hahnloser HR. Regulation of learned vocal behavior by an auditory motor cortical nucleus in juvenile zebra finches. J. Neurophysiol. 2011;106:291–300. doi: 10.1152/jn.01035.2010. [DOI] [PubMed] [Google Scholar]
- 59.Roberts TF, Gobes SM, Murugan M, Ölveczky BP, Mooney R. Motor circuits are required to encode a sensory model for imitative learning. Nat. Neurosci. 2012;15:1454–1459. doi: 10.1038/nn.3206. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Vyssotski AL, Stepien AE, Keller GB, Hahnloser RHR. A neural code that is isometric to vocal output and correlates with its sensory consequences. PLoS Biol. 2016;14:e2000317. doi: 10.1371/journal.pbio.2000317. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Hosino T, Okanoya K. Lesion of a higher-order song nucleus disrupts phrase level complexity in Bengalese finches. Neuroreport. 2000;11:2091–2095. doi: 10.1097/00001756-200007140-00007. [DOI] [PubMed] [Google Scholar]
- 62.Zhao W, Garcia-Oscos F, Dinh D, Roberts TF. Inception of memories that guide vocal learning in the songbird. Science. 2019;366:83–89. doi: 10.1126/science.aaw4226. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Basista MJ, et al. Independent premotor encoding of the sequence and structure of birdsong in avian cortex. J. Neurosci. 2014;34:16821–16834. doi: 10.1523/JNEUROSCI.1940-14.2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Galvis D, Wu W, Hyson RL, Johnson F, Bertram R. A distributed neural network model for the distinct roles of medial and lateral HVC in zebra finch song production. J. Neurophysiol. 2017;118:677–692. doi: 10.1152/jn.00917.2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Elliott KC, Wu W, Bertram R, Hyson RL, Johnson F. Orthogonal topography in the parallel input architecture of songbird HVC. J. Comp. Neurol. 2017;525:2133–2151. doi: 10.1002/cne.24189. [DOI] [PubMed] [Google Scholar]
- 66.Giret N, Kornfeld J, Ganguli S, Hahnloser HR. Evidence for a causal inverse model in an avian cortico-basal ganglia circuit. Proc. Natl Acad. Sci. USA. 2014;111:6063–8. doi: 10.1073/pnas.1317087111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Pehlevan C, Sengupta AM, Chklovskii DB. Why do similarity matching objectives lead to hebbian/anti-hebbian networks? Neural Comput. 2018;30:84–124. doi: 10.1162/neco_a_01018. [DOI] [PubMed] [Google Scholar]
- 68.Litwin-Kumar, A. & Doiron, B. Formation and maintenance of neuronal assemblies through synaptic plasticity. Nat. Commun.5, 5319 (2014). [DOI] [PubMed]
- 69.Aronov D, Andalman AS, Fee MS. A specialized forebrain circuit for vocal babbling in the juvenile songbird. Science. 2008;320:630–4. doi: 10.1126/science.1155140. [DOI] [PubMed] [Google Scholar]
- 70.Ashmore RC, Bourjaily M, Schmidt MF. Hemispheric coordination is necessary for song production in adult birds: implications for a dual role for forebrain nuclei in vocal motor control. J. Neurophysiol. 2008;99:373–385. doi: 10.1152/jn.00830.2007. [DOI] [PubMed] [Google Scholar]
- 71.Schmidt MF. Pattern of interhemispheric synchronization in hvc during singing correlates with key transitions in the song pattern. J. Neurophysiol. 2003;90:3931–3949. doi: 10.1152/jn.00003.2003. [DOI] [PubMed] [Google Scholar]
- 72.Vu ET, Schmidt MF, Mazurek ME. Interhemispheric coordination of premotor neural activity during singing in adult zebra finches. J. Neurosci. 1998;18:9088–9098. doi: 10.1523/JNEUROSCI.18-21-09088.1998. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Schmidt MF, McLean J, Goller F. Breathing and vocal control: the respiratory system as both a driver and a target of telencephalic vocal motor circuits in songbirds. Exp. Physiol. 2012;97:455–461. doi: 10.1113/expphysiol.2011.058669. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Veit L, Aronov D, Fee MS. Learning to breathe and sing: development of respiratory-vocal coordination in young songbirds. J. Neurophysiol. 2011;106:1747–65. doi: 10.1152/jn.00247.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Fee, M. S. & Long, M. A. 18. Neural mechanisms underlying the generation of birdsong: a modular sequential behavior. In Birdsong, Speech, and Language: Exploring the Evolution of Mind and Brain (eds Bolhuis, J. J. & Everaert, M.) 353 (MIT Press, 2013).
- 76. Andalman AS, Foerster JN, Fee MS. Control of vocal and respiratory patterns in birdsong: dissection of forebrain and brainstem mechanisms using temperature. PLoS ONE. 2011;6:e25461. doi: 10.1371/journal.pone.0025461.
- 77. Hamaguchi K, Tanaka M, Mooney R. A distributed recurrent network contributes to temporally precise vocalizations. Neuron. 2016;91:680–693. doi: 10.1016/j.neuron.2016.06.019.
- 78. Danish HH, Aronov D, Fee MS. Rhythmic syllable-related activity in a songbird motor thalamic nucleus necessary for learned vocalizations. PLoS ONE. 2017;12:e0169568. doi: 10.1371/journal.pone.0169568.
- 79. Hamaguchi K, Mooney R. Recurrent interactions between the input and output of a songbird cortico-basal ganglia pathway are implicated in vocal sequence variability. J. Neurosci. 2012;32:11671–11687. doi: 10.1523/JNEUROSCI.1666-12.2012.
- 80. Földiák P. Forming sparse representations by local anti-Hebbian learning. Biol. Cybern. 1990;64:165–170. doi: 10.1007/BF02331346.
- 81. Kleinfeld D, Sompolinsky H. Associative neural network model for the generation of temporal patterns. Theory and application to central pattern generators. Biophys. J. 1988;54:1039–1051. doi: 10.1016/S0006-3495(88)83041-8.
- 82. Mandelblat-Cerf Y, Fee MS. An automated procedure for evaluating song imitation. PLoS ONE. 2014;9:e96484. doi: 10.1371/journal.pone.0096484.
- 83. Mets DG, Brainard MS. An automated approach to the quantitation of vocalizations and vocal learning in the songbird. PLoS Comput. Biol. 2018;14:e1006437. doi: 10.1371/journal.pcbi.1006437.
- 84. Prather JF, Peters S, Nowicki S, Mooney R. Precise auditory–vocal mirroring in neurons for learned vocal communication. Nature. 2008;451:305–310. doi: 10.1038/nature06492.
- 85. Rizzolatti G, Craighero L. The mirror-neuron system. Annu. Rev. Neurosci. 2004;27:169–192. doi: 10.1146/annurev.neuro.27.070203.144230.
- 86. Fee MS, Leonardo A. Miniature motorized microdrive and commutator system for chronic neural recording in small animals. J. Neurosci. Methods. 2001;112:83–94. doi: 10.1016/S0165-0270(01)00426-5.
- 87. Okubo TS, Mackevicius EL, Fee MS. In vivo recording of single-unit activity during singing in zebra finches. Cold Spring Harb. Protoc. 2014;2014:1273–1283. doi: 10.1101/pdb.prot084624.
- 88. Hahnloser RHR, Fee MS. Sleep-related spike bursts in HVC are driven by the nucleus interface of the nidopallium. J. Neurophysiol. 2007;97:423–435. doi: 10.1152/jn.00547.2006.
- 89. Deny, S. et al. Learning stable representations in a changing world with on-line tSNE: proof of concept in the songbird. In International Conference on Learning Representations (ICLR) (2016). https://openreview.net/forum?id=oVgo1jRRDsrlgPMRsBzY.
- 90. Berman GJ, Choi DM, Bialek W, Shaevitz JW. Mapping the stereotyped behaviour of freely moving fruit flies. J. R. Soc. Interface. 2014;11:20140672. doi: 10.1098/rsif.2014.0672.
- 91. van der Maaten LJP, Hinton GE. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 2008;9:2579–2605.
- 92. Rauber, P. E., Falcão, A. X. & Telea, A. Visualizing time-dependent data using dynamic t-SNE. In Proc. EuroVis (2016).
- 93. Mackevicius, E. L., Happ, M. T. L. & Fee, M. S. Single unit electrophysiological recordings in nucleus interface in juvenile birds singing and listening to a tutor, including HVC-projectors. CRCNS.org, 10.6080/K0GQ6W0K (2020).
Associated Data
Data Availability Statement
Data (raw electrophysiology and audio data, metadata, and annotations), as well as MATLAB code to generate data figures, are publicly available on the CRCNS data sharing platform (doi: 10.6080/K0GQ6W0K)93.
Analysis code and code to generate the data figures are posted alongside the raw data on the CRCNS data sharing platform (doi: 10.6080/K0GQ6W0K)93. MATLAB code to run the model and generate the model-related figures is provided as supplementary material.