Proceedings of the National Academy of Sciences of the United States of America. 2014 Apr 7;111(16):6063–6068. doi: 10.1073/pnas.1317087111

Evidence for a causal inverse model in an avian cortico-basal ganglia circuit

Nicolas Giret a,b, Joergen Kornfeld a, Surya Ganguli c, Richard H R Hahnloser a,b,1
PMCID: PMC4000851  PMID: 24711417

Significance

Auditory neural responses mirror motor activity in a songbird cortical area. The average temporal offset of mirrored responses is roughly equal to short sensorimotor loop delays. This correspondence between mirroring offsets and loop delays constitutes evidence for a causal inverse model. Causal inverse models can map a desired sensation into the required action.

Keywords: lateral magnocellular nucleus of the anterior nidopallium, Hebbian learning, mirror neuron

Abstract

Learning by imitation is fundamental to both communication and social behavior and requires the conversion of complex, nonlinear sensory codes for perception into similarly complex motor codes for generating action. To understand the neural substrates underlying this conversion, we study sensorimotor transformations in songbird cortical output neurons of a basal-ganglia pathway involved in song learning. Despite the complexity of sensory and motor codes, we find a simple, temporally specific, causal correspondence between them. Sensory neural responses to song playback mirror motor-related activity recorded during singing, with a temporal offset of roughly 40 ms, in agreement with short feedback loop delays estimated using electrical and auditory stimulation. Such matching of mirroring offsets and loop delays is consistent with a recent Hebbian theory of motor learning and suggests that cortico-basal ganglia pathways could support motor control via causal inverse models that can invert the rich correspondence between motor exploration and sensory feedback.


The brain has evolved diverse strategies for combining sensory and motor signals, a prerequisite for many complex behaviors including hunting, communication, and observational learning of motor skills. For example, to accurately sense the external world while simultaneously moving within it, the nervous system must be able to detect changes in its sensory inputs that are not a predictable consequence of self-motion. Indeed, many sensory neurons respond with high sensitivity to unpredictable stimuli during motor behavior despite self-caused sensory feedback (1–5). Such remarkable sensitivity can be achieved by circuit mechanisms that counteract sensory feedback associated with self-generated motor output (6). Such mechanisms are also known as corollary discharges (7) or forward models of the motor system (8), which are synaptic mappings from motor neurons onto sensory neurons that can either predict or suppress sensory feedback from self-generated motor output.

In contrast to our understanding of how the brain cancels predictable, motor-induced sensory feedback, much less is known about neural mechanisms for computing motor codes that produce desired sensory targets. However, the ability to learn to produce desired behaviors by observing others is considered to be a key advantage of sociality and a main driver for the evolution of culture (9, 10). Learning by imitation occurs spontaneously when humans learn to speak, parrots imitate surrounding sounds (11), or songbirds learn to imitate a tutor’s song (12). However, insights into the neural implementation of sensory-guided motor learning remain sparse, largely because we lack empirical information about the principles underlying the flow of neural activity through synaptic mappings from sensory to motor areas. Such mappings are known as inverse models and flow in the opposite direction of forward models that direct motor activity to sensory areas.

To learn about principles of sensorimotor integration in the zebra finch, a vocal learner, we focus our attention on cortical premotor nuclei necessary for song production and song learning. The premotor area HVC is involved in generating the stereotyped song motifs of adult birds (13), whereas the lateral magnocellular nucleus of the anterior nidopallium (LMAN) forms the output of a basal-ganglia pathway involved in generating subtle song variability (14–17). HVC neurons produce highly stereotyped firing patterns during singing (18, 19), whereas LMAN neurons produce highly variable patterns (15, 20, 21).

To gain insights into LMAN’s role in motor control, we consider recent theoretical work that establishes a conceptual link between inverse models and vocal-auditory mirror neurons (22–24). We consider three (nonexhaustive) possibilities about the flow of sensory information into motor areas. First, auditory afferents with some feature sensitivity could map onto motor neurons involved in generating those same features (Fig. 1A). This mapping forms a causal inverse, in which a sensory target input generates a motor activity pattern required to cause, or generate, that same sensory target. Second, auditory afferents with some feature sensitivity could map onto motor neurons that typically fire after the ones involved in generating those features (Fig. 1B). This mapping forms a predictive inverse, in which a sensory input at some point in a stereotyped acoustic sequence elicits a predictive motor activity pattern required to generate the next acoustic signal in that sequence. Third, the auditory-to-motor connections could be randomly wired (Fig. 1C), in which case there would be no regularity in the relationship between sensory and motor responses.

Fig. 1.

Three hypothetical sensorimotor mappings and associated mirroring offsets. Sensory-to-motor mappings could implement a causal inverse of the motor plan (A), a predictive inverse (B), or be random (C). Under a causal inverse, generated by variable sequences of song features (ABC-CBA), a spike burst in a motor neuron (neuron 2) triggers the production (black arrow) of a song feature (feature B) after latency $\tau_m$, and the neuron receives sensory feedback (thick green arrow) from that same feature after an additional latency $\tau_a$. In such a neuron, we expect to see a cross-covariance (CC) peak (red arrow) between singing-related and playback-evoked spike bursts (black vertical bars) at a time lag (the so-called mirroring offset, red horizontal bar) given by the delay of the sensorimotor loop $\tau_m + \tau_a$. Under a predictive inverse (B), generated by stereotyped sequences of song features (ABC-ABC), the motor neuron 2 again triggers song feature B, but at the same time receives reliable feedback from the previous song feature A (thick green arrow). Thus, we expect to see a CC peak at a time lag much smaller than the sensorimotor loop delay $\tau_m + \tau_a$. Finally, under a random sensory-to-motor mapping (C), we expect no CC between the motor- and sensory-evoked firing.

Rather than directly characterize the auditory-to-motor mapping (a daunting task), we probe this mapping indirectly by studying the neural responses it causes in experiments in which we compare auditory responses elicited by playback of the bird’s own song (BOS) to motor responses recorded during production of these songs. Indeed, the three possibilities in Fig. 1 make specific, testable predictions at the level of single neurons. Consider for example a motor neuron downstream of a causal inverse model (i.e., neuron 2 in Fig. 1A). This motor neuron generates song feature B after a motor latency $\tau_m$. Now, when the bird is not singing, this motor neuron also has a sensory selectivity for song feature B (because it is downstream of a causal inverse model). Therefore, during song playback, this same motor neuron will fire after playback of song feature B with an auditory latency $\tau_a$. Thus, if one temporally aligns both the playback-evoked spike train and the singing-related spike train with the onset of song, the playback train will lag, or mirror, the singing-related train by a mirroring offset equal to the sensorimotor loop delay $\tau_m + \tau_a$. In more subtle scenarios in which the motor neuron has selectivity for multiple acoustic features, or is subject to greater degrees of noise, this temporal alignment between the motor and sensory responses of the neuron can still be detected through the position of a peak in the cross-covariance function (CC function) between the playback and singing-related spike trains, both time-aligned to song (Fig. 1A, Lower), even when the offset may not be visually apparent by simply looking at spike trains (see SI Methods and Fig. S1 for a theory of the CC function in the case of multiple latency auditory responses).

Conversely, consider a motor neuron downstream of a predictive inverse model (i.e., neuron 2 in Fig. 1B). Just as before, the neuron generates song feature B with a latency $\tau_m$. However, because it is downstream of a predictive inverse model, when the bird is not singing, this neuron now has a sensory selectivity for the previous song feature, A. This selectivity occurs because a predictive inverse model takes a sensory stimulus, in this case song feature A, and generates a motor command for the next feature, in this case song feature B. Thus, when aligned to song, the playback spike occurs $\tau_a$ after syllable A while the singing related spike occurs $\tau_m$ before song feature B. The result is that the mirroring offset will be much smaller than the total sensorimotor loop delay $\tau_m + \tau_a$, and the peak in the CC function will be much closer to zero time lag. Finally, in the random scenario in Fig. 1C, we expect no pronounced peak in the CC function.
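The two predictions above can be illustrated with a toy simulation. The following is a minimal sketch, not the authors' analysis code: it assumes 1-ms bins, circular indexing, and hypothetical values for the motor latency (`TAU_M`), the auditory latency (`TAU_A`), and the interval between consecutive song features (`FEATURE_INTERVAL`). Under these assumptions, a causal inverse places the playback spike one loop delay after the singing spike, whereas a predictive inverse places it at the loop delay minus the inter-feature interval.

```python
import random

def cross_covariance(song, playback, max_lag):
    """CC(lag) = mean over t of s(t) * p(t + lag) for mean-subtracted trains.

    Trains are lists of 0/1 spike counts in 1-ms bins. Indexing is circular
    to keep the toy example short (an assumption of this sketch).
    """
    n = len(song)
    mean_s, mean_p = sum(song) / n, sum(playback) / n
    s = [x - mean_s for x in song]
    p = [x - mean_p for x in playback]
    return {lag: sum(s[t] * p[(t + lag) % n] for t in range(n)) / n
            for lag in range(-max_lag, max_lag + 1)}

def delayed(train, delay_ms):
    """Circularly delay a binned spike train by delay_ms bins."""
    n = len(train)
    return [train[(t - delay_ms) % n] for t in range(n)]

def peak_lag(cc):
    """Lag (ms) at which the CC function is largest."""
    return max(cc, key=cc.get)

random.seed(0)
# Hypothetical singing-related spike train, 1 s long, roughly 20 spikes/s.
song_train = [1 if random.random() < 0.02 else 0 for _ in range(1000)]

TAU_M, TAU_A = 30, 10      # hypothetical motor and auditory latencies (ms)
FEATURE_INTERVAL = 35      # hypothetical interval between features A and B (ms)

# Causal inverse: the neuron responds to the feature it generates.
causal = delayed(song_train, TAU_M + TAU_A)
# Predictive inverse: the neuron responds to the preceding feature.
predictive = delayed(song_train, TAU_M + TAU_A - FEATURE_INTERVAL)

causal_offset = peak_lag(cross_covariance(song_train, causal, 100))
predictive_offset = peak_lag(cross_covariance(song_train, predictive, 100))
print(causal_offset, predictive_offset)
```

In this sketch the causal mapping yields a CC peak at the full loop delay, and the predictive mapping yields a peak much closer to zero lag.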

In theory, whether to expect a causal or predictive inverse depends on the sequence stereotypy of produced song features (23, 24). What are those features? A song feature could be a song syllable, in which case the sequence of features is stereotyped because adult zebra finches sing stereotyped syllable sequences ABC-ABC (Fig. 1B). Stereotyped song sequences are generated mainly by stereotyped firing patterns in HVC, and according to previous theory (23, 24), we expect a predictive inverse and its associated signature of small mirroring offsets, to arise in HVC.

Alternatively, a song feature could be a brief pitch increase or decrease, in which case the features and their variable sequences (ABC, DBA; Fig. 1A) are mainly generated by variable LMAN firing patterns. Accordingly, we expect to find a causal inverse upstream of LMAN (23, 24). Indeed, in HVC neurons the mirroring offset between motor-related spiking and song-playback evoked spiking is less than 10 ms (25), much less than the roughly 40-ms loop delay of HVC estimated using electrical stimulation of HVC and using auditory stimulation of the ear (26–31). By contrast, mirroring offsets in LMAN have not been quantified yet, leaving it open as to whether they provide evidence for causal inverses.

Results

We first estimated LMAN's motor latency $\tau_m$ by electrically stimulating LMAN neurons during singing using chronically implanted electrode pairs (Fig. 2A). Brief single or paired current pulses delivered at a random or a fixed time during a harmonic song syllable induced transient increases in frequency modulation (FM) of song (Fig. 2B and Fig. S2A). Transient song distortions (also of nonharmonic syllables) started in the range of 20–42 ms after stimulation onset (median latency, 30 ms; n = 3 birds; n = 8 syllables).

Fig. 2.

LMAN sensorimotor loop delay. (A) Sagittal schematic of the songbird brain. Both HVC and the LMAN project to the premotor RA. DLM, dorsal lateral nucleus of the medial thalamus; nXIIts, hypoglossal nucleus. (B) Electrical stimulation in LMAN using paired 0.2-ms current pulses of 500 µA (separated by 1 ms) during song leads to transient distortions of song syllables (brief pitch decrease, red square bracket) compared with catch trials. (Top) Log-power sound spectrograms (high and low power shown in yellow and black, respectively) of a nonstimulated (catch) syllable and a stimulated (stim) syllable. A stack plot of frequency modulation (FM; Middle) and the mean FM (Bottom) across 488 nonstimulated syllables (catch trials) and 454 stimulated syllables (Stim) reveals a transient FM increase corresponding to a brief pitch decrease (white square bracket in the sound spectrogram) roughly 20 ms (dashed red line) after stimulation onset (time origin, thick red line). (C) Log-power sound spectrogram (Top), raster plot (Middle), and mean firing rate (Bottom) of a LMAN single unit with short auditory latency of 18 ms to playback onset of the bird’s own song (n = 307 playbacks).

We estimated LMAN’s auditory latency $\tau_a$ in single neurons of quiet (possibly sleeping) birds by exposing them to BOS stimuli played in the dark. We found onset latencies of auditory responses in the range of 12–112 ms (median latency = 26 ms, n = 18/56 single or multiunit sites with clear onset responses within 120 ms of sound onset, n = 6 birds; Fig. 2C and Fig. S2B).

In combination, our estimate of the LMAN loop delay is in the range of 32–154 ms, with a median of 56 ms, similar to the loop delay estimate in HVC (26–31). The theory (SI Methods) predicts that the mirroring offset in randomly firing premotor areas such as LMAN is within the range of short loop delays, because nearby pre-post spike pairs are expected to lead to strong synaptic potentiation, thus favoring short delays (24). Hence, we expect LMAN mirroring offsets to be in the range of 32–56 ms (shortest to median loop delay).
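The loop-delay figures follow from adding the reported motor and auditory latencies. The short sketch below reproduces that arithmetic. The dictionary layout and variable names are illustrative only. Adding medians is not in general the same as taking the median of a sum, and the sketch simply mirrors the figures reported in the text.

```python
# Latencies in ms, as reported in the text.
motor_latency = {"min": 20, "median": 30, "max": 42}
auditory_latency = {"min": 12, "median": 26, "max": 112}

# Loop delay = motor latency + auditory latency, taken entry by entry.
loop_delay = {key: motor_latency[key] + auditory_latency[key]
              for key in motor_latency}

# Expected mirroring-offset window: shortest to median loop delay.
expected_window = (loop_delay["min"], loop_delay["median"])

print(loop_delay)
print(expected_window)
```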

To measure the LMAN mirroring offset, we extracellularly recorded from LMAN single and multiunits both when birds were singing and subsequently when we broadcast the produced songs in random order through a loudspeaker. Zebra finches produce songs in bouts composed of introductory notes followed by one to five repetitions of a stereotyped song motif each containing 1–10 song syllables (Fig. 3A, ii). During production of song bouts (undirected song), recorded LMAN neurons tended to fire spike bursts (interspike intervals less than 10 ms), whereas in response to subsequent song playback they fired mostly single spikes (Fig. 3A, ii and iii). Spiking rates during singing were higher than during playback (mean firing-rate ratio song to playback = 2.5, range = 0.9–18, n = 50 single and multiunit sites; Fig. S3). We aligned spike trains with the stereotyped song motif and visually confirmed the high firing variability in LMAN cells (Fig. 3A, iii).
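The burst criterion used above (interspike intervals below 10 ms) can be written as a small grouping routine. This is a minimal sketch, not the authors' code: the function name, the example spike times, and the requirement of at least two spikes per burst are assumptions.

```python
def find_bursts(spike_times_ms, max_isi_ms=10.0):
    """Group sorted spike times into bursts.

    A burst is a run of at least two spikes whose consecutive interspike
    intervals are all below max_isi_ms (the two-spike minimum is an
    assumption of this sketch).
    """
    bursts, current = [], []
    for t in spike_times_ms:
        if current and t - current[-1] < max_isi_ms:
            current.append(t)          # spike continues the current run
        else:
            if len(current) >= 2:
                bursts.append(current)  # close a finished burst
            current = [t]               # start a new run
    if len(current) >= 2:
        bursts.append(current)
    return bursts

# Hypothetical spike times (ms) containing two bursts and three isolated spikes.
spikes = [5, 8, 12, 60, 130, 133, 300]
bursts = find_bursts(spikes)
burst_fraction = sum(len(b) for b in bursts) / len(spikes)
print(bursts, burst_fraction)
```

A high fraction of spikes inside bursts would correspond to the singing-related pattern described above, and a low fraction to the single-spike responses seen during playback.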

Fig. 3.

LMAN mirroring offset. (A) Large positive mirroring offset in an LMAN single unit. (A, i) Song oscillogram (Upper) and raw extracellular voltage trace of neural activity (Lower; Inset shows a spike burst). (A, ii) Song spectrogram of an example song bout, a song motif (marked by a red horizontal bar). The spike raster plot (Lower) shows spikes generated during production of that bout (blue rasters) and during 27 playbacks of that bout (black rasters). Firing-rate curves (Lower) are plotted in corresponding colors. (A, iii) Summary showing spike rasters during production (blue) and during playback (black) of different song motifs (delimited by vertical red lines); the firing-rate curves below are averages over all motifs (not all shown). Song-evoked firing tends to lead playback-evoked firing, in particular at the end of the motif. (A, iv) The CC function (thick red curve) of motif-related spike trains peaks at a time lag of about 50 ms. (B) The average (normalized) motif CC function (red curve) peaks at a time lag near 40 ms and exceeds there a significance threshold (black curves) of +3 Jackknife SDs (n = 50 sites in seven birds). (C) A similar behavior is seen in the population-averaged (and normalized) bout CC function (n = 48 sites in seven birds, red curve). A population-averaged random shift predictor (red dotted curve) remains below the significance threshold of 3 Jackknife SDs (black curves).

CC functions between motor-related spike trains within the boundary of the stereotyped song motif and corresponding playback spike trains exhibited peaks often at a lag near 40 ms after motor-related spikes (Fig. 3A, iv and Fig. S4A). Overall, CCs averaged over the 32- to 56-ms lag interval tended to be positive (in 22/50 single or multiunit sites, the median CC was positive and different from zero, Wilcoxon signed-rank test, P < 0.05; at 4/50 sites the median CC was negative, P < 0.05). Almost always, CC functions exhibited peaks also at time lags other than 32–56 ms (e.g., at −100 ms in Fig. 3A, iv). These other peaks occurred at diverse time lags and were irrelevant for testing our 32- to 56-ms mirroring offset hypothesis because they are expected to occur in neurons that fire more than once per song motif (e.g., when intervals between caused song features are short).

We also performed a population analysis by normalizing CC functions by their Jackknife SD estimates (to de-emphasize higher firing rates at multiunit sites) and then averaged the normalized CC functions (n = 50 LMAN sites including 18 single units, n = 7 birds; Methods). The resulting population CC function (Fig. 3B) peaked at a time lag in the range of 40–60 ms (suggesting a lead of motor activity on sensory responses).
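The normalization step can be sketched as follows. The exact jackknife procedure is not spelled out in this section, so the sketch assumes the standard delete-one jackknife applied to per-trial CC functions at each lag. The function names and the toy data are hypothetical.

```python
import math

def jackknife_sd(samples):
    """Delete-one jackknife SD of the mean of a list of numbers."""
    k = len(samples)
    total = sum(samples)
    leave_one_out = [(total - x) / (k - 1) for x in samples]
    loo_mean = sum(leave_one_out) / k
    return math.sqrt((k - 1) / k *
                     sum((m - loo_mean) ** 2 for m in leave_one_out))

def normalized_cc(trial_ccs):
    """Mean CC divided by its jackknife SD, lag by lag.

    trial_ccs is a list of per-trial CC functions (equal-length lists).
    Lags with zero jackknife SD are set to 0.0 (a choice of this sketch).
    """
    result = []
    for j in range(len(trial_ccs[0])):
        column = [cc[j] for cc in trial_ccs]
        sd = jackknife_sd(column)
        result.append(sum(column) / len(column) / sd if sd > 0 else 0.0)
    return result

def population_cc(sites):
    """Average the normalized CC functions over recording sites."""
    normalized = [normalized_cc(site) for site in sites]
    return [sum(n[j] for n in normalized) / len(normalized)
            for j in range(len(normalized[0]))]

# Toy data: three trials, three lags. The second site has 10x larger
# raw CC values, as might occur at a multiunit site with higher rates.
single_unit = [[0, 1, 0], [0, 2, 0], [0, 3, 0]]
multi_unit = [[0, 10, 0], [0, 20, 0], [0, 30, 0]]

norm_single = normalized_cc(single_unit)
norm_multi = normalized_cc(multi_unit)
population = population_cc([single_unit, multi_unit])
print(norm_single, norm_multi, population)
```

Because each site is divided by its own SD estimate, the two toy sites contribute equally to the population average despite the tenfold difference in raw amplitude.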

To investigate mirroring also beyond single song motifs, we computed bout CC functions over entire song bouts (including song motifs and introductory notes). The resulting population CC function, normalized and averaged over all recording sites (n = 48 LMAN sites, n = 7 birds), also peaked near 40 ms (Fig. 3C), in agreement with the peak of the population CC function over motifs (Fig. 3B). In summary, at individual recording sites, we found a tendency for nonzero mirroring offsets. This tendency was amplified by averaging over recording sites to result in a significant average mirroring offset near 40 ms, suggesting a correspondence between sensory and motor responses.

To place the observed CC functions into perspective, we compared them to theoretical upper and lower bounds. We formed a lower bound by circularly shifting all playback-evoked spike trains by a random shift uniformly chosen within the duration of the song motif, thus eliminating any hypothetical temporal relationship between playback and motor responses. The resulting random-shift predictor (obtained after normalizing and averaging CC functions associated with randomly shifted spike trains) did not exceed the previously defined significance threshold of 3 Jackknife SDs (Fig. 3C), demonstrating that the nonzero mirroring offset we found could not simply occur by chance. We also estimated an upper bound of mirroring strength that corresponds to perfect mirroring limited only by intrinsic variability in playback-evoked responses. That is, we replaced all song-related spike trains by one trial of a corresponding playback-evoked spike train, shifting it by −40 ms, and cross-correlated it with all other spike trains evoked by the same song stimulus. If intrinsic variability were absent, all playback-evoked responses would be identical and the population CC function (which by virtue of division by the SD is a signal-to-noise estimate) would exhibit an infinitely high peak. Instead, we found an upper bound of the population CC function of 1.2, about six times above the observed peak of 0.2 in Fig. 3C. In summary, the LMAN mirroring strength of 0.2 (in units of signal to noise) at the 40-ms offset was more than five times higher than that of a random code (SD of random shift predictor) and about 17% of the upper bound corresponding to perfect mirroring.
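The lower bound can be illustrated with a toy random-shift predictor. The sketch below is not the authors' pipeline: it uses a noiseless, hypothetical spike train with a built-in 40-ms mirroring offset, 1-ms bins, circular indexing, and unnormalized CC values. Shifts are drawn from 1 to n − 1 so that the zero shift is excluded, which is a choice of this sketch.

```python
import random

def cc_at_lag(song, playback, lag):
    """Cross-covariance of two binned spike trains at a single lag (circular)."""
    n = len(song)
    mean_s, mean_p = sum(song) / n, sum(playback) / n
    return sum((song[t] - mean_s) * (playback[(t + lag) % n] - mean_p)
               for t in range(n)) / n

def circular_shift(train, shift):
    """Circularly delay a binned spike train by `shift` bins."""
    n = len(train)
    return [train[(t - shift) % n] for t in range(n)]

random.seed(1)
n_bins = 1000
OFFSET_MS = 40  # mirroring offset built into the toy data

# Hypothetical singing-related train and a perfectly mirrored playback train.
song_train = [1 if random.random() < 0.02 else 0 for _ in range(n_bins)]
playback_train = circular_shift(song_train, OFFSET_MS)

observed = cc_at_lag(song_train, playback_train, OFFSET_MS)

# Random-shift predictor: destroy the temporal relationship, then average.
n_shuffles = 20
predictor = sum(
    cc_at_lag(song_train,
              circular_shift(playback_train, random.randrange(1, n_bins)),
              OFFSET_MS)
    for _ in range(n_shuffles)
) / n_shuffles

# Ratio of the observed peak to the upper bound, using values from the text.
fraction_of_upper_bound = 0.2 / 1.2

print(observed, predictor, round(fraction_of_upper_bound, 2))
```

In this toy case the shuffled predictor falls well below the unshuffled CC value at the 40-ms lag, and the last line reproduces the roughly 17% figure quoted above.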

We also estimated the temporal offset between LMAN sensory- and motor-evoked activity by inspection of the spike-triggered average (STA) sound amplitude, a curve that reports the average sound amplitude preceding and following a spike (Fig. S4 B and C). Sound amplitudes peaked on average 29 ms after LMAN motor spikes, and during playback, they peaked on average 8 ms before LMAN spikes (n = 50 LMAN sites), suggesting that LMAN neurons respond preferentially to increases in sound amplitudes and evidencing a combined motor-playback temporal offset of about 37 ms, which is within the range of estimated loop delays and mirroring offsets.
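The STA computation and the resulting offset can be sketched on synthetic data. This is an illustration under stated assumptions, not the authors' analysis: the amplitude envelope is a set of triangular bumps, spike times are placed by hand at the reported lags, and all names are hypothetical.

```python
def spike_triggered_average(amplitude, spike_times, half_window):
    """Mean sound amplitude at each lag around spikes.

    amplitude is a list sampled at 1 ms; lags run from -half_window to
    +half_window. Spikes too close to the edges are skipped.
    """
    valid = [t for t in spike_times
             if half_window <= t < len(amplitude) - half_window]
    return {lag: sum(amplitude[t + lag] for t in valid) / len(valid)
            for lag in range(-half_window, half_window + 1)}

def peak_lag(sta):
    """Lag (ms) at which the STA is largest."""
    return max(sta, key=sta.get)

# Synthetic amplitude envelope: triangular bumps peaking at these times (ms).
peak_times = [200, 500, 800]
amplitude = [max(max(0, 20 - abs(t - c)) for c in peak_times)
             for t in range(1000)]

# Hypothetical spikes: motor spikes 29 ms before amplitude peaks,
# playback-evoked spikes 8 ms after them (values from the text).
motor_spikes = [c - 29 for c in peak_times]
playback_spikes = [c + 8 for c in peak_times]

motor_peak = peak_lag(spike_triggered_average(amplitude, motor_spikes, 60))
playback_peak = peak_lag(spike_triggered_average(amplitude, playback_spikes, 60))

# Sound peaks 29 ms after motor spikes and 8 ms before playback spikes.
combined_offset = motor_peak - playback_peak
print(motor_peak, playback_peak, combined_offset)
```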

Discussion

Taken together, our mirroring offset analysis supports the hypothesis that LMAN neurons are part of, or downstream of, a causal inverse model (Fig. 1A). The roughly 40-ms mirroring offset we found was expectedly near the short end of estimated LMAN loop delays and much larger than mirroring offsets reported in HVC (25), all in agreement with a Hebbian view of sensorimotor integration in premotor areas (23, 24).

We found LMAN mirroring to be much weaker than the reported HVC mirroring. Among the sources contributing to weak mirroring in LMAN, we identified intrinsic noise in LMAN BOS responses. Intrinsic noise by itself limited mirroring strength at the level of individual neurons to a signal-to-noise ratio of less than 1.2. Obviously such noisy LMAN auditory responses hinder our ability to precisely measure sensorimotor correspondences. To mitigate the effects of intrinsic noise and response gating as much as possible, we performed the playback sessions while birds were quietly resting in the dark. Possibly the sleep state was not very deep throughout the experiment, which may have reduced auditory responses and their correlation with singing-related activity (21, 32). Although noisy LMAN responses to BOS playback may limit our ability to perform mirroring analyses, they need not limit the bird’s ability to learn inverse models through synaptic learning rules simply because birds learn naturally using auditory feedback during singing and not BOS playback in a quiescent state. In the auditory forebrain, signal-to-noise ratios of more than 10 have been observed (4), revealing that highly sensitive song-related auditory signals are present in the forebrain and that such sensitive signals could underlie the formation of causal inverses.

A shortcoming of our experiments is that we have tested the mirroring hypothesis only on a population level, although in theory it could be tested on a single-neuron level. Namely, if the motor latency of an individual neuron is $\tau_m$ and its auditory latency is $\tau_a$, then the expected mirroring offset is $\tau_m + \tau_a$. A major obstacle to testing the single-neuron hypothesis is the difficulty of measuring motor latencies in single neurons: To do so, we would have to stimulate neurons during singing either alone or together with their cofiring neurons, which we were unable to do. Also, LMAN neurons could exhibit diverse auditory response latencies depending on both the time point in the motif at which they are stimulated and the feature composition of the auditory stimulus, as is the case for auditory responses in the primary auditory cortex analog field L (4). Thus, we imagine that a single LMAN neuron may exhibit a wide range of auditory latencies depending on the presynaptic neurons that drive its spiking at a given time. In the wake of this complexity associated with testing the mirroring hypothesis in single neurons, we tested for the existence of inverse models at a population level under the simplifying theoretical assumption that the eligibility trace for synaptic learning is a monotonically decaying function of time (nearby spike pairs lead to stronger potentiation than more widely separated spike pairs, which agrees with nearly all known spike-time dependent plasticity rules). This assumption implies that the population averaged mirroring offsets would lie within the earlier range of loop delays because synapses that mediate short-latency responses have higher eligibility and experience more potentiation than synapses that mediate long-latency responses (see SI Methods for a theoretical analysis).
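The effect of a decaying eligibility trace on the population-averaged offset can be illustrated numerically. The sketch below is only a toy, not the theory in SI Methods: it assumes an exponential eligibility trace with a hypothetical time constant `TAU_E_MS`, and it assumes loop delays spread uniformly over the 32–154 ms range reported in Results.

```python
import math

def eligibility(delay_ms, tau_ms):
    """Exponentially decaying eligibility trace (the functional form is assumed)."""
    return math.exp(-delay_ms / tau_ms)

# Candidate loop delays in ms, spread uniformly over the reported range
# (the uniform spread is an assumption of this sketch).
loop_delays = list(range(32, 155))
TAU_E_MS = 30.0  # hypothetical eligibility time constant

weights = [eligibility(d, TAU_E_MS) for d in loop_delays]

# Mean delay when every synapse counts equally versus when synapses are
# weighted by how much potentiation their delay would receive.
unweighted_mean = sum(loop_delays) / len(loop_delays)
weighted_mean = sum(w * d for w, d in zip(weights, loop_delays)) / sum(weights)

print(unweighted_mean, weighted_mean)
```

Any monotonically decaying trace pulls the weighted mean toward the short end of the delay range, which is the qualitative point made in the text. The specific value depends on the assumed time constant.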

In a sense, our theoretical prediction that mirroring offsets correspond to shorter loop delays is analogous to the immediacy effect in operant conditioning: In this effect, immediate reinforcement is more effective for modifying a response than delayed reinforcement. Indeed, birds can adapt their songs to escape from negatively reinforcing acoustic stimuli that are delivered during low-pitch renditions of their songs (33). In these operant conditioning experiments, birds were found to be unable to adapt their songs to escape negative reinforcement when the latency between pitch measurements and acoustic stimuli was 100 ms (33), revealing that birds are able to detect correlations between their songs and auditory stimuli when latencies of the latter are short but not when they are long.

The lag of 8 ms in our population analysis of STA sound amplitudes during playback (Fig. S4C) is short, even compared with the mean peak latency of 14 ms observed in spectro-temporal receptive fields (STRFs) in field L, the analog of primary auditory cortex (34). Although STRFs are not directly comparable to STAs, we conclude from the short 8-ms lag that presumably LMAN neurons are tuned to increases in sound amplitudes (such as occurring during syllable onsets) and not to peak amplitudes (such as occurring near the middle of syllables). Indeed, the average STA curve in Fig. S4C reaches its maximal slope about 25 ms before spikes, in agreement with the median lag of auditory response onsets.

In essence, our population analysis allowed us to conduct a first test of the inverse model hypothesis, but it leaves open several directions. Clearly, the range of validity of the causal inverse hypothesis remains unresolved; it remains to be investigated whether causal inverses apply to single neurons and even down to specific sound features generated by these neurons. In future work, to obtain improved latency estimates and to test the inverse model hypothesis on the single neuron level, it will be necessary to hold the signal of a single cell for a much longer time than the few minutes we were able to. There are promising new recording techniques that may provide the order of magnitude improvement required (35).

In the postulated Hebbian learning rule leading to formation of inverse models, synaptic strengthening occurs when presynaptic activity follows postsynaptic activity and not vice versa (24). This order allows sensory feedback arriving at motor neurons to be associated with past postsynaptic patterns of motor activity that could have caused this sensory feedback. Such rules were first described in mormyrid electric fish (36), where they explain the formation of negative mirror images (corollary discharges) of reafferent (self-generated) sensory input (2). In mammals, a similar rule has been found to describe synaptic connections from cortex to the basal ganglia (37). Thus, a causal inverse may reside in the efferent synapses of a cortical area upstream of the basal ganglia homolog; or, based on anatomy, a causal inverse could be connected to a dopaminergic ventral tegmental area (38, 39), thereby establishing a possible link of our findings with reinforcement learning theories (see below). Other candidate pathways for the causal inverse could lead to LMAN through HVC or through the thalamo-cortical projections from the dorsolateral nucleus of the thalamus (DLM). HVC projects to the basal ganglia loop and so indirectly to LMAN. However, inactivation of HVC in anesthetized birds had almost no effect on auditory responses of LMAN neurons (Fig. S5). Further experiments will be needed to explore these and possibly other pathways leading to LMAN, like the recently discovered LMAN shell (40) (note that we cannot rule out that some of the antidromically activated cells in our study were located in the LMAN shell rather than in LMAN because of the close proximity of LMAN and its shell, their combined efference to the arcopallium, and the uncertainty of recording sites in our experiments).

A causal inverse upstream of LMAN could be formed during a sensorimotor phase of song development and be beneficial to that development. That is, the causal inverse could be formed when young birds engage in motor explorations during the sensorimotor learning phase, in agreement with LMAN’s known role in producing song variability in this critical phase (41, 42). Once the inverse mapping is established, it could be used to select downstream motor patterns in agreement with a sensory target and therefore steer the developing song in the right direction. By contrast, it is less clear how predictive inverses leading to HVC could be involved in motor learning. However, predictive inverses may aid vocal communication, especially when fast processing is required (43), e.g., during vocal exchanges with millisecond timescale precision in pair duetting birds (44) or during counter singing episodes (25).

Causal inverse models can in theory support some forms of single-trial imitation, but there are few reports on fast learning in zebra finches [although changes in the juvenile songs are identified after 2 d of exposure to the tutor’s song (45), which motivates further investigations into this possibility]. A promising avenue into LMAN-mediated fast learning is offered by recent experiments on negative song reinforcement in which LMAN, after having been transiently prevented from contributing to vocal output, was shown after recovery to instantaneously mediate vocal escape behaviors (41), which was suggested to provide support for a motor efference copy to LMAN (41). Alternatively, we propose that such instantaneous escape behaviors arise from a causal inverse model upstream of LMAN. Our alternative explanation has the advantage that it invokes no postulated efference copy between a hypothetical premotor area and LMAN (but rather a causal inverse for which we find support). Moreover, in our scenario, LMAN can implement escape behaviors regardless of the source of variability (e.g., be it of neural or muscular origin), whereas in the efference copy scenario, escape would be confined to explorations mediated by the source of the efference copy (and by LMAN).

If reinforcement signals (e.g., dopamine) were mediated via causal inverse models instead of being released nonspecifically as assumed in many computational models (46, 47), then motor learning could be more efficient, in accordance with model-based reinforcement strategies (48). Indeed, simple reinforcement learning strategies can be enhanced with inverse models as a means to solve the structural credit assignment problem inherent in reinforcement learning (49). Our findings can thus be seen as providing neural support for efficient learning capabilities in basal ganglia pathways. Indeed many physiological and theoretical studies of learning in the basal ganglia have focused on the role of dopaminergic afferents onto striatal neurons in carrying reward prediction errors that are useful for reinforcement learning. On the other hand, the results of ref. 37 reveal that afferent cortical synapses onto striatal neurons also exhibit plasticity in which postsynaptic before presynaptic firing strengthens the synapse, a rule useful for inverse model learning, according to the theory of refs. 23 and 24. These results, when combined with evidence presented here for a causal inverse model in the cortical output area of a basal ganglia pathway, suggest that cortico-basal ganglia pathways could contain rich physiological mechanisms capable of combining inverse model and reinforcement based learning.

In general, learning to imitate any complex action poses a challenging problem for sensorimotor circuits: a sensory representation of the action must be converted into motor activity patterns capable of reproducing the action. Our approach of focusing on mirroring offsets and relating these offsets to the existence and type of inverse models may be useful not only for uncovering basal ganglia circuit mechanisms for vocal learning, but also for explaining a wide range of imitation behaviors beyond vocal learning. We speculate that the careful measurement of mirroring offsets may also provide insights into other mirror neuron systems including visual-tactile neurons in mammalian premotor cortex (50).

Overall, our work suggests that we may need to enlarge our hypothesis space for cortico-basal ganglia function by developing and testing theoretical models in which associative Hebbian plasticity and dopamine-dependent, reward-driven plasticity interact to mediate sophisticated model-based reinforcement learning strategies. At a computational level, these multiple, interacting forms of plasticity could allow organisms to combine classical reinforcement learning with control-theoretic inverse models to efficiently learn sensorimotor behaviors.

Methods

We chronically recorded from LMAN neurons in Inline graphic freely moving adult male zebra finches (>90 d after hatch) using miniature motorized microdrives. Each song bout was played back a random number of times in the dark because auditory responses in premotor areas of birds are state dependent—they tend to be gated off in the awake and aroused bird but gated on during sleep (51, 52).

We analyzed song (Inline graphic) and playback (Inline graphic) spike trains Inline graphic and Inline graphic (mean-subtracted) that we aligned to the common sonogram and that were either restricted to stereotyped song motifs or to song bouts composed of several motifs and introductory notes. We averaged CC functions Inline graphic (with Inline graphic representing motif or bout duration) over all playbacks of a given song motif/bout. We then computed the mean CC function (Fig. 3A, iv) by averaging over all song motifs/bouts. In the mirroring analysis, we excluded the data from three LMAN single units because they were suppressed by playback and did not produce sufficient spikes to be correlated with motor activity.
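The averaging procedure described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names, the binning of spike trains into equal-width time bins, and the lag convention (a peak at a positive lag means that playback-evoked spikes follow singing-related spikes) are all assumptions made for the example.

```python
import numpy as np

def cross_correlation(song, playback, max_lag):
    """CC of two aligned, binned spike trains for lags -max_lag..max_lag (in bins).

    Assumed convention (not taken from the paper):
        C[tau] = (1/T) * sum_t song[t] * playback[t + tau]
    so a peak at tau > 0 means playback spikes lag song spikes.
    """
    s = np.asarray(song, dtype=float)
    p = np.asarray(playback, dtype=float)
    if s.ndim != 1 or s.shape != p.shape:
        raise ValueError("spike trains must be 1D and aligned to the same bins")
    T = s.size
    if not 0 <= max_lag < T:
        raise ValueError("max_lag must be smaller than the number of bins")
    # Mean subtraction, as stated in the text.
    s = s - s.mean()
    p = p - p.mean()
    lags = np.arange(-max_lag, max_lag + 1)
    cc = np.empty(lags.size)
    for i, tau in enumerate(lags):
        if tau >= 0:
            cc[i] = np.dot(s[:T - tau], p[tau:]) / T
        else:
            cc[i] = np.dot(s[-tau:], p[:T + tau]) / T
    return lags, cc

def mean_cc(song, playbacks, max_lag):
    """Average the CC function over all playbacks of one song motif or bout."""
    ccs = [cross_correlation(song, p, max_lag)[1] for p in playbacks]
    return np.arange(-max_lag, max_lag + 1), np.mean(ccs, axis=0)
```

Lags are returned in bins and convert to milliseconds by multiplying with the bin width. A mean CC function over motifs or bouts would then be the average of the `mean_cc` outputs across songs.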

Note that the key signature of inverse models is the CC between spike trains in motor and precisely corresponding sensory states. In principle, one could observe a peak in CC evidencing an inverse model even if spiking during song and playback was uniformly distributed in time, i.e., even if song and playback-evoked firing rate curves in Fig. 3B, ii and iii and Fig. S4A, ii were perfectly flat. In other words, the shapes of motif-aligned firing-rate curves may indicate mirrored responses, but they cannot be used to demonstrate the absence of mirroring.
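The point can be illustrated with a small simulation. Everything in it is hypothetical: the number of renditions, the bin count, the firing probability, the 8-bin offset, and the use of a circular shift and circular CC to avoid edge effects. Spikes are drawn uniformly in time and independently on every rendition, and the response to playback of a rendition repeats that rendition's spike pattern with a delay, plus unrelated background spikes.

```python
import numpy as np

rng = np.random.default_rng(0)
# All values below are hypothetical and chosen only for illustration.
n_renditions, n_bins, offset, rate = 200, 500, 8, 0.05

# Singing: spikes uniformly distributed in time on every rendition.
song = (rng.random((n_renditions, n_bins)) < rate).astype(float)

# Playback of each rendition: same pattern delayed by `offset` bins
# (circular shift), mixed with independent background spikes.
background = rng.random((n_renditions, n_bins)) < rate
playback = np.logical_or(np.roll(song, offset, axis=1) > 0, background).astype(float)

# Rendition-averaged firing-rate curves: flat up to sampling noise.
psth_song = song.mean(axis=0)
psth_playback = playback.mean(axis=0)

def circular_cc(s, p, tau):
    """Mean over renditions and bins of s[t] * p[t + tau], mean-subtracted."""
    s0 = s - s.mean(axis=1, keepdims=True)
    p0 = p - p.mean(axis=1, keepdims=True)
    return float(np.mean(s0 * np.roll(p0, -tau, axis=1)))

lags = np.arange(-20, 21)
cc = np.array([circular_cc(song, playback, tau) for tau in lags])
peak_lag = int(lags[np.argmax(cc)])
print("CC peak at lag (bins):", peak_lag)
```

Although neither firing-rate curve carries any temporal structure, the CC between each song spike train and its matching playback response peaks at the imposed offset, which is why flat firing-rate curves do not rule out mirroring.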

When computing population-averaged CC functions, to discount firing-rate differences among recording sites, we first normalized individual CC functions by their Jackknife estimate of SD before averaging (average in the interval [−150, 150] ms; Fig. 3 B and C). We assessed significance of peaks in population-averaged CC functions (Fig. 3 B and C) by their exceeding a threshold of 3 SDs estimated using Jackknifing over recording sites (3 Jackknife SDs Inline graphic correspond roughly to P = 0.01). We also explored normalizing CC functions by the Jackknife estimate of SD in the interval [30, 50] ms, yielding qualitatively similar results.
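A rough sketch of this normalization and significance test is given below. It rests on several assumptions that the text does not settle: that the jackknife for an individual CC function resamples playback trials, that the normalizing constant is the jackknife SD averaged over lags in the chosen window, and that the threshold is applied lag by lag. The function names and the input layout are hypothetical.

```python
import numpy as np

def jackknife_sd(samples):
    """Jackknife SD of the mean along axis 0 (leave-one-out resampling)."""
    x = np.asarray(samples, dtype=float)
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two samples to jackknife")
    loo_means = (x.sum(axis=0) - x) / (n - 1)   # mean with sample i left out
    centered = loo_means - loo_means.mean(axis=0)
    return np.sqrt((n - 1) / n * np.sum(centered ** 2, axis=0))

def population_cc(site_trial_ccs, lags_ms, window=(-150.0, 150.0), n_sd=3.0):
    """Normalize each site's CC by its jackknife SD, then average over sites.

    site_trial_ccs: one (n_trials, n_lags) array per recording site
    (assumed layout). Returns the population mean, its jackknife SD over
    sites, and a boolean mask of lags exceeding n_sd jackknife SDs.
    """
    lags_ms = np.asarray(lags_ms, dtype=float)
    in_window = (lags_ms >= window[0]) & (lags_ms <= window[1])
    normalized = []
    for trial_ccs in site_trial_ccs:
        trial_ccs = np.asarray(trial_ccs, dtype=float)
        # Assumed: jackknife over trials, averaged across lags in the window.
        scale = jackknife_sd(trial_ccs)[in_window].mean()
        normalized.append(trial_ccs.mean(axis=0) / scale)
    normalized = np.vstack(normalized)
    pop_mean = normalized.mean(axis=0)
    pop_sd = jackknife_sd(normalized)           # jackknife over recording sites
    return pop_mean, pop_sd, pop_mean > n_sd * pop_sd
```

For the sample mean, the jackknife SD reduces to the usual standard error, so the normalization puts sites with different firing rates on a common noise scale before averaging. Passing `window=(30.0, 50.0)` corresponds to the alternative normalization interval mentioned above.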

All playback trials in which the bird produced a vocalization or other sound were discarded. Given the weak auditory responses in LMAN neurons, we recorded their spike responses over long periods to obtain sufficient statistics (on the order of hundreds of song playbacks within 10–20 min). Because it was difficult to maintain either good single-unit isolation or stable multiunit activity across these long periods, we averaged clear single-unit responses with stable multiunit responses in the population plots (Fig. 3 B and C).

Acknowledgments

We thank Klaus Hepp, Alexander Hanuschkin, and Walter Senn for helpful discussions and critical reading. This work was funded by Swiss National Science Foundation Grant 31003A_127024, the European Research Council (ERC) under the European Community's Seventh Framework Programme (FP7/2007-2013/ERC Grant AdG 268911), Burroughs Wellcome Foundation, Sloan Foundation, and Defense Advanced Research Projects Agency.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1317087111/-/DCSupplemental.

References

1. Ahrens MB, et al. Brain-wide neuronal dynamics during motor adaptation in zebrafish. Nature. 2012;485(7399):471–477. doi: 10.1038/nature11057.
2. Bell CC. An efference copy which is modified by reafferent input. Science. 1981;214(4519):450–453. doi: 10.1126/science.7291985.
3. Eliades SJ, Wang X. Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature. 2008;453(7198):1102–1106. doi: 10.1038/nature06910.
4. Keller GB, Hahnloser RHR. Neural processing of auditory feedback during vocal practice in a songbird. Nature. 2009;457(7226):187–190. doi: 10.1038/nature07467.
5. Poulet JFA, Hedwig B. A corollary discharge maintains auditory sensitivity during sound production. Nature. 2002;418(6900):872–876. doi: 10.1038/nature00919.
6. Poulet JFA, Hedwig B. The cellular basis of a corollary discharge. Science. 2006;311(5760):518–522. doi: 10.1126/science.1120847.
7. Crapse TB, Sommer MA. Corollary discharge circuits in the primate brain. Curr Opin Neurobiol. 2008;18(6):552–557. doi: 10.1016/j.conb.2008.09.017.
8. Webb B. Neural mechanisms for prediction: Do insects have forward models? Trends Neurosci. 2004;27(5):278–282. doi: 10.1016/j.tins.2004.03.004.
9. Gergely G, Csibra G. Sylvia's recipe: The role of imitation and pedagogy in the transmission of cultural knowledge. In: Enfield NJ, Levinson SC, editors. Roots of Human Sociality: Culture, Cognition, and Human Interaction. Oxford, UK: Berg Publishers; 2006. pp. 229–255.
10. Meltzoff AN, Decety J. What imitation tells us about social cognition: A rapprochement between developmental psychology and cognitive neuroscience. Philos Trans R Soc Lond B Biol Sci. 2003;358(1431):491–500. doi: 10.1098/rstb.2002.1261.
11. Giret N, Albert A, Nagle L, Kreutzer M, Bovet D. Context-related vocalizations in African grey parrots (Psittacus erithacus). Acta Ethol. 2012;15(1):39–46.
12. Funabiki Y, Konishi M. Long memory in song learning by zebra finches. J Neurosci. 2003;23(17):6928–6935. doi: 10.1523/JNEUROSCI.23-17-06928.2003.
13. Nottebohm F, Stokes TM, Leonard CM. Central control of song in the canary, Serinus canarius. J Comp Neurol. 1976;165(4):457–486. doi: 10.1002/cne.901650405.
14. Kao MH, Wright BD, Doupe AJ. Neurons in a forebrain nucleus required for vocal plasticity rapidly switch between precise firing and variable bursting depending on social context. J Neurosci. 2008;28(49):13232–13247. doi: 10.1523/JNEUROSCI.2250-08.2008.
15. Olveczky BP, Andalman AS, Fee MS. Vocal experimentation in the juvenile songbird requires a basal ganglia circuit. PLoS Biol. 2005;3(5):e153. doi: 10.1371/journal.pbio.0030153.
16. Kao MH, Brainard MS. Lesions of an avian basal ganglia circuit prevent context-dependent changes to song variability. J Neurophysiol. 2006;96(3):1441–1455. doi: 10.1152/jn.01138.2005.
17. Stepanek L, Doupe AJ. Activity in a cortical-basal ganglia circuit for song is required for social context-dependent vocal variability. J Neurophysiol. 2010;104(5):2474–2486. doi: 10.1152/jn.00977.2009.
18. Hahnloser RHR, Kozhevnikov AA, Fee MS. An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature. 2002;419(6902):65–70. doi: 10.1038/nature00974.
19. Long MA, Jin DZ, Fee MS. Support for a synaptic chain model of neuronal sequence generation. Nature. 2010;468(7322):394–399. doi: 10.1038/nature09514.
20. Hessler NA, Doupe AJ. Social context modulates singing-related neural activity in the songbird forebrain. Nat Neurosci. 1999;2(3):209–211. doi: 10.1038/6306.
21. Hessler NA, Doupe AJ. Singing-related neural activity in a dorsal forebrain-basal ganglia circuit of adult zebra finches. J Neurosci. 1999;19(23):10461–10481. doi: 10.1523/JNEUROSCI.19-23-10461.1999.
22. Oztop E, Kawato M, Arbib MA. Mirror neurons: Functions, mechanisms and models. Neurosci Lett. 2013;540:43–55. doi: 10.1016/j.neulet.2012.10.005.
23. Hahnloser R, Ganguli S. Vocal learning with inverse models. In: Panzeri S, Quiroga P, editors. Principles of Neural Coding. Boca Raton, FL: CRC Taylor and Francis; 2013.
24. Hanuschkin A, Ganguli S, Hahnloser RHR. A Hebbian learning rule gives rise to mirror neurons and links them to control theoretic inverse models. Front Neural Circuits. 2013;7:106. doi: 10.3389/fncir.2013.00106.
25. Prather JF, Peters S, Nowicki S, Mooney R. Precise auditory-vocal mirroring in neurons for learned vocal communication. Nature. 2008;451(7176):305–310. doi: 10.1038/nature06492.
26. Day NF, Kinnischtzke AK, Adam M, Nick TA. Top-down regulation of plasticity in the birdsong system: “Premotor” activity in the nucleus HVC predicts song variability better than it predicts song features. J Neurophysiol. 2008;100(5):2956–2965. doi: 10.1152/jn.90501.2008.
27. Margoliash D, Fortune ES. Temporal and harmonic combination-sensitive neurons in the zebra finch’s HVc. J Neurosci. 1992;12(11):4309–4326. doi: 10.1523/JNEUROSCI.12-11-04309.1992.
28. McCasland JS. Neuronal control of bird song production. J Neurosci. 1987;7(1):23–39. doi: 10.1523/JNEUROSCI.07-01-00023.1987.
29. McCasland JS, Konishi M. Interaction between auditory and motor activities in an avian song control nucleus. Proc Natl Acad Sci USA. 1981;78(12):7815–7819. doi: 10.1073/pnas.78.12.7815.
30. Troyer TW, Doupe AJ. An associational model of birdsong sensorimotor learning II. Temporal hierarchies and the learning of song sequence. J Neurophysiol. 2000;84(3):1224–1239. doi: 10.1152/jn.2000.84.3.1224.
31. Wang CZH, Herbst JA, Keller GB, Hahnloser RHR. Rapid interhemispheric switching during vocal production in a songbird. PLoS Biol. 2008;6(10):e250. doi: 10.1371/journal.pbio.0060250.
32. Doupe AJ, Konishi M. Song-selective auditory circuits in the vocal control system of the zebra finch. Proc Natl Acad Sci USA. 1991;88(24):11339–11343. doi: 10.1073/pnas.88.24.11339.
33. Tumer EC, Brainard MS. Performance variability enables adaptive plasticity of ‘crystallized’ adult birdsong. Nature. 2007;450(7173):1240–1244. doi: 10.1038/nature06390.
34. Sen K, Theunissen FE, Doupe AJ. Feature analysis of natural sounds in the songbird auditory forebrain. J Neurophysiol. 2001;86(3):1445–1458. doi: 10.1152/jn.2001.86.3.1445.
35. Guitchounts G, Markowitz JE, Liberti WA, Gardner TJ. A carbon-fiber electrode array for long-term neural recording. J Neural Eng. 2013;10(4):046016. doi: 10.1088/1741-2560/10/4/046016.
36. Bell CC, Han VZ, Sugawara Y, Grant K. Synaptic plasticity in a cerebellum-like structure depends on temporal order. Nature. 1997;387(6630):278–281. doi: 10.1038/387278a0.
37. Fino E, Glowinski J, Venance L. Bidirectional activity-dependent plasticity at corticostriatal synapses. J Neurosci. 2005;25(49):11279–11287. doi: 10.1523/JNEUROSCI.4476-05.2005.
38. Person AL, Gale SD, Farries MA, Perkel DJ. Organization of the songbird basal ganglia, including area X. J Comp Neurol. 2008;508(5):840–866. doi: 10.1002/cne.21699.
39. Gale SD, Person AL, Perkel DJ. A novel basal ganglia pathway forms a loop linking a vocal learning circuit with its dopaminergic input. J Comp Neurol. 2008;508(5):824–839. doi: 10.1002/cne.21700.
40. Bottjer SW, Altenau B. Parallel pathways for vocal learning in basal ganglia of songbirds. Nat Neurosci. 2010;13(2):153–155. doi: 10.1038/nn.2472.
41. Charlesworth JD, Warren TL, Brainard MS. Covert skill learning in a cortical-basal ganglia circuit. Nature. 2012;486(7402):251–255. doi: 10.1038/nature11078.
42. Bottjer SW, Miesner EA, Arnold AP. Forebrain lesions disrupt development but not maintenance of song in passerine birds. Science. 1984;224(4651):901–903. doi: 10.1126/science.6719123.
43. Bonini L, Ferrari PF. Evolution of mirror systems: A simple mechanism for complex cognitive functions. Ann N Y Acad Sci. 2011;1225:166–175. doi: 10.1111/j.1749-6632.2011.06002.x.
44. Fortune ES, Rodríguez C, Li D, Ball GF, Coleman MJ. Neural mechanisms for the coordination of duet singing in wrens. Science. 2011;334(6056):666–670. doi: 10.1126/science.1209867.
45. Tchernichovski O, Mitra PP, Lints T, Nottebohm F. Dynamics of the vocal imitation process: How a zebra finch learns its song. Science. 2001;291(5513):2564–2569. doi: 10.1126/science.1058522.
46. Frémaux N, Sprekeler H, Gerstner W. Functional requirements for reward-modulated spike-timing-dependent plasticity. J Neurosci. 2010;30(40):13326–13337. doi: 10.1523/JNEUROSCI.6249-09.2010.
47. Fiete IR, Fee MS, Seung HS. Model of birdsong learning based on gradient estimation by dynamic perturbation of neural conductances. J Neurophysiol. 2007;98(4):2038–2057. doi: 10.1152/jn.01311.2006.
48. Atkeson CG, Santamaria JC. A comparison of direct and model-based reinforcement learning. Proceedings of the IEEE International Conference on Robotics and Automation. 1997. pp. 3557–3564.
49. O’Reilly RC, Frank MJ. Making working memory work: A computational model of learning in the prefrontal cortex and basal ganglia. Neural Comput. 2006;18(2):283–328. doi: 10.1162/089976606775093909.
50. Graziano MSA, Hu XT, Gross CG. Visuospatial properties of ventral premotor cortex. J Neurophysiol. 1997;77(5):2268–2292. doi: 10.1152/jn.1997.77.5.2268.
51. Dave AS, Yu AC, Margoliash D. Behavioral state modulation of auditory activity in a vocal motor system. Science. 1998;282(5397):2250–2254. doi: 10.1126/science.282.5397.2250.
52. Schmidt MF, Konishi M. Gating of auditory responses in the vocal control system of awake songbirds. Nat Neurosci. 1998;1(6):513–518. doi: 10.1038/2232.

