eLife. 2017 Dec 19;6:e26973. doi: 10.7554/eLife.26973

Neural activity in cortico-basal ganglia circuits of juvenile songbirds encodes performance during goal-directed learning

Jennifer M Achiro, John Shen, Sarah W Bottjer
Editor: Ronald L Calabrese
PMCID: PMC5762157  PMID: 29256393

Abstract

Cortico-basal ganglia circuits are thought to mediate goal-directed learning by a process of outcome evaluation to gradually select appropriate motor actions. We investigated spiking activity in core and shell subregions of the cortical nucleus LMAN during development as juvenile zebra finches are actively engaged in evaluating feedback of self-generated behavior in relation to their memorized tutor song (the goal). Spiking patterns of single neurons in both core and shell subregions during singing correlated with acoustic similarity to tutor syllables, suggesting a process of outcome evaluation. Both core and shell neurons encoded tutor similarity via either increases or decreases in firing rate, although only shell neurons showed a significant association at the population level. Tutor similarity predicted firing rates most strongly during early stages of learning, and shell but not core neurons showed decreases in response variability across development, suggesting that the activity of shell neurons reflects the progression of learning.

Research organism: Other

Introduction

Recurrent cortico-basal ganglia circuits mediate procedural skill learning, which involves goal-oriented evaluation of behavioral outcomes to gradually select appropriate motor actions. Vocal learning in songbirds provides a powerful model for studying the control of experience-dependent skill learning by cortico-basal ganglia circuits during development. Similar to infants learning speech, juvenile songbirds memorize vocal sounds of an adult tutor. They then progressively refine their own vocal behavior to imitate the tutor song through iterative comparisons between feedback of their own vocalizations and tutor sounds. Successful acquisition requires the evaluation of behavioral feedback against a representation of the goal (tutor song) to guide the gradual acquisition of an accurate imitation during the sensorimotor stage of vocal learning.

Neural control of vocal learning in juvenile zebra finches (Taeniopygia guttata) is vested in cortico-basal ganglia loops that emanate from the cortical nucleus LMAN (lateral magnocellular nucleus of the anterior nidopallium; Figure 1) (Aronov et al., 2008; Bottjer et al., 1984; Scharff and Nottebohm, 1991). Core and surrounding shell subregions of LMAN make parallel connections through the basal ganglia and thalamus (Bottjer, 2004; Luo et al., 2001; Johnson et al., 1995; Gale et al., 2008; Person et al., 2008; Iyengar et al., 1999; Paterson and Bottjer, 2017) and appear functionally similar to sensorimotor and associative cortico-basal ganglia loops that contribute to different aspects of motor learning in mammals (Thorn et al., 2010; Yin et al., 2008; Samejima and Doya, 2007; Ashby et al., 2010; Redgrave et al., 2010; Graybiel, 2008; Gremel and Costa, 2013; Yin et al., 2009; Kupferschmidt et al., 2017; Alexander and Crutcher, 1990). The core pathway mediates vocal motor production in juvenile songbirds (Elliott et al., 2014; Scharff and Nottebohm, 1991; Aronov et al., 2008), whereas the shell pathway is involved in evaluating sensorimotor learning; lesions in the shell pathway of juvenile birds impair the ability to copy tutor song syllables, but do not cause motor disruption of song (Bottjer and Altenau, 2010) (Figure 1—figure supplement 1). This disruption of learning, but not motor performance, supports the idea that shell circuitry helps to evaluate whether self-generated vocalizations match learned tutor sounds.

Figure 1. Cortico-basal ganglia circuits for vocal learning in juvenile zebra finches.

Left: core (c, gray) and shell (s, red) subregions of the cortical nucleus LMAN give rise to parallel recurrent loops through the basal ganglia. LMAN-core projects to vocal motor cortex (RA); this pathway drives vocal motor output in juvenile birds. LMAN-shell projects to a region of motor cortex that is adjoined to the lateral margin of RA (AId); this pathway does not drive motor output but is involved in learning. Shell also forms a trans-cortical loop via AId that converges with the core-shell basal ganglia loops in a dorsal thalamic zone. A transient projection from core to AId (dashed line) is present only in juvenile birds and creates a site of integration between core and shell pathways in AId during the learning period. The dorsal thalamus feeds back to LMAN and feeds forward to the premotor cortical area, HVC (High Vocal Center), via medial MAN (latter pathway not shown for clarity). Right: upper panel shows a low-power coronal section containing core and shell regions of LMAN as well as the anterior basal ganglia (Area X, dashed white outline, is an anatomical subregion of basal ganglia in songbirds; it contains both striatal and pallidal cells and is necessary for vocal learning). Calbindin expression (dark staining) demarcates both core and shell regions by labeling terminals of afferent thalamic axons. Lower panel shows a high-magnification Nissl-stained coronal view of the border between core and shell subregions, which are distinguished by the higher density of magnocellular neurons within core.

Figure 1—figure supplement 1. Lesions of AId prevent vocal learning in juvenile birds.

Imitation of tutor (father) songs by adult birds that had received a bilateral lesion of AId as juveniles (at 45 dph, after tutor song memorization) compared to birds that received control lesions. Lesion of AId interrupts the trans-cortical loop from shell through AId and back to shell via the thalamus, and prevents the convergence of this trans-cortical loop with the cortico-basal ganglia loops through thalamus. Left bars: percent of syllables in the tutor songs copied by the sons; right bars: percent of syllables in the sons’ songs copied from the tutors (black, lesioned birds versus gray, control birds; means + s.e.m.). Lesions of AId prevent imitation of the tutor song as well as the development of a stereotyped sequence of syllables; however, the phonology of syllables is normal, indicating that AId lesions do not exert an effect on vocal motor production.
© 2010, Bottjer and Altenau
Reproduced, with permission, from Figure 3 of Bottjer and Altenau (2010). Nature Neuroscience. 13 (2):153–5.

Figure 1—figure supplement 2. Distinct populations of neurons in LMAN-shell respond to either tutor song or self-generated song (own song).

LMAN neurons respond differentially to playback of songs in juvenile zebra finches during early stages of sensorimotor integration, following memorization of the tutor song (45 dph). Individual core neurons show significant responses to playback of their own song only or to both their own song and the tutor song (and frequently to other songs as well, data not shown); few core neurons respond to tutor song only. In contrast, individual shell neurons show significant responses either to their own song or to tutor song; few shell neurons respond to playback of both their own song and tutor song. Thus, a large population of shell neurons (~30%) responds significantly to tutor song, but not to own song (or other control songs). This pattern indicates that LMAN contains representations of both current behavior (self-generated song) and the goal behavior (tutor song). Tutor-tuned neurons are a transient population: their incidence decreases greatly by late stages of sensorimotor integration (not shown), suggesting that tutor-tuned neurons are eliminated due to cell death or are re-tuned to each bird’s own song.
© 2013, Achiro and Bottjer.
Reproduced, with permission, under the terms of the Creative Commons Attribution-Non Commercial 3.0 Unported (CC BY-NC 3.0) license (https://creativecommons.org/licenses/by-nc/3.0/) from Figure 3A of Achiro and Bottjer (2013). The Journal of Neuroscience. 33 (36):14475–88.

Figure 1—figure supplement 3. Individual LMAN-core neurons send axon collaterals into both RA and AId only in juvenile birds.

Left: the borders of AId are difficult to discern in Nissl-stained sections but are clearly demarcated by the axonal projection from LMAN. Labeled axons in this photomicrograph were produced by a large injection of HRP that covered both core and shell in an adult bird; axons enter the dorsal border of AId (most core axons enter RA through its lateral margin). Coronal view, medial is left; bar = 200 µm. Right: reconstructions of individual axon arbors of LMAN-core neurons in a juvenile bird (27 dph). Many core neurons of juvenile birds, but not adults, extend collaterals into AId; in adult birds, core neurons project only to RA and shell neurons project only to AId. This transient projection provides a point of integration between the two basal ganglia pathways as juvenile birds are actively engaged in learning but is completely retracted in adult birds. In addition, the transient projection suggests that these branches convey corollary discharge signals, since they are collateral branches of core neurons that drive vocal behavior via RA but lesions of AId do not cause motor disruption.
© 2012, The American Physiological Society. All rights reserved.
The figure in the right panel was originally published as Figure 11 in Miller-Sims and Bottjer (2012). Journal of Neurophysiology. 107 (4):1142–56. Further reproduction of the right panel would need permission from the copyright holder.
© 1989, Alan R Liss, Inc/Wiley. All rights reserved.
The figure in the left panel was originally published as Figure 4 in Bottjer et al. (1989). The Journal of Comparative Neurology. 279 (2):312–26. Further reproduction of the left panel would need permission from the copyright holder.

Within the shell subregion of LMAN in juvenile birds, distinct subpopulations of neurons respond selectively to playback of either their learned tutor song or their own self-generated song (Achiro and Bottjer, 2013), indicating that the shell pathway has access to neural representations of the goal behavior and the current version of the bird’s own song (Figure 1—figure supplement 2).

In contrast, core neurons are not selective for different song types at the onset of sensorimotor learning, but gradually become selectively tuned to playback of their own song (Doupe, 1997; Solis and Doupe, 1997). In addition, core neurons that project to vocal motor cortex (RA, Figure 1) send transient axon collaterals into the shell pathway, such that a copy of the motor signal is conveyed into the shell circuit only in juvenile birds (Miller-Sims and Bottjer, 2012) (Figure 1—figure supplement 3). Remarkably, the large population of tutor-tuned neurons in LMAN-shell is gone by late stages of sensorimotor learning, suggesting a key role for this subpopulation during a restricted period of development. These developmental changes suggest that the shell pathway may evaluate feedback about current vocal performance in relation to a memory of the tutor song during early sensorimotor learning and transmit that evaluation to core and other motor pathways (Achiro and Bottjer, 2013; Bottjer et al., 2010). We evaluated this idea by recording the activity of core and shell neurons in singing juvenile birds as they are actively involved in sensorimotor learning. The results support the idea that core neurons participate in motor-related actions and that activity in both core and shell neurons reflects evaluation of feedback of motor performance against the goal of memorized tutor sounds.

Results

Neurons in both core and shell subregions of LMAN exhibit singing-related neural activity in juvenile birds

We recorded single neurons in core and shell subregions of LMAN in juvenile zebra finches that had completed memorization of a tutor song and begun to practice their song vocalizations (43–60 dph). The majority of neurons in both core and shell showed significant modulation of firing rate during singing (Table 1; see Materials and methods). Among neurons that exhibited a significant change in firing rate during song production, approximately 65–70% showed excitation in both core and shell whereas the remainder were suppressed. Excitatory response strength was marginally higher in shell neurons, whereas suppressed response strength did not differ between core and shell (Table 1; Mann-Whitney tests: singing-excited neurons U = 1768, p=0.06, singing-suppressed neurons U = 459, p=0.90). Figure 2 shows spiking activity of example core and shell neurons that exhibited average levels of excitation (top) or suppression (bottom) during singing. Singing-excited neurons in both core and shell showed a significant increase in bursting during singing (fraction of spikes with interspike intervals <10 ms; Figure 2—figure supplement 1) as shown previously for LMAN-core neurons in juvenile birds (Olveczky et al., 2005).
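The bursting measure mentioned above (fraction of spikes with interspike intervals <10 ms) can be illustrated with a minimal sketch. It assumes that a spike counts as a burst spike when the interval to either its preceding or its following spike is below the threshold; the function name and the example spike times are hypothetical and are not taken from the study.

```python
def burst_fraction(spike_times, max_isi=0.010):
    """Fraction of spikes that belong to a burst.

    Assumption: a spike is a burst spike if the interval to the
    preceding OR the following spike is shorter than max_isi (seconds).
    """
    times = sorted(spike_times)
    n = len(times)
    if n < 2:
        return 0.0
    in_burst = [False] * n
    for i in range(n - 1):
        if times[i + 1] - times[i] < max_isi:
            # Both spikes flanking a short interval are part of the burst.
            in_burst[i] = True
            in_burst[i + 1] = True
    return sum(in_burst) / n


# Hypothetical spike times (s): one 3-spike burst, one 2-spike burst,
# and three isolated spikes.
spikes = [0.100, 0.104, 0.109, 0.300, 0.550, 0.556, 0.900, 1.200]
print(burst_fraction(spikes))  # 5 of 8 spikes are in bursts -> 0.625
```

Comparing this fraction between singing and baseline periods for each neuron would give the paired values that a signed-rank test operates on.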

Table 1. Response strength during episodes of singing.

Standardized response strength (mean ± s.e.m.) for core and shell neurons in LMAN that showed significant excitation or suppression during song production compared with quiet baseline periods (see Materials and methods).

Response | CORE: Fraction | CORE: Response strength | SHELL: Fraction | SHELL: Response strength
Excited | 0.72 (66/92) | 7.06 ± 0.71 | 0.65 (66/102) | 7.28 ± 0.44
Suppressed | 0.28 (26/92) | −7.32 ± 1.15 | 0.35 (36/102) | −5.82 ± 0.45

Figure 2. Single neurons in both core and shell subregions of LMAN showed singing-related activity in juvenile songbirds.

Left: Examples of two different core neurons during singing in juvenile birds showing either excitation (top; 54 dph) or suppression (bottom; 43 dph) compared with quiet baseline periods. Spectrograms depict three example singing episodes (frequency, 0–8 kHz, over time) and one non-singing baseline period; time-aligned spikes and corresponding firing rates (spikes/s; 10 ms bin size) are shown above. Right: Examples of two different shell neurons during singing in juvenile birds showing either excitation (top; 43 dph) or suppression (bottom; 50 dph), as in left panels.

Figure 2—figure supplement 1. Spike bursts increased in excited but not suppressed neurons during singing.

Percent of spikes that occurred in bursts (interspike intervals <10 ms) from core (gray) and shell (red) neurons during singing and local baselines (average of the two baseline periods nearest in time to each singing episode). Left panels: singing-excited neurons; right panels: singing-suppressed neurons. Box plots indicate medians and first/third quartiles. ***p<0.001; n.s. indicates no significant difference between baseline and singing (Wilcoxon signed-rank tests).
Figure 2—figure supplement 2. LMAN neurons had low selectivity for different syllable types.

(A) Example spiking responses during production of four different syllable types in a 53 dph bird. Top row shows spectrograms; raster plots and PSTHs for each syllable in single core and shell neurons are below. Firing rates in both core and shell neurons were similar across production of different syllable types, suggesting little selectivity. (B) To quantify the selectivity of responses for different syllable types, we computed an activity fraction for neurons that were excited during production of at least one syllable type, where a score of 0 indicates no selectivity for a specific syllable type and a score of 1 indicates maximum selectivity (see Materials and methods). Activity fraction scores were low (filled circles show cumulative distributions; mean scores ± s.e.m.: core = 0.15 ± 0.02; shell = 0.16 ± 0.02; no difference between core and shell, Kolmogorov-Smirnov Z = 0.97, p=0.31). However, scores for both subregions were significantly different from distributions of scores for which responses were shuffled with respect to syllable type (dashed lines; core Kolmogorov-Smirnov Z = 2.48, p<0.001; shell Kolmogorov-Smirnov Z = 2.71, p<0.001), suggesting that neurons in both subregions of juvenile LMAN have a low but above-chance level of selectivity for syllable identity.
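A selectivity score bounded between 0 and 1 can be obtained in several ways; the exact formula used here is given in Materials and methods and is not restated in this section. The sketch below assumes one commonly used sparseness-style definition, in which the activity fraction A = (mean rate)² / mean(rate²) is rescaled to S = (1 − A) / (1 − 1/n) for n syllable types. The function name and the input rates are hypothetical.

```python
def selectivity_index(rates):
    """Selectivity across syllable types: 0 = none, 1 = maximal.

    Assumed formula (not confirmed by the source):
      A = (mean rate)^2 / mean(rate^2)
      S = (1 - A) / (1 - 1/n)
    `rates` holds one non-negative response per syllable type.
    """
    n = len(rates)
    if n < 2:
        raise ValueError("need at least two syllable types")
    mean_rate = sum(rates) / n
    mean_sq = sum(r * r for r in rates) / n
    if mean_sq == 0:
        return 0.0  # no response to any syllable type
    activity_fraction = mean_rate ** 2 / mean_sq
    return (1.0 - activity_fraction) / (1.0 - 1.0 / n)


# Equal responses to every syllable type give no selectivity;
# a response to a single syllable type gives maximal selectivity.
print(selectivity_index([5.0, 5.0, 5.0, 5.0]))
print(selectivity_index([0.0, 0.0, 8.0, 0.0]))
```

A shuffled control of the kind described in the caption could be built by reassigning syllable labels to renditions at random before recomputing the score.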

Firing rates of neurons in core and shell were also similar during production of different syllable types and lacked temporal specificity across renditions of the same syllable type (Figure 2—figure supplement 2). Thus, spiking was relatively sparse and variable within and across syllable types in both core and shell neurons, as reported previously in core for both juvenile and adult birds during playback and singing (Doupe, 1997; Doupe and Solis, 1997; Kao et al., 2008; Olveczky et al., 2005; Solis and Doupe, 1997). In summary, the basic profile of neural activity during song production was similar between shell and core neurons.

Despite these overall similarities in neural activity between core and shell, the population of shell neurons would not be expected to show a coordinated increase in firing rate prior to the onset of singing if the shell pathway lacks a role in vocal motor production (Bottjer and Altenau, 2010). In accord with this idea, population histograms of mean-subtracted responses in pre-singing excited neurons showed synchronous increases in firing rates prior to syllable onsets in core neurons, whereas shell neurons revealed no evidence for coordinated pre-singing increases in firing rate (Figure 3). The lack of time-locked premotor activity in shell neurons is consistent with the absence of motor abnormalities following lesions that disrupt shell circuitry (Bottjer and Altenau, 2010), indicating a non-motor role in learning.

Figure 3. Pre-singing activity aligned to syllable onsets showed coordinated premotor activity in core but not shell.

Pre-singing activity in core (gray, top panel) and shell (red, bottom panel) in juvenile birds for all neurons that showed significant excitation prior to syllable onsets. Solid lines show smoothed mean-subtracted population rate histograms aligned to syllable onsets (syllable onset at time 0), shading indicates s.e.m. Bars above and below the traces indicate times at which the rate change is significant (95% confidence interval outside of zero); n’s are indicated in parentheses. Population histograms aligned to syllable offsets showed no significant changes for either core or shell offset-excited neurons (data not shown).
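As a rough illustration of how a mean-subtracted population histogram with a significance criterion could be computed, the sketch below subtracts each neuron's own mean rate across the analysis window, averages across neurons, and flags bins whose 95% confidence interval excludes zero. The normal-approximation interval (mean ± 1.96 s.e.m.), the choice of what is subtracted, and the example rates are all assumptions; the smoothing and the interval method actually used are described in Materials and methods.

```python
import math


def mean_subtracted_population(psths):
    """Return (mean, sem, significant) per time bin.

    psths: list of per-neuron rate histograms of equal length (spikes/s),
    already aligned to syllable onset. Needs at least two neurons.
    """
    n_neurons = len(psths)
    n_bins = len(psths[0])
    # Remove each neuron's own mean so high-rate cells do not dominate.
    centered = [[r - sum(p) / n_bins for r in p] for p in psths]
    result = []
    for b in range(n_bins):
        column = [c[b] for c in centered]
        mean = sum(column) / n_neurons
        var = sum((v - mean) ** 2 for v in column) / (n_neurons - 1)
        sem = math.sqrt(var / n_neurons)
        lower, upper = mean - 1.96 * sem, mean + 1.96 * sem
        result.append((mean, sem, lower > 0 or upper < 0))
    return result


# Hypothetical onset-aligned rates for three neurons and four time bins;
# every neuron fires more in the last bin.
psths = [[2, 2, 2, 10], [5, 6, 4, 13], [1, 0, 2, 9]]
for mean, sem, significant in mean_subtracted_population(psths):
    print(round(mean, 2), round(sem, 2), significant)
```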

Figure 3—source data 1. Pre-singing spiking activity of individual CORE and SHELL neurons.
elife-26973-fig3-data1.xlsx (398.2KB, xlsx)
DOI: 10.7554/eLife.26973.011

Neural activity in LMAN reflects similarity of self-generated syllables to tutor syllables

A large proportion of shell neurons are specifically tuned to playback of either the tutor song or the current version of self-generated song during early stages of sensorimotor integration when core neurons drive vocal motor output and lesions of LMAN pathways cause disruption of song learning (Achiro and Bottjer, 2013; Aronov et al., 2008; Bottjer et al., 1984; Olveczky et al., 2005; Scharff and Nottebohm, 1991; Bottjer and Altenau, 2010). This pattern suggests that LMAN is involved in sensorimotor learning in juveniles, but direct support is lacking for the idea that the activity of individual LMAN neurons represents how well syllables match the tutor song during singing. We tested this idea in juvenile birds (43–60 dph) by comparing neural activity during singing with the acoustic similarity of self-generated syllables to tutor syllables (see Materials and methods): regressions of baseline-corrected firing rates against tutor similarity were performed for each neuron. Unexpectedly, this analysis revealed that firing rates of cells in both core and shell could either increase or decrease as a function of tutor similarity: approximately half of all neurons in each subregion showed increased firing rates for syllables with higher tutor similarity (positive slopes, r values > 0), whereas the other half showed increased firing rates for syllables with lower tutor similarity (negative slopes, r values < 0) (Table 2). Figure 4A shows examples of syllable utterances with relatively high or low acoustic similarity to the closest tutor syllable, along with the corresponding firing rate of a shell neuron that showed a negative correlation with tutor similarity (Figure 4B). This neuron showed a low firing rate during production of syllables with high tutor similarity and excitation during production of syllables with low tutor similarity. Thus, LMAN neurons can encode similarity between self-generated utterances and tutor song via either increases or decreases in firing rate.
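The per-neuron regression can be illustrated with a short sketch that returns the correlation coefficient r and the least-squares slope for one neuron. It assumes one baseline-corrected firing rate and one tutor similarity value per syllable rendition, with similarity coded so that larger values mean a closer match to the tutor; the function name and the numbers are hypothetical.

```python
import math


def pearson_r_and_slope(similarity, rates):
    """Correlation and least-squares slope of firing rate on similarity."""
    n = len(similarity)
    mx = sum(similarity) / n
    my = sum(rates) / n
    sxx = sum((x - mx) ** 2 for x in similarity)
    syy = sum((y - my) ** 2 for y in rates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(similarity, rates))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Hypothetical neuron that fires less for renditions closer to the tutor.
similarity = [0.2, 0.4, 0.5, 0.7, 0.9]
rates = [12.0, 9.5, 8.0, 4.0, 1.5]
r, slope = pearson_r_and_slope(similarity, rates)
print(r < 0, slope < 0)  # negative slope, as for the cell in Figure 4B
```

The sign of r is what sorts a neuron into the positive-slope or negative-slope group.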

Table 2. Tutor similarity modulates baseline-corrected firing rates in single LMAN neurons.

Single neurons showed either positive or negative slopes for the regression of firing rate against tutor similarity. The incidence of neurons across the population that had either positive (increased firing rates for syllables with higher tutor similarity) or negative (increased firing rates for syllables with lower tutor similarity) r values was similar for core and shell. Most single neurons had nonsignificant r values, but clear effects were observed at the population level (see Figure 5, text).

Subregion | Measure | Positive slope (r > 0) | Negative slope (r < 0)
CORE | Total cell number (n = 98) | 48 | 50
CORE | Percent | 49.0 | 51.0
CORE | Mean of r value across all cells | 0.082 | −0.065
CORE | Estimated % significant core cells = 5.5 | 3.3 | 2.2
CORE | Approximate mean r value for significant cells (n = 5) | 0.22 | −0.21
SHELL | Total cell number (n = 122) | 62 | 60
SHELL | Percent | 50.8 | 49.2
SHELL | Mean of r value across all cells | 0.062 | −0.081
SHELL | Estimated % significant shell cells = 10.8 | 4.5 | 6.3
SHELL | Approximate mean r value for significant cells (n = 13) | 0.21 | −0.31

Figure 4. Single LMAN neurons encoded similarity to tutor song in singing juvenile birds.

(A) Examples of juvenile syllables with relatively high or low similarity to tutor syllables. Spectrograms (frequency, 0–8 kHz, over time) showing two different tutor syllables (top) and examples of juvenile renditions from a 59 dph juvenile bird (bottom) which were matched to the corresponding tutor syllable and showed either relatively high or low similarity to that tutor syllable. (B) Baseline-corrected firing rates across all renditions of all syllable types for a shell neuron from the same bird during the same period of singing as in A; this cell showed a significant negative correlation between baseline-corrected firing rates and tutor similarity of self-generated utterances.

To estimate the fraction of single neurons showing a significant relationship between firing rate and tutor similarity, we performed repeated permutation tests by generating 1000 random shuffles of the relationship between firing rate and tutor similarity for each cell (O'Connor et al., 2010; see Materials and methods). The percent of significant neurons compared to these random distributions was 5.5% in core and 10.8% in shell (Table 2; chi-square test between core and shell proportions = 1.98, p=0.18). Core and shell neurons were evenly split between positive and negative associations of firing rate with degree of tutor song matching, and significant cells had fairly comparable r values (averages ranging from +0.22 to −0.31). In summary, tutor similarity predicted the firing rates of a relatively small percentage of single neurons in both core and shell, with somewhat more neurons in shell showing a significant correlation.
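The shuffling logic can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of O'Connor et al. (2010): it uses the absolute Pearson correlation as the test statistic, a two-sided comparison, and an add-one correction for the p-value. The function names, the seed, and the example data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(similarity, rates, n_shuffles=1000, seed=0):
    """Two-sided permutation p-value for the rate-similarity correlation."""
    rng = random.Random(seed)
    observed = abs(pearson_r(similarity, rates))
    shuffled = list(rates)
    n_extreme = 0
    for _ in range(n_shuffles):
        # Break the pairing between firing rate and tutor similarity.
        rng.shuffle(shuffled)
        if abs(pearson_r(similarity, shuffled)) >= observed:
            n_extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (n_extreme + 1) / (n_shuffles + 1)


# Hypothetical cell: 20 renditions with a strong negative relationship.
similarity = [0.1 * i for i in range(20)]
rates = [5.0 - 2.0 * s + (0.3 if i % 2 else -0.3)
         for i, s in enumerate(similarity)]
p = permutation_p_value(similarity, rates)
print(p < 0.05)
```

Counting the cells whose p-value falls below the chosen threshold gives an estimate of the percentage of significant neurons.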

To test whether similarity to tutor syllables modulates neural activity at the population level, we analyzed the regressions of baseline-corrected firing rates with tutor similarity across all neurons using a mixed-effects linear regression model (fixed and random effects for tutor similarity nested within a random intercept for neurons; see Materials and methods). This analysis yielded a significant effect for shell neurons (t = −2.23, p=0.035) but not for core neurons (t = 0.91, p=0.37), indicating that the population activity of shell neurons during singing reflected tutor-matching performance. Because positive or negative relationships between firing rate and tutor similarity may reflect unique processes of evaluating tutor similarity (Table 2), we examined these two categories separately. Using the mixed-effects linear regression model to provide a descriptive assessment of the magnitude of positive and negative associations, we found significant effects for positive and negative relationships between firing rate and tutor similarity in both core and shell neurons (core positive: t = 3.42, p=0.003; negative: t = −2.90, p=0.004; shell positive: t = 4.02, p=0.001; negative: t = −2.80, p=0.009). To illustrate these relationships, Figure 5 shows response strengths during production of syllables with low versus high tutor similarity (bottom versus top 50% ranked by tutor similarity; Materials and methods) separated by whether cells showed positive (left panels) or negative (right panels) slope values. These data indicate that the majority of neurons in both core and shell showed either higher or lower firing rates during production of syllables with higher tutor similarity.
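One way to write out a mixed-effects model of this kind is sketched below. The notation is introduced here for illustration and is an assumption, not the source's: y_ij is the baseline-corrected firing rate of neuron j during rendition i, s_ij is the tutor similarity of that rendition, β0 and β1 are the fixed intercept and slope, and u_0j and u_1j are the random intercept and random slope for neuron j.

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, s_{ij} + \varepsilon_{ij},\\
(u_{0j},\, u_{1j})^{\top} &\sim \mathcal{N}(\mathbf{0},\, \Sigma), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0,\, \sigma^{2}).
\end{aligned}
```

Under this reading, the reported t values would test whether the fixed slope β1 differs from zero, while the random terms absorb differences between neurons in baseline rate and in sensitivity to tutor similarity.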

Figure 5. Similarity to tutor syllables modulated firing rate in either a positive or negative direction across the population of LMAN neurons.

Standardized response strength for each neuron in core (gray) and shell (red) during production of syllable renditions representing low versus high similarity to corresponding tutor syllables based on a median split of tutor similarity scores (bottom versus top 50% of tutor similarity scores). Left panels, neurons that showed positive slopes in the regression analysis (r values > 0); right panels, neurons that showed negative slopes in the regression analysis (r values < 0) (left panels: core n = 48, shell n = 62; right panels: core n = 50, shell n = 60; two outliers have been removed from core for clarity of exposition). Box plots represent median and first/third quartiles.

An alternative interpretation of the significant association between firing rate and tutor similarity in shell is that firing rates were modulated by more prototypical juvenile utterances, that is, those closer to the center of the distribution of acoustic features for a given syllable type (see Materials and methods). We tested whether the significant correlation to tutor similarity we observed in shell neurons reflected a tendency to encode highly prototypical syllables using the mixed-effects regression model described above. This analysis showed a non-significant effect (t = −1.53, p=0.136), and there was no relationship between indices of tutor similarity and prototypicality (data not shown), indicating that tutor-similar syllable renditions are not more prototypical. Thus, the relationship between firing rates and tutor similarity in shell neurons does not appear to be based on prototypicality of self-generated syllables.

Variability in neural responses of both core and shell neurons reflects similarity of self-generated syllables to tutor syllables

To test if degree of similarity to tutor song also modulates firing rate variability of LMAN neurons in 43–60 dph birds, we calculated the CV (coefficient of variation) of firing rate during production of syllable renditions with low versus high similarity to tutor syllables (bottom versus top 50% of tutor similarity scores). Both core and shell neurons showed increased firing rate variability during production of syllable utterances with low tutor similarity compared to those with high similarity (Figure 6A; Wilcoxon signed-rank tests, core Z = −5.70, p<0.001; shell Z = −4.48, p<0.001). Thus, variability of firing rates during singing reflected the degree of tutor song matching in both core and shell neurons such that responses were less variable for syllable renditions that were more similar to tutor song.
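The median-split comparison of variability can be illustrated with a small sketch for a single neuron. It assumes raw firing rates with a positive mean (a CV is poorly defined when the mean is near zero), similarity coded so that larger values mean a closer match, and that the middle rendition is dropped when the number of renditions is odd. The function names and data are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def cv_low_vs_high(similarity, rates):
    """CV of firing rate for the bottom and top halves of tutor similarity."""
    order = sorted(range(len(rates)), key=lambda i: similarity[i])
    half = len(order) // 2
    if half < 2:
        raise ValueError("need at least four renditions")
    low = [rates[i] for i in order[:half]]    # least similar renditions
    high = [rates[i] for i in order[-half:]]  # most similar renditions
    return coefficient_of_variation(low), coefficient_of_variation(high)


# Hypothetical renditions: firing is more variable when similarity is low.
similarity = [0.2, 0.9, 0.4, 0.8, 0.1, 0.7]
rates = [2.0, 10.0, 18.0, 11.0, 10.0, 9.0]
cv_low, cv_high = cv_low_vs_high(similarity, rates)
print(cv_low > cv_high)
```

Repeating this for every neuron yields the paired CV values that a Wilcoxon signed-rank test compares.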

Figure 6. Variability of firing rate was higher during syllable renditions with low tutor similarity in both core and shell neurons.

(A) Coefficient of variation (CV) of firing rate for core (gray) and shell (red) neurons during production of syllables that had high or low similarity to tutor syllables (top versus bottom 50% of tutor similarity scores). ***p<0.001 between CV during production of syllables representing high versus low similarity to tutor (Wilcoxon signed-rank tests; core n = 87, shell n = 105). (B) Characteristics of syllables in high- versus low-similarity categories. Left: fraction of syllable types represented in high versus low similarity to tutor syllables for each bird for day of singing; each symbol represents the number of syllable types produced in a recording session as a fraction of the total number of syllable types produced. Gray markers represent sessions in which activity of core neurons was collected (n = 17), red markers represent sessions in which activity of shell neurons was collected (n = 14), black markers represent sessions in which activity of both core and shell neurons was collected (n = 11). Right: the average scatter for each syllable type represented in high- versus low- tutor similarity categories for each session, defined as the average of the acoustic distance between each point in the distribution to its centroid (lower values indicate a tighter cluster). Box plots represent median and first/third quartiles.

One possible caveat is that the decreased variability of firing rate for utterances with higher tutor similarity might simply reflect a smaller number of syllable types in this category, whereas low-similarity utterances might include many different syllable types. To test this idea, we assessed the fraction of syllable types (across all syllable renditions) that were produced within the high- versus low-similarity categories for each day of singing and found no significant difference (Figure 6B left; Mann-Whitney U = 693, p=0.06). In addition, there was no difference in the scatter of syllable renditions within a syllable type between the high- and low-similarity to tutor categories (Figure 6B right; Mann-Whitney U = 867.5, p=0.90). In summary, these data indicate that variability in firing rate was higher across the populations of both core and shell neurons during production of syllable renditions with low tutor similarity compared to production of syllable renditions with high tutor similarity.
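The scatter measure, defined in the Figure 6 legend as the average acoustic distance between each point in a distribution and its centroid, can be sketched as below. Euclidean distance and a two-dimensional feature space are assumptions made for the example; the acoustic features actually used are described in Materials and methods.

```python
import math


def scatter(points):
    """Mean Euclidean distance from each point to the centroid."""
    n = len(points)
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    return sum(math.dist(p, centroid) for p in points) / n


# Hypothetical renditions of one syllable type in a 2-D feature space.
tight = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.0)]
loose = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(scatter(tight) < scatter(loose))  # lower values mean a tighter cluster
```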

Behavioral and neural changes during song development

Behavioral expression of learning

The developmental span over which we recorded birds’ vocal behavior (43–60 dph) corresponds to a transition from highly variable ‘subsong’ to ‘plastic song’ with less behavioral variability and a corresponding increase in similarity to tutor song. If the modulation of neural activity by tutor similarity we observed above reflects some aspect of learning, then it should change during the progression of learning. To test this idea, we assessed the degree of song development for each day of singing for each bird by calculating goodness-of-fit coefficients from an exponential fit to the distribution of syllable durations (see Materials and methods; Figure 7—figure supplement 1). Juveniles in subsong (but not plastic song) produce a graded distribution of syllable durations, which is well fit by an exponential function (lower numbers indicate better fit; Aronov et al., 2011; Tchernichovski et al., 2004; Aronov et al., 2008; Johnson et al., 2002). Comparing degree of song development with acoustic similarity to tutor across average syllable types yielded a significant correlation (Figure 7A; r = 0.23, p=0.03), supporting the canonical understanding that juvenile birds produce syllable renditions that are more similar to tutor syllables as sensorimotor integration progresses.
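A goodness-of-fit coefficient of this kind can be illustrated with the statistic that underlies a Lilliefors-type test for exponentiality: the largest gap between the empirical distribution of durations and an exponential distribution whose mean is estimated from the same sample. The sketch below assumes this statistic is what is meant by the coefficient; the function name and the synthetic durations are hypothetical.

```python
import math


def lilliefors_exponential_stat(durations):
    """Max distance between the empirical CDF and a fitted exponential CDF.

    Lower values indicate a better exponential fit.
    """
    x = sorted(durations)
    n = len(x)
    mean = sum(x) / n  # the exponential scale is estimated from the sample
    d = 0.0
    for i, value in enumerate(x):
        fitted = 1.0 - math.exp(-value / mean)
        # Check the gap just after and just before each step of the ECDF.
        d = max(d, (i + 1) / n - fitted, fitted - i / n)
    return d


# Graded durations (exponential quantiles) versus durations clustered
# around a single peak, as might emerge with a stereotyped syllable.
graded = [-0.1 * math.log(1 - (i - 0.5) / 50) for i in range(1, 51)]
peaked = [0.09 + 0.0004 * i for i in range(50)]
print(lilliefors_exponential_stat(graded) < lilliefors_exponential_stat(peaked))
```

On this measure, a day of singing dominated by a few stereotyped syllable durations scores higher than a day with a graded distribution.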

Figure 7. Subsets of syllable utterances during early sensorimotor learning had either high or low similarity to tutor syllables.

(A) Average similarity of all juvenile syllable types to corresponding tutor syllables as a function of the progression of song development from subsong to plastic song (goodness-of-fit coefficients plotted against average syllable-type similarity for each bird for each day of singing; see Figure 7—figure supplement 1). (B) Average similarity to tutor across development, plotted as in A, but segregating individual syllable renditions into low similarity (bottom 50% of tutor similarity scores, left panel) and high similarity (top 50% of tutor similarity scores, right panel) for each day of singing for each bird. For tutor similarity scores, 2 = no similarity, 0 = perfect similarity (see Materials and methods).


Figure 7—figure supplement 1. Measuring degree of song development for each day of singing for each bird.


(A) Left panel: example spectrograms of syllables from the same bird at different ages (47, 49, 50 and 51 dph). Right panel: distributions of syllable durations per day for this bird; red lines indicate exponential fits for each distribution. Coefficients shown for each day are goodness-of-fit coefficients from the Lilliefors test for fit to an exponential function (lower numbers indicate better fit). As birds progress to the plastic song stage of sensorimotor integration, they begin to produce syllables that appear as peaks in the distribution of syllable durations, causing the distributions to no longer be approximated by an exponential function. (B) Tutor similarity scores across all individual syllable renditions of all birds (regardless of whether or not renditions were assigned to syllable types) as a function of song development (0 = perfect similarity, 2 = no similarity; see Materials and methods). Black line indicates linear fit. These data show increases in tutor similarity across development for the population of individual syllable renditions in all birds.

We then examined individual syllable utterances with high versus low tutor similarity within each day of singing. Surprisingly, we found that birds produced many utterances with relatively high tutor similarity during subsong (Figure 7B, right). Furthermore, the level of tutor matching among these utterances was fairly constant across development such that there was no significant correlation between tutor similarity and song development (r = 0.13, p=0.42). This striking finding indicates that some of the highly variable vocalizations produced by birds during early sensorimotor learning are good matches to tutor syllables. Birds in subsong also produced many low-similarity utterances (Figure 7B, left), and the similarity between these poorly-matched renditions and tutor syllables increased as song development progressed, reflecting the expected increase in tutor matching (r = 0.34, p=0.03; cf. Figure 7A). Thus, birds in subsong produce some utterances with tutor similarity as high as that of birds in later stages of song development, and renditions with lower tutor similarity at the beginning of sensorimotor integration are gradually eliminated.

These data raise the possibility that syllable renditions with high versus low tutor similarity (Figure 7B) were distinct in other ways. For example, high-similarity syllable renditions in subsong birds might consist of simple harmonic stacks, whereas low-similarity renditions might consist of syllable types with complex modulations. We examined whether individual syllable types were selectively represented in high- versus low-similarity categories for each day of singing, and found that the syllable type that was produced most frequently in the high-similarity category (highest percentage of renditions among all syllable types) was also produced most frequently in the low-similarity category. The most common syllable type produced was the same for both high- and low-similarity renditions in 57% of vocal recording sessions across all birds, indicating that a single syllable type did not typically predominate in either the low- or the high-similarity category. This pattern is consistent with the finding above that syllable renditions within both high- and low-similarity categories represent multiple syllable types (Figure 6B). In addition, qualitative examination revealed that syllable types with the most renditions in high- and low-similarity categories included both simple and complex syllables, although we noted a slight tendency for low-similarity renditions to include more frequency modulation. Thus, syllable renditions with high versus low tutor similarity did not show a strong pattern of syllable type.

Neural representation of performance during learning

How did neural activity in LMAN encode these changes in tutor similarity as learning progressed? Absolute values of response strength across song development increased in both core and shell neurons regardless of whether syllable renditions had high or low similarity to tutor syllables (data not shown). This pattern agrees with previously published work showing increases in firing rate of RA neurons as song learning progresses (Ölveczky et al., 2011), suggesting generic increases in firing rate with development. However, LMAN circuitry is critical for early sensorimotor learning when the number of tutor-tuned shell neurons is high (Achiro and Bottjer, 2013), indicating that the involvement of LMAN may be high initially when the goal representation is strong and decrease as learning progresses. In accord with this idea, correlations between firing rate and tutor similarity across development revealed that the activity of LMAN neurons tracked vocal performance during sensorimotor learning (Figure 8). Both core and shell neurons showed stronger associations between firing rate and tutor similarity during early stages of song development, and weaker associations as sensorimotor learning progressed. These trends were significant in core and shell neurons with positive associations and in core neurons with negative associations; shell neurons with negative slopes showed a weak but non-significant association between firing rate and tutor similarity across development. However, when the two bottom outliers in the panel for shell neurons with negative slopes were removed, the association was significant (r = 0.35, p=0.006). This pattern indicates that the activity of both core and shell neurons reflects the degree of song learning: tutor similarity is a better predictor of firing rate during early sensorimotor learning when many utterances have low tutor similarity (Figure 7B), and the strength of this association decreases in parallel with developmental increases in motor performance as song learning progresses.

Figure 8. The association between firing rate and tutor similarity decreased in strength with the progression of song development.


The correlation between baseline-corrected firing rate and tutor similarity (r values, y-axis) is plotted against degree of song development (goodness-of-fit coefficients, x-axis) for neurons with a positive slope between firing rate and tutor similarity (r values > 0; top panels) and for neurons with a negative slope between firing rate and tutor similarity (r values < 0; bottom panels). core (gray, left panels); shell (red, right panels).

The variability of firing rate paralleled the behavioral changes in tutor similarity across development (Figure 7B) in shell but not core neurons. CV of firing rate in shell neurons decreased over the course of song development only during production of syllable renditions that had low similarity to tutor during early sensorimotor learning (Figure 9; low similarity, r = −0.24, p=0.01; high similarity, r = −0.08, p=0.42). Thus, variability of firing in shell neurons did not change for utterances with high similarity to tutor song across development. In contrast, the CV of firing rate in core neurons did not change as a function of song development regardless of whether syllable renditions had high or low tutor similarity (Figure 9; low similarity, r = 0.04, p=0.73; high similarity, r = 0.05, p=0.61). These data show that the variability of firing in shell neurons reflected the progression of learning, with higher neural variability early in development for renditions with lower similarity to tutor song, and lower neural variability later in development as the incidence of poorly-matched syllable renditions decreased.
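The split into low- and high-similarity renditions and the CV computed within each half can be sketched as follows. This is an illustration under assumptions rather than the analysis code: it takes CV as the sample standard deviation divided by the mean across renditions, and it splits renditions at the median of the tutor distance score, where a smaller score means higher similarity (0 = perfect similarity, 2 = no similarity). The function names and the example values are hypothetical.

```python
import statistics

def cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    m = statistics.fmean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.stdev(values) / m

def cv_by_similarity_half(rates, distances):
    """Split renditions at the median tutor distance score and return
    (cv_high_similarity, cv_low_similarity).

    rates[i]     : firing rate during rendition i (spikes/s)
    distances[i] : tutor distance score of rendition i (0 = perfect match)
    """
    if len(rates) != len(distances) or len(rates) < 4:
        raise ValueError("need matched lists with at least four renditions")
    order = sorted(range(len(rates)), key=lambda i: distances[i])
    half = len(order) // 2
    high_similarity = [rates[i] for i in order[:half]]  # smallest distances
    low_similarity = [rates[i] for i in order[half:]]   # largest distances
    return cv(high_similarity), cv(low_similarity)

# Hypothetical example: firing is more variable for poorly matched renditions.
rates = [10.0, 10.5, 9.5, 10.0, 2.0, 20.0, 5.0, 15.0]
distances = [0.1, 0.2, 0.3, 0.4, 1.0, 1.1, 1.2, 1.3]
cv_high, cv_low = cv_by_similarity_half(rates, distances)
print(cv_high, cv_low)
```

Because the score is a distance, the "top 50% of tutor similarity" corresponds to the smallest scores, which is why the sorted order is split with the low-distance half labeled as high similarity.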

Figure 9. Variability of firing rate decreased in shell neurons during development for syllable renditions with low tutor similarity during subsong.


Top panels: correlation of CV of firing rate during production of syllable renditions in the bottom 50% of tutor similarity with the progression of song development. Bottom panels: correlation of CV of firing rate during production of syllable renditions in the top 50% of tutor similarity with the progression of song development. Black lines indicate linear fits. core (gray, left panels); shell (red, right panels).

Discussion

We tested whether spiking activity in LMAN of juvenile birds reflects the acoustic similarity of self-generated vocal utterances to memorized tutor syllables. Neurons in both core and shell subregions of LMAN showed singing-related activity that varied as a function of tutor song similarity during early stages of sensorimotor integration. This pattern represents the first discovery of an online correlate of song performance during learning in juvenile birds and suggests LMAN as a key component of circuitry that evaluates current behavior in relation to a goal behavior during procedural learning. Although neural activity across the population in both core and shell was modulated by degree of similarity between self-generated sounds and memorized tutor sounds, this trend was stronger across the population of shell neurons. As in prior work, we found that spiking activity in core neurons supports the idea that core drives vocal motor output in juvenile birds (Aronov et al., 2008). In contrast, neural activity in shell neurons did not exhibit coordinated premotor increases in firing rate, which is consistent with the lack of projections from shell to downstream motor circuitry and the absence of motor disruption following lesions to the shell pathway (Bottjer and Altenau, 2010; Bottjer et al., 2000) (Figure 1).

One interpretation of this pattern of results is that shell circuitry acts primarily as a ‘critic’ to evaluate comparisons between self-generated and tutor sounds, while core circuitry serves as an ‘actor’ that drives vocal motor output and receives instruction from the critic in order to bias action selection over the course of learning (Barto et al., 1983; Graybiel, 2008). The idea of an instructive function by ‘critic’ circuitry is complicated (here as elsewhere), since the recurrent loop architecture of cortico-basal ganglia circuits means that error or reinforcement signals (which are themselves widely distributed) may be propagated into multiple pathways (Lau et al., 2017). Thus in the current results it is difficult to know whether evaluative signals may originate in shell and be transmitted to core in order to instruct future motor actions. We favor this idea as a working model; the current data serve as a foundation for tests of this idea in future studies.

Parallels between behavioral and neural indices of learning

Interestingly, some individual syllable renditions were similar to tutor syllables even in subsong birds during early sensorimotor integration, indicating a surprising capacity to produce relatively mature syllabic utterances. This result is reminiscent of the finding that juvenile birds exposed to adult females are able to produce more stereotyped song patterns than normally seen in young birds (Kojima and Doupe, 2011). The ability to produce syllable renditions that represent a range of matches to the goal (i.e. both higher and lower similarity) may be an important component of the behavioral variability that is necessary for skill acquisition involving reinforcement learning.

Only syllable utterances with low levels of tutor matching during subsong showed a progressive increase in tutor similarity over the course of development, and the activity of shell but not core neurons reflected this difference in behavioral trajectory. Developmental changes in firing rate variability mirrored that seen for behavioral development only in shell (Figures 7 and 9): firing rate CV in shell neurons decreased throughout development only among utterances that showed an increase in similarity over the course of learning. In contrast, variability of firing in core neurons did not change developmentally. The decrease in variable firing in shell neurons in parallel with the decrease in incidence of poorly matched syllable renditions indicates that the activity of shell neurons tracks the degree of song matching, consistent with the hypothesis that these neurons are involved in comparing song behavior to a tutor song memory.

As vocal behavior was refined to achieve a more accurate imitation of tutor song, the correlation between tutor similarity and firing rate decreased in both core and shell neurons (Figure 8). The tendency toward weaker encoding of tutor similarity by firing rate as song learning progresses coincides with the loss of a large population of shell neurons that respond selectively to playback of tutor song (Achiro and Bottjer, 2013). Thus, tutor similarity is a better predictor of firing rate during early learning when behavioral variability is high and shell contains a large population of tutor-selective neurons. The loss of tutor-tuned neurons along with a progressive weakening of the association between firing rate and tutor similarity suggests that birds may be switching to a different strategy such as comparing their own utterances to a ‘self’ template more than to a tutor template. In accord with this idea, neural selectivity for self-generated sounds increases in core neurons between 45 and 60 dph (Achiro and Bottjer, 2013; Doupe, 1997; Doupe and Solis, 1997; Solis and Doupe, 1997). These patterns raise an important question: do individual shell neurons in which firing rate reflected tutor similarity (10.8%, Table 2) fall within the population of tutor-tuned neurons? Tutor-tuned neurons in juvenile LMAN-shell may act as a gate or filter for self-generated utterances that are tutor-similar, suggesting that evaluation of tutor similarity within LMAN may be selectively vested in the firing rate of neurons within this subtype during early sensorimotor integration.

Tutor similarity could be encoded by either increases or decreases in firing rate for both core and shell neurons. This intriguing aspect of the current results is subject to different interpretations. One possibility is that increases in firing rates for syllables with higher tutor similarity (positive slopes) could provide a reinforcement signal to increase the probability of producing that vocal pattern, whereas increases in firing rates for syllables with lower tutor similarity (negative slopes) could provide an error signal to decrease the probability of making that incorrect sound; both types of information could be conveyed to downstream neurons and used to guide accurate refinement of syllables during learning.

Developmental aspects of skill learning

The construction of cortical circuits has been studied extensively in primary sensory cortex of mammals (Katz and Shatz, 1996; Espinosa and Stryker, 2012; Levelt and Hübener, 2012), but few studies have probed developmental changes in cortico-basal ganglia circuits as part of the mechanisms of skill learning. The involvement of LMAN circuitry in goal-oriented learning is likely to be strongly dependent on developmental changes that occur in these pathways. In juveniles, but not adults, many core neurons that project to vocal motor cortex send a collateral branch into the shell pathway (Figure 1—figure supplement 3) (Miller-Sims and Bottjer, 2012). This transient pathway may transmit a copy of the premotor signal generated by core neurons to shell neurons. In addition, the incidence of shell neurons tuned to learned tutor sounds decreases sharply (Achiro and Bottjer, 2013). Because the overall volume of shell regresses sharply during development, it may be that tutor-tuned neurons are eliminated due to naturally-occurring cell death (Johnson and Bottjer, 1992; Johnson et al., 1995), thereby helping to close the sensitive period for learning. Topographic specificity develops in the projection from LMAN core to RA (vocal motor cortex) during early development and is dependent on normal auditory experience (Iyengar and Bottjer, 2002b). In addition, a high proportion of synapses formed by thalamic axons in LMAN are ‘silent synapses’ in early development (Bottjer, 2005); a decrease in their number may contribute to a period of synaptic refinement to confer greater specificity of connectivity in thalamo-cortical connections (Iyengar and Bottjer, 2002a; Nixdorf-Bergweiler, 2001; cf. Garst-Orozco et al., 2014) as well as to curtail the sensitive period (Huang et al., 2015). These developmental changes suggest that differences in the extent and plasticity of neural mechanisms may be causally related to goal-oriented learning during development, and help to explain decreases in behavioral plasticity in older animals.

Relationship of core and shell circuitry to mammalian cortico-basal ganglia pathways

A functional segregation of vocal learning into parallel core and shell cortico-basal ganglia loops is reminiscent of corresponding architecture in sensorimotor and associative cortico-basal ganglia loops of mammals. Studies in mammals have shown that neurons in associative loops show increased modulation early in learning of goal-directed tasks, whereas sensorimotor circuits increase their activity throughout training and may encode learned motor performance (Joel and Weiner, 1997; Histed et al., 2009; Yin et al., 2009; Thorn et al., 2010; Gremel and Costa, 2013; Kim et al., 2013; Samejima and Doya, 2007; Graybiel, 2008; Yin et al., 2008; Ashby et al., 2010; Redgrave et al., 2010; Ito and Doya, 2015; Lehéricy et al., 2005; Thorn and Graybiel, 2014; Atallah et al., 2007; Nakahara et al., 2001; Parent and Hazrati, 1995). Such evidence has suggested that associative loops function to evaluate motor performance during early stages of learning and sensorimotor loops encode behaviors as they become more habitual (Makino et al., 2016).

The overall patterns of neural activity we observed in shell and core are consistent with recent data showing that both associative and sensorimotor cortico-striatal circuits are engaged in skill acquisition, but that associative circuits disengage early in learning whereas sensorimotor circuits remain engaged (Kupferschmidt et al., 2017). The decline in spiking variability in shell neurons is consistent with a decreased involvement of associative circuits during early skill acquisition (Thorn et al., 2010; Yin et al., 2009). In addition, although variability of firing rate did not change developmentally in core neurons, firing rate CV was higher for syllable renditions with low tutor similarity in core neurons when averaged across all ages (Figure 6). This pattern suggests that core neurons may retain an exploratory (variable) mode (Graybiel, 2005), maintaining a higher CV for poorly-matched syllable renditions and a lower CV for well-matched renditions (Figure 6). Such an outcome would be consistent with the idea that core neurons retain some level of engagement at later stages of sensorimotor learning. Overall, the present results support the idea that motor learning across taxa entails the integrative product of multiple circuits whose functions reflect specific aspects during the progression of learning.

Materials and methods

Subjects

All procedures were performed in accordance with Protocol 9159 approved by the University of Southern California Animal Care and Use Committee and in accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. Ten juvenile male zebra finches (Taeniopygia guttata) were used, 43–60 days post hatch (dph) at the time of neural recordings (30 min recording sessions were collected across this age span on an average of 4 different days for each bird; thus each neuron represents a single 30 min recording session; see below). The period of sensorimotor integration begins in zebra finches when young males produce variable babbling sounds (subsong) starting at approximately 35 dph, and continues until the birds begin to produce a stereotyped imitation of the tutor song around 70–90 dph. Birds were raised under naturalistic conditions (by their parents within group aviaries) until they were at least 38 dph, ensuring normal tutor song exposure and social experience (Böhner, 1990; Immelmann, 1969; Roper and Zann, 2006; Böhner, 1983; Eales, 1985; Clayton, 1987; Mann and Slater, 1995; Chen et al., 2016).

Electrophysiology

Birds were anesthetized with isoflurane (1.5% inhalation) and an electrode assembly consisting of seven tungsten-wire stereotrodes was implanted into LMAN core and shell; each stereotrode consisted of two twisted-pair polyesteramide-insulated and imide-overcoated tungsten wires (diameter 25 μm, California Fine Wire Company, Grover Beach, CA) routed through fused silica capillary tubing. One stereotrode was implanted 1.5 mm dorsal to LMAN to serve as a reference electrode, and silver wire (diameter 250 µm) was placed between the skull and skin for the animal ground. Signals were acquired through a unity gain headstage (Neuralynx, Bozeman, MT) with a flexible cable that connected the electrode assembly to a commutator (Neuralynx) and two 8-channel amplifiers (Lynx8, Neuralynx). Vocalizations were recorded through a microphone mounted in the cage (Sanken COS-11D) and saved coincident with neural activity (band passed 300–6000 Hz; digitized at 32 kHz; Spike2 software, Cambridge Electronic Design, UK). At the end of each experiment, small electrolytic lesions (7 µA for 20 s) were made to confirm recording locations. To verify the borders of LMAN core and shell, 50-µm-thick coronal sections were Nissl-stained or stained with a monoclonal antibody against calbindin-D-28K (Sigma-Aldrich Cat# C9848, RRID:AB_476894) using standard immunohistochemical procedures; calbindin expression specifically labels thalamic axons terminating in LMAN (Pinaud et al., 2007; Achiro and Bottjer, 2013) (Figure 1). The border between core and shell subregions of LMAN can be distinguished based on the density of magnocellular somata, and the outer borders of shell can be distinguished from surrounding regions based on the limit of calbindin expression (Figure 1). Recording sites were considered for analysis if they were confirmed histologically to be in either core or shell, excluding 50 µm on either side of the core-shell border (see Achiro and Bottjer, 2013).

Single units were isolated offline from stereotrode recordings made during each 30 min session using KlustaKwik (Ken Harris, Rutgers University) for automatic clustering as described previously (Achiro and Bottjer, 2013), and refined manually using MClust (A. David Redish, University of Minnesota) in MATLAB (Mathworks, Natick, MA). Single units were included if the signal-to-noise ratio was >3 and if less than 1% of spikes had an interspike interval <2 ms (n = 127 neurons from core, 171 neurons from shell). To compute an appropriate sample size, we used preliminary electrophysiological data to measure average response strength that was different from zero during singing (including all neurons regardless of whether they showed a significant response during singing by t-test, see below). One-sample, two-tailed power analysis indicated we would need a sample size of 97 core neurons and 45 shell neurons in order to determine that neurons in each subregion showed a significant response during singing at 90% power with a 0.05 two-sided significance level:

$$n = \left(\sigma\,\frac{z_{1-\alpha/2} + z_{1-\beta}}{\mu - \mu_0}\right)^{2}, \qquad z = \frac{\mu - \mu_0}{\sigma/\sqrt{n}}$$

where n is sample size, µ is mean, µ0 is 0, σ is standard deviation, α is Type I error and 1 – β is power.
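As a worked illustration of this kind of power analysis, the sketch below evaluates the standard normal-approximation formula for a one-sample, two-tailed test: the sample size is the square of σ times the sum of the two normal quantiles (for 1 − α/2 and for the power) divided by the difference between the mean and µ0. The function name and the example mean and standard deviation are hypothetical; the preliminary values that gave 97 core and 45 shell neurons are not reported here, so the sketch does not attempt to reproduce them.

```python
import math
from statistics import NormalDist

def one_sample_n(mu, sigma, mu0=0.0, alpha=0.05, power=0.90):
    """Smallest sample size for a one-sample, two-tailed z-test to detect
    a mean of mu against mu0, given standard deviation sigma."""
    if mu == mu0:
        raise ValueError("mu must differ from mu0")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for Type I error
    z_beta = NormalDist().inv_cdf(power)           # quantile for power (1 - beta)
    n = (sigma * (z_alpha + z_beta) / (mu - mu0)) ** 2
    return math.ceil(n)

# Hypothetical effect sizes: mean response strength 1.0 or 2.0, SD 3.0.
print(one_sample_n(1.0, 3.0))
print(one_sample_n(2.0, 3.0))
```

Doubling the expected mean response cuts the required sample size to roughly a quarter, which is one reason the two subregions can call for different numbers of neurons.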

Analysis of neural activity

Analyses of neural activity were based on all neurons from the total of all daily 30-min recording sessions across birds and days (each neuron was recorded for 30 min on 1 day). Baseline periods were defined as periods of silence (non-singing) lasting at least 2 s that were 2 s or more away from singing, calls or movement/cage noise. A neuron was considered responsive if the average firing rate (spikes/s) during singing showed a significant change from the average firing rate during baseline (independent t-test, due to differing number of baseline and singing episodes, p<0.05). Neurons in both core and shell showed significant modulation of firing rate during singing and/or the 50 ms interval prior to syllable onsets compared with quiet baseline periods (core: 81%, 103/127 neurons; shell: 82%, 141/171 neurons).

Analysis of pre-singing related activity was restricted to cells that showed a significant increase in average firing rate between baseline periods and the 50 ms prior to onsets of all syllables (independent t-tests, p<0.05). The proportions of core versus shell neurons showing an increase in average firing rate during this 50 ms period did not differ [core 48% (61/127 neurons), shell 40% (68/171 neurons); chi-square test = 2.27, p=0.13]. To analyze the temporal pattern of firing leading up to syllable onsets across these neurons, histograms of population activity were made by calculating the mean-subtracted firing rate for each neuron: the average firing rate was calculated during ±200 ms surrounding syllable onsets in bin sizes of 2 ms, and the mean firing rate over all bins was subtracted from the rate in each bin for each cell before smoothing with a Gaussian (40 ms smoothing); population functions were generated by averaging across neurons (Goldberg and Fee, 2012). To determine significance, we used the inverse Student's t cumulative distribution function to get a t-statistic for 95% probability. We calculated the two-tailed critical value of the t distribution for α = 0.05 with degrees of freedom equal to the number of neurons minus one; a bin of the population histogram was deemed significant if the interval given by the population histogram ± the critical value of the t distribution multiplied by the s.e.m. for that bin lay entirely below or entirely above zero.
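The population histogram described above can be sketched as follows. This is a minimal illustration, not the analysis code: it assumes that "40 ms smoothing" refers to the standard deviation of the Gaussian kernel, that each neuron's mean rate is subtracted from every bin, and that a bin counts as significant when the population mean ± the critical t value times the s.e.m. excludes zero. The critical value is passed in as a parameter (for example from a t table) because the Python standard library has no t distribution. All function names are hypothetical.

```python
import math

def binned_rate(spike_times, onsets, window=0.2, bin_size=0.002):
    """Average firing rate (spikes/s) in bins spanning +/- window (s) around onsets."""
    n_bins = int(round(2 * window / bin_size))
    counts = [0] * n_bins
    for onset in onsets:
        for t in spike_times:
            idx = math.floor((t - onset + window) / bin_size)
            if 0 <= idx < n_bins:
                counts[idx] += 1
    return [c / (len(onsets) * bin_size) for c in counts]

def gaussian_smooth(values, sigma_bins):
    """Gaussian smoothing truncated at 3 SD, renormalized at the edges."""
    radius = int(math.ceil(3 * sigma_bins))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        out.append(num / den)
    return out

def mean_subtracted_psth(spike_times, onsets, sigma_s=0.04, bin_size=0.002):
    """Onset-aligned rate with the neuron's mean rate removed, then smoothed."""
    rate = binned_rate(spike_times, onsets, bin_size=bin_size)
    mean_rate = sum(rate) / len(rate)
    centered = [r - mean_rate for r in rate]
    return gaussian_smooth(centered, sigma_s / bin_size)  # ASSUMPTION: 40 ms = kernel SD

def significant_bins(population, t_crit):
    """population: one PSTH per neuron. Flag bins where the population
    mean +/- t_crit * s.e.m. excludes zero."""
    n = len(population)
    flags = []
    for b in range(len(population[0])):
        column = [psth[b] for psth in population]
        m = sum(column) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in column) / (n - 1))
        flags.append(abs(m) > t_crit * sd / math.sqrt(n))
    return flags

# Hypothetical neuron that fires about 30 ms before each syllable onset.
onsets = [1.0, 2.0]
spikes = [o - 0.0301 for o in onsets]
psth = mean_subtracted_psth(spikes, onsets)
print(len(psth), psth.index(max(psth)))
```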

To compare changes in activity during singing across neurons with differing firing rates, responses for each neuron were calculated as standardized response strength:

$$\text{standardized response strength} = \frac{\bar{S} - \bar{B}}{\sqrt{\mathrm{Var}(S) + \mathrm{Var}(B) - 2\,\mathrm{Covar}(S,B)}}\,\sqrt{n}$$

where S is the firing rate during singing, B is the firing rate during local baseline periods (the average of the two baseline periods nearest in time to each singing episode), and n is the number of singing/baseline pairs. A positive value indicates an increased firing rate during singing, and a negative value indicates a decreased firing rate during singing compared to baseline periods. We refer to this standardized measure throughout the text as response strength. To measure the incidence of bursts during singing episodes and local baselines, we calculated a burst fraction by measuring the percentage of spike events with an interspike interval of less than 10 ms (Wilcoxon signed-rank tests, due to one average local baseline period per singing episode, p<0.05).
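Both measures can be sketched in a few lines. The sketch assumes that response strength has the paired-statistic form (mean singing-minus-baseline difference divided by the standard deviation of the paired differences, scaled by the square root of the number of pairs) and that the burst fraction is the share of interspike intervals shorter than 10 ms. The function names and example values are hypothetical.

```python
import math
import statistics

def standardized_response_strength(singing, baseline):
    """Paired, standardized difference in firing rate.

    singing[i]  : firing rate during singing episode i
    baseline[i] : firing rate during the local baseline paired with episode i
    """
    n = len(singing)
    if n != len(baseline) or n < 2:
        raise ValueError("need at least two matched singing/baseline pairs")
    mean_s = statistics.fmean(singing)
    mean_b = statistics.fmean(baseline)
    var_s = statistics.variance(singing)
    var_b = statistics.variance(baseline)
    covar = sum((s - mean_s) * (b - mean_b)
                for s, b in zip(singing, baseline)) / (n - 1)
    spread = var_s + var_b - 2 * covar  # variance of the paired differences
    if spread <= 0:
        raise ValueError("paired differences have no variability")
    # ASSUMPTION: scaling by sqrt(n), as in a paired test statistic.
    return (mean_s - mean_b) / math.sqrt(spread) * math.sqrt(n)

def burst_fraction(spike_times, max_isi=0.010):
    """Fraction of interspike intervals shorter than max_isi seconds."""
    times = sorted(spike_times)
    isis = [b - a for a, b in zip(times, times[1:])]
    if not isis:
        return 0.0
    return sum(isi < max_isi for isi in isis) / len(isis)

# Hypothetical rates (spikes/s) for four singing episodes and their baselines.
rs = standardized_response_strength([12.0, 15.0, 11.0, 14.0], [5.0, 6.0, 4.0, 7.0])
bf = burst_fraction([0.0, 0.005, 0.1, 0.104, 0.5])
print(rs, bf)
```

A positive result indicates higher firing during singing than during baseline, and swapping the two inputs flips the sign.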

Analysis of singing behavior

Because juvenile birds in early stages of song learning do not produce stereotyped syllable sequences (song motifs), episodes of singing were defined as periods of continuous singing separated by gaps of at least 300 ms. Episodes of singing and individual syllables contained within episodes were detected automatically using amplitude threshold crossings and checked manually to remove cage noise and to adjust syllable start or stop boundaries where needed.
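A minimal sketch of this segmentation is shown below, assuming a precomputed amplitude envelope sampled at a fixed interval and a fixed threshold. The function names and example values are hypothetical, and the manual checking step is not modeled.

```python
def detect_syllables(envelope, threshold, dt):
    """Return (onset, offset) times where the amplitude envelope exceeds
    the threshold. dt is the sampling interval of the envelope in seconds."""
    syllables = []
    start = None
    for i, amplitude in enumerate(envelope):
        if amplitude > threshold and start is None:
            start = i                               # upward threshold crossing
        elif amplitude <= threshold and start is not None:
            syllables.append((start * dt, i * dt))  # downward crossing
            start = None
    if start is not None:                           # sound continues to the end
        syllables.append((start * dt, len(envelope) * dt))
    return syllables

def group_into_episodes(syllables, min_gap=0.3):
    """Group time-ordered syllables into singing episodes; a silent gap of
    at least min_gap seconds starts a new episode."""
    episodes = []
    for onset, offset in syllables:
        if episodes and onset - episodes[-1][-1][1] < min_gap:
            episodes[-1].append((onset, offset))
        else:
            episodes.append([(onset, offset)])
    return episodes

# Hypothetical syllable boundaries (s): the 350 ms gap splits two episodes.
episodes = group_into_episodes([(0.0, 0.1), (0.15, 0.25), (0.6, 0.7)])
print(len(episodes), [len(e) for e in episodes])
```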

Classification of syllable types

Juvenile birds produce highly variable sequences of syllables, making it impossible to align the temporal pattern of neural activity across song motifs (Tchernichovski et al., 2001). However, one can align spiking activity of single neurons to multiple renditions of individual syllables of the same type. Because of the high variability of juvenile syllables, we created custom software in MATLAB using many features created for Sound Analysis Pro (Tchernichovski et al., 2000) to automatically classify syllables in juvenile birds, available online (Shen, 2017; https://github.com/BottjerLab/Acoustic_Similarity; a copy is archived at https://github.com/elifesciences-publications/Acoustic_Similarity). To assign syllables to different types, we employed a combination of two measures of the acoustic distance between syllables that was then used to cluster syllables.

The first distance measure was based on summary statistics of the following syllable features: amplitude modulation, frequency modulation, center frequency, fundamental frequency, length, maximum frequency modulation within each frequency, maximum amplitude modulation across each frequency’s power, pitch goodness (estimate of the amount of periodic energy), total power, derivative of total power, and Wiener entropy (estimate of spectral disorder). We calculated the following summary statistics for each feature (except length) over the duration of each syllable: mean, standard deviation, maximum, minimum, onset (average of samples 1–3%), middle (average of samples 49–51%), offset (average of samples 97–100%), correlation coefficient of the linear trend, time of maximum, and time of minimum. Each syllable was thus represented as a point in high-dimensional space where each dimension was a summary statistic of one of the features (n = 101 feature values total). We then calculated the Euclidean distances between each pair of points.
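The construction of the 101-value vector (ten summary statistics for each of ten time-varying features, plus syllable length) can be sketched as below. This is an illustration under assumptions: feature traces are taken as already computed, the onset, middle and offset windows are mapped to the first 3%, the 49–51% span and the last 3% of samples, times of maximum and minimum are expressed as a fraction of syllable duration, and no feature scaling is applied. The function names are hypothetical.

```python
import math
import statistics

def _window_mean(trace, lo_pct, hi_pct):
    """Mean of the samples between lo_pct and hi_pct percent of the trace."""
    n = len(trace)
    start = lo_pct * n // 100
    stop = max(start + 1, -(-hi_pct * n // 100))  # ceiling division
    return statistics.fmean(trace[start:stop])

def _trend_correlation(trace):
    """Pearson correlation between sample index and value (linear trend)."""
    n = len(trace)
    mean_t = (n - 1) / 2
    mean_x = statistics.fmean(trace)
    sxy = sum((i - mean_t) * (x - mean_x) for i, x in enumerate(trace))
    sxx = sum((i - mean_t) ** 2 for i in range(n))
    syy = sum((x - mean_x) ** 2 for x in trace)
    return 0.0 if syy == 0 else sxy / math.sqrt(sxx * syy)

def summarize_feature(trace):
    """Ten summary statistics of one feature trace over a syllable."""
    trace = list(trace)
    n = len(trace)
    if n < 2:
        raise ValueError("need at least two samples")
    return [
        statistics.fmean(trace),            # mean
        statistics.pstdev(trace),           # standard deviation
        max(trace),                         # maximum
        min(trace),                         # minimum
        _window_mean(trace, 0, 3),          # onset
        _window_mean(trace, 49, 51),        # middle
        _window_mean(trace, 97, 100),       # offset
        _trend_correlation(trace),          # linear trend
        trace.index(max(trace)) / (n - 1),  # time of maximum (0..1)
        trace.index(min(trace)) / (n - 1),  # time of minimum (0..1)
    ]

def syllable_vector(features, length):
    """features: dict mapping feature name to its trace; length: duration."""
    vector = []
    for name in sorted(features):
        vector.extend(summarize_feature(features[name]))
    vector.append(length)
    return vector

# Hypothetical syllable with ten linearly rising feature traces.
ramp_stats = summarize_feature([float(i) for i in range(100)])
ten_features = {f"feature_{k}": [float(i + k) for i in range(50)] for k in range(10)}
vec = syllable_vector(ten_features, 0.12)
print(len(vec), math.dist(vec, vec))
```

The Euclidean distance between two syllables is then `math.dist` applied to their vectors.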

The second distance measure was based on time-varying changes in song features; we calculated the following syllable features for each time point in the syllable (9.27 ms window size, 7.91 ms overlap): Wiener entropy, frequency modulation, amplitude modulation, fundamental frequency and goodness of pitch (Tchernichovski et al., 2000) (Figure 10A). Syllables were represented as multi-dimensional feature vectors in time. The distance between the vectors was calculated with a dynamic time-warping algorithm including a warping penalty (Vintsyuk, 1972). The time-warping algorithm tolerates small perturbations in the timing of sub-syllabic events common to juvenile syllables.
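A minimal dynamic time-warping sketch is given below. It assumes a Euclidean local cost between feature frames and an additive penalty for every non-diagonal (warping) step; the form and size of the penalty used in the published software may differ, and the function name is hypothetical.

```python
import math

def dtw_distance(a, b, warp_penalty=0.0):
    """Dynamic time-warping distance between two syllables.

    a, b         : sequences of feature vectors, one vector per time frame
    warp_penalty : extra cost added for each step that stretches one
                   syllable relative to the other (ASSUMED additive form)
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both syllables need at least one frame")
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],             # advance both (no warping)
                cost[i - 1][j] + warp_penalty,  # repeat a frame of b
                cost[i][j - 1] + warp_penalty,  # repeat a frame of a
            )
    return cost[n][m]

# Hypothetical one-dimensional feature tracks: b holds the middle frame
# one step longer than a.
a = [(0.0,), (1.0,), (2.0,)]
b = [(0.0,), (1.0,), (1.0,), (2.0,)]
print(dtw_distance(a, b), dtw_distance(a, b, warp_penalty=0.5))
```

With no penalty the two tracks align perfectly; with a penalty the single required warping step is charged, so small timing differences are tolerated but not free.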

Figure 10. Multiple acoustic features were used to cluster syllable types.

(A) Examples of the features used to calculate acoustic similarity in order to assign juvenile syllable renditions to types (clusters). Top row shows spectrograms for five renditions of a syllable type labeled e from a 59 dph bird (also shown in B). Below are plots of three features calculated across time for each syllable; a total of 11 features were used to generate an acoustic distance score for clustering of syllables into types (see Materials and methods). (B) Spectrograms for the five syllable types from this bird, a-e, resulting from automatic clustering of syllables.


Figure 10—figure supplement 1. Examples of syllable renditions with either high or low acoustic similarity to the closest-matching tutor syllable.


The tutor syllable was selected by finding the closest acoustic distance between each juvenile syllable rendition and the tutor syllables for that bird (see Materials and methods).

In order to normalize the measures, we converted each distance measure to percentiles based on the empirical distribution (computed over all pairs of syllables). To obtain a combined distance measure we calculated the geometric mean of the two percentile distance measures for each syllable pair. We then generated a matrix consisting of the combined distance measure for each syllable to all other syllables. A final distance score was calculated as a dissimilarity index by taking one minus the correlation between points of the combined distance measure matrix, and ranged between 0 (perfect similarity) and 2 (no similarity; i.e., if the syllables were completely anticorrelated, the correlation would be −1 and the distance would be 1 − (−1) = 2). Syllables were clustered into 4–25 types by hierarchical agglomerative clustering using the dissimilarity index as the distance metric and complete linkage as the linkage criterion (Figure 10B). We then manually selected the number of types for each bird for each day, confirmed cluster quality, and merged clusters that were similar. Some clusters were rejected due to high variability; thus not all syllables were clustered into types, especially those from recordings of birds producing more immature vocalizations. The total percent of classified syllables (assigned to clusters) was 49% overall (across song recordings for all birds and ages); 38% of syllables were classified from song recordings in early stages of development (representing the bottom half of song recordings by goodness-of-fit coefficients of exponential fit to syllable duration distributions, see ‘Song development’ below) and 57% of syllables were classified for song recordings in later stages of song development (representing the top half of song recordings by goodness-of-fit coefficient).
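The normalization and combination steps can be sketched as follows. The sketch assumes that percentiles are taken over all distinct syllable pairs, and that "the correlation between points" means the Pearson correlation between the rows of the combined matrix (each row being one syllable's distances to all syllables). Function names and the toy matrix are hypothetical; the clustering step itself is not shown and would typically rely on a library routine for complete-linkage agglomerative clustering.

```python
import math
from bisect import bisect_right

def to_percentiles(dist):
    """Replace each off-diagonal distance by its percentile (0..1) within
    the empirical distribution over all syllable pairs."""
    n = len(dist)
    pairs = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i][j] = bisect_right(pairs, dist[i][j]) / len(pairs)
    return out

def combine(p1, p2):
    """Geometric mean of two percentile matrices, entry by entry."""
    n = len(p1)
    return [[math.sqrt(p1[i][j] * p2[i][j]) for j in range(n)] for i in range(n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def correlation_dissimilarity(combined):
    """1 - correlation between rows: 0 = perfect similarity, 2 = none."""
    n = len(combined)
    return [[0.0 if i == j else 1.0 - pearson(combined[i], combined[j])
             for j in range(n)] for i in range(n)]

# Hypothetical distances for four syllables forming two tight pairs
# (0 with 1, and 2 with 3); the same matrix stands in for both measures.
d = [[0, 1, 9, 10],
     [1, 0, 8, 9],
     [9, 8, 0, 1],
     [10, 9, 1, 0]]
p = to_percentiles(d)
score = correlation_dissimilarity(combine(p, p))
print(score[0][1], score[0][3])
```

In this toy case syllables 0 and 1 have similar distance profiles and receive a score below 1, whereas syllables 0 and 3 have opposite profiles and receive a score above 1.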

We calculated firing rates during each syllable produced and the 50 ms prior to syllable onsets in order to include pre-syllable-related activity; this syllable-based firing rate was used in analyses of neural selectivity for syllable types and similarity to tutor song (see below). The average gap between syllables in juvenile birds is ~60 ms (Glaze and Troyer, 2013; Aronov et al., 2011), making it unlikely that activity during a previous syllable was included in these firing rates. In order to align syllables to construct PSTHs of spiking activity, we linearly time-warped each spike train to the average length for that syllable type following the procedures of Kao et al. (2008). We calculated the average duration of all syllables within each cluster and used that value as the ‘reference duration’. Then, each syllable in the cluster was linearly stretched or compressed (syllables were aligned at onsets) and spike trains were projected onto the time-warped axis for each syllable. To determine selectivity of responses to specific syllable types, we calculated a sparseness/activity fraction (AF) (Meliza and Margoliash, 2012; Vinje and Gallant, 2000) for all neurons which responded significantly to at least one syllable type:

$$AF = \frac{1 - \left[ \left( \sum_{i} r_i / n \right)^2 \Big/ \sum_{i} \left( r_i^2 / n \right) \right]}{1 - 1/n}$$

where r_i is the firing rate to the ith syllable type and n is the number of syllable types. Thus a score of 0 indicates no selectivity for a specific syllable type (equal firing rates across all syllable types) and a score of 1 indicates maximum selectivity (change in firing rate during only one syllable type).
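As a rough illustration of the syllable-based firing rate, the linear time warping, and the activity fraction described in this section, consider the following Python sketch. The function names, the use of seconds as the time unit, and the handling of window edges are assumptions made for the example; the activity fraction follows the standard Vinje and Gallant (2000) form.

```python
def syllable_firing_rate(spike_times, onset, offset, pre_window=0.050):
    """Spikes per second from 50 ms before syllable onset to syllable offset."""
    start = onset - pre_window
    count = sum(1 for t in spike_times if start <= t < offset)
    return count / (offset - start)

def warp_spike_times(spike_times, onset, offset, reference_duration):
    """Linearly stretch/compress one syllable so it spans reference_duration.

    Spike times are returned relative to syllable onset (onset-aligned).
    """
    scale = reference_duration / (offset - onset)
    return [(t - onset) * scale for t in spike_times if onset <= t <= offset]

def activity_fraction(rates):
    """AF = (1 - (sum(r)/n)^2 / (sum(r^2)/n)) / (1 - 1/n).

    Undefined when all rates are zero (assumed excluded, since only neurons
    with a significant response to at least one syllable type are analyzed).
    """
    n = len(rates)
    squared_mean = (sum(rates) / n) ** 2
    mean_square = sum(r * r for r in rates) / n
    return (1 - squared_mean / mean_square) / (1 - 1 / n)
```

For example, equal rates across four syllable types give an AF of 0, whereas a response to only one of four types gives an AF of 1.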

Similarity to tutor song

Analyses of similarity to tutor song included all neurons for which at least 40 classified syllables (syllables which were able to be clustered into a given type) were produced during each neuron’s 30-min recording period. We used only classified syllable renditions for the tutor similarity analyses in order to use the same set of neurons and syllables for an analysis of prototypicality as a control (see below). We tested whether using only classified syllables influenced the tutor similarity results by examining the outcome using all syllables (both classified and not classified) and observed the same trends (data not shown); thus, use of classified syllables did not bias the results. These analyses included both singing-excited and singing-suppressed neurons, as well as neurons that did not show a significant change in firing rate during singing episodes (the latter cases ensured inclusion of cells that showed excitation during syllables with high similarity and suppression during syllables with low similarity, for example). To evaluate the similarity between juvenile syllables and tutor syllables we employed the final combined distance score described above. Similarity to tutor was calculated as the acoustic distance score between each syllable and its closest tutor syllable. We defined what constituted the closest tutor syllable in two ways: either as the tutor syllable which was closest in distance to the center of the syllable cluster to which each rendition belonged, or as the tutor syllable which was simply closest in distance to each rendition. These two approaches yielded highly similar results, and we therefore chose the latter method to reduce any effects of the syllable clustering routine on tutor similarity calculations. 
All syllable renditions produced during each neuronal recording were ranked by tutor similarity; for many analyses, a median split was employed in which the ranked juvenile syllables for each neuron were divided into the top 50% versus bottom 50% of tutor similarity – these two categories are referred to as high and low similarity, respectively (Figure 10—figure supplement 1).
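The per-rendition tutor similarity and the median split can be sketched as follows. This is a minimal illustration with hypothetical names; it assumes that a smaller acoustic distance means higher similarity and that, with an odd number of renditions, the extra rendition falls into the low-similarity half.

```python
def closest_tutor_distance(distances_to_tutor_syllables):
    """Distance from one juvenile rendition to its closest tutor syllable."""
    return min(distances_to_tutor_syllables)

def median_split(tutor_distances):
    """Rank renditions by tutor distance and split them into two halves.

    Returns (high_similarity_indices, low_similarity_indices); smaller
    distance is treated as higher similarity.
    """
    order = sorted(range(len(tutor_distances)), key=lambda i: tutor_distances[i])
    half = len(order) // 2
    return order[:half], order[half:]
```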

For each neuron, we calculated the linear regression between baseline-corrected firing rate during each syllable rendition and similarity to closest tutor syllable (Figure 4B). We used repeated permutation tests to estimate the fraction of neurons with significant correlations (O'Connor et al., 2010): for each neuron, we generated 1000 permutations by randomly shuffling the relationship between firing rate and tutor similarity and calculating the resultant r values. This provided a distribution of values under the null hypothesis of no relationship for each test. The fraction of neurons exceeding chance for each permutation test was computed as the fraction of actual r values falling above the 97.5th percentile (positive correlations) or below the 2.5th percentile (negative correlations) of the null distribution, and the average of this significant fraction across all permutation tests was taken as the fraction of neurons with significant r values.
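A simplified, single-neuron version of this permutation logic might look like the sketch below. It is not the authors' implementation: the function names, the fixed random seed, and the way percentile cut-offs are indexed are assumptions, and it does not reproduce the averaging of significant fractions across repeated tests.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_test(rates, similarity, n_perm=1000, seed=0):
    """Compare the observed r with a shuffled null distribution.

    Returns (observed_r, is_significant), where significance means the
    observed r lies above the 97.5th or below the 2.5th percentile of
    the null distribution.
    """
    rng = random.Random(seed)
    observed = pearson_r(rates, similarity)
    shuffled = list(similarity)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the rate/similarity pairing
        null.append(pearson_r(rates, shuffled))
    null.sort()
    lower = null[round(0.025 * (n_perm - 1))]
    upper = null[round(0.975 * (n_perm - 1))]
    return observed, (observed > upper or observed < lower)
```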

To assess the association of baseline-corrected firing rates with tutor similarity at the population level, we employed a mixed-effects linear regression model of baseline-corrected firing rates across all neurons, with fixed and random effects for tutor similarity nested within a random intercept for neurons, using an unstructured covariance matrix; because firing rate was measured at multiple tutor similarities within each neuron, our model fit a random intercept for neurons, with tutor similarity treated as both a fixed and random slope.
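One conventional way to write a model of this kind is given below. The notation is an assumption introduced here for illustration (the source does not give an equation): y_{ij} is the baseline-corrected firing rate for rendition i of neuron j, s_{ij} is the tutor similarity of that rendition, and Gaussian random effects and residuals are assumed.

```latex
y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, s_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim \mathcal{N}\!\left( \mathbf{0},\;
\begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix} \right),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here \beta_1 is the population-level (fixed) slope for tutor similarity, u_{0j} and u_{1j} are the neuron-specific random intercept and slope, and the freely estimated covariance \sigma_{01} corresponds to the unstructured covariance matrix.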

To examine if firing rate variability was modulated by similarity to tutor song, we calculated the CV of firing rate during production of syllables representing high and low similarity to corresponding tutor syllables (top and bottom 50% of syllables ranked by tutor similarity). Neurons were included in these analyses if at least 40 classified syllables were produced during the 30-min recording period and if the mean firing rate was non-zero for the most/least prototypical and most/least similar to tutor song syllables (because neurons with mean firing rates of zero would give undefined CV values).
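The CV comparison can be illustrated with a short Python sketch. The function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def coefficient_of_variation(rates):
    """CV = standard deviation / mean of per-rendition firing rates.

    Returns None when the mean is zero, because the CV is then undefined
    (such neurons were excluded from the analysis).
    """
    m = mean(rates)
    if m == 0:
        return None
    return stdev(rates) / m
```

In use, this would be computed separately for the high-similarity and low-similarity halves of each neuron's syllable renditions and the two values compared.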

Syllable prototypicality

We calculated a prototypicality score for each syllable based on Niziolek et al. (2013), which measures whether renditions are similar to the center of that syllable’s distribution (more prototypical) or less similar to the center of the distribution (less prototypical). We computed the acoustic distance between each syllable rendition and the center of the syllable cluster to which it belonged, again employing the combined distance score described above. This measure served as a control for effects based on tutor similarity to assess whether correlations based on firing rate reflected prototypical utterances; we used the same mixed-effects regression model as above to assess a relationship between baseline-corrected firing rates and prototypicality. As indicated above, this analysis also included all neurons for which at least 40 classified syllables were produced during individual recording periods; in this way analyses of tutor similarity and prototypicality included the same set of neurons and syllables.

Song development

In order to assess how developed each bird’s song was for each day, we utilized methods previously described to define song stage (Aronov et al., 2011). Juvenile birds in the subsong stage of sensorimotor integration produce syllables of variable lengths, the distribution of which is well-fitted by an exponential function. As birds progress to the plastic song phase, they produce more regular syllable types which begin to appear as peaks in the distribution of syllable durations, and therefore are no longer well-fit by an exponential (Aronov et al., 2011; Tchernichovski et al., 2004). Based on this evidence, we fit an exponential to all syllable duration distributions for each bird, for each day of singing (500–8,000 syllables). We used the Lilliefors test (MATLAB) to quantify goodness-of-fit, and the resulting coefficient was scaled by the number of syllables. Therefore, smaller goodness-of-fit coefficients indicate less developed songs (i.e. subsong) and larger coefficients indicate more developed songs (i.e. plastic song). Response strength, CV, and r values (correlations between baseline-corrected firing rate and tutor similarity) for all neurons across all classified syllables (syllables assigned to clusters) produced in each recording session were analyzed as a function of song development using goodness-of-fit coefficients.
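The idea behind the goodness-of-fit coefficient can be sketched as a Kolmogorov-Smirnov-type statistic comparing syllable durations with an exponential distribution whose mean is estimated from the data, which is the quantity underlying a Lilliefors test. This is an assumption-laden illustration, not the MATLAB routine used in the study: the function name is hypothetical, and the scaling by the number of syllables is omitted because the text does not specify it.

```python
import math

def exponential_ks_statistic(durations):
    """Largest gap between the empirical CDF and a fitted exponential CDF.

    The exponential mean is estimated from the sample. Larger values mean
    the durations are less well described by an exponential, which would
    correspond to more developed (plastic) song.
    """
    n = len(durations)
    mean_duration = sum(durations) / n
    ordered = sorted(durations)
    statistic = 0.0
    for i, x in enumerate(ordered):
        model_cdf = 1.0 - math.exp(-x / mean_duration)
        # compare the model with the empirical CDF just after and just before x
        statistic = max(statistic, (i + 1) / n - model_cdf, model_cdf - i / n)
    return statistic
```

Durations drawn from an exponential-like distribution yield a small statistic, whereas durations clustered around a preferred syllable length yield a large one.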

Statistics

Kolmogorov-Smirnov and Shapiro-Wilk tests were used to test for normality; t-tests were used to compare means for normally distributed data, and Mann-Whitney tests were used for non-normally distributed data. Differences in proportions were tested using chi-square tests. Correlations were performed using Pearson’s correlation. The significance of mean-subtracted bins for pre-singing activity (Figure 3) was calculated as the 95% confidence interval outside of zero and is described above. Specific statistical tests used are identified in context in the Results.

Acknowledgements

This work was supported by NINDS grants NS 037547 and NS 087506, NIDCD Training Grant DC 009975, and NINDS Training Fellowship NS 073323. The authors declare no competing financial interests. We thank Rachel Yuan for comments on the manuscript, Arthur Shau for technical assistance, and Nicholas Jackson for expert statistical advice.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Sarah W Bottjer, Email: sarahbottjer@gmail.com.

Ronald L Calabrese, Emory University, United States.

Funding Information

This paper was supported by the following grants:

  • NIH Office of the Director Research grant NS087506 to Sarah Bottjer.

  • NIH Office of the Director Training grant DC009975 to Sarah Bottjer.

  • NIH Office of the Director Training fellowship NS 073323 to Jennifer M Achiro.

  • NIH Office of the Director Research grant 037547 to Sarah Bottjer.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Formal analysis, Validation, Visualization, Writing—original draft.

Software.

Conceptualization, Supervision, Funding acquisition, Project administration, Writing—review and editing.

Ethics

Animal experimentation: All procedures were performed in accordance with Protocol #9159 approved by the University of Southern California Animal Care and Use Committee and in accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health.

Additional files

Transparent reporting form
DOI: 10.7554/eLife.26973.022

References

  1. Achiro JM, Bottjer SW. Neural representation of a target auditory memory in a cortico-basal ganglia pathway. Journal of Neuroscience. 2013;33:14475–14488. doi: 10.1523/JNEUROSCI.0710-13.2013.
  2. Alexander GE, Crutcher MD. Functional architecture of basal ganglia circuits: neural substrates of parallel processing. Trends in Neurosciences. 1990;13:266–271. doi: 10.1016/0166-2236(90)90107-L.
  3. Aronov D, Andalman AS, Fee MS. A specialized forebrain circuit for vocal babbling in the juvenile songbird. Science. 2008;320:630–634. doi: 10.1126/science.1155140.
  4. Aronov D, Veit L, Goldberg JH, Fee MS. Two distinct modes of forebrain circuit dynamics underlie temporal patterning in the vocalizations of young songbirds. Journal of Neuroscience. 2011;31:16353–16368. doi: 10.1523/JNEUROSCI.3009-11.2011.
  5. Ashby FG, Turner BO, Horvitz JC. Cortical and basal ganglia contributions to habit learning and automaticity. Trends in Cognitive Sciences. 2010;14:208–215. doi: 10.1016/j.tics.2010.02.001.
  6. Atallah HE, Lopez-Paniagua D, Rudy JW, O'Reilly RC. Separate neural substrates for skill learning and performance in the ventral and dorsal striatum. Nature Neuroscience. 2007;10:126–131. doi: 10.1038/nn1817.
  7. Barto AG, Sutton RS, Anderson CW. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics. 1983;SMC-13:834–846. doi: 10.1109/TSMC.1983.6313077.
  8. Bottjer SW, Alderete TL, Chang D. Conjunction of vocal production and perception regulates expression of the immediate early gene ZENK in a novel cortical region of songbirds. Journal of Neurophysiology. 2010;103:1833–1842. doi: 10.1152/jn.00869.2009.
  9. Bottjer SW, Altenau B. Parallel pathways for vocal learning in basal ganglia of songbirds. Nature Neuroscience. 2010;13:153–155. doi: 10.1038/nn.2472.
  10. Bottjer SW, Brady JD, Cribbs B. Connections of a motor cortical region in zebra finches: relation to pathways for vocal learning. The Journal of Comparative Neurology. 2000;420:244–260. doi: 10.1002/(SICI)1096-9861(20000501)420:2<244::AID-CNE7>3.0.CO;2-M.
  11. Bottjer SW, Halsema KA, Brown SA, Miesner EA. Axonal connections of a forebrain nucleus involved with vocal learning in zebra finches. The Journal of Comparative Neurology. 1989;279:312–326. doi: 10.1002/cne.902790211.
  12. Bottjer SW, Miesner EA, Arnold AP. Forebrain lesions disrupt development but not maintenance of song in passerine birds. Science. 1984;224:901–903. doi: 10.1126/science.6719123.
  13. Bottjer SW. Developmental regulation of basal ganglia circuitry during the sensitive period for vocal learning in songbirds. Annals of the New York Academy of Sciences. 2004;1016:395–415. doi: 10.1196/annals.1298.037.
  14. Bottjer SW. Silent synapses in a thalamo-cortical circuit necessary for song learning in zebra finches. Journal of Neurophysiology. 2005;94:3698–3707. doi: 10.1152/jn.00282.2005.
  15. Böhner J. Song learning in the zebra finch (Taeniopygia guttata): Selectivity in the choice of a tutor and accuracy of song copies. Animal Behaviour. 1983;31:231–237. doi: 10.1016/S0003-3472(83)80193-6.
  16. Böhner J. Early acquisition of song in the zebra finch, Taeniopygia guttata. Animal Behaviour. 1990;39:369–374. doi: 10.1016/S0003-3472(05)80883-8.
  17. Chen Y, Matheson LE, Sakata JT. Mechanisms underlying the social enhancement of vocal learning in songbirds. PNAS. 2016;113:6641–6646. doi: 10.1073/pnas.1522306113.
  18. Clayton NS. Song tutor choice in zebra finches. Animal Behaviour. 1987;35:714–721. doi: 10.1016/S0003-3472(87)80107-0.
  19. Doupe AJ, Solis MM. Song- and order-selective neurons develop in the songbird anterior forebrain during vocal learning. Journal of Neurobiology. 1997;33:694–709. doi: 10.1002/(SICI)1097-4695(19971105)33:5<694::AID-NEU13>3.0.CO;2-9.
  20. Doupe AJ. Song- and order-selective neurons in the songbird anterior forebrain and their emergence during vocal development. Journal of Neuroscience. 1997;17:1147–1167. doi: 10.1523/JNEUROSCI.17-03-01147.1997.
  21. Eales LA. Song learning in zebra finches: some effects of song model availability on what is learnt and when. Animal Behaviour. 1985;33:1293–1300. doi: 10.1016/S0003-3472(85)80189-5.
  22. Elliott KC, Wu W, Bertram R, Johnson F. Disconnection of a basal ganglia circuit in juvenile songbirds attenuates the spectral differentiation of song syllables. Developmental Neurobiology. 2014;74:574–590. doi: 10.1002/dneu.22151.
  23. Espinosa JS, Stryker MP. Development and plasticity of the primary visual cortex. Neuron. 2012;75:230–249. doi: 10.1016/j.neuron.2012.06.009.
  24. Gale SD, Person AL, Perkel DJ. A novel basal ganglia pathway forms a loop linking a vocal learning circuit with its dopaminergic input. The Journal of Comparative Neurology. 2008;508:824–839. doi: 10.1002/cne.21700.
  25. Garst-Orozco J, Babadi B, Ölveczky BP. A neural circuit mechanism for regulating vocal variability during song learning in zebra finches. eLife. 2014;3:e03697. doi: 10.7554/eLife.03697.
  26. Glaze CM, Troyer TW. Development of temporal structure in zebra finch song. Journal of Neurophysiology. 2013;109:1025–1035. doi: 10.1152/jn.00578.2012.
  27. Goldberg JH, Fee MS. A cortical motor nucleus drives the basal ganglia-recipient thalamus in singing birds. Nature Neuroscience. 2012;15:620–627. doi: 10.1038/nn.3047.
  28. Graybiel AM. The basal ganglia: learning new tricks and loving it. Current Opinion in Neurobiology. 2005;15:638–644. doi: 10.1016/j.conb.2005.10.006.
  29. Graybiel AM. Habits, rituals, and the evaluative brain. Annual Review of Neuroscience. 2008;31:359–387. doi: 10.1146/annurev.neuro.29.051605.112851.
  30. Gremel CM, Costa RM. Orbitofrontal and striatal circuits dynamically encode the shift between goal-directed and habitual actions. Nature Communications. 2013;4:2264. doi: 10.1038/ncomms3264.
  31. Histed MH, Pasupathy A, Miller EK. Learning substrates in the primate prefrontal cortex and striatum: sustained activity related to successful actions. Neuron. 2009;63:244–253. doi: 10.1016/j.neuron.2009.06.019.
  32. Huang X, Stodieck SK, Goetze B, Cui L, Wong MH, Wenzel C, Hosang L, Dong Y, Löwel S, Schlüter OM. Progressive maturation of silent synapses governs the duration of a critical period. PNAS. 2015;112:E3131–E3140. doi: 10.1073/pnas.1506488112.
  33. Immelmann K. Song development in the zebra finch and other estrildid finches. In: Hinde R. A, editor. Bird Vocalizations. Cambridge University Press; 1969. pp. 61–74.
  34. Ito M, Doya K. Distinct neural representation in the dorsolateral, dorsomedial, and ventral parts of the striatum during fixed- and free-choice tasks. Journal of Neuroscience. 2015;35:3499–3514. doi: 10.1523/JNEUROSCI.1962-14.2015.
  35. Iyengar S, Bottjer SW. Development of individual axon arbors in a thalamocortical circuit necessary for song learning in zebra finches. Journal of Neuroscience. 2002a;22:901–911. doi: 10.1523/JNEUROSCI.22-03-00901.2002.
  36. Iyengar S, Bottjer SW. The role of auditory experience in the formation of neural circuits underlying vocal learning in zebra finches. Journal of Neuroscience. 2002b;22:946–958. doi: 10.1523/JNEUROSCI.22-03-00946.2002.
  37. Iyengar S, Viswanathan SS, Bottjer SW. Development of topography within song control circuitry of zebra finches during the sensitive period for song learning. Journal of Neuroscience. 1999;19:6037–6057. doi: 10.1523/JNEUROSCI.19-14-06037.1999.
  38. Joel D, Weiner I. The connections of the primate subthalamic nucleus: indirect pathways and the open-interconnected scheme of basal ganglia-thalamocortical circuitry. Brain Research Reviews. 1997;23:62–78. doi: 10.1016/S0165-0173(96)00018-5.
  39. Johnson F, Bottjer SW. Growth and regression of thalamic efferents in the song-control system of male zebra finches. The Journal of Comparative Neurology. 1992;326:442–450. doi: 10.1002/cne.903260309.
  40. Johnson F, Sablan MM, Bottjer SW. Topographic organization of a forebrain pathway involved with vocal learning in zebra finches. The Journal of Comparative Neurology. 1995;358:260–278. doi: 10.1002/cne.903580208.
  41. Johnson F, Soderstrom K, Whitney O. Quantifying song bout production during zebra finch sensory-motor learning suggests a sensitive period for vocal practice. Behavioural Brain Research. 2002;131:57–65. doi: 10.1016/S0166-4328(01)00374-6.
  42. Kao MH, Wright BD, Doupe AJ. Neurons in a forebrain nucleus required for vocal plasticity rapidly switch between precise firing and variable bursting depending on social context. Journal of Neuroscience. 2008;28:13232–13247. doi: 10.1523/JNEUROSCI.2250-08.2008.
  43. Katz LC, Shatz CJ. Synaptic activity and the construction of cortical circuits. Science. 1996;274:1133–1138. doi: 10.1126/science.274.5290.1133.
  44. Kim H, Lee D, Jung MW. Signals for previous goal choice persist in the dorsomedial, but not dorsolateral striatum of rats. Journal of Neuroscience. 2013;33:52–63. doi: 10.1523/JNEUROSCI.2422-12.2013.
  45. Kojima S, Doupe AJ. Social performance reveals unexpected vocal competency in young songbirds. PNAS. 2011;108:1687–1692. doi: 10.1073/pnas.1010502108.
  46. Kupferschmidt DA, Juczewski K, Cui G, Johnson KA, Lovinger DM. Parallel, but dissociable, processing in discrete corticostriatal inputs encodes skill learning. Neuron. 2017;96:476–489. doi: 10.1016/j.neuron.2017.09.040.
  47. Lau B, Monteiro T, Paton JJ. The many worlds hypothesis of dopamine prediction error: implications of a parallel circuit architecture in the basal ganglia. Current Opinion in Neurobiology. 2017;46:241–247. doi: 10.1016/j.conb.2017.08.015.
  48. Lehéricy S, Benali H, Van de Moortele PF, Pélégrini-Issac M, Waechter T, Ugurbil K, Doyon J. Distinct basal ganglia territories are engaged in early and advanced motor sequence learning. PNAS. 2005;102:12566–12571. doi: 10.1073/pnas.0502762102.
  49. Levelt CN, Hübener M. Critical-period plasticity in the visual cortex. Annual Review of Neuroscience. 2012;35:309–330. doi: 10.1146/annurev-neuro-061010-113813.
  50. Luo M, Ding L, Perkel DJ. An avian basal ganglia pathway essential for vocal learning forms a closed topographic loop. Journal of Neuroscience. 2001;21:6836–6845. doi: 10.1523/JNEUROSCI.21-17-06836.2001.
  51. Makino H, Hwang EJ, Hedrick NG, Komiyama T. Circuit mechanisms of sensorimotor learning. Neuron. 2016;92:705–721. doi: 10.1016/j.neuron.2016.10.029.
  52. Mann NI, Slater PJB. Song tutor choice by zebra finches in aviaries. Animal Behaviour. 1995;49:811–820. doi: 10.1016/0003-3472(95)80212-6.
  53. Meliza CD, Margoliash D. Emergence of selectivity and tolerance in the avian auditory cortex. Journal of Neuroscience. 2012;32:15158–15168. doi: 10.1523/JNEUROSCI.0845-12.2012.
  54. Miller-Sims VC, Bottjer SW. Auditory experience refines cortico-basal ganglia inputs to motor cortex via remapping of single axons during vocal learning in zebra finches. Journal of Neurophysiology. 2012;107:1142–1156. doi: 10.1152/jn.00614.2011.
  55. Nakahara H, Doya K, Hikosaka O. Parallel cortico-basal ganglia mechanisms for acquisition and execution of visuomotor sequences - a computational approach. Journal of Cognitive Neuroscience. 2001;13:626–647. doi: 10.1162/089892901750363208.
  56. Nixdorf-Bergweiler BE. Lateral magnocellular nucleus of the anterior neostriatum (LMAN) in the zebra finch: neuronal connectivity and the emergence of sex differences in cell morphology. Microscopy Research and Technique. 2001;54:335–353. doi: 10.1002/jemt.1147.
  57. Niziolek CA, Nagarajan SS, Houde JF. What does motor efference copy represent? Evidence from speech production. Journal of Neuroscience. 2013;33:16110–16116. doi: 10.1523/JNEUROSCI.2137-13.2013.
  58. O'Connor DH, Peron SP, Huber D, Svoboda K. Neural activity in barrel cortex underlying vibrissa-based object localization in mice. Neuron. 2010;67:1048–1061. doi: 10.1016/j.neuron.2010.08.026.
  59. Olveczky BP, Andalman AS, Fee MS. Vocal experimentation in the juvenile songbird requires a basal ganglia circuit. PLoS Biology. 2005;3:e153. doi: 10.1371/journal.pbio.0030153.
  60. Parent A, Hazrati LN. Functional anatomy of the basal ganglia. I. The cortico-basal ganglia-thalamo-cortical loop. Brain Research Reviews. 1995;20:91–127. doi: 10.1016/0165-0173(94)00007-C.
  61. Paterson AK, Bottjer SW. Cortical inter-hemispheric circuits for multimodal vocal learning in songbirds. Journal of Comparative Neurology. 2017;525:3312–3340. doi: 10.1002/cne.24280.
  62. Person AL, Gale SD, Farries MA, Perkel DJ. Organization of the songbird basal ganglia, including area X. The Journal of Comparative Neurology. 2008;508:840–866. doi: 10.1002/cne.21699.
  63. Pinaud R, Saldanha CJ, Wynne RD, Lovell PV, Mello CV. The excitatory thalamo-"cortical" projection within the song control system of zebra finches is formed by calbindin-expressing neurons. The Journal of Comparative Neurology. 2007;504:601–618. doi: 10.1002/cne.21457.
  64. Redgrave P, Rodriguez M, Smith Y, Rodriguez-Oroz MC, Lehericy S, Bergman H, Agid Y, DeLong MR, Obeso JA. Goal-directed and habitual control in the basal ganglia: implications for Parkinson's disease. Nature Reviews Neuroscience. 2010;11:760–772. doi: 10.1038/nrn2915.
  65. Roper A, Zann R. The onset of song learning and song tutor selection in fledgling zebra finches. Ethology. 2006;112:458–470. doi: 10.1111/j.1439-0310.2005.01169.x.
  66. Samejima K, Doya K. Multiple representations of belief states and action values in corticobasal ganglia loops. Annals of the New York Academy of Sciences. 2007;1104:213–228. doi: 10.1196/annals.1390.024.
  67. Scharff C, Nottebohm F. A comparative study of the behavioral deficits following lesions of various parts of the zebra finch song system: implications for vocal learning. Journal of Neuroscience. 1991;11:2896–2913. doi: 10.1523/JNEUROSCI.11-09-02896.1991.
  68. Shen J. Github; 2017. https://github.com/BottjerLab/Acoustic_Similarity
  69. Solis MM, Doupe AJ. Anterior forebrain neurons develop selectivity by an intermediate stage of birdsong learning. Journal of Neuroscience. 1997;17:6447–6462. doi: 10.1523/JNEUROSCI.17-16-06447.1997.
  70. Tchernichovski O, Lints TJ, Deregnaucourt S, Cimenser A, Mitra PP. Studying the song development process: rationale and methods. Annals of the New York Academy of Sciences. 2004;1016:348–363. doi: 10.1196/annals.1298.031.
  71. Tchernichovski O, Mitra PP, Lints T, Nottebohm F. Dynamics of the vocal imitation process: how a zebra finch learns its song. Science. 2001;291:2564–2569. doi: 10.1126/science.1058522.
  72. Tchernichovski O, Nottebohm F, Ho CE, Pesaran B, Mitra PP. A procedure for an automated measurement of song similarity. Animal Behaviour. 2000;59:1167–1176. doi: 10.1006/anbe.1999.1416.
  73. Thorn CA, Atallah H, Howe M, Graybiel AM. Differential dynamics of activity changes in dorsolateral and dorsomedial striatal loops during learning. Neuron. 2010;66:781–795. doi: 10.1016/j.neuron.2010.04.036.
  74. Thorn CA, Graybiel AM. Differential entrainment and learning-related dynamics of spike and local field potential activity in the sensorimotor and associative striatum. Journal of Neuroscience. 2014;34:2845–2859. doi: 10.1523/JNEUROSCI.1782-13.2014.
  75. Vinje WE, Gallant JL. Sparse coding and decorrelation in primary visual cortex during natural vision. Science. 2000;287:1273–1276. doi: 10.1126/science.287.5456.1273.
  76. Vintsyuk TK. Speech discrimination by dynamic programming. Cybernetics. 1972;4:52–57. doi: 10.1007/BF01074755.
  77. Yin HH, Mulcare SP, Hilário MR, Clouse E, Holloway T, Davis MI, Hansson AC, Lovinger DM, Costa RM. Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nature Neuroscience. 2009;12:333–341. doi: 10.1038/nn.2261.
  78. Yin HH, Ostlund SB, Balleine BW. Reward-guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico-basal ganglia networks. European Journal of Neuroscience. 2008;28:1437–1448. doi: 10.1111/j.1460-9568.2008.06422.x.
  79. Ölveczky BP, Otchy TM, Goldberg JH, Aronov D, Fee MS. Changes in the neural control of a complex motor sequence during learning. Journal of Neurophysiology. 2011;106:386–397. doi: 10.1152/jn.00018.2011.

Decision letter

Editor: Ronald L Calabrese

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

[Editors’ note: this article was originally rejected after discussions between the reviewers, but the authors were invited to resubmit after an appeal against the decision.]

Thank you for submitting your work entitled "Neural activity in corticobasal ganglia circuits of juvenile songbirds encodes performance during goal-directed learning" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom, Ronald L Calabrese (Reviewer #1), is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor.

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.

This is an interesting manuscript, which compares the properties and developmental changes of neuronal activity in cortico-basal ganglia circuits of the zebra finch during the juvenile song learning period. This work focuses on the well-studied cortical LMAN CORE and on the surrounding SHELL. The hypothesis that individual LMAN neurons may function to compare sensory feedback and efference copy from current motor actions to the song target, with SHELL serving as the critic and CORE as the actor, is very appealing.

Major Concerns

1) This is mainly a correlative study, without any attempt to dissociate sensory from motor effects on firing.

2) There were substantial concerns about the appropriateness and correctness of the statistical analyses. The correlative nature of the study makes appropriate statistics critical.

3) The analyses of the data are very complicated and somewhat idiosyncratic to songbirds and their unique song-learning behavior. This makes for a difficult read for a non-expert.

4) The issue of overwhelming details and smallness of effect concerned us. The idea of the paper is appealing, but we were not overly convinced by their argument. This seems particularly important while considering a manuscript in a journal with a general readership.

Reviewer #1:

This is an interesting manuscript, which compares the properties and developmental changes of neuronal activity in cortico-basal ganglia circuits of the zebra finch during the juvenile song learning period. During this period, juvenile songbirds are actively engaged in evaluating feedback of self-generated behavior in relation to their memorized tutor song (the goal). This work focuses on the well-studied cortical LMAN CORE and on the surrounding SHELL, which the corresponding author's group has delineated in several previous studies and implicated in the process of learning during song development of juveniles. It also tracks development of songs themselves during this period of song learning so that neuronal responses and songs can be compared as learning progresses. The paper uses rather complicated and multifaceted analyses of songs and of responses of CORE and SHELL neurons recorded in awake singing juvenile birds, which nevertheless present a consistent picture. The firing rate of a subset of neurons in both SHELL and CORE reflects the degree of tutor song matching during singing, with a higher incidence of SHELL neurons showing correlations between firing rate and degree of tutor song matching compared to those in CORE. In addition, variability in firing rate was lower during production of the best-matched syllable utterances compared to poorly-matched utterances. Syllable utterances least similar to tutor syllables showed a progressive increase in similarity to the tutor over the course of vocal learning, and the activity of SHELL, but not CORE, neurons reflected this difference in learning trajectory between best and worst tutor-matched syllable renditions. These results suggest that individual LMAN neurons may function to compare sensory feedback and efference copy from current motor actions to the song target, with SHELL serving as the critic and CORE as the actor.

This paper was difficult to read because the analyses are complicated and somewhat abstract, but the text is tightly written.

Major Concerns

1) The analyses of the data are very complicated and somewhat idiosyncratic to songbirds and their unique song-learning behavior. This makes for a difficult read for a non-expert.

2) This reviewer cannot fully critique the data analytical methods but defers to the other experts. One point that makes this reviewer particularly uncomfortable is the use of correlations, as in Figure 13 (but others also), which are critical for the central conclusion concerning developmental progression in the responses of SHELL neurons. The data have virtually no vertical spread, and thus r will be very small, making it difficult to obtain significance or show substantive change. Moreover, I am uncomfortable in general that these analyses focus on the outer quartiles of the distributions only (most similar and least similar groups).

Reviewer #2:

This paper explores important issues regarding sensorimotor processing during vocal learning. However serious problems in the quantitative analyses prevent a clear interpretation of the data.

First, many analyses rest on comparing the proportions of statistical tests that achieve significance at p<0.05. Although such approaches can be useful, in many cases the authors do not perform the correct control analyses to support their claims. For example, the authors cite the fact that between 6-10% of correlations are significant as evidence that "neural activity is correlated with similarity to tutor song in both CORE and SHELL subregions of juvenile LMAN". This conclusion is not supported by the data unless the authors show that, for example, finding that 7% of CORE cells have a different firing rate for the most/least similar syllables is significantly different from chance. That is, running 1,000 t-tests on vectors of random numbers will, on average, produce significant correlations (at p<0.05) in 5% of cases. To provide evidence for their conclusions here the authors must demonstrate that finding 7% of correlations significant is surprising.
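The chance-level argument in the preceding paragraph can be made concrete with a short sketch. The snippet below computes how likely it is to see at least a given number of "significant" tests when every test has a 5% false-positive rate. The neuron counts are hypothetical and chosen only for illustration; the letter does not state the actual sample sizes, and the exact binomial tail used here is one simple way to ask whether an observed proportion is surprising, not necessarily the control the reviewer had in mind.

```python
from math import comb

def binomial_tail(k, n, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more false positives."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical sample size (an assumption, not taken from the paper).
n_neurons = 100

for n_significant in (7, 11):
    tail = binomial_tail(n_significant, n_neurons)
    print(f"{n_significant}/{n_neurons} significant at p<0.05: "
          f"P(at least this many by chance) = {tail:.3f}")
```

With these assumed counts, 7 significant tests out of 100 is well within what chance alone produces, whereas 11 out of 100 would be unlikely under the null; the conclusion depends strongly on the number of neurons tested.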

Additionally, the analysis presented in the section "Activity correlated with multiple, but not single, features" is not convincing. It is not clear to me whether the basic finding – that many neurons have a significant relationship with 11 features simultaneously, but few have significant relationships in single regression – is actually informative. First, finding a significant correlation with the 11 "aggregate features" might simply mean that activity is correlated with one, rather than several, of the features. It would therefore be unsurprising if a far smaller number of single-feature tests were significant. More importantly, and completely separate from the above concern, there is no guarantee that the acoustic features measured are the features that the bird actually evaluates or controls, so the conclusion about representing individual vs constellations of features is unconvincing, i.e. even if the individual vs aggregate analysis were performed correctly, it could be that the LMAN neurons really do care about single features, but that these features are different from, although possibly correlated with, the 11 features used in the analysis.

The logic of the analysis used in Figure 4 is confusing. As I understand it, the authors first select neurons that fire at greater than baseline rate in the 50 msec prior to syllable onset, and THEN ask whether there is a change in average activity locked to syllable onset. (Such data are used to argue that only CORE cells show "coordinated premotor activity"). But isn't it the case that to be included in this analysis, SHELL neurons by definition must have elevated rates (relative to baseline) that stay elevated throughout syllable onset, rather than dropping/modulating as is apparently typical in CORE cells? If so, it does not seem correct to say that SHELL neurons do not display coordinated premotor activity, or do so to a lesser extent than do CORE cells.

A further fundamental problem is that no attempt is made to separate the sensory from motor tunings of these neurons. All other problems with the analysis aside, finding that neural activity co-varies with syllable acoustics immediately raises the question of whether this covariation reflects premotor control (or efference copy) of future behavior, sensory processing of prior behavior, or some combination thereof. This issue confounds interpretation of most of the results in this paper. For example, the finding that variability in neural "responses" is correlated with similarity to tutor (subsection “Variability in neural responses of both CORE and SHELL neurons correlates with similarity to tutor song but not with syllable prototypicality”) might reflect either auditory or motor differences between the syllables that are most- or least-similar to those of the tutor. Given this concern, and the numerous problems with the quantitative analyses outlined above, it is not clear to me that this paper significantly advances our understanding of sensorimotor processing.

Reviewer #3:

The paper by Achiro, Shen and Bottjer shows evidence for neurons in a motor learning pathway that seem to encode the motor performance during learning. This finding is important as it provides evidence for a goal-directed signal that has been elusive to date. This has important implications for investigators studying motor learning, particularly those interested in the role of the basal ganglia in goal-directed behaviors. Overall, this manuscript is a tour de force that combines difficult (hard to get) neural recordings with a great deal of careful analysis. As far as I can tell the statistical analyses are good. Although the analysis is very thorough, the paper seems to suffer under the weight of complexities of the analysis, so that as I read through the paper the analysis became so detailed that I lost the narrative of the paper. This could be particularly problematic for non-birdsong readers. I am not sure how to fix the problem as most of the analysis seems important to 'make the case' and the authors summarize each section.

1) The results are correlative and there is no manipulation of the system to confirm the results. This requires additional experiments that are far beyond the scope of the current results, although they would add additional support to the important finding.

2) In addition to the analysis, the effect seems to be quite small; only a small percentage of neurons and a small-ish change in firing rate (Figure 8). I do not have a problem with the small percentage of neurons as the bird may not need many neurons to encode the performance, particularly if those neurons 'disappear' after learning. This does not mean it is not true, or unbelievable, it just makes it harder to understand and show easily. For example, the firing rate increases are small and the similarity modulation index shift is very hard to see from the graphs, although there is a statistical shift.

3) I think it would be useful to show some raw (at least spikes) data for Figure 2. In the figure it is difficult to tell if there is a change in firing rate from baseline as not much baseline is shown and for some of the songs, the firing rate does not seem to change; for example, in the first LMAN shell singing-excited neuron shown, the neuron seems to fire at the same rate before and during song, except for a burst of activity at song onset. In other words, if I looked at the firing rate I would not be able to tell when the bird sang. One way this could be addressed is by providing more baseline activity for one song in each category.

4) In subsection “Neural responses in juvenile LMAN are variable during repeated renditions of emerging syllable types”, the authors say they time warped the spike train to look at timing. It seems a little tricky to warp the timing in order to look at timing. How much did the timing have to be warped (% of syllable duration)? How much warping occurred compared to the variability of the spiking activity?

5) I found the Similarity Modulation Index graphs a little confusing (Figure 8D and 9A). Were the neurons with positive response strengths and negative response strengths analyzed together? If you separate them, do the positive responses result in a significant result and negative responses result in a non-significant result?

[Editors’ note: what now follows is the decision letter after the authors submitted a revised manuscript for consideration.]

Thank you for submitting your article "Neural activity in corticobasal ganglia circuits of juvenile songbirds encodes performance during goal-directed learning" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom, Ronald L Calabrese (Reviewer #1), is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Andrew King as the Senior Editor. We had two new referees.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. Given that this paper has gone through two full rounds of review and remains in an unacceptable form, we are prepared to offer only one more opportunity to provide an acceptable version of the manuscript.

Summary:

This interesting manuscript compares the properties and developmental changes of neuronal activity in cortico-basal ganglia circuits of the zebra finch during the juvenile song learning period. During this period, juvenile songbirds are actively engaged in evaluating feedback of self-generated behavior in relation to their memorized tutor song (the goal). This work focuses on the well-studied cortical LMAN CORE and on the surrounding SHELL, which the corresponding author's group has delineated in several previous studies and implicated in the process of learning during song development of juveniles. It also tracks development of songs themselves during this period of song learning so that neuronal responses and songs can be compared as learning progresses. The paper uses sophisticated analyses of songs and of responses of CORE and SHELL neurons recorded in awake singing juvenile birds, which present a consistent picture. The spiking patterns of a subset of neurons in both SHELL and CORE reflect the degree of tutor song matching during singing. Both CORE and SHELL neurons encode tutor similarity either by increases or decreases in firing rate, but only SHELL neurons showed a significant association at the population level. During development, tutor similarity (for syllables with low initial similarity to tutor syllables) predicted firing rates most strongly during early stages of learning, and SHELL but not CORE neurons showed decreases in response variability, suggesting that the activity of SHELL neurons reflects the progression of learning. These results suggest that individual LMAN neurons may function to compare sensory feedback and efference copy from current motor actions, with SHELL serving as the critic and CORE as the actor.

Essential revisions:

This paper was difficult to read because the analyses are complicated and somewhat abstract, and many separate issues are addressed which are peripheral to the main interest of the paper; during development, tutor similarity predicted firing rates most strongly during early stages of learning, and SHELL but not CORE neurons showed decreases in response variability. This paper thus has much inherent interest but is too unwieldy as presented, and how it should be revised was the subject of lively constructive discussion by the expert reviewers. In revision, the authors should focus and provide detailed responses to the points below.

1) The main revision must be to simplify the message of the paper by focusing ONLY on the song similarity analysis.

a) The 'premotor vs non premotor' analyses are not necessary to support the main finding, and given the relatively weak support in favor of a CORE-SHELL distinction on that issue, it should be removed.

b) A large portion of the results is devoted to showing that juvenile syllables become more and more similar to tutor during early song development. Although previous studies have focused on the later stages of learning, song similarity is known to increase even in the early stages of learning (e.g. Aronov et al., 2008). The authors should cut substantially the lengthy description and analysis of song development to concentrate on the core issues; during development, tutor similarity predicted firing rates most strongly during early stages of learning, and SHELL but not CORE neurons showed decreases in response variability.

c) The concentration on the relation between LMAN firing and the acoustic properties of the syllables is unwarranted. The issue here is that findings are very weak. The authors make a very problematic statistical claim about representation of multiple features but no single feature. This could be, in principle, true, but the analyses show simply a very weak effect, as opposed to synergy or gestalt representation. It is not possible to see anything interesting in the raw data. To quote one of the expert reviewers in discussion: "The entire section about analysis of acoustic features, and all figures showing nothing apparent, should be removed, and replaced by something like 'we found a very weak, but statistically significant representation of song features (t=… p=… see suppl.).'"

d) Another issue that undermines the clarity of the paper is the intermixing of results that apply both to LMAN-core and LMAN-shell neurons and results that contrast between these two areas. The authors should highlight similarities and differences between these two areas. As suggested by the first sentence in the discussion, the main finding of the paper applies to both core and shell neurons, and only minor differences are found between the singing-related firing in these two populations. This could be made more explicit throughout the results.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Neural activity in corticobasal ganglia circuits of juvenile songbirds encodes performance during goal-directed learning" for further consideration at eLife. Your revised article has been favorably evaluated by Andrew King (Senior editor), a Reviewing editor, and two reviewers.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

The manuscript is much improved and all comments were fully addressed.

There are two remaining issues that should be resolved before this paper is published.

1) Regressions of baseline-corrected firing rates against tutor similarity for each neuron revealed that cells in both CORE and SHELL exhibited either positive or negative associations between neural activity and degree of tutor similarity: approximately half of all neurons in each subregion showed positive slopes (r values > 0, increased firing rates for syllables with higher tutor similarity).

We should have noticed this issue in the previous round, but this description is deficient. Saying that half increased and half decreased is like looking at random data, right? Instead, plot a histogram of all those slopes and/or r values, so readers can see the distribution on both positive and negative sides. Then do bootstrap (shuffling tutor similarity) to plot this histogram against a random distribution of similarities. This way, it would be possible to evaluate if this is a real correlation.
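The comparison requested here, observed r values against a shuffled null, can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis: the number of neurons, the number of renditions, the slopes, and the helper names (`pearson_r`, `histogram`, `make_neuron`, `tail_fraction`) are all assumptions introduced for the example. The point is that an even split of positive and negative slopes can still differ from chance if the observed distribution has heavier tails than the shuffled one.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def histogram(values, n_bins=10):
    """Count values in equal-width bins spanning [-1, 1]."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v + 1) / 2 * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

def tail_fraction(values, threshold=0.3):
    """Fraction of r values whose magnitude exceeds the threshold."""
    return sum(abs(v) > threshold for v in values) / len(values)

rng = random.Random(0)

def make_neuron(slope, n_renditions=40):
    """Synthetic neuron: firing rate = slope * tutor similarity + noise."""
    similarity = [rng.random() for _ in range(n_renditions)]
    rate = [slope * s + rng.gauss(0, 1) for s in similarity]
    return similarity, rate

# Hypothetical population: some positively tuned, some negatively tuned, most untuned.
neurons = [make_neuron(s) for s in [2.0] * 10 + [-2.0] * 10 + [0.0] * 30]

observed = [pearson_r(sim, rate) for sim, rate in neurons]

# Null distribution: shuffle tutor similarity relative to firing rate, many times per neuron.
null = []
for sim, rate in neurons:
    shuffled = list(sim)
    for _ in range(200):
        rng.shuffle(shuffled)
        null.append(pearson_r(shuffled, rate))

print("observed r histogram:", histogram(observed))
print("shuffled r histogram:", histogram(null))
print("fraction |r| > 0.3, observed vs shuffled:",
      tail_fraction(observed), tail_fraction(null))
```

In this synthetic population roughly half of the r values are positive, yet the observed histogram is visibly broader than the shuffled one, which is the kind of evidence the comment asks the authors to show.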

2) Response strength did not differ between CORE and SHELL neurons for either excitation or suppression (Table 1; Mann-Whitney tests: singing-excited neurons U = 1768, p = 0.06, singing-suppressed neurons U = 459, p = 0.90).

Maybe you should not say that there is no difference for singing excited neurons since 0.06 is almost significant. Instead, say that you see a borderline difference between core and shell, which did not reach significance level, for song-excited neurons, but no apparent effect in the song-suppressed neurons and then give the p values.

If the statistical tests in comment #1 work out satisfactorily, then further review by the expert reviewers will not be required.

eLife. 2017 Dec 19;6:e26973. doi: 10.7554/eLife.26973.025

Author response


[Editors’ note: the author responses to the first round of peer review follow.]

This is an interesting manuscript, which compares the properties and developmental changes of neuronal activity in cortico-basal ganglia circuits of the zebra finch during the juvenile song learning period. This work focuses on the well-studied cortical LMAN CORE and on the surrounding SHELL. The hypothesis that individual LMAN neurons may function to compare sensory feedback and efference copy from current motor actions to the song target, with SHELL serving as the critic and CORE as the actor, is very appealing.

All three reviewers expressed some strongly positive comments about the paper. All reviewers also made important comments, many of which we agree with and have used to greatly improve the analysis of the Results and the exposition of the paper overall. We respectfully disagree with some of the concerns expressed by the reviewers.

Major Concerns

1) This is mainly a correlative study, without any attempt to dissociate sensory from motor effects on firing.

2) There were substantial concerns about the appropriateness and correctness of the statistical analyses. The correlative nature of the study makes appropriate statistics critical.

3) The analyses of the data are very complicated and somewhat idiosyncratic to songbirds and their unique song-learning behavior. This makes for a difficult read for a non-expert.

4) The issue of overwhelming details and smallness of effect concerned us. The idea of the paper is appealing, but we were not overly convinced by their argument. This seems particularly important while considering a manuscript in a journal with a general readership.

One major concern is that “this is mainly a correlative study”. This is true, but we disagree strongly with the premise that correlational studies cannot provide ground-breaking data, as this paper does. Many recent correlative studies have used the approach of chronic recordings in awake behaving animals, as we did here, to relate spiking activity to behavior and provide major new data. This is especially true for studies that have recorded the activity of cortical neurons during expression of a specific behavior; given the diversity of cortical neurons, this approach constitutes a major first step in understanding how cortical neurons mediate complex behaviors. One recent example (among many) comes from barrel cortex: Hires, Gutnisky, Yu, O’Connor, & Svoboda (2015, eLife, 4:e06619). This paper recorded activity in barrel cortex during an object localization task. In that study, as in ours, variations in a naturally-occurring behavior provide the contrast that is needed to make important inferences about neural function. In our study, variations in the similarity of self-generated behavior to memorized tutor sounds across the sensorimotor stage of learning were used as a predictor of neural activity. The results are the first to identify learning-related signals as juvenile songbirds are refining their behavioral output.

Specifically, our study provides the first direct evidence that the cortical region LMAN contains neurons that are involved in coding similarity between self-generated and goal behaviors. I published a paper in 1984 showing (based on lesion data) that LMAN plays an essential role in juvenile birds as they are actively engaged in learning. Since then, many songbird labs have looked for evidence that LMAN neurons in juveniles mediate song learning based on comparing vocal-auditory feedback to an internal representation of tutor song. The few studies that have tested this idea directly have provided negative evidence; for example, studies that attempted to disrupt auditory feedback during singing observed no difference in the spiking patterns of LMAN neurons (e.g., Leonardo, 2004, PNAS, 101:16935). The goal of our study was to test the hypothesis that LMAN neurons would encode variations in the similarity of on-going vocal production to corresponding tutor syllables. No previous study has attempted to quantify similarity between juvenile syllables and learned tutor syllables during early stages of sensorimotor integration; we accomplished this by further developing a program that has been universally used among songbird labs to analyze acoustic features (Sound Analysis Pro, Tchernichovski et al., 2000). In addition we carried out the difficult technical challenge of recording from vocalizing birds at this young age. We found strong evidence that the activity of LMAN neurons does encode similarity of self-generated behavior to tutor behavior, indicating a direct participation in mediating goal-directed learning, especially in LMAN-SHELL. This is the first and only evidence to this effect in the past thirty years, and constitutes an important break-through. 
In the songbird system in particular, where very little is known concerning the organization and physiology of cortical neurons, studies that measure spiking activity in relation to on-going behavior represent necessary and important steps to advancing mechanistic understanding and formulating specific hypotheses for further tests.

A corollary point is that we disagree strongly that separating sensory from motor tuning, or any “manipulation of the system”, is a necessary part of this study. In our opinion, the current data are a goldmine that begins to inform mechanisms and serves as an essential prerequisite for many future tests, including questions regarding whether the tutor-matching activity represents efference copy, auditory feedback, both, or other influences. Thus, the learning-related signals we report are likely to reflect a complex combination of sensory and motor signals; in the recurrent loop architecture of cortico-basal ganglia circuits these signals will be iteratively conveyed both through and across the types of parallel pathways we describe. We favor a combination of influences compatible with both forward and inverse models; that is, this neural circuitry is likely to entail efference copy learning to predict sensory feedback as well as sensory inputs learning to predict the motor commands that gave rise to them. The data in this paper will be extremely useful for developing new computational models of learning in this system to guide the many future experiments required to dissect these important questions.

The fact that the results provided strong evidence in favor of “outcome evaluation” between self-generated and tutor behavior is surprising, given that we know so little about the microcircuitry of LMAN and the heterogeneity of neurons contained therein – i.e., we were looking for the proverbial needle in a haystack. The single neurons that showed a significant association between tutor syllables and self-generated syllables might easily fall within a subtype of SHELL neurons, in which case the percentage of significant neurons would be much higher. As we comment in the Discussion, we think it is likely that single neurons that signal tutor similarity may be vested within the population of SHELL neurons that respond only to playback of tutor song (Achiro and Bottjer, 2013). But in any case, as pointed out by Reviewer #3, a small number of neurons can go a long way. The famous cholinergic interneurons of the striatum constitute only 0.3% of the total neuron number, but are important for procedural learning (Tepper & Bolam, 2004). This may be difficult to understand but does not detract from the importance of the effect.

Regarding the “small-ish change in firing rate”: LMAN neurons, like many cortical neurons, are sparsely firing. This has been observed in all published electrophysiological studies of LMAN (e.g., see Achiro and Bottjer, 2013, and references therein). The range of firing rates in any population of sparsely firing neurons will be limited. One could argue that it might be less likely to observe significant correlations for this reason, but we observed highly significant correlations between firing rate and tutor similarity. We have clarified this in the text.

We have streamlined the analyses of the data and the presentation of the Results considerably so that the paper is much easier to read. In response to the reviewers' concerns about the statistical analyses, we sought expert statistical advice. Reviewer #1’s concern that we based many analyses on the top and bottom quartiles of tutor similarity turned out to be justified. We have eliminated this approach altogether, and for the main result of the paper (prediction of firing rate by tutor similarity), we substituted a mixed-effects regression analysis that employed all the data. This analysis yielded a main effect of tutor similarity in SHELL but not CORE neurons. This analysis also caused us to focus on the fact that single neurons could show either positive or negative associations between firing rate and similarity of self-generated utterance to tutor syllables; assessing the magnitude of positive versus negative correlations within each subregion (as descriptive data, since this was not a prediction of our study) clearly indicated that firing rate in CORE neurons is modulated by tutor similarity, albeit to a lesser degree than that of SHELL neurons. Thus the primary result of the paper did not change, although now the data include the added dimension that tutor similarity can be encoded by either increases or decreases in firing rate.

Reviewer #2 made an excellent point regarding the fact that many analyses rested on proportions of statistical tests of single neurons that achieve significance at p<.05. We agree and have altered that approach in two important ways. First, we applied correction factors for judging the proportion of single neurons (again, based on expert statistical advice). Second, for testing the percent of single neurons that encode tutor similarity, we performed permutation tests in which 1,000 random shuffles (of firing rate relative to tutor similarity) were performed for each neuron; r values were calculated for each random shuffle to provide a distribution of null values. For each permutation, the fraction of actual neurons exceeding chance (0.025/0.975) was calculated, and the average of this significant fraction across all permutations was taken as the fraction of neurons with significant r values. This approach fully addresses the concern of Reviewer #2 regarding whether 10% of neurons is “surprising” (i.e., significant), and produced only minor changes in the percentage of single neurons: 5.5% of CORE neurons and 10.8% of SHELL neurons exhibited a significant correlation of firing rate to degree of tutor song matching (previously these values were 6% and 10%). However, when we performed this same analysis on the prototypicality scores, the fraction of significant neurons dropped in SHELL neurons but increased slightly in CORE neurons, which eliminated the significant difference between them. Given this (and the fact that prototypicality showed no significant effect at the population level), we now employ prototypicality purely as a control for tutor similarity. This approach simplifies the exposition and makes the paper much easier to absorb for general readers.
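A per-neuron permutation test of the kind described above can be sketched in a few lines. This is a simplified variant written for illustration, not the exact procedure used in the paper: each neuron is called significant when its observed r falls outside the central 95% (the 0.025/0.975 quantiles) of its own shuffled r values. The synthetic data, the slopes, and the function names (`is_significant`, `synthetic_neuron`) are assumptions made for the example; only the 1,000 shuffles and the two-tailed thresholds come from the description in the text.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def is_significant(similarity, rate, rng, n_shuffles=1000):
    """Two-tailed permutation test on r, using the 2.5th/97.5th null percentiles."""
    observed = pearson_r(similarity, rate)
    shuffled = list(similarity)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the pairing between similarity and rate
        null.append(pearson_r(shuffled, rate))
    null.sort()
    lower = null[n_shuffles * 25 // 1000]
    upper = null[n_shuffles * 975 // 1000 - 1]
    return observed < lower or observed > upper

rng = random.Random(1)

def synthetic_neuron(slope, n_renditions=60):
    """Hypothetical neuron: firing rate = slope * tutor similarity + noise."""
    similarity = [rng.random() for _ in range(n_renditions)]
    rate = [slope * s + rng.gauss(0, 1) for s in similarity]
    return similarity, rate

tuned = synthetic_neuron(slope=3.0)
untuned = [synthetic_neuron(slope=0.0) for _ in range(20)]

tuned_flag = is_significant(*tuned, rng)
untuned_fraction = sum(is_significant(s, r, rng) for s, r in untuned) / len(untuned)
print("tuned neuron significant:", tuned_flag)
print("fraction of untuned neurons flagged:", untuned_fraction)
```

Because the null is built from each neuron's own data, the test makes no distributional assumption about firing rates; untuned neurons should be flagged at roughly the 5% chance rate, which is the baseline against which the reported 5.5% and 10.8% need to be judged.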

We disagree with Reviewer #2 that the section entitled “Neural activity is correlated with multiple, but not single, acoustic features of syllables” is not convincing. This reviewer suggests that the correlation of a neuron with multiple acoustic features could be based on a strong correlation with a single feature; if this were true, then that single feature would have shown a strong correlation, but that was not the case. That is, single-feature tests could not explain the correlation seen in single neurons. The GLM (a General Linear Model – not a Generalized Linear Model) used in this analysis is simply a linear regression including all acoustic features, and includes partial regressions for each family of features (i.e., all summary statistics for each acoustic feature); no correction for multiple comparisons is applied to the partial regressions for single features since the analysis for single features already takes the other features into account. Our expert statistician confirmed that this analysis is fully appropriate. He also suggested that we replace the Bonferroni correction that we had applied to judge the percent of significant cells with an FDR (False Discovery Rate) correction in this case, which we have done.
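The practical difference between a Bonferroni correction and an FDR correction can be illustrated with a short sketch. The response does not say which FDR procedure was used; the example assumes the Benjamini-Hochberg step-up procedure, which is the most common choice, and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p-value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected

# Hypothetical per-cell p-values (not taken from the paper).
p_values = [0.001, 0.008, 0.012, 0.02, 0.03, 0.2, 0.5, 0.9]

n_bonferroni = sum(bonferroni(p_values))
n_fdr = sum(benjamini_hochberg(p_values))
print("significant after Bonferroni:", n_bonferroni)
print("significant after Benjamini-Hochberg:", n_fdr)
```

For these eight assumed p-values, Bonferroni keeps one test and Benjamini-Hochberg keeps five, which shows why the choice of correction can change the reported percentage of significant cells.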

This reviewer also commented that we may not have measured acoustic features that the bird actually evaluates or controls. This is completely true, and we have added this caveat to the manuscript. However, the acoustic features we measured have been in use in most songbird labs since 2000, when they were originally developed by Ofer Tchernichovski and Partha Mitra; it would be difficult to publish any songbird paper that quantified acoustic features without the use of such analyses. We have used 100% of the features used in their software (SAP, Sound Analysis Pro), and have added other features and measures. Of course this does not mean we have included the “right” features, but we have carried out one of the most complete acoustic analyses that we know of to date. Furthermore, we found that the acoustic features we measured predict the firing rate in a large percentage of LMAN neurons. It was this approach that enabled us to successfully analyze syllables produced by birds as young as 43-50 days, which no previous paper has done.

Lastly, we agree that the paper was very difficult to read and absorb, which we blame on ourselves. We have extensively re-written the paper to be much easier to read and accessible to a general audience. The only analyses of the data that are “idiosyncratic to songbirds” have to do with measurement of acoustic features. We have provided a simple description of how acoustic similarity was measured in the Results, and interested readers can consult the details in the Materials and methods. Our honest opinion is that the paper was poorly written overall, and that it “suffered under the weight of complexities” (as described by Reviewer #3), so that the problem was not that the analyses are idiosyncratic to songbirds. We believe that the paper, in its highly revised form, is no longer a difficult read. The results will be of particular interest to any readers interested in cortico-basal ganglia circuits, and to a wide audience of readers interested in learning, sensorimotor integration, and development.

[Editors’ note: the author responses to the re-review follow.]

[…] Essential revisions:

This paper was difficult to read because the analyses are complicated and somewhat abstract, and many separate issues are addressed which are peripheral to the main interest of the paper; during development, tutor similarity predicted firing rates most strongly during early stages of learning, and SHELL but not CORE neurons showed decreases in response variability. This paper has thus much inherent interest but is too unwieldy as presented and how it should be revised was the subject of lively constructive discussion by the expert reviewers. In revision, the authors should focus and provide detailed responses to the points below.

1) The main revision must be to simplify the message of the paper by focusing ONLY on the song similarity analysis.

a) The 'premotor vs non premotor' analyses are not necessary to support the main finding, and given the relatively weak support in favor of a CORE-SHELL distinction on that issue, it should be removed.

b) A large portion of the results is devoted to showing that juvenile syllables become more and more similar to tutor during early song development. Although previous studies have focused on the later stages of learning, song similarity is known to increase even in the early stages of learning (e.g. Aronov et al., 2008). The authors should cut substantially the lengthy description and analysis of song development to concentrate on the core issues; during development, tutor similarity predicted firing rates most strongly during early stages of learning, and SHELL but not CORE neurons showed decreases in response variability.

c) The concentration on the relation between LMAN firing and the acoustic properties of the syllables is unwarranted. The issue here is that findings are very weak. The authors make a very problematic statistical claim about representation of multiple features but no single feature. This could be, in principle, true, but the analyses show simply a very weak effect, as opposed to synergy or gestalt representation. It is not possible to see anything interesting in the raw data. To quote one of the expert reviewers in discussion "The entire section about analysis of acoustic features, and all figures showing nothing apparent, should be removed, and replaced by something like "we found a very weak, but statistically significant representation of song features (t=… p=… see suppl.)."

d) Another issue that undermines the clarity of the paper is the intermixing of results that apply both to LMAN-core and LMAN-shell neurons and results that contrast between these two areas. The authors should highlight similarities and differences between these two areas. As suggested by the first sentence in the discussion, the main finding of the paper applies to both core and shell neurons, and only minor differences are found between the singing-related firing in these two populations. This could be made more explicit throughout the results.

The biggest issue for all reviewers was the length and complexity of the paper, coupled with the fact that many of the issues addressed were peripheral to the main interest of the paper.

We apologize for the encyclopedic nature of the Results section. We had thought that it was important to fully characterize the activity patterns of shell neurons since no previous studies have made chronic recordings from this region. Clearly, in retrospect, that was a mistake. We have addressed this issue by focusing on the sections dealing with similarity of self-generated song behavior to memorized tutor song, as requested. We did this in the following ways: (1) We completely eliminated the section describing the correlation between neural activity and acoustic features of syllables; we agree with Reviewer 4 that as presented the findings are based on statistics and not on any biological demonstration, and rather than include 1-2 sentences we think it best to reserve these data for a future paper. (2) We also eliminated the entire section describing the variability of spiking activity across syllable renditions (along with the description of how we measured acoustic similarity of syllables, which is now contained only in the Materials and methods); this description has been replaced by two brief sentences in the first section of the Results, and the data are shown in Figure 2—figure supplement 1. (3) We shortened the description of the developmental increase in similarity between juvenile and tutor syllables; we continue to include the new finding that only syllabic utterances with low tutor similarity at the onset of song learning become more similar to tutor during song development; this pattern is important since the pattern of neural activity mirrors this behavioral pattern (which has not been described previously). (4) In the course of simplifying the text of the Results, we have attempted to highlight similarities and differences between core and shell in an organized way, as requested. 
(5) We eliminated the section on the time course of singing activity, but have included these data as a single paragraph at the end of the first section; the request to remove the ‘premotor vs non premotor analyses’ was confusing to us, in part because it did not appear in the comments of any of the three reviews; we have shortened and clarified the description to show that this result represents a strong difference between core and shell: the lack of a pre-motor response in shell neurons is essential to interpreting their role in skill learning; we think this issue is clear in the revised version. In summary, we condensed the first four sections of the previous version into one section, reducing the number of paragraphs from eight to three. This introductory section contains a brief description of the similarities in singing-related neural activity between shell and core, and of the one main difference that core neurons show a coordinated premotor increase in activity whereas shell neurons do not.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

The manuscript is much improved and all comments were fully addressed.

There are two remaining issues that should be resolved before this paper is published.

1) Regressions of baseline-corrected firing rates against tutor similarity for each neuron revealed that cells in both CORE and SHELL exhibited either positive or negative associations between neural activity and degree of tutor similarity: approximately half of all neurons in each subregion showed positive slopes (r values > 0, increased firing rates for syllables with higher tutor similarity).

We should have noticed this issue in the previous round, but this description is deficient. Saying that half increased and half decreased is like looking at random data, right? Instead, plot a histogram of all those slopes and/or r values, so readers can see the distribution on both positive and negative sides. Then do bootstrap (shuffling tutor similarity) to plot this histogram against a random distribution of similarities. This way, it would be possible to evaluate if this is a real correlation.

There is some confusion. Subsection “Neural activity in LMAN reflects similarity of self-generated syllables to tutor syllables” of the latest version of the paper read as follows:

“Regressions of baseline-corrected firing rates against tutor similarity for each neuron revealed that cells in both core and shell exhibited either positive or negative associations between neural activity and degree of tutor similarity: approximately half of all neurons in each subregion showed positive slopes (r values > 0, increased firing rates for syllables with higher tutor similarity) whereas the other half showed negative slopes (r values < 0, increased firing rates for syllables with lower tutor similarity) (Table 2).”

This sentence has nothing to do with evaluating whether these correlations are “real” (i.e. significant) or not, but is simply meant to present a description. The purpose of the sentence is to introduce the basic (and unexpected) idea that the firing rate of cells could either increase or decrease as a function of tutor similarity. We only address the issue of the statistical significance of these relationships in the following two paragraphs:

In the last paragraph of subsection “Neurons in both CORE and SHELL subregions of LMAN exhibit singing-related neural activity in juvenile birds” we test whether individual neurons have significant correlations by performing repeated permutation tests (1,000 random shuffles for each cell, as described in that paragraph and in the Materials and methods); this analysis provides a reliable estimate of how many cells showed a significant relationship (which turned out to be 5.5% in core and 10.8% in shell).
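A per-cell permutation test of this kind can be sketched as follows. This is only an illustration under stated assumptions: the test statistic (Pearson's r), the two-sided comparison, the add-one correction, the function names, and the example values are all hypothetical; the only detail taken from the text above is the use of 1,000 random shuffles per cell.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(similarity, firing_rate, n_shuffles=1000, seed=0):
    """Two-sided permutation p-value for the similarity/firing-rate correlation."""
    rng = random.Random(seed)
    observed = pearson_r(similarity, firing_rate)
    shuffled = list(similarity)
    n_extreme = 0
    for _ in range(n_shuffles):
        # Shuffling breaks any pairing between similarity and firing rate.
        rng.shuffle(shuffled)
        if abs(pearson_r(shuffled, firing_rate)) >= abs(observed):
            n_extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (n_extreme + 1) / (n_shuffles + 1)


# Hypothetical cell: firing rate rises with tutor similarity (illustrative only).
example_similarity = [0.1 * i for i in range(20)]
example_rate = [5 + 3 * s + (-1) ** i * 0.2 for i, s in enumerate(example_similarity)]
r_obs, p_perm = permutation_p_value(example_similarity, example_rate)
```

Counting the cells whose permutation p-value falls below a chosen threshold would then give a percentage of significant cells per subregion, in the spirit of the figures quoted above.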

In subsection “Neural activity in LMAN reflects similarity of self-generated syllables to tutor syllables” we test these correlations at the population level using the mixed-effects linear regression model recommended by our expert statistician. In our opinion, that analysis is highly appropriate, and is preferable to the strategy of performing a single randomization (bootstrap) for comparison to the actual data, as suggested by the reviewer. In addition, we note that total n’s and overall means are presented as part of Table 2, and the firing rates (response strengths) for each cell are plotted in Figure 5 as descriptive data so as to include all of the slopes (both positive and negative across all cells); we think that showing the actual data in this way instead of as a histogram of slopes or r values is also preferable. In summary, we did test for significance across the population, but did so using a mixed-effects linear regression model.

Possibly the reviewer thought we were making a claim about significance in the second paragraph of subsection “Neurons in both CORE and SHELL subregions of LMAN exhibit singing-related neural activity in juvenile birds”; although that is not true, it raises the possibility that other readers might also be confused. So one solution would simply be to re-write that sentence such that it is framed mainly in terms of describing changes in firing rates:

“Unexpectedly, this analysis revealed that firing rates of cells in both core and shell could either increase or decrease as a function of tutor similarity: approximately half of all neurons in each subregion showed increased firing rates for syllables with higher tutor similarity (positive slopes, r values > 0) whereas the other half showed increased firing rates for syllables with lower tutor similarity (negative slopes, r values < 0) (Table 2).”

A “stronger” solution would be to simply omit that sentence altogether from that paragraph (it is not necessary there but was intended to make the text more accessible by introducing the idea of increases in firing rate encoding either higher or lower tutor similarity). Instead, that sentence, along with the rest of that paragraph and Figure 4A, could be incorporated into the following paragraph. In my opinion the revision I have made is sufficient to alleviate any confusion, and seems more user-friendly in terms of providing a gradual unfolding of the data. However, I can make the “stronger” revision if the reviewer prefers.

2) Response strength did not differ between CORE and SHELL neurons for either excitation or suppression (Table 1; Mann-Whitney tests: singing-excited neurons U = 1768, p = 0.06, singing-suppressed neurons U = 459, p = 0.90).

Maybe you should not say that there is no difference for singing excited neurons since 0.06 is almost significant. Instead, say that you see a borderline difference between core and shell, which did not reach significance level, for song-excited neurons, but no apparent effect in the song-suppressed neurons and then give the p values.

If the statistical tests in comment #1 work out satisfactorily, then further review by the expert reviewers will not be required.

We agree with this comment and have emended the text in the Results to read as follows: “Excitatory response strength was marginally higher in SHELL neurons, whereas suppressed response strength did not differ between CORE and SHELL (Table 1; Mann-Whitney tests: singing-excited neurons U = 1768, p = 0.06, singing-suppressed neurons U = 459, p = 0.90).”
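For readers unfamiliar with the test quoted above, a Mann-Whitney comparison of two groups can be sketched as below. This is a simplified illustration, not the analysis used in the paper: it reports the smaller of the two U values (the paper does not say which convention it used), and the p-value comes from a normal approximation without tie or continuity correction, which is an assumption that is only reasonable for larger samples. The function name and example values are hypothetical.

```python
import math


def mann_whitney_u(group_a, group_b):
    """Return (U, two-sided p) using midranks and a normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    # Pool both groups, tagging each value with its group (0 = a, 1 = b).
    pooled = sorted(
        (value, tag) for tag, group in enumerate((group_a, group_b)) for value in group
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j.
        midrank = (i + 1 + j) / 2
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u_b = n1 * n2 - u_a
    u = min(u_a, u_b)
    # Normal approximation to the null distribution of U (no tie correction).
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided


# Hypothetical response strengths for two small groups (illustrative only).
example_u, example_p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

In this toy case the two groups do not overlap at all, so U is 0, yet the approximate p-value still sits close to 0.05 because each group has only three values.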

Associated Data

    Supplementary Materials

    Figure 3—source data 1. Pre-singing spiking activity of individual CORE and SHELL neurons.
    elife-26973-fig3-data1.xlsx (398.2KB, xlsx)
    DOI: 10.7554/eLife.26973.011
    Transparent reporting form
    DOI: 10.7554/eLife.26973.022
