Shared mechanisms of auditory and non-auditory vocal learning in the songbird brain

James N McGregor; Abigail L Grassler; Paul I Jaffe; Amanda Louise Jacob; Michael S Brainard; Samuel J Sober

eLife. 2022 Sep 15;11:e75691. doi: 10.7554/eLife.75691

Editors: Jesse H Goldberg, Barbara G Shinn-Cunningham

PMCID: PMC9522248 PMID: 36107757

Abstract

Songbirds and humans share the ability to adaptively modify their vocalizations based on sensory feedback. Prior studies have focused primarily on the role that auditory feedback plays in shaping vocal output throughout life. In contrast, it is unclear how non-auditory information drives vocal plasticity. Here, we first used a reinforcement learning paradigm to establish that somatosensory feedback (cutaneous electrical stimulation) can drive vocal learning in adult songbirds. We then assessed the role of a songbird basal ganglia thalamocortical pathway critical to auditory vocal learning in this novel form of vocal plasticity. We found that both this circuit and its dopaminergic inputs are necessary for non-auditory vocal learning, demonstrating that this pathway is critical for guiding adaptive vocal changes based on both auditory and somatosensory signals. The ability of this circuit to use both auditory and somatosensory information to guide vocal learning may reflect a general principle for the neural systems that support vocal plasticity across species.

Research organism: Other

Introduction

A fundamental goal of neuroscience is to understand how the brain uses sensory feedback to drive adaptive changes in motor output (Graybiel et al., 1994; Hikosaka et al., 2002). Human speech is a prime example of a sensory-guided behavior, and humans are among the few species that use auditory feedback from their own vocalizations to compensate for perceived errors in vocal output (Doupe and Kuhl, 1999). This reliance on sensory feedback for speech production is lifelong: loss of hearing impairs both speech development and vocal production in adulthood, and adult speakers rely heavily on auditory signals to calibrate their vocal acoustics (Oller and Eilers, 1988; Cowie and Douglas-Cowie, 1983; Stoel-Gammon and Otomo, 1986; Houde and Jordan, 1998). Accordingly, studies of the neurobiology of speech have focused on the specialized neural pathways that process auditory feedback (Jarvis, 2019). In contrast, it is unclear whether the brain uses non-auditory sensory input to modulate the acoustics of vocal production, although studies demonstrating that humans use non-auditory (somatosensory) signals to calibrate jaw movements suggest that this might be the case (Nasir and Ostry, 2008; Tremblay et al., 2003).

We address how the brain processes different sources of sensory feedback to guide vocal behavior by using a model system ideally suited for the study of vocal learning, the Bengalese finch. Like humans, songbirds rely on auditory signals to precisely calibrate their vocal output throughout life (Sober and Brainard, 2009; Kuebrich and Sober, 2015; Konishi, 1965; Nordeen and Nordeen, 1992). Also similar to humans, songbirds have evolved specialized neural pathways for vocal learning, allowing the precise interrogation of the brain mechanisms of song plasticity (Jarvis, 2019; Brainard and Doupe, 2002). However, prior research on this brain network has focused almost exclusively on the role of auditory feedback, although recent work has shown the importance of visual cues (light) in shaping vocalizations (Veit et al., 2021; Zai et al., 2020). Previous studies have revealed that songbird brains have a basal ganglia-thalamocortical circuit, the anterior forebrain pathway (AFP), that is required for auditory-guided vocal learning but not vocal production (Figure 1a; Brainard and Doupe, 2000; Nordeen and Nordeen, 1993; Mooney, 2009; Bottjer et al., 1984). For example, lesions of LMAN (the output nucleus of the AFP) prevent adult vocal plasticity in response to perturbations of auditory feedback (Brainard and Doupe, 2000; Ali et al., 2013; Morrison and Nottebohm, 1993). Also, lesions or manipulations of dopaminergic input into Area X (the basal ganglia nucleus of the AFP) impair adult vocal learning in response to the pitch-contingent delivery of aversive auditory stimuli (white noise bursts) (Hoffmann et al., 2016; Hisey et al., 2018; Xiao et al., 2018). 
Recent work has demonstrated that the songbird AFP receives anatomical projections from brain regions that process non-auditory sensory information (Paterson and Bottjer, 2017), and that Area X plays a crucial role in processing visual information to shape vocal output (Zai et al., 2020), yet it remains unclear whether and how the AFP processes somatosensory feedback to drive vocal learning, and whether dopaminergic input to the AFP is involved in non-auditory forms of learning.

We performed a series of three experiments (Figure 1b) to investigate whether and how the brain uses non-auditory, somatosensory feedback to guide vocal learning. We first tested whether adult songbirds can adaptively modify specific elements of their song structure in response to somatosensory feedback (Figure 1b, Experiment 1). We used non-auditory stimuli (mild cutaneous electrical stimulation), which we delivered during ongoing song performance, to differentially reinforce the acoustics (fundamental frequency, or ‘pitch’) of specific song elements, or ‘syllables.’ In separate experiments, we tested birds using auditory stimuli consisting of brief playbacks of white noise, a well-established paradigm for driving changes in pitch in adult songbirds (Hoffmann et al., 2016; Andalman and Fee, 2009; Tumer and Brainard, 2007). Delivering non-auditory and auditory stimuli on the same schedule therefore allowed us to directly compare how different sensory modalities affect vocal behavior. We next assessed the neural circuit mechanisms underlying somatosensory-driven vocal learning by determining the necessity of LMAN (the output nucleus of the AFP) for somatosensory learning (Figure 1b, Experiment 2). Finally, we assessed the role of dopaminergic neural circuitry in somatosensory vocal learning by performing selective lesions of dopaminergic input to Area X (Figure 1b, Experiment 3).

Results

Non-auditory feedback can drive adult songbird vocal learning

We tested whether non-auditory feedback can drive vocal learning (Figure 1b, Experiment 1) by providing mild, pitch-contingent cutaneous stimulation through a set of wire electrodes on the scalps of adult songbirds. Before initiating cutaneous stimulation training, we continuously recorded song without providing any feedback for 3 days (baseline) (Figure 2a). Every day, songbirds naturally produce many renditions of song, which consist of repeated patterns of unique vocal gestures, called syllables (Figure 2b, top). For one ‘target’ syllable in each experimental subject, we quantified rendition-to-rendition variability in the fundamental frequency (which we refer to here as ‘pitch’) of each occurrence of this syllable on the final baseline day (Figure 2b, top). To differentially reinforce the pitch of a target syllable, we determined a range of pitches within this baseline distribution (either all pitches above the 20th percentile or all pitches below the 80th percentile), and then triggered the delivery of cutaneous stimulation in real time (within 40 ms of syllable onset) when the pitch of the target syllable fell within this range (Figure 2b, bottom). We performed this pitch-contingent cutaneous stimulation training continuously for 3 days. Note that the birds could choose not to sing in order to avoid triggering any cutaneous stimulation, and we carefully monitored animal subjects for any signs of distress (see ‘Materials and methods’).
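
The pitch-contingent hit/escape rule described above can be illustrated with a short Python sketch. It is not the authors' acquisition code: the function names are hypothetical, each rendition's pitch is assumed to be already estimated (in Hz), the real-time constraint (stimulation within 40 ms of syllable onset) is not modeled, and computing percentiles by linear interpolation between order statistics is an assumption about how the baseline cutoffs were obtained.

```python
import math
from typing import Callable, List, Tuple


def percentile(values: List[float], q: float) -> float:
    """q-th percentile using linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("baseline distribution is empty")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def make_hit_rule(
    baseline_hz: List[float], adaptive_direction: str
) -> Tuple[float, Callable[[float], bool]]:
    """Return (threshold_hz, is_hit) for one target syllable (hypothetical helper).

    "down": renditions above the 20th baseline percentile trigger stimulation,
            so the bird avoids stimulation by singing lower.
    "up":   renditions below the 80th baseline percentile trigger stimulation,
            so the bird avoids stimulation by singing higher.
    """
    if adaptive_direction == "down":
        threshold = percentile(baseline_hz, 20)
        return threshold, lambda pitch: pitch > threshold
    if adaptive_direction == "up":
        threshold = percentile(baseline_hz, 80)
        return threshold, lambda pitch: pitch < threshold
    raise ValueError("adaptive_direction must be 'up' or 'down'")


# Hypothetical baseline: 101 renditions evenly spaced from 2000 to 2100 Hz.
baseline = [2000.0 + i for i in range(101)]
threshold, is_hit = make_hit_rule(baseline, "down")
hit_fraction = sum(is_hit(p) for p in baseline) / len(baseline)
print(threshold, round(hit_fraction, 3))  # 2020.0 0.792
```

By construction, roughly 80% of baseline renditions fall in the hit range at the start of training, and the bird can reduce this fraction only by shifting pitch in the adaptive direction.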

Figure 2. — (a) Timeline of vocal learning experiments in this example bird. The order of the auditory vs. non-auditory experiments was randomized across birds. (b) Top: spectrograms and song syllables (labeled b–f), including the target syllable (‘d’). Bottom: baseline pitch distribution and pitch threshold. Cutaneous stimulation was provided during renditions of the target syllable above a chosen pitch threshold (‘hit’). (c) Each dot represents the pitch of one rendition of the target syllable. Renditions in the ‘hit’ range rapidly triggered cutaneous stimulation (within 40 ms of syllable onset). During washout, cutaneous stimulation was discontinued. (d) Cumulative distribution function (CDF) plot showing the probability that a pitch value from a distribution falls at or below the value on the x-axis. The pitch distribution at the end of cutaneous stimulation training differed significantly from baseline (two-sample Kolmogorov–Smirnov test, p=1.178e-12). The distribution at the end of washout did not differ significantly from baseline (two-sample Kolmogorov–Smirnov test, p=0.606). Panels (b–d) show data from the same experiment. (e) Adaptive pitch change (in semitones) of the target syllables during cutaneous stimulation training, grouped across 13 experiments. The mean change was significantly greater than baseline on training days 2 and 3 (the probability of resampled mean pitch on each of these days being less than or equal to zero was P_boot < 0.0010, indicated by filled circles). (f) Learning magnitudes (adaptive pitch change by the end of training) in individual birds that underwent both white noise and cutaneous stimulation training (n = 14). 
Open squares indicate birds that did not undergo craniotomies for sham LMAN lesions and received cutaneous stimulation on their neck, open circles indicate birds that did not undergo craniotomies for sham LMAN lesions and received cutaneous stimulation on their scalp, and closed circles indicate birds that underwent LMAN sham operations and received cutaneous stimulation on their scalp. No significant difference in learning magnitudes during cutaneous stimulation training vs. during white noise training (paired t-test, p=0.313).

Figure 2—source data 1. Source data for analyses in Figure 2.

elife-75691-fig2-data1.zip (712.8 KB)

Figure 2—source data 2. Source data for analyses in Figure 2—figure supplement 5 and Figure 2—figure supplement 6.

elife-75691-fig2-data2.zip (712.9 KB)

Figure 2—source data 3. Source data for analyses in Figure 2—figure supplement 7.

elife-75691-fig2-data3.zip (39.1 KB)

Figure 2—source code 1. Source code for use with Figure 2—source data 1 for analyses in Figure 2B-D.

elife-75691-fig2-code1.zip (8 KB)

Figure 2—source code 2. Source code for use with Figure 2—source data 1 for analyses in Figure 2E.

elife-75691-fig2-code2.zip (15.5 KB)

Figure 2—source code 3. Source code for use with Figure 2—source data 2 for analyses in Figure 2—figure supplement 5 and Figure 2—figure supplement 6.

elife-75691-fig2-code3.zip (17.1 KB)

Figure 2—source code 4. Source code for use with Figure 2—source data 3 for analyses in Figure 2—figure supplement 7.

elife-75691-fig2-code4.zip (30.9 KB)

For example, in one experiment (shown in Figure 2a–d, Figure 2—source data 1), cutaneous stimulation was triggered on every rendition of the target syllable that had a pitch above 2.13 kHz (the 20th percentile of the baseline distribution) for 3 days. In this example experiment, the bird gradually shifted the pitch of the target syllable downward (the adaptive direction), so that cutaneous stimulation was triggered less frequently (Figure 2c). In other experiments, where the adaptive direction of pitch change was upward, we triggered cutaneous stimulation whenever the target syllable pitch was below the 80th percentile of the baseline distribution. In the example experiment, at the start of the first day of cutaneous stimulation training, 80% of syllable renditions resulted in cutaneous stimulation and 20% resulted in escapes (renditions whose pitch fell outside the hit range, so that no stimulation was delivered). On the third (final) day of cutaneous stimulation training, escapes occurred on over 60% of target syllable renditions, and the entire distribution of pitches had shifted significantly in the adaptive direction, indicating significant vocal learning in this example experiment (Figure 2d; two-sample Kolmogorov–Smirnov test comparing baseline with the end of cutaneous stimulation training, p=1.1776e-12). We then stopped triggering cutaneous stimulation and continued to record unperturbed song for 6 additional days (washout). After 6 days of washout, the distribution of target syllable pitches did not differ significantly from baseline (Figure 2d; two-sample Kolmogorov–Smirnov test, p=0.606). For analysis of washout across all experiments, see Figure 2—figure supplement 1.
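
The two-sample Kolmogorov–Smirnov test used above compares the empirical cumulative distribution functions (CDFs) of two pitch samples, as plotted in Figure 2d. The Python sketch below computes only the test statistic D, the largest vertical gap between the two empirical CDFs; it is an illustration with hypothetical names and made-up values, not the authors' analysis code, and p-values like those reported here would normally come from a statistics package (for example, SciPy's `scipy.stats.ks_2samp`).

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    Both empirical CDFs are step functions that change only at observed values,
    so evaluating them at every pooled observation is enough to find the maximum.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of sample_a at or below x
        f_b = bisect_right(b, x) / len(b)  # fraction of sample_b at or below x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical pitches (Hz): end-of-training renditions shifted downward.
baseline_hz = [2100.0, 2120.0, 2140.0, 2160.0]
end_of_training_hz = [2060.0, 2080.0, 2100.0, 2120.0]
print(ks_statistic(baseline_hz, end_of_training_hz))  # 0.5
```

D is 0 for identical samples and 1 for samples that do not overlap at all; a downward shift of the whole distribution raises the end-of-training CDF above the baseline CDF.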

To assess whether non-auditory feedback is sufficient to drive vocal learning across multiple songbirds, we first measured the adaptive pitch change (in semitones) for each individual experiment. Semitones provide a normalized measure of pitch change: a change of one semitone corresponds to a roughly 6% change in the absolute frequency of an acoustic signal (see Equation 1). We used a hierarchical bootstrap approach to estimate SEM and assess significance (see ‘Materials and methods’; Saravanan et al., 2020; Saravanan et al., 2019), since this method more accurately quantifies the error in hierarchical data (e.g., many renditions of a target syllable collected across multiple birds). The mean pitch (in semitones) of the target syllables showed a significant, adaptive change from baseline on days 2 and 3 of cutaneous stimulation training (Figure 2e; the probability of resampled mean pitch on cutaneous stimulation training days 2 and 3 being less than or equal to zero was P_boot < 0.0010, limit due to resampling 10⁴ times; n = 13 experiments in 12 birds, as one bird underwent two cutaneous stimulation experiments; Figure 2—source data 1). This demonstrates that non-auditory feedback is sufficient to drive vocal learning in adult songbirds. In every individual experiment in which an upward pitch change reduced the frequency of cutaneous stimulation, the birds changed their pitch in the adaptive (upward) direction, and in every experiment in which a downward pitch change reduced the frequency of cutaneous stimulation, the birds changed their pitch in the adaptive (downward) direction (Figure 2—figure supplement 2a, Figure 2—source data 1).
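
The two quantitative steps in this paragraph can be sketched together in Python. Equation 1 is not reproduced in this section, so the sketch assumes the standard conversion semitones = 12·log2(f / f_baseline), under which one semitone is a frequency ratio of 2^(1/12) ≈ 1.059 (the "roughly 6%" change). The resampling follows the general two-level idea of the hierarchical bootstrap (resample experiments, then renditions within each drawn experiment); averaging per-experiment means, and the function names, are assumptions for illustration, since the authors' exact procedure is given in their Materials and methods.

```python
import math
import random
from typing import Dict, List


def to_semitones(pitch_hz: float, baseline_mean_hz: float) -> float:
    """Pitch relative to the baseline mean, in semitones (assumed form of Equation 1)."""
    return 12.0 * math.log2(pitch_hz / baseline_mean_hz)


def hierarchical_bootstrap_p(
    data: Dict[str, List[float]], n_boot: int = 10_000, seed: int = 0
) -> float:
    """One-sided P_boot: fraction of resampled means that are <= 0.

    data maps each experiment to its pitch changes (semitones) on one day,
    signed so that positive values are adaptive. Level 1 resamples experiments
    with replacement; level 2 resamples renditions within each drawn experiment.
    """
    rng = random.Random(seed)
    experiments = list(data)
    n_le_zero = 0
    for _ in range(n_boot):
        experiment_means = []
        for exp_id in rng.choices(experiments, k=len(experiments)):
            renditions = data[exp_id]
            resampled = rng.choices(renditions, k=len(renditions))
            experiment_means.append(sum(resampled) / len(resampled))
        if sum(experiment_means) / len(experiment_means) <= 0.0:
            n_le_zero += 1
    return n_le_zero / n_boot


print(round(to_semitones(2.13 * 2 ** (1 / 12), 2.13), 6))  # 1.0
# Hypothetical data: every rendition shifted in the adaptive direction.
toy = {"bird_a": [0.4, 0.6, 0.5], "bird_b": [0.2, 0.3, 0.25, 0.35]}
print(hierarchical_bootstrap_p(toy, n_boot=1_000))  # 0.0
```

Resampling at the experiment level first means that a single bird with many renditions cannot dominate the uncertainty estimate, which is the motivation the text gives for using a hierarchical bootstrap.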

To further characterize cutaneous stimulation training and compare this form of learning with well-established vocal learning paradigms, we performed two learning experiments (one cutaneous stimulation and one white noise) in 8 of the 12 birds in this dataset, namely those in which the implanted electrode wires remained intact long enough to complete both (Figure 2a). To account for potential effects of repeated training in the same bird on learning magnitude, we randomized the order of white noise training and cutaneous stimulation training for birds that underwent both paradigms. We also included in this analysis six LMAN sham-operated birds from a later set of experiments, because these birds had intact song systems and underwent both cutaneous stimulation and white noise training. Moreover, learning magnitude at the end of training did not differ significantly between birds that did not undergo craniotomies (for LMAN, 6-OHDA, or sham lesions) and birds that received sham LMAN lesions, for either white noise experiments (two-sample t-test, p=0.779) or cutaneous stimulation experiments (two-sample t-test, p=0.148).

Consistent with prior studies (Ali et al., 2013; Hoffmann et al., 2016; Tumer and Brainard, 2007), by the end of white noise training, the adaptive pitch change (in semitones) across all white noise experiments performed in unoperated birds (birds that had wire electrodes surgically implanted but received no invasive brain procedures such as sham operations) was significantly greater than baseline (zero) (Figure 2—figure supplement 3a; the probability of resampled mean pitch on all three white noise training days being less than or equal to zero was P_boot < 0.0010; Figure 2—source data 1). In the separate experimental group of birds that underwent sham operations, we also observed significant adaptive pitch changes in response to white noise bursts, as expected (Figure 2—figure supplement 3b; the probability of resampled mean pitch on all three white noise training days being less than or equal to zero was P_boot < 0.0010; Figure 2—source data 1). Learning magnitudes (adaptive pitch change at the end of training) varied considerably across individual birds during both cutaneous stimulation and white noise experiments (Figure 2f, Figure 2—source data 1). However, we found no systematic difference between learning magnitude during cutaneous stimulation training and learning magnitude during white noise training (Figure 2f; paired t-test, p=0.313). These results suggest that non-auditory stimuli can drive vocal learning as effectively as auditory stimuli.

To confirm that cutaneous stimulation learning was truly driven by the non-auditory stimulus and not by an unintended, acute change in vocal output caused by the stimulation itself, we measured the syllable features of interleaved ‘catch’ trials, in which cutaneous stimulation was randomly withheld (see ‘Materials and methods’), on each day of cutaneous stimulation training. For each experiment, we normalized the pitch of each catch trial from each day of training to the mean pitch of all trials in which cutaneous stimulation was provided. We excluded any experiments with fewer than 10 catch trials in total. In every case, the normalized catch trials did not differ significantly from 1, indicating that the pitch of catch trials was highly similar to that of trials in which cutaneous stimulation was provided (Figure 2—figure supplement 4a; t-test, 0.071 < p < 0.997 for each experiment; Figure 2—source data 1). For comparison, we also performed the same analysis on randomly selected trials from a day of baseline recording, in which cutaneous stimulation was not provided on any trial (Figure 2—figure supplement 4a, Figure 2—source data 1). There was no significant difference between this dataset and the normalized catch trials (paired t-test, p=0.339). We repeated this analysis for other syllable features, including syllable duration, sound amplitude, and spectral entropy, and in no case did we see a robust, acute change in song performance caused by the cutaneous stimulation. To ensure that cutaneous stimulation on the scalp did not drive learning through an unexpected influence on brain activity in dorsal auditory areas of the pallium, we implanted the wire electrodes in the neck instead of the scalp in 7 of the 12 birds used in these experiments. The magnitude of vocal learning did not differ between the two groups of birds on any day of training (0.679 < P_boot < 0.891).
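
The catch-trial control can be sketched in Python as follows: each catch-trial pitch is divided by the mean pitch of that day's stimulated trials, experiments with fewer than 10 catch trials are excluded, and the normalized values are compared against 1 with a one-sample t statistic. This is an illustration with hypothetical names and made-up values, not the authors' analysis code; it returns the t statistic (with n − 1 degrees of freedom) rather than a p-value.

```python
import math
import statistics
from typing import List, Optional

MIN_CATCH_TRIALS = 10  # experiments with fewer catch trials were excluded


def normalize_to_stimulated(catch_hz: List[float], stimulated_hz: List[float]) -> List[float]:
    """Express each catch-trial pitch as a ratio to the mean stimulated-trial pitch."""
    reference = statistics.fmean(stimulated_hz)
    return [pitch / reference for pitch in catch_hz]


def catch_trial_t(catch_hz: List[float], stimulated_hz: List[float]) -> Optional[float]:
    """One-sample t statistic for H0: mean normalized catch-trial pitch == 1.

    Returns None when the experiment has too few catch trials to be analyzed.
    """
    if len(catch_hz) < MIN_CATCH_TRIALS:
        return None
    ratios = normalize_to_stimulated(catch_hz, stimulated_hz)
    standard_error = statistics.stdev(ratios) / math.sqrt(len(ratios))
    return (statistics.fmean(ratios) - 1.0) / standard_error


# Hypothetical day: catch trials scattered symmetrically around the stimulated mean.
stimulated = [2000.0, 2010.0, 1990.0]
catch = [2000.0 * r for r in (1.0, 1.02, 0.98, 1.01, 0.99, 1.0, 1.03, 0.97, 1.0, 1.0)]
print(abs(catch_trial_t(catch, stimulated)) < 1e-6)  # True
```

A t statistic near zero corresponds to the result reported above: withholding stimulation on a rendition does not change its pitch, so the learning cannot be explained by an acute motor effect of the stimulus.
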
To demonstrate that the ability to learn to adaptively shift the pitch of the target syllable is consistent across individual birds, we have plotted the results of 6 out of the 12 birds used in this dataset in Figure 2—figure supplement 5 and Figure 2—figure supplement 6 (Figure 2—source data 2). Also, we reanalyzed this dataset by measuring syllable pitch from syllables produced in the evening (6 PM to 8 PM) instead of in the morning, and we found no significant difference in the magnitude of learning between song collected between 10 AM and noon and song collected between 6 PM and 8 PM on any day of training (Figure 2—figure supplement 7, 0.167 < P_boot < 0.951 on all days of training; Figure 2—source data 3). Taken together, these results indicate that the gradual, adaptive pitch shift is driven by non-auditory cutaneous stimulation and not by other unintentional effects of the stimulation, the methodology of wire implantation, or the time window of song analysis.

LMAN is required for non-auditory vocal learning

We next investigated the neural circuitry that processes non-auditory feedback to drive vocal learning. To assess whether the AFP is required for non-auditory vocal learning, we measured the effect of lesions of LMAN, the output nucleus of the AFP, on learning magnitude in response to non-auditory feedback (Figure 1b, Experiment 2). We performed cutaneous stimulation training experiments in the same individual birds before and after bilateral electrolytic LMAN lesions (n = 5 birds, plus one additional bird that underwent only postlesion training) or sham operations (Figure 3a, n = 5 birds). For this group of experiments, we used the same cutaneous stimulation protocol described previously, except that we extended training by an additional 2 days. During this extended period, we set a new pitch threshold each morning to drive even greater amounts of learning (‘staircase’ training, see ‘Materials and methods’). We did so in case LMAN lesions differentially affected small and large magnitudes of learning. In adult songbirds with intact song systems (prelesion), such staircase training drove significant amounts of learning (Figure 3c).
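
The staircase rule itself is not spelled out in this section (it is given in Materials and methods), so the following Python sketch is purely hypothetical: it assumes that each morning the threshold is recomputed at the same percentile used in the fixed-threshold phase (20th for downward, 80th for upward training) of the previous day's pitches, so that the cutoff follows the bird as it learns. The function name and this rule are illustrative assumptions only.

```python
import statistics
from typing import List


def staircase_thresholds(daily_pitches_hz: List[List[float]], adaptive_direction: str) -> List[float]:
    """HYPOTHETICAL staircase: element k is the threshold for the morning after day k.

    statistics.quantiles(..., n=5, method="inclusive") returns the 20th, 40th,
    60th, and 80th percentiles using linear interpolation.
    """
    if adaptive_direction not in ("down", "up"):
        raise ValueError("adaptive_direction must be 'up' or 'down'")
    thresholds = []
    for previous_day in daily_pitches_hz:
        q20, _, _, q80 = statistics.quantiles(previous_day, n=5, method="inclusive")
        thresholds.append(q20 if adaptive_direction == "down" else q80)
    return thresholds


# Hypothetical data: a bird whose whole distribution drifts down by 20 Hz per day.
days = [[2000.0 - 20.0 * d + i for i in range(101)] for d in range(3)]
print(staircase_thresholds(days, "down"))  # [2020.0, 2000.0, 1980.0]
```

Under this assumed rule, the fraction of renditions that trigger stimulation is reset to roughly 80% each morning, so the bird must keep shifting pitch to continue escaping.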

Figure 3. — (a) Timeline for electrolytic lesions of LMAN and sham operations. (b) CV of syllable pitch pre- vs. postlesion and pre- vs. postsham. LMAN lesions induced a significant reduction in pitch CV; sham operations did not (paired t-tests, p=0.002 and p=0.911, respectively). (c) Prelesion experiment. Training consisted of 3 days using a fixed pitch threshold, followed by additional days on which the threshold was changed each morning (‘staircase’). Each dot represents the pitch of a rendition of the target syllable. (d) Adaptive pitch change (in semitones) during cutaneous stimulation training (n = 6 LMAN-lesioned birds). Prelesion learning magnitude was significantly greater than baseline (the probability of resampled mean pitch on each day of training being less than or equal to zero was P_boot < 0.0010, indicated by filled circles). Postlesion learning magnitude did not differ significantly from baseline (0.297 < P_boot < 0.660 on each of the final 4 days of training). Prelesion learning magnitude was significantly greater than postlesion learning magnitude (the probability of resampled mean pitch of prelesion data on the final 4 days of training being less than or equal to resampled mean pitch of postlesion data was P_boot < 0.0070, indicated by asterisks). (e) Adaptive pitch change during cutaneous stimulation training (n = 5 sham-operated birds). Learning magnitudes were significantly greater than baseline both pre- and postsham (the probability of resampled mean pitch on each day of training being less than or equal to zero was P_boot < 0.0010, indicated by filled circles). Learning magnitudes pre- vs. postsham did not differ significantly (0.120 < P_boot < 0.524 on all days of training).

Figure 3—source data 1. Source data for analysis in Figure 3B.

elife-75691-fig3-data1.zip (63.4 KB)

Figure 3—source data 2. Source data for analyses in Figure 3C, D.

elife-75691-fig3-data2.zip (699.5 KB)

Figure 3—source data 3. Source data for analysis in Figure 3E.

elife-75691-fig3-data3.zip (798.4 KB)

Figure 3—source code 1. Source code for use with Figure 3—source data 1 for analysis in Figure 3B.

elife-75691-fig3-code1.zip (4.3 KB)

Figure 3—source code 2. Source code for use with Figure 3—source data 2 for analysis in Figure 3C.

elife-75691-fig3-code2.zip (1.6 KB)

Figure 3—source code 3. Source code for use with Figure 3—source data 2 for analysis in Figure 3D.

elife-75691-fig3-code3.zip (14.3 KB)

Figure 3—source code 4. Source code for use with Figure 3—source data 3 for analysis in Figure 3E.

elife-75691-fig3-code4.zip (12.7 KB)

We then lesioned LMAN and performed postlesion white noise training across conditions (LMAN lesion and sham) (Figure 2—figure supplement 3b). The efficacy of LMAN lesions was confirmed both by a characteristic reduction in the trial-to-trial variability of syllable pitch, analyzed across all labeled song syllables produced by the birds (including the target syllable used in learning experiments) (Figure 3b, Figure 3—figure supplement 1a; LMAN lesions p=0.002, sham lesions p=0.911, paired t-tests; Figure 3—source data 1; Kao et al., 2005; Kao and Brainard, 2006; Hampton et al., 2009), and by post-hoc histological measurements (see ‘Materials and methods’ and Figure 3—figure supplement 2). Following LMAN lesions, songbirds did not significantly change the pitch of the target syllable from baseline (zero) during white noise training (the probability of resampled mean pitch on the final 4 days of training being less than or equal to zero was P_boot > 0.223, n = 3). In contrast, following sham lesions, birds significantly changed the pitch of the target syllable in the adaptive direction (the probability of resampled mean pitch on the final 4 days of training being less than or equal to zero was P_boot < 0.0010, n = 5). This indicates that LMAN lesions induced significant deficits in auditory-driven vocal learning, consistent with previous work (Hisey et al., 2018).
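
The variability check used to confirm LMAN lesions can be sketched in Python: the coefficient of variation (CV, standard deviation divided by mean) of pitch is computed for each labeled syllable before and after surgery, and the matched differences are summarized with a paired t statistic. This is an illustrative sketch with hypothetical names and made-up values; treating syllables as the paired units is an assumption, and the function returns the t statistic (df = number of pairs − 1) rather than a p-value.

```python
import math
import statistics
from typing import List


def pitch_cv(renditions_hz: List[float]) -> float:
    """Coefficient of variation of pitch across renditions of one syllable."""
    return statistics.stdev(renditions_hz) / statistics.fmean(renditions_hz)


def paired_t(pre: List[float], post: List[float]) -> float:
    """Paired t statistic for post - pre (pairs assumed to be the same syllables)."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / standard_error


# Hypothetical per-syllable CVs; a lesion-like effect lowers every value.
cv_pre = [0.020, 0.025, 0.018, 0.022]
cv_post = [0.012, 0.016, 0.011, 0.013]
print(paired_t(cv_pre, cv_post) < 0)  # True: variability decreased
```

A negative t statistic reflects reduced trial-to-trial variability after surgery, the signature used above to verify that LMAN lesions were effective.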

LMAN lesions also significantly impaired non-auditory vocal learning. Prelesion, songbirds adaptively changed the pitch of the target syllable away from baseline in response to non-auditory feedback (the probability of resampled mean pitch on each day of cutaneous stimulation training being less than or equal to zero was P_boot < 0.0010) (Figure 3d, Figure 3—source data 2). Postlesion, non-auditory vocal learning was abolished in those same birds (the probability of resampled mean pitch on each of the final 4 days of training being less than or equal to zero was 0.297 < P_boot < 0.660, where 0.025 < P_boot < 0.975 indicates no significant difference; n = 6 birds, one of which did not undergo prelesion experimentation) (Figure 3d, Figure 3—source data 2). Learning magnitude was significantly greater prelesion than postlesion (P_boot < 0.007 on each of the final 4 days of training). In sham-operated birds, we observed significant learning during cutaneous stimulation training both pre- and postsham (Figure 3e; for both presham and postsham datasets, the probability of resampled mean pitch on each day of cutaneous stimulation training being less than or equal to zero was P_boot < 0.0010, n = 5 birds; Figure 3—source data 3). Moreover, learning magnitudes during cutaneous stimulation training did not differ significantly between presham and postsham datasets (the probability of resampled mean pitch of presham data on each day of training being less than or equal to resampled mean pitch of postsham data was 0.120 < P_boot < 0.524). The amount of pitch change during cutaneous stimulation training for each individual experiment is shown in Figure 2—figure supplement 2c, d.
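
The two-sided reading of P_boot used in this paragraph, where 0.025 < P_boot < 0.975 indicates no significant difference, can be made explicit in Python. The sketch assumes that the bootstrap has already produced one resampled mean per iteration for each of two datasets (for example, prelesion and postlesion on the same training day) and that iterations are paired by index; P_boot is then the fraction of iterations in which the first mean is less than or equal to the second. The pairing and the function names are assumptions for illustration.

```python
from typing import Sequence


def p_boot_less_equal(boot_a: Sequence[float], boot_b: Sequence[float]) -> float:
    """Fraction of bootstrap iterations in which resampled mean A <= resampled mean B."""
    if len(boot_a) != len(boot_b) or len(boot_a) == 0:
        raise ValueError("need the same, non-zero number of resamples for A and B")
    return sum(a <= b for a, b in zip(boot_a, boot_b)) / len(boot_a)


def interpret(p_boot: float, alpha: float = 0.05) -> str:
    """Two-sided reading: significant only in the outer alpha/2 tails."""
    if p_boot <= alpha / 2:
        return "A > B"
    if p_boot >= 1 - alpha / 2:
        return "A < B"
    return "no significant difference"


# Hypothetical resampled means (semitones) for one training day.
pre = [0.9, 1.1, 1.0, 1.2, 0.8]
post = [0.1, -0.1, 0.0, 0.2, 0.05]
p = p_boot_less_equal(pre, post)
print(p, interpret(p))  # 0.0 A > B
```

Comparing a single dataset against baseline corresponds to taking every resampled value of B as zero, which recovers the one-sample form of P_boot used elsewhere in this section.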

We also directly compared the lesion-induced change in learning magnitude between conditions (LMAN lesion vs. sham) (Figure 3—figure supplement 1b and c). First, we calculated learning magnitude at the end of the fixed-threshold training period in each condition. The lesion-induced change in learning magnitude (post – pre) was significantly larger in magnitude for LMAN-lesioned birds than for sham-operated birds (Figure 3—figure supplement 1b; two-sample Kolmogorov–Smirnov test, p=0.036, n = 5 birds in both groups). Next, we calculated learning magnitude at the end of the extended ‘staircase’ portion of cutaneous stimulation training. At this time point, the lesion-induced change in learning magnitude (post – pre) was again significantly larger in magnitude for LMAN-lesioned birds than for sham-operated birds (Figure 3—figure supplement 1c; two-sample Kolmogorov–Smirnov test, p=0.004, n = 5 birds in both groups). These results show that LMAN is required for non-auditory vocal learning in adult songbirds, indicating that both auditory and non-auditory sensory feedback engage the AFP to drive adaptive changes to song.

Dopaminergic input to Area X is required for non-auditory vocal learning

We next assessed dopaminergic contributions to non-auditory vocal learning (Figure 1b, Experiment 3). Learning magnitude during cutaneous stimulation training was assessed before and after bilaterally lesioning dopaminergic projections in Area X, the basal ganglia nucleus of the AFP, in individual songbirds (Figure 4a, n = 5 birds). Selective lesions of dopaminergic projections in Area X were performed via bilateral 6-OHDA injections in Area X (see ‘Materials and methods’), and the effectiveness of the 6-OHDA injections at lesioning dopaminergic innervation in Area X was quantified (Figure 4—figure supplement 1). This approach has previously been shown to selectively lesion dopaminergic inputs to Area X without damaging non-dopaminergic cells (Hoffmann et al., 2016; Saravanan et al., 2019).

Figure 4. — (a) Timeline for 6-OHDA and saline (sham) injections into Area X. (b) CV of syllable pitch pre- vs. postlesion and pre- vs. postsham. Neither dopamine lesions nor shams induced significant changes in pitch CV (paired t-tests, p=0.397 and p=0.531, respectively). (c) Adaptive pitch change (in semitones) during cutaneous stimulation training (n = 5 lesioned birds). Prelesion learning magnitude was significantly greater than baseline (the probability that the resampled mean pitch on each of the final 4 days of training was less than or equal to zero was P_boot < 0.010, indicated by filled circles). Postlesion learning magnitude did not significantly differ from baseline except on the final day, when the mean changed in the anti-adaptive direction (P_boot > 0.067 on training days 1–4, P_boot < 0.0010 on training day 5). Prelesion learning magnitude was significantly greater than postlesion learning magnitude (the probability that the resampled mean pitch from the prelesion dataset on each of the final 3 days of training was less than or equal to the resampled mean pitch from the postlesion dataset was P_boot < 0.0010, indicated by asterisks). (d) Adaptive pitch change (in semitones) during cutaneous stimulation training (n = 3 sham-lesioned birds). Learning magnitudes were significantly greater than baseline both pre- and postsham (the probability that the resampled mean pitch from the presham and postsham datasets on each day of training other than day 2 was less than or equal to zero was P_boot < 0.0010, indicated by filled circles). Learning magnitudes pre- vs. postsham did not significantly differ (0.653 < P_boot < 0.931 on all days of training).

Figure 4—source data 1. Source data for analysis in Figure 4B.

elife-75691-fig4-data1.zip (49.7 KB)

Figure 4—source data 2. Source data for analysis in Figure 4C.

elife-75691-fig4-data2.zip (595.9 KB)

Figure 4—source data 3. Source data for analysis in Figure 4D.

elife-75691-fig4-data3.zip (357.7 KB)

Figure 4—source code 1. Source code for use with Figure 4—source data 1 for analysis in Figure 4B.

elife-75691-fig4-code1.zip (4 KB)

Figure 4—source code 2. Source code for use with Figure 4—source data 2 for analysis in Figure 4C.

elife-75691-fig4-code2.zip (14.9 KB)

Figure 4—source code 3. Source code for use with Figure 4—source data 3 for analysis in Figure 4D.

elife-75691-fig4-code3.zip (15.1 KB)

Figure 4—figure supplement 1.

We again measured the variability of syllable pitch pre- and postlesion by calculating syllable CV. Dopaminergic lesions in Area X did not induce a significant change in syllable CV (Figure 4b; paired t-test, p=0.397; Figure 4—source data 1). Sham operations also did not induce a significant change in syllable CV (Figure 4b; paired t-test, p=0.531). The lesion-induced changes in syllable CV (post – pre) did not differ significantly between 6-OHDA-lesioned and sham-lesioned birds (Figure 3—figure supplement 1d; two-sample Kolmogorov–Smirnov test, p=0.054). This finding is consistent with prior work using similar 6-OHDA injections to lesion dopaminergic input to Area X (Hoffmann et al., 2016). Prior work has suggested a link between dopamine in the songbird AFP and the generation of variability in syllable pitch in adult songbirds (Murugan et al., 2013; Sasaki et al., 2006; Leblois et al., 2010). It is likely that the dopamine lesion method we used, which spares about 50% of the dopaminergic input to Area X (Hoffmann et al., 2016), is insufficient to impair dopamine-mediated generation of syllable variability. Because these dopamine lesions did not alter vocal variability, any learning deficits observed following them cannot simply be attributed to decreased pitch variability.
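The CV comparisons above use paired t-tests, with each bird serving as its own control. The sketch below computes the paired t statistic, assuming one pitch-CV value per bird before and after surgery. The function name and the CV values are hypothetical. Turning t into a p-value requires the t distribution with n − 1 degrees of freedom; `scipy.stats.ttest_rel` performs the full test.

```python
import math
import statistics


def paired_t_statistic(pre, post):
    """Return (t, degrees of freedom) for paired samples pre[k], post[k]."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Per-bird differences (post - pre); the test asks whether their mean is 0.
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical pitch CVs (%) for five birds before and after a lesion.
cv_pre = [1.8, 2.1, 1.6, 2.4, 1.9]
cv_post = [1.7, 2.2, 1.5, 2.3, 2.0]
t_stat, dof = paired_t_statistic(cv_pre, cv_post)
```

Pairing removes the large between-bird differences in baseline CV, so the test is sensitive only to the within-bird change.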

Depletion of dopaminergic input to Area X significantly impaired adaptive, non-auditory vocal learning. Prelesion, songbirds adaptively changed the pitch of the target syllable during cutaneous stimulation training (the probability that the resampled mean pitch on each of the final 4 days of cutaneous stimulation training was less than or equal to zero was P_boot < 0.010) (Figure 4c, Figure 4—source data 2). Postlesion, these same birds were unable to adaptively change the pitch of the target syllable during cutaneous stimulation training (the probability that the resampled mean pitch on each of the first 4 days of training was less than or equal to zero was 0.067 < P_boot < 0.553; the probability that the resampled mean pitch on the final day of training was greater than or equal to zero was P_boot < 0.0010, n = 5 birds). Learning magnitude was significantly greater prelesion than postlesion (the probability that the resampled mean pitch from the prelesion dataset on each of the final 3 days of cutaneous stimulation training was less than or equal to the resampled mean pitch from the postlesion dataset was P_boot < 0.0010). Both pre- and postsham, songbirds displayed significant learning during cutaneous stimulation training (Figure 4d; the probability that the resampled mean pitch from the presham dataset on each day other than day 2 of cutaneous stimulation training was less than or equal to zero was P_boot < 0.0010; the probability that the resampled mean pitch from the postsham dataset on each day of cutaneous stimulation training was less than or equal to zero was P_boot < 0.0010, n = 3 birds; Figure 4—source data 3). In addition, learning magnitudes during cutaneous stimulation training did not differ significantly pre- vs. postsham (the probability that the resampled mean pitch of presham data on each day of training was less than or equal to the resampled mean pitch of postsham data was 0.653 < P_boot < 0.931).
These results demonstrate that dopaminergic input to Area X is required for adaptive changes in vocal output in response to non-auditory signals. The amount of pitch change during cutaneous stimulation training for each individual experiment is shown in Figure 2—figure supplement 2d and e.
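The P_boot values above are tail probabilities of bootstrap-resampled means. The sketch below shows one simple way such probabilities could be computed. It assumes a plain, non-hierarchical bootstrap over per-bird adaptive pitch changes (in semitones) on a single training day. The authors' actual resampling scheme is not described in this section, including the number of resamples and whether renditions within birds are also resampled. The function names and data are hypothetical.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def p_boot_mean_le_zero(values, n_boot=10_000, seed=0):
    """Fraction of bootstrap resamples whose mean is <= 0.

    A small value means the mean adaptive pitch change is reliably above zero.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        resample = rng.choices(values, k=len(values))  # draw with replacement
        if _mean(resample) <= 0:
            hits += 1
    return hits / n_boot


def p_boot_pre_le_post(pre, post, n_boot=10_000, seed=0):
    """Fraction of iterations in which the resampled pre mean is <= the resampled post mean.

    A small value means prelesion learning reliably exceeds postlesion learning.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        m_pre = _mean(rng.choices(pre, k=len(pre)))
        m_post = _mean(rng.choices(post, k=len(post)))
        if m_pre <= m_post:
            hits += 1
    return hits / n_boot


# Hypothetical adaptive pitch changes (semitones) for five birds on one training day.
pre_day = [0.35, 0.52, 0.28, 0.61, 0.44]
post_day = [0.05, -0.10, 0.12, -0.02, 0.01]
p_learning = p_boot_mean_le_zero(pre_day)
p_impaired = p_boot_pre_le_post(pre_day, post_day)
```

With 10,000 resamples the smallest nonzero probability this sketch can report is 1e-4, so a value of exactly 0 should be read as "below the resolution of the bootstrap", not as a true zero.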

Discussion

Our results demonstrate that non-auditory feedback can drive vocal learning in adult songbirds, and that the AFP and its dopaminergic inputs are required for non-auditory vocal learning. We first demonstrated that adult songbirds learn to adaptively change the pitch of their song syllables in response to cutaneous stimulation (Figure 1b, Experiment 1). We next demonstrated that LMAN, the output nucleus of the AFP, is necessary for the expression of this non-auditory vocal learning (Figure 1b, Experiment 2). Finally, we showed that dopaminergic input to Area X, the basal ganglia nucleus of the AFP, is necessary for non-auditory vocal learning (Figure 1b, Experiment 3). These results show that adult vocal learning does not depend solely on auditory feedback, and that the songbird AFP is not specialized exclusively for processing auditory feedback for vocal learning, as has previously been hypothesized (Murdoch et al., 2018). Instead, these results, in conjunction with prior work using visual cues to drive changes in vocal output (Zai et al., 2020), indicate that the AFP processes both auditory and non-auditory feedback to drive vocal learning. Our results further show that the AFP processes somatosensory information to guide vocal learning, and that dopaminergic neural circuitry is necessary for non-auditory learning. Prior work has shown that songbirds use somatosensory feedback to compensate for experimentally induced changes in respiratory pressure during song performance (Suthers et al., 2002). The finding that the AFP underlies vocal learning driven by somatosensory signals (cutaneous stimulation) suggests that it could also process somatosensory information from vocal muscles to guide song performance.
Moreover, although mild cutaneous stimulation differs from the direct proprioceptive feedback generated by vocal muscles or other vocal effectors, the AFP is still required for vocal learning driven by it. This suggests that the AFP can integrate sensory information from a wide variety of sources, including sources not directly produced by vocalization.

Our findings highlight the importance of neural pathways that convey non-auditory sensory signals to the song system. The neuroanatomical pathways by which auditory feedback enters the AFP are well characterized. For example, recent work has demonstrated that songbird ventral pallidum (VP) receives input from auditory cortical areas, encodes auditory feedback information, and projects to VTA (Chen et al., 2019). This represents a likely pathway by which sensory information from white noise bursts could influence neural activity in VTA, which could then drive changes in the AFP that promote song learning. Comparatively less is known about the pathways in the songbird brain that might carry sensory information from cutaneous stimulation to the AFP. Our finding that dopaminergic input to Area X (which originates in the VTA) is necessary for non-auditory vocal learning suggests that pathways carrying non-auditory information ultimately project to the VTA, where this information could be encoded and transmitted to the AFP to drive learning. Further work is necessary to fully reveal the role the dopaminergic system plays in guiding non-auditory vocal learning. Following dopamine depletions in Area X, songbirds displayed a small but significant anti-adaptive change in syllable pitch on the final day of training. We believe this finding should be treated with caution because two of the four postlesion experiments in this dataset had to be stopped earlier than expected due to pandemic-related disruptions, and the extent to which this issue contributed to the change in significance on day 5 is unclear. Follow-up studies should aim to confirm this result in additional songbirds.

Prior studies have hinted that non-auditory feedback may play an important role in shaping vocalizations in ethological contexts, particularly during development. For example, juvenile songbirds that receive both auditory and visual feedback from live tutors copy tutor songs more accurately than juvenile songbirds that receive only auditory feedback from their tutors (Chen et al., 2016). Also, visual displays from adult song tutors positively reinforce the acquisition of specific song elements in juvenile songbirds (West and King, 1988), further suggesting an important role for visual signals in social interactions during song learning. Our finding that cutaneous stimulation can drive adaptive vocal changes in adult songbirds demonstrates that non-auditory signals, even in the absence of any social cues or other reinforcing sensory signals, can drive vocal learning with effectiveness similar to that of auditory feedback. Further, our work suggests that the AFP might play a role in processing non-auditory sensory information important to other social behaviors that involve vocal communication, such as courtship, territorial displays, and pair bonding.

It has been hypothesized that a key function of the songbird AFP circuitry is to encode auditory performance error: the evaluation of the match between the auditory feedback the songbirds receive and their internal goal for what their song should sound like (based on their stored memory of the tutor song template) (Sober and Brainard, 2009; Saravanan et al., 2019; Fee and Goldberg, 2011; Gadagkar et al., 2016). It has been difficult to determine the extent to which distorted auditory feedback drives adaptive changes in vocal output because the stimulus is aversive, rather than because the bird interprets it as an auditory performance error. Some auditory vocal learning experiments have provided white noise bursts during ongoing song performance. In these experiments, songbirds adaptively modify their vocal output so that they trigger white noise bursts less frequently (Hoffmann et al., 2016; Tumer and Brainard, 2007; Charlesworth et al., 2011). White noise bursts also often interrupt song at first, suggesting that they startle the birds (Hoffmann et al., 2016; Tumer and Brainard, 2007). Other experiments have played distorted segments of song syllables during song performance (distorted auditory feedback) and found that these elicit a pattern of activity in dopaminergic neurons consistent with the encoding of performance error (Gadagkar et al., 2016). Importantly, when bursts of noise are provided in non-vocal contexts, such as when a songbird stands on a particular perch (not during song performance), they can positively reinforce place preference (Murdoch et al., 2018). Thus, because of differences in experimental methodology and the inherent difficulty of measuring how aversive the auditory stimuli are, it is unclear whether white noise bursts drive learning because the birds register the noise as a performance error or because the noise is generally aversive.
Although the results of the experiments described here do not directly address this, they do show that cutaneous stimulation (an explicit, external, aversive sensory stimulus) is sufficient to drive vocal learning. That the AFP underlies non-auditory learning suggests that the AFP does not solely encode auditory performance error. Instead, the AFP may encode more general information about whether vocal performance resulted in a ‘good’ or ‘bad’ outcome, and it may use this information to drive changes to future motor output.

The numerous analogies between the specialized vocal learning neural circuits that have evolved in songbirds and in humans suggest that our findings may be relevant to understanding the neural circuit mechanisms underlying human speech (Doupe and Kuhl, 1999; Jarvis, 2019; Brainard and Doupe, 2002; Brainard and Doupe, 2013). Human speech depends on both auditory and non-auditory sensory information to guide learning, yet very little is known about the neural mechanisms for non-auditory vocal learning (Goldstein et al., 2003; Locke and Snow, 2010; Kuhl, 2007). Our findings show that specialized vocal learning circuitry in songbirds processes non-auditory information to drive vocal plasticity. We suggest that the analogous vocal circuitry in humans may also underlie non-auditory vocal learning. This neural circuitry in humans may underlie the processing of multimodal sensory signals during social interactions that modulate speech learning (Goldstein et al., 2003; Locke and Snow, 2010; Kuhl, 2007), or the non-auditory, somatosensory feedback from vocal effectors during speech production (Tremblay et al., 2003).

Materials and methods

All subjects were adult (>100 days old) male Bengalese finches (Lonchura striata var. domestica). All procedures were approved by Emory University’s Institutional Animal Care and Use Committee (protocol #201700359). All singing was undirected (in the absence of a female bird) throughout all experiments.

Delivery of non-auditory sensory feedback

To deliver non-auditory feedback signals to freely behaving songbirds during ongoing song performance, we first performed a surgery prior to any experimentation. Stainless steel wires with 2–4 mm of insulation removed at the tip were implanted subcutaneously on the bird's scalp. The approximate location of the scalp electrodes was 4.47 mm lateral and 6.3 mm anterior relative to Y₀, well away from the coordinates used for targeting auditory pallium (1.1 mm anterior and 0.7 mm lateral relative to Y₀, and 1.5 mm ventral from the surface of the brain; Spool et al., 2021). In 7 of the 28 birds used across all experiments, wires were implanted intramuscularly in the neck instead of on the scalp. The wires were soldered onto a custom-made circuit board that was attached to the bird's skull with dental cement during surgery. The circuit was connected to an electric stimulator (A-M Systems Isolated Pulse Stimulator), which delivered pitch-contingent electrical current through the implanted wires. We set the duration of cutaneous stimulation to 50 ms, long enough to overlap with a large portion of the targeted syllable yet short enough to avoid interfering with subsequent song syllables. We typically set the current amplitude to 100–350 μA, which is behaviorally salient (the first few instances of cutaneous stimulation interrupt song) yet subtle enough not to produce any body movements or signs of distress. Importantly, we tried to match the level of behavioral salience across birds; although the current amplitude varied considerably across individual birds, it varied only slightly (<20 μA) pre- vs. postlesion within each bird in the lesion experiments. Stimulations typically occurred within 20–30 ms of target syllable onset.
Acute effects of electrical cutaneous stimulation on song structure, such as pitch, sound amplitude, entropy, or syllable sequence, were assessed to ensure these non-auditory stimuli produced no immediate, systematic, acoustic effects. This ensures that any observed gradual changes to song structure in response to cutaneous stimulation are due to non-auditory learning.

Vocal learning paradigm and song analysis

Experimental testing of vocal learning was performed by driving adaptive changes in the fundamental frequency (pitch) of song syllables. To do so, we delivered pitch-contingent, non-auditory feedback (mild cutaneous electrical stimulation) to freely behaving songbirds in real time during song performance. We followed the same experimental protocols as experiments using white noise feedback to drive vocal learning (Hoffmann et al., 2016; Tumer and Brainard, 2007), except we used cutaneous stimulation instead of white noise bursts. After surgically implanting the fine-wire electrodes, we recorded song continuously for 3 days without providing any experimental feedback (cutaneous stimulation or white noise bursts). We refer to this period as ‘baseline’ (Figure 2a).

On the last (third) day of baseline, we measured the pitch of every rendition of the target syllable sung between 10 AM and 12 PM. We set a fixed pitch threshold based on the distribution of these pitches, such that we would provide sensory feedback only when the pitch of a rendition of the target syllable was above the 20th percentile of the baseline distribution (‘hit’), and all renditions outside of this range did not trigger any feedback (‘escape’). In this case, an adaptive vocal change would therefore be to change the pitch of the target syllable down, thereby decreasing the frequency of triggering cutaneous stimulation. In other experiments, we triggered feedback on all renditions below the 80th percentile of the baseline pitch distribution. In this case, an adaptive vocal change would be to change the pitch of the target syllable up. For each experiment, we randomly selected which of these two contingencies we employed so we could assess bidirectional adaptations in vocal motor output. In a subset of experiments, we used the 90th percentile and 10th percentile pitch values to set the pitch threshold. Importantly, we also randomly withheld triggering feedback on 10% of syllable renditions, regardless of syllable pitch or the experimental pitch contingency. This allows us to compare syllable renditions that did or did not result in cutaneous stimulation to assess any acute effects of this form of feedback on syllable structure.
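The fixed-threshold contingency can be sketched as follows. This is a hypothetical offline illustration, not the authors' LabVIEW implementation. It makes three assumptions: pitch values are in Hz for the target syllable, percentiles use linear interpolation (the exact percentile definition used is not stated), and each rendition independently has a 10% chance of being a catch trial. All function names are hypothetical.

```python
import random


def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def make_contingency(baseline_pitches, direction):
    """Return (threshold, is_hit) for a fixed pitch threshold.

    direction="down": hits are renditions above the 20th percentile, so lowering pitch is adaptive.
    direction="up":   hits are renditions below the 80th percentile, so raising pitch is adaptive.
    """
    if direction == "down":
        threshold = percentile(baseline_pitches, 20)
        return threshold, lambda pitch: pitch > threshold
    if direction == "up":
        threshold = percentile(baseline_pitches, 80)
        return threshold, lambda pitch: pitch < threshold
    raise ValueError("direction must be 'up' or 'down'")


def should_stimulate(pitch, is_hit, rng, catch_fraction=0.10):
    """Decide whether to trigger feedback for one rendition of the target syllable."""
    if rng.random() < catch_fraction:
        return False  # catch trial: feedback withheld regardless of pitch
    return is_hit(pitch)


# Hypothetical baseline pitches (Hz), standing in for 10 AM-12 PM on the last baseline day.
rng = random.Random(1)
baseline = [rng.gauss(600.0, 8.0) for _ in range(300)]
threshold, is_hit = make_contingency(baseline, "down")
triggered = [should_stimulate(p, is_hit, rng) for p in baseline]
```

Because roughly 80% of baseline renditions fall in the "hit" range, the bird initially receives frequent feedback, and shifting pitch in the adaptive direction is the only way to reduce it.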

At 10 AM on the fourth day of continuous song recording, we began providing pitch-contingent cutaneous stimulation in real time, targeted to specific song syllables sung within a specified range of pitches. We refer to this time period as ‘cutaneous stimulation training’ (Figure 2a). We used custom LabVIEW software to continuously record song, monitor song for specific elements indicative of the performance of the target syllable, perform online, rapid pitch calculation, and trigger feedback in real time. The computers running this software were connected to an electric stimulator. When the electric stimulator received input from the LabVIEW software, it would then trigger a 50 ms burst of electric current through the implanted wire electrodes. During cutaneous stimulation training, we continuously recorded song and provided pitch-contingent cutaneous stimulation at the set fixed pitch threshold for 3 days. During these 3 days, every time the bird sang within the ‘hit’ range, a mild cutaneous stimulation was immediately triggered.

After 3 days of cutaneous stimulation training, we stopped providing cutaneous stimulation but continued recording unperturbed song for six additional days. We refer to this period as 'washout' (Figure 2a). During washout, we consistently observed spontaneous restoration of pitch to baseline across all experiments, consistent with results from numerous white noise learning experiments (Hoffmann et al., 2016; Andalman and Fee, 2009; Tumer and Brainard, 2007). This allows multiple experiments to be performed from similar baseline conditions in the same individual songbird.

In 14 out of all 28 birds used throughout this study, we performed both white noise training and cutaneous stimulation training in the same individual birds (Figure 2a). After the end of cutaneous stimulation training and 6 days of washout (when the pitch of the target syllable had restored to baseline levels), we performed the exact same experimental protocol, but we used white noise feedback instead of cutaneous stimulation. We could then compare learning in response to two different sources of sensory feedback in the same individual subject. We also sometimes reversed the order of experimentation by performing white noise experiments first and cutaneous stimulation experiments second. The order of experimentation was randomly decided for each songbird before beginning any white noise or cutaneous stimulation training.

For all LMAN lesion (Figure 3a) and 6-OHDA lesion experiments (Figure 4a), we performed a cutaneous stimulation training experiment prelesion. After 6 days of washout, we then performed surgery to lesion the neural circuit of interest. We then performed another cutaneous stimulation experiment in the same individual bird using the exact same protocol we used prelesion. For all of these lesion cutaneous stimulation experiments, we used the aforementioned cutaneous stimulation training paradigm, with one alteration: we extended the number of days of cutaneous stimulation training and introduced a new method for setting the pitch threshold on these extended days of training. We still set a fixed pitch threshold based on analysis of the pitch distribution from the final day of baseline and performed 3 days of cutaneous stimulation training using this fixed pitch threshold. We refer to this portion of the lesion experiments as 'fixed' because the pitch threshold for determining whether a cutaneous stimulation was provided remained the same for all 3 days. Rather than stopping cutaneous stimulation training at this point, we continued providing pitch-contingent cutaneous stimulation for an additional 1–5 days. In the morning (at 10 AM) on each of these extended days of cutaneous stimulation training, we changed the pitch threshold to the 20th or 80th percentile (consistent with the initial contingency) of the pitch distribution of all renditions of the target syllable sung between 8 AM and 9:30 AM on that same day. As the bird changed the pitch of the target syllable in the adaptive direction, the new pitch thresholds were set progressively further in the adaptive direction to drive greater amounts of learning. We refer to these additional days as 'staircase.' After 1–5 days of staircase training, we stopped providing cutaneous stimulation and began the washout portion of the experiment.
We used this experimental approach for both prelesion and postlesion experiments in our LMAN, 6-OHDA, and sham datasets. Importantly, although the number of days of staircase varied between individual birds, for each individual bird we matched the same number of prelesion days of staircase and postlesion days of staircase to ensure that, in both experimental conditions, the bird had an equivalent amount of time and opportunity to learn.
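The staircase rule recomputes the threshold each training morning from that day's early renditions. The sketch below uses the same linear-interpolation percentile as a stand-in for whatever percentile definition the authors used. Its example bird is hypothetical: the morning (8 to 9:30 AM) pitch distribution drifts down by 5 Hz per day, and the threshold follows it in the adaptive direction.

```python
def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def staircase_threshold(morning_pitches, direction):
    """Threshold for one staircase day, from that day's 8-9:30 AM renditions.

    direction="down": hits are renditions above the returned (20th percentile) value.
    direction="up":   hits are renditions below the returned (80th percentile) value.
    """
    if direction == "down":
        return percentile(morning_pitches, 20)
    if direction == "up":
        return percentile(morning_pitches, 80)
    raise ValueError("direction must be 'up' or 'down'")


# Hypothetical bird: 41 morning renditions spanning 580-620 Hz, shifting down 5 Hz per day.
base = [600 + k for k in range(-20, 21)]
thresholds = [staircase_threshold([p - 5 * day for p in base], "down") for day in (1, 2, 3)]
# Each day's threshold tracks the bird's new distribution, moving 5 Hz lower per day.
```

Re-anchoring the threshold to the current distribution keeps the fraction of "hit" renditions roughly constant, so learning pressure does not fade as the bird adapts.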

Custom-written MATLAB software (The MathWorks) was used for song analysis. On each day of every experiment, we quantified important song features, such as the pitch, sound amplitude, and spectral entropy, of all renditions of the targeted syllable produced between 10 AM and 12 PM. We did so to account for potential circadian effects on song production. We also reassessed our results (shown in Figure 2) by analyzing only syllable renditions produced between 6 PM and 8 PM using new methods for automated labeling of song syllables (Cohen et al., 2022). We found no statistically significant difference in learning magnitudes between the two forms of analysis (Figure 2—figure supplement 7a, 0.167 < P_boot < 0.951 on all days of training). To keep the number of target syllable renditions measured on each day of an experiment consistent, and to obtain the minimum number of renditions needed for an accurate measure of average syllable pitch, we checked that at least 30 renditions of the target syllable were sung within the 10 AM to 12 PM window. If there were fewer than 30 renditions of the target syllable, we extended the time window for song analysis by 1 hr in both directions (9 AM to 1 PM) and then checked again for at least 30 syllable renditions. If there were still fewer than 30, we continued extending the time window by 1 hr in both directions until that day's dataset contained at least 30 renditions of the target syllable. Daily targeting sensitivity (hit rate) and precision (1 – false-positive rate) were measured in all experiments to ensure accurate targeting of the specific target syllable (and not accidental targeting of other song syllables). During the pitch-contingent feedback portion of the experiment, a subset (10%) of randomly selected target syllables did not trigger feedback, regardless of syllable pitch.
These ‘catch trials’ allowed for the quantification and comparison of syllable features, such as pitch, sound amplitude, and entropy between trials when feedback was provided and trials when feedback was not provided. Pitch changes were quantified in units of semitones as follows:

$$ s = 12 \log_{2}\left(\frac{h}{b}\right) \qquad (1) $$

where s is the pitch change (in semitones) of the syllable, h is the average pitch (in Hertz) of the syllable, and b is the average baseline pitch (in Hertz) of the syllable.
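The daily measurement can be sketched end to end: select renditions in the analysis window (widening it by 1 hr on each side until it holds at least 30 renditions), then apply Equation (1). The sketch rests on several assumptions not stated in the text: rendition onset times are given in hours since midnight, the window is half-open, and widening stops once the whole day is covered. It also assumes that "adaptive pitch change" flips the sign for downward contingencies so that positive values are adaptive. Function names and data are hypothetical.

```python
import math
import statistics


def analysis_window(onset_hours, min_n=30, start=10.0, end=12.0):
    """Widen [start, end) by 1 h on each side until it holds at least min_n renditions.

    onset_hours: onset times of target-syllable renditions, in hours since midnight.
    Returns (start, end, indices of renditions inside the final window).
    """
    while True:
        idx = [k for k, t in enumerate(onset_hours) if start <= t < end]
        if len(idx) >= min_n or (start <= 0.0 and end >= 24.0):
            return start, end, idx
        start, end = max(start - 1.0, 0.0), min(end + 1.0, 24.0)


def semitone_change(day_pitches_hz, baseline_pitches_hz):
    """Equation (1): s = 12 * log2(h / b), with h and b the mean pitches in Hz."""
    h = statistics.mean(day_pitches_hz)
    b = statistics.mean(baseline_pitches_hz)
    return 12 * math.log2(h / b)


def adaptive_change(s, direction):
    """Flip the sign for 'down' contingencies so positive means adaptive (assumed convention)."""
    return -s if direction == "down" else s


# Hypothetical day: 10 renditions between 10 AM and 12 PM, 25 more between 9 and 10 AM.
onsets = [10.5] * 10 + [9.5] * 25
pitches = [590.0] * len(onsets)
start, end, idx = analysis_window(onsets)  # widened once, to 9 AM-1 PM
s = semitone_change([pitches[k] for k in idx], [600.0] * 40)
adaptive = adaptive_change(s, "down")  # a 10 Hz drop from 600 Hz is about 0.29 semitones
```

Expressing change in semitones makes shifts comparable across syllables with different baseline pitches: a doubling of pitch is always 12 semitones, whatever the starting frequency.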

Analysis of variability in syllable pitch

We compared pitch variability pre- and postlesion using methods described in prior literature (Kao et al., 2005; Kao and Brainard, 2006; Hampton et al., 2009). We analyzed all song renditions (within the 10 AM to 12 PM time window) performed on the final day of baseline prelesion and on the final day of baseline postlesion. We did so in both our LMAN lesion experimental group and our 6-OHDA lesion experimental group. To measure the variability in pitch of the song syllables, we calculated the coefficient of variation (CV) for the pitch of each syllable using the following formula: CV = (Standard Deviation/Mean) * 100.
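The CV formula above translates directly into code. This minimal sketch assumes the sample standard deviation (n − 1 denominator), since the text does not say which form was used. It also assumes pitches are grouped by syllable label in a dictionary. The function names and syllable labels are hypothetical.

```python
import statistics


def pitch_cv(pitches):
    """Coefficient of variation of pitch, in percent: 100 * SD / mean."""
    return statistics.stdev(pitches) / statistics.mean(pitches) * 100


def cv_by_syllable(pitches_by_syllable):
    """Map each syllable label to the CV of its rendition pitches."""
    return {label: pitch_cv(p) for label, p in pitches_by_syllable.items()}


# Hypothetical final-baseline-day pitches (Hz) for two syllables.
cvs = cv_by_syllable({"a": [590.0, 600.0, 610.0], "b": [1480.0, 1500.0, 1520.0]})
```

Normalizing by the mean lets variability be compared across syllables whose absolute pitches differ several-fold.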

LMAN lesions

Birds were anesthetized under ketamine and midazolam and were mounted in a stereotax. The beak angle was set to 20° relative to the surface level of the surgery table. For stereotactic targeting of specific brain regions (in this case, LMAN), anterior-posterior (AP) and medial-lateral (ML) coordinates were found relative to Y₀, a visible anatomical landmark located at the posterior boundary of the central venous sinus in songbirds. Dorsal-ventral (DV) coordinates were measured relative to the surface of the brain. Bilateral craniotomies were made at the approximate AP coordinates 4.9–5.7 mm and ML coordinates 1.5–2.5 mm. A lesioning electrode was then inserted 1.9–2.1 mm below the brain surface. These stereotactic coordinates targeted locations within LMAN. We then passed 100 μA of current for 60–90 s at 5–6 locations in LMAN in both hemispheres in order to electrolytically lesion the areas. This methodology was based on prior work involving LMAN lesions and LMAN inactivations (Ali et al., 2013; Andalman and Fee, 2009; Kao et al., 2005; Kao and Brainard, 2006; Hampton et al., 2009; Warren et al., 2011). In sham-operated birds, we instead performed small lesions in brain areas dorsal to LMAN. Again, this was consistent with methodology from prior studies (Ali et al., 2013; Kao et al., 2005; Kao and Brainard, 2006).

Birds recovered within 2 hr of surgery and began singing normally (at least 30 renditions of target syllable within 2 hr) typically 3–8 days after surgery. The number of songs sung per day did not differ significantly pre- vs. postlesion (paired t-test, p=0.249).

Behavioral measures indicated that LMAN was effectively lesioned in the birds in the LMAN lesion dataset. LMAN lesions in adult songbirds produce a significant decrease in the trial-to-trial variability of song syllable pitch (Kao et al., 2005; Kao and Brainard, 2006; Hampton et al., 2009). To assess lesion-induced changes in the variability of syllable pitch between conditions (LMAN lesion and sham), we calculated the CV of syllable pitch pre- and postlesion. We found that LMAN lesions induced a significant decrease in pitch CV (Figure 3b; paired t-test). Sham operations did not induce a significant change in syllable CV (Figure 3b; paired t-test, p=0.911). The lesion-induced changes in syllable CV (post – pre) were significantly larger in magnitude than the changes in sham-lesioned controls (Figure 3—figure supplement 1a; two-sample Kolmogorov–Smirnov test, p=0.003).

Lesions were confirmed histologically with cresyl violet (Nissl) staining, which labels neuronal cell bodies, after behavioral experiments were complete (Figure 3—figure supplement 2a). In tissue from sham-operated birds, we identified Area X and LMAN by their denser staining and by well-characterized anatomical landmarks (Karten et al., 2013). Our histological methods followed previous studies involving LMAN lesions (Ali et al., 2013; Kao et al., 2005). We then calculated the optical density (OD) ratio of the region containing LMAN to background (a pallial region outside LMAN) (Figure 3—figure supplement 2b; Hoffmann et al., 2016; Saravanan et al., 2019). OD ratios from LMAN-lesioned tissue were significantly lower than those from sham-operated tissue (Figure 3—figure supplement 2c; two-sample Kolmogorov–Smirnov test, p<0.001), suggesting that electrolytic lesions reduced the density of neuronal cell bodies within LMAN relative to sham operations. Following a prior study (Ali et al., 2013), we also visually assessed each tissue section to estimate how much of LMAN remained intact. In all LMAN-lesioned birds, 80–100% of LMAN was lesioned in both hemispheres.

6-OHDA lesions

Birds were anesthetized with ketamine and midazolam and mounted in a stereotax, with the beak angle set to 20° relative to the surface of the surgery table. Isoflurane was used during the later hours of surgery to maintain anesthesia. Bilateral craniotomies were made above Area X at approximate AP coordinates 4.5–6.5 mm and ML coordinates 0.75–2.3 mm relative to Y₀.

In each hemisphere, we inserted a glass pipette containing 6-OHDA solution and made 12 pressure injections in a 3 × 4 grid spanning AP coordinates 5.1–6.3 mm and ML coordinates 0.9–2.2 mm, at a DV coordinate of 3.18 mm relative to Y₀. To lesion the most medial portion of Area X, additional bilateral 6-OHDA injections were made at AP 4.8 mm, ML ±0.8 mm, and DV 2.6 mm from the brain surface. Each injection consisted of 13.8 nL of 6-OHDA solution delivered at 23 nL/s. The pipette was kept in place for 30 s after each injection and was then slowly removed. The 6-OHDA solution was prepared from 11.76 mg 6-OHDA-HBr and 2 mg ascorbic acid in 1 mL of 0.9% normal saline and was protected from light after preparation to prevent oxidation. In sham-operated birds, we performed the same surgical procedures but injected saline into Area X instead of 6-OHDA, consistent with prior studies (Hoffmann et al., 2016; Saravanan et al., 2019).
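The injection parameters above imply a few derived quantities that are easy to check. The sketch below assumes, hypothetically, that the 12 grid sites are evenly spaced in 3 rows along the AP axis and 4 columns along the ML axis, and that the medial injection adds one site per hemisphere; the even spacing and the AP/ML orientation are assumptions. The dose arithmetic uses the identity 1 mg/mL = 1 ng/nL, so 13.8 nL of an 11.76 mg/mL solution contains about 162 ng (0.16 μg) of 6-OHDA-HBr, and each injection lasts 13.8 / 23 = 0.6 s.

```python
def grid_sites(ap_range_mm, ml_range_mm, n_ap=3, n_ml=4):
    """Evenly spaced (AP, ML) injection sites relative to Y0.

    Even spacing and the 3 (AP) x 4 (ML) orientation are assumptions.
    """
    ap_lo, ap_hi = ap_range_mm
    ml_lo, ml_hi = ml_range_mm
    aps = [ap_lo + i * (ap_hi - ap_lo) / (n_ap - 1) for i in range(n_ap)]
    mls = [ml_lo + j * (ml_hi - ml_lo) / (n_ml - 1) for j in range(n_ml)]
    # Round to micrometre precision to hide floating-point noise.
    return [(round(ap, 3), round(ml, 3)) for ap in aps for ml in mls]


def dose_per_injection_ng(volume_nl, concentration_mg_per_ml):
    """Mass delivered per injection; 1 mg/mL is exactly 1 ng/nL."""
    return volume_nl * concentration_mg_per_ml


def injection_duration_s(volume_nl, rate_nl_per_s):
    """Time needed to deliver one injection at a constant rate."""
    return volume_nl / rate_nl_per_s


VOLUME_NL = 13.8                    # volume per injection
RATE_NL_PER_S = 23.0                # injection rate
CONC_MG_PER_ML = 11.76              # 6-OHDA-HBr in 1 mL saline
INJECTIONS_PER_HEMISPHERE = 12 + 1  # grid sites + one medial site (assumed)

sites = grid_sites((5.1, 6.3), (0.9, 2.2))
per_injection_ng = dose_per_injection_ng(VOLUME_NL, CONC_MG_PER_ML)
per_hemisphere_nl = INJECTIONS_PER_HEMISPHERE * VOLUME_NL
per_hemisphere_ng = INJECTIONS_PER_HEMISPHERE * per_injection_ng
```

Under these assumptions each hemisphere receives about 179 nL of solution containing roughly 2.1 μg of 6-OHDA-HBr.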

To confirm that the 6-OHDA injections lesioned dopaminergic input to Area X, we quantified the reduction in catecholaminergic fiber innervation within Area X after completing the behavioral experiments in each bird (Hoffmann et al., 2016; Saravanan et al., 2019). To visualize dopaminergic innervation, we labeled tissue for tyrosine hydroxylase (TH), a common marker of catecholaminergic cells (Figure 4—figure supplement 1a). To determine whether the density of dopaminergic fibers in Area X had decreased, we measured the optical density (OD) ratio: the stain density of Area X divided by the stain density of the surrounding striatum. OD ratios from individual 6-OHDA-lesioned brains were lower than those from controls (Figure 4—figure supplement 1b), and the distribution of all OD ratios from 6-OHDA-lesioned tissue was significantly lower than that from sham-operated birds (Figure 4—figure supplement 1c; two-sample Kolmogorov–Smirnov test, p<0.001). These results are similar to previous reports that used 6-OHDA injections to lesion dopaminergic input to Area X (Hoffmann et al., 2016; Saravanan et al., 2019) and indicate that the injections successfully lesioned this input.

Lesion size was quantified by determining the proportion of 6-OHDA-lesioned tissue that had an OD ratio of Area X to non-X striatum that was less than the fifth percentile of OD ratios in sham tissue. There was not a significant correlation between lesion size and the lesion-induced change in learning magnitude (post – pre) (Figure 4—figure supplement 2a and b; R² = 0.019, p=0.137).

Histology

Between 14 and 54 days after surgery, birds were injected with a lethal dose of ketamine and midazolam and perfused. The tissue was post-fixed in 4% paraformaldehyde at room temperature for 4–16 hr and then transferred to 30% sucrose for at least 1 day at 4°C for cryoprotection. Brain tissue was then cut into 40 μm sections. A chromogenic tyrosine hydroxylase (TH) stain was used to quantify the depletion of catecholaminergic fiber innervation in tissue from 6-OHDA-lesioned birds, and Nissl and fluorescent NeuN stains were used to assess the density of cell bodies in tissue from LMAN-lesioned and sham-operated birds. For one bird in the 6-OHDA-lesioned group, a Nissl stain was performed on alternate tissue sections to confirm that the lesion did not cause cell death.

For TH immunohistochemistry, the tissue was incubated overnight in a primary anti-TH antibody solution. The tissue was next incubated in biotinylated horse anti-mouse secondary antibody solution for 1 hr. Then, the tissue was submerged in a diaminobenzidine (DAB) solution (two DAB tablets, Amresco E733 containing 5 mg DAB per tablet, 20 mL Barnstead H₂O, 3 μL H₂O₂) for less than 5 min for visualization. The DAB solution was prepared 1 hr prior to use. The tissue was washed, mounted, and coverslipped using Permount mounting medium.

Tyrosine hydroxylase stain

Between incubations, the tissue was washed three times for 10 min each in 0.1 M phosphate buffer (PBS; 23 g dibasic sodium phosphate and 5.25 g monobasic sodium phosphate in 1 L deionized H₂O). The tissue was first washed and then incubated in 0.3% H₂O₂ for 30 min and in 1% NaBH₄ for 20 min, followed by overnight incubation in a primary anti-TH antibody solution. The tissue was next incubated for 1 hr in biotinylated horse anti-mouse secondary antibody solution and then for 1 hr in avidin-biotin complex (ABC) solution that had been prepared 30 min before use. The tissue was then submerged in DAB solution for less than 5 min, washed, mounted, and coverslipped using Permount mounting medium. TH staining labels TH-expressing (catecholaminergic) neurons and their fibers.

Nissl stain

Tissue was washed in 0.1 M PBS three times for 10 min and then mounted. The slides were incubated in Citrisolv twice for 5 min each, then delipidized in the following ethanol concentrations for 2 min each: 100, 100, 95, 95, and 70%. The tissue was briefly (less than 15 s) rinsed in deionized water and then incubated in cresyl violet (665 μL glacial acetic acid, 1 g cresyl violet acetate, and 200 mL deionized water) for 30 min. The tissue was briefly (less than 15 s) rinsed in deionized water and then submerged in the following ethanol concentrations for 2 min each: 70, 95, 95, 100, and 100%. The tissue was then incubated in Citrisolv twice for 5 min and coverslipped using Permount mounting medium. Nissl staining labels neuronal cell bodies.

NeuN antibody stain

Between incubations, the tissue was washed with 0.1 M PBS three times for 10 min each. The tissue was incubated overnight in primary antibody solution (4 μL EMD Millipore guinea pig anti-NeuN Alexa Fluor 488 antibody, 6 μL Triton X-100, 20 μL normal donkey serum [NDS], and 1.95 mL 0.1 M PBS). The tissue was then washed and incubated overnight in secondary antibody solution (10 μL Jackson Labs donkey anti-guinea pig [DAG], 6 μL Triton X-100, and 1.975 mL 0.1 M PBS). The tissue was then washed, mounted, and coverslipped with Fluoro-Gel mounting medium, and slides were sealed with lacquer. Images were taken with a widefield microscope (BioTek Lionheart FX, Sony ICX285 CCD camera, Gen5 acquisition software, ×1.25 magnification, 16-bit grayscale).

Lesion analysis

Analysis of lesions was based on previously published methodology (Hoffmann et al., 2016; Saravanan et al., 2019). Images of stained tissue sections were obtained using a slide scanner and converted into 8-bit grayscale images in ImageJ. In TH-DAB-stained tissue from birds that received sham 6-OHDA lesions, Area X stains darker than the surrounding striatum because of its higher density of catecholaminergic inputs (Hoffmann et al., 2016; Saravanan et al., 2019). Because the baseline level of stain darkness can vary from bird to bird, we did not compare the stain density of lesioned and sham tissue directly; instead, we calculated the ratio of the stain density of Area X to that of the surrounding striatum (OD ratio) to determine whether the density of catecholaminergic fibers was decreased. Prior work demonstrated that the vast majority of catecholaminergic input to Area X is dopaminergic (Hoffmann et al., 2016).

For each section of tissue containing Area X, a customized ImageJ macro was used to define regions of interest (ROIs) within Area X and within a portion of striatum outside Area X: Area X was outlined manually, and a circular 0.5-mm-diameter region of striatum anterior to Area X was selected. The pixel count and OD of each ROI were measured, and the density of TH-positive fibers was quantified as the ratio of the OD of Area X to the OD of non-X striatum.
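The OD-ratio measurement can be illustrated with a short sketch. It assumes that each ROI is supplied as a list of 8-bit gray values (0 to 255) and that OD is computed per pixel as log10(255 / gray value), one common "uncalibrated OD" convention; the ImageJ macro and any calibration the authors used are not described here, so this conversion is an assumption. Darker staining gives lower gray values and therefore higher OD, so an Area X that stains darker than the surrounding striatum yields a ratio above 1, and loss of TH-positive fibers pushes the ratio toward 1.

```python
import math


def pixel_od(gray):
    """Uncalibrated optical density of an 8-bit pixel (assumed convention).

    Gray value 255 (white) maps to OD 0; values are clamped to >= 1 so that
    a pure-black pixel does not produce an infinite OD.
    """
    return math.log10(255 / max(gray, 1))


def roi_od(pixels):
    """Mean OD over the pixels of one region of interest."""
    return math.fsum(pixel_od(p) for p in pixels) / len(pixels)


def od_ratio(area_x_pixels, striatum_pixels):
    """OD of Area X divided by OD of non-X striatum (the OD ratio)."""
    striatum_od = roi_od(striatum_pixels)
    if striatum_od == 0:
        raise ValueError("striatum ROI is blank (OD 0); ratio undefined")
    return roi_od(area_x_pixels) / striatum_od
```

Averaging per-pixel OD, rather than converting the mean gray value once, is a design choice here; the two differ because the logarithm is nonlinear.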

The cumulative distribution of OD ratios from sham-operated birds was used to construct a 95% confidence interval and set the threshold for lesioned tissue: 6-OHDA-lesioned tissue in which the OD ratio fell below the 5th percentile of sham tissue was classified as having significantly reduced TH-positive fiber density.
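The thresholding rule above, together with the lesion-size measure (the proportion of 6-OHDA-lesioned tissue whose OD ratio falls below the sham 5th percentile), can be sketched as follows. Percentile definitions differ between software packages; the linear interpolation used here (`statistics.quantiles` with `method="inclusive"`) is an assumption and may not match the authors' MATLAB code. The sham values in the example are toy numbers, not data from the study.

```python
from statistics import quantiles


def lesion_threshold(sham_ratios):
    """5th percentile of sham OD ratios (first of the 19 cut points for n=20)."""
    return quantiles(sham_ratios, n=20, method="inclusive")[0]


def is_lesioned(ratio, threshold):
    """A section counts as lesioned if its OD ratio is below the sham 5th percentile."""
    return ratio < threshold


def lesion_size(lesion_ratios, threshold):
    """Proportion of 6-OHDA-lesioned sections classified as lesioned."""
    flagged = sum(is_lesioned(r, threshold) for r in lesion_ratios)
    return flagged / len(lesion_ratios)


sham = [1.0 + i / 100 for i in range(101)]  # toy sham OD ratios, 1.00 to 2.00
threshold = lesion_threshold(sham)
size = lesion_size([0.9, 1.0, 1.2, 1.5], threshold)
```

With these toy values the threshold is 1.05, and two of the four "lesioned" sections fall below it, giving a lesion size of 0.5.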

Statistical testing

All error bars in the article represent SEM. To assess whether significant vocal learning occurred in an experiment, we used one-sample t-tests to compare the mean pitch on the final day of training to zero. To assess whether the amount of learning differed significantly within an individual bird pre- vs. postlesion, we used paired t-tests. To assess differences between distributions of target syllable pitches on different days of the experiment (baseline, cutaneous stimulation training, washout), we used two-sample Kolmogorov–Smirnov tests.

Each experimental group had at least five birds, and each bird typically sang the target syllable well over 30 times a day. Our data are therefore hierarchical (syllable renditions nested within birds), and variability arises at both levels. Simply pooling all the data ignores the non-independence between samples and underestimates the error. To address this issue, we used a hierarchical bootstrap to estimate SEM and calculate p-values (Saravanan et al., 2020). For each experimental day, we calculated pitch values in semitones, normalized to the mean pitch on the final baseline day of that experiment. We then generated a population of 10,000 bootstrapped means by resampling with replacement at each level of the hierarchy: for each bootstrap sample, we first resampled birds and then, for each selected bird, resampled its syllable renditions. The standard deviation of this population of bootstrapped means estimates the uncertainty of the original data (Saravanan et al., 2020; Saravanan et al., 2019); the SEM values (used for error bars) that we report for the hierarchical bootstrap are therefore equal to this standard deviation.
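The resampling procedure above can be sketched in standard-library Python. This is an illustration of the general scheme described by Saravanan et al. (2020), not the authors' code: the data layout (a dictionary mapping a hypothetical bird ID to that bird's per-rendition values), the fixed random seed, and the choice to average all resampled renditions together rather than averaging per-bird means are assumptions. Pitch is converted to semitones as 12 · log2(f / f_baseline), so a pitch equal to the baseline mean maps to zero and one octave maps to 12.

```python
import math
import random
from statistics import stdev


def to_semitones(pitch_hz, baseline_hz):
    """Pitch change in semitones relative to a baseline pitch (12 per octave)."""
    return 12.0 * math.log2(pitch_hz / baseline_hz)


def hierarchical_bootstrap_means(data, n_boot=10_000, seed=0):
    """Bootstrapped means from two-level resampling: birds, then renditions.

    `data` maps a bird ID to a list of per-rendition values (e.g. semitones).
    """
    rng = random.Random(seed)
    birds = list(data)
    boot_means = []
    for _ in range(n_boot):
        pooled = []
        # Level 1: resample birds with replacement.
        for bird in rng.choices(birds, k=len(birds)):
            values = data[bird]
            # Level 2: resample that bird's renditions with replacement.
            pooled.extend(rng.choices(values, k=len(values)))
        boot_means.append(sum(pooled) / len(pooled))
    return boot_means


def hierarchical_sem(data, n_boot=10_000, seed=0):
    """SEM estimate: standard deviation of the bootstrapped means."""
    return stdev(hierarchical_bootstrap_means(data, n_boot, seed))
```

Because whole birds are resampled first, a bird that sang many renditions cannot by itself shrink the error estimate, which is the point of the hierarchical scheme.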

To calculate p-values for comparing our data to zero with the hierarchical bootstrap, we calculated P_boot: the proportion of bootstrapped means greater than zero. With a Type I error rate of 0.05 (two-tailed), P_boot > 0.975 indicates that the mean was significantly greater than zero, P_boot < 0.025 indicates that it was significantly less than zero, and values between 0.025 and 0.975 indicate no significant difference from zero. Because we measure adaptive pitch changes in semitones, a normalized measure in which baseline is set to zero, we used this P_boot calculation whenever we needed to assess whether pitch at the end of training differed significantly from baseline (zero).
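The decision rule above amounts to a few lines. In this sketch, `boot_means` stands for the population of bootstrapped means of normalized pitch (in semitones) produced by the hierarchical bootstrap; the function names are hypothetical, and α = 0.05 is split equally between the two tails as described in the text.

```python
def p_boot_vs_zero(boot_means):
    """Proportion of bootstrapped means that are greater than zero."""
    return sum(m > 0 for m in boot_means) / len(boot_means)


def interpret(p_boot, alpha=0.05):
    """Two-tailed decision rule: significant if P_boot lies in either alpha/2 tail."""
    if p_boot > 1 - alpha / 2:
        return "significantly greater"
    if p_boot < alpha / 2:
        return "significantly less"
    return "not significant"
```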

In other cases, we compared two means rather than comparing one mean to baseline (zero). We used a similar hierarchical bootstrap method and calculated P_boot, with one key difference: rather than measuring the proportion of resampled means greater or less than zero, we computed the joint probability distribution of the means of the two resampled datasets and measured the proportion of this joint distribution lying above the unity line (i.e., where the mean of dataset 1 exceeds that of dataset 2). This proportion is the P_boot value we report in these instances. P_boot > 0.975 indicates that the mean of dataset 1 was significantly greater than that of dataset 2, P_boot < 0.025 indicates that it was significantly lower, and values between 0.025 and 0.975 indicate no significant difference. We used this method whenever we needed to assess whether learning magnitudes (adaptive pitch changes by the end of training) differed significantly pre- vs. postlesion (or pre- vs. postsham) or across experimental conditions (e.g., postsham vs. postlesion or post-LMAN lesion vs. post-6-OHDA lesion).
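One way to compute the two-sample P_boot is sketched below. It treats the two populations of bootstrapped means as independent, so their joint distribution is the product of the two marginal distributions, and counts the fraction of all (dataset 1, dataset 2) pairs lying above the unity line, i.e. where the dataset 1 mean exceeds the dataset 2 mean. The independence assumption and this pair-counting implementation are ours; Saravanan et al. (2020) describe the exact procedure. Sorting one population and using binary search keeps the count at O(n log n) instead of comparing all 10⁸ pairs for 10⁴ resamples.

```python
from bisect import bisect_left


def p_boot_two_sample(boot_means_1, boot_means_2):
    """Fraction of the joint distribution where mean 1 > mean 2.

    Assumes the two bootstrap populations are independent, so every pairing
    (m1, m2) is weighted equally.
    """
    sorted_2 = sorted(boot_means_2)
    # bisect_left gives how many dataset-2 means are strictly below m1.
    above_unity = sum(bisect_left(sorted_2, m1) for m1 in boot_means_1)
    return above_unity / (len(boot_means_1) * len(sorted_2))
```

Ties (m1 equal to m2) are counted as not above the unity line, which makes the test slightly conservative toward "dataset 1 is greater".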

For both forms of the P_boot calculation, the smallest value that can be resolved is P_boot < 0.0001, because the bootstrapped means are generated from 10⁴ resamples; for the same reason, the largest is P_boot > 0.9999.

Sample sizes were not predetermined using a power analysis. Sample sizes of all sets of experiments were comparable to relevant prior literature (Hoffmann et al., 2016; Tumer and Brainard, 2007; Saravanan et al., 2019). If at any point during cutaneous stimulation training or white noise training a bird’s rate of singing dropped below 10 songs per day for over 1 day, that experiment was stopped and the data were excluded from further analysis.

Acknowledgements

This work was supported in part by NIH grants R01-EB022872, R01-NS084844, and R01-NS099375, a grant from the Simons Foundation as part of the Simons-Emory International Consortium on Motor Control, and HHMI.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

James N McGregor, Email: jmcgregor2292@gmail.com.

Jesse H Goldberg, Cornell University, United States.

Barbara G Shinn-Cunningham, Carnegie Mellon University, United States.

Funding Information

This paper was supported by the following grants:

National Institutes of Health R01-EB022872 to James N McGregor.
National Institutes of Health R01-NS084844 to James N McGregor.
National Institutes of Health R01-NS099375 to James N McGregor.
Simons Foundation Emory International Consortium on Motor Control to Samuel J Sober.
Howard Hughes Medical Institute to Paul I Jaffe, Michael S Brainard.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Writing – review and editing.

Conceptualization, Data curation, Software, Formal analysis, Investigation, Methodology, Writing – review and editing.

Conceptualization, Data curation, Validation, Methodology, Writing – review and editing.

Formal analysis, Methodology.

Conceptualization, Resources, Supervision, Project administration, Writing – review and editing.

Conceptualization, Resources, Supervision, Funding acquisition, Writing - original draft, Project administration, Writing – review and editing.

Ethics

All experimental protocols were approved by the Emory University and UC San Francisco Institutional Animal Care and Use Committees (protocol #201700359).

Additional files

Transparent reporting form

elife-75691-transrepform1.docx (112.1 KB, docx)

Data availability

Source data are provided for all main figures and relevant figure supplements (Figure 2b–f, Figure 2—figure supplements 1–7, Figure 3b–e, Figure 3—figure supplement 1, and Figure 4b–d). MATLAB code for generating these figures is also provided in the associated source code files. Data and source code have also been uploaded to a public data repository on figshare, in a project titled 'Shared mechanisms of auditory and non-auditory vocal learning in the songbird brain'.

The following datasets were generated:

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_4_Source_data_3.mat. figshare. Dataset. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_4_Source_data_1.mat. figshare. Dataset. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_Code_3.m. figshare. Software. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_4_Source_Code_2.m. figshare. Software. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_data_1.mat. figshare. Dataset. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_4_Source_Code_3.m. figshare. Software. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_4_Source_data_2.mat. figshare. Dataset. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_data_3.mat. figshare. Dataset. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_Code_2.m. figshare. Software. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_data_2.mat. figshare. Dataset. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_data_2.mat. figshare. Dataset. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_Code_4.m. figshare. Software. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_data_1.mat. figshare. Dataset. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_data_3.mat. figshare. Dataset. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_4_Source_Code_1.m. figshare. Software. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_code_3.m. figshare. Software. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_code_4.m. figshare. Software. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_Code_1.m. figshare. Software. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_Code_2.m. figshare. Software. figshare.

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_Code_1.m. figshare. Software. figshare.

References

Ali F, Otchy TM, Pehlevan C, Fantana AL, Burak Y, Ölveczky BP. The basal ganglia is necessary for learning spectral, but not temporal, features of birdsong. Neuron. 2013;80:494–506. doi: 10.1016/j.neuron.2013.07.049.
Andalman AS, Fee MS. A basal ganglia-forebrain circuit in the songbird biases motor output to avoid vocal errors. PNAS. 2009;106:12518–12523. doi: 10.1073/pnas.0903214106.
Bottjer SW, Miesner EA, Arnold AP. Forebrain lesions disrupt development but not maintenance of song in passerine birds. Science. 1984;224:901–903. doi: 10.1126/science.6719123.
Brainard MS, Doupe AJ. Interruption of a basal ganglia-forebrain circuit prevents plasticity of learned vocalizations. Nature. 2000;404:762–766. doi: 10.1038/35008083.
Brainard MS, Doupe AJ. What songbirds teach us about learning. Nature. 2002;417:351–358. doi: 10.1038/417351a.
Brainard MS, Doupe AJ. Translating birdsong: songbirds as a model for basic and applied medical research. Annual Review of Neuroscience. 2013;36:489–517. doi: 10.1146/annurev-neuro-060909-152826.
Charlesworth JD, Tumer EC, Warren TL, Brainard MS. Learning the microstructure of successful behavior. Nature Neuroscience. 2011;14:373–380. doi: 10.1038/nn.2748.
Chen Y, Matheson LE, Sakata JT. Mechanisms underlying the social enhancement of vocal learning in songbirds. PNAS. 2016;113:6641–6646. doi: 10.1073/pnas.1522306113.
Chen R, Puzerey PA, Roeser AC, Riccelli TE, Podury A, Maher K, Farhang AR, Goldberg JH. Songbird ventral pallidum sends diverse performance error signals to dopaminergic midbrain. Neuron. 2019;103:266–276. doi: 10.1016/j.neuron.2019.04.038.
Cohen Y, Nicholson DA, Sanchioni A, Mallaber EK, Skidanova V, Gardner TJ. Automated annotation of birdsong with a neural network that segments spectrograms. eLife. 2022;11:e63853. doi: 10.7554/eLife.63853.
Cowie RID, Douglas-Cowie E. In: Hearing Science and Hearing Disorders. Lutman ME, Haggard MP, editors. New York: Academic Press; 1983. Speech production in profound postlingual deafness; pp. 183–230.
Doupe AJ, Kuhl PK. Birdsong and human speech: common themes and mechanisms. Annual Review of Neuroscience. 1999;22:567–631. doi: 10.1146/annurev.neuro.22.1.567.
Fee MS, Goldberg JH. A hypothesis for basal ganglia-dependent reinforcement learning in the songbird. Neuroscience. 2011;198:152–170. doi: 10.1016/j.neuroscience.2011.09.069.
Gadagkar V, Puzerey PA, Chen R, Baird-Daniel E, Farhang AR, Goldberg JH. Dopamine neurons encode performance error in singing birds. Science. 2016;354:1278–1282. doi: 10.1126/science.aah6837.
Goldstein MH, King AP, West MJ. Social interaction shapes babbling: testing parallels between birdsong and speech. PNAS. 2003;100:8030–8035. doi: 10.1073/pnas.1332441100.
Graybiel AM, Aosaki T, Flaherty AW, Kimura M. The basal ganglia and adaptive motor control. Science. 1994;265:1826–1831. doi: 10.1126/science.8091209.
Hampton CM, Sakata JT, Brainard MS. An avian basal ganglia-forebrain circuit contributes differentially to syllable versus sequence variability of adult bengalese finch song. Journal of Neurophysiology. 2009;101:3235–3245. doi: 10.1152/jn.91089.2008.
Hikosaka O, Nakamura K, Sakai K, Nakahara H. Central mechanisms of motor skill learning. Current Opinion in Neurobiology. 2002;12:217–222. doi: 10.1016/s0959-4388(02)00307-0.
Hisey E, Kearney MG, Mooney R. A common neural circuit mechanism for internally guided and externally reinforced forms of motor learning. Nature Neuroscience. 2018;21:589–597. doi: 10.1038/s41593-018-0092-6.
Hoffmann LA, Saravanan V, Wood AN, He L, Sober SJ. Dopaminergic contributions to vocal learning. The Journal of Neuroscience. 2016;36:2176–2189. doi: 10.1523/JNEUROSCI.3883-15.2016.
Houde JF, Jordan MI. Sensorimotor adaptation in speech production. Science. 1998;279:1213–1216. doi: 10.1126/science.279.5354.1213.
Jarvis ED. Evolution of vocal learning and spoken language. Science. 2019;366:50–54. doi: 10.1126/science.aax0287.
Kao MH, Doupe AJ, Brainard MS. Contributions of an avian basal ganglia-forebrain circuit to real-time modulation of song. Nature. 2005;433:638–643. doi: 10.1038/nature03127.
Kao MH, Brainard MS. Lesions of an avian basal ganglia circuit prevent context-dependent changes to song variability. Journal of Neurophysiology. 2006;96:1441–1455. doi: 10.1152/jn.01138.2005.
Karten HJ, Brzozowska-Prechtl A, Lovell PV, Tang DD, Mello CV, Wang H, Mitra PP. Digital atlas of the zebra finch (Taeniopygia guttata) brain: a high-resolution photo atlas. The Journal of Comparative Neurology. 2013;521:3702–3715. doi: 10.1002/cne.23443.
Konishi M. The role of auditory feedback in the control of vocalization in the white-crowned sparrow. Zeitschrift Fur Tierpsychologie. 1965;22:770–783. doi: 10.1111/j.1439-0310.1965.tb01688.x.
Kuebrich BD, Sober SJ. Variations on a theme: songbirds, variability, and sensorimotor error correction. Neuroscience. 2015;296:48–54. doi: 10.1016/j.neuroscience.2014.09.068.
Kuhl PK. Is speech learning “gated” by the social brain? Developmental Science. 2007;10:110–120. doi: 10.1111/j.1467-7687.2007.00572.x.
Leblois A, Wendel BJ, Perkel DJ. Striatal dopamine modulates basal ganglia output and regulates social context-dependent behavioral variability through D1 receptors. The Journal of Neuroscience. 2010;30:5730–5743. doi: 10.1523/JNEUROSCI.5974-09.2010.
Locke JL, Snow C. In: Social Influences on Vocal Development. Snowdon CT, Hausberger M, editors. Cambridge: Cambridge University Press; 2010. Social influences on vocal learning in human and nonhuman primates; pp. 274–292.
Mooney R. Neural mechanisms for learned birdsong. Learning & Memory. 2009;16:655–669. doi: 10.1101/lm.1065209.
Morrison RG, Nottebohm F. Role of a telencephalic nucleus in the delayed song learning of socially isolated zebra finches. Journal of Neurobiology. 1993;24:1045–1064. doi: 10.1002/neu.480240805.
Murdoch D, Chen R, Goldberg JH. Place preference and vocal learning rely on distinct reinforcers in songbirds. Scientific Reports. 2018;8:6766. doi: 10.1038/s41598-018-25112-5.
Murugan M, Harward S, Scharff C, Mooney R. Diminished FOXP2 levels affect dopaminergic modulation of corticostriatal signaling important to song variability. Neuron. 2013;80:1464–1476. doi: 10.1016/j.neuron.2013.09.021.
Nasir SM, Ostry DJ. Speech motor learning in profoundly deaf adults. Nature Neuroscience. 2008;11:1217–1222. doi: 10.1038/nn.2193.
Nordeen KW, Nordeen EJ. Auditory feedback is necessary for the maintenance of stereotyped song in adult zebra finches. Behavioral and Neural Biology. 1992;57:58–66. doi: 10.1016/0163-1047(92)90757-u.
Nordeen KW, Nordeen EJ. Long-Term maintenance of song in adult zebra finches is not affected by lesions of a forebrain region involved in song learning. Behavioral and Neural Biology. 1993;59:79–82. doi: 10.1016/0163-1047(93)91215-9.
Oller DK, Eilers RE. The role of audition in infant babbling. Child Development. 1988;59:441–449. doi: 10.2307/1130323.
Paterson AK, Bottjer SW. Cortical inter-hemispheric circuits for multimodal vocal learning in songbirds. The Journal of Comparative Neurology. 2017;525:3312–3340. doi: 10.1002/cne.24280.
Saravanan V, Hoffmann LA, Jacob AL, Berman GJ, Sober SJ. Dopamine depletion affects vocal acoustics and disrupts sensorimotor adaptation in songbirds. ENeuro. 2019;6:ENEURO.0190-19.2019. doi: 10.1523/ENEURO.0190-19.2019.
Saravanan V, Berman GJ, Sober SJ. Application of the hierarchical bootstrap to multi-level data in neuroscience. Neurons, Behavior, Data Analysis and Theory. 2020;3:819334. doi: 10.1101/819334.
Sasaki A, Sotnikova TD, Gainetdinov RR, Jarvis ED. Social context-dependent singing-regulated dopamine. The Journal of Neuroscience. 2006;26:9010–9014. doi: 10.1523/JNEUROSCI.1335-06.2006.
Sober SJ, Brainard MS. Adult birdsong is actively maintained by error correction. Nature Neuroscience. 2009;12:927–931. doi: 10.1038/nn.2336.
Spool JA, Macedo-Lima M, Scarpa G, Morohashi Y, Yazaki-Sugiyama Y, Remage-Healey L. Genetically identified neurons in avian auditory pallium mirror core principles of their mammalian counterparts. Current Biology. 2021;31:2831–2843. doi: 10.1016/j.cub.2021.04.039.
Stoel-Gammon C, Otomo K. Babbling development of hearing-impaired and normally hearing subjects. The Journal of Speech and Hearing Disorders. 1986;51:33–41. doi: 10.1044/jshd.5101.33.
Suthers RA, Goller F, Wild JM. Somatosensory feedback modulates the respiratory motor program of crystallized birdsong. PNAS. 2002;99:5680–5685. doi: 10.1073/pnas.042103199.
Tremblay S, Shiller DM, Ostry DJ. Somatosensory basis of speech production. Nature. 2003;423:866–869. doi: 10.1038/nature01710.
Tumer EC, Brainard MS. Performance variability enables adaptive plasticity of “crystallized” adult birdsong. Nature. 2007;450:1240–1244. doi: 10.1038/nature06390.
Veit L, Tian LY, Monroy Hernandez CJ, Brainard MS. Songbirds can learn flexible contextual control over syllable sequencing. eLife. 2021;10:e61610. doi: 10.7554/eLife.61610.
Warren TL, Tumer EC, Charlesworth JD, Brainard MS. Mechanisms and time course of vocal learning and consolidation in the adult songbird. Journal of Neurophysiology. 2011;106:1806–1821. doi: 10.1152/jn.00311.2011.
West MJ, King AP. Female visual displays affect the development of male song in the cowbird. Nature. 1988;334:244–246. doi: 10.1038/334244a0.
Xiao L, Chattree G, Oscos FG, Cao M, Wanat MJ, Roberts TF. A basal ganglia circuit sufficient to guide birdsong learning. Neuron. 2018;98:208–221. doi: 10.1016/j.neuron.2018.02.020.
Zai AT, Cavé-Lopez S, Rolland M, Giret N, Hahnloser RHR. Sensory substitution reveals a manipulation bias. Nature Communications. 2020;11:5940. doi: 10.1038/s41467-020-19686-w.

eLife. doi: 10.7554/eLife.75691.sa0

Editor's evaluation

Jesse H Goldberg

This is an important article that shows that songbirds can learn to adjust their song on the basis of somatosensory feedback, and not just auditory feedback as previously thought. Convincing evidence is provided that cutaneous stimulation-induced song learning requires the same dopamine-basal ganglia pathway previously implicated in natural auditory feedback-based learning, showing that vocal production circuits can flexibly learn from feedback from multiple modalities.

eLife. doi: 10.7554/eLife.75691.sa1

Decision letter

Editor: Jesse H Goldberg

Reviewed by: Nicolas Giret

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Shared mechanisms of auditory and non-auditory vocal learning in the songbird brain" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Barbara Shinn-Cunningham as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Nicolas Giret (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

(1) Data analysis

1.1 Independently quantify distinct sites of electrical stimulation

Electrical stimulation of ~100 µA for ~50 ms could cause electric fields large enough to reach the pallium underlying the skull, which could in turn discharge neurons in dorsal auditory areas of the avian pallium. Can the authors provide coordinates for their cutaneous wire implants with respect to known coordinates of auditory pallium? Also, the birds with neck cutaneous stimulation provide good controls for this concern. The authors should make absolutely clear in their figures which data came from neck stimulation and which came from scalp stimulation (e.g. in Figure 2, figure supplement 1b, d, f; Figure 2, figure supplement 2), and should include the justification for the neck stimulation in the main text.

1.2 Learning magnitude in cutaneous vs white noise

The authors also claim that there is no systematic difference between learning magnitudes of cutaneous stimulation and of auditory white noise stimulation, suggesting that both training methods result in the same learning efficacy. While their data indeed show no significant difference between these training methods, there is little ground for this claim. First, learning magnitudes seem to vary a lot across individuals; they may be similar on average, but there does not seem to be a correlation between the two. Second, similar learning magnitudes only show that the saliency of the two stimuli was adjusted to be roughly equal, which is not surprising given that they adjusted the magnitude of electric current using a criterion similar to that of their initial 2007 paper: in Tumer and Brainard (2007) they adjusted white noise amplitude until they observed song stoppages during the first day of exposure, and in this manuscript they adjusted electric current to interrupt song on the first few instances of cutaneous stimulation.

In the distributions of the adaptive pitch changes for cutaneous and white noise feedback (Figure 2f), the sham and unoperated birds actually appear quite different: the unoperated birds seem to change their pitch more when exposed to white noise, whereas the sham birds seem to change more with cutaneous stimulation. Could the authors provide additional statistics to justify pooling the two groups of birds?

1.3 Cases of sparse and/or noisy data lead to unconvincing claims

1.3.1. There are a few instances where more data would help to better evaluate the significance of the results. For example, only one of the three days of baseline song is shown, and for only one example bird; worse, the data from this bird are duplicated across two days, which points to a serious flaw in either the analysis or the illustration. The authors should show more baseline days and include more birds.

1.3.2. The two-sided KS test assessing the difference between baseline and the end of cutaneous stimulation is extremely significant (10⁻¹²) for that one example bird, which is nice, but it would be useful to see whether this is the case for all birds and not just that example bird on that example day. It would also be interesting to see how these statistics behave when comparing two or more baseline days. It is unlikely that the washout the KS analysis reveals in this one bird will apply in all birds.

1.3.3. Surprisingly, five days after the depletion of the DA inputs to the basal ganglia (Area X), there is a change of pitch in the anti-adaptive direction that reaches statistical significance on day 5 (Figure 4c). This effect on the 5th day only might be related to the fact that the DA depletion spares about 50% of the inputs to Area X. But what could explain the change in the anti-adaptive direction?

1.4 Why exclude afternoon singing data?

Why was the analysis restricted to song syllables produced between 10 am and 12 pm? What is the rationale for such a restriction? Did past papers on syllable-contingent, feedback-driven pitch learning impose such a restriction? If not, why not? And why is it imposed here? Are the results different when considering all the song syllables per day? This is a key point to show. Also, the reader only finds this information in the Methods section, although it seems to me an important point that should be provided in the main text. Finally, although only data between 10 am and 12 pm are used for the analysis, this window is extended if birds sing fewer than 30 renditions of the target syllable during this time window. It is unclear from their description how often this is the case and how it influences their analysis. Furthermore, they exclude birds that dropped their singing rate below 10 songs per day for more than a day, again without stating how many birds were excluded based on this criterion.

(2) Failure to cite and consider Zai et al., 2020

It is an egregious oversight that the authors did not cite or discuss Zai et al., 2020. Both the ability of birds to learn from non-auditory stimuli and the involvement of the AFP in this process have been shown previously. This study showed that visual stimuli (short periods of light off) can successfully drive changes in pitch in both hearing and deaf birds; furthermore, in deaf birds, the involvement of the AFP in this process was shown using a similar lesioning approach. Thus, two out of the three main claims of novelty in the manuscript are not novel, despite the authors' claims. The main novelty beyond the 2020 study is that McGregor et al. are the first to show that somatosensory information (cutaneous electrical stimulation) can induce vocal plasticity and that dopaminergic projections to the AFP are somehow involved in this process.

(3) Interpretation of dopamine lesion experiments

The authors claim that dopaminergic input is necessary for observing adaptive changes, but their data suggest otherwise, namely that dopamine sets the direction of the change. Strictly speaking, the statement 'dopaminergic inputs are required for non-auditory vocal learning' is incorrect, since the data show a reversal in learning direction, which is itself a form of learning. Therefore, the apparent reversal in learning under DA-depleted conditions should be discussed.

(4) Effect of cutaneous stimulation on ongoing song

The absence of a transient effect of the electrical stimulation on the ongoing song (not only song stopping but also FM, pitch, entropy etc.) is claimed but not demonstrated. As the authors did quantify some important features (as stated in the methods, l. 567-568), some examples and analyses for at least one or two acoustic features should be shown (e.g. in a Supp Fig).

(5) Accurately contextualize distorted auditory feedback studies

The paragraph at line 420 is written as though all pitch-contingent auditory feedback studies have been done with loud white noise bursts. But Andalman and Fee, 2009 and Chen et al., 2020 used broadband noise filtered in the 2–8 kHz range so that it sounds like a zebra finch call (this noise actually elicits social calls and drives place preference, as cited (Murdoch et al., 2018)). And Gadagkar et al., 2016 additionally used displaced syllable fragments of each bird's own song at sound levels lower than what the singing bird would hear at the ear. These nuances of distinct feedback protocols are relevant to this paragraph and should be spelled out, as a loud, non-social white noise burst with power at low frequencies resembling knocks is likely perceived differently from a sound filtered to lie in the bird's own vocalization range that is known to drive positive place preference and to evoke social calls. Please correct this paragraph so the papers are cited properly.

(6) Figure 2C: data from day 0 and day 1 are identical!

eLife. 2022 Sep 15;11:e75691. doi: 10.7554/eLife.75691.sa2

Author response

Essential revisions:

(1) Data analysis

1.1 Independently quantify distinct sites of electrical stimulation

Electrical stimulation of ~100 µA for ~50 ms could cause electric fields large enough to reach the pallium underlying the skull, which could in turn discharge neurons in dorsal auditory areas of the avian pallium. Can the authors provide coordinates for their cutaneous wire implants with respect to known coordinates of auditory pallium? Also, the birds with neck cutaneous stimulation provide good controls for this concern. The authors should make absolutely clear in their figures which data came from neck stimulation and which came from scalp stimulation (e.g. in Figure 2, figure supplement 1b, d, f; Figure 2, figure supplement 2), and should include the justification for the neck stimulation in the main text.

We thank the reviewers for this comment and have edited the manuscript to address their concerns. The approximate location of the scalp electrodes was 4.5 mm lateral and 6.3 mm anterior, relative to Y₀. The coordinates used in prior literature for targeting auditory pallium are 1.1 mm anterior and 0.7 mm lateral, relative to Y₀, and 1.5 mm ventral from the surface of the brain (Spool et al., 2021), which provides 6.1 mm of space between our electrodes and auditory pallium. We therefore believe the scalp electrodes were far enough from auditory pallium to avoid potentially discharging neurons in the region. We also agree that the neck stimulation addresses this potential concern. We have included this information in the main manuscript text (pg. 6, lines 232-238, and pg. 15, lines 515-518):

“To ensure that the cutaneous stimulation on the scalp did not drive learning through an unexpected influence on brain activity in dorsal auditory areas of the pallium, we implanted the wire electrodes in the neck instead of the scalp in 7 out of 12 birds used in these experiments. The magnitude of vocal learning did not differ between the two groups of birds on any day of training (0.679<P_boot<0.891). Taken together, these results indicate that the gradual, adaptive pitch shift is driven by non-auditory cutaneous stimulation and not by other unintentional effects of the stimulation.”

“The approximate location of the scalp electrodes was 4.47 mm lateral and 6.3 mm anterior, relative to Y₀, far from the coordinates used for targeting auditory pallium, which are 1.1 mm anterior and 0.7 mm lateral, relative to Y₀, and 1.5 mm ventral from the surface of the brain.”

Also, we have edited the relevant main text figure (Figure 2 f in the revised manuscript) to illustrate which results were from birds with neck-implanted electrodes and which were from birds with scalp-implanted electrodes:

Figure 2F shows that electric shocks delivered to the neck and scalp produced overlapping ranges of learning magnitude (x-axis values of filled and empty squares). Moreover, note that the largest learning magnitudes were observed in animals with electrodes located on the neck (i.e. farther from the auditory pallium than the scalp-implanted electrodes).

Finally, we have added a new panel to Figure 2, Figure Supplement 1, to demonstrate the results of cutaneous training from experiments using neck vs scalp electrodes (Figure 2, Figure Supplement 2 b in the revised manuscript). There was no statistical difference in magnitude of adaptive pitch change between the two groups on any day of cutaneous training (0.679<P_boot<0.891).
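The bootstrap comparison reported above (P_boot) is not spelled out in this response. As a minimal sketch, assuming a two-sided bootstrap test on the difference in group means that resamples from the pooled per-bird learning magnitudes, the neck-versus-scalp comparison for a single training day could look like the following. The function name `bootstrap_mean_diff_p` and its defaults are hypothetical, not the authors' code.

```python
import random
from statistics import fmean


def bootstrap_mean_diff_p(group_a, group_b, n_boot=10_000, seed=0):
    """Two-sided bootstrap p-value for a difference in group means.

    Hypothetical helper: resamples, with replacement, from the pooled data
    (the null hypothesis that both groups share one distribution) and counts
    how often the resampled mean difference is at least as extreme as observed.
    """
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    extreme = 0
    for _ in range(n_boot):
        sample_a = rng.choices(pooled, k=n_a)
        sample_b = rng.choices(pooled, k=n_b)
        if abs(fmean(sample_a) - fmean(sample_b)) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_boot + 1)
```

Calling `bootstrap_mean_diff_p(neck_magnitudes, scalp_magnitudes)` once per training day would give one p-value per day. Resampling from the pooled data builds in the null hypothesis that electrode site has no effect on learning magnitude.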

1.2 Learning magnitude in cutaneous vs white noise

The authors also claim that there is no systematic difference between learning magnitudes of cutaneous stimulation and of auditory white noise stimulation, suggesting that both training methods result in the same learning efficacy. While their data indeed show no significant difference between these training methods, there is little ground for this claim. First, learning magnitudes seem to vary a lot across individuals; they may be similar on average, but there does not seem to be a correlation between the two. Second, similar learning magnitudes only show that the saliency of the two stimuli was adjusted to be roughly equal, which is not surprising given that they adjusted the magnitude of electric current using a criterion similar to that of their initial 2007 paper: in Tumer and Brainard (2007) they adjusted white noise amplitude until they observed song stoppages during the first day of exposure, and in this manuscript they adjusted electric current to interrupt song on the first few instances of cutaneous stimulation.

We thank the reviewer for their comments. We agree that learning magnitudes vary substantially across individuals, and that our data do not allow us to make strong conclusions about relative stimulus strength/aversiveness across modalities within single birds or within a single modality across subjects. We have therefore revised our text to make it clear that, as stated by the Reviewer above, the results in Figure 2f serve primarily to establish that shock and white noise produce (by design) similar ranges of learning, and do not establish the extent to which sensitivity to different sensory modalities varies across individuals. We have edited the manuscript text to clarify this (pg. 5, lines 183-197):

“To further characterize cutaneous stimulation training and to compare this form of learning to well-established vocal learning paradigms, we performed multiple learning experiments – one cutaneous stimulation and one white noise – in 8 out of the 12 individual birds from this data set where the implanted electrode wires remained intact for a long enough time to perform multiple sets of experiments (Figure 2a). To account for the potential influence of multiple trainings in the same individual birds on magnitude of learning, we randomized the order of white noise training and cutaneous stimulation training for the birds who underwent both training paradigms. We also included 6 LMAN sham-operated birds from a later set of experiments in this particular analysis. We did so because the sham-operated birds had intact song systems and underwent both cutaneous stimulation and white noise training. Also, we found no statistically significant difference between the magnitude of learning by the end of training in birds who did not undergo craniotomies for LMAN, 6-OHDA, or sham lesions compared with the magnitude of learning in birds that received sham LMAN lesions for either white noise experiments (2 sample t-test, p = 0.779) or cutaneous stimulation experiments (2 sample t-test, p = 0.148).”

In the distributions of the adaptive pitch changes for cutaneous and white noise feedback (Figure 2f), the sham and unoperated birds actually appear quite different: the unoperated birds seem to change their pitch more when exposed to white noise, whereas the sham birds seem to change more with cutaneous stimulation. Could the authors provide additional statistics to justify pooling the two groups of birds?

As requested, we have added a statistical analysis of whether learning magnitude differs between sham and unoperated birds. We found no significant difference between the groups when comparing the magnitude of learning by the end of white noise training (2 sample KS-test, p = 0.779). We also compared the average magnitude of learning during cutaneous stimulation training in unoperated birds with the learning magnitude during stimulation training in birds that received sham LMAN lesions, and again found no statistically significant difference between the groups (2 sample KS-test, p = 0.148). We have added this information to the manuscript (pg. 5, lines 192-197):

“Also, we found no statistically significant difference between the magnitude of learning by the end of training in unoperated birds compared with the magnitude of learning in birds that received sham LMAN lesions for either white noise experiments (2 sample t-test, p = 0.779) or cutaneous stimulation experiments (2 sample t-test, p = 0.148).”

1.3 Cases of sparse and/or noisy data lead to unconvincing claims

1.3.1. There are a few instances where more data would help to better evaluate the significance of the results. For example, only one of the three days of baseline song is shown, and for only one example bird; worse, the data from this bird are duplicated across two days, which points to a serious flaw in either the analysis or the illustration. The authors should show more baseline days and include more birds.

We deeply apologize for this error and thank both Reviewers for bringing it to our attention. On investigation we discovered that this mistake was caused by an error in the figure plotting code only, did not reflect any errors in analysis, and did not affect any of the results reported. We have corrected the figure (Figure 2 C in the revised manuscript), and have added additional data from baseline days of song recording to demonstrate the stability of syllable pitch during baseline conditions.

To further address this concern, we have also added a new supplemental figure (Figure 2—figure supplement 5 in the revised manuscript), where we show the pitch for all renditions of the target syllable between 10 A.M. and noon across every day of recording, including all days of baseline and cutaneous training, from 6 additional experiments, which we randomly selected from our dataset to illustrate the range of learning behavior across experiments. We are also providing full datasets for all experiments, along with the code to generate all figures.

1.3.2. The two-sided KS test assessing the difference between baseline and the end of cutaneous stimulation is extremely significant (10⁻¹²) for that one example bird, which is nice, but it would be useful to see whether this is the case for all birds and not just that example bird on that example day. It would also be interesting to see how these statistics behave when comparing two or more baseline days. It is unlikely that the washout the KS analysis reveals in this one bird will apply in all birds.

To address this reviewer’s interest in seeing results beyond the examples we chose for the main text figure, we have created a new supplementary figure (Figure 2 Supplement 6 in the revised manuscript), in which we show the same CDF plot for multiple other example experiments from the same birds shown in Figure 2 Supplement 5 (see response to the previous question). We report the result of KS tests for each of these example experiments in the figure legend. We have also performed KS tests comparing the data on the final day of baseline to the final day of cutaneous stimulation training for all of the experiments in this unoperated dataset. Of the 12 experiments with 3 days of baseline recording, 11 resulted in a p-value < 0.05, and highly significant differences were the norm. The exact p-values for each experiment shown in Figure 2 Figure Supplement 6 are reported in the figure legend.
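The two-sample Kolmogorov–Smirnov test compares the empirical cumulative distribution functions (CDFs) of two samples, here the pitch values of the target syllable on the final baseline day and on the final day of cutaneous stimulation training. The sketch below, in plain Python, computes the KS statistic D (the largest vertical gap between the two empirical CDFs) and an approximate two-sided p-value from the asymptotic Kolmogorov distribution with a commonly used small-sample correction. The helper names are hypothetical; for exact small-sample p-values, a library routine such as `scipy.stats.ks_2samp` is the usual choice.

```python
import math
from bisect import bisect_right


def ks_statistic(x, y):
    """Largest absolute difference between the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # Both ECDFs are step functions, so the supremum is reached at an observed value.
    for v in xs + ys:
        cdf_x = bisect_right(xs, v) / len(xs)
        cdf_y = bisect_right(ys, v) / len(ys)
        d = max(d, abs(cdf_x - cdf_y))
    return d


def ks_asymptotic_p(d, n_x, n_y, terms=100):
    """Approximate two-sided p-value from the Kolmogorov distribution."""
    n_eff = n_x * n_y / (n_x + n_y)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        # The tail probability is indistinguishable from 1 here,
        # and the alternating series below converges slowly.
        return 1.0
    total = sum(
        2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
        for j in range(1, terms + 1)
    )
    return min(max(total, 0.0), 1.0)
```

For one experiment, `ks_asymptotic_p(ks_statistic(baseline, trained), len(baseline), len(trained))` yields a single p-value. Repeating this across experiments gives the kind of per-bird summary the reviewers requested.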

1.3.3. Surprisingly, five days after the depletion of the DA inputs to the basal ganglia (Area X), there is a change of pitch in the anti-adaptive direction that reaches statistical significance on day 5 (Figure 4c). This effect on the 5th day only might be related to the fact that the DA depletion spares about 50% of the inputs to Area X. But what could explain the change in the anti-adaptive direction?

We agree with the reviewer that the observed anti-adaptive change in average pitch following dopamine lesions is interesting. However, although this difference achieves statistical significance, we believe that this finding should be treated with caution because two of the four postlesion experiments had to be stopped early due to pandemic-related disruptions (Figure 2 Supplement 2, panel e in the revised manuscript; we have added this information to the figure legend). The duration of the postlesion experiments differed due to the unexpected need to terminate experiments earlier than planned during the initial stages of the pandemic. It is therefore unclear to what extent the change in significance on day 5 might reflect the removal of half of the subjects in the key condition. In preparing the manuscript, we considered excluding data from day 5 altogether due to these issues but decided to show the data to let readers make up their own minds.

1.4 Why exclude afternoon singing data?

Why was the analysis restricted to song syllables produced between 10 am and 12 pm? What is the rationale for such a restriction? Did past papers on syllable-contingent, feedback-driven pitch learning impose such a restriction? If not, why not? And why is it imposed here? Are the results different when considering all the song syllables per day? This is a key point to show. Also, the reader only finds this information in the Methods section, although it seems to me an important point that should be provided in the main text. Finally, although only data between 10 am and 12 pm are used for the analysis, this window is extended if birds sing fewer than 30 renditions of the target syllable during this time window. It is unclear from their description how often this is the case and how it influences their analysis. Furthermore, they exclude birds that dropped their singing rate below 10 songs per day for more than a day, again without stating how many birds were excluded based on this criterion.

We thank the reviewers for these important questions. Prior studies from our own and other groups have similarly restricted analyses to particular time intervals (Sober and Brainard, 2009; Ali et al., 2013). In this study, we initially restricted the analysis to song produced between 10 A.M. and 12 P.M. for the sake of convenience, since labeling all syllable renditions for every day from the large number of experiments performed throughout these studies would have been very time-consuming. Moreover, restricting analysis to the same period of the day helps to mitigate the potential impact of circadian cycles on song behavior. In Sober and Brainard, 2009, we directly addressed whether this time restriction impacted behavior in a vocal learning paradigm by comparing the data produced between 10 A.M. and 12 P.M. with song syllables produced between 6 P.M. and 8 P.M., and showed no statistical difference in learning between the two. We have added this information to the main text of the manuscript (pg. 6, lines 239-243) in addition to the Methods section (pg. 16, lines 633-638). Further, we have added a new supplemental figure in which we analyzed all syllables produced between 6 and 8 P.M. and compared these results to those obtained by analyzing songs produced between 10 A.M. and 12 P.M. (Figure 2—figure supplement 7 in the revised manuscript). We found no statistically significant difference in the magnitude of learning between the two time windows, suggesting that restricting our analysis to this particular time of day did not impact the main results described in the paper.
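The analysis-window rule raised by the reviewers (10 A.M. to 12 P.M., widened when a bird sings fewer than 30 renditions of the target syllable) and the singing-rate exclusion criterion can be sketched as follows. The response does not say how the window is widened, so the symmetric one-hour steps here are an assumption, as is reading “more than a day” as more than one (not necessarily consecutive) day. Both helpers are hypothetical.

```python
def select_window(renditions, start=10.0, end=12.0, min_count=30, step=1.0):
    """Return (renditions, window) for one recording day.

    `renditions` is a list of (hour_of_day, pitch_hz) pairs. The window
    starts at [start, end) and is widened symmetrically by `step` hours
    (an assumed rule) until it holds `min_count` renditions or covers
    the whole day.
    """
    lo, hi = start, end
    while True:
        chosen = [r for r in renditions if lo <= r[0] < hi]
        if len(chosen) >= min_count or (lo <= 0.0 and hi >= 24.0):
            return chosen, (lo, hi)
        lo, hi = max(0.0, lo - step), min(24.0, hi + step)


def exclude_bird(daily_song_counts, min_songs=10, max_low_days=1):
    """True if singing fell below `min_songs` on more than `max_low_days` days."""
    low_days = sum(1 for count in daily_song_counts if count < min_songs)
    return low_days > max_low_days
```

Logging the returned window for each day would also show how often widening was needed, which is one of the questions the reviewers raised.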

(2) Failure to cite and consider Zai et al., 2020

It is an egregious oversight that the authors did not cite or discuss Zai et al., 2020. Both the ability of birds to learn from non-auditory stimuli and the involvement of the AFP in this process have been shown previously. This study showed that visual stimuli (short periods of light off) can successfully drive changes in pitch in both hearing and deaf birds; furthermore, in deaf birds, the involvement of the AFP in this process was shown using a similar lesioning approach. Thus, two out of the three main claims of novelty in the manuscript are not novel, despite the authors' claims. The main novelty beyond the 2020 study is that McGregor et al. are the first to show that somatosensory information (cutaneous electrical stimulation) can induce vocal plasticity and that dopaminergic projections to the AFP are somehow involved in this process.

We agree completely and apologize for this error. We have edited the manuscript text in multiple locations in the introduction and discussion to address this important prior study. Specifically, we have added additional information and citations on pg. 2, lines 81-83 and 88-94, and pg. 12 lines 418-422 (please see below for an example). We hope that the revised manuscript properly frames our study in the context of this important earlier study.

“Recent work has demonstrated that the songbird AFP receives anatomical projections from brain regions that process non-auditory sensory information²⁷, and that Area X plays a crucial role in processing visual information to shape vocal output¹⁷, yet it remains unclear whether and how the AFP processes somatosensory feedback to drive vocal learning, and whether dopaminergic input to the AFP is involved in non-auditory forms of learning.”

(3) Interpretation of dopamine lesion experiments

The authors claim that dopaminergic input is necessary for observing adaptive changes, but their data suggest otherwise, namely that dopamine sets the direction of the change. Strictly speaking, the statement 'dopaminergic inputs are required for non-auditory vocal learning' is incorrect, since the data show a reversal in learning direction, which is itself a form of learning. Therefore, the apparent reversal in learning under DA-depleted conditions should be discussed.

We thank the reviewer for this comment. Please see our response to 1.3.3 above, where we explain that we had to terminate a subset of experiments earlier than expected due to the pandemic and therefore do not wish to make any strong claims about the results on the final day of data collection for this particular experiment. We have edited the text of the revised manuscript to more carefully explain the differences in the expression of learning prelesion vs postlesion (pg. 11, lines 396-400):

“These results demonstrate that dopaminergic input to Area X is required for adaptive changes in vocal output in response to non-auditory signals.”

(4) Effect of cutaneous stimulation on ongoing song

The absence of a transient effect of the electrical stimulation on the ongoing song (not only song stopping but also FM, pitch, entropy etc.) is claimed but not demonstrated. As the authors did quantify some important features (as stated in the methods, l. 567-568), some examples and analyses for at least one or two acoustic features should be shown (e.g. in a Supp Fig).

We have included an additional supplemental figure to demonstrate the absence of a transient effect of cutaneous stimulation on ongoing song by assessing syllable pitch, entropy, volume, and duration (Figure 2—figure supplement 4 in the revised manuscript):

We discuss these results in the revised manuscript text (pg. 5, lines 216-232):

“To confirm that cutaneous stimulation learning was truly driven by the non-auditory stimulus and not by an unintentional, acute change in vocal output caused by the cutaneous stimulation, we measured the syllable features of interleaved “catch” trials, where cutaneous stimulation was randomly withheld (see Methods), on each day of cutaneous stimulation training. For each experiment, we normalized the pitch of each catch trial from each day of training to the mean pitch of all trials where cutaneous stimulation was provided. We excluded any experiments where the total number of catch trials was less than 10. In every case, the normalized catch trials did not differ significantly from 1, indicating that the pitch of catch trials was highly similar to that of trials where cutaneous stimulation was provided (Figure 2—figure supplement 4a; t-test, 0.071<p<0.997 for each experiment). For comparison, we also performed the same analysis on randomly selected trials from a day of baseline recording, where cutaneous stimulation was not provided on any trials (Figure 2—figure supplement 4a). There was no significant difference between this data set and the normalized catch trials (paired t-test, p = 0.339). We repeated this analysis for other syllable features, such as syllable duration, volume, and spectral entropy. In all cases, we did not see a robust, acute change in song performance caused by the cutaneous stimulation.”
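The catch-trial analysis quoted above can be sketched as below. It normalizes each catch trial's pitch to the mean pitch of stimulated trials, excludes experiments with fewer than 10 catch trials, and tests whether the normalized values differ from 1. The helper name is hypothetical, and the sketch stops at the one-sample t statistic. A p-value would additionally require the t distribution, for example via `scipy.stats.ttest_1samp(normalized, 1.0)`.

```python
import math
from statistics import fmean, stdev


def normalized_catch_t(catch_pitches, stim_pitches, min_catch=10):
    """Normalize catch-trial pitch to the stimulated-trial mean; t-test against 1.

    Returns (normalized_values, t_statistic), or None if there are fewer than
    `min_catch` catch trials (such experiments were excluded).
    """
    if len(catch_pitches) < min_catch:
        return None
    reference = fmean(stim_pitches)
    normalized = [p / reference for p in catch_pitches]
    mean_norm = fmean(normalized)
    sd = stdev(normalized)
    if sd == 0.0:
        # Degenerate case: all normalized values are identical.
        t_stat = 0.0 if mean_norm == 1.0 else math.copysign(math.inf, mean_norm - 1.0)
    else:
        t_stat = (mean_norm - 1.0) / (sd / math.sqrt(len(normalized)))
    return normalized, t_stat
```

Running the same function on randomly selected baseline-day trials gives the comparison set described in the passage.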

(5) Accurately contextualize distorted auditory feedback studies

The paragraph at line 420 is written as though all pitch-contingent auditory feedback studies have been done with loud white noise bursts. But Andalman and Fee, 2009 and Chen et al., 2020 used broadband noise filtered in the 2–8 kHz range so that it sounds like a zebra finch call (this noise actually elicits social calls and drives place preference, as cited (Murdoch et al., 2018)). And Gadagkar et al., 2016 additionally used displaced syllable fragments of each bird's own song at sound levels lower than what the singing bird would hear at the ear. These nuances of distinct feedback protocols are relevant to this paragraph and should be spelled out, as a loud, non-social white noise burst with power at low frequencies resembling knocks is likely perceived differently from a sound filtered to lie in the bird's own vocalization range that is known to drive positive place preference and to evoke social calls. Please correct this paragraph so the papers are cited properly.

We thank the reviewer for this comment and agree that the nuances of these different auditory feedback experiments warrant further discussion. We have edited this paragraph in the text of the revised manuscript to address these important points (pg. 14, lines 474-498):

“It has been hypothesized that a key function of the songbird AFP circuitry is to encode auditory performance error: the evaluation of the match between the auditory feedback the songbirds receive and their internal goal for what their song should sound like (based on their stored memory of the tutor song template)^11,31,43,44. It has been difficult to determine the extent to which distorted auditory feedback drives adaptive changes in vocal output due to the aversive nature of the stimulus as opposed to the stimulus being interpreted by the bird as an auditory performance error. Some auditory vocal learning experiments have provided white noise bursts during ongoing song performance. In these experiments, songbirds adaptively modify their vocal output to avoid triggering white noise bursts as frequently^24,29,45. Also, white noise bursts can often cause song interruptions at first, suggesting they are startling to the birds^24,29. Other experiments have used distorted elements of song syllable segments played during song performance (distorted auditory feedback), and found that they elicit a pattern of activity in dopaminergic neurons consistent with the encoding of performance error⁴⁴. Importantly, when bursts of noise are provided in non-vocal contexts, such as when a songbird stands on a particular perch (not during song performance), they can positively reinforce place preference³⁸. Thus, due to the various nuances in experimental methodology and the inherent difficulty in measuring the aversive nature of the auditory stimuli, it is unclear whether white noise bursts drive learning because the white noise is registered by the birds as a performance error or because the white noise is generally aversive. Although the results of the experiments described here do not directly address this, they do show that cutaneous stimulation (an explicit, external, aversive sensory stimulus) is sufficient to drive vocal learning. 
That the AFP underlies non-auditory learning suggests that the AFP does not solely encode auditory performance error. Instead, the AFP may encode more general information about whether vocal performance resulted in a “good” or “bad” outcome, and it may use this information to drive changes to future motor output.”

(6) Figure 2C: data from day 0 and day 1 are identical!

We thank the reviewer for this comment. Please see our response to 1.3.1, where we explain that this was caused by an error in the plotting and not an underlying issue with the data or analysis pipeline. We have edited the figure in the revised manuscript (Figure 2C).

Associated Data

Data Citations

McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_4_Source_data_3.mat. figshare. Dataset. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_4_Source_data_1.mat. figshare. Dataset. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_Code_3.m. figshare. Software. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_4_Source_Code_2.m. figshare. Software. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_data_1.mat. figshare. Dataset. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_4_Source_Code_3.m. figshare. Software. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_4_Source_data_2.mat. figshare. Dataset. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_data_3.mat. figshare. Dataset. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_Code_2.m. figshare. Software. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_data_2.mat. figshare. Dataset. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_data_2.mat. figshare. Dataset. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_Code_4.m. figshare. Software. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_data_1.mat. figshare. Dataset. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_data_3.mat. figshare. Dataset. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_4_Source_Code_1.m. figshare. Software. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_code_3.m. figshare. Software. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_code_4.m. figshare. Software. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_Code_1.m. figshare. Software. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_2_Source_Code_2.m. figshare. Software. figshare. [DOI]
McGregor J, Grassler A, Jaffe P, Jacob A, Brainard MS, Sober SJ. 2022. McGregor_et_al_Figure_3_Source_Code_1.m. figshare. Software. figshare. [DOI]

Supplementary Materials

Figure 2—source data 1. Source data for analyses in Figure 2.

elife-75691-fig2-data1.zip (712.8 KB, zip)

Figure 2—source data 2. Source data for analyses in Figure 2—figure supplement 5 and Figure 2—figure supplement 6.

elife-75691-fig2-data2.zip (712.9 KB, zip)

Figure 2—source data 3. Source data for analyses in Figure 2—figure supplement 7.

elife-75691-fig2-data3.zip (39.1 KB, zip)

Figure 2—source code 1. Source code for use with Figure 2—source data 1 for analyses in Figure 2B-D.

elife-75691-fig2-code1.zip (8 KB, zip)

Figure 2—source code 2. Source code for use with Figure 2—source data 1 for analyses in Figure 2E.

elife-75691-fig2-code2.zip (15.5 KB, zip)

Figure 2—source code 3. Source code for use with Figure 2—source data 2 for analyses in Figure 2—figure supplement 5 and Figure 2—figure supplement 6.

elife-75691-fig2-code3.zip (17.1 KB, zip)

Figure 2—source code 4. Source code for use with Figure 2—source data 3 for analyses in Figure 2—figure supplement 7.

elife-75691-fig2-code4.zip (30.9 KB, zip)

Figure 3—source data 1. Source data for analysis in Figure 3B.

elife-75691-fig3-data1.zip (63.4 KB, zip)

Figure 3—source data 2. Source data for analyses in Figure 3C, D.

elife-75691-fig3-data2.zip (699.5 KB, zip)

Figure 3—source data 3. Source data for analysis in Figure 3E.

elife-75691-fig3-data3.zip (798.4 KB, zip)

Figure 3—source code 1. Source code for use with Figure 3—source data 1 for analysis in Figure 3B.

elife-75691-fig3-code1.zip (4.3 KB, zip)

Figure 3—source code 2. Source code for use with Figure 3—source data 2 for analysis in Figure 3C.

elife-75691-fig3-code2.zip (1.6 KB, zip)

Figure 3—source code 3. Source code for use with Figure 3—source data 2 for analysis in Figure 3D.

elife-75691-fig3-code3.zip (14.3 KB, zip)

Figure 3—source code 4. Source code for use with Figure 3—source data 3 for analysis in Figure 3E.

elife-75691-fig3-code4.zip (12.7 KB, zip)

Figure 4—source data 1. Source data for analysis in Figure 4B.

elife-75691-fig4-data1.zip (49.7 KB, zip)

Figure 4—source data 2. Source data for analysis in Figure 4C.

elife-75691-fig4-data2.zip (595.9 KB, zip)

Figure 4—source data 3. Source data for analysis in Figure 4D.

elife-75691-fig4-data3.zip (357.7 KB, zip)

Figure 4—source code 1. Source code for use with Figure 4—source data 1 for analysis in Figure 4B.

elife-75691-fig4-code1.zip (4 KB, zip)

Figure 4—source code 2. Source code for use with Figure 4—source data 2 for analysis in Figure 4C.

elife-75691-fig4-code2.zip (14.9 KB, zip)

Figure 4—source code 3. Source code for use with Figure 4—source data 3 for analysis in Figure 4D.

elife-75691-fig4-code3.zip (15.1 KB, zip)

Transparent reporting form

elife-75691-transrepform1.docx (112.1 KB, docx)

Data Availability Statement