Author manuscript; available in PMC 2014 Jan 2.
Published in final edited form as: Nature. 2013 Feb 27;495(7439):59–64. doi: 10.1038/nature11967

Elemental gesture dynamics are encoded by song premotor cortical neurons

Ana Amador1, Yonatan Sanz Perl2, Gabriel Mindlin2, Daniel Margoliash1
PMCID: PMC3878432  NIHMSID: NIHMS441306  PMID: 23446354

Abstract

Quantitative biomechanical models can identify the control parameters used during movements, and the movement parameters encoded by premotor neurons. We fit a mathematical dynamical systems model, including subsyringeal pressure, syringeal biomechanics, and upper vocal tract filtering, to the songs of zebra finches. This reduced the dimensionality of singing dynamics, described as trajectories in pressure-tension space (motor “gestures”). We assessed model performance by characterizing the auditory “replay” responses of song premotor HVC neurons to presentation of song variants in sleeping birds, and by examining HVC activity in singing birds. HVC projection neurons were excited and interneurons were suppressed with near-zero time lag, at times of gesture trajectory extrema. Thus, HVC precisely encodes vocal motor output via the timing of extreme points of movement trajectories. We propose that the sequential activity of HVC neurons represents the sequence of gestures in song as a “forward” model making predictions about expected behavior that can be used to evaluate feedback.


For a given set of movements, sets of movement parameters tend to be correlated with each other, so that it is difficult to resolve whether motor cortical neurons encode different sets of static parameters (e.g. position, velocity, direction), or even to distinguish between static and time-dependent parameters (e.g. path trajectory)1. In principle, the motor coding problem can be addressed by developing quantitative models that describe the biomechanics of the movements2. To the extent that such models capture the actual control elements used to produce a movement, they permit motor neuron activity to be evaluated in a natural framework. We examined motor control in the bird song system from this perspective, creating a dynamical systems model of the avian vocal organ (syrinx) that captures many of the rich set of vocal behaviors that characterize bird songs3.

We assessed predictions of the biomechanical model by taking advantage of a neuronal replay phenomenon4–6. Neurons in the nucleus HVC, a secondary motor/association cortical structure that is the most central structure known to be essential for singing, emit precise premotor activity when a bird sings5–7, and have auditory responses, very similar in timing and structure6, that are highly selective for the bird’s own song (BOS) when a bird listens to playback of song8,9. In zebra finches, there is a striking state-dependent neuronal replay phenomenon4 associated with song learning10, such that the strongest and most selective auditory responses are recorded in sleeping birds. We used responses to song in sleeping adult zebra finches as a proxy for evaluating the structure of singing, and then tested the emerging hypotheses in singing birds.

Validating a song model: static parameters

The avian vocal organ is a nonlinear device11–13 capable of generating complex sounds even when driven by simple instructions14,15. We extended a low dimensional model of the avian syrinx and vocal tract that can capture a variety of acoustic features, such as the precise relationship between fundamental frequency and spectral content of zebra finch song16,17. The model used here is summarized in Fig. 1. A two dimensional set of equations describes the labial dynamics (see Methods) (Fig. 1, x(t), red trace). Flow fluctuations are fed into a vocal tract, generating an input sound Pi(t) (green trace). The tract filters the sound and is characterized as a trachea, modeled by a tube, which connects to the oro-esophageal cavity (OEC), here modeled as a Helmholtz resonator18 (see Methods). The output of the model is a time trace representing the uttered sound (Pout(t)) (blue trace).

Figure 1. Schematized view of a dynamical systems model describing labial dynamics and vocal tract filtering (trachea and oro-esophageal cavity, OEC).


The syringeal membrane was modeled as a mass (m) with damping (b) and a restitution (spring) force (K). Normal form equations for labial position (x(t), red line) were integrated, computing the input pressure at the vocal tract (Pi(t), green line) and ultimately the total output pressure (Pout(t), blue line). v, sound velocity; T, propagation time along trachea; γ, time constant (see Methods).

Using this model, we created synthetic versions of the songs our test birds sang. Time dependent parameters of the model describing the labial dynamics were reconstructed to account for the time dependent acoustic properties of the sound (see Methods). Following3,16,17, for each bird's song we used an algorithmic procedure to reconstruct unique functions for the air sac pressure (α(t)) and the tension of the syringeal labia (β(t)). The result of the procedure for one song is illustrated in Fig. 2, showing that many features observed in the spectrograph of the recorded song (Fig. 2a) were also apparent in the synthesized song (Fig. 2b). Relatively simple time traces of reconstructed pressure and tension arose from fitting the bird’s song (Fig. 2c). These two functions drove the nonlinear equations for the labia to produce a wide range of diverse acoustic features. The parameter space of pressure vs. tension was organized by bifurcation curves (Fig. 2d, black lines), i.e. curves in the parameter space that separated regions where the model presented qualitatively different dynamics (sound patterns). Only one region (Fig. 2d, gray region) corresponded to oscillatory behavior, i.e. labial oscillations resulting in sound pressure fluctuations. Two features of the pressure-tension trajectories resulting in sound output were apparent (Fig. 2d). First, most of the control parameters were maintained close to bifurcation curves, facilitating rapid changes in the quality of sound output with small changes in parameter values. Second, many sounds were characterized principally by movements in pressure or tension but not both.

Figure 2. A low dimensional model: reconstructing gestures.


Spectrographs of a bird's song (a) and the model synthetic song (b). Song is described by fitted parameters α(t) and β(t), proportional to air sac pressure and labial tension, respectively (c). Each sound is generated by a continuous curve in the parameter space of the model, a "gesture" (d). Oscillations in the vicinity of a saddle-node (SN) bifurcation present rich spectra, typical of zebra finch song. Note that the spectrally poor "high note" (green) is distant from the SN bifurcation. The gray area indicates the region of phonation. The distribution of gesture durations for five birds is displayed in (e).

Song was described by the sequence of these pressure-tension trajectories, which we call gestures, with gesture onsets and offsets defined as discontinuities in either the pressure or tension functions (Fig. 2c). Gestures include movements that do not result in phonation, such as pressure patterns associated with mini-breaths between syllables19, but our recordings here were limited to airborne sounds. In a sample of 8 modeled songs, there were 13±4 gestures per motif (the largest basic unit of song, a repeated sequence of syllables). The distribution of gesture durations (mode = 22.5±2.5 ms, range 4–142 ms) was non-Gaussian, with 33% of the gestures ≤ 30 ms, and a long tail corresponding to slowly varying sounds such as constant frequency harmonic stacks (Fig. 2e).

This simple model captured essential features of sound production in a framework of labial tension and subsyringeal pressure, over which birds have direct motor control20–22. Although the actual syrinx has considerable additional complexity, the model provided substantial dimensionality reduction, allowing us to capture a wide range of acoustic features in a small set of time dependent parameters.

We tested the model by comparing responses of HVC neurons to broadcast of the modeled song (mBOS) and BOS in sleeping birds (Fig. 3). Responses to a grid of mBOS stimuli with identical timing but different spectra from BOS identified optimal estimates for the two remaining free static parameters (Supplementary Fig. 1). In sleeping birds, song system neurons are exceptionally selective, and it was far from trivial to induce a response: for example, mBOS generated without the OEC component failed to elicit responses. In one case where we mis-estimated the duration of a component of BOS by 5 ms, a neuron responded strongly to BOS but not at all to the synthetic song (Supplementary Fig. 2b). Over a population of 30 neurons, the best mBOS elicited 58%±8% of the response to BOS (Supplementary Note 1). Both phasic projection neurons (HVC(p)) (N=15) and tonic interneurons (HVC(i)) (N=15) responded selectively to mBOS over non-BOS stimuli (Supplementary Note 1). These results show that a low dimensional model representing an approximation of peripheral mechanics is sufficient to capture behaviourally relevant features of song.

Figure 3. Testing the low dimensional model.


The activity of HVC selective neurons of sleeping birds in response to the presentation of BOS and modeled BOS (mBOS) was similar. The timing of the three repeated motifs that were presented is indicated by the bold horizontal lines.

Projection neurons burst at gesture extrema

We then evaluated the activity of HVC neurons relative to model dynamics, analyzing the timing of spike bursting relative to the pressure-tension trajectories used to synthesize mBOS. This identified a compelling relation between the timing of HVC(p) spikes and the pressure-tension trajectories. For example, in Fig. 4a the spiking of two neurons (coded with different colors) is shown relative to the BOS spectrograph, oscillograph, and reconstructed pressure and tension time series. One neuron burst once, at the transition between descending frequency modulations and a constant frequency “high note”. The other neuron burst twice, once when the pressure during a high note reached a maximum, and once at the transition between a high frequency chevron and a broadband frequency modulated sound. Similar relations between spike burst timing and gestures were observed for 14 of the 15 HVC(p) (Supplementary Figs 2 and 3). In one case, a neuron emitted bursts in the interval between syllables; we hypothesize this pattern might arise if the bursts are associated with mini-breaths during singing19. Only the 17 bursts occurring during phonation were considered for further analysis.

Figure 4. Timing of gestures relative to bursting of projection neurons.


a, Song spectrograph and oscillograph (top panels); reconstructed pressure and tension parameters (middle panels), with tick marks indicating the times of all GTEs. Bottom panel, raster plots of the responses of two neurons (color coded green and orange), together with their closest GTEs, indicated with lines of the same colors. b, The trajectories (same color coding) in parameter space, with a point indicating the mean position of a burst and arrows indicating the trajectory direction. c, Distribution of time differences between consecutive GTE occurrences (N = 5 birds). d, Distribution of time differences between the time of each spike (Ts) and the time of the closest GTE in sleeping birds (N = 14 HVC(p), 5 birds). e, The same analysis as in d for singing birds (N = 5 HVC(p), 2 birds).

Examining the responses of the HVC(p) on pressure vs. tension plots demonstrated that neurons burst preferentially at gesture trajectory extrema (GTE) (Fig. 4b). A gesture has at least two GTE, at its beginning and end, and up to two additional GTE if the absolute maxima of pressure and/or tension occur at unique, distinct time points. No additional GTE result in cases where the absolute maximum is not distinct in time, e.g., multiple local maxima with the same magnitude. Of the 17 bursts (14 HVC(p)), 11 (65%) were aligned with onsets/offsets, and 6 (35%) were aligned with pressure or tension maxima. In a sample of 5 songs, there were 28±4 GTE per song (165 total GTE). Of a total of 60 gestures, 20 (33.3%) had only onset and offset GTEs; 30 (50%) had in addition a unique peak in pressure (3 GTEs per gesture); 5 (8.3%) had in addition a unique peak in tension (3 GTEs per gesture); and 5 (8.3%) had in addition unique peaks in both pressure and tension. The distribution of time intervals between successive GTE (mode = 9±1 ms, range 4–116 ms) was non-Gaussian, with 66% of the intervals ≤ 30 ms (Fig. 4c). This is graphically emphasized with tick marks showing all GTEs in Fig. 4a and Supplementary Figs 2, 3. Most gestures corresponded to notes (the smallest unit of song organization recognized by ornithologists), yet motor activity at GTE maxima could subdivide notes, for example where a neuron burst and the pressure reached a maximum in the middle of a constant frequency harmonic stack (Supplementary Fig. 2). These examples highlight that for some HVC(p) the patterns of activity would not be interpretable with a purely spectrographic analysis of song5. We also observed cases where HVC(p) burst at the onset of relatively pure pressure-only or tension-only trajectories, with a preponderance of pressure-only trajectories (Fig. 2d). If such neurons project to distinct regions of HVC’s efferent targets, which are organized based on the syringeal muscles and their interactions with the respiratory system, such observations could help resolve the long-standing riddle of HVC’s topographic organization.
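The GTE bookkeeping described above is algorithmic: each gesture contributes its onset and offset, plus any interior time point at which pressure or tension attains its absolute maximum at a unique time. A minimal sketch in Python, assuming fitted α(t) and β(t) traces (`alpha`, `beta`) and gesture boundary indices (`bounds`) as inputs; all names and the toy traces are illustrative, not the paper's code:

```python
import numpy as np

def unique_argmax(trace):
    """Index of the absolute maximum, or None if it is not attained at a unique time."""
    hits = np.flatnonzero(np.isclose(trace, trace.max()))
    return hits[0] if len(hits) == 1 else None

def gesture_trajectory_extrema(alpha, beta, bounds):
    """GTE indices: gesture onsets/offsets plus interior unique maxima of pressure or tension."""
    gte = set()
    for s, e in zip(bounds[:-1], bounds[1:]):
        gte.update((s, e))                       # onset and offset GTE
        for trace in (alpha[s:e], beta[s:e]):    # pressure, then tension
            k = unique_argmax(trace)
            if k is not None and 0 < k < (e - s) - 1:
                gte.add(s + k)                   # unique interior maximum GTE
    return np.array(sorted(gte))

# toy gesture: unique interior pressure peak, monotonic tension
n = 100
idx = np.arange(n)
alpha = -((idx - 30) / n) ** 2                   # pressure peaks uniquely at index 30
beta = idx / n                                   # tension maximum falls on the offset
print(gesture_trajectory_extrema(alpha, beta, bounds=[0, n]))
# -> [  0  30 100]: onset, interior pressure maximum, offset
```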

To quantify these observations, we calculated the time from each spike in each burst to the closest GTE for all 17 bursts. The resulting distribution was approximately Gaussian, with bursts on average preceding the closest GTE (mean = –5.6 ± 0.3 ms, σ = 6.7 ± 0.3 ms; Fig. 4d). A bootstrap procedure (Supplementary Note 2) confirmed that the correspondence to the closest GTE was statistically significant (F test, P<0.045). This indicates that the timing of HVC(p) bursts is associated with the timing of GTE. Given a minimal delay between activity of HVC(p) and sound production estimated at 25–50 ms23, the minimal 15 ms delay for auditory feedback to reach HVC8, and the great variation in the duration of intervals between GTE (Fig. 4c), it is remarkable that the timing of HVC(p) bursting was synchronized with near-zero time lag to a model of actual behavioral output.
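The underlying statistic is a nearest-neighbor lag. A sketch with placeholder spike and GTE times (in seconds); the sign convention matches Fig. 4d, where a negative lag means the spike precedes its closest GTE:

```python
import numpy as np

def lags_to_closest_gte(spike_times, gte_times):
    """Signed lag from each spike to its closest GTE (negative: spike precedes the GTE)."""
    d = np.asarray(spike_times)[:, None] - np.asarray(gte_times)[None, :]
    return d[np.arange(d.shape[0]), np.abs(d).argmin(axis=1)]

# placeholder times in seconds, not data from the paper
gte = np.array([0.010, 0.032, 0.041, 0.120])
spikes = np.array([0.027, 0.028, 0.115])
lags = lags_to_closest_gte(spikes, gte)
print("mean lag = %.1f ms, s.d. = %.1f ms" % (1e3 * lags.mean(), 1e3 * lags.std(ddof=1)))
```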

Interneurons are suppressed at GTE

We also noted a relation between the minima in the activity of HVC(i) and the timing of GTE. To characterize this, for each interneuron we binned the spikes in 10 ms windows for each acoustic presentation. The resultant average response traces were smoothed, and the minima in the smoothed traces were identified (see Methods). For an example neuron, the average response is shown in green, the superimposed smoothed curve in black, and the minima as red dots (Fig. 5a, bottom panel). Individual HVC(i) did not have minima at every GTE, but across all neurons we observed a close alignment between the times of the minima and the times of GTE. (A non-significant relation was observed for maxima of HVC(i) activity; Supplementary Fig. 4.) Computing the differences between the time of each minimum that occurred during phonation and the closest GTE resulted in a distribution that was approximately Gaussian (mean = –0.82 ± 0.60 ms, σ = 7.3 ± 1.4 ms; Fig. 5b). We compared this distribution to the distribution of randomly positioned minima within each motif using the bootstrap procedure and found them to be significantly different (F test, P<0.016, Supplementary Note 2). Additional tests identified marginally significant locking to GTE for one of four birds (Supplementary Note 3). Thus, the precise activity of HVC(i)7 can help shape the timing of HVC(p). This suggests a simple model in which bursts of activity of HVC(p) suppress activity in HVC(i), whose ongoing activity helps shape the next HVC(p) burst.
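A sketch of this minima analysis, assuming spike trains from repeated presentations; the 10 ms bin follows the text, whereas the Savitzky–Golay window, polynomial order, and toy data are illustrative assumptions (Methods specify a 21-point sliding window applied to a 1 ms resolution trace):

```python
import numpy as np
from scipy.signal import savgol_filter, argrelmin

def response_minima(spike_trains, t_max, bin_s=0.010):
    """Mean binned response across presentations, smoothed; returns times of local minima."""
    edges = np.arange(0.0, t_max + bin_s, bin_s)
    counts = np.mean([np.histogram(tr, edges)[0] for tr in spike_trains], axis=0)
    smooth = savgol_filter(counts, window_length=11, polyorder=3)
    minima = argrelmin(smooth, order=5)[0]          # local minima of the smoothed trace
    return edges[minima] + bin_s / 2, smooth        # minima located at bin centers

# toy data: 20 presentations with a quiet period carved out around 0.5 s
rng = np.random.default_rng(0)
trains = []
for _ in range(20):
    tr = rng.uniform(0.0, 1.0, 120)
    trains.append(tr[(tr < 0.45) | (tr > 0.55)])
times, _ = response_minima(trains, t_max=1.0)
print("response minima near (s):", times)
```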

Figure 5. Suppressed interneuron activity is associated with GTEs.


a, Organized as in Fig. 4a, but with the spike count response to the song (10 ms bins, 20 repetitions; green line) for one HVC(i), and a smoothed measure of the response (black line; see Methods). Red squares indicate the times of the minima in the smoothed measure, and the vertical lines indicate the position of the closest GTE to each minimum. b, Distribution of time differences between spike response minima and their closest GTE in sleeping birds (15 HVC(i), 5 birds). c, The same analysis in singing birds (10 HVC(i), 3 birds).

A representation of gestures during singing

Given that our results were obtained by broadcasting songs to sleeping birds, it is natural to ask whether the activity of HVC neurons is also locked to gesture transitions during singing. Previous results have demonstrated tight temporal locking between daytime singing activity and auditory-driven responses during sleep for single RA neurons in zebra finches4, and for HVC neurons in awake swamp sparrows and Bengalese finches that respond to auditory stimulation6, but similar observations have yet to be reported for zebra finch HVC neurons. We made recordings from HVC in singing birds (N = 3 birds), including 5 phasic neurons that burst during phonation (recorded in two of the three birds; Fig. 6, Supplementary Fig. 5), one of which had two bursts per motif, and 10 tonic neurons. We confirmed that during singing, all sparse bursts of HVC(p) occurred at gesture transitions (Fig. 4e). Following the same analysis as in sleeping birds (but here, since each motif of song could vary, each motif was independently modeled), we observed for singing birds even more precise timing of HVC(p) than during sleep (cf. Fig. 4d, e). The Gaussian fit for the population of phasic neurons recorded during singing (mean = –1.35 ± 0.10 ms, σ = 4.0 ± 0.1 ms; Fig. 4e) was significantly different from the bootstrapped random distribution (F test, P<0.025; see Supplementary Note 2 and Supplementary Fig. 6). The minima in the activity of tonic neurons recorded during singing also showed precise timing relative to GTEs (Gaussian fit for the minima: mean = –0.12 ± 0.4 ms, σ = 4.0 ± 0.4 ms; Fig. 5c), and this was significantly different from the bootstrapped random distribution (F test, P<0.002). Additional analyses demonstrated significant locking of minima to GTE in two of three singing birds (Supplementary Note 3). As for sleeping birds, the maxima of tonic neural activity showed no evidence of significant locking to the GTEs (Supplementary Fig. 4c). Finally, examining the data from a prior study of zebra finches24, we observed that during singing the timing of HVC(RA) bursts was closely associated with the timing of HVC(X) bursts (Supplementary Fig. 7). In light of our results, this supports the hypothesis that all classes of HVC neurons are active in relation to the timing of gestures, although the multiple subtypes of HVC(RA), HVC(X), and HVC(i) have yet to be evaluated.

Figure 6. During singing, HVC(p) fired in the vicinity of GTE.


a, An HVC(p) neuron's bursts were locked to the vicinity of a GTE even as the syllable sequence and time intervals varied. b, For another bird, the burst of an HVC(p) neuron was locked to a GTE in the vicinity of a subtle acoustic transition.

Previously it was concluded that the timing of song syllables was unrelated to the timing of HVC(p) discharge5,24 in singing birds. Given the sparse bursting of these cells, this led to the idea that the output of HVC had a clock-like function with a nearly uniform “tick” size of approximately 10 ms23, supported by a “syn-fire” chain of synaptic activity across HVC(p)5. Instead, we find that the bursting of HVC(p) and the modulation of HVC(i) activity are timed to significant instances of motor gestures. The sequential firing across the population of HVC(p) unfolds in an ordered fashion5, but time is not explicitly represented in HVC. Instead, the statistics of HVC activity are closely tied to syringeal/vocal tract mechanics. Given the broad distribution of times between GTE, HVC activity synchronized with GTEs is inconsistent with a syn-fire network that is active at every moment. The distinction between these two models of HVC has additional broad implications for the functional organization of the song system, for song learning, and for motor coding.

Since gestures vary greatly in duration and RA only has access to the times of GTE, downstream components of the motor pathway (RA and presumably brainstem) should generate independent dynamical information to sustain the detailed structure within each gesture (cf.23,25). Previous experimental results, including the effects of electrical stimulation of HVC or RA during singing26 and lesions of nuclei afferent to HVC27, implicate information in HVC encoding larger units of song. This might arise if some gestures or transitions are over-emphasized in HVC relative to others. Finally, gestures are learned, which is consistent with the physiological properties of HVC neurons: integration over hundreds of milliseconds and multiple syllables, non-linear summation over syllables in a sequence preceding the excitatory response, and selective response to BOS4,8,9,28–30. Information about groupings of gestures, such as syllables, can be carried in these integrated signals. This also re-emphasizes that synaptic modification in HVC, not just changes at HVC–RA synapses, is associated with feedback-mediated sensorimotor learning (cf.23). HVC also projects to the cortico-basal ganglia pathway, which contributes to learning-mediated synaptic modification in RA by introducing variance into song output31,32. This suggests the hypothesis that the variance is structured not in an auditory framework but around specific features of song motor gestures.

A forward model for vocomotor control

If activity in HVC is synchronized with little time lag to motor gestures occurring at the periphery, this would tend to bring it into temporal register with auditory33, proprioceptive20, or brainstem34 feedback arriving at fixed (circa 15 ms) delays. This allows movements to be represented in HVC by gestures of greatly varying duration (with dynamics principally generated through internal HVC interactions), while each gesture is referenced to a common time framework for evaluating feedback (with feedback arriving through distinct, extrinsic inputs). This suggests that projection neurons represent a prediction about the actual behavioral output at that moment in time, constituting an unexpected form of a “forward” or predictive model that resolves the problem of delay in sensorimotor control35. Assuming that behavior is subdivided into gestures and only the transitions (GTE) are represented by HVC output (HVC(p)), the intervals between the transitions could accumulate feedback information by modifying the tonic activity of HVC(i) and subsequently the spike bursting of HVC(p). Indeed, HVC receives multiple sources of feedback, including input from the primary motor cortex RA36, thalamic input carrying brainstem respiratory, auditory, and proprioceptive information21,34,37, and forebrain auditory input38.

We have described song organization based on gestures, taking advantage of the dynamical systems modeling framework to go beyond spectrographs. These features of motor system organization may hold generally39. Our data support Sherrington’s long-standing hypothesis that the motor cortex is a synthetic organ, representing segments of whole movements1,40. In humans, the production of speech and the performances of athletes and musicians are exceptional examples of highly precise, learned, skilled behavior that could share mechanisms with those described here. Developing corresponding models for human speech production should help inform speech and language pathologies in which sequential behavior is disrupted.

Methods

Subjects, songs, and surgeries

All procedures were in accordance with a protocol approved by the University of Chicago Institutional Animal Care and Use Committee. Songs were recorded from 12 birds, and electrophysiology was conducted on 9 adult male zebra finches (Taeniopygia guttata) bred in our colony. Birds were prepared for recordings with surgeries using standard techniques to implant a head pin (for auditory experiments)10 or a motorized microdrive (for singing experiments)5. For auditory experiments, adults were maintained on a 16/8 h reversed light cycle in sound isolation boxes. Songs were recorded and filtered using custom software (SABER, A.S. Dave), then edited (Praat, P. Boersma and D. Weenink, www.praat.org). Edited songs included two or three repetitions of one motif and were typically 2–4 s in duration. Birds were allowed to recover for 2 or 3 days before the first recording day, and rested for at least 2 days between recording sessions.

Electrophysiology, stimulus presentation, and spike analysis

HVC extracellular recordings were performed in head-fixed sleeping birds or in tethered singing birds. Recordings were post-processed with a spike-sorting algorithm (Klusters, L. Hazan, klusters.sourceforge.net, and custom software written by C.D. Meliza) to separate the times of spike events for each unit. For experiments in singing birds, all well-isolated neurons are reported. For auditory experiments, only BOS-responsive neurons were recorded. Auditory stimuli were presented randomly with an interstimulus interval of 7±1 s. The neural response to each song was quantified in terms of the Z score25:

$$Z = \frac{\mu_{S} - \mu_{BG}}{\sqrt{\mathrm{Var}(S) + \mathrm{Var}(BG) - 2\,\mathrm{Covar}(S, BG)}}$$

where μS is the mean response during the auditory stimulus (S) and μBG is the mean response during background activity (BG). The denominator is the standard deviation of (S – BG). The background was estimated by averaging the firing rate during a 2 s period. The Z scores of the mBOS, CON, and REV were normalized to the BOS Z score, and averages across neurons are reported as mean normalized response ± s.e.m. For interneurons, the strength of the response varied across the motifs42. We picked the last (second or third) motif, which gave the strongest response, to analyze the timing of spikes relative to GTE; this minimized false peaks and troughs in the response profiles. In singing birds, interneurons fired reliably for each motif, and all motifs were incorporated into the analysis. The average response of each interneuron (1 ms resolution) was smoothed using a Savitzky–Golay filter (polynomial local regression41), and the minima were identified using a 21-point sliding window.
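As a worked example, the Z score can be computed from per-trial firing rates; the denominator below expands the standard deviation of S − BG as in the equation above, and the trial values are placeholders:

```python
import numpy as np

def z_score(s, bg):
    """Z = (mean(S) - mean(BG)) / s.d.(S - BG), expanded via variances and covariance."""
    s, bg = np.asarray(s, float), np.asarray(bg, float)
    den = np.sqrt(s.var(ddof=1) + bg.var(ddof=1) - 2.0 * np.cov(s, bg)[0, 1])
    return (s.mean() - bg.mean()) / den

stim = np.array([12.0, 15.0, 11.0, 14.0])   # firing rate during stimulus, per trial
bkgd = np.array([4.0, 6.0, 5.0, 5.0])       # firing rate during background, per trial
print("Z =", z_score(stim, bkgd))
```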

Reconstruction of motor gestures

We assumed flow-induced oscillations of opposing labia as the sound source model for birdsong production14. This model assumes that for high enough airflow values, the labia start to oscillate with a wave-like motion. Assuming two basic modes are active (a flapping-like motion and a lateral displacement of the tissues, appropriately out of phase), a system of equations describes the dynamics of the medial position x(t) of one of the opposing labia, at one of the sound sources. These read

$$\frac{dx}{dt} = y, \qquad \frac{dy}{dt} = \frac{1}{m}\left[-k(x)\,x - \left(b(y) + c\,x^{2}\right)y + a_{\mathrm{lab}}\,p_{\mathrm{av}}\right]$$

where the first term in the second equation is the restitution in the labium, the second term accounts for the dissipation, and the last term for the force due to the interlabial pressure. The average pressure pav can be written in terms of the displacement and its velocity3. These equations describe a set of qualitatively different dynamical regimes. To gain independence from the details of any particular model presenting these regimes, we worked with a normal form that unfolds into a saddle-node-in-limit-cycle bifurcation and a Hopf bifurcation. The normal form, which is analytically derived43, constitutes the simplest set of equations for any model in which oscillations arise in either of these two bifurcations. Once this reduction is performed, selecting the parameters that yield a sound with specific acoustic features gives rise to unique values. The normal form equations are shown in Fig. 1, and display the same set of dynamical regimes3 as the physical model, with scaling through a time constant γ. Once x(t) is computed, the pressure at the input of the tract is computed as Pi(t) = α(t)x(t) − rPi(t − T), where T is the time for a sound wave to reach the end of the tube and return, r is the reflection coefficient at the distal end, and α(t) is proportional to the mean velocity of the airflow. The transmitted pressure fluctuation Pt(t) = (1 − r)Pi(t − 0.5T) forces the air in the glottis, which is approximated by the neck of a Helmholtz resonator (used to model the OEC3,44), i.e., a large container with a hole, such that the air in its vicinity oscillates due to the springiness of the air in the cavity. A linear set of three ordinary differential equations accounts for the dynamics of the air flow and pressure in this linear acoustic device3, resulting in the final output pressure Pout(t) (Fig. 1).
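A minimal sketch of this source-plus-trachea stage: the labial equations are integrated with a constant pav (in the full model pav depends on x and its velocity), and the source pressure α·x(t), here with constant α, is fed into the delayed-reflection recursion. All numerical values are hypothetical placeholders, not the fitted per-bird parameters:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Labial dynamics (see equations above); all constants are illustrative.
m, k, b, c = 1.0, 1.0e8, 1.0e2, 1.0e6   # mass, restitution, dissipation terms
a_lab, p_av = 1.0, 2.0e6                # pressure forcing; p_av held constant here

def labia(t, state):
    x, y = state
    return [y, (-k * x - (b + c * x**2) * y + a_lab * p_av) / m]

fs = 44100.0                            # output sampling rate (assumption)
t_eval = np.arange(0.0, 0.05, 1.0 / fs) # 50 ms of sound
sol = solve_ivp(labia, (0.0, 0.05), [0.0, 0.0], t_eval=t_eval, max_step=1e-5)
x = sol.y[0]

# Tracheal input pressure with delayed reflection:
# Pi(t) = alpha * x(t) - r * Pi(t - T), T = round-trip time along the tube.
alpha, r = 1.0, 0.65                    # drive and reflection coefficient (assumptions)
L_tube, v = 0.035, 350.0                # 3.5 cm trachea, sound speed in m/s
delay = int(round(2 * L_tube / v * fs)) # round-trip delay in samples
p_i = np.zeros_like(x)
for i in range(len(x)):
    back = p_i[i - delay] if i >= delay else 0.0
    p_i[i] = alpha * x[i] - r * back
print("Pi range:", p_i.min(), p_i.max())
```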

We reconstructed the parameters driving the equations of the normal form (α(t) and β(t)), as well as the parameters describing the tracheal length and the OEC, in such a way that the synthesized sounds presented the same fundamental frequencies and spectral content as natural song. Reconstructions over sequential sound segments gave estimates of the time dependence of the physiological parameters used during song production. A linear integrator (τ = 2.5 ms) was used to compute the envelope of the sound signal, and a threshold was used to identify phonating segments. For segments longer than 20 ms, we decomposed the recorded songs into successive 20 ms segments (time between consecutive segments ∆t = 1/20000 s). These were short enough to avoid large variation of the physiological gestures, and long enough to compute spectral content. For each segment, we computed the spectral content index (SCI)16 and the fundamental frequency. A search in the parameter space (α(t), β(t)) was performed over a grid so that the synthetic sounds produced would match the fundamental frequencies of the song segment being fitted. Over the set of (α(t), β(t)) values selected, a further search was performed so that the SCI of the synthetic sound matched the value of the song segment3. For sound segments shorter than 20 ms, the fundamental frequency was computed as follows. First, we selected the relative maxima of the sound signal that reached the sound envelope. Then, the fundamental frequency was computed as the inverse of the time difference between consecutive selected maxima. The SCI at that time was estimated as the average among all possible SCI values corresponding to that frequency in the framework of the model16. With those estimates of fundamental frequency and SCI, (α(t), β(t)) were computed. Brief segments were typically fast trills; we modeled those as rapid oscillations of pressure and tension, with the amplitude of the pressure oscillations such that the maxima fall in the phonating region, and the amplitude of the tension oscillations such that the frequency range of the vocalization was reproduced. We found that most of the parameters could be well approximated by fractions of sine functions, exponential decays, constants, or combinations of those.
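A sketch of the envelope and segmentation step: a leaky integrator approximates the linear integrator (τ = 2.5 ms), supra-threshold runs are taken as phonating segments, and long segments are cut into 20 ms fitting windows. The sampling rate matches ∆t = 1/20000 s; the threshold and test signal are assumptions:

```python
import numpy as np

def envelope(sound, fs, tau=2.5e-3):
    """One-pole low-pass ("leaky integrator") of the rectified waveform."""
    a = np.exp(-1.0 / (fs * tau))
    env = np.empty(len(sound))
    acc = 0.0
    for i, s in enumerate(np.abs(sound)):
        acc = a * acc + (1.0 - a) * s
        env[i] = acc
    return env

def phonating_windows(sound, fs, thresh, win_s=0.020):
    """Supra-threshold segments, split into 20 ms fitting windows when long enough."""
    env = envelope(sound, fs)
    on = np.r_[0, (env > thresh).astype(int), 0]
    starts = np.flatnonzero(np.diff(on) == 1)
    ends = np.flatnonzero(np.diff(on) == -1)
    win = int(win_s * fs)
    windows = []
    for s, e in zip(starts, ends):
        if e - s >= win:                          # long segments: successive 20 ms chunks
            windows += [(i, i + win) for i in range(s, e - win + 1, win)]
        else:                                     # short segments (< 20 ms) kept whole
            windows.append((s, e))
    return windows

# example on a synthetic 1 kHz tone burst
fs = 20000.0
t = np.arange(0, 0.2, 1 / fs)
sound = np.sin(2 * np.pi * 1000 * t) * (t > 0.05) * (t < 0.15)
print(phonating_windows(sound, fs, thresh=0.1)[:3])
```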

Using these analytic functions as parameters of the model to generate a synthetic copy of the recorded song resulted in a noiseless surrogate song (e.g., Supplementary Fig. 1, Noise=0). The addition of noise allowed the gradual recovery of realistic timbral features. In the text, the dimensionless variable Noise varied between 0 and 40, with Noise=5 corresponding to a fluctuation size equal to 2.5 percent of the maximum range of the β(t) parameter. Note that the timbral effect is more important for low frequency sounds, which explore a small range of β(t).
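Under the stated mapping (Noise = 5 ↔ 2.5% of the β(t) range), the fluctuation amplitude is 0.5% of the range per Noise unit. A one-function sketch; the uniform noise source is an assumption, as the distribution is not specified:

```python
import numpy as np

def add_timbral_noise(beta, noise, rng=None):
    """Noise=5 -> fluctuations of 2.5% of the beta(t) range, i.e. 0.5% per Noise unit."""
    rng = np.random.default_rng(0) if rng is None else rng
    amp = 0.005 * noise * (beta.max() - beta.min())
    return beta + rng.uniform(-amp, amp, size=beta.shape)

beta = np.linspace(0.0, 1.0, 1000)   # placeholder tension trace
print(add_timbral_noise(beta, noise=5).std())
```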

For each bird, the length of the trachea was chosen so that the frequencies close to 2.5 kHz and 7 kHz in the bird’s song were the first and second resonances of a tube closed at one end. This corresponds to a length of 3.5 cm45. Typically, zebra finch songs present a third important resonance around 4 kHz; the parameters of the Helmholtz resonator were adjusted so that its resonant frequency would account for this resonance3. The synthetic songs for sleeping birds were generated before the electrophysiological experiments were done. For singing birds, all song reconstructions were likewise performed blind to the spike data.
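As a consistency check on that length (assuming a sound speed of roughly 350 m/s in the trachea), the resonances of a tube closed at one end are odd multiples of the quarter-wavelength frequency:

$$f_n = \frac{(2n-1)\,v}{4L}, \qquad f_1 = \frac{350\ \mathrm{m\,s^{-1}}}{4 \times 0.035\ \mathrm{m}} = 2.5\ \mathrm{kHz}, \qquad f_2 = 3 f_1 = 7.5\ \mathrm{kHz} \approx 7\ \mathrm{kHz}.$$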


Acknowledgements

We are grateful to Richard H. R. Hahnloser for help with the microdrives and techniques used to record from singing birds. We thank Henry D. I. Abarbanel, Timothy Q. Gentner, Howard C. Nusbaum, and Stephanie E. Palmer for valuable comments on the manuscript. Supported by a Human Frontiers Science Program cross-disciplinary fellowship award to AA, NIDCD006876, CONICET and UBA awards to GBM and YSP, and NIDCD and NSF/CRCNS awards to DM.

Footnotes

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Author Contributions: AA, GBM and YSP developed the syringeal model, GBM and YSP modeled the songs, AA conducted surgeries, sound recordings and collected the electrophysiological data, AA, GBM, and DM conceived of and designed the experiments and prepared the manuscript, and all four authors participated in data analysis.

There are no competing financial interests.

References

1. Hatsopoulos NG, Xu Q, Amit Y. Encoding of movement fragments in the motor cortex. J Neurosci. 2007;27:5105–5114. doi: 10.1523/JNEUROSCI.3570-06.2007.
2. Nishikawa K, et al. Neuromechanics: an integrative approach for understanding motor control. Integr Comp Biol. 2007;47:16–54. doi: 10.1093/icb/icm024.
3. Perl YS, Arneodo EM, Amador A, Goller F, Mindlin GB. Reconstruction of physiological instructions from Zebra finch song. Phys Rev E. 2011;84:051909. doi: 10.1103/PhysRevE.84.051909.
4. Dave AS, Margoliash D. Song replay during sleep and computational rules for sensorimotor vocal learning. Science. 2000;290:812–816. doi: 10.1126/science.290.5492.812.
5. Hahnloser RHR, Kozhevnikov AA, Fee MS. An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature. 2002;419:65–70. doi: 10.1038/nature00974.
6. Prather JF, Peters S, Nowicki S, Mooney R. Precise auditory-vocal mirroring in neurons for learned vocal communication. Nature. 2008;451:305–310. doi: 10.1038/nature06492.
7. Yu AC, Margoliash D. Temporal hierarchical control of singing in birds. Science. 1996;273:1871–1875. doi: 10.1126/science.273.5283.1871.
8. Margoliash D. Acoustic parameters underlying the responses of song-specific neurons in the white-crowned sparrow. J Neurosci. 1983;3:1039–1057. doi: 10.1523/JNEUROSCI.03-05-01039.1983.
9. Margoliash D. Preference for autogenous song by auditory neurons in a song system nucleus of the white-crowned sparrow. J Neurosci. 1986;6:1643–1661. doi: 10.1523/JNEUROSCI.06-06-01643.1986.
10. Shank SS, Margoliash D. Sleep and sensorimotor integration during early vocal learning in a songbird. Nature. 2009;458:73–77. doi: 10.1038/nature07615.
11. Amador A, Goller F, Mindlin GB. Frequency modulation during song in a suboscine does not require vocal muscles. J Neurophysiol. 2008;99:2383–2389. doi: 10.1152/jn.01002.2007.
12. Elemans CPH, Laje R, Mindlin GB, Goller F. Smooth operator: avoidance of subharmonic bifurcations through mechanical mechanisms simplifies song motor control in adult zebra finches. J Neurosci. 2010;30:13246–13253. doi: 10.1523/JNEUROSCI.1130-10.2010.
13. Fee MS, Shraiman B, Pesaran B, Mitra PP. The role of nonlinear dynamics of the syrinx in the vocalizations of a songbird. Nature. 1998;395:67–71. doi: 10.1038/25725.
14. Mindlin GB, Laje R. The Physics of Birdsong. Berlin: Springer Verlag; 2005.
15. Laje R, Gardner TJ, Mindlin GB. Neuromuscular control of vocalizations in birdsong: A model. Phys Rev E. 2002;65:051921. doi: 10.1103/PhysRevE.65.051921.
16. Sitt JD, Amador A, Goller F, Mindlin GB. Dynamical origin of spectrally rich vocalizations in birdsong. Phys Rev E. 2008;78:011905. doi: 10.1103/PhysRevE.78.011905.
17. Amador A, Mindlin GB. Beyond harmonic sounds in a simple model for birdsong production. Chaos. 2008;18:041023. doi: 10.1063/1.3041023.
18. Riede T, Suthers RA, Fletcher NH, Blevins WE. Songbirds tune their vocal tract to the fundamental frequency of their song. Proc Natl Acad Sci U S A. 2006;103:5543–5548. doi: 10.1073/pnas.0601262103.
19. Hartley RS, Suthers RA. Air-flow and pressure during canary song - direct evidence for mini-breaths. J Comp Physiol A. 1989;165:15–26.
20. Suthers RA, Goller F, Wild JM. Somatosensory feedback modulates the respiratory motor program of crystallized birdsong. Proc Natl Acad Sci U S A. 2002;99:5680–5685. doi: 10.1073/pnas.042103199.
21. Wild JM. Functional neuroanatomy of the sensorimotor control of singing. Ann N Y Acad Sci. 2004;1016:438–462. doi: 10.1196/annals.1298.016.
22. Suthers RA, Goller F, Pytte C. The neuromuscular control of birdsong. Philos Trans R Soc Lond B. 1999;354:927–939. doi: 10.1098/rstb.1999.0444.
23. Fee MS, Kozhevnikov AA, Hahnloser RH. Neural mechanisms of vocal sequence generation in the songbird. Ann N Y Acad Sci. 2004;1016:153–170. doi: 10.1196/annals.1298.022.
24. Kozhevnikov AA, Fee MS. Singing-related activity of identified HVC neurons in the zebra finch. J Neurophysiol. 2007;97:4271–4283. doi: 10.1152/jn.00952.2006.
25. Fiete IR, Hahnloser RHR, Fee MS, Seung HS. Temporal sparseness of the premotor drive is important for rapid learning in a neural network model of birdsong. J Neurophysiol. 2004;92. doi: 10.1152/jn.01133.2003.
26. Vu ET, Mazurek ME, Kuo YC. Identification of a forebrain motor programming network for the learned song of zebra finches. J Neurosci. 1994;14:6924–6934. doi: 10.1523/JNEUROSCI.14-11-06924.1994.
27. Williams H, Vicario DS. Temporal patterning of song production: Participation of nucleus uvaeformis of the thalamus. J Neurobiol. 1993;24:903–912. doi: 10.1002/neu.480240704.
28. Margoliash D, Fortune ES. Temporal and harmonic combination-sensitive neurons in the zebra finch's HVc. J Neurosci. 1992;12:4309–4326. doi: 10.1523/JNEUROSCI.12-11-04309.1992.
29. Nick TA, Konishi M. Neural auditory selectivity develops in parallel with song. J Neurobiol. 2005;62:469–481. doi: 10.1002/neu.20115.
30. Prather JF, Nowicki S, Anderson RC, Peters S, Mooney R. Neural correlates of categorical perception in learned vocal communication. Nat Neurosci. 2009;12:221–228. doi: 10.1038/nn.2246.
31. Brainard MS, Doupe AJ. Interruption of a basal ganglia-forebrain circuit prevents plasticity of learned vocalizations. Nature. 2000;404:762–766. doi: 10.1038/35008083.
32. Olveczky BP, Andalman AS, Fee MS. Vocal experimentation in the juvenile songbird requires a basal ganglia circuit. PLoS Biol. 2005;3:e153. doi: 10.1371/journal.pbio.0030153.
33. Konishi M. The role of auditory feedback in the control of vocalization in the white-crowned sparrow. Z Tierpsychol. 1965;22:770–783.
34. Ashmore RC, Wild JM, Schmidt MF. Brainstem and forebrain contributions to the generation of learned motor behaviors for song. J Neurosci. 2005;25:8543–8554. doi: 10.1523/JNEUROSCI.1668-05.2005.
35. Wolpert DM, Ghahramani Z, Jordan MI. An internal model for sensorimotor integration. Science. 1995;269:1880–1882. doi: 10.1126/science.7569931.
36. Roberts TF, Klein ME, Kubke MF, Wild JM, Mooney R. Telencephalic neurons monosynaptically link brainstem and forebrain premotor networks necessary for song. J Neurosci. 2008;28:3479–3489. doi: 10.1523/JNEUROSCI.0177-08.2008.
37. Coleman MJ, Roy A, Wild JM, Mooney R. Thalamic gating of auditory responses in telencephalic song control nuclei. J Neurosci. 2007;27:10024–10036. doi: 10.1523/JNEUROSCI.2215-07.2007.
38. Bauer EE, et al. A synaptic basis for auditory-vocal integration in the songbird. J Neurosci. 2008;28:1509–1522. doi: 10.1523/JNEUROSCI.3838-07.2008.
39. Mulliken GH, Musallam S, Andersen RA. Forward estimation of movement state in posterior parietal cortex. Proc Natl Acad Sci U S A. 2008;105:8170–8177. doi: 10.1073/pnas.0802602105.
40. Leyton SS, Sherrington CS. Observations on the excitable cortex of the chimpanzee, orangutan and gorilla. Q J Exp Physiol. 1917;11:135–222.
41. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes: The Art of Scientific Computing. 3rd edn. Cambridge: Cambridge University Press; 2007.
42. Sutter ML, Margoliash D. Global synchronous response to autogenous song in zebra finch HVC. J Neurophysiol. 1994;72:2105–2123. doi: 10.1152/jn.1994.72.5.2105.
43. Guckenheimer J, Holmes P. Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields. Berlin: Springer Verlag; 1997.
44. Fletcher NH, Riede T, Suthers RA. Model for vocalization by a bird with distensible vocal cavity and open beak. J Acoust Soc Am. 2006;119:1005–1011. doi: 10.1121/1.2159434.
45. Daley M, Goller F. Tracheal length changes during zebra finch song and their possible role in upper vocal tract filtering. J Neurobiol. 2004;59:319–330. doi: 10.1002/neu.10332.
