Author manuscript; available in PMC: 2022 Aug 10.
Published in final edited form as: Curr Biol. 2021 Jun 16;31(15):3419–3425.e5. doi: 10.1016/j.cub.2021.05.035

Figure 1. A neural-network-based decoder to synthesize birdsong from premotor neural activity.

(A) Neural activity is collected from awake-singing animals. Sorted, extracellularly recorded single- and multi-units show different degrees of singing-related sparseness, robustness, and spiking precision (4 example clusters; top traces: normalized mean firing rate over 70 repetitions of the bird’s motif; below: spectrogram of the motif; see also Figure S1).

(B) Downstream of HVC, the posterior motor pathway leads into nuclei that control the muscles driving sound production (nXII and RAm/PAm).34 Syringeal and respiratory muscles act in coordination to modulate the flow of air through sets of labia and produce sound.35 The complex labial motion is captured by the equations of a nonlinear oscillator;23 the parameters that define the acoustic properties of the sounds are surrogates for the activities of syringeal and respiratory muscles.36

(C) To reproduce a particular vocalization (top) from the biomechanical model, we fit the parameters (middle; {α(t), β(t), e(t)}) such that, upon integration, the synthetic song (bottom) matches the pitch and spectral richness of the original.

(D) The input of the neural network is an array with the values of a set of neural features (spike counts of sorted units/multi-units) over a window of M previous time steps.
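The windowing described in (D) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `windowed_inputs`, the toy spike counts, and the choice to exclude the current time bin from each window are all assumptions made for the example.

```python
from typing import List, Sequence


def windowed_inputs(spike_counts: Sequence[Sequence[int]], m: int) -> List[List[List[int]]]:
    """Build one (m x n_units) input window per target time step.

    spike_counts[t][u] is the spike count of unit u in time bin t.
    windows[i] holds bins (i .. i + m - 1) and is paired with target step i + m,
    i.e. the m bins immediately preceding the target (assumption: the current
    bin is not included).
    """
    if m <= 0:
        raise ValueError("m must be a positive number of time steps")
    return [
        [list(row) for row in spike_counts[t - m:t]]
        for t in range(m, len(spike_counts))
    ]


# Toy data (hypothetical): 5 time bins, 2 units.
counts = [[0, 1], [2, 0], [1, 1], [0, 3], [4, 0]]
windows = windowed_inputs(counts, m=2)
```

With five time bins and M = 2, three windows are produced, one for each of the target steps 2, 3, and 4.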

(E) The hidden layer(s) of the network consist of either a densely connected layer (FFNN) or two layers of LSTM cells.
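As a rough illustration of the FFNN variant in (E), the sketch below runs one forward pass through a single densely connected hidden layer. Everything specific here is an assumption for the example: the layer sizes, the tanh activation, the random untrained weights, and the helper names `dense` and `init_layer`. The output is sized to three values only to mirror the 3-dimensional parameter vector used with the biomechanical model; the LSTM variant is not shown.

```python
import math
import random


def dense(x, weights, biases, activation=None):
    """Fully connected layer: one weighted sum plus bias per output node."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def init_layer(n_in, n_out, rng):
    """Random weights (hypothetical initialization) and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
M, N_UNITS, N_HIDDEN, N_OUT = 4, 5, 8, 3  # hypothetical sizes

w1, b1 = init_layer(M * N_UNITS, N_HIDDEN, rng)
w2, b2 = init_layer(N_HIDDEN, N_OUT, rng)

# One input window of spike counts (toy values), flattened into a vector.
window = [[rng.randint(0, 3) for _ in range(N_UNITS)] for _ in range(M)]
x = [float(c) for row in window for c in row]

hidden = dense(x, w1, b1, activation=math.tanh)  # densely connected hidden layer
output = dense(hidden, w2, b2)                   # linear readout, 3 values
```

Because the weights are untrained, the output values carry no meaning; the sketch only shows how a flattened window of M time steps maps through a dense layer to a fixed-size output vector.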

(F) When the network is trained on, or used to reconstruct, the spectral features of the song directly, its output is a vector of powers across a range of frequency bands at a given time; the generated spectral slices are then inverted to produce synthetic song (top). When training or reconstructing via the biomechanical model, the output of the network at a given time is a 3-dimensional vector of parameters (as illustrated in C); the equations of the model are then integrated with these values to produce synthetic song (bottom). Illustrations were taken from Arneodo.37