A. Spike raster plot of four antidromically identified NIfHVC cells (red and green rasters, top) and two NIf interneurons (orange and black rasters, bottom), aligned with song motifs (the black arrow indicates the alignment point). Sound amplitudes (binarized, shaded green area) reveal jitter in song tempo, which explains the increased variability of NIfHVC spiking away from the alignment point. Syllable C looks like a concatenation of two other syllables (D and B). The concatenation is so tight that the bird seems to sing B-D-B-D with almost no gap in the transition from D to B (yet C1 and D, as well as C2 and B, significantly differed from each other; Kolmogorov-Smirnov test, p = 0.25). Interestingly, NIfHVC spike patterns during this quasi-repeat (two transparent black boxes) also form a quasi-repeat, with clear differences visible in one of the spike bursts, which was composed of 3.3 spikes on average for the first rendition (left blue arrow) and 1.3 spikes on average for the second rendition (right blue arrow). B. We computed syllable onsets and offsets as the time points at which sound amplitudes exceeded a threshold of three standard deviations above baseline (silence). C. The average cross-covariance functions between NIfHVC spike trains and syllable onsets (black) and syllable offsets (red) reveal that NIfHVC cells tend to spike about 21 ms before syllable onsets (black arrow) and tend not to spike about 21 ms before syllable offsets (red arrow). D. Sound amplitudes peak after NIfHVC bursts. The cross-covariance function (black) between NIfHVC spike trains and sound amplitudes, averaged over n = 16 NIfHVC neurons, peaks 46 ms after NIfHVC spikes. The broad peak exceeds a significance threshold of two jackknife standard deviations (dashed lines). Blue and green curves show cross-covariance functions in juveniles and adults, respectively. All data can be downloaded as part of the supporting information files (S1 Data).
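
The thresholding in panel B and the cross-covariance in panels C and D can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names `syllable_edges` and `cross_covariance`, the use of a separate silent segment to estimate the baseline, the choice to mark an offset at the first sample back below threshold, and the normalization by the full signal length are all assumptions made for illustration.

```python
import numpy as np


def syllable_edges(amplitude, baseline, fs, n_sd=3.0):
    """Return (onsets, offsets) in seconds from threshold crossings.

    amplitude: sound amplitude trace; baseline: samples recorded during
    silence (assumed to be available separately); fs: sampling rate in Hz.
    """
    amplitude = np.asarray(amplitude, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Threshold of n_sd standard deviations above the baseline mean.
    threshold = baseline.mean() + n_sd * baseline.std()
    above = amplitude > threshold
    # Pad with False so syllables touching the array edges are still closed.
    padded = np.concatenate(([False], above, [False]))
    change = np.diff(padded.astype(int))
    onsets = np.flatnonzero(change == 1)    # first sample above threshold
    offsets = np.flatnonzero(change == -1)  # first sample back below it
    return onsets / fs, offsets / fs


def cross_covariance(x, y, max_lag):
    """Cross-covariance of two equally long, binned signals.

    Returns (lags, cov), where cov[i] is the sum over t of
    (x[t] - mean(x)) * (y[t + lag] - mean(y)), divided by len(x).
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    cov = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            cov[i] = np.dot(x[:n - k], y[k:]) / n
        else:
            cov[i] = np.dot(x[-k:], y[:n + k]) / n
    return lags, cov
```

With `x` a binned spike train and `y` a binned train of syllable onsets (or the sound amplitude), a peak at a positive lag means that the events in `y` tend to follow the spikes by that many bins.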