2019 Oct 30;12(3):925–941. doi: 10.1111/tops.12474

Figure 2.


Neural network projections of birdsong vocalizations into a 2D latent space. (A) A scatter plot in which each point represents a syllable sung by a Cassin's vireo (library acquired from Hedley, 2016). Colors denote hand-labeled syllable categories, which tend to cluster in the low‐dimensional space. The 5 × 5 grid in the lower right quadrant marks the locations of samples drawn from the 2D space. (B) Spectrograms of synthetically generated syllables corresponding to the points of the 5 × 5 grid in (A); each spectrogram is produced by passing a 2D point through the decoder network. (C) A similar uniform grid, sampled from a 2D plane of a different neural network trained on European starling song. Signals generated from each point in the grid are presented to a different starling trained on a same‐different operant conditioning task. Distances between points on the grid, and thus the overall warping, reflect an empirically measured similarity between neighboring syllables. (D) A plot of transitions between syllable clusters in a 2D space similar to (A), but from a single European starling; transitions between sequential elements are shown as lines. Line color shows the relative time of a syllable transition within a bout; later transitions are darker.
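The grid sampling described in panels (A) and (B) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the latent region is an axis-aligned rectangle (the bounds below are made up), and `toy_decoder` is a hypothetical stand-in for a trained decoder network that simply returns a small frequency × time array so the example runs without any model.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Spectrogram = List[List[float]]  # rows = frequency bins, columns = time frames


def linspace(lo: float, hi: float, k: int) -> List[float]:
    """Return k evenly spaced values from lo to hi inclusive."""
    if k == 1:
        return [lo]
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]


def latent_grid(x_range: Tuple[float, float],
                y_range: Tuple[float, float],
                n: int = 5) -> List[List[Point]]:
    """Build an n x n uniform grid of 2D latent coordinates.

    Rows are ordered by increasing y, columns by increasing x.
    """
    xs = linspace(x_range[0], x_range[1], n)
    ys = linspace(y_range[0], y_range[1], n)
    return [[(x, y) for x in xs] for y in ys]


def decode_grid(grid: List[List[Point]],
                decoder: Callable[[Point], Spectrogram]) -> List[List[Spectrogram]]:
    """Pass every latent point through the decoder, keeping the grid layout."""
    return [[decoder(z) for z in row] for row in grid]


def toy_decoder(z: Point, n_freq: int = 4, n_time: int = 3) -> Spectrogram:
    """HYPOTHETICAL stand-in for a trained decoder network.

    It maps a 2D point to a small deterministic array so the sketch is
    runnable; a real decoder would output a synthetic syllable spectrogram.
    """
    zx, zy = z
    return [[zx * f + zy * t for t in range(n_time)] for f in range(n_freq)]


# Usage: sample a 5 x 5 grid in an (assumed) lower-right region of the
# latent space and decode each point into a "spectrogram".
grid = latent_grid((0.5, 2.0), (-2.0, -0.5), n=5)
spectrograms = decode_grid(grid, toy_decoder)
```

Keeping the decoder as a plain callable separates the sampling step from the model, so the same grid could be passed to any trained network that maps a 2D point to a spectrogram.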