[Preprint]. 2024 Mar 3:2024.02.28.582631. [Version 1] doi: 10.1101/2024.02.28.582631

Fig 2. Three-step training process for the APD metric.

A) Representation. Schematic of the conversion to STAR-scale spectrograms: linear spectrograms are passed through a set of STAR-scaled filter banks (see Materials and Methods for the actual parameters). B) Pre-training. The CNN model is pre-trained on STAR-scale spectrograms of starling vocalizations to learn statistical features of song. Each input spectrogram is divided into four equally sized spectral slices, which are then shuffled (not shown here). The model outputs a ranking vector that indexes the original position of each shuffled slice. During training, all four slices are fed into the same CNN, yielding four feature vectors, which are concatenated and passed to the fully connected (FC) ranking layers for classification. C) Fine-tuning. Once pre-trained, the CNN is fine-tuned on behavioral data. Pre-trained weights are transferred directly to the same CNN, which is now connected to task-specific networks. Inputs to the fine-tuned model are unsegmented STAR-scale spectrograms. Outputs are animal judgments collected during the behavioral experiment.
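
The slice-and-shuffle step of panel B can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the spectrogram is stored as a list of frequency-bin rows, that "spectral slices" means equal bands along the frequency axis, and that the ranking vector lists, for each shuffled position, the index of the slice's original position. The names `make_shuffled_slices` and `restore` are hypothetical.

```python
import random


def make_shuffled_slices(spectrogram, n_slices=4, rng=None):
    """Cut a spectrogram into equal frequency bands and shuffle them.

    spectrogram: list of rows, one row per frequency bin (assumed layout).
    Returns (shuffled, ranking), where ranking[i] is the original
    position of the slice that now sits at position i.
    """
    if rng is None:
        rng = random.Random()
    n_bins = len(spectrogram)
    if n_bins % n_slices != 0:
        raise ValueError("number of frequency bins must divide evenly")
    size = n_bins // n_slices
    # Equal-sized bands along the frequency axis.
    slices = [spectrogram[i * size:(i + 1) * size] for i in range(n_slices)]
    # A random permutation of slice indices serves as the training target.
    ranking = list(range(n_slices))
    rng.shuffle(ranking)
    shuffled = [slices[k] for k in ranking]
    return shuffled, ranking


def restore(shuffled, ranking):
    """Undo the shuffle using the ranking vector."""
    ordered = [None] * len(ranking)
    for position, original_index in enumerate(ranking):
        ordered[original_index] = shuffled[position]
    # Stack the bands back into one list of frequency-bin rows.
    return [row for band in ordered for row in band]


if __name__ == "__main__":
    # Toy 8-bin x 3-frame "spectrogram" with distinct values per cell.
    spec = [[10 * f + t for t in range(3)] for f in range(8)]
    shuffled, ranking = make_shuffled_slices(spec, rng=random.Random(0))
    print("ranking vector:", ranking)
    print("restored correctly:", restore(shuffled, ranking) == spec)
```

In a setup like the one described in the caption, each shuffled slice would pass through the shared CNN and the ranking vector would act as the classification target. The sketch covers only the data preparation, not the network.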