Explanatory illustration of our data augmentation interventions. (a) ‘Foreground’ recordings, which also contain some sound from the background habitat. The foreground and background may not vary independently, especially in the case of territorial animals. (b) ‘Background’ recordings, made when the focal animal is not vocalizing. (c) In adversarial data augmentation, we mix each foreground recording with a background recording from another individual and measure the extent to which this alters the classifier’s decision. (d) In stratified data augmentation, each foreground recording is mixed with a background recording from each of the other classes. This creates an enlarged data set with reduced confounding correlations.
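The two interventions in panels (c) and (d) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions. Recordings are 1-D NumPy arrays grouped in dicts keyed by class (individual). Mixing is plain addition at a fixed, hypothetical gain `bg_gain`, with the background tiled or cropped to the foreground length. The adversarial measure is the fraction of decisions that change. The function names `mix`, `stratified_augment` and `adversarial_flip_rate` are hypothetical.

```python
import numpy as np


def mix(fg, bg, bg_gain=1.0):
    """Additively mix a background clip into a foreground clip.

    The background is tiled (if shorter) or cropped (if longer) to the
    foreground length. Plain addition at a fixed gain is an assumption.
    """
    reps = int(np.ceil(len(fg) / len(bg)))
    bg_fit = np.tile(bg, reps)[: len(fg)]
    return fg + bg_gain * bg_fit


def stratified_augment(foreground, background, rng, bg_gain=1.0):
    """Panel (d): mix each foreground clip with one background clip drawn
    from every *other* class. Labels stay those of the foreground clip.

    foreground, background: dict mapping class label -> list of 1-D arrays.
    Returns a list of (audio, label) pairs, including the unmixed originals.
    """
    out = []
    for label, clips in foreground.items():
        for fg in clips:
            out.append((fg, label))  # keep the original recording
            for other, bg_clips in background.items():
                if other == label or not bg_clips:
                    continue  # only backgrounds from other classes
                bg = bg_clips[rng.integers(len(bg_clips))]
                out.append((mix(fg, bg, bg_gain), label))
    return out


def adversarial_flip_rate(classify, foreground, background, rng, bg_gain=1.0):
    """Panel (c): mix each foreground clip with a background from another
    individual and return the fraction of classifier decisions that change
    (a hypothetical choice of summary measure).
    """
    changed = total = 0
    for label, clips in foreground.items():
        others = [o for o, b in background.items() if o != label and b]
        if not others:
            continue
        for fg in clips:
            other = others[rng.integers(len(others))]
            bg_clips = background[other]
            bg = bg_clips[rng.integers(len(bg_clips))]
            changed += int(classify(fg) != classify(mix(fg, bg, bg_gain)))
            total += 1
    return changed / total if total else 0.0
```

With three classes, each foreground clip yields one original plus two mixed copies. The stratified set is therefore three times the size of the foreground set, and each background "habitat" now co-occurs with every label.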