Behavioral generalization across renditions in all-call-type tests. Effect of accumulated experience with the sounds over the course of a daily test on behavioral performance when vocalizer discrimination is assessed for all call types simultaneously. a, b show the results when subjects (n = 13) are familiar with the vocalizers (same vocalizers as in single-call-type tests). c, d show the results when subjects (n = 7) are naive to the vocalizers. a, c Behavioral tests were divided into three consecutive 30-min sessions labeled 1–3 (see Supplementary Methods). Call types are ordered by decreasing effect of session from left to right. For both Familiar and Naive subjects, the OR increases with session rank (effect of Session:VocType: LRT on GLME for Familiar, χ²₂ = 59.5, p = 1.2035×10⁻¹³; for Naive, χ²₂ = 17.035, p = 1.9996×10⁻⁴). a Despite the behavioral improvement across sessions, the discrimination of vocalizers is above chance level even during the first 30-min session for Familiar subjects (effect of VocType on first-session data: LRT on GLME, χ²₁ = 60.1, p = 8.9928×10⁻¹⁵). The improvement in the birds' performance depends on the call type (effect of Session:VocType:CallType: LRT on GLME, χ²₁₂ = 28.7, p = 0.0044348). b, d The scatter plots show behavioral performance when the dataset is restricted to stimuli constructed with renditions heard once or twice. As in Fig. 4b, colored circles show the performance of individual subjects, and the black diamonds and error bars show the average performance and 95% confidence intervals obtained from the coefficients of the GLME. Filled (vs. empty) circles indicate that the behavioral performance is significantly above chance. Vocalizations are labeled and color coded as in Fig. 4a. b Familiar birds discriminate the vocalizers above chance level irrespective of the call type, except for Songs (effect of VocType: LRT on GLME, χ²₁ = 66.9, p = 3.3307×10⁻¹⁶).
Te and Th are the call types for which vocalizer identity is best discriminated, whereas discrimination for Songs is not significant (effect of VocType:CallType: LRT on GLME, χ²₆ = 52.3, p = 1.6413×10⁻⁹; effect of CallType: LRT on GLME, χ²₆ = 17.2, p = 0.0084594). d Pairs of vocalizers were also discriminated on average above chance by subjects naive to the vocalizers (effect of VocType: LRT on GLME, χ²₁ = 8.9, p = 0.002914).
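
The p-values reported above follow from the likelihood ratio test statistics and their degrees of freedom through the upper tail of the chi-square distribution. The following is a minimal sketch of that relationship, not the analysis code used for the figure: the helper name `chi2_sf` is hypothetical, it handles integer degrees of freedom only, and it uses only the Python standard library. Because the statistics in the caption are rounded, recomputed p-values are expected to agree with the reported ones only approximately.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Hypothetical helper for illustration; df must be a positive integer.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1)),
    # where chi = sqrt(x) and phi is the standard normal density.
    chi = math.sqrt(x)
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + 2.0 * phi * total


# Statistics taken from the caption: (label, chi-square value, degrees of freedom)
reported = [
    ("Session:VocType, Familiar", 59.5, 2),
    ("Session:VocType, Naive", 17.035, 2),
    ("VocType, first session", 60.1, 1),
    ("Session:VocType:CallType", 28.7, 12),
]
for label, stat, df in reported:
    print(f"{label}: chi2({df}) = {stat}, p = {chi2_sf(stat, df):.4e}")
```

For 2 degrees of freedom the tail probability reduces to exp(−χ²/2), so the Session:VocType statistics can be checked by hand: exp(−59.5/2) is about 1.2×10⁻¹³ and exp(−17.035/2) is about 2.0×10⁻⁴.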