Fig. 2 | How people alter their voices when vocalizing to infants.
Eleven acoustic features differed significantly between infant-directed and adult-directed vocalizations, within-voices, in speech, song, or both. Consistent with the LASSO results (Fig. 1b and Supplementary Table 2), the acoustic features operated differently across speech and song. For example, median pitch was far higher in infant-directed speech than in adult-directed speech, whereas median pitch was comparable across both forms of song. Some features were highly consistent across fieldsites (e.g., lower inharmonicity in infant-directed speech than in adult-directed speech), whereas others were more variable (e.g., lower roughness in infant-directed speech than in adult-directed speech). The boxplots, which are ordered approximately from largest to smallest differences between effects across speech and song, represent each acoustic feature’s median (vertical black lines) and interquartile range (boxes); the whiskers indicate 1.5 × IQR; the notches represent the 95% confidence intervals of the medians; and the doughnut plots represent the proportion of fieldsites where the main effect repeated, based on estimates of fieldsite-wise random effects. Only comparisons that survived an exploratory-confirmatory analysis procedure are plotted; faded comparisons did not reach significance in confirmatory analyses. Significance values are computed via linear combinations with two-sided tests, following multi-level mixed-effects models (n = 1,570 recordings); *p < 0.05, **p < 0.01, ***p < 0.001; no adjustments were made for multiple comparisons, owing to the exploratory-confirmatory approach taken. Regression results are in Supplementary Table 3 and full reporting of fieldsite-level estimates is in Supplementary Table 5.
Note: the model estimates are normalized jointly on speech and song data so as to enable comparisons across speech and song for each feature; as such, the absolute distance from 0 for a given feature is not directly interpretable, but estimates are directly comparable across speech and song.
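The joint normalization described in the note can be illustrated with a minimal sketch. It assumes that "normalized jointly" means z-scoring each feature against the pooled mean and standard deviation of the speech and song values; the caption does not specify the exact procedure, so this is an assumption. The function name `joint_zscore` and the pitch values are hypothetical and invented for illustration only.

```python
from statistics import mean, pstdev


def joint_zscore(speech, song):
    """Standardize two lists of values with their pooled mean and SD.

    Assumption: joint normalization is a pooled z-score. The caption
    does not state the exact method used in the analysis.
    """
    pooled = list(speech) + list(song)
    mu = mean(pooled)
    sd = pstdev(pooled)
    if sd == 0:
        raise ValueError("pooled values have zero variance")

    def scale(values):
        return [(v - mu) / sd for v in values]

    return scale(speech), scale(song)


# Hypothetical median-pitch values in Hz (not data from the study).
speech_hz = [180.0, 220.0, 260.0, 300.0]
song_hz = [240.0, 250.0, 260.0, 270.0]

speech_z, song_z = joint_zscore(speech_hz, song_hz)

# Both sets share one scale, so a difference in speech can be compared
# directly with a difference in song for the same feature.
speech_step = speech_z[1] - speech_z[0]
song_step = song_z[1] - song_z[0]
print(speech_step / song_step)
```

Because a single mean and standard deviation are applied to both vocalization types, ratios of differences are preserved across speech and song, while the zero point reflects the pooled data rather than either type alone. This is why the absolute distance from 0 is not directly interpretable.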