Author manuscript; available in PMC: 2023 Apr 13.
Published in final edited form as: Nat Hum Behav. 2022 Jul 18;6(11):1545–1556. doi: 10.1038/s41562-022-01410-x

Fig. 4 |. Human inferences about infant-directedness are predictable from acoustic features of vocalizations.


To examine the degree to which human inferences were linked to the acoustic forms of the vocalizations, we trained two LASSO models to predict, from acoustic features, the proportion of “baby” responses that human listeners gave to each non-confounded recording. While both models explained substantial variability in human responses, the model for speech was more accurate than the model for song, in part because human listeners relied on acoustic features that less reliably characterize infant-directed song across cultures (see Figs. 1b and 2). Each point represents a recorded vocalization (after exclusions, n = 528 speech recordings; n = 587 song recordings), plotted in terms of the model’s estimated infant-directedness and the average “infant-directed” rating from the naïve listeners. The bar plots depict the relative explanatory power of the top 8 acoustic features in each LASSO model, showing which features were most strongly associated with human inferences (the green and red triangles indicate the direction of each effect: green, higher in infant-directed vocalizations; red, lower). The dotted diagonal lines represent a hypothetical perfect match between model predictions and human guesses; the solid black lines depict linear regressions (speech: F(1,526) = 773, R2 = 0.59; song: F(1,585) = 126, R2 = 0.18; ps < .0001; p-values computed using robust standard errors); and the grey ribbons represent the standard errors of the mean from the regressions.
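The analysis described above, regressing listeners' ratings on acoustic features with a LASSO and ranking features by explanatory power, can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the data, feature count, and noise level here are synthetic placeholders, and the original study used cross-cultural acoustic measurements rather than random draws.

```python
# Hedged sketch of the Fig. 4 analysis: fit a cross-validated LASSO to
# predict a per-recording "infant-directed" rating from acoustic features,
# then rank features by absolute coefficient (as in the bar plots).
# All data below are synthetic placeholders, not the study's measurements.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_recordings, n_features = 528, 20  # e.g., speech recordings after exclusions

# Placeholder acoustic feature matrix and simulated ratings in which only
# a few features carry signal, mimicking the sparsity LASSO exploits.
X = rng.normal(size=(n_recordings, n_features))
true_coefs = np.zeros(n_features)
true_coefs[:5] = [0.8, -0.6, 0.5, 0.4, -0.3]
y = X @ true_coefs + rng.normal(scale=0.5, size=n_recordings)

# Standardize features so LASSO coefficients are comparable across features.
X_std = StandardScaler().fit_transform(X)
model = LassoCV(cv=5, random_state=0).fit(X_std, y)
r2 = model.score(X_std, y)  # analogous to the caption's R2 values

# Top 8 features by absolute coefficient; the coefficient's sign plays the
# role of the green/red triangles (higher vs lower in infant-directed audio).
top8 = np.argsort(np.abs(model.coef_))[::-1][:8]
```

The cross-validated penalty in `LassoCV` shrinks uninformative coefficients toward zero, which is what makes a "top 8 features" ranking meaningful.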