PLOS One. 2025 Jun 25;20(6):e0326569. doi: 10.1371/journal.pone.0326569

Pitch characteristics of real-world infant-directed speech vary with pragmatic context, perceived adult gender, and infant gender

Emily M Neer 1,*, Anvi Brahmbhatt 2, Catherine R Walsh 1, Anne S Warlaumont 2
Editor: Marcela de Lourdes Peña Garay
PMCID: PMC12192138  PMID: 40560879

Abstract

Children’s everyday language environments can be full of rich and diverse input, especially adult speech. Prosodic modifications when adults speak to infants are observed cross-culturally and are believed to enhance infant learning and emotion. However, the prosody of adult speech can also depend on what is being said and why, as well as on speaker gender. This study asks whether prosodic modifications to infant-directed speech depend on perceived adult speaker gender, assigned infant gender, and the perceived pragmatic function of an utterance. We examined 3,607 adult speech clips from daylong home audio recordings of 60 North American, English-speaking, 3- to 20-month-old infants (28 female). Adult speakers used significantly more imperatives and questions and sang more frequently to infants than to other adults. While infant-directed speech tended to have greater mean pitch and pitch modulation than adult-directed speech overall, these patterns were modulated, sometimes in complex ways, by pragmatic function, perceived adult gender, and infant gender. For example, female-sounding adult speakers exhibited greater IDS-ADS mean pitch differences than male-sounding adult speakers when providing information or engaging in conversational niceties. As another example, male-sounding adults used higher pitch when singing to male infants than to female infants. These findings invite further research on how individual, demographic, and situational factors affect speech to infants and possibly infant learning. The study’s pragmatic context tags are added to an existing open dataset of infant- and adult-directed speech.

Introduction

Adults speak differently to infants than to other adults. This phenomenon is established across cultures [1–3]. Infant-directed speech (IDS) often differs from adult-directed speech (ADS) in having higher pitch, greater variation in intonation and other prosodic features [3–5], elongated vowels [6], shorter utterances [7], and simpler lexicons [8]. These differences are believed to benefit children’s attention to and acquisition of language [9–11].

The focus of the current study is on voice pitch across IDS and ADS registers. Although some speakers and some cultures manipulate IDS acoustics more than others, generally speaking, adults tend to use overall higher pitch [2] and more pitch modulations when speaking to infants compared to adults [3]. It has been theorized that higher pitch may help infants to match adult speech acoustics to their own, facilitating comparison between self-produced and caregiver-produced utterances [12]. Others have emphasized that greater pitch variability within an utterance may make it more attention-grabbing or help to highlight key words to facilitate infant language learning [13].

It is important to note that much of the research on acoustic differences between IDS and ADS is limited to speech samples produced in specific contexts, most often speech intended to engage an infant’s interest and facilitate language learning. Moreover, speech samples are often elicited by a researcher. In contrast, the infant-directed speech infants actually experience serves a range of goals, for example soothing rather than exciting, or facilitating sleep or the cessation of an activity rather than engagement in conversation or with objects. In addition, the ways adults speak in real-world settings, such as at home while interacting with multiple family members, carrying out day-to-day tasks, and facing various stressors, have the potential to greatly alter the acoustic features of both ADS and IDS.

Pragmatic contexts and acoustic characteristics

Despite general trends differentiating IDS from ADS registers, there is also research indicating that pragmatics, or the functional use of speech, affects speech acoustics in ways that may modulate the IDS-ADS differences. For infant-directed speech, pragmatic contexts can include attention-bidding, approval, prohibition, soothing, and singing [14], among others. Prior work has demonstrated that pitch can vary across pragmatic contexts. For example, mothers are more likely to use lower pitch when soothing an infant [14] and higher pitch when trying to get an infant’s attention [14,15]. Research has also found that adults use greater pitch variability when arousing or entertaining infants, but less variable pitch when soothing infants [16,17]. When singing, parents tend to use a higher pitch [3,18,19]. Moreover, different pragmatic types of singing have different acoustic features: Trainor et al. found that infant play songs were rated to have greater pitch variability than infant lullabies [18]. Pitch differences are also associated with pragmatics (e.g., agreement, criticism, suggestion) in adult speech to other adults [20].

Effects of adult and infant gender on communication

Another factor that can strongly influence adult voice acoustics is gender. Adult females generally have higher voice pitch than adult males [21,22], along with other acoustic differences. These speaker gender differences in voice acoustics are in part attributable to anatomical differences between males and females in the vocal folds and vocal tracts involved in speech production [21,23]. Beyond anatomy, sociocultural factors also contribute to differences in voice acoustics between males and females [21]. For example, a speaker’s emotional state, race, sex, and age can affect acoustic variation [24], as can social and cultural influences [25]. Variation in speech thus carries cues to attributes and states of the speaker, such as gender, making it socially meaningful [26].

Adult gender is also associated at the group level with differences in the interactions infants have with their caregivers (see Ferjan Ramírez [27] for a review). Fathers have been found in some studies to talk to their children for shorter durations, and less frequently, than mothers on average [28,29]. One study found that infants between 3 and 20 months of age heard two to three times as much speech from female adults as from male adults [30]. Fathers have also been found, at a group level, to vary their pitch less than mothers when speaking to infants [14,29,31], while still varying their pitch more in IDS than in ADS [32].

Moreover, infant gender has also been found in at least one study to affect the IDS infants experience. Johnson et al. found that mothers preferentially responded to infant females whereas fathers preferentially responded to infant males [33].

Although this body of work provides evidence that gender plays some role in shaping IDS and ADS, it is important to note that the research focuses on a narrow range of family structure types, underrepresents gender nonconforming individuals, and often overlooks individual differences as well as interactions between contextual factors and the demographics of both speaker and listener. Vocal communication could potentially be subject to multiple interactions between these and other factors.

Language input in children’s everyday environments

Research on early childhood development is increasingly using instruments and methodologies that capture real-world inputs, including infants’ language environments, from the child’s perspective. For instance, LENA’s infant-worn audio recorders can capture daylong recordings of the infant’s audio environment [34]. At this point, multiple repositories make widely available audio and even video recordings that capture infants’ language exposure, vocal productions, and interactions with objects and others in their home environment [35,36]. These data can provide rich insights into infants’ real-world experiences that cannot be captured in a lab setting.

Some research based on such recordings has highlighted the diversity in infants’ everyday experiences [37]. For example, the amount of speech infants hear varies by the hour and by the speaker [38]. Variability is also found in the faces [39] and even music [40] that infants experience across a given day. Other research has highlighted consistencies such as that infants experience specific words and objects within consistent locations and events (e.g., toothbrush in the bathroom [41,42]).

Particularly relevant for the current study, researchers have used day-long infant-centered audio recordings to analyze pitch contours of real-world adult speech heard by infants over the course of a day. Both similarities and differences have been found in comparison to studies of adult speech samples elicited in a lab setting. Pitch has been found to be reliably higher for IDS versus ADS in these real-world data, similar to elicited samples, but pitch variability and predictability effects have been less robustly replicated [43,44].

Current study

The current study extends the study of IDS and ADS acoustics within daylong audio recordings, exploring the extent to which findings from prior studies using more restricted sampling contexts translate to real-world settings and the ways in which pragmatic context, adult gender, and infant gender may modulate the features of the speech infants experience in their home environments. We leverage an existing cross-sectional, multi-site North American dataset of real-world IDS and ADS samples [30]. Specifically, we aim to (1) estimate the frequencies with which the infants experienced different pragmatic contexts of adult speech, separating those experiences out by whether the speech they experienced was in the IDS or ADS register, (2) assess how pragmatic contexts relate to mean pitch and pitch variability, and to IDS-ADS differences in these measures, and (3) explore how adult speaker gender and infant gender are associated with the patterns observed through aims 1 and 2.

Note that we use an automatic estimate of fundamental frequency on a log scale—log(f0)—as a proxy for pitch (see Methods for details about the algorithm used for f0 estimation). Pitch refers to how f0 is perceived by a listener. Log-scaling f0 in Hertz (Hz) accounts for the tendency for a multiplication of f0 by a fixed factor to be perceived as a fixed amount of change in pitch. For example, doubling f0 is perceived by listeners as an octave change in pitch, regardless of whether the f0 being doubled is low in Hz or high in Hz. In contrast, an equal difference in raw Hz would be heard as relatively large or small depending on whether f0 is low or high, respectively. Log-scaling f0 in Hz accomplishes similar approximations to semitone, Mel, Bark and equivalent rectangular bandwidth (ERB) scales, although there are minor differences among the latter three and a log scale is the simplest and has been argued to be the optimal scale on which to measure approximate pitch differences [45].
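The octave relation described above can be made concrete with a small sketch (the helper name `octaves_between` is ours for illustration, not part of the study’s pipeline):

```python
import math

def octaves_between(f0_a, f0_b):
    """Perceived pitch difference, in octaves, between two f0 values (Hz)."""
    return math.log2(f0_b / f0_a)

# Doubling f0 is one octave whether the starting f0 is low or high in Hz:
low_double = octaves_between(100, 200)    # 1.0 octave
high_double = octaves_between(300, 600)   # also 1.0 octave

# In contrast, the same 100 Hz raw difference is a large pitch change at
# low f0 but a small one at high f0 (12 semitones per octave):
at_low = octaves_between(100, 200) * 12   # 12 semitones
at_high = octaves_between(500, 600) * 12  # roughly 3.2 semitones
```

This is why mean and SD of log(f0), rather than raw Hz, better approximate perceived pitch level and variability.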

Additionally, we use the term “gender” in this paper because pitch differences in vocal properties and the use of IDS and ADS are likely influenced by social constructs and cultural norms as well as by physiological factors. We recognize that infants in this study have not yet developed a gender identity and that their reported gender in this study reflects how they are being socialized. In addition, it should be noted that adult speaker gender in this study is based on listeners’ perceptions of whether a voice sounded male or female, and that this may not correspond to the actual gender identity of the adult speaker. Thus, any differences we observe related to perceived adult gender can be expected to relate primarily to trends for gender-typical voice differences. In other words, any findings with respect to gender categories in this dataset should be interpreted as reflecting gender norms more than actual gender differences.

Method

Dataset

This study used the IDSLabel dataset [30], from the HomeBank repository [36]. It consisted of sound clips from four corpora of daylong LENA recordings, each collected and contributed by a separate research group between 2012 and 2016 [46–49]. Participants provided written consent to participate in the study and to have their recordings shared; this consent was obtained by the contributors of each HomeBank corpus, and data collection and sharing protocols were approved by those contributors’ IRBs. Participants were 60 typically developing infants (28 female) from primarily English-speaking North American families, with cross-sectional recordings collected when infants were between 3 and 20 months old. Forty-three infants had parents with university education. The metadata from these corpora did not consistently include race/ethnicity demographics; in some cases, recording dates are available within the original HomeBank corpora metadata. The current study’s secondary analyses were approved by the University of California, Los Angeles Institutional Review Board (#21-000478), and the need for consent was waived. Data were accessed on April 13th, 2021. Via HomeBank, the authors had access to information that could potentially identify individual participants, including the audio recordings themselves and (in some cases) birthdates; for the Warlaumont corpus, A.S.W. had access to additional identifying details as that corpus’s original contributor, but none of those details were used or referenced as part of the current study.

To create the IDSLabel dataset, Bergelson et al. (2019) used the daylong audio (WAV) files together with the labels (provided by the LENA Pro software and available within each HomeBank corpus) to identify and extract probable adult speech clips. Adult speech clips varied in duration, but were always at least 1 second and could in principle include more than one adult speaker utterance if they were not separated by a substantial duration of silence or other sound types [50]. For each automatically identified adult speech clip, trained human coders verified that the clip was indeed adult speech and then classified the clip as adult-directed speech (ADS) or infant-directed speech (IDS), as well as categorizing the clips as male or female based on perceived speaker gender [30]. The IDS vs. ADS classification was based on the perceived speech register of the clip; listeners were asked to make their decision based on whether the clip sounded like the speech and voice were in a style typical for addressing an infant or young child as opposed to another adult. In the current project, these individual adult speech clips were further tagged using a pragmatic context coding scheme developed specifically for this project, described in the next section.

Our study also uses pitch estimates from a prior study [43], which obtained estimated pitch contours for the adult vocalization clips in the IDSLabel dataset and calculated the mean and standard deviation (as a measure of within-utterance variability) of each pitch contour (i.e., of log(f0)). That study used the soundgen R package’s “analyze” function with the following parameter settings: pitchFloor = 75 Hz; pitchCeiling = 650 Hz; silence = 0.001; autocorThres = 0.7; pathfinding = “fast”; pitchMethods = “autocor”, “spec”, and “dom”; entropyThres = 0.6; step = 10 ms; wn = “hanning”; and windowLength = 50 [51].
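The contour estimation itself was done in R with soundgen’s analyze function as described above; the subsequent summary step (mean and SD of each clip’s log(f0) contour) can be sketched in Python as follows. The helper name and the frame values are invented for illustration:

```python
import math
import statistics

def contour_summary(f0_contour_hz):
    """Summarize one clip's pitch contour: mean and SD of log(f0).

    f0_contour_hz: f0 estimates (Hz) at successive analysis frames
    (e.g., every 10 ms); unvoiced frames (None) are dropped first.
    """
    voiced = [math.log(f0) for f0 in f0_contour_hz if f0 is not None]
    return statistics.mean(voiced), statistics.stdev(voiced)

# A hypothetical 5-frame contour with one unvoiced frame:
mean_logf0, sd_logf0 = contour_summary([220, 240, None, 260, 250])
```

The per-clip mean serves as the “mean pitch” measure and the per-clip SD as the “pitch variability” measure used in the analyses below.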

Our dataset started with 3727 adult speech audio clips that had been tagged according to pragmatic context(s). We then excluded clips (n = 107; 81 IDS; 26 ADS) that were exclusively tagged as “noisy”, or indecipherable. We dropped a further 13 speech clips from the dataset because they did not include a tag for adult speaker gender. Our final dataset thus includes a total of 3607 adult speech clips (2210 IDS; 1397 ADS) after exclusions. Of these 3607 clips, 1125 clips were perceived as adult male speakers and 2482 were perceived as adult female speakers. Clips were in English except for one group of clips (n = 9), all from the same LENA “conversational block”, in which the adults spoke Spanish. A research assistant fluent in Spanish tagged pragmatic contexts for those clips.
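As a bookkeeping check, the clip counts reported above are internally consistent (all numbers are taken directly from the text):

```python
# Exclusion arithmetic for the final analysis dataset:
total_tagged = 3727
noisy_excluded = 81 + 26      # IDS + ADS clips tagged only as "noisy"
no_gender_tag = 13
final = total_tagged - noisy_excluded - no_gender_tag

assert final == 3607
assert final == 2210 + 1397   # IDS + ADS clips
assert final == 1125 + 2482   # male- + female-perceived speakers
```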

Annotation

Coding scheme.

Table 1 shows the eight pragmatic context codes for tagging each adult speech clip. The coding scheme was developed after a review of research on parent-child communicative intent [14,52–57]. Codes were not mutually exclusive; clips could be annotated for more than one pragmatic context (or for none). Context codes were applied to both ADS and IDS clips. An additional “noisy” category was used when the pragmatic context was not decipherable.

Table 1. Abridged definitions of pragmatic contexts.
Pragmatic Context Definition (abridged)
Conversational Basics Standard social niceties found in speech
Comfort Soothing another adult or child
Singing Singing or humming a song or tune
Inform Explicit, detailed information conveyed
Reading Reading written material
Imperative Requests and commands for another person to do something
Question Question(s) posed
Vocal Play Pre-linguistic sounds (i.e., babbling or cooing)

Note. See the coding protocol on the project’s OSF page for full definitions. Contexts were non-mutually exclusive, meaning one clip could be annotated for more than one context.

Annotation procedure.

Trained annotators were assigned a randomized 5% of the IDSLabel dataset to annotate in each pass of coding. Each pass consisted of one LENA conversational block’s worth of adult clips from each participant; a conversational block is a sequence of clips separated by silences of less than 5 seconds [30]. Annotators were blind to child gender: they were not provided with the child’s gender and did not listen to infant speech clips.

To establish annotator reliability, four annotators (including the first and second authors) completed two training rounds in which reliability was established across coders for each context category. An additional two reliability rounds were completed between the first and second authors before annotating pilot data, for a total of four reliability rounds reported below. Inter-coder reliability was assessed using Krippendorff’s alpha, where 0 indicates no agreement and 1 indicates complete agreement [58]. The mean alpha values reported below for each context category were calculated by first averaging alpha values across pairs of coders within each round, then averaging those values across the four rounds (in parentheses are each category’s range of scores across all pairs of coders and all four rounds): conversational basics (Mα = 0.55; range = 0.30–0.92), comfort (Mα = 0.92; range = 0.66–1.00), singing (Mα = 0.99; range = 0.94–1.00), inform (Mα = 0.70; range = 0.38–0.94), reading (Mα = 1.00), imperative (Mα = 0.72; range = 0.21–0.97), question (Mα = 0.77; range = 0.65–0.97), vocal play (Mα = 1.00). Note that reading and vocal play were relatively rare categories with perfect intercoder agreement because none of the coders identified any instances of those categories within the subset of clips used for reliability assessment. Conversational basics had the lowest average reliability, which may reflect the diversity of functions it covers: greetings, backchanneling, polite phrases, exclamations, etc.
Note also that some of the reliability coding was performed before revisions to the coding scheme; however, all codes used for the analyses reported below were made using the final version of the coding scheme as recoding was done for files coded under earlier versions. In cases where multiple coders coded a block for reliability purposes, discrepancies were resolved through discussion.
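The two-step averaging of reliability scores can be illustrated with a small sketch; the coder labels and alpha values here are invented, not the study’s data:

```python
from statistics import mean

# Hypothetical Krippendorff's alpha values for one context category:
# one dict per reliability round, mapping coder pair -> alpha.
rounds = [
    {("A", "B"): 0.70, ("A", "C"): 0.65, ("B", "C"): 0.72},  # 4-coder round
    {("A", "B"): 0.80, ("A", "B"): 0.80},                     # 2-coder round
]

# Step 1: average across coder pairs within each round.
round_means = [mean(r.values()) for r in rounds]

# Step 2: average the round means to get the category's reported M_alpha.
m_alpha = mean(round_means)
```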

Preregistration and pilot study

A pilot study was conducted with 505 audio clips (314 IDS and 191 ADS clips) annotated by the first and the second author. The pilot study and data aided in determining the final study design and in preparing a study preregistration. The final dataset included four coding passes plus the pilot data and training round annotations. The preregistration, coding scheme, and pilot results can be found on OSF (https://osf.io/va7c8/?view_only=63b948f200654b9ab767a4cf3e351c59).

Preregistered planned analyses and pilot analyses.

This paper includes two sets of planned analyses: (1) prevalence of pragmatic contexts as a function of register and (2) pitch characteristics in specific pragmatic contexts as a function of register and speaker gender. In the preregistration, we planned to analyze the prevalence of pragmatic contexts as a function of register and speaker gender using logistic mixed effects models. We decided to deviate from that prior design, instead adopting a Bayesian approach to model the prevalence of pragmatic contexts as a function of register, due to sparseness of the data in some contexts. We also excluded speaker gender in order to obtain model convergence. For the preregistration, pilot analysis of adult speaker gender and register on pitch characteristics in specific pragmatic contexts utilized a separate linear mixed effects model for each pragmatic context that met our a priori analysis criterion of at least 20 clips in both IDS and ADS registers. This criterion was somewhat arbitrary and was based on our intuitions about what would be a reasonable minimum sample size for clip-level acoustic analyses; our main aim in specifying this criterion ahead of time was to avoid data dredging. We did not examine interaction terms in our models as we do in the current study due to power concerns with the pilot dataset. Another difference to note is that the variables in the pilot analysis were not centered and scaled as they are in the current study.

Exploratory analyses.

Two of the analyses reported in the current study were not included in our preregistration. First, the analysis of pitch characteristics predicted by adult speaker gender and pragmatic context within IDS clips only was an approach we developed to better isolate the roles of speaker gender and pragmatic context, and their potential interaction, while keeping register constant, focusing on IDS as the primary register of this study. Second, the joint analysis of both adult speaker gender and infant gender on pitch characteristics within specific pragmatic contexts was not something we originally envisioned at the time of preregistration; our interest in exploring the potential influence of infant gender emerged later in the project.

Statistical analyses

Analyses comparing how prevalent each context was in IDS versus ADS were run in R version 4.4.1 [59] using the brms package (2.22.0); a Bayesian approach was used due to the small sample sizes for some context and register combinations [60]. Analyses comparing pitch measures across contexts, registers, and gender groups were run in R version 4.3.1 using the lme4 (1.1-34; [61]) and lmerTest (3.1-3; [62]) packages with restricted maximum likelihood (REML) estimation. Predictors were dummy coded: register (ADS = 0; IDS = 1), adult speaker gender (female = 0; male = 1), infant gender (female = 0; male = 1), and pragmatic context (absent = 0; present = 1). Pitch measurements were log transformed, which controls for the nonlinear nature of pitch perception (and production), and then centered and scaled by one standard deviation using the scale function in R. Register was not scaled and centered when it served as the outcome variable in the Bayesian mixed effects model; it was scaled and centered when entered as a predictor in the acoustic analyses, as were adult gender, infant gender, and pragmatic context. Subject identity and coder were entered as random intercepts in each model, based on likelihood ratio tests using varCompTest from the varTestnlme R package (1.3.5; [63]). Post-hoc pairwise comparisons were run using the emmeans R package (1.8.7; [64]), applying Holm-Bonferroni corrections to account for multiple comparisons. Confidence intervals reported below in brackets next to the coefficients are 95% confidence intervals. Post-hoc follow-ups to significant interactions were interpreted by comparing the confidence intervals of the regression slopes and examining which post-hoc regressions were statistically significant. All visualizations were created using the R package ggplot2 (3.4.2; [65]).
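A rough Python analogue of this preprocessing step (the study used R’s scale function, which centers and divides by one sample SD; the values here are invented):

```python
import math
from statistics import mean, stdev

def zscore(xs):
    """Center and scale by one sample SD, like R's scale() with defaults."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

# Log-transform per-clip mean f0 (Hz), then standardize:
mean_f0_hz = [210.0, 180.0, 250.0, 120.0]
log_f0_z = zscore([math.log(f) for f in mean_f0_hz])

# Dummy-coded predictors (e.g., register: ADS = 0, IDS = 1) were likewise
# scaled and centered before entering the acoustic models:
register = [1.0, 1.0, 0.0, 0.0]
register_z = zscore(register)
```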

Results

Prevalence of pragmatic contexts

Table 2 shows how many clips in the dataset were tagged as belonging to each combination of register, adult speaker gender, and pragmatic context, as well as the percentage of clips with each context, separately for each register and adult speaker gender combination. Ninety clips did not receive any pragmatic context code, and 838 clips received more than one code. We first examined the relative frequencies of each pragmatic context in IDS versus ADS. We ran a Bayesian mixed effects model with register as the outcome variable (assuming a Bernoulli distribution), each pragmatic context as a fixed effect, and participant ID and coder ID as random effects. We initially attempted to include perceived adult speaker gender and its interactions with context as an additional fixed effect, but we removed adult speaker gender in order to achieve model convergence. Our model was as follows:

Table 2. Prevalence regression results and raw frequencies of clips categorized by context, register, and adult speaker gender.

Pragmatic context       IDS coefficient (95% CI)   Male ADS      Male IDS      Female ADS    Female IDS
Inform                  −0.55 (−0.68, −0.43)       252 (64.1%)   248 (33.9%)   888 (66.7%)   652 (33.6%)
Conversational Basics   −0.13 (−0.22, −0.03)       74 (18.8%)    111 (15.2%)   252 (18.9%)   248 (12.8%)
Question                0.21 (0.10, 0.32)          58 (14.8%)    132 (18.0%)   154 (11.6%)   401 (20.6%)
Imperative              0.55 (0.43, 0.68)          8 (2.0%)      80 (10.9%)    35 (2.6%)     269 (13.9%)
Reading                 15.93 (1.68, 74.01)        0 (0%)        63 (8.6%)     0 (0%)        169 (8.7%)
Singing                 1.08 (0.75, 1.53)          1 (0.3%)      65 (8.9%)     1 (0.08%)     159 (8.2%)
Comfort                 0.17 (0.00, 0.38)          0 (0%)        21 (2.9%)     2 (0.15%)     19 (1.0%)
Vocal Play              37.41 (1.02, 183.97)       0 (0%)        12 (1.6%)     0 (0%)        25 (1.3%)
Total                                              393           732           1332          1942

Note. Frequencies for the context categories do not add up to total number of clips (3607) because a non-mutually exclusive coding scheme was used. In parentheses are the regression 95% confidence intervals and the column-wise percentages.

register ~ conversational basics + comfort + singing + inform + imperative + question + reading + vocal play + (1 | coder) + (1 | participant).

We ran the model with 10 chains of 10,000 iterations, adapt_delta = 0.9 and max_treedepth = 12, resulting in all rhats = 1.0, bulk ESSs ranging from 6,844–52,284 and tail ESSs ranging from 5,250–37,140.

The results are shown in Table 2. The conversational basics and inform contexts were more prevalent in the ADS register than in the IDS register. Questions, imperatives, reading, singing, comfort, and vocal play were more frequent in the IDS register than in the ADS register.

Acoustic analyses

We next examined two key utterance-level acoustic characteristics, mean pitch and pitch variability (i.e., standard deviation of the pitch contour), in three different analyses.

Perceived adult speaker gender and register analysis within specific pragmatic contexts.

Our first acoustic analysis examined pitch and pitch variability of clips as a function of register and perceived adult speaker gender, separately for each of the four contexts that met our a priori analysis criterion of at least 20 instances in both registers: conversational basics, inform, questions, and imperative (Fig 1). For each analysis, the data were first subsetted to only include clips in which the context in question was present. This set of analyses addressed the question of whether prior findings of higher overall pitch and greater pitch variability in IDS versus ADS apply in naturalistic utterances belonging to each of those contexts, controlling for speaker gender (which is strongly associated with pitch differences) and allowing detection of possible interactions between speaker gender and register. The basic model structure run for each model is as below:

Fig 1. Pitch and pitch variability of clips as a function of register and adult speaker gender. Note. Each point within the violin plots shows the mean pitch (top row) or pitch variability (bottom row) of a clip. The four contexts for which acoustics were analyzed as a function of register and adult gender are each depicted in a separate pair of plots, one plot for each acoustic characteristic. Adult speaker gender main effects are denoted in text at the top right of each plot. Speech register main effects are denoted in text at the bottom right of each plot. Post-hoc pairwise comparisons following up on significant interactions are denoted by brackets with asterisks indicating significance level of the comparison and the bracket and asterisks bolded when a post-hoc comparison is statistically stronger than its counterpart; gender-comparison post-hocs are above the violins and register-comparison post-hocs are below the violins.


Log pitch characteristic ~ adult speaker gender*register + (1 | coder) + (1 | participant)

We chose to run separate models for each context because we were interested in examining the effects of adult speaker gender and register on pitch characteristic differences within specific pragmatic contexts. See S1 File for results using alternative analysis approaches that prioritize different research questions.

Inform context. 2040 clips from 60 participants were tagged as inform. For mean pitch within these clips, we found significant main effects of adult speaker gender and register (Table 3); we also found an interaction between register and adult speaker gender such that the IDS-ADS difference was greater for female speakers than for male speakers and the male-female difference was greater for IDS than for ADS. For pitch variability within inform clips, a significant main effect of register was found (β = 0.14 [0.09, 0.19], SE = 0.02, p < .001), meaning that inform clips were significantly more variable in IDS than in ADS. All other effects were non-significant.

Table 3. Effects of adult gender and register on mean pitch in the inform context.
Mean Pitch
Predictor Estimate SE CI p
Intercept 0.01 0.05 [−0.10, 0.11] .905
Adult Gender −0.43 0.02 [−0.47, −0.38] <.001
Register 0.19 0.02 [0.15, 0.23] <.001
Adult Gender*Register −0.09 0.02 [−0.13, −0.05] <.001
Female-Male ADS 0.72 0.07 [0.57, 0.86] <.001
Female-Male IDS 1.13 0.08 [0.96, 1.30] <.001
IDS-ADS Female 0.49 0.05 [0.38, 0.61] <.001
IDS-ADS Male 0.08 0.08 [−0.11, 0.26] .36

Note. Linear mixed effects model results. The four rows below the interaction term row give the post-hoc analysis results. The direction of effect for post-hoc findings is such that the category before the minus sign is greater than the category following the minus sign if the effect is positive.

Conversational basics context. 685 clips from 57 participants were tagged as conversational basics. For mean pitch of these clips, we found significant main effects of adult speaker gender and register (Table 4). We also found a significant interaction between register and adult speaker gender such that the female-male difference was greater for IDS than ADS and the IDS-ADS difference was greater for females than for males. For pitch variability within conversational basics clips, a significant main effect of register was found (β = 0.08 [0.01, 0.16], SE = 0.04, p = .02). No other effects were significant.

Table 4. Effects of adult gender and register on mean pitch in the conversational basics context.

Mean Pitch
  Predictor                  Estimate   SE     95% CI           p
  Intercept                   0.10      0.08   [−0.06, 0.27]    .206
  Adult Gender               −0.33      0.04   [−0.40, −0.25]   <.001
  Register                    0.16      0.04   [0.09, 0.23]     <.001
  Adult Gender × Register    −0.11      0.03   [−0.18, −0.04]   .001
  Female − Male, ADS          0.43      0.12   [0.16, 0.71]     <.001
  Female − Male, IDS          0.95      0.12   [0.68, 1.23]     <.001
  IDS − ADS, Female           0.45      0.09   [0.26, 0.65]     <.001
  IDS − ADS, Male            −0.06      0.14   [−0.37, 0.25]    .68

Note. Linear mixed effects model results. The four rows below the interaction term row give the post-hoc analysis results. If the effect is positive, then the category before the minus sign tended to have greater values than the category after the minus sign.

Question context. 745 clips from 60 participants were tagged as question. For mean pitch of these clips, we found significant main effects of both adult speaker gender (β = −0.48 [−0.55, −0.41], SE = 0.04, p < .001; female-sounding speakers used higher pitch) and register (β = 0.19 [0.12, 0.26], SE = 0.04, p < .001; IDS was higher than ADS). For pitch variability within question clips, we found a significant main effect of adult speaker gender (β = −0.11 [−0.18, −0.03], SE = 0.04, p = .006; female-sounding speakers were more variable). No other effects were statistically significant.

Imperative context. 392 clips from 51 participants were tagged as imperative. For mean pitch of these clips, a significant main effect of adult speaker gender was found (β = −0.43 [−0.57, −0.30], SE = 0.07, p < .001), again with female-sounding speakers using higher pitch. No other significant effects were identified.
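The dependent measures throughout these models are clip-level log mean pitch and log pitch variability, drawn from MacDonald et al.'s automated pitch estimates. As a rough sketch of how such clip-level measures can be derived from frame-wise F0 values — the function name and the use of the standard deviation of voiced-frame F0 as the variability measure are illustrative assumptions, not the source pipeline:

```python
import math
from statistics import mean, stdev

def clip_pitch_features(f0_hz):
    """Log mean pitch and log pitch variability for one clip.

    f0_hz: per-frame fundamental-frequency estimates in Hz, with 0.0
    marking unvoiced frames. Pitch variability is operationalized here
    as the standard deviation of voiced-frame F0 (an assumption for
    illustration; the source dataset's measure may differ).
    """
    voiced = [f for f in f0_hz if f > 0]  # discard unvoiced frames
    return math.log(mean(voiced)), math.log(stdev(voiced))

# Example: a clip whose voiced frames hover around 220 Hz
log_mean_pitch, log_pitch_var = clip_pitch_features([0.0, 200.0, 220.0, 240.0, 0.0])
```

Working on the log scale makes pitch differences proportional rather than absolute, which helps when comparing speakers with very different baseline pitch (e.g., male- versus female-sounding adults).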

Comparison of adult speaker gender and pragmatic context within IDS.

Our second set of acoustic analyses addressed whether there were differences in mean pitch and pitch variability of IDS clips across contexts, considering perceived speaker gender (n = 2210 IDS clips, 60 participants; Fig 2). Vocal play was not included because there were fewer than 20 IDS vocal play clips. Complete regression tables and interaction plots are in S1 File.

Fig 2. Mean Pitch and Pitch Variability of IDS Clips as a Function of Adult Speaker Gender and Pragmatic Contexts. Note. Each point within the violin plots shows the mean pitch (top row) or pitch variability (bottom row) of a clip. Only IDS clips are represented in this figure and in the corresponding analyses (see text). Asterisks next to pragmatic context categories denote the significance of that context's main effect (the singing main effect was negative, and the conversational basics and comfort main effects were positive). Asterisks above the violins denote the significance of the post hoc tests following up on the speaker gender × context interactions.


Mean pitch within IDS. The linear mixed effect model to test for effects of adult gender and pragmatic context on mean pitch of IDS clips was formulated as:

Log mean pitch ~ adult speaker gender * conversational basics + adult speaker gender * comfort + adult speaker gender * singing + adult speaker gender * inform + adult speaker gender * imperative + adult speaker gender * question + adult speaker gender * reading + (1 | coder) + (1 | participant)

We found a significant main effect of adult speaker gender (β = −0.47 [−0.52, −0.42], SE = 0.03, p < .001) and a significant interaction between adult speaker gender and singing (β = −0.05 [−0.08, −0.01], SE = 0.02, p = .007). Post hoc pairwise comparisons revealed that adult male speakers used significantly lower pitch in singing versus non-singing IDS contexts (β = −0.39 [−0.71, −0.07], SE = 0.14, p = .01), and that this difference was significantly greater for adult males than for adult females, who did not exhibit a significant difference in mean pitch between singing and non-singing IDS.

Pitch variability within IDS. The linear mixed effect model for pitch variability within IDS was:

Log pitch variability ~ adult speaker gender * conversational basics + adult speaker gender * comfort + adult speaker gender * singing + adult speaker gender * inform + adult speaker gender * imperative + adult speaker gender * question + adult speaker gender * reading + (1 | coder) + (1 | participant)

We found a significant main effect of adult speaker gender (β = −0.09 [−0.14, −0.04], SE = 0.03, p = .001) such that females varied their pitch more than males. We also found main effects of singing (β = −0.19 [−0.23, −0.15], SE = 0.02, p < .001), conversational basics (β = 0.05 [0.004, 0.09], SE = 0.02, p = .03), and comfort (β = 0.06 [0.02, 0.09], SE = 0.02, p = .004): singing clips were less variable, and conversational basics and comfort clips were more variable, than other contexts. We also found a significant interaction between adult speaker gender and singing (β = −0.06 [−0.09, −0.02], SE = 0.02, p = .005). Post hoc pairwise comparisons revealed that the difference in pitch variability between singing and non-singing contexts was significantly greater for adult males than for adult females and that the female-male difference in pitch variability was greater for singing than for non-singing contexts.

Adult speaker gender and infant gender within specific IDS contexts.

Our third set of acoustic analyses focused on mean pitch and pitch variability of IDS clips as a function of infant gender and perceived adult speaker gender, using separate models for each pragmatic context. These analyses tested whether the gender of the infant likely being addressed has any effect on speaker pitch, controlling for perceived adult speaker gender and enabling detection of possible interactions between infant gender and adult gender. We chose to analyze contexts separately, rather than including context as a fixed effect, to increase the interpretability of the results. See S1 File for results of a single model with pragmatic context included as predictor variables. Vocal play was again excluded due to having fewer than 20 instances. The basic model structure run for each context was:

Log pitch characteristic ~ infant gender * adult speaker gender + (1 | coder) + (1 | participant)

In the conversational basics, reading, question, imperative, inform, and singing models, we found a significant main effect of perceived adult speaker gender, such that females used higher mean pitch when speaking to infants than males did. Singing was the only context with a significant effect for pitch variability. No significant effects were found in the comfort context for either mean pitch or pitch variability. None of the contexts had significant main effects of infant gender. However, we found two cases of significant interactions between infant gender and adult gender, reported below and visualized in Fig 3.

Fig 3. Estimated Marginal Means of Adult Speaker Gender and Infant Gender Interactions in the Inform and Singing Contexts.


Note. Estimated marginal means from the post hoc follow-up tests for the significant interactions between adult speaker gender and infant gender effects on pitch variability in the inform context and on mean pitch in the singing context. Error bars denote 95% confidence intervals, and asterisks denote which post hoc follow-up tests were statistically significant.

For pitch variability of inform context clips (900 IDS clips, 59 participants), there was a significant interaction between infant gender and perceived adult gender (β = −0.08 [−0.16, −0.01], SE = 0.04, p = .03). Post hoc pairwise comparisons revealed that male infants experienced a greater difference between adult females' and adult males' pitch variability than female infants did, with adult females' pitch variability being significantly higher than adult males' for male infants (β = 0.41 [0.13, 0.68], SE = 0.12, p = .002).

For mean pitch of singing clips (224 IDS clips, 24 participants), there was a significant interaction between perceived adult speaker gender and infant gender (β = 0.23 [0.08, 0.37], SE = 0.07, p = .002). Post hoc pairwise comparisons revealed that the difference between adult male and adult female mean pitch was greater for female infants. Moreover, adult males used significantly higher mean pitch with male infants than with female infants (β = −0.99 [−1.72, −0.27], SE = 0.31, p = .006), a pattern that differed significantly from that of adult females, for whom there was no significant effect of infant gender on mean infant-directed singing pitch.

Discussion

Prevalence of pragmatic contexts

By analyzing randomly sampled real-world day-long recording data, we were able to characterize how frequently this group of young North American children experienced different pragmatic types of adult speech, and whether those experiences occurred in the IDS or ADS register. Question, imperative, reading, singing, comfort, and vocal play contexts were more frequent in IDS than in ADS, in line with prior research finding imperatives, reading, and questions to be frequent in children's everyday input [38], along with music exposure consisting of live voices, including caregivers' singing [40].

In contrast, inform and conversational basics contexts were more frequent in ADS than in IDS. However, these two contexts were the most prevalent categories regardless of adult speaker gender and register, so infants received a fair amount of IDS input in these contexts as well. The high prevalence of inform clips supports the everyday relevance of the large body of work demonstrating that, in information-providing contexts, IDS is more salient and attractive to infants and has features that tend to make learning language and concepts easier for infant learners [5,10,66].

As for conversational basics, relatively less literature has focused on IDS-ADS contrasts in this domain. However, there is a literature documenting that conversational basics utterances, such as salutations (e.g., "hi" and "bye"), are common early words in infants' vocabulary [67,68]. The fact that conversational basics clips are even more prevalent in adult-directed speech in these daylong recordings indicates that these utterance types are not transient elements of early childhood language experience but rather types of speech that infants are frequently exposed to, can learn early, and will continue to use throughout their lifespans.

Vocal play and comfort were the least frequent clip types overall. The low prevalence of comfort surprised us, given that, anecdotally, comforting or soothing constitutes much of what caregivers seem to do over the course of the day. The finding may relate to the way in which the category was defined; it is possible that much comforting takes place without explicit language that would have cued our listeners to make a confident judgment that a clip was a comfort clip [14,69]. Another possibility is that comforting in everyday contexts often takes place without accompanying adult speech. Also, anecdotal impressions of the frequency of a vocal pragmatic type do not necessarily correspond reliably to its actual frequency but may instead relate more closely to the experiential salience of the behavior; this mismatch has been proposed to explain the discrepancy between perceived prevalence of infant cries and laughs and their actual low frequencies of occurrence in day-long recordings compared to pre-speech sounds [70]. We also expected that vocal play might be more frequent, given the literature on adult imitation of infant babble [71]. As with comfort clips, it is possible that the operationalization of vocal play in the current study was not as inclusive as in prior studies, or this behavior may be rarer than prior literature and anecdotal experience would suggest. Furthermore, both categories could have been interpreted as noise and coded exclusively as "noisy" clips, and consequently excluded from our final dataset, and comfort could have overlapped with child crying and been tagged as such by LENA, in which case it would not have been included among the clips in IDSLabel.

Finally, it is possible that the prevalence of the various register-context combinations could differ as a function of perceived adult gender and/or assigned child gender. The current study did not have sufficient sample size to analyze the prevalence results at that level of granularity.

Acoustic analyses

Perceived adult speaker gender and register within specific pragmatic contexts.

In the inform, conversational basics, and question categories, we found IDS clips to have significantly higher mean pitch than ADS clips. This finding provides real-world validation of higher pitch in IDS compared to ADS for all three of these contexts. However, we also found that for the inform and conversational basics contexts there was a significant interaction between speaker gender and register, such that the IDS-ADS pitch difference was more pronounced for speakers perceived as female than for speakers perceived as male. We also found that for the inform and conversational basics contexts, IDS clips were more variable in pitch than ADS clips, which is consistent with prior literature [3,43].

Perceived adult speaker gender and comparisons between pragmatic contexts within IDS.

When looking across IDS clips from different pragmatic contexts, we found that comfort IDS involved greater pitch variability than non-comfort IDS and that infant-directed singing involved less pitch modulation than non-singing IDS. However, for both mean pitch and pitch variability, we also found significant interactions between perceived adult speaker gender and whether a clip involved singing, with adult speakers perceived as male exhibiting a greater reduction in mean pitch and pitch variability during singing than adult speakers perceived as female.

There are a variety of possible explanations for these findings. It is possible that parents in our sample sang mostly lullaby-type songs, which have been found to be lower in pitch and less variable than play songs [18]. The perceived adult gender effects could be related to differences in the songs that are sung, and song selection could vary by gender due to factors such as cultural, social, and individual preferences. It is also possible that perceived adult gender effects could be related to the manner in which a song is sung to infants by male versus female caregivers. In other words, these results could be influenced by biological differences in pitch range, differences in song choice, and/or differences in singing style. Alternatively, the perceived adult gender effects could reflect tendencies of the listeners who coded adult gender to be biased toward perceiving voices as male when pitch and pitch variability were lower.

The finding that comfort clips in IDS had greater pitch variability than IDS non-comfort clips was initially surprising to us because we had originally expected that comfort clips would have pitch properties with mellowing acoustic effects [72]. The finding of greater pitch variability for comfort clips might reflect the way we operationalized comfort in our listener coding protocol and/or the representation of different subtypes of comfort utterances within the dataset.

Perceived adult speaker gender and infant gender in IDS.

Our final set of acoustic analyses explored the effects of assigned infant gender and perceived adult speaker gender on pitch and pitch variability of IDS clips within specific pragmatic contexts. In the inform and singing contexts, we found some intriguing interactions. In the inform context, female-sounding adult speakers varied their pitch more with male infants compared to female infants. In the singing context, male-sounding adult speakers used lower mean pitch with female infants compared to male infants. Apparently, adult speakers sometimes modified their pitch more when addressing male infants compared to female infants, with these infant addressee gender effects depending on perceived adult speaker gender.

These patterns might be related to differences in the types of sounds made by male versus female infants. A prior study found that female adults preferentially responded to male infants, with this preferential response appearing to be related to male infants having less of a nasal quality to their vocalizations, a trait that is rated as more socially favorable [73]. Furthermore, male infants have been found to be on average more active [74] and to vocalize more [75] than female infants, which could contribute to differing vocal responses from adults. It is important to emphasize that, while the present results support the idea that infant gender may influence adult speech acoustics, the results are exploratory. Future research using other datasets and/or experimental designs should test for associations between infant gender and parents' acoustic characteristics to determine the reliability of these effects as well as to gain insight into their possible explanations.

Overall implications

Taken together, our results corroborate prior research on overall pitch and pitch variability differences between ADS and IDS, but they also highlight the relevance of various pragmatic and demographic factors and the rich interplay between those factors when it comes to acoustic characteristics of the adult speech young children are exposed to in their everyday language environments. This is significant because it is well established in the IDS literature that prosodic features of adult speech play a role in language learning. For example, infants seem to prefer to listen to IDS over ADS [76–78]. Infants' preference for IDS, possibly due to its saliency and frequency in infants' daily language input, may be related to its positive association with children's language development [5,14]. The current study brings to light some of the various factors (i.e., adult speaker gender presentation, assigned infant gender, and pragmatic context) that can affect the prosody of IDS and ADS in infants' everyday environments. This provides more fine-grained information that can be used to inform future experimental, computational modeling, and descriptive research.

Additionally, researchers could use our pragmatic context labels, combined with the original IDSLabel dataset, to ask additional questions regarding the effects of pragmatic contexts, perceived adult speaker gender, assigned infant gender, and/or register on adults’ speech inputs to children. For example, the inform context includes diverse clips like labeling statements and explanations. These clips could serve different functions of speech and could be further coded and analyzed to associate those different functions with differences in prosody, word choice, or other features of the clips.

The current study supports the value of research on various factors (beyond the pragmatic and gender-related features studied here) that may complexly affect acoustic properties of adult speech in young children's everyday environments. Future research taking a more fine-grained approach to considering individual differences and demographic and situational variables (e.g., time of day or month, activity context, SES, and child age) may reveal yet more variation and context-sensitivity in prosodic differences in IDS and ADS in ecologically valid settings. For example, we did not include infant age as a factor in the current study, but prior research has shown that differences in adult speakers' mean fundamental frequency between IDS and ADS weaken as children age [79], and it is conceivable that this age trend in mean f0 might differ across contexts, genders, etc. Just as Fernald's (1989) cross-linguistic study of IDS laid the foundation for, and prompted, more fine-grained follow-up studies that extended and challenged the original work (e.g., Mazuka et al., 2015), we hope the current study will inspire fine-grained follow-up work [4,14].

Limitations

The dataset for the current study relies on infant-worn recorders [34] and on pre-processing to identify likely adult clips. The recorder and tagging software do not always capture clear input for all adult clips, and many adult clips may have been missed entirely due to being tagged as overlapping with another sound source or being mistagged as a different sound source type. These factors should also be considered when interpreting results, as should the noisiness of the audio in some cases due to using infant-worn recorders in real-world environments. However, the natural recordings used in the current study are important to establish high ecological validity, and some of that noisiness reflects noise that infants also experience (and must grapple with).

Furthermore, the current study relied on two existing datasets: the IDSLabel dataset [30] and pitch estimations from MacDonald et al. [43]. In the IDSLabel dataset, the IDS versus ADS distinction was based on the way clips sounded, rather than on whom the listener thought the speaker intended to address. This subjectivity in identifying register should be considered when interpreting the results of the current study. Additionally, speaker gender was based on the listener's best guess with binary gender options rather than on speakers' actual gender identities. Race, age, gender typicality, and other factors can influence voice pitch and may further modulate how IDS and ADS vary. Future research is needed, ideally with datasets for which the speakers' multiple social identity dimensions are self-reported [80].

The current study also uses pitch estimations calculated through an automated process. While we do not have any reason to think that this process systematically biased the results, it is possible that errors in that estimation process could have affected some of our results. In the future, it would be helpful to validate the automatically obtained pitch estimates against pitch measurements obtained by expert humans. Furthermore, the current study focused on just two acoustic features. While these are the most studied features when it comes to studies of IDS, there are many other acoustic features (e.g., pitch predictability, pitch intensity, voice quality, vowel formants) which could in principle be measured and assessed in relation to the pragmatic and gender variables analyzed here [3,10,43,81].

There were also limitations introduced by the choices we made in designing the pragmatic contexts coding protocol, as noted earlier in the Methods and Discussion. In many cases, it would be helpful for interpretation if the context categories were further broken down into subcategories. This is a concern that also applies to much prior research that describes communication behavior as a function of pragmatic context or function (e.g., [3,14]). The extent to which subtypes of pragmatic contexts are associated with acoustic differences in naturalistic input, and how this might interact with demographic and other factors, could be addressed in future research.

Additionally, this dataset is limited to a North American sample from just four data collection sites, limiting the generalizability of our findings. Prior work has found that IDS, specifically, is a robust phenomenon in many cultures (e.g., [13,82]). However, studies have also found that communicative styles in IDS differ by culture and that prosodic characteristics of IDS may not be completely universal [3,4,82]. The current study differs from the approach of the aforementioned studies in that we examined the effect of communicative intent (i.e., pragmatic contexts), gender, and register on acoustic properties in real-world daylong audio recordings. Future research could expand on the current approach with speech samples from various languages and cultures.

There is also within-culture individual variation in infants' everyday experiences [37] not explored in the current study. For example, the infants in our study range from 3 to 20 months of age. Infants' experiences in their day-to-day environments change significantly with age, due in part to developing motor abilities [83]. Additionally, infants have different routines (e.g., [38,84]), interactions with different adults [38], and overall sensory input (e.g., faces, music, etc. [39,40]) that can affect the speech input they hear in a given day. Taken together, these factors contribute to the amount and type of input an individual infant receives in their everyday environment. Future research could examine prosodic features of speech in infants' day-to-day environments with a focus on individual differences.

Finally, examining child engagement and interactive effects related to child speech and conversation was beyond the scope of the current study and we did not analyze child vocalizations. However, prior work has found that mother-infant dyads had similar pitch characteristics in conversation with one another [85]; research on how the patterns of IDS pitch modifications found here relate to infant behavior, including infant vocalization acoustics, could shed light on both the mechanisms underlying the current study’s findings and its implications for infant behavior and development.

Conclusion

We utilized existing clips from real-world audio recordings, along with perceived speaker gender labels, perceived IDS register labels, and acoustic measurements for those clips, then deployed human listeners to code perceived pragmatic context. This enabled us to test whether findings from studies on IDS and ADS in more restricted contexts generalize to real-world data and to explore whether there are variations in these patterns as a function of perceived adult gender, assigned infant gender, and pragmatic context. Indeed, we found several cases of interactions between these contextual factors, highlighting the rich interplay of factors that relate to infants' real-world linguistic experiences.

Supporting information

S1 File. Supplementary Material.
(DOCX: pone.0326569.s001.docx, 20.5 MB)

Acknowledgments

The authors would like to thank the families who participated in the studies that are included in the IDSLabel Dataset; the other researchers who contributed the HomeBank and IDSLabel data; Kyle MacDonald, whose pitch measurements were used for this study and for discussions that inspired the current study; and Giselle Littleton and Amy Wong for their help with annotation. Special thanks to Catherine Sandhofer, the UCLA Emergence of Communication Lab members, and the UCLA Language and Cognitive Development Lab members for helpful comments and suggestions on the project and manuscript.

Data Availability

The data and code to reproduce the analyses presented here are publicly accessible on GitHub: https://github.com/emucla/context-study. Analyses were also pre-registered. Materials and the preregistration for this research are available at the following OSF page: https://osf.io/va7c8/

Funding Statement

A.S.W.: National Science Foundation grants (1529127 and 1539129/1827744), https://www.nsf.gov/. A.S.W.: James S. McDonnell Foundation Scholar Award, https://grants.jsmf.org/220020507/. Neither funder played a role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Broesch T, Bryant GA. Fathers’ Infant-Directed Speech in a Small-Scale Society. Child Dev. 2018;89(2):e29–41. doi: 10.1111/cdev.12768 [DOI] [PubMed] [Google Scholar]
  • 2.Fernald A, Simon T. Expanded intonation contours in mothers’ speech to newborns. Dev Psychol. 1984;20(1):104–13. [Google Scholar]
  • 3.Hilton CB, Moser CJ, Bertolo M, Lee-Rubin H, Amir D, Bainbridge CM. Acoustic regularities in infant-directed speech and song across cultures. Nature Human Behaviour. 2022;6(11):1545–66. doi: 10.1038/s41562-022-01345-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Mazuka R, Igarashi Y, Martin A, Utsugi A. Infant-directed speech as a window into the dynamic nature of phonology. Lab Phonol. 2015;6(3–4). [Google Scholar]
  • 5.Soderstrom M. Beyond babytalk: Re-evaluating the nature and content of speech input to preverbal infants. Dev Rev. 2007;27(4). [Google Scholar]
  • 6.Kuhl PK, Andruski JE, Chistovich IA, Chistovich LA, Kozhevnikova EV, Ryskina VL, et al. Cross-language analysis of phonetic units in language addressed to infants. Science. 1997;277(5326):684–6. doi: 10.1126/science.277.5326.684 [DOI] [PubMed] [Google Scholar]
  • 7.Phillips JR. Syntax and vocabulary of mothers’ speech to young children: Age and sex comparisons. Child Dev. 1973;44(1):182. [Google Scholar]
  • 8.Foushee R, Griffiths TL, Srinivasan M. Lexical complexity of child-directed and overheard speech: implications for learning. In: Proceedings of the 38th Annual Meeting of the Cognitive Science Society, 2016. 1697–702. [Google Scholar]
  • 9.McMurray B, Kovack-Lesh KA, Goodwin D, McEchron W. Infant directed speech and the development of speech perception: enhancing development or an unintended consequence?. Cognition. 2013;129(2):362–78. doi: 10.1016/j.cognition.2013.07.015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Räsänen O, Kakouros S, Soderstrom M. Is infant-directed speech interesting because it is surprising? - Linking properties of IDS to statistical learning and attention at the prosodic level. Cognition. 2018;178:193–206. doi: 10.1016/j.cognition.2018.05.015 [DOI] [PubMed] [Google Scholar]
  • 11.Weisleder A, Fernald A. Talking to children matters: early language experience strengthens processing and builds vocabulary. Psychol Sci. 2013;24(11):2143–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Kuhl PK, Meltzoff AN. The bimodal perception of speech in infancy. Science. 1982;218(4577):1138–41. doi: 10.1126/science.7146899 [DOI] [PubMed] [Google Scholar]
  • 13.Golinkoff RM, Can DD, Soderstrom M, Hirsh-Pasek K. (Baby) talk to me: The social context of infant-directed speech and its effects on early language acquisition. Curr Dir Psychol Sci. 2015;24(5):339–44. [Google Scholar]
  • 14.Fernald A. Intonation and communicative intent in mothers’ speech to infants: is the melody the message?. Child Dev. 1989;60(6):1497–510. [PubMed] [Google Scholar]
  • 15.Ferrier LJ. Intonation in discourse: Talk between 12-month-olds and their mothers. In: Nelson KE. Children’s language. 1st ed. Lawrence Erlbaum Associates, Inc. 1985;35–60. [Google Scholar]
  • 16.Falk S. Melodic versus intonational coding of communicative functions: A comparison of tonal contours in infant-directed song and speech. Psychomusicology Music Mind Brain. 2011;21(1–2):54–68. [Google Scholar]
  • 17.Papoušek M, Papoušek H, Symmes D. The meanings of melodies in motherese in tone and stress languages. Infant Behav Dev. 1991;14(4):415–40. [Google Scholar]
  • 18.Trainor LJ, Clark ED, Huntley A, Adams BA. The acoustic basis of preferences for infant-directed singing. Infant Behav Dev. 1997;20(3):383–96. [Google Scholar]
  • 19.Trehub SE, Unyk AM, Kamenetsky SB, Hill DS, Trainor LJ, Henderson JL, et al. Mothers’ and fathers’ singing to infants. Dev Psychol. 1997;33(3):500–7. doi: 10.1037//0012-1649.33.3.500 [DOI] [PubMed] [Google Scholar]
  • 20.Hellbernd N, Sammler D. Prosody conveys speaker’s intentions: Acoustic cues for speech act perception. J Mem Lang. 2016;88:70–86. [Google Scholar]
  • 21.Munson B, Babel M. The phonetics of sex and gender. In: Katz WF, Assmann PF. The Routledge Handbook of Phonetics. 1st ed. New York, NY: Routledge. 2019;499–525. [Google Scholar]
  • 22.Titze IR. Physiologic and acoustic differences between male and female voices. J Acoust Soc Am. 1989;85(4):1699–707. [DOI] [PubMed] [Google Scholar]
  • 23.Fitch WT, Giedd J. Morphology and development of the human vocal tract: a study using magnetic resonance imaging. J Acoust Soc Am. 1999;106(3 Pt 1):1511–22. doi: 10.1121/1.427148 [DOI] [PubMed] [Google Scholar]
  • 24.Eckert P. Three waves of variation study: The emergence of meaning in the study of sociolinguistic variation. Annu Rev Anthropol. 2012;41(1):87–100. [Google Scholar]
  • 25.Labov W, Rosenfelder I, Fruehwald J. One hundred years of sound change in Philadelphia: Linear incrementation, reversal, and reanalysis. Language. 2013;89(1):30–65. [Google Scholar]
  • 26.Babel M, Munson B. Producing socially meaningful linguistic variation. In: Goldrick M, Victor S, Miozzo M. The Oxford Handbook of Language Development. Oxford University Press. 2014;308–26. [Google Scholar]
  • 27.Ferjan Ramírez N. Fathers’ infant‐directed speech and its effects on child language development. Lang Linguist Compass. 2022;16(1):e12448. [Google Scholar]
  • 28.Leaper C, Anderson KJ, Sanders P. Moderators of gender effects on parents’ talk to their children: a meta-analysis. Dev Psychol. 1998;34(1):3–27. doi: 10.1037/0012-1649.34.1.3 [DOI] [PubMed] [Google Scholar]
  • 29.VanDam M, Thompson L, Wilson-Fowler E, Campanella S, Wolfenstein K, De Palma P. Conversation Initiation of Mothers, Fathers, and Toddlers in their Natural Home Environment. Comput Speech Lang. 2022;73:101338. doi: 10.1016/j.csl.2021.101338 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Bergelson E, Casillas M, Soderstrom M, Seidl A, Warlaumont AS, Amatuni A. What do North American babies hear? A large-scale cross-corpus analysis. Dev Sci. 2019;22(1):1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Amano S, Nakatani T, Kondo T. Fundamental frequency of infants’ and parents’ utterances in longitudinal recordings. J Acoust Soc Am. 2006;119(3):1636–47. doi: 10.1121/1.2161443 [DOI] [PubMed] [Google Scholar]
  • 32.Benders T, StGeorge J, Fletcher R. Infant-directed speech by Dutch fathers: Increased pitch variability within and across utterances. Lang Learn Dev. 2021;17(3):292–325. [Google Scholar]
  • 33.Johnson K, Caskey M, Rand K, Tucker R, Vohr B. Gender differences in adult-infant communication in the first months of life. Pediatrics. 2014;134(6):e1603-10. doi: 10.1542/peds.2013-4289 [DOI] [PubMed] [Google Scholar]
  • 34.Greenwood CR, Thiemann-Bourque K, Walker D, Buzhardt J, Gilkerson J. Assessing children’s home language environments using automatic speech recognition technology. Commun Disord Q. 2011;32(2):83–92. [Google Scholar]
  • 35.Adolph K, Tamis-LeMonda C, Gilmore RO, Soska K. Play and learning across a year (PLAY) project - Protocols & documentation. Databrary. 2019. doi: 10.17910/b7.876 [DOI] [Google Scholar]
  • 36.VanDam M, Warlaumont A, Bergelson E, Cristia A, Soderstrom M, De Palma P. HomeBank: An online repository of daylong child-centered audio recordings. Semin Speech Lang. 2016;37(02):128–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.de Barbaro K, Fausey CM. Ten lessons about infants’ everyday experiences. Curr Dir Psychol Sci. 2022;31(1):28–33. doi: 10.1177/09637214211059536 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Bergelson E, Amatuni A, Dailey S, Koorathota S, Tor S. Day by day, hour by hour: Naturalistic language input to infants. Dev Sci. 2019;22(1):e12715. doi: 10.1111/desc.12715 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Jayaraman S, Fausey CM, Smith LB. The Faces in Infant-Perspective Scenes Change over the First Year of Life. PLoS One. 2015;10(5):e0123780. doi: 10.1371/journal.pone.0123780 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Mendoza JK, Fausey CM. Everyday music in infancy. Dev Sci. 2021;24(6). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Goldenberg ER, Repetti RL, Sandhofer CM. Contextual variation in language input to children: A naturalistic approach. Dev Psychol. 2022;58(6):1051–65. doi: 10.1037/dev0001345 [DOI] [PubMed] [Google Scholar]
  • 42.Roy BC, Frank MC, DeCamp P, Miller M, Roy D. Predicting the birth of a spoken word. Proc Natl Acad Sci U S A. 2015;112(41):12663–8. doi: 10.1073/pnas.1419773112 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.MacDonald KE, Räsänen O, Casillas M, Warlaumont A. Measuring prosodic predictability in children’s home language environments. Proc 42nd Annu Meet Cogn Sci Soc. 2020;695–701. [Google Scholar]
  • 44.Pretzer GM, Lopez LD, Walle EA, Warlaumont AS. Infant-adult vocal interaction dynamics depend on infant vocal type, child-directedness of adult speech, and timeframe. Infant Behav Dev. 2019;57:101325. doi: 10.1016/j.infbeh.2019.04.007 [DOI] [PubMed] [Google Scholar]
  • 45.Hirst DJ, Looze C. Measuring speech. Fundamental frequency and pitch. In: Knight RA, Setter J. The Cambridge Handbook of Phonetics. 1st ed. Cambridge University Press. 2021;336–61. [Google Scholar]
  • 46.Bergelson E. Bergelson SEEDlings HomeBank corpus. 2016. doi: 10.21415/T5PK6D [DOI] [Google Scholar]
  • 47.McDivitt K, Soderstrom M. McDivitt HomeBank corpus. 2016. doi: 10.21415/T5KK6G [DOI] [Google Scholar]
  • 48.VanDam M. VanDam Cougar HomeBank corpus. 2018. doi: 10.21415/t5wt25 [DOI] [Google Scholar]
  • 49.Warlaumont AS, Pretzer GM, Mendoza S, Walle EA. Warlaumont HomeBank corpus. 2016. doi: 10.21415/T54S3C [DOI] [Google Scholar]
  • 50.Gilkerson J, Richards JA. A guide to understanding the design and purpose of the LENA® system. LENA Found Tech Rep. 2020.
  • 51.Anikin A. Soundgen: An open-source tool for synthesizing nonverbal vocalizations. Behav Res Methods. 2019;51(2):778–92. doi: 10.3758/s13428-018-1095-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Bornstein MH, Tamis-Lemonda CS, Hahn C-S, Haynes OM. Maternal responsiveness to young children at three ages: longitudinal analysis of a multidimensional, modular, and specific parenting construct. Dev Psychol. 2008;44(3):867–74. doi: 10.1037/0012-1649.44.3.867 [DOI] [PubMed] [Google Scholar]
  • 53.Chang LM, Deák GO. Maternal discourse continuity and infants’ actions organize 12-month-olds’ language exposure during object play. Dev Sci. 2019;22(3):e12770. doi: 10.1111/desc.12770 [DOI] [PubMed] [Google Scholar]
  • 54.Gros-Louis J, West JM, Goldstein HM, King PA. Mothers provide differential feedback to infants’ prelinguistic sounds. Int J Behav Dev. 2006;30(6):509–16. [Google Scholar]
  • 55.Ninio A, Snow CE, Pan BA, Rollins PR. Classifying communicative acts in children’s interactions. J Commun Disord. 1994;27(2):157–87. doi: 10.1016/0021-9924(94)90039-6 [DOI] [PubMed] [Google Scholar]
  • 56.Toda S, Fogel A, Kawai M. Maternal speech to three-month-old infants in the United States and Japan. J Child Lang. 1990;17(2):279–94. doi: 10.1017/s0305000900013775 [DOI] [PubMed] [Google Scholar]
  • 57.Vallotton C, Mastergeorge A, Foster T, Decker KB, Ayoub C. Parenting Supports for Early Vocabulary Development: Specific Effects of Sensitivity and Stimulation through Infancy. Infancy. 2017;22(1):78–107. doi: 10.1111/infa.12147 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Krippendorff K. Computing Krippendorff’s alpha-reliability. https://repository.upenn.edu/asc_papers/43. 2011. [Google Scholar]
  • 59.R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. 2020. [Google Scholar]
  • 60.van de Schoot R, Kaplan D, Denissen J, Asendorpf JB, Neyer FJ, van Aken MAG. A gentle introduction to bayesian analysis: applications to developmental research. Child Dev. 2014;85(3):842–60. doi: 10.1111/cdev.12169 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Bates D, Maechler M, Bolker B, Walker S. lme4: Linear mixed-effects models using “Eigen” and S4. 2003. https://CRAN.R-project.org/package=lme4 [Google Scholar]
  • 62.Kuznetsova A, Brockhoff PB, Christensen RHB. lmerTest package: Tests in linear mixed effects models. J Stat Softw. 2017;82(13). [Google Scholar]
  • 63.Baey C, Kuhn E. varTestnlme: An R package for variance components testing in linear and nonlinear mixed-effects models. J Stat Softw. 2023;107(6). [Google Scholar]
  • 64.Lenth RV, Bolker B, Buerkner P, Giné-Vázquez I, Herve M, Jung M. emmeans: Estimated marginal means, aka least-squares means. 2023 [accessed 2024 August 1]. https://CRAN.R-project.org/package=emmeans [Google Scholar]
  • 65.Wickham H. ggplot2: Elegant graphics for data analysis. New York: Springer-Verlag. 2016. [Google Scholar]
  • 66.Byers-Heinlein K, Tsui ASM, Bergmann C, Black AK, Brown A, Carbajal MJ. A Multilab Study of Bilingual Infants: Exploring the Preference for Infant-Directed Speech. Adv Methods Pract Psychol Sci. 2021;4(1):1–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Bloom L, Tinker E, Margulis C. The words children learn: Evidence against a noun bias in early vocabularies. Cogn Dev. 1993;8(4):431–50. [Google Scholar]
  • 68.Casey K, Potter CE, Lew-Williams C, Wojcik EH. Moving beyond “nouns in the lab”: Using naturalistic data to understand why infants’ first words include uh-oh and hi. Dev Psychol. 2023;59(11):2162–73. doi: 10.1037/dev0001630 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Fairhurst MT, Löken L, Grossmann T. Physiological and behavioral responses reveal 9-month-old infants’ sensitivity to pleasant touch. Psychol Sci. 2014;25(5):1124–31. doi: 10.1177/0956797614527114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Oller DK, Ramsay G, Bene E, Long HL, Griebel U. Protophones, the precursors to speech, dominate the human infant vocal landscape. Philos Trans R Soc B Biol Sci. 2021;376(1836):20200255. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Albert RR, Schwade JA, Goldstein MH. The social functions of babbling: acoustic and contextual characteristics that facilitate maternal responsiveness. Dev Sci. 2018;21(5):e12641. doi: 10.1111/desc.12641 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Owren MJ, Rendall D. Sound on the rebound: Bringing form and function back to the forefront in understanding nonhuman primate vocal signaling. Evol Anthropol Issues News Rev. 2001;10(2):58–71. [Google Scholar]
  • 73.Bloom K, Moore-Schoenmakers K, Masataka N. Nasality of infant vocalizations determines gender bias in adult favorability ratings. J Nonverbal Behav. 1999;23(3):219–36. [Google Scholar]
  • 74.Campbell DW, Eaton WO. Sex differences in the activity level of infants. Infant Child Dev. 1999;8(1):1–17. [Google Scholar]
  • 75.Oller DK, Griebel U, Bowman DD, Bene E, Long HL, Yoo H, et al. Infant boys are more vocal than infant girls. Curr Biol. 2020;30(10):R426–7. doi: 10.1016/j.cub.2020.03.049 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Cooper RP, Aslin RN. Preference for infant-directed speech in the first month after birth. Child Dev. 1990;61(5):1584–95. [PubMed] [Google Scholar]
  • 77.Fernald A. Four-month-old infants prefer to listen to motherese. Infant Behav Dev. 1985;8(2):181–95. [Google Scholar]
  • 78.Werker JF, McLeod PJ. Infant preference for both male and female infant-directed talk: a developmental study of attentional and affective responsiveness. Can J Psychol. 1989;43(2):230–46. doi: 10.1037/h0084224 [DOI] [PubMed] [Google Scholar]
  • 79.Cox C, Bergmann C, Fowler E, Keren-Portnoy T, Roepstorff A, Bryant G. A systematic review and Bayesian meta-analysis of the acoustic features of infant-directed speech. Nat Hum Behav. 2022;7(1):114–33. [DOI] [PubMed] [Google Scholar]
  • 80.Tripp A, Munson B. Perceiving gender while perceiving language: Integrating psycholinguistics and gender theory. Wiley Interdiscip Rev Cogn Sci. 2022;13(2):e1583. doi: 10.1002/wcs.1583 [DOI] [PubMed] [Google Scholar]
  • 81.Piazza EA, Iordan MC, Lew-Williams C. Mothers Consistently Alter Their Unique Vocal Fingerprints When Communicating with Infants. Curr Biol. 2017;27(20):3162-3167.e3. doi: 10.1016/j.cub.2017.08.074 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Cox C, Dideriksen C, Keren‐Portnoy T, Roepstorff A, Christiansen MH, Fusaroli R. Infant‐directed speech does not always involve exaggerated vowel distinctions: evidence from Danish. Child Dev. 2023. doi: 10.1111/cdev.13950 [DOI] [PubMed] [Google Scholar]
  • 83.Franchak JM. Changing Opportunities for Learning in Everyday Life: Infant Body Position Over the First Year. Infancy. 2019;24(2):187–209. doi: 10.1111/infa.12272 [DOI] [PubMed] [Google Scholar]
  • 84.Tamis-LeMonda CS, Kuchirko Y, Luo R, Escobar K, Bornstein MH. Power in methods: language to infants in structured and naturalistic contexts. Dev Sci. 2017;20(6):10.1111/desc.12456. doi: 10.1111/desc.12456 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Ko E-S, Seidl A, Cristia A, Reimchen M, Soderstrom M. Entrainment of prosody in the interaction of mothers with their young children. J Child Lang. 2016;43(2):284–309. doi: 10.1017/S0305000915000203 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Marcela de Lourdes Peña Garay

PONE-D-24-37730: Pitch characteristics of real-world infant-directed speech vary with pragmatic context, perceived adult gender, and infant gender (PLOS ONE)

Dear Dr. Neer,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 19 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Marcela de Lourdes Peña Garay, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript investigates how prosodic modifications in infant-directed speech (IDS) vary based on pragmatic context, perceived adult gender, and infant gender. Analyzing 3,607 speech clips from daylong home recordings of 60 North American infants, the study assesses pitch characteristics across various speaking contexts. The authors exhibit rigor and transparency in their study design, including preregistration of their analyses. They clearly detail the method for assigning adult gender, which is noted as a study limitation—a commendable inclusion. I appreciate the authors’ distinction between planned and exploratory analyses, enhancing the clarity of the reporting. Below, I list some minor suggestions for revision:

1. The authors also provide a detailed explanation of inter-coder reliability, noting the average across four rounds of coding. Given that the practice of averaging reliability has been debated, it would be helpful if the authors also reported the range to give a fuller picture. For example, the reliability for most categories is strong, though it is lower for "Conversational Basics" (0.55). Further explanation on why this category’s reliability is comparatively low and how this may affect the interpretation of results would be valuable.

2. There are a few areas that require further clarification. For instance, it is not entirely clear if "3,727 adult speech audio clips" refers to distinct utterances. Later, the authors mention “3,607 adult speaker utterances (2,210 IDS; 1,397 ADS) after exclusions,” suggesting that all clips were treated as individual utterances. How were utterances determined? In the OSF repository, the first clip (comfort_inform_CDS) includes two grammatical segments (clauses). Were they counted as one utterance?

3. The pragmatic context coding scheme for utterances includes categories such as "Inform," "Conversational Basics," "Question," "Imperative," "Reading," "Singing," "Comfort," and "Vocal Play." It would be helpful to know whether all utterances fit into these categories or if there were cases that could not be coded within this framework.

4. Beyond the factors analyzed, I am curious if the authors considered other potential influences. For example, was the time of day when utterances occurred recorded, and could this factor influence speaker presence by gender? Additionally, given that 43 infants had parents with higher education, could parental education also play a role in prosodic modification patterns?

5. Further detail on the perceived gender distribution of adult speakers within each participant and across the total sample is recommended. Providing descriptive information about these characteristics would offer a clearer picture of the sample's demographic composition. It would also be beneficial if the authors elaborated on the automatic pitch detection methodology and whether they validated this method against human-coded data.

Overall, the manuscript offers a valuable contribution to the field, with results well-placed within the broader child development literature. While the core findings are robust and data-supported, further refinement in presentation and organization would enhance the manuscript’s clarity and impact.

Reviewer #2: First of all, I would like to congratulate the authors on this interesting study. The topic is relevant, the conducted experiment is feasible and the applied statistical analyses are well described, clear and appropriate. The authors discussed all important aspects, including the limitations of the study. I definitely recommend this manuscript for publication. I only have some minor comments and suggestions.

1. I am not familiar with the LENA device and its working mechanism. On the official website I read that it deletes the actual conversation and stores only the acoustic data in order to protect personality rights. If so, how were audio playbacks coded by the coders? The website also states that the device is not able to “understand” spoken words, but it can count the words. It can also count utterances (the article reports the number of utterances). The article also reports utterance-level acoustic characteristics, so it seems that the device is able to do utterance-level analyses. Is it able to do hyperarticulation analyses? Or any other vowel-level analyses? In any case, it would be expedient to write a little bit more about the device, the software, the output, etc. for those who are not familiar with this data acquisition method.

2. It would also be nice to have a table with the number of clips in each pragmatic context category (the authors noted that Reading and Vocal play were underrepresented and sometimes these contexts were excluded from analysis, but it would be important to see the exact numbers within each pragmatic context). It would also be important to see the exclusion rate of clips within each pragmatic context. For instance, we read in the discussion that the authors were surprised that comfort was rare in the dataset. But isn’t it possible that most of the comfort context had been excluded because the infant was crying and the clip was therefore considered “noisy”? A detailed table of excluded clips would answer this question.

3. Line 252: the criterion of at least 20 utterances seems a little bit arbitrary. What was the basis for this inclusion criterion?

Reviewer #3: The study "Pitch characteristics of real-world infant-directed speech vary with pragmatic context, perceived adult gender, and infant gender" explores how various factors influence the acoustic properties of infant-directed speech (IDS). Using an open dataset of speech clips recorded in the naturalistic environments of North American homes with infants aged 3 to 20 months, the authors compared the pitch features of IDS with adult-directed speech (ADS). Their analysis confirmed that IDS consistently exhibits a higher mean pitch and greater pitch variability compared to ADS, aligning with previous findings. Moreover, the study revealed that these pitch characteristics are not uniform but vary depending on the pragmatic function of the utterance, the gender of the speaker, and the gender of the infant. These findings highlight the nuanced and context-sensitive nature of IDS, underscoring its potential role in facilitating infant engagement and learning.

The authors made commendable efforts to address the issue of zero counts in some pragmatic contexts by introducing a randomization-based approach to include these categories in their logistic mixed-effects model. However, the limited observations for contexts such as 'reading,' 'singing,' and 'comforting,' particularly in the ADS register, raise concerns about the robustness of the conclusions drawn for these categories. Sparse or absent data for certain pragmatic contexts inherently limit the statistical power and reliability of the model's estimates for these categories. The artificial reclassification of data to enable model convergence, while creative, introduces additional uncertainty and potential biases that complicate the interpretability of the findings for these specific contexts.

The results from running the logistic mixed-effects model provided in the manuscript's accompanying code revealed no significant main effect of the pragmatic context of questions on the register, contrary to the findings reported in the paper (see attached html table). This discrepancy reinforces concerns about the robustness of the analysis, particularly given the limited observations for some pragmatic contexts and the reliance on resampling techniques to address zero counts. Furthermore, the warning message generated during model execution—“Model is nearly unidentifiable: large eigenvalue ratio - Rescale variables?”—raises additional doubts about the model's stability and reliability.

The study analyzes IDS across a broad age range of 3 to 20 months, which spans a critical period of infant development. Previous research has demonstrated that IDS evolves as infants grow, with notable changes in pitch characteristics (e.g., Cox et al., 2023), and pragmatic functions tailored to developmental stages, such as capturing young infants’ attention and conveying emotional affect versus supporting linguistic goals in older infants (Fernald, 1992; Kuhl et al., 1997). By aggregating data across this wide age range without accounting for potential developmental differences, the study may obscure important age-specific patterns in IDS. For example, pitch characteristics or the frequency of certain pragmatic contexts (e.g., comforting vs. reading) may differ markedly between younger and older infants, which could confound the results. A more nuanced analysis that examines IDS characteristics by narrower age bands or includes age as a covariate could provide deeper insights into how pragmatic contexts and pitch modulations vary across development.

Furthermore, the inclusion of Spanish utterances in the dataset introduces an additional variable that could influence the pitch characteristics of IDS, as cross-linguistic research has shown that IDS varies significantly across languages (e.g., Fernald et al., 1989). For instance, Spanish IDS may exhibit different pitch ranges, prosodic contours, or modulation patterns compared to English IDS due to phonological and cultural differences in speech patterns (e.g., Cox et al., 2023). These linguistic variations could confound the analysis, particularly if the distribution of Spanish and English utterances is uneven across pragmatic contexts, speaker genders, or infant age groups.

Differences in the pitch characteristics of male and female speakers in the context of singing could be influenced by the type of songs they sing rather than solely reflecting their vocal pitch range or communication style. This is a critical point to consider, as song selection can vary systematically by gender due to cultural, social, or individual preferences, which may introduce confounding factors into the analysis. It is unclear whether the observed pitch differences in singing across genders reflect biological differences in vocal range, variations in song choice, or a combination of these factors.

One of the notable strengths of this study is its commitment to transparency and reproducibility, exemplified by the accessibility of both the data and analysis code. By utilizing an existing open dataset of speech recordings, the authors enable other researchers to verify their findings and extend the work in new directions. Moreover, the availability of the analysis code provides a clear roadmap for reproducing the statistical models and simulations, facilitating peer verification.

Overall, this study offers valuable insights into how pitch characteristics of IDS vary by pragmatic context, speaker gender, and infant gender, yet several limitations and potential confounds warrant careful consideration. The broad age range of 3 to 20 months introduces variability that may obscure developmental changes in IDS, as prior research has shown that IDS evolves with infants’ linguistic and social development. Additionally, the sparse or absent observations for certain pragmatic contexts, such as reading and vocal play, especially in ADS, raise questions about the robustness of the conclusions drawn from these data. The method of artificially reclassifying utterances to address zero counts, while creative, introduces uncertainty and potential bias. Lastly, the observed gender differences in the context of singing could reflect differences in song selection rather than purely biological vocal characteristics or interaction styles. Addressing these issues through more stratified age analyses, and controls for song content—would strengthen the study’s claims and contribute to a more nuanced understanding of IDS dynamics.

Minor points:

Line 115: can LENA capture 16 hours a day? Limited by battery capacity or by storage capacity?

Line 144: But how is it measured automatically?

Line 200 ff: It is not clear to me whether “speech clip” is used synonymously with “utterance,” and what is meant by a conversational block (especially with regard to identifying how many utterances were in Spanish).

Line 220: How were annotators blind to the child’s gender? Couldn’t this be perceived similarly to the adult speaker’s gender?

Line 330: It might be better to use identical labels within figure and manuscript – standard deviation of pitch versus pitch variability

The ends of tables and figures are hard to parse: possibly add “Note” to text below tables and figures that still belongs to them.

Supplemental Material Code (Alt analyses): line 122 the infant model mean is missing a + after the interaction of adu_gender*chi_gender*comfort

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy .

Reviewer #1: No

Reviewer #2: Yes: Dr. Anna Gergely

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Jun 25;20(6):e0326569. doi: 10.1371/journal.pone.0326569.r003

Author response to Decision Letter 1


28 Feb 2025

Dear Dr. de Lourdes Peña Garay,

Thank you for offering us the opportunity to revise the manuscript.

The reviews were exceedingly thoughtful and constructive. The reviewers raised excellent points, especially regarding some of our analyses and interpretations, which prompted us to learn and implement a new method. We’ve made major revisions to the paper addressing every point raised in the review, and as a result, the paper is much improved. We reply individually to all the reviewers’ comments in the following pages.

Reviewer 1 Comments:

1. The authors also provide a detailed explanation of inter-coder reliability, noting the average across four rounds of coding. Given that the practice of averaging reliability has been debated, it would be helpful if the authors also reported the range to give a fuller picture. For example, the reliability for most categories is strong, though it is lower for "Conversational Basics" (0.55). Further explanation on why this category’s reliability is comparatively low and how this may affect the interpretation of results would be valuable.

Response: Thank you for these points and suggestions. We have added the interrater reliability range for each of the categories for which calculating the range was possible. We also note that vocal play and reading had perfect interrater reliability because none of the coders identified any instances of those categories within the subset of clips for which reliability was assessed, so the range is not reported for those two categories. As for conversational basics, this was a broad category that included greetings, backchanneling, polite phrases, exclamations, etc. This is one of the categories for which future research might benefit from dividing the category into subcategories. We have added this note about the conversational basics category to the manuscript and have elaborated on the relevant portion of the Limitations section of the Discussion.

2. There are a few areas that require further clarification. For instance, it is not entirely clear if "3,727 adult speech audio clips" refers to distinct utterances. Later, the authors mention “3,607 adult speaker utterances (2,210 IDS; 1,397 ADS) after exclusions,” suggesting that all clips were treated as individual utterances. How were utterances determined? In the OSF repository, the first clip (comfort_inform_CDS) includes two grammatical segments (clauses). Were they counted as one utterance?

Response: Thank you for bringing this to our attention. Adult speech audio clips were automatically identified by LENA software and varied in duration (by seconds). Therefore, a clip could include more than one utterance. Where applicable in the manuscript, we changed “utterances” to “clips” and added a sentence in the Dataset subsection to explain that clips could contain more than one utterance.

3. The pragmatic context coding scheme for utterances includes categories such as "Inform," "Conversational Basics," "Question," "Imperative," "Reading," "Singing," "Comfort," and "Vocal Play." It would be helpful to know whether all utterances fit into these categories or if there were cases that could not be coded within this framework.

Response: Thank you for the suggestion. We now report the number of clips that were not tagged with a pragmatic context toward the beginning of the Results section.

4. Beyond the factors analyzed, I am curious if the authors considered other potential influences. For example, was the time of day when utterances occurred recorded, and could this factor influence speaker presence by gender? Additionally, given that 43 infants had parents with higher education, could parental education also play a role in prosodic modification patterns?

Response: Thank you for these suggestions. We did not analyze other potential influences like time of day or parental education. The IDSLabel dataset and metadata do not include time of day, but it would be an interesting factor to consider and could potentially be pursued by linking the IDSLabel data back to the original HomeBank corpora from which they derive. We have added these suggestions to the Discussion section.

5. Further detail on the perceived gender distribution of adult speakers within each participant and across the total sample is recommended. Providing descriptive information about these characteristics would offer a clearer picture of the sample's demographic composition. It would also be beneficial if the authors elaborated on the automatic pitch detection methodology and whether they validated this method against human-coded data.

Response: Thank you for the recommendation. In the Dataset section, we have added descriptive information about the perceived gender of adult speakers across the total sample. The participant-level perceived adult gender information is available within the IDSLabel dataset and the materials that are shared on our GitHub repository, linked within the OSF project associated with the manuscript. Regarding the pitch analyses, Dr. Kyle MacDonald, the first author of the paper from which the pitch values were obtained, is no longer working in academia. The current paper’s senior author served as his mentor on that project, and according to her memory and notes, the values were chosen based on informal pilot explorations informed by the soundgen manual and references to prior studies applying automated pitch estimation to caregiver speech samples. Formal validation against human-coded pitch contours was not performed. However, since the sound files are available on HomeBank, a motivated reader could attempt such a project. We did not update the Methods to add these details because they are vague and based on personal communications and memories rather than details provided in the original paper and its associated GitHub repository (but we could add these comments to the manuscript if that seems appropriate and desirable). We did add a statement to the relevant section of the Discussion: “In the future, it would be helpful to validate the automatically obtained pitch estimates against pitch measurements obtained by expert humans.”

Reviewer #2 Comments:

1. I am not familiar with the LENA device and its working mechanism. On the official website I read that it deletes the actual conversation and stores only the acoustic data in order to meet personality rights. If so, how were audio playbacks coded by the coders? The website also stated that the device is not able to “understand” spoken words but that it can count the words. But it can also count utterances (as the article reports numbers of utterances). The article also reports utterance-level acoustic characteristics, so it seems that the device is able to do utterance-level analyses. Is it able to do hyperarticulation analyses? Or any other vowel-level analyses? In any case, it would be expedient to write a little bit more about the device, the software, the output, etc. for those who are not familiar with this data acquisition method.

Response: Thank you for letting us know that this was confusing! The LENA system comes in different versions. The version that was used for all the recordings in the IDSLabel dataset was LENA Pro, which does save the audio and allows the user to export a WAV file for each recording. We have added some of this information to the Methods to hopefully make it clearer for readers. We also have changed to using the term “clip” instead of “utterance” as the latter can have linguistic definitions that don’t exactly map onto how the segmentation is done by the LENA software. We’ve added some more information about that segmentation process as well as a reference to a document written by LENA Foundation scientists that we hope helps clarify this. To answer your last two questions, no, the LENA software does not include hyperarticulation or other vowel-level analyses. The LENA resource just mentioned and added to the paper’s citations/references should clarify this for any interested readers who have the same question. We agree that those would be excellent to be able to include in future work, and we’ve now added vowel formants to the examples of other acoustic measures that could be included in future research.

2. It would be also nice to have a table with the number of clips in each pragmatic context category (authors noted that Reading and Vocal play were underrepresented and sometimes these contexts were excluded from analysis, but it would be important to see the exact numbers within each pragmatic context). It would be also important to see the exclusion rate of clips within each pragmatic context. For instance, we can read in the discussion that authors were surprised that comfort was rare in the dataset. But isn’t it possible that most of the comfort context had been excluded because the infant was crying and as it was considered as “noisy”? A detailed table about excluded clips would give an answer to this question.

Response: Thank you for bringing up this idea. The only clips excluded from our dataset were those coded solely as noisy or indecipherable (n = 107). It is possible that some comfort clips were mistakenly categorized as “noisy”, and also possible that some comfort was not included within IDSLabel if it overlapped with child crying and was labeled as such by LENA. We have included this note in our discussion section. We have also included a table in our supplemental material detailing the number of clips in each category and the number of clips with both a context code and a noisy code, for further transparency.

3. Line 252: the criterion of at least 20 utterances seems a little bit arbitrary. What was the basis for this inclusion criterion?

Response: Thank you for pointing out that this could be better explained. We’ve added the following statement: “This criterion was somewhat arbitrary and was based on our intuitions about what would be a reasonable minimum sample size for clip-level acoustic analyses; our main aim in specifying this criterion ahead of time was to avoid data dredging.”

Reviewer #3 Comments:

1. The authors made commendable efforts to address the issue of zero counts in some pragmatic contexts by introducing a randomization-based approach to include these categories in their logistic mixed-effects model. However, the limited observations for contexts such as 'reading,' 'singing,' and 'comforting,' particularly in the ADS register, raise concerns about the robustness of the conclusions drawn for these categories. Sparse or absent data for certain pragmatic contexts inherently limit the statistical power and reliability of the model's estimates for these categories. The artificial reclassification of data to enable model convergence, while creative, introduces additional uncertainty and potential biases that complicate the interpretability of the findings for these specific contexts.

Response: We are grateful for this criticism. In researching more about the possibility for statistical bias with small sample sizes, we discovered that taking a Bayesian rather than a maximum likelihood approach would be better for our logistic regression testing for differences in the prevalences of contexts in IDS vs. ADS. This change in approach enabled us to include all contexts in the model without the artificial random reclassification. (We did need to reduce the model complexity in order to achieve convergence, taking out adult gender as a predictor, but that variable was not significant with the prior approach.) This is one of those wonderful cases where a reviewer comment prompted us to learn and implement a new method, making both the current study and our own skillsets stronger.

2. The results from running the logistic mixed-effects model provided in the manuscript's accompanying code revealed no significant main effect of the pragmatic context of questions on the register, contrary to the findings reported in the paper (see attached html table). This discrepancy reinforces concerns about the robustness of the analysis, particularly given the limited observations for some pragmatic contexts and the reliance on resampling techniques to address zero counts. Furthermore, the warning message generated during model execution—“Model is nearly unidentifiable: large eigenvalue ratio - Rescale variables?”—raises additional doubts about the model's stability and reliability.

Response: Thank you so much for taking the time to run our models and for discovering this discrepancy. We must apologize; the previous submission included a typo in the reporting of the questions result from this analysis, accounting for the discrepancy. As we are now using a Bayesian model in place of the maximum likelihood based model used previously, small sample size is less of a concern (see van de Schoot et al., 2014), and the Bayesian regression runs without warnings and with good diagnostics (https://mc-stan.org/learn-stan/diagnostics-warnings.html).

van de Schoot, R., Kaplan, D., Denissen, J., Asendorpf, J.B., Neyer, F.J. and van Aken, M.A.G. (2014), A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research. Child Dev, 85: 842-860. https://doi.org/10.1111/cdev.12169

3. The study analyzes IDS across a broad age range of 3 to 20 months, which spans a critical period of infant development. Previous research has demonstrated that IDS evolves as infants grow, with notable changes in pitch characteristics (e.g., Cox et al., 2023), and pragmatic functions tailored to developmental stages, such as capturing young infants’ attention and conveying emotional affect versus supporting linguistic goals in older infants (Fernald, 1992; Kuhl et al., 1997). By aggregating data across this wide age range without accounting for potential developmental differences, the study may obscure important age-specific patterns in IDS. For example, pitch characteristics or the frequency of certain pragmatic contexts (e.g., comforting vs. reading) may differ markedly between younger and older infants, which could confound the results. A more nuanced analysis that examines IDS characteristics by narrower age bands or includes age as a covariate could provide deeper insights into how pragmatic contexts and pitch modulations vary across development.

Response: Thank you for these thoughtful comments and connections to prior literature. We completely agree that age could moderate IDS acoustics in a way that might differ across contexts and/or speaker gender and/or infant gender. We had originally considered including age in our analyses but decided not to do so due to concerns about statistical power and about model complexity posing a challenge to interpretation (we wanted to avoid 3-way interactions). We hope that future research can incorporate age as well as other factors, but we expect that some other statistical approach (e.g., random forest) may become necessary. For these reasons we have not attempted to add this to the current study, but we have expanded the discussion to include some of these points.

4. Furthermore, the inclusion of Spanish utterances in the dataset introduces an additional variable that could influence the pitch characteristics of IDS, as cross-linguistic research has shown that IDS varies significantly across languages (e.g., Fernald et al., 1989). For instance, Spanish IDS may exhibit different pitch ranges, prosodic contours, or modulation patterns compared to English IDS due to phonological and cultural differences in speech patterns (e.g., Cox et al., 2023). These linguistic variations could confound the analysis, particularly if the distribution of Spanish and English utterances is uneven across pragmatic contexts, speaker genders, or infant age groups.

Response: We agree that IDS intonation appears to have both similarities and differences across languages (as is conveyed in the Discussion). However, the number of clips in which the adults spoke Spanish was an extremely small proportion of the IDSLabel dataset. We thus think it very unlikely that this is a confound for our study. We have made some edits to the Methods where it is mentioned that there were a few Spanish clips, including adding the exact number so that readers can see that this was an extremely small proportion.

Attachment

Submitted filename: Response to Reviewers .docx

pone.0326569.s003.docx (38.2KB, docx)

Decision Letter 1

Marcela de Lourdes Peña Garay

PONE-D-24-37730R1

Pitch characteristics of real-world infant-directed speech vary with pragmatic context, perceived adult gender, and infant gender

PLOS ONE

Dear Dr. Neer,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 30 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Marcela de Lourdes Peña Garay, Ph.D

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #3: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I appreciate the opportunity to review this revised manuscript. The authors have adequately addressed all of my previous comments.

Reviewer #3: I appreciate the thoughtful and thorough revisions made in response to the initial feedback. The authors have substantially improved the manuscript by increasing methodological transparency and adopting a Bayesian modeling approach to address issues related to sparse data. The updated analyses are clearly reported, and the convergence diagnostics suggest a well-specified model.

The authors note that adult speaker gender was removed from the Bayesian model to achieve convergence—an understandable decision given the data limitations and model complexity. However, since adult gender was central to the study’s original hypotheses and included in the preregistration, it would be helpful to explicitly acknowledge this as a deviation from the preregistered analysis and briefly reflect on its implications for interpreting the results. In addition, I recommend that the authors clearly state that the adoption of the Bayesian approach also constitutes a deviation from the preregistration (Variables and Analyses 1).

Reproducing the Bayesian analysis requires substantial computational time and resources (taking 11 hours for the Bayesian model to run on my end!). This could pose a barrier to reproducibility for researchers without access to high-performance computing setups. I suggest the authors consider caching model outputs (e.g., using saveRDS() or storing fitted model objects) and including them in the repository. This would facilitate transparency and reproducibility without requiring each user to rerun time-intensive sampling.

Minor:

p. 16: max_treedepth is 15 within the code but reported as 12 in the manuscript

Scaling and centering of binary predictors such as adult speaker gender, child gender, and register are performed using the scale() function. I am wondering whether effect coding (e.g., −0.5 / 0.5) would typically offer a more interpretable and robust approach in this context?

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

**********



PLoS One. 2025 Jun 25;20(6):e0326569. doi: 10.1371/journal.pone.0326569.r005

Author response to Decision Letter 2


16 May 2025

Dear Editors,

Thank you for offering us the opportunity to revise the manuscript. We are grateful to the Editor and all the reviewers for their helpful comments. Below are responses to the comments and suggestions on the previous revision.

Reviewer #3 Comments:

1. The authors note that adult speaker gender was removed from the Bayesian model to achieve convergence—an understandable decision given the data limitations and model complexity. However, since adult gender was central to the study’s original hypotheses and included in the preregistration, it would be helpful to explicitly acknowledge this as a deviation from the preregistered analysis and briefly reflect on its implications for interpreting the results. In addition, I recommend that the authors clearly state that the adoption of the Bayesian approach also constitutes a deviation from the preregistration (Variables and Analyses 1).

Response: Thank you for this suggestion, and we agree that transparency is needed. We have added a statement in the “Preregistered Planned Analyses and Pilot Analyses” section to highlight the adoption of the Bayesian approach and that this approach differs from the preregistration. Additionally, we added a short paragraph in the discussion section on the implications of excluding speaker gender from the analysis.

2. Reproducing the Bayesian analysis requires substantial computational time and resources (taking 11 hours for the Bayesian model to run on my end!). This could pose a barrier to reproducibility for researchers without access to high-performance computing setups. I suggest the authors consider caching model outputs (e.g., using saveRDS() or storing fitted model objects) and including them in the repository. This would facilitate transparency and reproducibility without requiring each user to rerun time-intensive sampling.

Response: Thank you so much for taking the time to run our analyses and for this helpful suggestion. We have taken the suggestion to save the model object using saveRDS() and the file is now included in the GitHub repository associated with the manuscript. We also updated the results output text file on GitHub and have updated the numbers in Table 2 and the reported ESSs to match this most recent run of the model.
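The caching pattern described above can be sketched in R roughly as follows; this is an illustrative example, not the manuscript's actual code, and lm() stands in for the much slower Bayesian model:

```r
# Minimal sketch of caching a fitted model object with saveRDS()/readRDS().
# The file name is illustrative; lm() is a stand-in for the slow model fit.
fit_path <- "cached_model.rds"
if (file.exists(fit_path)) {
  fit <- readRDS(fit_path)            # reuse the stored fit; no refitting
} else {
  fit <- lm(mpg ~ wt, data = mtcars)  # expensive step, run only once
  saveRDS(fit, fit_path)              # store the fitted object for reuse
}
coef(fit)
```

If the model is fit with brms, the brm() function also accepts a `file` argument that implements this save-and-reload behavior automatically.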

3. p. 16: max_treedepth is 15 within the code but reported as 12 in the manuscript

Scaling and centering of binary predictors such as adult speaker gender, child gender, and register are performed using the scale() function. I am wondering whether effect coding (e.g., −0.5 / 0.5) would typically offer a more interpretable and robust approach in this context?

Response: Thank you for catching our error in reporting the value of max_treedepth! The correction has been made in the manuscript. We also appreciate the suggestion to consider effect coding. We did not make this change for this manuscript, as it would have been a fairly major revision; however, we will consider taking an effect coding approach in future studies.
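For illustration, the effect coding the reviewer describes could be contrasted with scale() in R as follows (the variable name is hypothetical, not the study's data):

```r
# Sketch contrasting scale() with -0.5 / +0.5 effect coding of a binary
# predictor. `adu_gender` is a hypothetical example vector.
adu_gender <- factor(c("female", "male", "male", "female", "male"))

# scale(): centers at the sample mean and divides by the sample SD, so the
# two levels land at -0.5 and +0.5 only if the groups happen to be balanced.
scaled <- as.numeric(scale(as.numeric(adu_gender == "male")))

# Effect coding: fixed -0.5 / +0.5 values, so a one-unit change in the
# predictor always corresponds to the female-to-male contrast.
effect_coded <- ifelse(adu_gender == "male", 0.5, -0.5)
```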

Attachment

Submitted filename: Response to Reviewers.docx

pone.0326569.s004.docx (26.4KB, docx)

Decision Letter 2

Marcela de Lourdes Peña Garay

Pitch characteristics of real-world infant-directed speech vary with pragmatic context, perceived adult gender, and infant gender

PONE-D-24-37730R2

Dear Dr. Neer,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Marcela de Lourdes Peña Garay, Ph.D

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Marcela de Lourdes Peña Garay

PONE-D-24-37730R2

PLOS ONE

Dear Dr. Neer,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Marcela de Lourdes Peña Garay

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File

    Supplementary Material.

    (DOCX)

    pone.0326569.s001.docx (20.5MB, docx)
    Attachment

    Submitted filename: Response to Reviewers .docx

    pone.0326569.s003.docx (38.2KB, docx)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0326569.s004.docx (26.4KB, docx)

    Data Availability Statement

    The data and code to reproduce the analyses presented here are publicly accessible on GitHub: https://github.com/emucla/context-study. Analyses were also pre-registered. Materials and the preregistration for this research are available at the following OSF page: https://osf.io/va7c8/


    Articles from PLOS One are provided here courtesy of PLOS
