Scientific Reports. 2025 Apr 19;15:13624. doi: 10.1038/s41598-025-98047-3

Maternal and paternal infant directed speech is modulated by the child’s age in two and three person interactions

Manuela Filippa 1,2, Hervé Tissot 1,3, Tiffany Mancinelli 1, Nicolas Favez 1,3, Didier Grandjean 1,2
PMCID: PMC12009295  PMID: 40253572

Abstract

Prosody in infant-directed speech (IDS) serves important functions for the infant’s attention, regulation, and emotional expression. However, how the structural characteristics of this vocal signal are influenced by the presence or absence of one or two parents at different infant ages remains under-investigated. This study aimed to identify the acoustic characteristics of parental vocalizations in 69 families during specific phases of the Lausanne Trilogue Play (LTP) setting. Vocalizations were analyzed in both two-person contexts (mother-baby or father-baby interacting with the infant individually) and three-person contexts (mother-baby or father-baby interactions in the presence of the other parent) at three time points: when the infant was 3, 9, and 18 months old. Videos of interactions were coded, and the parental vocalizations were extracted. Five components of acoustic features related to the prosodic aspects of speech were extracted for subsequent analysis: intensity and its variability, pitch and pitch variability, formant amplitude, the intensity of specific speech frequency bands affecting sound timbre, and the rate of voiced and unvoiced segments per second. The study demonstrated a main effect of infant age on parental acoustic prosodic characteristics, along with significant interactions between infant age and interaction context (two- versus three-person) and between infant age and parental role (mother versus father). Across contexts and parental roles, intensity, pitch, and their variability consistently increased from 3 to 9 months. By 9 months, distinct prosodic patterns emerged, including a reduced syllable rate and formant amplitude, along with an increase in pauses. The mother’s voice exhibited a steady increase in intensity, as well as in pitch and intensity variability. 
Interestingly, when comparing parents across the two contexts, IDS in the three-person context was characterized by a higher rate of syllables and fewer pauses, with the most pronounced changes observed at 9 months of age. The development of prosodic characteristics in IDS is not constant across age; it is shaped by complex interactions among age phases, parental gender, and contextual factors, with a dynamic adaptation of communication strategies in three-person contexts. The current study underscores the importance of a comprehensive perspective when analyzing infant-directed speech in interactive contexts involving both fathers and mothers in two- and three-person settings.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-98047-3.

Keywords: Infant-directed speech, Emotional prosody, Lausanne trilogue play

Subject terms: Psychology, Human behaviour

Introduction

Parental vocalizations, known as infant-directed speech (IDS), adjust dynamically to infants' developmental milestones, reflecting both parental responsiveness and infants' changing communicative needs. Early vocal interactions establish a social feedback loop in which parental responses to infants' vocalizations support their language acquisition and development1. These adaptations frequently entail changes in prosody, which have been noted to evolve as infants transition from prelinguistic vocalizations to nascent speech forms2. The investigation of IDS is crucial for understanding the development of parent-infant and parent-child interactions3, as research employing naturalistic recording techniques has shown that parents adjust their speech patterns to align with their infants' linguistic abilities4. Compared with adult-directed speech, IDS is characterized by distinct acoustic features, including a wider pitch range5, more pronounced intonation contours in both tonal and non-tonal languages6, and a higher mean pitch7, all adapted to the infant's age and feedback8. Additionally, IDS features a slower speech tempo, longer pauses, and more marked segmentation between utterances7.

These prosodic characteristics also reflect parental affect and intentions, providing an ideal context for studying the affective dynamics that regulate infant development.

The infant's age is one of the best predictors of changes in parents' infant-directed speech9, especially when compared with other factors such as socioeconomic status or other family characteristics10,11.

As infants grow, the articulation rate and vowel duration in IDS gradually become more similar to those of adult-directed speech, and the mean length of adult utterances increases, reflecting parents' adaptation to the child's developing language and communication skills12. While paternal IDS has been the subject of some research, most studies focus on maternal vocalizations13,14.

Research comparing maternal and paternal infant-directed speech yields inconsistent results. Some studies have found similarities: both mothers and fathers use shorter utterances, longer pauses, a higher mean fundamental frequency, and greater variability in IDS than in adult-directed speech. Nevertheless, only mothers used a broader pitch range, suggesting nuanced differences in parental engagement in IDS15.

Rosslund et al.16 investigated Norwegian parents engaging with their 8-month-old infants and discovered that both mothers and fathers demonstrated comparable acoustic alterations in their IDS marked by heightened prosody and broadened vowel spaces.

Research indicates that during three-person interactions, both mothers and fathers tend to reduce the quantity of their speech directed toward toddlers, likely to prevent overstimulation. This reduction manifests as shorter utterances and fewer words. Notably, fathers exhibit a more pronounced decrease in verbal communication compared to mothers in these settings17.

In contrast, other studies report minimal differences, such as mothers displaying a broader pitch range than fathers18, or substantial differences indicating that fathers may not adjust their prosody to the same extent as mothers when communicating with their toddlers19.

The nature of parental speech is also modified, with more facilitative speech (use of open-ended questions) in the three-person context17,20 and less lexical variety in the words used10. Finally, both parents use fewer communicative gestures in the three-person context21.

Research comparing mothers' and fathers' IDS in two- and three-person contexts is rather limited and has mainly focused on the amount (duration and frequency) of IDS produced by each parent in the different conditions16,17,20,21. Less attention has been given to the acoustic features of paternal and maternal IDS in both two- and three-person contexts, and the present study aims to address this gap.

In the present study, we focused specifically on prosody as an interface between language and affect22, and we hypothesized that a comprehensive perspective, including the characteristics of the interactional context (two- and three-person interaction) and parental gender (mother or father), could provide new insights into the dynamic function of IDS across the infant's development.

The present research aimed to investigate vocal prosody characteristics during early interactions between parents and their children, namely intensity, fundamental frequency, intensity/fundamental frequency variability, formant amplitude, and the number and length of voiced and unvoiced segments per second. Specifically, we aimed to identify the acoustical characteristics of parental vocalizations at three time points (3, 9, and 18 months of age) and to investigate the differential effect of two-person (one parent interacting with the baby alone) and three-person (one parent interacting with the baby while the other parent observes) contexts at these time points. Note that in the three-person context both father and mother are present, but they interact one at a time with the infant, while the other parent is a passive observer.

Methods

Population

The present study is part of a larger study examining the association between parental depressive symptoms and coparenting behaviors in three-person contexts23. All parents included in this subset were "low-risk" for depressive symptoms based on clinical screening. "Low-risk" refers to families without significant clinical, socioeconomic, or psychological risk factors that could influence the study outcomes. Data were collected in a low-risk sample of 69 two-parent families living in the French-speaking part of Switzerland. Study inclusion criteria were that (a) both parents and the child lived in the same household and (b) the families were fluent in French (the language of all testing material).

Informed consent was obtained from all participants and from the parents of the infants involved in this study.

SES was operationalized through parental education and income levels. The participants’ socioeconomic status varied from lower to upper middle class based on Hollingshead’s two-factor classification, with a majority of fathers and mothers classified as upper middle class (61.2% and 68.2%, respectively) and nearly half of the families having both parents in the upper middle class (49.2%).

The characteristics of the included population are summarized in Table 1.

Table 1.

Population characteristics.

(n = 69)
Socioeconomic status (%) 29.84 (1.97)
Maternal age at test (years), range, M (SD) 23–41, 32.3 (4.4)
Paternal age at test (years), range, M (SD) 23–54, 34.9 (5.8)
Couple status (%) Married (74.6%)
Sex (%) Girls (46.4%)
Primiparous mothers 68.2%
Infant’s age at test (days), M (SD) 98.7 (9.5)

Study design and procedure

The 69 families were involved at three time points: 3 months (41 families, 163 videos), 9 months (39 families, 165 videos), and 18 months of age (45 families, 159 videos), for a total of 487 videos. Every family contributed data at one or more time points.

At each assessment time point, families participated in multiple interaction contexts.

The three-person condition in the LTP scenario is divided into four segments: (a) one parent engages with the child (the active parent role), while the other assumes a passive participant-observer role; (b) the parents switch roles; (c) all three engage in play together; and (d) the parents converse, temporarily leaving the child on their own.

We use the expression "three-person" to distinguish parent-infant interactions that occur in the presence of the second parent (a "2 + 1" configuration) from those occurring in the absence of the second parent ("purely" dyadic). Although the second parent is nominally passive, classical studies on early family interactions have shown that the mere presence of the second parent influences the interacting dyad24,25; this makes the context a three-person one even though only one dyad is active.

A research assistant provides instructions to the parents, then leaves the experimental room and signals the transition from one segment to the next by activating a light. The order of the first three parts of the LTP is randomized and counterbalanced to eliminate order effects, while the last part (the parents conversing) always occurs last because of its potential to cause distress, particularly for younger infants. For the present study, of the four parts of the three-person LTP condition, we used only the two in which one parent interacts with the child while the other parent observes. These two parts were compared with the two dyadic conditions (mother-baby and father-baby), in which each parent interacts with the baby in the absence of the other parent. Father-child and mother-child dyadic free-play sessions take place in the same setting as the LTP at each time point, with only one parent in the room during play.

The final sample of parental vocalizations across the different conditions comprised 8807 vocalizations (1760 at 3 months, 2298 at 9 months, and 4749 at 18 months).

A detailed description of the vocalizations is reported in Table 2.

Table 2.

Characteristics of mothers’ and fathers’ vocalizations by condition (two- versus three-person condition).

Age (months) 3 9 18
Mother in two-person condition, N (M, SD of duration in s) 483 (M = 0.34, SD = 0.39) 776 (M = 0.30, SD = 0.25) 983 (M = 0.31, SD = 0.25)
Mother in three-person condition, N (M, SD of duration in s) 393 (M = 0.38, SD = 0.41) 450 (M = 0.35, SD = 0.41) 1322 (M = 0.34, SD = 0.33)
Father in two-person condition, N (M, SD of duration in s) 462 (M = 0.31, SD = 0.39) 653 (M = 0.28, SD = 0.30) 1099 (M = 0.28, SD = 0.24)
Father in three-person condition, N (M, SD of duration in s) 422 (M = 0.35, SD = 0.29) 419 (M = 0.36, SD = 0.45) 1345 (M = 0.31, SD = 0.30)
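As an arithmetic cross-check, the cell counts in Table 2 can be summed per age and compared with the totals given in the text (1760, 2298, and 4749 vocalizations; 163, 165, and 159 videos). The short Python sketch below does this; the dictionary layout and function names are illustrative choices, not part of the study's analysis pipeline.

```python
# Vocalization counts transcribed from Table 2, ordered as:
# (mother two-person, mother three-person, father two-person, father three-person)
TABLE2_COUNTS = {
    3: (483, 393, 462, 422),
    9: (776, 450, 653, 419),
    18: (983, 1322, 1099, 1345),
}
# Totals stated in the text: vocalizations per age and videos per age.
REPORTED_VOCALIZATIONS = {3: 1760, 9: 2298, 18: 4749}
REPORTED_VIDEOS = {3: 163, 9: 165, 18: 159}


def totals_by_age(counts):
    """Sum the four parent-by-context cells for each infant age."""
    return {age: sum(cells) for age, cells in counts.items()}


def check_sample_sizes():
    """Verify per-age totals, then return (total vocalizations, total videos)."""
    per_age = totals_by_age(TABLE2_COUNTS)
    # Each age column of Table 2 should add up to the per-age total in the text.
    assert per_age == REPORTED_VOCALIZATIONS, per_age
    return sum(per_age.values()), sum(REPORTED_VIDEOS.values())


if __name__ == "__main__":
    n_vocalizations, n_videos = check_sample_sizes()
    print(n_vocalizations, n_videos)
```

Running it prints 8807 487, i.e. the age columns of Table 2 add up to the sample sizes reported in the Study design and procedure section.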

Observation setting

Parents and children were asked to interact following the scenario of the Lausanne Trilogue Play26,27. The LTP is conducted in a laboratory setting, where both parents participate in a four-part scenario with the child, as described above.

The family members are seated in an equilateral triangular formation, maintaining a distance of 80 cm between the centers of each seat to facilitate interaction. The infant is positioned in a baby chair that can be adjusted to three orientations: facing one parent, facing the other, or positioned between both parents. The chair can be adjusted to a forward (“sit”) or backward (“lay”) position. Parents were instructed that they could change the position of the infant chair as they saw fit.

The room was empty, and parents were asked to interact without objects, except at 18 months, when a set of standardized objects (3 stuffed animals, 3 spoons, and 3 socks) was at their disposal. Parents were instructed to play with the infant as they would normally do at home, as far as the context allowed.

The experimenter exited the room after giving the instructions to parents (for details, see Study design and procedure). Interactions were filmed with 5 cameras, and the audio was recorded using a high-sensitivity condenser microphone (AudioPro CX-1000) suspended from the ceiling, featuring a cardioid polar pattern to minimize ambient noise and enhance voice clarity.

Vocalizations analysis

In the three-person context (mother, father, and child), two conditions were identified: one parent interacting with the child while the other parent observes (father-to-child + mother; mother-to-child + father). In the two-person context, two conditions (mother-to-child and father-to-child) were also identified. In total, four conditions were thus identified.

All sequences started when the parent first vocalized to the child and ended just before the other parent began his or her turn to speak.

The vocalizations were identified on audio tracks and manually coded using ELAN software28. Vocal overlaps and noise segments were classified and excluded from subsequent acoustic analysis. Manual checks were performed to exclude segments containing infant vocalizations and potential noise interference. Adult-directed speech was excluded based on contextual information, including posture and content. ELAN annotations were imported into Audacity, and the corresponding segments were then exported separately for acoustic analysis. Microphone distances were kept constant across sessions to minimize variability in loudness measures.
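The coding-and-cutting step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the annotations have already been exported as (start, end, label) entries in seconds and that the audio is available as a sequence of samples; the label names, including "IDS", are placeholders, and the sketch does not reproduce the ELAN or Audacity workflow itself.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Annotation:
    start_s: float  # onset in seconds, as exported from an annotation tier
    end_s: float    # offset in seconds
    label: str      # hypothetical coding scheme, e.g. "IDS", "overlap", "noise", "ADS"


def extract_ids_segments(samples: Sequence[float], sample_rate: int,
                         annotations: List[Annotation],
                         keep_label: str = "IDS") -> List[Sequence[float]]:
    """Cut the audio into one segment per annotation carrying `keep_label`.

    Every other label (overlapping voices, noise, adult-directed speech,
    infant vocalizations) is skipped, mirroring the exclusion step.
    """
    segments = []
    for ann in annotations:
        if ann.label != keep_label:
            continue
        # Convert times to sample indices and clamp them to the recording.
        start = max(0, int(round(ann.start_s * sample_rate)))
        end = min(len(samples), int(round(ann.end_s * sample_rate)))
        if end > start:  # ignore empty or inverted intervals
            segments.append(samples[start:end])
    return segments


if __name__ == "__main__":
    audio = list(range(100))  # toy "recording": 10 s at 10 samples per second
    anns = [Annotation(0.0, 1.0, "IDS"), Annotation(1.0, 2.0, "noise"),
            Annotation(2.5, 3.0, "IDS")]
    print(extract_ids_segments(audio, 10, anns))
```

Filtering on a single kept label, rather than on a list of excluded ones, means any unanticipated code in the annotation file is dropped by default.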

Acoustic measures

Acoustic characteristics of parent-child interaction vocal sequences were analyzed with the OpenSMILE software29, which allowed the extraction of 62 acoustic parameters (e.g., F0 semitone, loudness, Hammarberg index, etc.) from vocal extracts. We analyzed acoustical parameters in two main domains. First, we examined global acoustical features, identifying relevant parameters through principal component analysis and factor analysis using Statistica 14.0.0.15.

The first factor, labeled "Intensity," is primarily composed of loudness-related parameters (mean, 50th and 80th percentiles, range, and mean rising slope), together with the alpha ratio and Hammarberg index of voiced segments (loadings > 0.70). The second factor, "Formant Amplitude," includes the mean amplitudes of the first three formants (F1, F2, and F3) as well as the mean length of unvoiced segments (fricatives and stops).

Formant amplitude refers to the intensity of specific resonant frequencies in speech, that is, how strongly a given formant resonates and stands out. A higher formant amplitude makes the sound more prominent in the speech signal, which can improve intelligibility and the overall clarity of spoken language. Formant amplitude also contributes to emotional expressiveness; in IDS, caregivers typically exaggerate it, making their speech more engaging and attention-grabbing for infants. The duration and frequency of unvoiced segments, in turn, affect speech rhythm and flow, adding prosodic variation to the listener's experience. Together, formant amplitudes and unvoiced segment length offer a comprehensive view of the acoustic characteristics that shape how speech is heard and understood, particularly in IDS, where these elements are often modulated to facilitate communication and speech comprehension.

The third factor, referred to as the "Pitch Factor," is organized around F0 parameters (mean, 20th, 50th, and 80th percentiles) and the mean harmonics-to-noise ratio. Finally, the fourth factor is characterized by a high loading for F0 variability and, to a lesser extent, for loudness variability (loading of 0.61). For detailed information on all loadings, please refer to Table 2 in the Supplementary Material.
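To make the factor-extraction step concrete, the following Python sketch (requires NumPy) computes unrotated principal-component loadings from the feature correlation matrix and lists, per component, the features whose absolute loading exceeds 0.70, the threshold mentioned above. It is an illustration under stated assumptions: the study used Statistica and may have applied a rotation or a different extraction method, and the feature names and synthetic data in the demo are hypothetical.

```python
import numpy as np


def pca_loadings(X, n_components):
    """Unrotated PCA on the correlation matrix of X.

    X: array of shape (n_vocalizations, n_features).
    Returns the retained eigenvalues and the loadings matrix
    (n_features x n_components); each loading is the correlation
    between a feature and a component.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)              # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs * np.sqrt(eigvals)        # scale unit eigenvectors to loadings


def salient_features(loadings, names, threshold=0.70):
    """List, per component, the features whose |loading| exceeds the threshold."""
    return [[names[i] for i in np.flatnonzero(np.abs(loadings[:, k]) > threshold)]
            for k in range(loadings.shape[1])]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    loud, pitch = rng.normal(size=n), rng.normal(size=n)  # two synthetic latent sources
    X = np.column_stack([loud + 0.2 * rng.normal(size=n), loud + 0.2 * rng.normal(size=n),
                         pitch + rng.normal(size=n), pitch + rng.normal(size=n)])
    names = ["loudness_mean", "loudness_p80", "F0_mean", "F0_p50"]  # hypothetical labels
    eigvals, loadings = pca_loadings(X, 2)
    print(eigvals.round(2))
    print(salient_features(loadings, names))
```

In the synthetic demo, the two loudness-like columns load on the first component and the two pitch-like columns on the second, mimicking how correlated acoustic descriptors group into factors such as "Intensity" and "Pitch Factor."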

Second, we specifically focused on voiced and unvoiced segments, as these are indicative of the rhythmic aspects of infant-directed speech, which are crucial in early parent-infant interactions30,31.

It is important to note that the present analysis focuses on suprasegmental, phrase-level timing to capture the dynamic properties of IDS prosody. The temporal measures employed in our study therefore reflect patterns at the phrase level rather than individual syllabic rates. At this level, we used voiced and unvoiced segment measures, which in turn relate to syllable rate. Voiced segments comprise vowels and voiced consonants, which are produced with vocal-fold vibration. Unvoiced segments include pauses and voiceless consonants; in the present analysis they can occur both between syllables and between words, and they index the proportion of unvoiced stretches at the phrase level. A higher proportion of voiced segments usually corresponds to a higher syllable rate over a given time span. Unvoiced segments, by contrast, often separate voiced elements: a higher proportion of them, especially extended pauses, can support intelligibility and rhythmic segmentation but slows the syllable rate. Speech rhythm and fluency thus depend on the balance and timing of voiced and unvoiced segments: more voiced segments speed up the syllable rate, whereas more unvoiced segments, such as pauses or silence, slow it down32.
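The link between frame-level voicing and the phrase-level measures described above can be sketched in Python. The code assumes a per-frame voicing decision (for example, whether a frame has a valid F0 estimate) and a hypothetical 10 ms frame step; openSMILE's GeMAPS-family feature sets compute comparable temporal descriptors with their own voicing criteria, so exact values will differ from this simplified version.

```python
from itertools import groupby
from typing import Dict, Sequence


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def segment_stats(voiced: Sequence[bool], hop_s: float = 0.01) -> Dict[str, float]:
    """Phrase-level rhythm measures from a frame-by-frame voicing decision.

    voiced: one boolean per analysis frame (True = vocal-fold vibration).
    hop_s:  frame step in seconds (10 ms is an assumed default).
    """
    # Collapse consecutive identical frames into (is_voiced, n_frames) runs.
    runs = [(flag, sum(1 for _ in group)) for flag, group in groupby(voiced)]
    voiced_lens = [n * hop_s for flag, n in runs if flag]
    unvoiced_lens = [n * hop_s for flag, n in runs if not flag]
    duration = len(voiced) * hop_s
    return {
        "voiced_segments_per_s": len(voiced_lens) / duration if duration else 0.0,
        "mean_voiced_len_s": _mean(voiced_lens),
        "mean_unvoiced_len_s": _mean(unvoiced_lens),
        "voiced_proportion": sum(voiced_lens) / duration if duration else 0.0,
    }


if __name__ == "__main__":
    frames = [False, False, True, True, True, False, True, True, False, False]
    print(segment_stats(frames))
```

For this ten-frame toy input, the function returns about 20 voiced segments per second, a mean voiced length of 0.025 s, and a voiced proportion of 0.5. Leading and trailing silences are counted as unvoiced segments here, a choice a real pipeline might handle differently.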

Statistical analysis

We fitted general linear mixed models (GLMMs) in R (Version 4.0.0) within the RStudio environment (Version 1.2.5042)33. These models allowed us to account for the nested structure of our dataset, in which observations are grouped within families and time points.

The statistical models included several fixed factors to examine the effects of key variables on the dependent measures. Specifically, Context (two modalities: two- and three-person), Parental Gender (two modalities: mother and father), and Conditions (four modalities: (a) one parent playing with the child while the other observes, (b) the second parent playing with the child while the first parent observes, (c) one parent playing with the child alone, and (d) the second parent playing with the child alone) were included as categorical predictors. Additionally, Child’s Age was treated as a fixed factor with three modalities (3, 9, and 18 months), representing the developmental trajectory of the infant over time.

These models were built incrementally, starting with a baseline model (model.lme0) containing only random intercepts for Time and FamilyID to account for repeated measures within families and across time points. Fixed effects were added sequentially to assess their contributions to model fit. Random effects were incorporated into the models to control for variability attributable to individual differences and repeated measurements. Random intercepts and slopes were specified for Time and Family ID to account for within-family variability across the different time points and conditions. The inclusion of random slopes ensures that the models account for differential trajectories among families, enhancing the robustness of the results.
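The Results report chi-square statistics with 1 or 2 degrees of freedom. Assuming (the text does not state this explicitly) that these come from likelihood-ratio comparisons of nested models in the incremental model-building procedure above, the statistic is twice the difference in log-likelihoods, referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. The Python sketch below implements this with the standard library only; in R, the same comparison is commonly obtained with `anova()` on two nested fits.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square distribution with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, nu = math.exp(-half), 2             # closed form for df = 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1  # closed form for df = 1
    while nu < df:
        # Recurrence: Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
        q += math.exp((nu / 2) * math.log(half) - half - math.lgamma(nu / 2 + 1))
        nu += 2
    return q


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df_diff: int):
    """Compare two nested models fitted by maximum likelihood.

    Returns (chi-square statistic, p-value); df_diff is the number of extra
    parameters in the fuller model.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


if __name__ == "__main__":
    # df = 2 matches adding a three-level factor such as Child's Age (3, 9, 18 months).
    print(round(chi2_sf(7.84, 2), 4))
```

For example, `chi2_sf(7.84, 2)` evaluates to about 0.0198, in line with the p = 0.02 reported for the age effect on voiced segments per second. Comparisons of fixed effects of this kind are normally made between models fitted by maximum likelihood rather than REML.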

Model diagnostics were conducted to ensure the validity of the GLMMs, including checks for multicollinearity, residual normality, and homoscedasticity. Post-hoc comparisons with appropriate corrections were applied to explore significant main effects and interactions.

Results

Voiced and unvoiced segments: child’s age, parental gender and context effect

Voiced segments are stretches of speech produced while the vocal folds vibrate; the number of voiced segments per second counts how many such segments parents produce per second. Unvoiced segments are stretches without vocal-fold vibration, such as whispered sounds or silence, and are measured as their length in seconds within the selected utterances. In the present analysis, particular attention was dedicated to these parameters because they constitute a proxy for the rhythmic patterns that have been shown to be crucial for describing IDS34.

During early vocal interactions, silence allows synchrony in the dyad by creating natural pauses for turn-taking, enabling infants to process auditory input, anticipate responses, and engage actively in the exchange35. Pauses in speech facilitate turn-taking, enabling both parents and infants to manage the interaction's rhythm, avoid overwhelming the infant, and enhance mutual attentiveness36. Turn-taking is crucial for sustaining the rhythm of the interaction for both participants (parent and child). Parents can use moments of silence to assess their child's responses and adjust their communication accordingly37. Beebe et al.38 assert that these silent intervals of mutual interaction facilitate infants' understanding of conversational dynamics, thereby enhancing their social and communicative abilities. Thus, silence is not simply the absence of sound but an integral element of meaningful, responsive interactions that foster the child's communicative development39.

As predicted, a significant effect of age on the number of voiced segments per second was found (X2(2) = 7.84, p = 0.02). More specifically, values significantly increased between 9 and 18 months (X2(1) = 7.64, p = 0.005), see Fig. 1a.

Fig. 1.


Voiced segments per second across different conditions. Panel (a) shows the voiced segments per second in all conditions at three time points: 3, 9, and 18 months. Panel (b) compares the voiced segments per second in mother-infant and father-infant directed speech. Panel (c) presents the comparison between the two conditions (two- and three-person) for both mothers and fathers across the different time points. Data points represent individual participants, and error bars indicate standard error. Differences in voiced segments per second across conditions are interpreted in the context of parental interaction styles and the child's age during observation.

The two-way interaction between parent (father/mother) and age had a significant effect on voiced segments per second (X2(2) = 7.73, p = 0.02). More specifically, the contrast was marginally significant for 3 versus 18 months (X2(1) = 2.88, p = 0.09) and significant for 9 versus 18 months (X2(1) = 6.6, p = 0.01), with higher values of voiced segments per second for fathers, see Fig. 1b. The two-way interaction between condition (two- and three-person) and age also had a significant effect on voiced segments per second (X2(2) = 7.18, p = 0.02). More specifically, the contrast was significant for 3 versus 18 months (X2(1) = 7.07, p = 0.007), with higher values of voiced segments per second in the three-person context, see Fig. 1c.

No significant effect of age has been detected for the length of voiced segments.

A significant effect of the child's age on the length of unvoiced segments (X2(2) = 13.17, p = 0.001) has been found. More specifically, values significantly increased between 3 and 9 months (X2(1) = 7.60, p = 0.006) and decreased between 9 and 18 months (X2(1) = 12.35, p = 0.0004), see Fig. 2a.

Fig. 2.


Length of unvoiced segments across age and different conditions. Panel (a) shows the significant effect of child age on the length of unvoiced segments, with values increasing between 3 and 9 months, and decreasing between 9 and 18 months. Panel (b) presents the interaction between parent (mother/father) and age, highlighting significantly longer unvoiced segments for fathers at 9 months compared to other time points. Panel (c) illustrates the interaction between condition (two- and three-person) and age, with significantly longer unvoiced segments in the two-person condition, especially between 3 and 9 months, and 3 and 18 months. Data points represent individual participants, and error bars indicate standard error.

The two-way interaction between parent (father/mother) and age had a significant effect on the length of unvoiced segments (X2(2) = 12.00, p = 0.002). More specifically, the contrast was significant for 3 versus 9 months (X2(1) = 7.90, p = 0.005) and for 9 versus 18 months (X2(1) = 10.36, p = 0.001), with longer unvoiced segments for fathers at 9 months, see Fig. 2b. The two-way interaction between condition (two- and three-person) and age also had a significant effect on the length of unvoiced segments (X2(2) = 10.25, p = 0.005). More specifically, the contrast was significant for 3 versus 9 months (X2(1) = 9.08, p = 0.002) and for 3 versus 18 months (X2(1) = 7.66, p = 0.006), with longer unvoiced segments in the two-person condition, see Fig. 2c.

Intensity: child’s age, parental gender and context effect

A significant general effect of age on intensity (X2(2) = 176.47, p < 0.001) has been found. More specifically, values significantly increased between 3 and 9 months (X2(1) = 176.09, p < 0.001) and between 9 and 18 months (X2(1) = 77.78, p < 0.001), see Fig. 3a.

Fig. 3.


Intensity of vocalizations across age and different conditions. Panel (a) shows the general effect of age on intensity, with significant increases observed between 3 and 9 months, and 9 and 18 months. Panel (b) presents the interaction between parent (mother/father) and age, highlighting a significant increase in intensity, particularly for mothers, between 3 and 9 months, and a marginal increase from 9 to 18 months. Panel (c) illustrates the interaction between condition (two- and three-person) and age, showing higher intensity values in the three-person condition at 18 months, with significant differences between 3 and 18 months, and 9 and 18 months. Data points represent individual participants, and error bars indicate standard error. Differences in intensity across age, parental roles, and conditions are discussed in the context of parental interaction styles and child development.

The two-way interaction between parent (father/mother) and age had a significant effect on intensity (X2(2) = 15.73, p < 0.001). More specifically, the contrast was significant for 3 versus 9 months (X2(1) = 15.67, p < 0.001) and marginally significant for 9 versus 18 months (X2(1) = 3.64, p = 0.05), with increased intensity in particular for mothers, see Fig. 3b. The two-way interaction between condition (two- and three-person) and age also had a significant effect on intensity (X2(2) = 17.08, p < 0.001). More specifically, the contrast was significant for 3 versus 18 months (X2(1) = 9.72, p = 0.001) and for 9 versus 18 months (X2(1) = 12.54, p < 0.001), with higher intensity in the three-person condition at 18 months, see Fig. 3c.

Fundamental frequency (F0) and intensity/F0 variability related factors: child’s age, parental gender and context effect

A significant effect of age on the F0 factor (X2(2), p = 0.009) has been found. More specifically, values significantly decreased between 9 and 18 months (X2(1), p = 0.002), see Fig. 4a.

Fig. 4.


F0 and intensity variability across age groups and conditions. Panel (a) shows a significant decrease in F0 between 9 and 18 months. Panel (b) presents the significant increase in F0 and intensity variability from 9 to 18 months and across all age groups (3 to 18 months). Panel (c) displays the significant interaction between parent (father/mother) and age, with higher variability for mothers, particularly at 18 months. Data points represent individual participants, and error bars indicate standard error. Differences across conditions are discussed in relation to parental roles and child age.

Age was also a significant factor influencing the variability of F0 and intensity (X2(2), p < 0.001). More specifically, values marginally increased between 3 and 9 months (X2(1), p = 0.009), and increased significantly between 9 and 18 months (X2(1), p < 0.001) and between 3 and 18 months (X2(1), p < 0.001), see Fig. 4b.

The two-way interaction between parent (father/mother) and age had a significant effect on the variability of F0 and intensity (X2(2), p = 0.03). More specifically, the contrast was significant for 3 versus 18 months (X2(1), p = 0.03) and for 9 versus 18 months (X2(1), p = 0.03), with consistently higher F0 and intensity variability for mothers across ages, in particular at 18 months. For details, see Fig. 4c.

Formant amplitude factor: child’s age, parental gender and context effect

A significant effect of age on the formant amplitude factor (X2(2) = 16.94, p < 0.001) has been found. More specifically, values significantly decreased between 3 and 9 months (X2(1) = 9.40, p = 0.002) and increased between 9 and 18 months (X2(1) = 13.01, p < 0.001), see Fig. 5a.

Fig. 5.


Formant amplitude factor across age, parental gender, and interaction context. Panel (a) shows a significant decrease in formant amplitude factor between 3 and 9 months, followed by a significant increase between 9 and 18 months. Panel (b) presents the significant interaction between parent (father/mother) and age, with fathers showing lower values at 9 months. Panel (c) displays the significant interaction between condition (two- and three-person) and age, with lower formant amplitude factor values in the two-person condition, particularly at 9 months. Data points represent individual participants, and error bars indicate standard error. Differences across conditions are discussed in relation to parental gender, interaction context, and child age.

The two-way interaction between parent (father/mother) and age had a significant effect on the formant amplitude factor (X2(2) = 6.50, p = 0.03). More specifically, the contrast was significant for 3 versus 9 months (X2(1) = 6.26, p = 0.01) and for 3 versus 18 months (X2(1) = 4.03, p = 0.04), with lower formant amplitude factor values for fathers at 9 months, see Fig. 5b. The two-way interaction between condition (two- and three-person) and age also had a significant effect on the formant amplitude factor (X2(2) = 13.28, p = 0.001). More specifically, the contrast was significant for 3 versus 9 months (X2(1) = 12.92, p < 0.001) and for 9 versus 18 months (X2(1) = 6.70, p = 0.009), with lower formant amplitude factor values in the two-person condition, in particular at 9 months, see Fig. 5c.

Discussion

In the present study, we showed that the child’s age, in two- and three-person contexts, modulated multiple acoustic characteristics of maternal and paternal prosody in IDS, namely the number and length of voiced and unvoiced segments per second, intensity, fundamental frequency, intensity and fundamental frequency variability, and formant amplitude factors.
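The temporal measures listed above (number and length of voiced and unvoiced segments) belong to the GeMAPS parameter set29. As a hedged illustration only, assuming a frame-level voicing decision sampled every 10 ms (the mask here is a hypothetical input that would normally come from a pitch tracker), such measures can be obtained by run-length encoding the voicing mask. The rate definition (voiced segments divided by total duration) and the inclusion of leading and trailing unvoiced stretches are assumptions of this sketch.

```python
from itertools import groupby

FRAME_STEP_S = 0.010  # assumed 10 ms hop between analysis frames

def segment_stats(voiced: list[bool], step: float = FRAME_STEP_S) -> dict[str, float]:
    """Summarize a frame-level voicing mask into a segment rate and mean segment lengths.

    Consecutive frames with the same voicing decision form one segment;
    leading and trailing unvoiced stretches count as unvoiced segments.
    """
    runs = [(flag, sum(1 for _ in grp)) for flag, grp in groupby(voiced)]
    voiced_lens = [n * step for flag, n in runs if flag]
    unvoiced_lens = [n * step for flag, n in runs if not flag]
    total_s = len(voiced) * step
    return {
        "voiced_segments_per_s": len(voiced_lens) / total_s if total_s else 0.0,
        "mean_voiced_len_s": sum(voiced_lens) / len(voiced_lens) if voiced_lens else 0.0,
        "mean_unvoiced_len_s": sum(unvoiced_lens) / len(unvoiced_lens) if unvoiced_lens else 0.0,
    }

# Hypothetical 1 s mask: three voiced stretches separated by pauses.
mask = ([False] * 10 + [True] * 20 + [False] * 15 + [True] * 25
        + [False] * 5 + [True] * 15 + [False] * 10)
print(segment_stats(mask))
```

Under these assumptions, a slower, more pause-rich delivery shows up as fewer voiced segments per second and longer mean unvoiced segments.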

Age affects parental prosody

In parental prosody, the intensity factor, the pitch factor, and their variability increased between 3 and 9 months of the child’s age. These acoustic characteristics are core elements of IDS, as well as the most salient indicators of expressiveness40.
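In GeMAPS29, F0 is expressed on a semitone scale relative to 27.5 Hz, and variability is summarized by a normalized standard deviation (coefficient of variation). The sketch below shows, under that assumption and not as the authors' pipeline, how a hypothetical F0 contour could be converted to semitones and summarized by its mean and coefficient of variation over voiced frames only; the use of the population standard deviation is also an assumption.

```python
import math
from statistics import fmean, pstdev

REF_HZ = 27.5  # semitone reference frequency assumed from GeMAPS

def hz_to_semitones(f0_hz: float, ref_hz: float = REF_HZ) -> float:
    """Convert a frequency in Hz to semitones above ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def mean_and_cv(values: list[float]) -> tuple[float, float]:
    """Arithmetic mean and coefficient of variation (population SD / mean)."""
    m = fmean(values)
    return m, pstdev(values) / m

# Hypothetical F0 track in Hz; 0.0 marks unvoiced frames, which are skipped.
f0_track = [0.0, 220.0, 247.0, 294.0, 330.0, 0.0, 262.0, 0.0]
st = [hz_to_semitones(f) for f in f0_track if f > 0]
mean_st, cv_st = mean_and_cv(st)
print(f"mean F0 = {mean_st:.2f} st re 27.5 Hz, CV = {cv_st:.3f}")
```

The same two functionals can be applied to a loudness contour to obtain an intensity level and its variability.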

These results suggest a general increase in shared dyadic arousal during the first months of life, in line with the infant’s increasing time spent in active states and growing ability to sustain longer attentional states during development41. Interestingly, while the intensity factor and the pitch and intensity variability factors continue to increase up to 18 months, the pitch factor peaks at 9 months and decreases thereafter. Moreover, at this age, notable results emerged for the other acoustic parameters describing the prosodic aspects of speech29. Rather than following a linear progression across age, IDS at 9 months showed distinctive acoustic features compared with 3 and 18 months: a lower syllable rate, longer pauses, higher pitch, and lower formant amplitude. Nine months thus seems to be a crucial age for fathers’ and mothers’ vocal adjustment in two- and three-person contexts, just before the appearance of the first words.

From 2–3 months to 9 months, the frequency of infant crying decreases abruptly and speech-like vocalizations become more common in infants’ vocal repertoires42. Moreover, around 8–9 months, infants make maximal use of the variability and richness of their nonverbal vocal repertoire during dyadic interaction, before the onset of speech, which requires specialization in producing the sounds of their own language43. This is consistent with our finding that, at this age, IDS showed lower syllable rates and longer pauses, which may allow infants to take a more active part in the protoconversation. Moreover, although three-person patterns of communication are already in place at 3 months, between 3 and 9 months the developmental trajectory includes more complex patterns of parent-infant-object communication44. This developmental shift may also help explain the decrease in adult speech rate and the increase in silence.

At the same age, we also observed the highest pitch factor values and the lowest formant amplitude factor values. Formants are the resonant frequencies of the vocal tract, at which the harmonics of the voice are amplified; their frequencies and amplitudes vary with the configuration of the vocal tract used to produce different voiced sounds, particularly vowels. The combination of a lower syllable rate and a higher pitch observed here suggests a potential connection between the acoustic characteristics of IDS and the articulation of diverse vocal sounds, which contributes to shaping the affective content of speech29.

Parental gender differently modulates the age effect on prosody

By adopting a comprehensive perspective that includes parental gender, we show that the effect of age on prosody is modulated differently in mothers and fathers.

Fathers accounted for the decrease in syllable rate and the increase in pauses from 3 to 9 months, followed by a marked increase in syllables per second at 18 months, with a corresponding decrease in pauses. Mothers, by contrast, accounted for the steady increase in intensity and in the pitch and intensity variability factors, which we interpret here as an increase in the emotional content of their voices.

We suggest that fathers were particularly sensitive to the 9-month period, as reflected in the balance between words and pauses, whereas mothers tended to accompany the infant’s increasing age with a steady intensification of emotional cues in their voices.

From a developmental standpoint, the present results support the view that fathers and mothers modulate their IDS differently when communicating with their infant across development. Indeed, Kondaurova et al.19 reported that mothers, but not fathers, increased their fundamental frequency when addressing their toddlers.

From a social standpoint, the infant’s age of 9 months could be a crucial period for the quality of the father-infant relationship. One possible explanation relates to the social context: mothers in our sample gradually started working full-time again, leading to a more balanced distribution of childcare responsibilities in terms of domestic time. Given fathers’ growing role in parenting and family health45,46, future studies should investigate how the different phases of child development unfold within the current three-person configuration of child care.

Two- and three-person contexts differently modulate the age effect on prosody

When comparing the two extreme ages, 3 and 18 months, we found higher syllable rates and higher intensity factor values in IDS in the three-person condition, whereas in the same age comparison, the two-person condition was characterized by increasing silences as infants grew older.

Thus, not only is the total duration of fathers’ and mothers’ vocalizations greater in the three-person context, as shown in previous literature19, but we also show that pauses within each vocalization are shorter in this context, suggesting that it is more stimulating overall during the infant’s development, with a peak at 9 months.

This effect could be explained by the simultaneous presence of three interacting partners, but it also indicates that the effect of age on IDS is nonlinear when a larger family context is considered, and that this effect should be analyzed in more ecological conditions in which both parents interact with their infant.

Limitations

While our study provides valuable insights into parental vocalizations across different interactional contexts, several limitations must be considered.

One methodological limitation is the absence of a baseline control condition of adult-directed speech (ADS). Although such a comparison would help isolate infant-specific prosodic changes over time, we opted not to include an ADS condition in order to focus on naturalistic parent-infant interactions. Additionally, ADS can vary significantly with contextual and relational factors, making it challenging to establish a definitive baseline in our corpus.

Another limitation is that we did not assess the potential influence of speech content on prosodic features, including formant frequencies. Content variability is a natural aspect of parent-infant interactions, and our study aimed to capture real-world dynamics; controlling for speech content could introduce artificial constraints that diminish the ecological relevance of the findings. Nevertheless, future research could benefit from addressing this limitation. Furthermore, although different vowels have distinct formant values, our analysis focused on aggregated prosodic patterns rather than isolated phonemic characteristics, and we used a Principal Component Analysis (PCA) approach to mitigate the potential confounding effects of individual vowel variations. Future studies might integrate methods such as automated vowel classification or standardized tasks to better isolate the relationship between speech content and prosodic features while maintaining ecological validity.
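The PCA itself is not detailed in this section beyond its role in aggregating acoustic descriptors into factors. As a minimal illustration only, the sketch below z-scores a small hypothetical feature matrix (values invented for the example), builds its correlation matrix, and extracts the first principal component by power iteration; the study itself retained five components, and its software and rotation choices are not assumed here.

```python
import math
from statistics import fmean, pstdev

def zscore_columns(rows: list[list[float]]) -> list[list[float]]:
    """Standardize each feature (column) to mean 0 and population SD 1."""
    cols = list(zip(*rows))
    stats = [(fmean(c), pstdev(c)) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def correlation_matrix(z: list[list[float]]) -> list[list[float]]:
    """Correlation matrix of z-scored data (population normalization)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]

def first_component(c: list[list[float]], iters: int = 500) -> tuple[list[float], float]:
    """Leading eigenvector (loadings) and eigenvalue of a symmetric matrix by power iteration."""
    p = len(c)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Fix the arbitrary sign so that the largest loading is positive.
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    eigval = sum(v[i] * sum(c[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigval

# Hypothetical per-utterance descriptors: [loudness mean, loudness CV, F0 CV].
features = [
    [0.61, 0.42, 0.18],
    [0.72, 0.51, 0.22],
    [0.55, 0.38, 0.15],
    [0.80, 0.60, 0.27],
    [0.66, 0.47, 0.20],
]
z = zscore_columns(features)
loadings, eigval = first_component(correlation_matrix(z))
scores = [sum(a * b for a, b in zip(row, loadings)) for row in z]
print("loadings:", [round(x, 3) for x in loadings])
print("explained variance:", round(eigval / len(loadings), 3))
print("component scores:", [round(s, 2) for s in scores])
```

Because each component score pools several correlated descriptors, idiosyncratic variation in any single descriptor (such as a vowel-specific formant value) carries less weight than in the raw feature.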

We note that our findings on fundamental frequency patterns may diverge from those of previous studies, such as Kondaurova et al.19, who found that mothers (but not fathers) increased their F0 when addressing toddlers. However, differences in study populations, including child age and cultural and linguistic context, may explain these discrepancies. Kondaurova et al.19 focused on American English-speaking fathers of 29–40-month-old toddlers, whereas our study involved a different demographic with potentially different linguistic and cultural backgrounds. This highlights the importance of considering contextual factors when interpreting such findings, and a more varied corpus of infant-directed vocalizations will be needed to investigate the role of age, culture, and language in shaping parental vocalizations.

Moreover, we did not assess the amount of time each parent spent with the infant, nor include it in the analyses, although it could have affected the vocalization patterns of both mothers and fathers. Because the study was conducted in a WEIRD (Western, Educated, Industrialized, Rich, and Democratic) society, the findings may not generalize to non-WEIRD populations. Future research would benefit from exploring these contextual factors and their potential influence on the vocalization patterns of parents in different cultural settings.

Finally, this study lacks a detailed analysis of socioeconomic status (SES). However, the sample was relatively homogeneous in terms of SES, as assessed with Hollingshead’s two-factor classification: most fathers (61.2%) and mothers (68.2%) were classified as upper middle class, and in nearly half of the families (49.2%) both parents were in this category.

Conclusions

The current study highlights the importance of adopting a holistic approach to analyzing infant-directed speech in a naturalistic, interactive setting involving both fathers and mothers in two- and three-person contexts. Future studies on infant-directed speech should examine more closely the impact of the infant’s age, especially around 9 months, on maternal and paternal vocal utterances during naturalistic interactions that foster vocal interplay between partners, such as shared play or shared book reading.

Electronic supplementary material

Supplementary Material 1 (29.4KB, docx)

Acknowledgements

We wish to thank all the families involved in this study. We also thank Lucas Tamarit for his valuable support with the acoustic analysis.

Author contributions

F.M., H.T., N.F. and D.G. conceptualized and designed the study. H.T. and T.M. collected and organized the data. D.G. conducted the data analysis. F.M. and N.F. wrote the main manuscript text. D.G. prepared Figs. 1, 2 and 3. All authors reviewed and approved the final manuscript.

Funding

NCCR Evolving Language, SNF grant n°32003B_125493.

Data availability

The datasets used in the current study are available from the corresponding author upon reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Ethics approval

The present research was conducted in accordance with APA ethical standards in the treatment of the study sample. The study and its protocol were approved by the Ethical Committee of the Faculty of Biology and Medicine of the State of Vaud, Switzerland (project number: 249/07).

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Warlaumont, A. S., Richards, J. A., Gilkerson, J. & Oller, D. K. A social feedback loop for speech development and its reduction in autism. Psychol. Sci. 25 (7), 1314–1324 (2014).
2. Nathani, S., Ertmer, D. J. & Stark, R. E. Assessing vocal development in infants and toddlers. Clin. Linguist. Phon. 20 (5), 351–369 (2006).
3. Stern, D. N., Spieker, S. & MacKain, K. Intonation contours as signals in maternal speech to prelinguistic infants. Dev. Psychol. 18 (5), 727 (1982).
4. Gilkerson, J. et al. Mapping the early language environment using all-day recordings and automated analysis. Am. J. Speech Lang. Pathol. 26 (2), 248–265 (2017).
5. Fernald, A. Intonation and communicative intent in mothers’ speech to infants: is the melody the message? Child Dev., 1497–1510 (1989).
6. Kitamura, C., Thanavishuth, C., Burnham, D. & Luksaneeyanawin, S. Universality and specificity in infant-directed speech: pitch modifications as a function of infant age and sex in a tonal and non-tonal language. Infant Behav. Dev. 24 (4), 372–392 (2001).
7. Stern, D. N., Spieker, S., Barnett, R. & MacKain, K. The prosody of maternal speech: infant age and context related changes. J. Child Lang. 10 (1), 1–15 (1983).
8. Smith, N. A. & Trainor, L. J. Infant-directed speech is modulated by infant feedback. Infancy 13 (4), 410–420 (2008).
9. Genovese, G. et al. Infant-directed speech as a simplified but not simple register: a longitudinal study of lexical and syntactic features. J. Child Lang. 47 (1), 22–44 (2020).
10. Bingham, G. E., Kwon, K.-A. & Jeon, H.-J. Examining relations among mothers’, fathers’, and children’s language use in a dyadic and triadic context. Early Child Dev. Care 183 (3–4), 394–414 (2013).
11. Hoff, E. How social contexts support and shape language development. Dev. Rev. 26 (1), 55–88 (2006).
12. Cox, C. et al. A systematic review and Bayesian meta-analysis of the acoustic features of infant-directed speech. Nat. Hum. Behav., 1–20 (2022).
13. Spinelli, M., Fasolo, M. & Mesman, J. Does prosody make the difference? A meta-analysis on relations between prosodic aspects of infant-directed speech and infant outcomes. Dev. Rev. 44, 1–18 (2017).
14. Ferjan Ramírez, N. Fathers’ infant-directed speech and its effects on child language development. Lang. Linguist. Compass 16 (1), e12448 (2022).
15. Benders, T., StGeorge, J. & Fletcher, R. Infant-directed speech by Dutch fathers: increased pitch variability within and across utterances. Lang. Learn. Dev. 17 (3), 292–325 (2021).
16. Rosslund, A., Hagelund, S., Mayor, J. & Kartushina, N. Mothers’ and fathers’ infant-directed speech have similar acoustic properties, but these are not associated with direct or indirect measures of word comprehension in 8-month-old infants. J. Child Lang. 51 (6), 1424–1449 (2024).
17. Nandy, A., Nixon, E. & Quigley, J. Communicative functions of parents’ child-directed speech across dyadic and triadic contexts. J. Child Lang. 48 (6), 1281–1294 (2021).
18. Fernald, A. et al. A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. J. Child Lang. 16 (3), 477–501 (1989).
19. Kondaurova, M. V., VanDam, M., Zheng, Q. & Welikson, B. Fathers’ unmodulated prosody in child-directed speech. J. Acoust. Soc. Am. 154 (6), 3556–3567 (2023).
20. Quigley, J. & Nixon, E. Parent child directed speech in dyadic and triadic interaction: associations with co-parenting dynamics and child language outcomes. Early Child. Res. Q. 58, 125–135 (2022).
21. Morelli, M., Baiocco, R., Cattelino, E. & Longobardi, E. Mothers’ and fathers’ bimodal communication in dyadic and triadic interaction with children. First Lang. 43 (2), 158–177 (2023).
22. Grandjean, D., Bänziger, T. & Scherer, K. R. Intonation as an interface between language and affect. Prog. Brain Res. 156, 235–247 (2006).
23. Tissot, H., Favez, N., Frascarolo, F. & Despland, J.-N. Coparenting behaviors as mediators between postpartum parental depressive symptoms and toddler’s symptoms. Front. Psychol. 7, 1912 (2016).
24. Riegel, K. F. & Meacham, J. A. The Developing Individual in a Changing World, Vol. 2: Social and Environmental Issues (Walter de Gruyter GmbH & Co KG, 2019).
25. Clarke-Stewart, K. A. And daddy makes three: the father’s impact on mother and young child. In The Beginning: Readings on Infancy, 204–215 (Columbia University, 1982).
26. Favez, N., Scaiola, C. L., Tissot, H., Darwiche, J. & Frascarolo, F. The family alliance assessment scales: steps toward validity and reliability of an observational assessment tool for early family interactions. J. Child Fam. Stud. 20, 23–37 (2011).
27. McHale, J. P., Favez, N. & Fivaz-Depeursinge, E. The Lausanne trilogue play paradigm: breaking discoveries in family process and therapy. J. Child Fam. Stud. 27, 3063–3072 (2018).
28. Wittenburg, P., Brugman, H., Russel, A., Klassmann, A. & Sloetjes, H. ELAN: a professional framework for multimodality research. In 5th International Conference on Language Resources and Evaluation (LREC 2006).
29. Eyben, F. et al. The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans. Affect. Comput. 7 (2), 190–202 (2015).
30. Nguyen, T., Zimmer, L. & Hoehl, S. Your turn, my turn. Neural synchrony in mother–infant proto-conversation. Phil. Trans. R. Soc. B 378 (1875), 20210488 (2023).
31. Lester, B. M., Hoffman, J. & Brazelton, T. B. The rhythmic structure of mother-infant interaction in term and preterm infants. Child Dev., 15–27 (1985).
32. Miller, J. L., Green, K. P. & Reeves, A. Speaking rate and segments: a look at the relation between speech production and speech perception for the voicing contrast. Phonetica 43 (1–3), 106–115 (1986).
33. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2020).
34. Goswami, U. Language acquisition and speech rhythm patterns: an auditory neuroscience perspective. R. Soc. Open Sci. 9 (7), 211855 (2022).
35. Leclère, C. et al. Why synchrony matters during mother-child interactions: a systematic review. PLoS One 9 (12), e113571 (2014).
36. Jaffe, J. et al. Rhythms of dialogue in infancy: coordinated timing in development. Monographs of the Society for Research in Child Development, i–149 (2001).
37. Goldstein, M. H. & Schwade, J. A. Social feedback to infants’ babbling facilitates rapid phonological learning. Psychol. Sci. 19 (5), 515–523 (2008).
38. Beebe, B. et al. A systems view of mother–infant face-to-face communication. Dev. Psychol. 52 (4), 556 (2016).
39. Keller, H., Lohaus, A., Völker, S., Cappenberg, M. & Chasiotis, A. Temporal contingency as an independent component of parenting behavior. Child Dev. 70 (2), 474–485 (1999).
40. Trainor, L. J., Austin, C. M. & Desjardins, R. N. Is infant-directed speech prosody a result of the vocal expression of emotion? Psychol. Sci. 11 (3), 188–195 (2000).
41. Thelen, E. & Fogel, A. Toward an action-based theory of infant development. In Action in Social Context: Perspectives on Early Development, 23–63 (Springer, 1989).
42. Yeni-Komshian, G. H., Kavanagh, J. F. & Ferguson, C. A. Child Phonology: Volume 1 (Academic, 2014).
43. Stark, R. E., Bernstein, L. E. & Demorest, M. E. Vocal communication in the first 18 months of life. J. Speech Lang. Hear. Res. 36 (3), 548–558 (1993).
44. Adamson, L. B. Communication Development during Infancy (Routledge, 2018).
45. Myers, S. M. & Booth, A. Forerunners of change in nontraditional gender ideology. Soc. Psychol. Q., 18–37 (2002).
46. Yogman, M. W. & Eppel, A. M. The role of fathers in child and family health. In Engaged Fatherhood for Men, Families and Gender Equality: Healthcare, Social Policy, and Work Perspectives, 15–30 (2022).

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
